Table of Contents
 1. Introduction
 2. Perspective coordinates
 2.1 Projection in XZ plane and YZ plane
 2.2 View plane distance d
 2.3 Normalized perspective coordinates (d = 1)
 2.4 View plane perspective coordinates
 3. Screen coordinates
 3.1 Screen coordinates from normalized perspective coordinates
 3.2 Screen coordinates from view plane perspective coordinates
 4. Summary
1. Introduction
These are my personal notes, written while trying to understand the so-called camera-to-perspective transformation and the subsequent perspective-to-screen transformation.
So what's this all about? This is one of the fundamentals of every 3D engine. Assume you have a point in three-dimensional space, defined by its coordinates x, y and z, and you want to project it onto your screen, depending on your current location and direction of view. How do you do this?
If you have asked yourself this question, then you are in the right place!
I always like to start new topics in a very primitive manner, to really get a basic understanding (or feeling) for the topic. Therefore, the following definitions and simplifications are used in the explanations below:
 A left-handed coordinate system (LHS) is used, so the x-axis points to the right, the y-axis points up and the z-axis points 'into the screen'.
 The camera, also called the view point, is located at the origin of the coordinate system. It is looking straight down the z-axis.
 Rotation, viewing frustum and clipping planes are not considered; only the fundamental process of calculating the perspective and screen coordinates is examined.
The diagram above shows the left-handed coordinate system used.
The camera is located at the origin (0, 0, 0) and looks straight down the positive z-axis.
The view plane is an imaginary plane that is parallel to the x-axis and y-axis and perpendicular to the z-axis. A rectangular portion of this plane with a defined width and height forms the screen. The view plane has distance d from the origin.
The goal is to project a point p from three-dimensional space onto the view plane and to calculate the final two-dimensional point on the screen.
The whole process is split into two steps:
 Perspective transformation: This projects the point (given in camera coordinates) onto the view plane. Note that the perspective coordinate is not equal to the coordinate on our screen. The camera point p is transformed to the perspective coordinate pv.
 Screen transformation: This transforms the perspective coordinate to the actual screen coordinate. The perspective coordinate pv is transformed to the screen coordinate ps.
2. Perspective coordinates
The easiest way to derive the formulas for perspective projection is to analyze the geometry of the 2D case, once for the xz plane and once for the yz plane, and then combine the results to get the overall projection formulas.
2.1 Projection in XZ plane and YZ plane
On the left side, the xz plane of the coordinate system is shown. Imagine looking at it from the top, along the y-axis from its positive side towards the origin.
The location of point pv can be calculated easily with the basic geometry, using the proportions of similar triangles:
pv_{x} / p_{x} = pv_{z} / p_{z}
Rearranging the formula to isolate pv_{x}, and knowing that pv_{z} = d, results in:
pv_{x} = d * p_{x} / p_{z}
On the right side, the yz plane is shown. Similar to the case above, the projected point onto the view plane is calculated using the proportions of similar triangles:
pv_{y} / p_{y} = pv_{z} / p_{z}
Rearranging the formula to isolate pv_{y}, and knowing that pv_{z} = d, results in:
pv_{y} = d * p_{y} / p_{z}
Combining both results gives the overall projection formulas:
pv_{x} = d * p_{x} / p_{z}
pv_{y} = d * p_{y} / p_{z}
where p is the 3D point in camera coordinates, pv is the projection of p onto the view plane and d is the distance from the origin to the view plane.
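The two projection formulas can be sketched in Python as follows. This is only a minimal illustration: the function name `project_to_view_plane` and the tuple representation of points are assumptions made for this example, and the sketch assumes p_{z} > 0, i.e. the point lies in front of the camera.

```python
def project_to_view_plane(p, d):
    """Project the camera-space point p = (x, y, z) onto the view plane at distance d.

    Hypothetical helper for illustration; returns (pv_x, pv_y).
    """
    x, y, z = p
    if z <= 0:
        # The similar-triangle formulas only make sense in front of the camera.
        raise ValueError("p_z must be positive")
    return (d * x / z, d * y / z)


# Example: the point (2, 1, 4) projected onto a view plane at distance d = 2.
pv = project_to_view_plane((2.0, 1.0, 4.0), 2.0)
```

In this example pv_{x} = 2 * 2 / 4 and pv_{y} = 2 * 1 / 4, so the point lands at (1.0, 0.5) on the view plane.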
Well, unfortunately it is not that easy, as the distance d and the aspect ratio ar also have to be taken into account. So let's see ...
2.2 View plane distance d
The question is how to choose the distance d to the view plane. The answer is actually quite simple: you can choose whatever you want. Of course, it must somehow match the logical unit used for your camera coordinates.
In practice, one of the following two methods is the simplest to use:
 Set d = 1. The perspective coordinates are then called normalized perspective coordinates.
 Calculate d from the field of view to match the screen size. I call the resulting perspective coordinates view plane perspective coordinates. This is my preferred option.
Before examining both perspective projection types, the field of view (FOV) needs to be discussed briefly:
The field of view (FOV) is the extent of the observable game world, normally measured as an angle. In the diagram above, the FOV Θ is 90 degrees (90°). Obviously, there is the following geometric relation between FOV Θ, distance d of the view plane and width w of the view plane:
tan(Θ/2) = (w/2)/d.
Note that for a FOV of 90°, tan(Θ/2) is 1. This implies that w/2 = d.
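The relation between FOV, distance and width can be checked numerically with a small sketch. The function name `view_plane_width` is a hypothetical helper introduced here, and the FOV is assumed to be the horizontal angle given in degrees.

```python
import math


def view_plane_width(d, fov_degrees):
    """Width w of the view plane, solved from tan(fov / 2) = (w / 2) / d."""
    half_angle = math.radians(fov_degrees) / 2.0
    return 2.0 * d * math.tan(half_angle)


# For a FOV of 90 degrees, tan(45 degrees) is 1, so w / 2 equals d
# (up to floating point rounding).
w = view_plane_width(1.0, 90.0)
```

With d = 1 and a FOV of 90°, the computed width is 2 up to floating point rounding, which matches w/2 = d.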
2.3 Normalized perspective coordinates (d = 1)
An easy way is to strictly define d = 1. On the one hand, this simplifies the perspective transformation formulas to pv_{x} = p_{x} / p_{z} and pv_{y} = p_{y} / p_{z}.
On the other hand, when assuming the common field of view of 90°, both sides of the triangle have equal length, thus d = w/2.
Note: For the further discussion of normalized perspective coordinates, only a FOV of 90° is used. Of course it is possible to use another FOV, but then the range of the calculated perspective coordinates changes, which needs to be accounted for by an additional factor in the screen transformation. For simplicity, this is not considered here.
Using a FOV of 90°, the perspective coordinates of visible points fall into the range [-1, 1]. This is apparent when considering the p_{x} and p_{z} coordinates:
If |p_{x}| = p_{z}, then the point lies directly on one of the field of view lines (grey lines in the diagram above), so the point is mapped to pv_{x} = 1 if p_{x} = p_{z}, or to pv_{x} = -1 if p_{x} = -p_{z}.
If |p_{x}| > p_{z}, then the point lies outside the field of view and falls outside the visible part of the view plane.
If |p_{x}| < p_{z}, then the point is inside the field of view cone, so |p_{x} / p_{z}| < 1.
The same applies to p_{y}, but only when using a square projection view with equal width and height. In general, screens have a non-square area where the width is normally larger than the height. This is where the aspect ratio (ar) comes into play, which is the ratio between screen width and screen height: ar = width / height.
For example, a screen with size 1920 x 1080 has an aspect ratio of ar = 1920 / 1080 ≈ 1.78 (or 16:9).
So the y-coordinates of visible points do not fall into the range [-1, 1] but into [-1/ar, 1/ar]. Multiplying pv_{y} by ar during the perspective transformation stretches this range back to [-1, 1].
So considering the aspect ratio, the perspective transformation formulas are:
pv_{x} = p_{x} / p_{z} and
pv_{y} = p_{y} * ar / p_{z}.
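A minimal sketch of the normalized projection including the aspect ratio could look like this. The names `project_normalized` and `is_visible` are hypothetical, a horizontal FOV of 90° and p_{z} > 0 are assumed, and visible points are expected to end up in [-1, 1] on both axes after the multiplication by ar.

```python
def project_normalized(p, screen_width, screen_height):
    """Normalized perspective coordinates (d = 1) with aspect ratio applied to y."""
    x, y, z = p
    if z <= 0:
        raise ValueError("p_z must be positive")
    ar = screen_width / screen_height
    return (x / z, y * ar / z)


def is_visible(pv):
    """True if the normalized coordinate lies inside the [-1, 1] x [-1, 1] square."""
    pv_x, pv_y = pv
    return -1.0 <= pv_x <= 1.0 and -1.0 <= pv_y <= 1.0


# Example with a 2:1 screen (800 x 400), so ar = 2.
inside = project_normalized((2.0, 1.0, 4.0), 800, 400)   # y = 0.25 -> 0.5 after * ar
outside = project_normalized((0.0, 3.0, 4.0), 800, 400)  # y = 0.75 -> 1.5 after * ar
```

The second point has p_{y} / p_{z} = 0.75, which would be inside a square view but is above the top edge of a 2:1 screen, where the visible range is only [-0.5, 0.5] before the multiplication by ar.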
The aspect ratio can be tricky. It is important to consider it at any step when using normalized coordinates. I think it's easier to handle it as factor in the perspective transformation than in a later step.
2.4 View plane perspective coordinates
Another way is to calculate the distance d so that the view plane size matches the target screen size. The distance d is then calculated from the screen width and the FOV Θ.
Above, the formula tan(Θ/2) = (w / 2) / d was derived. Solving for d results in:
d = 0.5 * w / tan(Θ/2).
The remaining perspective transformation is the same as already derived above:
pv_{x} = p_{x} * d / p_{z}
pv_{y} = p_{y} * d / p_{z}.
Interestingly, the aspect ratio does not need to be considered when using view plane perspective coordinates. I am not absolutely sure why this is the case, but I assume that because the screen size is already used to calculate the distance d, and no scaling is required in the screen transformation, you get the aspect ratio handling for free :)
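One possible way to look at this: since d is calculated from the screen width, pv_{x} and pv_{y} are both expressed in the same screen units, and the screen height merely limits how much of the view plane is visible vertically. The sketch below computes d and the vertical FOV that this implies. The function names are hypothetical, and the FOV Θ is assumed to be the horizontal angle in degrees.

```python
import math


def view_plane_distance(screen_width, fov_degrees):
    """d = 0.5 * w / tan(fov / 2), with w taken as the screen width."""
    return 0.5 * screen_width / math.tan(math.radians(fov_degrees) / 2.0)


def implied_vertical_fov(screen_height, d):
    """Vertical angle (in degrees) covered by a screen of this height at distance d."""
    return math.degrees(2.0 * math.atan((0.5 * screen_height) / d))


# Example: 800 x 400 screen with a horizontal FOV of 90 degrees.
d = view_plane_distance(800, 90.0)
vertical_fov = implied_vertical_fov(400, d)
```

For the 800 x 400 example, d comes out at about 400 and the implied vertical FOV at roughly 53°, i.e. narrower than the horizontal 90° by exactly the amount the screen shape requires.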
3. Screen coordinates
After calculating the perspective coordinates from the camera coordinates, the final step is to derive the actual screen coordinates. The following two diagrams show the required transformation:
There are two points that need to be considered:
Point 1: The y-axis is inverted. It goes from top to bottom on the screen instead of from bottom to top as in the perspective coordinate system.
Point 2: The origin (0,0) is in the middle of the view plane in the perspective coordinate system, while it is in the top left corner in the screen coordinate system.
The exact transformation again depends on whether normalized perspective coordinates (d = 1) or view plane perspective coordinates (d chosen to match the screen size) are used. Note also that the screen x-coordinates range over [0, ScreenWidth - 1] and the screen y-coordinates over [0, ScreenHeight - 1].
3.1 Screen coordinates from normalized perspective coordinates
The perspective coordinate ranges need to be mapped to the screen coordinate ranges:
X: [-1, 1] -> [0, ScreenWidth - 1]
Y: [-1, 1] (after the multiplication by ar) -> [0, ScreenHeight - 1]
In x-direction, one is first added to x to change the range from [-1, 1] to [0, 2]. Then x is multiplied by half of the screen width, using ScreenWidth - 1 so that the result stays within the valid pixel range. This results in:
x = (x + 1) * ((ScreenWidth - 1) / 2)
= (x + 1) * (0.5*ScreenWidth - 0.5).
In y-direction, it is not necessary to handle the aspect ratio explicitly; it is only important to consider the aspect ratio once in the process, and this has already been done in the perspective transformation.
As in the x-direction, one is added to shift the range so that it starts at zero, and the coordinate is scaled by half of the screen height (again using ScreenHeight - 1). In addition, the y coordinate has to be inverted, which can be done by y = (ScreenHeight - 1) - y. This results in:
y = (ScreenHeight - 1) - (y + 1) * ((ScreenHeight - 1) / 2)
= (ScreenHeight - 1) - (y * ((ScreenHeight - 1) / 2) + ((ScreenHeight - 1) / 2))
= (ScreenHeight - 1) - (y * (0.5*ScreenHeight - 0.5) + (0.5*ScreenHeight - 0.5))
= ScreenHeight - 1 - y * (0.5*ScreenHeight - 0.5) - 0.5*ScreenHeight + 0.5
= -y * (0.5*ScreenHeight - 0.5) + (0.5*ScreenHeight - 0.5).
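The mapping from normalized perspective coordinates to screen coordinates can be sketched as below. The function name `normalized_to_screen` is a hypothetical choice. The sketch uses the constant term (0.5*ScreenHeight - 0.5) for y, which is what the corner checks require: pv_{y} = 1 has to land on row 0 and pv_{y} = -1 on row ScreenHeight - 1. The results are left as floating point values; rounding to integer pixels is up to the caller.

```python
def normalized_to_screen(pv, screen_width, screen_height):
    """Map normalized perspective coordinates in [-1, 1] to screen coordinates."""
    pv_x, pv_y = pv
    # Shift [-1, 1] to [0, 2], then scale to [0, ScreenWidth - 1].
    ps_x = (pv_x + 1.0) * (0.5 * screen_width - 0.5)
    # Same scaling for y, but inverted because screen y grows downwards.
    ps_y = -pv_y * (0.5 * screen_height - 0.5) + (0.5 * screen_height - 0.5)
    return (ps_x, ps_y)


# Corner and center checks for a 640 x 480 screen.
top_left = normalized_to_screen((-1.0, 1.0), 640, 480)
bottom_right = normalized_to_screen((1.0, -1.0), 640, 480)
center = normalized_to_screen((0.0, 0.0), 640, 480)
```

For a 640 x 480 screen, the top left corner maps to (0, 0), the bottom right corner to (639, 479) and the center to (319.5, 239.5).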
3.2 Screen coordinates from view plane perspective coordinates
The perspective coordinate ranges need to be mapped to the screen coordinate ranges:
X: [-(ScreenWidth - 1)/2, (ScreenWidth - 1)/2] -> [0, ScreenWidth - 1]
Y: [-(ScreenHeight - 1)/2, (ScreenHeight - 1)/2] -> [0, ScreenHeight - 1]
In x-direction, this is a simple translation by half of the screen width, thus:
x = x + ((ScreenWidth - 1) / 2)
= x + (0.5*ScreenWidth - 0.5)
In y-direction, the translation is by half of the screen height, so y = y + ((ScreenHeight - 1) / 2). Further, the y coordinate has to be inverted, which can be done by y = (ScreenHeight - 1) - y. Putting everything together results in:
y = (ScreenHeight - 1) - (y + ((ScreenHeight - 1) / 2))
= ScreenHeight - 1 - y - ((ScreenHeight - 1) / 2)
= ScreenHeight - 1 - y - 0.5*ScreenHeight + 0.5
= 0.5*ScreenHeight - 0.5 - y
= -y + ((ScreenHeight - 1) / 2).
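For view plane perspective coordinates the screen transformation is only a translation plus the y inversion, i.e. ps_{x} = pv_{x} + (ScreenWidth - 1)/2 and ps_{y} = -pv_{y} + (ScreenHeight - 1)/2. A minimal sketch, with the hypothetical function name `view_plane_to_screen`:

```python
def view_plane_to_screen(pv, screen_width, screen_height):
    """Map view plane perspective coordinates (screen units) to screen coordinates."""
    pv_x, pv_y = pv
    # Move the origin from the screen center to the left edge.
    ps_x = pv_x + (0.5 * screen_width - 0.5)
    # Invert y (screen y grows downwards) and move the origin to the top edge.
    ps_y = -pv_y + (0.5 * screen_height - 0.5)
    return (ps_x, ps_y)


# Corner and center checks for a 640 x 480 screen.
center = view_plane_to_screen((0.0, 0.0), 640, 480)
top_left = view_plane_to_screen((-319.5, 239.5), 640, 480)
bottom_right = view_plane_to_screen((319.5, -239.5), 640, 480)
```

The center of the view plane maps to (319.5, 239.5), and the extreme values of the input ranges map to the screen corners (0, 0) and (639, 479).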
4. Summary
In summary, the camera -> perspective -> screen transformation is as follows, depending on the projection coordinate mode used.
Normalized perspective coordinates (d = 1, FOV of 90°):

Camera-to-perspective transformation (p -> pv):
pv_{x} = p_{x} / p_{z}
pv_{y} = p_{y} * ar / p_{z}

Perspective-to-screen transformation (pv -> ps):
ps_{x} = (pv_{x} + 1) * (0.5*ScreenWidth - 0.5)
ps_{y} = -pv_{y} * (0.5*ScreenHeight - 0.5) + (0.5*ScreenHeight - 0.5)
View plane perspective coordinates:

Camera-to-perspective transformation (p -> pv):
pv_{x} = p_{x} * d / p_{z}
pv_{y} = p_{y} * d / p_{z}

Perspective-to-screen transformation (pv -> ps):
ps_{x} = pv_{x} + (0.5*ScreenWidth - 0.5)
ps_{y} = -pv_{y} + (0.5*ScreenHeight - 0.5)
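To compare both modes end to end, the whole chain from camera coordinates to screen coordinates can be sketched in two small functions. The function names are hypothetical, a horizontal FOV is assumed, and the y formulas use the inverted form with the constant (0.5*ScreenHeight - 0.5) so that the top row is 0 and the bottom row is ScreenHeight - 1.

```python
import math


def to_screen_normalized(p, width, height):
    """Normalized mode: d = 1, horizontal FOV of 90 degrees assumed."""
    x, y, z = p
    ar = width / height
    pv_x, pv_y = x / z, y * ar / z
    return ((pv_x + 1.0) * (0.5 * width - 0.5),
            -pv_y * (0.5 * height - 0.5) + (0.5 * height - 0.5))


def to_screen_view_plane(p, width, height, fov_degrees=90.0):
    """View plane mode: d derived from the screen width and the horizontal FOV."""
    x, y, z = p
    d = 0.5 * width / math.tan(math.radians(fov_degrees) / 2.0)
    pv_x, pv_y = x * d / z, y * d / z
    return (pv_x + (0.5 * width - 0.5),
            -pv_y + (0.5 * height - 0.5))


# The same camera-space point on a 640 x 480 screen, once per mode.
p = (1.0, 0.5, 2.0)
a = to_screen_normalized(p, 640, 480)
b = to_screen_view_plane(p, 640, 480)
```

For this point the normalized mode gives about (479.25, 159.67) and the view plane mode about (479.5, 159.5). With a FOV of 90°, the two modes should differ by at most about half a pixel for visible points, because the normalized mode scales by (ScreenWidth - 1)/2 while the view plane mode effectively scales by ScreenWidth/2.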
That's it! I hope it was interesting for you and that you had fun and learned something! Keep coding!
Sunshine, October 2020
History
 2020/10/01: Initial version.