Identification of Location and Camera Parameters for Public Live Streaming Web Cameras

Abstract: Public live streaming web cameras are quite common now and widely used by drivers for qualitative analysis of traffic conditions. At the same time, they can be a valuable source of quantitative information on transport flows and speeds for the development of urban traffic models. However, to obtain reliable data from raw video streams, it is necessary to preprocess them, taking into account the camera location and parameters without direct access to the camera. Here we suggest a procedure for estimating camera parameters, which allows us to determine pixel coordinates for a point cloud in the camera's field of view and transform them into metric data. These data are used with advanced moving object detection and tracking for measurements.


Introduction
There are many ways to measure traffic [1][2][3]. The most common in the practice of road services are radars combined with video cameras and other sensors. The measuring complexes are mounted above or at the edge of the road. Inductive sensors are the least dependent on weather conditions and lighting. The listed surveillance tools assume installation by the road or on the road. Providers of mobile navigation applications receive data about a car's movement from the sensors of mobile devices. Autonomous cars collect environmental information using various sensors, including multiview cameras, lidars, and radars. The results of the data collection systems of road services, navigation operators, and autonomous cars are usually not available to third-party researchers. Notably, pioneering traffic analysis work describes processing video data recorded on film [3] (pp. 3-10).
Operators worldwide install public web cameras, many of which look at city highways; e.g., there are more than a hundred similar cameras available in Vladivostok.
A transport model verification requires actual and accurate data on transport traffic, covering a wide range of time intervals with substantial transport activity. Public livestreaming cameras can be a good and easily accessible source of data for that kind of research. Of course, this accessibility is relative. Video processing involves storing and processing large amounts of data.
Ref. [4] demonstrates road traffic statistics collection from a public camera video, where the camera has little perspective and radial distortion in the region of interest (ROI) on the road. However, for the majority of public cameras, the distortions change the images significantly. The current article generalizes this approach to the case where a camera has essential radial and perspective distortions.
Street camera owners usually do not announce camera parameters (focal length, radial distortion coefficients, camera position, orientation). Standard calibration procedures describe image formation with the pinhole camera model, in which the extrinsic parameters form the matrix

$$[R|t] = \begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{pmatrix}. \quad (2)$$
The triangular matrix A contains the intrinsic parameters of the camera. Camera position and orientation determine the matrix [R|t], which defines the transformation from the frame F_enu to the camera coordinates F_c:

$$P_c = R\,P_{enu} + t. \quad (3)$$

Here, R is a rotation matrix, and t is the shift vector from the origin of F_enu to O (the origin of F_c). Matrix A includes the principal point C coordinates (c_u, c_v) in the frame F_im and the camera focal length f divided by the pixel width w and height h:

$$A = \begin{pmatrix} f_u & 0 & c_u \\ 0 & f_v & c_v \\ 0 & 0 & 1 \end{pmatrix}, \qquad f_u = f/w, \quad f_v = f/h. \quad (4)$$

The camera's optical axis and the image plane intersect at C; it is the image of the spatial point G (Figures 1 and 2). Usually, the coordinates (c_u, c_v) point to the image sensor matrix center (e.g., for full HD resolution 1920 × 1080, c_u = 960, c_v = 540). Some camera modes can produce cropped images shifted from the principal point; we do not consider this case here. The triangles (O, P_im, C) and (O, P, (z_c, 0, 0)) are similar in the plane ZX_c and in ZY_c (Figure 2), so

$$\frac{u - c_u}{f_u} = \frac{x_c}{z_c}, \qquad \frac{v - c_v}{f_v} = \frac{y_c}{z_c}. \quad (5)$$

It follows from (5) that pixel coordinates in F_im are connected with the coordinates of the spatial source in F_c by the equations

$$P^h_{im} = (u, v, 1)^T = A\,\left(\frac{x_c}{z_c},\; \frac{y_c}{z_c},\; 1\right)^T, \quad (6)$$

where P^h_im are the homogeneous coordinates of P_im. If f_u ≠ 0 and f_v ≠ 0, the pixel (u, v) defines the relations x_c/z_c and y_c/z_c (5). However, we need to know the value z_c(u, v) (the "depth map") to recover the original 3D coordinates x_c, y_c, and z_c from the pixel coordinates (u, v). Additional information about the scene is needed to build the depth map z_c(u, v) for some range of pixels (u, v). Perspective projection preserves straight lines.
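The projection chain (3)-(6) can be sketched in a few lines of code; the numbers below (focal lengths, principal point, rotation, shift) are illustrative, not taken from the article's cameras:

```python
import numpy as np

# Sketch of the pinhole projection (3), (5), (6); all values are illustrative.
# A point P_enu is moved into camera coordinates F_c and projected to F_im.

def project(P_enu, R, t, A):
    """Project a 3D point (ENU frame) to pixel coordinates (u, v)."""
    P_c = R @ P_enu + t                               # F_enu -> F_c, eq. (3)
    x_c, y_c, z_c = P_c
    uvw = A @ np.array([x_c / z_c, y_c / z_c, 1.0])   # eq. (6)
    return uvw[0], uvw[1]

# Camera axes aligned with ENU (identity rotation, no shift), f_u = f_v = 1200:
A = np.array([[1200.0, 0.0, 960.0],
              [0.0, 1200.0, 540.0],
              [0.0, 0.0, 1.0]])
u, v = project(np.array([1.0, 0.5, 10.0]), np.eye(3), np.zeros(3), A)
# u = 960 + 1200 * 0.1 = 1080, v = 540 + 1200 * 0.05 = 600
```
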
A wide-angle lens, used by most street public cameras, introduces perspective distortion described by the pinhole camera model and substantial radial distortion. Usually, the radial distortion is easily detectable, especially on "straight" lines near the edges of the image. An extended model takes it into account. A popular quadratic model of radial distortion ([6] (pp. 63-66), [7] (pp. 189-193), [5]) changes the pinhole camera model (6) as follows:

$$u = f_u \frac{x_c}{z_c}\left(1 + k_1 r^2 + k_2 r^4\right) + c_u, \quad (7)$$

$$v = f_v \frac{y_c}{z_c}\left(1 + k_1 r^2 + k_2 r^4\right) + c_v, \quad (8)$$

$$r^2 = \left(\frac{x_c}{z_c}\right)^2 + \left(\frac{y_c}{z_c}\right)^2, \quad (9)$$

where k_1 and k_2 are the radial distortion coefficients. In addition to the quadratic model, more complex polynomials and models of other types are used ([5], [6] (pp. 63-66, 691-692), [7] (pp. 189-193)).
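As a sketch, the quadratic distortion model (7)-(9) applied to normalized pinhole coordinates (all numeric values are illustrative):

```python
# Sketch of the quadratic radial distortion model (7)-(9): the normalized
# pinhole coordinates (x_n, y_n) = (x_c/z_c, y_c/z_c) are scaled by
# 1 + k1*r^2 + k2*r^4 before applying the intrinsic parameters.

def distort_pixel(x_n, y_n, k1, k2, fu, fv, cu, cv):
    r2 = x_n * x_n + y_n * y_n
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    u = fu * x_n * scale + cu
    v = fv * y_n * scale + cv
    return u, v

# With k1 = k2 = 0 the model reduces to the plain pinhole projection:
u0, v0 = distort_pixel(0.1, 0.05, 0.0, 0.0, 1200.0, 1200.0, 960.0, 540.0)
# Barrel distortion (k1 < 0) pulls the pixel towards the principal point:
u1, v1 = distort_pixel(0.1, 0.05, -0.24, 0.0, 1200.0, 1200.0, 960.0, 540.0)
```
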
Let the coefficients f_u, f_v, c_u, c_v, k_1, and k_2 be selected so that the formation of images by the camera is sufficiently well described by (7)-(9). To construct the artificial image in which the radial distortion k_1 and k_2 is eliminated, we must, for each pair of relations (x_c/z_c, y_c/z_c), obtain new positions of the pixels (û, v̂) according to the pinhole camera model:

$$\hat u = f_u \frac{x_c}{z_c} + c_u, \qquad \hat v = f_v \frac{y_c}{z_c} + c_v. \quad (10)$$

To do this, we need all the values (x_c/z_c, y_c/z_c) (coordinates in F_c) that form the original image. To obtain the relations from an image, we have the image pixel coordinates (u, v) and the equations that follow from (7)-(9) if v ≠ c_v. Similar equations are valid for v = c_v and u ≠ c_u. Equation (13) has five roots (complex, generally speaking), so we need an additional condition to select one root. We can select the real root nearest to (v − c_v)/f_v. The artificial image will not be a rectangle, but we can select a rectangular part of it (see the figures of Example 2). Note that the mapping (u, v) → (û, v̂) is determined by the camera parameters f_u, f_v, c_u, c_v, k_1, and k_2. It does not change from frame to frame and can be computed once and reused until the parameters change.
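The root-selection rule can be sketched for pixels on the axis u = c_u, where the distortion equations reduce to a quintic in y_c/z_c (the coefficients below are illustrative):

```python
import numpy as np

# Sketch of inverting the distortion on the axis u = c_u: for a distorted
# pixel with ordinate v, the normalized coordinate y_n = y_c/z_c solves the
# quintic k2*y^5 + k1*y^3 + y - (v - c_v)/f_v = 0 (cf. Equation (13)).
# Among its (generally complex) roots we take the real one nearest to the
# undistorted guess (v - c_v)/f_v.

def undistort_axis(v, cv, fv, k1, k2):
    target = (v - cv) / fv
    coeffs = [k2, 0.0, k1, 0.0, 1.0, -target]   # degree-5 polynomial
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real
    return real[np.argmin(np.abs(real - target))]

y_n = undistort_axis(800.0, 540.0, 1200.0, -0.24, 0.0)
# Check: mapping y_n back through the distortion reproduces (v - c_v)/f_v.
residual = y_n * (1 - 0.24 * y_n**2) - (800.0 - 540.0) / 1200.0
```
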

Public Camera Parameters Estimation
Model parameters (3), (7)-(9) define the transformation of a 3D point visible to the camera into pixel coordinates in the image. These parameters are: the rotation matrix of the camera orientation R, (2) and (3); the camera position coordinates (point O) in F_enu, (3); the intrinsic camera parameters f_u, f_v, c_u, and c_v, (1) and (4); and the radial distortion coefficients k_1 and k_2, (8) and (9).
The camera calibration process estimates the parameters using a set of 3D point coordinates {P^i_enu} and a set of the corresponding image coordinates {P^i_im}. A large set {P^i_enu} is called a point cloud, and there are software libraries that work with point clouds. Long-range 3D lidars or geodesic instruments measure 3D coordinate values for a set {P^i_enu}. A depth map from another camera can also help obtain a set {P^i_enu}. Stereo/multi-view cameras can build depth maps, but it is hard to obtain high accuracy at long distances. When the tools listed above are unavailable, it is possible to obtain global coordinates of points in the camera's field of view from GNSS sensors or from online maps. The latter variants are easier to access but less accurate. There are many tools [8] to translate global coordinates to an ENU frame.
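For modest areas, the global-to-ENU translation can be approximated with a spherical-Earth sketch (a stand-in for the exact tools of [8]; the coordinates below are made up):

```python
import math

# Rough geodetic -> ENU conversion for a small area, assuming a spherical
# Earth. Geodesy libraries (see [8]) implement the exact ellipsoidal version.

def latlon_to_enu(lat, lon, h, lat0, lon0, h0, R_earth=6371000.0):
    """Approximate ENU offsets (east, north, up) in meters from the origin."""
    east = math.radians(lon - lon0) * R_earth * math.cos(math.radians(lat0))
    north = math.radians(lat - lat0) * R_earth
    up = h - h0
    return east, north, up

# 0.0003 degrees of latitude is about 33 m with this Earth radius:
e, n, u = latlon_to_enu(43.1003, 131.9, 20.0, 43.1, 131.9, 15.0)
```
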
The camera calibration procedure is well studied ([7] (pp. 178-194), [9] (pp. 22-28), [10], [6] (pp. 685-692)). It looks for the parameter values that minimize the difference between the pixels {P^i_im} and the pixels generated by the model (3), (7)-(9) from {P^i_enu}. Computer vision software libraries include calibration functions for a suitable set of images from a camera [5]. The OpenCV function calibrateCamera [5,11] needs the points and pixels of a special pattern (e.g., a "chessboard") rotated before the camera. The function returns an estimation of the intrinsic camera parameters and the camera placements relative to the pattern positions. These relative camera placements are usually useless after outdoor camera installation. If the camera uses a zoom lens and the zoom is changed on the street, the camera focal length changes too, and a new calibration is needed. For a public camera installed on a wall or tower, it is hard to collect appropriate pattern images to apply a function such as calibrateCamera. Camera operators usually do not publish camera parameters, but this information is indispensable for many computer vision algorithms. We need a calibration procedure applicable to the available online data.

Site-Specific Calibration
If the unified calibration procedure data are unavailable, we can estimate the camera position and orientation parameters (R, O) separately from the others. The resolution N × M of images/video from a camera is available with the images/video. As noted earlier, usually c_u = N/2 and c_v = M/2. Many photo cameras (including phone cameras) add EXIF metadata to the image file, which often contains the focal length f. The image sensor model description can contain the pixel width w and height h, so (4) gives the f_u and f_v values. The distortion coefficients are known for high-quality photo lenses; moreover, raw image processing programs can remove the lens radial distortion. If lucky, we can obtain the intrinsic camera parameters and radial distortion coefficients and apply a pose computation algorithm (PnP, [12,13]) to the sets {P_enu}, {P_im} to obtain R, O estimates. When the EXIF metadata from the camera are of no help (this is typical for most public cameras), small sets {P_enu} and {P_im} and Equation (5) can help to obtain f_u and f_v estimations. Suppose we know the installation site of the camera (usually, the place is visible in the camera's field of view). In that case, we can estimate the GNSS/ENU coordinates of the place (point O coordinates) by the methods listed earlier.
The camera orientation (matrix R) can be detected with the point G coordinates (Figure 1) and the estimation of the camera horizon tilt. Horizontal or vertical 3D lines (which can be artificial) in the camera's field of view can help evaluate the tilt.

Formulation of the Problem and Its Solution Algorithm
Designers and users of transport models are interested in the flow density (number of vehicles per unit of lane length or direction at a time); the speed of vehicles (on a lane or direction); and the intensity of the flow (number of vehicles crossing the cross-section of a lane or direction). We capture some areas (ROI) of the frames of a fixed camera to determine these values. The camera generates a series of ROI images, usually at a fixed rate, such as 25 frames/second. The algorithms (object detection, instance segmentation, or contour detection, plus object tracking; see [14][15][16][17]) find a set of contours describing the trajectory of each vehicle crossing the ROI. The contour description consists of the coordinates of the vertices in F_im. We count the number of contours per meter in the ROI of each frame to estimate the flow density. We choose a vertex (e.g., "bottom left") of the contour in the trajectory and count the meters that the vertex has passed along the trajectory to estimate the car speed. Both cases require an estimation in meters of a distance given in pixels, so we need to convert lengths in the image (in F_im) to lengths in space (in F_enu or F_c). In some examples, distances in pixels are related to metric distances almost linearly (where radial and perspective distortions are negligible). We consider the public cameras that produce images with significant radial and perspective distortions in the ROI (the more common case).

Problem 1. Assume the following:

1. Let Q be an area that is a plane section of the road surface in space (the road plane can be inclined);
2. Φ is the camera frame of N × M resolution containing the image of Q; denote this image by Q_im;
3. The camera forms the pixel coordinates of the image Φ according to the model (3), (7)-(9) with unknown parameters f_u, f_v, k_1, k_2, R;
4. Φ contains the image of at least one segment whose prototype in space is a segment of a straight line with a known slope (e.g., vertical);
5. The image is centered relative to the optical axis of the video camera, that is, c_u = N/2 and c_v = M/2;
6. The F_enu coordinates of the points O (camera position) and G (the source of the optical center C) are known;
7. The F_im coordinates of one or more pixels {P^u_im ≠ C} located on the line v = c_v and the F_enu coordinates of their sources {P^u_enu} are known;
8. The F_im coordinates of one or more pixels {P^v_im ≠ C} located on the line u = c_u and the F_enu coordinates of their sources {P^v_enu} are known;
9. The F_enu coordinates of three or more points {P^Q_enu} ∈ Q are known; at least three of them must be non-collinear;
10. The F_im coordinates of one or more groups of three pixels {(P^a_im, P^b_im, P^c_im)} are known, and in each group, the sources of the pixels are collinear in space.
Find the parameters of the camera f_u, f_v, R, k_1, and k_2 and construct the mapping Λ^{-1}.

Online maps allow remote estimation of global coordinates of points and horizontal distances. Many such maps do not show the altitude, and most do not show the heights of buildings, bridges, or vegetation. Online photographs, street-view images, and horizontal distances can help estimate such objects' heights. Camera locations are often visible in street photos. This variant of measurement suggests that the coordinate estimates may contain a significant error. Because of these errors, some algorithms (e.g., PnP RANSAC) can generate a response with an unacceptable error.
To find {P u im },{P u enu },{P v im }, and {P v enu } coordinates, the points must be visible both in the image and on the online map.

Solution Algorithm
We want to eliminate the radial distortion of the area Q_im to pass to the pinhole camera model. From (10)-(13), it follows that for this we need the values f_u and f_v.

Obtain the Intrinsic Parameters (Matrix A)
Note that from v_u = c_v and (7)-(9) it follows that y_c/z_c = 0 for the point P^u_im = (u_u, v_u) (because 1 + k_1 r^2 + k_2 r^4 = 0 would lead to u_u = c_u and P^u_im = C). So, the point P^u_im stays on the central horizontal line of the image for any values of k_1 and k_2. By analogy, for P^v_im = (u_v, v_v) we have x_c/z_c = 0, and the point stays on the central vertical line for any values of k_1 and k_2.
Evaluate the angles between the optical axis OG and the vectors OP^u_enu and OP^v_enu (Figures 1 and 2):

$$\alpha_u = \angle\left(OG,\; OP^u_{enu}\right), \qquad \alpha_v = \angle\left(OG,\; OP^v_{enu}\right). \quad (15)$$

It follows from (5) that if the effect of radial distortion on the values (u_u, v_u) and (u_v, v_v) can be ignored (the camera radial distortion is moderate, and the points are not too far from C), then

$$f_u = \frac{u_u - c_u}{\tan \alpha_u}, \qquad f_v = \frac{v_v - c_v}{\tan \alpha_v}. \quad (16)$$

Equation (16) gives the initial approximation of the coefficients f_u and f_v and an evaluation of the matrix A.
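A numeric sketch of the estimate (16), with made-up ENU vectors given relative to the camera position O:

```python
import math

# Sketch of the focal-length estimate (16): for a point P^u with image
# ordinate v = c_v, the angle alpha_u between OG and OP^u_enu satisfies
# tan(alpha_u) = (u_u - c_u) / f_u, so f_u = (u_u - c_u) / tan(alpha_u).

def focal_from_angle(pix, c, axis_vec, point_vec):
    """pix, c: image coordinate and principal-point coordinate on one axis;
    axis_vec, point_vec: OG and OP in F_enu (relative to O, illustrative)."""
    cosang = (sum(a * b for a, b in zip(axis_vec, point_vec))
              / (math.hypot(*axis_vec) * math.hypot(*point_vec)))
    alpha = math.acos(cosang)
    return (pix - c) / math.tan(alpha)

# Optical axis 100 m north; P^u displaced 10 m east; its pixel at u_u = 1080:
f_u = focal_from_angle(1080.0, 960.0, (0.0, 100.0, 0.0), (10.0, 100.0, 0.0))
# f_u is about 1200 for this synthetic configuration
```
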

Obtain the Radial Distortion Compensation Map
Let Φ̂ be the image obtained from Φ by the transformation (10) and the solution of Equations (11)-(13). Denote by Θ the mapping of Φ to Φ̂:

$$\Theta: \Phi \to \hat\Phi. \quad (17)$$

We create the image Φ̂ according to the pinhole camera model. The model transforms a straight line in space into a straight line in the image. Let (P̂^a_im, P̂^b_im, P̂^c_im) = (Θ(P^a_im), Θ(P^b_im), Θ(P^c_im)). We can find the k_1 and k_2 values that minimize the sum of distances from the pixels P̂^b_im to the lines passing through P̂^a_im and P̂^c_im. We calculate the distance as

$$d = \frac{\left|\left(\hat P^c_{im} - \hat P^a_{im}\right) \times \left(\hat P^b_{im} - \hat P^a_{im}\right)\right|}{\left|\hat P^c_{im} - \hat P^a_{im}\right|} \quad (19)$$

for each triplet (P^a_im, P^b_im, P^c_im). This approach is a variant of the one described, for example, in [7] (pp. 189-194). OpenCV offers a realization of Θ^{-1} (initUndistortRectifyMap). It is fast, but we need to invert it for our case. We solve Equations (11)-(13) (including the version for v = c_v, u ≠ c_u) to obtain Θ. The Θ^{-1} is polynomial in û, v̂ (see (7)-(10)).
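The straightness cost over triplets can be sketched as follows; minimizing it over k_1 and k_2 (e.g., by a grid search) is left out for brevity, and the triplet coordinates are made up:

```python
import math

# Distance (19) from the middle pixel b to the straight line through the
# end pixels a and c of a triple; summed over triples, this is the cost
# minimized over the distortion coefficients k1, k2.

def collinearity_cost(triples):
    total = 0.0
    for (ax, ay), (bx, by), (cx, cy) in triples:
        cross = (cx - ax) * (by - ay) - (cy - ay) * (bx - ax)
        total += abs(cross) / math.hypot(cx - ax, cy - ay)
    return total

# A perfectly straight triple contributes 0; for a horizontal segment,
# a middle pixel one pixel off the line contributes exactly 1.
cost = collinearity_cost([((0, 0), (5, 1), (10, 0)),
                          ((0, 0), (5, 0), (10, 0))])
# cost == 1.0
```
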

Obtain the Camera Orientation (Matrix R)
To determine the camera orientation, we use the points O and G given in the coordinates F_enu (Figure 1). The unit vector e_zc = OG/|OG| = (e^x_zc, e^y_zc, e^z_zc)_enu gives the direction of the axis Z_c of the frame F_c. e_zc and O determine the plane of points x for which the vector Ox is perpendicular to e_zc. In this plane lie the axes X_c and Y_c of the coordinate frame F_c. The unit vector e_d = (0, 0, −1)_enu points downward. Let F_v be camera coordinates with the same optical axis Z_c as F_c, but with zero horizon tilt. The axis X_v of the frame is perpendicular to e_d, so X_v is parallel to the horizontal plane. We can find the axis directions Y_v and X_v of the frame F_v:

$$\vec n_{yv} = \vec e_d - \left(\vec e_d \cdot \vec e_{zc}\right)\vec e_{zc} = \vec e_d + e^z_{zc}\,\vec e_{zc}, \qquad \vec e_{yv} = \vec n_{yv}/\left|\vec n_{yv}\right|, \qquad \vec e_{xv} = \vec e_{yv} \times \vec e_{zc}. \quad (21)$$
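The construction (21) can be sketched numerically (the viewing direction below is illustrative):

```python
import numpy as np

# Sketch of (21): build the zero-tilt camera basis (e_xv, e_yv, e_zc) from
# the optical-axis direction e_zc and the "down" vector e_d = (0, 0, -1).

def zero_tilt_basis(e_zc):
    e_zc = np.asarray(e_zc, dtype=float)
    e_zc = e_zc / np.linalg.norm(e_zc)
    e_d = np.array([0.0, 0.0, -1.0])
    n_yv = e_d - (e_d @ e_zc) * e_zc     # component of "down" normal to e_zc
    e_yv = n_yv / np.linalg.norm(n_yv)
    e_xv = np.cross(e_yv, e_zc)
    return e_xv, e_yv, e_zc

# Camera looking north and 45 degrees down (illustrative):
ex, ey, ez = zero_tilt_basis([0.0, 1.0, -1.0])
# The basis is orthonormal, and e_xv is horizontal (zero ENU z-component).
```
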
Vectors (21) form an orthonormal basis, which allows us to construct a rotation matrix R_v2e for the transition from F_v to F_enu:

$$R_{v2e} = \begin{pmatrix} \vec e_{xv} & \vec e_{yv} & \vec e_{zc} \end{pmatrix}. \quad (22)$$

Since R_v2e is a rotation matrix, the following equations hold:

$$R_{v2e}^{-1} = R_{v2e}^T, \qquad \det R_{v2e} = 1. \quad (23)$$

From Clause 4 of the problem statement, there is a line segment in the artificial image Φ̂ with a known slope in space. It is possible to compare the segment slope in the image Φ̂ and the slope of its source in space. So, we can estimate the camera horizon tilt angle (denote it γ) and rotate the plane with the axes X_v and Y_v around the optical axis Z_c by this angle. The resulting coordinate system F_c corresponds to the actual camera orientation. To pass from the camera coordinates F_c to F_enu, we can first go from F_c to F_v by a rotation by the angle γ around the optical axis Z_c using the rotation matrix

$$R_\gamma = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (24)$$

We can describe the transition from F_c to F_enu (without origin displacement) as a combination of rotations by the matrix

$$R_{c2e} = R_{v2e} R_\gamma, \quad (25)$$

which is also a rotation matrix. The matrix that we have already designated R (2) gives the inverse transition from F_enu to the coordinates of the camera frame F_c:

$$R = R_{c2e}^{-1} = R_{c2e}^T, \quad (26)$$

and the shift t in the coordinates F_c characterizes the transition from the origin of F_enu to the camera installation point O. The t is often more easily expressed through the coordinates O_enu given in F_enu, see (3). To convert the coordinates of a point P_enu from F_enu to the coordinates P_c in the camera frame F_c, use the following expression:

$$P_c = R\left(P_{enu} - O_{enu}\right). \quad (27)$$

Typically, an operator aims to set a camera with zero horizon tilt.

Obtain the Mapping Λ^{-1} for Q_im
Let Q̂_im = Θ(Q_im) be the area in Φ̂ corresponding to Q_im in the image Φ. Q_enu and Q_c denote the domain Q on the road plane in the coordinates F_enu and F_c, respectively. From (27), Q_c = R(Q_enu − O_enu). From Clause 2 of the problem statement, it follows that an image of Q_c is visible in Φ (and in Φ̂, so Q̂_im ⊂ Φ̂).
We can convert the ENU coordinates of the points {P^Q_enu} to F_c following (27). We denote the result as {P^Q_c} = {(x^i_c, y^i_c, z^i_c)}. We approximate their plane by the least squares method. The plane is defined in F_c by the equation

$$z_c = p_1 x_c + p_2 y_c + p_3. \quad (29)$$

The matrix D and the vector E represent the points on the road:

$$D = \begin{pmatrix} x^1_c & y^1_c & 1 \\ \vdots & \vdots & \vdots \\ x^n_c & y^n_c & 1 \end{pmatrix}, \qquad E = \begin{pmatrix} z^1_c \\ \vdots \\ z^n_c \end{pmatrix}. \quad (30)$$

The plane parameters p = (p_1, p_2, p_3)^T can be found by solving the least squares problem

$$\min_p \left\|D p - E\right\|^2; \quad (31)$$

the exact solution of the least squares task is

$$p = \left(D^T D\right)^{-1} D^T E. \quad (32)$$

We denote by Π the plane defined by (29) and (32). Note that Q_c ⊂ Π.
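The plane fit (29)-(32) is a linear least squares problem; a sketch with synthetic road points:

```python
import numpy as np

# Sketch of the plane fit (29)-(32): z_c = p1*x_c + p2*y_c + p3 fitted to
# road points {P^Q_c} by least squares; lstsq solves min ||D p - E||^2.

def fit_plane(points_c):
    P = np.asarray(points_c, dtype=float)
    D = np.column_stack([P[:, 0], P[:, 1], np.ones(len(P))])  # matrix (30)
    E = P[:, 2]
    p, *_ = np.linalg.lstsq(D, E, rcond=None)                 # solution (32)
    return p  # (p1, p2, p3)

# Points sampled from the plane z = 0.1*x - 0.05*y + 30 are recovered exactly:
pts = [(x, y, 0.1 * x - 0.05 * y + 30.0)
       for x in (-5.0, 0.0, 7.0) for y in (2.0, 9.0)]
p1, p2, p3 = fit_plane(pts)
```
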
For a point (x_c, y_c, z_c) ∈ Π represented by the pixel coordinates (û, v̂) in Φ̂, taking into account (6), we obtain x_c = a(û) z_c and y_c = b(v̂) z_c. If û = c_u and v̂ = c_v (it means that x_c = 0 and y_c = 0), then z_c = p_3. Otherwise, there are linear equations that allow us to express x_c and y_c through û and v̂. Let

$$a(\hat u) = \frac{\hat u - c_u}{f_u}, \qquad b(\hat v) = \frac{\hat v - c_v}{f_v}. \quad (35)$$

From (34), we obtain the solution

$$z_c = \frac{p_3}{1 - p_1 a(\hat u) - p_2 b(\hat v)}, \quad (36)$$

where a(û) and b(v̂) are defined in (35), and

$$x_c = a(\hat u)\,z_c, \qquad y_c = b(\hat v)\,z_c. \quad (37)$$

We have constructed a function that assigns to each pixel coordinate pair g = (û, v̂) (in the image Φ̂) a spatial point in the coordinate system F_c, according to (36) and (37):

$$\xi(g) = (x_c, y_c, z_c) \in \Pi. \quad (39)$$

Next, we will refer mostly to (39). F_enu helps present measurement results, but F_c is sufficient for calculating metric lengths. There are other ways to map pixels to meters (see, e.g., [7] (pp. 47-55)), but their applicability depends on the data available.
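A sketch of the pixel-to-plane mapping ξ, assuming the plane form z_c = p_1 x_c + p_2 y_c + p_3; all camera numbers are illustrative:

```python
# Sketch of the mapping: for a pixel (u_hat, v_hat) of the undistorted image
# and a road plane z_c = p1*x_c + p2*y_c + p3, the depth follows from
# z_c = p3 / (1 - p1*a - p2*b) with a = (u_hat - cu)/fu, b = (v_hat - cv)/fv,
# and then x_c = a*z_c, y_c = b*z_c.

def pixel_to_plane(u_hat, v_hat, p, fu, fv, cu, cv):
    p1, p2, p3 = p
    a = (u_hat - cu) / fu
    b = (v_hat - cv) / fv
    z_c = p3 / (1.0 - p1 * a - p2 * b)
    return a * z_c, b * z_c, z_c

# Round trip: project a point of the plane z = 0.1*x + 30 and map it back.
x, y = 3.0, -2.0
z = 0.1 * x + 30.0
u_hat, v_hat = 1200.0 * x / z + 960.0, 1200.0 * y / z + 540.0
xc, yc, zc = pixel_to_plane(u_hat, v_hat, (0.1, 0.0, 30.0),
                            1200.0, 1200.0, 960.0, 540.0)
```
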

Auxiliary Steps

Describing the Area Q

Different Q shapes suit different tasks. We fix Q̂_im as a quadrilateral in the image Φ̂. The Q_c (the source of the quadrilateral in space) in the plane Π may be a rectangle or a general tetragon. We choose the four corners g_1, g_2, g_3, and g_4 of the domain Q̂_im as pixels in Φ̂. The "tetragon" Q_im = Θ^{-1}(Q̂_im) is the ROI in the original image Φ, in which we use object detection to estimate transport flow statistics. The lines bounding Q̂_im are ([7] (p. 28)):

$$\ell_{ij} = g^h_i \times g^h_j, \quad (41)$$

where g^h_i = (û_i, v̂_i, 1) are the homogeneous coordinates of the pixel g_i = (û_i, v̂_i). A pixel g^h ∈ Q̂_im must be the solution of a system of four inequalities:

$$\ell_{ij} \cdot g^h \ge 0. \quad (42)$$

The Q̂_im is a continuous domain in R², but the whole image Φ contains only a fixed number of actual pixels (N × M). The same is true for Q̂_im = Θ(Q_im). We can obtain the real pixels that fall in Q_im from Q̂_im with the per-pixel correspondence Q_im = Θ^{-1}(Q̂_im). We apply the mapping (39) and obtain the set of points in F_c that correspond to the actual pixels in Q_im. F_c is a metric coordinate system, so distances in F_c can be used to estimate meters/second or objects/meter. The same is true for any rotation or shift of the frame F_c.
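The bounding-lines test can be sketched with homogeneous coordinates (the corner pixels below are illustrative):

```python
import numpy as np

# Sketch of (41)-(42): the line through pixels g_i, g_j is the cross product
# of their homogeneous coordinates; a pixel lies inside the quadrilateral
# when the four signed values (line . g_h) share a consistent sign.

def bounding_lines(corners):
    h = [np.array([u, v, 1.0]) for u, v in corners]   # homogeneous pixels
    return [np.cross(h[i], h[(i + 1) % 4]) for i in range(4)]

def inside(lines, u, v):
    g_h = np.array([u, v, 1.0])
    signs = [line @ g_h for line in lines]
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)

# A 10 x 10 pixel square with corners listed counter-clockwise:
lines = bounding_lines([(0, 0), (10, 0), (10, 10), (0, 10)])
ok_in = inside(lines, 5.0, 5.0)      # True
ok_out = inside(lines, 15.0, 5.0)    # False
```
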

Coordinate Frame Associated with the Plane Π
We can go from F_c to a coordinate system associated with the plane Π. We apply it for illustrations, but it can be helpful for other purposes. We can regard the traffic in Q_im as anything that rises above the plane Π. We denote by F_π the coordinate system connected to Π and use the normal to the plane Π in the coordinates F_c from (32) as the axis Z_π.
We choose in the Π plane the direction of another axis (e.g., Y_π), and the third axis is determined automatically. Let the Y_π axis point along the vector w = ξ(g_4) − ξ(g_2) (that is, along the direction of traffic movement in the area Q); then

$$\vec e_{z\pi} = \frac{(-p_1, -p_2, 1)}{\left|(-p_1, -p_2, 1)\right|}, \qquad \vec n_{y\pi} = \vec w - \left(\vec w \cdot \vec e_{z\pi}\right)\vec e_{z\pi}, \qquad \vec e_{y\pi} = \vec n_{y\pi}/\left|\vec n_{y\pi}\right|, \qquad \vec e_{x\pi} = \vec e_{y\pi} \times \vec e_{z\pi}. \quad (44)$$

Select the origin of the coordinate system F_π as a fixed point of the plane (e.g., O_π = ξ(g_2)). The rotation matrices from F_π to F_c and back look like the following:

$$R_{\pi 2c} = \begin{pmatrix} \vec e_{x\pi} & \vec e_{y\pi} & \vec e_{z\pi} \end{pmatrix}, \qquad R_{c2\pi} = R_{\pi 2c}^T. \quad (46)$$

Use the following expression for the translation of the coordinates of a point P_c given in frame F_c to P_π given in frame F_π:

$$P_\pi = R_{c2\pi}\left(P_c - O_\pi\right). \quad (47)$$
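A sketch of building F_π and moving a point into it; the normal, in-plane direction, and origin below are illustrative stand-ins for the quantities of (44)-(47):

```python
import numpy as np

# Sketch of the road-plane frame F_pi: axis Z_pi is the plane normal, axis
# Y_pi is a traffic direction projected into the plane, X_pi completes the
# right-handed basis; a point is then re-expressed in F_pi.

def plane_frame(normal, along, origin):
    z = normal / np.linalg.norm(normal)
    y = along - (along @ z) * z          # project the direction into the plane
    y = y / np.linalg.norm(y)
    x = np.cross(y, z)
    R_p2c = np.column_stack([x, y, z])   # columns are the F_pi axes in F_c
    return R_p2c.T, origin               # R_c2p = R_p2c^T (rotation matrix)

def to_plane_frame(P_c, R_c2p, origin):
    return R_c2p @ (np.asarray(P_c, dtype=float) - origin)

# Horizontal plane at height 5, traffic direction along (1, 1, 0):
R_c2p, o = plane_frame(np.array([0.0, 0.0, 1.0]),
                       np.array([1.0, 1.0, 0.0]),
                       np.array([0.0, 0.0, 5.0]))
P_pi = to_plane_frame([1.0, 1.0, 5.0], R_c2p, o)
# The point lies in the plane (z_pi = 0) at distance sqrt(2) along Y_pi.
```
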

Examples
Consider a couple of public camera parameter evaluations, where the images and small sets of coordinates {P_enu} and {P_im} of limited accuracy are available. The camera is a good example, as its video contains noticeable perspective and radial distortion. In the image, there is a line demonstrating the camera's slight horizon tilt (the wall at the base of which is P_v). The visible part of the road has a significant inclination. This is a FullHD camera (N = 1920, M = 1080). Select a point P_0 as the origin of F_enu. Assess the ENU coordinates of the point O (the camera position), the point G (the source of the principal point C, Figures 1 and 3), and the points P_u and P_v on the lines v = c_v and u = c_u, respectively. Using maps and online photos, we obtained the values listed in Table 1. Convert the global coordinates to F_enu coordinates (see [8]) and add them to the table (in meters). The points from Table 1 can be found on the satellite layer of [18] using the latitude and longitude in the query.

Example 1: Inclined Driving Surface and a Camera with Tilted Horizon
We obtain f_u = 1166.2 and f_v = 1241.55 by using (15), (16) and Table 1 data. Street camera image sensors usually have square pixels, so w = h and f_u = f_v (4). Since the difference between f_u and f_v is small, let f_u = f_v = 1203.89 (the mean value). So we have an approximation of the matrix A. Now we can estimate the radial distortion coefficients k_1 and k_2 by minimizing the distances (19) or in another way. We put k_1 = −0.24 and k_2 = 0, compute the mapping Θ (17), and apply it to eliminate the radial distortion from the original image Φ. The mapping does not change between frames of the camera video. We obtained the undistorted version of the image Φ, which we denote Φ̂ (Figure 4). The radial distortion of the straight lines in the vicinity of the road has almost disappeared in Φ̂. The camera's field of view decreased, the point C remained in place, and the points P_u and P_v moved further along the lines v = c_v and u = c_u. The values of f_u and f_v could be recalculated, but radial distortion is not the only cause of errors. Therefore, we will perform additional cross-validation and adjust the values of f_u and f_v if required.
We can estimate the horizon tilt angle from the image (Figure 4) and rotate the plane with the axes X_v and Y_v of the frame F_v around the optical axis Z_c by this angle. Rotating the image around the optical axis of the camera by 4.1° made the required line vertical (Figure 5), so let γ = 4.1° in (24). We compute the camera orientation matrix R with (20)-(26). Now we can convert F_enu coordinates to F_c with (27).
Let us use the carriageway region nearest to the camera as the ROI (area Q). We select several points in Q, estimate their global coordinates, and convert them to F_enu [8]. Next, we convert the F_enu coordinates to F_c with (27). The results are in Table 2 (the spatial points used for the road plane approximation, given in global and F_c coordinates; the F_c coordinates are in meters).

We choose the four corners g_1, g_2, g_3, and g_4 of the domain Q̂_im in Φ̂ (see Table 3 and Figure 6). We calculate the lines that bound the domain Q̂_im by (41) and detect the set of pixel coordinates that belong to Q̂_im. Note that this set does not change between frames of the camera video. We can save it for later use with this camera and this Q.
We compute the Q_c as ξ(Q̂_im) by (48) and (35)-(39). We convert the pixel coordinates from Q_c to F_π by (43)-(47) and save the result. The coordinate set Q_π does not change for the (fixed) camera and the Q. The obtained discrete sets Q_c and Q_π are sets of metric coordinates. We can use Q_π for measurements in the plane Π as is or apply an interpolation. Since Q_π ⊂ Π, the point set is suitable for output as a plane picture. If for each point we use ξ(g) ∈ Q_π and the color of the pixel g for all g ∈ Q̂_im ⊂ Φ̂, and output the plane as a scatter map, we obtain the bottom image of Figure 6. We chose a multi-car scene in the center of Q for the demonstration. We obtain an accurate "view from above" for the points that initially lie on the road's surface. The positions of the pixels representing objects that tower above the plane Π have shifted. There are options for estimating and accounting for the height of cars and other objects so that their images look realistic in the bottom image. However, this is the subject of a separate article.
It is worth paying attention to the axes of the bottom figure (they show meters). In comparison with the original (top image of Figure 6), there are noticeable changes in the perception of distance due to perspective distortion. The width and length of the area are in good agreement with the measurements made by a rangefinder and estimates from online maps. The horizontal lines in the image are almost parallel, which indicates the good quality of f_u and f_v. In this way, we cross-checked the camera parameters obtained earlier.
Obviously, all vehicles are in contact with the road surface.

Figure 6. The Q_π set visualization in the plane Π, by the colors of the pixels from Q̂_im. Given the geometry of the sample, y_π is the horizontal axis and x_π the vertical one.

We do not need to remove distortions (radial and perspective) from all video frames to estimate traffic statistics. The object detection or instance segmentation works with the original video frames in Q_im. We use the distortion compensation to obtain distances in meters for selected pixels. We calculate the needed maps once for the camera parameters and Q and use them as needed for measurements. Let us demonstrate this on a specific trajectory.
The detected trajectory of the vehicle consists of 252 contours, 180 of which are in Q_im. Two hundred contours in one image look messy, so we draw every 20th (Figure 7). We deliberately did not choose rectangles to demonstrate a more general case. Vertex coordinates describe each contour. It is enough to select one point on or inside each contour to evaluate the speed or acceleration of an object. We call the vertex (u_j, v_j) the left bottom corner of a contour, considering the axes direction of F_im. Another option is to search for the point nearest to (u_min, v_max) on the edges of the contour with the help of (19).
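Choosing the tracked "bottom left" vertex can be sketched as follows (the contour coordinates are made up):

```python
# Sketch of selecting the tracked point of a contour: in F_im the v axis
# points down, so the "bottom left" corner is the vertex nearest to
# (u_min, v_max) over the contour's vertices.

def bottom_left_vertex(contour):
    u_min = min(u for u, _ in contour)
    v_max = max(v for _, v in contour)
    return min(contour,
               key=lambda p: (p[0] - u_min) ** 2 + (p[1] - v_max) ** 2)

vertex = bottom_left_vertex([(100, 200), (160, 195), (165, 240), (103, 247)])
# -> (103, 247)
```

Tracking this single vertex across the trajectory, then mapping it through ξ, gives the meters traveled per frame interval and hence the speed estimate.
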
We map all the contour vertices to Q_π to obtain Figure 8. For measurements, however, we need to map only these "corners". Figure 9 illustrates a frame image from another public camera of the same operator. This camera has zero horizon tilt and more substantial radial distortion. We calibrated the camera and calculated the maps in a season of rich vegetation. Tree foliage complicates the selection and estimation of the coordinates of points. A two-way road is visible to the camera. This is a FullHD camera (N = 1920, M = 1080, c_u = N/2, c_v = M/2). The zero horizon tilt is visible on the line u = c_u near P^v_5 (see Table 4). We select the point P_0 as the origin of F_enu (see Table 4). We assess the global coordinates of the point O (the camera position), the point G (the source of the principal point C, Figures 1 and 9), and the points P^u_1, P^v_2, P^u_3, P^v_4, and P^v_5 on the lines v = c_v (index u) or u = c_u (index v). Next, we convert the global coordinates to F_enu and append them to Table 4.

Example 2. More Radial Distortion and Vegetation, More Calibration Points
We obtained the estimations f^i_u and f^i_v for the points of Table 4 using Formulas (15) and (16) (e.g., f^5_v = 1349.67). There are few points, but one can assume a pattern f_v > f_u. Perhaps the camera's sensor pixel has a rectangular shape (see (4)). Consider two hypotheses: (50), with distinct f_u < f_v, and (51), with equal values; the second variant means square pixels. Let us try variant (50). With f_u and f_v, we obtained the intrinsic parameter matrix A (1). We estimate the radial distortion coefficients k_1 and k_2, put k_1 = −0.17 and k_2 = 0.01 to compute the mapping Θ (17), and apply it to eliminate the radial distortion from Φ. We obtained the undistorted image Φ̂ (see Figure 10). We showed a cropped and interpolated version of Φ̂ in Example 1 (Figure 4); let us now demonstrate the result of the mapping Θ without postprocessing. The resolution of Φ̂ is 2935 × 1651, but it contains only 1920 × 1080 pixels colored by the camera. So the black grid is the unfilled pixels of the large rectangle. Let us return to an illustration that is more pleasing to the eye by cutting off part of Φ̂ and filling the void with interpolation (see Figure 11). We noted that in the camera's field of view there is a section of a building wall (near P^v_5), allowing us to put γ = 0 (see Figure 9). We are ready to compute the orientation matrix R (20)-(26), so we can convert F_enu coordinates to F_c (27). We select the points to approximate the road surface (see Table 5).
We choose the four corners g_1, g_2, g_3, and g_4 of Q̂_im in the image Φ̂ (see Table 6 and Figure 12). Let us try a nonrectangular area Q̂_im. We calculate the lines that bound the domain Q̂_im with (41) and detect the set of pixel coordinates that belong to Q̂_im. We compute the Q_c as ξ(Q̂_im) with (48) and (35)-(39). Next, we convert the pixel coordinates from Q_c to F_π with (43)-(47) and (49) and save the result. The distances in Q correspond to estimates obtained online and with a simple rangefinder (accuracy up to a meter). We plan more accurate assessments using geodetic tools.
We repeat the needed steps for hypothesis (51) and compare the results (see Figures 12 and 13). We observed a change in the geometry of Q_π for hypothesis (51). For example, the pedestrian crossing changed its inclination, and the road began to widen to the right. In this example, it is not easy to obtain several long parallel lines due to vegetation, which is an argument for examining cameras in a suitable season.

Conclusions
This work extends the road traffic data collection technology described in [4] to allow accurate car density and speed measurements in physical units, with compensation for perspective and radial distortion.
Although we processed video frames in this article, all the mappings obtained for measurements (Θ, ξ, [R|t], R_c2π) transform only coordinates. We used the pixel content of the images only for demonstrations. In the context of measurements, the object detection (or instance segmentation) and object tracking algorithms work with pixel colors; the mappings work with the pixel coordinates of the results of these algorithms. We can calculate the discrete maps once and use them until the camera parameters or the Q area change. In this sense, the computational complexity of the mapping generation is not particularly important. We plan to refine the estimates of the camera parameters as new data on the actual geometry of the ROI area become available, which will increase the accuracy of the maps. Evaluating or refining the camera parameters in a suitable season might be better.