Abstract
Public live-streaming web cameras are now quite common and are widely used by drivers for qualitative assessment of traffic conditions. At the same time, they can be a valuable source of quantitative information on transport flows and speeds for the development of urban traffic models. However, to obtain reliable data from raw video streams, it is necessary to preprocess them, taking into account the camera location and parameters without direct access to the camera. Here we suggest a procedure for estimating camera parameters, which allows us to determine pixel coordinates for a point cloud in the camera’s field of view and transform them into metric data. These metric data are then combined with advanced moving object detection and tracking to perform the measurements.
Keywords:
depth map; radial distortion; perspective distortion; camera calibration; transport traffic statistics
MSC:
37M10; 65D18; 68U10; 90B20
1. Introduction
There are many ways to measure traffic [1,2,3]. The most common in the practice of road services are radars combined with video cameras and other sensors. The measuring units are installed above the road or at its edge. Inductive sensors are the least dependent on weather conditions and lighting. All of the listed surveillance tools assume installation by or on the road. Providers of mobile navigator applications receive data about a car’s movement from the sensors of mobile devices. Autonomous cars collect environmental information using various sensors, including multi-view cameras, lidars, and radars. The data collected by road services, navigator operators, and autonomous cars are usually not available to third-party researchers. Notably, pioneering traffic analysis work describes processing video data recorded on film [3] (pp. 3–10).
Operators worldwide install public web cameras, many of which overlook city highways; for example, more than a hundred such cameras are available in Vladivostok.
Transport model verification requires up-to-date and accurate data on transport traffic, covering a wide range of time intervals with substantial transport activity. Public live-streaming cameras can be a good and easily accessible source of data for this kind of research. Of course, this accessibility is relative: video processing involves storing and handling large amounts of data.
Ref. [4] demonstrates the collection of road traffic statistics from public camera video in a case where the camera has little perspective and radial distortion in the region of interest (ROI) on the road. However, for the majority of public cameras, these distortions change the image significantly. The current article generalizes the approach to the case where a camera has essential radial and perspective distortions.
Street camera owners usually do not publish camera parameters (focal length, radial distortion coefficients, camera position, orientation). Standard calibration procedures based on rotating a pattern in front of the camera can be useless for cameras mounted on a wall or tower. For this case, we suggest a practical camera calibration procedure that uses only online data (global coordinates of some visible points, photos, and street view images).
With known camera parameters, we construct the mapping between the ROI pixel coordinates and the metric coordinates of points on the driving surface, which gives a way to estimate traffic flow parameters with improved accuracy in standard units, such as cars per meter for traffic density and meters per second for vehicle velocity.
2. Coordinate Systems, Models
We select an ENU coordinate frame (East, North, Up) with its origin at a fixed object and localize a point P by its three metric coordinates in this frame.
A camera forms an image in the sensor (Figure 1 and Figure 2). A digital image is a pixel matrix (N columns, M rows). A pixel position is described in the image coordinate frame of the image plane by coordinates u and v, which are real numbers. Axis U corresponds to the image matrix rows (towards the right), and axis V corresponds to the columns (from top to bottom). The camera orientation determines the orientation of the image sensor plane. Integer indexes define the pixel position in the image matrix; we can obtain them by rounding u and v to the nearest integers. Let w and h be the pixel width and height, respectively; the image width and height on the sensor are then Nw and Mh.
Figure 1.
Coordinate frames used in the model. The pixel p is the image of the 3D point P. One camera axis follows the camera optical axis. The camera forms a real image on the image sensor plane behind the aperture O; however, the equivalent virtual image plane in front of the aperture is preferable in illustrations.
Figure 2.
Virtual and real images for a camera with aperture O. The scene is projected onto the image plane. The coordinates of the point P in the camera frame and the coordinates of points on the axis U of the image frame are indicated.
The camera position and orientation define the camera coordinate frame (Figure 1). The camera perspective projection center O is the origin of this frame, and its coordinates are given in the ENU frame. The first camera axis follows the camera optical axis, the second is parallel to axis U, and the third is parallel to axis V. The axis parallel to U defines the camera horizon; its deviation from the horizontal plane defines the camera horizon tilt. Unlike the image frame, the ENU and camera frames are metric. Image coordinates approximate matrix indexes. To obtain meters from u along axis U, we multiply by the pixel width w, and to obtain meters from v along axis V, we multiply by the pixel height h.
Suppose the camera builds an image close to a perspective projection of a 3D scene. We describe this camera transformation with the pinhole camera model (Figure 1, Figure 2 and ref. [5]). The image p of a point is obtained by transforming the coordinates of the original 3D point P given in the ENU frame. The matrices A and [R|t] define this transformation:
The triangular matrix A contains the intrinsic parameters of the camera. The camera position and orientation determine the matrix [R|t], which defines the transformation from the ENU frame to the camera coordinates:
Here, R is a rotation matrix, and t is the shift vector from the origin of the ENU frame to O (the origin of the camera frame). Matrix A includes the coordinates of the principal point C in the image frame and the camera focal length f divided by the pixel width and height:
The camera’s optical axis and the image plane intersect at C, which is the image of the spatial point G (Figure 1 and Figure 2). Usually, the principal point coordinates point to the center of the image sensor matrix (e.g., for full HD resolution, 1920 × 1080, the principal point is near (960, 540)). Some camera modes can produce cropped images shifted from the principal point; we do not consider this case here.
The corresponding triangles are similar both in the real and in the virtual image plane (Figure 2), so
It follows from (5) that the pixel coordinates in the image frame are connected with the coordinates of the spatial source in the camera frame by the equations
where the homogeneous coordinates of the pixel appear on the left-hand side. Given the camera matrices, relations (5) determine the pixel. However, we need to know the depth value (the “depth map”) to recover the original 3D coordinates from the pixel coordinates. Additional information about the scene is needed to build the depth map for some range of pixels. Perspective projection preserves straight lines.
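The pinhole relations (1)–(6) can be summarized in a short function. The sketch below is only an illustration: the parameter names fx, fy (focal length divided by pixel width and height), cx, cy (principal point), R, and t follow the definitions above, and a point behind the camera is rejected.

```python
import numpy as np

def project_pinhole(P_world, R, t, fx, fy, cx, cy):
    """Project a 3D point (scene/ENU frame) to pixel coordinates with the pinhole model.

    R and t describe the scene-to-camera transformation; fx, fy, cx, cy are the
    intrinsic parameters (focal length over pixel width/height, principal point).
    """
    P_cam = R @ np.asarray(P_world, dtype=float) + t   # scene -> camera frame, cf. (3)
    x, y, z = P_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    u = fx * x / z + cx                                # perspective division + intrinsics, cf. (6)
    v = fy * y / z + cy
    return u, v
```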
A wide-angle lens, used by most public street cameras, introduces perspective distortion, described by the pinhole camera model, and substantial radial distortion. The radial distortion is usually easy to detect, especially on “straight” lines near the edges of the image. An extended model is used to take it into account. A popular quadratic model of radial distortion ([6] (pp. 63–66), [7] (pp. 189–193), [5]) changes the pinhole camera model (6) as follows:
where the two coefficients describe the radial distortion. In addition to the quadratic model, more complex polynomials and models of other types are used ([5], [6] (pp. 63–66, 691–692), [7] (pp. 189–193)). Let the coefficients be selected so that image formation by the camera is sufficiently well described by (7)–(9). To construct an artificial image in which the radial distortion is eliminated, we must obtain, for each pixel of the original image, its new position according to the pinhole camera model:
To do this, we need all the coordinate values in the undistorted image that correspond to the pixels forming the original image. To obtain them from an image, we have the observed pixel coordinates and the equations that follow from (7)–(9):
provided the corresponding coordinate differs from the principal point coordinate; similar equations are valid for the other coordinate. Equation (13) has five roots (complex, generally speaking), so we need an additional condition to select one root: we select the real root nearest to the observed value. The artificial image will not be a rectangle, but we can select a rectangular part of it (see the figures of Example 2). Note that the mapping is determined by the camera intrinsic parameters and the distortion coefficients. It does not change from frame to frame and can be computed once and reused until the parameters change.
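As noted, inverting the quadratic radial model reduces to finding the roots of a fifth-degree polynomial in the radius and selecting the real root nearest to the observed value. A minimal sketch follows (an assumption: the model is applied radially in pixel coordinates relative to the principal point, which only rescales the coefficients, here named k1 and k2):

```python
import numpy as np

def undistort_radius(r_d, k1, k2):
    """Given a distorted radius r_d = r*(1 + k1*r**2 + k2*r**4), recover r.

    The equation k2*r^5 + k1*r^3 + r - r_d = 0 has five roots in general;
    we take the real root nearest to r_d, as suggested in the text.
    """
    roots = np.roots([k2, 0.0, k1, 0.0, 1.0, -r_d])
    real = roots[np.abs(roots.imag) < 1e-8].real
    return real[np.argmin(np.abs(real - r_d))]

def undistort_pixel(u_d, v_d, cx, cy, k1, k2):
    """Map a distorted pixel to its pinhole-model position, cf. (10)-(13)."""
    dx, dy = u_d - cx, v_d - cy
    r_d = np.hypot(dx, dy)
    if r_d == 0:
        return u_d, v_d
    r = undistort_radius(r_d, k1, k2)
    scale = r / r_d
    return cx + dx * scale, cy + dy * scale
```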
3. Public Camera Parameters Estimation
The model parameters in (3), (7)–(9) define the transformation of a 3D point visible by the camera to pixel coordinates in the image. These parameters are the rotation matrix R of the camera orientation ((2) and (3)); the camera position coordinates (point O) (3); the intrinsic camera parameters ((1) and (4)); and the radial distortion coefficients ((8) and (9)).
The camera calibration process estimates the parameters using a set of 3D point coordinates and the set of their image coordinates. A large set of 3D points is called a point cloud, and there are software libraries that work with point clouds. Long-range 3D lidars or geodetic instruments can measure 3D coordinates for such a set. A depth map from another camera can also help obtain the set; stereo/multi-view cameras can build depth maps, but it is hard to obtain high accuracy at long distances. When the tools listed above are unavailable, it is possible to obtain the global coordinates of points in the camera’s field of view with GNSS sensors or from online maps. The latter variants are easier to access but less accurate. There are many tools [8] to translate global coordinates to an ENU frame.
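As an illustration of translating global coordinates into a local ENU frame, one available option (not necessarily the tool behind [8]) is the pymap3d package; the reference point and the observed point below are hypothetical:

```python
import pymap3d as pm

# Hypothetical reference point chosen as the ENU origin (latitude, longitude in
# degrees, ellipsoidal height in meters) and one observed point near the road.
lat0, lon0, h0 = 43.115, 131.885, 40.0
lat, lon, h = 43.116, 131.887, 38.5

# East, North, Up coordinates of the point relative to the origin, in meters.
e, n, u = pm.geodetic2enu(lat, lon, h, lat0, lon0, h0)
print(f"ENU: {e:.1f} m east, {n:.1f} m north, {u:.1f} m up")
```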
The camera calibration procedure is well studied ([7] (pp. 178–194), [9] (pp. 22–28), [10], [6] (pp. 685–692)). It looks for the parameter values that minimize the difference between the observed pixels and the pixels generated by the model (3), (7)–(9) from the 3D points. Computer vision software libraries include calibration functions for a suitable set of images from a camera [5]. The OpenCV function calibrateCamera [5,11] needs a set of points and pixels of a special pattern (e.g., a “chessboard”) rotated in front of the camera. The function returns an estimation of the intrinsic camera parameters and the camera placements relative to the pattern positions. These relative camera placements are usually useless after outdoor camera installation. If the camera uses a zoom lens and the zoom is changed in the field, the camera focal length changes too, and a new calibration is needed. For a public camera installed on a wall or tower, it is hard to collect appropriate pattern images to apply a function such as calibrateCamera. Camera operators usually do not publish camera parameters, but this information is indispensable for many computer vision algorithms. We need a calibration procedure applicable to the available online data.
Site-Specific Calibration
If the data for the unified calibration procedure are unavailable, we can estimate the camera position and orientation parameters (R, O) separately from the others. The resolution of images/video from a camera is available with the images/video themselves. As noted earlier, the principal point can usually be taken at the image center.
Many photo cameras (including phone cameras) add EXIF metadata to the image file, which often contain the focal length f. The image sensor model description can contain the pixel width w and height h, so (4) gives the intrinsic coefficients. The distortion coefficients are known for high-quality photo lenses; moreover, raw image processing programs can remove the lens radial distortion. If we are lucky, we can obtain the intrinsic camera parameters and radial distortion coefficients and apply a pose computation algorithm (PnP, [12,13]) to the point and pixel sets to obtain estimates of R and O. When EXIF metadata from the camera are of no help (which is typical for most public cameras), small sets of point–pixel correspondences and Equation (5) can help to obtain estimations of the focal-length coefficients. Suppose we know the installation site of the camera (usually, the site can be seen in photos of the camera’s surroundings). In that case, we can estimate the GNSS/ENU coordinates of the site (the point O coordinates) by the methods listed earlier. The camera orientation (matrix R) can be determined from the coordinates of the point G (Figure 1) and an estimation of the camera horizon tilt. Horizontal or vertical 3D lines (which can be artificial) in the camera’s field of view can help evaluate the tilt.
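When the intrinsic parameters and distortion coefficients are available, the pose computation step mentioned above can be done with OpenCV’s solvePnP. The sketch below uses synthetic data (a made-up pose and point set) only to show the call sequence and the recovery of the camera position O:

```python
import numpy as np
import cv2

# Intrinsic parameters assumed for a 1920x1080 sensor (illustrative values only).
fx = fy = 1500.0
cx, cy = 960.0, 540.0
A = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=np.float64)
dist = np.zeros(4)                          # assume radial distortion already compensated

# Illustrative 3D points in the ENU frame (meters): road corners, pole bases, etc.
object_points = np.array([[ 12.3,  45.1,  0.2], [ 30.8,  60.4,  0.1],
                          [ -5.2,  45.9,  5.5], [ 18.0,  80.7,  0.3],
                          [ -9.4,  70.2, 12.0], [ 25.5, 100.1,  2.1],
                          [  4.7,  55.0,  8.3]], dtype=np.float64)

# Generate consistent pixel observations from a made-up ground-truth pose;
# in practice these would be measured in the frame.
rvec_true = np.array([[0.10], [-0.20], [0.05]])
tvec_true = np.array([[3.0], [-25.0], [60.0]])
image_points, _ = cv2.projectPoints(object_points, rvec_true, tvec_true, A, dist)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, A, dist)
R, _ = cv2.Rodrigues(rvec)                  # rotation matrix (scene -> camera)
O = -R.T @ tvec                             # camera position in the ENU frame
```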
4. Formulation of the Problem and Its Solution Algorithm
Designers and users of transport models are interested in the flow density (number of vehicles per unit of lane length or direction at a given time), the speed of vehicles (on a lane or direction), and the intensity of the flow (number of vehicles crossing the cross-section of a lane or direction). To determine these values from a fixed camera, we capture certain areas (ROI) of the frames. The camera generates a series of ROI images, usually at a fixed rate, such as 25 frames per second. The algorithms (object detection, instance segmentation, or contour detection, combined with object tracking, see [14,15,16,17]) find a set of contours describing the trajectory of each vehicle crossing the ROI. The contour description consists of the coordinates of its vertices in the image frame. To estimate flow density, we count the number of contours per meter in the ROI of each frame. To estimate car speed, we choose a vertex (e.g., the “bottom left” one) of the contour in the trajectory and count the meters that this vertex has passed along the trajectory. Both cases require estimating in meters a distance given in pixels, so we need to convert lengths in the image to lengths in space (in the ENU or camera frame). In some cases, distances in pixels are related to metric distances almost linearly (where radial and perspective distortions are negligible). We consider public cameras that produce images with significant radial and perspective distortions in the ROI (the more common case).
Problem 1.
- Let Q be the area that is a plane section of the road surface in space (the road plane can be inclined);
- Φ is the camera frame of resolution N × M containing the image of Q;
- Φ contains the image of at least one segment, the prototype of which in space is a segment of a straight line with a known slope (e.g., vertical);
- The image is centered relative to the optical axis of the video camera; that is, the principal point lies at the center of the frame;
- The coordinates of the points O (camera position) and G (the source of the optical center C) are known;
- The coordinates of one or more pixels of Φ located on the central horizontal line, together with the coordinates of their spatial sources, are known;
- The coordinates of one or more pixels of Φ located on the central vertical line, together with the coordinates of their spatial sources, are known;
- The coordinates of three or more points are known; at least three of them must be non-collinear;
- The coordinates of one or more groups of three pixels are known, and in the group, the sources of the pixels are collinear in space.
Find the camera parameters and construct the mapping
Online maps allow remote estimation of the global coordinates of points and of horizontal distances. Many such maps do not show the altitude, and most do not show the heights of buildings, bridges, or vegetation. Online photographs, street view images, and horizontal distances can help estimate such objects’ heights. Camera locations are often visible in street photos. This variant of measurement implies that the coordinate estimates may contain a significant error. As a result, some algorithms (e.g., PnP RANSAC) can generate a response with an unacceptable error.
To find the required coordinates, the points must be visible both in the image and on the online map.
4.1. Solution Algorithm
We want to eliminate the radial distortion of the ROI area to pass to the pinhole camera model. From (10)–(13), it follows that for this we need the intrinsic parameters and the radial distortion coefficients.
4.1.1. Obtain the Intrinsic Parameters (Matrix A)
Note that from (7)–(9) it follows that a point whose image lies on the central horizontal line of the image stays on that line for any values of the radial distortion coefficients, because its vertical offset from the principal point is zero and the radial terms cannot move it vertically. By analogy, a point whose image lies on the central vertical line stays on that line for any values of the coefficients.
It follows from (5) that if the effect of radial distortion on the measured pixel coordinates can be ignored (the camera radial distortion is moderate, and the points are not too far from C), then
Equation (16) gives initial approximations of the focal-length coefficients and hence an evaluation of matrix A.
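A possible way to apply relation (16) in code, based on the similar-triangles argument (5): for a point whose image lies on a central line, the pixel offset from the principal point, multiplied by the point’s depth along the optical axis and divided by its lateral offset from the axis, gives the corresponding focal coefficient. The sketch below assumes the ENU coordinates of the point, of O, and of G are known; the exact form of (15)–(16) may differ.

```python
import numpy as np

def focal_coefficient(P, O, G, pixel_coord, center_coord):
    """Estimate one focal-length coefficient (f/w or f/h) from a point whose image
    lies on a central line of the frame.

    P: ENU coordinates of the point, O: camera position, G: source of the principal
    point C; pixel_coord is u (central horizontal line) or v (central vertical line),
    center_coord is c_x or c_y respectively.
    """
    P, O, G = (np.asarray(a, dtype=float) for a in (P, O, G))
    z_axis = (G - O) / np.linalg.norm(G - O)          # optical axis direction
    rel = P - O
    depth = rel @ z_axis                              # distance along the optical axis
    lateral = np.linalg.norm(rel - depth * z_axis)    # offset from the optical axis
    return abs(pixel_coord - center_coord) * depth / lateral
```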
4.1.2. Obtain the Radial Distortion Compensation Map
Let the undistorted image be obtained from the original image by the transformation (10) and the solution of Equations (11)–(13). Denote the corresponding mapping between the two images:
We create the image according to the pinhole camera model. The model transforms a straight line in space into a straight line in the image.
Let the pixels of each triplet with collinear sources (Clause 9 of the problem statement) be given in the undistorted image, and
We can find the distortion coefficient values that minimize the sum of distances from the middle pixels to the lines passing through the two endpoint pixels of each triplet. We calculate the distance as:
for each triplet. This approach is a variant of the one described, for example, in [7] (pp. 189–194). OpenCV offers a realization of such a mapping (initUndistortRectifyMap). It is fast, but we need the inverse mapping for our case. We solve Equations (11)–(13) (including the version for the second coordinate) to obtain the undistorted positions; the equation to solve is polynomial (see (7)–(10)).
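The minimization of (19) over the distortion coefficients can be carried out with a generic nonlinear least-squares routine. A sketch using scipy.optimize.least_squares and the undistort_pixel helper from the earlier sketch (the triplet format and the initial guess are assumptions):

```python
import numpy as np
from scipy.optimize import least_squares

def collinearity_residuals(k, triplets, cx, cy):
    """Signed distance of each undistorted middle pixel from the line through the
    undistorted end pixels of its triplet, cf. (19); uses undistort_pixel from the
    sketch in Section 2."""
    k1, k2 = k
    out = []
    for a, b, c in triplets:                  # the 3D sources of a, b, c are collinear
        ua, ub, uc = (np.array(undistort_pixel(p[0], p[1], cx, cy, k1, k2))
                      for p in (a, b, c))
        t = (uc - ua) / np.linalg.norm(uc - ua)    # unit direction of the line through the ends
        d = ub - ua
        out.append(d[0] * t[1] - d[1] * t[0])      # perpendicular offset of the middle pixel
    return out

# triplets: list of (a, b, c) pixel triplets per straight scene line; cx, cy: principal point.
# fit = least_squares(collinearity_residuals, x0=[0.0, 0.0], args=(triplets, cx, cy))
# k1_est, k2_est = fit.x
```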
4.1.3. Obtain the Camera Orientation (Matrix R)
To determine the camera orientation, we use the points O and G given in the ENU coordinates (Figure 1). The unit vector
gives the direction of the optical axis of the camera frame. This vector and the point O determine the plane of points x for which the vector from O to x is perpendicular to the optical axis. In this plane lie the two remaining axes of the camera frame. We also use the downward unit vector. Consider an auxiliary camera frame with the same optical axis as the actual one but with zero horizon tilt: its horizontal axis is perpendicular to the optical axis and parallel to the horizontal plane. We can find the directions of the axes of this auxiliary frame:
Vectors (21) form the orthonormal basis, which allows us to construct a rotation matrix for transition from to :
If is a rotation matrix, the following equations hold:
From Clause 4 of the problem statement, there is a line segment in the artificial image whose source in space has a known slope. It is possible to compare the slope of the segment in the image with the slope of its source in space. So, we can estimate the camera horizon tilt angle and rotate the plane of the two non-optical axes around the optical axis by this angle. The resulting coordinate system corresponds to the actual camera orientation. To pass from the auxiliary camera coordinates to the actual ones, we rotate by the tilt angle around the optical axis using the rotation matrix
We can describe the transition from to (without origin displacement) as a combination of rotations by the matrix
which is also a rotation matrix. The matrix that we have already designated R (2) gives the inverse transition, from the scene coordinates to the coordinates of the camera frame,
and the shift t characterizes the transition from the origin of the scene coordinates to the camera installation point O. The shift t is often more easily expressed through the coordinates of O; see (3). To convert the coordinates of a point from the scene frame to the camera frame, use the following expressions:
Typically, an operator aims to set a camera with zero horizon tilt.
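The construction (20)–(26) can be sketched as follows, assuming the ENU up direction (0, 0, 1) and the usual camera convention (x to the right, y downward, z along the optical axis); the sign conventions of the tilt rotation may differ from those in the article.

```python
import numpy as np

def camera_rotation(O, G, tilt_rad=0.0):
    """Rotation matrix from the ENU frame to the camera frame.

    O is the camera position, G is the source of the principal point (both in ENU);
    tilt_rad is the estimated horizon tilt angle around the optical axis.
    """
    O, G = np.asarray(O, float), np.asarray(G, float)
    z_c = (G - O) / np.linalg.norm(G - O)          # optical axis
    down = np.array([0.0, 0.0, -1.0])
    x_c0 = np.cross(down, z_c)                      # horizontal, zero-tilt "right" axis
    x_c0 /= np.linalg.norm(x_c0)
    y_c0 = np.cross(z_c, x_c0)                      # points roughly downward
    R0 = np.vstack([x_c0, y_c0, z_c])               # zero-tilt ENU -> camera rotation
    c, s = np.cos(tilt_rad), np.sin(tilt_rad)
    R_tilt = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])  # roll about the optical axis
    return R_tilt @ R0

# Camera-frame coordinates of an ENU point P: X_cam = R @ (P - O), cf. (27).
```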
4.1.4. Obtain the Mapping for the Area Q
Let the area in the undistorted image correspond to Q in the original image. The domain Q on the road plane can be written in the scene (ENU) coordinates and in the camera coordinates. From (27), we obtain
From Clause 2 of the problem statement, it follows that an image of Q is visible in the frame Φ (and hence in the undistorted image).
We can convert the ENU coordinates of the road points to camera coordinates following (27) and denote the result accordingly.
We approximate the plane of these points with the least squares method. The plane is defined in the camera coordinates by the equation
The matrix D and vector E represent the points on the road:
The plane parameters p can be found by solving the least squares problem
the exact solution to the least squares task is:
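In code, the least-squares fit (30)–(33) reduces to a single call to a linear solver. A minimal sketch, assuming the road plane is parameterized in the camera frame as z = p1·x + p2·y + p3 (the article’s exact parameterization may differ):

```python
import numpy as np

def fit_plane(points_cam):
    """Fit z = p1*x + p2*y + p3 to road points given in camera coordinates."""
    pts = np.asarray(points_cam, dtype=float)        # shape (n, 3)
    D = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    E = pts[:, 2]
    p, *_ = np.linalg.lstsq(D, E, rcond=None)         # exact least-squares solution, cf. (33)
    return p                                          # p1, p2, p3
```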
For a point represented by the pixel coordinates in we obtain, taking into account (6)
.
If the source of the pixel lies on the road plane, i.e., its camera coordinates satisfy the plane Equation (32), then
so there are linear equations that allow us to express the spatial coordinates through the pixel coordinates. Let
We have constructed a function that assigns to each pixel coordinates (in image ) a spatial point in the coordinate system , according to (36) and (37):
If , so
Next, we will mostly refer to (39). It helps obtain the measurement results, although (38) is already sufficient for calculating metric lengths. There are other ways to map pixels to meters (see, e.g., [7] (pp. 47–55)), but their applicability depends on the data available.
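Combining the above: a pixel of the undistorted image defines a ray through the camera center, and intersecting that ray with the fitted road plane yields the metric point, in the camera frame and, if desired, in ENU. A sketch under the same assumptions as the previous ones (plane z = p1·x + p2·y + p3 in camera coordinates; R and O as defined earlier):

```python
import numpy as np

def pixel_to_plane(u, v, fx, fy, cx, cy, p, R=None, O=None):
    """Map an undistorted pixel to the 3D point where its ray meets the road plane.

    p = (p1, p2, p3) are the plane coefficients in camera coordinates.
    If R and O are given, the point is returned in the ENU frame instead.
    """
    # Ray direction in the camera frame for the pinhole model (6).
    d = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    p1, p2, p3 = p
    denom = d[2] - p1 * d[0] - p2 * d[1]
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the road plane")
    t = p3 / denom                        # depth along the ray up to the plane
    X_cam = t * d                         # metric point in the camera frame, cf. (38)
    if R is None:
        return X_cam
    return R.T @ X_cam + np.asarray(O)    # ENU coordinates, cf. (27) inverted and (39)
```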
4.2. Auxiliary Steps
4.2.1. Describing the Area Q
Different Q shapes suit different tasks. We fix the image of Q as a quadrilateral in the undistorted image. Its source in the road plane may be a rectangle or a general tetragon. We choose the four corners of the domain as pixels in the image. The tetragon is the ROI in the original image, in which we use object detection to estimate transport flow statistics. The lines that bound the quadrilateral are ([7] (p. 28)):
where homogeneous pixel coordinates are used. A pixel of the domain must satisfy the system of four inequalities:
The quadrilateral is a continuous domain, but the whole image contains only a fixed number of actual pixels. The same is true for the original ROI. We can obtain the real pixels that fall in the quadrilateral from the original image using the per-pixel correspondence between the two images. We apply the mapping (39) and obtain the set of points in the scene frame that correspond to actual pixels in the domain. The scene frame is metric, so distances in it can be used to estimate meters per second or objects per meter. The same is true for any rotation or shift of this frame.
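The membership test (41)–(42) amounts to checking that a pixel lies on the same side of all four boundary lines; a minimal sketch with the four corners assumed to be listed in a consistent order:

```python
import numpy as np

def inside_quad(u, v, corners):
    """True if pixel (u, v) lies inside the quadrilateral given by four corners.

    corners: sequence of four (u, v) vertices in consistent order. The pixel is
    inside when it lies on the same side of all four edge lines, which is the
    system of inequalities (42).
    """
    corners = np.asarray(corners, dtype=float)
    signs = []
    for i in range(4):
        a, b = corners[i], corners[(i + 1) % 4]
        edge = b - a
        rel = np.array([u, v], dtype=float) - a
        signs.append(edge[0] * rel[1] - edge[1] * rel[0])   # z-component of the cross product
    signs = np.array(signs)
    return bool(np.all(signs >= 0) or np.all(signs <= 0))

# The fixed set of pixels inside the domain can be collected once and reused per frame, e.g.:
# pixels_in_q = [(u, v) for v in range(M) for u in range(N) if inside_quad(u, v, corners)]
```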
4.2.2. Coordinate Frame Associated with the Plane
We can go from the camera frame to a coordinate system associated with the road plane. We apply it for illustrations, but it can be helpful for other purposes. We can regard the traffic in Q as anything that rises above the road plane. We denote the coordinate system connected to the plane and use the normal to the plane, given by the coefficients from (32), as its third axis.
We choose the direction of another axis in the plane (e.g., along the direction of traffic movement in the area Q), and the remaining axis is then determined automatically. Let the Y axis be defined by this vector; then
Select the origin of the coordinate system as follows
The rotation matrices from to and their inverses look like the following:
Use the following expression for the translation of coordinates of given in frame to given in frame :
5. Examples
Consider two evaluations of public camera parameters, where only images and small sets of coordinates of limited accuracy are available.
5.1. Example 1: Inclined Driving Surface and a Camera with Tilted Horizon
Figure 3 shows a video frame from the public camera.
Figure 3.
Field of view of the public street camera in Vladivostok. Example 1, image .
This camera is a good example, as its video contains noticeable perspective and radial distortion. In the image, there is a line (the base of a wall) demonstrating the camera’s slight horizon tilt. The visible part of the road has a significant inclination. This is a full HD camera (1920 × 1080). We select a point as the origin of the ENU frame and assess the ENU coordinates of the point O (the camera position), the point G (the source of the principal point C, Figure 1 and Figure 3), and reference points on the central horizontal and vertical lines. Using maps and online photos, we obtained the values listed in Table 1. We convert the global coordinates to ENU coordinates (see [8]) and add them to the table (in meters).
Table 1.
The points used for the calibration. Latitude and longitude in degrees, altitudes and ENU coordinates in meters, coordinates of pixels in units (u is the column index, v is the row).
The points from Table 1 can be found on the satellite layer of [18] using the latitude and longitude of the query.
We obtain the two focal-length coefficients using (15), (16), and the Table 1 data. Street camera image sensors usually have square pixels, so the two coefficients should be equal (4). If their difference is small, we set them equal (we use the mean value). So we have an approximation of matrix A.
Now, we can estimate the radial distortion coefficients by minimizing the distances (19) or in another way. We put the estimated coefficients into (17) to compute the mapping and apply it to eliminate the radial distortion from the original image. The mapping does not change for different frames of the camera video. We obtained the undistorted version of the image (Figure 4).
Figure 4.
The image after compensation of radial distortion. Example 1, image .
The radial distortion of the straight lines in the vicinity of the road has almost disappeared in the corrected image. The camera’s field of view decreased, the point C remained in place, and the reference points moved further along the central lines. The focal-length coefficients can be recalculated, but radial distortion is not the only cause of errors. Therefore, we will perform additional cross-validation and adjust the coefficient values if required.
We can estimate the horizon tilt angle from the image (Figure 4) and rotate the plane of the image axes around the optical axis by this angle. Rotating the image around the camera’s optical axis by the estimated angle made the required line vertical (Figure 5), so this angle is used in (24).
Figure 5.
The image rotated around the optical axis of the camera by the estimated horizon tilt angle.
We compute the camera orientation matrix R with (20)–(26). Now, we can convert coordinates to with (27).
Let us use the carriageway region nearest to the camera as the ROI (area Q). We select several points in Q, estimate their global coordinates, and convert them to ENU [8]. Next, we convert the ENU coordinates to camera coordinates with (27). The results are in Table 2.
Table 2.
The spatial points used for the road plane approximation (global and coordinates); coordinates are in meters.
Table 3.
Domain corners coordinates in .
Figure 6.
Visualization of the mapped point set in the road plane, colored by the pixels of the original image. Given the geometry of the sample, the first plane axis is drawn horizontally and the second vertically.
We calculate the lines that bound the domain by (41) and detect the set of pixel coordinates that belong to it. Note that this set does not change for different frames of the camera video. We can save it for later use with this camera and this Q.
We compute the corresponding metric points by (48) and (35)–(39). We convert the coordinates from the camera frame to the plane frame by (43)–(47):
and save the result. The coordinate set does not change for the (fixed) camera and the chosen Q. The obtained discrete sets are sets of metric coordinates. We can use them for measurements on the plane as is or apply an interpolation.
The mapped point set is suitable for output as a plane picture. If for each point we use its two plane coordinates and the color of the corresponding pixel, and output the plane as a scatter map, we obtain the bottom image of Figure 6. We chose a multi-car scene in the center of Q for the demonstration. We obtain an accurate “view from above” for the points that initially lie on the road surface. The positions of the pixels representing objects that tower above the road plane have shifted. There are options for estimating and accounting for the height of cars and other objects so that their images look realistic in the bottom image; however, this is the subject of a separate article.
It is worth paying attention to the axes of the bottom figure (they show meters). In comparison with the original (top image of Figure 6), the removal of the perspective distortion noticeably changes the perception of distance. The width and length of the area are in good agreement with the measurements made by a rangefinder and with estimates from online maps. The horizontal lines in the image are almost parallel, which indicates the good quality of the estimated parameters. In this way, we cross-checked the camera parameters obtained earlier. Obviously, all vehicles are in contact with the road surface.
We do not need to remove the distortions (radial and perspective) from all video frames to estimate traffic statistics. The object detection or instance segmentation works with the original video frames. We use the distortion compensation only to obtain distances in meters for selected pixels. We calculate the needed maps once for the given camera parameters and Q and use them as needed for measurements. Let us demonstrate this on a specific trajectory.
The detected trajectory of the vehicle consists of 252 contours, 180 of which are in Q. About 200 contours drawn in one image look messy, so we draw every 20th (Figure 7). We deliberately did not choose rectangles in order to demonstrate a more general case. Vertex coordinates describe each contour. It is enough to select one point on or inside each contour to evaluate the speed or acceleration of an object. The point should not move around the object, and we want the point’s source to be close to the road surface. The left bottom corner of a contour is well suited for this camera. However, what does that mean for a polygon? For a contour, we can build the pixel coordinates
Figure 7.
Every twentieth contour of a vehicle’s trajectory in the original image .
We call the left bottom corner of a contour the vertex for that
considering the axes directions of the image frame. Another option is to search for the point nearest to this corner on the edges of the contour with the help of (19).
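A possible realization of the corner choice described above: take the extreme coordinates (minimal u, maximal v) of the contour as a reference and select the contour vertex nearest to it (a sketch; the exact rule in the article may differ in detail).

```python
import numpy as np

def bottom_left_vertex(contour):
    """Select the contour vertex closest to the (min u, max v) corner.

    contour: array of (u, v) vertices in image coordinates, where v grows downward,
    so "left bottom" means small u and large v.
    """
    contour = np.asarray(contour, dtype=float)
    reference = np.array([contour[:, 0].min(), contour[:, 1].max()])
    distances = np.linalg.norm(contour - reference, axis=1)
    return contour[np.argmin(distances)]
```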
We map all contour vertices onto the road plane to obtain Figure 8. However, for the measurements, we need to map only these “corner” points.
Figure 8.
Every twentieth contour in the vehicle’s trajectory mapped on the with the selected points.
The “corners” of the selected contours have the following coordinates after the mapping:
We can use a variety of formulas to estimate speed, acceleration, and variation. The simplest estimation of the vehicle speed is (the camera frame rate is 25 frames/second, we use 1/20 of the frames):
or km/h.
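For reference, this speed estimate can be computed as the total path length of the mapped corner points divided by the elapsed time; a sketch assuming, as in the example, every 20th frame of a 25 fps video:

```python
import numpy as np

def mean_speed(points_m, frame_step=20, fps=25.0):
    """Average speed (m/s and km/h) along a sequence of mapped points given in meters."""
    pts = np.asarray(points_m, dtype=float)
    segment_lengths = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    dt = frame_step / fps                          # seconds between consecutive points
    v_ms = segment_lengths.sum() / (dt * len(segment_lengths))
    return v_ms, 3.6 * v_ms
```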
5.2. Example 2. More Radial Distortion and Vegetation, More Calibration Points
Figure 9 illustrates a frame from another public camera of the same operator. This camera has zero horizon tilt and more substantial radial distortion. We calibrated the camera and calculated the maps in the season of rich vegetation; tree foliage complicates the selection of points and the estimation of their coordinates. A two-way road is visible to the camera. This is a full HD camera (1920 × 1080). The zero horizon tilt is visible on a line in the frame (see Table 4). We select a point as the origin of the ENU frame (see Table 4). We assess the global coordinates of the point O (the camera position), the point G (the source of the principal point C, Figure 1 and Figure 9), and reference points on the central horizontal and vertical lines. Next, we convert the global coordinates to ENU and append them to Table 4.
Figure 9.
Field of view of the public street camera in Vladivostok. Example 2, image .
Table 4.
The points used for the calibration. Latitude and longitude are in degrees, altitudes and ENU coordinates are in meters, and coordinates of pixels are in units (u is the column index, v is the row).
We obtained estimations of the focal-length coefficients using Formulas (15) and (16) and the Table 4 data. There are only a few points, but a pattern can be assumed. Perhaps the camera’s sensor pixel has a rectangular shape (see (4)). Consider two hypotheses:
The second variant corresponds to a square pixel.
Let us try variant (50). With these coefficients, we obtained the intrinsic parameter matrix A (1). We estimate the radial distortion coefficients, put them into (17) to compute the mapping, and apply it to eliminate the radial distortion from the original image. We obtained the undistorted image (see Figure 10). In Example 1, we showed a cropped and interpolated version of such an image (Figure 4); here, let us demonstrate the result of the mapping without postprocessing. The undistorted image is larger, but it contains only the pixels colored by the camera, so the black grid consists of the unfilled pixels of the larger rectangle.
Figure 10.
The image after compensation of radial distortion; Example 2, image without postprocessing.
Let us return to an illustration that is more pleasing to the eye by cutting off part of the image and filling the voids by interpolation (see Figure 11).
Figure 11.
Cropped and interpolated part of image .
We note that in the camera’s field of view there is a section of a building wall, which allows us to set the horizon tilt angle to zero (see Figure 9). We are now ready to compute the orientation matrix R (20)–(26), so we can convert ENU coordinates to camera coordinates (27). We select the points to approximate the road surface (see Table 5).
Table 5.
The points used to approximate road plane (global and coordinates, and altitudes given in meters).
We choose the four corners of the domain Q and of its image (see Table 6 and Figure 12). Let us try a nonrectangular area.
Table 6.
Domain corners coordinates in , Example 2.
Figure 12.
The set visualization in plane for case (50).
We calculate the lines that bound the domain with (41) and detect the set of pixel coordinates that belong to it. We compute the metric points with (48) and (35)–(39). Next, we convert the coordinates from the camera frame to the plane frame with (43)–(47) and (49) and save the result.
The distances in Q correspond to estimates obtained online and with a simple rangefinder (accuracy up to a meter). We plan more accurate assessments using geodetic tools.
We repeat the needed steps for hypothesis (51) and compare the results (see Figure 12 and Figure 13).
Figure 13.
The set visualization in plane for case (51).
We observed a change in the geometry of the mapped area for hypothesis (51): for example, the pedestrian crossing changed its inclination, and the road began to widen to the right. In this example, it is not easy to obtain several long parallel lines due to vegetation, which is an argument for examining cameras in a suitable season.
6. Conclusions
This work is an extension of the technology of road traffic data collection described in [4] to allow accurate car density and speed measurements using physical units with compensation for perspective and radial distortion.
Although we processed video frames in this article, all the mappings obtained for measurements transform only coordinates. We used the pixel content of the images only for demonstrations. In the context of measurements, the object detection (or instance segmentation) and object tracking algorithms work with the pixel colors, while the mappings work with the pixel coordinates of the results of these algorithms. We can calculate the discrete maps once and use them until the camera parameters or the area Q change. In this sense, the computational complexity of the map generation is not particularly important. We plan to refine the estimates of the camera parameters as new data on the actual geometry of the ROI become available, which will increase the accuracy of the maps. Evaluating or refining the camera parameters in a suitable season might be preferable.
Author Contributions
Investigation, A.Z. and E.N.; Software, A.Z.; Visualization, A.Z.; Writing—original draft, A.Z.; Writing—review & editing, A.Z. and E.N. All authors have read and agreed to the published version of the manuscript.
Funding
This work has been supported by the project 075-02-2022-880 of Ministry for Science and Higher Education of the Russian Federation from 31 January 2022.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Xinqiang, C.; Shubo, W.; Chaojian, S.; Yanguo, H.; Yongsheng, Y.; Ruimin, K.; Jiansen, Z. Sensing Data Supported Traffic Flow Prediction via Denoising Schemes and ANN: A Comparison. IEEE Sens. J. 2020, 20, 14317–14328. [Google Scholar]
- Xiaohan, L.; Xiaobo, Q.; Xiaolei, M. Improving Flex-route Transit Services with Modular Autonomous Vehicles. Transp. Res. Part E Logist. Transp. Rev. 2021, 149, 102331. [Google Scholar]
- 75 Years of the Fundamental Diagram for Traffic Flow Theory: Greenshields Symposium TRB Transportation Research Electronic Circular E-C149. 2011. Available online: http://onlinepubs.trb.org/onlinepubs/circulars/ec149.pdf (accessed on 17 September 2022).
- Zatserkovnyy, A.; Nurminski, E. Neural network analysis of transportation flows of urban agglomeration using the data from public video cameras. Math. Model. Numer. Simul. 2021, 13, 305–318. [Google Scholar]
- Camera Calibration and 3D Reconstruction. OpenCV Documentation Main Modules. Available online: https://docs.opencv.org/4.5.5/d9/d0c/group__calib3d.html (accessed on 18 March 2022).
- Szeliski, R. Computer Vision: Algorithms and Applications, 2nd ed.; Springer: Cham, Switzerland, 2022. [Google Scholar]
- Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision, 2nd ed.; Cambridge University Press: New York, NY, USA, 2003. [Google Scholar]
- Transformations between ECEF and ENU Coordinates. ESA Navipedia. Available online: https://gssc.esa.int/navipedia/index.php/Transformations_between_ECEF_and_ENU_coordinates (accessed on 18 March 2022).
- Forsyth, D.A.; Ponce, J. Computer Vision: A Modern Approach, 2nd ed.; Pearson: Hoboken, NJ, USA, 2012. [Google Scholar]
- Peng, S.; Sturm, P. Calibration Wizard: A Guidance System for Camera Calibration Based on Modeling Geometric and Corner Uncertainty. arXiv 2019, arXiv:1811.03264. [Google Scholar]
- Camera Calibration. OpenCV tutorials, Python. Available online: http://docs.opencv.org/4.x/dc/dbb/tutorial_py_calibration.html (accessed on 18 March 2022).
- Marchand, E.; Uchiyama, H.; Spindler, F. Pose estimation for augmented reality: A hands-on survey. IEEE Trans. Vis. Comput. Graph. 2016, 22, 2633–2651. [Google Scholar] [CrossRef] [PubMed]
- Perspective-n-Point (PnP) Pose Computation. OpenCV documentation. Available online: https://docs.opencv.org/4.x/d5/d1f/calib3d_solvePnP.html (accessed on 18 March 2022).
- Liu, C.-M.; Juang, J.-C. Estimation of Lane-Level Traffic Flow Using a Deep Learning Technique. Appl. Sci. 2021, 11, 5619. [Google Scholar] [CrossRef]
- Khazukov, K.; Shepelev, V.; Karpeta, T.; Shabiev, S.; Slobodin, I.; Charbadze, I.; Alferova, I. Real-time monitoring of traffic parameters. J. Big Data 2020, 7, 84. [Google Scholar] [CrossRef]
- Zhengxia, Z.; Zhenwei, S. Object Detection in 20 Years: A Survey. arXiv 2019, arXiv:1905.05055v2. [Google Scholar]
- Huchuan, L.; Dong, W. Online Visual Tracking; Springer: Singapore, 2019. [Google Scholar]
- Yandex Maps. Available online: https://maps.yandex.ru (accessed on 18 March 2022).
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).