This article is an Open Access article distributed under the terms and conditions of the Creative Commons Attribution license.

In order to estimate the speed of a moving vehicle from side view camera images, the velocity vectors of a sufficient number of reference points identified on the vehicle must be found from frame images. This procedure involves two main steps. In the first step, a sufficient number of points on the vehicle is selected, and these points must be accurately tracked over at least two successive video frames. In the second step, the velocity vectors of those points are computed from the displacement vectors of the tracked points and the elapsed time. The computed velocity vectors are defined in the video image coordinate system, and the displacement vectors are measured in pixel units. The magnitudes of the computed vectors must therefore be transformed from image space to object space to find their absolute values. This transformation requires a mathematical relation between image and object space, which is established by means of the calibration and orientation parameters of the video frame images. This paper presents proposed solutions for the problems of using side view camera images mentioned here.

In recent years, much research has been performed on developing real time traffic monitoring systems for managing the traffic flow of roadways, preventing accidents, and providing secure transportation,

In this paper, we present some of the first results of our ongoing research project on real time speed estimation of moving vehicles using side view video images. To find vehicle speed, any digital video camera that acquires images in the visible light spectrum may be used. The frame sampling rate, the geometric and radiometric resolutions, and the distortion of the camera's optical system affect the precision of the estimated speeds.

Solutions and models for the speed estimation problem vary according to the applications and their final purposes. When applications related to vehicle speed estimation are investigated, two main fields are distinguished: traffic surveillance [

The starting point of many works on traffic surveillance applications is the segmentation of moving objects, for which background subtraction methods are mostly used [

In this paper, we examine the problem of real time speed estimation of one moving vehicle from side view video images. The proposed solution may be used directly for traffic law enforcement to prevent drivers from exceeding speed limits. Furthermore, the proposed methods may also be used within a sensor network for active driver assistance and security systems. We are currently developing an intelligent sensor network to be used both for driver assistance and for automatic mobile vehicles [

In order to solve the speed estimation problem of an individual vehicle using video frame images, many points identified on the image of the vehicle should be selected. Then the displacement of each selected point between two successive image frames, per unit time, should be found. These displacements per unit time are essentially equal to the instantaneous speeds of the points. These tasks must be performed automatically and within a very short time period of less than one second. Since the nature of the problem is ill posed, many technical problems relating to these tasks must be solved. Even if we ignore the physical structure of the problem for a moment, the selection of the points to be tracked and the tracking of those points across successive image frames involve difficult problems of their own. For example, what should be done if, because of the motion of the vehicle, a selected point cannot be seen on the next frame or falls out of the field of vision of the camera? Other problems include how the elapsed time is to be measured and, once the displacement vectors of the points have been obtained in the image coordinate system, what their corresponding absolute values in object space are. Solutions to these problems can be found using different approaches depending on the underlying problem. In this paper, all of the problems mentioned above are handled, and we give the first results of our ongoing studies on the proposed solutions. In conjunction with this, we explain the approaches that we used to estimate the speed of a vehicle, as well as the image processing procedure used to select the tracking points and compute the displacement vectors,

In this paper, we propose the real-time estimation of only one moving vehicle's speed by using one video camera and side view images taken with it. Since it is not possible to extract 3D geometric information with one camera, some geometric constraints are required to solve the speed estimation problem; the images should be taken under these constraints and the processing procedures performed with the same restrictions. For example, we assume that the imaged scene is flat. Perspective distortions in the acquired images must be either very small or small enough to be easily rectified. Furthermore, with only one camera the velocity vectors can only be obtained in two dimensions, so the scale of the images along the 2D velocity vectors should be defined precisely. For this purpose, at least the length of a line joining two points within the field of view of the camera, lying on the road and aligned along the velocity vectors, must be measured precisely. In this paper, we measured the lengths of two lines along the road by geodetic measurement with a simple measurement tape, to a precision of ±1 millimetre.

Since we define the scale of the images along the road, the field of view (FOV) of the camera must be set up so that it covers the moving direction of the vehicles. The pixel size corresponding to the effective area of the camera is 9 microns. The focal length of the camera is 5.9 mm. We capture images in grey level mode at 30 fps (frames per second), meaning that a frame is captured 33.3 milliseconds after the previous one; all of the computations required for the speed estimation problem must therefore also be completed within 33.3 milliseconds. This raises two very important questions: (1) is it possible to perform all of the computations within such a very short time period (

To answer the first question, it should first be noted that we use gradient based LK optical flow and compute sparse optical flow rather than dense flow. This is a reasonable approach given the computation time of each individual algorithm and their total computation time. Before giving this information, it is better to present the flow of the overall process, as this will enable the reader to see which algorithms have been used.

As seen in the table of overall operations, the process consists of an offline preparation step and a real time processing step.

The total computation time for the whole image resolution varies between 35 and 125 milliseconds. Our routines are executed on the Windows XP platform. We disabled all utility programs such as antivirus software and other unnecessary Windows components. In addition, to guarantee that we can capture frames without any frame loss, our code monitors the computation time while running in real-time mode. For example, if the computations end in 30 milliseconds, the program waits 3.3 milliseconds before capturing the next frame. To capture a new frame, we issue a capture command under the control of the program. In very rare cases, especially when some Windows programs are running, the computation time might increase; the maximum delay we have observed in this situation is about 10 milliseconds.
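The frame-budget pacing described above can be sketched as follows; the function name is illustrative, not taken from the paper's code:

```python
# Sketch of the frame-budget pacing: if processing finishes early, wait out
# the remainder of the 33.3 ms frame interval before forcing the next capture.

FRAME_INTERVAL_MS = 1000.0 / 30.0  # 30 fps -> 33.3 ms per frame

def remaining_budget_ms(elapsed_ms: float) -> float:
    """Time left to wait before the next frame capture (never negative)."""
    return max(0.0, FRAME_INTERVAL_MS - elapsed_ms)

# Computations finished in 30 ms -> wait about 3.3 ms, as in the text.
print(round(remaining_budget_ms(30.0), 1))  # 3.3
# A rare 40 ms overrun -> capture immediately, no negative wait.
print(round(remaining_budget_ms(40.0), 1))  # 0.0
```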

Some comments should also be given on the second problem. Is it possible to track a point on the next frame if the vehicle is too fast and the camera is very close to the vehicle? Two main factors must be considered for the speed estimation problem with one camera and side view images. One of them is the Lucas Kanade (LK) algorithm that we use for tracking; the other follows from the physical nature of the problem. The physical problem originates from the theory of relativity and from the image motion effect. It arises when the camera is very close to the vehicle and the vehicle moves faster than the scanline sampling rate of the camera. This physical problem does not occur if the limitations given in

Now let us explain the factors regarding the LK optical flow algorithm, which is explained in detail in Section 6.2. The LK algorithm assumes that the displacement of a point is only a few pixels (assumption 2 of Section 6.2). By using a pyramidal approach, the algorithm can match corresponding points even if their displacements are substantially greater. As the number of pyramid levels increases, the algorithm can match more distant points, but the accuracy of the matching decreases [
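The trade-off above can be illustrated with a common rule of thumb (an assumption for illustration, not a bound stated in the paper): each pyramid level roughly doubles the displacement that plain LK can handle.

```python
# Rough rule of thumb for pyramidal LK reach: if plain LK handles about `w`
# pixels of displacement, each extra pyramid level doubles that reach.

def max_trackable_displacement(w: float, levels: int) -> float:
    return w * (2 ** levels)

print(max_trackable_displacement(2, 0))  # 2  -> plain LK, a few pixels
print(max_trackable_displacement(2, 3))  # 16 -> three levels, much farther
```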

In the following section, we explain the physical model of the speed estimation problem. In Section 4, rectification and the solution of the scale problem are given; in Section 5, the selection of the tracking points; and in Section 6, optical flow for tracking the selected points is explained with a real application sample from our test studies.

In order to find the speed of a vehicle with a camera, we must find how the reference points selected on the vehicle change their positions in time. Since those points are fixed on the vehicle (relative to the vehicle itself), if the vehicle is moving relative to the observer (

To find the vehicle speed, successive frame images of the camera can be used. In this case, only the instantaneous speed can be found. This instantaneous speed is computed as follows:
v_i = d_i / Δt, where d_i is the displacement vector of the i-th point between the two frames. The displacement vector expresses the spatial displacement of a point during the time interval Δt. Here the time interval Δt is equal to the time which passes between two successive video frames, i.e., the frame capture rate of the camera. In the experiments given in this paper, Δt is 33.3 milliseconds, which is the frame capture time of the camera that we used. The instantaneous speed of the vehicle is then taken as the mean v_iv = (1/n) Σ v_i, where v_i is the instantaneous velocity vector of the i-th point on the vehicle and n is the number of selected and tracked points. It should be noted that if some of the v_i deviate strongly from the mean v_iv, those vectors should be eliminated as outliers and v_iv recomputed from the remaining v_i.
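The per-point speeds and the outlier elimination described above can be sketched as follows; the rejection threshold k and the sample displacements are illustrative assumptions, not values from the paper:

```python
# Sketch: per-point instantaneous speeds from pixel displacements, then
# rejection of vectors far from the mean before averaging into the vehicle
# speed v_iv. The threshold k = 1.5 is an illustrative choice.
from statistics import mean, pstdev

DT = 0.0333  # seconds between frames (30 fps)

def vehicle_speed(displacements_px, dt=DT, k=1.5):
    speeds = [d / dt for d in displacements_px]      # v_i = d_i / dt
    m, s = mean(speeds), pstdev(speeds)
    kept = [v for v in speeds if abs(v - m) <= k * s] or speeds
    return mean(kept)                                # v_iv over inlier points

# Four points move about 15 px; one background point barely moves (outlier).
disp = [15.1, 15.0, 14.9, 15.2, 0.4]
print(round(vehicle_speed(disp) * DT, 2))  # 15.05 px/frame after rejection
```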


When the vehicle enters the FOV of the camera for the first time, the instantaneous speeds of the vehicle at the time instances between successive image frames should be computed continuously until it leaves the FOV. From these computed instantaneous speeds, the average speed of the vehicle during the time interval between the entrance and exit times can be found. Let I(t_1), …, I(t_m) be the frame images of the vehicle at time instances t_j, where m is the number of frames on which the vehicle is apparent. Then, using the instantaneous speeds v_iv(t_j) of the vehicle computed between the frames I(t_j) and I(t_j + 1), where (j = 1, …, m − 1), the average speed of the vehicle can be computed as:
v_avg = (1/(m − 1)) Σ_{j=1}^{m−1} v_iv(t_j), where v_avg is the average velocity of the vehicle, v_iv(t_j) is the instantaneous speed of the vehicle at time instance t_j, and m is the number of image frames on which the vehicle is apparent. Three difficult problems must be solved to find the speed of a vehicle by the physical approach explained above: (1) solution of the scale problem to find the absolute speed of the vehicle, (2) selection of the points to be tracked from the image of the vehicle, and (3) tracking of the selected points and computing the velocity vectors. The first problem and its solution methods are discussed in the following section, and the remaining problems in Section 5 and beyond.

In order to find the absolute values of displacement or velocity vectors in object space, the vectors computed in the video image coordinate system should be transformed to the object coordinate system. For this purpose, we adopted some restrictions, as explained in Section 2. For example, we assumed that the observed scene is flat. On the other hand, we acquire side view images of the vehicle as seen in

In order to solve this scale problem, we simply measure two distances in the object space with a measurement tape. These measured distances lie on two vertical planes along the borders of the road. The vehicle travels on the road surface either from left to right or from right to left. These moving directions are shown as vehicle 1 and vehicle 2, respectively, in the

To answer this question, first assume that the vehicle is moving from left to right, as vehicle 1, so that its visible side lies on the vertical plane Π_1 with the scale λ_1. The scales of the vertical planes Π_1 and Π_2 are obtained from the measured distances d_1, d_2 and their corresponding distances d′_1 and d′_2 on the image plane, such that λ_1 = d′_1/d_1 and λ_2 = d′_2/d_2, respectively. In a similar way, assume that the vehicle is moving from right to left. Then its visible side is the left side and it is closer to the centre axis of the road. In this case, the scale can be taken as λ = (λ_1 + λ_2)/2. Under this configuration and these assumptions, if the ideal situation is achieved, the absolute values of the velocity or displacement vectors can be obtained using the corresponding scale factors.
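The scale computation above can be sketched as follows; the pixel and metre values are illustrative, not the paper's measurements:

```python
# Sketch of the scale computation: lambda = d'/d (image pixels per metre
# along the road); a pixel displacement divided by lambda gives metres.

def scale(d_image_px: float, d_object_m: float) -> float:
    return d_image_px / d_object_m           # lambda = d' / d

def speed_kmh(disp_px: float, lam: float, dt_s: float) -> float:
    return (disp_px / lam) / dt_s * 3.6      # m/s -> km/h

lam1 = scale(600.0, 4.0)     # a 4 m mark on the near plane spans 600 px
lam2 = scale(500.0, 4.0)     # the same length on the far plane spans 500 px
lam = (lam1 + lam2) / 2.0    # mid-road scale, as for the averaged case above
print(round(speed_kmh(15.0, lam, 1 / 30), 1))  # km/h for 15 px/frame motion
```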

If the image plane is in the ideal case, any parallel lines in the vertical planes must remain parallel in the image plane. Similarly, parallel lines on the horizontal plane must also remain parallel in the image plane. If the image plane is far from the ideal situation, these lines will not remain parallel in the image plane; parallel lines in object space then intersect each other in the image plane. The intersection points of such parallel lines are known as vanishing points. By using the vanishing points and their corresponding vanishing planes in the horizontal and vertical directions, the images can be rectified using vanishing point geometry [

In our acquisition plan, the camera is stationary. Therefore, once the rectification parameters have been found, they can be reused until the camera changes its position. Thus, the rectification parameters are computed once at the beginning of the speed estimation application and used for as long as the camera stays stable.

After the rectification parameters have been found, it is not necessary to rectify the whole image for the speed estimation problem. Instead, only the coordinates of the selected and tracked points may be rectified, to reduce the real time computational cost. Nevertheless, the fully rectified image is given on the right of
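Rectifying only the point coordinates can be sketched as below, assuming the rectification is expressed as a 3 × 3 homography H found once from the vanishing-point geometry; the matrix here is a made-up example, not the paper's parameters:

```python
# Apply a 3x3 homography (nested lists) to a single image point, instead of
# warping the whole frame.

def rectify_point(H, x, y):
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w

# Identity homography leaves points unchanged -- a quick sanity check.
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(rectify_point(I, 320.5, 240.25))  # (320.5, 240.25)
```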

In order to track moving objects in video images, the points to be tracked, which belong to the object, should be selected automatically on the successive video frames. It is well known that good features to track are corner points, which have large spatial gradients in two orthogonal directions. Since corner points cannot lie on an edge (except endpoints), the aperture problem does not occur. One of the most frequently used definitions of a corner point is given in [ ], based on the second order derivatives ∂²I/∂x², ∂²I/∂y² and ∂²I/∂x∂y. By computing the second order derivatives at the pixels of an image, a new image can be formed, called the "Hessian image". The name "Hessian" arises from the Hessian matrix that is computed around a point [

Shi and Tomasi in [
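The Shi–Tomasi criterion, stated here in its standard form as background (the citation text above is truncated), accepts a point as a good feature when the smaller eigenvalue of the local 2 × 2 gradient matrix exceeds a threshold:

```python
# Shi-Tomasi minimum-eigenvalue corner test for a 2x2 gradient matrix
# M = [[Sxx, Sxy], [Sxy, Syy]] built from summed products of image gradients.
import math

def min_eigenvalue(sxx, sxy, syy):
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    disc = math.sqrt(max(0.0, tr * tr / 4.0 - det))
    return tr / 2.0 - disc

def is_corner(sxx, sxy, syy, threshold):
    return min_eigenvalue(sxx, sxy, syy) > threshold

print(is_corner(100.0, 0.0, 90.0, 50.0))  # True: strong gradients both ways
print(is_corner(100.0, 0.0, 1.0, 50.0))   # False: edge-like, one weak direction
```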

When precise geometric information is to be extracted from the images, the corner points should be found with subpixel accuracy. For this purpose, all candidate pixels around the corner point can be used. Using the smallest eigenvalues at those points, a parabola can be fitted to represent the spatial location of the corner; the coordinates of the maximum of the parabola are taken as the best corner location. Thus the computed coordinates are obtained with subpixel precision. For subpixel selection methods, readers are referred to [
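The parabola-fit idea can be sketched for one axis as follows; given the corner response at a pixel and its two neighbours, the parabola's vertex gives the subpixel offset (the response values are illustrative):

```python
# Vertex of the parabola through three equally spaced response samples.

def subpixel_offset(r_left, r_center, r_right):
    """Vertex offset (ideally in [-0.5, 0.5]) relative to the centre pixel."""
    denom = r_left - 2.0 * r_center + r_right
    if denom == 0.0:
        return 0.0
    return 0.5 * (r_left - r_right) / denom

# Response peaks slightly to the right of the centre pixel.
print(round(subpixel_offset(8.0, 10.0, 9.0), 2))  # 0.17
```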

In our system, as soon as the camera begins image acquisition, points are selected continuously in real time from the frame images. Points are selected on the first frame; on the following frames those points are tracked and their instantaneous velocity vectors are computed. We restrict the number of points to between 20 and 500, with at least 20 points selected in the worst case.

For speed estimation, the correspondence of each point selected on the first frame on which the vehicle appears must be found on the next (successive) frame. In the ideal case, the correspondence of a selected point is the same physical point on the next frame. There is no prior information for finding the corresponding point other than the point itself, and it is not possible to know exactly where the match is. Since only one camera is used, a search area on the next frame cannot be restricted by geometric constraints such as epipolar geometry, so stereo matching approaches cannot be used. Instead, if we assume that the image content in each frame flows over a very short time period, changing position during the flow, then a modelling approach which models this flow can be used. These kinds of flow models are called "optical flow". We first briefly explain the optical flow approach, and then the methods that we used in this paper.

Let p(x, y) be a corner point in 2D image space, where (x, y) are the image coordinates of the point p, and let I(t) be a video frame image taken at time instance t. Then a point on that frame, p ∈ I(t), may be expressed by its position vector. All points p_i(x_i, y_i, t) ∈ I(t) can be expressed by position vectors whose starting points are at the origin of the image coordinate system and whose end points coincide with the points p_i, as in

As can easily be seen in the figure, each tracked point is assigned a velocity vector v = (v_x, v_y). From the above explanation we see that, after two successive frames have been processed, each tracked point carries a velocity vector or, equivalently, a displacement vector. If this assignment is performed not only for the selected points but for all pixels of the image, it is called "dense optical flow"; in this case, the velocity vector field of the whole image is obtained. The Horn and Schunck method given in [

When only one video camera is used, there is no information for finding the correspondences of the selected points on the next frame other than the points themselves. For this reason, it is not possible to know exactly where the corresponding points are on the next frame. However, by investigating the nature of the problem, some assumptions may be made about the possible locations of the corresponding points. To ensure these assumptions are as close to physical reality as possible, there must be a theoretical basis supporting them, and this basis must be acceptable under certain defined situations.

In this sense, it is first necessary to decide what information is to be used for finding correspondences. The first idea that comes to mind is to look at the texture, colour or intensity of the neighbouring area of a selected point in the first frame, and to expect that its correspondence on the next frame also has the same, or nearly the same, structural properties in terms of texture, intensity or colour,

The three assumptions above help develop an effective target tracking algorithm. In order to track the points and compute their speeds using these assumptions, it is necessary to express them with mathematical formalisms and then to derive the velocity equations from these formalisms. For this purpose, if the symbolic expressions given in (

If the brightness constancy assumption I(x(t), y(t), t) = const is differentiated with respect to time using the chain rule, we obtain

(∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) + ∂I/∂t = 0.

Here dx/dt = v_x is the x component of the velocity vector v and dy/dt = v_y is the y component of the velocity vector v. If the partial derivatives are abbreviated as I_x = ∂I/∂x, I_y = ∂I/∂y and I_t = ∂I/∂t, then the optical flow constraint equation follows:

I_x v_x + I_y v_y = −I_t.

The values of I_x, I_y and I_t in the constraint equation can be computed from the images, while v_x and v_y are the two unknown components of the velocity vector v = (v_x, v_y). Since one equation in two unknowns cannot be solved uniquely, the same equation as (13) is written for a 3 × 3 or 5 × 5 neighbourhood of the point, each pixel contributing one equation in the unknowns v_x and v_y. The unknowns can then be solved from this overdetermined set of equations in the least squares sense.
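The over-determined solve can be sketched with the 2 × 2 normal equations; the gradient values below are synthetic, generated from a known flow for illustration:

```python
# Least-squares LK solve: one constraint Ix*vx + Iy*vy = -It per
# neighbourhood pixel, reduced to 2x2 normal equations and solved by Cramer.

def lk_solve(ix, iy, it):
    a = sum(x * x for x in ix)
    b = sum(x * y for x, y in zip(ix, iy))
    c = sum(y * y for y in iy)
    d = -sum(x * t for x, t in zip(ix, it))
    e = -sum(y * t for y, t in zip(iy, it))
    det = a * c - b * b
    return ((c * d - b * e) / det, (a * e - b * d) / det)

# Synthetic neighbourhood gradients consistent with a known flow v = (2, 1):
ix = [1.0, 0.5, -0.25, 0.75, 0.1]
iy = [0.2, -1.0, 0.5, 0.3, 0.9]
it = [-(x * 2.0 + y * 1.0) for x, y in zip(ix, iy)]  # It = -(Ix*vx + Iy*vy)
vx, vy = lk_solve(ix, iy, it)
print(round(vx, 6), round(vy, 6))  # 2.0 1.0
```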

During real time tracking, some selected points may not be visible on the next frame. This situation may arise for different reasons; it is especially likely when the vehicle is entering or exiting the FOV of the camera. To handle such situations, we implemented the algorithm with the image pyramid approach, which uses coarse to fine image scale levels. For details of image pyramid approaches, we refer the reader to [

The accuracy of the estimated speed of our system is ±1.12 km/h. We tested the system by comparing the estimated speeds with GPS measured speeds. Another way to test the accuracy is to compare the estimated speeds with the measurements of a speed gun, as described in [

We can assume that the GPS speed measurements are accurate and error free because their accuracy is very high, as given by [
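As a consistency check, the quoted ±1.12 km/h accuracy matches the root mean square of the nine speed differences reported in the accuracy test table:

```python
# RMS of the nine GPS-minus-estimated speed differences (km/h) from the
# accuracy test measurements table.
import math

diffs = [0.34, 1.77, 1.09, 0.69, -0.22, -0.50, -1.05, -1.62, 1.55]
rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
print(round(rms, 2))  # 1.12
```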

In this paper, we have explained the real time speed estimation problem and its solution for one vehicle using side view video images. The accuracy of the estimated speed is approximately ±1.12 km/h. The sparse optical flow technique is very effective for vehicle speed estimation. When more than one vehicle is considered, the speed of each vehicle can also be found at the same time with the proposed methods; however, in this case the estimated velocity vectors of the tracked points must be classified to determine which vector belongs to which vehicle in the scene. For this purpose, a classification scheme such as a clustering method may be used. For the speed estimation of multiple vehicles in the same scene, methods other than our proposed approach may also be used. For example, before selecting the points to be tracked, each vehicle can be segmented using a parametric or geometric active contour, and the deformation of those contours can then be tracked to find the speed of each vehicle separately. However, this approach is not very suitable for real time speed estimation due to its computational cost. Our proposed method may also be used in automatic driver assistance systems as part of a real time sensor network. We continue to work on estimating the speed of vehicles with cameras mounted on a moving vehicle, using side view, front view and rear view images of the moving vehicles within a real time local network architecture.

Velocity vectors before filtering of outliers.

Graphical representation of vectors.

Error free vectors and speed.

Sideview image acquisition plan.

Vanishing lines found with Hough transformation (left) and rectified image (right).

Optical flow.

Overall operations of proposed speed estimation process.

Step I Operations (performed offline):

1.1. Capture frame I and frame II
1.2. Compute the rectification parameters with vanishing point geometry
1.3. Store the rectification parameters
1.4. Enter the distance measurements for scale computation
1.5. Define a ROI region where the road and vehicle are visible

Step II Operations (real time operations):

2.1. Capture frame i
2.2. Capture frame i + 1
2.3. Find difference ROI image
2.4. Eliminate background changes with histogram thresholding
2.5. Select tracking points from the foreground (vehicle) image
2.6. Find corresponding points
2.7. Rectify the coordinates of the selected and the tracked points
2.8. Compute velocity vectors
2.9. Compute mean and standard deviations of the vectors
2.10. Eliminate outlier vectors
2.11. Compute the average instantaneous speed of the vehicle
2.12. Go to 2.2

Computation times of real time operations.

Operation | Time (ms) | Note
---|---|---
2.3. Find difference ROI image | < 1.0 | completed in microseconds
2.4. Eliminate background changes with histogram thresholding | < 1.0 |
2.5. Select tracking points from the foreground (vehicle) image | 10–12 |
2.6. Find corresponding points | 14–16 |
2.7. Rectify the coordinates of the selected and the tracked points | < 1.0 | completed in microseconds
2.8. Compute the velocity vectors | < 1.0 |
2.9. Compute mean and standard deviations of the vectors | < 1.0 |
2.10. Eliminate outlier vectors | < 1.0 |
2.11. Compute the average instantaneous speed of the vehicle | < 1.0 |
Total execution time | 29–31 |

Laptop configuration: Intel core 2 Duo CPU, 2.40 GHz, 2 GB RAM

Camera to object distance and maximum speed that can be measured.

Camera-to-object distance (m) | Maximum measurable speed (km/h) | Note
---|---|---
10 | 75 |
22.95 | 171 | used in this paper
26.20 | 196 |
30 | 224 |
40 | 300 |
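Incidentally, the tabulated values follow a near-linear relation of roughly 7.5 km/h of maximum measurable speed per metre of camera-to-object distance; this is an observation about the table, not a formula from the paper:

```python
# Ratio of maximum measurable speed to camera distance for each table row.
rows = [(10.0, 75.0), (22.95, 171.0), (26.20, 196.0), (30.0, 224.0), (40.0, 300.0)]
ratios = [v / d for d, v in rows]
print([round(r, 2) for r in ratios])  # each close to 7.5
```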

Magnitudes of all vectors in pixels.

15.17244 | 15.09534 | 15.10201 | 14.67062
14.67051 | 14.53567 | 14.97215 | 0.75555
14.44615 | 14.48191 | 14.67011 | 14.79012
15.09515 | 14.97209 | 0.37625 | 14.67086
14.48138 | 15.17195 | 15.12538 | 15.17658
15.10202 | 15.09523 | 14.63253 | 0.36652
0.367685 | 15.09504 | 1.14171 | 14.73801
0.478954 | 14.64967 | 14.44434 | 14.34300
14.81166 | 0.37704 | 0.47859 | 14.84108
14.42731 | 14.73827 | 1.63186 | 14.52454
14.63479 | 15.17197 | 0.47909 | 15.11971
15.09527 | 14.52401 | 14.97558 | 0.47341
15.11739 | 14.52534 | 14.67038 | 14.29117

Accuracy test measurements.

No. | Direction | Estimated speed (km/h) | GPS speed (km/h) | Difference (km/h)
---|---|---|---|---
1 | LR | 38.26 | 38.6 | 0.34
2 | RL | 36.73 | 38.5 | 1.77
3 | LR | 37.41 | 38.5 | 1.09
4 | LR | 47.61 | 48.3 | 0.69
5 | RL | 57.92 | 57.7 | −0.22
6 | LR | 57.50 | 57.0 | −0.50
7 | RL | 64.25 | 63.2 | −1.05
8 | LR | 68.92 | 67.3 | −1.62
9 | RL | 75.35 | 76.9 | 1.55