
This paper presents an analytical study of the depth estimation error of a stereo vision-based pedestrian detection sensor for automotive applications such as pedestrian collision avoidance and/or mitigation. The sensor comprises two synchronized and calibrated low-cost cameras. Pedestrians are detected by combining a 3D clustering method with Support Vector Machine-based (SVM) classification. The influence of the sensor parameters on the stereo quantization errors is analyzed in detail, providing a point of reference for choosing the sensor setup according to the application requirements. The sensor is then validated in real experiments. Collision avoidance maneuvers by steering are carried out by manual driving. A real-time kinematic differential global positioning system (RTK-DGPS) is used to provide ground truth data corresponding to both the pedestrian and the host vehicle locations. The field tests provided encouraging results and proved the validity of the proposed sensor for use in the automotive sector in applications such as autonomous pedestrian collision avoidance.

Pedestrian protection is a key problem in the context of the automotive industry and its applications. Sensor systems onboard the vehicles are required for predicting the host-to-pedestrian (H2P) distance as well as the time-to-collision (TTC). Cameras are the most commonly used sensors for that purpose. The use of video sensors comes quite naturally for the problem of pedestrian detection, since they provide texture information which enables the use of quite discriminative pattern recognition techniques. The human visual perception system is perhaps the best example of what performance might be possible with such sensors, if only the appropriate algorithm is used.

Pedestrian detection is a difficult task from a computer vision perspective. Large variations in pedestrian appearance (e.g., clothing, pose, size and partial occlusions), together with cluttered backgrounds and changing illumination, make robust detection a challenging problem.

Accurate depth information is essential in the area of pedestrian protection applications (e.g., driving assessment, collision avoidance and collision mitigation), since both the H2P distance and the TTC are derived from it.

In this paper we present an analytical study of the depth estimation error of a stereo vision-based pedestrian detection sensor for automotive applications such as pedestrian collision avoidance and/or mitigation. The sensor comprises two synchronized and calibrated low-cost cameras. Pedestrians are detected by combining a 3D clustering method with Support Vector Machine-based (SVM) classification. The influence of the sensor parameters on the stereo quantization errors is analyzed in detail, providing a point of reference for choosing the sensor setup according to the application requirements. The sensor is then validated in real experiments. Collision avoidance maneuvers by steering are carried out by manual driving. A real-time kinematic differential global positioning system (RTK-DGPS) is used to provide ground truth data corresponding to both the pedestrian and the host vehicle locations. The field tests provided encouraging results and proved the validity of the proposed sensor concerning the accuracy required in one of the most challenging and difficult applications in the context of the automotive industry.

The remainder of this paper is organized as follows: Section 2 provides an overall description of the proposed sensor, covering details of implementation, focusing on the analysis of the depth estimation error and the sensor setup and describing the proposed maneuver for pedestrian collision avoidance. Experimental results that validate the proposed approach are presented and discussed in Section 3. Finally, Section 4 summarizes the conclusions.

The experimental vehicle used in this work is a car-like robot (a modified Citroën C4), which can be seen in Figure 1.

The stereo vision sensor uses 320 × 240 pixel greyscale images with a baseline of approximately 300 mm and a focal length of 4.2 mm. These parameters satisfy the application requirements, as we will see in subsequent sections. The cameras are calibrated in a semi-supervised fashion by using a modified version of the Camera Calibration Toolbox for Matlab and a chessboard pattern. Thus we obtain the intrinsic parameters of each camera (focal lengths f_x, f_y, principal point (u_0, v_0) and distortion parameters k_0, k_1, k_2, k_3) as well as the extrinsic transformation between them (rotation angles ω_x, ω_y, ω_z and translations t_x, t_y, t_z). The distortion parameters (k_0, k_1, k_2, k_3) are used to compensate both radial and tangential lens distortions.
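The distortion compensation step can be sketched as follows. This is a minimal pure-Python sketch of a Brown-Conrady-style model, assuming k_0 and k_1 act as radial coefficients and k_2 and k_3 as tangential ones (the toolbox's exact parameterization may differ):

```python
def distort(x, y, k):
    """Apply a Brown-Conrady-style distortion model to normalized
    image coordinates (x, y). k = (k0, k1, k2, k3); k0 and k1 are
    treated here as radial coefficients, k2 and k3 as tangential."""
    k0, k1, k2, k3 = k
    r2 = x * x + y * y
    radial = 1.0 + k0 * r2 + k1 * r2 * r2
    # tangential (decentering) terms
    xd = x * radial + 2.0 * k2 * x * y + k3 * (r2 + 2.0 * x * x)
    yd = y * radial + k2 * (r2 + 2.0 * y * y) + 2.0 * k3 * x * y
    return xd, yd

def undistort(xd, yd, k, iters=10):
    """Invert the distortion model by fixed-point iteration, as is
    commonly done when rectifying the raw images."""
    x, y = xd, yd
    for _ in range(iters):
        ex, ey = distort(x, y, k)
        # correct by the residual; the map is a contraction because
        # the distortion terms are small compared to the identity
        x, y = x + (xd - ex), y + (yd - ey)
    return x, y
```

The iteration converges in a handful of steps for realistic coefficient magnitudes, since the distortion is a small perturbation of the identity.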

Pedestrian detection is carried out using the system described in [

3D maps are filtered assuming the road surface is planar (which is acceptable in most cases), so that points belonging to the road are discarded before looking for obstacles.

Based on the idea that obstacles (including pedestrians) have a higher density of 3D points than the road surface, ROI selection can be carried out by determining those positions in the 3D space where there is a high concentration of 3D points. A 3D subtractive clustering method is proposed to cope with the ROI selection stage using sparse data. The idea is to find high-density regions, each roughly modelled by a single 3D Gaussian distribution, in the Euclidean space. The parameters of each Gaussian distribution are defined according to the minimum and maximum extent of a pedestrian. Thus, whereas pedestrians are correctly selected, bigger obstacles such as vehicles or groups of pedestrians are usually split into two or more parts. To cope with the stereo accuracy, the method is adapted to the expected depth error [
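The clustering step can be illustrated with a minimal sketch of subtractive clustering over a 3D point cloud. The neighbourhood radius `ra` and the stopping ratio are illustrative values chosen on the order of a pedestrian's extent, not the parameters of the actual system:

```python
import numpy as np

def subtractive_clustering(points, ra=0.6, stop_ratio=0.15, max_clusters=10):
    """Minimal sketch of subtractive clustering on an (N, 3) cloud.
    ra: neighbourhood radius in metres (illustrative value)."""
    alpha = 4.0 / ra ** 2
    rb = 1.5 * ra                      # squash radius, slightly larger
    beta = 4.0 / rb ** 2
    # pairwise squared distances and initial density potential
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    potential = np.exp(-alpha * d2).sum(axis=1)
    first, centers = None, []
    for _ in range(max_clusters):
        i = int(np.argmax(potential))
        if first is None:
            first = potential[i]
        elif potential[i] < stop_ratio * first:
            break                      # remaining density too low
        centers.append(points[i].copy())
        # subtract the selected cluster's influence from the potential
        potential -= potential[i] * np.exp(-beta * d2[i])
    return np.array(centers)
```

Each returned center seeds one ROI; in the system described above, the radii would be tied to the expected pedestrian extent and the depth-dependent stereo error rather than fixed constants.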

The 2D candidates are then obtained by projecting the 3D points of each resulting cluster and computing their bounding boxes. A Support Vector Machine-based (SVM) classifier is then applied using an optimal combination of feature-extraction methods and a by-components approach [

Nonetheless, the 2D bounding box corresponding to a 3D candidate might not perfectly match the actual pedestrian appearance in the image plane. To cope with this, multiple candidates are generated around each original candidate. The so-called multi-candidate (MC) approach proves to increase the detection rate, the accuracy of depth measurements, as well as the detection range [
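A sketch of the multi-candidate generation, with hypothetical shift steps and scale factors (the actual number and spread of candidates used by the system are not specified here):

```python
def multi_candidates(box, n_shifts=2, scales=(0.9, 1.0, 1.1)):
    """Generate jittered candidate windows around an original bounding
    box (x, y, w, h); shifts are in steps of 5 % of the window size.
    Step size and scale set are illustrative choices."""
    x, y, w, h = box
    out = []
    for s in scales:
        sw, sh = w * s, h * s
        for dx in range(-n_shifts, n_shifts + 1):
            for dy in range(-n_shifts, n_shifts + 1):
                out.append((x + dx * 0.05 * w, y + dy * 0.05 * h, sw, sh))
    return out
```

Each candidate window would then be passed to the classifier, e.g., keeping the best-scoring one, which is what allows the bounding box to lock onto the true pedestrian extent.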

The last block of the architecture corresponds to tracking: the detections are filtered over time (a Kalman filter smooths the H2P distance measurements) and the TTC is estimated.

The pedestrian detection system runs in real time (25 Hz) with 320 × 240 images and a baseline of t_x = 300 mm.

Pedestrian collision avoidance is one of the most difficult and challenging automatic driving operations for autonomous vehicles and can be carried out by braking or by steering. Before designing autonomous collision avoidance maneuvers, a proper analysis of the sensor errors has to be performed in order to validate the proposed approach. In our case, the stereo vision-based pedestrian detection sensor is evaluated in real scenarios with real drivers and real pedestrians. Since emergency brake maneuvers are risky, a set of experiments has been devised in which drivers have been requested to perform pedestrian collision avoidance maneuvers by steering at speeds of 10, 15, 20, 25 and 30 km/h. Higher speeds have not been considered due to the associated risks.

The avoidance maneuver has to fulfil some conditions. First, the vehicle has to be moving along a straight road in the right lane. Second, the pedestrian has to be located in the same lane. Third, the left lane has to be free and long enough for the pedestrian collision avoidance maneuver to be completed at the current speed. As soon as the driver detects a potential pedestrian collision that can be avoided, a lane change to the adjacent left lane is performed. Once the pedestrian has been passed, a second lane change is carried out to go back to the right lane (see

There is a significant amount of published research on characterization of range estimation errors based on system parameters for stereo vision [

Given a calibrated rig of cameras and a correspondence between two points, one on the left image, (u_l, v_l), and one on the right image, (u_r, v_r), the 3D position of the observed point can be recovered by triangulation.

The intrinsic parameters of each camera are collected in the matrices A^L and A^R (left and right camera, respectively).

In order to compute how the quantization errors in (u_l, v_l) and (u_r, v_r) propagate to the estimated 3D position, the triangulation equations are differentiated with respect to the image coordinates.

Applying the product rule for matrices:

Solving the partial derivatives for each coordinate:

Finally, substituting the intrinsic matrix values from the calibration:

Assuming T is a normally distributed random variable with mean 0 and variance:

As each pedestrian is roughly modelled by a high concentration of 3D points, the final host-to-pedestrian distance estimation error is defined as the mean value of the Δz errors of all the 3D points belonging to the pedestrian cluster.
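As a sketch, assuming the ideal parallel-axes geometry (z = f·t_x/d) and a worst-case disparity quantization error of half a pixel, the per-cluster H2P error is the mean of the per-point depth errors; the focal length below is expressed in pixels:

```python
import numpy as np

def depth_quantization_error(z, f_px, t_x, delta_d=0.5):
    """Worst-case depth error (m) at depth z caused by a disparity
    quantization error of delta_d pixels, assuming the ideal
    parallel-axes geometry where z = f_px * t_x / d."""
    d = f_px * t_x / z                     # disparity in pixels
    return f_px * t_x / (d - delta_d) - z

def h2p_error(cluster_depths, f_px, t_x):
    """H2P distance error: mean quantization error over the depths of
    the 3D points belonging to a pedestrian cluster."""
    return float(np.mean([depth_quantization_error(z, f_px, t_x)
                          for z in cluster_depths]))
```

For example, with an illustrative focal length of 400 px and the 300 mm baseline, a point at 10 m carries a worst-case error of roughly 0.43 m, and the error grows roughly quadratically with depth.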

The design of a stereo imaging system requires knowing how the various system parameters affect the depth estimation error, especially for automotive applications given their safety component. Designing a stereo system involves choosing three main parameters: the focal length, the baseline and the image size.

In order to facilitate the choice of the system parameters we propose the use of pre-computed graphs covering different settings. Whereas the H2P distance estimation error is computed assuming the general stereo case (non-parallel optical axes), the graphs are computed using the ideal case. In the geometry of a parallel stereo pair (ideal case), both cameras share the same intrinsic parameters (focal lengths f_x, f_y and principal point (u_0, v_0)) and are separated by a baseline T = (t_x, 0, 0)^T. The disparity of a 3D point is d = u_r − u_l and its depth is given by z = f_x t_x / d. Given a maximum quantization error of Δ_uMAX pixels in the disparity, the estimated depth lies between z_MIN = f_x t_x / (d + Δ_uMAX) and z_MAX = f_x t_x / (d − Δ_uMAX), so the maximum depth estimation error is Δz_MAX = f_x t_x / (d − Δ_uMAX) − f_x t_x / d.
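Graphs of this kind can be generated with a few lines of code. The sketch below evaluates the maximum depth error for several baselines at a fixed focal length; the pixel pitch is an assumed value, not one taken from the sensor's datasheet:

```python
def max_depth_error(z, f_mm, t_x, pixel_mm=0.006, delta_u=1.0):
    """Maximum depth estimation error (m) at depth z for a parallel
    stereo pair (z = f * t_x / d) with a disparity quantization error
    of delta_u pixels. pixel_mm is an assumed pixel pitch."""
    f_px = f_mm / pixel_mm          # focal length in pixels
    d = f_px * t_x / z              # disparity in pixels
    return f_px * t_x / (d - delta_u) - z

# relative error at 20 m for several baselines, with f = 4.2 mm
for t_x in (0.15, 0.30, 0.60):
    err = max_depth_error(20.0, 4.2, t_x)
    print(f"t_x = {t_x:.2f} m -> {100.0 * err / 20.0:.1f} % at 20 m")
```

Sweeping the focal length or the image size (which, for a fixed sensor width, changes the effective pixel pitch) in the same way reproduces the remaining graphs; the trend that a larger baseline lowers the error is visible directly.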

As can be seen, the higher the baseline, the lower the error. Let us consider our system (with 320 × 240 images and a baseline of t_x = 300 mm): its expected depth estimation error at each range can be read directly from the graphs.

Finally, the focal length f_x also has to be considered, since it determines both the blind frontal range and the size of the disparity search space.

The proposed stereo vision-based pedestrian detection system uses 320 × 240 images, a baseline of t_x = 300 mm and a focal length of 4.2 mm.

The proposed stereo vision-based pedestrian detection sensor is evaluated in a set of experiments carried out in one of the most challenging tasks in the context of automotive applications: collision avoidance maneuvers. The experimental setup is described in

In order to support the use of the RTK-DGPS sensor as ground truth, we devised a simple experiment in which the sensor is placed at the dummy position and its global position is recorded for 90 s.

In order to compare these trajectories with the ones provided by the stereo sensor, the relative car-to-pedestrian positions with respect to the left camera have to be computed. This transformation is carried out by applying two translations: one from the UTM global reference to the RTK-DGPS antenna onboard the vehicle, and another from the DGPS antenna to the left camera. The orientation of both axes is computed using the longitudinal movement of the vehicle. The resulting ground truth can then be compared with the stereo H2P measurements and their quantization errors Δz.
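This transformation can be sketched in 2D (easting/northing) as follows; the antenna-to-camera offset is a hypothetical value used only for illustration:

```python
import math

def relative_position(veh_prev, veh, ped, gps_to_cam=(0.5, 1.2)):
    """Express a pedestrian's UTM position in the left-camera frame.
    veh_prev, veh: consecutive vehicle antenna positions (easting,
    northing); the heading is taken from the longitudinal motion.
    gps_to_cam is a hypothetical antenna-to-camera offset
    (longitudinal, lateral) in metres."""
    heading = math.atan2(veh[1] - veh_prev[1], veh[0] - veh_prev[0])
    dx, dy = ped[0] - veh[0], ped[1] - veh[1]
    # rotate the world-frame offsets into the vehicle frame
    lon = math.cos(heading) * dx + math.sin(heading) * dy
    lat = -math.sin(heading) * dx + math.cos(heading) * dy
    # shift from the antenna to the left camera
    return lon - gps_to_cam[0], lat - gps_to_cam[1]
```

For a vehicle driving due east with a pedestrian 20 m ahead of the antenna, the pedestrian ends up roughly 19.5 m ahead of the camera with this assumed offset.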

Some remarkable conclusions can be drawn from these figures. The maximum range (25–30 m) and the inverse relationship between depth and stereo accuracy can be easily appreciated (as demonstrated in Section 2.3). The ground truth measurements are almost always (99%) inside the limits of the stereo measurements plus their corresponding quantization errors, which proves that the stereo sensor provides sufficiently accurate information despite its inherent accuracy constraints. The reason some H2P ground truth measurements fall outside the error interval is that the stereo quantization errors are computed not from the filtered values (note that the Kalman filter blocks high-frequency changes) but from the raw measurements [e.g., see frame number 30 in

In

This paper presents an analytical study of the depth estimation error of a stereo vision-based pedestrian detection sensor for automotive applications such as pedestrian collision avoidance and/or mitigation. The sensor comprises two synchronized and calibrated low-cost cameras, providing information about the relative pedestrian position with respect to the host vehicle (H2P distance) and the TTC. Pedestrians are detected in a six-stage process: 3D map computation, road surface filtering, subtractive clustering for ROI selection, 2D candidate generation, multi-candidate SVM classification, and tracking.

The accuracy of the measurements provided by the proposed sensor is obtained by computing the stereo quantization error. The sensor setup is defined according to the application requirements. The relationship between the relative range error and the sensor parameters (focal length, baseline and image size) is analyzed by means of graphs.

The proposed sensor is validated in a set of experiments in which real collision avoidance maneuvers were carried out by real drivers and with real pedestrians up to speeds of 30 km/h. The experimental results demonstrate that the sensor provides suitable measurements despite the accuracy constraints imposed by the quantization error. Although the sensor measurements (H2P distance and TTC) are not reliable at long distances, their quantization errors decrease as both the distance and the TTC decrease. In other words, the higher the risk, the better the sensor accuracy.

These statements hold up to speeds of 30 km/h. The risks associated with performing collision avoidance maneuvers at higher speeds are not acceptable with the current experimental setup. However, one main conclusion can be extrapolated from our results: higher speeds will entail higher errors in the estimated TTC, compromising the effectiveness of the proposed approach. In order to increase the accuracy of the measurements provided by the stereo sensor, higher-resolution images and a longer baseline can be used. However, that would increase the computational cost.

The performed field test provided encouraging results and proved the validity of the proposed sensor for being used in the automotive sector towards applications such as autonomous pedestrian collision avoidance.

This work has been supported by the Spanish Ministry of Science and Innovation by means of Research Grant TRASNITO TRA2008-06602-C03 and Spanish Ministry of Development by means of Research Grant GUIADE P9/08.

(Top left) Low-cost stereo vision sensor. (Top right) RTK-DGPS. (Bottom) Experimental vehicle (a modified Citroën C4).

Overview of the stereo vision-based pedestrian detection architecture.

Pedestrian collision avoidance maneuver.

Absolute and relative depth estimation errors for a stereo sensor with

Absolute and relative depth estimation errors for a stereo sensor with baseline t_x.

Absolute and relative depth estimation error for a stereo sensor with baseline t_x.

Blind frontal range as a function of the focal length, for different baselines. Note that the size of the images has no effect on the size of the blind frontal area.

Size of the disparity search space as a function of the focal length, for different baselines with images of 320 × 240 px. The disparity search space is computed from 2 m to 30 m.

Size of the disparity search space as a function of the focal length, for different image sizes, with a fixed baseline t_x.

Experimental setup. The RTK-DGPS is used as ground truth for both the pedestrian position and the vehicle position. The stereo sensor provides host-to-pedestrian (H2P) distance and TTC measurements.

(a) RTK-DGPS stationary position in a 90 s run. (b) RTK-DGPS distance to its mean along time.

Pedestrian collision avoidance maneuvers at different speeds.

Stereo Host-to-Pedestrian (H2P) distance measurements and their accuracy and RTK-DGPS H2P distance (ground truth) at (a) 10 km/h, (b) 20 km/h and (c) 30 km/h.

RTK-DGPS TTC, stereo TTC and absolute error in avoidance experiments performed at (a) 10 km/h, (b) 20 km/h and (c) 30 km/h.

RMSE of the TTC.

| Speed | | | |
|---|---|---|---|
| 10 km/h | 0.9625 | 0.2164 | 0.0341 |
| 15 km/h | 0.8775 | 0.3419 | 0.0360 |
| 20 km/h | 0.5430 | 0.2200 | 0.0919 |
| 25 km/h | 0.5997 | 0.2870 | 0.1281 |
| 30 km/h | 0.7110 | 0.3731 | 0.1436 |
| Mean | 0.7387 | 0.2877 | 0.0867 |