Vehicle Detection Based on Probability Hypothesis Density Filter

In the past decade, vehicle detection has improved significantly. With cameras, vehicles can be detected in Regions of Interest (ROI) in complex environments. However, vision techniques often suffer from false positives and a limited field of view. In this paper, a LiDAR-based vehicle detection approach is proposed using the Probability Hypothesis Density (PHD) filter. The proposed approach consists of two phases: a hypothesis generation phase to detect potential objects and a hypothesis verification phase to classify objects. The performance of the proposed approach is evaluated in complex scenarios and compared with the state-of-the-art.


Introduction
Traffic accidents are a major cause of death worldwide. A study by the World Health Organization (WHO) reports that an estimated 1.2 million people die in traffic accidents every year, and up to 50 million people are injured [1]. Autonomous driving has thus become important as a means of preventing accidents in traffic scenarios. However, autonomous driving in all scenarios remains quite challenging. Automotive manufacturers continue to refine the technology behind autonomous driving as a long-term goal, whereas the Advanced Driver Assistance System (ADAS) has been proposed as a short-term development to gradually improve road safety. Numerous ADAS functions have been developed to help drivers avoid accidents, improve driving efficiency, and reduce driver fatigue, and vehicle detection plays an important role in these functions.
Most approaches rely on vision techniques to first detect Regions Of Interest (ROI) and then classify vehicles [2]. Khammari et al. use AdaBoost classification to detect vehicles [3]. Miller et al. and Paragios et al. have also utilized filtering techniques to detect vehicles [4,5]. Meanwhile, vehicle profile symmetry and the corresponding shadows are used in Reference [6]. However, vision techniques are sensitive to variations in light intensity.
LiDAR is also widely used in vehicle detection. In contrast to vision sensors, LiDAR is robust against variations in light intensity and provides range information [7][8][9][10][11]. Teichman et al. use log-odds estimators to recognize objects and demonstrate the performance in a large-scale environment. Dominguez et al. demonstrate a data fusion platform for tracking vehicles [12]. Compared with vision sensors, however, LiDAR measurements often suffer from data association issues. This paper extends our previous work on detecting vehicles from LiDAR information, where objects are represented by position and shape parameters (explained in more detail later). In Reference [13], we used the Difference of Normal (DoN) operator and the Random Hypersurface Model (RHM) to cluster the point cloud data and estimate the shape parameters [14,15]. To avoid the data association issue, the Probability Hypothesis Density (PHD) filter, which is based on Random Finite Set (RFS) statistics, is proposed to detect vehicles [16]. Within the RFS framework, several approaches have been developed to avoid the data association issue, including the PHD filter, the Cardinalized PHD (CPHD) filter [17] and the Bernoulli filter [18]. The PHD filter propagates the probability hypothesis density function over the single-target state space, whereas the CPHD filter additionally propagates the distribution of the number of targets (the cardinality). The CPHD filter therefore requires a more complex implementation but estimates the cardinality more reliably. As the main goal of this paper is to estimate the states in a speed-critical environment, the PHD filter is chosen. Unlike the PHD or CPHD filter, the multi-Bernoulli filter propagates the posterior target density. Although it has the same complexity as the PHD filter, it performs better in highly nonlinear environments (it does not require the additional clustering step for state estimation) [18]. As the proposed RHM can be implemented linearly, the PHD filter is considered the cheapest solution. The estimated states are then classified by a Support Vector Machine (SVM) to eliminate non-vehicle objects.
The contributions are summarized as follows. First, the proposed solution achieves high performance in environments with unknown data association. Second, the shape parameter is proposed for the first time as a feature for classifying objects. This paper is structured as follows: Section 2 describes the random hypersurface model, as well as the probability hypothesis density filter for hypothesis generation. Section 3 introduces the support vector machine for hypothesis verification. Section 4 presents the experimental performance in urban environments. Finally, the paper is concluded in Section 5.

Hypothesis Generation
In the hypothesis generation phase, LiDAR measurements are filtered based on the Random Hypersurface Model (RHM). To extend the RHM to scenarios with unknown data association, the Gaussian Mixture Probability Hypothesis Density (GMPHD) filter is proposed. Note that objects are tracked and estimated in the 2D Cartesian coordinate system, so depth information is not required.
The result of the generation phase is then passed to the verification phase to eliminate non-vehicles.

Random Hypersurface Model (RHM)
As illustrated in Figure 1, in a 2D Cartesian coordinate system, a point on the surface is regarded as a scaled boundary point, with the scaling factor $s$ drawn from the range [0, 1]. The RHM is defined through the surface $S(b_k)$, which is determined by both the shape parameter $b_k$ and the center $c_k$. Each point that lies on the surface is thus represented as a scaled boundary point with $s$ drawn from [0, 1]. In Figure 2, $r(\phi)$ denotes the distance from the center to the boundary at angle $\phi$ in the polar coordinate system.
Assuming $r(\phi)$ depends on the shape parameter $b_k$ and the center $c_k$, the surface is represented in terms of $e(\phi) := [\cos\phi, \sin\phi]^T$ and $r(b_k, \phi)$, which denote the unit vector and the radial function in the form of a Fourier series, respectively. Owing to its periodic properties, the radial function is expanded as a Fourier series of degree $N_F$, and $b_k$ collects the corresponding Fourier coefficients. If $\phi$ is fixed, the expansion is linear in $b_k$. Note that a small number of Fourier coefficients encodes only rough information about the surface, whereas a larger number of coefficients gives more detail.
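The radial function can be illustrated with a short sketch. It assumes the form commonly used in the star-convex RHM literature, $r(b_k, \phi) = R(\phi)\, b_k$ with $R(\phi) = [1/2, \cos\phi, \sin\phi, \ldots, \cos(N_F\phi), \sin(N_F\phi)]$; the coefficient ordering and the function names `fourier_row`, `radial` and `boundary_point` are assumptions made for illustration and are not taken from the paper.

```python
import math

def fourier_row(phi, n_f):
    """Row R(phi) = [1/2, cos(phi), sin(phi), ..., cos(n_f*phi), sin(n_f*phi)]."""
    row = [0.5]
    for j in range(1, n_f + 1):
        row.append(math.cos(j * phi))
        row.append(math.sin(j * phi))
    return row

def radial(b, phi):
    """Radial function r(b, phi) = R(phi) . b for a coefficient list b of odd length."""
    n_f = (len(b) - 1) // 2
    return sum(r_i * b_i for r_i, b_i in zip(fourier_row(phi, n_f), b))

def boundary_point(b, center, phi, s=1.0):
    """Point c + s * r(b, phi) * e(phi): s = 1 is on the boundary, s < 1 is inside."""
    r = radial(b, phi)
    return (center[0] + s * r * math.cos(phi),
            center[1] + s * r * math.sin(phi))

# Assumed example: 13 coefficients (N_F = 6) describing a circle of radius 2.
circle = [4.0] + [0.0] * 12
```

With only the constant coefficient set, the shape is a circle; the higher-order coefficients add angular detail, which matches the remark that more coefficients give a finer description of the surface.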

Bayes Filter
The RHM represents the shape information by using the Fourier coefficients, and the Bayes filter is utilized to calculate the corresponding parameters.

Process model
Assuming the state $x_k$ denotes the Fourier descriptors $b_k$ and does not drift over time, the process model is described as $x_{k+1} = A_k x_k + w_k$, where $A_k$ and $w_k$ denote the identity matrix and the process noise, respectively.

Measurement model
As illustrated in Figure 1, a single LiDAR measurement $y_k$ originates from a surface boundary point $z_k$ scaled by the factor $s$, that is, $y_k = z_k + v_k$, where $v_k$ denotes the measurement noise. Using Equation (2), Equation (8) is rewritten so that it relates the state $b_k$ to the measurement $y_k$. Based on Equation (5) and after algebraic manipulation of Equation (10), the measurement model in Equation (12) is obtained, in which a pseudo-measurement 0 is used to model the relationship between the state, the scaling factor, the measurement and its noise. The state $x_k$ converges to the true Fourier descriptor $b_k$ as it is updated with a large number of measurements. Note that the proposed measurement model is implemented in the 2D Cartesian coordinate system and is only effective on the back side of the target. In addition, depth information from the LiDAR measurement is not required. Figure 3 shows the estimated result for a cross-shaped target, in which the state $x$ and measurement $y$ are represented by 13 Fourier coefficients and 2D Cartesian coordinates, respectively.
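The pseudo-measurement idea can be checked numerically with a small sketch. It assumes the form used in the star-convex RHM literature: starting from $y - c = s\, r\, e(\phi) + v$ with $r = R(\phi)\, b$, taking the squared norm of both sides gives $0 = s^2 r^2 + 2 s\, r\, e(\phi)^T v + \|v\|^2 - \|y - c\|^2$. Whether this matches the paper's Equation (12) term by term is an assumption, and the function names below are hypothetical.

```python
import math

def radial(b, phi):
    """r(b, phi) = b[0]/2 + sum_j (b[2j-1] cos(j phi) + b[2j] sin(j phi))."""
    n_f = (len(b) - 1) // 2
    r = 0.5 * b[0]
    for j in range(1, n_f + 1):
        r += b[2 * j - 1] * math.cos(j * phi) + b[2 * j] * math.sin(j * phi)
    return r

def simulate_measurement(b, center, s, v, phi):
    """Measurement y = c + s * r(b, phi) * e(phi) + v."""
    r = radial(b, phi)
    return (center[0] + s * r * math.cos(phi) + v[0],
            center[1] + s * r * math.sin(phi) + v[1])

def pseudo_measurement(b, center, y, s, v, phi):
    """Residual s^2 r^2 + 2 s r e^T v + |v|^2 - |y - c|^2 (zero if the model holds)."""
    r = radial(b, phi)
    e = (math.cos(phi), math.sin(phi))
    dy = (y[0] - center[0], y[1] - center[1])
    return (s * s * r * r
            + 2.0 * s * r * (e[0] * v[0] + e[1] * v[1])
            + v[0] ** 2 + v[1] ** 2
            - (dy[0] ** 2 + dy[1] ** 2))

# Assumed example values: 13 coefficients, arbitrary center, scale and noise.
b = [4.0, 0.3, -0.2] + [0.0] * 10
y = simulate_measurement(b, (1.0, -2.0), 0.6, (0.05, -0.03), 0.7)
residual = pseudo_measurement(b, (1.0, -2.0), y, 0.6, (0.05, -0.03), 0.7)
```

In a filter, $s$ and $v$ are unknown random quantities and $\phi$ is typically approximated by the direction of $y - c$; the sketch passes them explicitly only to show that the residual vanishes for a consistent state.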

Probability Hypothesis Density (PHD) Filter
As illustrated in Figure 3, the RHM estimates the shape information by using the Bayes filter.
The key challenge for practical implementation is data association. A traditional estimator operates in scenarios where the measurement-to-target assignment is known. However, LiDAR provides a large number of measurements without any association information. In previous work, the Difference of Normal (DoN) operator was used as a preprocessing step to cluster measurements. The missed-detection rate was quite high, since the clustering process may also eliminate objects. Hence, the PHD filter is proposed to track objects in scenarios with unknown data association.

Overview
The PHD filter uses set-valued states and observations for multiple-object Bayesian filtering. All targets and observations are collected and represented in the set space, so the data association problem of the traditional filtering domain is avoided. Figure 4 gives a basic introduction to RFS statistics. Compared with single-target filtering, the PHD filter relies on random finite set statistics to process data at the set level.

Mathematical Background
The set-valued state is composed of the surviving targets $S_{k|k-1}$, the spontaneous targets $\sigma_k$ and the spawned targets $B_{k|k-1}$. In addition, the set observation $Z_k$ consists of the reflections from the targets, $\theta_k(x)$, and the clutter $\kappa_k$.
Similar to the Bayesian estimator, the PHD filter is divided into prediction and update steps. Note that $D$ and $f_{k|k-1}(x|\zeta)$ denote the posterior density and the transition function, respectively, and $\zeta$ is the previous state.
The prediction is given by Equation (15), and the intensity function is then updated with the measurement set $Z_k$ in Equation (16), where $g_k(z_i|x)$ and $P_D$ denote the likelihood function and the detection probability, respectively. Note that the prediction in Equation (15) accounts for targets that enter the scene ($\gamma_k$), targets that survive from the previous time step (with probability $P_S$) and spawned targets ($\beta(x|\zeta)$).
The update function in Equation (16) corrects the prediction using innovations from the observations. Note that the clutter rate is also considered in the update function.
Equation (17) shows that the integral of the intensity function gives the number of targets. Meanwhile, the intensity is not a probability density and thus does not necessarily integrate to 1 [19].
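For reference, the prediction, update and cardinality relations discussed above can be written in the form commonly given in the PHD filter literature. This is a sketch using the symbols defined in the text; reading $\kappa_k(z)$ as the clutter intensity and treating $P_S$ and $P_D$ as state-independent are assumptions, and the expressions may differ in notation from the paper's Equations (15)-(17).

```latex
\begin{align}
% Prediction: birth + surviving + spawned targets
D_{k|k-1}(x) &= \gamma_k(x)
  + \int \big[ P_S\, f_{k|k-1}(x|\zeta) + \beta(x|\zeta) \big]\,
    D_{k-1}(\zeta)\, d\zeta \\
% Update: missed-detection term + one term per measurement
D_k(x) &= (1 - P_D)\, D_{k|k-1}(x)
  + \sum_{z_i \in Z_k}
    \frac{P_D\, g_k(z_i|x)\, D_{k|k-1}(x)}
         {\kappa_k(z_i) + \int P_D\, g_k(z_i|\xi)\, D_{k|k-1}(\xi)\, d\xi} \\
% Expected number of targets
\hat{N}_k &= \int D_k(x)\, dx
\end{align}
```

The first term of the update keeps the hypotheses that were not detected, while each measurement contributes a term that is normalized by clutter plus the total predicted likelihood of that measurement.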
The PHD recursions involve multiple integrals with no closed-form representation. Thus, the most common approach is to use the Gaussian Mixture (GM)-PHD approximation [20], in which $z$ and $x$ denote the current measurement and state, whereas $\zeta$ denotes the previous state.
A Gaussian distribution with mean $m$ and covariance $P$ is written as $N(\cdot; m, P)$. $F_{k-1}$ and $Q_{k-1}$ denote the transition matrix and the process noise covariance, respectively, and $H_k$ and $R_k$ denote the observation matrix and the observation noise covariance. Note that the detection and survival probabilities are constant. The birth targets $\gamma_k$ are modeled as a Gaussian mixture, where $\omega_{\gamma,k}$ and $J_{\gamma,k}$ denote the weight and the number of the Gaussians, each of which also has a mean and a covariance. Assuming the posterior intensity at time $k-1$ is a Gaussian mixture, the predicted intensity from Equation (15) at time $k$ is also a Gaussian mixture, and the updated intensity from Equation (16) at time $k$ is calculated accordingly. The GMPHD filter addresses the data association challenge, in contrast to the standard Bayes filter. The process model in Equation (18) and the measurement model in Equation (19) are the same as those of the RHM Bayes filter in Equations (7) and (8), whereas the implementation differs.
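The Gaussian-mixture recursion can be sketched for a scalar state, which keeps the example free of matrix libraries. This is a minimal illustration of the standard point-target GM-PHD prediction and update, not the authors' implementation: the scalar model, the default parameter values, the constant clutter intensity and all function names are assumptions, and spawning, pruning and merging are omitted.

```python
import math

def gauss(x, mean, var):
    """Scalar Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmphd_predict(components, births, f=1.0, q=0.1, p_s=0.99):
    """Predict (weight, mean, variance) components and append birth components."""
    survived = [(p_s * w, f * m, f * p * f + q) for (w, m, p) in components]
    return list(births) + survived

def gmphd_update(predicted, measurements, h=1.0, r=0.5, p_d=0.9, clutter=0.0):
    """Update the mixture with a list of scalar measurements."""
    # Missed-detection terms keep the predicted mean and variance.
    updated = [((1.0 - p_d) * w, m, p) for (w, m, p) in predicted]
    for z in measurements:
        detected = []
        for (w, m, p) in predicted:
            s = h * p * h + r              # innovation variance
            gain = p * h / s               # Kalman gain
            detected.append((p_d * w * gauss(z, h * m, s),
                             m + gain * (z - h * m),
                             (1.0 - gain * h) * p))
        # Normalize by clutter intensity plus the total detection likelihood.
        norm = clutter + sum(w for (w, _, _) in detected)
        if norm > 0.0:
            updated.extend((w / norm, m, p) for (w, m, p) in detected)
    return updated

def expected_count(components):
    """Sum of weights, i.e. the integral of the intensity."""
    return sum(w for (w, _, _) in components)
```

For example, `gmphd_update(gmphd_predict([(1.0, 0.0, 1.0)], [], p_s=1.0), [0.2])` yields one missed-detection component and one detected component. No measurement-to-target assignment is made at any point: every measurement updates every component, and the weights carry the association uncertainty.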

RHM-GM-PHD Filter
In Section 2.2, the GMPHD filter is introduced to deal with unknown data association. Note that the standard GMPHD filter operates under the condition that one reflection is received from each target per frame, which is called "Point Target (PT) tracking". In the vehicle detection scenario, a large number of measurements are collected from the surface of a single object, which is called "Extended Target (ET) tracking". Therefore, the GMPHD filter must be redesigned to handle extended targets while relying solely on LiDAR measurements.
The process model and measurement model are similar to those of the RHM Bayes filter in Equations (7) and (8). The prediction equations of the ET-GM-PHD filter are also the same as those of the standard GMPHD filter. The measurement update of the ET-GM-PHD filter is given by Equation (24), with the pseudo-likelihood function $L_{Z_k}(x)$ defined in Equation (25), where $\lambda_k$ is the mean number of clutter measurements, $c_k(z)$ is the spatial distribution of the clutter, the notation $p \angle Z_k$ denotes that $p$ partitions the measurement set $Z_k$ into non-empty cells $W$, the notation $W \in p$ denotes that $W$ is a cell in the partition $p$, $w_p$ and $d_W$ denote the non-negative coefficients for each partition and cell, and $\phi(x)$ denotes the same single-measurement likelihood function as in Equation (12). In Equations (26) and (27), $\delta_{i,j}$ is the Kronecker delta function and $|W|$ is the number of measurements in cell $W$. More details of the implementation can be found in [21,22]. Hence, the ET-GM-PHD filter tracks the potential objects by relying solely on LiDAR, without any association or clustering process. The estimated states represent the potential objects and are filtered again by the support vector machine to eliminate non-vehicle objects.
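The notation $p \angle Z_k$ can be made concrete with a small sketch that enumerates every partition of a measurement set into non-empty cells. The function name `partitions` and the toy measurement labels are assumptions made for illustration; this is not the partitioning scheme of the paper or of [21,22].

```python
def partitions(items):
    """Yield every partition of the list `items` into non-empty cells."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for sub in partitions(rest):
        # Option 1: `first` forms its own cell.
        yield [[first]] + sub
        # Option 2: `first` joins each existing cell in turn.
        for i in range(len(sub)):
            yield sub[:i] + [[first] + sub[i]] + sub[i + 1:]

# Assumed toy measurement set with three labeled measurements.
Z = ["z1", "z2", "z3"]
all_partitions = list(partitions(Z))
```

Each element of `all_partitions` is one partition $p$, and each inner list is one cell $W$. The number of partitions grows very quickly with the number of measurements, so exhaustive enumeration is only feasible for tiny sets; the implementation details referenced in [21,22] should be consulted for how the set of partitions is handled in practice.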

Hypothesis Verification
To eliminate outliers, the support vector machine is used to classify the Fourier coefficients as vehicle or non-vehicle.

Support Vector Machine (SVM)
As shown in Figure 5, the SVM is used to obtain classifiers with good generalization [23]. The mathematical background is as follows. For samples $x_i \in R^n$ with class labels $y_i \in \{-1, +1\}$, $i = 1, \cdots, k$, a hyperplane is defined to separate the data linearly, where $x$, $w$ and $b$ denote the input vector, the weight vector and the bias, respectively. A maximum margin is then found to separate the positive class from the negative class. The hyperplane is calculated subject to constraints on the training samples, and the classification performance relies on an optimization in which $K(x_i, x_j)$ and $\alpha_i$ denote the kernel function and the Lagrange multipliers, respectively. Note that if the data cannot be separated linearly, the formulation changes accordingly, where $C$ denotes the penalty parameter.
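For reference, the textbook SVM formulation that these statements describe can be written as follows. This is the standard form from the SVM literature, given as a sketch with the symbols used in the text; it is not guaranteed to match the paper's own equations in notation or numbering.

```latex
\begin{align}
% Separating hyperplane
& w^T x + b = 0 \\
% Maximum margin with separation constraints
& \min_{w,\, b}\ \tfrac{1}{2}\, \|w\|^2
  \quad \text{s.t.} \quad y_i\, (w^T x_i + b) \ge 1, \quad i = 1, \dots, k \\
% Dual problem with kernel function and Lagrange multipliers
& \max_{\alpha}\ \sum_{i=1}^{k} \alpha_i
  - \tfrac{1}{2} \sum_{i=1}^{k} \sum_{j=1}^{k}
    \alpha_i\, \alpha_j\, y_i\, y_j\, K(x_i, x_j)
  \quad \text{s.t.} \quad \sum_{i=1}^{k} \alpha_i\, y_i = 0,\ \ \alpha_i \ge 0 \\
% Soft margin for data that are not linearly separable
& 0 \le \alpha_i \le C, \quad i = 1, \dots, k
\end{align}
```

In the soft-margin case the last line replaces the constraint $\alpha_i \ge 0$, so the penalty parameter $C$ bounds the influence of any single training sample.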

Implementation Detail
During the implementation, there are still issues in both the generation and verification phases.

ET-GM-PHD Implementation
To track objects across multiple frames, the alignment issue must also be considered. Thus, the state tracked by the ET-GM-PHD filter consists of the state of the object centroid in the 2D Cartesian coordinate system, including its position $(x, y)$, together with the shape parameter $b_k$, represented by 13 Fourier descriptors. Although a higher number of Fourier descriptors returns more detail of the surface, the computational performance and robustness become unsatisfactory. In this paper, 13 Fourier descriptors are selected based on experience gained during the experiments.
Each object follows the linear Gaussian dynamics of Equation (18) with a configuration in which $I_n$ and $0_n$ denote the $n \times n$ identity and zero matrices, respectively. The measurement covariance is given by diag[0.5, ..., 0.5].

SVM Implementation
The KITTI dataset is used for training the classifier. It provides a set of 5000 training frames with 1893 manually labeled objects (car, van, tram, misc, pedestrian, cyclist, truck and so on). For each object, the original measurements (all objects are labeled with a 2D box, so the measurement-to-target association is known) are projected to the 2D Cartesian coordinate system around the object's origin. The 13 Fourier coefficients are then calculated using the standard nonlinear Kalman filter [15]. Instead of multiple categories, objects are only considered as vehicles and non-vehicles (in practice, cars and non-cars). Then, the SVM is trained on the Fourier coefficients collected from all calculated objects. For evaluation, another 2000 frames of test data are used to assess the performance both quantitatively and qualitatively. The SVM is applied in both the training and test phases without cross-validation.

Key Parameters and Open Issues
During the ET-GM-PHD process, potential objects are collected in the set-valued state. The Fourier coefficients describe the shape information, and the position vector represents the location. The estimated objects are also shown using bounding boxes, where the center point is given by the position and the width/height are calculated from the Fourier coefficients (the Fourier coefficients represent the rough shape in the polar coordinate system, and the width and height of the bounding box are calculated by setting $\phi$ to 0 and $\pi/2$). Since the PHD filter addresses the data association issue, the point cloud data is used directly to estimate the set-valued states, and no further clustering process is required. Meanwhile, for each single measurement, the probabilities of detection and survival are constant and no greater than 1. In addition, the PHD filter may estimate objects that are close together as a single object, mainly because of the scaling factor $s$. In the RHM, measurements are considered as random draws from boundaries with different scaling factors in the range [0, 1]. When objects are close to each other, it is quite challenging to distinguish them.
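The bounding-box construction can be sketched as follows. The sketch evaluates the radial function at $\phi = 0$ and $\phi = \pi/2$ as described above; treating these two values as half of the width and half of the height (that is, assuming a shape that is roughly symmetric about its center), the coefficient ordering and the function names are all assumptions made for illustration.

```python
import math

def radial(b, phi):
    """r(b, phi) = b[0]/2 + sum_j (b[2j-1] cos(j phi) + b[2j] sin(j phi))."""
    n_f = (len(b) - 1) // 2
    r = 0.5 * b[0]
    for j in range(1, n_f + 1):
        r += b[2 * j - 1] * math.cos(j * phi) + b[2 * j] * math.sin(j * phi)
    return r

def bounding_box(b, center):
    """Return (x_min, y_min, x_max, y_max) around `center`.

    Assumption: r(b, 0) is half the width and r(b, pi/2) is half the height.
    """
    half_w = radial(b, 0.0)
    half_h = radial(b, math.pi / 2.0)
    return (center[0] - half_w, center[1] - half_h,
            center[0] + half_w, center[1] + half_h)

# Assumed example: a circle of radius 2 centered at (1, 1).
box = bounding_box([4.0] + [0.0] * 12, (1.0, 1.0))
```

For shapes that are not symmetric about the estimated center, the extents toward $\phi = \pi$ and $\phi = 3\pi/2$ could be evaluated separately; the text does not state which variant was used.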
During the SVM process, the coefficients calculated from the objects are used to eliminate outliers. Since most cars have a similar width/length ratio, the car-labeled objects are treated as vehicles and the rest as non-vehicles. Furthermore, vehicles and non-vehicles have been found to be difficult to separate linearly using the 13 Fourier coefficients. To better train the SVM, a radial basis function (RBF) kernel is used, where $\sigma$ affects the complexity of the distribution in the feature space.
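The role of the kernel width $\sigma$ can be illustrated with a short sketch of an RBF kernel and the resulting SVM decision function. It assumes the common parameterization $K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$, which may differ from the paper's exact expression; the support vectors, multipliers and bias in the example are made-up values, not trained parameters.

```python
import math

def rbf_kernel(x, z, sigma):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2)) for equal-length lists."""
    d2 = sum((a - c) ** 2 for a, c in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def svm_decision(x, support_vectors, labels, alphas, bias, sigma):
    """Return (class, score) with score = sum_i alpha_i y_i K(x_i, x) + bias."""
    score = bias
    for sv, y, alpha in zip(support_vectors, labels, alphas):
        score += alpha * y * rbf_kernel(sv, x, sigma)
    return (1 if score >= 0.0 else -1), score

# Hypothetical 13-dimensional support vectors (one per class), for illustration only.
svs = [[0.0] * 13, [1.0] * 13]
labels = [+1, -1]
alphas = [1.0, 1.0]
label_a, _ = svm_decision([0.1] * 13, svs, labels, alphas, 0.0, 1.0)
label_b, _ = svm_decision([0.9] * 13, svs, labels, alphas, 0.0, 1.0)
```

A small $\sigma$ makes the kernel decay quickly, so each support vector influences only nearby coefficient vectors and the decision boundary can become very irregular; a large $\sigma$ gives a smoother boundary.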

Experiment Evaluation
To evaluate the approach quantitatively and qualitatively, the KITTI dataset is used and the results are compared with the state-of-the-art [24]. During the experiment, the proposed approach is implemented in Matlab on a 2-core 3 GHz CPU, and the average processing time is 5 s per frame. Figures 6-8 demonstrate the detection performance of the proposed approach in one scenario. In Figure 6, a bicycle is fully observed in the middle of the road. On the left side, a parked car is observed with partial occlusion. On the right side, both a car and a pedestrian are fully observed. As illustrated in Figure 7, potential objects are detected by the RHM-ET-PHD filter. It is observed that the proposed approach extracts the potential objects based on the geometry of the road, where the birth model plays an important role. The extracted objects are drawn as bounding boxes calculated from the set-valued state (the center point is based on the position, and the width/height are based on the Fourier coefficients). Afterwards, the SVM is used to eliminate the non-vehicle objects. Figure 8 shows the results of the verification phase. Because of the occlusion, the left vehicle is also eliminated. Table 1 shows the overall performance, compared with the state-of-the-art, in all scenarios from the KITTI dataset. Although there are also approaches using cameras, the evaluation focuses on algorithms that only use LiDAR measurements.
In Table 1, easy, moderate and hard denote the occlusion level of vehicles, corresponding to fully visible, partly occluded and difficult to see, respectively. Among the references, the proposed approach achieves high performance for the easy category and poor performance for both the moderate and hard categories. In easy scenarios, all vehicles are fully observed and the corresponding reflections on the surfaces are uniformly distributed. Hence, the proposed approach can track and estimate the states successfully, and the final classification step performs well. In moderate and hard scenarios, the performance drops significantly in both the generation and verification processes. For the PHD filter, although it detects the potential objects, the calculated Fourier coefficients are strongly influenced by the measurements that are missing from the invisible parts. For the SVM classification, the training process mainly relies on visible measurements to calculate the Fourier coefficients.
In summary, the contributions are as follows. First, the proposed framework relies solely on LiDAR measurements for vehicle detection in environments with unknown data association. Second, the Fourier coefficients are proposed for the first time for object classification and achieve high performance for fully visible vehicles.

Conclusions
Vehicle detection is important for developing driver assistance systems. To address the data association problem that affects point cloud data, the Probability Hypothesis Density (PHD) filter is proposed in this paper. The proposed scheme uses contour information for classification. The evaluation results show high performance relative to state-of-the-art techniques for fully visible vehicles, whereas the performance for occluded vehicles remains poor.
Future work will focus on improving the detection of occluded vehicles.