Developing an On-Road Object Detection System Using Monovision and Radar Fusion †

Abstract: In this study, a millimeter-wave (MMW) radar and an onboard camera are used to develop a sensor fusion algorithm for a forward collision warning system. The MMW radar and camera are integrated to compensate for the deficiencies caused by relying on a single sensor and to improve frontal object detection rates. Density-based spatial clustering of applications with noise (DBSCAN) and particle filter algorithms are used in the radar-based object detection system to remove non-object noise and track the target object. Meanwhile, a two-stage vision recognition system detects and recognizes the objects in front of the vehicle. The detected objects include pedestrians, motorcycles, and cars. The spatial alignment uses a radial basis function neural network (RBFNN) to learn the conversion relationship between the distance information of the MMW radar and the coordinate information in the image. A neural network is then used for object matching, and the sensor with the higher confidence index is selected as the system output. Finally, three kinds of scenario conditions (daytime, nighttime, and rainy-day) were designed to test the performance of the proposed method. The detection rate and the false alarm rate of the proposed system were approximately 90.5% and 0.6%, respectively.


Introduction
In recent years, the development of advanced driver assistance systems (ADAS) has attracted a large amount of research and funding from major car manufacturers and universities. The key issues of ADAS include on-road object detection, anti-collision technology, and parking assistance. Three kinds of sensors (i.e., radar, Lidar, and camera) are widely adopted for object detection in front of vehicles [1][2][3][4][5].
Because every single sensor has limitations, multi-sensor fusion technology can be used to compensate for the disadvantages of each individual sensor [6,7].
In reference [8], background subtraction and a Haar wavelet transform were used to transform the foreground image into a second-order feature space. Then, based on the concept of a histogram of oriented gradients (HOG), horizontal and vertical high-frequency components were obtained. With a hierarchical SVM classifier architecture, that system can classify pedestrians, automobiles, and two-wheeled vehicles effectively. Yang et al. [9] used an optical flow method to calculate the motion [...]
The radar subsystem provides noise filtering, tracking, and credibility analysis. The two-stage vision detection subsystem can rapidly identify candidate areas from the image. The fusion strategy of the parallel architecture depends on the confidence index of each sensor. Three kinds of scenario conditions (daytime, nighttime, and rainy-day) were implemented in an urban environment to verify the proposed system. The contributions of this study include the following:
1. To overcome the shortcomings of each single sensor, we integrated the two sensor systems using sensor fusion technology and improved the reliability of the overall system.
2. In a series-type fusion architecture, the failure of any single sensor causes the whole system to fail. The proposed parallel architecture relies on the confidence index of each sensor, so the sensors can compensate for each other and the limitations of the series fusion architecture are avoided.
3. Three kinds of scenario conditions (daytime, nighttime, and rainy-day) were implemented in an urban environment to verify the viability of the proposed system. The experimental results can provide a baseline of comparison for future research.

System Architecture
This study proposed a sensor fusion technology integrating MMW radar and camera for front object detection. The proposed system consists of three subsystems, including a radar-based detection system, vision-based recognition system, and sensor fusion system.
The image captured by the camera can easily be affected by lighting and weather conditions. Furthermore, the distance of the front object estimated from the camera image has low precision. The MMW radar, in turn, needs a sufficiently large velocity relative to the front object to detect it stably. Accordingly, these two sensor subsystems were combined in a parallel connection to compensate for the limitations of each sensor and improve the robustness of the detection system. The overall architecture of the proposed detection and recognition system is shown in Figure 1. A clustering algorithm and a particle filter were applied to the MMW radar data to achieve noise removal and multi-object tracking. The object detected in the coordinate system of the radar sensor was then converted into image coordinates. For the image data, two-stage classifiers were implemented for foreground segmentation and object recognition, respectively, from which the object information was obtained. Finally, a radial basis function neural network (RBFNN) was used to fuse the detected object information from the MMW radar and camera.

Radar-Based Object Detection
A 24 GHz short-range radar was adopted for front-end environment detection, and a radar-based multi-object tracking method was proposed. This method tracks multiple objects simultaneously and removes noise, i.e., detections that do not correspond to real objects. The flow chart of the proposed radar-based detection subsystem is shown in Figure 2. First, the radar data are divided into clusters using a clustering algorithm. The particle filter is then used for signal filtering and target tracking. Two kinds of probability scores are evaluated in the particle filter process. The convergence of the particle swarm reflects the quality of the tracking. For stably tracked objects, the particles around the object receive a higher weight in the importance sampling step and therefore have a higher probability of survival in the resampling step. We define the range probability ($P_r$) as the survival probability of the particles within a radius of 1 m around the object to evaluate the quality of the tracking. On the other hand, a diverse particle swarm can cover all the states of the object. We define the available probability ($P_a$) as the survival probability of the particles after the resampling step. During the tracking process, the system adjusts the percentage of particles that are resampled according to the value of $P_a$ to ensure the diversity of the particle swarm. In addition, the confidence index of the target object is derived from the range probability and the probability of survival. This confidence index determines the credibility of the detected object as a real object. The relative velocity and distance between the vehicle and the front object are provided by this subsystem.


Radar Data Pre-Processing
The MMW radar signals are electromagnetic waves. Both reflection and refraction occur when the electromagnetic waves are incident on a medium. In addition to the wave reflected from the medium itself, noise signals that do not correspond to real objects are also prone to appear. The relationship between relative distance and echo intensity was statistically analyzed using a large amount of data collected during experiments. The statistical results are shown in Figure 3. The signal distribution indicates that real objects and noise each concentrate in their own region, and only a small part of the two distributions overlaps. Accordingly, a noise filtering operation was performed. As shown in Figure 3a, the signals on the left side of the red curve were filtered out before the subsequent target tracking and particle filter algorithm were performed. The density-based spatial clustering of applications with noise (DBSCAN) algorithm [17] was used to cluster the radar data, and the number of possible front objects was estimated.
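The clustering step can be illustrated with a minimal DBSCAN sketch in plain Python. The sample detections, the neighborhood radius `eps`, and the minimum neighbor count `min_pts` are hypothetical values chosen for illustration; the text does not state the parameters used in the study.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:     # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels

# Hypothetical radar detections as (lateral x, longitudinal y) in meters.
detections = [(0.0, 10.0), (0.3, 10.2), (-0.2, 9.9),
              (3.0, 25.0), (3.2, 25.3), (2.9, 24.8),
              (-6.0, 40.0)]
labels = dbscan(detections, eps=1.0, min_pts=2)
num_objects = len({c for c in labels if c != -1})
```

Points labeled -1 are treated as noise, and the number of distinct non-negative labels gives the estimated number of front objects.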



Particle Filter
Particle filters [18] are widely used in many fields, including object tracking, signal processing, and automatic control. In this study, particle filtering was used to filter the radar signal and track the objects in front of the vehicle. The particle filter algorithm uses a finite number of particles to represent the posterior probability of a stochastic process with partial observations. Each particle has a weight that represents the probability of the particle being sampled from the probability density function. The particle filter procedure in this study was divided into four steps, as follows:

Particle Initialization
To cover all the potential object positions, $n$ particles were randomly distributed within the radar detection area. Each particle represents a potential position of a real object, and the weight of the particle indicates the probability that the object is at this location.

State Prediction
The state of the object changes over time. Discrete time was used to calculate the object state, and the state of each particle at the next moment was predicted from its state at time $k-1$ and the motion model. The prior probability $P(x_k \mid x_{k-1})$ was then obtained. The object state is predicted with the motion model of [19].
Energies 2020, 13, 116
In this model, $T$ is the sampling time of the radar sensor and $X_k = [x_k \;\; y_k \;\; \dot{x}_k \;\; \dot{y}_k]^T$ denotes the state vector. $x_k$ and $x_{k-1}$ denote the relative lateral distances between the target object and the sensor at the current time and the previous moment, respectively. $y_k$ and $y_{k-1}$ are the relative longitudinal distances between the target object and the sensor at the current time and the previous moment, respectively. $\dot{x}_k$ and $\dot{y}_k$ represent the lateral and longitudinal relative speeds between the target and the sensor, respectively. $W_k$ is zero-mean Gaussian white noise.
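The prediction equation itself did not survive in the text above. Given the state vector, the sampling time $T$, and the noise term $W_k$ defined here, a constant-velocity model is consistent with these definitions. The following is a sketch under that assumption and may differ from the exact form used in [19]; the transition matrix symbol $F$ is introduced here for illustration only.

```latex
X_k = F\,X_{k-1} + W_k,
\qquad
F =
\begin{bmatrix}
1 & 0 & T & 0 \\
0 & 1 & 0 & T \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix},
\qquad
X_k = \begin{bmatrix} x_k & y_k & \dot{x}_k & \dot{y}_k \end{bmatrix}^{T}
```

Under this model, the lateral position evolves as $x_k = x_{k-1} + T\,\dot{x}_{k-1}$ plus noise, and the longitudinal position evolves analogously.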

Importance Sampling
This step is based on the concept of a Bayesian filter. The particles obtained during the state prediction stage and the information obtained from the MMW radar are used to estimate the target position. Bayes' theorem is used to update the prior probability and obtain the posterior probability. In this step, each particle is assigned a weight. The radar measurement area is assumed to consist of $M \times N$ blocks, each covering 1 m$^2$. The measurement model of the radar sensor is expressed by Equation (3), where $\upsilon_k^{(i,j)}$ is the measurement noise in block $(i, j)$, modeled as Gaussian white noise with zero mean and variance $\sigma^2$, and $h_k^{(i,j)}(x_k)$ is the signal strength of the object in block $(i, j)$, described by a point spread function [20]. In the point spread function, $\Delta_x$ and $\Delta_y$ are the block sizes, $I_k$ is the echo strength of the MMW radar, and $\Sigma$ is the blurring degree of the sensor. The weight of each particle is obtained from this measurement model.
The weights of the particles in the space region are then normalized by dividing the weight of each particle by the sum of all particle weights, as shown by Equation (6). After the weight of each particle is obtained, the relative position of the object detected by the MMW radar can be estimated as the expected value of the target state over the weighted particles.
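The equations referenced in this step are missing from the extracted text. The block below sketches forms that are consistent with the verbal definitions. The measurement symbol $z_k^{(i,j)}$, the particle index $m$, and the weight symbols $w_k^{m}$ and $\tilde{w}_k^{m}$ are notational assumptions, and the Gaussian point spread function is the form commonly used in track-before-detect particle filters rather than a formula confirmed by this text.

```latex
% Additive measurement model per block (cf. Equation (3))
z_k^{(i,j)} = h_k^{(i,j)}(x_k) + \upsilon_k^{(i,j)},
\qquad \upsilon_k^{(i,j)} \sim \mathcal{N}(0, \sigma^2)

% Assumed Gaussian point spread function
h_k^{(i,j)}(x_k) = \frac{\Delta_x \Delta_y I_k}{2\pi\Sigma^2}
\exp\!\left(-\frac{(i\Delta_x - x_k)^2 + (j\Delta_y - y_k)^2}{2\Sigma^2}\right)

% Weight normalization over the n particles (cf. Equation (6))
\tilde{w}_k^{m} = \frac{w_k^{m}}{\sum_{l=1}^{n} w_k^{l}}

% Expected value of the target state
\hat{X}_k = \sum_{m=1}^{n} \tilde{w}_k^{m} X_k^{m}
```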

Resampling
Estimating the state directly from the weight of each particle is referred to as the sequential importance sampling (SIS) particle filter [18]. However, this method suffers from particle degeneracy: after several iterations, most particles have insignificant weights. The system then performs unnecessary calculations on these particles, and the real target position may not be covered by the remaining particles. Resampling was used to address this issue. In each iteration, the particles with smaller weights were discarded and replaced by copies of particles with larger weights. After resampling, the weights of all particles were set to $1/n$, and the next iteration was performed with the new particles. With equal weights, the expected value of the target estimate reduces to the mean of the particle states.
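The four steps can be sketched end to end in plain Python. This is a simplified illustration and not the implementation used in the study: it assumes a constant-velocity motion model, replaces the block-based radar measurement model with a Gaussian likelihood around a measured position, and uses multinomial resampling. All function names, noise levels, and the simulated target are hypothetical.

```python
import math
import random

def init_particles(n, x_range, y_range, rng):
    # Each particle is [x, y, vx, vy]; positions are uniform over the detection area.
    return [[rng.uniform(*x_range), rng.uniform(*y_range), 0.0, 0.0] for _ in range(n)]

def predict(particles, T, sigma_pos, sigma_vel, rng):
    # Constant-velocity motion plus zero-mean Gaussian process noise (assumed model).
    for p in particles:
        p[0] += T * p[2] + rng.gauss(0.0, sigma_pos)
        p[1] += T * p[3] + rng.gauss(0.0, sigma_pos)
        p[2] += rng.gauss(0.0, sigma_vel)
        p[3] += rng.gauss(0.0, sigma_vel)

def weights_from_measurement(particles, z, sigma_meas):
    # Gaussian likelihood of the measured position z = (x, y), then normalization.
    w = [math.exp(-((p[0] - z[0]) ** 2 + (p[1] - z[1]) ** 2) / (2.0 * sigma_meas ** 2))
         for p in particles]
    total = sum(w)
    if total == 0.0:  # every likelihood underflowed: fall back to uniform weights
        return [1.0 / len(particles)] * len(particles)
    return [wi / total for wi in w]

def estimate(particles, w):
    # Expected value of the state under the normalized weights.
    return [sum(wi * p[d] for wi, p in zip(w, particles)) for d in range(4)]

def resample(particles, w, rng):
    # Draw n particles with probability proportional to weight; reset weights to 1/n.
    n = len(particles)
    chosen = rng.choices(particles, weights=w, k=n)
    return [list(p) for p in chosen], [1.0 / n] * n

rng = random.Random(0)
n, T = 500, 0.1
particles = init_particles(n, (-10.0, 10.0), (0.0, 50.0), rng)
truth = [0.0, 30.0]                      # simulated target position in meters
for _ in range(30):
    truth[1] -= 4.0 * T                  # target approaches at 4 m/s
    predict(particles, T, 0.5, 0.5, rng)
    w = weights_from_measurement(particles, truth, 1.0)
    est = estimate(particles, w)
    particles, w = resample(particles, w, rng)
```

Copying each resampled particle (`list(p)`) matters here because `predict` updates particles in place; without the copy, duplicated particles would share one state and the swarm would lose diversity.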


Experimental Verification
A lot of object information is lost when the MMW radar data are processed by the sensor's internal algorithms. Therefore, the original unprocessed data were obtained from the MMW radar in this study, and the proposed particle filter algorithm was used to track the front object and avoid losing too much information.
To verify the feasibility of the proposed algorithm, a high-precision laser range finder with a measurement error of ±10 mm was used to record the center position of the frontal object. The experimental equipment installed to verify the radar tracking system is shown in Figure 4. Three verification conditions were set to avoid dark objects and insufficient relative speed, which can cause the laser range finder and the radar to lose information: metal and light-colored moving objects, a relative velocity of ±15 km/h or more, and objects moving from far away to nearby.
The position of the object measured by the laser range finder is considered as the ground truth, which is illustrated by the blue line in Figure 5. The red line represents the tracking result obtained by the proposed particle filtering algorithm. The result of the internal algorithm of the radar sensor is illustrated by the green line. An offset between the detected and actual positions of the object may be observed owing to the characteristics of the radar sensor.
The error and standard deviation of the proposed particle filter tracking algorithm and of the internal algorithm of the radar sensor were compared against the ground truth to verify the tracking results. The error is defined as the absolute value of the difference between the position estimated by the algorithm and the ground truth. The average error is the sum of the errors divided by the number of detections.
As shown in Table 1, the proposed algorithm performed better in terms of the average error, the maximum error, and the standard deviation of the error in both the longitudinal and lateral directions. In addition, the number of times the proposed algorithm effectively detected objects was greater than that obtained by the sensor's internal algorithm.
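The error statistics defined above can be computed as in the following sketch. The sample values are made up for illustration and are not taken from Table 1, and the use of the population standard deviation is an assumption, since the text does not say which variant was used.

```python
import statistics

def tracking_errors(estimates, ground_truth):
    """Absolute-error statistics along one axis (lateral or longitudinal)."""
    errors = [abs(e - g) for e, g in zip(estimates, ground_truth)]
    return {
        "count": len(errors),                  # number of detections
        "mean": sum(errors) / len(errors),     # average error
        "max": max(errors),                    # maximum error
        "std": statistics.pstdev(errors),      # population standard deviation (assumed)
    }

# Hypothetical longitudinal positions in meters: tracker output vs. laser range finder.
stats = tracking_errors([10.2, 9.7, 9.1, 8.4], [10.0, 9.5, 9.0, 8.5])
```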


Vision-Based Object Recognition
The two-stage vision-based object recognition system is similar to that in our earlier work [16]. In the first stage, Haar-like features are used to identify the candidate object regions from the foreground segmentation. The second stage is responsible for object recognition. Three kinds of objects (i.e., pedestrians, motorcycles, and cars) can be identified by SVM classifiers. The scheme of the two-stage vision-based object recognition process is shown in Figure 6. The object recognition results are shown in Figure 7.
The distance of an image object is estimated using the polynomial model expressed in Equation (9), where $f(y_{im})$ is the estimated distance and $y_{im}$ denotes the $v$ coordinate of the object in the image.
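As an illustration of how such a polynomial distance model could be calibrated, the sketch below fits a polynomial by least squares and evaluates it for a new image row. The polynomial degree, the normalization of the $v$ coordinate to the range 0 to 1, and the calibration pairs are all assumptions made for this example; the degree and coefficients of Equation (9) are not given in the text.

```python
def fit_polynomial(v, d, degree):
    """Least-squares coefficients c[0..degree] such that d is approx. sum(c[k] * v**k)."""
    m = degree + 1
    # Normal equations A c = b with A[r][k] = sum(v**(r+k)) and b[r] = sum(d * v**r).
    A = [[sum(x ** (r + k) for x in v) for k in range(m)] for r in range(m)]
    b = [sum(y * x ** r for x, y in zip(v, d)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for k in range(col, m):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    # Back substitution.
    c = [0.0] * m
    for r in reversed(range(m)):
        c[r] = (b[r] - sum(A[r][k] * c[k] for k in range(r + 1, m))) / A[r][r]
    return c

def estimate_distance(c, v):
    # Horner evaluation of the fitted polynomial f(v).
    result = 0.0
    for coeff in reversed(c):
        result = result * v + coeff
    return result

# Hypothetical calibration pairs: normalized image row (v / image height) vs. distance in m.
v_norm = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
dist_m = [25.0, 20.4, 16.6, 13.6, 11.4, 10.0]
coeffs = fit_polynomial(v_norm, dist_m, degree=2)
d_est = estimate_distance(coeffs, 0.75)
```

Normalizing the row coordinate keeps the normal equations well conditioned; with raw pixel rows in the hundreds, a dedicated least-squares solver would be a safer choice.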


Sensors Fusion and Decision Mechanism
A single sensor system can operate independently; however, a parallel architecture was adopted in this study to fuse the two different sensors. The main purpose is to improve on the detection rate that can be achieved by a single sensor. The sensor fusion was divided into three parts. First, the two-dimensional coordinates of the MMW radar were converted into image coordinates, so that the information obtained by the two sensors was expressed in the same coordinate system. Next, the object information was matched to determine whether the same object had been detected by both the MMW radar and the camera, and the detection results of the two systems were integrated. Finally, the trusted sensor was determined based on the confidence index of each sensor.
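The final decision step can be sketched as follows: for a matched object, the output of the sensor with the higher confidence index is reported, and if only one sensor detects the object, its result is used alone. The `Detection` type, its fields, and the 0 to 1 confidence scale are hypothetical names and assumptions introduced for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Detection:
    sensor: str         # "radar" or "camera"
    distance_m: float   # estimated distance to the object
    confidence: float   # confidence index; a 0..1 scale is assumed here

def select_output(radar: Optional[Detection], camera: Optional[Detection]) -> Optional[Detection]:
    """Return the detection to report; either input may be None if that sensor saw nothing."""
    candidates = [d for d in (radar, camera) if d is not None]
    if not candidates:
        return None
    # Parallel fusion: trust whichever sensor currently has the higher confidence index.
    return max(candidates, key=lambda d: d.confidence)

radar_det = Detection("radar", 25.0, 0.9)
camera_det = Detection("camera", 24.0, 0.6)
chosen = select_output(radar_det, camera_det)
```

Because each sensor is evaluated independently, a failure of one sensor only removes one candidate, which is the property that distinguishes the parallel architecture from a series one.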

Coordinate Transformation
A supervised learning algorithm was used to learn the relationship between the MMW radar coordinate system and the image coordinate system. Before the coordinate transformation, the radar coordinates (x, y) and image coordinates (u, v) were recorded synchronously and used as training samples for offline learning. An MMW radar uses electromagnetic waves as a medium, and metal objects reflect these waves well. Hence, a triangular metal reflector was used as the target object to gather data from the radar and the camera, as shown in Figure 8.
The camera was installed parallel to the horizon. When the target object moved from far away to nearby, its center point shifted only slightly in the vertical direction and stayed near the center of the image. Thus, the variation in the image v-direction coordinate was not obvious. Therefore, the fusion system primarily enabled the neural network to learn the relationship between the MMW radar coordinate (x, y) and the image coordinate (u, v).
From the collected training samples, the longitudinal and lateral distances from the radar were used as the inputs of the RBFNN, and the corresponding u coordinate in the horizontal direction of the image was used as the output. This network learns the coordinate conversion relationship between the two sensors. The network architecture is shown in Figure 9.
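A minimal sketch of an RBF network for this mapping is given below. It uses one Gaussian hidden unit per training sample and solves for the output weights exactly. This is a simple textbook variant chosen for brevity, and it is not necessarily the network size or training rule used in the study. The input scaling, the width `sigma`, and the training pairs are hypothetical.

```python
import math

SCALE = (2.0, 10.0)  # hypothetical scaling (m) so both radar axes have a similar spread

def scaled(xy):
    return (xy[0] / SCALE[0], xy[1] / SCALE[1])

def gaussian(a, b, sigma):
    # Gaussian radial basis function of the distance between a and b.
    return math.exp(-math.dist(a, b) ** 2 / (2.0 * sigma ** 2))

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w

def train_rbf(radar_xy, image_u, sigma):
    # Hidden layer: one Gaussian unit centered on each (scaled) training sample.
    centers = [scaled(p) for p in radar_xy]
    phi = [[gaussian(s, c, sigma) for c in centers] for s in centers]
    # Output layer: linear weights that reproduce the training targets.
    return centers, solve(phi, image_u)

def radar_to_u(xy, centers, weights, sigma):
    s = scaled(xy)
    return sum(w * gaussian(s, c, sigma) for w, c in zip(weights, centers))

# Hypothetical reflector positions (lateral x, longitudinal y in m) and image u (pixels).
radar_xy = [(-2.0, 10.0), (0.0, 10.0), (2.0, 10.0),
            (-2.0, 20.0), (0.0, 20.0), (2.0, 20.0)]
image_u = [200.0, 320.0, 440.0, 260.0, 320.0, 380.0]
centers, weights = train_rbf(radar_xy, image_u, sigma=1.0)
u_pred = radar_to_u((1.0, 15.0), centers, weights, sigma=1.0)
```

With many calibration samples, a smaller set of centers and a least-squares fit of the output weights would usually be preferred to exact interpolation, which can overfit noisy measurements.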

Object Match
The MMW radar detection and image recognition systems operate independently, and each system obtains its own information about the detected objects. To fuse the information of the two systems, the object information must first be matched to determine whether the same object has been detected by the two sensors. A single image coordinate may correspond to several different radar coordinates, as illustrated by the green points shown in Figure 10. In addition, the distance estimated from the image coordinates may be inaccurate because bumpy road surfaces can cause the vehicle to shake; thus, it is difficult to match the object information and effectively determine whether the same object is detected.
Another RBFNN is used to match the object information and determine whether the same objects are detected by the two sensors. Six factors that affect the object match were used as the network inputs: the image coordinate u, the object width, the object height, the object distance estimated from the image, the object distance measured by the radar, and the u coordinate converted from the radar to the image. The network output is either "match" or "non-match".

Decision Strategy
In a cascade fusion architecture, the failure of a single sensor causes the entire system to fail. In a parallel fusion architecture, by contrast, a decision mechanism determines which sensor should be trusted. Even if one sensor misses an object or raises a false alarm, the other sensor may still detect the object correctly. A scoring mechanism therefore calculates a confidence index for each sensor, and the credible subsystem is determined from these indices.
The confidence index of the radar subsystem is calculated from $A_{rn}$, the number of times the object has been tracked by the particle filter, and a constant $\eta_r$. The confidence index of the image subsystem is calculated from $S_d$, the distance from the input data point to the SVM hyperplane; $A_{in}$, the number of times the object has been tracked in the image subsystem; and the constants $\eta_r$ and $\lambda$. The confidence index of the sensor fusion system, Score, is derived from the confidence indices of the two subsystems. When Score is greater than the set threshold $Th$, the reliability of the system is regarded as extremely high, and the system output is taken to represent the real situation. If the confidence index of each subsystem is greater than $Th$, the subsystem with the highest score is responsible for the decision of the entire system.
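The selection rule described above can be sketched as a small function. The confidence indices are taken as given inputs here, since their formulas are not reproduced in this sketch. Two behaviors are assumptions rather than statements from the text: when only one subsystem exceeds the threshold, that subsystem is used, and a tie is resolved in favor of the radar. The function name is hypothetical.

```python
from typing import Optional

def select_subsystem(score_radar: float,
                     score_image: float,
                     threshold: float) -> Optional[str]:
    """Pick the subsystem whose confidence index should drive the output.

    Returns "radar", "image", or None when neither index exceeds the threshold.
    """
    scores = {"radar": score_radar, "image": score_image}
    # Keep only the subsystems whose confidence index exceeds Th.
    trusted = {name: s for name, s in scores.items() if s > threshold}
    if not trusted:
        return None
    # max() returns the first maximal key, so a tie goes to "radar" (assumption).
    return max(trusted, key=trusted.get)

choice = select_subsystem(score_radar=0.9, score_image=0.7, threshold=0.5)
```

Returning `None` when no subsystem is trusted leaves the handling of that case to the caller, for example by suppressing the warning for that frame.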

Experimental Platform and Scenarios
Three kinds of scenario conditions (daytime, nighttime, and rainy-day) were implemented to verify the proposed system. All the scenarios were carried out on urban roads. The MMW radar and camera were mounted on the front bumper of the experimental car, as shown in Figure 11.
The daytime scenarios covered direct sunlight, pavement puddles, and shadowed environments, as shown in Figure 12.
In the nighttime experiment, the scenarios included flashing brake lights of front vehicles, headlight reflections, and poor lighting environments, as shown in Figure 13.
To reproduce actual road conditions, we also designed a rainy-day scenario. Because the sensors were mounted on the front bumper, raindrops often adhered to the camera lens during the rainy-day experiment, as shown in Figure 14.

Radar-Based Detection Subsystem
The radar detection subsystem uses the MMW radar to perceive the environment ahead. The proposed multi-object tracking algorithm with a particle filter tracks the objects in front and removes non-object noise. The radar subsystem experiments tested three categories of objects under different conditions. The detection results are shown as green circles in Figures 15 and 16. The tests primarily involved a single target in a lane. If there were multiple targets, the alert was reported for the target closest to the experimental vehicle, while the other targets continued to be tracked. The radar detection system maintained a detection rate exceeding 60% during daytime, nighttime, and rainy days. The tests performed under different weather conditions verified that the radar detection system is not affected by weather conditions. The experimental results are listed in Table 2.
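The alert rule for multiple targets can be sketched as below. It assumes that "closest" means the smallest straight-line range computed from the longitudinal and lateral distances, which the text does not specify; the `Track` class and `alert_target` function are hypothetical names, and the track list itself is left untouched so that all targets continue to be tracked.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Track:
    track_id: int
    longitudinal: float  # distance ahead of the vehicle (m)
    lateral: float       # sideways offset from the vehicle (m)

def alert_target(tracks: Sequence[Track]) -> Optional[Track]:
    """Return the tracked target closest to the vehicle, or None if there are no tracks."""
    return min(tracks,
               key=lambda t: math.hypot(t.longitudinal, t.lateral),
               default=None)

tracks = [Track(1, 30.0, 0.5), Track(2, 12.0, -0.3), Track(3, 45.0, 1.0)]
target = alert_target(tracks)  # the alert is raised for this track only
```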

Vision Recognition
The two-stage vision-based object recognition system has the following advantage: using Haar-like features, the first-stage classifier detects candidate areas efficiently. However, the Haar-like algorithm suffers from a relatively high false positive rate (see the purple rectangles in Figure 17). The second-stage PCA-HOG classifier was therefore used to compensate for the false positives of the first stage. The detection results of the vision-based object recognition subsystem are shown as yellow rectangles in Figure 18, and the results of the rainy-day experiment are shown as green rectangles in Figure 19. All the experiments performed under different weather conditions involved three classes of objects: pedestrians, motorcycles, and cars. The detection results of the vision-based system are listed in Table 3.
Because the camera is highly sensitive to light, its performance depends on the lighting conditions. For example, with insufficient light at night, the vision-based system cannot fully extract the features of objects. In the rainy-weather experiments, raindrops adhering to the camera lens obscure the object in front of the vehicle, so the system cannot reliably identify the target and the image subsystem fails. Consequently, the detection rates are lowest at night and on rainy days.
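The two-stage structure of the vision subsystem can be sketched as a simple filter chain. The two stages are passed in as callables; the lambdas below are placeholders with made-up scores and thresholds, standing in for the Haar-like candidate detector and the PCA-HOG classifier, and `two_stage_detect` is a hypothetical name.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

def two_stage_detect(windows: Sequence[Any],
                     stage1: Callable[[Any], bool],
                     stage2: Callable[[Any], Optional[str]]
                     ) -> List[Tuple[Any, str]]:
    """Run a fast candidate detector, then verify each candidate.

    stage1 returns True for candidate areas (fast, more false positives).
    stage2 returns a class label, or None to reject a false positive.
    """
    candidates = [w for w in windows if stage1(w)]
    results = []
    for w in candidates:
        label = stage2(w)
        if label is not None:
            results.append((w, label))
    return results

# Placeholder windows with made-up scores (illustration only).
windows = [
    {"id": 0, "haar": 0.9, "hog": 0.8, "label": "car"},
    {"id": 1, "haar": 0.7, "hog": 0.2, "label": "pedestrian"},  # stage-1 false positive
    {"id": 2, "haar": 0.1, "hog": 0.9, "label": "motorcycle"},  # never reaches stage 2
]
detections = two_stage_detect(
    windows,
    stage1=lambda w: w["haar"] > 0.5,
    stage2=lambda w: w["label"] if w["hog"] > 0.5 else None,
)
```

Only the windows that pass the first stage are evaluated by the more expensive second stage, which is where the efficiency of the cascade comes from.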

Sensor Fusion System
This system integrates MMW radar and camera information in a parallel fusion architecture, so that detection is maintained when one of the detection subsystems fails. The two sensors are complementary. For example, as shown in Figure 20, the radar did not detect the front vehicle when the relative speed between the radar and the object was small; the camera compensated for the radar failure. Conversely, when raindrops adhering to the camera lens blocked the scene and image detection failed, the radar compensated for this situation, as shown in Figure 21.
In addition to compensating for single-sensor failures, the system integrates the sensors' information when both the radar and the camera detect objects simultaneously. The system relies on the coordinate transformation and the object matching decision mechanism to determine whether the same objects are detected by the two sensors, as shown in Figure 22.
The parallel sensor fusion architecture proposed in this study compensates for the disadvantages of relying on a single sensor. It maintains detection in case of subsystem failure and significantly increases the system detection rate and stability, as listed in Table 4.
Regardless of the weather conditions, the sensor fusion system achieved better detection rates than either single subsystem. Table 5 lists the detection results of each system for the three object categories under different weather conditions. The sensor fusion system can achieve a detection rate of more than 90%.
We also compared our results with existing related works. The comparison results are listed in Table 6.

Conclusions
Two types of sensors, an MMW radar and a camera, were integrated in this study to develop a frontal object detection system based on sensor fusion with a parallel architecture. The radar detection subsystem employed a particle filter algorithm to remove non-object noise while tracking objects, and the target information was converted into image coordinates using an RBFNN. The two-stage vision-based recognition subsystem identified each image object as one of three main categories (pedestrians, motorcycles, and cars). The information obtained by the two subsystems was integrated, and the sensor with the higher credibility was selected as the system output. Three kinds of experiments (daytime, nighttime, and rainy-day) were performed to verify the proposed system. The experimental results show that the detection rate and the false alarm rate of the proposed system were approximately 90.5% and 0.6%, respectively. These detection rates are better than those obtained by the single-sensor systems.