Research on Target Detection Based on Distributed Track Fusion for Intelligent Vehicles

Accurate target detection is the basis of normal driving for intelligent vehicles. However, the sensors currently used for target detection each have defects at the perception level, which can be compensated for by sensor fusion technology. In this paper, the application of sensor fusion technology in intelligent vehicle target detection is studied with a millimeter-wave (MMW) radar and a camera. The target-level fusion hierarchy is adopted, and the fusion algorithm is divided into two tracking processing modules and one fusion center module based on the distributed structure. The measurement information output by the two sensors enters the tracking processing modules, and after processing by a multi-target tracking algorithm, the local tracks are generated and transmitted to the fusion center module. In the fusion center module, a two-level association structure is designed based on regional collision association and weighted track association. The association between the two sensors’ local tracks is completed, and a non-reset federated filter is used to estimate the state of the fusion tracks. The experimental results indicate that the proposed algorithm can complete the track association between the MMW radar and the camera, and that the fusion track state estimation method has excellent performance.


Introduction
Intelligent vehicles are conducive to reducing traffic accidents and easing traffic congestion, which is an important direction of automobile technology development [1]. Accurate target detection is an important condition for intelligent vehicles to drive normally in the increasingly complex road environment. However, the current sensors used for target detection all have different degrees of defects, such as a limited detection range and poor adaptability to climate and light, leading to incorrect detection information and other problems. In general, a combination of multiple sensors can expand the detection range and improve the detection reliability and robustness [2]. Therefore, sensor fusion technology can be used to solve the problem of target detection.
Currently, the sensors commonly used for target detection on the market include lidar, millimeter-wave (MMW) radar, camera, and ultrasonic radar [3]. Among them, the MMW radar can work in all weather conditions, and its detection of distance and speed is relatively accurate. The camera has a wide detection range and can recognize the target type [4]. In addition, both sensors are relatively inexpensive. Therefore, the combination of an MMW radar and a camera has become a mainstream scheme for intelligent vehicles.
According to the differences in the data processing methods, sensor fusion can be divided into three levels, namely: data level, feature level, and target level [5]. Data level fusion gathers the original [...] center, which associates the fusion track, and the measurement also updates the fusion track state. The distributed structure has multiple tracking processing modules and a fusion center module. The measurement information of each sensor is transmitted to the corresponding tracking processing module, and the tracking processing module outputs the local tracks of the sensor. The local tracks of each sensor enter the fusion center, and the fusion center processes the local tracks and obtains the global tracks, which are the final result of the fusion algorithm.
Compared with the centralized structure, the distributed structure has good stability and places low requirements on the communication ability and computing speed of the system. Therefore, the distributed structure is selected as the basic structure of the fusion algorithm in this paper, and the designed fusion algorithm framework for an MMW radar and a camera is shown in Figure 1. The framework divides the fusion algorithm into two tracking processing modules and one fusion center module. The tracking processing module receives the sensor measurement information and carries out multi-target tracking processing, which is divided into several parts, including pretreatment, data association, track management, and state estimation. The fusion center module is mainly divided into several parts, including temporal-spatial alignment, data association, fusion track state estimation, and global track generation. When the system works, the MMW radar and camera separately output measurement information over CAN communication, and the measurement information enters their respective tracking processing modules. Each tracking processing module runs a multi-target tracking algorithm and outputs local track information, including track state and state covariance, which enters the fusion center module. The fusion center module first registers the local tracks of the two sensors in time and space, and then associates the local tracks with each other. The track-track association algorithm divides the local tracks into two parts. One part is the tracks that are not successfully associated, which are called residual tracks; their state information is the same as that of the local track. The other part is the successfully associated tracks, which generate fusion tracks; the state of each fusion track is then estimated from the corresponding local tracks. The fusion tracks and residual tracks constitute the global tracks, which are the output information of the algorithm framework. Sections 3 and 4 design the tracking processing module and the fusion center module, respectively.
Sensors 2019, 19, x FOR PEER REVIEW

Tracking Processing Module
Figure 2 shows the algorithm structure of the tracking processing module. After the measurement information enters the tracking processing module, the measurement targets that influence the running of the ego vehicle are first selected by combining them with the status information of the ego vehicle, which completes the pretreatment. Then, the measurements are associated with the current local tracks. We assume that there is a local track, i, whose measurement prediction at time k is $\hat{z}_i(k|k-1)$. The statistical distance between this prediction and measurement target j is defined here:

$$d_{ij}^2 = \left[z_j(k) - \hat{z}_i(k|k-1)\right]^T S_{ij}^{-1} \left[z_j(k) - \hat{z}_i(k|k-1)\right]$$

where $z_j(k)$ is the value of measurement target j; $S_{ij}$ is the covariance matrix of the innovation; and $d_{ij}^2$ is the weighted norm of the innovation vector, which can be understood as the statistical distance between the measurement prediction of the local track and the measurement target.
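The statistical distance can be illustrated with a minimal sketch for a two-dimensional measurement. The function name `innovation_distance` and the numeric values are hypothetical and chosen only for illustration; the 2x2 inverse is written out by hand so that the example needs no external library.

```python
def innovation_distance(z, z_pred, S):
    """Squared statistical distance d^2 = v^T S^-1 v for a 2-D innovation v = z - z_pred."""
    v0, v1 = z[0] - z_pred[0], z[1] - z_pred[1]
    (a, b), (c, d) = S
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("innovation covariance must be positive definite")
    # w = S^-1 v, using the closed-form inverse of a 2x2 matrix
    w0 = (d * v0 - b * v1) / det
    w1 = (-c * v0 + a * v1) / det
    return v0 * w0 + v1 * w1


# Illustrative values (assumed): measurement, predicted measurement, innovation covariance
d2 = innovation_distance(z=(10.5, 1.0), z_pred=(10.0, 0.0),
                         S=((0.25, 0.0), (0.0, 1.0)))
print(d2)
```

A larger innovation covariance in one direction reduces the contribution of the error in that direction, which is why this distance is preferred over a plain Euclidean distance for association.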
The statistical distance is taken as the association cost, the Kuhn-Munkres algorithm is used for the assignment, and the association between tracks and measurements is completed according to the global nearest neighbor idea [19]. After the association is completed, the relationship between measurements and tracks falls into three categories, namely: the measurement and track are successfully associated, the track is not associated with any measurement, or the measurement is not associated with any track. These association results are fed into the track management section.
The main function of track management is to manage the generation, maintenance, and disappearance of tracks. Track management can address the problems of false measurements and missed target detections [20]. A measurement target that is not associated with any track generates a temporary track. A temporary track that is successfully associated over multiple consecutive frames is considered to be a real target and becomes a confirmed track, which is also the local track. For a confirmed track, if no measurement is associated over multiple consecutive frames, the track is considered dead and is discarded [21]. After the rule judgment, the states of the confirmed tracks and temporary tracks are updated to obtain the optimal state estimate. A track that is determined to be dead is deleted from the track list without any state update.
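The three association outcomes described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: for brevity it finds the minimum-cost assignment by brute force over permutations instead of the Kuhn-Munkres algorithm (both return an optimal assignment, but brute force is only practical for a handful of targets). The function name `gnn_associate`, the gate value, and the cost matrix are assumptions.

```python
from itertools import permutations


def gnn_associate(cost, gate):
    """Global nearest neighbor association with gating.

    cost[i][j] is the statistical distance between track i and measurement j.
    Returns (pairs, unassociated_tracks, unassociated_measurements).
    """
    n_tracks = len(cost)
    n_meas = len(cost[0]) if n_tracks else 0
    size = max(n_tracks, n_meas)

    def padded(i, j):
        # Dummy rows/columns and out-of-gate pairs all cost `gate`,
        # so an out-of-gate pairing is never cheaper than "no pairing".
        if i < n_tracks and j < n_meas and cost[i][j] <= gate:
            return cost[i][j]
        return gate

    best = min(permutations(range(size)),
               key=lambda p: sum(padded(i, p[i]) for i in range(size)))
    pairs = [(i, best[i]) for i in range(n_tracks)
             if best[i] < n_meas and cost[i][best[i]] <= gate]
    used_tracks = {i for i, _ in pairs}
    used_meas = {j for _, j in pairs}
    lost_tracks = [i for i in range(n_tracks) if i not in used_tracks]
    new_meas = [j for j in range(n_meas) if j not in used_meas]
    return pairs, lost_tracks, new_meas


# Illustrative distances for three tracks and three measurements (assumed values)
cost = [[1.0, 9.0, 50.0],
        [8.0, 2.0, 40.0],
        [60.0, 70.0, 30.0]]
pairs, lost_tracks, new_meas = gnn_associate(cost, gate=9.21)
print(pairs, lost_tracks, new_meas)
```

In this example, track 2 and measurement 2 fall outside the gate, so the track counts as unassociated (a candidate for deletion after several frames) and the measurement would start a temporary track.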
State estimation is divided into state prediction, measurement prediction, and state update, as in the Kalman filter [22]. Assuming that the acceleration of the target is constant over a short time, a constant-acceleration motion model can be established. The target state vector is

$$X = [x, \dot{x}, \ddot{x}, y, \dot{y}, \ddot{y}]^T$$

where $(x, y)$ is the position vector, $(\dot{x}, \dot{y})$ is the velocity vector, and $(\ddot{x}, \ddot{y})$ is the acceleration vector. Then, the motion state model can be obtained

$$X(k+1) = F X(k) + G W(k)$$

where $F$ is the state transition matrix, $G$ is the noise matrix, $W(k) = [w_x, w_y]^T$ is the white noise sequence in the discrete model, and $w_x$ and $w_y$ correspond to the target's noise "jerk" along the x- and y-axis, respectively.
The MMW radar can detect the target distance, azimuth, and relative velocity, and its measurement vector can be expressed as follows

$$z_r = [S, \theta, v]^T$$

The corresponding measurement model is as follows:

$$z_r(k) = h_r(X(k)) + \upsilon_r(k)$$

where $h_r$ is the nonlinear measurement function and $\upsilon_r = [\upsilon_S, \upsilon_\theta, \upsilon_v]^T$ represents the measurement white noise sequence. Because of the nonlinearity of the measurement model, an extended Kalman filter is used to estimate the track state of the MMW radar.
The camera can detect the target distance in the longitudinal and lateral directions, and its measurement vector can be expressed as follows

$$z_c = [x, y]^T$$

The corresponding measurement model is as follows:

$$z_c(k) = H_c X(k) + \upsilon_c(k)$$

where $\upsilon_c = [\upsilon_x, \upsilon_y]^T$ represents the measurement white noise sequence. $H_c$ is the measurement matrix

$$H_c = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}$$

Both the motion state model and the measurement model are linear, so the track state of the camera is estimated through a linear Kalman filter.
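As a hedged illustration of the constant-acceleration prediction step, the sketch below assumes the state ordering [x, vx, ax, y, vy, ay] and a sampling period T; the ordering, the function names, and the numbers are assumptions made for this example. Only the noise-free part of the prediction is shown, together with the linear camera measurement prediction that picks the two position components.

```python
def ca_transition(T):
    """6x6 constant-acceleration transition matrix for state [x, vx, ax, y, vy, ay]."""
    block = [[1.0, T, 0.5 * T * T],
             [0.0, 1.0, T],
             [0.0, 0.0, 1.0]]
    F = [[0.0] * 6 for _ in range(6)]
    for axis in range(2):          # same 3x3 block for the x-axis and the y-axis
        for r in range(3):
            for c in range(3):
                F[3 * axis + r][3 * axis + c] = block[r][c]
    return F


def predict_state(F, x):
    """State prediction without process noise: x(k+1|k) = F x(k|k)."""
    return [sum(F[r][c] * x[c] for c in range(6)) for r in range(6)]


def camera_measurement(x):
    """Linear camera measurement prediction: the longitudinal and lateral position."""
    return [x[0], x[3]]


# Illustrative target: 20 m ahead, closing at 2 m/s, accelerating at 0.5 m/s^2
state = [20.0, -2.0, 0.5, 1.5, 0.0, 0.0]
predicted = predict_state(ca_transition(0.1), state)
print(predicted, camera_measurement(predicted))
```

The radar measurement prediction would additionally need a nonlinear function of the state (range, azimuth, relative velocity), which is why that sub-filter uses the extended Kalman filter.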

Fusion Center Module
In the fusion center module, the camera detection cycle is taken as the fusion time node. In other words, in each camera detection cycle, the fusion center module starts to run after the camera tracking processing module has finished. First, the local track information of the two sensors is transformed into the same coordinate system to ensure spatial registration. In this paper, the motion coordinate system of the MMW radar is used as the fusion track coordinate system, so only the local track information of the camera needs to be converted. This only involves the conversion of a two-dimensional Cartesian coordinate system and is not described in detail here. Moreover, every time the camera outputs a set of CAN messages, the track information obtained from the MMW radar in the previous N cycles is fitted with a quadratic curve according to the least squares method. Then, the fitted curve is extrapolated to the current time node of the camera in order to obtain the estimated value of the MMW radar track information [23]. This completes the temporal-spatial alignment. The following parts mainly design the track-track association and the fusion track state estimation.
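The temporal alignment step can be sketched as a quadratic least squares fit followed by extrapolation. This is a minimal sketch under stated assumptions: the function names, the 50 ms radar cycle, the window of N = 5 samples, and the sample values are all illustrative, and time stamps are taken relative to the latest radar sample to keep the normal equations well conditioned.

```python
def fit_quadratic(ts, ys):
    """Least squares fit of y = a + b*t + c*t^2; returns (a, b, c)."""
    # Normal equations (A^T A) p = A^T y for design rows [1, t, t^2]
    s = [sum(t ** k for t in ts) for k in range(5)]
    rhs = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]
    m = [[s[r + c] for c in range(3)] + [rhs[r]] for r in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("need at least three distinct time stamps")
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def extrapolate(coeffs, t):
    """Evaluate the fitted quadratic at time t."""
    a, b, c = coeffs
    return a + b * t + c * t * t


# Last N = 5 radar samples of one track component (assumed 50 ms apart)
radar_times = [-0.20, -0.15, -0.10, -0.05, 0.0]
radar_range = [30.0 - 2.0 * t + 0.5 * t * t for t in radar_times]
coeffs = fit_quadratic(radar_times, radar_range)
aligned = extrapolate(coeffs, 0.03)   # camera time node 30 ms after the last radar sample
print(aligned)
```

In practice the same fit would be repeated for each component of the radar track state that needs to be aligned to the camera time node.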

Association Algorithm Design
The measurement information of the two sensors is processed by their respective tracking processing modules in order to obtain effective targets, namely the local tracks. Intuitively, if the effective targets of the two sensors are close enough together, the two targets can be considered to be associated. This is an association method based on the location threshold. For the MMW radar and the camera used in intelligent vehicles, the longitudinal distance measurement of the MMW radar is in general relatively accurate and its lateral distance measurement relatively rough, while the camera is just the opposite. This leads to a large deviation between the position of the MMW radar target and that of the camera target, and the greater the distance between the target and the ego vehicle, the greater the deviation.
It is difficult to get good association results only depending on the position threshold, so the motion state information of the target can be further considered. The tracking processing module outputs the local track information, which can be used to compare the similarity of the motion state between the different sensor targets. The track information is used to determine the degree of association, and commonly-used methods include weighted track association [24], etc. However, when the environment is complex and there are many targets, the correlation performance of the weighted track association method will decrease, and there will be many errors and omissions in the associated track.
After comprehensive consideration, this paper designs a two-level association structure, as shown in Figure 3. Firstly, the regional collision association algorithm is designed based on the idea of a location threshold. Then, the unassociated local tracks are input into the weighted track association part. Because one association has been passed, the number of targets that need to be associated is decreased, the environmental complexity is reduced, and the weighted track association can play a better role.

Regional Collision Association
The selection of the position threshold is related to the state uncertainty of the local track, which is expressed by the state covariance in the state estimation process. In this paper, the rotation in the target motion process is ignored, and a rectangular uncertain region is established with the current local track position as its center, as shown in Figure 4. The length and width of the uncertain region are related to the position standard deviation of the local track in the longitudinal and lateral directions, respectively. For the local track, i, with state X and state covariance P, the length and width of the uncertain region are, respectively, as follows

$$L_i = K_{GL}\,\sigma_x, \qquad W_i = K_{GW}\,\sigma_y$$

where $\sigma_x$ and $\sigma_y$ are the longitudinal and lateral position standard deviations taken from P, and $K_{GL}$ and $K_{GW}$ are constants. Because there is a two-level association, the first-level association can select a smaller threshold to ensure that the association is valid.

Regional collision association means that if the uncertain regions of two local tracks belonging to different sensors intersect, the two local tracks are determined to be associated. The pseudocode to execute the regional collision association is shown in Algorithm 1.

Algorithm 1 Regional Collision Association
...
3: if the uncertain regions of the two local tracks do not overlap then
4: return false
5: return true

In order to quantify the degree of association between two local tracks, the concept of the Jaccard coefficient is cited. The Jaccard coefficient refers to the ratio between the intersection and union of two sets. Here, the uncertain region area of the local track is used as the set, and the expression can be obtained as follows

$$J = \frac{S_r \cap S_c}{S_r \cup S_c}$$

where J is called the association similarity index, which represents the association degree of two local tracks. $S_r$ and $S_c$ represent the uncertain region area of the local tracks of the MMW radar and camera, respectively.
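The overlap test and the association similarity index can be sketched for axis-aligned rectangles as follows. This is a minimal sketch, not the authors' Algorithm 1: it assumes that the region length and width are the constants K_GL and K_GW multiplied by the longitudinal and lateral position standard deviations, and all function names and numeric values are hypothetical.

```python
def uncertain_region(x, y, sigma_x, sigma_y, k_gl, k_gw):
    """Axis-aligned rectangle (x_min, x_max, y_min, y_max) centered on the track position.

    Assumption: length = k_gl * sigma_x (longitudinal), width = k_gw * sigma_y (lateral).
    """
    half_l = 0.5 * k_gl * sigma_x
    half_w = 0.5 * k_gw * sigma_y
    return (x - half_l, x + half_l, y - half_w, y + half_w)


def overlap_area(a, b):
    """Area of the intersection of two rectangles, 0.0 if they do not collide."""
    dx = min(a[1], b[1]) - max(a[0], b[0])
    dy = min(a[3], b[3]) - max(a[2], b[2])
    return dx * dy if dx > 0.0 and dy > 0.0 else 0.0


def jaccard(a, b):
    """Association similarity index J = intersection area / union area."""
    inter = overlap_area(a, b)
    area_a = (a[1] - a[0]) * (a[3] - a[2])
    area_b = (b[1] - b[0]) * (b[3] - b[2])
    union = area_a + area_b - inter
    return inter / union if union > 0.0 else 0.0


# Radar: accurate longitudinally, rough laterally; camera: the opposite (assumed values)
radar = uncertain_region(20.0, 1.0, sigma_x=0.5, sigma_y=1.0, k_gl=4.0, k_gw=4.0)
camera = uncertain_region(21.0, 1.2, sigma_x=1.5, sigma_y=0.3, k_gl=4.0, k_gw=4.0)
associated = overlap_area(radar, camera) > 0.0
J = jaccard(radar, camera)
print(associated, J)
```

Two regions that only touch along an edge have zero overlap area and are treated as not colliding in this sketch; whether that boundary case should count as an association is a design choice.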
FusionLife is set up to indicate the stability of the fusion track. When the fusion track is initially formed, the FusionLife value is 0. If the local tracks corresponding to the fusion track remain associated over the subsequent consecutive fusion time nodes, the FusionLife value is accumulated. When FusionLife reaches the set threshold, FusionLifeMax, the fusion track is considered relatively stable. From then on, at later fusion time nodes, the corresponding local tracks no longer need to participate in the association algorithm and can be used directly to update the state of the fusion track. If FusionLife is less than FusionLifeMax at a certain fusion time node and the corresponding two local tracks are not associated, the fusion track dies out.
For the fusion track obtained through regional collision association, the FusionLife value accumulation mode is as follows

$$\mathrm{FusionLife} = \mathrm{FusionLife} + \mu_{area}\,J$$

where $\mu_{area}$ is a constant coefficient.

Weighted Track Association
It is assumed that for an MMW radar and a camera, there are local tracks, i and j, respectively. From the previous tracking processing modules, the state estimates of the two local tracks are $\hat{X}_i(k|k)$ and $\hat{X}_j(k|k)$, and their state covariances are $P_i(k|k)$ and $P_j(k|k)$. The state estimation difference of the two tracks is expressed as follows:

$$\Delta\hat{X}_{ij}(k|k) = \hat{X}_i(k|k) - \hat{X}_j(k|k)$$

The null hypothesis and alternative hypothesis are established, and the track association problem is transformed into a hypothesis testing problem.
$H_0$: $\hat{X}_i(k|k)$ and $\hat{X}_j(k|k)$ are track state estimates of the same target, namely, tracks i and j are associated;
$H_1$: $\hat{X}_i(k|k)$ and $\hat{X}_j(k|k)$ are not track state estimates of the same target, namely, tracks i and j are not associated.
It is assumed that the state errors of the local tracks of the same target are statistically independent. Under the $H_0$ assumption, the covariance of the state estimation difference of tracks i and j can be expressed as

$$C_{ij}(k|k) = P_i(k|k) + P_j(k|k)$$

The statistical value of the weighted track association is as follows

$$\alpha_{ij}(k) = \Delta\hat{X}_{ij}(k|k)^T\, C_{ij}(k|k)^{-1}\, \Delta\hat{X}_{ij}(k|k)$$

Under the $H_0$ assumption, the state estimation difference $\Delta\hat{X}_{ij}(k|k)$ obeys a Gaussian distribution, and the statistical value $\alpha_{ij}(k)$ obeys a chi-square distribution. The chi-square distribution association threshold $\gamma$ is selected. When $\alpha_{ij}(k)$ is less than $\gamma$, the hypothesis $H_0$ is accepted, and tracks i and j are considered to be associated. Otherwise, the hypothesis $H_1$ is accepted, and tracks i and j are considered to be unassociated.
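The hypothesis test can be sketched for a reduced two-component state as follows. The reduction to two components, the function name, and the numeric values are assumptions made to keep the example short; the threshold 5.991 is the 95% quantile of the chi-square distribution with two degrees of freedom, and a different state dimension or confidence level would need a different table value.

```python
GAMMA_95_DOF2 = 5.991  # chi-square 0.95 quantile for 2 degrees of freedom


def association_statistic(x_i, x_j, p_i, p_j):
    """alpha = dX^T (P_i + P_j)^-1 dX for two-component state estimates."""
    d0, d1 = x_i[0] - x_j[0], x_i[1] - x_j[1]
    # Covariance of the difference under the independence assumption
    a = p_i[0][0] + p_j[0][0]
    b = p_i[0][1] + p_j[0][1]
    c = p_i[1][0] + p_j[1][0]
    d = p_i[1][1] + p_j[1][1]
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("difference covariance must be positive definite")
    # Quadratic form with the closed-form inverse of the 2x2 covariance
    return (d * d0 * d0 - (b + c) * d0 * d1 + a * d1 * d1) / det


# Illustrative radar (i) and camera (j) position estimates with their covariances
alpha = association_statistic(
    x_i=(20.0, 1.0), x_j=(20.8, 1.3),
    p_i=((0.04, 0.0), (0.0, 0.64)),
    p_j=((0.36, 0.0), (0.0, 0.09)),
)
is_associated = alpha < GAMMA_95_DOF2   # accept H0 when alpha is below the threshold
print(alpha, is_associated)
```

Because the statistic is normalized by the summed covariances, a lateral offset that would be large for the camera alone is tolerated when the radar's lateral uncertainty is large.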
For the fusion track obtained through the weighted track association, the FusionLife value is accumulated in the form of $\mathrm{FusionLife} = \mathrm{FusionLife} + \xi_{Weight}$, where $\xi_{Weight}$ is a constant. The association quality is not evaluated here, so only the set constant $\xi_{Weight}$ is used as the increment of FusionLife.
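The FusionLife bookkeeping for both association levels can be sketched as a small state update. This is a hedged illustration: the class and function names are hypothetical, the parameter values are arbitrary, and the regional-collision increment is assumed to be the constant coefficient mu_area multiplied by the similarity index J.

```python
from dataclasses import dataclass


@dataclass
class FusionTrack:
    fusion_life: float = 0.0   # FusionLife starts at 0 when the fusion track is formed
    alive: bool = True


def update_fusion_life(track, association, life_max, mu_area, xi_weight, jaccard=0.0):
    """Update FusionLife at one fusion time node.

    association is "regional", "weighted", or None (local tracks not associated).
    """
    if track.fusion_life >= life_max:
        # Stable fusion track: local tracks are fused directly, no association needed
        return track
    if association == "regional":
        track.fusion_life += mu_area * jaccard   # assumed form: quality-weighted increment
    elif association == "weighted":
        track.fusion_life += xi_weight           # constant increment, quality not evaluated
    else:
        track.alive = False                      # unstable and unassociated: track dies out
    return track


params = dict(life_max=3.0, mu_area=4.0, xi_weight=1.0)
stable = FusionTrack()
update_fusion_life(stable, "weighted", **params)
update_fusion_life(stable, "regional", jaccard=0.5, **params)
update_fusion_life(stable, None, **params)      # already stable, so it survives

fragile = FusionTrack()
update_fusion_life(fragile, "weighted", **params)
update_fusion_life(fragile, None, **params)     # not yet stable, so it dies out
print(stable, fragile)
```

The example shows the intended effect: a fusion track that has accumulated enough life tolerates a later association failure, while a young one is removed.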

Fusion Track State Estimation
A federated filter is applied to the distributed fusion structure [25], where it connects the state estimation parts of the tracking processing modules and the fusion center module. A federated filter can generally be divided into four basic structures, namely: fusion-reset mode, zero-reset mode, no-reset (non-reset) mode, and rescale mode. In the non-reset mode, there is no information reset from the master filter to the sub-filters, so the sub-filters do not pollute each other. The non-reset mode is fast in computation and strong in fault tolerance. This paper designs a fusion track state estimation method based on the non-reset federated filter structure, as shown in Figure 5. The figure shows the fusion state estimation for a single target only, where $z_r$ and $z_c$ are the radar measurement and camera measurement associated with the target, respectively. $\hat{X}_r$, $P_r$ and $\hat{X}_c$, $P_c$ represent the state estimation output of the two sub-filters, and $\hat{X}_g$, $P_g$ represent the state estimation output of the master filter, which is also the state information of the fusion track. The extended Kalman filter corresponds to the state estimation of the MMW radar track, and the Kalman filter corresponds to the state estimation of the camera track. Both have been designed in the tracking processing module.
The workflow of the federated filter includes the initial information determination, information allocation, time update, measurement update, and information fusion. Among them, the time update and measurement update of the two sub-filters belong to the state estimation part of the tracking processing modules and are not detailed here.
The target motion model adopted by the MMW radar and the camera is the same, and the target state format output by the sub-filters is the same. Therefore, the output state estimation information of the two sub-filters can be integrated into the master filter. At the initial time of fusion, the system needs to determine the initial information.
The global estimation error covariance $P_g^0$ and system process noise $Q_g^0$ at the initial moment can be calculated from Equations (17) and (18).
For a non-reset federated filter, the information is allocated only at the initial time. The initial information is generally distributed evenly. However, because the measurement accuracies of the MMW radar and the camera differ, distributing the initial information equally would reduce the global estimation accuracy. Therefore, $\Lambda_r$ and $\Lambda_c$ are defined as the sums of the singular values of the state covariance of the two sub-filters, respectively, and they are used to calculate the two information allocation coefficients.
$\beta_1^0$ and $\beta_2^0$ represent the initial information allocation coefficients of the MMW radar and camera local tracks, respectively. The coefficients are used to allocate information to the sub-filters and to update their initial information; the index i = 1 denotes the local track information of the MMW radar, and i = 2 denotes the local track information of the camera. In the information fusion part, the local state estimates obtained by the two independent sub-filters are fused to obtain the global optimal estimate.
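The information fusion step can be illustrated with a short sketch. It assumes the information-weighted combination commonly used in federated filters, P_g = (P_r^-1 + P_c^-1)^-1 and X_g = P_g (P_r^-1 X_r + P_c^-1 X_c); this form is an assumption of the example rather than a quotation of the paper's fusion equations, and the two-component state, function names, and numbers are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested sequences."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def fuse(x_r, p_r, x_c, p_c):
    """Information-weighted fusion of two local estimates (assumed federated form)."""
    info_r, info_c = inv2(p_r), inv2(p_c)
    info_g = [[info_r[r][c] + info_c[r][c] for c in range(2)] for r in range(2)]
    p_g = inv2(info_g)
    y_r, y_c = matvec(info_r, x_r), matvec(info_c, x_c)
    x_g = matvec(p_g, [y_r[0] + y_c[0], y_r[1] + y_c[1]])
    return x_g, p_g


# State = (longitudinal, lateral) position; radar is precise longitudinally,
# the camera laterally (assumed covariances)
x_g, p_g = fuse(x_r=[20.0, 1.0], p_r=[[0.1, 0.0], [0.0, 0.9]],
                x_c=[21.0, 1.4], p_c=[[0.9, 0.0], [0.0, 0.1]])
print(x_g, p_g)
```

With these values the fused longitudinal position stays close to the radar estimate and the fused lateral position stays close to the camera estimate, which matches the behavior of the fusion track reported in the experiments.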

Experimental Result
In this paper, the ego vehicle was equipped with a Delphi multimode electronically scanning radar (ESR) and a camera with a Mobileye Q3 chip. The MMW radar was installed in the middle of the front bumper of the ego vehicle, and the camera was installed on the windshield, in the longitudinal symmetry plane of the ego vehicle, on the cab side. An inertial navigation system was also installed to detect the ego vehicle status. The sensor fusion experiment was carried out in an urban road environment, including streets, expressways, tunnels, etc., as shown in Figure 6. Several typical working conditions were selected from the experimental data for the analysis.


Single Target Fusion
In Figure 6a, there is only one vehicle target, which is detected by two sensors, and the fusion algorithm is run. Figure 7 shows the fusion experiment results. The whole experiment was divided into two sections, with a boundary of 28 s. The first section was gradually close to ego vehicle, while the second section was gradually away from ego vehicle, as the speed increased. During the whole experiment, the millimeter wave radar track was more stable in a longitudinal direction, and the camera track was more stable in a lateral direction, which is consistent with the characteristics of the two sensors. After the fusion algorithm, the state information of the fusion track was obtained. The

Single Target Fusion
In Figure 6a, there is only one vehicle target, which is detected by both sensors while the fusion algorithm runs. Figure 7 shows the results of the fusion experiment. The experiment is divided into two phases, with the boundary at 28 s. In the first phase, the target gradually approached the ego vehicle; in the second phase, it gradually moved away from the ego vehicle as its speed increased. Throughout the experiment, the MMW radar track was more stable in the longitudinal direction, and the camera track was more stable in the lateral direction, which is consistent with the characteristics of the two sensors. The fusion algorithm then produced the state information of the fusion track: its longitudinal information was closer to that of the MMW radar, and its lateral information was closer to that of the camera. The fusion track was relatively stable and smooth, and it was free of most of the spikes seen in the sensor tracks. Within 0~10 s, the two local tracks deviated considerably in longitudinal distance, but across all of the state components their motion trend was the same, so the weighted track association could associate them with the same target. After 10 s, the positions of the two local tracks were similar, and the association could be realized through the regional collision association. As time passed, the fusion track gradually became stable, and the corresponding local tracks could be fused directly, without any further association.
Figure 6b shows the multi-target motion condition on an urban expressway, and Figure 8 shows the fusion experiment results under this condition. The ego vehicle started in the middle lane, changed to the right lane after about 5 s, and then kept driving straight. Each target track in the figure is drawn as a curve of a different color and is marked with a serial number for convenient analysis. After about 35 s, the radar had detected six targets, the camera had detected seven targets, and the fusion algorithm had confirmed the existence of eight targets. Among them, the No. 1, 2, 3, 4, and 7 targets were detected by both sensors, the No. 8 target was detected only by the MMW radar, and the No. 5 and No. 6 targets were detected only by the camera. For the No. 2 target, the camera tracking failed within 2 to 3 s, and the MMW radar tracking failed within 4.5 to 5.7 s. Therefore, the local track information of the MMW radar and the camera was used in the early stage, and the target was later maintained as a fusion track. For the No. 7 target, only the MMW radar was tracking it within 13.5~22 s; the camera formed a confirmed track at 22 s, and the two tracks were then associated into a fusion track. Because this target was relatively far away and affected by the light, the camera had a large error in the longitudinal distance. At the initial stage of the fusion, the positions detected by the camera and the MMW radar were still close, and they could be linked through the regional collision association. When a large error later appeared in the camera detection, the two corresponding local tracks could still be used directly for fusion because of the fusion track stability setting. The No. 4 and No. 5 targets appeared only for a short time because of the occlusion between the targets.
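The detection counts reported for the multi-target experiment can be checked with simple set bookkeeping. The following is only an illustrative sketch: the target serial numbers come from the text above, while the function name `summarize_detections` and the dictionary keys are hypothetical and are not part of the proposed fusion algorithm.

```python
from typing import Dict, Set


def summarize_detections(radar_ids: Set[int], camera_ids: Set[int]) -> Dict[str, Set[int]]:
    """Split target serial numbers by which sensor(s) detected them."""
    return {
        "both": radar_ids & camera_ids,         # candidates for fusion tracks
        "radar_only": radar_ids - camera_ids,   # kept as MMW radar local tracks
        "camera_only": camera_ids - radar_ids,  # kept as camera local tracks
        "confirmed": radar_ids | camera_ids,    # all targets known to the fusion center
    }


# Serial numbers as reported for the multi-target experiment (Figure 8).
radar_ids = {1, 2, 3, 4, 7, 8}
camera_ids = {1, 2, 3, 4, 5, 6, 7}
summary = summarize_detections(radar_ids, camera_ids)
```

With these inputs, the radar contributes six targets, the camera seven, and the union contains eight, which matches the numbers stated in the text.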

Application of Sensor Fusion
We refer to the target closest to the ego vehicle in each lane as the dangerous target of that lane; these targets have a significant impact on the decisions of the intelligent vehicle control system. Adaptive cruise control and autonomous emergency braking subsystems need to know in a timely manner whether there are dangerous vehicles in front of the ego vehicle, such as a vehicle in the main lane in which the ego vehicle is located or a vehicle cutting in from a side lane [26]. This paper presents an example of dangerous target selection when a target cuts into and out of the main lane, as shown in Figure 6c. The same method can also be used to screen the dangerous targets of the side lanes.
In the experiment results shown in Figure 9, the No. 1 target was driving in the main lane, and the No. 3 target was driving in the right lane. The No. 2 target was first detected in the right lane at 7.4 s; it then cut into the main lane and cut back into the right lane after passing the No. 3 target. During the overtaking period, the No. 2 target obscured the No. 1 target, and the detection performance of the MMW radar was better than that of the camera. The MMW radar tracks could therefore be used to maintain the target information, which also shows an advantage of sensor fusion. When targets were detected by both sensors, the fusion tracks showed a better overall performance. With the help of the lane information provided by the camera, we could accurately judge the cut-in and cut-out times of the No. 2 target and change the dangerous target of the main lane in time.
Figure 9. Experimental results of overtaking conditions. The meaning of each coordinate diagram is the same as that in Figure 7.
Figure 10 shows the dangerous target state obtained using the MMW radar, the camera, and the fusion target, respectively. Compared with a single sensor, the fusion target provides more accurate state information. By playing back the collected video, we can see that the switching time of the dangerous target after fusion processing is more consistent with the actual situation. After the fusion algorithm, the dangerous target state curve is also more stable, which can provide more accurate target state information for the control system.
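The dangerous target definition used in this section (the target closest to the ego vehicle in each lane) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the `FusedTarget` type, its field names, the lane labels, and the rule that only targets ahead of the ego vehicle are considered are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class FusedTarget:
    target_id: int         # serial number of the fusion track
    lane: str              # hypothetical lane label: "main", "left", or "right"
    longitudinal_m: float  # longitudinal distance ahead of the ego vehicle, in metres


def select_dangerous_targets(targets: Iterable[FusedTarget]) -> Dict[str, FusedTarget]:
    """Return, for each lane, the target ahead of the ego vehicle that is closest to it."""
    nearest: Dict[str, FusedTarget] = {}
    for target in targets:
        if target.longitudinal_m <= 0.0:
            continue  # assumption: targets beside or behind the ego vehicle are ignored
        best = nearest.get(target.lane)
        if best is None or target.longitudinal_m < best.longitudinal_m:
            nearest[target.lane] = target
    return nearest
```

For example, with made-up distances in which the No. 1 target is 40 m ahead in the main lane and the No. 2 target cuts into the main lane at 18 m, the function returns the No. 2 target as the dangerous target of the main lane; once the No. 2 target is assigned to the right lane again, the No. 1 target is selected. Because the lane label is evaluated per update, the switching time depends directly on the lane information, which in this paper is provided by the camera.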

Conclusions
In this paper, an algorithm framework for target-level fusion of an MMW radar and a camera is designed. Combining the regional collision association and the weighted track association, a two-level structure is proposed for local track association. Based on the non-reset federated filter, the state estimation of the fusion track is completed. Single-target fusion, multi-target fusion, and the application of sensor fusion to dangerous target screening are selected for experimental verification. In all of the experiments, the association of different local tracks of the same target is good, and the overall performance of the fusion track state estimation is better than that of a single sensor. In the experiment of selecting dangerous targets, the fusion algorithm switches the dangerous target more accurately and in a more timely manner. In the future, more accurate sensors could be used to measure the target state information as a reference for the true value, so that the accuracy of the fusion track can be analyzed quantitatively.

Funding: This work was supported in part by the National Natural Science Foundation of China (grant number 51505354).