Robust Cooperative Multi-Vehicle Tracking with Inaccurate Self-Localization Based on On-Board Sensors and Inter-Vehicle Communication

The fusion of on-board sensors and information transmitted via inter-vehicle communication has proven to be an effective way to increase the perception accuracy and extend the perception range of connected intelligent vehicles. Current approaches rely heavily on accurate self-localization of both the host and cooperative vehicles. However, such information is not always available or accurate enough for effective cooperative sensing. In this paper, we propose a robust cooperative multi-vehicle tracking framework suitable for situations in which the self-localization information is inaccurate. Our framework consists of three stages. First, each vehicle perceives its surrounding environment based on the on-board sensors and exchanges its local tracks through inter-vehicle communication. Then, an algorithm based on Bayes inference is developed to match the tracks from the host and cooperative vehicles and simultaneously optimize the relative pose. Finally, the tracks associated with the same target are fused by fast covariance intersection based on information theory. Simulation results based on both synthesized data and a high-quality physics-based platform show that our approach successfully implements cooperative tracking without the assistance of accurate self-localization.


Introduction
Nowadays, intelligent vehicles equipped with advanced driver assistance systems (ADASs) can perceive other road participants and obstacles, including vehicles, pedestrians, etc., through on-board sensors. A variety of on-board sensors, such as cameras, Lidar, and Radar, have been widely applied to achieve this goal. The perception system [1,2] of an intelligent vehicle captures measurements of surrounding targets through these sensors and builds an environmental model which reflects the real states of the different targets.
Multi-vehicle tracking (or more generally, multi-object tracking, MOT) is a crucial perception task, since an accurate estimate of surrounding vehicles plays an important role in subsequent collision avoidance and route planning. The main challenge in MOT is the determination of the association between measurements and targets. In the literature, extensive algorithms have been proposed to handle the data association problem. Multiple hypothesis tracking (MHT) [3] is known as a powerful algorithm to address the data association problem in MOT. Although MHT is Bayesian optimal in theory, the exact solution of MHT is computationally intractable, thus requiring proper approximation. Joint probabilistic data association (JPDA) is another classical approach. Beyond single-vehicle perception, existing cooperative sensing schemes typically assume accurate global positioning; however, when vehicles are surrounded by skyscrapers or tall buildings, the location information provided by GPS is not reliable. In such cases, inaccurate or even unavailable self-localization information seriously affects the performance of the cooperative tracking system.
To address the above problem and enhance the robustness of cooperative multi-vehicle tracking, we propose, in this paper, an integrated framework which can determine the relative pose (including translation and orientation) of the host and cooperative vehicles from their respective local tracks, instead of from global positioning information. Moreover, this work concentrates on the dynamic situation in which the relative translation and orientation between the host and cooperative vehicles change with time. This is more consistent with real traffic environments, where vehicles usually drive with different intentions. Consequently, our cooperative multi-vehicle tracking system still works without the assistance of accurate self-localization information.
It should be emphasized that in the literature of wireless sensor networks (WSNs), some target tracking algorithms [23][24][25] have been developed for simultaneous localization and tracking (SLAT). In [24], a Bayesian filtering framework was proposed to infer the joint posterior distribution of both the target and multiple sensors. A variational method [26] was used to approximate the joint state during the measurement correction stage. In [25], a dynamic non-parametric belief propagation (DNBP) method was proposed for cooperative vehicle sensing. However, most works in SLAT track a single target by using multiple static or moving sensors, which limits their applicability in more complex scenarios where the number of targets varies over time.
The rest of this paper is organized as follows: In Section 2, we briefly review the adaptive GMPHD filter, which is the basic component for target tracking; in Section 3, we present our framework for cooperative tracking and propose a Bayes model for simultaneous track association and relative pose estimation; in Section 4, we report simulation results based on both synthesized data and the PreScan-based system; and finally, we provide conclusions and future work in Section 5.

PHD Filter Formulation
In the PHD filter, both the multi-target state and the set of measurements at each time step are represented by random finite sets (RFSs). The PHD filter approximates the multi-target Bayes filter by propagating the first-order moment. The recursive process of the PHD filter includes two steps, i.e., prediction and correction (or update), as follows:

1. PHD prediction

v_{k|k−1}(x) = γ_k(x) + ∫ [ P_{S,k}(ζ) f_{k|k−1}(x|ζ) + β_{k|k−1}(x|ζ) ] v_{k−1|k−1}(ζ) dζ,    (1)

where v_{k|k−1}(x) and v_{k−1|k−1}(ζ) represent the predicted intensity at time k and the posterior intensity at time k − 1, respectively; P_{S,k}(ζ) is the survival probability when the target state is ζ; f_{k|k−1}(x|ζ) is the single-target transition density; and β_{k|k−1}(x|ζ) and γ_k(x) denote the intensities of the spawning targets and newborn targets, respectively.

2. PHD correction

v_{k|k}(x) = [1 − P_{D,k}(x)] v_{k|k−1}(x) + Σ_{z∈Z_k} [ P_{D,k}(x) g_k(z|x) v_{k|k−1}(x) ] / [ κ_k(z) + ∫ P_{D,k}(ξ) g_k(z|ξ) v_{k|k−1}(ξ) dξ ],    (2)

where Z_k = {z_{k,1}, z_{k,2}, . . . , z_{k,M_k}} denotes the measurement set at time k, M_k is the total number of measurements, P_{D,k}(x) is the detection probability when the target state is x, g_k(z|x) is the measurement likelihood, and κ_k(z) is the intensity of the clutter RFS.
The above two formulas are the basic recursive equations for PHD filtering. After the correction at each time step, the expected number of targets can be obtained by integrating the updated PHD intensity and then taking the nearest integer value. The state of each target can be obtained from the peaks of the updated PHD. It can be seen that the PHD filter avoids the data association problem. However, the PHD recursion involves multiple integrals that have no analytical solution in general and suffer from the curse of dimensionality when evaluated numerically.

Gaussian Mixture Implementation
In order to give a closed-form solution for the PHD recursion, the Gaussian mixture PHD (GMPHD) filter uses a set of Gaussian components to approximate the posterior intensity, where the weights, means, and covariances of the Gaussian components are continuously updated over time. Suppose that the posterior intensity at time k − 1 is expressed as follows:

v_{k−1|k−1}(x) = Σ_{i=1}^{J_{k−1|k−1}} w^{(i)}_{k−1} N(x | m^{(i)}_{k−1}, P^{(i)}_{k−1}),    (3)

where J_{k−1|k−1} represents the number of Gaussian components at time k − 1, and N(x|a, B) stands for the multivariate Gaussian distribution with mean a and covariance B. It is assumed that the transition probability density and the observation likelihood function also follow Gaussian distributions:

f_{k|k−1}(x|ζ) = N(x | F_{k−1} ζ, Q_{k−1}),    (4)

g_k(z|x) = N(z | H_k x, R_k),    (5)

where F_{k−1} and H_k are the linear state transition matrix and the linear observation matrix, respectively, and Q_{k−1} and R_k are the covariance matrices of the process noise and the measurement noise, respectively.
Substituting v_{k−1|k−1}(x) in Equation (3) into the PHD prediction and correction equations, we obtain the PHD recursion in Gaussian mixture form. In this work, the spawning target is ignored, so the prediction can be written as

v_{k|k−1}(x) = v_{S,k|k−1}(x) + γ_k(x),    (6)

where

v_{S,k|k−1}(x) = P_{S,k} Σ_{j=1}^{J_{k−1|k−1}} w^{(j)}_{k−1} N(x | F_{k−1} m^{(j)}_{k−1}, Q_{k−1} + F_{k−1} P^{(j)}_{k−1} F^T_{k−1}).    (7)

The correction step updates every predicted component with every measurement:

v_{k|k}(x) = [1 − P_{D,k}] v_{k|k−1}(x) + Σ_{z∈Z_k} Σ_{j=1}^{J_{k|k−1}} w^{(j)}_k(z) N(x | m^{(j)}_{k|k}(z), P^{(j)}_{k|k}),    (8)

where the updated weight w^{(j)}_k(z) is determined by the measurement likelihood, the detection probability, and the clutter intensity, and the updated mean m^{(j)}_{k|k}(z) and covariance P^{(j)}_{k|k} follow the standard Kalman update. After the GMPHD correction is completed, the Gaussian components with small weights are pruned and the Gaussian components close to each other are merged. Finally, in order to extract tracks, the means of the Gaussian components whose weights exceed a certain threshold are taken as the multi-object state estimates.
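The pruning and merging step described above can be sketched as follows. This is a minimal illustration with standard moment-matching merging; the thresholds are illustrative placeholders, not the paper's tuned parameters.

```python
import numpy as np

def prune_and_merge(components, truncate_thresh=1e-5, merge_thresh=4.0, max_components=100):
    """Prune low-weight GMPHD components and merge nearby ones.

    `components` is a list of (weight, mean, covariance). Components whose
    squared Mahalanobis distance to the strongest remaining component is
    below merge_thresh are combined into one moment-matched Gaussian.
    """
    comps = [c for c in components if c[0] > truncate_thresh]
    merged = []
    while comps:
        j = max(range(len(comps)), key=lambda i: comps[i][0])
        _, m_j, _ = comps[j]
        group, rest = [], []
        for c in comps:
            w, m, P = c
            d2 = (m - m_j) @ np.linalg.solve(P, m - m_j)
            (group if d2 <= merge_thresh else rest).append(c)
        w_new = sum(w for w, _, _ in group)
        m_new = sum(w * m for w, m, _ in group) / w_new
        # moment-matched covariance of the merged component
        P_new = sum(w * (P + np.outer(m - m_new, m - m_new)) for w, m, P in group) / w_new
        merged.append((w_new, m_new, P_new))
        comps = rest
    merged.sort(key=lambda c: -c[0])
    return merged[:max_components]
```

For example, two components centered at (0, 0) and (0.1, 0) merge into a single component whose weight is the sum of their weights, while a component at (10, 0) survives unchanged.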
For the application of GMPHD, the intensity of newborn targets γ_k(x) must be defined at each time step, indicating the possible states of new targets when they appear. In our case, target vehicles can enter the field of view (FOV) of the host and cooperative vehicles at different positions and times. As a result, it is infeasible to define γ_k(x) in advance. To address this problem, at each time step, we let γ_k(x) be driven by the observed measurements adaptively as follows:

γ_k(x) = Σ_{i=1}^{M_k} w_γ N(x | z̄_{k,i}, P_0),    (9)

where w_γ is the birth weight, P_0 denotes the initial covariance matrix, and z̄_{k,i} denotes the state constructed from the measurement z_{k,i}. In this way, the resulting adaptive GMPHD filter is applicable to the cooperative tracking situations of concern.
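A minimal sketch of this measurement-driven birth intensity follows. The birth weight and initial covariance below are illustrative assumptions, and the constructed state simply places the measured position with zero initial velocity.

```python
import numpy as np

def adaptive_birth_intensity(measurements, w_birth=1e-2, p0_diag=(10.0, 10.0, 25.0, 25.0)):
    """Build measurement-driven birth components (weight, mean, covariance).

    Each measurement z = (x, y) seeds one Gaussian component: the position
    is taken from z and the velocity is initialized to zero. w_birth and
    the initial covariance P0 are illustrative values, not the paper's.
    """
    P0 = np.diag(p0_diag)                      # initial covariance matrix P0
    components = []
    for z in measurements:
        m = np.array([z[0], z[1], 0.0, 0.0])   # state [x, y, vx, vy] built from z
        components.append((w_birth, m, P0))
    return components

births = adaptive_birth_intensity([(100.0, -50.0), (20.0, 30.0)])
print(len(births))   # one birth component per measurement → 2
```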
Cooperative Tracking with Inaccurate Self-Localization

Framework of Cooperative Tracking
We assume that many vehicles are moving in the environment according to their maneuvers. Among these vehicles, some can independently sense the other vehicles within a certain range by using on-board sensors and exchange their local information via inter-vehicle communication. The vehicle transmitting messages is called the cooperative vehicle, whereas the vehicle receiving messages and performing fusion is called the host vehicle. Certainly, a vehicle can send or receive messages depending on its role. The other vehicles not involved in cooperation are called target vehicles. This work concentrates on the situation where the relative translation and rotation between the host and cooperative vehicles are dynamically changing and inaccurate (or even unknown), thus preventing the direct application of existing technologies. To address the above problem and achieve sensor fusion for cooperative multi-vehicle tracking, we propose a novel framework depicted in Figure 1.

As shown in Figure 1, the host and cooperative vehicles first obtain their respective local tracks by running the adaptive GMPHD algorithm presented in Section 2. Then, the cooperative vehicle transmits its local tracks to the host vehicle, which estimates the relative translation and rotation between the two vehicles and simultaneously associates the local tracks from the two vehicles using the algorithm explained next. Finally, the matched tracks are fused by fast covariance intersection based on information theory (IT-FCI) [27].

Formulation
The relative pose estimation and track association problem in dynamic scenarios is shown in Figure 2. Here, we focus on the two-dimensional space for brevity; however, our proposed method can be extended to higher-dimensional spaces with slight modification. Given two vehicles S1 and S2, S1 is assumed to be the host vehicle and S2 the cooperative vehicle, indicating that S2 sends its local estimates to S1, where the information fusion is performed. At a given time k, let X^1_k = {x^1_{k,1}, x^1_{k,2}, . . . , x^1_{k,N^1_k}} and X^2_k = {x^2_{k,1}, x^2_{k,2}, . . . , x^2_{k,N^2_k}} be the collections of local tracks of S1 and S2, which are obtained through an MOT algorithm such as the adaptive GMPHD filter in Section 2. N^1_k and N^2_k denote the total numbers of tracks in S1 and S2, respectively. Moreover, the relative location and orientation of S2 with respect to S1 at time k are characterized by s_k = [ξ_k, η_k, θ_k, ξ̇_k, η̇_k, θ̇_k]^T, where ξ_k and η_k denote the translation of S2 in the Cartesian coordinate system of S1, θ_k denotes the orientation angle, and ξ̇_k, η̇_k, and θ̇_k represent the corresponding velocities. Any track x^2_{k,j} in the coordinate system of S2 can be exactly transformed to that of S1 as follows:

x^{2→1}_{k,j} = R(θ_k) x^2_{k,j} + [ξ_k, η_k]^T,    (17)

where

R(θ_k) = [cos θ_k, −sin θ_k; sin θ_k, cos θ_k]

denotes the rotation matrix. In the situation we consider, a major difficulty is that (ξ_k, η_k, θ_k) is inaccurate (or even totally unknown) and changes dynamically with time. In addition, in the case of multiple targets, the association between tracks of different vehicles is also unknown. Suppose the track association between the two vehicles is denoted by the true but unknown association matrix U_k ∈ {0, 1}^{N^1_k × N^2_k}, with each entry u^k_{ij} ∈ {0, 1} representing the result of association between x^1_{k,i} and x^2_{k,j} (1 ≤ i ≤ N^1_k, 1 ≤ j ≤ N^2_k). Formally, u^k_{ij} = 1 means that the local tracks x^1_{k,i} and x^2_{k,j} refer to the same target; otherwise, they belong to different targets.
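The rigid transformation of a cooperative track into the host frame can be sketched as follows for the 2-D position components (if velocities are tracked, they rotate by R(θ) without translation):

```python
import numpy as np

def transform_track(x2, xi, eta, theta):
    """Transform a cooperative-vehicle track position into the host frame.

    x2 is a 2-D position [x, y] in the coordinate system of S2; (xi, eta)
    is the translation and theta the orientation of S2 with respect to S1.
    """
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return R @ np.asarray(x2) + np.array([xi, eta])

p = transform_track([1.0, 0.0], xi=2.0, eta=3.0, theta=np.pi / 2)
print(np.round(p, 6))   # rotation by 90° maps (1,0) to (0,1), then translate → [2. 4.]
```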
Since it is assumed that each local track in one sensor corresponds to at most one local track in the other sensor, the following constraints apply to U_k:

Σ_{i=1}^{N^1_k} u^k_{ij} ≤ 1 for all j,   Σ_{j=1}^{N^2_k} u^k_{ij} ≤ 1 for all i.

In addition, in order to incorporate prior information about s_k, it is supposed that s_k follows a Gaussian distribution:

P(s_k) = N(s_k | a, B).    (19)

For example, similar to the Kalman filter, we let a = F ŝ_{k−1} and B = F P_{k−1} F^T + Q, where ŝ_{k−1} and P_{k−1} are the mean and covariance of s_{k−1}, F is the state transition matrix of the cooperative vehicle, and Q is the covariance matrix of the process noise.
Similarly, according to Equation (17), we have the following likelihood function:

P(x^1_{k,i} | s_k, x^2_{k,j}) = N(x^1_{k,i} | x^{2→1}_{k,j}, Σ),    (20)

where Σ is the measurement noise covariance. The above assumption is reasonable, since x^{2→1}_{k,j} should be close to x^1_{k,i} when x^1_{k,i} and x^2_{k,j} refer to the same target. We also assume that the local tracks are independent of each other. On the basis of the above discussion, we propose the following probabilistic model:

P(s_k, X^1_k, X^2_k | U_k) = P(s_k) Π_{(i,j): u^k_{ij}=1} P(x^1_{k,i} | s_k, x^2_{k,j}).    (21)

Other prior knowledge, for example, a noisy measurement of some entries of s_k, can be easily integrated into Equation (21).


Expectation-Maximization (EM) Solution Algorithm
We treat {s_k, X^1_k, X^2_k, U_k} as the complete data, {X^1_k, X^2_k} as the incomplete data, s_k as the hidden variable, and U_k as the unknown parameter. Then, we develop an effective solution in the spirit of the expectation-maximization (EM) algorithm, which jointly estimates the distribution of the hidden variable s_k and the track association matrix U_k in an iterative fashion. Specifically, the algorithm alternates between the following two steps:

1. E-step:

Q(U_k | U^{(p−1)}_k) = E_{s_k | X^1_k, X^2_k, U^{(p−1)}_k} [ log P(s_k, X^1_k, X^2_k | U_k) ],

2. M-step:

U^{(p)}_k = argmax_{U_k} Q(U_k | U^{(p−1)}_k),

where p refers to the pth iteration of the algorithm. The E-step and M-step repeat until a certain convergence criterion is satisfied.

E-Step
Given the estimate of U_k at the (p − 1)-th iteration, we need to calculate the expectation in the E-step. Let Ω^{p−1}_k denote the set of associated track pairs (i, j) indicated by U^{(p−1)}_k. Then, according to the Bayes theorem and Equation (21), the posterior distribution of s_k is

P(s_k | X^1_k, X^2_k, U^{(p−1)}_k) ∝ P(s_k) Π_{(i,j)∈Ω^{p−1}_k} P(x^1_{k,i} | s_k, x^2_{k,j}).    (24)

Considering that R(θ_k) in the above distribution is nonlinear with respect to s_k, we apply a Taylor series expansion to obtain the first-order linear approximation around the current estimate θ^{l−1}_k:

R(θ_k) ≈ R(θ^{l−1}_k) + R'(θ^{l−1}_k)(θ_k − θ^{l−1}_k),    (25)

where R'(θ) denotes the derivative of the rotation matrix with respect to θ. Incorporating Equation (25) into Equation (20), the likelihood can be approximated by a function that is linear in s_k. Noticing the cumulative product in the numerator of Equation (24), we can, thus, iteratively apply each likelihood function P(x^1_{k,i} | s_k, x^2_{k,j}) to update the distribution of s_k. For instance, given any (i, j) ∈ Ω^{p−1}_k, the prior N(s_k | a, B) is corrected as

a' = a + K (x^1_{k,i} − x̂^{2→1}_{k,j}),   B' = (I − K H) B,

where I is the identity matrix, H is the linearized observation matrix, x̂^{2→1}_{k,j} is the predicted transformed track, and K is the Kalman gain defined by

K = B H^T (H B H^T + Σ)^{−1}.

Since the normalizing constant is irrelevant to s_k, the above correction equations mean that the posterior distribution of s_k, after combination with the likelihood P(x^1_{k,i} | s_k, x^2_{k,j}), also follows a Gaussian distribution with updated mean a' and covariance B'. As a result, by replacing the original a and B in Equation (19) with the latest estimates a' and B', the above correction procedure repeats until all (i, j) ∈ Ω^{p−1}_k have been used to generate the final posterior distribution of s_k.
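The sequential linearized correction can be sketched as follows. This is a simplified sketch under stated assumptions: the pose state is reduced to s = [ξ, η, θ] (velocity entries omitted), tracks are position-only, and the prior and noise values in the example are illustrative, not the paper's settings.

```python
import numpy as np

def correct_pose(a, B, pairs, Sigma):
    """Sequentially refine the relative-pose belief N(s; a, B) with matched tracks.

    Each associated pair (x1, x2) is treated as a pseudo-measurement
    x1 ≈ R(theta) x2 + [xi, eta]; the rotation is linearized around the
    current theta estimate and a Kalman-style correction is applied.
    """
    a = np.asarray(a, dtype=float).copy()
    B = np.asarray(B, dtype=float).copy()
    for x1, x2 in pairs:
        xi, eta, th = a
        c, s = np.cos(th), np.sin(th)
        pred = np.array([c * x2[0] - s * x2[1] + xi,
                         s * x2[0] + c * x2[1] + eta])
        # Jacobian of pred with respect to [xi, eta, theta]
        H = np.array([[1.0, 0.0, -s * x2[0] - c * x2[1]],
                      [0.0, 1.0,  c * x2[0] - s * x2[1]]])
        K = B @ H.T @ np.linalg.inv(H @ B @ H.T + Sigma)   # Kalman gain
        a = a + K @ (np.asarray(x1) - pred)
        B = (np.eye(3) - K @ H) @ B
    return a, B

# Tracks seen by S2 at [1,0] and [0,1] appear at [3,3] and [2,4] in S1,
# consistent with the pose (xi, eta, theta) = (2, 3, 0): the mean stays
# put while the covariance shrinks.
a, B = correct_pose([2.0, 3.0, 0.0], np.eye(3) * 10.0,
                    [([3.0, 3.0], [1.0, 0.0]), ([2.0, 4.0], [0.0, 1.0])],
                    np.eye(2) * 0.01)
```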
Finally, the conditional expectation of log P(s_k, X^1_k, X^2_k | U_k) can be calculated as

Q(U_k | U^{(p−1)}_k) = E_{s_k}[ log P(s_k, X^1_k, X^2_k | U_k) ].    (32)

From Equation (20), we can see that the conditional expectation in Equation (32) is difficult to evaluate because P(x^1_{k,i} | s_k, x^2_{k,j}) is nonlinear with respect to θ_k. Considering that s_k follows a Gaussian distribution, a special case of the Monte Carlo (MC) approximation method [28], which uses only the mean of s_k, is applied. Therefore, we obtain

Q(U_k | U^{(p−1)}_k) ≈ log P(ŝ_k, X^1_k, X^2_k | U_k),    (33)

where ξ̂_k, η̂_k, and θ̂_k (and the corresponding velocity estimates) denote the entries of the posterior mean ŝ_k.

M-Step
In the M-step, the association matrix U_k is updated by solving the following constrained optimization problem:

U^{(p)}_k = argmin_{U_k} Σ_{i=1}^{N^1_k} Σ_{j=1}^{N^2_k} u^k_{ij} d^2_{ij},  subject to the one-to-one association constraints on U_k,    (34)

where d^2_{ij} = r^T_{ij} Σ^{−1} r_{ij}, with r_{ij} = x^1_{k,i} − x^{2→1}_{k,j}, is the squared Mahalanobis distance between the local tracks x^1_{k,i} and x^{2→1}_{k,j}. We notice that Equation (34) is a linear sum assignment problem (LSAP) which can be readily solved by the Hungarian algorithm in polynomial time [29]. In addition, for local tracks corresponding to the same target, d^2_{ij} should be small; otherwise, d^2_{ij} should be large. Taking this into account, an extra thresholding (gating) step is applied to d^2_{ij} so that local tracks with large distances are removed from the association.
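The M-step can be sketched as follows. For self-containment this sketch solves the LSAP by exhaustive search over permutations (adequate for small track sets); the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`, solves the same problem in polynomial time. The gate value is an illustrative chi-square quantile, not the paper's threshold.

```python
import itertools
import numpy as np

def associate_tracks(tracks1, tracks2_in_1, Sigma, gate=9.21):
    """Match host tracks with transformed cooperative tracks by squared
    Mahalanobis distance, then gate out weak matches.

    Assumes len(tracks1) <= len(tracks2_in_1) for brevity. gate=9.21 is
    the chi-square 99% quantile for 2 dof, an illustrative choice.
    """
    t1 = [np.asarray(t, dtype=float) for t in tracks1]
    t2 = [np.asarray(t, dtype=float) for t in tracks2_in_1]
    Sinv = np.linalg.inv(Sigma)
    D = np.array([[(a - b) @ Sinv @ (a - b) for b in t2] for a in t1])
    best, best_cost = [], np.inf
    # exhaustive LSAP: try every injective assignment of rows to columns
    for perm in itertools.permutations(range(len(t2)), len(t1)):
        cost = sum(D[i, j] for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = list(enumerate(perm)), cost
    return [(i, j) for i, j in best if D[i, j] <= gate]

pairs = associate_tracks([[0.0, 0.0], [10.0, 10.0]],
                         [[10.1, 10.0], [0.2, 0.0], [50.0, 50.0]],
                         np.eye(2))
print(pairs)   # host track 0 matches cooperative track 1, host 1 matches 0
```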

Track Fusion
The last stage of our proposed framework is to combine the different estimates (generated by the host and cooperative vehicles, respectively) of the same target vehicle into one solution. Since it is difficult to calculate the cross-correlation among multiple estimates, especially in our distributed fusion architecture, the direct application of optimal Bayes fusion can lead to overconfidence [30]. To address this problem, we apply a special version of covariance intersection (CI), termed information theory based fast CI (IT-FCI) [27], which is given as

P^{−1} = ω P^{−1}_1 + (1 − ω) P^{−1}_2,

x̂ = P [ ω P^{−1}_1 x̂_1 + (1 − ω) P^{−1}_2 x̂_2 ],

where (x̂_1, P_1) and (x̂_2, P_2) denote the two estimates of the state corresponding to the same target and ω ∈ [0, 1] is the fusion weight. Let l denote the dimension of the state; then, ω is determined in closed form from information-theoretic quantities of the two Gaussian estimates, as derived in [27].
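The CI fusion rule can be sketched as follows. The closed-form IT-FCI weight of [27] is not reproduced here; as a stand-in assumption, the sketch selects the weight by a trace-minimizing grid search, which is a common generic choice for CI.

```python
import numpy as np

def covariance_intersection(x1, P1, x2, P2, omega):
    """Fuse two estimates of the same target by covariance intersection:
    P^-1 = w P1^-1 + (1-w) P2^-1,  x = P (w P1^-1 x1 + (1-w) P2^-1 x2)."""
    P1i, P2i = np.linalg.inv(P1), np.linalg.inv(P2)
    P = np.linalg.inv(omega * P1i + (1.0 - omega) * P2i)
    x = P @ (omega * P1i @ np.asarray(x1) + (1.0 - omega) * P2i @ np.asarray(x2))
    return x, P

def fuse_trace_min(x1, P1, x2, P2, grid=101):
    """Pick the CI weight by a trace-minimizing grid search (an assumption
    standing in for the closed-form IT-FCI weight of [27])."""
    ws = np.linspace(0.0, 1.0, grid)
    w_best = min(ws, key=lambda w: np.trace(np.linalg.inv(
        w * np.linalg.inv(P1) + (1.0 - w) * np.linalg.inv(P2))))
    return covariance_intersection(x1, P1, x2, P2, w_best)
```

For two estimates with mirrored covariances diag(1, 4) and diag(4, 1), symmetry gives ω = 0.5 and a fused covariance of diag(1.6, 1.6), which is conservative (never overconfident) regardless of the unknown cross-correlation.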

Performance Evaluation and Results
Currently, it is not easy to test cooperative perception systems using real vehicles due to the high cost and potential risk [31]. Therefore, following previous studies [12], in this section, we carry out two types of computer simulation to evaluate the performance of the proposed cooperative multi-vehicle tracking.

Simulation Based on Synthesized Data
A total of seven target vehicles are moving in the region (−800 m, 800 m) × (−800 m, 800 m). In addition, there are two intelligent vehicles (Car-1 and Car-2) which are equipped with sensors and can thus sense the target vehicles in the environment. Car-1 and Car-2 are treated as the host and cooperative vehicle, respectively. Therefore, the local tracks from Car-2 are sent to and fused with the local tracks from Car-1 to enhance the perception performance. The perception range of each sensor is 500 m, indicating that target vehicles farther than 500 m from Car-1 and Car-2 cannot be tracked. The coordinate system of Car-1 is viewed as the reference system, and the relative motion between the target vehicles and Car-1 is assumed to follow the near constant velocity (NCV) [32] motion model, in which the target state consists of position and velocity components and the process noise covariance is a diagonal matrix. For each target vehicle, the positions ξ_k and η_k are observed with measurement noise following a Gaussian distribution with zero mean and covariance matrix diag[1, 1]. For Car-2, besides the above relative motion in position, the relative rotation angle (in radians) between Car-2 and Car-1 also changes according to the nonlinear model θ_k = 0.3 + 0.1 sin(0.1k) in order to simulate the dynamic variation of vehicle heading. False alarms at any scan time are generated by a Poisson distribution with mean λ = 3, and the probability of detection is P_D = 0.98. The adaptive GMPHD filtering algorithm presented in Section 2 is conducted on Car-1 and Car-2 independently so that the local tracks can be obtained. The simulation was performed for 50 Monte Carlo runs with randomly generated process noise and measurements, thus changing the trajectory and measurements of each target at each run. The simulation length is set to 100 s. Figure 3 shows one simulation run. Figure 3a shows the relative trajectories of the seven target vehicles and Car-2 in the coordinate system of Car-1.
Figure 3b,c show the measurements from both the target vehicles and false alarms in the coordinate systems of Car-1 and Car-2, respectively. As can be seen from Figure 3, at different times, Car-1 and Car-2 track different numbers of target vehicles because of vehicle appearance, disappearance, or movement out of the perception range. Miss detections and false alarms further corrupt the measurements observed by Car-1 and Car-2.
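The synthetic scenario above can be reproduced in outline as follows. The heading profile θ_k = 0.3 + 0.1 sin(0.1k) is taken from the text; the sampling period and process-noise intensity are illustrative assumptions, as the paper's exact values are not stated here.

```python
import numpy as np

def ncv_matrices(T=1.0, q=0.1):
    """Near-constant-velocity model matrices for the state [x, vx, y, vy].
    T (sampling period) and q (noise intensity) are illustrative values."""
    F1 = np.array([[1.0, T], [0.0, 1.0]])
    Q1 = q * np.array([[T**3 / 3, T**2 / 2], [T**2 / 2, T]])
    F = np.kron(np.eye(2), F1)   # block-diagonal over the x and y axes
    Q = np.kron(np.eye(2), Q1)
    return F, Q

def relative_heading(k):
    """Relative rotation (rad) of Car-2 w.r.t. Car-1 used in the simulation."""
    return 0.3 + 0.1 * np.sin(0.1 * k)

F, Q = ncv_matrices()
x = np.array([0.0, 2.0, 0.0, -1.0])   # start at origin, velocity (2, -1)
x = F @ x                             # one noiseless NCV step: position advances by velocity
```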
Figure 4 shows the estimates of the relative translation and orientation angle in a simulation run.
As can be seen, the estimated results are rather close to the true state of Car-2 with respect to Car-1. The association between the local tracks from Car-1 and Car-2 at time k = 57 is shown in Figure 5. To quantitatively measure the accuracy of the state estimation of the cooperative vehicle, we calculate the absolute error (AE) of the state estimation. For example, the AE for ξ is defined as follows:

AE_ξ = (1/K) Σ_{k=1}^{K} | ξ̂_k − ξ_k |,

where K = 100 is the simulation length, and ξ̂_k and ξ_k represent the estimated and true positions of Car-2 at time k. Finally, we calculate the average, maximum, and minimum AE across all Monte Carlo runs. The overall results, shown in Table 1, indicate that the state estimation is rather accurate.

To evaluate the effect of cooperative tracking, we use the criterion known as the optimal subpattern assignment (OSPA) distance, because it captures the difference in both cardinality and individual elements between two finite sets [33,34]. For X_k = {x_1, . . . , x_{N_k}} and X̂_k = {x̂_1, . . . , x̂_{N̂_k}} with N_k ≤ N̂_k, the OSPA distance with order p and cut-off c is given by

d^{(c)}_p(X_k, X̂_k) = [ (1/N̂_k) ( min_{π∈Π_{N̂_k}} Σ_{i=1}^{N_k} d^{(c)}(x_i, x̂_{π(i)})^p + c^p (N̂_k − N_k) ) ]^{1/p},

where d^{(c)}(x, x̂) = min(c, ||x − x̂||) and Π_{N̂_k} denotes all permutations of the set {1, 2, . . . , N̂_k}. The above definition is suitable when N_k ≤ N̂_k; otherwise, we use d^{(c)}_p(X̂_k, X_k). In the simulation, we let p = 1 and c = 50. The Monte Carlo averages of the OSPA distance obtained by Car-1, Car-2, and the fusion are shown in Table 2.
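The OSPA metric used above can be computed as in the following sketch, which solves the inner assignment by brute force over permutations (adequate for small sets; in practice the Hungarian algorithm is used):

```python
import itertools
import numpy as np

def ospa(X, Y, p=1, c=50.0):
    """OSPA distance between two finite sets of 2-D states.

    Distances are cut off at c, and unmatched elements (the cardinality
    mismatch) each contribute the maximal penalty c."""
    X = [np.asarray(x, dtype=float) for x in X]
    Y = [np.asarray(y, dtype=float) for y in Y]
    if len(X) > len(Y):
        X, Y = Y, X                      # ensure |X| <= |Y|
    m, n = len(X), len(Y)
    if n == 0:
        return 0.0
    best = min(
        sum(min(np.linalg.norm(X[i] - Y[j]), c) ** p for i, j in enumerate(perm))
        for perm in itertools.permutations(range(n), m))
    return ((best + (c ** p) * (n - m)) / n) ** (1.0 / p)

print(ospa([[0, 0]], [[0, 0], [1000, 0]]))   # → 25.0 (one perfect match, one cardinality penalty; p=1, c=50)
```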
In addition, we also show in Table 2 the optimal fusion result (fusion-opt), obtained by assuming that the relative translation and orientation are accurately given. As can be seen, compared with independent perception, the average OSPA obtained by sensor fusion (cooperative tracking) is reduced by 27% and 43% with respect to Car-1 and Car-2, respectively. The variation of the OSPA distance and the number of objects versus time in a simulation run is shown in Figure 6. As we can observe, the local tracks from Car-1 and Car-2 can be fused, thus reducing the OSPA distance and improving the performance of MOT.

In order to compare the CPU time when dealing with an incoming set of measurements, we show in Table 3 the average execution time (in milliseconds) required by Car-1, Car-2, and the fusion stage. Notice that the execution time of Car-1 and Car-2 is mainly consumed by the adaptive GMPHD filtering, while the fusion stage concentrates on local track association and covariance intersection fusion. We can see from Table 3 that the execution time consumed by sensor fusion is less than 10% of that of the adaptive GMPHD filtering. This indicates that the introduction of sensor fusion does not compromise efficiency while significantly improving the tracking performance of the whole system.

Simulation Based on PreScan Platform
PreScan is a physics-based simulation platform which can be used to construct various driving environments for the design and verification of autonomous vehicles. In PreScan, many elements, such as roads, vehicles, sensors, vehicle-to-vehicle communication, and weather, can be configured according to specific requirements. In order to evaluate our proposed cooperative multi-vehicle tracking system, we built a driving scenario in which 11 vehicles are deployed. Among these vehicles, two (called Car-1 and Car-2) are equipped with a Radar sensor to perceive the surrounding vehicles. Table 4 shows the parameters of the Radar. We can see from this table that, for Car-1 and Car-2, only leading vehicles closer than 150 m and with azimuth between −120° and 120° can be detected. The simulation length is 100 s. The simulation scenario at the starting and ending times is shown in Figure 7a,b, respectively; Car-1 and Car-2 are also marked in Figure 7 for clarity. Similar to the previous simulation, Car-1 is treated as the host vehicle, while Car-2 is treated as the cooperative vehicle. From Figure 7, we notice that both the relative translation and rotation between Car-1 and Car-2 change with time, especially when the two vehicles pass through the junction. In Figure 8, we illustrate the measurements observed by Car-1 and Car-2 in their respective coordinate systems. For Car-1, we also show the true relative trajectories of the other vehicles. It can be observed that, due to the limited perception range and possible occlusions between vehicles, some vehicles cannot always be detected during the simulation. Consequently, the measurements of a certain vehicle may not cover the corresponding trajectory completely.

The relative translation and orientation estimated by our proposed approach are shown in Figure 9. We can see that, in most cases, the estimated relative translation and orientation are rather close to the true values.
Figure 10 shows the local tracks from the two vehicles before and after association at time step k = 50. In this case, Car-1 and Car-2 detect nine and seven vehicles, respectively. After association, the local tracks from Car-1 and Car-2 are correctly matched, so a total of 10 vehicles are detected. The variation of the OSPA distance and the number of targets over the simulation time is shown in Figure 11. The mean OSPA distance for Car-1, Car-2, and Fusion is 12.13, 22.10, and 9.80, respectively. A comparison with the case where Car-1 and Car-2 perform tracking alone shows that fusing their perception results not only reduces the OSPA distance but also yields a better estimate of the number of vehicles. In summary, we conclude that cooperative tracking successfully extends the perception field of view and thus achieves superior MOT performance.
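The OSPA distance used above combines localization error and cardinality error into a single metric. The minimal sketch below illustrates the computation for small track sets such as the roughly ten per frame here; it uses brute-force assignment rather than an optimal-assignment solver, and the cutoff c and order p are assumed parameters (the paper's actual values are not stated in this excerpt).

```python
import itertools
import math

def ospa(X, Y, c=20.0, p=2):
    """OSPA distance between two finite point sets X and Y.

    Brute-force over assignments, adequate only for small sets.
    c is the cutoff (also the per-target cardinality penalty), p the order.
    """
    if len(X) > len(Y):
        X, Y = Y, X                      # ensure |X| <= |Y|
    m, n = len(X), len(Y)
    if n == 0:
        return 0.0                       # both sets empty
    # best localization cost over all assignments of X into Y
    best = min(
        sum(min(c, math.dist(x, y)) ** p for x, y in zip(X, perm))
        for perm in itertools.permutations(Y, m)
    )
    # cardinality penalty: c^p for each of the (n - m) unmatched targets
    return ((best + c ** p * (n - m)) / n) ** (1.0 / p)
```

When one set is empty, the metric degenerates to the pure cardinality penalty and equals the cutoff c, which is why the cutoff choice directly scales the miss/false-track cost.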

Concluding Remarks and Future Work
In this paper, we present a novel framework for cooperative multi-vehicle tracking when self-localization information is not available. The adaptive GMPHD filter is applied to implement effective vehicle tracking when the intensity of newborn targets is infeasible to define in advance. A Bayesian formulation for joint track association and relative pose estimation is developed, and the solution is derived by following the EM algorithm. Finally, the successfully associated tracks are fused by fast covariance intersection based on information theory. The simulation results demonstrate that the relative pose between the host and cooperative vehicles can be inferred accurately in most cases. In addition, with slightly increased computational cost, cooperative multi-vehicle tracking demonstrates a clear advantage over non-cooperative tracking in terms of perception performance.
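The fast covariance intersection step admits a closed-form sketch. The snippet below fuses two 2-D track estimates using a determinant-based weight in the style of the Franken-Hupper approximation; this particular weight formula is our assumption, and the paper's exact variant may differ. Plain 2x2 linear algebra is used to keep the example self-contained.

```python
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv2(M):
    d = det2(M)
    return [[ M[1][1] / d, -M[0][1] / d],
            [-M[1][0] / d,  M[0][0] / d]]

def fast_ci(x1, P1, x2, P2):
    """Fuse two 2-D estimates (x1, P1) and (x2, P2) by covariance
    intersection with a closed-form (non-iterative) weight."""
    S = [[P1[i][j] + P2[i][j] for j in range(2)] for i in range(2)]
    # determinant-based weight (assumed form): equals 0.5 when P1 == P2,
    # and approaches 1 as P1 becomes much more confident than P2
    w = (det2(S) - det2(P1) + det2(P2)) / (2.0 * det2(S))
    I1, I2 = inv2(P1), inv2(P2)
    # fused information matrix: w * P1^-1 + (1 - w) * P2^-1
    info = [[w * I1[i][j] + (1 - w) * I2[i][j] for j in range(2)]
            for i in range(2)]
    P = inv2(info)
    # fused information vector, then map back to state space
    iv = [w * (I1[i][0] * x1[0] + I1[i][1] * x1[1])
          + (1 - w) * (I2[i][0] * x2[0] + I2[i][1] * x2[1])
          for i in range(2)]
    x = [P[i][0] * iv[0] + P[i][1] * iv[1] for i in range(2)]
    return x, P, w
```

Unlike naive averaging, covariance intersection yields a consistent fused covariance even when the cross-correlation between the two tracks is unknown, which is exactly the situation after inter-vehicle track exchange.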
In reality, communication delay is another important factor that affects the performance of cooperative perception [35], especially when the bandwidth of the wireless channel is limited. Therefore, suitable temporal alignment is necessary to account for the time bias caused by communication delay and algorithm execution. The simplest approach is to use a prediction model to compensate for the communication delay. This work assumes that the tracks from different vehicles have been synchronized to the same time instant before track association and relative pose estimation. In addition, loss of communication links also hinders the application of our proposed algorithm: when the communication links are interrupted, the host vehicle cannot receive the message from the cooperative vehicle and thus fails to enhance its perception ability by fusion. After the communication links recover, the proposed algorithm can be performed by the host vehicle once the message from the cooperative vehicle arrives. In future work, we plan to investigate the integration of temporal alignment into our framework to further enhance the performance of multi-vehicle tracking. Moreover, extending our approach to explicitly consider the effects of communication delays and failures is an interesting direction; one possible solution is to integrate these factors into our probabilistic model by introducing new variables. We will address these problems in future work.
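The prediction-based compensation mentioned above can be sketched with a constant-velocity model: a track received with a known delay is extrapolated forward and its covariance inflated with process noise. The state layout [px, py, vx, vy] and the noise intensity q are our assumptions for illustration.

```python
def mat_mul(A, B):
    # 4x4 matrix product, kept in plain Python for self-containment
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def cv_predict(x, P, dt, q=0.5):
    """Extrapolate a [px, py, vx, vy] track forward by dt seconds
    with a constant-velocity model (a minimal temporal-alignment sketch)."""
    F = [[1.0, 0.0, dt, 0.0],
         [0.0, 1.0, 0.0, dt],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    x_pred = [x[0] + dt * x[2], x[1] + dt * x[3], x[2], x[3]]
    # discretized white-noise-acceleration process noise, intensity q (assumed)
    q11, q13, q33 = q * dt**3 / 3.0, q * dt**2 / 2.0, q * dt
    Q = [[q11, 0.0, q13, 0.0],
         [0.0, q11, 0.0, q13],
         [q13, 0.0, q33, 0.0],
         [0.0, q13, 0.0, q33]]
    Ft = [list(r) for r in zip(*F)]          # F transposed
    P_pred = mat_mul(mat_mul(F, P), Ft)      # F P F^T
    P_pred = [[P_pred[i][j] + Q[i][j] for j in range(4)] for i in range(4)]
    return x_pred, P_pred

# Example: a track delayed by 0.1 s, moving at 10 m/s along x.
I4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
x_new, P_new = cv_predict([0.0, 0.0, 10.0, 0.0], I4, 0.1)
```

The inflated covariance correctly signals that a delayed track is less trustworthy, so the downstream association and fusion stages automatically down-weight it.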