Joint Probabilistic Hypergraph Matching Labeled Multi-Bernoulli Filter for Rigid Target Tracking

Abstract: The likelihood determined by the distance between measurements and predicted target states is widely used for data association in many filters. However, if the actual motion of the targets does not coincide with the preset dynamic motion model, this criterion leads to poor performance when closely spaced targets are tracked. For rigid target tracking, the structure of the rigid target can be exploited to improve data association. In this paper, the structure of the rigid target is represented as a hypergraph, and data association is formulated as a hypergraph matching problem. However, the performance of hypergraph matching degrades in the presence of missed detections and clutter. To overcome this limitation, we propose a joint probabilistic hypergraph matching labeled multi-Bernoulli (JPHGM-LMB) filter that considers all undetected cases. In JPHGM-LMB, the likelihood is built on group structure rather than on the distance between predicted states and measurements. Consequently, the probability of each target being associated with each measurement (the joint association probabilities) can be obtained. The structure information is then integrated into the LMB filter by revising each single-target likelihood with the joint association probabilities. Because all undetected cases are considered, the proposed approach is usable in real time only for a limited number of targets. Extensive simulations demonstrate the significant performance improvement of the proposed method.


Introduction
In multi-target tracking (MTT), the number and the states of targets are jointly estimated from a sequence of measurements. MTT is used in many applications, such as radar [1,2], sonar [3], computer vision [4], and robotics [5]. In contrast to single-target tracking, the number of targets in MTT changes over time because of target birth and target death.
The simplest method for MTT is the global nearest neighbor (GNN) tracker [6]. However, because it commits to a single unique association between targets and measurements, it does not work well for closely spaced targets. The joint probabilistic data association (JPDA) filter [7,8], multiple hypotheses tracking (MHT) [9,10], and the random finite set (RFS) framework [11,12] are the three major approaches to MTT. The JPDA filter and MHT consist of two steps: data association and (single-target) filtering. In the RFS framework, by contrast, data association is integrated into the filtering step and explicit data association is avoided. The JPDA filter is widely used to track multiple targets under the assumption that the number of targets is known and fixed. In JPDA, joint association probabilities rather than a unique association result are computed to avoid conflicting associations. Several extensions and related works of JPDA can be found in [13][14][15][16]. MHT attempts to take all association hypotheses into account and eliminates hypotheses with low posterior probability to keep the computational complexity tractable. Related works and extensions of MHT can be found in [17][18][19]. In the RFS framework, the multi-target state is modeled as a finite set [11,12], and the posterior density of the multi-target state is propagated recursively using the Bayesian formulation. Many filters are based on the RFS framework, such as the probability hypothesis density (PHD) filter [20], the cardinalized PHD (CPHD) filter [21], multi-Bernoulli filters [22,23], the generalized labeled multi-Bernoulli (GLMB) filter [24,25], and the labeled multi-Bernoulli (LMB) filter [26].
In many scenarios, targets travel in groups, such as a group of airplanes moving in formation or the scattering centers on a 2D rigid target [27,28]. The scattering centers on a 2D rigid target share a common motion and maintain their relative order over a certain time range [29]. They can therefore be considered a group of targets in which each target is almost rigidly positioned with respect to the others, so the structure information of these scattering centers can be used to enhance tracking performance. For example, the states of undetected targets can be inferred from the other detected targets. Although structure information is beneficial to MTT, the targets in a group are treated independently in the JPDA filter, MHT, LMB, etc. In these filters, a measurement is associated with the closest predicted target state, because the smallest distance corresponds to the highest likelihood under the assumption that these targets share a common motion. Therefore, incorrect data association results can be obtained when the actual motion does not coincide with the preset dynamic motion model. In this paper, structure information is used to enhance the tracking performance for rigid targets.
To make better use of the structure information of group targets, a hypergraph representation is introduced in this paper. Hypergraph representations are used in many applications, such as target tracking [30,31] and feature association [32]. By considering measurements and predicted target states as the vertices of hypergraphs, data association can be formulated as a hypergraph matching problem. Hypergraph matching has been widely investigated, and numerous algorithms have been developed. A probabilistic model for soft hypergraph matching is presented in [33], where the hypergraph matching problem is derived in a probabilistic setting and represented as a convex optimization. A random-walk view of the hypergraph matching problem is introduced in [34]. In [31], the hypergraph matching labeled multi-Bernoulli (HGM-LMB) filter is proposed to enhance data association performance. However, HGM-LMB cannot handle missed detections well. To overcome this limitation, we propose a joint probabilistic hypergraph matching labeled multi-Bernoulli (JPHGM-LMB) filter that considers all undetected cases. In JPHGM-LMB, the likelihood is built on group structure rather than on the distance between predicted states and measurements. Joint association probabilities are then obtained using the structure information and integrated into the labeled multi-Bernoulli (LMB) filter by revising each single-target likelihood. However, because all undetected cases are considered, the computational complexity of the proposed approach grows exponentially with the number of targets, and the approach is usable in real time only for a limited number of targets.
The rest of the paper is organized as follows. Section 2 reviews the LMB filter and hypergraph matching. JPHGM-LMB is presented in detail in Section 3. Extensive experiments demonstrating the effectiveness of the proposed approach are reported in Section 4. Conclusions are drawn in Section 5.

Review of LMB Filter and Hypergraph Matching
In this paper, hypergraph matching is integrated into the LMB filter to enhance tracking performance, so the LMB filter and the hypergraph matching algorithm of [33] are briefly reviewed in this section.

LMB Filter
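The density of an LMB RFS with parameter set $\{r^{(\ell)}, p^{(\ell)}\}_{\ell \in \mathbb{L}}$ is given by [26]

$$\pi(\mathbf{X}) = \Delta(\mathbf{X})\, w(\mathcal{L}(\mathbf{X}))\, p^{\mathbf{X}},$$

$$w(L) = \prod_{i \in \mathbb{L}} \left(1 - r^{(i)}\right) \prod_{\ell \in L} \frac{1_{\mathbb{L}}(\ell)\, r^{(\ell)}}{1 - r^{(\ell)}}, \qquad p(x, \ell) = p^{(\ell)}(x),$$

where $r^{(\ell)}$ is the existence probability and $p^{(\ell)}$ is the probability density of the track corresponding to label $\ell \in \mathbb{L}$, $\Delta(\mathbf{X}) = \delta_{|\mathbf{X}|}(|\mathcal{L}(\mathbf{X})|)$ is the distinct label indicator, $\delta_Y(X)$ is the generalized Kronecker delta function, $1_Y(X)$ is the inclusion function, $|\cdot|$ denotes the cardinality of a set, $\mathcal{L} : \mathbb{X} \times \mathbb{L} \to \mathbb{L}$ is the function extracting labels, and $p^{\mathbf{X}} = \prod_{\mathbf{x} \in \mathbf{X}} p(\mathbf{x})$ is the multi-object exponential notation.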
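As a small numeric illustration of the LMB weight from [26], $w(L) = \prod_{i}(1 - r^{(i)}) \prod_{\ell \in L} r^{(\ell)} / (1 - r^{(\ell)})$, the following Python sketch evaluates it for every label subset of a toy parameter set. The function name `lmb_weight`, the labels, and the existence probabilities are illustrative assumptions and do not come from this paper.

```python
from itertools import combinations
from math import prod


def lmb_weight(r, subset):
    """Weight w(L) of the label set `subset`, given existence probabilities
    r (dict: label -> r in [0, 1))."""
    if not set(subset) <= set(r):
        return 0.0  # inclusion function 1_L(l) vanishes for unknown labels
    none_exist = prod(1.0 - ri for ri in r.values())
    return none_exist * prod(r[l] / (1.0 - r[l]) for l in subset)


# Toy existence probabilities (assumed values).
r = {"a": 0.9, "b": 0.5, "c": 0.2}

# Enumerate all label subsets and sum their weights.
subsets = [s for n in range(len(r) + 1) for s in combinations(r, n)]
total = sum(lmb_weight(r, s) for s in subsets)
print(f"sum of weights over all subsets: {total:.12f}")
```

The weights over all label subsets sum to one, which reflects that $w(\cdot)$ is a probability distribution over which tracks exist.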
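In the following, the prediction and correction steps of the LMB filter are reviewed. For further details on the mathematical proofs and analysis, see [24][25][26].

Prediction: Suppose that the multi-target posterior density and the multi-target birth model are both LMB RFSs with label spaces $\mathbb{L}$ and $\mathbb{B}$ and parameter sets $\pi = \{r^{(\ell)}, p^{(\ell)}\}_{\ell \in \mathbb{L}}$ and $\pi_B = \{r_B^{(\ell)}, p_B^{(\ell)}\}_{\ell \in \mathbb{B}}$, respectively. The multi-target predicted density is then also an LMB RFS with label space $\mathbb{L}_+ = \mathbb{L} \cup \mathbb{B}$, and its parameter set is obtained as in [26]:

$$\pi_+ = \{r_{+,S}^{(\ell)}, p_{+,S}^{(\ell)}\}_{\ell \in \mathbb{L}} \cup \{r_B^{(\ell)}, p_B^{(\ell)}\}_{\ell \in \mathbb{B}},$$

$$r_{+,S}^{(\ell)} = \eta_S(\ell)\, r^{(\ell)}, \qquad p_{+,S}^{(\ell)}(x) = \frac{\langle p_S(\cdot, \ell)\, f(x \mid \cdot, \ell),\; p(\cdot, \ell) \rangle}{\eta_S(\ell)}, \qquad \eta_S(\ell) = \langle p_S(\cdot, \ell),\; p(\cdot, \ell) \rangle,$$

where $r_{+,S}^{(\ell)}$ is the predicted existence probability, $p_{+,S}^{(\ell)}$ is the predicted spatial distribution, $p_S(\cdot, \ell)$ is the state-dependent survival probability, $\eta_S(\ell)$ is the survival probability of track $\ell$, $f(x \mid \cdot, \ell)$ is the single-target transition density of track $\ell$, and $\langle f, g \rangle = \int f(x) g(x)\, dx$ is the inner product.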

Correction:
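Suppose that the multi-target predicted density is an LMB RFS represented by the parameter set $\pi_+ = \{r_+^{(\ell)}, p_+^{(\ell)}\}_{\ell \in \mathbb{L}_+}$. Then the LMB RFS that matches exactly the first moment of the multi-target posterior density is $\pi(\cdot \mid Z) = \{r^{(\ell)}, p^{(\ell)}\}_{\ell \in \mathbb{L}_+}$, with parameters given in [26] as

$$r^{(\ell)} = \sum_{(I_+, \theta) \in \mathcal{F}(\mathbb{L}_+) \times \Theta_{I_+}} w^{(I_+, \theta)}(Z)\, 1_{I_+}(\ell),$$

$$p^{(\ell)}(x) = \frac{1}{r^{(\ell)}} \sum_{(I_+, \theta) \in \mathcal{F}(\mathbb{L}_+) \times \Theta_{I_+}} w^{(I_+, \theta)}(Z)\, 1_{I_+}(\ell)\, p^{(\theta)}(x, \ell),$$

$$w^{(I_+, \theta)}(Z) \propto w_+(I_+) \left[\eta_Z^{(\theta)}\right]^{I_+}, \qquad p^{(\theta)}(x, \ell \mid Z) = \frac{p_+(x, \ell)\, \psi_Z(x, \ell; \theta)}{\eta_Z^{(\theta)}(\ell)}, \qquad \eta_Z^{(\theta)}(\ell) = \langle p_+(\cdot, \ell),\; \psi_Z(\cdot, \ell; \theta) \rangle,$$

$$\psi_Z(x, \ell; \theta) = \begin{cases} \dfrac{p_D(x, \ell)\, g(z_{\theta(\ell)} \mid x, \ell)}{\kappa(z_{\theta(\ell)})}, & \theta(\ell) > 0, \\ 1 - p_D(x, \ell), & \theta(\ell) = 0, \end{cases}$$

where $\mathcal{F}(X)$ is the collection of all finite subsets of $X$, $w_+(I_+)$ is the predicted LMB weight of the label set $I_+$, $\Theta_{I_+}$ is the space of mappings $\theta : I_+ \to \{0, 1, \dots, |Z|\}$, $p_D(x, \ell)$ is the detection probability, $g(z_{\theta(\ell)} \mid x, \ell)$ is the single-target likelihood, and $\kappa(\cdot)$ is the intensity of the Poisson clutter.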

Hypergraph Matching
Hypergraph representations are applied in many practical applications, such as target tracking and visual recognition. A hypergraph $G = (V, E)$ is constructed from a set of vertices $V$ and a set of hyperedges $E$, each of which contains a d-tuple of vertices. Vertices represent elementary units, and a hyperedge represents a relationship among a tuple of vertices. There are many hypergraph matching algorithms [33,34]. The probabilistic hypergraph matching method of [33], which derives the hypergraph matching problem in a probabilistic setting, is used in this paper because it retains more possible association results than hard matching methods. It is reviewed below.
Given two hypergraphs $G = (V, E)$ and $G' = (V', E')$ whose hyperedges each contain a d-tuple of vertices, the objective of matching $G$ and $G'$ is to seek an optimal vertex-to-vertex mapping $m : V \to V'$. In [33], the output of hypergraph matching is a vertex matching matrix $X$ with entries

$$X_{v,v'} = P(m(v) = v' \mid G, G'),$$

where $P(m(v) = v' \mid G, G')$ is the matching probability between $v$ and $v'$. The hard matching results can be obtained from $X$ easily. Because a hyperedge represents a relationship among a tuple of vertices, a hyperedge matching can be induced by the vertex matching: for $e = (v_1, \dots, v_d) \in E$, $m(e) = (m(v_1), \dots, m(v_d)) \in E'$. The relationship between vertex matching and hyperedge matching is thereby established. The input of hypergraph matching in [33] is a hyperedge matching matrix $S$ with entries

$$S_{e,e'} = P(m(e) = e' \mid G, G'), \quad (17)$$

where $P(m(e) = e' \mid G, G')$ is the matching probability between $e$ and $e'$. In this paper, $S_{e,e'} = \exp(-|\partial_e - \partial_{e'}|)$ is used, where $\partial_e$ is the implementation of hyperedge $e$. It is worth noting that the implementation of a hyperedge depends on the specific application. For example, the Euclidean distance between two vertices is an implementation of an edge with $d = 2$ when the scattering centers on a 2D rigid target are tracked.
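The edge similarity $\exp(-|\partial - \partial'|)$ for $d = 2$, with the Euclidean distance between two vertices as the edge implementation, can be sketched as follows. This is a minimal illustration only: the function names and the toy coordinates are assumptions, and the sketch builds the similarity input $S$ but does not perform the matching of [33].

```python
from itertools import combinations
from math import dist, exp


def edge_lengths(points):
    """Map each pair of vertex indices (an edge, d = 2) to its Euclidean length."""
    return {(i, j): dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}


def edge_similarity(pred_points, meas_points):
    """S[e][e'] = exp(-|len(e) - len(e')|) for every predicted/measured edge pair."""
    pred, meas = edge_lengths(pred_points), edge_lengths(meas_points)
    return {e: {f: exp(-abs(le - lf)) for f, lf in meas.items()}
            for e, le in pred.items()}


# Toy example (assumed coordinates): the measured shape is the predicted
# shape translated by (10, 10), as if the motion model were badly mismatched.
predicted = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
measured = [(10.0, 10.0), (13.0, 10.0), (13.0, 14.0)]
S = edge_similarity(predicted, measured)
print(S[(0, 1)][(0, 1)], S[(0, 1)][(1, 2)])
```

Because edge lengths are invariant to translation, corresponding edges keep the maximum similarity of 1 even though every measurement is far from its predicted position. This is the property that makes a structure-based criterion attractive when the motion model is mismatched.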
By Proposition 1 in [33], a connection between the input $S$ and the output $X$ is established.
Following [33], the hypergraph matching problem is transformed into a convex optimization problem: $X$ can be recovered from $S$ by minimizing the distance between $S$ and $\otimes^d X$. In the resulting problem, $\mathbf{1}$ denotes the column vector whose elements are all 1.
The optimization problem is solved in the context of a relative entropy error measure; see [33] for more details.

JPHGM-LMB
Structure information can be used to achieve better tracking performance in scenarios where the group targets are well structured, such as tracking the scattering centers on a 2D rigid target. If the predicted positions of the targets and the measurements are taken as the vertices of hypergraphs (for simplicity, the sensor is assumed to measure position directly), the data association problem can be formulated as a hypergraph matching problem; see Figure 1a. The hypergraph representation allows structure information to be used for better data association. However, in the presence of missed detections and clutter, hypergraph matching may produce wrong results, as shown in Figure 1b. The underlying reason is that hypergraph matching finds an optimal subgraph of one hypergraph that matches the other hypergraph. If some targets are not detected, the maximum common subgraph of the two hypergraphs should be found instead. In this paper, we focus on how to exploit structure information when undetected targets exist. Note that $\mathrm{HGM}(G, G')$ denotes the hypergraph matching between $G$ and $G'$.
The measurement set obtained at time $k$ is denoted as $Z_k = \{z_1, \dots, z_M\}$, where $M$ is the number of measurements. The set of predicted states $X_{k|k-1} = \{x^1_{k|k-1}, \dots, x^N_{k|k-1}\}$ is generated based on the preset target dynamic motion model, where $N$ is the number of targets. Data association seeks an optimal matching element $z_i \in Z_k$ for each $x^j_{k|k-1} \in X_{k|k-1}$. An association matrix $A^k_{N \times M}$ is defined, which consists of a set of association events $b_{i t_i}$, where $b_{i t_i}$ means that $z_{t_i}$ is associated with $x^i_{k|k-1}$, $i = 1, 2, \dots, N$, $t_i = 1, \dots, M$. As mentioned above, $Z_k$ and $X_{k|k-1}$ can be represented by hypergraphs in which each element of $Z_k$ and $X_{k|k-1}$ is a vertex, so the data association problem is transformed into a hypergraph matching problem. Because of missed detections and clutter, the maximum common subgraph of the two hypergraphs must be found, which makes the data association problem more challenging.
To deal with missed detections, one intuitive idea, inspired by JPDA, is to go through all the undetected cases by enumerating all the subsets $y_1, \dots, y_H$ of $X_{k|k-1}$, where $H = 2^N$ is the total number of subsets. Each $y_l \subset X_{k|k-1}$ represents the hypothesis that the targets contained in $y_l$ are detected. The association result between $y_l$ and $Z_k$ is obtained by hypergraph matching and is denoted as the association event $b_l$. The probability of the association event $b_l$ given the cumulative measurement set $Z^k = \{Z_l\}_{l=1}^{k}$ can be obtained via the total probability theorem and Bayes' formula,

$$P(b_l \mid Z^k) = \frac{1}{c}\, p(Z_k \mid b_l, Z^{k-1})\, p(b_l \mid Z^{k-1}),$$

where $c = \sum_{l=1}^{H} p(Z_k \mid b_l, Z^{k-1})\, p(b_l \mid Z^{k-1})$ is the normalization coefficient. For notational convenience in deriving $p(Z_k \mid b_l, Z^{k-1})$ and $p(b_l \mid Z^{k-1})$, two indicators are defined first: the target detection indicator and the false (unassociated) measurement indicator. Once $b_l$ is given, the association relationship between $Z_k$ and $X_{k|k-1}$ is established, and the likelihood for each $z_{t_i}$ can be obtained.
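The enumeration of all $H = 2^N$ detection hypotheses can be sketched in a few lines of Python. The function name `detection_hypotheses` and the target labels are illustrative assumptions; in the filter, each subset would then be matched against the measurement set by hypergraph matching, which is not shown here.

```python
from itertools import combinations


def detection_hypotheses(predicted_labels):
    """Yield every subset y_l of the predicted targets.

    Targets inside a subset are hypothesised to be detected;
    the remaining targets are hypothesised to be undetected.
    """
    labels = list(predicted_labels)
    for size in range(len(labels) + 1):
        for subset in combinations(labels, size):
            yield subset


hyps = list(detection_hypotheses(["t1", "t2", "t3", "t4"]))
print(len(hyps))  # number of hypotheses for N = 4 targets
```

The count doubles with every additional target (for example, 1024 subsets for ten targets), which is why the approach is practical in real time only for a limited number of targets.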
The likelihood in Equation (27) is widely used for data association. It performs well under the assumption that the actual motion of the target coincides with the preset dynamic motion model. In some practical applications, however, this assumption does not hold.
In this paper, we focus on tracking rigid targets whose group structure remains almost unchanged. The values of the likelihood in Equation (27) for the targets in a group should therefore be close to each other, because the targets share the same dynamic motion model. A likelihood difference between the i-th target and the j-th target is defined accordingly. If $z^k_{t_i}$ and $z^k_{t_j}$ are generated by targets in the same group, the logarithm of this difference should be small, and it is compared with a given threshold $\alpha$. The maximum likelihood difference within $b_l$ is denoted as $\Delta$. If $\Delta > \alpha$, $b_l$ is considered a wrong association; if $\Delta \le \alpha$, $b_l$ is considered a correct association, for which a large likelihood should be obtained. From Equation (26), the number of undetected targets is $\phi(b_l)$. Substituting Equations (35) and (36) into Equation (34), and then Equations (32), (33) and (37) into Equation (24), yields the probability of each association event, from which the marginal probability of each target being associated with each measurement is obtained (Equation (39)).
At this point, the probability of each target being associated with each measurement in Equation (39) has been obtained by making full use of the structure information. This data association result is integrated into the LMB filter by revising the single-target likelihood in Equation (15): the revised target likelihood $g^*(z_{\theta(\ell)} \mid x, \ell)$ replaces $g(z_{\theta(\ell)} \mid x, \ell)$ in the LMB filter, which yields the JPHGM-LMB filter.
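The marginalization step, in which the probabilities of the association events are combined into a probability for each target-measurement pair, can be sketched as below. This is a generic JPDA-style marginalization under stated assumptions: the function name, the event encoding, and the toy event weights are hypothetical, and the sketch does not reproduce the structure-based event likelihood of this paper.

```python
def marginal_association(events, n_targets, n_meas):
    """Marginal association probabilities from weighted joint events.

    events: list of (unnormalised weight, assignment), where assignment[i]
            is the measurement index given to target i, or None if target i
            is hypothesised to be undetected.
    Returns beta with beta[i][j] = P(target i <- measurement j) and
    beta[i][n_meas] = P(target i undetected).
    """
    total = sum(weight for weight, _ in events)  # normalisation coefficient c
    beta = [[0.0] * (n_meas + 1) for _ in range(n_targets)]
    for weight, assignment in events:
        for i, j in enumerate(assignment):
            column = n_meas if j is None else j
            beta[i][column] += weight / total
    return beta


# Three toy joint events for two targets and two measurements (assumed weights).
events = [
    (0.6, (0, 1)),     # target 0 <- z0, target 1 <- z1
    (0.3, (0, None)),  # target 0 <- z0, target 1 undetected
    (0.1, (1, 0)),     # target 0 <- z1, target 1 <- z0
]
beta = marginal_association(events, n_targets=2, n_meas=2)
for row in beta:
    print([round(value, 3) for value in row])
```

Each row of `beta` sums to one, so every target distributes its probability mass over the measurements and the undetected case.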

Simulation
In this section, the performance of the proposed method is evaluated on two simulated scenarios with missed detections and clutter. In the first scenario, a 2D rigid target with four scattering centers is tracked. The four scattering centers are considered as four targets whose relative positions are assumed to be unchanged, and the effect of detection probability and maneuverability on the tracking performance is analyzed. In the second scenario, the validation is extended to a more complex case in which two 2D rigid targets with four scattering centers each are tracked, and the effects of measurement errors and occlusions are considered.
In our experiment, the target state $x = [r_x, r_y, \dot{r}_x, \dot{r}_y]$ consists of position and velocity components, where $r_x$ and $\dot{r}_x$ are the position and velocity in the x direction, and likewise $r_y$ and $\dot{r}_y$ in the y direction. The discrete-time constant velocity (CV) model is employed for the time evolution of the target state, with the process noise modeled as zero-mean Gaussian white noise with covariance $Q$, where $\Omega$ represents the acceleration error and $t$ is the sampling time. For simplicity, the sensor is assumed to measure the position directly, and the measurement model is

$$z_k^{(j)} = H_k x_k^{(j)} + q_k,$$

where $z_k^{(j)}$ is the measurement generated by the j-th target at time $k$, $H_k$ is the observation matrix that extracts the position components, and $q_k$ is the measurement noise, modeled as zero-mean Gaussian white noise with covariance $R_k = \sigma_m^2 I_2$, where $\sigma_m$ is the standard deviation of the measurement noise. The single-target likelihood in Equation (15) is then $g(z_{\theta(\ell)} \mid x, \ell) = \mathcal{N}(z_{\theta(\ell)}; H_k x, R_k)$.
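A minimal sketch of the CV prediction and of the Gaussian position likelihood $\mathcal{N}(z; Hx, R)$ is given below. It assumes an isotropic measurement covariance $R = \sigma_m^2 I_2$ and a sampling time of 1 s, neither of which should be read as the exact configuration of the experiments; the function names are likewise illustrative.

```python
from math import exp, pi


def cv_predict(x, t):
    """Noise-free constant-velocity prediction for x = [r_x, r_y, v_x, v_y]."""
    r_x, r_y, v_x, v_y = x
    return [r_x + t * v_x, r_y + t * v_y, v_x, v_y]


def position_likelihood(z, x, sigma_m):
    """N(z; Hx, sigma_m^2 I_2), where H picks the position components of x."""
    dx, dy = z[0] - x[0], z[1] - x[1]
    squared = dx * dx + dy * dy
    return exp(-squared / (2.0 * sigma_m ** 2)) / (2.0 * pi * sigma_m ** 2)


# Assumed sampling time t = 1 s; sigma_m = 0.1 m as in the simulation setup.
x_pred = cv_predict([0.0, 0.0, 1.0, 2.0], t=1.0)
g_near = position_likelihood((1.0, 2.0), x_pred, sigma_m=0.1)
g_far = position_likelihood((1.3, 2.0), x_pred, sigma_m=0.1)
print(x_pred, g_near, g_far)
```

With $\sigma_m = 0.1$ m, a measurement only 0.3 m away from the predicted position already has a likelihood that is orders of magnitude smaller. This shows how quickly a distance-based likelihood penalizes a target whose true motion departs from the CV prediction.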
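In this paper, the optimal subpattern assignment (OSPA) metric [35] and the CLEAR multiple object tracking (MOT) metrics [36] are used for performance evaluation over 100 Monte Carlo runs. The OSPA distance between the true set $\kappa = \{x_1, \dots, x_m\}$ and the estimated set $\hat{\kappa} = \{\hat{x}_1, \dots, \hat{x}_n\}$ is defined by

$$\bar{d}_p^{(c)}(\kappa, \hat{\kappa}) = \left( \frac{1}{n} \left( \min_{\pi \in \Pi_n} \sum_{i=1}^{m} d^{(c)}(x_i, \hat{x}_{\pi(i)})^p + c^p (n - m) \right) \right)^{1/p},$$

where $m \le n$, $d^{(c)}(x, \hat{x}) = \min(c, \|x - \hat{x}\|)$, $\Pi_n$ is the set of assignments between $\kappa$ and $\hat{\kappa}$, $p$ is the order of the norm, and $c$ is the penalty cost for cardinality mismatch. In this simulation, $c = 5$ and $p = 2$.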
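For small sets, the OSPA distance can be computed by brute force over all assignments, as in the following sketch. It uses the standard OSPA definition with cut-off $c = 5$ and order $p = 2$ as in this simulation; the function name `ospa` and the toy point sets are assumptions, and a practical implementation would replace the permutation search with an optimal assignment solver.

```python
from itertools import permutations
from math import dist


def ospa(truth, estimate, c=5.0, p=2):
    """OSPA distance between two lists of position tuples (brute force)."""
    small, large = sorted((truth, estimate), key=len)
    m, n = len(small), len(large)
    if n == 0:
        return 0.0  # both sets are empty
    # Best assignment of the smaller set into the larger one, with cut-off c.
    best = min(
        sum(min(c, dist(small[i], large[j])) ** p for i, j in enumerate(perm))
        for perm in permutations(range(n), m)
    )
    # Every unassigned element of the larger set costs the full penalty c.
    return ((best + (c ** p) * (n - m)) / n) ** (1.0 / p)


print(ospa([(0, 0), (1, 1)], [(1, 1), (0, 0)]))  # same points, different order
print(ospa([(0, 0), (10, 10)], [(0, 0)]))        # one target missing
```

The second call illustrates the cardinality penalty: the localization error is zero, yet the missing target contributes $c^p$ to the sum before averaging.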
The CLEAR MOT metrics comprise two metrics: the multiple object tracking precision (MOTP), which measures the accuracy of the target state estimates (a smaller value means better performance), and the multiple object tracking accuracy (MOTA), which takes misses, false positives and mismatches into account. In an MTT system, targets that have no estimates are counted as misses, and estimates for which no real target exists are counted as false positives. A mismatch occurs when the label of a target changes relative to the previous frame.
In our experiment, the survival probability is $p_S = 0.99$, the acceleration error is set to $\Omega = 0.1$ m/s², the mean number of clutter measurements is $\lambda_c = 10$ with clutter assumed to be uniformly distributed over the surveillance area, the threshold $\alpha$ is set to 0.5, and the standard deviation of the measurement noise $\sigma_m$ is set to 0.1 m. The simulations are implemented in MATLAB R2019a on an Intel Core i9-9750H 2.6 GHz processor with 16 GB of RAM.
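The way MOTA aggregates the three error types can be sketched as follows, using the usual CLEAR MOT definition, $\mathrm{MOTA} = 1 - \sum_k (\text{misses}_k + \text{false positives}_k + \text{mismatches}_k) / \sum_k g_k$, where $g_k$ is the number of true targets in frame $k$. The function name and the per-frame counts are illustrative assumptions, not results from this paper.

```python
def mota(misses, false_positives, mismatches, num_truth):
    """MOTA from per-frame error counts and per-frame numbers of true targets."""
    total_truth = sum(num_truth)
    if total_truth == 0:
        raise ValueError("MOTA is undefined without ground-truth targets")
    errors = sum(misses) + sum(false_positives) + sum(mismatches)
    return 1.0 - errors / total_truth


# Toy run: four true targets over five frames, one error of each type.
score = mota(
    misses=[0, 1, 0, 0, 0],
    false_positives=[0, 0, 1, 0, 0],
    mismatches=[0, 0, 0, 1, 0],
    num_truth=[4, 4, 4, 4, 4],
)
print(f"MOTA = {score:.2%}")
```

Because the three error types are summed before normalization, a single label switch costs as much as a missed target, which is why MOTA is sensitive to the association errors discussed in this section.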
To test the proposed approach, LMB, HGM-LMB and JPHGM-LMB are applied to track these four targets with detection probability $p_D = 0.98$; the trajectories obtained by the three filters are shown in Figure 3. The trajectories of all three filters are almost identical to the true trajectories during the intervals 0 s-20 s and 40 s-60 s. During the interval 20 s-40 s, however, the LMB trajectories cross each other frequently, whereas those of the other two filters are maintained correctly. The reason is that the actual motion does not coincide with the transition motion model during 20 s-40 s. Measurements generated by one target may therefore not appear near the predicted state of that target, and LMB may produce a wrong association. In contrast, structure information helps HGM-LMB and JPHGM-LMB produce more accurate trajectories. With a high detection probability, HGM-LMB and JPHGM-LMB achieve similar performance and both maintain correct trajectories, as shown in Figure 3b,c. To illustrate this further, the data association results of LMB and JPHGM-LMB at k = 33 are shown in Figure 4.
As Figure 4 shows, the measurement generated by target 4 is associated with target 3 in LMB, because the distance between measurement 4 and target 3 is smaller than the others. When structure information is used, the correct association result is obtained; see Figure 4b.
To illustrate that JPHGM-LMB achieves better data association performance than HGM-LMB in the presence of missed detections and clutter, the detection probability $p_D$ is set to 0.9. The two filters are then used to track the four targets, and the data association results of HGM-LMB and JPHGM-LMB are shown in Figure 5.
Figure 5 shows that target 2 is not detected and that a clutter measurement is associated with target 2 in HGM-LMB. This happens because HGM-LMB is built on the assumption that there are no missed detections, so the clutter measurement is treated as a valid measurement. In JPHGM-LMB, by contrast, target 2 is identified as undetected, and better association performance is achieved.
The effect of the turning rate $\omega$ and the detection probability $p_D$ on the tracking performance is analyzed next. While the effect of one parameter is analyzed, the other parameters remain unchanged. The OSPA distances of the filters for different detection probabilities $p_D$ and $\omega = 3$°/s are shown in Figure 6.
When the targets maneuver during the interval 20 s-40 s, there is a mismatch between the actual motion and the dynamic motion model. Consequently, the estimation accuracy of the target states degrades even if the data association is correct. Furthermore, the model mismatch makes wrong association results more likely. Figure 6 shows that the OSPA distances of all three filters increase while the targets maneuver. The JPHGM-LMB filter outperforms the HGM-LMB and LMB filters as the detection probability varies, because it not only uses the structure information of the targets but also considers all undetected cases. However, the performance of JPHGM-LMB also degrades as the detection probability decreases, because missed detections break the structure of the targets to some extent. In the extreme case, only one target is detected and no structure information can be used.
To further validate JPHGM-LMB, the CLEAR MOT metrics of the filters for different detection probabilities $p_D$ are shown in Table 1. JPHGM-LMB achieves a lower miss rate and false positive rate than HGM-LMB and LMB, and its number of mismatches is lower than that of the other filters as the detection probability varies. These results further demonstrate the effectiveness of the proposed method. However, LMB achieves a smaller MOTP than the other filters, and the MOTP of all filters decreases as the detection probability decreases. The reason is that MOTP only takes into account the difference between true and estimated target states when there is no mismatch, whereas mismatches are more likely to occur while the targets maneuver, which is also when the estimation accuracy of the target states degrades, as mentioned above.
The OSPA distances of the filters for different turning rates $\omega$ and $p_D = 0.9$ are shown in Figure 7. The turning rate $\omega$ has a significant effect on tracking performance, which degrades considerably as the turning rate increases. Larger turning rates produce highly unreliable estimates, which are visible in the OSPA curves. The larger the turning rate, the more serious the model mismatch. As mentioned above, under model mismatch the estimation accuracy of the target states degrades even if the data association is correct. If a target is undetected at some time step, no measurement is available to update its predicted state, which is obtained from the preset dynamic motion model. Because of the model mismatch, the state estimate of this target drifts away from the true state, and the more serious the mismatch, the worse the estimate. If the estimate becomes poor enough, the target may be eliminated, and the target cardinality is then underestimated. In this situation the structure of the targets is destroyed to some degree. If the target is detected again, its state estimate can converge to the true state after several time steps, and the structure of the targets can be recovered; the larger the turning rate, the more time is needed. If a missed detection occurs again before the structure is restored, the structure of the targets may be destroyed permanently, and JPHGM-LMB and HGM-LMB then perform poorly.
The CLEAR MOT metrics of the filters for different turning rates $\omega$ are given in Table 2. A larger turning rate leads to poorer performance, and the performance of all filters degrades dramatically as the turning rate increases. However, JPHGM-LMB always outperforms LMB and HGM-LMB as the turning rate varies, which again validates the effectiveness of the proposed method.
The average computation time of the three filters for tracking these four targets with $p_D = 0.9$ is shown in Figure 8. As mentioned in Section 3, JPHGM-LMB follows some ideas of JPDA and therefore shares its main drawback to some extent: the number of association events grows exponentially with the number of targets. The better tracking performance is achieved at the cost of computation time.

Scenario 2
In scenario 2, to further validate the proposed method, it is demonstrated on simulated data for multiple rigid targets. This scenario can be regarded as the tracking of cars in urban traffic, which poses more challenges in practice. On the one hand, many sensor types, such as laser and video, are inherently subject to occlusions; measurements are then missed at successive time instants, and the structures of the targets are broken to some extent. On the other hand, increasing measurement errors lead to deformation of the targets' structures. The effect of these factors is analysed below. The birth and death of targets are considered in this scenario. Two rigid targets move along different lines. In the birth model, $w_b = 0.03$ and $P_b = \mathrm{diag}([100, 100, 100, 100])$. The true trajectories of these targets are shown in Figure 9. In this section, the detection probability is $p_D = 0.9$ and the acceleration of the CA model is $a = 1.5$ m/s².
Firstly, the effect of measurement errors on tracking performance is analyzed. The OSPA and CLEAR MOT metrics of the filters for different measurement errors are shown in Figure 10 and Table 3, respectively.
Figure 10 shows that the OSPA distances of the three filters increase while the targets maneuver during the interval 20 s-40 s, because of the mismatch between the actual motion and the dynamic motion model. When targets are born or die, the number of targets changes and the filters take some time to obtain the true number of targets, so the OSPA distance increases. The performance of JPHGM-LMB degrades with increasing measurement error, and the three filters achieve almost the same performance when $\sigma_m = 0.4$ m. Otherwise, JPHGM-LMB outperforms the other filters, because it takes both kinematic and structure information into account and considers missed detections. In JPHGM-LMB, hypergraphs are introduced to represent the structure of predictions and measurements, and data association is formulated as a hypergraph matching problem; obtaining the correct matching results becomes more challenging as the measurement errors increase. Moreover, the structure of the targets is deformed because unreliable measurements are used to update the target states. Table 3 shows that JPHGM-LMB achieves larger MOTA values than the other filters, which means that it obtains higher-quality trajectories and further validates its advantage.
Secondly, the effect of occlusions is considered. Occlusions cannot be avoided in practical applications; for example, a car may be partly covered by pedestrians or other barriers. To simulate occlusions, the 4th scattering centers of the two rigid targets are occluded for several time steps for simplicity. The occlusion begins at k = 32 s (while the targets are maneuvering), and its duration is denoted by $\beta$. The OSPA and CLEAR MOT metrics of the filters for different occlusion durations are shown in Figure 11 and Table 4, respectively. An occlusion has two effects.
Firstly, the number of targets is underestimated, and the filters take several time steps to obtain the true cardinality estimate. No measurement is available to update the predicted state of the occluded target, so its estimation error cannot be avoided because of the model mismatch and measurement noise, and the OSPA distance increases as shown in Figure 11. However, the advantage of JPHGM-LMB is clear in Table 4: it always achieves a larger MOTA than the other filters as the occlusion duration increases.
Secondly, the structure of the targets is broken to some extent when an occlusion occurs. If the occlusion is short, the structure can be recovered and the structure information can still enhance the tracking performance; see Figure 11a,b. If the occlusion is long enough, however, the structure of the targets may be destroyed permanently and the performance of HGM-LMB and JPHGM-LMB degrades; see Figure 11c,d.

Conclusions
In this paper, a hypergraph representation is adopted to incorporate the structure information of rigid targets for better data association in MTT, and data association is formulated as a hypergraph matching problem. To achieve better data association in hostile environments with clutter and missed detections, all undetected cases are enumerated, following some ideas of joint probabilistic data association, and the likelihood is built on group structure rather than on the distance between predicted states and measurements. The structure information is then integrated into the LMB filter by revising each single-target likelihood with the joint association probabilities. However, because the number of association events grows exponentially with the number of targets, the proposed approach is usable in real time only for a limited number of targets. The effectiveness of the method has been demonstrated by extensive experiments. Specifically, in the first scenario, when the detection probability is 0.9, the MOTA of JPHGM-LMB is 91.86%, an increase of almost three percent compared with HGM-LMB and eight percent compared with LMB. Moreover, JPHGM-LMB always outperforms the other filters as the detection probability decreases. In the second scenario, when the standard deviation of the measurement noise is 0.2 m, the MOTA of JPHGM-LMB reaches 78.61%, whereas those of HGM-LMB and LMB are 70.13% and 63.9%, respectively, so JPHGM-LMB achieves better tracking performance.
The hypergraph representation makes good use of structure information. It can handle camera shake because the structure of the targets remains unchanged, although the measurement noise becomes more complicated when camera shake is considered. To handle more complex scenarios in practical applications, a particle filter may be introduced; how to reduce the resulting computational cost is left for future work.