A Student’s t Mixture Probability Hypothesis Density Filter for Multi-Target Tracking with Outliers

In multi-target tracking, the outliers-corrupted process and measurement noises can reduce the performance of the probability hypothesis density (PHD) filter severely. To solve the problem, this paper proposed a novel PHD filter, called Student’s t mixture PHD (STM-PHD) filter. The proposed filter models the heavy-tailed process noise and measurement noise as a Student’s t distribution as well as approximates the multi-target intensity as a mixture of Student’s t components to be propagated in time. Then, a closed PHD recursion is obtained based on Student’s t approximation. Our approach can make full use of the heavy-tailed characteristic of a Student’s t distribution to handle the situations with heavy-tailed process and the measurement noises. The simulation results verify that the proposed filter can overcome the negative effect generated by outliers and maintain a good tracking accuracy in the simultaneous presence of process and measurement outliers.


Introduction
Multi-target tracking (MTT) plays an important role in many sensing systems, such as infrared, radar, sonar, etc., which uses the sensor data to jointly estimate the target state and the number of targets. Nowadays it is widely used in civilian and military applications such as air traffic control, remote sensing, ballistic missile guidance, and computer vision [1,2]. In MTT, the time-varying number of targets causes a problem in that the associations between state and measurement sets of targets are hard to know, which makes the traditional data association-based multi-target tracking methods problematic. In recent years, the random finite set (RFS) theory-based multi-target tracking filters, such as probability hypothesis density (PHD) filter [3], cardinalized PHD (CPHD) filter [4], multi-target multi-Bernoulli (MeMber) filter [1] and cardinality-balanced MeMBer (CBMeMBer) filter [5], have attracted much more attention since they can avoid the combinatorial problem that arises from data association. Moreover, some labeled RFS-based multi-Bernoulli filters [6,7] which can accommodate target tracks were proposed.
The focus of this paper is the PHD filter, which has relatively simple recursion, making it suitable for the applications demanding real time results. The PHD filter provides a tractable sub-optimal strategy for jointly estimating the number and the state of a variable number of targets by propagating the first-order statistical moment of multi-target posterior probability density in time. It has two basic implementations: the sequential Monte Carlo (SMC) method [8] and the Gaussian mixture (GM) method [9] which can solve the problem of computationally intractable multiple integrals involved in the PHD recursions. Compared to SMC implementation, GM implementation of the PHD filter has the advantages of simple state extraction and low computational cost, which is suitable for the requirement of real-time scenes. Moreover, some nonlinear extensions [9][10][11] and improvements [12][13][14] extend the scope of applications for the GM-PHD filter.
in MTT scenarios. According to RFS theory, the collections of target states and measurements at time k can be represented as finite sets X k = {x k,1 , . . . , x k,M(k) } ∈ F(X) (X ⊆ R d x ) and Z k = {z k,1 , . . . , z k,N(k) } ∈ F(Z) (Z ⊆ R d z ) where M(k) and N(k) are the number of targets and the number of measurements, respectively, F(X) and F(Z) are the collections of all finite subsets of target states and measurements, respectively, d x and d z are the dimensions of the state space X and the measurement space Z, respectively. Let: where S k|k−1 (x) is the RFS of survival targets at time k from target states X k−1 and B k|k−1 (x) is the RFS of spawned targets at time k from target states X k−1 , and Γ k is the RFS of birth targets at time k.
The RFS of measurements Z k is defined as: where G k (x) and K k are the RFS of measurements coming from target and clutter at time k, respectively. Based on the RFS approach, the MTT problem was recast into a Bayesian filtering framework, but optimal Bayesian recursions are computationally intractable due to their sets integral. Then the PHD filter, which recursively propagates the first-order moment of multi-target posterior probability density, was proposed as an approximate solution in [3]. Let D k−1 (x) denote the posterior intensity function at time k − 1, then the prediction step of the PHD filter is given by where f k|k−1 (•|•) is transition density function in a Markov process from time k − 1 to time k, p S,k (x) is the survival probability of each target at time k, β k|k−1 (x|u) is the intensity function of the RFS B k|k−1 (x|u) of targets spawning from previous state u, γ k (x) is the intensity function of birth targets at time k. Given the predicted intensity D k|k−1 (x), the update intensity D k (x) can be given by: where p D,k (x) is the probability of detection at time k, g k (z|x) denotes the measurement likelihood function at time k, κ k (x) is the intensity function of clutter. The integration domain of the integral functions in (3) and (4) is the state space X ⊆ R d x .

Review of the GM-PHD Filter
The GM-PHD filter approximates the intensity function of multi-target as Gaussian mixture components under the assumption of the linear Gaussian dynamical model and measurement model. In addition, it still needs to satisfy the following assumptions: (1) the survival and detection probabilities are state-independent; (2) the intensities of the birth and spawn RFSs are Gaussian mixtures of the form. Then a closed-form solution of the PHD filter can be obtained as follows.
Prediction: The Gaussian mixture formulation of the PHD recursion at time k − 1 is given as: where N(x; •, •) denotes that random vector x follows Gaussian distribution, m (j) k−1 and P (j) k−1 are the mean and the covariance matrix of the jth item in all Gaussian mixture components, respectively, w (j) k−1 is the corresponding weight, J k−1 is the number of Gaussian components.
Assume that the intensities of survival targets, spawned targets and birth targets are Gaussian mixtures of the form, the predicted intensity D k|k−1 (x) is given by: where: w Update: Given the PHD predictor as Equation (6), the PHD updater is formulated as: where: in the above, F k−1 and H k are the state transition matrix and observation matrix, respectively, Q k−1 and R k are the process and measurement noise covariance, respectively.

Student's t Distribution
Let a random vector x ∈ R d admit the Student's t distribution; its probability density function (PDF) can then be expressed as [23]: where Γ(•) denotes Gamma function and ∆ 2 = (x − m) T P −1 (x − m). The above PDF abbreviated by St(x; m, P, υ) with mean m, scale matrix P and degrees of freedom parameter υ. The corresponding covariance matrix can be calculated as υ υ−2 P (υ > 2) [23]. The tail behavior of the Student's t distribution is very much influenced by the degrees of freedom parameter υ. The smaller υ is, the heavier the tail is, and vice versa. In addition, the Student's t distribution reduces to the Gaussian as υ tends to infinity, thus includes it as a special case. A number of convenient properties are shared by both and can be easily derived. As for the PDF of affine mappings of Student's t variables [23] we have that z = Ax + b, with appropriate A and b, admits: The degrees of freedom parameter remains unaltered. Turning to random vectors x 1 ∈ R d 1 and x 2 ∈ R d 2 that are jointly Student's t distributed with: The marginal PDF of x 1 can be computed by applying a linear transformation with A = [I O] to (18): From (18) and (19), the conditional PDF is also a Student's t as follows: with: It can be seen that the conditional mean (21) corresponds to the Gaussian conditional mean. The matrix parameter (22) is a scaled version of the conditional covariance in the Gaussian case, which is recovered as υ tends to infinity. In contrast to the Gaussian, P 1|2 depends on x 2 . Also, the degrees of freedom parameter in (23) increases. Above properties of Student's t distribution like (17) and (20) are the foundations of Lemmas 1 and 2 (in Section 3.3) that are the key of our proposed approach.

Student's t Mixture PHD Recursion
Differing from the GM-PHD recursion, the proposed approach approximates the intensity of multi-target RFS as the Student's t mixture of the form, which is propagated in the closed-form recursions. The following gives the novel algorithm for linear system at first and then extends it to nonlinear system.

Basic Assumptions for Linear Model
As for the linear system, some foundational assumptions are given as follows.
Assumption 1. Given that the process and measurement noises admit the zero-mean t distribution with scale matrix Q k−1 and R k , respectively, and the initial state vector also follows a t distribution, each target follows a linear Student's t dynamical model and the sensor has a linear Student's t measurement model, i.e.: where F k−1 and H k denote the transition matrix and measurement matrix, respectively. And assume the matrices F k−1 , H k , Q k−1 and R k are known.

Assumption 2.
The survival and detection probabilities are state independent, i.e.: Assumption 3. The intensities of the birth and spawn RFSs are Student's t mixtures of the form: Assumption 1 is given according the affine transition nature of Student's t variables [19]. Assumption 2 is commonly used in GM-PHD filters. Assumption 3 is given in the presence of process and measurement noises with heavy tails, so it is reasonable in actual applications, i.e., when using unreliable sensors or sensors suffer from electromagnetic interference while tracking some agile targets, where both process noise and measurement noise are prone to show a heavy-tailed character. For illustrating this, a one-dimensional heavy-tailed noise distribution and Gaussian noise distribution are given as Figure 1.

Student's t Mixture PHD Recursion
Based on the Assumptions 1-3, the closed solution to the PHD recursion (3) and (4) is presented as two Propositions. The proposed Propositions show how the Student's t components of the posterior intensity are analytically propagated to the next time, analogous to the GM-PHD recursion.

Proposition 1.
Suppose that Assumptions 1-3 hold and that the posterior intensity at time k − 1 is a Student's t mixture of the form: Then, the predicted intensity at time k is also a Student's t mixture and is given by: where: Proposition 2. Suppose that Assumptions 1-3 hold and that the predicted intensity for time k is a Student's t mixture of the form: Then, the posterior intensity at time k is also a Student's t mixture and is given by: where: Aforementioned Propositions 1 and 2 can be established by applying the following approximated results for Student's t functions.
Lemma 1. Given that the jointly PDF of the current state and one-step ahead state vectors is Student's t and F, Q, m and P of appropriate dimensions and that Q and P are positive definite: Lemma 2. Given that the jointly PDF of the state and measurement vectors is Student's t and H, R, m, P of appropriate dimensions and that R and P are positive definite: where: Lemmas 1 and 2 are derived based on the properties of Student's t distribution listed in Section 2.3. The detailed derivations of Lemmas 1 and 2 can be seen [19]. Proposition 1 is established by substituting (25), (27) and (29)-(31) into the PHD prediction (3), and replacing integrals of the form (51) by appropriate Student's t as given by Lemma 1. Similarly, Proposition 2 is established by substituting (26), (28) and (39) into the PHD update (4), and then replacing integrals of the form (51) and product of Student's t of the form (52) by appropriate Student's t as given by Lemmas 1 and 2 respectively. The concrete proofs of Propositions 1 and 2 are given in Appendix B.
From (45), we know that the degree of freedom υ 3 will increase infinitely with recursion performing, which results in that the Student's t mixture degrades the Gaussian mixture according to [18]. It means that the robustness against outliers for the Student's t mixture PHD filter will be lost with time going by. To solve the problem, we adapts the moment matching approach [18] to obtain the correction of posterior intensity at time k given as (59)-(61). A pseudocode of main process of the proposed algorithm is given by Table A1 in Appendix A: Remark 1. The degree of freedom parameters υ 1 , υ 2 , υ 3 for the process noise model, measurement noise model and the initial multi-target state intensity are different in general. As in the recursion performed from (31) to (39), there is a problem that how to select the degree of freedom parameter between υ 1 and υ 3 as the degree of freedom parameter of predicted multi-target intensity to be propagated in the next recursive step. A valid method [18] is that choose the minimized value between υ 1 and υ 3 to be propagated. The same problem, existing in the recursion from (39) to (40), also can be handled by selecting the minimized value between υ 2 and υ 3 . For simplicity, this paper assumes that the degree of freedom parameters υ 1 , υ 2 , υ 3 are equal.

Remark 2.
Analogous to the derivation of GM implementation of the CPHD filter in [24], the proposed Student's t mixture implementation shown as in Propositions 1 and 2 can also be used in the CPHD filter, forming the corresponding Student's t mixture CPHD recursion.

Implementation Issues
Like the GM-PHD filter, the Student's t mixture PHD (STM-PHD) filter also suffers from the computation problem that the number of Student's t components increases endlessly with recursive time, so a pruning procedure and a merging procedure are necessary for the STM-PHD filter. The concrete procedures are similar to the procedures in the GM-PHD filter (readers can refer to [9]). The different point is that the Student's t components give the scale matrix not the covariance matrix, so the covariance matrix, calculated by υ υ−2 P, should be used in the merging procedure of the STM-PHD recursion. The estimated number of targets is obtained by summing up the weights of all the Student's t components. This procedure for the STM-PHD filter is no different from the GM-PHD filter. Again, the state extraction is also analogous to the GM-PHD filter as to select the means of the Student's t components that have weights greater than some threshold (generally set as 0.5 [9]).
In addition, for the scene with high clutter density, gating strategy is always used to reduce the computing cost for the GM-PHD filter. It is easy to know that the gating strategy is also suitable for the STM-PHD filter. The core principle to select the measurement can be expressed as: where Z k denotes the reduced set of the measurement at time k, P (j) zz,k|k−1 is the innovation covariance matrix corresponding to the jth predicted measurement, and T is the gate threshold. In contrast to the Gaussian mixture case, the judging variable (z k|k−1 ) for the Student's t mixture case follows an F-distribution not a chi-squared distribution. It means that the gate threshold for the STM-PHD filter, which can be chosen from F-distribution table, is different from the GM-PHD filter at the same probability regions.

Extension to Nonlinear Model
This section considers the situation that process and measurement models are nonlinear. The models in Assumption 1 change to become: where f k and h k are known nonlinear functions, w k−1 and v k are additional noises, which follow zero-mean Student's t distribution with scale matrix Q k−1 and R k , respectively. Different from the GM-PHD filter to cope with nonlinear problems via giving the numerical solution to Gaussian integrals, the core problem of the STM-PHD filter for nonlinear systems is how to compute the Student's t integrals. Some researchers have given numerical solutions to Student's t integrals based on Taylor linearization, unscented transform or cubature rule [19,21,22]. Compared with Taylor linearization and unscented transform, the cubature rule-based filter is a derivative-free and vigorous method [25], so this paper utilizes the cubature rule to extend the STM-PHD filter to nonlinear system according to [22]. The proposed algorithm for handle nonlinearity is given as Table A2 in Appendix A.

Simulations and Results
To illustrate the performance of the proposed filter, simulation examples are designed to compare with standard GM-PHD filter in linear and nonlinear scenarios, respectively. To compare the performance of two filters, we choose the Optimal Sub-pattern Assignment (OSPA) distance as the metric, which can comprehensively measure the cardinality and localization errors [26]. The OSPA distance is defined as follows. Let d (c) (x, y) := min(c, x − y ) for x, y ∈ W, and Π k denotes the set of permutations on {1, 2, . . . , k} for any k ∈ N = {1, 2, . . .}. For p ≥ 1, c > 0, and arbitrary finite subsets X = {x 1 , . . . , x m } and Y = {y 1 , . . . , y n } belong to W, where m, n ∈ N 0 = {0, 1, 2, . . .}: If m < n, and d (c) p is the order parameter that determines the sensitivity to outliers and c is the cut-off parameter that determines the relative weighting of the penalties assigned to cardinality and localization errors. The details to choose the parameters p and c can be seen in [26]. In our simulation examples, we set p = 2 and c = 100.

Linear Scenario
Consider a two-dimensional scenario where there are twelve targets over region [−1000, 1000] × [−1000, 1000] during the interval of 100 s. Assuming no target spawning and each target moves as a constant velocity model similar to [9] with: where T = 1 and q = 5. The state x k = [p x,k , p x,k , p y,k , p y,k ] T of each target consist of position [p x,k , p y,k ] and velocity [p x,k , p y,k ] at time k. Their corresponding initial state and life time of each target are given as Table 1. The noisy measurement model is the same as [9] with: For the process and measurement noises in (66) and (67), about one percent of process and measurement noise values are drawn from Gaussian with severely high covariance. This percentage is also called contaminated rate which can be denoted by ε [27]. Assuming no spawned target and birth targets appear spontaneously according to a Poisson point process with intensity function: where m  Figure 3, it is can be seen that the individual heavy-tailed measurements obviously bias the true position compared with the corresponding Gaussian measurements, which may degrade the estimation accuracy.
The detection probability and target survival probability are p D,k = 0.98 and p S,k = 0.99, respectively. Truncated threshold, merged threshold and the maximum Student's t components related to pruning and merging process are T p = 10 −5 , U = 4 and J max = 100, respectively. For simplification, set υ 1 = υ 2 = υ 3 = 10. To evaluate the performance of the STM-PHD filter, we compare it with GM-PHD filter over 100 Monte Carlo (MC) trails with fixed clutter density. Under the uniform distribution assumption, the clutter density can be given by clutter rate λ c with the relationship κ k (z) = λ c /V. In this simulation, we set λ c = 20 (giving an average of 20 clutter returns per scan). Figures 4 and 5 respectively show the estimated cardinality and the OSPA distance for two filters. The result in Figure 4 shows that the STM-PHD filter provides a noticeable improvement in terms of cardinality estimation accuracy compared with the GM-PHD filter, although some biased cardinality estimates appear for the STM-PHD filter.  In Figure 5 the OSPA distance of the STM-PHD filter is lower than that of the GM-PHD filter. Especially after 40 s more targets appear, difference of the OSPA distance between two filters is more noticeable. The main reason is that the STM-PHD filter has more accurate cardinality estimation.
To evaluate the performance of the proposed filter sufficiently, a simulation is executed over 100 MC trials with different contamination rates from ε = 0 to ε = 0.05. Then the time averaged OSPA distance of the STM-PHD filter and the GM-PHD filter, respectively, are shown in Figure 6.
From Figure 6, it can be seen that the time averaged OSPA distance of the STM-PHD filter is lower than that of the GM-PHD filter overall. The time averaged OSPA distances of two filters increases with the increasing contamination rate. Remarkably, the gap of OSPA distance between the STM-PHD filter and the GM-PHD filter changes wider from ε = 0 to ε = 0.05. It means that the STM-PHD filter has strong robustness against the negative effect of outliers, especially for high contamination rates. This is due to the fact the Student's t noise model in the proposed approach can match the heavy-tailed non-Gaussian noise well. On the contrary, the Gaussian-based GM approach matches such a non-Gaussian noise worse and worse with the increasing of contaminated rate. Additionally, at ε = 0, the OSPA distance for the STM-PHD filter is the same as the GM-PHD filter. It indicates that the STM-PHD filter and the GM-PHD filter have the same tracking performance when outliers do not exist.  To further evaluate the performance of the proposed filter, a simulation is performed over 100 MC trails with different clutter rates from λ c = 0 to λ c = 50. The resulting time averaged OSPA distances of the proposed filter and the GM-PHD filter are shown in Figure 7. It can be seen that the time averaged OSPA distances of two filters increase with the increasing clutter rate and the time averaged OSPA distance of the STM-PHD filter is always lower than that of the GM-PHD filter under different clutter rates. This means that the STM-PHD filter generally outperforms the GM-PHD filter when outliers exist, no matter what the clutter rate is.
In addition, the computational cost for the STM-PHD filter lies at the same level as that of the GM-PHD filter for the linear system. Running on a computer with an Intel(R) Core(TM) i5-4570 CPU at 3.2 GHz, the average computing times per execution of the GM-PHD filter and the STM-PHD filter with different clutter rate are given in Table 2.

Nonlinear Scenario
In this example, we assume a maximum of ten targets appears on the observation region [−π, π] × [0, 2000] and a nearly constant turn state model and nonlinear bearings and range measurement model are considered according to [9]. The state x k = [x T k , ω k ] T consists of position and velocity x k = [p x,k , p x,k , p y,k , p y,k ] T as well as the turn rate ω k . The state model is given by: where: The noisy measurement model with range and bearing measurement z k = [r k , θ k ] is given by: Like the linear scenario, the outliers contaminated process and measurement noises can be given by: with Q = diag([5, 5, π/180] T ) 2 , R = diag([10, 2(π/180)] T ) 2 .
In the simulation, we assume no spawned target and that the birth target is Poisson with intensity: where m The initial target states are given by Table 3 and the true trajectories of each target are shown as Figure 8. In addition, Figure 9 plots corresponding measurements with Gaussian noise and heavy-tailed noise respectively over time (not plot clutter in figure). In Figure 9, it also shows the results analogous to the linear case as shown in Figure 3. To evaluate the performance of the CKF based STM-PHD filter to cope with nonlinear problem, we compare it with CKF based GM-PHD filter [11] over 100 Monte Carlo (MC) trails with fixed clutter rate λ c = 20. Figures 10 and 11 respectively show the estimated cardinality and the OSPA distance of two filters. From Figure 10, it can be seen that the STM-PHD filter is superior to the GM-PHD filter in terms of cardinality estimation accuracy, although the STM-PHD filter has cardinality bias when the number of targets increases. The main reason for generating cardinality bias is that some necessary approximations for coping with nonlinear problems induce errors.
In Figure 11 the OSPA distances of the STM-PHD filter and the GM-PHD filter are at the same level before 65 s and later on the OSPA distance of the STM-PHD filter is obviously lower than that of the GM-PHD filter. This result matches the result in Figure 10 well. It indicates that the difference of OSPA distance between two filters is due to the difference of cardinality estimation accuracy.    To evaluate the performance of the STM-PHD filter sufficiently, simulation is performed over 100 MC trials with different contaminated rate from ε = 0 to ε = 0.05. Then the time averaged OSPA distances of the STM-PHD filter and the GM-PHD filter are shown in Figure 12.
Analogous to the linear scenario, it can be seen that the time averaged OSPA distance of the STM-PHD filter is lower than that of the GM-PHD filter at almost all contaminated rates. The gap of the time averaged OSPA distance between the STM-PHD filter and the GM-PHD filter also changes widely as the contamination rates increase. Nevertheless, the gap is not noticeable like in the linear scenario. This is due to the relatively big approximation error induced in the nonlinear scenario. The result indicates that the STM-PHD filter still outperforms the GM-PHD filter to cope with outliers for nonlinear systems.
To further evaluate the performance of the proposed filter, 100 MC trails are performed from λ c = 0 to λ c = 50. The time averaged OSPA distances versus varying clutter rate for the STM-PHD filter and the GM-PHD filter are shown in Figure 13. It can be seen that the time averaged OSPA distance of the STM-PHD filter is lower than that of the GM-PHD filter at different clutter rates, and the trend for two filters goes up with the increase of clutter rate. Generally speaking, the STM-PHD filter is superior to the GM-PHD filter but the superiority is not noticeable like in the linear scenario. This is due to the fact the approximation error induced in a nonlinear scenario is bigger than that in a linear scenario. Nevertheless, the results still indicate that the STM-PHD filter is valid to handle the outliers.  Again, the computing time for each filter under the different clutter rate is given in Table 4. It shows that the STM-PHD filter has a higher computing cost compared to the GM-PHD filter and the higher the clutter rate is, much more time is consumed for the STM-PHD filter. The reason is that computing nonlinear Student's t integrals is more complex than computing nonlinear Gaussian integrals.

Conclusions
To solve the problem that the process and measurement outliers degrade the performance of PHD filter, this paper proposes a Student's t mixture-based PHD filter, where the prior multi-target intensities are approximated as a mixture of the Student's t components to be propagated in time and the Student's t mixture approximated posterior multi-target intensity are obtained through a Student's t approximation-based recursion. The proposed approach can efficiently suppress the negative impact caused by the process and measurement outliers, while still maintaining good estimation accuracy. The advantages of the proposed filter are verified sufficiently by simulation. The simulation results also imply that the proposed approach outperforms the GM-based approach in scenarios where outliers appear due to electromagnetic interference or sensors' own unreliability. In addition, the proposed approach is also suitable for the CPHD filter. However, the proposed approach has relatively high computational cost for nonlinear systems. In further study, we will try to improve this. Moreover, how to combine the proposed Student's t mixture implementation with the multi-Bernoulli filters (including CBMeMBer filter and labeled RFS based multi-Bernoulli filter) is also our focus in the next work. Author Contributions: Zhuowei Liu performed the simulations and wrote the paper; Shuxin Chen and Hao Wu revised this paper and offered some useful suggestions in methodologies and simulations; Renke He and Lin Hao offered revision during the whole process.

Conflicts of Interest:
The authors declare no conflict of interest. Table A1. Pseudocode for the STM-PHD filter.

Appendix A
given {w , the measurement set Z k and the degree of freedom parameters υ 1 , υ 2 , υ 3 . step 1. (prediction for birth targets)    Table A2. Pseudocode for the cubature rule based STM-PHD filter.
given {w , the measurement set Z k and the degree of freedom parameters υ 1 , υ 2 , υ 3 . step 1. (prediction for birth targets) follow Step 1. of table. for j = 1, . . . , i − use the cubature rule suitable for Student's t integral (see [22]) with m