GLMB Tracker with Partial Smoothing

In this paper, we introduce a tracking algorithm based on labeled Random Finite Sets (RFS) and Rauch–Tung–Striebel (RTS) smoother via a Generalized Labeled Multi-Bernoulli (GLMB) multi-scan estimator to track multiple objects in a wide range of tracking scenarios. In the forward filtering stage, we use the GLMB filter to generate a set of labels and the association history between labels and the measurements. In the trajectory-estimating stage, we apply a track management strategy to eliminate tracks with short lifespan compared to a threshold value. Subsequently, we apply the information of trajectories captured from the forward GLMB filtering stage to carry out standard forward filtering and RTS backward smoothing on each estimated trajectory. For the experiment, we implement the tracker with standard GLMB filter, the hybrid track-before-detect (TBD) GLMB filter, and the GLMB filter with objects spawning. The results show improvements in tracking performance for all implemented trackers given negligible extra computational effort compared to standard GLMB filters.


Introduction
While single-object tracking algorithms have been studied extensively for more than half a century, multi-object tracking is currently a trending topic in signal processing society due to its extensive applications. The challenges of the multi-object tracking problem arise in the context of miss-detection, false alarms, object thinning, and appearing processes. To tackle these problems, several frameworks have been put forward in the literature such as the Joint Probabilistic Data Association (JPDA) [1], multiple hypotheses tracking [2], and recently, Random Finite Sets (RFS) [2]. In particular, RFS forms the mathematical basis of many modern multi-object filters such as Probability Hypothesis Density (PHD) filter [3][4][5][6][7], cardinalized PHD (CPHD) filter [8][9][10], multi-Bernoulli filter [11,12], the Generalized Labeled Multi-Bernoulli (GLMB) filter [13][14][15][16][17][18][19], and its approximation the Labeled Multi-Bernoulli (LMB) filter [20,21]. In many applications, tracking algorithms rely on the standard point measurements to update the object states; in contrast, TBD [22][23][24][25] is an alternative approach that bypasses the detection module to directly exploit the observed spatial data. This technique is introduced under the RFS framework in Reference [26] with the development of the so-called separable likelihood model and, recently, in a hybrid (combination of standard observation and separable observation models) approach in Reference [27]. In terms of system modelling, in many multi-object tracking scenarios, it is sufficient to consider object thinning and appearing processes via survivals, deaths, and instantaneous birth models. However, in many practical applications, new objects are also generated from a set or a subset of existing objects. In the context of RFS-based filtering techniques, such spawning models have been proposed for CPHD filter in References [28,29] and for GLMB filter in Reference [30]. Because these spawning models correctly reflect the physical state of the systems with spawning objects, the accuracy of the estimate is improved.

The Labeled RFS
Throughout this article, we adhere to the following notations. The set exponential is denoted as [h(·)] X = ∏ x∈X h(x) while the inner product notation is denoted as f , g = f (x)g(x)dx. The generalization of the Kronecker delta is denoted as follows: The set inclusion function is written as follows: X denotes the labeled set of objects, while x = (x, l) denotes a single labeled object, specifically, x ∈ X and l ∈ L, where X and L are respectively the kinematic state space and the discrete labels space at the current time step. L is a label extraction function, i.e., L(x) = l and F (X) denote sets of finite subsets of X. The "+" sign is used to indicate the next time step when applicable.
The Finite-Set Statistics (FISST) integration is defined as follows [54]: In multi-object tracking problem, the cardinality of object sets varies when objects enter or leave the surveillance region. As RFS is a random set of points in the sense that the number of points in the set is random and the points themselves are also random and unordered [54], a set of random objects can be naturally characterized as a RFS. Being introduced systematically for the first time in Reference [13], the labeled RFS incorporates the identities of elements into the unlabeled counterpart. Precisely, with the state space X and marks space L, the labeled RFS is a marked simple point process whereas each realization has a distinct label [13,15]. The distinct label property is satisfied when X has the same cardinality as its labels L(X). Given this, the distinct label indicator can be written as follows [16]: The introduction of labeled RFS to the multi-object tracking problem allows direct estimation of trajectories which cannot be done previously with conventional RFS without a separate labeling scheme.

The Multi-Object Transition Kernel
In standard tracking scenario, an existing object can either survive or die in the next time step. The surviving objects are modeled as an LMB RFS with a survival probability of p S (x, l), a disappearance probability of q S (x, l) = 1 − p S (x, l), and a spatial distribution of f S+ (x + |x, l). The model for such surviving objects is given as follows [13,15,16]: where In addition, the new birth objects can instantaneously appear at each time steps and they are modeled with LMB RFS as follows [13,15,16]: Furthermore, in certain scenarios, new objects can also be generated from existing objects, which leads to the need of a spawning model in order to correctly predict the state of the system at the next time step. Recently, a spawning model for GLMB filter has been proposed in Reference [30]; we introduce this model again here as follows for the sake of completeness.
For spawned objects, the naming convention is given as follows: if at time step k the label of an object is l, then the spawned labels from l at the next time step is l spawn = (l, k + 1, i), where i is the index to distinguish between different spawned objects from the same parent. Following this convention, the set of all spawned labels in the next time step is S + = L × {k + 1} × N, where N is the set of positive natural numbers [30].
For each spawned object with the label l spawn ∈ S + (L(x)), it will either exist with the probability p T (x; l spawn ) and a spatial distribution f T+ (x + |x; l spawn ) or not with the probability q T (x; l spawn ) = 1 − p T (x; l spawn ).
The density of the set P of new spawned objects from x is formulated as follows [30]: where Φ T+ (P|x; l spawn ) = ∑ (x + ,l + ) Let Q be a labeled set of objects spawned from X with L(Q) ⊆ S + (L(X)). As all labels sets are disjoint, the FISST convolution theorem [54] can be applied. where As new birth objects (given in Equation (3)) are independent of the previous time step objects, the overall transition model is given as follows: As the spawned objects depend upon the objects from previous time steps, the prediction step of the filtering stage needs to be done in a joint manner to capture the objects' dependency. As a result, approximation is needed to convert the joint object distribution to a standard GLMB density for each time step in order to keep the algorithm tractable.
In the scenario where the spawning process is not present, the multi-object transition kernel is reduced to the following:

The Multi-Object Observation Models
In the RFS multi-object tracking framework, given a set of measurements Z = {z 1:|Z| }, we have a standard observation model of the following form: [54] is the clutter intensity, p D (·) and q D (·) are respectively the detection and miss-detection probabilities, g(z|x, l) is the likelihood that (x, l) generates measurement z, θ : L → {0 : |Z|} is a positive 1-1 map, and Θ is the entire set of such mappings. For image observation, with the assumption that object template T(·) is not overlapped, i.e., T(x 1 ) = T(x 2 ) given x 1 = x 2 , the separable likelihood is given by the following [26]: where y denotes the observed image, f B denotes the likelihood of the entire set of X, and g y (x) denotes the likelihood of a single object in the observed image. The designs of f B and g y vary according to the applications, characteristics of observed image, and object appearances. First introduced in Reference [27], the concept of a hybrid TBD observation model takes advantage of both standard and separable likelihood models. Intuitively, while detected objects can be updated by the associated point measurements, the miss-detected objects can be updated directly from the image observation. This intuition can be described mathematically by defining the following: The hybrid likelihood can then be written as follows [27]: where ϕ (θ(l)) y

The Single Object RTS Smoother
Given a set single object observation {z 1:N }, where N ≤ K with K is the total number of tracking time steps, the smoothed density of an object state at time k ≤ N, p(x k |z 1:N ), is obtained as follows [36].
Initially, let the joint distribution of x k and x k+1 be rewritten as follows: Then, the distribution of x k given x k+1 and z 1:k is given as follows: where p(x k+1 |z 1:k ) = p(x k+1 |x k )p(x k |z 1:k )dx k From the Markov state-space model, we have the following property: p(x k |x k+1 , z 1:N ) = p(x k |x k+1 , z 1:k ). Hence, we have the following: Then, the joint distribution of x k and x k+1 given the measurements set z 1:N is given as follows: Finally, the smoothed density of state x k can then be obtained via the marginalization step as follows:

The Filtering Stage
For this tracker, we assume Gaussian distribution for the dynamic state of each object. At this first stage, the tracker carries out a standard multi-object filtering process to obtain the forward estimated labels and the measurements to label association history. In this subsection, we provide the forward filtering steps for both the GLMB filter (with standard measurements and hybrid measurements observations) and GLMB filter with object spawning.

GLMB Filter without Objects Spawning
The procedure to estimate the state of a set of objects with the standard GLMB filter without including the spawning model in the transition kernel is given as follows.
Given a GLMB prior [16] π(X) = ∆(X) and the standard observation model as in Equation (8), the filtering density in the next time step is given by the following [16]: where I ∈ F (L), ξ ∈ Ξ, I + ∈ F (L + ), θ + ∈ Θ + where ξ is the tracks to measurement association history and Ξ is the entire space of ξ.
In tracking scenarios where raw spatial detection are also available, the hybrid model in Equation (11) can be used to replace the standard observation model with the probability of miss-detection being scaled by the spatial observation likelihood, i.e., given the GLMB prior as in Equation (17). The filtering density is then given as follows [27]: where ω (I,ξ,I + ,θ + )

GLMB Filter with Objects Spawning
For the prior density which is a GLMB density as in Equation (17) and the transition kernel defined in Equation (6), by applying the joint predict-update approach, a proposal density can be written as follows [30]: From this proposal density, Gibbs' sampler is applied to select high weight hypotheses. These hypotheses are subsequently used to form a standard GLMB density [30]:

GLMB Multi-Scan Estimator
The concept of a multi-scan estimator is introduced in Reference [50]. Given a multi-scan GLMB from time step j to k, the cardinality distribution of the number of trajectories is given as follows: One possible form of a multi-scan estimator is to determine the component with the highest weight w (ξ) j:k (I j:k ) given that it has the most probable cardinality by maximizing Equation (22). The expected trajectory estimate can then be computed from p (ξ) j:k (·, l) for each l ∈ I j:k . In this work, we proposed modifications to the multi-scan estimator in Reference [50], which can eliminate track fragmentation and improve localization performance. The set of all estimated trajectories is updated at each time step via the most significant hypothesis with the most probable cardinality in the GLMB density. At the time step when state estimation is required, the information of estimated trajectories is passed into the estimator. At this stage, trajectories pruning is applied to eliminate short-term tracks. Subsequently, standard filtering and RTS smoothing techniques are applied on each trajectory to produce smoothed state estimates. The significance of this estimator is that it allows the application of smoothing techniques to improve the tracking accuracy while completely eliminates track fragmentation as the entire trajectory is estimated as a whole. In addition, as the complexity of the estimator depends only on the number of estimated tracks, the additional computational effort of the estimator is negligible compared to GLMB filtering. The detailed implementation of the estimator is given as in following subsections.

Estimating the Trajectories
Given the GLMB density at the end of each filtering cycle, the GLMB filter estimate is the result of the maximum posteriori estimate of the cardinality with the means of the object states being conditioned on the estimated cardinality [15]. Given that the possible highest number of tracked objects is N max , the cardinality distribution of the the objects set over a finite set of hypotheses {(I, ξ) h } h=1:H is written as follows: The estimated cardinality is given as follows: The estimated hypothesis is as follows: The information from the filtering stage needs to be captured to facilitate the multi-scan estimator. At this stage, we represent a set of estimated trajectories at time k with a set of tuples defined aŝ (l kN ,blkN ,ξlkN )}, wherel k n is the label of estimated trajectory n at time k,blkˆn is its corresponding initial birth state (including the time of birth and initial kinematic state), andξlkˆn is the corresponding association history. In addition, we also have set of tuples for all estimated trajectorieŝ T from time step 1 to current time step k. This set of tuples is updated at the end of each filtering time step via updating the association history and initial birth state of existing trajectories and adding new tuples to the set if the trajectories are new. The procedure to update the tuples set is given in Algorithm 1. ,ξl N )} the lifetime of a trajectory with labell n is the length of the corresponding association history, which is given as follows:

Algorithm 1 Updating trajectories tuples
where f length (·) is the function that determines the length of the vector in its argument. If the length of a track is shorter than the threshold value τ t , i.e., τ(l n ) < τ t , this trajectory will be removed from the set of estimated trajectories.

Numerical Implementation of Single-Object Smoother
For completeness, we outline here the detailed numerical implementation of the single-object RTS smoother for both linear and nonlinear dynamic models with Gaussian assumption on the distribution of the states.
Given a linear dynamic model of the form x + = Fx + q, z = Hx + r where x is the system state, F is the linear transformation matrix, H is the linear observation matrix, q and r are respectively the process and observation Gaussian noise, and z is the current time step measurement, the backward smoothing step over an interval N ≤ K (where K is the total number of tracking time steps) can be implemented with the standard RTS Smoother [32]. The details of the RTS smoother is given in Algorithm 2, where the superscript s denotes the smoothed results.

Algorithm 2 Single-object Rauch-Tung-Striebel (RTS) smoother
Input: The filtered mean and covariance {x k , P k } k=1:N , F, Q Output: The smoothed mean and covariance {x s k , P s k } k=1:N Initialization: x s N = x N and P s N = P N for k = N − 1 down to 1 For a nonlinear dynamic model, the RTS smoother can also be applied via the unscented transformation [39]. Given the dynamic model where f is the nonlinear state transition function and h is the nonlinear observation function, other variables are interpreted the same as in the linear model; the smoothed results can be inferred via the Unscented RTS (URTS) smoother [36]. The smoothing procedure is presented in Algorithm 3, and the readers are referred to Reference [39] for the detailed implementation of the unscented transform. Compared to the Sequential Monte Carlo method, unscented transform is less computationally expensive as the number of sigma points to approximate a Gaussian distribution is much lower than the number of particles to represent the entire density.

Forward Filtering-Backward Smoothing of Trajectories
In this step, by using the measurement association history, the initial birth information (the state and the time at birth) in the estimated trajectories tuples set, and the measurements set, we apply standard single-object filtering and backward RTS smoothing techniques to produce a set of smoothed distributions of the trajectories. In this work, spatial distributions of tracks are assumed to be Gaussian distributed; hence, the estimated spatial distribution of track labeled l at time k is represented by the mean m l k and the covariance P l k . The details of the procedure to produce the tracks distributions are given in

Algorithm 3 Single-object Unscented RTS (URTS) smoother
Input: The filtered mean and covariance {x k , P k } k=1:N , f (x + |x) , Q Output: The smoothed mean and covariance {x s k , P s k } k=1:N Initialization: While the advantages are mentioned previously, this estimator is also subjected to certain drawbacks in challenging tracking scenarios. First, depending on the nature of the problem, the user needs to set an appropriate pruning threshold τ l to prevent the estimator from deleting correct trajectories, especially when track identity switching is severe. Second, as the estimator relies on the latest hypothesis to produce estimates, the more this hypothesis deviates from the truth, the more inaccurate the entire estimation. In addition, in the case that wrong new tracks keep appearing in the set of trajectory estimates, overestimating of the number of tracks is also possible. However, the benefit from track fragmentation reduction and improvement of tracking accuracy given negligible computational effort is much more than the risk of incorrectly estimating the number of tracks, and the following simulation results are a strong demonstration of the benefits of our proposed tracker.

Linear Dynamic Model
In this experiment, we use a constant velocity model for the dynamic of the system. The state vector consists of information regarding the planar position and the velocity of the objects, which is x k = p x , p y ,ṗ x ,ṗ y T ; while the measurement vector contains the position of the object, which is The transition and observation models are given respectively as follows: Particularly, in this experiment, we set σ v = 5 m/s and σ = 15 m. The surveillance region is the [−1000, 1000] m × [−1000, 1000] m area, the total time step is K = 100, and ∆ = 1. The ground truth plot for this experiment is given in Figure 1. The surviving probability is set to p S = 0.99, and the detection probability is p D = 0.95. Clutter rate is set to 66 false alarms per scan. The birth probability is set to r B = 0.03. The states of expected births are m ). The number of hypotheses for GLMB filter is capped at 20, 000 components. In this experiment, we smooth the entire tracking interval from k = 1 to k = K. The threshold for the smoother to prune the track is set to τ t = 3 time steps.
We conduct the experiment over 100 Monte Carlo runs. The means of the estimated Optimal Subpattern Assignment (OSPA) error [55] and OSPA 2 error [52,56] are given respectively in Figures 2  and 3. Figure 4 shows the GLMB filter and proposed tracker-estimated cardinality of objects set for each time step along with the true values.

Nonlinear Dynamic Model
For the demonstration of the nonlinear tracking scenario, we use a constant turn model with 5-D state vector x k = p x , p y ,ṗ x ,ṗ y , ω T , where ω is the object's turn rate. The transition density is given as follows: In this experiment, we set σ ω = π/180 rad/s and σ v = 5 m/s. The observation model is given as the bearing and range detection of the 2D vector z k = [θ, r] T with σ θ = π/90 rad and σ r = 5 m.
The surveillance region is the half disc of the radius 2000 m with K = 100 time steps and ∆ = 1. The ground truth for this experiment is given in Figure 5. The surviving probability is set to p S = 0.99 and the detection probability is p D = 0.95. Clutter rate is set to 66 false alarms per scan. The expected birth states are m = [1000, 0, 1500, 0, π/180] T with r B = 0.02, and the birth covariance is P B = diag( [50, 50, 50, 50, π/30]). The number of hypotheses is also capped at 20, 000 components. The smoothing interval is the entire tracking sequence from k = 1 to k = K. We also set the track pruning threshold for the smoother to 3 time steps in this experiment.
For this scenario, we also test the performance of the tracker over 100 Monte Carlo runs. The means of OSPA error and OSPA 2 error of the estimates are plotted in Figures 6 and 7, respectively, while the set cardinality is shown in Figure 8.

Hybrid TBD Observation Model
In this simulation, we use the hybrid measurement model to track objects following a linear dynamic motion model. The surveillance region is 100 × 100 pixels with image cell size of 1, total time step of K = 100, and ∆ = 1. The observation are the raw images, which are arrays of pixels. In particular, for a pixel i at the image coordinate (a (i) , b (i) ), the array value is given as follows [26,27]: where w (i) N (0, σ y ) is Gaussian noise. In this experiment, we set σ h = 4 and σ y = 1. We choose the value of I k such that the signal to noise ratio (SNR) varies over the range 7 to 10 dB. For the observation model from the perspective of the filter, we fix its SNR value to 10 dB. From the raw images, we then use hard-shareholding to extract the points measurements at each frame.  The ground truth location of objects is shown in Figure 9 while Figure 10 shows samples of raw image observation along with points detection. The implementation of the filtering phase is as the same as in Reference [27]. The smoothing interval is set to the entire tracking time with the track pruning threshold of the smoother set to 3 time steps. This experiment is run over 100 Monte Carlo trials. The means of OSPA error and OSPA 2 error are shown respectively in Figures 11 and 12. The estimated cardinality is plotted in Figure 13.

Discussion on the Simulation Results
For all simulated experiments, we observe lower OSPA and OSPA 2 errors for the proposed tracker compared to the GLMB filter results. In the first two experiments with the standard observation model, as the clutter rate is high, the filtered-only trajectories jiggle around the true paths due to false measurements. In Figures 2 and 6 as well as in Figures 3 and 7, the overall errors of the GLMB filter estimates are higher than of the proposed tracker estimates. The reduction of localization error contribute mainly to the improvement of the tracking performance. From the cardinality plots in Figures 4 and 8, on average, the proposed tracker slightly improves estimate cardinality performance as it is able to eliminate track fragmentation while eliminating incorrect tracks at some time steps.
In the hybrid TBD tracking experiment, as tracks are miss-detected due to low SNR, the proposed tracker improves tracking performance by eliminating track fragmentation. Not much localization error is reduced by the smoother step as the GLMB filter produces relatively good tracking results. The OSPA and OSPA 2 results presented in Figures 11 and 12 show slight improvement of the proposed tracker results compared to GLMB filter tracking results. However, the cardinality plot in Figure 13 clearly indicates that the proposed tracker is able to improve the estimated cardinality between time step 30 and 40.
The run time for all simulated scenario is given in Figure 14 in terms of the percentage of extra computational time of the proposed tracker over the computational time of the filtering step only. It is shown that the extra computational time is negligible in all three tracking scenarios with the extra computational time of the proposed tracker less than 0.5% of the filtering computational time. However, the main disadvantage is that the tracker needs to wait until the end of the smoothing interval to be able to produce tracking results.

Application to Cell Microscopy
In this experiment, we attempt to track biological cells from a sequence of images containing 90 frames by using the proposed tracker. A snapshot of the sequence is shown in Figure 15. In this application, we use the constant turn rate for the dynamic model as in Section 4.1.2 and the standard observation model as in Section 4.1.1. We also implement the measurement driven model as described in Reference [20]. For the first time step, the birth rate is set to a very high value (≈1) to initialize objects. Subsequently, the birth rate is capped at 10 −7 . The standard deviation of the turn rate noise is π/90 rad/s, and the standard deviation of the velocity noise is 5 pixels/frame. The number of hypotheses is capped at 10,000. The detection rate is set to 0.88, and the surviving rate and the spawning rate are 0.999 and 0.035, respectively. The clutter rate is set to 0.05. The cell spawning model is the same as described in Reference [57] with the covariance of the spawning model given   From the tracking results, significant improvement is observed as the proposed tracker is able to eliminate incorrect spawned tracks. While the OSPA error in Figure 16 shows similar performance for the GLMB filter and the proposed tracker, the improvement is clearly reflected in the OSPA 2 cardinality error plots in Figure 17. From the cardinality plot in Figure 18, the estimated cardinality from our tracker is much closer to the true values as fewer incorrect spawned tracks are estimated. In this experiment, there is not much difference between the GLMB filter and the proposed tracker estimates localization error due to the mismatch between the dynamic model and actual motion of the cells. Finally, in Figure 19, we illustrate the improved tracking results in terms of tracking sequence for several time steps at a selected region where the cell splitting process occurs.

Conclusions
In this paper, we detailed the implementation of a new tracker based on GLMB filter and a modified multi-scan estimator. In addition to lowering the localization error by performing RTS smoother on each individual estimated trajectory, the proposed tracker can also reduce cardinality errors by deleting the short-term tracks via track management and by completely eliminating track fragmentation. The computation time is shown to contribute to less than 0.5% of the total tracking time, although a fixed delay time is needed before the tracker can produce the estimate. Therefore, in applications when real-time updates are not required, the proposed tracker can be used to improve the tracking results given negligible extra computation time. However, as the smoothing results strongly depend on the quality of the estimates obtained from the forward filtering step, if the filtered estimate experiences strong distortion, the performance of the proposed tracker degrades significantly.