1. Introduction
Bird flock activity has attracted increasing attention in low-altitude airspace due to its significant impact on ecological systems and aviation safety. Tracking bird flocks provides important data support for biological flight mechanism research [1], ecological monitoring [2], and bird strike risk assessment [3]. In the radar field, since birds are usually spatially concentrated and hard to distinguish, many approaches [4,5] focus on tracking the bird flock as a group target, including its centroid and extension state. However, such group target tracking is insufficient to reveal the interaction mechanism and the behavioral diversity of individual targets. Therefore, to enable individual behavioral analysis, tracking individual targets within a group is essential.
However, targets in a group are usually closely spaced and exhibit similar kinematic characteristics, making it hard to associate adjacent targets using only positional information. To achieve high-precision tracking of individual targets within a group, tracking radars must provide high range and angular resolution. Narrow-beam monopulse radars offer advantages in terms of high resolution and high data rate in target tracking. However, due to the limitation of beamwidth, only part of the group can be observed, and the number of observed targets varies with the radar-target geometry [6]. These incomplete measurements lead to high fragmentation of tracks. Therefore, it is challenging to maintain stable tracking for closely spaced group targets with incomplete measurements.
Traditional tracking methods [7,8] rely on data association performance, and thus face problems of poor tracking accuracy and high computational complexity in dense target scenarios. Avoiding the measurement-to-target association problem, random finite set (RFS)-based filters [9,10,11] estimate multi-target states under association uncertainty. To further form continuous target tracks, the target label has been introduced in these filters, such as the tagged PHD [12], labeled multi-Bernoulli (LMB) [13], and generalized LMB (GLMB) [14]. Nonetheless, the tags or labels are not used for track formation, leading to track switches, missed detections, and false targets. By modeling the set of trajectories as the state variable, the trajectory PHD (TPHD) method [15,16] can maintain and smooth individual trajectories during target tracking, thereby improving tracking accuracy. However, it is still difficult to distinguish multiple targets with similar positions and velocities.
Instead of using only kinematic information, several methods utilize features to improve multi-target tracking performance. Handcrafted features, including polarization scattering characteristics [17], time–frequency information [18], and radar cross section (RCS) and Doppler information [19,20,21], have been integrated into filters to enhance tracking performance in cluttered environments. These methods primarily utilize the statistical and kinematic characteristics to construct likelihood functions in the filter, which can better distinguish between clutter and different types of targets. Several methods extract spatial and temporal correlations and image feature similarity through deep learning networks. The AMIR network [22] fuses the interaction, motion, and appearance features by long short-term memory (LSTM) networks. The MPNTrack method [23] models the appearance and geometric feature interactions between targets. In [24], a convolutional Siamese network is proposed to extract radar echo features, and multiple features are fused by extreme gradient boosting (XGBoost) for data association of maneuvering targets in dense false alarms. Similarly, the Siamese network is also used in [25] to match the distribution pattern of echoes. In [26], an interactive Transformer–graph attention network is designed to learn multi-frame spatio–temporal relationships and maneuvering characteristics. However, these features are not effective in distinguishing targets with similar echo characteristics, and the spatial correlations between targets are hard to capture due to the incomplete measurements. Consequently, the existing feature-based methods struggle to correctly track individuals within a group.
Birds exhibit micro-Doppler signatures due to their wingbeat behavior [27,28]. The Doppler spread varies among individuals, providing additional information for distinguishing targets within a group. These signatures contain rich information related to the target size and wingbeat dynamics, which have been widely studied for biophysical parameter estimation and species classification [29,30]. However, micro-Doppler signatures vary significantly over time due to wingbeat behavior, exhibiting complex temporal dynamics that make them difficult to directly use for track–measurement association.
To address these problems, this paper proposes an adaptive TPHD tracking method using Doppler features. The core contribution lies in the proposed Doppler feature representation learning and its effective integration into the TPHD filtering framework, which can improve tracking for individuals within a dense bird flock. Instead of establishing an explicit micro-Doppler model, the proposed method adopts a data-driven manner for Doppler feature prediction and contrast. Then, the learned Doppler feature is fused with kinematic parameters to guide the weight update in the TPHD filter. Moreover, adaptive mechanisms are incorporated into the TPHD filter to enhance tracking stability under incomplete measurements. The main contributions of this paper are given as follows:
- (1) A Doppler temporal contrastive network is designed to extract discriminative representations from the time-varying radar echo of birds, which provides complementary information beyond kinematic parameters and improves the association of individuals within the bird flock.
- (2) An XGBoost-based feature fusion strategy is proposed to incorporate the Doppler representation and kinematic parameters into the TPHD filter, thereby improving the tracking performance of closely spaced group targets.
- (3) Adaptive detection probability and adaptive target birth mechanisms are applied in the TPHD filter to improve tracking stability under incomplete measurements and suppress false track initiation in a cluttered environment.
The rest of this paper is organized as follows. Section 2 introduces the TPHD filter and its limitations under challenging scenarios. Section 3 presents the proposed TPHD tracking method using Doppler features. Section 4 designs the network parameters and verifies the effectiveness in improving association performance. Section 5 verifies the effectiveness of the tracking method by using simulation and experimental data. Finally, concluding remarks are given in Section 6.
2. Problem Formulation
The TPHD filter uses a set of trajectories as the state variable, which has the advantage of extracting and smoothing trajectory estimates during tracking. In this section, a short review of the Gaussian mixture TPHD filter is presented, and the limitations under dense targets and incomplete measurements are introduced.
The TPHD filter estimates the trajectories of the alive targets by propagating a Poisson multi-trajectory density through prediction and update steps [
15]. Given a single target state
, including the position and velocity information of the target, its trajectory is
, where
is the initial time step of the trajectory, and
denotes a sequence of length
. A single-trajectory Gaussian density at time
is
with the mean
, covariance matrix
, start time
, and duration
. The PHD of the birth density is
where
is the number of components,
is the weight,
is the mean, and
is the covariance matrix of the
j-th component.
The closed form of the TPHD filter consists of the following prediction and update steps.
- (1)
Prediction step
Assume that the posterior intensity at time
k−1 is a Gaussian mixture of the form
Then, the prior intensity at time
k is given in Equation (4), where
is the intensity of survival trajectories, as in Equation (5).
The mean and covariance matrix of
are calculated as in Equations (6) and (7):
where
is the single-target transition matrix,
is the covariance matrix of the single-target process noise,
represents the Kronecker product, and
is the
zero matrix.
- (2)
Update step
Assume that the predicted intensity at time
k is a Gaussian mixture of the form
Then, the posterior intensity at time
k is given by Equation (10), where
is the detection probability, and
is the intensity, updated by the measurements, as in Equation (11).
The mean and covariance matrix of
are updated as in Equations (12) and (13):
where
is the single-measurement matrix, and
is the covariance matrix of the single-measurement noise. The weights are updated as follows:
where
is the intensity of the clutter RFS.
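The prediction and update steps above can be sketched for a single Gaussian component as follows. This is a minimal sketch, assuming the standard GM-TPHD recursion in which the transition and measurement models act only on the last state of the trajectory. The function names, the truncation to the last L states, and the scalar clutter intensity are illustrative assumptions, not the exact form of Equations (4)–(16).

```python
import numpy as np

def last_state_selector(n_states, block):
    # Kronecker product [0, ..., 0, 1] (x) block: the block acts on the last state only.
    sel = np.zeros((1, n_states))
    sel[0, -1] = 1.0
    return np.kron(sel, block)

def predict_component(mean, cov, F, Q, L):
    # Append the predicted state to the trajectory and keep the last L states (assumed L-scan rule).
    nx = F.shape[0]
    n = mean.size // nx
    F_bar = last_state_selector(n, F)
    cross = cov @ F_bar.T
    mean_new = np.concatenate([mean, F_bar @ mean])
    cov_new = np.block([[cov, cross], [cross.T, F_bar @ cross + Q]])
    keep = min(L, n + 1) * nx
    return mean_new[-keep:], cov_new[-keep:, -keep:]

def update_component(mean, cov, z, H, R):
    # Kalman update of the whole trajectory window with a measurement of the last state.
    nx = H.shape[1]
    n = mean.size // nx
    H_bar = last_state_selector(n, H)
    innov = z - H_bar @ mean
    S = H_bar @ cov @ H_bar.T + R
    K = cov @ H_bar.T @ np.linalg.inv(S)
    mean_upd = mean + K @ innov
    cov_upd = cov - K @ H_bar @ cov
    lik = np.exp(-0.5 * innov @ np.linalg.solve(S, innov)) / np.sqrt(np.linalg.det(2 * np.pi * S))
    return mean_upd, cov_upd, float(lik)

def update_weights(w_pred, liks, p_d, clutter_intensity):
    # PHD-style weight update for one measurement over all predicted components.
    num = p_d * np.asarray(w_pred) * np.asarray(liks)
    return num / (clutter_intensity + num.sum())
```

Because the gain is computed from the full window covariance, a measurement of the current state also corrects the earlier states in the window, which is the smoothing effect attributed to the TPHD filter.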
To limit the number of components, the GM-TPHD filter uses pruning and absorption procedures. Instead of the merging technique of the GM-PHD filter, it uses absorption, which removes components that are close to a retained component and adds their weights to that retained component.
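The pruning and absorption procedure can be illustrated with the short sketch below. It assumes that closeness is measured by the squared Mahalanobis distance between the current-state means, using the covariance of the retained component; this distance measure, the function name, and the default thresholds are illustrative assumptions.

```python
import numpy as np

def prune_and_absorb(weights, last_means, last_covs,
                     prune_thr=1e-3, absorb_thr=1.5, max_comp=30):
    # Returns indices of retained components and their weights after absorption.
    weights = np.asarray(weights, dtype=float)
    remaining = [i for i in range(len(weights)) if weights[i] > prune_thr]  # pruning
    kept, kept_weights = [], []
    while remaining:
        j = max(remaining, key=lambda i: weights[i])  # strongest remaining component
        P_inv = np.linalg.inv(last_covs[j])
        close = []
        for i in remaining:
            d = last_means[i] - last_means[j]
            if d @ P_inv @ d <= absorb_thr:  # assumed closeness test
                close.append(i)
        kept.append(j)
        kept_weights.append(weights[close].sum())  # absorbed weights go to component j
        remaining = [i for i in remaining if i not in close]
    order = np.argsort(kept_weights)[::-1][:max_comp]  # cap the number of components
    return [kept[i] for i in order], [float(kept_weights[i]) for i in order]
```

Unlike merging, the retained component keeps its own mean and covariance, so its trajectory history is not blended with that of the absorbed components.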
Finally, the number of trajectories is estimated by rounding the sum of the component weights. The estimated set of trajectories consists of the components with the largest weights.
Considering computation feasibility, the L-scan implementation of the GM-TPHD filter is applied. The correlations of states before L time steps are discarded, and only the states of the last L time steps are used to calculate the PHD. So, the mean and covariance matrix of trajectory Gaussian densities are and , respectively, in practical implementation.
From the above process, it can be seen that GM-TPHD updates historical trajectories at each moment, thereby achieving track smoothing and improving the track accuracy. However, targets within a group are often closely spaced and exhibit similar kinematic characteristics. Under such conditions, the conventional likelihood function becomes insufficiently discriminative. Measurements originating from adjacent targets may yield similar likelihood values, leading to ambiguous weight updates in Equation (16). Therefore, features can be incorporated into the weight updates to achieve more accurate tracking.
Meanwhile, narrow-beam monopulse radars provide accurate range and angle measurements at high data rates, but only a subset of the group is illuminated at each scan due to the limited beamwidth. Such incomplete measurements lead to track fragmentation and fluctuations in the estimated target cardinality. In the standard TPHD filter, the detection probability is typically modeled as a constant, which does not hold for every target: under a limited beamwidth, a target outside the radar beam generates no measurement, and only a target within the beam is detected with a constant probability.
In addition, the number and spatial distribution of newly appearing targets within the beam are generally unknown. Treating all unexpected measurements as birth components may lead to incorrect track initiation, since some of them may originate from clutter such as insects, drones, and other aerial targets. Track breakage and incorrect initiation can further interfere with the stable tracking of adjacent targets within the beam.
3. Adaptive TPHD Tracking Using Doppler Features
To achieve stable tracking for group targets under incomplete measurements, this section proposes an adaptive TPHD tracking method using the Doppler feature. Firstly, a neural network is designed to extract the Doppler feature for the track–measurement association. Then, multiple features are fused based on the XGBoost to acquire the association probability. Finally, adaptive mechanisms and the learned association probabilities are incorporated into the TPHD filter.
3.1. Doppler Temporal Contrastive Network
Due to the wingbeat behavior of birds, targets within a group exhibit distinct micro-Doppler characteristics. When birds maintain a stable wingbeat frequency during flight, the micro-Doppler characteristics exhibit periodic variations. Therefore, to exploit Doppler characteristics for robust association in tracking a bird flock, we propose the Doppler temporal contrastive network (DTCN).
Instead of explicitly predicting the Doppler echo, the DTCN directly models the temporal evolution of Doppler features in the latent feature space. Firstly, the radar echoes from both historical and current coherent processing intervals (CPIs) are embedded into high-dimensional feature vectors using the Doppler echo encoder. Subsequently, by introducing a masked conditional temporal prediction mechanism, the network learns to infer the current micro-Doppler representation from historical feature sequences with missed detections. Finally, a contrastive learning objective is incorporated to enforce feature consistency between the predicted representation and the embedding of the current detection, enabling reliable track–measurement association under dense scenarios. Meanwhile, the Doppler feature embeddings are also used to extract discriminative representations for clutter suppression and target birth probability estimation. The overall framework of the DTCN is illustrated in
Figure 1.
3.1.1. Input Data Preprocessing
After coherent accumulation within each CPI, the Doppler spectrum of each range cell containing a detection is extracted from the range–Doppler (RD) plane; these spectra are referred to as Doppler echo slices. Each slice is shifted and cropped so that it is centered on the average frequency of the main Doppler component. The logarithmic power spectrum of each slice is then computed and normalized to the range [0, 1] over all historical CPIs of multiple targets. For a CPI with a missed detection, the normalized slice is filled with a zero vector. Since the Doppler echo of the trajectory at the current time K is unknown, it is also treated as a missed detection and set to a zero vector. This yields a sequence of normalized Doppler echo slices for the trajectory over the period K−L+1:K, together with the normalized Doppler echo slice of the candidate detection at time K. The DTCN infers the association between the two.
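The preprocessing described above can be sketched as follows. This is a minimal illustration under stated assumptions: the strongest Doppler bin stands in for the average frequency of the main Doppler component, the normalization uses the minimum and maximum over the given sequence only, and `center_and_crop` and `build_sequence` are hypothetical helper names.

```python
import numpy as np

def center_and_crop(spectrum, width):
    # Shift the power spectrum so that its peak bin is centered, then crop `width` bins.
    power = np.abs(spectrum) ** 2
    peak = int(np.argmax(power))
    shifted = np.roll(power, power.size // 2 - peak)
    start = power.size // 2 - width // 2
    return shifted[start:start + width]

def build_sequence(slices, width, eps=1e-12):
    # `slices` holds one complex Doppler slice per historical CPI, or None for a missed detection.
    # Assumes at least one CPI in the history contains a detection.
    logs = [None if s is None else 10.0 * np.log10(center_and_crop(s, width) + eps)
            for s in slices]
    valid = [s for s in logs if s is not None]
    lo = min(s.min() for s in valid)
    hi = max(s.max() for s in valid)
    scale = max(hi - lo, eps)
    seq = np.zeros((len(slices) + 1, width))  # the extra last row is the unknown current CPI
    mask = np.zeros(len(slices) + 1, dtype=bool)
    for t, s in enumerate(logs):
        if s is not None:
            seq[t] = (s - lo) / scale  # normalize to [0, 1]
            mask[t] = True
    return seq, mask
```

The returned mask marks the CPIs that contain a detection, which a later stage can use to ignore the zero-filled rows.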
3.1.2. Doppler Feature Prediction and Contrastive Learning
The network first employs a shared-weight Doppler feature encoder to map the Doppler echo slices of historical trajectories and the candidate detections in the current CPI into a unified high-dimensional feature space, yielding embedded features. Subsequently, a temporal prediction module is introduced to model the temporal evolution of Doppler features and predict the current micro-Doppler representation. Then, both the predicted feature and the embedded feature pass through a shared-weight multi-layer perceptron (MLP) to obtain reduced-dimension feature vectors. Finally, the contrastive loss is applied to enforce feature consistency constraints for the association pairs.
- (1)
Doppler Feature Encoder
Firstly, a 1D convolutional neural network (1D-CNN) is used to extract spatial structural information from the Doppler echo, followed by an average pooling layer for feature dimension reduction:
where
conv represents the feature extraction network composed of two layers of 1D-CNN and ReLU activation layers, and
Pool denotes average pooling to reduce the original input dimension to 1.
While the 1D-CNN effectively extracts local structure features of the Doppler distribution, its translation invariance may weaken the perception of the position of the main Doppler component. To ensure the temporal model can learn the evolution of the main Doppler component over time, a fully connected layer
is applied. Then the features extracted by the fully connected layer and 1D-CNN are concatenated, enabling joint modeling of Doppler local structure and global evolution.
After the above process, encoded feature vectors are obtained for the trajectory and for the candidate detection. The features of missed detections in the trajectory sequence remain masked as zero vectors.
- (2)
Temporal Prediction
The feature sequence
is processed by a bidirectional LSTM (Bi-LSTM) network [
31] to learn the implicit periodic patterns within the Doppler feature sequence, enabling the prediction of latent features for masked time steps. The output is a feature sequence denoted as
.
We employ temporal self-attention mechanisms to capture correlations among different time steps, adaptively re-weighting historical temporal features to enhance the robustness of feature prediction. Considering the presence of missed detections in the historical sequence, mask awareness is introduced to focus primarily on non-missed time steps.
For each time step
k, we compute query, key, and value vectors:
The attention score of time step
on time step
is as follows, where the attention score from any time step to the misdetected time step is set to 0.
where
represents the feature dimension of the keys. We utilize multi-head attention. The output for the time step
is:
where
H is the number of attention heads,
denotes concatenation,
is a fully connected layer that fuses the multi-head attention features, and
is the attention coefficients for each head. The predicted feature for each time step is now denoted as
.
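The mask-aware multi-head attention described in this step can be sketched with NumPy as follows. This is a minimal sketch, assuming standard scaled dot-product attention in which the logits of misdetected key time steps are set to negative infinity so that their attention weights become exactly zero. The projection matrices `Wq`, `Wk`, `Wv`, and `Wo` are hypothetical stand-ins for the learned layers.

```python
import numpy as np

def masked_multihead_attention(X, valid, Wq, Wk, Wv, Wo, num_heads):
    # X: (T, d) feature sequence; valid: (T,) boolean, False for misdetected time steps.
    # Assumes at least one valid time step and d divisible by num_heads.
    T, d = X.shape
    dk = d // num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads, attn = [], []
    for h in range(num_heads):
        sl = slice(h * dk, (h + 1) * dk)
        logits = Q[:, sl] @ K[:, sl].T / np.sqrt(dk)
        logits = np.where(valid[None, :], logits, -np.inf)  # mask misdetected keys
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        A = np.exp(logits)
        A = A / A.sum(axis=1, keepdims=True)
        heads.append(A @ V[:, sl])
        attn.append(A)
    # Concatenate the heads and fuse them with the output projection.
    return np.concatenate(heads, axis=1) @ Wo, attn
```

Every time step, including a masked one, still receives an output, so the feature at the current time step is predicted purely from the valid historical steps.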
- (3)
Feature Contrastive Learning
After temporal feature prediction, the predicted feature for the current moment
and the encoder feature of the candidate detection
are fed into a weight-sharing MLP
to obtain reduced-dimension feature representations:
Features from the track and measurement of the same target should be closer in the feature space, while features from different targets should be farther apart. Therefore, the feature contrastive module adopts the following contrastive loss function:
Here, is a binary label. If the track and measurement originate from the same target, , otherwise, . represents a margin distance, i.e., the minimum distance in feature space between two track segments belonging to different targets. This paper sets . denotes the Euclidean distance.
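A margin-based contrastive loss of this kind can be sketched as follows. The sketch assumes the common form in which matched pairs are penalized by their squared distance and mismatched pairs by the squared shortfall from the margin, with label 1 for a matched pair. The exact scaling used in the paper may differ, and the default margin of 1.0 is a placeholder rather than the value set in the paper.

```python
import numpy as np

def contrastive_loss(u, v, y, margin=1.0):
    # u, v: (N, D) reduced-dimension features of tracks and measurements.
    # y: (N,) labels, assumed 1 for a same-target pair and 0 otherwise.
    u, v, y = np.asarray(u, float), np.asarray(v, float), np.asarray(y, float)
    d = np.linalg.norm(u - v, axis=-1)  # Euclidean distance
    pull = y * d ** 2  # pull matched pairs together
    push = (1.0 - y) * np.maximum(0.0, margin - d) ** 2  # push mismatched pairs apart
    return float(np.mean(pull + push))
```

Mismatched pairs that are already farther apart than the margin contribute nothing, so training effort concentrates on the closely spaced, confusable pairs.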
3.1.3. Target Birth Probability Estimation
Since newly appearing targets within the radar beam are unknown, potential newborn components are established based on the measurements at each time step. It is necessary to distinguish between birds and other aerial targets. Therefore, a birth probability estimation network module is applied based on the encoded features in
Section 3.1.2:
where
is an MLP, including a fully connected layer, a ReLU layer, and a fully connected layer in sequence. It is further used to calculate the probability of a detection being a new target, which will be discussed in
Section 3.3.2.
Through this module, it is possible to rapidly suppress clutter by using only a single CPI of echoes. This network is trained using the binary cross-entropy loss function, as in Equation (30), with a label of
. For bird targets,
, while for other kinds of aerial targets,
.
3.1.4. Multi-Task Loss Function
The birth probability estimation network is trained simultaneously with the contrastive learning network, which is conducive to learning the discriminative features between the target and the clutter. Therefore, a multi-task loss function is employed during network training:
3.2. XGBoost-Based Feature Fusion
In dense target tracking scenarios, the effectiveness of a single association cue can vary significantly due to factors such as target range, measurement noise, and missed detections, and relying solely on spatial distance becomes unreliable. While Doppler similarity can reflect the difference in target attitude, association based solely on this feature is also limited. For example, if the historical track is shorter than the wingbeat period, effective predictive features cannot be learned. Furthermore, individuals may exhibit similar attitudes at certain times, rendering the Doppler features incapable of distinguishing between them.
Therefore, we formulate data association as a probabilistic inference problem at the feature level, and employ an XGBoost model to fuse cues with differing reliabilities. To fully utilize the Doppler feature and kinematic parameters for data association, the proposed model combines geometric consistency, competitive measurement context, Doppler similarity learned via the DTCN, and tracking reliability characterized by the historical missed detection rate. The feature vector is constructed from the following four components:
- (1)
The position difference between the target’s predicted position and the candidate measurement
- (2)
The position difference between the target’s predicted position and the nearest measurement
- (3)
The Doppler feature difference:
- (4)
The ratio of valid echoes in historical time steps:
where represents the misdetected time steps of the historical trajectory during the time period .
Here, the first two components provide competitive information for the candidate measurement, and the ratio of valid echoes reflects the credibility of the Doppler feature difference.
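The four-component feature vector can be assembled as in the sketch below. It assumes that each difference is summarized by a Euclidean norm and that the nearest measurement is searched over all measurements of the current scan, including the candidate itself; both choices, and the function name, are illustrative.

```python
import numpy as np

def association_features(pred_pos, measurements, cand_idx, track_feat, cand_feat, valid_mask):
    # pred_pos: predicted target position; measurements: (M, dim) positions in the current scan.
    measurements = np.asarray(measurements, dtype=float)
    dists = np.linalg.norm(measurements - np.asarray(pred_pos, dtype=float), axis=1)
    d_candidate = dists[cand_idx]  # (1) distance to the candidate measurement
    d_nearest = dists.min()  # (2) distance to the nearest measurement
    d_doppler = np.linalg.norm(np.asarray(track_feat, float) - np.asarray(cand_feat, float))  # (3)
    valid_ratio = float(np.mean(valid_mask))  # (4) share of historical steps with a valid echo
    return np.array([d_candidate, d_nearest, d_doppler, valid_ratio])
```

When the first two entries are equal, the candidate is itself the nearest measurement; a large gap between them signals that a competing measurement fits the prediction better.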
XGBoost is an ensemble learning method based on gradient boosted decision trees, which constructs a strong classifier by iteratively fitting decision trees to minimize a differentiable loss function. The model output is an additive ensemble of regression trees:
where
denotes the
th tree structure,
T is the total number of trees, and
denotes the space of the regression trees. Finally, the association probability is acquired through a sigmoid function:
This model is trained using the binary cross-entropy loss function. Compared with end-to-end neural association models, the tree-based fusion mechanism is more suitable for low-dimensional, semantically meaningful features and provides robustness to missing or unreliable inputs. It enables adaptive weighting of different cues under diverse tracking conditions, improving association accuracy in dense and cluttered environments.
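The additive tree ensemble followed by a sigmoid can be illustrated without any library, as sketched below. The depth-1 stumps and their thresholds are hand-written, hypothetical stand-ins for the regression trees that XGBoost would learn from data; only the structure (sum of tree outputs, then sigmoid) is taken from the description above.

```python
import math

def stump(feature, threshold, left, right):
    # A depth-1 regression tree: returns `left` if x[feature] < threshold, else `right`.
    return lambda x: left if x[feature] < threshold else right

def association_probability(x, trees):
    # Additive ensemble: sum the outputs of all trees, then map the score to (0, 1).
    score = sum(tree(x) for tree in trees)
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical trees: small position and Doppler differences vote for association.
trees = [stump(0, 1.0, 1.5, -1.5), stump(2, 0.5, 1.0, -1.0)]
p_match = association_probability([0.3, 0.2, 0.1, 0.9], trees)
p_mismatch = association_probability([2.0, 0.2, 0.8, 0.9], trees)
```

Each tree partitions the low-dimensional feature space with simple thresholds, which is one reason tree ensembles cope well with cues whose reliability changes from case to case.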
3.3. Adaptive TPHD Under Incomplete Measurements
To achieve stable tracking for group targets under incomplete measurements, the adaptive detection probability and target birth mechanisms are developed. The proposed feature extraction and fusion networks are also incorporated into the TPHD filter, ultimately reflected in the weight update process.
3.3.1. Adaptive Detection Probability
Considering the incomplete measurements under a limited beamwidth, the detection probability of targets within the radar beam is modeled as:
where
is a constant detection probability for the target within the radar beam.
is the probability of the target
j being within the radar beam at time
k. Omitting the indices
j and
k, given a target estimate with azimuth angle
and elevation angle
in polar coordinates,
and
represent the radar beam coverage in azimuth and elevation, respectively. Assuming that
and
follow independent Gaussian distributions with means
and
, and the standard deviations
and
, respectively, we have
With this detection probability model, a long-miss track pruning operation is also applied. When a target leaves the radar beam coverage, its detection probability drops to 0. Its weight is then no longer reduced by missed detections, so the component would be retained indefinitely as a surviving component and predicted at each time step. Since the track prediction accuracy and association reliability degrade over time, retaining these predicted tracks incurs unnecessary computational cost. Therefore, redundant tracks with a long run of consecutive missed detections are deleted.
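The adaptive detection probability of this subsection can be sketched as follows. The sketch assumes that the beam covers a symmetric interval of one beamwidth around the beam center in both azimuth and elevation, and that the overall detection probability is the in-beam constant multiplied by the two independent Gaussian coverage probabilities. The function and parameter names are illustrative.

```python
import math

def normal_cdf(x):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_in_interval(mu, sigma, lo, hi):
    # Probability that a Gaussian angle with mean mu and std sigma lies in [lo, hi].
    return normal_cdf((hi - mu) / sigma) - normal_cdf((lo - mu) / sigma)

def adaptive_detection_probability(az, el, sigma_az, sigma_el,
                                   beam_az, beam_el, beamwidth, p_d_in_beam):
    # All angles in degrees; (az, el) is the target estimate, (beam_az, beam_el) the beam center.
    half = beamwidth / 2.0
    p_az = prob_in_interval(az, sigma_az, beam_az - half, beam_az + half)
    p_el = prob_in_interval(el, sigma_el, beam_el - half, beam_el + half)
    return p_d_in_beam * p_az * p_el
```

A target estimated exactly on the beam edge in azimuth receives roughly half of the in-beam detection probability, so a missed detection there reduces its weight less than it would under a constant model.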
3.3.2. Adaptive Target Birth
Since newly appearing targets within the radar beam are unknown, we establish the birth components based on all candidate measurements. In the absorption procedure, if the birth component is close to the distribution of the current target state of another component with a higher weight, it will be removed to avoid affecting the cardinality estimation. If no other components are available for absorption, the birth component will participate in subsequent track updates as a new target.
To distinguish birds and aerial clutter, the network module in
Section 3.1.3 is applied to estimate the probability of a birth target
j at time
k, denoted as
. Then, the modified weight assigned for each birth component is
, where
is the birth weight as in Equation (2). The modified birth weight will be a small value for the clutter, avoiding false track initialization.
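The birth weight modification can be sketched as follows, assuming the birth module is the two-layer MLP of Section 3.1.3 followed by a sigmoid that maps its output to a probability. The sigmoid, the weight shapes, and the function names are assumptions for illustration; the parameters would come from training.

```python
import numpy as np

def birth_probability(feature, W1, b1, W2, b2):
    # Fully connected layer, ReLU, fully connected layer, then a sigmoid (assumed output mapping).
    hidden = np.maximum(0.0, W1 @ feature + b1)
    logit = float(W2 @ hidden + b2)
    return 1.0 / (1.0 + np.exp(-logit))

def modified_birth_weight(feature, base_weight, W1, b1, W2, b2):
    # Scale the nominal birth weight by the estimated probability of being a new bird target.
    return birth_probability(feature, W1, b1, W2, b2) * base_weight
```

A clutter-like echo with a probability near zero therefore starts with a negligible weight and is quickly pruned unless later measurements keep supporting it.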
3.3.3. GM-TPHD Update
Based on the adaptive association probability
, the TPHD update step in Equation (10) changes to:
After the feature fusion based on XGBoost, we can obtain the association probability
between trajectory
j and measurement
z for each time step
k, which is more discriminative for closely spaced targets. By substituting
and
into Equation (16), we obtain the modified weights in the update step:
With this update, the GM-TPHD filter can maintain the predicted state for targets with a low detection probability and assign larger weights to correct components, thereby achieving stable target tracking.
5. Simulation and Experimental Results
This section evaluates the tracking performance of the proposed method. The performance is first analyzed using synthetic data that combines simulated tracks with radar echoes, and is then verified using experimental bird flock data.
The parameters of the proposed tracking method are given as follows. The detection probability within the target field of view is . The survival probability is 0.99. The clutter intensity is , and the new component weight is . The pruning threshold is 0.001, the absorption threshold is 1.5, and the maximum number of components is 30. The scan length of TPHD is set to 5. The comparison algorithm is the original TPHD algorithm, whose detection probability is a constant . All other parameters are identical to the proposed method.
In the following simulations, the role of each module of the proposed method is analyzed through ablation experiments. The methods using the X-f2 and the X-f4 fusion models are called TPHD-X and TPHD-DX, respectively. Further, TPHD-DX with the adaptive detection probability mechanism is called TPHD-DX-A1, and TPHD-DX with both adaptive mechanisms is called TPHD-DX-A2.
5.1. Simulation Results
To quantitatively evaluate tracking performance, simulated target tracks were added to a set of bird flock radar data from the test dataset constructed in
Section 4.1.1, along with clutter data randomly selected from the clutter data test dataset. The simulation scenario includes seven birds, with the overall group movement following the CV model and the individual movement following the formation flight model in [
33]. The target spacing is 3 m, and the initial group speed is 30 m/s. Subsequently, the target measurements are generated under the limited radar beamwidth. The sampling interval is set to 0.04 s, which is the same as the radar CPI. The tracking lasts for 100 time steps. The beamwidth is set to 0.2°, and the radar beam center is aligned with the center target of the group. The detection probability of targets within the beam is 0.99. The measurement noise follows a Gaussian distribution with a standard deviation of 0.8 m. The clutter at each moment is randomly generated within the beamwidth. The number of clutter points follows a Poisson distribution, and the clutter density is set to
.
It should be noted that incorrect associations may occur frequently in dense target scenarios, resulting in inconsistent trajectories that appear as track splitting. Ignoring these inconsistent trajectories would lead to more missed targets. Therefore, in the following analysis, we retain all trajectories with non-overlapping states of more than 10 time steps. This threshold corresponds to the time required for a trajectory to achieve stable tracking, thereby ensuring reliable track confirmation while suppressing transient false trajectories.
The tracking results of GM-TPHD and the proposed method are shown in Figure 6a and Figure 6b, respectively. As can be seen from the figures, the original GM-TPHD method makes several incorrect associations with nearby targets and forms several false tracks from clutter. It also exhibits track breakage at k = 65 to 67 and k = 76 to 84 because of the frequent missed detections of targets on the beam edge. In contrast, the proposed method achieves stable tracking. A comparison of the cardinality is shown in Figure 7. The estimated cardinality of the original GM-TPHD method deviates from the truth at several time steps, whereas the proposed method estimates the number of targets more accurately and tracks more stably.
We conducted 100 Monte Carlo (MC) experiments to analyze the tracking performance. At each time step k, the error between the truth value and the estimated value of the alive trajectory set was evaluated. The evaluated metrics include the optimal sub-pattern assignment (OSPA) distance [
34], and the trajectory metric (TM) based on linear programming in [
35], with parameters
p = 2, penalty distance
c = 2.5 m, and
γ = 2. The trajectory metric consists of four components:
,
,
, and
, representing the localization cost of properly detected targets, missed target cost, false target cost, and track switching cost, respectively. Each cost was normalized using the time window length.
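For reference, the OSPA distance with cut-off c and order p can be computed as in the sketch below. It uses a brute-force search over assignments, which is adequate for the handful of targets considered here; a practical implementation would typically use an optimal assignment solver instead. The function name and the default values c = 2.5 and p = 2 follow the parameters quoted above, while the point-state representation is an assumption.

```python
import itertools
import math

def ospa(X, Y, c=2.5, p=2):
    # X, Y: lists of position tuples (true and estimated target states at one time step).
    if len(X) > len(Y):
        X, Y = Y, X  # make X the smaller set
    m, n = len(X), len(Y)
    if n == 0:
        return 0.0  # both sets are empty

    def d_c(x, y):
        # Base distance, cut off at c.
        return min(c, math.dist(x, y))

    # Best assignment of the m points in X to distinct points in Y.
    best = min(
        sum(d_c(X[i], Y[j]) ** p for i, j in enumerate(perm))
        for perm in itertools.permutations(range(n), m)
    )
    # Each of the n - m unassigned points is penalized with the cut-off c.
    return ((best + (c ** p) * (n - m)) / n) ** (1.0 / p)
```

The cut-off bounds the contribution of any single target, so one missed or false target costs at most c regardless of how far it lies from the others.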
When tracks have large errors in the initialization stage, the error accumulates and distorts the metric for the subsequent tracking performance. However, track initialization performance is not the focus of this paper. Therefore, the TM is calculated starting from the 10th time step in the following experiments.
The average OSPA distance during tracking is shown in Figure 8. There is a large error at time steps 45 and 53, when targets enter the beam, but the error drops significantly afterwards. This is because the weight assigned to a birth component is small at the beginning of the track, and the component is not extracted as a trajectory until it has been updated by subsequent measurements. There are three peaks between time steps 65 and 84, indicating more track breakages caused by missed detections. However, the proposed methods converge faster after target birth and missed detections, and achieve smaller OSPA distances than the original GM-TPHD method.
The average TM error during tracking is shown in Figure 9. As the tracking time increases, the accumulation of trajectory errors leads to a gradual increase in the TM. This growth is attributed to the increase in the number of targets in the later stage of tracking, which brings larger tracking errors. It can be seen that the original GM-TPHD method has the largest tracking error, reflected in both the OSPA distance and the TM error. By introducing the XGBoost and the DTCN modules, the tracking errors of the TPHD-X and TPHD-DX methods decrease successively. Further, by employing the adaptive detection probability and adaptive target birth mechanisms, the tracking errors of the TPHD-DX-A1 and TPHD-DX-A2 methods decrease successively.
The comparison of each TM component is shown in
Figure 10. It can be seen that the proposed method has better performance than the original GM-TPHD method in all components. The TPHD-X method uses XGBoost to fuse the kinematic information of candidate measurements, enhancing the discrimination between different measurements and achieving lower tracking errors. The TPHD-DX method further incorporates Doppler features, which reduces incorrect associations and, thereby, significantly decreases false targets and track switches, as shown in
Figure 10b,d.
Due to the missed detections, trajectories are prone to delayed track initiation or premature track termination, which increases the missed target cost in Figure 10c. By introducing the adaptive detection probability, the TPHD-DX-A1 method significantly reduces the missed target cost. It adapts better to the missed detections of targets at the beam edge, improving the stability of target tracking. However, it also slightly increases the false target cost. The TPHD-DX-A2 method further incorporates the adaptive target birth mechanism to suppress false track initiation in cluttered environments, thereby reducing the false target cost in Figure 10b. It also reduces false associations between targets and clutter, thereby reducing the missed target cost in Figure 10c.
To sum up, the proposed TPHD-DX-A2 method has better association performance and is less affected by missed detections and false alarms, thus exhibiting the smallest trajectory metric error.
Furthermore, the runtime was evaluated in the above scenario. The original TPHD method needs 0.451 s per CPI, while the proposed TPHD-DX-A2 method needs 0.302 s per CPI. These runtimes are not sufficient to support real-time processing, so all the methods operate offline. The time consumption arises mainly because the TPHD filter maintains multiple trajectory hypotheses. The original TPHD method generates more false tracks during tracking, resulting in a longer processing time. In contrast, the proposed method generates fewer false tracks, which reduces the number of trajectory hypotheses and thus the runtime. According to the analysis in
Section 4.2.1, the runtime of the DTCN module is on the order of milliseconds, which is significantly lower than that of the filter. Compared with the original TPHD filter, the proposed method improves both the computational efficiency and the tracking performance.
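The gap to real-time operation and the relative saving can be made concrete with a short calculation. This is only a sketch: it assumes the 0.04 s CPI reported for the experimental data also applies to this scenario, and the helper names are hypothetical.

```python
CPI_S = 0.04  # assumption: CPI duration taken from the experimental data

# Per-CPI runtimes reported above, in seconds.
RUNTIMES_S = {"TPHD": 0.451, "TPHD-DX-A2": 0.302}


def realtime_factor(runtime_s, cpi_s=CPI_S):
    """Processing time divided by CPI duration; real time requires <= 1."""
    return runtime_s / cpi_s


def relative_reduction(baseline_s, proposed_s):
    """Fraction of the baseline runtime saved by the proposed method."""
    return (baseline_s - proposed_s) / baseline_s


for name, runtime_s in RUNTIMES_S.items():
    print(f"{name}: {realtime_factor(runtime_s):.2f} x CPI")

saving = relative_reduction(RUNTIMES_S["TPHD"], RUNTIMES_S["TPHD-DX-A2"])
print(f"runtime reduction: {saving:.1%}")
```

Under this assumption both methods need several CPI durations to process one CPI, and the proposed method saves roughly one third of the baseline runtime.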
5.2. Experimental Results
5.2.1. Bird Flock in Line Formation
The proposed method is further evaluated using experimental data of bird flocks. The first scene consists of six birds flying in a line formation, captured by the photoelectric pod, as shown in
Figure 11. By manually comparing the photoelectric video with the radar data, it can be inferred that the five birds inside the yellow dotted line were detected by the radar beam. The radar measurements and tracking results are shown in
Figure 12. The tracking lasts for 100 CPIs, with a CPI of 0.04 s. The red lines represent alive tracks at time step 100, while the blue lines represent dead tracks that terminated earlier.
As shown in
Figure 12a, Track 3 erroneously splits into Track 7, and the same situation occurs with Tracks 4 and 5. The occurrence of Track 7 further leads to the incorrect breakage of Track 5. In contrast, in
Figure 12b, the proposed method achieves correct association during tracking, avoiding incorrect track switching, indicating that it can utilize the Doppler characteristics to distinguish adjacent targets. Targets at the beam edge have a higher rate of missed detections, leading to track fragmentation. However, the proposed method achieves more stable tracking. It can be seen that Track 9 in
Figure 12b is broken into Tracks 8 and 9 in
Figure 12a, and Track 8 in
Figure 12b is broken into Tracks 4 and 6 in
Figure 12a. This mainly benefits from the adaptive detection probability model of the proposed method. The corresponding cardinality estimates are shown in
Figure 13. Because of the frequent missed detections of the two targets at the beam edge, the cardinality estimates are below five most of the time. However, the proposed method has fewer track switches and more stable tracking, so its cardinality estimate is more accurate than that of the original GM-TPHD.
The tracking performance is analyzed quantitatively through manual verification, with the number of track switches (NTS) [
35], the cumulative number of track breakages (CNTB) [
36], and the success tracking rate (STR) being considered [
36]. The CNTB is the total number of frames in which a true target is not assigned to a track, as defined in Equation (44), where
N is the total number of true targets. The STR is defined as the proportion of true trajectories for which the tracking duration exceeds 80% of their total lifespan.
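As a rough illustration of how these two metrics can be computed, the sketch below works on per-target boolean assignment records. It makes two assumptions that are not confirmed by the text: the CNTB is averaged over the N true targets, and the tracking duration in the STR is the number of frames with an assigned track. The function names and the toy data are hypothetical and do not reproduce the experimental results.

```python
def cntb(assigned):
    """Frames in which a true target has no track, averaged over N targets.

    assigned[i][k] is True if true target i is assigned to a track in
    frame k of its lifespan. Averaging over N is an assumption.
    """
    n = len(assigned)
    return sum(row.count(False) for row in assigned) / n


def success_tracking_rate(assigned, threshold=0.8):
    """Share of true targets tracked for more than `threshold` of their lifespan."""
    tracked = sum(1 for row in assigned if sum(row) / len(row) > threshold)
    return tracked / len(assigned)


# Toy data: three targets over ten frames with 0, 1, and 4 unassigned frames.
toy = [
    [True] * 10,
    [True] * 9 + [False],
    [True] * 6 + [False] * 4,
]
print(cntb(toy))                   # 5 unassigned frames over 3 targets
print(success_tracking_rate(toy))  # 2 of 3 targets exceed 80%
```

The assignment records themselves would come from the manual verification against the photoelectric video described above.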
For the TPHD method, the NTS is 2, the CNTB is 14.4, and the STR is 60%. In contrast, for the proposed method, the NTS is 0, the CNTB is 9.6, and the STR is 80%. The targets experience missed detections over long periods, which results in a high CNTB for both methods. However, the proposed method demonstrates better tracking continuity and correctness.
5.2.2. Bird Flock in V Formation
In the second scene, the radar detected five birds flying in a V formation. The radar measurements and tracking results are shown in
Figure 14. As shown in
Figure 14a, because targets at the beam edge are frequently missed, the tracks of the original GM-TPHD method break frequently. After a track breakage, the measurements of the target are associated with adjacent targets, resulting in incorrect track switches. The tracking results of the proposed method are shown in
Figure 14b. No incorrect track switch occurs. Tracks 1, 2, 4, and 5 continuously track four birds, while Track 3 is restarted as Track 6 after a longer track breakage. Compared with the original GM-TPHD algorithm, the proposed method improves the stability of target tracking.
The cardinality estimations of the number of targets during the tracking process are shown in
Figure 15. Due to frequent missed detections, the cardinality estimates vary during tracking. However, the estimate of the proposed method is more stable than that of the original GM-TPHD method and stays closer to five targets most of the time.
By manual verification, the NTS of the TPHD method is 4, the CNTB is 4.4, and the STR is 60%. For the proposed method, the NTS is 0, the CNTB is 0, and the STR is 100%. These results demonstrate that the proposed method has better tracking performance and is applicable in practical scenarios.