Adaptive TPHD Tracking for Individuals Within a Bird Flock Using Doppler Features

Ni, Na; Guo, Yuhang; Wang, Zhiqin; Jiang, Qi; Li, Weidong; Wang, Rui; Hu, Cheng

doi:10.3390/rs18101538


by Na Ni ¹, Yuhang Guo ², Zhiqin Wang ²,*, Qi Jiang ¹, Weidong Li ¹, Rui Wang ¹ and Cheng Hu ¹,³

¹ School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China

² China Academy of Information and Communications Technology, Beijing 100191, China

³ Advanced Technology Research Institute, Beijing Institute of Technology, Jinan 250300, China

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(10), 1538; https://doi.org/10.3390/rs18101538

Submission received: 9 March 2026 / Revised: 5 May 2026 / Accepted: 9 May 2026 / Published: 12 May 2026

(This article belongs to the Special Issue Small Target Detection, Recognition, and Tracking in Remote Sensing)


Highlights

What are the main findings?

A Doppler temporal contrastive network is developed to learn micro-Doppler representations of bird targets, and is fused with kinematic parameters using XGBoost to improve the association accuracy in dense flock scenarios.
The adaptive detection probability model and target birth mechanism are incorporated into the TPHD filter, reducing track fragmentation and false initialization under incomplete measurements and clutter interference.

What are the implications of the main findings?

The Doppler feature can provide complementary discriminative information beyond kinematic parameters, enhancing tracking performance within a bird flock.
The proposed framework provides a solution for robust radar tracking under incomplete measurements, with applicability to real-world bird flock monitoring.

Abstract

Tracking multiple targets within a group is a challenging task in the radar field, especially for a bird flock. Targets in a group are usually closely spaced and exhibit similar characteristics. Additionally, the tracking radar typically employs a narrow beam to achieve a high range–angular resolution, resulting in incomplete measurements within the limited beamwidth. These factors lead to false association and track fragmentation in target tracking. However, in addition to kinematic characteristics, birds exhibit temporally correlated micro-Doppler signatures because of their wingbeat behavior, which can be utilized in target tracking. Therefore, this paper proposes an adaptive TPHD tracking method using Doppler features. First, a Doppler temporal contrastive network is designed to learn the micro-Doppler representation for the association of birds. Then, the learned feature is fused with kinematic parameters, using XGBoost to guide the weight update in the filter. Moreover, adaptive mechanisms are incorporated into the TPHD filter to achieve stable tracking under incomplete measurements. Simulation and experimental results verified the effectiveness of the proposed method and showed better tracking performance than the competing method.

Keywords:

multi-target tracking; contrastive learning; Doppler feature; TPHD filter

1. Introduction

Bird flock activity has attracted increasing attention in low-altitude airspace due to its significant impact on ecological systems and aviation safety. Tracking bird flocks provides important data support for biological flight mechanism research [1], ecological monitoring [2], and bird strike risk assessment [3]. In the radar field, since birds are usually spatially concentrated and hard to distinguish, many approaches [4,5] focus on tracking the bird flock as a group target, including its centroid and extension state. However, such group target tracking is insufficient to reveal the interaction mechanism and the behavioral diversity of individual targets. Therefore, to enable individual behavioral analysis, tracking individual targets within a group is essential.

However, targets in a group are usually closely spaced and exhibit similar kinematic characteristics, making it hard to associate adjacent targets using only positional information. To achieve high-precision tracking of individual targets within a group, tracking radars must provide high range and angular resolution. Narrow-beam monopulse radars offer advantages in terms of high resolution and high data rate in target tracking. However, due to the limitation of beamwidth, only part of the group can be observed, and the number of observed targets varies with the radar-target geometry [6]. These incomplete measurements lead to high fragmentation of tracks. Therefore, it is challenging to maintain stable tracking for closely spaced group targets with incomplete measurements.

Traditional tracking methods [7,8] rely on data association performance, and thus face problems of poor tracking accuracy and high computational complexity in dense target scenarios. Avoiding the measurement-to-target association problem, random finite set (RFS)-based filters [9,10,11] estimate multi-target states under association uncertainty. To further form continuous target tracks, the target label has been introduced in these filters, such as the tagged PHD [12], labeled multi-Bernoulli (LMB) [13], and generalized LMB (GLMB) [14]. Nonetheless, the tags or labels are not used for track formation, leading to track switches, missed detections, and false targets. By modeling the set of trajectories as the state variable, the trajectory PHD (TPHD) method [15,16] can maintain and smooth individual trajectories during target tracking, thereby improving tracking accuracy. However, it is still difficult to distinguish multiple targets with similar positions and velocities.

Instead of using only kinematic information, several methods utilize features to improve multi-target tracking performance. Handcrafted features, including polarization scattering characteristics [17], time–frequency information [18], and radar cross section (RCS) and Doppler information [19,20,21], have been integrated into filters to enhance tracking performance in cluttered environments. These methods primarily utilize the statistical and kinematic characteristics to construct likelihood functions in the filter, which can better distinguish between clutter and different types of targets. Several methods extract spatial and temporal correlations and image feature similarity through deep learning networks. The AMIR network [22] fuses the interaction, motion, and appearance features by long short-term memory (LSTM) networks. The MPNTrack method [23] models the appearance and geometric feature interactions between targets. In [24], a convolutional Siamese network is proposed to extract radar echo features, and multiple features are fused by extreme gradient boosting (XGBoost) for data association of maneuvering targets in dense false alarms. Similarly, the Siamese network is also used in [25] to match the distribution pattern of echoes. In [26], an interactive Transformer–graph attention network is designed to learn multi-frame spatio–temporal relationships and maneuvering characteristics. However, these features are not effective in distinguishing targets with similar echo characteristics, and the spatial correlations between targets are hard to capture due to the incomplete measurements. Consequently, the existing feature-based methods struggle to correctly track individuals within a group.

Birds exhibit micro-Doppler signatures due to their wingbeat behavior [27,28]. The Doppler spread varies among individuals, providing additional information for distinguishing targets within a group. These signatures contain rich information related to the target size and wingbeat dynamics, which have been widely studied for biophysical parameter estimation and species classification [29,30]. However, micro-Doppler signatures vary significantly over time due to wingbeat behavior, exhibiting complex temporal dynamics that make them difficult to directly use for track–measurement association.

To address these problems, this paper proposes an adaptive TPHD tracking method using Doppler features. The core contribution lies in the proposed Doppler feature representation learning and its effective integration into the TPHD filtering framework, which can improve tracking for individuals within a dense bird flock. Instead of establishing an explicit micro-Doppler model, the proposed method adopts a data-driven manner for Doppler feature prediction and contrast. Then, the learned Doppler feature is fused with kinematic parameters to guide the weight update in the TPHD filter. Moreover, adaptive mechanisms are incorporated into the TPHD filter to enhance tracking stability under incomplete measurements. The main contributions of this paper are given as follows:

(1): A Doppler temporal contrastive network is designed to extract discriminative representations from the time-varying radar echo of birds, which provides complementary information beyond kinematic parameters and improves the association of individuals within the bird flock.
(2): An XGBoost-based feature fusion strategy is proposed to incorporate the Doppler representation and kinematic parameters into the TPHD filter, thereby improving the tracking performance of closely spaced group targets.
(3): Adaptive detection probability and adaptive target birth mechanisms are applied in the TPHD filter to improve tracking stability under incomplete measurements and suppress false track initiation in a cluttered environment.

The rest of this paper is organized as follows. Section 2 introduces the TPHD filter and its limitations under challenging scenarios. Section 3 presents the proposed TPHD tracking method using Doppler features. Section 4 describes the network training and verifies its effectiveness in improving association performance. Section 5 verifies the effectiveness of the tracking method using simulation and experimental data. Finally, concluding remarks are given in Section 6.

2. Problem Formulation

The TPHD filter uses a set of trajectories as the state variable, which has the advantage of extracting and smoothing trajectory estimates during tracking. In this section, a short review of the Gaussian mixture TPHD filter is presented, and the limitations under dense targets and incomplete measurements are introduced.

The TPHD filter estimates the trajectories of the alive targets by propagating a Poisson multi-trajectory density through prediction and update steps [15]. Given a single target state $x \in \mathbb{R}^{n_{x}}$, which includes the position and velocity of the target, its trajectory is $X = (t, x_{1 : l})$, where $t$ is the initial time step of the trajectory and $x_{1 : l} = (x_{1}, \dots, x_{l})$ denotes a state sequence of length $l$. A single-trajectory Gaussian density at time $k$ is

N (t, x_{1 : l}; t_{k}, m_{k}, P_{k}) = \begin{cases} N (x_{1 : l}; m_{k}, P_{k}), & t = t_{k},\ l = l_{k} \\ 0, & \text{otherwise} \end{cases}

(1)

with the mean $m_{k} \in \mathbb{R}^{l_{k} n_{x}}$, covariance matrix $P_{k} \in \mathbb{R}^{l_{k} n_{x} \times l_{k} n_{x}}$, start time $t_{k}$, and duration $l_{k}$. The PHD of the birth density is

D_{β, k} (X) = \sum_{j = 1}^{J_{β, k}} w_{β, k}^{j} N (X; t_{β, k}^{j}, m_{β, k}^{j}, P_{β, k}^{j})

(2)

where $J_{β, k} \in \mathbb{N}$ is the number of components, $w_{β, k}^{j}$ is the weight, $m_{β, k}^{j} \in \mathbb{R}^{n_{x}}$ is the mean, and $P_{β, k}^{j} \in \mathbb{R}^{n_{x} \times n_{x}}$ is the covariance matrix of the j-th component.

The closed form of the TPHD filter consists of the following prediction and update steps.

(1): Prediction step

Assume that the posterior intensity at time k−1 is a Gaussian mixture of the form

D_{k - 1} (X) = \sum_{j = 1}^{J_{k - 1}} w_{k - 1}^{j} N (X; t_{k - 1}^{j}, m_{k - 1}^{j}, P_{k - 1}^{j}) .

(3)

Then, the prior intensity at time k is given in Equation (4), where $D_{S, k | k - 1} (X)$ is the intensity of surviving trajectories, as in Equation (5).

D_{k | k - 1} (X) = D_{S, k | k - 1} (X) + D_{β, k} (X) .

(4)

D_{S, k | k - 1} (X) = p_{S} \sum_{j = 1}^{J_{k - 1}} w_{k - 1}^{j} N (X; t_{k - 1}^{j}, m_{S, k | k - 1}^{j}, P_{S, k | k - 1}^{j}) .

(5)

The mean and covariance matrix of $D_{S, k | k - 1} (X)$ are calculated as in Equations (6) and (7):

m_{S, k | k - 1}^{j} = {[{(m_{k - 1}^{j})}^{T}, {({\dot{F}}^{j} m_{k - 1}^{j})}^{T}]}^{T},

(6)

P_{S, k | k - 1}^{j} = [\begin{matrix} P_{k - 1}^{j} & P_{k - 1}^{j} {({\dot{F}}^{j})}^{T} \\ {\dot{F}}^{j} P_{k - 1}^{j} & {\dot{F}}^{j} P_{k - 1}^{j} {({\dot{F}}^{j})}^{T} + Q \end{matrix}],

(7)

{\dot{F}}^{j} = [0_{1, l_{k - 1}^{j} - 1}, 1] \otimes F,

(8)

where $F$ is the single-target transition matrix, $Q$ is the covariance matrix of the single-target process noise, $\otimes$ represents the Kronecker product, and $0_{m, n}$ is the $m \times n$ zero matrix.
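The prediction of a single Gaussian component in Equations (6)–(8) can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function name `predict_component`, the 1D constant-velocity model, and all numerical values are assumptions chosen for the example, and the scaling of the weight by the survival probability in Equation (5) is omitted.

```python
import numpy as np

def predict_component(m, P, F, Q, n_x):
    """Extend one trajectory component by one time step, following Eqs. (6)-(8)."""
    l = m.size // n_x                        # current trajectory length
    # F_dot = [0_{1,l-1}, 1] (Kronecker) F selects the last state and propagates it
    selector = np.zeros((1, l))
    selector[0, -1] = 1.0
    F_dot = np.kron(selector, F)             # shape (n_x, l * n_x)
    m_pred = np.concatenate([m, F_dot @ m])  # Eq. (6): append the predicted state
    cross = F_dot @ P                        # cross-covariance block, shape (n_x, l * n_x)
    P_pred = np.block([[P, cross.T],
                       [cross, F_dot @ P @ F_dot.T + Q]])  # Eq. (7)
    return m_pred, P_pred

# Hypothetical 1D constant-velocity model, state = [position, velocity]
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])
Q = 0.01 * np.eye(2)
m0 = np.array([0.0, 1.0])                    # trajectory of length 1
P0 = np.eye(2)
m1, P1 = predict_component(m0, P0, F, Q, n_x=2)
```

The predicted mean keeps the whole past trajectory and appends one new state, so the component grows by $n_{x}$ entries per step until the L-scan truncation is applied.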

(2): Update step

Assume that the predicted intensity at time k is a Gaussian mixture of the form

D_{k | k - 1} (X) = \sum_{j = 1}^{J_{k | k - 1}} w_{k | k - 1}^{j} N (X; t_{k | k - 1}^{j}, m_{k | k - 1}^{j}, P_{k | k - 1}^{j}) .

(9)

Then, the posterior intensity at time k is given by Equation (10), where $p_{d}$ is the detection probability and $D_{d, k} (X; z)$ is the intensity updated by the measurement $z$, as in Equation (11).

D_{k} (X) = (1 - p_{d}) D_{k | k - 1} (X) + \sum_{z \in Z_{k}} D_{d, k} (X; z) .

(10)

D_{d, k} (X; z) = \sum_{j = 1}^{J_{k | k - 1}} w_{k}^{j} N (X; t_{k | k - 1}^{j}, m_{k}^{j}, P_{k}^{j}) .

(11)

The mean and covariance matrix of $D_{d, k} (X; z)$ are updated as in Equations (12) and (13):

m_{k}^{j} = m_{k | k - 1}^{j} + P_{k | k - 1}^{j} {({\dot{H}}^{j})}^{T} {(S^{j})}^{- 1} (z - {\dot{H}}^{j} m_{k | k - 1}^{j}),

(12)

P_{k}^{j} = P_{k | k - 1}^{j} - P_{k | k - 1}^{j} {({\dot{H}}^{j})}^{T} {(S^{j})}^{- 1} {\dot{H}}^{j} P_{k | k - 1}^{j},

(13)

S^{j} = {\dot{H}}^{j} P_{k | k - 1}^{j} {({\dot{H}}^{j})}^{T} + R,

(14)

{\dot{H}}^{j} = [0_{1, l_{k | k - 1}^{j} - 1}, 1] \otimes H,

(15)

where $H$ is the single-measurement matrix, and $R$ is the covariance matrix of the single-measurement noise. The weights are updated as follows:

w_{k}^{j} = \frac{p_{d} w_{k | k - 1}^{j} q_{k}^{j} (z)}{K_{C} (z) + p_{d} \sum_{i = 1}^{J_{k | k - 1}} w_{k | k - 1}^{i} q_{k}^{i} (z)},

(16)

q_{k}^{j} (z) = N (z; {\dot{H}}^{j} m_{k | k - 1}^{j}, S^{j}),

(17)

where $K_{C} (\cdot)$ is the intensity of the clutter RFS.
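The update of Equations (12)–(17) for one measurement can be sketched as below. The sketch assumes that the denominator of the weight update normalizes over all predicted components, as in the standard PHD update; the function names, the measurement model, and the numerical values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_pdf(z, mean, S):
    """Multivariate Gaussian density N(z; mean, S)."""
    d = z - mean
    norm = np.sqrt((2.0 * np.pi) ** z.size * np.linalg.det(S))
    return float(np.exp(-0.5 * d @ np.linalg.solve(S, d)) / norm)

def update_components(means, covs, weights, z, H, R, n_x, p_d, clutter_intensity):
    """Update every predicted component with a single measurement z."""
    upd_means, upd_covs, q = [], [], []
    for m, P in zip(means, covs):
        l = m.size // n_x
        selector = np.zeros((1, l))
        selector[0, -1] = 1.0
        H_dot = np.kron(selector, H)               # Eq. (15): observe the last state only
        S = H_dot @ P @ H_dot.T + R                # Eq. (14)
        K = P @ H_dot.T @ np.linalg.inv(S)         # gain applied to the whole trajectory
        upd_means.append(m + K @ (z - H_dot @ m))  # Eq. (12)
        upd_covs.append(P - K @ H_dot @ P)         # Eq. (13)
        q.append(gaussian_pdf(z, H_dot @ m, S))    # Eq. (17)
    num = p_d * np.asarray(weights) * np.asarray(q)
    new_weights = num / (clutter_intensity + num.sum())  # Eq. (16)
    return upd_means, upd_covs, new_weights

# Hypothetical example: two length-1 components, position-only measurement
H = np.array([[1.0, 0.0]])
R = np.array([[0.1]])
means = [np.array([0.0, 1.0]), np.array([5.0, 1.0])]
covs = [np.eye(2), np.eye(2)]
z = np.array([0.1])
upd_means, upd_covs, new_w = update_components(
    means, covs, [1.0, 1.0], z, H, R, n_x=2, p_d=0.9, clutter_intensity=0.01)
```

In this example the measurement lies next to the first component, so almost all of the updated weight goes to it. When two components are equally close to the measurement, the two weights become nearly equal, which is the ambiguity that motivates the feature-aided weights discussed later.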

To limit the number of components, GM-TPHD uses pruning and absorption procedures. Instead of the merging technique used in the GM-PHD filter, the GM-TPHD filter uses absorption, which removes components that are close to another component and adds their weights to the component that is retained.

Finally, the number of trajectories is estimated as ${\hat{N}}^{k} = \mathrm{round} (\sum_{j = 1}^{J_{k}} w_{k}^{j})$. The estimated set of trajectories consists of the components with the ${\hat{N}}^{k}$ largest weights.

For computational feasibility, the L-scan implementation of the GM-TPHD filter is applied. Correlations with states older than L time steps are discarded, and only the states of the last L time steps are used to calculate the PHD. Thus, in practical implementation, the mean and covariance matrix of the trajectory Gaussian densities are $m_{k} \in \mathbb{R}^{L n_{x}}$ and $P_{k} \in \mathbb{R}^{L n_{x} \times L n_{x}}$, respectively.

From the above process, it can be seen that GM-TPHD updates historical trajectories at each moment, thereby achieving track smoothing and improving the track accuracy. However, targets within a group are often closely spaced and exhibit similar kinematic characteristics. Under such conditions, the conventional likelihood function $q_{k}^{j} (z)$ becomes insufficiently discriminative. Measurements originating from adjacent targets may yield similar likelihood values, leading to ambiguous weight updates in Equation (16). Therefore, features can be incorporated into the weight updates to achieve more accurate tracking.

Meanwhile, in high-resolution radar systems, narrow-beam monopulse radars can provide accurate range and angle measurements with high data rates, but only a subset of the group can be illuminated at each scan due to the limited beamwidth. Such incomplete measurements lead to track fragmentation and fluctuations in the estimated target cardinality. In the standard TPHD filter, the detection probability is typically modeled as a constant parameter, which is not adaptive to all targets. Under a limited beamwidth, a target outside the radar beam generates no measurement, and a target within the beam is detected with a constant probability.

In addition, the number and spatial distribution of newly appearing targets within the beam are generally unknown. Treating all unexpected measurements as birth components may lead to incorrect track initiation, since some measurements may come from clutter such as insects, drones, and other aerial targets. Track breakage and incorrect initiation can further interfere with the stable tracking of adjacent targets within the beam.

3. Adaptive TPHD Tracking Using Doppler Features

To achieve stable tracking for group targets under incomplete measurements, this section proposes an adaptive TPHD tracking method using the Doppler feature. Firstly, a neural network is designed to extract the Doppler feature for the track–measurement association. Then, multiple features are fused based on the XGBoost to acquire the association probability. Finally, adaptive mechanisms and the learned association probabilities are incorporated into the TPHD filter.

3.1. Doppler Temporal Contrastive Network

Due to the wingbeat behavior of birds, targets within a group exhibit distinct micro-Doppler characteristics. When birds maintain a stable wingbeat frequency during flight, the micro-Doppler characteristics exhibit periodic variations. Therefore, to exploit Doppler characteristics for robust association in tracking a bird flock, we propose the Doppler temporal contrastive network (DTCN).

Instead of explicitly predicting the Doppler echo, the DTCN directly models the temporal evolution of Doppler features in the latent feature space. Firstly, the radar echoes from both historical and current coherent processing intervals (CPIs) are embedded into high-dimensional feature vectors using the Doppler echo encoder. Subsequently, by introducing a masked conditional temporal prediction mechanism, the network learns to infer the current micro-Doppler representation from historical feature sequences with missed detections. Finally, a contrastive learning objective is incorporated to enforce feature consistency between the predicted representation and the embedding of the current detection, enabling reliable track–measurement association under dense scenarios. Meanwhile, the Doppler feature embeddings are also used to extract discriminative representations for clutter suppression and target birth probability estimation. The overall framework of the DTCN is illustrated in Figure 1.

3.1.1. Input Data Preprocessing

After coherent accumulation within each CPI, the Doppler spectrum of each range cell containing a detection is extracted from the range–Doppler (RD) plane; these spectra are referred to as Doppler echo slices. Each slice is then shifted and cropped so that it is centered on the average frequency of the main Doppler component. The logarithmic power spectrum is computed for each Doppler echo slice and is normalized to the range [0, 1] over all historical CPIs of multiple targets. The Doppler echo slice after normalization is denoted as $e_{k}$. For a CPI with a missed detection, $e_{k}$ is filled with a zero vector. Since the Doppler echo of the trajectory at the current time K is unknown, it is also treated as a missed detection and set as $e_{K} = 0$. Thus, a sequence of normalized Doppler echo slices for the trajectory during the time period K−L+1:K is obtained, denoted as $e_{K - L + 1 : K} = (e_{K - L + 1}, \dots, e_{K})$. Meanwhile, the normalized Doppler echo slice of the candidate detection at time K is denoted as $e_{K}^{m}$. The DTCN infers the association between $e_{K - L + 1 : K}$ and $e_{K}^{m}$.
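The log-power normalization and zero filling described above can be sketched as follows. This is a simplified illustration under stated assumptions: min-max scaling over the detected slices is one possible way to map the values to [0, 1], the shifting and cropping around the main Doppler component is omitted, at least one detected slice is assumed, and the function name and data are hypothetical.

```python
import numpy as np

def normalize_slices(power_slices, detected, eps=1e-12):
    """Return normalized Doppler echo slices e_k with shape (L, n_bins).

    power_slices: linear power spectra, one row per CPI (hypothetical input)
    detected:     boolean mask, False for missed detections
    """
    power = np.asarray(power_slices, dtype=float)
    detected = np.asarray(detected, dtype=bool)
    log_power = 10.0 * np.log10(power + eps)   # logarithmic power spectrum in dB
    valid = log_power[detected]                # statistics from detected CPIs only
    lo, hi = valid.min(), valid.max()
    e = np.clip((log_power - lo) / max(hi - lo, eps), 0.0, 1.0)
    e[~detected] = 0.0                         # missed detections become zero vectors
    return e

# Three CPIs; the last one stands for the current time K and is therefore masked
power = np.array([[1.0, 10.0, 100.0],
                  [100.0, 10.0, 1.0],
                  [5.0, 5.0, 5.0]])
detected = np.array([True, True, False])
e_seq = normalize_slices(power, detected)
```

Masking the current step in the same way as a missed detection lets the temporal model treat "predict the present" and "fill a gap in the past" as the same task.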

3.1.2. Doppler Feature Prediction and Contrastive Learning

The network first employs a shared-weight Doppler feature encoder to map the Doppler echo slices of historical trajectories and the candidate detections in the current CPI into a unified high-dimensional feature space, yielding embedded features. Subsequently, a temporal prediction module is introduced to model the temporal evolution of Doppler features and predict the current micro-Doppler representation. Then, both the predicted feature and the embedded feature pass through a shared-weight multi-layer perceptron (MLP) to obtain reduced-dimension feature vectors. Finally, the contrastive loss is applied to enforce feature consistency constraints for the association pairs.

(1): Doppler Feature Encoder

Firstly, a 1D convolutional neural network (1D-CNN) is used to extract spatial structural information from the Doppler echo, followed by an average pooling layer for feature dimension reduction:

h_{k}^{cnn} = \mathrm{Pool} (\mathrm{conv} (e_{k}))

(18)

where conv represents the feature extraction network composed of two layers of 1D-CNN and ReLU activation layers, and Pool denotes average pooling to reduce the original input dimension to 1.

While the 1D-CNN effectively extracts local structure features of the Doppler distribution, its translation invariance may weaken the perception of the position of the main Doppler component. To ensure the temporal model can learn the evolution of the main Doppler component over time, a fully connected layer $ϕ_{e1}$ is applied. Then the features extracted by the fully connected layer and 1D-CNN are concatenated, enabling joint modeling of Doppler local structure and global evolution.

h_{k}^{fc} = ϕ_{e1} (e_{k})

(19)

h_{k}^{e} = [h_{k}^{cnn}, h_{k}^{fc}]

(20)

After the above process, the encoded feature vector of the trajectory is $h_{K - L + 1 : K}^{e}$, and that of the candidate detection is $h_{K}^{e, d}$. The features of missed detections in $h_{K - L + 1 : K}^{e}$ are still masked as zero vectors.

(2): Temporal Prediction

The feature sequence $h_{K - L + 1 : K}^{e}$ is processed by a bidirectional LSTM (Bi-LSTM) network [31] to learn the implicit periodic patterns within the Doppler feature sequence, enabling the prediction of latent features for masked time steps. The output is a feature sequence denoted as $h_{K - L + 1 : K}^{lstm}$.

We employ temporal self-attention mechanisms to capture correlations among different time steps, adaptively re-weighting historical temporal features to enhance the robustness of feature prediction. Considering the presence of missed detections in the historical sequence, mask awareness is introduced to focus primarily on non-missed time steps.

For each time step k, we compute query, key, and value vectors:

q_{k} = W_{query} h_{k}^{lstm},

(21)

κ_{k} = W_{key} h_{k}^{lstm},

(22)

v_{k} = W_{value} h_{k}^{lstm} .

(23)

The attention score of time step $i$ on time step $j$ is as follows, where the attention score from any time step to a missed-detection time step is set to 0.

α_{ij} = \mathrm{softmax}_{j} (\frac{q_{i} κ_{j}^{T}}{\sqrt{d_{key}}})

(24)

where $d_{key}$ represents the feature dimension of the keys. We utilize multi-head attention. The output for the time step $i$ is:

h_{i} = ϕ_{f} (\Vert_{h = 1}^{H} (\sum_{j = K - L + 1}^{K} α_{ij}^{(h)} v_{j})),

(25)

where H is the number of attention heads, $\Vert$ denotes concatenation, $ϕ_{f}$ is a fully connected layer that fuses the multi-head attention features, and $α_{ij}^{(h)}$ are the attention coefficients of head $h$. The predicted feature for each time step is now denoted as $h_{k}^{pred}$.
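A single attention head with mask awareness, in the spirit of Equations (21)–(24), can be sketched in NumPy as below. The sketch assumes that masking is done by setting the scores of missed time steps to negative infinity before the softmax, which yields attention weights of exactly 0 for those steps. The random weights, the shapes, and the function name are illustrative assumptions, and at least one detected time step is assumed.

```python
import numpy as np

def masked_self_attention(h, detected, W_query, W_key, W_value):
    """Single-head scaled dot-product attention that ignores missed time steps.

    h:        (L, d) sequence of Bi-LSTM features, one row per time step
    detected: (L,) boolean mask, False for missed detections
    """
    q = h @ W_query.T                      # Eq. (21), row-vector convention
    kappa = h @ W_key.T                    # Eq. (22)
    v = h @ W_value.T                      # Eq. (23)
    d_key = kappa.shape[1]
    scores = q @ kappa.T / np.sqrt(d_key)  # (L, L) raw attention scores
    scores = np.where(detected[None, :], scores, -np.inf)  # mask missed steps
    scores = scores - scores.max(axis=1, keepdims=True)    # numerical stability
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)       # Eq. (24)
    return alpha @ v, alpha

rng = np.random.default_rng(0)
h_lstm = rng.normal(size=(4, 3))           # L = 4 time steps, feature size 3
W_q, W_k, W_v = (rng.normal(size=(2, 3)) for _ in range(3))
detected = np.array([True, True, False, True])
out, alpha = masked_self_attention(h_lstm, detected, W_q, W_k, W_v)
```

A multi-head version would run this computation once per head with separate weight matrices, concatenate the outputs, and pass them through the fusion layer of Equation (25).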

(3): Feature Contrastive Learning

After temporal feature prediction, the predicted feature for the current time step $h_{K}^{pred}$ and the encoded feature of the candidate detection $h_{K}^{e, d}$ are fed into a weight-sharing MLP $ϕ_{out}$ to obtain reduced-dimension feature representations:

x_{K}^{d} = ϕ_{out} (h_{K}^{e, d})

(26)

x_{K}^{pred} = ϕ_{out} (h_{K}^{pred})

(27)

Features from the track and measurement of the same target should be closer in the feature space, while features from different targets should be farther apart. Therefore, the feature contrastive module adopts the following contrastive loss function:

L_{C} = \frac{1}{2} y_{ass} {‖x_{K}^{pred} - x_{K}^{d}‖}_{2}^{2} + \frac{1}{2} (1 - y_{ass}) \max {(0, m - {‖x_{K}^{pred} - x_{K}^{d}‖}_{2})}^{2}

(28)

Here, $y_{ass}$ is a binary label: $y_{ass} = 1$ if the track and measurement originate from the same target, and $y_{ass} = 0$ otherwise. $m > 0$ is a margin distance, i.e., the minimum distance in feature space between the features of a track and a measurement belonging to different targets. This paper sets $m = 2$. ${‖\cdot‖}_{2}$ denotes the Euclidean norm.
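The contrastive loss of Equation (28) for a single track–measurement pair can be written in a few lines. This is a plain-Python sketch for illustration; the function name and the example vectors are assumptions, and a real training loop would average the loss over a batch inside a deep learning framework.

```python
import math

def contrastive_loss(x_pred, x_det, y_ass, margin=2.0):
    """Contrastive loss for one pair of feature vectors, following Eq. (28)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_pred, x_det)))
    pull = 0.5 * y_ass * dist ** 2                          # same target: pull together
    push = 0.5 * (1 - y_ass) * max(0.0, margin - dist) ** 2  # different targets: push apart
    return pull + push

same_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], y_ass=1)    # distance 5, penalized
diff_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], y_ass=0)    # beyond the margin, no penalty
diff_near = contrastive_loss([0.0, 0.0], [1.0, 0.0], y_ass=0)   # inside the margin, penalized
```

Pairs from different targets contribute to the loss only while their distance is smaller than the margin, so the network is not pushed to separate features that are already well apart.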

3.1.3. Target Birth Probability Estimation

Since newly appearing targets within the radar beam are unknown, potential newborn components are established based on the measurements at each time step. It is necessary to distinguish between birds and other aerial targets. Therefore, a birth probability estimation network module is applied based on the encoded features in Section 3.1.2:

p_{tar} = \mathrm{Sigmoid} (ϕ_{tar} (h_{K}^{e, d}))

(29)

where $ϕ_{tar}$ is an MLP consisting of a fully connected layer, a ReLU layer, and a fully connected layer in sequence. The output $p_{tar}$ is further used to calculate the probability of a detection being a new target, which is discussed in Section 3.3.2.

Through this module, clutter can be rapidly suppressed using only a single CPI of echoes. This network is trained using the binary cross-entropy loss function in Equation (30) with a label $y_{tar}$, where $y_{tar} = 1$ for bird targets and $y_{tar} = 0$ for other kinds of aerial targets.

L_{BCE} = - [y_{tar} \log (p_{tar}) + (1 - y_{tar}) \log (1 - p_{tar})]

(30)
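As a small numerical check of the birth-probability training objective, the binary cross-entropy can be sketched as below. The sketch uses the conventional negative log-likelihood sign, so the loss is non-negative and decreases as the prediction approaches the label. The clipping constant `eps` and the function name are assumptions added for numerical safety and illustration.

```python
import math

def bce_loss(p_tar, y_tar, eps=1e-12):
    """Binary cross-entropy for one sample (negative log-likelihood form)."""
    p = min(max(p_tar, eps), 1.0 - eps)  # keep log() away from 0
    return -(y_tar * math.log(p) + (1 - y_tar) * math.log(1.0 - p))

loss_confident_bird = bce_loss(0.9, 1)   # bird predicted as bird: small loss
loss_missed_bird = bce_loss(0.1, 1)      # bird predicted as clutter: large loss
loss_uncertain = bce_loss(0.5, 0)        # equals log(2)
```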

3.1.4. Multi-Task Loss Function

The birth probability estimation network is trained simultaneously with the contrastive learning network, which is conducive to learning the discriminative features between the target and the clutter. Therefore, a multi-task loss function is employed during network training:

L_{DTCN} = L_{C} + L_{BCE}

(31)

3.2. XGBoost-Based Feature Fusion

In dense target tracking scenarios, the effectiveness of a single association cue can vary significantly due to factors such as target range, measurement noise, and missed detections. In dense target situations, relying solely on spatial distance becomes unreliable. While Doppler similarity can reflect the difference in target attitude, association performance based solely on this feature is limited. For example, if the historical track is shorter than the wingbeat period, it is not possible to learn effective predictive features. Furthermore, individuals may exhibit similar attitudes at certain times, rendering the Doppler features incapable of distinguishing between them.

Therefore, we formulate data association as a probability inference problem at the feature level and employ an XGBoost model to fuse various cues with differing reliabilities. To fully utilize the Doppler feature and kinematic parameters for data association, the proposed model combines geometric consistency, competitive measurement context, Doppler similarity learned via the DTCN, and tracking reliability characterized by the historical missed detection rate. The feature vector $v_{K} = [d_{K}^{x}, d_{K}^{x, min}, R_{K}^{dop}, d_{K}^{dop}]$ is constructed using the following four components:

(1): The position difference between the target’s predicted position and the candidate measurement, $d_{K}^{x}$;
(2): The position difference between the target’s predicted position and the nearest measurement, $d_{K}^{x, min}$;
(3): The Doppler feature difference:

$d_{K}^{dop} = {‖x_{K}^{pred} - x_{K}^{d}‖}_{2}^{2};$

(32)
(4): The ratio of missed detections in historical time steps:

$R_{K}^{dop} = \frac{L_{miss}}{L},$

(33)

where $L_{miss}$ is the number of missed-detection time steps of the historical trajectory during the time period $K - L + 1 : K$.

Here, $d_{K}^{x}$ and $d_{K}^{x, min}$ provide competitive information for the candidate measurement, and $R_{K}^{dop}$ reflects the credibility of $d_{K}^{dop}$.

XGBoost is an ensemble learning method based on gradient boosted decision trees, which constructs a strong classifier by iteratively fitting decision trees to minimize a differentiable loss function. The model output is an additive ensemble of regression trees:

o_{K} = \sum_{t = 1}^{T} f_{t} (v_{K}), \quad f_{t} \in \mathcal{F},

(34)

where $f_{t}$ denotes the $t$-th tree structure, T is the total number of trees, and $\mathcal{F}$ denotes the space of the regression trees. Finally, the association probability is acquired through a sigmoid function:

{\hat{q}}_{K} = \mathrm{Sigmoid} (o_{K})

(35)
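The additive structure of Equations (34) and (35) can be illustrated with a toy ensemble of hand-written decision stumps. This is not a trained XGBoost model and does not use the XGBoost library: the thresholds, leaf values, and feature values below are invented solely to show how per-tree outputs are summed and passed through a sigmoid.

```python
import math

def stump(feature_index, threshold, left_value, right_value):
    """One-split regression tree: returns left_value if v[feature_index] < threshold."""
    def f(v):
        return left_value if v[feature_index] < threshold else right_value
    return f

# Feature order follows v_K = [d_x, d_x_min, R_dop, d_dop]; all numbers are hypothetical
trees = [
    stump(0, 5.0, 1.0, -1.5),   # small position difference favours association
    stump(3, 1.0, 0.8, -0.8),   # small Doppler feature distance favours association
    stump(2, 0.5, 0.2, -0.2),   # few missed detections: Doppler cue is more credible
]

def association_probability(v, trees):
    o = sum(f(v) for f in trees)         # Eq. (34): additive ensemble output
    return 1.0 / (1.0 + math.exp(-o))    # Eq. (35): sigmoid

p_match = association_probability([1.0, 1.0, 0.1, 0.2], trees)
p_mismatch = association_probability([8.0, 1.0, 0.1, 3.0], trees)
```

In a trained model the trees are deeper and learned from labeled track–measurement pairs, so the contribution of each cue can depend on the values of the others, for example trusting the Doppler distance less when the missed detection ratio is high.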

This model is trained using the binary cross-entropy loss function. Compared with end-to-end neural association models, the tree-based fusion mechanism is more suitable for low-dimensional, semantically meaningful features and provides robustness to missing or unreliable inputs. It enables adaptive weighting of different cues under diverse tracking conditions, improving association accuracy in dense and cluttered environments.

3.3. Adaptive TPHD Under Incomplete Measurements

To achieve stable tracking for group targets under incomplete measurements, the adaptive detection probability and target birth mechanisms are developed. The proposed feature extraction and fusion networks are also incorporated into the TPHD filter, ultimately reflected in the weight update process.

3.3.1. Adaptive Detection Probability

Considering the incomplete measurements under a limited beamwidth, the detection probability of targets within the radar beam is modeled as:

p_{d, k}^{j} = p (d | B) p_{k}^{j} (B),

(36)

where $p (d | B)$ is a constant detection probability for a target within the radar beam, and $p_{k}^{j} (B)$ is the probability that target j is within the radar beam at time k. Dropping the indices j and k, consider a target estimate with azimuth angle $a$ and elevation angle $e$ in polar coordinates, and let $[a_{\min}, a_{\max}]$ and $[e_{\min}, e_{\max}]$ represent the radar beam coverage in azimuth and elevation, respectively. Assuming that $a$ and $e$ follow independent Gaussian distributions with means $μ_{a}$ and $μ_{e}$ and standard deviations $σ_{a}$ and $σ_{e}$, respectively, we have

p (B) = \int_{e_{\min}}^{e_{\max}} \int_{a_{\min}}^{a_{\max}} \frac{1}{2 π σ_{a} σ_{e}} \exp (- \frac{1}{2} [\frac{{(a - μ_{a})}^{2}}{σ_{a}^{2}} + \frac{{(e - μ_{e})}^{2}}{σ_{e}^{2}}]) d a d e .

(37)
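Because the two angles are assumed to be independent Gaussian variables, the double integral factorizes into a product of two one-dimensional interval probabilities, each of which can be evaluated with the error function. The sketch below shows this under that assumption; the function names and the beam limits in the example are hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian with mean mu and std sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_in_beam(mu_a, sigma_a, a_min, a_max, mu_e, sigma_e, e_min, e_max):
    """Probability p(B) that the target lies inside the beam in both angles."""
    p_a = normal_cdf(a_max, mu_a, sigma_a) - normal_cdf(a_min, mu_a, sigma_a)
    p_e = normal_cdf(e_max, mu_e, sigma_e) - normal_cdf(e_min, mu_e, sigma_e)
    return p_a * p_e

def detection_probability(p_d_in_beam, p_beam):
    """Adaptive detection probability of Eq. (36)."""
    return p_d_in_beam * p_beam

# Target estimated at the beam centre, beam half-width equal to one standard deviation
p_centre = prob_in_beam(0.0, 1.0, -1.0, 1.0, 0.0, 1.0, -1.0, 1.0)
# Target estimated far outside the beam in azimuth
p_outside = prob_in_beam(10.0, 1.0, -1.0, 1.0, 0.0, 1.0, -1.0, 1.0)
p_d_centre = detection_probability(0.9, p_centre)
```

A target estimated well outside the beam receives a detection probability close to 0, so a missed detection barely reduces its weight in the update step.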

With this detection probability model, a long-miss track pruning operation is also applied. When a target leaves the radar beam coverage, its detection probability drops to 0. As a result, it would be retained indefinitely as a surviving component and predicted at each time step. Since the track prediction accuracy and association reliability degrade over time, retaining these predicted tracks incurs unnecessary computational cost. Therefore, redundant tracks with a long run of consecutive missed detections are deleted.

3.3.2. Adaptive Target Birth

Since newly appearing targets within the radar beam are unknown, we establish the birth components based on all candidate measurements. In the absorption procedure, if the birth component is close to the distribution of the current target state of another component with a higher weight, it will be removed to avoid affecting the cardinality estimation. If no other components are available for absorption, the birth component will participate in subsequent track updates as a new target.

To distinguish birds from aerial clutter, the network module in Section 3.1.3 is applied to estimate the probability that birth component j at time k is a bird target, denoted as $p_{tar,k}^{j}$. The modified weight assigned to each birth component is then $p_{tar,k}^{j} w_{\beta,k}^{j}$, where $w_{\beta,k}^{j}$ is the birth weight in Equation (2). For clutter, the modified birth weight is small, which avoids false track initialization.

3.3.3. GM-TPHD Update

Based on the adaptive detection probability $p_{d,k}^{j}$, the TPHD update step in Equation (10) changes to:

$$\begin{aligned} D_{k}(X) ={}& \sum_{j=1}^{J_{k|k-1}} (1 - p_{d,k}^{j})\, w_{k|k-1}^{j}\, N(X; t_{k|k-1}^{j}, m_{k|k-1}^{j}, P_{k|k-1}^{j}) \\ &+ \sum_{z \in Z_{k}} \sum_{j=1}^{J_{k|k-1}} w_{k}^{j}\, N(X; t_{k|k-1}^{j}, m_{k}^{j}, P_{k}^{j}). \end{aligned} \tag{38}$$

After the feature fusion based on XGBoost, we obtain the association probability $\hat{q}_{k}^{j}(z)$ between trajectory j and measurement z at each time step k, which is more discriminative for closely spaced targets. Substituting $\hat{q}_{k}^{j}(z)$ and $p_{d,k}^{j}$ into Equation (16) gives the modified weights in the update step:

$$w_{k}^{j} = \frac{p_{d,k}^{j}\, w_{k|k-1}^{j}\, \hat{q}_{k}^{j}(z)}{K_{C}(z) + p_{d,k}^{j} \sum_{i=1}^{J_{k|k-1}} w_{k|k-1}^{i}\, \hat{q}_{k}^{i}(z)}. \tag{39}$$

With this update, the GM-TPHD filter can maintain the predicted state for targets with low $p_{d,k}^{j}$ and assign larger weights to correct components, thereby achieving stable target tracking.
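A minimal sketch of the weight computations in Equations (38) and (39) for a single measurement z is given below. It reads the sum in the denominator of Equation (39) as running over all predicted components i, with each term using that component's own association probability; this reading, the function names, and the numerical values are assumptions for illustration.

```python
def missed_detection_weights(p_d, w_pred):
    """First sum of Equation (38): predicted weights scaled by (1 - p_d)."""
    return [(1.0 - p) * w for p, w in zip(p_d, w_pred)]


def updated_weights(p_d, w_pred, q_hat, clutter_intensity):
    """Equation (39) for one measurement z.

    p_d[j]   : adaptive detection probability of component j
    w_pred[j]: predicted weight of component j
    q_hat[j] : fused association probability between component j and z
    """
    total = sum(w * q for w, q in zip(w_pred, q_hat))  # sum over all components i
    return [
        p * w * q / (clutter_intensity + p * total)
        for p, w, q in zip(p_d, w_pred, q_hat)
    ]


# Illustrative values: component 0 is well inside the beam and matches z,
# component 1 is near the beam edge and matches z poorly.
p_d = [0.99, 0.5]
w_pred = [1.0, 1.0]
q_hat = [0.8, 0.1]
w_upd = updated_weights(p_d, w_pred, q_hat, clutter_intensity=0.01)
w_miss = missed_detection_weights(p_d, w_pred)
```

In this example the well-matched component receives most of the updated weight, while the component with low detection probability keeps a larger missed-detection weight, which is what allows its predicted state to survive.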

4. Network Training and Performance Analysis

In this section, the real-world echo dataset is constructed and then used to build the synthetic multi-feature dataset. Based on these datasets, the effectiveness of the DTCN and XGBoost models in improving the association performance is analyzed.

4.1. Dataset Construction

4.1.1. Real-World Echo Dataset

To collect real-world bird flock data, the research team carried out experiments with a monopulse radar [32] in Dongying, Shandong Province, China, as shown in Figure 2. The site is located on one of the major migratory routes in eastern China, which makes it suitable for collecting radar data of bird flocks.

The radar operates in the Ka-band, with a beamwidth of approximately 0.23° and an angular measurement accuracy of 1/20 of the beamwidth. A stepped-frequency waveform is employed with a synthetic bandwidth of 1 GHz. The pulse width of each sub-pulse is 4 μs, with a pulse repetition period (PRP) of 40 μs. Ten sub-pulses are synthesized into a single high-resolution range profile (HRRP). The radar accumulates echoes over a CPI of 0.04 s, followed by CFAR detection and measurement extraction on the RD plane.
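A few derived quantities follow from these waveform parameters by simple arithmetic. The sketch below assumes the nominal range resolution c/(2B) without any windowing loss and assumes that HRRPs are formed back to back with no dead time between bursts; neither assumption is stated in the source, and the variable names are illustrative.

```python
C = 299_792_458.0        # speed of light in m/s

bandwidth_hz = 1e9       # synthetic bandwidth
prp_s = 40e-6            # pulse repetition period
n_subpulses = 10         # sub-pulses per HRRP
cpi_s = 0.04             # coherent processing interval
beamwidth_deg = 0.23     # approximate beamwidth

# Nominal range resolution of the synthesized profile, about 0.15 m.
range_resolution_m = C / (2.0 * bandwidth_hz)

# Time needed to collect the sub-pulses of one HRRP, about 400 microseconds.
hrrp_period_s = n_subpulses * prp_s

# Number of HRRPs accumulated in one CPI under the back-to-back assumption.
hrrps_per_cpi = round(cpi_s / hrrp_period_s)

# Angular measurement accuracy, 1/20 of the beamwidth.
angle_accuracy_deg = beamwidth_deg / 20.0
```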

Track feature samples were constructed from real-world data of migratory bird flocks. The dataset includes continuous observation scenarios in which the number of targets at any given time varies from 1 to 10. Measurements of multiple resolvable targets were extracted and associated using the nearest neighbor method. Only trajectories whose correctness could be verified were used to extract the Doppler echo slices.

Subsequently, trajectories were divided into segments of a fixed length L using a sliding window. Positive samples were generated from each valid track segment. For negative samples, the Doppler echo slice at the last time step of the track segment was replaced by that of a neighboring target.
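The sample construction can be sketched as follows, with a track represented as a list of per-CPI Doppler echo slices. The stride of the sliding window is not given in the source, so `stride` is a hypothetical parameter, and the function names are illustrative.

```python
def sliding_segments(track, length, stride=1):
    """Cut a track (list of Doppler echo slices) into fixed-length segments."""
    return [
        track[start:start + length]
        for start in range(0, len(track) - length + 1, stride)
    ]


def make_negative(segment, neighbor_slice):
    """Replace the slice at the last time step with one from a neighboring target."""
    return list(segment[:-1]) + [neighbor_slice]


# Toy example: slices are stand-in labels rather than real Doppler data.
track = [f"bird_A_t{t}" for t in range(12)]
positives = sliding_segments(track, length=10)
negative = make_negative(positives[0], "bird_B_t9")
```

Each positive sample is an unmodified segment, while a negative sample differs from its positive counterpart only in the final slice, so the network has to judge whether the newest measurement continues the Doppler history of the track.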

To account for clutter interference from other aerial targets, radar echoes from insects and DJI Phantom 4 drones were introduced. The sequences of Doppler echo slices for bird targets, insects, and DJI Phantom 4 drones over 20 CPIs are shown in Figure 3, highlighting significant differences in their micro-Doppler characteristics. The clutter echoes were randomly paired with echoes of birds to form negative sample pairs.

Data within a 5 m/s velocity range around the average main Doppler were cropped and pre-processed, as described in Section 3.1.1. The dataset is split into training, validation, and test sets at a ratio of 7:1:2, where the partitioning is performed at the trajectory level to ensure that samples derived from the same trajectory are not shared across different subsets. Sliding-window segmentation is then independently applied within each subset to generate fixed-length samples. The final dataset consists of 20,293 samples with a positive-to-negative ratio of approximately 1:3.
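A trajectory-level split of this kind can be sketched as below: trajectory identifiers are shuffled and partitioned first, and windowing is applied only afterwards within each subset, so that no trajectory contributes samples to more than one subset. The function name, the seed, and the rounding rule are illustrative assumptions.

```python
import random


def split_by_trajectory(trajectory_ids, ratios=(0.7, 0.1, 0.2), seed=0):
    """Partition trajectory identifiers into train, validation, and test sets."""
    ids = sorted(set(trajectory_ids))
    random.Random(seed).shuffle(ids)
    n_train = round(ratios[0] * len(ids))
    n_val = round(ratios[1] * len(ids))
    train = set(ids[:n_train])
    val = set(ids[n_train:n_train + n_val])
    test = set(ids[n_train + n_val:])
    return train, val, test


train_ids, val_ids, test_ids = split_by_trajectory(range(20))
```

Splitting before windowing matters because overlapping windows from the same trajectory are highly correlated; a sample-level split would leak nearly identical segments into the test set.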

4.1.2. Synthetic Multi-Feature Dataset

Since the ground truth of track positions in real-world data is often unavailable, we generated simulated group target measurements and combined them with the real radar echoes to construct four features, as described in Section 3.2. The simulated group targets are arranged in a line formation with a spacing of 3 m, with measurement noise levels set at 0.5 m and 1 m. The dataset maintains a 1:3 positive-to-negative ratio. Additionally, samples with missing Doppler features were included, accounting for 30% of the total dataset.

4.2. Performance Analysis of Networks

Association performance of the DTCN and XGBoost models is analyzed based on the test dataset. The association performance metrics include the accuracy metric, the probability of correct association (PCA), and the probability of false association (PFA), defined as in Equations (40)–(42), respectively.

$$Acc = \frac{N_{ca} + N_{cr}}{N_{pos} + N_{neg}}, \tag{40}$$

$$PCA = \frac{N_{ca}}{N_{pos}}, \tag{41}$$

$$PFA = \frac{N_{fr}}{N_{neg}}, \tag{42}$$

where $N_{ca}$ is the number of correctly associated positive samples, $N_{cr}$ is the number of correctly rejected negative samples, $N_{fr}$ is the number of falsely associated negative samples, and $N_{pos}$ and $N_{neg}$ are the total numbers of positive and negative samples, respectively.
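The three metrics can be computed from binary labels and binary decisions as sketched below. The sketch takes the numerator of the PFA to be the number of negative samples that are wrongly accepted as associations, which is an interpretation of the definition in Equation (42); the function name and the toy data are illustrative.

```python
def association_metrics(labels, decisions):
    """Return Acc, PCA, and PFA for binary labels (1 = true pair) and decisions (1 = associated)."""
    pairs = list(zip(labels, decisions))
    n_pos = sum(1 for y, _ in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    n_ca = sum(1 for y, d in pairs if y == 1 and d == 1)  # correctly associated positives
    n_cr = sum(1 for y, d in pairs if y == 0 and d == 0)  # correctly rejected negatives
    n_fa = sum(1 for y, d in pairs if y == 0 and d == 1)  # falsely associated negatives
    return {
        "acc": (n_ca + n_cr) / (n_pos + n_neg),
        "pca": n_ca / n_pos,
        "pfa": n_fa / n_neg,
    }


metrics = association_metrics(
    labels=[1, 1, 0, 0, 0, 0],
    decisions=[1, 0, 0, 0, 1, 0],
)
```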

4.2.1. DTCN Model

The parameters of the DTCN model are set as follows. The input dimension of the network is 50. Two 1D-CNN layers are configured with a kernel length of 3 and output channels of 16 and 32, respectively. The Bi-LSTM for feature prediction has a hidden state dimension of 64. The multi-head attention mechanism uses H = 8 heads. The network is trained using the Adam optimizer with a batch size of 64, a learning rate of 0.01, and a maximum of 150 epochs. In the training stage, the input sequence is masked at a rate of 0.2 to imitate the missed detections, thereby improving the robustness of the network.
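The training-time masking can be illustrated with a short sketch. The source gives only the masking rate of 0.2; masking whole time steps by overwriting them with zeros is an assumption made here for illustration, as are the function name and arguments.

```python
import random


def mask_sequence(sequence, mask_rate=0.2, mask_value=0.0, rng=None):
    """Randomly blank out whole time steps of a sequence of feature vectors.

    Each time step is replaced by a vector of mask_value with probability
    mask_rate, imitating a missed detection at that step.
    """
    rng = rng or random.Random()
    masked = []
    for frame in sequence:
        if rng.random() < mask_rate:
            masked.append([mask_value] * len(frame))
        else:
            masked.append(list(frame))
    return masked


# Ten time steps, each a 50-dimensional Doppler slice (toy constant values).
sequence = [[1.0] * 50 for _ in range(10)]
masked = mask_sequence(sequence, mask_rate=0.2, rng=random.Random(0))
```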

Based on the real-world echo dataset, the association performance of the network was analyzed under different track segment lengths L and different detection probabilities. As shown in Table 1, the performance is optimal at L = 14 when the detection probability is 1.0 or 0.6, and at L = 10 when the detection probability is 0.8. In the former two cases, the accuracy at L = 10 is slightly lower than that at L = 14 but higher than that at the other segment lengths.

The runtime of the DTCN feature extraction process under different L is shown in Figure 4, evaluated on an NVIDIA GeForce RTX 3090. The runtime is proportional to the number of targets. When there are few targets, the DTCN runtime remains on the order of milliseconds. The runtime for feature extraction of the tracks is also proportional to the track segment length. Moreover, a longer track segment requires a longer period of stable tracking, which is difficult to obtain in low detection probability scenarios. Therefore, L = 10 was selected as a balance between performance and latency.

Under this network configuration, when the detection probability is 1, the association accuracy for the track–measurement association is 87.34%, and the accuracy for the target birth probability estimation module is 99.88%.

4.2.2. XGBoost-Based Fusion Model

The XGBoost (version 3.2) parameters are configured with a maximum tree depth of 5 and a learning rate of 0.1. The maximum epoch is 200. Considering the imbalance in the sample size, the weight of the positive samples is set to 3. The regularization terms are $\gamma = 0.5$ and $\lambda = 0.5$. Feature importance is analyzed in Figure 5, where f0, f1, f2, and f3 represent the four features $d_{K}^{x}$, $d_{K}^{x,\min}$, $R_{K}^{dop}$, and $d_{K}^{dop}$, respectively. The figure shows that the position difference is the most critical feature, followed by the nearest measurement distance. Doppler features show lower relative importance in scenarios where position differences are already distinct. This is consistent with the practical tracking scenario where positional information dominates under high-precision measurements, while Doppler features mainly provide complementary discriminative cues when kinematic information becomes ambiguous.
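For readers who wish to reproduce a comparable configuration, the stated settings can be written as a parameter dictionary using XGBoost's scikit-learn style parameter names. The mapping is an assumption: in particular, the maximum epoch is read here as the number of boosting rounds, the positive-sample weight as `scale_pos_weight`, and the binary logistic objective is assumed because the model outputs an association probability. The feature names are illustrative labels for f0 to f3.

```python
# Hypothetical mapping of the reported settings onto XGBoost parameter names.
xgb_params = {
    "max_depth": 5,                  # maximum tree depth
    "learning_rate": 0.1,            # learning rate
    "n_estimators": 200,             # maximum epoch, read as boosting rounds
    "scale_pos_weight": 3,           # weight of the positive samples
    "gamma": 0.5,                    # regularization term gamma
    "reg_lambda": 0.5,               # regularization term lambda
    "objective": "binary:logistic",  # assumed: probability output for association
}

# Illustrative names for the four input features f0..f3.
FEATURE_NAMES = ["d_x", "d_x_min", "R_dop", "d_dop"]

# With the library installed, the dictionary could be passed on as
# xgboost.XGBClassifier(**xgb_params); that call is not executed here.
```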

We denote the proposed feature fusion method with four features as X-f4, and the method with only the first two features as X-f2. For both, an association is declared when the output probability is larger than 0.5. These methods are compared with a fixed-threshold association method, which declares an association when the measurement distance is smaller than 2 m. The CSX method [24], which fuses kinematic information, radar echo similarity, and signal-to-noise ratios (SNRs) using XGBoost, is also included in the comparison as a state-of-the-art method.

The performance comparison is shown in Table 2. As the measurement noise increases, the association performance of all methods degrades. The CSX method performs better than the fixed-threshold method but worse than the X-f2 method, which indicates that the CSX method has limited performance for the association of closely spaced birds. Because bird echoes vary significantly over time due to wingbeat behavior, a similarity computed from a single frame of echo is unreliable and confuses the association results. By contrast, the proposed X-f4 method maintains better association performance than the other methods. This demonstrates that Doppler features provide effective complementary information for robust data association in challenging scenarios.

5. Simulation and Experimental Results

This section evaluates the tracking performance of the proposed method. The tracking performance is first analyzed using synthetic data that combine simulated tracks with real radar echoes, and is then verified using experimental bird flock data.

The parameters of the proposed tracking method are as follows. The detection probability for a target within the radar beam is $p(d \mid B) = 0.99$. The survival probability is 0.99. The clutter intensity is $K_{C} = 0.01$, and the new component weight is $w_{\beta} = 0.01$. The pruning threshold is 0.001, the absorption threshold is 1.5, and the maximum number of components is 30. The scan length of the TPHD filter is set to 5. The comparison algorithm is the original TPHD algorithm, whose detection probability is a constant $p_{d} = 0.8$. All other parameters are identical to those of the proposed method.

In the following simulations, the role of each module of the proposed method is analyzed through ablation experiments. The methods using the X-f2 and X-f4 fusion models are called TPHD-X and TPHD-DX, respectively. Further, TPHD-DX with the adaptive detection probability mechanism is called TPHD-DX-A1, and TPHD-DX with both adaptive mechanisms is called TPHD-DX-A2.

5.1. Simulation Results

To quantitatively evaluate tracking performance, simulated target tracks were added to a set of bird flock radar data from the test dataset constructed in Section 4.1.1, along with clutter data randomly selected from the clutter test dataset. The simulation scenario includes seven birds, with the overall group movement following the CV model and the individual movement following the formation flight model in [33]. The target spacing is 3 m, and the initial group speed is 30 m/s. Subsequently, the target measurements are generated under the limited radar beamwidth. The sampling interval is set to 0.04 s, which is the same as the radar CPI. The tracking lasts for 100 time steps. The beamwidth is set to 0.2°, and the radar beam center is aligned with the center target of the group. The detection probability of targets within the beam is 0.99. The measurement noise follows a Gaussian distribution with a standard deviation of 0.8 m. The clutter at each moment is randomly generated within the beamwidth. The number of clutter points follows a Poisson distribution, and the clutter density is set to $5 \times 10^{-4}$.

It should be noted that incorrect associations may occur frequently in dense target scenarios, resulting in inconsistent trajectories that appear as track splitting. Ignoring these inconsistent trajectories would lead to more missed targets. Therefore, in the following analysis, we retain all trajectories with non-overlapping states of more than 10 time steps. This threshold corresponds to the time required for a trajectory to achieve stable tracking, thereby ensuring reliable track confirmation while suppressing transient false trajectories.

The tracking results of the original GM-TPHD method and the proposed method are shown in Figure 6a and Figure 6b, respectively. As can be seen from the figures, the original GM-TPHD method makes several incorrect associations with nearby targets and forms several false tracks from clutter. Track breakages also occur at k = 65–67 and k = 76–84 because of frequent missed detections of the targets at the beam edge. In contrast, the proposed method achieves stable tracking. The cardinality comparison is shown in Figure 7. The estimated cardinality of the original GM-TPHD method deviates from the truth at several time steps, whereas the proposed method estimates the number of targets more accurately and tracks them more stably.

We conducted 100 Monte Carlo (MC) experiments to analyze the tracking performance. At each time step k, the error between the true and estimated sets of alive trajectories was evaluated. The evaluated metrics include the optimal sub-pattern assignment (OSPA) distance [34] and the trajectory metric (TM) based on linear programming in [35], with parameters p = 2, penalty distance c = 2.5 m, and γ = 2. The trajectory metric consists of four components, $c_{l}$, $c_{m}$, $c_{f}$, and $c_{s}$, representing the localization cost of properly detected targets, the missed target cost, the false target cost, and the track switching cost, respectively:

$$d^{2}(X^{k}, \hat{X}^{k}) = c_{l}^{2}(X^{k}, \hat{X}^{k}) + c_{m}^{2}(X^{k}, \hat{X}^{k}) + c_{f}^{2}(X^{k}, \hat{X}^{k}) + c_{s}^{2}(X^{k}, \hat{X}^{k}). \tag{43}$$

Each cost was normalized using the time window length.
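For reference, the OSPA distance between two finite point sets can be sketched as below. The sketch searches all assignments by brute force, which is only practical for small sets such as the seven-target scenario; an assignment solver such as the Hungarian algorithm would normally be used instead. Applying c = 2.5 m and p = 2 to the OSPA distance is an assumption, since the source lists these parameters together with the trajectory metric.

```python
import itertools
import math


def ospa(X, Y, c=2.5, p=2):
    """OSPA distance between two finite sets of points given as coordinate tuples."""
    if len(X) > len(Y):
        X, Y = Y, X          # ensure X is the smaller set
    m, n = len(X), len(Y)
    if n == 0:
        return 0.0           # both sets are empty
    best = float("inf")
    # Try every way of assigning the m points of X to distinct points of Y.
    for perm in itertools.permutations(range(n), m):
        cost = sum(min(math.dist(X[i], Y[j]), c) ** p for i, j in enumerate(perm))
        best = min(best, cost)
    # Unassigned points of the larger set each incur the cut-off penalty c.
    return ((best + (c ** p) * (n - m)) / n) ** (1.0 / p)


truth = [(0.0, 0.0), (3.0, 0.0)]
estimate = [(0.0, 1.0)]
d = ospa(truth, estimate)
```

In this toy case one true target is matched with a localization error of 1 m and the other is missed, so the distance combines a squared localization term and one squared cut-off penalty before averaging over the two targets.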

In the case when tracks have large errors in the initialization stage, the error accumulates and affects the metric of the subsequent tracking performance. However, the track initialization performance is not the focus of this paper. Therefore, the TM will be calculated starting from the 10th time step in the following experiments.

The average OSPA distance during tracking is shown in Figure 8. There is a large error at time steps 45 and 53, when targets enter the beam, but the error drops significantly afterwards. This is because the weight assigned to a birth component is small at the beginning of the track, and the component is not extracted as a trajectory until it has been updated by subsequent measurements. There are three peaks between time steps 65 and 84, indicating more track breakages caused by missed detections. Nevertheless, the proposed methods converge faster after target birth and missed detections, and achieve smaller OSPA distances than the original GM-TPHD method.

The average TM error during tracking is shown in Figure 9. As the tracking time increases, the accumulation of trajectory errors leads to a gradual increase in the TM. The increase is attributed to the larger number of targets in the later stage of tracking, which brings larger tracking errors. The original GM-TPHD method has the largest tracking error, reflected in both the OSPA distance and the TM error. By introducing the XGBoost and DTCN modules, the tracking errors of the TPHD-X and TPHD-DX methods decrease successively. Further, by employing the adaptive detection probability and adaptive target birth mechanisms, the tracking errors of the TPHD-DX-A1 and TPHD-DX-A2 methods decrease successively.

The comparison of each TM component is shown in Figure 10. It can be seen that the proposed method has better performance than the original GM-TPHD method in all components. The TPHD-X method uses XGBoost to fuse the kinematic information of candidate measurements, enhancing the discrimination between different measurements and achieving lower tracking errors. The TPHD-DX method further incorporates Doppler features, which reduces incorrect associations and, thereby, significantly decreases false targets and track switches, as shown in Figure 10b,d.

Due to missed detections, trajectories are prone to delayed track initiation or premature track termination, which increases the missed target cost in Figure 10c. By introducing the adaptive detection probability, the TPHD-DX-A1 method significantly reduces the missed target cost. It adapts better to missed detections of targets at the beam edge, improving the stability of target tracking. However, it slightly increases the false target cost at the same time. The TPHD-DX-A2 method further incorporates the adaptive target birth mechanism to suppress false track initiation in cluttered environments, thereby reducing the false target cost in Figure 10b. It also reduces false associations between targets and clutter, thereby reducing the missed target cost in Figure 10c.

To sum up, the proposed TPHD-DX-A2 method has better association performance and is less affected by missed detections and false alarms, thus exhibiting the smallest trajectory metric error.

Furthermore, the runtime was evaluated in the above scenario. The original TPHD method needs 0.451 s per CPI, while the proposed TPHD-DX-A2 method needs 0.302 s per CPI. Both runtimes exceed the 0.04 s CPI, so neither method supports real-time processing and both are currently offline algorithms. The time consumption is mainly due to the TPHD filter maintaining multiple trajectory hypotheses. The original TPHD method generates more false tracks during tracking, resulting in a longer processing time. In contrast, the proposed method generates fewer false tracks, which reduces the number of trajectory hypotheses and thus the runtime. According to the analysis in Section 4.2.1, the runtime of the DTCN module is on the order of milliseconds, which is significantly lower than that of the filter. Compared with the original TPHD filter, the proposed method therefore improves both the computational efficiency and the tracking performance.

5.2. Experimental Results

5.2.1. Bird Flock in Line Formation

The proposed method is further evaluated using experimental data of the bird flock. The first scene consists of six birds flying in a line formation, which is captured by the photoelectric pod, as in Figure 11. By manually comparing the photoelectric video with the radar data, it can be inferred that the five birds in the yellow dotted line have been detected by the radar beam. The radar measurements and tracking results are shown in Figure 12. The tracking lasts for 100 CPIs, with a CPI of 0.04 s. The red lines represent alive tracks at time step 100, while the blue lines represent dead tracks that terminated earlier.

As shown in Figure 12a, Track 3 erroneously splits into Track 7, and the same situation occurs with Tracks 4 and 5. The occurrence of Track 7 further leads to the incorrect breakage of Track 5. In contrast, in Figure 12b, the proposed method achieves correct association during tracking and avoids incorrect track switching, indicating that it can utilize the Doppler characteristics to distinguish adjacent targets. Targets at the beam edge have a higher rate of missed detections, which leads to track fragmentation, but the proposed method achieves more stable tracking: the trajectory followed by Track 9 in Figure 12b is broken into Tracks 8 and 9 in Figure 12a, and the trajectory followed by Track 8 in Figure 12b is broken into Tracks 4 and 6 in Figure 12a. This mainly benefits from the adaptive detection probability model of the proposed method. The corresponding cardinality estimations are shown in Figure 13. Because of the frequent missed detections of the two targets at the beam edge, the cardinality estimations are less than five most of the time. However, the proposed method has fewer track switches and more stable tracking, so its cardinality estimation is more accurate than that of the original GM-TPHD.

The tracking performance is analyzed quantitatively through manual verification using the number of track switches (NTS) [35], the cumulative number of track breakages (CNTB) [36], and the success tracking rate (STR) [36]. The CNTB is the number of frames in which a true target is not assigned to a track, averaged over all true targets as defined in Equation (44), where N is the total number of true targets and $CNTB_{n}$ is the count for the n-th target. The STR is defined as the proportion of true trajectories for which the tracking duration exceeds 80% of their total lifespan.

$$CNTB = \frac{1}{N} \sum_{n=1}^{N} CNTB_{n} \tag{44}$$
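The two continuity metrics can be sketched from per-target assignment records, where each true target has one boolean per frame of its lifespan indicating whether it is assigned to a track. Measuring the tracking duration as the number of assigned frames is an assumption, and the function names and toy records are illustrative.

```python
def cntb(assigned):
    """Equation (44): mean number of unassigned frames per true target."""
    return sum(frames.count(False) for frames in assigned) / len(assigned)


def success_tracking_rate(assigned, min_fraction=0.8):
    """Fraction of true targets tracked for more than min_fraction of their lifespan."""
    successes = sum(
        1 for frames in assigned if sum(frames) / len(frames) > min_fraction
    )
    return successes / len(assigned)


# Toy records for two targets observed over 10 frames each.
assigned = [
    [True] * 10,               # tracked in every frame
    [True] * 7 + [False] * 3,  # unassigned in 3 of 10 frames
]
mean_breakage = cntb(assigned)
rate = success_tracking_rate(assigned)
```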

For the TPHD method, the NTS is 2, the CNTB is 14.4, and the STR is 60%. In contrast, for the proposed method, the NTS is 0, the CNTB is 9.6, and the STR is 80%. The targets are missed over long periods, which results in a high number of track breakages for both methods. However, the proposed method demonstrates better tracking continuity and correctness.

5.2.2. Bird Flock in V Formation

In the second scene, the radar detected five birds flying in a V formation. The radar measurements and tracking results are shown in Figure 14. As shown in Figure 14a, the frequent missed detections of the targets at the beam edge cause frequent track breakages for the original GM-TPHD method. After a track breakage, the measurements of the target were associated with adjacent targets, resulting in incorrect track switches. The tracking results of the proposed method are shown in Figure 14b, where no incorrect track switch occurs. Tracks 1, 2, 4, and 5 achieved continuous tracking of four birds, while Track 3 was restarted as Track 6 after a prolonged track breakage. Compared with the original GM-TPHD algorithm, the proposed method improves the stability of target tracking.

The cardinality estimations during the tracking process are shown in Figure 15. Due to frequent missed detections, the cardinality estimations vary during tracking. However, the estimation of the proposed method is more stable than that of the original GM-TPHD method and stays closer to five targets for most of the time.

By manual verification, the NTS of the TPHD method is 4, the CNTB is 4.4, and the STR is 60%. The NTS of the proposed method is 0, the CNTB is 0, and the STR is 100%. These results demonstrate that the proposed method has better tracking performance and is applicable to practical scenarios.

6. Conclusions

This paper presents an adaptive TPHD tracking method using Doppler features for tracking individuals within a bird flock. To improve track–measurement association performance, the DTCN neural network is constructed to learn the temporal evolution of the radar echo and extract a micro-Doppler representation from it. The fusion strategy based on XGBoost adaptively combines Doppler features and kinematic parameters to generate reliable association probabilities, thereby enhancing the tracking performance for closely spaced targets. Meanwhile, adaptive detection probability and adaptive target birth mechanisms are applied in the GM-TPHD filter to achieve stable tracking under incomplete measurements. Performance analysis on the synthetic dataset shows that the Doppler feature and the XGBoost-based fusion strategy enhance the association performance, especially under high levels of measurement noise. Simulation results show that the proposed tracking method achieves better performance than the original GM-TPHD method in terms of localization, missed targets, false targets, and track switches. This is also verified by the experimental data on bird flocks, demonstrating the applicability of the tracking method in practical scenarios. Additionally, the proposed method fundamentally relies on the periodic variations in micro-Doppler signatures, so it can potentially be extended to other flying targets with observable periodic motion patterns, such as different bird species and rotary-wing UAVs.

Author Contributions

Conceptualization, N.N., Z.W. and C.H.; methodology, N.N. and Y.G.; software, N.N.; validation, N.N. and Q.J.; formal analysis, Y.G. and Q.J.; investigation, Z.W. and R.W.; resources, Q.J. and W.L.; data curation, N.N.; writing—original draft preparation, N.N. and Y.G.; writing—review and editing, Q.J. and R.W.; visualization, Z.W. and W.L.; supervision, R.W.; project administration, C.H.; funding acquisition, Q.J. and W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the National Science and Technology Major Project under Grant 2025ZD1301600, the National Natural Science Foundation of China under Grants 62501054 and 62427808, and the China Postdoctoral Science Foundation under Grant 2025M784246.

Data Availability Statement

The data supporting the conclusions of this article are not readily available because the data are part of the ongoing studies funded by the projects mentioned above.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Flack, A.; Nagy, M.; Fiedler, W.; Couzin, I.D.; Wikelski, M. From local collective behavior to global migratory patterns in white storks. Science 2018, 360, 911–914.
2. Shi, X.; Hu, C.; Soderholm, J.; Chapman, J.; Mao, H.; Cui, K.; Ma, Z.; Wu, D.; Fuller, R.A.; Lecours, V.; et al. Prospects for monitoring bird migration along the East Asian-Australasian Flyway using weather radar. Remote Sens. Ecol. Conserv. 2022, 9, 169–181.
3. Metz, I.C.; Ellerbroek, J.; Mühlhausen, T.; Kügler, D.; Hoekstra, J.M. Analysis of Risk-Based Operational Bird Strike Prevention. Aerospace 2021, 8, 32.
4. Jiang, Q.; Wang, R.; Zhang, J.; Zhang, R.; Li, Y.; Hu, C. A Multisubobject Approach to Dynamic Formation Target Tracking Using Random Matrices. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 7334–7351.
5. Zhang, J.; Hu, C.; Wang, R.; Jiang, Q.; Shi, M.; Xu, L.; Tian, W. An Adaptive Multivariate Approach to Dynamic Group Target Tracking Using Variational Inference. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 17584–17606.
6. Jiang, Q.; Wang, R.; Ni, N.; Dou, L.; Hu, C. A Gaussian Mixture PHD Filter for Multitarget Tracking in Target-Dependent False Alarms. IEEE Trans. Aerosp. Electron. Syst. 2024, 60, 4808–4824.
7. Cox, I.J.; Hingorani, S.L. An efficient implementation of Reid’s multiple hypothesis tracking algorithm and its evaluation for the purpose of visual tracking. IEEE Trans. Pattern Anal. Mach. Intell. 1996, 18, 138–150.
8. Fortmann, T.; Bar-Shalom, Y.; Scheffe, M. Sonar tracking of multiple targets using joint probabilistic data association. IEEE J. Ocean. Eng. 1983, 8, 173–184.
9. Mahler, R.P.S. Multitarget Bayes filtering via first-order multitarget moments. IEEE Trans. Aerosp. Electron. Syst. 2004, 39, 1152–1178.
10. Mahler, R. PHD filters of higher order in target number. IEEE Trans. Aerosp. Electron. Syst. 2007, 43, 1523–1543.
11. Zhang, Z.; Sun, J.; Zhou, H.; Xu, C. Group Target Tracking Based on MS-MeMBer Filters. Remote Sens. 2021, 13, 1920.
12. Panta, K.; Clark, D.E.; Vo, B.-N. Data Association and Track Management for the Gaussian Mixture Probability Hypothesis Density Filter. IEEE Trans. Aerosp. Electron. Syst. 2009, 45, 1003–1016.
13. Reuter, S.; Vo, B.-T.; Vo, B.-N.; Dietmayer, K. The Labeled Multi-Bernoulli Filter. IEEE Trans. Signal Process. 2014, 62, 3246–3260.
14. Vo, B.-N.; Vo, B.-T.; Phung, D. Labeled Random Finite Sets and the Bayes Multi-Target Tracking Filter. IEEE Trans. Signal Process. 2014, 62, 6554–6567.
15. Garcia-Fernandez, A.F.; Svensson, L. Trajectory PHD and CPHD Filters. IEEE Trans. Signal Process. 2019, 67, 5702–5714.
16. Garcia-Fernandez, A.F.; Svensson, L.; Morelande, M.R. Multiple Target Tracking Based on Sets of Trajectories. IEEE Trans. Aerosp. Electron. Syst. 2020, 56, 1685–1707.
17. Fang, L.; Tian, W.; Wang, R.; Zhou, C.; Hu, C. Design of Insect Target Tracking Algorithm in Clutter Based on the Multidimensional Feature Fusion Strategy. Remote Sens. 2021, 13, 3744.
18. Yin, Y.; Cheng, D.; Dai, Y.; Chen, C.; Chen, W. Feature-aided GM-PHD Algorithm for Sea-surface Target Tracking. In Proceedings of the 2020 IEEE 5th International Conference on Signal and Image Processing (ICSIP), Nanjing, China, 23–25 October 2020; pp. 220–225.
19. Tao, J.; Jiang, D.; Yang, J.; Zhang, C.; Wang, S.; Han, Y. Multi-Feature Matching GM-PHD Filter for Radar Multi-Target Tracking. Sensors 2022, 22, 5339.
20. Zheng, S.; Jiang, L.; Yang, Q.; Zhao, Y.; Wang, Z. Adaptive PHD Filter With RCS and Doppler Feature for Space Targets Tracking via Space-Based Radar. IEEE Trans. Aerosp. Electron. Syst. 2024, 60, 3750–3765.
21. Wang, J.; Xu, B.; Zhang, Z.; Jin, B. RCS–Doppler-Assisted MM-GM-PHD Filter for Passive Radar in Non-Uniform Clutter. Sensors 2025, 25, 5864.
22. Sadeghian, A.; Alahi, A.; Savarese, S. Tracking the Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 300–311.
23. Brasó, G.; Leal-Taixé, L. Learning a Neural Solver for Multiple Object Tracking. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 6246–6256.
24. Gao, C.; Yan, J.; Chen, B.; Varshney, P.K.; Jia, T.; Liu, H. Data association for maneuvering targets through a combined siamese network and XGBoost model. Signal Process. 2023, 211, 109086.
25. Wang, J.; Li, S.; Shi, K. Radar target tracking based on motion characteristic and distribution pattern matching. Signal Process. 2025, 236, 110034.
26. Chen, X.; Wang, Y.; Zang, C.; Wang, X.; Xiang, Y.; Cui, G. Data-Driven Intelligent Multiframe Joint Tracking Method for Maneuvering Targets in Clutter Environments. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 2679–2702.
27. Chen, V.C. The Micro-Doppler Effect in Radar; Artech House: Norwood, MA, USA, 2011.
28. Farshchian, M.; Selesnick, I.; Parekh, A. Bird body and wing-beat radar Doppler signature separation using sparse optimization. In Proceedings of the 2016 4th International Workshop on Compressed Sensing Theory and Its Applications to Radar, Sonar and Remote Sensing (CoSeRa), Aachen, Germany, 19–22 September 2016; pp. 71–74.
29. Song, Q.; Huang, S.; Zhang, Y.; Chen, X.; Chen, Z.; Zhou, X.; Deng, Z. Radar Target Classification Using Enhanced Doppler Spectrograms with ResNet34_CA in Ubiquitous Radar. Remote Sens. 2024, 16, 2860.
30. Rahman, S.; Robertson, D.A. Radar micro-Doppler signatures of drones and birds at K-band and W-band. Sci. Rep. 2018, 8, 17396.
31. Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005, 18, 602–610.
32. Hu, C.; Yan, Y.; Wang, R.; Jiang, Q.; Cai, J.; Li, W. High-resolution, multi-frequency and full-polarization radar database of small and group targets in clutter environment. Sci. China Inf. Sci. 2023, 66, 227301.
33. Vásárhelyi, G.; Virágh, C.; Somorjai, G.; Tarcai, N.; Szörenyi, T.; Nepusz, T.; Vicsek, T. Outdoor flocking and formation flight with autonomous aerial robots. In Proceedings of the 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Chicago, IL, USA, 14–18 September 2014; pp. 3866–3873.
34. Schuhmacher, D.; Vo, B.-T.; Vo, B.-N. A Consistent Metric for Performance Evaluation of Multi-Object Filters. IEEE Trans. Signal Process. 2008, 56, 3447–3457.
35. Garcia-Fernandez, A.F.; Rahmathullah, A.S.; Svensson, L. A Metric on the Space of Finite Sets of Trajectories for Evaluation of Multi-Target Tracking Algorithms. IEEE Trans. Signal Process. 2020, 68, 3917–3928.
36. Tian, W.; Fang, L.; Wang, R.; Li, W.; Zhou, C.; Hu, C. A robust tracking method focusing on target fluctuation and maneuver characteristics. Sci. China Inf. Sci. 2022, 65, 212302.

Figure 1. The framework of the DTCN.

Figure 2. Experimental scenarios. (a) Monopulse radar. (b) Bird flock.

Figure 3. The sequence of Doppler echo slices for the different kinds of targets. The colorbar shows the logarithmic power in dB. (a) Birds. (b) Insects. (c) Drones.

Figure 4. Runtime of the DTCN feature extraction process for measurements and tracks.

Figure 5. Feature importance comparison.

Figure 6. The tracking results of the different methods for the synthetic data. Solid lines show the estimated tracks, and the first frame of each track is marked with a square. (a) The original GM-TPHD. (b) The proposed method.

Figure 7. The cardinality estimation of the different tracking methods for the synthetic data.

Figure 8. Average OSPA distance.

Figure 9. Average trajectory metric error.

Figure 10. Average error of TM components. (a) Localization cost. (b) False target cost. (c) Missed target cost. (d) Track switching cost.

Figure 11. A bird flock in a line formation captured by the photoelectric pod. The birds enclosed by the yellow dotted line were detected within the radar beam.

Figure 12. The tracking results of the different methods for the birds in the line formation. The trajectories are shown as solid lines; the first frame of each track is marked with a square, and the last frame is labeled with the track ID. (a) The original GM-TPHD. (b) The proposed method.

Figure 13. Cardinality estimations for the birds in the line formation.

Figure 14. The tracking results of the different methods for the birds in the V formation. The trajectories are shown as solid lines; the first frame of each track is marked with a square, and the last frame is labeled with the track ID. (a) The original GM-TPHD. (b) The proposed method.

Figure 15. Cardinality estimations for the birds in the V formation.

Table 1. Accuracy of the network under different track segment lengths L and detection probabilities.

Detection probability	L = 8	L = 10	L = 12	L = 14
1.0	86.14%	87.34%	86.63%	87.94%
0.8	84.73%	86.35%	85.69%	86.22%
0.6	83.59%	84.55%	83.46%	85.01%

Note: The best result for each detection probability is 87.94% (L = 14) at 1.0, 86.35% (L = 10) at 0.8, and 85.01% (L = 14) at 0.6.

Table 2. Comparison of association performance under different noise levels.

Noise level	Method	PCA	PFA	Accuracy
0.5 m	Fixed-threshold	100%	0.46%	99.66%
0.5 m	CSX	99.9%	0.46%	99.64%
0.5 m	X-f2	99.9%	0.21%	99.82%
0.5 m	X-f4	99.9%	0.18%	99.84%
0.8 m	Fixed-threshold	95.08%	1.88%	97.48%
0.8 m	CSX	95.98%	1.68%	97.71%
0.8 m	X-f2	97.19%	0.94%	98.57%
0.8 m	X-f4	97.59%	0.39%	99.09%
1.0 m	Fixed-threshold	85.06%	3.11%	93.78%
1.0 m	CSX	87.21%	2.63%	94.69%
1.0 m	X-f2	91.50%	1.94%	96.34%
1.0 m	X-f4	92.53%	1.35%	97.04%

Note: X-f4 gives the highest PCA, the lowest PFA, and the highest accuracy at every noise level, except for PCA at 0.5 m, where the fixed-threshold method reaches 100%.
