Article

A Group Target Tracking Method for Unmanned Ground Vehicles Based on Multi-Ellipse Shape Modeling

College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410028, China
* Authors to whom correspondence should be addressed.
Drones 2025, 9(9), 620; https://doi.org/10.3390/drones9090620
Submission received: 18 July 2025 / Revised: 28 August 2025 / Accepted: 2 September 2025 / Published: 3 September 2025


Highlights

What are the main findings?
  • Enhanced tracking accuracy. The proposed method achieved a remarkable reduction in orientation error, decreasing it by 86.13% compared to single-target tracking and by 54.79% relative to the shapeless modeling method.
  • Robustness in complex scenarios. The algorithm maintained stable performance under severe occlusions, environmental disturbances, and dynamic changes in member composition, outperforming Gaussian process-based and single-ellipse methods.
What are the implications of the main findings?
  • Paradigm shift to group tracking. The approach shifts the operating paradigm from fragile single-target following to stable group target tracking, greatly reducing mission-interrupting target losses and improving patrol and rescue efficiency.
  • Flexible formation accommodation. By modeling the squad with multiple ellipses, the framework can accurately describe arbitrary and changing formations, giving squads more spatial flexibility.

Abstract

For unmanned ground vehicles in squad mission support systems (SMSS-UGVs), tracking the entire squad as a group, rather than focusing on individual members, can effectively mitigate issues such as target loss caused by occlusion and environmental interference. However, most existing group target tracking methods are designed for extended targets, which typically assume a rigid and unchanging shape. In contrast, pedestrian groups in SMSS-UGV scenarios exhibit inconsistent motions among members, resulting in continuous changes in the overall group shape. To address this challenge, this paper proposes a group target tracking method specifically tailored for SMSS-UGVs in pedestrian tracking scenarios. We introduce a tracking framework that incorporates a data selection mechanism based solely on positional information, enabling robust handling of dynamic group composition through adaptive shape modeling. Furthermore, a novel group target tracking method based on multi-ellipse shape modeling (ME-CGT-UGV) is presented, which effectively captures complex and evolving group formations. The experimental results show that the proposed method reduces orientation error by 86.13% compared to single-target tracking and by 54.79% compared to shapeless modeling methods. It also maintains strong performance under challenging conditions, including occlusions, environmental disturbances, sharp turns, and formation changes. These findings indicate that the proposed approach significantly enhances the effectiveness and operational reliability of SMSS-UGVs in real-world applications.

1. Introduction

Unmanned ground vehicles (UGVs) play an increasingly vital role in civilian safety and emergency operations, including urban patrols, disaster relief, and event security. They serve as transport platforms for medical and communication equipment, alleviating the human operational burden and enhancing human–robot collaboration. Equipped with autonomous capabilities—such as pedestrian following, path tracking, and scouting—UGVs improve both patrol efficiency and personnel safety while maintaining situational awareness. Notably, the pedestrian-following function uses onboard cameras to track a specific team member under predefined criteria, offering reliable support in dynamic settings [1].
In patrol or rescue contexts, a squad typically operates as a cohesive unit with common goals, exhibiting regularities in motion and spatial structure. However, current pedestrian-following systems focus primarily on single-target tracking [2,3,4]. When a UGV tracks one individual within a visually similar group, it encounters difficulties arising from non-rigid formations, varying member spacing, frequent occlusion, intersecting paths, and similar appearances [5,6]. These challenges often result in target switching or loss, undermining following continuity and mission effectiveness. It is therefore essential to develop UGVs that track the entire squad as a unified entity, improving tracking stability, reducing reliance on any single target, and facilitating team coordination during operations.
In complex group tracking scenarios where targets are indistinguishable due to proximity, the centroid group tracking (CGT) algorithm is commonly used to represent the group by its centroid as a single entity [7,8,9]. Chen et al. [10] demonstrated the advantages of CGT in dense and cluttered environments, such as bird detection near airports. Zhang et al. [11] further improved CGT by integrating clustering techniques to compute the group center under measurement noise, enhancing tracking accuracy. Despite these advances, such methods focus mainly on motion state estimation and neglect the explicit modeling of group shape.
To address shape representation, Koch et al. [12] proposed the random matrix model, which describes the target shape as an ellipse embedded directly in the state and measurement equations, enabling simultaneous tracking of kinematics and extent. Subsequent developments include the GGIW-PHD filter [13,14,15,16,17,18,19] and improvements in adaptability to variations in target number, measurement uncertainty, and shape evolution. However, a single ellipse often fails to capture complex structures. Alternative shape modeling approaches include multi-ellipse random matrix models [20,21,22,23], Gaussian process-based shape estimation [24,25,26,27], and random hypersurface models [28,29,30]. Although Gaussian processes and hypersurface models offer flexibility in representing arbitrary shapes, they rely on radial function formulations that are computationally expensive and highly sensitive to shape variations.
Existing studies [31,32,33,34,35,36,37,38,39,40] have primarily focused on extended targets—where multiple detections originate from a single continuous object with a relatively stable shape. When applied to SMSS-UGVs, methods designed for extended targets face new challenges. Dynamic membership changes (e.g., individuals entering or leaving the group) frequently alter the group’s motion pattern and spatial structure, necessitating continuous adaptation of the tracking strategy and leading to instability. Furthermore, although inter-member occlusions are less critical in single-target tracking, severe occlusions can significantly degrade the group observation quality, resulting in inaccurate motion estimates, erratic trajectory updates, and ultimately impaired tracking robustness.
To address the challenges posed by severe occlusions, dynamic changes in member composition, and complex formations of squads, this paper presents a group target tracking method designed for SMSS-UGVs:
(1) To manage the dynamic changes in member composition, we propose a group target tracking framework based on a data selection mechanism that relies solely on positional information. This approach filters measurements via shape modeling, minimizing disturbance from non-group individuals.
(2) To handle the dynamic changes in collective shape and the complex formation geometry encountered by SMSS-UGVs, we propose a multi-ellipse combined shape modeling approach that effectively captures the intricate spatial distribution of the group, enabling precise modeling.
(3) We conduct experiments in simulated and real-world environments to validate the method's effectiveness. The results show that the proposed ME-CGT-UGV reduces orientation error by 86.13% compared to single-target tracking and by 54.79% compared to the shapeless modeling method.

2. Materials and Methods

This section first outlines the task of tracking a squad for SMSS-UGVs, and then introduces a group target tracking framework that incorporates a data selection mechanism based solely on positional information.

2.1. Problem Definition

In the field of group target tracking, researchers typically focus on simultaneously tracking multiple groups to meet the diverse needs of various application scenarios. However, for the specific application scenario of the squad mission support system, the task characteristics dictate that the SMSS-UGV needs to concentrate only on following the movement of a single group. Therefore, this study focuses on the tracking task of a single group to align with the practical application requirements of the squad.
In this task, the only available information consists of the positional data of the observed targets. Some studies employ sensors that capture not only position but also attributes such as orientation and velocity to enhance measurement accuracy, and others succeed with even less information; for instance, Zhang et al. [33] achieved tracking using angular measurements alone. Such richer measurements are not available in our setting: in the context of SMSS-UGVs, sensors such as LiDAR and cameras typically provide only positional data. Moreover, due to the dense spatial distribution of pedestrians and frequent mutual occlusion, reliably estimating motion information across consecutive frames is particularly difficult, and maintaining stable tracking under sparse or noisy observations is considerably challenging. As a result, this paper relies exclusively on positional information.
For the observations $Z = \{ z_t^1, z_t^2, \ldots, z_t^n \}$ at time $t$, the measurement models of single-group tracking, multi-group tracking, and the group target tracking task proposed in this paper for SMSS-UGVs can be represented as follows:

$$Z_s = \{ z_t^i \in Z \}, \tag{1}$$

$$Z_m = \{ z_t^i \in Z \mid l(z_t^i) = G_j \}, \tag{2}$$

$$Z_k = \{ z_t^i \in Z \mid h(z_t^i) = 1 \}, \tag{3}$$

where $l$ and $h$ represent the mapping functions for multi-group target tracking and for the current task, respectively. In single-group target tracking, every observation $z_t^i$ is linked to a single target group, enabling effective tracking by integrating these observations. In contrast, multi-group target tracking requires each observation to be assigned to the appropriate tracked group $G_j$, making the task more complex due to inter-group ambiguity. In the SMSS-UGV application scenario, only one group needs to be continuously tracked. However, due to the dynamic nature of group membership, with individuals leaving or joining, the composition of the group is not fixed. Therefore, the selection function $h(z_t^i)$ is necessary to maintain the membership of the group dynamically, ensuring more stable and accurate tracking.

2.2. Framework

The group target tracking framework for SMSS-UGVs proposed in this paper is shown in Figure 1. The framework mainly includes the processes of initialization, data selection, state prediction, and extended shape modeling, among which initialization only takes effect at the initial stage. In contrast, in multi-group target tracking, the combination and splitting of groups also need to be considered, as shown in the red area in Figure 1.

2.2.1. Initialization

At the initial moment, multiple pedestrians are detected at various locations within the vehicle-mounted sensor’s range. To group these individuals and establish an initial tracking group, a clustering algorithm is employed. Although the K-means algorithm [41] is commonly employed in such scenarios, its requirement of a predefined number of clusters presents significant challenges in this context. In contrast, the DBSCAN algorithm [42], which is a density-based clustering method, offers a robust solution by identifying high-density regions through the definition of “core points” and “neighborhoods” without requiring predefined cluster counts. Specifically, the DBSCAN algorithm determines core points by evaluating the neighborhood of each point, forming clusters through the connection of these core points, and thereby effectively determining the number of clusters automatically. Furthermore, its robustness against noisy data makes the DBSCAN algorithm particularly well-suited for processing pedestrian location information captured by vehicle-mounted sensors. Consequently, this paper adopts the DBSCAN algorithm to effectively cluster dispersed pedestrians at the initial moment and determine the initial tracking group.
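As a concrete illustration, the sketch below shows how this initialization step could be realized with scikit-learn's DBSCAN; the eps and min_samples values and the function name are illustrative assumptions, not the parameters used in the paper.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def initialize_group(positions: np.ndarray, eps: float = 1.5, min_samples: int = 3) -> np.ndarray:
    """Cluster pedestrian positions (N x 2 array) and return the largest
    cluster as the initial tracking group; DBSCAN needs no preset cluster
    count and labels noise points as -1."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(positions)
    valid = labels[labels != -1]             # discard noise detections
    if valid.size == 0:
        raise ValueError("No cluster found; all detections were noise")
    largest = np.bincount(valid).argmax()    # pick the most populated cluster
    return positions[labels == largest]
```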

2.2.2. Data Selection

Data selection is crucial for addressing disturbance and the dynamic changes in the member composition of the group. In this study, we utilize positional data as our exclusive information source for selection. Our methodology involves predicting the extended shape of the pedestrian group through an iterative process that includes shape modeling (as detailed in Section 3), filtering measurements based on this shape, and subsequently refining the shape further. The detailed selection process is as follows.
First, we use the odometry information to transform the pedestrian positions from the vehicle frame into a unified coordinate system. For the pedestrian position $\tilde{z}_k^i$ observed in each frame, the corresponding measurement $z_k^i$ in the unified frame is obtained through Equation (4):

$$z_k^i = R_{\mathrm{rotate}}\, \tilde{z}_k^i + T, \tag{4}$$

where $R_{\mathrm{rotate}}$ represents the rotation matrix and $T$ represents the translation vector.
According to the spatial coherence of the group, pedestrians belonging to the group should lie within or near the predicted spatial extent of the group in each new frame. Let $(x_k^c, y_k^c)$ denote the predicted center of the group. For each measurement $z_k^i = (x_k^i, y_k^i)$, its distance to this predicted center is computed as

$$d_k^i = \sqrt{(x_k^i - x_k^c)^2 + (y_k^i - y_k^c)^2}. \tag{5}$$
A measurement is considered a member of the group if it lies within the predicted group boundary, i.e., if $d_k^i < D_{\mathrm{th}}$, where $D_{\mathrm{th}}$ is a threshold determining the spatial boundary of the predicted group shape. Each included measurement is assigned a weight $f(z_k^i)$ by a weighting function $g(d_k^i)$, which may depend on the point's distance from the center. In this work, we use a uniform weighting scheme for all points within the threshold, although other schemes are also applicable. The weighting function is formally defined as

$$f(z_k^i) = \begin{cases} g(d_k^i) & \text{if } d_k^i < D_{\mathrm{th}}, \\ 0 & \text{otherwise}. \end{cases} \tag{6}$$
The updated group center $(x_k^c, y_k^c)$ is then computed as the weighted centroid of all associated measurements:

$$x_k^c = \frac{\sum_{i=1}^{n_k} f(z_k^i)\, x_k^i}{\sum_{i=1}^{n_k} f(z_k^i)}, \qquad y_k^c = \frac{\sum_{i=1}^{n_k} f(z_k^i)\, y_k^i}{\sum_{i=1}^{n_k} f(z_k^i)}, \tag{7}$$

where $n_k$ is the total number of measurements. If $\sum_{i=1}^{n_k} f(z_k^i) = 0$, no measurements are associated with the group, indicating that the group target is lost.
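A minimal sketch of this selection step, assuming the uniform weighting scheme above ($g(d) = 1$ inside the threshold); the function and variable names are ours, not the paper's:

```python
import numpy as np

def select_and_update_center(measurements, center_pred, d_th):
    """Gate measurements against the predicted group extent (Eqs. (5)-(6))
    and return the selected points plus the updated centroid (Eq. (7))."""
    d = np.linalg.norm(measurements - center_pred, axis=1)  # Eq. (5)
    inside = d < d_th                                       # Eq. (6) with uniform g(d) = 1
    if not inside.any():
        return None, None   # sum of weights is zero: group target lost
    selected = measurements[inside]
    center = selected.mean(axis=0)                          # Eq. (7) with uniform weights
    return selected, center
```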

2.2.3. State Prediction

The state prediction mainly utilizes the current observation data to update the motion state and the extended shape of the group target. For the motion state of the group center, we describe it using position and velocity, representing the state vector as $x_k = [x_k^c, y_k^c, \dot{x}_k^c, \dot{y}_k^c]$. We employ the Kalman filter for state estimation. The Kalman filter is a classic method in the field of target tracking, which uses the minimum mean square error as the optimal estimation criterion. Based on the system's state-space model, it predicts the target's possible position at the current moment using only the estimated state from the previous moment and the current observation. The Kalman filter is characterized by its unbiased prediction and stability, with a small computational load, thus meeting the real-time requirements of target tracking. The extended shape modeling approach is presented in the next section.
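For concreteness, a constant-velocity Kalman filter over this state might look like the sketch below; the sampling period and noise covariances are illustrative assumptions, not the paper's settings.

```python
import numpy as np

dt = 0.1                                   # assumed sensor period (s)
F = np.array([[1, 0, dt, 0],               # constant-velocity transition
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],                # only the position is observed
              [0, 1, 0, 0]], dtype=float)
Q = 0.01 * np.eye(4)                       # illustrative process noise
R = 0.05 * np.eye(2)                       # illustrative measurement noise

def kf_step(mu, P, z):
    """One predict/update cycle for the group-center state [x, y, vx, vy]."""
    mu_pred = F @ mu                       # predict from the previous estimate
    P_pred = F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    mu_new = mu_pred + K @ (z - H @ mu_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return mu_new, P_new
```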

3. Extended Shape Modeling

The extended shape estimation of group targets plays a crucial role in defining their characteristics as it directly reflects the geometric configuration of the group in space and the area it occupies [8]. In the squad mission support system, the substantial occlusions among pedestrians can lead to significant fluctuations during the tracking process if reliant solely on measurement values. Building upon the aforementioned tracking framework, we propose a group target tracking algorithm that employs multiple ellipses to model the spatial distribution of pedestrian movements. By incorporating shape modeling, the algorithm effectively captures the group’s morphological characteristics, thereby enhancing tracking stability.

3.1. Extended Shape Modeling Based on Random Matrix

In shape modeling, Koch et al. [12] modeled the shape of the target as an ellipse, characterized by a symmetric positive-definite random matrix $X_k$ that represents the size and orientation of the ellipse. Specifically, the eigenvalues and eigenvectors of $X_k$ correspond to the semi-axis lengths and directions of the ellipse, respectively.
By introducing shape modeling, the state of the target is jointly represented by its motion state $x_k$ and shape matrix $X_k$. Under the Bayesian framework, assuming that the measurements are affected by the target extent and by sensor errors, the distribution of the measurement values is assumed to be

$$p(z_k^i \mid x_k, X_k) = \mathcal{N}(z_k^i \mid H_k x_k, X_k + R), \tag{8}$$

where $H_k$ is the measurement matrix, $R$ is the covariance matrix of the sensor errors, and $\mathcal{N}$ denotes the Gaussian distribution.
Under the Bayesian framework, the prior distributions of the motion state $x_k$ and the shape matrix $X_k$ are modeled as Gaussian and inverse Wishart ($\mathcal{IW}$) distributions, respectively. Specifically, the motion state $x_k$ follows a Gaussian distribution:

$$p(x_k) = \mathcal{N}(x_k \mid \mu_{k|k-1}, P_{k|k-1}), \tag{9}$$

where $\mu_{k|k-1}$ is the prior mean of the motion state, and $P_{k|k-1}$ is the prior covariance matrix.
The shape matrix $X_k$ follows an inverse Wishart distribution:

$$p(X_k) = \mathcal{IW}(X_k \mid V_{k|k-1}, v_{k|k-1}), \tag{10}$$

where $V_{k|k-1}$ is the scale matrix of the inverse Wishart distribution, and $v_{k|k-1}$ is the degrees-of-freedom parameter.
In the Bayesian update step, the measurement data $Z_k$ is combined with the prior to yield the posterior distribution $p(x_k, X_k \mid Z_k)$:

$$p(x_k, X_k \mid Z_k) \propto p(Z_k \mid x_k, X_k)\, p(x_k, X_k), \tag{11}$$

where $p(Z_k \mid x_k, X_k)$ is the likelihood function, representing the probability of observing the measurements $Z_k$ given the target state $(x_k, X_k)$, and $p(x_k, X_k)$ is the prior distribution, representing the knowledge of the target state before the new measurement data is received.
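To give a feel for how this update accumulates shape evidence, the sketch below folds the measurement scatter into the inverse-Wishart scale matrix and grows the degrees of freedom with the number of measurements. This is a deliberately simplified illustration: the full random matrix update [12] also accounts for sensor noise and centroid uncertainty, which are omitted here.

```python
import numpy as np

def iw_shape_update(V_prior, v_prior, Z):
    """Simplified inverse-Wishart measurement update in the spirit of the
    random matrix model: the spread of the measurements Z (n x 2) around
    their centroid strengthens the shape estimate (Eqs. (10)-(11))."""
    n = Z.shape[0]
    z_bar = Z.mean(axis=0)
    scatter = (Z - z_bar).T @ (Z - z_bar)  # measurement spread around centroid
    return V_prior + scatter, v_prior + n  # more data -> higher confidence
```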

3.2. Multi-Ellipse Shape Modeling

In describing complex shapes or formations, a single-ellipse model often falls short in its expressive capability. This limitation is particularly evident in scenarios such as drone or crowd formations, where precise descriptions are essential. Inspired by [34], this section explores the use of a combination of multiple ellipses to more accurately characterize the shape of complex formations. It is assumed that the target consists of $L$ sub-targets, the $\ell$-th of which is represented by a symmetric positive-definite matrix $X_k^{\ell}$. Each measurement $z_k^i$ originates from one of the sub-targets, and Equation (8) is extended to

$$p(z_k^i \mid x_k, X_k^{1:L}, \pi_k^{1:L}) = \sum_{\ell=1}^{L} \pi_k^{\ell}\, \mathcal{N}(z_k^i \mid H_{\ell} x_k, X_k^{\ell} + R), \tag{12}$$

where $\pi_k^{\ell}$ represents the mixture weight of the $\ell$-th sub-object, satisfying $\pi_k^{\ell} \geq 0$ and $\sum_{\ell=1}^{L} \pi_k^{\ell} = 1$. The mixture weights are usually modeled with a Dirichlet distribution, a distribution over probability vectors.
For the $L$ sub-targets, the motion states are shared, but their relative positional relationships need to be dynamically updated. Therefore, the state of the target is expressed as

$$x_k = [x_k^c, y_k^c, \dot{x}_k^c, \dot{y}_k^c, \eta_k^2, \ldots, \eta_k^L], \tag{13}$$

where $\eta_k^{\ell}$ is the offset of the center of the $\ell$-th sub-target relative to the group center.
In multi-ellipse extended shape modeling, the measurement model involves associating each measurement with one of the sub-targets that comprise the extended target. This association is achieved using latent variables and responsibility vectors, which are components of the variational Bayesian inference used to estimate the target state. The measurement value $z_k^i$ is modeled as the sum of a noise-free latent variable $b_k^i$ and Gaussian white noise:

$$z_k^i = b_k^i + \omega_k^i, \qquad \omega_k^i \sim \mathcal{N}(0, R). \tag{14}$$

By introducing latent variables, measurement noise can be separated from the actual measurements, allowing the target state to be estimated more accurately. To determine the association between the measurements and the sub-targets, a responsibility vector $r_k^i$ is defined for each measurement:

$$r_k^{i,\ell} = \begin{cases} 1 & \text{if } z_k^i \text{ is associated with sub-object } \ell, \\ 0 & \text{otherwise}, \end{cases} \tag{15}$$

where the elements of $r_k^i$ satisfy $\sum_{\ell=1}^{L} r_k^{i,\ell} = 1$.
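In the variational treatment, the hard 0/1 assignment in Equation (15) is replaced by its expectation: a soft responsibility proportional to each component's weighted likelihood under the mixture of Equation (12). A minimal sketch of that step, with function and variable names of our own choosing:

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(z, centers, shapes, pi, R):
    """Soft assignment of one measurement z to the L sub-ellipses:
    gamma_l is proportional to pi_l * N(z; center_l, X_l + R),
    normalized so the responsibilities sum to 1."""
    gamma = np.array([
        pi_l * multivariate_normal.pdf(z, mean=c_l, cov=X_l + R)
        for pi_l, c_l, X_l in zip(pi, centers, shapes)
    ])
    return gamma / gamma.sum()
```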
Since direct calculation of the posterior distribution is relatively difficult, the variational Bayesian method is used to approximate it. Specifically, by introducing the latent variables and responsibility vectors, the complex posterior distribution is decomposed into several simple distributions:

$$p(b_k^{1:n_k}, r_k^{1:n_k}, x_k, X_k^{1:L}, \pi_k^{1:L} \mid Z_k) \approx q_b(b_k^{1:n_k})\, q_r(r_k^{1:n_k})\, q_x(x_k)\, q_X(X_k^{1:L})\, q_{\pi}(\pi_k^{1:L}), \tag{16}$$

where

$$q_b(b_k^{1:n_k}) = \prod_{i=1}^{n_k} \mathcal{N}(b_k^i;\, \hat{b}_{k|k}^i, P_{k|k}^{b,i}), \tag{17}$$

$$q_r(r_k^{1:n_k}) = \prod_{i=1}^{n_k} \prod_{\ell=1}^{L} \left( \gamma_{k|k}^{i,\ell} \right)^{r_k^{i,\ell}}, \tag{18}$$

$$q_x(x_k) = \mathcal{N}(x_k;\, \mu_{k|k}, P_{k|k}), \tag{19}$$

$$q_X(X_k^{1:L}) = \prod_{\ell=1}^{L} \mathcal{IW}(X_k^{\ell};\, \nu_{k|k}^{\ell}, V_{k|k}^{\ell}), \tag{20}$$

$$q_{\pi}(\pi_k^{1:L}) = \mathcal{D}(\pi_k^{1:L};\, \alpha_{k|k}^{1:L}), \tag{21}$$

where $\mathcal{D}(\pi^{1:L}; \alpha^{1:L})$ represents the Dirichlet distribution with parameters $\alpha$, and $\gamma$ is the parameter of the categorical distribution in Equation (18). Equations (17)–(21) are calculated iteratively.
Once the updated factors above are obtained, the posterior $p(x_k, X_k^{1:L}, \pi_k^{1:L} \mid Z_k)$ is approximated as

$$p(x_k, X_k^{1:L}, \pi_k^{1:L} \mid Z_k) \approx q_x(x_k)\, q_X(X_k^{1:L})\, q_{\pi}(\pi_k^{1:L}). \tag{22}$$

Beyond updating the shape model, the motion state $x_k$ is updated through the Kalman filter:

$$\mu_{k|k} = \mu_{k|k-1} + K_k \left( z_k - H_k \mu_{k|k-1} \right), \qquad P_{k|k} = \left( I - K_k H_k \right) P_{k|k-1}, \tag{23}$$

where $K_k$ denotes the Kalman gain and $I$ is the identity matrix. The shape matrices $X_k^{\ell}$ are updated via the Bayesian method.
In the temporal dimension, it is assumed that the target shape does not undergo significant changes over time. The propagation formula for the shape matrix parameters is therefore

$$V_{k+1|k} = \lambda_{\mathcal{IW}}\, V_{k|k}, \qquad v_{k+1|k} = \lambda_{\mathcal{IW}}\, v_{k|k}, \tag{24}$$

where $\lambda_{\mathcal{IW}}$ is the forgetting factor. The Dirichlet distribution parameter $\alpha$ is propagated analogously:

$$\alpha_{k+1|k} = \lambda_{\mathcal{D}}\, \alpha_{k|k}. \tag{25}$$
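A compact sketch of this time update; the forgetting-factor values are illustrative assumptions rather than the paper's settings:

```python
LAMBDA_IW = 0.98  # assumed shape forgetting factor
LAMBDA_D = 0.95   # assumed mixture-weight forgetting factor

def propagate_params(V, v, alpha):
    """Time update of the inverse-Wishart scale/dof (Eq. (24)) and the
    Dirichlet concentration (Eq. (25)): old evidence is discounted so the
    estimated shape and weights can slowly adapt to formation changes."""
    return LAMBDA_IW * V, LAMBDA_IW * v, LAMBDA_D * alpha
```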

4. Simulation Experiments

4.1. Experimental Setup

In this section, we use simulated data to evaluate the effectiveness of the proposed algorithm. As the main contribution of this work lies in the tracking and decision-making methodology, we assume ideal perception inputs at this stage. Specifically, pedestrian positions are artificially generated. The performance of the system under real-world perception noise and uncertainty will be assessed in later real-world experiments.
The simulation is designed to replicate a typical civilian patrol scenario, wherein a UGV follows a group of pedestrians moving ahead. Such a configuration leverages the vehicle’s capacity for load carriage and area protection, thereby reducing direct exposure of human personnel to dangerous environments. In accordance with typical squad sizes in real patrol and rescue operations, the number of simulated pedestrians is set to vary between seven and ten.
To further increase complexity, various disturbance factors are incorporated, reflecting unpredictable elements common in real missions. The motion trajectories of the pedestrians are illustrated in Figure 2, where solid lines represent the paths of core group members, and dashed lines denote dynamic events such as members joining, departing, or introducing intentional disturbances. The trajectories encompass multiple motion patterns—including straight-line movement and turning maneuvers.

4.2. Evaluation Metrics

To comprehensively evaluate the performance of group target tracking, this paper selects three evaluation metrics: the root mean square error (RMSE), the orientation error (OE), and the standard deviation of the lateral distance (Lat-STD). These metrics quantify the tracking performance from different perspectives, including prediction accuracy and stability, and can be formulated as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left\| z_k^c - z_k^{gt} \right\|^2}, \tag{26}$$

$$\mathrm{OE} = \frac{1}{N} \sum_{k=1}^{N} \left| \theta_k^c - \theta_k^{gt} \right|, \tag{27}$$

$$\ell_k = \frac{(z_k^c - z_k^{gt}) \times v_k}{\left\| v_k \right\|}, \qquad \text{Lat-STD} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} (\ell_k - \mu)^2}, \tag{28}$$

where $z_k^{gt}$ and $\theta_k^{gt}$ represent the true values of position and orientation, $\ell_k$ denotes the lateral distance, $\mu$ represents the mean lateral distance, and $v_k$ is the velocity vector.
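Assuming per-frame arrays of estimates and ground truth, the three metrics could be computed as in the sketch below; the array shapes and names are our assumptions:

```python
import numpy as np

def tracking_metrics(centers, gts, thetas, theta_gts, velocities):
    """RMSE, OE, and Lat-STD per Eqs. (26)-(28). centers/gts/velocities are
    (N, 2) arrays; thetas/theta_gts are per-frame orientations in degrees,
    assumed already wrapped to a common range."""
    err = centers - gts
    rmse = np.sqrt((np.linalg.norm(err, axis=1) ** 2).mean())   # Eq. (26)
    oe = np.abs(thetas - theta_gts).mean()                      # Eq. (27)
    speed = np.linalg.norm(velocities, axis=1)
    # 2D cross product of the position error with the unit velocity vector
    lat = (err[:, 0] * velocities[:, 1] - err[:, 1] * velocities[:, 0]) / speed
    lat_std = np.sqrt(((lat - lat.mean()) ** 2).mean())         # Eq. (28)
    return rmse, oe, lat_std
```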

4.3. Comparison Experiments with Tracking Methods

To verify the effectiveness of the proposed tracking framework, comparative experiments are conducted in this paper. In the experiments, we considered several methods, including single-object tracking (SOT), multi-object tracking (MOT), and mean position tracking (CGT-Mean). SOT focuses on tracking a single fixed pedestrian, while MOT tracks all pedestrians within the group and determines the tracking position through decision-level fusion. The proposed method in this paper, termed ME-CGT-UGV, does not track individual targets but instead directly estimates the group center based on single-frame detection results, thereby capturing the collective motion pattern of the squad.
The results from the straight movement scenario (Figure 2a) are summarized in Table 1. The proposed ME-CGT-UGV method outperforms both SOT and MOT, achieving a significant reduction in orientation error by 86.13% compared to SOT. While MOT can achieve detailed motion states for each individual, including position and velocity information, thereby facilitating a thorough evaluation of the group’s motion, ME-CGT-UGV relies solely on predictions derived from single-frame observation results. Nevertheless, the integration of shape modeling in ME-CGT-UGV significantly enhances its tracking performance, effectively mitigating the constraints associated with single-frame observations and ultimately yielding superior results compared to MOT.
Figure 3 compares Lat-STD and OE across various tracking methods. CGT-Mean exhibits the highest Lat-STD and the most significant OE, indicating instability in forward-following scenarios where the tracking center direction is inconsistent and prone to large fluctuations. Such instability can exacerbate occlusions, creating a detrimental feedback loop. In contrast, the proposed ME-CGT-UGV method demonstrates superior performance by effectively leveraging shape modeling, achieving more stable and accurate tracking results with significantly reduced Lat-STD and OE.

4.4. Comparison Experiments with Shape Modeling Methods

In addition, we compare the algorithm presented in this paper with tracking algorithms that utilize various shape modeling techniques. To ensure consistency, we incorporate the data selection mechanism proposed herein into all compared methods. Among these, CGT-UGV refers to the approach without shape modeling, GP-CGT-UGV and MGP-CGT-UGV achieve shape modeling through a single Gaussian process and multiple Gaussian processes, respectively, and RM-CGT-UGV adopts the single-ellipse random matrix model.
The experimental results (conducted in the scenario as shown in Figure 2a) are summarized in Table 2. Our proposed algorithm achieves a reduction in orientation error of 54.79% and a decrease in Lat-STD of 50.7% in comparison to the CGT-UGV method. These findings suggest that our algorithm significantly outperforms existing methodologies.

4.5. Robustness Experiments

To evaluate the robustness of the proposed algorithm, we conducted a series of experiments under various challenging conditions, including occlusion, external disturbance, and scenarios with dynamic or complex formations, such as distributed queues with pedestrians positioned on both sides of the UGV.

4.5.1. Disturbance Experiments

To further explore the algorithm’s performance under conditions of disturbance, non-group individuals were introduced to enhance disturbance levels, thereby emulating the complex factors typically encountered in challenging scenarios. As illustrated in Table 3, an increase in the number of disturbance factors led to a significant decline in the performance of the CGT-UGV, with its RMSE rising from 1.226 m to 3.082 m and the orientation error reaching 10°. Although the RM-CGT-UGV and ME-CGT-UGV methods, which integrate shape modeling, also experienced performance degradation, the decline was notably less severe compared to the CGT-UGV, particularly with respect to orientation error.

4.5.2. Occlusion Experiments

For the occlusion scenarios, interactions between individuals were examined, with a specific criterion established to define occlusion: an individual was considered occluded if other people were positioned between them and the observing vehicle. Moreover, different angles were implemented to simulate varying degrees of occlusion.
In the occlusion experiments, the degree of occlusion was described using the occlusion range and occlusion rate. The occlusion range was a parameter set before the experiment, while the occlusion rate was calculated based on the tracking results. Even with an occlusion rate of 53%, where over half the pedestrians are occluded, the proposed ME-CGT-UGV method maintains low OE and Lat-STD, demonstrating its robustness and highlighting the advantage of integrating shape modeling with multi-ellipse representation to overcome severe occlusions.

4.5.3. Formation-Changing Experiments

This section conducts experimental investigations using complex scenarios, such as formation changing (Figure 2b,c) and turning process (Figure 2d), to evaluate the predicted shapes. For CGT-UGV, the method only considers the prediction of the motion state and does not perform explicit shape modeling. Therefore, this paper describes the shape of the group using an elliptical envelope to maintain consistency with the shape modeling method.
As illustrated in Figure 4, our three sets of experiments reveal that the RM-CGT-UGV and ME-CGT-UGV models exhibit markedly smoother behavior in both area and directional changes. The CGT-UGV model, which is fitted directly to the observations, suffers from significant fluctuations because occlusions cause the observations to vary over time. The GP-CGT-UGV and MGP-CGT-UGV models, represented by radial functions, allow a more nuanced depiction of the group outline but are also more susceptible to observation changes, resulting in unwanted shape fluctuations. In contrast, the ellipse-based RM-CGT-UGV and ME-CGT-UGV models focus on capturing overall distribution trends. Their insensitivity to local disturbances at edge points offers a compelling advantage, contributing to their robust stability.
When considering angles, we define the main direction as the one associated with the largest eigenvalue. Notably, in the formation expansion phase, the ME-CGT-UGV model displays some initial variability but quickly reaches a stable state. Meanwhile, the ellipse direction remains adaptive, accurately reflecting changes throughout the turning process. This capability underscores the advantage of utilizing ellipse-based models for consistent performance in dynamic scenarios.

4.5.4. Complex-Formation Experiments

In practical applications, pedestrians may not be entirely concentrated together but are dispersed around the SMSS-UGV, forming different formation shapes according to mission requirements. Therefore, this section mainly analyzes the tracking effects under different formation scenarios. Figure 5 shows the visualization results of group target tracking under double-column and U-shaped formations.
The CGT-UGV, RM-CGT-UGV, and GP-CGT-UGV methods consolidate all observational data into a single shape, which may lead to a loss of detailed information when dealing with complex formations. In contrast, the MGP-CGT-UGV approach utilizes multiple shapes to capture intricate details effectively. Additionally, both the ME-CGT-UGV and MGP-CGT-UGV methods employ multiple components to depict the distribution of pedestrians across various formation shapes more accurately. Furthermore, ME-CGT-UGV describes a more stable shape, providing more precise references for SMSS-UGV.

4.6. Sensitivity Analysis

To further investigate the impact of the number of ellipses on tracking performance, this paper conducted an analysis using a double-column formation as an example. In the proposed method, the number of ellipses is set at the initialization stage and remains constant during the subsequent update process. However, the weight of each ellipse component π k is continuously updated, and the sum of all weights always remains 1. To improve the efficiency and accuracy of the model, this paper adopts a weight threshold strategy: in the initial stage, ellipse components with weights less than 1% are ignored; in the subsequent stages, this threshold is increased to 10%. As shown in Figure 6, regardless of the initial number of ellipses, the number of ellipse components eventually converges to 2. This result indicates that, during the update process, the model can effectively capture the distribution characteristics of the data and achieve more precise fitting.
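A small sketch of this weight-threshold strategy; the function and variable names are ours, and the thresholds follow the 1%/10% settings described above:

```python
import numpy as np

def prune_components(pi, components, threshold):
    """Drop ellipse components whose mixture weight pi_l falls below the
    threshold (0.01 at initialization, 0.10 afterwards) and renormalize so
    the remaining weights still sum to 1."""
    keep = pi >= threshold
    pi_kept = pi[keep] / pi[keep].sum()
    return pi_kept, [c for c, k in zip(components, keep) if k]
```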

5. Real-World Experiments

Simulation studies have confirmed the effectiveness of the proposed algorithm. In real-world scenarios, the complete process involves detecting pedestrians and tracking groups of pedestrians. The accuracy of the detection algorithm, along with factors such as occlusions, clutter, and other environmental disturbances, directly impacts tracking performance. Therefore, we conducted field experiments to further evaluate the practical applicability of our approach.

5.1. Experimental Setup

The experimental setup utilized a UGV, as shown in Figure 7. The UGV is equipped with a 32-beam Velodyne LiDAR sensor and an Inertial Measurement Unit (IMU) designed for environmental perception and precise self-state estimation. All sensors are temporally synchronized through an onboard computer running the Robot Operating System (ROS).
The detailed algorithmic flow is provided in Algorithm 1. Initially, the LiDAR point cloud data is fed into a pedestrian detection module, which clusters points and filters out non-human objects. We then use the FAST-LIO algorithm [44] to obtain the current pose and transform the pedestrian detections from the current frame into the odometry coordinate system. Finally, the proposed group target tracking method processes these aligned measurements to update the structural information and motion states, yielding the group center at each time step.
Algorithm 1 Group Target Tracking in Real-World Conditions
 1: Input: LiDAR point cloud $P_i$, IMU data $I_i$
 2: Output: Group center $(x_k^c, y_k^c)$
 3: // Pedestrian Detection
 4: Initialize pedestrian local set $Z_{\text{local}}$ and global set $Z_{\text{global}}$
 5: // Clustering and Detection
 6: Cluster $P_i$ into pedestrian clusters $\{C_j\}_{j=1}^{m}$
 7: for each cluster $C_j$ do
 8:     Compute oriented bounding box $B_j$ and extract width $w_j$, height $h_j$, depth $d_j$
 9:     Compute footprint area $A_j \leftarrow w_j \cdot d_j$ and aspect ratio $r_j \leftarrow \max(w_j, d_j)/\min(w_j, d_j)$
10:     if dimensions within thresholds then
11:         Store centroid $(x_j, y_j, z_j) \leftarrow \mathrm{mean}(C_j)$ into $Z_{\text{local}}$
12:     end if
13: end for
14: // Odometry with FAST-LIO
15: Acquire robot pose $T_k \leftarrow \mathrm{FAST\_LIO}(P_i, I_i)$
16: for each point $(x, y, z) \in Z_{\text{local}}$ do
17:     Transform to odometry frame: $(x_g, y_g) \leftarrow T_k \cdot (x, y, z)$
18:     Add to global set: $Z_{\text{global}} \leftarrow Z_{\text{global}} \cup \{(x_g, y_g)\}$
19: end for
20: // Group Target Tracking
21: // Initialization
22: if first frame then
23:     Identify the group to be tracked by spatial closeness and size criteria
24:     Retain only members of the designated group as the initial observation set $Z_{\text{track}} \subseteq Z_{\text{global}}$
25:     Split $Z_{\text{track}}$ into $L$ sub-targets
26:     Initialize global motion parameters $M_0$
27:     for each sub-target $i = 1, \ldots, L$ do
28:         Initialize shape parameters $S_0^{(i)}$
29:     end for
30: end if
31: // Prediction and Updating
32: Predict shape matrices $\hat{S}_{k|k-1}^{(i)}$ for the $L$ sub-targets and motion state $M_{k|k-1}$
33: Filter observations $Z_{\text{track}}$ via the predicted state to obtain the valid set $Z_k$
34: for each observation $z \in Z_k$ do
35:     Compute responsibility vector by Equation (15)
36:     Assign $z$ to sub-target $i^{*}$
37: end for
38: for each sub-target $i$ do
39:     Update $S_k^{(i)}$ via Equation (22)
40: end for
41: Update global motion $M_k$ via Equation (23)
42: // Returning Results
43: $(x_k^c, y_k^c) \leftarrow$ mean of all sub-target centroids
44: return $(x_k^c, y_k^c)$
45: End
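As an illustration of the dimension check in the detection stage of Algorithm 1, a possible geometric filter is sketched below; the numeric thresholds are assumptions, since the paper does not list its exact values:

```python
# Illustrative size gates for a standing pedestrian (assumed values,
# not the paper's thresholds).
MIN_H, MAX_H = 1.2, 2.1   # cluster height in meters
MAX_FOOTPRINT = 1.0       # footprint area w * d in square meters
MAX_ASPECT = 2.5          # max(w, d) / min(w, d)

def is_pedestrian(w: float, h: float, d: float) -> bool:
    """Accept a LiDAR cluster only if its oriented bounding box has
    person-like height, footprint area, and aspect ratio."""
    area = w * d
    aspect = max(w, d) / max(min(w, d), 1e-6)  # guard against division by zero
    return MIN_H <= h <= MAX_H and area <= MAX_FOOTPRINT and aspect <= MAX_ASPECT
```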
Two scenarios were tested in our real-world experiments. The first scenario involved straight movement, in which all pedestrians were positioned behind the UGV and moved forward in unison, with both the UGV and pedestrians traveling at 1.2 m/s. The second scenario involved pedestrians on either side of the UGV, moving in parallel with it. The specific routes and scenarios for these experiments are illustrated in Figure 8.

5.2. Straight-Movement Experiment

During the straight-movement experiment, we took into account dynamic changes in member composition. As indicated by the red dashed curve in Figure 9a, the experiment began with five pedestrians. At frame 70, one additional pedestrian joined the group, and, at frame 115, another member exited, allowing us to validate the algorithm’s ability to accommodate spontaneous changes in group membership.
In the actual experiments, the number of detected pedestrians (as shown in Figure 9a) did not always equal the true number. Instances where the detected count was lower than the actual number typically indicated occlusions obstructing the identification of certain pedestrians. Conversely, situations where the detection count exceeded the true number often resulted from environmental obstacles being mistakenly classified as pedestrians. Due to mutual occlusions among group members, achieving perfect detection of all pedestrian targets is challenging. Therefore, we used the pedestrian targets identified by the detection algorithm as input to verify the robustness of our algorithm.
The results are presented in Table 4 and Figure 9. Quantitative analysis shows that the ME-CGT-UGV algorithm demonstrates superior performance. The visualization indicates that, despite the presence of false positives and missed detections, as well as certain pedestrians being obscured by occlusion, our algorithm still generates reliable predictions from the available observational data. This capability results in improved prediction accuracy and highlights the robustness of our approach.

5.3. Double-Column Experiment

In the double-column experiment, group members were distributed on either side of the UGV and progressed forward alongside it. The number of elliptical components was set to 2 to better characterize the formation distribution. The results from this experiment are detailed in Table 4 and Figure 9. Our method shows commendable performance in terms of angular error, smoothness, and other evaluative metrics.

5.4. Discussion

Through the experiments with real UGVs, we have empirically verified that our method accurately predicts the group centroid despite imperfect detection. This reliability is due to our algorithm’s data selection mechanism based on shape modeling. When obstacles are mistakenly detected as pedestrians, our algorithm combines position and shape cues to exclude outliers lying outside the modeled group region while retaining those within it. Targets that subsequently move beyond the region are automatically discarded. Furthermore, because the shape model represents the entire group rather than individual members, undetected or misclassified pedestrians remain within the group boundary, ensuring continuous and coherent tracking.

6. Conclusions

For SMSS-UGVs, this paper shifts the focus from tracking specific individuals to achieving stable following of the entire squad. To address challenges such as severe occlusions, dynamic changes in member composition, and complex formations, this paper (1) proposes a group target tracking framework based on a data selection mechanism to address dynamic changes in group composition through shape modeling; (2) introduces a group target tracking method for SMSS-UGVs based on multi-ellipse shape modeling to capture complex group formations; and (3) conducts a series of experiments to validate the effectiveness and robustness of the proposed methods. Future research will extend the proposed group target tracking algorithms to more challenging scenarios, enabling unmanned vehicles to efficiently assist rescue or patrol teams and enhance human–robot collaboration.

Author Contributions

Conceptualization, Y.Y. and T.W.; methodology, Y.Y. and T.W.; validation, Y.Y., T.W. and J.L.; formal analysis, Y.Y., T.W. and J.L.; investigation, Y.Y., T.W. and J.L.; resources, T.W. and J.L.; data curation, Y.Y.; writing—original draft preparation, Y.Y. and T.W.; writing—review and editing, J.L.; visualization, Y.Y.; supervision, T.W. and J.L.; project administration, J.L.; funding acquisition, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Natural Science Foundation of Hunan Province under Grant 2025JJ60425, and by the National Natural Science Foundation of China under Grants U21A20518 and U22A2061.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to thank all reviewers and editors for their comments on this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Islam, M.J.; Hong, J.; Sattar, J. Person-following by autonomous robots: A categorical overview. Int. J. Robot. Res. 2019, 38, 1581–1618. [Google Scholar] [CrossRef]
  2. Wang, A.; Makino, Y.; Shinoda, H. Machine learning-based human-following system: Following the predicted position of a walking human. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 4502–4508. [Google Scholar]
  3. Leisiazar, S.; Park, E.J.; Lim, A.; Chen, M. An MCTS-DRL based obstacle and occlusion avoidance methodology in robotic follow-ahead applications. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 221–228. [Google Scholar]
  4. Jiang, Q.; Susam, B.; Chao, J.J.; Isler, V. Map-Aware Human Pose Prediction for Robot Follow-Ahead. In Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, United Arab Emirates, 14–18 October 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 13031–13038. [Google Scholar]
  5. Ye, M.; Shen, J.; Lin, G.; Xiang, T.; Shao, L.; Hoi, S.C.H. Deep Learning for Person Re-Identification: A Survey and Outlook. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 2872–2893. [Google Scholar] [CrossRef] [PubMed]
  6. Zuo, J.; Wu, T.; Shi, M.; Liu, X.; Zhao, X. Multi-Modal Object Tracking with Vision-Language Adaptive Fusion and Alignment. In Proceedings of the 2023 5th International Conference on Robotics, Intelligent Control and Artificial Intelligence (RICAI), Hangzhou, China, 1–3 December 2023; pp. 1125–1133. [Google Scholar]
  7. Lin-Hai, G.; Gang, W.; Jin-Mang, L.; Song, L. An overview of group target tracking. Acta Autom. Sin. 2020, 46, 411–426. [Google Scholar]
  8. Chen, Y.; Wu, M.; Jiao, Y.; Ma, H.; Zhou, Y. Research progress of dense group tracking. Telecommun. Eng. 2023, 63, 589–597. [Google Scholar]
  9. Du, M.Y.; Bi, D.P.; Wang, S.L. Advances in key technologies of group target tracking. Electron. Opt. Control 2019, 26, 59–65+90. [Google Scholar]
  10. Chen, W.; Liu, H.; Hu, S.; Ning, H. Group Tracking of Flock Targets in Low-Altitude Airspace. In Proceedings of the 2011 IEEE Ninth International Symposium on Parallel and Distributed Processing with Applications Workshops, Busan, Republic of Korea, 26–28 May 2011; pp. 131–136. [Google Scholar]
  11. Zhang, Z.-x.; Wei, Z.; Chen, M.-y. A new method of tracking group space object. In Proceedings of the 2013 International Workshop on Microwave and Millimeter Wave Circuits and System Technology, Chengdu, China, 24–25 October 2013; pp. 403–406. [Google Scholar]
  12. Feldmann, M.; Fränken, D.; Koch, W. Tracking of Extended Objects and Group Targets Using Random Matrices. IEEE Trans. Signal Process. 2011, 59, 1409–1420. [Google Scholar] [CrossRef]
  13. Granstrom, K.; Orguner, U. On spawning and combination of extended/group targets modeled with random matrices. IEEE Trans. Signal Process. 2012, 61, 678–692. [Google Scholar] [CrossRef]
  14. Granstrom, K.; Orguner, U. A PHD filter for tracking multiple extended targets using random matrices. IEEE Trans. Signal Process. 2012, 60, 5657–5671. [Google Scholar] [CrossRef]
  15. Granström, K.; Natale, A.; Braca, P.; Ludeno, G.; Serafino, F. Gamma Gaussian Inverse Wishart Probability Hypothesis Density for Extended Target Tracking Using X-Band Marine Radar Data. IEEE Trans. Geosci. Remote Sens. 2015, 53, 6617–6631. [Google Scholar] [CrossRef]
  16. Lan, J.; Li, X.R. Tracking of extended object or target group using random matrix: New model and approach. IEEE Trans. Aerosp. Electron. Syst. 2016, 52, 2973–2989. [Google Scholar] [CrossRef]
  17. Tan, J.T.; Qi, G.Q.; Qi, J.J.; Yang, Y.J.; Li, Y.Y.; Sheng, A.D. Model Parameter Adaptive Approach of Extended Object Tracking Using Random Matrix and Identification. In Proceedings of the 2022 International Conference on Cyber-Physical Social Intelligence (ICCSI), Nanjing, China, 18–21 November 2022; pp. 91–97. [Google Scholar]
  18. Fowdur, J.S.; Baum, M.; Heymann, F. An Elliptical Principal Axes-based Model for Extended Target Tracking with Marine Radar Data. In Proceedings of the 2021 IEEE 24th International Conference on Information Fusion (FUSION), Sun City, South Africa, 1–4 November 2021; pp. 1–8. [Google Scholar]
  19. Tuncer, B.; Özkan, E. Random matrix based extended target tracking with orientation: A new model and inference. IEEE Trans. Signal Process. 2021, 69, 1910–1923. [Google Scholar] [CrossRef]
  20. Lan, J.; Li, X.R. Tracking of Maneuvering Non-Ellipsoidal Extended Object or Target Group Using Random Matrix. IEEE Trans. Signal Process. 2014, 62, 2450–2463. [Google Scholar] [CrossRef]
  21. Granström, K.; Willett, P.; Bar-Shalom, Y. An extended target tracking model with multiple random matrices and unified kinematics. In Proceedings of the 2015 18th International Conference on Information Fusion (Fusion), Washington, DC, USA, 6–9 July 2015; pp. 1007–1014. [Google Scholar]
  22. Hu, Q.; Ji, H.; Zhang, Y. Tracking of maneuvering non-ellipsoidal extended target with varying number of sub-objects. Mech. Syst. Signal Process. 2018, 99, 262–284. [Google Scholar] [CrossRef]
  23. Xue, X.; Huang, S.; Wei, D. Adaptive tracking of non-ellipsoidal extended target with varying number of sub-objects based on variational Bayesian. Digit. Signal Process. 2023, 142, 104214. [Google Scholar] [CrossRef]
  24. Yang, D.; Guo, Y.; Qiu, B.; Chen, Y. Multiple Gaussian Processes based Extended Target Tracking with Variational Inference. In Proceedings of the 2024 36th Chinese Control and Decision Conference (CCDC), Xi’an, China, 25–27 May 2024; pp. 4687–4692. [Google Scholar]
  25. Yang, D.; Guo, Y.; Li, X.; Chen, Y.; Shentu, H. Three-dimensional Multiple Extended Targets Tracking under occlusion using Variational Gaussian Processes. IEEE Trans. Aerosp. Electron. Syst. 2025, 1–20. [Google Scholar] [CrossRef]
  26. Kumru, M.; Özkan, E. Three-Dimensional Extended Object Tracking and Shape Learning Using Gaussian Processes. IEEE Trans. Aerosp. Electron. Syst. 2021, 57, 2795–2814. [Google Scholar] [CrossRef]
  27. Kumru, M.; Özkan, E. Tracking Arbitrarily Shaped Extended Objects Using Gaussian Processes. In Proceedings of the 2024 27th International Conference on Information Fusion (FUSION), Venice, Italy, 8–11 July 2024; pp. 1–8. [Google Scholar]
  28. Baum, M.; Hanebeck, U.D. Extended Object Tracking with Random Hypersurface Models. IEEE Trans. Aerosp. Electron. Syst. 2014, 50, 149–159. [Google Scholar] [CrossRef]
  29. Yao, G.; Dani, A. Image moment-based random hypersurface model for extended object tracking. In Proceedings of the 2017 20th International Conference on Information Fusion (Fusion), Xi’an, China, 10–13 July 2017; pp. 1–7. [Google Scholar]
  30. Sun, L.; Yu, H.; Fu, Z.; He, Z.; Zou, J. Modeling and Tracking of Maneuvering Extended Object with Random Hypersurface. IEEE Sens. J. 2021, 21, 20552–20562. [Google Scholar] [CrossRef]
  31. Yang, J.; Li, P.; Yang, L.; Ge, H. An improved ET-GM-PHD filter for multiple closely-spaced extended target tracking. Int. J. Control. Autom. Syst. 2017, 15, 468–472. [Google Scholar] [CrossRef]
  32. Zhang, Y.; Chen, P.; Cao, Z.; Jia, Y. Random Matrix-Based Group Target Tracking Using Nonlinear Measurement. In Proceedings of the 2022 IEEE 5th International Conference on Electronics Technology (ICET), Chengdu, China, 13–16 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1224–1228. [Google Scholar]
  33. Zhang, X.; Chen, S.; Qi, G.; Li, Y.; Sheng, A. The Algorithm of Group Target Tracking Based on Bearing-only Measurements. In Proceedings of the 2024 43rd Chinese Control Conference (CCC), Kunming, China, 28–31 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 4121–4128. [Google Scholar]
  34. Tuncer, B.; Orguner, U.; Özkan, E. Multi-ellipsoidal extended target tracking with variational Bayes inference. IEEE Trans. Signal Process. 2022, 70, 3921–3934. [Google Scholar] [CrossRef]
  35. Chen, Y.; Jiao, Y.; Wu, M.; Ma, H.; Lu, Z. Group Target Tracking for Highly Maneuverable Unmanned Aerial Vehicles Swarms: A Perspective. Sensors 2023, 23, 4465. [Google Scholar] [CrossRef]
  36. Ferreira, D.; Basiri, M. Dynamic Target Tracking and Following with UAVs Using Multi-Target Information: Leveraging YOLOv8 and MOT Algorithms. Drones 2024, 8, 488. [Google Scholar] [CrossRef]
  37. Wu, P.; Li, Y.; Xue, D. Multi-Target Tracking with Multiple Unmanned Aerial Vehicles Based on Information Fusion. Drones 2024, 8, 704. [Google Scholar] [CrossRef]
  38. Zhou, S.; He, Z.; Chen, X.; Chang, W. An Anomaly Detection Method for UAV Based on Wavelet Decomposition and Stacked Denoising Autoencoder. Aerospace 2024, 11, 393. [Google Scholar] [CrossRef]
  39. Xiao, J.; Ren, Y.; Du, J.; Zhao, Y.; Kumari, S.; Alenazi, M.J.F.; Yu, H. CALRA: Practical Conditional Anonymous and Leakage-Resilient Authentication Scheme for Vehicular Crowdsensing Communication. IEEE Trans. Intell. Transp. Syst. 2025, 26, 1273–1285. [Google Scholar] [CrossRef]
  40. Zhou, S.; Wei, C.; Song, C.; Pan, X.; Chang, W.; Yang, L. Short-term traffic flow prediction of the smart city using 5G internet of vehicles based on edge computing. IEEE Trans. Intell. Transp. Syst. 2022, 24, 2229–2238. [Google Scholar] [CrossRef]
  41. MacQueen, J. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability; Le Cam, L.M., Neyman, J., Eds.; University of California Press: Berkeley, CA, USA, 1967; pp. 281–297. [Google Scholar]
  42. Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In KDD’96: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining; Simoudis, E., Han, J., Fayyad, U., Eds.; AAAI Press: Washington, DC, USA, 1996; pp. 226–231. [Google Scholar]
  43. Wahlström, N.; Özkan, E. Extended Target Tracking Using Gaussian Processes. IEEE Trans. Signal Process. 2015, 63, 4165–4178. [Google Scholar] [CrossRef]
  44. Xu, W.; Zhang, F. FAST-LIO: A Fast, Robust LiDAR-Inertial Odometry Package by Tightly-Coupled Iterated Kalman Filter. IEEE Robot. Autom. Lett. 2021, 6, 3317–3324. [Google Scholar] [CrossRef]
Figure 1. The proposed framework of group target tracking for SMSS-UGVs. The green components are required only during the initialization phase; the red components pertain to issues that need to be addressed in multi-group target tracking; and the blue area represents the framework proposed in this paper.
Figure 2. Pedestrian trajectory patterns in experimental scenarios. Six distinct scenarios are simulated: (a) straight movement; (b) formation contraction; (c) formation expansion; (d) turning maneuver; (e) double-column formation; and (f) U-shaped formation. Section 4.3 and Section 4.4 utilize the straight movement scenario (a), while Section 4.5 employs scenarios (b–d) for formation transition and turning experiments.
Figure 3. The results of Lat-STD and OE across different tracking methods. This figure illustrates the tracking results for the scenario depicted in Figure 2a, where the SOT tracks pedestrians following an S-curve trajectory.
Figure 4. Evaluation of predicted shape area and orientation. We evaluate shape modeling performance in the following scenarios: (a,b) represent the formation contraction, (c,d) represent the expansion process, and (e,f) represent turning, to assess robustness and adaptability.
Figure 5. Visualization of group target tracking results for double-column (Dou-col) and U-shaped formations. The selected frames illustrate the corresponding formation configurations and predicted shapes. The red dashed line shows the group target’s ground truth, green dots are data for updates, pink dots indicate occluded points, and gray dots represent unselected points.
Figure 6. Variation in ellipse component counts for ME-CGT-UGV under different initialization settings. For the ME-CGT-UGV method, different numbers of elliptical components can be set in the initialization stage, and we show the variation in the elliptical components with different numbers of initial values during the experiment.
Figure 7. The UGV platform used in experiments. The UGV is equipped with a LiDAR, cameras, an IMU, and other essential sensors to enable robust perception and navigation.
Figure 8. The experimental routes and scenarios. The red line indicates the direction of movement, the blue line shows changes in team members, the orange ellipses represent the group formation, and the green square represents obstacles.
Figure 9. Results of target count and trajectory comparison. (a,c) The number of targets returned by the detection algorithms; (b,d) trajectories produced by the comparative methods alongside our proposed method.
Table 1. Quantitative results of comparative experiments with tracking methods.
| Method | RMSE (m) | OE (°) | Lat-STD (m) |
|---|---|---|---|
| SOT | 3.534 | 29.491 | 3.531 |
| MOT | 1.295 | 6.711 | 0.554 |
| CGT-Mean | 2.714 | 76.830 | 2.303 |
| ME-CGT-UGV (Ours) | 0.966 | 4.088 | 0.473 |
Table 2. Quantitative results of comparative experiments with shape modeling methods.
| Method | RMSE (m) | OE (°) | Lat-STD (m) |
|---|---|---|---|
| CGT-UGV | 1.567 | 8.591 | 0.858 |
| GP-CGT-UGV | 2.497 | 4.042 | 1.752 |
| MGP-CGT-UGV | 4.298 | 6.649 | 0.833 |
| RM-CGT-UGV | 1.060 | 6.598 | 0.755 |
| ME-CGT-UGV (Ours) | 0.984 | 3.884 | 0.423 |

Notes: GP-CGT-UGV is a modification of [43], MGP-CGT-UGV is based on [24], and RM-CGT-UGV is from [12].
Table 3. Performance under disturbance and occlusion conditions.
Disturbance experiments (n = number of non-group disturbers):

| Disturbers | Method | RMSE (m) | OE (°) | Lat-STD (m) |
|---|---|---|---|---|
| n = 0 | CGT-UGV | 1.226 | 7.019 | 0.576 |
|  | GP-CGT-UGV | 2.643 | 3.682 | 1.771 |
|  | MGP-CGT-UGV | 3.530 | 5.277 | 1.722 |
|  | RM-CGT-UGV | 1.106 | 6.938 | 0.809 |
|  | ME-CGT-UGV | 0.999 | 3.869 | 0.437 |
| n = 2 | CGT-UGV | 1.567 | 8.591 | 0.858 |
|  | GP-CGT-UGV | 2.497 | 4.042 | 1.752 |
|  | MGP-CGT-UGV | 4.298 | 6.649 | 0.833 |
|  | RM-CGT-UGV | 1.060 | 6.598 | 0.755 |
|  | ME-CGT-UGV | 0.984 | 3.884 | 0.423 |
| n = 5 | CGT-UGV | 2.331 | 10.908 | 1.569 |
|  | GP-CGT-UGV | 2.090 | 3.402 | 1.189 |
|  | MGP-CGT-UGV | 3.431 | 3.541 | 1.264 |
|  | RM-CGT-UGV | 1.223 | 7.716 | 0.870 |
|  | ME-CGT-UGV | 1.289 | 4.082 | 0.708 |
| n = 9 | CGT-UGV | 2.263 | 9.648 | 0.963 |
|  | GP-CGT-UGV | 2.598 | 6.499 | 1.679 |
|  | MGP-CGT-UGV | 4.766 | 8.067 | 3.229 |
|  | RM-CGT-UGV | 1.236 | 7.777 | 0.874 |
|  | ME-CGT-UGV | 1.297 | 4.093 | 0.704 |
| n = 12 | CGT-UGV | 3.082 | 9.630 | 1.354 |
|  | GP-CGT-UGV | 2.541 | 6.495 | 1.633 |
|  | MGP-CGT-UGV | 5.456 | 7.623 | 3.656 |
|  | RM-CGT-UGV | 1.217 | 7.764 | 0.860 |
|  | ME-CGT-UGV | 1.306 | 4.197 | 0.660 |

Occlusion experiments (n = occlusion range in degrees):

| Occlusion Range | Method | RMSE (m) | OE (°) | Lat-STD (m) | Occlusion Rate |
|---|---|---|---|---|---|
| n = 1° | CGT-UGV | 1.275 | 7.545 | 0.821 | 0.05 |
|  | GP-CGT-UGV | 1.818 | 3.653 | 1.166 | 0.06 |
|  | MGP-CGT-UGV | 3.993 | 7.030 | 1.166 | 0.06 |
|  | RM-CGT-UGV | 0.895 | 6.870 | 0.723 | 0.04 |
|  | ME-CGT-UGV | 0.962 | 4.028 | 0.600 | 0.05 |
| n = 5° | CGT-UGV | 1.567 | 8.591 | 0.858 | 0.28 |
|  | GP-CGT-UGV | 2.497 | 4.042 | 1.752 | 0.22 |
|  | MGP-CGT-UGV | 4.298 | 6.649 | 0.833 | 0.30 |
|  | RM-CGT-UGV | 1.060 | 6.598 | 0.755 | 0.25 |
|  | ME-CGT-UGV | 0.984 | 3.884 | 0.423 | 0.25 |
| n = 8° | CGT-UGV | 2.294 | 10.283 | 1.301 | 0.42 |
|  | GP-CGT-UGV | 3.897 | 4.055 | 1.355 | 0.35 |
|  | MGP-CGT-UGV | 7.457 | 9.158 | 2.498 | 0.42 |
|  | RM-CGT-UGV | 1.208 | 8.668 | 1.081 | 0.38 |
|  | ME-CGT-UGV | 2.067 | 6.473 | 1.490 | 0.33 |
| n = 10° | CGT-UGV | 2.342 | 9.440 | 1.154 | 0.49 |
|  | GP-CGT-UGV | 4.509 | 5.036 | 1.554 | 0.42 |
|  | MGP-CGT-UGV | 8.699 | 8.944 | 3.952 | 0.51 |
|  | RM-CGT-UGV | 1.430 | 7.666 | 1.267 | 0.43 |
|  | ME-CGT-UGV | 1.885 | 3.946 | 0.698 | 0.47 |
| n = 12° | CGT-UGV | 2.896 | 9.143 | 1.264 | 0.56 |
|  | GP-CGT-UGV | 5.599 | 5.059 | 2.049 | 0.49 |
|  | MGP-CGT-UGV | 4.957 | 6.675 | 1.486 | 0.48 |
|  | RM-CGT-UGV | 2.533 | 9.4344 | 2.432 | 0.50 |
|  | ME-CGT-UGV | 1.619 | 3.8457 | 0.582 | 0.53 |
Table 4. Results of real-world experiments.
| Scenario | Method | RMSE (m) | OE (°) | Lat-STD (m) |
|---|---|---|---|---|
| Straight | CGT-UGV | 4.078 | 8.632 | 0.164 |
|  | GP-CGT-UGV | 8.397 | 12.901 | 0.417 |
|  | MGP-CGT-UGV | 6.695 | 12.236 | 0.513 |
|  | RM-CGT-UGV | 4.087 | 5.213 | 0.085 |
|  | ME-CGT-UGV (Ours) | 3.953 | 3.722 | 0.127 |
| Double-column | CGT-UGV | 5.156 | 19.640 | 0.161 |
|  | GP-CGT-UGV | 8.205 | 16.432 | 0.485 |
|  | MGP-CGT-UGV | 4.813 | 10.551 | 0.394 |
|  | RM-CGT-UGV | 5.029 | 20.845 | 0.255 |
|  | ME-CGT-UGV (Ours) | 4.767 | 5.864 | 0.124 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

