Data-Driven Optimal Bipartite Containment Tracking for Multi-UAV Systems with Compound Uncertainties

Chen, Bowen; Shi, Mengji; Li, Zhiqiang; Qin, Kaiyu

doi:10.3390/drones9080573

¹ School of Aeronautics and Astronautics, University of Electronic Science and Technology of China, Chengdu 611731, China

² Aircraft Swarm Intelligent Sensing and Cooperative Control Key Laboratory of Sichuan Province, Chengdu 611731, China

³ National Laboratory on Adaptive Optics, Chengdu 610209, China

* Author to whom correspondence should be addressed.

Drones 2025, 9(8), 573; https://doi.org/10.3390/drones9080573

Submission received: 24 June 2025 / Revised: 10 August 2025 / Accepted: 12 August 2025 / Published: 13 August 2025

Abstract

With the increasing deployment of Unmanned Aerial Vehicle (UAV) swarms in uncertain and dynamically changing environments, optimal cooperative control has become essential for ensuring robust and efficient system coordination. To this end, this paper designs a data-driven optimal bipartite containment tracking control scheme for multi-UAV systems under compound uncertainties. A novel Dynamic Iteration Regulation Strategy (DIRS) is proposed, which enables real-time adjustment of the learning iteration step according to the task-specific demands. Compared with conventional fixed-step data-driven algorithms, the DIRS provides greater flexibility and computational efficiency, allowing for better trade-offs between the performance and cost. First, the optimal bipartite containment tracking control problem is formulated, and the associated coupled Hamilton–Jacobi–Bellman (HJB) equations are established. Then, a data-driven iterative policy learning algorithm equipped with the DIRS is developed to solve the optimal control law online. The stability and convergence of the proposed control scheme are rigorously analyzed. Furthermore, the control law is approximated via the neural network framework without requiring full knowledge of the model. Finally, numerical simulations are provided to demonstrate the effectiveness and robustness of the proposed DIRS-based optimal containment tracking scheme for multi-UAV systems, which can reduce the number of iterations by 88.27% compared to that for the conventional algorithm.

Keywords: multi-UAV systems; optimal bipartite containment tracking; data-driven control; dynamic iteration regulation; compound uncertainties

1. Introduction

In recent years, multi-UAV systems have emerged as a cornerstone in advancing low-altitude economies and intelligent autonomous systems owing to their advantages in task flexibility, environmental adaptability, and cooperative autonomy. Technically, system architectures have evolved from centralized schemes to distributed cooperative frameworks, thereby enhancing the robustness, scalability, and resilience in complex and dynamic environments. These advancements have facilitated the widespread deployment of multi-UAV systems across various domains. For example, in precision agriculture, UAV swarms are employed for real-time crop monitoring, field mapping, and pesticide application [1,2]. In urban logistics, UAV-based delivery networks enable efficient last-mile transportation and emergency responses [3,4]. In the energy sector, UAVs are increasingly used to inspect power lines, substations, and wind farms, providing high-frequency data acquisition with reduced operational risks [5,6]. These diverse applications highlight the critical role of multi-UAV systems in addressing real-world challenges across civil, industrial, and environmental domains.

Among various coordination strategies, containment tracking has emerged as a fundamental problem in the cooperative control of multi-UAV systems. The primary objective is to drive the follower UAVs to converge into the convex hull formed by multiple leader agents under a leader–follower architecture. Depending on the motion behavior of the leaders, containment control problems are generally classified into two categories: static and dynamic containment [7]. Static containment considers scenarios where the leaders remain stationary, and the followers are required to converge to a fixed convex region. This formulation simplifies the controller design by eliminating the need for velocity measurements and is well suited to applications such as satellite formation surveillance and area monitoring [8,9,10]. In contrast, dynamic containment addresses cases where the leaders follow time-varying trajectories. It necessitates the development of distributed velocity or acceleration observers to enable the followers to track a moving convex region, thereby enhancing the robustness under dynamic operating conditions. Dynamic containment has demonstrated substantial practical relevance in time-sensitive missions such as UAV-based logistics and aerial surveillance [11,12,13]. Nevertheless, practical deployments of multi-UAV systems often involve non-cooperative behaviors and resource contention, especially in congested airspace or adversarial environments [14,15]. Such scenarios naturally give rise to both cooperative and antagonistic interactions among agents, which conventional containment control frameworks fail to adequately address. To overcome these limitations, the concept of bipartite containment control has been introduced as a significant extension. This framework is designed to ensure group-level convergence even in the presence of both positive (cooperative) and negative (competitive) relationships among agents.

Within the framework of bipartite containment control, follower agents are guided to converge to the union of two convex hulls, each constructed by a distinct subgroup of cooperative or competitive leaders. These leader subgroups independently span their respective convex regions, while the system collectively ensures global convergence and stability [16]. Several recent studies have explored this problem from various perspectives. For instance, Wu et al. [17] investigated bipartite containment for heterogeneous marine aerial–surface systems and proposed an adaptive control scheme with prescribed performance guarantees. He et al. [18] tackled the containment tracking problem in nonlinear systems subject to false data injection (FDI) attacks using a model-free adaptive iterative learning control method. Furthermore, Fan et al. [19] formulated a problem under adversarial disturbances as a zero-sum game and employed game-theoretic strategies to achieve bipartite containment. Also contributing to the research on bipartite containment control are the works in [20,21], which have significantly advanced the field by enhancing robustness and addressing adversarial scenarios. However, most of the existing approaches primarily focus on feasibility and stability, while a critical gap persists in performance-oriented design. In particular, aspects such as minimizing the control effort, reducing the energy consumption, and improving transient behavior have received insufficient attention, despite their practical significance in real-world multi-UAV applications that operate under resource constraints and mission-critical timelines.

Moreover, most of the aforementioned studies do not explicitly account for the presence of uncertainties, which are ubiquitous in practical multi-UAV control scenarios. In real-world deployments, UAVs are often subjected to external disturbances such as gusty winds, atmospheric turbulence, and aerodynamic interactions with neighboring vehicles [22]. Simultaneously, internal uncertainties may stem from actuator degradation, sensor misalignment, mounting errors, and structural deformations [23]. These compound uncertainties introduce significant challenges into the design of control strategies that are both robust and performance-optimal. Consequently, there is a pressing need for control frameworks capable of achieving high-performance coordination while effectively accommodating such uncertainties.

In recent years, adaptive dynamic programming (ADP) has garnered considerable attention as an effective approach to solving the optimal control problem [24]. ADP facilitates online learning of the optimal control policies via iterative value function approximation and is particularly appealing due to its model-free nature, enabling adaptation to unknown or time-varying system dynamics [25]. Some representative studies [26,27] show that ADP exhibits adaptivity and practicality in both energy system optimization and UAV cooperative control. Meanwhile, to address the uncertainty in the control process, some works [28,29] propose robust ADP methods. Despite these advances, the majority of the existing ADP approaches rely on the generalized policy iteration (GPI) framework [30,31], which typically assumes a preassigned and fixed number of internal iterations. Such static iteration strategies may result in a suboptimal computational efficiency, particularly in scenarios with heterogeneous learning speeds or dynamic environmental conditions, thereby hindering real-time applicability in resource-constrained systems.

Therefore, to address the aforementioned challenges, this paper proposes a data-driven approach based on a Dynamic Iteration Regulation Strategy (DIRS) to achieve optimal bipartite containment control for multi-UAV systems subject to compound model uncertainties. The main contributions of this paper are summarized as follows:

(1) A data-driven control scheme is developed to address the problem of optimal bipartite containment tracking control for multi-UAV systems under compound uncertainties. The proposed scheme integrates task-specific cost function design with online policy learning, enabling performance-oriented and adaptive control. Compared with the existing approaches [32,33], this framework provides greater flexibility and applicability in real-world scenarios by allowing the control strategies to be tailored to mission-specific objectives and environmental conditions.

(2) By leveraging a critic–actor neural network architecture, the proposed scheme realizes a data-driven, online iterative control process that operates independently of explicit system model parameters. This model-free formulation enhances the robustness against compound uncertainties, including external disturbances and internal structural or parametric errors, thereby ensuring a reliable performance in uncertain and dynamically evolving multi-UAV environments.

(3) To enhance the computational efficiency of traditional ADP algorithms [34,35], this paper designs a novel DIRS method. By incorporating a dynamic regulation mechanism and an adaptive regulation threshold, the DIRS enables task-dependent adjustment of the learning iteration steps. This mechanism effectively eliminates redundant iterations, improves the computational efficiency, and facilitates a better trade-off between the control performance and resource cost, thereby making this approach more suitable for real-time deployment in complex multi-UAV systems.

The remainder of this paper is organized as follows. Section 2 presents the preliminary concepts in graph theory and formulates the optimal bipartite containment tracking problem for multi-UAV systems. Section 3 introduces a novel DIRS-based iterative strategy for solving the optimal control problem, analyzes its performance, and details the implementation of a critic–actor neural network for online approximation of the optimal control law. Section 4 provides numerical simulations to demonstrate the effectiveness of the proposed method. Finally, Section 5 concludes the paper.

2. Preliminaries and Problem Formulation

2.1. Preliminaries

Consider a multi-UAV system with N followers and M leaders, denoted as a structurally balanced graph

G = (V, E, A)

, which is used to model the interactions in the system, where

V = {v_{1}, \dots, v_{M}, v_{M + 1}, \dots, v_{M + N}}

is the set of nodes formed by all UAVs.

V_{L} = {v_{1}, \dots, v_{M}}

and

V_{F} = {v_{M + 1}, \dots, v_{M + N}}

are the subsets of nodes representing the leader UAVs and follower UAVs, respectively. In addition,

E \subset V \times V

is the set of edges, and

A = [a_{i j}]

is the adjacency matrix, which is used to represent the weight of each edge. If an edge from UAV

v_{j}

to UAV

v_{i}

exists, it is denoted as

(v_{j}, v_{i}) \in E

, and the weight of the edge is

a_{i j} > 0

when these two UAVs are cooperative, whereas an antagonistic (competitive) interaction is indicated by

a_{i j} < 0

. If

a_{i j} = 0

, that means that there is no interaction between UAVs

v_{i}

and

v_{j}

. The set of neighbors of the UAV

v_{i}

is defined as

N_{i} = {j \in V ∣ (v_{j}, v_{i}) \in E}

, which is the set of all UAVs that can directly communicate with UAV

v_{i}

.
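The signed-graph notation above can be illustrated with a small sketch. The adjacency matrix below is a hypothetical example (two leaders, three followers) and is not taken from the paper; the structural-balance test uses the standard definition, under which the nodes can be split into two groups with positive edges inside each group and negative edges between the groups.

```python
import numpy as np

# Hypothetical signed adjacency matrix: nodes 0-1 are leaders, nodes 2-4 are followers.
# A[i, j] is the weight a_ij of the edge (v_j, v_i), i.e. information flows from j to i.
A = np.array([
    [0.0, 0.0,  0.0, 0.0,  0.0],   # leaders receive no information
    [0.0, 0.0,  0.0, 0.0,  0.0],
    [1.0, 0.0,  0.0, 0.0, -1.0],   # follower 2: cooperative with 0, antagonistic with 4
    [0.0, 1.0,  1.0, 0.0,  0.0],
    [0.0, 0.0, -1.0, 0.0,  0.0],
])

def neighbors(A, i):
    """Neighbor set N_i: all j with an edge (v_j, v_i), i.e. a_ij != 0."""
    return [j for j in range(A.shape[0]) if A[i, j] != 0.0]

def is_structurally_balanced(A):
    """Try to assign each node a group label +1 / -1 so that positive edges
    join equal labels and negative edges join opposite labels."""
    n = A.shape[0]
    label = [0] * n
    for root in range(n):
        if label[root] != 0:
            continue
        label[root] = 1
        stack = [root]
        while stack:
            i = stack.pop()
            for j in range(n):
                for w in (A[i, j], A[j, i]):   # ignore edge direction for this test
                    if w == 0.0:
                        continue
                    expected = label[i] * (1 if w > 0 else -1)
                    if label[j] == 0:
                        label[j] = expected
                        stack.append(j)
                    elif label[j] != expected:
                        return False, label
    return True, label

balanced, groups = is_structurally_balanced(A)
```

In this example, follower 4 ends up in the group opposite to the other nodes, which is the kind of partition that bipartite containment relies on.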

Assumption 1.

The communication topology of the multi-UAV system is assumed to be a directed graph. In this topology, leader agents do not receive information from follower agents, reflecting a unidirectional leader–follower interaction structure. Furthermore, it is assumed that at least one directed path exists from each leader to every follower, thereby ensuring that the leader set maintains directed connectivity to the follower set.

2.2. Problem Formulation

In this subsection, we consider a multi-UAV system composed of M leaders and N followers operating under compound uncertainties. Dynamic models of both the leader and follower agents are constructed and simplified accordingly. Based on this, the problem of optimal bipartite containment tracking control is formally formulated.

2.2.1. Models of the UAVs

Consider a follower quadrotor UAV system in the Earth-fixed coordinate system

O_{e}

, whose spatial information is determined by the position vector

x_{i} = {[p_{i_{x}}, p_{i_{y}}, p_{i_{z}}]}^{T}

and the attitude Euler angle vector

Ω_{i} = {[ϕ_{i}, θ_{i}, ψ_{i}]}^{T}

, where

p_{i_{x}}, p_{i_{y}}, p_{i_{z}}

are the three-axis coordinates in the

O_{e}

system, and

ϕ_{i}, θ_{i}, ψ_{i}

represent the roll, pitch, and yaw angles, respectively. Based on the rigid-body dynamics established in the literature [36], the model of the ith follower UAV can be expressed as follows:

\begin{matrix} x_{i} (t + 1) & = x_{i} (t) + T v_{i} (t) + o_{1} (t), \\ v_{i} (t + 1) & = v_{i} (t) + T (- g \vec{e_{3}} + \frac{1}{m_{i}} T_{b i} (t) R_{b i}^{e} (t) \vec{e_{3}} + o_{2} (t)), \end{matrix}

(1)

where the velocity vector

v_{i} = {[v_{i_{x}}, v_{i_{y}}, v_{i_{z}}]}^{T}

represents the velocity state of the ith UAV in the inertial system. The constant g denotes the gravitational acceleration, whose direction is determined by the basis vector

\vec{e_{3}} = {[0, 0, 1]}^{T}

(vertically downward positive).

m_{i}

represents the total mass of the UAV body’s structure and payload. The nonlinear function terms

o_{1} (t)

and

o_{2} (t)

denote the external uncertainties for the follower UAV. The actuator characteristics are described by the total lift

T_{b i}

in the body-fixed coordinate system

O_{b}

, which is mapped to the Earth-fixed coordinate system

O_{e}

by the rotation matrix

R_{b i}^{e}

.

R_{b i}^{e} (t) = [\begin{matrix} \cos θ_{i} \cos ψ_{i} & \cos ψ_{i} \sin ϕ_{i} \sin θ_{i} - \cos ϕ_{i} \sin ψ_{i} & \sin ϕ_{i} \sin ψ_{i} + \cos ϕ_{i} \cos ψ_{i} \sin θ_{i} \\ \cos θ_{i} \sin ψ_{i} & \cos ϕ_{i} \cos ψ_{i} + \sin ϕ_{i} \sin θ_{i} \sin ψ_{i} & \cos ϕ_{i} \sin θ_{i} \sin ψ_{i} - \cos ψ_{i} \sin ϕ_{i} \\ - \sin θ_{i} & \cos θ_{i} \sin ϕ_{i} & \cos ϕ_{i} \cos θ_{i} \end{matrix}] .

(2)

To facilitate the analysis, we define

u_{i_{x}} (t) = T_{b i} (t) (\sin ϕ_{i} \sin ψ_{i} + \cos ϕ_{i} \cos ψ_{i} \sin θ_{i}) / m_{i}

,

u_{i_{y}} (t) = T_{b i} (t) (\cos ϕ_{i} \sin θ_{i} \sin ψ_{i} - \cos ψ_{i} \sin ϕ_{i}) / m_{i}

, and

u_{i_{z}} = T_{b i} (t) (\cos ϕ_{i} \cos θ_{i}) / m_{i} - g

. Moreover, let

u_{i} = {[u_{i_{x}}, u_{i_{y}}, u_{i_{z}}]}^{T}

be the control inputs. A simplified form of the follower UAV dynamics with compound uncertainties can be obtained.

\begin{matrix} x_{i} (t + 1) = & x_{i} (t) + T v_{i} (t) + o_{1} (t), \\ v_{i} (t + 1) = & v_{i} (t) + T u_{i} (t) + o_{2} (t), i = 1, 2, \dots, N . \end{matrix}

(3)

Similarly, the same simplification can be made for the dynamics of the leader UAVs, resulting in the following:

\begin{matrix} x_{m} (t + 1) = & x_{m} (t) + T v_{m} (t) + ℓ_{1} (t), \\ v_{m} (t + 1) = & v_{m} (t) + ℓ_{2} (t), m = 1, 2, \dots, M . \end{matrix}

(4)

According to (3) and (4), the dynamics of a multi-UAV system with compound uncertainties can be simplified into a second-order integrator. The subsequent controller design will be developed on this basis.
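As a minimal sketch of the simplified models (3) and (4), the following functions advance a follower and a leader by one sampling period T. The function names and the choice to default the uncertainty terms to zero are assumptions made for illustration only.

```python
import numpy as np

def follower_step(x, v, u, T, o1=None, o2=None):
    """One step of the follower model (3); o1 and o2 are the uncertainty terms."""
    o1 = np.zeros(3) if o1 is None else np.asarray(o1, dtype=float)
    o2 = np.zeros(3) if o2 is None else np.asarray(o2, dtype=float)
    x_next = x + T * v + o1
    v_next = v + T * u + o2
    return x_next, v_next

def leader_step(x, v, T, l1=None, l2=None):
    """One step of the leader model (4); leaders have no control input."""
    l1 = np.zeros(3) if l1 is None else np.asarray(l1, dtype=float)
    l2 = np.zeros(3) if l2 is None else np.asarray(l2, dtype=float)
    return x + T * v + l1, v + l2

# Example: a follower moving along x while accelerating along y.
x0 = np.array([0.0, 0.0, 0.0])
v0 = np.array([1.0, 0.0, 0.0])
u0 = np.array([0.0, 2.0, 0.0])
x1, v1 = follower_step(x0, v0, u0, T=0.1)
```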

2.2.2. The Description of the Optimal Bipartite Containment Tracking Control Problem

Definition 1.

For a set of leader UAV positions

x_{1}, \dots, x_{M}

, the corresponding convex hulls formed are defined as

co {x_{1}, \dots, x_{M}} = \{\sum_{i = 1}^{M} γ_{i} x_{i} | γ_{i} \in R, γ_{i} \geq 0, \sum_{i = 1}^{M} γ_{i} = 1\} .

(5)

Definition 2.

Bipartite containment control of the multi-UAV system (3), (4) is said to be achieved if, under the control inputs, some of the followers converge to the convex hull

co {x_{j}, j \in V_{L}}

according to the topological relation, and the others converge to the bipartite convex hull

- co {x_{j}, j \in V_{L}}

.
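Definitions 1 and 2 can be made concrete with a short sketch. The leader positions and weights below are hypothetical; the point is only that any admissible weight vector yields a point inside the convex hull, and that negating it yields the corresponding point in the opposing hull.

```python
import numpy as np

def convex_combination(leaders, gamma):
    """Return sum_i gamma_i * x_i, checking the conditions of Definition 1."""
    leaders = np.asarray(leaders, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    if np.any(gamma < 0) or not np.isclose(gamma.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return gamma @ leaders

# Hypothetical leader positions (one row per leader).
leaders = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]

target = convex_combination(leaders, [0.5, 0.25, 0.25])  # point in co{x_j}
mirrored_target = -target                                # point in -co{x_j}
```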

Remark 1.

In contrast to traditional containment control problems [9,12], the bipartite containment control framework divides the UAV agents into distinct groups that interact through both cooperative and antagonistic links within a signed communication network. Such interactions result in the follower UAVs asymptotically converging to two opposing convex hulls, each spanned by a subgroup of leader UAVs. Based on this structure, bipartite containment tracking control not only requires the follower UAVs to converge to the appropriate convex hull region in terms of the state but also mandates synchronization of the velocity with the corresponding convex hulls, thereby ensuring coordinated motion tracking in dynamic environments.

Therefore, to solve the bipartite containment tracking control problem in multi-UAV systems, the local neighbourhood bipartite containment tracking error is described as

\begin{matrix} e_{i} (t) = & \sum_{j \in N_{i}} | a_{i j} | [(sign (a_{i j}) x_{j} (t) - x_{i} (t)) + (sign (a_{i j}) v_{j} (t) - v_{i} (t))], i \in V_{F} . \end{matrix}

(6)
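The error (6) can be evaluated directly from the signed adjacency matrix. The sketch below follows the equation as written, summing the position and velocity mismatches into a single vector; the array layout (one row per UAV) and the function name are assumptions for illustration.

```python
import numpy as np

def containment_error(i, A, x, v):
    """Local neighborhood bipartite containment tracking error e_i of Eq. (6).

    A is the signed adjacency matrix, x and v hold one row per UAV.
    """
    e = np.zeros(x.shape[1])
    for j in range(A.shape[0]):
        a = A[i, j]
        if a == 0.0:
            continue          # j is not a neighbor of i
        s = np.sign(a)        # +1 for cooperative, -1 for antagonistic
        e += abs(a) * ((s * x[j] - x[i]) + (s * v[j] - v[i]))
    return e

# Node 0 is a leader at rest; node 1 is a follower sitting at the mirrored position.
x = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
v = np.zeros((2, 3))
A_antagonistic = np.array([[0.0, 0.0], [-1.0, 0.0]])
A_cooperative = np.array([[0.0, 0.0], [1.0, 0.0]])

e_mirror = containment_error(1, A_antagonistic, x, v)  # zero: follower is at -x_0
e_coop = containment_error(1, A_cooperative, x, v)     # nonzero: follower should be at x_0
```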

To formulate the optimal containment control problem, the following performance index

R_{i}

is defined for each follower UAV i:

\begin{matrix} R_{i} (e_{i} (t), u_{i} (t)) = \sum_{τ = t}^{\infty} ρ^{τ - t} H_{i} (e_{i} (τ), u_{i} (τ)), \end{matrix}

(7)

where

ρ

is a discount factor taking values in (0, 1], which determines the relative importance of future rewards in the current value estimate. In practice, the discount factor is usually selected to balance the initial convergence speed against the final performance. The utility function

H_{i}

can be denoted as

\begin{matrix} H_{i} (e_{i} (t), u_{i} (t)) = & e_{i}^{T} (t) P_{i} e_{i} (t) + u_{i}^{T} (t) Q_{i i} u_{i} (t) . \end{matrix}

(8)

where

P_{i} \geq 0

corresponds to the weight matrix of the quadratic term of the tracking error, and the matrices

Q_{i i} > 0

correspond to the weight matrix of the quadratic term of the control inputs. Therefore, different values for

P_{i}

and

Q_{i i}

can dynamically adjust the weights of the tracking error and control inputs in the utility function, which in turn determines the convergence speed.
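To show how the weights enter the cost, the following sketch evaluates the utility (8) and a finite-horizon truncation of the performance index (7). The truncation, the function names, and the numerical values are assumptions for illustration; the index in (7) is an infinite sum.

```python
import numpy as np

def utility(e, u, P, Q):
    """Utility H_i = e^T P e + u^T Q u of Eq. (8)."""
    return float(e @ P @ e + u @ Q @ u)

def discounted_cost(errors, inputs, P, Q, rho):
    """Truncated version of Eq. (7): sum over the recorded steps only."""
    return sum(rho**k * utility(e, u, P, Q)
               for k, (e, u) in enumerate(zip(errors, inputs)))

P = np.eye(3)          # tracking-error weight
Q = 2.0 * np.eye(3)    # control-effort weight
e = np.array([1.0, 0.0, 0.0])
u = np.array([0.0, 1.0, 0.0])

H = utility(e, u, P, Q)                                # 1 + 2
R = discounted_cost([e, e], [u, u], P, Q, rho=0.5)     # H + 0.5 * H
```

Increasing the entries of Q relative to P penalizes control effort more heavily, which generally yields gentler inputs and slower error decay.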

Moreover, according to the admissible control set

{u_{1}, u_{2}, \dots, u_{N}}

, the optimal performance metric function can be defined as

\begin{matrix} R_{i}^{*} (e_{i} (t), u_{i} (t)) = & \min_{u_{i} (t)} {H_{i} (e_{i} (t), u_{i} (t)) + ρ R_{i}^{*} (e_{i} (t + 1), u_{i} (t + 1))} . \end{matrix}

(9)

Meanwhile, the value function of the iteration process is

\begin{matrix} V_{i} (e_{i} (t), u_{i} (t)) = \sum_{τ = t}^{\infty} ρ^{τ - t} H_{i} (e_{i} (τ), u_{i} (τ)) . \end{matrix}

(10)

Furthermore, according to the Bellman optimality principle,

V_{i}

should satisfy the following discrete-time HJB equation.

\begin{matrix} V_{i}^{*} (e_{i} (t), u_{i} (t)) = & \min_{u_{i} (t)} {H_{i} (e_{i} (t), u_{i} (t)) + ρ V_{i}^{*} (e_{i} (t + 1), u_{i} (t + 1))} . \end{matrix}

(11)

The corresponding optimal bipartite containment tracking control law can be defined as

\begin{matrix} u_{i}^{*} (t) = & arg \min_{u_{i} (t)} {H_{i} (e_{i} (t), u_{i} (t)) + ρ V_{i}^{*} (e_{i} (t + 1), u_{i} (t + 1))} . \end{matrix}

(12)

Remark 2.

As noted in [37], the connectivity of the topology is correlated with the eigenvalues of the Laplacian matrix. In general, higher connectivity yields better control performance and faster convergence; however, it also requires each UAV to obtain more information from its neighbors and may therefore consume more computational resources.

This section presents the formulation of the optimal bipartite containment tracking control problem for multi-UAV systems subject to compound uncertainties. It is important to note that the Bellman equation associated with the optimal value function (11) and the corresponding control law (12) are inherently nonlinear and lack closed-form analytical solutions. To address this challenge, online numerical methods with iterative approximation schemes are employed to solve the HJB equations and approximate both the optimal value function and the associated control inputs. Furthermore, in subsequent sections, an enhanced ADP framework is proposed by developing a novel DIRS, which significantly reduces the computational burden of the iterative process.
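For intuition only, the structure of the Bellman recursion can be worked out by hand in a deliberately simplified special case that is not the setting of this paper: a single scalar error with assumed linear dynamics e(t+1) = a e(t) + b u(t), scalar weights p and q, and a quadratic value function V^{*}(e) = k e^{2}. All of these symbols are illustrative assumptions.

```latex
% Scalar illustration (assumed dynamics e(t+1) = a e(t) + b u(t), V^*(e) = k e^2)
\begin{align}
k e^{2} &= \min_{u} \left\{ p e^{2} + q u^{2} + \rho k (a e + b u)^{2} \right\}, \\
0 &= 2 q u + 2 \rho k b (a e + b u)
  \;\Longrightarrow\;
  u^{*} = - \frac{\rho k a b}{q + \rho k b^{2}} \, e, \\
k &= p + \frac{\rho q a^{2} k}{q + \rho b^{2} k}.
\end{align}
```

Even in this scalar case the last line is a nonlinear equation in k, which is why iterative schemes are used in the general coupled multi-agent problem.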

3. DIRS Algorithm Design, Analysis, and Online Implementation

In this section, a novel DIRS is proposed to adaptively adjust the sub-iteration step size. The proposed algorithm effectively eliminates redundant iterations and facilitates the solution of the optimal value function and the corresponding control inputs. A neural-network-based approximation scheme is then employed to enable an online implementation of the optimal bipartite containment tracking control. The overall DIRS-based control framework is illustrated in Figure 1.

3.1. The DIRS Algorithm Design

Let

u_{i}^{b} (t)

and

V_{i}^{b} (e_{i} (t))

represent the control law and the value function, respectively. Furthermore, to simplify the iteration process,

V_{i}^{b, d} (e_{i} (t))

is defined as an adaptive sub-iteration value function with the sub-iteration index

d = 0, 1, 2, \dots, φ, \dots, Ω

. The symbol

φ

denotes the number of sub-iterations that are adaptively adjusted based on the convergence of the value function, while

Ω

represents the predetermined maximum number of sub-iterations. For brevity, the following shorthand is used in the remainder of the paper:

V_{i} (e_{i} (t)) = V_{i} (t), H_{i} (e_{i} (t), u_{i} (t)) = H_{i} (t)

, and similarly for all expressions of the same form.

For the progress of the

φ

th sub-iteration, the value function is

V_{i}^{b} (t) = V_{i}^{b, φ} (t) .

(13)

The DIRS algorithm’s iterative process is summarized below:

(1): Initializing the admissible control law

u_{i}^{0} (t)

, the initial value function

V_{i}^{0} (t)

is determined as

\begin{matrix} V_{i}^{0} (t) = H_{i}^{0} (t) + ρ V_{i}^{0} (t + 1) . \end{matrix}

(14)

For

b = 1, 2, \dots

, the iteration control law

u_{i}^{b} (t)

is

\begin{matrix} u_{i}^{b} (t) = arg \min_{u_{i} (t)} {H_{i} (t) + ρ V_{i}^{b - 1} (t + 1)} . \end{matrix}

(15)

(2): Let the control law

u_{i}^{b} (t)

remain a constant value. For the bth step of the iteration, the value function

V_{i}^{b} (t)

can be solved as follows.

For

d = 1, 2, \dots, Ω

,

\begin{matrix} V_{i}^{b, d} (t) = H_{i}^{b} (t) + ρ V_{i}^{b, d - 1} (t + 1), \end{matrix}

(16)

where the initial value function of the sub-iterations is

\begin{matrix} V_{i}^{b, 0} (t) = H_{i}^{b} (t) + ρ V_{i}^{b - 1} (t + 1) . \end{matrix}

(17)

If the difference between the value functions of two consecutive sub-iterations satisfies

∥ V_{i}^{b, d} (t) - V_{i}^{b, d - 1} (t) ∥ < Θ,

(18)

then the sub-iteration terminates, and the number of sub-iterations performed is recorded as

φ

. In particular, the threshold size is set to

Θ

. In fact, a smaller

Θ

imposes stricter accuracy requirements, leading to more internal iterations to achieve a higher control accuracy. Conversely, a larger

Θ

relaxes the accuracy requirements, allowing the system to reach a steady state faster with fewer iterations.

In the

φ

th sub-iteration, where condition (18) is met,

V_{i}^{b} (t) = V_{i}^{b, φ} (t) .

(19)

Otherwise, the adaptive sub-iteration value function is

V_{i}^{b} (t) = V_{i}^{b, Ω} (t) .

(20)

This subsection presents a DIRS algorithm with an adaptive termination mechanism (18), which enables online solution of the HJB equation via alternating policy evaluations and policy improvements. Specifically, the algorithm reduces the number of iterations once sub-iteration convergence is detected, thereby accelerating computation and conserving resources. Moreover, by adjusting the termination threshold, the DIRS algorithm balances computational efficiency and solution accuracy, ensuring reliable approximation of the optimal value function with reduced computational costs.
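The iteration (14)–(20) can be sketched on a toy problem. The example below is not the multi-UAV setting of the paper: it assumes a scalar error with linear dynamics e(t+1) = a e(t) + b u(t), a linear policy u = -K e, and a quadratic value function V(e) = k e^2, so that every step reduces to arithmetic on the scalar k. All names and numerical values are hypothetical.

```python
def dirs_scalar(a=1.0, b=1.0, p=1.0, q=1.0, rho=0.95, K0=0.5,
                theta=1e-10, omega=200, outer_iters=15):
    """Toy DIRS-style iteration for V(e) = k e^2 and u = -K e (assumed forms)."""
    closed_loop0 = a - b * K0
    # The initial policy must be admissible so that the sum in (14) is finite.
    assert rho * closed_loop0**2 < 1.0
    k = (p + q * K0**2) / (1.0 - rho * closed_loop0**2)    # Eq. (14)
    history = [k]          # every value-function coefficient produced
    sub_counts = []        # sub-iterations actually used per outer step
    K = K0
    for _ in range(outer_iters):
        # Policy improvement, Eq. (15): minimize p e^2 + q u^2 + rho k (a e + b u)^2.
        K = rho * a * b * k / (q + rho * b**2 * k)
        stage = p + q * K**2                 # utility under the fixed policy
        decay = rho * (a - b * K)**2         # discounted closed-loop gain
        k_sub = stage + decay * k            # Eq. (17): sub-iteration start
        history.append(k_sub)
        used = omega
        for d in range(1, omega + 1):
            k_next = stage + decay * k_sub   # Eq. (16): policy evaluation step
            history.append(k_next)
            converged = abs(k_next - k_sub) < theta   # Eq. (18)
            k_sub = k_next
            if converged:
                used = d                     # recorded as phi
                break
        sub_counts.append(used)
        k = k_sub                            # Eq. (19) or Eq. (20)
    return k, K, sub_counts, history

k_final, K_final, sub_counts, history = dirs_scalar()
```

In this toy run, the adaptive stop uses far fewer sub-iterations than the fixed budget of omega per outer step, and the recorded coefficients never increase, which is the behavior Theorem 2 establishes for the general algorithm.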

Remark 3.

Compared with existing ADP algorithms [32,35] for which the number of internal iterations needs to be set in advance, the DIRS algorithm proposed in this paper can adaptively and dynamically adjust the iterative process when (18) is satisfied, avoiding unnecessary iterations after the internal iterations converge. Furthermore, the algorithm can satisfy the required control accuracy while performing the minimum number of iterations, thus saving computational resources and improving the efficiency of optimal bipartite containment control.

3.2. Stability, Convergence, and Optimality

In this subsection, the stability, convergence, and optimality properties of the proposed DIRS algorithm are analyzed, and the corresponding theoretical results are rigorously established. These results lay the theoretical foundation for solving the bipartite containment tracking control problem in multi-UAV systems.

Theorem 1.

The bipartite tracking control error

e_{i} (t)

in the multi-UAV system can be asymptotically stabilized under the application of the optimal control input

u_{i}^{*} (t)

, which is obtained via the proposed DIRS algorithm. That is, the following condition holds:

\lim_{t \to \infty} e_{i} (t) = 0 .

(21)

Proof.

According to the DIRS algorithm, the optimal value function

V_{i}^{*}

satisfies the HJB equation (11), from which it follows that

H_{i}^{*} (t) = V_{i}^{*} (t) - ρ V_{i}^{*} (t + 1) .

(22)

Multiply both sides of Equation (22) by

ρ^{k}

:

\begin{matrix} ρ^{k} H_{i}^{*} (t) = ρ^{k} V_{i}^{*} (t) - ρ^{k + 1} V_{i}^{*} (t + 1) . \end{matrix}

(23)

Define

ρ^{k} V_{i}^{*} (t)

as the Lyapunov function for all agents. Consequently, the first-order difference form of the Lyapunov equation is derived as

\begin{matrix} Θ (ρ^{k} V_{i}^{*} (t)) = ρ^{k + 1} V_{i}^{*} (t + 1) - ρ^{k} V_{i}^{*} (t) . \end{matrix}

(24)

By combining Equations (23) and (24), we obtain the following result.

Θ (ρ^{k} V_{i}^{*} (t)) = - ρ^{k} H_{i}^{*} (t) \leq 0 .

(25)

From Equation (25), it is evident that the bipartite tracking control error

e_{i} (t)

of multi-UAV systems is asymptotically stable, i.e.,

e_{i} (t) \to 0

when

t \to \infty

. This indicates that the system driven by the designed DIRS algorithm achieves bipartite tracking control and the agents track the leader’s trajectory.

The proof is completed. □
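The identity behind (22)–(25) can be checked numerically on a toy problem. The sketch below again assumes a scalar error with linear dynamics and a quadratic optimal value function V*(e) = k e^2, and reads the exponent of the discount factor in the Lyapunov function as the time index; both are assumptions made for illustration, not statements about the paper's multi-UAV model.

```python
def riccati_fixed_point(a, b, p, q, rho, iters=500):
    """Iterate k <- p + rho q a^2 k / (q + rho b^2 k) until it settles."""
    k = p
    for _ in range(iters):
        k = p + rho * q * a**2 * k / (q + rho * b**2 * k)
    return k

def check_lyapunov_decrease(a=1.0, b=1.0, p=1.0, q=1.0, rho=0.95,
                            e0=2.0, steps=30):
    k = riccati_fixed_point(a, b, p, q, rho)
    K = rho * a * b * k / (q + rho * b**2 * k)   # greedy gain for V*(e) = k e^2
    e = e0
    max_gap = 0.0
    decreasing = True
    for t in range(steps):
        u = -K * e
        H = p * e**2 + q * u**2                  # optimal utility at time t
        e_next = a * e + b * u
        # First-order difference of rho^t V*(e(t)), cf. Eq. (24).
        diff = rho**(t + 1) * k * e_next**2 - rho**t * k * e**2
        max_gap = max(max_gap, abs(diff + rho**t * H))   # Eq. (25) residual
        decreasing = decreasing and diff <= 0.0
        e = e_next
    return e, max_gap, decreasing

e_final, max_gap, decreasing = check_lyapunov_decrease()
```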

Theorem 2.

The value function, as determined by Equations (14) to (20) of the DIRS algorithm, forms a monotonically non-increasing sequence, satisfying the subsequent conditions:

V_{i}^{b, d + 1} (t) \leq V_{i}^{b, d} (t) .

(26)

V_{i}^{b + 1, d} (t) \leq V_{i}^{b, d} (t) .

(27)

Proof.

Mathematical induction is used to prove inequalities (26) and (27).

As the DIRS algorithm incorporates an adaptive termination mechanism, the maximum number of sub-iterations for the value function becomes uncertain. The algorithm is designed to automatically terminate when the change in the sub-iteration value function is insignificant, recording the maximum number of sub-iterations as

φ

, or

Ω

if the termination condition is never triggered. Therefore, in the subsequent proof, the maximum number of sub-iterations within a single iteration is denoted as ℑ, where

ℑ \in {φ, Ω}

.

Let

b = 1

, and denote the value function as follows:

\begin{matrix} V_{i}^{1, 0} (t) & = H_{i}^{1} (t) + ρ V_{i}^{0} (t + 1) \\ = \min_{u_{i} (t)} {H_{i} (t) + ρ V_{i}^{0} (t + 1)} \\ \leq H_{i}^{0} (t) + ρ V_{i}^{0} (t + 1) \\ = V_{i}^{0} (t) . \end{matrix}

(28)

Furthermore, when

d = 0

, it follows that

\begin{matrix} V_{i}^{1, 1} (t) & = H_{i}^{1} (t) + ρ V_{i}^{1, 0} (t + 1) \\ \leq H_{i}^{1} (t) + ρ V_{i}^{0} (t + 1) \\ = V_{i}^{1, 0} (t) . \end{matrix}

(29)

Let

d = α_{1}

, where

α_{1}

is a positive integer such that

α_{1} \in (0, ℑ - 1]

. From this, it follows that

\begin{matrix} V_{i}^{1, α_{1} + 1} (t) & = H_{i}^{1} (t) + ρ V_{i}^{1, α_{1}} (t + 1) \\ \leq H_{i}^{1} (t) + ρ V_{i}^{1, α_{1} - 1} (t + 1) \\ = V_{i}^{1, α_{1}} (t) . \end{matrix}

(30)

Based on Equations (28) to (30), it is straightforward to demonstrate that inequality (26) from Theorem 2 is satisfied when

b = 1

. Suppose further that inequality (26) from Theorem 2 holds for

b = β

. This leads to

\begin{matrix} V_{i}^{β, d + 1} (t) \leq V_{i}^{β, d} (t) . \end{matrix}

(31)

Consequently, it remains to establish that inequality (26) is also valid for

b = β + 1

:

\begin{matrix} V_{i}^{β + 1, 0} (t) & = H_{i}^{β + 1} (t) + ρ V_{i}^{β} (t + 1) \\ = \min_{u_{i} (t)} {H_{i} (t) + ρ V_{i}^{β} (t + 1)} \\ \leq H_{i}^{β} (t) + ρ V_{i}^{β} (t + 1) \\ = H_{i}^{β} (t) + ρ V_{i}^{β, ℑ} (t + 1) \\ \leq H_{i}^{β} (t) + ρ V_{i}^{β, ℑ - 1} (t + 1) \\ = V_{i}^{β, ℑ} (t) \\ = V_{i}^{β} (t) . \end{matrix}

(32)

By using (32), when

d = 0

,

\begin{matrix} V_{i}^{β + 1, 1} (t) & = H_{i}^{β + 1} (t) + ρ V_{i}^{β + 1, 0} (t + 1) \\ \leq H_{i}^{β + 1} (t) + ρ V_{i}^{β} (t + 1) \\ = V_{i}^{β + 1, 0} (t) . \end{matrix}

(33)

Moreover, for

d = 1

,

\begin{matrix} V_{i}^{β + 1, 2} (t) & = H_{i}^{β + 1} (t) + ρ V_{i}^{β + 1, 1} (t + 1) \\ \leq H_{i}^{β + 1} (t) + ρ V_{i}^{β + 1, 0} (t + 1) \\ = V_{i}^{β + 1, 1} (t) . \end{matrix}

(34)

Similarly, let

d = α_{β}

and

α_{β}

be a positive integer satisfying

α_{β} \in (0, ℑ - 1]

; then, it can be derived that

\begin{matrix} V_{i}^{β + 1, α_{β} + 1} (t) = & H_{i}^{β + 1} (t) + ρ V_{i}^{β + 1, α_{β}} (t + 1) \\ \leq & H_{i}^{β + 1} (t) + ρ V_{i}^{β + 1, α_{β} - 1} (t + 1) \\ = & V_{i}^{β + 1, α_{β}} (t) . \end{matrix}

(35)

From the derivation process outlined above, it can be summarized that when

β = 1, 2, \dots

\begin{matrix} V_{i}^{1} (t) = & V_{i}^{1, ℑ} (t) \leq V_{i}^{1, 0} (t) \leq V_{i}^{0} (t), \\ V_{i}^{2} (t) = & V_{i}^{2, ℑ} (t) \leq V_{i}^{2, 0} (t) \leq V_{i}^{1} (t), \\ ⋮ \\ V_{i}^{β + 1} (t) = & V_{i}^{β + 1, ℑ} (t) \leq V_{i}^{β + 1, 0} (t) \leq V_{i}^{β} (t) . \end{matrix}

(36)

The proof is complete. □

An important lemma is introduced before determining the optimality of the DIRS algorithm.

Lemma 1

([38]). If the sequence

Π_{ı}, ı = 1, 2, \dots

is monotonically non-increasing, then it converges to the same limit as its subsequence.

Theorem 3.

The value function

V_{i} (t)

and the control law

u_{i} (t)

, as determined by Equations (14) to (20) of the DIRS algorithm, converge to the optimal values

V_{i}^{*} (t)

and

u_{i}^{*} (t)

.

\lim_{b \to \infty} V_{i}^{b, d} (t) = R_{i}^{*} (e_{i} (t), u_{i} (t)) .

(37)

\lim_{b \to \infty} u_{i}^{b} (t) = u_{i}^{*} (t) .

(38)

Proof.

Initially, construct the sequence associated with the value function according to the DIRS algorithm as follows:

\begin{matrix} \{V_{i}^{b, d} (t)\} = & \{V_{i}^{0} (t), \\ & V_{i}^{1, 0} (t), V_{i}^{1, 1} (t), \dots, V_{i}^{1, μ} (t), \\ & V_{i}^{1} (t), V_{i}^{2, 0} (t), \dots, V_{i}^{2, μ} (t), \\ & V_{i}^{2} (t), \dots\} . \end{matrix}

(39)

Furthermore, define the sequence of the value function for the outer iterations as follows:

\begin{matrix} \{V_{i}^{b} (t)\} = \{V_{i}^{0} (t), V_{i}^{1} (t), V_{i}^{2} (t), \dots\} . \end{matrix}

(40)

It is evident that the sequence (40) is a subsequence of the sequence (39). Moreover, given that sequence (39) is monotonically non-increasing, it follows from Lemma 1 that

\lim_{b \to \infty} V_{i}^{b, d} (e_{i} (t)) = \lim_{b \to \infty} V_{i}^{b} (e_{i} (t)) .

(41)

Therefore, Theorem 3 to be proven is transformed into

\lim_{b \to \infty} V_{i}^{b} (e_{i} (t)) = V_{i}^{*} (e_{i} (t)) .

(42)

Let the limit of

V_{i}^{b} (t)

as b approaches infinity be denoted by

V_{i}^{\infty} (t)

. Given that sequence (40) is monotonically non-increasing, it follows that

\begin{matrix} V_{i}^{\infty} (e_{i} (t)) = & H_{i}^{\infty} (t) + ρ V_{i}^{\infty} (t + 1) \\ \leq & H_{i}^{b} (t) + ρ V_{i}^{b} (t + 1) \\ = & \min_{u_{i} (t)} (H_{i} (t) + ρ V_{i}^{b} (t + 1)) . \end{matrix}

(43)

As

b \to \infty

, Equation (43) transforms into

\begin{matrix} V_{i}^{\infty} (e_{i} (t)) \leq \min_{u_{i} (t)} (H_{i} (t) + ρ V_{i}^{\infty} (t + 1)) . \end{matrix}

(44)

Further, let

ν > 0

be an arbitrarily small real number. Since the sequence

\{V_{i}^{b} (t)\}

is monotonically non-increasing, there exists a positive iteration index Υ such that

V_{i}^{Υ} (e_{i} (t)) - ν \leq V_{i}^{\infty} (t) \leq V_{i}^{Υ} (t) .

(45)

According to Equation (45), it follows that

\begin{matrix} V_{i}^{\infty} (t) & \geq V_{i}^{Υ} (t) - ν \\ & = H_{i}^{Υ} (t) + ρ V_{i}^{Υ} (t + 1) - ν \\ & \geq H_{i}^{Υ} (t) + ρ V_{i}^{\infty} (t + 1) - ν \\ & \geq \min_{u_{i} (t)} \{H_{i} (t) + ρ V_{i}^{\infty} (t + 1)\} - ν . \end{matrix}

(46)

Based on the prior definition,

ν

is an arbitrarily small real number, thus establishing that

V_{i}^{\infty} (e_{i} (t)) \geq \min_{u_{i} (t)} \{H_{i} (t) + ρ V_{i}^{\infty} (t + 1)\} .

(47)

As Equations (44) and (47) are both satisfied, we derive the following conclusion:

V_{i}^{\infty} (e_{i} (t)) = \min_{u_{i} (t)} \{H_{i} (t) + ρ V_{i}^{\infty} (t + 1)\} .

(48)

It is clear that Equations (9) and (48) have the same right-hand side; hence, it can be concluded that

V_{i}^{\infty} (e_{i} (t)) = R_{i}^{*} (e_{i} (t), u_{i} (t)) .

(49)

The proof of Equation (37) is thereby complete. Since the value function converges to its optimal value, it follows from Equation (12) that the control law likewise converges to the optimal control law, i.e.,

\lim_{b \to \infty} u_{i}^{b} (t) = u_{i}^{*} (t)

.

The proof is complete. □

In this subsection, a data-driven control scheme based on the proposed DIRS algorithm is developed to solve the optimal bipartite containment tracking control problem for multi-UAV systems. The DIRS algorithm incorporates a novel adaptive termination mechanism, which effectively reduces redundant iteration steps and improves the overall computational efficiency. Furthermore, the stability, convergence, and optimality of the proposed control framework are rigorously established using a Lyapunov-based analysis and mathematical induction.

3.3. The Neural Network Framework

The DIRS algorithm is implemented within a neural network framework using an actor–critic architecture to enable data-driven reinforcement learning. Specifically, the critic network is employed to approximate the value function

V_{i}

, while the actor network is used to generate the control input

u_{i}

. In this paper, two independent fully connected feedforward neural networks are used to approximate the optimal value function

V_{i}^{*}

and the optimal control law

u_{i}^{*}

, respectively. These networks are updated using an error back-propagation algorithm.

3.3.1. The Critic Network

The critic network is employed to approximate the optimal value function

V_{i}^{*}

. The input to the critic network consists of the error state

e_{i} (t)

and the control inputs

u_{i} (t)

of the local agent. Accordingly, the output of the critic network is defined as follows:

{\hat{V}}_{i} (h_{i} (t)) = W_{c i} Φ_{i} (h_{i} (t)),

(50)

where

h_{i} (t) = {[e_{i}^{T} (t), u_{i}^{T} (t)]}^{T}

,

Φ_{i} (\cdot)

is the activation function (see Table 1), and

W_{c i}

is the network weight for the critic network.

Subsequently, the error function for the critic network is formulated as

ζ_{c i} (t) = H_{i} (t) + W_{c i} Φ_{i} (h_{i} (t + 1)) - W_{c i} Φ_{i} (h_{i} (t)) .

(51)

Therefore, the objective function within the critic network is defined as

E_{c i} (t) = \frac{1}{2} ζ_{c i} {(t)}^{T} ζ_{c i} (t) .

(52)

Furthermore, utilizing a gradient-descent-based approach, we derive the weight update rule for the critic network:

\begin{matrix} W_{c i}^{b + 1} = & W_{c i}^{b} - Δ W_{c i}^{b} = W_{c i}^{b} - κ_{c i} [H_{i} (t) + {\hat{V}}_{i} (h_{i} (t + 1)) - {\hat{V}}_{i} (h_{i} (t))] \\ \times {[Φ_{i} (h_{i} (t + 1)) - Φ_{i} (h_{i} (t))]}^{T}, \end{matrix}

(53)

where

κ_{c i} > 0

represents the learning rate of the critic network for the ith agent.
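The critic update in (50) to (53) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the activation Φ_i(h) = h^2 (elementwise) and the learning rate 0.01 listed in Table 1, treats W_{ci} as a one-dimensional weight vector, and takes the utility value H_i(t) as an input because its definition lies outside this subsection. The names `phi` and `critic_step` are hypothetical.

```python
import numpy as np


def phi(e, u):
    """Critic activation from Table 1: elementwise squares of h = [e; u]."""
    return np.concatenate([e ** 2, u ** 2])


def critic_step(W_c, H_t, e_t, u_t, e_next, u_next, kappa_c=0.01):
    """One gradient-descent step on E_c = 0.5 * zeta^2, following (51)-(53).

    W_c    : critic weight vector, shape (n + m,)
    H_t    : scalar utility value at time t (supplied by the caller)
    returns: (updated weights, critic error zeta before the update)
    """
    dphi = phi(e_next, u_next) - phi(e_t, u_t)
    zeta = H_t + W_c @ dphi               # Eq. (51)
    W_new = W_c - kappa_c * zeta * dphi   # Eq. (53)
    return W_new, zeta
```

For a single sample, the error after the step equals the error before it scaled by (1 - κ_{ci} ||ΔΦ||^2), so the step reduces the magnitude of ζ_{ci} whenever 0 < κ_{ci} ||ΔΦ||^2 < 2.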

3.3.2. The Actor Network

The actor network is used to approximate the optimal control law

u_{i}^{*}

. The output vector of the actor network is given by

{\hat{u}}_{i} (t) = W_{a i} Ξ_{i} (e_{i} (t)),

(54)

where

Ξ_{i} (\cdot)

denotes the activation function (see Table 1), and

W_{a i}

represents the network weight of the actor network.

When the value function

V_{i}

of the ith agent converges to its optimal form, this implies that the corresponding control law is optimal. Based on this observation, the objective function for the actor neural network is designed as follows:

E_{a i} (t) = \frac{1}{2} ζ_{a i} {(t)}^{T} ζ_{a i} (t),

(55)

where

ζ_{a i} (t) = {\hat{V}}_{i} (t)

.

Similarly, the actor network updates its weights using a gradient-descent-based method, which can be expressed as follows:

\begin{matrix} W_{a i}^{b + 1} = & W_{a i}^{b} - Δ W_{a i}^{b} = W_{a i}^{b} - κ_{a i} W_{c i}^{b} Φ_{i} (h_{i} (t)) W_{c i}^{b} Φ_{i}^{'} C_{i} Ξ_{i}^{T} (e_{i} (t)), \end{matrix}

(56)

where

κ_{a i} > 0

is the learning rate of the actor network,

Φ_{i}^{'} = \partial Φ_{i} (h_{i} (t)) / \partial h_{i} (t)

, and

C_{i} = \partial h_{i} (t) / \partial {\hat{u}}_{i} (t)

is a constant matrix.
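The actor update in (54) to (56) can be sketched in the same way. The code below is an illustration under assumptions, not the authors' implementation: it uses Ξ_i(e) = e and Φ_i(h) = h^2 from Table 1, treats W_{ci} as a one-dimensional vector, and arranges the transposes so that the gradient has the same shape as W_{ai} (the outer product of ∂V̂/∂û with e). The name `actor_step` is hypothetical.

```python
import numpy as np


def actor_step(W_a, W_c, e, kappa_a=0.01):
    """One gradient-descent step on E_a = 0.5 * V_hat^2, following (54)-(56).

    W_a : actor weight matrix, shape (m, n)
    W_c : critic weight vector, shape (n + m,)
    e   : error state, shape (n,)
    returns: (updated actor weights, V_hat before the update)
    """
    n = e.size
    u_hat = W_a @ e                          # Eq. (54) with Xi(e) = e
    m = u_hat.size
    h = np.concatenate([e, u_hat])
    V_hat = W_c @ (h ** 2)                   # Eq. (50) with Phi(h) = h^2
    dPhi_dh = np.diag(2.0 * h)               # Phi' = dPhi/dh
    C = np.vstack([np.zeros((n, m)), np.eye(m)])  # C = dh/du_hat (constant)
    dV_du = W_c @ dPhi_dh @ C                # shape (m,)
    grad = V_hat * np.outer(dV_du, e)        # dE_a/dW_a, shape (m, n)
    return W_a - kappa_a * grad, V_hat
```

With a sufficiently small learning rate, repeating the step for a fixed error state lowers the estimated value V̂_i, which is the quantity the actor objective (55) penalizes.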

This subsection has established a reinforcement learning framework for the online approximation and implementation of the DIRS algorithm. The framework consists of two main components. First, the critic network evaluates the control law using real-time error data and estimates the approximate optimal value function, thereby enabling online learning of the system’s control behavior. Second, the actor network approximates the optimal control law using the learned value function, without relying on explicit system models, thus facilitating model-free optimal control. In the following section, this framework is applied to implementing the DIRS algorithm to achieve optimal bipartite containment tracking.

Remark 4.

It is worth noting that the designed DIRS-based controller is a data-driven model-free online control scheme. As mentioned in the literature [35,39], the neural network architectures introduced to solve the ADP-based control problem have only been used to fit the optimal value function and the optimal control inputs to solve nonlinear, non-analytic HJB equations. The data required for the neural network is generated entirely through the online interaction of the system, without the need for pre-collected or offline datasets. The whole training process is synchronized with the mission, and there is no independent offline phase.

4. Simulation Results

In this section, numerical simulations are conducted to validate the effectiveness of the proposed DIRS-based control framework and the neural network approximation scheme. The communication topology of the multi-UAV system considered, which incorporates both cooperative and competitive interactions, is depicted in Figure 2. In this topology, black arrows denote cooperative relationships, while red arrows represent competitive ones. The blue nodes correspond to the leader UAVs labeled A, B, C, and D, whereas the green nodes indicate the follower UAVs.

In the simulation, the value function parameters are initialized as

P_{i} = 1

and

Q_{i i} = 1

. The dynamic behaviors of both the leader and follower UAVs are modeled according to (4) and (3), with a sampling time of

T = 0.01

. The uncertainties are defined as

ℓ_{1} (t) = 0.03 \times {[\cos (0.3 T t), \sin (0.3 T t)]}^{⊤}

and

ℓ_{2} (t) = 0.0005 \times {[\sin (0.15 T t), \cos (0.15 T t)]}^{⊤}

, while the external disturbances are set as

o_{1} (t) = 0.002 \times {[\sin (T t), \sin (T t)]}^{⊤}

and

o_{2} (t) = 0.002 \times {[\cos (0.85 T t), \cos (1.25 T t)]}^{⊤}

.
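For reproducibility, the uncertainty and disturbance signals above can be generated as follows. This is a minimal sketch that assumes t is the discrete step index and T = 0.01 is the sampling time; the function names are hypothetical.

```python
import numpy as np

T = 0.01  # sampling time from the simulation setup


def uncertainties(t):
    """Return (l1, l2), the two uncertainty vectors at step t."""
    l1 = 0.03 * np.array([np.cos(0.3 * T * t), np.sin(0.3 * T * t)])
    l2 = 0.0005 * np.array([np.sin(0.15 * T * t), np.cos(0.15 * T * t)])
    return l1, l2


def disturbances(t):
    """Return (o1, o2), the two external disturbance vectors at step t."""
    o1 = 0.002 * np.array([np.sin(T * t), np.sin(T * t)])
    o2 = 0.002 * np.array([np.cos(0.85 * T * t), np.cos(1.25 * T * t)])
    return o1, o2
```

All four signals are bounded: the first uncertainty has a constant Euclidean norm of 0.03, and each disturbance component never exceeds 0.002 in magnitude.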

For spatial initialization, the leader UAVs are placed at fixed coordinates

{[14, 14]}^{⊤}

,

{[14, 6]}^{⊤}

,

{[6, 14]}^{⊤}

, and

{[6, 6]}^{⊤}

, whereas the initial positions of the follower UAVs are randomly generated. The hyperparameter variables used in the neural network architecture are summarized in Table 1.

Figure 3 and Figure 4 depict the three-dimensional and two-dimensional trajectories of the optimal bipartite containment tracking for the multi-UAV system, respectively. Under the proposed optimal control strategy, all follower UAVs converge to the bipartite convex hull spanned by the leader UAVs and achieve stable containment tracking. The corresponding bipartite containment tracking errors are illustrated in Figure 5, where the tracking errors of all follower UAVs asymptotically converge to zero, thereby validating the effectiveness of the proposed control law. Furthermore, Figure 6 presents the evolution of the weight vectors of the critic and actor networks, demonstrating the convergence behavior of the neural-network-based approximation process. The enlarged transient views in Figure 5 and Figure 6 show that the system is in the regulation process during the first 6 s, while the DIRS algorithm learns from the system data and continuously generates the control inputs; the system achieves convergence after 6 s.

To further demonstrate the advantage of the designed DIRS in improving the efficiency of ADP computation, the proposed scheme is compared with the method in [34] under the same experimental conditions, and the average number of iterations for each UAV over the whole control process is calculated. In addition, comparative experiments are conducted to assess the impact of the dynamic iteration threshold

Θ

on the iteration efficiency. Finally, the comparison results are shown in Figure 7. It can be seen that compared with the existing ADP algorithm, the introduction of the DIRS can effectively reduce the average number of iterations, and the DIRS with

Θ = 10^{- 6}

reduces the total number of iterations by 88.27% compared to that with the GPI. Meanwhile, the simulation results under different

Θ

values further illustrate that choosing a suitable iteration threshold is important for balancing control accuracy against computational efficiency in practical applications.
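The effect of the threshold on the iteration count can be illustrated with a toy inner loop. The stopping rule below (stop once two successive value estimates differ by less than Θ, capped at the maximum sub-iteration limit Ω = 15 from Table 1) is an assumed stand-in for the DIRS criterion, whose exact form is given by the algorithm equations and is not reproduced here. The scalar recursion p ← q + decay · p and all names are hypothetical.

```python
def evaluate_with_threshold(theta, omega=15, q=1.0, decay=0.225, p0=0.0):
    """Run sweeps p <- q + decay * p until the change drops below theta.

    theta : assumed dynamic iteration threshold
    omega : maximum number of sweeps (sub-iteration limit)
    returns: (final p, number of sweeps used)
    """
    p = p0
    for sweeps in range(1, omega + 1):
        p_next = q + decay * p
        change = abs(p_next - p)
        p = p_next
        if change < theta:
            return p, sweeps
    return p, omega


p_tight, n_tight = evaluate_with_threshold(theta=1e-6)
p_loose, n_loose = evaluate_with_threshold(theta=1e-2)
```

In this toy setting, a looser threshold stops after fewer sweeps at the cost of a larger gap to the fixed point q / (1 - decay), which is the accuracy versus efficiency trade-off discussed above.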

Remark 5.

In this paper, numerical simulations are conducted for the optimal bipartite containment tracking control problem for multi-UAV systems, mainly to verify the effectiveness of the DIRS-based controller. As discussed in [40,41,42], by extending the second-order integrator model, which complies with Newton's second law, and the disturbance model investigated in this paper, the controller could be deployed and further developed in simulators such as Gazebo, Webots, and IsaacSim. This motivates more realistic flight experiments using Gazebo and other tools in future work.

5. Conclusions

In this paper, a data-driven optimal bipartite containment tracking control scheme is developed for multi-UAV systems subject to compound uncertainties. The dynamics of the multi-UAV system and the associated uncertainties are first modeled, followed by formulation of the optimal bipartite containment tracking problem and derivation of the corresponding coupled HJB equations. To alleviate the computational burden, a DIRS is introduced to adaptively adjust the number of iterations, thereby balancing the control accuracy and computational efficiency. Rigorous proofs are provided to establish the stability, convergence, and optimality of the proposed control framework. Furthermore, a neural-network-based online approximation method is employed to realize a model-free solution to the optimal control problem. Finally, the simulation results are also presented and compared with the existing ADP methods to verify the effectiveness of the proposed DIRS in saving computational resources.

The proposed DIRS framework is applicable to multi-UAV systems with compound uncertainties (e.g., external interference and internal modeling errors) and communication topologies that contain both cooperative and competitive interactions. Nonetheless, the framework has several limitations. First, the current design assumes a structurally balanced communication graph, an assumption that may be violated under communication failures or switching topologies. Second, practical constraints such as actuator saturation, failures, and communication delays have not yet been considered. Third, the current results rely on numerical simulations to verify the effectiveness of the control algorithm. Future work will therefore address these issues and extend the validation to flight experiments with a UAV swarm to enhance the robustness and applicability of the framework.

Author Contributions

Methodology, B.C. and M.S.; Writing—original draft, B.C.; Writing—review & editing, M.S., Z.L. and K.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Science Foundation of Sichuan Province (2024NSFSC0021), the Sichuan Science and Technology Programs (MZGC20240139), and the Fundamental Research Funds for the Central Universities (ZYGX2024K028, ZYGX2025K028).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ranjha, A.; Kaddoum, G.; Dev, K. Facilitating URLLC in UAV-Assisted Relay Systems with Multiple-Mobile Robots for 6G Networks: A Prospective of Agriculture 4.0. IEEE Trans. Ind. Inform. 2022, 18, 4954–4965.
2. Fu, R.; Ren, X.; Li, Y.; Wu, Y.; Sun, H.; Al-Absi, M.A. Machine-Learning-Based UAV-Assisted Agricultural Information Security Architecture and Intrusion Detection. IEEE Internet Things J. 2023, 10, 18589–18598.
3. Shao, X.; Du, J.; Xia, Y.; Zhang, Z.; Hou, X.; Debbah, M. Efficient Path-Following for Urban Logistics: A Fuzzy Control Strategy for Consumer UAVs under Disturbance Constraints. IEEE Trans. Consum. Electron. 2025; early access.
4. Li, W.; Yue, J.; Shi, M.; Lin, B.; Qin, K. Neural network-based dynamic target enclosing control for uncertain nonlinear multi-agent systems over signed networks. Neural Netw. 2025, 184, 107057.
5. Yang, L.; Fan, J.; Liu, Y.; Li, E.; Peng, J.; Liang, Z. A review on state-of-the-art power line inspection techniques. IEEE Trans. Instrum. Meas. 2020, 69, 9350–9365.
6. Li, W.; Ren, R.; Shi, M.; Lin, B.; Qin, K. Seeking Secure Adaptive Distributed Discrete-Time Observer for Networked Agent Systems Under External Cyber Attacks. IEEE Trans. Consum. Electron. 2025, 71, 918–930.
7. Thummalapeta, M.; Liu, Y.C. Survey of containment control in multi-agent systems: Concepts, communication, dynamics, and controller design. Int. J. Syst. Sci. 2023, 54, 2809–2835.
8. Santilli, M.; Franceschelli, M.; Gasparri, A. Secure rendezvous and static containment in multi-agent systems with adversarial intruders. Automatica 2022, 143, 110456.
9. Zhang, T.; Zhang, S.; Guo, F.; Zhao, X.; Zhang, F. Prescribed time attitude containment control for satellite cluster with bounded disturbances. ISA Trans. 2023, 137, 160–174.
10. Li, W.; Zhou, S.; Shi, M.; Yue, J.; Lin, B.; Qin, K. Collision avoidance time-varying group formation tracking control for multi-agent systems. Appl. Intell. 2025, 55, 175.
11. Zhang, H.; Zhao, W.; Xie, X.; Yue, D. Dynamic leader–follower output containment control of heterogeneous multiagent systems using reinforcement learning. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 5307–5316.
12. Chen, W.; Wang, Z.; Ding, D.; Ghinea, G.; Liu, H. Distributed Formation-Containment Control for Discrete-Time Multiagent Systems Under Dynamic Event-Triggered Transmission Scheme. IEEE Trans. Syst. Man Cybern. Syst. 2023, 53, 1308–1319.
13. Santilli, M.; Franceschelli, M.; Gasparri, A. Dynamic Resilient Containment Control in Multirobot Systems. IEEE Trans. Robot. 2022, 38, 57–70.
14. Li, W.; Yan, S.; Shi, L.; Yue, J.; Shi, M.; Lin, B.; Qin, K. Multiagent Consensus Tracking Control Over Asynchronous Cooperation–Competition Networks. IEEE Trans. Cybern. 2025, 1–14.
15. Li, W.; Shi, L.; Shi, M.; Lin, B.; Qin, K. Cooperative Formation Tracking of Multi-Agent Systems Over Switching Signed Networks. IEEE Trans. Control Netw. Syst. 2025, 1–11.
16. Wu, Y.; Meng, D.; Wu, Z.G. Disagreement and antagonism in signed networks: A survey. IEEE/CAA J. Autom. Sin. 2022, 9, 1166–1187.
17. Wu, Y.; Mao, Z.; Jiang, B.; Park, J.H.; Shi, P. Prescribed Performance and Safety-Driven Bipartite Formation Containment Control for Marine Aerial-Surface Heterogeneous Systems. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 8743–8756.
18. He, X.; Hou, Z. Bipartite containment tracking for nonlinear MASs under FDI attack based on model-free adaptive iterative learning control. Neurocomputing 2025, 614, 128783.
19. Fan, S.; Peng, F.; Liu, X.; Wang, T.; Qiu, J. Bipartite containment control of multi-agent systems subject to adversarial inputs based on zero-sum game. Inf. Sci. 2024, 681, 121234.
20. Guang, W.; Wang, X.; Zhang, W.; Li, H.; Li, H.; Huang, T. Fixed-Time Optimal Bipartite Containment Control for Stochastic Nonlinear Multiagent Systems with Unknown Hysteresis. IEEE Trans. Autom. Sci. Eng. 2025, 22, 5516–5525.
21. Wang, L.; Yan, H.; Hu, X.; Li, Z.; Wang, M. Fixed-Time Bipartite Containment Control for Heterogeneous Multiagent Systems Under DoS Attacks: An Event-Triggered Mechanism. IEEE Trans. Syst. Man Cybern. Syst. 2025, 55, 2782–2794.
22. Yue, J.; Qin, K.; Shi, M.; Jiang, B.; Li, W.; Shi, L. Event-trigger-based finite-time privacy-preserving formation control for multi-uav system. Drones 2023, 7, 235.
23. Al-Qahtani, F.M.; Aldhaifallah, M.; El Ferik, S.; Saif, A.W.A. Robust FOSMC of a Quadrotor in the Presence of Parameter Uncertainty. Drones 2025, 9, 303.
24. Wang, D.; Gao, N.; Liu, D.; Li, J.; Lewis, F.L. Recent progress in reinforcement learning and adaptive dynamic programming for advanced control applications. IEEE/CAA J. Autom. Sin. 2023, 11, 18–36.
25. Yang, X.; Wei, Q. Adaptive dynamic programming for robust event-driven tracking control of nonlinear systems with asymmetric input constraints. IEEE Trans. Cybern. 2024, 54, 6333–6344.
26. Guo, S.; Pan, Y.; Li, H.; Cao, L. Dynamic Event-Driven ADP for N-Player Nonzero-Sum Games of Constrained Nonlinear Systems. IEEE Trans. Autom. Sci. Eng. 2025, 22, 7657–7669.
27. He, Z.; Hu, J.; Wang, Y.; Cong, J.; Bian, Y.; Han, L. Attitude-Tracking Control for Over-Actuated Tailless UAVs at Cruise Using Adaptive Dynamic Programming. Drones 2023, 7, 294.
28. Lan, Y.; Zhai, Q.; Yan, C.B.; Liu, X.; Guan, X. Robust approximate dynamic programming for large-scale unit commitment with energy storages. IEEE Trans. Autom. Sci. Eng. 2023, 21, 7401–7412.
29. Zhang, Z.; Zhang, K.; Xie, X.; Stojanovic, V. ADP-Based Prescribed-Time Control for Nonlinear Time-Varying Delay Systems with Uncertain Parameters. IEEE Trans. Autom. Sci. Eng. 2025, 22, 3086–3096.
30. Mu, C.; Peng, J. Learning-Based Cooperative Multiagent Formation Control with Collision Avoidance. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 7341–7352.
31. Mu, C.; Peng, J.; Sun, C. Hierarchical Multiagent Formation Control Scheme via Actor-Critic Learning. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 8764–8777.
32. Yang, S.; Yu, F.; Liu, H.; Ma, H.; Zhang, H. Adaptive-Dynamic-Programming-Based Robust Control for a Quadrotor UAV with External Disturbances and Parameter Uncertainties. Appl. Sci. 2023, 13, 12672.
33. Zhang, H.; Ren, H.; Mu, Y.; Han, J. Optimal consensus control design for multiagent systems with multiple time delay using adaptive dynamic programming. IEEE Trans. Cybern. 2021, 52, 12832–12842.
34. Liu, D.; Wei, Q.; Yan, P. Generalized Policy Iteration Adaptive Dynamic Programming for Discrete-Time Nonlinear Systems. IEEE Trans. Syst. Man Cybern. Syst. 2015, 45, 1577–1591.
35. Wei, Q.; Liao, Z.; Shi, G. Generalized Actor-Critic Learning Optimal Control in Smart Home Energy Management. IEEE Trans. Ind. Inform. 2021, 17, 6614–6623.
36. Xue, X.; Yuan, B.; Yi, Y.; Mu, L.; Zhang, Y. Connectivity Preservation and Obstacle Avoidance Control for Multiple Quadrotor UAVs with Limited Communication Distance. Drones 2025, 9, 136.
37. Ren, W.; Beard, R. Consensus seeking in multiagent systems under dynamically changing interaction topologies. IEEE Trans. Autom. Control 2005, 50, 655–661.
38. Zorich, V.A.; Paniagua, O. Mathematical Analysis II; Springer: Berlin/Heidelberg, Germany, 2016.
39. Wang, K.; Mu, C.; Ni, Z.; Liu, D. Safe Reinforcement Learning and Adaptive Optimal Control with Applications to Obstacle Avoidance Problem. IEEE Trans. Autom. Sci. Eng. 2024, 21, 4599–4612.
40. Yogi, S.C.; Behera, L.; Nahavandi, S. Adaptive Intelligent Minimum Parameter Singularity Free Sliding Mode Controller Design for Quadrotor. IEEE Trans. Autom. Sci. Eng. 2024, 21, 1805–1823.
41. Tang, J.; Wan, Y.; Lao, S.; Zhao, Z. A Distributed Autonomous System for Multi-UAVs with Limited Visualization: Employing Dual-Horizon NMPC Controller. IEEE Trans. Aerosp. Electron. Syst. 2024, 60, 6910–6924.
42. Singh, K.; Mehndiratta, M.; Feroskhan, M. QuadPlus: Design, Modeling, and Receding-Horizon-Based Control of a Hyperdynamic Quadrotor. IEEE Trans. Aerosp. Electron. Syst. 2022, 58, 1766–1779.

Figure 1. The DIRS-based optimal bipartite containment tracking control scheme.

Figure 2. The communication topology of the multi-UAV system.

Figure 3. Optimal bipartite containment tracking trajectories in 3D for the multi-UAV system.

Figure 4. The optimal bipartite containment tracking trajectories in 2D for the multi-UAV system.

Figure 5. The trajectories of the optimal bipartite containment tracking control errors. (a)

e_{i, 1}

. (b)

e_{i, 2}

.

Figure 6. Weights of the neural networks. (a) Actor network weights

W_{a i}

. (b) Critic network weights

W_{c i}

.

Figure 7. Average number of iterations per UAV for GPI [34] and DIRS with different thresholds

Θ

.

Table 1. The hyperparameter variables used in the neural network architecture.

Parameter	Critic Network	Actor Network
Learning rate $κ$	$0.01$	$0.01$
Discount factor $ρ$	$0.9$	$0.9$
Maximum sub-iteration limit $Ω$	15	15
Dynamic iteration threshold $Θ$	$1 \times 10^{- 6}$	$1 \times 10^{- 6}$
Activation function	$Φ_{i} (t) = {[e_{i}^{2} (t), u_{i}^{2} (t)]}^{T}$	$Ξ_{i} (t) = e_{i} (t)$
Weight initialization	0	Random numbers in (0, 1]

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
