Article

Robust Optimal Consensus Control for Multi-Agent Systems with Disturbances

1 School of Automation, Chengdu University of Information Technology, Chengdu 610225, China
2 School of Applied Mathematics, Chengdu University of Information Technology, Chengdu 610225, China
* Author to whom correspondence should be addressed.
Drones 2026, 10(2), 78; https://doi.org/10.3390/drones10020078
Submission received: 25 September 2025 / Revised: 11 January 2026 / Accepted: 17 January 2026 / Published: 23 January 2026

Highlights

What are the main findings?
  • A novel improved nonlinear extended state observer (INESO) is designed to achieve real-time estimation of unknown disturbances in multi-agent systems (MASs), significantly enhancing system robustness through an effective compensation mechanism.
  • A momentum-accelerated Actor–Critic network is developed that substantially improves convergence speed and suppresses unstable oscillations during multi-agent consensus.
What are the implications of the main findings?
  • The proposed distributed optimal disturbance rejection control protocol provides a scalable solution for nonlinear MASs under unknown disturbances, overcoming the inherent trade-off between convergence rate and stability in existing methods.
  • This framework establishes a reliable technical foundation for UAV swarm applications in search and rescue, logistics, and inspection missions, demonstrating improved cooperative stability in disturbed environments.

Abstract

The purpose of this article is to develop optimal control strategies for discrete-time multi-agent systems (DT-MASs) with unknown disturbances, with the goal of enhancing their consensus performance and disturbance rejection capabilities. Complex flight conditions, such as the scenario of multi-unmanned aerial vehicle (multi-UAV) maintaining consensus under strong wind gusts, pose significant challenges for MAS control. To address these challenges, this article develops an optimal controller for UAV-based MASs with unknown disturbances to reach consensus. First, a novel improved nonlinear extended state observer (INESO) is designed to estimate disturbances in real time, accompanied by a corresponding disturbance compensation scheme. Subsequently, the consensus error systems and cost functions are established based on the disturbance-free DT-MASs. Building on this, a policy iterative algorithm based on a momentum-accelerated Actor–Critic network is proposed for the disturbance-free DT-MASs to synthesize an optimal consensus controller, whose integration with the disturbance compensation scheme yields an optimal disturbance rejection controller for the disturbance-affected DT-MASs to achieve consensus control. Comparative quantitative analysis demonstrates significant performance improvements over a standard gradient Actor–Critic network: the proposed approach reduces convergence time by 12.8%, improves steady-state position accuracy by 22.7%, enhances orientation accuracy by 42.1%, and reduces overshoot by 22.7%. Finally, numerical simulations confirm the efficacy and superiority of the method.

1. Introduction

In recent years, growing research interest in cooperative control of multi-agent systems (MASs) has emerged, motivated by extensive applications in unmanned aerial vehicles (UAVs) [1], aerospace robotics [2], autonomous vehicles [3], and smart grids [4]. As a cornerstone of cooperative control, consensus problems constitute a fundamental research focus, wherein control protocols ensure state agreement among agents [5,6,7,8]. Notably, the consensus problem of MASs is a comprehensive interdisciplinary field encompassing artificial intelligence [9], computer science [10], control theory [11], and communication technology [12]. In practice, however, achieving consensus alone is insufficient; practical objectives typically require simultaneous optimization of control costs, leading to optimal consensus control frameworks [13,14].
It is well established that optimal consensus control depends on solving the coupled Hamilton–Jacobi–Bellman (HJB) equations, which are generally intractable for analytical solutions [15,16]. Consequently, developing computationally feasible and efficient methods to overcome this challenge remains an open and significant research problem [17]. Fortunately, adaptive dynamic programming (ADP) [18,19,20] has received increasing attention as a powerful machine learning and optimization strategy for addressing the above-mentioned difficulties in the past several decades. This approach tackles optimal control problems forward-in-time assisted by approximators, such as neural networks (NNs). Within the existing literature on ADP control for nonlinear systems, policy iteration (PI) is extensively utilized and is typically implemented via Actor–Critic networks using standard gradient descent learning. An online adaptive PI algorithm in Vamvoudakis and Lewis [21] was developed to solve optimal control problems, utilizing the classic Actor–Critic framework with gradient descent-based learning, and rigorous proofs for stability and convergence were provided. Subsequently, a novel data-based adaptive dynamic programming method was developed in [22] to solve the optimal consensus tracking problem for discrete-time multi-agent systems (DT-MASs) with multiple time delays, which also relies on gradient descent-driven Actor–Critic networks to realize policy iteration. Building upon this line of work, ref. [23] introduced an event-triggered control scheme to alleviate communication burdens while retaining the gradient-based Actor–Critic design for policy learning. Notably, in the context of UAV swarm coordination, ADP-based approaches have shown significant promise. For instance, ref. [24] developed an ADP-based optimal formation control protocol for UAVs, enabling distributed learning without requiring precise system models. Similarly, online ADP algorithms have been proposed to solve the cooperative tracking control problem for multiple UAVs, achieving optimal performance through critic-only network design [25]. These studies demonstrate the potential of ADP in addressing the computational challenges of coupled HJB Equations for MASs, especially in complex platforms like UAVs. While these ADP approaches offer promising solutions for optimal consensus control, they face significant challenges when applied to practical MASs operating in disturbed environments. A critical limitation in existing ADP-based consensus control approaches is their inability to simultaneously achieve fast convergence and stable performance [26]. The standard gradient descent methods employed in Actor–Critic networks often result in either prolonged training times or oscillatory behavior, severely constraining their applicability in real-time multi-agent scenarios where both rapid consensus and operational stability are paramount.
Beyond these inherent challenges in ADP methods, practical implementations of MASs inevitably encounter unknown disturbances, which critically impede optimal consensus control [27,28,29]. This is particularly acute in UAV swarm applications, where agents are exposed to time-varying wind gusts, payload uncertainties, and complex aerodynamic interactions. For instance, ref. [30] investigated the impact of wind disturbances on UAV formation stability, while [31] addressed robust trajectory tracking for quadrotors under both model uncertainties and external disturbances. For dynamical systems subject to external disturbances, the robust control problem can be reformulated as an optimal control framework through proper selection of the cost function [32]. However, this method requires a priori knowledge of the disturbances bound, which may reduce the actual robustness. In [33], an enhanced ESO-based feedback control scheme was developed, specifically designed for multi-input multi-output (MIMO) systems to improve disturbance compensation. Building upon this foundation, the active disturbance rejection control (ADRC) method proposed in [34] utilizes an extended state observer (ESO) to simultaneously estimate both exogenous disturbances and system uncertainties. In [35], an ESO is designed for a single nonlinear system to simultaneously estimate system states and total disturbances, with real-time compensation achieved. The study in [36] addresses the tracking control problem for uncertain surface vessels (SVs) subject to external disturbances by proposing a disturbance observer (DO)-based optimal control scheme with ADP. An intelligent transportation system disturbance suppression method was proposed in [37], where the event-triggered observer was used to estimate the disturbances, and the upper bound of the disturbances was processed by the adaptive compensation law. This method avoided the Zeno behavior while ensuring the boundedness of the tracking error.
Despite significant advances in optimal consensus control, ADP frameworks, and ESO-based disturbance estimation for MASs, critical limitations persist in existing methodologies. First, Actor–Critic training strategies relying on standard gradient descent exhibit an inherent trade-off between convergence rate and stability, where low learning rates slow convergence while high learning rates induce destabilizing oscillations [21,23,38], which severely constrains the real-time control performance of MASs. Second, conventional observers, typically based on linear assumptions, may struggle to achieve rapid and accurate estimation when dealing with the complex, nonlinear, and time-varying disturbances commonly encountered in UAV swarms [37,39], such as wind gusts, thereby compromising disturbance rejection and degrading system performance. To holistically address these dual challenges, this article introduces an integrated solution framework. The principal innovations are summarized as follows:
  • A novel improved nonlinear extended state observer (INESO) is designed to realize the real-time estimation of the unknown disturbances of DT-MASs. The unknown disturbances are effectively handled by a compensation mechanism, and the robustness of the systems is significantly improved.
  • A momentum-accelerated Actor–Critic network is developed for DT-MASs to accelerate state convergence and suppress unstable behaviors. The problem of a slow convergence rate of the standard gradient descent method is effectively overcome by historical gradient information, and the state convergence time is shorter for DT-MASs to achieve stability.
  • A distributed optimal disturbance rejection consensus control protocol is proposed for DT-MASs subject to unknown disturbances to reach consensus. Relying solely on local neighbor interactions, this protocol guarantees scalability to arbitrarily large-scale networks.
The remainder of this article is organized as follows. The preliminaries on graph theory and neural networks are introduced in Section 2. The dynamics of DT-MASs under unknown disturbances and the definition of consensus, along with the underlying research motivation, are presented in Section 3. In Section 4, an INESO is designed to estimate the disturbances, and a momentum-accelerated Actor–Critic network and an optimal control protocol are proposed for DT-MASs to reach consensus. Simulations and discussions follow in Section 5 and Section 6. Finally, Section 7 concludes this work.

2. Preliminaries

2.1. Algebraic Graph Theory

The communication topology among agents is described by a directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathcal{A})$, where $\mathcal{V} = \{1, 2, \ldots, N\}$ is the node set, $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ denotes the edge set, and $\mathcal{A} = [a_{ij}] \in \mathbb{R}^{N \times N}$ represents the weighted adjacency matrix, where $a_{ij} > 0$ iff $(j, i) \in \mathcal{E}$, otherwise $a_{ij} = 0$. Self-loops are excluded, i.e., $a_{ii} = 0$. The neighbor set of node $i \in \mathcal{V}$ is $N_i = \{j \mid (j, i) \in \mathcal{E}\}$, where agent $j$ is a neighbor of agent $i$. The in-degree matrix $D = \mathrm{diag}\{d_1, \ldots, d_i, \ldots, d_N\}$ is defined by $d_i = \sum_{j \in N_i} a_{ij}$, yielding the Laplacian matrix $L = D - \mathcal{A}$. A directed graph is said to contain a spanning tree if there exists at least one node, called the root, from which there is a directed path to every other node. This structural feature is essential for ensuring global information propagation throughout MASs.
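To make these constructions concrete, the following minimal Python sketch builds the adjacency, in-degree, and Laplacian matrices for a hypothetical four-node chain digraph and checks the spanning-tree condition via reachability; the topology and weights here are illustrative only, not those used in Section 5.

import numpy as np

# Hypothetical weighted digraph with N = 4 nodes; an edge (j, i) means
# information flows from j to i, encoded as a_ij = A[i, j] > 0.
N = 4
A = np.zeros((N, N))
A[1, 0] = A[2, 1] = A[3, 2] = 1.0        # chain 0 -> 1 -> 2 -> 3

D = np.diag(A.sum(axis=1))               # in-degree matrix, d_i = sum_j a_ij
L = D - A                                # Laplacian L = D - A

# A digraph contains a spanning tree iff some root reaches every node;
# entry [r, c] of (I + A^T)^(N-1) is positive iff r reaches c.
R = np.linalg.matrix_power(np.eye(N) + A.T, N - 1)
has_spanning_tree = bool((R > 0).all(axis=1).any())
print(L)
print("spanning tree:", has_spanning_tree)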

2.2. Neural Network

Neural networks (NNs) are widely recognized for their capability to universally approximate any continuous function $F(X): \mathbb{R}^n \to \mathbb{R}^m$ with arbitrary accuracy over a compact subset $\mathcal{X} \subseteq \mathbb{R}^n$. The NN-based approximation of $F(X)$ can be expressed as

$$F(X) = W^{T} S(X),  (1)$$

where $W \in \mathbb{R}^{p \times m}$ represents the weight matrix with $p$ neurons, and $S(X) = [S_1(X), S_2(X), \ldots, S_p(X)]^{T} \in \mathbb{R}^p$ denotes the vector of Gaussian basis functions.
For the NN approximation (1), there exists an ideal weight matrix $W^*$, such that $F(X)$ can be reformulated as

$$F(X) = W^{*T} S(X) + \delta(X),  (2)$$

where $\delta(X) \in \mathbb{R}^m$ is the approximation error satisfying $\|\delta(X)\| \leq \vartheta$, and $\vartheta$ is a small positive constant.
The ideal weight matrix $W^*$, introduced solely for analytical purposes, is defined as the solution to the following optimization problem

$$W^* := \arg\min_{W \in \mathbb{R}^{p \times m}} \sup_{X \in \Omega} \| F(X) - W^{T} S(X) \|,  (3)$$

where $\sup_{X \in \Omega}$ denotes the supremum of the approximation error over the compact set $\Omega$. This means that $W^*$ is chosen to minimize the worst-case approximation error across the entire operating region $\Omega$, ensuring uniform approximation performance.
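As a brief illustration of approximation (1), the sketch below evaluates $W^{T} S(X)$ with Gaussian basis functions; the number of neurons, the centers, the width, and the random weights are all illustrative assumptions rather than values from this article.

import numpy as np

def gaussian_basis(X, centers, width=1.0):
    # S(X): p Gaussian radial basis functions centered at 'centers'.
    return np.exp(-np.sum((centers - X) ** 2, axis=1) / (2.0 * width ** 2))

rng = np.random.default_rng(0)
p, n, m = 5, 2, 1                        # neurons, input dim, output dim
centers = rng.uniform(-1.0, 1.0, (p, n))
W = rng.standard_normal((p, m))          # weight matrix W in R^{p x m}

X = np.array([0.3, -0.2])
F_hat = W.T @ gaussian_basis(X, centers) # F(X) ~ W^T S(X), Eq. (1)
print(F_hat)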

3. Problem Formulation

3.1. Modeling Assumptions and Motivation

The multi-agent coordination problem in drone swarms is intrinsically rooted in three-dimensional space. However, to elucidate the core control mechanisms with enhanced clarity and prevent analytical interference from the holistic complexity of three-dimensional dynamics, this study strategically confines both system modeling and convergence analysis to a two-dimensional plane. This dimensional reduction transcends mere formal simplification, constituting a focused methodological approach that isolates motion space dimensions weakly correlated with collaborative consensus, thereby intensively investigating the pivotal scientific challenge of achieving state consensus under wind disturbance conditions. The rationale for this approach stems fundamentally from the dominance of horizontal lateral components in both planar coordinated dynamics among agents and wind disturbances. The design and verification within this two-dimensional framework facilitate thorough characterization of algorithmic performance at the theoretical stage, establishing a robust foundation for subsequent extension to three-dimensional space. Accordingly, the validity of this dimensionally reduced modeling paradigm is substantiated through the following two principal aspects:
  • Application Context: The primary motivation stems from a broad category of real-world applications where drone swarms predominantly operate in planar configurations. A representative example is precision agriculture, where coordinated drone formations are deployed for monitoring vast croplands. In such missions, drones typically maintain a predetermined constant altitude to achieve uniform area coverage, thereby effectively constraining their operational space to an approximately horizontal plane. While minor altitude adjustments may occur, the core coordination tasks—formation maintenance, collision avoidance, and consensus on flight paths—remain fundamentally two-dimensional problems within the ( x , y ) plane. Focusing on this plane enables the isolation and rigorous analysis of swarm responses to lateral wind disturbances, which constitute the primary source of trajectory deviations in outdoor environments.
  • Methodological Rationale: This reduction from 3D to 2D represents an established and justified methodology in robotics and control theory for initial theoretical development. The 2D model preserves the fundamental challenges of multi-agent consensus, such as nonlinear dynamics and disturbance rejection, while significantly enhancing analytical clarity and reducing computational overhead in simulations [24]. It serves as a critical and necessary first step toward understanding the core principles of the proposed coordination algorithm before undertaking more complex 3D extensions. We explicitly acknowledge that a full 3D model, incorporating turbulent wind fields and vertical dynamics, remains the ultimate objective for real-world deployment. The insights gained from this 2D study will provide a solid foundation for such future work.
In conclusion, the 2D modeling paradigm adopted in this article is a deliberate and justified strategy to make a clear, verifiable, and significant contribution to the theory of robust cooperative control. It allows us to answer critical scientific questions with a level of rigor that would be severely compromised in a more complex setting. We explicitly acknowledge that the final goal is 3D deployment, and the results presented here constitute a vital and necessary step toward that ambitious objective. The subsequent discussion and future work sections will outline the specific pathway for this 3D extension, building directly upon the theoretical foundation established in this study.

3.2. System Dynamics

Assume that UAV-based DT-MASs comprise a disturbance-free leader and N followers subject to unknown disturbances, and that all UAVs are flying at the same altitude. Given these assumptions, the system can be modeled in a two-dimensional plane [24], and the dynamics of the i-th follower, which is also the i-th agent, can be governed by
$$\begin{pmatrix} x_i(k+1) \\ y_i(k+1) \\ \theta_i(k+1) \end{pmatrix} = \begin{pmatrix} x_i(k) \\ y_i(k) \\ \theta_i(k) \end{pmatrix} + T \begin{pmatrix} \cos(\theta_i(k)) & 0 \\ \sin(\theta_i(k)) & 0 \\ 0 & 1 \end{pmatrix} \left[ \begin{pmatrix} v_i(k) \\ w_i(k) \end{pmatrix} + \begin{pmatrix} d_{i1}(k) \\ d_{i2}(k) \end{pmatrix} \right],  (4)$$

where the state vector of agent $i$ ($i \in \mathcal{V}$) is defined by the vector $(x_i(k), y_i(k), \theta_i(k))^{T} \in \mathbb{R}^3$, $T$ is the sampling period, the control input vector is represented by $(v_i(k), w_i(k))^{T} \in \mathbb{R}^2$, and the unknown disturbances are denoted by $d_{i1}(k) \in \mathbb{R}$ and $d_{i2}(k) \in \mathbb{R}$. Specifically, $x_i(k)$ and $y_i(k)$ are the position coordinates representing the UAV's location in the Cartesian plane, while $\theta_i(k)$ is the orientation angle denoting the UAV's heading direction, measured counterclockwise from the positive $x$-axis. The control input $(v_i(k), w_i(k))^{T}$ consists of $v_i(k)$, the linear velocity along the current heading direction $\theta_i(k)$, and $w_i(k)$, the angular velocity controlling the rate of orientation change. The terms $d_{i1}(k)$ and $d_{i2}(k)$ represent unknown but bounded input disturbances affecting the linear velocity and angular velocity channels, respectively, which may arise from environmental factors, model uncertainties, or external disturbances.
The operator $\Lambda_i(k)$ is defined as the transformation matrix that converts the control inputs from the UAV's body-fixed frame to the global inertial frame. This matrix is constructed based on the current orientation angle $\theta_i(k)$ and has the specific form:

$$\Lambda_i(k) = \begin{pmatrix} \cos(\theta_i(k)) & 0 \\ \sin(\theta_i(k)) & 0 \\ 0 & 1 \end{pmatrix}.$$

The first column, containing $\cos(\theta_i(k))$ and $\sin(\theta_i(k))$, projects the linear velocity $v_i(k)$ onto the global $x$ and $y$ axes, respectively. The second column maps the angular velocity $w_i(k)$ directly to the orientation dynamics, as orientation is already defined in the global frame.
If $(x_i(k), y_i(k), \theta_i(k))^{T} = X_i(k)$, $(v_i(k), w_i(k))^{T} = u_i(k)$, and $(d_{i1}(k), d_{i2}(k))^{T} = d_i(k)$, then system (4) can be compactly expressed as

$$X_i(k+1) = X_i(k) + T \Lambda_i(k) (u_i(k) + d_i(k)),  (5)$$

where $\Lambda_i(k)$ is the transformation matrix defined above. Similarly, the dynamics of the leader can be expressed as

$$X_0(k+1) = X_0(k) + T \Lambda_0(k) u_0(k),  (6)$$

where $X_0(k) = (x_0(k), y_0(k), \theta_0(k))^{T} \in \mathbb{R}^3$, $\Lambda_0(k)$ is defined in the same manner as $\Lambda_i(k)$, and $u_0(k) = (v_0(k), w_0(k))^{T} \in \mathbb{R}^2$.
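For clarity, the following minimal Python sketch implements one step of the compact model (5); the function and variable names are illustrative, and the leader values are those used later in Section 5.

import numpy as np

def uav_step(X, u, d, T=0.005):
    # One step of the discrete planar UAV model (5):
    # X = (x, y, theta), u = (v, w), d = (d1, d2) input disturbances.
    theta = X[2]
    Lam = np.array([[np.cos(theta), 0.0],   # Lambda_i(k): body-to-global map
                    [np.sin(theta), 0.0],
                    [0.0,           1.0]])
    return X + T * Lam @ (u + d)

# The leader (6) uses the same map with zero disturbance.
X0 = np.array([2.0, 1.0, np.pi / 3])
X0_next = uav_step(X0, u=np.array([0.5, 0.05]), d=np.zeros(2))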

3.3. Control Objective

In order to investigate the robust optimal consensus control of DT-MASs (5) and (6), the consensus error system for agent $i$ is defined as

$$\varepsilon_i(k) = \sum_{j \in N_i} a_{ij} (X_i(k) - X_j(k)) + a_{i0} (X_i(k) - X_0(k)).  (7)$$

In distributed control frameworks, each agent's decision-making relies on both its own state information and that received from neighboring agents within the network. Consequently, for the DT-MASs described by (5) and (6), the performance index associated with agent $i$ is influenced not only by its own control input $u_i(k)$ but also by the control input $u_j(k)$ of its neighbor $j$ and the consensus error system $\varepsilon_i(k)$. To capture this interdependence, the cost function for agent $i$ is defined as

$$J_i(\varepsilon_i(k), u_i(k), u_j(k)) = \sum_{l=k}^{\infty} \mu^{(l-k)} C_i(\varepsilon_i(l), u_i(l), u_j(l)) = \sum_{l=k}^{\infty} \mu^{(l-k)} \Big( \varepsilon_i^{T}(l) S_{ii} \varepsilon_i(l) + u_i^{T}(l) R_{ii} u_i(l) + \sum_{j \in N_i} u_j^{T}(l) R_{ij} u_j(l) \Big),  (8)$$

where $C_i(\varepsilon_i(l), u_i(l), u_j(l))$ is the instantaneous cost function at step $l$, $S_{ii} \succ 0$, $R_{ii} \succ 0$, and $R_{ij} \succ 0$ are all positive definite symmetric matrices, and $0 < \mu \leq 1$ is the discount factor.
The control objective is to design the control input $u_i(k)$, such that the cost function (8) is minimized while ensuring that the DT-MASs (5) and (6) satisfy the consensus condition, thereby achieving robust optimal consensus control. The consensus condition is given as

$$\lim_{k \to +\infty} \| X_i(k) - X_0(k) \| \leq \varrho, \quad \forall i \in \mathcal{V},  (9)$$

where $\varrho$ is an arbitrarily assigned nonnegative constant that represents the maximum allowable state error for the DT-MASs (5) and (6). The control input $u_i(k)$ can constitute an optimal consensus controller for the $i$-th agent when the DT-MASs (5) and (6) satisfy the consensus condition (9).
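A minimal sketch of how the local error (7) and the instantaneous cost in (8) can be evaluated from neighbor information is given below; the helper names are hypothetical, and X is assumed to hold the follower states, with A and a0 the adjacency weights.

import numpy as np

def consensus_error(i, X, X0, A, a0):
    # Local consensus error (7) for agent i, built from neighbor data only.
    e = a0[i] * (X[i] - X0)
    for j in range(len(X)):
        e = e + A[i, j] * (X[i] - X[j])
    return e

def stage_cost(e_i, u_i, u_neighbors, S_ii, R_ii, R_ij):
    # Instantaneous cost C_i in (8): consensus error, own control effort,
    # and the neighbors' control effort.
    c = e_i @ S_ii @ e_i + u_i @ R_ii @ u_i
    for u_j in u_neighbors:
        c = c + u_j @ R_ij @ u_j
    return float(c)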

4. Main Results

This section first designs an INESO for real-time estimation of unknown disturbances and then constructs a disturbance compensation scheme. Subsequently, a momentum-accelerated Actor–Critic network is proposed for the disturbance-free DT-MASs to synthesize an optimal consensus controller, whose integration with the disturbance compensation scheme yields an optimal disturbance rejection controller for the disturbance-affected DT-MASs to achieve consensus control.

4.1. Improved Nonlinear Extended State Observer

To estimate the unknown disturbance $d_i(k)$, an INESO is designed as

$$\begin{aligned} \hat{X}_i(k+1) &= \hat{X}_i(k) + T \Lambda_i(k) (u_i(k) + \hat{d}_i(k)) + T \alpha_i \Psi_1(\xi_i(k), \varsigma_i, \gamma_i, \beta_i), \\ \hat{d}_i(k+1) &= \hat{d}_i(k) + T \zeta_i \Lambda_i^{T}(k) \Psi_2(\xi_i(k), \lambda_i, \eta_i, \chi_i), \end{aligned}  (10)$$

where
  • $\hat{X}_i(k) \in \mathbb{R}^3$ is the estimate of $X_i(k)$;
  • $\hat{d}_i(k) \in \mathbb{R}^2$ is the estimate of $d_i(k)$;
  • $\xi_i(k) = X_i(k) - \hat{X}_i(k) \in \mathbb{R}^3$ is the state estimation error vector;
  • $\alpha_i > 0$, $\zeta_i > 0$, $0 < \gamma_i < 1$, $\beta_i \geq 0$, $0 < \eta_i < 1$, $\chi_i \geq 0$ are tunable parameters;
  • $\varsigma_i$ and $\lambda_i$ are arbitrarily small positive constants;
  • $\Psi_1(\xi_i(k), \varsigma_i, \gamma_i, \beta_i) \in \mathbb{R}^3$ and $\Psi_2(\xi_i(k), \lambda_i, \eta_i, \chi_i) \in \mathbb{R}^3$ are nonlinear functions defined by

$$\Psi_1(\xi_i(k), \varsigma_i, \gamma_i, \beta_i) = \begin{cases} (\|\xi_i(k)\|^{\gamma_i - 1} + \beta_i) \xi_i(k), & \|\xi_i(k)\| > \varsigma_i, \\ (\varsigma_i^{-(1-\gamma_i)} + \beta_i) \xi_i(k), & \|\xi_i(k)\| \leq \varsigma_i, \end{cases}  (11)$$

$$\Psi_2(\xi_i(k), \lambda_i, \eta_i, \chi_i) = \begin{cases} (\|\xi_i(k)\|^{\eta_i - 1} + \chi_i) \xi_i(k), & \|\xi_i(k)\| > \lambda_i, \\ (\lambda_i^{-(1-\eta_i)} + \chi_i) \xi_i(k), & \|\xi_i(k)\| \leq \lambda_i. \end{cases}  (12)$$
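A compact Python sketch of one INESO update (10) follows; it assumes the correction signs reconstructed above (estimates driven toward the measured state) and uses the parameter values of Section 5 as defaults.

import numpy as np

def nl_gain(err_norm, thresh, power, offset):
    # Piecewise gain of Psi_1/Psi_2 in (11)-(12): a fractional-power gain
    # for large errors, a fixed linearized gain inside the threshold band.
    s = err_norm if err_norm > thresh else thresh
    return s ** (power - 1.0) + offset

def ineso_step(X, X_hat, d_hat, u, Lam, T=0.005, alpha=1.5, zeta=100.0,
               varsigma=0.5, gamma=0.1, beta=0.1, lam=0.1, eta=0.01, chi=0.1):
    # One update of the INESO (10).
    xi = X - X_hat                        # state estimation error xi_i(k)
    n = np.linalg.norm(xi)
    X_hat_next = (X_hat + T * Lam @ (u + d_hat)
                  + T * alpha * nl_gain(n, varsigma, gamma, beta) * xi)
    d_hat_next = d_hat + T * zeta * nl_gain(n, lam, eta, chi) * (Lam.T @ xi)
    return X_hat_next, d_hat_next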
Theorem 1.
Consider the DT-MASs (5) equipped with the INESO (10). If the following conditions hold:
(i) The disturbance difference $\Delta d_i(k) = d_i(k+1) - d_i(k)$ is bounded, i.e., there exists a positive constant $\bar{d}_i$ such that $\|\Delta d_i(k)\| \leq \bar{d}_i$ for all $k$;
(ii) The parameters $\alpha_i$, $\zeta_i$, $\varsigma_i$, $\gamma_i$, $\beta_i$, $\lambda_i$, $\eta_i$, and $\chi_i$ are chosen such that the induced matrix norm (or spectral radius) of $A_{INESO}(k)$ satisfies

$$\| A_{INESO}(k) \| \leq \sigma < 1, \quad \text{for all } k,  (13)$$

where $\sigma$ is a positive constant,

$$A_{INESO}(k) = \begin{pmatrix} (1 - T \alpha_i h(\xi_i(k))) I_{3 \times 3} & T \Lambda_i(k) \\ -T \zeta_i g(\xi_i(k)) \Lambda_i^{T}(k) & I_{2 \times 2} \end{pmatrix},  (14)$$

$$h(\xi_i(k)) = \begin{cases} \|\xi_i(k)\|^{\gamma_i - 1} + \beta_i, & \|\xi_i(k)\| > \varsigma_i, \\ \varsigma_i^{-(1-\gamma_i)} + \beta_i, & \|\xi_i(k)\| \leq \varsigma_i, \end{cases}$$

and

$$g(\xi_i(k)) = \begin{cases} \|\xi_i(k)\|^{\eta_i - 1} + \chi_i, & \|\xi_i(k)\| > \lambda_i, \\ \lambda_i^{-(1-\eta_i)} + \chi_i, & \|\xi_i(k)\| \leq \lambda_i, \end{cases}$$

then the estimation error $\Gamma_i(k) = [\xi_i^{T}(k), \varphi_i^{T}(k)]^{T} = [(X_i(k) - \hat{X}_i(k))^{T}, (d_i(k) - \hat{d}_i(k))^{T}]^{T}$ is uniformly ultimately bounded. Moreover, there exists a finite time step $K$ such that for all $k > K$, the estimation error is bounded by $\|\Gamma_i(k)\| \leq \kappa_i$, where the ultimate bound $\kappa_i$ is given by

$$\kappa_i = \frac{\bar{d}_i}{1 - \sigma}.  (15)$$
Proof. 
From the system dynamics (5) and the INESO (10), the state estimation error satisfies

$$\begin{aligned} \xi_i(k+1) &= X_i(k+1) - \hat{X}_i(k+1) \\ &= [X_i(k) + T \Lambda_i(k) (u_i(k) + d_i(k))] - [\hat{X}_i(k) + T \Lambda_i(k) (u_i(k) + \hat{d}_i(k)) + T \alpha_i \Psi_1(\xi_i(k), \varsigma_i, \gamma_i, \beta_i)] \\ &= \xi_i(k) + T \Lambda_i(k) \varphi_i(k) - T \alpha_i \Psi_1(\xi_i(k), \varsigma_i, \gamma_i, \beta_i), \end{aligned}  (16)$$

where $\varphi_i(k) = d_i(k) - \hat{d}_i(k) \in \mathbb{R}^2$.
Note that based on the definition of $\Psi_1(\xi_i(k), \varsigma_i, \gamma_i, \beta_i)$, one can obtain

$$\Psi_1(\xi_i(k), \varsigma_i, \gamma_i, \beta_i) = h(\xi_i(k)) \xi_i(k),  (17)$$

where $h(\xi_i(k))$ is defined in Theorem 1. Therefore, Equation (16) can be rewritten as

$$\xi_i(k+1) = \xi_i(k) + T \Lambda_i(k) \varphi_i(k) - T \alpha_i h(\xi_i(k)) \xi_i(k) = (1 - T \alpha_i h(\xi_i(k))) I_3 \xi_i(k) + T \Lambda_i(k) \varphi_i(k).  (18)$$

Furthermore, the disturbance estimation error satisfies

$$\begin{aligned} \varphi_i(k+1) &= d_i(k+1) - \hat{d}_i(k+1) \\ &= [d_i(k) + \Delta d_i(k)] - [\hat{d}_i(k) + T \zeta_i \Lambda_i^{T}(k) \Psi_2(\xi_i(k), \lambda_i, \eta_i, \chi_i)] \\ &= \varphi_i(k) + \Delta d_i(k) - T \zeta_i \Lambda_i^{T}(k) \Psi_2(\xi_i(k), \lambda_i, \eta_i, \chi_i). \end{aligned}  (19)$$

Based on the definition of $\Psi_2(\xi_i(k), \lambda_i, \eta_i, \chi_i)$, one can get

$$\Psi_2(\xi_i(k), \lambda_i, \eta_i, \chi_i) = g(\xi_i(k)) \xi_i(k),  (20)$$

where $g(\xi_i(k))$ is defined in Theorem 1. Therefore, Equation (19) can be rewritten as

$$\varphi_i(k+1) = -T \zeta_i g(\xi_i(k)) \Lambda_i^{T}(k) \xi_i(k) + \varphi_i(k) + \Delta d_i(k).  (21)$$

Combining Equations (18) and (21), the dynamics of the estimation error $\Gamma_i(k)$ can be written as

$$\Gamma_i(k+1) = A_{INESO}(k) \Gamma_i(k) + \begin{pmatrix} 0_{3 \times 1} \\ \Delta d_i(k) \end{pmatrix},  (22)$$

where $A_{INESO}(k)$ is defined as in (14).
Taking the norm on both sides of (22), and applying the triangle inequality and the sub-multiplicative property of the induced norm yields

$$\| \Gamma_i(k+1) \| \leq \| A_{INESO}(k) \| \| \Gamma_i(k) \| + \| \Delta d_i(k) \|.  (23)$$

Since $\| A_{INESO}(k) \| \leq \sigma < 1$ and $\| \Delta d_i(k) \| \leq \bar{d}_i$, then

$$\| \Gamma_i(k+1) \| \leq \sigma \| \Gamma_i(k) \| + \bar{d}_i.  (24)$$

This is a stable linear difference inequality. Solving it recursively leads to

$$\| \Gamma_i(k) \| \leq \sigma^{k} \| \Gamma_i(0) \| + \bar{d}_i \sum_{l=0}^{k-1} \sigma^{l}  (25)$$

$$\leq \sigma^{k} \| \Gamma_i(0) \| + \frac{\bar{d}_i}{1 - \sigma}.  (26)$$

As $k \to \infty$, the first term $\sigma^{k} \| \Gamma_i(0) \| \to 0$ due to $\sigma < 1$. Therefore, for any $\phi > 0$, there exists a positive integer $K$ such that for all $k > K$, the estimation error $\Gamma_i(k)$ is uniformly ultimately bounded by

$$\| \Gamma_i(k) \| \leq \frac{\bar{d}_i}{1 - \sigma} + \phi.  (27)$$

Moreover, inequality (26) provides a global and explicit bound for all $k$, confirming that $\| \Gamma_i(k) \|$ ultimately does not exceed the finite constant $\kappa_i = \bar{d}_i / (1 - \sigma)$. This completes the proof.    □
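As a quick numerical illustration of the ultimate bound (15), with assumed values $\sigma = 0.9$ and $\bar{d}_i = 0.05$ (chosen for illustration only, not taken from the simulations):

$$\kappa_i = \frac{\bar{d}_i}{1 - \sigma} = \frac{0.05}{1 - 0.9} = 0.5,$$

so the estimation error norm eventually remains within 0.5; tightening $\sigma$ through the observer gains shrinks this bound proportionally.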

4.2. Optimal Consensus Control

Based on the system (5), consider the disturbance-free dynamics of the $i$-th follower, which can be expressed as

$$X_i(k+1) = X_i(k) + T \Lambda_i(k) u_i(k),  (28)$$

and the consensus error system for the $i$-th follower is represented as

$$e_i(k) = \sum_{j \in N_i} a_{ij} (X_i(k) - X_j(k)) + a_{i0} (X_i(k) - X_0(k)).  (29)$$

To achieve distributed optimal consensus, each follower minimizes a cost function, which is

$$J_i(e_i(k), u_i(k), u_j(k)) = \sum_{l=k}^{\infty} \mu^{(l-k)} C_i(e_i(l), u_i(l), u_j(l)) = \sum_{l=k}^{\infty} \mu^{(l-k)} \Big( e_i^{T}(l) S_{ii} e_i(l) + u_i^{T}(l) R_{ii} u_i(l) + \sum_{j \in N_i} u_j^{T}(l) R_{ij} u_j(l) \Big).  (30)$$

For simplicity, let $J_i(e_i(k), u_i(k), u_j(k)) = J_i(e_i(k))$ and $C_i(e_i(k), u_i(k), u_j(k)) = C_i(e_i(k), u_i(k))$. The cost function then satisfies the Bellman equation

$$J_i(e_i(k)) = C_i(e_i(k), u_i(k)) + \mu J_i(e_i(k+1)),  (31)$$

and the optimal cost function $J_i^*(e_i(k))$ obeys the HJB equation

$$J_i^*(e_i(k)) = \min_{u_i(k)} \{ C_i(e_i(k), u_i(k)) + \mu J_i^*(e_i(k+1)) \}.  (32)$$

By differentiating (32) with respect to $u_i(k)$, the optimal control protocol of the $i$-th follower is derived as

$$u_i^*(k) = -\frac{\mu T}{2} (d_i + a_{i0}) R_{ii}^{-1} \Lambda_i^{T}(k) \frac{\partial J_i^*(e_i(k+1))}{\partial e_i(k+1)},  (33)$$

whose PI algorithm flow can be obtained in Algorithm 1, and the overall execution flow of the proposed policy iteration scheme is visualized in Figure 1.
After obtaining $\hat{d}_i(k)$ and $u_i^*(k)$, the optimal disturbance rejection consensus control protocol for system (5) is designed as

$$u_i(k) = u_i^*(k) - \hat{d}_i(k).  (34)$$
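The effect of the composite protocol (34) can be checked in a few lines: with a perfect disturbance estimate, one step of the disturbed system (5) under (34) coincides with a step of the disturbance-free system (28). The sketch below performs this check with illustrative numbers.

import numpy as np

def lam(theta):
    # Body-to-global map Lambda(theta) from (4).
    return np.array([[np.cos(theta), 0.0],
                     [np.sin(theta), 0.0],
                     [0.0,           1.0]])

T, X = 0.005, np.array([1.0, 0.5, 0.0])
u_star = np.array([0.4, 0.02])            # stand-in for the learned u_i*(k)
d = np.array([1.2, 0.6])                  # true disturbance, perfectly estimated
u = u_star - d                            # protocol (34) with d_hat = d
step_comp = X + T * lam(X[2]) @ (u + d)   # disturbed dynamics (5)
step_free = X + T * lam(X[2]) @ u_star    # disturbance-free dynamics (28)
assert np.allclose(step_comp, step_free)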
Remark 1.
The cost function (30) is designed to achieve optimal consensus by balancing the trade-off between consensus error and control effort. Specifically, the first term $e_i^{T}(l) S_{ii} e_i(l)$ penalizes the consensus error, encouraging the follower agents to synchronize their states with the leader and neighbors. The second term $u_i^{T}(l) R_{ii} u_i(l)$ penalizes the control effort of the $i$-th follower, preventing excessive control input. The third term $\sum_{j \in N_i} u_j^{T}(l) R_{ij} u_j(l)$ accounts for the control efforts of neighboring agents, which reflects the cooperative nature of MASs.
The weighting matrices $S_{ii}$, $R_{ii}$, and $R_{ij}$ are chosen as positive definite symmetric matrices based on the following assumptions: (1) $S_{ii}$ emphasizes the importance of state consensus, where larger values lead to faster convergence but potentially higher control costs; (2) $R_{ii}$ regulates the control effort of agent $i$, with larger values resulting in more conservative control actions; (3) $R_{ij}$ influences the cooperative behavior by considering neighbors' control costs, promoting energy-efficient coordination.
The discount factor $\mu$ ($0 < \mu \leq 1$) ensures the convergence of the infinite-horizon cost function and determines the relative importance of future costs. A value closer to 1 places more emphasis on long-term performance, while smaller values focus on immediate costs. This cost function formulation directly influences the optimal consensus control process through the HJB Equation (32), where the minimization of (30) leads to the optimal control protocol (33) that achieves consensus while optimizing the defined performance criteria.
Algorithm 1 Policy Iteration Algorithm for Optimal Consensus Control
1: Initialization:
2:    Let $u_i^{(0)}(k)$ be an arbitrary admissible control policy
3:    Let $J_i^{(0)}(e_i(k))$ be an initial cost function
4:    Set iteration index $t \leftarrow 0$
5:    Select convergence tolerance $\epsilon > 0$
6: Procedure:
7: Step 1: Policy Evaluation
8:    Update the cost function:
9:       $J_i^{(t+1)}(e_i(k)) \leftarrow C_i(e_i(k), u_i^{(t)}(k)) + \mu J_i^{(t)}(e_i(k+1))$
10: Step 2: Policy Improvement
11:    Update the control policy:
12:       $u_i^{(t+1)}(k) \leftarrow -\frac{\mu T}{2} (d_i + a_{i0}) R_{ii}^{-1} \Lambda_i^{T}(k) \frac{\partial J_i^{(t)}(e_i(k+1))}{\partial e_i(k+1)}$
13: Step 3: Convergence Check
14:    Compute error: $\Delta \leftarrow \| J_i^{(t+1)}(e_i(k)) - J_i^{(t)}(e_i(k)) \|$
15:    If $\Delta \leq \epsilon$, then:
16:       Set $u_i^*(k) \leftarrow u_i^{(t+1)}(k)$
17:       break
18:    Else:
19:       $t \leftarrow t + 1$
20:       Return to Step 1
21: Output: Optimal control protocol $u_i^*(k)$

4.3. Momentum-Accelerated Actor–Critic Network

A closed-form solution to the HJB Equation (32) remains analytically challenging. To address this, an Actor–Critic framework based on ADP is developed to approximate its solution.
A critic network (CN) is used to approximate $J_i(e_i(k))$, that is,

$$\hat{J}_i(k) = \hat{W}_{ci}^{T} \phi_{ci}(z_{ci}(k)),  (35)$$

where $\hat{J}_i(k)$ is the estimated value of $J_i(e_i(k))$, $z_{ci}(k)$ is an input vector of the CN, $\phi_{ci}(z_{ci}(k))$ denotes the activation function, and $\hat{W}_{ci}^{T}$ denotes the weight matrix.
Therefore, the estimation error of (31) of the CN is represented as

$$e_{ci}(k) = \hat{J}_i(k) - J_i(e_i(k)),  (36)$$

and

$$E_{ci}(k) = \frac{1}{2} e_{ci}^{T}(k) e_{ci}(k).  (37)$$

In order to accelerate the convergence of the weight matrix $\hat{W}_{ci}^{T}$ of the CN, a momentum term is incorporated to update $\hat{W}_{ci}^{T}$ based on a standard gradient descent method. Consequently, the iterative update rule of $\hat{W}_{ci}^{T}$ is as follows:

$$\hat{W}_{ci}^{T}(t+1) = \hat{W}_{ci}^{T}(t) - \rho_{ci} \frac{\partial E_{ci}(k)}{\partial e_{ci}(k)} \frac{\partial e_{ci}(k)}{\partial \hat{J}_i(k)} \frac{\partial \hat{J}_i(k)}{\partial \hat{W}_{ci}} - \rho_{ci} \frac{\partial E_{ci}(k-1)}{\partial \hat{W}_{ci}} = \hat{W}_{ci}^{T}(t) - \rho_{ci} e_{ci}(k) \phi_{ci}^{T}(z_{ci}(k)) - \rho_{ci} e_{ci}(k-1) \phi_{ci}^{T}(z_{ci}(k-1)),  (38)$$

where $\partial E_{ci}(k-1) / \partial \hat{W}_{ci}$ is the momentum term, and $\rho_{ci}$ is the learning rate of the CN.
An actor network (AN) is used to approximate $u_i^*(k)$, that is,

$$\hat{u}_i(k) = \hat{W}_{ai}^{T} \phi_{ai}(z_{ai}(k)),  (39)$$

where $\hat{u}_i(k)$ is the estimated value of $u_i^*(k)$, $z_{ai}(k)$ is an input vector of the AN, $\phi_{ai}(z_{ai}(k))$ denotes the activation function, and $\hat{W}_{ai}^{T}$ is the weight matrix.
On the basis of (35), (33) is expressed as

$$u_i^*(k) = -\frac{\mu T}{2} (d_i + a_{i0}) R_{ii}^{-1} \Lambda_i^{T}(k) \frac{\partial [\hat{W}_{ci}^{T} \phi_{ci}(z_{ci}(k+1))]}{\partial e_i(k+1)}.  (40)$$

If the estimation error of (40) of the AN is defined as

$$e_{ai}(k) = \hat{u}_i(k) - u_i^*(k),  (41)$$

and

$$E_{ai}(k) = \frac{1}{2} e_{ai}^{T}(k) e_{ai}(k),  (42)$$

then the iterative update rule of $\hat{W}_{ai}^{T}$ is as follows:

$$\hat{W}_{ai}^{T}(t+1) = \hat{W}_{ai}^{T}(t) - \rho_{ai} \frac{\partial E_{ai}(k)}{\partial e_{ai}(k)} \frac{\partial e_{ai}(k)}{\partial \hat{u}_i(k)} \frac{\partial \hat{u}_i(k)}{\partial \hat{W}_{ai}} - \rho_{ai} \frac{\partial E_{ai}(k-1)}{\partial \hat{W}_{ai}} = \hat{W}_{ai}^{T}(t) - \rho_{ai} e_{ai}(k) \phi_{ai}^{T}(z_{ai}(k)) - \rho_{ai} e_{ai}(k-1) \phi_{ai}^{T}(z_{ai}(k-1)),  (43)$$

where $\rho_{ai}$ is the learning rate of the AN.
These weight update rules ensure that the critic and actor networks converge to accurately approximate $J_i^*(e_i(k))$ and $u_i^*(k)$. Based on the derivation above, the optimal control protocol (33) is obtained, and the final control protocol (34) is consequently derived. By substituting the control protocol (34) into the system (5), the original system (5) is transformed into the disturbance-free system (28).
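The shared structure of the updates (38) and (43) is a gradient step augmented with the previous step's gradient. A minimal sketch with illustrative shapes and values (scalar-output critic, random basis vectors) is shown below.

import numpy as np

def momentum_update(W, grad_now, grad_prev, rho):
    # Updates (38)/(43): standard gradient step plus a momentum term
    # built from the previous step's gradient information.
    return W - rho * grad_now - rho * grad_prev

p = 6                                      # basis dimension, as in Section 5
rng = np.random.default_rng(1)
W_c = rng.standard_normal((p, 1))          # critic weights
phi_k = rng.standard_normal((p, 1))        # phi_ci(z_ci(k))
phi_km1 = rng.standard_normal((p, 1))      # phi_ci(z_ci(k-1))
e_k, e_km1 = 0.8, 1.1                      # errors e_ci(k), e_ci(k-1)
W_c = momentum_update(W_c, e_k * phi_k, e_km1 * phi_km1, rho=0.1)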

4.4. Optimal Consensus Analysis

Theorem 2.
For agent $i$, if $J_i^*(e_i(k))$ satisfies the coupled HJB Equation (32), then $e_i(k)$ is asymptotically stable, which implies that the DT-MASs (6) and (28) under the control protocol (33) can achieve optimal consensus, and the DT-MASs (5) and (6) under the control protocol (34) can reach optimal consensus.
Proof. 
According to (32), one has

$$J_i^*(e_i(k)) - \mu J_i^*(e_i(k+1)) = C_i(e_i(k), u_i^*(k)),  (44)$$

and

$$\mu^{k} J_i^*(e_i(k)) - \mu^{k+1} J_i^*(e_i(k+1)) = \mu^{k} C_i(e_i(k), u_i^*(k)).  (45)$$

Define the difference of the Lyapunov function candidate as

$$\Delta(\mu^{k} J_i^*(e_i(k))) = \mu^{k+1} J_i^*(e_i(k+1)) - \mu^{k} J_i^*(e_i(k)).  (46)$$

According to (45), (46) can be rewritten as

$$\Delta(\mu^{k} J_i^*(e_i(k))) = -\mu^{k} C_i(e_i(k), u_i^*(k)) \leq 0,  (47)$$

which implies that $e_i(k)$ is asymptotically stable, satisfying $\lim_{k \to \infty} e_i(k) = 0$, and the DT-MASs (6) and (28) can achieve optimal consensus. This indicates that the DT-MASs (5) and (6) can reach optimal consensus under the control protocol (34). This completes the proof. □

5. Simulation Results

This section provides a numerical example to validate the effectiveness of the proposed momentum-accelerated Actor–Critic network-based control method. The validation is conducted on UAV-based DT-MASs, which are given by Equations (5) and (6), with the control objective of achieving optimal consensus. The considered DT-MAS comprises one leader and three follower UAVs, whose directed communication topology is illustrated in Figure 2.
According to Figure 2, choose $a_{10} = 1$, $a_{12} = a_{23} = a_{31} = 1$, and the rest of the communication weights are 0. The initial states of the DT-MASs (5) and (6) are set as $X_1(0) = (1, 0.5, 0)^{T}$, $X_2(0) = (0, 1, \pi/4)^{T}$, $X_3(0) = (1, 0.5, \pi/6)^{T}$, and $X_0(0) = (2, 1, \pi/3)^{T}$. The sampling period $T$ is given as 0.005 s. The control input of the leader is $u_0(k) = (0.5, 0.05)^{T}$.
The parameters of the INESO (10) are configured as $\alpha_i = 1.5$, $\zeta_i = 100$, $\gamma_i = 0.1$, $\varsigma_i = 0.5$, $\lambda_i = 0.1$, $\beta_i = 0.1$, $\chi_i = 0.1$, and $\eta_i = 0.01$, where $i \in \{1, 2, 3\}$. The parameters of cost function (30) are set as $S_{ii} = I_{3 \times 3}$ and $R_{ii} = R_{ij} = I_{2 \times 2}$, and $\mu$ is set as 0.99. The learning rates are $\rho_{ci} = \rho_{ai} = 0.1$, and the activation functions of the CN and AN are respectively designed as $\phi_{ci}(z_{ci}(k)) = (e_{i1}^2, e_{i2}^2, e_{i3}^2, e_{i1} e_{i2}, e_{i1} e_{i3}, e_{i2} e_{i3})^{T}$ and $\phi_{ai}(z_{ai}(k)) = e_i^{T}$, where $i \in \{1, 2, 3\}$. The disturbances are defined as $d_{i1}(k) = 3\sin(0.0025k) + 1$ and $d_{i2}(k) = e^{\sin(0.0025k)} + 0.5\cos(0.0025k) + 0.5$, where $i \in \{1, 2, 3\}$.
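For reference, the printed setup can be reproduced in a few lines of Python; the simulation horizon below is an illustrative choice.

import numpy as np

k = np.arange(0, 10000)                    # steps; T = 0.005 s per step
d_i1 = 3.0 * np.sin(0.0025 * k) + 1.0      # linear-velocity channel disturbance
d_i2 = (np.exp(np.sin(0.0025 * k))
        + 0.5 * np.cos(0.0025 * k) + 0.5)  # angular-velocity channel disturbance

X_init = {0: np.array([2.0, 1.0, np.pi / 3]),   # leader
          1: np.array([1.0, 0.5, 0.0]),
          2: np.array([0.0, 1.0, np.pi / 4]),
          3: np.array([1.0, 0.5, np.pi / 6])}
u0 = np.array([0.5, 0.05])                 # leader control input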
Remark 2.
The selection of these specific disturbance functions is physically motivated and reflects realistic operational scenarios for UAV swarms. The sinusoidal component $3\sin(0.0025k) + 1$ represents periodic wind gusts and atmospheric turbulence commonly encountered during UAV flight operations, where the amplitude and frequency parameters capture typical wind disturbance characteristics. The exponential term $e^{\sin(0.0025k)} + 0.5\cos(0.0025k) + 0.5$ models the nonlinear and transient disturbance effects, such as sudden wind shear or gust fronts that exhibit rapid onset and decay patterns. This combined disturbance profile effectively captures the complex, time-varying nature of environmental disturbances that multi-UAV systems encounter in practical applications, providing a comprehensive test scenario that challenges both the estimation accuracy of the INESO and the compensation capability of the proposed control scheme. Such disturbance models are widely adopted in the literature for evaluating the robustness of UAV control systems under realistic operating conditions.

5.1. Simulations for INESO

This subsection validates the estimation performance of the INESO (10) under the prescribed complex disturbance scenario. To focus on evaluating the observer’s inherent estimation performance for the disturbances and to streamline the analysis, all follower UAVs are subjected to the same disturbances described above.
Figure 3 shows the trajectories of the actual disturbances $d_i(k)$ and their estimates $\hat{d}_i(k)$ provided by the INESO (10). The close match between the two, for both disturbance components, demonstrates the observer's high estimation accuracy and rapid convergence.
To quantitatively validate the advantages of the proposed INESO (10), a comparative analysis with a conventional LESO is conducted. The conventional LESO is designed as follows:
$$\begin{aligned} \hat{X}_i^{lin}(k+1) &= \hat{X}_i^{lin}(k) + T \Lambda_i(k) (u_i(k) + \hat{d}_i^{lin}(k)) + T \alpha_i^{lin} \xi_i^{lin}(k), \\ \hat{d}_i^{lin}(k+1) &= \hat{d}_i^{lin}(k) + T \zeta_i^{lin} \Lambda_i^{T}(k) \xi_i^{lin}(k), \end{aligned}  (48)$$

where $\xi_i^{lin}(k) = X_i(k) - \hat{X}_i^{lin}(k)$, $\alpha_i^{lin} = \alpha_i = 1.5$, $\zeta_i^{lin} = \zeta_i = 100$, and the remaining parameters are the same as the INESO (10).
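The baseline can be sketched analogously to the ineso_step function given after Section 4.1: the structure is identical, but the error-dependent gains nl_gain(...) are replaced by the constant gains alpha and zeta, which is exactly what the comparison isolates.

import numpy as np

def leso_step(X, X_hat, d_hat, u, Lam, T=0.005, alpha=1.5, zeta=100.0):
    # One update of the baseline LESO (48): fixed linear gains in place of
    # the nonlinear Psi terms of the INESO (10); same sign convention assumed.
    xi = X - X_hat
    X_hat_next = X_hat + T * Lam @ (u + d_hat) + T * alpha * xi
    d_hat_next = d_hat + T * zeta * (Lam.T @ xi)
    return X_hat_next, d_hat_next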
Figure 4 presents the performance comparison of the disturbance estimation between the proposed INESO and the conventional LESO for UAV1 under identical disturbance conditions. The quantitative comparison metrics are summarized in Table 1.
The comparative results demonstrate three key advantages of the proposed INESO:
  • Enhanced Nonlinear Disturbance Estimation: The nonlinear functions $\Psi_1(\xi_i(k), \varsigma_i, \gamma_i, \beta_i)$ and $\Psi_2(\xi_i(k), \lambda_i, \eta_i, \chi_i)$ enable adaptive gain adjustment based on estimation error magnitude. For large errors ($\|\xi_i(k)\| > \varsigma_i$), the nonlinear term $\|\xi_i(k)\|^{\gamma_i - 1}$ provides aggressive correction, while for small errors, the linearized term ensures smooth convergence with almost no overshoot.
  • Superior Estimation Accuracy: As shown in Table 1, the INESO demonstrates significant improvements over the conventional LESO across multiple key estimation accuracy metrics. The mean squared error is reduced by 46.4%, indicating a substantial decrease in overall estimation bias. The maximum estimation error is reduced by 44.0%, showcasing enhanced robustness against abrupt or large disturbances. Furthermore, the steady-state error is decreased by 48.4%, reflecting the INESO’s ability to achieve more precise disturbance tracking in equilibrium states. This comprehensive enhancement in accuracy directly strengthens the control system’s disturbance rejection and tracking performance, providing a reliable foundation for precise control in high-order, nonlinear, or strongly disturbed environments.
  • Faster Convergence Speed: The 42.6% reduction in convergence time (from 4.95 s to 2.84 s) indicates that the INESO achieves stable and accurate disturbance estimates more rapidly. This accelerated convergence enhances the overall responsiveness of the control system, allowing it to compensate for disturbances in a timelier manner, which is crucial for maintaining system stability and convergence performance under dynamic conditions.
These quantitative results substantiate the mechanistic innovation of the proposed INESO in addressing the limitations of the conventional LESO for MASs operating under disturbance conditions.

5.2. Simulations for Momentum-Accelerated Actor–Critic Network and Optimal Consensus

Figure 5 illustrates the weight norm curves of the CN and the AN for all followers, indicating that the norm of each network’s weights asymptotically stabilizes to an equilibrium value, confirming the convergence of the learning parameters.
To further validate the optimization performance of the proposed control strategy, Figure 6 presents the convergence curves of the cost function $J_i(e_i(k))$ for each follower agent. All cost functions exhibit a consistent decreasing trend and converge to near-zero values, demonstrating the effectiveness of the momentum-accelerated Actor–Critic network in minimizing the cost function defined in (30). The rapid and smooth descent of the cost curves reflects the accelerated learning capability of the proposed method.
The trajectories of the state errors $x_i(k) - x_0(k)$, $y_i(k) - y_0(k)$, and $\theta_i(k) - \theta_0(k)$ for all agents are shown in Figure 7. It is observed that all error trajectories converge asymptotically to zero. This result verifies that, despite the presence of external disturbances, the DT-MASs (5) and (6) achieve robust optimal consensus through the integrated control strategy based on the designed INESO and the momentum-accelerated Actor–Critic network. The fast and smooth convergence of the errors further demonstrates that the proposed method not only optimizes transient performance but also ensures closed-loop system stability and enhances robustness in the face of system uncertainties and external disturbances.
To evaluate the advantage of the proposed acceleration technique, a comparative study is conducted under identical parameters. Figure 8 displays the 2D state trajectories of the systems based on the standard gradient Actor–Critic network and the momentum-accelerated one. A clear observation is that the momentum-accelerated network exhibits a significantly faster convergence rate, underscoring the effectiveness and superiority of the proposed method.
In order to further clarify the superiority of the proposed momentum-accelerated Actor–Critic network compared with the standard gradient algorithm, this article makes a quantitative statistical analysis of its convergence performance. Table 2 summarizes the key performance metrics obtained from extensive simulations.
The statistical analysis clearly demonstrates the superior performance of the momentum-accelerated approach across all metrics. The convergence time was reduced by 12.8%, from 22.13 s with the standard method to 19.30 s with the momentum-accelerated network. This improvement is attributed to the incorporation of historical gradient information, which enables more stable and efficient weight updates during the learning process.
Furthermore, the momentum-accelerated network has achieved significantly better steady-state performance, with steady-state position and orientation errors reduced by 22.7% and 42.1%, respectively. The overshoot percentage, which indicates the transient response quality, was reduced by 22.7%, demonstrating enhanced stability during the convergence process. These statistical results substantiate the visual observations in Figure 8 and provide the quantitative evidence of the performance gains achieved by the proposed momentum-accelerated Actor–Critic network.
Simulation results demonstrate that the proposed INESO can accurately estimate complex disturbances, showing significant improvements in estimation error and convergence time compared to the conventional LESO. Meanwhile, the momentum-accelerated Actor–Critic network effectively enhances the learning speed and consensus performance of DT-MASs, achieving faster and more stable optimal consensus control. Overall, the integrated control strategy exhibits good robustness and optimization capability in the presence of external disturbances.

6. Discussion

This article presents an optimal disturbance rejection control protocol for achieving consensus in nonlinear MASs under unknown disturbances. Compared to existing methods [21,22,33], the proposed momentum-accelerated Actor–Critic framework significantly accelerates the convergence to consensus. Integrated with an extended state observer, the approach further enhances the system’s capability to suppress external disturbances, highlighting its potential value for practical applications.
It is important to acknowledge several challenges for the real-world deployment of this protocol. First, at the communication level, issues such as network latency, packet loss, and dynamic topology changes must be addressed. Second, regarding motion control, dynamic characteristics not fully captured by the kinematic model—including actuator saturation, aerodynamic coupling effects, and inertial forces—require further consideration. Third, computational constraints of embedded platforms, including real-time processing requirements and memory limitations, must be satisfied.
A primary limitation of the current work lies in its validation within a two-dimensional environment. Real-world UAV swarms operate in three-dimensional space, and wind disturbances exhibit complex spatial characteristics. Although the simplified 2D model serves as a crucial step for verifying the core algorithms and establishing a theoretical foundation, future work should extend the framework in the following aspects to improve physical realism: extension to 3D operational environments with full spatial dynamics; incorporation of vertical wind components and their effects on altitude control; development of 3D consensus protocols accounting for full positional and attitude coordination; and validation using more realistic three-dimensional turbulence models.
Nevertheless, the fundamental contributions of this work—the nonlinear extended state observer and the momentum-accelerated Actor–Critic network—remain applicable to 3D scenarios. The principles of disturbance estimation and compensation can be directly extended to three dimensions, and the ADP framework naturally accommodates higher-dimensional state spaces. Subsequent research will focus on implementing the algorithm in three-dimensional environments and conducting experimental validation on real physical systems, such as multi-rotor UAV platforms, to further investigate its performance in tasks like multi-UAV trajectory tracking.

7. Conclusions

This article investigates the optimal consensus problem for MASs based on UAVs under unknown disturbances. An INESO is designed to accurately estimate disturbances, and a composite compensation strategy is introduced to significantly improve disturbance rejection performance. Moreover, a momentum-accelerated Actor–Critic network is proposed, which enhances convergence speed during consensus achievement. Simulation results confirm the effectiveness and robustness of the approach, demonstrating improved cooperative stability and mission performance of MASs in disturbed environments.
While the current study is conducted in a two-dimensional framework, it provides a reliable technical foundation for UAV swarm applications in areas such as search and rescue, logistics, and inspection. The core algorithmic contributions, including the nonlinear extended state observer and momentum-accelerated learning framework, establish essential building blocks for future extensions to three-dimensional operational environments.
Beyond UAV applications, the proposed control framework opens up new possibilities for coordinated decision-making in complex and dynamic multi-agent scenarios. By integrating ADP with disturbance estimation, the strategy supports real-time task allocation and adaptive cooperative control. The method is general and scalable, making it applicable to a wide range of MAS coordination tasks beyond UAVs. Future work will focus on extending the framework to three-dimensional environments and addressing consensus control for highly nonlinear multi-UAV systems under strong disturbances, further bridging the gap between theoretical design and practical deployment.

Author Contributions

Conceptualization, J.L., M.P. and P.L.; methodology, K.L., J.L., M.P. and C.W.; software, K.L. and J.L.; validation, K.L.; formal analysis, K.L.; investigation, K.L. and J.L.; resources, P.L. and M.P.; data curation, K.L.; writing—original draft preparation, J.L. and K.L.; writing—review and editing, K.L., J.L., M.P. and P.L.; visualization, K.L.; supervision, J.L., P.L. and C.W.; project administration, J.L., P.L. and M.P.; funding acquisition, J.L., P.L. and M.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Key Program of National Natural Science Foundation Joint Fund of China under Grant U21A20485, in part by the Natural Science Foundation of Sichuan of China under Grant 2026NSFSC0138 and Grant 2023NSFSC1985, and in part by the Transformation Program of Scientific and Technological Achievements of Sichuan of China under Grant 2024ZHCG0032 and Grant 2024ZHCG0017.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, W.; Liu, Y.; Srikant, R.; Ying, L. 3M-RL: Multi-Resolution, Multi-Agent, Mean-Field Reinforcement Learning for Autonomous UAV Routing. IEEE Trans. Intell. Transp. Syst. 2022, 23, 8985–8996. [Google Scholar] [CrossRef]
  2. Logothetis, M.; Karras, G.C.; Alevizos, K.; Verginis, C.K.; Roque, P.; Roditakis, K.; Makris, A.; Garcia, S.; Schillinger, P.; Di Fava, A.; et al. Efficient Cooperation of Heterogeneous Robotic Agents. IEEE Robot. Autom. Mag. 2021, 28, 74–87. [Google Scholar] [CrossRef]
  3. Li, Z.; Zhao, Y.; Yan, H.; Zhang, H.; Zeng, L.; Wang, X. Active Disturbance Rejection Formation Tracking Control for Uncertain Nonlinear Multi-Agent Systems With Switching Topology via Dynamic Event-Triggered Extended State Observer. IEEE Trans. Circuits Syst. I Regul. Pap. 2023, 70, 518–529. [Google Scholar] [CrossRef]
  4. Khan, M.W.; Wang, J. The research on multi-agent system for microgrid control and optimization. Renew. Sustain. Energy Rev. 2017, 80, 1399–1411. [Google Scholar] [CrossRef]
  5. Liu, J.; Chen, W.; Qin, K.; Li, P. Consensus of Multi-Integral Fractional-Order Multiagent Systems with Nonuniform Time-Delays. Complexity 2018, 2018, 8154230. [Google Scholar] [CrossRef]
  6. Shen, H.; Wang, Y.; Xia, J.; Park, J.H.; Wang, Z. Fault-tolerant leader-following consensus for multi-agent systems subject to semi-Markov switching topologies: An event-triggered control scheme. Nonlinear Anal. Hybrid Syst. 2019, 34, 92–107. [Google Scholar] [CrossRef]
  7. Du, H.; Wen, G.; Wu, D.; Cheng, Y.; Lu, J. Distributed fixed-time consensus for nonlinear heterogeneous multi-agent systems. Automatica 2020, 113, 108797. [Google Scholar] [CrossRef]
  8. Qiao, Y.; Huang, X.; Yang, B.; Geng, F.; Wang, B.; Hao, M.; Li, S. Formation Tracking Control for Multi-Agent Systems with Collision Avoidance and Connectivity Maintenance. Drones 2022, 6, 419. [Google Scholar] [CrossRef]
  9. Liu, J.; Qin, K.; Chen, W.; Li, P. Consensus of Delayed Fractional-Order Multiagent Systems Based on State-Derivative Feedback. Complexity 2018, 2018, 8789632. [Google Scholar] [CrossRef]
  10. Du, Z.; Xie, X.; Qu, Z.; Hu, Y.; Stojanovic, V. Dynamic Event-Triggered Consensus Control for Interval Type-2 Fuzzy Multi-Agent Systems. IEEE Trans. Circuits Syst. I Regul. Pap. 2024, 71, 3857–3866. [Google Scholar] [CrossRef]
Figure 1. Flowchart of the proposed policy iteration algorithm for optimal consensus control.
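For readers who want the general shape of the momentum-accelerated weight updates summarized in Figure 1, the following is a minimal sketch in Python. The toy quadratic objectives stand in for the critic's Bellman-residual loss and the actor's policy loss; the learning rate, momentum coefficient, and problem data are illustrative assumptions, not the paper's exact update laws.

```python
import numpy as np

rng = np.random.default_rng(0)
A_c, b_c = rng.standard_normal((20, 10)), rng.standard_normal(20)  # stand-in critic data
A_a, b_a = rng.standard_normal((20, 10)), rng.standard_normal(20)  # stand-in actor data

def momentum_step(w, v, grad, lr=0.02, beta=0.9):
    """Heavy-ball step: the velocity buffer v accumulates past gradients,
    accelerating convergence and damping plain-gradient oscillations."""
    v = beta * v - lr * grad
    return w + v, v

w_c, v_c = np.zeros(10), np.zeros(10)   # critic weights + momentum buffer
w_a, v_a = np.zeros(10), np.zeros(10)   # actor weights + momentum buffer

for it in range(500):                   # policy-iteration-style sweeps
    grad_c = A_c.T @ (A_c @ w_c - b_c)  # stand-in for the critic gradient
    w_c, v_c = momentum_step(w_c, v_c, grad_c)
    grad_a = A_a.T @ (A_a @ w_a - b_a)  # stand-in for the actor gradient
    w_a, v_a = momentum_step(w_a, v_a, grad_a)

print(np.linalg.norm(A_c @ w_c - b_c), np.linalg.norm(A_a @ w_a - b_a))
```

With the same learning rate, the momentum buffer lets both weight vectors reach a given residual in noticeably fewer sweeps than plain gradient descent (set beta=0.0 to compare), which is the qualitative effect the paper exploits.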
Figure 2. Directed communication topology.
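As context for how a directed topology such as the one in Figure 2 enters consensus-error dynamics, the sketch below builds an adjacency matrix, graph Laplacian, and leader-pinning matrix for a hypothetical four-follower chain; the paper's actual edge set is the one drawn in the figure.

```python
import numpy as np

A = np.array([                 # A[i, j] = 1 if follower i receives from follower j
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
])
D = np.diag(A.sum(axis=1))     # in-degree matrix
L = D - A                      # graph Laplacian appearing in consensus-error terms

b = np.array([1, 0, 0, 0])     # pinning gains: which followers observe the leader
print(L + np.diag(b))          # matrix that typically governs consensus analysis
```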
Figure 3. Trajectories of d̂_i(k) and d_i(k). (a) Trajectories of d̂_i1(k) and d_i1(k). (b) Trajectories of d̂_i2(k) and d_i2(k).
Figure 4. Performance comparison of disturbance estimation between the proposed INESO and the conventional LESO.
Figure 5. Weight norm curves. (a) Weight norm curves of the critic network (CN) for each follower. (b) Weight norm curves of the actor network (AN) for each follower.
Figure 6. Convergence trajectories of the cost function J_i(e_i(k)) for all follower agents.
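For context, the cost J_i(e_i(k)) plotted in Figure 6 typically takes an infinite-horizon quadratic form in this class of optimal consensus problems. A representative definition, with assumed weighting matrices rather than the paper's exact choice, is:

```latex
% Representative infinite-horizon quadratic cost of the kind plotted in
% Figure 6; the weights Q_i, R_i (and any discounting) are assumptions,
% not the authors' exact definition.
J_i\bigl(e_i(k)\bigr) = \sum_{t=k}^{\infty}
  \left( e_i^{\top}(t)\, Q_i\, e_i(t) + u_i^{\top}(t)\, R_i\, u_i(t) \right),
\qquad Q_i \succeq 0,\ R_i \succ 0.
```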
Figure 7. Trajectories of state errors in the DT-MASs (5) and (6). (a) Trajectories of x_i(k) − x_0(k). (b) Trajectories of y_i(k) − y_0(k). (c) Trajectories of θ_i(k) − θ_0(k).
Figure 8. Two-dimensional state trajectories of the DT-MASs (5) and (6). (a) Trajectories under the standard gradient Actor–Critic network. (b) Trajectories under the momentum-accelerated Actor–Critic network.
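The position and orientation errors in Figures 7 and 8 correspond to planar (x, y, θ) agent states. A minimal sketch of Euler-discretized unicycle kinematics of this kind, with an assumed sampling period and inputs rather than the paper's exact model (5) and (6), is:

```python
import numpy as np

T = 0.05                               # sampling period (assumed)

def unicycle_step(state, v, omega):
    """One Euler step of planar unicycle kinematics with
    linear velocity v and angular velocity omega."""
    x, y, theta = state
    return np.array([
        x + T * v * np.cos(theta),     # position update along heading
        y + T * v * np.sin(theta),
        theta + T * omega,             # heading update
    ])

state = np.array([0.0, 0.0, 0.0])
for k in range(200):                   # constant inputs trace a circular arc
    state = unicycle_step(state, v=1.0, omega=0.2)
print(state)
```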
Table 1. Quantitative comparison between the proposed INESO and the conventional LESO.

| Performance Metric | LESO | INESO | Improvement |
| --- | --- | --- | --- |
| Mean squared error | 0.041 | 0.028 | 46.4% |
| Maximum estimation error | 0.072 | 0.050 | 44.0% |
| Steady-state error | 0.046 | 0.031 | 48.4% |
| Convergence time (s) | 4.95 | 2.84 | 42.6% |
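The baseline in Figure 4 and Table 1 is a conventional linear extended state observer. The sketch below shows such a discrete-time LESO estimating an unknown disturbance as an extended state; the first-order plant, gains, sampling period, and sinusoidal disturbance are illustrative assumptions, and the paper's INESO replaces the linear correction terms with nonlinear gain functions.

```python
import numpy as np

T = 0.01                      # sampling period (assumed)
beta1, beta2 = 40.0, 400.0    # observer gains (assumed)
u = 0.0                       # known control input, held at zero for the demo

x = 0.0                       # plant state: x(k+1) = x(k) + T * (u + d(k))
z1, z2 = 0.0, 0.0             # observer states: z1 tracks x, z2 tracks d

for k in range(2000):
    d = 0.5 * np.sin(2.0 * k * T)     # unknown disturbance (assumed sinusoid)
    e = x - z1                        # output estimation error
    z1 += T * (z2 + u + beta1 * e)    # state-estimate update
    z2 += T * beta2 * e               # extended (disturbance) estimate update
    x += T * (u + d)                  # plant advances one step

print(f"true d = {d:.3f}, estimate z2 = {z2:.3f}")
```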
Table 2. Performance comparison between the standard gradient Actor–Critic network (SGACN) and the momentum-accelerated Actor–Critic network (MAACN).

| Average Performance Metric | SGACN | MAACN | Improvement |
| --- | --- | --- | --- |
| Convergence time (s) | 22.13 | 19.30 | 12.8% |
| Convergence steps | 4426 | 3860 | 12.8% |
| Steady-state position error (m) | 0.075 | 0.058 | 22.7% |
| Steady-state orientation error (rad) | 0.038 | 0.022 | 42.1% |
| Overshoot percentage | 12.3% | 9.5% | 22.7% |
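The Improvement column in Table 2 is consistent with relative change against the SGACN baseline, (SGACN − MAACN) / SGACN. A quick arithmetic check:

```python
# Reproduce Table 2's Improvement column as relative change vs. SGACN.
rows = {
    "Convergence time (s)":              (22.13, 19.30),
    "Convergence steps":                 (4426, 3860),
    "Steady-state position error (m)":   (0.075, 0.058),
    "Steady-state orientation error":    (0.038, 0.022),
    "Overshoot percentage":              (12.3, 9.5),
}

for name, (sgacn, maacn) in rows.items():
    improvement = 100.0 * (sgacn - maacn) / sgacn
    # prints 12.8, 12.8, 22.7, 42.1, 22.8 (the last differs from the
    # table's 22.7 only in rounding)
    print(f"{name}: {improvement:.1f}%")
```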
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
