Article

Cooperative Differential Game-Based Modular Unmanned System Approximate Optimal Control: An Adaptive Critic Design Approach

1 Aerospace Times Feihong Technology Company Limited, Beijing 100094, China
2 National Elite Institute of Engineering, Northwestern Polytechnical University, Xi’an 710072, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(10), 1665; https://doi.org/10.3390/sym17101665
Submission received: 5 August 2025 / Revised: 25 August 2025 / Accepted: 10 September 2025 / Published: 6 October 2025
(This article belongs to the Special Issue Symmetries in Dynamical Systems and Control Theory)

Abstract

An approximate optimal control scheme for modular unmanned systems (MUSs) is presented via a cooperative differential game to solve the trajectory tracking problem. First, the dynamic model of the modular unmanned system is built with the joint torque feedback technique, in which the moment of inertia of the motor rotor is positive and symmetric. Each MUS module is treated as a player in the cooperative differential game. The MUS trajectory tracking problem is then transformed into an approximate optimal control problem by means of adaptive critic design (ACD). The approximate optimal control law is obtained through a critic network that approximates the joint performance index function of the system. The stability of the closed-loop system is proved through Lyapunov theory, and the feasibility of the proposed control algorithm is verified on an experimental platform.

1. Introduction

Unmanned systems [1,2] refer to collections of intelligent equipment that realize unmanned or semi-autonomous operation through advanced sensing, artificial intelligence, autonomous navigation, and other technologies, including unmanned aerial vehicles [3], unmanned vehicles [4], unmanned ships [5], robots [6], and cross-domain collaborative unmanned cluster systems. Their core goals are to improve task efficiency, reduce human risk, and push the limits of human physiology in complex, dangerous, or repetitive scenarios through robot replacement or human–robot collaboration. However, traditional unmanned systems usually have fixed configurations, thus limiting their further application. In order to solve the above challenges, modular unmanned systems (MUSs) [7] have been proposed and applied to many aspects, such as autonomous assembly of large-aperture telescopes in space via the handling, assembly, and connection of trusses and lenses. In addition, modular reconfigurable robots use the same ideas as MUSs.
An ideal MUS should not only ensure its stability and accuracy under control, but also consider its optimality. Adaptive critic design (ACD) [8,9], as a branch of optimal control, is a cutting-edge method at the intersection of machine learning, control theory, and operations research which aims to solve the real-time optimization and decision problems of complex dynamic systems. Through the integration of reinforcement learning, neural networks, and dynamic programming, it overcomes the dependence of traditional dynamic programming on precise mathematical models, and realizes online learning and adaptive adjustment of strategies in unknown or time-varying environments; it is sometimes regarded as the intelligent optimization "brain" of dynamic systems. Adaptive critic design has many applications, such as in continuous-time systems [10,11], discrete-time systems [12,13], data-driven systems [14,15], event-triggered systems [16,17], fault diagnosis systems [18,19], and model-uncertain systems [20,21]. However, most of the systems mentioned above have a single controller, and with the gradual expansion of system scale, multi-controller or multi-player systems should be further considered to optimize overall system performance.
Differential game theory [22,23] is a frontier field combining game theory and dynamic system theory, focusing on the strategic interaction between multiple decision-makers under continuous time frames. Different from traditional static games, differential games model the strategy selection of the participants as a time-dependent state variable control process, and describe the dynamic evolution of the system state through differential equations. Each participant adjusts their control strategy according to real-time information. In the process of pursuing the optimization of individual objective functions, it is necessary for one player to predict the influence of the opponent’s behavior on the system trajectory and consider the chain reaction of its own strategy on the global dynamics. There are many types of games, such as zero-sum games [24,25], non-zero-sum games [26,27], etc. Cooperative games [28,29], as special non-zero-sum games, have only one performance index, and each player hopes to optimize the overall performance index function of the system through cooperation. Therefore, the control problem of MUSs is suitable to be solved by the concept of cooperative differential games.
This paper proposes an approximate optimal control method based on cooperative differential games for MUSs and addresses the trajectory tracking problem. First, a dynamic model is established. Then, each module of the MUS is treated as a player in a cooperative differential game, transforming the trajectory tracking problem into an approximate optimal control problem based on a cooperative differential game via ACD. A critic network is employed to approximate the system's joint performance index function, and the control law is derived through a policy iteration algorithm. The stability is proven using Lyapunov theory, and the effectiveness of the proposed algorithm is validated through an experimental platform. This paper's contributions are as follows:
  • This is the first paper to use cooperative differential games for MUSs via ACD to guarantee both accuracy and optimality. The developed control method is verified on an actual platform.
  • The developed cooperative game method using ACD is evaluated experimentally in terms of tracking error and control torque.

Notation

$x$ is the global state of the MUS; $E$ is the position tracking error of the MUS; $U$ is the global optimal control; $W$ is the global coupled strategy; $Q$, $R$, and $P$ are given positive definite matrices; $Ult(\dot{E}, U, W)$ is the utility function; and $Val(\dot{E}, U, W)$ is the cost function.

2. Dynamic Model of MUS

According to the dynamic modeling method for the MUS based on joint torque feedback (JTF) [26], we obtain
$I_{moi}\gamma_{rai}\ddot{\theta}_i + f_{lui}(\theta_i,\dot{\theta}_i) + I_{dci}(\theta,\dot{\theta},\ddot{\theta}) + \dfrac{\tau_{coi}}{\gamma_{rai}} = \tau_i, \quad (1)$
where $I_{moi}$ is the moment of inertia of the motor rotor about the rotating shaft; $\gamma_{rai}$ is the reduction ratio of the motor; $\theta_i$, $\dot{\theta}_i$, and $\ddot{\theta}_i$ are the joint position, velocity, and acceleration vectors of the $i$th subsystem of the MUS; $f_{lui}(\theta_i,\dot{\theta}_i)$ is the concentrated joint friction torque; $I_{dci}(\theta,\dot{\theta},\ddot{\theta})$ is the coupling term between joint subsystems; $\tau_{coi}$ is the coupled joint torque measurement; and $\tau_i$ is the control torque.
The joint module of the MUS is shown in Figure 1.
Property 1.
The moment of inertia of the motor rotor $I_{moi}$ is positive and symmetric. This property will be used in the stability analysis.
(1)
Coupled joint torque measurement $\tau_{coi}$
The coupled joint torque $\tau_{coi}$ is measured and mainly consists of
$\tau_{coi} = \tau_{cofi} + \tau_{coci}, \quad (2)$
where $\tau_{cofi}$ is the joint torque measured in free space, and $\tau_{coci}$ is the external torque generated by continuous or instantaneous contact with the environment. Since the joint torque values in the constrained space $\tau_{coi}$ and in free space $\tau_{cofi}$ are readily measured, the external torque $\tau_{coci}$ can be obtained from Formula (2).
(2)
Concentrated joint friction torque $f_{lui}(\theta_i,\dot{\theta}_i)$
The concentrated joint friction torque $f_{lui}(\theta_i,\dot{\theta}_i)$ captures the friction acting on the DC motor and reducer of the joint module. It is composed of nonlinear functions of the joint position and velocity:
$f_{lui}(\theta_i,\dot{\theta}_i) = f_{bvi}\dot{\theta}_i + \big[f_{coi} + f_{sti}e^{-f_{\tau si}\dot{\theta}_i^2}\big]\mathrm{sgn}(\dot{\theta}_i) + f_{pdi}(\theta_i,\dot{\theta}_i), \quad (3)$
where $f_{bvi}$, $f_{coi}$, $f_{sti}$, and $f_{\tau si}$ denote the viscous, Coulomb, static, and Stribeck friction parameters of subsystem module $i$, respectively; $f_{pdi}(\theta_i,\dot{\theta}_i)$ is the position-dependent friction term; and $\mathrm{sgn}(\dot{\theta}_i)$ is the sign function.
According to linearization criteria, the friction torque is approximated by the following formula:
$\hat{f}_{lui}(\theta_i,\dot{\theta}_i) = \hat{f}_{bvi}\dot{\theta}_i + \big[\hat{f}_{coi} + \hat{f}_{sti}e^{-\hat{f}_{\tau si}\dot{\theta}_i^2}\big]\mathrm{sgn}(\dot{\theta}_i) + f_{pdi}(\theta_i,\dot{\theta}_i) + K(\dot{\theta}_i)\tilde{F}_{pui}, \quad (4)$
where $\hat{f}_{bvi}$, $\hat{f}_{coi}$, $\hat{f}_{sti}$, and $\hat{f}_{\tau si}$ are the estimated values of $f_{bvi}$, $f_{coi}$, $f_{sti}$, and $f_{\tau si}$; $\tilde{F}_{pui} = [f_{bvi}-\hat{f}_{bvi},\ f_{coi}-\hat{f}_{coi},\ f_{sti}-\hat{f}_{sti},\ f_{\tau si}-\hat{f}_{\tau si}]^T$ is the parameter uncertainty of the friction torque; and $K(\dot{\theta}_i) = [\dot{\theta}_i,\ \mathrm{sgn}(\dot{\theta}_i),\ e^{-\hat{f}_{\tau si}\dot{\theta}_i^2}\mathrm{sgn}(\dot{\theta}_i),\ -\hat{f}_{sti}\dot{\theta}_i^2 e^{-\hat{f}_{\tau si}\dot{\theta}_i^2}\mathrm{sgn}(\dot{\theta}_i)]$ is the corresponding regressor.
Remark 1.
The friction torque parameters $f_{bvi}$, $f_{coi}$, $f_{sti}$, $f_{\tau si}$ and their estimates $\hat{f}_{bvi}$, $\hat{f}_{coi}$, $\hat{f}_{sti}$, $\hat{f}_{\tau si}$ are uniformly bounded, and the upper bound of $\tilde{F}_{pui}$ can be defined as $\|\tilde{F}_{pui}\| \le \rho_{puik}$ $(k = 1, 2, 3, 4)$, where $\rho_{puik}$ is a known upper-bound constant. Therefore, the friction modeling error satisfies $\|K(\dot{\theta}_i)\tilde{F}_{pui}\| \le \|K(\dot{\theta}_i)\|\rho_{puik}$. In addition, the position-dependent friction term is bounded by $\|f_{pdi}(\theta_i,\dot{\theta}_i)\| \le \rho_{fpdi}$, where $\rho_{fpdi}$ is a known upper-bound constant.
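As an illustration of the friction model (3) and its linearization (4), the following sketch evaluates the Stribeck-type friction torque and the regressor $K(\dot{\theta}_i)$, and checks the first-order relation numerically. All parameter values are illustrative placeholders, not identified values from the paper's platform.

```python
import numpy as np

# Illustrative sketch of the concentrated friction model (3) and its
# linearization (4); parameters are assumed placeholder values.

def friction(theta_dot, f_bv, f_co, f_st, f_ts):
    """Viscous + Coulomb + Stribeck friction torque (position-dependent term omitted)."""
    s = np.sign(theta_dot)
    return f_bv * theta_dot + (f_co + f_st * np.exp(-f_ts * theta_dot ** 2)) * s

def regressor(theta_dot, f_st_hat, f_ts_hat):
    """Regressor K(theta_dot) multiplying the parameter-error vector F_tilde."""
    s = np.sign(theta_dot)
    e = np.exp(-f_ts_hat * theta_dot ** 2)
    return np.array([theta_dot, s, e * s, -f_st_hat * theta_dot ** 2 * e * s])

# First-order check: friction(p) ≈ friction(p_hat) + K(p_hat) @ (p - p_hat)
p = np.array([0.30, 0.10, 0.05, 2.0])               # "true" [f_bv, f_co, f_st, f_ts]
p_hat = p + 1e-4 * np.array([1.0, -1.0, 1.0, 1.0])  # estimates
w = 0.8                                             # joint velocity (rad/s)
lin = friction(w, *p_hat) + regressor(w, p_hat[2], p_hat[3]) @ (p - p_hat)
print(abs(friction(w, *p) - lin) < 1e-6)
```

The mismatch between the true friction and its linearization is second order in the parameter error, which is why the bound in Remark 1 involves only the regressor norm and a constant.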
(3)
Coupling term $I_{dci}(\theta,\dot{\theta},\ddot{\theta})$
The coupling term $I_{dci}(\theta,\dot{\theta},\ddot{\theta})$, related to the coupled dynamics of the robot's global vector, is
$I_{dci}(\theta,\dot{\theta},\ddot{\theta}) = I_{moi}\sum_{k=1}^{i-1} a_{moi}^T a_{lnk}\ddot{\theta}_k + I_{moi}\sum_{k=2}^{i-1}\sum_{m=1}^{k-1} a_{moi}^T(a_{lnm}\times a_{lnk})\dot{\theta}_m\dot{\theta}_k = I_{moi}\sum_{k=1}^{i-1}\Lambda_{ki}\ddot{\theta}_k + I_{moi}\sum_{k=2}^{i-1}\sum_{m=1}^{k-1}\Omega_{mki}\dot{\theta}_m\dot{\theta}_k = \sum_{k=1}^{i-1}\big[I_{moi}\hat{\Lambda}_{ki}\ \ I_{moi}\big]\big[\ddot{\theta}_k\ \ \tilde{\Lambda}_{ki}\ddot{\theta}_k\big]^T + \sum_{k=2}^{i-1}\sum_{m=1}^{k-1}\big[I_{moi}\hat{\Omega}_{mki}\ \ I_{moi}\big]\big[\dot{\theta}_m\dot{\theta}_k\ \ \tilde{\Omega}_{mki}\dot{\theta}_m\dot{\theta}_k\big]^T, \quad (5)$
where $a_{moi}$, $a_{lnm}$, and $a_{lnk}$ are unit vectors along the rotation axes of the $i$th motor and the $m$th and $k$th joints, respectively; $\Lambda_{ki} = a_{moi}^T a_{lnk}$; and $\Omega_{mki} = a_{moi}^T(a_{lnm}\times a_{lnk})$. In addition, $\hat{\Lambda}_{ki} = \Lambda_{ki} - \tilde{\Lambda}_{ki}$ and $\hat{\Omega}_{mki} = \Omega_{mki} - \tilde{\Omega}_{mki}$, where $\hat{\Lambda}_{ki}$ and $\hat{\Omega}_{mki}$ are the estimated values of $\Lambda_{ki}$ and $\Omega_{mki}$, and $\tilde{\Lambda}_{ki}$ and $\tilde{\Omega}_{mki}$ are calibration errors.
Remark 2.
For the coupling term $I_{dci}(\theta,\dot{\theta},\ddot{\theta})$, the products of the unit vectors $a_{moi}$, $a_{lnm}$, and $a_{lnk}$ are bounded and satisfy $|\Lambda_{ki}| = |a_{moi}^T a_{lnk}| < 1$ and $|\Omega_{mki}| = |a_{moi}^T(a_{lnm}\times a_{lnk})| < 1$. When the $k$th and $m$th $(1 < k, m < i-1)$ joints near the base joint are stable, the coupling term admits the upper bound $\|I_{dci}(\theta,\dot{\theta},\ddot{\theta})\| < \rho_{dci}$, where $\rho_{dci}$ is a positive constant. Therefore, by stabilizing the current joint, the MUS can be stabilized step by step, making the system globally stable.
Define $x_i = [x_{i1}, x_{i2}]^T = [\theta_i, \dot{\theta}_i]^T$ and $u_i = \tau_i \in \mathbb{R}$. Then, (1) can be converted into the state-space equation
$\dot{x}_{i1} = x_{i2},\quad \dot{x}_{i2} = f_i(x_{i1}, x_{i2}) + g_i(x_{i1})u_i + \psi_i(x),\quad y_i = x_{i1}, \quad (6)$
where $f_i(x_{i1}, x_{i2})$, $g_i(x_{i1})$, and $\psi_i(x)$ are the measurable dynamics, control input gain, and uncertainty of the MUS, respectively, with
$g_i(x_{i1}) = \dfrac{1}{I_{moi}\gamma_{rai}},\quad f_i(x_{i1}, x_{i2}) = -g_i(x_{i1})\Big[\hat{f}_{bvi}\dot{\theta}_i + \big(\hat{f}_{coi} + \hat{f}_{sti}e^{-\hat{f}_{\tau si}\dot{\theta}_i^2}\big)\mathrm{sgn}(\dot{\theta}_i) + f_{pdi}(\theta_i,\dot{\theta}_i) + K(\dot{\theta}_i)\tilde{F}_{pui} + \dfrac{\tau_{coi}}{\gamma_{rai}}\Big],\quad \psi_i(x) = -g_i(x_{i1})I_{dci}(\theta,\dot{\theta},\ddot{\theta}). \quad (7)$
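To make the state-space form (6) concrete, the following minimal sketch simulates one joint module under a small constant torque with explicit Euler integration. The inertia, gear ratio, friction coefficient, torque, and step size are assumed placeholder values, and the uncertainty $\psi_i$ is set to zero.

```python
import numpy as np

# Minimal simulation of one MUS joint module in the state-space form (6):
#   x1' = x2,  x2' = f_i(x) + g_i * u + psi_i(x).
# All numerical constants below are assumed placeholder values.

I_mo, gamma_ra = 2.5e-4, 120.0        # rotor inertia, reduction ratio (assumed)
g_i = 1.0 / (I_mo * gamma_ra)         # control input gain g_i(x1)

def f_i(x1, x2):
    # viscous-friction-only stand-in for the measurable dynamics
    return -g_i * (0.01 * x2)

def step(x, u, psi=0.0, dt=1e-3):
    """One explicit-Euler step of the joint subsystem."""
    x1, x2 = x
    return np.array([x1 + dt * x2, x2 + dt * (f_i(x1, x2) + g_i * u + psi)])

x = np.array([0.0, 0.0])
for _ in range(1000):                 # 1 s under a small constant torque
    x = step(x, u=0.01)
print(x[0] > 0.0 and x[1] > 0.0)
```

A positive torque drives both the joint velocity and position forward, as expected for the double-integrator structure of (6).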

3. Approximate Optimal-Control-Based Cooperative Differential Game via ACD

3.1. Problem Description

In this section, an approximate optimal control method based on a cooperative differential game via ACD is proposed. For convenience of controller design, the following global state-space equation is considered:
$\dot{x}_1 = x_2,\quad \dot{x}_2 = f(x) + \sum_{m=1}^{n} G_m u_m + W(x),\quad y = x_1, \quad (8)$
where $x = [x_1^T, x_2^T]^T \in \mathbb{R}^{2n}$ is the global state of the MUS, and the vectors $x_1$ and $x_2$ are given by
$x_m = [x_{1m}, \ldots, x_{im}, \ldots, x_{nm}]^T \in \mathbb{R}^{n},\quad m = 1, 2. \quad (9)$
In addition, the following quantities are defined:
$f(x) = [f_1(x), \ldots, f_i(x), \ldots, f_n(x)]^T,\quad G_m = [0, \ldots, 0, g_m, 0, \ldots, 0]^T,\quad W(x) = [\psi_1(x), \ldots, \psi_i(x), \ldots, \psi_n(x)]^T,$
where $g_m = (I_{mom}\gamma_{ram})^{-1}$ for $m = 1, 2, \ldots, n$; each entry $g_m$ of $G_m$ is positive by Property 1.
Define the following cost function:
$Val(\dot{E}, U, W) = \int_t^{\infty}\big(\dot{E}^T Q \dot{E} + U^T R U + W^T P W\big)\,d\tau = \int_t^{\infty} Ult(\dot{E}, U, W)\,d\tau, \quad (10)$
where $E$ is the position tracking error of the MUS; the velocity error is defined as $\dot{E} = x_2 - \dot{x}_{1d}$, with $x_{1d}$ the desired trajectory; $Q$, $R = \mathrm{diag}[R_1, R_2, \ldots, R_n]$, and $P$ are given positive definite matrices; and $Ult(\dot{E}, U, W)$ is the utility function.
The Hamiltonian function is
$Ham(\dot{E}, U, W) = Ult(\dot{E}, U, W) + \nabla Val^T\big[f(x) + GU + W - \ddot{x}_{1d}\big], \quad (11)$
where $\nabla Val(\dot{E}) = \partial Val(\dot{E})/\partial \dot{E}$.
Define the optimal cost function:
$Val^{*}(\dot{E}) = \min_{U, W}\int_t^{\infty} Ult(\dot{E}, U, W)\,d\tau. \quad (12)$
According to the optimality conditions $\partial Ham/\partial U = 0$ and $\partial Ham/\partial W = 0$, the approximate optimal control strategies are obtained as follows:
$U^{*} = -\tfrac{1}{2} R^{-1} G^T \nabla Val^{*}, \quad (13)$
$W^{*} = -\tfrac{1}{2} P^{-1} \nabla Val^{*}. \quad (14)$
Considering the cooperative differential game represented by Formula (10), each player needs to minimize the common coupled cost function. Defining the control input matrix $G = [G_1, G_2, \ldots, G_n]$, the cooperative game problem is transformed into solving the approximate optimal control associated with the cost function (10).
Then, substituting Equations (10), (13), and (14) into the Hamiltonian function (11), the coupled HJB equation is obtained as follows:
$0 = \nabla Val^{*T}\big[f(x) - \ddot{x}_{1d}\big] + \dot{E}^T Q \dot{E} - \tfrac{1}{4}(\nabla Val^{*})^T G R^{-1} G^T (\nabla Val^{*}) - \tfrac{1}{4}(\nabla Val^{*})^T P^{-1} (\nabla Val^{*}). \quad (15)$
The optimal cost function $Val^{*}$ can in principle be deduced from Formula (15), yielding the corresponding Pareto equilibrium solution. However, since Formula (15) is a nonlinear PDE, it is difficult to obtain the Pareto equilibrium solution analytically. Therefore, in the next section, a critic NN is used to obtain an approximate optimal control strategy for the MUS.
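Before turning to the neural approximation, the structure of (13)-(15) can be sanity-checked on a scalar linear special case, $\dot{e} = ae + gu + w$ with quadratic cost, where a quadratic value $Val^{*} = se^2$ solves the HJB in closed form. The numbers below are illustrative; this is a verification sketch, not the paper's algorithm.

```python
import math

# Scalar sanity check of the coupled HJB (15): for de/dt = a*e + g*u + w and
# cost ∫ (q e² + r u² + p w²) dτ, Val* = s e² with s the positive root of
# q + 2 a s - s² c = 0, c = g²/r + 1/p. All numbers are illustrative.

a, g, q, r, p = -0.5, 2.0, 1.0, 0.97, 0.94
c = g * g / r + 1.0 / p
s = (a + math.sqrt(a * a + q * c)) / c

def grad_val(e):       # ∇Val* = 2 s e
    return 2.0 * s * e

def u_star(e):         # U* = -(1/2) R⁻¹ Gᵀ ∇Val*, Eq. (13)
    return -0.5 * (1.0 / r) * g * grad_val(e)

def w_star(e):         # W* = -(1/2) P⁻¹ ∇Val*, Eq. (14)
    return -0.5 * (1.0 / p) * grad_val(e)

e = 0.3
hjb = (q * e * e + grad_val(e) * a * e
       - 0.25 * grad_val(e) ** 2 * g * g / r
       - 0.25 * grad_val(e) ** 2 / p)
print(abs(hjb) < 1e-12)                          # HJB residual vanishes
print(a + (g * u_star(e) + w_star(e)) / e < 0)   # closed loop is stable
```

The closed-loop rate is $a - sc = -\sqrt{a^2 + qc} < 0$, so both strategies jointly stabilize the error dynamics while satisfying the HJB exactly.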

3.2. An Approximate Solution of Decentralized Approximate Optimal Control in a Cooperative Differential Game Based on a Critic Network

Dynamic compensation is important in MUS control; a controller based on dynamic model compensation, local desired information, and the approximate optimal control law is designed as follows:
$u_i^{*} = u_{i1} + u_{i2} + u_{i3}^{*}, \quad (16)$
where $u_{i1}$ compensates the measurable dynamics $f_i(x_{i1}, x_{i2})$, $u_{i2}$ handles the coupling terms of the MUS using locally desired control information, and $u_{i3}^{*}$ is the optimal compensation for the uncertainty.
According to Formula (6), the control law $u_{i1}$, which compensates the accurately modeled and measurable part of the subsystem, is designed as
$u_{i1} = \hat{f}_{bvi} x_{i2} + \big(\hat{f}_{coi} + \hat{f}_{sti} e^{-\hat{f}_{\tau si} x_{i2}^2}\big)\mathrm{sgn}(x_{i2}) + g_i^{-1}\ddot{x}_{i1d} + \dfrac{\tau_{coi}}{\gamma_{rai}}. \quad (17)$
The critic neural network representation of the cost function is
$Val(\dot{E}) = W_{cr}^T \delta_{cr}(\dot{E}) + \varepsilon_{cr}, \quad (18)$
where $W_{cr}$ is the ideal weight vector of the critic neural network, $\delta_{cr}(\dot{E})$ is the activation function, and $\varepsilon_{cr}$ is the finite approximation error.
Taking the partial derivative of (18) yields
$\nabla Val(\dot{E}) = \nabla\delta_{cr}(\dot{E})^T W_{cr} + \nabla\varepsilon_{cr}, \quad (19)$
where $\nabla\delta_{cr}(\dot{E}) = \partial\delta_{cr}(\dot{E})/\partial\dot{E}$ and $\nabla\varepsilon_{cr}$ are the partial derivatives of the activation function and of the approximation error, respectively.
By substituting Formula (19) into (13) and (14), we get
$U^{*} = -\tfrac{1}{2} R^{-1} G^T\big[\nabla\delta_{cr}(\dot{E})^T W_{cr} + \nabla\varepsilon_{cr}\big], \quad (20)$
$W^{*} = -\tfrac{1}{2} P^{-1}\big[\nabla\delta_{cr}(\dot{E})^T W_{cr} + \nabla\varepsilon_{cr}\big]. \quad (21)$
According to Formulas (20) and (21), each element of the expanded matrix can be obtained as
$u_i^{*} = -\tfrac{1}{2} R_i^{-1} G_i^T\big[\nabla\delta_{cr}(\dot{E})^T W_{cr} + \nabla\varepsilon_{cr}\big], \quad (22)$
$\psi_i^{*} = -\tfrac{1}{2} P_i^{-1}\big[\nabla\delta_{cr}(\dot{E})^T W_{cr} + \nabla\varepsilon_{cr}\big]. \quad (23)$
Substituting Formulas (19), (20), and (21) into (11), we get
$Ham(\dot{E}, U, W) = \dot{E}^T Q \dot{E} + U^T R U + W^T P W + \nabla Val^T\big[f(x) + GU + W - \ddot{x}_{1d}\big] - e_{Jh} = 0, \quad (24)$
where $e_{Jh}$ is the approximation residual of the neural network.
To implement a decentralized control mechanism, the assumption of a nominal limitation on the coupling terms is removed, so that the state of a coupled subsystem can be represented by the desired states of the other subsystems:
$f_i(x) = f_i(x_i, x_{md}) + \Delta f_i(x, x_{md}),\qquad u_{i2} = G_m^{-1}\big(\dot{x}_{m2d} - f_m(x_d)\big),\quad m \ne i. \quad (25)$
Here, $x_{md}$ represents the desired state of the coupled subsystem $m = 1, \ldots, i-1, i+1, \ldots, n$, and $\Delta f_i(x, x_{md})$ is the substitution error. Since the coupling term satisfies the local Lipschitz condition, we have
$\|\Delta f_i(x, x_{md})\| \le \sum_{m=1, m\ne i}^{n} d_{im}\|E_m\|, \quad (26)$
where $E_m = x_m - x_{md}$ and $d_{im} \ge 0$.
The ideal critic neural network weight vector $W_{cr}$ is unknown, but the cost function can be approximated as
$\hat{Val}(\dot{E}) = \hat{W}_{cr}^T \delta_{cr}(\dot{E}), \quad (27)$
where $\hat{W}_{cr}$ and $\hat{Val}(\dot{E})$ are the estimated values of $W_{cr}$ and $Val(\dot{E})$.
Taking the partial derivative of (27) yields
$\nabla\hat{Val}(\dot{E}) = \nabla\delta_{cr}(\dot{E})^T \hat{W}_{cr}. \quad (28)$
Combining Formulas (22), (23), and (28), the approximate decentralized optimal control laws are as follows:
$\hat{u}_{i3}^{*} = -\tfrac{1}{2} R_i^{-1} G_i^T \nabla\delta_{cr}(\dot{E})^T \hat{W}_{cr}, \quad (29)$
$\hat{\psi}_i^{*} = -\tfrac{1}{2} P_i^{-1} \nabla\delta_{cr}(\dot{E})^T \hat{W}_{cr}. \quad (30)$
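The critic approximation only requires the activation vector and its gradient. The sketch below implements a Gaussian critic for a scalar velocity error and forms the compensation term of (29); the centers, width, weights, and gains are illustrative assumptions, and the analytic gradient is verified against finite differences.

```python
import numpy as np

# Gaussian critic for a scalar velocity error: Val ≈ Ŵᵀ δ(Ė) as in (27)-(28),
# and the compensation û₃* of (29). Centers, width, weights, and gains are
# illustrative assumptions.

centers = np.array([-0.5, 0.0, 0.5])         # three hidden neurons
width = 1.28

def delta(e_dot):
    """Activation vector δ_cr(Ė)."""
    d = e_dot - centers
    return np.exp(-d * d / width)

def grad_delta(e_dot):
    """∇δ_cr(Ė) = ∂δ/∂Ė for the scalar input."""
    d = e_dot - centers
    return (-2.0 * d / width) * delta(e_dot)

W_hat = np.array([0.4, 0.1, 0.3])            # current critic weight estimate
R_inv, G = 1.0 / 0.97, 2.0                   # scalar stand-ins for R⁻¹, G

def u3_hat(e_dot):
    """û₃* = -(1/2) R⁻¹ Gᵀ ∇δᵀ Ŵ, Eq. (29)."""
    return -0.5 * R_inv * G * (grad_delta(e_dot) @ W_hat)

# Verify the analytic gradient against central finite differences
e, h = 0.2, 1e-6
num = (delta(e + h) - delta(e - h)) / (2.0 * h)
print(np.allclose(grad_delta(e), num, atol=1e-6))
```

Because the control law consumes only $\nabla\delta_{cr}$ and $\hat{W}_{cr}$, a correct activation gradient is the essential ingredient of the implementation.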
Define $F(x) = [f_1(x_1, x_{md}), f_2(x_2, x_{md}), \ldots, f_n(x_n, x_{md})]^T$. According to Formula (24), the approximate Hamiltonian function can be obtained:
$\hat{Ham}(\dot{E}, \hat{U}, \hat{W}) = \dot{E}^T Q \dot{E} + \hat{U}^T R \hat{U} + \hat{W}^T P \hat{W} + \nabla\hat{Val}^T\big[F(x) + G\hat{U} + \hat{W} - \ddot{x}_{1d}\big] = e_J. \quad (31)$
The approximation error of the Hamiltonian function is
$e_J = \hat{Ham} - Ham, \quad (32)$
where $e_J = \hat{Ham}$ follows from Formulas (24) and (31), since the exact Hamiltonian (24) is zero.
The weight approximation error is defined as
$\tilde{W}_{cr} = W_{cr} - \hat{W}_{cr}. \quad (33)$
According to (24), (31), and (32), we can deduce
$e_J = e_{Jh} - \tilde{W}_{cr}^T \nabla\delta_{cr}(\dot{E})\ddot{E}. \quad (34)$
Define the following objective function:
$E_J = \tfrac{1}{2} e_J^T e_J. \quad (35)$
The weight update law for the critic NN is obtained by gradient descent as follows:
$\dot{\hat{W}}_{cr} = -\alpha_{le}\, e_J\, \nabla\delta_{cr}(\dot{E})\ddot{E}, \quad (36)$
where $\alpha_{le}$ is the critic neural network learning rate. Define $\upsilon_{au} = \nabla\delta_{cr}(\dot{E})\ddot{E}$, and assume that a positive constant $\upsilon_{Lb}$ satisfies $\|\upsilon_{au}\| \le \upsilon_{Lb}$. The dynamic equation of the weight approximation error can be deduced as follows:
$\dot{\tilde{W}}_{cr} = -\dot{\hat{W}}_{cr} = \alpha_{le}\, e_J\, \upsilon_{au} = \alpha_{le}\big(e_{Jh} - \tilde{W}_{cr}^T \upsilon_{au}\big)\upsilon_{au}. \quad (37)$
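Under the weight-error dynamics above, with zero residual $e_{Jh}$ and a persistently exciting $\upsilon_{au}$, the weight error should decay. The following sketch simulates this with Euler integration; the regressor signal, step size, and initial error are illustrative, while the learning rate 0.82 matches the value used later in the experiment section.

```python
import numpy as np

# Euler simulation of the weight-error dynamics: with e_Jh = 0 and a
# persistently exciting υ_au, W̃' = -α (W̃ᵀ υ) υ drives the weight error
# toward zero. Signal, step size, and initial error are illustrative.

alpha = 0.82                          # learning rate from the experiment section
W_tilde = np.array([1.0, -0.5])       # initial weight estimation error (assumed)
dt = 1e-3

for k in range(20000):                # 20 s of simulated adaptation
    t = k * dt
    v = np.array([np.sin(t), np.cos(t)])          # exciting regressor υ_au
    W_tilde = W_tilde + dt * (-alpha * (W_tilde @ v) * v)

print(np.linalg.norm(W_tilde) < 0.05)
```

The sinusoidal regressor satisfies a persistence-of-excitation condition (its averaged outer product is positive definite), which is what makes the error norm contract in every direction.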
Finally, according to Formulas (17), (25), and (29), the decentralized approximate optimal control law based on the cooperative differential game via ACD is obtained as follows:
$\hat{u}_i^{*} = u_{i1} + u_{i2} + \hat{u}_{i3}^{*} = \hat{f}_{bvi} x_{i2} + \big(\hat{f}_{coi} + \hat{f}_{sti} e^{-\hat{f}_{\tau si} x_{i2}^2}\big)\mathrm{sgn}(x_{i2}) + g_i^{-1}\ddot{x}_{i1d} + \dfrac{\tau_{coi}}{\gamma_{rai}} + G_m^{-1}\big(\dot{x}_{m2d} - f_m(x_d)\big) - \tfrac{1}{2} R_i^{-1} G_i^T \nabla\delta_{cr}(\dot{E})^T \hat{W}_{cr}. \quad (38)$

3.3. Fulfillment of Policy Iteration

To overcome the difficulty of solving the HJB equation, a policy iteration method is adopted.
Step 1: Let $k = 0$, begin with an admissible control $u_i^{(0)}$, and choose a small positive number $\varepsilon_i$.
Step 2: Based on the admissible control $u_i^{(k)}$, solve for $Val^{(k+1)}(\dot{E})$ from $\dot{E}^T Q \dot{E} + U^{(k)T} R U^{(k)} + W^{(k)T} P W^{(k)} + \nabla Val^{(k+1)T}\big[f(x) + G U^{(k)} + W^{(k)} - \ddot{x}_{1d}\big] = 0$, with $Val^{(k+1)}(0) = 0$.
Step 3: Update the control policy via $u_i^{(k+1)} = -\tfrac{1}{2} R_i^{-1} G_i^T(x_i)\nabla Val^{(k+1)}$.
Step 4: If $\|Val^{(k+1)} - Val^{(k)}\| \le \varepsilon_i$, stop; otherwise, set $k = k + 1$ and return to Step 2.
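Steps 1-4 can be traced on a scalar linear version of the cooperative game, where policy evaluation reduces to one linear equation per iteration and the iterates converge to the closed-form HJB solution. All coefficients are illustrative; this sketches the iteration itself, not the MUS implementation.

```python
import math

# Policy iteration (Steps 1-4) on a scalar linear cooperative game:
# de/dt = a e + g u + w, cost ∫ (q e² + r u² + p w²) dτ, with the value
# parameterized as Val⁽ᵏ⁾ = s_k e². All coefficients are illustrative.

a, g, q, r, p = -0.5, 2.0, 1.0, 0.97, 0.94
Ku, Kw = 1.0, 0.0        # admissible initial policies u = -Ku e, w = -Kw e
eps = 1e-12

s, s_prev = 0.0, 0.0
for k in range(100):
    # Step 2 (policy evaluation): 0 = q + r Ku² + p Kw² + 2 s (a - g Ku - Kw)
    s = (q + r * Ku ** 2 + p * Kw ** 2) / (2.0 * (g * Ku + Kw - a))
    # Step 3 (policy improvement): u = -(1/2) r⁻¹ g ∇Val → Ku = g s / r, etc.
    Ku, Kw = g * s / r, s / p
    # Step 4 (stopping test)
    if abs(s - s_prev) <= eps:
        break
    s_prev = s

# Closed-form positive root of the scalar HJB for comparison
c = g * g / r + 1.0 / p
s_hjb = (a + math.sqrt(a * a + q * c)) / c
print(abs(s - s_hjb) < 1e-9)
```

Each evaluation step is linear in $s$ because the policies are frozen, which is exactly what makes policy iteration tractable where the full HJB (15) is not.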
Theorem 1.
Given the initial control policy $U^{(0)}$ and the above policy iteration, the iterated cost function and control policy converge to the optimal ones as $k \to \infty$, i.e., $Val^{(k)} \to Val^{*}$ and $U^{(k)} \to U^{*}$.
The iteration produces $U^{(k)}$ for any $k \ge 0$ from the initial policy $U^{(0)}$. Then, for any $\omega_i > 0$, there exists an integer $k_{0i}$ such that, for all $k \ge k_{0i}$,
$\sup\|Val^{(k)} - Val^{*}\| < \omega_i, \qquad \sup\|U^{(k)} - U^{*}\| < \omega_i.$
Hence, the algorithm converges to the optimal cost function and optimal control.
Theorem 2.
Considering the dynamic model (1) of the MUS subsystem with the state-space form (6), under the proposed decentralized approximate optimal control law (38) based on the cooperative differential game, the tracking error of the MUS is guaranteed to be uniformly ultimately bounded (UUB).
Choose $V_{cl}(t) = Val(\dot{E})$ as a Lyapunov candidate function, whose derivative is
$\dot{V}_{cl}(t) = \nabla Val^T\big[F(x) + GU + W - \ddot{x}_{1d}\big]. \quad (39)$
Considering the HJB Equation (15), it can be obtained that
$\nabla Val^{*T}\big[F(x) - \ddot{x}_{1d}\big] = -\dot{E}^T Q \dot{E} + \tfrac{1}{4}(\nabla Val^{*})^T G R^{-1} G^T (\nabla Val^{*}) + \tfrac{1}{4}(\nabla Val^{*})^T P^{-1} (\nabla Val^{*}). \quad (40)$
By substituting Formula (40) into (39), we get
$\dot{V}_{cl}(t) = -\dot{E}^T Q \dot{E} + \tfrac{1}{4}(\nabla Val^{*})^T G R^{-1} G^T (\nabla Val^{*}) + \tfrac{1}{4}(\nabla Val^{*})^T P^{-1} (\nabla Val^{*}) + \nabla Val^T\big[GU + W\big]. \quad (41)$
Writing $U = U^{*} - (U^{*} - \hat{U})$ and $W = W^{*} - (W^{*} - \hat{W})$ in Formula (41), it can be deduced that
$\dot{V}_{cl}(t) = -\dot{E}^T Q \dot{E} - \nabla Val^T\big[G(U^{*} - \hat{U}) + (W^{*} - \hat{W})\big] + \tfrac{1}{4}(\nabla Val^{*})^T G R^{-1} G^T (\nabla Val^{*}) + \tfrac{1}{4}(\nabla Val^{*})^T P^{-1} (\nabla Val^{*}). \quad (42)$
Substituting Formulas (20), (21), (29), and (30) into (42) yields
$\dot{V}_{cl}(t) = -\dot{E}^T Q \dot{E} + \tfrac{1}{4}(\nabla Val^{*})^T\big[G R^{-1} G^T + P^{-1}\big](\nabla Val^{*}) + \tfrac{1}{2}\big[\nabla\delta_{cr}(\dot{E})^T W_{cr} + \nabla\varepsilon_{cr}\big]^T\big[G R^{-1} G^T\nabla\delta_{cr}(\dot{E})^T W_{cr} + G^T\nabla\varepsilon_{cr} + P^{-1}\big(\nabla\delta_{cr}(\dot{E})^T W_{cr} + \nabla\varepsilon_{cr}\big)\big] = -\dot{E}^T Q \dot{E} + \Pi_J, \quad (43)$
where $\Pi_J$ has the upper bound
$\Pi_J \le \Big\|\tfrac{1}{4}(\nabla Val^{*})^T\big[G R^{-1} G^T + P^{-1}\big](\nabla Val^{*}) + \tfrac{1}{2}\big[\nabla\delta_{cr}(\dot{E})^T W_{cr} + \nabla\varepsilon_{cr}\big]^T\big[G R^{-1} G^T\nabla\delta_{cr}(\dot{E})^T W_{cr} + G^T\nabla\varepsilon_{cr} + P^{-1}\big(\nabla\delta_{cr}(\dot{E})^T W_{cr} + \nabla\varepsilon_{cr}\big)\big]\Big\| \le \pi_J, \quad (44)$
where $\pi_J$ is a computable positive constant.
According to Formula (44), $\dot{V}_{cl}(t)$ has the following upper bound:
$\dot{V}_{cl}(t) \le -\dot{E}^T Q \dot{E} + \pi_J \le -\lambda_{\min}(Q)\|\dot{E}\|^2 + \pi_J, \quad (45)$
where $\lambda_{\min}(Q)$ is the minimum eigenvalue of $Q$.
If $\dot{E}$ lies outside the compact set
$\Omega_{cl} = \Big\{\dot{E} : \|\dot{E}\| \le \sqrt{\pi_J/\lambda_{\min}(Q)}\Big\}, \quad (46)$
then the right-hand side of Formula (45) is negative; that is, whenever $\dot{E}$ violates (46), $\dot{V}_{cl}(t) < 0$. Therefore, the trajectory tracking error is uniformly ultimately bounded under the decentralized approximate optimal control law (38) based on the cooperative differential game via ACD. Proof complete.

4. Experiment

4.1. Experimental Setup

The proposed approximate optimal control law based on the cooperative differential game was verified on a 2-DOF MUS experimental platform, shown in Figure 2. The joint control torque is measured by a torque sensor, and the joint position is obtained by combining an absolute encoder and an incremental encoder. Because the control system is built in the Simulink environment, the sampling interval is set in software. Several experiments with related algorithms were conducted; to limit the length of the article, two representative control methods are compared in this section, namely the non-zero-sum game-based optimal control [7] and the proposed cooperative differential game-based adaptive critic design method. The critic neural network uses a 2-3-1 structure with a single hidden layer. The activation function is $\delta_{cr}(\dot{E}) = e^{-(\dot{E}-\varsigma)^T(\dot{E}-\varsigma)/l}$, with center $\varsigma = [0.5, 0.5]$ and width $l = [1.28, 1.28]$. The learning rate of the weight update is 0.82. $Q = I$, $R = 0.97I$, and $P = 0.94I$, where $I$ is the identity matrix.
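For reference, the stated activation with the listed center and width can be evaluated directly. Here $l$ is interpreted as a shared scalar width of 1.28 (the source lists it per dimension), so the exact parameterization is an assumption.

```python
import numpy as np

# Evaluating the stated critic activation δ_cr(Ė) = exp(-(Ė-ς)ᵀ(Ė-ς)/l) with
# ς = [0.5, 0.5]; l is assumed to be a shared scalar width of 1.28.

sigma = np.array([0.5, 0.5])
l = 1.28

def delta_cr(e_dot):
    d = e_dot - sigma
    return np.exp(-(d @ d) / l)

print(np.isclose(delta_cr(np.array([0.5, 0.5])), 1.0))  # peak value at the center
print(0.0 < delta_cr(np.zeros(2)) < 1.0)                # decays away from the center
```

The activation peaks at the center $\varsigma$ and decays smoothly away from it, which is the behavior the 2-3-1 critic relies on.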

4.2. Experimental Results

(1)
Position tracking performance
Figures 3, 4, and 5 show the position tracking performance under the non-zero-sum game-based optimal control method [7,30] and the proposed cooperative differential game-based approximate optimal control strategy. The desired values are those the MUS is expected to achieve, and the actual values are those produced by the MUS under each control algorithm; a small difference between the two indicates good performance. The figures show that both the traditional control method and the proposed control strategy ensure good position tracking. Figures 4 and 5 show the position tracking error curves of the traditional and proposed control methods; both keep the position tracking error within 0.002 rad. The actual position deviates from the expected position momentarily when the desired curve changes direction, but owing to the good robustness of the proposed control system, the actual trajectory coincides with the expected trajectory again within a short time. The proposed method reduces the tracking error by nearly 30% in both joint one and joint two. A proper learning rate also contributes to this performance; based on different experiments, a learning rate of 0.82 gave the best results.
(2)
Control torque
Figures 6 and 7 show the control torque curves under the existing and proposed control methods. Control torque chattering degrades both the position tracking performance and the durability of the DC motor. Under the existing non-zero-sum game method, each joint of the MUS has its own cost function, so the method cannot optimize the overall performance of the MUS. The developed cooperative differential game method is the only one with a single overall cost function, and it can minimize the total performance index. Therefore, in Figure 7, the control torque is reasonably optimized owing to the cooperative game strategy based on ACD. The root mean squares of the tracking error and control torque are shown in Table 1. The control torques required to complete the same task should be approximately the same; however, the cooperative differential game-based ACD method minimizes the control torque to the greatest extent possible. According to Table 1, the control torque of the proposed method is reduced by nearly 15%.
(3)
Critic NN weight
Figure 8 shows the weight curves of the critic neural network under the proposed control method. Owing to the training of the neural network and the policy iteration of the critic, the weights converge. However, because of the nature of RBFNNs, the curves converge not to a single specific value but to within a small range.

5. Conclusions

This paper proposed an MUS approximate optimal control method based on cooperative differential game theory to solve the trajectory tracking problem. First, the dynamic model of the MUS was established. Then, each module of the MUS was treated as a player in the cooperative differential game, and the trajectory tracking problem of the MUS was transformed into an approximate optimal control problem through ACD based on the cooperative differential game. Using the critic network to approximate the joint performance index function of the system, an approximate optimal control law was obtained through a policy iteration algorithm. The stability was proven according to Lyapunov theory. Finally, the effectiveness was verified through an experimental platform.

Author Contributions

Conceptualization, L.S.; Methodology, L.S.; Software, L.S.; Validation, Y.L.; Formal analysis, Y.L.; Resources, L.Z. and Y.Q.; Data curation, L.Z.; Writing—original draft, L.S.; Writing—review & editing, Y.L.; Visualization, Y.L.; Supervision, Y.Q.; Project administration, Y.Q.; Funding acquisition, Y.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions.

Conflicts of Interest

Liang Si, Yebao Liu, Luyang Zhong and Yuhan Qian were employed by the Aerospace Times Feihong Technology Company Limited.

References

  1. Liu, Y.-J.; Gao, B.; Yu, D.; Li, D.; Liu, L. Neuro-Adaptive Fault-Tolerant Attitude Control of a Quadrotor UAV with Flight Envelope Limitation and Feedforward Compensation. IEEE Trans. Syst. Man Cybern. Syst. 2025, 55, 3143–3151. [Google Scholar] [CrossRef]
  2. Xue, S.; Zhao, N.; Zhang, W.; Luo, B.; Liu, D. A Hybrid Adaptive Dynamic Programming for Optimal Tracking Control of USVs. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 9961–9969. [Google Scholar] [CrossRef]
  3. Huang, Y.; Xu, X.; Meng, Z.; Sun, J. A Smooth Distributed Formation Control Method for Quadrotor UAVs under Event-Triggering Mechanism and Switching Topologies. IEEE Trans. Veh. Technol. 2025, 74, 10081–10091. [Google Scholar] [CrossRef]
  4. Xue, S.; Zhang, W.; Luo, B.; Liu, D. Integral Reinforcement Learning-Based Dynamic Event-Triggered Nonzero-Sum Games of USVs. IEEE Trans. Cybern. 2025, 55, 1706–1716. [Google Scholar] [CrossRef]
  5. Luo, D.; Wang, Y.; Li, Z.; Song, Y.; Lewis, F.L. Asymptotic Leader-Following Consensus of Heterogeneous Multi-Agent Systems with Unknown and Time-Varying Control Gains. IEEE Trans. Autom. Sci. Eng. 2025, 22, 2768–2779. [Google Scholar] [CrossRef]
  6. Huang, Y.; Kuai, J.; Cui, S.; Meng, Z.; Sun, J. Distributed Algorithms via Saddle-Point Dynamics for Multi-Robot Task Assignment. IEEE Robot. Autom. Lett. 2024, 9, 11178–11185. [Google Scholar] [CrossRef]
  7. Liu, Y.; An, T.; Chen, J.; Zhong, L.; Qian, Y. Event-trigger Reinforcement Learning-based Coordinate Control of Modular Unmanned System via Nonzero-sum Game. Sensors 2025, 25, 314.
  8. Ren, J.; Wang, D.; Li, M.; Qiao, J. Discounted Stable Adaptive Critic Design for Zero-Sum Games with Application Verifications. IEEE Trans. Autom. Sci. Eng. 2025, 22, 11706–11716.
  9. Wang, D.; Hu, L.; Li, X.; Qiao, J. Online Fault-Tolerant Tracking Control with Adaptive Critic for Nonaffine Nonlinear Systems. IEEE/CAA J. Autom. Sin. 2025, 12, 215–227.
  10. Zhang, Y.; Li, J.-Y. Reinforcement Learning-Based Distributed Robust Bipartite Consensus Control for Multispacecraft Systems with Dynamic Uncertainties. IEEE Trans. Ind. Inform. 2024, 20, 13341–13351.
  11. Zhao, B.; Zhang, S.; Liu, D. Self-Triggered Approximate Optimal Neuro-Control for Nonlinear Systems Through Adaptive Dynamic Programming. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 4713–4723.
  12. Lin, M.; Zhao, B.; Liu, D. Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method with Convergence Analysis. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 5574–5585.
  13. Liu, N.; Zhang, K.; Xie, X.; Yue, D. UKF-Based Optimal Tracking Control for Uncertain Dynamic Systems with Asymmetric Input Constraints. IEEE Trans. Cybern. 2024, 54, 7224–7235.
  14. Wang, K.; Mu, C.; Ni, Z.; Liu, D. Safe Reinforcement Learning and Adaptive Optimal Control with Applications to Obstacle Avoidance Problem. IEEE Trans. Autom. Sci. Eng. 2024, 21, 4599–4612.
  15. Zhao, J.; Wang, Z.; Lv, Y.; Na, J.; Liu, C.; Zhao, Z. Data-Driven Learning for H∞ Control of Adaptive Cruise Control Systems. IEEE Trans. Veh. Technol. 2024, 73, 18348–18362.
  16. Ding, C.; Zhang, Z.; Miao, Z.; Wang, Y. Event-Based Finite-Time Formation Tracking Control for UAV with Bearing Measurements. IEEE Trans. Ind. Electron. 2025, 72, 7482–7492.
  17. Wang, K.; Mu, C. Learning-Based Control with Decentralized Dynamic Event-Triggering for Vehicle Systems. IEEE Trans. Ind. Inform. 2023, 19, 2629–2639.
  18. Zhang, S.; Zhao, B.; Liu, D.; Zhang, Y. Event-Triggered Decentralized Integral Sliding Mode Control for Input-Constrained Nonlinear Large-Scale Systems with Actuator Failures. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 1914–1925.
  19. Zhang, Y.; Zhao, B.; Liu, D.; Zhang, S. Distributed Fault Tolerant Consensus Control of Nonlinear Multiagent Systems via Adaptive Dynamic Programming. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 9041–9053.
  20. Zhang, Z.; Wang, Y.; Miao, Z.; Jiang, Y.; Feng, Y. Asymptotic Stability Analysis and Stabilization Control for General Fractional-Order Neural Networks via an Unified Lyapunov Function. IEEE Trans. Netw. Sci. Eng. 2024, 11, 2675–2688.
  21. Xia, H.; Hou, J.; Guo, P. Two-Level Local Observer-Based Decentralized Optimal Fault Tolerant Tracking Control for Unknown Nonlinear Interconnected Systems. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 1779–1790.
  22. Liu, P.; Zhang, H.; Ming, Z.; Wang, S.; Agarwal, R.K. Dynamic Event-Triggered Safe Control for Nonlinear Game Systems with Asymmetric Input Saturation. IEEE Trans. Cybern. 2024, 54, 5115–5126.
  23. Xia, H.; Wang, X.; Huang, D.; Sun, C. Cooperative-Critic Learning-Based Secure Tracking Control for Unknown Nonlinear Systems with Multisensor Faults. IEEE Trans. Cybern. 2025, 55, 282–294.
  24. Qin, C.; Qiao, X.; Wang, J.; Zhang, D.; Hou, Y.; Hu, S. Barrier-Critic Adaptive Robust Control of Nonzero-Sum Differential Games for Uncertain Nonlinear Systems with State Constraints. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 50–63.
  25. Wei, Q.; Jiang, H. Event-/Self-Triggered Adaptive Optimal Consensus Control for Nonlinear Multiagent System with Unknown Dynamics and Disturbances. IEEE Trans. Cybern. 2025, 55, 1476–1485.
  26. An, T.; Dong, B.; Yan, H.; Liu, L.; Ma, B. Dynamic Event-Triggered Strategy-Based Optimal Control of Modular Robot Manipulator: A Multiplayer Nonzero-Sum Game Perspective. IEEE Trans. Cybern. 2024, 54, 7514–7526.
  27. Zhang, K.; Zhang, Z.-X.; Xie, X.P.; Rubio, J.d.J. An Unknown Multiplayer Nonzero-Sum Game: Prescribed-Time Dynamic Event-Triggered Control via Adaptive Dynamic Programming. IEEE Trans. Autom. Sci. Eng. 2024, 22, 8317–8328.
  28. Mu, C.; Wang, K.; Ni, Z.; Sun, C. Cooperative Differential Game-Based Optimal Control and Its Application to Power Systems. IEEE Trans. Ind. Inform. 2020, 16, 5169–5179.
  29. An, T.; Wang, Y.; Liu, G.; Li, Y.; Dong, B. Cooperative Game-Based Approximate Optimal Control of Modular Robot Manipulators for Human–Robot Collaboration. IEEE Trans. Cybern. 2023, 53, 4691–4703.
  30. Belhenniche, A.; Chertovskih, R.; Gonçalves, R. Convergence Analysis of Reinforcement Learning Algorithms Using Generalized Weak Contraction Mappings. Symmetry 2025, 17, 750.
Figure 1. Joint module of MUS.
Figure 2. MUS: (a) platform; (b) joint module platform.
Figure 3. Position tracking via proposed method: (a) joint 1; (b) joint 2.
Figure 4. Position error via existing method: (a) joint 1; (b) joint 2.
Figure 5. Position error via proposed method: (a) joint 1; (b) joint 2.
Figure 6. Control torque curves via existing method: (a) joint 1; (b) joint 2.
Figure 7. Control torque curves via proposed method: (a) joint 1; (b) joint 2.
Figure 8. NN weights: (a) joint 1; (b) joint 2.
Table 1. Root mean squares.

                                      Joint 1            Joint 2
Position error of proposed method     1.88 × 10⁻³ rad    1.23 × 10⁻³ rad
Position error of existing method     2.56 × 10⁻³ rad    1.79 × 10⁻³ rad
Control torque of proposed method     0.36 Nm            0.19 Nm
Control torque of existing method     0.42 Nm            0.22 Nm

Share and Cite

MDPI and ACS Style

Si, L.; Liu, Y.; Zhong, L.; Qian, Y. Cooperative Differential Game-Based Modular Unmanned System Approximate Optimal Control: An Adaptive Critic Design Approach. Symmetry 2025, 17, 1665. https://doi.org/10.3390/sym17101665
