Article

Cooperative Differential Game-Based Modular Unmanned System Approximate Optimal Control: An Adaptive Critic Design Approach

1 Aerospace Times Feihong Technology Company Limited, Beijing 100094, China
2 National Elite Institute of Engineering, Northwestern Polytechnical University, Xi’an 710072, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(10), 1665; https://doi.org/10.3390/sym17101665
Submission received: 5 August 2025 / Revised: 25 August 2025 / Accepted: 10 September 2025 / Published: 6 October 2025
(This article belongs to the Special Issue Symmetries in Dynamical Systems and Control Theory)

Abstract

An approximate optimal control scheme for modular unmanned systems (MUSs) is presented via a cooperative differential game to solve the trajectory tracking problem. First, the dynamic model of the modular unmanned system is built with the joint torque feedback technique, in which the moment of inertia of the motor rotor is positive and symmetric. Each MUS module is treated as a player in the cooperative differential game. The MUS trajectory tracking problem is then transformed into an approximate optimal control problem by means of adaptive critic design (ACD). The approximate optimal control law is obtained through a critic network that approximates the joint performance index function of the system. The stability of the closed-loop system is proved through Lyapunov theory, and the feasibility of the proposed control algorithm is verified on an experimental platform.

1. Introduction

Unmanned systems [1,2] refer to collections of intelligent equipment that realize unmanned or semi-autonomous operation through advanced sensing, artificial intelligence, autonomous navigation, and other technologies, including unmanned aerial vehicles [3], unmanned vehicles [4], unmanned ships [5], robots [6], and cross-domain collaborative unmanned cluster systems. Their core goals are to improve task efficiency, reduce human risk, and push the limits of human physiology in complex, dangerous, or repetitive scenarios through robot replacement or human–robot collaboration. However, traditional unmanned systems usually have fixed configurations, thus limiting their further application. In order to solve the above challenges, modular unmanned systems (MUSs) [7] have been proposed and applied to many aspects, such as autonomous assembly of large-aperture telescopes in space via the handling, assembly, and connection of trusses and lenses. In addition, modular reconfigurable robots use the same ideas as MUSs.
An ideal MUS should not only ensure its stability and accuracy under control, but also consider its optimality. Adaptive critic design (ACD) [8,9], as a branch of optimal control, is a cutting-edge method at the intersection of machine learning, control theory, and operations research which aims to solve the real-time optimization and decision problems of complex dynamic systems. Through the integration of reinforcement learning, neural networks, and dynamic programming, it overcomes the dependence of traditional dynamic programming on precise mathematical models, and realizes online learning and adaptive adjustment of strategies in unknown or time-varying environments; it is sometimes regarded as the intelligent optimization "brain" of dynamic systems. Adaptive critic design has many applications, such as in continuous-time systems [10,11], discrete-time systems [12,13], data-driven systems [14,15], event-triggered systems [16,17], fault diagnosis systems [18,19], and model-uncertain systems [20,21]. However, most of the systems mentioned above have a single controller, and with the gradual expansion of system scale, multi-controller or multi-player systems should be further considered to optimize overall system performance.
Differential game theory [22,23] is a frontier field combining game theory and dynamic system theory, focusing on the strategic interaction between multiple decision-makers under continuous time frames. Different from traditional static games, differential games model the strategy selection of the participants as a time-dependent state variable control process, and describe the dynamic evolution of the system state through differential equations. Each participant adjusts their control strategy according to real-time information. In the process of pursuing the optimization of individual objective functions, it is necessary for one player to predict the influence of the opponent’s behavior on the system trajectory and consider the chain reaction of its own strategy on the global dynamics. There are many types of games, such as zero-sum games [24,25], non-zero-sum games [26,27], etc. Cooperative games [28,29], as special non-zero-sum games, have only one performance index, and each player hopes to optimize the overall performance index function of the system through cooperation. Therefore, the control problem of MUSs is suitable to be solved by the concept of cooperative differential games.
This paper proposes an approximate optimal control method based on cooperative differential games for MUSs and addresses the trajectory tracking problem. First, a dynamic model is established. Then, each module of the MUS is treated as a player in a cooperative differential game, transforming the trajectory tracking problem into an approximate optimal control problem based on a cooperative differential game via ACD. A critic network is employed to approximate the system's joint performance index function, and the control law is derived through a policy iteration algorithm. The stability is proven using Lyapunov theory, and the effectiveness of the proposed algorithm is validated through an experimental platform. This paper's contributions are as follows:
  • This is the first paper to use cooperative differential games for MUSs via ACD to guarantee both accuracy and optimality. The developed control method is verified on an actual platform.
  • The developed cooperative game method using ACD is evaluated experimentally in terms of tracking error and control torque.

Notation

$x$ is the global state of the MUS; $E$ is the position tracking error of the MUS; $U$ is the global optimal control; $W$ is the global coupled strategy; $Q$, $R$, and $P$ are given positive definite matrices; $Ult(\dot{E}, U, W)$ is the utility function; and $Val(\dot{E}, U, W)$ is the cost function.

2. Dynamic Model of MUS

According to the dynamic modeling method for the MUS based on joint torque feedback (JTF) [26], we obtain
$I_{moi}\gamma_{rai}\ddot{\theta}_i + f_{lui}(\theta_i,\dot{\theta}_i) + I_{dci}(\theta,\dot{\theta},\ddot{\theta}) + \dfrac{\tau_{coi}}{\gamma_{rai}} = \tau_i, \quad (1)$
where $I_{moi}$ is the moment of inertia of the motor rotor about the rotating shaft; $\gamma_{rai}$ is the reduction ratio of the motor; $\theta_i$, $\dot{\theta}_i$, and $\ddot{\theta}_i$ are the joint position, velocity, and acceleration vectors of the $i$th subsystem of the MUS; $f_{lui}(\theta_i,\dot{\theta}_i)$ is the concentrated joint friction torque; $I_{dci}(\theta,\dot{\theta},\ddot{\theta})$ is the coupling term between joint subsystems; $\tau_{coi}$ is the coupled joint torque measurement; and $\tau_i$ is the control torque.
The joint module of the MUS is shown in Figure 1.
Property 1.
The moment of inertia of the motor rotor $I_{moi}$ is positive and symmetric. This property will be used in the stability analysis.
(1)
Coupled joint torque measurement $\tau_{coi}$
The coupled joint torque $\tau_{coi}$ is measured and mainly consists of
$\tau_{coi} = \tau_{cofi} + \tau_{coci}, \quad (2)$
where $\tau_{cofi}$ is the joint torque measured in free space, and $\tau_{coci}$ is the external torque generated by continuous or instantaneous contact with the environment. Since the joint torque values in the constrained space $\tau_{coi}$ and in free space $\tau_{cofi}$ are readily measured, the external torque $\tau_{coci}$ can be obtained from Formula (2).
(2)
Concentrated joint friction torque $f_{lui}(\theta_i,\dot{\theta}_i)$
The concentrated joint friction torque $f_{lui}(\theta_i,\dot{\theta}_i)$ captures the friction acting on the DC motor and reducer of the joint module. It is composed of nonlinear functions of the joint position and velocity:
$f_{lui}(\theta_i,\dot{\theta}_i) = f_{bvi}\dot{\theta}_i + \big[f_{coi} + f_{sti}e^{-f_{\tau si}\dot{\theta}_i^2}\big]\mathrm{sgn}(\dot{\theta}_i) + f_{pdi}(\theta_i,\dot{\theta}_i), \quad (3)$
where $f_{bvi}$, $f_{coi}$, $f_{sti}$, and $f_{\tau si}$ denote the viscous, Coulomb, static, and Stribeck friction parameters of subsystem module $i$, respectively; $f_{pdi}(\theta_i,\dot{\theta}_i)$ is the position-dependent friction term; and $\mathrm{sgn}(\dot{\theta}_i)$ is the sign function.
According to linearization criteria, the friction torque is approximated by the following formula:
$\hat{f}_{lui}(\theta_i,\dot{\theta}_i) = \hat{f}_{bvi}\dot{\theta}_i + \big[\hat{f}_{coi} + \hat{f}_{sti}e^{-\hat{f}_{\tau si}\dot{\theta}_i^2}\big]\mathrm{sgn}(\dot{\theta}_i) + f_{pdi}(\theta_i,\dot{\theta}_i) + K(\dot{\theta}_i)\tilde{F}_{pui}, \quad (4)$
where $\hat{f}_{bvi}$, $\hat{f}_{coi}$, $\hat{f}_{sti}$, and $\hat{f}_{\tau si}$ are the estimated values of $f_{bvi}$, $f_{coi}$, $f_{sti}$, and $f_{\tau si}$; $\tilde{F}_{pui} = [f_{bvi}-\hat{f}_{bvi},\ f_{coi}-\hat{f}_{coi},\ f_{sti}-\hat{f}_{sti},\ f_{\tau si}-\hat{f}_{\tau si}]^T$ is the parameter uncertainty of the friction torque; and $K(\dot{\theta}_i) = [\dot{\theta}_i,\ \mathrm{sgn}(\dot{\theta}_i),\ e^{-\hat{f}_{\tau si}\dot{\theta}_i^2}\mathrm{sgn}(\dot{\theta}_i),\ -\hat{f}_{sti}\dot{\theta}_i^2 e^{-\hat{f}_{\tau si}\dot{\theta}_i^2}\mathrm{sgn}(\dot{\theta}_i)]$ is the corresponding regressor.
Remark 1.
The friction torque parameters $f_{bvi}$, $f_{coi}$, $f_{sti}$, $f_{\tau si}$ and their estimates $\hat{f}_{bvi}$, $\hat{f}_{coi}$, $\hat{f}_{sti}$, $\hat{f}_{\tau si}$ are uniformly bounded, and the upper bound of $\tilde{F}_{pui}$ can be defined as $\|\tilde{F}_{pui}\| \le \rho_{puik}$ $(k = 1, 2, 3, 4)$, where $\rho_{puik}$ is a known upper-bound constant. Therefore, the friction modeling error satisfies $\|K(\dot{\theta}_i)\tilde{F}_{pui}\| \le \|K(\dot{\theta}_i)\|\rho_{puik}$. In addition, the position-dependent friction term is bounded by $\|f_{pdi}(\theta_i,\dot{\theta}_i)\| \le \rho_{fpdi}$, where $\rho_{fpdi}$ is a known upper-bound constant.
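As an illustration of the friction model (3) and its linearization (4), the following sketch evaluates the Stribeck-type friction torque and the regressor $K(\dot{\theta}_i)$, and checks the first-order relation numerically. All parameter values are illustrative placeholders, not identified values from the paper's platform.

```python
import numpy as np

# Illustrative sketch of the concentrated friction model (3) and its
# linearization (4); parameters are assumed placeholder values.

def friction(theta_dot, f_bv, f_co, f_st, f_ts):
    """Viscous + Coulomb + Stribeck friction torque (position-dependent term omitted)."""
    s = np.sign(theta_dot)
    return f_bv * theta_dot + (f_co + f_st * np.exp(-f_ts * theta_dot ** 2)) * s

def regressor(theta_dot, f_st_hat, f_ts_hat):
    """Regressor K(theta_dot) multiplying the parameter-error vector F_tilde."""
    s = np.sign(theta_dot)
    e = np.exp(-f_ts_hat * theta_dot ** 2)
    return np.array([theta_dot, s, e * s, -f_st_hat * theta_dot ** 2 * e * s])

# First-order check: friction(p) ≈ friction(p_hat) + K(p_hat) @ (p - p_hat)
p = np.array([0.30, 0.10, 0.05, 2.0])               # "true" [f_bv, f_co, f_st, f_ts]
p_hat = p + 1e-4 * np.array([1.0, -1.0, 1.0, 1.0])  # estimates
w = 0.8                                             # joint velocity (rad/s)
lin = friction(w, *p_hat) + regressor(w, p_hat[2], p_hat[3]) @ (p - p_hat)
print(abs(friction(w, *p) - lin) < 1e-6)
```

The mismatch between the true friction and its linearization is second order in the parameter error, which is why the bound in Remark 1 involves only the regressor norm and a constant.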
(3)
Coupling term $I_{dci}(\theta,\dot{\theta},\ddot{\theta})$
The coupling term $I_{dci}(\theta,\dot{\theta},\ddot{\theta})$, related to the coupled dynamics of the robot's global vector, is
$I_{dci}(\theta,\dot{\theta},\ddot{\theta}) = I_{moi}\sum_{k=1}^{i-1} a_{moi}^T a_{lnk}\ddot{\theta}_k + I_{moi}\sum_{k=2}^{i-1}\sum_{m=1}^{k-1} a_{moi}^T(a_{lnm}\times a_{lnk})\dot{\theta}_m\dot{\theta}_k = I_{moi}\sum_{k=1}^{i-1}\Lambda_{ki}\ddot{\theta}_k + I_{moi}\sum_{k=2}^{i-1}\sum_{m=1}^{k-1}\Omega_{mki}\dot{\theta}_m\dot{\theta}_k = \sum_{k=1}^{i-1}\big[I_{moi}\hat{\Lambda}_{ki}\ \ I_{moi}\big]\big[\ddot{\theta}_k\ \ \tilde{\Lambda}_{ki}\ddot{\theta}_k\big]^T + \sum_{k=2}^{i-1}\sum_{m=1}^{k-1}\big[I_{moi}\hat{\Omega}_{mki}\ \ I_{moi}\big]\big[\dot{\theta}_m\dot{\theta}_k\ \ \tilde{\Omega}_{mki}\dot{\theta}_m\dot{\theta}_k\big]^T, \quad (5)$
where $a_{moi}$, $a_{lnm}$, and $a_{lnk}$ are unit vectors along the rotation axes of the $i$th motor and the $m$th and $k$th joints, respectively; $\Lambda_{ki} = a_{moi}^T a_{lnk}$; and $\Omega_{mki} = a_{moi}^T(a_{lnm}\times a_{lnk})$. In addition, $\hat{\Lambda}_{ki} = \Lambda_{ki} - \tilde{\Lambda}_{ki}$ and $\hat{\Omega}_{mki} = \Omega_{mki} - \tilde{\Omega}_{mki}$, where $\hat{\Lambda}_{ki}$ and $\hat{\Omega}_{mki}$ are the estimated values of $\Lambda_{ki}$ and $\Omega_{mki}$, and $\tilde{\Lambda}_{ki}$ and $\tilde{\Omega}_{mki}$ are calibration errors.
Remark 2.
For the coupling term $I_{dci}(\theta,\dot{\theta},\ddot{\theta})$, the products of the unit vectors $a_{moi}$, $a_{lnm}$, and $a_{lnk}$ are bounded and satisfy $|\Lambda_{ki}| = |a_{moi}^T a_{lnk}| < 1$ and $|\Omega_{mki}| = |a_{moi}^T(a_{lnm}\times a_{lnk})| < 1$. When the $k$th and $m$th $(1 < k, m < i-1)$ joints near the base joint are stable, the coupling term admits the upper bound $\|I_{dci}(\theta,\dot{\theta},\ddot{\theta})\| < \rho_{dci}$, where $\rho_{dci}$ is a positive constant. Therefore, by stabilizing the current joint, the MUS can be stabilized step by step, making the system globally stable.
Define $x_i = [x_{i1}, x_{i2}]^T = [\theta_i, \dot{\theta}_i]^T$ and $u_i = \tau_i \in \mathbb{R}$. Then, (1) can be converted into the state-space equation
$\dot{x}_{i1} = x_{i2},\quad \dot{x}_{i2} = f_i(x_{i1}, x_{i2}) + g_i(x_{i1})u_i + \psi_i(x),\quad y_i = x_{i1}, \quad (6)$
where $f_i(x_{i1}, x_{i2})$, $g_i(x_{i1})$, and $\psi_i(x)$ are the measurable dynamics, control input gain, and uncertainty of the MUS, respectively, with
$g_i(x_{i1}) = \dfrac{1}{I_{moi}\gamma_{rai}},\quad f_i(x_{i1}, x_{i2}) = -g_i(x_{i1})\Big[\hat{f}_{bvi}\dot{\theta}_i + \big(\hat{f}_{coi} + \hat{f}_{sti}e^{-\hat{f}_{\tau si}\dot{\theta}_i^2}\big)\mathrm{sgn}(\dot{\theta}_i) + f_{pdi}(\theta_i,\dot{\theta}_i) + K(\dot{\theta}_i)\tilde{F}_{pui} + \dfrac{\tau_{coi}}{\gamma_{rai}}\Big],\quad \psi_i(x) = -g_i(x_{i1})I_{dci}(\theta,\dot{\theta},\ddot{\theta}). \quad (7)$
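To make the state-space form (6) concrete, the following minimal sketch simulates one joint module under a small constant torque with explicit Euler integration. The inertia, gear ratio, friction coefficient, torque, and step size are assumed placeholder values, and the uncertainty $\psi_i$ is set to zero.

```python
import numpy as np

# Minimal simulation of one MUS joint module in the state-space form (6):
#   x1' = x2,  x2' = f_i(x) + g_i * u + psi_i(x).
# All numerical constants below are assumed placeholder values.

I_mo, gamma_ra = 2.5e-4, 120.0        # rotor inertia, reduction ratio (assumed)
g_i = 1.0 / (I_mo * gamma_ra)         # control input gain g_i(x1)

def f_i(x1, x2):
    # viscous-friction-only stand-in for the measurable dynamics
    return -g_i * (0.01 * x2)

def step(x, u, psi=0.0, dt=1e-3):
    """One explicit-Euler step of the joint subsystem."""
    x1, x2 = x
    return np.array([x1 + dt * x2, x2 + dt * (f_i(x1, x2) + g_i * u + psi)])

x = np.array([0.0, 0.0])
for _ in range(1000):                 # 1 s under a small constant torque
    x = step(x, u=0.01)
print(x[0] > 0.0 and x[1] > 0.0)
```

A positive torque drives both the joint velocity and position forward, as expected for the double-integrator structure of (6).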

3. Approximate Optimal-Control-Based Cooperative Differential Game via ACD

3.1. Problem Description

In this section, an approximate optimal control method based on a cooperative differential game via ACD is proposed. For convenience of controller design, the following global state-space equation is considered:
$\dot{x}_1 = x_2,\quad \dot{x}_2 = f(x) + \sum_{m=1}^{n} G_m u_m + W(x),\quad y = x_1, \quad (8)$
where $x = [x_1^T, x_2^T]^T \in \mathbb{R}^{2n}$ is the global state of the MUS, and the vectors $x_1$ and $x_2$ are given by
$x_m = [x_{1m}, \ldots, x_{im}, \ldots, x_{nm}]^T \in \mathbb{R}^{n},\quad m = 1, 2. \quad (9)$
In addition, the following quantities are defined:
$f(x) = [f_1(x), \ldots, f_i(x), \ldots, f_n(x)]^T,\quad G_m = [0, \ldots, 0, g_m, 0, \ldots, 0]^T,\quad W(x) = [\psi_1(x), \ldots, \psi_i(x), \ldots, \psi_n(x)]^T,$
where $g_m = (I_{mom}\gamma_{ram})^{-1}$ for $m = 1, 2, \ldots, n$; each entry $g_m$ of $G_m$ is positive by Property 1.
Define the following cost function:
$Val(\dot{E}, U, W) = \int_t^{\infty}\big(\dot{E}^T Q \dot{E} + U^T R U + W^T P W\big)\,d\tau = \int_t^{\infty} Ult(\dot{E}, U, W)\,d\tau, \quad (10)$
where $E$ is the position tracking error of the MUS; the velocity error is defined as $\dot{E} = x_2 - \dot{x}_{1d}$, with $x_{1d}$ the desired trajectory; $Q$, $R = \mathrm{diag}[R_1, R_2, \ldots, R_n]$, and $P$ are given positive definite matrices; and $Ult(\dot{E}, U, W)$ is the utility function.
The Hamiltonian function is
$Ham(\dot{E}, U, W) = Ult(\dot{E}, U, W) + \nabla Val^T\big[f(x) + GU + W - \ddot{x}_{1d}\big], \quad (11)$
where $\nabla Val(\dot{E}) = \partial Val(\dot{E})/\partial \dot{E}$.
Define the optimal cost function:
$Val^{*}(\dot{E}) = \min_{U, W}\int_t^{\infty} Ult(\dot{E}, U, W)\,d\tau. \quad (12)$
According to the optimality conditions $\partial Ham/\partial U = 0$ and $\partial Ham/\partial W = 0$, the approximate optimal control strategies are obtained as follows:
$U^{*} = -\tfrac{1}{2} R^{-1} G^T \nabla Val^{*}, \quad (13)$
$W^{*} = -\tfrac{1}{2} P^{-1} \nabla Val^{*}. \quad (14)$
Considering the cooperative differential game represented by Formula (10), each player needs to minimize the common coupled cost function. Defining the control input matrix $G = [G_1, G_2, \ldots, G_n]$, the cooperative game problem is transformed into solving the approximate optimal control associated with the cost function (10).
Then, substituting Equations (10), (13), and (14) into the Hamiltonian function (11), the coupled HJB equation is obtained as follows:
$0 = \nabla Val^{*T}\big[f(x) - \ddot{x}_{1d}\big] + \dot{E}^T Q \dot{E} - \tfrac{1}{4}(\nabla Val^{*})^T G R^{-1} G^T (\nabla Val^{*}) - \tfrac{1}{4}(\nabla Val^{*})^T P^{-1} (\nabla Val^{*}). \quad (15)$
The optimal cost function $Val^{*}$ can in principle be deduced from Formula (15), yielding the corresponding Pareto equilibrium solution. However, since Formula (15) is a nonlinear PDE, it is difficult to obtain the Pareto equilibrium solution analytically. Therefore, in the next section, a critic NN is used to obtain an approximate optimal control strategy for the MUS.
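Before turning to the neural approximation, the structure of (13)-(15) can be sanity-checked on a scalar linear special case, $\dot{e} = ae + gu + w$ with quadratic cost, where a quadratic value $Val^{*} = se^2$ solves the HJB in closed form. The numbers below are illustrative; this is a verification sketch, not the paper's algorithm.

```python
import math

# Scalar sanity check of the coupled HJB (15): for de/dt = a*e + g*u + w and
# cost ∫ (q e² + r u² + p w²) dτ, Val* = s e² with s the positive root of
# q + 2 a s - s² c = 0, c = g²/r + 1/p. All numbers are illustrative.

a, g, q, r, p = -0.5, 2.0, 1.0, 0.97, 0.94
c = g * g / r + 1.0 / p
s = (a + math.sqrt(a * a + q * c)) / c

def grad_val(e):       # ∇Val* = 2 s e
    return 2.0 * s * e

def u_star(e):         # U* = -(1/2) R⁻¹ Gᵀ ∇Val*, Eq. (13)
    return -0.5 * (1.0 / r) * g * grad_val(e)

def w_star(e):         # W* = -(1/2) P⁻¹ ∇Val*, Eq. (14)
    return -0.5 * (1.0 / p) * grad_val(e)

e = 0.3
hjb = (q * e * e + grad_val(e) * a * e
       - 0.25 * grad_val(e) ** 2 * g * g / r
       - 0.25 * grad_val(e) ** 2 / p)
print(abs(hjb) < 1e-12)                          # HJB residual vanishes
print(a + (g * u_star(e) + w_star(e)) / e < 0)   # closed loop is stable
```

The closed-loop rate is $a - sc = -\sqrt{a^2 + qc} < 0$, so both strategies jointly stabilize the error dynamics while satisfying the HJB exactly.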

3.2. An Approximate Solution of Decentralized Approximate Optimal Control in a Cooperative Differential Game Based on a Critic Network

Dynamic compensation is important in MUS control; a controller based on dynamic model compensation, local desired information, and the approximate optimal control law is designed as follows:
$u_i^{*} = u_{i1} + u_{i2} + u_{i3}^{*}, \quad (16)$
where $u_{i1}$ compensates the measurable dynamics $f_i(x_{i1}, x_{i2})$, $u_{i2}$ handles the coupling terms of the MUS using locally desired control information, and $u_{i3}^{*}$ is the optimal compensation for the uncertainty.
According to Formula (6), the control law $u_{i1}$, which compensates the accurately modeled and measurable part of the subsystem, is designed as
$u_{i1} = \hat{f}_{bvi} x_{i2} + \big(\hat{f}_{coi} + \hat{f}_{sti} e^{-\hat{f}_{\tau si} x_{i2}^2}\big)\mathrm{sgn}(x_{i2}) + g_i^{-1}\ddot{x}_{i1d} + \dfrac{\tau_{coi}}{\gamma_{rai}}. \quad (17)$
The critic neural network representation of the cost function is
$Val(\dot{E}) = W_{cr}^T \delta_{cr}(\dot{E}) + \varepsilon_{cr}, \quad (18)$
where $W_{cr}$ is the ideal weight vector of the critic neural network, $\delta_{cr}(\dot{E})$ is the activation function, and $\varepsilon_{cr}$ is the finite approximation error.
Taking the partial derivative of (18) yields
$\nabla Val(\dot{E}) = \nabla\delta_{cr}(\dot{E})^T W_{cr} + \nabla\varepsilon_{cr}, \quad (19)$
where $\nabla\delta_{cr}(\dot{E}) = \partial\delta_{cr}(\dot{E})/\partial\dot{E}$ and $\nabla\varepsilon_{cr}$ are the partial derivatives of the activation function and of the approximation error, respectively.
By substituting Formula (19) into (13) and (14), we get
$U^{*} = -\tfrac{1}{2} R^{-1} G^T\big[\nabla\delta_{cr}(\dot{E})^T W_{cr} + \nabla\varepsilon_{cr}\big], \quad (20)$
$W^{*} = -\tfrac{1}{2} P^{-1}\big[\nabla\delta_{cr}(\dot{E})^T W_{cr} + \nabla\varepsilon_{cr}\big]. \quad (21)$
According to Formulas (20) and (21), each element of the expanded matrix can be obtained as
$u_i^{*} = -\tfrac{1}{2} R_i^{-1} G_i^T\big[\nabla\delta_{cr}(\dot{E})^T W_{cr} + \nabla\varepsilon_{cr}\big], \quad (22)$
$\psi_i^{*} = -\tfrac{1}{2} P_i^{-1}\big[\nabla\delta_{cr}(\dot{E})^T W_{cr} + \nabla\varepsilon_{cr}\big]. \quad (23)$
Substituting Formulas (19), (20), and (21) into (11), we get
$Ham(\dot{E}, U, W) = \dot{E}^T Q \dot{E} + U^T R U + W^T P W + \nabla Val^T\big[f(x) + GU + W - \ddot{x}_{1d}\big] - e_{Jh} = 0, \quad (24)$
where $e_{Jh}$ is the approximation residual of the neural network.
To implement a decentralized control mechanism, the assumption of a nominal limitation on the coupling terms is removed, so that the state of a coupled subsystem can be represented by the desired states of the other subsystems:
$f_i(x) = f_i(x_i, x_{md}) + \Delta f_i(x, x_{md}),\qquad u_{i2} = G_m^{-1}\big(\dot{x}_{m2d} - f_m(x_d)\big),\quad m \ne i. \quad (25)$
Here, $x_{md}$ represents the desired state of the coupled subsystem $m = 1, \ldots, i-1, i+1, \ldots, n$, and $\Delta f_i(x, x_{md})$ is the substitution error. Since the coupling term satisfies the local Lipschitz condition, we have
$\|\Delta f_i(x, x_{md})\| \le \sum_{m=1, m\ne i}^{n} d_{im}\|E_m\|, \quad (26)$
where $E_m = x_m - x_{md}$ and $d_{im} \ge 0$.
The ideal critic neural network weight vector $W_{cr}$ is unknown, but the cost function can be approximated as
$\hat{Val}(\dot{E}) = \hat{W}_{cr}^T \delta_{cr}(\dot{E}), \quad (27)$
where $\hat{W}_{cr}$ and $\hat{Val}(\dot{E})$ are the estimated values of $W_{cr}$ and $Val(\dot{E})$.
Taking the partial derivative of (27) yields
$\nabla\hat{Val}(\dot{E}) = \nabla\delta_{cr}(\dot{E})^T \hat{W}_{cr}. \quad (28)$
Combining Formulas (22), (23), and (28), the approximate decentralized optimal control laws are as follows:
$\hat{u}_{i3}^{*} = -\tfrac{1}{2} R_i^{-1} G_i^T \nabla\delta_{cr}(\dot{E})^T \hat{W}_{cr}, \quad (29)$
$\hat{\psi}_i^{*} = -\tfrac{1}{2} P_i^{-1} \nabla\delta_{cr}(\dot{E})^T \hat{W}_{cr}. \quad (30)$
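The critic approximation only requires the activation vector and its gradient. The sketch below implements a Gaussian critic for a scalar velocity error and forms the compensation term of (29); the centers, width, weights, and gains are illustrative assumptions, and the analytic gradient is verified against finite differences.

```python
import numpy as np

# Gaussian critic for a scalar velocity error: Val ≈ Ŵᵀ δ(Ė) as in (27)-(28),
# and the compensation û₃* of (29). Centers, width, weights, and gains are
# illustrative assumptions.

centers = np.array([-0.5, 0.0, 0.5])         # three hidden neurons
width = 1.28

def delta(e_dot):
    """Activation vector δ_cr(Ė)."""
    d = e_dot - centers
    return np.exp(-d * d / width)

def grad_delta(e_dot):
    """∇δ_cr(Ė) = ∂δ/∂Ė for the scalar input."""
    d = e_dot - centers
    return (-2.0 * d / width) * delta(e_dot)

W_hat = np.array([0.4, 0.1, 0.3])            # current critic weight estimate
R_inv, G = 1.0 / 0.97, 2.0                   # scalar stand-ins for R⁻¹, G

def u3_hat(e_dot):
    """û₃* = -(1/2) R⁻¹ Gᵀ ∇δᵀ Ŵ, Eq. (29)."""
    return -0.5 * R_inv * G * (grad_delta(e_dot) @ W_hat)

# Verify the analytic gradient against central finite differences
e, h = 0.2, 1e-6
num = (delta(e + h) - delta(e - h)) / (2.0 * h)
print(np.allclose(grad_delta(e), num, atol=1e-6))
```

Because the control law consumes only $\nabla\delta_{cr}$ and $\hat{W}_{cr}$, a correct activation gradient is the essential ingredient of the implementation.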
Define $F(x) = [f_1(x_1, x_{md}), f_2(x_2, x_{md}), \ldots, f_n(x_n, x_{md})]^T$. According to Formula (24), the approximate Hamiltonian function can be obtained:
$\hat{Ham}(\dot{E}, \hat{U}, \hat{W}) = \dot{E}^T Q \dot{E} + \hat{U}^T R \hat{U} + \hat{W}^T P \hat{W} + \nabla\hat{Val}^T\big[F(x) + G\hat{U} + \hat{W} - \ddot{x}_{1d}\big] = e_J. \quad (31)$
The approximation error of the Hamiltonian function is
$e_J = \hat{Ham} - Ham, \quad (32)$
where $e_J = \hat{Ham}$ follows from Formulas (24) and (31), since the exact Hamiltonian (24) is zero.
The weight approximation error is defined as
$\tilde{W}_{cr} = W_{cr} - \hat{W}_{cr}. \quad (33)$
According to (24), (31), and (32), we can deduce
$e_J = e_{Jh} - \tilde{W}_{cr}^T \nabla\delta_{cr}(\dot{E})\ddot{E}. \quad (34)$
Define the following objective function:
$E_J = \tfrac{1}{2} e_J^T e_J. \quad (35)$
The weight update law for the critic NN is obtained by gradient descent as follows:
$\dot{\hat{W}}_{cr} = -\alpha_{le}\, e_J\, \nabla\delta_{cr}(\dot{E})\ddot{E}, \quad (36)$
where $\alpha_{le}$ is the critic neural network learning rate. Define $\upsilon_{au} = \nabla\delta_{cr}(\dot{E})\ddot{E}$, and assume that a positive constant $\upsilon_{Lb}$ satisfies $\|\upsilon_{au}\| \le \upsilon_{Lb}$. The dynamic equation of the weight approximation error can be deduced as follows:
$\dot{\tilde{W}}_{cr} = -\dot{\hat{W}}_{cr} = \alpha_{le}\, e_J\, \upsilon_{au} = \alpha_{le}\big(e_{Jh} - \tilde{W}_{cr}^T \upsilon_{au}\big)\upsilon_{au}. \quad (37)$
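Under the weight-error dynamics above, with zero residual $e_{Jh}$ and a persistently exciting $\upsilon_{au}$, the weight error should decay. The following sketch simulates this with Euler integration; the regressor signal, step size, and initial error are illustrative, while the learning rate 0.82 matches the value used later in the experiment section.

```python
import numpy as np

# Euler simulation of the weight-error dynamics: with e_Jh = 0 and a
# persistently exciting υ_au, W̃' = -α (W̃ᵀ υ) υ drives the weight error
# toward zero. Signal, step size, and initial error are illustrative.

alpha = 0.82                          # learning rate from the experiment section
W_tilde = np.array([1.0, -0.5])       # initial weight estimation error (assumed)
dt = 1e-3

for k in range(20000):                # 20 s of simulated adaptation
    t = k * dt
    v = np.array([np.sin(t), np.cos(t)])          # exciting regressor υ_au
    W_tilde = W_tilde + dt * (-alpha * (W_tilde @ v) * v)

print(np.linalg.norm(W_tilde) < 0.05)
```

The sinusoidal regressor satisfies a persistence-of-excitation condition (its averaged outer product is positive definite), which is what makes the error norm contract in every direction.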
Finally, according to Formulas (17), (25), and (29), the decentralized approximate optimal control law based on the cooperative differential game via ACD is obtained as follows:
$\hat{u}_i^{*} = u_{i1} + u_{i2} + \hat{u}_{i3}^{*} = \hat{f}_{bvi} x_{i2} + \big(\hat{f}_{coi} + \hat{f}_{sti} e^{-\hat{f}_{\tau si} x_{i2}^2}\big)\mathrm{sgn}(x_{i2}) + g_i^{-1}\ddot{x}_{i1d} + \dfrac{\tau_{coi}}{\gamma_{rai}} + G_m^{-1}\big(\dot{x}_{m2d} - f_m(x_d)\big) - \tfrac{1}{2} R_i^{-1} G_i^T \nabla\delta_{cr}(\dot{E})^T \hat{W}_{cr}. \quad (38)$

3.3. Fulfillment of Policy Iteration

To overcome the difficulty of solving the HJB equation, a policy iteration method is adopted.
Step 1: Let $k = 0$, begin with an admissible control $u_i^{(0)}$, and choose a small positive number $\varepsilon_i$.
Step 2: Based on the admissible control $u_i^{(k)}$, solve for $Val^{(k+1)}(\dot{E})$ from $\dot{E}^T Q \dot{E} + U^{(k)T} R U^{(k)} + W^{(k)T} P W^{(k)} + \nabla Val^{(k+1)T}\big[f(x) + G U^{(k)} + W^{(k)} - \ddot{x}_{1d}\big] = 0$, with $Val^{(k+1)}(0) = 0$.
Step 3: Update the control policy via $u_i^{(k+1)} = -\tfrac{1}{2} R_i^{-1} G_i^T(x_i)\nabla Val^{(k+1)}$.
Step 4: If $\|Val^{(k+1)} - Val^{(k)}\| \le \varepsilon_i$, stop; otherwise, set $k = k + 1$ and return to Step 2.
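Steps 1-4 can be traced on a scalar linear version of the cooperative game, where policy evaluation reduces to one linear equation per iteration and the iterates converge to the closed-form HJB solution. All coefficients are illustrative; this sketches the iteration itself, not the MUS implementation.

```python
import math

# Policy iteration (Steps 1-4) on a scalar linear cooperative game:
# de/dt = a e + g u + w, cost ∫ (q e² + r u² + p w²) dτ, with the value
# parameterized as Val⁽ᵏ⁾ = s_k e². All coefficients are illustrative.

a, g, q, r, p = -0.5, 2.0, 1.0, 0.97, 0.94
Ku, Kw = 1.0, 0.0        # admissible initial policies u = -Ku e, w = -Kw e
eps = 1e-12

s, s_prev = 0.0, 0.0
for k in range(100):
    # Step 2 (policy evaluation): 0 = q + r Ku² + p Kw² + 2 s (a - g Ku - Kw)
    s = (q + r * Ku ** 2 + p * Kw ** 2) / (2.0 * (g * Ku + Kw - a))
    # Step 3 (policy improvement): u = -(1/2) r⁻¹ g ∇Val → Ku = g s / r, etc.
    Ku, Kw = g * s / r, s / p
    # Step 4 (stopping test)
    if abs(s - s_prev) <= eps:
        break
    s_prev = s

# Closed-form positive root of the scalar HJB for comparison
c = g * g / r + 1.0 / p
s_hjb = (a + math.sqrt(a * a + q * c)) / c
print(abs(s - s_hjb) < 1e-9)
```

Each evaluation step is linear in $s$ because the policies are frozen, which is exactly what makes policy iteration tractable where the full HJB (15) is not.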
Theorem 1.
Given the initial control policy $U^{(0)}$ and the above policy iteration, the iterated cost function and control policy converge to the optimal ones as $k \to \infty$, i.e., $Val^{(k)} \to Val^{*}$ and $U^{(k)} \to U^{*}$.
The iteration produces $U^{(k)}$ for any $k \ge 0$ from the initial policy $U^{(0)}$. Then, for any $\omega_i > 0$, there exists an integer $k_{0i}$ such that, for all $k \ge k_{0i}$,
$\sup\|Val^{(k)} - Val^{*}\| < \omega_i, \qquad \sup\|U^{(k)} - U^{*}\| < \omega_i.$
Hence, the algorithm converges to the optimal cost function and optimal control.
Theorem 2.
Considering the dynamic model (1) of the MUS subsystem with the state-space form (6), under the proposed decentralized approximate optimal control law (38) based on the cooperative differential game, the tracking error of the MUS is guaranteed to be uniformly ultimately bounded (UUB).
Choose $V_{cl}(t) = Val(\dot{E})$ as a Lyapunov candidate function, whose derivative is
$\dot{V}_{cl}(t) = \nabla Val^T\big[F(x) + GU + W - \ddot{x}_{1d}\big]. \quad (39)$
Considering the HJB Equation (15), it can be obtained that
$\nabla Val^{*T}\big[F(x) - \ddot{x}_{1d}\big] = -\dot{E}^T Q \dot{E} + \tfrac{1}{4}(\nabla Val^{*})^T G R^{-1} G^T (\nabla Val^{*}) + \tfrac{1}{4}(\nabla Val^{*})^T P^{-1} (\nabla Val^{*}). \quad (40)$
By substituting Formula (40) into (39), we get
$\dot{V}_{cl}(t) = -\dot{E}^T Q \dot{E} + \tfrac{1}{4}(\nabla Val^{*})^T G R^{-1} G^T (\nabla Val^{*}) + \tfrac{1}{4}(\nabla Val^{*})^T P^{-1} (\nabla Val^{*}) + \nabla Val^T\big[GU + W\big]. \quad (41)$
Writing $U = U^{*} - (U^{*} - \hat{U})$ and $W = W^{*} - (W^{*} - \hat{W})$ in Formula (41), it can be deduced that
$\dot{V}_{cl}(t) = -\dot{E}^T Q \dot{E} - \nabla Val^T\big[G(U^{*} - \hat{U}) + (W^{*} - \hat{W})\big] + \tfrac{1}{4}(\nabla Val^{*})^T G R^{-1} G^T (\nabla Val^{*}) + \tfrac{1}{4}(\nabla Val^{*})^T P^{-1} (\nabla Val^{*}). \quad (42)$
Substituting Formulas (20), (21), (29), and (30) into (42) yields
$\dot{V}_{cl}(t) = -\dot{E}^T Q \dot{E} + \tfrac{1}{4}(\nabla Val^{*})^T\big[G R^{-1} G^T + P^{-1}\big](\nabla Val^{*}) + \tfrac{1}{2}\big[\nabla\delta_{cr}(\dot{E})^T W_{cr} + \nabla\varepsilon_{cr}\big]^T\big[G R^{-1} G^T\nabla\delta_{cr}(\dot{E})^T W_{cr} + G^T\nabla\varepsilon_{cr} + P^{-1}\big(\nabla\delta_{cr}(\dot{E})^T W_{cr} + \nabla\varepsilon_{cr}\big)\big] = -\dot{E}^T Q \dot{E} + \Pi_J, \quad (43)$
where $\Pi_J$ has the upper bound
$\Pi_J \le \Big\|\tfrac{1}{4}(\nabla Val^{*})^T\big[G R^{-1} G^T + P^{-1}\big](\nabla Val^{*}) + \tfrac{1}{2}\big[\nabla\delta_{cr}(\dot{E})^T W_{cr} + \nabla\varepsilon_{cr}\big]^T\big[G R^{-1} G^T\nabla\delta_{cr}(\dot{E})^T W_{cr} + G^T\nabla\varepsilon_{cr} + P^{-1}\big(\nabla\delta_{cr}(\dot{E})^T W_{cr} + \nabla\varepsilon_{cr}\big)\big]\Big\| \le \pi_J, \quad (44)$
where $\pi_J$ is a computable positive constant.
According to Formula (44), $\dot{V}_{cl}(t)$ has the following upper bound:
$\dot{V}_{cl}(t) \le -\dot{E}^T Q \dot{E} + \pi_J \le -\lambda_{\min}(Q)\|\dot{E}\|^2 + \pi_J, \quad (45)$
where $\lambda_{\min}(Q)$ is the minimum eigenvalue of $Q$.
If $\dot{E}$ lies outside the compact set
$\Omega_{cl} = \Big\{\dot{E} : \|\dot{E}\| \le \sqrt{\pi_J/\lambda_{\min}(Q)}\Big\}, \quad (46)$
then the right-hand side of Formula (45) is negative; that is, whenever $\dot{E}$ violates (46), $\dot{V}_{cl}(t) < 0$. Therefore, the trajectory tracking error is uniformly ultimately bounded under the decentralized approximate optimal control law (38) based on the cooperative differential game via ACD. Proof complete.

4. Experiment

4.1. Experimental Setup

The proposed approximate optimal control law based on the cooperative differential game was verified on a 2-DOF MUS experimental platform, shown in Figure 2. The joint control torque is measured by a torque sensor, and the joint position is obtained by combining an absolute encoder and an incremental encoder. Because the control system is built in the Simulink environment, the sampling interval is set in software. Several experiments with related algorithms were conducted; to limit the length of the article, two representative control methods are compared in this section, namely the non-zero-sum game-based optimal control [7] and the proposed cooperative differential game-based adaptive critic design method. The critic neural network uses a 2-3-1 structure with a single hidden layer. The activation function is $\delta_{cr}(\dot{E}) = e^{-(\dot{E}-\varsigma)^T(\dot{E}-\varsigma)/l}$, with center $\varsigma = [0.5, 0.5]$ and width $l = [1.28, 1.28]$. The learning rate of the weight update is 0.82. $Q = I$, $R = 0.97I$, and $P = 0.94I$, where $I$ is the identity matrix.
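For reference, the stated activation with the listed center and width can be evaluated directly. Here $l$ is interpreted as a shared scalar width of 1.28 (the source lists it per dimension), so the exact parameterization is an assumption.

```python
import numpy as np

# Evaluating the stated critic activation δ_cr(Ė) = exp(-(Ė-ς)ᵀ(Ė-ς)/l) with
# ς = [0.5, 0.5]; l is assumed to be a shared scalar width of 1.28.

sigma = np.array([0.5, 0.5])
l = 1.28

def delta_cr(e_dot):
    d = e_dot - sigma
    return np.exp(-(d @ d) / l)

print(np.isclose(delta_cr(np.array([0.5, 0.5])), 1.0))  # peak value at the center
print(0.0 < delta_cr(np.zeros(2)) < 1.0)                # decays away from the center
```

The activation peaks at the center $\varsigma$ and decays smoothly away from it, which is the behavior the 2-3-1 critic relies on.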

4.2. Experimental Results

(1)
Position tracking performance
Figures 3, 4, and 5 show the position tracking performance under the non-zero-sum game-based optimal control method [7,30] and the proposed cooperative differential game-based approximate optimal control strategy. The desired values are those the MUS is expected to achieve, and the actual values are those produced by the MUS under each control algorithm; a small difference between the two indicates good performance. The figures show that both the traditional control method and the proposed control strategy ensure good position tracking. Figures 4 and 5 show the position tracking error curves of the traditional and proposed control methods; both keep the position tracking error within 0.002 rad. The actual position deviates from the expected position momentarily when the desired curve changes direction, but owing to the good robustness of the proposed control system, the actual trajectory coincides with the expected trajectory again within a short time. The proposed method reduces the tracking error by nearly 30% in both joint one and joint two. A proper learning rate also contributes to this performance; based on different experiments, a learning rate of 0.82 gave the best results.
(2)
Control torque
Figures 6 and 7 show the control torque curves under the existing and proposed control methods. Control torque chattering degrades both the position tracking performance and the durability of the DC motor. Under the existing non-zero-sum game method, each joint of the MUS has its own cost function, so the method cannot optimize the overall performance of the MUS. The developed cooperative differential game method is the only one with a single overall cost function, and it can minimize the total performance index. Therefore, in Figure 7, the control torque is reasonably optimized owing to the cooperative game strategy based on ACD. The root mean squares of the tracking error and control torque are shown in Table 1. The control torques required to complete the same task should be approximately the same; however, the cooperative differential game-based ACD method minimizes the control torque to the greatest extent possible. According to Table 1, the control torque of the proposed method is reduced by nearly 15%.
(3)
Critic NN weight
Figure 8 shows the weight curves of the critic neural network under the proposed control method. Owing to the training of the neural network and the policy iteration of the critic, the weights converge. However, because of the nature of RBFNNs, the curves converge not to a single specific value but to within a small range.

5. Conclusions

This paper proposed an MUS approximate optimal control method based on cooperative differential game theory to solve the trajectory tracking problem. First, the dynamic model of the MUS was established. Then, each module of the MUS was treated as a player in the cooperative differential game, and the trajectory tracking problem of the MUS was transformed into an approximate optimal control problem through ACD based on the cooperative differential game. Using the critic network to approximate the joint performance index function of the system, an approximate optimal control law was obtained through a policy iteration algorithm. The stability was proven according to Lyapunov theory. Finally, the effectiveness was verified through an experimental platform.

Author Contributions

Conceptualization, L.S.; Methodology, L.S.; Software, L.S.; Validation, Y.L.; Formal analysis, Y.L.; Resources, L.Z. and Y.Q.; Data curation, L.Z.; Writing—original draft, L.S.; Writing—review & editing, Y.L.; Visualization, Y.L.; Supervision, Y.Q.; Project administration, Y.Q.; Funding acquisition, Y.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions.

Conflicts of Interest

Liang Si, Yebao Liu, Luyang Zhong and Yuhan Qian were employed by the Aerospace Times Feihong Technology Company Limited.

References

  1. Liu, Y.-J.; Gao, B.; Yu, D.; Li, D.; Liu, L. Neuro-Adaptive Fault-Tolerant Attitude Control of a Quadrotor UAV with Flight Envelope Limitation and Feedforward Compensation. IEEE Trans. Syst. Man Cybern. Syst. 2025, 55, 3143–3151. [Google Scholar] [CrossRef]
  2. Xue, S.; Zhao, N.; Zhang, W.; Luo, B.; Liu, D. A Hybrid Adaptive Dynamic Programming for Optimal Tracking Control of USVs. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 9961–9969. [Google Scholar] [CrossRef]
  3. Huang, Y.; Xu, X.; Meng, Z.; Sun, J. A Smooth Distributed Formation Control Method for Quadrotor UAVs under Event-Triggering Mechanism and Switching Topologies. IEEE Trans. Veh. Technol. 2025, 74, 10081–10091. [Google Scholar] [CrossRef]
  4. Xue, S.; Zhang, W.; Luo, B.; Liu, D. Integral Reinforcement Learning-Based Dynamic Event-Triggered Nonzero-Sum Games of USVs. IEEE Trans. Cybern. 2025, 55, 1706–1716. [Google Scholar] [CrossRef]
  5. Luo, D.; Wang, Y.; Li, Z.; Song, Y.; Lewis, F.L. Asymptotic Leader-Following Consensus of Heterogeneous Multi-Agent Systems with Unknown and Time-Varying Control Gains. IEEE Trans. Autom. Sci. Eng. 2025, 22, 2768–2779. [Google Scholar] [CrossRef]
  6. Huang, Y.; Kuai, J.; Cui, S.; Meng, Z.; Sun, J. Distributed Algorithms via Saddle-Point Dynamics for Multi-Robot Task Assignment. IEEE Robot. Autom. Lett. 2024, 9, 11178–11185. [Google Scholar] [CrossRef]
  7. Liu, Y.; An, T.; Chen, J.; Zhong, L.; Qian, Y. Event-trigger Reinforcement Learning-based Coordinate Control of Modular Unmanned System via Nonzero-sum Game. Sensors 2025, 25, 314.
  8. Ren, J.; Wang, D.; Li, M.; Qiao, J. Discounted Stable Adaptive Critic Design for Zero-Sum Games with Application Verifications. IEEE Trans. Autom. Sci. Eng. 2025, 22, 11706–11716.
  9. Wang, D.; Hu, L.; Li, X.; Qiao, J. Online Fault-Tolerant Tracking Control with Adaptive Critic for Nonaffine Nonlinear Systems. IEEE/CAA J. Autom. Sin. 2025, 12, 215–227.
  10. Zhang, Y.; Li, J.-Y. Reinforcement Learning-Based Distributed Robust Bipartite Consensus Control for Multispacecraft Systems with Dynamic Uncertainties. IEEE Trans. Ind. Inform. 2024, 20, 13341–13351.
  11. Zhao, B.; Zhang, S.; Liu, D. Self-Triggered Approximate Optimal Neuro-Control for Nonlinear Systems Through Adaptive Dynamic Programming. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 4713–4723.
  12. Lin, M.; Zhao, B.; Liu, D. Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method with Convergence Analysis. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 5574–5585.
  13. Liu, N.; Zhang, K.; Xie, X.; Yue, D. UKF-Based Optimal Tracking Control for Uncertain Dynamic Systems with Asymmetric Input Constraints. IEEE Trans. Cybern. 2024, 54, 7224–7235.
  14. Wang, K.; Mu, C.; Ni, Z.; Liu, D. Safe Reinforcement Learning and Adaptive Optimal Control with Applications to Obstacle Avoidance Problem. IEEE Trans. Autom. Sci. Eng. 2024, 21, 4599–4612.
  15. Zhao, J.; Wang, Z.; Lv, Y.; Na, J.; Liu, C.; Zhao, Z. Data-Driven Learning for H∞ Control of Adaptive Cruise Control Systems. IEEE Trans. Veh. Technol. 2024, 73, 18348–18362.
  16. Ding, C.; Zhang, Z.; Miao, Z.; Wang, Y. Event-Based Finite-Time Formation Tracking Control for UAV with Bearing Measurements. IEEE Trans. Ind. Electron. 2025, 72, 7482–7492.
  17. Wang, K.; Mu, C. Learning-Based Control with Decentralized Dynamic Event-Triggering for Vehicle Systems. IEEE Trans. Ind. Inform. 2023, 19, 2629–2639.
  18. Zhang, S.; Zhao, B.; Liu, D.; Zhang, Y. Event-Triggered Decentralized Integral Sliding Mode Control for Input-Constrained Nonlinear Large-Scale Systems with Actuator Failures. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 1914–1925.
  19. Zhang, Y.; Zhao, B.; Liu, D.; Zhang, S. Distributed Fault Tolerant Consensus Control of Nonlinear Multiagent Systems via Adaptive Dynamic Programming. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 9041–9053.
  20. Zhang, Z.; Wang, Y.; Miao, Z.; Jiang, Y.; Feng, Y. Asymptotic Stability Analysis and Stabilization Control for General Fractional-Order Neural Networks via an Unified Lyapunov Function. IEEE Trans. Netw. Sci. Eng. 2024, 11, 2675–2688.
  21. Xia, H.; Hou, J.; Guo, P. Two-Level Local Observer-Based Decentralized Optimal Fault Tolerant Tracking Control for Unknown Nonlinear Interconnected Systems. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 1779–1790.
  22. Liu, P.; Zhang, H.; Ming, Z.; Wang, S.; Agarwal, R.K. Dynamic Event-Triggered Safe Control for Nonlinear Game Systems with Asymmetric Input Saturation. IEEE Trans. Cybern. 2024, 54, 5115–5126.
  23. Xia, H.; Wang, X.; Huang, D.; Sun, C. Cooperative-Critic Learning-Based Secure Tracking Control for Unknown Nonlinear Systems with Multisensor Faults. IEEE Trans. Cybern. 2025, 55, 282–294.
  24. Qin, C.; Qiao, X.; Wang, J.; Zhang, D.; Hou, Y.; Hu, S. Barrier-Critic Adaptive Robust Control of Nonzero-Sum Differential Games for Uncertain Nonlinear Systems with State Constraints. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 50–63.
  25. Wei, Q.; Jiang, H. Event-/Self-Triggered Adaptive Optimal Consensus Control for Nonlinear Multiagent System with Unknown Dynamics and Disturbances. IEEE Trans. Cybern. 2025, 55, 1476–1485.
  26. An, T.; Dong, B.; Yan, H.; Liu, L.; Ma, B. Dynamic Event-Triggered Strategy-Based Optimal Control of Modular Robot Manipulator: A Multiplayer Nonzero-Sum Game Perspective. IEEE Trans. Cybern. 2024, 54, 7514–7526.
  27. Zhang, K.; Zhang, Z.-X.; Xie, X.P.; Rubio, J.d.J. An Unknown Multiplayer Nonzero-Sum Game: Prescribed-Time Dynamic Event-Triggered Control via Adaptive Dynamic Programming. IEEE Trans. Autom. Sci. Eng. 2024, 22, 8317–8328.
  28. Mu, C.; Wang, K.; Ni, Z.; Sun, C. Cooperative Differential Game-Based Optimal Control and Its Application to Power Systems. IEEE Trans. Ind. Inform. 2020, 16, 5169–5179.
  29. An, T.; Wang, Y.; Liu, G.; Li, Y.; Dong, B. Cooperative Game-Based Approximate Optimal Control of Modular Robot Manipulators for Human–Robot Collaboration. IEEE Trans. Cybern. 2023, 53, 4691–4703.
  30. Belhenniche, A.; Chertovskih, R.; Gonçalves, R. Convergence Analysis of Reinforcement Learning Algorithms Using Generalized Weak Contraction Mappings. Symmetry 2025, 17, 750.
Figure 1. Joint module of MUS.
Figure 2. MUS: (a) platform; (b) joint module platform.
Figure 3. Position tracking via proposed method: (a) joint 1; (b) joint 2.
Figure 4. Position error via existing method: (a) joint 1; (b) joint 2.
Figure 5. Position error via proposed method: (a) joint 1; (b) joint 2.
Figure 6. Control torque curves via existing method: (a) joint 1; (b) joint 2.
Figure 7. Control torque curves via proposed method: (a) joint 1; (b) joint 2.
Figure 8. NN weights: (a) joint 1; (b) joint 2.
Table 1. Root mean squares.

                                      Joint 1            Joint 2
Position error of proposed method     1.88 × 10⁻³ rad    1.23 × 10⁻³ rad
Position error of existing method     2.56 × 10⁻³ rad    1.79 × 10⁻³ rad
Control torque of proposed method     0.36 Nm            0.19 Nm
Control torque of existing method     0.42 Nm            0.22 Nm

Share and Cite

MDPI and ACS Style

Si, L.; Liu, Y.; Zhong, L.; Qian, Y. Cooperative Differential Game-Based Modular Unmanned System Approximate Optimal Control: An Adaptive Critic Design Approach. Symmetry 2025, 17, 1665. https://doi.org/10.3390/sym17101665
