Article

Event-Trigger Reinforcement Learning-Based Coordinate Control of Modular Unmanned System via Nonzero-Sum Game

1 Aerospace Times Feihong Technology Company Limited, Beijing 130012, China
2 Department of Control Science and Engineering, Changchun University of Technology, Changchun 130012, China
* Author to whom correspondence should be addressed.
Sensors 2025, 25(2), 314; https://doi.org/10.3390/s25020314
Submission received: 6 November 2024 / Revised: 24 December 2024 / Accepted: 28 December 2024 / Published: 7 January 2025
(This article belongs to the Special Issue Smart Sensing and Control for Autonomous Intelligent Unmanned Systems)

Abstract: Reducing the position error and control torque while limiting the communication burden between the sensor and the actuator is important for the coordinate control of a modular unmanned system (MUS). Therefore, this paper proposes event-trigger reinforcement learning (ETRL)-based coordinate control of an MUS via the nonzero-sum game (NZSG) strategy. The dynamic model of the MUS is established via joint torque feedback (JTF) technology. Based on the NZSG strategy, the coordinate control problem is transformed into an RL issue. With the help of the ET mechanism, periodic communication between the sensor and the actuator is avoided. An ET critic neural network (NN) is used to approximate the performance index function, yielding the ETRL coordinate control policy. The stability of the closed-loop system is verified via Lyapunov's theorem. Experimental results demonstrate the validity of the proposed method: compared with the existing control methods, it reduces the position error by 30% and the control torque by 10%.

1. Introduction

With the rapid development of the space industry and the continuously increasing demand for space exploration, space operations face ever more complex environments and ever tighter control-precision requirements [1,2,3]. The high risk and low efficiency of traditional operations that rely on astronauts leaving the capsule are becoming increasingly prominent. In recent years, thanks to rapid advances in unmanned system research and development, the use of high-precision, high-performance unmanned systems for on-orbit assembly and maintenance has gradually become a goal-oriented basic research topic of scientific value in the field of space exploration. So far, unmanned systems operating in conventional ground environments have achieved good reliability and accuracy. However, under the demanding requirements of coordinate missions in complex space environments, traditional unmanned systems struggle to meet the transportation constraints of launch vehicles and spacecraft because of their large volume, heavy mass, and difficulty of disassembly and assembly. Moreover, the space unmanned systems currently in service still have limitations in mechanism configuration, and it is difficult for them to change their assembly configuration and working mode according to different task requirements. The modular unmanned system (MUS) [4,5] is an autonomous unmanned system with standard modules and interfaces that can reassemble and reconfigure itself according to different task requirements. Through the reconfiguration of modules, the unmanned system can take on a variety of assembly configurations to complete different tasks, thus showing advantages that traditional unmanned systems do not possess.
As an important branch of game theory, the differential game [6,7] focuses on the dynamic decision-making process of continuous-time systems described by differential equations. It is an ideal tool for handling multi-participant decision-making and control problems and for solving optimal strategies, and it is widely used in economics [8], management [9], computer science [10], and other fields [11,12]. Reinforcement learning [13,14] originated as an imitation of the human brain's learning mechanism: it learns a mapping from environment state to action so that the system obtains the maximum cumulative reward from the environment and then optimizes system performance through optimal strategy selection. In recent years, it has been widely used in complex nonlinear differential games because it can effectively alleviate the "curse of dimensionality" of traditional dynamic programming [15,16]. As a kind of game, the nonzero-sum game (NZSG) [17,18] requires solving the corresponding coupled Hamilton–Jacobi (HJ) equation for each player in order to obtain its Nash equilibrium solution.
As an important part of modern control theory, optimal control selects control strategies so that given performance indexes of the controlled system are optimized. For the large class of nonlinear systems encountered in practical engineering, obtaining the optimal control strategy requires solving the Hamilton–Jacobi–Bellman (HJB) equation, a nonlinear partial differential equation whose optimal solution is difficult to obtain by analytical methods. The reinforcement learning method is a powerful tool for solving the optimal control problem of nonlinear systems: a neural network (NN) [19,20] is designed to approximate the performance index function and estimate the solution of the HJ(B) equation. Owing to its strong advantages in solving nonlinear optimal control, reinforcement learning has attracted extensive attention from scholars worldwide in recent years and has yielded rich results on problems such as discrete-time optimal control [21,22,23], continuous-time optimal control [24,25,26], and data-driven optimal control [27,28,29] of complex nonlinear systems. However, these results rely on periodic sampling rather than event triggering, which wastes resources and incurs high computational costs.
Motivated by the above, this paper develops event-trigger (ET) reinforcement learning (ETRL)-based coordinate control of the MUS via the NZSG strategy. The main contributions of this paper are the following two aspects:
1. To the best of the authors' knowledge, this is the first time that the NZSG via reinforcement learning has been applied to an MUS. By treating the control torques of the n modules in the MUS as decision-makers, the optimal control problem for the MUS is transformed into an NZSG issue with n players.
2. The stability of the developed method is guaranteed, and experiments on the MUS are conducted. The experimental results show that the proposed method produces smaller tracking errors and lower power consumption.

2. Background and Related Work

2.1. Reinforcement Learning

Optimal control is widely utilized and holds great importance in many areas. However, as the system dimension increases, the curse of dimensionality appears. Reinforcement learning, as an effective remedy for the curse of dimensionality in optimal control, has emerged as a crucial approach for addressing approximate optimal control problems. Vamvoudakis et al. [30] published a book on RL-based control for cognitive autonomy. Wang et al. [12] developed a review of RL for advanced control applications. Liu, Xue et al. [31] surveyed RL with applications in control. The above three works are surveys of RL and adaptive dynamic programming, whereas the method proposed in this paper addresses the coordinate control of a modular unmanned system, a specific application of RL. Dong et al. [32] proposed safe RL for trajectory tracking of a modular robot system; however, the event-trigger mechanism is not considered there, so the computation and communication burdens remain. An et al. [17] designed a cooperative game-based RL method for human–robot interaction. Liu et al. [33] used an RL control method to deal with the vehicle path tracking issue, and the particle swarm optimization method is utilized for the same problem in [34]. However, the above methods only consider a single controller to guarantee optimality, whereas modern industrial tasks require more than one player/controller to finish the task using the other players' information. Hence, the developed nonzero-sum game is of great importance.

2.2. Nonzero-Sum Game

Differential game theory focuses on the dynamic decision-making process in multi-player interactive systems and has advantages in dealing with uncertain interactions and disturbances. Differential games include the zero-sum game, nonzero-sum game, cooperative game, etc. Each module in the MUS functions as a participant in the NZSG, each with its own policy, collectively operating within the group using a general quadratic performance index function as the basis of the game. Wu et al. [35] proposed an NZSG for unmanned aerial vehicles with uncertain as well as asymmetric information, but coordinate control is not considered. Zheng et al. [36] developed a Q-learning-based NZSG for a spacecraft system under a pursuit–evasion condition. These methods are based on a time-triggered mechanism; besides optimality, the communication burden between the sensor and the actuator needs to be considered. Therefore, event-trigger mechanisms have been developed to reduce the number of samplings. An et al. [37] used a dynamic event-trigger to complete a robot's tracking task via the NZSG. Dong et al. [38] proposed event-trigger value iteration RL for a coordinated task under the NZSG framework. The above methods are only applied to robot manipulators and are thus unsuitable for modular unmanned systems.

3. Dynamic Model

For an MUS employing the JTF technique, the dynamic model of the ith subsystem is presented below:
$I_{im}\gamma_i\ddot{q}_i + \frac{\tau_{is}}{\gamma_i} + f_{ir}(q_i,\dot{q}_i) + I_i(q,\dot{q},\ddot{q}) = \tau_i + J_i^{T}f,$
where $f$ is the contact force between the MUS and the object; $f_{ir}(q_i,\dot{q}_i)$ denotes the lumped joint friction; $\gamma_i$ is the gear ratio; $q_i$ is the joint position; $\tau_{is}$ is the coupled joint torque; $I_i(q,\dot{q},\ddot{q})$ is the interconnected dynamic coupling (IDC) effect among the MUS subsystems; $\tau_i$ is the control torque; and the subscript $i$ refers to the $i$th joint module subsystem. The property analyses are described below:
(1) The lumped joint friction
The joint friction term $f_{ir}(q_i,\dot{q}_i)$ is formulated as
$f_{ir}(q_i,\dot{q}_i) = \hat{f}_{ib}\dot{q}_i + \big(\hat{f}_{is}e^{-\hat{f}_{i\tau}\dot{q}_i^{2}} + \hat{f}_{ic}\big)\mathrm{sgn}(\dot{q}_i) + f_{ip}(q_i,\dot{q}_i) + Y_i(\dot{q}_i)\tilde{F}_{ir},$
in which
$Y_i(\dot{q}_i) = \big[f_{ib}-\hat{f}_{ib},\ f_{ic}-\hat{f}_{ic},\ f_{is}-\hat{f}_{is},\ f_{i\tau}-\hat{f}_{i\tau}\big]^{T},$
where $f_{ip}(q_i,\dot{q}_i)$ is the position-dependent friction term; $f_{ib}$ and $f_{i\tau}$ are the viscous and Stribeck friction parameters; and $f_{is}$ and $f_{ic}$ are the static and Coulomb friction parameters. Furthermore, $\hat{f}_{ib}$, $\hat{f}_{ic}$, $\hat{f}_{is}$, and $\hat{f}_{i\tau}$ are their estimated values.
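To make the friction model concrete, the short Python sketch below evaluates the nominal part of the lumped friction term using the estimated parameter values reported later in Section 5.1; the position-dependent term and the parameter-error term are omitted (set to zero), and the negative sign in the Stribeck exponent is an assumption of this illustration, not a statement of the paper.

```python
import numpy as np

# Sketch of the lumped joint friction model for one joint, using the estimated
# parameters from Section 5.1. The position-dependent term f_ip and the
# parameter-error term Y_i * F_tilde are unknown in practice and are dropped here.
f_b, f_c, f_s, f_tau = 12e-3, 30e-3, 40e-3, 20.0  # viscous, Coulomb, static, Stribeck params

def lumped_friction(qd):
    """Approximate friction torque [N·m] for joint velocity qd [rad/s]."""
    stribeck = f_s * np.exp(-f_tau * qd**2)           # Stribeck effect (negative exponent assumed)
    return f_b * qd + (stribeck + f_c) * np.sign(qd)  # viscous + (Stribeck + Coulomb) * sgn

print(lumped_friction(0.5))  # friction torque at 0.5 rad/s
```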
Remark 1.
The parameters $f_{ib}$, $f_{ic}$, $f_{is}$, $f_{i\tau}$ are bounded, and their corresponding estimates are bounded as well. Consequently, $\tilde{F}_{ir}$ is bounded, i.e., $\|\tilde{F}_{ir}\| \le b_{iFr}^{m}$, where $b_{iFr}^{m}$ represents a known positive constant for each $m \in \{1,2,3,4\}$. It follows that $\|Y_i(\dot{q}_i)\tilde{F}_{ir}\| \le \|Y_i(\dot{q}_i)\|\,b_{iFr}^{m}$. Additionally, $\|f_{ip}(q_i,\dot{q}_i)\| \le b_{iFp}$, in which $b_{iFp}$ is a known positive constant.
(2) The interconnected dynamic coupling
The IDC is expressible as a nonlinear function of the coupled vectors of the entire modular subsystem as follows:
$I_i = I_{im}\sum_{j=1}^{i-1} v_{mi}^{T}v_{lj}\,\ddot{q}_j + I_{im}\sum_{j=2}^{i-1}\sum_{k=1}^{j-1} v_{mi}^{T}(v_{lk}\times v_{lj})\,\dot{q}_k\dot{q}_j = I_{im}\sum_{j=1}^{i-1} D_{ji}\,\ddot{q}_j + I_{im}\sum_{j=2}^{i-1}\sum_{k=1}^{j-1} \Theta_{kji}\,\dot{q}_k\dot{q}_j = I_{im}\sum_{j=1}^{i-1}\big(\hat{D}_{ji}+\tilde{D}_{ji}\big)\ddot{q}_j + I_{im}\sum_{j=2}^{i-1}\sum_{k=1}^{j-1}\big(\hat{\Theta}_{kji}+\tilde{\Theta}_{kji}\big)\dot{q}_k\dot{q}_j,$
in which $v_{mi}$, $v_{lj}$, $v_{lk}$ denote the unit vectors along the $i$th, $j$th, and $k$th joint rotation axes, respectively. Accordingly, define $D_{ji} = v_{mi}^{T}v_{lj}$ and $\Theta_{kji} = v_{mi}^{T}(v_{lk}\times v_{lj})$. We also have $\hat{D}_{ji} = D_{ji}-\tilde{D}_{ji}$ and $\hat{\Theta}_{kji} = \Theta_{kji}-\tilde{\Theta}_{kji}$, in which $\hat{D}_{ji}$, $\hat{\Theta}_{kji}$ represent the estimated values of $D_{ji}$, $\Theta_{kji}$, and $\tilde{D}_{ji}$, $\tilde{\Theta}_{kji}$ are alignment errors.
Remark 2.
Based on (4), which characterizes $v_{mi}$, $v_{lk}$, $v_{lj}$, the magnitudes of the associated vector products are bounded, with $|D_{ji}| = |v_{mi}^{T}v_{lj}| < 1$ and $|\Theta_{kji}| = |v_{mi}^{T}(v_{lk}\times v_{lj})| < 1$. Additionally, $I_i$ is bounded with $\|I_i\| \le b_{iI}$ for a positive constant $b_{iI}$.
Define the state vector $x_i = [x_{i1}, x_{i2}]^{T} = [q_i, \dot{q}_i]^{T}$ and the control input $u_i = \tau_i$. The state space of the $i$th subsystem is
$\dot{x}_{i1} = x_{i2}, \qquad \dot{x}_{i2} = f_i(x) + g_i u_i,$
where
$g_i = (I_{im}\gamma_i)^{-1},$
$f_i = g_i\Big(-\big(\hat{f}_{is}e^{-\hat{f}_{i\tau}\dot{x}_{i1}^{2}} + \hat{f}_{ic}\big)\mathrm{sgn}(x_{i2}) - f_{ip}(x_{i1},x_{i2}) - \hat{f}_{ib}x_{i2} - Y_i(x_{i2})\tilde{F}_{ir} - \frac{\tau_{is}}{\gamma_i} - I_i(x,\dot{x},\ddot{x}) + J_i^{T}f\Big).$
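As a minimal illustration of the state-space form above, the following sketch performs one forward-Euler step of the double-integrator subsystem, treating the lumped nonlinearity $f_i(x)$ as a placeholder; the inertia and gear-ratio values are taken from the experimental setup in Section 5.1, and the step size is an arbitrary choice.

```python
import numpy as np

# Sketch of the subsystem state space: x1 = q_i, x2 = q_i_dot, g_i = 1/(I_im * gamma_i).
# The lumped term f_i(x) bundles friction, coupled torque, IDC, and contact force;
# here it is only a placeholder.
I_im, gamma_i = 120e-7, 100.0          # 120 g·cm^2 expressed in kg·m^2, gear ratio
g_i = 1.0 / (I_im * gamma_i)

def f_i(x):
    return 0.0                          # placeholder for the lumped nonlinearity

def step(x, u, dt=1e-3):
    """One forward-Euler step under control torque u."""
    x1, x2 = x
    return np.array([x1 + dt * x2, x2 + dt * (f_i(x) + g_i * u)])

x = np.zeros(2)
x = step(x, u=0.1)
```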
The control objective is to achieve optimal tracking-error performance for the MUS under coordinate control. In the subsequent section, we introduce the event-trigger reinforcement learning-based coordinate control via the nonzero-sum game framework.

4. Event-Trigger Reinforcement Learning-Based Coordinate Control via Nonzero-Sum Game

4.1. Problem Transformation

Based on the dynamic model (1) and state space (5), the control objective of this paper is optimal trajectory tracking. To facilitate the controller design, the augmented system is deduced:
$\dot{x}_1 = x_2, \qquad \dot{x}_2 = f(x) + \sum_{m=1}^{n} G_m u_m,$
where $x = [x_1^{T}, x_2^{T}]^{T} \in \mathbb{R}^{2n}$ is the global state of the MUS, with $x_1 = [x_{11},\ldots,x_{i1},\ldots,x_{n1}]^{T} \in \mathbb{R}^{n}$ and $x_2 = [x_{12},\ldots,x_{i2},\ldots,x_{n2}]^{T} \in \mathbb{R}^{n}$. Moreover, $f(x) = [f_1(x),\ldots,f_i(x),\ldots,f_n(x)]^{T}$ and $G_m = [0,\ldots,0,g_m,0,\ldots,0]^{T}$, where $g_m = (I_{mm}\gamma_m)^{-1}$, $m = 1,\ldots,n$.
Define the cost function:
$J_i(\dot{e}_s, u_1, \ldots, u_n) = \int_{t}^{\infty}\Big(\dot{e}_s^{T} Q_i \dot{e}_s + \sum_{m=1}^{n} u_m^{T} R_{im} u_m\Big)d\tau = \int_{t}^{\infty} U_i(\dot{e}_s, u_1, \ldots, u_n)\,d\tau,$
where the position error is $e = [e_1, e_2, \ldots, e_n]^{T} = x_1 - x_d$ and the velocity error vector is $\dot{e} = [\dot{e}_1, \dot{e}_2, \ldots, \dot{e}_n]^{T} = x_2 - \dot{x}_d$; $e_s = \dot{e} + \beta e$ denotes the fusion error; $x_d$, $\dot{x}_d$, $\ddot{x}_d$ are the given reference vectors; $Q_i$, $R_{im}$ are given positive definite matrices; and $U_i(\dot{e}_s, u_1, \ldots, u_n)$ is the utility function. Employing the infinitesimal version of (8), the Hamiltonian function can be derived:
$H_i(\dot{e}_s, u_1, \ldots, u_n, \nabla J_i) = U_i(\dot{e}_s, u_1, \ldots, u_n) + (\nabla J_i)^{T}\Big(f(x) + \sum_{m=1}^{n} G_m u_m - b_d\Big),$
where $\nabla J_i(\dot{e}_s) = \partial J_i(\dot{e}_s)/\partial\dot{e}_s$ is the partial derivative of $J_i(\dot{e}_s)$ and $b_d = \ddot{x}_d - \beta\dot{e}$. Additionally, the optimal value function can be described as
$J_i^{*}(\dot{e}_s, u_1, \ldots, u_n) = \min_{u_i}\int_{t}^{\infty} U_i(\dot{e}_s, u_1, \ldots, u_n)\,d\tau.$
Based on the stationarity condition $\partial H_i/\partial u_i = 0$, the local optimal control policy $u_i^{*}$ is defined as
$u_i^{*} = -\frac{1}{2}R_{ii}^{-1}G_i^{T}\nabla J_i^{*}.$
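Given a critic-supplied value gradient, the stationarity-based policy above reduces to a single matrix expression. The sketch below implements it with a dummy gradient and an assumed scalar input per player; all numeric values are illustrative only.

```python
import numpy as np

# Sketch of the local optimal policy u_i* = -1/2 * R_ii^{-1} * G_i^T * grad(J_i*).
# The value gradient grad_Ji would be provided by the critic; here it is a dummy vector.
def local_optimal_policy(R_ii, G_i, grad_Ji):
    return -0.5 * np.linalg.inv(R_ii) @ G_i.T @ grad_Ji

n = 2                                   # number of joint modules (players)
R_ii = np.eye(1)                        # input weight of player i (scalar input assumed)
G_i = np.zeros((2 * n, 1))
G_i[n, 0] = 1.0 / (120e-7 * 100.0)      # hypothetical input map entry g_i
grad_Ji = np.ones(2 * n)                # placeholder critic gradient
u_star = local_optimal_policy(R_ii, G_i, grad_Ji)
```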
By substituting (8) and (11) into the Hamiltonian function (9), the coupled Hamilton–Jacobi (HJ) equation can be derived:
$0 = \dot{e}_s^{T} Q_i \dot{e}_s + (\nabla J_i^{*})^{T}\Big(f(x) - \frac{1}{2}\sum_{m=1}^{n} G_m R_{mm}^{-1} G_m^{T}\nabla J_m^{*} + \varpi(x) - b_d\Big) + \frac{1}{4}\sum_{m=1}^{n}\big(\nabla J_m^{*}\big)^{T} G_m R_{mm}^{-1} R_{im} R_{mm}^{-1} G_m^{T}\nabla J_m^{*}.$
It is difficult to obtain an analytical solution because of the system's nonlinearity. Therefore, an event-trigger reinforcement learning-based coordinate control is introduced.

4.2. Event-Trigger Reinforcement Learning-Based Coordinate Control

The optimal control policy (11) is obtained under periodic sampling from the coupled HJ equation (12). Fixed-period sampling not only escalates computational demands but also consumes excessive communication resources, jeopardizing the timeliness of control in bandwidth-constrained environments. Therefore, an event-triggered strategy is introduced to improve efficiency.
Define a monotonically increasing sequence of trigger instants $\{t_j\}_{j=0}^{+\infty}$. Then, define the sampled state
$\dot{e}_{sj}^{i}(x_j^{i}) = \dot{e}_s^{i}\big(x_i(t_j)\big),$
where $\dot{e}_{sj}^{i}(x_j^{i})$ denotes the triggering-instant state for $t \in [t_j, t_{j+1})$. To obtain the trigger condition, the following gap function is introduced:
$g_{ej}^{i}(t) = \dot{e}_s^{i}(x_i) - \dot{e}_{sj}^{i}(x_i).$
Upon event triggering, based on (13), the actual state is sampled to become the sampled state, after which $g_{ej}^{i}(t)$ is reset to zero. The optimal control law is held at $u_i^{*}(\dot{e}_s^{i}(t_j)) = u_i^{*}(\dot{e}_{sj}^{i})$ during $[t_j, t_{j+1})$, $j \in \mathbb{N}$. It should be noted that $u_i^{*}(\dot{e}_{sj}^{i})$ are discrete values updated irregularly, so they must be converted to a continuous signal. Therefore, a zero-order hold is employed to cope with this issue.
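The sampling-and-hold logic described above can be summarized by a simple loop: the control is recomputed only when the gap between the current and last-sampled fusion-error derivative exceeds a threshold, and a zero-order hold keeps it constant in between. The sketch below uses a constant threshold as a stand-in for the state-dependent condition (35); the policy function and the trajectory are assumptions for illustration.

```python
import numpy as np

# Minimal event-trigger sketch: re-sample and recompute the control only when the
# gap ||e_s - e_s_sampled|| exceeds a threshold; a zero-order hold applies in between.
def event_triggered_run(e_s_traj, policy, threshold):
    e_sampled = e_s_traj[0]
    u = policy(e_sampled)
    controls, triggers = [], []
    for k, e_s in enumerate(e_s_traj):
        gap = np.linalg.norm(e_s - e_sampled)
        if gap > threshold:            # event: re-sample the state and update the control
            e_sampled = e_s
            u = policy(e_sampled)
            triggers.append(k)
        controls.append(u)             # between events the ZOH keeps u constant
    return np.array(controls), triggers

traj = [np.array([0.0]), np.array([0.05]), np.array([0.2]), np.array([0.22])]
u_hist, trig = event_triggered_run(traj, policy=lambda e: -1.0 * e, threshold=0.1)
```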
According to the dynamic model of the MUS (7), the event-trigger value function is given as follows:
$J_i(\dot{e}_{sj}^{i}, u_1, \ldots, u_n) = \int_{t}^{t+T}\Big(\dot{e}_{sj}^{i\,T} Q_i \dot{e}_{sj}^{i} + \sum_{m=1}^{n} u_m^{T}(\dot{e}_{sj}^{i}) R_{im} u_m(\dot{e}_{sj}^{i})\Big)d\tau.$
One has the event-triggered HJ equation:
$H_i\big(\dot{e}_{sj}^{i}, u_1, \ldots, u_n, \nabla J_i(\dot{e}_{sj}^{i})\big) = U_i(\dot{e}_{sj}^{i}, u_1, \ldots, u_n) + \big(\nabla J_i(\dot{e}_{sj}^{i})\big)^{T}\Big(f(x) + \sum_{m=1}^{n} G_m u_m(\dot{e}_{sj}^{i}) - b_d\Big),$
where $\nabla J_i(\dot{e}_{sj}^{i}) = \partial J_i(\dot{e}_{sj}^{i})/\partial\dot{e}_{sj}^{i}$ is the partial derivative of $J_i(\dot{e}_{sj}^{i})$ with respect to $\dot{e}_{sj}^{i}$. To remove the norm-boundedness assumption on the interconnections, the desired states of the coupled subsystems are used in place of their actual states. Consequently, the interconnection term is decomposed as
$f(x) = f_i(x_i, x_{md}) + \Delta f_i(x, x_{md}), \qquad u_m = G_m^{-1}\big(\dot{x}_{m2d} - f_m(x_d)\big), \quad m \ne i,$
where $x_{md}$ is the desired state of the coupled subsystems for $m = 1, \ldots, i-1, i+1, \ldots, n$, and $\Delta f_i(x, x_{md})$ represents the substitution error. Since the interconnection satisfies the global Lipschitz condition, we have
$\|\Delta f_i(x, x_{md})\| \le \sum_{m=1, m\ne i}^{n} d_{im}\|E_m\|,$
where $E_m = x_m - x_{md}$, and $d_{im} \ge 0$ denotes an unknown global Lipschitz constant.
The improved event-triggered optimal value function is
$J_i^{*}(\dot{e}_{sj}^{i}, u_1, \ldots, u_n) = \min_{u_i}\int_{t}^{t+T}\Big(\dot{e}_{sj}^{i\,T} Q_i \dot{e}_{sj}^{i} + \sum_{m=1}^{n} u_m^{T}(\dot{e}_{sj}^{i}) R_{im} u_m(\dot{e}_{sj}^{i})\Big)d\tau.$
Substituting (19) into (16), it can be inferred that
$0 = \min_{u_i(\dot{e}_{sj}^{i})\in\Psi_i(\Omega)} H_i\big(\dot{e}_{sj}^{i}, u_i(\dot{e}_{sj}^{i}), \nabla J_i^{*}(\dot{e}_{sj}^{i})\big).$
Based on (21), one has the event-triggered optimal control law
$u_i^{*}(\dot{e}_{sj}^{i}) = -\frac{1}{2}R_{ii}^{-1}G_i^{T}\nabla J_i^{*}(\dot{e}_{sj}^{i}).$
For any $\dot{e}_s^{i}, \dot{e}_{sj}^{i} \in \Omega$, the control law is Lipschitz continuous. Then, there exists a constant $\chi_i$ satisfying
$\big\|u_i^{*}(\dot{e}_s^{i}) - u_i^{*}(\dot{e}_{sj}^{i})\big\| = \big\|u_i^{*}\big(\dot{e}_{sj}^{i} + g_{ej}^{i}(t)\big) - u_i^{*}(\dot{e}_{sj}^{i})\big\| \le \chi_i\big\|g_{ej}^{i}(t)\big\|.$
Given the difficulty of solving the coupled HJ equation and the curse of dimensionality that arises as the dimension increases, we employ a reinforcement learning algorithm to derive an approximate solution of the event-triggered HJ equation in real time.
The improved value function $J_i^{*}(\dot{e}_{sj}^{i})$ can be represented by a radial basis function neural network (RBFNN) as follows:
$J_i^{*}(\dot{e}_{sj}^{i}) = W_{ci}^{T}\delta_{ci}(\dot{e}_{sj}^{i}) + \varepsilon_{ci}(\dot{e}_{sj}^{i}),$
where $W_{ci} \in \mathbb{R}^{K_i}$ is the ideal critic NN weight vector; $K_i$ is the number of neurons in the hidden layer; $\delta_{ci}(\dot{e}_{sj}^{i}) = \exp\big(-\|\dot{e}_{sj}^{i} - c_{ij}\|^{2}/(2b_{ij}^{2})\big)$ denotes the activation function; and $\varepsilon_{ci}(\dot{e}_{sj}^{i})$ is the critic NN approximation error. They are bounded as $\|\delta_{ci}(\dot{e}_{sj}^{i})\| \le \delta_{ci\max}$ and $\|\varepsilon_{ci}(\dot{e}_{sj}^{i})\| \le \varepsilon_{ci\max}$ with positive constants $\delta_{ci\max}$ and $\varepsilon_{ci\max}$.
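A minimal sketch of the Gaussian RBF critic follows: the value estimate is the inner product of the weight vector with the activation vector. The number of neurons matches the five initial weights reported in Section 5.1, while the centers and widths are hypothetical placeholders.

```python
import numpy as np

# Sketch of the Gaussian RBF critic: J_hat = W_hat^T * delta,
# with delta_k = exp(-||e - c_k||^2 / (2 * b_k^2)).
def rbf_activation(e, centers, widths):
    d2 = np.sum((centers - e) ** 2, axis=1)            # squared distance to each center
    return np.exp(-d2 / (2.0 * widths ** 2))

K = 5
centers = np.linspace(-1.0, 1.0, K).reshape(K, 1)       # hypothetical centers
widths = np.full(K, 0.5)                                # hypothetical widths
W_hat = np.full(K, 0.3)                                 # initial critic weights (Section 5.1)
J_hat = W_hat @ rbf_activation(np.array([0.1]), centers, widths)
```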
Therefore, the partial derivative of $J_i^{*}(\dot{e}_{sj}^{i})$ can be obtained as follows:
$\nabla J_i^{*}(\dot{e}_{sj}^{i}) = \nabla\delta_{ci}^{T}(\dot{e}_{sj}^{i})W_{ci} + \nabla\varepsilon_{ci}(\dot{e}_{sj}^{i}),$
where $\nabla\delta_{ci}(\dot{e}_{sj}^{i})$ is Lipschitz continuous.
The relationship $\|\nabla\delta_{ci}(\dot{e}_s^{i}) - \nabla\delta_{ci}(\dot{e}_{sj}^{i})\| \le p_i\|g_{ej}^{i}(t)\|$ can be derived, where $p_i$ is a positive constant.
Substituting (24) into (21) yields the following:
$u_i^{*}(\dot{e}_{sj}^{i}) = -\frac{1}{2}R_{ii}^{-1}G_i^{T}\nabla J_i^{*}(\dot{e}_{sj}^{i}) = -\frac{1}{2}R_{ii}^{-1}G_i^{T}\big(\nabla\delta_{ci}^{T}(\dot{e}_{sj}^{i})W_{ci} + \nabla\varepsilon_{ci}(\dot{e}_{sj}^{i})\big).$
Therefore, one obtains the event-triggered HJ equation as follows:
$H_i\big(\dot{e}_{sj}^{i}, u_i^{*}(\dot{e}_{sj}^{i}), \nabla J_i^{*}(\dot{e}_{sj}^{i})\big) = U_i(\dot{e}_{sj}^{i}, u_1, \ldots, u_i^{*}, \ldots, u_n) + \big(\nabla J_i(\dot{e}_{sj}^{i})\big)^{T}\Big(f_i(x_i, x_{md}) - b_d + \sum_{m=1}^{n} G_m u_m(\dot{e}_{sj}^{i})\Big) = e_{cHi},$
where $e_{cHi} = -\nabla\varepsilon_{ci}^{T}(\dot{e}_{sj}^{i})\ddot{e}_{sj}^{i}$ denotes the residual error, and the positive constant $e_{cHi\max}$ is the upper bound of $\|e_{cHi}\|$.
Since the ideal critic NN weight vector is unavailable, we approximate the improved value function as
$\hat{J}_i(\dot{e}_{sj}^{i}) = \hat{W}_{ci}^{T}\delta_{ci}(\dot{e}_{sj}^{i}).$
Furthermore, the partial derivative $\nabla\hat{J}_i(\dot{e}_{sj}^{i})$ is formulated as follows:
$\nabla\hat{J}_i(\dot{e}_{sj}^{i}) = \nabla\delta_{ci}^{T}(\dot{e}_{sj}^{i})\hat{W}_{ci}.$
Therefore, merging (28) with (21), the event-trigger-based approximate optimal control law $\hat{u}_i(\dot{e}_{sj}^{i})$ is obtained as
$\hat{u}_i(\dot{e}_{sj}^{i}) = -\frac{1}{2}R_{ii}^{-1}G_i^{T}\nabla\delta_{ci}^{T}(\dot{e}_{sj}^{i})\hat{W}_{ci}.$
According to (25), (28) and (29), we can obtain the event-triggered approximate HJ equation as
$\hat{H}_i\big(\dot{e}_{sj}^{i}, \hat{u}_i(\dot{e}_{sj}^{i}), \nabla\hat{J}_i(\dot{e}_{sj}^{i})\big) = U_i(\dot{e}_{sj}^{i}, \hat{u}_1, \ldots, \hat{u}_i, \ldots, \hat{u}_n) + \big(\nabla\hat{J}_i(\dot{e}_{sj}^{i})\big)^{T}\Big(f_i(x_i, x_{md}) - b_d + \sum_{m=1}^{n} G_m \hat{u}_m(\dot{e}_{sj}^{i})\Big) = e_{ci}.$
Define the critic weight approximation error vector as $\tilde{W}_{ci} = W_{ci} - \hat{W}_{ci}$. From (30), we have $\partial e_{ci}/\partial\hat{W}_{ci} = \nabla\delta_{ci}(\dot{e}_{sj}^{i})\ddot{e}_{sj}^{i} = \theta_i$. To refine the estimate $\hat{W}_{ci}$ of the ideal weight vector, we employ the gradient descent algorithm to minimize the objective function $E_{ci} = \frac{1}{2}e_{ci}^{2}$, with the update law
$\dot{\hat{W}}_{ci} = -\alpha_{ci}\frac{\partial E_{ci}}{\partial\hat{W}_{ci}} = -\alpha_{ci}\Big(U_i(\dot{e}_{sj}^{i}, \hat{u}_1, \ldots, \hat{u}_i, \ldots, \hat{u}_n) + \hat{W}_{ci}^{T}\theta_i\Big)\theta_i.$
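Implemented digitally, the gradient-descent update above becomes a simple Euler step, as sketched below; the utility value, the regressor $\theta_i$, the learning rate, and the step size are placeholders, and the continuous-time law is only approximated.

```python
import numpy as np

# Euler-discretized sketch of the critic update: W_hat_dot = -alpha * (U + W_hat^T theta) * theta.
def critic_update(W_hat, U, theta, alpha=1.0, dt=1e-3):
    e_c = U + W_hat @ theta                 # approximate HJ residual
    return W_hat - dt * alpha * e_c * theta # one Euler step of the update law

W_hat = np.full(5, 0.3)                     # initial weights from Section 5.1
W_hat = critic_update(W_hat, U=0.2, theta=np.ones(5) * 0.1)
```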
Then, the dynamics of the approximation error vector are
$\dot{\tilde{W}}_{ci} = \alpha_{ci}\big(e_{cHi} - \tilde{W}_{ci}^{T}\theta_i\big)\theta_i.$
Theorem 1.
Consider the value function (23), which is represented by the critic NN with ideal weights $W_{ci}$, and its approximation (27) with weights $\hat{W}_{ci}$. If the critic NN update law is defined by (31), the weight approximation error is uniformly ultimately bounded (UUB).
Proof. 
The Lyapunov function candidate is selected as
$V_{1i}(t) = \frac{1}{2\alpha_{ci}}\tilde{W}_{ci}^{T}\tilde{W}_{ci}.$
The derivative of $V_{1i}(t)$ can be obtained as
$\dot{V}_{1i}(t) = \frac{1}{\alpha_{ci}}\tilde{W}_{ci}^{T}\dot{\tilde{W}}_{ci} = \tilde{W}_{ci}^{T}\theta_i\, e_{cHi} - \big(\tilde{W}_{ci}^{T}\theta_i\big)^{2} \le \frac{1}{2}e_{cHi}^{2} - \frac{1}{2}\big(\tilde{W}_{ci}^{T}\theta_i\big)^{2}.$
Upon analyzing (34), we observe that if $\|\tilde{W}_{ci}\| \ge |e_{cHi}|/\|\theta_i\|$, then $\dot{V}_{1i}(t) < 0$, which confirms that the critic approximation error vector is UUB. □
Theorem 2.
Given an MUS with joint subsystem dynamic model (1) and state space (7), the closed-loop MUS with coordinate control is UUB under the presented event-triggered reinforcement learning-based coordinate control law if the trigger condition
$\|g_{ej}^{i}\|^{2} \le \frac{(1-\alpha_i^{2})\,\sigma_{\min}(Q_i)\,\|\dot{e}_s^{i}\|^{2} + \sum_{m=1}^{n} r_{im}^{2}\,\|\hat{u}_m(\dot{e}_{sj}^{m})\|^{2}}{2\,\|R_{ii}\|\,\chi_i^{2}} - \frac{\|G_i\|^{2}\big(\delta_{ci\max}W_{ci\max} + \varepsilon_{ci\max}\big)^{2}}{4\,\|R_{ii}\|^{2}\,\chi_i^{2}}$
holds (condition (35)), where $\alpha_i \in (0,1)$ is the designed sampling frequency parameter, $\sigma_{\min}(\cdot)$ denotes the minimum eigenvalue of a matrix, and $r_{im}$ is a positive constant satisfying $R_{im} = r_{im}^{T}r_{im}$, assuming $\|\tilde{W}_{ci}\| \le W_{ci\max}$.
Proof. 
We select the Lyapunov candidate function
$V_i(t) = V_{si} + V_{sji},$
where $V_{si} = J_i^{*}(\dot{e}_s^{i})$ and $V_{sji} = J_i^{*}(\dot{e}_{sj}^{i})$.
The following proof is divided into two cases.
Case 1: 
The events are not triggered, i.e., $t \in [t_j, t_{j+1})$.
Computing the time derivative of (36) yields
$\dot{V}_{si}(t) = \big(\nabla J_i^{*}(\dot{e}_s^{i})\big)^{T}\Big(f_i(x_i, x_{md}) - b_d + \sum_{m=1}^{n} G_m u_m\Big),$
$\dot{V}_{sji} = 0.$
Based on the optimal control law (11) and the time-triggered HJ equation (12), one has
$\big(\nabla J_i^{*}\big)^{T}\big(f_i(x_i, x_{md}) - b_d\big) = -\dot{e}_s^{i\,T} Q_i \dot{e}_s^{i} + \frac{1}{2}\sum_{m=1}^{n}\big(\nabla J_i^{*}\big)^{T} G_m R_{mm}^{-1} G_m^{T}\nabla J_m^{*} - \frac{1}{4}\sum_{m=1}^{n}\big(\nabla J_m^{*}\big)^{T} G_m R_{mm}^{-1} R_{im} R_{mm}^{-1} G_m^{T}\nabla J_m^{*},$
$\big(\nabla J_i^{*}(\dot{e}_s^{i})\big)^{T} G_i = -2\,u_i^{*T} R_{ii}.$
Substituting (39) and (40) into (37), we can obtain the following equation:
$\dot{V}_{si}(t) = -\dot{e}_s^{i\,T} Q_i \dot{e}_s^{i} - \big(\nabla J_i^{*}\big)^{T}\sum_{m=1}^{n} G_m\big(u_m^{*} - \hat{u}_m\big) - \frac{1}{4}\sum_{m=1}^{n}\big(\nabla J_m^{*}\big)^{T} G_m R_{mm}^{-1} R_{im} R_{mm}^{-1} G_m^{T}\nabla J_m^{*}$
$= -\dot{e}_s^{i\,T} Q_i \dot{e}_s^{i} - \frac{1}{4}\sum_{m=1}^{n}\big(\nabla J_m^{*}(\dot{e}_{sj}^{i})\big)^{T} G_m R_{mm}^{-1} R_{im} R_{mm}^{-1} G_m^{T}\nabla J_m^{*}(\dot{e}_{sj}^{i}) + \frac{1}{2}\big(\nabla\delta_{ci}^{T}(\dot{e}_{sj}^{i})W_{ci} + \nabla\varepsilon_{ci}\big)^{T}\sum_{m=1}^{n} G_m R_{mm}^{-1}\big(G_m^{T}\nabla\delta_{cm}^{T}(\dot{e}_{sj}^{i})\tilde{W}_{cm} + G_m^{T}\nabla\varepsilon_{cm}\big)$
$= -\dot{e}_s^{i\,T} Q_i \dot{e}_s^{i} - \frac{1}{4}\sum_{m=1}^{n}\big(\nabla J_m^{*}(\dot{e}_{sj}^{i})\big)^{T} G_m R_{mm}^{-1} R_{im} R_{mm}^{-1} G_m^{T}\nabla J_m^{*}(\dot{e}_{sj}^{i}) + \Pi_{iJ},$
in which the term $\Pi_{iJ}$ has the following upper bound:
$\Pi_{iJ} \le \frac{1}{2}\Big\|\nabla\delta_{ci}^{T}(\dot{e}_{sj}^{i})W_{ci} + \nabla\varepsilon_{ci}\Big\|\,\Big\|\sum_{m=1}^{n} G_m R_{mm}^{-1} G_m^{T}\big(\nabla\delta_{cm}^{T}(\dot{e}_{sj}^{i})\tilde{W}_{cm} + \nabla\varepsilon_{cm}\big)\Big\| \le \pi_{iJ},$
where $\pi_{iJ}$ is a computable positive constant.
According to (24) and (28), (41) can be transformed into
$\dot{V}_{si} \le -\dot{e}_s^{i\,T} Q_i \dot{e}_s^{i} - \sum_{m=1}^{n}\hat{u}_m^{T}(\dot{e}_{sj}^{m}) R_{im}\hat{u}_m(\dot{e}_{sj}^{m}) + 2\chi_i^{2}\,\|R_{ii}\|\,\|g_{ej}^{i}(t)\|^{2} + \frac{1}{2}\,\|R_{ii}^{-1}\|\,\|G_i\|^{2}\,\big\|\nabla\delta_{ci}^{T}(\dot{e}_{sj}^{i})\tilde{W}_{ci} + \nabla\varepsilon_{ci}(\dot{e}_{sj}^{i})\big\|^{2}.$
Thus, we have $\dot{V}(t)$ as
$\dot{V}(t) = \sum_{i=1}^{n}\dot{V}_i(t) = \sum_{i=1}^{n}\big(\dot{V}_{si} + \dot{V}_{sji}\big).$
According to (38), (43), and (44), one has
$\dot{V}(t) \le \sum_{i=1}^{n}\Big[-\dot{e}_s^{i\,T} Q_i \dot{e}_s^{i} + 2\,\|R_{im}\|\,\chi_i^{2}\,\|g_{ej}^{i}\|^{2} - \sum_{m=1}^{n}\hat{u}_m^{T}(\dot{e}_{sj}^{m}) R_{im}\hat{u}_m(\dot{e}_{sj}^{m}) + \frac{1}{2}\,\|R_{im}^{-1}\|\,\|G_i\|^{2}\,\big\|\nabla\delta_{ci}^{T}(\dot{e}_{sj}^{i})\tilde{W}_{ci} + \nabla\varepsilon_{ci}(\dot{e}_{sj}^{i})\big\|^{2}\Big]$
$\le \sum_{i=1}^{n}\Big[-\alpha_i^{2}\,\sigma_{\min}(Q_i)\,\|\dot{e}_s^{i}\|^{2} - (1-\alpha_i^{2})\,\sigma_{\min}(Q_i)\,\|\dot{e}_s^{i}\|^{2} - \sum_{m=1}^{n} r_{im}^{2}\,\|\hat{u}_m(\dot{e}_{sj}^{m})\|^{2} + 2\,\|R_{im}\|\,\chi_i^{2}\,\|g_{ej}^{i}\|^{2} + \frac{1}{2}\,\|R_{im}^{-1}\|\,\|G_i\|^{2}\,\big(\delta_{ci\max}W_{ci\max} + \varepsilon_{ci\max}\big)^{2}\Big].$
Given that (35) holds, (45) satisfies $\dot{V}(t) \le \sum_{i=1}^{n}\big(\pi_{iJ} - \alpha_i^{2}\,\sigma_{\min}(Q_i)\,\|\dot{e}_s^{i}\|^{2}\big)$. Hence, $\dot{V}_i(t)$ is negative whenever $\dot{e}_s^{i}$ lies outside the set $\Omega = \big\{\dot{e}_s^{i} : \|\dot{e}_s^{i}\| \le \sqrt{\pi_{iJ}/\big(\alpha_i^{2}\,\sigma_{\min}(Q_i)\big)}\big\}$, which affirms the negativity of the proposed Lyapunov function derivative.
Case 2: 
When an event is triggered, i.e., $t = t_{j+1}$, the difference of (36) is rewritten as
$\Delta V_i(t) = V_i(\dot{e}_{s,j+1}^{i}) - V_i\big(\dot{e}_s^{i}(x_{j+1}^{i})\big) = \big[J_i^{*}(\dot{e}_{s,j+1}^{i}) - J_i^{*}\big(\dot{e}_s^{i}(x_{j+1}^{i})\big)\big] + \big[J_i^{*}(\dot{e}_{s,j+1}^{i}) - J_i^{*}(\dot{e}_{sj}^{i})\big].$
Based on (35), one has $\dot{V}(t) \le 0$. Therefore, $J_i^{*}(\dot{e}_{s,j+1}^{i}) \le J_i^{*}\big(\dot{e}_s^{i}(x_{j+1}^{i})\big)$.
Then, (46) takes the following form:
$\Delta V(t) \le J_i^{*}(\dot{e}_{s,j+1}^{i}) - J_i^{*}(\dot{e}_{sj}^{i}) \le -\nu\big(\|g_{ej}^{i}(t_j)\|\big),$
where $\nu(\cdot)$ denotes a class-$\mathcal{K}$ function and $g_{ej}^{i}(t_j) = \dot{e}_{s,j+1}^{i} - \dot{e}_{sj}^{i}$.
Taking into account Cases 1 and 2 collectively, it follows that under the condition specified by (35), the closed-loop MUS’s tracking error is UUB. Thus, the conclusion of the proof is established.

4.3. Exclusion of Zeno Behaviors

The minimum trigger interval $t_{\min} = \min\{t_{j+1} - t_j\}$ could be zero, which corresponds to Zeno behavior. Therefore, we give the following theorem to exclude this phenomenon:
Theorem 3.
Considering the MUS (1), the triggering condition (35), and the event-triggered approximate optimal control law (29), the minimum trigger interval $t_{\min}$ has a positive lower bound given by
$t_{\min} \ge \frac{1}{S_i^{Z}}\ln\big(1 + \pi_{j,\min}\big) > 0,$
where $\pi_{j,\min} = \min_j \dfrac{\big\|g_{ej}^{i}\big(\dot{e}_s^{i}(x_{j+1}^{i})\big)\big\|}{\big\|\dot{e}_{sj}^{i}\big\| + \Theta_i} > 0$, and $S_i^{Z}$, $\Theta_i$ are positive constants.
Proof. 
The time derivative of the event-triggered gap function (14) can be derived as follows:
$\frac{d g_{ej}^{i}}{dt} = \dot{g}_{ej}^{i} = \ddot{e}_s^{i}(x_i) - \ddot{e}_{sj}^{i}(x_j^{i}) = \ddot{e}_s^{i}(x_i).$
The upper bound of $\ddot{e}_s^{i}(x_i)$ is derived as
$\big\|\ddot{e}_s^{i}(x_i)\big\| \le S_i^{Z}\big\|\dot{e}_s^{i}(x_i)\big\| + S_i^{Z}\Theta_i.$
Combining (14) and (49) with (50), it can be obtained that
$\big\|g_{ej}^{i}(t)\big\| \le \int_{t_j}^{t} e^{S_i^{Z}(t-w)} S_i^{Z}\big(\|\dot{e}_{sj}^{i}\| + \Theta_i\big)\,dw \le \big(\|\dot{e}_{sj}^{i}\| + \Theta_i\big)\big(e^{S_i^{Z}(t-t_j)} - 1\big).$
When $t = t_{j+1}$, the event-triggered condition satisfies
$\big\|g_{ej}^{i}\big(\dot{e}_{sj}^{i}(x_{j+1}^{i})\big)\big\| = \big\|g_{ej}^{i}(x_{j+1}^{i})\big\|.$
Based on (51) and (52), the $j$th triggering interval $\Delta t_j$ has the lower bound
$\Delta t_j = t_{j+1} - t_j \ge \frac{1}{S_i^{Z}}\ln\Bigg(1 + \frac{\big\|g_{ej}^{i}\big(\dot{e}_{sj}^{i}(x_{j+1}^{i})\big)\big\|}{\big\|\dot{e}_{sj}^{i}\big\| + \Theta_i}\Bigg).$
This concludes the proof. □
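As a quick numeric illustration of the bound (48), plugging hypothetical values of $S_i^{Z}$ and $\pi_{j,\min}$ into the formula confirms a strictly positive minimum inter-event time; both constants below are assumptions, not values from the paper.

```python
import numpy as np

# Numeric check of the Zeno-exclusion bound: t_min >= ln(1 + pi_j_min) / S_iZ > 0.
S_iZ, pi_j_min = 50.0, 0.2            # hypothetical positive constants
t_min = np.log(1.0 + pi_j_min) / S_iZ
print(t_min)                           # ≈ 3.6e-3 s, strictly greater than zero
```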

5. Experiment

5.1. Experimental Setup

The effectiveness of the proposed control method is validated through experiments on a 2-degree-of-freedom (DOF) MUS platform. Detailed information about the experimental setup can be found in Figure 1. Joint control torque is measured using a joint torque sensor, while joint position information is acquired from both absolute and incremental encoders. The data acquisition board acts as the intermediary between the software environment (Simulink of Matlab 2016a) and the hardware components. It is noted that the proposed ETRL via NZSG, which is formulated in continuous time, must be realized discretely when implemented in experiments. Fortunately, the control system, which is constructed in the Simulink environment, completes the discrete realization automatically and adjusts the sampling period adaptively. The model parameters are as follows: $I_{im}$ = 120 g·cm², $\gamma_i$ = 100, $\hat{f}_{ib}$ = 12 mN·m/rad, $\hat{f}_{ic}$ = 30 mN·m, $\hat{f}_{is}$ = 40 mN·m, $\hat{f}_{i\tau}$ = 20 s²/rad². We consider the coordinate control task illustrated in Figure 1. The purpose of the experiment is to satisfy the requirements of position tracking performance and control torque optimization under a coordinate operation with the MUS. The critic NN is an RBFNN, and the activation function in (23) is $\delta_{ci} = e^{-(\dot{e}_{sj}^{i}-c_i)^{T}(\dot{e}_{sj}^{i}-c_i)/\zeta_i}$ with initial weight vector $W_{ci} = [0.3, 0.3, 0.3, 0.3, 0.3]^{T}$, where $c_i$ and $\zeta_i$ denote the center and width of the activation function. The goal of the control method is to decrease the position error and control torque as much as possible. The experimental results show that the proposed method reduces the position error by 30% and the control torque by 10% compared with the existing control methods.

5.2. Experimental Results

The experimental outcomes are used to evaluate the system's position tracking accuracy, tracking error magnitude, applied control torque, contact forces, event-triggering efficiency, and neural network (NN) weight behavior. Two distinct control methodologies are implemented for the coordinate control task: the existing learning-based tracking approach, as seen in references [37,39], and the proposed control strategy. In each joint-space figure, the upper subgraph corresponds to Joint 1 and the lower one to Joint 2.
(1) Position tracking
Figure 2, Figure 3 and Figure 4 depict the position tracking and tracking error curves in joint space during coordinate control using both the existing learning-based tracking control method and the proposed approximate optimal control method. The graphical analysis indicates that the position tracking error is notably lower and smoother with the newly proposed control method as opposed to the previously established method. The proposed method reduces the position error by 30%. This is attributed to the accurate solution of the coordinate control problem achieved by the proposed method. At the corners of the trajectory, the tracking error tends to increase but is effectively mitigated back to an acceptable range along the smooth path by the proposed approximate optimal control method. Figure 5 shows the 3D tracking curves.
(2) Control torque
Figure 6 displays the control torque curves during coordinate control using both the existing learning-based tracking control method and the proposed approximate optimal control method. The illustrations indicate that the control torque experiences a sharp increase during sudden trajectory changes, potentially impacting the lifespan of the DC motors. The proposed method reduces the control torque by 10% compared with the existing control method. Furthermore, the control torque curves under the existing control method display pronounced chattering, potentially degrading the accuracy of trajectory tracking. By employing the developed approximate optimal control method, the output torques are optimized to minimize motor power consumption, and instantaneous increases in control torque are kept within safe boundaries.
(3) Contact force
Figure 7 depicts the contact force curves during coordinate control using the proposed approximate optimal control method. Since the MUS has 2 DOF and the joint axes are assembled in parallel, the contact force curves lie in a two-dimensional space. From the figures, it can be observed that the proposed approximate optimal control method keeps the contact force below 2 N, with minimal chattering.
(4) Event-triggered mechanism
Figure 8 and Figure 9 depict the trigger threshold and trigger condition curves. Owing to the incorporation of the NN within the reinforcement learning process, both the trigger condition and the trigger threshold exhibit large values. The proposed method’s trigger time is nearly half that of the existing method. However, the trigger condition stays within the threshold limits, confirming the reliability of the newly introduced strategy. Figure 9 demonstrates that the developed controller substantially reduces the communication burden of the MUS.
(5) NN weight
Figure 10 illustrates the behavior of the critic NN via RBFNN under coordinate control facilitated by the proposed approximate optimal control method. The converged weights obtained from the proposed approximate optimal control policies allow the NN to accurately reflect the ongoing coordinate operations in real-time.
Based on the experimental results (cf. Table 1), the closed-loop MUS achieves better performance than the existing methods in terms of position tracking, control torque, contact force, and event-triggered behavior under the proposed ETRL via NZSG approach.
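Table 1 reports mean absolute values of the logged position error and control torque. The short sketch below shows this evaluation metric as it might be computed from logged data; the sample values are hypothetical and are not the experimental logs.

```python
import numpy as np

# Sketch of the evaluation metric used in Table 1: the mean absolute value of a
# logged signal (position error in rad or control torque in N·m).
def mean_abs(signal):
    return np.mean(np.abs(np.asarray(signal)))

position_error_log = [1.2e-3, -0.8e-3, 1.5e-3]   # hypothetical samples [rad]
print(mean_abs(position_error_log))
```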

6. Conclusions

An ETRL-based coordinate control of the MUS via NZSG is proposed in this paper. JTF technology is utilized to form the MUS's dynamic model. The coordinate control problem is transformed into an RL issue via the NZSG strategy. Conventional periodic communication is avoided by the ET mechanism. The performance index function is approximated by the critic NN to obtain the optimal control strategy. According to the Lyapunov theorem, the closed-loop system is guaranteed to be stable. The experimental results show that the proposed method reduces the position error by 30% and the control torque by 10% compared with the existing control methods. The presented control algorithm only concerns the static event-trigger; the computation burden and power consumption could be further reduced by a dynamic event-trigger or self-triggered mechanism, which is a direction of our future research.

Author Contributions

Conceptualization, Y.L.; methodology, T.A.; software, J.C.; validation, L.Z. and Y.Q. All authors have read and agreed to the published version of the manuscript.

Funding

The work is supported by the National Natural Science Foundation of China (62473063), the Scientific Technological Development Plan Project in Jilin Province of China (20220201038GX), Key Laboratory of Advanced Structural Materials (Changchun University of Technology), Ministry of Education, China (ASM-202202).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

Authors Y.L., J.C., L.Z. and Y.Q. were employed by the company Aerospace Times Feihong Technology Company Limited. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Hirano, D.; Inazawa, M.; Sutoh, M.; Sawada, H.; Kawai, Y.; Nagata, M.; Sakoda, G.; Yoneda, Y.; Watanabe, K. Transformable Nano Rover for Space Exploration. IEEE Robot. Autom. Lett. 2024, 9, 3139–3146.
2. Kedia, R.; Goel, S.; Balakrishnan, M.; Paul, K.; Sen, R. Design Space Exploration of FPGA-Based System With Multiple DNN Accelerators. IEEE Embed. Syst. Lett. 2021, 13, 114–117.
3. Goyal, M.; Dewaskar, M.; Duggirala, P.S. NExG: Provable and Guided State-Space Exploration of Neural Network Control Systems Using Sensitivity Approximation. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2022, 41, 4265–4276.
4. Nguyen, T.M.; Ajib, W.; Assi, C. A Novel Cooperative NOMA for Designing UAV-Assisted Wireless Backhaul Networks. IEEE J. Sel. Areas Commun. 2018, 36, 2497–2507.
5. Cheng, X.; Jiang, R.; Sang, H.; Li, G.; He, B. Joint Optimization of Multi-UAV Deployment and User Association Via Deep Reinforcement Learning for Long-Term Communication Coverage. IEEE Trans. Instrum. Meas. 2024, 73, 5503613.
6. Xue, S.; Luo, B.; Liu, D.; Yang, Y. Constrained Event-Triggered H∞ Control Based on Adaptive Dynamic Programming With Concurrent Learning. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 357–369.
7. Yang, X.; Xu, M.; Wei, Q. Adaptive Dynamic Programming for Nonlinear-Constrained H∞ Control. IEEE Trans. Syst. Man Cybern. Syst. 2023, 53, 4393–4403.
8. Renga, D.; Spoturno, F.; Meo, M. Reinforcement Learning for Charging Scheduling in a Renewable Powered Battery Swapping Station. IEEE Trans. Veh. Technol. 2024, 73, 14382–14398.
9. Lv, Y.; Wu, Z.; Zhao, X. Data-Based Optimal Microgrid Management for Energy Trading With Integral Q-Learning Scheme. IEEE Internet Things J. 2023, 10, 16183–16193.
10. Sun, J.; Zhang, H.; Yan, Y.; Xu, S.; Fan, X. Optimal Regulation Strategy for Nonzero-Sum Games of the Immune System Using Adaptive Dynamic Programming. IEEE Trans. Cybern. 2023, 53, 1475–1484.
11. Sun, J.; Dai, J.; Zhang, H.; Yu, S.; Xu, S.; Wang, J. Neural-Network-Based Immune Optimization Regulation Using Adaptive Dynamic Programming. IEEE Trans. Cybern. 2023, 53, 1944–1953.
12. Wang, D.; Gao, N.; Liu, D.; Li, J.; Lewis, F.L. Recent Progress in Reinforcement Learning and Adaptive Dynamic Programming for Advanced Control Applications. IEEE/CAA J. Autom. Sin. 2024, 11, 18–36.
13. Lv, Y.; Chang, H.; Zhao, J. Online Adaptive Integral Reinforcement Learning for Nonlinear Multi-Input System. IEEE Trans. Circuits Syst. II Express Briefs 2023, 70, 4176–4180.
14. Na, J.; Lv, Y.; Zhang, K.; Zhao, J. Adaptive Identifier-Critic-Based Optimal Tracking Control for Nonlinear Systems With Experimental Validation. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 459–472.
15. Jin, P.; Ma, Q.; Lewis, F.L.; Xu, S. Robust Optimal Output Regulation for Nonlinear Systems With Unknown Parameters. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 4908–4917.
16. Jin, P.; Ma, Q.; Gu, J. Fixed-Time Practical Anti-Saturation Attitude Tracking Control of QUAV with Prescribed Performance: Theory and Experiments. IEEE Trans. Aerosp. Electron. Syst. 2024, 60, 6050–6060.
17. An, T.; Wang, Y.; Liu, G.; Li, Y.; Dong, B. Cooperative Game-Based Approximate Optimal Control of Modular Robot Manipulators for Human–Robot Collaboration. IEEE Trans. Cybern. 2023, 53, 4691–4703.
18. Sahabandu, D.; Moothedath, S.; Allen, J.; Bushnell, L.; Lee, W.; Poovendran, R. RL-ARNE: A Reinforcement Learning Algorithm for Computing Average Reward Nash Equilibrium of Nonzero-Sum Stochastic Games. IEEE Trans. Autom. Control 2024, 69, 7824–7831.
19. Zhao, B.; Shi, G.; Liu, D. Event-Triggered Local Control for Nonlinear Interconnected Systems Through Particle Swarm Optimization-Based Adaptive Dynamic Programming. IEEE Trans. Syst. Man Cybern. Syst. 2023, 53, 7342–7353.
20. Zhang, Y.; Zhao, B.; Liu, D.; Zhang, S. Distributed Fault Tolerant Consensus Control of Nonlinear Multiagent Systems via Adaptive Dynamic Programming. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 9041–9053.
21. Zhang, Y.; Zhao, B.; Liu, D.; Zhang, S. Event-Triggered Control of Discrete-Time Zero-Sum Games via Deterministic Policy Gradient Adaptive Dynamic Programming. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 4823–4835.
22. Ye, J.; Dong, H.; Bian, Y.; Qin, H.; Zhao, X. ADP-Based Optimal Control for Discrete-Time Systems With Safe Constraints and Disturbances. IEEE Trans. Autom. Sci. Eng. 2024; early access.
23. Song, S.; Gong, D.; Zhu, M.; Zhao, Y.; Huang, C. Data-Driven Optimal Tracking Control for Discrete-Time Nonlinear Systems With Unknown Dynamics Using Deterministic ADP. IEEE Trans. Neural Netw. Learn. Syst. 2023; early access.
24. Mu, C.; Wang, K.; Xu, X.; Sun, C. Safe Adaptive Dynamic Programming for Multiplayer Systems With Static and Moving No-Entry Regions. IEEE Trans. Artif. Intell. 2024, 5, 2079–2092.
25. Xiao, G.; Zhang, H. Convergence Analysis of Value Iteration Adaptive Dynamic Programming for Continuous-Time Nonlinear Systems. IEEE Trans. Cybern. 2024, 54, 1639–1649.
26. Davari, M.; Gao, W.; Aghazadeh, A.; Blaabjerg, F.; Lewis, F.L. An Optimal Synchronization Control Method of PLL Utilizing Adaptive Dynamic Programming to Synchronize Inverter-Based Resources With Unbalanced, Low-Inertia, and Very Weak Grids. IEEE Trans. Autom. Sci. Eng. 2024; early access.
27. Wei, Q.; Li, T. Constrained-Cost Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 3251–3264.
28. Lin, M.; Zhao, B. Policy Optimization Adaptive Dynamic Programming for Optimal Control of Input-Affine Discrete-Time Nonlinear Systems. IEEE Trans. Syst. Man Cybern. Syst. 2023, 53, 4339–4350.
29. Mu, C.; Wang, K.; Ni, Z. Adaptive Learning and Sampled-Control for Nonlinear Game Systems Using Dynamic Event-Triggering Strategy. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 4437–4450.
30. Vamvoudakis, K.; Kokolakis, N. Synchronous Reinforcement Learning-Based Control for Cognitive Autonomy. Found. Trends Syst. Control 2020, 8, 1–175.
31. Liu, D.; Xue, S.; Zhao, B.; Luo, B.; Wei, Q. Adaptive Dynamic Programming for Control: A Survey and Recent Advances. IEEE Trans. Syst. Man Cybern. Syst. 2021, 51, 142–160.
32. Dong, B.; Zhu, X.; An, T.; Jiang, H.; Ma, B. Barrier-Critic-Disturbance Approximate Optimal Control of Nonzero-Sum Differential Games for Modular Robot Manipulators. Neural Netw. 2025, 181, 106880.
33. Liu, Y.; Cui, D.; Peng, W. Optimum Control for Path Tracking Problem of Vehicle Handling Inverse Dynamics. Sensors 2023, 23, 6673.
34. Liu, Y.; Cui, D. Optimal Control of Vehicle Path Tracking Problem. World Electr. Veh. J. 2024, 15, 429.
35. Wu, P.; Wang, H.; Liang, G.; Zhang, P. Research on Unmanned Aerial Vehicle Cluster Collaborative Countermeasures Based on Dynamic Non-Zero-Sum Game under Asymmetric and Uncertain Information. Aerospace 2023, 10, 711.
36. Zheng, Z.; Zhang, P.; Yuan, J. Nonzero-Sum Pursuit-Evasion Game Control for Spacecraft Systems: A Q-Learning Method. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 3971–3981.
37. An, T.; Dong, B.; Yan, H.; Liu, L.; Ma, B. Dynamic Event-Triggered Strategy-Based Optimal Control of Modular Robot Manipulator: A Multiplayer Nonzero-Sum Game Perspective. IEEE Trans. Cybern. 2024, 54, 7514–7526.
38. Dong, B.; Gao, Y.; An, T.; Jiang, H.; Ma, B. Nonzero-Sum Game-Based Decentralized Approximate Optimal Control of Modular Robot Manipulators with Coordinate Operation Tasks Using Value Iteration. Meas. Sci. Technol. 2024.
39. Liu, F.; Xiao, W.; Chen, S.; Jiang, C. Adaptive Dynamic Programming-Based Multi-Sensor Scheduling for Collaborative Target Tracking in Energy Harvesting Wireless Sensor Networks. Sensors 2018, 18, 4090.
Figure 1. Experimental platform.
Figure 2. Position tracking curves in joint space via the existing learning-based tracking control method, where the upper (a) and lower (b) subgraphs correspond to Joint 1 and Joint 2, respectively.
Figure 3. Position tracking curves in joint space via the proposed approximate optimal control method, where the upper (a) and lower (b) subgraphs correspond to Joint 1 and Joint 2, respectively.
Figure 4. Position tracking error curves in joint space, where the upper (a) and lower (b) subgraphs correspond to Joint 1 and Joint 2, respectively.
Figure 5. Position tracking curves in 3D space.
Figure 6. Control torque curves, where the upper (a) and lower (b) subgraphs correspond to Joint 1 and Joint 2, respectively.
Figure 7. Contact force curves via the proposed approximate optimal control method, where the upper (a) and lower (b) subgraphs correspond to Joint 1 and Joint 2, respectively.
Figure 8. Trigger threshold and trigger condition curves via the proposed approximate optimal control method, where the upper (a) and lower (b) subgraphs correspond to Joint 1 and Joint 2, respectively.
Figure 9. Time-triggered and event-triggered time curves via the proposed approximate optimal control method.
Figure 10. NN weight curves via the proposed approximate optimal control method, where the upper (a) and lower (b) subgraphs correspond to Joint 1 and Joint 2, respectively.
Table 1. Performance comparisons.

Method | Mean Absolute Value of Position Error | Mean Absolute Value of Control Torque
The existing method (Joint 1) | 1.73 × 10⁻³ rad | 0.32 Nm
The proposed method (Joint 1) | 1.03 × 10⁻³ rad | 0.29 Nm
The existing method (Joint 2) | 1.62 × 10⁻³ rad | 0.30 Nm
The proposed method (Joint 2) | 0.98 × 10⁻³ rad | 0.26 Nm

