Article

Optimal Sliding Mode Fault-Tolerant Control for Multiple Robotic Manipulators via Critic-Only Dynamic Programming

1 School of Mechanical Engineering, Guangdong Ocean University, Zhanjiang 524088, China
2 Guangdong Engineering Technology Research Center of Ocean Equipment and Manufacturing, Zhanjiang 524088, China
3 Shenzhen Institute of Guangdong Ocean University, Shenzhen 518120, China
* Author to whom correspondence should be addressed.
Sensors 2025, 25(17), 5410; https://doi.org/10.3390/s25175410
Submission received: 23 July 2025 / Revised: 18 August 2025 / Accepted: 26 August 2025 / Published: 2 September 2025
(This article belongs to the Section Sensors and Robotics)

Abstract

This paper proposes optimal sliding mode fault-tolerant control for multiple robotic manipulators in the presence of external disturbances and actuator faults. First, a quantitative prescribed performance control (QPPC) strategy is constructed, which relaxes the constraints on initial conditions while strictly restricting the trajectory within a preset range. Second, based on QPPC, adaptive gain integral terminal sliding mode control (AGITSMC) is designed to enhance the anti-interference capability of robotic manipulators in complex environments. Third, a critic-only neural network optimal dynamic programming (CNNODP) strategy is proposed to learn the optimal value function and control policy. This strategy fits nonlinearities solely through critic networks and uses residuals and historical samples from reinforcement learning to drive neural network updates, achieving optimal control with lower computational costs. Finally, the boundedness and stability of the system are proven via the Lyapunov stability theorem. Compared with existing sliding mode control methods, the proposed method reduces the maximum position error by up to 25% and the peak control torque by up to 16.5%, effectively improving the dynamic response accuracy and energy efficiency of the system.

1. Introduction

With continuing progress in intelligence, big data, cloud computing, and other technologies, robotic manipulators are evolving from single-function to multifunctional devices and from fixed scenes to flexible applications, and their use in intelligent manufacturing, medical care, logistics, and other fields will continue to expand [1,2,3]. Multiple robotic manipulator systems are therefore developing toward greater intelligence and complexity. Because of this complexity and the interdependence among manipulators, fault-tolerant control is particularly important and has been one of the research focuses of cooperative control of multiple robotic manipulators in recent years [4,5,6].
Multiple robotic manipulators are critical for industrial and hazardous operations (e.g., nuclear waste handling [7,8]), but their control performance is limited by actuator/sensor failures, time-varying joint friction, and persistent disturbances [9]. High radiation in such environments directly causes sensor/actuator malfunctions, degrading system reliability. Therefore, enhancing the control performance of multiple robotic manipulators has become an urgent requirement. Sliding mode control (SMC) [10], known for its fast response and strong anti-interference capabilities, has emerged as a critical candidate for robotic manipulator system control [11,12]. However, existing SMC strategies face three fundamental contradictions in practical applications. First, there is a contradiction between robustness and precision. The traditional SMC is sensitive to parameter variations and external disturbances. For example, Wu et al.’s performance-constrained SMC method [13], when applied to complex terrain transportation tasks, experiences significant trajectory deviations due to inertial parameter fluctuations caused by load changes and wind disturbances. Although the boundary layer strategy [14,15] mitigates chattering, it compromises sliding mode stability, failing to meet the millimeter-level precision requirements for medical rehabilitation robotic manipulator end-effectors. Second, there is a conflict between computational complexity and real-time performance. Advanced SMC methods such as higher-order terminal SMC [16] rely on complex derivative calculations, significantly increasing the computational burden of the controller. In collaborative welding scenarios requiring a millisecond-level response, computational delays can lead to welding defects such as uneven seams or porosity [16,17]. Similarly, real-time performance degradation in multiple robotic manipulator collaborative grasping tasks causes asynchronous movements among manipulators. Third, adaptive capability is limited when adapting to different scenarios. Existing adaptive SMC strategies perform well in specific scenarios but have deficiencies in robotic manipulator collaborative tasks: the terminal SMC in [18], which is effective for simple mobile robots, cannot adapt to dynamic parameter adjustments in industrial assembly; the nonsingular terminal strategy in [19] exhibits excessive complexity when handling strong coupling disturbances; and the stiffness-scheduling method in [20] lacks multiple robotic manipulator coordination mechanisms, hindering personalized treatment in rehabilitation scenarios. In summary, current SMC methods face significant limitations in robustness, real-time performance, and adaptive capability when applied to multiple robotic manipulators, as they fail to fully address the complex characteristics and domain-specific requirements of such systems. Therefore, developing a novel SMC structure with high robustness, low computational complexity, and strong scenario adaptability is crucial for advancing the stable application of multiple robotic manipulators in complex environments.
As the concept of sustainable development has taken deep root in people’s minds, the optimal control [21] of multiple robotic manipulators has become a key indicator for evaluating their performance. Reinforcement learning (RL), with its ability to obtain optimal strategies through continuous training, has garnered widespread attention in the field of multiple robotic manipulator control. To address the challenging problem of optimal control for multiple robotic manipulators under unknown nonlinear disturbances, references [22,23,24] pioneered the adoption of an actor–critic architecture based on neural networks. In this architecture, the critic is responsible for calculating the cost function to evaluate control performance, while the actor continuously optimizes its own strategy on the basis of feedback from the critic, thereby gradually approaching the optimal solution. When the dynamic characteristics of the system are unknown, the identifier becomes a crucial component for accurately estimating uncertainties. References [25,26] introduced neural networks and fuzzy logic systems, respectively, as identifiers, effectively enhancing the system’s adaptability to unknown environments. However, the model uncertainties inherent in multiple robotic manipulator systems, along with the nonlinear characteristics of reinforcement learning, pose significant challenges in terms of heavy computational burdens for traditional optimal control methods in practical applications. To overcome this dilemma, refs. [27,28,29,30] innovatively proposed a critic-only reinforcement learning framework. Reference [27] ingeniously transformed the safety coordination problem into an optimal control problem by simplifying the structure of the critic neural network and introducing obstacle avoidance variables to increase system safety. Ma et al. [28] combined adaptive dynamic programming (ADP) algorithms with event-triggered mechanisms, utilizing a critic-only neural network to efficiently solve event-triggered Hamilton–Jacobi–Bellman equations, achieving decentralized tracking control. Reference [29] further constructed a critic-only neural network on the basis of policy iteration and ADP algorithms, successfully deriving an approximate fault-tolerant position-force optimal control strategy. Reference [30] targeted robotic manipulator systems with asymmetric input constraints and disturbances, achieving optimal control in complex environments by introducing a value function and approximately solving Hamilton–Jacobi–Isaacs equations online on the basis of ADP principles. Inspired by these studies, this paper focuses on multiple robotic manipulator systems in complex environments and designs an RL optimal control strategy based on a critic-only neural network. This strategy reduces computational complexity by optimizing the structure of the critic network and proposes explicit solutions to address the coupling issues of model uncertainties and nonlinear disturbances, providing a more efficient solution for practical deployment.
Prescribed performance control (PPC) [13,31] serves as an efficient control strategy capable of ensuring that system states adhere strictly to predefined performance specifications. Its core principle lies in the meticulous design of performance functions, combined with error transformation mechanisms and controller design [32,33], enabling control systems to precisely meet established performance requirements. Owing to this significant advantage, PPC has been extensively researched and applied in the field of robotic manipulator control. In terms of PPC implementation, references [34,35,36] introduced performance functions to overcome performance limitations associated with tracking errors and transformed the error constraint problem of multiple robotic manipulators into an unconstrained stability control problem through error transformation strategies, providing new insights for system performance optimization. To address the state and output constraints of robotic manipulators, references [37,38] innovatively proposed barrier Lyapunov functions within the PPC framework, effectively ensuring that system states remain within predefined constraint boundaries. Furthermore, Liu et al. [39] developed a dynamic threshold finite-time prescribed performance control (DTFTPPC) method, which dynamically adjusts performance thresholds when the system reaches predefined time points, continuously compressing errors into smaller ranges and significantly enhancing control precision. However, existing research still presents limitations: the aforementioned multiple robotic manipulator systems require validation of initial position compliance with performance constraints before each operation, an additional verification step that increases operational complexity. Therefore, designing PPC control schemes that do not rely on initial position checks, while maintaining control performance and simplifying operational procedures, has become a critical challenge in the field of multiple robotic manipulator control, warranting in-depth and systematic investigations.
Inspired by the above discussion, an ADP strategy based on an approximate optimal solution is proposed for the problem of actuator faults in multiple robotic manipulators under external disturbances. The sliding mode variable is constructed by combining the QPPC, and the sliding mode variable is then added to the value function to obtain an approximate optimal solution for the multiple robotic manipulators. The ADP is constructed to improve the control performance of multiple robotic manipulators under external disturbances and actuator failures. The contributions of this paper are summarized as follows.
(1) To address the problem of complex operations caused by different initial states, a quantitative prescribed performance control (QPPC) strategy is introduced that relaxes the initial-state requirement and realizes flexible performance constraints and global prescribed performance.
(2) Adaptive gain-integrated terminal sliding mode control (AGITSMC) is proposed on the basis of a gain parameter that varies with the magnitude of the error, which improves the sensitivity of the sliding mode variable in complex environments. The proposed AGITSMC not only improves the convergence velocity and tracking accuracy but also enhances the stability under external disturbances and actuator faults.
(3) A critic-only neural network optimal dynamic programming (CNNODP) strategy is constructed by combining the gradient descent strategy, parallel learning technique, and experience replay technique. Unlike the traditional actor–critic or identifier–actor–critic NN structure, a critic-only NN is used to approximate the cost function, and the RL residuals and historical samples are employed to drive the update of the neural network, which achieves optimal control with fewer computations.
The remaining parts of this paper are structured as follows: Section 2 expounds on the dynamic model of multiple robotic manipulators and the theorems employed in this study; Section 3 provides a detailed derivation of the proposed control algorithms, namely, QPPC, AGITSMC, CNNODP, and fault compensation control, along with a stability analysis of the closed-loop system; Section 4 introduces the simulation setup and presents many simulation results to validate the effectiveness and robustness of the proposed algorithms; and finally, Section 5 summarizes the major findings of this paper and discusses potential future research directions.

2. Preliminaries

For ease of understanding, we use specific characters to represent the state variables of the robotic manipulator, with detailed explanations provided in Table 1 below:

2.1. System Model

The dynamical models of multiple robotic manipulator systems are represented in the Euler–Lagrange form; here, we consider an N-degree-of-freedom multiple robotic manipulator system described by the following nonlinear differential equation:
$M_i(q_i)\ddot{q}_i + C_i(q_i,\dot{q}_i)\dot{q}_i + G_i(q_i) = \Gamma_i + \tau_i$
Remark 1. 
To further deepen the in-depth understanding of multiple robotic manipulator system models, a detailed and systematic explanation of the kinematic and dynamic models of multiple robotic manipulator systems is presented below.
(1) Variables and models: The kinematic model links the joint space and operational space of the manipulator. For an N-degree-of-freedom manipulator, the relationship between the joint position vector $q_{i,s}$ and the end-effector pose is nonlinear because of the complex link–joint geometry. Forward kinematics calculates the end-effector pose from given joint positions via coordinate transformations, whereas inverse kinematics finds joint positions for a desired end-effector pose, which often has multiple or no solutions and requires specific algorithms. The dynamic model is key to understanding the forces on the manipulator. The vectors $q_{i,s}$, $\dot{q}_{i,s}$, and $\ddot{q}_{i,s}$ represent the position, velocity, and acceleration, respectively. The inertia matrix $M(q_i)\in\mathbb{R}^{n\times n}$ connects acceleration to inertial forces and changes with joint position. The centripetal–Coriolis matrix $C(q_i,\dot{q}_i)\in\mathbb{R}^{n\times n}$ accounts for velocity-related dynamic effects. The gravity vector $G(q_i)\in\mathbb{R}^{n}$ shows that gravitational forces vary with pose. The input torque $\tau_i\in\mathbb{R}^{n}$ is the control signal, and the bounded disturbance vector $\Gamma_i\in\mathbb{R}^{n}$ represents the uncertainties. Newton’s second law shows that the left-hand side describes internal dynamics and that the right-hand side represents external forces.
(2) Assumptions and physical constraints: Links are assumed to be rigid to ease force calculations. Joints are ideal with no flexibility or clearance for direct motion transfer. Motion is assumed to be continuous without sudden impacts when smooth models are used. Dynamic parameters such as mass and inertia are considered known and constant, ignoring real-world variations. Nonlinear factors such as friction are often ignored initially, and the system is assumed to be linear or linearizable for applying linear control theory. The joint motion range is limited by mechanical design to prevent damage. Velocity is constrained by motor and structural performance, as high speed can cause vibration and stress. The input torque is limited by the motor capacity to avoid overloading. Acceleration is restricted by system inertia and motor capabilities. Manipulators must avoid collisions to prevent damage and follow physical laws such as energy conservation.
(3) Reasons for ignoring parameter uncertainty: Multiple robotic manipulator models are complex, and adding parameter uncertainty makes them even more complex, increasing the difficulty of analysis and controller design. In ideal scenarios such as well-controlled labs, accurate parameter knowledge is reasonable for focusing on basic system aspects. A phased research approach is common: first, a basic controller is designed without considering uncertainty for fundamental functions; then, robustness against uncertainty is enhanced later for an efficient research process.
Combined with the dynamic Equation (1) of multiple robotic manipulators, one obtains
$\ddot{q}_i = M_i^{-1}(q_i)\left[\tau_i + \Gamma_i - C_i(q_i,\dot{q}_i)\dot{q}_i - G_i(q_i)\right]$
For simplicity, we let $H_i = M_i^{-1}(q_i)$ and $D_i = \Gamma_i - C_i(q_i,\dot{q}_i)\dot{q}_i - G_i(q_i)$. The expression for $\ddot{q}_i$ can be rewritten as $\ddot{q}_i = H_i D_i + H_i\tau_i$. Let $H_{ti}$ denote the diagonal part of the matrix $H_i$; we obtain $H_i = H_{ti} + H_{Ti}$, so that $\ddot{q}_i = H_{ti}\tau_i + H_{Ti}\tau_i + H_i D_i$. Thus, $\ddot{q}_i$ can again be rewritten as follows:
$\ddot{q}_i = H_{ti}\tau_i + \Delta_i$
where $\Delta_i = H_{Ti}\tau_i + H_i D_i$.
The following actuator fault model is constructed:
$\tau_{i,s} = \Phi_{i,s}^{T}\tau_{hi,s} + \delta_{i,s}$
where $\tau_{hi,s}$ and $\tau_{i,s}$ denote the input and output of the fault model, respectively; $\Phi_{i,s}$ and $\delta_{i,s}$ denote the bias fault and additive fault of the actuator, respectively, with $\Phi_{i,s}\in(0,1]$. Here, $*_i = [*_{i,1}, *_{i,2}, \ldots, *_{i,N}]^{T}$, where $*_i$ denotes the state vector of the $i$th robotic manipulator, and $s = 1, 2, \ldots, N$ denotes the degree of freedom of the $i$th robotic manipulator, which has $N$ dimensions.
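To make the fault model concrete, the short Python sketch below applies a per-joint loss-of-effectiveness factor and an additive fault to a commanded torque vector; the function name, numerical values, and vector shapes are illustrative assumptions rather than part of the paper.
```python
import numpy as np

def apply_actuator_fault(tau_cmd, phi, delta):
    """Illustrative actuator fault model (cf. the bias/additive fault above):
    each joint's commanded torque tau_cmd is scaled by an effectiveness
    factor phi in (0, 1] and corrupted by an additive fault delta."""
    phi = np.asarray(phi, dtype=float)
    assert np.all((phi > 0.0) & (phi <= 1.0)), "effectiveness must lie in (0, 1]"
    return phi * np.asarray(tau_cmd, dtype=float) + np.asarray(delta, dtype=float)

# Example: a 2-DOF arm losing 30% effectiveness on joint 1 with a 0.5 N*m bias
tau_applied = apply_actuator_fault(tau_cmd=[5.0, -2.0], phi=[0.7, 1.0], delta=[0.5, 0.0])
print(tau_applied)  # [ 4. -2.]
```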

2.2. Graph Description

Define the follower set as $f = \{1, \ldots, N\}$. The digraph $\chi = \{K, E, A\}$ is used to represent the relation under which information about the state is exchanged among robotic manipulators, where $A = [a_{ij}]_{n\times n}$ represents the communication among followers. $a_{ij} = 1$ indicates that the $i$th robotic manipulator can obtain information from the $j$th robotic manipulator; otherwise, $a_{ij} = 0$. The set of followers is defined as $K = \{k_1, k_2, \ldots, k_n\}$. Define $d_i = \sum_{j=1}^{n} a_{ij}$ and $D = \mathrm{diag}\{d_i\}$, $i = 1, \ldots, n$. The Laplacian matrix is defined as $L = D - A$. The adjacency matrix of the leader is defined as $B = \mathrm{diag}\{b_i\}$, which represents the communication between the leader and the followers. $b_i = 1$ indicates that the $i$th robotic manipulator can obtain information from the leader; otherwise, $b_i = 0$. For later analysis, the information exchange matrix is defined as $H = L + B$.
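As an illustration of how the information exchange matrix is assembled, the following sketch builds $L = D - A$ and $H = L + B$ for a small follower graph; the adjacency values and helper name are hypothetical.
```python
import numpy as np

def information_exchange_matrix(A, b):
    """Build L = D - A and H = L + B for a follower communication graph.
    A is the (n x n) adjacency matrix among followers (a_ij = 1 if follower i
    receives information from follower j), and b is the leader-access vector
    (b_i = 1 if follower i receives the leader's information)."""
    A = np.asarray(A, dtype=float)
    D = np.diag(A.sum(axis=1))          # in-degree matrix D = diag(d_i)
    L = D - A                           # Laplacian of the follower digraph
    B = np.diag(np.asarray(b, dtype=float))  # leader adjacency matrix
    return L + B                        # H = L + B used in the later analysis

# Toy example with 3 followers in a chain; only follower 1 sees the leader
H = information_exchange_matrix(A=[[0, 0, 0], [1, 0, 0], [0, 1, 0]], b=[1, 0, 0])
print(H)
```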

2.3. Radial Basis Function Neural Network (RBFNN) Research

In current research on control systems, uncertain nonlinear terms in nonlinear systems can be approximated via the RBFNN.
In general, the RBFNN is utilized to approximate any unknown function F ( ξ ) . The RBFNN is denoted as follows:
$F(\xi) = W^{*T} S(\xi) + \chi(\xi)$
where $\xi = [\xi_1, \xi_2, \ldots, \xi_n]^{T}\in\mathbb{R}^{n}$ and $W = [W_1, W_2, \ldots, W_n]^{T}\in\mathbb{R}^{n}$ indicate the input vector and the weight vector, respectively. $S(\xi) = [S_1(\xi), S_2(\xi), \ldots, S_n(\xi)]^{T}\in\mathbb{R}^{n}$ is the known basis function vector. $F(\xi)$ is a continuous function defined on a compact set $\Lambda$. Hence, for $\bar{\chi} > 0$, a neural network exists such that $|F(\xi) - W^{T}S(\xi)| \le \bar{\chi}$; therefore, $|\chi(\xi)| \le \bar{\chi}$. The optimal constant vector $W^{*}$ is designed as $W^{*} = \arg\min_{W}\left\{\sup_{\xi\in\Lambda}\left|F(\xi) - W^{T}S(\xi)\right|\right\}$.
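A minimal sketch of the RBFNN approximation $W^{T}S(\xi)$ is given below; the Gaussian basis, node centers, width, and weight values are illustrative design choices and not those used in the paper.
```python
import numpy as np

def rbf_basis(xi, centers, width):
    """Gaussian radial basis vector S(xi); centers and width are design choices."""
    xi = np.atleast_1d(xi)
    return np.exp(-np.sum((xi - centers) ** 2, axis=1) / (2.0 * width ** 2))

def rbfnn_output(xi, W, centers, width):
    """Approximation W^T S(xi) of an unknown function F(xi)."""
    return float(W @ rbf_basis(xi, centers, width))

# Toy example: 5 one-dimensional Gaussian nodes approximating F after the
# weights W have been adapted (here W is just a placeholder vector).
centers = np.linspace(-2.0, 2.0, 5).reshape(-1, 1)
W = np.array([0.1, 0.4, 1.0, 0.4, 0.1])
print(rbfnn_output(np.array([0.3]), W, centers, width=1.0))
```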

2.4. Lemmas and Assumptions

Lemma 1 
([40]). For any constants Θ i , 1 > 0 and Θ i , 2 > 0 , the relationships below can be obtained:
$0 < \Theta_{i,2} I_{i,n} < M_i < \Theta_{i,1} I_{i,n}$
Lemma 2 
([41]). For any continuous functions x ( t ) and x 0 ( t ) on [ 0 , ) that satisfy lim t x 0 ( t ) = 0 and x 0 ( t ) > 0 , the relationships below can be obtained:
$|x(t)| < x_0(t) + \dfrac{x^{2}(t)}{\sqrt{x^{2}(t) + x_0^{2}(t)}}$
Assumption 1. 
The bias fault $\Phi_{i,s}$ and additive fault $\delta_{i,s}$ of the actuator are bounded, i.e., there exist constants $r_1$ and $r_2$ such that $\|\Phi_{i,s}\| \le r_1$ and $\|\delta_{i,s}\| \le r_2$.
Assumption 2 
([27]). The activation function $\varphi_{i,s}$ and its gradient $\nabla\varphi_{i,s}$ are norm-bounded, i.e., $\|\varphi_{i,s}\| \le \bar{\varphi}_{i,s}$ and $\|\nabla\varphi_{i,s}\| \le \bar{\varphi}'_{i,s}$, where $\bar{\varphi}_{i,s}$ and $\bar{\varphi}'_{i,s}$ are unknown positive parameters.

3. Main Results

The main objective of this work is to satisfy the control performance requirements of multiple robotic manipulators with actuator faults and external disturbances. The overall control scheme of this paper is shown in Figure 1. First, a quantitative prescribed performance control (QPPC) strategy is designed by combining the signals generated by the observer. Then, the QPPC is utilized to construct the adaptive gain-integrated terminal sliding mode control (AGITSMC). Third, the sliding mode variables are introduced into the value function to construct a critic-only neural network optimal dynamic programming (CNNODP) control strategy for approximating the optimal solution. Finally, adaptive fault compensation control is proposed to reconstruct controllers with fault problems.

3.1. Quantitative Prescribed Performance Control (QPPC)

In general, the robotic manipulator connected to the leader knows its information, whereas other robotic manipulators do not have direct access to the leader’s information. However, once the robotic manipulator connected to the leader is faulty, the other robotic manipulators are inevitably greatly affected. To ensure that all the robotic manipulators can directly access the leader information without being affected by other robotic manipulators, the dynamic distribution observer is designed as follows:
$\dot{\hat{q}}_{i,s} = v_{i,s} - g_1\sum_{j\in N} a_{i,j}(\hat{q}_{i,s} - \hat{q}_{j,s}) - g_1 b_i(\hat{q}_{i,s} - q_{L,s})$
$\dot{v}_{i,s} = h_{i,s} - g_2\sum_{j\in N} a_{i,j}(\hat{q}_{i,s} - \hat{q}_{j,s}) - g_2 b_i(\hat{q}_{i,s} - q_{L,s}) - g_3\sum_{j\in N} a_{i,j}(v_{i,s} - v_{j,s}) - g_3 b_i(v_{i,s} - v_{L,s})$
$\dot{h}_{i,s} = -\dfrac{g_4}{b_i + \sum_{j\in N} a_{i,j}}\left(\sum_{j\in N} a_{i,j}(h_{i,s} - h_{j,s}) + b_i(h_{i,s} - h_{0,s})\right) + \dfrac{1}{b_i + \sum_{j\in N} a_{i,j}}\left(\sum_{j\in N} a_{i,j}\dot{h}_{j,s} + b_i\dot{h}_{0,s}\right)$
where $q_{L,s}$, $v_{0,s}$, and $h_{0,s}$ denote the leader's trajectory, velocity, and acceleration, respectively; $\hat{q}_{i,s}$, $v_{i,s}$, and $h_{i,s}$ denote the follower's trajectory, velocity, and acceleration estimates, respectively; and $g_c$ ($c = 1, 2, 3, 4$) are positive parameters. The observation errors of the observers are defined as $E_{pi,s} = \hat{q}_{i,s} - q_{L,s}$, $E_{vi,s} = v_{i,s} - v_{0,s}$, and $E_{hi,s} = h_{i,s} - h_{0,s}$.
Lemma 3 
([27]). For q ^ i , s ( 0 ) , v i , s ( 0 ) and h i , s ( 0 ) with any initial state satisfying g 1 H    I N g 2 H    g 3 H > 0 and g 4 > 0 , the observer’s state converges exponentially to the leader’s trajectory, i.e., q ^ i , s q L , s , v i , s v 0 , s and h i , s h 0 , s as t .
The angle error is given by the following:
$E_{i,s} = q_{i,s} - \hat{q}_{i,s}$
The velocity error is given by the following:
$e_{i,s} = \dot{q}_{i,s} - v_{i,s}$
Here, by using the inverse tangent function, the error is quantified as follows:
$E_{ti,s} = \dfrac{2}{\pi}\arctan(E_{i,s})$
On the basis of the inverse tangent function, $-1 < \frac{2}{\pi}\arctan(E_{i,s}) < 1$.
To ensure the prescribed performance of the consensus error with multiple robotic manipulator positions, the prescribed performance function is designed as follows:
$\beta_{i,s}(t) = \begin{cases} (\beta_{oi,s} - \beta_{\infty i,s})\left(\dfrac{T_a - t}{T_a}\right)^{2} + \beta_{\infty i,s}, & t\in[0, T_a) \\ \beta_{\infty i,s}, & t\in[T_a, \infty) \end{cases}$
where $\beta_{oi,s} > 0$ and $\beta_{\infty i,s} > 0$ denote the initial value and the ultimate convergence domain, respectively, and $T_a$ is a predefined convergence time.
Differentiating $\beta_{i,s}(t)$, one can obtain the following:
$\dot{\beta}_{i,s}(t) = \begin{cases} -\dfrac{2}{T_a}(\beta_{oi,s} - \beta_{\infty i,s})\left(\dfrac{T_a - t}{T_a}\right), & t\in[0, T_a) \\ 0, & t\in[T_a, \infty) \end{cases}$
To ensure PPC, by combining the prescribed performance Function (14) and quantified Function (13), the error transformation function is defined as follows:
$Z_{i,s} = \tan\left(\dfrac{\pi E_{ti,s}}{2\beta_{i,s}}\right) = \tan\left(\dfrac{\arctan(E_{i,s})}{\beta_{i,s}}\right)$
Remark 2. 
The error transformation function $\tan(\frac{\pi}{2}x)$ ensures that $-1 < x < 1$ for $-1 < x(0) < 1$. $\beta_{i,s}(t)$ is a prescribed performance function for $\arctan(E_{i,s})$, and $-\frac{\pi}{2}\beta_{i,s}(0) < \arctan(E_{i,s}(0)) < \frac{\pi}{2}\beta_{i,s}(0)$. From the above, it follows that $\beta_{i,s}(t)$ is a monotonically decreasing function. $-\frac{\pi}{2}\beta_{i,s}(t) < \arctan(E_{i,s}(t)) < \frac{\pi}{2}\beta_{i,s}(t)$ always holds under the constraints of $\tan(\frac{\pi}{2}x)$, and $\arctan(E_{i,s})$ enters and always remains in the predefined domain $[-\frac{\pi}{2}\beta_{i,s}, \frac{\pi}{2}\beta_{i,s}]$. On the basis of the properties of the inverse tangent function, $\arctan(E_{i,s})$ always satisfies $-1 < \frac{2}{\pi}\arctan(E_{i,s}) < 1$. Thus, $E_{i,s}(t)$ is unconstrained when $\beta_{i,s}(t) \ge 1$, and $E_{i,s}(t)$ is constrained as $\beta_{i,s}(t)$ decreases and satisfies $\beta_{i,s}(t) < 1$.
From the above, the constraint states of E i , s ( t ) can be classified into two types.
  • When the prescribed performance function $\beta_{i,s}(t) \ge 1$, one can obtain the following:
    $-\infty < E_{i,s} < \infty$
  • When the prescribed performance function $\beta_{i,s}(t) < 1$, one can obtain the following:
    $-\tan\left(\frac{\pi}{2}\beta_{i,s}\right) < E_{i,s} < \tan\left(\frac{\pi}{2}\beta_{i,s}\right)$
Therefore, the PPC strategy designed in this paper relaxes the initial condition of the multiple robotic manipulators, so the initial state no longer needs to be known.
Differentiating $Z_{i,s}$, one can obtain the following:
$\dot{Z}_{i,s} = r_{i,s}\dot{E}_{i,s} - R_{i,s}$
where $r_{i,s} = \sec^{2}\left(\dfrac{\arctan(E_{i,s})}{\beta_{i,s}}\right)\dfrac{1}{\beta_{i,s}(1 + E_{i,s}^{2})} > 0$ and $R_{i,s} = \sec^{2}\left(\dfrac{\arctan(E_{i,s})}{\beta_{i,s}}\right)\dfrac{\arctan(E_{i,s})\dot{\beta}_{i,s}}{\beta_{i,s}^{2}}$.
Remark 3. 
Combined with the quantization function, a new error transformation function is developed in this paper. Unlike the traditional error transformation strategy [34,35,36,37,38,39], this strategy releases the initial condition and no longer requires the initial error to be within a certain domain, realizing global PPC. In addition, with the help of the quantization function, the error is transformed to a smaller bounded domain even if the error is large, which can greatly reduce the control energy and overshooting phenomenon.
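The following sketch evaluates the prescribed performance function and the quantized, tan-based error transformation described above, illustrating how even a large initial error is mapped into a bounded transformed error; function names and numerical values are assumptions for illustration.
```python
import numpy as np

def beta(t, beta0, beta_inf, Ta):
    """Prescribed performance function: quadratic decay from beta0 to beta_inf
    over [0, Ta), then held at beta_inf (a sketch of the function above)."""
    if t < Ta:
        return (beta0 - beta_inf) * ((Ta - t) / Ta) ** 2 + beta_inf
    return beta_inf

def transformed_error(E, beta_t):
    """Quantize the raw error with arctan, then apply the tan-based transform.
    While beta_t >= 1 the raw error is effectively unconstrained; once
    beta_t < 1 the quantized error is confined to (-beta_t, beta_t)."""
    E_t = (2.0 / np.pi) * np.arctan(E)        # quantized error in (-1, 1)
    return np.tan(np.pi * E_t / (2.0 * beta_t))

# Example: a large initial error is still mapped to a finite transformed error
print(transformed_error(E=50.0, beta_t=beta(0.0, beta0=2.0, beta_inf=0.1, Ta=5.0)))
```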

3.2. Adaptive Gain Integral Terminal Sliding Mode Control (AGITSMC)

To ensure that the system trajectory converges quickly and with good robustness, an AGITSMC control strategy is designed as follows:
$\sigma_{i,s} = b_{1i,s} Z_{i,s} + b_{2i,s}\dot{Z}_{i,s} + b_3\int_{0}^{t}\left(b_{1i,s} Z_{i,s} + b_{2i,s}\dot{Z}_{i,s}\right)dr$
where $b_{1i,s} = \dfrac{a_1}{\left(2 - \left(\frac{2}{\pi}\arctan(E_{i,s})\right)^{2}\right)^{2c_1}} > 0$ and $b_{2i,s} = \dfrac{a_2}{\left(2 - \left(\frac{2}{\pi}\arctan(e_{i,s})\right)^{2}\right)^{2c_2}} > 0$, and where $a_1$, $a_2$, $b_3$, $c_1$, and $c_2$ are design parameters.
Thus, the derivative of σ i , s is as follows:
$\dot{\sigma}_{i,s} = b_{1i,s}\dot{Z}_{i,s} + \dot{b}_{1i,s} Z_{i,s} + b_{2i,s}\ddot{Z}_{i,s} + \dot{b}_{2i,s}\dot{Z}_{i,s} + b_3\left(b_{1i,s} Z_{i,s} + b_{2i,s}\dot{Z}_{i,s}\right) = F_{i,s}\tau_{i,s} + f_{i,s}$
where $f_{i,s} = b_{2i,s}\left(\Delta_{i,s} r_{i,s} + \dot{r}_{i,s}\dot{E}_{i,s} - \dot{R}_{i,s} - r_{i,s}\ddot{\hat{q}}_{i,s}\right) + b_{1i,s}\dot{Z}_{i,s} + \dot{b}_{1i,s} Z_{i,s} + \dot{b}_{2i,s}\dot{Z}_{i,s} + b_3\left(b_{1i,s} Z_{i,s} + b_{2i,s}\dot{Z}_{i,s}\right)$ and $F_{i,s} = b_{2i,s} r_{i,s} H_{ti,s}$.
On the basis of the SMC principle [42], when the system enters the sliding stage, i.e., $\sigma_{i,s} = 0$, the following can be obtained:
$b_{1i,s} Z_{i,s} + b_{2i,s}\dot{Z}_{i,s} + b_3\int_{0}^{t}\left(b_{1i,s} Z_{i,s} + b_{2i,s}\dot{Z}_{i,s}\right)dr = 0$
Remark 4. 
The SMC process is considered in two stages [42]: the reaching stage and the sliding mode control stage. However, the trajectory tends to depart from the slide modeling surface under the effects of other adverse conditions, such as obstacle avoidance, external disturbances, actuator faults, and input saturation, which leads to the problem of robustness degradation of the system, especially in the reaching phase. Obviously, the problem of robustness degradation severely affects the control performance. In this paper, AGITSMC control is developed to suppress the jitter of the system. On this basis, the adaptive gain parameters b 1 i , s and b 2 i , s are introduced, which increase when the error increases and decrease when the error decreases. Therefore, the introduction of the adaptive gain parameter avoids excessive or insufficient gain, which not only improves the convergence velocity and control accuracy but also reduces the jitter phenomenon. Thus, the proposed AGITSMC strategy can enable the system to maintain good control performance in complex environments.
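A minimal single-joint sketch of the AGITSMC sliding variable is given below. The error-dependent gain expression follows the reconstruction of $b_{1i,s}$ and $b_{2i,s}$ given above and should be read as an assumption; the class, parameter values, and forward-Euler integration are illustrative.
```python
import numpy as np

def adaptive_gain(a, c, err):
    """Error-dependent gain: grows toward a as |err| grows, with floor a / 2**(2*c)
    (mirrors the b_1 / b_2 construction above, as reconstructed here)."""
    x = (2.0 / np.pi) * np.arctan(err)           # quantized error in (-1, 1)
    return a / (2.0 - x ** 2) ** (2.0 * c)

class IntegralTerminalSlidingVariable:
    """sigma = b1*Z + b2*Zdot + b3 * integral(b1*Z + b2*Zdot), sketched for one joint."""
    def __init__(self, a1, a2, b3, c1, c2, dt):
        self.a1, self.a2, self.b3, self.c1, self.c2, self.dt = a1, a2, b3, c1, c2, dt
        self.integral = 0.0

    def update(self, Z, Zdot, E, e):
        b1 = adaptive_gain(self.a1, self.c1, E)   # gain driven by the position error
        b2 = adaptive_gain(self.a2, self.c2, e)   # gain driven by the velocity error
        inner = b1 * Z + b2 * Zdot
        self.integral += inner * self.dt          # forward-Euler integral term
        return inner + self.b3 * self.integral

# Example single step with hypothetical parameter values
s = IntegralTerminalSlidingVariable(a1=2.0, a2=1.0, b3=0.5, c1=0.5, c2=0.5, dt=0.001)
print(s.update(Z=0.2, Zdot=-0.05, E=0.3, e=-0.1))
```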

3.3. Critic-Only Neural Network Optimal Dynamic Programming (CNNODP) Control

Next, to design an optimal sliding mode controller, we construct a critic-only neural network (NN) strategy with the RBFNN, where the critic-only NN is updated in real time via RL Bellman residuals and experience replay techniques. Combining the sliding variables in (20) with the total cost of the control input energy, the performance index with an attenuation coefficient can be designed as follows:
$C_{i,s}(\sigma_{i,s}(t)) = \int_{t}^{\infty} e^{-w(r-t)} N_{i,s}(\sigma_{i,s}(r), \tau_{i,s})\,dr$
where $N_{i,s}(\sigma_{i,s}, \tau_{i,s}) = \sigma_{i,s}^{2} + \tau_{i,s}^{2}$ denotes the cost function and $\tau_{i,s}$ denotes the controller. The attenuation coefficient $w > 0$ and the $e^{-w(r-t)}$ term ensure that the cost function remains bounded even though the tracking errors do not eventually converge to 0.
Remark 5. 
The purpose of adaptive optimal control is to minimize the value function of a particular characteristic. In nominal systems [22,23,24,25,26], the cost function is a quadratic function related to the tracking error and control input. In this work, sliding mode variables are added to the value function of each robotic manipulator to achieve optimal sliding mode control to address the effects of environmental disturbances and actuator faults on control performance.
To address the optimal trajectory problem, a new adaptive optimal sliding mode controller is designed. The design process is as follows:
First, the Bellman equation is obtained by differentiating Equation (23) via Leibniz's rule:
$\nabla C_{i,s}(\sigma_{i,s})\dot{\sigma}_{i,s} - w C_{i,s}(\sigma_{i,s}) + N_{i,s}(\sigma_{i,s}, \tau_{i,s}) = 0$
where $\nabla C_{i,s}(\sigma_{i,s}) = \partial C_{i,s}(\sigma_{i,s})/\partial\sigma_{i,s}$ is the gradient of $C_{i,s}(\sigma_{i,s})$ with respect to $\sigma_{i,s}$.
Then, the Hamiltonian function can be defined as follows:
$H_{i,s}(\sigma_{i,s}, \nabla C_{i,s}(\sigma_{i,s}), \tau_{i,s}) = \nabla C_{i,s}(\sigma_{i,s})\left(F_{i,s}\tau_{i,s} + f_{i,s}\right) - w C_{i,s}(\sigma_{i,s}) + N_{i,s}(\sigma_{i,s}, \tau_{i,s})$
$\Omega(A)$ is defined as the set of admissible controls on $A$, where $A\subset\mathbb{R}^{n}$ denotes a compact set. The optimized cost function is as follows:
$C_{i,s}^{*}(\sigma_{i,s}) = \min_{\tau\in\Omega(A)}\int_{t}^{\infty} e^{-w(r-t)} N_{i,s}(\sigma_{i,s}(r), \tau_{i,s})\,dr = \int_{t}^{\infty} e^{-w(r-t)} N_{i,s}(\sigma_{i,s}(r), \tau_{i,s}^{*})\,dr$
Thus, the Hamilton–Jacobi–Bellman (HJB) equation can be designed according to the following:
$H_{i,s}(\sigma_{i,s}, \nabla C_{i,s}^{*}(\sigma_{i,s}), \tau_{i,s}^{*}) = \nabla C_{i,s}^{*}(\sigma_{i,s})\left(F_{i,s}\tau_{i,s}^{*} + f_{i,s}\right) - w C_{i,s}^{*}(\sigma_{i,s}) + N_{i,s}(\sigma_{i,s}, \tau_{i,s}^{*})$
where $\nabla C_{i,s}^{*}(\sigma_{i,s}) = \partial C_{i,s}^{*}(\sigma_{i,s})/\partial\sigma_{i,s}$ is the gradient of $C_{i,s}^{*}(\sigma_{i,s})$ with respect to $\sigma_{i,s}$.
The control input in $\ddot{q}_{i,s}$ is considered to be the optimized $\tau_{i,s}^{*}$. By solving $\partial H_{i,s}/\partial\tau_{i,s}^{*} = 0$, the optimal controller can be calculated as follows:
$\tau_{i,s}^{*} = -\dfrac{F_{i,s}}{2}\nabla C_{i,s}^{*}(\sigma_{i,s})$
Obviously, the direct computation of the optimal controller is difficult because of the complex nonlinearities in $\nabla C_{i,s}^{*}(\sigma_{i,s})$. To achieve the optimal performance control objective, the term $C_{i,s}^{*}(\sigma_{i,s})$ is constructed as $C_{i,s}^{*}(\sigma_{i,s}) = \lambda_{i,s}\sigma_{i,s}^{2} + C_{i,s}^{0}(\sigma_{i,s})$, where $\lambda_{i,s}$ is a positive number.
Therefore, one can obtain the following:
$\nabla C_{i,s}^{*}(\sigma_{i,s}) = 2\lambda_{i,s}\sigma_{i,s} + \nabla C_{i,s}^{0}(\sigma_{i,s})$
Therefore, τ i , s * can be rewritten as follows:
$\tau_{i,s}^{*} = -\lambda_{i,s} F_{i,s}\sigma_{i,s} - \dfrac{1}{2}F_{i,s}\nabla C_{i,s}^{0}(\sigma_{i,s})$
Considering the unknown functions, the critic-only NN is used to approximate $C_{i,s}^{0}(\sigma_{i,s})$ and $\nabla C_{i,s}^{0}(\sigma_{i,s})$. The cost function is approximated as $C_{i,s}^{0}(\sigma_{i,s}) = \theta_{i,s}^{T}\varphi_{i,s} + \xi_{i,s}$ and $\nabla C_{i,s}^{0}(\sigma_{i,s}) = \nabla\varphi_{i,s}^{T}\theta_{i,s} + \nabla\xi_{i,s}$, where $\theta_{i,s}$ is the ideal NN parameter vector, $\varphi_{i,s}$ is the activation function, and $\xi_{i,s}$ denotes the approximation error with $\|\xi_{i,s}\| \le \bar{\xi}_{i,s}$. Since $\theta_{i,s}$ is unknown, the critic-only NN is calculated as follows:
$\hat{C}_{i,s}^{0}(\sigma_{i,s}) = \varphi_{i,s}^{T}\hat{\theta}_{i,s}$
where $\hat{C}_{i,s}^{0}(\sigma_{i,s})$ and $\hat{\theta}_{i,s}$ denote the estimates of the cost function and the critic-only NN weight $\theta_{i,s}$, respectively, and $\tilde{\theta}_{i,s} = \hat{\theta}_{i,s} - \theta_{i,s}$ denotes the estimation error.
Finally, combining (30) and (31), the approximate optimal controller can be rewritten as follows:
$\tau_{i,s} = -\lambda_{i,s} F_{i,s}\sigma_{i,s} - \dfrac{1}{2}F_{i,s}\nabla\varphi_{i,s}^{T}\hat{\theta}_{i,s}$
To update the critic-only NN in real time with the RL Bellman residuals and experience replay techniques, we define $t_1 = t - T_w$ with $T_w > 0$. By using the RL algorithm, Equation (23) can be rewritten as follows:
$C_{i,s}^{*}(\sigma_{i,s}(t_1)) = \int_{t_1}^{t} e^{-w(r-t_1)} N_{i,s}(\sigma_{i,s}(r), \tau_{i,s}^{*})\,dr + e^{-wT_w} C_{i,s}^{*}(\sigma_{i,s})$
Therefore, the RL Bellman residuals caused by the critic-only NN can be defined as follows:
$p_{i,s} = \int_{t_1}^{t} e^{-w(r-t_1)} N_{i,s}(\sigma_{i,s}(r), \tau_{i,s})\,dr + e^{-wT_w}\hat{C}_{i,s}(\sigma_{i,s}) - \hat{C}_{i,s}(\sigma_{i,s}(t_1)) = \int_{t_1}^{t} e^{-w(r-t_1)} N_{i,s}(\sigma_{i,s}(r), \tau_{i,s})\,dr + e^{-wT_w}\left(\lambda_{i,s}\sigma_{i,s}^{2} + \hat{\theta}_{i,s}^{T}\varphi_{i,s}\right) - \left(\lambda_{i,s}\sigma_{i,s}^{2}(t_1) + \hat{\theta}_{i,s}^{T}\varphi_{i,s}(t_1)\right) = \int_{t_1}^{t} e^{-w(r-t_1)} N_{i,s}(\sigma_{i,s}(r), \tau_{i,s})\,dr + \Delta\sigma_{i,s} + \hat{\theta}_{i,s}^{T}\Delta\varphi_{i,s}$
where $\Delta\varphi_{i,s} = e^{-wT_w}\varphi_{i,s} - \varphi_{i,s}(t_1)$ and $\Delta\sigma_{i,s} = e^{-wT_w}\lambda_{i,s}\sigma_{i,s}^{2} - \lambda_{i,s}\sigma_{i,s}^{2}(t_1)$.
Remark 6. 
In this paper, the critic-only NN is constructed to obtain an approximate optimal controller instead of the actor–critic structure or the identifier–actor–critic structure. The proposed strategy has several advantages over existing strategies.
First, in controller (32), the consensus control relies on the value function, and we use the critic-only NN to obtain the approximate optimal controller. Compared with the actor–critic structure [22,23,24], the critic-only NN structure adopted in this paper greatly reduces the control system complexity, which is favorable for practical application in engineering.
Next, compared with traditional adaptive optimal control [25,26], the RL Bellman residuals are adopted to drive the update law of the critic-only NN. However, because of the uncertainty of the system model, there are often unknown functions in the HJB equations, which requires an identifier network to estimate the unknown function $\Delta_{i,s}(t)$ of (3). There is no doubt that this greatly increases the complexity of the system. To eliminate the identifier network, we use an experience replay technique to obtain the RL residuals (34) and then drive the updating law with the help of the RL residuals. As a result, there is no need to consider the effect of unknown functions, which removes the need for an identifier network and greatly reduces the complexity of the system.
Third, unlike [27,28,29,30], the designed update law contains two terms. The first is driven by the RL residuals on the basis of the gradient descent method. The second uses historical samples to adjust the weight vector, which accelerates the convergence of the adaptive law [43].
Finally, compared with existing reinforcement learning algorithms [22,23,24,25,26,27,28,29,30], this paper introduces a feedback term λ i , s σ i , s to improve the convergence velocity of robotic manipulators.
To minimize the RL residual, the gradient descent method, parallel learning technique, and experience replay technique are utilized to obtain the update law of θ ^ i , s as follows:
$\dot{\hat{\theta}}_{i,s} = -p_{0,s}\dfrac{\Delta\varphi_{i,s}}{\left(1 + \Delta\varphi_{i,s}^{T}\Delta\varphi_{i,s}\right)^{2}} p_{i,s} - p_{0,s}\sum_{l=1}^{L}\dfrac{\Delta\varphi_{i,s}^{l}}{\left(1 + \Delta\varphi_{i,s}^{lT}\Delta\varphi_{i,s}^{l}\right)^{2}} p_{i,s}^{l}$
where $p_{0,s}$ is the learning rate, whose value determines the training velocity of the critic-only NN, and $t_l\in\{t_1, t_2, \ldots, t_L\}$ is the index used to mark the stored historical states.
According to θ ˜ i , s = θ ^ i , s θ i , s , one can obtain the following:
$\dot{\tilde{\theta}}_{i,s} = \dot{\hat{\theta}}_{i,s} = -p_{0,s}\dfrac{\Delta\varphi_{i,s}^{T}\Delta\varphi_{i,s}}{\left(1 + \Delta\varphi_{i,s}^{T}\Delta\varphi_{i,s}\right)^{2}}\tilde{\theta}_{i,s} - p_{0,s}\sum_{l=1}^{L}\dfrac{\Delta\varphi_{i,s}^{lT}\Delta\varphi_{i,s}^{l}}{\left(1 + \Delta\varphi_{i,s}^{lT}\Delta\varphi_{i,s}^{l}\right)^{2}}\tilde{\theta}_{i,s} + p_{0,s}\dfrac{\Delta\varphi_{i,s}}{\left(1 + \Delta\varphi_{i,s}^{T}\Delta\varphi_{i,s}\right)^{2}}\Delta\xi_{i,s} + p_{0,s}\sum_{l=1}^{L}\dfrac{\Delta\varphi_{i,s}^{l}}{\left(1 + \Delta\varphi_{i,s}^{lT}\Delta\varphi_{i,s}^{l}\right)^{2}}\Delta\xi_{i,s}^{l}$
where $\Delta\xi_{i,s} = \xi_{i,s}(t_1) - e^{-wT_w}\xi_{i,s}$ and $\Delta\xi_{i,s}^{l} = \xi_{i,s}^{l}(t_1) - e^{-wT_w}\xi_{i,s}^{l}$.
Remark 7. 
To ensure that $\hat{\theta}_{i,s}$ converges to $\theta_{i,s}$, this paper combines the gradient descent method and the parallel learning technique to relax the PE condition. Therefore, the update law (35) consists of two terms. The first is driven by the current data via the gradient descent algorithm, and the second is driven by the historical data via the gradient descent algorithm. To minimize the residuals from the RL, the current data and historical data are used together to drive the update of (35) via the parallel learning technique. Define $\bar{\varphi}_{i,s}^{l} = \frac{\Delta\varphi_{i,s}^{l}}{(1 + \Delta\varphi_{i,s}^{lT}\Delta\varphi_{i,s}^{l})^{2}}$ and $Z(\bar{\varphi}_{i,s}) = [\bar{\varphi}_{i,s}^{1}, \bar{\varphi}_{i,s}^{2}, \ldots, \bar{\varphi}_{i,s}^{l}]^{T}$ as the stored history. To increase the convergence velocity of the critic-only neural network and relax the stringent persistence of excitation (PE) conditions encountered in several existing studies, the number of stored history states can be chosen such that $L > \mathrm{rank}(Z(\bar{\varphi}_{i,s}))$, as in [43].
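The following sketch mirrors the spirit of update law (35): the current RL residual and a small buffer of replayed residuals drive a normalized gradient step on the weight vector. The class design, buffer handling, and variable names are illustrative assumptions, not the paper's implementation.
```python
import numpy as np

class CriticOnlyUpdate:
    """Sketch of the critic-weight update driven by the current RL residual plus
    replayed historical residuals (normalized gradient-descent form)."""
    def __init__(self, n_basis, learning_rate, buffer_size):
        self.theta = np.zeros(n_basis)     # critic weight estimate
        self.p0 = learning_rate
        self.buffer = []                   # stored (cost integral, delta_sigma, delta_phi)
        self.buffer_size = buffer_size

    def residual(self, integral_cost, delta_sigma, delta_phi):
        # p = discounted-cost integral + Delta_sigma + theta^T Delta_phi
        return integral_cost + delta_sigma + self.theta @ delta_phi

    def step(self, integral_cost, delta_sigma, delta_phi):
        samples = self.buffer + [(integral_cost, delta_sigma, delta_phi)]
        grad = np.zeros_like(self.theta)
        for ic, ds, dp in samples:
            p = self.residual(ic, ds, dp)
            grad += dp / (1.0 + dp @ dp) ** 2 * p   # normalized gradient term
        self.theta -= self.p0 * grad                # descend the residual
        self.buffer.append((integral_cost, delta_sigma, delta_phi))
        if len(self.buffer) > self.buffer_size:
            self.buffer.pop(0)
        return self.theta
```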

3.4. Adaptive Fault Compensation Control

Adaptive fault compensation control is designed here to address the problem of multiple robotic manipulator actuator faults. To achieve the control objective, an adaptive fault compensation optimal controller is designed as follows.
Combining (32), the compensation controller is calculated as follows:
$\tau_{hi,s} = -\lambda_{i,s} F_{i,s}\sigma_{i,s} - \dfrac{1}{2}F_{i,s}\nabla\varphi_{i,s}^{T}\hat{\theta}_{i,s}$
$\alpha_{i,s} = \tau_{hi,s} - \dfrac{\sigma_{i,s}}{\sqrt{\sigma_{i,s}^{2} + k_{2,2}^{2} e^{-2t}}}\hat{\gamma}_{2i,s}$
$\tau_{\gamma i,s} = (\hat{\gamma}_{1i,s} + 1)\alpha_{i,s}$
Combining (37)–(39), the adaptive fault compensation optimal controller is calculated as follows:
$\tau_{i,s} = \Phi_{i,s}\tau_{\gamma i,s} + \delta_{i,s}$
The adaptive parameters $\hat{\gamma}_{1}$ and $\hat{\gamma}_{2}$ are approximations of $1/\gamma_{1}$ and $\gamma_{2}$, respectively, and are calculated as follows:
$\dot{\hat{\gamma}}_{1i,s} = -k_{1,1}\hat{\gamma}_{1i,s} - k_{1,2} F_{i,s}\sigma_{i,s}\alpha_{i,s}$
$\dot{\hat{\gamma}}_{2i,s} = -k_{2,1}\hat{\gamma}_{2i,s} + \dfrac{\sigma_{i,s}^{2} F_{i,s}}{\sqrt{\sigma_{i,s}^{2} + k_{2,2}^{2} e^{-2t}}}$
where $k_{1,1}$, $k_{1,2}$, $k_{2,1}$, and $k_{2,2}$ are positive constants; if $\hat{\gamma}_{1i,s}(0) > 0$ and $\hat{\gamma}_{2i,s}(0) > 0$, it is simple to obtain $\hat{\gamma}_{1i,s}(t) > 0$ and $\hat{\gamma}_{2i,s}(t) > 0$; $\tilde{\gamma}_{1} = \hat{\gamma}_{1} - 1/\gamma_{1}$ and $\tilde{\gamma}_{2} = \hat{\gamma}_{2} - \gamma_{2}$.
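A single-joint sketch of the fault compensation loop is given below, combining the compensation controller with forward-Euler integration of the adaptation laws for $\hat{\gamma}_{1i,s}$ and $\hat{\gamma}_{2i,s}$. The sign conventions and the square-root robust term follow the reconstruction above and should be treated as assumptions; parameter values and names are illustrative.
```python
import numpy as np

class FaultCompensator:
    """Sketch of the adaptive fault-compensation gains and controller, following
    the adaptation laws above (an assumed reconstruction, not the paper's code)."""
    def __init__(self, k11, k12, k21, k22, gamma1_0=0.1, gamma2_0=0.1, dt=1e-3):
        self.k11, self.k12, self.k21, self.k22, self.dt = k11, k12, k21, k22, dt
        self.gamma1_hat = gamma1_0   # estimate related to 1/gamma_1 (loss of effectiveness)
        self.gamma2_hat = gamma2_0   # estimate related to the additive-fault bound gamma_2

    def control(self, tau_h, sigma, F, t):
        denom = np.sqrt(sigma ** 2 + (self.k22 * np.exp(-t)) ** 2)
        alpha = tau_h - (sigma / denom) * self.gamma2_hat        # robust compensation term
        tau_gamma = (self.gamma1_hat + 1.0) * alpha              # effectiveness compensation
        # adaptive updates (forward-Euler integration of the adaptation laws)
        self.gamma1_hat += self.dt * (-self.k11 * self.gamma1_hat - self.k12 * F * sigma * alpha)
        self.gamma2_hat += self.dt * (-self.k21 * self.gamma2_hat + (sigma ** 2) * F / denom)
        return tau_gamma

comp = FaultCompensator(k11=1.0, k12=0.5, k21=1.0, k22=0.2)
print(comp.control(tau_h=3.0, sigma=0.1, F=2.0, t=0.0))
```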

3.5. Stability Analysis

Step 1. On the basis of $\sigma_{i,s}$, choose $V_s = \sum_{i=1}^{n}\sum_{s=1}^{N}\frac{1}{2}\sigma_{i,s}^{2}$. Differentiating $V_s$, one can obtain the following:
V ˙ s = i = 1 n s = 1 N σ i , s F i , s τ i , s + f i , s    = i = 1 n s = 1 N F i , s σ i , s ( Φ i , s ( ( γ ^ 1 , s + 1 ) α i , s ) + i , s ) + σ i , s f i , s    i = 1 n s = 1 N F i , s σ i , s γ 1 i , s ( γ ^ 1 , s + 1 ) α i , s + F i , s σ i , s γ 2 i , s + σ i , s f i , s
Step 2. On the basis of $\hat{\gamma}_{1}$, choose $V_1 = \sum_{i=1}^{n}\sum_{s=1}^{N}\frac{1}{2}\frac{\gamma_{1i,s}}{k_{1,2}}\tilde{\gamma}_{1i,s}^{2}$. Differentiating $V_1$, one can obtain the following:
V 1 = i = 1 n s = 1 N γ 1 i , s k 1 , 2 γ ˜ 1 i , s k 1 , 1 γ ^ 1 i , s F i , s k 1 , 2 σ i , s α i , s
On the basis of Young's inequality, one can obtain the following:
$-\tilde{\gamma}_{1i,s}\left(\tilde{\gamma}_{1i,s} + \dfrac{1}{\gamma_{1i,s}}\right) \le -\tilde{\gamma}_{1i,s}^{2} + \dfrac{1}{2}\tilde{\gamma}_{1i,s}^{2} + \dfrac{1}{2}\dfrac{1}{\gamma_{1i,s}^{2}}$
Substituting (45) into (44), one can obtain the following:
V 1 i = 1 n s = 1 N F i , s σ i , s γ 1 i , s γ ˜ 1 i , s α i , s 1 2 k 1 , 1 k 1 , 2 γ 1 i , s γ ˜ 1 i , s 2 + 1 2 k 1 , 1 k 1 , 2 1 γ 1 i , s
Step 3. On the basis of $\hat{\gamma}_{2}$, choose $V_2 = \sum_{i=1}^{n}\sum_{s=1}^{N}\frac{1}{2}\tilde{\gamma}_{2i,s}^{2}$. Differentiating $V_2$, one can obtain the following:
V ˙ 2 = i = 1 n s = 1 N γ ˜ 2 i , s k 2 , 1 γ ^ 2 i , s + σ i , s 2 F i , s σ i , s 2 + k 2 , 2 2 e 2 t
On the basis of Young's inequality, one can obtain the following:
$-\tilde{\gamma}_{2i,s}\left(\tilde{\gamma}_{2i,s} + \gamma_{2i,s}\right) \le -\tilde{\gamma}_{2i,s}^{2} + \dfrac{1}{2}\tilde{\gamma}_{2i,s}^{2} + \dfrac{1}{2}\gamma_{2i,s}^{2}$
Substituting (48) into (47), one can obtain the following:
V ˙ 2 i = 1 n s = 1 N σ i , s 2 F i , s σ i , s 2 + k 2 , 2 2 e 2 t γ ˜ 2 i , s k 2 , 1 2 γ ˜ 2 i , s 2 + k 2 , 1 2 γ 2 i , s 2
Combining (43), (46) and (49), one can obtain the following:
V ˙ s + V ˙ 1 + V ˙ 2 i = 1 n s = 1 N F i , s σ i , s γ 1 i , s ( γ ^ 1 , s + 1 ) α i , s F i , s σ i , s γ 1 i , s γ ˜ 1 i , s α i , s + σ i , s 2 F i , s σ i , s 2 + k 2 , 2 2 e 2 t γ ˜ 2 i , s k 2 , 1 2 γ ˜ 2 i , s 2 k 1 , 1 2 γ 1 i , s k 1 , 2 γ ˜ 1 i , s 2 + F i , s σ i , s γ 2 i , s + σ i , s f i , s + k 2 , 1 2 γ 2 i , s 2 + 1 2 k 1 , 1 k 1 , 2 1 γ 1 i , s          i = 1 n s = 1 N F i , s σ i , s τ h i , s ( 1 + γ 1 i , s ) γ 2 i , s σ i , s 2 F i , s σ i , s 2 + k 2 , 2 2 e 2 t + F i , s σ i , s γ 2 i , s k 1 , 1 2 γ 1 i , s k 1 , 2 γ ˜ 1 i , s 2 k 2 , 1 2 γ ˜ 2 i , s 2 + k 1 , 1 2 k 1 , 2 γ 1 i , s + k 2 , 1 2 γ 2 i , s 2 + σ i , s f i , s
On the basis of Lemma 2, one can obtain the following:
$\dfrac{\gamma_{2i,s} F_{i,s}\sigma_{i,s}^{2}}{\sqrt{\sigma_{i,s}^{2} + k_{2,2}^{2} e^{-2t}}} \ge F_{i,s}|\sigma_{i,s}|\gamma_{2i,s} - k_{2,2} e^{-t} F_{i,s}\gamma_{2i,s}$
Substituting (51) into (50), one can obtain the following:
V ˙ s + V ˙ 1 + V ˙ 2 i = 1 n s = 1 N F i , s σ i , s τ h i , s ( 1 + γ 1 i , s ) k 1 , 1 2 γ 1 i , s k 1 , 2 γ ˜ 1 i , s 2 k 2 , 1 2 γ ˜ 2 i , s 2 + σ i , s f i , s + k 1 , 1 2 k 1 , 2 γ 1 i , s + k 2 , 1 2 γ 2 i , s 2 + k 2 , 2 e t F i , s γ 2 i , s
From the above, we find that τ h i , s is an approximate optimal controller derived without considering actuator faults. Therefore, V s 0 = σ i , s 2 , where σ i , s does not consider actuator faults.
Assumption 3 
([31]). Let V s 0 be a candidate term of a continuously differentiable Lyapunov function that satisfies the following:
V ˙ s 0 = ( V s 0 ( σ i , s ) ) T σ ˙ i , s = ( V s 0 ( σ i , s ) ) T ( F i , s τ h i , s + f i , s ) 0
where $\nabla V_{s0}(\sigma_{i,s})$ is the gradient of $V_{s0}$ with respect to $\sigma_{i,s}$.
Therefore, there exists a positive definite matrix $\psi_{i,s}\in\mathbb{R}^{n\times n}$ satisfying the following:
( V s 0 ( σ i , s ) ) T ( ( 1 + γ 1 i , s ) F i , s τ h i , s + f i , s ) η min ( ψ i , s ) σ i , s 2
Combining (52), (54) and Assumption 3, one can obtain the following:
V ˙ s + V ˙ 1 + V ˙ 2 i = 1 n s = 1 N η min ( ψ i , s ) σ i , s 2 k 1 , 1 2 γ 1 i , s k 1 , 2 γ ˜ 1 i , s 2 k 2 , 1 2 γ ˜ 2 i , s 2 + 1 2 k 1 , 1 k 1 , 2 γ 1 i , s + k 2 , 1 2 γ 2 i , s 2 + k 2 , 2 e t F i , s γ 2 i , s
Step 4. On the basis of $\theta_{i,s}$, choose $V_3 = \sum_{i=1}^{n}\sum_{s=1}^{N}\frac{1}{2}\tilde{\theta}_{i,s}^{2}$. Differentiating $V_3$, one can obtain the following:
V ˙ 3 = i = 1 n s = 1 N θ ˜ i , s θ ˜ ˙ i , s = i = 1 n s = 1 N p 0 , s Δ φ i , s T Δ φ i , s ( 1 + Δ φ i , s T Δ φ i , s ) 2 θ ˜ i , s 2 p 0 , s l = 1 L Δ φ i , s l T Δ φ i , s l ( 1 + Δ φ i , s l T Δ φ i , s l ) 2 θ ˜ i , s 2 + p 0 , s Δ φ i , s ( 1 + Δ φ i , s T Δ φ i , s ) 2 Δ ξ i , s θ ˜ i , s + p 0 , s l = 1 L Δ φ i , s l ( 1 + Δ φ i , s l T Δ φ i , s l ) 2 Δ ξ i , s l θ ˜ i , s
On the basis of Young's inequality, one can obtain the following:
Δ φ i , s ( 1 + Δ φ i , s T Δ φ i , s ) 2 Δ ξ i , s θ ˜ i , s 1 2 Δ φ i , s T Δ φ i , s ( 1 + Δ φ i , s T Δ φ i , s ) 4 θ ˜ i , s 2 + 1 2 Δ ξ i , s 2
l = 1 L Δ φ i , s l ( 1 + Δ φ i , s l T Δ φ i , s l ) 2 Δ ξ i , s l θ ˜ i , s 1 2 l = 1 L Δ φ i , s l T Δ φ i , s l ( 1 + Δ φ i , s l T Δ φ i , s l ) 4 θ ˜ i , s 2 + 1 2 l = 1 L Δ ξ i , s l 2
Combining (56)–(58), one can obtain the following:
φ θ = Δ φ i , s T Δ φ i , s ( 1 + Δ φ i , s T Δ φ i , s ) 2 1 2 Δ φ i , s T Δ φ i , s ( 1 + Δ φ i , s T Δ φ i , s ) 4 = Δ φ i , s T Δ φ i , s 2 ( ( 1 + Δ φ i , s T Δ φ i , s ) 2 1 ) 2 ( 1 + Δ φ i , s T Δ φ i , s ) 4 > 0
φ θ L = l = 1 L Δ φ i , s l T Δ φ i , s l ( 1 + Δ φ i , s l T Δ φ i , s l ) 2 1 2 l = 1 L Δ φ i , s l T Δ φ i , s l ( 1 + Δ φ i , s l T Δ φ i , s l ) 4 = l = 1 L Δ φ i , s l T Δ φ i , s l 2 ( ( 1 + Δ φ i , s l T Δ φ i , s l ) 2 1 ) 2 ( 1 + Δ φ i , s l T Δ φ i , s l ) 4 > 0
Combining (56)–(60), one can obtain the following:
V ˙ 3 i = 1 n s = 1 N ( p 0 , s ( φ θ i , s + φ θ i , s L ) θ ˜ i , s 2 + 1 2 p 0 , s ( Δ ξ i , s 2 + l = 1 L Δ ξ i , s l 2 ) )
Step 5. On the basis of $C_{i,s}^{*}(\sigma_{i,s})$, choose $V_4 = \sum_{i=1}^{n}\sum_{s=1}^{N} C_{i,s}^{*}(\sigma_{i,s})$. Differentiating $V_4$, one can obtain the following:
V ˙ 4 = i = 1 n s = 1 N C ˙ i , s * ( σ i , s ) = i = 1 n s = 1 N w C i , s * N i , s ( σ i , s , τ h i , s * ) i = 1 n s = 1 N σ i , s 2 + w C i , s *
Step 6. On this basis, define $V = V_s + V_1 + V_2 + V_3 + V_4$. Differentiating $V$, one can obtain the following:
V ˙ i = 1 n s = 1 N η min ( ψ i , s ) σ i , s 2 k 1 , 1 2 γ 1 i , s k 1 , 2 γ ˜ 1 i , s 2 k 2 , 1 2 γ ˜ 2 i , s 2 + 1 2 k 1 , 1 k 1 , 2 γ 1 i , s + k 2 , 1 2 γ 2 i , s 2 + γ 2 i , s k 2 , 2 e t F i , s p 0 , s ( φ θ + φ θ L ) θ ˜ i , s 2 + 1 2 p 0 , s ( Δ ξ i , s 2 + l = 1 L Δ ξ i , s l 2 ) σ i , s 2 + w C i , s *     i = 1 n s = 1 N μ 1 x i , s + μ 2
where x i , s = [ σ i , s , γ ˜ 1 i , s , γ ˜ 2 i , s , θ ˜ i , s ] T , μ 1 = min ( η min ( ψ i , s ) + 1 , γ 1 i , s k 1 , 1 2 k 1 , 2 , k 2 , 1 2 , p 0 , s ( φ θ + φ θ L ) ) and μ 2 = 1 2 k 1 , 1 k 1 , 2 γ 1 i , s + k 2 , 1 2 γ 2 i , s 2 + γ 2 i , s k 2 , 2 e t F i , s + 1 2 p 0 , s ( Δ ξ i , s 2 + l = 1 L Δ ξ i , s l 2 ) + w C i , s * > 0 .
Finally, one can obtain the following:
$\dot{V} \le 0, \quad \|x_{i,s}\| \ge \dfrac{\mu_2}{\mu_1}$
On the basis of the standard Lyapunov extension lemma of [44], the trajectories of the multiple robotic manipulators are verified to be uniformly ultimately bounded (UUB).
Proof. 
On the basis of Assumption 2 and the $e^{-w(r-t)}$ term, the cost function approximation cannot be infinite in a real system even though the tracking error does not eventually converge to 0. On the basis of Section 2.4, the critic-only NN approximation error and its gradient are both bounded. It is simple to obtain $\|\Delta\xi_{i,s}\| = \|\xi_{i,s}(t_1) - e^{-wT_w}\xi_{i,s}\| \le 2\bar{\xi}_{i,s}$ and $\|\Delta\xi_{i,s}^{l}\| = \|\xi_{i,s}^{l}(t_1) - e^{-wT_w}\xi_{i,s}^{l}\| \le 2\bar{\xi}_{i,s}^{l}$. On the basis of Assumption 1, the actuator fault influences are bounded. For $F_{i,s} = b_{2i,s} r_{i,s} H_{ti,s}$, Lemma 1 shows that $H_{ti,s}$ is bounded, it is obvious from $b_{2i,s} = \frac{a_2}{(2 - (\frac{2}{\pi}\arctan(e_{i,s}))^{2})^{2c_2}}$ that $\frac{a_2}{2^{2c_2}} \le b_{2i,s} \le a_2$, and it is obvious from $r_{i,s} = \sec^{2}(\frac{\arctan(E_{i,s})}{\beta_{i,s}})\frac{1}{\beta_{i,s}(1 + E_{i,s}^{2})}$ that $0 < r_{i,s} \le \sec^{2}(\frac{\pi}{2}\beta_{i,s})\frac{1}{\beta_{i,s}}$. It can be concluded that $F_{i,s}$ is bounded. Assumptions 1 and 2 can be fulfilled in practice. In summary, we can consider $\mu_2$ to be bounded. Therefore, the UUB of the multiple robotic manipulator system can be ensured under the control scheme of this paper. The proof ends. □
Remark 8. 
Since the critic-only NN is related to the estimation error, it can guarantee only UUB stability for cooperative control with multiple robotic manipulators. However, one can obtain the desired control performance by adjusting the design parameters of the controller and choosing the appropriate neural network structure.

4. Simulations

4.1. Simulation Conditions

In this section, multiple two-degree-of-freedom manipulator systems consisting of a leader and seven followers are demonstrated. The dynamics model is given by the following entries:
$M_i = \begin{bmatrix} P_{i,1} + 2P_{i,2}\cos(q_{i,2}) & P_{i,3} + P_{i,2}\cos(q_{i,2}) \\ P_{i,3} + P_{i,2}\cos(q_{i,2}) & P_{i,3} \end{bmatrix}, \quad C_i = \begin{bmatrix} -P_{i,2}\sin(q_{i,2})\dot{q}_{i,1} & -P_{i,2}\sin(q_{i,2})(\dot{q}_{i,1} + \dot{q}_{i,2}) \\ P_{i,2}\sin(q_{i,2})\dot{q}_{i,2} & 0 \end{bmatrix}$
where $P_{i,1} = J_{i,1} + m_{i,1} r_{i,a1}^{2} + m_{i,2} r_{i,1}^{2} + J_{i,2} + m_{i,2} r_{i,a2}^{2}$; $P_{i,2} = m_{i,2} r_{i,1} r_{i,a2}$; $P_{i,3} = J_{i,2} + m_{i,2} r_{i,a2}^{2}$; $m_{i,1}$ and $m_{i,2}$ denote the masses of the robotic manipulator links; $J_{i,1}$ and $J_{i,2}$ denote the moments of inertia of the links; $r_{i,1}$ and $r_{i,2}$ denote the link lengths; and $r_{i,a1}$ and $r_{i,a2}$ denote the distances to the link centers of mass. The critic-only NN structure is shown in Figure 2. The interactive topology of leaders and followers is shown in Figure 3. The multiple robotic manipulator system with a coordinate diagram is shown in Figure 4. The multiple manipulator model parameters are shown in Table 2. The system's parameters and initial conditions are shown in Table 3.
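For reference, the following sketch assembles the two-link inertia and Coriolis matrices from the $P_{i,1}$, $P_{i,2}$, $P_{i,3}$ parameters defined above, using the standard textbook factorization of the Coriolis matrix; the numerical link parameters are placeholders, and the entry ordering/signs of the paper's own $C_i$ may differ.
```python
import numpy as np

def two_link_matrices(q2, dq1, dq2, P1, P2, P3):
    """Inertia and Coriolis matrices of a planar two-link arm in the standard
    textbook factorization (an illustrative stand-in for the matrices above)."""
    c2, s2 = np.cos(q2), np.sin(q2)
    M = np.array([[P1 + 2.0 * P2 * c2, P3 + P2 * c2],
                  [P3 + P2 * c2,       P3]])
    C = np.array([[-P2 * s2 * dq2, -P2 * s2 * (dq1 + dq2)],
                  [ P2 * s2 * dq1,  0.0]])
    return M, C

# P1, P2, P3 built from link masses, lengths, and inertias as defined above
m1, m2, r1, ra1, ra2, J1, J2 = 1.0, 0.8, 0.5, 0.25, 0.2, 0.02, 0.01
P1 = J1 + m1 * ra1 ** 2 + m2 * r1 ** 2 + J2 + m2 * ra2 ** 2
P2 = m2 * r1 * ra2
P3 = J2 + m2 * ra2 ** 2
M, C = two_link_matrices(q2=0.4, dq1=0.1, dq2=-0.2, P1=P1, P2=P2, P3=P3)
print(M, C, sep="\n")
```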
Remark 9. 
To explain the effects of various parameter choices on the performance of the control system, the following basic suggestions for parameter adjustment are given:
(1) For the dynamic distributed observer (DDO), the selection of parameter g c ( c = 1 , 2 , 3 , 4 ) needs to balance the observation accuracy and computational efficiency: increasing g c ( c = 1 , 2 , 3 , 4 ) can reduce the observation error of robotic manipulators and improve the control accuracy, but it increases the computational burden and affects the real-time performance. The value of g c ( c = 1 , 2 , 3 , 4 ) should be optimized in practical applications to ensure sufficient accuracy while avoiding excessive consumption of computational resources to achieve the best trade-off between error and computational cost.
(2) For the QPPC method, the initial value ( β o i , s ) plays a role in determining the initial error bounds and has an impact on the transient performance; the final value ( β i , s ) affects the ultimate convergence performance; the convergence time ( T a ) necessitates a careful balance between velocity, which favors small values, and stability, which requires large values.
(3) For the AGITSMC method, as fundamental proportionality coefficients, a 1 and a 2 directly establish the benchmark for the magnitude values of b 1 i , s and b 2 i , s , meaning that larger values of a 1 and a 2 will correspondingly increase the overall magnitude values of b 1 i , s and b 2 i , s , which in turn affects the calculation of the sliding surface and influences the dynamic performance of the control system; as small positive parameters, c 1 and c 2 dictate the change rates of b 1 i , s and b 2 i , s with respect to variations in E i , s and e i , s , where larger c 1 and c 2 values heighten the sensitivity of functions b 1 i , s and b 2 i , s to such changes, enabling quicker adjustments of b 1 i , s and b 2 i , s and thus facilitating a more prompt system response to error changes; b 3 is a positive number that can balance the impact of current errors and historical errors on the sliding surface.
(4) For the CNNODP method, feedback gains ( λ i , s ) optimize the system response by balancing the convergence velocity and stability, increasing the degree of convergence, but excessive gain causes oscillation; the discount factor ( w > 0) serves to strike a balance between the significance of the present performance and future outcomes while simultaneously enhancing computational stability by preventing the unrestricted accumulation of future costs or rewards in the control system; the learning rate ( p 0 , s ) is crucial in the gradient descent algorithm, as it controls the step size, influences the convergence speed and stability, and balances speed with accuracy to ensure effective convergence to a good solution; ( T w ) usually represents a time interval that balances the system’s consideration of current and future performance by defining the integral interval, influencing the stability of the optimization results, and thereby achieving more effective control optimization; the experience sample ( L ) provides diverse data to facilitate comprehensive system understanding, smooths the update process, boosts computational efficiency, and prevents overfitting to recent experiences.
(5) For the fault compensation control method, $k_{1,1}$, $k_{1,2}$, $k_{2,1}$, and $k_{2,2}$ are positive numbers that control the compensation speed of the fault compensation law. Larger values make the fault compensation respond faster to changes in actuator faults, whereas smaller values yield a slower and more conservative update. These parameters therefore need to be carefully adjusted to ensure stable and accurate fault compensation.

4.2. Simulation Analysis

Figure 5 and Figure 6 show the position tracking and tracking error profiles for one leader and seven followers under the QPPC strategy, and Figure 7 illustrates the control inputs of the robotic manipulators. Specifically, the system is unconstrained from the initial period up to a certain time, i.e., when $\beta_{i,s} \ge 1$, and the system error is constrained to a predefined domain when $\beta_{i,s} < 1$. Therefore, Figure 5 and Figure 6 show that the designed controller realizes flexible performance constraints and has good tracking performance throughout. Figure 7 shows that the control inputs of the multiple robotic manipulators eventually converge and remain bounded and very stable without significant jitter.
Figure 8 illustrates the observation errors of the multiple robotic manipulators. The observation errors eventually converge and are always bounded. Figure 9 shows the convergence process of the critic-only NN weight estimates for robotic manipulator 1; the estimates eventually converge and remain stable. Therefore, optimal control is achieved by using the critic-only strategy, which requires approximately one-half the computation of the actor–critic strategy [22,23,24] and one-third that of the identifier–actor–critic strategy [25,26].

4.3. Comparative Simulation

4.3.1. Comparative Simulation for the AGITSMC Control Strategy

To highlight the advantages of the AGITSMC control strategy, comparisons with those of [11,12,13] are given.
  • Case 1 ([13]).
    ϑ i , s = E ˙ i , s + k 1 ρ i k 1 ρ i k 2 E i , s k 2 s i g k 3 ( E i , s )
    where $k_1$, $k_2 > 1$ and $k_3 > 1$ are positive parameters, and $\rho_i$ is the prescribed performance function.
  • Case 2 ([12]).
    $\vartheta_{i,s} = \dot{E}_{i,s} + k_4\int_{0}^{t}\mathrm{sig}^{k_5}(E_{i,s})\,dt$
    where k 4 and k 5 are positive parameters.
  • Case 3 ([11]).
    $\vartheta_{i,s} = E_{i,s} + k_6\,\mathrm{sig}^{k_7}(E_{i,s}) + k_8\,\mathrm{sig}^{k_9}(\dot{E}_{i,s})$
    where k 6 , k 7 , k 8 and k 9 are positive parameters.
To better show the differences in the control performance of the four strategies, local comparisons of the trajectory convergence velocity, error convergence domain, and control inputs for the four control strategies are presented. The parameters for the comparative experiments are shown in Table 4. Figure 10 and Figure 11 show a comparison of the trajectory convergence velocities from 0 to 10 s for the four control strategies. From the two figures, the proposed AGITSMC strategy has the fastest convergence velocity. Figure 12 and Figure 13 show the comparisons of the error convergence domain in the 20–350 s range for the four control strategies. The two figures show that the tracking errors of the four strategies remain within a predefined domain, even after the actuator faults occur. Therefore, the four strategies have good steady-state performance. Specifically, among the four control strategies, the AGITSMC strategy has the smallest error convergence domain and the best tracking effect. Figure 14 and Figure 15 show comparisons of the control inputs from 20 to 350 s for the four strategies. The control inputs of the proposed AGITSMC strategy are much smoother and recover to a smooth state even after actuator faults. In contrast, the control inputs of the other three strategies have a certain level of jitter, especially in case 3, which obviously cannot address the problem of input jitter well. In addition, the control energy of the AGITSMC strategy is the smallest.
To fully evaluate the effectiveness of the four strategies, three indicators are used to quantify their performance: the integral of the absolute value of the error (IAE), the integral of the time multiplied by the absolute value of the error (ITAE), and the integral of the squared value of the control input (ISV), defined as follows:
$\mathrm{IAE} = \int_0^{350} \left|E_{i,s}\right| \mathrm{d}t, \qquad \mathrm{ITAE} = \int_0^{350} t \left|E_{i,s}\right| \mathrm{d}t, \qquad \mathrm{ISV} = \int_0^{350} \tau_{i,s}^2\, \mathrm{d}t$
To compare the performance indicators of the seven robotic manipulators under the four strategies, the corresponding indicators of the seven manipulators are averaged; the resulting data are summarized in Table 5. Table 5 shows that both the IAE and ITAE indicators of the proposed AGITSMC are the smallest, meaning that AGITSMC achieves a smaller tracking error and higher dynamic tracking accuracy than the comparative strategies; the ISV indicator of AGITSMC is also the smallest, meaning that its control cost is the lowest. Thus, compared with existing SMC strategies, the proposed AGITSMC not only offers better tracking performance but also incurs a lower control cost.
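These indicators can be computed directly from sampled simulation data. The sketch below uses trapezoidal integration over the 0–350 s window and averages the indicators over the seven manipulators, as done for Table 5; the function names and the synthetic signals in the usage example are placeholders, not the paper's simulation output.

```python
import numpy as np

def performance_indices(t, e, tau):
    """Compute IAE, ITAE and ISV for one joint by trapezoidal integration.

    t, e and tau are 1-D arrays of time, tracking error and control torque
    sampled over the 0-350 s window.
    """
    iae = np.trapz(np.abs(e), t)
    itae = np.trapz(t * np.abs(e), t)
    isv = np.trapz(tau ** 2, t)
    return iae, itae, isv

def averaged_indices(runs):
    """Average the indices over the seven manipulators (one run per manipulator)."""
    vals = np.array([performance_indices(*run) for run in runs])
    return vals.mean(axis=0)

# Toy usage with synthetic data (hypothetical, not the paper's results).
t = np.linspace(0.0, 350.0, 3501)
runs = [(t, 0.1 * np.exp(-0.05 * t), 5.0 * np.exp(-0.02 * t)) for _ in range(7)]
print(averaged_indices(runs))
```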

4.3.2. Comparative Simulation for the CNNODP Control Strategy

To highlight the advantages of the CNNODP control strategy, comparisons with the actor–critic method [22,23,24] are given.
The convergence speeds of the two strategies are shown in Figure 16 and Figure 17, indicating that the CNNODP strategy converges significantly faster. This is because CNNODP only needs to optimize a single value network and avoids the coupled update issues between the actor and critic networks; in simple state–action mapping scenarios, it can quickly learn effective control rules, yielding a faster error reduction rate than the more complex actor–critic architecture. Figure 18 and Figure 19 compare the error convergence domains of the two strategies and show that the convergence error of the CNNODP strategy is significantly smaller. The core reason for this advantage is that the value function design of the CNNODP strategy allows it to learn the optimal action–state matching directly, without a complex interaction process between networks, so it does not introduce additional deviations caused by network coupling. This characteristic allows the CNNODP strategy to maintain high tracking accuracy even after the system stabilizes.

5. Conclusions

In this paper, adaptive optimal sliding mode fault-tolerant control is proposed for multiple robotic manipulators on the basis of quantitative prescribed performance control and critic-only dynamic programming. The QPPC strategy relaxes the constraint on the initial state of the control system and realizes globally prescribed performance control. The AGITSMC strategy not only improves the convergence velocity and control accuracy but also reduces system chattering. The CNNODP strategy reduces the computational effort while achieving optimal control, and the adaptive fault compensation control strategy handles actuator faults. Simulation results under external disturbances and actuator faults, together with comparisons against existing methods, demonstrate the effectiveness of the proposed strategies. Specifically, the QPPC strategy divides the error constraint into two stages: when β_{i,s} ≥ 1 the system is unconstrained, and when β_{i,s} < 1 the error is constrained within a predefined domain, with good tracking maintained throughout the whole process. Compared with other existing strategies, the AGITSMC strategy ensures faster convergence, lower control cost, and the strongest anti-interference capability. The learning law of the CNNODP strategy remains stable throughout the process, with a good learning effect. In conclusion, the proposed control strategy achieves optimal robust control of multiple robotic manipulators under external disturbances and actuator faults. Future work will extend the tests to different application fields. For example, in industrial automation, applying the method to different types of multiple robotic manipulator systems would further validate its robustness and versatility; in the aerospace field, testing the method in complex, high-precision tasks under harsh environmental conditions would explore its performance in extreme circumstances. Such tests would support a more comprehensive evaluation of the value and application prospects of the method.

Author Contributions

Conceptualization, X.Z. and H.L.; Methodology, X.Z. and H.L.; Software, X.Z. and H.L.; Validation, X.Z. and H.L.; Formal analysis, Z.Y. and H.L.; Investigation, X.H.; Resources, Z.Y. and X.H.; Data curation, Z.Y. and X.H.; Writing—original draft, X.Z. and Z.Y.; Writing—review & editing, X.Z. and Z.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Key Project of the Department of Education of Guangdong Province [grant number 2023ZDZX1005], the Science and Technology Innovation Team of the Department of Education of Guangdong Province [grant number 2024KCXTD041], the Shenzhen Science and Technology Program [grant number JCYJ20220530162014033], the Guangdong Basic and Applied Basic Research Foundation [grant number 2024A1515011345], and the Science and Technology Planning Project of Zhanjiang City [grant number 2021A05023].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

There are no potential commercial interests that require disclosure under the relevant guidelines.

References

  1. Surya, P.S.K.; Shukla, A.; Pandya, N.; Jha, S.S. Automated Vision-based Bolt Sorting by Manipulator for Industrial Applications. In Proceedings of the 2024 IEEE 20th International Conference on Automation Science and Engineering (CASE), Bari, Italy, 28 August–1 September 2024. [Google Scholar]
  2. Lai, J.; Lu, B.; Ren, H. Kinematic concepts in minimally invasive surgical flexible robotic manipulators: State of the art. In Handbook of Robotic Surgery; Academic Press: Cambridge, MA, USA, 2025; pp. 27–41. [Google Scholar]
  3. Dias, P.A.; Petry, M.R.; Rocha, L.F. The Role of Robotics: Automation in Shoe Manufacturing. In Proceedings of the 2024 20th IEEE/ASME International Conference on Mechatronic and Embedded Systems and Applications (MESA), Genova, Italy, 2–4 September 2024. [Google Scholar]
  4. Wang, L.M.; Jia, L.Z.; Zhang, R.D.; Gao, F.R. H∞ output feedback fault-tolerant control of industrial processes based on zero-sum games and off-policy Q-learning. Comput. Chem. Eng. 2023, 179, 108421. [Google Scholar] [CrossRef]
  5. Wang, L.M.; Li, X.Y.; Zhang, R.D.; Gao, F.R. Reinforcement Learning-Based Optimal Fault-Tolerant Tracking Control of Industrial Processes. Ind. Eng. Chem. Res. 2023, 62, 16014–16024. [Google Scholar] [CrossRef]
  6. Wang, L.M.; Jia, L.Z.; Zou, T.; Zhang, R.D.; Gao, F.R. Two-dimensional reinforcement learning model-free fault-tolerant control for batch processes against multi-faults. Comput. Chem. Eng. 2025, 192, 108883. [Google Scholar] [CrossRef]
  7. Visinsky, M.L.Z. Dynamic Fault Detection and Intelligent Fault Tolerance for Robotics; Rice University: Houston, TX, USA, 1994. [Google Scholar]
  8. Aldridge, H.A.; Juang, J.N. Experimental Robot Position Sensor Fault Tolerance Using Accelerometers and Joint Torque Sensors. No. NASA-TM-110335. 1997. Available online: https://dl.acm.org/doi/pdf/10.5555/871365 (accessed on 22 July 2025).
  9. Sun, W.W. Stabilization analysis of time-delay Hamiltonian systems in the presence of saturation. Appl. Math. Comput. 2011, 217, 9625–9634. [Google Scholar] [CrossRef]
  10. Liu, Z.; Zhang, J.Y.; Zhu, Q.M. Adaptive Secure Control for Uncertain Cyber-Physical Systems With Markov Switching Against Both Sensor and Actuator Attacks. IEEE Trans. Syst. Man Cybern. Syst. 2025, 55, 3917–3928. [Google Scholar] [CrossRef]
  11. Jouila, A.; Nouri, K. An adaptive robust nonsingular fast terminal sliding mode controller based on wavelet neural network for a 2-DOF robotic arm. J. Frankl. Inst. 2020, 357, 13259–13282. [Google Scholar] [CrossRef]
  12. Zhang, S.; Cheng, S.; Jin, Z. Variable Trajectory Impedance: A Super-Twisting Sliding Mode Control Method for Mobile Manipulator Based on Identification Model. IEEE Trans. Ind. Electron. 2025, 72, 610–619. [Google Scholar] [CrossRef]
  13. Wu, Y.; Wang, Y.Y.; Xie, X.P.; Wu, Z.G.; Yan, H.C. Adaptive Reinforcement Learning Strategy-Based Sliding Mode Control of Uncertain Euler–Lagrange Systems With Prescribed Performance Guarantees: Autonomous Underwater Vehicles-Based Verification. IEEE Trans. Fuzzy Syst. 2024, 32, 6160–6171. [Google Scholar] [CrossRef]
  14. Boiko, I.; Fridman, L. Analysis of chattering in continuous sliding-mode controllers. IEEE Trans. Autom. Control 2005, 50, 1442–1446. [Google Scholar] [CrossRef]
  15. Zhang, Z.; Guo, Y.; Zhu, S.; Liu, J.; Gong, D. Adaptive integral sliding-mode finite-time control with integrated extended state observer for uncertain nonlinear systems. Inf. Sci. 2024, 667, 120456. [Google Scholar] [CrossRef]
  16. Zha, M.X.; Wang, H.P.; Tian, Y.; He, D.X.; Wei, Y.C. A novel hybrid observer-based model-free adaptive high-order terminal sliding mode control for robot manipulators with prescribed performance. Int. J. Robust Nonlinear Control 2024, 34, 11655–11680. [Google Scholar] [CrossRef]
  17. Tran, X.T.; Kang, H.J. Adaptive Hybrid High-Order Terminal Sliding Mode Control of MIMO Uncertain Nonlinear Systems and Its Application to Robot Manipulators. Int. J. Precis. Eng. Manuf. 2015, 16, 255–266. [Google Scholar] [CrossRef]
  18. Baban, P.Q.; Ahangari, M.E. Adaptive terminal sliding mode control of a non-holonomic wheeled mobile robot. Int. J. Veh. Inform. Commun. Syst. 2024, 9, 335–356. [Google Scholar]
  19. Alattas, K.A.; Vu, M.T.; Mofid, O.; El-Sousy, F.F.M.; Alanazi, A.K.; Awrejcewicz, J.; Mobayen, S. Adaptive Nonsingular Terminal Sliding Mode Control for Performance Improvement of Perturbed Nonlinear Systems. Mathematics 2022, 10, 1064. [Google Scholar] [CrossRef]
  20. Hu, K.X.; Ma, Z.J.; Zou, S.L.; Li, J.; Ding, H.R. Impedance Sliding-Mode Control Based on Stiffness Scheduling for Rehabilitation Robot Systems. Cyborg Bionic Syst. 2024, 5, 0099. [Google Scholar] [CrossRef]
  21. Tian, G.T.; Tan, J.; Li, B.; Duan, G.R. Optimal Fully Actuated System Approach-Based Trajectory Tracking Control for Robot Manipulators. IEEE Trans. Cybern. 2024, 54, 7469–7478. [Google Scholar] [CrossRef]
  22. Yin, Y.; Ning, X.; Xia, D. Adaptive output-feedback fault-tolerant control for space manipulator via actor-critic learning. Adv. Space Res. 2025, 75, 3914–3932. [Google Scholar] [CrossRef]
  23. Rahimi Nohooji, H.; Zaraki, A.; Voos, H. Actor–critic learning based PID control for robotic manipulators. Appl. Soft Comput. 2024, 151, 111153. [Google Scholar] [CrossRef]
  24. Liang, X.; Yao, Z.; Ge, Y.; Yao, J. Disturbance observer based actor-critic learning control for uncertain nonlinear systems. Chin. J. Aeronaut. 2023, 36, 271–280. [Google Scholar] [CrossRef]
  25. Fan, Y.; Yang, C.; Li, Y. Fixed-Time Neuro-Optimal Adaptive Control With Input Saturation for Uncertain Robots. IEEE Internet Things J. 2024, 11, 28906–28917. [Google Scholar] [CrossRef]
  26. Xie, Z.C.; Sun, T.; Kwan, T.; Wu, X.F. Motion control of a space manipulator using fuzzy sliding mode control with reinforcement learning. Acta Astronaut. 2020, 176, 156–172. [Google Scholar] [CrossRef]
  27. Guo, Y.; Huang, H. Approximate optimal and safe coordination of nonlinear second-order multirobot systems with model uncertainties. ISA Trans. 2024, 149, 155–167. [Google Scholar] [CrossRef]
  28. Ma, B.; Li, Y.C. Compensator-critic structure-based event-triggered decentralized tracking control of modular robot manipulators: Theory and experimental verification. Complex Intell. Syst. 2022, 8, 1913–1927. [Google Scholar] [CrossRef]
  29. Ma, B.; Dong, B.; Zhou, F.; Li, Y.C. Adaptive Dynamic Programming-Based Fault-Tolerant Position-Force Control of Constrained Reconfigurable Manipulators. IEEE Access 2020, 8, 183286–183299. [Google Scholar] [CrossRef]
  30. Duc, D.N.; Khac, L.L.; Tan, L.N. ADP-Based H∞ Optimal Control of Robot Manipulators With Asymmetric Input Constraints and Disturbances. IEEE Access 2024, 12, 67809–67819. [Google Scholar] [CrossRef]
  31. Liu, D.; Yang, X.; Wang, D.; Wei, Q. Reinforcement-Learning-Based Robust Controller Design for Continuous-Time Uncertain Nonlinear Systems Subject to Input Constraints. IEEE Trans. Cybern. 2015, 45, 1372–1385. [Google Scholar] [CrossRef] [PubMed]
  32. Liu, H.; Huang, H.; Tian, X.; Zhang, J. Distributed fixed-time formation control for UAV-USV multiagent systems based on the FEWNN with prescribed performance. Ocean Eng. 2025, 328, 120996. [Google Scholar] [CrossRef]
  33. Liu, H.; Feng, Z.; Tian, X.; Mai, Q. Adaptive predefined-time specific performance control for underactuated multi-AUVs: An edge computing-based optimized RL method. Ocean Eng. 2025, 318, 120048. [Google Scholar] [CrossRef]
  34. Wang, M.; Yang, A. Dynamic Learning From Adaptive Neural Control of Robot Manipulators With Prescribed Performance. IEEE Trans. Syst. Man Cybern. Syst. 2017, 47, 2244–2255. [Google Scholar] [CrossRef]
  35. Golestani, M.; Chhabra, R.; Esmaeilzadeh, M. Finite-Time Nonlinear H∞ Control of Robot Manipulators With Prescribed Performance. IEEE Control Syst. Lett. 2023, 7, 1363–1368. [Google Scholar] [CrossRef]
  36. Xia, Y.; Yuan, Y.; Sun, W. Finite-Time Adaptive Fault-Tolerant Control for Robot Manipulators With Guaranteed Transient Performance. IEEE Trans. Ind. Inform. 2025, 21, 3336–3345. [Google Scholar] [CrossRef]
  37. Guo, Q.; Zhang, Y.; Celler, B.G.; Su, S.W. Neural Adaptive Backstepping Control of a Robotic Manipulator With Prescribed Performance Constraint. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3572–3583. [Google Scholar] [CrossRef]
  38. Zhu, C.; Jiang, Y.; Yang, C. Fixed-Time Neural Control of Robot Manipulator With Global Stability and Guaranteed Transient Performance. IEEE Trans. Ind. Electron. 2023, 70, 803–812. [Google Scholar] [CrossRef]
  39. Liu, X.; Zhang, H.; Sun, J.; Guo, X. Dynamic Threshold Finite-Time Prescribed Performance Control for Nonlinear Systems With Dead-Zone Output. IEEE Trans. Cybern. 2024, 54, 655–664. [Google Scholar] [CrossRef] [PubMed]
  40. Xu, Z.; Zhao, L. Distributed Adaptive Gain-Varying Finite-Time Event-Triggered Control for Multiple Robot Manipulators With Disturbances. IEEE Trans. Ind. Inform. 2023, 19, 9302–9313. [Google Scholar] [CrossRef]
  41. Ma, Z.; Ma, H. Adaptive Fuzzy Backstepping Dynamic Surface Control of Strict-Feedback Fractional-Order Uncertain Nonlinear Systems. IEEE Trans. Fuzzy Syst. 2020, 28, 122–133. [Google Scholar] [CrossRef]
  42. Mei, K.; Ding, S.; Dai, X.; Chen, C.C. Design of Second-Order Sliding-Mode Controller via Output Feedback. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 4371–4380. [Google Scholar] [CrossRef]
  43. Ma, Q.; Jin, P.; Lewis, F.L. Guaranteed Cost Attitude Tracking Control for Uncertain Quadrotor Unmanned Aerial Vehicle Under Safety Constraints. IEEE/CAA J. Autom. Sin. 2024, 11, 1447–1457. [Google Scholar] [CrossRef]
  44. Lewis, F.; Yesildirak, A.; Jagannathan, S. Neural Network Control of Robot Manipulators and Non-Linear Systems; Taylor & Francis: London, UK, 1998. [Google Scholar]
Figure 1. Overall control scheme of the proposed method.
Figure 2. The critic-only NN structure.
Figure 3. Interaction of multiple robotic manipulators with two degrees of freedom.
Figure 4. Multiple robotic manipulator systems with a coordinate diagram.
Figure 5. Angular consistency effect.
Figure 6. Trajectory errors of the system.
Figure 7. Control inputs to the system.
Figure 8. Observation errors of the system.
Figure 9. Learning process of the critic-only NN weights for robotic manipulator 1.
Figure 10. Comparison of trajectory convergence velocities from 0 to 10 s for the four strategies.
Figure 11. Comparison of trajectory convergence velocities from 0 to 10 s for the four strategies.
Figure 12. Comparison of the error convergence domain at 20–350 s for the four strategies.
Figure 13. Comparison of the error convergence domain at 20–350 s for the four strategies.
Figure 14. Comparison of control inputs in the range of 20–350 s for the four strategies.
Figure 15. Comparison of control inputs in the range of 20–350 s for the four strategies.
Figure 16. Comparison of the convergence velocity for the two strategies.
Figure 17. Comparison of the convergence velocity for the two strategies.
Figure 18. Comparison of the convergence domain for the two strategies.
Figure 19. Comparison of the convergence domain for the two strategies.
Table 1. Abbreviated table.
ParametersSignificanceParametersSignificance
q i , s position i , s performance index
q ˙ i , s velocity N i , s cost function
q ¨ i , s acceleration vectors H i , s Hamilton–Jacobi–Bellman
M i inertia matrix C i , s 0 unknown continuous function
C i centripetal and Coriolis force term τ i , s / τ ^ i , s ideal/estimation of control input
G i gravity vector C i , s 0 unknown continuous function
τ i , s input torque vectors E i , s angle error
Γ i external disturbance ξ i , s approximation error
τ h i , s input of fault model φ i , s basis function vector
Φ i , s bias fault of actuator p i , s RL Bellman residuals
i , s additive fault of actuator γ ^ 1 / γ ^ 2 adaptive parameters
q L , s leader’s trajectory V Lyapunov function
v 0 leader’s velocity a i j communication among agents
h 0 leader’s acceleration β i , s auxiliary function
q ^ i , s follower’s observer trajectory Z i , s conversion error
v i , s follower’s observer velocity σ i , s sliding mode variable
h i , s follower’s observer acceleration set of real numbers
θ / θ ^ / θ ˜ ideal/estimation/error of adaptive parameters b i , s communication among leader and agents
E t i , s quantization error * Euclidean norm
n n -dimensional Euclidean space n × n n × n -dimensional Euclidean space
I identity matrix
Table 2. Model parameters of multiple robotic manipulators.
RMS | J_{i,1} (kg·m²) | J_{i,2} (kg·m²) | m_{i,1} (kg) | m_{i,2} (kg) | r_{i,1} (m) | r_{i,2} (m) | r_{i,a1} (m) | r_{i,a2} (m)
R-1 | 0.6 | 0.8 | 1.5 | 2.0 | 1.7 | 1.5 | 2.2 | 2.8
R-2 | 0.5 | 0.6 | 1.1 | 1.2 | 1.8 | 1.5 | 2.2 | 2.8
R-3 | 0.6 | 0.7 | 2.1 | 2.0 | 2.9 | 2.3 | 1.0 | 0.8
R-4 | 0.8 | 0.9 | 2.0 | 1.9 | 2.6 | 2.6 | 1.3 | 0.9
R-5 | 0.7 | 0.8 | 3.9 | 3.0 | 3.7 | 3.4 | 3.1 | 3.8
R-6 | 0.5 | 0.6 | 3.2 | 3.7 | 3.8 | 3.8 | 3.4 | 3.9
R-7 | 0.9 | 0.9 | 2.8 | 2.1 | 2.6 | 2.8 | 2.3 | 1.8
Table 3. Initial parameters and initial conditions of the system.
System state | q_L = [0.3 sin(0.08t) + 0.2, 0.3 sin(0.08t) − 0.2]^T, G_i(q_i) = [9.8, 9.8]^T, q̂_i(0) = [0.2, 0.2]^T, q_i(0) = [2, 2]^T, q̇_i(0) = [0, 0]^T, q̈_i(0) = [0, 0]^T, v_i(0) = [0, 0]^T, h_i(0) = [0, 0]^T, Γ_i = [0.25 sin 3t, 0.25 sin 3t]^T
Observer | g_1 = 5, g_2 = 4, g_3 = 7, g_4 = 8
QPPC | β_{oi,s} = 10, β_{i,s} = 0.5, T_a = 20
AGITSMC | a_1 = 10, a_2 = 7, b_3 = 0.2, c_1 = 2, c_2 = 2
CNNODP | φ_{i,s} = [0.1 q_{i,s} q̇_{i,s}, q_{i,s} q̂̇_{i,s}, 0.1 q̇_{i,s} q̇_{i,s}, 0.1 q̇_{i,s} q̂_{i,s}, 0.1 q̇_{i,s} q̂̇_{i,s}, q̂_{i,s} q̂̇_{i,s}, q̂̇_{i,s} q̂̇_{i,s}]^T, λ_{i,s} = 10, p_{0,s} = 1, w = 0.1, T_w = 0.05, L = 10
Fault control | additive actuator fault = 0.05 + 0.05 sin(0.8t), k_{1,1} = 0.1, k_{1,2} = 50, k_{2,1} = 0.1, k_{2,2} = 20, Φ_{i,s} = 0.01, γ̂_{1i,s}(0) = [0, 0]^T, γ̂_{2i,s}(0) = [0, 0]^T
Table 4. The parameters of the comparative experiment.
Case | Parameters
Case 1 | k_1 = 7, k_2 = k_3 = 2
Case 2 | k_4 = 0.2, k_5 = 2
Case 3 | k_6 = 9, k_8 = 7, k_7 = k_9 = 2
Table 5. Comparison of three control performance indicators for the four strategies.
Strategies | IAE (Joint 1) | IAE (Joint 2) | ITAE (Joint 1) | ITAE (Joint 2) | ISV (Joint 1) | ISV (Joint 2)
Case 1 | 3.8499 | 2.8982 | 108.4324 | 102.3203 | 43.3315 | 48.9875
Case 2 | 4.8416 | 3.3159 | 156.3586 | 124.0548 | 45.3231 | 51.1957
Case 3 | 4.5843 | 3.1525 | 133.4075 | 114.3752 | 49.8185 | 53.9525
AGITSMC | 3.6090 | 2.7152 | 96.6866 | 95.7613 | 41.5689 | 46.5755