Article

Adaptive Non-Singular Fast Terminal Sliding Mode Trajectory Tracking Control for Robotic Manipulator with Novel Configuration Based on TD3 Deep Reinforcement Learning and Nonlinear Disturbance Observer

1 School of Mechanical Engineering, State Key Laboratory of Advanced Equipment and Technology for Metal Forming, Key Laboratory of High-Efficiency and Clean Mechanical Manufacture of Ministry of Education, National Demonstration Center for Experimental Mechanical Engineering Education, Shandong University, Jinan 250061, China
2 Institute of Marine Science and Technology, Shandong Key Laboratory of Intelligent Marine Engineering Geology, Environment and Equipment, Shandong University, Qingdao 266237, China
* Author to whom correspondence should be addressed.
Sensors 2026, 26(1), 297; https://doi.org/10.3390/s26010297
Submission received: 3 November 2025 / Revised: 27 November 2025 / Accepted: 30 December 2025 / Published: 2 January 2026
(This article belongs to the Section Sensors and Robotics)

Abstract

This work proposes a non-singular fast terminal sliding mode control (NFTSMC) strategy based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm and a nonlinear disturbance observer (NDO) to address the issues of modeling errors, motion disturbances, and transmission friction in robotic manipulators. Firstly, a novel modular serial 5-DOF robotic manipulator configuration is designed, and its kinematic and dynamic models are established. Secondly, a nonlinear disturbance observer is employed to estimate the total disturbance of the system and apply feedforward compensation. Based on boundary layer technology, an improved NFTSMC method is proposed to accelerate the convergence of tracking errors, reduce chattering, and avoid singularity issues inherent in traditional terminal sliding mode control. The stability of the designed control system is proved using Lyapunov stability theory. Subsequently, a deep reinforcement learning (DRL) agent based on the TD3 algorithm is trained to adaptively adjust the control gains of the non-singular fast terminal sliding mode controller. The dynamic information of the robotic manipulator is used as the input to the TD3 agent, which searches for optimal controller parameters within a continuous action space. A composite reward function is designed to ensure the stable and efficient learning of the TD3 agent. Finally, the motion characteristics of three joints for the designed 5-DOF robotic manipulator are analyzed. 
The results show that compared to the non-singular fast terminal sliding mode control algorithm based on a nonlinear disturbance observer (NDONFT), the non-singular fast terminal sliding mode control algorithm integrating a nonlinear disturbance observer and the Twin Delayed Deep Deterministic Policy Gradient algorithm (TD3NDONFT) reduces the mean absolute error of position tracking for the three joints by 7.14%, 19.94%, and 6.14%, respectively, and reduces the mean absolute error of velocity tracking by 1.78%, 9.10%, and 2.11%, respectively. These results verify the effectiveness of the proposed algorithm in enhancing the trajectory tracking accuracy of the robotic manipulator under unknown time-varying disturbances and demonstrate its strong robustness against sudden disturbances.

1. Introduction

As the core executive units of modern industrial automation systems, robotic manipulators have expanded their application scenarios from traditional structured environments such as welding processing [1] and material handling [2] to highly dynamic unstructured environments including surgical assistance [3], space exploration [4], and service robots [5]. During the execution of complex tasks, the end-effector of the robotic manipulator must accurately complete high-precision spatial movements along a predefined trajectory. However, robotic manipulators are highly complex multi-input multi-output systems characterized by strong nonlinearity, time-varying dynamics, and strong coupling [6]. Additionally, the structural parameters of the robotic manipulator are difficult to obtain accurately, and the dynamic model is partially unknown, resulting in significant uncertainties in the mathematical state-space model describing the motion of the robotic manipulators [7]. Functional configuration design and high-precision trajectory tracking control are effective approaches to address these challenges.
To better achieve trajectory tracking control of robotic manipulators, researchers have proposed various control methods, including fuzzy control [8,9], sliding mode control [10,11], neural network control [12,13], reinforcement learning control [14,15], and model predictive control [16,17].
Xian et al. [18] established a dynamic model for coordinated robotic manipulators using fuzzy set theory and developed an approximate constrained following servo control to ensure the consistent boundedness and consistent ultimate boundedness of the controlled system. They selected optimal parameters by solving fuzzy-based performance metrics. Zhang et al. [19] proposed a three-dimensional fuzzy active disturbance rejection controller for the mechanical arm of an iron roughneck by adding a three-dimensional fuzzy module to the classical ADRC. The controller output is adjusted based on tracking differential error, error rate of change, and error acceleration. Obadina et al. [20] developed a hybrid optimization algorithm for gray-box model identification of robotic manipulators and applied the model for real-time fuzzy trajectory tracking control. Jiang et al. [21] combined the Beetle Antenna Search (BAS) algorithm with Particle Swarm Optimization (PSO), introducing fractional calculus to dynamically adjust inertia weight and fractional order, achieving high-precision trajectory tracking control for robotic manipulators. However, fuzzy strategies overly rely on human experience, and parameter selection based on intelligent algorithms is prone to being trapped in local optima.
Sun et al. [22] employed radial basis function neural networks (RBFNN) to compensate online for parameter uncertainty and unknown dynamics of the system. Chen et al. [23] optimized RBFNN parameters using an immune algorithm and designed adaptive-rate compensation for unknown backlash-like hysteresis errors and RBFNN approximation errors. However, the high computational complexity of neural network algorithms results in poor real-time tracking performance and presents physical implementation challenges. Carron et al. [24] combined inverse dynamics feedback linearization and data-driven error models with model predictive control. They processed offline data using Gaussian filtering and utilized extended Kalman filtering to estimate residual disturbances online, achieving bias-free tracking of the robotic manipulator. Kang [25] proposed an event-triggered model predictive control strategy for robotic manipulators with model uncertainty and input constraints. Trigger conditions were defined based on the weights of the predictive model and predicted tracking error. However, MPC is sensitive to system modeling accuracy, and its time-domain optimization mechanism only solves local optima within a finite time interval, making it difficult to ensure global asymptotic stability of the closed-loop system.
Sliding mode control has been widely applied in robotic manipulator control due to its simple physical implementation, strong robustness, and fast transient response [26]. Kali et al. [27] used a linear sliding mode surface for the trajectory tracking control of robotic manipulators, but this method only guarantees asymptotic convergence of the state [28]. Rapid convergence requires high gains, whereas terminal sliding mode control can ensure system convergence within finite time. Xu et al. [29] designed a discrete integral terminal sliding mode control law that incorporates delayed estimation of unknown disturbances and discretization errors in the robotic manipulator system and introduces an adaptive switching term to suppress sliding mode chattering effects. However, the chattering issue in sliding mode control can cause actuator damage or even system instability. To reduce or eliminate chattering, Makrini et al. [30] designed a variable boundary layer (BLT) sliding mode control method, which achieves joint safety and high-performance control by adjusting torque limit parameters and the expansion factor of the variable boundary layer.
Additionally, since certain system states are unmeasurable in practical applications, system uncertainty can affect transient performance and even lead to instability in robotic manipulator systems. To better achieve disturbance compensation, Fan et al. [31] proposed a fuzzy control strategy based on a spatial extended state observer to address the trajectory tracking control problem of robotic manipulator capturing a floating object in a microgravity environment. Yin et al. [32] proposed an adaptive non-singular terminal sliding mode control method based on NDO to observe the internal modeling errors of robotic manipulator and the external unknown time-varying disturbances acting on each joint for feedforward compensation. Zha et al. [33] estimated lumped uncertainties of the entire system using a hybrid observer composed of an adaptive time delay estimator based on gradient compensation and a second-order adaptive sliding mode observer. They then constructed an adaptive integral non-singular fast terminal sliding mode algorithm based on the backstepping method to stabilize the system and reduce chattering. Hu et al. [34] achieved real-time disturbance estimation and compensation by combining load torque estimation based on an improved harmonic drive flexibility model with friction compensation using a hybrid friction model, effectively improving trajectory tracking performance of the robotic manipulator under high-speed variable load conditions.
The aforementioned methods have achieved good results in robotic manipulators, but with the rapid development of artificial intelligence technology, data-driven machine learning approaches offer new insights for the intelligent tracking control of robotic manipulators. Reinforcement learning does not require the establishment of precise dynamic models, which makes it advantageous for solving sequential decision-making problems under highly nonlinear and uncertain conditions [35]. Viswanadhapalli et al. [36] utilized deep reinforcement learning control based on the Deep Deterministic Policy Gradient (DDPG) framework to achieve precise servo tracking of a flexible robotic manipulator, and evaluated the controller performance through hardware-in-the-loop (HIL) testing. Lu et al. [37] proposed an adaptive proportional-integral robust control method based on DDPG, which searches for the optimal controller parameters in a continuous action space using the dynamic information of the robotic manipulator. They also designed a reward function that combines a Gaussian function with Euclidean distance to ensure stable and efficient agent learning. Ren et al. [38] proposed an adaptive sliding mode control method based on DDPG reinforcement learning, leveraging RL autonomous learning capabilities to adaptively adjust the key parameters of the controller online. However, the DDPG strategy is susceptible to Q-value overestimation and insufficient exploration efficiency. The TD3 algorithm introduces a double Q-network, target policy noise, and delayed policy updates, achieving significant improvements in mitigating over-bias in action-value function estimation and enhancing policy stability. Zhu et al. [39] designed an adaptive sliding mode controller based on TD3 parameter optimization for variable-speed trajectory tracking of underactuated vessels in scenarios involving model uncertainty and external environmental disturbances. Fan et al. [40] addressed path tracking of unmanned underwater vehicles by integrating an improved experience replay strategy into TD3 while enhancing learning efficiency through refined regularization methods and dynamic reward functions, achieving faster convergence and superior tracking performance compared with mainstream classical DRL approaches.
Because robotic manipulators exhibit highly nonlinear, strongly coupled multivariable dynamics and their physical parameters are difficult to obtain accurately, establishing an accurate dynamic model is extremely challenging. At the same time, the time-varying and uncertain nature of friction and external disturbances further complicates modeling. Although traditional control methods achieve satisfactory performance in known systems, their adaptability remains insufficient for unknown or uncertain systems. In order to address the trajectory tracking problem of robotic manipulators under unknown time-varying disturbances and modeling uncertainties, this paper proposes an improved NFTSMC method based on the TD3 algorithm and NDO. The main contributions are summarized as follows:
  • For tasks such as precision assembly and high-accuracy positioning that impose strict requirements on position and velocity control, a novel 5-DOF robotic manipulator configuration is designed, which reduces the end-effector load by positioning the joint motors at the front and simplifies dynamic model computation by introducing a prismatic joint.
  • A nonlinear disturbance observer is employed to estimate internal modeling errors and external unknown time-varying disturbances of the robotic manipulator, followed by feedforward compensation. An improved nonsingular fast terminal sliding mode control law is designed based on boundary layer technique, and global system stability is analyzed using Lyapunov theory.
  • Adaptive control is achieved through DRL by proposing an adaptive NDONFT control method based on the TD3 algorithm, which employs a dual Q-network structure and selects the minimum Q-value to effectively mitigate value overestimation in DRL. The learning efficiency and stability of the agent are enhanced by modifying the reward function, thereby avoiding convergence to local optima.
  • Using the three joints of the designed 5-DOF robotic manipulator as examples, this paper verifies that the proposed method achieves excellent trajectory tracking performance and demonstrates stronger robustness against external unknown sudden disturbances.
The remainder of this paper is organized as follows: Section 2 presents the system principles and mathematical models of the robotic manipulator; Section 3 introduces the design process and stability analysis of the TD3NDONFT algorithm-based controller; Section 4 analyzes the simulation results; and Section 5 summarizes the main contributions of this paper.

2. System Principles and Mathematical Models

2.1. System Principles

A 5-DOF robotic manipulator with a modular serial configuration is designed to meet the stringent position and velocity control requirements of precision assembly and high-accuracy positioning tasks, as shown in Figure 1a. The manipulator consists of a base, a lifting mechanism, a main arm, a secondary arm, and an end-effector gripper. The degrees of freedom of each joint are as follows: horizontal rotation of the base about a vertical axis (DOF1), linear translation of the lifting platform along a vertical rail (DOF2), pitch motion of the main arm about a horizontal axis (DOF3), independent pitch adjustment of the secondary arm about a parallel horizontal axis (DOF4), and rotation of the end-effector about a vertical axis (DOF5). DOF2 employs a screw-and-rail composite transmission mechanism, utilizing high-precision screw drives and low-friction rails to significantly enhance vertical positioning resolution and repeatability. The pitch joint of the secondary arm (DOF4) transmits power through two-stage bevel gear meshing, forming a front-mounted drive unit layout that effectively reduces end-effector inertia and enhances dynamic response capability. This arrangement accommodates high-speed, variable-acceleration trajectory tracking requirements. The overall symmetrical structure and compact transmission chain design reduce kinematic coupling effects and simplify the inverse kinematics model solution process. This configuration, through high-rigidity transmission, low-inertia drive, and decoupled motion chain design, lays the foundation for position closed-loop control in trajectory tracking.
In robotic manipulator modeling, the Denavit–Hartenberg (DH) method is a commonly used technique for describing the geometric structure and kinematic relationships of robotic manipulators. Compared to the standard DH method, the modified DH (MDH) method used in this study more accurately characterizes the geometric and kinematic features of the manipulator [41]. Table 1 lists the DH parameters of the manipulator based on the modified DH method. In this paper, the base coordinate system established using the MDH method is assumed to be the zero position, as shown in Figure 1b.
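As a concrete illustration of the MDH convention referenced above, the sketch below (Python/NumPy; the helper names and the example joint tuples are ours for illustration, and the actual Table 1 parameters are not reproduced here) builds the per-link homogeneous transform Rx(α)·Tx(a)·Rz(θ)·Tz(d) and chains the links into a forward-kinematics map:

```python
import numpy as np

def mdh_transform(alpha, a, theta, d):
    """Transform from link i-1 to link i in the modified DH (MDH)
    convention: Rx(alpha_{i-1}) * Tx(a_{i-1}) * Rz(theta_i) * Tz(d_i)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    ct, st = np.cos(theta), np.sin(theta)
    return np.array([
        [ct,      -st,      0.0,  a],
        [st * ca,  ct * ca, -sa, -sa * d],
        [st * sa,  ct * sa,  ca,  ca * d],
        [0.0,      0.0,      0.0, 1.0],
    ])

def forward_kinematics(mdh_rows):
    """Chain per-link transforms; mdh_rows is a list of
    (alpha_{i-1}, a_{i-1}, theta_i, d_i) tuples, one per joint."""
    T = np.eye(4)
    for row in mdh_rows:
        T = T @ mdh_transform(*row)
    return T
```

For a revolute joint, theta_i is the joint variable; for the prismatic joint DOF2, d_i varies instead, e.g. `forward_kinematics([(0, 0, 0, d2)])` yields a pure translation along the local z-axis.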

2.2. Dynamic Modeling

According to the Lagrange method, the dynamic model of an n-link robotic manipulator can be expressed as follows:
M(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) + F_f(\dot{q}) = \tau + \tau_d        (1)
where q, \dot{q}, \ddot{q} \in R^n are the joint position, velocity, and acceleration vectors of the robotic manipulator system, respectively; M(q) \in R^{n \times n} is the symmetric positive definite inertia matrix; C(q,\dot{q}) \in R^{n \times n} is the Coriolis and centrifugal force matrix; G(q) \in R^n is the gravity vector; F_f(\dot{q}) \in R^n is the friction vector; \tau_d \in R^n represents external time-varying disturbances; and \tau \in R^n is the joint torque input. Owing to the complexity of the manipulator structure, environmental variations, and measurement errors, accurate values of M(q), C(q,\dot{q}), and G(q) are generally difficult to obtain. They are therefore decomposed as:
M(q) = M_0(q) + \Delta M(q), \quad C(q,\dot{q}) = C_0(q,\dot{q}) + \Delta C(q,\dot{q}), \quad G(q) = G_0(q) + \Delta G(q)        (2)
where M_0(q), C_0(q,\dot{q}), G_0(q) are the nominal terms of the dynamic equation and \Delta M(q), \Delta C(q,\dot{q}), \Delta G(q) are the uncertainty terms, i.e., the modeling errors. Equation (1) can therefore be expressed as:
M_0(q)\ddot{q} + C_0(q,\dot{q})\dot{q} + G_0(q) + \Delta E(q,\dot{q},\ddot{q}) + F_f(\dot{q}) = \tau + \tau_d        (3)
\Delta E(q,\dot{q},\ddot{q}) = \Delta M(q)\ddot{q} + \Delta C(q,\dot{q})\dot{q} + \Delta G(q)        (4)
where \Delta E(q,\dot{q},\ddot{q}) is the total modeling error of the system. Defining F(q,\dot{q},\ddot{q}) = \tau_d - \Delta E(q,\dot{q},\ddot{q}) - F_f(\dot{q}) as the total disturbance, which lumps external disturbances, internal modeling errors, and friction, Equation (3) can be rewritten as:
M_0(q)\ddot{q} + C_0(q,\dot{q})\dot{q} + G_0(q) = \tau + F(q,\dot{q},\ddot{q})        (5)
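Given the nominal terms of Eq. (5), the joint accelerations follow from a single linear solve; a minimal sketch (the function name and the toy matrices in the usage below are illustrative, not the paper's implementation):

```python
import numpy as np

def forward_dynamics(M0, C0, G0, tau, F, qdot):
    """Solve the nominal model of Eq. (5),
    M0(q) qdd + C0(q, qdot) qdot + G0(q) = tau + F,
    for the joint acceleration vector qdd."""
    return np.linalg.solve(M0, tau + F - C0 @ qdot - G0)
```

For example, with M0 = I, no Coriolis terms, and a torque that exactly cancels gravity on the second joint, only the first joint accelerates.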
The purpose of this paper is to design a controller \tau for the robotic manipulator system that enables the joint positions q = (q_1, q_2, \ldots, q_n)^T and joint velocities \dot{q} = (\dot{q}_1, \dot{q}_2, \ldots, \dot{q}_n)^T of an n-DOF robotic manipulator to achieve high-precision tracking of the desired trajectory positions q_d = (q_{d1}, q_{d2}, \ldots, q_{dn})^T and velocities \dot{q}_d = (\dot{q}_{d1}, \dot{q}_{d2}, \ldots, \dot{q}_{dn})^T under disturbances. To achieve this goal, the following assumptions [25,32] are made regarding system (1).
Assumption 1. 
For q \in R^n, the symmetric positive definite inertia matrix M(q) is uniformly bounded; i.e., there exist positive constants m_1, m_2 such that the following inequality holds:
m_1 \|x\|^2 \le x^T M(q) x \le m_2 \|x\|^2        (6)
Assumption 2. 
For q \in R^n, the matrix \dot{M}(q) - 2C(q,\dot{q}) is skew-symmetric; i.e., it satisfies:
x^T \left( \dot{M}(q) - 2C(q,\dot{q}) \right) x = 0, \quad \forall x \in R^n        (7)
Assumption 3. 
In practice, for q \in R^n, the gravity vector G(q) is always bounded.

3. TD3NDONFT Controller Design

In order to achieve trajectory tracking control of robotic manipulators under complex environmental influences, this paper proposes a non-singular fast terminal sliding mode control method based on NDO and DRL, as shown in Figure 2. The NDO is used to estimate and compensate for the lumped disturbance, while the autonomous learning capability of DRL is employed to learn the controller parameters.
In the n-link robotic manipulator system, let the desired joint position, velocity, and acceleration be q d , q ˙ d , q ¨ d , respectively, and the actual joint position, velocity, and acceleration be q , q ˙ , q ¨ , respectively. Then, the tracking error and its derivative are defined as follows:
e = q_d - q, \quad \dot{e} = \dot{q}_d - \dot{q}, \quad \ddot{e} = \ddot{q}_d - \ddot{q}        (8)
The error state variables \varepsilon_1 and \varepsilon_2 are constructed from the joint position error e, velocity error \dot{e}, and acceleration error \ddot{e}, i.e., \varepsilon_1 = e, \varepsilon_2 = \dot{\varepsilon}_1 = \dot{e}, \dot{\varepsilon}_2 = \ddot{e}. The error dynamics of the manipulator can then be written as:
\dot{\varepsilon}_1 = \varepsilon_2, \quad \dot{\varepsilon}_2 = \ddot{q}_d - M_0^{-1}(q)\left[ \tau - C_0(q,\dot{q})\dot{q} - G_0(q) + F(q,\dot{q},\ddot{q}) \right]        (9)

3.1. Design of Nonlinear Disturbance Observer

The disturbance is estimated by correcting the estimated value based on the difference between the estimated output and the actual output. Combining with Equation (5), the following nonlinear disturbance observer is designed:
\dot{\hat{F}}(q,\dot{q},\ddot{q}) = L(q,\dot{q})\left( F(q,\dot{q},\ddot{q}) - \hat{F}(q,\dot{q},\ddot{q}) \right) = -L(q,\dot{q})\hat{F}(q,\dot{q},\ddot{q}) + L(q,\dot{q})\left( M_0(q)\ddot{q} + C_0(q,\dot{q})\dot{q} + G_0(q) - \tau \right)        (10)
where L ( q , q ˙ ) is the gain matrix of the observer.
Equation (10) involves acceleration signals that cannot be accurately obtained by directly differentiating velocity signals in practical applications, because velocity signals inevitably contain sensor noise, quantization errors, and sampling jitter; differentiation amplifies their high-frequency components, which significantly increases the influence of noise and severely distorts the estimated acceleration. Furthermore, owing to the strong coupling of the manipulator system and the presence of external disturbances, the observer must be reformulated so that it accurately estimates the lumped disturbance without acceleration measurements and feeds the estimate back to the controller for compensation.
Define auxiliary parameter variables as:
z = \hat{F}(q,\dot{q},\ddot{q}) - p(q,\dot{q})        (11)
where p(q,\dot{q}) is a designed function vector. Following [42], the gain matrix L(q,\dot{q}) and the function vector p(q,\dot{q}) of the disturbance observer are chosen as:
L(q,\dot{q}) = X^{-1} M_0^{-1}(q), \quad p(q,\dot{q}) = X^{-1} \dot{q}        (12)
where X is an invertible matrix.
Substituting Equation (12) into Equation (11) and differentiating yields:
\dot{z} = \dot{\hat{F}}(q,\dot{q},\ddot{q}) - \dot{p}(q,\dot{q}) = \dot{\hat{F}}(q,\dot{q},\ddot{q}) - L(q,\dot{q}) M_0(q) \ddot{q}        (13)
Substituting Equation (10) into Equation (13) yields:
\dot{z} = -L(q,\dot{q})\hat{F}(q,\dot{q},\ddot{q}) + L(q,\dot{q})\left( M_0(q)\ddot{q} + C_0(q,\dot{q})\dot{q} + G_0(q) - \tau \right) - L(q,\dot{q}) M_0(q) \ddot{q} = L(q,\dot{q})\left( C_0(q,\dot{q})\dot{q} + G_0(q) - \tau \right) - L(q,\dot{q})\hat{F}(q,\dot{q},\ddot{q})        (14)
In summary, Equations (11)–(14) constitute the designed nonlinear disturbance observer.
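One Euler-discretized step of the observer of Eqs. (11)-(14) can be sketched as follows (hypothetical helper, not the authors' code; `X_inv` plays the role of X^{-1}, and the disturbance estimate is recovered as F̂ = z + p, so no acceleration measurement is needed):

```python
import numpy as np

def ndo_step(z, qdot, tau, M0, C0, G0, X_inv, dt):
    """One Euler step of the nonlinear disturbance observer.
    z is the auxiliary state of Eq. (11); returns (z_next, F_hat)."""
    L = X_inv @ np.linalg.inv(M0)   # observer gain, Eq. (12)
    p = X_inv @ qdot                # auxiliary function vector, Eq. (12)
    F_hat = z + p                   # disturbance estimate, Eq. (11)
    # Auxiliary-state dynamics, Eq. (14): acceleration-free update.
    z_dot = L @ (C0 @ qdot + G0 - tau) - L @ F_hat
    return z + dt * z_dot, F_hat
```

In a 1-DOF simulation with a constant lumped disturbance, F̂ converges to the true value at the exponential rate set by X^{-1}, consistent with Eq. (21).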
In order to enable the nonlinear disturbance observer to estimate disturbances accurately, the linear matrix inequality (LMI) method is introduced to solve for L(q,\dot{q}) and p(q,\dot{q}). For brevity, the notation L(q,\dot{q}) = L, p(q,\dot{q}) = p, F(q,\dot{q},\ddot{q}) = F is used in the following steps.
To prove that the observation error converges asymptotically, define the Lyapunov function as:
V_0 = \tilde{F}^T X^T M_0(q) X \tilde{F}        (15)
where \tilde{F} = F - \hat{F} is the observation error and M_0(q) = M_0^T(q) > 0.
The derivative of the above Lyapunov function is:
\dot{V}_0 = \dot{\tilde{F}}^T X^T M_0(q) X \tilde{F} + \tilde{F}^T X^T \dot{M}_0(q) X \tilde{F} + \tilde{F}^T X^T M_0(q) X \dot{\tilde{F}}        (16)
According to Equations (11) and (14), the derivative of the observation error is:
\dot{\tilde{F}} = \dot{F} - \dot{\hat{F}} = \dot{F} - \dot{z} - \dot{p} = \dot{F} - L\left( C_0(q,\dot{q})\dot{q} + G_0(q) - \tau \right) + L\hat{F} - L M_0(q)\ddot{q} = \dot{F} + L\hat{F} - L\left( M_0(q)\ddot{q} + C_0(q,\dot{q})\dot{q} + G_0(q) - \tau \right) = \dot{F} + L\hat{F} - LF = \dot{F} - L\tilde{F}        (17)
Generally, without prior knowledge of \dot{F}, the disturbance is assumed to vary slowly relative to the observer dynamics [43], i.e., \dot{F} = 0. The observer error equation is then obtained as:
\dot{\tilde{F}} = -L\tilde{F} = -X^{-1} M_0^{-1}(q) \tilde{F}, \quad \dot{\tilde{F}}^T = -\left( X^{-1} M_0^{-1}(q) \tilde{F} \right)^T = -\tilde{F}^T M_0^{-T}(q) X^{-T}        (18)
Substituting Equation (18) into Equation (16) yields:
\dot{V}_0 = \dot{\tilde{F}}^T X^T M_0(q) X \tilde{F} + \tilde{F}^T X^T \dot{M}_0(q) X \tilde{F} + \tilde{F}^T X^T M_0(q) X \dot{\tilde{F}} = -\tilde{F}^T M_0^{-T}(q) X^{-T} X^T M_0(q) X \tilde{F} + \tilde{F}^T X^T \dot{M}_0(q) X \tilde{F} - \tilde{F}^T X^T M_0(q) X X^{-1} M_0^{-1}(q) \tilde{F} = -\tilde{F}^T X \tilde{F} + \tilde{F}^T X^T \dot{M}_0(q) X \tilde{F} - \tilde{F}^T X^T \tilde{F} = -\tilde{F}^T \left( X + X^T - X^T \dot{M}_0(q) X \right) \tilde{F}        (19)
Construct the following inequality:
X + X^T - X^T \dot{M}_0(q) X \ge \gamma        (20)
where \gamma is a symmetric positive definite matrix. If such a \gamma exists, then:
\dot{V}_0 \le -\tilde{F}^T \gamma \tilde{F} \le -\gamma_0 V_0        (21)
for some constant \gamma_0 > 0, so the disturbance observer error converges exponentially.
Since Equation (20) contains nonlinear terms, it must be converted into a linear matrix inequality to be solved. Defining the matrix Y = X^{-1}, Equation (20) can be transformed into:
Y^T + Y - \dot{M}_0(q) \ge Y^T \gamma Y        (22)
Since \|\dot{M}_0(q)\| \le \zeta, we have \dot{M}_0(q) \le \zeta I, so a sufficient condition for the above inequality to hold is Y^T + Y - \zeta I - Y^T \gamma Y \ge 0, which by the Schur complement is equivalent to:
\begin{bmatrix} Y^T + Y - \zeta I & Y^T \\ Y & \gamma^{-1} \end{bmatrix} \ge 0        (23)
Based on the value of Y obtained from the solution, the observer gain matrix L ( q , q ˙ ) and auxiliary variable p ( q , q ˙ ) can be calculated.

3.2. NFTSM Controller Design

In traditional sliding mode control, the sliding surface is generally selected as a linear sliding surface. When the system state reaches the sliding surface, the linear sliding surface ensures that the error converges asymptotically to zero but does not guarantee finite-time convergence. In contrast, the terminal sliding surface guarantees that the error variable converges to zero within finite time.
Therefore, this paper adopts a non-singular fast terminal sliding mode surface [44]:
r = \dot{e} + \alpha e + \beta |e|^{p/q} \mathrm{sign}(e)        (24)
where \alpha, \beta > 0, p and q are positive odd integers with p > q, and \mathrm{sign}(\cdot) is the sign function. Differentiating Equation (24) with respect to time gives:
\dot{r} = \ddot{e} + \alpha \dot{e} + \beta \frac{p}{q} |e|^{p/q-1} \dot{e}        (25)
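A direct elementwise evaluation of the surface (24) can be sketched as below (the function name is ours). Writing the fractional-power term as |e|^{p/q}·sign(e) with odd p > q keeps the exponent above 1, so the |e|^{p/q-1} factor in Eq. (25) stays finite at e = 0, which is the non-singularity property:

```python
import numpy as np

def nftsm_surface(e, e_dot, alpha, beta, p, q):
    """Non-singular fast terminal sliding surface, Eq. (24):
    r = e_dot + alpha*e + beta*|e|^(p/q)*sign(e), elementwise."""
    return e_dot + alpha * e + beta * np.abs(e) ** (p / q) * np.sign(e)
```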
Combining Equations (9) and (25) yields:
\dot{r} = \ddot{q}_d - M_0^{-1}(q)\left[ \tau - C_0(q,\dot{q})\dot{q} - G_0(q) + F(q,\dot{q},\ddot{q}) \right] + \alpha \dot{e} + \beta \frac{p}{q} |e|^{p/q-1} \dot{e}        (26)
Multiplying both sides by the matrix M_0(q) gives:
M_0(q)\dot{r} = -\tau + C_0(q,\dot{q})\dot{q} + G_0(q) - F(q,\dot{q},\ddot{q}) + M_0(q)\left( \ddot{q}_d + \alpha \dot{e} + \beta \frac{p}{q} |e|^{p/q-1} \dot{e} \right)        (27)
Because the acceleration vector \ddot{q} is difficult to obtain in practice, and to avoid its appearance in the control law, define the variables:
\dot{q}_r = r + \dot{q} = \dot{q}_d + \alpha e + \beta |e|^{p/q} \mathrm{sign}(e), \quad \ddot{q}_r = \dot{r} + \ddot{q} = \ddot{q}_d + \alpha \dot{e} + \beta \frac{p}{q} |e|^{p/q-1} \dot{e}        (28)
Substituting Equation (28) into Equation (27) yields:
\tau = M_0(q)\ddot{q}_r + C_0(q,\dot{q})\dot{q} + G_0(q) - M_0(q)\dot{r} - F(q,\dot{q},\ddot{q}) = M_0(q)\ddot{q}_r + C_0(q,\dot{q})\dot{q}_r + G_0(q) - M_0(q)\dot{r} - C_0(q,\dot{q})r - F(q,\dot{q},\ddot{q})        (29)
According to the selected non-singular fast terminal sliding mode surface, the controller is formulated as follows:
\tau = \tau_m + \tau_s + \tau_r, \quad \tau_m = M_0(q)\ddot{q}_r + C_0(q,\dot{q})\dot{q}_r + G_0(q), \quad \tau_s = K_p r + K_i \int_0^t r \, dt, \quad \tau_r = K_r \mathrm{sign}(r)        (30)
Here \tau_m is the equivalent controller obtained when the system reaches the sliding surface r = \dot{r} = 0 with zero external disturbance. K_p and K_i are the proportional and integral gains, with K_p > 0, K_i > 0; the sliding mode term K_p r drives the system to stability, and K_i \int_0^t r \, dt further eliminates residual errors and external disturbances. \tau_r is a robust term used mainly to suppress external disturbances not captured by the nonlinear disturbance observer and internal model reconstruction errors, thereby enhancing disturbance rejection capability.
At the same time, by introducing a nonlinear disturbance observer to estimate the disturbances, the control law after adding its compensation is:
\tau = \tau_m + \tau_s + \tau_r - \hat{F} = M_0(q)\ddot{q}_r + C_0(q,\dot{q})\dot{q}_r + G_0(q) + K_p r + K_i \int_0^t r \, dt + K_r \mathrm{sign}(r) - \hat{F}        (31)
From Equations (29) and (31), we can derive that:
M_0(q)\dot{r} + C_0(q,\dot{q})r + K_i \int_0^t r \, dt = -K_p r - K_r \mathrm{sign}(r) + \hat{F} - F        (32)
In order to limit the control signal to the feasible range of the actuator and avoid output overshoot, thereby preventing input saturation, a saturation function s a t ( r ) is introduced to replace the sign function in the robust controller. The saturation function maintains linearity or smooth variation when the error is small, thereby reducing chattering, and gradually saturates when the error is large, thus preventing shocks caused by abrupt changes. The saturation function is defined as follows:
\mathrm{sat}(r) = \begin{cases} \mathrm{sign}(r), & |r|/h \ge 1 \\ \dfrac{|r|}{|r|+h}\,\mathrm{sign}(r), & |r|/h < 1 \end{cases}        (33)
where h is a positive constant representing the thickness of the boundary layer.
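The boundary-layer function of Eq. (33) can be sketched elementwise as follows (illustrative helper, not the authors' code):

```python
import numpy as np

def sat(r, h):
    """Boundary-layer saturation of Eq. (33): sign(r) outside the
    boundary layer (|r|/h >= 1), |r|/(|r|+h)*sign(r) inside it."""
    r = np.asarray(r, dtype=float)
    inner = np.abs(r) / (np.abs(r) + h) * np.sign(r)
    return np.where(np.abs(r) / h >= 1.0, np.sign(r), inner)
```

Inside the layer the output scales smoothly with |r| instead of switching between plus and minus one, which is what suppresses chattering near r = 0.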

3.3. Stability Analysis of the Control System and Proof of Finite-Time Convergence

In the stability analysis of control systems, we consider a Lyapunov function in integral form:
V_1 = \frac{1}{2} r^T M_0(q) r + \frac{1}{2} \left( \int_0^t r \, d\tau \right)^T K_i \int_0^t r \, d\tau + \tilde{F}^T X^T M_0(q) X \tilde{F}        (34)
Substituting Equation (15) into Equation (34), the derivative of V_1 with respect to time is:
\dot{V}_1 = r^T \left( M_0(q)\dot{r} + \frac{1}{2}\dot{M}_0(q) r + K_i \int_0^t r \, d\tau \right) + \dot{V}_0        (35)
Considering the skew-symmetric property of the robotic manipulator dynamics, r^T \left( \dot{M}_0(q) - 2C_0(q,\dot{q}) \right) r = 0; substituting Equation (32) into Equation (35) then gives:
\dot{V}_1 = r^T \left[ M_0(q)\dot{r} + C_0(q,\dot{q})r + K_i \int_0^t r \, d\tau \right] + \dot{V}_0 = r^T \left( -K_p r - K_r \mathrm{sign}(r) + \hat{F} - F \right) + \dot{V}_0 = r^T \left( -K_p r - K_r \mathrm{sign}(r) - \tilde{F} \right) + \dot{V}_0        (36)
Substituting Equation (21) into Equation (36) and simplifying yields \dot{V}_1 \le -r^T K_p r - K_r \|r\| + \|r\| \|\tilde{F}\| - \gamma_0 V_0 \le 0, provided the switching gain satisfies K_r \ge \|\tilde{F}\|. According to Lyapunov stability theory, the system state converges asymptotically to the sliding surface r = 0, and by LaSalle's invariance principle, as t \to \infty, r \to 0, i.e., e \to 0 and \dot{e} \to 0.
To prove that the system state converges to the sliding surface within finite time, consider the Lyapunov function V_2 = \frac{1}{2} r^T M_0(q) r; differentiating it and using Equation (32) gives:
\dot{V}_2 = r^T \left( -K_p r - K_r \mathrm{sign}(r) - K_i \int_0^t r \, d\tau + \hat{F} - F \right) \le -r^T K_p r - r^T K_r \mathrm{sign}(r) - r^T \tilde{F} \le -K_r \|r\|        (37)
Therefore \dot{V}_2 = \frac{dV_2}{dt} \le -\sqrt{2} K_r V_2^{1/2}, which gives dt \le -\frac{dV_2}{\sqrt{2} K_r V_2^{1/2}}. Let t_r denote the system convergence time and V_2(0) the initial value, with V_2(t_r) = 0. Integrating yields:
\int_0^{t_r} dt \le -\int_{V_2(0)}^{V_2(t_r)} \frac{dV_2}{\sqrt{2} K_r V_2^{1/2}} = \frac{\sqrt{2}}{K_r} \left( V_2^{1/2}(0) - V_2^{1/2}(t_r) \right)        (38)
The convergence time thus satisfies t_r \le \frac{\sqrt{2} \, V_2^{1/2}(0)}{K_r}, so the trajectory tracking error of the robotic manipulator system tends to zero within finite time.
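For a numeric feel of this bound, a one-line helper (ours, not from the paper) evaluating t_r <= sqrt(2) * V_2(0)^{1/2} / K_r shows how a larger switching gain K_r shortens the guaranteed reaching time:

```python
import math

def reaching_time_bound(V2_0, Kr):
    """Finite-time reaching bound t_r <= sqrt(2) * sqrt(V2(0)) / Kr,
    obtained by integrating dV2/dt <= -sqrt(2) * Kr * V2^(1/2)."""
    return math.sqrt(2.0) * math.sqrt(V2_0) / Kr
```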
When |r| \ge h, the saturation function behaves as the sign function \mathrm{sign}(r), which does not affect the convergence of the control law. When |r| < h, the convergence speed decreases to some extent, but r has already been confined within a small region. The robotic manipulator system achieves the desired convergence by adjusting h, so the saturation function does not affect the final convergence properties of the system [34].

3.4. Deep Reinforcement Learning TD3 Adaptive Control

Reinforcement learning is a key branch of machine learning that obtains control strategies through interaction with the environment. RL problems are usually formulated as a Markov decision process (MDP), which mainly comprises the environment, agent, reward, state, and action [40]. Specifically, at each time step $t$, the robotic manipulator agent selects an action $a_t$ based on the current state $s_t$, and the environment provides a reward $r_t$ quantifying the performance of the robotic manipulator. The ultimate goal of RL is to find an effective control strategy $\pi(s)$ that maximizes the cumulative long-term return $R_{t_k} = \sum_{t=t_k}^{\infty} \gamma^{t-t_k} r_t$, where $\gamma \in [0,1]$ is the discount factor indicating the importance of future rewards, and $t_k$ represents the initial time step.
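As a small illustration (a Python sketch, not part of the paper's MATLAB implementation), the discounted return above can be accumulated backward over a finite episode:

```python
def discounted_return(rewards, gamma=0.99):
    """Backward accumulation of R = sum_t gamma^(t - t_k) * r_t
    over a finite episode of rewards r_{t_k}, r_{t_k + 1}, ..."""
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R
    return R

# gamma = 0.5 over three unit rewards: 1 + 0.5 + 0.25 = 1.75
R = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

Accumulating backward avoids recomputing powers of the discount factor at each step.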
TD3 is an advanced DRL method designed to address control problems with continuous action spaces. It employs an actor-critic architecture, where the critic evaluates the value function Q π ( s t , a t ) of actions and states. Specifically, at time step t , Q π ( s t , a t ) represents the long-term reward obtained by taking action a t under policy π ( s ) . Under this formulation, the Q function satisfies:
$$Q^\pi(s_t, a_t) = r_t + \gamma\, Q^\pi\big(s_{t+1}, \pi(s_{t+1})\big)$$
The ultimate goal of DRL is to learn, through Equation (40), an optimal control strategy $\pi^*(s)$ so that the value function is maximized.
$$\pi^*(s) = \arg\max_\pi Q^\pi(s_t, a_t)$$
Generally, neural networks (NNs) are employed to solve Equations (39) and (40) within the actor–critic architecture, as shown in Figure 3. TD3 mainly consists of six NNs. Specifically, two evaluation critic networks are used to estimate the Q-functions, $Q_{\theta_1}$ with parameters $\theta_1$ and $Q_{\theta_2}$ with parameters $\theta_2$. Additionally, an evaluation actor $\pi_\phi$ with parameters $\phi$ is used for policy updates. Each actor and critic network is paired with a target network to ensure training stability, denoted by $Q_{\theta_1'}$, $Q_{\theta_2'}$, and $\pi_{\phi'}$. The evaluation critics are updated by minimizing the following loss function:
$$L(\theta_j) = \frac{1}{N} \sum_{i=1}^{N} \omega_i \delta_i^2$$
$$\delta_i = r_i + \gamma \min_{j=1,2} Q_{\theta_j'}\big(s_{i+1}, \pi_{\phi'}(s_{i+1}) + \varepsilon\big) - Q_{\theta_j}(s_i, a_i)$$
where $N$ is the number of transitions sampled at each step (the batch size), $\delta_i$ is the temporal-difference (TD) error, and $\omega_i$ is the importance sampling weight (ISW) of the $i$-th sample. The target value estimate is calculated from the minimum of the two target Q-values to reduce the overestimation problem of traditional Q-learning. Additionally, to ensure smoother critic updates and prevent overfitting, a noise signal is added to the target action as a regularization term, $\varepsilon \sim \operatorname{clip}\big(\mathcal{N}(0, \sigma), -\bar{c}, \bar{c}\big)$, where $\sigma$ is the noise standard deviation, $\mathcal{N}$ denotes a normal distribution, and $\bar{c}$ represents the noise clipping limit.
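The target computation in Equation (42) — target-policy smoothing plus the minimum over the two target critics — can be sketched in Python for a scalar state and action (all functions here are illustrative stand-ins, not the paper's networks):

```python
import numpy as np

rng = np.random.default_rng(0)

def td3_target(r, s_next, target_actor, target_critics,
               gamma=0.99, sigma=0.2, c_bar=0.5):
    """Clipped double-Q target: y = r + gamma * min_j Q'_j(s', pi'(s') + eps),
    with smoothing noise eps ~ clip(N(0, sigma), -c_bar, c_bar)."""
    eps = float(np.clip(rng.normal(0.0, sigma), -c_bar, c_bar))
    a_next = target_actor(s_next) + eps
    # Minimum over the two target critics curbs Q-value overestimation
    return r + gamma * min(Q(s_next, a_next) for Q in target_critics)

# Toy check with linear critics: for a_next >= 0, the minimum picks s + a
actor = lambda s: 0.5 * s
critics = (lambda s, a: s + a, lambda s, a: s + 2.0 * a)
y = td3_target(r=1.0, s_next=1.0, target_actor=actor, target_critics=critics)
```

With the noise clipped to [-0.5, 0.5], the perturbed action stays in [0, 1], so the target value lies between 1.99 and 2.98 for this toy example.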
The evaluation actor is updated by maximizing the learned Q-function, adopting the deterministic policy gradient:
$$\nabla_\phi J(\phi) = \frac{1}{N} \sum_{i=1}^{N} \nabla_a Q_\theta(s_i, a)\big|_{a=\pi_\phi(s_i)} \nabla_\phi \pi_\phi(s_i)$$
where $J(\phi) = \frac{1}{N} \sum_{i=1}^{N} Q_\theta\big(s_i, \pi_\phi(s_i)\big)$ is the expected return to be maximized, $\nabla_a Q_\theta(s_i, a)\big|_{a=\pi_\phi(s_i)}$ is the gradient of the Q-value with respect to the action, and $\nabla_\phi \pi_\phi(s_i)$ is the gradient of the actor output with respect to the parameters $\phi$. Additionally, to ensure a stable training process, the target network parameters are updated by soft updating to track the evaluation networks:
$$\theta' \leftarrow (1 - \bar{\tau})\theta' + \bar{\tau}\theta, \qquad \phi' \leftarrow (1 - \bar{\tau})\phi' + \bar{\tau}\phi$$
where $\bar{\tau} \in (0,1)$ is the soft factor that determines the update rate of the target networks. It is worth noting that the update frequencies of the actor and critic networks in TD3 differ: the critic networks are generally updated more frequently than the actor network, so that the Q-value error is minimized before each policy update.
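The actor update and the soft update can be illustrated with a toy scalar example in Python (finite differences stand in for backpropagation; all names are illustrative):

```python
def dpg_step(phi, states, q_fn, actor, lr=1e-3, h=1e-5):
    """One deterministic policy-gradient ascent step for a scalar-parameter
    actor a = actor(phi, s); both gradients use central differences."""
    grad = 0.0
    for s in states:
        a = actor(phi, s)
        dq_da = (q_fn(s, a + h) - q_fn(s, a - h)) / (2 * h)          # grad of Q w.r.t. action
        da_dphi = (actor(phi + h, s) - actor(phi - h, s)) / (2 * h)  # grad of actor w.r.t. phi
        grad += dq_da * da_dphi
    return phi + lr * grad / len(states)  # ascent on J(phi)

def soft_update(param_target, param, tau_bar=0.005):
    """Polyak averaging: theta' <- (1 - tau_bar) * theta' + tau_bar * theta."""
    return (1.0 - tau_bar) * param_target + tau_bar * param

# Toy check: with Q(s, a) = -(a - s)^2 and actor(phi, s) = phi * s,
# the optimum is phi = 1, so a gradient step moves phi upward from 0.
q_fn = lambda s, a: -(a - s) ** 2
actor = lambda p, s: p * s
phi = dpg_step(0.0, states=[1.0], q_fn=q_fn, actor=actor)
phi_target = soft_update(0.0, phi, tau_bar=0.1)
```

The target parameter moves only a fraction $\bar{\tau}$ of the way toward the evaluation parameter at each step, which is what stabilizes the bootstrapped targets.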

3.4.1. State Space Design

The state s t represents the environmental information perceived by the robotic manipulator agent. The objective is to design a controller that integrates sliding mode control based on disturbance observation compensation with TD3 to reduce the errors between the actual and target positions, as well as between the actual and desired velocities of the robotic manipulator. Based on the design of the NDONFT controller and considering the structural characteristics of the designed 5-DOF robotic manipulator, the first three joints are sufficient to cover most of the target workspace, while the two joints at the end are mainly used to adjust the posture of the actuator. Therefore, simulation is conducted on the three joints of the designed robotic manipulator, and the current state is defined as follows:
$$S_1 = \{\sin q_1, \cos q_1, \sin d_2, \cos d_2, \sin q_3, \cos q_3, \dot{q}_1, \dot{d}_2, \dot{q}_3\}$$
$$S_2 = \Big\{e_1, e_2, e_3, \int e_1 \, dt, \int e_2 \, dt, \int e_3 \, dt\Big\}$$
$$S_3 = \{a_{t-1}, \hat{F}_1, \hat{F}_2, \hat{F}_3\}$$
For $S_1$, the observed values consist of the joint information of the robotic manipulator, specifically the sine and cosine of each joint position together with the joint velocities. By replacing the joint position $q_i$ with $\sin(q_i)$ and $\cos(q_i)$, the discontinuous position measurements can be expressed through a continuous two-dimensional parameterization, while constraining the observed position components to $[-1, 1]$. This approach greatly reduces the complexity of the DRL training process and facilitates training convergence. For $S_2$, the observed values include the joint position errors and their integrals. The error integral reflects the long-term deviation of the system, and its inclusion helps to reduce the steady-state error. For $S_3$, the observed values comprise the action at the previous time step and the disturbances observed at each joint. Observing the action helps prevent training from converging to extreme values and enhances the stability of the strategy, whereas observing the disturbances enables rapid assessment of the tracking performance and timely adjustment of the strategy. It should be noted that the observations are normalized to prevent gradient explosion.
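Assembling the observation vector of Equation (45) can be sketched as follows (a Python illustration with hypothetical argument names; the actual implementation runs in MATLAB/Simulink):

```python
import math

def build_state(q, dq, e, e_int, a_prev, f_hat):
    """Observation vector for the 3-joint agent.

    q, dq   : joint positions and velocities (q1, d2, q3)
    e, e_int: per-joint tracking errors and their integrals
    a_prev  : previous 9-dimensional action, f_hat: NDO disturbance estimates
    """
    s1 = [f(x) for x in q for f in (math.sin, math.cos)] + list(dq)
    s2 = list(e) + list(e_int)
    s3 = list(a_prev) + list(f_hat)
    return s1 + s2 + s3

s = build_state(q=[0.0, 0.2, 0.0], dq=[0.0, 0.0, 0.0],
                e=[0.0] * 3, e_int=[0.0] * 3,
                a_prev=[0.0] * 9, f_hat=[0.0] * 3)
```

The resulting vector has 9 + 6 + 12 = 27 entries for the three simulated joints.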

3.4.2. Action Space Design

Based on the designed nonlinear disturbance observer-based terminal sliding mode control method, the action space of DRL is defined as the proportional, integral, and robust gains of NDONFT, that is, $a = (K_{p1}, K_{i1}, K_{r1}, K_{p2}, K_{i2}, K_{r2}, K_{p3}, K_{i3}, K_{r3})$.
The network structure of the critic and actor in TD3NDONFT is shown in Figure 4. The two critic networks share the same structure, with an input layer consisting of the state vector s t and the action vector a t . The four intermediate hidden layers are fully connected with 256, 128, 128, and 64 nodes, respectively. The output layer produces a one-dimensional Q-value evaluating the action. The input layer of the actor network is the state vector s t . The intermediate hidden layers are fully connected with 256, 128, 64, and 32 nodes, respectively. The output layer is a parameter vector representing the action vector a t . According to the action parameter constraints, the output parameters are limited by a sigmoid layer to restrict actions within (0, 1). Additionally, the actions are scaled up by applying gains during simulation.
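The sigmoid-bounded actor output and the subsequent gain scaling can be sketched as follows (the gain vector here is a hypothetical placeholder; the paper sets the actual ranges via sensitivity analysis):

```python
import math

def scale_action(raw_outputs, gains):
    """Squash raw actor-head outputs into (0, 1) with a sigmoid,
    then scale each entry by its per-parameter gain."""
    squashed = [1.0 / (1.0 + math.exp(-x)) for x in raw_outputs]
    return [g * s for g, s in zip(gains, squashed)]

# Hypothetical upper bounds for (Kp, Ki, Kr) of each of the three joints
gains = [100.0, 10.0, 5.0] * 3
action = scale_action([0.0] * 9, gains)  # sigmoid(0) = 0.5 -> half of each bound
```

Bounding the raw outputs before scaling keeps every controller gain inside a known, tunable range regardless of how large the network outputs grow.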

3.4.3. Reward Function Design

During DRL training, the reward function plays a crucial role in effectively guiding the agent to meet task requirements. To address the problem of sparse rewards, a composite reward function integrating trajectory tracking control is designed, consisting of the following components.
(1) Tracking error: To reduce the tracking error of the robotic manipulator, a distance-based reward is established from the distance between the current position and the desired position. The tracking error includes both position and velocity tracking errors; the reward is expressed as follows:
$$r_e = c_1 \sum_{i=1}^{n} e^{-c_e |q_{di}(t) - q_i(t)|} - c_2 \sum_{i=1}^{n} |q_{di}(t) - q_i(t)| + c_3 \sum_{i=1}^{n} e^{-c_e |\dot{q}_{di}(t) - \dot{q}_i(t)|} - c_4 \sum_{i=1}^{n} |\dot{q}_{di}(t) - \dot{q}_i(t)|$$
where $c_1$, $c_2$, $c_3$, and $c_4$ are the weights assigned to the position and velocity reward and penalty components. As shown in Equation (46), when the position and velocity tracking errors approach zero, a higher reward is returned; when these errors are large, a lower reward is received.
(2) Influence of the previous time step: To reduce the influence of the previous control input on the current step and prevent actions from approaching extreme values that cause the agent to converge to a local optimum, the following equation is incorporated into the reward function:
$$r_t = -c_5 \sum_{i=1}^{n} a_i^2(t-1)$$
where c 5 is the weight assigned to penalizing the action.
(3) Boundary reward: when the joint position exceeds a predefined limit, the following reward function applies:
$$r_s = \begin{cases} 0, & \text{if } |q_i(t)| < \psi \\ -100, & \text{otherwise} \end{cases}$$
Through all the above designs, the reward for robotic manipulator trajectory tracking is summarized as r = r e + r t + r s .
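A Python sketch of the composite reward $r = r_e + r_t + r_s$ (the weights $c_1$–$c_5$, the shaping coefficient $c_e$, and the joint limit $\psi$ are illustrative placeholders, and the exponential-bonus/absolute-penalty form follows the description above):

```python
import math

def tracking_reward(q, qd, dq, dqd, a_prev,
                    c=(1.0, 0.5, 1.0, 0.5), c_e=5.0, c5=0.01, psi=math.pi):
    """Composite reward r = r_e + r_t + r_s for one time step."""
    c1, c2, c3, c4 = c
    pe = [abs(d - x) for d, x in zip(qd, q)]      # position errors
    ve = [abs(d - x) for d, x in zip(dqd, dq)]    # velocity errors
    r_e = (c1 * sum(math.exp(-c_e * e) for e in pe) - c2 * sum(pe)
           + c3 * sum(math.exp(-c_e * e) for e in ve) - c4 * sum(ve))
    r_t = -c5 * sum(a * a for a in a_prev)        # penalize large previous actions
    r_s = 0.0 if all(abs(x) < psi for x in q) else -100.0  # joint-limit penalty
    return r_e + r_t + r_s
```

With perfect tracking and zero previous action, the exponential terms dominate and the reward peaks; a joint-limit violation immediately subtracts the boundary penalty.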
Based on the above design, a NFTSMC method for robotic manipulators combining the NDO and the TD3 algorithm is proposed. The NDO estimates and compensates for lumped disturbances online, while the TD3 algorithm achieves parameter adaptation in terminal sliding mode control. Before training, random parameters are first used to initialize the evaluation critic networks, the evaluation actor network, the target critic networks, the target actor network, and the experience replay buffer. Then, the robotic manipulator dynamics model is loaded as the environment. To improve the adaptive performance of the controller, the desired trajectory of the robotic manipulator is randomly initialized at the beginning of each episode. At each time step, the current state $s_t$ of the robotic manipulator is input into the actor network to obtain the action vector $a_t$. The action vector $a_t$ is proportionally scaled and input into the sliding mode controller together with the disturbance estimate $\hat{F}$ observed by the NDO. The reward $r_t$ at the current time step is calculated from data including the position and velocity errors, and then the next state $s_{t+1}$ of the robotic manipulator is obtained. The tuple $(s_t, a_t, r_t, s_{t+1})$ generated during this process is stored in the experience replay buffer. When the buffer reaches the predefined size, a mini-batch of tuples is randomly sampled to update the six NNs mentioned above.

4. Simulation Studies

The workstation configuration and simulation environment for this study are as follows: operating system, Windows 11 (64-bit); processor, Intel(R) Core(TM) i7-10700 CPU @ 2.90 GHz; RAM, 32.0 GB; simulation software, MATLAB R2024a. The Simulink environment and the Simscape toolbox in MATLAB are also used, and the development code is written in the MATLAB language.
In this section, in order to verify the effectiveness of the proposed algorithm, the two end joints of the designed robotic manipulator are fixed, and simulation is performed using the initial three joints as an example. Simulink and Simscape toolboxes are used to build a simulation model of the controlled robotic manipulator. The simulation results validate the effectiveness of the designed control algorithm. The simulation model of the designed robotic manipulator trajectory tracking control algorithm is shown in Figure 5.
During the training process, the desired trajectory of the robotic manipulator is set as shown in Equation (49). Since the initial position of the prismatic joint is set at the base during dynamic modeling, an initial position offset exists for q 2 d .
$$q_d = \begin{bmatrix} q_{1d} \\ q_{2d} \\ q_{3d} \end{bmatrix} = \begin{bmatrix} \mathrm{rand} \cdot \sin(t\pi/8) \\ 0.2 + 0.06 \, \mathrm{rand} \cdot \sin(t\pi/8) \\ \mathrm{rand} \cdot \sin(t\pi/8) \end{bmatrix}$$
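A Python sketch of the randomized reference in Equation (49), assuming a single random amplitude is drawn per joint at the start of each episode:

```python
import math
import random

def make_desired_trajectory(rng=None):
    """Sample per-joint amplitudes once per episode and return q_d(t).
    Joint 2 is prismatic, so it carries the 0.2 m base offset."""
    rng = rng or random.Random(0)
    r1, r2, r3 = rng.random(), rng.random(), rng.random()
    def qd(t):
        s = math.sin(t * math.pi / 8.0)
        return (r1 * s, 0.2 + 0.06 * r2 * s, r3 * s)
    return qd

qd = make_desired_trajectory()
q1d, q2d, q3d = qd(0.0)  # at t = 0: (0.0, 0.2, 0.0)
```

Resampling the amplitudes each episode exposes the agent to a family of trajectories rather than a single reference, which is what the randomization is for.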
The initial joint positions of the three joints for the robotic manipulator are set to q 1 ( 0 ) = 0 rad , q 2 ( 0 ) = 0.2 m , q 3 ( 0 ) = 0 rad . The initial desired positions can be calculated using Equation (49). The physical parameters of the designed manipulator are listed in Table 2.
To demonstrate the robustness of the proposed control scheme, modeling errors, friction, and external disturbances are introduced into the robotic manipulator system. In this study, 80% of the dynamic parameters correspond to the nominal model, while 20% represent modeling errors in the system dynamics.
$$M_0(q) = 0.8 M(q), \quad \Delta M(q) = 0.2 M(q)$$
$$C_0(q,\dot{q}) = 0.8 C(q,\dot{q}), \quad \Delta C(q,\dot{q}) = 0.2 C(q,\dot{q})$$
$$G_0(q) = 0.8 G(q), \quad \Delta G(q) = 0.2 G(q)$$
The disturbance τ d acting on each joint of the robotic manipulator is assumed to be time-varying and consists of three components:
$$\tau_d = \tau_{d1} + \tau_{d2} + \tau_{d3} = \big[5\sin(t) + \sin(5t)\big] + \big[0.5\sin(\dot{q}) + 0.5\dot{q}\big] + 10\sin(t-5)$$
where $\tau_{d1}$ is a small time-varying disturbance, $\tau_{d2}$ is joint friction, and $\tau_{d3}$ is an unknown large time-varying disturbance introduced at 5 s to verify the adaptability and robustness of the proposed control method.
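Under the assumption (from the description above) that the large term switches on at t = 5 s, the per-joint lumped disturbance can be sketched as:

```python
import math

def lumped_disturbance(t, dq):
    """tau_d = tau_d1 + tau_d2 + tau_d3 for one joint at time t [s] with
    joint velocity dq; the onset of tau_d3 at t = 5 s is an assumption."""
    tau_d1 = 5.0 * math.sin(t) + math.sin(5.0 * t)          # small time-varying term
    tau_d2 = 0.5 * math.sin(dq) + 0.5 * dq                  # joint friction
    tau_d3 = 10.0 * math.sin(t - 5.0) if t >= 5.0 else 0.0  # large disturbance
    return tau_d1 + tau_d2 + tau_d3

tau0 = lumped_disturbance(0.0, 0.0)  # zero at rest before the onset
```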
The proposed algorithm is compared with non-singular fast terminal sliding mode control based on disturbance observer (NDONFT), non-singular fast terminal sliding mode control (NFTSM), NDOPID, and PID controllers to verify its effectiveness.
(1) PID: Proportional-integral-derivative controller, the control law is as follows:
$$\tau = k_p e + k_i \int_0^t e \, dt + k_d \dot{e}$$
For the PID controller, k p , k i , k d represent the proportional, integral, and derivative coefficients, respectively. The selection of parameters mainly relies on experience and tuning. The selection principle is to minimize oscillations during the initial position tracking phase while ensuring trajectory tracking accuracy.
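A minimal discrete-time implementation of this control law (a Python sketch with rectangular integration and a backward-difference derivative; the gains are illustrative, not the tuned values):

```python
class PID:
    """Discrete PID: tau = kp*e + ki*integral(e) + kd*de/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.e_int = 0.0
        self.e_prev = None

    def step(self, e):
        self.e_int += e * self.dt                # rectangular integration
        de = 0.0 if self.e_prev is None else (e - self.e_prev) / self.dt
        self.e_prev = e
        return self.kp * e + self.ki * self.e_int + self.kd * de

pid = PID(kp=10.0, ki=1.0, kd=0.1, dt=0.01)
tau = pid.step(0.5)  # first step: 10*0.5 + 1*(0.5*0.01) + 0 = 5.005
```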
(2) NDOPID: PID controller incorporating the nonlinear disturbance observer; the control law is as follows:
$$\tau = k_p e + k_i \int_0^t e \, dt + k_d \dot{e} - \hat{F}$$
For the NDOPID controller, in order to ensure the fairness of the comparison, its basic parameters are selected based on those of the PID controller. The disturbance observer estimates disturbances to compensate the PID control.
(3) NFTSM: Non-singular fast terminal sliding mode control excluding disturbance compensation observed by the NDO and parameter adaptation via the TD3 algorithm. The control law is as follows:
$$\tau = M_0(q)\ddot{q}_r + C_0(q,\dot{q})\dot{q}_r + G_0(q) + K_p r + K_i \int_0^t r \, dt + K_r \operatorname{sat}(r)$$
To ensure fairness in the controller comparison, the sliding mode coefficients $\alpha$, $\beta$, $p$, and $q$ are set equal to those in the NDONFT and TD3NDONFT controllers, and the gains $K_p$, $K_i$, and $K_r$ are set equal to those in the NDONFT controller.
(4) NDONFT: Non-singular fast terminal sliding mode controller based on the NDO, excluding the adaptive strategy of TD3. The control law is as follows:
$$\tau = M_0(q)\ddot{q}_r + C_0(q,\dot{q})\dot{q}_r + G_0(q) + K_p r + K_i \int_0^t r \, dt + K_r \operatorname{sat}(r) - \hat{F}$$
For the NDONFT controller, X 1 is an invertible matrix used to calculate the observer gain matrix and auxiliary variables. The same value of X 1 is used across controllers involving disturbance observers.
(5) TD3NDONFT: Non-singular fast terminal sliding mode control based on TD3 and NDO, with adaptive adjustment of control parameters achieved via the TD3 algorithm. The control law is as follows:
$$\tau = M_0(q)\ddot{q}_r + C_0(q,\dot{q})\dot{q}_r + G_0(q) + K_{pt} r + K_{it} \int_0^t r \, dt + K_{rt} \operatorname{sat}(r) - \hat{F}$$
For the TD3NDONFT controller, the adaptive parameters K p t , K i t , K r t are obtained via the TD3 algorithm. Sensitivity analysis of the controller parameters is conducted to set the range of parameter variation during training. The training parameters of the TD3 algorithm are listed in Table 3.
The parameter settings of the five control laws mentioned above are listed in Table 4.
Figure 6 shows the reward curve of the TD3 algorithm presented in this paper, which serves as a key indicator for evaluating the training effectiveness of DRL algorithms. During the policy exploration phase, the curve shows considerable volatility with a low and unstable average reward level, indicating that the strategy of the agent has not yet effectively explored action sequences with higher rewards and remains in an exploratory learning phase. Subsequently, the performance enters an optimization phase, with the average reward showing an overall upward trend and volatility gradually decreasing, indicating the strategy begins to learn more effective action selection mechanisms, leading to significant performance improvement. In the strategy convergence phase, the average reward tends to stabilize and the standard deviation converges to a low level, indicating the strategy has reached a stable convergence state, demonstrating strong robustness and generalization capability.
To further compare the dynamic characteristics of the five controllers, this paper evaluates controller performance using the mean absolute error (MAE) of joint position and joint velocity, the integral absolute error (IAE), and the integral time absolute error (ITAE).
$$\mathrm{MAE} = \frac{1}{t_f} \int_0^{t_f} |e(t)| \, dt, \qquad \mathrm{IAE} = \int_0^{t_f} |e(t)| \, dt, \qquad \mathrm{ITAE} = \int_0^{t_f} t \, |e(t)| \, dt$$
where t f represents the total time.
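For sampled error signals, the three indicators can be approximated by rectangular integration, e.g. (a Python sketch):

```python
def tracking_metrics(e, dt):
    """MAE, IAE, and ITAE of a sampled error signal e[k] with step dt."""
    t_f = len(e) * dt
    iae = sum(abs(x) * dt for x in e)                       # integral |e|
    itae = sum((i * dt) * abs(x) * dt for i, x in enumerate(e))  # time-weighted
    mae = iae / t_f
    return mae, iae, itae

# Constant |e| = 1 over 10 s at dt = 0.01: MAE = 1, IAE = 10, ITAE ~ 50
mae, iae, itae = tracking_metrics([1.0] * 1000, dt=0.01)
```

ITAE weights late-time errors more heavily, which is why it is the stricter indicator of steady-state behavior.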
After the training is completed, the desired trajectories of joint 1 and joint 3 of the robotic manipulator are set to $q_{1d} = q_{3d} = 0.5\sin(t\pi/8)$, and the desired trajectory of joint 2 is set to $q_{2d} = 0.2 + 0.05\sin(t\pi/8)$ for simulation. Figure 7 shows the simulation results of joint position tracking and the tracking errors. All five algorithms ensure trajectory tracking within a certain error range before and after sudden disturbances. Under the different control methods, the tracking errors of the three joints fall within [−0.03, 0.03] rad, [−0.01, 0.015] m, and [−0.04, 0.04] rad, respectively, demonstrating a certain degree of disturbance rejection under sudden disturbances. In the initial stage, the PID and NDOPID control methods exhibit significant overshoot, while the NFTSM, NDONFT, and TD3NDONFT controllers track the desired trajectory relatively quickly. At 5 s, the joints experience a large unknown sudden disturbance. The maximum position tracking errors of the three joints after the disturbance are 0.0048 rad, 0.0014 m, and 0.0041 rad for the proposed TD3NDONFT algorithm; 0.0054 rad, 0.0016 m, and 0.0051 rad for the NDONFT method; and 0.0188 rad, 0.0092 m, and 0.0393 rad for the NFTSM method. The results indicate that the proposed control strategy better suppresses external unknown time-varying disturbances.
The MAE indicators for position tracking under the different control strategies are presented in Table 5. Both before and after the sudden disturbance is applied, the proposed control method demonstrates superior tracking performance. During 0–10 s, the MAEs of the joint positions using the TD3NDONFT algorithm are reduced by 7.14%, 19.94%, and 6.14% compared to NDONFT; by 64.58%, 88.06%, and 84.53% compared to NFTSM; by 53.35%, 85.40%, and 63.43% compared to NDOPID; and by 70.60%, 88.48%, and 77.70% compared to PID, respectively. The NDONFT algorithm reduces the MAE by 49.76%, 81.77%, and 61.04% compared to NDOPID.
Figure 8 shows the joint velocity tracking of the robotic manipulator and its error. In the initial stage, the velocity tracking exhibits a distinct reverse sudden change. This occurs because the initial state of the robotic manipulator does not match the desired state, which causes the controller to overcompensate at this stage and subsequently enables it to quickly track the desired velocity. Analysis of the velocity tracking error in Figure 8 indicates that although the system velocity experiences significant perturbation after a sudden disturbance, all five control methods converge to the desired velocity. The maximum velocity tracking error of joint 1 after the sudden disturbance is 0.0409 rad/s using the proposed TD3NDONFT algorithm, 0.0436 rad/s using the NDONFT method, 0.0506 rad/s using the NFTSM method, 0.0602 rad/s using the NDOPID method, and 0.0691 rad/s using the PID method.
Table 6 shows the MAE performance indicators of velocity tracking under the different control methods. Both before and after the sudden disturbance is applied, the proposed algorithm maintains a smaller velocity MAE after reaching the steady state. Within 0–10 s, the velocity MAEs of the three joints using the TD3NDONFT algorithm are reduced by 1.78%, 9.10%, and 2.11% compared to NDONFT; by 19.26%, 58.01%, and 44.64% compared to NFTSM; by 14.18%, 37.37%, and 17.03% compared to NDOPID; and by 27.18%, 48.67%, and 32.45% compared to PID, respectively. This indicates that the proposed algorithm achieves high velocity tracking accuracy and robustness.
Figure 9 and Figure 10 show the IAE and ITAE indicators for position and velocity tracking using the five algorithms. The IAE indicators for the three joints of the proposed algorithm are the lowest in the 0–5 s, 5–10 s, and 0–10 s intervals, indicating that the TD3NDONFT algorithm achieves high position and velocity tracking accuracy before and after experiencing unknown time-varying disturbances. Next, the effectiveness of the TD3 adaptive algorithm, nonlinear disturbance observer, and improved non-singular fast terminal sliding mode within the proposed TD3NDONFT algorithm is analyzed.
Within 0–10 s, the TD3NDONFT algorithm reduces the IAE of joint position tracking by 7.16%, 19.97%, and 6.16% compared to NDONFT, and reduces the IAE of velocity tracking by 1.78%, 9.11%, and 2.11%, indicating that introducing the TD3 adaptive algorithm improves the tracking performance of the robotic manipulator. Compared to NFTSM, NDONFT reduces the IAE of position tracking for each joint by 61.91%, 85.11%, and 83.56%, and reduces the IAE of velocity tracking by 17.81%, 53.82%, and 43.47%, respectively. Compared to PID, NDOPID reduces the position tracking IAE by 37.01%, 21.09%, and 39.05%, and the velocity tracking IAE by 15.17%, 18.05%, and 18.60%, indicating the effectiveness of introducing the nonlinear disturbance observer. Compared to NDOPID, NDONFT reduces the position tracking IAE by 49.82%, 81.79%, and 61.11%, and the velocity tracking IAE by 12.63%, 31.11%, and 15.25%, demonstrating the effectiveness of the improved non-singular fast terminal sliding mode. At the same time, the ITAE of position and velocity tracking for the three joints using the TD3NDONFT algorithm is the lowest among the five algorithms.
The adaptive parameters change proposed in this paper are shown in Figure 11. During the variation in adaptive parameters, the proportional gain K p is initially small to generate low joint torque, thereby avoiding excessive initial torque, and then gradually increases to improve joint tracking capability. However, the prismatic joint must overcome its own weight at startup, so a larger initial proportional gain is required. The integral gain K i is larger for the prismatic joint than for the revolute joint to reduce the system steady-state error. Since the nonlinear disturbance observer has estimated most disturbances, only a small robust gain K r is required to compensate the remaining disturbance. At 5 s, when a sudden disturbance occurs, all parameters adjust accordingly to ensure tracking accuracy under unknown time-varying disturbances.
Figure 12 shows the control input torque for each joint of the robotic manipulator. In the initial stage, the PID and NDOPID algorithms exhibit large torque oscillations, which may adversely affect the joint actuators and cause motor driver overload or failure. At 5 s, when the sudden disturbance occurs, the torques of all five control methods fluctuate significantly. The torque response under the proposed TD3NDONFT algorithm is smoother, indicating that the method maintains good dynamic performance while ensuring steady-state accuracy.
Figure 13 shows the actual disturbances during trajectory tracking and the disturbances estimated by the nonlinear disturbance observer. The nonlinear disturbance observer within the proposed TD3NDONFT algorithm effectively estimates disturbances affecting each joint of the robotic manipulator. The joint disturbance estimation errors are shown in Figure 14.
To further validate the generalization capability of deep reinforcement learning, desired trajectories with more complex dynamic characteristics are introduced for testing. The desired trajectories for joint 1 and joint 3 of the robotic manipulator are set as $q_{1d} = q_{3d} = 0.3\sin(2t) + 0.2\cos(t)$, while the desired trajectory for joint 2 is set as $q_{2d} = 0.2 + 0.03\sin(2t) + 0.02\cos(t)$. Figure 15 presents the position tracking curves and corresponding tracking errors of each joint under the different control strategies, while Figure 16 illustrates the velocity tracking performance and velocity tracking errors of the joints. The simulation results demonstrate that the proposed method maintains excellent accuracy and robustness even in complex trajectory tracking tasks, reflecting its strong generalization performance.

5. Conclusions

This paper studies the trajectory tracking control problem of a novel robotic manipulator configuration. The main contributions are as follows:
  • For robotic manipulator systems with modeling uncertainties, friction, and unknown external time-varying disturbances, an adaptive non-singular fast terminal sliding mode control strategy based on the Twin Delayed Deep Deterministic policy gradient algorithm and a nonlinear disturbance observer is proposed. Stability and finite-time convergence of the closed-loop system are established via Lyapunov analysis.
  • A nonlinear disturbance observer estimates the lumped uncertainty of the robotic manipulator and provides feedforward compensation. Based on boundary layer techniques, the non-singular fast terminal sliding mode is modified to reduce chattering in sliding mode control. Adaptive tuning of the controller gains along the desired trajectory is achieved using the Twin Delayed Deep Deterministic policy gradient algorithm.
  • Training simulations are conducted using the designed 5-DOF robotic manipulator as an example. Convergence of training is ensured through the design of the observation space and reward function for the Twin Delayed Deep Deterministic policy gradient algorithm. The training process considers trajectory tracking accuracy under sudden disturbances to ensure that the robotic manipulator can handle emergency situations.
  • Using the trained agent, different control strategies are compared. Compared with PID, NDOPID, NFTSM, and NDONFT controllers, TD3NDONFT achieves higher trajectory tracking accuracy, lower MAE, IAE, and ITAE across time intervals, and stronger robustness against sudden disturbances. By randomizing the desired trajectory, the proposed algorithm exhibits improved generalization and attains higher tracking accuracy across different trajectory configurations.
This study provides a new approach for the development of robotic manipulators with novel configurations and the realization of high-precision trajectory tracking control. Although the simulation results are promising, a systematic comparison with other mainstream deep reinforcement learning algorithms is still lacking, and exploring a broader hyperparameter space to obtain more generalizable conclusions also represents a valuable direction for future research. In addition, the performance of robotic manipulators is affected by various factors such as sensor noise and communication delays in practical applications. Future work will focus on establishing an experimental platform to validate the effectiveness of the proposed algorithm and exploring its application in trajectory tracking tasks for other nonlinear systems.

Author Contributions

Conceptualization, H.Y. and Z.S.; methodology, H.Y.; software, Z.S.; validation, H.Y., Z.W. and L.W.; formal analysis, H.Y.; investigation, H.Y. and Z.S.; resources, G.X.; data curation, H.Y. and Z.W.; writing—original draft preparation, H.Y.; writing—review and editing, G.X. and Y.L.; visualization, H.Y.; supervision, Y.L.; project administration, G.X.; funding acquisition, G.X. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China (2023YFC2810100), the National Natural Science Foundation of China (52471331, 42188102).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, X.; Zhou, X.; Xia, Z.; Gu, X. A Survey of Welding Robot Intelligent Path Optimization. J. Manuf. Process. 2021, 63, 14–23. [Google Scholar] [CrossRef]
  2. Wu, X.; Chi, J.; Jin, X.-Z.; Deng, C. Reinforcement Learning Approach to the Control of Heavy Material Handling Manipulators for Agricultural Robots. Comput. Electr. Eng. 2022, 104, 108433. [Google Scholar] [CrossRef]
  3. Mancino, F.; Fontalis, A.; Grandhi, T.S.P.; Magan, A.; Plastow, R.; Kayani, B.; Haddad, F.S. Robotic Arm-Assisted Conversion of Unicompartmental Knee Arthroplasty to Total Knee Arthroplasty: Feasibility, Safety, and Clinical Outcomes. Bone Jt. J. 2024, 106-B, 680–687. [Google Scholar] [CrossRef]
  4. Li, Z.; Zhou, Y.; Zhu, M.; Wu, Q. Adaptive Fuzzy Integral Sliding Mode Cooperative Control Based on Time-Delay Estimation for Free-Floating Close-Chain Manipulators. Sensors 2024, 24, 3718. [Google Scholar] [CrossRef] [PubMed]
  5. Park, K.-H.; Lee, H.-E.; Kim, Y.; Bien, Z.Z. A Steward Robot for Human-Friendly Human-Machine Interaction in a Smart House Environment. IEEE Trans. Autom. Sci. Eng. 2008, 5, 21–25. [Google Scholar] [CrossRef]
  6. Kumar, A.; Kumar, V. Evolving an Interval Type-2 Fuzzy PID Controller for the Redundant Robotic Manipulator. Expert Syst. Appl. 2017, 73, 161–177. [Google Scholar] [CrossRef]
Figure 1. (a) Configuration of the robotic manipulator; (b) MDH coordinate system.
Figure 2. TD3NDONFT control algorithm principle diagram.
Figure 3. Trajectory tracking control framework for NFTSMC of robotic manipulators based on NDO and TD3.
Figure 4. The structure of the actor network and critic network.
Figure 5. Simulation model of the robotic manipulator. (a) DRL framework; (b) TD3NDONFT control program construction; (c) 5-DOF robotic manipulator constructed with Simscape.
Figure 6. Reward curve of the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm during training.
Figure 7. Joint position tracking and position tracking error of robotic manipulator. (a) Position tracking comparison of joint 1; (b) Position tracking error comparison of joint 1; (c) Position tracking comparison of joint 2; (d) Position tracking error comparison of joint 2; (e) Position tracking comparison of joint 3; (f) Position tracking error comparison of joint 3.
Figure 8. Joint velocity tracking and velocity tracking error of robotic manipulator. (a) Velocity tracking comparison of joint 1; (b) Velocity tracking error comparison of joint 1; (c) Velocity tracking comparison of joint 2; (d) Velocity tracking error comparison of joint 2; (e) Velocity tracking comparison of joint 3; (f) Velocity tracking error comparison of joint 3.
Figure 9. Comparison of IAE and ITAE performance for five position tracking algorithms. (a) Joint 1 position tracking IAE; (b) Joint 2 position tracking IAE; (c) Joint 3 position tracking IAE; (d) Position tracking ITAE.
Figure 10. Comparison of IAE and ITAE performance for five velocity tracking algorithms. (a) Joint 1 velocity tracking IAE; (b) Joint 2 velocity tracking IAE; (c) Joint 3 velocity tracking IAE; (d) Velocity tracking ITAE.
Figure 11. Variation in the adaptive parameters. (a) Parameter variations of K_p at each joint; (b) Parameter variations of K_i at each joint; (c) Parameter variations of K_r at each joint.
Figure 12. Torque input for robotic manipulator joint control. (a) Control input torque of joint 1; (b) Control input torque of joint 2; (c) Control input torque of joint 3.
Figure 13. Disturbance observation of each joint for the robotic manipulator. (a) Real disturbance and observed disturbance of joint 1; (b) Real disturbance and observed disturbance of joint 2; (c) Real disturbance and observed disturbance of joint 3.
Figure 14. Disturbance observation error of each joint for the robotic manipulator.
Figure 15. Joint position tracking and position tracking error of robotic manipulator under generalization capability test. (a) Position tracking comparison of joint 1; (b) Position tracking error comparison of joint 1; (c) Position tracking comparison of joint 2; (d) Position tracking error comparison of joint 2; (e) Position tracking comparison of joint 3; (f) Position tracking error comparison of joint 3.
Figure 16. Joint velocity tracking and velocity tracking error of robotic manipulator under generalization capability test. (a) Velocity tracking comparison of joint 1; (b) Velocity tracking error comparison of joint 1; (c) Velocity tracking comparison of joint 2; (d) Velocity tracking error comparison of joint 2; (e) Velocity tracking comparison of joint 3; (f) Velocity tracking error comparison of joint 3.
Table 1. MDH parameters of the robotic manipulator.

| Joint   | Type      | d_i (mm) | q_i (rad) | a_{i−1} (mm) | α_{i−1} (rad) |
|---------|-----------|----------|-----------|--------------|---------------|
| Joint 1 | revolute  | 154.55   | q_1       | 0            | 0             |
| Joint 2 | prismatic | d_2      | 0         | 12.00        | 0             |
| Joint 3 | revolute  | 0        | q_3       | 174.46       | π/2           |
| Joint 4 | revolute  | 0        | q_4       | 389.83       | 0             |
| Joint 5 | revolute  | 0        | q_5       | 70.00        | π/2           |
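For readers reconstructing the kinematics, each row of Table 1 maps to one homogeneous transform under the modified DH (Craig) convention, T = Rot_x(α_{i−1}) · Trans_x(a_{i−1}) · Rot_z(θ_i) · Trans_z(d_i). The sketch below is a minimal illustration of that mapping; the joint value q_1 = 0.3 rad is an arbitrary example, not a value from the article.

```python
import numpy as np

def mdh_transform(alpha_prev, a_prev, d, theta):
    """Homogeneous transform T_{i-1,i} under the modified DH (Craig) convention."""
    ca, sa = np.cos(alpha_prev), np.sin(alpha_prev)
    ct, st = np.cos(theta), np.sin(theta)
    return np.array([
        [ct,      -st,      0.0,  a_prev],
        [st * ca,  ct * ca, -sa, -sa * d],
        [st * sa,  ct * sa,  ca,  ca * d],
        [0.0,      0.0,      0.0, 1.0],
    ])

# Joint 1 row of Table 1: d = 154.55 mm, a_{i-1} = 0, alpha_{i-1} = 0;
# q1 = 0.3 rad is an illustrative configuration.
T01 = mdh_transform(0.0, 0.0, 154.55, 0.3)
```

Chaining the five row transforms (with lengths in a consistent unit) yields the base-to-flange pose of the manipulator.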
Table 2. Physical parameters of the robotic manipulator.

| Symbol | Definition           | Value      |
|--------|----------------------|------------|
| M1     | Mass of Link 1       | 2.98 kg    |
| M2     | Mass of Link 2       | 1.47 kg    |
| M3     | Mass of Link 3       | 1.05 kg    |
| M4     | Mass of Link 4       | 0.32 kg    |
| M5     | Mass of Link 5       | 0.18 kg    |
| g      | Gravity acceleration | 9.806 m/s² |
Table 3. Training parameters of the TD3 algorithm.

| Parameter                     | Value     |
|-------------------------------|-----------|
| Actor network learning rate   | 0.0001    |
| Critic network learning rate  | 0.0001    |
| Sampling step size            | 0.01      |
| Discount factor               | 0.995     |
| Experience replay buffer size | 1,000,000 |
| Minibatch size                | 128       |
| Noise variance                | 0.1       |
| Gradient threshold            | 1         |
| Maximum steps per episode     | 1000      |
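To make concrete how the Table 3 settings enter the algorithm, the sketch below shows the standard TD3 ingredients they parameterize: target policy smoothing, the clipped double-Q target, and Polyak averaging of target networks. `NOISE_CLIP` and `TAU` do not appear in Table 3 and are assumed values; the reward and Q inputs are placeholders, not data from the article.

```python
import numpy as np

GAMMA = 0.995      # discount factor (Table 3)
NOISE_STD = 0.1    # target policy smoothing noise variance (Table 3)
NOISE_CLIP = 0.5   # clip range for smoothing noise (assumed, not in Table 3)
TAU = 0.005        # soft-update rate (assumed, not in Table 3)

def smoothed_action(a_target, rng=np.random.default_rng(0)):
    """Target policy smoothing: add clipped Gaussian noise to the target action."""
    noise = np.clip(rng.normal(0.0, NOISE_STD, size=np.shape(a_target)),
                    -NOISE_CLIP, NOISE_CLIP)
    return a_target + noise

def td3_target(r, q1_next, q2_next, done):
    """Clipped double-Q target: y = r + gamma * min(Q1', Q2') on non-terminal steps."""
    return r + GAMMA * (1.0 - done) * np.minimum(q1_next, q2_next)

def soft_update(target_params, online_params):
    """Polyak averaging of target-network parameters."""
    return [(1.0 - TAU) * t + TAU * o for t, o in zip(target_params, online_params)]
```

The delayed actor update (one policy step per several critic steps) is the remaining TD3 element and is a training-loop scheduling choice rather than a parameter in Table 3.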
Table 4. Parameter settings in the control laws.

| Control Law | Parameter Settings |
|-------------|--------------------|
| PID         | k_p = diag(500, 2000, 400), k_i = diag(300, 1000, 300), k_d = diag(50, 200, 50) |
| NDOPID      | k_p = diag(500, 2000, 400), k_i = diag(300, 1000, 300), k_d = diag(50, 200, 50), X_1 = diag(0.5251, 0.0927, 0.6277) |
| NFTSM       | K_p = diag(70, 75, 25), K_i = diag(40, 150, 25), K_r = diag(0.7, 0.7, 0.7), α = 8, β = 2, p = 5, q = 3 |
| NDONFT      | K_p = diag(70, 75, 25), K_i = diag(40, 150, 25), K_r = diag(0.7, 0.7, 0.7), α = 8, β = 2, p = 5, q = 3, X_1 = diag(0.5251, 0.0927, 0.6277) |
| TD3NDONFT   | α = 8, β = 2, p = 5, q = 3, X_1 = diag(0.5251, 0.0927, 0.6277), c_1 = c_3 = 1, c_2 = c_4 = 5, c_5 = 0.3, c_e = 10 |
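Table 4 lists α = 8, β = 2, p = 5, q = 3 for the terminal sliding mode controllers. The article's exact surface definition is given in its method section; purely as an illustration of how such parameters are used, the sketch below assumes the classic fast terminal form s = ė + α·e + β·sig(e)^{q/p}, a common choice when 0 < q/p < 1. This is a hypothetical reading of the parameters, not the paper's verbatim control law.

```python
import numpy as np

# Parameters from Table 4; the surface form below is an assumed,
# generic fast terminal sliding surface, not the article's exact one.
ALPHA, BETA, P, Q = 8.0, 2.0, 5.0, 3.0

def sig(x, a):
    """Signed power sig(x)^a = |x|^a * sign(x), applied element-wise."""
    return np.sign(x) * np.abs(x) ** a

def fast_terminal_surface(e, e_dot):
    """s = de/dt + alpha*e + beta*sig(e)^(q/p), element-wise per joint."""
    return e_dot + ALPHA * e + BETA * sig(e, Q / P)

# Illustrative joint tracking errors (rad) with zero error rates
s = fast_terminal_surface(np.array([0.1, -0.2, 0.0]), np.zeros(3))
```

With 0 < q/p < 1, the power term dominates near the origin and accelerates terminal convergence, while the linear term dominates far from it, which is the "fast" property the controller names.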
Table 5. Position tracking mean absolute error evaluation indexes.

| Controller | Joint   | 0–5 s        | 5–10 s       | 0–10 s       |
|------------|---------|--------------|--------------|--------------|
| PID        | Joint 1 | 1.064 × 10⁻² | 1.305 × 10⁻² | 1.185 × 10⁻² |
|            | Joint 2 | 5.745 × 10⁻³ | 3.457 × 10⁻³ | 4.601 × 10⁻³ |
|            | Joint 3 | 1.192 × 10⁻² | 1.535 × 10⁻² | 1.364 × 10⁻² |
| NDOPID     | Joint 1 | 8.054 × 10⁻³ | 6.879 × 10⁻³ | 7.467 × 10⁻³ |
|            | Joint 2 | 5.403 × 10⁻³ | 1.858 × 10⁻³ | 3.631 × 10⁻³ |
|            | Joint 3 | 8.919 × 10⁻³ | 7.714 × 10⁻³ | 8.317 × 10⁻³ |
| NFTSM      | Joint 1 | 8.845 × 10⁻³ | 1.082 × 10⁻² | 9.834 × 10⁻³ |
|            | Joint 2 | 3.256 × 10⁻³ | 5.624 × 10⁻³ | 4.440 × 10⁻³ |
|            | Joint 3 | 1.434 × 10⁻² | 2.499 × 10⁻² | 1.966 × 10⁻² |
| NDONFT     | Joint 1 | 5.219 × 10⁻³ | 2.283 × 10⁻³ | 3.751 × 10⁻³ |
|            | Joint 2 | 9.744 × 10⁻⁴ | 3.497 × 10⁻⁴ | 6.621 × 10⁻⁴ |
|            | Joint 3 | 5.147 × 10⁻³ | 1.333 × 10⁻³ | 3.240 × 10⁻³ |
| TD3NDONFT  | Joint 1 | 5.152 × 10⁻³ | 1.814 × 10⁻³ | 3.483 × 10⁻³ |
|            | Joint 2 | 7.866 × 10⁻⁴ | 2.734 × 10⁻⁴ | 5.301 × 10⁻⁴ |
|            | Joint 3 | 5.029 × 10⁻³ | 1.053 × 10⁻³ | 3.041 × 10⁻³ |
Table 6. Velocity tracking mean absolute error evaluation indexes.

| Controller | Joint   | 0–5 s        | 5–10 s       | 0–10 s       |
|------------|---------|--------------|--------------|--------------|
| PID        | Joint 1 | 4.781 × 10⁻² | 1.385 × 10⁻² | 3.083 × 10⁻² |
|            | Joint 2 | 1.004 × 10⁻² | 3.508 × 10⁻³ | 6.774 × 10⁻³ |
|            | Joint 3 | 4.803 × 10⁻² | 1.589 × 10⁻² | 3.196 × 10⁻² |
| NDOPID     | Joint 1 | 4.506 × 10⁻² | 7.262 × 10⁻³ | 2.616 × 10⁻² |
|            | Joint 2 | 9.352 × 10⁻³ | 1.750 × 10⁻³ | 5.551 × 10⁻³ |
|            | Joint 3 | 4.428 × 10⁻² | 7.757 × 10⁻³ | 2.602 × 10⁻² |
| NFTSM      | Joint 1 | 4.380 × 10⁻² | 1.182 × 10⁻² | 2.781 × 10⁻² |
|            | Joint 2 | 9.749 × 10⁻³ | 6.813 × 10⁻³ | 8.281 × 10⁻³ |
|            | Joint 3 | 5.179 × 10⁻² | 2.620 × 10⁻² | 3.900 × 10⁻² |
| NDONFT     | Joint 1 | 4.116 × 10⁻² | 4.554 × 10⁻³ | 2.286 × 10⁻² |
|            | Joint 2 | 6.158 × 10⁻³ | 1.491 × 10⁻³ | 3.825 × 10⁻³ |
|            | Joint 3 | 4.040 × 10⁻² | 3.702 × 10⁻³ | 2.205 × 10⁻² |
| TD3NDONFT  | Joint 1 | 4.091 × 10⁻² | 3.986 × 10⁻³ | 2.245 × 10⁻² |
|            | Joint 2 | 5.619 × 10⁻³ | 1.334 × 10⁻³ | 3.477 × 10⁻³ |
|            | Joint 3 | 4.009 × 10⁻² | 3.083 × 10⁻³ | 2.159 × 10⁻² |
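The mean absolute error indexes in Tables 5 and 6, and the IAE/ITAE indexes of Figures 9 and 10, all derive from the sampled joint tracking error. As a minimal sketch of how such indexes are computed, the snippet below evaluates discrete IAE, ITAE, and MAE; the error signal `e` is an illustrative decaying exponential, not data from the article.

```python
import numpy as np

def tracking_metrics(t, e):
    """Discrete IAE, ITAE, and MAE of a sampled tracking-error signal e(t).

    IAE = integral of |e| dt and ITAE = integral of t*|e| dt, both via the
    trapezoidal rule; MAE = mean(|e|), the index reported in Tables 5 and 6.
    """
    abs_e = np.abs(e)
    dt = np.diff(t)
    iae = np.sum((abs_e[1:] + abs_e[:-1]) / 2.0 * dt)
    w = t * abs_e
    itae = np.sum((w[1:] + w[:-1]) / 2.0 * dt)
    mae = abs_e.mean()
    return iae, itae, mae

t = np.linspace(0.0, 10.0, 1001)  # 10 s run at the 0.01 s sampling step of Table 3
e = 0.01 * np.exp(-t)             # illustrative decaying error, not measured data
iae, itae, mae = tracking_metrics(t, e)
```

ITAE weights late-time errors more heavily than IAE, so it rewards controllers whose errors decay quickly and stay small, which is why both indexes are reported together.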
Cite as: You, H.; Liu, Y.; Shi, Z.; Wang, Z.; Wang, L.; Xue, G. Adaptive Non-Singular Fast Terminal Sliding Mode Trajectory Tracking Control for Robotic Manipulator with Novel Configuration Based on TD3 Deep Reinforcement Learning and Nonlinear Disturbance Observer. Sensors 2026, 26, 297. https://doi.org/10.3390/s26010297