1. Introduction
Robotic manipulator systems are characterized by strong nonlinearity, high coupling, and sensitivity to external disturbances; therefore, achieving high-precision trajectory tracking remains a critical problem in nonlinear control. In early studies, Lin et al. [
1] designed a robust controller for robotic manipulators based on optimal control theory; Ahmad et al. [
2] proposed a proportional–integral sliding mode control method, which improved the steady-state performance of the system by introducing an integral term. In addition, some studies have attempted to use adaptive control strategies to cope with model uncertainties, but performance degradation still exists in complex disturbance environments.
Fundamentally, the practical implementation of high-precision trajectory tracking is inextricably linked to the performance of the underlying sensor-based feedback control systems. Real-time state variables, such as joint angular positions and velocities, must be continuously acquired via high-precision joint sensors (e.g., optical rotary encoders). However, in actual engineering deployments, physical sensors inevitably introduce measurement noise, and the overall robotic sensory system is highly susceptible to external environmental disturbances [
3]. Recently, integrating adaptive estimation and learning-assisted architectures with sensor feedback has become a vital trend to improve the motion control of autonomous systems [
4]. Therefore, when designing a control strategy for robotic manipulators, it is crucial to ensure the algorithm possesses strong robustness against the inherent uncertainties and dynamic interference present in real-world sensor data acquisition.
To enhance the system’s robustness against uncertainties and external disturbances, sliding mode control (SMC) is widely used in robotic manipulator systems. Man et al. [
5] proposed a multi-input multi-output terminal sliding mode control method to achieve asymptotic stability of the system state; Levant [
6] theoretically developed high-order sliding mode control methods; and Utkin [
7] systematically summarized the application of sliding mode control in engineering. However, traditional sliding mode control is prone to severe high-frequency chattering problems due to the use of discontinuous switching terms. To address this problem, Bartolini et al. [
8] weakened the chattering phenomenon by improving the reaching law, and Shtessel et al. [
9] systematically analyzed the chattering mechanism and suppression methods from an engineering perspective. To improve the convergence speed, non-singular terminal sliding mode control was proposed. Rathaur [
10] designed a singularity-free fast terminal sliding mode controller; Sun et al. [
11] introduced neural networks into finite-time sliding mode control to achieve faster convergence performance; and Gao et al. [
12] utilized RBF neural networks to approximate system uncertainties online. However, these methods usually rely on large switching gains to ensure finite-time convergence, resulting in prominent chattering problems.
To overcome the problem that convergence time depends on initial states, Zhang et al. [
13] proposed a fixed-time sliding mode control method; Polyakov [
14] systematically studied fixed-time stability theoretically; and Zuo [
15] proposed a singularity-free fixed-time control method. Building upon this, Cao et al. [
16] introduced reinforcement learning to achieve fixed-time trajectory tracking control under input constraints. However, fixed-time control methods generally suffer from complex parameter design and conservative upper bounds of convergence time. To achieve explicit adjustment of convergence time, predefined-time control has gradually become a research hotspot. Jia et al. [
17] proposed a predefined-time non-singular sliding mode control method; Muñoz-Vázquez et al. [
18] studied the predefined-time robust stabilization of robotic systems. Regarding applications in different fields, Yu et al. [
19] designed a predefined-time non-singular fast terminal sliding mode for trajectory tracking of underwater vehicles (ROVs); Chen et al. [
20] and Zheng et al. [
21] achieved predefined-time stabilization control of spacecraft attitude based on observer theory and fully actuated system approaches, respectively. In the field of robotic manipulators and complex robots, Liu et al. [
22] proposed a general predefined-time terminal sliding mode control scheme for dual-arm space robots; Xu et al. [
23] further explored event-triggered adaptive sliding mode control with predefined-time convergence characteristics; Li et al. [
24] applied PTC to conventional robotic manipulator trajectory tracking; Wang et al. [
25] provided a systematic review of this field; and Shao et al. [
26] studied the predefined-time stability of uncertain systems. Furthermore, Song et al. [
27] systematically solved the prescribed-time stabilization problem of strict-feedback nonlinear systems theoretically, laying a foundation for the generalization of such methods. These studies show that PTC has significant advantages in improving system dynamic performance. However, under complex disturbances, the aforementioned methods usually rely on large robust gains, thereby aggravating the chattering of control inputs.
In industrial automation contexts, such as high-speed pick-and-place operations, automated sorting, and collaborative precision assembly, robotic manipulators are required to complete specified motions within rigid, safety-critical time windows. Under these operating conditions, conventional asymptotic or finite-time control methods become inadequate because their convergence time bounds heavily depend on or grow with the initial states of the system, which are often uncertain or variable on a real-world production line. Predefined-time stability overcomes this limitation by decoupling the settling time from the initial conditions, allowing control engineers to explicitly pre-program the exact convergence time
directly into the software architecture, thereby guaranteeing predictable cycle times and superior dynamic performance. It is worth noting that although the integration of sliding mode control with reinforcement learning has been previously explored in the literature [
28,
29,
30], those schemes typically still rely on high-gain switching terms even after neural network compensation, and rarely guarantee a strictly user-prescribed convergence time. The strategy proposed in this paper differs in that it concurrently achieves (i) strict predefined-time convergence via a smooth artificial-delay sliding surface, (ii) online Actor–Critic compensation that reduces dependence on large robust gains, and (iii) a singularity-free, low-chattering control input suitable for industrial-grade actuators.
In recent years, with the development of data-driven control, reinforcement learning (RL) has gradually been introduced into nonlinear system control. Vamvoudakis and Lewis [
31] proposed an online Actor–Critic control framework; Lewis et al. [
32] systematically summarized its application in control; and Zhao et al. [
33] provided a review of reinforcement learning-based control methods. Modares et al. [
34] proposed a framework using RL to solve the optimal control of nonlinear systems with partially unknown dynamics. In the field of robotic manipulator control, Liu et al. [
35] achieved trajectory tracking using reinforcement learning; Liu et al. [
28] specifically designed a predefined-time tracking controller based on the Actor–Critic algorithm for n-DOF manipulators; and Wang et al. [
29] introduced the Actor–Critic structure into the predefined-time control of nonlinear systems. To address modeling uncertainties, Hao et al. [
36] significantly improved system robustness using RBF neural network adaptive compensation, while Wang et al. [
37] combined fuzzy neural networks with the Actor–Critic architecture to enhance the system’s adaptive capability against external disturbances. In addition, Sun et al. [
30] combined reinforcement learning with sliding mode control, reducing chattering to a certain extent. However, existing research mostly utilizes reinforcement learning as an auxiliary compensation means and still relies on high-gain sliding mode terms to guarantee robustness, thus failing to fundamentally resolve the chattering problem.
In summary, existing methods have not yet simultaneously achieved the following three objectives: predefined-time convergence, low-chattering control input, and low reliance on high robust gains. Therefore, how to achieve a control strategy with low chattering and good engineering feasibility while ensuring strict predefined-time convergence remains an unresolved problem.
To address the above problems, this paper proposes a predefined-time sliding mode control method integrating reinforcement learning (PTC-RLC). This method achieves strict error convergence within a user-defined time by constructing a predefined-time sliding surface based on smooth artificial delay feedback; meanwhile, it introduces an Actor–Critic network to approximate lumped system uncertainties online, effectively compensating for the physical sensor measurement uncertainties and dynamic disturbances, thereby reducing the controller’s reliance on large robust gains and weakening chattering from the mechanism. Based on Lyapunov theory, the semi-global uniform ultimate boundedness of the closed-loop system is proven.
The remainder of this paper is organized as follows:
Section 2 introduces the system model.
Section 3 presents the preliminaries and controller design.
Section 4 performs the stability analysis.
Section 5 verifies the effectiveness of the proposed method through simulation.
Section 6 provides the conclusion.
2. Model Description
While the proposed control architecture is universally applicable to
n-DOF manipulators, where the per-joint sliding-surface computation and the Actor compensator scale linearly with the number of joints, a 2-DOF robotic manipulator model is selected in this section for explicit mathematical illustration and subsequent simulation verification. Consider a 2-DOF robotic manipulator model, whose structure is shown in
Figure 1.
The Lagrangian dynamic equation of an
n-degree-of-freedom (DOF) robotic manipulator affected by parameter uncertainties and external disturbances is formulated as follows:
where
are the joint angular position, velocity, and acceleration vectors, respectively;
is the control torque;
denotes the external lumped generalized disturbances, encompassing joint friction, payload variations, and unmodeled external forces; and
,
, and
represent the inertia matrix, Coriolis/centrifugal matrix, and gravity vector, respectively. By decomposing these system matrices into nominal parts (subscript 0) and uncertain parts (subscript
), Equation (
1) can be compactly rearranged as
where
is the nominal invertible inertia matrix, and
represents the lumped system uncertainty, which encapsulates unmodeled dynamics, parameter variations, and external disturbances. To facilitate the subsequent predefined-time controller design, the trajectory tracking error vectors are defined as follows. Let
represent the ideal continuous reference trajectory. The position tracking error
and velocity tracking error
are defined as
Taking the time derivative of the error vectors yields the foundational error evolution dynamics:
By substituting the isolated system dynamics from Equation (
2) into the acceleration error, the complete closed-loop error evolution equation can be explicitly derived as
where
and
are
n-dimensional vectors. Assume that the reference trajectory
and its second derivative
are continuous and bounded. To facilitate the subsequent controller design, the necessary mathematical foundation for function approximation and specific control objectives will be introduced in
Section 3.
5. Simulation and Results
To concretely verify the effectiveness and performance of the proposed “predefined-time controller integrating reinforcement learning” in computer simulations, a series of detailed numerical simulations are conducted in this section. We select a 2-DOF robotic manipulator model as the simulation object, in which the
,
,
structure adopted is the standard two-link rigid planar manipulator model introduced in [
41] and subsequently adopted by recent learning-based manipulator-control studies such as [
16]. The overall control flowchart is shown in
Figure 2, and the related dynamic system parameters are set as follows:
where
,
,
,
, and
. The physical parameters are defined as follows:
and
denote the mass and length of link
i, respectively, with
kg and
m;
m/s
2 is the gravitational acceleration;
is the moment of inertia of link
i, calculated as
kg·m
2 and
kg·m
2; and
represents the distance from joint
to the center of mass of link
i.
To verify the robustness of the proposed PTC-RLC strategy under significant parameter perturbations, the nominal mass used by the controller is intentionally set as
kg, representing an
deviation from the actual link mass
kg. Furthermore, the lumped disturbance
in Equation (
1) is modeled as a combination of viscous friction, Coulomb friction, and time-varying external disturbances:
The simulation is conducted with a sampling period of
s. To ensure reproducibility and enable a clear comparison, the specific control gains, RBF neural network hyperparameters, and predefined time constant are summarized in
Table 1. These parameters are selected to balance convergence speed and control smoothness.
The parameters of the NTSMC are tuned to achieve its best possible performance under the same disturbance conditions to ensure a fair comparison.
5.1. Trajectory Tracking Performance and Control Input Analysis
To comprehensively evaluate the overall performance of the proposed predefined-time sliding mode controller integrating reinforcement learning (PTC-RLC, denoted as Proposed (RL) in the figures), this section presents a detailed comparative analysis with a traditional baseline controller (Baseline) and a non-singular terminal sliding mode controller (NTSMC) [
42]. In the comparative simulations, the “Baseline” controller is specifically defined as the proposed predefined-time sliding mode control architecture without the auxiliary Actor–Critic reinforcement learning compensation network (i.e., by enforcing
). This ablation setup isolates and clarifies the unique contribution of the reinforcement learning agent in dynamically mitigating unknown parameter variations and external disturbances. Regarding the choice of external comparison, NTSMC [
42] is selected as it represents a state-of-the-art finite-time robust controller widely benchmarked in the manipulator-control literature. A classical PID controller was preliminarily evaluated under the same operating conditions but is omitted from the comparative figures, because under the imposed
mass mismatch and the time-varying disturbance
specified below, the PID controller exhibits sustained tracking errors and visible drift, which renders it uncompetitive against any sliding-mode-based scheme and offers limited additional insight into the chattering-suppression and predefined-time properties that are the central focus of this paper.
Figure 3 and
Figure 4 show the trajectory tracking curves of Joint 1 and Joint 2 of the robotic manipulator, respectively. Under the harsh operating conditions of strong parameter perturbations and external disturbances, all three controllers track the desired trajectory. However, the Baseline controller exhibits noticeable phase lag during the initial response phase and reduced fidelity where curvature changes rapidly, such as at the trajectory peaks and valleys. NTSMC improves the response speed to a certain extent, but visible trajectory deviations remain. In contrast, the proposed scheme converges to the reference trajectory faster, exhibits the predefined-time convergence behavior, and maintains close trajectory overlap throughout the motion cycle with no apparent lag or overshoot.
To intuitively evaluate the overall control precision of the system in both transient and steady states,
Figure 5 displays the evolution curve of the comprehensive position tracking error norm
for the dual joints. To clearly reveal the fluctuation details of microscopic errors, this curve is plotted using a log scale.
As shown in
Figure 5, in the initial transient phase, the error of the Baseline controller converges the slowest, and although NTSMC converges faster, its decay rate slows down noticeably as it approaches zero. The proposed method (Proposed) exhibits the fastest error decay, and within the prescribed time
s, the error converges to a neighborhood of zero, confirming that the predefined-time sliding surface delivers convergence within the designed time window. After entering the steady-state tracking phase, due to the model parameter uncertainties and external disturbances, the error of the Baseline controller fluctuates noticeably and remains at the order of
. NTSMC reduces the overall error, but its curve shows high-frequency spikes, reflecting the chattering caused by the high-gain switching term. The proposed method, by employing the Actor–Critic reinforcement learning network for online estimation and compensation of disturbances, reduces the controller’s reliance on large switching gains; consequently, the steady-state error is kept below the order of
and the curve is smooth, substantially suppressing the chattering and yielding improved control precision.
The input torque comparison curves of the three controllers are shown in
Figure 6. NTSMC, in resisting strong disturbances through high-gain switching, produces control torques accompanied by visible high-frequency chattering. Such high-frequency oscillations can excite unmodeled high-frequency dynamics of the manipulator in actual physical systems and cause wear on actuators such as motors. The proposed strategy outputs a smoother, continuous torque curve that mitigates the inherent chattering of traditional sliding mode control; the torque amplitude is kept within a reasonable physical range with no abrupt jumps, while the high tracking precision reported above is preserved. This indicates that the algorithm has good engineering implementation feasibility while delivering the transient and steady-state performance reported above.
Furthermore, to evaluate the real-time industrial applicability, the computational cost of the proposed control law was quantified. All timing experiments were conducted on a desktop workstation equipped with an Intel Core i5-14600KF processor running Windows 10 and MATLAB R2024a. The average single-step execution time was measured using MATLAB’s tic/toc utilities, with the timer wrapping only the controller-update block of each iteration and the recorded times averaged over the entire -step simulation horizon ( s with sampling period s). Under this protocol, the proposed PTC-RLC algorithm requires approximately ms per control cycle, which is nearly two orders of magnitude below the 1 ms budget of a standard 1 kHz industrial control loop. Because the measurement is obtained inside an interpreted MATLAB environment, an equivalent C/C++ implementation deployed on industrial embedded hardware would be expected to execute substantially faster. This minimal computational overhead indicates that the proposed algorithm is suitable for real-time deployment at standard industrial control frequencies without inducing control delays.
5.2. Performance Evaluation Under Extreme Trajectories
To further thoroughly evaluate the performance limits, transient overshoot, and switching behavior of the proposed PTC-RLC strategy, two additional extreme trajectory cases are conducted: a high-frequency continuous trajectory and a point-to-point step response. The high-frequency trajectory is utilized to test the bandwidth and convergence limits of the predefined-time sliding surface, while the step response, characterized by a massive initial error, is implemented to observe the overshoot and dynamic reaching process of the controller. It should be noted that this subsection focuses on characterizing the intrinsic bandwidth, overshoot, and switching behavior of the proposed PTC-RLC under extreme dynamic conditions, rather than repeating the three-controller comparison of
Section 5.1. Comparative curves with NTSMC under the step input are not reported because the discontinuous high-gain switching of NTSMC induces severe transient chattering and torque saturation at the instant of the step jump, leading to non-representative actuator behavior; under the high-frequency reference, NTSMC was preliminarily verified to be inferior to the proposed scheme with steady-state error exceeding
rad, and is omitted to preserve figure clarity.
As depicted in
Figure 7 and
Figure 8, in the point-to-point step response scenario (dashed blue line), the proposed controller demonstrates an exceptionally smooth transient reaching behavior. Despite the sudden and large initial position deviation, the error strictly converges to zero without any noticeable overshoot. This fully validates that the constructed sliding surface based on smooth artificial delay feedback effectively suppresses the excessive transient chattering commonly observed in conventional reaching laws. In the high-frequency trajectory tracking scenario (solid red line), the controller exhibits strong tracking capability. As shown in the logarithmic error norm in
Figure 8, the steady-state tracking error is firmly confined to a minimal scale (<
).
Furthermore,
Figure 9 illustrates the control torque behavior under these extreme conditions. For the step response, the torque naturally peaks at the initial moment to drive the manipulator toward the target but rapidly decays to a smooth, steady-state level compensating only for gravity and friction. In contrast, for the high-frequency trajectory, the controller must output high-frequency oscillating torque to force the mechanical structure to track rapidly changing references. Even under such aggressive dynamic demands, the torque remains bounded and does not exhibit high-frequency infinite chattering, confirming the controller’s reliability under extreme dynamic limits.
5.3. Effect of the Predefined-Time Parameter
One of the most significant theoretical advantages of the proposed method is that the convergence time can be explicitly scheduled by adjusting the single parameter . To visually demonstrate how the controller behaves with different predefined-time parameters, an additional comparative study is performed using the step response trajectory, evaluating the convergence behavior under .
The simulation results are comprehensively presented in
Figure 10. The vertical dotted lines with corresponding colors mark the exact prescribed deadlines
. It is evident that regardless of the chosen
, the tracking error curves successfully hit the equilibrium point (zero) exactly at or slightly before their respective predefined times. Specifically, when
, the error decays smoothly; when a more aggressive parameter
is selected, the convergence rate dramatically accelerates. A slight numerical residual with peak magnitude below
rad is observed at
for the most aggressive setting. This residual is attributable to the discrete-time numerical implementation with the finite sampling period
s, which limits the achievable resolution of the smooth Bump kernel
when its compact support
shrinks to a narrow time window of approximately
s and is thereby covered by only a few hundred sampling instants; the residual decreases monotonically as
, consistent with the analytical convergence guarantee in Theorem 1. The error is immediately trapped in the sliding manifold thereafter. This explicit parameter-adjustment mechanism provides great flexibility for practical engineering deployments, allowing operators to make a reasonable trade-off between strict convergence time and acceptable actuator torque limits.