Adaptive Predefined-Time Tracking Control for Robotic Manipulator Based on Actor-Critic Reinforcement Learning
Abstract
1. Introduction
- A novel control framework that synergistically integrates predefined-time stability theory with Actor-Critic reinforcement learning is proposed. The Actor neural network approximates unknown system dynamics and generates control inputs, while the Critic neural network evaluates the cost-to-go function to guide the learning process, achieving both guaranteed convergence time and online learning capability.
- Predefined-time neural network weight update laws are designed with specially constructed terms that incorporate the predefined-time convergence mechanism. These update laws ensure the convergence of both tracking errors and weight estimation errors within the predefined time while maintaining the learning and approximation capabilities of the neural networks.
- The upper bound of the settling time can be explicitly preset through a single design parameter and is independent of initial conditions and system parameters. This explicit relationship between the design parameter and the settling-time bound greatly simplifies controller design for applications with specific timing requirements.
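The claim that one parameter fixes the settling-time bound regardless of the initial condition can be checked numerically on a scalar toy system. The sketch below is not the paper's controller. It assumes a commonly used predefined-time form, dx/dt = -(π/(2·η·T_c))·(|x|^(1-η) + |x|^(1+η))·sign(x), and the names `T_C` and `ETA` with the values 2.0 s and 0.6 are chosen here for illustration. For this assumed system the settling time has the closed form (2·T_c/π)·arctan(|x0|^η), which stays below T_c for every x0.

```python
import math

T_C = 2.0   # assumed predefined-time parameter (s)
ETA = 0.6   # assumed convergence exponent, 0 < ETA < 1
GAIN = math.pi / (2.0 * ETA * T_C)


def rhs(x):
    """Right-hand side of the assumed scalar predefined-time system."""
    a = abs(x)
    return -GAIN * (a ** (1.0 - ETA) + a ** (1.0 + ETA)) * math.copysign(1.0, x)


def settling_time_analytic(x0):
    """Closed-form time for the assumed system to reach zero from x0."""
    return (2.0 * T_C / math.pi) * math.atan(abs(x0) ** ETA)


def settling_time_numeric(x0, dt=1e-4, tol=1e-6):
    """Forward-Euler estimate of the time for |x| to fall below tol."""
    x, t = float(x0), 0.0
    while abs(x) > tol:
        x += dt * rhs(x)
        t += dt
    return t


if __name__ == "__main__":
    # Initial conditions spanning three orders of magnitude.
    for x0 in (0.1, 1.0, 10.0, 100.0):
        print(x0, settling_time_analytic(x0), settling_time_numeric(x0))
```

In this toy system, increasing the initial condition pushes the settling time toward `T_C` but never past it, which is the property the contribution describes.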
2. Preliminaries and Problem Formulation
2.1. System Model
2.2. Control Objective
- (i) The joint angle tracks the desired trajectory, with the tracking error converging to a small neighborhood of the origin within a predefined time fixed by a preset design parameter.
- (ii) All signals in the closed-loop system remain bounded within the predefined time.
- (iii) The Actor-Critic neural networks learn to compensate for the unknown system dynamics online.
2.3. Actor-Critic Neural Network Framework
2.3.1. RBF Basis Function
2.3.2. Critic Network Structure
2.3.3. Actor Network Structure
2.3.4. Actor-Critic Cooperative Learning Mechanism
- (1) Critic evaluates policy performance: The Critic network computes the estimated cost function from the current state and control input, evaluating the quality of the Actor’s current policy. A larger estimated cost indicates poorer policy performance that requires improvement.
- (2) Actor improves control policy: The Actor network uses the evaluation provided by the Critic as feedback to adjust its weights, thereby improving the control policy to minimize the long-term cost.
- (3) Online cooperative update: The weights of both networks are updated in real time during the control process. Through continuous “evaluation-improvement” cycles, the control performance is progressively optimized.
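The evaluate-then-improve cycle described above can be sketched with two small Gaussian RBF networks. This is a generic illustration, not the paper's predefined-time update laws: the gradient-style critic step, the actor step driven by the tracking error plus a scaled critic output, the learning rates, centers, width, and every function name below are assumptions made for the example.

```python
import math


def rbf_features(z, centers, width):
    """Gaussian RBF vector: exp(-||z - c||^2 / width^2) for each center c."""
    return [math.exp(-sum((zi - ci) ** 2 for zi, ci in zip(z, c)) / width ** 2)
            for c in centers]


def dot(w, phi):
    """Network output: inner product of weights and basis vector."""
    return sum(wi * pi for wi, pi in zip(w, phi))


def critic_update(w_c, phi, running_cost, lr):
    """Gradient step on 0.5 * (J_hat - running_cost)^2 (assumed critic target)."""
    residual = dot(w_c, phi) - running_cost
    return [wi - lr * residual * pi for wi, pi in zip(w_c, phi)]


def actor_update(w_a, phi, error, j_hat, k_c, lr):
    """Assumed actor step: descend along phi * (error + k_c * J_hat)."""
    signal = error + k_c * j_hat
    return [wi - lr * signal * pi for wi, pi in zip(w_a, phi)]


if __name__ == "__main__":
    centers = [(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0)]  # hypothetical centers
    w_c, w_a = [0.0] * 3, [0.0] * 3
    z = (0.3, -0.2)                  # hypothetical state (error, error rate)
    phi = rbf_features(z, centers, width=1.2)
    for _ in range(200):
        cost = z[0] ** 2             # hypothetical running cost
        w_c = critic_update(w_c, phi, cost, lr=0.1)           # (1) evaluate
        w_a = actor_update(w_a, phi, z[0], dot(w_c, phi),
                           k_c=2.0, lr=0.01)                  # (2) improve
    print(dot(w_c, phi), w_a)        # (3) both weight sets moved every step
```

The state is held fixed here only to keep the example short; in a control loop `z` and `phi` would be recomputed from the plant at every step.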
2.4. Technical Lemmas
3. Actor-Critic Predefined-Time Controller Design
3.1. Predefined-Time Virtual Controller Design
3.2. Actor-Critic Reinforcement Learning Controller Design
3.2.1. Critic Network Design
3.2.2. Actor Network Design
3.2.3. Predefined-Time Actual Controller
4. Stability Analysis
- (i) The error signals converge to a compact set within the predefined time.
- (ii) All signals in the closed-loop system remain bounded.
- (iii) The convergence region is given by:
- from cancels with from .
- from cancels exactly with from , since the Actor weight update law (41) explicitly includes the factor in the gradient term, and the control law ensures that . This exact cancellation holds for any without requiring any approximation.
- The damping term generates the component through Lemma 6, which dominates when V is large.
- The predefined-time term directly generates the component through algebraic substitution, which dominates when V is small.
- The combination of both terms ensures predefined-time convergence for all values of V.
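The two regimes described above can be made concrete with a standard predefined-time Lyapunov inequality. The form below is an assumption for illustration: the symbols $T_c$ and $\eta \in (0,1)$ are chosen here, and the paper's own lemma may carry an additional residual constant. With the substitution $u = V^{\eta/2}$, the settling time $T$ is bounded independently of $V(0)$:

```latex
\dot V \le -\frac{\pi}{\eta T_c}\left(V^{1-\frac{\eta}{2}} + V^{1+\frac{\eta}{2}}\right)
\;\Longrightarrow\;
T \le \frac{\eta T_c}{\pi}\int_0^{V(0)} \frac{dV}{V^{1-\frac{\eta}{2}} + V^{1+\frac{\eta}{2}}}
  = \frac{2T_c}{\pi}\int_0^{V(0)^{\eta/2}} \frac{du}{1+u^2}
  = \frac{2T_c}{\pi}\arctan\!\left(V(0)^{\frac{\eta}{2}}\right) \le T_c .
```

In this assumed form, $V^{1+\eta/2}$ is the larger term for $V > 1$ and $V^{1-\eta/2}$ is the larger term for $V < 1$, which matches the large-V and small-V roles listed above.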
5. Simulation Results
5.1. Simulation Setup
5.2. Tracking Performance Analysis
5.3. Neural Network Learning Process
5.4. Effect of Predefined Time Parameter
5.5. Comparison with State-of-the-Art Methods
5.6. Robustness Evaluation
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest






| Parameter | Description | Value | Unit |
|---|---|---|---|
| System Parameters | | | |
| m | Link mass | 1.0 | kg |
| l | Link length | 0.5 | m |
| | Friction coefficient | 1.0 | N·m·s/rad |
| g | Gravitational acceleration | 9.8 | m/s² |
| | External disturbance | | N·m |
| | Reference trajectory | | rad |
| Predefined-Time Parameters | | | |
| | Predefined time parameter | 2.0 | s |
| | Maximum settling time | | s |
| | Convergence parameter | 0.6 | - |
| Controller Parameters | | | |
| | Feedback gain | 100 | - |
| | Small constants | | - |
| | Smoothing parameter | 0.05 | - |
| Neural Network Parameters | | | |
| | Actor network nodes | 100 | - |
| | Critic network nodes | 64 | - |
| | Learning rates | 100, 50 | - |
| | Critic feedback gain | 2.0 | - |
| | RBF widths | 1.2, 1.0 | - |
| | Discount factor | 10 | - |
| | Weight bounds | 200, 100 | - |
| PID Controller | | | |
| | PID gains | 25, 12, 5 | - |
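
The parameter table can be exercised with a small baseline simulation. The sketch below is an assumption-heavy illustration rather than the paper's setup: it assumes a point-mass single-link model $m l^2 \ddot q + b \dot q + m g l \sin q = u$, reads the three PID gains as proportional, integral, and derivative in that order, omits the external disturbance, and uses a hypothetical constant reference of 0.5 rad because the reference trajectory is not recoverable from the table.

```python
import math

M, L, B, G = 1.0, 0.5, 1.0, 9.8   # mass, length, friction, gravity from the table
KP, KI, KD = 25.0, 12.0, 5.0      # PID gains from the table (ordering assumed)


def simulate_pid(q_ref=0.5, t_end=30.0, dt=1e-3):
    """Regulate the assumed link model to q_ref; return (time, error) samples."""
    inertia = M * L ** 2          # point-mass inertia (assumption)
    q, dq, integ = 0.0, 0.0, 0.0
    history = []
    steps = int(round(t_end / dt))
    for k in range(steps):
        err = q_ref - q
        integ += err * dt
        # Derivative acts on the measurement; equal to d(err)/dt for constant q_ref.
        u = KP * err + KI * integ - KD * dq
        ddq = (u - B * dq - M * G * L * math.sin(q)) / inertia
        dq += ddq * dt            # semi-implicit Euler: velocity first,
        q += dq * dt              # then position with the updated velocity
        history.append((k * dt, q_ref - q))
    return history


if __name__ == "__main__":
    hist = simulate_pid()
    print("final error (rad):", hist[-1][1])
```

Swapping the constant `q_ref` for a time-varying reference would require differentiating the error instead of the measurement in the derivative term.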

| Performance Metric | AC-PT | PID | Improvement |
|---|---|---|---|
| Total RMSE (rad) | | | 67.0% |
| SS RMSE (rad) | | | 96.9% |
| Max SS Error (rad) | | | 97.5% |
| Settling Time (s) | | | 98.5% |
| Satisfied | 20/20 (100%) | N/A | - |

| Performance Metric | AC-PT | PID | Improvement |
|---|---|---|---|
| Total RMSE (rad) | 0.0467 | 0.1333 | 65.0% |
| Steady-State RMSE (rad) | 0.0014 | 0.0465 | 96.9% |
| Max Steady-State Error (rad) | 0.0037 | 0.1259 | 97.1% |
| Settling Time to rad (s) | 0.229 | 13.841 | 98.3% |
| Time Within rad (%) | 100.0 | 12.2 | - |
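
The Improvement column in the table above appears to be the relative reduction of each metric against the PID baseline. The snippet below recomputes it from the rounded table entries; the function name and the formula are assumptions, since the source does not state how the column was derived. The steady-state RMSE row comes out as 97.0% from the rounded entries versus the reported 96.9%, a gap that is plausibly due to the reported figure being computed from unrounded data.

```python
def improvement_percent(proposed, baseline):
    """Relative reduction of a lower-is-better metric, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# (AC-PT, PID) pairs copied from the comparison table.
ROWS = {
    "Total RMSE (rad)": (0.0467, 0.1333),
    "Steady-State RMSE (rad)": (0.0014, 0.0465),
    "Max Steady-State Error (rad)": (0.0037, 0.1259),
    "Settling Time (s)": (0.229, 13.841),
}

if __name__ == "__main__":
    for name, (ac_pt, pid) in ROWS.items():
        print(f"{name}: {improvement_percent(ac_pt, pid):.1f}%")
```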

| Scenario | Total RMSE (rad) | SS RMSE (rad) | Settling Time (s) | Satisfied |
|---|---|---|---|---|
| Nominal (, ) | 0.0450 | 0.0015 | 0.209 | Yes |
| Mass Uncertainty | | | | |
| kg () | 0.0429 | 0.0015 | 0.207 | Yes |
| kg () | 0.0469 | 0.0016 | 0.212 | Yes |
| Friction Uncertainty | | | | |
| () | 0.0448 | 0.0015 | 0.208 | Yes |
| () | 0.0453 | 0.0015 | 0.210 | Yes |
| Increased Disturbance | | | | |
| disturbance | 0.0450 | 0.0015 | 0.209 | Yes |
| disturbance | 0.0450 | 0.0015 | 0.209 | Yes |
| disturbance | 0.0450 | 0.0015 | 0.209 | Yes |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Qin, Y.; Sun, Y.; Huang, J.; Li, Y. Adaptive Predefined-Time Tracking Control for Robotic Manipulator Based on Actor-Critic Reinforcement Learning. Sensors 2026, 26, 1529. https://doi.org/10.3390/s26051529

