Deep Deterministic Policy Gradient-Based Parameter Adaptation for Synchronous Sliding-Mode Control with Time-Delay Estimation in Dual-Arm Robot Manipulators Under System Uncertainties
Abstract
1. Introduction
- (1) A DRL-assisted parameter adaptation framework integrated with a baseline SSMC-TDE controller, enabling online performance-oriented gain adjustment for synchronous dual-arm robotic manipulation under significant uncertainties;
- (2) Lyapunov-based stability analysis demonstrating that the integration of the DRL adaptation layer preserves the stability properties of the underlying SSMC-TDE control structure;
- (3) Comprehensive co-simulation studies validating improved robustness, convergence performance, and synchronization accuracy compared with fixed-parameter baseline controllers.
2. Dynamic Modeling and Problem Formulation
2.1. Dual-Arm Robot Dynamics
2.2. Synchronous Coordination Error
2.3. Root Mean Square Error
3. Synchronous Sliding-Mode Control with Time-Delay Estimation
3.1. Control Design
3.2. Stability Analysis
4. DRL-Based Online Parameter Adaptation (DDPG)
4.1. Online Parameter Adaptation with SSMC-TDE
4.1.1. State, Action, and Reward Design
4.1.2. DDPG Training Setup in MATLAB Simulink
4.2. DDPG Framework
| Algorithm 1: Deep Deterministic Policy Gradient algorithm |
|---|
| 1. Initialize the actor and critic networks along with their corresponding target networks, and create an experience replay buffer. |
| 2. for i = 1, episode_max do |
| 3. Initialize a random noise process for action exploration. |
| 4. Receive the initial state. |
| 5. for j = 1, J do |
| 6. Select an action based on the current policy with added exploration noise. |
| 7. Execute the action, then observe the reward and the new state. |
| 8. Store the transition in the replay buffer. |
| 9. Sample a random minibatch of transitions from the replay buffer. |
| 10. Set the target value as in Equation (37). |
| 11. Update the critic by minimizing the loss function given in Equation (38). |
| 12. Update the actor policy using the sampled policy gradient defined in Equation (39). |
| 13. Update the target networks as expressed in Equation (40). |
| 14. end for (inner loop over j) |
| 15. end for (outer loop over i) |
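The bookkeeping steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming that Equations (37), (38), and (40) take the standard DDPG forms (bootstrapped target, mean-squared critic loss, and soft target update); the names `ReplayBuffer`, `td_targets`, `critic_loss`, and `soft_update` are hypothetical and are not taken from the paper. The actor update of Equation (39) is omitted because it requires automatic differentiation through both networks.

```python
import random
from collections import deque

GAMMA = 0.99       # reward discount factor (hyperparameter table)
TAU = 0.001        # smooth factor for the target networks (hyperparameter table)
BATCH_SIZE = 128   # minibatch size (hyperparameter table)


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        # The oldest transitions are dropped once the capacity is reached.
        self.data = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.data.append((state, action, reward, next_state))

    def sample(self, n):
        # Uniform sampling without replacement.
        return random.sample(list(self.data), n)


def td_targets(batch, target_actor, target_critic, gamma=GAMMA):
    """Assumed form of Eq. (37): y = r + gamma * Q'(s', mu'(s'))."""
    targets = []
    for _state, _action, reward, next_state in batch:
        next_action = target_actor(next_state)
        targets.append(reward + gamma * target_critic(next_state, next_action))
    return targets


def critic_loss(targets, q_values):
    """Assumed form of Eq. (38): mean squared error between y and Q(s, a)."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)


def soft_update(target_params, online_params, tau=TAU):
    """Assumed form of Eq. (40): theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

With a smooth factor of 0.001, each call to `soft_update` moves the target parameters only 0.1% of the way toward the online parameters, which is what keeps the bootstrapped targets slowly varying during training.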
5. Simulation and Co-Simulation Results
5.1. Simulation Environment
5.2. Simulation Results and Discussions
5.2.1. Scenario 1
5.2.2. Scenario 2
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A

| Parameter | Symbol | Value |
|---|---|---|
| Sample time | | 0.05 |
| Smooth factor | | 0.001 |
| Reward discount factor | | 0.99 |
| Learning rate for actor | | 0.0001 |
| Learning rate for critic | | 0.001 |
| Minibatch size | | 128 |
| Experience buffer length | R | |

| Scenario | Uncertainty/Disturbance | Description | Verification Purpose |
|---|---|---|---|
| Scenario 1 | External force disturbance | ±10 N applied to joints 21 and 22 during 8–9 s | Robustness to external disturbances |
| Scenario 2 | Payload variation | Payload increases from 0.5 kg (0–8 s) to 1 kg (8–15 s) to 1.5 kg (15–20 s) | Adaptability to time-varying payloads |

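The two test scenarios in the table above can be written as simple time schedules. The sketch below is illustrative only: the function names `external_force` and `payload_mass` are hypothetical, the handling of the interval boundaries (which side the instants 8 s and 15 s belong to) is an assumption, and the `sign` argument is a placeholder because the table does not state which joint receives +10 N and which receives -10 N.

```python
def external_force(t, sign=1.0, magnitude=10.0):
    """Scenario 1: force in N, nonzero only during 8-9 s.

    sign selects +magnitude or -magnitude (assumption: the table only says
    "±10 N", so the per-joint sign is left to the caller).
    """
    if 8.0 <= t <= 9.0:
        return sign * magnitude
    return 0.0


def payload_mass(t):
    """Scenario 2: payload in kg as a piecewise-constant function of time.

    0.5 kg for 0-8 s, 1 kg for 8-15 s, 1.5 kg for 15-20 s. The switching
    instants are assigned to the later interval (assumption).
    """
    if t < 8.0:
        return 0.5
    if t < 15.0:
        return 1.0
    return 1.5


# Sample the schedules once per second over the 20 s horizon of Scenario 2.
schedule = [(t, external_force(float(t)), payload_mass(float(t)))
            for t in range(0, 21)]
```

Keeping the schedules as plain functions of time makes it straightforward to evaluate them at whatever sample time the simulation uses.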
| Controller | Control Coefficient |
|---|---|
| SSMC | |
| SSMC-TDE | |

| Controller | Joint 11 | Joint 21 | Joint 31 | Joint 12 | Joint 22 | Joint 32 |
|---|---|---|---|---|---|---|
| SSMC | 0.0234 | 0.0586 | 0.1071 | 0.0518 | 0.098 | 0.2268 |
| SSMC-TDE | 0.001 | 0.0016 | 0.0028 | 0.0011 | 0.0018 | 0.0032 |
| Proposed method | 0.0006 | 0.001 | 0.0018 | 0.0007 | 0.0014 | 0.0018 |

| Controller | Joint 11 | Joint 21 | Joint 31 | Joint 12 | Joint 22 | Joint 32 |
|---|---|---|---|---|---|---|
| SSMC | 0.0699 | 0.5788 | 0.1078 | 0.0864 | 0.5874 | 0.2277 |
| SSMC-TDE | 0.0007 | 0.0018 | 0.0027 | 0.0012 | 0.002 | 0.0029 |
| Proposed method | 0.0004 | 0.0012 | 0.0018 | 0.0007 | 0.0014 | 0.0019 |
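To make the per-joint comparison in the two tables above easier to read, the relative reduction achieved by the proposed method over SSMC-TDE can be computed directly from the tabulated values. The sketch below assumes the entries are RMSE values as defined in Section 2.3 and uses the usual definition (square root of the mean squared error); the names `rmse`, `reduction_percent`, `first_table`, and `second_table` are hypothetical, and the tables are referred to by position because their captions are not available here.

```python
import math

JOINTS = ["11", "21", "31", "12", "22", "32"]

# Values copied from the two tables above (units as reported in the paper).
first_table = {
    "SSMC-TDE": [0.001, 0.0016, 0.0028, 0.0011, 0.0018, 0.0032],
    "Proposed method": [0.0006, 0.001, 0.0018, 0.0007, 0.0014, 0.0018],
}
second_table = {
    "SSMC-TDE": [0.0007, 0.0018, 0.0027, 0.0012, 0.002, 0.0029],
    "Proposed method": [0.0004, 0.0012, 0.0018, 0.0007, 0.0014, 0.0019],
}


def rmse(errors):
    """Root mean square of a sequence of tracking-error samples."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def reduction_percent(baseline, proposed):
    """Per-joint reduction of the proposed value relative to the baseline."""
    return [100.0 * (b - p) / b for b, p in zip(baseline, proposed)]


for name, table in (("first table", first_table), ("second table", second_table)):
    gains = reduction_percent(table["SSMC-TDE"], table["Proposed method"])
    for joint, gain in zip(JOINTS, gains):
        print(f"{name}, joint {joint}: {gain:.1f}% lower than SSMC-TDE")
```

For example, joint 11 in the first table drops from 0.001 to 0.0006, a 40% reduction relative to SSMC-TDE.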
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Tran, D.T.; Nguyen, T.N.; Huynh, T.K.T.; Ahn, K.K. Deep Deterministic Policy Gradient-Based Parameter Adaptation for Synchronous Sliding-Mode Control with Time-Delay Estimation in Dual-Arm Robot Manipulators Under System Uncertainties. Appl. Sci. 2026, 16, 2042. https://doi.org/10.3390/app16042042

