Deep Reinforcement Learning Trajectory Tracking Control for a Six-Degree-of-Freedom Electro-Hydraulic Stewart Parallel Mechanism
Abstract
1. Introduction
2. Kinematic and Dynamic Analysis of Parallel Mechanisms
2.1. Inverse Kinematic Model
2.2. Dynamic Analysis
3. DDPG Algorithm Model
- Position Error Term: a function of the position error of the i-th (i = 1, 2, …, 6) hydraulic cylinder, scaled by the position error weighting coefficient.
- Maximum Position Error Penalty Term: a function of the maximum value of the squared position error of the i-th (i = 1, 2, …, 6) hydraulic cylinder, scaled by the maximum position error penalty weighting coefficient.
- Error Coupling Suppression Term: a function of the position errors of two distinct hydraulic cylinders i and j (i = 1, 2, …, 6, j = 2, …, 6, i < j), scaled by the error coupling suppression weighting coefficient.
- Over-Travel Penalty Term
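One way to read the four reward terms above is as a single scalar penalty returned to the agent at each step. The sketch below is a hypothetical form, not the authors' formula: the quadratic error term, the product form of the coupling term, the linear over-travel penalty, and all four weights (`k_pos`, `k_max`, `k_couple`, `k_travel`) are assumptions chosen for illustration. Only the 0.3 m stroke limit comes from the parameter table.

```python
import numpy as np

STROKE_MAX = 0.3  # maximum piston rod displacement in metres (parameter table)


def reward(errors, elongations, k_pos=1.0, k_max=1.0, k_couple=1.0, k_travel=1.0):
    """Hypothetical composite reward; every weight and functional form is an assumption."""
    e = np.asarray(errors, dtype=float)       # position error of each of the 6 cylinders
    x = np.asarray(elongations, dtype=float)  # actual elongation of each cylinder

    # Position error term: weighted sum of squared errors.
    pos_term = k_pos * np.sum(e ** 2)

    # Maximum position error penalty: largest squared error among the cylinders.
    max_term = k_max * np.max(e ** 2)

    # Error coupling suppression: |e_i * e_j| summed over distinct pairs i < j.
    upper = np.triu_indices(e.size, k=1)
    couple_term = k_couple * np.sum(np.abs(np.outer(e, e))[upper])

    # Over-travel penalty: distance by which an elongation leaves [0, STROKE_MAX].
    overshoot = np.maximum(x - STROKE_MAX, 0.0) + np.maximum(-x, 0.0)
    travel_term = k_travel * np.sum(overshoot)

    # Larger errors give a more negative reward.
    return -(pos_term + max_term + couple_term + travel_term)
```

In practice the relative size of the weights decides which behaviour the agent prioritises, which is why the reward function gains are listed among the quantities that need tuning.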
4. Comparative Experimental Analysis
4.1. Simulation Environment Setup
4.2. Comparative Experiment
- 1. Input signals: linear displacement 0.03sin(0.15πt) m, angular displacement 1.72sin(0.15πt)°
- 2. Input signals: linear displacement 0.06sin(0.525πt) m, angular displacement 3.44sin(0.525πt)°
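The two excitation cases correspond to frequencies of 0.075 Hz and 0.2625 Hz (angular frequency divided by 2π). As a minimal sketch, the reference trajectories can be generated as sampled arrays. The 0.02 s step matches the sampling time in the hyperparameter table; the 20 s duration and the function name `reference_signals` are assumptions made for illustration.

```python
import numpy as np


def reference_signals(amplitude_m, amplitude_deg, omega, duration_s, dt=0.02):
    """Sampled sinusoidal references for the linear and angular degrees of freedom."""
    # Half a step of slack keeps the final sample at duration_s despite rounding.
    t = np.arange(0.0, duration_s + dt / 2, dt)
    linear = amplitude_m * np.sin(omega * t)     # metres
    angular = amplitude_deg * np.sin(omega * t)  # degrees
    return t, linear, angular


# Case 1: 0.03 m and 1.72 degrees at 0.15*pi rad/s.
t1, lin1, ang1 = reference_signals(0.03, 1.72, 0.15 * np.pi, duration_s=20.0)
# Case 2: 0.06 m and 3.44 degrees at 0.525*pi rad/s.
t2, lin2, ang2 = reference_signals(0.06, 3.44, 0.525 * np.pi, duration_s=20.0)
```

The second case doubles the amplitude and raises the frequency by a factor of 3.5, so the peak commanded velocity is seven times that of the first case.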
5. Conclusions
- (1) In traditional PID control, the control parameters are designed from prior knowledge. Fixed parameters adapt poorly to dynamic changes and require frequent manual tuning. Each controller operates independently, with no information-sharing mechanism, and cannot proactively explore new operating conditions. Improved variants such as Fuzzy+PID, Neural Network+PID, and feedforward-feedback control still show limited practical performance on high-order, strongly coupled, highly nonlinear systems like the electro-hydraulic Stewart parallel mechanism, owing to high computational load, insufficient real-time capability, and the difficulty of establishing precise models.
- (2) Deep reinforcement learning agents learn nonlinear mappings directly, perceive environmental changes (e.g., load transients and disturbances) in real time, adjust their strategies autonomously, and address multivariable coupling, disturbance rejection, and fault tolerance. No control algorithm is perfect, however. Compared with PID control, DDPG-PID achieves higher control accuracy by adjusting the control gains in real time, but its trajectory tracking curves are less smooth. In the DDPG algorithm, the observation range, action range, learning rate, noise intensity, and reward function gains jointly affect training stability, convergence speed, and final performance, and must be adjusted as training proceeds. The DDPG algorithm is also computationally demanding: for the complex operating condition discussed in this paper (synchronous variable-amplitude, variable-frequency sinusoidal excitation across all six degrees of freedom), the hardware configuration directly affects training speed, algorithm stability, and experimental efficiency.
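The real-time gain adjustment in DDPG-PID can be pictured as a mapping from the actor output to per-cylinder PID gains. The sketch below assumes that the 18-dimensional action in [−1, 1] holds three gains (Kp, Ki, Kd) for each of the six cylinders; that split, the gain bounds in `GAIN_BOUNDS`, and both function names are hypothetical and not taken from the paper.

```python
import numpy as np

# Hypothetical (lower, upper) bounds for Kp, Ki, Kd; illustrative values only.
GAIN_BOUNDS = np.array([[0.0, 100.0], [0.0, 10.0], [0.0, 1.0]])


def action_to_gains(action):
    """Map an 18-dim action in [-1, 1] to a (6, 3) array of PID gains."""
    a = np.clip(np.asarray(action, dtype=float), -1.0, 1.0).reshape(6, 3)
    low, high = GAIN_BOUNDS[:, 0], GAIN_BOUNDS[:, 1]
    # Affine rescaling: -1 maps to the lower bound, +1 to the upper bound.
    return low + 0.5 * (a + 1.0) * (high - low)


def pid_output(gains, error, error_integral, error_rate):
    """Per-cylinder PID command computed with the gains chosen by the agent."""
    kp, ki, kd = gains[:, 0], gains[:, 1], gains[:, 2]
    return kp * error + ki * error_integral + kd * error_rate
```

Because the gains change at every sampling step, the command can jump between steps, which is one plausible source of the reduced smoothness noted above.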
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Parameter | Value |
---|---|
Radius of moving platform r (m) | 0.39 |
Radius of fixed platform R (m) | 0.616 |
Maximum piston rod displacement (m) | 0.3 |
Initial distance between moving and fixed platforms H (m) | 0.9 |
Angle between adjacent hinge points on fixed platform at center α (°) | 12.152 |
Angle between adjacent hinge points on moving platform at center β (°) | 8.093 |
Rated current of electro-hydraulic proportional valve I (mA) | 40 |
Response frequency of electro-hydraulic servo valve f (Hz) | 80 |
Load mass m (kg) | 1000 |
Viscous friction coefficient of the Hooke joint Cu (N·s/m) | 0.0001 |
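With the geometric parameters above, the inverse kinematics reduces to computing six leg lengths from a commanded pose. The sketch below is a generic Stewart-platform calculation, not the authors' model: the hinge layout (three hinge pairs per platform, with the moving-platform pairs offset by 60° from the fixed-platform pairs), the Z-Y-X rotation order, and all function names are assumptions. Only R, r, H, α, and β come from the table.

```python
import numpy as np

R_BASE, R_TOP, H0 = 0.616, 0.39, 0.9  # fixed radius, moving radius, home height (m)
ALPHA, BETA = np.radians(12.152), np.radians(8.093)


def hinge_points(radius, centers_deg, spread):
    """Three hinge pairs on a circle; each pair straddles its center by +/- spread/2."""
    angles = []
    for c in np.radians(centers_deg):
        angles += [c - spread / 2, c + spread / 2]
    angles = np.array(angles)
    return np.stack([radius * np.cos(angles), radius * np.sin(angles), np.zeros(6)], axis=1)


# Assumed layout: base pairs at 0/120/240 deg, platform pairs at 60/180/300 deg.
# The roll pairs each base hinge with its nearest platform hinge.
BASE = np.roll(hinge_points(R_BASE, [0, 120, 240], ALPHA), -1, axis=0)
TOP = hinge_points(R_TOP, [60, 180, 300], BETA)


def rotation(roll, pitch, yaw):
    """Rotation matrix Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in radians."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return rz @ ry @ rx


def leg_lengths(x, y, z, roll, pitch, yaw):
    """Leg lengths for a pose; x, y, z are offsets from the home position."""
    position = np.array([x, y, H0 + z])
    top_world = TOP @ rotation(roll, pitch, yaw).T + position
    return np.linalg.norm(top_world - BASE, axis=1)
```

Subtracting the home-pose lengths from `leg_lengths(...)` gives the commanded cylinder elongations, which can then be checked against the 0.3 m stroke limit.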
Parameter Category | Parameter Name | Value/Description |
---|---|---|
Environment Parameters | Observation space dimension | 24-dimensional vector |
Environment Parameters | Action space dimension | 18-dimensional vector |
Network Architecture | Sampling time | 0.02 s |
Network Architecture | Number of neurons | 512 |
Network Architecture | Observation space range | Deviation [−∞, +∞] m; deviation integral [−∞, +∞] m·s; actual elongation [0, 0.3] m; current [−40, 40] mA |
Network Architecture | Action space range | [−1, 1] |
Optimizer Parameters | Optimizer type | RMSprop |
Optimizer Parameters | Actor learning rate | 1 × |
Optimizer Parameters | Critic learning rate | 5 × |
Optimizer Parameters | Gradient threshold | Actor: 10, Critic: 1 |
Optimizer Parameters | Regularization factor | 1 × |
Training Parameters | Discount factor | 0.99 |
Training Parameters | Target network soft update factor | 0.05 |
Training Parameters | Experience replay buffer size | 1 × |
Training Parameters | Batch size | 256 |
Training Parameters | Noise parameter (initial variance) | 50 (dimensionless after normalization) |
Training Parameters | Noise decay rate | 1 × (decay ratio per training step, dimensionless) |
Training Parameters | Target network update frequency | 1 step/episode |
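Two of the training parameters above enter standard DDPG update rules directly: the soft update factor (0.05) blends the online weights into the target network, and the discount factor (0.99) weights the bootstrapped value in the critic target. The sketch below shows both rules in their textbook form using plain NumPy arrays as stand-ins for network weights; the function names are illustrative, and nothing here reproduces the authors' implementation.

```python
import numpy as np

TAU = 0.05    # target network soft update factor (from the table)
GAMMA = 0.99  # discount factor (from the table)


def soft_update(target_params, online_params, tau=TAU):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


def td_target(reward, next_q, done, gamma=GAMMA):
    """Critic regression target: r + gamma * Q'(s', mu'(s')), with no bootstrap at episode end."""
    return reward + gamma * (1.0 - float(done)) * next_q


# Toy example: a single weight vector drifting toward the online weights.
target = [np.zeros(2)]
online = [np.ones(2)]
target = soft_update(target, online)
```

With a factor of 0.05, the target weights cover about 5% of the remaining gap to the online weights at each update, which slows the drift of the critic target and tends to stabilise training.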
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kong, Y.; Wang, Y.; Wang, Y.; Zhu, S.; Zhang, R.; Wang, L. Deep Reinforcement Learning Trajectory Tracking Control for a Six-Degree-of-Freedom Electro-Hydraulic Stewart Parallel Mechanism. Eng 2025, 6, 212. https://doi.org/10.3390/eng6090212