A Motion Control Strategy for a Blind Hexapod Robot Based on Reinforcement Learning and Central Pattern Generator
Abstract
1. Introduction
2. CPG Control Network
2.1. The Mathematical Model of a Single CPG
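The paper's single-CPG equations are not preserved in this extract. As a representative example only (an assumption, not the paper's model), a Hopf oscillator is a common choice for a single CPG unit in legged locomotion; it settles onto a stable limit cycle of radius $\sqrt{\mu}$, oscillating at angular frequency $\omega$, with $\alpha$ setting the convergence rate:

$$
\dot{x} = \alpha\left(\mu - x^{2} - y^{2}\right)x - \omega y,
\qquad
\dot{y} = \alpha\left(\mu - x^{2} - y^{2}\right)y + \omega x.
$$

The oscillator output $x(t)$ or $y(t)$ can then be mapped to a joint trajectory, and coupling terms between oscillators fix the inter-leg phase relationships discussed in Section 2.2.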
2.2. CPG Control Network Structure for a Hexapod Robot
2.3. Tripod Gait
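In a tripod gait, the six legs are grouped into two trios, each consisting of the front and rear legs on one side plus the middle leg of the opposite side; the two trios alternate between stance and swing half a cycle apart, so the body is always supported by a statically stable triangle. A minimal sketch of the corresponding CPG phase offsets (leg labels are assumed, not the paper's):

```python
# Tripod gait phase offsets in fractions of one gait cycle.
# Leg labels (L/R = left/right; F/M/B = front/middle/back) are assumed.
TRIPOD_PHASE = {
    "LF": 0.0, "RM": 0.0, "LB": 0.0,  # tripod A: in stance together
    "RF": 0.5, "LM": 0.5, "RB": 0.5,  # tripod B: half a cycle out of phase
}
```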
3. Proprioceptive Movement
3.1. Proprioception
3.2. The Proprioception of a Hexapod Robot
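Because the robot is blind, the policy receives no camera, LiDAR, or other exteroceptive input; it must rely on internal sensing alone. As an illustration of what a proprioceptive observation vector might contain (the paper's exact composition is not preserved in this extract, so the layout below is hypothetical):

```python
import numpy as np

def proprioceptive_observation(joint_pos, joint_vel, body_quat, body_gyro):
    """Assemble a proprioception-only observation vector.

    Hypothetical composition: 18 joint angles and 18 joint velocities
    (3 joints per leg x 6 legs) plus IMU orientation (quaternion) and
    angular velocity; the paper's actual vector may differ.
    """
    return np.concatenate([joint_pos, joint_vel, body_quat, body_gyro])
```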
4. Reinforcement Learning Algorithm
5. Strategy Network Training
5.1. Training Platforms
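Assuming a MuJoCo-based simulation (an assumption; MuJoCo is a common platform for legged-robot reinforcement learning), a minimal load-and-step loop looks like the sketch below, with `hexapod.xml` as a hypothetical robot description file:

```python
# Sketch only: loading and stepping a hexapod model in MuJoCo.
# The platform choice and "hexapod.xml" are assumptions.
import mujoco

model = mujoco.MjModel.from_xml_path("hexapod.xml")
data = mujoco.MjData(model)

for _ in range(1000):
    data.ctrl[:] = 0.0           # joint commands would come from the policy
    mujoco.mj_step(model, data)  # advance the physics by one timestep
```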
5.2. Parametric Policy Network Training
- Collect a batch of transitions $(s_t, a_t, r_t, s_{t+1})$ with the current stochastic policy $\pi_\phi$ (initially near-random) and store them in the replay buffer $\mathcal{D}$;
- Update the Q-function parameters based on the samples: $\theta_i \leftarrow \theta_i - \lambda_Q \hat{\nabla}_{\theta_i} J_Q(\theta_i)$ for $i \in \{1, 2\}$;
- Update the policy parameters: $\phi \leftarrow \phi - \lambda_\pi \hat{\nabla}_{\phi} J_\pi(\phi)$;
- Update the temperature coefficient: $\alpha \leftarrow \alpha - \lambda_\alpha \hat{\nabla}_{\alpha} J(\alpha)$;
- Update the target network parameters: $\bar{\theta}_i \leftarrow \tau \theta_i + (1 - \tau)\bar{\theta}_i$ for $i \in \{1, 2\}$ (a code sketch of one full update follows this list).
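No implementation appears in this extract; the following PyTorch-style sketch shows how the five steps above combine into one gradient update of the standard soft actor-critic algorithm. The interfaces (`actor.sample`, the optimizers, `log_alpha`) are hypothetical names, and the defaults for `gamma` and `tau` are taken from the training-parameter table in Section 5.2.

```python
# PyTorch-style sketch of one soft actor-critic gradient update.
# All module and variable names are hypothetical, not the paper's.
import torch
import torch.nn.functional as F

def sac_update(batch, actor, q1, q2, q1_targ, q2_targ,
               actor_opt, q_opt, log_alpha, alpha_opt,
               target_entropy, gamma=0.99, tau=0.85):
    s, a, r, s2, done = batch              # sampled from the replay buffer
    alpha = log_alpha.exp().detach()       # current temperature

    # Step 2: Q-function update against the soft Bellman target.
    with torch.no_grad():
        a2, logp2 = actor.sample(s2)       # reparameterized action, log-prob
        q_min = torch.min(q1_targ(s2, a2), q2_targ(s2, a2))
        y = r + gamma * (1.0 - done) * (q_min - alpha * logp2)
    q_loss = F.mse_loss(q1(s, a), y) + F.mse_loss(q2(s, a), y)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()

    # Step 3: policy update (maximize entropy-regularized Q-value).
    a_new, logp = actor.sample(s)
    q_new = torch.min(q1(s, a_new), q2(s, a_new))
    actor_loss = (alpha * logp - q_new).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Step 4: temperature update toward the target entropy.
    alpha_loss = -(log_alpha * (logp.detach() + target_entropy)).mean()
    alpha_opt.zero_grad(); alpha_loss.backward(); alpha_opt.step()

    # Step 5: Polyak averaging of the target networks,
    # theta_bar <- tau * theta + (1 - tau) * theta_bar.
    for net, targ in ((q1, q1_targ), (q2, q2_targ)):
        for p, p_t in zip(net.parameters(), targ.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)
```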
5.3. Training Results
6. Policy Network Performance Analysis
6.1. Physical Model of a Hexapod Robot
6.2. Policy Network Performance Analysis
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Learning Rate | Batch Size | Buffer Size | Gamma | Tau |
---|---|---|---|---|
0.0003 | 256 | 1,000,000 | 0.99 | 0.85 |
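The extract does not state which training framework was used; purely as an illustration, the table's values map onto the Stable-Baselines3 SAC constructor as follows (the library choice, environment ID, and timestep budget are assumptions):

```python
# Sketch only: the table's hyperparameters in a Stable-Baselines3 SAC
# configuration. Framework and environment ID are assumptions.
from stable_baselines3 import SAC

model = SAC(
    "MlpPolicy",
    "HexapodEnv-v0",        # hypothetical registered environment ID
    learning_rate=3e-4,     # Learning Rate
    batch_size=256,         # Batch Size
    buffer_size=1_000_000,  # Buffer Size
    gamma=0.99,             # Gamma (discount factor)
    tau=0.85,               # Tau, per the table; note most SAC
                            # implementations default to ~0.005
)
model.learn(total_timesteps=1_000_000)  # budget is illustrative
```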
Coordinates (x, y, z) | Left Leg | Right Leg |
---|---|---|
Front Leg | (93.6, 50.805, 0) | (93.6, −50.805, 0) |
Middle Leg | (0, 73.575, 0) | (0, −73.575, 0) |
Rear Leg | (−93.6, 50.805, 0) | (−93.6, −50.805, 0) |
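Note the mirror symmetry of the mounting points: each right-leg coordinate is the left-leg coordinate with the sign of y flipped, and the front/rear pairs likewise mirror in x. A small check using the table's values (units assumed to be millimeters):

```python
import numpy as np

# Leg coordinates from the table above; units assumed to be millimeters.
left  = {"front":  ( 93.6,  50.805, 0.0),
         "middle": (  0.0,  73.575, 0.0),
         "rear":   (-93.6,  50.805, 0.0)}
right = {"front":  ( 93.6, -50.805, 0.0),
         "middle": (  0.0, -73.575, 0.0),
         "rear":   (-93.6, -50.805, 0.0)}

# Left/right legs mirror across the x-z (sagittal) plane: y -> -y.
for leg, (x, y, z) in left.items():
    assert np.allclose(right[leg], (x, -y, z))

# Front/rear legs mirror across the y-z plane: x -> -x.
assert np.allclose(left["rear"], (-left["front"][0],) + left["front"][1:])
print("mounting points are left/right and front/rear symmetric")
```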
Standard Deviation | Terrain 1 | Terrain 2 | Terrain 3 |
---|---|---|---|
Vertical Acceleration (m/s²) | 2.97 | 2.39 | 2.78 |
Oscillations around roll (rad) | 0.0206 | 0.0191 | 0.0133 |
Oscillations around pitch (rad) | 0.0231 | 0.0214 | 0.0172 |
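The three rows are per-trial standard deviations of logged body signals; smaller values indicate a steadier body. A sketch of how such metrics might be computed from recorded telemetry (function and argument names are hypothetical):

```python
import numpy as np

def stability_metrics(accel_z, roll, pitch):
    """Standard deviations of logged body signals over one trial.

    accel_z     : vertical acceleration samples
    roll, pitch : body roll and pitch angle samples
    Returns the three metrics reported in the table above.
    """
    return np.std(accel_z), np.std(roll), np.std(pitch)
```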