Adaptive Quadruped Balance Control for Dynamic Environments Using Maximum-Entropy Reinforcement Learning
Abstract
1. Introduction
2. Materials and Methods
2.1. Policy Training Details
2.1.1. Observations and Actions
2.1.2. Reward Function
2.1.3. Policy Network
2.1.4. Maximum-Entropy RL Policy Training Algorithm
2.2. Automatic Disturbance Curriculum
3. Verification Environment
4. Evaluation in Simulation and Real-World Experiments
4.1. Simulation Results and Analysis
4.2. Real-World Experiment Results and Analysis
5. Conclusions
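The outline above names a maximum-entropy policy-training algorithm (Section 2.1.4) whose details are not reproduced in this extract. As a point of reference only, the snippet below is a minimal sketch of the soft actor-critic (SAC) style update that "maximum-entropy reinforcement learning" typically denotes. It is not the authors' implementation: the network sizes, observation/action dimensions, and the temperature `ALPHA` are illustrative assumptions.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 30, 12   # assumed sizes; the paper's actual spaces are defined in Section 2.1.1
ALPHA, GAMMA = 0.2, 0.99    # assumed entropy temperature and discount factor

class GaussianPolicy(nn.Module):
    """Squashed-Gaussian policy: outputs a tanh-bounded action and its log-probability."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ReLU(),
                                  nn.Linear(256, 256), nn.ReLU())
        self.mu = nn.Linear(256, ACT_DIM)
        self.log_std = nn.Linear(256, ACT_DIM)

    def forward(self, obs):
        h = self.body(obs)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-5.0, 2.0)
        dist = torch.distributions.Normal(mu, log_std.exp())
        u = dist.rsample()            # reparameterized sample so gradients flow through the policy
        a = torch.tanh(u)             # bound actions to [-1, 1]
        # tanh change-of-variables correction so logp matches the squashed action
        logp = dist.log_prob(u).sum(-1) - torch.log(1.0 - a.pow(2) + 1e-6).sum(-1)
        return a, logp

policy = GaussianPolicy()
q_net = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 256), nn.ReLU(), nn.Linear(256, 1))

def actor_loss(obs):
    """Maximum-entropy actor objective: maximize Q(s, a) + ALPHA * entropy."""
    a, logp = policy(obs)
    q = q_net(torch.cat([obs, a], dim=-1)).squeeze(-1)
    return (ALPHA * logp - q).mean()

def critic_target(reward, next_obs, done):
    """Soft Bellman backup: r + GAMMA * (Q(s', a') - ALPHA * log pi(a'|s'))."""
    with torch.no_grad():
        a2, logp2 = policy(next_obs)
        q2 = q_net(torch.cat([next_obs, a2], dim=-1)).squeeze(-1)
        return reward + GAMMA * (1.0 - done) * (q2 - ALPHA * logp2)
```

The `ALPHA * logp` term is the entropy bonus that distinguishes this objective from a standard actor-critic update; keeping the policy stochastic during training is commonly credited with improving robustness to unmodeled disturbances.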
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
|   | T = 10 s |   |   | T = 20 s |   |   |
|---|---|---|---|---|---|---|
| PF | 0.34 | 4.54 | 8.56 | 0.17 | 4.94 | 9.96 |
| CG | 0.25 | 5.12 | 10.37 | 0.04 | 5.45 | 11.43 |
| EG | 0.06 | 1.73 | 4.01 | 0.04 | 2.08 | 4.46 |
|   | (°) | (°) | (°) | % |
|---|---|---|---|---|
| PF | 0.02 | 4.30 | 9.30 | — |
| EG | 0.04 | 1.84 | 5.66 | 57.27 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Sun, H.; Fu, T.; Ling, Y.; He, C. Adaptive Quadruped Balance Control for Dynamic Environments Using Maximum-Entropy Reinforcement Learning. Sensors 2021, 21, 5907. https://doi.org/10.3390/s21175907