Efficient Reinforcement Learning for 3D Jumping Monopods
Abstract
1. Introduction
Paper Contribution
2. Guided Reinforcement Learning for Jumping
2.1. Problem Description
2.2. Overview of the Approach
3. Learning Framework
3.1. The Action Space
Trajectory Parametrization in Cartesian Space
3.2. A Physically Informative Reward Function
3.3. Implementation Details
4. Simulation Results
4.1. Non-Linear Trajectory Optimization
4.2. End-to-End RL
4.3. Policy Performance: The Feasibility Region
4.3.1. Performance Baseline: Trajectory Optimization
4.3.2. Performance of GRL
4.3.3. Performance Baseline: E2E RL
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Variable | Name | Value |
---|---|---|
m | Robot mass [kg] | 1.5 |
P | Proportional gain | 10 |
D | Derivative gain | 0.2 |
– | Nominal configuration [rad] | – |
– | Simulator time step [s] | 0.001 |
– | Max torque [Nm] | 8 |
– | Touch-down force threshold [N] | 1 |
– | Number of exploration steps | 1280 (GRL), 10 × 10⁴ (E2E) |
– | Batch size | 256 (GRL), 512 (E2E) |
– | Exploration noise | 0.4 (GRL), 0.3 (E2E) |
– | Landing target repetition | 5 (GRL), 20 (E2E) |
– | Training step interval | 1 (GRL), 100 (E2E) |
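Since the table pins down the controller gains, the torque limit, and the simulation rate, a short sketch can show how these parameters fit together. The snippet below is a minimal, illustrative joint-space PD law with torque saturation using the tabulated values; the function name `pd_torque`, the three-joint example state, and the absence of a feedforward term are assumptions for illustration, not the paper's exact control law.

```python
import numpy as np

# Parameter values taken from the table above.
P_GAIN = 10.0    # proportional gain
D_GAIN = 0.2     # derivative gain
TAU_MAX = 8.0    # max torque [Nm]
DT = 0.001       # simulator time step [s]; one controller evaluation per step

def pd_torque(q_des, qd_des, q, qd):
    """Saturated joint-space PD law:
    tau = P * (q_des - q) + D * (qd_des - qd), clipped to the torque limit."""
    tau = P_GAIN * (q_des - q) + D_GAIN * (qd_des - qd)
    return np.clip(tau, -TAU_MAX, TAU_MAX)

# One 1 kHz control step for a hypothetical 3-joint leg.
q = np.array([0.0, 0.7, -1.4])         # measured joint positions [rad]
qd = np.zeros(3)                       # measured joint velocities [rad/s]
q_des = np.array([0.0, 0.8, -1.6])     # reference positions [rad]
qd_des = np.zeros(3)                   # reference velocities [rad/s]
tau = pd_torque(q_des, qd_des, q, qd)  # torque command, within ±TAU_MAX
```

Clipping to ±8 Nm mirrors the max-torque row of the table and keeps the commanded torques within the actuator limits assumed in simulation.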