Stability Control of a Biped Robot on a Dynamic Platform Based on Hybrid Reinforcement Learning
Abstract
1. Introduction
2. Formulation of the Problem
3. Framework
3.1. Framework of the Proposed Hybrid Reinforcement Learning
3.2. Hierarchical Gaussian Processes
3.3. Initial Control Inputs
3.4. Model-Free Fine Tuning
Algorithm 1 Updating the returns of the stored transitions
1: For each transition ⟨s_t, a_t, r_t, s_{t+1}⟩ in the temporary table T, from back to front do
2: If s_{t+1} is terminal
3: Update the return of the transition from the immediate reward r_t alone
4: Else
5: Obtain the next transition ⟨s_{t+1}, a_{t+1}, r_{t+1}, s_{t+2}⟩ from T
6: Update the return of the transition by bootstrapping from the return of the next transition
7: End If
8: End For
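Read this way, Algorithm 1 is a single backward sweep that propagates returns through the buffered transitions. The sketch below is a minimal Python reading of that sweep; the Transition container, the discount factor gamma, and the simple discounted bootstrapped-return rule are illustrative assumptions rather than the paper's exact update.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Transition:
    """One buffered step; `ret` is filled in by the backward sweep."""
    state: Any
    action: int
    reward: float
    next_state: Any
    terminal: bool
    ret: float = 0.0

def update_returns(table: List[Transition], gamma: float = 0.99) -> None:
    """Sweep the temporary transition table from back to front (Algorithm 1).

    A terminal transition keeps only its immediate reward; every other
    transition bootstraps from the return of the transition that follows it.
    The discounted accumulation below is an assumed form of the update,
    not the paper's exact rule.
    """
    next_return = 0.0
    for tr in reversed(table):
        if tr.terminal:
            tr.ret = tr.reward
        else:
            tr.ret = tr.reward + gamma * next_return
        next_return = tr.ret
```

A caller would append transitions while the episode runs and invoke update_returns once the table is closed out, which mirrors the back-to-front loop of steps 1 to 8.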
Algorithm 2 Hybrid Reinforcement Learning Algorithm
1: Initialize: state space S and action space A
2: Apply discrete actions randomly to the robot and collect the initial data set
3: Use the collected data set to generate the hierarchical Gaussian processes, and then obtain the transition model
4: Obtain the reduced action space A′
5: Initialize the Q-function using the transition model
6: Initialize the replay memory D to its capacity and the parameter vector θ
7: For each episode do
8: Reset the robot and the platform to their initial positions, and empty the transition temporary table T
9: For each time step do
10: Obtain the current state s_t from the sensors' readings
11: Select a random action a_t from A′ with probability ε, otherwise select a_t = argmax_a Q(s_t, a; θ); observe the next state s_{t+1} and receive an immediate reward r_t
12: Append the transition ⟨s_t, a_t, r_t, s_{t+1}⟩ to T
13: If s_{t+1} is within the stable region
14: Update T using Algorithm 1
15: Store the transitions of T in D, then refresh T
16: End If
17: Sample a random minibatch of transitions from D
18: Apply a gradient descent step on θ to improve the Q-function
19: Every fixed number of steps, update D using Algorithm 1
20: End For
21: End For
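To show how the pieces of Algorithm 2 fit together (ε-greedy selection over the reduced action space, the temporary table T, the stable-region refresh into the replay memory D, and the minibatch update of the parameter vector), here is a compact sketch that reuses Transition and update_returns from the previous listing. The stub environment, the linear Q-function, the stability test, and all hyperparameter values are placeholders, not the paper's implementation; the model-based initialization from the hierarchical Gaussian processes (steps 2 to 5) is omitted.

```python
import random
from collections import deque

import numpy as np

class StubPlatformEnv:
    """Minimal stand-in for the robot-on-platform simulation (not the paper's model)."""
    def reset(self):
        self.t = 0
        return np.zeros(4)                           # e.g. [roll, pitch, roll rate, pitch rate]

    def step(self, action):
        self.t += 1
        next_state = np.random.randn(4) * 0.1        # placeholder dynamics
        reward = -float(np.linalg.norm(next_state))  # placeholder reward
        terminal = self.t >= 200
        return next_state, reward, terminal

def q_values(theta, state):
    """Illustrative linear Q-function: one row of weights per discrete action."""
    return theta @ state

def in_stable_region(state):
    """Placeholder test standing in for the paper's stable region."""
    return bool(np.all(np.abs(state[:2]) < 0.2))

env = StubPlatformEnv()
actions = list(range(5))               # reduced action space A' (placeholder size)
theta = np.zeros((len(actions), 4))    # parameter vector of the Q-function
replay = deque(maxlen=10_000)          # replay memory D
eps, gamma, lr, batch_size = 0.1, 0.99, 1e-2, 32

for episode in range(50):
    state = env.reset()
    table = []                         # temporary transition table T
    terminal = False
    while not terminal:
        # Step 11: epsilon-greedy selection over the reduced action space.
        if random.random() < eps:
            action = random.choice(actions)
        else:
            action = int(np.argmax(q_values(theta, state)))
        next_state, reward, terminal = env.step(action)
        table.append(Transition(state, action, reward, next_state, terminal))
        # Steps 13-16: when the robot is back in the stable region, back up the
        # returns of T (Algorithm 1), move T into D, and refresh T.
        if in_stable_region(next_state):
            update_returns(table, gamma)
            replay.extend(table)
            table = []
        # Steps 17-18: minibatch gradient step on theta toward the stored returns.
        if len(replay) >= batch_size:
            for tr in random.sample(list(replay), batch_size):
                td_error = tr.ret - q_values(theta, tr.state)[tr.action]
                theta[tr.action] += lr * td_error * np.asarray(tr.state, dtype=float)
        state = next_state
```

The linear Q-function is chosen only to keep the sketch short; the same loop structure applies with a deep Q-network and a periodic refresh of the stored returns in D, as step 19 suggests.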
4. Experiment and Discussion
4.1. Experiment Setup
4.2. Experiment Results
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Nwokah, O.D.; Hurmuzlu, Y. The Mechanical Systems Design Handbook: Modeling, Measurement, and Control; CRC Press: Boca Raton, FL, USA, 2001.
- Chevallereau, C.; Bessonnet, G.; Abba, G.; Aoustin, Y. Bipedal Robots: Modeling, Design and Walking Synthesis, 1st ed.; Wiley-ISTE: London, UK, 2008.
- Gil, C.R.; Calvo, H.; Sossa, H. Learning an Efficient Gait Cycle of a Biped Robot Based on Reinforcement Learning and Artificial Neural Networks. Appl. Sci. 2019, 9, 502.
- Vukobratovic, M.; Borovac, B. Zero-moment point thirty-five years of its life. Int. J. Hum. Robot. 2004, 1, 157–173.
- Strom, J.; Slavov, G.; Chown, E. Omnidirectional Walking Using ZMP and Preview Control for the NAO Humanoid Robot. In RoboCup 2009: Robot Soccer World Cup XIII; Springer: Berlin/Heidelberg, Germany, 2009; pp. 378–389.
- Yi, J.; Zhu, Q.; Xiong, R.; Wu, J. Walking Algorithm of Humanoid Robot on Uneven Terrain with Terrain Estimator. Int. J. Adv. Robot. Syst. 2016, 13, 35.
- Lee, H.; Yang, J.; Zhang, S.; Chen, Q. Research on the Stability of Biped Robot Walking on Different Road Surfaces. In Proceedings of the 2018 1st IEEE International Conference on Knowledge Innovation and Invention (ICKII), Jeju, Korea, 23–27 July 2018; pp. 54–57.
- Yoshida, Y.; Takeuchi, K.; Sato, D.; Nenchev, D. Balance control of humanoid robots in response to disturbances in the frontal plane. In Proceedings of the 2011 IEEE International Conference on Robotics and Biomimetics, Phuket, Thailand, 7–11 December 2011; pp. 2241–2242.
- Zhong, Q.; Chen, F. Trajectory planning for biped robot walking on uneven terrain—Taking stepping as an example. CAAI Trans. Intell. Technol. 2016, 1, 197–209.
- Gong, Y.; Hartley, R.; Da, X.; Hereid, A.; Harib, O.; Huang, J.K.; Grizzle, J. Feedback Control of a Cassie Bipedal Robot: Walking, Standing, and Riding a Segway. In Proceedings of the 2019 American Control Conference (ACC), Philadelphia, PA, USA, 10–12 July 2019.
- Wang, S.; Chaovalitwongse, W.; Babuska, R. Machine Learning Algorithms in Bipedal Robot Control. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 2012, 42, 728–743.
- Valdez, F.; Castillo, O.; Caraveo, C.; Peraza, C. Comparative Study of the Conventional Mathematical and Fuzzy Logic Controllers for Velocity Regulation. Axioms 2019, 8, 53.
- Juang, C.F.; Yeh, Y.T. Multiobjective Evolution of Biped Robot Gaits Using Advanced Continuous Ant-Colony Optimized Recurrent Neural Networks. IEEE Trans. Cybern. 2018, 48, 1910–1922.
- Ferreira, J.P.; Crisóstomo, M.M.; Coimbra, A.P. SVR Versus Neural-Fuzzy Network Controllers for the Sagittal Balance of a Biped Robot. IEEE Trans. Neural Netw. 2009, 20, 1885–1897.
- Sun, C.; He, W.; Ge, W.; Chang, C. Adaptive Neural Network Control of Biped Robots. IEEE Trans. Syst. Man Cybern. Syst. 2017, 47, 315–326.
- Saputra, A.A.; Botzheim, J.; Sulistijono, I.A.; Kubota, N. Biologically Inspired Control System for 3-D Locomotion of a Humanoid Biped Robot. IEEE Trans. Syst. Man Cybern. Syst. 2016, 46, 898–911.
- Katic, D.M.; Vukobratovic, M.K. Hybrid Dynamic Control Algorithm for Humanoid Robots Based on Reinforcement Learning. J. Intell. Robot. Syst. 2008, 51, 3–30.
- Guerrero, N.N.; Weber, C.; Schroeter, P.; Wermter, S. Real-world reinforcement learning for autonomous humanoid robot docking. Robot. Auton. Syst. 2012, 60, 1400–1407.
- Hwang, K.S.; Jiang, W.C.; Chen, Y.J.; Shi, H. Gait Balance and Acceleration of a Biped Robot Based on Q-Learning. IEEE Access 2016, 4, 2439–2449.
- Hwang, K.S.; Jiang, W.C.; Chen, Y.J.; Shi, H. Motion Segmentation and Balancing for a Biped Robot's Imitation Learning. IEEE Trans. Ind. Inform. 2017, 13, 1099–1108.
- Wu, W.; Gao, L. Posture self-stabilizer of a bipedal robot based on training platform and reinforcement learning. J. Robot. Auton. Syst. 2017, 98, 42–55.
- Seo, D.; Kim, H.; Kim, D. Push Recovery Control for Humanoid Robot using Reinforcement Learning. In Proceedings of the 2019 Third IEEE International Conference on Robotic Computing (IRC), Naples, Italy, 25–27 February 2019.
- Shi, Q.; Ying, W.; Lv, L.; Xie, J. Deep reinforcement learning-based attitude motion control for humanoid robots with stability constraints. Ind. Robot 2020, 47, 335–347.
- Garcia, J.; Shafie, D. Teaching a humanoid robot to walk faster through Safe Reinforcement Learning. Eng. Appl. Artif. Intell. 2020, 88, 103360.
- Polydoros, A.S.; Nalpantidis, L. Survey of Model-Based Reinforcement Learning: Applications on Robotics. J. Intell. Robot. Syst. 2017, 86, 153–173.
- Deisenroth, M.P.; Rasmussen, C.E. PILCO: A model-based and data-efficient approach to policy search. In Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 28 June–2 July 2011; pp. 465–472.
- Englert, P.; Paraschos, A.; Peters, J.; Deisenroth, M.P. Model-based imitation learning by probabilistic trajectory matching. In Proceedings of the 2013 IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, 6–10 May 2013; pp. 1922–1927.
- Deisenroth, M.P.; Fox, D.; Rasmussen, C.E. Gaussian Processes for Data-Efficient Learning in Robotics and Control. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 408–423.
- Coates, A.; Abbeel, P.; Ng, A.Y. Apprenticeship learning for helicopter control. Commun. ACM 2009, 52, 97–105.
- Nguyen, T.T.; Li, Z.; Silander, T.; Leong, T. Online feature selection for model-based reinforcement learning. Proc. Int. Conf. Mach. Learn. 2013, 28, 498–506.
- Nagabandi, A.; Kahn, G.; Fearing, R.S.; Levine, S. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 7579–7586.
- Pong, V.; Gu, S.; Dalal, M.; Levine, S. Temporal Difference Models: Model-Free Deep RL for Model-Based Control. arXiv 2018, arXiv:1802.0908v1.
- Feinberg, V.; Wan, A.; Stoica, I.; Jordan, M.I.; Gonzalez, J.E.; Levine, S. Model-Based Value Expansion for Efficient Model-Free Reinforcement Learning. arXiv 2018, arXiv:1803.00101v1.
- Gu, S.; Lillicrap, T.; Sutskever, I.; Levine, S. Continuous Deep Q-Learning with Model-based Acceleration. arXiv 2016, arXiv:1603.00748v1.
- Hafez, M.B.; Weber, C.; Kerzel, M.; Wermter, S. Curious Meta-Controller: Adaptive Alternation between Model-Based and Model-Free Control in Deep Reinforcement Learning. arXiv 2019, arXiv:1905.01718v1.
- Nagabandi, A.; Kahn, G.; Fearing, R.S.; Levine, S. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning. arXiv 2017, arXiv:1708.02596v2.
- Alcaraz-Jiménez, J.J.; Herrero-Pérez, D.; Martínez-Barberá, H. Robust feedback control of ZMP-based gait for the humanoid robot Nao. Int. J. Robot. Res. 2013, 32, 1074–1088.
- Xi, A.; Mudiyanselage, T.W.; Tao, D.; Chen, C. Balance Control of a Biped Robot on a Rotating Platform Based on Efficient Reinforcement Learning. IEEE/CAA J. Autom. Sin. 2019, 6, 938–951.
- Daley, B.; Amato, C. Efficient Eligibility Traces for Deep Reinforcement Learning. arXiv 2018, arXiv:1810.09967v1.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; The MIT Press: Cambridge, MA, USA, 2018.
- Feng, J.; Fyfe, C.; Jain, L.C. Experimental analysis on Sarsa(λ) and Q(λ) with different eligibility traces strategies. J. Intell. Fuzzy Syst. 2009, 20, 73–82.
- Peng, J.; Williams, R.J. Incremental Multi-Step Q-Learning. Mach. Learn. 1996, 22, 283–290.
| Evaluation Index | Our Method | Wu's Method |
|---|---|---|
| ZMP Error | 0.0125 m | 0.025 m |
| DoF of the platform | 2 | 1 |
| Maximum angle | | |
| Maximum frequency | | |
| Success rate (0.0333 Hz, 30 deg) | 100% | 82% |