Online Reinforcement-Learning-Based Adaptive Terminal Sliding Mode Control for Disturbed Bicycle Robots on a Curved Pavement
Abstract
1. Introduction
2. Dynamics
3. Design of Controller
3.1. Feedback Transformation and Terminal Sliding Mode Control
3.2. PPO-TSMC
4. Simulation Experiment
4.1. Simulation Platform
4.2. Experimental Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Stasinopoulos, S.; Zhao, M.; Zhong, Y. Simultaneous localization and mapping for autonomous bicycles. Int. J. Adv. Robot. Syst. 2017, 14, 172988141770717.
- Zhang, Y.; Li, J.; Yi, J.; Song, D. Balance control and analysis of stationary riderless motorcycles. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, 9–13 May 2011.
- Yu, Y.; Zhao, M. Steering control for autonomously balancing bicycle at low speed. In Proceedings of the 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO), Kuala Lumpur, Malaysia, 12–15 December 2018.
- Sun, Y.; Zhao, M.; Wang, B.; Zheng, X.; Liang, B. Polynomial controller for bicycle robot based on nonlinear descriptor system. In Proceedings of the IECON 2020—46th Annual Conference of the IEEE Industrial Electronics Society, Singapore, 18–21 October 2020.
- Chen, C.K.; Chu, T.D.; Zhang, X.D. Modeling and control of an active stabilizing assistant system for a bicycle. Sensors 2019, 19, 248.
- Zheng, X.; Zhu, X.; Chen, Z.; Sun, Y.; Liang, B.; Wang, T. Dynamic modeling of an unmanned motorcycle and combined balance control with both steering and double CMGs. Mech. Mach. Theory 2022, 169, 104643.
- He, K.; Deng, Y.; Wang, G.; Sun, X.; Sun, Y.; Chen, Z. Learning-based trajectory tracking and balance control for bicycle robots with a pendulum: A Gaussian process approach. IEEE/ASME Trans. Mechatron. 2022, 27, 634–644.
- Kim, Y.; Kim, H.; Lee, J. Stable control of the bicycle robot on a curved path by using a reaction wheel. J. Mech. Sci. Technol. 2015, 29, 2219–2226.
- Chen, L.; Liu, J.; Wang, H.; Hu, Y.; Zheng, X.; Ye, M.; Zhang, J. Robust control of reaction wheel bicycle robot via adaptive integral terminal sliding mode. Nonlinear Dyn. 2021, 104, 2291–2302.
- Kim, H.-W.; An, J.-W.; Yoo, H.D.; Lee, J.-M. Balancing control of bicycle robot using PID control. In Proceedings of the 2013 13th International Conference on Control, Automation and Systems (ICCAS 2013), Gwangju, Korea, 20–23 October 2013; pp. 145–147.
- Kanjanawanishkul, K. LQR and MPC controller design and comparison for a stationary self-balancing bicycle robot with a reaction wheel. Kybernetika 2015, 51, 173–191.
- Owczarkowski, A.; Horla, D.; Zietkiewicz, J. Introduction of feedback linearization to robust LQR and LQI control—Analysis of results from an unmanned bicycle robot with reaction wheel. Asian J. Control 2019, 21, 1028–1040.
- Yi, J.; Song, D.; Levandowski, A.; Jayasuriya, S. Trajectory tracking and balance stabilization control of autonomous motorcycles. In Proceedings of the 2006 IEEE International Conference on Robotics and Automation (ICRA 2006), Orlando, FL, USA, 15–19 May 2006; pp. 2583–2589.
- Hwang, C.-L.; Wu, H.-M.; Shih, C.-L. Fuzzy sliding-mode underactuated control for autonomous dynamic balance of an electrical bicycle. IEEE Trans. Control Syst. Technol. 2009, 17, 658–670.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Aradi, S. Survey of deep reinforcement learning for motion planning of autonomous vehicles. IEEE Trans. Intell. Transp. Syst. 2022, 23, 740–759.
- Randløv, J.; Alstrøm, P. Learning to drive a bicycle using reinforcement learning and shaping. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), Madison, WI, USA, 1998.
- Choi, S.Y.; Le, T.; Nguyen, Q.; Layek, M.; Lee, S.G.; Chung, T.C. Toward self-driving bicycles using state-of-the-art deep reinforcement learning algorithms. Symmetry 2019, 11, 290.
- Zheng, Q.; Wang, D.; Chen, Z.; Sun, Y.; Liang, B. Continuous reinforcement learning based ramp jump control for single-track two-wheeled robots. Trans. Inst. Meas. Control 2022, 44, 892–904.
- Johannink, T.; Bahl, S.; Nair, A.; Luo, J.; Kumar, A.; Loskyll, M.; Ojea, J.A.; Solowjow, E.; Levine, S. Residual reinforcement learning for robot control. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 6023–6029.
- Venkataraman, S.; Gulati, S. Terminal sliding modes: A new approach to nonlinear control synthesis. In Proceedings of the Fifth International Conference on Advanced Robotics, 'Robots in Unstructured Environments', Pisa, Italy, 19–22 June 1991; Volume 1, pp. 443–448.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
- Olfati-Saber, R. Global stabilization of a flat underactuated system: The inertia wheel pendulum. In Proceedings of the 40th IEEE Conference on Decision and Control, Orlando, FL, USA, 4–7 December 2001; Volume 4, pp. 3764–3765.
- Olfati-Saber, R. Nonlinear Control of Underactuated Mechanical Systems with Application to Robotics and Aerospace Vehicles. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2001.
- Spong, M.W.; Corke, P.; Lozano, R. Nonlinear control of the reaction wheel pendulum. Automatica 2001, 37, 1845–1851.
- Zhou, M.; Feng, Y.; Han, F. Continuous full-order terminal sliding mode control for a class of nonlinear systems. In Proceedings of the 2017 36th Chinese Control Conference (CCC), Dalian, China, 26–28 July 2017; pp. 3657–3660.
- Shtessel, Y.; Edwards, C.; Fridman, L.; Levant, A. Sliding Mode Control and Observation; Birkhäuser: New York, NY, USA, 2014.
- Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; Moritz, P. Trust region policy optimization. In Proceedings of the International Conference on Machine Learning (ICML), PMLR, Lille, France, July 2015; pp. 1889–1897.
- Olfati-Saber, R. Normal forms for underactuated mechanical systems with symmetry. IEEE Trans. Autom. Control 2002, 47, 305–308.
- Andrychowicz, M.; Raichuk, A.; Stańczyk, P.; Orsini, M.; Girgin, S.; Marinier, R.; Hussenot, L.; Geist, M.; Pietquin, O.; Michalski, M.; et al. What matters in on-policy reinforcement learning? A large-scale empirical study. arXiv 2020, arXiv:2006.05990.
- Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML 2010), Haifa, Israel, 2010.
- Schulman, J.; Moritz, P.; Levine, S.; Jordan, M.; Abbeel, P. High-dimensional continuous control using generalized advantage estimation. arXiv 2015, arXiv:1506.02438.
- Konda, V.; Tsitsiklis, J. Actor-critic algorithms. In Proceedings of Neural Information Processing Systems (NIPS), Denver, CO, USA, 29 November–4 December 1999.
- Holzleitner, M.; Gruber, L.; Arjona-Medina, J.; Brandstetter, J.; Hochreiter, S. Convergence proof for actor-critic methods applied to PPO and RUDDER. In Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVIII; Springer: Berlin/Heidelberg, Germany, 2021; pp. 105–130.
- Machado, M.; Moreira, P.; Flores, P.; Lankarani, H.M. Compliant contact force models in multibody dynamics: Evolution of the Hertz contact theory. Mech. Mach. Theory 2012, 53, 99–121.
- Marques, F.; Flores, P.; Claro, J.P.; Lankarani, H.M. A survey and comparison of several friction force models for dynamic analysis of multibody mechanical systems. Nonlinear Dyn. 2016, 86, 1407–1443.
- Giesbers, J. Contact Mechanics in MSC Adams—A Technical Evaluation of the Contact Models in Multibody Dynamics Software MSC Adams. Thesis, University of Twente, Enschede, The Netherlands, 2012.
- Sapietová, A.; Gajdoš, L.; Dekýš, V.; Sapieta, M. Analysis of the influence of input function contact parameters of the impact force process in MSC Adams. In Advanced Mechatronics Solutions; Springer: Berlin/Heidelberg, Germany, 2016; pp. 243–253.
- Chen, L.; Yan, B.; Wang, H.; Shao, K.; Kurniawan, E.; Wang, G. Extreme-learning-machine-based robust integral terminal sliding mode control of bicycle robot. Control Eng. Pract. 2022, 124, 105064.
- Deisenroth, M.P.; Fox, D.; Rasmussen, C.E. Gaussian processes for data-efficient learning in robotics and control. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 408–423.
- Chettibi, T. Smooth point-to-point trajectory planning for robot manipulators by using radial basis functions. Robotica 2019, 37, 539–559.
- Moerland, T.M.; Broekens, J.; Jonker, C.M. Model-based reinforcement learning: A survey. arXiv 2020, arXiv:2006.16712.
- Rietsch, S.; Huang, S.Y.; Kontes, G.; Plinge, A.; Mutschler, C. Driver Dojo: A benchmark for generalizable reinforcement learning for autonomous driving. arXiv 2022, arXiv:2207.11432.
| Case 1 | Case 2 | Case 3 | Case 4 |
|---|---|---|---|
| 1.5 | 1.5 | 3 | 3 |
| Case | Controller | MAX (rad) | MEAN (rad) | RMS (rad) | Time (s) |
|---|---|---|---|---|---|
| 1 | TSMC | - | - | - | 196 |
| 1 | AITSM | 0.215 | −0.0229 | 0.0625 | - |
| 1 | PPO | - | - | - | 22 |
| 1 | RRL | - | - | - | 361 |
| 1 | PPO-TSMC | 0.0869 | −0.0053 | 0.0371 | - |
| 2 | TSMC | - | - | - | 168 |
| 2 | AITSM | 0.231 | −0.0353 | 0.0735 | - |
| 2 | PPO | - | - | - | 14 |
| 2 | RRL | - | - | - | 274 |
| 2 | PPO-TSMC | 0.132 | −0.0078 | 0.0539 | - |
| 3 | TSMC | 0.224 | −0.0199 | 0.0883 | - |
| 3 | AITSM | - | - | - | 389 |
| 3 | PPO | - | - | - | 11 |
| 3 | RRL | 0.321 | −0.0171 | 0.0931 | - |
| 3 | PPO-TSMC | 0.137 | −0.0082 | 0.0489 | - |
| 4 | TSMC | 0.269 | −0.0194 | 0.0858 | - |
| 4 | AITSM | - | - | - | 265 |
| 4 | PPO | - | - | - | 8 |
| 4 | RRL | 0.314 | 0.0046 | 0.0774 | - |
| 4 | PPO-TSMC | 0.211 | −0.0072 | 0.0540 | - |
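The MAX, MEAN, and RMS columns above summarize the roll-angle trace of each run. As a minimal sketch of how such summary statistics are conventionally computed, the snippet below evaluates them on a synthetic, purely illustrative roll-angle signal (the decaying-oscillation trace and its parameters are assumptions, not data from the paper):

```python
import numpy as np

# Hypothetical roll-angle trace (rad), standing in for a logged balance-control run.
t = np.linspace(0.0, 10.0, 1001)
roll = 0.1 * np.exp(-0.3 * t) * np.sin(2.0 * t)  # decaying oscillation, illustrative only

max_abs = np.max(np.abs(roll))       # MAX: peak absolute roll angle
mean = np.mean(roll)                 # MEAN: average roll angle (sign preserved)
rms = np.sqrt(np.mean(roll ** 2))    # RMS: root-mean-square roll angle

print(f"MAX={max_abs:.4f} rad, MEAN={mean:.4f} rad, RMS={rms:.4f} rad")
```

By construction RMS lies between the absolute mean and the peak value, which is why the RMS column gives a steadier picture of tracking quality than MAX alone.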
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
MDPI and ACS Style: Zhu, X.; Deng, Y.; Zheng, X.; Zheng, Q.; Liang, B.; Liu, Y. Online Reinforcement-Learning-Based Adaptive Terminal Sliding Mode Control for Disturbed Bicycle Robots on a Curved Pavement. Electronics 2022, 11, 3495. https://doi.org/10.3390/electronics11213495
AMA Style: Zhu X, Deng Y, Zheng X, Zheng Q, Liang B, Liu Y. Online Reinforcement-Learning-Based Adaptive Terminal Sliding Mode Control for Disturbed Bicycle Robots on a Curved Pavement. Electronics. 2022; 11(21):3495. https://doi.org/10.3390/electronics11213495
Chicago/Turabian Style: Zhu, Xianjin, Yang Deng, Xudong Zheng, Qingyuan Zheng, Bin Liang, and Yu Liu. 2022. "Online Reinforcement-Learning-Based Adaptive Terminal Sliding Mode Control for Disturbed Bicycle Robots on a Curved Pavement" Electronics 11, no. 21: 3495. https://doi.org/10.3390/electronics11213495
APA Style: Zhu, X., Deng, Y., Zheng, X., Zheng, Q., Liang, B., & Liu, Y. (2022). Online Reinforcement-Learning-Based Adaptive Terminal Sliding Mode Control for Disturbed Bicycle Robots on a Curved Pavement. Electronics, 11(21), 3495. https://doi.org/10.3390/electronics11213495