# Stability and Safety Learning Methods for Legged Robots


## Abstract


## 1. Introduction

## 2. Background

#### 2.1. Classes of Dynamical Systems

**Definition 1.**

**Definition 2.**

**Definition 3.**

**Definition 4.**

**Definition 5.**

**Definition 6.**

#### 2.2. Stability Theory: Lyapunov Functions

**Definition 7.**

**Definition 8.**

**Definition 9.**

**Definition 10.**

**Theorem 1.**

**Definition 11.**

#### 2.3. Safety Theory: Barrier Functions and Nagumo’s Theorem

**Definition 12.**

**Definition 13.**

**Definition 14.**

**Definition 15.**

#### 2.4. Stability and Safety for Controlled Systems: Control Barrier and Control Lyapunov Functions

## 3. Learning Methodologies

#### 3.1. Supervised Learning

#### 3.2. Reinforcement Learning Algorithms

- Soft actor–critic (SAC): the policy is trained to maximize a trade-off between the expected return and the entropy of the policy, which measures its degree of randomness. This is closely related to the exploration–exploitation trade-off: higher entropy encourages exploration, which can speed up learning and prevent the policy from prematurely converging to a suboptimal local solution [58]. A flow chart of the method is reported in Figure 3.

- Deep deterministic policy gradient (DDPG): an off-policy, model-free actor–critic algorithm based on the deterministic policy gradient that can operate over continuous action spaces. The goal is to learn the policy that maximizes the expected discounted cumulative long-term reward [59]. A flow chart of the method is reported in Figure 4.

- Imitation learning (IL): this learning framework aims to acquire a policy that replicates the actions of an expert demonstrating how to perform the desired task. The expert behaviour is encapsulated as a set of trajectories, where each element can come from different example conditions, and the demonstrations can be collected both offline and online. It was used in [37], in combination with an LMI formulation of the stability conditions, to synthesize stable controllers for an inverted pendulum and for car trajectory following. A flow chart of a typical IL approach is reported in Figure 5.
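The entropy term that separates the SAC objective from a pure return can be made concrete with a short, self-contained sketch (plain Python; `policy_entropy` and `soft_return` are illustrative names, not part of any cited implementation):

```python
import math

def policy_entropy(probs):
    """Shannon entropy H(pi(.|s)) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def soft_return(rewards, entropies, alpha=0.2, gamma=0.99):
    """Entropy-regularized discounted return maximized by SAC-style
    objectives: sum_t gamma^t * (r_t + alpha * H(pi(.|s_t)))."""
    return sum((gamma ** t) * (r + alpha * h)
               for t, (r, h) in enumerate(zip(rewards, entropies)))

# A uniform (maximally random) policy earns the largest entropy bonus,
# while a deterministic policy earns none: the coefficient alpha tunes
# how strongly exploration is rewarded.
bonus_uniform = policy_entropy([0.25] * 4)       # log 4
bonus_greedy = policy_entropy([1.0, 0.0, 0.0])   # 0
```

Setting `alpha = 0` recovers the ordinary discounted return, which is the quantity the deterministic policy in DDPG is trained to maximize.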

#### 3.3. Linear and Nonlinear Programming

#### 3.3.1. Quadratic Programming

#### 3.3.2. Mixed Integer Programming

#### 3.3.3. Semidefinite Programming

## 4. Applications

#### 4.1. LF/CLF Applications

#### 4.2. BF/CBF Applications

#### 4.3. CBLF Applications

## 5. Future Perspectives

- Data-efficient Lyapunov function distillation: Addressing the challenges related to the availability and efficiency of datasets for Lyapunov function distillation is crucial [42]. Future research should focus on methods that extract Lyapunov functions from limited datasets to ensure robustness and efficiency in learning-based control systems.
- Integration of Lyapunov design techniques with offline learning: The current review highlights the potential of Lyapunov design techniques in reinforcement learning, especially in offline environments [12]. Future efforts could explore and extend the integration of Lyapunov-based approaches with reinforcement learning to achieve robust and efficient control policies in legged robotic systems.
- Flexible approaches based on system requirements: Depending on the specific requirements of a given system, future research could explore flexible approaches that combine CLFs and CBFs according to the relative priority of stability or constraint satisfaction [81]. This adaptability ensures that control strategies can be tailored to the specific needs of different robotic systems, including a graded level of certification that can be relaxed as the robot generalizes the trained task, extending the safety region beyond the one covered by the training dataset.
- Terrain adaptation and obstacle avoidance: A major challenge for legged robots is to navigate uneven and discrete (rocky) terrains while avoiding obstacles [3]. Future work should aim to implement and further integrate discrete-time CBFs with continuous-time controllers to improve the adaptability and obstacle-avoidance capability of legged robots [45,49].
- Development of standard benchmarks: The development of standard benchmarks for the application of legged robots plays a critical role in the advancement of robotics by providing a common framework for evaluating control strategies. Standard benchmarks serve as important tools for evaluating the performance, robustness and adaptability of different control algorithms for different legged-robot platforms. These benchmarks will facilitate fair and objective comparisons between different control strategies, promote healthy competition and accelerate the identification of best practices. The introduction of benchmarks also encourages knowledge sharing and collaboration within the scientific community, as researchers can collectively contribute to the refinement and expansion of these standardized tests.
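The CBF-based filtering referenced in these perspectives can be written in closed form for the simplest case, a single safety constraint on a single-integrator point robot. The sketch below is a toy illustration under stated assumptions (the dynamics, the obstacle and all names are illustrative, not drawn from the cited works):

```python
def cbf_safety_filter(u_nom, grad_h, h, alpha=1.0):
    """One-constraint CBF-QP, solved in closed form.
    Filters a nominal control u_nom so that h_dot + alpha*h >= 0 holds
    for single-integrator dynamics x_dot = u (so h_dot = grad_h . u)."""
    a_dot_u = sum(a * u for a, u in zip(grad_h, u_nom))
    slack = a_dot_u + alpha * h
    if slack >= 0.0:
        return list(u_nom)               # nominal input is already safe
    norm_sq = sum(a * a for a in grad_h)
    lam = -slack / norm_sq               # multiplier of the active constraint
    return [u + lam * a for u, a in zip(u_nom, grad_h)]

# Keep a point robot outside a unit disc at the origin:
# h(x) = ||x||^2 - 1, grad_h(x) = 2x.
x = [1.5, 0.0]
h = x[0] ** 2 + x[1] ** 2 - 1.0          # h > 0: currently safe
u_nom = [-2.0, 0.0]                      # nominal input drives at the obstacle
u_safe = cbf_safety_filter(u_nom, [2 * x[0], 2 * x[1]], h)
```

The filtered input is the minimal correction of `u_nom` that restores the barrier condition, which is why QP appears so often in the certificate column of Table 1.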

## 6. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

- Prajna, S.; Jadbabaie, A. Safety Verification of Hybrid Systems Using Barrier Certificates. In Proceedings of the International Conference on Hybrid Systems: Computation and Control, Philadelphia, PA, USA, 25–27 March 2004. [Google Scholar]
- Prajna, S. Barrier certificates for nonlinear model validation. Automatica
**2006**, 42, 117–126. [Google Scholar] [CrossRef] - Torres-Pardo, A.; Pinto-Fernández, D.; Garabini, M.; Angelini, F.; Rodriguez-Cianca, D.; Massardi, S.; Tornero, J.; Moreno, J.C.; Torricelli, D. Legged locomotion over irregular terrains: State of the art of human and robot performance. Bioinspir. Biomim.
**2022**, 17, 061002. [Google Scholar] [CrossRef] [PubMed] - Bouman, A.; Ginting, M.F.; Alatur, N.; Palieri, M.; Fan, D.D.; Touma, T.; Pailevanian, T.; Kim, S.K.; Otsu, K.; Burdick, J.; et al. Autonomous Spot: Long-Range Autonomous Exploration of Extreme Environments with Legged Locomotion. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 2518–2525. [Google Scholar] [CrossRef]
- Arena, P.; Patanè, L.; Taffara, S. A Data-Driven Model Predictive Control for Quadruped Robot Steering on Slippery Surfaces. Robotics
**2023**, 12, 67. [Google Scholar] [CrossRef] - Patanè, L. Bio-inspired robotic solutions for landslide monitoring. Energies
**2019**, 12, 1256. [Google Scholar] [CrossRef] - Arena, P.; Patanè, L.; Taffara, S. Learning risk-mediated traversability maps in unstructured terrains navigation through robot-oriented models. Inf. Sci.
**2021**, 576, 1–23. [Google Scholar] [CrossRef] - Semini, C.; Wieber, P.B. Legged Robots. In Encyclopedia of Robotics; Springer: Berlin/Heidelberg, Germany, 2020; pp. 1–11. [Google Scholar] [CrossRef]
- Ren, Z.; Lai, J.; Wu, Z.; Xie, S. Deep neural networks-based real-time optimal navigation for an automatic guided vehicle with static and dynamic obstacles. Neurocomputing
**2021**, 443, 329–344. [Google Scholar] [CrossRef] - Singh, N.; Thongam, K. Neural network-based approaches for mobile robot navigation in static and moving obstacles environments. Intell. Serv. Robot.
**2019**, 12, 55–67. [Google Scholar] [CrossRef] - Xiao, W.; Cassandras, G.C.; Belta, C. Safety-Critical Optimal Control for Autonomous Systems. J. Syst. Sci. Complex.
**2021**, 34, 1723–1742. [Google Scholar] [CrossRef] - Westenbroek, T.; Castaneda, F.; Agrawal, A.; Sastry, S.; Sreenath, K. Lyapunov Design for Robust and Efficient Robotic Reinforcement Learning. arXiv
**2022**, arXiv:2208.06721. [Google Scholar] - Dai, H.; Landry, B.; Yang, L.; Pavone, M.; Tedrake, R. Lyapunov-stable neural-network control. arXiv
**2021**, arXiv:2109.14152. [Google Scholar] - Dawson, C.; Gao, S.; Fan, C. Safe Control With Learned Certificates: A Survey of Neural Lyapunov, Barrier, and Contraction Methods for Robotics and Control. IEEE Trans. Robot.
**2022**, 39, 1749–1767. [Google Scholar] [CrossRef] - Hafstein, S.; Giesl, P. Computational methods for Lyapunov functions. Discret. Contin. Dyn. Syst. Ser. B
**2015**, 20, i–ii. [Google Scholar] [CrossRef] - Tsukamoto, H.; Chung, S.J.; Slotine, J.J.E. Contraction theory for nonlinear stability analysis and learning-based control: A tutorial overview. Annu. Rev. Control
**2021**, 52, 135–169. [Google Scholar] [CrossRef] - Anand, A.; Seel, K.; Gjaerum, V.; Haakansson, A.; Robinson, H.; Saad, A. Safe Learning for Control using Control Lyapunov Functions and Control Barrier Functions: A Review. Procedia Comput. Sci.
**2021**, 192, 3987–3997. [Google Scholar] [CrossRef] - Hu, K.; Ott, C.; Lee, D. Online iterative learning control of zero-moment point for biped walking stabilization. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 5127–5133. [Google Scholar]
- Hu, K.; Ott, C.; Lee, D. Learning and Generalization of Compensative Zero-Moment Point Trajectory for Biped Walking. IEEE Trans. Robot.
**2016**, 32, 717–725. [Google Scholar] [CrossRef] - Cybenko, G. Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst.
**1989**, 2, 303–314. [Google Scholar] [CrossRef] - Elallid, B.B.; Benamar, N.; Hafid, A.S.; Rachidi, T.; Mrani, N. A Comprehensive Survey on the Application of Deep and Reinforcement Learning Approaches in Autonomous Driving. J. King Saud Univ. Comput. Inf. Sci.
**2022**, 34, 7366–7390. [Google Scholar] [CrossRef] - Xie, J.; Shao, Z.; Li, Y.; Guan, Y.; Tan, J. Deep Reinforcement Learning with Optimized Reward Functions for Robotic Trajectory Planning. IEEE Access
**2019**, 7, 105669–105679. [Google Scholar] [CrossRef] - Khalil, H.K. Nonlinear Systems, 3rd ed.; Prentice-Hall: Upper Saddle River, NJ, USA, 2002. [Google Scholar]
- Nagumo, M. Über die Lage der Integralkurven gewöhnlicher Differentialgleichungen. 1942. Available online: https://www.jstage.jst.go.jp/article/ppmsj1919/24/0/24_0_551/_pdf (accessed on 1 November 2023).
- Prajna, S.; Jadbabaie, A. Safety Verification of Hybrid Systems Using Barrier Certificates. In Hybrid Systems: Computation and Control; Alur, R., Pappas, G.J., Eds.; Springer: Berlin/Heidelberg, Germany, 2004; pp. 477–492. [Google Scholar]
- Ames, A.; Coogan, S.D.; Egerstedt, M.; Notomista, G.; Sreenath, K.; Tabuada, P. Control Barrier Functions: Theory and Applications. In Proceedings of the 2019 18th European Control Conference (ECC), Naples, Italy, 25–28 June 2019; pp. 3420–3431. [Google Scholar]
- Dawson, C.; Qin, Z.; Gao, S.; Fan, C. Safe Nonlinear Control Using Robust Neural Lyapunov-Barrier Functions. arXiv
**2021**, arXiv:2109.06697. [Google Scholar] - Richards, S.; Berkenkamp, F.; Krause, A. The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamical Systems. In Proceedings of the 2nd Conference on Robot Learning (CoRL 2018), Zurich, Switzerland, 29–31 October 2018. [Google Scholar]
- Gaby, N.; Zhang, F.; Ye, X. Lyapunov-Net: A Deep Neural Network Architecture for Lyapunov Function Approximation. arXiv
**2022**, arXiv:2109.13359. [Google Scholar] - Abate, A.; Ahmed, D.; Giacobbe, M.; Peruffo, A. Formal Synthesis of Lyapunov Neural Networks. IEEE Control Syst. Lett.
**2021**, 5, 773–778. [Google Scholar] [CrossRef] - Abate, A.; Ahmed, D.; Edwards, A.; Giacobbe, M.; Peruffo, A. FOSSIL: A Software Tool for the Formal Synthesis of Lyapunov Functions and Barrier Certificates Using Neural Networks. In Proceedings of the 24th International Conference on Hybrid Systems: Computation and Control (HSCC ’21), New York, NY, USA, 19–21 May 2021. [Google Scholar] [CrossRef]
- Zhou, R.; Quartz, T.; Sterck, H.D.; Liu, J. Neural Lyapunov Control of Unknown Nonlinear Systems with Stability Guarantees. arXiv
**2022**, arXiv:2206.01913. [Google Scholar] - Chang, Y.C.; Roohi, N.; Gao, S. Neural Lyapunov Control. arXiv
**2022**, arXiv:2005.00611. [Google Scholar] - Wu, J.; Clark, A.; Kantaros, Y.; Vorobeychik, Y. Neural Lyapunov Control for Discrete-Time Systems. arXiv
**2023**, arXiv:2305.06547. [Google Scholar] - Cosner, R.K.; Yue, Y.; Ames, A.D. End-to-End Imitation Learning with Safety Guarantees using Control Barrier Functions. arXiv
**2022**, arXiv:2212.11365. [Google Scholar] - Lindemann, L.; Hu, H.; Robey, A.; Zhang, H.; Dimarogonas, D.V.; Tu, S.; Matni, N. Learning Hybrid Control Barrier Functions from Data. arXiv
**2020**, arXiv:2011.04112. [Google Scholar] - Yin, H.; Seiler, P.; Jin, M.; Arcak, M. Imitation Learning with Stability and Safety Guarantees. arXiv
**2021**, arXiv:2012.09293. [Google Scholar] [CrossRef] - Chen, S.; Fazlyab, M.; Morari, M.; Pappas, G.J.; Preciado, V.M. Learning Lyapunov Functions for Hybrid Systems. In Proceedings of the 2021 55th Annual Conference on Information Sciences and Systems (CISS), Baltimore, MD, USA, 24–26 March 2021; p. 1. [Google Scholar]
- Chow, Y.; Nachum, O.; Duenez-Guzman, E.; Ghavamzadeh, M. A Lyapunov-based Approach to Safe Reinforcement Learning. arXiv
**2018**, arXiv:1805.07708. [Google Scholar] - Zhao, L.; Gatsis, K.; Papachristodoulou, A. Stable and Safe Reinforcement Learning via a Barrier-Lyapunov Actor–Critic Approach. arXiv
**2023**, arXiv:2304.04066. [Google Scholar] - Hejase, B.; Ozguner, U. Lyapunov Stability Regulation of Deep Reinforcement Learning Control with Application to Automated Driving. In Proceedings of the 2023 American Control Conference (ACC), San Diego, CA, USA, 31 May–2 June 2023; pp. 4437–4442. [Google Scholar] [CrossRef]
- Boffi, N.M.; Tu, S.; Matni, N.; Slotine, J.J.E.; Sindhwani, V. Learning Stability Certificates from Data. arXiv
**2020**, arXiv:2008.05952. [Google Scholar] - Ames, A.D.; Galloway, K.; Sreenath, K.; Grizzle, J.W. Rapidly Exponentially Stabilizing Control Lyapunov Functions and Hybrid Zero Dynamics. IEEE Trans. Autom. Control
**2014**, 59, 876–891. [Google Scholar] [CrossRef] - Xiong, Z.; Eappen, J.; Qureshi, A.H.; Jagannathan, S. Model-free Neural Lyapunov Control for Safe Robot Navigation. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; pp. 5572–5579. [Google Scholar]
- Csomay-Shanklin, N.; Cosner, R.K.; Dai, M.; Taylor, A.J.; Ames, A.D. Episodic Learning for Safe Bipedal Locomotion with Control Barrier Functions and Projection-to-State Safety. In Proceedings of the 3rd Conference on Learning for Dynamics and Control, Virtual Event, Switzerland, 7–8 June 2021; Jadbabaie, A., Lygeros, J., Pappas, G.J., Parrilo, P.A., Recht, B., Tomlin, C.J., Zeilinger, M.N., Eds.; PMLR: Maastricht, Germany, 2021; Volume 144, pp. 1041–1053. [Google Scholar]
- Grandia, R.; Taylor, A.J.; Ames, A.D.; Hutter, M. Multi-Layered Safety for Legged Robots via Control Barrier Functions and Model Predictive Control. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 8352–8358. [Google Scholar] [CrossRef]
- Peng, C.; Donca, O.; Hereid, A. Safe Path Planning for Polynomial Shape Obstacles via Control Barrier Functions and Logistic Regression. arXiv
**2022**, arXiv:2210.03704. [Google Scholar] - Hsu, S.C.; Xu, X.; Ames, A.D. Control barrier function based quadratic programs with application to bipedal robotic walking. In Proceedings of the 2015 American Control Conference (ACC), Chicago, IL, USA, 1–3 July 2015; pp. 4542–4548. [Google Scholar] [CrossRef]
- Agrawal, A.; Sreenath, K. Discrete Control Barrier Functions for Safety-Critical Control of Discrete Systems with Application to Bipedal Robot Navigation. In Proceedings of the Robotics: Science and Systems, Cambridge, MA, USA, 12–16 July 2017. [Google Scholar]
- Nguyen, Q.; Hereid, A.; Grizzle, J.W.; Ames, A.D.; Sreenath, K. 3D dynamic walking on stepping stones with Control Barrier Functions. In Proceedings of the 2016 IEEE 55th Conference on Decision and Control (CDC), Las Vegas, NV, USA, 12–14 December 2016; pp. 827–834. [Google Scholar] [CrossRef]
- Choi, J.J.; Castañeda, F.; Tomlin, C.J.; Sreenath, K. Reinforcement Learning for Safety-Critical Control under Model Uncertainty, using Control Lyapunov Functions and Control Barrier Functions. arXiv
**2020**, arXiv:2004.07584. [Google Scholar] - Meng, Y.; Fan, C. Hybrid Systems Neural Control with Region-of-Attraction Planner. arXiv
**2023**, arXiv:2303.10327. [Google Scholar] - Rodriguez, I.D.J.; Csomay-Shanklin, N.; Yue, Y.; Ames, A. Neural Gaits: Learning Bipedal Locomotion via Control Barrier Functions and Zero Dynamics Policies. In Proceedings of the Conference on Learning for Dynamics & Control, Stanford, CA, USA, 23–24 June 2022. [Google Scholar]
- Cunningham, P.; Cord, M.; Delany, S.J. Supervised Learning. In Machine Learning Techniques for Multimedia: Case Studies on Organization and Retrieval; Springer: Berlin/Heidelberg, Germany, 2008; pp. 21–49. [Google Scholar] [CrossRef]
- Barreto, G.; Araújo, A.; Ritter, H. Self-Organizing Feature Maps for Modeling and Control of Robotic Manipulators. J. Intell. Robot. Syst.
**2003**, 36, 407–450. [Google Scholar] [CrossRef] - Arena, P.; Di Pietro, F.; Li Noce, A.; Patanè, L. Attitude control in the Mini Cheetah robot via MPC and reward-based feed-forward controller. IFAC-PapersOnLine
**2022**, 55, 41–48. [Google Scholar] [CrossRef] - Yu, K.; Jin, K.; Deng, X. Review of Deep Reinforcement Learning. In Proceedings of the 2022 IEEE 5th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Chongqing, China, 16–18 December 2022; Volume 5, pp. 41–48. [Google Scholar] [CrossRef]
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor–Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. arXiv
**2018**, arXiv:1801.01290. [Google Scholar] - Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv
**2019**, arXiv:1509.02971. [Google Scholar] - Hwangbo, J.; Lee, J.; Dosovitskiy, A.; Bellicoso, D.; Tsounis, V.; Koltun, V.; Hutter, M. Learning agile and dynamic motor skills for legged robots. Sci. Robot.
**2019**, 4, eaau5872. [Google Scholar] [CrossRef] - Lee, J.; Hwangbo, J.; Wellhausen, L.; Koltun, V.; Hutter, M. Learning quadrupedal locomotion over challenging terrain. Sci. Robot.
**2020**, 5, eabc5986. [Google Scholar] [CrossRef] - Miki, T.; Lee, J.; Hwangbo, J.; Wellhausen, L.; Koltun, V.; Hutter, M. Learning robust perceptive locomotion for quadrupedal robots in the wild. Sci. Robot.
**2022**, 7, eabk2822. [Google Scholar] [CrossRef] [PubMed] - Nocedal, J.; Wright, S.J. (Eds.) Quadratic Programming. In Numerical Optimization; Springer: New York, NY, USA, 1999; pp. 438–486. [Google Scholar] [CrossRef]
- Vandenberghe, L.; Boyd, S. Semidefinite Programming. SIAM Rev.
**1996**, 38, 49–95. [Google Scholar] [CrossRef] - Tayal, M.; Kolathaya, S. Polygonal Cone Control Barrier Functions (PolyC2BF) for safe navigation in cluttered environments. arXiv
**2023**, arXiv:2311.08787. [Google Scholar] - Westervelt, E.; Buche, G.; Grizzle, J. Experimental Validation of a Framework for the Design of Controllers that Induce Stable Walking in Planar Bipeds. I. J. Robot. Res.
**2004**, 23, 559–582. [Google Scholar] [CrossRef] - Sreenath, K.; Park, H.W.; Poulakakis, I.; Grizzle, J. A Compliant Hybrid Zero Dynamics Controller for Stable, Efficient and Fast Bipedal Walking on MABEL. I. J. Robot. Res.
**2011**, 30, 1170–1193. [Google Scholar] [CrossRef] - Kenneally, G.; De, A.; Koditschek, D.E. Design Principles for a Family of Direct-Drive Legged Robots. IEEE Robot. Autom. Lett.
**2016**, 1, 900–907. [Google Scholar] [CrossRef] - Coumans, E.; Bai, Y. Pybullet, a Python Module for Physics Simulation for Games, Robotics and Machine Learning. 2020. Available online: https://docs.google.com/document/d/10sXEhzFRSnvFcl3XxNGhnD4N2SedqwdAvK3dsihxVUA (accessed on 1 November 2023).
- Unitree. A1 Quadruped Robot. 2023. Available online: https://m.unitree.com/a1/ (accessed on 1 November 2023).
- Da, X.; Xie, Z.; Hoeller, D.; Boots, B.; Anandkumar, A.; Zhu, Y.; Babich, B.; Garg, A. Learning a Contact-Adaptive Controller for Robust, Efficient Legged Locomotion. arXiv
**2020**, arXiv:2009.10019. [Google Scholar] - Ray, A.; Achiam, J.; Amodei, D. Benchmarking safe exploration in deep reinforcement learning. arXiv
**2019**, arXiv:1910.01708. [Google Scholar] - Castillo, G.A.; Weng, B.; Zhang, W.; Hereid, A. Robust Feedback Motion Policy Design Using Reinforcement Learning on a 3D Digit Bipedal Robot. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 5136–5143. [Google Scholar] [CrossRef]
- Ambrose, E.; Ma, W.L.; Hubicki, C.; Ames, A.D. Toward benchmarking locomotion economy across design configurations on the modular robot: AMBER-3M. In Proceedings of the 2017 IEEE Conference on Control Technology and Applications (CCTA), Kohala Coast, HI, USA, 27–30 August 2017; pp. 1270–1276. [Google Scholar] [CrossRef]
- Hutter, M.; Gehring, C.; Jud, D.; Lauber, A.; Bellicoso, C.D.; Tsounis, V.; Hwangbo, J.; Bodie, K.; Fankhauser, P.; Bloesch, M.; et al. ANYmal—A highly mobile and dynamic quadrupedal robot. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 9–14 October 2016; pp. 38–44. [Google Scholar] [CrossRef]
- Tayal, M.; Kolathaya, S.N.Y. Safe Legged Locomotion Using Collision Cone Control Barrier Functions (C3BFs). arXiv
**2023**, arXiv:2309.01898. [Google Scholar] - Ma, W.L.; Zhao, H.H.; Kolathaya, S.; Ames, A.D. Human-inspired walking via unified PD and impedance control. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 5088–5094. [Google Scholar] [CrossRef]
- Reher, J.; Cousineau, E.A.; Hereid, A.; Hubicki, C.M.; Ames, A.D. Realizing dynamic and efficient bipedal locomotion on the humanoid robot DURUS. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 1794–1801. [Google Scholar] [CrossRef]
- Tedrake, R. Underactuated robotics: Learning, planning, and control for efficient and agile machines, Course notes for MIT, 6:832. Work. Draft. Ed.
**2009**, 3, 2. [Google Scholar] - Mellinger, D.; Kumar, V. Minimum snap trajectory generation and control for quadrotors. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 2520–2525. [Google Scholar] [CrossRef]
- Jin, W.; Wang, Z.; Yang, Z.; Mou, S. Neural Certificates for Safe Control Policies. arXiv
**2020**, arXiv:2006.08465. [Google Scholar]

**Figure 2.** Flow chart for supervised learning with counterexamples. Given an initial domain D and a system f, a student network ${\pi}_{\Omega}$ representing the Lyapunov candidate is trained using P samples from D. At each iteration, the neural network is translated into an analytic form and passed to an optimizer that checks the Lyapunov constraints. If they are not satisfied, a counterexample that violates the constraints is returned. The counterexamples are added to the original dataset D to obtain an augmented dataset ${D}^{*}$. If the Lyapunov conditions cannot be satisfied, the approaches in [32,34] terminate training when the maximum number of iterations is reached without a solution, while the algorithms in [30,31] allow the search domain to be reduced and the learning procedure repeated.
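Only the falsifier step of this counterexample loop is sketched below, for a quadratic candidate on a hand-picked stable linear system. This is a toy sampling check, far weaker than the formal verifiers of [30,31,32,34], and every name in it is illustrative:

```python
import random

# Toy system: a stable linear vector field on R^2.
f = lambda x: [-x[0], -x[0] - x[1]]

def v_dot(w, x):
    """Lie derivative of the candidate V(x) = w1*x1^2 + w2*x2^2 along f."""
    fx = f(x)
    return 2 * w[0] * x[0] * fx[0] + 2 * w[1] * x[1] * fx[1]

def find_counterexample(w, n=500, seed=0):
    """Falsifier: sample the domain and return a state where the
    decrease condition V_dot < 0 fails, or None if none is found."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)]
        if v_dot(w, x) >= 0.0:
            return x
    return None

# A badly scaled candidate is falsified (the returned state would be
# appended to the dataset D); a well-scaled one passes the check.
bad = find_counterexample([1.0, 10.0])
good = find_counterexample([1.0, 1.0])
```

In the full loop of Figure 2, the learner would then retrain the candidate on the augmented dataset and the falsifier would be replaced by a sound verifier.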

**Figure 3.** Flow chart related to the SAC algorithm, based on [40]. After initializing all networks (the neural controller ${\pi}_{\theta}$, the Lyapunov function ${L}_{v}$, the action-value networks ${Q}_{{\varphi}_{i}}$ with i = {1,2}, the Lagrange multipliers $\zeta$ and $\lambda$, and the coefficient $\alpha$), the system applies a control signal and receives feedback, including reward and cost. The transitions are stored in the replay buffer R, and a set of transition pairs is randomly selected to construct the CBF and CLF constraints with the system f. These constraints are then used in RL controller training, where the extended Lagrangian method is used to update the parameters. If the safety and stability constraints are not satisfied, an optimal backup QP controller is designed to maintain basic safety.
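The Lagrange multipliers in such constrained RL schemes are typically adjusted by projected dual ascent on the constraint violation; a minimal generic sketch (this is the standard update shape, not the exact rule of [40]):

```python
def dual_ascent_step(lmbda, violation, lr=0.01):
    """Projected dual-ascent update of a Lagrange multiplier:
    lambda <- max(0, lambda + lr * violation),
    where violation > 0 means the safety/stability constraint is not met.
    The multiplier grows while the constraint is violated (penalizing the
    policy more heavily) and decays back toward zero once it is satisfied."""
    return max(0.0, lmbda + lr * violation)
```

In the training loop of Figure 3, one such multiplier would be maintained per constraint and fed back into the Lagrangian used to update the controller parameters.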

**Figure 4.** Flow chart related to the DDPG algorithm, based on [44]. This implementation uses a co-learning framework in which the TNLF is trained together with the controller. The learning relies on the policy network ${\pi}_{{\theta}_{\pi}}$, the Q-function network ${Q}_{{\theta}_{Q}}$, their target networks ${\pi}_{{\theta}_{\pi}^{{}^{\prime}}}$ and ${Q}_{{\theta}_{Q}^{{}^{\prime}}}$, the Lyapunov function network ${V}_{{\theta}_{V}}$ and the Lyapunov Q-function network ${L}_{{Q}_{{\theta}_{Q}}}$. The latter acts as an additional critic for the actions of the actor, which is guided by its target network ${Q}_{{\theta}_{Q}^{{}^{\prime}}}$. The target networks are slowly changing copies that track the main value networks. After initializing the parameters, transitions are sampled with the target network ${\pi}_{{\theta}_{\pi}^{{}^{\prime}}}$ and stored in the replay buffer R, which is commonly used in RL to store the trajectories of experience collected while executing a policy in an environment. Subsequently, the Q-net, the Lyapunov net and the respective target nets are trained on a dataset D extracted from R. At the end of each step, the target networks are updated.
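The "slowly changing" target networks in this scheme are commonly maintained by Polyak averaging; a minimal sketch, treating the parameters as flat lists of floats:

```python
def polyak_update(target, source, tau=0.005):
    """DDPG-style soft target-network update:
    theta_target <- (1 - tau) * theta_target + tau * theta_source."""
    return [(1.0 - tau) * t + tau * s for t, s in zip(target, source)]

# With a small tau, the target parameters drift slowly toward the
# main-network parameters, stabilizing the bootstrapped value targets.
target = [0.0, 0.0]
for _ in range(1000):
    target = polyak_update(target, [1.0, -1.0], tau=0.005)
```

After many updates the target parameters approach the main-network values, which is exactly the "slowly following" behaviour described in the caption.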

**Figure 5.** Flow chart for training with imitation learning, based on [35,37]. At each step, pairs of state values and control actions from the teacher controller are passed to the imitation training block as a reference. In this implementation, the teacher controller is an optimal controller that meets the safety conditions, so no further optimization block is required.

**Figure 7.** Application examples for four-legged and two-legged architectures. (**a**) LF on an 8-DoF quadruped robot controlled with a DDPG algorithm [44] to navigate towards a target location in an environment filled with obstacles. (**b**) CBF with episodic learning on the AMBER-3M biped controlled to walk on stepping stones; the panel reports the time evolution of the barrier function at episode 0 [45], depicting the distance between the robot feet and the stone centres. (**c**) LF on the A1 quadruped with the SAC algorithm for velocity tracking; the panel compares the desired forward velocity with the measured robot speed [12]. (**d**) CBLF on the DURUS 23-DoF biped with QP optimization for walking control on steps [50].

**Table 1.** Analysis of the recent literature: referred papers are check-marked (✓) in terms of the considered model, learning methodology and adopted certificate. The * indicates works that present applications on legged robots.

Model Type | Methodology | Certificate | ||||||||
---|---|---|---|---|---|---|---|---|---|---|

Generic | Affine | Hybrid | Markovian | Model-Free | RL | Supervised | L&NP | LF/CLF | BF/CBF | |

[29] | ✓ | ✓ | ✓ | |||||||

[30] | ✓ | ✓ | ✓ | |||||||

[31] | ✓ | ✓ | ✓ | ✓ | ||||||

[32] | ✓ | ✓ | ✓ | |||||||

[33] | ✓ | ✓ | ✓ | |||||||

[34] | ✓ | ✓ | MILP | ✓ | ||||||

[35] | ✓ | ✓ | ✓ | |||||||

[36] | ✓ | ✓ | ✓ | |||||||

[27] | ✓ | ✓ | QP | ✓ | ✓ | |||||

[37] | ✓ | ✓ | SDP | ✓ | ||||||

[38] | ✓ | ✓ | MIQP | ✓ | ||||||

[13] | ✓ | ✓ | MILP | ✓ | ||||||

[39] | ✓ | ✓ | ✓ | |||||||

[40] | ✓ | ✓ | ✓ | ✓ | ||||||

[41] | ✓ | ✓ | ✓ | |||||||

[42] * | ✓ | S | ✓ | |||||||

[12] * | ✓ | ✓ | ✓ | |||||||

[43] * | ✓ | QP | ✓ | |||||||

[44] * | ✓ | ✓ | ✓ | |||||||

[45] * | ✓ | QP | ✓ | |||||||

[46] * | ✓ | QP | ✓ | |||||||

[47] * | ✓ | QP | ✓ | ✓ | ||||||

[48] * | ✓ | QP | ✓ | ✓ | ||||||

[49] * | ✓ | QP | ✓ | ✓ | ||||||

[50] * | ✓ | ✓ | QP | ✓ | ✓ | |||||

[51] * | ✓ | ✓ | ✓ | |||||||

[52] * | ✓ | ✓ | ✓ | |||||||

[53] * | ✓ | ✓ | ✓ |

**Table 2.** Analysis of the CLF and CBF applications to walking robots: referred papers are check-marked (✓) in terms of certificate type. The task, robot configuration and implementation type (simulation or real experiment) are also outlined.

LF/CLF | BF/CBF | TASK | ROBOT | Sim | Real | |
---|---|---|---|---|---|---|

[42] | ✓ | Stable standing | Minitaur quadruped | ✓ | ||

[12] | ✓ | Velocity tracking | A1 quadruped | ✓ | ||

✓ | Walking with unknown load | A1 quadruped | ✓ | |||

✓ | Locomotion control | Rabbit biped | ✓ | |||

[43] | ✓ | Locomotion control | Rabbit biped | ✓ | ||

✓ | Locomotion control | Marvel biped | ✓ | |||

[44] | ✓ | Navigation control | 8-DoF quadruped | ✓ | ||

[45] | ✓ | Walking on 2D stepping stones | AMBER-3M biped | ✓ | ✓ | |

[46] | ✓ | Walking on 3D stepping stones | ANYmal quadruped | ✓ | ✓ | |

[47] | ✓ | Navigation control | Digit biped | ✓ | ||

[48] | ✓ | ✓ | Locomotion control | AMBER2 7-DoF biped | ✓ | |

[49] | ✓ | ✓ | Navigation control | 21-DoF biped | ✓ | |

[50] | ✓ | ✓ | Locomotion control | DURUS 23-DoF biped | ✓ | |

[51] | ✓ | ✓ | Walking on 2D stepping stones | Rabbit biped | ✓ | |

[52] | ✓ | Locomotion control | Compass-gait biped | ✓ | ||

[53] | ✓ | Locomotion control | AMBER-3M | ✓ | ✓ | |

[65] | ✓ | Navigation Control | Laikago | ✓ | ||

[36] | ✓ | Locomotion control | Compass-gait biped | ✓ |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Arena, P.; Li Noce, A.; Patanè, L.
Stability and Safety Learning Methods for Legged Robots. *Robotics* **2024**, *13*, 17.
https://doi.org/10.3390/robotics13010017
