# FastMimic: Model-Based Motion Imitation for Agile, Diverse and Generalizable Quadrupedal Locomotion

^{1}

^{2}

^{3}

^{4}

^{*}

## Abstract

**:**

## 1. Introduction

## 2. Related Work

#### 2.1. Motion Imitation

#### 2.2. Model-Based Legged Locomotion Control

#### 2.3. Dynamic Movement Primitives (DMPs)

## 3. Model-Based Control for Motion Imitation

#### 3.1. Motion Retargeting

#### 3.2. Model-Based Controller

#### 3.2.1. Gait Planner

#### 3.2.2. Stance Controller

#### 3.2.3. Swing Controller

#### 3.3. Trajectory Optimization with DMPs

## 4. Experiment

#### 4.1. Reward Function

- Joint pose reward ${r}_{t}^{p}=exp[-5{\sum}_{j}\parallel {\overline{\mathbf{q}}}_{t}^{j}-{\mathbf{q}}_{t}^{j}{\parallel}^{2}]$,
- Joint velocity reward ${r}_{t}^{v}=exp[-0.1{\sum}_{j}\parallel {\dot{\overline{\mathbf{q}}}}_{t}^{j}-{\dot{\mathbf{q}}}_{t}^{j}{\parallel}^{2}]$,
- End-effector reward ${r}_{t}^{e}=exp[-40{\sum}_{i}\parallel {\overline{\mathbf{x}}}_{t,i}^{f}-{\mathbf{x}}_{t,i}^{f}{\parallel}^{2}]$,
- CoM position reward ${r}_{t}^{rp}=exp[-20\parallel {\overline{\mathbf{x}}}_{pos,t}^{robot}-{\mathbf{x}}_{pos,t}^{robot}{\parallel}^{2}-10\parallel {\overline{\mathbf{x}}}_{ori,t}^{robot}-{\mathbf{x}}_{ori,t}^{robot}{\parallel}^{2}]$,
- CoM velocity reward ${r}_{t}^{rv}=exp[-2\parallel {\dot{\overline{\mathbf{x}}}}_{pos,t}^{robot}-{\dot{\mathbf{x}}}_{pos,t}^{robot}{\parallel}^{2}-\parallel {\dot{\overline{\mathbf{x}}}}_{ori,t}^{robot}-{\dot{\mathbf{x}}}_{ori,t}^{robot}{\parallel}^{2}]$.

#### 4.2. Comparison Experiments

**DeepMimic (RL):**We compare our approach against DeepMimic [1], a learning-based approach that learns an RL policy per reference motion. We use the reward function described in Section 4.1 and train each policy for 100 million simulation steps using Proximal Policy Optimization [48]. We use the open-source implementation to train policies for trot, pace, turn, and side-step motions. For each target motion, we train two RL policies with different random seeds and report average performance in Figure 4. We apply policies learned in simulation to hardware with no fine-tuning to make the comparison fair to our approach.**Model-based Controller with Raibert Swing (Raibert):**Next, we compare our method against a model-based method from Kang et al. [15] which uses animal reference for CoM motion, but uses linear swing trajectories of fixed time length ${T}_{s}$ that reach a footstep calculated using the Raibert heuristic: ${\mathbf{x}}_{i}^{f,ref}=0.5{T}_{s}{\dot{\mathbf{x}}}^{\mathrm{robot}}-K({\dot{\overline{\mathbf{x}}}}^{\mathrm{robot}}-{\dot{\mathbf{x}}}^{\mathrm{robot}})$. Compared to Equation (3), we note that this method does not take the animal motion into account, while our approach augments the animal motion with a stabilizing feedback mechanism. Because each gait has unique swing foot motion style, using pre-defined swing trajectories could cause difficulties when reproducing natural and diverse motion styles.**Model-based Controller (MBC):**In this baseline, we send the animal motion trajectories to our model-based controller without any trajectory optimization. This experiment highlights the importance of whole-body trajectory optimization in simulation.**Model-based Controller with DMP Optimization (MBC-DMP, ours):**Our approach, which uses offline trajectory optimization to adapt the reference motion sent to the model-based controller.**Long horizon Model-predictive control (LH):**Whole-body control of quadrupedal robots can be improved by using a receding-horizon model predictive control approach that plans over multiple time steps, instead of instantaneous forces, as in Equation (1). We utilize the approach from Di et al. [5] to solve a higher-order QP to plan actions over a horizon of 10 steps, instead of a single step. During stance, this long-horizon convex MPC controller uses the linearized centroidal model to predict future states, and plans a sequence of contact forces that lead to a desired CoM trajectory. For swing control, we use the same setup as ours. This experiment aims to test if online MPC can replace offline trajectory optimization.**Long horizon MPC with DMP Optimization (LH-DMP):**Lastly, we augment the long horizon model-predictive control (LH) with our DMP-based trajectory optimization (LH-DMP). We expect that this method outperforms our method because it reasons over a longer horizon of the reference motion. However, both long-horizon predictive control methods, LH and LH-DMP, cannot easily be deployed to hardware due to their expensive computational costs. Therefore, we only conduct simulation experiments for analysis.

**Figure 4.**Comparison against other baselines. Our method (MBC-DMP) outperforms other learning-based (RL) or model-based (Raibert, RL, MBC, LH) in simulation and on hardware, except for LH-DMP. Note that the methods with long-horizon predictions, LH and LH-DMP, cannot easily be deployed to hardware due to their computational costs.

#### 4.3. Motion Stitching

_{i}independently with DMPs. Then, we optimize all DMPs together with respect to the reward function mentioned in Section 4.1. To warm start the optimization, we initialize the DMPs with the parameters for individual optimized motion. Our method successfully imitates four stitched motions: (1) Pace + Trot, (2) Trot + Turn + Trot, (3) Side Step + Trot + Turn, and (4) Trot + Side Step + Reverse trot. Real-world results can be seen in the Supplementary Video.

## 5. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Conflicts of Interest

## References

- Peng, X.B.; Abbeel, P.; Levine, S.; van de Panne, M. Deepmimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Trans. Graph. (TOG)
**2018**, 37, 143. [Google Scholar] [CrossRef] [Green Version] - Peng, X.B.; Coumans, E.; Zhang, T.; Lee, T.W.; Tan, J.; Levine, S. Learning agile robotic locomotion skills by imitating animals. arXiv
**2020**, arXiv:2004.00784. [Google Scholar] - Park, H.W.; Wensing, P.M.; Kim, S. High-speed bounding with the MIT Cheetah 2: Control design and experiments. Int. J. Robot. Res.
**2017**, 36, 167–192. [Google Scholar] [CrossRef] [Green Version] - Bledt, G.; Powell, M.J.; Katz, B.; Di Carlo, J.; Wensing, P.M.; Kim, S. MIT Cheetah 3: Design and control of a robust, dynamic quadruped robot. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 2245–2252. [Google Scholar]
- Di Carlo, J.; Wensing, P.M.; Katz, B.; Bledt, G.; Kim, S. Dynamic locomotion in the mit cheetah 3 through convex model-predictive control. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1–9. [Google Scholar]
- Park, S.; Ryu, H.; Lee, S.; Lee, S.; Lee, J. Learning Predict-and-simulate Policies from Unorganized Human Motion Data. ACM Trans. Graph. (TOG)
**2019**, 38, 205. [Google Scholar] [CrossRef] [Green Version] - Bergamin, K.; Clavet, S.; Holden, D.; Forbes, J.R. DReCon: Data-driven Responsive Control of Physics-based Characters. ACM Trans. Graph. (TOG)
**2019**, 38, 206. [Google Scholar] [CrossRef] [Green Version] - Won, J.; Gopinath, D.; Hodgins, J. A scalable approach to control diverse behaviors for physically simulated characters. ACM Trans. Graph. (TOG)
**2020**, 39, 33. [Google Scholar] [CrossRef] - Fussell, L.; Bergamin, K.; Holden, D. SuperTrack: Motion tracking for physically simulated characters using supervised learning. ACM Trans. Graph. (TOG)
**2021**, 40, 197. [Google Scholar] [CrossRef] - Holden, D.; Komura, T.; Saito, J. Phase-functioned neural networks for character control. ACM Trans. Graph. (TOG)
**2017**, 36, 42. [Google Scholar] [CrossRef] [Green Version] - Ijspeert, A.J.; Nakanishi, J.; Hoffmann, H.; Pastor, P.; Schaal, S. Dynamical movement primitives: Learning attractor models for motor behaviors. Neural Comput.
**2013**, 25, 328–373. [Google Scholar] [CrossRef] [Green Version] - Hansen, N. The CMA evolution strategy: A comparing review. Towards a New Evolutionary Computation; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006; pp. 75–102. [Google Scholar]
- Raibert, M.H. Legged Robots That Balance; MIT Press: Cambridge, MA, USA, 1986. [Google Scholar]
- Unitree Robotics. Available online: http://www.unitree.cc/ (accessed on 3 March 2023).
- Kang, D.; Zimmermann, S.; Coros, S. Animal Gaits on Quadrupedal Robots Using Motion Matching and Model Based Control. In Proceedings of the IROS, Prague, Czech Republic, 27 September–1 October 2021. [Google Scholar]
- Won, J.; Lee, J. Learning Body Shape Variation in Physics-based Characters. ACM Trans. Graph. (TOG)
**2019**, 38. [Google Scholar] [CrossRef] [Green Version] - Merel, J.; Tassa, Y.; TB, D.; Srinivasan, S.; Lemmon, J.; Wang, Z.; Wayne, G.; Heess, N. Learning human behaviors from motion capture by adversarial imitation. arXiv
**2017**, arXiv:1707.02201. [Google Scholar] - Merel, J.; Hasenclever, L.; Galashov, A.; Ahuja, A.; Pham, V.; Wayne, G.; Teh, Y.W.; Heess, N. Neural Probabilistic Motor Primitives for Humanoid Control. In Proceedings of the ICLR, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Xie, Z.; Berseth, G.; Clary, P.; Hurst, J.; van de Panne, M. Feedback Control for Cassie with Deep Reinforcement Learning. In Proceedings of the IROS, Madrid, Spain, 1–5 October 2018. [Google Scholar]
- Luo, Y.S.; Soeseno, J.H.; Chen, T.P.C.; Chen, W.C. CARL: Controllable Agent with Reinforcement Learning for Quadruped Locomotion. ACM Trans. Graph. (TOG)
**2020**, 39, 38. [Google Scholar] [CrossRef] - Exarchos, I.; Jiang, Y.; Yu, W.; Liu, C.K. Policy Transfer via Kinematic Domain Randomization and Adaptation. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021. [Google Scholar]
- Schwind, W.J. Spring Loaded Inverted Pendulum Running: A Plant Model; University of Michigan: Ann Arbor, MI, USA, 1998. [Google Scholar]
- Green, K.; Godse, Y.; Dao, J.; Hatton, R.L.; Fern, A.; Hurst, J. Learning Spring Mass Locomotion: Guiding Policies with a Reduced-Order Model. IEEE RAL
**2021**, 6, 3926–3932. [Google Scholar] [CrossRef] - Gong, Y.; Grizzle, J. Angular momentum about the contact point for control of bipedal locomotion: Validation in a lip-based controller. arXiv
**2020**, arXiv:2008.10763. [Google Scholar] - Li, T.; Geyer, H.; Atkeson, C.G.; Rai, A. Using deep reinforcement learning to learn high-level policies on the atrias biped. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA 2019), Montreal, QC, Canada, 20–24 May 2019; pp. 263–269. [Google Scholar]
- Kim, D.; Di Carlo, J.; Katz, B.; Bledt, G.; Kim, S. Highly dynamic quadruped locomotion via whole-body impulse control and model predictive control. arXiv
**2019**, arXiv:1909.06586. [Google Scholar] - Xie, Z.; Da, X.; Babich, B.; Garg, A.; van de Panne, M. GLiDE: Generalizable Quadrupedal Locomotion in Diverse Environments with a Centroidal Model. arXiv
**2021**, arXiv:2104.09771. [Google Scholar] - Da, X.; Xie, Z.; Hoeller, D.; Boots, B.; Anandkumar, A.; Zhu, Y.; Babich, B.; Garg, A. Learning a contact-adaptive controller for robust, efficient legged locomotion. arXiv
**2020**, arXiv:2009.10019. [Google Scholar] - Li, T.; Calandra, R.; Pathak, D.; Tian, Y.; Meier, F.; Rai, A. Planning in Learned Latent Action Spaces for Generalizable Legged Locomotion. IEEE Robot. Autom. Lett.
**2021**, 6, 2682–2689. [Google Scholar] [CrossRef] - Kormushev, P.; Calinon, S.; Caldwell, D.G. Robot motor skill coordination with EM-based reinforcement learning. In Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, 18–22 October 2010; pp. 3232–3237. [Google Scholar]
- Mülling, K.; Kober, J.; Kroemer, O.; Peters, J. Learning to select and generalize striking movements in robot table tennis. Int. J. Robot. Res.
**2013**, 32, 263–279. [Google Scholar] [CrossRef] [Green Version] - Ude, A.; Gams, A.; Asfour, T.; Morimoto, J. Task-specific generalization of discrete and periodic dynamic movement primitives. IEEE Trans. Robot.
**2010**, 26, 800–815. [Google Scholar] [CrossRef] [Green Version] - Conkey, A.; Hermans, T. Active learning of probabilistic movement primitives. In Proceedings of the Humanoids, Toronto, ON, Canada, 15–17 October 2019; pp. 1–8. [Google Scholar]
- Kober, J.; Peters, J. Learning motor primitives for robotics. In Proceedings of the ICRA, Paris, France, 7–10 December 2009; pp. 2112–2118. [Google Scholar]
- Pastor, P.; Kalakrishnan, M.; Chitta, S.; Theodorou, E.; Schaal, S. Skill learning and task outcome prediction for manipulation. In Proceedings of the Robotics and Automation (ICRA), Shanghai, China, 9–13 May 2011; pp. 3828–3834. [Google Scholar]
- Rai, A.; Sutanto, G.; Schaal, S.; Meier, F. Learning feedback terms for reactive planning and control. In Proceedings of the Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 2184–2191. [Google Scholar]
- Stulp, F.; Sigaud, O. Path integral policy improvement with covariance matrix adaptation. arXiv
**2012**, arXiv:1206.4621. [Google Scholar] - Bahl, S.; Mukadam, M.; Gupta, A.; Pathak, D. Neural dynamic policies for end-to-end sensorimotor learning. arXiv
**2020**, arXiv:2012.02788. [Google Scholar] - Jalics, L.; Hemami, H.; Zheng, Y.F. Pattern generation using coupled oscillators for robotic and biorobotic adaptive periodic movement. In Proceedings of the Robotics and Automation (ICRA), Albuquerque, NW, USA, 20–25 April 1997. [Google Scholar]
- Kajita, S.; Kanehiro, F.; Kaneko, K.; Fujiwara, K.; Yokoi, K.; Hirukawa, H. A realtime pattern generator for biped walking. In Proceedings of the Robotics and Automation (ICRA), Washington, DC, USA, 11–15 May 2002; Volume 1, pp. 31–37. [Google Scholar]
- Yang, J.F.; Lam, T.; Pang, M.Y.; Lamont, E.; Musselman, K.; Seinen, E. Infant stepping: A window to the behaviour of the human pattern generator for walking. Can. J. Physiol. Pharmacol.
**2004**, 82, 662–674. [Google Scholar] [CrossRef] - Ijspeert, A.J. Central pattern generators for locomotion control in animals and robots: A review. Neural Netw.
**2008**, 21, 642–653. [Google Scholar] [CrossRef] - Rosado, J.; Silva, F.; Santos, V. Adaptation of Robot Locomotion Patterns with Dynamic Movement Primitives. In Proceedings of the 2015 IEEE International Conference on Autonomous Robot Systems and Competitions, Vila Real, Portugal, 8–10 April 2015; pp. 23–28. [Google Scholar] [CrossRef]
- Schaller, N.U.; Herkner, B.; Villa, R.; Aerts, P. The intertarsal joint of the ostrich (Struthio camelus): Anatomical examination and function of passive structures in locomotion. J. Anat.
**2009**, 214, 830–847. [Google Scholar] [CrossRef] [PubMed] - Wu, A.; Geyer, H. The 3-D spring–mass model reveals a time-based deadbeat control for highly robust running and steering in uncertain environments. IEEE TRO
**2013**, 29, 1114–1124. [Google Scholar] [CrossRef] - Zhang, H.; Starke, S.; Komura, T.; Saito, J. Mode-adaptive neural networks for quadruped motion control. ACM Trans. Graph. (TOG)
**2018**, 37, 145. [Google Scholar] [CrossRef] [Green Version] - Coumans, E.; Bai, Y. PyBullet. 2016–2019. Available online: http://pybullet.org (accessed on 3 March 2023).
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv
**2017**, arXiv:1707.06347. [Google Scholar]

**Figure 1.**An overview of our approach. We start with retargeted animal motions and use them as reference motions in a model-based controller. In simulation, we run trajectory optimization to update the reference motion and send the optimized trajectory to hardware. The optimization pipeline is used on four animal motions—turn, trot, pace, and side-step. The same hardware controller tracks these motions, resulting in diverse, agile gaits on hardware with no real-world fine-tuning.

**Figure 2.**An overview of our model-based controller which uses animal motions as desired trajectory input. The controller is composed of three components: a stance controller, a swing controller, and a gait planner.The stance controller plans contact force to command desired joint angles and torques on the stance legs. The swing controller commands desired joint angles on the swing legs to track the desired motion trajectory. Both the stance and swing controllers have an online adjustment module to stabilize the robot. The gait planner is responsible for switching the legs between swing and stance.

**Figure 3.**Finite state machine used by the gait planner to switch leg control from swing (0) to stance (1) and vice versa. When the measured contact state deviates from the desired contact state, depending on the previous state of the leg, the gait planner initiates a switch from stance to swing, or vice versa. This also takes into account early touchdown (TD) and take off (TO). We additionally add a minimum time limit of 100 ms between consecutive switches to avoid instability due to measurement noise.

**Figure 5.**(

**Left**) Tracking in z-axis. DMP optimization increased the amplitude of the trajectory. Tracking error was decreased by 24%. (

**Right**) Tracking in x-axis. With optimization, the constant drift was reduced. Tracking error was decreased by 92%.

**Figure 6.**Snapshots of our approach in action on the A1 robot—Turn, Trot, Pace and Side-stepping motions. A model-based controller with online adaptation achieves all the motions, with no fine-tuning.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Li, T.; Won, J.; Cho, J.; Ha, S.; Rai, A.
FastMimic: Model-Based Motion Imitation for Agile, Diverse and Generalizable Quadrupedal Locomotion. *Robotics* **2023**, *12*, 90.
https://doi.org/10.3390/robotics12030090

**AMA Style**

Li T, Won J, Cho J, Ha S, Rai A.
FastMimic: Model-Based Motion Imitation for Agile, Diverse and Generalizable Quadrupedal Locomotion. *Robotics*. 2023; 12(3):90.
https://doi.org/10.3390/robotics12030090

**Chicago/Turabian Style**

Li, Tianyu, Jungdam Won, Jeongwoo Cho, Sehoon Ha, and Akshara Rai.
2023. "FastMimic: Model-Based Motion Imitation for Agile, Diverse and Generalizable Quadrupedal Locomotion" *Robotics* 12, no. 3: 90.
https://doi.org/10.3390/robotics12030090