Robust Visuomotor Control for Humanoid Loco-Manipulation Using Hybrid Reinforcement Learning
Abstract
1. Introduction
- A successful implementation of deep visuomotor RL for loco-manipulation tasks on a physical humanoid robot, which dynamically adjusts whole-body motions in response to environmental variations such as object displacements.
- A hybrid RL-WBC architecture that bridges the sim-to-real gap by ensuring physical feasibility, achieving high success rates in different loco-manipulation tasks.
- A novel sampling priority metric and a mid-way initialization technique that accelerate policy convergence during training.
2. Related Works
2.1. Model-Based Methods
2.2. Learning-Based Methods
2.3. Hybrid Methods
3. Visuomotor Whole Body Control
3.1. Whole Body Control with Manipulation Load Stabilizing
3.2. Integration with RL
3.3. Visuomotor Control with Depth Image
4. Training Efficiency Enhancement
4.1. Prioritized Experience Sampling
4.2. Mid-Way Initialization
Algorithm 1 Visuomotor Control for Humanoid Loco-Manipulation Tasks
Require: batch size M, trajectory length N, number of actors K, replay size R, exploration constant ε, initial learning rates α0 and β0
Learner
1: Initialize visual encoder weights θv, which are shared by the actor and critic networks
2: Initialize the remaining network weights at random
3: Initialize target weights as copies of the network weights
4: Launch K actors and replicate network weights to each actor
5: for t = 1, …, T do
6:  Sample M transitions of length N from replay with priority p
7:  Construct the target distributions
8:  Compute the actor and critic updates
9:  Update network parameters
10:  If t mod t_target = 0, update the target networks
11:  If t mod t_actors = 0, replicate network weights to the actors
12: end for
13: return policy parameters
Actor
1: repeat
2:  Wait until the current movement step finishes
3:  Sample action a from the current policy with exploration noise scaled by ε
4:  Execute action a; observe reward r and next state s′, which includes a one-hot vector representing the current FSM state
5:  Calculate priority p using Equation (6)
6:  Store (s, a, r, s′, p) in replay
7: until Learner finishes
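The priority-weighted sampling used by the Learner (line 6) and the priority assignment by the Actor (line 5) can be sketched as a proportional prioritized replay buffer. The class below is a hypothetical illustration, not the paper's implementation: the priority of Equation (6) is not reproduced here, so stored priorities are simply taken as given, and importance-sampling weights follow the standard prioritized-replay correction.

```python
import random


class PrioritizedReplay:
    """Proportional prioritized experience replay (illustrative sketch).

    Transitions are sampled with probability proportional to
    priority ** alpha; importance-sampling weights correct the
    resulting bias, normalized by the largest weight in the batch.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.data = []        # stored transitions (s, a, r, s_next)
        self.priorities = []  # one priority per stored transition

    def store(self, transition, priority):
        # Evict the oldest transition once the buffer is full.
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size, beta=0.4):
        # Sampling probability proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.data)),
                              weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by the max weight.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.data[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, new_priorities):
        # Called by the learner after computing fresh TD errors.
        for i, p in zip(idxs, new_priorities):
            self.priorities[i] = p
```

In a distributed setup such as the one above, the K actors call `store` with the priority they compute locally, while the learner calls `sample` and then `update_priorities` after each gradient step.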
5. Ablation Study
5.1. Load-Carrying Task
5.2. Door Opening Task
6. Experiment Results
6.1. Robot Platform
6.2. Indoor Experiments
6.3. Locomotion Stability
6.4. Manipulation Adjustment
7. Discussion
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Architecture | Trans./Rot. Error | Param. Count | Input
---|---|---|---
Our method | 1.8 cm / 4.5° | ∼1.6 M | Depth
ResNet-18 | 2.6 cm / 6.5° | ∼12 M | RGB
PoseCNN | 2.1 cm / 5.3° | ∼35 M | RGB
GDR-Net | 1.2 cm / 3.1° | ∼50 M | RGB
FFB6D | 0.9 cm / 2.5° | ∼65 M | RGB-D
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, C.; Huang, Q.; Chen, X.; Zhang, Z.; Shi, J. Robust Visuomotor Control for Humanoid Loco-Manipulation Using Hybrid Reinforcement Learning. Biomimetics 2025, 10, 469. https://doi.org/10.3390/biomimetics10070469