Motion Planning-Augmented Hierarchical Reinforcement Learning for Long-Horizon Mobile Manipulation
Abstract
1. Introduction
2. Related Work
2.1. Hierarchical Reinforcement Learning for Long-Horizon Mobile Manipulation
2.2. Integrating Motion Planning with Reinforcement Learning
3. Preliminaries
3.1. Markov Decision Process
3.2. Options Formulation and Semi-Markov Decision Process
3.3. Skill Chaining
3.4. Motion Planning Priors
4. MP-Augmented HRL for Long-Horizon Tasks
4.1. HRL via SMDP-Based Skill Chaining
- TidyHouse: Five target objects are relocated from their initial positions to designated open receptacles such as tables and counters.
- SetTable: One bowl is retrieved from a closed drawer and one apple is retrieved from a closed fridge, both of which are subsequently placed on the dining table.
- Pick: grasps target object x given its ground-truth pose.
- Place: places object x at its goal position, optionally within a target articulation.
- Open: opens articulation a (fridge or drawer) at its handle position.
- Close: closes articulation a at its handle position.
- Navigate: repositions the mobile base to a region-goal defined around the subsequent manipulation target, rather than a single point coordinate.
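The subtask definitions above can be read as a chain in which every manipulation skill is preceded by a Navigate step whose goal is a region around the next target. The sketch below illustrates that reading only. The `Subtask` fields, the ordering of the SetTable steps, and the annular shape and radii of the region are all assumptions made for illustration and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Subtask:
    """One entry in a skill chain (hypothetical field names)."""
    skill: str                    # "Navigate", "Pick", "Place", "Open" or "Close"
    target: str                   # object, receptacle or articulation to approach
    inside: Optional[str] = None  # articulation containing the target, if any


def chain_with_navigation(manipulation: List[Subtask]) -> List[Subtask]:
    """Insert a Navigate step before every manipulation subtask."""
    chain: List[Subtask] = []
    for step in manipulation:
        chain.append(Subtask("Navigate", step.target))
        chain.append(step)
    return chain


def in_region_goal(base_xy: Sequence[float], target_xy: Sequence[float],
                   r_min: float, r_max: float) -> bool:
    """Region goal: any base position inside an annulus around the target.

    The annulus and its radii are assumed here; the source only states that
    the goal is a region rather than a single point coordinate.
    """
    d = math.dist(base_xy, target_xy)
    return r_min <= d <= r_max


# Bowl half of SetTable, in an assumed order: open drawer, pick, place, close.
set_table_bowl = chain_with_navigation([
    Subtask("Open", "drawer"),
    Subtask("Pick", "bowl", inside="drawer"),
    Subtask("Place", "dining_table"),
    Subtask("Close", "drawer"),
])

print([s.skill for s in set_table_bowl])
```

Under these assumptions, a Navigate step counts as finished as soon as `in_region_goal` holds for the next target, so the following manipulation skill can start from any base position in the region instead of one fixed point.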
4.2. Region-Goal Generation for Seamless Connectivity
4.3. MP-Augmented Subtask Learning
Algorithm 1. MP-Augmented Soft Actor–Critic (SAC) algorithm.
4.4. Experimental Validation via Challenging Configurations
5. Results and Discussion
5.1. Experimental Setup and Ablation Conditions
- Baseline (MSHAB subtasks + P2P NAV): standard RL-trained subtask policies from the MS-HAB benchmark combined with point-to-point navigation. The manipulation subtasks are trained with the task-specific shaping terms defined in Appendix A but without the MP-guided tracking reward.
- Ours (MPA subtasks + RG NAV): MP-Augmented subtask policies trained with the MP-guided tracking reward defined in Equation (4), combined with region-goal navigation.
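On the manipulation side, the two conditions differ only in whether a tracking term is added to the task reward. The sketch below shows one way such a term could look. It is not Equation (4): the functional form (`1 - tanh` of the joint-space distance to the nearest planned waypoint), the `scale` parameter, and the `weight` parameter are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def mp_tracking_reward(q: Sequence[float],
                       mp_path: Sequence[Sequence[float]],
                       scale: float = 1.0) -> float:
    """Assumed tracking term: 1 on the planned path, decaying towards 0 off it.

    q        current joint configuration
    mp_path  joint-space waypoints produced by a motion planner
    """
    nearest = min(math.dist(q, waypoint) for waypoint in mp_path)
    return 1.0 - math.tanh(scale * nearest)


def total_reward(task_reward: float, q: Sequence[float],
                 mp_path: Sequence[Sequence[float]], weight: float) -> float:
    """Task shaping plus a weighted tracking term (weight is hypothetical)."""
    return task_reward + weight * mp_tracking_reward(q, mp_path)


path = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
on_path = total_reward(0.2, [0.5, 0.5], path, weight=0.5)
off_path = total_reward(0.2, [0.5, -0.5], path, weight=0.5)
baseline = total_reward(0.2, [0.5, -0.5], path, weight=0.0)
print(on_path, off_path, baseline)
```

In this sketch, setting `weight=0.0` recovers the baseline condition, where only the task-specific shaping terms remain.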
5.2. Subtask-Level Comparison
5.3. Region-Goal Navigation
5.4. Long-Horizon Sequential Performance
5.5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| MP | Motion Planning |
| RL | Reinforcement Learning |
| DRL | Deep Reinforcement Learning |
| MSHAB | ManiSkill-HAB |
| HAB | Home Assistant Benchmark |
| TCP | Tool Center Point |
| EE | End Effector |
| Nav | Navigate |
| SAC | Soft Actor–Critic |
| PPO | Proximal Policy Optimization |
| SMDP | Semi-Markov Decision Process |
| MDP | Markov Decision Process |
| HRL | Hierarchical Reinforcement Learning |
| LLM | Large Language Model |
| VLM | Vision-Language Model |
| IK | Inverse Kinematics |
| P2P | Point-to-Point |
Appendix A. Detailed Subtask Reward Functions
Appendix A.1. Pick Subtask
Appendix A.2. Place Subtask
Appendix A.3. Open Subtask
Appendix A.4. Close Subtask
Appendix A.5. Navigate Subtask
Appendix B. Detailed Motion Planning
Appendix B.1. Inverse Kinematics Target Selection
Appendix B.2. Hierarchical Motion Planning
References
1. Shukla, A.; Tao, S.; Su, H. ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks. In Proceedings of the International Conference on Learning Representations (ICLR); OpenReview.net: Singapore, 2025.
2. Xia, F.; Li, C.; Martín-Martín, R.; Litany, O.; Toshev, A.; Savarese, S. Relmogen: Integrating motion generation in reinforcement learning for mobile manipulation. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA); IEEE: New York, NY, USA, 2021; pp. 4583–4590.
3. Sun, J.; Curtis, A.; You, Y.; Xu, Y.; Koehle, M.; Chen, Q.; Huang, S.; Guibas, L.; Chitta, S.; Schwager, M.; et al. ARCH: Hierarchical Hybrid Learning for Long-Horizon Contact-Rich Robotic Assembly. In Proceedings of the 9th Conference on Robot Learning (CoRL), Seoul, Republic of Korea, 27–30 September 2025; Machine Learning Research; PMLR: New York, NY, USA, 2025; Volume 305, pp. 2628–2642.
4. Yamada, J.; Lee, Y.; Salhotra, G.; Pertsch, K.; Pflueger, M.; Sukhatme, G.; Lim, J.; Englert, P. Motion planner augmented reinforcement learning for robot manipulation in obstructed environments. In Proceedings of the Conference on Robot Learning; PMLR: New York, NY, USA, 2021; pp. 589–603.
5. Faust, A.; Ramirez, O.; Fiser, M.; Oslund, K.; Francis, A.; Davidson, J.; Tapia, L. PRM-RL: Long-range robotic navigation tasks by combining reinforcement learning and sampling-based planning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA); IEEE: New York, NY, USA, 2018; pp. 5113–5120.
6. Szot, A.; Clegg, A.; Undersander, E.; Wijmans, E.; Zhao, Y.; Turner, J.; Maestre, N.; Mukadam, M.; Chaplot, D.S.; Maksymets, O.; et al. Habitat 2.0: Training home assistants to rearrange their habitat. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS); Curran Associates Inc.: Red Hook, NY, USA, 2021; Volume 34, pp. 251–266.
7. Kindle, J.; Furrer, F.; Novkovic, T.; Chung, J.J.; Siegwart, R.; Nieto, J. Whole-Body Control of a Mobile Manipulator using End-to-End Reinforcement Learning. arXiv 2020, arXiv:2003.02637.
8. Sutton, R.S.; Precup, D.; Singh, S. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artif. Intell. 1999, 112, 181–211.
9. Konidaris, G.; Barto, A. Skill discovery in continuous reinforcement learning domains using skill chaining. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2009; Volume 22.
10. Bhaskar, A.; Mahammad, Z.; Jadhav, S.R.; Tokekar, P. PLANRL: A Motion Planning and Imitation Learning Framework to Bootstrap Reinforcement Learning. arXiv 2024, arXiv:2408.04054.
11. Dalal, M.; Chiruvolu, T.; Chaplot, D.S.; Salakhutdinov, R. Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks. In Proceedings of the International Conference on Learning Representations (ICLR); OpenReview.net: Vienna, Austria, 2024.
12. Gao, J.; Ye, W.; Guo, J.; Li, Z. Deep Reinforcement Learning for Indoor Mobile Robot Path Planning. Sensors 2020, 20, 5493.
13. Ota, K.; Jha, D.; Onishi, T.; Kanezaki, A.; Yoshiyasu, Y.; Sasaki, Y.; Mariyama, T.; Nikovski, D. Deep Reactive Planning in Dynamic Environments. In Proceedings of the 2020 Conference on Robot Learning. PMLR; Machine Learning Research; Mitsubishi Electric Research Laboratories, Inc.: Cambridge, MA, USA, 2021; Volume 155, pp. 1943–1957.
14. Kolomeytsev, Y.; Golembiovsky, D. Hybrid Motion Planning with Deep Reinforcement Learning for Mobile Robot Navigation. arXiv 2025, arXiv:2512.24651.
15. Freitag, K.; Ceder, K.; Laezza, R.; Åkesson, K.; Chehreghani, M.H. Curriculum Reinforcement Learning for Complex Reward Functions. arXiv 2024, arXiv:2410.16790.
16. Song, S.; Bihl, T.; Liu, J. Coulomb force-guided deep reinforcement learning for effective and explainable robotic motion planning. Front. Robot. AI 2026, 12, 1697155.
17. Wang, D.; Zhang, P.; Ding, P.; Wang, J.; Zhang, J. A trend-aware reinforcement learning approach for adaptive motion planning of robotic manipulators in dynamic environments. Eng. Appl. Artif. Intell. 2026, 171, 114284.
18. Johannink, T.; Bahl, S.; Nair, A.; Luo, J.; Kumar, A.; Loskyll, M.; Ojea, J.A.; Solowjow, E.; Levine, S. Residual Reinforcement Learning for Robot Control. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA); IEEE: New York, NY, USA, 2019; pp. 6023–6029.
19. Jauhri, S.; Prasad, V.; Chalvatzaki, G. Whole-Body Mobile Manipulation using Offline Reinforcement Learning on Sub-optimal Controllers. arXiv 2026, arXiv:2604.12509.
20. Gu, J.; Chaplot, D.S.; Su, H.; Malik, J. Multi-skill Mobile Manipulation for Object Rearrangement. In Proceedings of the International Conference on Learning Representations (ICLR); OpenReview.net: Kigali, Rwanda, 2023.
21. Puterman, M.L. Markov Decision Processes: Discrete Stochastic Dynamic Programming; John Wiley & Sons: New York, NY, USA, 1994.
22. LaValle, S.M. Planning Algorithms; Cambridge University Press: Cambridge, UK, 2006.
23. LaValle, S.M. Rapidly-Exploring Random Trees: A New Tool for Path Planning; Technical Report TR 98-11; Department of Computer Science, Iowa State University: Ames, IA, USA, 1998.
24. Karaman, S.; Frazzoli, E. Sampling-Based Algorithms for Optimal Motion Planning. Int. J. Robot. Res. 2011, 30, 846–894.
25. Kavraki, L.E.; Švestka, P.; Latombe, J.C.; Overmars, M.H. Probabilistic Roadmaps for Path Planning in High-Dimensional Configuration Spaces. IEEE Trans. Robot. Autom. 1996, 12, 566–580.
26. Huang, X.; Batra, D.; Rai, A.; Szot, A. Skill Transformer: A Monolithic Policy for Mobile Manipulation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: Piscataway, NJ, USA, 2023; pp. 10852–10862.
27. Sundaralingam, B.; Hari, S.K.S.; Fishman, A.; Garrett, C.; Van Wyk, K.; Blukis, V.; Millane, A.; Oleynikova, H.; Handa, A.; Ramos, F.; et al. CuRobo: Parallelized collision-free robot motion generation. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA); IEEE: New York, NY, USA, 2023; pp. 8112–8119.
28. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
29. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the International Conference on Machine Learning (ICML); PMLR: Cambridge, MA, USA, 2018; pp. 1861–1870.
30. Lin, T.; Sachdev, K.; Fan, L.; Malik, J.; Zhu, Y. Sim-to-real reinforcement learning for vision-based dexterous manipulation on humanoids. arXiv 2025, arXiv:2502.20396.
31. Li, Y.; Zhang, W.; Zhang, Z.; Shi, X.; Li, Z.; Zhang, M.; Chi, W. An adaptive compensation strategy for sensors based on the degree of degradation. Biomim. Intell. Robot. 2025, 5, 100235.
32. Chen, G.; Meng, X.; Qiao, J.; Zhang, Z.; Wu, W.; Xu, Y.; Hu, H. Design and depth control of a beaver-inspired underwater robot based on center-of-mass adjustment. Ocean Eng. 2026, 352, 124616.
33. Zhang, H.; Sheng, X.; Xiong, Z.; Zhu, X. A Novel Semi-Coupled Hierarchical Motion Planning Framework for Cooperative Transportation of Multiple Mobile Manipulators. Robotica 2025, 43, 4302–4324.
34. Zang, X.; Zhang, X.; Zhao, J. A Hierarchical Motion Planning Method for Mobile Manipulator. Sensors 2023, 23, 6952.
35. Şucan, I.A.; Moll, M.; Kavraki, L.E. The Open Motion Planning Library. IEEE Robot. Autom. Mag. 2012, 19, 72–82.








| Method | Simulator | Task | MP Integration | Reference Space | SC | RG | HRL | Grasp |
|---|---|---|---|---|---|---|---|---|
| PRM-RL [5] | Custom 2D | 2D Nav. | Sub-goal | Cartesian | × | × | × | — |
| Kindle et al. [7] | Gazebo | WB Reaching | Reward | Cartesian | × | × | × | — |
| MoPA-RL [4] | MuJoCo | Manipulation | Action-space | Joint | × | × | × | — |
| Ota et al. [13] | PyBullet | Reaching | Reward | Joint | × | × | × | — |
| ReLMoGen [2] | iGibson | Interactive Nav | — | — | √ | × | √ | Magical |
| M3 [20] | Habitat 2.0 | HAB | — | — | √ | √ | √ | Magical |
| PSL [11] | Robosuite | Manipulation | Action-space | Cartesian | √ | × | √ | Magical |
| PLANRL [10] | Robosuite | Manipulation | Mode-switch | Mixed | × | × | × | — |
| ARCH [3] | IsaacGym | Manipulation | Hybrid | Cartesian | √ | × | √ | — |
| MS-HAB [1] | ManiSkill3 | HAB | — | — | √ | × | √ | Realistic |
| Ours | ManiSkill3 | HAB | Reward | Joint | √ | √ | √ | Realistic |
| Task | Subtask | Ours (MP-Aug) SR (%) | Ours (MP-Aug) Conv. (M) | Baseline SR (%) | Baseline Conv. (M) |
|---|---|---|---|---|---|
| TidyHouse | Pick | 78.0 | ∼8 | 75.0 | ∼10 |
| TidyHouse | Place | 60.0 | ∼4 | 50.0 | ∼4 |
| SetTable | Open | 89.0 | ∼5 | 87.0 | ∼15 |
| SetTable | Pick | 91.0 | ∼8 | 83.0 | ∼10 |
| SetTable | Place | 68.0 | ∼3.5 | 65.0 | ∼4 |
| SetTable | Close | 89.0 | ∼7 | 88.0 | ∼10 |
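The per-subtask differences in the table above can be tabulated with a few lines of arithmetic. The sketch below copies the reported values and computes, for each subtask, the success-rate gain in percentage points and the ratio of baseline to MP-augmented convergence. The convergence values are approximate in the source (marked ∼), so the ratios are rough, and reading "Conv. (M)" as a count in millions is an assumption. The helper name `summarize` is hypothetical.

```python
# (task, subtask): (ours SR %, ours Conv. M, baseline SR %, baseline Conv. M),
# copied from the subtask comparison table; Conv. values are approximate.
results = {
    ("TidyHouse", "Pick"): (78.0, 8.0, 75.0, 10.0),
    ("TidyHouse", "Place"): (60.0, 4.0, 50.0, 4.0),
    ("SetTable", "Open"): (89.0, 5.0, 87.0, 15.0),
    ("SetTable", "Pick"): (91.0, 8.0, 83.0, 10.0),
    ("SetTable", "Place"): (68.0, 3.5, 65.0, 4.0),
    ("SetTable", "Close"): (89.0, 7.0, 88.0, 10.0),
}


def summarize(rows):
    """Return {key: (SR gain in points, baseline/ours convergence ratio)}."""
    summary = {}
    for key, (ours_sr, ours_conv, base_sr, base_conv) in rows.items():
        summary[key] = (ours_sr - base_sr, base_conv / ours_conv)
    return summary


summary = summarize(results)
mean_gain = sum(gain for gain, _ in summary.values()) / len(summary)

for (task, subtask), (gain, ratio) in summary.items():
    print(f"{task:9s} {subtask:5s} SR +{gain:.1f} pts, conv. ratio {ratio:.2f}x")
print(f"mean SR gain: {mean_gain:.1f} pts")
```

Read this way, the largest success-rate gain is on TidyHouse Place (10 points) and the largest convergence ratio is on SetTable Open (about 3x), while TidyHouse Place shows no convergence difference.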
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Kim, H.; Choi, M.-T. Motion Planning-Augmented Hierarchical Reinforcement Learning for Long-Horizon Mobile Manipulation. Sensors 2026, 26, 3845. https://doi.org/10.3390/s26123845

