Learning to Navigate in Mixed Human–Robot Crowds via an Attention-Driven Deep Reinforcement Learning Framework
Abstract
1. Introduction
- A navigation framework based on single-agent deep reinforcement learning, which allows the ego robot to move according to social norms among humans and other robots that follow a predefined collision avoidance behavior.
- The use of imitation learning for socially aware single-agent navigation in environments shared by other robots and humans.
- A model with full social awareness for the ego robot and partial social awareness for the other robots, reflecting realistic interactions and enabling the extraction of socially relevant knowledge from human–robot interactions.
2. Related Work
3. Methodology
3.1. Assumptions
- This work is based on a single-agent deep reinforcement learning framework, in which one agent learns while the other robots and humans are treated as part of the environment.
- The robot that learns the optimal policy is called the ego robot, while the other robots in the environment are referred to as the other robots.
- The ego robot has a full view of the environment, while the other robots have a partial view of the environment.
- All the robots are modeled as holonomic, i.e., they can move in any direction instantly without rotation constraints.
- Humans and other robots in the environment follow the Optimal Reciprocal Collision Avoidance (ORCA) policy.
- There is no explicit communication of navigation intent between the ego robot, other robots, and humans; they can only observe the states of each other.
- Navigation is modeled as point-to-point scenarios in a two-dimensional (2D) plane (see the kinematics sketch after this list).
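To make the holonomic and point-to-point assumptions concrete, here is a minimal sketch of a 2D holonomic agent whose commanded velocity is applied directly, with the preferred speed and radius taken from the training-parameter table later in this article; the class and method names are illustrative, not from the paper.

```python
import numpy as np

class HolonomicAgent:
    """Minimal 2D holonomic agent: the velocity can point in any direction
    instantly, with no rotation constraints (Section 3.1 assumptions)."""

    def __init__(self, position, goal, v_pref=1.0, radius=0.3):
        self.position = np.asarray(position, dtype=float)
        self.goal = np.asarray(goal, dtype=float)
        self.v_pref = v_pref   # preferred speed (m/s), per the parameter table
        self.radius = radius   # agent radius (m), per the parameter table

    def preferred_velocity(self):
        """Velocity pointing straight at the goal, capped at v_pref."""
        offset = self.goal - self.position
        dist = np.linalg.norm(offset)
        if dist < 1e-6:
            return np.zeros(2)
        return self.v_pref * offset / dist

    def step(self, velocity, dt=0.25):
        """Holonomic update: the commanded velocity is applied directly."""
        self.position = self.position + np.asarray(velocity) * dt

    def reached_goal(self):
        return np.linalg.norm(self.goal - self.position) < self.radius
```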
3.2. Problem Formulation
3.2.1. State Space
3.2.2. Action Space
3.2.3. Reward Function
3.2.4. Optimal Policy and Value Function
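The outline names the reward function but does not reproduce its form. As a hedged illustration only: frameworks in the SARL lineage that this work builds on typically combine a success bonus, a collision penalty, and a discomfort penalty inside the 0.2 m discomfort distance listed in the parameter table; the constants below are conventional for that lineage and are an assumption, not the paper's values.

```python
def reward(dist_to_goal, min_separation, radius=0.3,
           discomfort_dist=0.2, dt=0.25):
    """Hypothetical SARL-style reward; the paper's exact constants are
    not shown in this outline, so these values are assumptions."""
    if dist_to_goal < radius:      # goal reached: success bonus
        return 1.0
    if min_separation < 0.0:       # surfaces overlap: collision penalty
        return -0.25
    if min_separation < discomfort_dist:
        # inside a human's discomfort zone: penalty grows with intrusion
        return -0.1 * dt * (discomfort_dist - min_separation)
    return 0.0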
3.3. Interaction Modeling
3.3.1. Human–Human Interaction
3.3.2. Human–Ego Robot Interaction
3.3.3. Ego Robot–Other Robot Interaction
3.3.4. Human–Other Robot Interaction
3.3.5. Other Robot–Other Robot Interaction
3.4. System Architecture
3.4.1. Interaction Module
- The agent’s own state (whether the agent is a human or another robot).
- The agent’s local map, as defined in Equation (11).
- The ego robot’s own state, which is shared across all agents (a sketch of how these inputs can be assembled follows this list).
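A minimal PyTorch sketch of how these three inputs might be assembled per surrounding agent and embedded by a shared MLP is given below; the hidden sizes (150, 100) are taken from the parameter table, but the exact wiring and all names are assumptions.

```python
import torch
import torch.nn as nn

class InteractionModule(nn.Module):
    """Sketch of the per-agent interaction encoding described above:
    each surrounding agent contributes [ego state, agent state, local map],
    embedded by a shared MLP. Hidden sizes follow the parameter table;
    the correspondence is an assumption."""

    def __init__(self, ego_dim, agent_dim, map_dim, hidden=(150, 100)):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ego_dim + agent_dim + map_dim, hidden[0]), nn.ReLU(),
            nn.Linear(hidden[0], hidden[1]), nn.ReLU(),
        )

    def forward(self, ego_state, agent_states, local_maps):
        # ego_state: (ego_dim,), agent_states: (n, agent_dim),
        # local_maps: (n, map_dim)
        n = agent_states.shape[0]
        ego = ego_state.unsqueeze(0).expand(n, -1)  # share ego state across agents
        joint = torch.cat([ego, agent_states, local_maps], dim=1)
        return self.mlp(joint)  # (n, hidden[-1]) per-agent embeddings
```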
3.4.2. Pooling Module
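Consistent with the attention-driven design named in the title, the sketch below shows softmax attention pooling over the per-agent embeddings produced by the interaction module, using the (100, 100) hidden sizes from the parameter table; whether this matches the paper's exact mechanism is an assumption.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Sketch of softmax attention pooling: a small MLP scores each agent,
    and the fixed-size crowd feature is the attention-weighted sum of the
    per-agent embeddings. Details are assumptions, not the paper's spec."""

    def __init__(self, embed_dim, hidden=(100, 100)):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(embed_dim, hidden[0]), nn.ReLU(),
            nn.Linear(hidden[0], hidden[1]), nn.ReLU(),
            nn.Linear(hidden[1], 1),
        )

    def forward(self, embeddings):
        # embeddings: (n, embed_dim) -> crowd feature: (embed_dim,)
        weights = torch.softmax(self.score(embeddings), dim=0)  # (n, 1)
        return (weights * embeddings).sum(dim=0)
```

This pooling step is what lets the planner accept a variable number of surrounding agents while emitting a fixed-size input.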
3.4.3. Planning Module
4. Results
4.1. Training Setup
4.2. Qualitative Results
4.2.1. Open Space Scenario
- (a) 5 humans and 2 other robots
- (b) 10 humans and 3 other robots
4.2.2. Static Obstacles Scenario
4.3. Quantitative Results and Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Ivanov, S.; Gretzel, U.; Berezina, K.; Sigala, M.; Webster, C. Progress on robotics in hospitality and tourism: A review of the literature. J. Hosp. Tour. Technol. 2019, 10, 489–521.
- Daza, M.; Barrios-Aranibar, D.; Diaz-Amado, J.; Cardinale, Y.; Vilasboas, J. An Approach of Social Navigation Based on Proxemics for Crowded Environments of Humans and Robots. Micromachines 2021, 12, 193.
- Samarakoon, S.M.B.P.; Muthugala, M.A.V.J.; Jayasekara, A.G.B.P. A Review on Human–Robot Proxemics. Electronics 2022, 11, 2490.
- Kruse, T.; Pandey, A.K.; Alami, R.; Kirsch, A. Human-aware robot navigation: A survey. Robot. Auton. Syst. 2013, 61, 1726–1743.
- Guillén-Ruiz, S.; Bandera, J.P.; Hidalgo-Paniagua, A.; Bandera, A. Evolution of socially-aware robot navigation. Electronics 2023, 12, 1570.
- Chen, Y.; Zhao, F.; Lou, Y. Interactive model predictive control for robot navigation in dense crowds. IEEE Trans. Syst. Man Cybern. Syst. 2021, 52, 2289–2301.
- Fiorini, P.; Shiller, Z. Motion planning in dynamic environments using velocity obstacles. Int. J. Robot. Res. 1998, 17, 760–772.
- Van den Berg, J.; Guy, S.J.; Lin, M.; Manocha, D. Reciprocal n-body collision avoidance. In Robotics Research: The 14th International Symposium ISRR; Springer: Berlin/Heidelberg, Germany, 2011; pp. 3–19.
- Helbing, D.; Molnar, P. Social force model for pedestrian dynamics. Phys. Rev. E 1995, 51, 4282.
- Aoude, G.S.; Luders, B.D.; Joseph, J.M.; Roy, N.; How, J.P. Probabilistically safe motion planning to avoid dynamic obstacles with uncertain motion patterns. Auton. Robot. 2013, 35, 51–76.
- Svenstrup, M.; Bak, T.; Andersen, H.J. Trajectory planning for robots in dynamic human environments. In Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, 18–22 October 2010; pp. 4293–4298.
- Fulgenzi, C.; Tay, C.; Spalanzani, A.; Laugier, C. Probabilistic navigation in dynamic environment using rapidly-exploring random trees and Gaussian processes. In Proceedings of the 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, Nice, France, 22–26 September 2008; pp. 1056–1062.
- Chen, Y.F.; Liu, M.; Everett, M.; How, J.P. Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 285–292.
- Chen, Y.F.; Everett, M.; Liu, M.; How, J.P. Socially aware motion planning with deep reinforcement learning. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 1343–1350.
- Everett, M.; Chen, Y.F.; How, J.P. Motion planning among dynamic, decision-making agents with deep reinforcement learning. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 3052–3059.
- Chen, C.; Hu, S.; Nikdel, P.; Mori, G.; Savva, M. Relational graph learning for crowd navigation. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 10007–10013.
- Zhou, Z.; Zhu, P.; Zeng, Z.; Xiao, J.; Lu, H.; Zhou, Z. Robot navigation in a crowd by integrating deep reinforcement learning and online planning. Appl. Intell. 2022, 52, 15600–15616.
- Chen, C.; Liu, Y.; Kreiss, S.; Alahi, A. Crowd-robot interaction: Crowd-aware robot navigation with attention-based deep reinforcement learning. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 6015–6022.
- Yang, Y.; Jiang, J.; Zhang, J.; Huang, J.; Gao, M. ST2: Spatial-temporal state transformer for crowd-aware autonomous navigation. IEEE Robot. Autom. Lett. 2023, 8, 912–919.
- Tai, L.; Zhang, J.; Liu, M.; Burgard, W. Socially compliant navigation through raw depth inputs with generative adversarial imitation learning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 1111–1117.
- Pfeiffer, M.; Schaeuble, M.; Nieto, J.; Siegwart, R.; Cadena, C. From perception to decision: A data-driven approach to end-to-end motion planning for autonomous ground robots. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 1527–1533.
- Chandra, R.; Zinage, V.; Bakolas, E.; Biswas, J.; Stone, P. Decentralized multi-robot social navigation in constrained environments via game-theoretic control barrier functions. arXiv 2023, arXiv:2308.10966.
- Escudie, E.; Matignon, L.; Saraydaryan, J. Attention graph for multi-robot social navigation with deep reinforcement learning. arXiv 2024, arXiv:2401.17914.
- Wang, W.; Mao, L.; Wang, R.; Min, B.C. Multi-robot cooperative socially-aware navigation using multi-agent reinforcement learning. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 12353–12360.
- Albrecht, S.V.; Christianos, F.; Schäfer, L. Multi-Agent Reinforcement Learning: Foundations and Modern Approaches; MIT Press: Cambridge, MA, USA, 2024.
- Liu, L.; Dugas, D.; Cesari, G.; Siegwart, R.; Dubé, R. Robot navigation in crowded environments using deep reinforcement learning. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 5671–5677.
- Chen, Y.; Liu, C.; Shi, B.E.; Liu, M. Robot navigation in crowds by graph convolutional networks with attention learned from human gaze. IEEE Robot. Autom. Lett. 2020, 5, 2754–2761.
- Samsani, S.S.; Muhammad, M.S. Socially compliant robot navigation in crowded environment by human behavior resemblance using deep reinforcement learning. IEEE Robot. Autom. Lett. 2021, 6, 5223–5230.
- Dong, L.; He, Z.; Song, C.; Yuan, X.; Zhang, H. Multi-robot social-aware cooperative planning in pedestrian environments using attention-based actor-critic. Artif. Intell. Rev. 2024, 57, 108.
- Wang, W.; Bera, A.; Min, B.C. Hyper-SAMARL: Hypergraph-based Coordinated Task Allocation and Socially-aware Navigation for Multi-Robot Systems. arXiv 2024, arXiv:2409.11561.
- Song, C.; He, Z.; Dong, L. A local-and-global attention reinforcement learning algorithm for multiagent cooperative navigation. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 7767–7777.
- Zhou, X.; Piao, S.; Chi, W.; Chen, L.; Li, W. HeR-DRL: Heterogeneous Relational Deep Reinforcement Learning for Single-Robot and Multi-Robot Crowd Navigation. IEEE Robot. Autom. Lett. 2025, 10, 4524–4531.
- Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic differentiation in PyTorch. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
- Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. OpenAI Gym. arXiv 2016, arXiv:1606.01540.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
Collision-avoidance behavior of each agent type (row) in the presence of each other agent type (column):
| Agent | In Presence of Human | In Presence of Ego Robot | In Presence of Other Robot |
|---|---|---|---|
| Human | Reciprocal collision avoidance | Reciprocal collision avoidance | Reciprocal collision avoidance |
| Ego Robot | Implements learned DRL policy with social awareness | - | Reciprocal collision avoidance (no social awareness) |
| Other Robot | Reciprocal collision avoidance with larger safety margin | Reciprocal collision avoidance (no social awareness) | Reciprocal collision avoidance |
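Read as a lookup, the matrix above assigns exactly one avoidance behavior to each ordered pair of agent types; the sketch below encodes it as a Python dictionary (the identifiers are illustrative, not from the paper).

```python
# Policy assignment implied by the table above: only the ego robot runs the
# learned DRL policy; everyone else runs ORCA-style reciprocal collision
# avoidance, with other robots keeping a larger safety margin near humans.
BEHAVIOR = {
    ("human", "human"): "orca",
    ("human", "ego_robot"): "orca",
    ("human", "other_robot"): "orca",
    ("ego_robot", "human"): "learned_drl_socially_aware",
    ("ego_robot", "other_robot"): "orca_no_social_awareness",
    ("other_robot", "human"): "orca_larger_safety_margin",
    ("other_robot", "ego_robot"): "orca_no_social_awareness",
    ("other_robot", "other_robot"): "orca",
}

def behavior(agent_type, neighbor_type):
    """Look up how `agent_type` avoids a neighbor of `neighbor_type`."""
    return BEHAVIOR[(agent_type, neighbor_type)]
```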
Simulation and training parameters:
| Parameter | Value |
|---|---|
| Preferred velocity | 1.0 m/s |
| Radius of all agents | 0.3 m |
| Discomfort distance for humans | 0.2 m |
| Hidden units of the interaction-module embedding MLP | 150, 100 |
| Hidden units of the interaction-module pairwise MLP | 100, 50 |
| Hidden units of the pooling-module attention MLP | 100, 100 |
| Hidden units of the planning-module MLP | 150, 100, 100 |
| IL training episodes | 2000 |
| IL epochs | 50 |
| IL learning rate | 0.01 |
| RL learning rate | 0.001 |
| Discount factor | 0.9 |
| Training batch size | 100 |
| RL training episodes | 6000 |
| Evaluation episodes | 1000 |
| Exploration rate in first 4000 episodes | 0.5 to 0.1 |
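For reference, the table above maps directly onto a configuration object; here is a minimal sketch with illustrative field names (the grouping is an assumption, the values are from the table).

```python
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    """Training parameters transcribed from the table above."""
    v_pref: float = 1.0            # preferred velocity (m/s)
    agent_radius: float = 0.3      # radius of all agents (m)
    discomfort_dist: float = 0.2   # discomfort distance for humans (m)
    il_episodes: int = 2000        # imitation-learning demonstrations
    il_epochs: int = 50
    il_lr: float = 0.01
    rl_lr: float = 0.001
    gamma: float = 0.9             # discount factor
    batch_size: int = 100
    rl_episodes: int = 6000
    eval_episodes: int = 1000
    eps_start: float = 0.5         # exploration rate, decayed to eps_end
    eps_end: float = 0.1           # over the first 4000 episodes
```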
Summary of the training setup:
| Component | Description |
|---|---|
| Imitation Learning (IL) | 2000 expert trajectories using ORCA-generated demonstrations (5 humans + 2 robots, square-crossing setup); trained for 50 epochs, learning rate 0.01, batch size 100. |
| Reinforcement Learning (RL) | 6000 training episodes, learning rate 0.001, discount factor 0.9, batch size 100, Adam optimizer. |
| Stopping rule | Fixed at 6000 episodes; convergence determined via success rate. |
| Random seeds | Single seed (consistent across IL and DRL-only runs). |
| Hardware | AMD Ryzen 9 5950X CPU, 64 GB RAM; total training time of 17 h including IL pretraining. |
| Environment | PyTorch + OpenAI Gym; 2D square-crossing scenario. |
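A heavily simplified skeleton of the two-stage pipeline summarized above (IL pretraining on ORCA demonstrations, then value-based RL fine-tuning with Adam) is sketched below, reusing the TrainingConfig fields from the earlier sketch; the rollout and value-update details are omitted because they depend on specifics not reproduced in this outline, and `value_net`, `il_dataset`, and `env` are placeholders.

```python
import torch

def train(value_net, il_dataset, env, cfg):
    """Two-stage pipeline per the setup table: supervised pretraining on
    ORCA demonstrations, then RL fine-tuning. A sketch, not the paper's code."""
    # Stage 1: imitation learning on 2000 ORCA-generated trajectories.
    opt = torch.optim.Adam(value_net.parameters(), lr=cfg.il_lr)
    loader = torch.utils.data.DataLoader(
        il_dataset, batch_size=cfg.batch_size, shuffle=True)
    loss_fn = torch.nn.MSELoss()
    for _ in range(cfg.il_epochs):
        for states, target_values in loader:
            opt.zero_grad()
            loss_fn(value_net(states), target_values).backward()
            opt.step()

    # Stage 2: RL fine-tuning for 6000 episodes, with the exploration rate
    # decayed linearly from 0.5 to 0.1 over the first 4000 episodes.
    opt = torch.optim.Adam(value_net.parameters(), lr=cfg.rl_lr)
    for episode in range(cfg.rl_episodes):
        frac = min(episode / 4000.0, 1.0)
        eps = cfg.eps_start + frac * (cfg.eps_end - cfg.eps_start)
        # Roll out one episode in `env` with eps-greedy action selection,
        # store transitions, and update value_net from replayed batches
        # (omitted: depends on the paper's action space and reward).
```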
Quantitative comparison with related works (* = not reported):
| Works | Success Rate (%) | Collision Rate (%) | Navigation Time (s) | Discomfort Rate (%) | Avg. Min. Separation Distance (m) | No. of Training Episodes |
|---|---|---|---|---|---|---|
| This work | 89.0 | 0.15 | 12.13 | 6 | 0.16 | 6000 |
| HeR-DRL [32] | 96.54 | 3.14 | 10.88 | 6.9 | 0.154 | 15,000 |
| HoR-DRL [32] | 96.05 | 3.06 | 10.91 | 7.8 | 0.15 | 10,000 |
| LSTM-RL [15] | 85.52 | 5.49 | 11.52 | * | 0.147 | * |
| SARL [18] | 93.14 | 4.34 | 10.83 | * | 0.154 | 10,000 |
| ST2 [19] | 96.46 | 2.99 | 11.08 | * | 0.149 | 1000 |
Evaluation metrics at checkpoints during RL training:
| Episode | Success Rate (%) | Collision Rate (%) | Navigation Time (s) | Discomfort Rate (%) | Avg. Min. Separation Distance (m) |
|---|---|---|---|---|---|
| 0 | 88 | 5 | 11.37 | 15 | 0.09 |
| 1000 | 79 | 2 | 12.28 | 10 | 0.12 |
| 2000 | 72 | 0 | 14.10 | 2 | 0.14 |
| 3000 | 83 | 1 | 13.48 | 3 | 0.15 |
| 4000 | 86 | 0 | 12.92 | 3 | 0.16 |
| 5000 | 87 | 2 | 12.01 | 4 | 0.15 |
Evaluation metrics with 95% confidence intervals (rates converted from fractions to percentages to match the headers):
| Metric | Mean | Lower 95% CI | Upper 95% CI |
|---|---|---|---|
| Success Rate (%) | 82.50 | 77.63 | 87.37 |
| Collision Rate (%) | 1.67 | 0.18 | 3.16 |
| Navigation Time (s) | 12.693 | 11.890 | 13.497 |
| Discomfort Rate (%) | 6.17 | 2.01 | 10.32 |
| Avg. Min. Separation Distance (m) | 0.1350 | 0.1143 | 0.1557 |
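The intervals above have the symmetric mean ± half-width form; a normal-approximation estimator (mean ± 1.96 × standard error) is a common way to produce them, sketched below. Whether the paper used this exact estimator, or which per-episode samples entered it, is an assumption.

```python
import math

def mean_ci95(samples):
    """Normal-approximation 95% CI: mean +/- 1.96 * (sample std / sqrt(n)).
    Whether the paper computed its intervals this way is an assumption."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # sample variance
    half = 1.96 * math.sqrt(var / n)                       # CI half-width
    return mean, mean - half, mean + half

# Example: per-episode binary success indicators (illustrative data only).
successes = [1] * 825 + [0] * 175
print(mean_ci95(successes))  # ≈ (0.825, lower bound, upper bound)
```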