A Multi-Robot Collaborative Exploration Method Based on Deep Reinforcement Learning and Knowledge Distillation
Abstract
1. Introduction
- We propose an enhanced deep reinforcement learning approach built on the Soft Actor-Critic (SAC) algorithm, enabling it to address decision-making challenges in multi-robot systems.
- We treat the multi-robot system as a whole and, via centralized training, obtain a single policy network that allocates decisions to the individual agents. Using knowledge distillation, this network is then converted into several distributed policy networks that can be deployed on the individual agents (a minimal sketch of this distillation step follows this list). Our proposed strategy offers an innovative way to address the multi-agent collaboration task.
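To illustrate the second contribution, the sketch below distills a centralized teacher policy (which maps the joint observation to actions for all robots) into a per-robot student network by minimizing the KL divergence between their action distributions. All names, shapes, and the discrete action space are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative shapes only: 3 robots, a flattened 100x100 map, 8 discrete actions.
N_ROBOTS, OBS_DIM, N_ACTIONS = 3, 100 * 100, 8

class StudentPolicy(nn.Module):
    """Distributed policy: maps one robot's local observation to action logits."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 256), nn.ReLU(), nn.Linear(256, N_ACTIONS))

    def forward(self, obs):
        return self.net(obs)

def distill_step(teacher, student, joint_obs, robot_id, optimizer, T=2.0):
    """One distillation step: match the student's action distribution for
    `robot_id` to the teacher's output for that robot, softened by T.

    Assumed interfaces: joint_obs is (B, N_ROBOTS, OBS_DIM) and the teacher
    returns logits of shape (B, N_ROBOTS, N_ACTIONS).
    """
    with torch.no_grad():
        teacher_logits = teacher(joint_obs)[:, robot_id]      # (B, N_ACTIONS)
    student_logits = student(joint_obs[:, robot_id])           # local obs only
    # Standard softened-KL distillation loss, rescaled by T^2.
    loss = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the paper's pipeline, the teacher is the centrally trained policy of Section 3.3 and the training data are generated as described in Section 3.4.1; here both remain placeholders.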
2. Related Work
2.1. Multi-Robot Exploration
2.2. Deep Reinforcement Learning
3. Methodology
3.1. Problem Formulation
3.2. Technical Framework
3.3. Centralized Reinforcement Learning
3.3.1. Modeling
3.3.2. Multi-Robot SAC
Algorithm 1 Soft Actor-Critic for Multi-Robot Decision Making

Input: $\theta_1, \theta_2, \phi$  ▹ Initial parameters.
$\bar{\theta}_1 \leftarrow \theta_1,\ \bar{\theta}_2 \leftarrow \theta_2$  ▹ Initialize target network.
$\mathcal{D} \leftarrow \varnothing$  ▹ Initialize replay buffer.
for each iteration do
  for each environment step do
    $a_t \sim \pi_\phi(a_t \mid s_t)$  ▹ Sample actions from the policy.
    $s_{t+1} \sim p(s_{t+1} \mid s_t, a_t)$  ▹ Sample a transition from the environment.
    $\mathcal{D} \leftarrow \mathcal{D} \cup \{(s_t, a_t, r(s_t, a_t), s_{t+1})\}$  ▹ Store the transition in the replay buffer.
  end for
  for each gradient step do
    $\theta_i \leftarrow \theta_i - \lambda_Q \hat{\nabla}_{\theta_i} J_Q(\theta_i)$ for $i \in \{1, 2\}$  ▹ Update critic network weights.
    $\phi \leftarrow \phi - \lambda_\pi \hat{\nabla}_{\phi} J_\pi(\phi)$  ▹ Update actor network weights.
    $\alpha \leftarrow \alpha - \lambda \hat{\nabla}_{\alpha} J(\alpha)$  ▹ Adjust temperature.
    $\bar{\theta}_i \leftarrow \tau \theta_i + (1 - \tau) \bar{\theta}_i$ for $i \in \{1, 2\}$  ▹ Update target network weights.
  end for
end for
Output: $\theta_1, \theta_2, \phi$  ▹ Optimized parameters.
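To make the updates in Algorithm 1 concrete, the sketch below implements one gradient step of SAC for a discrete action space in PyTorch. It is a minimal illustration under our own assumptions (discrete actions; networks `actor`, `q1`, `q2` returning per-action outputs; a single optimizer `q_opt` over both critics), not the paper's MR-SAC implementation.

```python
import torch
import torch.nn.functional as F

def sac_update(batch, actor, q1, q2, q1_targ, q2_targ, log_alpha,
               actor_opt, q_opt, alpha_opt, gamma=0.99, tau=0.005,
               target_entropy=1.0):
    """One SAC gradient step for a discrete action space (a sketch).

    batch: (obs, act, rew, next_obs, done) tensors; each network maps
    observations to per-action values or logits.
    """
    obs, act, rew, next_obs, done = batch
    alpha = log_alpha.exp()

    # Critic target: soft state value of the next state under the policy.
    with torch.no_grad():
        next_logp = F.log_softmax(actor(next_obs), dim=-1)
        next_p = next_logp.exp()
        next_q = torch.min(q1_targ(next_obs), q2_targ(next_obs))
        v_next = (next_p * (next_q - alpha * next_logp)).sum(-1)
        target = rew + gamma * (1.0 - done) * v_next

    # Update both critics toward the shared target.
    q1_pred = q1(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    q2_pred = q2(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    q_loss = F.mse_loss(q1_pred, target) + F.mse_loss(q2_pred, target)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()

    # Update the actor to maximize the entropy-regularized soft Q-value.
    logp = F.log_softmax(actor(obs), dim=-1)
    p = logp.exp()
    q = torch.min(q1(obs), q2(obs)).detach()
    actor_loss = (p * (alpha.detach() * logp - q)).sum(-1).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Adjust the temperature: shrink alpha when entropy exceeds the target.
    entropy = -(p * logp).sum(-1).detach()
    alpha_loss = (log_alpha * (entropy - target_entropy)).mean()
    alpha_opt.zero_grad(); alpha_loss.backward(); alpha_opt.step()

    # Polyak-average the target critics.
    for targ, src in ((q1_targ, q1), (q2_targ, q2)):
        for pt, ps in zip(targ.parameters(), src.parameters()):
            pt.data.mul_(1 - tau).add_(tau * ps.data)

    return q_loss.item(), actor_loss.item()
```

The temperature update follows the automatic entropy adjustment of Haarnoja et al. [16]: α decreases when the policy's entropy exceeds the target and increases otherwise.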
3.3.3. Learning
- The smallest rectangular region containing valid information is boxed out, and irrelevant unknown regions are excluded;
- The boxed rectangular region is scaled so that the number of pixels on its long side matches the input dimension of the network (100 pixels in this study);
- The scaled rectangular region is padded with unknown space into a square, matching the network's input dimensions (100 pixels × 100 pixels in this study); a sketch of these three steps follows this list.
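A minimal NumPy sketch of these three steps, assuming an occupancy grid whose cells encode free/occupied/unknown as 0/1/−1 and containing at least one known cell; the nearest-neighbor scaling is our choice, not necessarily the paper's.

```python
import numpy as np

UNKNOWN = -1  # assumed encoding for unexplored cells; free/occupied are 0/1

def preprocess_map(grid: np.ndarray, out_size: int = 100) -> np.ndarray:
    """Crop, scale, and pad an occupancy grid to the network input size."""
    # 1. Bounding box of the known (non-unknown) region.
    rows = np.any(grid != UNKNOWN, axis=1)
    cols = np.any(grid != UNKNOWN, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    cropped = grid[r0:r1 + 1, c0:c1 + 1]

    # 2. Scale so the long side equals out_size (nearest-neighbor).
    h, w = cropped.shape
    scale = out_size / max(h, w)
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    ri = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    ci = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    scaled = cropped[np.ix_(ri, ci)]

    # 3. Pad the short side with unknown cells to an out_size x out_size square.
    padded = np.full((out_size, out_size), UNKNOWN, dtype=grid.dtype)
    padded[:nh, :nw] = scaled
    return padded
```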
3.4. Knowledge Distillation
3.4.1. Training Dataset Generation
3.4.2. Training Method
4. Training
5. Experiment
- Nearest Frontier [7]: Each robot always chooses the nearest position on the frontiers as its target point for exploration;
- Information Gain [8]: Each robot selects its goal by computing the information gain expected from exploring candidate frontier locations and combining it with the path-length cost; a coordination factor additionally encourages the robot formation to disperse, improving exploration efficiency;
- DME-DRL [14]: DME-DRL builds on the MADDPG deep reinforcement learning method, introducing structural information and time series; robots use it to choose the closest frontiers in specific directions as goal points for exploration. A minimal sketch of the frontier-selection primitive shared by these baselines follows this list.
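As a point of reference for these baselines, here is a minimal Python sketch of their shared primitive: a breadth-first search over free cells that returns the nearest frontier cell (a free cell adjacent to unknown space), as in the Nearest Frontier strategy [7]. The grid encoding is an assumption; the Information Gain baseline [8] would additionally score candidate frontiers by expected gain minus path cost.

```python
from collections import deque

FREE, OCC, UNKNOWN = 0, 1, -1  # assumed cell encoding

def nearest_frontier(grid, start):
    """BFS from `start` over free cells; returns the closest frontier cell.

    A frontier cell is a free cell with at least one unknown 4-neighbor.
    Returns None when no frontier is reachable (map fully explored).
    """
    h, w = len(grid), len(grid[0])
    nbrs = ((1, 0), (-1, 0), (0, 1), (0, -1))

    def is_frontier(r, c):
        return any(0 <= r + dr < h and 0 <= c + dc < w
                   and grid[r + dr][c + dc] == UNKNOWN
                   for dr, dc in nbrs)

    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if is_frontier(r, c):
            return (r, c)  # BFS order guarantees this is the nearest one
        for dr, dc in nbrs:
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and (nr, nc) not in seen
                    and grid[nr][nc] == FREE):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return None
```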
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
DRL | Deep reinforcement learning
MADRL | Multi-agent deep reinforcement learning
CTDE | Centralized training with decentralized execution
GNN | Graph neural network
AI | Artificial intelligence
RL | Reinforcement learning
MDP | Markov decision process
MR-SAC | Multi-Robot SAC
RNN | Recurrent neural network
References
- Burgard, W.; Moors, M.; Stachniss, C.; Schneider, F.E. Coordinated multi-robot exploration. IEEE Trans. Robot. 2005, 21, 376–386. [Google Scholar] [CrossRef]
- Ahmadi, M.; Stone, P. A multi-robot system for continuous area sweeping tasks. In Proceedings of the 2006 IEEE International Conference on Robotics and Automation, ICRA 2006, Orlando, FL, USA, 15–19 May 2006; pp. 1724–1729. [Google Scholar] [CrossRef]
- Queralta, J.P.; Taipalmaa, J.; Pullinen, B.C.; Sarker, V.K.; Gia, T.N.; Tenhunen, H.; Gabbouj, M.; Raitoharju, J.; Westerlund, T. Collaborative multi-robot search and rescue: Planning, coordination, perception, and active vision. IEEE Access 2020, 8, 191617–191643. [Google Scholar] [CrossRef]
- Ng, M.K.; Chong, Y.W.; Ko, K.M.; Park, Y.H.; Leau, Y.B. Adaptive path finding algorithm in dynamic environment for warehouse robot. Neural Comput. Appl. 2020, 32, 13155–13171. [Google Scholar] [CrossRef]
- Gul, F.; Mir, A.; Mir, I.; Mir, S.; Islaam, T.U.; Abualigah, L.; Forestiero, A. A Centralized Strategy for Multi-Agent Exploration. IEEE Access 2022, 10, 126871–126884. [Google Scholar] [CrossRef]
- Matignon, L.; Jeanpierre, L.; Mouaddib, A.I. Coordinated multi-robot exploration under communication constraints using decentralized Markov decision processes. In Proceedings of the AAAI Conference on Artificial Intelligence, Toronto, ON, Canada, 22–26 July 2012; Volume 26, pp. 2017–2023. [Google Scholar]
- Yamauchi, B. Frontier-based exploration using multiple robots. In Proceedings of the Second International Conference on Autonomous Agents, St. Paul, MN, USA, 9–13 May 1998; pp. 47–53. [Google Scholar]
- Colares, R.G.; Chaimowicz, L. The next frontier: Combining information gain and distance cost for decentralized multi-robot exploration. In Proceedings of the 31st Annual ACM Symposium on Applied Computing, Pisa, Italy, 4–8 April 2016; pp. 268–274. [Google Scholar]
- Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep reinforcement learning: A brief survey. IEEE Signal Process. Mag. 2017, 34, 26–38. [Google Scholar] [CrossRef]
- Niroui, F.; Zhang, K.; Kashino, Z.; Nejat, G. Deep reinforcement learning robot for search and rescue applications: Exploration in unknown cluttered environments. IEEE Robot. Autom. Lett. 2019, 4, 610–617. [Google Scholar] [CrossRef]
- Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W.M.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J.Z.; Tuyls, K.; et al. Value-decomposition networks for cooperative multi-agent learning. arXiv 2017, arXiv:1706.05296. [Google Scholar]
- Lowe, R.; Wu, Y.I.; Tamar, A.; Harb, J.; Pieter Abbeel, O.; Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. Adv. Neural Inf. Process. Syst. 2017, 30, 6379–6390. [Google Scholar]
- Rashid, T.; Samvelyan, M.; De Witt, C.S.; Farquhar, G.; Foerster, J.; Whiteson, S. Monotonic value function factorisation for deep multi-agent reinforcement learning. J. Mach. Learn. Res. 2020, 21, 1–51. [Google Scholar]
- He, D.; Feng, D.; Jia, H.; Liu, H. Decentralized exploration of a structured environment based on multi-agent deep reinforcement learning. In Proceedings of the 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS), Hong Kong, China, 2–4 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 172–179. [Google Scholar]
- Zhang, K.; Yang, Z.; Başar, T. Multi-agent reinforcement learning: A selective overview of theories and algorithms. In Handbook of Reinforcement Learning and Control; Springer: Cham, Switzerland, 2021; pp. 321–384. [Google Scholar]
- Haarnoja, T.; Zhou, A.; Hartikainen, K.; Tucker, G.; Ha, S.; Tan, J.; Kumar, V.; Zhu, H.; Gupta, A.; Abbeel, P.; et al. Soft actor-critic algorithms and applications. arXiv 2018, arXiv:1812.05905. [Google Scholar]
- Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge distillation: A survey. Int. J. Comput. Vis. 2021, 129, 1789–1819. [Google Scholar] [CrossRef]
- Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef]
- Stachniss, C.; Martinez Mozos, O.; Burgard, W. Speeding-up multi-robot exploration by considering semantic place information. In Proceedings of the 2006 IEEE International Conference on Robotics and Automation, ICRA 2006, Orlando, FL, USA, 15–19 May 2006; pp. 1692–1697. [Google Scholar] [CrossRef]
- Stachniss, C.; Martínez Mozos, Ó.; Burgard, W. Efficient exploration of unknown indoor environments using a team of mobile robots. Ann. Math. Artif. Intell. 2008, 52, 205–227. [Google Scholar] [CrossRef]
- Aurenhammer, F. Voronoi diagrams—A survey of a fundamental geometric data structure. ACM Comput. Surv. (CSUR) 1991, 23, 345–405. [Google Scholar] [CrossRef]
- Haumann, A.D.; Listmann, K.D.; Willert, V. DisCoverage: A new paradigm for multi-robot exploration. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation, Anchorage, AK, USA, 3–8 May 2010; pp. 929–934. [Google Scholar]
- Bautin, A.; Simonin, O.; Charpillet, F. MinPos: A Novel Frontier Allocation Algorithm for Multi-robot Exploration. In Proceedings of the Intelligent Robotics and Applications; Su, C.Y., Rakheja, S., Liu, H., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 496–508. [Google Scholar]
- Irturk, A.U. Distributed Multi-robot Coordination For Area Exploration and Mapping; University of California Santa Barbara: Santa Barbara, CA, USA, 2006. [Google Scholar]
- Rogers, J.G.; Nieto-Granda, C.; Christensen, H.I. Coordination Strategies for Multi-robot Exploration and Mapping. In Experimental Robotics: The 13th International Symposium on Experimental Robotics; Desai, J.P., Dudek, G., Khatib, O., Kumar, V., Eds.; Springer International Publishing: Heidelberg, Germany, 2013; pp. 231–243. [Google Scholar] [CrossRef]
- Dias, M.; Zlot, R.; Kalra, N.; Stentz, A. Market-Based Multirobot Coordination: A Survey and Analysis. Proc. IEEE 2006, 94, 1257–1270. [Google Scholar] [CrossRef]
- Zlot, R.; Stentz, A.; Dias, M.; Thayer, S. Multi-robot exploration controlled by a market economy. In Proceedings of the 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292), Washington, DC, USA, 11–15 May 2002; Volume 3, pp. 3016–3023. [Google Scholar] [CrossRef]
- Yan, Z.; Jouandeau, N.; Cherif, A.A. Multi-robot decentralized exploration using a trade-based approach. In Proceedings of the International Conference on Informatics in Control, Automation and Robotics, Noordwijkerhout, The Netherlands, 28–31 July 2011; SciTePress: Setubal, Portugal, 2011; Volume 2, pp. 99–105. [Google Scholar]
- Otte, M.; Kuhlman, M.J.; Sofge, D. Auctions for multi-robot task allocation in communication limited environments. Auton. Robot. 2020, 44, 547–584. [Google Scholar] [CrossRef]
- Zhang, H.; Cheng, J.; Zhang, L.; Li, Y.; Zhang, W. H2GNN: Hierarchical-Hops Graph Neural Networks for Multi-Robot Exploration in Unknown Environments. IEEE Robot. Autom. Lett. 2022, 7, 3435–3442. [Google Scholar] [CrossRef]
- Sutton, R.S.; Precup, D.; Singh, S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artif. Intell. 1999, 112, 181–211. [Google Scholar] [CrossRef]
- Tan, A.H.; Bejarano, F.P.; Zhu, Y.; Ren, R.; Nejat, G. Deep Reinforcement Learning for Decentralized Multi-Robot Exploration With Macro Actions. IEEE Robot. Autom. Lett. 2023, 8, 272–279. [Google Scholar] [CrossRef]
- Shakya, A.K.; Pillai, G.; Chakrabarty, S. Reinforcement learning algorithms: A brief survey. Expert Syst. Appl. 2023, 231, 120495. [Google Scholar] [CrossRef]
- Kohl, N.; Stone, P. Policy gradient reinforcement learning for fast quadrupedal locomotion. In Proceedings of the 2004 IEEE International Conference on Robotics and Automation (ICRA ’04), New Orleans, LA, USA, 26 April–1 May 2004; IEEE: Piscataway, NJ, USA, 2004; Volume 3, pp. 2619–2624. [Google Scholar]
- Ng, A.Y.; Coates, A.; Diel, M.; Ganapathi, V.; Schulte, J.; Tse, B.; Berger, E.; Liang, E. Autonomous inverted helicopter flight via reinforcement learning. In Experimental Robotics IX: The 9th International Symposium on Experimental Robotics; Springer: Berlin/Heidelberg, Germany, 2006; pp. 363–372. [Google Scholar]
- Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. Mastering the game of go without human knowledge. Nature 2017, 550, 354–359. [Google Scholar] [CrossRef]
- Zhao, Y.; Zhang, Y.; Wang, S. A review of mobile robot path planning based on deep reinforcement learning algorithm. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2021; Volume 2138, p. 012011. [Google Scholar]
- Liu, R.; Nageotte, F.; Zanne, P.; de Mathelin, M.; Dresp-Langley, B. Deep reinforcement learning for the control of robotic manipulation: A focussed mini-review. Robotics 2021, 10, 22. [Google Scholar] [CrossRef]
- Elallid, B.B.; Benamar, N.; Hafid, A.S.; Rachidi, T.; Mrani, N. A comprehensive survey on the application of deep and reinforcement learning approaches in autonomous driving. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 7366–7390. [Google Scholar] [CrossRef]
- Sturtevant, N. Benchmarks for Grid-Based Pathfinding. IEEE Trans. Comput. Intell. AI Games 2012, 4, 144–148. [Google Scholar] [CrossRef]
| Index | C1 | RNN | TC | C2 | Elapsed Time (s) |
|---|---|---|---|---|---|
| 1 | 2 | GRU | 3 | 2 | 181.0, 170.6, 160.1 |
| 2 | 2 | GRU | 4 | 3 | 179.1, 162.6, 159.4 |
| 3 | 3 | GRU | 3 | 2 | 177.3, 164.3, 160.3 |
| 4 | 3 | GRU | 4 | 3 | 173.2, 160.1, 156.5 |
| 5 | 4 | GRU | 3 | 2 | 176.5, 161.5, 154.9 |
| 6 | 4 | GRU | 4 | 3 | 170.7, 156.3, 154.1 |
| 7 | 2 | LSTM | 3 | 2 | 182.0, 165.9, 160.8 |
| 8 | 2 | LSTM | 4 | 3 | 175.3, 164.5, 160.7 |
| 9 | 3 | LSTM | 3 | 2 | 176.9, 163.5, 159.4 |
| 10 | 3 | LSTM | 4 | 3 | 170.8, 159.8, 154.3 |
| 11 | 4 | LSTM | 3 | 2 | 175.0, 158.5, 154.4 |
| 12 | 4 | LSTM | 4 | 3 | 172.3, 159.1, 149.1 |