Learning-Based Multi-Robot Active SLAM: A Conceptual Framework and Survey
Abstract
1. Introduction
- An integrated conceptual framework for multi-robot AC-SLAM is introduced, modeling it as a coupled system consisting of a Dec-POMDP/MARL decision layer and a distributed factor graph/map fusion estimation layer. Within this abstraction, the fundamental components and objectives of collaborative perception, collaborative mapping, and collaborative decision-making are delineated, together with the input–output relationships of each module, to provide a structured lens for organizing and analyzing existing systems and methods. To avoid overstating novelty, it is emphasized that this framework serves primarily as an organizing abstraction: different surveyed works instantiate different subsets of modules with varying levels of experimental maturity. Table 3 summarizes the validation status of each component.
- Learning-driven multi-robot active SLAM methods are systematically reviewed, with a specific focus on analyzing the advantages and limitations of techniques such as end-to-end DRL, hierarchical reinforcement learning, and sparse map representations in achieving effective team collaborative exploration and safe constrained control within long-duration, complex environments.
- The primary challenges in the sim-to-real transfer of multi-robot active SLAM are summarized, including model bias, sensor and dynamics mismatch, and discrepancies in communication conditions. Typical technical pathways, including domain randomization, domain adaptation (e.g., stylized simulation), and real-to-sim reconstruction with online adaptation, are then categorized, and their transfer efficiency and applicable scenarios are compared.
2. Problem Formulation and Solution Framework of AC-SLAM
2.1. Formal Definition of AC-SLAM
2.1.1. Notations
- Robots (Agents): $\mathcal{R} = \{1, 2, \dots, N\}$, with $\mathcal{N}_i$ denoting the communication neighbors of robot $i$.
- Joint State: $s_t = (x_t^1, \dots, x_t^N, M)$, where $x_t^i$ denotes the pose of robot $i$ at time $t$ and $M$ denotes the environmental map (feature points, occupancy grid, or 3D mesh).
- Joint Actions: $\mathbf{a}_t = (a_t^1, \dots, a_t^N)$, representing the control inputs of all $N$ robots at time $t$.
- Joint Observations: $\mathbf{z}_t = (z_t^1, \dots, z_t^N)$, where $z_t^i$ is the observation obtained by robot $i$ at time $t$.
- Policy: $\pi^i(a_t^i \mid h_t^i)$, the decision policy of robot $i$ mapping its observation–action history $h_t^i$ to a distribution over actions.
- Reward: $R(s_t, \mathbf{a}_t)$, the joint reward function integrating exploration gain, mapping quality, and costs.
- Belief: $b_t$, the posterior probability distribution over the joint state given all past observations and actions, i.e., $b_t = p(s_t \mid \mathbf{z}_{1:t}, \mathbf{a}_{1:t-1})$.
2.1.2. Optimization Objective Function
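Using the notation above, the AC-SLAM objective can be written in one common form as maximizing expected discounted reward over the joint policy, with the reward trading off information gain against uncertainty and cost. The weights $\alpha$, $\beta$, $\lambda$ below are illustrative assumptions rather than values taken from a specific surveyed system:

```latex
\max_{\pi^1,\dots,\pi^N}\;
\mathbb{E}\!\left[\sum_{t=0}^{T}\gamma^{t}\,R(s_t,\mathbf{a}_t)\right],
\qquad
R(s_t,\mathbf{a}_t)
= \alpha\,\underbrace{I\!\left(M;\,\mathbf{z}_{t+1}\mid b_t\right)}_{\text{expected map information gain}}
- \beta\,\underbrace{\operatorname{tr}\Sigma_t}_{\text{pose uncertainty}}
- \lambda\,\underbrace{c(\mathbf{a}_t)}_{\text{motion/communication cost}}
```

Here $\Sigma_t$ denotes the joint pose covariance under the belief $b_t$, and $c(\cdot)$ aggregates motion and communication costs.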
2.2. Conceptual System Framework: Coupled Model
2.2.1. Framework Architecture Diagram
| Algorithm 1: Multi-robot AC-SLAM pipeline |
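The perception–fusion–decision loop of Algorithm 1 can be sketched as follows. All models here are illustrative stubs (the sensor footprint, the set-union "map fusion", and the greedy novelty policy are stand-ins for the surveyed perception, distributed factor-graph, and learned Dec-POMDP modules), intended only to show the module boundaries and data flow:

```python
import random

def ac_slam_pipeline(num_robots=3, horizon=5, seed=0):
    """Illustrative multi-robot AC-SLAM loop; every model is a stub."""
    rng = random.Random(seed)
    poses = {i: (0.0, 0.0) for i in range(num_robots)}   # robot pose estimates
    local_maps = {i: set() for i in range(num_robots)}   # per-robot maps
    global_map = set()                                   # fused shared map
    for t in range(horizon):
        # --- Collaborative perception: each robot senses nearby grid cells.
        observations = {}
        for i in range(num_robots):
            x, y = poses[i]
            observations[i] = {(round(x) + dx, round(y) + dy)
                               for dx in (-1, 0, 1) for dy in (-1, 0, 1)}
        # --- Collaborative mapping: fuse observations into the shared map
        #     (stands in for distributed factor-graph optimization/map fusion).
        for i, obs in observations.items():
            local_maps[i] |= obs
            global_map |= obs
        # --- Collaborative decision: greedily move toward unexplored space
        #     (stands in for the learned joint policy).
        for i in range(num_robots):
            x, y = poses[i]
            moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
            def novelty(m):
                nx, ny = round(x) + m[0], round(y) + m[1]
                return ((nx, ny) not in global_map, rng.random())
            dx, dy = max(moves, key=novelty)
            poses[i] = (x + dx, y + dy)
    return global_map, poses
```

In a real system each stage would be replaced by the corresponding module discussed in Sections 2.2.2 through 2.2.4.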
2.2.2. Collaborative Perception
2.2.3. Collaborative Mapping/Optimization
2.2.4. Collaborative Policy for AC-SLAM
- $N$ is the number of robots;
- $S$ represents the global state space (including the poses of all robots and the environmental map);
- $A^i$ denotes the action space of robot $i$ (e.g., target poses, velocity commands);
- $T(s' \mid s, \mathbf{a})$ is the state transition function;
- $\Omega^i$ is the observation space of robot $i$;
- $O(\mathbf{z} \mid s', \mathbf{a})$ is the observation function;
- $R(s, \mathbf{a})$ is the joint reward function (integrating exploration coverage, pose and map uncertainty, communication/motion costs, and cooperative gains);
- $\gamma \in [0, 1)$ is the discount factor.
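The Dec-POMDP tuple can be made concrete with a minimal container; the two-robot example below uses illustrative states, actions, and reward (none of these names come from a surveyed system), simply to show how the components fit together:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class DecPOMDP:
    """Illustrative container for <N, S, {A_i}, T, {Omega_i}, O, R, gamma>."""
    n_agents: int
    states: List[str]
    actions: Dict[int, List[str]]       # A_i: per-agent action space
    transition: Callable                # T(s, joint_action) -> next state
    observations: Dict[int, List[str]]  # Omega_i: per-agent observation space
    obs_fn: Callable                    # O(s_next, joint_action, agent) -> obs
    reward: Callable                    # R(s, joint_action) -> float
    gamma: float = 0.95                 # discount factor

# Minimal two-robot exploration example (all names are hypothetical).
model = DecPOMDP(
    n_agents=2,
    states=["unexplored", "explored"],
    actions={0: ["move", "stay"], 1: ["move", "stay"]},
    transition=lambda s, a: "explored" if "move" in a else s,
    observations={0: ["frontier", "wall"], 1: ["frontier", "wall"]},
    obs_fn=lambda s_next, a, i: "frontier" if s_next == "unexplored" else "wall",
    reward=lambda s, a: 1.0 if "move" in a else 0.0,  # reward joint progress
)
```

A planner or learner would then roll out this model, maintaining per-agent observation histories rather than the global state.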
3. Learning-Driven Multi-Robot Active SLAM
3.1. The Cognitive Shift in Collaborative Decision-Making
3.2. End-to-End Deep Reinforcement Learning
3.2.1. Unified Architectures and Perception Backbones
3.2.2. Rewards and Learning
- Intrinsic Exploration Reward ($r_{\text{exp}}$): Proportional to the number of newly visited grid cells or the reduction in map entropy. This drives the fundamental map-building behavior.
- Obstacle Avoidance Penalty ($r_{\text{obs}}$): A negative reward for collisions or proximity to obstacles, ensuring safety.
- Smoothness Penalty ($r_{\text{smooth}}$): Penalties for jerky movements or rapid oscillations in angular velocity, which are detrimental to odometry estimation and map consistency.
- Coordination Reward ($r_{\text{coord}}$): In multi-robot settings, a penalty is often applied for overlap with other agents’ trajectories or sensor footprints, encouraging dispersion and reducing redundant coverage.
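A typical shaped reward combines these four terms as a weighted sum. The sketch below is a minimal instantiation; the weights and the scalar proxies for each term (cell counts, a collision flag, angular-velocity change, overlap counts) are illustrative assumptions:

```python
def shaped_reward(new_cells, collided, ang_vel_delta, overlap_cells,
                  w_explore=1.0, w_obs=5.0, w_smooth=0.1, w_coord=0.5):
    """Weighted sum of the four reward terms; weights are illustrative."""
    r_exp = w_explore * new_cells            # newly mapped grid cells
    r_obs = -w_obs * float(collided)         # collision/proximity penalty
    r_smooth = -w_smooth * abs(ang_vel_delta)  # penalize angular jerk
    r_coord = -w_coord * overlap_cells       # penalize redundant coverage
    return r_exp + r_obs + r_smooth + r_coord
```

In practice these weights are tuned per environment, and poorly balanced terms can suppress exploration entirely (e.g., an oversized smoothness penalty freezing the robot).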
3.2.3. Challenges in Multi-Agent Scalability
Computational Complexity Analysis
- Independent learners avoid the exponential blowup by treating other agents as environment dynamics, achieving $O(1)$ per-agent training and inference cost with respect to team size, at the cost of non-stationarity.
- CTDE methods (QMIX, VDN) maintain tractable per-agent inference while requiring additional mixing-network forward passes during centralized training.
- Hierarchical approaches reduce decision frequency through temporal abstraction, with high-level policies operating at frequency $1/k$ of the low-level controller, where $k$ is the option length.
- Communication-based methods introduce $O((N-1)\,m)$ per-agent message complexity for fully connected topologies, where $m$ is the message size.
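Two of these scaling properties are simple enough to state in code. The sketch below shows VDN-style additive mixing (the joint value is the sum of per-agent utilities, so decentralized greedy action selection remains consistent with the joint value) and the per-step message volume of a fully connected topology; both functions are illustrative, not taken from a specific implementation:

```python
def vdn_joint_value(per_agent_utilities):
    """VDN-style additive mixing: the joint action value is the sum of
    per-agent utilities, keeping per-agent inference decentralized."""
    return sum(per_agent_utilities)

def fully_connected_message_load(n_agents, msg_size):
    """Per-step communication volume of a fully connected topology:
    each of the N agents sends a size-m message to the other N - 1."""
    return n_agents * (n_agents - 1) * msg_size
```

The quadratic growth of the second function is the usual motivation for sparse or learned communication topologies in larger teams.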
3.3. Hierarchical Policy and Spatial Abstraction
3.3.1. Policy Hierarchy
- High-level Policy: Runs at a lower frequency to select long-term sub-goals in an abstract state space (e.g., navigate to the corridor end node). Its action space is a discrete set of graph nodes [62].
- Low-level Control: Runs at a higher frequency to generate specific motion commands (linear velocity $v$, angular velocity $\omega$) that move the robot toward the sub-goals. This layer also handles obstacle avoidance and smooth control execution.
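The two-rate structure can be sketched as follows. The high-level "policy" here is a trivial farthest-node heuristic and the low-level controller is plain proportional steering on a unicycle model; both are illustrative stand-ins for the learned policies discussed above, and the gains and integration step are assumptions:

```python
import math

def high_level_policy(graph_nodes, robot_xy):
    """Pick the farthest graph node as the next sub-goal (stub policy)."""
    return max(graph_nodes, key=lambda n: math.dist(n, robot_xy))

def low_level_control(robot_xy, heading, subgoal, v_max=0.5, k_ang=1.0):
    """Proportional steering toward the sub-goal: returns (v, omega)."""
    dx, dy = subgoal[0] - robot_xy[0], subgoal[1] - robot_xy[1]
    bearing = math.atan2(dy, dx)
    err = (bearing - heading + math.pi) % (2 * math.pi) - math.pi
    v = v_max * max(0.0, math.cos(err))   # slow down when badly aligned
    omega = k_ang * err
    return v, omega

def run_hierarchy(graph_nodes, steps=10, k=5):
    """High-level re-plans every k steps; low-level runs every step."""
    xy, heading = (0.0, 0.0), 0.0
    subgoal = None
    for t in range(steps):
        if t % k == 0:                    # temporal abstraction: option length k
            subgoal = high_level_policy(graph_nodes, xy)
        v, omega = low_level_control(xy, heading, subgoal)
        heading += 0.1 * omega            # simple unicycle integration, dt = 0.1
        xy = (xy[0] + 0.1 * v * math.cos(heading),
              xy[1] + 0.1 * v * math.sin(heading))
    return xy, subgoal
```

The key point is the `t % k` guard: the expensive abstract decision runs at $1/k$ the control rate, which is what makes the hierarchy scale.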
3.3.2. Spatial Abstraction
3.4. The Robustness Gap
4. Sim-to-Real Transfer Pathways
- Non-stationarity in Action Execution: In simulation, robot actions are idealized to be instantaneous, synchronous, and of fixed step size. However, in the real world, physical constraints and network latency cause action execution to experience delays and jitter. This variability at the execution level results in the robot’s actual responses deviating from the commands issued in simulation.
- Non-stationarity in Reward Function: Simulation-based training typically relies on a fixed reward function. However, the real environment is dynamic; previously defined reward weights or environmental features may no longer apply, and new factors can emerge. This non-stationarity of the reward function means that a policy optimal in simulation may not remain optimal in reality.
- Environmental Uncertainty in State Perception: Simulation environments are highly simplified and often ignore complex physical factors like friction, slopes, and lighting variations. This leads to uncertainty in the state transition model. In other words, the robot’s perception and prediction of its own state and the environment in the real world will deviate from those in simulation.
- Technical Uncertainty in Observation and Communication: Simulation assumes precise sensor observations, zero-latency communication, and no packet loss. In reality, sensor data is noisy and lossy, and multi-robot communication faces bandwidth limits, delays, and out-of-order delivery. This uncertainty in the perception and communication links means that the information available to each robot during real deployment is markedly degraded relative to simulation.
4.1. Domain Randomization
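The core of domain randomization is to resample the simulator configuration each episode so the policy cannot overfit to any single setting. The sketch below samples the four mismatch sources listed above; the parameter names and ranges are illustrative assumptions, not values from a surveyed system:

```python
import random

def randomize_domain(rng):
    """Sample one simulator configuration; ranges are illustrative."""
    return {
        "lidar_noise_std": rng.uniform(0.0, 0.05),  # m, sensor noise
        "action_delay_steps": rng.randint(0, 3),    # execution latency/jitter
        "wheel_friction": rng.uniform(0.6, 1.2),    # dynamics mismatch
        "comm_drop_prob": rng.uniform(0.0, 0.3),    # packet loss
    }

def train_with_randomization(n_episodes=4, seed=0):
    """Each episode runs under a freshly sampled domain; the actual
    per-episode training call is omitted, only the sampling is shown."""
    rng = random.Random(seed)
    return [randomize_domain(rng) for _ in range(n_episodes)]
```

Widening these ranges trades asymptotic robustness against slower, higher-variance training, which is the central tuning difficulty of the approach.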
4.2. Domain Adaptation
4.3. Real-to-Sim
4.4. Robustness in Communication and Distributed Coordination
5. Future Directions
5.1. From Mapping to Understanding
5.2. Dynamic Digital Twins
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Chen, W.; Wang, X.; Gao, S.; Shang, G.; Zhou, C.; Li, Z.; Xu, C.; Hu, K. Overview of multi-robot collaborative SLAM from the perspective of data fusion. Machines 2023, 11, 653. [Google Scholar] [CrossRef]
- Rosen, D.M.; Doherty, K.J.; Terán Espinoza, A.; Leonard, J.J. Advances in inference and representation for simultaneous localization and mapping. Annu. Rev. Control Robot. Auton. Syst. 2021, 4, 215–242. [Google Scholar] [CrossRef]
- Rosinol, A.; Violette, A.; Abate, M.; Hughes, N.; Chang, Y.; Shi, J.; Gupta, A.; Carlone, L. Kimera: From SLAM to spatial perception with 3D dynamic scene graphs. Int. J. Robot. Res. 2021, 40, 1510–1546. [Google Scholar] [CrossRef]
- Liu, Y.; Chen, W.; Bai, Y.; Liang, X.; Li, G.; Gao, W.; Lin, L. Aligning cyber space with physical world: A comprehensive survey on embodied AI. IEEE ASME Trans. Mechatron. 2025, 30, 7253–7274. [Google Scholar] [CrossRef]
- Sun, F.; Chen, R.; Ji, T.; Luo, Y.; Zhou, H.; Liu, H. A comprehensive survey on embodied intelligence: Advancements, challenges, and future perspectives. Caai Artif. Intell. Res. 2024, 3, 9150042. [Google Scholar] [CrossRef]
- Liu, Y.; Liu, L.; Zheng, Y.; Liu, Y.; Dang, F.; Li, N.; Ma, K. Embodied navigation. Sci. China Inf. Sci. 2025, 68, 141101. [Google Scholar] [CrossRef]
- Moosavi, S.K.R.; Zafar, M.H.; Sanfilippo, F. Collaborative robots (cobots) for disaster risk resilience: A framework for swarm of snake robots in delivering first aid in emergency situations. Front. Robot. AI 2024, 11, 1362294. [Google Scholar] [CrossRef]
- Ebadi, K.; Bernreiter, L.; Biggie, H.; Catt, G.; Chang, Y.; Chatterjee, A.; Denniston, C.E.; Deschênes, S.P.; Harlow, K.; Khattak, S.; et al. Present and future of SLAM in extreme environments: The DARPA SubT challenge. IEEE Trans. Robot. 2024, 40, 936–959. [Google Scholar] [CrossRef]
- Placed, J.A.; Strader, J.; Carrillo, H.; Atanasov, N.; Indelman, V.; Carlone, L.; Castellanos, J.A. A Survey on Active Simultaneous Localization and Mapping: State of the Art and New Frontiers. IEEE Trans. Robot. 2023, 39, 1686–1705. [Google Scholar] [CrossRef]
- Bernreiter, L.; Khattak, S.; Ott, L.; Siegwart, R.; Hutter, M.; Cadena, C. A framework for collaborative multi-robot mapping using spectral graph wavelets. Int. J. Robot. Res. 2024, 43, 2070–2088. [Google Scholar] [CrossRef]
- Huang, Y.; Lin, X.; Englot, B. Multi-robot autonomous exploration and mapping under localization uncertainty with expectation-maximization. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; IEEE: New York, NY, USA, 2024; Volume 18, pp. 7236–7242. [Google Scholar] [CrossRef]
- Chiun, J.; Zhang, S.; Wang, Y.; Cao, Y.; Sartoretti, G. MARVEL: Multi-agent reinforcement learning for constrained field-of-view multi-robot exploration in large-scale environments. In Proceedings of the 2025 IEEE International Conference on Robotics and Automation (ICRA), Atlanta, GA, USA, 17–23 May 2025; IEEE: New York, NY, USA, 2025; Volume 20, pp. 11392–11398. [Google Scholar] [CrossRef]
- Xu, W.; Chen, Y.; Liu, S.; Nie, A.; Chen, R. Multi-robot cooperative simultaneous localization and mapping algorithm based on sub-graph partitioning. Sensors 2025, 25, 2953. [Google Scholar] [CrossRef]
- Chang, Y.; Ebadi, K.; Denniston, C.E.; Ginting, M.F.; Rosinol, A.; Reinke, A.; Palieri, M.; Shi, J.; Chatterjee, A.; Morrell, B.; et al. LAMP 2.0: A robust multi-robot SLAM system for operation in challenging large-scale underground environments. IEEE Robot. Autom. Lett. 2022, 7, 9175–9182. [Google Scholar] [CrossRef]
- Lajoie, P.Y.; Ramtoula, B.; Chang, Y.; Carlone, L.; Beltrame, G. DOOR-SLAM: Distributed, online, and outlier resilient SLAM for robotic teams. IEEE Robot. Autom. Lett. 2020, 5, 1656–1663. [Google Scholar] [CrossRef]
- Denniston, C.E.; Chang, Y.; Reinke, A.; Ebadi, K.; Sukhatme, G.S.; Carlone, L.; Morrell, B.; Agha-mohammadi, A.A. Loop closure prioritization for efficient and scalable multi-robot SLAM. IEEE Robot. Autom. Lett. 2022, 7, 9651–9658. [Google Scholar] [CrossRef]
- Ahmed, M.F.; Maragliano, M.; Frémont, V.; Recchiuto, C.T. Efficient multi-robot active SLAM. J. Intell. Robot. Syst. 2025, 111, 64. [Google Scholar] [CrossRef]
- Chen, Y.; Zhao, L.; Lee, K.M.B.; Yoo, C.; Huang, S.; Fitch, R. Broadcast your weaknesses: Cooperative active pose-graph SLAM for multiple robots. IEEE Robot. Autom. Lett. 2020, 5, 2200–2207. [Google Scholar] [CrossRef]
- Chang, Y.; Tian, Y.; How, J.P.; Carlone, L. Kimera-multi: A system for distributed multi-robot metric-semantic simultaneous localization and mapping. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; IEEE: New York, NY, USA, 2021; Volume 12. [Google Scholar] [CrossRef]
- Schmuck, P.; Ziegler, T.; Karrer, M.; Perraudin, J.; Chli, M. COVINS: Visual-inertial SLAM for centralized collaboration. In Proceedings of the 2021 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), Bari, Italy, 4–8 October 2021; IEEE: New York, NY, USA, 2021; Volume 13. [Google Scholar] [CrossRef]
- Cao, H.; Shreedharan, S.; Atanasov, N. Multi-robot object SLAM using distributed variational inference. IEEE Robot. Autom. Lett. 2024, 9, 8722–8729. [Google Scholar] [CrossRef]
- Durrant-Whyte, H.; Bailey, T. Simultaneous localization and mapping: Part I. IEEE Robot. Autom. Mag. 2006, 13, 99–110. [Google Scholar] [CrossRef]
- Kazerouni, I.A.; Fitzgerald, L.; Dooly, G.; Toal, D. A survey of state-of-the-art on visual SLAM. Expert Syst. Appl. 2022, 205, 117734. [Google Scholar] [CrossRef]
- Chen, W.; Shang, G.; Ji, A.; Zhou, C.; Wang, X.; Xu, C.; Li, Z.; Hu, K. An overview on visual SLAM: From tradition to semantic. Remote Sens. 2022, 14, 3010. [Google Scholar] [CrossRef]
- Favorskaya, M.N. Deep learning for visual SLAM: The state-of-the-art and future trends. Electronics 2023, 12, 2006. [Google Scholar] [CrossRef]
- Ahmed, M.F.; Masood, K.; Fremont, V.; Fantoni, I. Active SLAM: A Review on Last Decade. Sensors 2023, 23, 8097. [Google Scholar] [CrossRef]
- Lluvia, I.; Lazkano, E.; Ansuategi, A. Active mapping and robot exploration: A survey. Sensors 2021, 21, 2445. [Google Scholar] [CrossRef]
- Lajoie, P.Y.; Ramtoula, B.; Wu, F.; Beltrame, G. Towards collaborative simultaneous localization and mapping: A survey of the current research landscape. Field Robot. 2022, 2, 971–1000. [Google Scholar] [CrossRef]
- Wang, C.; Yu, C.; Xu, X.; Gao, Y.; Yang, X.; Tang, W.; Yu, S.; Chen, Y.; Gao, F.; Jian, Z.; et al. Multi-Robot System for Cooperative Exploration in Unknown Environments: A Survey. arXiv 2025, arXiv:2503.07278. [Google Scholar] [CrossRef]
- Orr, J.; Dutta, A. Multi-agent deep reinforcement learning for multi-robot applications: A survey. Sensors 2023, 23, 3625. [Google Scholar] [CrossRef]
- Queralta, J.P.; Taipalmaa, J.; Pullinen, B.C.; Sarker, V.K.; Gia, T.N.; Tenhunen, H.; Gabbouj, M.; Raitoharju, J.; Westerlund, T. Collaborative Multi-Robot Search and Rescue: Planning, Coordination, Perception, and Active Vision. IEEE Access 2020, 8, 191617–191643. [Google Scholar] [CrossRef]
- Lajoie, P.Y.; Hu, S.; Beltrame, G.; Carlone, L. Modeling Perceptual Aliasing in SLAM via Discrete-Continuous Graphical Models. IEEE Robot. Autom. Lett. 2019, 4, 1232–1239. [Google Scholar] [CrossRef]
- Feng, D.; Qi, Y.; Zhong, S.; Chen, Z.; Chen, Q.; Chen, H.; Wu, J.; Ma, J. S3E: A Multi-Robot Multimodal Dataset for Collaborative SLAM. IEEE Robot. Autom. Lett. 2024, 9, 11401–11408. [Google Scholar] [CrossRef]
- Tian, Y.; Khosoussi, K.; Rosen, D.M.; How, J.P. Distributed certifiably correct pose-graph optimization. IEEE Trans. Robot. 2021, 37, 2137–2156. [Google Scholar] [CrossRef]
- Lajoie, P.Y.; Beltrame, G. Swarm-SLAM: Sparse decentralized collaborative simultaneous localization and mapping framework for multi-robot systems. IEEE Robot. Autom. Lett. 2024, 9, 475–482. [Google Scholar] [CrossRef]
- Yu, C.; Velu, A.; Vinitsky, E.; Gao, J.; Wang, Y.; Bayen, A.; Wu, Y. The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games. Adv. Neural Inf. Process. Syst. 2022, 35, 24611–24624. [Google Scholar]
- Indelman, V.; Carlone, L.; Dellaert, F. Planning in the continuous domain: A generalized belief space approach for autonomous navigation in unknown environments. Int. J. Robot. Res. 2015, 34, 849–882. [Google Scholar] [CrossRef]
- Lauri, M.; Pajarinen, J.; Peters, J. Multi-agent active information gathering in discrete and continuous-state decentralized POMDPs by policy graph improvement. Auton. Agent. Multi. Agent. Syst. 2020, 34, 42. [Google Scholar] [CrossRef]
- Dellaert, F.; Kaess, M. Factor graphs for robot perception. Found. Trends Robot. 2017, 6, 1–139. [Google Scholar] [CrossRef]
- Kaess, M.; Johannsson, H.; Roberts, R.; Ila, V.; Leonard, J.J.; Dellaert, F. iSAM2: Incremental smoothing and mapping using the Bayes tree. Int. J. Robot. Res. 2012, 31, 216–235. [Google Scholar] [CrossRef]
- Oliehoek, F.A.; Amato, C. A Concise Introduction to Decentralized POMDPs; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar] [CrossRef]
- Wang, T.H.; Manivasagam, S.; Liang, M.; Yang, B.; Zeng, W.; Urtasun, R. V2VNet: Vehicle-to-vehicle communication for joint perception and prediction. In Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2020; Volume 34, pp. 605–621. [Google Scholar]
- Hu, Y.; Fang, S.; Lei, Z.; Zhong, Y.; Chen, S. Where2comm: Communication-Efficient Collaborative Perception via Spatial Confidence Maps. In Advances in Neural Information Processing Systems; Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2022; Volume 35, pp. 4874–4886. [Google Scholar]
- Li, Y.; Ren, S.; Wu, P.; Chen, S.; Feng, C.; Zhang, W. Learning Distilled Collaboration Graph for Multi-Agent Perception. In Advances in Neural Information Processing Systems; Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2021; Volume 34, pp. 29541–29552. [Google Scholar]
- Chen, J.; Wu, Z.; Li, H.; Yang, F.; Xia, L. Robust Loop Closure Selection Based on Inter-Robot and Intra-Robot Consistency for Multi-Robot Map Fusion. Remote Sens. 2023, 15, 2796. [Google Scholar] [CrossRef]
- Shi, W.; Ling, Q.; Yuan, K.; Wu, G.; Yin, W. On the linear convergence of the ADMM in decentralized consensus optimization. IEEE Trans. Signal Process. 2014, 62, 1750–1761. [Google Scholar] [CrossRef]
- Choudhary, S.; Carlone, L.; Nieto, C.; Rogers, J.; Christensen, H.I.; Dellaert, F. Distributed mapping with privacy and communication constraints: Lightweight algorithms and object-based models. Int. J. Robot. Res. 2017, 36, 1286–1311. [Google Scholar] [CrossRef]
- Rashid, T.; Samvelyan, M.; de Witt, C.S.; Farquhar, G.; Foerster, J.; Whiteson, S. Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. J. Mach. Learn. Res. 2020, 21, 1–51. [Google Scholar]
- Feder, H.J.S.; Leonard, J.J.; Smith, C.M. Adaptive mobile robot navigation and mapping. Int. J. Robot. Res. 1999, 18, 650–668. [Google Scholar] [CrossRef]
- Bourgault, F.; Makarenko, A.; Williams, S.; Grocholsky, B.; Durrant-Whyte, H. Information based adaptive robotic exploration. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Lausanne, Switzerland, 30 September–4 October 2002; Volume 1, pp. 540–545. [Google Scholar] [CrossRef]
- Yamauchi, B. A frontier-based approach for autonomous exploration. In Proceedings of the 1997 IEEE International Symposium on Computational Intelligence in Robotics and Automation CIRA’97. ‘Towards New Computational Principles for Robotics and Automation’, Monterey, CA, USA, 10–11 July 1997; pp. 146–151. [Google Scholar] [CrossRef]
- Mirowski, P.; Pascanu, R.; Viola, F.; Soyer, H.; Ballard, A.J.; Banino, A.; Denil, M.; Goroshin, R.; Sifre, L.; Kavukcuoglu, K.; et al. Learning to Navigate in Complex Environments. arXiv 2017, arXiv:1611.03673. [Google Scholar] [CrossRef]
- Parisotto, E.; Salakhutdinov, R. Neural Map: Structured Memory for Deep Reinforcement Learning. arXiv 2017, arXiv:1702.08360. [Google Scholar] [CrossRef]
- Lin, J.; Yang, X.; Zheng, P.; Cheng, H. End-to-end decentralized multi-robot navigation in unknown complex environments via deep reinforcement learning. In Proceedings of the 2019 IEEE International Conference on Mechatronics and Automation (ICMA), Tianjin, China, 4–7 August 2019; IEEE: New York, NY, USA, 2019; Volume 42. [Google Scholar] [CrossRef]
- Levine, S.; Finn, C.; Darrell, T.; Abbeel, P. End-to-End Training of Deep Visuomotor Policies. J. Mach. Learn. Res. 2016, 17, 1334–1373. [Google Scholar]
- Wang, S.; Clark, R.; Wen, H.; Trigoni, N. Deepvo: Towards end-to-end visual odometry with deep recurrent convolutional neural networks. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; IEEE: New York, NY, USA, 2017; pp. 2043–2050. [Google Scholar]
- Clark, R.; Wang, S.; Wen, H.; Markham, A.; Trigoni, N. Vinet: Visual-inertial odometry as a sequence-to-sequence learning problem. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar] [CrossRef]
- Cai, Y.; He, X.; Guo, H.; Yau, W.Y.; Lv, C. Transformer-based Multi-Agent Reinforcement Learning for Generalization of Heterogeneous Multi-Robot Cooperation. In Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, United Arab Emirates, 14–18 October 2024; pp. 13695–13702. [Google Scholar] [CrossRef]
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar] [CrossRef]
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Dy, J., Krause, A., Eds.; Volume 80, pp. 1861–1870. [Google Scholar]
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. In Proceedings of the 4th International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
- Chaplot, D.S.; Gandhi, D.; Gupta, S.; Gupta, A.; Salakhutdinov, R. Learning to Explore using Active Neural SLAM. arXiv 2020, arXiv:2004.05155. [Google Scholar] [CrossRef]
- Gronauer, S.; Diepold, K. Multi-agent deep reinforcement learning: A survey. Artif. Intell. Rev. 2022, 55, 895–943. [Google Scholar] [CrossRef]
- Lowe, R.; Wu, Y.; Tamar, A.; Harb, J.; Pieter Abbeel, O.; Mordatch, I. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. In Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
- Bettini, M.; Prorok, A.; Moens, V. BenchMARL: Benchmarking Multi-Agent Reinforcement Learning. J. Mach. Learn. Res. 2024, 25, 1–10. [Google Scholar]
- Foerster, J.; Farquhar, G.; Afouras, T.; Nardelli, N.; Whiteson, S. Counterfactual Multi-Agent Policy Gradients. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar] [CrossRef]
- Devin, C.; Gupta, A.; Darrell, T.; Abbeel, P.; Levine, S. Learning modular neural network policies for multi-task and multi-robot transfer. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 2169–2176. [Google Scholar] [CrossRef]
- Zhao, W.; Queralta, J.P.; Westerlund, T. Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: A Survey. In Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Virtual Conference, 1–4 December 2020; pp. 737–744. [Google Scholar] [CrossRef]
- Bernstein, D.S.; Givan, R.; Immerman, N.; Zilberstein, S. The complexity of decentralized control of Markov decision processes. Math. Oper. Res. 2002, 27, 819–840. [Google Scholar] [CrossRef]
- Sutton, R.S.; Precup, D.; Singh, S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artif. Intell. 1999, 112, 181–211. [Google Scholar] [CrossRef]
- Kulkarni, T.D.; Narasimhan, K.; Saeedi, A.; Tenenbaum, J. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation. In Advances in Neural Information Processing Systems; Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2016; Volume 29. [Google Scholar]
- Hafner, D.; Lee, K.H.; Fischer, I.; Abbeel, P. Deep Hierarchical Planning from Pixels. In Advances in Neural Information Processing Systems; Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2022; Volume 35, pp. 26091–26104. [Google Scholar]
- Li, J.; Tang, C.; Tomizuka, M.; Zhan, W. Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning. IEEE Robot. Autom. Lett. 2022, 7, 10216–10223. [Google Scholar] [CrossRef]
- Gürtler, N.; Büchler, D.; Martius, G. Hierarchical Reinforcement Learning with Timed Subgoals. In Advances in Neural Information Processing Systems; Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2021; Volume 34, pp. 21732–21743. [Google Scholar]
- Wang, T.; Dong, H.; Lesser, V.; Zhang, C. ROMA: Multi-Agent Reinforcement Learning with Emergent Roles. arXiv 2020, arXiv:2003.08039. [Google Scholar] [CrossRef]
- Wang, T.; Gupta, T.; Mahajan, A.; Peng, B.; Whiteson, S.; Zhang, C. RODE: Learning Roles to Decompose Multi-Agent Tasks. arXiv 2020, arXiv:2010.01523. [Google Scholar] [CrossRef]
- Setyawan, G.E.; Hartono, P.; Sawada, H. Cooperative multi-robot hierarchical reinforcement learning. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 745–751. [Google Scholar] [CrossRef]
- Singh Chaplot, D.; Salakhutdinov, R.; Gupta, A.; Gupta, S. Neural topological SLAM for visual navigation. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual Conference, 14–19 June 2020; IEEE: New York, NY, USA, 2020; Volume 44. [Google Scholar] [CrossRef]
- Savinov, N.; Dosovitskiy, A.; Koltun, V. Semi-parametric Topological Memory for Navigation. arXiv 2018, arXiv:1803.00653. [Google Scholar] [CrossRef]
- Battaglia, P.W.; Hamrick, J.B.; Bapst, V.; Sanchez-Gonzalez, A.; Zambaldi, V.; Malinowski, M.; Tacchetti, A.; Raposo, D.; Santoro, A.; Faulkner, R.; et al. Relational inductive biases, deep learning, and graph networks. arXiv 2018, arXiv:1806.01261. [Google Scholar] [CrossRef]
- Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural Message Passing for Quantum Chemistry. Int. Conf. Mach. Learn. 2017, 70, 1263–1272. [Google Scholar]
- Kipf, T. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar] [CrossRef]
- Luo, T.; Subagdja, B.; Wang, D.; Tan, A.H. Multi-agent collaborative exploration through graph-based deep reinforcement learning. In Proceedings of the 2019 IEEE International Conference on Agents (ICA), Beijing, China, 6–9 July 2019; IEEE: New York, NY, USA, 2019; Volume 49. [Google Scholar] [CrossRef]
- Tzes, M.; Bousias, N.; Chatzipantazis, E.; Pappas, G.J. Graph Neural Networks for Multi-Robot Active Information Acquisition. arXiv 2022, arXiv:2209.12091. [Google Scholar] [CrossRef]
- Yang, X.; Yang, Y.; Yu, C.; Chen, J.; Yu, J.; Ren, H.; Yang, H.; Wang, Y. Active Neural Topological Mapping for Multi-Agent Exploration. IEEE Robot. Autom. Lett. 2024, 9, 303–310. [Google Scholar] [CrossRef]
- Cai, G.; Guo, L.; Chang, X. An Enhanced Hierarchical Planning Framework for Multi-Robot Autonomous Exploration. arXiv 2024, arXiv:2410.19373. [Google Scholar] [CrossRef]
- Gervet, T.; Chintala, S.; Batra, D.; Malik, J.; Chaplot, D.S. Navigating to objects in the real world. Sci. Robot. 2023, 8, eadf6991. [Google Scholar] [CrossRef] [PubMed]
- Puig, X.; Undersander, E.; Szot, A.; Cote, M.D.; Yang, T.Y.; Partsey, R.; Desai, R.; Clegg, A.W.; Hlavac, M.; Min, S.Y.; et al. Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots. arXiv 2023, arXiv:2310.13724. [Google Scholar] [CrossRef]
- NVIDIA Isaac Sim. Available online: https://developer.nvidia.com/isaac-sim (accessed on 15 December 2025).
- Tobin, J.; Fong, R.; Ray, A.; Schneider, J.; Zaremba, W.; Abbeel, P. Domain randomization for transferring deep neural networks from simulation to the real world. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; IEEE: New York, NY, USA, 2017; Volume 56. [Google Scholar]
- He, T.; Wang, Z.; Xue, H.; Ben, Q.; Luo, Z.; Xiao, W.; Yuan, Y.; Da, X.; Castañeda, F.; Sastry, S.; et al. VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation. arXiv 2025, arXiv:2511.15200. [Google Scholar] [CrossRef]
- Rao, K.; Harris, C.; Irpan, A.; Levine, S.; Ibarz, J.; Khansari, M. RL-CycleGAN: Reinforcement learning aware simulation-to-real. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual Conference, 14–19 June 2020; IEEE: New York, NY, USA, 2020; Volume 58. [Google Scholar]
- Béres, A.; Gyires-Tóth, B. Enhancing visual domain randomization with real images for Sim-to-real transfer. Infocommun. J. 2023, 15, 15–25. [Google Scholar] [CrossRef]
- Cheng, S.; Ma, L.; Chen, Z.; Mandlekar, A.; Garrett, C.; Xu, D. Generalizable Domain Adaptation for Sim-and-Real Policy Co-Training. arXiv 2025, arXiv:2509.18631. [Google Scholar] [CrossRef]
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. NeRF: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2022, 65, 99–106. [Google Scholar] [CrossRef]
- Kerbl, B.; Kopanas, G.; Leimkühler, T.; Drettakis, G. 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 2023, 42, 139:1–139:14. [Google Scholar] [CrossRef]
- Xie, Z.; Liu, Z.; Peng, Z.; Wu, W.; Zhou, B. Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation. In Proceedings of the Computer Vision and Pattern Recognition Conference, Vancouver, BC, Canada, 16–21 June 2025; pp. 1581–1591.
- Chhablani, G.; Ye, X.; Irshad, M.Z.; Kira, Z. EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 4–10 October 2025; pp. 25431–25441.
- Kim, W.; Cho, M.; Sung, Y. Message-dropout: An efficient training method for multi-agent deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 6079–6086.
- Pongsirijinda, K.; Cao, Z.; Pik Lik Lau, B.; Liu, R.; Yuen, C.; Tan, U.X. MEF-explore: Communication-constrained multi-robot entropy-field-based exploration. IEEE Trans. Autom. Sci. Eng. 2025, 22, 16062–16078.
- Xu, H.; Liu, P.; Chen, X.; Shen, S. D2SLAM: Decentralized and distributed collaborative visual-inertial SLAM system for aerial swarm. IEEE Trans. Robot. 2024, 40, 3445–3464.
- Tian, Y.; Chang, Y.; Herrera Arias, F.; Nieto-Granda, C.; How, J.; Carlone, L. Kimera-multi: Robust, distributed, dense metric-semantic SLAM for multi-robot systems. IEEE Trans. Robot. 2022, 38, 2022–2038.
- Mangelson, J.G.; Dominic, D.; Eustice, R.M.; Vasudevan, R. Pairwise consistent measurement set maximization for robust multi-robot map merging. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; IEEE: New York, NY, USA, 2018; Volume 70.
- Yang, H.; Antonante, P.; Tzoumas, V.; Carlone, L. Graduated non-convexity for robust spatial perception: From non-minimal solvers to global outlier rejection. IEEE Robot. Autom. Lett. 2020, 5, 1127–1134.
- Zhang, L.; Deng, J. Deep Compressed Communication and Application in Multi-Robot 2D-Lidar SLAM: An Intelligent Huffman Algorithm. Sensors 2024, 24, 3154.
- Han, J.; Ma, C.; Zou, D.; Jiao, S.; Chen, C.; Wang, J. Distributed Multi-Robot SLAM Algorithm with Lightweight Communication and Optimization. Electronics 2024, 13, 4129.
- Li, P.; An, Z.; Abrar, S.; Zhou, L. Large Language Models for Multi-Robot Systems: A Survey. arXiv 2025, arXiv:2502.03814.
- Hou, J.; Xue, X.; Zeng, T. Hi-Dyna Graph: Hierarchical Dynamic Scene Graph for Robotic Autonomy in Human-Centric Environments. arXiv 2025, arXiv:2506.00083.
- Liu, S.; Yuan, H.; Hu, M.; Li, Y.; Chen, Y.; Liu, S.; Lu, Z.; Jia, J. RL-GPT: Integrating Reinforcement Learning and Code-as-policy. In Advances in Neural Information Processing Systems; Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., Zhang, C., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2024; Volume 37, pp. 28430–28459.
- Abou-Chakra, J.; Sun, L.; Rana, K.; May, B.; Schmeckpeper, K.; Suenderhauf, N.; Minniti, M.V.; Herlant, L. Real-is-Sim: Bridging the Sim-to-Real Gap with a Dynamic Digital Twin. arXiv 2025, arXiv:2504.03597.


| Dimension | Single-Robot SLAM | Multi-Robot Collaborative SLAM | Evidence |
|---|---|---|---|
| Exploration Efficiency/Coverage | Serial exploration; coverage time scales with environment size; limited sensing range/FoV. | Parallel exploration reduces time to completion and improves coverage; requires task allocation and mitigation of overlap/conflicts. | [7,17,18] |
| Localization Accuracy | Relies on intra-robot loop closure; prone to drift over long trajectories; lacks inter-robot constraint correction. | Inter-robot loop closure and joint optimization significantly reduce drift and improve global consistency; relies on robust inter-robot data association. | [14,19,20,21] |
| Robustness/Fault Tolerance | Single point of failure; limited recovery methods during sensor degradation, occlusion, or dynamic interference. | Redundant observations and multi-path loop closures enhance robustness; network splitting or outliers may lead to inconsistency or erroneous fusion. | [14,15,16,19] |
| Computation/Scalability | Centralized front/back-end; large-scale or high-frequency data constrained by onboard compute and memory. | Computation can be distributed or server-aided; however, global optimization, verification, and map fusion can become bottlenecks. | [16,17,20,21] |
| Communication Load | No inter-robot communication (local bus/storage only). | Requires exchange of keyframes/submaps/constraints; significantly impacted by bandwidth, latency, and disconnection. | [15,18,19,20] |
| System Complexity | Relatively simple architecture (no inter-robot data association, alignment, or consistency maintenance). | Requires time synchronization, coordinate initialization/alignment, inter-robot outlier rejection, and consistency management. | [14,16,19] |

| Feature | Single/Passive SLAM | Single/Active SLAM | Multi/Passive SLAM | Multi/Active SLAM |
|---|---|---|---|---|
| Core Objective | Estimation only: high-precision localization and mapping | Joint decision and estimation: select actions to obtain state and map estimates faster/more accurately | Fuse multi-robot observations to estimate a global map | Collab. decision and dist. estimation: jointly select team actions to maximize info gain and map consistency |
| Math. Essence | State estimation/optimization (Filtering, BA/Factor Graph) | POMDP/info-theoretic planning + SLAM backend | Distributed/centralized estimation (Factor Graph, Map Fusion) | Dec-POMDP/multi-agent collaboration + distributed estimation |
| RL Relevance | Low–Med: Partially used for learned front-end/loop closure | Med–High: Used for view selection, policy learning in exploration tasks | Low–Med: Used for communication scheduling/task allocation | High: MARL for collaborative policy; strongly coupled with SLAM backend |
| Key Challenges | Degraded scenes, dynamic environments, long-term drift | High cost of non-myopic planning; balancing efficiency/safety/real-time | Inter-robot loop closure, map fusion, bandwidth limits, scalability | Collab. policy design, sparse comms/sensing, robustness, decentralization, false loop closures |
| Surveys | [23,24,25] | [9,26,27] | [1,28] | [9,26,29,30] |
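The "POMDP/info-theoretic planning" essence of the active variants can be made concrete with a toy calculation. The sketch below scores a candidate sensing action by the expected entropy reduction of the occupancy-grid cells it would observe; the function names and the simplistic assumption that an observation drives covered cells to a fixed posterior probability are illustrative, not taken from any surveyed method.

```python
import numpy as np

def grid_entropy(p):
    """Shannon entropy (bits) of an array of Bernoulli occupancy probabilities."""
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return float(np.sum(-p * np.log2(p) - (1 - p) * np.log2(1 - p)))

def expected_info_gain(p, observed_mask, p_post=0.05):
    """Gain of an action whose sensor footprint is `observed_mask`:
    entropy of covered cells before vs. after an (assumed near-certain)
    observation that drives their probability to p_post."""
    before = grid_entropy(p[observed_mask])
    after = grid_entropy(np.full(int(observed_mask.sum()), p_post))
    return before - after

# Toy 2D grid: unknown cells at 0.5, already-mapped free cells at 0.1.
grid = np.full((10, 10), 0.5)
grid[:5, :] = 0.1
footprint_unknown = np.zeros_like(grid, dtype=bool); footprint_unknown[7, :] = True
footprint_known = np.zeros_like(grid, dtype=bool); footprint_known[2, :] = True

# Sensing unexplored territory yields strictly more expected information.
assert expected_info_gain(grid, footprint_unknown) > expected_info_gain(grid, footprint_known)
```

An info-theoretic planner ranks candidate viewpoints by such a score (often penalized by travel cost); the non-myopic multi-step version of this ranking is what makes the planning problem expensive.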

| Component | Theory | Sim | Real (1 Robot) | Real (N Robots) | Bench | Benchmarks/Evidence |
|---|---|---|---|---|---|---|
| Estimation Layer (Section 2.2.2 and Section 2.2.3) | ||||||
| Factor Graph Backend | ✔ | ✔ | ✔ | ✔ | ✔ | KITTI, EuRoC, TUM RGB-D [8] |
| Distributed PGO | ✔ | ✔ | ✔ | ✔ | ∘ | Custom pose graphs [34,35] |
| Loop Closure | ✔ | ✔ | ✔ | ∘ | ∘ | Place recognition [15] |
| Decision Layer (Section 2.2.4) | ||||||
| Frontier Planning | ✔ | ✔ | ✔ | ✔ | ✔ | Standard grid worlds |
| Info-gain Planning | ✔ | ✔ | ✔ | ∘ | ∘ | Active SLAM metrics [9] |
| MARL Exploration | ✔ | ✔ | ∘ | – | ∘ | SMAC, Habitat (non-SLAM) [36] |
| E2E Visual RL | ∘ | ✔ | – | – | ∘ | Habitat, Gibson (nav only) |
| Full Pipeline Integration | ||||||
| Estimation–Decision | ∘ | ✔ | ✔ | ∘ | – | No unified benchmark |
| Complete AC-SLAM | ∘ | ∘ | ∘ | – | – | No standard benchmark |
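Frontier planning, the most fully validated decision-layer component in the table above, reduces to a simple geometric test: a frontier is a free cell adjacent to unknown space. A minimal sketch (hypothetical names, 4-connectivity assumed):

```python
import numpy as np

FREE, OCC, UNKNOWN = 0, 1, -1

def find_frontiers(grid):
    """Return (row, col) of free cells with at least one 4-connected
    unknown neighbor -- the classic frontier definition."""
    frontiers = []
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr, nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers

grid = np.full((5, 5), UNKNOWN)
grid[:3, :] = FREE   # explored free space
grid[2, 2] = OCC     # an obstacle on the boundary is not a frontier
print(find_frontiers(grid))
```

Multi-robot frontier planners layer task allocation on top of this primitive (assigning frontier clusters to robots), which is where the coordination challenges in the decision-layer rows arise.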

| Paradigm | Joint State/Action | Training | Inference | Comm. | Scale ‡ |
|---|---|---|---|---|---|
| Centralized † | Joint (exponential in team size) | Centralized | Centralized | N/A | Low |
| Independent [63] | Local only | Decentralized | Decentralized | None | High |
| CTDE [48] | Factorized (value mixing) | Centralized | Decentralized | Train only | Moderate |
| Hierarchical [70,71] | Abstracted | Mixed | Decentralized | Varies | High |
| Comm.-based [63] | Local + learned messages | Centralized | Decentralized | Messages | Moderate |
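The CTDE row is the key structural idea: a critic with a global view exists only at training time, while each deployed actor consumes only its local observation. A minimal sketch with toy linear networks (all names, dimensions, and random weights are illustrative placeholders, not any surveyed architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, OBS_DIM, ACT_DIM = 3, 4, 2

# Decentralized actors: one small linear policy per agent, local obs only.
actor_W = [rng.normal(size=(OBS_DIM, ACT_DIM)) for _ in range(N_AGENTS)]

def act(local_obs):
    """Inference is decentralized: each agent maps its own observation
    to an action without seeing teammates' observations."""
    return [obs @ W for obs, W in zip(local_obs, actor_W)]

# Centralized critic: scores the JOINT observation-action vector.
critic_w = rng.normal(size=(N_AGENTS * (OBS_DIM + ACT_DIM),))

def centralized_value(local_obs, actions):
    """Training-time critic input concatenates all agents' obs and actions;
    this global view is available only during training (CTDE)."""
    joint = np.concatenate([np.concatenate([o, a]) for o, a in zip(local_obs, actions)])
    return float(joint @ critic_w)

obs = [rng.normal(size=OBS_DIM) for _ in range(N_AGENTS)]
acts = act(obs)
```

Because the critic is discarded at deployment, the communication column reads "Train only": no inter-robot messages are needed at inference, at the cost of potential miscoordination when local observations are ambiguous.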

| Method | Year | Sensing | Spatial Abstraction | Hierarchical Approach | Learning Method | Agent Count | Environment |
|---|---|---|---|---|---|---|---|
| [72] | 2022 | Visual and Proprioceptive | Discrete Latent Goal Codes | High-level: Selects latent goals Low-level: Generates primitive actions | Dreamer (Model-Based RL) | 1 | Standard RL Suites (DMLab, DM Control, Atari) |
| [74] | 2021 | State Space (Controllable and Environmental) | Subgoal Space Restricted to Directly Controllable State Part | High-level: Assigns long-term subgoals Low-level: Generates primitive actions | SAC (with Hindsight Action Relabeling and HER) | 1 | 4 Standard Benchmarks (e.g., AntFourRooms) and 3 Dynamic Tasks (Platforms, Drawbridge, Tennis2D) |
| [73] | 2022 | Visual | Implicit Latent Space | High-level: Selects sub-goal images Low-level: Generates primitive actions | Q-Learning (Conservative) | 1 | CARLA (Town03) and Antmaze |
| [75] | 2020 | Proprioceptive and Relative Features of Nearest Units | Stochastic Role Embedding | High-level: Samples latent role Low-level: Generates primitive actions | QMIX | 5–27 | StarCraft II (SMAC) |
| [76] | 2020 | Relative Features of Nearest Units | Action Space Decomposition | High-level: Selects action subsets Low-level: Generates primitive actions | QMIX (Mixing) | 2–27 | StarCraft II (SMAC) |
| [77] | 2022 | State Vector (Relative Polar Coords and Physics) | Abstract Environmental Layer (Ignored Obstacles and Enhanced Dynamics) | High-level: Generates subgoals (abstract states) Low-level: Generates primitive actions | MH-DDPG (Hierarchical MADDPG) | 3 | Modified MPE (Simple Spread with Obstacles) |
| [83] | 2019 | Lidar and Graph Node States | Topological Graph | High-level: Allocates target region Low-level: Deterministic path planning | MAG-DQN (Spectral GCN + Centralized DQN) | 1–10 | ROS Room Dataset (2D Indoor Grid Maps) |
| [84] | 2022 | Positional, State Estimates, and Grid Map | Occupancy Grid and Uncertainty Heatmap | Global: Graph aggregation (info sharing via DKF) Local: node-update policy (CNN + Action MLP) | Imitation Learning | 10–80 | 2D Cluttered Continuous Area |
| [85] | 2024 | Visual and Pose | Topological Graph | High-level: Selects global goal (via GNN) Low-level: Path planning (FMM) and primitive actions | MAPPO | 3, 4 | Habitat (Gibson and HM3D) |
| [86] | 2024 | LiDAR and Pose | Occupancy Grid and Sparse Frontier Graph | High-level: Allocates cluster centers (mGNN) Low-level: Local routing and ROS Nav | PPO (with mGNN) | 3 | iGibson (Gibson Dataset) |
| [12] | 2025 | Visual (Constrained) and Pose | Informative Graph | Viewpoint and heading selection (via graph attention and action pruning) | SAC (Attentive Privileged Critic) | 2, 4, 8 | Randomly Generated Large-Scale Indoor |
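The recurring "High-level: selects goals / Low-level: generates primitive actions" pattern in the table can be sketched in a few lines. Here the high level is a simple gain-per-distance heuristic standing in for a learned goal policy, and the low level takes one primitive step toward the subgoal; both functions and the scoring rule are hypothetical illustrations, not any listed method.

```python
import math

def high_level_select(robot_xy, candidate_goals, info_gain):
    """High level: pick the candidate goal with the best gain-per-distance
    trade-off (a heuristic stand-in for a learned goal-selection policy)."""
    def score(g):
        return info_gain[g] / (math.dist(robot_xy, g) + 1e-6)
    return max(candidate_goals, key=score)

def low_level_step(robot_xy, goal_xy, step=1.0):
    """Low level: one primitive action toward the current subgoal."""
    dx, dy = goal_xy[0] - robot_xy[0], goal_xy[1] - robot_xy[1]
    d = math.hypot(dx, dy)
    if d <= step:
        return goal_xy
    return (robot_xy[0] + step * dx / d, robot_xy[1] + step * dy / d)

goals = [(5.0, 0.0), (0.0, 9.0)]
gains = {(5.0, 0.0): 4.0, (0.0, 9.0): 5.0}
goal = high_level_select((0.0, 0.0), goals, gains)
pos = low_level_step((0.0, 0.0), goal)
```

The temporal abstraction is what matters: the high level replans at a low rate over a compact goal space (latent codes, graph nodes, frontier clusters), while the low level runs at control rate, which is why these methods scale to long horizons where flat policies struggle.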

| Category | Work | Targeted Gap | Advantages | Limitations |
|---|---|---|---|---|
| Domain Rand. | [91] | Perception, Dynamics | Simple implementation; enables cold start without real data; robust to dynamic changes. | Policy may be overly conservative; difficult to cover long-tail distributions; high compute cost. |
| Domain Adapt. | [92] | Perception | Optimal performance for specific target environments; effectively utilizes real-world priors. | Requires target domain data; GAN training instability; may introduce geometric distortions affecting depth. |
| Real-to-Sim | [97,98] | Perception | High visual fidelity; highest sim-vs.-real correlation; supports complex physical interactions. | High reconstruction cost; requires re-scanning if environmental physical properties change. |
| Comm. Robustness | [100,101] | Communication | Explicitly handles multi-robot disconnection; enhances practical robustness of distributed systems. | Increases state space complexity; higher demands on real-time algorithmic performance. |
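Of these categories, domain randomization is the simplest to implement: per training episode, perception, dynamics, and communication parameters are resampled so the learned policy cannot overfit any one simulator configuration. A minimal sketch; the parameter names and ranges are illustrative placeholders, not values from any surveyed work.

```python
import random

def sample_episode_params(rng):
    """Draw one randomized simulator configuration per training episode."""
    return {
        "lidar_noise_std_m": rng.uniform(0.005, 0.05),  # range-noise std dev
        "odom_drift_per_m": rng.uniform(0.0, 0.02),     # odometry drift rate
        "comm_drop_prob": rng.uniform(0.0, 0.3),        # message loss rate
        "wheel_friction": rng.uniform(0.6, 1.2),        # dynamics scale factor
    }

rng = random.Random(42)
params = sample_episode_params(rng)
```

A policy trained across many such draws tends to transfer without target-domain data, but, as the limitations column notes, widening the ranges to cover long-tail conditions makes the policy conservative and inflates training cost.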
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Lv, B.; Duan, S. Learning-Based Multi-Robot Active SLAM: A Conceptual Framework and Survey. Appl. Sci. 2026, 16, 1412. https://doi.org/10.3390/app16031412


