A Frontier Review of Semantic SLAM Technologies Applied to the Open World
Abstract
1. Introduction
- Section 2 organizes and introduces the development and current research status of open-world semantic SLAM technology, summarizing the progress and significance of semantic SLAM in open-world settings based on existing studies;
- Section 3 focuses on the three key technologies of open-world semantic SLAM: zero-shot open-vocabulary understanding, dynamic semantic expansion, and multimodal semantic fusion. It systematically reviews the cutting-edge advances and core challenges of each, aiming to provide theoretical guidance and technical roadmaps for future research and to support the transition of semantic SLAM from controlled laboratory settings to real-world open environments;
- Section 4 compiles commonly used datasets and evaluation metrics for open-world semantic SLAM, and evaluates and compares the performance of current algorithms in related studies;
- Section 5 presents future research directions and recommendations.
2. The Evolution from Visual SLAM to Open-World Semantic SLAM
3. Theoretical Foundations and Key Enabling Technologies
3.1. Zero-Shot Open-Vocabulary Understanding
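Zero-shot open-vocabulary understanding typically rests on a shared vision-language embedding space (e.g., CLIP): a map element is labeled with whichever free-form text prompt embeds closest to its visual feature. A minimal sketch, with hypothetical precomputed embeddings standing in for a real encoder (the function name and shapes are ours, not from any cited system):

```python
import numpy as np

def open_vocab_classify(feat: np.ndarray, text_embs: np.ndarray,
                        vocab: list[str]) -> str:
    """Assign the label whose text embedding is most cosine-similar to feat.

    feat: (D,) visual feature of a map element (e.g., a 3D point or Gaussian).
    text_embs: (K, D) embeddings of the K vocabulary prompts.
    Both are assumed to live in a shared vision-language space such as CLIP's.
    """
    # Normalize so that the dot product equals cosine similarity.
    feat = feat / np.linalg.norm(feat)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return vocab[int(np.argmax(text_embs @ feat))]
```

Because the vocabulary is just a list of prompts embedded at query time, new categories can be queried without retraining, which is the property that distinguishes open-vocabulary from closed-set semantic SLAM.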
3.2. Dynamic Semantic Expansion
3.3. Multimodal Semantic Fusion
4. Algorithm Evaluation and Performance Comparison
4.1. Open-World Dataset
4.2. Performance Metrics
4.2.1. Absolute Trajectory Error
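ATE measures global trajectory consistency: after rigidly aligning the estimated trajectory to the ground truth (Umeyama alignment), it is the RMSE of the per-pose translational residuals. An illustrative NumPy sketch (function name and shapes are ours, not from any cited system):

```python
import numpy as np

def absolute_trajectory_error(gt: np.ndarray, est: np.ndarray) -> float:
    """RMSE of translational error after rigid SE(3) alignment.

    gt, est: (N, 3) arrays of corresponding trajectory positions.
    """
    # Center both trajectories.
    mu_gt, mu_est = gt.mean(axis=0), est.mean(axis=0)
    gt_c, est_c = gt - mu_gt, est - mu_est
    # Umeyama-style alignment: rotation R mapping est onto gt via SVD.
    H = est_c.T @ gt_c
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T  # D guards against reflections
    t = mu_gt - R @ mu_est
    aligned = (R @ est.T).T + t
    # RMSE over the per-pose translational residuals.
    return float(np.sqrt(np.mean(np.sum((gt - aligned) ** 2, axis=1))))
```

Because of the alignment step, ATE is invariant to the global reference frame; it captures accumulated drift rather than the choice of origin.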
4.2.2. Mean Intersection over Union
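mIoU averages, over semantic classes, the ratio of intersection to union between predicted and ground-truth label masks; classes absent from both prediction and ground truth are conventionally skipped. A minimal sketch (names are ours):

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean Intersection over Union across classes present in the data.

    pred, gt: integer label arrays of identical shape.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both pred and gt
            ious.append(inter / union)
    return float(np.mean(ious))
```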
4.2.3. Frames per Second
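FPS is usually reported as the number of processed frames divided by wall-clock time, with warm-up frames (model loading, GPU initialization) excluded. A hedged sketch; the warm-up convention varies across papers, and the function name is ours:

```python
import time

def measure_fps(process_frame, frames, warmup: int = 5) -> float:
    """Average frames per second over a sequence, excluding warm-up frames."""
    for f in frames[:warmup]:
        process_frame(f)  # warm-up (caches, JIT, GPU init) excluded from timing
    timed = frames[warmup:]
    start = time.perf_counter()
    for f in timed:
        process_frame(f)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed
```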
4.3. Performance Comparison
4.3.1. Accuracy Improvement
- Extended 3D Gaussian representation: The conventional Gaussian Splatting framework is extended to a 13-dimensional parameter space, enabling the joint optimization of geometric, semantic, and instance-level attributes. This enriched representation supports multimodal rendering, including depth, color, semantics, and instance labels;
- Spatio-temporal label enhancement (STL) module: To address the noise inherent in pseudo-labels generated by pretrained 2D vision models, the STL module projects these labels into a 3D voxel space. By leveraging multi-view consistency, it performs joint optimization using a voxel-wise label aggregation formulation.
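The voxel-wise aggregation idea behind the STL module can be illustrated with a simplified majority-vote sketch; the actual formulation in the cited work is a joint optimization, and all names below are ours:

```python
from collections import defaultdict

import numpy as np

def aggregate_voxel_labels(points, labels, voxel_size=0.05):
    """Majority-vote label per voxel from multi-view 2D pseudo-labels.

    points: (N, 3) positions of back-projected pixels, pooled over all views.
    labels: (N,) per-point pseudo-labels from a pretrained 2D model.
    Returns {voxel_index: (label, confidence)} where confidence is the
    fraction of votes agreeing with the winning label.
    """
    votes = defaultdict(lambda: defaultdict(int))
    for p, l in zip(points, labels):
        key = tuple(np.floor(p / voxel_size).astype(int))
        votes[key][int(l)] += 1
    out = {}
    for key, counts in votes.items():
        label, n = max(counts.items(), key=lambda kv: kv[1])
        out[key] = (label, n / sum(counts.values()))
    return out
```

The intuition is that a label that is wrong in one view is unlikely to be wrong consistently across many views observing the same voxel, so multi-view agreement suppresses per-frame pseudo-label noise.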
4.3.2. Semantic Generalization Ability
4.3.3. Real-Time Performance Optimization
5. Summary and Prospects
- Designing robust semantic representation mechanisms tailored for open-vocabulary scenarios, incorporating multi-view, multi-frame, and contextual modeling strategies to mitigate inter-class confusion and semantic ambiguity;
- Developing efficient and lightweight vision-language models and edge deployment frameworks that balance recognition accuracy with computational efficiency;
- Constructing semantic expansion frameworks that support continual and incremental learning, incorporating memory-based architectures and regularization techniques to prevent catastrophic forgetting and enhance adaptability to novel categories;
- Improving multimodal fusion by advancing temporal-spatial alignment and cross-modal feature matching methods, alongside uncertainty modeling and robust state estimation, to enhance system stability and perception performance in dynamic open-world settings.
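For the third direction above, a canonical regularization technique against catastrophic forgetting is elastic weight consolidation (EWC); it is named here as one concrete instance, not as the method of any surveyed system. A minimal sketch of its penalty term:

```python
import numpy as np

def ewc_penalty(theta: np.ndarray, theta_old: np.ndarray,
                fisher: np.ndarray, lam: float = 1.0) -> float:
    """EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2.

    fisher holds diagonal Fisher-information estimates marking which
    parameters mattered for previously learned categories; large F_i
    values penalize drift on those parameters, mitigating forgetting
    while leaving unimportant parameters free to adapt to new classes.
    """
    return 0.5 * lam * float(np.sum(fisher * (theta - theta_old) ** 2))
```

In a continual semantic-expansion setting, this term would be added to the loss for each new category batch, with `theta_old` and `fisher` snapshotted after the previous learning stage.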
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Rosen, D.M.; Doherty, K.J.; Terán Espinoza, A.; Leonard, J.J. Advances in inference and representation for simultaneous localization and mapping. Annu. Rev. Control. Robot. Auton. Syst. 2021, 4, 215–242. [Google Scholar] [CrossRef]
- Galagain, C.; Poreba, M.; Goulette, F. Is Semantic SLAM Ready for Embedded Systems? A Comparative Survey. arXiv 2025, arXiv:2505.12384. [Google Scholar]
- Durrant-Whyte, H.; Bailey, T. Simultaneous localization and mapping: Part I. IEEE Robot. Autom. Mag. 2006, 13, 99–110. [Google Scholar] [CrossRef]
- Cadena, C.; Carlone, L.; Carrillo, H.; Latif, Y.; Scaramuzza, D.; Neira, J.; Leonard, J.J. Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age. IEEE Trans. Robot. 2017, 32, 1309–1332. [Google Scholar] [CrossRef]
- Bowman, S.L.; Atanasov, N.; Daniilidis, K.; Pappas, G.J. Probabilistic data association for semantic SLAM. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 1722–1729. [Google Scholar]
- Kazerouni, I.A.; Fitzgerald, L.; Dooly, G.; Toal, D. A survey of state-of-the-art on visual SLAM. Expert Syst. Appl. 2022, 205, 117734. [Google Scholar] [CrossRef]
- Chen, W.; Shang, G.; Ji, A.; Zhou, C.; Wang, X.; Xu, C.; Hu, K. An Overview on Visual SLAM: From Tradition to Semantic. Remote Sens. 2022, 14, 3010. [Google Scholar] [CrossRef]
- Tateno, K.; Tombari, F.; Laina, I.; Navab, N. CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6243–6252. [Google Scholar]
- Jiang, L.; Shi, S.; Schiele, B. Open-vocabulary 3D semantic segmentation with foundation models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 21284–21294. [Google Scholar]
- Singh, K.; Magoun, T.; Leonard, J.J. LOSS-SLAM: Lightweight Open-Set Semantic Simultaneous Localization and Mapping. arXiv 2024, arXiv:2404.04377. [Google Scholar]
- Wu, Y.; Meng, J.; Li, H.; Wu, C.; Shi, Y.; Cheng, X.; Zhang, J. Opengaussian: Towards point-level 3D Gaussian-based open vocabulary understanding. arXiv 2024, arXiv:2406.02058. [Google Scholar]
- Hong, J.; Choi, R.; Leonard, J.J. Semantic enhancement for object SLAM with heterogeneous multimodal large language model agents. arXiv 2025, arXiv:2411.06752. [Google Scholar]
- Li, B.; Cai, Z.; Li, Y.F.; Reid, I.; Rezatofighi, H. Hi-SLAM: Scaling-up semantics in SLAM with a hierarchically categorical Gaussian splatting. arXiv 2024, arXiv:2409.12518. [Google Scholar]
- Zhou, Z.; Lei, Y.; Zhang, B.; Liu, L.; Liu, Y. ZEGCLIP: Towards adapting CLIP for zero-shot semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 11175–11185. [Google Scholar]
- Zhang, J.; Dong, R.; Ma, K. CLIP-FO3D: Learning free open-world 3D scene representations from 2D dense CLIP. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 2048–2059. [Google Scholar]
- Kassab, C.; Mattamala, M.; Zhang, L.; Fallon, M. Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 15988–15994. [Google Scholar]
- Jatavallabhula, K.M.; Kuwajerwala, A.; Gu, Q.; Omama, M.; Chen, T.; Maalouf, A.; Torralba, A. ConceptFusion: Open-set multimodal 3D mapping. arXiv 2023, arXiv:2302.07241. [Google Scholar]
- Zhu, F.; Zhao, Y.; Chen, Z.; Jiang, C.; Zhu, H.; Hu, X. DyGS-SLAM: Realistic Map Reconstruction in Dynamic Scenes Based on Double-Constrained Visual SLAM. Remote Sens. 2025, 17, 625. [Google Scholar] [CrossRef]
- Liu, P.; Guo, Z.; Warke, M.; Chintala, S.; Paxton, C.; Shafiullah, N.M.M.; Pinto, L. DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation. arXiv 2024, arXiv:2411.04999. [Google Scholar]
- Yan, Z.; Li, S.; Wang, Z.; Wu, L.; Wang, H.; Zhu, J.; Liu, J. Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation. arXiv 2024, arXiv:2410.11989. [Google Scholar] [CrossRef]
- Laina, S.B.; Boche, S.; Papatheodorou, S.; Schaefer, S.; Jung, J.; Leutenegger, S. FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment. arXiv 2025, arXiv:2504.08603. [Google Scholar]
- Steinke, T.; Büchner, M.; Vödisch, N.; Valada, A. Collaborative Dynamic 3D Scene Graphs for Open-Vocabulary Urban Scene Understanding. arXiv 2025, arXiv:2503.08474. [Google Scholar]
- Song, C.; Zeng, B.; Cheng, J.; Wu, F.; Hao, F. PSMD-SLAM: Panoptic Segmentation-Aided Multi-Sensor Fusion Simultaneous Localization and Mapping in Dynamic Scenes. Appl. Sci. 2024, 14, 3843. [Google Scholar] [CrossRef]
- Gonzalez, M.; Marchand, E.; Kacete, A.; Royan, J. TwistSLAM++: Fusing multiple modalities for accurate dynamic semantic SLAM. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 9126–9132. [Google Scholar]
- Davison, A.J.; Reid, I.D.; Molton, N.D.; Stasse, O. MonoSLAM: Real-Time Single Camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 1052–1067. [Google Scholar] [CrossRef] [PubMed]
- Klein, G.; Murray, D. Parallel tracking and mapping for small AR workspaces. In Proceedings of the 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, Nara, Japan, 13–16 November 2007; pp. 225–234. [Google Scholar]
- Engel, J.; Schöps, T.; Cremers, D. LSD-SLAM: Large-scale direct monocular SLAM. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 834–849. [Google Scholar]
- Forster, C.; Pizzoli, M.; Scaramuzza, D. SVO: Fast semi-direct monocular visual odometry. In Proceedings of the IEEE International Conference on Robotics & Automation, Vienna, Austria, 1–5 June 2014; pp. 15–22. [Google Scholar]
- Salas-Moreno, R.F.; Newcombe, R.A.; Strasdat, H.; Kelly, P.H.J.; Davison, A.J. SLAM++: Simultaneous Localisation and Mapping at the Level of Objects. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 1352–1359. [Google Scholar]
- Runz, M.; Buffier, M.; Agapito, L. MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects. In Proceedings of the IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Munich, Germany, 16–20 October 2018; pp. 10–20. [Google Scholar]
- McCormac, J.; Handa, A.; Davison, A.; Leutenegger, S. SemanticFusion: Dense 3D semantic mapping with convolutional neural networks. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 4628–4635. [Google Scholar]
- Yu, C.; Liu, Z.; Liu, X.-J.; Xie, F.; Yang, Y.; Wei, Q.; Fei, Q. DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1168–1174. [Google Scholar]
- Bescos, B.; Fácil, J.M.; Civera, J.; Neira, J. DynaSLAM: Tracking, Mapping, and Inpainting in Dynamic Scenes. IEEE Robot. Autom. Lett. 2018, 3, 4076–4083. [Google Scholar] [CrossRef]
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021. [Google Scholar]
- Peng, S.; Genova, K.; Jiang, C.; Tagliasacchi, A.; Pollefeys, M.; Funkhouser, T.A. OpenScene: 3D Scene Understanding with Open Vocabularies. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2023; pp. 815–824. [Google Scholar]
- Shafiullah, N.; Paxton, C.; Pinto, L.; Chintala, S.; Szlam, A. CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory. arXiv 2022, arXiv:2210.05663. [Google Scholar]
- Martins, T.B.; Oswald, M.R.; Civera, J. OVO-SLAM: Open-Vocabulary Online Simultaneous Localization and Mapping. arXiv 2024, arXiv:2411.15043. [Google Scholar]
- Yang, D.; Gao, Y.; Wang, X.; Yue, Y.; Yang, Y.; Fu, M. OpenGS-SLAM: Open-Set Dense Semantic SLAM with 3D Gaussian Splatting for Object-Level Scene Understanding. arXiv 2025, arXiv:2503.01646. [Google Scholar]
- Gupta, A.; Narayan, S.; Joseph, K.J.; Khan, S.; Khan, F.S.; Shah, M. Ow-detr: Open-world detection transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9235–9244. [Google Scholar]
- Yamazaki, K.; Hanyu, T.; Vo, K.; Pham, T.; Tran, M.; Doretto, G.; Le, N. Open-fusion: Real-time open-vocabulary 3D mapping and queryable scene representation. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 9411–9417. [Google Scholar]
- Yao, L.; Han, J.; Wen, Y.; Liang, X.; Xu, D.; Zhang, W.; Xu, H. Detclip: Dictionary-enriched visual-concept paralleled pre-training for open-world detection. Adv. Neural Inf. Process. Syst. 2022, 35, 9125–9138. [Google Scholar]
- Vödisch, N.; Cattaneo, D.; Burgard, W.; Valada, A. Continual SLAM: Beyond lifelong simultaneous localization and mapping through continual learning. In Proceedings of the International Symposium of Robotics Research, Geneva, Switzerland, 25–30 September 2022; pp. 19–35. [Google Scholar]
- Gu, X.; Lin, T.Y.; Kuo, W.; Cui, Y. Open-vocabulary object detection via vision and language knowledge distillation. arXiv 2021, arXiv:2104.13921. [Google Scholar]
- Zhang, H.; Zhang, P.; Hu, X.; Chen, Y.C.; Li, L.; Dai, X.; Gao, J. Glipv2: Unifying localization and vision-language understanding. Adv. Neural Inf. Process. Syst. 2022, 35, 36067–36080. [Google Scholar]
- Li, J.; Li, D.; Savarese, S.; Hoi, S. Blip-2: Bootstrap language-image pre-training with frozen image encoders and large language models. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 19730–19742. [Google Scholar]
- Gao, Y.; Liu, J.; Xu, Z.; Zhang, J.; Li, K.; Ji, R.; Shen, C. PyramidCLIP: Hierarchical feature alignment for vision-language model pretraining. Adv. Neural Inf. Process. Syst. 2022, 35, 35959–35970. [Google Scholar]
- Martins, T.B.; Oswald, M.R.; Civera, J. Open-vocabulary online semantic mapping for SLAM. arXiv 2025, arXiv:2411.15043. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Houlsby, N. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
- Liu, H.; Li, C.; Wu, Q.; Lee, Y.J. Visual instruction tuning. Adv. Neural Inf. Process. Syst. 2023, 36, 34892–34916. [Google Scholar]
- Wu, Y.; Gao, Q.; Zhang, R.; Li, H.; Zhang, J. Language-assisted 3D scene understanding. IEEE Trans. Multimed. 2025, 27, 3869–3879. [Google Scholar] [CrossRef]
- Lipson, L.; Teed, Z.; Deng, J. Deep patch visual SLAM. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 424–440. [Google Scholar]
- Ha, H.; Song, S. Semantic abstraction: Open-world 3D scene understanding from 2D vision-language models. arXiv 2022, arXiv:2207.11514. [Google Scholar]
- Mohiuddin, R.; Prakhya, S.M.; Collins, F.; Liu, Z.; Borrmann, A. OpenSU3D: Open world 3D scene understanding using foundation models. arXiv 2024, arXiv:2407.14279. [Google Scholar]
- Eslamian, A.; Ahmadzadeh, M.R. Det-SLAM: A semantic visual SLAM for highly dynamic scenes using Detectron2. In Proceedings of the 2022 8th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS), Behshahr, Iran, 28–29 December 2022; pp. 1–5. [Google Scholar]
- Chen, R.; Wang, Z.; Wang, J.; Ma, Y.; Gong, M.; Wang, W.; Liu, T. PanoSLAM: Panoptic 3D scene reconstruction via Gaussian SLAM. arXiv 2024, arXiv:2501.00352. [Google Scholar]
- Zhu, S.; Wang, G.; Blum, H.; Liu, J.; Song, L.; Pollefeys, M.; Wang, H. SNI-SLAM: Semantic neural implicit SLAM. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024. [Google Scholar]
- Cen, J.; Yun, P.; Zhang, S.; Cai, J.; Luan, D.; Tang, M.; Wang, M.Y. Open-world semantic segmentation for LiDAR point clouds. In Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, 29 September–4 October 2022; pp. 318–334. [Google Scholar]
- Wan, Z.; Mao, Y.; Zhang, J.; Dai, Y. Rpeflow: Multimodal fusion of RGB-Pointcloud-Event for joint optical flow and scene flow estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 10030–10040. [Google Scholar]
- Wei, H.; Jiao, J.; Hu, X.; Yu, J.; Xie, X.; Wu, J.; Liu, M. FusionPortableV2: A unified multi-sensor dataset for generalized SLAM across diverse platforms and scalable environments. Int. J. Robot. Res. 2025, 44, 1093–1116. [Google Scholar] [CrossRef]
- Li, J.; Dai, H.; Han, H.; Ding, Y. MSeg3D: Multi-modal 3D semantic segmentation for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 21694–21704. [Google Scholar]
- Gu, J.; Bellone, M.; Pivoňka, T.; Sell, R. CLFT: Camera-LiDAR Fusion Transformer for Semantic Segmentation in Autonomous Driving. IEEE Trans. Intell. Veh. 2024, early access.
- Han, X.; Chen, S.; Fu, Z.; Feng, Z.; Fan, L.; An, D.; Xu, S. Multimodal fusion and vision-language models: A survey for robot vision. arXiv 2025, arXiv:2504.02477. [Google Scholar]
- Nguyen, T.M.; Yuan, S.; Cao, M.; Nguyen, T.H.; Xie, L. Viral SLAM: Tightly coupled camera-IMU-UWB-LiDAR SLAM. arXiv 2021, arXiv:2105.03296. [Google Scholar]
- Jia, Y.; Luo, H.; Zhao, F.; Jiang, G.; Li, Y.; Yan, J.; Wang, Z. LVIO-Fusion: A self-adaptive multi-sensor fusion SLAM framework using actor-critic method. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 286–293. [Google Scholar]
- Wang, Y.; Abd Rahman, A.H.; Nor Rashid, F.A.; Razali, M.K.M. Tackling Heterogeneous Light Detection and Ranging-Camera Alignment Challenges in Dynamic Environments: A Review for Object Detection. Sensors 2024, 24, 7855. [Google Scholar] [CrossRef] [PubMed]
- He, Q.; Peng, J.; Jiang, Z.; Wu, K.; Ji, X.; Zhang, J.; Wu, Y. UniM-OV3D: Uni-modality open-vocabulary 3D scene understanding with fine-grained feature representation. arXiv 2024, arXiv:2401.11395. [Google Scholar]
- Nwankwo, L.; Rueckert, E. Understanding Why SLAM Algorithms Fail in Modern Indoor Environments. In Proceedings of the International Conference on Robotics in Alpe-Adria Danube Region, Bled, Slovenia, 14–16 June 2023; pp. 186–194. [Google Scholar]
- Liu, X.; He, Y.; Li, J.; Yan, R.; Li, X.; Huang, H. A comparative review on enhancing visual simultaneous localization and mapping with deep semantic segmentation. Sensors 2024, 24, 3388. [Google Scholar] [CrossRef]
- Ali, I.; Wan, B.; Zhang, H. Prediction of SLAM ATE using an ensemble learning regression model and 1-D global pooling of data characterization. arXiv 2023, arXiv:2303.00616. [Google Scholar]
- Lee, D.; Jung, M.; Yang, W.; Kim, A. Lidar odometry survey: Recent advancements and remaining challenges. Intell. Serv. Robot. 2024, 17, 95–118. [Google Scholar] [CrossRef]
- Lee, S.H.; Civera, J. What is wrong with the absolute trajectory error? In Proceedings of the European Conference on Computer Vision (ECCV), Dublin, Ireland, 17–18 September 2025; pp. 108–123. [Google Scholar]
- Ulku, I.; Akagündüz, E. A survey on deep learning-based architectures for semantic segmentation on 2D images. Appl. Artif. Intell. 2022, 36, 2032924. [Google Scholar] [CrossRef]
- Xiang, Z.; Guo, J.; Meng, J.; Meng, X.; Li, Y.; Kim, J.; Chen, Y. Accurate localization of indoor high similarity scenes using visual SLAM combined with loop closure detection algorithm. PLoS ONE 2024, 19, e0312358. [Google Scholar] [CrossRef] [PubMed]
- Lisus, D.; Holmes, C.; Waslander, S. Towards Open World NeRF-Based SLAM. In Proceedings of the 2023 20th Conference on Robots and Vision (CRV), Montreal, QC, Canada, 6–8 June 2023; pp. 37–44. [Google Scholar]
- Zhu, Z.; Peng, S.; Larsson, V.; Xu, W.; Bao, H.; Cui, Z.; Pollefeys, M. Nice-SLAM: Neural implicit scalable encoding for SLAM. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 12786–12796. [Google Scholar]
Method | Monocular | RGB-D | No Semantic | Closed-Set Semantic | Open-World Semantic |
---|---|---|---|---|---|
MonoSLAM | ✓ | × | ✓ | × | × |
PTAM | ✓ | × | ✓ | × | × |
LSD-SLAM | ✓ | × | ✓ | × | × |
SVO | ✓ | × | ✓ | × | × |
SLAM++ | × | ✓ | × | ✓ | × |
SemanticFusion | × | ✓ | × | ✓ | × |
DS-SLAM | × | ✓ | × | ✓ | × |
DynaSLAM | × | ✓ | × | ✓ | × |
OVO-SLAM | × | ✓ | × | × | ✓ |
OpenGS-SLAM | × | ✓ | × | × | ✓ |
Dataset | RGB | LiDAR | Depth | Language | Instance Labels | Indoor | Outdoor | Openness Level |
---|---|---|---|---|---|---|---|---|
OpenScene | ✓ | × | ✓ | ✓ | ✓ | ✓ | ✓ | High |
OpenVocab-3D | ✓ | × | × | ✓ | ✓ | ✓ | × | High |
SemanticKITTI | × | ✓ | × | × | × | × | ✓ | Medium |
ScanNet | ✓ | × | ✓ | × | ✓ | ✓ | × | Medium |
Replica | ✓ | × | ✓ | × | ✓ | ✓ | × | Medium |
Method | Contribution | ATE | mIoU (%) | FPS | System | Code Resource |
---|---|---|---|---|---|---|
NeRF-Based SLAM [76] | Extending the NeRF baseline approach to open-world scenarios | - | - | - | NVIDIA RTX 2070 SUPER GPU | - |
SNI-SLAM [58] | Proposing multimodal feature fusion and hierarchical semantic coding | 0.456 | 87.41 | 2.15 | NVIDIA RTX 4090 GPU | https://github.com/IRMVLab/SNI-SLAM (accessed on 10 August 2025) |
OVO-SLAM [37] | First online open vocabulary 3D semantic SLAM implementation | - | 27.1 | - | NVIDIA RTX 3090 GPU | - |
PanoSLAM [57] | First implementation of geometric reconstruction, 3D semantic segmentation and instance segmentation in a unified framework | 0.39 | 50.67 (Avg) | - | NVIDIA RTX 4090 GPU | https://github.com/runnachen/PanoSLAM (accessed on 10 August 2025) |
OpenGS-SLAM [38] | Implementing dense semantic SLAM in open set scenarios | 0.16 | 63.17 | 165.47 | - | https://young-bit.github.io/opengs-github.github.io/ (accessed on 10 August 2025) |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Miao, L.; Liu, W.; Deng, Z. A Frontier Review of Semantic SLAM Technologies Applied to the Open World. Sensors 2025, 25, 4994. https://doi.org/10.3390/s25164994