SFGS-SLAM: Lightweight Image Matching Combined with Gaussian Splatting for a Tracking and Mapping System
Abstract
1. Introduction
- To further improve speed, we incorporated design elements from YOLO-based detectors [17] into a lightweight keypoint detection and matching network named SuperFeats.
- We designed dedicated loss functions for the keypoint detection and descriptor branches of SuperFeats; trained with a combination of semi-supervised and self-supervised learning, the network achieves promising results.
- We integrated this network with the GICP algorithm [9] and a factor graph to build a new SLAM system, SFGS-SLAM, which efficiently optimizes rendering quality.
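For background, detector-and-descriptor front ends of this kind are usually paired with a mutual nearest-neighbour (MNN) test to filter candidate matches before pose estimation. The sketch below is a generic NumPy illustration of that test, not code from the SFGS-SLAM implementation (the function name `match_mnn` is ours):

```python
import numpy as np

def match_mnn(desc_a: np.ndarray, desc_b: np.ndarray) -> np.ndarray:
    """Mutual nearest-neighbour matching of L2-normalised descriptors.

    Returns an (M, 2) array of index pairs (i in A, j in B) that are
    each other's nearest neighbour under cosine similarity.
    """
    sim = desc_a @ desc_b.T            # cosine similarity (descriptors unit-norm)
    nn_ab = sim.argmax(axis=1)         # best B index for each A descriptor
    nn_ba = sim.argmax(axis=0)         # best A index for each B descriptor
    idx_a = np.arange(desc_a.shape[0])
    mutual = nn_ba[nn_ab] == idx_a     # keep pairs that agree in both directions
    return np.stack([idx_a[mutual], nn_ab[mutual]], axis=1)
```

MNN filtering discards ambiguous one-way matches at negligible cost, which matters for a front end whose main selling point is frame rate.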
2. Related Works
3. SFGS-SLAM Framework
3.1. Network Structure and Loss Functions
3.2. Feature Point Mapping
3.3. Factor Graph
4. Results
4.1. Setup
4.2. Matching
4.3. Tracking
4.4. Rendering
4.5. Real-World Experiment
4.6. Ablation
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Chen, G.; Wang, W. A survey on 3D Gaussian splatting. arXiv 2024, arXiv:2401.03890.
- Labbé, M.; Michaud, F. RTAB-Map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation. J. Field Robot. 2019, 36, 416–446.
- Lin, J. Dynamic NeRF: A review. arXiv 2024, arXiv:2405.08609.
- Kerbl, B.; Kopanas, G.; Leimkühler, T.; Drettakis, G. 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 2023, 42, 1–14.
- Fei, B.; Xu, J.; Zhang, R.; Zhou, Q.; Yang, W.; He, Y. 3D Gaussian splatting as new era: A survey. IEEE Trans. Vis. Comput. Graph. 2024, 31, 4429–4449.
- Huang, B.; Yu, Z.; Chen, A.; Geiger, A.; Gao, S. 2D Gaussian splatting for geometrically accurate radiance fields. In Proceedings of the ACM SIGGRAPH 2024 Conference Papers, Denver, CO, USA, 27 July–1 August 2024; pp. 1–11.
- Keetha, N.; Karhade, J.; Jatavallabhula, K.M.; Yang, G.; Scherer, S.; Ramanan, D.; Luiten, J. SplaTAM: Splat, track & map 3D Gaussians for dense RGB-D SLAM. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 21357–21366.
- Matsuki, H.; Murai, R.; Kelly, P.H.; Davison, A.J. Gaussian splatting SLAM. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 18039–18048.
- Ha, S.; Yeon, J.; Yu, H. RGBD GS-ICP SLAM. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 180–197.
- Huang, H.; Li, L.; Cheng, H.; Yeung, S.-K. Photo-SLAM: Real-time simultaneous localization and photorealistic mapping for monocular, stereo, and RGB-D cameras. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 21584–21593.
- DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperPoint: Self-supervised interest point detection and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018; pp. 224–236.
- Zhao, X.; Wu, X.; Miao, J.; Chen, W.; Chen, P.C.; Li, Z. ALIKE: Accurate and lightweight keypoint detection and descriptor extraction. IEEE Trans. Multimed. 2022, 25, 3101–3112.
- Potje, G.; Cadar, F.; Araujo, A.; Martins, R.; Nascimento, E.R. XFeat: Accelerated features for lightweight image matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 2682–2691.
- Vedaldi, A. An implementation of SIFT detector and descriptor. Univ. Calif. Los Angeles 2006, 7.
- Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571.
- Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.; Tardós, J.D. ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890.
- Varghese, R.; Sambath, M. YOLOv8: A novel object detection algorithm with enhanced performance and robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–6.
- Koide, K.; Yokozuka, M.; Oishi, S.; Banno, A. Voxelized GICP for fast and accurate 3D point cloud registration. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 11054–11059.
- Dellaert, F. Factor graphs and GTSAM: A hands-on introduction. Ga. Inst. Technol. Tech. Rep. 2012, 2.
- Jin, S.; Dai, X.; Meng, Q. “Focusing on the right regions”—Guided saliency prediction for visual SLAM. Expert Syst. Appl. 2023, 213, 119068.
- Jin, S.; Wang, X.; Meng, Q. Spatial memory-augmented visual navigation based on hierarchical deep reinforcement learning in unknown environments. Knowl. Based Syst. 2024, 285, 111358.
- Straub, J.; Whelan, T.; Ma, L.; Chen, Y.; Wijmans, E.; Green, S.; Engel, J.J.; Mur-Artal, R.; Ren, C.; Verma, S. The Replica dataset: A digital replica of indoor spaces. arXiv 2019, arXiv:1906.05797.
- Qin, T.; Li, P.; Shen, S. VINS-Mono: A robust and versatile monocular visual-inertial state estimator. IEEE Trans. Robot. 2018, 34, 1004–1020.
- Backhaus, A.; Luettel, T.; Wuensche, H.-J. YOLOPoint: Joint keypoint and object detection. In Proceedings of the International Conference on Advanced Concepts for Intelligent Vision Systems, Kumamoto, Japan, 21–23 August 2023; pp. 112–123.
- Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
- Li, Z.; Snavely, N. MegaDepth: Learning single-view depth prediction from internet photos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2041–2050.
- Balntas, V.; Lenc, K.; Vedaldi, A.; Mikolajczyk, K. HPatches: A benchmark and evaluation of handcrafted and learned local descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5173–5182.
- Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura, Portugal, 7–12 October 2012; pp. 573–580.
- Schubert, D.; Goll, T.; Demmel, N.; Usenko, V.; Stückler, J.; Cremers, D. The TUM VI benchmark for evaluating visual-inertial odometry. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1680–1687.
- Zhao, Z.; Wu, C.; Kong, X.; Li, Q.; Guo, Z.; Lv, Z.; Du, X. Light-SLAM: A robust deep-learning visual SLAM system based on LightGlue under challenging lighting conditions. IEEE Trans. Intell. Transp. Syst. 2025, 26, 9918–9931.
- Deng, Z.; Wang, R. SGF-SLAM: Semantic Gaussian filtering SLAM for urban road environments. Sensors 2025, 25, 3602.
| Methods | Acc (5°) | Acc (10°) | Dim | FPS |
|---|---|---|---|---|
| ORB [15] | 13.8 | 31.9 | 256-b | 46.7 |
| ALIKE [12] | 49.3 | 77.7 | 64-f | 6.9 |
| SuperPoint [11] | 45.0 | 67.4 | 256-f | 5.1 |
| XFeat [13] | 41.9 | 74.9 | 64-f | 25.8 |
| SuperFeats | 40.1 | 70.6 | 64-f | 32.2 |
| Methods | ATE (cm) | FPS |
|---|---|---|
| ORB-SLAM3 [16] | 1.27 | 156.46 |
| Light-SLAM [30] | 2.34 | 167.13 |
| SplaTAM [7] | 3.23 | 0.43 |
| Photo-SLAM [10] | 1.27 | 41.66 |
| MonoGS SLAM [8] | 3.69 | 3.21 |
| GICP-SLAM [9] | 2.40 | 45.59 |
| SFGS-SLAM | 2.33 | 33.17 |
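For reference, the ATE values above are the root-mean-square translational error between corresponding estimated and ground-truth poses, conventionally computed after rigidly aligning the two trajectories. A minimal NumPy sketch of that metric (the alignment is the standard SVD-based rigid registration; the function names are ours, not from the paper):

```python
import numpy as np

def align_rigid(est: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Rigidly align an estimated N x 3 trajectory to ground truth (SVD method)."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)          # 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    return est @ R.T + t

def ate_rmse(est: np.ndarray, gt: np.ndarray) -> float:
    """Root-mean-square translational error between corresponding poses."""
    err = np.linalg.norm(est - gt, axis=1)    # per-pose Euclidean error
    return float(np.sqrt(np.mean(err ** 2)))
```

Without the alignment step, a constant coordinate-frame offset would dominate the metric even for a perfect tracker.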
| Methods | PSNR [dB] ↑ | SSIM ↑ | LPIPS ↓ | FPS |
|---|---|---|---|---|
| SplaTAM [7] | 23.46 | 0.906 | 0.156 | 0.32 |
| Photo-SLAM [10] | 21.40 | 0.738 | 0.447 | 36.75 |
| GICP-SLAM [9] | 19.62 | 0.750 | 0.240 | 42.41 |
| SFGS-SLAM | 24.85 | 0.909 | 0.187 | 31.26 |
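As a reminder of the rendering metrics: PSNR is 10·log10(MAX²/MSE) between a rendered frame and the ground-truth image, so each +3 dB roughly halves the MSE. A minimal NumPy sketch (assuming images normalised to [0, 1]; this is a generic definition, not the paper's evaluation code):

```python
import numpy as np

def psnr(img: np.ndarray, ref: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    mse = np.mean((img.astype(np.float64) - ref.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")                   # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

SSIM and LPIPS complement PSNR by measuring structural and perceptual similarity rather than raw pixel error, which is why all three are reported together.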
| Method | Metric | Room0 | Room1 | Room2 | Office0 | Office1 | Office2 | Office3 | Office4 | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| SplaTAM [7] | PSNR [dB] ↑ | 32.60 | 33.55 | 34.83 | 38.09 | 39.02 | 31.95 | 29.53 | 31.55 | 33.88 |
| | SSIM ↑ | 0.975 | 0.969 | 0.982 | 0.983 | 0.981 | 0.966 | 0.949 | 0.951 | 0.970 |
| | LPIPS ↓ | 0.070 | 0.097 | 0.074 | 0.088 | 0.093 | 0.098 | 0.119 | 0.150 | 0.099 |
| Photo-SLAM [10] | PSNR [dB] ↑ | 32.09 | 33.03 | 34.30 | 37.56 | 38.42 | 31.47 | 29.10 | 31.05 | 33.37 |
| | SSIM ↑ | 0.920 | 0.915 | 0.921 | 0.925 | 0.929 | 0.912 | 0.899 | 0.896 | 0.915 |
| | LPIPS ↓ | 0.053 | 0.074 | 0.056 | 0.067 | 0.071 | 0.075 | 0.091 | 0.114 | 0.075 |
| MonoGS [8] | PSNR [dB] ↑ | 32.83 | 36.43 | 37.49 | 39.95 | 42.09 | 36.24 | 36.70 | 36.07 | 37.22 |
| | SSIM ↑ | 0.954 | 0.959 | 0.965 | 0.971 | 0.977 | 0.964 | 0.963 | 0.957 | 0.964 |
| | LPIPS ↓ | 0.068 | 0.076 | 0.075 | 0.072 | 0.055 | 0.078 | 0.065 | 0.099 | 0.073 |
| GICP-SLAM [9] | PSNR [dB] ↑ | 32.20 | 35.36 | 34.42 | 40.31 | 40.75 | 33.85 | 34.08 | 34.47 | 35.93 |
| | SSIM ↑ | 0.940 | 0.960 | 0.957 | 0.978 | 0.977 | 0.962 | 0.953 | 0.963 | 0.961 |
| | LPIPS ↓ | 0.081 | 0.067 | 0.083 | 0.045 | 0.051 | 0.069 | 0.067 | 0.065 | 0.066 |
| SFGS-SLAM | PSNR [dB] ↑ | 35.39 | 36.91 | 35.58 | 40.35 | 40.79 | 39.02 | 35.25 | 37.72 | 37.63 |
| | SSIM ↑ | 0.957 | 0.966 | 0.969 | 0.986 | 0.984 | 0.972 | 0.965 | 0.973 | 0.972 |
| | LPIPS ↓ | 0.079 | 0.066 | 0.082 | 0.044 | 0.050 | 0.067 | 0.066 | 0.063 | 0.064 |
| Methods | PSNR [dB] ↑ | SSIM ↑ | LPIPS ↓ | Map Storage (MB) | GPU Memory (MB) |
|---|---|---|---|---|---|
| 3D Gaussian Splatting [4] | 23.35 | 0.787 | 0.298 | 3432 | 6945 |
| Photo-SLAM [10] | 20.57 | 0.706 | 0.345 | 3247 | 4126 |
| GICP-SLAM [9] | 19.36 | 0.714 | 0.301 | 2981 | 3941 |
| SFGS-SLAM | 24.66 | 0.805 | 0.239 | 2877 | 3804 |
| Methods | PSNR [dB] ↑ | SSIM ↑ | LPIPS ↓ | ATE (cm) |
|---|---|---|---|---|
| Without SuperFeats | 23.08 | 0.881 | 0.224 | 3.74 |
| Without Factor Graph | 21.56 | 0.798 | 0.346 | 5.23 |
| SFGS-SLAM | 24.85 | 0.909 | 0.187 | 2.33 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, R.; Deng, Z. SFGS-SLAM: Lightweight Image Matching Combined with Gaussian Splatting for a Tracking and Mapping System. Appl. Sci. 2025, 15, 10876. https://doi.org/10.3390/app152010876