Enhancing Real-Time Visual SLAM with Distant Landmarks in Large-Scale Environments
Abstract
1. Introduction
- The limitation of conventional feature-based SLAM in large-scale environments is revealed by analyzing how the perception range is constrained by the triangulation parallax angle (a standard form of this relation is sketched after this list). By enhancing the covisibility of keyframes used in graph optimization, a methodology is proposed to improve SLAM through the observation of distant landmarks.
- The concept of the virtual map point is introduced: a map point candidate whose spatial coordinates have not yet been determined. By continuously tracking the corresponding features across frames through this data structure, distant map points can be triangulated once a sufficient parallax angle is reached, expanding the mapping range of visual SLAM. Meanwhile, these features are associated, monitored, and retrieved efficiently and precisely without extending the range of local mapping, thus constraining the scale of optimization.
- A SLAM system incorporating the proposed method is implemented on top of the open-source ORB-SLAM3 code. Experiments on drone and vehicle datasets, along with field tests on an embedded system mounted on a UGV, demonstrate that the proposed method surpasses the state-of-the-art baseline in perception range and localization accuracy while maintaining real-time performance.
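The perception-range constraint raised in the first bullet can be made concrete with a standard two-view relation. The following is a back-of-the-envelope sketch assuming a pinhole camera, small angles, baseline b, depth Z, focal length f in pixels, and pixel matching noise sigma_px; it is illustrative and not reproduced from the paper's own derivation:

```latex
% Parallax angle between the back-projected rays from two camera centers
% C_1 and C_2 towards the same landmark X:
\[
  \theta = \arccos\!\left(
    \frac{(X - C_1)^{\top}(X - C_2)}{\lVert X - C_1\rVert \,\lVert X - C_2\rVert}
  \right) \approx \frac{b}{Z} \quad \text{for } Z \gg b .
\]
% Propagating a pixel matching error \sigma_{px} through triangulation gives the
% familiar quadratic growth of depth uncertainty with distance:
\[
  \sigma_Z \approx \frac{Z^{2}}{b f}\,\sigma_{px} = \frac{Z}{f\,\theta}\,\sigma_{px},
\]
% so once \theta drops below the triangulation threshold, the landmark lies
% beyond the reliable perception range of a single local-mapping window.
```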
2. Related Works
3. Perception of Distant Landmarks in SLAM
3.1. Triangulation Error and Parallax Angle
3.2. Perception Range of Feature-Based SLAM
3.3. Localization Enhanced by Distant Landmarks
- Construct distant landmarks as early as possible;
- Constrain the number of keyframes in each optimization batch while constructing landmarks (a minimal parallax-gating sketch follows this list).
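The first principle hinges on knowing, as cheaply as possible, when two observations of a candidate landmark finally subtend enough parallax to triangulate reliably. Below is a minimal, self-contained sketch of that check; the function names, the 1-degree threshold, and the plain Vec3 type are illustrative assumptions and are not taken from the paper's code:

```cpp
#include <array>
#include <cmath>
#include <cstdio>

using Vec3 = std::array<double, 3>;

static double dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
static double norm(const Vec3& v) { return std::sqrt(dot(v, v)); }

// Parallax angle (radians) between two back-projected rays, each expressed in
// the world frame (e.g., ray = R_wc * K^{-1} * pixel for a pinhole camera).
double parallaxAngle(const Vec3& ray1, const Vec3& ray2) {
    double c = dot(ray1, ray2) / (norm(ray1) * norm(ray2) + 1e-12);
    c = std::fmax(-1.0, std::fmin(1.0, c));  // guard acos against rounding
    return std::acos(c);
}

int main() {
    const double kPi = 3.14159265358979323846;
    // Two camera centers 0.5 m apart looking at a landmark roughly 50 m ahead.
    const Vec3 c1{0.0, 0.0, 0.0}, c2{0.5, 0.0, 0.0}, x{0.0, 0.0, 50.0};
    const Vec3 ray1{x[0] - c1[0], x[1] - c1[1], x[2] - c1[2]};
    const Vec3 ray2{x[0] - c2[0], x[1] - c2[1], x[2] - c2[2]};
    const double theta = parallaxAngle(ray1, ray2);
    const double kMinParallaxRad = 1.0 * kPi / 180.0;  // illustrative 1-degree gate
    std::printf("parallax = %.3f deg -> %s\n", theta * 180.0 / kPi,
                theta >= kMinParallaxRad ? "triangulate now" : "defer as a candidate");
    return 0;
}
```

With a 0.5 m baseline and a 50 m landmark, the parallax is only about 0.57 degrees, so such a point would be deferred rather than triangulated immediately.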
4. SLAM System with the Virtual Map Point
4.1. Virtual Map Point
- Efficient Parallax Inspection: By computing the angles between back-projected rays from the observing frames, the maximum parallax angle among frames is continuously updated within the data structure. This procedure is detached from the optimization used for tracking or mapping; thus, the angle can be inspected frequently at minimal computational cost, ensuring timely awareness of sufficient parallax.
- Rapid Frame Retrieval: The features corresponding to the same distant landmark are continuously attached to the data structure. Instead of searching for frames outside the range of local mapping by feature matching, the frame that provides a sufficient parallax angle can be retrieved efficiently through the indexed features.
- Seamless Conversion to Map Point: Once the spatial coordinates of a virtual map point are determined, the observations from historical frames are inherited when constructing the corresponding map point. This inherited set of observations enhances covisibility between frames and is crucial for further local and global optimization in the SLAM system (a minimal data-structure sketch follows this list).
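To make the three properties above concrete, the following is a minimal sketch of what such a data structure could look like. The class and field names are illustrative assumptions and do not reproduce the ORB-SLAM3-based implementation:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One feature observation of a landmark candidate.
struct Observation {
    std::size_t frameId;       // which (key)frame saw the feature
    std::size_t featureIndex;  // index of that feature within the frame
    double ray[3];             // back-projected unit ray in the world frame
};

// A regular map point: triangulated coordinates plus its observations.
struct MapPoint {
    double position[3];
    std::vector<Observation> observations;  // covisibility links for optimization
};

// A map-point candidate without determined coordinates: it only accumulates
// indexed observations and the largest parallax angle seen so far.
struct VirtualMapPoint {
    std::vector<Observation> observations;
    double maxParallaxRad = 0.0;
    std::pair<std::size_t, std::size_t> bestPair{0, 0};  // observation indices

    // Efficient parallax inspection: the caller supplies the largest angle
    // between the new ray and the stored rays, plus the partner's index.
    void addObservation(const Observation& obs, double bestParallaxRad,
                        std::size_t partnerIndex) {
        if (bestParallaxRad > maxParallaxRad) {
            maxParallaxRad = bestParallaxRad;
            bestPair = {partnerIndex, observations.size()};
        }
        observations.push_back(obs);
    }

    // Rapid frame retrieval: bestPair indexes the two observations (and hence
    // frames) to use for triangulation, with no renewed feature matching.
    bool readyToTriangulate(double minParallaxRad) const {
        return maxParallaxRad >= minParallaxRad;
    }

    // Seamless conversion: the resulting map point inherits every historical
    // observation, strengthening covisibility between the involved keyframes.
    MapPoint toMapPoint(const double xyz[3]) const {
        MapPoint mp{{xyz[0], xyz[1], xyz[2]}, observations};
        return mp;
    }
};

int main() {
    VirtualMapPoint vmp;
    vmp.addObservation({/*frameId=*/10, /*featureIndex=*/321, {0.0, 0.0, 1.0}}, 0.0, 0);
    vmp.addObservation({/*frameId=*/25, /*featureIndex=*/87, {0.02, 0.0, 1.0}}, 0.02, 0);
    if (vmp.readyToTriangulate(/*minParallaxRad=*/0.017)) {
        const double xyz[3] = {1.0, -0.5, 48.0};  // stand-in for a DLT/SVD result
        MapPoint mp = vmp.toMapPoint(xyz);
        (void)mp;  // would be inserted into the map and refined by optimization
    }
    return 0;
}
```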
4.2. Software Implementation Based on ORB-SLAM3
Algorithm 1: Mapping with virtual map points.
Input: the new frame with its n features and estimated pose, and the adjacent keyframe with its features and pose. Output: new map points and new virtual map points.
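The following is a minimal sketch of the mapping step described by Algorithm 1, under the assumption that it follows the flow implied by Sections 3.3 and 4.1 (match features against the adjacent keyframe, triangulate immediately when the parallax gate passes, otherwise create a virtual map point). All identifiers are illustrative and do not reproduce the released implementation:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// A feature matched between the new frame and the adjacent keyframe, together
// with the parallax angle between the two back-projected rays (computed as in
// the earlier parallax sketch).
struct FeatureMatch {
    std::size_t newFrameFeature;
    std::size_t keyframeFeature;
    double parallaxRad;
};

struct MappingResult {
    std::size_t newMapPoints = 0;         // triangulated immediately
    std::size_t newVirtualMapPoints = 0;  // deferred, waiting for more parallax
};

// One mapping step over the matches between the new frame and its adjacent
// keyframe: triangulate when the parallax gate passes, otherwise keep the pair
// of observations as a virtual map point to be extended by later frames.
MappingResult mapWithVirtualMapPoints(const std::vector<FeatureMatch>& matches,
                                      double minParallaxRad) {
    MappingResult result;
    for (const FeatureMatch& m : matches) {
        if (m.parallaxRad >= minParallaxRad) {
            // Sufficient parallax: triangulate now (e.g., DLT solved by SVD)
            // and register a regular map point observed by both frames.
            ++result.newMapPoints;
        } else {
            // Insufficient parallax: create (or extend) a virtual map point
            // holding the indexed observations without spatial coordinates.
            ++result.newVirtualMapPoints;
        }
        (void)m.newFrameFeature;  // the real system would store these indices
        (void)m.keyframeFeature;
    }
    return result;
}

int main() {
    const std::vector<FeatureMatch> matches = {
        {12, 34, 0.030},  // wide-enough parallax: a nearby landmark
        {56, 78, 0.002},  // tiny parallax: a distant landmark, deferred
    };
    const MappingResult r = mapWithVirtualMapPoints(matches, /*minParallaxRad=*/0.017);
    std::printf("map points: %zu, virtual map points: %zu\n",
                r.newMapPoints, r.newVirtualMapPoints);
    return 0;
}
```

The counters stand in for the actual creation of map points and virtual map points; the point of the sketch is the single parallax gate that decides between immediate and deferred triangulation.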
5. Experimental Results and Discussion
5.1. Dataset Tests
5.2. Discussion on Dataset Tests
5.2.1. Accuracy of Localization
5.2.2. Range of Mapping
5.2.3. Real-Time Performance
5.3. Field Tests and Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
ATE | Absolute Trajectory Error |
DLT | Direct Linear Transform |
GNSS | Global Navigation Satellite System |
IMU | Inertial Measurement Unit |
INS | Inertial Navigation System |
LiDAR | Light Detection and Ranging |
MAV | Micro Aerial Vehicle |
MP | Map Point |
PBA | Photometric Bundle Adjustment |
RMS | Root Mean Square |
RTK | Real-time Kinematic Positioning |
SLAM | Simultaneous Localization and Mapping |
SVD | Singular Value Decomposition |
ToF | Time-of-Flight |
UGV | Unmanned Ground Vehicle |
VMP | Virtual Map Point |
References
- Cadena, C.; Carlone, L.; Carrillo, H.; Latif, Y.; Scaramuzza, D.; Neira, J.; Reid, I.; Leonard, J.J. Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age. IEEE Trans. Robot. 2016, 32, 1309–1332.
- Wang, K.; Zhao, G.; Lu, J. A Deep Analysis of Visual SLAM Methods for Highly Automated and Autonomous Vehicles in Complex Urban Environment. IEEE Trans. Intell. Transp. Syst. 2024, 25, 10524–10541.
- Zhang, S.; Zhao, S.; An, D.; Liu, J.; Wang, H.; Feng, Y.; Li, D.; Zhao, R. Visual SLAM for underwater vehicles: A survey. Comput. Sci. Rev. 2022, 46, 100510.
- Ding, H.; Zhang, B.; Zhou, J.; Yan, Y.; Tian, G.; Gu, B. Recent developments and applications of simultaneous localization and mapping in agriculture. J. Field Robot. 2022, 39, 956–983.
- Gupta, A.; Fernando, X. Simultaneous Localization and Mapping (SLAM) and Data Fusion in Unmanned Aerial Vehicles: Recent Advances and Challenges. Drones 2022, 6, 85.
- Wang, K.; Kooistra, L.; Pan, R.; Wang, W.; Valente, J. UAV-based simultaneous localization and mapping in outdoor environments: A systematic scoping review. J. Field Robot. 2024, 41, 1617–1642.
- He, M.; Zhu, C.; Huang, Q.; Ren, B.; Liu, J. A review of monocular visual odometry. Vis. Comput. 2020, 36, 1053–1065.
- Forster, C.; Zhang, Z.; Gassner, M.; Werlberger, M.; Scaramuzza, D. SVO: Semidirect Visual Odometry for Monocular and Multicamera Systems. IEEE Trans. Robot. 2017, 33, 249–265.
- Zubizarreta, J.; Aguinaga, I.; Montiel, J.M.M. Direct Sparse Mapping. IEEE Trans. Robot. 2020, 36, 1363–1370.
- Strasdat, H.; Montiel, J.; Davison, A.J. Visual SLAM: Why filter? Image Vis. Comput. 2012, 30, 65–77.
- Klein, G.; Murray, D. Parallel Tracking and Mapping for Small AR Workspaces. In Proceedings of the 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, Nara, Japan, 13–16 November 2007; pp. 225–234.
- Herrera, D.C.; Kim, K.; Kannala, J.; Pulli, K.; Heikkilä, J. DT-SLAM: Deferred Triangulation for Robust SLAM. In Proceedings of the 2014 2nd International Conference on 3D Vision, Tokyo, Japan, 8–11 December 2014; Volume 1, pp. 609–616.
- Mur-Artal, R.; Montiel, J.M.M.; Tardós, J.D. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Trans. Robot. 2015, 31, 1147–1163.
- Muravyev, K.; Yakovlev, K. Evaluation of RGB-D SLAM in Large Indoor Environments. In Proceedings of the Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Fuzhou, China, 16–18 December 2022; Volume 13719 LNCS, pp. 93–104.
- Graeter, J.; Wilczynski, A.; Lauer, M. LIMO: Lidar-Monocular Visual Odometry. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 7872–7879.
- Zhang, J.; Huang, Z.; Zhu, X.; Guo, F.; Sun, C.; Zhan, Q.; Shen, R. LOFF: LiDAR and Optical Flow Fusion Odometry. Drones 2024, 8, 411.
- Gao, B.; Lang, H.; Ren, J. Stereo Visual SLAM for Autonomous Vehicles: A Review. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 11–14 October 2020; pp. 1316–1322.
- Xue, F.; Budvytis, I.; Reino, D.O.; Cipolla, R. Efficient Large-scale Localization by Global Instance Recognition. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 17327–17336.
- Zhang, X.; Dong, J.; Zhang, Y.; Liu, Y.H. MS-SLAM: Memory-Efficient Visual SLAM with Sliding Window Map Sparsification. J. Field Robot. 2024; early access.
- Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision, 2nd ed.; Cambridge University Press: Cambridge, UK, 2004.
- Xiang, G.; Tao, Z. Introduction to Visual SLAM: From Theory to Practice; Springer: Singapore, 2021; pp. 157–165.
- Kümmerle, R.; Grisetti, G.; Strasdat, H.; Konolige, K.; Burgard, W. G2o: A general framework for graph optimization. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 3607–3613.
- Barfoot, T.D. State Estimation for Robotics, 2nd ed.; Cambridge University Press: Cambridge, UK, 2024.
- Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.M.; Tardós, J.D. ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890.
- Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012.
- Burri, M.; Nikolic, J.; Gohl, P.; Schneider, T.; Rehder, J.; Omari, S.; Achtelik, M.W.; Siegwart, R. The EuRoC micro aerial vehicle datasets. Int. J. Robot. Res. 2016, 35, 1157–1163.
- Jeong, J.; Cho, Y.; Shin, Y.S.; Roh, H.; Kim, A. Complex Urban Dataset with Multi-level Sensors from Highly Diverse Urban Environments. Int. J. Robot. Res. 2019, 38, 642–657.
- Majdik, A.L.; Till, C.; Scaramuzza, D. The Zurich urban micro aerial vehicle dataset. Int. J. Robot. Res. 2017, 36, 269–273.
- Grupp, M. EVO: Python Package for the Evaluation of Odometry and SLAM. 2017. Available online: https://github.com/MichaelGrupp/evo (accessed on 6 August 2023).
- Sousa, R.B.; Sobreira, H.M.; Moreira, A.P. A systematic literature review on long-term localization and mapping for mobile robots. J. Field Robot. 2023, 40, 1245–1322.
Sequence | Images | Method | Mean Tracking Time (ms) | RMS ATE (m) | Median MP Triangulation Depth (m) | Median MP Observation Distance (m) | Map Points | Virtual Map Point Percentage
---|---|---|---|---|---|---|---|---
KITTI 00 | 4541 | ORB3 1 | 15.413 | 8.928 | 17.669 | 20.581 | 152,450 | -
KITTI 00 | 4541 | MOD 2 | 14.856 | 7.095 | 18.024 | 20.944 | 153,900 | 4.71%
KITTI 01 | 1101 | ORB3 | 13.410 | 405.202 | 56.957 | 62.147 | 25,925 | -
KITTI 01 | 1101 | MOD | 13.612 | 357.171 | 60.985 | 60.345 | 26,156 | 6.29%
KITTI 02 | 4661 | ORB3 | 15.493 | 28.620 | 17.813 | 20.500 | 197,044 | -
KITTI 02 | 4661 | MOD | 15.104 | 25.877 | 18.639 | 21.452 | 192,774 | 5.24%
KITTI 03 | 801 | ORB3 | 21.328 | 0.827 | 16.793 | 20.448 | 30,243 | -
KITTI 03 | 801 | MOD | 22.641 | 0.795 | 16.770 | 20.651 | 30,388 | 4.85%
KITTI 04 | 271 | ORB3 | 15.159 | 0.968 | 30.519 | 36.302 | 11,789 | -
KITTI 04 | 271 | MOD | 15.642 | 0.538 | 32.786 | 38.703 | 11,458 | 13.41%
KITTI 05 | 2761 | ORB3 | 14.879 | 7.902 | 23.283 | 27.480 | 88,343 | -
KITTI 05 | 2761 | MOD | 14.203 | 6.389 | 23.385 | 27.856 | 87,560 | 6.21%
KITTI 06 | 1101 | ORB3 | 15.388 | 13.621 | 27.486 | 31.502 | 37,027 | -
KITTI 06 | 1101 | MOD | 15.014 | 12.714 | 28.578 | 32.296 | 35,856 | 8.33%
KITTI 07 | 1101 | ORB3 | 16.712 | 3.140 | 20.472 | 24.015 | 40,221 | -
KITTI 07 | 1101 | MOD | 15.515 | 2.363 | 21.920 | 25.728 | 42,928 | 5.27%
KITTI 08 | 4071 | ORB3 | 19.935 | 66.120 | 12.477 | 14.144 | 162,303 | -
KITTI 08 | 4071 | MOD | 15.597 | 55.433 | 12.855 | 14.563 | 154,909 | 6.56%
KITTI 09 | 1591 | ORB3 | 14.208 | 7.910 | 20.098 | 25.036 | 69,606 | -
KITTI 09 | 1591 | MOD | 14.942 | 7.330 | 20.415 | 25.523 | 72,536 | 4.71%
KITTI 10 | 1201 | ORB3 | 14.759 | 8.292 | 13.452 | 15.862 | 47,516 | -
KITTI 10 | 1201 | MOD | 14.572 | 6.811 | 13.458 | 15.744 | 47,045 | 5.42%
EuRoC MH 01 | 3682 | ORB3 | 16.754 | 0.0457 | 3.914 | 4.699 | 13,591 | -
EuRoC MH 01 | 3682 | MOD | 13.964 | 0.0453 | 3.929 | 4.687 | 13,507 | 2.18%
EuRoC MH 02 | 3040 | ORB3 | 14.572 | 0.0401 | 3.558 | 4.395 | 11,924 | -
EuRoC MH 02 | 3040 | MOD | 13.072 | 0.0356 | 3.570 | 4.481 | 11,909 | 4.01%
EuRoC MH 03 | 2700 | ORB3 | 13.450 | 0.0365 | 4.309 | 5.004 | 10,501 | -
EuRoC MH 03 | 2700 | MOD | 12.843 | 0.0364 | 4.291 | 5.033 | 10,241 | 2.88%
EuRoC MH 04 | 2033 | ORB3 | 12.167 | 0.0531 | 5.821 | 6.856 | 13,256 | -
EuRoC MH 04 | 2033 | MOD | 10.529 | 0.0482 | 5.822 | 6.810 | 13,008 | 3.69%
EuRoC MH 05 | 2273 | ORB3 | 12.056 | 0.0583 | 5.393 | 6.254 | 13,926 | -
EuRoC MH 05 | 2273 | MOD | 12.454 | 0.0480 | 5.548 | 6.703 | 12,954 | 5.71%
Sequence | Images | Method | Mean Tracking Time (ms) | RMS ATE (m) | Median MP Triangulation Depth (m) | Median MP Observation Distance (m) | Map Points | Virtual Map Point Percentage
---|---|---|---|---|---|---|---|---
KAIST 26 | 5837 | ORB3 | 25.976 | 35.428 | 17.879 | 24.719 | 100,549 | -
KAIST 26 | 5837 | MOD | 24.749 | 23.361 | 18.598 | 27.682 | 96,321 | 8.50%
KAIST 27 | 11,605 | ORB3 | 24.500 | 26.274 | 19.218 | 31.358 | 138,445 | -
KAIST 27 | 11,605 | MOD | 20.694 | 22.483 | 19.608 | 32.510 | 136,682 | 9.60%
KAIST 28 | 19,745 | ORB3 | 20.258 | 87.110 | 21.114 | 22.994 | 270,674 | -
KAIST 28 | 19,745 | MOD | 22.767 | 72.789 | 23.879 | 23.296 | 264,266 | 8.25%
KAIST 29 | 4436 | ORB3 | 24.965 | 98.036 | 21.114 | 18.217 | 62,721 | -
KAIST 29 | 4436 | MOD | 22.633 | 73.398 | 23.879 | 21.702 | 60,797 | 11.24%
KAIST 32 | 10,968 | ORB3 | 22.339 | 108.416 | 11.309 | 10.729 | 195,685 | -
KAIST 32 | 10,968 | MOD | 21.872 | 83.542 | 17.674 | 16.859 | 187,904 | 9.21%
KAIST 33 * | 12,822 | ORB3 | 23.121 | 137.694 | 15.999 | 22.254 | 236,458 | -
KAIST 33 * | 12,822 | MOD | 23.994 | 68.610 | 16.463 | 21.846 | 233,570 | 9.13%
KAIST 38 | 21,600 | ORB3 | 19.830 | 64.144 | 13.212 | 16.384 | 303,414 | -
KAIST 38 | 21,600 | MOD | 21.940 | 53.473 | 18.563 | 22.525 | 254,995 | 7.80%
KAIST 39 | 18,657 | ORB3 | 19.767 | 115.902 | 11.583 | 15.129 | 310,722 | -
KAIST 39 | 18,657 | MOD | 22.420 | 46.189 | 16.275 | 22.005 | 284,604 | 7.26%
UZH * | 81,169 | ORB3 | 41.479 | 11.672 | 10.234 | 10.788 | 284,779 | -
UZH * | 81,169 | MOD | 43.179 | 10.857 | 10.704 | 11.430 | 276,414 | 3.06%
Sequence | Images | Method | Mean Tracking Time (ms) | Median Tracking Time (ms) | RMS ATE (m) | Median MP Triangulation Depth (m) | Median MP Observation Distance (m) | Map Points | Virtual Map Point Percentage
---|---|---|---|---|---|---|---|---|---
Yard | 4838 | ORB3 | 30.271 | 26.739 | 1.962 | 3.535 | 10.263 | 17,945 | -
Yard | 4838 | MOD | 26.122 | 23.247 | 1.531 | 3.739 | 11.536 | 17,076 | 9.76%
Road | 9901 | ORB3 | 30.289 | 28.705 | 2.376 | 4.834 | 15.196 | 32,025 | -
Road | 9901 | MOD | 30.385 | 28.672 | 2.033 | 5.208 | 16.850 | 29,473 | 10.51%
Park | 10,538 | ORB3 | 24.804 | 23.430 | 1.842 | 6.219 | 13.364 | 29,229 | -
Park | 10,538 | MOD | 24.805 | 23.458 | 1.639 | 6.699 | 13.601 | 27,028 | 7.43%
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Dou, H.; Zhao, X.; Liu, B.; Jia, Y.; Wang, G.; Wang, C. Enhancing Real-Time Visual SLAM with Distant Landmarks in Large-Scale Environments. Drones 2024, 8, 586. https://doi.org/10.3390/drones8100586