D3L-SLAM: A Comprehensive Hybrid Simultaneous Location and Mapping System with Deep Keypoint, Deep Depth, Deep Pose, and Line Detection
Abstract
1. Introduction
- We propose D3L-SLAM, a hybrid monocular visual SLAM system that integrates deep keypoints, deep pose priors, deep depth estimation, and a line detector, enabling robust operation in complex, low-texture environments.
- To enhance SLAM performance in low-texture scenarios, we incorporate point–line feature constraints to optimize pose estimation and mapping through the construction of a tightly coupled point–line bundle adjustment (BA).
- To ensure scale-consistent pose and map estimates in monocular visual SLAM, we employ self-supervised depth estimation with RGB images to form a pseudo-RGBD sensor and integrate a virtual baseline to create a pseudo-stereo SLAM system.
- We conducted extensive experiments on both public and self-collected datasets, demonstrating that D3L-SLAM significantly outperforms representative traditional SLAM (e.g., ORB-SLAM3) and learning-based SLAM (e.g., LIFT-SLAM).
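The virtual-baseline idea in the bullets above can be sketched numerically. The following is an illustrative sketch, not the authors' implementation: given a keypoint with learned depth z, a virtual right-camera observation is synthesized as if a stereo rig with an assumed baseline b existed, via the standard disparity relation d = f·b/z. The function name and the 0.1 m baseline are hypothetical choices for the example.

```python
import numpy as np

def virtual_stereo_coords(keypoints, depths, fx, baseline=0.1):
    """Map monocular keypoints plus learned depths to virtual right-image
    u-coordinates, mimicking a stereo rig with an assumed baseline.

    keypoints : (N, 2) array of (u, v) pixel coordinates in the left image
    depths    : (N,) learned metric depths from the depth network
    fx        : focal length in pixels
    baseline  : virtual baseline in metres (a free design parameter)
    """
    keypoints = np.asarray(keypoints, dtype=float)
    depths = np.asarray(depths, dtype=float)
    disparity = fx * baseline / depths      # d = f * b / z
    u_right = keypoints[:, 0] - disparity   # virtual right-view column
    return u_right

# Example: a point 10 m away with fx = 720 px and a 0.1 m virtual baseline
# yields a disparity of 7.2 px, so u_right is about 632.8 for u_left = 640.
ur = virtual_stereo_coords([[640.0, 360.0]], [10.0], fx=720.0)
```

The synthesized coordinate can then be used as an extra stereo-style reprojection residual in bundle adjustment, which is what ties the monocular estimates to a consistent metric scale.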
2. Related Works
2.1. Deep-Learning-Based Visual SLAM
2.2. Hybrid Visual SLAM
3. Hybrid Visual SLAM with Deep Keypoints, Depths, Poses, and Line Features
3.1. Framework
3.2. Integrating Line Features and Deep Keypoints in the Front End
3.2.1. Line Feature Detection
3.2.2. Self-Supervised Learning of Deep Keypoints
3.3. Virtual Stereo Visual SLAM
3.3.1. Self-Supervised Learning of Pose and Depth
3.3.2. Virtual Features Derived from Learned Depth
3.3.3. The Motion Model from PoseNet
3.4. Line–Point Bundle Adjustment and Online BoW in the Back End
3.4.1. Line–Point Pose-Only Bundle Adjustment Optimization
3.4.2. Binary Descriptors and Online-Learning-Based BoW
4. Experiments
4.1. Implementation Details
4.2. Evaluation on the KITTI Odometry Dataset
4.3. Evaluation with Self-Collected Dataset
4.4. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Wang, X.; Fan, X.; Shi, P.; Ni, J.; Zhou, Z. An Overview of Key SLAM Technologies for Underwater Scenes. Remote Sens. 2023, 15, 2496. [Google Scholar] [CrossRef]
- Chen, W.; Zhou, C.; Shang, G.; Wang, X.; Li, Z.; Xu, C.; Hu, K. SLAM Overview: From Single Sensor to Heterogeneous Fusion. Remote Sens. 2022, 14, 6033. [Google Scholar] [CrossRef]
- Xu, K.; Hao, Y.; Yuan, S.; Wang, C.; Xie, L. AirVO: An Illumination-Robust Point-Line Visual Odometry. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 3429–3436. [Google Scholar] [CrossRef]
- Yang, N.; von Stumberg, L.; Wang, R.; Cremers, D. D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1278–1289. [Google Scholar] [CrossRef]
- Wang, S.; Clark, R.; Wen, H.; Trigoni, N. Deepvo: Towards end-to-end visual odometry with deep recurrent convolutional neural networks. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 2043–2050. [Google Scholar]
- Jin, J.; Bai, J.; Xu, Y.; Huang, J. Unifying Deep ConvNet and Semantic Edge Features for Loop Closure Detection. Remote Sens. 2022, 14, 4885. [Google Scholar] [CrossRef]
- Liu, T.; Wang, Y.; Niu, X.; Chang, L.; Zhang, T.; Liu, J. LiDAR Odometry by Deep Learning-Based Feature Points with Two-Step Pose Estimation. Remote Sens. 2022, 14, 2764. [Google Scholar] [CrossRef]
- Wang, S.; Gou, G.; Sui, H.; Zhou, Y.; Zhang, H.; Li, J. CDSFusion: Dense Semantic SLAM for Indoor Environment Using CPU Computing. Remote Sens. 2022, 14, 979. [Google Scholar] [CrossRef]
- Li, R.; Wang, S.; Gu, D. DeepSLAM: A robust monocular SLAM system with unsupervised deep learning. IEEE Trans. Ind. Electron. 2020, 68, 3577–3587. [Google Scholar] [CrossRef]
- Li, D.; Shi, X.; Long, Q.; Liu, S.; Yang, W.; Wang, F.; Wei, Q.; Qiao, F. DXSLAM: A Robust and Efficient Visual SLAM System with Deep Features. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 4958–4965. [Google Scholar]
- Tang, J.; Ericson, L.; Folkesson, J.; Jensfelt, P. GCNv2: Efficient Correspondence Prediction for Real-Time SLAM. IEEE Robot. Autom. Lett. 2019, 4, 3505–3512. [Google Scholar] [CrossRef]
- Bruno, H.M.S.; Colombini, E. LIFT-SLAM: A deep-learning feature-based monocular visual SLAM method. Neurocomputing 2020, 455, 97–110. [Google Scholar] [CrossRef]
- Xiao, Z.; Li, S. SL-SLAM: A robust visual-inertial SLAM based deep feature extraction and matching. arXiv 2024, arXiv:2405.03413. [Google Scholar]
- Bian, J.; Li, Z.; Wang, N.; Zhan, H.; Shen, C.; Cheng, M.M.; Reid, I. Unsupervised scale-consistent depth and ego-motion learning from monocular video. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
- Wang, Y.; Xu, B.; Fan, W.; Xiang, C. A robust and efficient loop closure detection approach for hybrid ground/aerial vehicles. Drones 2023, 7, 135. [Google Scholar] [CrossRef]
- Teed, Z.; Deng, J. Droid-slam: Deep visual slam for monocular, stereo, and rgb-d cameras. Adv. Neural Inf. Process. Syst. 2021, 34, 16558–16569. [Google Scholar]
- Dey, R.; Salem, F.M. Gate-variants of gated recurrent unit (GRU) neural networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; pp. 1597–1600. [Google Scholar]
- Li, Y.; Ushiku, Y.; Harada, T. Pose graph optimization for unsupervised monocular visual odometry. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 5439–5445. [Google Scholar]
- Zhao, W.; Liu, S.; Shu, Y.; Liu, Y.J. Towards Better Generalization: Joint Depth-Pose Learning Without PoseNet. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9148–9158. [Google Scholar] [CrossRef]
- Zhan, H.; Weerasekera, C.S.; Bian, J.; Garg, R.; Reid, I.D. DF-VO: What Should Be Learnt for Visual Odometry? arXiv 2021, arXiv:2103.00933. [Google Scholar]
- Sun, D.; Yang, X.; Liu, M.Y.; Kautz, J. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8934–8943. [Google Scholar]
- Tang, J.; Folkesson, J.; Jensfelt, P. Geometric Correspondence Network for Camera Motion Estimation. IEEE Robot. Autom. Lett. 2018, 3, 1010–1017. [Google Scholar] [CrossRef]
- Sarlin, P.E.; Cadena, C.; Siegwart, R.Y.; Dymczyk, M. From Coarse to Fine: Robust Hierarchical Localization at Large Scale. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 12708–12717. [Google Scholar]
- Galvez-López, D.; Tardos, J.D. Bags of Binary Words for Fast Place Recognition in Image Sequences. IEEE Trans. Robot. 2012, 28, 1188–1197. [Google Scholar] [CrossRef]
- Arandjelović, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 5297–5307. [Google Scholar] [CrossRef]
- Tiwari, L.; Ji, P.; Tran, Q.H.; Zhuang, B.; Anand, S.; Chandraker, M. Pseudo rgb-d for self-improving monocular slam and depth prediction. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 437–455. [Google Scholar]
- DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperPoint: Self-Supervised Interest Point Detection and Description. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 337–33712. [Google Scholar]
- Lindenberger, P.; Sarlin, P.E.; Pollefeys, M. LightGlue: Local Feature Matching at Light Speed. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 17581–17592. [Google Scholar] [CrossRef]
- Sauerbeck, F.; Obermeier, B.; Rudolph, M.; Betz, J. RGB-L: Enhancing Indirect Visual SLAM Using LiDAR-Based Dense Depth Maps. In Proceedings of the 2023 3rd International Conference on Computer, Control and Robotics (ICCCR), Shanghai, China, 24–26 March 2023; pp. 95–100. [Google Scholar]
- Engel, J.; Koltun, V.; Cremers, D. Direct Sparse Odometry. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 611–625. [Google Scholar] [CrossRef]
- Yuan, C.; Xu, Y.; Zhou, Q. PLDS-SLAM: Point and Line Features SLAM in Dynamic Environment. Remote Sens. 2023, 15, 1893. [Google Scholar] [CrossRef]
- Rong, H.; Gao, Y.; Guan, L.; Ramirez-Serrano, A.; Xu, X.; Zhu, Y. Point-Line Visual Stereo SLAM Using EDlines and PL-BoW. Remote Sens. 2021, 13, 3591. [Google Scholar] [CrossRef]
- Grompone von Gioi, R.; Jakubowicz, J.; Morel, J.M.; Randall, G. LSD: A Fast Line Segment Detector with a False Detection Control. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 722–732. [Google Scholar] [CrossRef]
- Zhang, L.; Koch, R. An efficient and robust line segment matching approach based on LBD descriptor and pairwise geometric consistency. J. Vis. Commun. Image Represent. 2013, 24, 794–805. [Google Scholar] [CrossRef]
- Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571. [Google Scholar] [CrossRef]
- Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-Up Robust Features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359. [Google Scholar] [CrossRef]
- Lin, T.Y.; Maire, M.; Belongie, S.J.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014. [Google Scholar]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
- Garcia-Fidalgo, E.; Ortiz, A. iBoW-LCD: An Appearance-Based Loop-Closure Detection Approach Using Incremental Bags of Binary Words. IEEE Robot. Autom. Lett. 2018, 3, 3051–3057. [Google Scholar] [CrossRef]
- Geiger, A.; Ziegler, J.; Stiller, C. StereoScan: Dense 3d reconstruction in real-time. In Proceedings of the 2011 IEEE Intelligent Vehicles Symposium (IV), Baden-Baden, Germany, 5–9 June 2011; pp. 963–968. [Google Scholar] [CrossRef]
- Wei, P.; Hua, G.; Huang, W.; Meng, F.; Liu, H. Unsupervised Monocular Visual-inertial Odometry Network. In Proceedings of the International Joint Conference on Artificial Intelligence, Rhodes, Greece, 12–18 September 2020. [Google Scholar]
- Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.; Tardós, J. ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890. [Google Scholar] [CrossRef]
- Gao, X.; Wang, R.; Demmel, N.; Cremers, D. LDSO: Direct Sparse Odometry with Loop Closure. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 2198–2204. [Google Scholar] [CrossRef]
- Deng, C.; Qiu, K.; Xiong, R.; Zhou, C. Comparative Study of Deep Learning Based Features in SLAM. In Proceedings of the 2019 4th Asia-Pacific Conference on Intelligent Robot Systems (ACIRS), Nagoya, Japan, 13–15 July 2019; pp. 250–254. [Google Scholar] [CrossRef]
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Chintala, S. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
Method | Deep Keypoint | Line Detector | Deep Depth | Deep Pose | Loop Closing |
---|---|---|---|---|---|
GCN-SLAM | ✓ | ||||
DX-SLAM | ✓ | ✓ | |||
LIFT-SLAM | ✓ | ✓ | ✓ | ||
SC-Depth | ✓ | ✓ | ✓ | ||
Pseudo RGBD-SLAM | ✓ | ✓ | |||
AirVO | ✓ | ✓ | |||
SL-SLAM | ✓ | ✓ | ✓ | ||
D3VO | ✓ | ✓ | |||
PLDS-SLAM | ✓ | ✓ | |||
RGBL-SLAM | ✓ | ✓ | |||
D3L-SLAM (Ours) | ✓ | ✓ | ✓ | ✓ | ✓ |
Per-sequence results on the KITTI odometry benchmark. Columns are grouped per method as t_rel (average translational drift, %), r_rel (average rotational drift, °/100 m), and ATE (absolute trajectory error, m); the monocular VISO-M baseline reports t_rel and r_rel only.

Seq | VISO-M t_rel | VISO-M r_rel | LDSO t_rel | LDSO r_rel | LDSO ATE | ORB-SLAM3 t_rel | ORB-SLAM3 r_rel | ORB-SLAM3 ATE | LIFT-SLAM t_rel | LIFT-SLAM r_rel | LIFT-SLAM ATE | D3L-SLAM t_rel | D3L-SLAM r_rel | D3L-SLAM ATE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
00 | 36.95 | 2.42 | 2.92 | 0.36 | 7.88 | 2.74 | 0.30 | 7.15 | 3.18 | 2.99 | 8.06 | 2.09 | 0.80 | 5.54 |
02 | 21.98 | 1.22 | 4.02 | 0.74 | 26.53 | 5.16 | 0.38 | 21.94 | 8.73 | 2.49 | 40.04 | 3.83 | 1.02 | 17.37 |
03 | 16.14 | 2.67 | 2.16 | 0.16 | 2.89 | 2.69 | 0.17 | 3.32 | 1.46 | 0.34 | 2.23 | 2.63 | 1.62 | 3.63 |
04 | 2.61 | 1.53 | 1.30 | 0.19 | 2.89 | 2.25 | 0.24 | 2.27 | 2.22 | 0.48 | 0.51 | 3.21 | 1.00 | 2.33 |
05 | 17.20 | 3.52 | 2.05 | 0.24 | 4.60 | 3.36 | 0.39 | 6.34 | 6.09 | 3.11 | 13.55 | 2.27 | 0.56 | 5.90 |
06 | 7.91 | 1.83 | 6.04 | 0.16 | 13.28 | 6.40 | 0.18 | 14.23 | 12.24 | 2.91 | 30.38 | 2.91 | 1.12 | 5.85 |
07 | 20.00 | 5.30 | 11.27 | 5.13 | 15.82 | 2.08 | 0.42 | 2.46 | 2.42 | 4.02 | 3.63 | 1.18 | 0.41 | 1.49 |
08 | 39.78 | 1.99 | 31.69 | 0.24 | 127.92 | 8.55 | 0.31 | 31.97 | 47.10 | 2.02 | 184.43 | 2.82 | 0.63 | 11.54 |
09 | 29.01 | 1.32 | 18.28 | 0.21 | 75.78 | 3.08 | 0.54 | 8.97 | 19.91 | 2.14 | 59.62 | 2.28 | 0.43 | 5.37 |
10 | 28.52 | 3.23 | 7.77 | 0.21 | 16.95 | 8.57 | 0.36 | 17.02 | 9.72 | 2.24 | 29.87 | 3.99 | 0.58 | 8.10 |
Avg | 22.01 | 2.50 | 8.75 | 0.76 | 29.45 | 4.49 | 0.33 | 11.57 | 11.31 | 2.27 | 37.23 | 2.72 | 0.82 | 6.71 |
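For context on the error metrics reported in these tables, the absolute trajectory error (ATE) is conventionally the RMSE of per-frame position differences between the estimated and ground-truth trajectories. Below is a minimal sketch under stated assumptions: timestamp association and trajectory alignment (e.g. a Umeyama similarity fit) are omitted, and the function name is ours, not from the paper.

```python
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """RMSE (metres) of per-frame position error between an estimated
    trajectory and ground truth, assuming the two trajectories are
    already time-associated and aligned."""
    est = np.asarray(est_xyz, dtype=float)
    gt = np.asarray(gt_xyz, dtype=float)
    per_frame = np.linalg.norm(est - gt, axis=1)  # Euclidean error per frame
    return float(np.sqrt(np.mean(per_frame ** 2)))

# Toy check: a 1 m error on one of two frames gives RMSE = sqrt(0.5) ≈ 0.707 m
err = ate_rmse([[0, 0, 0], [1, 0, 0]], [[0, 0, 0], [0, 0, 0]])
```

The relative metrics t_rel and r_rel are computed differently on KITTI (drift averaged over fixed-length sub-trajectories of 100–800 m), so ATE and the relative errors can rank methods differently, as the table reflects.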
Method | Seq 01 | Seq 02 | Seq 03 | Seq 04 | Avg |
---|---|---|---|---|---|
ORB-SLAM3 | 22.30 | 8.43 | 13.42 | 25.50 | 17.41 |
LDSO | 70.36 | 117.10 | 53.96 | 24.23 | 66.41 |
D3L-SLAM | 8.72 | 9.42 | 11.09 | 10.21 | 9.86 |
Ablation results on KITTI: for each variant, the upper row is t_rel (%) and the lower row is r_rel (°/100 m); check marks indicate the enabled modules.

Method | DK | Depth | Pose | Line | Metric | 00 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | Avg |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Ours1 | ✓ | | | | t_rel | 2.56 | 3.76 | 4.87 | 1.07 | 1.90 | 7.24 | 1.00 | 7.66 | 2.95 | 5.58 | 3.86 |
 | | | | | r_rel | 0.31 | 0.27 | 0.17 | 0.24 | 0.26 | 0.17 | 0.28 | 0.28 | 0.29 | 0.23 | 0.25 |
Ours2 | ✓ | ✓ | | | t_rel | 2.13 | 4.07 | 2.76 | 2.88 | 2.12 | 3.03 | 1.38 | 3.13 | 2.75 | 4.27 | 2.85 |
 | | | | | r_rel | 0.79 | 1.06 | 1.46 | 0.91 | 0.58 | 1.15 | 0.41 | 0.66 | 0.53 | 0.62 | 0.82 |
Ours3 | ✓ | ✓ | ✓ | | t_rel | 1.98 | 3.70 | 2.58 | 3.53 | 2.15 | 2.57 | 1.59 | 3.03 | 2.67 | 4.27 | 2.81 |
 | | | | | r_rel | 0.63 | 1.01 | 1.43 | 1.16 | 0.54 | 1.07 | 0.69 | 0.65 | 0.51 | 0.62 | 0.83 |
Ours4 | ✓ | ✓ | ✓ | ✓ | t_rel | 2.09 | 3.83 | 2.63 | 3.21 | 2.27 | 2.91 | 1.18 | 2.82 | 2.28 | 3.99 | 2.72 |
 | | | | | r_rel | 0.80 | 1.02 | 1.62 | 1.00 | 0.56 | 1.12 | 0.41 | 0.63 | 0.43 | 0.58 | 0.82 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Qu, H.; Wang, C.; Xu, Y.; Zhang, L.; Hu, X.; Chen, C. D3L-SLAM: A Comprehensive Hybrid Simultaneous Location and Mapping System with Deep Keypoint, Deep Depth, Deep Pose, and Line Detection. Appl. Sci. 2024, 14, 9748. https://doi.org/10.3390/app14219748