Open Access Article
A Survey of Visual SLAM Based on RGB-D Images Using Deep Learning and Comparative Study for VOE
by Van-Hung Le * and Thi-Ha-Phuong Nguyen
Information Technology Department, Tan Trao University, Tuyen Quang City 22000, Vietnam
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(7), 394; https://doi.org/10.3390/a18070394
Submission received: 21 April 2025 / Revised: 13 June 2025 / Accepted: 23 June 2025 / Published: 27 June 2025
Abstract
Visual simultaneous localization and mapping (Visual SLAM) based on RGB-D image data comprises two main tasks: building a map of the environment and simultaneously tracking the camera's position and movement through visual odometry estimation (VOE). Visual SLAM and VOE are used in many applications, such as robot systems, autonomous mobile robots, assistance systems for the blind, human–machine interaction, and industry. To solve the computer vision problems in Visual SLAM and VOE from RGB-D images, deep learning (DL) is an approach that gives very convincing results. This manuscript examines the results, advantages, difficulties, and challenges of Visual SLAM and VOE based on DL. We propose a taxonomy for a complete survey based on three ways of constructing Visual SLAM and VOE from RGB-D images: (1) using DL for the modules of Visual SLAM and VOE systems; (2) using DL to supplement the modules of Visual SLAM and VOE systems; and (3) using end-to-end DL to build Visual SLAM and VOE systems. A total of 220 scientific publications on Visual SLAM, VOE, and related issues were surveyed and organized by method, dataset, evaluation measure, and detailed results. In particular, for studies using DL to build Visual SLAM and VOE systems, we analyze their challenges, advantages, and disadvantages. We also propose and publish the TQU-SLAM benchmark dataset, and we perform a comparative study on fine-tuning the VOE model using a Multi-Layer Fusion network (MLF-VO) framework. The VOE errors on the TQU-SLAM benchmark dataset range from 16.97 m to 57.61 m, a huge error compared to VOE methods on the KITTI, TUM RGB-D SLAM, and ICL-NUIM datasets. Therefore, the dataset we publish is very challenging, especially in the opposite direction (OP-D) of data collection and annotation. The results of the comparative study are also presented in detail and made available.
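The trajectory errors reported above (16.97 m to 57.61 m) are typically computed as an absolute trajectory error (ATE) between the estimated and ground-truth camera paths. As a minimal sketch of this kind of measure (not the paper's exact evaluation pipeline; the function name and the simple translation-only alignment are assumptions for illustration), the RMSE over per-pose position errors can be computed like this:

```python
import numpy as np

def ate_rmse(gt, est):
    """Absolute trajectory error (RMSE) between two time-synchronized
    trajectories given as (N, 3) arrays of camera positions in meters."""
    gt = np.asarray(gt, dtype=float)
    est = np.asarray(est, dtype=float)
    # Align by removing the mean offset (translation-only alignment;
    # full evaluation pipelines usually apply a rigid-body/Umeyama
    # SE(3) alignment before computing the error).
    est_aligned = est - est.mean(axis=0) + gt.mean(axis=0)
    errors = np.linalg.norm(gt - est_aligned, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))

# Toy example: the estimate drifts by a constant 3 m along x,
# which translation alignment cancels entirely.
gt = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0]], dtype=float)
est = gt + np.array([3.0, 0.0, 0.0])
print(ate_rmse(gt, est))  # → 0.0
```

Non-constant drift is not cancelled by the alignment and shows up in the RMSE, which is why methods that drift over a long corridor sequence can accumulate tens of meters of ATE.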