Search Results (12)

Search Parameters:
Keywords = monocular VSLAM

45 pages, 5418 KB  
Review
Visual and Visual–Inertial SLAM for UGV Navigation in Unstructured Natural Environments: A Survey of Challenges and Deep Learning Advances
by Tiago Pereira, Carlos Viegas, Salviano Soares and Nuno Ferreira
Robotics 2026, 15(2), 35; https://doi.org/10.3390/robotics15020035 - 2 Feb 2026
Viewed by 1084
Abstract
Localization and mapping remain critical challenges for Unmanned Ground Vehicles (UGVs) operating in unstructured natural environments, such as forests and agricultural fields. While Visual SLAM (VSLAM) and Visual–Inertial SLAM (VI-SLAM) have matured significantly in structured and urban scenarios, their extension to outdoor natural domains introduces severe challenges, including dynamic vegetation, illumination variations, a lack of distinctive features, and degraded GNSS availability. Recent advances in Deep Learning have brought promising developments to VSLAM- and VI-SLAM-based pipelines, ranging from learned feature extraction and matching to self-supervised monocular depth prediction and differentiable end-to-end SLAM frameworks. Furthermore, emerging methods for adaptive sensor fusion, leveraging attention mechanisms and reinforcement learning, open new opportunities to improve robustness by dynamically weighting the contributions of camera and IMU measurements. This review provides a comprehensive overview of Visual and Visual–Inertial SLAM for UGVs in unstructured environments, highlighting the challenges posed by natural contexts and the limitations of current pipelines. Classic VI-SLAM frameworks and recent Deep-Learning-based approaches are systematically reviewed. Special attention is given to field robotics applications in agriculture and forestry, where low-cost sensors and robustness against environmental variability are essential. Finally, open research directions are discussed, including self-supervised representation learning, adaptive sensor confidence models, and scalable low-cost alternatives. By identifying key gaps and opportunities, this work aims to guide future research toward resilient, adaptive, and economically viable VSLAM and VI-SLAM pipelines tailored for UGV navigation in unstructured natural environments.
(This article belongs to the Special Issue Localization and 3D Mapping of Intelligent Robotics)

32 pages, 1435 KB  
Review
Smart Safety Helmets with Integrated Vision Systems for Industrial Infrastructure Inspection: A Comprehensive Review of VSLAM-Enabled Technologies
by Emmanuel A. Merchán-Cruz, Samuel Moveh, Oleksandr Pasha, Reinis Tocelovskis, Alexander Grakovski, Alexander Krainyukov, Nikita Ostrovenecs, Ivans Gercevs and Vladimirs Petrovs
Sensors 2025, 25(15), 4834; https://doi.org/10.3390/s25154834 - 6 Aug 2025
Cited by 3 | Viewed by 5236
Abstract
Smart safety helmets equipped with vision systems are emerging as powerful tools for industrial infrastructure inspection. This paper presents a comprehensive state-of-the-art review of such VSLAM-enabled (Visual Simultaneous Localization and Mapping) helmets. We survey the evolution from basic helmet cameras to intelligent, sensor-fused inspection platforms, highlighting how modern helmets leverage real-time visual SLAM algorithms to map environments and assist inspectors. A systematic literature search was conducted targeting high-impact journals, patents, and industry reports. We classify helmet-integrated camera systems into monocular, stereo, and omnidirectional types and compare their capabilities for infrastructure inspection. We examine core VSLAM algorithms (feature-based, direct, hybrid, and deep-learning-enhanced) and discuss their adaptation to wearable platforms. Multi-sensor fusion approaches integrating inertial, LiDAR, and GNSS data are reviewed, along with edge/cloud processing architectures enabling real-time performance. This paper compiles numerous industrial use cases, from bridges and tunnels to plants and power facilities, demonstrating significant improvements in inspection efficiency, data quality, and worker safety. Key challenges are analyzed, including technical hurdles (battery life, processing limits, and harsh environments), human factors (ergonomics, training, and cognitive load), and regulatory issues (safety certification and data privacy). We also identify emerging trends, such as semantic SLAM, AI-driven defect recognition, hardware miniaturization, and collaborative multi-helmet systems. This review finds that VSLAM-equipped smart helmets offer a transformative approach to infrastructure inspection, enabling real-time mapping, augmented awareness, and safer workflows. We conclude by highlighting current research gaps, notably in standardizing systems and integrating with asset management, and provide recommendations for industry adoption and future research directions.

22 pages, 13198 KB  
Article
UAV Localization in Urban Area Mobility Environment Based on Monocular VSLAM with Deep Learning
by Mutagisha Norbelt, Xiling Luo, Jinping Sun and Uwimana Claude
Drones 2025, 9(3), 171; https://doi.org/10.3390/drones9030171 - 26 Feb 2025
Cited by 9 | Viewed by 2953
Abstract
Unmanned Aerial Vehicles (UAVs) play a major role in different applications, including surveillance, mapping, and disaster relief, particularly in urban environments. This paper presents a comprehensive framework for UAV localization in outdoor environments using monocular ORB-SLAM3 integrated with optical flow and YOLOv5 for enhanced performance. The proposed system addresses the challenges of accurate localization in dynamic outdoor environments where traditional GPS methods may falter. By leveraging the capabilities of ORB-SLAM3, the UAV can effectively map its environment while simultaneously tracking its position using visual information from a single camera. The integration of optical flow techniques allows for accurate motion estimation between consecutive frames, which is critical for maintaining accurate localization amid dynamic changes in the environment. YOLOv5, a highly efficient real-time object detection model, enables the system to identify and classify dynamic objects within the UAV’s field of view. This dual approach of combining optical flow and deep learning enhances the robustness of the localization process by filtering out dynamic features that could otherwise cause mapping errors. Experimental results show that the combination of monocular ORB-SLAM3, optical flow, and YOLOv5 significantly improves localization accuracy and reduces trajectory errors compared to traditional methods. In terms of absolute trajectory error and average tracking time, the proposed approach outperforms ORB-SLAM3 and DynaSLAM; its lower latency and greater accuracy make it especially well suited to real-time SLAM applications in dynamic scenes, improving both reliability and overall efficiency across a variety of scenarios. The framework effectively distinguishes between static and dynamic elements, allowing for more reliable map construction and navigation. The results show that our proposed method (U-SLAM) achieves a considerable decrease of up to 43.47% in APE and 26.47% in RPE on sequence S000, with higher accuracy on sequences containing moving objects and greater motion within the image.
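The dynamic-feature filtering step described above can be sketched in a few lines: given keypoints from the SLAM front-end and bounding boxes from an object detector such as YOLOv5, drop every keypoint that falls inside a box flagged as dynamic. This is a minimal numpy illustration of the idea, not the paper's implementation; the function name and box format are assumptions.

```python
import numpy as np

def filter_dynamic_keypoints(keypoints, boxes):
    """Drop keypoints that fall inside any detected dynamic-object box.

    keypoints : (N, 2) array of (x, y) pixel coordinates.
    boxes     : (M, 4) array of (x1, y1, x2, y2) detection boxes
                (e.g. from an object detector such as YOLOv5).
    Returns the keypoints outside every box, i.e. the static set
    that would be passed on to SLAM tracking.
    """
    keypoints = np.asarray(keypoints, dtype=float)
    if len(boxes) == 0:
        return keypoints
    boxes = np.asarray(boxes, dtype=float)
    x, y = keypoints[:, 0:1], keypoints[:, 1:2]          # (N, 1) each
    inside = ((x >= boxes[:, 0]) & (x <= boxes[:, 2]) &  # (N, M) mask
              (y >= boxes[:, 1]) & (y <= boxes[:, 3]))
    return keypoints[~inside.any(axis=1)]
```

In a full pipeline, the surviving keypoints would feed the tracking thread while the rejected ones are excluded from both pose estimation and map updates.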

21 pages, 4974 KB  
Article
Improving Visual SLAM by Combining SVO and ORB-SLAM2 with a Complementary Filter to Enhance Indoor Mini-Drone Localization under Varying Conditions
by Amin Basiri, Valerio Mariani and Luigi Glielmo
Drones 2023, 7(6), 404; https://doi.org/10.3390/drones7060404 - 19 Jun 2023
Cited by 14 | Viewed by 7630
Abstract
Mini-drones can be used for a variety of tasks, ranging from weather monitoring to package delivery, search and rescue, and recreation. In outdoor scenarios, they leverage the Global Positioning System (GPS) and/or similar systems for localization in order to preserve safety and performance. In indoor scenarios, technologies such as Visual Simultaneous Localization and Mapping (V-SLAM) are used instead. However, further advancements are still required for mini-drone navigation applications, especially under stricter safety requirements. In this research, a novel method for enhancing indoor mini-drone localization performance is proposed. By merging Oriented FAST and Rotated BRIEF SLAM (ORB-SLAM2) and Semi-Direct Monocular Visual Odometry (SVO) via an Adaptive Complementary Filter (ACF), the proposed strategy achieves better position estimates under various conditions (low light in low-surface-texture environments and high flying speed), showing average percentage errors 18.1% and 25.9% smaller than those of ORB-SLAM and SVO against the ground truth.
(This article belongs to the Special Issue Drone-Based Information Fusion to Improve Autonomous Navigation)
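The adaptive complementary fusion idea, weighting two position estimates by how reliable each front-end currently is, can be illustrated with a minimal numpy sketch. The confidence-based weight below (e.g. tracked feature counts) is an assumption for illustration; the paper's ACF details differ.

```python
import numpy as np

def fuse_positions(p_orb, p_svo, conf_orb, conf_svo):
    """Adaptive complementary blend of two position estimates.

    p_orb, p_svo : (3,) position estimates from the two front-ends.
    conf_orb/svo : non-negative confidence scores (e.g. tracked
                   feature counts); the blend weight adapts toward
                   whichever estimator is currently more reliable.
    """
    alpha = conf_orb / (conf_orb + conf_svo + 1e-9)
    return alpha * np.asarray(p_orb, float) + (1.0 - alpha) * np.asarray(p_svo, float)
```

With equal confidences the fused estimate is the midpoint; as one estimator degrades (say, SVO in low light), its weight shrinks and the other dominates.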

29 pages, 9493 KB  
Review
Visual SLAM: What Are the Current Trends and What to Expect?
by Ali Tourani, Hriday Bavle, Jose Luis Sanchez-Lopez and Holger Voos
Sensors 2022, 22(23), 9297; https://doi.org/10.3390/s22239297 - 29 Nov 2022
Cited by 103 | Viewed by 28838
Abstract
In recent years, Simultaneous Localization and Mapping (SLAM) systems have shown significant gains in performance, accuracy, and efficiency. In this regard, Visual Simultaneous Localization and Mapping (VSLAM) methods refer to the SLAM approaches that employ cameras for pose estimation and map reconstruction; they are preferred over Light Detection And Ranging (LiDAR)-based methods due to their lighter weight, lower acquisition costs, and richer environment representation. Hence, several VSLAM approaches have evolved using different camera types (e.g., monocular or stereo), have been tested on various datasets (e.g., Technische Universität München (TUM) RGB-D or European Robotics Challenge (EuRoC)) and in different conditions (i.e., indoors and outdoors), and employ multiple methodologies to better understand their surroundings. These variations have made the topic popular among researchers and have resulted in a wide range of methods. The primary intent of this paper is therefore to assimilate the wide range of works in VSLAM and present their recent advances, along with discussing the existing challenges and trends. This survey gives a big picture of the current focuses of the robotics and VSLAM fields, based on the objectives and solutions of the state of the art. It provides an in-depth literature survey of fifty impactful articles published in the VSLAM domain, classified by different characteristics, including novelty domain, objectives, employed algorithms, and semantic level. The paper also discusses the current trends and contemporary directions of VSLAM techniques that may help researchers investigate them.
(This article belongs to the Special Issue Aerial Robotics: Navigation and Path Planning)

25 pages, 4085 KB  
Article
Lie Group Modelling for an EKF-Based Monocular SLAM Algorithm
by Samy Labsir, Gaël Pages and Damien Vivet
Remote Sens. 2022, 14(3), 571; https://doi.org/10.3390/rs14030571 - 25 Jan 2022
Cited by 9 | Viewed by 5168
Abstract
This paper addresses the problem of monocular Simultaneous Localization And Mapping on Lie groups using fiducial patterns. For that purpose, we propose a reformulation of the classical camera model as a model on matrix Lie groups. Thus, we define an original state vector containing the camera pose and the set of transformations from the world frame to each pattern, which constitutes the map’s state. Each element of the map’s state, as well as the camera pose, is intrinsically constrained to evolve on the matrix Lie group SE(3). Filtering is then performed by an extended Kalman filter dedicated to matrix Lie groups to solve the visual SLAM process (LG-EKF-VSLAM). This algorithm has been evaluated in different scenarios based on simulated as well as real data. The results show that the LG-EKF-VSLAM can improve absolute position and orientation accuracy compared to a classical EKF visual SLAM (EKF-VSLAM).
(This article belongs to the Section Remote Sensing Image Processing)
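The SE(3) constraint at the heart of an LG-EKF can be made concrete via the exponential map, which takes a 6-DoF tangent vector to a valid pose matrix so that filter updates never leave the group. A minimal numpy sketch using the standard Rodrigues and left-Jacobian formulas (a textbook illustration, not the paper's code):

```python
import numpy as np

def so3_exp(w):
    """Rodrigues' formula: map an axis-angle vector w in R^3 to a
    rotation matrix in SO(3)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def se3_exp(w, v):
    """Exponential map of SE(3): rotation via Rodrigues, translation
    via the SO(3) left Jacobian, returned as a 4x4 homogeneous pose."""
    theta = np.linalg.norm(w)
    R = so3_exp(w)
    if theta < 1e-12:
        V = np.eye(3)
    else:
        k = w / theta
        K = np.array([[0, -k[2], k[1]],
                      [k[2], 0, -k[0]],
                      [-k[1], k[0], 0]])
        V = (np.eye(3) + (1 - np.cos(theta)) / theta * K
             + (theta - np.sin(theta)) / theta * (K @ K))
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T
```

An on-manifold EKF applies its correction in the tangent space and retracts with this map, so the estimated pose always satisfies the SE(3) constraints by construction.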

5 pages, 17007 KB  
Proceeding Paper
Visual Simultaneous Localization and Mapping (vSLAM) of Driverless Car in GPS-Denied Areas
by Abira Kanwal, Zunaira Anjum and Wasif Muhammad
Eng. Proc. 2021, 12(1), 49; https://doi.org/10.3390/engproc2021012049 - 29 Dec 2021
Cited by 5 | Viewed by 2101
Abstract
A simultaneous localization and mapping (SLAM) algorithm allows a mobile robot or a driverless car to determine its location in the unknown, dynamic environment in which it is placed, and simultaneously to build a consistent map of that environment. Driverless cars are moving from science fiction to reality, but major technological breakthroughs are still required for their control, guidance, safety, and health-related issues. One problem that remains to be addressed is SLAM for driverless cars in GPS-denied areas, i.e., congested urban areas with large buildings where GPS signals are weak due to the dense infrastructure. Because of poor GPS reception in these areas, there is an immense need to localize and route a driverless car using onboard sensory modalities, e.g., LIDAR, RADAR, etc., without depending on GPS information for navigation and control. Driverless-car SLAM using LIDAR and RADAR involves costly sensors, which is a limitation of this approach. To overcome these limitations, in this article we propose a visual-information-based SLAM (vSLAM) algorithm for GPS-denied areas using a cheap video camera. As a front-end process, feature-based monocular visual odometry (VO) is performed on grayscale input image frames. Random Sample Consensus (RANSAC) refinement and global pose estimation are performed as a back-end process. The results obtained with the proposed approach demonstrate 95% accuracy with a maximum mean error of 4.98.
(This article belongs to the Proceedings of The 1st International Conference on Energy, Power and Environment)
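The RANSAC refinement step mentioned above can be illustrated generically: hypothesize a motion from a minimal sample, count the matches consistent with it, and refit on the best consensus set. The toy below estimates a dominant 2D translation between matched keypoints, a deliberately simplified stand-in for essential-matrix estimation; the function name, iteration count, and threshold are illustrative.

```python
import numpy as np

def ransac_translation(pts_a, pts_b, iters=200, thresh=2.0, seed=0):
    """Toy RANSAC: estimate the dominant 2D translation between
    matched keypoints while rejecting outlier matches.

    pts_a, pts_b : (N, 2) matched pixel coordinates in two frames.
    Returns (translation, inlier_mask).
    """
    rng = np.random.default_rng(seed)
    pts_a = np.asarray(pts_a, float)
    pts_b = np.asarray(pts_b, float)
    diffs = pts_b - pts_a
    best_t, best_inliers = np.zeros(2), np.zeros(len(pts_a), bool)
    for _ in range(iters):
        t = diffs[rng.integers(len(diffs))]           # 1-point hypothesis
        inliers = np.linalg.norm(diffs - t, axis=1) < thresh
        if inliers.sum() > best_inliers.sum():
            best_t, best_inliers = t, inliers
    best_t = diffs[best_inliers].mean(axis=0)         # refit on inliers
    return best_t, best_inliers
```

A real VO back-end would hypothesize an essential matrix from five or eight correspondences instead of a one-point translation, but the hypothesize-score-refit loop is the same.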

15 pages, 15831 KB  
Article
Visual SLAM for Indoor Livestock and Farming Using a Small Drone with a Monocular Camera: A Feasibility Study
by Sander Krul, Christos Pantos, Mihai Frangulea and João Valente
Drones 2021, 5(2), 41; https://doi.org/10.3390/drones5020041 - 19 May 2021
Cited by 80 | Viewed by 14275
Abstract
Real-time data collection and decision making with drones will play an important role in precision livestock and farming. Drones are already being used in precision agriculture; nevertheless, this is not the case for indoor livestock and farming environments due to several challenges and constraints. These indoor environments are limited in physical space, and localization is a problem due to GPS unavailability. Therefore, this work takes a step toward the use of drones for indoor farming and livestock management. To investigate drone positioning in these workspaces, two visual simultaneous localization and mapping (VSLAM) algorithms, LSD-SLAM and ORB-SLAM, were compared using a monocular camera onboard a small drone. Several experiments were carried out in a greenhouse and a dairy farm barn, with the absolute trajectory and relative pose errors being analyzed. ORB-SLAM was found to be the approach that best suits these workspaces. This algorithm was tested by performing waypoint navigation and generating maps of the clustered areas. It was shown that aerial VSLAM can be achieved within these workspaces and that plant and cattle monitoring can benefit from affordable, off-the-shelf drone technology.
(This article belongs to the Special Issue Advances in Civil Applications of Unmanned Aircraft Systems)
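The absolute trajectory error used here to compare LSD-SLAM and ORB-SLAM reduces, once the two trajectories are time-associated and expressed in a common frame, to an RMSE over per-pose position errors. A minimal numpy sketch (the usual rigid alignment step is omitted for brevity):

```python
import numpy as np

def ate_rmse(est, gt):
    """Absolute trajectory error: RMSE of per-pose position errors
    between an estimated and a ground-truth trajectory, assuming the
    two are already time-associated and in the same frame.

    est, gt : (N, 3) arrays of positions.
    """
    err = np.asarray(est, float) - np.asarray(gt, float)
    return float(np.sqrt((np.linalg.norm(err, axis=1) ** 2).mean()))
```

The relative pose error analyzed alongside it is computed the same way, but on the per-step motion increments rather than the absolute positions.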

17 pages, 2508 KB  
Article
DOE-SLAM: Dynamic Object Enhanced Visual SLAM
by Xiao Hu and Jochen Lang
Sensors 2021, 21(9), 3091; https://doi.org/10.3390/s21093091 - 29 Apr 2021
Cited by 10 | Viewed by 6601
Abstract
In this paper, we formulate a novel strategy to adapt monocular-vision-based simultaneous localization and mapping (vSLAM) to dynamic environments. When enough background features can be captured, our system not only tracks the camera trajectory from static background features but also estimates the foreground object motion from object features. When a moving object obstructs too many background features for successful camera tracking from the background alone, our system can exploit the features on the object, together with a prediction of the object motion, to estimate the camera pose. We use various synthetic and real-world test scenarios and the well-known TUM sequences to evaluate the capabilities of our system. The experiments show that we achieve higher pose estimation accuracy and robustness than state-of-the-art monocular vSLAM systems.

14 pages, 892 KB  
Communication
Modelling Software Architecture for Visual Simultaneous Localization and Mapping
by Bhavyansh Mishra, Robert Griffin and Hakki Erhan Sevil
Automation 2021, 2(2), 48-61; https://doi.org/10.3390/automation2020003 - 2 Apr 2021
Cited by 2 | Viewed by 5798
Abstract
Visual simultaneous localization and mapping (VSLAM) is an essential technique used in areas such as robotics and augmented reality for pose estimation and 3D mapping. Research on VSLAM using both monocular and stereo cameras has grown significantly over the last two decades. There is, therefore, a need for a comprehensive review of the evolving architecture of such algorithms in the literature. Although VSLAM algorithm pipelines share similar mathematical backbones, their implementations are individualized, and the ad hoc nature of the interfacing between different modules of VSLAM pipelines complicates code reusability and maintenance. This paper presents a software model for the core components of VSLAM implementations and the interfaces that govern data flow between them, while also attempting to preserve the elements that offer performance improvements over the evolution of VSLAM architectures. The framework presented in this paper employs principles from model-driven engineering (MDE), which are used extensively in the development of large and complicated software systems. The presented VSLAM framework will assist researchers in improving the performance of individual VSLAM modules without having to spend time on integrating those modules into full VSLAM pipelines.
(This article belongs to the Collection Smart Robotics for Automation)
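The kind of module decomposition this abstract argues for can be suggested with abstract interfaces plus a pipeline class that fixes only the data flow between modules. The class and method names below are illustrative assumptions, not the paper's actual software model:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Frame:
    image: np.ndarray          # grayscale input frame
    timestamp: float

@dataclass
class Pose:
    T: np.ndarray              # 4x4 homogeneous camera pose

class FeatureExtractor(ABC):
    @abstractmethod
    def extract(self, frame: Frame) -> np.ndarray: ...

class Tracker(ABC):
    @abstractmethod
    def track(self, features: np.ndarray) -> Optional[Pose]: ...

class Mapper(ABC):
    @abstractmethod
    def update(self, pose: Pose, features: np.ndarray) -> None: ...

class VSLAMPipeline:
    """The data flow is fixed by the interfaces above; concrete
    modules can be swapped without touching the integration code."""
    def __init__(self, extractor: FeatureExtractor, tracker: Tracker,
                 mapper: Mapper):
        self.extractor, self.tracker, self.mapper = extractor, tracker, mapper

    def process(self, frame: Frame) -> Optional[Pose]:
        feats = self.extractor.extract(frame)
        pose = self.tracker.track(feats)
        if pose is not None:
            self.mapper.update(pose, feats)
        return pose
```

Swapping, say, an ORB extractor for a learned one then touches only the class implementing `FeatureExtractor`, which is the reuse-and-maintenance benefit the paper pursues.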

16 pages, 10497 KB  
Article
Evaluation of Several Feature Detectors/Extractors on Underwater Images towards vSLAM
by Franco Hidalgo and Thomas Bräunl
Sensors 2020, 20(15), 4343; https://doi.org/10.3390/s20154343 - 4 Aug 2020
Cited by 31 | Viewed by 5931
Abstract
Modern visual SLAM (vSLAM) algorithms take advantage of computer vision developments in image processing and interest point detection to create maps and trajectories from camera images. Different feature detectors and extractors have been evaluated for this purpose in air and ground environments, but not extensively for underwater scenarios. In this paper, (I) we characterize underwater images, in which light and suspended particles considerably alter what the camera captures, and (II) we evaluate the performance of common interest point detectors and descriptors in a variety of underwater scenes and conditions for vSLAM, in terms of the number of features matched in subsequent video frames, the precision of the descriptors, and the processing time. This research justifies the use of feature detectors in vSLAM for underwater scenarios and presents their challenges and limitations.
(This article belongs to the Special Issue Intelligence and Autonomy for Underwater Robotic Vehicles)
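Counting matches between subsequent frames, one of the evaluation criteria above, typically means brute-force Hamming matching of binary descriptors with a ratio test. A minimal numpy sketch (the 32-byte descriptor size and ratio threshold are illustrative, ORB/BRIEF-style assumptions):

```python
import numpy as np

def match_hamming(desc_a, desc_b, ratio=0.8):
    """Brute-force Hamming matching of binary descriptors with
    Lowe's ratio test.

    desc_a : (N, 32) uint8 descriptor array (frame A).
    desc_b : (M, 32) uint8 descriptor array (frame B).
    Returns a list of (i, j) index pairs passing the ratio test.
    """
    # Pairwise Hamming distance via popcount of XOR-ed bytes.
    xor = desc_a[:, None, :] ^ desc_b[None, :, :]
    dist = np.unpackbits(xor, axis=2).sum(axis=2)
    matches = []
    for i, row in enumerate(dist):
        order = np.argsort(row)
        best = order[0]
        second = order[1] if len(order) > 1 else order[0]
        # Keep only matches clearly better than the runner-up.
        if row[best] < ratio * row[second]:
            matches.append((i, int(best)))
    return matches
```

Timing this loop over a sequence, and checking how many of the surviving pairs are geometrically consistent, gives exactly the match-count, precision, and processing-time numbers such evaluations report.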

17 pages, 3541 KB  
Article
LAP-SLAM: A Line-Assisted Point-Based Monocular VSLAM
by Fukai Zhang, Ting Rui, Chengsong Yang and Jianjun Shi
Electronics 2019, 8(2), 243; https://doi.org/10.3390/electronics8020243 - 20 Feb 2019
Cited by 15 | Viewed by 5637
Abstract
While the performance of state-of-the-art point-based VSLAM (visual simultaneous localization and mapping) systems on well-textured sequences is impressive, their performance in poorly textured situations remains unsatisfactory. A sensible alternative, or addition, is to consider lines. In this paper, we propose a novel line-assisted point-based VSLAM algorithm (LAP-SLAM). Our algorithm uses lines without descriptor matching; the lines are used to assist the computation conducted on points. To the best of our knowledge, this paper proposes a new way to include line information in VSLAM: the basic idea is to use the collinearity of points to optimize a point-based VSLAM algorithm. In LAP-SLAM, we propose a practical algorithm to match lines and compute the collinearity relationships of points, a line-assisted bundle adjustment approach, and a modified perspective-n-point (PnP) approach. We built our system on the architecture and pipeline of ORB-SLAM. We evaluate the proposed method on a diverse range of indoor sequences from the TUM dataset and compare it with point-based and point-line-based methods. The results show that the accuracy of our algorithm is close to that of point-line-based VSLAM systems while running much faster.
(This article belongs to the Section Computer Science & Engineering)
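The collinearity relationship that line-assisted optimization can exploit is easy to test numerically: a set of points is collinear exactly when its centered coordinate matrix has rank at most one. A minimal numpy sketch (the tolerance value is an assumption; this is an illustration of the geometric test, not LAP-SLAM's algorithm):

```python
import numpy as np

def collinear(points, tol=1e-6):
    """Test whether a set of 2D points is collinear by checking that
    the centered point matrix has numerical rank <= 1, i.e. its
    second singular value is (near) zero.

    points : (N, 2) array.
    """
    pts = np.asarray(points, float)
    centered = pts - pts.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return bool(len(s) < 2 or s[1] <= tol * max(s[0], 1.0))
```

A bundle adjustment can then add a penalty on that second singular value (or on point-to-line distances) for feature groups detected as collinear, which is the flavor of constraint the abstract describes.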
