Search Results (47)

Search Parameters:
Keywords = stereo visual SLAM

24 pages, 22793 KB  
Article
GL-VSLAM: A General Lightweight Visual SLAM Approach for RGB-D and Stereo Cameras
by Xu Li, Tuanjie Li, Yulin Zhang, Ziang Li, Lixiang Ban and Yuming Ning
Sensors 2025, 25(24), 7467; https://doi.org/10.3390/s25247467 - 8 Dec 2025
Viewed by 594
Abstract
Feature-based indirect SLAM is more robust than direct SLAM; however, feature extraction and descriptor computation are time-consuming. In this paper, we propose GL-VSLAM, a general lightweight visual SLAM approach designed for RGB-D and stereo cameras. GL-VSLAM utilizes sparse optical flow matching based on uniform motion model prediction to establish keypoint correspondences between consecutive frames, rather than relying on descriptor-based feature matching, thereby achieving high real-time performance. To enhance positioning accuracy, we adopt a coarse-to-fine strategy for pose estimation in two stages. In the first stage, the initial camera pose is estimated using RANSAC PnP based on robust keypoint correspondences from sparse optical flow. In the second stage, the camera pose is further refined by minimizing the reprojection error. Keypoints and descriptors are extracted from keyframes for backend optimization and loop closure detection. We evaluate our system on the TUM and KITTI datasets, as well as in a real-world environment, and compare it with several state-of-the-art methods. Experimental results demonstrate that our method achieves comparable positioning accuracy, while its efficiency is up to twice that of ORB-SLAM2. Full article
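The coarse stage described in this abstract maps naturally onto standard OpenCV primitives. Below is a minimal sketch of that idea, not the authors' implementation; the function name, motion-model interface, window size, and RANSAC threshold are assumptions.

```python
import cv2
import numpy as np

def coarse_pose(prev_gray, curr_gray, prev_kpts, map_points, K, motion):
    """Stage one of a coarse-to-fine scheme: track keypoints with sparse LK optical flow
    seeded by a uniform motion prediction, then estimate the pose with RANSAC PnP.
    prev_kpts: (N, 2) keypoints in the previous frame, map_points: (N, 3) associated
    3D points, K: 3x3 intrinsics, motion: predicted per-keypoint pixel displacement."""
    p0 = prev_kpts.reshape(-1, 1, 2).astype(np.float32)
    guess = (prev_kpts + motion).reshape(-1, 1, 2).astype(np.float32)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, p0, guess,
        winSize=(21, 21), maxLevel=3, flags=cv2.OPTFLOW_USE_INITIAL_FLOW)
    good = status.ravel() == 1
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        map_points[good].astype(np.float64), p1.reshape(-1, 2)[good].astype(np.float64),
        K, None, reprojectionError=2.0)
    # The fine stage would then minimize reprojection error over the returned inliers.
    return ok, rvec, tvec, inliers
```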

17 pages, 2339 KB  
Article
Robust Direct Multi-Camera SLAM in Challenging Scenarios
by Yonglei Pan, Yueshang Zhou, Qiming Qi, Guoyan Wang, Yanwen Jiang, Hongqi Fan and Jun He
Electronics 2025, 14(23), 4556; https://doi.org/10.3390/electronics14234556 - 21 Nov 2025
Viewed by 652
Abstract
Traditional monocular and stereo visual SLAM systems often fail to operate stably in complex unstructured environments (e.g., weakly textured or repetitively textured scenes) due to feature scarcity from their limited fields of view. In contrast, multi-camera systems can effectively overcome the perceptual limitations of monocular or stereo setups by providing broader field-of-view coverage. However, most existing multi-camera visual SLAM systems are primarily feature-based and thus still constrained by the inherent limitations of feature extraction in such environments. To address this issue, a multi-camera visual SLAM framework based on the direct method is proposed. In the front-end, a detector-free matcher named Efficient LoFTR is incorporated, enabling pose estimation through dense pixel associations to improve localization accuracy and robustness. In the back-end, geometric constraints among multiple cameras are integrated, and system localization accuracy is further improved through a joint optimization process. Through extensive experiments on public datasets and a self-built simulation dataset, the proposed method achieves superior performance over state-of-the-art approaches regarding localization accuracy, trajectory completeness, and environmental adaptability, thereby validating its high robustness in complex unstructured environments. Full article
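As background to the direct formulation mentioned above, the following sketch (assumed interfaces, nearest-neighbour sampling, and no robust weighting; not the paper's code) shows the photometric residual a direct method minimizes when warping reference pixels with known depth into the current frame.

```python
import numpy as np

def photometric_residuals(ref_img, cur_img, ref_pts, depths, K, R, t):
    """ref_pts: (N, 2) pixel coordinates in the reference frame with depths (N,);
    R, t: relative pose of the current frame; returns intensity differences."""
    K_inv = np.linalg.inv(K)
    res = []
    for (u, v), z in zip(ref_pts, depths):
        X = z * (K_inv @ np.array([u, v, 1.0]))   # back-project to 3D
        x = K @ (R @ X + t)                       # warp into the current frame
        u2, v2 = x[:2] / x[2]
        ui, vi = int(round(u2)), int(round(v2))   # nearest-neighbour sampling for brevity
        if 0 <= vi < cur_img.shape[0] and 0 <= ui < cur_img.shape[1]:
            res.append(float(cur_img[vi, ui]) - float(ref_img[int(v), int(u)]))
    return np.array(res)
```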

14 pages, 1310 KB  
Article
Stereo-GS: Online 3D Gaussian Splatting Mapping Using Stereo Depth Estimation
by Junkyu Park, Byeonggwon Lee, Sanggi Lee and Soohwan Song
Electronics 2025, 14(22), 4436; https://doi.org/10.3390/electronics14224436 - 14 Nov 2025
Viewed by 1956
Abstract
We present Stereo-GS, a real-time system for online 3D Gaussian Splatting (3DGS) that reconstructs photorealistic 3D scenes from streaming stereo pairs. Unlike prior offline 3DGS methods that require dense multi-view input or precomputed depth, Stereo-GS estimates metrically accurate depth maps directly from rectified stereo geometry, enabling progressive, globally consistent reconstruction. The frontend combines a stereo implementation of DROID-SLAM for robust tracking and keyframe selection with FoundationStereo, a generalizable stereo network that needs no scene-specific fine-tuning. A two-stage filtering pipeline improves depth reliability by removing outliers using a variance-based refinement filter followed by a multi-view consistency check. In the backend, we selectively initialize new Gaussians in under-represented regions flagged by low PSNR during rendering and continuously optimize them via differentiable rendering. To maintain global coherence with minimal overhead, we apply a lightweight rigid alignment after periodic bundle adjustment. On EuRoC and TartanAir, Stereo-GS attains state-of-the-art performance, improving average PSNR by 0.22 dB and 2.45 dB over the best baseline, respectively. Together with superior visual quality, these results show that Stereo-GS delivers high-fidelity, geometrically accurate 3D reconstructions suitable for real-time robotics, navigation, and immersive AR/VR applications. Full article
(This article belongs to the Special Issue Real-Time Computer Vision)
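One plausible reading of the variance-based refinement filter mentioned in the abstract is sketched below; the window size, threshold, and invalid-depth convention are assumptions, not the authors' settings.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def variance_filter(depth, win=5, rel_thresh=0.05):
    """Invalidate depth pixels whose local standard deviation is large relative to the
    local mean, a proxy for unreliable stereo depth near discontinuities."""
    mean = uniform_filter(depth, size=win)
    mean_sq = uniform_filter(depth * depth, size=win)
    std = np.sqrt(np.maximum(mean_sq - mean * mean, 0.0))
    filtered = depth.copy()
    filtered[std >= rel_thresh * np.maximum(mean, 1e-6)] = 0.0  # 0 marks invalid depth
    return filtered
```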

19 pages, 5861 KB  
Article
Topological Signal Processing from Stereo Visual SLAM
by Eleonora Di Salvo, Tommaso Latino, Maria Sanzone, Alessia Trozzo and Stefania Colonnese
Sensors 2025, 25(19), 6103; https://doi.org/10.3390/s25196103 - 3 Oct 2025
Viewed by 719
Abstract
Topological signal processing is emerging alongside Graph Signal Processing (GSP) in various applications, incorporating higher-order connectivity structures—such as faces—in addition to nodes and edges, for enriched connectivity modeling. Rich point clouds acquired by multi-camera systems in Visual Simultaneous Localization and Mapping (V-SLAM) are typically processed using graph-based methods. In this work, we introduce a topological signal processing (TSP) framework that integrates texture information extracted from V-SLAM; we refer to this framework as TSP-SLAM. We show how TSP-SLAM enables the extension of graph-based point cloud processing to more advanced topological signal processing techniques. We demonstrate, on real stereo data, that TSP-SLAM enables a richer point cloud representation by associating signals not only with vertices but also with edges and faces of the mesh computed from the point cloud. Numerical results show that TSP-SLAM supports the design of topological filtering algorithms by exploiting the mapping between the 3D mesh faces, edges and vertices and their 2D image projections. These findings confirm the potential of TSP-SLAM for topological signal processing of point cloud data acquired in challenging V-SLAM environments. Full article
(This article belongs to the Special Issue Stereo Vision Sensing and Image Processing)
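For readers unfamiliar with signals on vertices, edges, and faces, the sketch below shows a generic construction (not taken from the paper, with edge orientation handling simplified) of the incidence matrices from which the Hodge Laplacians used in topological filtering are assembled.

```python
import numpy as np

def boundary_operators(edges, faces, n_vertices):
    """edges: list of (i, j) with i < j; faces: list of (i, j, k) vertex triples whose
    edges all appear in `edges`. Returns B1 (vertices x edges) and B2 (edges x faces)."""
    edge_index = {e: idx for idx, e in enumerate(edges)}
    B1 = np.zeros((n_vertices, len(edges)))
    for idx, (i, j) in enumerate(edges):
        B1[i, idx], B1[j, idx] = -1.0, 1.0
    B2 = np.zeros((len(edges), len(faces)))
    for f_idx, (i, j, k) in enumerate(faces):
        for a, b in ((i, j), (j, k), (i, k)):
            B2[edge_index[(min(a, b), max(a, b))], f_idx] = 1.0  # orientation simplified
    return B1, B2

# The edge (Hodge) Laplacian L1 = B1.T @ B1 + B2 @ B2.T then filters edge signals,
# analogously to the graph Laplacian for vertex signals.
```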

27 pages, 5515 KB  
Article
Optimizing Multi-Camera Mobile Mapping Systems with Pose Graph and Feature-Based Approaches
by Ahmad El-Alailyi, Luca Morelli, Paweł Trybała, Francesco Fassi and Fabio Remondino
Remote Sens. 2025, 17(16), 2810; https://doi.org/10.3390/rs17162810 - 13 Aug 2025
Cited by 1 | Viewed by 2942
Abstract
Multi-camera Visual Simultaneous Localization and Mapping (V-SLAM) increases spatial coverage through multi-view image streams, improving localization accuracy and reducing data acquisition time. Despite its speed and general robustness, V-SLAM often struggles to achieve the precise camera poses necessary for accurate 3D reconstruction, especially in complex environments. This study introduces two novel multi-camera optimization methods to enhance pose accuracy, reduce drift, and ensure loop closures. These methods refine multi-camera V-SLAM outputs within existing frameworks and are evaluated in two configurations: (1) multiple independent stereo V-SLAM instances operating on separate camera pairs; and (2) multi-view odometry processing all camera streams simultaneously. The proposed optimizations include (1) a multi-view feature-based optimization that integrates V-SLAM poses with rigid inter-camera constraints and bundle adjustment; and (2) a multi-camera pose graph optimization that fuses multiple trajectories using relative pose constraints and robust noise models. Validation is conducted through two complex 3D surveys using the ATOM-ANT3D multi-camera fisheye mobile mapping system. Results demonstrate survey-grade accuracy comparable to traditional photogrammetry, with reduced computational time, advancing toward near real-time 3D mapping of challenging environments. Full article
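To make the second optimization concrete, here is an illustrative pose-graph refinement with relative pose constraints on SE(2); the actual system works on SE(3) trajectories with robust noise models, so everything below (state parameterization, anchoring, solver choice) is an assumption.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(x, constraints):
    """x: flattened [x, y, theta] per trajectory node; constraints: list of
    (i, j, dx, dy, dtheta) measured relative poses (e.g., per-camera V-SLAM odometry
    and rigid inter-camera links)."""
    poses = x.reshape(-1, 3)
    res = []
    for i, j, dx, dy, dth in constraints:
        xi, yi, thi = poses[i]
        xj, yj, thj = poses[j]
        c, s = np.cos(thi), np.sin(thi)
        rel = np.array([c * (xj - xi) + s * (yj - yi),      # pose j expressed in frame i
                        -s * (xj - xi) + c * (yj - yi),
                        (thj - thi + np.pi) % (2 * np.pi) - np.pi])
        res.extend(rel - np.array([dx, dy, dth]))
    res.extend(poses[0])  # anchor the first pose to fix the gauge freedom
    return np.array(res)

# sol = least_squares(residuals, initial_poses.ravel(), args=(constraints,))
```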

32 pages, 1435 KB  
Review
Smart Safety Helmets with Integrated Vision Systems for Industrial Infrastructure Inspection: A Comprehensive Review of VSLAM-Enabled Technologies
by Emmanuel A. Merchán-Cruz, Samuel Moveh, Oleksandr Pasha, Reinis Tocelovskis, Alexander Grakovski, Alexander Krainyukov, Nikita Ostrovenecs, Ivans Gercevs and Vladimirs Petrovs
Sensors 2025, 25(15), 4834; https://doi.org/10.3390/s25154834 - 6 Aug 2025
Cited by 1 | Viewed by 4793
Abstract
Smart safety helmets equipped with vision systems are emerging as powerful tools for industrial infrastructure inspection. This paper presents a comprehensive state-of-the-art review of such VSLAM-enabled (Visual Simultaneous Localization and Mapping) helmets. We surveyed the evolution from basic helmet cameras to intelligent, sensor-fused inspection platforms, highlighting how modern helmets leverage real-time visual SLAM algorithms to map environments and assist inspectors. A systematic literature search was conducted targeting high-impact journals, patents, and industry reports. We classify helmet-integrated camera systems into monocular, stereo, and omnidirectional types and compare their capabilities for infrastructure inspection. We examine core VSLAM algorithms (feature-based, direct, hybrid, and deep-learning-enhanced) and discuss their adaptation to wearable platforms. Multi-sensor fusion approaches integrating inertial, LiDAR, and GNSS data are reviewed, along with edge/cloud processing architectures enabling real-time performance. This paper compiles numerous industrial use cases, from bridges and tunnels to plants and power facilities, demonstrating significant improvements in inspection efficiency, data quality, and worker safety. Key challenges are analyzed, including technical hurdles (battery life, processing limits, and harsh environments), human factors (ergonomics, training, and cognitive load), and regulatory issues (safety certification and data privacy). We also identify emerging trends, such as semantic SLAM, AI-driven defect recognition, hardware miniaturization, and collaborative multi-helmet systems. This review finds that VSLAM-equipped smart helmets offer a transformative approach to infrastructure inspection, enabling real-time mapping, augmented awareness, and safer workflows. We conclude by highlighting current research gaps, notably in standardizing systems and integrating with asset management, and provide recommendations for industry adoption and future research directions. Full article

25 pages, 4682 KB  
Article
Visual Active SLAM Method Considering Measurement and State Uncertainty for Space Exploration
by Yao Zhao, Zhi Xiong, Jingqi Wang, Lin Zhang and Pascual Campoy
Aerospace 2025, 12(7), 642; https://doi.org/10.3390/aerospace12070642 - 20 Jul 2025
Cited by 1 | Viewed by 1334
Abstract
This paper presents a visual active SLAM method considering measurement and state uncertainty for space exploration in urban search and rescue environments. An uncertainty evaluation method based on the Fisher Information Matrix (FIM) is studied from the perspective of evaluating the localization uncertainty of SLAM systems. With the aid of the Fisher Information Matrix, the Cramér–Rao Lower Bound (CRLB) of the pose uncertainty in the stereo visual SLAM system is derived to describe the boundary of the pose uncertainty. Optimality criteria are introduced to quantitatively evaluate the localization uncertainty. An odometry information selection method and a local bundle adjustment information selection method based on Fisher Information are proposed to select measurements with low uncertainty for localization and mapping in the search and rescue process. By adopting these methods, the computing efficiency of the system is improved while the localization accuracy remains equivalent to that of the classical ORB-SLAM2. Moreover, using the quantified uncertainty of local poses and map points, the generalized unary node and generalized unary edge are defined to improve the computational efficiency of computing local state uncertainty. In addition, an active loop closing planner that considers local state uncertainty is proposed to exploit uncertainty in assisting the space exploration and decision-making of the MAV, which benefits MAV localization performance in search and rescue environments. Simulations and field tests in different challenging scenarios are conducted to verify the effectiveness of the proposed method. Full article
(This article belongs to the Section Aeronautics)
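The uncertainty bookkeeping the abstract describes can be summarized as follows: the Fisher Information Matrix is accumulated from measurement Jacobians, its inverse bounds the pose covariance (CRLB), and an optimality criterion turns it into a scalar score for measurement selection. The sketch below is a generic illustration under assumed interfaces, with D-optimality chosen as one example criterion.

```python
import numpy as np

def pose_fim(jacobians, meas_cov):
    """Accumulate I = sum_k J_k^T * Sigma^-1 * J_k over measurements, with J_k the
    2x6 reprojection Jacobian of measurement k with respect to the 6-DoF pose."""
    info = np.zeros((6, 6))
    w = np.linalg.inv(meas_cov)
    for J in jacobians:
        info += J.T @ w @ J
    return info

def d_optimality(info):
    """n-th root of det(FIM); larger values mean a tighter CRLB, i.e. lower pose
    uncertainty, so measurements can be ranked by their information contribution."""
    sign, logdet = np.linalg.slogdet(info)
    return np.exp(logdet / info.shape[0]) if sign > 0 else 0.0
```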

32 pages, 2740 KB  
Article
Vision-Based Navigation and Perception for Autonomous Robots: Sensors, SLAM, Control Strategies, and Cross-Domain Applications—A Review
by Eder A. Rodríguez-Martínez, Wendy Flores-Fuentes, Farouk Achakir, Oleg Sergiyenko and Fabian N. Murrieta-Rico
Eng 2025, 6(7), 153; https://doi.org/10.3390/eng6070153 - 7 Jul 2025
Cited by 8 | Viewed by 12079
Abstract
Camera-centric perception has matured into a cornerstone of modern autonomy, from self-driving cars and factory cobots to underwater and planetary exploration. This review synthesizes more than a decade of progress in vision-based robotic navigation through an engineering lens, charting the full pipeline from sensing to deployment. We first examine the expanding sensor palette—monocular and multi-camera rigs, stereo and RGB-D devices, LiDAR–camera hybrids, event cameras, and infrared systems—highlighting the complementary operating envelopes and the rise of learning-based depth inference. The advances in visual localization and mapping are then analyzed, contrasting sparse and dense SLAM approaches, as well as monocular, stereo, and visual–inertial formulations. Additional topics include loop closure, semantic mapping, and LiDAR–visual–inertial fusion, which enables drift-free operation in dynamic environments. Building on these foundations, we review the navigation and control strategies, spanning classical planning, reinforcement and imitation learning, hybrid topological–metric memories, and emerging visual language guidance. Application case studies—autonomous driving, industrial manipulation, autonomous underwater vehicles, planetary rovers, aerial drones, and humanoids—demonstrate how tailored sensor suites and algorithms meet domain-specific constraints. Finally, the future research trajectories are distilled: generative AI for synthetic training data and scene completion; high-density 3D perception with solid-state LiDAR and neural implicit representations; event-based vision for ultra-fast control; and human-centric autonomy in next-generation robots. By providing a unified taxonomy, a comparative analysis, and engineering guidelines, this review aims to inform researchers and practitioners designing robust, scalable, vision-driven robotic systems. Full article
(This article belongs to the Special Issue Interdisciplinary Insights in Engineering Research)

26 pages, 14214 KB  
Article
Stereo Visual Odometry and Real-Time Appearance-Based SLAM for Mapping and Localization in Indoor and Outdoor Orchard Environments
by Imran Hussain, Xiongzhe Han and Jong-Woo Ha
Agriculture 2025, 15(8), 872; https://doi.org/10.3390/agriculture15080872 - 16 Apr 2025
Cited by 4 | Viewed by 5099
Abstract
Agricultural robots can mitigate labor shortages and advance precision farming. However, the dense vegetation canopies and uneven terrain in orchard environments reduce the reliability of traditional GPS-based localization, thereby reducing navigation accuracy and making autonomous navigation challenging. Moreover, inefficient path planning and an increased risk of collisions affect the robot’s ability to perform tasks such as fruit harvesting, spraying, and monitoring. To address these limitations, this study integrated stereo visual odometry with real-time appearance-based mapping (RTAB-Map)-based simultaneous localization and mapping (SLAM) to improve mapping and localization in both indoor and outdoor orchard settings. The proposed system leverages stereo image pairs for precise depth estimation while utilizing RTAB-Map’s graph-based SLAM framework with loop-closure detection to ensure global map consistency. In addition, an incorporated inertial measurement unit (IMU) enhances pose estimation, thereby improving localization accuracy. Substantial improvements in both mapping and localization performance over the traditional approach were demonstrated, with an average error of 0.018 m against the ground truth for outdoor mapping and a consistent average error of 0.03 m for indoor trials, with a 20.7% reduction in visual odometry trajectory deviation compared to traditional methods. Localization performance remained robust across diverse conditions, with a low RMSE of 0.207 m. Our approach provides critical insights into developing more reliable autonomous navigation systems for agricultural robots. Full article
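For reference, trajectory errors of the kind reported above are typically computed as an absolute trajectory RMSE between associated estimated and ground-truth positions; a minimal sketch is given below (timestamp association and alignment omitted, so this is illustrative rather than the study's evaluation code).

```python
import numpy as np

def trajectory_rmse(estimated, ground_truth):
    """estimated, ground_truth: (N, 3) arrays of associated positions in metres."""
    err = np.linalg.norm(estimated - ground_truth, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```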

25 pages, 27528 KB  
Article
A Stereo Visual-Inertial SLAM Algorithm with Point-Line Fusion and Semantic Optimization for Forest Environments
by Bo Liu, Hongwei Liu, Yanqiu Xing, Weishu Gong, Shuhang Yang, Hong Yang, Kai Pan, Yuanxin Li, Yifei Hou and Shiqing Jia
Forests 2025, 16(2), 335; https://doi.org/10.3390/f16020335 - 13 Feb 2025
Cited by 1 | Viewed by 2377
Abstract
Accurately localizing individual trees and identifying species distribution are critical tasks in forestry remote sensing. Visual Simultaneous Localization and Mapping (visual SLAM) algorithms serve as important tools for outdoor spatial positioning and mapping, mitigating signal loss caused by tree canopy obstructions. To address these challenges, a semantic SLAM algorithm called LPD-SLAM (Line-Point-Distance Semantic SLAM) is proposed, which integrates stereo cameras with an inertial measurement unit (IMU), with contributions including dynamic feature removal, an individual tree data structure, and semantic point distance constraints. LPD-SLAM is capable of performing individual tree localization and tree species discrimination tasks in forest environments. In mapping, LPD-SLAM reduces false species detection and filters dynamic objects by leveraging a deep learning model and a novel individual tree data structure. In optimization, LPD-SLAM incorporates point and line feature reprojection error constraints along with semantic point distance constraints, which improve robustness and accuracy by introducing additional geometric constraints. Due to the lack of publicly available forest datasets, we choose to validate the proposed algorithm on eight experimental plots, which are selected to cover different seasons, various tree species, and different data collection paths, ensuring the dataset’s diversity and representativeness. The experimental results indicate that the average root mean square error (RMSE) of the trajectories of LPD-SLAM is reduced by up to 81.2% compared with leading algorithms. Meanwhile, the mean absolute error (MAE) of LPD-SLAM in tree localization is 0.24 m, which verifies its excellent performance in forest environments. Full article
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)
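The point and line reprojection constraints named in the abstract reduce to two standard residuals; a generic sketch follows (pinhole projection assumed, no semantic distance term, not the authors' code).

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of a 3D point X under pose (R, t) and intrinsics K."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

def point_residual(K, R, t, X, observed_uv):
    return project(K, R, t, X) - observed_uv

def line_residual(K, R, t, endpoints3d, line2d):
    """line2d: homogeneous 2D line (a, b, c) with a^2 + b^2 = 1; residuals are the
    signed distances of the projected 3D endpoints to the detected 2D line."""
    return np.array([line2d @ np.append(project(K, R, t, X), 1.0) for X in endpoints3d])
```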

17 pages, 7941 KB  
Article
Visual Localization Domain for Accurate V-SLAM from Stereo Cameras
by Eleonora Di Salvo, Sara Bellucci, Valeria Celidonio, Ilaria Rossini, Stefania Colonnese and Tiziana Cattai
Sensors 2025, 25(3), 739; https://doi.org/10.3390/s25030739 - 26 Jan 2025
Cited by 4 | Viewed by 2124
Abstract
Trajectory estimation from stereo image sequences remains a fundamental challenge in Visual Simultaneous Localization and Mapping (V-SLAM). To address this, we propose a novel approach that focuses on the identification and matching of keypoints within a transformed domain that emphasizes visually significant features. Specifically, we propose to perform V-SLAM in a VIsual Localization Domain (VILD), i.e., a domain where visually relevant features are suitably represented for analysis and tracking. This transformed domain adheres to information-theoretic principles, enabling a maximum likelihood estimation of rotation, translation, and scaling parameters by minimizing the distance between the coefficients of the observed image and those of a reference template. The transformed coefficients are obtained from the output of specialized Circular Harmonic Function (CHF) filters of varying orders. Leveraging this property, we employ a first-order approximation of the image-series representation, directly computing the first-order coefficients through the application of first-order CHF filters. The proposed VILD provides a theoretically grounded and visually relevant representation of the image. We utilize VILD for point matching and tracking across the stereo video sequence. The experimental results on real-world video datasets demonstrate that integrating visually-driven filtering significantly improves trajectory estimation accuracy compared to traditional tracking performed in the spatial domain. Full article
(This article belongs to the Special Issue Emerging Advances in Wireless Positioning and Location-Based Services)
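As one concrete (assumed) instance of a first-order Circular Harmonic Function filter, the sketch below builds a polar-separable kernel with angular factor exp(i*theta) and a Gaussian-weighted radial profile, and convolves it with an image to obtain complex first-order coefficients; the specific radial profile and normalization are not taken from the paper.

```python
import numpy as np
from scipy.signal import fftconvolve

def first_order_chf_kernel(size=15, sigma=3.0):
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)
    kernel = r * np.exp(-r ** 2 / (2 * sigma ** 2)) * np.exp(1j * theta)
    return kernel / np.abs(kernel).sum()

def chf_coefficients(image):
    """Complex first-order coefficients; magnitude and phase can feed matching and tracking."""
    return fftconvolve(image.astype(float), first_order_chf_kernel(), mode='same')
```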

24 pages, 31029 KB  
Article
InCrowd-VI: A Realistic Visual–Inertial Dataset for Evaluating Simultaneous Localization and Mapping in Indoor Pedestrian-Rich Spaces for Human Navigation
by Marziyeh Bamdad, Hans-Peter Hutter and Alireza Darvishy
Sensors 2024, 24(24), 8164; https://doi.org/10.3390/s24248164 - 21 Dec 2024
Cited by 1 | Viewed by 4028
Abstract
Simultaneous localization and mapping (SLAM) techniques can support navigation for visually impaired people, but the development of robust SLAM solutions for crowded spaces is limited by the lack of realistic datasets. To address this, we introduce InCrowd-VI, a novel visual–inertial dataset specifically designed for human navigation in indoor pedestrian-rich environments. Recorded using Meta Aria Project glasses, it captures realistic scenarios without environmental control. InCrowd-VI features 58 sequences totaling 5 km of trajectory length and 1.5 h of recording time, including RGB, stereo images, and IMU measurements. The dataset captures important challenges such as pedestrian occlusions, varying crowd densities, complex layouts, and lighting changes. Ground-truth trajectories, accurate to approximately 2 cm, are provided in the dataset, originating from the Meta Aria project machine perception SLAM service. In addition, a semi-dense 3D point cloud of scenes is provided for each sequence. The evaluation of state-of-the-art visual odometry (VO) and SLAM algorithms on InCrowd-VI revealed severe performance limitations in these realistic scenarios. Under challenging conditions, systems exceeded the required localization accuracy of 0.5 m and the 1% drift threshold, with classical methods showing drift of up to 5–10%. While deep learning-based approaches maintained high pose estimation coverage (>90%), they failed to achieve the real-time processing speeds necessary for walking-pace navigation. These results demonstrate the need and value of a new dataset to advance SLAM research for visually impaired navigation in complex indoor environments. Full article
(This article belongs to the Section Sensors and Robotics)
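The drift figures quoted above are conventionally expressed as the final-position error divided by the travelled path length; the helper below illustrates that metric (alignment and timestamp association omitted, so it is an assumption about the exact protocol).

```python
import numpy as np

def drift_percent(estimated, ground_truth):
    """estimated, ground_truth: (N, 3) aligned positions; drift as percent of path length."""
    path_len = np.sum(np.linalg.norm(np.diff(ground_truth, axis=0), axis=1))
    end_err = np.linalg.norm(estimated[-1] - ground_truth[-1])
    return 100.0 * end_err / path_len
```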

21 pages, 3646 KB  
Article
D3L-SLAM: A Comprehensive Hybrid Simultaneous Location and Mapping System with Deep Keypoint, Deep Depth, Deep Pose, and Line Detection
by Hao Qu, Congrui Wang, Yangfan Xu, Lilian Zhang, Xiaoping Hu and Changhao Chen
Appl. Sci. 2024, 14(21), 9748; https://doi.org/10.3390/app14219748 - 25 Oct 2024
Viewed by 2810
Abstract
Robust localization and mapping are crucial for autonomous systems, but traditional handcrafted feature-based visual SLAM often struggles in challenging, textureless environments. Additionally, monocular SLAM lacks scale-aware depth perception, making accurate scene scale estimation difficult. To address these issues, we propose D3L-SLAM, a novel monocular SLAM system that integrates deep keypoints, deep depth estimates, deep pose priors, and a line detector. By leveraging deep keypoints, which are more resilient to lighting variations, our system improves the robustness of visual SLAM. We further enhance perception in low-texture areas by incorporating line features in the front-end and mitigate scale degradation with learned depth estimates. Additionally, point-line feature constraints optimize pose estimation and mapping through a tightly coupled point-line bundle adjustment (BA). The learned pose estimates refine the feature matching process during tracking, leading to more accurate localization and mapping. Experimental results on public and self-collected datasets show that D3L-SLAM significantly outperforms both traditional and learning-based visual SLAM methods in localization accuracy. Full article

17 pages, 3301 KB  
Article
Stereo and LiDAR Loosely Coupled SLAM Constrained Ground Detection
by Tian Sun, Lei Cheng, Ting Zhang, Xiaoping Yuan, Yanzheng Zhao and Yong Liu
Sensors 2024, 24(21), 6828; https://doi.org/10.3390/s24216828 - 24 Oct 2024
Cited by 1 | Viewed by 2023
Abstract
In many robotic applications, creating a map is crucial, and 3D maps provide a method for estimating the positions of other objects or obstacles. Most of the previous research processes 3D point clouds through projection-based or voxel-based models, but both approaches have certain limitations. This paper proposes a hybrid localization and mapping method using stereo vision and LiDAR. Unlike the traditional single-sensor systems, we construct a pose optimization model by matching ground information between LiDAR maps and visual images. We use stereo vision to extract ground information and fuse it with LiDAR tensor voting data to establish coplanarity constraints. Pose optimization is achieved through a graph-based optimization algorithm and a local window optimization method. The proposed method is evaluated using the KITTI dataset and compared against the ORB-SLAM3, F-LOAM, LOAM, and LeGO-LOAM methods. Additionally, we generate 3D point cloud maps for the corresponding sequences and high-definition point cloud maps of the streets in sequence 00. The experimental results demonstrate significant improvements in trajectory accuracy and robustness, enabling the construction of clear, dense 3D maps. Full article
(This article belongs to the Section Navigation and Positioning)
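The coplanarity constraint between ground information from the two sensors can be illustrated with a plain least-squares plane fit and point-to-plane residuals; the sketch below is generic (no tensor voting, no graph optimization) and only conveys the idea.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through (N, 3) points: unit normal n and offset d with n·p + d = 0."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    n = vt[-1]
    return n, float(-n @ centroid)

def coplanarity_residuals(lidar_ground, visual_ground):
    """Signed distances of visually detected ground points to the LiDAR ground plane,
    suitable as residuals in a pose optimization."""
    n, d = fit_plane(lidar_ground)
    return visual_ground @ n + d
```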

21 pages, 12275 KB  
Article
Segmentation Point Simultaneous Localization and Mapping: A Stereo Vision Simultaneous Localization and Mapping Method for Unmanned Surface Vehicles in Nearshore Environments
by Xiujing Gao, Xinzhi Lin, Fanchao Lin and Hongwu Huang
Electronics 2024, 13(16), 3106; https://doi.org/10.3390/electronics13163106 - 6 Aug 2024
Cited by 4 | Viewed by 2023
Abstract
Unmanned surface vehicles (USVs) in nearshore areas are prone to environmental occlusion and electromagnetic interference, which can lead to the failure of traditional satellite-positioning methods. This paper utilizes a visual simultaneous localization and mapping (vSLAM) method to achieve USV positioning in nearshore environments. To address the issues of uneven feature distribution, erroneous depth information, and frequent viewpoint jitter in the visual data of USVs operating in nearshore environments, we propose a stereo vision SLAM system tailored for nearshore conditions: SP-SLAM (Segmentation Point-SLAM). This method is based on ORB-SLAM2 and incorporates a distance segmentation module, which filters feature points from different regions and adaptively adjusts the impact of outliers on iterative optimization, reducing the influence of erroneous depth information on motion scale estimation in open environments. Additionally, our method uses the Sum of Absolute Differences (SAD) for matching image blocks and quadratic interpolation to obtain more accurate depth information, constructing a complete map. The experimental results on the USVInland dataset show that SP-SLAM solves the scaling constraint failure problem in nearshore environments and significantly improves the robustness of the stereo SLAM system in such environments. Full article
(This article belongs to the Section Electrical and Autonomous Vehicles)
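The SAD block matching with quadratic sub-pixel interpolation referenced above can be sketched as follows; the window size, disparity range, and border handling are assumptions, not the authors' implementation.

```python
import numpy as np

def sad_disparity(left, right, x, y, win=5, max_disp=64):
    """Disparity at pixel (x, y) of the rectified left image (assumes x >= max_disp + win//2):
    minimize the SAD over candidate shifts, then refine with a parabola through the
    three costs around the minimum (quadratic interpolation)."""
    h = win // 2
    patch = left[y - h:y + h + 1, x - h:x + h + 1].astype(float)
    costs = np.array([
        np.abs(patch - right[y - h:y + h + 1, x - d - h:x - d + h + 1].astype(float)).sum()
        for d in range(max_disp)])
    d = int(np.argmin(costs))
    if 0 < d < max_disp - 1:
        c0, c1, c2 = costs[d - 1], costs[d], costs[d + 1]
        denom = c0 - 2 * c1 + c2
        if denom > 0:
            d += 0.5 * (c0 - c2) / denom  # sub-pixel offset around the minimum
    return d
```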
