Search Results (71)

Search Parameters:
Keywords = stereo SLAM

20 pages, 5083 KB  
Article
MDR–SLAM: Robust 3D Mapping in Low-Texture Scenes with a Decoupled Approach and Temporal Filtering
by Kailin Zhang and Letao Zhou
Electronics 2025, 14(24), 4864; https://doi.org/10.3390/electronics14244864 - 10 Dec 2025
Viewed by 355
Abstract
Realizing real-time dense 3D reconstruction on resource-limited mobile platforms remains a significant challenge, particularly in low-texture environments that demand robust multi-frame fusion to resolve matching ambiguities. However, the inherent tight coupling of pose estimation and mapping in traditional monolithic SLAM architectures imposes a severe restriction on integrating high-complexity fusion algorithms without compromising tracking stability. To overcome these limitations, this paper proposes MDR–SLAM, a modular and fully decoupled stereo framework. The system features a novel keyframe-driven temporal filter that synergizes efficient ELAS stereo matching with Kalman filtering to effectively accumulate geometric constraints, thereby enhancing reconstruction density in textureless areas. Furthermore, a confidence-based fusion backend is employed to incrementally maintain global map consistency and filter outliers. Quantitative evaluation on the NUFR-M3F indoor dataset demonstrates the effectiveness of the proposed method: compared to the standard single-frame baseline, MDR–SLAM reduces map RMSE by 83.3% (to 0.012 m) and global trajectory drift by 55.6%, while significantly improving map completeness. The system operates entirely on CPU resources with a stable 4.7 Hz mapping frequency, verifying its suitability for embedded mobile robotics. Full article
(This article belongs to the Special Issue Recent Advance of Auto Navigation in Indoor Scenarios)
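The keyframe-driven temporal filter described above is, at its core, a per-pixel Kalman update over successive stereo depth measurements reprojected into a keyframe. A minimal sketch of that idea, assuming scalar per-pixel states and a fixed measurement variance (both assumptions of this sketch, not details from the paper):

```python
# Per-pixel temporal depth fusion with a scalar Kalman filter (illustrative sketch).
import numpy as np

def fuse_depth(depth_est, depth_var, depth_meas, meas_var):
    """One Kalman update per pixel: fold a new stereo depth measurement
    (e.g., from ELAS, reprojected into the keyframe) into the running estimate."""
    valid = np.isfinite(depth_meas)
    gain = np.where(valid, depth_var / (depth_var + meas_var), 0.0)
    new_est = depth_est + gain * np.where(valid, depth_meas - depth_est, 0.0)
    new_var = (1.0 - gain) * depth_var
    return new_est, new_var

# Usage with synthetic data: two noisy measurements of a 2 m-deep plane.
est = np.zeros((480, 640))
var = np.full((480, 640), 1e6)   # large prior variance: the first valid measurement dominates
for depth_map in (np.full((480, 640), 2.02), np.full((480, 640), 1.97)):
    est, var = fuse_depth(est, var, depth_map, meas_var=0.05 ** 2)
```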

24 pages, 22793 KB  
Article
GL-VSLAM: A General Lightweight Visual SLAM Approach for RGB-D and Stereo Cameras
by Xu Li, Tuanjie Li, Yulin Zhang, Ziang Li, Lixiang Ban and Yuming Ning
Sensors 2025, 25(24), 7467; https://doi.org/10.3390/s25247467 - 8 Dec 2025
Viewed by 508
Abstract
Feature-based indirect SLAM is more robust than direct SLAM; however, feature extraction and descriptor computation are time-consuming. In this paper, we propose GL-VSLAM, a general lightweight visual SLAM approach designed for RGB-D and stereo cameras. GL-VSLAM utilizes sparse optical flow matching based on uniform motion model prediction to establish keypoint correspondences between consecutive frames, rather than relying on descriptor-based feature matching, thereby achieving high real-time performance. To enhance positioning accuracy, we adopt a coarse-to-fine strategy for pose estimation in two stages. In the first stage, the initial camera pose is estimated using RANSAC PnP based on robust keypoint correspondences from sparse optical flow. In the second stage, the camera pose is further refined by minimizing the reprojection error. Keypoints and descriptors are extracted from keyframes for backend optimization and loop closure detection. We evaluate our system on the TUM and KITTI datasets, as well as in a real-world environment, and compare it with several state-of-the-art methods. Experimental results demonstrate that our method achieves comparable positioning accuracy, while its efficiency is up to twice that of ORB-SLAM2. Full article
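The coarse stage of the pose pipeline, sparse LK optical flow instead of descriptor matching followed by RANSAC PnP, can be sketched with standard OpenCV calls. The window size, pyramid levels, and reprojection threshold below are illustrative assumptions:

```python
# Coarse pose: track keypoints with pyramidal LK flow, then RANSAC PnP (illustrative sketch).
import cv2
import numpy as np

def coarse_pose(prev_gray, cur_gray, prev_pts_2d, pts_3d, K):
    """prev_pts_2d: Nx2 pixel coords in the previous frame.
    pts_3d: Nx3 associated map points (same order). K: 3x3 camera intrinsics."""
    p0 = prev_pts_2d.reshape(-1, 1, 2).astype(np.float32)
    # Sparse LK flow replaces descriptor matching between consecutive frames.
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, p0, None,
                                             winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    obj = pts_3d[ok].astype(np.float32)
    img = p1.reshape(-1, 2)[ok].astype(np.float32)
    # RANSAC PnP gives the initial pose; the second (fine) stage would refine it
    # by minimizing reprojection error on the inlier set.
    _, rvec, tvec, inliers = cv2.solvePnPRansac(obj, img, K, None,
                                                reprojectionError=2.0)
    return rvec, tvec, inliers
```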

17 pages, 2339 KB  
Article
Robust Direct Multi-Camera SLAM in Challenging Scenarios
by Yonglei Pan, Yueshang Zhou, Qiming Qi, Guoyan Wang, Yanwen Jiang, Hongqi Fan and Jun He
Electronics 2025, 14(23), 4556; https://doi.org/10.3390/electronics14234556 - 21 Nov 2025
Viewed by 597
Abstract
Traditional monocular and stereo visual SLAM systems often fail to operate stably in complex unstructured environments (e.g., weakly textured or repetitively textured scenes) due to feature scarcity from their limited fields of view. In contrast, multi-camera systems can effectively overcome the perceptual limitations of monocular or stereo setups by providing broader field-of-view coverage. However, most existing multi-camera visual SLAM systems are primarily feature-based and thus still constrained by the inherent limitations of feature extraction in such environments. To address this issue, a multi-camera visual SLAM framework based on the direct method is proposed. In the front-end, a detector-free matcher named Efficient LoFTR is incorporated, enabling pose estimation through dense pixel associations to improve localization accuracy and robustness. In the back-end, geometric constraints among multiple cameras are integrated, and system localization accuracy is further improved through a joint optimization process. Through extensive experiments on public datasets and a self-built simulation dataset, the proposed method achieves superior performance over state-of-the-art approaches regarding localization accuracy, trajectory completeness, and environmental adaptability, thereby validating its high robustness in complex unstructured environments. Full article
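As a rough illustration of using detector-free correspondences for pose (not the paper's direct multi-camera formulation), dense matches from a matcher such as Efficient LoFTR can be fed into an essential-matrix RANSAC to recover a two-view relative pose:

```python
# Two-view relative pose from detector-free correspondences (illustrative sketch).
import cv2
import numpy as np

def relative_pose_from_matches(pts0, pts1, K):
    """pts0, pts1: Nx2 matched pixel coordinates from a dense matcher; K: 3x3 intrinsics."""
    E, mask = cv2.findEssentialMat(pts0, pts1, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    # The cheirality check selects the physically valid (R, t) decomposition of E.
    _, R, t, mask = cv2.recoverPose(E, pts0, pts1, K, mask=mask)
    return R, t, mask
```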

14 pages, 1310 KB  
Article
Stereo-GS: Online 3D Gaussian Splatting Mapping Using Stereo Depth Estimation
by Junkyu Park, Byeonggwon Lee, Sanggi Lee and Soohwan Song
Electronics 2025, 14(22), 4436; https://doi.org/10.3390/electronics14224436 - 14 Nov 2025
Viewed by 1810
Abstract
We present Stereo-GS, a real-time system for online 3D Gaussian Splatting (3DGS) that reconstructs photorealistic 3D scenes from streaming stereo pairs. Unlike prior offline 3DGS methods that require dense multi-view input or precomputed depth, Stereo-GS estimates metrically accurate depth maps directly from rectified stereo geometry, enabling progressive, globally consistent reconstruction. The frontend combines a stereo implementation of DROID-SLAM for robust tracking and keyframe selection with FoundationStereo, a generalizable stereo network that needs no scene-specific fine-tuning. A two-stage filtering pipeline improves depth reliability by removing outliers using a variance-based refinement filter followed by a multi-view consistency check. In the backend, we selectively initialize new Gaussians in under-represented regions flagged by low PSNR during rendering and continuously optimize them via differentiable rendering. To maintain global coherence with minimal overhead, we apply a lightweight rigid alignment after periodic bundle adjustment. On EuRoC and TartanAir, Stereo-GS attains state-of-the-art performance, improving average PSNR by 0.22 dB and 2.45 dB over the best baseline, respectively. Together with superior visual quality, these results show that Stereo-GS delivers high-fidelity, geometrically accurate 3D reconstructions suitable for real-time robotics, navigation, and immersive AR/VR applications. Full article
(This article belongs to the Special Issue Real-Time Computer Vision)
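The metric depth that Stereo-GS recovers from rectified stereo geometry follows the standard relation Z = f*B/d for disparity d, focal length f (in pixels), and baseline B (in meters). A minimal sketch, with the minimum-disparity cutoff as an assumed parameter:

```python
# Metric depth from rectified stereo disparity (illustrative sketch).
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, min_disp=0.5):
    """disparity: HxW array in pixels; returns depth in meters (NaN where invalid)."""
    d = np.asarray(disparity, dtype=np.float64)
    return np.where(d > min_disp, focal_px * baseline_m / np.maximum(d, 1e-9), np.nan)
```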

18 pages, 3754 KB  
Article
Hardware Implementation of Improved Oriented FAST and Rotated BRIEF-Simultaneous Localization and Mapping Version 2
by Ji-Long He, Ying-Hua Chen, Wenny Ramadha Putri, Chung-I. Huang, Ming-Hsiang Su, Kuo-Chen Li, Jian-Hong Wang, Shih-Lun Chen, Yung-Hui Li and Jia-Ching Wang
Sensors 2025, 25(20), 6404; https://doi.org/10.3390/s25206404 - 17 Oct 2025
Viewed by 1115
Abstract
The field of autonomous driving has seen continuous advances, yet achieving higher levels of automation in real-world applications remains challenging. A critical requirement for autonomous navigation is accurate map construction, particularly in novel and unstructured environments. In recent years, Simultaneous Localization and Mapping (SLAM) has evolved to support diverse sensor modalities, with some implementations incorporating machine learning to improve performance. However, these approaches often demand substantial computational resources. The key challenge lies in achieving efficiency within resource-constrained environments while minimizing errors that could degrade downstream tasks. This paper presents an enhanced ORB-SLAM2 (Oriented FAST and Rotated BRIEF Simultaneous Localization and Mapping, version 2) algorithm implemented on a Raspberry Pi 3 (ARM A53 CPU) to improve mapping performance under limited computational resources. ORB-SLAM2 comprises four main stages: Tracking, Local Mapping, Loop Closing, and Full Bundle Adjustment (BA). The proposed improvements include employing a more efficient feature descriptor to increase stereo feature-matching rates and optimizing loop-closing parameters to reduce accumulated errors. Experimental results demonstrate that the proposed system achieves notable improvements on the Raspberry Pi 3 platform. For monocular SLAM, RMSE is reduced by 18.11%, mean error by 22.97%, median error by 29.41%, and maximum error by 17.18%. For stereo SLAM, RMSE decreases by 0.30% and mean error by 0.38%. Furthermore, the ROS topic frequency stabilizes at 10 Hz, with quad-core CPU utilization averaging approximately 90%. These results indicate that the system satisfies real-time requirements while maintaining a balanced trade-off between accuracy and computational efficiency under resource constraints. Full article
(This article belongs to the Section Intelligent Sensors)
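Stereo feature matching with binary descriptors, of the kind the modified ORB-SLAM2 front-end relies on, can be sketched as Hamming-distance matching plus a rectified-epipolar row check. The descriptor choice (plain ORB) and the thresholds here are assumptions; the paper substitutes a more efficient descriptor:

```python
# Binary-descriptor stereo matching with an epipolar row check (illustrative sketch).
import cv2

orb = cv2.ORB_create(nfeatures=1000)

def stereo_matches(img_left, img_right, max_row_diff=2.0):
    kpl, dl = orb.detectAndCompute(img_left, None)
    kpr, dr = orb.detectAndCompute(img_right, None)
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = bf.match(dl, dr)
    # In rectified stereo, correct matches lie on (nearly) the same image row.
    good = [m for m in matches
            if abs(kpl[m.queryIdx].pt[1] - kpr[m.trainIdx].pt[1]) < max_row_diff]
    return kpl, kpr, sorted(good, key=lambda m: m.distance)
```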

20 pages, 74841 KB  
Article
Autonomous Concrete Crack Monitoring Using a Mobile Robot with a 2-DoF Manipulator and Stereo Vision Sensors
by Seola Yang, Daeik Jang, Jonghyeok Kim and Haemin Jeon
Sensors 2025, 25(19), 6121; https://doi.org/10.3390/s25196121 - 3 Oct 2025
Cited by 1 | Viewed by 1317
Abstract
Crack monitoring in concrete structures is essential to maintaining structural integrity. Therefore, this paper proposes a mobile ground robot equipped with a 2-DoF manipulator and stereo vision sensors for autonomous crack monitoring and mapping. To facilitate crack detection over large areas, a 2-DoF motorized manipulator providing linear and rotational motions, with a stereo vision sensor mounted on the end effector, was deployed. In combination with a manual rotation plate, this configuration enhances accessibility and expands the field of view for crack monitoring. Another stereo vision sensor, mounted at the front of the robot, was used to acquire point cloud data of the surrounding environment, enabling tasks such as SLAM (simultaneous localization and mapping), path planning and following, and obstacle avoidance. Cracks are detected and segmented using the deep learning algorithms YOLO (You Only Look Once) v6-s and SFNet (Semantic Flow Network), respectively. To enhance the performance of crack segmentation, synthetic image generation and preprocessing techniques, including cropping and scaling, were applied. The dimensions of cracks are calculated using point clouds filtered with the median absolute deviation method. To validate the performance of the proposed crack-monitoring and mapping method with the robot system, indoor experimental tests were performed. The experimental results confirmed that, in cases of divided imaging, the crack propagation direction was predicted, enabling robotic manipulation and division-point calculation. Subsequently, total crack length and width were calculated by combining reconstructed 3D point clouds from multiple frames, with a maximum relative error of 1%. Full article
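The median-absolute-deviation filtering applied before measuring crack dimensions is a standard robust outlier test. A minimal sketch, assuming the filter acts along the depth axis with the usual 0.6745 scaling and a modified z-score cutoff of 3.5 (both assumed defaults, not values from the paper):

```python
# MAD-based outlier rejection on point cloud depth values (illustrative sketch).
import numpy as np

def mad_filter(points, thresh=3.5):
    """points: Nx3 array; keeps points whose depth (z) is consistent under MAD."""
    z = points[:, 2]
    med = np.median(z)
    mad = np.median(np.abs(z - med)) + 1e-12
    # 0.6745 scales MAD to be comparable to a standard deviation for Gaussian data.
    modified_z = 0.6745 * (z - med) / mad
    return points[np.abs(modified_z) < thresh]
```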

19 pages, 5861 KB  
Article
Topological Signal Processing from Stereo Visual SLAM
by Eleonora Di Salvo, Tommaso Latino, Maria Sanzone, Alessia Trozzo and Stefania Colonnese
Sensors 2025, 25(19), 6103; https://doi.org/10.3390/s25196103 - 3 Oct 2025
Viewed by 650
Abstract
Topological signal processing is emerging alongside Graph Signal Processing (GSP) in various applications, incorporating higher-order connectivity structures—such as faces—in addition to nodes and edges, for enriched connectivity modeling. Rich point clouds acquired by multi-camera systems in Visual Simultaneous Localization and Mapping (V-SLAM) are typically processed using graph-based methods. In this work, we introduce a topological signal processing (TSP) framework that integrates texture information extracted from V-SLAM; we refer to this framework as TSP-SLAM. We show how TSP-SLAM enables the extension of graph-based point cloud processing to more advanced topological signal processing techniques. We demonstrate, on real stereo data, that TSP-SLAM enables a richer point cloud representation by associating signals not only with vertices but also with edges and faces of the mesh computed from the point cloud. Numerical results show that TSP-SLAM supports the design of topological filtering algorithms by exploiting the mapping between the 3D mesh faces, edges and vertices and their 2D image projections. These findings confirm the potential of TSP-SLAM for topological signal processing of point cloud data acquired in challenging V-SLAM environments. Full article
(This article belongs to the Special Issue Stereo Vision Sensing and Image Processing)
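The topological machinery behind TSP-SLAM assigns signals to vertices, edges, and faces and relates them through incidence matrices; filtering of edge signals uses the Hodge Laplacian L1 = B1^T B1 + B2 B2^T. A toy sketch on a single triangle (the mesh and the filter constant are illustrative, not from the paper):

```python
# Edge-space Hodge Laplacian and a simple low-pass topological filter (toy sketch).
import numpy as np

# One triangle: vertices {0,1,2}, oriented edges (0,1), (0,2), (1,2), one face.
B1 = np.array([[-1, -1,  0],     # vertex-to-edge incidence (3 vertices x 3 edges)
               [ 1,  0, -1],
               [ 0,  1,  1]])
B2 = np.array([[ 1],             # edge-to-face incidence (3 edges x 1 face)
               [-1],
               [ 1]])
L1 = B1.T @ B1 + B2 @ B2.T       # Hodge Laplacian acting on edge signals

# Tikhonov-style low-pass filtering of an edge signal s: s_filtered = (I + a*L1)^-1 s
a, s = 0.5, np.array([1.0, 0.2, -0.3])
s_filtered = np.linalg.solve(np.eye(3) + a * L1, s)
```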

27 pages, 5515 KB  
Article
Optimizing Multi-Camera Mobile Mapping Systems with Pose Graph and Feature-Based Approaches
by Ahmad El-Alailyi, Luca Morelli, Paweł Trybała, Francesco Fassi and Fabio Remondino
Remote Sens. 2025, 17(16), 2810; https://doi.org/10.3390/rs17162810 - 13 Aug 2025
Cited by 1 | Viewed by 2839
Abstract
Multi-camera Visual Simultaneous Localization and Mapping (V-SLAM) increases spatial coverage through multi-view image streams, improving localization accuracy and reducing data acquisition time. Despite its speed and general robustness, V-SLAM often struggles to achieve precise camera poses necessary for accurate 3D reconstruction, especially in complex environments. This study introduces two novel multi-camera optimization methods to enhance pose accuracy, reduce drift, and ensure loop closures. These methods refine multi-camera V-SLAM outputs within existing frameworks and are evaluated in two configurations: (1) multiple independent stereo V-SLAM instances operating on separate camera pairs; and (2) multi-view odometry processing all camera streams simultaneously. The proposed optimizations include (1) a multi-view feature-based optimization that integrates V-SLAM poses with rigid inter-camera constraints and bundle adjustment; and (2) a multi-camera pose graph optimization that fuses multiple trajectories using relative pose constraints and robust noise models. Validation is conducted through two complex 3D surveys using the ATOM-ANT3D multi-camera fisheye mobile mapping system. Results demonstrate survey-grade accuracy comparable to traditional photogrammetry, with reduced computational time, advancing toward near real-time 3D mapping of challenging environments. Full article
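A pose graph fused with relative-pose constraints and a robust noise model reduces, per edge, to a residual between the measured and predicted relative transforms, minimized under a robust loss. A minimal SE(2) sketch with SciPy (the paper's system works in SE(3); all values and names here are illustrative):

```python
# Tiny pose-graph optimization with relative-pose residuals and a Huber loss (illustrative sketch).
import numpy as np
from scipy.optimize import least_squares

def relative_residual(x, edges):
    """x: flattened [x, y, theta] per node; edges: (i, j, dx, dy, dtheta) measurements."""
    poses = x.reshape(-1, 3)
    res = []
    for i, j, dx, dy, dth in edges:
        xi, yi, ti = poses[i]
        xj, yj, tj = poses[j]
        c, s = np.cos(ti), np.sin(ti)
        # Predicted relative pose of node j in the frame of node i.
        pred = np.array([ c * (xj - xi) + s * (yj - yi),
                         -s * (xj - xi) + c * (yj - yi),
                          (tj - ti + np.pi) % (2 * np.pi) - np.pi])
        res.extend(pred - np.array([dx, dy, dth]))
    return np.array(res)

# Three nodes, two odometry edges and one deliberately noisy loop-closure edge;
# node 0 is held fixed at the origin, and the Huber loss downweights the outlier.
edges = [(0, 1, 1.0, 0.0, 0.0), (1, 2, 1.0, 0.0, 0.0), (0, 2, 2.3, 0.5, 0.0)]

def residual_free(x_free, edges):
    return relative_residual(np.concatenate([np.zeros(3), x_free]), edges)

sol = least_squares(residual_free, np.zeros(6), args=(edges,), loss='huber', f_scale=0.5)
```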

32 pages, 1435 KB  
Review
Smart Safety Helmets with Integrated Vision Systems for Industrial Infrastructure Inspection: A Comprehensive Review of VSLAM-Enabled Technologies
by Emmanuel A. Merchán-Cruz, Samuel Moveh, Oleksandr Pasha, Reinis Tocelovskis, Alexander Grakovski, Alexander Krainyukov, Nikita Ostrovenecs, Ivans Gercevs and Vladimirs Petrovs
Sensors 2025, 25(15), 4834; https://doi.org/10.3390/s25154834 - 6 Aug 2025
Cited by 1 | Viewed by 4594
Abstract
Smart safety helmets equipped with vision systems are emerging as powerful tools for industrial infrastructure inspection. This paper presents a comprehensive state-of-the-art review of such VSLAM-enabled (Visual Simultaneous Localization and Mapping) helmets. We surveyed the evolution from basic helmet cameras to intelligent, sensor-fused inspection platforms, highlighting how modern helmets leverage real-time visual SLAM algorithms to map environments and assist inspectors. A systematic literature search was conducted targeting high-impact journals, patents, and industry reports. We classify helmet-integrated camera systems into monocular, stereo, and omnidirectional types and compare their capabilities for infrastructure inspection. We examine core VSLAM algorithms (feature-based, direct, hybrid, and deep-learning-enhanced) and discuss their adaptation to wearable platforms. Multi-sensor fusion approaches integrating inertial, LiDAR, and GNSS data are reviewed, along with edge/cloud processing architectures enabling real-time performance. This paper compiles numerous industrial use cases, from bridges and tunnels to plants and power facilities, demonstrating significant improvements in inspection efficiency, data quality, and worker safety. Key challenges are analyzed, including technical hurdles (battery life, processing limits, and harsh environments), human factors (ergonomics, training, and cognitive load), and regulatory issues (safety certification and data privacy). We also identify emerging trends, such as semantic SLAM, AI-driven defect recognition, hardware miniaturization, and collaborative multi-helmet systems. This review finds that VSLAM-equipped smart helmets offer a transformative approach to infrastructure inspection, enabling real-time mapping, augmented awareness, and safer workflows. We conclude by highlighting current research gaps, notably in standardizing systems and integrating with asset management, and provide recommendations for industry adoption and future research directions. Full article

25 pages, 4682 KB  
Article
Visual Active SLAM Method Considering Measurement and State Uncertainty for Space Exploration
by Yao Zhao, Zhi Xiong, Jingqi Wang, Lin Zhang and Pascual Campoy
Aerospace 2025, 12(7), 642; https://doi.org/10.3390/aerospace12070642 - 20 Jul 2025
Viewed by 1293
Abstract
This paper presents a visual active SLAM method considering measurement and state uncertainty for space exploration in urban search and rescue environments. An uncertainty evaluation method based on the Fisher Information Matrix (FIM) is studied from the perspective of evaluating the localization uncertainty of SLAM systems. With the aid of the Fisher Information Matrix, the Cramér–Rao Lower Bound (CRLB) of the pose uncertainty in the stereo visual SLAM system is derived to describe the boundary of the pose uncertainty. Optimality criteria are introduced to quantitatively evaluate the localization uncertainty. The odometry information selection method and the local bundle adjustment information selection method based on Fisher Information are proposed to find out the measurements with low uncertainty for localization and mapping in the search and rescue process. By adopting the method above, the computing efficiency of the system is improved while the localization accuracy is equivalent to the classical ORB-SLAM2. Moreover, by the quantified uncertainty of local poses and map points, the generalized unary node and generalized unary edge are defined to improve the computational efficiency in computing local state uncertainty. In addition, an active loop closing planner considering local state uncertainty is proposed to make use of uncertainty in assisting the space exploration and decision-making of MAV, which is beneficial to the improvement of MAV localization performance in search and rescue environments. Simulations and field tests in different challenging scenarios are conducted to verify the effectiveness of the proposed method. Full article
(This article belongs to the Section Aeronautics)
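The uncertainty bound the method builds on can be restated compactly: under Gaussian measurement noise, the Fisher information accumulated from stereo reprojection measurements lower-bounds the achievable pose covariance, and a scalar optimality criterion ranks candidate measurements. The notation below is generic, not the paper's:

```latex
% Measurements z_k = h_k(x) + n_k, n_k ~ N(0, Sigma_k), with Jacobians J_k = dh_k/dx:
\mathbf{F}(\mathbf{x}) \;=\; \sum_k \mathbf{J}_k^{\top} \boldsymbol{\Sigma}_k^{-1} \mathbf{J}_k,
\qquad
\operatorname{Cov}(\hat{\mathbf{x}}) \;\succeq\; \mathbf{F}^{-1}(\mathbf{x})
\quad \text{(Cram\'er--Rao lower bound).}

% Example optimality criterion (D-optimality) for selecting low-uncertainty measurements:
\phi_D(\mathbf{F}) \;=\; \log\det \mathbf{F}.
```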

32 pages, 2740 KB  
Article
Vision-Based Navigation and Perception for Autonomous Robots: Sensors, SLAM, Control Strategies, and Cross-Domain Applications—A Review
by Eder A. Rodríguez-Martínez, Wendy Flores-Fuentes, Farouk Achakir, Oleg Sergiyenko and Fabian N. Murrieta-Rico
Eng 2025, 6(7), 153; https://doi.org/10.3390/eng6070153 - 7 Jul 2025
Cited by 7 | Viewed by 11511
Abstract
Camera-centric perception has matured into a cornerstone of modern autonomy, from self-driving cars and factory cobots to underwater and planetary exploration. This review synthesizes more than a decade of progress in vision-based robotic navigation through an engineering lens, charting the full pipeline from sensing to deployment. We first examine the expanding sensor palette—monocular and multi-camera rigs, stereo and RGB-D devices, LiDAR–camera hybrids, event cameras, and infrared systems—highlighting the complementary operating envelopes and the rise of learning-based depth inference. The advances in visual localization and mapping are then analyzed, contrasting sparse and dense SLAM approaches, as well as monocular, stereo, and visual–inertial formulations. Additional topics include loop closure, semantic mapping, and LiDAR–visual–inertial fusion, which enables drift-free operation in dynamic environments. Building on these foundations, we review the navigation and control strategies, spanning classical planning, reinforcement and imitation learning, hybrid topological–metric memories, and emerging visual language guidance. Application case studies—autonomous driving, industrial manipulation, autonomous underwater vehicles, planetary rovers, aerial drones, and humanoids—demonstrate how tailored sensor suites and algorithms meet domain-specific constraints. Finally, the future research trajectories are distilled: generative AI for synthetic training data and scene completion; high-density 3D perception with solid-state LiDAR and neural implicit representations; event-based vision for ultra-fast control; and human-centric autonomy in next-generation robots. By providing a unified taxonomy, a comparative analysis, and engineering guidelines, this review aims to inform researchers and practitioners designing robust, scalable, vision-driven robotic systems. Full article
(This article belongs to the Special Issue Interdisciplinary Insights in Engineering Research)

26 pages, 14214 KB  
Article
Stereo Visual Odometry and Real-Time Appearance-Based SLAM for Mapping and Localization in Indoor and Outdoor Orchard Environments
by Imran Hussain, Xiongzhe Han and Jong-Woo Ha
Agriculture 2025, 15(8), 872; https://doi.org/10.3390/agriculture15080872 - 16 Apr 2025
Cited by 4 | Viewed by 4958
Abstract
Agricultural robots can mitigate labor shortages and advance precision farming. However, the dense vegetation canopies and uneven terrain in orchard environments reduce the reliability of traditional GPS-based localization, thereby reducing navigation accuracy and making autonomous navigation challenging. Moreover, inefficient path planning and an increased risk of collisions affect the robot’s ability to perform tasks such as fruit harvesting, spraying, and monitoring. To address these limitations, this study integrated stereo visual odometry with real-time appearance-based mapping (RTAB-Map)-based simultaneous localization and mapping (SLAM) to improve mapping and localization in both indoor and outdoor orchard settings. The proposed system leverages stereo image pairs for precise depth estimation while utilizing RTAB-Map’s graph-based SLAM framework with loop-closure detection to ensure global map consistency. In addition, an incorporated inertial measurement unit (IMU) enhances pose estimation, thereby improving localization accuracy. Substantial improvements in both mapping and localization performance over the traditional approach were demonstrated, with an average error of 0.018 m against the ground truth for outdoor mapping and a consistent average error of 0.03 m for indoor trails with a 20.7% reduction in visual odometry trajectory deviation compared to traditional methods. Localization performance remained robust across diverse conditions, with a low RMSE of 0.207 m. Our approach provides critical insights into developing more reliable autonomous navigation systems for agricultural robots. Full article
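Trajectory error figures such as the RMSE quoted above are conventionally computed by rigidly aligning the estimated trajectory to ground truth (an Umeyama/Kabsch fit) and taking the RMSE of the residual distances. A generic sketch of that evaluation recipe, assumed here rather than taken from the paper:

```python
# Absolute trajectory error (RMSE) after rigid alignment (generic evaluation sketch).
import numpy as np

def ate_rmse(est, gt):
    """est, gt: Nx3 estimated and ground-truth positions at associated timestamps."""
    mu_e, mu_g = est.mean(0), gt.mean(0)
    U, _, Vt = np.linalg.svd((gt - mu_g).T @ (est - mu_e))
    S = np.diag([1, 1, np.sign(np.linalg.det(U @ Vt))])   # guard against reflections
    R = U @ S @ Vt                                         # rotation aligning est to gt
    t = mu_g - R @ mu_e
    aligned = est @ R.T + t
    return np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1)))
```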

25 pages, 27528 KB  
Article
A Stereo Visual-Inertial SLAM Algorithm with Point-Line Fusion and Semantic Optimization for Forest Environments
by Bo Liu, Hongwei Liu, Yanqiu Xing, Weishu Gong, Shuhang Yang, Hong Yang, Kai Pan, Yuanxin Li, Yifei Hou and Shiqing Jia
Forests 2025, 16(2), 335; https://doi.org/10.3390/f16020335 - 13 Feb 2025
Cited by 1 | Viewed by 2317
Abstract
Accurately localizing individual trees and identifying species distribution are critical tasks in forestry remote sensing. Visual Simultaneous Localization and Mapping (visual SLAM) algorithms serve as important tools for outdoor spatial positioning and mapping, mitigating signal loss caused by tree canopy obstructions. To address these challenges, a semantic SLAM algorithm called LPD-SLAM (Line-Point-Distance Semantic SLAM) is proposed, which integrates stereo cameras with an inertial measurement unit (IMU), with contributions including dynamic feature removal, an individual tree data structure, and semantic point distance constraints. LPD-SLAM is capable of performing individual tree localization and tree species discrimination tasks in forest environments. In mapping, LPD-SLAM reduces false species detection and filters dynamic objects by leveraging a deep learning model and a novel individual tree data structure. In optimization, LPD-SLAM incorporates point and line feature reprojection error constraints along with semantic point distance constraints, which improve robustness and accuracy by introducing additional geometric constraints. Due to the lack of publicly available forest datasets, we choose to validate the proposed algorithm on eight experimental plots, which are selected to cover different seasons, various tree species, and different data collection paths, ensuring the dataset’s diversity and representativeness. The experimental results indicate that the average root mean square error (RMSE) of the trajectories of LPD-SLAM is reduced by up to 81.2% compared with leading algorithms. Meanwhile, the mean absolute error (MAE) of LPD-SLAM in tree localization is 0.24 m, which verifies its excellent performance in forest environments. Full article
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)
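The joint cost optimized by point-line visual SLAM systems of this kind combines point reprojection terms with line terms measured as point-to-line distances of the projected line endpoints, in addition to the semantic point-distance constraints described above. A generic form of the point and line terms, with notation that is illustrative rather than the paper's:

```latex
% pi(.) is the camera projection, T the pose, P_j a 3D point, rho a robust kernel,
% l_k an observed 2D line in homogeneous form with 3D endpoints P_k^s, P_k^e:
E \;=\; \sum_j \rho\!\left( \big\lVert \mathbf{u}_j - \pi(\mathbf{T}\,\mathbf{P}_j) \big\rVert^2_{\Sigma_j} \right)
  \;+\; \sum_k \rho\!\left( d\big(\mathbf{l}_k, \pi(\mathbf{T}\,\mathbf{P}_k^{s})\big)^2
                          + d\big(\mathbf{l}_k, \pi(\mathbf{T}\,\mathbf{P}_k^{e})\big)^2 \right),
\qquad
d(\mathbf{l}, \mathbf{p}) \;=\; \frac{\mathbf{l}^{\top}\tilde{\mathbf{p}}}{\sqrt{l_1^2 + l_2^2}}.
```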

29 pages, 4682 KB  
Article
LSAF-LSTM-Based Self-Adaptive Multi-Sensor Fusion for Robust UAV State Estimation in Challenging Environments
by Mahammad Irfan, Sagar Dalai, Petar Trslic, James Riordan and Gerard Dooly
Machines 2025, 13(2), 130; https://doi.org/10.3390/machines13020130 - 9 Feb 2025
Cited by 7 | Viewed by 3858
Abstract
Unmanned aerial vehicle (UAV) state estimation is fundamental across applications like robot navigation, autonomous driving, virtual reality (VR), and augmented reality (AR). This research highlights the critical role of robust state estimation in ensuring safe and efficient autonomous UAV navigation, particularly in challenging environments. We propose a deep learning-based adaptive sensor fusion framework for UAV state estimation, integrating multi-sensor data from stereo cameras, an IMU, two 3D LiDARs, and GPS. The framework dynamically adjusts fusion weights in real time using a long short-term memory (LSTM) model, enhancing robustness under diverse conditions such as illumination changes, structureless environments, degraded GPS signals, or complete signal loss where traditional single-sensor SLAM methods often fail. Validated on an in-house integrated UAV platform and evaluated against high-precision RTK ground truth, the algorithm incorporates deep learning-predicted fusion weights into an optimization-based odometry pipeline. The system delivers robust, consistent, and accurate state estimation, outperforming state-of-the-art techniques. Experimental results demonstrate its adaptability and effectiveness across challenging scenarios, showcasing significant advancements in UAV autonomy and reliability through the synergistic integration of deep learning and sensor fusion. Full article
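The core idea of LSTM-predicted fusion weights can be sketched as a recurrent network that maps a window of per-sensor features to a softmax weight per sensor, which then scales each sensor's term in the downstream odometry optimization. The dimensions, sensor count, and feature content below are assumptions:

```python
# LSTM producing adaptive per-sensor fusion weights (illustrative sketch).
import torch
import torch.nn as nn

class FusionWeightLSTM(nn.Module):
    def __init__(self, feat_dim=16, hidden=64, n_sensors=4):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_sensors)

    def forward(self, x):
        # x: (batch, time, feat_dim) window of per-timestep sensor-health features.
        out, _ = self.lstm(x)
        logits = self.head(out[:, -1])          # use the last timestep's hidden state
        return torch.softmax(logits, dim=-1)    # one weight per sensor, summing to 1

# e.g. weights ~ [w_stereo, w_imu, w_lidar, w_gps] for the current time window
weights = FusionWeightLSTM()(torch.randn(1, 20, 16))
```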

17 pages, 7941 KB  
Article
Visual Localization Domain for Accurate V-SLAM from Stereo Cameras
by Eleonora Di Salvo, Sara Bellucci, Valeria Celidonio, Ilaria Rossini, Stefania Colonnese and Tiziana Cattai
Sensors 2025, 25(3), 739; https://doi.org/10.3390/s25030739 - 26 Jan 2025
Cited by 4 | Viewed by 2064
Abstract
Trajectory estimation from stereo image sequences remains a fundamental challenge in Visual Simultaneous Localization and Mapping (V-SLAM). To address this, we propose a novel approach that focuses on the identification and matching of keypoints within a transformed domain that emphasizes visually significant features. Specifically, we propose to perform V-SLAM in a VIsual Localization Domain (VILD), i.e., a domain where visually relevant features are suitably represented for analysis and tracking. This transformed domain adheres to information-theoretic principles, enabling a maximum likelihood estimation of rotation, translation, and scaling parameters by minimizing the distance between the coefficients of the observed image and those of a reference template. The transformed coefficients are obtained from the output of specialized Circular Harmonic Function (CHF) filters of varying orders. Leveraging this property, we employ a first-order approximation of the image-series representation, directly computing the first-order coefficients through the application of first-order CHF filters. The proposed VILD provides a theoretically grounded and visually relevant representation of the image. We utilize VILD for point matching and tracking across the stereo video sequence. The experimental results on real-world video datasets demonstrate that integrating visually-driven filtering significantly improves trajectory estimation accuracy compared to traditional tracking performed in the spatial domain. Full article
(This article belongs to the Special Issue Emerging Advances in Wireless Positioning and Location-Based Services)
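First-order CHF coefficients can be obtained by convolving the image with a complex filter whose angular dependence is exp(i*theta); the Gaussian radial profile used below is an assumption of this sketch, not necessarily the paper's choice:

```python
# First-order Circular Harmonic Function filtering of an image (illustrative sketch).
import numpy as np
from scipy.signal import fftconvolve

def chf_order1(sigma=2.0, radius=8):
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    g = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    h = (x + 1j * y) * g              # angular term exp(i*theta) times radius, Gaussian profile
    return h / np.sum(np.abs(h))

def first_order_coefficients(image, sigma=2.0):
    """Complex response whose magnitude/phase would feed matching and tracking."""
    return fftconvolve(image.astype(float), chf_order1(sigma), mode='same')
```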
