Search Results (428)

Search Parameters:
Keywords = visual simultaneous localization and mapping

18 pages, 33517 KB  
Article
DOE-LVI: Tightly Coupled LiDAR-Visual-Inertial SLAM System with Dynamic Object Elimination
by Tuanjie Li, Shichao Yang, Xu Li and Junjie Wang
Sensors 2026, 26(12), 3717; https://doi.org/10.3390/s26123717 - 11 Jun 2026
Viewed by 177
Abstract
In dynamic environments, Simultaneous Localization and Mapping (SLAM) systems often struggle with the challenges posed by moving objects. To address these issues, we propose Dynamic-Object-Elimination LiDAR-Visual-Inertial SLAM (DOE-LVI), an advanced tightly coupled LiDAR-Visual-Inertial SLAM system. DOE-LVI integrates two primary subsystems: the Visual-Inertial System (VIS) and the LiDAR-Inertial System (LIS). The VIS component extracts depth information from LiDAR scans and correlates it with visual features, providing accurate pose estimation by minimizing both visual and IMU residuals. The LIS uses this initial estimate to generate range images and perform preliminary removal of dynamic points. Misclassified points are then corrected through ground fitting and precise scan matching with the submap. For enhanced loop closure detection, DOE-LVI employs global LiDAR descriptors, which significantly improve both localization robustness and accuracy. Experimental evaluations on the KITTI and UrbanNav datasets demonstrate that DOE-LVI achieves robust localization and mapping performance, particularly in highly dynamic environments.
(This article belongs to the Section Environmental Sensing)

23 pages, 89616 KB  
Article
DMSG-SLAM: Cascaded Semantic and Geometric Filtering for RGB-D Tracking and Mapping in Dynamic Environments
by Beicheng Li, Enhui Zheng, Huailiang Wang, Yuhao Geng, Qiming Hu and Xuxu Qi
Sensors 2026, 26(12), 3634; https://doi.org/10.3390/s26123634 - 7 Jun 2026
Viewed by 323
Abstract
Traditional visual SLAM systems often suffer from localization drift in dynamic environments due to interference from moving objects. Although semantic segmentation and depth-based masking methods have improved performance, they may still suffer from boundary under-segmentation and missed detections due to truncation of dynamic objects. To address these challenges, we propose DMSG-SLAM, a cascaded visual SLAM system that fuses Depth-Mask, Semantic information and Geometry constraints for dynamic environments. A lightweight object detection network, combined with depth consistency, is first employed to generate instance-like masks for preliminary dynamic feature removal. Then, a rotation-aware local epipolar geometric filtering mechanism is introduced to suppress residual features near object boundaries and mitigate perceptual blind spots caused by occlusion or truncation. Within potential dynamic regions, the epipolar threshold is adaptively switched according to the estimated inter-frame rotation to provide a more conservative filtering effect under challenging motion conditions. In addition, a TSDF-based dense volumetric map is incorporated to reconstruct more consistent surfaces. Experiments on highly dynamic sequences from the TUM RGB-D dataset indicate that DMSG-SLAM achieves competitive accuracy in dynamic environments, with localization performance improving by up to 90% compared to ORB-SLAM2.
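Editor's illustration (not taken from the paper): the rotation-aware epipolar filtering described in this abstract can be pictured with a minimal sketch. It measures how far a matched point lies from its epipolar line and switches the rejection threshold according to the inter-frame rotation. The function names, the threshold values, the switching angle, and the direction of the switch are all assumptions chosen for illustration; the abstract does not specify them.

```python
import math

def epipolar_distance(F, p1, p2):
    """Pixel distance from p2 (second image) to the epipolar line F @ p1.

    F is a 3x3 fundamental matrix given as nested lists; p1 and p2 are
    (u, v) pixel coordinates of a matched feature in the two frames.
    """
    x1 = (p1[0], p1[1], 1.0)
    # Epipolar line l = F x1 = (a, b, c), i.e. a*u + b*v + c = 0.
    a, b, c = (sum(F[r][k] * x1[k] for k in range(3)) for r in range(3))
    norm = math.hypot(a, b)
    if norm == 0.0:
        return float("inf")  # degenerate line: treat the match as unusable
    return abs(a * p2[0] + b * p2[1] + c) / norm

def is_dynamic(F, p1, p2, rotation_deg,
               small_rot_thresh=1.0,      # hypothetical threshold (pixels)
               large_rot_thresh=2.0,      # hypothetical threshold (pixels)
               rotation_switch_deg=5.0):  # hypothetical switching angle
    """Flag a match as dynamic if it violates the epipolar constraint.

    Assumption: the threshold is relaxed when the estimated inter-frame
    rotation is large. The paper may switch in the other direction.
    """
    if rotation_deg > rotation_switch_deg:
        thresh = large_rot_thresh
    else:
        thresh = small_rot_thresh
    return epipolar_distance(F, p1, p2) > thresh
```

With a fundamental matrix for pure sideways translation, a static point keeps its image row, so the distance reduces to the row difference between the two matched pixels.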
(This article belongs to the Section Sensing and Imaging)

24 pages, 5086 KB  
Article
Multi-Source Sensor Fusion Localization Method for Autonomous Underwater Vehicles Based on Deep Learning
by Xin Pan, Guoli Feng, Haiyan Zeng and Qunhong Tian
J. Mar. Sci. Eng. 2026, 14(11), 1064; https://doi.org/10.3390/jmse14111064 - 5 Jun 2026
Viewed by 227
Abstract
Autonomous Underwater Vehicles (AUVs) are increasingly used in deep-sea exploration, environmental monitoring, and marine engineering. Their operational safety and mission performance rely heavily on accurate and long-endurance underwater localization. However, both single-sensor localization methods and existing multi-sensor fusion approaches have inherent limitations, making it difficult to achieve high-precision localization during long-duration missions. To address this issue, this study develops a deep-learning-based multi-source sensor fusion framework for AUV localization. In the proposed framework, high-frequency data from the Inertial navigation system (INS) and Doppler velocity log (DVL) are used for continuous position propagation, while low-frequency absolute position observations from the Ultra-short baseline (USBL) system and Sonar are used to periodically correct the propagated results. Based on this framework, three instantiated models are developed using a Deep neural network (DNN), a Long short-term memory (LSTM) network, and a Bayesian semi-supervised mixed shallow-layer neural network (BSsMSLNN), respectively. Comparative experiments are conducted against the Extended Kalman filter (EKF) and Simultaneous localization and mapping system using Sonar, Visual, Inertial, and Depth sensor (SVIn2). The results show that the proposed framework effectively suppresses long-term error accumulation and significantly improves localization accuracy. Among the evaluated models, the BSsMSLNN-based method achieves the best performance in terms of trajectory fitting, root mean square error (RMSE), and coefficient of determination (R²). The proposed method provides a feasible solution for high-precision autonomous navigation of AUVs in GPS-denied environments.
(This article belongs to the Special Issue Advances in Underwater Positioning and Navigation Technology)

21 pages, 27380 KB  
Article
A 3D Indoor Modelling Method Using 360° Panoramic Images and Its Application to CCTV Camera Placement Optimization
by Anak Agung Surya Pradhana, Nobuo Funabiki, I Nyoman Darma Kotama, Kadek Suarjuna Batubulan and Putu Sugiartawan
Sensors 2026, 26(11), 3431; https://doi.org/10.3390/s26113431 - 28 May 2026
Viewed by 382
Abstract
Nowadays, closed-circuit television (CCTV) cameras are deployed worldwide to monitor movements of humans and other objects to improve the efficiency and safety of societies. Therefore, their proper placement is crucial for achieving effective surveillance coverage. Proper placement is also important for maximizing visual coverage while reducing installation/management costs. For this task, a digital twin is a useful technology, since it can simulate coverage and blind spots while freely changing camera locations. To implement a digital twin, 3D modelling of a structure including a complex room is a key issue. In this paper, we propose a 3D indoor modelling method using 360° panoramic images and show its application to CCTV camera placement optimization. This method constructs a structured 3D model of a target room from captured 360° panoramic images using a 3D Gaussian Splatting reconstruction method based on a visual simultaneous localization and mapping (VSLAM) framework. An Inertial Measurement Unit (IMU) is used alongside it to improve the camera position estimation accuracy. The model construction is anchored using a GNSS/GPS reference to establish global spatial coordinates. As an application of the generated 3D model, optimal locations of a given number of CCTV cameras are determined by combining ray-casting visibility analysis and a greedy optimization algorithm in the virtual environment, maximizing visual coverage while minimizing blind spots and avoiding excessive overlap between camera views. For evaluations, we applied the proposed method to three rooms in Okayama University, Japan, and seven rooms in the Indonesian Institute of Business and Technology, Indonesia. After optimizing camera locations in the virtual environment, the cameras were actually installed in the rooms according to the recommended positions. The performance was evaluated using visibility coverage, blind spot reduction, and Root Mean Squared Error (RMSE) between the estimated and actual camera positions, where promising results were achieved.
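Editor's illustration (not taken from the paper): the combination of visibility analysis and greedy optimization described in this abstract resembles the classic greedy maximum-coverage heuristic. The sketch below assumes the ray-casting step has already produced, for each candidate camera position, the set of floor cells it can see. The function name, the cell-based representation, and the stopping rule are hypothetical.

```python
def greedy_camera_placement(visibility, num_cameras):
    """Pick up to num_cameras candidates by greedy marginal coverage.

    visibility maps a candidate position id to the set of cell ids that a
    camera at that position can see (assumed to come from ray casting).
    Returns the chosen candidates in selection order and the covered cells.
    """
    chosen, covered = [], set()
    remaining = dict(visibility)
    for _ in range(num_cameras):
        # Choose the candidate that adds the most not-yet-covered cells;
        # counting only new cells implicitly penalizes overlapping views.
        best = max(remaining,
                   key=lambda c: len(remaining[c] - covered),
                   default=None)
        if best is None or not (remaining[best] - covered):
            break  # no candidate left, or nothing new can be covered
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered
```

For example, with candidates `{"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}}` and two cameras, the sketch picks `C` first and then `A`, since `B` would add only one new cell after `C` is placed.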
(This article belongs to the Section Electronic Sensors)

27 pages, 480 KB  
Article
Hardware-Oriented Lie-Group Optimization Library for FPGA-Accelerated SLAM Using Custom Numeric Precision
by Emanuel Trabes and Carlos Valderrama Sakuyama
Electronics 2026, 15(11), 2272; https://doi.org/10.3390/electronics15112272 - 25 May 2026
Viewed by 457
Abstract
Nonlinear optimization is a central component of visual odometry and simultaneous localization and mapping (SLAM), but its repeated small- and medium-scale linear algebra operations are difficult to deploy efficiently on embedded hardware. This paper presents a synthesizable C++ library for AMD/Xilinx Vitis high-level synthesis (HLS) that provides field-programmable gate array (FPGA)-oriented dense linear algebra kernels and Lie-group primitives on SO(3) and SE(3). The library supports configurable scalar types, including IEEE floating point, posit arithmetic, and reduced-precision floating-point formats, enabling design-space exploration between numerical accuracy and hardware cost. The proposed kernels are integrated into the back-end of a monocular direct mesh-based visual SLAM system and evaluated on an AMD/Xilinx Kria KV260 platform. Compared with the software reference running on the embedded processor, the integrated FPGA implementation reduces the end-to-end optimization iteration time from 32.0 ms to 8.9 ms, corresponding to a speed-up of 3.6×, and reduces the dominant kernel latency from 25.0 ms to 4.9 ms. The most resource-efficient reduced-precision configuration reduces lookup table (LUT) usage by 29.6%, flip-flop (FF) usage by 25.7%, block random-access memory (BRAM) usage by 25.9%, and digital signal processor (DSP) usage by 38.6% relative to the floating-point hardware baseline, while keeping the relative trajectory error within 1.42%. The results show that Lie-group-aware optimization back-ends can be mapped to embedded FPGAs efficiently when fixed-size algebraic kernels, synthesis-aware memory structures, and configurable arithmetic are considered together.
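Editor's note: the timing figures quoted in this abstract can be checked with simple ratios. The end-to-end value reproduces the stated 3.6× speed-up; the kernel-level ratio is not stated in the abstract and is derived here only from the two quoted latencies.

```latex
\[
S_{\text{end-to-end}} = \frac{32.0\ \text{ms}}{8.9\ \text{ms}} \approx 3.6,
\qquad
S_{\text{kernel}} = \frac{25.0\ \text{ms}}{4.9\ \text{ms}} \approx 5.1,
\qquad
1 - \frac{8.9}{32.0} \approx 72.2\%\ \text{less time per iteration}.
\]
```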

26 pages, 5908 KB  
Article
A2PM-VINS: A Visual–Inertial SLAM Method Based on Area-to-Point Matching
by Mengxing Ma, Zengao Jiang, Yunhai Yan, Jianing Tang and Yunhao Chen
Sensors 2026, 26(10), 3071; https://doi.org/10.3390/s26103071 - 13 May 2026
Viewed by 407
Abstract
The localization performance of visual–inertial simultaneous localization and mapping (VI-SLAM) strongly depends on front-end feature matching. In degraded scenes with low illumination, repetitive textures, and weak textures, traditional geometric front ends often suffer from sparse features and mismatches, resulting in unstable state estimation. To address this issue, this paper proposes Area-to-Point Matching Visual–Inertial SLAM (A2PM-VINS), a visual–inertial SLAM method based on Area-to-Point matching. The method introduces Area-to-Point hierarchical matching and a kinematic temporal inheritance mechanism to improve matching reliability and track continuity, and further designs an Anchor–Explorer feature selection strategy to retain features with higher geometric value for back-end optimization. In addition, a Sub-Window Consistency (SWC) weighting strategy is incorporated into the back end to suppress geometrically deceptive observations with poor temporal continuity and geometric consistency. Experiments on the European Robotics Challenge Micro Aerial Vehicle (EuRoC MAV) dataset show that A2PM-VINS achieves superior or competitive localization accuracy on multiple challenging sequences. The absolute trajectory errors on MH_04 and MH_05 are 0.0983 m and 0.1191 m, respectively, and stable tracking is maintained on V2_02, where VINS-Fusion fails. These results show that the proposed method effectively improves the robustness of visual–inertial state estimation in complex degraded environments.

21 pages, 2880 KB  
Article
Robust Multi-Modal Factor Graph Optimization for Distributed Collaborative LiDAR–Visual–Inertial SLAM
by Wan Xu, Shijie Liu, Rupeng Chen, Simin Du and Yujie Wang
Appl. Sci. 2026, 16(10), 4677; https://doi.org/10.3390/app16104677 - 9 May 2026
Viewed by 278
Abstract
To address accuracy and reliability challenges in simultaneous localization and mapping (SLAM) systems under extreme conditions, this paper presents LIVE-SLAM, a tightly-coupled LiDAR–inertial–visual framework. The technical core integrates a LiDAR Probabilistic Feature Extraction (LPFE) module to reduce frontend overhead by retaining high-confidence features, an adaptive confidence-based weighting strategy in the backend optimization to dynamically balance multi-modal residuals during sensor degradation, and a Visual Redundancy Removal (VRR) based hybrid loop closure mechanism to mitigate perceptual aliasing. Evaluation on the KITTI benchmark and challenging real-world datasets demonstrates that our multi-sensor fusion effectively prevents tracking failures typical of single-sensor systems. Specifically, compared to the LVI-SAM framework, the frontend runtime is reduced by 49% and backend efficiency is improved by 25% in complex urban sequences. Furthermore, our approach achieves an average RMSE improvement of 35.3% over FAST-LIO2 and LIO-SAM in diverse real-world scenarios, particularly in environments with geometric degradation and lighting variations. These findings confirm the system’s superior real-time efficiency and global localization precision in both standard benchmarks and complex practical applications.
(This article belongs to the Section Robotics and Automation)

23 pages, 10069 KB  
Article
LIG-SLAM: A Lightweight Visual RGB-D SLAM for Indoor Dynamic Environments Leveraging Instance Segmentation and Geometric Information
by Xingyu Chen, Jiasai Wu, Junjie Hou, Xiao Liu and Junren Sun
Sensors 2026, 26(10), 2926; https://doi.org/10.3390/s26102926 - 7 May 2026
Viewed by 579
Abstract
Traditional visual Simultaneous Localization and Mapping (SLAM) systems achieve high accuracy in static environments. However, in indoor dynamic scenes with frequent object motions, the presence of moving objects severely violates the scene rigidity assumption, often leading to significant performance degradation and tracking instability. To explicitly address this challenge, this paper introduces LIG-SLAM, a resource-efficient visual SLAM solution that extends the ORB-SLAM3 architecture. By incorporating dynamic object perception and geometric constraints, the system achieves robust localization in dynamic indoor environments, while its inference efficiency is significantly enhanced through targeted optimization. Specifically, a YOLOv5-based instance segmentation network is employed to obtain pixel-level segmentation of dynamic regions. To mitigate the erroneous rejection of static feature points, epipolar geometric constraints are incorporated to improve the accuracy of dynamic feature selection. Furthermore, a RANSAC-based depth consistency check is adopted to further enhance accuracy and alleviate the effects of epipolar degeneracy. Unlike conventional semantic SLAM frameworks, the proposed system incorporates ONNX-based optimization, thereby accelerating inference and improving real-time performance. Empirical evaluations conducted on TUM dynamic datasets indicate that the developed approach surpasses ORB-SLAM3 by a substantial margin, achieving a reduction of over 90% in terms of the Absolute Trajectory Error (ATE). Compared with existing semantic SLAM approaches, it achieves improvements in both accuracy and real-time performance, particularly in challenging indoor dynamic scenarios.
(This article belongs to the Section Sensing and Imaging)

16 pages, 6417 KB  
Article
Beyond Single Descriptors: Complementary Feature Learning for Image Matching
by Xianguo Yu, Yulong Feng and Xi Li
J. Imaging 2026, 12(5), 201; https://doi.org/10.3390/jimaging12050201 - 5 May 2026
Viewed by 485
Abstract
Sparse local feature matching has served as the cornerstone of numerous visual geometry tasks and attracted extensive attention. Although significant progress has been made in this area, improving the discriminative power of descriptors remains a key challenge. As far as we know, existing sparse feature matching methods only predict a single descriptor map for keypoints, which might restrict their potential in solving complex scenarios. This issue is particularly pronounced in real-time applications where most methods only learn descriptor maps at a reduced spatial resolution compared to the input image. Consequently, they require interpolating from the low-resolution map to obtain per-keypoint descriptors, which introduces background contamination and reduces the discriminability of the final descriptors. To address these issues, we propose an efficient, novel complementary local feature description model. Specifically, the model simultaneously learns two descriptor maps using different loss functions within a single Convolutional Neural Network (CNN). An orthogonal loss is introduced to effectively coordinate the learning of the two branches, aiming to obtain decoupled and complementary descriptors. Extensive experiments across various visual geometry tasks, such as homography estimation, indoor and outdoor pose estimation, as well as visual localization, have demonstrated the superior performance of the proposed method.
(This article belongs to the Section Computer Vision and Pattern Recognition)

22 pages, 55191 KB  
Article
RTS-SLAM: A Trajectory Consistency-Driven Multi-Constraint Dynamic Feature-Rejection Method for Visual SLAM in Dynamic Environments
by Huailiang Wang, Qiming Hu, Beicheng Li, Yuhao Geng, Chao Su, Shibo Zhu, Enhui Zheng and Weimin Chen
Sensors 2026, 26(9), 2846; https://doi.org/10.3390/s26092846 - 2 May 2026
Cited by 2 | Viewed by 973
Abstract
Simultaneous Localization and Mapping (SLAM) is a fundamental methodology that underpins autonomous navigation in robotic systems. Conventional approaches perform well in static environments but rely on the assumption of environmental rigidity, which leads to significant accuracy degradation in dynamic environments. To address this challenge, this study presents RTS-SLAM, a real-time semantic visual SLAM system designed for dynamic environments. Based on the ORB-SLAM2 framework, a multi-layer, constraint-driven dynamic feature-rejection strategy is introduced. The proposed approach first removes dynamic features by combining semantic information with geometric constraints. Subsequently, residual dynamic points are eliminated via trajectory-consistency constraint analysis, thereby effectively improving localization accuracy. Furthermore, a dense mapping strategy featuring global sparsification and critical region refinement is proposed. By reducing redundancy in the dense point cloud, the method decreases memory usage while preserving important object geometries. Experimental evaluations on the TUM RGB-D and Bonn datasets indicate that RTS-SLAM reduces the average absolute trajectory error by more than 95% compared with ORB-SLAM2 in dynamic environments. Meanwhile, the system maintains real-time performance and achieves high localization accuracy in dynamic environments.
(This article belongs to the Section Navigation and Positioning)

15 pages, 4761 KB  
Article
AR-Based Teleoperation of an Omnidirectional Mobile Robot for UV-C Disinfection
by Andres de la Rosa-Garcia, Alma Guadalupe Rodriguez-Ramirez, Beatriz Alvarado Robles, Israel Soto-Marrufo, Diana Ortiz-Muñoz, Victor Manuel Alonso-Mendoza, David Luviano-Cruz and Francesco Garcia-Luna
Robotics 2026, 15(5), 94; https://doi.org/10.3390/robotics15050094 - 1 May 2026
Viewed by 517
Abstract
The COVID-19 pandemic highlighted the need for effective disinfection strategies in order to minimize human exposure and reduce the risk of contagion in indoor environments. Ultraviolet-C (UV-C) irradiation has proven to be an effective solution for inactivating a wide range of pathogens. However, traditional fixed UV-C systems suffer from limited coverage and lack operational flexibility. To address these limitations, this paper proposes an augmented reality (AR)-based teleoperation framework for an omnidirectional mobile robot equipped with a UV-C disinfection light. Unlike traditional toolchain integrations, our framework synergizes immersive spatial visualization of a reconstructed environment, operator-guided waypoint-based remote navigation, and real-time interaction with the disinfection payload in a single operational workflow. The system is implemented using a ROSMASTER X3 Plus robotic platform, which generates a three-dimensional representation of the environment through visual simultaneous localization and mapping using RTAB-Map. The result is a 3D map that is imported into the Unity game engine and deployed to a Meta Quest 3 head-mounted display, enabling immersive visualization and interaction. Communication between the AR interface and the robotic system is achieved via the ROS-TCP-Connection, allowing real-time data exchange and remote robot control. Through the AR interface, the operator can navigate the robot within the scanned environment and activate the UV-C light. Experimental validation conducted in a classroom demonstrates the feasibility of the proposed approach and shows measurable reductions in surface microbial load. These results indicate that our system-level integration of AR-assisted teleoperation with mobile UV-C robotics represents a feasible proof-of-concept for flexible, operator-guided disinfection of indoor spaces.
(This article belongs to the Special Issue Development of Biomedical Robotics)

37 pages, 5258 KB  
Article
UWB-Assisted Intelligent Light-Band Navigation System for Driverless Mining Vehicles: A Case Study in Underground Mines
by Junhong Liu, Xiaoquan Li and Chenglin Yin
Eng 2026, 7(5), 195; https://doi.org/10.3390/eng7050195 - 26 Apr 2026
Viewed by 306
Abstract
Autonomous driving in underground mines faces significant challenges due to Global Navigation Satellite System (GNSS) denial and harsh environmental conditions. Mainstream multi-sensor fusion and Simultaneous Localization and Mapping (SLAM) schemes have achieved substantial progress in underground navigation, but their deployment in feature-sparse tunnels may still face challenges related to computational burden and perception robustness. This study explores an infrastructure-assisted navigation architecture that transforms the roadway into a structured luminous guidance channel by deploying programmable Light Emitting Diode (LED) strips along the tunnel roof. The proposed system simplifies complex three-dimensional pose estimation into a two-dimensional visual servoing task targeting optical signals. Central to this approach is a robust data fusion strategy that utilizes a topology matching algorithm to map noisy Ultra-Wideband (UWB) coordinates onto a discrete LED index space, thereby providing a reliable global positioning reference. Furthermore, a hierarchical fault-tolerant controller based on a Finite State Machine (FSM) is designed to facilitate seamless degradation to a UWB-assisted ultrasonic wall-following mode in the event of visual degradation, supporting fault-tolerant operation under controlled laboratory conditions. Experimental results in a laboratory simulation environment demonstrate that the system achieves millimeter-level static initialization accuracy, a dynamic tracking Root Mean Square Error of approximately 4 cm, and a 100% autonomous recovery rate from visual failures in straight tunnels. These results demonstrate the feasibility of the proposed infrastructure-assisted route under controlled laboratory conditions and suggest its potential as an engineering reference for structured underground transport scenarios with acceptable infrastructure modification.

26 pages, 4404 KB  
Article
Loop Closure with 3D Gaussian Splatting for Dynamic SLAM
by Zhanwu Ma, Wansheng Cheng and Song Fan
Sensors 2026, 26(9), 2669; https://doi.org/10.3390/s26092669 - 25 Apr 2026
Viewed by 984
Abstract
Robust pose estimation and high-fidelity scene reconstruction in dynamic environments represent core challenges in the field of Visual Simultaneous Localization and Mapping (SLAM). Although 3D Gaussian Splatting (3DGS)-based techniques have demonstrated significant potential, existing methods typically assume static scenes and struggle to address the inconsistency between photometric and geometric observations in dynamic settings, leading to a notable degradation in pose estimation and map accuracy. To address these issues, this paper presents a novel dynamic SLAM method: Loop Closure with 3D Gaussian Splatting for Dynamic SLAM (LCD-Splat). Taking RGB-D images as input, LCD-Splat integrates Mask R-CNN with an improved multi-view geometry approach to detect dynamic objects, generating static scene maps and filling in occluded backgrounds. By leveraging 3DGS submaps and a frame-to-model tracking strategy, LCD-Splat achieves dense map construction. The method initiates online loop closure detection and employs a novel coarse-to-fine 3DGS registration algorithm to compute loop closure constraints between submaps. Global consistency is ultimately ensured through robust pose graph optimization. Experimental results on real-world datasets such as TUM RGB-D and Bonn demonstrate that LCD-Splat outperforms existing state-of-the-art SLAM methods in terms of tracking, scene reconstruction, and rendering performance. This approach provides novel insights for high-precision SLAM in dynamic environments and holds significant implications for scene understanding in complex settings.

23 pages, 7207 KB  
Article
Visual Understanding of Intelligent Apple Picking: Detection-Segmentation Joint Architecture Based on Improved YOLOv11
by Bin Yan and Qianru Wu
Horticulturae 2026, 12(4), 494; https://doi.org/10.3390/horticulturae12040494 - 18 Apr 2026
Viewed by 1411
Abstract
Achieving precise fruit localization and fine branch segmentation simultaneously in unstructured orchard environments remains challenging due to variable lighting, occlusion, and complex backgrounds. This study proposes a joint detection–segmentation architecture based on an improved YOLOv11 network for collaborative perception of apples and tree branches. First, a dual-task dataset of spindle-type apple orchards is constructed with bounding-box annotations for fruits and pixel-level polygon masks for branches, encompassing diverse illumination and occlusion conditions. Second, Convolutional Block Attention Modules (CBAMs) are strategically embedded into the YOLOv11 backbone to enhance feature discrimination for slender branch structures while preserving high fruit detection accuracy. The enhanced model achieves precision of 0.981, recall of 0.986, and F1-score of 0.983 for apple detection, and precision of 0.803, recall of 0.715, mAP of 0.698, and IoU of 0.6066 for branch segmentation on the validation set. Comparative experiments against YOLOv8 and baseline YOLOv11 confirm improved segmentation continuity and finer branch delineation. The proposed integrated perception framework provides reliable visual guidance for collision-avoidance robotic harvesting and offers a practical reference for multi-task agricultural vision systems.

26 pages, 964 KB  
Article
Environment-Guided Multimodal Pest Detection and Risk Assessment in Fruit and Vegetable Production Systems
by Jiapeng Sun, Yucheng Peng, Zhimeng Zhang, Wenrui Xu, Boyuan Xi, Yuanying Zhang and Yihong Song
Horticulturae 2026, 12(4), 486; https://doi.org/10.3390/horticulturae12040486 - 16 Apr 2026
Viewed by 1322
Abstract
Pest occurrence in fruit and vegetable horticultural production exhibits strong environmental dependency, pronounced stage characteristics, and high sensitivity to control decision-making, yet conventional intelligent plant protection systems focus primarily on pest identification and lack risk discrimination capability. To address this limitation, a multimodal method that jointly models pest recognition and occurrence risk is proposed. Within a unified network framework, pest visual information and environmental temporal data are integrated through an environment-guided representation learning mechanism, a recognition–risk joint optimization strategy, and a risk-aware decision representation modeling structure. Pest category recognition and occurrence risk evaluation are thus conducted simultaneously, providing direct decision support for precision prevention and control in fruit and vegetable production. Systematic experimental evaluation is conducted on multi-crop, multi-year field data collected from Wuyuan County, Bayannur City, Inner Mongolia. Overall comparative results demonstrate that an identification accuracy of 0.947, a precision of 0.936, and a recall of 0.924 are achieved on the test set, all of which significantly outperform mainstream visual detection models such as YOLOv8, DETR, and Mask R-CNN. In terms of detection performance, mAP@50 and mAP@75 reach 0.962 and 0.821, respectively, indicating stable localization and discrimination capability under complex backgrounds and dense small-target conditions. For the occurrence risk discrimination task, a risk accuracy of 0.887 is obtained, an improvement of approximately 4.5 percentage points over simple multimodal feature concatenation. Cross-crop, cross-site, and cross-year generalization experiments further show that risk accuracy remains above 0.84 with stable recognition performance under significant distribution shifts. Ablation studies verify the synergistic contributions of the proposed core modules to overall performance improvement. The results indicate that the proposed framework enables the transition from single recognition to risk-driven plant protection decision-making, providing a technically viable pathway for pest diagnosis and control strategy optimization in fruit and vegetable horticulture.