Search Results (254)

Search Parameters:
Keywords = object SLAM

26 pages, 7857 KB  
Article
YSAG-VINS—A Robust Visual-Inertial Navigation System with Adaptive Geometric Constraints and Semantic Information Based on YOLOv8n-ODUIB in Dynamic Environments
by Kunlin Wang, Dashuai Chai, Xiqi Wang, Ruijie Yan, Yipeng Ning, Wengang Sang and Shengli Wang
Appl. Sci. 2025, 15(19), 10595; https://doi.org/10.3390/app151910595 - 30 Sep 2025
Abstract
Dynamic environments pose significant challenges for Visual Simultaneous Localization and Mapping (VSLAM), as moving objects can introduce outlier observations that severely degrade localization and mapping performance. To address this problem, we propose YSAG-VINS, a VSLAM algorithm specifically designed for dynamic scenes. The system integrates an enhanced YOLOv8 object detection network with an adaptive epipolar constraint strategy to effectively identify and suppress the impact of dynamic features. In particular, a lightweight YOLOv8n model augmented with ODConv and UIB modules is employed to balance detection accuracy with real-time efficiency. Based on semantic detection results, images are divided into static background and potentially dynamic regions, and the motion state of these regions is further verified using geometric constraints. Features belonging to truly dynamic objects are then removed to enhance robustness. Comprehensive experiments on multiple public datasets demonstrate that YSAG-VINS achieves superior pose estimation accuracy compared with VINS-Fusion, VDO-SLAM, and Dynamic-VINS. On three dynamic sequences of the KITTI dataset, the proposed method achieves average RMSE improvement rates of 48.62%, 12.18%, and 13.50%, respectively. These results confirm that YSAG-VINS provides robust and high-accuracy localization performance in dynamic environments, making it a promising solution for real-world applications such as autonomous driving, service robotics, and augmented reality. Full article
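
The epipolar test such pipelines rely on can be pictured in a few lines: a feature whose distance to its epipolar line exceeds a threshold is treated as dynamic. Below is a minimal OpenCV/NumPy illustration; the function name, the fixed 1-pixel threshold, and the RANSAC settings are illustrative assumptions, not the adaptive strategy of YSAG-VINS.

```python
import cv2
import numpy as np

def epipolar_dynamic_check(pts_prev, pts_curr, thresh_px=1.0):
    """Flag matches that violate the epipolar constraint between two frames.

    pts_prev, pts_curr: Nx2 float32 arrays of matched pixel coordinates.
    Returns a boolean mask; True marks a likely dynamic feature.
    """
    F, _ = cv2.findFundamentalMat(pts_prev, pts_curr, cv2.FM_RANSAC, 1.0, 0.99)
    # Epipolar lines in the current image induced by the previous points.
    lines = cv2.computeCorrespondEpilines(pts_prev.reshape(-1, 1, 2), 1, F)
    lines = lines.reshape(-1, 3)
    pts_h = np.hstack([pts_curr, np.ones((len(pts_curr), 1), np.float32)])
    # Point-to-line distance |ax + by + c| / sqrt(a^2 + b^2).
    dist = np.abs((lines * pts_h).sum(axis=1)) / np.linalg.norm(lines[:, :2], axis=1)
    return dist > thresh_px
```
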
28 pages, 10315 KB  
Article
DKB-SLAM: Dynamic RGB-D Visual SLAM with Efficient Keyframe Selection and Local Bundle Adjustment
by Qian Sun, Ziqiang Xu, Yibing Li, Yidan Zhang and Fang Ye
Robotics 2025, 14(10), 134; https://doi.org/10.3390/robotics14100134 - 25 Sep 2025
Abstract
Reliable navigation for mobile robots in dynamic, human-populated environments remains a significant challenge, as moving objects often cause localization drift and map corruption. While Simultaneous Localization and Mapping (SLAM) techniques excel in static settings, issues like keyframe redundancy and optimization inefficiencies further hinder their practical deployment on robotic platforms. To address these challenges, we propose DKB-SLAM, a real-time RGB-D visual SLAM system specifically designed to enhance robotic autonomy in complex dynamic scenes. DKB-SLAM integrates optical flow with Gaussian-based depth distribution analysis within YOLO detection frames to efficiently filter dynamic points, crucial for maintaining accurate pose estimates for the robot. An adaptive keyframe selection strategy balances map density and information integrity using a sliding window, considering the robot’s motion dynamics through parallax, visibility, and matching quality. Furthermore, a heterogeneously weighted local bundle adjustment (BA) method leverages map point geometry, assigning higher weights to stable edge points to refine the robot’s trajectory. Evaluations on the TUM RGB-D benchmark and, crucially, on a mobile robot platform in real-world dynamic scenarios, demonstrate that DKB-SLAM outperforms state-of-the-art methods, providing a robust and efficient solution for high-precision robot localization and mapping in dynamic environments. Full article
(This article belongs to the Special Issue SLAM and Adaptive Navigation for Robotics)
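
As a rough picture of the adaptive keyframe test described above, the sketch below combines median parallax, a visibility ratio, and match counts; the function name and thresholds are hypothetical, not DKB-SLAM's tuned strategy.

```python
import numpy as np

def should_insert_keyframe(pts_ref, pts_cur, n_tracked, n_ref,
                           min_parallax=12.0, min_visible=0.35,
                           max_visible=0.90):
    """Heuristic keyframe test combining parallax, visibility, and match quality.

    pts_ref, pts_cur: Nx2 pixel coordinates of matches against the last keyframe.
    n_tracked, n_ref: surviving matches vs. features in the reference keyframe.
    """
    parallax = np.median(np.linalg.norm(pts_cur - pts_ref, axis=1))
    visibility = n_tracked / max(n_ref, 1)
    if visibility >= max_visible:
        return False  # near-total overlap: a redundant frame, skip it
    # Large viewpoint change or heavy feature loss warrants a new keyframe.
    return parallax > min_parallax or visibility < min_visible
```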

20 pages, 7575 KB  
Article
A Two-Step Filtering Approach for Indoor LiDAR Point Clouds: Efficient Removal of Jump Points and Misdetected Points
by Yibo Cao, Yonghao Huang and Junheng Ni
Sensors 2025, 25(19), 5937; https://doi.org/10.3390/s25195937 - 23 Sep 2025
Abstract
In the simultaneous localization and mapping (SLAM) process of indoor mobile robots, accurate and stable point cloud data are crucial for localization and environment perception. However, in practical applications, indoor mobile robots may encounter glass, smooth floors, object edges, etc. Point cloud data are often misdetected in such environments, especially at the intersection of flat surfaces and the edges of obstacles, which are prone to generating jump points. Smooth planes may also produce misdetected points due to reflective properties or sensor errors. To solve these problems, a two-step filtering method is proposed in this paper. In the first step, a clustering filtering algorithm based on radial distance and tangential span is used to effectively filter out jump points. The algorithm ensures accurate data by analyzing the spatial relationship between each point in the point cloud and its neighboring points, which allows it to identify and filter out jump points. In the second step, a filtering algorithm based on the grid penetration model is used to further filter out misdetected points on smooth planes. The model eliminates unrealistic point cloud data and improves the overall quality of the point cloud by simulating how a beam penetrates an object. Experimental results in indoor environments show that this two-step filtering method significantly reduces jump points and misdetected points in the point cloud, leading to improved navigational accuracy and stability of indoor mobile robots. Full article
(This article belongs to the Section Radar Sensors)
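
The first filtering step can be pictured as a scan-order clustering: neighbouring returns are grouped until the radial gap exceeds what the tangential spacing at that range explains, and undersized clusters are discarded as jump points. A rough 2D sketch with illustrative thresholds, not the paper's algorithm verbatim:

```python
import numpy as np

def filter_jump_points(ranges, angles, radial_jump=0.15, min_cluster=4):
    """Cluster a 2D scan at radial discontinuities and drop tiny clusters.

    ranges: N range readings (m); angles: N bearings (rad), in scan order.
    Returns a boolean keep-mask over the scan points.
    """
    # Tangential span between neighbours grows with range; normalise the
    # radial gap by the expected span so distant points are not over-rejected.
    expected = ranges[:-1] * np.abs(np.diff(angles))           # tangential spacing
    breaks = np.abs(np.diff(ranges)) > radial_jump + expected  # cluster boundaries
    labels = np.concatenate([[0], np.cumsum(breaks)])
    keep = np.zeros_like(ranges, dtype=bool)
    for lbl in np.unique(labels):
        idx = labels == lbl
        if idx.sum() >= min_cluster:   # clusters smaller than this are jump points
            keep[idx] = True
    return keep
```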

20 pages, 4833 KB  
Article
High-Precision Visual SLAM for Dynamic Scenes Using Semantic–Geometric Feature Filtering and NeRF Maps
by Yanjun Ma, Jiahao Lv and Jie Wei
Electronics 2025, 14(18), 3657; https://doi.org/10.3390/electronics14183657 - 15 Sep 2025
Abstract
Dynamic environments pose significant challenges for visual SLAM, including feature ambiguity, weak textures, and map inconsistencies caused by moving objects. We present a robust SLAM framework integrating image enhancement, a mixed-precision quantized feature detection network, semantic-driven dynamic feature filtering, and NeRF-based static scene reconstruction. The system reliably extracts features under challenging conditions, removes dynamic points using instance segmentation combined with polar geometric constraints, and reconstructs static scenes with enhanced structural fidelity. Extensive experiments on TUM RGB-D, BONN RGB-D, and a custom dataset demonstrate notable improvements in the RMSE, mean, median, and standard deviation. Compared with ORB-SLAM3, our method achieves an average RMSE reduction of 93.4%, demonstrating substantial improvement, and relative to other state-of-the-art dynamic SLAM systems, it improves the average RMSE by 49.6% on TUM and 23.1% on BONN, highlighting its high accuracy, robustness, and adaptability in complex and highly dynamic environments. Full article
(This article belongs to the Special Issue 3D Computer Vision and 3D Reconstruction)
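
The RMSE figures quoted here and in neighbouring entries are typically absolute trajectory errors computed after rigidly aligning the estimated trajectory to ground truth. For reference, a compact NumPy version of that standard metric (Kabsch/Umeyama alignment without scale):

```python
import numpy as np

def ate_rmse(est, gt):
    """Absolute trajectory error RMSE after closed-form rigid alignment,
    the usual TUM RGB-D evaluation metric.

    est, gt: Nx3 arrays of time-associated estimated / ground-truth positions.
    """
    mu_e, mu_g = est.mean(0), gt.mean(0)
    H = (est - mu_e).T @ (gt - mu_g)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflection
    R = Vt.T @ D @ U.T
    t = mu_g - R @ mu_e
    err = (R @ est.T).T + t - gt
    return np.sqrt((err ** 2).sum(axis=1).mean())
```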

25 pages, 20160 KB  
Article
A Robust Framework Fusing Visual SLAM and 3D Gaussian Splatting with a Coarse-Fine Method for Dynamic Region Segmentation
by Zhian Chen, Yaqi Hu and Yong Liu
Sensors 2025, 25(17), 5539; https://doi.org/10.3390/s25175539 - 5 Sep 2025
Abstract
Existing visual SLAM systems with neural representations excel in static scenes but fail in dynamic environments where moving objects degrade performance. To address this, we propose a robust dynamic SLAM framework combining classic geometric features for localization with learned photometric features for dense mapping. Our method first tracks objects using instance segmentation and a Kalman filter. We then introduce a cascaded, coarse-to-fine strategy for efficient motion analysis: a lightweight sparse optical flow method performs a coarse screening, while a fine-grained dense optical flow clustering is selectively invoked for ambiguous targets. By filtering features on dynamic regions, our system drastically improves camera pose estimation, reducing Absolute Trajectory Error by up to 95% on dynamic TUM RGB-D sequences compared to ORB-SLAM3, and generates clean dense maps. The 3D Gaussian Splatting backend, optimized with a Gaussian pyramid strategy, ensures high-quality reconstruction. Validations on diverse datasets confirm our system’s robustness, achieving accurate localization and high-fidelity mapping in dynamic scenarios while reducing motion analysis computation by 91.7% over a dense-only approach. Full article
(This article belongs to the Section Navigation and Positioning)
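
One way to picture the cascaded motion screening: cheap sparse Lucas-Kanade flow classifies clearly static or clearly moving detections, and dense Farneback flow is computed only inside ambiguous boxes. The sketch below is an interpretation with illustrative thresholds and a hypothetical cam_flow ego-motion estimate, not the paper's implementation.

```python
import cv2
import numpy as np

def object_motion_state(prev_gray, curr_gray, box, cam_flow, coarse_tol=1.5):
    """Coarse-to-fine motion check for one detected object.

    box: (x, y, w, h) detection rectangle; cam_flow: 2-vector with the median
    background flow used as the ego-motion estimate (an assumption here).
    """
    x, y, w, h = box
    roi_prev = prev_gray[y:y + h, x:x + w]
    corners = cv2.goodFeaturesToTrack(roi_prev, maxCorners=30,
                                      qualityLevel=0.01, minDistance=5)
    if corners is not None:
        pts = corners.reshape(-1, 2) + np.float32([x, y])
        nxt, st, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                              pts.reshape(-1, 1, 2), None)
        flow = (nxt.reshape(-1, 2) - pts)[st.ravel() == 1]
        if len(flow):
            resid = np.linalg.norm(np.median(flow, axis=0) - cam_flow)
            if resid > 2 * coarse_tol:
                return "dynamic"   # clearly moving: dense pass not needed
            if resid < 0.5 * coarse_tol:
                return "static"    # clearly still: dense pass not needed
    # Ambiguous target: selectively invoke dense flow inside the ROI only.
    dense = cv2.calcOpticalFlowFarneback(roi_prev, curr_gray[y:y + h, x:x + w],
                                         None, 0.5, 3, 15, 3, 5, 1.2, 0)
    resid = np.linalg.norm(np.median(dense.reshape(-1, 2), axis=0) - cam_flow)
    return "dynamic" if resid > coarse_tol else "static"
```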

23 pages, 4627 KB  
Article
Dynamic SLAM Dense Point Cloud Map by Fusion of Semantic Information and Bayesian Moving Probability
by Qing An, Shao Li, Yanglu Wan, Wei Xuan, Chao Chen, Bufan Zhao and Xijiang Chen
Sensors 2025, 25(17), 5304; https://doi.org/10.3390/s25175304 - 26 Aug 2025
Abstract
Most existing Simultaneous Localization and Mapping (SLAM) systems rely on the assumption of static environments to achieve reliable and efficient mapping. However, such methods often suffer from degraded localization accuracy and mapping consistency in dynamic settings, as they lack explicit mechanisms to distinguish between static and dynamic elements. To overcome this limitation, we present BMP-SLAM, a vision-based SLAM approach that integrates semantic segmentation and Bayesian motion estimation to robustly handle dynamic indoor scenes. To enable real-time dynamic object detection, we integrate YOLOv5, a semantic segmentation network that identifies and localizes dynamic regions within the environment, into a dedicated dynamic target detection thread. Simultaneously, the data-association-based Bayesian moving probability proposed in this paper effectively eliminates dynamic feature points and reduces the impact of dynamic targets on the SLAM system. To enhance complex indoor robotic navigation, the proposed system integrates semantic keyframe information with dynamic object detection outputs to reconstruct high-fidelity 3D point cloud maps of indoor environments. The evaluation conducted on the TUM RGB-D dataset indicates that the performance of BMP-SLAM is superior to that of ORB-SLAM3, with trajectory tracking accuracy improved by 96.35%. Comparative evaluations demonstrate that the proposed system achieves superior performance in dynamic environments, exhibiting both lower trajectory drift and enhanced positioning precision relative to state-of-the-art dynamic SLAM methods. Full article
(This article belongs to the Special Issue Indoor Localization Technologies and Applications)
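
At its core, a per-point moving probability is a recursive Bayes update driven by each frame's dynamic/static observation. A minimal sketch; the sensor-model values p_hit and p_miss are illustrative assumptions, not the paper's parameters.

```python
def update_moving_prob(prior, observed_dynamic, p_hit=0.7, p_miss=0.4):
    """One recursive Bayes update of a map point's moving probability.

    prior: current P(moving). observed_dynamic: True if this frame's
    semantic/geometric test flagged the point as dynamic.
    p_hit = P(flagged | moving), p_miss = P(flagged | static).
    """
    if observed_dynamic:
        num = p_hit * prior
        den = num + p_miss * (1.0 - prior)
    else:
        num = (1.0 - p_hit) * prior
        den = num + (1.0 - p_miss) * (1.0 - prior)
    return num / den

# A point repeatedly flagged as dynamic converges toward P(moving) = 1.
p = 0.5
for _ in range(5):
    p = update_moving_prob(p, True)
# p is now about 0.94; points above a cutoff (e.g. 0.6) would be culled.
```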

27 pages, 7285 KB  
Article
Towards Biologically-Inspired Visual SLAM in Dynamic Environments: IPL-SLAM with Instance Segmentation and Point-Line Feature Fusion
by Jian Liu, Donghao Yao, Na Liu and Ye Yuan
Biomimetics 2025, 10(9), 558; https://doi.org/10.3390/biomimetics10090558 - 22 Aug 2025
Abstract
Simultaneous Localization and Mapping (SLAM) is a fundamental technique in mobile robotics, enabling autonomous navigation and environmental reconstruction. However, dynamic elements in real-world scenes—such as walking pedestrians, moving vehicles, and swinging doors—often degrade SLAM performance by introducing unreliable features that cause localization errors. In this paper, we define dynamic regions as areas in the scene containing moving objects, and dynamic features as the visual features extracted from these regions that may adversely affect localization accuracy. Inspired by biological perception strategies that integrate semantic awareness and geometric cues, we propose Instance-level Point-Line SLAM (IPL-SLAM), a robust visual SLAM framework for dynamic environments. The system employs YOLOv8-based instance segmentation to detect potential dynamic regions and construct semantic priors, while simultaneously extracting point and line features using Oriented FAST (Features from Accelerated Segment Test) and Rotated BRIEF (Binary Robust Independent Elementary Features), collectively known as ORB, and Line Segment Detector (LSD) algorithms. Motion consistency checks and angular deviation analysis are applied to filter dynamic features, and pose optimization is conducted using an adaptive-weight error function. A static semantic point cloud map is further constructed to enhance scene understanding. Experimental results on the TUM RGB-D dataset demonstrate that IPL-SLAM significantly outperforms existing dynamic SLAM systems—including DS-SLAM and ORB-SLAM2—in terms of trajectory accuracy and robustness in complex indoor environments. Full article
(This article belongs to the Section Biomimetic Design, Constructions and Devices)
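
For reference, extracting the two feature types the paper combines takes only a few OpenCV calls. frame.png is a placeholder input; note that createLineSegmentDetector was absent from some OpenCV 4.x builds for licensing reasons and was restored in 4.5.1+.

```python
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder test frame

# Point features: Oriented FAST + Rotated BRIEF (ORB).
orb = cv2.ORB_create(nfeatures=1000)
keypoints, descriptors = orb.detectAndCompute(img, None)

# Line features: Line Segment Detector (LSD).
lsd = cv2.createLineSegmentDetector()
lines = lsd.detect(img)[0]  # Nx1x4 array of (x1, y1, x2, y2) endpoints

print(len(keypoints), "ORB keypoints,",
      0 if lines is None else len(lines), "LSD segments")
```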

19 pages, 5092 KB  
Article
Estimating Position, Diameter at Breast Height, and Total Height of Eucalyptus Trees Using Portable Laser Scanning
by Milena Duarte Machado, Gilson Fernandes da Silva, André Quintão de Almeida, Adriano Ribeiro de Mendonça, Rorai Pereira Martins-Neto and Marcos Benedito Schimalski
Remote Sens. 2025, 17(16), 2904; https://doi.org/10.3390/rs17162904 - 20 Aug 2025
Abstract
Forest management planning depends on accurately collecting information on available resources, gathered by forest inventories. However, due to the extent of the planted areas in the world, collecting information traditionally has become challenging. Terrestrial light detection and ranging (LiDAR) has emerged as a promising tool to enhance forest inventory. However, selecting the optimal 3D point cloud density for accurately estimating tree attributes remains an open question. The objective of this study was to evaluate the accuracy of different point densities (points per square meter) in point clouds obtained through portable laser scanning combined with simultaneous localization and mapping (PLS-SLAM). The study aimed to identify tree positions and estimate the diameter at breast height (DBH) and total height (H) of 71 trees in a eucalyptus plantation in Brazil. We also tested a semi-automatic method for estimating total height. Point clouds with densities greater than 100 points/m² enabled the detection of over 88.7% of individual trees. The root mean square error (RMSE) of the best DBH measurement was 1.6 cm (RMSE = 5.9%) and the best H measurement (semi-automatic method) was 1.2 m (RMSE = 4.2%) for the point cloud with 36,000 points/m². When measuring the total heights of the largest trees (H > 31.4 m) using LiDAR, the values were always underestimated considering a reference value, and their measurements were significantly different (p-value < 0.05 by the t-test). For point clouds with a density of 36,000 points/m², the automated DBH and total tree height estimations yielded RMSEs of 5.9% and 14.4%, with biases of 4.8% and −1.4%, respectively. When using point clouds of 10 points/m², RMSE values increased to 18.8% for DBH and 28.4% for total tree height, while the bias was 6.2% and 18.4%, respectively. Additionally, total tree height estimations obtained via a semi-automatic method resulted in a lower RMSE of 4.2% and a bias of 1.5%. These findings indicate that point clouds acquired through PLS-SLAM with densities exceeding 100 points/m² are suitable for automated DBH estimation in the studied plantation. Despite the increased processing time required, the semi-automatic method is recommended for total tree height estimation due to its superior accuracy. Full article
(This article belongs to the Section Forest Remote Sensing)
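
DBH values like these are commonly obtained by fitting a circle to a thin slice of stem points at 1.3 m above ground. A minimal algebraic (Kåsa) least-squares fit illustrating the idea, not necessarily the tooling used in the study:

```python
import numpy as np

def dbh_from_slice(xy):
    """Estimate DBH by an algebraic (Kasa) circle fit on a stem slice.

    xy: Nx2 horizontal coordinates (m) of points sliced near 1.3 m height.
    Solves x^2 + y^2 = 2*cx*x + 2*cy*y + (r^2 - cx^2 - cy^2) in least squares.
    """
    x, y = xy[:, 0], xy[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones(len(x))])
    b = x ** 2 + y ** 2
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    r = np.sqrt(c + cx ** 2 + cy ** 2)
    return 2.0 * r  # DBH in metres

# Slicing a ground-normalised cloud (points: Nx3, height in points[:, 2]):
# slice_pts = points[np.abs(points[:, 2] - 1.3) < 0.05, :2]
```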

23 pages, 3199 KB  
Article
A Motion Segmentation Dynamic SLAM for Indoor GNSS-Denied Environments
by Yunhao Wu, Ziyao Zhang, Haifeng Chen and Jian Li
Sensors 2025, 25(16), 4952; https://doi.org/10.3390/s25164952 - 10 Aug 2025
Abstract
In GNSS-deprived settings, such as indoor and underground environments, research on simultaneous localization and mapping (SLAM) technology remains a focal point. Addressing the influence of dynamic variables on positional precision and constructing a persistent map comprising solely static elements are pivotal objectives in visual SLAM for dynamic scenes. This paper introduces optical flow motion segmentation-based SLAM (OS-SLAM), a dynamic environment SLAM system that incorporates optical flow motion segmentation for enhanced robustness. Initially, a lightweight multi-scale optical flow network is developed and optimized using multi-scale feature extraction and update modules to enhance motion segmentation accuracy with rigid masks while maintaining real-time performance. Subsequently, a novel approach fusing YOLO-Fastest detections with Rigidmask segmentation is proposed to mitigate mis-segmentation of the static background caused by non-rigid moving objects. Finally, a static dense point cloud map is generated by filtering out abnormal point clouds. OS-SLAM integrates optical flow estimation with motion segmentation to effectively reduce the impact of dynamic objects. Experimental findings from the Technical University of Munich (TUM) dataset demonstrate that the proposed method significantly outperforms ORB-SLAM3 in handling highly dynamic sequences, achieving average reductions of 91.2% in absolute position error (APE) and 45.1% in relative position error (RPE). Full article
(This article belongs to the Collection Navigation Systems and Sensors)
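
One plausible reading of that detection/segmentation fusion, sketched below: inside (slightly dilated) boxes of non-rigid movers the semantic prior decides, motion labels bleeding out around those boxes are treated as background mis-segmentation and cleared, and motion found well away from them is kept. Function names and the margin are assumptions, not the paper's design.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def fuse_motion_masks(rigid_motion, boxes_nonrigid, margin=10):
    """Fuse a geometric rigid-motion mask with boxes for non-rigid movers.

    rigid_motion: HxW bool mask from a rigid-motion segmenter (e.g. Rigidmask).
    boxes_nonrigid: (x1, y1, x2, y2) boxes for people and similar objects.
    """
    semantic = np.zeros_like(rigid_motion, dtype=bool)
    for x1, y1, x2, y2 in boxes_nonrigid:
        semantic[y1:y2, x1:x2] = True
    halo = binary_dilation(semantic, iterations=margin)
    # Non-rigid regions are dynamic by the semantic prior; motion labels in
    # the halo around them are discarded as mis-segmentation, while motion
    # found elsewhere (e.g. rigid movers) is kept.
    return semantic | (rigid_motion & ~halo)
```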

17 pages, 7341 KB  
Article
Three-Dimensional Environment Mapping with a Rotary-Driven Lidar in Real Time
by Baixin Tong, Fangdi Jiang, Bo Lu, Zhiqiang Gu, Yan Li and Shifeng Wang
Sensors 2025, 25(15), 4870; https://doi.org/10.3390/s25154870 - 7 Aug 2025
Abstract
Three-dimensional environment reconstruction refers to the creation of mathematical models of three-dimensional objects suitable for computer representation and processing. This paper proposes a novel 3D environment reconstruction approach that addresses the field-of-view limitations commonly faced by LiDAR-based systems. A rotary-driven LiDAR mechanism is designed to enable uniform and seamless full-field-of-view scanning, thereby overcoming blind spots in traditional setups. To complement the hardware, a multi-sensor fusion framework—LV-SLAM (LiDAR-Visual Simultaneous Localization and Mapping)—is introduced. The framework consists of two key modules: multi-threaded feature registration and a two-phase loop closure detection mechanism, both designed to enhance the system’s accuracy and robustness. Extensive experiments on the KITTI benchmark demonstrate that LV-SLAM outperforms state-of-the-art methods including LOAM, LeGO-LOAM, and FAST-LIO2. Our method reduces the average absolute trajectory error (ATE) from 6.90 m (LOAM) to 2.48 m, and achieves lower relative pose error (RPE), indicating improved global consistency and reduced drift. We further validate the system in real-world indoor and outdoor environments. Compared with fixed-angle scans, the rotary LiDAR mechanism produces more complete reconstructions with fewer occlusions. Geometric accuracy evaluation shows that the root mean square error between reconstructed and actual building dimensions remains below 5 cm. The proposed system offers a robust and accurate solution for high-fidelity 3D reconstruction, particularly suitable for GNSS-denied and structurally complex environments. Full article
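
Mechanically, each sweep of such a rotary mount is mapped into a common frame by the motor encoder angle before SLAM fusion. A minimal sketch, assuming the drive axis is z and omitting the fixed sensor-to-motor extrinsic (identity assumed):

```python
import numpy as np

def scan_to_body(points, encoder_angle):
    """Map one LiDAR sweep into the body frame of the rotary mount.

    points: Nx3 in the sensor frame; encoder_angle: motor angle (rad).
    """
    c, s = np.cos(encoder_angle), np.sin(encoder_angle)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])  # rotation about the drive (z) axis
    return points @ R.T

# Accumulating successive sweeps at different encoder angles yields the
# uniform full-field-of-view coverage the rotary mechanism is built for.
```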

26 pages, 16392 KB  
Article
TOSD: A Hierarchical Object-Centric Descriptor Integrating Shape, Color, and Topology
by Jun-Hyeon Choi, Jeong-Won Pyo, Ye-Chan An and Tae-Yong Kuc
Sensors 2025, 25(15), 4614; https://doi.org/10.3390/s25154614 - 25 Jul 2025
Abstract
This paper introduces a hierarchical object-centric descriptor framework called TOSD (Triplet Object-Centric Semantic Descriptor). The goal of this method is to overcome the limitations of existing pixel-based and global feature embedding approaches. To this end, the framework adopts a hierarchical representation that is explicitly designed for multi-level reasoning. TOSD combines shape, color, and topological information without depending on predefined class labels. The shape descriptor captures the geometric configuration of each object. The color descriptor focuses on internal appearance by extracting normalized color features. The topology descriptor models the spatial and semantic relationships between objects in a scene. These components are integrated at both object and scene levels to produce compact and consistent embeddings. The resulting representation covers three levels of abstraction: low-level pixel details, mid-level object features, and high-level semantic structure. This hierarchical organization makes it possible to represent both local cues and global context in a unified form. We evaluate the proposed method on multiple vision tasks. The results show that TOSD performs competitively compared to baseline methods, while maintaining robustness in challenging cases such as occlusion and viewpoint changes. The framework is applicable to visual odometry, SLAM, object tracking, global localization, scene clustering, and image retrieval. In addition, this work extends our previous research on the Semantic Modeling Framework, which represents environments using layered structures of places, objects, and their ontological relations. Full article
(This article belongs to the Special Issue Event-Driven Vision Sensor Architectures and Application Scenarios)
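
A toy flavour of an object-centric shape-plus-color embedding, in the spirit of TOSD but not its actual descriptor: log-scaled Hu moments for shape and a normalized hue-saturation histogram for color, with the scene-level topology term omitted.

```python
import cv2
import numpy as np

def object_descriptor(bgr, mask):
    """Shape + color part of a simple object-centric embedding.

    bgr: HxWx3 image; mask: HxW 0/1 instance mask of one object.
    """
    m8 = (mask > 0).astype(np.uint8)
    hu = cv2.HuMoments(cv2.moments(m8, binaryImage=True)).ravel()
    shape = -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)  # scale-robust form
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], m8, [16, 8], [0, 180, 0, 256]).ravel()
    color = hist / (hist.sum() + 1e-12)                  # normalized HS histogram
    return np.concatenate([shape, color])  # 7 shape dims + 128 color dims
```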

28 pages, 6171 KB  
Article
Error Distribution Pattern Analysis of Mobile Laser Scanners for Precise As-Built BIM Generation
by Sung-Jae Bae, Junbeom Park, Joonhee Ham, Minji Song and Jung-Yeol Kim
Appl. Sci. 2025, 15(14), 8076; https://doi.org/10.3390/app15148076 - 20 Jul 2025
Abstract
Point clouds acquired by mobile laser scanners (MLS) are widely used for generating as-built building information models (BIM), particularly in indoor construction environments and existing buildings. While MLS offers fast and efficient scanning through SLAM technology, its accuracy and precision remain lower than those of terrestrial laser scanners (TLS). This study investigates the potential to improve MLS-based as-built BIM accuracy by analyzing and utilizing error distribution patterns inherent in MLS point clouds. Based on the assumption that each MLS device exhibits consistent and unique error distribution patterns, an experiment was conducted using three MLS devices and TLS-derived reference data. The analysis employed iterative closest point (ICP) registration and cloud-to-mesh (C2M) distance measurements on mock-ups with closed shapes. The results revealed that error patterns were stable across scans and could be leveraged as correction factors. Consequently, the results indicate that when using MLS for as-built BIM generation, robust fitting methods have limitations in obtaining realistic object dimensions, as they do not account for the unique error patterns present in MLS point clouds. The proposed method provides a simple and repeatable approach for enhancing MLS accuracy, contributing to improved dimensional reliability in MLS-driven BIM applications. Full article
(This article belongs to the Special Issue Construction Automation and Robotics)
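
The error-pattern analysis boils down to distance statistics between a registered MLS scan and a reference. The sketch below uses nearest-neighbour cloud-to-cloud distances (SciPy) as a stand-in for the paper's C2M measurement:

```python
import numpy as np
from scipy.spatial import cKDTree

def cloud_error_stats(mls_pts, ref_pts):
    """Nearest-neighbour distances of an MLS scan against a TLS reference.

    mls_pts, ref_pts: Nx3 / Mx3 arrays, assumed already ICP-registered.
    A cloud-to-cloud stand-in for the paper's cloud-to-mesh (C2M) step.
    """
    d, _ = cKDTree(ref_pts).query(mls_pts, k=1)
    return float(d.mean()), float(d.std())

# bias, noise = cloud_error_stats(scan, reference)
# A bias that stays consistent across repeated scans is the systematic part
# of the device's error pattern and can be applied as a correction factor.
```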

20 pages, 3688 KB  
Article
Intelligent Fruit Localization and Grasping Method Based on YOLO VX Model and 3D Vision
by Zhimin Mei, Yifan Li, Rongbo Zhu and Shucai Wang
Agriculture 2025, 15(14), 1508; https://doi.org/10.3390/agriculture15141508 - 13 Jul 2025
Abstract
Recent years have seen significant interest among agricultural researchers in using robotics and machine vision to enhance intelligent orchard harvesting efficiency. This study proposes an improved hybrid framework integrating YOLO VX deep learning, 3D object recognition, and SLAM-based navigation for harvesting ripe fruits in greenhouse environments, achieving servo control of robotic arms with flexible end-effectors. The method comprises three key components: First, a fruit sample database containing varying maturity levels and morphological features is established, interfaced with an optimized YOLO VX model for target fruit identification. Second, a 3D camera acquires the target fruit’s spatial position and orientation data in real time, and these data are stored in the collaborative robot’s microcontroller. Finally, employing binocular calibration and triangulation, the SLAM navigation module guides the robotic arm to the designated picking location via unobstructed target positioning. Comprehensive comparative experiments between the improved YOLO v12n model and earlier versions were conducted to validate its performance. The results demonstrate that the optimized model surpasses traditional recognition and harvesting methods, offering a faster target fruit identification response (minimum 30.9 ms) and significantly higher accuracy (91.14%). Full article
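
Binocular triangulation of a detected fruit centre reduces to cv2.triangulatePoints once the projection matrices are calibrated. A self-contained sketch with illustrative intrinsics and a 6 cm baseline; all numbers are hypothetical:

```python
import cv2
import numpy as np

# Binocular calibration results (illustrative): shared intrinsics K and a
# 6 cm baseline along x, as cv2.stereoCalibrate would provide.
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.06], [0.0], [0.0]])])

# Pixel coordinates of the same fruit centre in the left/right views,
# e.g. the centres of the matched YOLO detection boxes.
uv_left = np.array([[412.0], [305.0]])
uv_right = np.array([[398.0], [305.0]])

X_h = cv2.triangulatePoints(P1, P2, uv_left, uv_right)   # 4x1 homogeneous
X = (X_h[:3] / X_h[3]).ravel()
print("fruit position in the left-camera frame (m):", X)  # depth here: 3.0 m
```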

25 pages, 8564 KB  
Article
A Vision-Based Single-Sensor Approach for Identification and Localization of Unloading Hoppers
by Wuzhen Wang, Tianyu Ji, Qi Xu, Chunyi Su and Guangming Zhang
Sensors 2025, 25(14), 4330; https://doi.org/10.3390/s25144330 - 10 Jul 2025
Abstract
To promote the automation and intelligence of rail freight, the accurate identification and localization of bulk cargo unloading hoppers have become a key technical challenge. Under the technological wave driven by the deep integration of Industry 4.0 and artificial intelligence, the bulk cargo unloading process is undergoing a significant transformation from manual operation to intelligent control. In response to this demand, this paper proposes a vision-based 3D localization system for unloading hoppers, which adopts a single visual sensor architecture and integrates three core modules: object detection, corner extraction, and 3D localization. Firstly, a lightweight hybrid attention mechanism is incorporated into the YOLOv5 network to enable edge deployment and enhance the detection accuracy of unloading hoppers in complex industrial scenarios. Secondly, an image processing approach combining depth consistency constraint (DCC) and geometric structure constraints is designed to achieve sub-pixel level extraction of key corner points. Finally, a real-time 3D localization method is realized by integrating corner-based initialization with an RGB-D SLAM tracking mechanism. Experimental results demonstrate that the proposed system achieves an average localization accuracy of 97.07% under challenging working conditions. This system effectively meets the comprehensive requirements of automation, intelligence, and high precision in railway bulk cargo unloading processes, and exhibits strong engineering practicality and application potential. Full article
(This article belongs to the Section Industrial Sensors)
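
Sub-pixel corner extraction of the kind described is typically a coarse detection followed by cv2.cornerSubPix refinement. A minimal sketch; hopper.png and the Shi-Tomasi seeding are placeholders for the paper's depth-consistency and geometric-structure constraints:

```python
import cv2
import numpy as np

img = cv2.imread("hopper.png", cv2.IMREAD_GRAYSCALE)  # placeholder test image

# Coarse seeds for the four hopper corners (Shi-Tomasi as a stand-in).
corners = cv2.goodFeaturesToTrack(img, maxCorners=4, qualityLevel=0.05,
                                  minDistance=20)

# Iterative sub-pixel refinement inside a 5x5 search window.
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.01)
refined = cv2.cornerSubPix(img, corners, winSize=(5, 5),
                           zeroZone=(-1, -1), criteria=criteria)
# With an aligned depth map, each refined (u, v) back-projects to a 3D
# corner used to initialize the hopper pose for RGB-D SLAM tracking.
```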

34 pages, 5774 KB  
Article
Approach to Semantic Visual SLAM for Bionic Robots Based on Loop Closure Detection with Combinatorial Graph Entropy in Complex Dynamic Scenes
by Dazheng Wang and Jingwen Luo
Biomimetics 2025, 10(7), 446; https://doi.org/10.3390/biomimetics10070446 - 6 Jul 2025
Abstract
In complex dynamic environments, the performance of SLAM systems on bionic robots is susceptible to interference from dynamic objects or structural changes in the environment. To address this problem, we propose a semantic visual SLAM (vSLAM) algorithm based on loop closure detection with combinatorial graph entropy. First, based on the segmentation results of YOLOv8-seg, feature points at the edges of dynamic objects are finely classified by calculating the mean absolute deviation (MAD) of their pixel depths. Then, a high-quality keyframe selection strategy is constructed by combining the semantic information, the average coordinates of the semantic objects, and the degree of variation in dense regions of feature points. Subsequently, unweighted and weighted graphs of keyframes are constructed according to the distribution of feature points, characterization points, and semantic information, and a high-performance loop closure detection method based on combinatorial graph entropy is developed. The experimental results show that our loop closure detection approach exhibits higher precision and recall in real scenes compared to the bag-of-words (BoW) model. Compared with ORB-SLAM2, the absolute trajectory accuracy in high-dynamic sequences improved by an average of 97.01%, while the number of extracted keyframes decreased by an average of 61.20%. Full article
(This article belongs to the Special Issue Artificial Intelligence for Autonomous Robots: 3rd Edition)
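
As one simple way to realize a graph-entropy score over keyframe graphs, the sketch below takes the Shannon entropy of the degree distribution; the paper's combinatorial construction may differ.

```python
import numpy as np

def degree_entropy(adj):
    """Shannon entropy of a keyframe graph's degree distribution.

    adj: NxN adjacency matrix (0/1 for the unweighted graph, edge weights
    for the weighted one).
    """
    deg = np.asarray(adj, dtype=float).sum(axis=1)
    p = deg / deg.sum()
    p = p[p > 0]                          # ignore isolated vertices
    return float(-(p * np.log2(p)).sum())

# Loop-closure candidates can be pre-filtered by comparing the entropies of
# two keyframes' unweighted and weighted graphs before any fine matching.
```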