Search Results (84)

Search Parameters:
Keywords = RGB-D human detection

34 pages, 24153 KB  
Article
Forest Vegetation 3D Localization Using Deep Learning Object Detectors
by Paulo A. S. Mendes, António P. Coimbra and Aníbal T. de Almeida
Appl. Sci. 2026, 16(7), 3375; https://doi.org/10.3390/app16073375 - 31 Mar 2026
Viewed by 300
Abstract
Forest fires are becoming increasingly prevalent and destructive in many regions of the world, posing significant threats to biodiversity, ecosystems, human settlements, the climate, and the economy. The United States of America (USA), Australia, Canada, Greece, and Portugal are five regions that have experienced enormous forest fires. One way to reduce the size and range of forest fires is to decrease the amount of flammable material in forests. This can be achieved using autonomous Unmanned Ground Vehicles (UGVs) specialized in vegetation cutting and equipped with Artificial Intelligence (AI) algorithms to differentiate between vegetation that should be preserved and material that should be removed as potential fire fuel. This paper presents an innovative study of forest vegetation detection, classification, and 3D localization from a ground vehicle's RGB and depth images, in support of autonomous forest-cleaning operations for fire prevention. The work, a continuation of previous research, presents a method for 3D object localization in the real world using Deep Learning Object Detection (DLOD) combined with an RGB-D camera. It presents and compares results for eight recent high-performance DLOD architectures (YOLOv5, YOLOv7, YOLOv8, YOLO-NAS, YOLOv9, YOLOv10, YOLO11, and YOLOv12) detecting and classifying forest vegetation into five classes: “Grass”, “Live vegetation”, “Cut vegetation”, “Dead vegetation”, and “Tree-trunk”. The DLOD models are trained on a custom dataset acquired in dense forests in Portugal. A methodology is presented that combines the best-performing DLOD model with an RGB-D camera to localize the classified detections in 3D in the real world, and these methods are employed on a UGV to localize forest vegetation that needs to be thinned for fire prevention. A key challenge for autonomous forest vegetation cleaning is the reliable discrimination, in dense forests, of the objects that must be identified. With the obtained results, forest vegetation is precisely detected, classified, and localized using the DL models and the localization method presented; YOLOv5 is the fastest DLOD architecture to train, and YOLOv7 and YOLOv12 are the fastest at inference. The innovation presented is the real-time detection, classification, and 3D localization of vegetation using DLOD architectures, with real-world localization errors in width, height, and depth under 21.4%, 20.7%, and 11%, respectively, using only a depth camera and a processing unit. The 3D-localized objects are represented as parallelepipeds. The methodology for vegetation detection, classification, and localization presented in this paper is highly suitable for future autonomous forest vegetation cleaning using specialized unmanned ground vehicles. Full article
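The abstract does not give the localization formulas, but the bounding-box-to-parallelepiped step it describes is typically a pinhole deprojection of the detection box through the depth image. A minimal sketch under that assumption (the intrinsics and the percentile-based depth extent are illustrative, not the authors' values):

```python
import numpy as np

# Illustrative pinhole intrinsics (fx, fy: focal lengths in px; cx, cy: principal point).
FX, FY, CX, CY = 615.0, 615.0, 320.0, 240.0

def deproject(u, v, z):
    """Back-project pixel (u, v) at depth z (meters) into camera coordinates."""
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.array([x, y, z])

def bbox_to_parallelepiped(bbox, depth_map):
    """Turn a 2D detection box into an axis-aligned 3D box (parallelepiped).

    bbox: (u_min, v_min, u_max, v_max) from the object detector.
    depth_map: HxW array of depths in meters from the RGB-D camera.
    """
    u0, v0, u1, v1 = bbox
    patch = depth_map[v0:v1, u0:u1]
    valid = patch[patch > 0]                        # drop missing depth readings
    z_near, z_far = np.percentile(valid, [10, 90])  # robust depth extent
    z_mid = float(np.median(valid))
    p_min = deproject(u0, v0, z_mid)
    p_max = deproject(u1, v1, z_mid)
    # Width/height from the deprojected corners, depth extent from the percentiles.
    return {"center": (p_min + p_max) / 2.0,
            "width":  abs(p_max[0] - p_min[0]),
            "height": abs(p_max[1] - p_min[1]),
            "depth":  float(z_far - z_near)}
```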

17 pages, 14849 KB  
Article
A Collaborative Robotic System for Autonomous Object Handling with Natural User Interaction
by Federico Neri, Gaetano Lettera, Giacomo Palmieri and Massimo Callegari
Robotics 2026, 15(3), 49; https://doi.org/10.3390/robotics15030049 - 27 Feb 2026
Viewed by 707
Abstract
In Industry 5.0, the transition from fixed traditional automation to flexible human–robot collaboration (HRC) requires interfaces that are both intuitive and efficient. This paper introduces a novel, multimodal control system for autonomous object handling, specifically designed to enhance natural user interaction in dynamic work environments. The system integrates a 6-Degree-of-Freedom (DoF) collaborative robot (UR5e) with a hand-eye RGB-D vision system to achieve robust autonomy. The core technical contribution lies in a vision pipeline utilizing deep learning for object detection and point cloud processing for accurate 6D pose estimation, enabling advanced tasks such as human-aware object handover directly onto the operator's hand. Crucially, an Automatic Speech Recognition (ASR) module is incorporated, providing a Natural Language Understanding (NLU) layer that allows operators to issue real-time commands for task modification, error correction, and object selection. Experimental results demonstrate that this multimodal approach offers a streamlined workflow that aims to improve operational flexibility compared to traditional HMIs, while enhancing the perceived naturalness of the collaborative task. The system establishes a framework for highly responsive and intuitive human–robot workspaces, advancing the state of the art in natural interaction for collaborative object manipulation. Full article
(This article belongs to the Special Issue Human–Robot Collaboration in Industry 5.0)
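The abstract names deep-learning detection plus point cloud processing for 6D pose estimation without giving details. One common lightweight baseline for such a pipeline is centroid-plus-PCA pose from the segmented object's points; a sketch under that assumption, not the authors' implementation:

```python
import numpy as np

def estimate_pose_pca(points):
    """Rough 6D pose of a segmented object point cloud: centroid + PCA axes.

    points: Nx3 array in the camera frame. Returns a 4x4 homogeneous transform
    whose rotation columns are the principal axes of the cloud.
    """
    centroid = points.mean(axis=0)
    centered = points - centroid
    # Right singular vectors of the centered cloud are the dominant object axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    rotation = vt.T
    if np.linalg.det(rotation) < 0:        # enforce a right-handed frame
        rotation[:, 2] *= -1.0
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = centroid
    return pose
```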

27 pages, 5554 KB  
Article
Hierarchical Autonomous Navigation for Differential-Drive Mobile Robots Using Deep Learning, Reinforcement Learning, and Lyapunov-Based Trajectory Control
by Ramón Jaramillo-Martínez, Ernesto Chavero-Navarrete and Teodoro Ibarra-Pérez
Technologies 2026, 14(2), 125; https://doi.org/10.3390/technologies14020125 - 17 Feb 2026
Viewed by 590
Abstract
Autonomous navigation in mobile robots operating in dynamic and partially known environments demands the coordinated integration of perception, decision-making, and control while ensuring stability, safety, and energy efficiency. This paper presents an integrated navigation framework for differential-drive mobile robots that combines deep learning-based visual perception, reinforcement learning (RL) for high-level decision-making, and a Lyapunov-based trajectory reference generator for low-level motion execution. A convolutional neural network processes RGB-D images to classify obstacle configurations in real time, enabling navigation without prior map information. Based on this perception layer, an RL policy generates adaptive navigation subgoals in response to environmental changes. To ensure stable motion execution, a Lyapunov-based control strategy is formulated at the kinematic level to generate smooth velocity references, which are subsequently tracked by embedded PID controllers, explicitly decoupling learning-based decision-making from stability-critical control tasks. The local stability of the trajectory-tracking error is analyzed using a quadratic Lyapunov candidate function, ensuring asymptotic convergence under ideal kinematic assumptions. Experimental results demonstrate that while higher control gains provide faster convergence in simulation, an intermediate gain value (K = 0.5I) achieves a favorable trade-off between responsiveness and robustness in real-world conditions, mitigating oscillations caused by actuator dynamics, delays, and sensor noise. Validation across multiple navigation scenarios shows average tracking errors below 1.2 cm, obstacle detection accuracies above 95% for human obstacles, and a significant reduction in energy consumption compared to classical A* planners, highlighting the effectiveness of integrating learning-based navigation with analytically grounded control. Full article
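The exact control law is not given in the abstract. A standard Lyapunov-based trajectory-tracking law for differential-drive robots of the kind described (here the classic Kanayama law, with the gains playing the role of the paper's K) looks like this; treat it as a representative sketch, not the authors' formulation:

```python
import numpy as np

def tracking_control(pose, ref_pose, v_ref, w_ref, k=(0.5, 0.5, 0.5)):
    """Kanayama-style Lyapunov-based tracking law for a differential-drive robot.

    pose, ref_pose: (x, y, theta) of the robot and the reference trajectory.
    v_ref, w_ref: reference linear/angular velocities.
    Returns (v, w) velocity commands; stability follows from the quadratic
    Lyapunov candidate V = (ex^2 + ey^2) / 2 + (1 - cos(et)) / ky.
    """
    kx, ky, kt = k
    x, y, th = pose
    xr, yr, thr = ref_pose
    # Express the tracking error in the robot's body frame.
    c, s = np.cos(th), np.sin(th)
    ex = c * (xr - x) + s * (yr - y)
    ey = -s * (xr - x) + c * (yr - y)
    et = thr - th
    v = v_ref * np.cos(et) + kx * ex
    w = w_ref + v_ref * (ky * ey + kt * np.sin(et))
    return v, w
```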

29 pages, 33196 KB  
Article
Robust Autonomous Perception for Indoor Service Machines via Geometry-Aware RGB-D SLAM and Probabilistic Dynamic Modeling
by Zhiyu Wang, Weili Ding and Wenna Wang
Machines 2026, 14(2), 222; https://doi.org/10.3390/machines14020222 - 12 Feb 2026
Viewed by 361
Abstract
Reliable autonomous perception is essential for indoor service machines operating in human-centered environments, where weak textures, repetitive structures, and frequent dynamic interference often degrade localization stability. Conventional RGB-D SLAM systems typically rely on static-scene assumptions or binary semantic masking, which are insufficient for handling persistent and fine-grained environmental dynamics. This paper presents a robust autonomous perception framework based on geometry-aware RGB-D SLAM, with a particular emphasis on probabilistic dynamic modeling at the feature level. The proposed system integrates multi-granularity geometric representations, including point features, parallel-line structures, and planar regions, to enhance geometric observability in low-texture indoor environments. On this basis, a probabilistic dynamic model is introduced to explicitly characterize feature reliability under motion, where dynamic probabilities are initialized by object detection and continuously updated through temporal consistency, spatial propagation, and multi-view geometric verification. Large-scale planar structures further serve as stable anchors to support robust pose estimation. Experimental results on the TUM RGB-D dynamic benchmark demonstrate that the proposed method significantly improves localization robustness, reducing the average ATE RMSE by approximately 66% compared with representative dynamic SLAM baselines. Additional evaluations on a real-world indoor dataset further validate its effectiveness for long-term autonomous perception under dense motion and frequent occlusions. Full article
(This article belongs to the Section Robotics, Mechatronics and Intelligent Machines)
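The abstract describes dynamic probabilities initialized by detection and refined by temporal consistency and geometric verification, but not the update rule itself. A toy update combining those ingredients might look like the following (all weights and thresholds are invented for illustration):

```python
def update_dynamic_prob(p_prev, in_detected_box, reproj_error,
                        err_thresh=2.0, alpha=0.6):
    """Toy per-feature dynamic-probability update in the spirit of the paper.

    p_prev: previous probability that the feature is dynamic.
    in_detected_box: whether an object detector flags it as a movable object.
    reproj_error: multi-view geometric residual in pixels; large residuals
    suggest the feature moved between frames.
    """
    prior = 0.8 if in_detected_box else 0.2          # detector-based initialization
    evidence = min(reproj_error / err_thresh, 1.0)   # geometric verification term
    # Temporal smoothing blends the old belief with the new combined evidence.
    p_new = alpha * p_prev + (1 - alpha) * (0.5 * prior + 0.5 * evidence)
    return max(0.0, min(1.0, p_new))
```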

22 pages, 3651 KB  
Article
Preliminary Exploration of a Gait Alteration Index to Detect Abnormal Walking Through a RGB-D Camera and Human Pose Estimation
by Gianluca Amprimo, Lorenzo Priano, Luca Vismara and Claudia Ferraris
Algorithms 2026, 19(2), 146; https://doi.org/10.3390/a19020146 - 11 Feb 2026
Viewed by 429
Abstract
Quantitative gait analysis is essential for assessing motor function, as altered walking patterns are linked to functional decline and increased fall risk. Although recent advances in markerless motion analysis and human pose estimation enable gait feature extraction from low-cost video systems rather than expensive motion analysis laboratories, clinical translation remains limited by fragmented descriptors or approaches that directly regress clinical scores, often reducing interpretability and generalizability. We propose the Gait Alteration Index (GAI), an interpretable index that quantifies gait abnormality as a functional deviation from typical walking patterns, independently of specific pathologies. The GAI is computed from a small set of gait parameters and integrates three complementary domains: spatio-temporal characteristics, surrogates of dynamic stability, and arm swing behaviour, providing both a global index and domain-specific sub-indices. Preliminary evaluation on a heterogeneous cohort using clinician-derived assessments showed that the GAI captures clinically meaningful gait alterations (Spearman's ρ = 0.65), with the strongest agreement for spatio-temporal features (ρ = 0.77). These results suggest that the GAI is a promising, low-cost, and interpretable tool for objective gait assessment, screening, and longitudinal monitoring. Full article
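The abstract does not define how the sub-indices are computed. One plausible reading, a deviation-from-normative-statistics score aggregated per domain, can be sketched as follows (the z-score aggregation is an assumption, not the published definition):

```python
import numpy as np

def gait_alteration_index(features, norm_mean, norm_std, domains):
    """Toy composite index: mean absolute z-score deviation per domain.

    features/norm_mean/norm_std: dicts of gait parameters and their
    normative statistics. domains: dict mapping a domain name
    (e.g. "spatio-temporal") to the parameter names it aggregates.
    Returns (global_index, per-domain sub-indices); higher = more altered.
    """
    sub = {}
    for name, params in domains.items():
        z = [abs(features[p] - norm_mean[p]) / norm_std[p] for p in params]
        sub[name] = float(np.mean(z))
    return float(np.mean(list(sub.values()))), sub
```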

20 pages, 4633 KB  
Article
Teleoperation System for Service Robots Using a Virtual Reality Headset and 3D Pose Estimation
by Tiago Ribeiro, Eduardo Fernandes, António Ribeiro, Carolina Lopes, Fernando Ribeiro and Gil Lopes
Sensors 2026, 26(2), 471; https://doi.org/10.3390/s26020471 - 10 Jan 2026
Viewed by 895
Abstract
This paper presents an immersive teleoperation framework for service robots that combines real-time 3D human pose estimation with a Virtual Reality (VR) interface to support intuitive, natural robot control. The operator is tracked using MediaPipe for 2D landmark detection and an Intel RealSense D455 RGB-D (Red-Green-Blue plus Depth) camera for depth acquisition, enabling 3D reconstruction of key joints. Joint angles are computed using efficient vector operations and mapped to the kinematic constraints of an anthropomorphic arm on the CHARMIE service robot. A VR-based telepresence interface provides stereoscopic video and head-motion-based view control to improve situational awareness during manipulation tasks. Experiments in real-world object grasping demonstrate reliable arm teleoperation and effective telepresence; however, vision-only estimation remains limited for axial rotations (e.g., elbow and wrist yaw), particularly under occlusions and unfavorable viewpoints. The proposed system provides a practical pathway toward low-cost, sensor-driven, immersive human–robot interaction for service robotics in dynamic environments. Full article
(This article belongs to the Section Intelligent Sensors)
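The joint-angle computation from 3D landmarks that the abstract mentions reduces to the angle between two bone vectors; a minimal sketch (the landmark names are illustrative):

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (degrees) given 3D positions of joints a, b, c,
    e.g. shoulder-elbow-wrist for the elbow flexion angle."""
    u = np.asarray(a) - np.asarray(b)
    v = np.asarray(c) - np.asarray(b)
    cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

# Example: a right angle at the elbow.
print(joint_angle([0, 1, 0], [0, 0, 0], [1, 0, 0]))  # -> 90.0
```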

21 pages, 3387 KB  
Article
Development of an Autonomous and Interactive Robot Guide for Industrial Museum Environments Using IoT and AI Technologies
by Andrés Arteaga-Vargas, David Velásquez, Juan Pablo Giraldo-Pérez and Daniel Sanin-Villa
Sci 2025, 7(4), 175; https://doi.org/10.3390/sci7040175 - 1 Dec 2025
Viewed by 1859
Abstract
This paper presents the design of an autonomous robot guide for a museum-like environment in a motorcycle assembly plant. The system integrates Industry 4.0 technologies such as artificial vision, indoor positioning, generative artificial intelligence, and cloud connectivity to enhance the visitor experience. The development follows the Design Inclusive Research (DIR) methodology and the VDI 2206 standard to ensure a structured scientific and engineering process. A key innovation is the integration of mmWave sensors alongside LiDAR and RGB-D cameras, enabling reliable human detection and improved navigation safety in reflective indoor environments, as well as the deployment of an open-source large language model for natural, on-device interaction with visitors. The current results include the complete mechanical, electronic, and software architecture; simulation validation; and a preliminary implementation in the real museum environment, where the system demonstrated consistent autonomous navigation, stable performance, and effective user interaction. Full article
(This article belongs to the Section Computer Science, Mathematics and AI)

23 pages, 15360 KB  
Article
A Mobile Robotic System Design and Approach for Autonomous Targeted Disinfection
by Mohammed Z. Shaqura, Linyan Han, Mohammadali Javaheri Koopaee, Wissem Haouas, Moustafa Motawei, Peter Mooney, Nick Fry, Tony Wiese, Bilal Kaddouh and Robert C. Richardson
Robotics 2025, 14(12), 178; https://doi.org/10.3390/robotics14120178 - 30 Nov 2025
Viewed by 1121
Abstract
The recent global pandemic has posed unprecedented challenges to public health systems and has highlighted the critical need for effective, contactless disinfection strategies in shared environments. This study investigates the use of autonomous robotics to enhance disinfection efficiency and safety in public spaces through the development of a custom-built mobile spraying platform. The proposed robotic system is equipped with an integrated 3D object pose estimation framework that fuses RGB-based object detection with point cloud segmentation to accurately identify and localize high-contact surfaces. To facilitate autonomous operation, both local and global motion planning algorithms are implemented, enabling the robot to navigate complex environments and execute disinfection tasks with minimal human intervention. Experimental results demonstrate the feasibility of the proposed disinfection robotic system. Full article
(This article belongs to the Section Sensors and Control in Robotics)
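The RGB-detection-plus-point-cloud fusion the abstract describes usually amounts to cropping the cloud to the points that project into the 2D detection. A sketch under assumed pinhole intrinsics (not the authors' calibration):

```python
import numpy as np

FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0  # assumed pinhole intrinsics

def points_in_bbox(cloud, bbox):
    """Crop a point cloud to points whose projection lands inside a 2D
    detection box, returning them with their centroid as a spray target.

    cloud: Nx3 points in the camera frame (z forward, meters).
    bbox: (u_min, v_min, u_max, v_max) from the RGB object detector.
    """
    x, y, z = cloud[:, 0], cloud[:, 1], cloud[:, 2]
    zc = np.where(z > 0, z, np.inf)          # avoid dividing by invalid depths
    u = FX * x / zc + CX
    v = FY * y / zc + CY
    u0, v0, u1, v1 = bbox
    mask = (z > 0) & (u >= u0) & (u <= u1) & (v >= v0) & (v <= v1)
    target = cloud[mask]
    return target, target.mean(axis=0)
```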

26 pages, 1617 KB  
Article
MemRoadNet: Human-like Memory Integration for Free Road Space Detection
by Sidra Shafiq, Abdullah Aman Khan and Jie Shao
Sensors 2025, 25(21), 6600; https://doi.org/10.3390/s25216600 - 27 Oct 2025
Viewed by 935
Abstract
Detecting available road space is a fundamental task for autonomous driving vehicles, requiring robust image feature extraction methods that operate reliably across diverse sensor-captured scenarios. However, existing approaches process each input independently without leveraging Accumulated Experiential Knowledge (AEK), limiting their adaptability and reliability. In order to explore the impact of AEK, we introduce MemRoadNet, a Memory-Augmented (MA) semantic segmentation framework that integrates human-inspired cognitive architectures with deep-learning models for free road space detection. Our approach combines an InternImage-XL backbone with a UPerNet decoder and a Human-like Memory Bank system implementing episodic, semantic, and working memory subsystems. The memory system stores road experiences with emotional valences based on segmentation performance, enabling intelligent retrieval and integration of relevant historical patterns during training and inference. Experimental validation on the KITTI road, Cityscapes, and R2D benchmarks demonstrates that our single-modality RGB approach achieves competitive performance with complex multimodal systems while maintaining computational efficiency and achieving top performance among single-modality methods. The MA framework represents a significant advancement in sensor-based computer vision systems, bridging computational efficiency and segmentation quality for autonomous driving applications. Full article
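The memory bank internals are not specified in the abstract; a minimal embedding store with valence-tagged, similarity-based retrieval conveys the general idea (the structure and scoring are illustrative assumptions):

```python
import numpy as np

class MemoryBank:
    """Minimal episodic-style memory: store embeddings with a scalar valence,
    retrieve the top-k most similar past experiences by cosine similarity."""

    def __init__(self):
        self.keys, self.valences = [], []

    def store(self, embedding, valence):
        # Normalize once so retrieval is a plain dot product.
        self.keys.append(embedding / np.linalg.norm(embedding))
        self.valences.append(valence)

    def retrieve(self, query, k=3):
        q = query / np.linalg.norm(query)
        sims = np.array([key @ q for key in self.keys])
        top = np.argsort(sims)[::-1][:k]
        return [(float(sims[i]), self.valences[i]) for i in top]
```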

19 pages, 8850 KB  
Article
Intelligent Defect Recognition of Glazed Components in Ancient Buildings Based on Binocular Vision
by Youshan Zhao, Xiaolan Zhang, Ming Guo, Haoyu Han, Jiayi Wang, Yaofeng Wang, Xiaoxu Li and Ming Huang
Buildings 2025, 15(20), 3641; https://doi.org/10.3390/buildings15203641 - 10 Oct 2025
Cited by 1 | Viewed by 615
Abstract
Glazed components in ancient Chinese architecture hold profound historical and cultural value. However, over time, environmental erosion, physical impacts, and human disturbance gradually cause various forms of damage, severely impacting the durability and stability of the buildings. Preventive protection of glazed components is therefore crucial, and its key lies in the early detection and repair of damage, extending a component's service life and preventing significant structural damage. To address this challenge, this study proposes a Restoration-Scale Identification (RSI) method that integrates depth information. By combining RGB-D images acquired from a depth camera with the camera's intrinsic parameters, and by embedding a Convolutional Block Attention Module (CBAM) into the backbone network, the method dynamically enhances critical feature regions. It then employs a scale restoration strategy to accurately identify damaged areas and recover the physical dimensions of glazed components from a global perspective. In addition, we constructed a dedicated semantic segmentation dataset for glazed tile damage, focusing on cracks and spalling. Both qualitative and quantitative evaluations demonstrate that, compared with various high-performance semantic segmentation methods, our approach significantly improves the accuracy and robustness of damage detection in glazed components. The achieved accuracy deviates by only ±10 mm from high-precision laser scanning, a level of precision essential for reliably identifying and assessing subtle damage in complex glazed architectural elements. By integrating depth information, real-scale information is obtained during recognition, so the type and size of damage can be identified efficiently and accurately, realizing the conversion from two-dimensional (2D) pixel coordinates to local three-dimensional (3D) coordinates. This provides a scientific basis for the protection and restoration of ancient buildings and supports the long-term stability of cultural heritage and the preservation of its historical value. Full article
(This article belongs to the Section Building Materials, and Repair & Renovation)
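The 2D-pixel-to-physical-size conversion the abstract refers to follows from pinhole similar triangles; a minimal sketch:

```python
def pixel_to_metric(width_px, depth_m, fx_px):
    """Physical width of a segmented defect from its pixel width, the depth
    at which it sits, and the camera focal length in pixels:
        width_m = width_px * depth_m / fx_px  (pinhole similar triangles)."""
    return width_px * depth_m / fx_px

# Example: an 80 px crack seen at 2.5 m with fx = 1000 px -> 0.2 m wide.
print(pixel_to_metric(80, 2.5, 1000.0))
```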

28 pages, 10315 KB  
Article
DKB-SLAM: Dynamic RGB-D Visual SLAM with Efficient Keyframe Selection and Local Bundle Adjustment
by Qian Sun, Ziqiang Xu, Yibing Li, Yidan Zhang and Fang Ye
Robotics 2025, 14(10), 134; https://doi.org/10.3390/robotics14100134 - 25 Sep 2025
Cited by 1 | Viewed by 2348
Abstract
Reliable navigation for mobile robots in dynamic, human-populated environments remains a significant challenge, as moving objects often cause localization drift and map corruption. While Simultaneous Localization and Mapping (SLAM) techniques excel in static settings, issues like keyframe redundancy and optimization inefficiencies further hinder their practical deployment on robotic platforms. To address these challenges, we propose DKB-SLAM, a real-time RGB-D visual SLAM system specifically designed to enhance robotic autonomy in complex dynamic scenes. DKB-SLAM integrates optical flow with Gaussian-based depth distribution analysis within YOLO detection frames to efficiently filter dynamic points, crucial for maintaining accurate pose estimates for the robot. An adaptive keyframe selection strategy balances map density and information integrity using a sliding window, considering the robot’s motion dynamics through parallax, visibility, and matching quality. Furthermore, a heterogeneously weighted local bundle adjustment (BA) method leverages map point geometry, assigning higher weights to stable edge points to refine the robot’s trajectory. Evaluations on the TUM RGB-D benchmark and, crucially, on a mobile robot platform in real-world dynamic scenarios, demonstrate that DKB-SLAM outperforms state-of-the-art methods, providing a robust and efficient solution for high-precision robot localization and mapping in dynamic environments. Full article
(This article belongs to the Special Issue SLAM and Adaptive Navigation for Robotics)
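The abstract's Gaussian-based depth analysis inside YOLO boxes plausibly separates the detected object's depth mode from background points; a toy version (the median-anchored sigma rule is an assumption, not the published criterion):

```python
import numpy as np

def filter_dynamic_points(depths_in_box, k_sigma=1.0):
    """Split depths inside a detection box into likely object (potentially
    dynamic) points and background points via a Gaussian depth fit.

    Points within k_sigma of the dominant depth mode are treated as part of
    the detected object; the rest are kept as static background candidates.
    """
    d = np.asarray(depths_in_box, dtype=float)
    d = d[d > 0]                              # drop invalid depth readings
    mu, sigma = np.median(d), d.std()
    object_mask = np.abs(d - mu) <= k_sigma * sigma
    return d[object_mask], d[~object_mask]
```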

15 pages, 21804 KB  
Article
Automated On-Tree Detection and Size Estimation of Pomegranates by a Farmer Robot
by Rosa Pia Devanna, Francesco Vicino, Simone Pietro Garofalo, Gaetano Alessandro Vivaldi, Simone Pascuzzi, Giulio Reina and Annalisa Milella
Robotics 2025, 14(10), 131; https://doi.org/10.3390/robotics14100131 - 23 Sep 2025
Viewed by 1239
Abstract
Pomegranate (Punica granatum) fruit size estimation plays a crucial role in orchard management decision-making, especially for fruit quality assessment and yield prediction. Currently, fruit sizing for pomegranates is performed manually using calipers to measure the equatorial and polar diameters. These methods rely on human judgment for sample selection, are labor-intensive, and are prone to errors. In this work, a novel framework for automated on-tree detection and sizing of pomegranate fruits by a farmer robot equipped with a consumer-grade RGB-D sensing device is presented. The proposed system features a multi-stage transfer learning approach to segment fruits in RGB images. Segmentation results from each image are projected onto the co-located depth image; a fruit clustering and modeling algorithm using visual and depth information is then applied for fruit size estimation. Field tests carried out in a commercial orchard are presented for 96 pomegranate fruit samples, showing that the proposed approach allows for accurate fruit size estimation, with an average discrepancy of about 1.0 cm with respect to caliper measurements on both the polar and equatorial diameters. Full article
(This article belongs to the Section Agricultural and Field Robotics)
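Fruit sizing from a clustered 3D point patch is commonly done by fitting a sphere in the least-squares sense; a sketch of that standard fit (the paper's exact modeling step may differ):

```python
import numpy as np

def fit_sphere(points):
    """Linear least-squares sphere fit to a fruit's 3D point cluster.

    Uses |p - c|^2 = r^2  =>  2 c.p + (r^2 - |c|^2) = |p|^2, which is linear
    in (c, r^2 - |c|^2). Returns the center c and the diameter 2r.
    """
    p = np.asarray(points, dtype=float)
    A = np.hstack([2 * p, np.ones((len(p), 1))])
    b = (p ** 2).sum(axis=1)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = w[:3]
    radius = np.sqrt(w[3] + center @ center)
    return center, 2 * radius
```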

16 pages, 2576 KB  
Article
Enhancement in Three-Dimensional Depth with Bionic Image Processing
by Yuhe Chen, Chao Ping Chen, Baoen Han and Yunfan Yang
Computers 2025, 14(8), 340; https://doi.org/10.3390/computers14080340 - 20 Aug 2025
Cited by 1 | Viewed by 1024
Abstract
This study proposes an image processing framework based on bionic principles to optimize 3D visual perception in virtual reality (VR) systems. By simulating physiological mechanisms of the human visual system, the framework significantly enhances depth perception and visual fidelity in VR content. The research focuses on three core algorithms: a Gabor texture feature extraction algorithm based on the directional selectivity of neurons in the V1 region of the visual cortex, which enhances edge detection through a fourth-order Gaussian kernel; an improved Retinex model based on the adaptive illumination mechanism of the retina, achieving brightness balance under complex illumination through horizontal–vertical dual-channel decomposition; and an RGB adaptive adjustment algorithm based on the response characteristics of the three cone cell types, which integrates color temperature compensation with depth cue optimization to enhance color naturalness and stereoscopic depth. A modular processing system is built on the Unity platform, integrating the above algorithms into a collaborative optimization pipeline while keeping per-frame processing time within VR real-time constraints. The experiments use RMSE, AbsRel, and SSIM metrics, combined with subjective evaluation, to verify the effectiveness of the algorithms. The results show that, compared with traditional methods (SSAO, SSR, SH), our algorithms demonstrate significant advantages in simple scenes and marginal superiority in composite metrics for complex scenes. Collaborative processing by the three algorithms significantly reduces depth map noise and enhances the user's subjective experience. The research results provide a solution that combines biological plausibility and engineering practicality for visual optimization in fields such as the implantable metaverse, VR healthcare, and education. Full article
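The V1-inspired Gabor stage can be illustrated with a standard Gabor kernel (this is the textbook filter, not the paper's fourth-order variant):

```python
import numpy as np

def gabor_kernel(size=21, sigma=4.0, theta=0.0, wavelength=8.0, psi=0.0):
    """Real-valued Gabor kernel: a Gaussian envelope times an oriented
    sinusoid, mimicking V1 simple-cell orientation selectivity."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)    # rotate to orientation theta
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t ** 2 + y_t ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * x_t / wavelength + psi)
    return envelope * carrier

# Convolving an image with a bank of these kernels at several orientations
# yields the orientation-selective edge/texture responses described above.
```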

35 pages, 1553 KB  
Article
Efficient Learning-Based Robotic Navigation Using Feature-Based RGB-D Pose Estimation and Topological Maps
by Eder A. Rodríguez-Martínez, Jesús Elías Miranda-Vega, Farouk Achakir, Oleg Sergiyenko, Julio C. Rodríguez-Quiñonez, Daniel Hernández Balbuena and Wendy Flores-Fuentes
Entropy 2025, 27(6), 641; https://doi.org/10.3390/e27060641 - 15 Jun 2025
Viewed by 3668
Abstract
Robust indoor robot navigation typically demands either costly sensors or extensive training data. We propose a cost-effective RGB-D navigation pipeline that couples feature-based relative pose estimation with a lightweight multi-layer-perceptron (MLP) policy. RGB-D keyframes extracted from human-driven traversals form nodes of a topological map; edges are added when visual similarity and geometric–kinematic constraints are jointly satisfied. During autonomy, LightGlue features and SVD give six-DoF relative pose to the active keyframe, and the MLP predicts one of four discrete actions. Low visual similarity or detected obstacles trigger graph editing and Dijkstra replanning in real time. Across eight tasks in four Habitat-Sim environments, the agent covered 190.44 m, replanning when required, and consistently stopped within 0.1 m of the goal while running on commodity hardware. An information-theoretic analysis over the Multi-Illumination dataset shows that LightGlue maximizes per-second information gain under lighting changes, motivating its selection. The modular design attains reliable navigation without metric SLAM or large-scale learning, and seamlessly accommodates future perception or policy upgrades. Full article
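The "LightGlue features and SVD give six-DoF relative pose" step is the classic Kabsch/Umeyama alignment of matched 3D keypoints; a minimal sketch:

```python
import numpy as np

def rigid_transform_svd(src, dst):
    """Kabsch/Umeyama-style rigid alignment: find R, t minimizing
    ||R @ src_i + t - dst_i||^2 over matched 3D keypoint pairs."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)        # cross-covariance of the pairs
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t
```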

35 pages, 21267 KB  
Article
Unmanned Aerial Vehicle–Unmanned Ground Vehicle Centric Visual Semantic Simultaneous Localization and Mapping Framework with Remote Interaction for Dynamic Scenarios
by Chang Liu, Yang Zhang, Liqun Ma, Yong Huang, Keyan Liu and Guangwei Wang
Drones 2025, 9(6), 424; https://doi.org/10.3390/drones9060424 - 10 Jun 2025
Cited by 2 | Viewed by 4359
Abstract
In this study, we introduce an Unmanned Aerial Vehicle (UAV) centric visual semantic simultaneous localization and mapping (SLAM) framework that integrates RGB-D cameras, inertial measurement units (IMUs), and a 5G-enabled remote interaction module. Our system addresses three critical limitations in existing approaches: (1) distance constraints in remote operations; (2) static map assumptions in dynamic environments; and (3) high-dimensional perception requirements for UAV-based applications. By combining YOLO-based object detection with epipolar-constraint-based dynamic feature removal, our method achieves real-time semantic mapping while rejecting motion artifacts. The framework further incorporates a dual-channel communication architecture to enable seamless human-in-the-loop control over UAV–Unmanned Ground Vehicle (UGV) teams in large-scale scenarios. Experimental validation across indoor and outdoor environments indicates that the system can achieve a detection rate of up to 75 frames per second (FPS) on an NVIDIA Jetson AGX Xavier using YOLO-FASTEST, ensuring the rapid identification of dynamic objects. In dynamic scenarios, the localization accuracy attains an average absolute pose error (APE) of 0.1275 m. This outperforms state-of-the-art methods like Dynamic-VINS (0.211 m) and ORB-SLAM3 (0.148 m) on the EuRoC MAV Dataset. The dual-channel communication architecture (Web Real-Time Communication (WebRTC) for video and Message Queuing Telemetry Transport (MQTT) for telemetry) reduces bandwidth consumption by 65% compared to traditional TCP-based protocols. Moreover, our hybrid dynamic feature filtering can reject 89% of dynamic features in occluded scenarios, guaranteeing accurate mapping in complex environments. Our framework represents a significant advancement in enabling intelligent UAVs/UGVs to navigate and interact in complex, dynamic environments, offering real-time semantic understanding and accurate localization. Full article
(This article belongs to the Special Issue Advances in Perception, Communications, and Control for Drones)
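Epipolar-constraint-based dynamic feature removal, as named in the abstract, is typically a threshold on point-to-epipolar-line distance; a sketch (the 1-pixel threshold is an assumption):

```python
import numpy as np

def epipolar_residuals(pts1, pts2, F):
    """Point-to-epipolar-line distances: static features should satisfy
    x2^T F x1 = 0, so large residuals flag candidate dynamic features.

    pts1, pts2: Nx2 matched pixel coordinates; F: 3x3 fundamental matrix.
    """
    ones = np.ones((len(pts1), 1))
    x1 = np.hstack([pts1, ones])
    x2 = np.hstack([pts2, ones])
    lines = (F @ x1.T).T                      # epipolar lines in image 2
    num = np.abs(np.sum(lines * x2, axis=1))
    den = np.hypot(lines[:, 0], lines[:, 1])
    return num / den                          # residuals in pixels

# e.g. dynamic = epipolar_residuals(p1, p2, F) > 1.0
```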
