Search Results (84)

Search Parameters:
Keywords = RGB-D human detection

34 pages, 24153 KB  
Article
Forest Vegetation 3D Localization Using Deep Learning Object Detectors
by Paulo A. S. Mendes, António P. Coimbra and Aníbal T. de Almeida
Appl. Sci. 2026, 16(7), 3375; https://doi.org/10.3390/app16073375 - 31 Mar 2026
Viewed by 300
Abstract
Forest fires are becoming increasingly prevalent and destructive in many regions of the world, posing significant threats to biodiversity, ecosystems, human settlements, the climate, and the economy. The United States of America (USA), Australia, Canada, Greece, and Portugal are five regions that have experienced enormous forest fires. One way to reduce the size and range of forest fires is to decrease the amount of flammable material in forests. This can be achieved using autonomous Unmanned Ground Vehicles (UGVs) specialized in vegetation cutting and equipped with Artificial Intelligence (AI) algorithms to differentiate between vegetation that should be preserved and material that should be removed as potential fire fuel. This paper presents an innovative study of forest vegetation detection, classification, and 3D localization from a ground vehicle's RGB and depth images, in support of autonomous forest-cleaning operations for fire prevention. The work, a continuation of previous research, presents a method for 3D object localization in the real world using Deep Learning Object Detection (DLOD) combined with an RGB-D camera. It presents and compares results for eight recent high-performance DLOD architectures (YOLOv5, YOLOv7, YOLOv8, YOLO-NAS, YOLOv9, YOLOv10, YOLO11, and YOLOv12) detecting and classifying forest vegetation into five classes: “Grass”, “Live vegetation”, “Cut vegetation”, “Dead vegetation”, and “Tree-trunk”. The DLOD models are trained on a custom dataset acquired in dense forests in Portugal. A methodology is presented that combines the best-performing DLOD model with an RGB-D camera to localize the classified detections in 3D in the real world, and these methods are employed on a UGV to localize forest vegetation that needs to be thinned for fire prevention. A key challenge for autonomous forest vegetation cleaning is the reliable discrimination, in dense forests, of the objects that must be identified. With the obtained results, forest vegetation is precisely detected, classified, and localized using the DL models and the localization method presented; YOLOv5 is the fastest DLOD architecture to train, and YOLOv7 and YOLOv12 are the fastest at inference. The innovation presented is the real-time detection, classification, and 3D localization of vegetation using DLOD architectures, with real-world localization errors in width, height, and depth under 21.4%, 20.7%, and 11%, respectively, using only a depth camera and a processing unit. The 3D-localized objects are represented as parallelepipeds. The methodology for vegetation detection, classification, and localization presented in this paper is highly suitable for future autonomous forest vegetation cleaning using specialized unmanned ground vehicles. Full article
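The abstract does not give the localization formulas, but the bounding-box-to-parallelepiped step it describes is typically a pinhole deprojection of the detection box through the depth image. A minimal sketch under that assumption (the intrinsics and the percentile-based depth extent are illustrative, not the authors' values):

```python
import numpy as np

# Illustrative pinhole intrinsics (fx, fy: focal lengths in px; cx, cy: principal point).
FX, FY, CX, CY = 615.0, 615.0, 320.0, 240.0

def deproject(u, v, z):
    """Back-project pixel (u, v) at depth z (meters) into camera coordinates."""
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.array([x, y, z])

def bbox_to_parallelepiped(bbox, depth_map):
    """Turn a 2D detection box into an axis-aligned 3D box (parallelepiped).

    bbox: (u_min, v_min, u_max, v_max) from the object detector.
    depth_map: HxW array of depths in meters from the RGB-D camera.
    """
    u0, v0, u1, v1 = bbox
    patch = depth_map[v0:v1, u0:u1]
    valid = patch[patch > 0]                        # drop missing depth readings
    z_near, z_far = np.percentile(valid, [10, 90])  # robust depth extent
    z_mid = float(np.median(valid))
    p_min = deproject(u0, v0, z_mid)
    p_max = deproject(u1, v1, z_mid)
    # Width/height from the deprojected corners, depth extent from the percentiles.
    return {"center": (p_min + p_max) / 2.0,
            "width":  abs(p_max[0] - p_min[0]),
            "height": abs(p_max[1] - p_min[1]),
            "depth":  float(z_far - z_near)}
```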

17 pages, 14849 KB  
Article
A Collaborative Robotic System for Autonomous Object Handling with Natural User Interaction
by Federico Neri, Gaetano Lettera, Giacomo Palmieri and Massimo Callegari
Robotics 2026, 15(3), 49; https://doi.org/10.3390/robotics15030049 - 27 Feb 2026
Viewed by 707
Abstract
In Industry 5.0, the transition from fixed traditional automation to flexible human–robot collaboration (HRC) requires interfaces that are both intuitive and efficient. This paper introduces a novel, multimodal control system for autonomous object handling, specifically designed to enhance natural user interaction in dynamic work environments. The system integrates a 6-Degree-of-Freedom (DoF) collaborative robot (UR5e) with a hand-eye RGB-D vision system to achieve robust autonomy. The core technical contribution lies in a vision pipeline utilizing deep learning for object detection and point cloud processing for accurate 6D pose estimation, enabling advanced tasks such as human-aware object handover directly onto the operator's hand. Crucially, an Automatic Speech Recognition (ASR) module is incorporated, providing a Natural Language Understanding (NLU) layer that allows operators to issue real-time commands for task modification, error correction, and object selection. Experimental results demonstrate that this multimodal approach offers a streamlined workflow that aims to improve operational flexibility compared to traditional HMIs, while enhancing the perceived naturalness of the collaborative task. The system establishes a framework for highly responsive and intuitive human–robot workspaces, advancing the state of the art in natural interaction for collaborative object manipulation. Full article
(This article belongs to the Special Issue Human–Robot Collaboration in Industry 5.0)
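The abstract names deep-learning detection plus point cloud processing for 6D pose estimation without giving details. One common lightweight baseline for such a pipeline is centroid-plus-PCA pose from the segmented object's points; a sketch under that assumption, not the authors' implementation:

```python
import numpy as np

def estimate_pose_pca(points):
    """Rough 6D pose of a segmented object point cloud: centroid + PCA axes.

    points: Nx3 array in the camera frame. Returns a 4x4 homogeneous transform
    whose rotation columns are the principal axes of the cloud.
    """
    centroid = points.mean(axis=0)
    centered = points - centroid
    # Right singular vectors of the centered cloud are the dominant object axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    rotation = vt.T
    if np.linalg.det(rotation) < 0:        # enforce a right-handed frame
        rotation[:, 2] *= -1.0
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = centroid
    return pose
```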

27 pages, 5554 KB  
Article
Hierarchical Autonomous Navigation for Differential-Drive Mobile Robots Using Deep Learning, Reinforcement Learning, and Lyapunov-Based Trajectory Control
by Ramón Jaramillo-Martínez, Ernesto Chavero-Navarrete and Teodoro Ibarra-Pérez
Technologies 2026, 14(2), 125; https://doi.org/10.3390/technologies14020125 - 17 Feb 2026
Viewed by 590
Abstract
Autonomous navigation in mobile robots operating in dynamic and partially known environments demands the coordinated integration of perception, decision-making, and control while ensuring stability, safety, and energy efficiency. This paper presents an integrated navigation framework for differential-drive mobile robots that combines deep learning-based visual perception, reinforcement learning (RL) for high-level decision-making, and a Lyapunov-based trajectory reference generator for low-level motion execution. A convolutional neural network processes RGB-D images to classify obstacle configurations in real time, enabling navigation without prior map information. Based on this perception layer, an RL policy generates adaptive navigation subgoals in response to environmental changes. To ensure stable motion execution, a Lyapunov-based control strategy is formulated at the kinematic level to generate smooth velocity references, which are subsequently tracked by embedded PID controllers, explicitly decoupling learning-based decision-making from stability-critical control tasks. The local stability of the trajectory-tracking error is analyzed using a quadratic Lyapunov candidate function, ensuring asymptotic convergence under ideal kinematic assumptions. Experimental results demonstrate that while higher control gains provide faster convergence in simulation, an intermediate gain value (K = 0.5I) achieves a favorable trade-off between responsiveness and robustness in real-world conditions, mitigating oscillations caused by actuator dynamics, delays, and sensor noise. Validation across multiple navigation scenarios shows average tracking errors below 1.2 cm, obstacle detection accuracies above 95% for human obstacles, and a significant reduction in energy consumption compared to classical A* planners, highlighting the effectiveness of integrating learning-based navigation with analytically grounded control. Full article
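The exact control law is not given in the abstract. A standard Lyapunov-based trajectory-tracking law for differential-drive robots of the kind described (here the classic Kanayama law, with the gains playing the role of the paper's K) looks like this; treat it as a representative sketch, not the authors' formulation:

```python
import numpy as np

def tracking_control(pose, ref_pose, v_ref, w_ref, k=(0.5, 0.5, 0.5)):
    """Kanayama-style Lyapunov-based tracking law for a differential-drive robot.

    pose, ref_pose: (x, y, theta) of the robot and the reference trajectory.
    v_ref, w_ref: reference linear/angular velocities.
    Returns (v, w) velocity commands; stability follows from the quadratic
    Lyapunov candidate V = (ex^2 + ey^2) / 2 + (1 - cos(et)) / ky.
    """
    kx, ky, kt = k
    x, y, th = pose
    xr, yr, thr = ref_pose
    # Express the tracking error in the robot's body frame.
    c, s = np.cos(th), np.sin(th)
    ex = c * (xr - x) + s * (yr - y)
    ey = -s * (xr - x) + c * (yr - y)
    et = thr - th
    v = v_ref * np.cos(et) + kx * ex
    w = w_ref + v_ref * (ky * ey + kt * np.sin(et))
    return v, w
```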

29 pages, 33196 KB  
Article
Robust Autonomous Perception for Indoor Service Machines via Geometry-Aware RGB-D SLAM and Probabilistic Dynamic Modeling
by Zhiyu Wang, Weili Ding and Wenna Wang
Machines 2026, 14(2), 222; https://doi.org/10.3390/machines14020222 - 12 Feb 2026
Viewed by 361
Abstract
Reliable autonomous perception is essential for indoor service machines operating in human-centered environments, where weak textures, repetitive structures, and frequent dynamic interference often degrade localization stability. Conventional RGB-D SLAM systems typically rely on static-scene assumptions or binary semantic masking, which are insufficient for handling persistent and fine-grained environmental dynamics. This paper presents a robust autonomous perception framework based on geometry-aware RGB-D SLAM, with a particular emphasis on probabilistic dynamic modeling at the feature level. The proposed system integrates multi-granularity geometric representations, including point features, parallel-line structures, and planar regions, to enhance geometric observability in low-texture indoor environments. On this basis, a probabilistic dynamic model is introduced to explicitly characterize feature reliability under motion, where dynamic probabilities are initialized by object detection and continuously updated through temporal consistency, spatial propagation, and multi-view geometric verification. Large-scale planar structures further serve as stable anchors to support robust pose estimation. Experimental results on the TUM RGB-D dynamic benchmark demonstrate that the proposed method significantly improves localization robustness, reducing the average ATE RMSE by approximately 66% compared with representative dynamic SLAM baselines. Additional evaluations on a real-world indoor dataset further validate its effectiveness for long-term autonomous perception under dense motion and frequent occlusions. Full article
(This article belongs to the Section Robotics, Mechatronics and Intelligent Machines)
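The abstract describes dynamic probabilities initialized by detection and refined by temporal consistency and geometric verification, but not the update rule itself. A toy update combining those ingredients might look like the following (all weights and thresholds are invented for illustration):

```python
def update_dynamic_prob(p_prev, in_detected_box, reproj_error,
                        err_thresh=2.0, alpha=0.6):
    """Toy per-feature dynamic-probability update in the spirit of the paper.

    p_prev: previous probability that the feature is dynamic.
    in_detected_box: whether an object detector flags it as a movable object.
    reproj_error: multi-view geometric residual in pixels; large residuals
    suggest the feature moved between frames.
    """
    prior = 0.8 if in_detected_box else 0.2          # detector-based initialization
    evidence = min(reproj_error / err_thresh, 1.0)   # geometric verification term
    # Temporal smoothing blends the old belief with the new combined evidence.
    p_new = alpha * p_prev + (1 - alpha) * (0.5 * prior + 0.5 * evidence)
    return max(0.0, min(1.0, p_new))
```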

22 pages, 3651 KB  
Article
Preliminary Exploration of a Gait Alteration Index to Detect Abnormal Walking Through a RGB-D Camera and Human Pose Estimation
by Gianluca Amprimo, Lorenzo Priano, Luca Vismara and Claudia Ferraris
Algorithms 2026, 19(2), 146; https://doi.org/10.3390/a19020146 - 11 Feb 2026
Viewed by 429
Abstract
Quantitative gait analysis is essential for assessing motor function, as altered walking patterns are linked to functional decline and increased fall risk. Although recent advances in markerless motion analysis and human pose estimation enable gait feature extraction from low-cost video systems rather than expensive motion analysis laboratories, clinical translation remains limited by fragmented descriptors or approaches that directly regress clinical scores, often reducing interpretability and generalizability. We propose the Gait Alteration Index (GAI), an interpretable index that quantifies gait abnormality as a functional deviation from typical walking patterns, independently of specific pathologies. The GAI is computed from a small set of gait parameters and integrates three complementary domains: spatio-temporal characteristics, surrogates of dynamic stability, and arm swing behaviour, providing both a global index and domain-specific sub-indices. Preliminary evaluation on a heterogeneous cohort using clinician-derived assessments showed that the GAI captures clinically meaningful gait alterations (Spearman's ρ = 0.65), with the strongest agreement for spatio-temporal features (ρ = 0.77). These results suggest that the GAI is a promising, low-cost, and interpretable tool for objective gait assessment, screening, and longitudinal monitoring. Full article
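The abstract does not define how the sub-indices are computed. One plausible reading, a deviation-from-normative-statistics score aggregated per domain, can be sketched as follows (the z-score aggregation is an assumption, not the published definition):

```python
import numpy as np

def gait_alteration_index(features, norm_mean, norm_std, domains):
    """Toy composite index: mean absolute z-score deviation per domain.

    features/norm_mean/norm_std: dicts of gait parameters and their
    normative statistics. domains: dict mapping a domain name
    (e.g. "spatio-temporal") to the parameter names it aggregates.
    Returns (global_index, per-domain sub-indices); higher = more altered.
    """
    sub = {}
    for name, params in domains.items():
        z = [abs(features[p] - norm_mean[p]) / norm_std[p] for p in params]
        sub[name] = float(np.mean(z))
    return float(np.mean(list(sub.values()))), sub
```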

20 pages, 4633 KB  
Article
Teleoperation System for Service Robots Using a Virtual Reality Headset and 3D Pose Estimation
by Tiago Ribeiro, Eduardo Fernandes, António Ribeiro, Carolina Lopes, Fernando Ribeiro and Gil Lopes
Sensors 2026, 26(2), 471; https://doi.org/10.3390/s26020471 - 10 Jan 2026
Viewed by 895
Abstract
This paper presents an immersive teleoperation framework for service robots that combines real-time 3D human pose estimation with a Virtual Reality (VR) interface to support intuitive, natural robot control. The operator is tracked using MediaPipe for 2D landmark detection and an Intel RealSense D455 RGB-D (Red-Green-Blue plus Depth) camera for depth acquisition, enabling 3D reconstruction of key joints. Joint angles are computed using efficient vector operations and mapped to the kinematic constraints of an anthropomorphic arm on the CHARMIE service robot. A VR-based telepresence interface provides stereoscopic video and head-motion-based view control to improve situational awareness during manipulation tasks. Experiments in real-world object grasping demonstrate reliable arm teleoperation and effective telepresence; however, vision-only estimation remains limited for axial rotations (e.g., elbow and wrist yaw), particularly under occlusions and unfavorable viewpoints. The proposed system provides a practical pathway toward low-cost, sensor-driven, immersive human–robot interaction for service robotics in dynamic environments. Full article
(This article belongs to the Section Intelligent Sensors)
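The joint-angle computation from 3D landmarks that the abstract mentions reduces to the angle between two bone vectors; a minimal sketch (the landmark names are illustrative):

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (degrees) given 3D positions of joints a, b, c,
    e.g. shoulder-elbow-wrist for the elbow flexion angle."""
    u = np.asarray(a) - np.asarray(b)
    v = np.asarray(c) - np.asarray(b)
    cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

# Example: a right angle at the elbow.
print(joint_angle([0, 1, 0], [0, 0, 0], [1, 0, 0]))  # -> 90.0
```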

21 pages, 3387 KB  
Article
Development of an Autonomous and Interactive Robot Guide for Industrial Museum Environments Using IoT and AI Technologies
by Andrés Arteaga-Vargas, David Velásquez, Juan Pablo Giraldo-Pérez and Daniel Sanin-Villa
Sci 2025, 7(4), 175; https://doi.org/10.3390/sci7040175 - 1 Dec 2025
Viewed by 1859
Abstract
This paper presents the design of an autonomous robot guide for a museum-like environment in a motorcycle assembly plant. The system integrates Industry 4.0 technologies such as artificial vision, indoor positioning, generative artificial intelligence, and cloud connectivity to enhance the visitor experience. The development follows the Design Inclusive Research (DIR) methodology and the VDI 2206 standard to ensure a structured scientific and engineering process. A key innovation is the integration of mmWave sensors alongside LiDAR and RGB-D cameras, enabling reliable human detection and improved navigation safety in reflective indoor environments, as well as the deployment of an open-source large language model for natural, on-device interaction with visitors. The current results include the complete mechanical, electronic, and software architecture; simulation validation; and a preliminary implementation in the real museum environment, where the system demonstrated consistent autonomous navigation, stable performance, and effective user interaction. Full article
(This article belongs to the Section Computer Science, Mathematics and AI)

23 pages, 15360 KB  
Article
A Mobile Robotic System Design and Approach for Autonomous Targeted Disinfection
by Mohammed Z. Shaqura, Linyan Han, Mohammadali Javaheri Koopaee, Wissem Haouas, Moustafa Motawei, Peter Mooney, Nick Fry, Tony Wiese, Bilal Kaddouh and Robert C. Richardson
Robotics 2025, 14(12), 178; https://doi.org/10.3390/robotics14120178 - 30 Nov 2025
Viewed by 1121
Abstract
The recent global pandemic has posed unprecedented challenges to public health systems and has highlighted the critical need for effective, contactless disinfection strategies in shared environments. This study investigates the use of autonomous robotics to enhance disinfection efficiency and safety in public spaces through the development of a custom-built mobile spraying platform. The proposed robotic system is equipped with an integrated 3D object pose estimation framework that fuses RGB-based object detection with point cloud segmentation to accurately identify and localize high-contact surfaces. To facilitate autonomous operation, both local and global motion planning algorithms are implemented, enabling the robot to navigate complex environments and execute disinfection tasks with minimal human intervention. Experimental results demonstrate the feasibility of the proposed disinfection robotic system. Full article
(This article belongs to the Section Sensors and Control in Robotics)
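The RGB-detection-plus-point-cloud fusion the abstract describes usually amounts to cropping the cloud to the points that project into the 2D detection. A sketch under assumed pinhole intrinsics (not the authors' calibration):

```python
import numpy as np

FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0  # assumed pinhole intrinsics

def points_in_bbox(cloud, bbox):
    """Crop a point cloud to points whose projection lands inside a 2D
    detection box, returning them with their centroid as a spray target.

    cloud: Nx3 points in the camera frame (z forward, meters).
    bbox: (u_min, v_min, u_max, v_max) from the RGB object detector.
    """
    x, y, z = cloud[:, 0], cloud[:, 1], cloud[:, 2]
    zc = np.where(z > 0, z, np.inf)          # avoid dividing by invalid depths
    u = FX * x / zc + CX
    v = FY * y / zc + CY
    u0, v0, u1, v1 = bbox
    mask = (z > 0) & (u >= u0) & (u <= u1) & (v >= v0) & (v <= v1)
    target = cloud[mask]
    return target, target.mean(axis=0)
```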

26 pages, 1617 KB  
Article
MemRoadNet: Human-like Memory Integration for Free Road Space Detection
by Sidra Shafiq, Abdullah Aman Khan and Jie Shao
Sensors 2025, 25(21), 6600; https://doi.org/10.3390/s25216600 - 27 Oct 2025
Viewed by 935
Abstract
Detecting available road space is a fundamental task for autonomous driving vehicles, requiring robust image feature extraction methods that operate reliably across diverse sensor-captured scenarios. However, existing approaches process each input independently without leveraging Accumulated Experiential Knowledge (AEK), limiting their adaptability and reliability. In order to explore the impact of AEK, we introduce MemRoadNet, a Memory-Augmented (MA) semantic segmentation framework that integrates human-inspired cognitive architectures with deep-learning models for free road space detection. Our approach combines an InternImage-XL backbone with a UPerNet decoder and a Human-like Memory Bank system implementing episodic, semantic, and working memory subsystems. The memory system stores road experiences with emotional valences based on segmentation performance, enabling intelligent retrieval and integration of relevant historical patterns during training and inference. Experimental validation on the KITTI road, Cityscapes, and R2D benchmarks demonstrates that our single-modality RGB approach achieves competitive performance with complex multimodal systems while maintaining computational efficiency and achieving top performance among single-modality methods. The MA framework represents a significant advancement in sensor-based computer vision systems, bridging computational efficiency and segmentation quality for autonomous driving applications. Full article
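The memory bank internals are not specified in the abstract; a minimal embedding store with valence-tagged, similarity-based retrieval conveys the general idea (the structure and scoring are illustrative assumptions):

```python
import numpy as np

class MemoryBank:
    """Minimal episodic-style memory: store embeddings with a scalar valence,
    retrieve the top-k most similar past experiences by cosine similarity."""

    def __init__(self):
        self.keys, self.valences = [], []

    def store(self, embedding, valence):
        # Normalize once so retrieval is a plain dot product.
        self.keys.append(embedding / np.linalg.norm(embedding))
        self.valences.append(valence)

    def retrieve(self, query, k=3):
        q = query / np.linalg.norm(query)
        sims = np.array([key @ q for key in self.keys])
        top = np.argsort(sims)[::-1][:k]
        return [(float(sims[i]), self.valences[i]) for i in top]
```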

19 pages, 8850 KB  
Article
Intelligent Defect Recognition of Glazed Components in Ancient Buildings Based on Binocular Vision
by Youshan Zhao, Xiaolan Zhang, Ming Guo, Haoyu Han, Jiayi Wang, Yaofeng Wang, Xiaoxu Li and Ming Huang
Buildings 2025, 15(20), 3641; https://doi.org/10.3390/buildings15203641 - 10 Oct 2025
Cited by 1 | Viewed by 615
Abstract
Glazed components in ancient Chinese architecture hold profound historical and cultural value. However, over time, environmental erosion, physical impacts, and human disturbance gradually cause various forms of damage, severely impacting the durability and stability of the buildings. Preventive protection of glazed components is therefore crucial, and its key lies in the early detection and repair of damage, extending a component's service life and preventing significant structural damage. To address this challenge, this study proposes a Restoration-Scale Identification (RSI) method that integrates depth information. By combining RGB-D images acquired from a depth camera with the camera's intrinsic parameters, and by embedding a Convolutional Block Attention Module (CBAM) into the backbone network, the method dynamically enhances critical feature regions. It then employs a scale restoration strategy to accurately identify damaged areas and recover the physical dimensions of glazed components from a global perspective. In addition, we constructed a dedicated semantic segmentation dataset for glazed tile damage, focusing on cracks and spalling. Both qualitative and quantitative evaluations demonstrate that, compared with various high-performance semantic segmentation methods, our approach significantly improves the accuracy and robustness of damage detection in glazed components. The achieved accuracy deviates by only ±10 mm from high-precision laser scanning, a level of precision essential for reliably identifying and assessing subtle damage in complex glazed architectural elements. By integrating depth information, real-scale information is obtained during recognition, so the type and size of damage can be identified efficiently and accurately, realizing the conversion from two-dimensional (2D) pixel coordinates to local three-dimensional (3D) coordinates. This provides a scientific basis for the protection and restoration of ancient buildings and supports the long-term stability of cultural heritage and the preservation of its historical value. Full article
(This article belongs to the Section Building Materials, and Repair & Renovation)
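The 2D-pixel-to-physical-size conversion the abstract refers to follows from pinhole similar triangles; a minimal sketch:

```python
def pixel_to_metric(width_px, depth_m, fx_px):
    """Physical width of a segmented defect from its pixel width, the depth
    at which it sits, and the camera focal length in pixels:
        width_m = width_px * depth_m / fx_px  (pinhole similar triangles)."""
    return width_px * depth_m / fx_px

# Example: an 80 px crack seen at 2.5 m with fx = 1000 px -> 0.2 m wide.
print(pixel_to_metric(80, 2.5, 1000.0))
```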

28 pages, 10315 KB  
Article
DKB-SLAM: Dynamic RGB-D Visual SLAM with Efficient Keyframe Selection and Local Bundle Adjustment
by Qian Sun, Ziqiang Xu, Yibing Li, Yidan Zhang and Fang Ye
Robotics 2025, 14(10), 134; https://doi.org/10.3390/robotics14100134 - 25 Sep 2025
Cited by 1 | Viewed by 2348
Abstract
Reliable navigation for mobile robots in dynamic, human-populated environments remains a significant challenge, as moving objects often cause localization drift and map corruption. While Simultaneous Localization and Mapping (SLAM) techniques excel in static settings, issues like keyframe redundancy and optimization inefficiencies further hinder their practical deployment on robotic platforms. To address these challenges, we propose DKB-SLAM, a real-time RGB-D visual SLAM system specifically designed to enhance robotic autonomy in complex dynamic scenes. DKB-SLAM integrates optical flow with Gaussian-based depth distribution analysis within YOLO detection frames to efficiently filter dynamic points, crucial for maintaining accurate pose estimates for the robot. An adaptive keyframe selection strategy balances map density and information integrity using a sliding window, considering the robot’s motion dynamics through parallax, visibility, and matching quality. Furthermore, a heterogeneously weighted local bundle adjustment (BA) method leverages map point geometry, assigning higher weights to stable edge points to refine the robot’s trajectory. Evaluations on the TUM RGB-D benchmark and, crucially, on a mobile robot platform in real-world dynamic scenarios, demonstrate that DKB-SLAM outperforms state-of-the-art methods, providing a robust and efficient solution for high-precision robot localization and mapping in dynamic environments. Full article
(This article belongs to the Special Issue SLAM and Adaptive Navigation for Robotics)
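The abstract's Gaussian-based depth analysis inside YOLO boxes plausibly separates the detected object's depth mode from background points; a toy version (the median-anchored sigma rule is an assumption, not the published criterion):

```python
import numpy as np

def filter_dynamic_points(depths_in_box, k_sigma=1.0):
    """Split depths inside a detection box into likely object (potentially
    dynamic) points and background points via a Gaussian depth fit.

    Points within k_sigma of the dominant depth mode are treated as part of
    the detected object; the rest are kept as static background candidates.
    """
    d = np.asarray(depths_in_box, dtype=float)
    d = d[d > 0]                              # drop invalid depth readings
    mu, sigma = np.median(d), d.std()
    object_mask = np.abs(d - mu) <= k_sigma * sigma
    return d[object_mask], d[~object_mask]
```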

15 pages, 21804 KB  
Article
Automated On-Tree Detection and Size Estimation of Pomegranates by a Farmer Robot
by Rosa Pia Devanna, Francesco Vicino, Simone Pietro Garofalo, Gaetano Alessandro Vivaldi, Simone Pascuzzi, Giulio Reina and Annalisa Milella
Robotics 2025, 14(10), 131; https://doi.org/10.3390/robotics14100131 - 23 Sep 2025
Viewed by 1239
Abstract
Pomegranate (Punica granatum) fruit size estimation plays a crucial role in orchard management decision-making, especially for fruit quality assessment and yield prediction. Currently, fruit sizing for pomegranates is performed manually using calipers to measure the equatorial and polar diameters. These methods rely on human judgment for sample selection, are labor-intensive, and are prone to errors. In this work, a novel framework for automated on-tree detection and sizing of pomegranate fruits by a farmer robot equipped with a consumer-grade RGB-D sensing device is presented. The proposed system features a multi-stage transfer learning approach to segment fruits in RGB images. Segmentation results from each image are projected onto the co-located depth image; a fruit clustering and modeling algorithm using visual and depth information is then applied for fruit size estimation. Field tests carried out in a commercial orchard are presented for 96 pomegranate fruit samples, showing that the proposed approach allows for accurate fruit size estimation, with an average discrepancy of about 1.0 cm with respect to caliper measurements on both the polar and equatorial diameters. Full article
(This article belongs to the Section Agricultural and Field Robotics)
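Fruit sizing from a clustered 3D point patch is commonly done by fitting a sphere in the least-squares sense; a sketch of that standard fit (the paper's exact modeling step may differ):

```python
import numpy as np

def fit_sphere(points):
    """Linear least-squares sphere fit to a fruit's 3D point cluster.

    Uses |p - c|^2 = r^2  =>  2 c.p + (r^2 - |c|^2) = |p|^2, which is linear
    in (c, r^2 - |c|^2). Returns the center c and the diameter 2r.
    """
    p = np.asarray(points, dtype=float)
    A = np.hstack([2 * p, np.ones((len(p), 1))])
    b = (p ** 2).sum(axis=1)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = w[:3]
    radius = np.sqrt(w[3] + center @ center)
    return center, 2 * radius
```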

16 pages, 2576 KB  
Article
Enhancement in Three-Dimensional Depth with Bionic Image Processing
by Yuhe Chen, Chao Ping Chen, Baoen Han and Yunfan Yang
Computers 2025, 14(8), 340; https://doi.org/10.3390/computers14080340 - 20 Aug 2025
Cited by 1 | Viewed by 1024
Abstract
This study proposes an image processing framework based on bionic principles to optimize 3D visual perception in virtual reality (VR) systems. By simulating physiological mechanisms of the human visual system, the framework significantly enhances depth perception and visual fidelity in VR content. The research focuses on three core algorithms: a Gabor texture feature extraction algorithm based on the directional selectivity of neurons in the V1 region of the visual cortex, which enhances edge detection through a fourth-order Gaussian kernel; an improved Retinex model based on the adaptive illumination mechanism of the retina, achieving brightness balance under complex illumination through horizontal–vertical dual-channel decomposition; and an RGB adaptive adjustment algorithm based on the response characteristics of the three cone cell types, which integrates color temperature compensation with depth cue optimization to enhance color naturalness and stereoscopic depth. A modular processing system is built on the Unity platform, integrating the above algorithms into a collaborative optimization pipeline while keeping per-frame processing time within VR real-time constraints. The experiments use RMSE, AbsRel, and SSIM metrics, combined with subjective evaluation, to verify the effectiveness of the algorithms. The results show that, compared with traditional methods (SSAO, SSR, SH), our algorithms demonstrate significant advantages in simple scenes and marginal superiority in composite metrics for complex scenes. Collaborative processing by the three algorithms significantly reduces depth map noise and enhances the user's subjective experience. The research results provide a solution that combines biological plausibility and engineering practicality for visual optimization in fields such as the implantable metaverse, VR healthcare, and education. Full article
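The V1-inspired Gabor stage can be illustrated with a standard Gabor kernel (this is the textbook filter, not the paper's fourth-order variant):

```python
import numpy as np

def gabor_kernel(size=21, sigma=4.0, theta=0.0, wavelength=8.0, psi=0.0):
    """Real-valued Gabor kernel: a Gaussian envelope times an oriented
    sinusoid, mimicking V1 simple-cell orientation selectivity."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)    # rotate to orientation theta
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t ** 2 + y_t ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * x_t / wavelength + psi)
    return envelope * carrier

# Convolving an image with a bank of these kernels at several orientations
# yields the orientation-selective edge/texture responses described above.
```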

35 pages, 1553 KB  
Article
Efficient Learning-Based Robotic Navigation Using Feature-Based RGB-D Pose Estimation and Topological Maps
by Eder A. Rodríguez-Martínez, Jesús Elías Miranda-Vega, Farouk Achakir, Oleg Sergiyenko, Julio C. Rodríguez-Quiñonez, Daniel Hernández Balbuena and Wendy Flores-Fuentes
Entropy 2025, 27(6), 641; https://doi.org/10.3390/e27060641 - 15 Jun 2025
Viewed by 3668
Abstract
Robust indoor robot navigation typically demands either costly sensors or extensive training data. We propose a cost-effective RGB-D navigation pipeline that couples feature-based relative pose estimation with a lightweight multi-layer-perceptron (MLP) policy. RGB-D keyframes extracted from human-driven traversals form nodes of a topological map; edges are added when visual similarity and geometric–kinematic constraints are jointly satisfied. During autonomy, LightGlue features and SVD give six-DoF relative pose to the active keyframe, and the MLP predicts one of four discrete actions. Low visual similarity or detected obstacles trigger graph editing and Dijkstra replanning in real time. Across eight tasks in four Habitat-Sim environments, the agent covered 190.44 m, replanning when required, and consistently stopped within 0.1 m of the goal while running on commodity hardware. An information-theoretic analysis over the Multi-Illumination dataset shows that LightGlue maximizes per-second information gain under lighting changes, motivating its selection. The modular design attains reliable navigation without metric SLAM or large-scale learning, and seamlessly accommodates future perception or policy upgrades. Full article
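The "LightGlue features and SVD give six-DoF relative pose" step is the classic Kabsch/Umeyama alignment of matched 3D keypoints; a minimal sketch:

```python
import numpy as np

def rigid_transform_svd(src, dst):
    """Kabsch/Umeyama-style rigid alignment: find R, t minimizing
    ||R @ src_i + t - dst_i||^2 over matched 3D keypoint pairs."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)        # cross-covariance of the pairs
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t
```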

35 pages, 21267 KB  
Article
Unmanned Aerial Vehicle–Unmanned Ground Vehicle Centric Visual Semantic Simultaneous Localization and Mapping Framework with Remote Interaction for Dynamic Scenarios
by Chang Liu, Yang Zhang, Liqun Ma, Yong Huang, Keyan Liu and Guangwei Wang
Drones 2025, 9(6), 424; https://doi.org/10.3390/drones9060424 - 10 Jun 2025
Cited by 2 | Viewed by 4359
Abstract
In this study, we introduce an Unmanned Aerial Vehicle (UAV) centric visual semantic simultaneous localization and mapping (SLAM) framework that integrates RGB-D cameras, inertial measurement units (IMUs), and a 5G-enabled remote interaction module. Our system addresses three critical limitations in existing approaches: (1) distance constraints in remote operations; (2) static map assumptions in dynamic environments; and (3) high-dimensional perception requirements for UAV-based applications. By combining YOLO-based object detection with epipolar-constraint-based dynamic feature removal, our method achieves real-time semantic mapping while rejecting motion artifacts. The framework further incorporates a dual-channel communication architecture to enable seamless human-in-the-loop control over UAV–Unmanned Ground Vehicle (UGV) teams in large-scale scenarios. Experimental validation across indoor and outdoor environments indicates that the system can achieve a detection rate of up to 75 frames per second (FPS) on an NVIDIA Jetson AGX Xavier using YOLO-FASTEST, ensuring the rapid identification of dynamic objects. In dynamic scenarios, the localization accuracy attains an average absolute pose error (APE) of 0.1275 m. This outperforms state-of-the-art methods like Dynamic-VINS (0.211 m) and ORB-SLAM3 (0.148 m) on the EuRoC MAV Dataset. The dual-channel communication architecture (Web Real-Time Communication (WebRTC) for video and Message Queuing Telemetry Transport (MQTT) for telemetry) reduces bandwidth consumption by 65% compared to traditional TCP-based protocols. Moreover, our hybrid dynamic feature filtering can reject 89% of dynamic features in occluded scenarios, guaranteeing accurate mapping in complex environments. Our framework represents a significant advancement in enabling intelligent UAVs/UGVs to navigate and interact in complex, dynamic environments, offering real-time semantic understanding and accurate localization. Full article
(This article belongs to the Special Issue Advances in Perception, Communications, and Control for Drones)
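Epipolar-constraint-based dynamic feature removal, as named in the abstract, is typically a threshold on point-to-epipolar-line distance; a sketch (the 1-pixel threshold is an assumption):

```python
import numpy as np

def epipolar_residuals(pts1, pts2, F):
    """Point-to-epipolar-line distances: static features should satisfy
    x2^T F x1 = 0, so large residuals flag candidate dynamic features.

    pts1, pts2: Nx2 matched pixel coordinates; F: 3x3 fundamental matrix.
    """
    ones = np.ones((len(pts1), 1))
    x1 = np.hstack([pts1, ones])
    x2 = np.hstack([pts2, ones])
    lines = (F @ x1.T).T                      # epipolar lines in image 2
    num = np.abs(np.sum(lines * x2, axis=1))
    den = np.hypot(lines[:, 0], lines[:, 1])
    return num / den                          # residuals in pixels

# e.g. dynamic = epipolar_residuals(p1, p2, F) > 1.0
```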
