Search Results (481)

Search Parameters: Keywords = RGB-D sensor

30 pages, 8087 KB  
Article
A Novel SLAM Approach for Trajectory Generation of a Dual-Arm Mobile Robot (DAMR) Using Sensor Fusion
by Narendra Kumar Kolla and Pandu Ranga Vundavilli
Automation 2026, 7(2), 42; https://doi.org/10.3390/automation7020042 - 3 Mar 2026
Abstract
Simultaneous Localization and Mapping (SLAM) is essential for autonomous movement in intelligent robotic systems. Traditional SLAM using a single sensor, such as an Inertial Measurement Unit (IMU), faces challenges including noise and drift. This paper introduces a novel Cartographer-based SLAM approach for DAMR trajectory generation in indoor environments to reduce drift errors and improve localization accuracy. The approach fuses multi-sensor data from wheel odometry, an RGB-D camera (RTAB-Map), and an IMU through an extended Kalman filter (EKF) for precise mapping and DAMR trajectory generation, and the result is compared with the heading reference trajectory generated by robot pose estimation and frame transformation. The system is implemented in the Robot Operating System (ROS 2) for coordinated data acquisition, processing, and visualization. Experimental verification shows that the generated DAMR trajectories closely follow the reference trajectory and that drift errors are substantially reduced. The results reveal that fusing multi-sensor data with the EKF effectively improves the positioning accuracy and robustness of the system: the proposed approach aligns well with the reference trajectory, yielding a mean displacement error of 0.352% and an absolute trajectory error of 0.007 m, highlighting the effectiveness of the fusion approach for accurate indoor robot navigation.
(This article belongs to the Section Robotics and Autonomous Systems)
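
As a concrete illustration of the fusion stage this abstract describes, here is a minimal planar EKF in Python that propagates a pose with wheel-odometry velocities and corrects it with an absolute visual pose fix. The state layout, noise values, and direct-pose measurement model are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def predict(x, P, v, w, dt, Q):
    """Propagate state [x, y, theta] with wheel-odometry velocities."""
    x_new = x + np.array([v * np.cos(x[2]) * dt,
                          v * np.sin(x[2]) * dt,
                          w * dt])
    F = np.array([[1, 0, -v * np.sin(x[2]) * dt],
                  [0, 1,  v * np.cos(x[2]) * dt],
                  [0, 0, 1]])
    return x_new, F @ P @ F.T + Q

def update(x, P, z, R):
    """Correct with an absolute pose measurement (e.g., a visual SLAM pose)."""
    H = np.eye(3)                      # assume the full pose is measured directly
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    return x + K @ y, (np.eye(3) - K @ H) @ P

x, P = np.zeros(3), np.eye(3) * 0.1
Q, R = np.eye(3) * 1e-3, np.eye(3) * 1e-2              # illustrative noise values
x, P = predict(x, P, v=0.2, w=0.05, dt=0.1, Q=Q)        # odometry step
x, P = update(x, P, z=np.array([0.021, 0.001, 0.005]), R=R)  # visual fix
print(x)
```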

17 pages, 12829 KB  
Article
Stereo Gaussian Splatting with Adaptive Scene Depth Estimation for Semantic Mapping
by Chenhui Fu and Jiangang Lu
J. Imaging 2026, 12(3), 105; https://doi.org/10.3390/jimaging12030105 - 28 Feb 2026
Abstract
Simultaneous Localization and Mapping (SLAM) is a fundamental capability in robotics and augmented reality. However, achieving accurate geometric reconstruction and consistent semantic understanding in complex environments remains challenging. Although recent neural implicit representations have improved reconstruction quality, they often suffer from high computational cost and the forgetting phenomenon during online mapping. In this paper, we propose StereoGS-SLAM, a stereo semantic SLAM framework based on 3D Gaussian Splatting (3DGS) for explicit scene representation. Unlike existing approaches, StereoGS-SLAM operates on passive RGB stereo inputs without requiring active depth sensors. An adaptive depth estimation strategy is introduced to dynamically refine Gaussian scales based on real-time stereo depth estimates, ensuring robust and scale-consistent reconstruction. In addition, we propose a hybrid keyframe selection strategy that integrates motion-aware selection with lightweight random sampling to improve keyframe diversity and maintain stable, real-time optimization. Experimental evaluations demonstrate that StereoGS-SLAM achieves consistent and competitive localization, rendering, and semantic reconstruction performance compared with recent 3DGS-based SLAM systems.
(This article belongs to the Section Computer Vision and Pattern Recognition)
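
The geometry underlying the adaptive depth step is the standard stereo relation Z = fx·B/d; the sketch below shows that relation and one simple way to turn depth into a metric Gaussian scale. The paper's refinement strategy is richer than this, and fx, baseline, and pixel_radius are placeholder values.

```python
import numpy as np

def stereo_depth(disparity, fx, baseline, eps=1e-6):
    """Depth from stereo disparity: Z = fx * B / d (pinhole model)."""
    return fx * baseline / np.maximum(disparity, eps)

def gaussian_scale(depth, fx, pixel_radius=1.0):
    """Back-project a pixel radius to a metric radius at the given depth,
    one simple way to keep primitive size consistent with scene depth."""
    return pixel_radius * depth / fx

d = np.array([40.0, 8.0])                       # disparities in pixels
z = stereo_depth(d, fx=700.0, baseline=0.12)    # meters
print(z, gaussian_scale(z, fx=700.0))
```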

36 pages, 4079 KB  
Article
FEGW-YOLO: A Feature-Complexity-Guided Lightweight Framework for Real-Time Multi-Crop Detection with Advanced Sensing Integration on Edge Devices
by Yaojiang Liu, Hongjun Tian, Yijie Yin, Yuhan Zhou, Wei Li, Yang Xiong, Yichen Wang, Zinan Nie, Yang Yang, Dongxiao Xie and Shijie Huang
Sensors 2026, 26(4), 1313; https://doi.org/10.3390/s26041313 - 18 Feb 2026
Abstract
Real-time object detection on resource-constrained edge devices remains a critical challenge in precision agriculture and autonomous systems, particularly when integrating advanced multi-modal sensors (RGB-D, thermal, hyperspectral). This paper introduces FEGW-YOLO, a lightweight detection framework explicitly designed to bridge the efficiency-accuracy gap for fine-grained visual perception on edge hardware while maintaining compatibility with multiple sensor modalities. The core innovation is a Feature Complexity Descriptor (FCD) metric that enables adaptive, layer-wise compression based on the information-bearing capacity of network features. This compression-guided approach is coupled with (1) Feature Engineering-driven Ghost Convolution (FEG-Conv) for parameter reduction, (2) Efficient Multi-Scale Attention (EMA) for compensating compression-induced information loss, and (3) Wise-IoU loss for improved localization in dense, occluded scenes. The framework follows a principled “Compress, Compensate, and Refine” philosophy that treats compression and compensation as co-designed objectives rather than isolated knobs. Extensive experiments on a custom strawberry dataset (11,752 annotated instances) and cross-crop validation on apples, tomatoes, and grapes demonstrate that FEGW-YOLO achieves 95.1% mAP@0.5 while reducing model parameters by 54.7% and computational cost (GFLOPs) by 53.5% compared to a strong YOLO-Agri baseline. Real-time inference on NVIDIA Jetson Xavier achieves 38 FPS at 12.3 W, enabling 40+ hours of continuous operation on typical agricultural robotic platforms. Multi-modal fusion experiments with RGB-D sensors demonstrate that the lightweight architecture leaves sufficient computational headroom for parallel processing of depth and visual data, a capability essential for practical advanced sensing systems. Field deployment in commercial strawberry greenhouses validates an 87.3% harvesting success rate with a 2.1% fruit damage rate, demonstrating feasibility for autonomous systems. The proposed framework advances the state-of-the-art in efficient agricultural sensing by introducing a principled metric-guided compression strategy, comprehensive multi-modal sensor integration, and empirical validation across diverse crop types and real-world deployment scenarios. This work bridges the gap between laboratory research and practical edge deployment of advanced sensing systems, with direct relevance to autonomous harvesting, precision monitoring, and other resource-constrained agricultural applications.
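
FEG-Conv builds on Ghost convolution, in which a few densely computed channels are extended with cheap depthwise “ghost” channels. Below is a standard GhostNet-style block in PyTorch for orientation; the feature-engineering additions that distinguish FEG-Conv are not reproduced here.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution: a dense 1x1 conv produces 'primary' channels, then
    cheap depthwise ops generate the remaining 'ghost' channels."""
    def __init__(self, c_in, c_out, ratio=2, dw_kernel=3):
        super().__init__()
        primary = max(1, c_out // ratio)
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, primary, 1, bias=False),
            nn.BatchNorm2d(primary), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(primary, c_out - primary, dw_kernel,
                      padding=dw_kernel // 2, groups=primary, bias=False),
            nn.BatchNorm2d(c_out - primary), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

x = torch.randn(1, 64, 80, 80)
print(GhostConv(64, 128)(x).shape)   # torch.Size([1, 128, 80, 80])
```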

27 pages, 5554 KB  
Article
Hierarchical Autonomous Navigation for Differential-Drive Mobile Robots Using Deep Learning, Reinforcement Learning, and Lyapunov-Based Trajectory Control
by Ramón Jaramillo-Martínez, Ernesto Chavero-Navarrete and Teodoro Ibarra-Pérez
Technologies 2026, 14(2), 125; https://doi.org/10.3390/technologies14020125 - 17 Feb 2026
Abstract
Autonomous navigation in mobile robots operating in dynamic and partially known environments demands the coordinated integration of perception, decision-making, and control while ensuring stability, safety, and energy efficiency. This paper presents an integrated navigation framework for differential-drive mobile robots that combines deep learning-based visual perception, reinforcement learning (RL) for high-level decision-making, and a Lyapunov-based trajectory reference generator for low-level motion execution. A convolutional neural network processes RGB-D images to classify obstacle configurations in real time, enabling navigation without prior map information. Based on this perception layer, an RL policy generates adaptive navigation subgoals in response to environmental changes. To ensure stable motion execution, a Lyapunov-based control strategy is formulated at the kinematic level to generate smooth velocity references, which are subsequently tracked by embedded PID controllers, explicitly decoupling learning-based decision-making from stability-critical control tasks. The local stability of the trajectory-tracking error is analyzed using a quadratic Lyapunov candidate function, ensuring asymptotic convergence under ideal kinematic assumptions. Experimental results demonstrate that while higher control gains provide faster convergence in simulation, an intermediate gain value (K = 0.5I) achieves a favorable trade-off between responsiveness and robustness in real-world conditions, mitigating oscillations caused by actuator dynamics, delays, and sensor noise. Validation across multiple navigation scenarios shows average tracking errors below 1.2 cm, obstacle detection accuracies above 95% for human obstacles, and a significant reduction in energy consumption compared to classical A* planners, highlighting the effectiveness of integrating learning-based navigation with analytically grounded control.
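
The low-level controller belongs to a well-known family of Lyapunov-based kinematic tracking laws for differential-drive robots; a sketch of the classic form follows. The gain structure and values are assumptions on our part (the abstract only reports that K = 0.5I worked well in practice).

```python
import numpy as np

def tracking_control(pose, ref_pose, v_r, w_r, K=(0.5, 0.5, 0.5)):
    """Classic Lyapunov-based kinematic tracking law.
    pose / ref_pose = (x, y, theta); v_r, w_r = reference velocities."""
    kx, ky, kth = K
    dx, dy = ref_pose[0] - pose[0], ref_pose[1] - pose[1]
    th = pose[2]
    # Express the tracking error in the robot frame.
    ex =  np.cos(th) * dx + np.sin(th) * dy
    ey = -np.sin(th) * dx + np.cos(th) * dy
    eth = ref_pose[2] - th
    v = v_r * np.cos(eth) + kx * ex
    w = w_r + ky * v_r * ey + kth * np.sin(eth)
    return v, w   # smooth velocity references for the embedded PID loops

print(tracking_control((0, 0, 0), (0.1, 0.05, 0.1), v_r=0.2, w_r=0.0))
```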

19 pages, 4367 KB  
Article
A Neuro-Symbolic Approach to Fall Detection via Monocular Depth Estimation
by Yinghai Xu, Bongjun Kim, In-Nea Wang and Junho Jeong
Appl. Sci. 2026, 16(4), 1895; https://doi.org/10.3390/app16041895 - 13 Feb 2026
Abstract
Falls remain a critical safety concern in surveillance settings, yet monocular RGB methods often degrade in multi-person scenes with occlusion and loss of three-dimensional cues. This study proposes a neuro-symbolic framework that restores physically interpretable depth proxies from monocular video and fuses them with skeleton-based spatio-temporal inference for robust fall detection. The pipeline estimates per-frame depth and 2D skeletons, recovers world coordinates for key joints, and derives absolute neck height and vertical descent rate for rule-based adjudication, while a neural method operates on joint trajectories; final decisions combine both streams with a logical policy and short-horizon temporal consistency. Experiments in a realistic indoor testbed with multi-person activity compare three configurations—neural, symbolic, and fused. The fused neuro-symbolic method achieved an accuracy of 0.88 and an F1 score of 0.76 on the real surveillance test set, outperforming the neural method alone (accuracy 0.81, F1 0.64) and the symbolic method alone (accuracy 0.77, F1 0.35). Gains arise from complementary error profiles: depth-derived, rule-based cues suppress spurious positives on non-fall frames, while the neural stream recovers true falls near rule boundaries. These findings indicate that integrating monocular depth proxies with interpretable rules improves reliability without additional sensors, supporting deployment in complex, multi-person surveillance environments.
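
To make the symbolic stream concrete, here is a toy adjudicator over the two depth-derived cues the abstract names, absolute neck height and vertical descent rate, with short-horizon temporal consistency. The thresholds (0.4 m, 0.8 m/s) and window length are hypothetical values, not the authors' calibrated rules.

```python
def symbolic_fall(neck_heights, dt, h_min=0.4, v_min=0.8, hold=3):
    """Flag a fall when neck height drops below h_min while the vertical
    descent rate exceeds v_min, sustained over `hold` consecutive frames."""
    hits = 0
    for prev, cur in zip(neck_heights, neck_heights[1:]):
        descent_rate = (prev - cur) / dt            # m/s downward
        hits = hits + 1 if (cur < h_min and descent_rate > v_min) else 0
        if hits >= hold:                            # short-horizon consistency
            return True
    return False

# Neck height (m) sampled at 10 Hz during a simulated fall.
print(symbolic_fall([1.5, 1.0, 0.38, 0.28, 0.18, 0.17], dt=0.1))  # True
```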

21 pages, 7192 KB  
Article
Expectation–Maximization Method for RGB-D Camera Calibration with Motion Capture System
by Jianchu Lin, Guangxiao Du, Yugui Zhang, Yiyan Zhao, Qian Xie, Jian Yao and Ashim Khadka
Photonics 2026, 13(2), 183; https://doi.org/10.3390/photonics13020183 - 12 Feb 2026
Abstract
Camera calibration is an essential research direction in photonics and computer vision: it standardizes camera data through intrinsic and extrinsic parameters. RGB-D cameras have recently become important devices because they supply depth information, and they commonly rely on one of three mechanisms: binocular stereo, structured light, and Time of Flight (ToF). The differing mechanisms make calibration methods complex and hard to unify, and lens distortion, parameter loss, and sensor degradation can even cause calibration to fail. To address these issues, we propose a camera calibration method based on the Expectation–Maximization (EM) algorithm. A unified latent-variable model is established for the different kinds of cameras: the E-step estimates the hidden intrinsic parameters of the cameras, while the M-step learns the distortion parameters of the lens. In addition, depth values are computed by a spatial geometric method and calibrated with least squares under an optical motion capture system. Experimental results demonstrate that our method can be directly applied to the calibration of monocular and binocular RGB-D cameras, reducing image calibration errors by 0.6–1.2% relative to least squares, Levenberg–Marquardt, Direct Linear Transform, and Trust Region Reflective baselines, and reducing depth error by 16–19.3 mm. Our method therefore effectively improves the performance of different RGB-D cameras.
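
The E-step/M-step split can be illustrated on a toy problem: alternately re-estimating a focal length (intrinsics step) and refitting one radial distortion coefficient (distortion step) from synthetic observations. This is only an EM-flavored alternating least-squares sketch with invented data; the paper's latent-variable model is far more general.

```python
import numpy as np

rng = np.random.default_rng(0)
xu = rng.uniform(-0.5, 0.5, 200)          # undistorted normalized x = X/Z
r2 = xu ** 2                              # 1-D stand-in for x^2 + y^2
fx_true, k1_true, cx = 600.0, -0.2, 320.0
# Distorted pixel observations: u = cx + fx * xu * (1 + k1 * r^2) + noise
u = cx + fx_true * xu * (1 + k1_true * r2) + rng.normal(0, 0.2, xu.size)

fx, k1 = 500.0, 0.0                       # rough initial guesses
for _ in range(50):
    xd = xu * (1 + k1 * r2)               # current distorted coordinates
    fx = np.sum((u - cx) * xd) / np.sum(xd ** 2)        # intrinsics step
    basis = xu * r2
    k1 = np.sum(((u - cx) / fx - xu) * basis) / np.sum(basis ** 2)  # distortion step

print(fx, k1)    # should approach fx_true = 600 and k1_true = -0.2
```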

31 pages, 3468 KB  
Article
From RGB-D to RGB-Only: Reliability and Clinical Relevance of Markerless Skeletal Tracking for Postural Assessment in Parkinson’s Disease
by Claudia Ferraris, Gianluca Amprimo, Gabriella Olmo, Marco Ghislieri, Martina Patera, Antonio Suppa, Silvia Gallo, Gabriele Imbalzano, Leonardo Lopiano and Carlo Alberto Artusi
Sensors 2026, 26(4), 1146; https://doi.org/10.3390/s26041146 - 10 Feb 2026
Abstract
Axial postural abnormalities in Parkinson’s Disease (PD) are traditionally assessed using clinical rating scales, although picture-based assessment is considered the gold standard. This study evaluates the reliability and clinical relevance of two markerless body-tracking frameworks, the RGB-D-based Microsoft Azure Kinect (providing the reference KIN_3D model) and the RGB-only Google MediaPipe Pose (MP), using a synchronous dual-camera setup. Forty PD patients performed a 60 s static standing task. We compared KIN_3D with three MP models (at different complexity levels) across horizontal, vertical, sagittal, and 3D joint angles. Results show that lower-complexity MP models achieved high congruence with KIN_3D for trunk and shoulder alignment (ρ > 0.75), while the lateral view significantly improved tracking of sagittal angles (ρ ≥ 0.72). Conversely, the high-complexity model introduced significant skeletal distortions. Clinically, several angular parameters emerged as robust metrics for postural assessment and global motor impairments, while sagittal angles correlated with motor complications. Unexpectedly, a more upright frontal alignment was associated with greater freezing of gait severity, suggesting that static postural metrics may serve as proxies for dynamic gait performance. In addition, both RGB-only and RGB-D frameworks effectively discriminated between postural severity clusters. While the higher-complexity MP model should be avoided due to inaccurate 3D reconstructions, our findings demonstrate that low- and medium-complexity MP models represent a reliable alternative to RGB-D sensors for objective postural assessment in PD, facilitating the widespread application of objective posture measurements in clinical contexts.
(This article belongs to the Special Issue Sensors for Human Motion Analysis and Applications)

17 pages, 7804 KB  
Article
A 3D Camera-Based Approach for Real-Time Hand Configuration Recognition in Italian Sign Language
by Luca Ulrich, Asia De Luca, Riccardo Miraglia, Emma Mulassano, Simone Quattrocchio, Giorgia Marullo, Chiara Innocente, Federico Salerno and Enrico Vezzetti
Sensors 2026, 26(3), 1059; https://doi.org/10.3390/s26031059 - 6 Feb 2026
Abstract
Deafness poses significant challenges to effective communication, particularly in contexts where access to sign language interpreters is limited. Hand configuration recognition represents a fundamental component of sign language understanding, as configurations constitute a core cheremic element in many sign languages, including Italian Sign Language (LIS). In this work, we address configuration-level recognition as an independent classification task and propose a machine vision framework based on RGB-D sensing. The proposed approach combines MediaPipe-based hand landmark extraction with normalized three-dimensional geometric features and a Support Vector Machine classifier. The first contribution of this study is the formulation of LIS hand configuration recognition as a standalone, configuration-level problem, decoupled from temporal gesture modeling. The second contribution is the integration of sensor-acquired RGB-D depth measurements into the landmark-based feature representation, enabling a direct comparison with estimated depth obtained from monocular data. The third contribution consists of a systematic experimental evaluation on two LIS configuration sets (6 and 16 classes), demonstrating that the use of real depth significantly improves classification performance and class separability, particularly for geometrically similar configurations. The results highlight the critical role of depth quality in configuration-level recognition and provide insights into the design of robust vision-based systems for LIS analysis.
(This article belongs to the Special Issue Sensing and Machine Learning Control: Progress and Applications)
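
A sketch of the landmark-to-classifier path: 21 hand landmarks are wrist-centered and scale-normalized before a Support Vector Machine. Landmark acquisition (MediaPipe plus RGB-D depth) is assumed to have already happened, and the normalization details here are our guesses, not the paper's exact features.

```python
import numpy as np
from sklearn.svm import SVC

def hand_features(landmarks):
    """landmarks: (21, 3) array of (x, y, z) joints.
    Returns a translation- and scale-invariant feature vector."""
    pts = landmarks - landmarks[0]              # center on the wrist (index 0)
    scale = np.linalg.norm(pts[9]) or 1.0       # wrist -> middle-finger MCP
    return (pts / scale).ravel()

# Train on placeholder data standing in for labeled LIS configurations.
rng = np.random.default_rng(0)
X = np.stack([hand_features(rng.normal(size=(21, 3))) for _ in range(60)])
y = rng.integers(0, 6, 60)                      # 6 configuration classes
clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict(X[:3]))
```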

23 pages, 6932 KB  
Article
RocSync: Millisecond-Accurate Temporal Synchronization for Heterogeneous Camera Systems
by Jaro Meyer, Frédéric Giraud, Joschua Wüthrich, Marc Pollefeys, Philipp Fürnstahl and Lilian Calvet
Sensors 2026, 26(3), 1036; https://doi.org/10.3390/s26031036 - 5 Feb 2026
Abstract
Accurate spatiotemporal alignment of multi-view video streams is essential for a wide range of dynamic-scene applications such as multi-view 3D reconstruction, pose estimation, and scene understanding. However, synchronizing multiple cameras remains a significant challenge, especially in heterogeneous setups combining professional- and consumer-grade devices, visible and infrared sensors, or systems with and without audio, where common hardware synchronization capabilities are often unavailable. This limitation is particularly evident in real-world environments, where controlled capture conditions are not feasible. In this work, we present a low-cost, general-purpose synchronization method that achieves millisecond-level temporal alignment across diverse camera systems while supporting both visible (RGB) and infrared (IR) modalities. The proposed solution employs a custom-built LED Clock that encodes time through red and infrared LEDs, allowing visual decoding of the exposure window (start and end times) from recorded frames for millisecond-level synchronization. We benchmark our method against hardware synchronization and achieve a residual error of 1.34 ms RMSE across multiple recordings. In further experiments, our method outperforms light-, audio-, and timecode-based synchronization approaches and directly improves downstream computer vision tasks, including multi-view pose estimation and 3D reconstruction. Finally, we validate the system in large-scale surgical recordings involving over 25 heterogeneous cameras spanning both IR and RGB modalities. This solution simplifies and streamlines the synchronization pipeline and expands access to advanced vision-based sensing in unconstrained environments, including industrial and clinical applications.
(This article belongs to the Section Sensing and Imaging)
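
Once the exposure window has been decoded from the LED clock in a frame, the alignment arithmetic is simple: the exposure midpoint gives the frame's timestamp in the clock domain, and the camera-to-clock offset follows. The decoding itself, which is the substance of the paper, is not shown, and all numbers below are invented.

```python
def clock_offset(frame_ts_ms, exp_start_ms, exp_end_ms):
    """frame_ts_ms: camera-reported timestamp; exposure window decoded from
    the LED clock in the same frame (both in ms, clock domain)."""
    midpoint = 0.5 * (exp_start_ms + exp_end_ms)
    return midpoint - frame_ts_ms     # add to camera timestamps to align

# Average the offset over several frames for robustness to decode noise.
offsets = [clock_offset(1000.0, 1531.2, 1539.4),
           clock_offset(1033.3, 1564.6, 1572.8)]
print(sum(offsets) / len(offsets))
```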

27 pages, 49730 KB  
Article
AMSRDet: An Adaptive Multi-Scale UAV Infrared-Visible Remote Sensing Vehicle Detection Network
by Zekai Yan and Yuheng Li
Sensors 2026, 26(3), 817; https://doi.org/10.3390/s26030817 - 26 Jan 2026
Abstract
Unmanned Aerial Vehicle (UAV) platforms enable flexible and cost-effective vehicle detection for intelligent transportation systems, yet small-scale vehicles in complex aerial scenes pose substantial challenges from extreme scale variations, environmental interference, and single-sensor limitations. We present AMSRDet (Adaptive Multi-Scale Remote Sensing Detector), an adaptive multi-scale detection network fusing infrared (IR) and visible (RGB) modalities for robust UAV-based vehicle detection. Our framework comprises four novel components: (1) a MobileMamba-based dual-stream encoder extracting complementary features via Selective State-Space 2D (SS2D) blocks with linear complexity O(HWC), achieving 2.1× efficiency improvement over standard Transformers; (2) a Cross-Modal Global Fusion (CMGF) module capturing global dependencies through spatial-channel attention while suppressing modality-specific noise via adaptive gating; (3) a Scale-Coordinate Attention Fusion (SCAF) module integrating multi-scale features via coordinate attention and learned scale-aware weighting, improving small object detection by 2.5 percentage points; and (4) a Separable Dynamic Decoder generating scale-adaptive predictions through content-aware dynamic convolution, reducing computational cost by 48.9% compared to standard DETR decoders. On the DroneVehicle dataset, AMSRDet achieves 45.8% mAP@0.5:0.95 (81.2% mAP@0.5) at 68.3 Frames Per Second (FPS) with 28.6 million (M) parameters and 47.2 Giga Floating Point Operations (GFLOPs), outperforming twenty state-of-the-art detectors including YOLOv12 (+0.7% mAP), DEIM (+0.8% mAP), and Mamba-YOLO (+1.5% mAP). Cross-dataset evaluation on Camera-vehicle yields 52.3% mAP without fine-tuning, demonstrating strong generalization across viewpoints and scenarios.
(This article belongs to the Special Issue AI and Smart Sensors for Intelligent Transportation Systems)
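
The adaptive gating CMGF uses to suppress modality-specific noise can be illustrated with a minimal per-pixel gate between IR and RGB feature maps; the module's global spatial-channel attention is omitted, so this is the general mechanism rather than the paper's module.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Minimal adaptive-gating fusion for paired IR/RGB feature maps."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, f_ir, f_rgb):
        g = self.gate(torch.cat([f_ir, f_rgb], dim=1))   # per-pixel weights
        return g * f_ir + (1 - g) * f_rgb                # noise-aware blend

f_ir, f_rgb = torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40)
print(GatedFusion(64)(f_ir, f_rgb).shape)
```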

22 pages, 8373 KB  
Article
Real-Time Automated Ergonomic Monitoring: A Bio-Inspired System Using 3D Computer Vision
by Gabriel Andrés Zamorano Núñez, Nicolás Norambuena, Isabel Cuevas Quezada, José Luis Valín Rivera, Javier Narea Olmos and Cristóbal Galleguillos Ketterer
Biomimetics 2026, 11(2), 88; https://doi.org/10.3390/biomimetics11020088 - 26 Jan 2026
Abstract
Work-related musculoskeletal disorders (MSDs) remain a global occupational health priority, with recognized limitations in current point-in-time assessment methodologies. This research extends prior computer vision ergonomic assessment approaches by implementing biological proprioceptive feedback principles into a continuous, real-time monitoring system. Unlike traditional periodic ergonomic evaluation methods such as “Rapid Upper Limb Assessment” (RULA), our bio-inspired system translates natural proprioceptive mechanisms—which enable continuous postural monitoring through spinal feedback loops operating at 50–150 ms latencies—into automated assessment technology. The system integrates (1) markerless 3D pose estimation via MediaPipe Holistic (33 anatomical landmarks at 30 FPS), (2) depth validation via an Orbbec Femto Mega RGB-D camera (640 × 576 resolution, Time-of-Flight sensor), and (3) a proprioceptive-inspired alert architecture. Experimental validation with 40 adult participants (age 18–25, n = 26 female, n = 14 male) performing standardized load-lifting tasks (6 kg) demonstrated that 62.5% of participants exhibited critical postural risk (RULA ≥ 5) during dynamic movement versus 7.5% at static rest (McNemar test p < 0.001; Cohen’s h = 1.22; 95% CI: 0.91–0.97). The system achieved a 95% Pearson correlation between risk elevation and alert activation, with a response latency of 42.1 ± 8.3 ms. This work demonstrates technical feasibility for continuous occupational monitoring. However, long-term prospective studies are required to establish whether continuous real-time feedback reduces workplace injury incidence. The biomimetic design framework provides a systematic foundation for translating biological feedback principles into occupational health technology.
(This article belongs to the Section Bioinspired Sensorics, Information Processing and Control)
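
A toy version of the continuous-monitoring loop: trunk flexion is computed from two 3D landmarks each frame, and an alert fires when an angle threshold is exceeded. The 60-degree limit, landmark choice, and y-up camera convention are illustrative assumptions, not the system's RULA scoring.

```python
import numpy as np

def trunk_flexion_deg(hip, shoulder):
    """Angle between the hip->shoulder segment and the vertical axis."""
    seg = np.asarray(shoulder, float) - np.asarray(hip, float)
    cos = seg[1] / np.linalg.norm(seg)        # y-up convention assumed
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def monitor(frames, limit_deg=60.0):
    """Yield the index of every frame whose trunk flexion exceeds the limit."""
    for t, (hip, shoulder) in enumerate(frames):
        if trunk_flexion_deg(hip, shoulder) > limit_deg:
            yield t

frames = [((0, 0, 2), (0, 0.5, 2)),           # upright: ~0 degrees
          ((0, 0, 2), (0.45, 0.2, 2))]        # bent forward: ~66 degrees
print(list(monitor(frames)))                   # [1]
```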

33 pages, 2852 KB  
Article
Robust Activity Recognition via Redundancy-Aware CNNs and Novel Pooling for Noisy Mobile Sensor Data
by Bnar Azad Hamad Ameen and Sadegh Abdollah Aminifar
Sensors 2026, 26(2), 710; https://doi.org/10.3390/s26020710 - 21 Jan 2026
Abstract
This paper proposes a robust convolutional neural network (CNN) architecture for human activity recognition (HAR) using smartphone accelerometer data, evaluated on the WISDM dataset. We introduce two novel pooling mechanisms—Pooling A (Extrema Contrast Pooling (ECP)) and Pooling B (Center Minus Variation (CMV))—that enhance feature discrimination and noise robustness. ECP emphasizes sharp signal transitions through a nonlinear penalty based on the squared range between extrema, while CMV Pooling penalizes local variability by subtracting the standard deviation, improving resilience to noise. Input data are normalized to the [0, 1] range to ensure bounded and interpretable pooled outputs. The proposed framework is evaluated in two separate configurations: (1) a 1D CNN applied to raw tri-axial sensor streams with the proposed pooling layers, and (2) a histogram-based image encoding pipeline that transforms segment-level sensor redundancy into RGB representations for a 2D CNN with fully connected layers. Ablation studies show that histogram encoding provides the largest improvement, while the combination of ECP and CMV further enhances classification performance. Across six activity classes, the 2D CNN system achieves up to 96.84% weighted classification accuracy, outperforming baseline models and traditional average pooling. Under Gaussian, salt-and-pepper, and mixed noise conditions, the proposed pooling layers consistently reduce performance degradation, demonstrating improved stability in real-world sensing environments. These results highlight the benefits of redundancy-aware pooling and histogram-based representations for accurate and robust mobile HAR systems.
(This article belongs to the Section Intelligent Sensors)
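
The abstract specifies ECP and CMV only in words, so the following is one plausible reading: ECP(window) = max − (max − min)², a nonlinear penalty on the squared extrema range, and CMV(window) = center − std. Inputs are assumed normalized to [0, 1], as in the paper; the authors' exact formulas may differ.

```python
import torch

def ecp_pool(x, k=2):
    """Extrema Contrast Pooling over non-overlapping windows of length k.
    x: (batch, channels, length) tensor with values in [0, 1]."""
    w = x.unfold(2, k, k)                       # (B, C, n_windows, k)
    mx, mn = w.max(dim=-1).values, w.min(dim=-1).values
    return mx - (mx - mn) ** 2                  # penalize the extrema range

def cmv_pool(x, k=3):
    """Center-Minus-Variation pooling: center sample minus window std."""
    w = x.unfold(2, k, k)
    return w[..., k // 2] - w.std(dim=-1, unbiased=False)

x = torch.rand(1, 3, 12)                        # tri-axial, normalized signal
print(ecp_pool(x).shape, cmv_pool(x).shape)     # (1, 3, 6) and (1, 3, 4)
```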

14 pages, 3527 KB  
Article
Robust Intraoral Image Stitching via Deep Feature Matching: Framework Development and Acquisition Parameter Optimization
by Jae-Seung Jeong, Dong-Jun Seong and Seong Wook Choi
Appl. Sci. 2026, 16(2), 1064; https://doi.org/10.3390/app16021064 - 20 Jan 2026
Abstract
Low-cost RGB intraoral cameras are accessible alternatives to intraoral scanners; however, generating panoramic images is challenging due to narrow fields of view, textureless surfaces, and specular highlights. This study proposes a robust stitching framework and identifies optimal acquisition parameters to overcome these limitations. All experiments were conducted exclusively on a mandibular dental phantom model. Geometric consistency was further validated using repeated physical measurements of mandibular arch dimensions as ground-truth references. We employed a deep learning-based approach using SuperPoint and SuperGlue to extract and match features in texture-poor environments, enhanced by a central-reference stitching strategy to minimize cumulative drift errors. To validate the feasibility in a controlled setting, we conducted experiments on dental phantoms varying working distances (1.5–3.0 cm) and overlap ratios. The proposed method detected approximately 19–20 times more valid inliers than SIFT, significantly improving matching stability. Experimental results indicated that a working distance of 2.5 cm offers the optimal balance between stitching success rate and image detail for handheld operation, while a 1/3 overlap ratio yielded superior geometric integrity. This system demonstrates that robust 2D dental mapping is achievable with consumer-grade sensors when combined with advanced deep feature matching and optimized acquisition protocols.
(This article belongs to the Special Issue AI for Medical Systems: Algorithms, Applications, and Challenges)
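
The central-reference strategy can be shown in miniature: given pairwise homographies between neighboring frames, every frame is warped directly into the middle frame's coordinates, so alignment errors do not accumulate across the whole strip. The pairwise estimates (from SuperPoint/SuperGlue in the paper) are assumed given; the toy example uses pure translations.

```python
import numpy as np

def to_central(H_pairwise):
    """H_pairwise[i]: 3x3 homography mapping frame i -> frame i+1.
    Returns transforms mapping every frame into the central frame."""
    n = len(H_pairwise) + 1
    c = n // 2
    T = [np.eye(3) for _ in range(n)]
    for i in range(c - 1, -1, -1):           # left of center: chain forward
        T[i] = T[i + 1] @ H_pairwise[i]
    for i in range(c + 1, n):                # right of center: chain inverses
        T[i] = T[i - 1] @ np.linalg.inv(H_pairwise[i - 1])
    return T

# Five frames, each shifted 10 px from its neighbor.
H = [np.array([[1, 0, 10], [0, 1, 0], [0, 0, 1]], float) for _ in range(4)]
print([t[0, 2] for t in to_central(H)])      # [20, 10, 0, -10, -20]
```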

20 pages, 4633 KB  
Article
Teleoperation System for Service Robots Using a Virtual Reality Headset and 3D Pose Estimation
by Tiago Ribeiro, Eduardo Fernandes, António Ribeiro, Carolina Lopes, Fernando Ribeiro and Gil Lopes
Sensors 2026, 26(2), 471; https://doi.org/10.3390/s26020471 - 10 Jan 2026
Abstract
This paper presents an immersive teleoperation framework for service robots that combines real-time 3D human pose estimation with a Virtual Reality (VR) interface to support intuitive, natural robot control. The operator is tracked using MediaPipe for 2D landmark detection and an Intel RealSense D455 RGB-D (Red-Green-Blue plus Depth) camera for depth acquisition, enabling 3D reconstruction of key joints. Joint angles are computed using efficient vector operations and mapped to the kinematic constraints of an anthropomorphic arm on the CHARMIE service robot. A VR-based telepresence interface provides stereoscopic video and head-motion-based view control to improve situational awareness during manipulation tasks. Experiments in real-world object grasping demonstrate reliable arm teleoperation and effective telepresence; however, vision-only estimation remains limited for axial rotations (e.g., elbow and wrist yaw), particularly under occlusions and unfavorable viewpoints. The proposed system provides a practical pathway toward low-cost, sensor-driven, immersive human–robot interaction for service robotics in dynamic environments.
(This article belongs to the Section Intelligent Sensors)
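
The "efficient vector operations" for joint angles reduce, in the simplest case, to the angle at a joint between two 3D segment vectors; a minimal version follows. Deprojection of MediaPipe landmarks with RealSense depth is assumed to have already produced the 3D points, and the sample coordinates are invented.

```python
import numpy as np

def joint_angle_deg(a, b, c):
    """Angle at joint b formed by 3D points a-b-c
    (e.g., shoulder-elbow-wrist for the elbow angle)."""
    u = np.asarray(a, float) - np.asarray(b, float)
    v = np.asarray(c, float) - np.asarray(b, float)
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

shoulder, elbow, wrist = (0.0, 1.4, 2.0), (0.25, 1.15, 2.0), (0.45, 1.35, 2.0)
print(joint_angle_deg(shoulder, elbow, wrist))   # ~90 degrees
```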

17 pages, 160077 KB  
Article
RA6D: Reliability-Aware 6D Pose Estimation via Attention-Guided Point Cloud in Aerosol Environments
by Woojin Son, Seunghyeon Lee, Taejoo Kim, Geonhwa Son and Yukyung Choi
Robotics 2026, 15(1), 8; https://doi.org/10.3390/robotics15010008 - 29 Dec 2025
Abstract
We address the problem of 6D object pose estimation in aerosol environments, where RGB and depth sensors experience correlated degradation due to scattering and absorption. Handling such spatially varying degradation typically requires depth restoration, but obtaining ground-truth complete depth in aerosol conditions is prohibitively expensive. To overcome this limitation without relying on costly depth completion, we propose RA6D, a framework that integrates attention-guided reliability modeling with feature distillation. The attention map generated during RGB dehazing reflects aerosol distribution and provides a compact indicator of depth reliability. By embedding this attention as an additional feature in an Attention-Guided Point cloud (AGP), the network can adaptively respond to spatially varying degradation. In addition, to address the scarcity of aerosol-domain data, we employ clean-to-aerosol feature distillation, transferring robust representations learned under clean conditions. Experiments on aerosol benchmarks show that RA6D achieves higher accuracy and significantly faster inference than restoration-based pipelines, offering a practical solution for real-time robotic perception under severe visual degradation.
(This article belongs to the Special Issue Extended Reality and AI Empowered Robots)
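
The Attention-Guided Point cloud idea, at its simplest, attaches each pixel's dehazing-attention value to the back-projected point as an extra feature channel, so downstream layers can down-weight unreliable depth. The intrinsics and the attention map below are placeholders, not RA6D's actual pipeline.

```python
import numpy as np

def attention_guided_points(depth, attention, fx, fy, cx, cy):
    """Back-project a depth map with the pinhole model and append the
    per-pixel attention value as a fourth feature channel."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.ravel()
    x = (u.ravel() - cx) * z / fx
    y = (v.ravel() - cy) * z / fy
    return np.stack([x, y, z, attention.ravel()], axis=1)   # (N, 4)

depth = np.full((4, 4), 2.0)                                 # meters
att = np.random.default_rng(0).uniform(0, 1, (4, 4))         # dehazing attention
print(attention_guided_points(depth, att, 500, 500, 2, 2).shape)  # (16, 4)
```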
