Search Results (241)

Search Parameters:
Keywords = RGB-D sensor data

15 pages, 389 KB  
Article
NURSE-AI: A Nurse-by-Design Framework for Multi-Sensor, AI-Enabled Chronic Wound Assessment in Community Healthcare
by Chiara Barchielli, Sara Jayousi, Riccardo Mari, Beatrice Albanesi, Marco Alaimo, Gianluca Galeotti, Paolo Zoppi and Lorenzo Mucchi
Sensors 2026, 26(10), 2948; https://doi.org/10.3390/s26102948 - 8 May 2026
Viewed by 321
Abstract
Accurate and reproducible chronic wound assessment remains challenging in community healthcare, where environmental variability and subjective visual evaluation may introduce substantial measurement errors. Although multi-sensor technologies, including RGB–D imaging, mobile Light Detection and Ranging (LiDAR), thermal infrared imaging, and hyperspectral sensing, as well as artificial intelligence (AI)-based analytics, have advanced considerably, real-world adoption remains limited because of workflow misalignment, insufficient interpretability, and regulatory complexity. This study presents NURSE-AI, a Nurse-by-Design methodological framework for evaluating and preparing multi-sensor, AI-enabled wound assessment systems for deployment in community healthcare. NURSE-AI is proposed as a pre-implementation methodological framework supported by a feasibility study based on a synthetic dataset; therefore, it is not a clinical validation study, and no patient data were used. The framework integrates: (i) a GDPR-compliant synthetic multimodal dataset including RGB, depth, thermal, and hyperspectral-proxy layers; (ii) workflow-embedded acquisition modeling tailored to Family and Community Nurses (FCNs); (iii) a Wound Bed Preparation (WBP)-aligned interpretability layer; and (iv) a governance-by-design checklist addressing interoperability, metadata traceability, and regulatory readiness under Regulation (EU) 2017/745. A mixed-method feasibility evaluation was conducted with community nurses within AUSL Toscana Centro (Italy). The System Usability Scale (SUS) yielded a mean score of 74.5 ± 6.2, indicating good usability. Synthetic multimodal evaluation demonstrated promising segmentation performance under controlled synthetic conditions, with Intersection over Union (IoU) values ranging from 0.87 to 0.93, and simulated Intraclass Correlation Coefficient (ICC) values ≥ 0.90 for wound area estimation. 
Agreement between AI-generated WBP mappings and nurse interpretation ranged from κ = 0.80 to κ = 0.84. The NURSE-AI framework proposes a structured and reproducible pathway connecting sensor innovation, AI interpretability, nursing workflow integration, and regulatory preparedness, thereby providing structured groundwork for future clinical validation and scalable deployment in community healthcare.
(This article belongs to the Special Issue AI and Big Data Analytics for Medical E-Diagnosis)
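The IoU and κ values reported above are standard agreement metrics. The sketch below shows how they are commonly computed; it is not the authors' evaluation code, and the toy masks and the tissue labels (granulation, slough, necrotic) are hypothetical.

```python
from collections import Counter


def iou(mask_a, mask_b):
    """Intersection over Union of two equally sized binary masks (lists of 0/1 rows)."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += int(bool(a) and bool(b))
            union += int(bool(a) or bool(b))
    return inter / union if union else 1.0  # two empty masks agree perfectly


def cohens_kappa(labels_1, labels_2):
    """Cohen's kappa between two raters labelling the same items."""
    n = len(labels_1)
    observed = sum(x == y for x, y in zip(labels_1, labels_2)) / n
    c1, c2 = Counter(labels_1), Counter(labels_2)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)  # chance agreement
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical example: AI-generated vs nurse tissue labels for four wound regions.
nurse = ["granulation", "slough", "granulation", "necrotic"]
ai = ["granulation", "slough", "slough", "necrotic"]
print(iou([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]]), cohens_kappa(nurse, ai))
```

Kappa discounts the agreement expected by chance, which is why it is preferred over raw percent agreement for comparing AI mappings with clinician judgement.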

25 pages, 5188 KB  
Article
MonoCrown for Crown-Level Tree Species Semantic Segmentation in Heterogeneous Forests Using UAV RGB Imagery
by Linzhi Wen and Guangsheng Chen
Remote Sens. 2026, 18(9), 1338; https://doi.org/10.3390/rs18091338 - 27 Apr 2026
Viewed by 363
Abstract
Crown-level tree species semantic segmentation enables fine-grained forest inventory and management. Current high-precision tree species classification typically relies on multi-source remote sensing data, the acquisition and processing of which remain costly for large-area applications, making low-cost unmanned aerial vehicle (UAV) RGB imagery an attractive option for large-scale forest mapping. However, in heterogeneous forests, complex canopy structures and the limited spectral discriminability of low-cost UAV RGB imagery make 2D appearance cues alone insufficient for reliable species discrimination, crown delineation, and accurate separation of adjacent crowns. This often leads to inter-class confusion, blurred crown boundaries, and poor recognition of small crowns. To address these limitations, this paper proposes MonoCrown (MCrown), which strengthens geometric and contextual representation for distinguishing visually similar species and delineating crowns from single-temporal UAV RGB imagery. To compensate for the insufficiency of appearance cues, MCrown introduces monocular depth inferred offline from the same RGB image as a frozen geometric prior, and integrates cross-window global–local attention (CW-GLA), bidirectional cross-modal attention (BiCoAttn), and depth-adaptive injection (DAI) to capture long-range dependencies and promote complementary use of appearance and geometric features, especially for small crowns with similar visual patterns in complex scenes. To validate the method’s effectiveness, a crown-level UAV RGB dataset covering approximately 40 km² was constructed. Systematic comparative experiments were conducted on the proposed dataset and on public benchmarks, supporting the effectiveness of the proposed approach across ten dominant classes, especially for small crowns and visually similar categories. Its mean Intersection over Union (mIoU) and overall accuracy (OA) reached 74.1% and 87.3%, respectively.
The method achieves high-precision crown-level tree species semantic segmentation using single-temporal UAV RGB as the sole acquired modality, while monocular depth inferred from the same RGB image serves only as a frozen geometric prior, without requiring multispectral, multi-temporal, or active-sensor acquisitions. This offers a practical solution for crown-level tree species mapping in heterogeneous forests.
(This article belongs to the Section Remote Sensing Image Processing)
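Both mIoU and OA can be derived from a class confusion matrix. A minimal sketch, assuming pixel-level counts and skipping classes that are absent from both prediction and ground truth (a convention that varies between benchmarks); the 2-class matrix is a toy example, not data from the paper:

```python
def oa_and_miou(conf):
    """conf[i][j] = number of pixels of true class i predicted as class j."""
    k = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(k))
    ious = []
    for c in range(k):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(k)) - tp  # predicted c, truly another class
        fn = sum(conf[c]) - tp                       # truly c, predicted another class
        denom = tp + fp + fn
        if denom:  # skip classes that never occur in prediction or ground truth
            ious.append(tp / denom)
    return correct / total, sum(ious) / len(ious)


oa, miou = oa_and_miou([[50, 10], [5, 35]])
print(round(oa, 4), round(miou, 4))
```

Because mIoU averages per-class scores, small or rare classes (such as small crowns) weigh as much as dominant ones, whereas OA is dominated by the most frequent classes.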

28 pages, 33079 KB  
Article
Pedestrian Localization Using Smartphone LiDAR in Indoor Environments
by Kwangjae Sung and Jaehun Kim
Electronics 2026, 15(9), 1810; https://doi.org/10.3390/electronics15091810 - 24 Apr 2026
Viewed by 306
Abstract
Many place recognition approaches, which identify previously visited places or locations by matching current sensory data, such as 2D RGB images and 3D point clouds, have been proposed to achieve accurate and robust localization and loop closure detection in global positioning system (GPS)-denied environments. Because visual place recognition (VPR) methods that rely on images captured by camera sensors are highly sensitive to variations in appearance, including changes in lighting, surface color, and shadows, they can yield poor place recognition accuracy. In contrast, light detection and ranging (LiDAR)-based place recognition (LPR) approaches based on 3D point cloud data that captures the shape and geometric structure of the environment are robust to changes in place appearance and can therefore provide more reliable place recognition results than VPR methods. This work presents an indoor LPR method called PointNetVLAD-based indoor pedestrian localization (PIPL). PIPL is a deep network model that uses PointNetVLAD to learn to extract global descriptors from 3D LiDAR point cloud data. PIPL can recognize places previously visited by a pedestrian using point clouds captured by a low-cost LiDAR sensor on a smartphone in small-scale indoor environments, whereas PointNetVLAD performs place recognition for vehicles using high-cost LiDAR, GPS, and inertial measurement unit (IMU) sensors in large-scale outdoor areas. For place recognition on 3D point cloud reference maps generated from LiDAR scans, PointNetVLAD exploits the Universal Transverse Mercator (UTM) coordinate system based on GPS and IMU measurements, whereas PIPL uses a virtual coordinate system designed in this study because GPS is unavailable indoors. In experiments conducted in campus buildings, PIPL shows significant advantages over NetVLAD, a convolutional neural network (CNN)-based VPR method.
Particularly in indoor environments with repetitive scenes where geometric structures are preserved and image-based appearance features are sparse or unclear, PIPL achieved 39% higher top-1 accuracy and 10% higher top-3 accuracy compared to NetVLAD. Furthermore, PIPL achieved place recognition accuracy comparable to NetVLAD even with a small number of points in a 3D point cloud and outperformed NetVLAD even with a smaller model training dataset. The experimental results also indicate that PIPL requires over 76% less place retrieval time than NetVLAD while maintaining robust place classification performance.
(This article belongs to the Special Issue Advanced Indoor Localization Technologies: From Theory to Application)
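Top-k place recognition accuracy, as reported for PIPL and NetVLAD, is typically measured by ranking database descriptors by their distance to each query. The sketch below assumes Euclidean distance and counts a query as correct if any of its k nearest database entries shares its place label; the 2D descriptors are toy stand-ins for learned PointNetVLAD/NetVLAD global descriptors.

```python
import math


def top_k_accuracy(queries, database, k):
    """queries/database: lists of (descriptor, place_id) pairs.

    A query is a hit if any of its k nearest database descriptors
    (Euclidean distance) belongs to the same place.
    """
    hits = 0
    for q_desc, q_place in queries:
        ranked = sorted(database, key=lambda item: math.dist(q_desc, item[0]))
        if any(place == q_place for _, place in ranked[:k]):
            hits += 1
    return hits / len(queries)


# Toy reference map with three places and three queries.
database = [((0.0, 0.0), "A"), ((1.0, 0.0), "B"), ((5.0, 5.0), "C")]
queries = [((0.1, 0.0), "A"), ((0.6, 0.0), "A"), ((5.0, 4.0), "C")]
print(top_k_accuracy(queries, database, 1), top_k_accuracy(queries, database, 2))
```

The second query is closer to place B than to A, so it fails at k = 1 but succeeds at k = 2; this is the kind of near-miss in repetitive indoor scenes that separates top-1 from top-3 accuracy.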

42 pages, 7524 KB  
Article
3D Face Reconstruction with Deep Learning: Architectures, Datasets, and Benchmark Analysis
by Sankarshan Dasgupta, Ju Shen and Tam V. Nguyen
Sensors 2026, 26(8), 2540; https://doi.org/10.3390/s26082540 - 20 Apr 2026
Viewed by 1248
Abstract
Three-Dimensional (3D) face reconstruction from monocular Red-Green-Blue (RGB) imagery remains a fundamental yet ill-posed challenge in computer vision, with applications in biometrics, augmented reality/virtual reality (AR/VR), and intelligent visual sensing systems. While deep learning has significantly improved reconstruction fidelity and realism, existing surveys primarily focus on network architectures in isolation, often overlooking how sensing conditions, data acquisition protocols, and geometric calibration influence reconstruction reliability and evaluation outcomes. This paper presents a sensor-aware, end-to-end review of deep learning-based 3D face reconstruction and introduces a unified modular framework that connects sensing hardware, data acquisition, calibration, representation learning, and geometric refinement within a coherent pipeline. The reconstruction process is organized into four stages: sensor-driven acquisition and calibration, landmark estimation and feature extraction, 3D representation and parameter regression, and iterative refinement via differentiable rendering. Within this framework, we examine how sensor characteristics, calibration accuracy, representation models, and supervision strategies affect reconstruction accuracy, perceptual quality, robustness, and computational efficiency. We further synthesize the reported results across widely used benchmarks using both geometric and perceptual metrics, highlighting trade-offs between reconstruction fidelity and deployment constraints. By integrating sensing-aware analysis with architectural evaluation, this survey provides practical insights for developing scalable and reliable 3D face reconstruction systems under real-world conditions.
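The calibration stage of such a pipeline ultimately fixes how 3D points map to image pixels, which is also what landmark losses and differentiable renderers rely on. As a hedged illustration, the standard pinhole model with a single radial distortion coefficient is sketched below; the intrinsics (fx, fy, cx, cy, k1) are hypothetical values, not taken from the survey.

```python
def project_point(point_cam, fx, fy, cx, cy, k1=0.0):
    """Project a 3D point in camera coordinates to pixel coordinates.

    Pinhole model with one radial distortion term:
    x = X/Z, y = Y/Z, scale = 1 + k1 * (x^2 + y^2), u = fx*x*scale + cx, v = fy*y*scale + cy.
    """
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    x, y = X / Z, Y / Z                        # normalized image coordinates
    scale = 1.0 + k1 * (x * x + y * y)         # radial distortion factor
    return fx * x * scale + cx, fy * y * scale + cy


# Hypothetical 640x480 camera: a landmark 1 m away, 10 cm right and 5 cm up.
print(project_point((0.1, -0.05, 1.0), fx=500, fy=500, cx=320, cy=240))
```

An error in fx or k1 shifts every projected landmark, which is one concrete way calibration accuracy propagates into reconstruction error.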

25 pages, 3612 KB  
Article
Learning Modality Complementarity for RGB-D Salient Object Detection via Dynamic Neural Network
by Yuanhao Li, Jia Song, Chenglizhao Chen and Xinyu Liu
Electronics 2026, 15(7), 1361; https://doi.org/10.3390/electronics15071361 - 25 Mar 2026
Viewed by 416
Abstract
RGB-D salient object detection (RGB-D SOD) aims to accurately localize and segment visually salient objects by jointly leveraging RGB images and depth maps. Some existing methods rely on static fusion strategies with fixed paths and weights, which treat all regions equally and fail to capture the varying importance of different regions and modalities. Although some attention-based methods alleviate the limitations of static fusion by assigning adaptive weights to different regions and modalities, the quality of RGB and depth data may degrade in real-world scenarios due to sensor noise, illumination changes, or environmental interference. These attention-based methods often overlook inter-modality quality differences and complementarity, making them prone to over-relying on a certain modality, which can lead to noise introduction, feature conflicts, and performance degradation. To address these limitations, this paper proposes a novel dynamic feature routing and fusion framework for RGB-D SOD, which adaptively adjusts the fusion strategy according to the quality of input modalities. To enable modality quality awareness, the proposed method characterizes the modality complementarity between RGB and depth features in a task-driven manner inspired by information-theoretic principles. We introduce a task-relevance scoring function which is integrated with a mutual information estimator to quantify such complementarity, and emphasizes task-relevant features while suppressing redundancy. A dynamic routing module is then designed to perform feature selection guided by the captured complementarity. In addition, we propose a novel cross-modal fusion module to adaptively fuse the features selected by the dynamic routing module, which effectively enhances complementary representations while suppressing redundant features and noise interference. 
Extensive experiments conducted on seven public RGB-D SOD benchmark datasets demonstrate that the proposed method consistently achieves competitive performance, outperforming existing methods by an average of approximately 1% across multiple evaluation metrics. Notably, in challenging scenarios with severe modality quality degradation, the proposed method outperforms existing best-performing methods by up to 1.8%, demonstrating strong robustness against cluttered backgrounds, complex object structures, and diverse object scales. Overall, the proposed dynamic fusion framework provides a novel solution to modality quality imbalance in RGB-D salient object detection.
(This article belongs to the Section Artificial Intelligence)
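The abstract does not specify the mutual information estimator, which is learned. For intuition only, the sketch below uses a simple plug-in (histogram) estimate on paired discrete codes: high mutual information between RGB and depth codes indicates redundancy, whereas low values leave room for complementary information. The quantized codes are hypothetical, and the paper's task-relevance scoring is not reproduced.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples.

    Uses p(x,y) / (p(x) p(y)) = count(x,y) * n / (count(x) * count(y)).
    """
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical quantized RGB and depth feature codes for four regions.
redundant = mutual_information([0, 1, 0, 1], [0, 1, 0, 1])    # depth repeats RGB
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])  # no shared information
print(redundant, independent)
```

Plug-in estimates are biased for small samples and high-dimensional features, which is one reason learned neural estimators are used in practice.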

36 pages, 23123 KB  
Article
Evaluating Environmental and Crop Factors Affecting Drone-Mounted GPR Performance in Agricultural Fields
by Milad Vahidi and Sanaz Shafian
Sensors 2026, 26(6), 1873; https://doi.org/10.3390/s26061873 - 16 Mar 2026
Viewed by 607
Abstract
Drone-mounted ground-penetrating radar (GPR) systems offer new opportunities for integrating subsurface characterization into remote sensing workflows. However, the interaction between flight parameters, surface conditions, and vegetation characteristics remains poorly understood. This study investigates the impact of flight altitude, surface topography, crop presence, and canopy water content on the stability and interpretability of GPR signals collected using a drone. Field experiments were conducted under controlled conditions using agricultural plots with variable canopy cover and soil moisture regimes. Radargrams were processed to evaluate signal amplitude, reflection continuity, and attenuation patterns in relation to terrain slope and vegetation structure derived from co-registered RGB drone imagery. The results reveal that lower flight altitudes and smoother surfaces yield higher signal coherence and greater subsurface penetration, while increased canopy water content and biomass reduce signal strength and clarity. Integrating drone-based GPR observations with surface spectral and thermal data improved discrimination between soil and vegetation-induced signal distortions. The findings highlight the potential of drone–GPR systems as a complementary layer in a multi-sensor remote sensing framework for precision agriculture, environmental monitoring, and 3D soil mapping.
(This article belongs to the Section Sensors and Robotics)
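Reflection continuity is often assessed by how similar neighbouring traces in a radargram are. The metric below, a mean Pearson correlation between adjacent traces, is an assumption chosen for illustration and is not the study's stated procedure; the traces are toy amplitude series.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length amplitude series (0.0 if either is flat)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0  # a constant trace carries no waveform shape to compare
    return cov / (sa * sb)


def mean_adjacent_trace_coherence(radargram):
    """radargram: list of traces, each a list of amplitudes sampled in time."""
    rs = [pearson(t1, t2) for t1, t2 in zip(radargram, radargram[1:])]
    return sum(rs) / len(rs)


# Toy radargram: two matching traces followed by a polarity-flipped one.
print(mean_adjacent_trace_coherence([[0, 1, 0, -1], [0, 1, 0, -1], [0, -1, 0, 1]]))
```

Values near 1 indicate laterally continuous reflectors; canopy-induced clutter or altitude changes between traces would pull the average down.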

15 pages, 3088 KB  
Article
Lightweight Semantic Segmentation Algorithm Based on Gated Visual State Space Models
by Kui Di, Jinming Cheng, Lili Zhang and Yubin Bao
Electronics 2026, 15(6), 1175; https://doi.org/10.3390/electronics15061175 - 12 Mar 2026
Viewed by 580
Abstract
LiDAR serves as the primary sensor for acquiring environmental information in intelligent driving systems. However, under adverse weather conditions, point cloud signals obtained by LiDAR suffer from intensity attenuation and noise interference, leading to a decline in segmentation accuracy. To address these issues, this paper designs a lightweight semantic segmentation system based on the Gated Visual State Space Model (VMamba), named RainMamba. Specifically, the system utilizes spherical projection to transform point clouds into 2D sequences and constructs a physical perception feature embedding module guided by the Beer–Lambert law to explicitly model and suppress spatial noise at the source. Subsequently, an uncertainty-weighted cross-modal correction module is employed to incorporate RGB images for dynamically calibrating the degraded point cloud data. Finally, a VMamba backbone is adopted to establish global dependencies with linear complexity. Experimental results on the SemanticKITTI dataset demonstrate that the system achieves an inference speed of 83 FPS, with a relative mIoU improvement of approximately 7.2% compared to the real-time baseline PolarNet. Furthermore, zero-shot evaluations on the real-world SemanticSTF dataset validate the system’s robust Sim-to-Real generalization capability. Notably, RainMamba delivers highly competitive accuracy comparable to the state-of-the-art heavyweight model PTv3 while requiring a significantly lower parameter footprint, thereby demonstrating its immense potential for practical edge-computing deployment.
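Spherical projection turns an unordered point cloud into a range image that 2D sequence models can consume. The sketch below follows the widely used RangeNet++-style formula; the 64×2048 grid and the +3°/−25° vertical field of view are assumptions typical of the HDL-64 sensor used in SemanticKITTI, not settings confirmed by the paper.

```python
import math


def spherical_project(points, height=64, width=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map (x, y, z) LiDAR points to (row, col, range) cells of a height x width range image."""
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = abs(fov_up) + abs(fov_down)          # total vertical field of view
    cells = []
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0:
            continue                           # the sensor origin has no direction
        yaw = math.atan2(y, x)                 # azimuth in (-pi, pi]
        pitch = math.asin(z / r)               # elevation
        u = 0.5 * (1.0 - yaw / math.pi) * width
        v = (1.0 - (pitch + abs(fov_down)) / fov) * height
        col = min(width - 1, max(0, math.floor(u)))   # clamp to valid pixel indices
        row = min(height - 1, max(0, math.floor(v)))
        cells.append((row, col, r))
    return cells


# A point 10 m straight ahead lands in the middle column, a few rows below the top.
print(spherical_project([(10.0, 0.0, 0.0)]))
```

Rows then index elevation and columns azimuth, so neighbouring laser returns become neighbouring pixels, which is what allows a 2D (or sequence) backbone to process the scan.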

25 pages, 3654 KB  
Project Report
Computer Vision-Based Monitoring and Data Integration in a Multi-Trophic Controlled-Environment Agriculture Demonstrator
by Frederik Werner, Till Glockow, Kai Meissner, Martin Krüger, Markus Reischl and Christof M. Niemeyer
Sustainability 2026, 18(6), 2700; https://doi.org/10.3390/su18062700 - 10 Mar 2026
Viewed by 568
Abstract
Controlled-environment agriculture (CEA) and circular production systems require coordinated monitoring of biological and physicochemical processes across trophic levels. This project report presents the implementation of a multi-trophic controlled-environment agriculture demonstrator that integrates computer-vision-based monitoring with established sensor infrastructure for aquaculture, poultry, plants, microalgae, duckweed, and insect modules. Stereo imaging and RGB-D systems are deployed for non-invasive quantification of fish biomass and plant growth, while continuous water-quality and environmental measurements (e.g., pH, dissolved oxygen, nitrate, ammonium, temperature, CO₂) provide complementary process data. These data streams are synchronized within a shared database architecture to enable cross-module evaluation of nutrient dynamics, growth progression, and operational stability under real facility conditions. The implemented framework demonstrates how computer vision can extend conventional sensor-based monitoring by directly capturing biological performance indicators across aquatic, terrestrial, and microbial domains. While advanced predictive modeling and full digital twin simulation remain future development steps, the realized data-integration architecture establishes a structural foundation for the systematic evaluation of circular indoor food-production systems. The demonstrator illustrates how multimodal monitoring can support nutrient recirculation, transparency of biological variability, and data-driven assessment within controlled multi-trophic environments.
(This article belongs to the Special Issue Food Science and Engineering for Sustainability—2nd Edition)
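Synchronizing image-derived indicators with sensor streams sampled at different rates usually requires a timestamp join. One hedged approach is a nearest-timestamp ("as-of") match within a tolerance, sketched below; the function name, the tolerance, and the pH readings are hypothetical, and the demonstrator's actual database schema is not described in the abstract.

```python
import bisect


def align_nearest(image_times, sensor_readings, tolerance_s):
    """Pair each image timestamp with the closest sensor reading within tolerance_s.

    sensor_readings: list of (timestamp_s, value) sorted by timestamp.
    Returns (image_time, value or None) tuples; None means no reading was close enough.
    """
    times = [t for t, _ in sensor_readings]
    aligned = []
    for t in image_times:
        i = bisect.bisect_left(times, t)               # first reading at or after t
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        best = min(candidates, key=lambda j: abs(times[j] - t), default=None)
        if best is not None and abs(times[best] - t) <= tolerance_s:
            aligned.append((t, sensor_readings[best][1]))
        else:
            aligned.append((t, None))
    return aligned


# Hypothetical fish-tank images (seconds) matched to pH readings.
print(align_nearest([100, 160, 400], [(90, 7.1), (150, 7.0), (210, 6.9)], tolerance_s=30))
```

Returning None rather than the nearest distant reading keeps gaps in a sensor stream visible instead of silently pairing an image with stale data.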

25 pages, 4622 KB  
Article
Edge–Point Cloud Fusion for Geometric Fitting of Cylinder Parameters Using Single-View RGB-D Data
by Huayan Zhang, Jiaxin Liu and Zhongkui Wang
Sensors 2026, 26(5), 1687; https://doi.org/10.3390/s26051687 - 7 Mar 2026
Viewed by 567
Abstract
Cylinders are common in both industrial and daily settings. Accurate geometric fitting of their parameters, including position, orientation, and radius, is important in real-world perception tasks and industrial applications. At present, consumer-level RGB-D cameras provide three-dimensional (3D) point cloud data with acceptable accuracy and are widely adopted in various sensing applications. Consequently, this task is typically formulated as a geometric fitting problem based on point cloud data. However, point cloud data acquired from such sensors often contain noise, particularly when scanning curved surfaces, which directly degrades the performance of point cloud-based fitting methods. In this paper, we propose an edge–point cloud fusion approach for the geometric fitting of cylinder parameters from single-view RGB-D data. Our approach leverages two-dimensional (2D) image-domain edge constraints together with point cloud data, then fuses them in a unified formulation to jointly optimize cylinder parameters. By explicitly incorporating reliable edge information, our method effectively mitigates the effects of noise in point cloud data. We evaluate the proposed method using real-world RGB-D data, and the experimental results show that our approach achieves significant improvements in both accuracy and robustness.
(This article belongs to the Section Sensing and Imaging)
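The point cloud term of a cylinder fit is typically built on each point's perpendicular distance to the cylinder axis. The sketch below shows that residual and the closed-form least-squares radius for a fixed axis (minimizing Σ(dᵢ − r)² gives r = mean dᵢ); the paper's edge constraints and its joint optimization of axis and radius are not reproduced, and the points are toy values.

```python
import math


def distance_to_axis(p, axis_point, axis_dir):
    """Perpendicular distance from point p to the line through axis_point along axis_dir."""
    norm = math.sqrt(sum(d * d for d in axis_dir))
    u = [d / norm for d in axis_dir]                      # unit axis direction
    w = [pi - ai for pi, ai in zip(p, axis_point)]        # vector from axis point to p
    along = sum(wi * ui for wi, ui in zip(w, u))          # component along the axis
    perp = [wi - along * ui for wi, ui in zip(w, u)]      # component orthogonal to it
    return math.sqrt(sum(c * c for c in perp))


def fit_radius(points, axis_point, axis_dir):
    """Least-squares radius for a known axis, plus per-point residuals d_i - r."""
    ds = [distance_to_axis(p, axis_point, axis_dir) for p in points]
    r = sum(ds) / len(ds)
    return r, [d - r for d in ds]


# Noisy samples of a unit-radius cylinder around the z-axis.
pts = [(1.1, 0.0, 0.0), (0.0, 0.9, 2.0), (-1.0, 0.0, 5.0), (0.0, -1.0, 1.0)]
print(fit_radius(pts, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```

Depth noise on curved surfaces shows up directly in these residuals, which is the weakness that additional image-domain edge constraints are meant to offset.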

27 pages, 15861 KB  
Article
Explorable 3D Hyperspectral Models from Multi-Angle Gimballed LWIR Pushbroom Imagery
by Nikolay Golosov, Guido Cervone and Mark Salvador
Remote Sens. 2026, 18(5), 781; https://doi.org/10.3390/rs18050781 - 4 Mar 2026
Viewed by 444
Abstract
Hyperspectral imaging in the long-wave infrared (LWIR) range enables identification of chemical compositions and material properties, but reconstructing 3D models from gimballed pushbroom sensors remains challenging because their unique acquisition geometry is incompatible with conventional photogrammetric software designed for frame cameras. This study presents a workflow for creating explorable 3D models from multi-angle LWIR hyperspectral imagery by co-registering hyperspectral line-scan data with simultaneously acquired RGB frame camera imagery using deep learning-based image matching. The co-registered images are processed in commercial photogrammetric software (Agisoft Metashape), and a texture-to-image mapping algorithm preserves correspondences between 3D model coordinates and original hyperspectral pixels across multiple viewing angles. Quantitative evaluation against reference data demonstrates that co-registration reduces geometric error to a level approaching the accuracy of models built from high-resolution RGB imagery. The resulting models enable the retrieval of 8–50 spectral signatures per surface point, captured from different viewing geometries. This approach facilitates interactive exploration of angular variations in thermal infrared spectra, supporting material identification for non-Lambertian surfaces where single-angle observations may be insufficient for reliable classification.
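The central bookkeeping idea is that every 3D surface point keeps links back to the original hyperspectral pixels from each viewing angle. A hypothetical index structure for that lookup is sketched below; the class and method names are illustrative, and the texture-to-image mapping that would populate it from the photogrammetric model is not shown.

```python
from collections import defaultdict


class SurfaceSpectraIndex:
    """Hypothetical index: surface point id -> observations (view_id, row, col)
    in the original hyperspectral images, one entry per viewing angle."""

    def __init__(self):
        self._obs = defaultdict(list)

    def add_observation(self, point_id, view_id, row, col):
        self._obs[point_id].append((view_id, row, col))

    def spectra(self, point_id, cubes):
        """Return (view_id, spectrum) pairs; cubes[view_id][row][col] is a list of band values."""
        return [(view, cubes[view][row][col]) for view, row, col in self._obs.get(point_id, [])]


# Two toy single-pixel cubes with two bands each, seen from two angles.
index = SurfaceSpectraIndex()
index.add_observation(7, "v0", 0, 0)
index.add_observation(7, "v1", 0, 0)
cubes = {"v0": [[[1.0, 2.0]]], "v1": [[[1.5, 2.5]]]}
print(index.spectra(7, cubes))
```

Keeping raw pixel references, rather than a single blended texture value, is what preserves the angle-dependent spectra needed for non-Lambertian surfaces.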

30 pages, 8087 KB  
Article
A Novel SLAM Approach for Trajectory Generation of a Dual-Arm Mobile Robot (DAMR) Using Sensor Fusion
by Narendra Kumar Kolla and Pandu Ranga Vundavilli
Automation 2026, 7(2), 42; https://doi.org/10.3390/automation7020042 - 3 Mar 2026
Viewed by 880
Abstract
Simultaneous Localization and Mapping (SLAM) is essential for autonomous movement in intelligent robotic systems. Traditional SLAM using a single sensor, such as an Inertial Measurement Unit (IMU), faces challenges including noise and drift. This paper introduces a novel Cartographer-based SLAM approach for DAMR trajectory generation in indoor environments to reduce drift errors and improve localization accuracy. The approach uses extended Kalman filter (EKF) fusion of data from wheel odometry, an RGB-D camera (RTAB-Map), and an IMU to produce precise maps and DAMR trajectories, which are compared with a heading reference trajectory generated by robot pose estimation and frame transformation. The system is implemented in the Robot Operating System (ROS 2) for coordinated data acquisition, processing, and visualization. Experimental verification showed that the generated DAMR trajectories lie closer to the reference trajectory, with reduced drift errors. The experimental results revealed that the DAMR trajectory with multi-sensor data integration using the EKF effectively improved the positioning accuracy and robustness of the system. The proposed approach shows improved alignment with the reference trajectory, yielding a mean displacement error of 0.352% and an absolute trajectory error of 0.007 m, highlighting the effectiveness of the fusion approach for accurate indoor robot navigation.
(This article belongs to the Section Robotics and Autonomous Systems)
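Absolute trajectory error is commonly reported as the root-mean-square position difference between time-associated estimated and reference poses. The sketch assumes the two trajectories are already associated and aligned (the alignment step, for example a rigid Horn or Umeyama fit, is omitted), and the abstract does not state which ATE variant the authors used; the poses are toy values.

```python
import math


def absolute_trajectory_error(estimated, reference):
    """Translational RMSE between aligned, time-associated 2D positions (x, y)."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    sq = [(ex - rx) ** 2 + (ey - ry) ** 2
          for (ex, ey), (rx, ry) in zip(estimated, reference)]
    return math.sqrt(sum(sq) / len(sq))


# Toy example: every estimated pose is offset by (3 mm, 4 mm) from the reference.
est = [(0.003, 0.004), (1.003, 1.004)]
ref = [(0.0, 0.0), (1.0, 1.0)]
print(absolute_trajectory_error(est, ref))  # about 0.005 m
```

Because ATE is an RMSE in metres, a value such as 0.007 m summarizes the typical positional deviation over the whole run, whereas drift-style metrics are usually expressed relative to distance travelled.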

36 pages, 4079 KB  
Article
FEGW-YOLO: A Feature-Complexity-Guided Lightweight Framework for Real-Time Multi-Crop Detection with Advanced Sensing Integration on Edge Devices
by Yaojiang Liu, Hongjun Tian, Yijie Yin, Yuhan Zhou, Wei Li, Yang Xiong, Yichen Wang, Zinan Nie, Yang Yang, Dongxiao Xie and Shijie Huang
Sensors 2026, 26(4), 1313; https://doi.org/10.3390/s26041313 - 18 Feb 2026
Cited by 2 | Viewed by 743
Abstract
Real-time object detection on resource-constrained edge devices remains a critical challenge in precision agriculture and autonomous systems, particularly when integrating advanced multi-modal sensors (RGB-D, thermal, hyperspectral). This paper introduces FEGW-YOLO, a lightweight detection framework explicitly designed to bridge the efficiency-accuracy gap for fine-grained visual perception on edge hardware while maintaining compatibility with multiple sensor modalities. The core innovation is a Feature Complexity Descriptor (FCD) metric that enables adaptive, layer-wise compression based on the information-bearing capacity of network features. This compression-guided approach is coupled with (1) Feature Engineering-driven Ghost Convolution (FEG-Conv) for parameter reduction, (2) Efficient Multi-Scale Attention (EMA) for compensating compression-induced information loss, and (3) Wise-IoU loss for improved localization in dense, occluded scenes. The framework follows a principled “Compress, Compensate, and Refine” philosophy that treats compression and compensation as co-designed objectives rather than isolated knobs. Extensive experiments on a custom strawberry dataset (11,752 annotated instances) and cross-crop validation on apples, tomatoes, and grapes demonstrate that FEGW-YOLO achieves 95.1% mAP@0.5 while reducing model parameters by 54.7% and computational cost (GFLOPs) by 53.5% compared to a strong YOLO-Agri baseline. Real-time inference on NVIDIA Jetson Xavier achieves 38 FPS at 12.3 W, enabling 40+ hours of continuous operation on typical agricultural robotic platforms. Multi-modal fusion experiments with RGB-D sensors demonstrate that the lightweight architecture leaves sufficient computational headroom for parallel processing of depth and visual data, a capability essential for practical advanced sensing systems. 
Field deployment in commercial strawberry greenhouses validates an 87.3% harvesting success rate with a 2.1% fruit damage rate, demonstrating feasibility for autonomous systems. The proposed framework advances the state of the art in efficient agricultural sensing by introducing a principled metric-guided compression strategy, comprehensive multi-modal sensor integration, and empirical validation across diverse crop types and real-world deployment scenarios. This work bridges the gap between laboratory research and practical edge deployment of advanced sensing systems, with direct relevance to autonomous harvesting, precision monitoring, and other resource-constrained agricultural applications.
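The parameter savings of Ghost-style convolutions follow from simple arithmetic. The sketch below counts weights for the original GhostNet module (biases and batch normalization ignored); FEG-Conv is the authors' variant, whose exact structure the abstract does not give, so this serves only as a reference point. The ratio s = 2 and the cheap-kernel size d = 3 are assumed defaults.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of an ordinary k x k convolution (no bias)."""
    return c_in * c_out * k * k


def ghost_module_params(c_in, c_out, k, s=2, d=3):
    """Weights of a GhostNet-style module (no bias).

    An ordinary k x k conv produces m = c_out / s intrinsic maps; each intrinsic map
    then yields (s - 1) extra "ghost" maps through a cheap d x d depthwise operation.
    """
    m = c_out // s
    return c_in * m * k * k + m * (s - 1) * d * d


std = standard_conv_params(64, 128, 3)   # 73,728 weights
ghost = ghost_module_params(64, 128, 3)  # 36,864 + 576 = 37,440 weights
print(std, ghost, round(std / ghost, 2))
```

With s = 2 the module roughly halves the weights of the layer it replaces, which is consistent in magnitude with, though not a derivation of, the parameter reduction reported for the full network.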

21 pages, 7192 KB  
Article
Expectation–Maximization Method for RGB-D Camera Calibration with Motion Capture System
by Jianchu Lin, Guangxiao Du, Yugui Zhang, Yiyan Zhao, Qian Xie, Jian Yao and Ashim Khadka
Photonics 2026, 13(2), 183; https://doi.org/10.3390/photonics13020183 - 12 Feb 2026
Viewed by 622
Abstract
Camera calibration is an essential research direction in photonics and computer vision. It standardizes camera data by estimating intrinsic and extrinsic parameters. RGB-D cameras have recently become important devices because they supplement color images with depth information, and they are commonly divided into three kinds of mechanisms: binocular, structured light, and Time of Flight (ToF). However, because these mechanisms differ, calibration methods are complex and hard to unify, and lens distortion, parameter loss, sensor degradation, and similar factors can even cause calibration to fail. To address these issues, we propose a camera calibration method based on the Expectation–Maximization (EM) algorithm. A unified latent-variable model is established for the different kinds of cameras. In the EM algorithm, the E-step estimates the hidden intrinsic parameters of the camera, while the M-step learns the lens distortion parameters. In addition, depth values are calculated by a spatial geometric method and calibrated with the least squares method under an optical motion capture system. Experimental results demonstrate that our method can be directly employed to calibrate monocular and binocular RGB-D cameras, reducing image calibration errors by 0.6 to 1.2% compared with least squares, Levenberg–Marquardt, Direct Linear Transform, and Trust Region Reflective methods. The depth error is reduced by 16 to 19.3 mm. Therefore, our method can effectively improve the performance of different RGB-D cameras.
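The abstract states that depth values are calibrated with least squares against an optical motion capture reference, but it does not give the depth model. As a minimal sketch, assuming a simple linear correction d_ref ≈ a · d_meas + b fitted per camera (an assumption; the paper's model may differ), the closed-form ordinary least-squares fit looks like this. The function names and sample readings are hypothetical.

```python
from typing import List, Tuple


def fit_depth_correction(measured: List[float], reference: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of reference ≈ a * measured + b.

    measured:  raw depth readings from the RGB-D camera (here in mm)
    reference: ground-truth depths for the same points, e.g. derived from
               motion-capture marker positions (hypothetical setup)
    """
    n = len(measured)
    if n != len(reference) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(measured) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    if sxx == 0:
        raise ValueError("measured depths must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, reference))
    a = sxy / sxx              # slope: scale error of the depth sensor
    b = mean_y - a * mean_x    # intercept: constant offset
    return a, b


def correct_depth(d: float, a: float, b: float) -> float:
    """Apply the fitted linear correction to a raw depth reading."""
    return a * d + b


# Hypothetical paired samples (mm): raw camera depth vs. mocap-derived depth.
raw = [800.0, 1200.0, 1600.0, 2000.0]
ref = [812.0, 1216.0, 1620.0, 2024.0]
a, b = fit_depth_correction(raw, ref)
print(f"a = {a:.4f}, b = {b:.2f} mm")  # → a = 1.0100, b = 4.00 mm
```

A linear model keeps the fit closed-form; a sensor with range-dependent error would need a higher-order or per-pixel model fitted the same way.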

17 pages, 7804 KB  
Article
A 3D Camera-Based Approach for Real-Time Hand Configuration Recognition in Italian Sign Language
by Luca Ulrich, Asia De Luca, Riccardo Miraglia, Emma Mulassano, Simone Quattrocchio, Giorgia Marullo, Chiara Innocente, Federico Salerno and Enrico Vezzetti
Sensors 2026, 26(3), 1059; https://doi.org/10.3390/s26031059 - 6 Feb 2026
Cited by 1 | Viewed by 637
Abstract
Deafness poses significant challenges to effective communication, particularly in contexts where access to sign language interpreters is limited. Hand configuration recognition is a fundamental component of sign language understanding, as configurations constitute a core cheremic element in many sign languages, including Italian Sign Language (LIS). In this work, we address configuration-level recognition as an independent classification task and propose a machine vision framework based on RGB-D sensing. The proposed approach combines MediaPipe-based hand landmark extraction with normalized three-dimensional geometric features and a Support Vector Machine classifier. The first contribution of this study is the formulation of LIS hand configuration recognition as a standalone, configuration-level problem, decoupled from temporal gesture modeling. The second contribution is the integration of sensor-acquired RGB-D depth measurements into the landmark-based feature representation, enabling a direct comparison with depth estimated from monocular data. The third contribution is a systematic experimental evaluation on two LIS configuration sets (6 and 16 classes), demonstrating that real depth significantly improves classification performance and class separability, particularly for geometrically similar configurations. The results highlight the critical role of depth quality in configuration-level recognition and provide insights into the design of robust vision-based systems for LIS analysis.
(This article belongs to the Special Issue Sensing and Machine Learning Control: Progress and Applications)
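The abstract does not spell out how sensor depth enters the landmark features. A minimal sketch, assuming MediaPipe's 21-point hand layout (index 0 is the wrist, index 9 the middle-finger MCP joint), a pinhole camera with known intrinsics fx, fy, cx, cy, and one depth value sampled at each landmark pixel: each landmark is back-projected to 3D, translated so the wrist is the origin, and divided by the wrist-to-middle-MCP distance to remove hand size and camera distance. The normalization scheme and function names are illustrative assumptions; the resulting 63-value vector is the kind of input an SVM classifier could consume.

```python
import math
from typing import List, Sequence, Tuple

WRIST, MIDDLE_MCP = 0, 9  # indices in MediaPipe's 21-landmark hand model
NUM_LANDMARKS = 21


def back_project(u: float, v: float, z: float,
                 fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Pinhole back-projection of pixel (u, v) with depth z into camera coordinates."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def hand_features(pixels: Sequence[Tuple[float, float]], depths: Sequence[float],
                  fx: float, fy: float, cx: float, cy: float) -> List[float]:
    """Wrist-centred, scale-normalized 3D landmark vector (21 * 3 = 63 values)."""
    if len(pixels) != NUM_LANDMARKS or len(depths) != NUM_LANDMARKS:
        raise ValueError("expected 21 landmarks with one depth value each")
    pts = [back_project(u, v, z, fx, fy, cx, cy) for (u, v), z in zip(pixels, depths)]
    ox, oy, oz = pts[WRIST]
    centred = [(x - ox, y - oy, z - oz) for x, y, z in pts]
    # Palm length (wrist to middle MCP) serves as the scale reference.
    scale = math.dist(centred[WRIST], centred[MIDDLE_MCP])
    if scale == 0:
        raise ValueError("degenerate hand: wrist and middle MCP coincide")
    return [c / scale for p in centred for c in p]
```

With real data, `pixels` would typically come from MediaPipe's landmark x and y (normalized to the image width and height) multiplied by the image size, and `depths` from the depth frame aligned to the color image. Swapping `depths` for a monocular estimate gives the comparison condition the abstract describes.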

33 pages, 2852 KB  
Article
Robust Activity Recognition via Redundancy-Aware CNNs and Novel Pooling for Noisy Mobile Sensor Data
by Bnar Azad Hamad Ameen and Sadegh Abdollah Aminifar
Sensors 2026, 26(2), 710; https://doi.org/10.3390/s26020710 - 21 Jan 2026
Viewed by 603
Abstract
This paper proposes a robust convolutional neural network (CNN) architecture for human activity recognition (HAR) using smartphone accelerometer data, evaluated on the WISDM dataset. We introduce two novel pooling mechanisms, Pooling A (Extrema Contrast Pooling, ECP) and Pooling B (Center Minus Variation, CMV), which enhance feature discrimination and noise robustness. ECP emphasizes sharp signal transitions through a nonlinear penalty based on the squared range between extrema, while CMV penalizes local variability by subtracting the standard deviation, improving resilience to noise. Input data are normalized to the [0, 1] range to ensure bounded and interpretable pooled outputs. The proposed framework is evaluated in two separate configurations: (1) a 1D CNN applied to raw tri-axial sensor streams with the proposed pooling layers, and (2) a histogram-based image encoding pipeline that transforms segment-level sensor redundancy into RGB representations for a 2D CNN with fully connected layers. Ablation studies show that histogram encoding provides the largest improvement, while combining ECP and CMV further enhances classification performance. Across six activity classes, the 2D CNN system achieves up to 96.84% weighted classification accuracy, outperforming baseline models and traditional average pooling. Under Gaussian, salt-and-pepper, and mixed noise conditions, the proposed pooling layers consistently reduce performance degradation, demonstrating improved stability in real-world sensing environments. These results highlight the benefits of redundancy-aware pooling and histogram-based representations for accurate and robust mobile HAR systems.
(This article belongs to the Section Intelligent Sensors)
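The abstract describes CMV only as penalizing local variability by subtracting the standard deviation, applied to inputs normalized to [0, 1]. A minimal sketch, assuming the "center" is the window mean and the "variation" is the population standard deviation (both assumptions; the paper's exact definition may differ), of min-max normalization followed by a 1D CMV-style pooling over sliding windows. The function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def min_max_normalize(signal: Sequence[float]) -> List[float]:
    """Rescale one signal to [0, 1]; a constant signal maps to all zeros."""
    lo, hi = min(signal), max(signal)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in signal]
    return [(x - lo) / span for x in signal]


def cmv_pool_1d(signal: Sequence[float], window: int, stride: int) -> List[float]:
    """CMV-style pooling (assumed form): window mean minus population std.

    Flat windows keep their mean, while windows with high local variability
    are pushed down, matching the 'penalize local variability' description.
    """
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")
    out = []
    for start in range(0, len(signal) - window + 1, stride):
        w = signal[start:start + window]
        out.append(statistics.fmean(w) - statistics.pstdev(w))
    return out


x = min_max_normalize([2.0, 2.0, 6.0, 6.0, 2.0, 6.0])
print(x)                                    # [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
print(cmv_pool_1d(x, window=2, stride=2))   # [0.0, 1.0, 0.0]
```

The third window has mean 0.5, yet its spread drives the pooled value to 0, whereas average pooling would report 0.5. Whether normalization is applied per segment or over the whole dataset is not stated in the abstract; this sketch normalizes a single sequence.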
