Search Results (134)

Search Parameters:
Keywords = RGB-detector

22 pages, 14276 KB  
Article
DualFOD: A Dual-Modality Deep Learning Framework for UAS-Based Foreign Object Debris Detection Using Thermal and RGB Imagery
by Owais Ahmed, Caleb S. Caldwell and Adeel Khalid
Drones 2026, 10(3), 225; https://doi.org/10.3390/drones10030225 - 23 Mar 2026
Viewed by 429
Abstract
Foreign Object Debris (FOD) poses critical risks to aircraft during takeoff and landing, resulting in billions of dollars in losses annually due to infrastructure damage and flight delays. Advancements in automated inspection technologies have enabled the use of Unmanned Aerial Systems (UAS) combined with Artificial Intelligence (AI) for rapid FOD identification. While prior research has extensively evaluated optical sensors such as RGB imaging and radar, limited work has investigated the potential of thermal imaging for improved FOD visibility under challenging environmental conditions. This study proposes DualFOD, a dual-modality detection framework that integrates a supervised YOLO12-based RGB detector with an unsupervised thermal anomaly extraction pipeline for identifying debris on runway surfaces. A decision-level fusion algorithm combines detections from both branches using spatial proximity matching to produce a unified FOD inventory. The RGB branch achieves a precision of 0.954 and mAP@0.5 of 0.890 on the held-out test set. Cross-site validation at the Cobb County Sport Aviation Complex demonstrates that thermal detection recovers debris missed by RGB at higher altitudes, with the fused output consistently outperforming either single-modality branch. This research contributes toward scalable autonomous FOD monitoring that enhances operational safety in aviation environments.
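As a rough illustration of the decision-level fusion step described in the abstract, the sketch below matches RGB and thermal detections by centre distance in a common ground frame. The detection format, the 0.5 m matching radius, and the confidence handling are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of decision-level fusion by spatial proximity matching,
# loosely following the abstract's description. The detection format, the
# matching radius, and the confidence handling are illustrative assumptions.
from dataclasses import dataclass
import math

@dataclass
class Detection:
    x: float      # object centre in a common ground frame, metres
    y: float
    conf: float   # detector confidence in [0, 1]
    source: str   # "rgb", "thermal", or "fused"

def fuse(rgb_dets, thermal_dets, match_radius=0.5):
    """Merge RGB and thermal detections into one FOD inventory."""
    fused, used = [], set()
    for r in rgb_dets:
        best, best_d = None, match_radius
        for j, t in enumerate(thermal_dets):
            d = math.hypot(r.x - t.x, r.y - t.y)
            if j not in used and d <= best_d:
                best, best_d = j, d
        if best is not None:
            t = thermal_dets[best]
            used.add(best)
            # Same physical object seen by both branches: average positions.
            fused.append(Detection((r.x + t.x) / 2, (r.y + t.y) / 2,
                                   max(r.conf, t.conf), "fused"))
        else:
            fused.append(r)   # RGB-only find
    # Thermal-only finds (e.g., debris missed by RGB at higher altitudes).
    fused.extend(t for j, t in enumerate(thermal_dets) if j not in used)
    return fused
```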

25 pages, 3809 KB  
Article
Detection of Floricane Raspberry Shrubs from Unmanned Aerial Vehicle Imagery Using YOLO Models
by Magdalena Kapłan, Kamil Buczyński and Zbigniew Jarosz
Agriculture 2026, 16(6), 664; https://doi.org/10.3390/agriculture16060664 - 14 Mar 2026
Viewed by 404
Abstract
The present study investigated the detection performance of the YOLOv8s, YOLO11s, and YOLO12s models, implemented within convolutional neural network architectures, for identifying floricane raspberry (Rubus idaeus L.) shrubs using RGB imagery and multispectral data acquired in the near-infrared, red-edge, red, and green spectral bands with a DJI Mavic 3 Multispectral drone. Model training and validation were conducted to evaluate both within-modality detection performance and cross-modality transferability. Under all training scenarios, the YOLO-based detectors reached near-saturated accuracy levels. However, cross-domain assessments demonstrated substantial variability depending on the spectral configuration of the input imagery. Overall, the combination of UAV-based multispectral sensing with convolutional neural network detection frameworks establishes a technological basis for automated shrub monitoring and constitutes a meaningful advancement toward intelligent raspberry production systems. This integration further creates new prospects for the technological development of cultivation practices for this crop within the rapidly evolving landscape of artificial intelligence-driven agriculture.
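The within- and cross-modality protocol described above can be reproduced in outline with the ultralytics package: train one detector per spectral band, then validate it against every other band. The dataset YAML names and hyperparameters below are hypothetical placeholders, since the paper's exact configuration is not given here.

```python
# Sketch of a within- and cross-modality evaluation loop with the
# ultralytics API (pip install ultralytics). The band-specific dataset
# YAML files ("rgb.yaml", "nir.yaml", ...) are hypothetical placeholders.
from ultralytics import YOLO

bands = ["rgb", "nir", "red_edge", "red", "green"]

for train_band in bands:
    model = YOLO("yolo11s.pt")                 # or yolov8s.pt / yolo12s.pt
    model.train(data=f"{train_band}.yaml", epochs=100, imgsz=640)
    for test_band in bands:                    # cross-modality transfer
        metrics = model.val(data=f"{test_band}.yaml")
        print(train_band, "->", test_band, metrics.box.map50)
```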

24 pages, 14132 KB  
Article
MP-Stain-Detector: A Learning-Based Stain Detection Method with a Multispectral Polarization Optical System
by Shun Zou, Pei An, Xiaoming Liu, Zuyuan Zhu, Yan Song, Tao Song and You Yang
Sensors 2026, 26(5), 1703; https://doi.org/10.3390/s26051703 - 8 Mar 2026
Viewed by 289
Abstract
Stain detection is crucial for robotic sweepers, enabling them to assess environmental hygiene and execute precise cleaning tasks. However, in complex indoor scenarios, highly accurate stain detection remains a significant challenge, as the visual features of stains are often obscured by ambient light, background textures, and specular reflections. Most existing deep learning methods rely predominantly on standard Red-Green-Blue (RGB) images, which lack sufficient discriminative features to robustly distinguish stains from complex backgrounds or accurately classify diverse contaminants. To address these limitations, we propose a deep learning stain detection framework integrated with a multispectral polarization optical system. First, to extract discriminative optical features, we design a lightweight multispectral polarization optical module tailored for integration into robotic sweepers. It captures rich spectral and polarization features while effectively suppressing specular reflections. Second, to enhance feature representation capabilities, we develop a multispectral polarization (MP)-based stain detector, named MP-stain-detector, which fuses spectral composition data with polarization texture features. Third, to support rigorous model training and evaluation, we construct a comprehensive dataset, the MP-Stain-dataset, collected in real-world home scenarios. Experiments on the MP-Stain-dataset demonstrate that our method improves the overall mean accuracy by 2.44%, and by 5.72% for the challenging light-colored liquid category compared to conventional approaches.
(This article belongs to the Special Issue Computational Optical Sensing and Imaging)
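The abstract does not detail how polarization texture is derived, but a standard route, sketched below, computes Stokes parameters and the degree and angle of linear polarization from four intensity images captured at different polarizer angles. This is generic polarization optics, not the authors' pipeline.

```python
# Generic polarization features from four intensity images captured at
# polarizer angles 0/45/90/135 degrees, one standard route to the
# "polarization texture" the abstract mentions; not the authors' code.
import numpy as np

def polarization_features(i0, i45, i90, i135, eps=1e-6):
    """Return total intensity, degree and angle of linear polarization."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)           # Stokes S0: total intensity
    s1 = i0 - i90                                 # Stokes S1
    s2 = i45 - i135                               # Stokes S2
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + eps)    # degree of linear polarization
    aolp = 0.5 * np.arctan2(s2, s1)               # angle of linear polarization
    return s0, dolp, aolp
```

High DoLP flags strongly polarized (often specular) regions, which is one way such a module can suppress specular reflections before detection.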

22 pages, 39829 KB  
Article
Dual-Detector Vision and Depth-Aware Back-Projection for Accurate Apple Detection and 3D Localisation for Robotic Harvesting
by Tagor Hossain, Peng Shi and Levente Kovacs
Robotics 2026, 15(2), 47; https://doi.org/10.3390/robotics15020047 - 22 Feb 2026
Viewed by 558
Abstract
Accurate apple detection and precise three-dimensional (3D) localisation are essential for autonomous robotic harvesting in orchard environments, where occlusion, illumination variation, depth noise, and the similar colour appearance of fruits and surrounding leaves present significant challenges. This paper proposes a dual-detector vision framework combined with depth-aware back-projection to achieve robust apple detection and metric 3D localisation in real time. The method integrates the complementary strengths of YOLOv8 and Mask R-CNN through confidence-weighted fusion of bounding boxes and pixel-wise union of segmentation masks, producing stabilised two-dimensional (2D) apple representations under visually ambiguous conditions. The fusion results are converted into dense 3D representations through depth-guided projection within the camera coordinate system, representing the visible fruit surface. A depth-consistency weighting strategy assigns higher influence to depth-reliable pixels during centroid computation, suppressing noisy or occluded depth measurements and improving the stability of 3D fruit-centre estimation. In parallel, local intensity normalisation standardises neighbourhood-level pixel intensities to reduce the impact of shadows, highlights, and uneven lighting, enabling more consistent segmentation and detection across varying illumination conditions. Experimental results demonstrate an accuracy of 98.9%, an mAP of 94.2%, an F1-score of 93.3%, and a recall of 92.8%, while achieving real-time performance at 86.42 FPS, confirming the suitability of the proposed method for robotic harvesting in challenging orchard environments.
(This article belongs to the Special Issue Perception and AI for Field Robotics)
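Two of the named components lend themselves to a compact sketch: confidence-weighted fusion of the two detectors' boxes, and a depth-consistency-weighted 3D centroid. The array shapes, the Gaussian weighting, and the residual definition below are assumptions for illustration, not the paper's implementation.

```python
# Sketch of confidence-weighted box fusion (YOLOv8 + Mask R-CNN) and a
# depth-consistency weighted 3D centroid; weighting scheme is assumed.
import numpy as np

def fuse_boxes(box_a, conf_a, box_b, conf_b):
    """Confidence-weighted fusion of two [x1, y1, x2, y2] boxes."""
    w = conf_a / (conf_a + conf_b)
    return w * np.asarray(box_a) + (1 - w) * np.asarray(box_b)

def weighted_centroid(points, depth_residuals, sigma=0.02):
    """3D centroid with higher weight on depth-reliable pixels.

    points:          (N, 3) back-projected fruit-surface points, metres
    depth_residuals: (N,)  deviation of each depth sample from a local
                           smooth fit; large residual = unreliable depth
    """
    w = np.exp(-(depth_residuals / sigma) ** 2)   # down-weight noisy depth
    return (points * w[:, None]).sum(axis=0) / w.sum()
```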

17 pages, 3074 KB  
Article
Dual-Modal Vision–Sonar Object Detection for Underwater Robots Based on Deep Learning
by Xiaoming Wang, Zhenyu Wang and Dexue Bi
J. Mar. Sci. Eng. 2026, 14(4), 338; https://doi.org/10.3390/jmse14040338 - 10 Feb 2026
Viewed by 584
Abstract
Applying state-of-the-art RGB object detectors (e.g., YOLOv8) to underwater scenes often yields unstable performance due to scattering, absorption, illumination deficiency, and bandwidth-limited transmission that severely corrupt image contrast and details. Forward-looking sonar (FLS) remains informative in turbid or low-visibility water, yet its low resolution and weak semantics make conventional fusion architectures costly and difficult to deploy on resource-constrained robots. This paper proposes a paired-sample-free RGB–FLS joint training paradigm based on parameter sharing, where RGB and FLS images from different datasets are jointly used during training without any frame-level pairing or architectural modification. The resulting model preserves the original detector parameter scale and inference cost, and requires only RGB input at test time. Experiments on the SeaClear and Marine Debris FLS datasets under six representative underwater degradation factors (contrast loss, blur, resolution reduction, color cast, and JPEG compression) show consistent robustness gains over RGB-only training. In particular, under severe low-contrast corruption, the proposed training strategy improves mAP50 by more than 14 percentage points compared with the RGB-only baseline. These results indicate that sonar-domain supervision functions as an auxiliary structural constraint during optimization, rather than a conventional multi-source data enlargement. By forcing a shared-parameter detector to fit a texture-poor, geometry-dominant sonar domain, the learned representation is biased away from color/texture shortcuts and becomes more stable under adverse underwater degradations, without increasing deployment complexity.
(This article belongs to the Special Issue Advances in Marine Autonomous Vehicles)
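The core training idea, one shared-parameter model fed unpaired samples from both domains, can be sketched with a plain PyTorch loop. The toy datasets and classifier below merely stand in for the detector and the SeaClear/Marine Debris data; the paper's losses and sampling ratio are not specified here and are assumed.

```python
# Toy sketch of paired-sample-free joint training: RGB and sonar samples
# from *different* datasets are mixed into one stream feeding a single
# shared-parameter model. Dummy tensors stand in for real data.
import torch
from torch import nn
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Stand-ins for the unpaired datasets: RGB images, and single-channel FLS
# images replicated to three channels so the shared input stem fits both.
rgb_ds = TensorDataset(torch.rand(64, 3, 64, 64), torch.randint(0, 5, (64,)))
fls_ds = TensorDataset(torch.rand(32, 1, 64, 64).repeat(1, 3, 1, 1),
                       torch.randint(0, 5, (32,)))

loader = DataLoader(ConcatDataset([rgb_ds, fls_ds]), batch_size=16,
                    shuffle=True)           # no frame-level pairing needed

# One shared-parameter model serves both domains (a toy classifier here,
# standing in for the unmodified detector used in the paper).
model = nn.Sequential(nn.Conv2d(3, 8, 3, 2), nn.ReLU(), nn.Flatten(),
                      nn.Linear(8 * 31 * 31, 5))
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for images, labels in loader:
    loss = loss_fn(model(images), labels)   # same loss for either domain
    opt.zero_grad(); loss.backward(); opt.step()
# At test time only RGB input is used, so inference cost is unchanged.
```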

17 pages, 1606 KB  
Article
Non-Destructive Estimation of Nitrogen and Crude Protein in Mombasa Grass Using Morphometry, Colorimetry, and Spectrophotometry
by Rafael M. Amaral, Berman E. Espino, Floridalma E. M. Francisco, Oswaldo Navarrete and Carlomagno S. Castro
Nitrogen 2026, 7(1), 15; https://doi.org/10.3390/nitrogen7010015 - 29 Jan 2026
Viewed by 525
Abstract
Estimating nitrogen (N) and the corresponding crude protein (CP) content in forage crops is essential for optimizing fertilization and livestock nutrition. However, standard methods such as the Dumas and Kjeldahl techniques are destructive, costly, and impractical for field use in certain regions of developing countries. This study evaluated four non-destructive approaches—morphometric measurements, Pantone® color scales, smartphone-based RGB analysis (ColorDetector app), and SPAD chlorophyll readings—for predicting N and CP in Megathyrsus maximus (Mombasa grass). A total of 120 samples were collected under three nitrogen fertilization levels and assessed using linear mixed-effects models with cross-validation. Morphometric variables showed poor performance (R2 < 0.01), indicating low correlation with nutrient content. Pantone-based RGB models provided slightly better predictions (R2 ≈ 0.30) but were limited by subjectivity and discrete data. SPAD-based models demonstrated moderate predictive accuracy (R2 ≈ 0.53; RMSE ≈ 0.46%). The highest accuracy was achieved with smartphone-derived RGB data, where full RGB models reached R2 = 0.60 and RMSE = 0.45%. Based on these results, a practical green color scale was developed from RGB values to support real-time, in-field nitrogen and crude protein assessment. This study highlights smartphone imaging as a scalable, low-cost, and accurate tool for non-destructive estimation of nitrogen and crude protein in tropical forages, offering an accessible alternative to laboratory methods for producers and field technicians.
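A minimal version of the best-performing approach, a linear fit from smartphone RGB means to nitrogen content (with crude protein obtained via the usual N × 6.25 conversion), might look like the sketch below. The numeric values are fabricated placeholders for illustration, not the study's data.

```python
# Sketch of a linear RGB -> nitrogen model; CP follows from N x 6.25.
# All data values below are fabricated placeholders, not the study's data.
import numpy as np
from sklearn.linear_model import LinearRegression

rgb = np.array([[62, 118, 54],      # mean R, G, B of a leaf patch
                [55, 124, 50],
                [70, 105, 60],
                [48, 130, 45]], dtype=float)
n_pct = np.array([2.1, 2.6, 1.7, 2.9])   # lab-measured nitrogen, % DM

model = LinearRegression().fit(rgb, n_pct)
n_hat = model.predict([[58, 121, 52]])[0]
print(f"predicted N = {n_hat:.2f}% DM, CP = {6.25 * n_hat:.1f}% DM")
```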

27 pages, 49730 KB  
Article
AMSRDet: An Adaptive Multi-Scale UAV Infrared-Visible Remote Sensing Vehicle Detection Network
by Zekai Yan and Yuheng Li
Sensors 2026, 26(3), 817; https://doi.org/10.3390/s26030817 - 26 Jan 2026
Cited by 2 | Viewed by 580
Abstract
Unmanned Aerial Vehicle (UAV) platforms enable flexible and cost-effective vehicle detection for intelligent transportation systems, yet small-scale vehicles in complex aerial scenes pose substantial challenges from extreme scale variations, environmental interference, and single-sensor limitations. We present AMSRDet (Adaptive Multi-Scale Remote Sensing Detector), an adaptive multi-scale detection network fusing infrared (IR) and visible (RGB) modalities for robust UAV-based vehicle detection. Our framework comprises four novel components: (1) a MobileMamba-based dual-stream encoder extracting complementary features via Selective State-Space 2D (SS2D) blocks with linear complexity O(HWC), achieving 2.1× efficiency improvement over standard Transformers; (2) a Cross-Modal Global Fusion (CMGF) module capturing global dependencies through spatial-channel attention while suppressing modality-specific noise via adaptive gating; (3) a Scale-Coordinate Attention Fusion (SCAF) module integrating multi-scale features via coordinate attention and learned scale-aware weighting, improving small object detection by 2.5 percentage points; and (4) a Separable Dynamic Decoder generating scale-adaptive predictions through content-aware dynamic convolution, reducing computational cost by 48.9% compared to standard DETR decoders. On the DroneVehicle dataset, AMSRDet achieves 45.8% mAP@0.5:0.95 (81.2% mAP@0.5) at 68.3 Frames Per Second (FPS) with 28.6 million (M) parameters and 47.2 Giga Floating Point Operations (GFLOPs), outperforming twenty state-of-the-art detectors including YOLOv12 (+0.7% mAP), DEIM (+0.8% mAP), and Mamba-YOLO (+1.5% mAP). Cross-dataset evaluation on Camera-vehicle yields 52.3% mAP without fine-tuning, demonstrating strong generalization across viewpoints and scenarios.
(This article belongs to the Special Issue AI and Smart Sensors for Intelligent Transportation Systems)
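The adaptive gating that the CMGF module uses to suppress modality-specific noise can be illustrated with a toy PyTorch layer that blends RGB and IR features through a learned per-pixel gate. The spatial-channel attention is omitted, and all shapes are assumptions, not the paper's module.

```python
# Toy sketch of adaptive gated RGB-IR feature fusion; the real CMGF module
# also applies spatial-channel attention, omitted here for brevity.
import torch
from torch import nn

class GatedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Predict a per-pixel, per-channel gate from both modalities.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, ir_feat: torch.Tensor):
        g = self.gate(torch.cat([rgb_feat, ir_feat], dim=1))
        return g * rgb_feat + (1 - g) * ir_feat  # lean on the cleaner modality

fused = GatedFusion(64)(torch.rand(2, 64, 32, 32), torch.rand(2, 64, 32, 32))
```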

21 pages, 1284 KB  
Article
Probabilistic Indoor 3D Object Detection from RGB-D via Gaussian Distribution Estimation
by Hyeong-Geun Kim
Mathematics 2026, 14(3), 421; https://doi.org/10.3390/math14030421 - 26 Jan 2026
Viewed by 428
Abstract
Conventional object detectors represent each object by a deterministic bounding box, regressing its center and size from RGB images. However, such discrete parameterization ignores the inherent uncertainty in object appearance and geometric projection, which can be more naturally modeled as a probabilistic density field. Recent works have introduced Gaussian-based formulations that treat objects as distributions rather than boxes, yet they remain limited to 2D images or require late fusion between image and depth modalities. In this paper, we propose a unified Gaussian-based framework for direct 3D object detection from RGB-D inputs. Our method is built upon a vision transformer backbone to effectively capture global context. Instead of separately embedding RGB and depth features or refining depth within region proposals, our method takes a full four-channel RGB-D tensor and predicts the mean and covariance of a 3D Gaussian distribution for each object in a single forward pass. We extend a pretrained vision transformer to accept four-channel inputs by augmenting the patch embedding layer while preserving ImageNet-learned representations. This formulation allows the detector to represent both object location and geometric uncertainty in 3D space. By optimizing divergence metrics such as the Kullback–Leibler or Bhattacharyya distances between predicted and target distributions, the network learns a physically consistent probabilistic representation of objects. Experimental results on the SUN RGB-D benchmark demonstrate that our approach achieves competitive performance compared to state-of-the-art point-cloud-based methods while offering uncertainty-aware and geometrically interpretable 3D detections.
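The divergence objective the abstract mentions has a closed form for Gaussians. The sketch below implements the standard KL divergence between two 3D Gaussians; whether the paper uses exactly this direction and parameterisation is an assumption.

```python
# Closed-form KL divergence KL( N(mu0, cov0) || N(mu1, cov1) ) between two
# k-dimensional Gaussians, the kind of predicted-vs-target objective the
# abstract describes; the exact direction used by the paper is assumed.
import numpy as np

def kl_gauss(mu0, cov0, mu1, cov1):
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - k
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

# Example: predicted vs ground-truth object as 3D Gaussians (metres).
mu_p, cov_p = np.array([1.0, 0.5, 2.0]), np.diag([0.04, 0.04, 0.09])
mu_t, cov_t = np.array([1.1, 0.5, 2.1]), np.diag([0.05, 0.03, 0.08])
print(kl_gauss(mu_p, cov_p, mu_t, cov_t))
```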

22 pages, 9269 KB  
Article
Efficient Layer-Wise Cross-View Calibration and Aggregation for Multispectral Object Detection
by Xiao He, Tong Yang, Tingzhou Yan, Hongtao Li, Yang Ge, Zhijun Ren, Zhe Liu, Jiahe Jiang and Chang Tang
Electronics 2026, 15(3), 498; https://doi.org/10.3390/electronics15030498 - 23 Jan 2026
Viewed by 413
Abstract
Multispectral object detection is a fundamental task with an extensive range of practical implications. In particular, combining visible (RGB) and infrared (IR) images can offer complementary information that enhances detection performance in different weather scenarios. However, the existing methods generally involve aligning features across modalities and require proposals for the two-stage detectors, which are often slow and unsuitable for large-scale applications. To overcome this challenge, we introduce a novel one-stage oriented detector for RGB-infrared object detection called the Layer-wise Cross-Modality calibration and Aggregation (LCMA) detector. LCMA employs a layer-wise strategy to achieve cross-modality alignment by using the proposed inter-modality spatial-reduction attention. Moreover, we design a Gated Coupled Filter in each layer to capture semantically meaningful features while ensuring that well-aligned and foreground object information is obtained before forwarding them to the detection head. This removes the need for a region-proposal step in the alignment, enabling direct category and bounding box predictions in a unified one-stage oriented detector. Extensive experiments on two challenging datasets demonstrate that the proposed LCMA outperforms state-of-the-art methods in terms of both accuracy and computational efficiency, demonstrating the efficacy of our approach in exploiting multi-modality information for robust and efficient multispectral object detection.
(This article belongs to the Special Issue Multi-View Learning and Applications)

25 pages, 3879 KB  
Article
Robust Occluded Object Detection in Multimodal Autonomous Driving: A Fusion-Aware Learning Framework
by Zhengqing Li and Baljit Singh
Electronics 2026, 15(1), 245; https://doi.org/10.3390/electronics15010245 - 5 Jan 2026
Viewed by 794
Abstract
Reliable occluded object detection remains a persistent core challenge for autonomous driving perception systems, particularly in complex urban scenarios where targets are predominantly partially or fully obscured by static obstacles or dynamic agents. Conventional single-modality detectors often fail to capture adequate discriminative cues for robust recognition, while existing multimodal fusion strategies typically lack explicit occlusion modeling and effective feature completion mechanisms, ultimately degrading performance in safety-critical operating conditions. To address these limitations, we propose a novel Fusion-Aware Occlusion Detection (FAOD) framework that integrates explicit visibility reasoning with implicit cross-modal feature reconstruction. Specifically, FAOD leverages synchronized red–green–blue (RGB), light detection and ranging (LiDAR), and optional radar/infrared inputs, employs a visibility-aware attention mechanism to infer target occlusion states, and embeds a cross-modality completion module to reconstruct missing object features via complementary non-occluded modal information; it further incorporates an occlusion-aware data augmentation and annotation strategy to enhance model generalization across diverse occlusion patterns. Extensive evaluations on four benchmark datasets demonstrate that FAOD achieves state-of-the-art performance, including a +8.75% occlusion-level mean average precision (OL-mAP) improvement over existing methods on heavily occluded objects (occlusion level O = 2) in the nuScenes dataset, while maintaining real-time efficiency. These findings confirm FAOD's potential to advance reliable multimodal perception for next-generation autonomous driving systems in safety-critical environments.

15 pages, 1730 KB  
Article
Research on Printed Circuit Board (PCB) Defect Detection Algorithm Based on Convolutional Neural Networks (CNN)
by Zhiduan Ni and Yeonhee Kim
Appl. Sci. 2025, 15(24), 13115; https://doi.org/10.3390/app152413115 - 12 Dec 2025
Viewed by 1886
Abstract
Printed Circuit Board (PCB) defect detection is critical for quality control in electronics manufacturing. Traditional manual inspection and classical Automated Optical Inspection (AOI) methods face challenges in speed, consistency, and flexibility. This paper proposes a CNN-based approach for automatic PCB defect detection using the YOLOv5 model. The method leverages a Convolutional Neural Network to identify various PCB defect types (e.g., open circuits, short circuits, and missing holes) from board images. In this study, a model was trained on a PCB image dataset with detailed annotations. Data augmentation techniques, such as sharpening and noise filtering, were applied to improve robustness. The experimental results showed that the proposed approach could locate and classify multiple defect types on PCBs, with overall detection precision and recall above 90% and 91%, respectively, enabling reliable automated inspection. A brief comparison with the latest YOLOv8 model is also presented, showing that the proposed CNN-based detector offers competitive performance. This study shows that deep learning-based defect detection can significantly improve PCB inspection efficiency and accuracy, paving the way for intelligent manufacturing and quality assurance in PCB production. From a sensing perspective, we frame the system around an industrial RGB camera and controlled illumination, emphasizing how imaging-sensor choices and settings shape defect visibility and model robustness, and sketching future sensor-fusion directions.
(This article belongs to the Special Issue Applications in Computer Vision and Image Processing)
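The two augmentations named in the abstract, sharpening and noise filtering, can be reproduced with stock OpenCV operations as below; the kernel and denoising parameters are illustrative choices rather than the paper's values, and the filename is a placeholder.

```python
# Sketch of sharpening and noise-filtering augmentations with OpenCV;
# parameters are illustrative, not the paper's values.
import cv2
import numpy as np

def sharpen(img: np.ndarray) -> np.ndarray:
    kernel = np.array([[0, -1, 0],
                       [-1, 5, -1],
                       [0, -1, 0]], dtype=np.float32)  # unsharp-style kernel
    return cv2.filter2D(img, -1, kernel)

def denoise(img: np.ndarray) -> np.ndarray:
    return cv2.fastNlMeansDenoisingColored(img, None, h=7, hColor=7,
                                           templateWindowSize=7,
                                           searchWindowSize=21)

pcb = cv2.imread("pcb_board.jpg")          # hypothetical input image
augmented = [pcb, sharpen(pcb), denoise(pcb)]
```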

20 pages, 15574 KB  
Article
Temporal Encoding Strategies for YOLO-Based Detection of Honeybee Trophallaxis Behavior in Precision Livestock Systems
by Gabriela Vdoviak and Tomyslav Sledevič
Agriculture 2025, 15(22), 2338; https://doi.org/10.3390/agriculture15222338 - 11 Nov 2025
Viewed by 1122
Abstract
Trophallaxis, a fundamental social behavior observed among honeybees, involves the redistribution of food and chemical signals. The automation of its detection under field-realistic conditions poses a significant challenge due to the presence of crowding, occlusions, and brief, fine-scale motions. In this study, we propose a markerless, deep learning-based approach that injects short- and mid-range temporal features into single-frame You Only Look Once (YOLO) detectors via temporal-to-RGB encodings. A new dataset for trophallaxis detection, captured under diverse illumination and density conditions, has been released. On an NVIDIA RTX 4080 graphics processing unit (GPU), temporal-to-RGB inputs consistently outperformed RGB-only baselines across YOLO families. The YOLOv8m model improved from 84.7% mean average precision (mAP50) with RGB inputs to 91.9% with stacked-grayscale encoding and to 95.5% with temporally encoded motion and averaging over a 1 s window (TEMA-1s). Similar improvements were observed for larger models, with best mAP50 values approaching 94–95%. On an NVIDIA Jetson AGX Orin embedded platform, TensorRT-optimized YOLO models sustained real-time throughput, reaching 30 frames per second (fps) for small and 23–25 fps for medium models with temporal-to-RGB inputs. The results showed that the TEMA-1s encoded YOLOv8m model has achieved the highest mAP50 of 95.5% with real-time inference on both workstation and edge hardware. These findings indicate that temporal-to-RGB encodings provide an accurate and computationally efficient solution for markerless trophallaxis detection in field-realistic conditions. This approach can be further extended to multi-behavior recognition or integration of additional sensing modalities in precision beekeeping.
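A temporal-to-RGB encoding packs motion history into the three channels a single-frame YOLO detector expects. The sketch below shows the stacked-grayscale idea and one plausible reading of the TEMA-1s encoding; the frame offsets and channel assignment are assumptions, not the paper's exact scheme.

```python
# Sketch of temporal-to-RGB encodings for a standard 3-channel detector;
# offsets, window length, and channel layout are assumptions.
import numpy as np

def stacked_grayscale(frames: np.ndarray, t: int, step: int = 5) -> np.ndarray:
    """Pack frames t-2*step, t-step, t (each H x W) into one H x W x 3 image."""
    return np.stack([frames[t - 2 * step], frames[t - step], frames[t]],
                    axis=-1)

def tema_1s(frames: np.ndarray, t: int, fps: int = 30) -> np.ndarray:
    """One plausible 3-channel encoding over a 1 s window: current frame,
    window mean, and mean absolute frame-to-frame motion."""
    win = frames[t - fps:t + 1].astype(np.float32)
    motion = np.abs(np.diff(win, axis=0)).mean(axis=0)
    return np.stack([frames[t].astype(np.float32), win.mean(axis=0), motion],
                    axis=-1).astype(np.uint8)

# frames: (T, H, W) uint8 grayscale video; either output feeds a standard
# 3-channel YOLO detector without architectural changes.
```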

28 pages, 24418 KB  
Article
PICU Face and Thoracoabdominal Detection Using Self-Supervised Divided Space–Time Mamba
by Mohamed Khalil Ben Salah, Philippe Jouvet and Rita Noumeir
Life 2025, 15(11), 1706; https://doi.org/10.3390/life15111706 - 4 Nov 2025
Viewed by 1045
Abstract
Non-contact vital sign monitoring in Pediatric Intensive Care Units is challenged by frequent occlusions, data scarcity, and the need for temporally stable anatomical tracking to extract reliable physiological signals. Traditional detectors produce unstable tracking, while video transformers are too computationally intensive for deployment on resource-limited clinical hardware. We introduce Divided Space–Time Mamba, an architecture that decouples spatial and temporal feature learning using State Space Models, achieving linear-time complexity with over 92% less computation than standard transformers. To handle data scarcity, we employ self-supervised pre-training with masked autoencoders on over 50 k domain-specific video clips and further enhance robustness with multimodal RGB-D input. Our model demonstrates superior performance, achieving 0.96 mAP@0.5, 0.62 mAP@0.5:0.95, and 0.95 rotated IoU. Operating at 23 FPS (43 ms latency), our method is approximately 1.9× faster than VideoMAE and 5.7× faster than frame-wise YOLOv8, demonstrating its suitability for real-time clinical monitoring.

20 pages, 10851 KB  
Article
Evaluating Feature-Based Homography Pipelines for Dual-Camera Registration in Acupoint Annotation
by Thathsara Nanayakkara, Hadi Sedigh Malekroodi, Jaeuk Sul, Chang-Su Na, Myunggi Yi and Byeong-il Lee
J. Imaging 2025, 11(11), 388; https://doi.org/10.3390/jimaging11110388 - 1 Nov 2025
Viewed by 1140
Abstract
Reliable acupoint localization is essential for developing artificial intelligence (AI) and extended reality (XR) tools in traditional Korean medicine; however, conventional annotation of 2D images often suffers from inter- and intra-annotator variability. This study presents a low-cost dual-camera imaging system that fuses infrared (IR) and RGB views on a Raspberry Pi 5 platform, incorporating an IR ink pen in conjunction with a 780 nm emitter array to standardize point visibility. Among the tested marking materials, the IR ink showed the highest contrast and visibility under IR illumination, making it the most suitable for acupoint detection. Five feature detectors (SIFT, ORB, KAZE, AKAZE, and BRISK) were evaluated with two matchers (FLANN and BF) to construct representative homography pipelines. Comparative evaluations across multiple camera-to-surface distances revealed that KAZE + FLANN achieved the lowest mean 2D error (1.17 ± 0.70 px) and the lowest mean aspect-aware error (0.08 ± 0.05%) while remaining computationally feasible on the Raspberry Pi 5. In hand-image experiments across multiple postures, the dual-camera registration maintained a mean 2D error below ~3 px and a mean aspect-aware error below ~0.25%, confirming stable and reproducible performance. The proposed framework provides a practical foundation for generating high-quality acupoint datasets, supporting future AI-based localization, XR integration, and automated acupuncture-education systems.
(This article belongs to the Section Computer Vision and Pattern Recognition)
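The best pipeline reported above, KAZE features matched with FLANN, is a standard OpenCV recipe. The sketch below estimates a RANSAC homography from the IR view to the RGB view and maps a marked point across; the filenames, ratio-test threshold, and FLANN parameters are placeholders, not the paper's settings.

```python
# Sketch of a KAZE + FLANN homography pipeline with OpenCV: detect and
# describe, ratio-test match, then RANSAC homography IR -> RGB.
import cv2
import numpy as np

ir = cv2.imread("ir_view.png", cv2.IMREAD_GRAYSCALE)    # placeholder files
rgb = cv2.imread("rgb_view.png", cv2.IMREAD_GRAYSCALE)

kaze = cv2.KAZE_create()
kp1, des1 = kaze.detectAndCompute(ir, None)
kp2, des2 = kaze.detectAndCompute(rgb, None)

# FLANN with a KD-tree index (KAZE descriptors are floating point).
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
matches = flann.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]  # Lowe ratio

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# H maps an acupoint marked in the IR image into RGB coordinates:
pt = cv2.perspectiveTransform(np.float32([[[100, 200]]]), H)
```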

40 pages, 4610 KB  
Article
Semantic Priority Navigation for Energy-Aware Mining Robots
by Claudio Urrea, Kevin Valencia-Aragón and John Kern
Systems 2025, 13(9), 799; https://doi.org/10.3390/systems13090799 - 11 Sep 2025
Cited by 2 | Viewed by 1871
Abstract
Autonomous navigation in subterranean mines is hindered by deformable terrain, dust-laden visibility, and densely packed, safety-critical machinery. We propose a systems-oriented navigation framework that embeds semantic priorities into reactive planning for energy-aware autonomy in Robot Operating System (ROS). A lightweight Convolutional Neural Network (CNN) detector fuses RGB-D and LiDAR data to classify obstacles like humans, haul trucks, and debris, writing risk-weighted virtual LaserScans to the local planner so obstacles are evaluated by relevance rather than geometry. By integrating class-specific inflation layers in costmaps within a cyber–physical systems architecture, the system ensures ISO-compliant separation without sacrificing throughput. In Gazebo experiments with three obstacle classes and 60 runs, high-risk clearance increased by 34%, collisions dropped to zero, mission time remained statistically unchanged, and estimated kinematic effort increased by 6% relative to a geometry-only baseline. These results demonstrate effective systems integration and a favorable safety–efficiency trade-off in industrial cyber–physical environments, providing a reproducible reference for scalable deployment in real-world unstructured mining environments.
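The risk-weighted virtual LaserScan idea can be sketched as follows: detected obstacles are written into a scan-like range array whose effective ranges shrink in proportion to class risk, so the reactive planner keeps larger margins around riskier classes. The risk weights, scaling rule, and message layout below are assumptions, not the paper's ROS implementation.

```python
# Sketch of a risk-weighted virtual LaserScan: riskier obstacle classes are
# reported closer than they are, inflating the planner's clearance margins.
import numpy as np

RISK = {"human": 1.0, "haul_truck": 0.7, "debris": 0.3}  # assumed weights

def virtual_scan(detections, n_beams=360, max_range=10.0):
    """detections: list of (class_name, bearing_rad, range_m)."""
    ranges = np.full(n_beams, max_range)
    for cls, bearing, rng in detections:
        virtual_rng = rng * (1.0 - 0.5 * RISK[cls])   # assumed scaling rule
        beam = int((bearing % (2 * np.pi)) / (2 * np.pi) * n_beams)
        ranges[beam] = min(ranges[beam], virtual_rng)
    return ranges   # would fill the `ranges` field of a sensor_msgs/LaserScan

scan = virtual_scan([("human", 0.1, 4.0), ("debris", 1.2, 2.5)])
```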