MDPI - Publisher of Open Access Journals

39 pages, 51651 KB

Open Access Article

SMG-UAV: Sparse Mutual Guided RGB–Event Fusion for Robust UAV Detection in Challenging Dynamic Environments

by Ruizhi Zhang, Jinghua Hou, Yan Shi, Xiping Dai, Ke Zhang and Jingjing Diao

Drones 2026, 10(7), 486; https://doi.org/10.3390/drones10070486 (registering DOI) - 25 Jun 2026

Robust unmanned aerial vehicle (UAV) detection in real low-altitude anti-UAV scenarios remains challenging due to motion blur, extreme illumination, cluttered backgrounds, and tiny target sizes. Most existing UAV detectors rely on RGB imagery, but their performance often degrades severely under these adverse conditions. Event cameras, as a neuromorphic sensing modality, capture motion-sensitive responses with high temporal resolution and thus provide complementary cues for robust UAV detection. However, existing RGB–event fusion detectors usually employ homogeneous feature extraction and generic fusion mechanisms, which are insufficient to handle heterogeneous modality degradation and exploit reliable cross-modal cues. To address this limitation, we propose SMG-UAV, a sparse mutual guided RGB–event fusion network for robust small-UAV detection. The proposed method integrates a hybrid dual-branch backbone for modality-specific representation learning, a Sparse Mutual Guided Bridge for bidirectional sparse cross-modal refinement, and a Selective Gated Pyramid Neck for multiscale enhancement of weak UAV responses. Experiments on the Florence RGB-Event Drone Dataset (FRED) and the Neuromorphic-RGB Drone Detection Dataset (NeRDD) demonstrate that SMG-UAV achieves state-of-the-art performance, outperforming the strongest competing method by an average of 5.2 points in $\mathrm{AP}_{50}$, while delivering stronger robustness under multiple challenging anti-UAV conditions. Full article

(This article belongs to the Special Issue Detection, Identification and Tracking of UAVs and Drones: 2nd Edition)

49 pages, 2508 KB

Open Access Review

Sensing the Action: Rethinking Sensor Modalities and Multi-Modal Fusion in Vision–Language–Action Models for Robotic Manipulation

by Byoung Chul Ko

Sensors 2026, 26(11), 3541; https://doi.org/10.3390/s26113541 - 3 Jun 2026

Viewed by 625

Abstract

Recent Vision–Language–Action (VLA) models have rapidly emerged as general-purpose robotic policies that integrate language understanding, visual perception, and robot control. However, prior studies and surveys have primarily emphasized backbone architectures, action decoders, training recipes, and benchmark performance, whereas relatively limited systematic attention has been given to sensor modality selection, heterogeneous signal alignment and fusion, and their connection to action generation, all of which are critical to the performance and safety of real-world robotic manipulation. This survey addresses this gap by reinterpreting VLA within the framework of a sensor–fusion–action pipeline. This study first presents a systematic taxonomy of major sensor modalities, including RGB, depth, tactile sensing, force/torque, proprioception and inertial measurement unit, multi-spectral/thermal, and event-based vision, and compares them in terms of the physical information they provide, their characteristic failure modes, and their deployment constraints. This survey further reviews teleoperation-, human video-, and simulation-based data collection pipelines, together with representative dataset configurations, and analyzes the multi-modal design space from a sensor-centric perspective, including early and late fusion, cross-attention, token-level fusion, adapters, mixture of experts, and multi-rate action representations. In addition, this study identifies a strong bias in existing benchmarks toward RGB-centric inputs and single success-rate metrics and emphasizes the need for a multidimensional evaluation framework incorporating robustness, worst-case performance, safety, latency, and efficiency. By shifting the focus away from a model-centric narrative and explicitly accounting for real-world sensor complexity, this survey seeks to establish a sensor-centered foundation for the next generation of Physical AI. Full article

(This article belongs to the Special Issue Feature Review Papers in Sensors and Robotics)

27 pages, 4914 KB

Open Access Article

A Viewpoint on Event-Driven Perception and Digital Twin Integration for Autonomous Mining Robotics

by Vasiliki Balaska and Antonios Gasteratos

Electronics 2026, 15(10), 1993; https://doi.org/10.3390/electronics15101993 - 8 May 2026

Viewed by 405

Abstract

Robotic systems are increasingly being deployed in mining operations to support tasks such as inspection, navigation, environmental monitoring, and safety supervision. However, mining environments present significant challenges for robotic perception due to dynamic terrain conditions, poor illumination, airborne dust, and frequent disturbances caused by excavation and heavy machinery. Conventional frame-based vision systems often struggle under these conditions due to motion blur, latency, and limited dynamic range. This study proposes a system-level conceptual framework for integrating event-based sensing into robotic mining systems in order to support perception in highly dynamic and safety-critical environments, with the aim of improving responsiveness and robustness under such conditions. Event-based cameras, inspired by biological vision, asynchronously detect brightness changes at the pixel level and provide microsecond temporal resolution with high dynamic range and low latency. The proposed framework combines event cameras with complementary sensing modalities including LiDAR, inertial measurement units, and RGB cameras to form a multi-sensor perception architecture. The framework is structured into multiple functional layers encompassing environmental sensing, event-driven perception, sensor fusion and AI processing, digital twin integration, and autonomous decision-making. Potential application scenarios including robotic tunnel inspection, autonomous navigation of mining robots, hazard detection, multi-agent cooperation in mining sites, and real-time digital twin updating are also discussed. The proposed framework provides a unified system-level reference architecture intended to guide future implementation and validation. Full article

(This article belongs to the Special Issue Next-Generation Robotic Intelligence: Active Perception, Adaptive Collaboration and Cross-Domain Digital Twins)

28 pages, 2836 KB

Open Access Feature Paper Article

MA-EVIO: A Motion-Aware Approach to Event-Based Visual–Inertial Odometry

by Mohsen Shahraki, Ahmed Elamin and Ahmed El-Rabbany

Sensors 2025, 25(23), 7381; https://doi.org/10.3390/s25237381 - 4 Dec 2025

Cited by 1 | Viewed by 1391

Abstract

Indoor localization remains a challenging task due to the unavailability of reliable global navigation satellite system (GNSS) signals in most indoor environments. One way to overcome this challenge is through visual–inertial odometry (VIO), which enables real-time pose estimation by fusing camera and inertial measurements. However, VIO suffers from performance degradation under high-speed motion and in poorly lit environments. In such scenarios, motion blur, sensor noise, and low temporal resolution reduce the accuracy and robustness of the estimated trajectory. To address these limitations, we propose a motion-aware event-based VIO (MA-EVIO) system that adaptively fuses asynchronous event data, frame-based imagery, and inertial measurements for robust and accurate pose estimation. MA-EVIO employs a hybrid tracking strategy combining sparse feature matching and direct photometric alignment. A key innovation is its motion-aware keyframe selection, which dynamically adjusts tracking parameters based on real-time motion classification and feature quality. This motion awareness also enables adaptive sensor fusion: during fast motion, the system prioritizes event data, while under slow or stable motion, it relies more on RGB frames and feature-based tracking. Experimental results on the DAVIS240c and VECtor benchmarks demonstrate that MA-EVIO outperforms state-of-the-art methods, achieving a lower mean position error (MPE) of 0.19 on DAVIS240c compared to 0.21 (EVI-SAM) and 0.24 (PL-EVIO), and superior performance on VECtor with MPE/mean rotation error (MRE) of 1.19%/1.28 deg/m versus 1.27%/1.42 deg/m (EVI-SAM) and 1.93%/1.56 deg/m (PL-EVIO). These results validate the effectiveness of MA-EVIO in challenging dynamic indoor environments. Full article

(This article belongs to the Special Issue Multi-Sensor Integration for Mobile and UAS Mapping)

21 pages, 1686 KB

Open Access Article

Sparse-Gated RGB-Event Fusion for Small Object Detection in the Wild

by Yangsi Shi, Miao Li, Nuo Chen, Yihang Luo, Shiman He and Wei An

Remote Sens. 2025, 17(17), 3112; https://doi.org/10.3390/rs17173112 - 6 Sep 2025

Cited by 2 | Viewed by 4071

Abstract

Detecting small moving objects under challenging lighting conditions, such as overexposure and underexposure, remains a critical challenge in computer vision applications including surveillance, autonomous driving, and anti-UAV systems. Traditional RGB-based detectors often suffer from degraded object visibility and highly dynamic illumination, leading to suboptimal performance. To address these limitations, we propose a novel RGB-Event fusion framework that leverages the complementary strengths of RGB and event modalities for enhanced small object detection. Specifically, we introduce a Temporal Multi-Scale Attention Fusion (TMAF) module to encode motion cues from event streams at multiple temporal scales, thereby enhancing the saliency of small object features. Furthermore, we design a Sparse Noisy Gated Attention Fusion (SNGAF) module, inspired by the mixture-of-experts paradigm, which employs a sparse gating mechanism to adaptively combine multiple fusion experts based on input characteristics, enabling flexible and robust RGB-Event feature integration. Additionally, we present RGBE-UAV, which is a new RGB-Event dataset tailored for small moving object detection under diverse exposure conditions. Extensive experiments on our RGBE-UAV and public DSEC-MOD datasets demonstrate that our method outperforms existing state-of-the-art RGB-Event fusion approaches, validating its effectiveness and generalization under complex lighting conditions. Full article
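
The sparse gating idea named in this abstract can be illustrated with a generic noisy top-k gate over fusion experts. This is a minimal sketch under stated assumptions, not the SNGAF module itself: the function names `sparse_gate` and `fuse`, the fixed Gaussian noise level, and the toy expert outputs are all hypothetical, and the abstract does not specify how the gate is actually computed.

```python
import math
import random


def sparse_gate(logits, k, noise_std=0.0, rng=None):
    """Return gate weights that are non-zero only for the top-k experts.

    Assumption: Gaussian noise with a fixed standard deviation is added to
    the gate logits before selection; real mixture-of-experts gates usually
    learn the noise scale.
    """
    rng = rng or random.Random(0)
    noisy = [z + rng.gauss(0.0, noise_std) for z in logits]
    # Indices of the k largest noisy logits.
    top = sorted(range(len(noisy)), key=lambda i: noisy[i], reverse=True)[:k]
    # Softmax restricted to the selected experts (shifted for stability).
    peak = max(noisy[i] for i in top)
    exp = {i: math.exp(noisy[i] - peak) for i in top}
    total = sum(exp.values())
    return [exp[i] / total if i in exp else 0.0 for i in range(len(noisy))]


def fuse(expert_outputs, weights):
    """Weighted sum of per-expert feature vectors."""
    dim = len(expert_outputs[0])
    return [
        sum(w * out[d] for w, out in zip(weights, expert_outputs))
        for d in range(dim)
    ]


# Three hypothetical fusion experts, each producing a 2-D feature vector.
expert_outputs = [[1.0, 0.0], [5.0, 5.0], [0.0, 1.0]]
weights = sparse_gate([2.0, 0.5, 1.0], k=2)  # expert 1 is gated off
fused = fuse(expert_outputs, weights)
```

With `k=2` only two experts contribute to the fused feature, which is the property that keeps the combination sparse.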

(This article belongs to the Special Issue Artificial Intelligence-Based Sensor Data Processing for Remote Sensing)

37 pages, 55522 KB

Open Access Article

EPCNet: Implementing an ‘Artificial Fovea’ for More Efficient Monitoring Using the Sensor Fusion of an Event-Based and a Frame-Based Camera

by Orla Sealy Phelan, Dara Molloy, Roshan George, Edward Jones, Martin Glavin and Brian Deegan

Sensors 2025, 25(15), 4540; https://doi.org/10.3390/s25154540 - 22 Jul 2025

Cited by 2 | Viewed by 1514

Abstract

Efficient object detection is crucial to real-time monitoring applications such as autonomous driving or security systems. Modern RGB cameras can produce high-resolution images for accurate object detection. However, increased resolution results in increased network latency and power consumption. To minimise this latency, Convolutional Neural Networks (CNNs) often have a resolution limitation, requiring images to be down-sampled before inference, causing significant information loss. Event-based cameras are neuromorphic vision sensors with high temporal resolution, low power consumption, and high dynamic range, making them preferable to regular RGB cameras in many situations. This project proposes the fusion of an event-based camera with an RGB camera to mitigate the trade-off between temporal resolution and accuracy, while minimising power consumption. The cameras are calibrated to create a multi-modal stereo vision system where pixel coordinates can be projected between the event and RGB camera image planes. This calibration is used to project bounding boxes detected by clustering of events into the RGB image plane, thereby cropping each RGB frame instead of down-sampling to meet the requirements of the CNN. Using the Common Objects in Context (COCO) dataset evaluator, the average precision (AP) for the bicycle class in RGB scenes improved from 21.08 to 57.38. Additionally, AP increased across all classes from 37.93 to 46.89. To reduce system latency, a novel object detection approach is proposed where the event camera acts as a region proposal network, and a classification algorithm is run on the proposed regions. This achieved a 78% improvement over baseline. Full article
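
The step of projecting an event-plane bounding box into the RGB image plane and cropping there can be sketched as follows. The sketch assumes that a single 3x3 homography `H` approximates the mapping between the two image planes, which holds only for roughly planar or distant scenes; the abstract describes a calibrated stereo system and does not state this simplification. The matrix values, box coordinates, and function names are hypothetical.

```python
def project_point(H, x, y):
    """Map a pixel (x, y) through a 3x3 homography given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def clamp(value, upper):
    """Round to the nearest pixel and clip to the range [0, upper]."""
    return max(0, min(upper, int(round(value))))


def project_box(H, box, rgb_size):
    """Project an (x0, y0, x1, y1) box and return its axis-aligned hull."""
    x0, y0, x1, y1 = box
    corners = [
        project_point(H, x, y)
        for x, y in ((x0, y0), (x1, y0), (x1, y1), (x0, y1))
    ]
    us = [c[0] for c in corners]
    vs = [c[1] for c in corners]
    width, height = rgb_size
    return (
        clamp(min(us), width),
        clamp(min(vs), height),
        clamp(max(us), width),
        clamp(max(vs), height),
    )


# Hypothetical calibration: 4x scale plus an offset between the two sensors.
H = [[4.0, 0.0, 10.0], [0.0, 4.0, 20.0], [0.0, 0.0, 1.0]]
crop = project_box(H, box=(10, 5, 30, 25), rgb_size=(1920, 1080))
# crop == (50, 40, 130, 120); the RGB frame would be cropped to this region
# instead of being down-sampled as a whole.
```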

(This article belongs to the Section Sensing and Imaging)

32 pages, 2740 KB

Open Access Article

Vision-Based Navigation and Perception for Autonomous Robots: Sensors, SLAM, Control Strategies, and Cross-Domain Applications—A Review

by Eder A. Rodríguez-Martínez, Wendy Flores-Fuentes, Farouk Achakir, Oleg Sergiyenko and Fabian N. Murrieta-Rico

Eng 2025, 6(7), 153; https://doi.org/10.3390/eng6070153 - 7 Jul 2025

Cited by 26 | Viewed by 18206

Abstract

Camera-centric perception has matured into a cornerstone of modern autonomy, from self-driving cars and factory cobots to underwater and planetary exploration. This review synthesizes more than a decade of progress in vision-based robotic navigation through an engineering lens, charting the full pipeline from sensing to deployment. We first examine the expanding sensor palette—monocular and multi-camera rigs, stereo and RGB-D devices, LiDAR–camera hybrids, event cameras, and infrared systems—highlighting the complementary operating envelopes and the rise of learning-based depth inference. The advances in visual localization and mapping are then analyzed, contrasting sparse and dense SLAM approaches, as well as monocular, stereo, and visual–inertial formulations. Additional topics include loop closure, semantic mapping, and LiDAR–visual–inertial fusion, which enables drift-free operation in dynamic environments. Building on these foundations, we review the navigation and control strategies, spanning classical planning, reinforcement and imitation learning, hybrid topological–metric memories, and emerging visual language guidance. Application case studies—autonomous driving, industrial manipulation, autonomous underwater vehicles, planetary rovers, aerial drones, and humanoids—demonstrate how tailored sensor suites and algorithms meet domain-specific constraints. Finally, the future research trajectories are distilled: generative AI for synthetic training data and scene completion; high-density 3D perception with solid-state LiDAR and neural implicit representations; event-based vision for ultra-fast control; and human-centric autonomy in next-generation robots. By providing a unified taxonomy, a comparative analysis, and engineering guidelines, this review aims to inform researchers and practitioners designing robust, scalable, vision-driven robotic systems. Full article

(This article belongs to the Special Issue Interdisciplinary Insights in Engineering Research)

25 pages, 2723 KB

Open Access Article

A Human-Centric, Uncertainty-Aware Event-Fused AI Network for Robust Face Recognition in Adverse Conditions

by Akmalbek Abdusalomov, Sabina Umirzakova, Elbek Boymatov, Dilnoza Zaripova, Shukhrat Kamalov, Zavqiddin Temirov, Wonjun Jeong, Hyoungsun Choi and Taeg Keun Whangbo

Appl. Sci. 2025, 15(13), 7381; https://doi.org/10.3390/app15137381 - 30 Jun 2025

Cited by 7 | Viewed by 1845

Abstract

Face recognition systems often falter when deployed in uncontrolled settings, grappling with low light, unexpected occlusions, motion blur, and the degradation of sensor signals. Most contemporary algorithms chase raw accuracy yet overlook the pragmatic need for uncertainty estimation and multispectral reasoning rolled into a single framework. This study introduces HUE-Net—a Human-centric, Uncertainty-aware, Event-fused Network—designed specifically to thrive under severe environmental stress. HUE-Net marries the visible RGB band with near-infrared (NIR) imagery and high-temporal-event data through an early-fusion pipeline, proven more responsive than serial approaches. A custom hybrid backbone that couples convolutional networks with transformers keeps the model nimble enough for edge devices. Central to the architecture is the perturbed multi-branch variational module, which distills probabilistic identity embeddings while delivering calibrated confidence scores. Complementing this, an Adaptive Spectral Attention mechanism dynamically reweights each stream to amplify the most reliable facial features in real time. Unlike previous efforts that compartmentalize uncertainty handling, spectral blending, or computational thrift, HUE-Net unites all three in a lightweight package. Benchmarks on the IJB-C and N-SpectralFace datasets illustrate that the system not only secures state-of-the-art accuracy but also exhibits unmatched spectral robustness and reliable probability calibration. The results indicate that HUE-Net is well-positioned for forensic missions and humanitarian scenarios where trustworthy identification cannot be deferred. Full article

(This article belongs to the Special Issue New Technologies and Applications of Visual-Based Human-Computer Interactions)

73 pages, 2833 KB

Open Access Article

A Comprehensive Methodological Survey of Human Activity Recognition Across Diverse Data Modalities

by Jungpil Shin, Najmul Hassan, Abu Saleh Musa Miah and Satoshi Nishimura

Sensors 2025, 25(13), 4028; https://doi.org/10.3390/s25134028 - 27 Jun 2025

Cited by 19 | Viewed by 7970

Abstract

Human Activity Recognition (HAR) systems aim to understand human behavior and assign a label to each action, attracting significant attention in computer vision due to their wide range of applications. HAR can leverage various data modalities, such as RGB images and video, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, and radar signals. Each modality provides unique and complementary information suited to different application scenarios. Consequently, numerous studies have investigated diverse approaches for HAR using these modalities. This survey includes only peer-reviewed research papers published in English to ensure linguistic consistency and academic integrity. This paper presents a comprehensive survey of the latest advancements in HAR from 2014 to 2025, focusing on Machine Learning (ML) and Deep Learning (DL) approaches categorized by input data modalities. We review both single-modality and multi-modality techniques, highlighting fusion-based and co-learning frameworks. Additionally, we cover advancements in hand-crafted action features, methods for recognizing human–object interactions, and activity detection. Our survey includes a detailed dataset description for each modality, as well as a summary of the latest HAR systems, accompanied by a mathematical derivation for evaluating the deep learning model for each modality, and it also provides comparative results on benchmark datasets. Finally, we provide insightful observations and propose effective future research directions in HAR. Full article

(This article belongs to the Special Issue Computer Vision and Sensors-Based Application for Intelligent Systems)

22 pages, 23754 KB

Open Access Article

A Low-Latency Dynamic Object Detection Algorithm Fusing Depth and Events

by Duowen Chen, Liqi Zhou and Chi Guo

Drones 2025, 9(3), 211; https://doi.org/10.3390/drones9030211 - 15 Mar 2025

Cited by 1 | Viewed by 2180

Abstract

Existing RGB image-based object detection methods achieve high accuracy when objects are static or in quasi-static conditions but demonstrate degraded performance with fast-moving objects due to motion blur artifacts. Moreover, state-of-the-art deep learning methods, which rely on RGB images as input, necessitate training and inference on high-performance graphics cards. These cards are not only bulky and power-hungry but also challenging to deploy on compact robotic platforms. Fortunately, the emergence of event cameras, inspired by biological vision, provides a promising solution to these limitations. These cameras offer low latency, minimal motion blur, and non-redundant outputs, making them well suited for dynamic obstacle detection. Building on these advantages, a novel methodology was developed through the fusion of events with depth to address the challenge of dynamic object detection. Initially, an adaptive temporal sampling window was implemented to selectively acquire event data and supplementary information, contingent upon the presence of objects within the visual field. Subsequently, a warping transformation was applied to the event data, effectively eliminating artifacts induced by ego-motion while preserving signals originating from moving objects. Following this preprocessing stage, the transformed event data were converted into an event queue representation, upon which denoising operations were performed. Ultimately, object detection was achieved through the application of image moment analysis to the processed event queue representation. The experimental results show that, compared with the current state-of-the-art methods, the proposed method has improved the detection speed by approximately 20% and the accuracy by approximately 5%. 
To substantiate real-world applicability, the authors implemented a complete obstacle avoidance pipeline, integrating the detector with planning modules and deploying it on a custom-built quadrotor platform. Field tests confirm reliable avoidance of an obstacle approaching at approximately 8 m/s, validating the method's practical deployment potential. Full article
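
Detection by image moment analysis, the final step named in this abstract, can be sketched with the zeroth and first raw moments of an event-count image. This is a minimal illustration under stated assumptions: it treats the processed event representation as a 2-D grid of non-negative counts containing a single object, and the function name `centroid_and_box` and the toy grid are hypothetical rather than taken from the paper.

```python
def centroid_and_box(image):
    """Locate one object in a 2-D grid of non-negative event counts.

    Uses the raw moments m00 (total mass), m10 and m01 (first moments in
    x and y); the centroid is (m10 / m00, m01 / m00). The bounding box is
    the extent of all non-zero pixels.
    """
    m00 = m10 = m01 = 0.0
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value > 0:
                m00 += value
                m10 += x * value
                m01 += y * value
                xs.append(x)
                ys.append(y)
    if m00 == 0:
        return None  # no events survived denoising, so nothing to detect
    centroid = (m10 / m00, m01 / m00)
    box = (min(xs), min(ys), max(xs), max(ys))
    return centroid, box


# Toy event-count image with a 2x2 blob of activity.
event_image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
detection = centroid_and_box(event_image)
# detection == ((1.5, 1.5), (1, 1, 2, 2))
```

Because the moments need only one pass over the pixels, this style of detector avoids a neural network forward pass, which is consistent with the low-latency goal stated in the abstract.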

(This article belongs to the Topic Unmanned Vehicles Technology and Embodied Intelligence Systems for Intelligent Transportation)

19 pages, 3089 KB

Open Access Article

Efficient Spiking Neural Network for RGB–Event Fusion-Based Object Detection

by Liangwei Fan, Jingjun Yang, Lei Wang, Jinpu Zhang, Xiangkai Lian and Hui Shen

Electronics 2025, 14(6), 1105; https://doi.org/10.3390/electronics14061105 - 11 Mar 2025

Cited by 7 | Viewed by 4445

Abstract

Robust object detection in challenging scenarios remains a critical challenge for autonomous driving systems. Inspired by human visual perception, integrating the complementary modalities of RGB frames and event streams presents a promising approach to achieving robust object detection. However, existing multimodal object detectors achieve superior performance at the cost of significant computational power consumption. To address this challenge, we propose a novel spiking RGB–event fusion-based detection network (SFDNet), a fully spiking object detector capable of achieving both low-power and high-performance object detection. Specifically, we first introduce the Leaky Integrate-and-Multi-Fire (LIMF) neuron model, which combines soft and hard reset mechanisms to enhance feature representation in SNNs. We then develop a multi-scale hierarchical spiking residual attention network and a lightweight spiking aggregation module for efficient dual-modality feature extraction and fusion. Experimental results on two public multimodal object detection datasets demonstrate that our SFDNet achieves state-of-the-art performance with remarkably low power consumption. The superior performance in challenging scenarios, such as motion blur and low-light conditions, highlights the robustness and effectiveness of SFDNet, significantly advancing the applicability of SNNs for real-world object detection tasks. Full article
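
The difference between the soft and hard reset mechanisms mentioned in this abstract can be shown with a standard leaky integrate-and-fire (LIF) neuron. This sketch is not the LIMF model, whose equations the abstract does not give; the function name `lif_run`, the leak factor, and the threshold are hypothetical values chosen for illustration.

```python
def lif_run(inputs, threshold=1.0, leak=0.9, reset="hard"):
    """Simulate a discrete-time leaky integrate-and-fire neuron.

    reset="hard": the membrane potential returns to zero after a spike.
    reset="soft": the threshold is subtracted, so any surplus potential
    carries over to the next time step.
    """
    potential = 0.0
    spikes = []
    for current in inputs:
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)
            if reset == "hard":
                potential = 0.0
            else:
                potential -= threshold
        else:
            spikes.append(0)
    return spikes


inputs = [1.5, 0.6, 0.0]
hard_spikes = lif_run(inputs, reset="hard")  # [1, 0, 0]
soft_spikes = lif_run(inputs, reset="soft")  # [1, 1, 0]
```

With the same input, the soft reset keeps the 0.5 surplus from the first step and fires again on the second, while the hard reset discards it.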

(This article belongs to the Topic State-of-the-Art Object Detection, Tracking, and Recognition Techniques)

7 pages, 3886 KB

Open Access Proceeding Paper

Event/Visual/IMU Integration for UAV-Based Indoor Navigation

by Ahmed Elamin and Ahmed El-Rabbany

Proceedings 2024, 110(1), 2; https://doi.org/10.3390/proceedings2024110002 - 2 Dec 2024

Viewed by 2485

Abstract

Unmanned aerial vehicle (UAV) navigation in indoor environments is challenging due to varying light conditions, the dynamic clutter typical of indoor spaces, and the absence of GNSS signals. In response to these complexities, emerging sensors, such as event cameras, demonstrate significant potential in indoor navigation with their low latency and high dynamic range characteristics. Unlike traditional RGB cameras, event cameras mitigate motion blur and operate effectively in low-light conditions. Nevertheless, they exhibit limitations in terms of information output during scenarios of limited motion, in contrast to standard cameras that can capture detailed surroundings. This study proposes a novel event-based visual–inertial odometry approach for precise indoor navigation. In the proposed approach, the standard images are leveraged for feature detection and tracking, while events are aggregated into frames to track features between consecutive standard frames. The fusion of IMU measurements and feature tracks facilitates the continuous estimation of sensor states. The proposed approach is evaluated and validated using a controlled office environment simulation developed using Gazebo, employing a P230 simulated drone equipped with an event camera, an RGB camera, and IMU sensors. This simulated environment provides a testbed for evaluating and showcasing the proposed approach’s robust performance in realistic indoor navigation scenarios. Full article
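
Aggregating events into frames, as described in this abstract, can be sketched by summing event polarities per pixel over a time window. The event tuple layout `(x, y, timestamp, polarity)`, the window bounds, and the function name `accumulate_events` are assumptions made for illustration; the abstract does not specify the representation actually used.

```python
from typing import Iterable, List, Tuple

# Assumed event layout: (x, y, timestamp in seconds, polarity of -1 or +1).
Event = Tuple[int, int, float, int]


def accumulate_events(
    events: Iterable[Event],
    width: int,
    height: int,
    t_start: float,
    t_end: float,
) -> List[List[int]]:
    """Sum event polarities per pixel over the window [t_start, t_end)."""
    frame = [[0] * width for _ in range(height)]
    for x, y, t, polarity in events:
        inside_window = t_start <= t < t_end
        inside_sensor = 0 <= x < width and 0 <= y < height
        if inside_window and inside_sensor:
            frame[y][x] += polarity
    return frame


events = [
    (0, 0, 0.001, 1),
    (0, 0, 0.002, 1),
    (1, 1, 0.003, -1),
    (2, 0, 0.020, 1),  # falls outside the 10 ms window below
]
frame = accumulate_events(events, width=3, height=2, t_start=0.0, t_end=0.010)
# frame == [[2, 0, 0], [0, -1, 0]]
```

Choosing the window as the interval between two consecutive standard frames would give one event frame per inter-frame gap, which matches the tracking role the abstract assigns to events.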

(This article belongs to the Proceedings of The 31st International Conference on Geoinformatics)

21 pages, 5375 KB

Open Access Article

PII-GCNet: Lightweight Multi-Modal CNN Network for Efficient Crowd Counting and Localization in UAV RGB-T Images

by Zuodong Niu, Huilong Pi, Donglin Jing and Dazheng Liu

Electronics 2024, 13(21), 4298; https://doi.org/10.3390/electronics13214298 - 31 Oct 2024

Cited by 4 | Viewed by 2317

Abstract

With the increasing need for real-time crowd evaluation in military surveillance, public safety, and event crowd management, crowd counting using unmanned aerial vehicle (UAV) captured images has emerged as an essential research topic. While conventional RGB-based methods have achieved significant success, their performance is severely hampered in low-light environments due to poor visibility. Integrating thermal infrared (TIR) images can address this issue, but existing RGB-T crowd counting networks, which employ multi-stream architectures, tend to introduce computational redundancy and excessive parameters, rendering them impractical for UAV applications constrained by limited onboard resources. To overcome these challenges, this research introduces an innovative, compact RGB-T framework designed to minimize redundant feature processing and improve multi-modal representation. The proposed approach introduces a Partial Information Interaction Convolution (PIIConv) module to selectively minimize redundant feature computations and a Global Collaborative Fusion (GCFusion) module to improve multi-modal feature representation through spatial attention mechanisms. Empirical findings indicate that the introduced network attains competitive results on the DroneRGBT dataset while significantly reducing floating-point operations (FLOPs) and improving inference speed across various computing platforms. This study’s significance is in providing a computationally efficient framework for RGB-T crowd counting that balances accuracy and resource efficiency, making it ideal for real-time UAV deployment. Full article

(This article belongs to the Special Issue Image Processing Based on Convolution Neural Network)

17 pages, 8954 KB

Open AccessArticle

Accurate Physical Activity Recognition using Multidimensional Features and Markov Model for Smart Health Fitness

by Amir Nadeem, Ahmad Jalal and Kibum Kim

Symmetry 2020, 12(11), 1766; https://doi.org/10.3390/sym12111766 - 24 Oct 2020

Cited by 96 | Viewed by 4085

Abstract

Recent developments in sensor technologies enable physical activity recognition (PAR) as an essential tool for smart health monitoring and for fitness exercises. For efficient PAR, model representation and training are significant factors contributing to the ultimate success of recognition systems, because body parts and physical activities cannot be accurately detected and distinguished if the system is not well trained. This paper provides a unified framework that explores multidimensional features with the help of a fusion of body part models and quadratic discriminant analysis, which uses these features for markerless human pose estimation. Multilevel features are extracted as displacement parameters to work as spatiotemporal properties. These properties represent the respective positions of the body parts with respect to time. Finally, these features are processed by a maximum entropy Markov model as a recognition engine based on transition and emission probability values. Experimental results demonstrate that the proposed model produces more accurate results than the state-of-the-art methods for both body part detection and physical activity recognition. The accuracy of the proposed method for body part detection is 90.91% on the University of Central Florida (UCF) sports action dataset; for activity recognition on the UCF YouTube action dataset and the IM-DailyRGBEvents dataset, accuracy is 89.09% and 88.26%, respectively.

17 pages, 7441 KB

Open AccessArticle

Active Eye-in-Hand Data Management to Improve the Robotic Object Detection Performance

by Pourya Hoseini, Janelle Blankenburg, Mircea Nicolescu, Monica Nicolescu and David Feil-Seifer

Computers 2019, 8(4), 71; https://doi.org/10.3390/computers8040071 - 23 Sep 2019

Cited by 4 | Viewed by 6039

Abstract

Increasing the number of sources of sensory information can enhance the object detection capability of robots. In vision-based object detection, observing objects of interest from different points of view not only improves general detection performance but can also be central to handling occlusions. In this paper, a robotic vision system is proposed that constantly uses a 3D camera, while actively switching to make use of a second RGB camera in cases where it is necessary. The proposed system detects objects in the view seen by the 3D camera, which is mounted on a humanoid robot’s head, and computes a confidence measure for its recognitions. In the event of low confidence regarding the correctness of the detection, the secondary camera, which is installed on the robot’s arm, is moved toward the object to obtain another perspective of the object. The objects detected in the scene viewed by the hand camera are matched to the detections of the head camera, and their recognition decisions are then fused together. The decision fusion method is a novel approach based on the Dempster–Shafer evidence theory. Significant improvements in object detection performance are observed after employing the proposed active vision system.

(This article belongs to the Special Issue Vision, Image and Signal Processing (ICVISP))
