AI-Based Object Detection and Tracking in UAVs: Challenges and Research Directions—2nd Edition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (31 January 2026) | Viewed by 9634

Special Issue Editor

Special Issue Information

Dear Colleagues,

Combining autonomous unmanned aerial vehicles (UAVs) with AI-based object detection and tracking could significantly improve efficiency, reduce costs, and lower risks across many applications. With rapid developments in UAV platform design, cameras, micro-computers, and image-processing algorithms, autonomous UAVs have become a promising sensing platform for tasks such as environmental monitoring and infrastructure inspection. These systems can reduce the need for manual inspection in hazardous working environments and avoid the cost of using piloted fixed-wing aircraft or helicopters for large-scale sensing tasks.

New aerial sensors with machine learning, object detection, and tracking capabilities present both opportunities and challenges, inviting novel solutions from the research community. The key aim of this Special Issue is to bring together innovative research that uses off-the-shelf or custom-made platforms to extend autonomous aerial sensing capabilities. Contributions from all fields related to UAVs and aerial-image processing techniques are of interest, including, but not limited to, the following topics:

Unmanned aerial vehicle (UAV) systems;

Machine learning;

AI-based data processing;

Object detection;

Object tracking;

Localization and mapping;

Path planning;

Obstacle avoidance;

Multi-agent collaboration.

Dr. Boyang Li
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • object detection
  • object tracking
  • UAV

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)

Research

25 pages, 1539 KB  
Article
RFE-YOLO: A Lightweight Receptive Field-Enhanced Network for UAV Imagery Object Detection
by Yimo Peng and Xiangyu Ge
Sensors 2026, 26(9), 2903; https://doi.org/10.3390/s26092903 - 6 May 2026
Viewed by 738
Abstract
Object detection in unmanned aerial vehicle (UAV) remote sensing imagery remains a formidable challenge due to the diminutive scale of targets, complex background clutter, and extreme variability in target morphology. Standard convolutional neural networks typically suffer from irreversible fine-grained information loss during downsampling, as strided operations discard critical spatial details essential for the localization of tiny objects. To address these issues, we propose RFE-YOLO, a lightweight receptive field-enhanced network specifically tailored for high-precision small object detection in UAV scenarios. First, the Cross-Scale Receptive Field Enhancement (CSRE) module is designed to mitigate intrinsic information loss by integrating space-to-depth convolution (SPD-Conv), which preserves spatial details by migrating them into the channel dimension. This module further employs an energy-based adaptive weight generation mechanism to distinguish target signals from environmental noise. Second, this paper proposes the C3k2-Dynamic Inception Mixer Block (C3k2-DIMB), which adaptively captures anisotropic features—such as slender vehicles—via dynamic kernel weighting and multi-shape inception kernels. Third, the Shuffled Upsampling for Resolution Enhancement (SURE) module is introduced to maintain spatial fidelity during resolution recovery, utilizing a channel shuffle mechanism to overcome information isolation. Finally, the Multi-feature Fusion Module (MFM) replaces conventional static concatenation with a dynamic softmax-based competition mechanism, effectively bridging the semantic gap between multi-level features while suppressing background distractors. Experimental results on the VisDrone dataset demonstrate that RFE-YOLO significantly enhances the representation capability for small objects. Specifically, the proposed model achieves a state-of-the-art mAP50 of 42.70%, representing a substantial 9.3% improvement over the baseline YOLO11n. 
Furthermore, our architecture maintains an exceptionally lightweight profile with only 1.91 M parameters, demonstrating that high-precision detection can be achieved through structural intelligence rather than excessive parameter scaling. This makes RFE-YOLO highly suitable for real-time inference on edge-deployed UAV platforms.
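The space-to-depth rearrangement mentioned in the abstract above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `space_to_depth`, the block size of 2, and the (C, H, W) layout are assumptions made for illustration, and the non-strided convolution that SPD-Conv is usually described as applying afterwards is omitted. The sketch only shows how spatial resolution can be halved while every input value is kept by moving it into the channel dimension.

```python
import numpy as np


def space_to_depth(x: np.ndarray, block: int = 2) -> np.ndarray:
    """Rearrange a (C, H, W) map into (C * block**2, H // block, W // block).

    Hypothetical helper for illustration; no values are discarded.
    """
    c, h, w = x.shape
    if h % block or w % block:
        raise ValueError("H and W must be divisible by the block size")
    # Split each spatial axis into (coarse index, offset inside the block).
    x = x.reshape(c, h // block, block, w // block, block)
    # Bring the two in-block offsets to the front: (dy, dx, C, H', W').
    x = x.transpose(2, 4, 0, 1, 3)
    # Fold the offsets into the channel dimension.
    return x.reshape(c * block * block, h // block, w // block)


# A single-channel 4x4 map becomes four 2x2 channels.
feature_map = np.arange(16, dtype=np.float32).reshape(1, 4, 4)
packed = space_to_depth(feature_map)
print(packed.shape)  # (4, 2, 2)
```

A strided convolution or pooling layer with stride 2 would instead produce a 2×2 output from a subset or aggregate of the inputs, which is the information loss the abstract refers to.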

23 pages, 4705 KB  
Article
CSFPR-RTDETR-CR: A Causal Intervention Enhanced Framework for Infrared UAV Small Target Detection with Feature Debiasing
by Honglong Wang and Lihui Sun
Sensors 2026, 26(6), 1941; https://doi.org/10.3390/s26061941 - 19 Mar 2026
Viewed by 401
Abstract
Infrared UAV small target detection is critical in areas such as military reconnaissance, disaster monitoring, and border patrol. However, it faces challenges due to the small size of targets, weak texture, and complex backgrounds in infrared images. Existing deep learning-based object detection models often learn spurious correlations between targets and their backgrounds. This leads to poor generalization and higher rates of false positives and missed detections in complex scenes. To overcome feature bias and improve performance, this paper proposes an enhanced detection framework based on causal reasoning. The framework builds on the advanced CSFPR-RTDETR detector. Guided by the principles of structural causal models, it explicitly separates causal and non-causal features in the feature space. Feature debiasing is achieved through a three-path approach. First, a causal data augmentation module is introduced. It applies frequency perturbations drawn from a Gaussian distribution to non-causal features. This strengthens the model’s robustness against mixed disturbances. Second, a counterfactual reasoning module is integrated into the backbone network. This module generates counterfactual samples to intervene in the feature distribution, helping the model identify and utilize causal features more effectively. Third, a causal attention mechanism module is added to the encoder. By distinguishing and weighting causal and non-causal features, it guides the model to focus on features that are essential for detecting targets. Experiments on the HIT-UAV public dataset show that the proposed framework improves mAP@50 by 5.6% and mAP@50:95 by 1.8%. Visualization analysis further confirms that the framework enhances feature discrimination and overall detection performance.

26 pages, 16182 KB  
Article
Overcoming Scale Variations and Occlusions in Aerial Detection: A Context-Aware DEIM Framework
by Xinhao Chang, Xuejuan Wang and Kefeng Li
Sensors 2026, 26(1), 147; https://doi.org/10.3390/s26010147 - 25 Dec 2025
Cited by 3 | Viewed by 1025
Abstract
Object detection in Unmanned Aerial Vehicle (UAV) imagery has gained significant traction in applications such as railway inspection and waste management. While emerging end-to-end detectors like DEIM show promise, they often struggle with weak feature responses and spatial misalignment in aerial scenarios. To address these issues, this paper proposes SCA-DEIM, a context-aware real-time detection framework. Specifically, we introduce the Adaptive Spatial and Channel Synergistic Attention (ASCSA) module, which refines existing attention paradigms by transitioning from a static gating mechanism to an active signal amplifier. Unlike traditional designs that impose rigid bounds on feature responses, this improved architecture enhances feature extraction by dynamically boosting the saliency of faint small-target signals amidst complex backgrounds. Furthermore, drawing inspiration from infrared small object detection, we propose the Cross-Stage Partial Shifted Pinwheel Mixed Convolution (CSP-SPMConv). By synergizing asymmetric padding with a spatial shift mechanism, this module effectively aligns receptive fields and enforces cross-channel interaction, thereby resolving feature misalignment and scale fusion issues. Comprehensive experiments on the VisDrone2019 dataset demonstrate that, compared with the baseline model, SCA-DEIM achieves improvements of 1.8% in Average Precision (AP), 2.3% in AP for small objects (APs), and 2.0% in AP for large objects (APl), while maintaining a competitive inference speed. Notably, visualization results under different illumination conditions demonstrate the strong robustness of the model. In addition, further validation on both the UAVVaste and UAVDT datasets confirms that the proposed method effectively enhances the detection performance for small objects.

28 pages, 54754 KB  
Article
Rethinking Adaptive Contextual Information and Multi-Scale Feature Fusion for Small-Object Detection in UAV Imagery
by Chang Liu, Yong Wang, Qiang Cao, Changlei Zhang and Anyu Cheng
Sensors 2025, 25(23), 7312; https://doi.org/10.3390/s25237312 - 1 Dec 2025
Cited by 4 | Viewed by 1100
Abstract
Small object detection in unmanned aerial vehicle (UAV) imagery poses significant challenges due to insufficient feature representation, complex background interference, and extremely small target sizes. These factors collectively degrade the performance of conventional detection algorithms, leading to low accuracy, frequent missed detections, and false alarms. To address these issues, we propose YOLO-DMF, which is a novel detection framework specifically designed for drone-based scenarios. Our approach introduces three key innovations from the perspectives of feature extraction and information fusion: (1) a Detail-Semantic Adaptive Fusion (DSAF) module that employs a multi-branch architecture to synergistically enhance shallow detail features and deep semantic information, thereby significantly improving feature representation for small objects; (2) a Multi-Scale Residual Spatial Attention (MSRSA) mechanism incorporating scale-adaptive spatial attention to improve robustness against background clutter while enabling a more precise localization of critical target regions; and (3) a Feature Pyramid Reuse and Fusion Network (FPRFN) that introduces a dedicated 160×160 detection head and hierarchically combines multi-level shallow features with high-level semantic information through cross-scale fusion, effectively enhancing sensitivity to both small and tiny objects. Comprehensive experiments on the VisDrone2019 dataset demonstrate that YOLO-DMF outperforms state-of-the-art lightweight detection models. Compared to the baseline YOLOv8s, our method achieves improvements of 3.9% in mAP@0.5 and 2.5% in mAP@0.5:0.95 while reducing model parameters by 66.67% with only a 2.81% increase in computational cost. The model achieves a real-time inference speed of 34.1 FPS on the RK3588 NPU, satisfying the latency requirements for real-time object detection. 
Additional validation on both the AI-TOD and WAID datasets confirms the method’s strong generalization capability and promising potential for practical engineering applications.

21 pages, 3849 KB  
Article
Low-Power Branch CNN Hardware Accelerator with Early Exit for UAV Disaster Detection Using 16 nm CMOS Technology
by Yu-Pei Liang, Wen-Chin Chao and Ching-Che Chung
Sensors 2025, 25(15), 4867; https://doi.org/10.3390/s25154867 - 7 Aug 2025
Cited by 1 | Viewed by 1063
Abstract
This paper presents a disaster detection framework based on aerial imagery, utilizing a Branch Convolutional Neural Network (B-CNN) to enhance feature learning efficiency. The B-CNN architecture incorporates branch training, enabling effective training and inference with reduced model parameters. To further optimize resource usage, the framework integrates DoReFa-Net for weight quantization and fixed-point parameter representation. An early exit mechanism is introduced to support low-latency, energy-efficient predictions. The proposed B-CNN hardware accelerator is implemented using TSMC 16 nm CMOS technology, incorporating power gating techniques to manage memory power consumption. Post-layout simulations demonstrate that the proposed hardware accelerator operates at 500 MHz with a power consumption of 37.56 mW. The system achieves a disaster prediction accuracy of 88.18%, highlighting its effectiveness and suitability for low-power, real-time applications in aerial disaster monitoring.
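The early exit idea in the abstract above can be sketched in a few lines of Python. The abstract does not state the exit criterion, so the softmax-confidence threshold used here is an assumption (a common choice in early-exit networks), and the names `predict_with_early_exit` and `threshold`, as well as the value 0.9, are hypothetical. The sketch takes the logits of each branch, ordered from shallow to deep, and stops at the first branch that is confident enough.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array of logits."""
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()


def predict_with_early_exit(branch_logits, threshold: float = 0.9):
    """Return (predicted class, index of the branch that produced it).

    branch_logits: iterable of 1-D logit arrays, shallow branch first.
    The confidence rule and the threshold are assumptions for illustration.
    """
    result = None
    for depth, logits in enumerate(branch_logits):
        probs = softmax(np.asarray(logits, dtype=np.float64))
        result = (int(np.argmax(probs)), depth)
        if probs.max() >= threshold:
            break  # confident enough: deeper branches are skipped
    if result is None:
        raise ValueError("at least one branch is required")
    return result


# The shallow branch is confident, so inference stops at branch 0.
confident = [np.array([4.0, 0.0, 0.0]), np.array([0.0, 5.0, 0.0])]
print(predict_with_early_exit(confident))  # (0, 0)

# The shallow branch is unsure, so the deeper branch decides.
uncertain = [np.array([1.0, 0.8, 0.0]), np.array([0.0, 5.0, 0.0])]
print(predict_with_early_exit(uncertain))  # (1, 1)
```

If `branch_logits` is a generator that runs each branch on demand, the deeper branches are never evaluated after an exit, which is where the latency and energy savings would come from.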

19 pages, 5625 KB  
Article
UAV Imagery Real-Time Semantic Segmentation with Global–Local Information Attention
by Zikang Zhang and Gongquan Li
Sensors 2025, 25(6), 1786; https://doi.org/10.3390/s25061786 - 13 Mar 2025
Cited by 10 | Viewed by 4495
Abstract
In real-time semantic segmentation for drone imagery, current lightweight algorithms do not integrate global and local image information well, which leads to missed detections and misclassified categories. This paper proposes a real-time semantic segmentation method for drone imagery that integrates multi-scale global context information. The method uses a UNet structure, with a ResNet18 encoder to extract features. The decoder incorporates a global–local attention module, where the global branch compresses and extracts global information in both vertical and horizontal directions, and the local branch extracts local information through convolution, thereby enhancing the fusion of global and local information in the image. In the segmentation head, a shallow-feature fusion module integrates the multi-scale features extracted by the encoder, thereby strengthening the spatial information in the shallow features. The model was tested on the UAVid and UDD6 datasets, achieving 68% mIoU (mean Intersection over Union) and 67% mIoU, respectively, which is 10% and 21.2% higher than the baseline model UNet. The model runs at 72.4 frames/s, which is 54.4 frames/s higher than the baseline model UNet. The experimental results demonstrate that the proposed model balances accuracy and real-time performance well.
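For readers unfamiliar with the mIoU metric reported above, the following is a minimal NumPy sketch of one common way to compute it from a confusion matrix. It is not the paper's evaluation code: the function name `mean_iou` is hypothetical, and benchmark protocols differ in details such as ignored labels and whether scores are accumulated per image or over the whole dataset. Here, classes that appear in neither the prediction nor the ground truth are skipped.

```python
import numpy as np


def mean_iou(pred, target, num_classes: int) -> float:
    """Mean Intersection over Union across classes present in pred or target."""
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes**2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    present = union > 0  # skip classes absent from both maps
    return float((intersection[present] / union[present]).mean())


# Toy 2x3 label maps with three classes.
target = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
print(mean_iou(pred, target, num_classes=3))  # per-class IoU: 1/3, 2/3, 1/2
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5.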