Search Results (341)

Search Parameters:
Keywords = drone target detection

26 pages, 32866 KB  
Article
Low-Altitude Multi-Object Tracking via Graph Neural Networks with Cross-Attention and Reliable Neighbor Guidance
by Hanxiang Qian, Xiaoyong Sun, Runze Guo, Shaojing Su, Bing Ding and Xiaojun Guo
Remote Sens. 2025, 17(20), 3502; https://doi.org/10.3390/rs17203502 - 21 Oct 2025
Abstract
In low-altitude multi-object tracking (MOT), challenges such as frequent inter-object occlusion and complex non-linear motion disrupt the appearance of individual targets and the continuity of their trajectories, leading to frequent tracking failures. We posit that the relatively stable spatio-temporal relationships within object groups (e.g., pedestrians and vehicles) offer powerful contextual cues to resolve such ambiguities. We present NOWA-MOT (Neighbors Know Who We Are), a novel tracking-by-detection framework designed to systematically exploit this principle through a multi-stage association process. We make three primary contributions. First, we introduce a Low-Confidence Occlusion Recovery (LOR) module that dynamically adjusts detection scores by integrating IoU, a novel Recovery IoU (RIoU) metric, and location similarity to surrounding objects, enabling occluded targets to participate in high-priority matching. Second, for initial data association, we propose a Graph Cross-Attention (GCA) mechanism. In this module, separate graphs are constructed for detections and trajectories, and a cross-attention architecture is employed to propagate rich contextual information between them, yielding highly discriminative feature representations for robust matching. Third, to resolve the remaining ambiguities, we design a cascaded Matched Neighbor Guidance (MNG) module, which uniquely leverages the reliably matched pairs from the first stage as contextual anchors. Through MNG, star-shaped topological features are built for unmatched objects relative to their stable neighbors, enabling accurate association even when intrinsic features are weak. Our comprehensive experimental evaluation on the VisDrone2019 and UAVDT datasets confirms the superiority of our approach, achieving state-of-the-art HOTA scores of 51.34% and 62.69%, respectively, and drastically reducing identity switches compared to previous methods. Full article
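
As a rough illustration of the cross-attention step described above (not the authors' implementation; the feature dimension, head count, and the cosine-distance cost below are assumptions), a minimal PyTorch sketch of detection and trajectory features attending to each other:

import torch
import torch.nn as nn

# Minimal cross-attention between detection and trajectory node features.
# Shapes are illustrative; the paper's GCA additionally builds explicit graphs
# over each set before propagating context across them.
class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.det_to_trk = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.trk_to_det = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, det_feats, trk_feats):
        # det_feats: (B, N_det, dim), trk_feats: (B, N_trk, dim)
        det_ctx, _ = self.det_to_trk(det_feats, trk_feats, trk_feats)  # detections attend to trajectories
        trk_ctx, _ = self.trk_to_det(trk_feats, det_feats, det_feats)  # trajectories attend to detections
        return det_feats + det_ctx, trk_feats + trk_ctx

dets = torch.randn(1, 12, 128)   # 12 detections in the current frame
trks = torch.randn(1, 9, 128)    # 9 active trajectories
d, t = CrossAttentionFusion()(dets, trks)
# A similarity-based cost between the enriched features can then drive association.
cost = 1 - torch.nn.functional.cosine_similarity(d.unsqueeze(2), t.unsqueeze(1), dim=-1)
print(cost.shape)  # torch.Size([1, 12, 9])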

20 pages, 2817 KB  
Article
Wildfire Detection from a Drone Perspective Based on Dynamic Frequency Domain Enhancement
by Xiaohui Ma, Yueshun He, Ping Du, Wei Lv and Yuankun Yang
Forests 2025, 16(10), 1613; https://doi.org/10.3390/f16101613 - 21 Oct 2025
Abstract
In recent years, drone-based wildfire detection technology has advanced rapidly, yet existing methods still encounter numerous challenges. For instance, high background complexity leads to frequent false positives and false negatives in models, which struggle to accurately identify both small-scale fire points and large-scale wildfires simultaneously. Furthermore, the complex model architecture and substantial parameter count hinder lightweight deployment requirements for drone platforms. To this end, this paper presents a lightweight drone-based wildfire detection model, DFE-YOLO. This model utilizes dynamic frequency domain enhancement technology to resolve the aforementioned challenges. Specifically, this study enhances small object detection capabilities through a four-tier detection mechanism; improves feature representation and robustness against interference by incorporating a Dynamic Frequency Domain Enhancement Module (DFDEM) and a Target Feature Enhancement Module (C2f_CBAM); and significantly reduces parameter count via a multi-scale sparse sampling module (MS3) to address resource constraints on drones. Experimental results demonstrate that DFE-YOLO achieves mAP50 scores of 88.4% and 88.0% on the Multiple lighting levels and Multiple wildfire objects Synthetic Forest Wildfire Dataset (M4SFWD) and Fire-detection datasets, respectively, whilst reducing parameters by 23.1%. Concurrently, mAP50-95 reaches 50.6% and 63.7%. Comprehensive results demonstrate that DFE-YOLO surpasses existing mainstream detection models in both accuracy and efficiency, providing a reliable solution for wildfire monitoring via unmanned aerial vehicles. Full article
(This article belongs to the Section Natural Hazards and Risk Management)
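
As a loose sketch of the frequency-domain idea behind modules such as DFDEM (the paper's module is dynamic and more elaborate; the channel count, feature size, and static learnable gain here are assumptions):

import torch
import torch.nn as nn

# Illustrative frequency-domain enhancement of a feature map: transform to the
# frequency domain, reweight frequencies with learnable gains, transform back.
class FreqEnhance(nn.Module):
    def __init__(self, channels, h, w):
        super().__init__()
        # rfft2 keeps w // 2 + 1 frequency bins along the last axis
        self.weight = nn.Parameter(torch.ones(channels, h, w // 2 + 1))

    def forward(self, x):                      # x: (B, C, H, W)
        spec = torch.fft.rfft2(x, norm="ortho")
        spec = spec * self.weight              # per-frequency, per-channel gain
        return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")

x = torch.randn(2, 64, 32, 32)
print(FreqEnhance(64, 32, 32)(x).shape)  # torch.Size([2, 64, 32, 32])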

25 pages, 2128 KB  
Article
A Low-Cost UAV System and Dataset for Real-Time Weed Detection in Salad Crops
by Alina L. Machidon, Andraž Krašovec, Veljko Pejović, Daniele Latini, Sarathchandrakumar T. Sasidharan, Fabio Del Frate and Octavian M. Machidon
Electronics 2025, 14(20), 4082; https://doi.org/10.3390/electronics14204082 - 17 Oct 2025
Abstract
The global food crises and growing population necessitate efficient agricultural land use. Weeds cause up to 40% yield loss in major crops, resulting in over USD 100 billion in annual economic losses. Camera-equipped UAVs offer a solution for automatic weed detection, but the high computational and energy demands of deep learning models limit their use to expensive, high-end UAVs. In this paper, we present a low-cost UAV system built from off-the-shelf components, featuring a custom-designed on-board computing system based on the NVIDIA Jetson Nano. This system efficiently manages real-time image acquisition and inference using the energy-efficient Squeeze U-Net neural network for weed detection. Our approach ensures the pipeline operates in real time without affecting the drone’s flight autonomy. We also introduce the AgriAdapt dataset, a novel collection of 643 high-resolution aerial images of salad crops with weeds, which fills a key gap by providing realistic UAV data for benchmarking segmentation models under field conditions. Several deep learning models are trained and validated on the newly introduced AgriAdapt dataset, demonstrating its suitability for effective weed segmentation in UAV imagery. Quantitative results show that the dataset supports a range of architectures, from larger models such as DeepLabV3 to smaller, lightweight networks like Squeeze U-Net (with only 2.5 M parameters), achieving high accuracy (around 90%) across the board. These contributions distinguish our work from earlier UAV-based weed detection systems by combining a novel dataset with a comprehensive evaluation of accuracy, latency, and energy efficiency, thus directly targeting deep learning applications for real-time UAV deployment. Our results demonstrate the feasibility of deploying a low-cost, energy-efficient UAV system for real-time weed detection, making advanced agricultural technology more accessible and practical for widespread use. Full article
(This article belongs to the Special Issue Unmanned Aircraft Systems with Autonomous Navigation, 2nd Edition)
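
A quick way to reproduce the kind of model-budget check implied above, counting parameters and timing a forward pass before committing to a Jetson-class board; DeepLabV3 is only a stand-in for the architectures benchmarked on AgriAdapt, and the input size is illustrative:

import time
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# Count parameters and measure single-image CPU latency for a segmentation model.
model = deeplabv3_resnet50(weights=None, num_classes=2).eval()  # crop vs. weed
params_m = sum(p.numel() for p in model.parameters()) / 1e6
print(f"parameters: {params_m:.1f} M")

x = torch.randn(1, 3, 512, 512)
with torch.no_grad():
    model(x)                                   # warm-up pass
    t0 = time.perf_counter()
    out = model(x)["out"]
    print(f"latency: {(time.perf_counter() - t0) * 1000:.0f} ms, output {tuple(out.shape)}")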

26 pages, 10166 KB  
Article
ADG-YOLO: A Lightweight and Efficient Framework for Real-Time UAV Target Detection and Ranging
by Hongyu Wang, Zheng Dang, Mingzhu Cui, Hanqi Shi, Yifeng Qu, Hongyuan Ye, Jingtao Zhao and Duosheng Wu
Drones 2025, 9(10), 707; https://doi.org/10.3390/drones9100707 - 13 Oct 2025
Abstract
The rapid evolution of UAV technology has increased the demand for lightweight airborne perception systems. This study introduces ADG-YOLO, an optimized model for real-time target detection and ranging on UAV platforms. Building on YOLOv11n, we integrate C3Ghost modules for efficient feature fusion and ADown layers for detail-preserving downsampling, reducing the model’s parameters to 1.77 M and computation to 5.7 GFLOPs. The Extended Kalman Filter (EKF) tracking improves positional stability in dynamic environments. Monocular ranging is achieved using similarity triangle theory with known target widths. Evaluations on a custom dataset, consisting of 5343 images from three drone types in complex environments, show that ADG-YOLO achieves 98.4% mAP0.5 and 85.2% mAP0.5:0.95 at 27 FPS when deployed on Lubancat4 edge devices. Distance measurement tests indicate an average error of 4.18% in the 0.5–5 m range for the DJI NEO model, and an average error of 2.40% in the 2–50 m range for the DJI 3TD model. These results suggest that the proposed model provides a practical trade-off between detection accuracy and computational efficiency for resource-constrained UAV applications. Full article
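
For reference, the similar-triangles ranging mentioned above reduces to one line; the focal length and widths below are made-up values, not parameters from the paper:

# Pinhole-camera ranging by similar triangles from a known target width.
def estimate_distance(focal_px: float, real_width_m: float, bbox_width_px: float) -> float:
    """Distance (m) = focal length (px) * real width (m) / apparent width (px)."""
    return focal_px * real_width_m / bbox_width_px

focal_px = 1400.0        # focal length in pixels (from camera calibration)
drone_width_m = 0.35     # known physical width of the target drone
bbox_width_px = 96.0     # width of the detected bounding box

print(f"estimated range: {estimate_distance(focal_px, drone_width_m, bbox_width_px):.2f} m")
# -> estimated range: 5.10 m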

20 pages, 49845 KB  
Article
DDF-YOLO: A Small Target Detection Model Using Multi-Scale Dynamic Feature Fusion for UAV Aerial Photography
by Ziang Ma, Chao Wang, Chuanzhi Chen, Jinbao Chen and Guang Zheng
Aerospace 2025, 12(10), 920; https://doi.org/10.3390/aerospace12100920 - 13 Oct 2025
Abstract
Unmanned aerial vehicle (UAV)-based object detection shows promising potential in intelligent transportation and disaster response. However, detecting small targets remains challenging due to inherent limitations (long-distance and low-resolution imaging) and environmental interference (complex backgrounds and occlusions). To address these issues, this paper proposes an enhanced small target detection model, DDF-YOLO, which achieves higher detection performance. First, a dynamic feature extraction module (C2f-DCNv4) employs deformable convolutions to effectively capture features from irregularly shaped objects. In addition, a dynamic upsampling module (DySample) optimizes multi-scale feature fusion by combining shallow spatial details with deep semantic features, preserving critical low-level information while enhancing generalization across scales. Finally, to balance rapid convergence with precise localization, an adaptive Focaler-ECIoU loss function dynamically adjusts training weights based on sample quality during bounding box regression. Extensive experiments on VisDrone2019 and UAVDT benchmarks demonstrate DDF-YOLO’s superiority. Compared to YOLOv8n, our model achieves gains of 8.6% and 4.8% in mAP50, along with improvements of 5.0% and 3.3% in mAP50-95, respectively. Furthermore, it exhibits superior efficiency, requiring only 7.3 GFLOPs and attaining an inference speed of 179 FPS. These results validate the model’s robustness for UAV-based detection, particularly in small-object scenarios. Full article
(This article belongs to the Section Aeronautics)
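
To make the deformable-convolution idea concrete, a minimal block where a plain convolution predicts sampling offsets for torchvision's deformable convolution; torchvision provides a DCNv2-style op, whereas the paper uses DCNv4, so treat this purely as an illustration with assumed sizes:

import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

# A plain conv predicts per-location (dx, dy) offsets for each kernel tap, and
# the deformable conv samples the input at those shifted positions.
class DeformBlock(nn.Module):
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.offset = nn.Conv2d(c_in, 2 * k * k, k, padding=k // 2)
        self.dconv = DeformConv2d(c_in, c_out, k, padding=k // 2)

    def forward(self, x):
        return self.dconv(x, self.offset(x))

x = torch.randn(1, 64, 40, 40)
print(DeformBlock(64, 128)(x).shape)  # torch.Size([1, 128, 40, 40])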

22 pages, 9295 KB  
Article
FedGTD-UAVs: Federated Transfer Learning with SPD-GCNet for Occlusion-Robust Ground Small-Target Detection in UAV Swarms
by Liang Zhao, Xin Jia and Yuting Cheng
Drones 2025, 9(10), 703; https://doi.org/10.3390/drones9100703 - 12 Oct 2025
Abstract
Swarm-based UAV cooperative ground target detection faces critical challenges including sensitivity to small targets, susceptibility to occlusion, and data heterogeneity across distributed platforms. To address these issues, we propose FedGTD-UAVs—a privacy-preserving federated transfer learning (FTL) framework optimized for real-time swarm perception tasks. Our solution integrates three key innovations: (1) an FTL paradigm employing centralized pre-training on public datasets followed by federated fine-tuning of sparse parameter subsets—under severe non-Independent and Identically Distributed (non-IID) data distributions, this paradigm ensures data privacy while maintaining over 98% performance; (2) a Space-to-Depth Convolution (SPD-Conv) backbone that replaces lossy downsampling with lossless space-to-depth operations, preserving fine-grained spatial features critical for small targets; (3) a lightweight Global Context Network (GCNet) module that leverages contextual reasoning to capture long-range dependencies, enhancing robustness against occluded objects while maintaining real-time inference at 217 FPS. Extensive validation on VisDrone2019 and CARPK benchmarks demonstrates state-of-the-art performance: 44.2% mAP@0.5 (surpassing YOLOv8s by 12.1%) with a 3.2× better accuracy-efficiency trade-off. Compared to traditional centralized learning, which relies on global data sharing and poses privacy risks, and to standard federated learning, which degrades markedly under non-IID data, this framework resolves the core conflict between data privacy protection and detection performance, providing a secure and efficient solution for real-world deployment in complex dynamic environments. Full article
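
A minimal sketch of the space-to-depth step that SPD-Conv builds on (channel sizes are illustrative; the paper's backbone wraps this into full blocks):

import torch
import torch.nn as nn
import torch.nn.functional as F

# Space-to-depth followed by a stride-1 convolution: resolution is halved by
# repacking 2x2 neighbourhoods into channels instead of discarding pixels with
# strided convolution or pooling, so no spatial information is thrown away.
class SPDConv(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(4 * c_in, c_out, kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        x = F.pixel_unshuffle(x, 2)   # (B, C, H, W) -> (B, 4C, H/2, W/2), lossless
        return self.conv(x)

x = torch.randn(1, 32, 80, 80)
print(SPDConv(32, 64)(x).shape)  # torch.Size([1, 64, 40, 40])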

39 pages, 13725 KB  
Article
SRTSOD-YOLO: Stronger Real-Time Small Object Detection Algorithm Based on Improved YOLO11 for UAV Imageries
by Zechao Xu, Huaici Zhao, Pengfei Liu, Liyong Wang, Guilong Zhang and Yuan Chai
Remote Sens. 2025, 17(20), 3414; https://doi.org/10.3390/rs17203414 - 12 Oct 2025
Abstract
To address the challenges of small target detection in UAV aerial images—such as difficulty in feature extraction, complex background interference, high miss rates, and stringent real-time requirements—this paper proposes an innovative model series named SRTSOD-YOLO, based on YOLO11. The backbone network incorporates a Multi-scale Feature Complementary Aggregation Module (MFCAM), designed to mitigate the loss of small target information as network depth increases. By integrating channel and spatial attention mechanisms with multi-scale convolutional feature extraction, MFCAM effectively locates small objects in the image. Furthermore, we introduce a novel neck architecture termed Gated Activation Convolutional Fusion Pyramid Network (GAC-FPN). This module enhances multi-scale feature fusion by emphasizing salient features while suppressing irrelevant background information. GAC-FPN employs three key strategies: adding a detection head with a small receptive field while removing the original largest one, leveraging large-scale features more effectively, and incorporating gated activation convolutional modules. To tackle the issue of positive-negative sample imbalance, we replace the conventional binary cross-entropy loss with an adaptive threshold focal loss in the detection head, accelerating network convergence. Additionally, to accommodate diverse application scenarios, we develop multiple versions of SRTSOD-YOLO by adjusting the width and depth of the network modules: a nano version (SRTSOD-YOLO-n), small (SRTSOD-YOLO-s), medium (SRTSOD-YOLO-m), and large (SRTSOD-YOLO-l). Experimental results on the VisDrone2019 and UAVDT datasets demonstrate that SRTSOD-YOLO-n improves the mAP@0.5 by 3.1% and 1.2% compared to YOLO11n, while SRTSOD-YOLO-l achieves gains of 7.9% and 3.3% over YOLO11l, respectively. Compared to other state-of-the-art methods, SRTSOD-YOLO-l attains the highest detection accuracy while maintaining real-time performance, underscoring the superiority of the proposed approach. Full article
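
For context, the standard binary focal loss is the kind of baseline that adaptive-threshold variants such as the one above start from; alpha and gamma below are the usual defaults, not the paper's settings:

import torch
import torch.nn.functional as F

# Reference binary focal loss: easy examples are down-weighted by (1 - p_t)^gamma
# so the dominant negatives do not swamp the rare positives.
def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.randn(8, 80)     # per-anchor class logits
targets = torch.zeros(8, 80)
targets[0, 3] = 1.0             # one positive among many negatives
print(focal_loss(logits, targets).item())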

30 pages, 13570 KB  
Article
DVIF-Net: A Small-Target Detection Network for UAV Aerial Images Based on Visible and Infrared Fusion
by Xiaofeng Zhao, Hui Zhang, Chenxiao Li, Kehao Wang and Zhili Zhang
Remote Sens. 2025, 17(20), 3411; https://doi.org/10.3390/rs17203411 - 11 Oct 2025
Abstract
During UAV aerial photography tasks, influenced by flight altitude and imaging mechanisms, targets in the images are often small, appear against complex backgrounds, and show small inter-class differences. Under a single optical modality, the weak and less discriminative feature representation of targets in drone-captured images makes them easily overwhelmed by complex background noise, leading to low detection accuracy and high missed-detection and false-detection rates in current object detection networks. Moreover, such methods struggle to meet all-weather and all-scenario application requirements. To address these issues, this paper proposes DVIF-Net, a visible-infrared fusion network for small-target detection in UAV aerial images, which leverages the complementary characteristics of visible and infrared images to enhance detection capability in complex environments. Firstly, a dual-branch feature extraction structure is designed based on the YOLO architecture to separately extract features from visible and infrared images. Secondly, a P4-level cross-modal fusion strategy is proposed to effectively integrate features from both modalities while reducing computational complexity. Meanwhile, we design a novel dual context-guided fusion module to capture complementary features through channel attention of visible and infrared images during fusion and to enhance interaction between modalities via element-wise multiplication. Finally, an edge information enhancement module based on a cross-stage partial structure is developed to improve sensitivity to small-target edges. Experimental results on two cross-modal datasets, DroneVehicle and VEDAI, demonstrate that DVIF-Net achieves detection accuracies of 85.8% and 62%, respectively. Compared with YOLOv10n, it improves accuracy by 21.7% and 10.5% in the visible modality and by 7.4% and 30.5% in the infrared modality, while maintaining a model parameter count of only 2.49 M. Furthermore, compared with 15 other algorithms, the proposed DVIF-Net attains SOTA performance. These results indicate that the method significantly enhances the detection capability for small targets in UAV aerial images, offering a high-precision and lightweight solution for real-time applications in complex aerial scenarios. Full article
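
A loose sketch of channel-attention-guided fusion with element-wise interaction between the two modalities; the layer layout, reduction ratio, and fusion order are assumptions rather than the paper's dual context-guided fusion module:

import torch
import torch.nn as nn

# SE-style channel gate: global average pooling followed by a small MLP.
class ChannelGate(nn.Module):
    def __init__(self, c, r=16):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(inplace=True),
                                nn.Linear(c // r, c), nn.Sigmoid())

    def forward(self, x):                       # (B, C, H, W) -> (B, C, 1, 1)
        return self.fc(x.mean(dim=(2, 3)))[:, :, None, None]

class RGBTFusion(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.gate_rgb, self.gate_ir = ChannelGate(c), ChannelGate(c)
        self.out = nn.Conv2d(2 * c, c, 1)

    def forward(self, rgb, ir):
        rgb_g = rgb * self.gate_ir(ir)          # each modality re-weighted by the
        ir_g = ir * self.gate_rgb(rgb)          # other's channel statistics
        inter = rgb_g * ir_g                    # element-wise interaction term
        return self.out(torch.cat([rgb_g + inter, ir_g + inter], dim=1))

rgb, ir = torch.randn(1, 128, 40, 40), torch.randn(1, 128, 40, 40)
print(RGBTFusion(128)(rgb, ir).shape)  # torch.Size([1, 128, 40, 40])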

19 pages, 762 KB  
Article
TMRGBT-D2D: A Temporal Misaligned RGB-Thermal Dataset for Drone-to-Drone Target Detection
by Hexiang Hao, Yueping Peng, Zecong Ye, Baixuan Han, Wei Tang, Wenchao Kang, Xuekai Zhang, Qilong Li and Wenchao Liu
Drones 2025, 9(10), 694; https://doi.org/10.3390/drones9100694 - 10 Oct 2025
Abstract
In the field of drone-to-drone detection, the fusion of temporal information with infrared and visible-light data has rarely been studied. This paper presents the first temporally misaligned RGB-thermal dataset for drone-to-drone target detection, named TMRGBT-D2D. The dataset covers various lighting conditions (i.e., high-light scenes captured during the day, and medium-light and low-light scenes captured at night, with night scenes accounting for 38.8% of all data), different scenes (sky, forests, buildings, construction sites, playgrounds, roads, etc.), different seasons, and different locations, consisting of a total of 42,624 images organized into sequential frames extracted from 19 RGB-T video pairs. Each frame has been meticulously annotated, with a total of 94,323 annotations. Except for drones that cannot be identified under extreme conditions, the infrared and visible-light annotations are in one-to-one correspondence. The dataset presents various challenges, including small object detection (the average size of objects in visible-light images is approximately 0.02% of the image area), motion blur caused by fast movement, and detection issues arising from imaging differences between modalities. To our knowledge, this is the first temporally misaligned RGB-thermal dataset for drone-to-drone target detection, facilitating research into RGB-thermal image fusion and the development of drone target detection. Full article
(This article belongs to the Special Issue Detection, Identification and Tracking of UAVs and Drones)
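
The relative object size quoted above is simply box area over image area; the numbers below are hypothetical and chosen only to land near the reported 0.02% average:

# Relative object size = bounding-box area / image area.
img_w, img_h = 1920, 1080            # illustrative visible-light frame size
box_w, box_h = 26, 16                # illustrative drone bounding box in pixels
rel_area = (box_w * box_h) / (img_w * img_h)
print(f"{rel_area:.4%}")             # -> 0.0201%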

24 pages, 76400 KB  
Article
MBD-YOLO: An Improved Lightweight Multi-Scale Small-Object Detection Model for UAVs Based on YOLOv8
by Bo Xu, Di Cai, Kelin Sui, Zheng Wang, Chuangchuang Liu and Xiaolong Pei
Appl. Sci. 2025, 15(20), 10877; https://doi.org/10.3390/app152010877 - 10 Oct 2025
Abstract
To address the challenges of low detection accuracy and weak generalization in UAV aerial imagery caused by complex ground environments, significant scale variations among targets, dense small objects, and background interference, this paper proposes an improved lightweight multi-scale small-object detection model, MBD-YOLO (MBFF module, BiMS-FPN, and Dual-Stream Head). Specifically, to enhance multi-scale feature extraction capabilities, we introduce the Multi-Branch Feature Fusion (MBFF) module, which dynamically adjusts receptive fields through parallel branches and adaptive depthwise convolutions, expanding the receptive field while preserving detail perception. We further design a lightweight Bidirectional Multi-Scale Feature Aggregation Pyramid Network (BiMS-FPN), integrating bidirectional propagation paths and a Multi-Scale Feature Aggregation (MSFA) module to mitigate feature spatial misalignment and improve small-target detection. Additionally, the Dual-Stream Head with NMS-free architecture leverages a task-aligned architecture and dynamic matching strategies to boost inference speed without compromising accuracy. Experiments on the VisDrone2019 dataset demonstrate that MBD-YOLO-n surpasses YOLOv8n by 6.3% in mAP50 and 8.2% in mAP50–95, with accuracy gains of 17.96–55.56% for several small-target categories, while increasing parameters by merely 3.1%. Moreover, MBD-YOLO-s achieves superior detection accuracy, efficiency, and generalization with only 12.1 million parameters, outperforming state-of-the-art models and proving suitable for resource-constrained embedded deployment scenarios. The superior performance of MBD-YOLO, which harmonizes high precision with low computational demand, fulfills the critical requirements for real-time deployment on resource-limited UAVs, showing great promise for applications in traffic monitoring, urban security, and agricultural surveying. Full article
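
An illustrative multi-branch depthwise block in the spirit of receptive-field expansion with detail preservation; the branch count, dilation rates, and residual fusion are assumptions, not the MBFF design itself:

import torch
import torch.nn as nn

# Parallel depthwise convolutions with different dilation rates widen the
# receptive field; a pointwise convolution fuses the branches and a residual
# connection preserves fine detail.
class MultiBranchDW(nn.Module):
    def __init__(self, c, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(c, c, 3, padding=d, dilation=d, groups=c) for d in dilations]
        )
        self.fuse = nn.Conv2d(len(dilations) * c, c, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1)) + x

x = torch.randn(1, 96, 40, 40)
print(MultiBranchDW(96)(x).shape)  # torch.Size([1, 96, 40, 40])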

14 pages, 1304 KB  
Article
RoadNet: A High-Precision Transformer-CNN Framework for Road Defect Detection via UAV-Based Visual Perception
by Long Gou, Yadong Liang, Xingyu Zhang and Jianfeng Yang
Drones 2025, 9(10), 691; https://doi.org/10.3390/drones9100691 - 9 Oct 2025
Abstract
Automated road defect detection using Unmanned Aerial Vehicles (UAVs) has emerged as an efficient and safe solution for large-scale infrastructure inspection. However, object detection in aerial imagery poses unique challenges, including the prevalence of extremely small targets, complex backgrounds, and significant scale variations. Mainstream deep learning-based detection models often struggle with these issues, exhibiting limitations in detecting small cracks, high computational demands, and insufficient generalization ability for UAV perspectives. To address these challenges, this paper proposes a novel comprehensive network, RoadNet, specifically designed for high-precision road defect detection in UAV-captured imagery. RoadNet innovatively integrates Transformer modules with a convolutional neural network backbone and detection head. This design not only significantly enhances the global feature modeling capability crucial for understanding complex aerial contexts but also maintains the computational efficiency necessary for potential real-time applications. The model was trained and evaluated on a self-collected UAV road defect dataset (UAV-RDD). In comparative experiments, RoadNet achieved an outstanding mAP@0.5 score of 0.9128 while maintaining a processing speed of 210.01 ms per image, outperforming other state-of-the-art models. The experimental results demonstrate that RoadNet possesses superior detection performance for road defects in complex aerial scenarios captured by drones. Full article
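
A minimal example of the general Transformer-on-CNN-features pattern (flatten the feature map into tokens, run self-attention, reshape back); dimensions are illustrative and this is not RoadNet's actual architecture:

import torch
import torch.nn as nn

# A CNN stem extracts local features; a standard Transformer encoder then adds
# global context over the flattened feature map before it is reshaped back.
class ConvTransformerBlock(nn.Module):
    def __init__(self, c=256, heads=8, layers=2):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(3, c, 7, stride=4, padding=3), nn.ReLU(inplace=True))
        enc_layer = nn.TransformerEncoderLayer(d_model=c, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, x):
        f = self.conv(x)                          # (B, C, H/4, W/4) local features
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)     # (B, H*W/16, C) token sequence
        tokens = self.encoder(tokens)             # global self-attention
        return tokens.transpose(1, 2).reshape(b, c, h, w)

print(ConvTransformerBlock()(torch.randn(1, 3, 128, 128)).shape)  # torch.Size([1, 256, 32, 32])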

29 pages, 7823 KB  
Article
Real-Time Detection Sensor for Unmanned Aerial Vehicle Using an Improved YOLOv8s Algorithm
by Fuhao Lu, Chao Zeng, Hangkun Shi, Yanghui Xu and Song Fu
Sensors 2025, 25(19), 6246; https://doi.org/10.3390/s25196246 - 9 Oct 2025
Abstract
This study advances unmanned aerial vehicle (UAV) localization technology within the framework of the low-altitude economy, with particular emphasis on the accurate, real-time identification and tracking of unauthorized (“black-flying”) drones. Conventional YOLOv8s-based target detection algorithms often suffer from missed detections due to their reliance on single-frame features. To address this limitation, this paper proposes an improved detection algorithm that integrates a long short-term memory (LSTM) network into the YOLOv8s framework. By incorporating time-series modeling, the LSTM module enables the retention of historical features and dynamic prediction of UAV trajectories. The loss function combines bounding box regression loss with binary cross-entropy and is optimized using the Adam algorithm to enhance training convergence. The training data distribution is validated through Monte Carlo random sampling, which improves the model’s generalization to complex scenes. Simulation results demonstrate that the proposed method significantly enhances UAV detection performance. In addition, when deployed on an RK3588-based embedded system, the method achieves a low false negative rate and exhibits robust detection capabilities, indicating strong potential for practical applications in airspace management and counter-UAV operations. Full article
(This article belongs to the Special Issue Smart Sensing and Control for Autonomous Intelligent Unmanned Systems)
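
A small sketch of the temporal idea: an LSTM over a short history of per-track box parameters predicts the next-frame box, which can back up single-frame detections; the feature choice and sizes are assumptions:

import torch
import torch.nn as nn

# An LSTM consumes a history of normalised box parameters and predicts the box
# for the next frame, retaining features from frames where the detector misses.
class TrackLSTM(nn.Module):
    def __init__(self, in_dim=4, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 4)          # predicted (cx, cy, w, h)

    def forward(self, boxes):                     # boxes: (B, T, 4) history
        out, _ = self.lstm(boxes)
        return self.head(out[:, -1])              # box estimate for frame T+1

history = torch.rand(1, 8, 4)                     # 8 past frames of one track
print(TrackLSTM()(history).shape)                 # torch.Size([1, 4])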

25 pages, 12740 KB  
Article
GM-DETR: Infrared Detection of Small UAV Swarm Targets Based on Detection Transformer
by Chenhao Zhu, Xueli Xie, Jianxiang Xi and Xiaogang Yang
Remote Sens. 2025, 17(19), 3379; https://doi.org/10.3390/rs17193379 - 7 Oct 2025
Abstract
Infrared object detection is an important prerequisite for small unmanned aerial vehicle (UAV) swarm countermeasures. Owing to the limited imaging area and texture features of small UAV targets, accurate infrared detection of UAV swarm targets is challenging. In this paper, GM-DETR is proposed for the detection of densely distributed small UAV swarm targets in infrared scenarios. Specifically, high-level and low-level features are fused by the Fine-Grained Context-Aware Fusion module, which augments texture features in the fused feature map. Furthermore, a Supervised Sampling and Sparsification module is proposed as an explicit guiding mechanism, which helps the GM-DETR focus on high-quality queries according to their confidence values. The Geometric Relation Encoder is introduced to encode geometric relations among queries, which compensates for the information loss caused by query serialization. In the second stage of the GM-DETR, a long-term memory mechanism is introduced to make UAV detection more stable and distinguishable in motion-blur scenes. In the decoder, the self-attention mechanism is improved by introducing memory blocks as additional decoding information, which enhances the robustness of the GM-DETR. In addition, we constructed a small UAV swarm dataset, the UAV Swarm Dataset (USD), which comprises 7000 infrared images of low-altitude UAV swarms, as another contribution. The experimental results on the USD show that the GM-DETR outperforms other state-of-the-art detectors and obtains the best scores (90.6 on AP75 and 63.8 on APS), which demonstrates the effectiveness of the GM-DETR in detecting small UAV targets. Its good performance on the DroneVehicle dataset also demonstrates the superiority of the proposed modules in detecting small targets. Full article
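
To illustrate the kind of geometry such an encoder can embed, the snippet below computes pairwise normalised centre offsets and log size ratios between query boxes; the exact encoding used by the Geometric Relation Encoder is the paper's own:

import torch

# Pairwise geometric relation features between boxes given as (cx, cy, w, h):
# centre offsets scaled by box size plus log width/height ratios, the kind of
# information lost when queries are serialised into an unordered set.
def pairwise_geometry(boxes):                          # boxes: (N, 4)
    cx, cy, w, h = boxes.unbind(-1)
    dx = (cx[:, None] - cx[None, :]) / w[:, None]      # x offset, scaled by box width
    dy = (cy[:, None] - cy[None, :]) / h[:, None]      # y offset, scaled by box height
    dw = torch.log(w[:, None] / w[None, :])            # log width ratio
    dh = torch.log(h[:, None] / h[None, :])            # log height ratio
    return torch.stack([dx, dy, dw, dh], dim=-1)       # (N, N, 4)

boxes = torch.tensor([[0.30, 0.40, 0.05, 0.04],
                      [0.32, 0.42, 0.06, 0.05],
                      [0.70, 0.10, 0.04, 0.03]])
print(pairwise_geometry(boxes).shape)                  # torch.Size([3, 3, 4])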

27 pages, 25256 KB  
Article
A Progressive Target-Aware Network for Drone-Based Person Detection Using RGB-T Images
by Zhipeng He, Boya Zhao, Yuanfeng Wu, Yuyang Jiang and Qingzhan Zhao
Remote Sens. 2025, 17(19), 3361; https://doi.org/10.3390/rs17193361 - 4 Oct 2025
Abstract
Drone-based target detection using visible and thermal (RGB-T) images is critical in disaster rescue, intelligent transportation, and wildlife monitoring. However, persons typically occupy fewer pixels and exhibit more varied postures than vehicles or large animals, making them difficult to detect in unmanned aerial vehicle (UAV) remote sensing images with complex backgrounds. We propose a novel progressive target-aware network (PTANet) for person detection using RGB-T images. A global adaptive feature fusion module (GAFFM) is designed to fuse the texture and thermal features of persons. A progressive focusing strategy is used. Specifically, we incorporate a person segmentation auxiliary branch (PSAB) during training to enhance target discrimination, while a cross-modality background mask (CMBM) is applied in the inference phase to suppress irrelevant background regions. Extensive experiments demonstrate that the proposed PTANet achieves high accuracy and generalization performance, reaching 79.5%, 47.8%, and 97.3% mean average precision (mAP)@50 on three drone-based person detection benchmarks (VTUAV-det, RGBTDronePerson, and VTSaR), with only 4.72 M parameters. PTANet deployed on an embedded edge device with TensorRT acceleration and quantization achieves an inference speed of 11.177 ms (640 × 640 pixels), indicating its promising potential for real-time onboard person detection. The source code is publicly available on GitHub. Full article
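
A very rough sketch of suppressing background with a cross-modality cue at inference: a coarse warm-region mask derived from the thermal image is resized and multiplied onto the fused features. The thresholding rule is an assumption and not how the paper's CMBM is constructed:

import torch
import torch.nn.functional as F

# Keep only the warmest fraction of the thermal image as "possible person"
# regions and zero out the rest of the fused feature map.
def apply_background_mask(features, thermal, quantile=0.8):
    # features: (B, C, h, w) fused features; thermal: (B, 1, H, W) in [0, 1]
    thresh = torch.quantile(thermal.flatten(1), quantile, dim=1).view(-1, 1, 1, 1)
    mask = (thermal >= thresh).float()
    mask = F.interpolate(mask, size=features.shape[-2:], mode="nearest")
    return features * mask

features = torch.randn(1, 256, 80, 80)
thermal = torch.rand(1, 1, 640, 640)
print(apply_background_mask(features, thermal).shape)  # torch.Size([1, 256, 80, 80])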

14 pages, 2759 KB  
Article
Unmanned Airborne Target Detection Method with Multi-Branch Convolution and Attention-Improved C2F Module
by Fangyuan Qin, Weiwei Tang, Haishan Tian and Yuyu Chen
Sensors 2025, 25(19), 6023; https://doi.org/10.3390/s25196023 - 1 Oct 2025
Abstract
In this paper, a target detection network based on multi-branch convolution and an attention-improved Cross-Stage Partial-Fusion Bottleneck with Two Convolutions (C2F) module is proposed for the difficult task of detecting small targets in unmanned aerial vehicle imagery. A C2F variant fusing partial convolution (PConv) layers was designed to improve the speed and efficiency of feature extraction, and a method combining multi-scale feature fusion with a channel-spatial attention mechanism was applied in the neck network. An FA-Block module was designed to improve feature fusion and attention to small targets’ features; this design increases the size of the minuscule target layer, allowing richer feature information about small targets to be retained. Finally, the lightweight up-sampling operator Content-Aware ReAssembly of Features (CARAFE) was used to replace the original up-sampling method and expand the network’s receptive field. Experimental tests were conducted on a self-compiled mountain pedestrian dataset and the public VisDrone dataset. Compared with the base algorithm, the improved algorithm raised mAP50, mAP50-95, precision, and recall by 2.8%, 3.5%, 2.3%, and 0.2%, respectively, on the Mountain Pedestrian dataset and by 9.2%, 6.4%, 7.7%, and 7.6%, respectively, on the VisDrone dataset. Full article
(This article belongs to the Section Sensing and Imaging)
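
For reference, a minimal partial convolution (PConv) in the FasterNet sense, where only a fraction of the channels pass through the 3×3 convolution; the 1/4 ratio is the commonly used default, not necessarily the paper's setting:

import torch
import torch.nn as nn

# Only the first c_conv channels are convolved; the remaining channels are
# passed through untouched, cutting FLOPs while keeping the channel count.
class PConv(nn.Module):
    def __init__(self, c, ratio=0.25):
        super().__init__()
        self.c_conv = max(1, int(c * ratio))
        self.conv = nn.Conv2d(self.c_conv, self.c_conv, 3, padding=1)

    def forward(self, x):
        head, tail = x[:, : self.c_conv], x[:, self.c_conv:]
        return torch.cat([self.conv(head), tail], dim=1)

x = torch.randn(1, 128, 40, 40)
print(PConv(128)(x).shape)  # torch.Size([1, 128, 40, 40])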
