Search Results (11)

Search Parameters:
Keywords = cross-drone association

20 pages, 7280 KiB  
Article
UAV-DETR: An Enhanced RT-DETR Architecture for Efficient Small Object Detection in UAV Imagery
by Yu Zhou and Yan Wei
Sensors 2025, 25(15), 4582; https://doi.org/10.3390/s25154582 - 24 Jul 2025
Viewed by 586
Abstract
To mitigate the technical challenges associated with small-object detection, feature degradation, and spatial-contextual misalignment in UAV-acquired imagery, this paper proposes UAV-DETR, an enhanced Transformer-based object detection model designed for aerial scenarios. UAV imagery often suffers from feature degradation due to low resolution and complex backgrounds, and from semantic-spatial misalignment caused by dynamic shooting conditions. This work addresses these challenges by enhancing feature perception, semantic representation, and spatial alignment. Architecturally extending the RT-DETR framework, UAV-DETR incorporates three novel modules: the Channel-Aware Sensing (CAS) module, the Scale-Optimized Enhancement Pyramid (SOEP) module, and the newly designed Context-Spatial Alignment Module (CSAM), which integrates contextual and spatial calibration. These components collaboratively strengthen multi-scale feature extraction, semantic representation, and spatial-contextual alignment. The CAS module refines the backbone to improve multi-scale feature perception, while SOEP enriches the semantics of shallow layers through lightweight channel-weighted fusion. CSAM further optimizes the hybrid encoder by simultaneously correcting contextual inconsistencies and spatial misalignments during feature fusion, enabling more precise cross-scale integration. Comprehensive comparisons with mainstream detectors, including Faster R-CNN and YOLOv5, show that UAV-DETR achieves superior small-object detection in complex aerial scenarios, evaluated in terms of mAP@0.5, parameter count, and computational complexity (GFLOPs). On the VisDrone2019 benchmark, UAV-DETR achieves an mAP@0.5 of 51.6%, surpassing RT-DETR by 3.5% while reducing the parameter count from 19.8 million to 16.8 million.
(This article belongs to the Section Remote Sensors)
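
The abstract gives no implementation details for the CAS module, but a squeeze-and-excitation-style block illustrates the general idea of channel-aware feature re-weighting it alludes to. Everything below (class name, reduction ratio, structure) is an assumption for illustration, not the paper's code.

```python
# Hypothetical sketch of channel-aware feature re-weighting (SE-style);
# the paper's actual CAS design is not reproduced here.
import torch
import torch.nn as nn

class ChannelAwareBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # global context per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # re-weight the channels

if __name__ == "__main__":
    feat = torch.randn(2, 64, 80, 80)                # a backbone feature map
    print(ChannelAwareBlock(64)(feat).shape)         # torch.Size([2, 64, 80, 80])
```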

23 pages, 3858 KiB  
Article
MCFA: Multi-Scale Cascade and Feature Adaptive Alignment Network for Cross-View Geo-Localization
by Kaiji Hou, Qiang Tong, Na Yan, Xiulei Liu and Shoulu Hou
Sensors 2025, 25(14), 4519; https://doi.org/10.3390/s25144519 - 21 Jul 2025
Viewed by 374
Abstract
Cross-view geo-localization (CVGL) presents significant challenges due to the drastic variations in perspective and scene layout between unmanned aerial vehicle (UAV) and satellite images. Existing methods have made progress in extracting local features from images, but they are limited in modeling the interactions among local features and fall short of aligning cross-view representations accurately. To address these issues, we propose a Multi-Scale Cascade and Feature Adaptive Alignment (MCFA) network, which consists of a Multi-Scale Cascade Module (MSCM) and a Feature Adaptive Alignment Module (FAAM). The MSCM captures features of the target's adjacent regions and improves the model's robustness by learning key region information through association and fusion. The FAAM uses dynamically weighted alignment to adaptively compensate for feature differences across viewpoints, aligning features between drone and satellite images. Our method achieves state-of-the-art (SOTA) performance on two public datasets, University-1652 and SUES-200. In generalization experiments, our model outperforms existing SOTA methods by an average of 1.52% in R@1 and 2.09% in AP, demonstrating its effectiveness and strong generalization in cross-view geo-localization tasks.
(This article belongs to the Section Remote Sensors)
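
As a reference for the R@1 and AP figures quoted above, here is a minimal sketch of how these retrieval metrics are typically computed for cross-view geo-localization, assuming cosine similarity between L2-normalized embeddings and one true satellite match per drone query; the paper's exact evaluation protocol may differ.

```python
# Hedged sketch of R@1 and AP for single-match cross-view retrieval.
import numpy as np

def recall_at_1(sim: np.ndarray, gt: np.ndarray) -> float:
    """sim: (queries, gallery) similarity matrix; gt: index of the true match."""
    return float(np.mean(sim.argmax(axis=1) == gt))

def mean_average_precision(sim: np.ndarray, gt: np.ndarray) -> float:
    # With a single relevant gallery item, AP reduces to 1 / rank of the match.
    order = np.argsort(-sim, axis=1)                     # best match first
    ranks = np.argmax(order == gt[:, None], axis=1) + 1  # 1-based rank
    return float(np.mean(1.0 / ranks))

rng = np.random.default_rng(0)
q = rng.normal(size=(100, 256))                          # drone-view embeddings
g = q + 0.5 * rng.normal(size=(100, 256))                # noisy satellite matches
q /= np.linalg.norm(q, axis=1, keepdims=True)
g /= np.linalg.norm(g, axis=1, keepdims=True)
sim = q @ g.T
gt = np.arange(100)                                      # query i matches gallery i
print(recall_at_1(sim, gt), mean_average_precision(sim, gt))
```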

29 pages, 18908 KiB  
Article
Toward Efficient UAV-Based Small Object Detection: A Lightweight Network with Enhanced Feature Fusion
by Xingyu Di, Kangning Cui and Rui-Feng Wang
Remote Sens. 2025, 17(13), 2235; https://doi.org/10.3390/rs17132235 - 29 Jun 2025
Cited by 1 | Viewed by 687
Abstract
UAV-based small-target detection is crucial in environmental monitoring, circuit inspection, and related applications. However, UAV images often exhibit significant scale variation, dense small targets, high inter-class similarity, and intra-class diversity, which can lead to missed detections and reduced performance. To address these problems, this study proposes UAV-YOLO, a lightweight, high-precision model based on YOLOv8s. First, a double separation convolution (DSC) module is designed to replace the Bottleneck structure in the C2f module, fusing depthwise separable convolution with pointwise convolution to reduce model parameters and computational complexity while enhancing feature expression. Second, a new SPPL module is proposed, combining spatial pyramid pooling-fast (SPPF) with long-distance dependency modeling (LSKA) to improve robustness to multi-scale targets through cross-level feature association. Then, DyHead replaces the original detection head, enhancing the discrimination of small targets against complex backgrounds through adaptive weight allocation and cross-scale feature fusion. Finally, the WIPIoU loss function is proposed, which integrates the advantages of Wise-IoU, MPDIoU, and Inner-IoU, incorporating the bounding-box geometric center, aspect ratio, and overlap into a unified measure to improve small-target localization accuracy and accelerate convergence. Experiments on the VisDrone2019 dataset show that, compared with YOLOv8s, UAV-YOLO improves mAP@0.5 by 8.9% and recall by 6.8%, while reducing parameters and computation by 23.4% and 40.7%, respectively. Additional evaluations on the DIOR, RSOD, and NWPU VHR-10 datasets demonstrate the model's generalization capability.
(This article belongs to the Special Issue Geospatial Intelligence in Remote Sensing)
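
The DSC module is described as fusing depthwise separable and pointwise convolution to cut parameters; the hypothetical block below sketches that pattern and compares its parameter count with a plain 3x3 convolution. The layer layout and names are illustrative, not the paper's code.

```python
# Depthwise + pointwise convolution sketch, versus a plain 3x3 convolution.
import torch.nn as nn

def dsc_block(c_in: int, c_out: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in, bias=False),  # depthwise
        nn.Conv2d(c_in, c_out, 1, bias=False),                         # pointwise
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

plain = nn.Conv2d(128, 128, 3, padding=1, bias=False)
print(n_params(plain), n_params(dsc_block(128, 128)))  # 147456 vs 17792
```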

24 pages, 7959 KiB  
Article
Dynamic Collaborative Optimization Method for Real-Time Multi-Object Tracking
by Ziqi Li, Dongyao Jia, Zihao He and Nengkai Wu
Appl. Sci. 2025, 15(9), 5119; https://doi.org/10.3390/app15095119 - 5 May 2025
Viewed by 813
Abstract
Multi-object tracking still faces significant challenges under complex conditions such as dense scenes, occlusion, and non-linear motion, especially in detecting small objects and maintaining their identities. To tackle these issues, this paper proposes a multi-modal fusion tracking framework that achieves high-precision tracking in complex scenarios by jointly optimizing feature enhancement and motion prediction. First, a multi-scale feature adaptive enhancement (MS-FAE) module is designed, which integrates multi-level features and introduces a small-object adaptive attention mechanism to improve the representation of small objects. Second, a cross-frame feature association module (CFAM) is proposed, which builds a global semantic association network via grouped cross-attention and a memory recall mechanism to resolve matching difficulties in occluded and dense scenes. Third, a Dynamic Motion Model (DMM) is developed, enabling robust prediction of non-linear motion based on an improved Kalman filter framework. Finally, a bi-modal dynamic decision method (BDDM) is devised to fuse appearance and motion information for hierarchical decision making. Experiments on multiple public datasets, including MOT17, MOT20, and VisDrone-MOT, demonstrate that this method markedly improves tracking accuracy while maintaining real-time performance. On the MOT17 test set, it achieves 63.7% HOTA and 79.4% IDF1 at 61.4 FPS, outperforming current state-of-the-art tracking algorithms.
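
For orientation, the sketch below shows the constant-velocity Kalman filter that motion models like the DMM typically extend; the paper's improved, non-linear-aware variant is not reproduced here, and all noise settings are illustrative.

```python
# Minimal constant-velocity Kalman filter over box centers: state is
# (x, y, vx, vy); only the position (x, y) is observed each frame.
import numpy as np

class KalmanCV:
    def __init__(self, dt: float = 1.0):
        self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                           [0, 0, 1, 0], [0, 0, 0, 1]], float)  # state transition
        self.H = np.eye(2, 4)                                    # observe position only
        self.Q = np.eye(4) * 1e-2                                # process noise (assumed)
        self.R = np.eye(2) * 1e-1                                # measurement noise (assumed)
        self.x = np.zeros(4); self.P = np.eye(4)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)                 # Kalman gain
        self.x = self.x + K @ (np.asarray(z, float) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P

kf = KalmanCV()
for t in range(5):
    kf.predict(); kf.update([t * 2.0, t * 1.0])                  # target moving right/down
print(kf.predict())                                              # extrapolated next position
```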

19 pages, 9732 KiB  
Article
YOLO-MARS: An Enhanced YOLOv8n for Small Object Detection in UAV Aerial Imagery
by Guofeng Zhang, Yanfei Peng and Jincheng Li
Sensors 2025, 25(8), 2534; https://doi.org/10.3390/s25082534 - 17 Apr 2025
Cited by 2 | Viewed by 1048
Abstract
In unmanned aerial vehicle (UAV) aerial imagery, challenges such as small target size, compact distribution, and mutual occlusion often result in missed detections and false alarms. To address these challenges, this paper introduces YOLO-MARS, a small-target recognition model that incorporates a multi-level attention residual mechanism. First, an ERAC module is designed to better capture small targets: it expands the feature perception range, applies a channel attention weight allocation strategy to strengthen small-target feature extraction, and introduces a residual connection to improve gradient propagation stability. Second, a PD-ASPP structure is proposed, which uses parallel paths for differentiated feature extraction and depthwise separable convolutions to reduce computational redundancy, enabling effective identification of targets at various scales against complex backgrounds. Third, a multi-scale SGCS-FPN fusion architecture is proposed, adding a shallow feature guidance branch to establish cross-level semantic associations and thereby counter the loss of small targets in deep networks. Finally, a dynamic WIoU evaluation function is implemented, constructing adaptive penalty terms from the spatial distribution of predicted and ground-truth bounding boxes to improve the boundary localization of densely packed small targets from the UAV viewpoint. Experiments on the VisDrone2019 dataset show that YOLO-MARS achieves 40.9% mAP50 and 23.4% mAP50:95, improvements of 8.1% and 4.3% over the baseline YOLOv8n, demonstrating its advantages in UAV aerial target detection.
(This article belongs to the Section Sensing and Imaging)
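
The exact WIoU penalty is not given in the abstract; as a rough illustration of a penalty term built from the spatial relation of predicted and ground-truth boxes, the sketch below augments plain IoU with a DIoU-style center-distance penalty normalized by the enclosing box, a related but simpler construction than the paper's.

```python
# Illustrative penalized IoU (DIoU-style); not the paper's WIoU formulation.
def iou_with_distance_penalty(a, b):
    """Boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter)
    # squared center distance, normalized by the enclosing box diagonal
    ca = ((a[0] + a[2]) / 2, (a[1] + a[3]) / 2)
    cb = ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    d2 = (ca[0] - cb[0]) ** 2 + (ca[1] - cb[1]) ** 2
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return iou - d2 / c2          # the loss would be 1 minus this value

print(iou_with_distance_penalty((0, 0, 10, 10), (2, 2, 12, 12)))  # ~0.443
```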

20 pages, 16843 KiB  
Technical Note
STCA: High-Altitude Tracking via Single-Drone Tracking and Cross-Drone Association
by Yu Qiao, Huijie Fan, Qiang Wang, Tinghui Zhao and Yandong Tang
Remote Sens. 2024, 16(20), 3861; https://doi.org/10.3390/rs16203861 - 17 Oct 2024
Viewed by 1255
Abstract
In this paper, we introduce STCA, a high-altitude multi-drone multi-target (HAMDMT) tracking method that collaboratively tracks similar, easily confused targets. We approach this challenge by dividing HAMDMT tracking into two principal tasks: Single-Drone Tracking and Cross-Drone Association. Single-Drone Tracking employs positional and appearance features to overcome the challenges arising from similar target appearances within a single drone's field of view. Cross-Drone Association employs image-matching technology (LightGlue) to establish the topological relationships between images captured by different drones, thereby accurately determining the associations between targets across multiple drones. For this task, we enhance LightGlue into a more efficient variant, designated T-LightGlue, for cross-drone target tracking; this markedly accelerates tracking while limiting the drop in tracking metrics. To narrow the range of targets involved in cross-drone association, we develop a Common View Area Model based on the four vertices of the image. To mitigate the occlusion encountered by high-altitude drones, we design a Local-Matching Model that assigns the same ID to the mutually nearest pair of targets from different drones after mapping target centroids across drones. The MDMT dataset is the only available dataset captured by high-altitude drones and contains a substantial number of similar vehicles. On the MDMT dataset, STCA achieves the highest MOTA and the second-highest IDF1 in Single-Drone Tracking, and the highest MDA in Cross-Drone Association.
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
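
The Local-Matching Model lends itself to a compact sketch: centroids from one drone are mapped into the other drone's image via a homography (estimated from T-LightGlue matches in the paper; given directly here), and two targets receive the same ID only if each is the other's nearest neighbor. The distance gate and the toy homography are assumptions.

```python
# Mutual-nearest centroid pairing after a homography mapping.
import numpy as np

def map_points(H, pts):
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]          # perspective divide

def mutual_nearest_pairs(pa, pb, max_dist=50.0):
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=2)   # pairwise distances
    pairs = []
    for i in range(len(pa)):
        j = d[i].argmin()
        if d[:, j].argmin() == i and d[i, j] < max_dist:          # mutual nearest + gate
            pairs.append((i, j))
    return pairs

H = np.array([[1.0, 0.0, 100.0], [0.0, 1.0, -20.0], [0.0, 0.0, 1.0]])  # toy homography
cents_a = np.array([[50.0, 60.0], [200.0, 300.0]])    # centroids seen by drone A
cents_b = np.array([[151.0, 39.0], [299.0, 281.0]])   # centroids seen by drone B
print(mutual_nearest_pairs(map_points(H, cents_a), cents_b))      # [(0, 0), (1, 1)]
```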

19 pages, 54513 KiB  
Article
A Satellite-Drone Image Cross-View Geolocalization Method Based on Multi-Scale Information and Dual-Channel Attention Mechanism
by Naiqun Gong, Liwei Li, Jianjun Sha, Xu Sun and Qian Huang
Remote Sens. 2024, 16(6), 941; https://doi.org/10.3390/rs16060941 - 7 Mar 2024
Cited by 4 | Viewed by 4395
Abstract
Satellite-drone image cross-view geolocalization has wide applications. Due to the pronounced variation in the visual appearance of 3D objects under different viewing angles, it remains an unresolved challenge. The key to successful cross-view geolocalization lies in extracting crucial spatial structure information across different scales in the image. Recent studies improve image-matching accuracy by introducing an attention mechanism to establish global associations among local features. However, existing methods primarily use single-scale features and a single-channel attention mechanism to correlate local convolutional features from different locations; this inadequately exploits the multi-scale spatial structure within the image and, in particular, the extraction of locally valuable information. In this paper, we propose a cross-view image geolocalization method based on multi-scale information and a dual-channel attention mechanism. The multi-scale information comprises features extracted at different scales using various convolutional slices, and it makes extensive use of shallow network features. The dual-channel attention mechanism, through successive local and global feature association, effectively learns discriminative deep features across scales. Experiments were conducted on existing satellite and drone image datasets, with additional validation on an independent, self-collected dataset. The results indicate that our approach outperforms existing methods, especially in exploiting multi-scale spatial structure and extracting locally valuable information.
(This article belongs to the Section AI Remote Sensing)
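
As a loose illustration of "successive local and global feature association", the sketch below applies a spatial (local) attention map followed by a channel (global) attention vector in sequence; this conveys only the general pattern, not the paper's dual-channel design, and every layer choice here is an assumption.

```python
# Local-then-global attention sketch: a per-pixel map, then per-channel weights.
import torch
import torch.nn as nn

class LocalThenGlobalAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Conv2d(channels, 1, kernel_size=7, padding=3)  # local map
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels), nn.Sigmoid(),                 # global weights
        )

    def forward(self, x):
        x = x * torch.sigmoid(self.spatial(x))           # where to look (per pixel)
        w = self.channel(x).unsqueeze(-1).unsqueeze(-1)
        return x * w                                     # what to emphasize (per channel)

print(LocalThenGlobalAttention(32)(torch.randn(1, 32, 64, 64)).shape)
```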

19 pages, 29140 KiB  
Article
Mapping Malaria Vector Habitats in West Africa: Drone Imagery and Deep Learning Analysis for Targeted Vector Surveillance
by Fedra Trujillano, Gabriel Jimenez Garay, Hugo Alatrista-Salas, Isabel Byrne, Miguel Nunez-del-Prado, Kallista Chan, Edgar Manrique, Emilia Johnson, Nombre Apollinaire, Pierre Kouame Kouakou, Welbeck A. Oumbouke, Alfred B. Tiono, Moussa W. Guelbeogo, Jo Lines, Gabriel Carrasco-Escobar and Kimberly Fornace
Remote Sens. 2023, 15(11), 2775; https://doi.org/10.3390/rs15112775 - 26 May 2023
Cited by 14 | Viewed by 4399
Abstract
Disease control programs need to identify the breeding sites of mosquitoes that transmit malaria and other diseases in order to target interventions and identify environmental risk factors. The increasing availability of very-high-resolution drone data provides new opportunities to find and characterize these vector breeding sites. In this study, drone images from two malaria-endemic regions in Burkina Faso and Côte d’Ivoire were assembled and labeled using open-source tools. We developed and applied a workflow using region-of-interest-based and deep learning methods to identify land cover types associated with vector breeding sites from very-high-resolution natural color imagery. The methods were assessed using cross-validation and achieved maximum Dice coefficients of 0.68 and 0.75 for vegetated and non-vegetated water bodies, respectively. The classifier consistently identified other land cover types associated with breeding sites, with Dice coefficients of 0.88 for tillage and crops, 0.87 for buildings, and 0.71 for roads. This study establishes a framework for developing deep learning approaches to identify vector breeding sites and highlights the need to evaluate how the results will be used by control programs.
(This article belongs to the Special Issue Remote Sensing and Infectious Diseases)
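
The Dice coefficients quoted above measure overlap between predicted and reference masks; below is a minimal implementation of the metric for binary segmentation masks, with synthetic masks for demonstration.

```python
# Dice coefficient for binary masks: 2*|A∩B| / (|A| + |B|).
import numpy as np

def dice(pred: np.ndarray, ref: np.ndarray, eps: float = 1e-7) -> float:
    pred, ref = pred.astype(bool), ref.astype(bool)
    inter = np.logical_and(pred, ref).sum()
    return (2.0 * inter + eps) / (pred.sum() + ref.sum() + eps)

ref = np.zeros((64, 64), int); ref[16:48, 16:48] = 1     # reference water body
pred = np.zeros_like(ref);     pred[20:48, 16:52] = 1    # imperfect prediction
print(round(dice(pred, ref), 3))                         # 0.882
```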

25 pages, 12497 KiB  
Article
Susceptibility Prediction of Post-Fire Debris Flows in Xichang, China, Using a Logistic Regression Model from a Spatiotemporal Perspective
by Tao Jin, Xiewen Hu, Bo Liu, Chuanjie Xi, Kun He, Xichao Cao, Gang Luo, Mei Han, Guotao Ma, Ying Yang and Yan Wang
Remote Sens. 2022, 14(6), 1306; https://doi.org/10.3390/rs14061306 - 8 Mar 2022
Cited by 27 | Viewed by 3422
Abstract
Post-fire debris flows (PFDFs) are a common destructive hazard that may persist for several years after a wildfire. Susceptibility mapping is an effective method for mitigating hazard risk, yet most susceptibility prediction models focus only on spatial probability in a specific period, ignoring change over time. This study improves the predictive model by introducing a temporal factor. The area burned by the 30 March 2020 fire in Xichang City, China, is selected as an illustrative example, and PFDF susceptibility is predicted for different periods within seven months after the wildfire. A total of 2214 hydrological response events, comprising 181 debris flow events and 2033 flood events from 82 watersheds, are used to construct the sample dataset. Seven conditioning factors, consisting of temporal and spatial factors, are extracted through remote sensing interpretation, field investigations, and in situ tests, after correlation and importance analysis. Logistic regression (LR) is adopted to establish prediction models through 10 cross-validation runs. The results show that susceptibility to PFDF decreased significantly over time. Two months after the wildfire, the proportions of very low, low, moderate, high, and very high susceptibility were 1.2%, 3.7%, 24.4%, 23.2%, and 47.6%, respectively. Seven months after the wildfire, the proportions of high and very high susceptibility had decreased to zero, while the proportions of very low, low, and moderate susceptibility had increased to 35.4%, 35.6%, and 28.1%, respectively. The reason is that drone seeding of grass and artificial tree planting accelerated the natural recovery of vegetation and soil after the fire. This study offers insight into the evolution of PFDF susceptibility over time and reflects the important influence of human activity after a wildfire.
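
A compact sketch of the modeling setup described above: logistic regression on conditioning factors that include a temporal variable (months since the fire), so that predicted susceptibility can decline over time. The data, coefficients, and factor choices below are synthetic stand-ins, not the study's dataset.

```python
# Logistic-regression susceptibility model with a temporal conditioning factor.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 2000
months = rng.uniform(0, 7, n)                 # temporal factor: months since fire
rain = rng.uniform(0, 1, n)                   # stand-ins for spatial/trigger factors
slope = rng.uniform(0, 1, n)
logit = 2.0 * rain + 1.5 * slope - 0.8 * months           # assumed generating process
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)  # debris flow vs flood

model = LogisticRegression().fit(np.column_stack([months, rain, slope]), y)
for m in (2, 7):   # susceptibility of the same watershed at different times
    print(m, "months:", model.predict_proba([[m, 0.8, 0.7]])[0, 1].round(3))
```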

13 pages, 2529 KiB  
Article
Vineyard Pruning Weight Prediction Using 3D Point Clouds Generated from UAV Imagery and Structure from Motion Photogrammetry
by Marta García-Fernández, Enoc Sanz-Ablanedo, Dimas Pereira-Obaya and José Ramón Rodríguez-Pérez
Agronomy 2021, 11(12), 2489; https://doi.org/10.3390/agronomy11122489 - 8 Dec 2021
Cited by 19 | Viewed by 4023
Abstract
In viticulture, information about vine vigour is a key input for decision-making in connection with production targets. Pruning weight (PW), a quantitative variable used as an indicator of vegetative vigour, is associated with the quantity and quality of the grapes. Interest has grown in recent years in the use of unmanned aerial vehicles (UAVs), or drones, fitted with remote sensing payloads for more efficient crop management and the production of higher-quality wine. Research has shown that grape production, leaf area index, biomass, and other viticulture variables can be estimated from UAV imagery. Although structure from motion (SfM) photogrammetry lowers costs, saves time, and reduces the amount and type of resources needed, a review of the literature revealed no studies on its use to determine vineyard pruning weight. The main objective of this study was to predict PW in vineyards from a 3D point cloud generated with RGB images captured by a standard drone and processed by SfM. Vertical and oblique aerial images were taken in two vineyards of the Godello and Mencía varieties during the 2019 and 2020 seasons using a conventional Phantom 4 Pro drone. Pruning weight was measured on sampling grids comprising 28 calibration cells for Godello and 59 cells for Mencía (39 for calibration and 20 for independent validation). The volume of vegetation (V) was estimated from the generated 3D point cloud, and PW was estimated by linear regression with V as the predictor variable. Under leave-one-out cross-validation (LOOCV), the PW estimate for Mencía 2020 computed from oblique images over the 39 calibration cells gave an R² of 0.71 and an RMSE of 224.5 g. The regression results for the 20 independently held-out validation samples (R² = 0.62; RMSE = 249.3 g) confirm the viability of SfM as a fast, non-destructive, low-cost procedure for estimating pruning weight.
(This article belongs to the Special Issue Geoinformatics Application in Agriculture)
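
The regression-and-validation recipe in the abstract translates almost directly into code: fit PW (g) as a linear function of canopy volume V and score it with leave-one-out cross-validation. The numbers below are synthetic placeholders, not the study's field data.

```python
# Linear regression PW ~ V with leave-one-out cross-validation (LOOCV).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(1)
V = rng.uniform(0.5, 3.0, size=(39, 1))            # vegetation volume per cell (synthetic)
PW = 300.0 * V[:, 0] + rng.normal(0, 120, 39)      # pruning weight (g) with noise

pred = cross_val_predict(LinearRegression(), V, PW, cv=LeaveOneOut())
ss_res = np.sum((PW - pred) ** 2)
ss_tot = np.sum((PW - PW.mean()) ** 2)
print("LOOCV R2:", round(1 - ss_res / ss_tot, 2),
      "RMSE (g):", round(float(np.sqrt(ss_res / len(PW))), 1))
```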

13 pages, 3904 KiB  
Article
Distinguishing Drones from Birds in a UAV Searching Laser Scanner Based on Echo Depolarization Measurement
by Jacek Wojtanowski, Marek Zygmunt, Tadeusz Drozd, Marcin Jakubaszek, Marek Życzkowski and Michał Muzal
Sensors 2021, 21(16), 5597; https://doi.org/10.3390/s21165597 - 19 Aug 2021
Cited by 17 | Viewed by 3882
Abstract
The widespread availability of drones brings many fascinating possibilities that in the past were reserved for a few. Unfortunately, the technology also has negative consequences related to illegal activities (surveillance, smuggling). For this reason, particularly sensitive areas should be equipped with sensors capable of detecting even miniature drones from as far away as possible. A few techniques currently exist in this field; however, all have significant drawbacks. This study presents a novel laser-scanning technique for detecting small (<5 kg) drones, together with a method to discriminate UAVs from birds. The latter challenge is fundamental to minimizing the false alarm rate of any drone monitoring equipment. The paper describes the developed sensor and its performance in terms of drone-versus-bird discrimination. The idea is based on a simple cross-polarization ratio analysis of the optical echo received from laser backscattering on the detected object. The experimental results show that the proposed method does not always guarantee 100 percent discrimination efficiency, but it provides a useful confidence-level distribution. Nevertheless, owing to its hardware simplicity, the approach is a valuable addition to the anti-drone laser scanner under development.
(This article belongs to the Section Radar Sensors)
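
The cross-polarization ratio analysis reduces to comparing co- and cross-polarized echo power and thresholding their ratio; the toy sketch below assumes that bird plumage depolarizes the return more than smooth drone surfaces, and the threshold value is purely illustrative, not taken from the paper.

```python
# Toy depolarization-ratio classifier for laser echoes.
def depolarization_ratio(p_cross: float, p_co: float) -> float:
    return p_cross / (p_co + 1e-12)

def classify(p_cross: float, p_co: float, threshold: float = 0.3) -> str:
    # Assumption: a higher ratio means stronger depolarization, i.e. a more
    # bird-like echo; the threshold here is illustrative only.
    return "bird" if depolarization_ratio(p_cross, p_co) > threshold else "drone"

for name, (pc, pco) in {"echo A": (0.08, 1.0), "echo B": (0.45, 1.0)}.items():
    print(name, "->", classify(pc, pco))
```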
