A Scale-Adaptive Aggregation and Multi-Domain Feature Fusion Architecture for Small-Target Detection in UAV Aerial Imagery
Abstract
1. Introduction
- (1) Implementation of a Dedicated P2 Detection Branch. We design a new P2 detection head and reconstruct its feature fusion pathway to form a complete P2 branch. This branch gives the detector earlier access to fine-grained details and provides a dedicated stream for very small targets.
- (2) Backbone Enhancement via MBConv Integration. The original YOLOv11 backbone is re-engineered by embedding Mobile Inverted Bottleneck Convolution (MBConv) blocks. This redesign reduces computational cost while better preserving delicate small-object features, thereby alleviating feature degradation.
- (3) Development of a Scale-Adaptive Attention Fusion (SAF) Mechanism. We devise the SAF mechanism to fuse features from different network depths more effectively [15]. SAF applies dynamic weighting to adaptively narrow the semantic gaps between feature levels. Combined with a Channel-Adaptive Projection (CAP) module, it promotes coherent feature flow and improves detection across object scales.
- (4) Conception of a Multi-Domain Feature Attention Fusion (MDFAF) Module. This module jointly exploits and refines discriminative information from three complementary domains: spatial granularity, frequency-domain context, and explicit positional relations. By integrating frequency cues with positional information, MDFAF enhances local textures and structural details while maintaining global context [16]. This yields a more robust representation for small targets in cluttered UAV scenes.
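Why a P2 branch helps very small targets can be seen from feature-map geometry alone. The sketch below assumes the common pyramid convention in which level P_l has stride 2^l (so P2 has stride 4) and a 640 × 640 input; the 12 px object size and the helper names are illustrative and do not come from the paper.

```python
def pyramid_grid(img_size, levels=(2, 3, 4, 5)):
    """Return {level: (stride, grid_side)}, assuming stride = 2**level."""
    return {lvl: (2 ** lvl, img_size // 2 ** lvl) for lvl in levels}


def footprint_cells(obj_px, stride):
    """Side length of an object measured in feature-map cells."""
    return obj_px / stride


grids = pyramid_grid(640)
for level, (stride, side) in grids.items():
    cells = footprint_cells(12, stride)  # 12 px object: illustrative size
    print(f"P{level}: stride {stride:2d}, grid {side}x{side}, "
          f"12 px object spans {cells:.2f} cells")
```

Under these assumptions a 12 px object still covers several cells on P2 but less than one cell on P4 and P5, which is the motivation for giving very small targets their own high-resolution stream.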
2. Related Work
2.1. Object Detection Algorithms
2.2. Deep Learning Methods for UAV Aerial Small Object Detection
3. Methods for Implementation
3.1. Overview of MSCM-YOLO Model
3.2. Improvements in Backbone Structure
- (1) To address insufficient feature extraction capability, this paper first adopts the MBConv module as the basic building block. Its inverted residual structure and attention mechanism enhance feature representation while preserving computational efficiency, and its expand–compress channel transformation boosts the feature response for small objects. The architecture of the MBConv module is illustrated in Figure 2.
- (2) To address the insufficient use of low-level detail features in existing models, this paper introduces a Scale-Adaptive Attention Fusion (SAF) module. The module adaptively fuses high-resolution spatial details, mid-scale structural cues, and deep semantic context via a detail-enhancement pathway from shallow to deep layers. The SAF structure is illustrated in Figure 3.
- (3) To alleviate fusion bottlenecks caused by mismatched feature distributions and channel dimensions across branches, this study proposes a Channel-Adaptive Projection (CAP) module, as illustrated in Figure 4. Centered on a learnable 1 × 1 convolution, the CAP module unifies the channel dimensions of cross-source features, reducing semantic inconsistency and response misalignment during fusion. Unlike conventional projection operations, CAP integrates channel transformation with a compression–excitation mechanism, introducing global context-aware channel attention to selectively enhance informative features while suppressing redundant responses.
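The CAP description (a learnable 1 × 1 convolution combined with compression–excitation channel attention) can be sketched in plain Python on nested lists. This is not the authors' implementation: the order of operations, the ReLU and sigmoid choices, the absence of normalization layers, and all weights and tensor sizes are assumptions made for illustration.

```python
import math


def conv1x1(x, weight):
    """1x1 convolution. x: [C_in][H][W], weight: [C_out][C_in]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [[[sum(row_w[c] * x[c][i][j] for c in range(c_in))
              for j in range(w)]
             for i in range(h)]
            for row_w in weight]


def channel_gates(x, w_down, w_up):
    """One gate in (0, 1) per channel from globally pooled statistics."""
    # Squeeze: global average pooling per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(wk * p for wk, p in zip(row_w, pooled)))
              for row_w in w_down]
    return [1.0 / (1.0 + math.exp(-sum(wk * hk for wk, hk in zip(row_w, hidden))))
            for row_w in w_up]


def cap(x, proj_w, w_down, w_up):
    """Project to the target channel count, then reweight each channel."""
    y = conv1x1(x, proj_w)
    gates = channel_gates(y, w_down, w_up)
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, y)]


# Toy input (hypothetical values): 3 channels of size 2x2, projected to 2.
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, 1.0], [1.0, 0.0]],
     [[2.0, 2.0], [2.0, 2.0]]]
proj_w = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]   # [C_out=2][C_in=3]
w_down = [[0.5, 0.5]]                          # [hidden=1][C_out=2]
w_up = [[1.0], [-1.0]]                         # [C_out=2][hidden=1]

out = cap(x, proj_w, w_down, w_up)
print(len(out), len(out[0]), len(out[0][0]))   # channels, height, width
```

The projection fixes the channel count so that features from different branches can be combined, while the gates scale each projected channel by a factor computed from its global average, which is one way to realize "global context-aware channel attention".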
3.3. Improvements in Neck Structure
3.4. Improvements in Head Structure
4. Experiment
4.1. Datasets
4.2. Implementation Details
4.3. Evaluation Metrics
4.4. Ablation Study
4.4.1. Impact of Individual Components
4.4.2. Synergistic Effects of Module Combinations
4.4.3. Analysis of Efficiency Trade-Off
4.5. Comparisons with Other Object Detection Algorithms
4.5.1. Comparison with Lightweight YOLO Models
4.5.2. Comparison with High-Performance Detectors
4.5.3. Per-Category Performance Analysis
4.5.4. Overall Performance and Efficiency Comparison
4.6. Generalization Experiments
- UAVDT: Focuses on UAV-view vehicle detection in urban traffic scenes. It features dense vehicles, viewpoint/scale variations, frequent occlusion, motion blur, and complex backgrounds under diverse lighting and weather conditions, making detection a persistent challenge in real-world UAV surveillance.
- DIOR: A large-scale dataset with 20 common object categories in diverse optical remote sensing scenes, evaluating generalization across multiple land-cover types and more complex semantic contexts.
- AI-TOD: Specifically designed for tiny object detection in aerial images, where targets occupy an average of only 0.12% of the image area, testing the limit of feature extraction capability.
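To give a feel for the 0.12% figure quoted for AI-TOD, an area fraction can be converted to the side of an equivalent square box. The 640 × 640 input size is an assumption (the source does not say how AI-TOD images are resized), and `equivalent_side_px` is an illustrative helper rather than anything defined in the paper.

```python
import math


def equivalent_side_px(area_fraction, width, height):
    """Side of a square box covering `area_fraction` of a width x height image."""
    return math.sqrt(area_fraction * width * height)


# 0.12% of the image area, assumed 640 x 640 input.
side = equivalent_side_px(0.0012, 640, 640)
print(f"equivalent square side: {side:.2f} px")
```

Under that assumption the average target is roughly a 22 × 22 px box, which is why such data stress the shallow, high-resolution stages of a detector.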
4.6.1. Results on UAVDT
4.6.2. Results on DIOR and AI-TOD
5. Visualization
5.1. Comparison of Detection Results
5.2. Confusion Matrix Analysis
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Al-Gburi, S.H.; Al-Sammak, K.A.; Marghescu, I.; Oprea, C.C.; Drăgulinescu, A.-M.C.; Alduais, N.A.M.; Alheeti, K.M.A.; Al-Sammak, N.A.H. EffRes-DrowsyNet: A Novel Hybrid Deep Learning Model Combining EfficientNetB0 and ResNet50 for Driver Drowsiness Detection. Sensors 2025, 25, 3711.
- Xing, L.; Fan, X.; Dong, Y.; Xiong, Z.; Xing, L.; Yang, Y.; Bai, H.; Zhou, C. Multi-UAV cooperative system for search and rescue based on YOLOv5. Int. J. Disaster Risk Reduct. 2022, 76, 102972.
- Nex, F.; Armenakis, C.; Cramer, M.; Cucci, D.A.; Gerke, M.; Honkavaara, E.; Kukko, A.; Persello, C.; Skaloud, J. UAV in the advent of the twenties: Where we stand and what is next. ISPRS J. Photogramm. Remote Sens. 2022, 184, 215–242.
- Villarino, A.; Valenzuela, H.; Antón, N.; Domínguez, M.; Méndez Cubillos, X.C. UAV Applications for Monitoring and Management of Civil Infrastructures. Infrastructures 2025, 10, 106.
- Kabashkin, I.; Kulmurzina, A.; Nadimov, B.; Tlepiyeva, G.; Sansyzbayeva, Z.; Sultanov, T. Synchronized Multi-Point UAV-Based Traffic Monitoring for Urban Infrastructure Decision Support. Drones 2025, 9, 370.
- Zhou, B.; Liu, W.; Yang, H. Unmanned Aerial Vehicle Service Network Design for Urban Monitoring. Transp. Res. Part C Emerg. Technol. 2023, 157, 104406.
- Wang, Y.; Li, J.; Yang, X.; Peng, Q. UAV–Ground Vehicle Collaborative Delivery in Emergency Response: A Review of Key Technologies and Future Trends. Appl. Sci. 2025, 15, 9803.
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Vijayakumar, A.; Vairavasundaram, S. Yolo-based object detection models: A review and its applications. Multimed. Tools Appl. 2024, 83, 83535–83574.
- Li, H.; Huang, B.; Lv, J. YOLO-MFG: Multi-Scale and Feature-Preserving YOLO with Gated Attention for Remote Sensing Object Detection. IEEE Geosci. Remote Sens. Lett. 2025, 23, 1–5.
- Johari, R.; Joshi, M.; Singh, A.P.; Arora, A. FusionSight-YOLO: A Modified YOLOv8 Architecture with Multi-Channel Input Fusion for UAV-Based Object Detection. In Proceedings of the 2025 International Conference on Emerging Technology in Autonomous Aerial Vehicles (ETAAV), Bangalore, India, 18–20 August 2025; pp. 1–6.
- Ding, Y.; He, W.; Chen, Y.; Zhang, J. Multi-Dimensional Statistical Attention for Target Detection With Channel Weights and Spatial Cross. In Proceedings of the 2025 IEEE 8th International Conference on Mechatronics and Computer Technology Engineering (MCTE), Guangzhou, China, 29–31 August 2025; pp. 106–109.
- Wang, W.; Yang, T.; Wang, X. From Spatial to Frequency Domain: A Pure Frequency Domain FDNet Model for the Classification of Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–13.
- Fan, F.; Zhang, M.; Yu, D.; Li, J.; Liu, G. Efficient Remote Sensing Image Target Detection Network With Shape-Location Awareness Enhancements. IEEE Sens. J. 2024, 24, 30654–30667.
- Lu, X.; Chen, H.; Yang, H. HFE-YOLO for Small Object Detection in UAV Aerial Image. In Proceedings of the 2025 4th International Conference on Artificial Intelligence, Internet of Things and Cloud Computing Technology (AIoTC), Guilin, China, 8–10 August 2025; pp. 44–47.
- Zhu, Y.; Ma, Y.; Fan, F.; Huang, J.; Yao, Y.; Zhou, X.; Huang, R. Toward Robust Infrared Small Target Detection via Frequency and Spatial Feature Fusion. IEEE Trans. Geosci. Remote Sens. 2025, 63, 1–15.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
- Reis, D.; Kupec, J.; Hong, J.; Daoudi, A. Real-Time Flying Object Detection with YOLOv8. arXiv 2023, arXiv:2305.09972.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458.
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Ling, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y.; et al. VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 1–10.
- Wang, J.; Yang, W.; Guo, H.; Zhang, R.; Xia, G.-S. Tiny Object Detection in Aerial Images. In Proceedings of the International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 3791–3798.
- Xu, C.; Wang, J.; Yang, W.; Yu, H.; Yu, L.; Xia, G.-S. Detecting Tiny Objects in Aerial Images: A Normalized Wasserstein Distance and a New Benchmark. ISPRS J. Photogramm. Remote Sens. 2022, 190, 79–93.
- Wang, X.; Peng, Y.; Shen, C. Efficient Feature Fusion for UAV Object Detection. arXiv 2025, arXiv:2501.17983.
- Du, Z.; Hu, Z.; Zhao, G.; Jin, Y.; Ma, H. Cross-Layer Feature Pyramid Transformer for Small Object Detection in Aerial Images. arXiv 2024, arXiv:2407.19696.
- Li, H.; Qu, H. DASSF: Dynamic-Attention Scale-Sequence Fusion for Aerial Object Detection. In Proceedings of the International Conference on Computational Visual Media (CVM), Hong Kong, China, 19–21 April 2025; Springer Nature: Singapore, 2025; pp. 212–227.
- Teng, X.; Zhang, W.; Liu, T.; Yang, J.; Ma, M. STFF-RTDETR: An End-to-End Small Target Detection Network Based on Multi-Scale Feature Extraction for UAV Aerial Images. J. Supercomput. 2025, 81, 928.
- Liang, S.; Feng, X.; Xie, M.; Tang, Q.; Zhu, H.; Li, G. Lightweight YOLO-SR: A Method for Small Object Detection in UAV Aerial Images. Appl. Sci. 2025, 15, 13063.
- Zhuo, Z.; Lu, R.; Yao, Y.; Wang, S.; Zheng, Z.; Zhang, J.; Yang, X. TAF-YOLO: A Small-Object Detection Network for UAV Aerial Imagery via Visible and Infrared Adaptive Fusion. Remote Sens. 2025, 17, 3936.
- Liu, H.; Li, X.; Wang, L.; Zhang, Y.; Wang, Z.; Lu, Q. MS-YOLOv11: A Wavelet-Enhanced Multi-Scale Network for Small Object Detection in Remote Sensing Images. Sensors 2025, 25, 6008.
- Wang, A.; Fu, Z.; Zhao, Y.; Chen, H. A Remote Sensing Image Object Detection Model Based on Improved YOLOv11. Electronics 2025, 14, 2607.
- Zhang, P.; Zhao, X.; Yang, X.; Zhang, Z.; Bi, C.; Zhang, L. F3-YOLO: A Robust and Fast Forest Fire Detection Model. Forests 2025, 16, 1368.
- Wen, Z.; Li, P.; Liu, Y.; Chen, J.; Xiang, X.; Li, Y.; Wang, H.; Zhao, Y.; Zhou, G. FANet: Frequency-Aware Attention-Based Tiny-Object Detection in Remote Sensing Images. Remote Sens. 2025, 17, 4066.
- Wei, X.; Li, Z.; Wang, Y. SED-YOLO based multi-scale attention for small object detection in remote sensing. Sci. Rep. 2025, 15, 3125.
- Wang, X.; He, N.; Hong, C.; Wang, Q.; Chen, M. Improved YOLOX-X based UAV aerial photography object detection algorithm. Image Vis. Comput. 2023, 135, 104716.
- Padilla, R.; Passos, W.L.; Dias, T.L.B.; Netto, S.L.; da Silva, E.A.B. A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit. Electronics 2021, 10, 279.
- Xiao, Y.; Xu, T.; Xin, Y.; Li, J. FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection. arXiv 2025, arXiv:2504.20670.
- Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; Volume 12178, pp. 9756–9765.
- Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. TOOD: Task-aligned One-stage Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; IEEE Computer Society: Washington, DC, USA, 2021; Volume 12907, pp. 3490–3499.
- Huang, M.; Li, G.; Liu, Z.; Wu, Y.; Gong, C.; Zhu, L.; Yang, Y. Exploring viewport features for semi-supervised saliency prediction in omnidirectional images. Image Vis. Comput. 2023, 129, 104590.
- Wang, C.; He, W.; Nie, Y.; Guo, J.; Liu, C.; Wang, Y.; Han, K. Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism. Adv. Neural Inf. Process. Syst. 2023, 36, 51094–51112.
- Wu, S.; Lu, X.; Guo, C.; Guo, H. Accurate UAV Small Object Detection Based on HRFPN and EfficentVMamba. Sensors 2024, 24, 4966.
- Yang, F.; He, M.; Liu, J.; Jin, H. RMH-YOLO: A Refined Multi-Scale Architecture for Small-Target Detection in UAV Aerial Imagery. Sensors 2025, 25, 7088.
- Yao, B.; Zhang, C.; Meng, Q.; Sun, X.; Hu, X.; Wang, L.; Li, X. SRM-YOLO for Small Object Detection in Remote Sensing Images. Remote Sens. 2025, 17, 2099.
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-time Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974.
- Du, D.; Qi, Y.; Yu, H.; Yang, Y.; Duan, K.; Li, G.; Zhang, W.; Huang, Q.; Tian, Q. The unmanned aerial vehicle benchmark: Object detection and tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
- Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object Detection in Optical Remote Sensing Images: A Survey and A New Benchmark. ISPRS J. Photogramm. Remote Sens. 2020, 159, 296–307.
- Yu, G.; Chang, Q.; Lv, W.; Xu, C.; Cui, C.; Ji, W.; Dang, Q.; Deng, K.; Wang, G.; Du, Y.; et al. PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices. arXiv 2021, arXiv:2111.00902.
- Qiu, X.; Chen, Y.; Cai, W.; Niu, M.; Li, J. LD-YOLOv10: A Lightweight Target Detection Algorithm for Drone Scenarios Based on YOLOv10. Electronics 2024, 13, 3269.
- Zhang, Y.; Chen, X.; Sun, S.; You, H.; Wang, Y.; Lin, J.; Wang, J. Vehicle detection in drone aerial views based on lightweight OSD-YOLOv10. Sci. Rep. 2025, 15, 25155.
- Wang, Z.; Nayak, A. LSCNet: A Lightweight Shallow Feature Cascade Network for Small Object Detection in UAV Imagery. Future Internet 2025, 17, 568.
- Zhang, Y.; Ye, M.; Zhu, G.; Liu, Y.; Guo, P.; Yan, J. FFCA-YOLO for Small Object Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–15.
- Yu, R.; Zhang, Y.; Liu, S. CIMB-YOLOv8: A Lightweight Remote Sensing Object Detection Network Based on Contextual Information and Multiple Branches. Electronics 2025, 14, 2657.
- Hollard, L.; Mohimont, L.; Steffenel, L.A.; Gaveau, N. LeYOLO: New Embedded Architecture for Object Detection. In Proceedings of the Conference on Robots and Vision (CRV), Calgary, AB, Canada, 27–29 May 2025.
- Wang, J.; Bai, Z.; Zhang, X.; Qiu, Y.; Bu, F.; Shao, Y. HiDRA-DCDNet: Dynamic Hierarchical Attention and Multi-Scale Context Fusion for Real-Time Remote Sensing Small-Target Detection. Remote Sens. 2025, 17, 2195.












| Environment | Parameters |
|---|---|
| OS | Ubuntu 20.04 |
| CPU | Intel(R) Xeon(R) Silver 4210R |
| GPU | NVIDIA GeForce RTX 3090 |
| Language | Python 3.12 |
| CUDA Version | CUDA 12.2 |
| Framework | Torch 2.5.1 |

| Imgsz | Epochs | Batch | Optimizer | Lr0 | Momentum | Weight_Decay | Seed |
|---|---|---|---|---|---|---|---|
| 640 × 640 | 300 | 8 | SGD | 0.04 | 0.937 | 5 × 10⁻⁴ | 42 |
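The column names in the hyperparameter table match the keyword arguments of Ultralytics-style trainers, so the settings can be written down as a plain dictionary. This is only a sketch: the source does not state which training interface was used, and the model and dataset file names in the comment are placeholders rather than the authors' files.

```python
# Training settings transcribed from the hyperparameter table.
train_cfg = {
    "imgsz": 640,          # square input, 640 x 640
    "epochs": 300,
    "batch": 8,
    "optimizer": "SGD",
    "lr0": 0.04,           # initial learning rate
    "momentum": 0.937,
    "weight_decay": 5e-4,
    "seed": 42,
}

# Assumed usage with the Ultralytics package (not confirmed by the source):
#   from ultralytics import YOLO
#   YOLO("yolo11n.yaml").train(data="VisDrone.yaml", **train_cfg)

for key, value in train_cfg.items():
    print(f"{key:>12}: {value}")
```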

| Variant | Enabled modules (one √ per module among P2, MBConv, SAF+CAP, MDFAF) | P(%) | R(%) | mAP50(%) | mAP50:95(%) | Param(M) | FPS |
|---|---|---|---|---|---|---|---|
| A | - | 44.03 | 33.32 | 33.64 | 19.91 | 2.59 | 598.73 |
| B | √ | 48.40 | 35.90 | 37.47 | 22.72 | 2.67 | 550.01 |
| C | √ √ | 47.55 | 36.16 | 37.30 | 22.65 | 2.67 | 524.20 |
| D | √ √ | 48.56 | 38.31 | 39.44 | 23.99 | 2.97 | 399.79 |
| E | √ √ | 52.76 | 40.72 | 42.83 | 26.53 | 3.69 | 331.03 |
| F | √ √ √ | 50.77 | 37.66 | 39.48 | 23.91 | 2.97 | 295.06 |
| G | √ √ √ | 52.54 | 41.34 | 43.22 | 26.54 | 3.69 | 258.54 |
| H | √ √ √ | 52.27 | 42.25 | 44.18 | 27.32 | 3.91 | 223.66 |
| I | √ √ √ √ | 53.42 | 43.37 | 44.41 | 27.13 | 3.97 | 167.81 |
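The cost of the accuracy gain can be read off the first and last rows of the ablation table. The sketch below only redoes that arithmetic for variant A (baseline) and variant I (all four modules); the dictionary keys are illustrative names, and per-image latency is derived as 1000 / FPS.

```python
# Values copied from the ablation table (variants A and I).
baseline = {"mAP50": 33.64, "params_M": 2.59, "fps": 598.73}   # variant A
full = {"mAP50": 44.41, "params_M": 3.97, "fps": 167.81}       # variant I

map_gain = full["mAP50"] - baseline["mAP50"]                   # percentage points
param_growth = full["params_M"] / baseline["params_M"] - 1.0   # relative increase
fps_ratio = full["fps"] / baseline["fps"]                      # remaining throughput
latency_ms = {name: 1000.0 / row["fps"]
              for name, row in (("A", baseline), ("I", full))}

print(f"mAP50 gain: {map_gain:.2f} points")
print(f"parameter growth: {param_growth:.1%}")
print(f"throughput retained: {fps_ratio:.1%}")
print(f"latency: A {latency_ms['A']:.2f} ms, I {latency_ms['I']:.2f} ms")
```

On these numbers the full model gains about 10.8 mAP50 points for roughly 53% more parameters, while throughput falls to about 28% of the baseline.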

| Model | Param(M) | mAP50(%): ALL | Ped | Peo | Bic | Car | Van | Truck | Tri | Awn | Bus | Motor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ATSS | 31.0 | 36.9 | 42.7 | 22.3 | 18.8 | 76.6 | 41.4 | 36.9 | 28.4 | 8.5 | 52.1 | 41.4 |
| TOOD | 31.8 | 41.0 | 41.5 | 31.9 | 19.2 | 81.4 | 46.5 | 39.6 | 31.8 | 14.1 | 53.5 | 50.5 |
| YOLOv8 | 3.0 | 33.8 | 35.7 | 28.6 | 8.25 | 76.2 | 39.1 | 29.8 | 21.6 | 12.2 | 48.9 | 37.9 |
| VFNet | 33.5 | 41.3 | 41.8 | 25.4 | 20.0 | 80.4 | 47.4 | 41.7 | 35.1 | 15.5 | 57.0 | 48.8 |
| YOLOv11 | 2.6 | 33.6 | 35.9 | 27.3 | 8.37 | 76.1 | 39.9 | 30.5 | 21.0 | 11.9 | 47.2 | 37.4 |
| Gold-YOLO | - | 31.3 | 32.3 | 26.3 | 6.9 | 74.3 | 36.9 | 26.6 | 20.1 | 11.5 | 44.3 | 33.7 |
| YOLOv12 | 2.5 | 31.9 | 34.8 | 26.0 | 7.96 | 74.8 | 37.3 | 26.1 | 18.9 | 11.6 | 46.5 | 34.9 |
| HRMamba-YOLO | 33.5 | 38.9 | 37.5 | 26.8 | 21.6 | 68.2 | 46.8 | 41.6 | 30.9 | 19.9 | 58.9 | 36.8 |
| RT-DETR-R18 | 19.9 | 42.5 | 44.9 | 39.2 | 18.8 | 81.7 | 48.3 | 36.2 | 32.0 | 15.9 | 54.9 | 53.5 |
| DASSF-YOLO | - | 39.6 | 44.6 | 36.8 | 14.4 | 81.0 | 44.4 | 31.2 | 24.7 | 16.1 | 52.5 | 45.9 |
| RMH-YOLOv8n | 1.3 | 42.4 | - | - | - | - | - | - | - | - | - | - |
| SRM-YOLO | 3.2 | 39.4 | 52.7 | 42.1 | 12.7 | 80.5 | 43.1 | 34.1 | 25.9 | 14.8 | 53.6 | 45.2 |
| MSCM-YOLO | 3.9 | 44.4 | 50.7 | 41.2 | 16.6 | 83.6 | 48.8 | 39.9 | 32.1 | 17.7 | 61.8 | 51.7 |

| Model | Param(M) | P(%) | R(%) | mAP50(%) | mAP50:95(%) | FLOPs(G) |
|---|---|---|---|---|---|---|
| HRMamba-YOLO | 33.5 | - | - | 38.9 | - | 96.4 |
| DASSF-YOLO | - | 49.9 | 39.0 | 39.6 | 23.5 | - |
| RMH-YOLOv8n | 1.3 | 53.0 | 40.4 | 42.4 | 25.7 | 16.7 |
| SRM-YOLO | 3.2 | 49.4 | 38.1 | 39.4 | - | 15.9 |
| MSCM-YOLO | 3.9 | 53.4 | 43.3 | 44.4 | 27.1 | 18.7 |

| Dataset | Model | F1(%) | mAP50(%) | mAP50:95(%) |
|---|---|---|---|---|
| DIOR | ATSS | 82.9 | 83.1 | 52.1 |
| DIOR | YOLOv8 | 82.0 | 84.5 | 62.1 |
| DIOR | YOLOv11 | 82.5 | 85.2 | 63.0 |
| DIOR | YOLOv12 | 82.9 | 84.8 | 62.3 |
| DIOR | DASSF-YOLO | 81.2 | 81.4 | 59.2 |
| DIOR | CIMB-YOLO | 83.9 | 85.3 | 63.2 |
| DIOR | MSCM-YOLO | 84.7 | 88.2 | 66.2 |
| AI-TOD | YOLOv8 | 46.7 | 41.3 | 18.0 |
| AI-TOD | YOLOv11 | 46.6 | 41.0 | 17.6 |
| AI-TOD | FFCA-YOLO | 45.8 | 36.9 | 16.0 |
| AI-TOD | HGNetv2 | 44.4 | 39.5 | 17.4 |
| AI-TOD | LeYOLO | 43.2 | 37.5 | 16.0 |
| AI-TOD | HiDRA-DCDNet | 50.3 | 45.0 | 19.3 |
| AI-TOD | MSCM-YOLO | 54.4 | 48.3 | 21.6 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Sun, Z.; Zhang, G.; Xing, Y.; Liu, Y. A Scale-Adaptive Aggregation and Multi-Domain Feature Fusion Architecture for Small-Target Detection in UAV Aerial Imagery. Sensors 2026, 26, 1610. https://doi.org/10.3390/s26051610

