AD-YOLO: A Unified Method for Traffic-Dense and Small Object Detection in UAV Images
Highlights
- AD-YOLO improves small object detection in traffic-dense UAV images by integrating adaptive orientation-aware feature extraction, dual-path cross-scale feature fusion, and a reparameterized large-kernel fusion module.
- AD-YOLO outperforms baseline models in detection accuracy (45.4 mAP50 on VisDrone2019 and 35.4 mAP50 on UAVDT) at an acceptable computational cost (15.3 M parameters, 14.1 GFLOPs), demonstrating its robustness and application potential under complex aerial perspectives.
- Jointly modeling object orientations, multi-scale contexts, and bidirectional feature interactions improves the detection of densely distributed, scale-varying objects in UAV images.
- AD-YOLO offers a concise yet effective approach to boosting detection performance on dense and small objects in UAV imagery, without requiring extensive modifications to the original framework.
Abstract
1. Introduction
- (1) To mitigate information loss during the feature extraction of diverse traffic objects, we propose the AG module, which boosts the backbone network’s capability to extract multi-orientation and multi-scale object features via two key components: the adaptive rotational convolution unit (ARCUnit) and the group directional attention mechanism with mixed kernels (GDA-MK).
- (2) To alleviate the fine-detail loss caused by downsampling and the information redundancy induced by upsampling in conventional feature pyramid architectures, we propose the DPCFPN. By coupling the multi-directional context aggregation path (MDCAP) with the hierarchical semantic progressive fusion path (HSPFP), the DPCFPN synergistically captures fine-grained spatial details and high-level abstract semantics, thereby improving detection consistency for multi-scale objects.
- (3) To enhance the representation of deep features embedded in the DPCFPN, we introduce the hierarchically dense reparameterized large-kernel (HDRepLK) module. Without significantly increasing the number of model parameters, HDRepLK effectively expands the network’s receptive field and enhances its capacity to fuse multi-scale contextual information.
- (4) Extensive experiments conducted on two mainstream UAV-based traffic object datasets, VisDrone2019 [4] and UAVDT [9], demonstrate that AD-YOLO outperforms SOTA methods in detection accuracy with acceptable computational costs, verifying its strong robustness and promising potential for application to complex aerial perspectives.
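
Contribution (3) relies on reparameterization, so a small illustration of the general idea may help. The sketch below is not the authors' HDRepLK module. It assumes a simplified 1D setting with no normalization layers, and all names (`conv1d_same`, `merge_kernels`, the kernel values) are hypothetical. It shows only that a large-kernel branch and a parallel small-kernel branch can be folded into one large kernel after training, so the extra branch adds no inference-time parameters.

```python
def conv1d_same(x, k):
    """Cross-correlate x with an odd-length kernel k using zero 'same' padding."""
    r = len(k) // 2
    padded = [0.0] * r + list(x) + [0.0] * r
    return [sum(k[j] * padded[i + j] for j in range(len(k)))
            for i in range(len(x))]


def merge_kernels(large, small):
    """Zero-pad the small kernel to the large kernel's length, then add them."""
    pad = (len(large) - len(small)) // 2
    padded_small = [0.0] * pad + list(small) + [0.0] * pad
    return [a + b for a, b in zip(large, padded_small)]


# Hypothetical input signal and kernels, chosen only for illustration.
x = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0]
k5 = [0.1, -0.2, 0.3, 0.4, -0.1]   # "large" branch
k3 = [0.5, 1.0, -0.5]              # parallel "small" branch

# Training-time view: two branches whose outputs are summed.
two_branch = [a + b for a, b in zip(conv1d_same(x, k5), conv1d_same(x, k3))]

# Inference-time view: one convolution with the merged kernel.
merged = conv1d_same(x, merge_kernels(k5, k3))

max_diff = max(abs(a - b) for a, b in zip(two_branch, merged))
print(f"largest difference between the two views: {max_diff:.2e}")
```

Because convolution is linear, the two views agree up to floating-point rounding. Real reparameterized blocks usually also fold batch-normalization statistics into the merged kernel, which this sketch omits.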
2. Literature Review
2.1. Traditional Traffic Object Detection
2.2. Traffic Object Detection in UAV Images
2.2.1. Feature Extraction Mechanisms
Convolutional Deformation Mechanisms
Attention Mechanisms
2.2.2. Feature Fusion Networks
2.3. Transformer-Based Methods for UAV Object Detection
2.4. Summary of Limitations in Existing Research
- Inadequate modeling of geometric variations: Standard convolutions and generic attention mechanisms lack explicit orientation alignment and joint direction–scale collaborative modeling. Even with the adoption of deformable convolutions [24,25,26,27,28,29,47] or dimension-independent attention modules [33,35,36,37], existing models still struggle to effectively capture arbitrary orientations, weak textures, and dense layouts.
- Inefficient cross-scale feature fusion: Commonly used feature pyramids such as PANet [39] and adaptive FPN [44] rely primarily on feedforward fusion mechanisms. This reliance restricts the deep-shallow feature interaction necessary to mitigate detail attenuation and maintain feature consistency across scales in dense, small-object detection [48,49].
- Trade-off between detection accuracy and computational efficiency: High-accuracy models often achieve their gains through module stacking, leading to excessive parameters and prohibitive computational costs. Conversely, lightweight designs typically sacrifice detection accuracy in complex aerial scenes. There remains a lack of unified architectures that inherently balance representational capacity with computational efficiency, particularly for resource-constrained aerial perception systems.
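
The first limitation concerns explicit orientation alignment. One generic way to make a convolution orientation-aware is to resample its kernel on a rotated grid, and the sketch below illustrates that idea only. It is not the ARCUnit described in Section 3.2.1. The function names are hypothetical, and the angle is passed in by hand, whereas an adaptive scheme would predict it from the input features.

```python
import math


def bilinear(kernel, y, x):
    """Bilinearly sample a square kernel at (y, x); outside the kernel counts as 0."""
    n = len(kernel)
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < n and 0 <= xx < n:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * kernel[yy][xx]
    return value


def rotate_kernel(kernel, theta):
    """Resample a square kernel rotated by theta radians about its center."""
    n = len(kernel)
    c = (n - 1) / 2.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            y, x = i - c, j - c
            # Inverse-rotate the target position to find where to sample.
            src_x = cos_t * x + sin_t * y + c
            src_y = -sin_t * x + cos_t * y + c
            out[i][j] = bilinear(kernel, src_y, src_x)
    return out


# Hypothetical 3x3 kernel used only to make the rotation visible.
base = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]]

quarter_turn = rotate_kernel(base, math.pi / 2)
small_turn = rotate_kernel(base, 0.3)
for row in quarter_turn:
    print([round(v, 3) for v in row])
```

With rows running downward, a quarter turn moves the left column of `base` to the top row. For other angles the corner taps fall partly outside the original support and are attenuated, which is one reason practical designs combine several rotated kernels rather than relying on a single resampled one.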
3. Methodology
3.1. Overview
3.2. Adaptive Guidance Module
3.2.1. Adaptive Rotational Convolution Unit
3.2.2. Group Directional Attention Mechanism with Mixed Kernels
3.3. Dual-Path Collaborative Feature Pyramid Network
3.3.1. Multi-Directional Context Aggregation Path
3.3.2. Hierarchical Semantic Progressive Fusion Path
3.4. Hierarchically Dense Reparameterized Large-Kernel
4. Experiments
4.1. Datasets
4.2. Experimental Setup
4.3. Evaluation Metrics
4.4. Comparison with SOTA Baselines
4.4.1. Overall Performance
4.4.2. Performance Comparison on Objects with Varying Shapes and Scales
4.5. Ablation Studies
4.5.1. Effectiveness of the Group Directional Attention Mechanism with Mixed Kernels
4.5.2. Effectiveness of the Dual-Path Collaborative Feature Pyramid Network
4.5.3. Component-Wise Ablation Analysis
4.6. Visualization Analysis
4.7. Deployment Feasibility
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
1. Zhang, R.; Wang, B.; Zhang, J.; Bian, Z.; Feng, C.; Ozbay, K. When language and vision meet road safety: Leveraging multimodal large language models for video-based traffic accident analysis. Accid. Anal. Prev. 2025, 219, 108077.
2. Muhammad, K.; Hussain, T.; Ullah, H.; Del Ser, J.; Rezaei, M.; Kumar, N.; Hijji, M.; Bellavista, P.; de Albuquerque, V.H.C. Vision-based semantic segmentation in scene understanding for autonomous driving: Recent achievements, challenges, and outlooks. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22694–22715.
3. Outay, F.; Mengash, H.A.; Adnan, M. Applications of unmanned aerial vehicle (UAV) in road safety, traffic and highway infrastructure management: Recent advances and challenges. Transp. Res. Part A Policy Pract. 2020, 141, 116–129.
4. Zhu, P.; Wen, L.; Du, D.; Bian, X.; Fan, H.; Hu, Q.; Ling, H. Detection and tracking meet drones challenge. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7380–7399.
5. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
6. Vijayakumar, A.; Vairavasundaram, S. YOLO-based object detection models: A review and its applications. Multimed. Tools Appl. 2024, 83, 83535–83574.
7. Wang, C.; Han, Y.; Yang, C.; Wu, M.; Chen, Z.; Yun, L.; Jin, X. CF-YOLO for small target detection in drone imagery based on YOLOv11 algorithm. Sci. Rep. 2025, 15, 16741.
8. Wang, T.; Ma, Z.; Yang, T.; Zou, S. PETNet: A YOLO-based prior enhanced transformer network for aerial image detection. Neurocomputing 2023, 547, 126384.
9. Yu, H.; Li, G.; Zhang, W.; Huang, Q.; Du, D.; Tian, Q.; Sebe, N. The unmanned aerial vehicle benchmark: Object detection, tracking and baseline. Int. J. Comput. Vis. 2020, 128, 1141–1159.
10. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
11. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
12. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
13. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
14. Jocher, G. Ultralytics YOLOv5. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 30 January 2026).
15. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8. 2023. Available online: https://github.com/topics/yolov8 (accessed on 30 January 2026).
16. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-time end-to-end object detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011.
17. Jocher, G.; Qiu, J. Ultralytics YOLOv11. 2024. Available online: https://github.com/topics/yolo11 (accessed on 30 January 2026).
18. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 21–37.
19. Mhalla, A.; Chateau, T.; Gazzah, S.; Amara, N.E.B. An embedded computer-vision system for multi-object detection in traffic surveillance. IEEE Trans. Intell. Transp. Syst. 2019, 20, 4006–4018.
20. Chen, C.; Liu, B.; Wan, S.; Qiao, P.; Pei, Q. An edge traffic flow detection scheme based on deep learning in an intelligent transportation system. IEEE Trans. Intell. Transp. Syst. 2021, 22, 1840–1852.
21. Charouh, Z.; Ezzouhri, A.; Ghogho, M.; Guennoun, Z. A resource-efficient CNN-based method for moving vehicle detection. Sensors 2022, 22, 1193.
22. Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 764–773.
23. Zhu, X.; Hu, H.; Lin, S.; Dai, J. Deformable ConvNets v2: More deformable, better results. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9300–9308.
24. Shin, Y.; Shin, H.; Ok, J.; Back, M.; Youn, J.; Kim, S. DCEF2-YOLO: Aerial detection YOLO with deformable convolution–efficient feature fusion for small target detection. Remote Sens. 2024, 16, 1071.
25. Xu, X.; Xing, Z.; Sun, M.; Zhang, P.; Yang, K. Enhancing UAV object detection through multi-scale deformable convolutions and adaptive fusion attention. J. Supercomput. 2025, 81, 1301.
26. Peng, J.; Lv, K.; Wang, G.; Xiao, W.; Ran, T.; Yuan, L. MLSA-YOLO: A multi-level feature fusion and scale-adaptive framework for small object detection. J. Supercomput. 2025, 81, 528.
27. Wang, W.; Li, S.; Shao, J.; Jumahong, H. LKC-Net: Large kernel convolution object detection network. Sci. Rep. 2023, 13, 9535.
28. Shi, C.; Zheng, X.; Zhao, Z.; Zhang, K.; Su, Z.; Lu, Q. LSKF-YOLO: Large selective kernel feature fusion network for power tower detection in high-resolution satellite remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–16.
29. Wang, Z.; Li, Y.; Liu, Y.; Meng, F. Improved object detection via large kernel attention. Expert Syst. Appl. 2024, 240, 122507.
30. Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122.
31. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
32. Qiu, Y.; Sha, F.; Niu, L. DKA-YOLO: Enhanced small object detection via dilation kernel aggregation convolution modules. IEEE Access 2024, 12, 187353–187366.
33. Jiang, T.; Li, C.; Yang, M.; Wang, Z. An improved YOLOv5s algorithm for object detection with an attention mechanism. Electronics 2022, 11, 2494.
34. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
35. Wang, J.; Wu, J.; Wu, J.; Wang, J.; Wang, J. YOLOv7 optimization model based on attention mechanism applied in dense scenes. Appl. Sci. 2023, 13, 9173.
36. Wang, S.; Liu, Y.; Wang, X.; Xu, J. An improved YOLO algorithm for UAV detection in formation flight. Signal Image Video Process. 2025, 19, 195.
37. Li, M.; Chen, Y.; Zhang, T.; Huang, W. TA-YOLO: A lightweight small object detection model based on multi-dimensional trans-attention module for remote sensing images. Complex Intell. Syst. 2024, 10, 5459–5473.
38. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
39. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768.
40. Ghiasi, G.; Lin, T.Y.; Le, Q.V. NAS-FPN: Learning scalable feature pyramid architecture for object detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 7036–7045.
41. Gong, Y.; Yu, X.; Ding, Y.; Peng, X.; Zhao, J.; Han, Z. Effective fusion factor in FPN for tiny object detection. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 1159–1167.
42. Shi, Z.; Hu, J.; Ren, J.; Ye, H.; Yuan, X.; Ouyang, Y.; He, J.; Ji, B.; Guo, J. HS-FPN: High frequency and spatial perception FPN for tiny object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 6896–6904.
43. Meng, X.; Yuan, F.; Zhang, D. Improved model MASW YOLO for small target detection in UAV images based on YOLOv8. Sci. Rep. 2025, 15, 25027.
44. Yang, G.; Lei, J.; Zhu, Z.; Cheng, S.; Feng, Z.; Liang, R. AFPN: Asymptotic feature pyramid network for object detection. In Proceedings of the 2023 IEEE International Conference on Systems, Man, and Cybernetics; IEEE: Piscataway, NJ, USA, 2023; pp. 2184–2189.
45. Liao, D.; Zhang, J.; Tao, Y.; Jin, X. ATBHC-YOLO: Aggregate transformer and bidirectional hybrid convolution for small object detection. Complex Intell. Syst. 2025, 11, 38.
46. He, J.; Liu, B.; Chen, H. HDPNet: Hourglass vision transformer with dual-path feature pyramid for camouflaged object detection. In Proceedings of the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision; IEEE: Piscataway, NJ, USA, 2025; pp. 8638–8647.
47. Cao, D.; Chen, Z.; Gao, L. An improved object detection algorithm based on multi-scaled and deformable convolutional neural networks. Hum.-Centric Comput. Inf. Sci. 2020, 10, 14.
48. Zhou, L.; Zhao, S.; Li, S.; Wang, Y.; Liu, Y.; Zuo, X. A lightweight object detection method based on fine-grained information extraction and exchange in UAV aerial images. Knowl.-Based Syst. 2025, 315, 113253.
49. Wu, P.; Xu, Y.; Ma, Y.; Zhang, Y.; Xu, Y. LYA-YOLO: A lightweight and accurate YOLO model in drone aerial image scenes. Expert Syst. Appl. 2026, 321, 132166.
50. Pu, Y.; Wang, Y.; Xia, Z.; Han, Y.; Wang, Y.; Gan, W.; Wang, Z.; Song, S.; Huang, G. Adaptive rotated convolution for rotated object detection. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 6566–6577.
51. Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient multi-scale attention module with cross-spatial learning. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5.
52. Yang, Z.; Guan, Q.; Zhao, K.; Yang, J.; Xu, X.; Long, H.; Tang, Y. Multi-branch auxiliary fusion YOLO with re-parameterization heterogeneous convolutional for accurate object detection. In Proceedings of the Pattern Recognition and Computer Vision, Urumqi, China, 18–20 October 2024; pp. 492–505.
53. Xue, C.; Xia, Y.; Wu, M.; Chen, Z.; Cheng, F.; Yun, L. EL-YOLO: An efficient and lightweight low-altitude aerial objects detector for onboard applications. Expert Syst. Appl. 2024, 256, 124848.
54. Feng, Y.; Huang, J.; Du, S.; Ying, S.; Yong, J.H.; Li, Y.; Ding, G.; Ji, R.; Gao, Y. Hyper-YOLO: When visual object detection meets hypergraph computation. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 2388–2401.
55. Luo, H.; Wang, Y.; Chen, Y.; Li, X.; Zhan, J.; Zuo, D. EBC-YOLO: A remote sensing target recognition model adapted for complex environments. Earth Sci. Inform. 2025, 18, 282.
56. Zhao, X.; Zhang, H.; Zhang, W.; Ma, J.; Li, C.; Ding, Y.; Zhang, Z. MSUD-YOLO: A novel multiscale small object detection model for UAV aerial images. Drones 2025, 9, 429.
57. Zhang, Y.; Chen, X.; Sun, S.; You, H.; Wang, Y.; Lin, J.; Wang, J. Vehicle detection in drone aerial views based on lightweight OSD-YOLOv10. Sci. Rep. 2025, 15, 25155.
58. Li, M.; Liang, X.; Hu, Q.; Lin, Y.e.; Xia, C. Multi-scale feature fusion with knowledge distillation for object detection in aerial imagery. Eng. Appl. Artif. Intell. 2025, 158, 111518.
59. Li, Z.; Lian, S.; Pan, D.; Wang, Y.; Liu, W. AD-Det: Boosting object detection in UAV images with focused small objects and balanced tail classes. Remote Sens. 2025, 17, 1556.
60. Yan, H.; Kong, X.; Wang, J.; Tomiyama, H. ST-YOLO: An enhanced detector of small objects in unmanned aerial vehicle imagery. Drones 2025, 9, 338.
61. Liu, Y.; Shao, Z.; Hoffmann, N. Global attention mechanism: Retain information to enhance channel-spatial interactions. arXiv 2021, arXiv:2112.05561.
62. Shaw, P.; Uszkoreit, J.; Vaswani, A. Self-attention with relative position representations. arXiv 2018, arXiv:1803.02155.
63. Li, X.; Wang, W.; Hu, X.; Yang, J. Selective kernel networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 510–519.
64. Liu, S.; Huang, D.; Wang, Y. Learning spatial fusion for single-shot object detection. arXiv 2019, arXiv:1911.09516.
65. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626.

| Component | Configuration |
|---|---|
| Central processing unit | Intel Core Ultra 7 (20-core) |
| Graphics processing unit | NVIDIA GeForce RTX 5090D (24 GB VRAM) |
| Memory | 32 GB RAM |
| Software | Python 3.8.20, PyTorch 2.2.1, CUDA 12.1 |

| Hyper-Parameter | Value |
|---|---|
| Input resolution | 640 × 640 |
| Total epochs | 300 |
| Batch size | 16 |
| Optimizer | Stochastic gradient descent |
| Initial learning rate | 0.01 |
| Final learning rate | 0.0001 (cosine decay) |
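
The hyper-parameter table gives only the endpoints of the learning-rate schedule (0.01 and 0.0001) and labels it "cosine decay". A minimal sketch of one common cosine formulation follows. It assumes a per-epoch update over the 300 epochs and no warm-up phase, neither of which is stated in the table, and the function name `cosine_lr` is hypothetical.

```python
import math


def cosine_lr(epoch, total_epochs=300, lr_initial=0.01, lr_final=0.0001):
    """Cosine-annealed learning rate at a given epoch (assumed formulation)."""
    progress = epoch / total_epochs
    return lr_final + 0.5 * (lr_initial - lr_final) * (1.0 + math.cos(math.pi * progress))


# Sample the schedule at a few epochs to see its shape.
for epoch in (0, 75, 150, 225, 300):
    print(f"epoch {epoch:3d}: lr = {cosine_lr(epoch):.6f}")
```

Under this formulation the rate starts at 0.01, passes through the midpoint value 0.00505 at epoch 150, and reaches 0.0001 at epoch 300. Training frameworks differ in details such as warm-up and per-iteration updates, so the actual curve used in the experiments may not match this one exactly.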

| Method | P | R | mAP50 | mAP75 | mAP50:95 | Params (M) | FPS | GFLOPs |
|---|---|---|---|---|---|---|---|---|
| YOLOv5-S [14] | 41.6 | 32.3 | 31.4 | 18.9 | 17.3 | 7.1 | 170 | 16.0 |
| YOLOv5-M [14] | 46.7 | 35.2 | 35.2 | 22.5 | 20.2 | 20.9 | 133 | 23.8 |
| YOLOv8-S [15] | 49.5 | 38.7 | 36.1 | 23.9 | 23.4 | 11.1 | 250 | 28.8 |
| YOLOv8-M [15] | 53.3 | 41.2 | 38.9 | 26.3 | 25.6 | 25.8 | 130 | 23.2 |
| YOLOv10-S [16] | 51.0 | 38.1 | 39.3 | 24.7 | 23.8 | 8.1 | 345 | 24.8 |
| YOLOv10-M [16] | 53.9 | 42.1 | 42.3 | 26.5 | 25.8 | 16.5 | 174 | 15.3 |
| YOLOv11-S [17] | 51.8 | 38.1 | 39.0 | 22.8 | 23.4 | 9.4 | 167 | 9.41 |
| YOLOv11-M [17] | 54.0 | 43.1 | 43.9 | 27.1 | 26.9 | 20.0 | 103 | 20.0 |
| Hyper-YOLO [54] | 50.6 | 38.8 | 39.6 | — | 23.8 | 11.2 | — | 39.0 |
| EL-YOLO [53] | 48.8 | 40.8 | 42.9 | — | 24.8 | 6.7 | — | 1.1 |
| EBC-YOLO [55] | 55.3 | 42.0 | 44.3 | — | 26.7 | 10.2 | — | 35.5 |
| MSUD-YOLO [56] | 53.0 | 42.0 | 43.4 | — | 25.6 | 6.8 | 134 | — |
| CF-YOLO [7] | 52.8 | 43.4 | 44.9 | — | 27.5 | 23.9 | 377 | 23.9 |
| AD-YOLO (Ours) | 56.9 | 43.5 | 45.4 | 27.8 | 27.8 | 15.3 | 192 | 14.1 |

| Method | P | R | mAP50 | mAP75 | mAP50:95 | Params (M) | FPS | GFLOPs |
|---|---|---|---|---|---|---|---|---|
| YOLOv5-S [14] | 36.3 | 30.3 | 28.6 | 17.3 | 16.9 | 7.1 | 204 | 16.0 |
| YOLOv5-M [14] | 40.4 | 35.6 | 31.4 | 20.8 | 19.3 | 20.9 | 165 | 23.8 |
| YOLOv8-S [15] | 36.2 | 30.2 | 29.4 | 18.4 | 17.4 | 11.1 | 285 | 28.8 |
| YOLOv8-M [15] | 35.8 | 34.2 | 30.3 | 18.9 | 17.8 | 25.8 | 160 | 23.2 |
| YOLOv10-S [16] | 35.4 | 27.4 | 26.3 | 15.9 | 15.0 | 8.1 | 370 | 24.8 |
| YOLOv10-M [16] | 39.7 | 29.8 | 29.4 | 17.3 | 16.9 | 16.5 | 217 | 15.3 |
| YOLOv11-S [17] | 44.8 | 33.9 | 31.2 | 17.7 | 17.4 | 9.4 | 190 | 9.4 |
| YOLOv11-M [17] | 39.6 | 30.1 | 31.8 | 21.8 | 19.9 | 20.0 | 155 | 20.3 |
| MSDC-DETR [25] | — | — | 30.6 | — | 18.6 | 19.6 | — | 59.1 |
| OSD-YOLO [57] | 42.3 | 33.1 | 31.5 | — | 17.8 | 1.6 | — | 7.9 |
| MFF-KD [58] | — | — | 33.9 | 23.5 | 21.3 | 10.5 | — | — |
| AD-Det [59] | — | — | 34.2 | 21.9 | 20.1 | 64.1 | — | 107.2 |
| ST-YOLO [60] | — | — | 33.4 | — | — | 9.00 | — | 20.1 |
| PETNet [8] | — | — | 38.6 | 22.3 | 21.5 | 83.0 | — | 63.9 |
| AD-YOLO (Ours) | 45.2 | 34.4 | 35.4 | 25.8 | 23.0 | 15.3 | 273 | 14.1 |

| Class | YOLOv5-M [14] | YOLOv8-M [15] | YOLOv10-M [16] | YOLOv11-M [17] | AD-YOLO |
|---|---|---|---|---|---|
| Pedestrian | 42.9 | 45.4 | 46.7 | 47.8 | 49.1 |
| Person | 32.3 | 35.1 | 35.9 | 36.2 | 37.9 |
| Bicycle | 13.5 | 15.3 | 17.1 | 17.6 | 20.1 |
| Car | 74.9 | 78.3 | 81.5 | 81.8 | 82.3 |
| Van | 38.6 | 43.7 | 49.2 | 49.7 | 50.3 |
| Truck | 31.1 | 37.4 | 41.5 | 43.0 | 44.3 |
| Tricycle | 20.3 | 23.7 | 33.7 | 34.1 | 34.9 |
| Awning-tricycle | 10.9 | 14.4 | 17.9 | 18.2 | 19.6 |
| Bus | 47.3 | 52.1 | 61.9 | 62.6 | 64.2 |
| Motorcycle | 40.2 | 43.6 | 49.3 | 50.2 | 51.2 |
| mAP50 | 35.2 | 38.9 | 42.3 | 43.9 | 45.4 |

| Method | Car | Truck | Bus | mAP50 |
|---|---|---|---|---|
| YOLOv5-M [14] | 66.6 | 4.8 | 22.8 | 31.4 |
| YOLOv8-M [15] | 67.7 | 3.5 | 19.7 | 30.3 |
| YOLOv10-M [16] | 68.4 | 3.0 | 23.1 | 31.5 |
| YOLOv11-M [17] | 69.7 | 3.5 | 22.2 | 31.8 |
| AD-YOLO (Ours) | 71.0 | 4.9 | 30.3 | 35.4 |

| Method | VisDrone2019 | | | UAVDT | | |
|---|---|---|---|---|---|---|
| YOLOv5-M [14] | 11.3 | 27.6 | 32.6 | 10.7 | 25.5 | 32.1 |
| YOLOv8-M [15] | 13.6 | 36.1 | 39.1 | 10.1 | 23.6 | 29.6 |
| YOLOv10-M [16] | 14.5 | 37.6 | 45.8 | 11.1 | 24.2 | 28.7 |
| YOLOv11-M [17] | 15.6 | 38.4 | 49.6 | 11.5 | 25.7 | 31.1 |
| AD-YOLO (Ours) | 16.2 | 39.3 | 51.2 | 12.1 | 27.9 | 32.5 |

| Method | mAP50 | | | | Params (M) | GFLOPs |
|---|---|---|---|---|---|---|
| YOLOv8-M [15] | 38.9 | 12.6 | 33.4 | 42.0 | 25.8 | 23.2 |
| ARCUnit (baseline) | 40.5 | 13.4 | 34.2 | 42.6 | 22.8 | 20.3 |
| +Global-attention [61] | 39.5 | 12.9 | 33.5 | 42.2 | 23.6 | 21.1 |
| +Self-attention [62] | 40.3 | 13.2 | 33.9 | 42.4 | 22.8 | 20.5 |
| +Selective kernel network [63] | 41.3 | 13.9 | 35.5 | 42.4 | 33.9 | 28.6 |
| +CBAM [34] | 41.5 | 14.0 | 35.8 | 42.6 | 22.8 | 20.4 |
| +GDA-MK | 41.6 | 14.3 | 35.9 | 43.6 | 22.8 | 20.4 |

| Method | mAP50 | | | | Params (M) | GFLOPs |
|---|---|---|---|---|---|---|
| YOLOv8-M (baseline) [15] | 38.9 | 12.6 | 33.4 | 42.0 | 25.9 | 23.2 |
| +Adaptive FPN [44] | 39.5 | 13.1 | 33.9 | 42.5 | 17.9 | 16.5 |
| +Adaptively spatial feature fusion [64] | 41.3 | 14.1 | 35.7 | 43.1 | 27.1 | 24.8 |
| +MDCAP | 42.1 | 14.6 | 36.1 | 42.1 | 23.4 | 21.2 |
| +HSPFP | 42.7 | 14.6 | 37.7 | 43.7 | 17.3 | 15.8 |
| +DPCFPN (MDCAP+HSPFP) | 43.3 | 15.4 | 37.4 | 43.9 | 16.6 | 15.1 |

| Baseline | AG | DPCFPN | HDRepLK | mAP50 | | | | Params (M) | GFLOPs |
|---|---|---|---|---|---|---|---|---|---|
| ✓ | | | | 38.9 | 12.6 | 33.4 | 42.0 | 25.86 | 23.2 |
| ✓ | ✓ | | | 41.6 | 14.3 | 35.9 | 43.6 | 22.76 | 20.4 |
| ✓ | | ✓ | | 43.3 | 15.4 | 37.4 | 43.9 | 16.63 | 15.1 |
| ✓ | | ✓ | ✓ | 44.1 | 15.9 | 38.3 | 46.9 | 18.35 | 16.8 |
| ✓ | ✓ | ✓ | | 42.7 | 14.8 | 36.8 | 43.9 | 13.53 | 12.4 |
| ✓ | ✓ | ✓ | ✓ | 45.4 | 16.2 | 39.3 | 51.2 | 15.26 | 14.1 |
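
The accuracy-versus-cost trade-off in the component-wise ablation can be read directly from its first row (baseline only) and last row (all modules enabled). The short calculation below uses only those two rows. It assumes that mAP50 is reported in percent and the parameter count in millions, and the variable names are hypothetical.

```python
# Values copied from the first and last rows of the component-wise ablation.
baseline = {"mAP50": 38.9, "params_m": 25.86, "gflops": 23.2}
full = {"mAP50": 45.4, "params_m": 15.26, "gflops": 14.1}


def relative_change(new, old):
    """Signed percentage change from old to new."""
    return 100.0 * (new - old) / old


map_gain = full["mAP50"] - baseline["mAP50"]
param_change = relative_change(full["params_m"], baseline["params_m"])
gflops_change = relative_change(full["gflops"], baseline["gflops"])

print(f"mAP50 change:      {map_gain:+.1f} points")
print(f"parameter change:  {param_change:+.1f} %")
print(f"GFLOPs change:     {gflops_change:+.1f} %")
```

On these figures the full model gains 6.5 mAP50 points over the baseline while using roughly 41% fewer parameters and 39% fewer GFLOPs.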
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Deng, Y.; Hu, Y.; Ye, Y.; Xu, P. AD-YOLO: A Unified Method for Traffic-Dense and Small Object Detection in UAV Images. Drones 2026, 10, 338. https://doi.org/10.3390/drones10050338

