Vehicle Detection in Drone Aerial Views Based on Lightweight YOLOv10-IAD
Abstract
1. Introduction
- Involution is introduced into the backbone network. Proposed by Li et al. at CVPR 2021, Involution adopts a channel-shared, space-specific design that dynamically generates kernels from the input content, inverting the design principles of standard convolution [24]. This allows us to expand the kernel size from 3 × 3 to 7 × 7 with a negligible increase in parameters, significantly enlarging the receptive field and enhancing spatial perception for dense, small-scale targets.
- The ACmix module is embedded into the neck network. Proposed by Pan et al. at CVPR 2022, ACmix unifies self-attention and convolution by showing that the bulk of the computation in both paradigms is carried out by the same operation; it then fuses local response aggregation and global context modeling via learnable scalars [25]. This integration improves the efficiency of multi-scale feature interaction while preserving both local details and global semantics.
- The DyHead module replaces the original detection head. Proposed by Dai et al. at CVPR 2021, DyHead applies three serial dynamic enhancement units (scale-aware, spatial-aware, and task-aware attention) to dynamically recalibrate features [26]. This enables the detection head to adapt its response to the input content, thereby improving localization accuracy for occluded and scale-variant targets.
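
The claim that Involution can grow from a 3 × 3 to a 7 × 7 kernel at negligible parameter cost can be checked with a small counting sketch. It assumes the kernel-generation bottleneck described in the Involution paper [24] (a 1 × 1 reduce layer followed by a 1 × 1 span layer); the channel width, group count, and reduction ratio below are hypothetical values chosen for illustration, not the configuration used in YOLOv10-IAD.

```python
"""Parameter-count sketch: standard convolution vs. involution kernel generation."""


def conv_params(channels: int, kernel_size: int) -> int:
    """Weights in a standard convolution mapping C -> C channels (bias ignored)."""
    return channels * channels * kernel_size * kernel_size


def involution_params(channels: int, kernel_size: int,
                      groups: int, reduction: int) -> int:
    """Weights in the two 1x1 layers that generate involution kernels.

    reduce: C -> C/r        contributes C * (C/r) weights
    span:   C/r -> K*K*G    contributes (C/r) * K*K*G weights
    Bias and normalization parameters are ignored.
    """
    hidden = channels // reduction
    return channels * hidden + hidden * kernel_size * kernel_size * groups


if __name__ == "__main__":
    # Hypothetical layer width, group count, and reduction ratio (assumptions).
    C, G, R = 64, 4, 4
    for k in (3, 7):
        print(f"K={k}: conv={conv_params(C, k):>7,}  "
              f"involution={involution_params(C, k, G, R):>6,}")
```

Under these assumed values, moving from K = 3 to K = 7 adds 2,560 weights to the involution layer, compared with 163,840 for a standard convolution of the same width, because only the span layer depends on the kernel size.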
2. Methodology
2.1. YOLOv10 Algorithm
2.2. YOLOv10-IAD Network Model
2.2.1. Feature Extraction Network
2.2.2. Feature Fusion Network
2.2.3. Detection Head
3. Results and Discussion
3.1. Experimental Environments and Dataset
3.2. Evaluation Indexes
3.3. Experimental Results and Analysis
3.3.1. Ablation Experiments
3.3.2. Comparison and Analysis of Different Models
3.3.3. Performance Analysis on Samples of Different Difficulty
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Yang, J. Research on Vehicle Target Detection Algorithm from UAV Perspective. Ph.D. Thesis, Xidian University, Xi’an, China, 2021. (In Chinese)
- Telikani, A.; Sarkar, A.; Du, B.; Shen, J. Machine Learning for UAV-Aided ITS: A Review with Comparative Study. IEEE Trans. Intell. Transp. Syst. 2024, 25, 15388–15406.
- Rahman, M.H.; Sejan, M.A.S.; Aziz, M.A.; Tabassum, R.; Baik, J.I.; Song, H.K. A Comprehensive Survey of Unmanned Aerial Vehicles Detection and Classification Using Machine Learning Approach: Challenges, Solutions, and Future Directions. Remote Sens. 2024, 16, 879.
- Nikouei, M.; Baroutian, B.; Nabavi, S.; Taraghi, F.; Aghaei, A.; Sajedi, A.; Moghaddam, M.E. Small Object Detection: A Comprehensive Survey on Challenges, Techniques and Real-World Applications. Intell. Syst. Appl. 2025, 27, 200561.
- Hua, W.; Chen, Q. A Survey of Small Object Detection Based on Deep Learning in Aerial Images. Artif. Intell. Rev. 2025, 58, 162.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 580–587.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 779–788.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 2117–2125.
- Hansen, K.S.; Bruun, F.M.; Sermsar, F.; Nygaard, M.; Koca, M. Comparative Analysis of SSD and Faster R-CNN in UAV-Based Vehicle Detection. In Proceedings of the 2024 8th International Artificial Intelligence and Data Processing Symposium (IDAP), Malatya, Turkey, 21–22 September 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
- Ghasemi Darehnaei, Z.; Shokouhifar, M.; Mirhosseini, S.M.; Yazdanjouei, H. Two-Stage Swarm Intelligence Ensemble Deep Transfer Learning (SI-EDTL) for Vehicle Detection Using Unmanned Aerial Vehicles. Concurr. Comput. Pract. Exp. 2022, 34, e6726.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 21–37.
- Kang, J.; Yang, H.; Kim, H. Simplifying Two-Stage Object Detectors for On-Board Remote Sensing. IEEE Access 2025, 13, 145703–145713.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430.
- Xiang, Y.; Li, B.; Wan, T. Vehicle Detection Algorithm for UAV Aerial Photography Based on Improved YOLOv5. Comput. Meas. Control 2025, 33, 48–56. (In Chinese)
- Zhu, L.; Xiong, J.; Xiong, F.; Hu, H.; Jiang, Z. YOLO-Drone: Airborne Real-Time Detection of Dense Small Objects from High-Altitude Perspective. arXiv 2023, arXiv:2304.06925.
- Cao, J.; Qiao, G.; Chen, M.; Zou, X.; Liu, D. Improvement Strategy of YOLO Algorithm for Small Target Detection from High-Altitude View. J. Comput. Appl. 2024, 44, 280–285. (In Chinese)
- Zhou, Y.; Wang, L.; Zhang, H.; Huo, J. Vehicle Detection Method Based on Multi-Layer Selective Feature for UAV Aerial Images. J. King Saud Univ. - Comput. Inf. Sci. 2025, 37, 139.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Lake Tahoe, NV, USA, 3–6 December 2012; Curran Associates: Red Hook, NY, USA, 2012; pp. 1097–1105.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 8759–8768.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 7464–7473.
- Li, D.; Hu, J.; Wang, C.; Li, X.; She, Q.; Zhu, L.; Zhang, T.; Chen, Q. Involution: Inverting the Inherence of Convolution for Visual Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 12321–12330.
- Pan, X.; Ge, C.; Lu, R.; Song, S.; Chen, G.; Huang, Z.; Huang, G. On the Integration of Self-Attention and Convolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 815–825.
- Dai, X.; Chen, Y.; Xiao, B.; Chen, D.; Liu, M.; Yuan, L.; Zhang, L. Dynamic Head: Unifying Object Detection Heads with Attentions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 7369–7378.
- Zhu, P.; Wen, L.; Du, D.; Bian, X.; Ling, H.; Hu, Q.; Nie, Q.; Cheng, H.; Liu, C.; Liu, X.; et al. VisDrone: The Vision Meets Drone Object Detection in Image/Video Challenge Results. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 1–21.
- Du, D.; Qi, Y.; Yu, H.; Yang, Y.; Duan, K.; Li, G.; Zhang, W.; Huang, Q.; Tian, Q. The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 370–386.










| Environment | Configuration |
|---|---|
| Operating System | Ubuntu 20.04 |
| GPU | NVIDIA RTX 3090 (24 GB) |
| CPU | Intel Xeon Platinum 8362 |
| Python | 3.8.19 |
| Deep Learning Framework | PyTorch 2.1.0 + CUDA 12.1 |

| Dataset | Number of Images | Target Instances | Avg. Instances per Image |
|---|---|---|---|
| CARPK | 1448 | 89,777 | 62 |
| AU-AIR | 32,823 | 132,000+ | 4 |
| DroneVehicle | 56,878 | 260,000+ | 4.6 |
| VEDAI | 1272 | 3500 | 2.8 |
| VisDrone2019 | 10,209 | 540,000+ | 50 |
| UAVDT | 80,000 | 840,000+ | 10.5 |

| Baseline | Modules added (Involution / DyHead / ACmix) | Params (M) | FLOPs (G) | R (%) | mAP50 (%) | mAP50–95 (%) | FPS |
|---|---|---|---|---|---|---|---|
| YOLOv10n | none | 2.3 | 6.7 | 39.4 | 43.5 | 19.8 | 182 |
| | √ | 2.4 | 6.9 | 40.1 | 44.0 | 20.5 | 178 |
| | √ | 2.5 | 7.0 | 40.2 | 44.3 | 20.8 | 169 |
| | √ | 2.5 | 7.1 | 40.5 | 44.9 | 21.2 | 166 |
| | √ √ | 2.7 | 7.3 | 41.2 | 45.6 | 21.8 | 162 |
| | √ √ | 2.7 | 7.4 | 41.5 | 46.4 | 22.2 | 158 |
| | √ √ | 2.8 | 7.5 | 41.8 | 46.8 | 22.8 | 155 |
| | √ √ √ | 2.9 | 7.7 | 42.5 | 47.2 | 23.0 | 153 |

| Baseline | Modules added (Involution / DyHead / ACmix) | Params (M) | FLOPs (G) | R (%) | mAP50 (%) | mAP50–95 (%) | FPS |
|---|---|---|---|---|---|---|---|
| YOLOv10n | none | 2.3 | 6.7 | 43.5 | 48.5 | 23.0 | 182 |
| | √ | 2.4 | 6.9 | 43.8 | 49.5 | 23.7 | 178 |
| | √ | 2.5 | 7.0 | 44.0 | 49.8 | 23.9 | 169 |
| | √ | 2.5 | 7.1 | 44.2 | 50.1 | 24.1 | 166 |
| | √ √ | 2.7 | 7.3 | 44.8 | 50.8 | 24.6 | 162 |
| | √ √ | 2.7 | 7.4 | 45.0 | 51.1 | 24.8 | 158 |
| | √ √ | 2.8 | 7.5 | 45.2 | 51.4 | 25.0 | 155 |
| | √ √ √ | 2.9 | 7.7 | 45.5 | 52.0 | 25.3 | 153 |
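
The trade-off recorded in the two ablation tables above can be summarized with a short calculation. The sketch below copies only the baseline (YOLOv10n) row and the full three-module row from each table; the dictionary keys and the `summarize` helper are illustrative names, not part of the original work. Accuracy changes are reported in percentage points and cost changes as a percentage of the baseline.

```python
# Baseline (YOLOv10n) and full-model rows copied from the two ablation tables above.
ABLATION = {
    "table_1": {
        "baseline": {"params": 2.3, "flops": 6.7, "recall": 39.4,
                     "map50": 43.5, "map50_95": 19.8, "fps": 182},
        "full": {"params": 2.9, "flops": 7.7, "recall": 42.5,
                 "map50": 47.2, "map50_95": 23.0, "fps": 153},
    },
    "table_2": {
        "baseline": {"params": 2.3, "flops": 6.7, "recall": 43.5,
                     "map50": 48.5, "map50_95": 23.0, "fps": 182},
        "full": {"params": 2.9, "flops": 7.7, "recall": 45.5,
                 "map50": 52.0, "map50_95": 25.3, "fps": 153},
    },
}


def summarize(baseline: dict, full: dict) -> dict:
    """Accuracy gains in percentage points; cost changes in percent of baseline."""
    return {
        "recall_gain_pts": round(full["recall"] - baseline["recall"], 1),
        "map50_gain_pts": round(full["map50"] - baseline["map50"], 1),
        "map50_95_gain_pts": round(full["map50_95"] - baseline["map50_95"], 1),
        "params_increase_pct": round(100 * (full["params"] / baseline["params"] - 1), 1),
        "flops_increase_pct": round(100 * (full["flops"] / baseline["flops"] - 1), 1),
        "fps_drop_pct": round(100 * (1 - full["fps"] / baseline["fps"]), 1),
    }


if __name__ == "__main__":
    for name, rows in ABLATION.items():
        print(name, summarize(rows["baseline"], rows["full"]))
```

For the first table this gives a 3.7-point mAP50 gain for roughly 26% more parameters and a 16% lower frame rate; the second table shows a 3.5-point mAP50 gain at the same cost.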

| Method | Params (M) | FLOPs (G) | R (%) | mAP50 (%) | mAP50–95 (%) |
|---|---|---|---|---|---|
| Faster R-CNN | 41.34 | 190 | 36.5 | 38.2 | 16.8 |
| Mask R-CNN | 43.99 | 243 | 37.2 | 39.0 | 17.2 |
| RT-DETR | 19.88 | 35.2 | 38.0 | 40.5 | 18.1 |
| YOLOv5n | 1.76 | 4.1 | 29.5 | 32.8 | 18.3 |
| YOLOv5s | 7.05 | 7.96 | 30.0 | 30.4 | 17.5 |
| YOLOv8n | 3.2 | 8.1 | 30.2 | 32.4 | 18.9 |
| YOLOv8s | 11.1 | 28.4 | 35.0 | 39.2 | 21.0 |
| YOLOv10n (Baseline) | 2.3 | 6.7 | 39.4 | 43.5 | 19.8 |
| YOLOv11n | 2.58 | 6.3 | 33.5 | 34.5 | 20.0 |
| YOLOv11s | 9.41 | 21.5 | 36.5 | 39.1 | 23.4 |
| YOLOv10-IAD (Ours) | 2.9 | 7.7 | 42.5 | 47.2 | 23.0 |

| Method | Params (M) | FLOPs (G) | R (%) | mAP50 (%) | mAP50–95 (%) |
|---|---|---|---|---|---|
| Faster R-CNN | 41.34 | 190 | 37.2 | 38.5 | 20.2 |
| Mask R-CNN | 43.99 | 243 | 38.1 | 39.2 | 20.8 |
| RT-DETR | 19.88 | 59.85 | 39.0 | 41.0 | 21.5 |
| YOLOv5n | 1.76 | 4.1 | 38.0 | 42.5 | 22.0 |
| YOLOv5s | 7.05 | 7.96 | 39.5 | 44.0 | 22.8 |
| YOLOv8n | 3.2 | 8.1 | 38.5 | 43.2 | 22.5 |
| YOLOv8s | 11.1 | 28.4 | 41.0 | 45.5 | 23.8 |
| YOLOv10n (Baseline) | 2.3 | 6.7 | 43.5 | 48.5 | 23.0 |
| YOLOv11n | 2.58 | 6.3 | 42.0 | 47.0 | 24.0 |
| YOLOv11s | 9.41 | 21.5 | 43.2 | 48.2 | 24.5 |
| YOLOv10-IAD (Ours) | 2.9 | 7.7 | 45.5 | 52.0 | 25.3 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhang, L.; Li, Z.; Yao, Y. Vehicle Detection in Drone Aerial Views Based on Lightweight YOLOv10-IAD. Sensors 2026, 26, 3585. https://doi.org/10.3390/s26113585

