RTUAV-YOLO: A Family of Efficient and Lightweight Models for Real-Time Object Detection in UAV Aerial Imagery
Abstract
1. Introduction
- (1) To address the loss of contextual information for small objects caused by repeated downsampling in YOLOv11, we introduce a Progressive Dilated Separable Convolution Module (PDSCM) at the P4–P5 stages of the backbone. The module reduces feature downsampling and instead applies depth-wise separable convolutions with progressively increasing dilation rates, capturing objects at different scales and establishing contextual spatial relationships among those scales to strengthen multi-scale representations of small objects. A P2 detection head is also added to the head, letting the model attend to finer-grained features and improving its sensitivity to small objects.
- (2) To address feature imbalance and information loss when processing small-scale objects in conventional architectures, we design a lightweight Multi-Scale Feature Adaptive Modulation module (MSFAM) to replace the C3K2 module in the backbone. Its adaptive weight generation mechanism and dual-branch heterogeneous feature aggregation significantly strengthen feature extraction for small objects. We also introduce a Lightweight DownSampling Module (LDSM) to replace the convolutional modules in the backbone and neck, achieving efficient downsampling of feature maps while reducing computational complexity and preserving key small-object features.
- (3) To rectify the insensitivity of conventional Intersection over Union (IoU) metrics toward small objects, we develop a Minimum Point Distance Wise IoU (MPDWIoU) loss function. It integrates a minimum point distance metric, a dynamic anchor focusing strategy, and auxiliary bounding-box supervision to mitigate gradient imbalance and low precision in small-target localization while enhancing regression robustness in cluttered environments.
- (4) Comprehensive experiments on the VisDrone2019 dataset confirm that RTUAV-YOLO strikes an effective balance between computational efficiency and detection performance: compared with the YOLOv11 baseline, it improves mAP50 and mAP50-95 by an average of 3.4% and 2.4%, respectively, while reducing the parameter count by 65.3%. Comprehensive comparison results across different model series are presented in Figure 1. Furthermore, cross-dataset evaluations on the UAVDT and UAVVaste datasets confirm RTUAV-YOLO’s generalization and robustness across diverse UAV aerial imagery conditions. The complete implementation code is publicly available at https://gitee.com/zhangruizhi0110/RTUAV-YOLO (accessed on 20 October 2025).
2. Related Works
2.1. General Object Detection Models and Limitations
2.2. Object Detection Model for UAV Aerial Images
2.3. Bounding Box Regression Loss Function
3. Methods
3.1. RTUAV-YOLO Overall Framework
3.2. Multi-Scale Feature Adaptive Modulation Module
3.2.1. Dynamic Weight Generator
3.2.2. Dual-Branch Feature Extractor
3.2.3. Adaptive Feature Fusion
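The three MSFAM sub-steps listed above (dynamic weight generation, dual-branch extraction, adaptive fusion) can be caricatured in a few lines of plain Python. This is a sketch of the data flow only: the per-branch "logit" computed from a global average is a simplified stand-in for the learned weight generator, not the paper's actual implementation.

```python
import math

# Toy sketch of MSFAM-style adaptive fusion: two heterogeneous branch outputs
# are combined with data-dependent softmax weights. The per-branch logit
# (a global average) is a simplified stand-in for the learned weight generator.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_fuse(branch_a, branch_b):
    # 1) Dynamic weight generation: one logit per branch from its mean response.
    logits = [sum(branch_a) / len(branch_a), sum(branch_b) / len(branch_b)]
    w_a, w_b = softmax(logits)
    # 2) Adaptive fusion: weighted element-wise sum of the two branches.
    return [w_a * a + w_b * b for a, b in zip(branch_a, branch_b)]

# Equal mean responses give equal weights, so the fusion is a plain average:
print(adaptive_fuse([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # [2.0, 2.0, 2.0]
```

The point of the softmax gating is that the contribution of each branch is recomputed per input rather than fixed at training time, which is what lets the module re-balance features for small objects.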
3.3. Progressive Dilated Separable Convolution Module
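A quick way to see the mechanism behind progressive dilation: stacking stride-1 convolutions with growing dilation rates widens the effective receptive field by (k − 1)·d per layer at no extra parameter cost. The dilation schedule below is an assumed illustration, not necessarily the module's published configuration.

```python
# Effective receptive field (ERF) of a stack of stride-1 dilated convolutions.
# Illustrative only: the dilation rates [1, 2, 3] are an assumption, not the
# PDSCM's actual configuration.

def receptive_field(kernel_size, dilations):
    """ERF of sequentially stacked stride-1 convolutions with given dilations."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d  # each layer widens the span by (k-1)*d
    return rf

# Same depth and cost, very different context window:
plain = receptive_field(3, [1, 1, 1])        # three plain 3x3 convs -> 7
progressive = receptive_field(3, [1, 2, 3])  # progressively dilated -> 13
print(plain, progressive)  # 7 13
```

This is why the module can gather wider context for small objects without further downsampling the feature map.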
3.4. Lightweight DownSampling Module
3.5. Minimum Point Distance Wise IoU
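A minimal sketch of the minimum-point-distance idea the loss builds on. Hedged illustration only: the full MPDWIoU also includes dynamic anchor focusing and auxiliary bounding-box supervision, which are omitted here, and normalizing the corner distances by the image diagonal is an assumption made for this example.

```python
# Sketch of a minimum-point-distance IoU term. Boxes are (x1, y1, x2, y2).

def box_area(b):
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def iou(b1, b2):
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = box_area(b1) + box_area(b2) - inter
    return inter / union if union > 0 else 0.0

def mpd_iou(pred, gt, img_w, img_h):
    # Squared distances between matching top-left and bottom-right corners,
    # normalized by the image diagonal, so small boxes keep a usable gradient
    # even where the plain IoU term is flat (e.g., zero overlap).
    d_tl = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    d_br = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2
    diag_sq = img_w ** 2 + img_h ** 2
    return iou(pred, gt) - (d_tl + d_br) / diag_sq

# Perfect overlap scores 1.0; a shifted box is penalized twice, by the lower
# IoU and by its corner offsets:
print(mpd_iou((10, 10, 20, 20), (10, 10, 20, 20), 640, 640))  # 1.0
loss = 1.0 - mpd_iou((10, 10, 20, 20), (12, 12, 22, 22), 640, 640)
```

The corner-distance penalty is what gives two non-overlapping boxes a nonzero, direction-aware gradient, which plain IoU cannot provide.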
4. Experiments and Results
4.1. Datasets
4.2. Implementation Details
4.3. Evaluation Metrics
4.3.1. Precision
4.3.2. Recall
4.3.3. Mean Average Precision
4.3.4. Model Parameter Scale
4.3.5. Floating Point Operations
4.4. Comparative Experiments
4.4.1. Experimental Results on the VisDrone2019 Dataset
4.4.2. Experimental Results on UAVDT and UAVVaste Datasets
4.5. Ablation Experiments
4.6. Edge Computing Platform Deployment
5. Discussion
5.1. Resource Suitability Under UAV Constraints
5.2. Resolution Choice and Small-Object Fidelity
5.3. Cross-Domain Generalization Capability Verification
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Liu, X.; He, J.; Yao, Y.; Zhang, J.; Liang, H.; Wang, H.; Hong, Y. Classifying Urban Land Use by Integrating Remote Sensing and Social Media Data. Int. J. Geogr. Inf. Sci. 2017, 31, 1675–1696.
- Byun, S.; Shin, I.-K.; Moon, J.; Kang, J.; Choi, S.-I. Road Traffic Monitoring from UAV Images Using Deep Learning Networks. Remote Sens. 2021, 13, 4027.
- Cao, D.; Zhang, B.; Zhang, X.; Yin, L.; Man, X. Optimization Methods on Dynamic Monitoring of Mineral Reserves for Open Pit Mine Based on UAV Oblique Photogrammetry. Measurement 2023, 207, 112364.
- Albattah, W.; Masood, M.; Javed, A.; Nawaz, M.; Albahli, S. Custom CornerNet: A Drone-Based Improved Deep Learning Technique for Large-Scale Multiclass Pest Localization and Classification. Complex Intell. Syst. 2023, 9, 1299–1316.
- Zhang, H.; Wang, L.; Tian, T.; Yin, J. A Review of Unmanned Aerial Vehicle Low-Altitude Remote Sensing (UAV-LARS) Use in Agricultural Monitoring in China. Remote Sens. 2021, 13, 1221.
- Sun, G.; He, L.; Sun, Z.; Wu, Q.; Liang, S.; Li, J.; Niyato, D.; Leung, V.C.M. Joint Task Offloading and Resource Allocation in Aerial-Terrestrial UAV Networks With Edge and Fog Computing for Post-Disaster Rescue. IEEE Trans. Mob. Comput. 2024, 23, 8582–8600.
- Teixidó, P.; Gómez-Galán, J.A.; Caballero, R.; Pérez-Grau, F.J.; Hinojo-Montero, J.M.; Muñoz-Chavero, F.; Aponte, J. Secured Perimeter with Electromagnetic Detection and Tracking with Drone Embedded and Static Cameras. Sensors 2021, 21, 7379.
- Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 3520–3529.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788.
- Weng, S.; Wang, H.; Wang, J.; Xu, C.; Zhang, E. YOLO-SRMX: A Lightweight Model for Real-Time Object Detection on Unmanned Aerial Vehicles. Remote Sens. 2025, 17, 2313.
- Liu, Y.; He, M.; Hui, B. ESO-DETR: An Improved Real-Time Detection Transformer Model for Enhanced Small Object Detection in UAV Imagery. Drones 2025, 9, 143.
- Luo, X.; Zhu, X. YOLO-SMUG: An Efficient and Lightweight Infrared Object Detection Model for Unmanned Aerial Vehicles. Drones 2025, 9, 245.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725.
- Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision—ECCV 2014, Zurich, Switzerland, 6–12 September 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 740–755.
- Li, Y.; Li, Q.; Pan, J.; Zhou, Y.; Zhu, H.; Wei, H.; Liu, C. SOD-YOLO: Small-Object-Detection Algorithm Based on Improved YOLOv8 for UAV Images. Remote Sens. 2024, 16, 3057.
- Zhou, S.; Zhou, H.; Qian, L. A Multi-Scale Small Object Detection Algorithm SMA-YOLO for UAV Remote Sensing Images. Sci. Rep. 2025, 15, 9255.
- Guo, C.; Fan, B.; Zhang, Q.; Xiang, S.; Pan, C. AugFPN: Improving Multi-Scale Feature Learning for Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 12595–12604.
- Cheng, G.; Lang, C.; Wu, M.; Xie, X.; Yao, X.; Han, J. Feature Enhancement Network for Object Detection in Optical Remote Sensing Images. J. Remote Sens. 2021, 2021, 9805389.
- Zhang, K.; Shen, H. Multi-Stage Feature Enhancement Pyramid Network for Detecting Objects in Optical Remote Sensing Images. Remote Sens. 2022, 14, 579.
- Li, H.; Li, Y.; Xiao, L.; Zhang, Y.; Cao, L.; Wu, D. RLRD-YOLO: An Improved YOLOv8 Algorithm for Small Object Detection from an Unmanned Aerial Vehicle (UAV) Perspective. Drones 2025, 9, 293.
- Chang, J.; Lu, Y.; Xue, P.; Xu, Y.; Wei, Z. Automatic Channel Pruning via Clustering and Swarm Intelligence Optimization for CNN. Appl. Intell. 2022, 52, 17751–17771.
- Guo, S.; Wang, Y.; Li, Q.; Yan, J. DMCP: Differentiable Markov Channel Pruning for Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1539–1547.
- He, Y.; Zhang, X.; Sun, J. Channel Pruning for Accelerating Very Deep Neural Networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1389–1397.
- Liu, Z.; Li, J.; Shen, Z.; Huang, G.; Yan, S.; Zhang, C. Learning Efficient Convolutional Networks Through Network Slimming. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2736–2744.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856.
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features From Cheap Operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589.
- Fan, Q.; Li, Y.; Deveci, M.; Zhong, K.; Kadry, S. LUD-YOLO: A Novel Lightweight Object Detection Network for Unmanned Aerial Vehicle. Inf. Sci. 2025, 686, 121366.
- Cao, J.; Bao, W.; Shang, H.; Yuan, M.; Cheng, Q. GCL-YOLO: A GhostConv-Based Lightweight YOLO Network for UAV Small Object Detection. Remote Sens. 2023, 15, 4932.
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 658–666.
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
- Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation. arXiv 2020, arXiv:2005.03572.
- Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism. arXiv 2023, arXiv:2301.10051.
- Zhang, Y.-F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and Efficient IOU Loss for Accurate Bounding Box Regression. Neurocomputing 2022, 506, 146–157.
- Lu, S.; Lu, H.; Dong, J.; Wu, S. Object Detection for UAV Aerial Scenarios Based on Vectorized IOU. Sensors 2023, 23, 3061.
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y.; et al. VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Seoul, Republic of Korea, 27 October–2 November 2019.
- Du, D.; Qi, Y.; Yu, H.; Yang, Y.; Duan, K.; Li, G.; Zhang, W.; Huang, Q.; Tian, Q. The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 370–386.
- Kraft, M.; Piechocki, M.; Ptak, B.; Walas, K. Autonomous, Onboard Vision-Based Trash and Litter Detection in Low Altitude Aerial Images Collected by an Unmanned Aerial Vehicle. Remote Sens. 2021, 13, 965.
- Jocher, G.; Chaurasia, A.; Qiu, J. YOLOv8 by Ultralytics. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 20 August 2025).
- Zhong, H.; Zhang, Y.; Shi, Z.; Zhang, Y.; Zhao, L. PS-YOLO: A Lighter and Faster Network for UAV Object Detection. Remote Sens. 2025, 17, 1641.
- Zhao, X.; Zhang, H.; Zhang, W.; Ma, J.; Li, C.; Ding, Y.; Zhang, Z. MSUD-YOLO: A Novel Multiscale Small Object Detection Model for UAV Aerial Images. Drones 2025, 9, 429.
- Peng, H.; Xie, H.; Liu, H.; Guan, X. LGFF-YOLO: Small Object Detection Method of UAV Images Based on Efficient Local–Global Feature Fusion. J. Real-Time Image Proc. 2024, 21, 167.
- Xiao, Y.; Xu, T.; Xin, Y.; Li, J. FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 8673–8681.
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-Time Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 16965–16974.
- Hu, L.; Yuan, J.; Cheng, B.; Xu, Q. CSFPR-RTDETR: Real-Time Small Object Detection Network for UAV Images Based on Cross-Spatial-Frequency Domain and Position Relation. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5638219.
- Zhang, H.; Liu, K.; Gan, Z.; Zhu, G.-N. UAV-DETR: Efficient End-to-End Object Detection for Unmanned Aerial Vehicle Imagery. arXiv 2025, arXiv:2501.01855.
- Yang, F.; Fan, H.; Chu, P.; Blasch, E.; Ling, H. Clustered Object Detection in Aerial Images. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8311–8320.
- Deng, S.; Li, S.; Xie, K.; Song, W.; Liao, X.; Hao, A.; Qin, H. A Global-Local Self-Adaptive Network for Drone-View Object Detection. IEEE Trans. Image Process. 2021, 30, 1556–1569.
- Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–12 December 2020; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 21002–21012.
- Du, B.; Huang, Y.; Chen, J.; Huang, D. Adaptive Sparse Convolutional Networks With Global Context Enhancement for Faster Object Detection on Drone Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 13435–13444.
- Tang, S.; Zhang, S.; Fang, Y. HIC-YOLOv5: Improved YOLOv5 For Small Object Detection. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–16 May 2024; pp. 6614–6619.
- Suo, J.; Wang, T.; Zhang, X.; Chen, H.; Zhou, W.; Shi, W. HIT-UAV: A High-Altitude Infrared Thermal Dataset for Unmanned Aerial Vehicle-Based Object Detection. Sci. Data 2023, 10, 227.
- Wang, J.; Yang, W.; Guo, H.; Zhang, R.; Xia, G.-S. Tiny Object Detection in Aerial Images. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 3791–3798.
| Model | P (%) | R (%) | mAP50 (%) | mAP50-95 (%) | Params (M) | FLOPs (G) | FPS |
|---|---|---|---|---|---|---|---|
| YOLOv8-N | 42.5 | 33.9 | 33.3 | 19.6 | 3.16 | 8.9 | 184 |
| YOLOv8-S | 47.9 | 37.8 | 39.1 | 23.4 | 11.17 | 28.8 | 143 |
| YOLOv8-M | 53.7 | 43.2 | 44.4 | 27.1 | 25.90 | 79.3 | 89 |
| YOLOv8-L | 56.7 | 44.1 | 45.9 | 28.4 | 43.69 | 165.7 | 66 |
| YOLOv8-X | 57.5 | 44.6 | 46.8 | 28.9 | 68.23 | 258.5 | 48 |
| YOLOv11-N | 42.3 | 33.6 | 33.1 | 19.3 | 2.62 | 6.6 | 192 |
| YOLOv11-S | 48.2 | 38.1 | 39.4 | 23.6 | 9.46 | 21.7 | 160 |
| YOLOv11-M | 53.4 | 43.0 | 44.1 | 26.9 | 20.11 | 68.5 | 97 |
| YOLOv11-L | 56.9 | 44.3 | 46.0 | 28.5 | 25.37 | 87.6 | 75 |
| YOLOv11-X | 57.7 | 44.8 | 46.9 | 29.0 | 56.97 | 196.0 | 56 |
| RTUAV-YOLO-N | 44.7 | 35.6 | 35.9 | 21.3 | 0.79 | 8.3 | 187 |
| RTUAV-YOLO-S | 52.5 | 41.1 | 42.9 | 25.9 | 2.49 | 21.4 | 162 |
| RTUAV-YOLO-M | 57.5 | 44.7 | 46.8 | 28.9 | 7.23 | 56.1 | 109 |
| RTUAV-YOLO-L | 60.2 | 46.0 | 49.2 | 30.5 | 10.43 | 79.1 | 88 |
| RTUAV-YOLO-X | 61.3 | 48.1 | 51.4 | 32.4 | 22.86 | 168.1 | 65 |
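The 65.3% average parameter reduction quoted in the introduction can be recomputed from the Params column of the table above, comparing each RTUAV-YOLO variant against the YOLOv11 model of the same scale:

```python
# Verify the reported average parameter reduction vs. YOLOv11 (same scale),
# using the Params (M) values from the VisDrone2019 comparison table.
yolov11_params = {"N": 2.62, "S": 9.46, "M": 20.11, "L": 25.37, "X": 56.97}
rtuav_params = {"N": 0.79, "S": 2.49, "M": 7.23, "L": 10.43, "X": 22.86}

reductions = [1 - rtuav_params[k] / yolov11_params[k] for k in yolov11_params]
avg_reduction_pct = 100 * sum(reductions) / len(reductions)
print(round(avg_reduction_pct, 1))  # 65.3
```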
| Model | mAP50 (%) | mAP50-95 (%) | Params (M) | FLOPs (G) | FPS |
|---|---|---|---|---|---|
| PS-YOLO-N | 33.2 | 19.3 | 1.39 | 5.1 | 213 |
| PS-YOLO-S | 40.7 | 24.2 | 5.53 | 20.0 | 178 |
| PS-YOLO-M | 46.0 | 28.3 | 14.82 | 73.5 | 91 |
| MSUD-YOLO | 43.4 | - | 6.77 | - | 82 |
| LGFF-YOLO | 43.5 | 22.9 | 4.15 | 12.4 | 181 |
| RT-DETR-R18 | 44.6 | 26.7 | 20.03 | 60.0 | 101 |
| RT-DETR-R34 | 46.0 | 27.2 | 31.04 | 92.0 | 78 |
| RT-DETR-R50 | 48.3 | 28.8 | 42.02 | 136.0 | 69 |
| FBRT-YOLO-N | 34.4 | 20.2 | 0.94 | 6.7 | 192 |
| FBRT-YOLO-S | 42.4 | 25.9 | 2.92 | 22.9 | 143 |
| FBRT-YOLO-M | 45.9 | 28.4 | 7.26 | 58.7 | 94 |
| FBRT-YOLO-L | 47.7 | 29.7 | 14.63 | 119.2 | 70 |
| FBRT-YOLO-X | 48.4 | 30.1 | 22.89 | 185.8 | 52 |
| CSFPR-RTDETR | 42.3 | 24.9 | 14.09 | 63.9 | 98 |
| UAV-DETR-EV2 | 47.5 | 28.7 | 13.02 | 43.1 | 101 |
| UAV-DETR-R18 | 48.8 | 29.8 | 20.19 | 70.2 | 91 |
| UAV-DETR-R50 | 51.1 | 31.5 | 42.07 | 170.6 | 61 |
| RTUAV-YOLO-N | 35.9 | 21.3 | 0.79 | 8.3 | 187 |
| RTUAV-YOLO-S | 42.9 | 25.9 | 2.49 | 21.4 | 162 |
| RTUAV-YOLO-M | 46.8 | 28.9 | 7.23 | 56.1 | 109 |
| RTUAV-YOLO-L | 49.2 | 30.5 | 10.43 | 79.1 | 88 |
| RTUAV-YOLO-X | 51.4 | 32.4 | 22.86 | 168.1 | 65 |
| Model | mAP50 (%) | mAP75 (%) | mAP50-95 (%) |
|---|---|---|---|
| YOLOv11-X [11] | 28.7 | 16.8 | 15.8 |
| ClusDet [51] | 26.5 | 12.5 | 13.7 |
| GLSAN [52] | 28.1 | 18.8 | 17.0 |
| GFL [53] | 29.5 | 17.9 | 16.9 |
| CEASC [54] | 30.9 | 17.8 | 17.1 |
| FBRT-YOLO [47] | 31.1 | 18.9 | 18.4 |
| RTUAV-YOLO-X | 31.8 | 19.3 | 18.8 |
| Model | mAP50 (%) | mAP75 (%) | mAP50-95 (%) |
|---|---|---|---|
| YOLOv11-S [11] | 63.0 | 41.2 | 27.8 |
| HIC-YOLOv5 [55] | 65.1 | 43.1 | 30.5 |
| RT-DETR-R18 [48] | 72.6 | 45.7 | 36.3 |
| RT-DETR-R50 [48] | 73.5 | 47.1 | 37.4 |
| RTUAV-YOLO-X | 76.8 | 49.7 | 38.2 |
| Backbone | LDSM | MSFAM | PDSCM | MPDWIoU | mAP50 (%) | mAP50-95 (%) | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|---|---|---|
| × | × | × | × | × | 39.4 | 23.6 | 9.46 | 21.7 |
| √ | × | × | × | × | 39.9 | 23.9 | 3.87 | 27.0 |
| √ | √ | × | × | × | 40.1 | 24.0 | 3.22 | 23.7 |
| √ | × | √ | × | × | 40.5 | 24.6 | 3.62 | 25.7 |
| √ | × | × | √ | × | 40.4 | 24.3 | 3.56 | 25.4 |
| √ | × | × | × | √ | 40.3 | 24.2 | 3.87 | 27.0 |
| √ | √ | × | √ | × | 41.4 | 24.8 | 3.01 | 24.2 |
| √ | × | √ | √ | × | 41.6 | 24.9 | 3.12 | 23.9 |
| √ | √ | √ | × | × | 41.8 | 25.1 | 2.94 | 22.9 |
| √ | √ | √ | √ | × | 42.4 | 25.5 | 2.49 | 21.4 |
| √ | √ | √ | √ | √ | 42.9 | 25.9 | 2.49 | 21.4 |
| Loss Function | mAP50 (%) | mAP50-95 (%) | Params (M) | FLOPs (G) |
|---|---|---|---|---|
| GIoU [34] | 42.3 | 25.5 | 2.49 | 21.4 |
| DIoU [35] | 42.2 | 25.2 | 2.49 | 21.4 |
| CIoU [36] | 42.4 | 25.5 | 2.49 | 21.4 |
| WIoUv3 [37] | 42.5 | 25.6 | 2.49 | 21.4 |
| MPDWIoU (Ours) | 42.9 | 25.9 | 2.49 | 21.4 |
| Model | mAP50 (%) | mAP50-95 (%) | Params (M) | FPS |
|---|---|---|---|---|
| YOLOv8-N | 33.3 | 19.6 | 3.16 | 35.2 |
| YOLOv8-S | 39.1 | 23.4 | 11.17 | 20.1 |
| YOLOv11-N | 33.1 | 19.3 | 2.62 | 45.1 |
| YOLOv11-S | 39.4 | 23.6 | 9.46 | 23.3 |
| PS-YOLO-N | 33.2 | 19.3 | 1.39 | 41.1 |
| PS-YOLO-S | 40.7 | 24.2 | 5.53 | 26.9 |
| FBRT-YOLO-N | 34.4 | 20.2 | 0.94 | 52.6 |
| FBRT-YOLO-S | 42.4 | 25.9 | 2.92 | 34.1 |
| UAV-DETR-EV2 | 47.5 | 28.7 | 13.02 | 19.4 |
| RTUAV-YOLO-N | 35.9 | 21.3 | 0.79 | 53.7 |
| RTUAV-YOLO-S | 42.9 | 25.9 | 2.49 | 37.8 |
| Model | mAP50 (%) | mAP50-95 (%) | Params (M) | FLOPs (G) |
|---|---|---|---|---|
| YOLOv8-N | 71.4 | 43.2 | 3.16 | 8.9 |
| YOLOv8-S | 80.1 | 49.3 | 11.17 | 28.8 |
| YOLOv8-M | 86.2 | 54.2 | 25.90 | 79.3 |
| YOLOv11-N | 71.7 | 43.2 | 2.62 | 6.6 |
| YOLOv11-S | 80.6 | 49.5 | 9.46 | 21.7 |
| YOLOv11-M | 87.1 | 54.8 | 20.11 | 68.5 |
| RTUAV-YOLO-N | 76.9 | 43.2 | 0.79 | 8.3 |
| RTUAV-YOLO-S | 82.3 | 49.3 | 2.49 | 21.4 |
| RTUAV-YOLO-M | 89.2 | 57.1 | 7.23 | 56.1 |
| Model | mAP50 (%) | mAP50-95 (%) | Params (M) | FLOPs (G) |
|---|---|---|---|---|
| YOLOv8-N | 39.2 | 17.5 | 3.16 | 8.9 |
| YOLOv8-S | 43.6 | 19.1 | 11.17 | 28.8 |
| YOLOv8-M | 46.1 | 22.3 | 25.90 | 79.3 |
| YOLOv11-N | 39.6 | 17.9 | 2.62 | 6.6 |
| YOLOv11-S | 43.9 | 19.2 | 9.46 | 21.7 |
| YOLOv11-M | 46.7 | 22.8 | 20.11 | 68.5 |
| RTUAV-YOLO-N | 40.1 | 18.6 | 0.79 | 8.3 |
| RTUAV-YOLO-S | 46.1 | 20.8 | 2.49 | 21.4 |
| RTUAV-YOLO-M | 49.2 | 25.3 | 7.23 | 56.1 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, R.; Hou, J.; Li, L.; Zhang, K.; Zhao, L.; Gao, S. RTUAV-YOLO: A Family of Efficient and Lightweight Models for Real-Time Object Detection in UAV Aerial Imagery. Sensors 2025, 25, 6573. https://doi.org/10.3390/s25216573

