YOLOFLY: A Consumer-Centric Framework for Efficient Object Detection in UAV Imagery
Abstract
1. Introduction
- The design of the C4f feature extraction module, which uses spatially separable convolution bottleneck blocks to significantly reduce computational costs while maintaining high feature extraction ability, thereby improving inference speed in UAV scenarios.
- The proposal of the DWcDetect head, which replaces some traditional convolutions with depthwise separable convolutions to drastically reduce computational costs and support real-time detection for fast-moving UAVs.
- The introduction of the MPSA multi-level attention mechanism, which enhances the model’s ability to capture fine-grained features and addresses issues with small and overlapping objects.
- The design of the new ACIoU loss function, which considers both aspect ratio and area ratio to overcome the limitations of the traditional IoU metric in handling area discrepancies, leading to improved prediction accuracy.
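To make the cost savings behind the second contribution concrete, the parameter counts of a standard convolution and a depthwise separable one can be compared directly. The kernel and channel sizes below are illustrative assumptions, not the actual DWcDetect layer shapes:

```python
# Parameter counts for a standard k x k convolution versus a depthwise
# separable one (a k x k depthwise stage plus a 1 x 1 pointwise stage).
def conv_params(k: int, c_in: int, c_out: int) -> int:
    # Standard convolution: every output channel sees every input channel.
    return k * k * c_in * c_out

def dw_separable_params(k: int, c_in: int, c_out: int) -> int:
    # Depthwise stage: one k x k filter per input channel.
    # Pointwise stage: a 1 x 1 convolution that mixes channels.
    return k * k * c_in + c_in * c_out

# Hypothetical head layer: 3 x 3 kernel, 128 -> 128 channels (assumed sizes).
standard = conv_params(3, 128, 128)           # 147,456 parameters
separable = dw_separable_params(3, 128, 128)  # 17,536 parameters
print(f"reduction: {standard / separable:.1f}x")
```

For these assumed sizes the separable form needs roughly an eighth of the parameters, which is why swapping it into the detection head shrinks both the parameter and FLOP columns reported later.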
2. Related Works
3. Method
3.1. C4f
3.2. DWcDetect
3.3. MPSA
3.4. ACIoU
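The introduction describes ACIoU as considering both aspect ratio and area ratio. The paper's exact formulation is not reproduced here; the sketch below is a hedged stand-in that augments a plain IoU loss with a CIoU-style aspect-ratio term and an illustrative area-ratio penalty:

```python
import math

# Hypothetical sketch of an IoU variant that, like ACIoU, penalizes both
# aspect-ratio and area-ratio mismatch. Both penalty terms are assumptions.
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def aciou_like_loss(pred, target):
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    tw, th = target[2] - target[0], target[3] - target[1]
    # Aspect-ratio penalty (CIoU-style arctan term).
    v = (4 / math.pi ** 2) * (math.atan(tw / th) - math.atan(pw / ph)) ** 2
    # Area-ratio penalty: zero when areas match, grows with the discrepancy.
    area_ratio = (pw * ph) / (tw * th)
    a = (1 - min(area_ratio, 1 / area_ratio)) ** 2
    return 1 - iou(pred, target) + v + a

# Identical boxes: IoU is 1 and both penalties vanish, so the loss is zero.
print(aciou_like_loss((0, 0, 10, 10), (0, 0, 10, 10)))  # 0.0
```

The area term is what a plain IoU (or CIoU) loss lacks: two boxes with the same center and aspect ratio but very different areas still incur a nonzero penalty here.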
4. Results
4.1. Experimental Environment and Parameter Settings
4.2. Ablation Experiment
- YOLOv11n: Baseline model without any added modules.
- YOLOC: Only the C4f module was added.
- YOLOD: Only the DWcDetect module was added.
- YOLOM: Only the MPSA module was added.
- YOLOA: Only the ACIoU loss function was added.
- YOLOCD: The C4f and DWcDetect modules were added.
- YOLOCM: The C4f and MPSA modules were added.
- YOLOCA: The C4f module and the ACIoU loss function were added.
- YOLODM: The DWcDetect and MPSA modules were added.
- YOLODA: The DWcDetect module and the ACIoU loss function were added.
- YOLOMA: The MPSA module and the ACIoU loss function were added.
- YOLOCDM: The C4f, DWcDetect, and MPSA modules were added.
- YOLOCDA: The C4f module, the DWcDetect module, and the ACIoU loss function were added.
- YOLOCMA: The C4f module, the MPSA module, and the ACIoU loss function were added.
- YOLODMA: The DWcDetect module, the MPSA module, and the ACIoU loss function were added.
- YOLOFLY: All modules (the C4f, DWcDetect, and MPSA modules, as well as the ACIoU loss function) were combined.
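The naming scheme is combinatorial: each variant appends one letter per enabled component (C = C4f, D = DWcDetect, M = MPSA, A = ACIoU), giving the baseline, all single, pair, and triple combinations, and the full model. A short convenience script (not from the paper) enumerates the same grid:

```python
from itertools import combinations

# Enumerate every ablation variant: the baseline plus all non-empty
# subsets of the four components, named by concatenating their letters.
components = "CDMA"  # C4f, DWcDetect, MPSA, ACIoU
variants = ["YOLOv11n"]  # baseline with no components added
for r in range(1, len(components) + 1):
    for combo in combinations(components, r):
        variants.append("YOLO" + "".join(combo))
variants[-1] = "YOLOFLY"  # the full combination is named YOLOFLY
print(len(variants), variants)
```

This yields 16 configurations (1 baseline + 4 singles + 6 pairs + 4 triples + 1 full), matching the ablation table below.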
- The C4f and DWcDetect modules effectively improve detection speed; in the YOLOCD model, which combines them, inference time drops from the 56.1 ms baseline to 27.6 ms.
- The MPSA module and the ACIoU loss function play a crucial role in enhancing detection precision, especially for small-object detection and bounding-box regression accuracy.
- The fully optimized YOLOFLY model significantly improves detection accuracy while maintaining excellent detection speed, striking a balance that suits applications requiring both real-time processing and high accuracy.
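The inference-time columns in the tables report milliseconds per frame; converting them to throughput makes the real-time claim concrete. Using the values reported for YOLOFLY and the YOLOv11n baseline:

```python
# Convert per-frame inference time (ms) to throughput (frames per second).
def fps(inference_ms: float) -> float:
    return 1000.0 / inference_ms

# Reported inference times from the comparison table (ms per frame).
print(round(fps(28.9), 1))  # YOLOFLY
print(round(fps(56.1), 1))  # YOLOv11n baseline
```

At 28.9 ms per 640 × 640 frame, YOLOFLY sustains roughly 34.6 FPS, about twice the baseline's throughput.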
4.3. Real-World Experiment
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Abbreviation | Full Name |
---|---|
YOLO | you only look once |
DWcDetect | depthwise separable convolution detection |
MPSA | multi-level pyramid split attention |
ACIoU | area-constrained intersection over union |
UAV | unmanned aerial vehicle |
mAP | mean average precision |
MS | missed detection rate for slow flight (5 m/s) |
MF | missed detection rate for fast flight (19 m/s) |
Model | Size (Pixels) | mAP50-95 (%) | Inference Time (ms) | Params (M) | FLOPs (B) |
---|---|---|---|---|---|
Faster R-CNN | 640 × 640 | 17.0 | 680.4 | 82.3 | 65.2 |
CenterNet | 640 × 640 | 12.5 | 272.7 | 32.67 | - |
RetinaNet | 640 × 640 | 11.8 | 73.0 | - | - |
SSD | 640 × 640 | 12.9 | 246.9 | 23.61 | 22.18 |
CornerNet | 640 × 640 | 17.4 | 96.2 | - | - |
YOLOv5n | 640 × 640 | 17.7 | 73.6 | 2.6 | 7.7 |
YOLOv8n | 640 × 640 | 19.26 | 80.7 | 3.2 | 8.7 |
YOLOv8m | 640 × 640 | 23.25 | 234.4 | 25.9 | 78.9 |
YOLOv9t | 640 × 640 | 20.28 | 73.6 | 2.1 | 7.7 |
YOLOv9m | 640 × 640 | 27.16 | 228.3 | 20.1 | 76.8 |
YOLOv10n | 640 × 640 | 21.46 | 60.8 | 2.3 | 6.7 |
YOLOv10m | 640 × 640 | 27.08 | 186.5 | 15.4 | 59.1 |
YOLOv11n | 640 × 640 | 21.52 | 56.1 | 2.6 | 6.5 |
YOLOv11m | 640 × 640 | 28.06 | 183.4 | 20.1 | 68.0 |
YOLOFLY | 640 × 640 | 24.72 | 28.9 | 2.0 | 4.7 |
Model | Size (Pixels) | mAP50-95 (%) | Inference Time (ms) | Params (M) | FLOPs (B) |
---|---|---|---|---|---|
YOLOv11n | 640 × 640 | 21.52 | 56.1 | 2.6 | 6.5 |
YOLOC | 640 × 640 | 21.53 | 46.2 | 2.3 | 5.6 |
YOLOD | 640 × 640 | 21.28 | 43.8 | 2.3 | 5.6 |
YOLOM | 640 × 640 | 22.49 | 60.9 | 2.7 | 6.8 |
YOLOA | 640 × 640 | 22.74 | 56.2 | 2.6 | 6.5 |
YOLOCD | 640 × 640 | 21.51 | 27.6 | 2.0 | 4.7 |
YOLOCM | 640 × 640 | 22.56 | 50.7 | 2.4 | 5.8 |
YOLOCA | 640 × 640 | 22.83 | 46.4 | 2.3 | 5.6 |
YOLODM | 640 × 640 | 22.38 | 49.5 | 2.4 | 5.9 |
YOLODA | 640 × 640 | 22.65 | 44.0 | 2.3 | 5.5 |
YOLOMA | 640 × 640 | 24.64 | 61.1 | 2.7 | 6.7 |
YOLOCDM | 640 × 640 | 23.49 | 32.8 | 2.1 | 5.0 |
YOLOCDA | 640 × 640 | 22.52 | 27.7 | 2.0 | 4.8 |
YOLOCMA | 640 × 640 | 24.74 | 51.2 | 2.4 | 5.9 |
YOLODMA | 640 × 640 | 24.65 | 49.6 | 2.4 | 5.8 |
YOLOFLY | 640 × 640 | 24.72 | 28.9 | 2.0 | 4.7 |
Model | Size (Pixels) | mAP50-95 (%) | Inference Time (ms) | MS (%) | MF (%) |
---|---|---|---|---|---|
YOLOv8n | 1080 × 1080 | 57.68 | 65.8 | 0.07 | 8.36 |
YOLOv9t | 1080 × 1080 | 61.84 | 53.8 | 0.06 | 8.02 |
YOLOv10n | 1080 × 1080 | 65.83 | 40.9 | 0.04 | 5.29 |
YOLOv11n | 1080 × 1080 | 67.52 | 34.6 | 0.04 | 4.71 |
YOLOFLY | 1080 × 1080 | 78.16 | 20.2 | 0 | 0.96 |
Ma, P.; Fei, H.; Jia, D.; Sun, Z.; Lian, N.; Wei, J.; Zhou, J. YOLOFLY: A Consumer-Centric Framework for Efficient Object Detection in UAV Imagery. Electronics 2025, 14, 498. https://doi.org/10.3390/electronics14030498