UN-YOLOv5s: A UAV-Based Aerial Photography Detection Algorithm
Abstract
1. Introduction
- (1) To address vanishing or exploding gradients and the near-uniform distribution of attention weights that arise as the network deepens, a CSR module is introduced so that feature information is transmitted stably and attention stays on the effective information.
- (2) To address the loss of location information caused by successive convolutions, feature fusion combines deep semantic information with shallow location information, improving the generalization ability of the model.
- (3) The detection head is enlarged by up-sampling twice to obtain a smaller feature prediction map, improving the prediction probability. At the same time, more suitable anchor boxes are selected to improve detection accuracy.
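The relation between up-sampling and the resolution of a detection head in contribution (3) can be illustrated with a little arithmetic. The sketch below is not the authors' implementation. It assumes a 640 × 640 input and the stock YOLOv5 head strides of 8, 16 and 32; the helper names `head_grid_sizes` and `stride_after_upsampling` are hypothetical, and the stride from which the paper up-samples is not stated in this section, so the starting stride of 8 is also an assumption.

```python
def head_grid_sizes(input_size, strides):
    """Return {stride: grid side length} for a square input image."""
    sizes = {}
    for stride in strides:
        if input_size % stride != 0:
            raise ValueError(f"input size {input_size} is not divisible by stride {stride}")
        sizes[stride] = input_size // stride
    return sizes


def stride_after_upsampling(stride, times):
    """Each 2x up-sampling halves the stride of the feature map."""
    return stride // (2 ** times)


# Assumed baseline: 640 x 640 input, stock YOLOv5 strides.
baseline = head_grid_sizes(640, [8, 16, 32])
print(baseline)  # {8: 80, 16: 40, 32: 20}

# Hypothetical extra head: the stride-8 map up-sampled once more (stride 4).
extra_stride = stride_after_upsampling(8, 1)
print(head_grid_sizes(640, [extra_stride]))  # {4: 160}
```

Under these assumptions, each additional 2× up-sampling doubles the grid side length, so every grid cell covers a smaller image region. This is the property that helps with the small objects common in aerial photography.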
2. Related Work
2.1. Introduction to the YOLO Algorithm
2.2. YOLOv5 Loss Calculation
2.3. YOLOv5 Network Structure
3. The Proposed Algorithm
3.1. A More Focused and Stable CSR Module
3.2. More Accurate Small Object Detection Mechanism (MASD)
3.3. Multi-Scale Feature Fusion Path (MCF)
4. Experimental Results
4.1. Loss Function Comparison
4.2. Experimental Results of UN-YOLOv5s
4.3. Example Effect Diagram
4.4. Ablation Experiments
4.5. Comparison of Different YOLO Versions
5. Discussion and Future Research
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Sample Availability
Methods | mAP@0.5(%) | mAP@0.5:0.95(%) | P | R | GFLOPs | Speed (ms) |
---|---|---|---|---|---|---|
YOLOv5s | 32.1 | 17 | 0.407 | 0.342 | 15.8 | 7.4 |
YOLOv5s+MASD | 36.6 | 20 | 0.453 | 0.373 | 33.9 | 9.4 |
YOLOv5s+MASD+MCF | 39.7 | 21.8 | 0.487 | 0.396 | 35.6 | 9.1 |
UN-YOLOv5s | 40.5 | 22.5 | 0.489 | 0.404 | 37.4 | 9.1 |
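The step-by-step effect of each added component can be read off the ablation table with simple subtraction. The sketch below copies the mAP@0.5 and GFLOPs columns from the table; the helper name `incremental_gains` is hypothetical. Attributing the last step to the CSR module is an inference from the section titles, not something the table states.

```python
# (method, mAP@0.5 in %, GFLOPs), copied from the ablation table.
ablation = [
    ("YOLOv5s", 32.1, 15.8),
    ("YOLOv5s+MASD", 36.6, 33.9),
    ("YOLOv5s+MASD+MCF", 39.7, 35.6),
    ("UN-YOLOv5s", 40.5, 37.4),  # assumed to be MASD + MCF + CSR
]


def incremental_gains(rows):
    """Return (method, delta mAP@0.5, delta GFLOPs) relative to the previous row."""
    gains = []
    for (_, prev_map, prev_flops), (name, cur_map, cur_flops) in zip(rows, rows[1:]):
        gains.append((name, round(cur_map - prev_map, 1), round(cur_flops - prev_flops, 1)))
    return gains


for name, d_map, d_flops in incremental_gains(ablation):
    print(f"{name}: +{d_map} mAP@0.5, +{d_flops} GFLOPs")
# YOLOv5s+MASD: +4.5 mAP@0.5, +18.1 GFLOPs
# YOLOv5s+MASD+MCF: +3.1 mAP@0.5, +1.7 GFLOPs
# UN-YOLOv5s: +0.8 mAP@0.5, +1.8 GFLOPs
```

Read this way, MASD accounts for most of both the accuracy gain and the added computation, while the two later steps add accuracy at a much smaller GFLOPs cost.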
Methods | mAP@0.5(%) | mAP@0.5:0.95(%) | P | R | GFLOPs | Speed (ms) |
---|---|---|---|---|---|---|
YOLOv5s | 32.1 | 17 | 0.407 | 0.342 | 15.8 | 7.4 |
YOLOv5l | 38.3 | 21.5 | 0.499 | 0.382 | 107.8 | 15.4 |
YOLOv3 | 38.7 | 21.2 | 0.492 | 0.383 | 154.7 | 19.3 |
YOLOv8s | 39.4 | 23.5 | 0.504 | 0.385 | 28.5 | 5.6 |
UN-YOLOv5s | 40.5 | 22.5 | 0.489 | 0.404 | 37.4 | 9.1 |
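One way to read the comparison table is to ask which models are not beaten on every axis at once. The sketch below is an illustrative analysis, not part of the paper. It uses only three columns copied from the table (mAP@0.5, GFLOPs and speed), and the helper names `dominates` and `pareto_front` are hypothetical.

```python
# method -> (mAP@0.5 in %, GFLOPs, speed in ms), copied from the comparison table.
models = {
    "YOLOv5s": (32.1, 15.8, 7.4),
    "YOLOv5l": (38.3, 107.8, 15.4),
    "YOLOv3": (38.7, 154.7, 19.3),
    "YOLOv8s": (39.4, 28.5, 5.6),
    "UN-YOLOv5s": (40.5, 37.4, 9.1),
}


def dominates(a, b):
    """True if a is no worse than b on all three metrics and better on at least one.

    Higher mAP is better; lower GFLOPs and lower latency are better.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and better


def pareto_front(table):
    """Return the methods that no other method dominates, in table order."""
    return [
        name
        for name, metrics in table.items()
        if not any(dominates(other, metrics) for o, other in table.items() if o != name)
    ]


print(pareto_front(models))  # ['YOLOv5s', 'YOLOv8s', 'UN-YOLOv5s']
```

On these three columns, YOLOv5l and YOLOv3 are dominated, while YOLOv5s (lowest GFLOPs), YOLOv8s (fastest) and UN-YOLOv5s (highest mAP@0.5) each remain a valid trade-off. The picture would shift if mAP@0.5:0.95 were used instead, since YOLOv8s scores higher on that column (23.5 versus 22.5).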
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Guo, J.; Liu, X.; Bi, L.; Liu, H.; Lou, H. UN-YOLOv5s: A UAV-Based Aerial Photography Detection Algorithm. Sensors 2023, 23, 5907. https://doi.org/10.3390/s23135907