Research on a UAV-View Object-Detection Method Based on YOLOv7-Tiny
Abstract
1. Introduction
- Handling multi-scale objects and occlusion: introducing the Varifocal Loss function and replacing the feature-fusion network with a BiFPN structure significantly improves detection of small objects and of objects in densely distributed scenes. This makes the model more robust to changes in object scale and to occlusion, so that it can identify objects accurately in complex environments;
- Improved detection accuracy: the new Partial_C_Detect detection head, combined with Adaptive Kernel Convolution (AKConv), explicitly optimizes the recognition and localization of small objects while reducing false detections against complex backgrounds. AKConv also adapts its sampling shapes and convolution parameters to objects of different sizes and shapes;
- Balanced accuracy and computational cost: the dilation-wise residual (DWR) attention module strengthens the model's feature representation in dynamically changing, cluttered backgrounds. By using computational resources efficiently, the improved algorithm maintains high detection accuracy while running stably on resource-constrained platforms, making it well suited to complex UAV aerial views and preserving the model's applicability and scalability.
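To make the first contribution concrete, the Varifocal Loss from VarifocalNet can be sketched in a few lines. The function below is a minimal pure-Python rendering for a single prediction, with illustrative default weights (alpha = 0.75, gamma = 2.0); it is a sketch of the published formulation, not the authors' implementation:

```python
import math

def varifocal_loss(p, q, alpha=0.75, gamma=2.0):
    """Varifocal Loss for one prediction.

    p: predicted IoU-aware classification score, in (0, 1)
    q: target score (IoU between the prediction and the ground-truth
       box for positives, 0 for negatives)

    Positives are weighted by the target q itself (so high-quality
    examples dominate training), while negatives are down-weighted
    focal-style by alpha * p**gamma.
    """
    if q > 0:  # positive example: asymmetric, not down-weighted
        return -q * (q * math.log(p) + (1 - q) * math.log(1 - p))
    # negative example: focal-style down-weighting of easy negatives
    return -alpha * (p ** gamma) * math.log(1 - p)

# A confident score on a matching positive incurs a small loss;
# the same confident score on a negative incurs a much larger one.
low = varifocal_loss(0.9, 0.9)
high = varifocal_loss(0.9, 0.0)
```

The asymmetry is the point: unlike plain Focal Loss, positives are never down-weighted, which is why the paper adopts it for small, sparse UAV targets.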
2. Related Work
3. UAV View Object-Detection Method Based on YOLOv7-Tiny
3.1. YOLOv7-Tiny Algorithm
3.2. The Improved YOLOv7-Tiny Algorithm
3.2.1. Varifocal Loss Function
3.2.2. BiFPN Feature Fusion Network
3.2.3. Partial_C_Detect Object-Detection Head
3.2.4. Adaptive Kernel Convolution (AKConv)
3.2.5. Dilation-Wise Residual (DWR) Attention Module
4. Experiments and Results
4.1. Experimental Environment and Hyperparameter Settings
4.2. Data Set
4.3. Performance Indicators
4.4. Experiment
4.4.1. Ablation Experiment
4.4.2. Comparative Experiment
4.4.3. Object-Detection Results
- Lightweight optimization: Techniques such as quantization and knowledge distillation can reduce the model’s computational and memory demands while maintaining high detection accuracy;
- Advanced feature fusion: Incorporating dynamic feature pyramids or context-aware attention modules could improve the model’s capability to detect small objects and overlapping targets more effectively;
- Robust data augmentation: Developing strategies tailored to extreme conditions, such as low-light environments or high-motion scenarios, could enhance the model’s generalization and robustness;
- Global context modeling: Integrating transformer-based modules could expand the model’s receptive field, improving its ability to capture global context in complex and dense scenes.
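Of the directions above, knowledge distillation is the most self-contained to illustrate. The sketch below shows the classic soft-target distillation loss (Hinton-style, with temperature T) in pure Python; it is a hypothetical illustration of the general technique, not part of this paper's method:

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between the temperature-softened teacher and
    student distributions, scaled by T**2 so gradients keep the same
    magnitude as the hard-label term when T changes."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student))
    return T * T * kl
```

In practice this term is mixed with the ordinary detection loss, letting a compact student (e.g. a YOLOv7-tiny variant) absorb the score distribution of a larger teacher without inheriting its compute cost.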
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
| Model | VarifocalLoss | BiFPN | Partial_C_Detect | AKConv | DWR | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Precision (%) | Recall (%) | F1-Score (%) | Params (MB) | FLOPs (G) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv7-tiny | × | × | × | × | × | 35.3 | 18.3 | 45.0 | 38.7 | 41.6 | 22.96 | 13.3 |
| A | √ | × | × | × | × | 36.1 | 19.5 | 48.0 | 38.7 | 42.9 | 22.96 | 13.3 |
| B | √ | √ | × | × | × | 36.2 | 19.5 | 47.3 | 38.9 | 42.7 | 23.72 | 13.9 |
| C | √ | √ | √ | × | × | 37.7 | 20.4 | 48.2 | 39.1 | 43.2 | 25.76 | 14.9 |
| D | √ | √ | √ | √ | × | 37.9 | 20.8 | 48.4 | 39.3 | 43.4 | 23.28 | 13.2 |
| E | √ | √ | √ | √ | √ | 38.2 | 21.2 | 49.5 | 39.5 | 43.9 | 28.98 | 16.2 |
| Algorithm | mAP@0.5 (%) | Params (MB) | FLOPs (G) |
|---|---|---|---|
| YOLOv3-tiny | 19.9 | 33.51 | 13.0 |
| YOLOv4-tiny | 25.7 | 22.41 | 14.2 |
| YOLOv5s | 33.3 | 7.0 | 15.9 |
| YOLOv7-tiny | 35.3 | 22.96 | 13.9 |
| YOLOv6m | 31.7 | 34.24 | 82.0 |
| YOLOX | 25.5 | 9.0 | 26.8 |
| PP-YOLOE | 32.4 | 8.9 | 31.8 |
| Faster R-CNN | 22.8 | 136.9 | 180.0 |
| Swin Transformer | 31.6 | 50.0 | 58.0 |
| YOLOv8s | 37.9 | 10.59 | 28.7 |
| YOLOv8n | 31.4 | 2.86 | 8.2 |
| YOLOv8m | 35.9 | 24.65 | 78.7 |
| YOLOv9-c | 37.0 | 24.13 | 102.1 |
| YOLOv9s | 37.6 | 9.6 | 26.7 |
| YOLOv10n | 29.0 | 2.3 | 8.2 |
| BGF-YOLOv10 | 32.0 | 2.0 | 8.6 |
| Ours | 38.2 | 28.98 | 16.2 |
Citation
Miao, Y.; Wang, X.; Zhang, N.; Wang, K.; Shao, L.; Gao, Q. Research on a UAV-View Object-Detection Method Based on YOLOv7-Tiny. Appl. Sci. 2024, 14, 11929. https://doi.org/10.3390/app142411929