Research on Improved YOLOv5 Vehicle Target Detection Algorithm in Aerial Images
Abstract
1. Introduction
2. Model and Method
2.1. Introduction to the YOLOv5 Algorithm
2.2. Model Improvement Based on Receptive Fields
2.3. Improvements Based on Attention Mechanism
- CBAM lets the network attend to informative features along both the channel and spatial dimensions, which helps it localize vehicle targets precisely. Whereas a channel-only or spatial-only attention module may discard useful information, CBAM combines the advantages of both mechanisms and yields better detection performance.
- The CBAM module adds little computational cost, can be plugged into any convolutional network, and is trained end to end with the rest of the model.
- CBAM applies channel attention and then spatial attention in sequence, multiplying each attention map with the feature map it was computed from, so the features are refined adaptively before the output feature map is produced.
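The sequential refinement described in the list above can be written compactly. The block below restates the formulation from the original CBAM paper rather than anything specific to this article's implementation. The symbols are introduced here for illustration: $F \in \mathbb{R}^{C \times H \times W}$ is the input feature map, $\sigma$ is the sigmoid function, $\otimes$ is element-wise multiplication with broadcasting, $\mathrm{MLP}$ is a shared two-layer perceptron, and $f^{7 \times 7}$ is a convolution with a 7 × 7 kernel.

```latex
\begin{aligned}
M_c(F)  &= \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big), & M_c(F)  &\in \mathbb{R}^{C \times 1 \times 1}, \\
F'      &= M_c(F) \otimes F, \\
M_s(F') &= \sigma\big(f^{7 \times 7}([\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')])\big), & M_s(F') &\in \mathbb{R}^{1 \times H \times W}, \\
F''     &= M_s(F') \otimes F'.
\end{aligned}
```

In the channel step the pooling runs over the spatial axes, while in the spatial step it runs over the channel axis and the two pooled maps are concatenated before the convolution.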
2.4. Cross-Connected Vehicle Detection Based on Feature Pyramid
2.5. Lightweight Aerial Vehicle Object Detection Algorithm
3. Experimental Results and Analysis
3.1. Experimental Environment and Parameter Configuration
3.2. Analysis of Vehicle Detection Results of Small Targets in Aerial Images
3.3. Analysis of Object Detection Results for Lightweight Aerial Vehicles
4. Discussion
- The weighted bidirectional fusion method used in this paper increases the model size. Future work will focus on reducing the model size while improving accuracy. In addition, given the class imbalance in the data, fixed-size anchor boxes may limit how well detection generalizes, so future work will also explore anchor-free methods.
- In natural scenes, images often suffer from occlusion. Moreover, images captured from the air may exhibit variations in angle and height, leading to deformations of vehicles. Irregular arrangements of objects may also result in a significant overlap of anchor boxes. Future work will consider using rotated bounding boxes for vehicle detection to reduce overlap and improve detection accuracy, thereby further enhancing the performance of the improved model.
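As an illustration of the weighted bidirectional fusion mentioned in the first point above, the following minimal sketch implements the fast normalized fusion rule from EfficientDet's BiFPN in plain Python. It assumes that the paper's weighted connections follow that rule, which the text here does not confirm. The function name `fast_normalized_fusion`, the flat list representation of a feature, and the toy values are all hypothetical.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-length feature vectors using non-negative, normalized weights.

    Assumption: this follows EfficientDet's fast normalized fusion,
    out = sum_i(w_i * I_i) / (eps + sum_j(w_j)), with w_i clipped at zero.
    """
    if not features:
        raise ValueError("need at least one input feature")
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per input feature")
    length = len(features[0])
    if any(len(feat) != length for feat in features):
        raise ValueError("all input features must have the same length")

    # ReLU keeps every learnable weight non-negative.
    clipped = [max(0.0, w) for w in weights]
    denom = eps + sum(clipped)

    fused = [0.0] * length
    for feat, w in zip(features, clipped):
        scale = w / denom
        for i, value in enumerate(feat):
            fused[i] += scale * value
    return fused


if __name__ == "__main__":
    shallow = [1.0, 2.0, 3.0]  # toy stand-in for a high-resolution feature
    deep = [3.0, 2.0, 1.0]     # toy stand-in for an upsampled deep feature
    print(fast_normalized_fusion([shallow, deep], [1.0, 1.0]))
```

Each fusion node adds only one scalar weight per input, so the extra model size noted above would come mainly from the additional cross-scale connections and their convolutions rather than from the fusion weights themselves.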
5. Conclusions
- In the original model, the large, fixed receptive field of the feature maps makes small targets and their details easy to lose. To address this, a receptive field module is introduced in the shallow feature layers. It uses dilated convolutions to adjust the eccentricity of the convolutional kernel, so the feature map is sampled over different ranges without losing information, which improves the detection of small targets. On the DroneVehicle dataset, mAP@0.5 rises from 0.818 for the original model to 0.828 for the improved model, with improvements in both recall and precision.
- Because targets occupy only a small proportion of an aerial image, positive samples are easily predicted as background or as another class. A CBAM module is therefore added before feature fusion to strengthen the model’s focus on blurry small targets, reduce interference from irrelevant information, and improve feature extraction. On the DroneVehicle dataset, M-YOLOv5 improves mAP@0.5 by 1.7 percentage points over the original model, with better target localization.
- The bidirectional feature pyramid structure based on weighted connections strengthens the bottom-level features and fuses them fully with the top-level features, improving feature transmission across scales. On the DroneVehicle dataset, M-YOLOv5 improves mAP@0.5 by 2.3 percentage points over the original model, effectively improving the model’s target detection performance.
- To address the real-time requirements of target detection, a lightweight network structure and an optimization module based on depthwise separable convolutions are applied to the improved M-YOLOv5 model. This trades some accuracy for a 71.3% reduction in the model’s parameter count and provides a new direction for lightweight target detection.
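The first conclusion relies on dilated convolutions to sample the feature map over different ranges. The sketch below shows the standard relation between kernel size, dilation rate, and the span a kernel covers. The helper names and the dilation rates 1, 3 and 5 are hypothetical and are not taken from the paper.

```python
from typing import List


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input positions, covered by one dilated kernel along one axis."""
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be positive integers")
    # There are (kernel_size - 1) gaps between taps, each `dilation` positions wide.
    return dilation * (kernel_size - 1) + 1


def tap_offsets(kernel_size: int, dilation: int) -> List[int]:
    """Offsets of the kernel taps relative to the kernel centre (odd kernels only)."""
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be positive integers")
    if kernel_size % 2 == 0:
        raise ValueError("this sketch only handles odd kernel sizes")
    half = kernel_size // 2
    return [dilation * (i - half) for i in range(kernel_size)]


if __name__ == "__main__":
    for rate in (1, 3, 5):  # hypothetical dilation rates, not taken from the paper
        print(rate, effective_kernel_size(3, rate), tap_offsets(3, rate))
```

A 3 × 3 kernel with dilation 3 spans 7 positions per axis while still using three taps, so the covered range grows without adding parameters.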
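To give a sense of why depthwise separable convolutions shrink a model, the following sketch compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1 × 1 pointwise convolution. Bias terms are ignored, and the layer dimensions (3 × 3 kernel, 128 input channels, 256 output channels) are hypothetical rather than taken from M-YOLOv5.

```python
def standard_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1 x 1 convolution mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 128, 256  # hypothetical layer, not a layer from the paper
    standard = standard_conv_params(k, c_in, c_out)
    separable = depthwise_separable_params(k, c_in, c_out)
    print(standard, separable, separable / standard)
```

The ratio equals 1/out_channels + 1/kernel_size², so a single 3 × 3 layer keeps only slightly more than one ninth of its weights. The 71.3% reduction reported for the whole model is smaller than this per-layer saving would suggest, presumably because only part of the network is replaced.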
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Configuration | Value |
---|---|
Operating System | Ubuntu 18.04 |
CPU | Intel Core i7 |
Memory | 32 GB |
GPU | NVIDIA GeForce RTX 4070 Ti |
Framework | PyTorch3.8 |
Model | P | R | mAP@0.5 |
---|---|---|---|
YOLOv5 | 0.824 | 0.82 | 0.818 |
YOLOv5 + RFBs | 0.824 | 0.93 | 0.828 |
YOLOv5 + RFBs + CBAM | 0.824 | 0.96 | 0.835 |
YOLOv5 + RFBs + CBAM + BiFPN | 0.852 | 0.98 | 0.841 |
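The gains quoted in the Conclusions can be checked directly against this table. The short script below copies the mAP@0.5 column and computes each configuration's gain over the YOLOv5 baseline in percentage points. The variable and function names are illustrative only.

```python
from typing import Dict, List, Tuple

# mAP@0.5 values copied from the ablation table above.
ABLATION_MAP50: List[Tuple[str, float]] = [
    ("YOLOv5", 0.818),
    ("YOLOv5 + RFBs", 0.828),
    ("YOLOv5 + RFBs + CBAM", 0.835),
    ("YOLOv5 + RFBs + CBAM + BiFPN", 0.841),
]


def gains_over_baseline(rows: List[Tuple[str, float]]) -> Dict[str, float]:
    """Gain of each row over the first row, in percentage points."""
    baseline = rows[0][1]
    return {name: round((score - baseline) * 100, 1) for name, score in rows[1:]}


if __name__ == "__main__":
    for name, gain in gains_over_baseline(ABLATION_MAP50).items():
        print(f"{name}: +{gain} points")
```

The results are absolute differences in mAP@0.5, which is the sense in which the 1.7 and 2.3 figures in the Conclusions should be read.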
Model | mAP@0.5 | Speed | Params | GFLOPs |
---|---|---|---|---|
1 | 0.841 | 4.1 | 9,560,126 | 34.8 |
2 | 0.841 | 2.7 | 3,371,068 | 26.9 |
3 | 0.765 | 2.6 | 2,743,228 | 23.6 |
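The relative savings implied by this table can be computed directly. The sketch below copies the Params and GFLOPs columns and reports each row's reduction relative to row 1. The table does not label the rows beyond their numbers, so they are treated here simply as model IDs, and the helper name `percent_reduction` is illustrative.

```python
from typing import Dict

# Params and GFLOPs copied from the table above, keyed by the model number.
LIGHTWEIGHT_ROWS: Dict[int, Dict[str, float]] = {
    1: {"params": 9_560_126, "gflops": 34.8},
    2: {"params": 3_371_068, "gflops": 26.9},
    3: {"params": 2_743_228, "gflops": 23.6},
}


def percent_reduction(baseline: float, value: float) -> float:
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - value) / baseline * 100.0


if __name__ == "__main__":
    base = LIGHTWEIGHT_ROWS[1]
    for model_id in (2, 3):
        row = LIGHTWEIGHT_ROWS[model_id]
        params_cut = percent_reduction(base["params"], row["params"])
        gflops_cut = percent_reduction(base["gflops"], row["gflops"])
        print(f"model {model_id}: params -{params_cut:.1f}%, GFLOPs -{gflops_cut:.1f}%")
```

Row 3 relative to row 1 reproduces the 71.3% parameter reduction stated in the Conclusions, at the cost of mAP@0.5 dropping from 0.841 to 0.765.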
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yang, X.; Xiu, J.; Liu, X. Research on Improved YOLOv5 Vehicle Target Detection Algorithm in Aerial Images. Drones 2024, 8, 202. https://doi.org/10.3390/drones8050202