Lightweight Vehicle Detection Based on Improved YOLOv5s
Abstract
:1. Introduction
- A lightweight module (IPA) with a Transformer encoder is proposed, which uses a parallel two-branch structure, where one branch uses efficient attention to capture global information, while the other branch uses convolutional attention to capture local information. The two-branch structure allows the model to better capture global and local information and obtain richer contextual information.
- A lightweight and efficient multiscale spatial channel reconstruction (MSCCR) module is proposed, which consists of only one efficient convolution and one ordinary attention, without excessive computational redundancy and parametric quantities. And an efficient module was designed by combining the multiscale spatial channel reconstruction (MSCCR) module with the C3 structure in YOLOv5. It is named the C3_MR module.
- The integration of the IPA module and C3_MR module into the YOLOv5s backbone network. Compared to the original model, the mAP@50 increases by 3.1% and the parameters decrease by about 9% with no increase in FLOPS.
2. Related Work
2.1. Vehicle Detection
2.2. YOLOv5
2.3. Attention Mechanism
3. Materials and Methods
3.1. YOLOV5s Improvements
3.2. Integrated Perception Attention Module (IPA)
3.3. MultiScale Spatial Channel Reconstruction Module (MSCCR)
3.4. C3_MR Build
4. Results
4.1. Dataset
4.2. Experimental Equipment and Evaluation Indicators
4.3. Comparisons
4.4. Comparison of Test Results
4.5. Grad-CAM Visualisation
4.6. Ablation Experiment
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Du, Y.; Liu, X.; Yi, Y.; Wei, K. Optimizing Road Safety: Advancements in Lightweight Yolov8 Models and Ghostc2f Design for Real-Time Distracted Driving Detection. Sensors 2023, 23, 8844. [Google Scholar] [CrossRef]
- Rajamoorthy, R.; Arunachalam, G.; Kasinathan, P.; Devendiran, R.; Ahmadi, P.; Pandiyan, S.; Muthusamy, S.; Panchal, H.; Kazem, H.A.; Sharma, P. A Novel Intelligent Transport System Charging Scheduling for Electric Vehicles Using Grey Wolf Optimizer and Sail Fish Optimization Algorithms. Energy Sources Part A Recovery Util. Environ. Eff. 2023, 44, 3555–3575. [Google Scholar] [CrossRef]
- Yu, B.; Zhang, H.; Li, W.; Qian, C.; Li, B.; Wu, C. Ego-Lane Index Estimation Based on Lane-Level Map and Lidar Road Boundary Detection. Sensors 2021, 21, 7118. [Google Scholar] [CrossRef] [PubMed]
- Miao, Y.; Liu, F.; Hou, T.; Liu, L.; Liu, Y. A Nighttime Vehicle Detection Method Based on Yolo V3. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6–8 November 2020. [Google Scholar]
- Tajar, A.T.; Ramazani, A.; Mansoorizadeh, M. A Lightweight Tiny-Yolov3 Vehicle Detection Approach. J. Real-Time Image Process. 2021, 18, 2389–2401. [Google Scholar] [CrossRef]
- Zhu, L.; Geng, X.; Li, Z.; Liu, C. Improving Yolov5 with Attention Mechanism for Detecting Boulders from Planetary Images. Remote Sens. 2021, 13, 3776. [Google Scholar] [CrossRef]
- Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. Tph-Yolov5: Improved Yolov5 Based on Transformer Prediction Head for Object Detection on Drone-Captured Scenarios. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, BC, Canada, 11–17 October 2021. [Google Scholar]
- Huang, S.; He, Y.; Chen, X.-A. M-Yolo: A Nighttime Vehicle Detection Method Combining Mobilenet V2 and Yolo V3. J. Phys. Conf. Ser. 2021, 1883, 012094. [Google Scholar] [CrossRef]
- Li, X.; Qin, Y.; Wang, F.; Guo, F.; Yeow, J.T.W. Pitaya Detection in Orchards Using the Mobilenet-Yolo Model. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020. [Google Scholar]
- Viola, P.; Jones, M. Rapid Object Detection Using a Boosted Cascade of Simple Features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001, Kauai, HI, USA, 8–14 December 2001. [Google Scholar]
- Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005. [Google Scholar]
- Zhang, K.; Wang, W.; Lv, Z.; Fan, Y.; Song, Y. Computer Vision Detection of Foreign Objects in Coal Processing Using Attention Cnn. Eng. Appl. Artif. Intell. 2021, 102, 104242. [Google Scholar] [CrossRef]
- Russell, A.; Jia, Z.J. Vehicle Detection Based on Color Analysis. In Proceedings of the 2012 International Symposium on Communications and Information Technologies (ISCIT), Gold Coast, Australia, 2–5 October 2012. [Google Scholar]
- Satzoda, R.K.; Trivedi, M.M. Multipart Vehicle Detection Using Symmetry-Derived Analysis and Active Learning. IEEE Trans. Intell. Transp. Syst. 2016, 17, 926–937. [Google Scholar] [CrossRef]
- Chen, H.T.; Wu, Y.C.; Hsu, C.C. Daytime Preceding Vehicle Brake Light Detection Using Monocular Vision. IEEE Sens. J. 2016, 16, 120–131. [Google Scholar] [CrossRef]
- Razalli, H.; Ramli, R.; Alkawaz, M.H. Emergency Vehicle Recognition and Classification Method Using Hsv Color Segmentation. In Proceedings of the 2020 16th IEEE International Colloquium on Signal Processing & Its Applications (CSPA), Langkawi, Kedah, Malaysia, 28–29 February 2020. [Google Scholar]
- Zhang, Y.; Sun, Y.; Wang, Z.; Jiang, Y. Yolov7-Rar for Urban Vehicle Detection. Sensors 2023, 23, 1801. [Google Scholar] [CrossRef]
- Girshick, R.B.; Donahue, J.; Darrell, T.; Malik, J.; Berkeley, U. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation Tech Report. In Proceedings of the IEEE conference on computer vision and pattern recognition, Columbus, OH, USA, 23-28 June 2014; 2014; pp. 580–587. [Google Scholar]
- Girshick, R. Fast R-Cnn. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-Cnn: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-Cnn. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single Shot Multibox Detector. In Computer Vision—ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016. [Google Scholar]
- Redmon, J.; Divvala, S.K.; Girshick, R.B.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2015; pp. 779–788. [Google Scholar]
- Redmon, J.; Farhadi, A. Yolov3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. Yolov4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Liu, R.W.; Yuan, W.; Chen, X.; Lu, Y. An Enhanced Cnn-Enabled Learning Method for Promoting Ship Detection in Maritime Surveillance System. Ocean Eng. 2021, 235, 109435. [Google Scholar] [CrossRef]
- Nepal, U.; Eslamiat, H. Comparing Yolov3, Yolov4 and Yolov5 for Autonomous Landing Spot Detection in Faulty Uavs. Sensors 2022, 22, 464. [Google Scholar] [CrossRef] [PubMed]
- Sozzi, M.; Cantalamessa, S.; Cogato, A.; Kayad, A.; Marinello, F. Automatic Bunch Detection in White Grape Varieties Using Yolov3, Yolov4, and Yolov5 Deep Learning Algorithms. Agronomy 2022, 12, 319. [Google Scholar] [CrossRef]
- Huang, Z.; Wang, J.; Fu, X.; Yu, T.; Guo, Y.; Wang, R. Dc-Spp-Yolo: Dense Connection and Spatial Pyramid Pooling Based Yolo for Object Detection. Inf. Sci. 2020, 522, 241–258. [Google Scholar] [CrossRef]
- Bie, M.; Liu, Y.; Li, G.; Hong, J.; Li, J. Real-Time Vehicle Detection Algorithm Based on a Lightweight You-Only-Look-Once (Yolov5n-L) Approach. Expert Syst. Appl. 2023, 213, 119108. [Google Scholar] [CrossRef]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection over Union: A Metric and a Loss for Bounding Box Regression. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Mnih, V.; Heess, N.M.O.; Graves, A.; Kavukcuoglu, K. Recurrent Models of Visual Attention. In Proceedings of the Neural Information Processing Systems 2014, Montreal, BC, Canada, 8–13 December 2014. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.-S. Cbam: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV) 2018, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In European Conference on Computer Vision; Springer International Publishing: Cham, Switzerland, 2020. [Google Scholar]
- Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable Convolutional Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
- Wang, Q.; Wu, T.; Zheng, H.; Guo, G. Hierarchical Pyramid Diverse Attention Networks for Face Recognition. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
- Yang, J.; Ren, P.; Zhang, D.; Chen, D.; Wen, F.; Li, H.; Hua, G. Neural Aggregation Network for Video Face Recognition. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Yuan, Y.; Wang, J. Ocnet: Object Context Network for Scene Parsing. arXiv 2018, arXiv:1809.00916. [Google Scholar]
- Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-Local Neural Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Huang, Z.; Wang, X.; Wei, Y.; Huang, L.; Shi, H.; Liu, W.; Huang, T.S. Ccnet: Criss-Cross Attention for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 6896–6908. [Google Scholar] [CrossRef]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 9992–10002. [Google Scholar]
- Mehta, S.; Rastegari, M. Mobilevit: Light-Weight, General-Purpose, Mobile-Friendly Vision Transformer. arXiv 2021, arXiv:2110.02178. [Google Scholar]
- Zhang, J.; Li, X.; Li, J.; Liu, L.; Xue, Z.; Zhang, B.; Jiang, Z.; Huang, T.; Wang, Y.; Wang, C. Rethinking Mobile Block for Efficient Attention-Based Models. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–3 October 2023. [Google Scholar]
- Fan, Q.; Huang, H.; Guan, J.; He, R. Rethinking Local Perception in Lightweight Vision Transformer. arXiv 2023, arXiv:2303.17803. [Google Scholar]
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar]
- Li, J.; Wen, Y.; He, L. Scconv: Spatial and Channel Reconstruction Convolution for Feature Redundancy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023, Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
- Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient Multi-Scale Attention Module with Cross-Spatial Learning. In Proceedings of the ICASSP 2023–2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023, Rhodes Island, Greece, 4–10 June 2023. [Google Scholar]
- Lyu, S.; Chang, M.-C.; Du, D.; Li, W.; Wei, Y.; Del Coco, M.; Carcagnì, P.; Schumann, A.; Munjal, B.; Dang, D.-Q.-T.; et al. Ua-Detrac 2018: Report of Avss2018 & Iwt4s Challenge on Advanced Traffic Monitoring. In Proceedings of the 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) 2018, Auckland, New Zealand, 27–30 November 2018; pp. 1–6. [Google Scholar]
- Lyu, S.; Chang, M.-C.; Du, D.; Wen, L.; Qi, H.; Li, Y.; Wei, Y.; Ke, L.; Hu, T.; Del Coco, M. Ua-Detrac 2017: Report of Avss2017 & Iwt4s Challenge on Advanced Traffic Monitoring. In Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 29 August–1 September 2017. [Google Scholar]
- Wen, L.; Du, D.; Cai, Z.; Lei, Z.; Chang, M.-C.; Qi, H.; Lim, J.; Yang, M.-H.; Lyu, S. Ua-Detrac: A New Benchmark and Protocol for Multi-Object Detection and Tracking. Comput. Vis. Image Underst. 2020, 193, 102907. [Google Scholar] [CrossRef]
- Tang, L.; Yun, L.; Chen, Z.; Cheng, F. Hrynet: A Highly Robust Yolo Network for Complex Road Traffic Object Detection. Sensors 2024, 24, 642. [Google Scholar] [CrossRef]
- Jamiya, S.S.; Rani, P.E. LittleYOLO-SPP: A Delicate Real-Time Vehicle Detection Algorithm. Optik 2021, 225, 165818. [Google Scholar]
Model | Size | Params (M) | FLOPs (G) |
---|---|---|---|
YOLOv5s | 640 × 640 | 7.0 | 16.0 |
YOLOv5m | 640 × 640 | 21.1 | 49.2 |
YOLOv5l | 640 × 640 | 46.5 | 109.6 |
YOLOv5x | 640 × 640 | 86.7 | 206.3 |
Model | Size | Params |
---|---|---|
Bottleneck | 640 × 640 | 20,672 |
MSCCR | 640 × 640 | 11,852 |
Model | Params (M) | FLOPS (G) | mAP@50 |
---|---|---|---|
Faster-RCNN | 41.3 | 60.5 | 0.508 |
SSD | 24.1 | 30.5 | 0.532 |
our | 6.4 | 15.6 | 0.565 |
Model | Backbone | Params (M) | FLOPS (G) | mAP@50 | mAP@50:95 |
---|---|---|---|---|---|
YOLOv5s | Mobileentv2 | 7.4 | 16.1 | 0.509 | 0.347 |
YOLOv5s | Mobileentv3 | 7.3 | 9.9 | 0.498 | 0.335 |
YOLOv5s | Efficientnet | 7.0 | 13.4 | 0.524 | 0.347 |
YOLOv5s | Ours | 6.4 | 15.6 | 0.565 | 0.384 |
Model | Params (M) | FLOPS (G) | mAP@50 | FPS |
---|---|---|---|---|
YOLOv3-tiny | 8.7 | 12.9 | 0.498 | - |
YOLOv4-tiny | 6.0 | 16.2 | 0.508 | - |
YOLOv5s | 7.0 | 16.0 | 0.534 | 78.5 |
YOLOV6n | 4.6 | 11.3 | 0.536 | 84.3 |
YOLOV7-tiny | 6.0 | 13.0 | 0.472 | 98 |
Ours | 6.4 | 15.6 | 0.565 | 52 |
Model | Params (M) | FLOPS (G) | mAP@50 |
---|---|---|---|
YOLOv5s | 7.0 | 16.0 | 0.534 |
YOLOv5s + C3_MR | 7.0 | 15.4 | 0.557 |
YOLOv5s + IAT | 6.4 | 16.2 | 0.547 |
YOLOv5s + C3_MR + IAT | 6.4 | 15.6 | 0.565 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, Y.; Xu, S.; Wang, P.; Li, K.; Song, Z.; Zheng, Q.; Li, Y.; He, Q. Lightweight Vehicle Detection Based on Improved YOLOv5s. Sensors 2024, 24, 1182. https://doi.org/10.3390/s24041182
Wang Y, Xu S, Wang P, Li K, Song Z, Zheng Q, Li Y, He Q. Lightweight Vehicle Detection Based on Improved YOLOv5s. Sensors. 2024; 24(4):1182. https://doi.org/10.3390/s24041182
Chicago/Turabian StyleWang, Yuhai, Shuobo Xu, Peng Wang, Kefeng Li, Ze Song, Quanfeng Zheng, Yanshun Li, and Qiang He. 2024. "Lightweight Vehicle Detection Based on Improved YOLOv5s" Sensors 24, no. 4: 1182. https://doi.org/10.3390/s24041182
APA StyleWang, Y., Xu, S., Wang, P., Li, K., Song, Z., Zheng, Q., Li, Y., & He, Q. (2024). Lightweight Vehicle Detection Based on Improved YOLOv5s. Sensors, 24(4), 1182. https://doi.org/10.3390/s24041182