An Improved YOLO Model for Traffic Signs Small Target Image Detection
Abstract
1. Introduction
- To improve detection accuracy and accelerate network convergence within the YOLOv5 architecture, we introduce the decoupled head from YOLOX [26] into the model's detection head. The underlying idea is to separate the classification and localization tasks so that each sub-task of the detection process is handled by a dedicated, specialized branch. This decoupled design lets the model optimize the two distinct objectives independently, increasing detection precision while improving learning efficiency.
- We propose replacing the original convolutions in the network with the SPD-Conv module. This change removes the loss of feature information caused by pooling operations and enhances the model's ability to extract features from low-resolution small objects.
- We add a Context Augmentation Module (CAM) that supplements contextual information using dilated convolutions with varying dilation rates. By capturing context from different receptive fields, the module provides cues that crucially complement small-object detection of traffic signs, effectively addressing the challenges posed by small targets and improving detection accuracy.
- Our proposed technique significantly reduces missed detections and false positives, achieving a recall of 91.6% and a precision of 95.0%. These results are a substantial improvement over the original YOLOv5 algorithm and demonstrate that our approach yields more precise and comprehensive traffic sign detection.
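The dilated convolutions used by a CAM-style module enlarge the receptive field without adding parameters: a k × k kernel with dilation rate d covers an effective extent of k + (k − 1)(d − 1). As a quick illustration (the specific rates 1, 2, 3 are assumed here for demonstration, not taken from the paper's configuration), stacking 3 × 3 kernels at different rates samples context at several scales over the same feature map:

```python
def effective_kernel_size(k: int, d: int) -> int:
    """Spatial extent covered by a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

# A 3x3 kernel at dilation rates 1, 2, 3 covers 3x3, 5x5, and 7x7 regions,
# i.e. three different receptive fields from the same number of weights.
extents = {d: effective_kernel_size(3, d) for d in (1, 2, 3)}
print(extents)  # {1: 3, 2: 5, 3: 7}
```

This is why varying the dilation rate is a cheap way to gather multi-scale context around a small target: the parameter count stays fixed while the sampled neighborhood grows.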
2. YOLOv5 Algorithm
2.1. Input
2.2. Backbone
2.3. Neck
2.4. Head
2.5. The Loss Function in YOLOv5
3. Improvement of the YOLOv5s Algorithm
3.1. SPD-Conv
f0,0 = X[0:S:scale, 0:S:scale], f1,0, ..., fscale−1,0 = X[scale − 1:S:scale, 0:S:scale];
f0,1 = X[0:S:scale, 1:S:scale], f1,1, ..., fscale−1,1 = X[scale − 1:S:scale, 1:S:scale];
...
f0,scale−1 = X[0:S:scale, scale − 1:S:scale], ..., fscale−1,scale−1 = X[scale − 1:S:scale, scale − 1:S:scale].
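The slicing above can be reproduced in a few lines. The sketch below (NumPy, with scale = 2 chosen as an illustrative setting) builds the scale² sub-maps f_{i,j} = X[i::scale, j::scale] and concatenates them along the channel axis, halving spatial resolution while discarding no pixels:

```python
import numpy as np

def space_to_depth(x: np.ndarray, scale: int = 2) -> np.ndarray:
    """Slice X of shape (S, S, C) into scale*scale sub-maps
    f_ij = X[i::scale, j::scale] and stack them along the channel
    axis, giving shape (S/scale, S/scale, scale*scale*C)."""
    s = x.shape[0]
    assert s % scale == 0, "spatial size must be divisible by scale"
    subs = [x[i::scale, j::scale] for j in range(scale) for i in range(scale)]
    return np.concatenate(subs, axis=-1)

x = np.arange(4 * 4 * 3).reshape(4, 4, 3)   # toy 4x4 feature map, 3 channels
y = space_to_depth(x, scale=2)
print(y.shape)  # (2, 2, 12): resolution halved, channels x4, all pixels kept
```

In SPD-Conv this slicing step is followed by a non-strided convolution, so downsampling happens without the fine-grained information loss introduced by strided convolution or pooling.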
3.2. Decoupled Head
3.3. Context Augmentation Module
4. Experiment and Discussion
4.1. Dataset
4.2. Network Architecture of the Improved YOLOv5s
4.3. Experiment and Parameter Settings
Experimental Setup
4.4. Detection Performance Evaluation Metrics
4.5. Analysis of Experimental Results
4.5.1. Comparative Experiment on the Improvement of the SPD-Conv Module
4.5.2. Comparative Experiment on CAM Improvement of Contextual Information
4.5.3. Comparative Experiment on Decoupled Head Improvement
4.5.4. Comparative Experimental Analysis of All Improved Algorithms
4.6. Overfitting Analysis
- Data Augmentation: During training we employed Mosaic data augmentation, which combines four images into a single training sample with random cropping, rotation, scaling, and color distortion. This enhances the diversity of the training set and improves the model's generalization capability.
- Dropout: Dropout was used as a regularization technique during the training process. It randomly deactivates a subset of neurons in the network during training, preventing co-adaptation and thereby reducing the risk of overfitting.
- Early Stopping: Early stopping is an effective safeguard against overfitting. We monitored performance on the validation set and terminated training when the validation metric failed to improve over several consecutive epochs, stopping before the model could begin learning noise from the training data.
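The early-stopping rule described above can be captured in a small helper. This is an illustrative sketch, not the training code used in the paper; `patience` (the number of non-improving epochs tolerated) is an assumed parameter name:

```python
class EarlyStopping:
    """Stop training once the validation metric (e.g. mAP, higher is
    better) has failed to improve for `patience` consecutive epochs."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, metric: float) -> bool:
        """Record one epoch's validation metric; return True to stop."""
        if metric > self.best:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Simulated validation mAP per epoch: improvement stalls after epoch 2.
stopper = EarlyStopping(patience=3)
stop_epoch = None
for epoch, val_map in enumerate([0.85, 0.88, 0.90, 0.90, 0.89, 0.895]):
    if stopper.step(val_map):
        stop_epoch = epoch
        break
print(stop_epoch, stopper.best)  # 5 0.9
```

Because the counter resets on every new best value, training continues as long as the validation curve keeps making progress, however slowly.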
4.7. Ablation Experiment
4.8. Comparative Experiments with Mainstream Algorithms
4.9. Example Visualization of Experimental Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Deng, X. Research on Detection and Recognition of Road Traffic Signs in Natural Environments. Ph.D. Thesis, Nanjing University of Science and Technology, Nanjing, China, 2014.
- Mogelmose, A.; Trivedi, M.M.; Moeslund, T.B. Vision-Based Traffic Sign Detection and Analysis for Intelligent Driver Assistance Systems: Perspectives and Survey. IEEE Trans. Intell. Transp. Syst. 2012, 13, 1484–1497.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International: Cham, Switzerland, 2016; pp. 21–37.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2015; Volume 28.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Kong, T.; Yao, A.; Chen, Y.; Sun, F. HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
- Cheng, P.; Liu, W.; Zhang, Y.; Ma, H. LOCO: Local Context Based Faster R-CNN for Small Traffic Sign Detection. In Proceedings of the MultiMedia Modeling, Bangkok, Thailand, 5–7 February 2018; Schoeffmann, K., Chalidabhongse, T.H., Ngo, C.W., Aramvith, S., O'Connor, N.E., Ho, Y.S., Gabbouj, M., Elgammal, A., Eds.; Springer International: Cham, Switzerland, 2018; pp. 329–341.
- Tong, K.; Wu, Y.; Zhou, F. Recent advances in small object detection based on deep learning: A review. Image Vis. Comput. 2020, 97, 103910.
- Yao, Y.; Han, L.; Du, C.; Xu, X.; Jiang, X. Traffic sign detection algorithm based on improved YOLOv4-Tiny. Signal Process. Image Commun. 2022, 107, 116783.
- Zhu, Z.; Liang, D.; Zhang, S.; Huang, X.; Li, B.; Hu, S. Traffic-Sign Detection and Classification in the Wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
- Houben, S.; Stallkamp, J.; Salmen, J.; Schlipsing, M.; Igel, C. Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark. In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), Dallas, TX, USA, 4–9 August 2013; pp. 1–8.
- Lai, H.; Chen, L.; Liu, W.; Yan, Z.; Ye, S. STC-YOLO: Small Object Detection Network for Traffic Signs in Complex Environments. Sensors 2023, 23, 5307.
- Liu, J.; Zhang, W.; Tang, Y.; Tang, J.; Wu, G. Residual Feature Aggregation Network for Image Super-Resolution. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2356–2365.
- Kim, C.-I.; Park, J.; Park, Y.; Jung, W.; Lim, Y.-S. Deep Learning-Based Real-Time Traffic Sign Recognition System for Urban Environments. Infrastructures 2023, 8, 20.
- Du, Y.; Zhao, Z.; Song, Y.; Zhao, Y.; Su, F.; Gong, T.; Meng, H. StrongSORT: Make DeepSORT Great Again. arXiv 2023, arXiv:2202.13514.
- Chu, J.; Zhang, C.; Yan, M.; Zhang, H.; Ge, T. TRD-YOLO: A Real-Time, High-Performance Small Traffic Sign Detection Algorithm. Sensors 2023, 23, 3871.
- Hu, J.; Wang, Z.; Chang, M.; Xie, L.; Xu, W.; Chen, N. PSG-YOLOv5: A Paradigm for Traffic Sign Detection and Recognition Algorithm Based on Deep Learning. Symmetry 2022, 14, 2262.
- Li, H.; Li, J.; Wei, H.; Liu, Z.; Zhan, Z.; Ren, Q. Slim-neck by GSConv: A better design paradigm of detector architectures for autonomous vehicles. arXiv 2022, arXiv:2206.02424.
- Luo, H.; Yang, Y.; Tong, B.; Wu, F.; Fan, B. Traffic Sign Recognition Using a Multi-Task Convolutional Neural Network. IEEE Trans. Intell. Transp. Syst. 2018, 19, 1100–1111.
- Cao, J.; Zhang, J.; Jin, X. A Traffic-Sign Detection Algorithm Based on Improved Sparse R-cnn. IEEE Access 2021, 9, 122774–122788.
- Xiong, Q.; Zhang, X.; Wang, X.; Qiao, N.; Shen, J. Robust Iris-Localization Algorithm in Non-Cooperative Environments Based on the Improved YOLO v4 Model. Sensors 2022, 22, 9913.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. In Proceedings of the European Conference on Computer Vision—ECCV 2014, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 346–361.
- Wang, C.-Y.; Liao, H.-Y.M.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W.; Yeh, I.-H. CSPNet: A New Backbone that can Enhance Learning Capability of CNN. arXiv 2019, arXiv:1911.11929.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. arXiv 2018, arXiv:1803.01534.
- Sunkara, R.; Luo, T. No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects. arXiv 2022, arXiv:2208.03641.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. arXiv 2017, arXiv:1612.03144.
- Xiao, J.; Zhao, T.; Yao, Y.; Yu, Q.; Chen, Y. Context Augmentation and Feature Refinement Network for Tiny Object Detection. In Proceedings of the 10th International Conference on Learning Representations, Online, 25–29 April 2022.
| Experimental Environment | Version or Model |
|---|---|
| Operating system | Ubuntu 20.04 |
| CPU | Data |
| GPU | RTX 3090 |
| Python | 3.8 |
| CUDA | 11.3 |
| PyTorch | 1.10.0 |
| Parameter | Settings |
|---|---|
| Img-size | 640 × 640 |
| Batch-size | 36 |
| Epochs | 800 |
| Data enhancement | Mosaic |
| Parameter | Settings |
|---|---|
| Lr0 | 0.01 |
| Lrf | 0.2 |
| Momentum | 0.937 |
| Weight decay factor | 0.0005 |
| Algorithm | Parameters/M | P% | R% | mAP% |
|---|---|---|---|---|
| YOLOv5 | 7.13 | 92.9 | 86.8 | 91.8 |
| YOLOv5-SPD | 8.67 | 93.2 | 90.3 | 94.0 |
| Algorithm | Parameters/M | P% | R% | mAP% |
|---|---|---|---|---|
| YOLOv5 | 7.13 | 92.9 | 86.8 | 91.8 |
| YOLOv5-CAM | 9.1 | 92.5 | 86.6 | 92.0 |
| Algorithm | Parameters/M | P% | R% | mAP% |
|---|---|---|---|---|
| YOLOv5 | 7.13 | 92.9 | 86.8 | 91.8 |
| YOLOv5-Head | 14.4 | 93.3 | 89.9 | 93.8 |
| Algorithm | Parameters/M | P% | R% | mAP% |
|---|---|---|---|---|
| YOLOv5 | 7.13 | 92.9 | 86.8 | 91.8 |
| Ours | 24.3 | 95.0 | 91.6 | 95.4 |
| Algorithm | Parameters/M | P% | R% | mAP% |
|---|---|---|---|---|
| YOLOv5 | 7.13 | 92.9 | 86.8 | 91.8 |
| CAM | 9.1 | 92.5 | 86.6 | 92.0 |
| SPD | 8.67 | 93.2 | 90.3 | 94.0 |
| Head | 14.4 | 93.3 | 89.9 | 93.8 |
| SPD + Head | 15.9 | 95.3 | 90.0 | 95.2 |
| SPD + Head + CAM | 24.3 | 95.0 | 91.6 | 95.4 |
| Algorithm | Parameters/M | P% | R% | mAP% | FPS |
|---|---|---|---|---|---|
| SSD | 25.0 | 30.2 | 35.6 | 45.3 | 83.7 |
| Faster R-CNN | 167.7 | 81.0 | 85.2 | 82.4 | 3.4 |
| RetinaNet | 67.6 | 73.2 | 75.4 | 71.9 | 17.6 |
| YOLOv3 | 61.8 | 56.3 | 59.2 | 58.1 | 29.6 |
| YOLOv4 | 96.9 | 73.6 | 78.4 | 75.2 | 41.7 |
| YOLOv5 | 7.13 | 92.9 | 86.8 | 91.8 | 107.5 |
| Ours | 24.3 | 95.0 | 91.6 | 95.4 | 80.7 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Han, T.; Sun, L.; Dong, Q. An Improved YOLO Model for Traffic Signs Small Target Image Detection. Appl. Sci. 2023, 13, 8754. https://doi.org/10.3390/app13158754