YOLO-TSF: A Small Traffic Sign Detection Algorithm for Foggy Road Scenes
Abstract
1. Introduction
- We propose the Channel-Coordinate Attention Module (CCAM) and combine it with a local–global residual learning structure to design the Local–Global Feature Fusion Module (LGFFM), which addresses the indistinct features caused by reduced contrast in foggy images.
- To reduce the loss of small-target features during cross-scale fusion, we design the Multi-Head Adaptive Spatial Feature Fusion Detection Head (MASFFHead). It handles targets of different scales effectively and, by integrating more shallow features, mitigates the loss of small-target features after convolution and pooling operations while performing secondary extraction of small targets (an illustrative fusion sketch follows this list).
- We design a new loss function, NWD-CIoU, which measures the similarity between bounding boxes regardless of whether they overlap, alleviating the high scale sensitivity of IoU for small objects (see the loss sketch after this list).
- To address the shortage of foggy traffic sign datasets, we photographed foggy road scenes and applied synthetic fogging to the TT100K dataset, followed by data augmentation such as random flipping, Copy-Paste, and brightness adjustment, to construct the Foggy-TT100K dataset (see the fogging sketch after this list).
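MASFFHead appears to build on adaptive spatial feature fusion [36]: features from all pyramid levels are resized to one resolution and fused with learned per-pixel weights, so shallow detail can reach the detection head. The following is a minimal sketch of that fusion idea, assuming three input maps with a shared channel count; the compression width, bilinear resizing, and 1×1 weighting convolutions are illustrative assumptions, not the head's exact design (see Section 3.2).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveSpatialFusion(nn.Module):
    """Fuse three feature maps at one resolution with learned per-pixel weights."""

    def __init__(self, channels: int, compress: int = 16):
        super().__init__()
        # One 1x1 conv per input compresses it before weight prediction.
        self.weight_convs = nn.ModuleList(
            nn.Conv2d(channels, compress, kernel_size=1) for _ in range(3)
        )
        # Produces one weight logit map per input level.
        self.weight_head = nn.Conv2d(3 * compress, 3, kernel_size=1)

    def forward(self, feats):
        # feats: list of three (N, C, H_i, W_i) maps; resize all to the first.
        size = feats[0].shape[-2:]
        resized = [
            F.interpolate(f, size=size, mode="bilinear", align_corners=False)
            if f.shape[-2:] != size else f
            for f in feats
        ]
        logits = self.weight_head(
            torch.cat([conv(f) for conv, f in zip(self.weight_convs, resized)], dim=1)
        )
        weights = torch.softmax(logits, dim=1)  # (N, 3, H, W), sums to 1 per pixel
        # Weighted sum over levels; broadcasting expands (N, 1, H, W) over channels.
        return sum(weights[:, i:i + 1] * resized[i] for i in range(3))
```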
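The NWD term follows the normalized Gaussian Wasserstein distance of Wang et al. [37], which models each box as a 2D Gaussian so that similarity stays smooth even for non-overlapping boxes. The sketch below assumes a linear mix of the NWD and CIoU losses weighted by the scale factor λ explored in Section 4.3, and a normalizing constant C; both the mixing rule and C are assumptions, since the exact formulation is given in Section 3.3.

```python
import math
import torch


def ciou_loss(pred, target, eps=1e-7):
    """Complete IoU loss; pred/target are (N, 4) boxes as (cx, cy, w, h)."""
    # Corner coordinates.
    px1, py1 = pred[:, 0] - pred[:, 2] / 2, pred[:, 1] - pred[:, 3] / 2
    px2, py2 = pred[:, 0] + pred[:, 2] / 2, pred[:, 1] + pred[:, 3] / 2
    tx1, ty1 = target[:, 0] - target[:, 2] / 2, target[:, 1] - target[:, 3] / 2
    tx2, ty2 = target[:, 0] + target[:, 2] / 2, target[:, 1] + target[:, 3] / 2
    # Intersection over union.
    iw = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0)
    ih = (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0)
    inter = iw * ih
    union = pred[:, 2] * pred[:, 3] + target[:, 2] * target[:, 3] - inter + eps
    iou = inter / union
    # Center distance over the diagonal of the smallest enclosing box.
    cw = torch.max(px2, tx2) - torch.min(px1, tx1)
    ch = torch.max(py2, ty2) - torch.min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    # Aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (torch.atan(target[:, 2] / (target[:, 3] + eps))
                              - torch.atan(pred[:, 2] / (pred[:, 3] + eps))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v


def nwd_loss(pred, target, C=12.8, eps=1e-7):
    """Normalized Wasserstein distance loss [37]; boxes modeled as 2D Gaussians."""
    # Squared 2-Wasserstein distance between N(mu, diag(w/2, h/2)^2) Gaussians.
    w2 = ((pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
          + ((pred[:, 2] - target[:, 2]) / 2) ** 2
          + ((pred[:, 3] - target[:, 3]) / 2) ** 2)
    return 1 - torch.exp(-torch.sqrt(w2 + eps) / C)


def nwd_ciou_loss(pred, target, lam=0.3):
    """Assumed linear mix; lam = 0.3 performs best in the sweep of Section 4.3."""
    return lam * nwd_loss(pred, target) + (1 - lam) * ciou_loss(pred, target)
```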
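The fogging operation on TT100K plausibly follows the standard atmospheric scattering model I(x) = J(x)·t(x) + A·(1 − t(x)) with transmission t(x) = exp(−β·d(x)), as used by the cited dehazing work [28,32]. The sketch below is a minimal illustration of that model; the centre-based pseudo-depth map and the β and A values are illustrative assumptions, not the paper's exact procedure (see Section 3.4).

```python
import numpy as np


def add_fog(image_bgr, beta=2.0, atmosphere=0.9):
    """Synthesize fog on a clear uint8 BGR image: I = J*t + A*(1 - t)."""
    img = image_bgr.astype(np.float32) / 255.0
    h, w = img.shape[:2]
    # Pseudo-depth: treat the image centre as the vanishing point, so pixels
    # near it count as farthest and receive the densest fog.
    ys, xs = np.indices((h, w), dtype=np.float32)
    dist = np.sqrt((ys - h / 2) ** 2 + (xs - w / 2) ** 2)
    depth = 1.0 - dist / dist.max()           # 1 at the centre, 0 at the corners
    t = np.exp(-beta * depth)[..., None]      # transmission map, shape (H, W, 1)
    foggy = img * t + atmosphere * (1.0 - t)  # atmospheric scattering model
    return (np.clip(foggy, 0.0, 1.0) * 255).astype(np.uint8)


# Usage (hypothetical file name):
# import cv2
# foggy = add_fog(cv2.imread("clear_road.jpg"), beta=2.0, atmosphere=0.9)
```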
2. Related Works
2.1. Object Detection Models
2.2. Foggy Object Detection
3. Methods
3.1. LGFFM
3.2. MASFFHead
3.3. NWD-CIoU Loss Function
3.4. Construction of Foggy Traffic Signs Dataset
4. Experiments
4.1. Implementation Details
4.2. Evaluation Metrics
4.3. Exploration Experiment of Scale Factor λ
4.4. Ablation Experiment
4.5. Comparison of Different Detectors
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Gong, C.; Li, A.; Song, Y.; Xu, N.; He, W. Traffic sign recognition based on the YOLOv3 algorithm. Sensors 2022, 22, 9345.
2. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
3. Li, X.; Xie, Z.; Deng, X. Traffic sign detection based on improved faster R-CNN for autonomous driving. J. Supercomput. 2022, 78, 7982–8002.
4. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Annual Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015.
5. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
6. Cao, J.; Chen, Q.; Guo, J.; Shi, R. Attention-guided context feature pyramid network for object detection. arXiv 2020, arXiv:2005.11475.
7. Zhao, L.; Wei, Z.; Li, Y.; Jin, J.; Li, X. SEDG-YOLOv5: A lightweight traffic sign detection model based on knowledge distillation. Electronics 2023, 12, 305.
8. Wang, J.; Chen, Y.; Dong, Z.; Gao, M. Improved YOLOv5 network for real-time multi-scale traffic sign detection. Neural Comput. Appl. 2023, 35, 7853–7865.
9. YOLOv5. 2021. Available online: https://github.com/ultralytics/yolov5 (accessed on 2 October 2022).
10. Saxena, S.; Dey, S.; Shah, M.; Gupta, S. Traffic sign detection in unconstrained environment using improved YOLOv4. Expert Syst. Appl. 2023, 238, 121836.
11. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
12. Wang, Y.; Bai, M.; Wang, M.; Zhao, F.; Guo, J. Multiscale traffic sign detection method in complex environment based on YOLOv4. Comput. Intell. Neurosci. 2022, 2022, 5297605.
13. Yao, J.; Huang, B.; Yang, S.; Xiang, X.; Lu, Z. Traffic sign detection and recognition under low illumination. Mach. Vis. Appl. 2023, 34, 75.
14. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
15. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
16. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
17. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 213–229.
18. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37.
19. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
20. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
21. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
22. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
23. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
24. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Wei, X. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
25. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
26. Jocher, G.; Chaurasia, A.; Qiu, J. YOLO by Ultralytics. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 9 September 2024).
27. Ma, Y.; Cai, J.; Tao, J.; Yang, Q.; Gao, Y.; Fan, X. Foggy image detection based on DehazeNet with improved SSD. In Proceedings of the 2021 5th International Conference on Innovation in Artificial Intelligence, Virtually, 2–9 February 2021; pp. 82–86.
28. Cai, B.; Xu, X.; Jia, K.; Qing, C.; Tao, D. DehazeNet: An end-to-end system for single image haze removal. IEEE Trans. Image Process. 2016, 25, 5187–5198.
29. Huang, S.C.; Le, T.H.; Jaw, D.W. DSNet: Joint semantic learning for object detection in inclement weather conditions. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 2623–2633.
30. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
31. Li, C.; Guo, C.; Guo, J.; Han, P.; Fu, H.; Cong, R. PDR-Net: Perception-inspired single image dehazing network with refinement. IEEE Trans. Multimed. 2019, 22, 704–716.
32. Li, B.; Peng, X.; Wang, Z.; Xu, J.; Feng, D. AOD-Net: All-in-one dehazing network. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4770–4778.
33. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
34. Wang, Z.; Wang, J.; Li, Y.; Wang, S. Traffic sign recognition with lightweight two-stage model in complex scenes. IEEE Trans. Intell. Transp. Syst. 2020, 23, 1121–1131.
35. Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722.
36. Liu, S.; Huang, D.; Wang, Y. Learning spatial fusion for single-shot object detection. arXiv 2019, arXiv:1911.09516.
37. Wang, J.; Xu, C.; Yang, W.; Yu, L. A normalized Gaussian Wasserstein distance for tiny object detection. arXiv 2021, arXiv:2110.13389.
38. Zhu, Z.; Liang, D.; Zhang, S.; Huang, X.; Li, B.; Hu, S. Traffic-sign detection and classification in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2110–2118.
39. Ghiasi, G.; Cui, Y.; Srinivas, A.; Qian, R.; Lin, T.Y.; Cubuk, E.D.; Zoph, B. Simple copy-paste is a strong data augmentation method for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2918–2928.
40. Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162.
41. Lyu, C.; Zhang, W.; Huang, H.; Zhou, Y.; Wang, Y.; Liu, Y.; Chen, K. RTMDet: An empirical study of designing real-time object detectors. arXiv 2022, arXiv:2212.07784.
42. Zhang, S.; Wang, X.; Wang, J.; Pang, J.; Lyu, C.; Zhang, W.; Chen, K. Dense distinct query for end-to-end object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7329–7338.
43. Zhang, H.; Li, F.; Liu, S.; Zhang, L.; Su, H.; Zhu, J.; Shum, H.Y. DINO: DETR with improved denoising anchor boxes for end-to-end object detection. arXiv 2022, arXiv:2203.03605.
44. Qin, X.; Wang, Z.; Bai, Y.; Xie, X.; Jia, H. FFA-Net: Feature fusion attention network for single image dehazing. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 11908–11915.
Exploration of the scale factor λ in the NWD-CIoU loss (Section 4.3):

λ | mAP(0.5) | mAP(0.5:0.95)
---|---|---
0.1 | 74.9% | 54.5%
0.2 | 75.1% | 54.8%
0.3 | 75.6% | 55.4%
0.4 | 74.7% | 55.2%
0.5 | 75.2% | 55.3%
0.6 | 74.7% | 55.3%
0.7 | 74.3% | 55.0%
0.8 | 74.9% | 55.2%
0.9 | 74.1% | 55.0%
1.0 | 74.3% | 54.9%
Ablation experiment on the Foggy-TT100K dataset (Section 4.4):

Baseline | LGFFM | MASFFHead | NWD-CIoU | Params | mAP(0.5) | mAP(0.5:0.95) | P | F1-Score
---|---|---|---|---|---|---|---|---
✓ | | | | 11.2 M | 74.3% | 54.9% | 72.7% | 71.6%
✓ | ✓ | | | 11.2 M | 76.7% | 58.3% | 77.4% | 73.5%
✓ | | ✓ | | 13.6 M | 80.2% | 60.5% | 77.4% | 76.7%
✓ | | | ✓ | 11.2 M | 75.6% | 55.3% | 77.1% | 72.4%
✓ | ✓ | ✓ | | 13.6 M | 82.6% | 62.4% | 77.9% | 78.6%
✓ | ✓ | | ✓ | 11.2 M | 77.6% | 58.5% | 78.6% | 75.9%
✓ | | ✓ | ✓ | 13.6 M | 81.8% | 61.5% | 79.3% | 77.5%
✓ | ✓ | ✓ | ✓ | 13.7 M | 83.1% | 62.7% | 79.8% | 79.6%
Comparison of different detectors on the Foggy-TT100K dataset (Section 4.5):

Method | Params | mAP(0.5) | mAP(0.5:0.95)
---|---|---|---
Faster R-CNN [4] | 41.4 M | 63.7% | 48.8%
Cascade R-CNN [40] | 69.3 M | 68.2% | 52.6%
RTMDet [41] | 52.3 M | 69.1% | 52.5%
DDQ-DETR [42] | 53.4 M | 69.0% | 45.9%
DINO [43] | 47.6 M | 55.1% | 36.8%
YOLOv5 [8] | 7.1 M | 71.4% | 52.9%
YOLOv6 [24] | 18.5 M | 67.3% | 49.9%
YOLOv7 [25] | 6.1 M | 72.8% | 50.7%
YOLOv8 [26] | 11.2 M | 74.3% | 54.9%
AOD [32] + YOLOv8 | 11.3 M | 74.8% | 53.7%
FFA [44] + YOLOv8 | 12.6 M | 75.2% | 55.4%
YOLO-TSF (Ours) | 13.7 M | 83.1% | 62.7%