TAFENet: A Two-Stage Attention-Based Feature-Enhancement Network for Strip Steel Surface Defect Detection
Abstract
1. Introduction
2. Related Works
2.1. Defect Detection Methods
2.2. Attention Mechanisms
2.3. Self-Attention Mechanism and CNN
2.4. Multiscale Object Detection
3. Method
3.1. Backbone Network
3.2. Neck Network
3.3. Loss Function
4. Experiments and Results Analysis
4.1. Datasets
4.2. Evaluation Metrics
4.3. Implementation Details
4.4. Experimental Results
4.5. Ablation Study
4.6. Discussion
5. Conclusions
- Combining convolution with lightweight self-attention blocks in a hybrid block, and applying the self-attention blocks judiciously, achieves a balance between performance and latency in strip steel surface defect detection.
- The two-stage feature-enhancement structure enhances the prediction capability for subtle details in strip steel surface defects by minimizing the loss of feature information.
- The targeted use of multiple attention mechanisms can mitigate the impacts of interfering factors and enhance the ability to detect defects with varying aspect ratios and irregular shapes.
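The first bullet's idea of a hybrid convolution/self-attention block can be illustrated with a minimal NumPy sketch. The function names (`hybrid_block`, `local_mix`) and the simple branch-sum structure are illustrative assumptions, not the paper's actual implementation: a local convolutional branch captures fine texture while a global scaled dot-product attention branch captures long-range context.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over N tokens of dimension d."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (N, N) token affinities
    return softmax(scores) @ v                # (N, d) global context per token

def local_mix(x, kernel):
    """1-D depthwise-style convolution along the token axis (same length)."""
    pad = len(kernel) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([
        np.tensordot(kernel, xp[i:i + len(kernel)], axes=(0, 0))
        for i in range(x.shape[0])
    ])

def hybrid_block(x, wq, wk, wv, kernel):
    """Sum of a local convolutional branch and a global attention branch."""
    return local_mix(x, kernel) + self_attention(x, wq, wk, wv)

rng = np.random.default_rng(0)
n, d = 16, 8                                  # 16 tokens (flattened patches), dim 8
x = rng.standard_normal((n, d))
wq, wk, wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
y = hybrid_block(x, wq, wk, wv, kernel=np.array([0.25, 0.5, 0.25]))
print(y.shape)  # (16, 8)
```

The attention branch costs O(N²) in the number of tokens, which is why keeping it lightweight and using it sparingly, as the bullet notes, matters for detection latency.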
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Guo, M.; Xu, T.; Liu, J.; Liu, Z.; Jiang, P.; Mu, T.; Zhang, S.; Martin, R.R.; Cheng, M.; Hu, S. Attention mechanisms in computer vision: A survey. Comput. Vis. Media 2022, 8, 331–368. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.; Kaiser, L.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Yang, G.; Lei, J.; Zhu, Z.; Cheng, S.; Feng, Z.; Liang, R. AFPN: Asymptotic feature pyramid network for object detection. In Proceedings of the 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Honolulu, HI, USA, 1–4 October 2023; pp. 2184–2189. [Google Scholar]
- Song, K.; Yan, Y. A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects. Appl. Surf. Sci. 2013, 285, 858–864. [Google Scholar] [CrossRef]
- Yue, B.; Wang, Y.; Min, Y.; Zhang, Z.; Wang, W.; Yong, J. Rail surface defect recognition method based on AdaBoost multi-classifier combination. In Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Lanzhou, China, 18–21 November 2019; pp. 391–396. [Google Scholar]
- Huang, Y.; Qiu, C.; Wang, X.; Wang, S.; Yuan, K. A compact convolutional neural network for surface defect inspection. Sensors 2020, 20, 1974. [Google Scholar] [CrossRef] [PubMed]
- Cheng, X.; Yu, J. RetinaNet with difference channel attention and adaptively spatial feature fusion for steel surface defect detection. IEEE Trans. Instrum. Meas. 2020, 70, 2503911. [Google Scholar] [CrossRef]
- Mnih, V.; Heess, N.; Graves, A. Recurrent models of visual attention. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2204–2212. [Google Scholar]
- Jaderberg, M.; Simonyan, K.; Zisserman, A. Spatial transformer networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 2017–2025. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023. [Google Scholar] [CrossRef] [PubMed]
- Gao, Z.; Xie, J.; Wang, Q.; Li, P. Global second-order pooling convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3024–3033. [Google Scholar]
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 9992–10002. [Google Scholar]
- Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986. [Google Scholar]
- Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S.; Xie, S. ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 16133–16142. [Google Scholar]
- Chen, H.; Chu, X.; Ren, Y.; Zhao, X.; Huang, K. PeLK: Parameter-efficient large kernel ConvNets with peripheral convolution. arXiv 2024, arXiv:2403.07589. [Google Scholar]
- Peng, Z.; Huang, W.; Gu, S.; Xie, L.; Wang, Y.; Jiao, J.; Ye, Q. Conformer: Local features coupling global representations for visual recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 367–376. [Google Scholar]
- Chen, P.Y.; Chang, M.C.; Hsieh, J.W.; Chen, Y.S. Parallel residual bi-fusion feature pyramid network for accurate single-shot object detection. IEEE Trans. Image Process. 2021, 30, 9099–9111. [Google Scholar] [CrossRef] [PubMed]
- Quan, Y.; Zhang, D.; Zhang, L.; Tang, J. Centralized feature pyramid for object detection. arXiv 2022, arXiv:2210.02093. [Google Scholar] [CrossRef] [PubMed]
- Jin, Z.; Yu, D.; Song, L.; Yuan, Z.; Yu, L. You should look at all objects. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 332–349. [Google Scholar]
- Dosovitskiy, A. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Li, J.; Xia, X.; Li, W.; Li, H.; Wang, X.; Xiao, X.; Wang, R.; Zheng, M.; Pan, X. Next-ViT: Next generation vision transformer for efficient deployment in realistic industrial scenarios. arXiv 2022, arXiv:2207.05501. [Google Scholar]
- Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. RepVGG: Making VGG-style ConvNets great again. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13733–13742. [Google Scholar]
- He, Y.; Song, K.; Meng, Q.; Yan, Y. An end-to-end steel surface defect detection approach via fusing multiple hierarchical features. IEEE Trans. Instrum. Meas. 2020, 69, 1493–1504. [Google Scholar] [CrossRef]
- Lv, X.; Duan, F.; Jiang, J.J.; Fu, X.; Gan, L. Deep metallic surface defect detection: The new benchmark and detection network. Sensors 2020, 20, 1562. [Google Scholar] [CrossRef] [PubMed]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9626–9635. [Google Scholar]
- Jocher, G. Ultralytics YOLOv8. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 5 September 2024). [Google Scholar]
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-time end-to-end object detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]
- Li, Y.; Chen, Y.; Wang, N.; Zhang, Z. Scale-aware trident networks for object detection. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6053–6062. [Google Scholar]
- Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards balanced learning for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 821–830. [Google Scholar]
- Wang, C.; Yeh, I.; Liao, H. YOLOv9: Learning what you want to learn using programmable gradient information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
Method | Crazing | Inclusion | Patches | Pitted Surface | Rolled-In Scale | Scratches | mAP |
---|---|---|---|---|---|---|---|
RetinaNet [27] | 33.2 | 73.4 | 82.4 | 83.1 | 61.6 | 76.6 | 68.4 |
EDDN [26] | 41.7 | 76.3 | 86.3 | 85.1 | 58.1 | 85.6 | 72.4 |
Faster R-CNN [28] | 39.2 | 75.8 | 86.2 | 85.9 | 62.8 | 88.4 | 73.0 |
FCOS [29] | 37.2 | 82.0 | 81.6 | 90.7 | 61.3 | 90.3 | 73.9 |
YOLOv8 [30] | 41.3 | 79.1 | 90.4 | 77.9 | 62.5 | 91.1 | 73.7 |
YOLOv10 [31] | 40.6 | 87.1 | 84.2 | 84.5 | 67.4 | 91.5 | 75.9 |
TridentNet [32] | 41.0 | 82.9 | 93.4 | 90.3 | 61.6 | 92.5 | 77.0 |
Libra Faster R-CNN [33] | 41.5 | 79.1 | 90.8 | 88.4 | 70.7 | 92.0 | 77.1 |
YOLOv9 [34] | 46.3 | 87.4 | 89.4 | 82.7 | 64.3 | 94.5 | 77.4 |
TAFENet | 55.3 | 87.5 | 93.8 | 85.3 | 60.1 | 95.8 | 79.6 |
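In these comparisons, mAP is the unweighted mean of the per-class average precisions, so each mAP cell can be checked directly from its row. For example, for the TAFENet row of the table above:

```python
# Per-class AP values (%) from the TAFENet row of the NEU-DET comparison table.
class_ap = {
    "crazing": 55.3, "inclusion": 87.5, "patches": 93.8,
    "pitted surface": 85.3, "rolled-in scale": 60.1, "scratches": 95.8,
}

# mAP = mean of the per-class APs.
map_score = sum(class_ap.values()) / len(class_ap)
print(round(map_score, 1))  # 79.6, matching the table's mAP column
```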
Method | Pu | Wl | Cg | Ws | Os | Ss | In | Rp | Cr | Wf | mAP |
---|---|---|---|---|---|---|---|---|---|---|---|
RetinaNet | 96.4 | 75.3 | 91.9 | 68.4 | 62.1 | 56.7 | 4.9 | 24.7 | 2.0 | 73.8 | 55.6 |
Libra Faster R-CNN | 99.5 | 42.9 | 94.9 | 72.8 | 72.1 | 62.8 | 18.8 | 37.4 | 17.6 | 69.3 | 58.8 |
Faster R-CNN | 99.0 | 45.7 | 92.5 | 72.9 | 69.6 | 59.1 | 26.7 | 36.7 | 31.2 | 73.3 | 60.7 |
FCOS | 96.7 | 57.3 | 93.0 | 73.6 | 61.8 | 61.5 | 21.3 | 35.7 | 25.1 | 84.2 | 61.2 |
YOLOv8 | 91.2 | 78.5 | 90.8 | 66.5 | 61.4 | 54.9 | 20.4 | 35.8 | 29.5 | 85.6 | 61.5 |
YOLOv10 | 93.5 | 79.1 | 86.8 | 72.4 | 64.8 | 60.2 | 22.4 | 33.9 | 28.6 | 84.2 | 62.6 |
TridentNet | 96.6 | 50.7 | 95.8 | 76.9 | 72.9 | 67.0 | 24.0 | 40.2 | 28.4 | 79.4 | 63.2 |
YOLOv9 | 94.0 | 84.1 | 92.9 | 70.5 | 65.6 | 57.6 | 25.3 | 34.8 | 32.7 | 86.4 | 64.4 |
EDDN | 90.0 | 88.5 | 84.8 | 55.8 | 62.2 | 65.0 | 25.6 | 36.4 | 52.1 | 91.9 | 65.1 |
TAFENet | 97.2 | 93.5 | 94.6 | 74.8 | 66.1 | 52.9 | 29.4 | 22.1 | 88.4 | 89.7 | 70.9 |
Method | Crazing | Inclusion | Patches | Pitted Surface | Rolled-In Scale | Scratches | mAP |
---|---|---|---|---|---|---|---|
Baseline (+SAB) | 46.2 | 84.5 | 91.8 | 84.6 | 56.7 | 92.9 | 76.4 |
+Self-attention | 47.2 | 88.3 | 93.5 | 82.4 | 57.8 | 96.1 | 77.6 |
+Self-attention+CAB | 55.3 | 87.5 | 93.8 | 85.3 | 60.1 | 95.8 | 79.6 |
Method | Crazing | Inclusion | Patches | Pitted Surface | Rolled-In Scale | Scratches | mAP |
---|---|---|---|---|---|---|---|
FPN structure | 49.4 | 86.3 | 95.6 | 82.8 | 56.7 | 92.5 | 77.2 |
FSFE | 50.7 | 83.1 | 93.5 | 81.7 | 55.1 | 91.4 | 75.9 |
SSFE | 44.0 | 85.7 | 90.8 | 78.6 | 51.5 | 91.5 | 73.7 |
FSFE + SSFE | 55.3 | 87.5 | 93.8 | 85.3 | 60.1 | 95.8 | 79.6 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, L.; Fu, Z.; Guo, H.; Feng, Y.; Sun, Y.; Wang, Z. TAFENet: A Two-Stage Attention-Based Feature-Enhancement Network for Strip Steel Surface Defect Detection. Electronics 2024, 13, 3721. https://doi.org/10.3390/electronics13183721