IMTS-YOLO: A Steel Surface Defect Detection Model Integrating Multi-Scale Perception and Progressive Attention
Abstract
1. Introduction
- We propose the C2PSA-IGM module, which incorporates an Intelligent Guidance Mechanism (IGM) to strengthen the model’s capacity for extracting and integrating heterogeneous semantic features. Multi-semantic spatial guidance and progressive channel interaction reduce semantic ambiguity and feature conflicts while preserving discriminative spatial structures. Unlike conventional hybrid attention methods such as CBAM, which process spatial and channel responses separately, IGM combines grouped spatial modeling with channel self-attention, improving semantic consistency, cross-scene generalization, and the representation of complex visual patterns.
- We develop the MulC3k2 module by augmenting the C3k2 structure with a multi-scale attention component (MulBk). MulBk couples multi-scale large kernel attention (MLKA) with a gated spatial attention unit (GSAU), which models long-range dependencies more effectively while enhancing local feature representation. In contrast to methods such as RCAN that depend primarily on single-scale attention, MulBk provides multi-scale receptive fields and more flexible feature fusion, which is particularly valuable for recovering high-frequency image details and handling complex defect textures.
- We introduce a TASFF module to improve the detection head. TASFF mitigates cross-scale inconsistencies in the feature pyramid by dynamically learning spatial fusion weights across multi-scale features. Unlike the element-wise addition or concatenation typically used in single-stage detectors, TASFF adaptively filters conflicting information while preserving discriminative features, which improves detection consistency for multi-scale targets. The approach maintains high detection efficiency and is more robust to targets of varying sizes and low contrast in complex scenes.
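The idea of fusing pyramid levels with dynamically learned spatial weights can be illustrated with a minimal pure-Python sketch in the style of adaptively spatial feature fusion (ASFF). The function name `fuse_levels`, the toy feature maps, and the assumption that all levels have already been resized to a common resolution are illustrative only; the actual TASFF design may differ.

```python
import math

def fuse_levels(features, logits):
    """Fuse same-sized feature maps with per-pixel softmax weights.

    features, logits: lists of L maps, each an H x W list of floats.
    In a real network the logits would come from learned layers;
    here they are plain inputs (an assumption for illustration).
    """
    height, width = len(features[0]), len(features[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            scores = [level[y][x] for level in logits]
            peak = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - peak) for s in scores]
            total = sum(exps)
            # Weights are non-negative and sum to 1 at every pixel.
            fused[y][x] = sum(
                (e / total) * level[y][x] for e, level in zip(exps, features)
            )
    return fused

# Toy 1 x 2 maps from three pyramid levels (hypothetical values).
features = [[[1.0, 1.0]], [[2.0, 2.0]], [[3.0, 3.0]]]
# Equal logits at pixel 0; the third level dominates at pixel 1.
logits = [[[0.0, -50.0]], [[0.0, -50.0]], [[0.0, 50.0]]]
fused = fuse_levels(features, logits)
```

With equal logits the result is the plain average of the three levels; when one level's logit dominates, the fused value follows that level, which is how conflicting information from the other scales gets suppressed.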
2. Materials and Methods
2.1. Datasets
2.1.1. NEU-DET [23,24,25]
2.1.2. GC10-DET [26]
2.1.3. Dataset Analysis
2.2. Baseline: YOLOv11 Model
- Backbone: built primarily from Conv, C3k2, and SPPF modules that extract features from the input image. A C2PSA module follows SPPF to enhance feature selection.
- Neck: composed of Upsample, Concat, and C3k2 modules to facilitate multi-scale feature fusion between shallow and deep layers.
- Head: employs Conv, DWConv (depthwise convolution), and Conv2d modules for the final object classification and localization predictions.
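To show why depthwise convolution keeps the head lightweight, the sketch below compares the weight count of a standard convolution with a depthwise convolution followed by a 1 × 1 pointwise convolution. The channel width of 64, the 3 × 3 kernel, and the depthwise-plus-pointwise pairing are assumptions chosen for illustration, not values taken from the YOLOv11 configuration.

```python
def conv_params(c_in, c_out, k, bias=False):
    """Number of weights in a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def depthwise_separable_params(c_in, c_out, k, bias=False):
    """Depthwise k x k conv (one filter per channel) plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

# Hypothetical layer: 64 input channels, 64 output channels, 3 x 3 kernel.
standard = conv_params(64, 64, 3)                   # 9 * 64 * 64 = 36864
separable = depthwise_separable_params(64, 64, 3)   # 576 + 4096 = 4672
ratio = standard / separable
```

For this hypothetical layer the depthwise-separable form needs roughly one eighth of the weights of the standard convolution.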
2.3. IMTS-YOLO Model
2.3.1. C2PSA-IGM
2.3.2. Structural Enhancement: The MulC3k2 Module
2.3.3. TASFF Head
2.3.4. Shape-IoU
3. Results
3.1. Experimental Setup and Training Parameters
3.2. Experimental Metrics
3.3. Ablation Study
3.4. Performance Evaluation: Precision–Recall Analysis and Visual Assessment
3.5. Comparison with Mainstream Object Detection Algorithms
3.6. Cross-Dataset Generalization Assessment
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Zhang, Y.; Zhang, H.; Huang, Q.; Han, Y.; Zhao, M. DsP-YOLO: An anchor-free network with DsPAN for small object detection of multiscale defects. Expert Syst. Appl. 2024, 241, 122669.
2. Zhang, D.; Hao, X.; Wang, D.; Qin, C.; Zhao, B.; Liang, L.; Liu, W. An efficient lightweight convolutional neural network for industrial surface defect detection. Artif. Intell. Rev. 2023, 56, 10651–10677.
3. Huang, X.; Zhu, J.; Huo, Y. SSA-YOLO: An Improved YOLO for Hot-Rolled Strip Steel Surface Defect Detection. IEEE Trans. Instrum. Meas. 2024, 73, 1–17.
4. Zhang, T.; Ma, C.; Liu, Z.; ur Rehman, S.; Li, Y.; Saraee, M. Gas pipeline defect detection based on improved deep learning approach. Expert Syst. Appl. 2025, 267, 126212.
5. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
6. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
7. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 21–37.
8. Zhao, C.; Shu, X.; Yan, X.; Zuo, X.; Zhu, F. RDD-YOLO: A modified YOLO for detection of steel surface defects. Measurement 2023, 214, 112776.
9. Liang, C.; Wang, Z.Z.; Liu, X.L.; Zhang, P.; Tian, Z.W.; Qian, R.L. SDD-Net: A Steel Surface Defect Detection Method Based on Contextual Enhancement and Multiscale Feature Fusion. IEEE Access 2024, 12, 185740–185756.
10. Gui, Z.; Geng, J. YOLO-ADS: An Improved YOLOv8 Algorithm for Metal Surface Defect Detection. Electronics 2024, 13, 3129.
11. Ma, H.; Zhang, Z.; Zhao, J. A Novel ST-YOLO Network for Steel-Surface-Defect Detection. Sensors 2023, 23, 9152.
12. Liu, S.; Huang, D.; Wang, Y. Learning Spatial Fusion for Single-Shot Object Detection. arXiv 2019, arXiv:1911.09516.
13. Zhang, H.; Zhang, S. Shape-IoU: More Accurate Metric considering Bounding Box Shape and Scale. arXiv 2023, arXiv:2312.17663.
14. Feng, X.; Gao, X.; Luo, L. X-SDD: A New Benchmark for Hot Rolled Steel Strip Surface Defects Detection. Symmetry 2021, 13, 706.
15. Cheng, Z.; Gao, L.; Wang, Y.; Deng, Z.; Tao, Y. EC-YOLO: Effectual Detection Model for Steel Strip Surface Defects Based on YOLO-V5. IEEE Access 2024, 12, 62765–62778.
16. Lu, M.; Sheng, W.; Zou, Y.; Chen, Y.; Chen, Z. WSS-YOLO: An improved industrial defect detection network for steel surface defects. Measurement 2024, 236, 115060.
17. Liu, X.; Gao, J. Surface Defect Detection Method of Hot Rolling Strip Based on Improved SSD Model. In Database Systems for Advanced Applications, Proceedings of the DASFAA 2021 International Workshops, Taipei, Taiwan, 11–14 April 2021; Springer: Cham, Switzerland, 2021; pp. 209–222.
18. Xie, W.; Sun, X.; Ma, W. A light weight multi-scale feature fusion steel surface defect detection model based on YOLOv8. Meas. Sci. Technol. 2024, 35, 55017.
19. Si, Y.; Xu, H.; Zhu, X.; Zhang, W.; Dong, Y.; Chen, Y.; Li, H. SCSA: Exploring the synergistic effects between spatial and channel attention. Neurocomputing 2025, 634, 129866.
20. Yi, F.; Zhang, H.; Yang, J.; He, L.; Mohamed, A.S.A.; Gao, S. YOLOv7-SiamFF: Industrial defect detection algorithm based on improved YOLOv7. Comput. Electr. Eng. 2024, 114, 109090.
21. Wang, Y.; Li, Y.; Wang, G.; Liu, X. Multi-scale Attention Network for Single Image Super-Resolution. arXiv 2022, arXiv:2209.14145.
22. Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725.
23. Bao, Y.; Song, K.; Liu, J.; Wang, Y.; Yan, Y.; Yu, H.; Li, X. Triplet-Graph Reasoning Network for Few-Shot Metal Generic Surface Defect Segmentation. IEEE Trans. Instrum. Meas. 2021, 70, 1–11.
24. Song, K.; Yan, Y. A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects. Appl. Surf. Sci. 2013, 285, 858–864.
25. He, Y.; Song, K.; Meng, Q.; Yan, Y. An End-to-End Steel Surface Defect Detection Approach via Fusing Multiple Hierarchical Features. IEEE Trans. Instrum. Meas. 2020, 69, 1493–1504.
26. Lv, X.; Duan, F.; Jiang, J.-j.; Fu, X.; Gan, L. Deep Metallic Surface Defect Detection: The New Benchmark and Detection Network. Sensors 2020, 20, 1562.
27. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762.
28. Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 122–138.
29. Wu, Y.; He, K. Group Normalization. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 3–19.
30. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv 2015, arXiv:1502.03167.
31. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
32. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
33. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856.
34. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 3–19.
35. Huang, H.; Chen, Z.; Zou, Y.; Lu, M.; Chen, C.; Song, Y.; Zhang, H.; Yan, F. Channel prior convolutional attention for medical image segmentation. Comput. Biol. Med. 2024, 178, 108784.
36. Yu, W.; Si, C.; Zhou, P.; Luo, M.; Zhou, Y.; Feng, J.; Yan, S.; Wang, X. MetaFormer Baselines for Vision. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 896–912.
37. Guo, M.-H.; Lu, C.-Z.; Liu, Z.-N.; Cheng, M.-M.; Hu, S.-M. Visual attention network. Comput. Vis. Media 2023, 9, 733–752.
38. Xie, W.; Ma, W.; Sun, X. An efficient re-parameterization feature pyramid network on YOLOv8 to the detection of steel surface defect. Neurocomputing 2025, 614, 128775.
39. Wang, F.; Jiang, X.; Han, Y.; Wu, L. YOLO-LSDI: An Enhanced Algorithm for Steel Surface Defect Detection Using a YOLOv11 Network. Electronics 2025, 14, 2576.
40. Liu, P.; Yuan, X.; Han, Q.; Xing, B.; Hu, X.; Zhang, J. Micro-defect Varifocal Network: Channel attention and spatial feature fusion for turbine blade surface micro-defect detection. Eng. Appl. Artif. Intell. 2024, 133, 108075.
41. Zhou, H.; Zou, H.; Hu, G. An efficient and lightweight algorithm for detecting surface defects of steel based on SCCI-YOLO. Sci. Rep. 2025, 15, 36276.
42. Song, H. RSTD-YOLOv7: A steel surface defect detection based on improved YOLOv7. Sci. Rep. 2025, 15, 19649.

















| Parameter | Value |
|---|---|
| Epochs | 300 |
| Momentum | 0.937 |
| Initial Learning Rate | 0.01 |
| Optimizer | SGD |
| Batch Size | 32 |
| Weight Decay | 0.0005 |
| Mosaic | 1.0 |
| Mixup | 0.0 |
| Hsv_h | 0.015 |
| Hsv_s | 0.7 |
| Hsv_v | 0.4 |
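
For reproducibility, the training parameters listed above can be collected into a single configuration object. The sketch below uses key names in the style of Ultralytics training arguments (`lr0`, `batch`, `hsv_h`, and so on); that naming and the helper `out_of_range` are assumptions for illustration, since the training framework is not named here.

```python
# Values copied from the training-parameter table; key names are an
# assumption modeled on Ultralytics-style training arguments.
train_config = {
    "epochs": 300,
    "optimizer": "SGD",
    "lr0": 0.01,            # initial learning rate
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "batch": 32,
    "mosaic": 1.0,          # mosaic augmentation probability
    "mixup": 0.0,           # mixup disabled
    "hsv_h": 0.015,         # hue jitter fraction
    "hsv_s": 0.7,           # saturation jitter fraction
    "hsv_v": 0.4,           # value (brightness) jitter fraction
}

def out_of_range(config, keys=("mosaic", "mixup", "hsv_h", "hsv_s", "hsv_v")):
    """Return augmentation keys whose values fall outside [0, 1]."""
    return [key for key in keys if not 0.0 <= config[key] <= 1.0]

problems = out_of_range(train_config)  # an empty list means all values are valid
```

Keeping the settings in one dictionary makes it easy to log them alongside each run and to pass them to whichever training entry point is used.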

| Group | C2PSA-IGM (M1) | MulC3k2 (M2) | TASFF (M3) | Shape-IoU (M4) | P (%) | R (%) | Params (M) | GFLOPs | mAP50 (%) | mAP50-95 (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | - | - | - | - | 70.3 | 69.9 | 2.5 | 6.4 | 75.3 | 41.5 |
| 2 | √ | - | - | - | 73.4 | 71.8 | 2.5 | 6.4 | 77.5 | 42.6 |
| 3 | - | √ | - | - | 70.6 | 70.1 | 2.6 | 6.7 | 76.6 | 41.1 |
| 4 | - | - | √ | - | 71.9 | 70.8 | 3.9 | 8.6 | 76.5 | 41.6 |
| 5 | - | - | - | √ | 70.5 | 70.0 | 2.5 | 6.4 | 75.8 | 42.8 |
| 6 | √ | √ | √ | - | 77.1 | 73.0 | 3.9 | 8.8 | 79.9 | 43.4 |
| 7 | √ | √ | - | √ | 75.9 | 73.2 | 2.6 | 6.7 | 79.5 | 44.1 |
| 8 | √ | - | √ | √ | 77.0 | 73.3 | 3.9 | 8.6 | 79.4 | 44.3 |
| 9 | - | √ | √ | √ | 74.7 | 72.1 | 3.9 | 8.8 | 78.7 | 43.8 |
| 10 | √ | √ | √ | √ | 78.0 | 73.8 | 3.9 | 8.8 | 80.3 | 44.7 |
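
The net effect of the full configuration (Group 10) relative to the baseline (Group 1) can be read off the ablation table with a short calculation. The dictionary keys and the helper `deltas` are illustrative names; the numbers are copied from those two rows.

```python
# Rows 1 and 10 of the ablation table.
baseline = {"P": 70.3, "R": 69.9, "params_M": 2.5, "GFLOPs": 6.4,
            "mAP50": 75.3, "mAP50_95": 41.5}
full = {"P": 78.0, "R": 73.8, "params_M": 3.9, "GFLOPs": 8.8,
        "mAP50": 80.3, "mAP50_95": 44.7}

def deltas(reference, candidate):
    """Per-metric difference (candidate minus reference), rounded to 0.1."""
    return {key: round(candidate[key] - reference[key], 1) for key in reference}

gains = deltas(baseline, full)
# gains["mAP50"] is 5.0 and gains["mAP50_95"] is 3.2 (percentage points),
# at a cost of 1.4 M additional parameters and 2.4 additional GFLOPs.
```

Reading the accuracy gains next to the added parameters and GFLOPs makes the accuracy-versus-cost trade-off of the four components explicit.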

| Loss Function | P (%) | R (%) | mAP50 (%) | mAP50-95 (%) |
|---|---|---|---|---|
| CIoU (baseline) | 77.1 | 73.0 | 79.9 | 43.4 |
| GIoU | 80.9 | 67.8 | 78.3 | 42.4 |
| DIoU | 76.4 | 73.9 | 79.4 | 43.5 |
| EIoU | 78.7 | 71.6 | 80.1 | 43.5 |
| SIoU | 76.9 | 71.7 | 79.0 | 44.8 |
| Shape-IoU | 78.0 | 73.8 | 80.3 | 44.7 |
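
All of the losses compared above build on the intersection over union (IoU) between a predicted and a ground-truth box. As a point of reference, the sketch below computes plain IoU and the generalized IoU (GIoU) for axis-aligned boxes given as `(x1, y1, x2, y2)`; the function names and example boxes are illustrative. Shape-IoU adds terms that depend on the shape and scale of the ground-truth box, which are not reproduced here.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def intersection_and_union(a, b):
    """Overlap area and union area of two boxes."""
    overlap = box_area((max(a[0], b[0]), max(a[1], b[1]),
                        min(a[2], b[2]), min(a[3], b[3])))
    return overlap, box_area(a) + box_area(b) - overlap

def iou(a, b):
    overlap, union = intersection_and_union(a, b)
    return overlap / union if union > 0 else 0.0

def giou(a, b):
    """IoU minus the fraction of the enclosing box not covered by the union."""
    overlap, union = intersection_and_union(a, b)
    hull = box_area((min(a[0], b[0]), min(a[1], b[1]),
                     max(a[2], b[2]), max(a[3], b[3])))
    if union <= 0 or hull <= 0:
        return 0.0
    return overlap / union - (hull - union) / hull

pred, target = (0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)
iou_value = iou(pred, target)          # 1/7: overlap 1, union 7
giou_loss = 1.0 - giou(pred, target)   # the regression loss is 1 - GIoU
```

Unlike plain IoU, GIoU stays informative for non-overlapping boxes because the enclosing-box penalty still varies with their distance.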

| Model | P (%) | R (%) | Params (M) | GFLOPs | mAP50 (%) | mAP50-95 (%) | FPS |
|---|---|---|---|---|---|---|---|
| Faster R-CNN [16] | 33.0 | 91.1 | 138.4 | 368.2 | 73.6 | 33.0 | 36.0 |
| SSD [38] | - | - | 25.1 | 88.2 | 70.8 | - | 37.7 |
| RT-DETR [38] | - | - | 28.5 | 100.6 | 73.5 | - | 66.1 |
| Deformable DETR [39] | - | - | 34.2 | 78.0 | 71.6 | 40.1 | 118.7 |
| YOLOv3 [16] | 76.3 | 71.2 | 103.7 | 282.2 | 76.8 | 42.5 | 67.0 |
| YOLOv5s [16] | 74.7 | 74.7 | 7.0 | 15.8 | 76.8 | 42.4 | 220.0 |
| VF-Net [40] | 38.1 | 59.9 | 599.1 | 140.97 | 70.6 | - | 12.2 |
| YOLOv7-Tiny [16] | 73.4 | 66.4 | 6.0 | 13.1 | 74.0 | 37.1 | 165.0 |
| YOLOv8n | 69.2 | 77.4 | 3.0 | 8.1 | 75.2 | 40.9 | 212.5 |
| YOLOv10n [39] | - | - | 2.7 | 8.2 | 73.7 | 41.8 | 220.7 |
| YOLOv11n | 70.3 | 69.9 | 2.5 | 6.4 | 75.3 | 41.5 | 235.3 |
| RDD-YOLO [8] | - | - | - | - | 81.1 | - | 57.8 |
| SCCI-YOLO [41] | - | - | 1.7 | - | 78.6 | 45.3 | 270.2 |
| IMTS-YOLO (Ours) | 78.0 | 73.8 | 3.9 | 8.8 | 80.3 | 44.7 | 268.2 |
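
The P, R, and mAP columns in the comparison above follow the standard detection definitions. The sketch below shows those definitions under stated assumptions: the helper names and the toy numbers are illustrative, and the all-point interpolation used for AP is one common convention that may differ in detail from the evaluation code behind the reported figures.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(recalls, precisions):
    """Area under the precision-recall curve (all-point interpolation).

    recalls must be sorted in ascending order, with precisions aligned.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas over the recall steps.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))

def mean_average_precision(per_class_ap):
    """mAP is the mean of the per-class AP values at a fixed IoU threshold."""
    return sum(per_class_ap) / len(per_class_ap)

p_value, r_value = precision_recall(tp=8, fp=2, fn=2)      # 0.8 and 0.8
ap = average_precision([0.5, 1.0], [1.0, 0.5])             # 0.75
map_value = mean_average_precision([ap, 0.25])             # 0.5
```

mAP50 applies this at an IoU threshold of 0.5, while mAP50-95 averages the result over IoU thresholds from 0.5 to 0.95.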

| Model | P (%) | R (%) | Params (M) | GFLOPs | mAP50 (%) | mAP50-95 (%) | FPS |
|---|---|---|---|---|---|---|---|
| Faster R-CNN [16] | 38.2 | 59.4 | 138.4 | 368.2 | 56.9 | 20.4 | 35.0 |
| SSD [38] | - | - | 25.7 | 88.8 | 68.3 | - | 37.5 |
| RT-DETR-R18 [16] | 72.5 | 69.5 | 19.9 | 55.4 | 71.6 | 36.8 | 137.0 |
| YOLOv3 [16] | 62.4 | 62.7 | 103.7 | 282.2 | 62.2 | 32.5 | 106.0 |
| YOLOv5s [16] | 72.4 | 66.5 | 7.0 | 15.8 | 69.4 | 35.6 | 239.0 |
| VF-Net [42] | - | - | - | - | 64.5 | - | - |
| YOLOv7-Tiny [16] | 76.2 | 58.9 | 6.0 | 13.1 | 68.1 | 33.8 | 208.0 |
| YOLOv8n [16] | 65.0 | 67.2 | 3.0 | 8.1 | 68.7 | 36.1 | 222.0 |
| YOLOv11n | 70.7 | 66.2 | 2.5 | 6.4 | 69.5 | 37.3 | 224.3 |
| RDD-YOLO [8] | - | - | - | - | 75.2 | - | 57.5 |
| SCCI-YOLO [41] | - | - | 1.7 | - | 67.3 | 33.4 | - |
| IMTS-YOLO (Ours) | 75.4 | 69.2 | 3.9 | 8.8 | 73.9 | 40.4 | 238.4 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Fu, P.; Yuan, H.; He, J.; Wu, B.; Xu, N.; Gu, Y. IMTS-YOLO: A Steel Surface Defect Detection Model Integrating Multi-Scale Perception and Progressive Attention. Coatings 2026, 16, 51. https://doi.org/10.3390/coatings16010051

