Defect Detection Algorithm of Galvanized Sheet Based on S-C-B-YOLO

Liu, Yicheng; Fan, Gaoxia; Zhang, Hanquan; Xiao, Dong

doi:10.3390/math14010110

¹ School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China

² School of Information Science and Engineering, Northeastern University, Shenyang 110819, China

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(1), 110; https://doi.org/10.3390/math14010110

Submission received: 17 November 2025 / Revised: 21 December 2025 / Accepted: 25 December 2025 / Published: 28 December 2025

(This article belongs to the Special Issue Advance in Neural Networks and Visual Learning)

Abstract

Galvanized steel sheets are vital anti-corrosion materials, yet their surfaces are prone to defects that impair performance. Manual inspection is inefficient, while conventional machine vision struggles with complex, small-scale defects in industrial settings. Although deep learning offers promising solutions, standard object detectors such as YOLOv5 (YOLO stands for ‘You Only Look Once’) have difficulty with the subtle textures, scale variations, and reflective surfaces characteristic of galvanized sheet defects. To address these challenges, this paper proposes S-C-B-YOLO, an enhanced detection model based on YOLOv5. First, a Squeeze-and-Excitation (SE) attention mechanism is integrated into the deep layers of the backbone network to adaptively recalibrate channel-wise features, improving focus on defect-relevant information. Second, a Transformer block is combined with a C3 module to form a C3TR module, enhancing the model’s ability to capture global contextual relationships for irregular defects. Finally, the original path aggregation network (PANet) is replaced with a bidirectional feature pyramid network (Bi-FPN) for more efficient multi-scale feature fusion, significantly boosting sensitivity to small defects. Extensive experiments on a dedicated galvanized sheet defect dataset show that S-C-B-YOLO achieves a mean average precision (mAP@0.5) of 92.6% at an inference speed of 62 FPS, outperforming several baseline models including YOLOv3, YOLOv7, and Faster R-CNN. The proposed model offers a favorable balance between accuracy and speed, providing a robust and practical solution for automated, real-time defect inspection in galvanized steel production.

Keywords:

galvanized sheet; defect detection; deep learning; two-way feature pyramid; squeeze-and-excitation attention; YOLO

MSC:

68T07

1. Introduction

Galvanized steel sheets represent a significant category of metallic materials, extensively utilized in construction, automotive, and electrical appliance industries due to their excellent corrosion resistance and formability [1]. In the manufacturing process, coiled steel sheets undergo pickling followed by immersion in molten zinc, forming a protective zinc coating that acts as a physical barrier [2]. This hot-dip galvanizing process is cost-effective and provides superior rust prevention, effectively shielding the steel from exposure to moisture, oxygen, and other corrosive elements, thereby inhibiting oxidation [3]. It can extend the service life of steel by 5–8 years in humid environments, establishing itself as a crucial method for enhancing steel longevity [4]. The integrity of the zinc coating directly dictates the reliability of final products [5] like automobiles and household appliances, while superior surface appearance is essential for consumer appeal and market competitiveness [6]. Furthermore, surface imperfections can adversely affect coating adhesion and weldability [7], compromising subsequent processing quality. Therefore, rigorous surface quality inspection is imperative to ensure that galvanized sheets meet both high-quality standards and economic viability.

In practical hot-dip galvanizing production, factors such as limited equipment precision and process fluctuations can introduce over 50 distinct types of surface defects [8]. These defects can be broadly categorized into mechanical damage (e.g., scratches) and coating defects (e.g., uneven zinc distribution) [9]. Left unaddressed, these defects may progressively expand, leading to localized zinc spallation and initiating cascading corrosion issues [10].

High-speed online detection of surface defects on galvanized sheets has therefore become a focal point in visual inspection research. Detection methodologies in this domain can be classified into traditional manual inspection, conventional machine learning-based detection, and deep learning-based detection [11]. Initially, steel enterprises worldwide relied primarily on manual visual inspection [12]. Workers observed the production line in real time and periodically unrolled coils for sampling. However, as industrial technology advanced and demand for high-grade steel rose, the limitations of this method became increasingly apparent [13]:

- it is susceptible to subjective bias;
- it lacks quantitative standards;
- it has difficulty promptly identifying minute defects.

Moreover, harsh factory conditions (noise, high temperatures, and low illumination) cause visual fatigue among inspectors, resulting in high missed-detection rates and delayed responses. Consequently, manual inspection is largely obsolete in modern industrial settings [14].

Subsequently, advances in hardware and machine learning techniques led to the growing application of machine vision in industrial defect detection [15]. In this approach:

- images of galvanized sheets are captured with industrial cameras;
- features are extracted with methods such as edge detection and texture analysis [16];
- defects are classified with algorithms such as Support Vector Machines (SVM) [17].

Recognized for its speed, accuracy, and high degree of automation [18], this technology has attracted significant attention from both industry and research institutions [19]. Nevertheless, conventional machine learning methods depend heavily on handcrafted feature extraction [20], which is often complex and time-consuming. In contrast, deep learning-based methods learn features directly from image samples [21] and perform better at defect localization and recognition. They offer high accuracy, strong adaptability, ease of deployment, efficiency, and robustness [22]. Because of these advantages, applying deep learning to surface defect detection on galvanized sheets has become a current research hotspot and prevailing trend.

Although YOLOv5 exhibits robust performance in general object detection tasks, its baseline architecture suffers from several critical limitations when directly deployed for defect detection on metallic surfaces (e.g., galvanized sheets). These limitations constitute the core justification for the targeted improvements proposed in this work:

Insufficient Capability for Detecting Small Defects: Certain defects on galvanized sheet surfaces (e.g., non-uniform spangle size, fine scratches) occupy only a minimal number of pixels in high-resolution images acquired by industrial cameras [23]. The semantic information of such small objects is highly susceptible to loss during feature extraction and down-sampling in the standard YOLOv5 pipeline. The original Feature Pyramid Network (FPN)/PANet in YOLOv5 exhibits limited efficiency in cross-scale feature fusion and inadequate representational capacity for subtle features, resulting in a lower recall rate for small defects.

Limited Modeling Capacity for Defects with Complex Textures and Irregular Shapes: Defects such as “diagonal streaking” on metallic surfaces often exhibit complex textural patterns and irregular geometric characteristics. The backbone of standard YOLOv5 is built from convolutional operations. Their local receptive fields inherently limit its ability to capture long-range dependencies and global contextual information. This hinders the model’s ability to grasp the overall morphology of defects and their complex contextual relationships with the surrounding background, thereby compromising the accurate classification and localization of irregular defects.

Robustness to Reflective Surfaces and Illumination Variations Needs Enhancement: Galvanized sheet surfaces are highly reflective, and uneven illumination in production environments can easily induce highlights and glare in acquired images. These regions exhibit pixel-level similarities to certain defects (e.g., bright spot defects). The standard model lacks an explicit mechanism for the adaptive recalibration of feature channels, rendering it vulnerable to interference from such semantically irrelevant strong noise, which may result in false positives or missed detections.

Feature Utilization Efficiency Can Be Further Improved: In complex industrial scenarios, not all channel-wise information in the rich feature maps extracted by the network is equally critical for the final detection task. Standard YOLOv5 treats all feature channels equally [24], without adaptively emphasizing key defect-relevant features or suppressing redundant and noisy information. This limits the model’s precision and efficiency to a certain extent.

Thus, direct deployment of the standard YOLOv5 model fails to meet the stringent requirements for high accuracy, strong robustness, and real-time performance in galvanized sheet surface defect detection. The proposed S-C-B-YOLO model in this work is specifically designed to address these limitations in a systematic manner: it achieves this by introducing the Squeeze-and-Excitation (SE) attention mechanism to adaptively recalibrate channel-wise features, thereby enhancing the model’s focus on critical information and its interference resistance; by designing the C3TR module that integrates the global modeling capabilities of Transformer, thereby improving the model’s understanding of complex textures and irregular defects; and by adopting Bi-FPN to optimize the multi-scale feature fusion pipeline, thereby significantly enhancing the model’s sensitivity to small defects. These improvements are synergistic and targeted, with the goal of adapting the model more effectively to the specific challenges associated with metallic surface defect detection.

2. Review

To address the limitations of existing methods in detecting small, irregular, and reflective defects on galvanized steel surfaces, this work proposes an enhanced YOLOv5-based model, named S-C-B-YOLO. The research focuses on three key improvements: integrating a channel attention mechanism to enhance feature selectivity, incorporating transformer-based modules to capture global context, and optimizing the multi-scale feature fusion pathway. The study systematically develops and validates this model using a dedicated dataset of galvanized sheet defects, aiming to achieve a balance between high detection accuracy and real-time processing speed [25] suitable for industrial deployment.

We collected and constructed a surface spangle defect dataset for galvanized sheets from factory-produced finished products. The dataset mainly covers two defect types: ‘diagonal streaking’ and ‘non-uniform spangle size’ [26].

Diagonal streaking refers to streaks that appear at the edges of hot-dip galvanized sheets, forming a certain angle with the rolling direction of the strip steel, as shown in the green box in Figure 1a. The zinc coating is noticeably thicker in these streaked areas. In extreme cases, the streaks may extend across the entire cross-section of the galvanized sheet, resulting in a penetrating streaking defect.

Non-uniform spangle size is a common defect in which the spangle dimensions differ markedly between the head/tail and the middle region of the steel sheet, as illustrated by the yellow and orange boxes in Figure 1b.

The main contributions of this paper are summarized as follows:

Dataset Construction: We compile and annotate a practical galvanized sheet defect dataset, applying advanced augmentation techniques like Mosaic to improve data diversity and model robustness.

Architecture Design: We propose the S-C-B-YOLO model, which systematically integrates three key enhancements into the YOLOv5 framework: the SE attention mechanism for feature recalibration, the C3TR module for global context modeling, and the Bi-FPN for efficient multi-scale fusion.

Comprehensive Validation: Through extensive ablation studies and comparisons with state-of-the-art detectors, we demonstrate the effectiveness of our model, showing significant improvements in both accuracy and speed, and provide an analysis of its performance and failure modes.

The remainder of this paper is organized as follows:

- Section 2 gives an overview of the study and the target defect types.
- Section 3 introduces the two data augmentation methods used.
- Section 4 details the proposed methodology and architectural improvements.
- Section 5 describes the dataset preparation and experimental setup, including implementation details and evaluation metrics. It also presents and discusses the results, including ablation studies, comparative analysis, and failure case examination.
- Section 6 concludes the paper and suggests future research directions.

3. Method

To improve model robustness, prevent overfitting, and enhance the detection capability for small-scale defects, systematic data augmentation techniques were applied during the training phase.

3.1. Mosaic Data Augmentation

Mosaic data augmentation is an effective strategy widely used in object detection tasks. Its core principle involves randomly selecting four training images, scaling and cropping them, and stitching them into a new composite image while correspondingly fusing their bounding box annotations. The primary advantages of this method include:

Enriched Context: Forces the model to learn to recognize multiple objects and their spatial relationships within a single, complex scene.

Improved Small-Object Detection: Small defects in the original images may become more prominent in the composite image after stitching and rescaling.

Optimized Batch Normalization: A single composite image contains richer pixel statistics, contributing to more stable training.

Increased Training Efficiency: Effectively exposes the model to more diverse data combinations within the same number of iterations, potentially reducing the total epochs required for convergence.

The specific workflow consists of the following key steps:

Random Image Selection: Four original images are randomly sampled from the training set.
Random Scaling and Cropping: Each image undergoes random resizing and region cropping to introduce variations in scale and composition.
Stitching: The processed images are placed into the four quadrants of a new canvas to form a composite image.
Label Fusion: The bounding box coordinates from each original image are affinely transformed according to their new position and scale within the composite image, generating a unified label file for the synthesized sample.

In this work, Mosaic augmentation was activated by setting the parameter “mosaic = 1” during training. It is important to note that this technique is applied exclusively during the training phase and is disabled for model validation and testing to ensure performance evaluation reflects the model’s capability on real, single images.
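The label-fusion step can be illustrated with a short plain-Python sketch. It is a minimal illustration under stated assumptions, not the YOLOv5 implementation. The assumptions are:

- The canvas is 2s × 2s with a random center.
- Each rescaled tile is placed so that one of its corners touches that center.
- Each box is mapped with the same scale and offset as its tile, then clipped to the canvas.

The names `tile_origin` and `mosaic_labels`, the box layout `[cls, x1, y1, x2, y2]`, and the default scale range are hypothetical choices. Only the labels are transformed; the pixel stitching is omitted.

```python
import random


def tile_origin(quadrant, xc, yc, w, h):
    """Canvas coordinate of the top-left corner of a w x h tile whose inner corner sits at (xc, yc)."""
    if quadrant == 0:                 # top-left tile
        return xc - w, yc - h
    if quadrant == 1:                 # top-right tile
        return xc, yc - h
    if quadrant == 2:                 # bottom-left tile
        return xc - w, yc
    return xc, yc                     # bottom-right tile


def mosaic_labels(tiles, s=640, center=None, scale_range=(0.5, 1.5), seed=None):
    """Fuse the box labels of four images into one 2s x 2s mosaic canvas (hypothetical helper).

    tiles: four (width, height, boxes) tuples, boxes given as [cls, x1, y1, x2, y2] in pixels.
    Returns [cls, x1, y1, x2, y2] boxes in canvas pixels; boxes cropped away entirely are dropped.
    """
    if len(tiles) != 4:
        raise ValueError("Mosaic stitches exactly four images")
    rng = random.Random(seed)
    # Random mosaic center, kept in the middle half of the canvas.
    if center is None:
        center = (rng.uniform(0.5 * s, 1.5 * s), rng.uniform(0.5 * s, 1.5 * s))
    xc, yc = center
    fused = []
    for q, (w0, h0, boxes) in enumerate(tiles):
        # Random scaling: resize the long side to s, then apply a random jitter factor.
        r = s / max(w0, h0) * rng.uniform(*scale_range)
        ox, oy = tile_origin(q, xc, yc, w0 * r, h0 * r)
        for cls, x1, y1, x2, y2 in boxes:
            # Same scale and offset as the tile, then clip to the canvas (this is the "crop").
            nx1, nx2 = (min(max(x * r + ox, 0.0), 2.0 * s) for x in (x1, x2))
            ny1, ny2 = (min(max(y * r + oy, 0.0), 2.0 * s) for y in (y1, y2))
            if nx2 - nx1 > 1 and ny2 - ny1 > 1:   # keep boxes that still have visible area
                fused.append([cls] + [round(v, 2) for v in (nx1, ny1, nx2, ny2)])
    return fused
```

Consider four 3024 × 3024 images with `center=(640, 640)` and `scale_range=(1.0, 1.0)`. Each tile shrinks to 640 × 640. A box covering the whole first image then maps to `[0, 0.0, 0.0, 640.0, 640.0]` in the top-left quadrant.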

3.2. Gaussian Noise

Neural networks often overfit when they learn recurring but uninformative high-frequency features. Adding Gaussian noise to the training images perturbs these high-frequency components so that the network cannot simply memorize them. However, the noise also perturbs low-frequency content, making the training data less faithful to the original images. Through training, the network learns to be robust to these perturbations, which yields a model that generalizes better. Adding a moderate amount of noise can therefore improve both the learning efficiency and the accuracy of the network.

Gaussian noise is noise whose amplitude follows a Gaussian (normal) probability density function. It is present at almost every pixel, with a random intensity. Its probability density is given by Equation (1), where μ is the mean (expected value) and σ² is the variance (σ is the standard deviation). Each output pixel is obtained by adding to the corresponding input pixel a random number drawn from this Gaussian distribution.

f(x) = \frac{1}{\sqrt{2π} \, σ} \exp\left(-\frac{(x - μ)^{2}}{2 σ^{2}}\right)

(1)
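The per-pixel procedure can be sketched in plain Python for an 8-bit grayscale image stored as a list of rows. The density function uses the normal normalizing constant 1/(√(2π)σ). The paper does not state the noise parameters, so the defaults below (μ = 0, σ = 10) are assumptions. The names `gaussian_pdf` and `add_gaussian_noise` are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma (the form of Equation (1))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def add_gaussian_noise(image, mu=0.0, sigma=10.0, seed=None):
    """Add independent N(mu, sigma^2) noise to every pixel of an 8-bit grayscale image.

    image: list of rows of integer intensities in 0..255 (sigma=10 is an assumed default).
    Returns a new image of the same size, rounded and clipped back to 0..255.
    """
    rng = random.Random(seed)
    return [
        # one Gaussian sample per pixel, then clamp to the valid intensity range
        [min(255, max(0, round(p + rng.gauss(mu, sigma)))) for p in row]
        for row in image
    ]
```

Clipping keeps the result a valid image. Because of the clipping, pixels near 0 or 255 receive slightly asymmetric noise.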

4. Establishment of Galvanized Sheet Inspection Model Based on S-C-B-YOLO

This study employs the YOLOv5 model. After considering the application scenario and resource requirements, we selected the YOLOv5s version. Compared with the larger YOLOv5 variants, it has a shallower network and narrower feature maps. This gives it a faster processing speed and makes it more suitable for real-time detection. Its architecture consists of four parts: the Input, the Backbone, the Neck, and the Output (Figure 2).

4.1. SE Attention Mechanism

The SE (Squeeze-and-Excitation) attention mechanism primarily functions to re-calibrate channel-wise feature responses by performing adaptive weighting of feature map channels. This enables the network to focus more on channels containing crucial information while suppressing less important ones, thereby enhancing the representational capacity of the features. This dynamic adjustment of channel-wise feature response strengths is achieved through a three-step process applied to each channel of the input feature map: Squeeze (global information compression), Excitation (importance weight learning), and Scale (channel re-weighting).

In this work, the SE attention module is inserted into the deeper layers of the Backbone network, immediately before the SPPF module. At this position the module can draw on information aggregated by all preceding layers, and its global pooling gives it an image-wide view for guiding feature selection. It takes the output of the preceding C3 module as its input. In this implementation, the input feature map has shape [B, 1024, H, W], where B is the batch size and the channel count C is 1024. The reduction ratio r is set to 2.

The operational procedure is as follows:

Level 1. Squeeze (Compression):

The objective of this step is to aggregate the global spatial information of each channel, discarding its spatial distribution. This is done by applying Global Average Pooling (GAP) over the spatial dimensions (H, W) of each channel. The result is a channel-wise descriptor vector Z of shape [B, 1024, 1, 1]. Each element Z_c of Z is the global average response of the c-th channel of the original feature map U, as shown in Equation (2).

Z_{c} = \frac{1}{H \times W} \sum_{i = 1}^{H} \sum_{j = 1}^{W} U_{c} (i, j)

(2)

Level 2. Excitation (Adaptive Gating/Weight Learning):

This step aims to capture non-linear dependencies between channels and generate a set of modulation weights.

1. Dimensionality Reduction: The descriptor vector Z from the previous step is passed through a fully connected (FC) layer that reduces its dimensionality to C/r, where r is the reduction ratio. Here, with r = 2 and C = 1024, the dimension is compressed to 512. This bottleneck limits the number of parameters and the computational cost. A non-linear activation function (e.g., ReLU) applied afterwards lets the module capture non-linear interactions between channels.

2. Dimensionality Restoration: The compressed features (512-dimensional) are then fed through another FC layer to restore the original channel dimensionality (1024-dimensional).

3. Activation Function: A Sigmoid activation function is applied to the output of this second FC layer, constraining the weight for each channel to the range [0, 1].

Upon completion of these steps, a channel weighting vector S with a shape of [B, 1024, 1, 1] is generated.

Level 3. Scale (Re-weighting):

The final step rescales the original input feature map U with the learned weights. To do this, U is multiplied channel-wise by the channel weighting vector S. Specifically, for each channel c, all spatial elements (i, j) in that channel are multiplied by the corresponding scalar weight S_c, as defined in Equation (3).

Û_{c} (i, j) = S_{c} \times U_{c} (i, j)

(3)

The result is the recalibrated feature map Û, which possesses the same shape as the input U.
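The three steps can be traced in a short plain-Python sketch for a single feature map of shape [C][H][W], with the batch dimension B dropped. Some details are illustrative assumptions:

- The function name `se_block` and the list-of-lists layout are hypothetical.
- The FC biases are omitted for brevity.
- A practical implementation would use tensor operations in a deep learning framework rather than Python loops.

```python
import math


def se_block(U, W1, W2):
    """Squeeze-and-Excitation on one feature map U with shape [C][H][W].

    W1: (C/r) x C weights of the reduction FC layer.
    W2: C x (C/r) weights of the restoration FC layer (biases omitted).
    Returns (U_hat, s): the recalibrated map (same shape as U) and the channel weights s.
    """
    # Step 1, squeeze: global average pooling of each channel, Equation (2).
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in U]
    # Step 2, excitation: FC (reduce to C/r) -> ReLU -> FC (restore to C) -> sigmoid.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in W1]
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in W2]
    # Step 3, scale: multiply every spatial element of channel c by s_c, Equation (3).
    U_hat = [[[s_c * v for v in row] for row in ch] for s_c, ch in zip(s, U)]
    return U_hat, s
```

With the configuration described above (C = 1024, r = 2), W1 is 512 × 1024 and W2 is 1024 × 512. The two FC layers therefore hold 2 × 1024 × 512 = 1,048,576 weights, ignoring biases.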

Role of the Incorporated SE Attention Mechanism:

The integration of the SE attention mechanism delivers the following key benefits:

(1) Amplification of Salient Feature Channels: The SE module enables the network to autonomously learn and emphasize the feature channels that contain the most critical information for the specific task (e.g., object detection). For instance, it can identify channels that are particularly sensitive to the textures, shapes, or contextual cues of certain defect categories among the 1024 available channels.

(2) Suppression of Noisy or Irrelevant Channels: Concurrently, the mechanism attenuates the contributions from channels that harbor redundant information, noise, or details less pertinent to the current detection objective.

(3) Enhanced Feature Discriminability: This dynamic, adaptive, channel-wise feature recalibration significantly boosts the representational capacity and discriminative power of the feature maps produced by the network.

4.2. C3TR Module

Prior research indicates that hybrid models often outperform pure Transformer or pure CNN architectures on small-scale datasets. This matters here because industrial production typically generates only a limited volume of defective galvanized steel images. Applying a hybrid model to galvanized sheet defect detection can therefore strengthen the network’s capacity to capture global contextual information. This paper integrates a Transformer Block with the final C3 module of the YOLOv5 backbone to form a C3TR module, while retaining the original C3 modules in the other layers. Confining the Transformer to this single, lowest-resolution stage balances feature extraction accuracy against computational speed. It also augments global perception and yields significant performance improvements on complex detection tasks.

The network structure of the Transformer Block is depicted in Figure 3. For the input, the feature map extracted by the preceding layer of this module is first flattened into a 1D vector sequence, which is then passed to a Patch Embedding layer. Subsequently, each vector undergoes a linear transformation via a fully connected layer, is concatenated with a class token, and then summed with positional encodings that incorporate spatial information about the image. The resulting tokens are fed into the Encoder, a core component comprising mainly a Multi-Head Attention module and a Multilayer Perceptron (MLP).

The output from the Multi-Head Attention layer is combined with its original input via a residual connection, and the result is then passed to the MLP block. In this implementation, the MLP block, constructed with two fully connected layers, enables the Transformer Block to model complex relationships. Another residual connection adds the input of the MLP to its output, facilitating better extraction of image information and yielding the final output of the encoder. Multiple identical Transformer encoder layers are connected and stacked sequentially, forming the complete structure of the Transformer Block module. Replacing the Bottleneck module within the original C3 module with this Transformer Block creates the proposed C3TR module. A structural comparison between the improved C3TR module and the original C3 module is presented in Figure 4.
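The data flow through one encoder layer can be illustrated with a minimal plain-Python sketch. It reshapes a [C][H][W] feature map into H·W tokens of dimension C, applies attention and an MLP with residual connections, and reshapes back. Several simplifications apply:

- Attention is single-head rather than multi-head.
- The MLP consists of two FC layers with no activation between them, as described above. Many Transformer variants insert a non-linearity such as GELU at that point.
- Layer normalization, the class token, and positional encodings are omitted.

All function names are hypothetical.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(xs):
    m = max(xs)                                   # subtract the max for numerical stability
    e = [math.exp(v - m) for v in xs]
    total = sum(e)
    return [v / total for v in e]


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a list of token vectors X."""
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    d = len(Q[0])
    out = []
    for q in Q:
        a = softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K])
        out.append([sum(aj * v[c] for aj, v in zip(a, V)) for c in range(len(V[0]))])
    return out


def encoder_layer(X, Wq, Wk, Wv, W1, W2):
    """Attention + residual, then a two-FC-layer MLP + residual."""
    attn = self_attention(X, Wq, Wk, Wv)
    Y = [[x + a for x, a in zip(xr, ar)] for xr, ar in zip(X, attn)]      # first residual
    mlp = [matvec(W2, matvec(W1, y)) for y in Y]                           # FC -> FC
    return [[y + m for y, m in zip(yr, mr)] for yr, mr in zip(Y, mlp)]    # second residual


def to_tokens(U):
    """Flatten a [C][H][W] feature map into H*W tokens of dimension C."""
    C, H, W = len(U), len(U[0]), len(U[0][0])
    return [[U[c][i][j] for c in range(C)] for i in range(H) for j in range(W)]


def to_feature_map(X, H, W):
    """Inverse of to_tokens: reshape H*W tokens of dimension C back to [C][H][W]."""
    C = len(X[0])
    return [[[X[i * W + j][c] for j in range(W)] for i in range(H)] for c in range(C)]
```

Every token attends to every other token, so the attention cost grows with (H·W)². This is one reason to use such a block only at the lowest-resolution stage of the backbone.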

4.3. Bi-FPN

Feature pyramids have become a standard component in object detection networks, enhancing the capability to detect objects across various scales. They facilitate the extraction of multi-scale feature information, which is then fused across different hierarchical feature maps to improve model accuracy. Since small objects inherently contain limited pixel information and are prone to being lost during down-sampling, effectively detecting objects with significant size variations is challenging. The traditional Feature Pyramid Network (FPN) addresses this by employing a top-down pathway with lateral connections, fusing high-resolution, shallow features with semantically rich, deep features.

YOLOv5 uses PANet (Path Aggregation Network) in its neck. In this paper, we replace PANet with the Weighted Bidirectional Feature Pyramid Network (Bi-FPN), which enables simple yet efficient multi-scale feature fusion. Its structure is illustrated in Figure 5.

Bi-FPN simplifies the architecture by removing nodes that have only one input edge. Such nodes perform no fusion and therefore contribute little to a network whose purpose is to fuse features from different levels. When an input node and an output node lie at the same level, an additional edge connects them, fusing more features without adding significant cost. PANet has a single top-down and a single bottom-up path. Bi-FPN, by contrast, treats each bidirectional (top-down plus bottom-up) pathway as one feature network layer. It assigns learnable weights to each input feature and repeats these bidirectional layers to achieve higher-level feature fusion.

Furthermore, Bi-FPN employs fast normalized fusion for feature integration, as shown in Equation (4).

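O = \sum_{i} \frac{ω_{i}}{ε + \sum_{j} ω_{j}} \times I_{i}

(4)

The symbols in Equation (4) are defined as follows:

- I_i denotes the i-th input feature map.
- ω_i is the learnable weight of I_i. It is kept non-negative; the original Bi-FPN does this by applying a ReLU to each weight.
- The sum over j runs over the weights of all inputs to the node.
- ε is a small constant (0.0001 in the original Bi-FPN) that prevents division by zero.

Each normalized weight therefore lies between 0 and 1.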
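Equation (4) can be written as a short plain-Python function. It assumes the input feature maps have already been resized to a common shape and flattened to lists. The function name `fast_normalized_fusion` is hypothetical. In a framework implementation, the weights would be trainable parameters, and the fused map would typically pass through a convolution afterwards.

```python
def fast_normalized_fusion(inputs, weights, eps=1e-4):
    """Equation (4): O = sum_i w_i / (eps + sum_j w_j) * I_i.

    inputs: feature maps already resized to a common shape, each flattened to a list of floats.
    weights: raw learnable scalars, one per input; a ReLU keeps them non-negative.
    Returns the fused feature map and the normalized weights.
    """
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input feature map")
    w = [max(0.0, wi) for wi in weights]          # ReLU on each weight
    denom = eps + sum(w)
    norm = [wi / denom for wi in w]               # each in [0, 1]; sum just under 1 when eps > 0
    fused = [sum(n * feat[k] for n, feat in zip(norm, inputs)) for k in range(len(inputs[0]))]
    return fused, norm
```

Judging by their names, BiFPN_Add2 and BiFPN_Add3 presumably correspond to nodes with two and three inputs. In this sketch they would differ only in the length of `inputs` and `weights`.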

This method balances accuracy and speed: each input feature receives a learnable weight, and the weights are normalized before the weighted sum is formed. For example, consider the level-6 features in Figure 5. P_6^{td} denotes the intermediate feature on the top-down pathway at level 6, and P_6^{out} denotes the output feature on the bottom-up pathway at level 6. In summary, Bi-FPN combines bidirectional cross-scale connections with fast normalized fusion. In the network’s neck, the original Concat operation is replaced with the Bi-FPN_ADD operation, which yields better fusion performance than concatenation.
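For the level-6 example, the original Bi-FPN formulation (EfficientDet) expresses the two features as shown below. Conv denotes a convolution, Resize denotes up- or down-sampling to the target resolution, and primed weights belong to the second fusion node. This is the generic Bi-FPN form; the exact operations inside the BiFPN_Add2 and BiFPN_Add3 layers used here may differ.

```latex
P_{6}^{td} = \mathrm{Conv}\left(\frac{ω_{1} \cdot P_{6}^{in} + ω_{2} \cdot \mathrm{Resize}(P_{7}^{in})}{ω_{1} + ω_{2} + ε}\right)

P_{6}^{out} = \mathrm{Conv}\left(\frac{ω_{1}' \cdot P_{6}^{in} + ω_{2}' \cdot P_{6}^{td} + ω_{3}' \cdot \mathrm{Resize}(P_{5}^{out})}{ω_{1}' + ω_{2}' + ω_{3}' + ε}\right)
```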

The improved backbone is connected to the Bi-FPN through its multi-scale feature outputs (P3, P4, P5). In the neck, Bi-FPN uses weighted bidirectional connections to fuse feature maps from different depths of the backbone. The shallow high-resolution features (P3), mid-level features (P4), and deep semantically rich features (P5) are all fed into the stacked Bi-FPN layers. Within each layer, the input features are fused adaptively with learnable weights, and information flows both top-down and bottom-up. The architecture stacks three Bi-FPN layers in total (two BiFPN_Add2 layers and one BiFPN_Add3 layer). This multi-level, iterative bidirectional fusion enhances the representational capacity of the multi-scale features and integrates semantic information effectively. As a result, it significantly improves the model’s robustness when detecting objects of varying scales.

The overall architecture of the improved algorithm is depicted in Figure 6.

This study makes systematic architectural changes to the original YOLOv5s model:

- Backbone: the final C3 module is replaced with a C3TR module that incorporates the Transformer’s self-attention mechanism. A Squeeze-and-Excitation (SE) channel attention module is inserted after the deep feature extraction layers.
- Neck: the original PANet fusion paths and their plain concatenation operations are replaced with a Bi-FPN, which fuses multi-level features with adaptive weights through BiFPN_Add2 and BiFPN_Add3 operations.
- Output: the detection head keeps its original structure, but the indices of its input feature maps are adjusted to reflect the changed depth of the preceding network.

Together, these changes yield an object detection architecture that combines local feature extraction, global contextual modeling, and adaptive multi-scale fusion.

5. Experimental Results

5.1. Experimental Environment

To ensure the validity of experimental comparisons, all experiments in this paper (including the ablation studies and comparative experiments described in the following sections) were conducted under identical hardware and software configurations, as detailed in Table 1. For the same purpose, we unified the model training parameters: the batch size was set to 32, momentum to 0.9, the image size was adjusted to 640 × 640, weight decay to 0.0005, and the initial learning rate was set to 0.01, as shown in Table 2.

Each network involved in the experiments was trained for 300 epochs using data for the two primary defect types: ‘diagonal streaking’ and ‘non-uniform spangle size’.

5.1.1. Software and Hardware

The key training parameters, namely an image size of 640 × 640 and a batch size of 32, were chosen to balance computational constraints, training stability, and small defect detection. Resizing the original high-resolution images (3024 × 3024) to 640 × 640 is standard practice in YOLO-based detection. It ensures compatibility with pre-trained models and permits a feasible batch size within the GPU memory limits (NVIDIA GTX 1650Ti). A larger batch size helps stabilize batch normalization statistics and gradient estimates. Down-sampling can cause small defects to lose fine detail. To mitigate this, our method incorporates two countermeasures:

(1) Mosaic data augmentation, which increases the relative pixel proportion of small defects in training samples.

(2) The Bi-FPN architecture, which is explicitly designed to enhance multi-scale feature fusion and improved sensitivity to small objects in our ablation studies (Section 5.3).

A batch size of 32 was the largest viable size on our hardware, and the initial learning rate was tuned accordingly to ensure training stability.
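A short calculation shows why down-sampling matters for small defects. The 20-pixel defect below is a hypothetical example, not a measurement from the dataset. The strides 8, 16, and 32 are those conventionally used for YOLOv5’s P3, P4, and P5 outputs.

```python
def downscaled_size(defect_px, src=3024, dst=640):
    """Size in pixels of a defect after resizing a src x src image to dst x dst."""
    return defect_px * dst / src


def grid_sizes(img=640, strides=(8, 16, 32)):
    """Side length of the P3/P4/P5 prediction grids for the given strides."""
    return [img // s for s in strides]
```

A 20-pixel defect in a 3024 × 3024 image shrinks to about 4.2 pixels at 640 × 640 (`downscaled_size(20)`). That is roughly half of one 8-pixel cell on the 80 × 80 P3 grid (`grid_sizes()` gives `[80, 40, 20]`). This is the scale at which the multi-scale fusion has to preserve small-defect evidence.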

5.1.2. Establishment of the Galvanized Sheet Surface Dataset

We developed a dedicated image dataset for galvanized sheet surface defect detection to provide a reliable foundation for model training and evaluation. All images were collected from actual industrial production lines, ensuring authenticity and representativeness. The dataset is characterized as follows:

Scale and Resolution: It comprises 2880 high-quality images, each with a resolution of 3024 × 3024 pixels. This high resolution is crucial for preserving the fine details of subtle defects.

Data Split: To ensure a fair and objective evaluation, the dataset was split into training, validation, and test sets. Specifically, 2304 images were used for training, 288 for validation, and 288 for testing. This split is designed to provide sufficient data for model learning while maintaining an independent test set to rigorously assess generalization performance.
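The counts above correspond to an 80/10/10 split of 2880 images. A reproducible split can be sketched as follows. Assumptions: the paper does not state how images were assigned, so the seeded random shuffle, the function name `split_dataset`, and the file names are illustrative only.

```python
import random


def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle items with a fixed seed and split them into train/val/test lists."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)      # fixed seed makes the split reproducible
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Some images are frames from video streams, and neighboring frames can be nearly identical. A stricter variant would therefore split by source sequence rather than by individual image, so that near-duplicates do not land in both the training and test sets.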

Data Source: Images originate from two primary sources: frames captured from real-time video streams on the production line and high-definition static images of samples with typical defects. This diversity helps cover defect appearances under various lighting conditions, viewing angles, and background contexts, enhancing the model’s robustness.
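The 2304/288/288 split corresponds to an 8:1:1 ratio of the 2880 images. The sketch below shows one reproducible way to produce such a split; the paper does not state how images were assigned to each subset, so the random shuffle and the fixed seed are assumptions.

```python
import random


def split_dataset(image_ids, train_frac=0.8, val_frac=0.1, seed=0):
    """Reproducibly shuffle image IDs and split them into train/val/test lists.

    The 0.8/0.1/0.1 fractions reproduce the 2304/288/288 split of 2880 images;
    the shuffle and the seed value are illustrative assumptions.
    """
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # private RNG so global random state is untouched
    n_train = round(len(ids) * train_frac)
    n_val = round(len(ids) * val_frac)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


if __name__ == "__main__":
    images = [f"img_{i:04d}.jpg" for i in range(2880)]  # hypothetical file names
    train, val, test = split_dataset(images)
    print(len(train), len(val), len(test))  # 2304 288 288
```

Because some images are frames taken from the same video stream, grouping neighbouring frames before shuffling may help keep near-duplicates out of both the training and test sets; this sketch does not do that.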

The detailed statistics of the dataset are shown in Table 3.

5.1.3. Annotation Process

The galvanized sheet spangle defect dataset consists of raw images and their corresponding label files (Figure 7). Defects were first annotated with LabelImg (v1.8.6) by drawing bounding boxes on the raw images. Each defect was manually categorized, and each defect type was assigned a distinct integer class index: diagonal streaking defects were labeled 0, and non-uniform spangle size defects were labeled 1. The annotations were then exported in two formats: TXT files compatible with the YOLO network format and XML files in the Pascal VOC2007 format.
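The two export formats differ mainly in how a box is encoded: VOC XML stores absolute corner coordinates (xmin, ymin, xmax, ymax), whereas a YOLO TXT line stores the class index followed by the box centre, width, and height normalized by the image size. The sketch below converts one LabelImg-style VOC annotation into YOLO lines using Python's standard `xml.etree.ElementTree`. The class-name strings in `CLASS_IDS` are hypothetical placeholders; only the indices 0 and 1 come from the text.

```python
import xml.etree.ElementTree as ET

# Class indices from the text (0 = diagonal streaking, 1 = non-uniform spangle size).
# The name strings are hypothetical; they must match the labels typed in LabelImg.
CLASS_IDS = {"diagonal_streaking": 0, "nonuniform_spangle": 1}


def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """VOC corner box in pixels -> YOLO (cx, cy, w, h) normalized to [0, 1]."""
    return ((xmin + xmax) / 2 / img_w, (ymin + ymax) / 2 / img_h,
            (xmax - xmin) / img_w, (ymax - ymin) / img_h)


def voc_xml_to_yolo_lines(xml_text):
    """Convert one VOC annotation (XML string) to YOLO TXT lines: 'cls cx cy w h'."""
    root = ET.fromstring(xml_text)
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls = CLASS_IDS[obj.find("name").text]
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (float(box.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax"))
        cx, cy, w, h = voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h)
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines


if __name__ == "__main__":
    sample = """<annotation>
  <size><width>3024</width><height>3024</height><depth>3</depth></size>
  <object><name>diagonal_streaking</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>400</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>"""
    for line in voc_xml_to_yolo_lines(sample):
        print(line)
```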

5.2. Evaluation Metrics

Common metrics for evaluating the YOLOv5 algorithm include Precision, Recall, Average Precision (AP), the Confusion Matrix, and the Precision-Recall (P-R) curve. Higher-level metrics are also employed: mean Average Precision (mAP), Frames Per Second (FPS) for inference speed, and Giga Floating-Point Operations (GFLOPs) for computational cost. The accuracy metrics (Precision, Recall, AP, and mAP) can all be derived from the Confusion Matrix, as illustrated in Figure 8.

Based on the Confusion Matrix, the core metrics for assessing detection performance, Precision (P), Recall (R), and Average Precision (AP), can be calculated, where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives. AP condenses precision and recall across different confidence thresholds into a single score: it is the area under the P-R curve, approximated here by 11-point interpolation. The calculation formulas are given by Equations (5)–(7).

P = \frac{TP}{TP + FP}

(5)

R = \frac{TP}{TP + FN}

(6)

AP = \frac{1}{11} \sum_{r \in \{0, 0.1, \ldots, 1.0\}} P_{smooth}(r), \qquad P_{smooth}(r) = \max_{\tilde{r} \geq r} P(\tilde{r})

(7)
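Equations (5)–(7) can be sketched in plain Python as follows. The P-R curve is built by ranking detections by confidence and accumulating true and false positives; each detection is assumed to have already been marked as a true or false positive (for example by IoU matching against the ground truth). The sketch follows the 11-point interpolation of Equation (7); other protocols, such as COCO's, interpolate at 101 recall points, so values can differ slightly from a particular toolkit's output.

```python
def precision(tp, fp):
    """Equation (5): share of predicted defects that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Equation (6): share of ground-truth defects that are detected."""
    return tp / (tp + fn) if tp + fn else 0.0


def pr_curve(is_tp_sorted, n_ground_truth):
    """P-R points from detections sorted by descending confidence.

    is_tp_sorted[k] is True if the k-th most confident detection is a true positive.
    """
    tp = fp = 0
    recalls, precisions = [], []
    for hit in is_tp_sorted:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(recall(tp, n_ground_truth - tp))  # FN = ground truths not yet found
        precisions.append(precision(tp, fp))
    return recalls, precisions


def ap_11_point(recalls, precisions):
    """Equation (7): mean of the smoothed precision at recall 0, 0.1, ..., 1.0.

    The smoothed precision at level r is the highest precision reached at any
    recall >= r, or 0 if the curve never reaches r.
    """
    total = 0.0
    for i in range(11):
        r = i / 10
        reachable = [p for rc, p in zip(recalls, precisions) if rc >= r]
        total += max(reachable) if reachable else 0.0
    return total / 11


if __name__ == "__main__":
    # Hypothetical ranking: 6 detections of one class, 5 ground-truth defects.
    recalls, precisions = pr_curve([True, True, False, True, False, True], n_ground_truth=5)
    print("AP (11-point):", round(ap_11_point(recalls, precisions), 4))
```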

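After computing the AP for every category, the principal evaluation metric, mean Average Precision (mAP), is obtained by averaging the AP values over all N categories, as shown in Equation (8); a higher mAP indicates better performance. mAP@0.5 denotes the mAP obtained when a detection counts as a true positive at an intersection-over-union (IoU) threshold of 0.5, while mAP@0.5:0.95 averages the mAP over IoU thresholds from 0.5 to 0.95 in steps of 0.05.

\text{mAP} = \frac{1}{N} \sum_{j = 1}^{N} AP_{j}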

(8)
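The sketch below illustrates Equation (8) together with the IoU-based variants reported in the experiments. Under mAP@0.5 a prediction is a true positive only if its IoU with a ground-truth box of the same class is at least 0.5, while mAP@0.5:0.95 averages the mAP obtained at the ten thresholds 0.50, 0.55, up to 0.95. The per-class AP values in the usage example are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_ap(ap_per_class):
    """Equation (8): mean of the per-class AP values (dict: class name -> AP)."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# The ten IoU thresholds used by mAP@0.5:0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * k, 2) for k in range(10)]


def map_50_95(ap_by_threshold):
    """ap_by_threshold[k] holds the per-class APs computed at IOU_THRESHOLDS[k]."""
    if len(ap_by_threshold) != len(IOU_THRESHOLDS):
        raise ValueError("expected one set of per-class APs per IoU threshold")
    values = [mean_ap(aps) for aps in ap_by_threshold]
    return sum(values) / len(values)


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap area 1, union area 7
    aps_at_05 = {"diagonal_streaking": 0.94, "nonuniform_spangle": 0.91}  # hypothetical
    print("mAP@0.5:", mean_ap(aps_at_05))
```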

5.3. Results Curves

The improved algorithm was trained over multiple runs, yielding the final training weights and a results summary plot (results.jpg), shown in Figure 9. The plot tracks the key metrics, namely the localization, confidence, and classification losses and mAP@0.5, against the number of training epochs. The losses of the improved model decreased rapidly and converged at around 200 epochs, by which point all three losses had fallen below 0.02, with no observable signs of overfitting. Overall, the network converges effectively during training and achieves high recognition accuracy.

In more detail, the initial loss values were 2.87 for localization, 3.21 for confidence, and 1.95 for classification. The total loss stabilized at a low level around epoch 200 and did not decrease significantly in subsequent epochs; by that point the localization, confidence, and classification losses were all below 0.02. The maximum mAP@0.5 during training, 93.1%, was reached at epoch 240, and the final mAP@0.5 evaluated on the test set was 92.6%.
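Observations such as "the loss stabilized around epoch 200" and "the maximum mAP@0.5 was reached at epoch 240" can be read off logged training curves. The helpers below are a hypothetical sketch, not part of the authors' pipeline: `best_epoch` returns the epoch with the highest metric value, and `plateau_epoch` reports the last epoch at which the loss improved by more than `min_delta`, once `patience` further epochs pass without such an improvement. The thresholds and the synthetic curves are assumptions, not values from the paper.

```python
def best_epoch(metric_history):
    """Return (epoch, value) for the highest metric value; epochs are 0-indexed."""
    epoch = max(range(len(metric_history)), key=metric_history.__getitem__)
    return epoch, metric_history[epoch]


def plateau_epoch(loss_history, patience=20, min_delta=1e-3):
    """Return the last epoch at which the loss improved by more than min_delta,
    once `patience` further epochs have passed without such an improvement.

    Returns None if the history ends before the loss has plateaued.
    """
    best = float("inf")
    best_at = 0
    for epoch, loss in enumerate(loss_history):
        if loss < best - min_delta:
            best, best_at = loss, epoch  # meaningful improvement: restart the count
        elif epoch - best_at >= patience:
            return best_at               # no improvement for `patience` epochs
    return None


if __name__ == "__main__":
    # Synthetic curves (hypothetical): loss decays then flattens; mAP rises then saturates.
    loss = [max(0.02, 3.0 * 0.97 ** e) for e in range(300)]
    map50 = [min(0.93, 0.5 + 0.002 * e) for e in range(300)]
    print("loss plateaus after epoch", plateau_epoch(loss))
    print("best epoch and mAP@0.5:", best_epoch(map50))
```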

5.4. Ablation Study

This work incorporates three distinct improvements targeting different modules. Consequently, five experimental configurations were evaluated on the same dataset to facilitate a direct comparison of the impact of each algorithmic modification on the baseline YOLOv5s model. The detailed configurations for the ablation study are as follows:

(1) Configuration 1: The original YOLOv5s model.
(2) Configuration 2: The C3 module in the final layer of the YOLOv5s backbone is replaced with the C3TR module, which integrates a Transformer block for feature extraction.
(3) Configuration 3: The Bi-FPN structure replaces PANet in the neck network of YOLOv5s for multi-scale feature fusion.
(4) Configuration 4: The SE attention mechanism is inserted into the YOLOv5s backbone, before the SPPF module, to enhance feature extraction.
(5) Configuration 5: All of the above improvements are combined, giving the complete enhanced YOLOv5s algorithm.
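Two of the modules varied in this ablation can be illustrated in a few lines of plain Python. `se_recalibrate` follows the standard Squeeze-and-Excitation design (global average pooling, a two-layer fully connected bottleneck with ReLU and sigmoid, then channel-wise scaling), and `fast_normalized_fusion` implements the weighted feature fusion introduced with Bi-FPN in EfficientDet. Both operate on tiny nested lists with hand-picked, hypothetical weights; the actual model uses learned PyTorch layers, and its exact fusion operator may differ from this EfficientDet-style form.

```python
import math


def se_recalibrate(feature_map, w1, b1, w2, b2):
    """Squeeze-and-Excitation on a C x H x W feature map stored as nested lists.

    w1: (C/r) x C weights and b1: C/r biases of the reduction layer.
    w2: C x (C/r) weights and b2: C biases of the expansion layer.
    Returns the rescaled feature map and the per-channel weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation, step 1: fully connected reduction followed by ReLU.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z)) + b) for row, b in zip(w1, b1)]
    # Excitation, step 2: fully connected expansion followed by sigmoid.
    scales = [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
              for row, b in zip(w2, b2)]
    # Scale: multiply every value of channel c by its weight scales[c].
    out = [[[v * s for v in row] for row in ch] for ch, s in zip(feature_map, scales)]
    return out, scales


def fast_normalized_fusion(inputs, raw_weights, eps=1e-4):
    """EfficientDet-style Bi-FPN fusion of same-shaped (flattened) feature maps:
    O = sum_i w_i * I_i / (eps + sum_j w_j), with w_i = ReLU(raw weight)."""
    weights = [max(0.0, w) for w in raw_weights]  # ReLU keeps weights non-negative
    norm = eps + sum(weights)
    return [sum(w * feat[k] for w, feat in zip(weights, inputs)) / norm
            for k in range(len(inputs[0]))]


if __name__ == "__main__":
    fmap = [[[1.0, 3.0], [2.0, 2.0]],   # channel 0, mean 2.0
            [[0.0, 0.0], [0.0, 4.0]]]   # channel 1, mean 1.0
    # Hypothetical weights: C = 2 channels, reduction ratio r = 2, so 1 hidden unit.
    out, s = se_recalibrate(fmap, w1=[[1.0, -1.0]], b1=[0.0], w2=[[2.0], [-2.0]], b2=[0.0, 0.0])
    print("channel weights:", [round(x, 3) for x in s])
    print("fused:", fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], raw_weights=[0.7, 0.3]))
```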

The results of the ablation study are presented in Table 4.

The results indicate that introducing the C3TR module notably strengthens feature extraction, raising mAP@0.5 from 0.892 to 0.917, at the cost of slightly lower speed (54 FPS versus 57 FPS for the original YOLOv5s). Incorporating the SE attention mechanism before the SPPF module raises mAP@0.5 to 0.899 while reducing the computational cost from 16.4 to 16.1 GFLOPs. Employing Bi-FPN yields the largest single improvement, raising mAP@0.5 to 0.923 and the speed to 64 FPS, with a small increase to 16.7 GFLOPs. Combining all three improvements raises mAP@0.5 by 3.4 percentage points (from 0.892 to 0.926) and mAP@0.5:0.95 more modestly (from 0.845 to 0.850), while increasing the speed to 62 FPS at an unchanged 16.4 GFLOPs.
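The gains discussed above can be recomputed directly from Table 4. The snippet below copies the table values and reports each configuration's change relative to the YOLOv5s baseline, with mAP differences expressed in percentage points.

```python
# Values copied from Table 4 (ablation study).
ABLATION = {
    "YOLOv5s":          {"map50": 0.892, "map50_95": 0.845, "fps": 57, "gflops": 16.4},
    "YOLOv5s + C3TR":   {"map50": 0.917, "map50_95": 0.858, "fps": 54, "gflops": 16.8},
    "YOLOv5s + SE":     {"map50": 0.899, "map50_95": 0.845, "fps": 57, "gflops": 16.1},
    "YOLOv5s + Bi-FPN": {"map50": 0.923, "map50_95": 0.862, "fps": 64, "gflops": 16.7},
    "YOLOv5s + ALL":    {"map50": 0.926, "map50_95": 0.850, "fps": 62, "gflops": 16.4},
}


def deltas_vs_baseline(table, baseline="YOLOv5s"):
    """Change of each configuration relative to the baseline (mAP in percentage points)."""
    base = table[baseline]
    out = {}
    for name, row in table.items():
        if name == baseline:
            continue
        out[name] = {
            "map50_pp": round((row["map50"] - base["map50"]) * 100, 1),
            "map50_95_pp": round((row["map50_95"] - base["map50_95"]) * 100, 1),
            "fps": row["fps"] - base["fps"],
            "gflops": round(row["gflops"] - base["gflops"], 1),
        }
    return out


if __name__ == "__main__":
    for name, d in deltas_vs_baseline(ABLATION).items():
        print(f"{name:<18} mAP@0.5 {d['map50_pp']:+.1f} pp, "
              f"mAP@0.5:0.95 {d['map50_95_pp']:+.1f} pp, "
              f"FPS {d['fps']:+d}, GFLOPs {d['gflops']:+.1f}")
```

For the complete model this gives +3.4 points of mAP@0.5 and +0.5 points of mAP@0.5:0.95, with 5 more FPS and unchanged GFLOPs.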

5.5. Comparative Experiments

Finally, this paper compares the proposed improved YOLOv5 algorithm with other mainstream object detection algorithms. The experimental results are presented in Table 5, which includes comparisons against YOLO-v3, YOLO-v7, and the two-stage object detection network Faster R-CNN.

The proposed method achieved the highest accuracy of all compared detection algorithms, with an mAP@0.5 of 92.6% and an mAP@0.5:0.95 of 85.0%. Relative to YOLOv3, this is an improvement of 4.9 percentage points in mAP@0.5 (from 87.7% to 92.6%) and 3.4 points in mAP@0.5:0.95 (from 81.6% to 85.0%). The two-stage Faster R-CNN performed worst on this dataset, with an mAP@0.5 of 77.9% and an mAP@0.5:0.95 of 71.2%. YOLOv7 achieved a higher detection speed (78 FPS versus 62 FPS), but its accuracy was lower (mAP@0.5 of 90.4%). These results indicate that the proposed method improves accuracy while maintaining a competitive detection speed, giving it a clear advantage in detecting spangle defects on galvanized steel sheet surfaces and offering a new perspective for defect detection tasks.

5.6. Actual Galvanized Sheet Detection Results

Upon finalizing the model architecture, a number of images were randomly selected from the previously reserved test set for evaluation. The defects detected by the model in each image were compared against the actual defects present, as illustrated in Figure 10. The results indicate that the model demonstrates satisfactory detection performance across most test images, with specific examples provided in the figure. However, certain defect features occasionally found on actual galvanized sheet surfaces were not represented in the current dataset. Consequently, subsequent work could focus on expanding the training dataset by incorporating examples from real inspection scenarios. This would enhance the model’s generalization capability, better preparing it to meet the demands of practical industrial detection applications in the future.

6. Conclusions

This study addressed the critical challenge of automated, high-accuracy defect detection on galvanized steel sheets, a task complicated by the presence of small, irregular defects and reflective surfaces. To overcome the limitations of standard detection models in this specific industrial context, we proposed S-C-B-YOLO, an enhanced architecture based on YOLOv5. The core innovation lies in the synergistic integration of three key modifications: the incorporation of an SE attention mechanism to adaptively emphasize defect-relevant features, the design of a C3TR module to capture global contextual information for irregular defects, and the replacement of PANet with Bi-FPN to optimize multi-scale feature fusion, particularly for small targets.

Experimental results on a dedicated galvanized sheet defect dataset demonstrate the effectiveness of our approach. The proposed model achieved a mean average precision (mAP@0.5) of 92.6% and an inference speed of 62 FPS. Ablation studies confirmed the individual and collective contribution of each proposed component to the final performance. Furthermore, comparative experiments showed that S-C-B-YOLO surpasses several mainstream detectors, including YOLOv3, YOLOv7, and Faster R-CNN, in terms of overall accuracy while maintaining competitive inference speed, showcasing a superior balance suitable for real-time industrial inspection.

In summary, this work provides a robust and efficient deep learning solution for galvanized sheet surface defect detection. The proposed model’s design effectively tackles the specific challenges of the domain, and its performance validates the potential for practical deployment in quality control systems. Future work will focus on expanding the defect dataset to include more rare defect categories, further optimizing the model for edge deployment, and exploring its adaptability to other types of metallic surface inspections.

Author Contributions

Conceptualization, Y.L. and G.F.; methodology, H.Z. and D.X.; software, Y.L. and H.Z.; validation, Y.L. and G.F.; formal analysis, G.F. and H.Z.; investigation, H.Z. and D.X.; writing—original draft preparation, Y.L. and G.F.; writing—review and editing, G.F. and H.Z.; visualization, Y.L. and G.F.; funding acquisition, D.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Key R&D Program of China under Grant 2022YFB2703304; in part by the National Natural Science Foundation of China under Grant 52074064; and in part by the Fundamental Research Funds for the Central Universities under Grants 2025GFZD01 and 2025GFZD27.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Li, W.; Zhang, H.; Wang, G.; Xiong, G.; Zhao, M.; Li, G.; Li, R. Deep learning based online metallic surface defect detection method for wire and arc additive manufacturing. Robot. Comput.-Integr. Manuf. 2023, 80, 102470.
Souza, B.J.; Stefenon, S.F.; Singh, G.; Freire, R.Z. Hybrid-YOLO for classification of insulators defects in transmission lines based on UAV. Int. J. Electr. Power Energy Syst. 2023, 148, 108982.
Qiu, Q.; Lau, D. Real-time detection of cracks in tiled sidewalks using YOLO-based method applied to unmanned aerial vehicle (UAV) images. Autom. Constr. 2023, 147, 104745.
Kou, X.; Liu, S.; Cheng, K.; Qian, Y. Development of a YOLO-V3-based model for detecting defects on steel strip surface. Measurement 2021, 182, 109454.
Zhou, Z.; Zhang, J.; Gong, C. Automatic detection method of tunnel lining multi-defects via an enhanced You Only Look Once network. Comput.-Aided Civil Infrastruct. Eng. 2022, 37, 762–780.
Yuan, M.; Zhou, Y.; Ren, X.; Zhi, H.; Zhang, J.; Chen, H. YOLO-HMC: An improved method for PCB surface defect detection. IEEE Trans. Instrum. Meas. 2024, 73, 2001611.
Zhao, C.; Fan, Y.; Tan, J.; Li, Q.; Lin, Z.; Luo, S.; Chen, X. FCS-YOLO: An efficient algorithm for detecting steel surface defects. Meas. Sci. Technol. 2024, 35, 086004.
Ren, Z.; Fang, F.; Yan, N.; Wu, Y. State of the art in defect detection based on machine vision. Int. J. Precis. Eng. Manuf.-Green Technol. 2022, 9, 661–691.
Wen, X.; Shan, J.; He, Y.; Song, K. Steel surface defect recognition: A survey. Coatings 2023, 13, 17.
Zhang, J.; Qian, S.; Tan, C. Automated bridge surface crack detection and segmentation using computer vision-based deep learning model. Eng. Appl. Artif. Intell. 2022, 115, 105225.
Wang, Y.; Wang, H.; Xin, Z. Efficient detection model of steel strip surface defects based on YOLO-V7. IEEE Access 2022, 10, 133936–133944.
Chen, Y.; Ding, Y.; Zhao, F.; Zhang, E.; Wu, Z.; Shao, L. Surface defect detection methods for industrial products: A review. Appl. Sci. 2021, 11, 7657.
Zhao, W.; Chen, F.; Huang, H.; Li, D.; Cheng, W. A new steel defect detection algorithm based on deep learning. Comput. Intell. Neurosci. 2021, 2021, 5592878.
Tang, J.; Liu, S.; Zhao, D.; Tang, L.; Zou, W.; Zheng, B. PCB-YOLO: An improved detection algorithm of PCB surface defects based on YOLOv5. Sustainability 2023, 15, 5963.
Singh, S.A.; Desai, K.A. Automated surface defect detection framework using machine vision and convolutional neural networks. J. Intell. Manuf. 2023, 34, 1995–2011.
Guo, Z.; Wang, C.; Yang, G.; Huang, Z.; Li, G. MSFT-YOLO: Improved YOLOv5 based on transformer for detecting defects of steel surface. Sensors 2022, 22, 3467.
Xie, W.; Sun, X.; Ma, W. A light weight multi-scale feature fusion steel surface defect detection model based on YOLOv8. Meas. Sci. Technol. 2024, 35, 055017.
Liu, M.; Chen, Y.; Xie, J.; He, L.; Zhang, Y. LF-YOLO: A lighter and faster YOLO for weld defect detection of X-ray image. IEEE Sens. J. 2023, 23, 7430–7439.
Qiu, S.; Yang, C.H.; Wu, L.; Gao, H.; Song, W. One Improved Small-object Detection You-only-look-once Network for Strip-steel Surfaces. Sens. Mater. 2025, 37, 2257–2277.
Wang, P.; Fan, E.; Wang, P. Comparative analysis of image classification algorithms based on traditional machine learning and deep learning. Pattern Recognit. Lett. 2021, 141, 61–67.
Xin, Y.; Kong, L.; Liu, Z.; Chen, Y.; Li, Y.; Zhu, H. Machine learning and deep learning methods for cybersecurity. IEEE Access 2018, 6, 35365–35381.
Fu, Y.; Downey, A.R.J.; Yuan, L.; Zhang, T.; Pratt, A.; Balogun, Y. Machine learning algorithms for defect detection in metal laser-based additive manufacturing: A review. J. Manuf. Process. 2022, 75, 693–710.
Zhou, S.; Zhou, Z.; Ji, K.; Wang, Y.; Zhou, X.; Yu, T. GSD-YOLO: A gear surface defects detection method using adaptive multi-scale fusion and hybrid feature fusion. IEEE Sens. J. 2025, 25, 30020–30033.
Hussain, M. YOLO-v1 to YOLO-v8, the rise of YOLO and its complementary nature toward digital manufacturing and industrial defect detection. Machines 2023, 11, 677.
Qiao, Q.; Hu, H.; Ahmad, A.; Wang, K. A Review of Metal Surface Defect Detection Technologies in Industrial Applications. IEEE Access 2025, 13, 48380–48400.
Ma, G.; Yuan, H.; Yu, L.; He, Y. Monitoring of weld defects of visual sensing assisted GMAW process with galvanized steel. Mater. Manuf. Process. 2021, 36, 1178–1188.

Figure 1. Schematic diagram of defect types. (a) Diagonal streaking; (b) non-uniform spangle size.

Figure 2. Schematic diagram of YOLO-v5s network structure.

Figure 3. (a) Schematic diagram of Transformer Block module; (b) structure diagram of the Transformer Encoder.

Figure 4. C3 (left) and C3TR (right) modules.

Figure 5. Bi-FPN Structure.

Figure 6. Structure of improved YOLOv5 algorithm.

Figure 7. Partial dataset illustration.

Figure 8. Confusion Matrix.

Figure 9. Training and validation result curves.

Figure 10. Detection results of the improved YOLOv5 network. (a) Input image 1; (b) input image 2; (c) detection image 1; (d) detection image 2.

Table 1. Software and hardware environment.

Name	Experimental Parameters
Operating system	Windows 10
GPU / RAM	NVIDIA GeForce GTX 1650Ti / 16 GB
Python version	3.8
Framework	PyTorch
CUDA version	12.3

Table 2. Model Training Parameters.

Parameter	Value
Initial learning rate	0.01
Momentum	0.9
Batch size	32
Image size	640 × 640
Weight decay	0.0005
Number of epochs	300

Table 3. Detailed statistics of the galvanized sheet spangle defect dataset.

Defect Category	Train Instances	Val Instances	Test Instances	Total Instances	Val(2) Instances
Diagonal Streaking	1850	220	210	2280	280
Non-uniform Spangle Size	1650	200	190	2040	260
Grand Total	3500	420	400	4320	540

Table 4. Ablation study results.

	YOLOv5s	YOLOv5s + C3TR	YOLOv5s + SE	YOLOv5s + Bi-FPN	YOLOv5s + ALL
mAP@0.5	0.892	0.917	0.899	0.923	0.926
mAP@0.5:0.95	0.845	0.858	0.845	0.862	0.850
FPS	57	54	57	64	62
GFLOPs	16.4	16.8	16.1	16.7	16.4

Table 5. Performance comparison of different models.

	YOLOv3	YOLOv7	Faster R-CNN	S-C-B-YOLO
mAP@0.5	0.877	0.904	0.779	0.926
mAP@0.5:0.95	0.816	0.835	0.712	0.850
FPS	45	78	40	62

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, Y.; Fan, G.; Zhang, H.; Xiao, D. Defect Detection Algorithm of Galvanized Sheet Based on S-C-B-YOLO. Mathematics 2026, 14, 110. https://doi.org/10.3390/math14010110

AMA Style

Liu Y, Fan G, Zhang H, Xiao D. Defect Detection Algorithm of Galvanized Sheet Based on S-C-B-YOLO. Mathematics. 2026; 14(1):110. https://doi.org/10.3390/math14010110

Chicago/Turabian Style

Liu, Yicheng, Gaoxia Fan, Hanquan Zhang, and Dong Xiao. 2026. "Defect Detection Algorithm of Galvanized Sheet Based on S-C-B-YOLO" Mathematics 14, no. 1: 110. https://doi.org/10.3390/math14010110

APA Style

Liu, Y., Fan, G., Zhang, H., & Xiao, D. (2026). Defect Detection Algorithm of Galvanized Sheet Based on S-C-B-YOLO. Mathematics, 14(1), 110. https://doi.org/10.3390/math14010110

