Research on Metallurgical Saw Blade Surface Defect Detection Algorithm Based on SC-YOLOv5

In the context of intelligent manufacturing, enterprises that inspect metallurgical saw blade defects manually face several problems: poor real-time performance, false detections, and detection models that are too large to deploy. To address these problems, this paper proposes a metallurgical saw blade surface defect detection algorithm based on SC-YOLOv5. First, the SC network is built by integrating coordinate attention (CA) into the ShuffleNetV2 network, and the backbone network of YOLOv5 is replaced by the SC network to improve detection accuracy. Then, the SIoU loss function is used in the YOLOv5 prediction layer to account for the angle between the predicted box and the ground-truth box. Finally, to ensure both accuracy and speed, the lightweight convolution GSConv replaces the ordinary convolution modules. The experimental results show that the improved YOLOv5 model achieves an mAP@0.5 of 88.5% with 31.1M parameters. Compared with the original YOLOv5 model, the computational cost is reduced by 56.36% and the mAP is increased by 0.021. In addition, the overall performance of the improved SC-YOLOv5 model is better than that of the SSD and YOLOv3 object detection models. The method maintains a high detection rate while significantly reducing model complexity, the number of parameters, and the computational cost. It meets the needs of deployment on mobile terminals and provides an effective reference for enterprise applications.


Introduction
In the context of intelligent manufacturing, machine vision [1,2] has become an important research direction in artificial intelligence, and defect detection has become a focus of research. In recent years, with the rapid development of China's industry, the demand for metallurgical saw blades has increased significantly, and alongside output, quality has become an important issue. The saw blade, a multi-blade tool [3], is essentially a consumable; although metallurgical technology and saw blade manufacturing processes have improved continuously in recent years, defects still occur. These defects, such as cracks, pitting, and scarring, greatly shorten the service life of a saw blade, and manual inspection [4,5] cannot guarantee real-time performance. Therefore, a good surface defect detection technology for metallurgical saw blades, which ensures both real-time performance and surface quality, is of great significance for service life.
With the development of industrial intelligence, defect detection methods are broadly divided into two directions [6][7][8]: one based on traditional image processing and the other based on deep learning. In traditional image processing, Liu et al. [9] proposed a two-branch balanced saliency model based on discriminative features for fabric defect detection; the method achieves accurate fabric defect detection and can even be applied to surface defect detection for other industrial products. Joung et al. [10] used infrared imaging to detect defects in pipelines. Zhang et al. [11] proposed an algorithm combining local binary patterns (LBPs) and a gray-level co-occurrence matrix (GLCM), using LBPs to extract the local feature information and the GLCM to extract the overall texture information of the defect image. Many feature extraction methods exist; however, any specific traditional image processing method must balance real-time processing and accuracy to meet practical demands. Deep learning-based detection mainly uses Faster R-CNN, YOLO, and related series of networks. Faster R-CNN is a typical two-stage object detection model, but its detection and training processes are relatively complex. Sun et al. [12] proposed a new deep learning-based face detection scheme and obtained state-of-the-art face detection performance by improving the Faster R-CNN framework. Shou et al. [13] proposed an improved fast region-based convolutional neural network (R-CNN) detection method for airports in large-scale remote sensing images, applying multi-scale training to Faster R-CNN to make the network more robust to airports of different scales.
The core idea of YOLO is to treat object detection as a regression problem [14,15]: the whole image is the input to the network, and a single pass through one neural network yields the bounding box positions and their categories. Wang et al. [16] improved the YOLOv4 algorithm with a shallow feature enhancement mechanism to address insensitivity to small objects and low detection accuracy in traffic light detection and recognition. Xian et al. [17] proposed YOT-Net for copper elbow surface defects, using a triplet loss function to improve defect detection accuracy and image similarity to enhance feature extraction capability. Wang et al. [18] proposed an optimized micro YOLOv3 algorithm with less computation and higher accuracy to address the insufficient accuracy of the original micro YOLOv3 algorithm for target detection in a lawn environment. Xue et al. [19] proposed an improved YOLOv5-based small-target detection model for forest fires, which are highly unpredictable and destructive; the model improves the backbone of YOLOv5 and adds an attention mechanism module to make small forest fire targets easier to identify.
This paper focuses on a deep learning-based defect detection algorithm that achieves high transmission efficiency while ensuring high detection accuracy, with a model lightweight enough to meet the needs of enterprise applications. Taking the metallurgical saw blades of a metallurgical saw blade factory in Tangshan as the research object, the backbone of the YOLOv5 model was replaced with ShuffleNetV2 [20], and the coordinate attention mechanism CA [21][22][23] was added to effectively improve the extraction of metallurgical saw blade features. The lightweight convolution module GSConv was used to reduce the model size and accelerate inference, making the model lightweight. The SIoU regression loss function was introduced to accelerate fitting of the data [24,25]. The application of the improved lightweight detection model was also studied.

YOLOv5s Target Detection Models
YOLOv5 is a classical algorithm for single-stage target detection. The YOLOv5s architecture comprises the input, backbone network, neck network, and prediction head [26]. The backbone is responsible for feature extraction. The neck is responsible for feature fusion. The head contains three detection heads, which are responsible for outputting detection information. The YOLOv5s network is the network with the smallest depth and the smallest width of the feature map in the YOLOv5 series. The YOLOv5 network structure is shown in Figure 1 below.

SC-YOLOv5 Improved Model
To address low recognition accuracy and a large number of parameters, this study improves the YOLOv5 model, raising recognition accuracy while reducing the number of parameters and accelerating inference. The coordinate attention mechanism was introduced into ShuffleNetV2 (a lightweight network) to construct the SC network, which serves as the lightweight backbone of the YOLOv5 model. The lightweight convolution module GSConv replaced the ordinary convolution modules to make the model lighter, and SIoU was introduced to redefine the regression loss. The improved SC-YOLOv5 network structure based on YOLOv5 is shown in Figure 2. It is divided into four parts. Input receives and preprocesses the dataset. Backbone extracts features from the input metallurgical saw blade images, using the SC network as the new backbone to improve the model's extraction of key information. Neck fuses the acquired feature maps, with the lightweight convolution module replacing the ordinary convolution module. Head performs regression prediction, using the SIoU function to calculate the regression loss and improve the convergence of the model.

ShuffleNetV2 Architecture
The most important part of the ShuffleNetV2 network structure is the basic residual unit (block), which comes in two structures, each with two branches. As shown in Figure 3, the first structure performs a channel split at the input, dividing the input feature map into two branches [27]: the main branch contains three convolution operations, the secondary branch performs no operation, and the input and output channels of each branch remain the same. The second structure feeds the feature map into two branches, with three convolution operations in the main branch and one depthwise convolution plus one pointwise convolution in the secondary branch. At the output, the residual unit concatenates the feature maps of the two branches and then applies a channel shuffle to the merged feature map. The channel shuffle rearranges channels from different groups into a new feature map so that the next group convolution can fuse input features from different groups, improving the information flow between channel groups and ensuring that input and output channels are correlated. See Table 1.
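To make the split, transform, concatenate, and shuffle data flow concrete, the following is a minimal NumPy sketch of the stride-1 basic unit (the first structure). It rests on simplifying assumptions that are not from the paper: a single image in (C, H, W) layout, batch normalization omitted, ReLU after the two 1×1 convolutions, and arbitrary weights. All function names are hypothetical and for illustration only.

```python
import numpy as np

def channel_shuffle(x, groups):
    # x: (C, H, W). View channels as (groups, C/groups), transpose, flatten,
    # so channels from different groups end up interleaved.
    c, h, w = x.shape
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

def conv1x1(x, w):
    # Pointwise convolution: w has shape (C_out, C_in).
    return np.einsum("oc,chw->ohw", w, x)

def depthwise3x3(x, k):
    # Depthwise 3x3 convolution, zero padding 1; k has shape (C, 3, 3).
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c, h, w))
    for i in range(3):
        for j in range(3):
            out += k[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out

def shufflenet_v2_unit(x, w_a, k_dw, w_b):
    # Stride-1 unit: split channels, keep the left half unchanged,
    # transform the right half, concatenate, then shuffle with 2 groups.
    c = x.shape[0]
    left, right = x[: c // 2], x[c // 2 :]
    right = np.maximum(conv1x1(right, w_a), 0.0)  # 1x1 conv + ReLU
    right = depthwise3x3(right, k_dw)             # 3x3 depthwise conv
    right = np.maximum(conv1x1(right, w_b), 0.0)  # 1x1 conv + ReLU
    return channel_shuffle(np.concatenate([left, right], axis=0), groups=2)
```

Because the shuffle uses two groups, the untouched left-branch channels land at even channel positions and the transformed right-branch channels at odd positions, so the next unit's channel split receives a mix of both.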

Coordinate Attention Mechanisms
Most previous attention mechanisms for lightweight networks used SE modules [28,29], which only consider inter-channel information and ignore positional information. Later, CBAM modules [30] tried to extract positional attention by applying convolution after reducing the number of channels, but convolution captures only local relations and cannot model long-range dependencies. Therefore, coordinate attention (CA), an efficient attention mechanism, is adopted here. CA encodes horizontal and vertical position information into channel attention, allowing mobile networks to attend to large spatial regions without incurring much computational cost. Its structure is shown in Figure 4, where C is the number of channels, W is the width, H is the height, and r is the channel reduction ratio.
The main advantages of the coordinate attention module are as follows:
1. It captures not only inter-channel information but also direction-aware positional information, which helps the model better locate and identify targets.
2. It is flexible and lightweight enough to be easily inserted into the core building blocks of mobile networks.
3. When included in a pre-trained model, it benefits downstream tasks such as detection and segmentation, both of which show clear performance improvements.
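The coordinate attention computation described above can be sketched in NumPy as follows. This is a simplified illustration, not the authors' code: it assumes a single (C, H, W) feature map, replaces the batch normalization plus non-linearity of the original design with a plain ReLU, and uses weight matrices as 1×1 convolutions; `w1`, `w_h`, and `w_w` are hypothetical parameter names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, w1, w_h, w_w):
    """x: (C, H, W); w1: (C//r, C); w_h, w_w: (C, C//r)."""
    c, h, w = x.shape
    z_h = x.mean(axis=2)                    # (C, H): average over width, one value per row
    z_w = x.mean(axis=1)                    # (C, W): average over height, one value per column
    f = np.concatenate([z_h, z_w], axis=1)  # (C, H + W): both directions side by side
    f = np.maximum(w1 @ f, 0.0)             # shared 1x1 conv reduces C to C//r, then ReLU
    f_h, f_w = f[:, :h], f[:, h:]           # split back into height and width parts
    a_h = sigmoid(w_h @ f_h)                # (C, H): attention weights along height
    a_w = sigmoid(w_w @ f_w)                # (C, W): attention weights along width
    # Each position (i, j) is reweighted by its row weight and its column weight.
    return x * a_h[:, :, None] * a_w[:, None, :]
```

Since the pooled descriptors keep one axis intact, the attention map can single out a specific row and column, which is how CA retains positional information that global pooling in SE discards.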

Redefined SIoU Loss Functions
YOLOv5s uses the CIoU loss function [31,32], which extends the DIoU loss by adding the aspect ratio of the predicted and ground-truth bounding boxes, so that the loss pays more attention to box shape. However, the CIoU loss is relatively complex to compute, which may add considerable computational overhead during training. To address this, the SIoU loss function is introduced into the improved network model; it considers not only the overlapping area, center distance, width, and height, but also the angle between the predicted box and the ground-truth box. The SIoU loss function consists of four cost functions: angle cost, distance cost, shape cost, and IoU cost.
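The SIoU loss function is as follows:

$$L_{box} = 1 - IoU + \frac{\Delta + \Omega}{2} \quad (1)$$

The overall loss function is as follows:

$$L = W_{box} L_{box} + W_{cls} L_{cls} \quad (2)$$

In the formulas above, IoU denotes the conventional IoU regression term, Δ denotes the distance cost (into which the angle cost enters), Ω denotes the shape cost, W_box denotes the weight of the box loss, L_box denotes the regression loss, W_cls denotes the weight of the classification loss, and L_cls denotes the focal loss.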
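For readers who want to see how the four costs combine into L_box, here is a minimal plain-Python sketch for a single box pair. It follows the published SIoU formulation as commonly implemented; boxes in (x1, y1, x2, y2) corner format, the shape exponent θ = 4, and the small `eps` guard are assumptions, and a training implementation would be vectorized (for example in PyTorch) rather than written per box.

```python
import math

def siou_loss(pred, gt, theta=4.0, eps=1e-9):
    """L_box = 1 - IoU + (Delta + Omega) / 2 for boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph, gw, gh = px2 - px1, py2 - py1, gx2 - gx1, gy2 - gy1

    # IoU cost
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # Angle cost: 0 when the centres are aligned with an axis, 1 at 45 degrees
    dx = (gx1 + gx2) / 2 - (px1 + px2) / 2
    dy = (gy1 + gy2) / 2 - (py1 + py2) / 2
    sigma = math.hypot(dx, dy) + eps
    sin_alpha = abs(dy) / sigma
    if sin_alpha > math.sqrt(2) / 2:  # use the smaller angle to the nearest axis
        sin_alpha = abs(dx) / sigma
    lam = math.cos(2 * math.asin(sin_alpha) - math.pi / 2)

    # Distance cost, normalised by the smallest enclosing box; the angle enters via gamma
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    gamma = 2 - lam
    delta = (2 - math.exp(-gamma * (dx / (cw + eps)) ** 2)
               - math.exp(-gamma * (dy / (ch + eps)) ** 2))

    # Shape cost
    omega_w = abs(pw - gw) / (max(pw, gw) + eps)
    omega_h = abs(ph - gh) / (max(ph, gh) + eps)
    omega = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (delta + omega) / 2
```

A perfectly matching box gives a loss of (almost exactly) zero, while two disjoint boxes give a loss above 1 because the distance cost adds to the full IoU penalty. The overall loss in Eq. (2) is then a weighted sum of this value and the classification loss.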

Lightweight Convolutional GSConv
In lightweight model design, networks built entirely from depthwise separable convolution (DSC) reduce the number of parameters and separate channel and spatial processing, but DSC does not effectively use the feature information of different channels at the same spatial location. To make the output of DSC as close as possible to that of standard convolution, GSConv was introduced. As shown in Figure 5, it uses a shuffle operation to mix the information generated by standard convolution (a dense convolution operation) into every part of the information generated by DSC, so that the standard-convolution information is fully blended into the DSC output. GSConv first downsamples the input with a standard convolution, then applies a depthwise convolution to that result; the outputs of the standard and depthwise convolutions are concatenated, and finally a shuffle operation is performed.
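The GSConv data flow can be sketched in NumPy as below. This is an illustrative approximation, not the reference implementation: the dense branch is a stride-1 1×1 convolution (so no downsampling happens here), the depthwise kernel is 3×3, normalization and activations are omitted, and a two-group ShuffleNet-style channel shuffle is assumed for the final mixing step. All names are hypothetical.

```python
import numpy as np

def pointwise_conv(x, w):
    # Dense 1x1 convolution: every output channel mixes all input channels.
    # x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)
    return np.einsum("oc,chw->ohw", w, x)

def depthwise_conv3x3(x, k):
    # Depthwise 3x3 convolution with zero padding: each channel has its own kernel.
    # x: (C, H, W), k: (C, 3, 3) -> (C, H, W)
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c, h, w))
    for i in range(3):
        for j in range(3):
            out += k[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out

def channel_shuffle(x, groups):
    # Interleave channels across `groups`.
    c, h, w = x.shape
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

def gsconv(x, w_dense, k_dw):
    """Sketch of GSConv with 2 * w_dense.shape[0] output channels."""
    y_sc = pointwise_conv(x, w_dense)      # standard (dense) convolution branch
    y_dw = depthwise_conv3x3(y_sc, k_dw)   # cheap depthwise convolution on that output
    y = np.concatenate([y_sc, y_dw], axis=0)
    return channel_shuffle(y, groups=2)    # blend dense channels into the depthwise ones
```

Only half of the output channels come from the dense convolution, which is where the parameter savings come from; the shuffle then places each dense channel next to its depthwise counterpart.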

Collection of Datasets
The dataset was collected mainly with industrial cameras at a metallurgical saw blade factory in Tangshan. To increase the diversity of the metallurgical saw blade images and improve the generalization ability of the recognition model, natural images taken from different angles and in different environments were selected during collection. The original images were expanded with data augmentation methods such as random noise, Gaussian blur, random cropping, random rotation, and random translation, yielding images of metallurgical saw blades with three kinds of defects: cracks, pitting corrosion, and scarring. Some of the metallurgical saw blade defect images are shown in Table 2.
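Three of the augmentations listed above can be sketched in NumPy as follows. The parameter values, the reflect padding, and the box-clipping rule in the crop are illustrative assumptions rather than the paper's settings; rotation and translation are omitted for brevity. Note that geometric augmentations must transform the defect boxes along with the pixels, which is why the crop returns updated boxes.

```python
import numpy as np

def add_gaussian_noise(img, sigma, rng):
    # img: float image in [0, 1], shape (H, W) or (H, W, C)
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def gaussian_blur(img, sigma, radius=2):
    # Separable Gaussian blur with reflect padding; output has the same shape as img.
    xs = np.arange(-radius, radius + 1)
    k = np.exp(-xs**2 / (2 * sigma**2))
    k /= k.sum()
    pad = [(radius, radius), (radius, radius)] + [(0, 0)] * (img.ndim - 2)
    out = np.pad(img, pad, mode="reflect")
    for axis in (0, 1):  # filter along rows, then along columns
        out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), axis, out)
    return out

def random_crop(img, boxes, crop_h, crop_w, rng):
    # boxes: (N, 4) array of [xmin, ymin, xmax, ymax] in pixels.
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    crop = img[top:top + crop_h, left:left + crop_w]
    # Shift boxes into crop coordinates and clip them to the crop.
    shifted = boxes - np.array([left, top, left, top])
    clipped = np.clip(shifted, 0, [crop_w, crop_h, crop_w, crop_h])
    keep = (clipped[:, 2] > clipped[:, 0]) & (clipped[:, 3] > clipped[:, 1])
    return crop, clipped[keep]  # drop boxes that fell entirely outside
```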

Photos Description
Collected images of some metallurgical saw blade defects, numbered 000001-000004.
Collected images of some metallurgical saw blade defects, numbered 000005-000008.
Collected images of some metallurgical saw blade defects, numbered 000009-000012.
Collected images of some metallurgical saw blade defects, numbered 000013-000016.
Collected images of some metallurgical saw blade defects, numbered 000017-000020.

Dataset Processing
The collected dataset needed to be preprocessed, including data labeling, label conversion, and data storage. 1. Use the labelImg software (https://gitcode.net/mirrors/tzutalin/labelimg?utm_source=csdn_github_accelerator, accessed on 8 August 2023) to annotate the dataset: draw a bounding box around each defect in the image and save the annotations as an XML file, as shown in Table 3.

Photos Description
The picture on the left shows the crack defects of metallurgical saw blades, which are easily caused by the large temperature difference during the quenching process.
The picture on the left shows a metallurgical saw blade scarring defect, which is easily caused by poor surface cleaning when cutting steel.
The figure on the left represents the pitting corrosion defects of metallurgical saw blades. When there are localized damages on the surface of metallurgical saw blades, such as small holes, microcracks, and scratches, corrosive media and oxides are easily gathered and thus form pitting corrosion.
2. The YOLO algorithm requires annotations that give the center coordinates of the target box in the sample image and the width and height of the box, so the XML annotation files must be converted. The specific steps are as follows: • Calculate the YOLO-format annotation data: let (x, y) be the center point of the target box on the sample image, and let w and h be the width and height of the box. From the known <xmin>, <xmax>, <ymin>, and <ymax> values, normalized by the image width W_img and height H_img as the YOLO label format requires, they are computed as follows:

$$x = \frac{x_{min} + x_{max}}{2W_{img}},\quad y = \frac{y_{min} + y_{max}}{2H_{img}},\quad w = \frac{x_{max} - x_{min}}{W_{img}},\quad h = \frac{y_{max} - y_{min}}{H_{img}} \quad (3)$$
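The conversion step can be sketched with Python's standard `xml.etree.ElementTree`, assuming the Pascal VOC layout that labelImg writes (`size/width`, `size/height`, and one `object` with `name` and `bndbox` per defect). The class list and its order below are hypothetical; the real mapping depends on how the dataset was configured.

```python
import xml.etree.ElementTree as ET

# Hypothetical class order; index = YOLO class id.
CLASS_NAMES = ["crack", "pitting", "scar"]

def voc_to_yolo(xml_text, class_names=CLASS_NAMES):
    """Convert one labelImg (Pascal VOC) XML annotation into YOLO label lines."""
    root = ET.fromstring(xml_text)
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls_id = class_names.index(obj.find("name").text)
        bb = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (float(bb.find(t).text) for t in ("xmin", "ymin", "xmax", "ymax"))
        x = (xmin + xmax) / 2 / img_w  # normalized center x
        y = (ymin + ymax) / 2 / img_h  # normalized center y
        w = (xmax - xmin) / img_w      # normalized box width
        h = (ymax - ymin) / img_h      # normalized box height
        lines.append(f"{cls_id} {x:.6f} {y:.6f} {w:.6f} {h:.6f}")
    return lines
```

Each returned line would be written to a `.txt` file with the same base name as the image, one line per defect.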

Experimental Environment
This experiment was carried out on the Windows 11 operating system with an NVIDIA GeForce GTX 105 graphics card. The model was built, trained, and validated with the PyTorch framework. The parameters of the YOLO model were initialized, and during training the network used adaptive anchor boxes, with initial anchor sizes set to [10,13,16,23,30,33], [30,45,59,61,62,119], and [90,116,156,198,326,373]. For network training, the learning rate was set to 0.01 and the number of training epochs to 200. The test platform is shown in Table 4.

Model Performance Evaluation
To evaluate the improved SC-YOLOv5 network model accurately, this experiment used the mean average precision at an IoU threshold of 0.5 (mAP@0.5), the confusion matrix, the PR curve, and the loss curves as evaluation indicators. The PR curve reflects the relationship between precision and recall, which trade off against each other: if the classifier predicts only high-probability samples as positive, many positive samples with relatively low but still sufficient probability will be missed, reducing recall. The PR curves of the improved model give the mAP@0.5 of each of the three defects: 0.866 for cracks, 0.897 for pitting corrosion, and 0.893 for scarring, for an average mAP@0.5 of 0.885, as shown in Figure 6; the model therefore has relatively high accuracy and good performance. The confusion matrix summarizes the prediction results by defect type: counts of correct and incorrect predictions are broken down by class, showing where the classification model is confused. Figure 7 shows that the predicted values are close to the actual values, with positive sample ratios of 0.89, 0.91, and 0.92 for cracks, pitting, and scars, respectively. The mAP is one of the metrics used to evaluate the detection performance of the improved model; it combines precision and recall and accounts for performance at different confidence levels. Specifically, mAP is obtained by averaging the average precision (AP) over all defect categories, where AP is the area under the precision-recall curve.
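The relationship between AP and mAP described above can be sketched in NumPy. This version uses all-point interpolation (a monotone precision envelope, then the area of the resulting step curve); evaluation toolkits differ in interpolation details, for example 101-point versus all-point, so values computed this way may differ slightly from a given tool's output.

```python
import numpy as np

def average_precision(recall, precision):
    """Area under one class's PR curve (all-point interpolation).

    recall, precision: arrays ordered by decreasing detection confidence,
    so recall is non-decreasing.
    """
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([1.0], precision, [0.0]))
    # Precision envelope: each point takes the best precision at any higher recall.
    p = np.flip(np.maximum.accumulate(np.flip(p)))
    # Sum rectangle areas wherever recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

def mean_average_precision(ap_per_class):
    # mAP is the plain mean of the per-class APs.
    return float(np.mean(ap_per_class))
```

Averaging the three per-class values reported above, `mean_average_precision([0.866, 0.897, 0.893])` gives about 0.8853, consistent with the reported 0.885.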
The precision and recall are shown below, where TP denotes data predicted to be defective and actually defective, FP denotes data predicted to be defective but actually not defective, TN denotes data predicted not to be defective and actually not defective, and FN denotes data predicted not to be defective but actually defective.

$$\text{Precision} = \frac{TP}{TP + FP} \quad (6)$$

$$\text{Recall} = \frac{TP}{TP + FN} \quad (7)$$

Precision indicates how accurately the model predicts positive cases; the higher it is, the fewer negative cases the model misclassifies as positive. Recall reflects the model's ability to find the positive samples; the higher it is, the lower the risk of missed detections (predicting a positive case as negative). As shown in Table 5, smaller loss values indicate more accurate detection. As expected, all six metric curves converged well after 200 epochs of training.
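Equations (6) and (7) translate directly into code. The sketch below also computes the F1 score (the harmonic mean of precision and recall), which is used later in the GSConv comparison; the zero-division guards are an implementation choice, not something specified in the paper. TN does not appear in either formula, so it is not needed here.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision (Eq. 6), recall (Eq. 7), and F1 from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, 8 correct detections with 2 false alarms and 2 missed defects give precision, recall, and F1 all equal to 0.8.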

Photos Description
The box_loss plot shows the error between the predicted box and the annotated box; the smaller it is, the more accurate the localization. The error in the plot is below 0.04, so the defects are localized accurately.
The obj_loss plot shows the objectness (confidence) loss of the network; the smaller it is, the more accurately the network judges whether a target is present. The values in the plot are all below 0.04, indicating a strong ability to identify defective targets.
The cls_loss plot shows whether each anchor box is assigned the correct class with respect to its annotation; the smaller it is, the more accurate the classification. The values in the plot are all below 0.01, so the defects are classified accurately.
The mAP plots show the mAP over different IoU thresholds (from 0.5 to 0.95 in steps of 0.05); higher mAP values indicate higher accuracy. In the plots, mAP@0.5 is above 0.8 and mAP@0.9 is above 0.6, indicating relatively good accuracy.
The precision plot shows the proportion of truly defective samples among all samples predicted as defective. The recall plot shows the proportion of truly defective samples that are correctly predicted as defective. Both precision and recall in the figure exceed 0.8, indicating good results.

GSConv Improves the Performance of the Model
To verify that the lightweight convolution module GSConv benefits the YOLOv5 network structure and to obtain a better-performing SC-YOLOv5 network, comparative experiments were carried out. First, the original YOLOv5 network was tested with the lightweight convolution module GSConv and with the ordinary convolution module Conv. The experiments show that using GSConv in the original YOLOv5 network reduces the model parameters by 6.4%, increases the F1 value by 0.03, and increases the mAP by 0.04; the network is therefore lighter and its accuracy is effectively improved, as shown in Figure 8.

Network Model Performance Comparison Experiment
To evaluate the performance of the proposed model objectively, SC-YOLOv5 was tested on the self-made metallurgical saw blade defect dataset and compared with the SSD, YOLOv3-tiny, and YOLOv5 models; the comparison results are shown in Table 6. The mAP@0.5 of the SC-YOLOv5 model is 88.5%, and the model size is 5.78 MB. Compared with the SSD, YOLOv3-tiny, and YOLOv5 models, the mAP@0.5 of SC-YOLOv5 is higher by 25.6, 0.2, and 2.1 percentage points, respectively, and the model size is smaller by 93.7%, 65.1%, and 79.2%, respectively. Thus, compared with the SSD, YOLOv3-tiny, and YOLOv5 models, SC-YOLOv5 has the best overall performance on metallurgical saw blade defects and the lowest model complexity, which favors deploying the improved metallurgical saw blade defect detection model on low-power equipment.

Improvement of SC Network Structure and GSConv on Model Performance
The SC network structure replaces the backbone of the original YOLOv5, SIoU is used to redefine the loss function, and the lightweight convolution module GSConv replaces the ordinary convolution modules, improving detection and further reducing the model parameters. For the ablation experiments, the CIoU loss function is first replaced by SIoU, and the SE, CBAM, and CA attention mechanisms are each added at the same position, the end layer of the YOLOv5 backbone network. The convolution modules are then replaced with the lightweight module to complete the final improvement of the model, as shown in Table 7. To evaluate the performance of the improved SC-YOLOv5 model effectively, other models were selected for comparison; the mAP training curves of the improved model and the other models are shown in Figure 9. Experiments show that after the ShuffleNetV2 network replaces the original YOLOv5 backbone, the number of model parameters drops greatly and the network becomes much lighter, as shown in Table 6. Compared with Model 6, SC-YOLOv5 (Model 7) has 2.5% fewer parameters and an mAP higher by 0.018, improving overall lightness. Compared with Model 5, the parameters are reduced by 5.9% and the mAP is increased by 0.037. Compared with Model 4, the parameters are reduced by 5.8% and the mAP is increased by 0.052. Compared with Model 1, the parameters are reduced by 56.36%, which greatly lightens the network and significantly improves long-term transmission efficiency in industrial production, while the mAP is slightly improved.
Figure 9 shows that, after 200 epochs of full training, the improved Model 7 converges better and achieves higher average precision than the other models. To verify the effectiveness of the improvements, test images containing multiple mixed defects were randomly selected from the test set and tested with SC-YOLOv5; the results are shown in Figure 10. The figure shows that, for metallurgical saw blades with various defects, the improved model effectively identifies the defect types and locates the targets.