1. Introduction
Deformation damage is inevitable in the actual production and manufacturing of steel, causing defects such as crazing, pitted surfaces, and inclusions [1]. These defects significantly degrade the quality of steel products, leading to safety concerns and economic losses for companies. During steel production, it is therefore necessary to find and eliminate product defects in time, so as to improve product safety and avoid economic losses [2]. The traditional approach is manual visual inspection [3], which suffers from low detection efficiency and limited accuracy; the limitations of the inspectors themselves can also lead to misdetections and missed detections.
With the advancements in deep learning and computer vision, deep learning-based defect detection methods are increasingly being applied to various detection tasks. Deep learning methods have great advantages in detection speed and accuracy. Researchers have started employing these methods to solve different defect detection tasks in production.
Currently, deep learning-based detection methods are divided into one-stage and two-stage methods. Among one-stage methods, Deng et al. [4] introduced the LFD-YOLO model, designed to detect surface defects on engine turbine blades. To detect printed circuit board (PCB) defects, Jiang et al. [5] introduced dilated convolution and coordinate attention into the Single-Shot MultiBox Detector (SSD). Li et al. [6] embedded an Efficient Channel Attention (ECA) mechanism into the YOLOX model to realize wood surface defect detection. Xing et al. [7] constructed convolution layers at different scales in the backbone network to detect defects on railway train wheels. Xiang et al. [8] used the HookNet model to detect fabric defects. Chen et al. [9] designed a lightweight network, YOLOv8-FSD, for detecting surface defects on photovoltaic cells. Gao et al. [10] introduced LGR-Net, a method for detecting defects in elevator guide rail clamps, which employs a small-object detection layer to build a multi-scale feature fusion network and thereby attains greater detection accuracy. Among two-stage methods, Chen et al. [11] proposed the MANet network model, based on MobileNet, for road defect detection. Xiao et al. [12] proposed a model for tiny-target detection in which a context enhancement module (CEM) strengthens target feature information and a feature purification module (FPM) eliminates conflicting feature information. Tan et al. [13] introduced an image processing method integrated into their proposed segmentation algorithm, enabling the automatic identification of tunnel water leakage. Cui et al. [14] proposed CAB-Net, which uses dilated convolution to enhance target context information and improve the ability to detect tiny targets. Urbonas et al. [15] used Faster R-CNN to identify defects on the surface of wood panels.
Although deep learning techniques offer advantages in various defect detection tasks, they still face significant challenges and limitations in practical industrial applications. Firstly, the complexity of surface defects poses challenges to detection. Secondly, the accuracy, parameter count, and inference speed of defect detection models are also very important in practical applications [16].
In view of the issues present in existing detection methods, we propose a detection model named LCED-YOLO. The model aims to enhance detection performance while maintaining a lightweight structure. First, we designed a module to enhance the model’s feature extraction capability. Second, LDConv was incorporated into the network neck, which decreases the model’s parameters. Third, we developed a lightweight decoupled head for the detection task, further lowering the model’s parameters. Finally, a learnable factor is introduced to optimize the CIoU loss function, addressing the sample imbalance issue and strengthening detection performance. The main contributions are as follows:
1. A multi-scale enhancement module (MSE module) was designed in conjunction with the C3K2 module to create the C3K2-MSE module, which effectively enhances the feature information processing capability by accentuating edge information within the features.
2. A lightweight neck network and detection head were designed. In comparison to the original network, this approach reduces both the parameter count and FLOPs, while simultaneously enhancing detection accuracy.
3. By employing Focaler-CIoU as the model’s loss function, the introduction of learnable factors effectively mitigates the sample imbalance problem in detection tasks, thereby enhancing the model’s detection performance.
The rest of this article is organized as follows. Section 2 introduces the related research work. Section 3 introduces the proposed LCED-YOLO network model in detail. Section 4 presents the experimental evaluation. Finally, Section 5 summarizes the work presented in this paper.
4. Experiments
4.1. Datasets
To verify the effectiveness of the proposed LCED-YOLO network, we conduct experimental verification on the NEU-DET [45] and GC10-DET [46] datasets. The NEU-DET dataset includes six categories of steel surface defects: crazing (Crz), rolled-in scale (Rs), inclusions (In), patches (Pa), scratches (Sc), and pitted surface (Ps). This study partitions the dataset with an 8:1:1 ratio: 1440 images for training, 180 for validation, and 180 for testing. The GC10-DET dataset includes ten categories of steel surface defects: oil spots (Os), creases (Crs), water spots (Ws), crescent gaps (Cg), silk spots (Ss), inclusions (In), weld lines (Wl), punching (Pu), waist folding (Wf), and rolled pits (Rp). It is likewise partitioned with an 8:1:1 ratio, consisting of 1834 training images, 230 validation images, and 230 test images. The defects in these datasets are diverse, with random distributions and varied shapes and sizes, which greatly increases the difficulty of extracting their complex features.
4.2. Experimental Setup
The experiments were conducted under the Windows 10 operating system, with the model implemented in Python 3.8 using torch 1.12.1 as the deep learning framework. The system used an NVIDIA GeForce RTX 3060 GPU with 12 GB of memory and an Intel Core i5-13400F CPU, with CUDA version 11.6. All input images are resized to 640 × 640, the model is trained for 300 epochs with a batch size of 8, and the SGD optimizer is used. The initial learning rate is 0.01, the weight decay coefficient is 0.0005, and the momentum is 0.937. The Mosaic method is used for data augmentation and is disabled for the last 10 epochs. To ensure the fairness and comparability of the experiments, pre-trained weights are not used in any experiment in this paper; all comparison methods and our proposed model were trained from scratch under identical training conditions.
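The hyperparameters above can be collected into a configuration fragment like the one below. The Ultralytics-style argument names (`lr0`, `close_mosaic`, etc.) are an assumption about the training API; the paper does not state which training entry point it uses.

```python
# Hyperparameters as reported in Section 4.2 (argument names assumed,
# following the Ultralytics YOLO convention).
train_cfg = dict(
    imgsz=640,            # input resolution 640 x 640
    epochs=300,           # training cycles
    batch=8,
    optimizer="SGD",
    lr0=0.01,             # initial learning rate
    weight_decay=0.0005,  # weight attenuation coefficient
    momentum=0.937,
    close_mosaic=10,      # disable Mosaic augmentation for the last 10 epochs
    pretrained=False,     # all models trained from scratch
)

# Hypothetical usage (assumed API, not the paper's published code):
# from ultralytics import YOLO
# YOLO("yolo11n.yaml").train(data="neu-det.yaml", **train_cfg)
```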
4.3. Evaluation Basis
To evaluate the model’s performance and assess the effectiveness of the proposed method from different aspects, we select the average precision (AP), mean average precision (mAP), model parameters, computational complexity, and FPS as the evaluation indexes. These indexes are calculated as in Equations (17)–(20):
P = TP / (TP + FP)  (17)
R = TP / (TP + FN)  (18)
AP = ∫₀¹ P(R) dR  (19)
mAP = (1/N) Σᵢ₌₁ᴺ APᵢ  (20)
where P represents the precision, i.e., the proportion of correctly predicted positive samples among all predicted positive samples, and R represents the recall, i.e., the proportion of correctly predicted positive samples among all actual positive samples. TP represents the number of positive samples predicted correctly, FP represents the number of negative samples incorrectly predicted as positive, and FN represents the number of positive samples predicted as negative.
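Under these definitions, the metrics can be computed as in the following sketch. The all-point interpolation used in `average_precision` is one common convention for evaluating the area under the precision–recall curve; the paper does not state which interpolation it uses, so this is an assumption.

```python
def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP); R = TP / (TP + FN)."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    return p, r

def average_precision(recalls, precisions):
    """All-point interpolated AP: area under the precision-recall curve,
    with precision made monotonically non-increasing from right to left."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))

def mean_average_precision(aps):
    """mAP: the mean of the per-class AP values."""
    return sum(aps) / len(aps)
```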
4.4. Comparison Experiments
To validate the advantages of LCED-YOLO, it was compared against several popular algorithms, including SSD [17], YOLOv5s, YOLOv8n, YOLOv11n, Faster R-CNN [22], RT-DETR [47], and the advanced defect detection models WSS-YOLO [48] and RDD-YOLO [33]. Comparative experiments were performed on both the NEU-DET and GC10-DET datasets, further demonstrating the efficacy and advantages of our proposed approach. All reported results are averaged over multiple repeated runs to ensure fairness.
4.4.1. Comparisons with Other Methods on NEU-DET
According to Table 1 and Table 2, the LCED-YOLO model shows significant performance advantages over the other methods. Compared with YOLOv11n, LCED-YOLO improves mAP50 by 2.6%; although its FPS decreases slightly, the model achieves a 19.2% reduction in parameters and a 23.1% decrease in computational complexity, giving it better overall performance and meeting the lightweight requirements of industry. LCED-YOLO reaches 79.8% mAP50 and achieves the best results among all compared models on the three defect classes In, Ps, and Pa.
Compared with SSD, Faster R-CNN, and RT-DETR, LCED-YOLO improves mAP50 by 7.1%, 4.9%, and 5.8%, respectively. Compared with YOLOv5s and YOLOv8n, it improves mAP50 by 4.5% and 2.8%, respectively. Compared with RDD-YOLO, the proposed method improves mAP50 by 2.8% while consuming fewer resources, achieving higher detection accuracy. Although its mAP50 is 0.1% lower than that of WSS-YOLO, LCED-YOLO has clear advantages in parameter count and computational complexity: relative to WSS-YOLO, its parameters are reduced by 53.3% and its computational complexity by 47.9%, making it more suitable for lightweight industrial deployment tasks.
To validate the detection ability of LCED-YOLO, we conducted a visual comparison of detection results. In this experiment, LCED-YOLO was compared with advanced detection models including YOLOv5s, YOLOv8n, YOLOv11n, RDD-YOLO, and WSS-YOLO. As shown in Figure 9, six different types of defects were selected, and the prediction results were compared with the ground truth boxes. Compared with the other methods, LCED-YOLO shows excellent detection ability and completes the detection task efficiently and accurately.
4.4.2. Comparisons with Other Methods on GC10-DET
To further verify the effectiveness and generalization ability of LCED-YOLO, the model configuration is kept unchanged and verification is performed on the GC10-DET dataset. The specific results are shown in Table 3 and Table 4.
Compared with other advanced methods, LCED-YOLO shows the best performance, reaching 70.3% mAP50. SSD, Faster R-CNN, RT-DETR, and YOLOv5s perform relatively poorly on the GC10-DET dataset. Models such as YOLOv8n, YOLOv11n, RDD-YOLO, and WSS-YOLO achieve higher mAP50, but their model sizes are relatively large, making them less suitable for actual industrial environments. LCED-YOLO offers clear advantages: higher detection accuracy together with lower model parameters and computational complexity. These results validate the model’s enhanced performance in practical industrial defect detection applications, establishing a robust basis for quality assurance in manufacturing workflows.
4.5. Ablation Experiments
This section performs ablation experiments on the proposed enhanced approach using the NEU-DET dataset to validate its efficacy. Furthermore, experimental comparisons were made regarding the combination of multi-scale pooling kernels within the C3K2-MSE module and the choice of the model’s loss function. For clarity in presenting the ablation results, seven experimental ablation schemes are abbreviated as follows:
The Baseline using the C3K2-MSE module is called E-YOLO.
The Baseline using the LDConv module is called D-YOLO.
The Baseline using the lightweight detection head is called L-YOLO.
The Baseline using Focaler-CIoU is called C-YOLO.
The Baseline using C3K2-MSE and LDConv modules is called ED-YOLO.
The Baseline using C3K2-MSE, LDConv, and lightweight detection head modules is called LED-YOLO.
The Baseline using C3K2-MSE, LDConv, a lightweight detection head, and Focaler-CIoU is called LCED-YOLO.
According to Table 5, the experimental results confirm the effectiveness of the proposed method. The LCED-YOLO model improves mAP50 by 2.6% compared to the Baseline. E-YOLO uses the C3K2-MSE module and increases mAP50 by 2.1%, highlighting the effectiveness of C3K2-MSE in capturing edge details of defects. D-YOLO introduces the LDConv module; while detection accuracy declines marginally, the model achieves a reduction of roughly 7.7% in parameters and 3.1% in computational complexity. By utilizing the lightweight decoupled head, L-YOLO improves mAP50 from 77.2% to 79.0%, reduces parameters by 15.3%, and reduces FLOPs to 5.2 G; the lightweight decoupled head thus enhances model accuracy while simultaneously decreasing both parameters and computational complexity. Through the application of Focaler-CIoU as the bounding-box regression loss, C-YOLO improves its mAP50 by 1.1% while maintaining the same model parameters and computational load. This result indicates that Focaler-CIoU effectively strengthens the model’s ability to handle imbalanced data samples. ED-YOLO uses both the C3K2-MSE and LDConv modules to achieve 77.3% mAP50 while optimizing the model’s parameters. LDConv learns positional offsets at each location via linearization and substitutes for the downsampling operations in the neck network, reducing the model’s parameters and computational complexity. It exhibits a robust capability for targets with significant feature variations, offering flexibility in response to deformations and local changes. Nevertheless, for targets that are less sensitive to feature variations, LDConv’s capacity for capturing global feature information is comparatively limited, potentially compromising the model’s detection performance. LED-YOLO combines C3K2-MSE, LDConv, and the lightweight decoupled head to achieve 78.2% mAP50 and 158 FPS while optimizing the parameters and computational complexity of the model.
Finally, all the proposed methods are integrated into LCED-YOLO, reaching 79.8% mAP50. The total number of parameters in the model is 2.1 M, representing a 19.2% reduction from the Baseline; the computational complexity is 5.0 G FLOPs, a 23.1% decrease from the Baseline; and the model achieves an inference speed of 151 FPS. In summary, this experiment demonstrates the excellent detection performance of the LCED-YOLO model.
In the MSE module, experiments are necessary to determine the optimal combination of multi-scale pooling kernel sizes. To ensure a sufficient and reasonable number of feature channels from the output of each convolutional layer, we experimented with the three combinations [3], [3, 6], and [3, 6, 9, 12]. The combination with the most favorable impact on model performance was selected, and the results are presented in Table 6.
As shown in Table 6, the model performs best with the combination [3, 6, 9, 12]. Conversely, the combinations [3] and [3, 6] assign a higher number of channels per branch, which diminishes the capacity for capturing global feature information and thereby compromises the model’s detection performance; a larger channel count also increases the number of model parameters. Consequently, the [3, 6, 9, 12] combination is adopted in the MSE module.
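The effect of a pooling-kernel combination can be illustrated with a minimal stride-1, same-padded multi-scale max-pooling sketch. This is a stand-in under our own assumptions, not the MSE module’s actual design: the function names are hypothetical, and the module’s real branches also involve convolutions not shown here.

```python
import numpy as np

def max_pool_same(x, k):
    """Stride-1 max pooling over an HxW map, padded so output size equals input."""
    pad_l = (k - 1) // 2
    pad_r = k - 1 - pad_l
    xp = np.pad(x, ((pad_l, pad_r), (pad_l, pad_r)), constant_values=-np.inf)
    h, w = x.shape
    out = np.empty_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = xp[i:i + k, j:j + k].max()
    return out

def multi_scale_pool(x, kernels=(3, 6, 9, 12)):
    """Stack the input with pooled maps from several kernel sizes along a
    channel axis, SPP-style, so later layers see multi-scale context."""
    return np.stack([x] + [max_pool_same(x, k) for k in kernels], axis=0)
```

With the [3, 6, 9, 12] combination, a single-channel map becomes a 5-channel tensor, one channel per receptive-field size plus the identity branch.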
Different loss functions affect the model’s performance, so experiments are required to determine the best choice. This paper compares Focaler-CIoU with other mainstream loss functions while keeping all other conditions constant. From the results in Table 7, Focaler-CIoU reaches 79.8% mAP50, which is 1.7%, 1.0%, 0.5%, 0.7%, 1.2%, and 0.2% higher than the other loss functions, respectively. The introduction of Focaler-CIoU enables the model to achieve the best overall performance, indicating that it is the most suitable loss function for this model.
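A minimal sketch of a Focaler-CIoU loss follows, assuming the common formulation in which the IoU term is linearly re-mapped over an interval [d, u]; the paper’s exact d and u values are not given here, so the 0.0 and 0.95 defaults below are illustrative.

```python
import math

def ciou_loss(b1, b2):
    """CIoU loss for boxes (x1, y1, x2, y2). Returns (loss, iou)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    w1, h1 = b1[2] - b1[0], b1[3] - b1[1]
    w2, h2 = b2[2] - b2[0], b2[3] - b2[1]
    iou = inter / (w1 * h1 + w2 * h2 - inter)
    # squared center distance over squared enclosing-box diagonal
    rho2 = (((b1[0] + b1[2]) - (b2[0] + b2[2])) ** 2 +
            ((b1[1] + b1[3]) - (b2[1] + b2[3])) ** 2) / 4
    cw = max(b1[2], b2[2]) - min(b1[0], b2[0])
    ch = max(b1[3], b2[3]) - min(b1[1], b2[1])
    c2 = cw ** 2 + ch ** 2
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)
    return 1 - iou + rho2 / c2 + alpha * v, iou

def focaler_ciou_loss(b1, b2, d=0.0, u=0.95):
    """Focaler-CIoU: L = L_CIoU + IoU - IoU_focaler, where IoU_focaler is the
    IoU linearly re-mapped over [d, u] to re-weight easy/hard samples."""
    loss_ciou, iou = ciou_loss(b1, b2)
    iou_focaler = min(max((iou - d) / (u - d), 0.0), 1.0)
    return loss_ciou + iou - iou_focaler
```

For a perfectly matched box the loss is zero, while disjoint boxes are penalized by the distance and re-mapped IoU terms.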
5. Conclusions
This paper introduces the LCED-YOLO model for detecting defects on steel surfaces. Firstly, the C3K2-MSE module is designed to enhance the model’s capacity to extract edge information, which in turn improves its detection accuracy for complex objects. Secondly, LDConv is introduced to lighten the neck structure of the model, effectively reducing its parameters and computational complexity. Thirdly, a lightweight decoupled head is designed; through the grouping and refinement of feature information, the model’s detection performance is significantly improved while achieving a lightweight architecture. Finally, the CIoU loss is optimized through the introduction of learnable attention factors, improving the model’s adaptability to imbalanced samples and thereby boosting its overall performance. On NEU-DET, LCED-YOLO achieves 79.8% mAP50 with 2.1 M parameters, 5.0 G FLOPs, and 151 FPS; on GC10-DET, it reaches 70.3% mAP50 at 188 FPS. Compared with other excellent models, it demonstrates superior overall performance. In summary, the model performs strongly in detection accuracy, parameter count, and computational complexity, though opportunities for enhancement remain. Future research should focus on maintaining detection ability while further optimizing the network structure to achieve a more lightweight model that is easier to apply in actual industrial settings.