Steel Surface Defect Detection Algorithm Based on Improved YOLOv8n

Abstract: Traditional methods for detecting steel surface defects suffer from weak feature extraction ability, slow detection speed, and subpar detection performance. In this paper, a YOLOv8-based DDI-YOLO model is proposed for effective steel surface defect detection. First, in the Backbone network, the dilation-wise residual module (DWR) is fused with the C2f module to obtain C2f_DWR, and a two-step approach is used to efficiently extract multiscale contextual information, after which the feature maps formed from the multiscale receptive fields are fused to enhance the feature extraction capacity. On this basis, a dilated reparam block (DRB) is added to the C2f_DWR structure to compensate for C2f's inability to capture small-scale pattern defects during training, thereby improving the training fluency of the model. Finally, the Inner-IoU loss function is employed to enhance the regression accuracy and training speed of the model. The experimental results show that, compared with the original YOLOv8n, DDI-YOLO on the NEU-DET dataset improves the mAP by 2.4%, the accuracy by 3.3%, and the FPS by 59 frames/s. Therefore, the model proposed in this paper achieves superior mAP, accuracy, and FPS in identifying surface defects in steel.


Introduction
Nowadays, new industries are being vigorously developed and industry is being vigorously promoted, with the information sector being a very important asset. In steel production processes, the detection of surface defects is a critical step, as it bears on the quality of industrial products and on production safety. Steel is extensively employed in many fields; in recent years these have included the rapidly developing aerospace and aviation industry, the oil industry, the automobile industry, the shipbuilding industry, the large-scale equipment industry, and other important industrial areas. In these areas, steel surface defects have an immediate impact on the quality of the manufactured goods and of the finished products. If the quality of the finished product is poor, the economy suffers. Therefore, it is urgent to improve the accuracy of steel surface defect detection.
Current surface defect detection methods fall into one of three categories: deep-learning-based, traditional-technology-based, and machine vision methods. These include manual inspection using magnetic particles, eddy currents, vision, and visible light. Artificial visual inspection consists of checking the appearance or performance defects of the product by visual or tactile means; its shortcomings are very obvious, e.g., excessive subjectivity, low efficiency, and susceptibility to external factors. The eddy-current-based inspection method is a non-destructive surface defect detection method: common defects on the steel surface change the distribution of eddy currents and the characteristics of the magnetic field, so that defects can be identified and localized by detecting changes in the magnetic field. [...] a small-sample steel plate defect detection algorithm based on lightweight YOLOv8 was proposed to bypass the difficulty of applying deep learning to the detection of small-sample defects. The LMSRNet network was designed to replace the YOLOv8 backbone, and the CBFPN and ECSA modules were developed to keep the model lightweight. Guo et al. [18] proposed an algorithm based on the improved MSFT-YOLO model, integrating the TRANS module into the network and applying data enhancement to increase the accuracy of steel defect detection. Cui Kebin et al. [19] proposed MCB-FAH-YOLOv8 to address problems in defect detection such as false detections and a high leakage rate by adding an improved convolutional attention mechanism module (CBAM) and a replaceable four-head ASFF predictor, enhancing the network's ability to detect tiny objects and dense targets and improving accuracy at the expense of speed. The above methods are all YOLO-based approaches to steel surface defect detection.
Zhou Yan et al. [20] proposed a method for detecting steel defects using multiscale lightweight attention, applying a channel attention module to multilevel feature maps to reconstruct their channel-related information and improve the detection effect. Zhu et al. [21] used an efficient Swin Transformer to detect and classify steel surface defects, an approach which strengthened the connection between the feature mapping channels, reduced the resolution, and solved the image information retention problem. He [22] suggested that an MFN network can effectively improve the acquisition of information about steel surface defects, and its features are able to integrate information from various levels to detect such defects. A baseline convolutional neural network (CNN) is employed to produce feature maps for subsequent stages. The proposed network (MFN) fuses multiple features into a single feature and, based on these multilevel features, a region proposal network (RPN) is employed to produce regions of interest (ROIs); finally, the baseline network ResNet-34 is used to achieve 74.8% mAP. The above methods have been designed and improved based on transformers and convolutional neural networks, respectively. Although all of them improve the performance of the target detection algorithm to some degree, they still fall short in terms of accuracy and other aspects. To summarize, we propose the steel surface defect detection algorithm DDI-YOLO, and the contributions made in this paper are the following: (1) Resolution of the issue that the original C2f module of YOLOv8 has insufficient ability to extract defective features on the steel surface and cannot extract multiscale contextual information. The dilation-wise residual module is added to the original C2f to further enhance the network's capacity to extract multiscale contextual information.
(2) Since C2f in YOLOv8 cannot detect small-scale patterns during training, the dilated reparam block module is used to enable the detection of defects by C2f on small-scale patterns, thus enhancing the training ability of the model.
(3) The C2f_DWR module and the C2f_DRB module are combined to form the C2f_DWR_DRB module, which combines the benefits of each module to improve the model's overall performance.
(4) Faster and more accurate convergence is achieved by using the Inner-IoU loss function in place of YOLOv8's original CIoU loss function.

Model Architecture of YOLOv8
YOLOv8 is a model built by Ultralytics on the success of previous generations of YOLO, with upgrades and new features that further enhance performance and flexibility. YOLOv8 introduces innovations such as a new backbone network, an anchor-free detection head, and a new loss function, and supports multi-platform operation from CPUs to GPUs. According to model depth and feature map size, YOLOv8 is divided into five versions: YOLOv8-n, YOLOv8-s, YOLOv8-m, YOLOv8-l, and YOLOv8-x. The network structure of YOLOv8 can be separated into four major parts: input, backbone, neck, and head. The input stage is in charge of scaling the supplied image to the required training size in a certain proportion, and includes operations such as scaling and adjusting the image's color tone. The backbone is the module used to extract the main information; it is made up of a convolution module, C2f (CSPLayer_2Conv), a module which replaces the C3 module used in YOLOv5, and the SPPF spatial pyramid pooling module. The neck module improves the merging of features of different dimensions through a structure containing an FPN feature pyramid and a path aggregation structure called dual-stream FPN, which has the advantages of high efficiency and speed. The head part is similar to that found in YOLOv6 [23] and YOLOX [24]: a decoupled head is used, whereas a coupled head was used in the earlier YOLOv3 [25], YOLOv4 [26], and YOLOv5 [27]. YOLOv8 also uses three output branches, each subdivided into two parts used to classify and to regress the bounding box, respectively. Considering the model's detection capabilities, this experiment used YOLOv8n as the baseline model and improved on it.
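The decoupled head described above can be sketched in PyTorch as follows. This is a minimal illustration of the classification/regression split, not YOLOv8n's exact head; the channel widths and the regression channel count are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Illustrative decoupled detection head: separate branches for
    classification and box regression, in the style of YOLOv8/YOLOX.
    Channel widths here are assumptions, not YOLOv8n's exact values."""
    def __init__(self, in_ch: int, num_classes: int, reg_ch: int = 64):
        super().__init__()
        self.cls_branch = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_ch, num_classes, 1),   # per-cell class scores
        )
        self.reg_branch = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_ch, reg_ch, 1),        # per-cell box regression outputs
        )

    def forward(self, x):
        return self.cls_branch(x), self.reg_branch(x)

head = DecoupledHead(in_ch=64, num_classes=6)   # six NEU-DET defect classes
cls_out, reg_out = head(torch.randn(1, 64, 20, 20))
print(cls_out.shape, reg_out.shape)  # torch.Size([1, 6, 20, 20]) torch.Size([1, 64, 20, 20])
```

In a full model, one such head is attached to each of the three output branches at its corresponding feature map resolution.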

C2f-DWR Module
The network structure of YOLOv8 contains a large number of C2f modules (full name: "Cross Stage Partial Feature Fusion"), which improve the performance and efficiency of the model by partially fusing features at different stages of the network. The C2f module performs feature fusion at different stages to fully utilize the network's multilayer feature representation. This feature fusion method helps increase the model's detection accuracy and cut down on the number of parameters, while also enhancing the model's computational effectiveness. However, steel surface defects vary widely in shape, location, and size, especially cracking-type defects, rolled-in oxide scale defects, and pitting surface defects: they have a large distribution range and non-uniform size and shape, and the original C2f module of YOLOv8 has insufficient ability to extract these defective features and cannot extract multiscale contextual information. Therefore, to enhance the network's capacity for extracting multiscale contextual information, this paper designs a brand-new module, the C2f-DWR module. The C2f-DWR structure adds the DWR module to the original C2f module, which strengthens the extraction of features from scalable receptive fields at the higher levels of the network.
The DWR module, known in full as the dilation-wise residual module [28], is designed in a residual manner. With the residual, a two-step approach is used to efficiently extract multiscale contextual information and then fuse the feature maps derived from the multiscale receptive fields. Specifically, the earlier method of acquiring multiscale contextual information in a single step is decomposed into a two-step method to reduce the acquisition difficulty.
The network structure of the DWR module, which comprises two branches, is shown in Figure 3. The first branch generates the associated residual features from the input features of the steel surface defects, referred to as region residuals. In this branch, a series of concise feature maps in the form of regions of different sizes are generated as material for the subsequent morphological filtering. This step is achieved by a common 3 × 3 convolution paired with a BN layer and a ReLU. The 3 × 3 convolution is employed for initial feature extraction, and the ReLU activation function, used instead of the commonly seen PReLU layer, has a significant impact on the activation of the region features and on their conciseness.
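The two-step DWR design (region-residual generation followed by multi-rate dilated depthwise filtering and fusion) can be sketched in PyTorch as below. This is a minimal sketch of the idea only: the channel counts and dilation rates are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class DWRSketch(nn.Module):
    """Illustrative dilation-wise residual block (a sketch, not the paper's exact layers).
    Step 1: region residual via 3x3 conv + BN + ReLU.
    Step 2: parallel depthwise convs with different dilation rates, fused by a 1x1 conv."""
    def __init__(self, ch: int, dilations=(1, 3, 5)):
        super().__init__()
        self.region = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
        )
        # one depthwise branch per dilation rate: one receptive field per branch
        self.branches = nn.ModuleList([
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d, groups=ch, bias=False)
            for d in dilations
        ])
        self.fuse = nn.Sequential(
            nn.Conv2d(ch * len(dilations), ch, 1, bias=False),
            nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        r = self.region(x)                                  # step 1: region residuals
        multi = torch.cat([b(r) for b in self.branches], 1) # step 2: multi-rate filtering
        return x + self.fuse(multi)                         # residual connection

y = DWRSketch(32)(torch.randn(2, 32, 40, 40))
print(y.shape)  # torch.Size([2, 32, 40, 40])
```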

The second branch performs morphological filtering of the region features of the steel surface defects using multi-rate dilated depthwise convolution, which realizes the semantic filtering of steel defects. For every channel feature, just one intended receptive field is applied in order to avoid, as far as possible, redundant receptive fields. In practical steel surface defect detection, the desired concise region feature map can be judiciously chosen in the first step based on the extent of the second step's receptive field, for fast matching with that receptive field.

C2f-DRB Module
The DRB module, known in full as the dilated reparam block [29], utilizes a compact kernel and multiple dilated layers with varying dilation rates to enhance a larger kernel convolutional layer. Its key hyperparameters include the size (K) of the larger kernel, the sizes (k) of the parallel convolutional layers, and the dilation rates (r). Figure 4 depicts the scenario with four parallel layers, given by K = 9, r = (1, 2, 3, 4), and k = (5, 5, 3, 3). For greater K, we can employ more parallel layers with bigger kernel sizes or dilation rates. The kernel sizes and dilation rates of the parallel branches are flexible, the sole restriction being (k − 1)r + 1 ≤ K.
To obtain a single large-kernel convolution layer for inference from the dilated reparam block, each BN is first merged into the preceding convolution layer, each layer with a dilation rate r > 1 is converted into an equivalent non-dilated layer, and all resulting kernels are summed with appropriate zero padding. In this way, the non-dilated large-kernel layer is enhanced by the dilated small-kernel convolution layers of the dilated reparam block.
From a parametric point of view, such a dilated layer is equivalent to a non-dilated convolutional layer with a larger sparse kernel, thus allowing the entire block to be converted into a single large-kernel convolution. The use of a parallel small-kernel convolution in conjunction with the large-kernel convolution is recommended, as the former facilitates the training process by capturing small-scale patterns. Their outputs are summed after two corresponding batch normalization (BN) layers. After training, the BN layers are merged into the convolution layers using a structural reparameterization method, so that the large-kernel convolution and the small-kernel convolutions can be equivalently combined for inference. In addition to small-scale patterns, augmenting the capacity of the big kernel to detect sparse patterns (i.e., pixels on the feature map that might have a stronger correlation with certain distant pixels than with their neighbors) can produce higher-quality features. The need to capture such patterns is perfectly addressed by the mechanism of dilated convolution: from the viewpoint of a sliding window, a dilated convolution layer with dilation rate r scans the input channel to identify spatial patterns in which each pixel of interest is r − 1 pixels away from its neighboring pixels. As a result, dilated convolutional layers are employed in parallel with the larger kernel and their outputs are summed. Since C2f in YOLOv8 lacks the ability to capture small-scale patterns during training, the dilated reparam block module is used to compensate for this shortcoming, thus enhancing the model's training capability.
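The equivalence that underlies this reparameterization — a convolution with dilation r being, parameter-wise, a non-dilated convolution with a larger sparse kernel — can be checked numerically. The helper below is our own illustration of that conversion, not the authors' code.

```python
import torch
import torch.nn.functional as F

def dilated_to_dense(weight: torch.Tensor, r: int) -> torch.Tensor:
    """Convert the kernel of a conv with dilation r into an equivalent
    non-dilated kernel of size (k - 1) * r + 1: the original taps are
    spread out with zeros inserted between them."""
    out_c, in_c, k, _ = weight.shape
    big = (k - 1) * r + 1
    dense = weight.new_zeros(out_c, in_c, big, big)
    dense[:, :, ::r, ::r] = weight   # place each tap r pixels apart
    return dense

x = torch.randn(1, 4, 32, 32)
w = torch.randn(8, 4, 3, 3)
# 3x3 kernel with dilation 2 -> effective receptive field 5x5
y_dilated = F.conv2d(x, w, padding=2, dilation=2)
# same operation with the equivalent dense 5x5 kernel, no dilation
y_dense = F.conv2d(x, dilated_to_dense(w, 2), padding=2)
assert torch.allclose(y_dilated, y_dense, atol=1e-5)
print(y_dilated.shape)  # torch.Size([1, 8, 32, 32])
```

Summing several such converted kernels (zero-padded to the size of the large kernel) yields the single large-kernel convolution used at inference time.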

Inner-IoU Loss Functions
The loss function used for regression in YOLOv8 is CIoU [30]. The CIoU loss function considers the entire intersection between target frames and adds a correction factor to more precisely quantify the similarity between target frames. The CIoU loss function has the following advantages: it is more robust with respect to target frames of different shapes, it more easily captures the exact shape of the target, and it takes into account several factors such as position, shape, and direction, all of which help enhance the model's performance in complex situations. However, in practical steel surface defect detection it cannot be adapted to different detection assignments and its capacity for generalization is weak; considering these shortcomings, it is not well suited to the detection of steel surface defects. For this reason, the Inner-IoU regression loss function was introduced in this study. The Inner-IoU loss function was proposed by Zhang Hao et al. [31] in 2023. This loss function calculates the IoU loss using an auxiliary bounding box, and a scale factor ratio is introduced to regulate the scale of the auxiliary bounding box used to determine the loss. To compensate for the shortcomings of the CIoU loss function, Inner-IoU uses the auxiliary bounding box to compute the loss and accelerate the bounding box regression process. The scale factor ratio regulates the scale of the auxiliary bounding box and overcomes the weak generalization ability of existing methods. The width and height of the GT frame are denoted by w_gt and h_gt, respectively, and w and h are the width and height of the anchor frame. The variable "ratio" is the scale factor, which usually takes values in the range [0.5, 1.5]. With inter and union denoting the intersection and union areas of the scaled auxiliary GT and anchor boxes, the Inner-IoU is calculated as:

IoU_inner = inter / union (7)
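For completeness, the auxiliary-box construction from the Inner-IoU paper [31] can be written out as follows (reproduced from that work, not derived here; (x_c^{gt}, y_c^{gt}) and (x_c, y_c) denote the centers of the GT and anchor boxes):

```latex
b_l^{gt} = x_c^{gt} - \frac{w^{gt} \cdot ratio}{2}, \quad
b_r^{gt} = x_c^{gt} + \frac{w^{gt} \cdot ratio}{2}, \quad
b_t^{gt} = y_c^{gt} - \frac{h^{gt} \cdot ratio}{2}, \quad
b_b^{gt} = y_c^{gt} + \frac{h^{gt} \cdot ratio}{2}

b_l = x_c - \frac{w \cdot ratio}{2}, \quad
b_r = x_c + \frac{w \cdot ratio}{2}, \quad
b_t = y_c - \frac{h \cdot ratio}{2}, \quad
b_b = y_c + \frac{h \cdot ratio}{2}

inter = \bigl(\min(b_r^{gt}, b_r) - \max(b_l^{gt}, b_l)\bigr) \cdot
        \bigl(\min(b_b^{gt}, b_b) - \max(b_t^{gt}, b_t)\bigr)

union = w^{gt} h^{gt} \cdot ratio^2 + w h \cdot ratio^2 - inter

\mathrm{IoU}^{inner} = \frac{inter}{union}
```

With ratio < 1 the auxiliary boxes are smaller than the actual boxes, which accelerates convergence for high-IoU samples; with ratio > 1 the larger auxiliary boxes aid regression of low-IoU samples.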

Image Dataset
The NEU-DET [32] dataset, a steel surface defect dataset from Northeastern University, was utilized in this study for training and validation. This dataset contains 1800 images in total, covering six categories of defects with 300 example images per category: rolled-in scale (RS), patches (Pa), crazing (Cr), scratches (Sc), pitting surfaces (Ps), and inclusions (In), as shown in the corresponding figure.

Experimental Environment
The operating system used for the experiments in this paper was Windows 11, the CPU was a 13th Gen Intel(R) Core(TM) i5-13500HX, the GPU was an NVIDIA GeForce RTX 4060, and the RAM was 16 GB. The deep learning framework used was PyTorch 1.13.1. The specific parameters of the experiment were as follows: the learning rate was 0.01, the image size was 640 × 640, the number of iteration rounds (epochs) was 200, and the optimizer chosen was SGD.
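For reproducibility, these hyperparameters correspond to an Ultralytics CLI invocation of roughly the following form; the dataset YAML path is a placeholder assumption, not a file shipped with the paper:

```shell
# Train YOLOv8n with the settings listed above.
# "neu-det.yaml" is a hypothetical dataset config; point it at your local NEU-DET split.
yolo detect train model=yolov8n.pt data=neu-det.yaml \
    epochs=200 imgsz=640 lr0=0.01 optimizer=SGD
```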

Experimental Metrics
The data evaluation metrics used in this study were the precision-recall (P-R) curve, the number of parameters, the average precision (AP) of each defect category, the mean average precision (mAP), and the frames per second (FPS), with the AP determined by the precision (P) and the recall (R). The P-R curve is the curve formed in the coordinate system of test precision and recall, and the area enclosed by the curve is the AP. The related calculation formulas are:

P = TP / (TP + FP)

where TP is the number of predicted positive samples that are actually positive, i.e., positive samples correctly identified, and FP is the number of predicted positive samples that are actually negative, i.e., negative samples misreported.

R = TP / (TP + FN)

where FN is the number of predicted negative samples that are actually positive, i.e., positive samples missed.

AP = ∫₀¹ P(R) dR

where AP is the accuracy of each detection type, i.e., the area under its P-R curve.

mAP = (1/n) Σᵢ APᵢ

where mAP denotes the mean of the AP values over all categories and n is the number of categories.
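As a quick numerical illustration of these formulas (the counts and AP values below are toy numbers, not the paper's results):

```python
def precision(tp: int, fp: int) -> float:
    # fraction of predicted positives that are truly positive
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    # fraction of actual positives that were found
    return tp / (tp + fn)

def mean_ap(ap_per_class: list) -> float:
    # mAP: average of the per-class AP values
    return sum(ap_per_class) / len(ap_per_class)

# toy numbers for a single defect class
p = precision(tp=80, fp=20)   # 0.8
r = recall(tp=80, fn=40)      # ~0.667
# six hypothetical per-class AP values, one per NEU-DET category
m = mean_ap([0.90, 0.70, 0.80, 0.75, 0.85, 0.70])
print(p, r, round(m, 4))
```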

Ablation Experiments
To demonstrate the enhancement effect of each of our improvements to YOLOv8, we conducted five sets of ablation experiments. First, the DWR module was added to the C2f structures in the sixth and eighth layers of the backbone network; then, the DRB module was added to the C2f structures outside the second, fourth, sixth, and eighth layers, as well as to the neck and head layers. In addition, a combination of the DRB and DWR modules was added to the C2f structures in the sixth and eighth layers of the backbone network and, lastly, the Inner-IoU loss function took the place of the original loss function. Each of the above modules was added to the YOLOv8 model accordingly, and we report the experimental results of the various combinations of the improved modules. Table 1 shows the outcomes of our ablation studies. As can be seen from Table 1, after replacing C2f in the backbone network with C2f_DWR, except for a slight decrease in recall and FPS, the remaining metrics, such as mAP and precision, improved, with the mAP experiencing a notable gain of 2.5 percentage points; this was attributed to the excellent ability of DWR to extract multiscale contextual information and then fuse the feature maps formed from the multiscale receptive fields, accompanied by a decrease in the number of parameters and in computational effort. After substituting the C2f structure in the main network with the C2f_DRB structure, the DRB reparameterization module used a non-dilated small kernel and multiple dilated small-kernel layers to augment a non-dilated large-kernel convolutional layer, an approach which greatly reduced the number of parameters and significantly increased mAP and FPS, raising the FPS to 104. This was because DRB compensated for the inability of C2f to capture small-scale patterns during training, thus improving the model's training and detection capabilities. Then, after substituting the loss function of YOLOv8 with the Inner-IoU loss function alone, while not affecting the size of the model parameters, there was a notable increase in accuracy, mAP, and FPS, with a 5.4% increase in accuracy, a 1.5% increase in mAP, and a 9% increase in FPS. Therefore, our improvement not only optimized the speed of the model but also enhanced the model's defect detection, average accuracy, and defect detection speed, indicating that our improvement is effective. The fourth set of experiments combined the DWR module and the DRB module to form the new C2f_DWR_DRB structure; from Table 1, we can see that, except for the recall rate, which declined, all other indexes improved greatly, with the precision rate, mAP, and FPS improving by 2.1%, 4.6%, and 27%, respectively. The most obvious improvements were in the mAP and FPS indexes, while the amount of computation and the number of parameters both decreased, showing that the C2f_DWR_DRB structure combines the advantages of the two modules. Finally, our DDI-YOLO experienced a decrease in recall, but all other metrics remained optimal. In summary, the improvement method of the model put forth in this research works well.

Comparative Experiments
To confirm that our suggested DDI-YOLO algorithm is effective, in this paper we used the original YOLOv8n as the baseline and tested YOLOv8n and DDI-YOLO on the NEU-DET dataset. Table 2 provides a summary of the findings. The indicator "↑1.8" means that, for the defect type In, the mAP of DDI-YOLO was 1.8% greater than that of the baseline model YOLOv8n and the other models. As shown in Table 2, accuracy improved for four of the defect types; only the mAP values for Cr defects and Ps defects experienced a slight decrease. In addition, the AP values of In defects and Rs defects for the original YOLOv8n were only 84.7% and 74.6%, respectively, and our model improved the mAP values of these two defect types by 1.8% and 10.6%, respectively. Our introduction of the C2f_DWR_DRB module in place of the C2f module enhanced the extraction of features from the scalable receptive fields at the higher levels of the network. In addition, our method improved the accuracy on Cr defects from 35.1% to 49.7% and on Rs defects from 52.1% to 67.4%, increases of 14.6% and 15.3%, respectively. Our approach thus offers a notable enhancement of the detection accuracy and precision for these defects, demonstrating its efficacy. To further examine the detection capability of our proposed DDI-YOLO and the baseline YOLOv8n, the P-R curves of the two methods are shown in Figure 6a,b. The region enclosed by our proposed DDI-YOLO is greater than the region enclosed by YOLOv8. The overall mAP on the NEU-DET dataset was 78.3%, a significant improvement of 2.4 percentage points over YOLOv8n.
To confirm the efficacy of the network improvement proposed in this work, the improved YOLOv8n algorithmic model was compared experimentally with other algorithmic models. In this paper, the classical SSD, YOLOv3-tiny, YOLOv5n, YOLOv6, YOLOv7-tiny [33], YOLOv8n (baseline), and the newer RT-DETR [34] were selected for comparative studies. These experiments were carried out using an identical environment, dataset, and equipment; the outcomes of the comparison of the various algorithms are shown in the corresponding graphs.
Table 3 shows that, compared to several other algorithms, this paper's algorithm achieved the best performance in terms of mAP and FPS (78.3% and 158 frames/s), and was second only to YOLOv8n and YOLOv5n in terms of recall, while performing better than YOLOv8n with respect to all other metrics. This demonstrates that our improved model is superior to the other algorithmic models in terms of detection accuracy and detection speed. Regarding the number of model parameters, compared to the other algorithms it was only a little higher than that of YOLOv5, and much smaller than those of most other algorithmic models. Figure 7 displays the detection outcomes of the different algorithms; it can be observed from the figure that the other algorithmic models produced wrong detections, missed detections, etc., while this paper's proposed algorithmic model had the best detection effect. To sum up, the improved algorithm presented in this paper not only performed better in terms of measurement precision, accuracy, and detection speed, but also reduced the amount of arithmetic involved, which in turn improved detection efficiency. Therefore, the improved algorithm proposed in this work has a rather high generalization value and practicality.

Conclusions
In response to the need to detect steel surface defects in actual production, this paper proposes the defect detection algorithm DDI-YOLO. First, the backbone of YOLOv8 was improved with the proposed C2f_DWR_DRB module structure, which strengthens the backbone's feature extraction. Second, in the neck structure, the C2f_DRB module was proposed to compensate for the inability of the neck's C2f module to capture small-scale pattern defects during training, improving the model's training ability. Finally, the Inner-IoU loss function replaced the CIoU loss function of the original YOLOv8n model, making the model more accurate and faster to train. From the experimental results, the following conclusions can be drawn: compared with YOLOv8n, DDI-YOLO improves the mAP by 2.4%, the accuracy by 3.3%, and the FPS by 59 frames/s, which satisfies more practical needs. In summary, the proposed model meets the requirements of industrial detection, offers advantages in practical industrial applications, and can be applied to a wide range of real-world inspection scenarios.

2.2. Based on the Improved YOLOv8 Algorithm (DDI-YOLO)

2.2.1. DDI-YOLO

As shown in Figure 1, in the improved YOLOv8n model the C2f modules in layers 6 and 7 of the backbone are replaced with the C2f_DWR_DRB module, which incorporates the advantages of both the C2f_DWR and C2f_DRB modules and improves the model's overall feature extraction. Then, in the neck layer, the C2f modules in the twelfth, fifteenth, and twenty-first layers are replaced with the C2f_DRB module, which compensates for C2f's inability to detect small-scale pattern defects and enhances the model's training ability. Finally, the Inner-IoU loss function is used instead of YOLOv8n's CIoU loss function to make model training faster and more accurate. Figure 2 shows the whole process of steel surface defect detection by the DDI-YOLO model.
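The layer substitutions described above can be sketched as a simple mapping from layer index to module type. The module names below (C2f, C2f_DWR_DRB, C2f_DRB) are plain string placeholders standing in for the real implementations, so only the index-to-module mapping is illustrated, not the modules themselves.

```python
# Baseline YOLOv8n layout (only the layers touched by DDI-YOLO are shown).
baseline = {6: "C2f", 7: "C2f", 12: "C2f", 15: "C2f", 21: "C2f"}

def apply_ddi_yolo_substitutions(layers):
    """Return a copy of the layer map with the DDI-YOLO replacements:
    C2f_DWR_DRB in backbone layers 6-7, C2f_DRB in neck layers 12/15/21."""
    improved = dict(layers)
    for idx in (6, 7):        # backbone: multiscale extraction + reparameterization
        improved[idx] = "C2f_DWR_DRB"
    for idx in (12, 15, 21):  # neck: small-scale pattern capture
        improved[idx] = "C2f_DRB"
    return improved

print(apply_ddi_yolo_substitutions(baseline))
```

In an actual implementation these indices would address entries in the model's layer configuration rather than a plain dictionary.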

Figure 2. The overall process of DDI-YOLO model steel defect detection.

The Inner-IoU loss function was proposed by Zhang et al. in 2023. This loss function calculates the IoU loss using auxiliary bounding boxes, and a scale factor ratio is introduced to regulate the size of the auxiliary bounding boxes used to compute the loss. To compensate for the shortcomings of the CIoU loss function, Inner-IoU uses the auxiliary bounding boxes to compute the loss and accelerate bounding-box regression, while the scale factor ratio overcomes the weak generalization ability of existing methods. The ground truth (GT) box and the anchor are denoted $B^{gt}$ and $B$, respectively; the point $(x_c^{gt}, y_c^{gt})$ denotes the center of the GT box and of the inner GT box, and $(x_c, y_c)$ denotes the center of the anchor and of the inner anchor. The Inner-IoU is calculated as follows:

$$b_l^{gt} = x_c^{gt} - \frac{w^{gt} \cdot ratio}{2}, \quad b_r^{gt} = x_c^{gt} + \frac{w^{gt} \cdot ratio}{2}$$
$$b_t^{gt} = y_c^{gt} - \frac{h^{gt} \cdot ratio}{2}, \quad b_b^{gt} = y_c^{gt} + \frac{h^{gt} \cdot ratio}{2}$$
$$b_l = x_c - \frac{w \cdot ratio}{2}, \quad b_r = x_c + \frac{w \cdot ratio}{2}$$
$$b_t = y_c - \frac{h \cdot ratio}{2}, \quad b_b = y_c + \frac{h \cdot ratio}{2}$$
$$inter = \left(\min(b_r^{gt}, b_r) - \max(b_l^{gt}, b_l)\right) \times \left(\min(b_b^{gt}, b_b) - \max(b_t^{gt}, b_t)\right)$$
$$union = w^{gt} h^{gt} (ratio)^2 + w h (ratio)^2 - inter$$
$$IoU^{inner} = \frac{inter}{union}$$
$$L_{Inner\text{-}IoU} = L_{IoU} + IoU - IoU^{inner}$$
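The Inner-IoU computation between a ground-truth box and an anchor can be sketched in plain Python. This is a minimal illustration of the auxiliary-box formulas, not the authors' implementation; boxes are assumed to be in center format (x_c, y_c, w, h).

```python
def inner_iou(box_gt, box, ratio=0.7):
    """Compute Inner-IoU between a ground-truth box and an anchor box.

    Boxes are (x_c, y_c, w, h) center-format tuples. `ratio` scales the
    auxiliary (inner) boxes: ratio < 1 shrinks them, ratio > 1 enlarges them.
    """
    (xg, yg, wg, hg), (x, y, w, h) = box_gt, box
    # Edges of the auxiliary boxes, scaled by `ratio` around each center.
    gl, gr = xg - wg * ratio / 2, xg + wg * ratio / 2
    gtp, gb = yg - hg * ratio / 2, yg + hg * ratio / 2
    bl, br = x - w * ratio / 2, x + w * ratio / 2
    bt, bb = y - h * ratio / 2, y + h * ratio / 2
    # Intersection of the two auxiliary boxes (clamped at zero for disjoint boxes).
    inter = (max(0.0, min(gr, br) - max(gl, bl))
             * max(0.0, min(gb, bb) - max(gtp, bt)))
    union = wg * hg * ratio**2 + w * h * ratio**2 - inter
    return inter / union

# Identical boxes give an Inner-IoU of 1 regardless of the ratio chosen.
print(inner_iou((5, 5, 4, 4), (5, 5, 4, 4)))
```

Since both auxiliary boxes shrink or grow by the same factor, Inner-IoU of identical boxes stays 1, while for partially overlapping boxes a smaller ratio yields a stricter overlap measure that speeds up regression on high-IoU samples.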

The dataset images are 200 × 200 pixels, and the 1800 images in total were randomly divided in an 8:1:1 ratio to create the NEU-DET training, test, and validation sets, i.e., 1440 training samples, 180 test samples, and 180 validation samples.
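The 8:1:1 split described above can be reproduced with a short helper. The function below is a hypothetical sketch (not the authors' preprocessing code) that shuffles image indices with a fixed seed and cuts them into train/test/validation subsets.

```python
import random

def split_indices(n_images, ratios=(8, 1, 1), seed=0):
    """Randomly split image indices into train/test/validation sets
    in the given ratio, mirroring the 8:1:1 NEU-DET split."""
    idx = list(range(n_images))
    random.Random(seed).shuffle(idx)  # fixed seed for a reproducible split
    total = sum(ratios)
    n_train = n_images * ratios[0] // total
    n_test = n_images * ratios[1] // total
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    val = idx[n_train + n_test:]
    return train, test, val

train, test, val = split_indices(1800)
print(len(train), len(test), len(val))  # 1440 180 180
```

In practice the indices would map to image filenames, and the same split would be written to disk once so that all compared models see identical subsets.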

Figure 5. Schematic diagram of six defect samples.


Figure 6. Comparison of different loss functions.

Figure 7. Comparison of the experimental results.


Table 2. The detection performance of DDI-YOLO on the NEU-DET dataset.