Article

Improved YOLOv8-n Algorithm for Steel Surface Defect Detection

Qingqing Xiang, Gang Wu, Zhiqiang Liu and Xudong Zeng
1 College of Elite Engineer, Changsha University of Science and Technology, Changsha 410114, China
2 College of Mechanical and Vehicle Engineering, Changsha University of Science and Technology, Changsha 410114, China
* Author to whom correspondence should be addressed.
Metals 2025, 15(8), 843; https://doi.org/10.3390/met15080843
Submission received: 3 July 2025 / Revised: 21 July 2025 / Accepted: 25 July 2025 / Published: 28 July 2025
(This article belongs to the Special Issue Nondestructive Testing Methods for Metallic Material)

Abstract

To address the limitations of existing steel surface defect detection algorithms in multi-scale feature processing and illumination sensitivity, we propose ADP-YOLOv8-n, which enhances accuracy and computational efficiency through advanced feature fusion and an optimized network architecture. Firstly, an adaptive weighted down-sampling (ADSConv) module is proposed, which improves detector adaptability to diverse defects via the weighted fusion of down-sampled feature maps. Next, the C2f_DWR module is proposed, integrating an optimized C2f architecture with a streamlined DWR design to enhance feature extraction efficiency while reducing computational complexity. Then, a Multi-Scale-Focus Diffusion Pyramid is designed to adaptively handle multi-scale object detection by dynamically adjusting feature fusion, thus reducing feature redundancy and information loss while maintaining a balance between detailed and global information. Experiments demonstrate that the proposed ADP-YOLOv8-n detection algorithm achieves superior performance, effectively balancing detection accuracy, inference speed, and model compactness.

1. Introduction

As a fundamental raw material in the iron and steel industry, the surface quality of hot-rolled strip exerts a significant impact on the physical and chemical properties of downstream and end-products [1]. Steel surface defect detection plays a crucial role in the manufacturing process [2]. Undetected defects can impair the normal functionality of products and even give rise to severe safety hazards.
Compared with the time-consuming and unreliable traditional manual observation detection method, the automatic defect detection technology based on machine vision shows good performance in object classification [3]. However, this technique relies on the manual definition of defect features, and low-dimensional representations based on artificial features exhibit poor generalization ability when dealing with variable steel surface defects. Furthermore, the extraction of numerous features can result in slow detection speed, thereby impeding the implementation of machine vision defect detection in real industrial production [4].
Deep learning methods based on multilayer neural networks have garnered significant attention in both academia and industry due to their ability to automatically extract complex high-dimensional features from images. The detector network structure can identify regions of interest (ROIs) and perform classification. YOLOv8, as the state-of-the-art model in the YOLO series for object detection, builds upon the foundational framework of deep learning. By optimizing the architecture and training methods of deep convolutional neural networks, YOLOv8 achieves efficient and accurate object detection. Wang et al. [5] proposed a real-time YOLO-v5-based steel surface defect detection network, which incorporated a multi-scale exploration module to effectively recognize multi-scale surface defect features and integrated a spatial attention module to enhance defect information focus. Kou et al. [6] proposed an end-to-end YOLO-V3-based defect detection model, which demonstrated notable advancements in detection accuracy and speed on target datasets. Zhao et al. [7] developed an enhanced Faster RCNN to address the challenge of detecting small and complex objects with deep learning methods. Through integrating a multi-scale feature fusion network and deformable convolutional layers to replace conventional convolutional layers, this approach effectively enhanced the detection accuracy for small-scale steel surface defects. Chen et al. [8] proposed the SEP-YOLOv7 model, which integrated an improved ECA mechanism and multi-filter convolution residual blocks to detect weld proximity defects, showing excellent performance in this specific task. Lv et al. [9] proposed a steel surface defect detection model based on YOLOv8, integrating MobileViTv2 and Cross-Local Connection (CLC) to enhance feature extraction and multi-scale fusion. The introduction of MobileViTv2 resulted in the model’s detection speed being slightly slower than the original YOLOv8, and its performance may have been affected by complex backgrounds. Ma et al. [10] proposed a lightweight steel surface defect detection algorithm using improved YOLOv8, integrating GhostNet, the MPCA attention mechanism, and the SIoU loss function. It achieved a balance between lightweight design and detection accuracy. However, its performance may have been affected by complex backgrounds and small defects. Ruan et al. [11] proposed the EPSC-YOLO model, which enhanced industrial surface defect detection accuracy and real-time performance by integrating multi-scale attention, pyramidal convolutions, Soft-NMS, and the CISBA module. However, its high computational complexity and need for extensive labeled data restricted its use on resource-limited devices.
In summary, YOLOv3, YOLOv5, and YOLOv7 have demonstrated superior performance in detecting small-scale, individual defects. However, they encounter significant challenges when confronted with multi-scale feature defects. YOLOv8, leveraging its sophisticated detection framework, has demonstrated the capacity to address multi-scale feature defects. However, this capability comes at the cost of heightened computational demands and extensive data labeling requirements, especially when dealing with minute defects and intricate backgrounds. These factors lead to a notable decline in detection speed.
To address these challenges, this paper proposes an improved YOLOv8 steel surface defect detection algorithm, named ADP-YOLOv8-n. The contributions of this paper are as follows:
  • Due to the wide variability in the shape and size of steel surface defects, this study proposes an adaptive weighted down-sampling (ADSConv) module. By dynamically adjusting the weighted combination of multi-scale feature maps, it enables the comprehensive capture of defect features and enhances the model’s adaptability to different types of defects.
  • As defects occupy a minimal proportion of steel surface images, their identification is challenging under complex lighting and backgrounds. In this paper, the C2f [12] module in the feature extraction network is improved, with the original Bottleneck module replaced by a simplified DWR [13] module. The optimized C2f_DWR enhances high-level feature extraction with variable receptive fields via depthwise separable convolutions with different dilation rates.
  • Because traditional feature fusion modules rely on fixed feature concatenation and convolution operations, multi-scale feature fusion lacks adaptive optimization for targets of different sizes. This study designs a Multi-Scale-Focus Diffusion Pyramid Network (MS-FDNet) to enable efficient multi-scale feature fusion.

2. Materials and Methods

YOLOv8 is an advanced target detection algorithm built upon the YOLOv5 architecture [14], incorporating significant optimizations and improvements. The algorithm comprises four core modules: the input terminal, backbone network, neck, and output terminal. The input terminal employs Mosaic data augmentation, adaptive image scaling, and grayscale filling to preprocess the image. The backbone network integrates Conv, C2f, and SPPF structures to extract image features through convolution and pooling operations. The neck is designed based on the PAN structure, fusing multi-scale feature maps through up-sampling, down-sampling, and feature concatenation. The output terminal adopts a Decoupled Head [15] structure to decouple the classification and regression processes, and employs the Task-Aligned Assigner [16] to weight classification and regression scores for positive sample matching. Relative to YOLOv5, YOLOv8 substitutes the C3 module with the C2f module for lightweight processing and deletes the convolution layer before the up-sampling layer, compressing the model size. It also leverages the DFL (Distribution Focal Loss) [10] concept to adjust the number of channels in the regression head. Moreover, YOLOv8 employs an anchor-free approach, representing objects using multiple key points or center points with boundaries, making it better suited for dense detection scenarios.

2.1. Improved Network Structure of the YOLOv8-n Algorithm

YOLOv8-n is selected as the benchmark model for improvement, with the improved network structure illustrated in Figure 1. The improvement primarily targets the feature extraction network (network layers 6, 7, 8, and 9), aiming to enhance defect information focus via the weighted fusion of feature maps (network layers 6 and 8) and expand the receptive field of backbone output features (network layers 7 and 9). This study also designs a Multi-Scale-Focus Diffusion Pyramid Network (MS-FDNet, marked by the red dotted frame in Figure 1) to enable efficient multi-scale feature fusion.

2.2. ADSConv Module Design of the YOLOv8-n Algorithm

In computer vision tasks, down-sampling is widely employed to reduce the spatial dimensions of feature maps, thereby decreasing computational complexity. However, traditional down-sampling approaches, such as max pooling or average pooling, often discard critical feature information. The ADSConv module therefore introduces an adaptive weighting mechanism inspired by the attention concept, preserving critical information during down-sampling.
Specifically, the ADSConv module first employs a weight generation branch to produce adaptive weights. This branch comprises an average pooling layer, a 1 × 1 convolution layer, and a SoftMax layer. For the input feature map $F$, the average pooling layer aggregates the spatial information of the feature map, and the convolution layer adjusts the channel dimension of the feature map. Then, the weights are normalized by the SoftMax layer to ensure that the sum of the weights equals 1. In this manner, the ADSConv module learns the importance of each feature and generates the corresponding adaptive weights $A_{\mathrm{norm}}$.
The specific calculation is as follows:
$$A_{\mathrm{norm}} = \mathrm{SoftMax}\left(\mathrm{Conv}\left(\mathrm{AvgPool2d}(F)\right)\right)$$
where $F$ denotes the input feature map and $A_{\mathrm{norm}}$ represents the generated adaptive weights.
Simultaneously, another branch of the ADSConv module performs down-sampling on the input feature map via a grouped convolution layer to obtain the down-sampled feature map $F_{\mathrm{ds}}$. This convolutional layer is designed to reduce the spatial dimensions of the feature map while maintaining feature richness through channel expansion.
Finally, the ADSConv module applies the adaptive weights to the down-sampled feature map. The output feature layer $F_{\mathrm{out}}$ is derived through weighted summation of the features at each position. The specific calculation is as follows:
$$F_{\mathrm{out}} = \sum\left(F_{\mathrm{ds}} \otimes A_{\mathrm{norm}},\ \mathrm{axis}=1\right)$$
where $F_{\mathrm{ds}}$ is the feature map generated by the down-sampling branch; $F_{\mathrm{out}}$ is the output feature map obtained by weighting with the adaptive weights; $\mathrm{axis}$ is the dimension along which the operation is performed; and $\otimes$ denotes element-wise multiplication.
In summary, the ADSConv module efficiently reduces feature map size through the integration of adaptive weights and down-sampling operations, while preserving critical feature information. The architectural details of the ADSConv module are illustrated in Figure 2.
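To make the two-branch layout concrete, the following is a minimal PyTorch sketch of the ADSConv idea. The global pooling granularity, the grouped-convolution settings, and the broadcast weighting are assumptions on our part; the text above specifies only the weight branch (AvgPool → 1 × 1 Conv → SoftMax) producing $A_{\mathrm{norm}}$, the strided grouped convolution producing $F_{\mathrm{ds}}$, and their element-wise combination.

```python
import torch
import torch.nn as nn

class ADSConv(nn.Module):
    """Sketch of adaptive weighted down-sampling (assumed configuration)."""

    def __init__(self, in_channels: int, out_channels: int, groups: int = 4):
        super().__init__()
        # Weight-generation branch: aggregate spatial information, adjust the
        # channel dimension, then normalize so the weights sum to 1 (A_norm).
        # Global average pooling is an assumption; the paper does not state
        # the pooling window.
        self.weight_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, out_channels, kernel_size=1),
            nn.Softmax(dim=1),
        )
        # Down-sampling branch: a strided grouped convolution halves the
        # spatial size while expanding the channel count (F_ds).
        self.down_branch = nn.Conv2d(
            in_channels, out_channels, kernel_size=3, stride=2,
            padding=1, groups=groups,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a_norm = self.weight_branch(x)  # adaptive weights, shape (B, C_out, 1, 1)
        f_ds = self.down_branch(x)      # down-sampled features, (B, C_out, H/2, W/2)
        return f_ds * a_norm            # element-wise weighting (the F_out expression above)

# Example: a 64-channel 80 x 80 map becomes 128 channels at 40 x 40.
x = torch.randn(1, 64, 80, 80)
print(ADSConv(64, 128)(x).shape)  # torch.Size([1, 128, 40, 40])
```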

2.3. C2f_DWR Module Designs of the YOLOv8-n Algorithm

Given the significant variations in defect scales and shapes, enhancing the network’s multi-scale feature extraction capability becomes particularly crucial. However, traditional receptive field expansion via down-sampling often results in a reduced image size and the loss of critical feature information. To address feature information loss from down-sampling, Zhao W. et al. [7] proposed using the RFB module [17] to enhance the receptive field of the output feature layer. However, this approach rendered the model bulky and more complex, escalating computational resource consumption and processing time. The C2f_DWR module proposed in this study is instead integrated directly into C2f by replacing the original Bottleneck module with the simplified DWR module. It not only enables compact feature extraction and multi-scale feature fusion, but also significantly reduces computational complexity and improves detection accuracy and robustness for steel surface defects, outperforming the approach of adding an RFB module after the C2f output. Unlike traditional methods that augment parameters through network depth, the C2f_DWR module employs depthwise separable convolutions with different dilation rates, expanding the receptive field and enhancing the network’s multi-scale feature extraction capability while preserving computational efficiency.
Considering that introducing DWR and the resulting multi-branch structure may increase model complexity, this study simplifies the original DWR module by replacing standard convolutions with depthwise separable convolutions and reducing the dilated convolution branches to two dilation rates (d = 1 and d = 3). Secondly, this work limits the number of C2f_DWR modules and performs a comparative test in the module analysis of Section 3.5.2 to identify the optimal number, thereby minimizing model complexity risks.
Specifically, the simplified DWR module employs a 3 × 3 depthwise separable convolution to halve the input feature map’s channel count, thereby reducing the computational load of subsequent operations. The module then utilizes two 3 × 3 depthwise separable convolutional layers with different dilation rates (d = 1, d = 3) to capture multi-scale feature information. Following concatenation of the two dilated convolutions’ outputs, feature fusion is conducted through a 1 × 1 conv layer, forming a residual connection with the original input feature map. The architectural details of the simplified DWR module are illustrated in Figure 3. The architectural details of the C2f_DWR module are illustrated in Figure 4.
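For illustration, the following is a minimal PyTorch sketch of the simplified DWR block under stated assumptions: the normalization, activation, and exact layer ordering are our choices, while the channel halving, the two dilation rates (d = 1, d = 3), the 1 × 1 fusion convolution, and the residual connection follow the description above.

```python
import torch
import torch.nn as nn

def dw_sep_conv(c_in: int, c_out: int, k: int = 3, dilation: int = 1) -> nn.Sequential:
    """Depthwise separable conv: depthwise k x k (optionally dilated) + pointwise 1 x 1."""
    pad = dilation * (k - 1) // 2  # keeps the spatial size unchanged
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, k, padding=pad, dilation=dilation, groups=c_in),
        nn.Conv2d(c_in, c_out, 1),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

class SimplifiedDWR(nn.Module):
    """Sketch of the simplified DWR block (assumed ordering and activations)."""

    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.reduce = dw_sep_conv(channels, half)             # halve the channel count
        self.branch_d1 = dw_sep_conv(half, half, dilation=1)  # local receptive field
        self.branch_d3 = dw_sep_conv(half, half, dilation=3)  # enlarged receptive field
        self.fuse = nn.Conv2d(2 * half, channels, 1)          # 1 x 1 fusion conv

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.reduce(x)
        multi_scale = torch.cat([self.branch_d1(h), self.branch_d3(h)], dim=1)
        return x + self.fuse(multi_scale)                     # residual connection

x = torch.randn(1, 128, 40, 40)
print(SimplifiedDWR(128)(x).shape)  # torch.Size([1, 128, 40, 40])
```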

2.4. Multi-Scale-Focus Diffusion Pyramid Network (MS-FDNet) Module

In the steel strip surface defect detection task, defect targets exhibit substantial variations in size and shape, while the rigid architecture of traditional feature fusion modules fails to flexibly adapt to diverse target types. Traditional feature fusion modules, with their fixed feature concatenation and convolution operations, lack adaptive optimization for multi-scale targets during feature fusion. For small-target detection, the detailed information in high-dimensional feature maps is of particular importance. However, such details may be diminished during feature concatenation, leading to inadequate small-target feature extraction. For large-target detection, retaining the global semantic information of low-dimensional features is essential. However, traditional feature fusion modules fail to dynamically adjust the weights and order of feature fusion, potentially overlooking critical features essential for large-target detection.
The Multi-Scale-Focus Diffusion Pyramid Network (MS-FDNet) is designed to achieve the efficient fusion of multi-scale features. By integrating the Focus Feature module and a pyramid cascade structure, MS-FDNet overcomes the limitations of the CCFM module in multi-scale target feature optimization.
The Focus Feature module employs multi-receptive-field convolution to capture small-target details and integrates multi-scale features via a dimensionality reduction convolution. The pyramid cascade structure enables the flexible fusion of high-dimensional and low-dimensional features through repeated up- and down-sampling and feature concatenation, ensuring the retention of small-target details and large-target global information. By dynamically adjusting feature fusion, MS-FDNet adapts to multi-scale target detection tasks, alleviating feature redundancy and information loss while balancing detailed and global information. Compared with the RT-DETR benchmark detection model, which uses the CCFM structure, detection performance is greatly improved after adopting the MS-FDNet structure. The Focus Feature module’s architecture is illustrated in Figure 5. The overall structure of MS-FDNet is shown in Figure 6.
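For intuition, the sketch below illustrates the multi-receptive-field idea behind the Focus Feature module. The depthwise kernel sizes (3/5/7) and the single 1 × 1 reduction convolution are illustrative assumptions; the text specifies only that multiple receptive fields capture small-target detail and that a dimensionality reduction convolution integrates the resulting multi-scale features.

```python
import torch
import torch.nn as nn

class FocusFeature(nn.Module):
    """Sketch of a multi-receptive-field focus block (assumed kernel sizes)."""

    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # Parallel depthwise convolutions with growing receptive fields.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )
        # 1 x 1 dimensionality reduction conv fuses the identity path and all
        # branch outputs back to `channels`.
        self.reduce = nn.Conv2d(channels * (len(kernel_sizes) + 1), channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x] + [branch(x) for branch in self.branches]
        return self.reduce(torch.cat(feats, dim=1))

x = torch.randn(1, 256, 20, 20)
print(FocusFeature(256)(x).shape)  # torch.Size([1, 256, 20, 20])
```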

3. Experimental Results and Analysis

3.1. Experimental Details, Datasets, and Evaluation Indicators

Experimental parameters were primarily configured following YOLOv8-n’s official recommendations, with Mosaic data augmentation disabled during the final 10 epochs. Stochastic Gradient Descent (SGD) was employed for network parameter optimization. Key parameters included an initial learning rate of 0.01, momentum of 0.937, weight decay of 0.0005, a batch size of 64, an input image size of 640 × 640 pixels, and 400 training epochs. Experiments were conducted on an NVIDIA GeForce RTX 4090 (24 GB) GPU and an Intel Core i7-13700KF CPU. The deep learning framework used was PyTorch 2.1.1, with Python 3.8 as the programming language.
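For reference, these hyperparameters map onto the Ultralytics YOLOv8 training interface roughly as follows. This is a sketch rather than the authors’ actual script: `neu-det.yaml` is a hypothetical dataset configuration name, and the ADSConv/C2f_DWR/MS-FDNet modifications would live in a custom model definition rather than the stock `yolov8n.yaml` used here.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.yaml")  # stock YOLOv8-n; the paper modifies this architecture
model.train(
    data="neu-det.yaml",   # hypothetical config for the six NEU-DET classes
    epochs=400,
    imgsz=640,
    batch=64,
    optimizer="SGD",
    lr0=0.01,              # initial learning rate
    momentum=0.937,
    weight_decay=0.0005,
    close_mosaic=10,       # disable Mosaic augmentation for the final 10 epochs
)
```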
The NEU-DET surface defect dataset constructed by Northeastern University was utilized in the experiment [18]. This dataset contains grayscale images of six typical hot-rolled strip surface defects: cracks (Cr), inclusions (In), patches (Pa), pitting surfaces (Ps), rolled oxide scales (Rs), and scratches (Sc). Each defect type includes 300 images, totaling 1800 images. A partially labeled sample image is presented in Figure 7. For the defect detection task, 70% of the images (210 per defect type) were randomly chosen for model training, with the remaining images used for testing. This data distribution strategy aims to ensure that the model is trained on diverse samples while preserving sufficient test images to verify its generalization capability and effectiveness.
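A minimal sketch of this 70/30 split (210 training and 90 test images per class) is shown below; the directory layout and class folder names are assumptions rather than the authors’ setup.

```python
import random
import shutil
from pathlib import Path

random.seed(0)  # fixed seed so the random split is reproducible
classes = ["crazing", "inclusion", "patches",
           "pitted_surface", "rolled-in_scale", "scratches"]
root = Path("NEU-DET/images")  # assumed source layout: one folder per class

for cls in classes:
    images = sorted((root / cls).glob("*.jpg"))
    random.shuffle(images)
    # 210 images per class for training, the remaining 90 for testing.
    for split, subset in (("train", images[:210]), ("test", images[210:])):
        dest = Path("NEU-DET") / split / cls
        dest.mkdir(parents=True, exist_ok=True)
        for img in subset:
            shutil.copy(img, dest / img.name)
```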
To detect defects with varying proportions and sizes, this study analyzes the proportional and size distributions of all defects, along with the sample count per defect and the central coordinate positions of defect targets in images. As depicted in Figure 8a, the aspect ratio of most defects falls between 1 and 3, with extremely low outliers. As shown in Figure 8b, the defect-to-image area ratio reveals diverse defect sizes in steel surfaces, with most defects being small-scale. Figure 8c illustrates uneven defect sample proportions with significant gaps. As shown in Figure 8d, defect target positions in images exhibit substantial variations. In conclusion, defects across categories in the dataset vary significantly by type, size, quantity, and distribution.
Model evaluation metrics comprise average precision (AP), mean average precision (mAP), frames per second (FPS), and model volume. AP denotes the area under the precision–recall (PR) curve, comprehensively reflecting detection performance in terms of precision and recall. mAP is calculated by averaging AP values across all detection categories. FPS measures detection speed, representing the number of images the model processes per second. Model volume quantifies the size of the model’s weight file. The relevant formulas are as follows [19]:
$$P = \frac{TP}{TP + FP} \times 100\%$$
$$R = \frac{TP}{TP + FN} \times 100\%$$
$$AP = \int_{0}^{1} P(R)\,\mathrm{d}R$$
$$mAP = \frac{1}{n} \sum_{i=1}^{n} AP(i) \times 100\%$$
In the formulas, P is the precision; R is the recall; and TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively.
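As a simple numerical check, the sketch below recovers the reported mAP from the per-class AP values in Table 1; the trapezoidal AP integration shown is one common choice, whereas detection toolkits often use interpolated variants.

```python
import numpy as np

def precision_recall(tp: int, fp: int, fn: int):
    """Precision and recall from TP, FP, and FN counts."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(precision: np.ndarray, recall: np.ndarray) -> float:
    """AP as the area under the precision-recall curve (trapezoidal rule)."""
    order = np.argsort(recall)
    p, r = precision[order], recall[order]
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2.0))

# mAP averages AP over the n defect categories; the per-class values below
# are the ADP-YOLOv8-n results from Table 1.
ap_per_class = {"Cr": 0.530, "Pa": 0.925, "In": 0.796,
                "Ps": 0.848, "Rs": 0.744, "Sc": 0.913}
print(f"mAP = {np.mean(list(ap_per_class.values())):.3f}")  # mAP = 0.793
```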

3.2. Experimental Validation Analysis of the Improved Modules

3.2.1. ADSConv Module

In Section 2.2, the ADSConv module is described. This study incorporates the ADSConv module into the Backbone feature extraction network, thereby enhancing the network’s representational capability. The feature heat map is illustrated in Figure 9.

3.2.2. C2f_DWR Module

In Section 2.3, the C2f_DWR module is described. With this design, the C2f_DWR module expands the receptive field and enhances the network’s multi-scale feature extraction capability at lower computational complexity, thus improving the accuracy and robustness of steel surface defect detection. The C2f_DWR module is shown in Figure 4.
By improving the C2f module, the detector not only identifies previously unrecognized targets but also enhances the confidence of detected targets. Test cases of the C2f module before and after improvement are illustrated in Figure 10.

3.2.3. MS-FDNet Module

In Section 2.4, the MS-FDNet module is described. Integrating multi-receptive-field convolution and a pyramid cascade architecture, it effectively addresses the challenge of traditional feature fusion modules in optimizing multi-scale target features. Through the dynamic adjustment of feature fusion strategies, it balances detail preservation and global semantic information for multi-scale targets. Meanwhile, it mitigates feature redundancy and information loss, rendering it better suited for multi-scale target detection.
Test cases demonstrating the MS-FDNet structure before and after implementation are illustrated in Figure 11. The confusion matrices illustrating the performance of the three modules before and after implementation are shown in Figure 12.

3.3. Comparative Analysis of Different Defect Detection Algorithms

To evaluate the proposed algorithm’s effectiveness, performance comparisons were conducted for several state-of-the-art one-stage and two-stage detectors. The evaluation focused on mAP, per-image detection time, and model volume using the NEU-DET dataset. Two-stage methods include Faster R-CNN [20] and Cascade R-CNN [21], both utilizing region proposal networks for candidate bounding box generation. One-stage methods involve seven anchor-based and anchor-free detectors: SSD [22], CenterNet [23], YOLOv5 [14], YOLOv7 [24], YOLOv9 [25], YOLOX [26], and EfficientDet [27]. Detection results for different models are presented in Table 1.
It can be seen from Table 1 that the two-stage detectors show higher detection accuracy, with the mAP values of Faster RCNN and Cascade RCNN reaching 76.5% and 73.3%, respectively. However, their per-image inference speed is notably slower: Faster R-CNN achieves a frame rate of 33.2 fps, while Cascade R-CNN reaches 21.3 fps. Despite their high detection accuracy, two-stage detectors suffer from computational redundancy, resulting in slower speeds and larger model sizes that hinder practical deployment. By eliminating the region proposal stage, single-stage detectors significantly improve detection speed over their two-stage counterparts. The mAP value of the SSD algorithm, based on the multi-box prediction mechanism, is 72.3%, and its frame rate reaches 98.0 fps, more than twice that of the two-stage detectors Faster RCNN and Cascade RCNN. CenterNet abandons the concept of a priori boxes and represents targets as center points of bounding boxes (BBoxes). Compared to SSD, its frame rate increases significantly to 131.6 fps, with a mAP of 71.4%. EfficientDet, which employs BiFPN for feature extraction enhancement, achieves an overall mAP of 65.6%. Notably, it demonstrates the best detection accuracy for crack (Cr) features, the most challenging defect type, with a frame rate of 97.8 fps.
YOLO series detectors exhibit comparable overall performance. YOLOv5 matches SSD in detection accuracy while offering a lighter model architecture, with a trained model size of only 14.5 MB, an advantage for deployment. With a mAP of 74.7%, it excels in detection speed, achieving a frame rate of 139.8 fps. YOLOv7 achieves a mAP of 72.5% and a frame rate of 209.0 fps. YOLOv9, the latest benchmark model, achieves the highest mAP among the compared models at 75.8%, but with a reduced detection speed of 106.4 fps and a model size of 122.4 MB. The anchor-free detector YOLOX achieves a mAP of 72.5% and a detection speed of 136.4 fps. YOLOv8 offers the best overall trade-off, matching YOLOv9’s detection accuracy (75.8%) while surpassing it in detection speed (209.4 fps) and model size (only 6.2 MB).
Compared with other models in the experiment, the proposed model significantly improves the detection accuracy at a slight sacrifice of detection speed. It reaches a mAP of 79.3%, a frame rate of 163.2 fps, and a model size of only 9.6 MB. Obviously, the improved detector has more advantages in performance, showing a good balance between detection accuracy, speed, and model volume.

3.4. Ablation Experiment

To analyze the impact of individual improvement measures on the benchmark model, ablation experiments were designed to evaluate the specific effect of each improvement on model performance. Experimental results are presented in Table 2.
Firstly, the adaptive weighted down-sampling module is introduced alone. The improved detector shows a 1.1% mAP increase over the baseline, indicating that adaptive weighted down-sampling enhances focus on defect information, enabling more effective capture of defect features and improved detection accuracy. Following the addition of the ADSConv module, FPS drops to 146.8, reflecting that while the attention mechanism enhances the model’s representational capability, it increases overall computational complexity, reducing inference-stage performance. When the C2f module is replaced by the improved C2f_DWR alone, mAP increases by approximately 0.7% relative to the baseline. This demonstrates that the C2f_DWR module enables compact feature extraction and multi-scale fusion, expanding the receptive field and enhancing the network’s multi-scale feature extraction capability. The improved model achieves an FPS of 225.5, indicating that integrating modules within C2f outperforms traditional network deepening for feature extraction enhancement while preserving computational efficiency. Finally, when the MS-FDNet module is added alone, mAP increases by approximately 1.5% compared to the baseline. This demonstrates that MS-FDNet effectively addresses the challenge of traditional feature fusion modules in optimizing multi-scale target features, thereby enhancing steel surface defect detection capability.
Table 3 demonstrates that the comprehensive application of these performance enhancement strategies to YOLOv8-n yields more significant performance gains than individual strategy applications. This indicates that the proposed optimization measures independently contribute to baseline performance improvements, with negligible negative interactions during simultaneous application. “✓” denotes that the module is applied in the detection algorithm, while “-” indicates that the module is not applied in the detection algorithm.

3.5. Validation Analysis of Improved Module Replacement Positions

3.5.1. ADSConv Module Analysis

In the improved model, selected traditional convolution (Conv) modules in the feature extraction network are substituted with adaptive weighted down-sampling (ADSConv) modules. While the attention mechanism of the ADSConv module enhances the model’s representational capacity, it concurrently increases computational complexity, reducing inference speed. To determine the optimal number of ADSConv modules, a series of experiments were conducted, with results tabulated in Table 4. ‘T’ indicates that the position has been modified, and ‘F’ indicates that it has not. The comparison results indicate that replacing the Conv modules in the final two stages with ADSConv yields optimal performance.
Computational load and parameter counts for the fourth and fifth convolutional layers, along with the overall network model, are compared before and after the replacement. Table 5 presents the comparison results. As shown in Table 5, the ADSConv layer elevates computational load and parameter counts relative to the original Conv layer. However, the actual parameter increment is negligible, and with controlled substitution counts, the practical impact remains insignificant.

3.5.2. Analysis of C2f_DWR Module

Integration of the C2f_DWR module into the backbone network substantially expands the network’s receptive field, enabling the model to capture multi-scale features. While the DWR module is simplified herein, its complexity remains higher than the Bottleneck module, potentially augmenting model size, elevating training difficulty, and increasing overfitting risk. Thus, the number of C2f_DWR layers must be minimized to avoid compromising detection performance. The experimental results from gradual replacement of backbone C2f layers (Table 6) indicate that replacing C2f_DWR layers in the final two stages suffices to yield significant performance gains.

3.6. Result Analysis

The YOLOv8-n model is enhanced through the integration of innovative modules, with the goal of boosting detection accuracy and speed for specific tasks. Firstly, the adaptive weight down-sampling module is introduced, significantly enhancing the model’s defect feature capture capability. While this enhancement involves a trade-off in inference speed, it highlights the potential of adaptive sampling for accuracy improvement. The improved C2f_DWR feature extraction module elevates detection accuracy via compact feature extraction and multi-scale fusion. Additionally, the module employs depthwise convolution to minimize model parameters, thereby accelerating detection speed. Finally, the MS-FDNet module balances detail retention and global semantic information by dynamically adjusting feature fusion to accommodate multi-scale targets. Meanwhile, it mitigates feature redundancy and information loss, rendering it better suited for multi-scale target detection.
The experimental results and analysis in Figure 13 reveal that while deep learning has achieved significant advancements in defect detection, challenges persist. In particular, detection performance remains suboptimal for difficult-to-identify features such as cracks. One approach involves applying image enhancement techniques to improve image quality and enhance target–background contrast, thereby boosting detection performance. Another strategy entails optimizing the labeling process to more accurately characterize defect features. Additionally, domain adaptation techniques can be employed to enable model adaptation to diverse production lines and environmental conditions, enhancing detection robustness.

4. Conclusions

This paper proposes ADP-YOLOv8-n, an enhanced YOLOv8-n algorithm, to address the challenge of existing detection algorithms in efficiently and accurately detecting multi-scale features. Comparative experiments yield the following conclusions:
  • The comparative experiments among different detection algorithms reveal that two-stage detectors, such as Faster R-CNN and Cascade R-CNN, outperform one-stage detectors, including SSD, CenterNet, EfficientDet, and the YOLO series, in terms of detection accuracy. However, two-stage detectors are significantly slower in detection speed than one-stage detectors and also have larger model sizes.
  • The proposed ADP-YOLOv8-n algorithm demonstrates superior performance, achieving a favorable balance between detection accuracy, speed, and model size, with a modest sacrifice in detection speed and model size. Specifically, the ADP-YOLOv8-n algorithm achieves the highest detection accuracy in terms of Ps (84.8%), Rs (74.4%), Sc (91.3%), and mAP (79.3%). Although its detection accuracy for the Cr feature is slightly lower than that of the EfficientDet detector (53.0% vs. 56.9%), and its accuracy for the Pa feature is marginally lower than that of the SSD detector (92.5% vs. 93.5%), it still exhibits remarkable performance. In terms of detection speed, the ADP-YOLOv8-n algorithm (163.2 frames/s) is slower than YOLOv8 (209.4 frames/s) and YOLOv7 (209.0 frames/s). Regarding model size, the ADP-YOLOv8-n algorithm (9.6 MB) is slightly larger than YOLOv8 (6.2 MB), but significantly smaller than other detectors.
  • In this study, three improved modules (ADSConv, C2f_DWR, and MS-FDNet) were proposed to enhance the YOLOv8 detection model. Ablation studies demonstrated that ADSConv, C2f_DWR, and MS-FDNet individually improved detection accuracy by 1.1%, 0.7%, and 1.5%, respectively. In terms of detection speed, ADSConv led to a decrease of 62.6 frames/s, while C2f_DWR and MS-FDNet resulted in increases of 16.1 frames/s and 0.6 frames/s, respectively. When combined, ADSConv + C2f_DWR, ADSConv + MS-FDNet, C2f_DWR + MS-FDNet, and ADSConv + C2f_DWR + MS-FDNet achieved detection accuracy improvements of 2.6%, 1.7%, 2.7%, and 3.5%, respectively. However, their detection speeds decreased by 91.9 frames/s, 32.7 frames/s, 61.9 frames/s, and 46.2 frames/s, respectively. These results indicate that the ADP-YOLOv8-n algorithm sacrificed a certain degree of detection speed to enhance detection accuracy and reduce model size. Nevertheless, its detection speed remains superior to that of detectors other than YOLOv8 and YOLOv7.
Comparative experiments on the NEU-DET dataset demonstrate that the improved algorithm excels in detection accuracy, inference speed, and model size.
Future work directions: (1) explore the application of data enhancement techniques to generate high-quality defect images; (2) investigate the model’s representational capability under small-sample conditions to enhance network robustness; and (3) investigate integrating the proposed algorithm into real-world production lines.

Author Contributions

Conceptualization, Q.X., G.W. and Z.L.; methodology, Z.L.; software, Q.X. and X.Z.; validation, Q.X.; formal analysis, Q.X. and X.Z.; investigation, G.W.; resources, G.W.; data curation, Q.X. and Z.L.; writing—original draft preparation, Q.X. and Z.L.; writing—review and editing, Q.X. and Z.L.; project administration, G.W. and Z.L.; funding acquisition, G.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (52476103).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ADP-YOLOv8-n: Adaptive weighted down-sampling YOLOv8-n
C2f: CSP Bottleneck with 2-convolution module
ADSConv: Adaptive weighted down-sampling convolution module
ROI: Region of interest
DWR: Dilation-wise residual module
C2f_DWR: Module combining the C2f and DWR modules
MS-FDNet: Multi-Scale-Focus Diffusion Pyramid Network
Cr: Cracks
In: Inclusions
Pa: Patches
Ps: Pitting surfaces
Rs: Rolled oxide scales
Sc: Scratches
AP: Average precision
mAP: Mean average precision
FPS: Frames per second

References

  1. Qiao, Q.; Hu, H.; Ahmad, A.; Wang, K. A Review of Metal Surface Defect Detection Technologies in Industrial Applications. IEEE Access 2025, 13, 48380–48400. [Google Scholar] [CrossRef]
  2. Shen, K.; Zhou, X.; Liu, Z. MINet: Multiscale Interactive Network for Real-Time Salient Object Detection of Strip Steel Surface Defects. IEEE Trans. Ind. Inform. 2024, 20, 7842–7852. [Google Scholar] [CrossRef]
  3. Wang, Q.; Wang, M.; Sun, J.; Chen, D.; Shi, P. Review of Surface-Defect Detection Methods for Industrial Products Based on Machine Vision. IEEE Access 2025, 13, 90668–90697. [Google Scholar] [CrossRef]
  4. Wang, Y.; Yu, H.; Guo, B.; Shi, H.; Yu, Z. Research on Real-Time Detection System of Rail Surface Defects Based on Deep Learning. IEEE Sens. J. 2024, 24, 21157–21167. [Google Scholar]
  5. Wang, L.; Liu, X.; Ma, J.; Su, W.; Li, H. Real-Time Steel Surface Defect Detection with Improved Multi-Scale YOLO-v5. Processes 2023, 11, 1357. [Google Scholar] [CrossRef]
  6. Kou, X.; Liu, S.; Cheng, K.; Qian, Y. Development of a YOLO-V3-based model for detecting defects on steel strip surface. Measurement 2021, 182, 109454. [Google Scholar] [CrossRef]
  7. Zhao, W.; Chen, F.; Huang, H.; Li, D.; Cheng, W. A new steel defect detection algorithm based on deep learning. Comput. Intell. Neurosci. 2021, 2021, 1–13. [Google Scholar] [CrossRef] [PubMed]
  8. Chen, Z.; Zhang, R.; Hsieh, M.Y.; Souri, A.; Li, K.-C. Average Sigmoid-Tanh Attention and Multi-filter Partially Decoupled Mechanism via YOLOv7 for Detecting Weld Proximity Defects. Met. Mater. Trans. 2025, 56, 4186–4200. [Google Scholar] [CrossRef]
  9. Lv, Z.; Zhao, Z.; Xia, K.; Gu, G.; Liu, K.; Chen, X. Steel surface defect detection based on MobileViTv2 and YOLOv8. J. Supercomput. 2024, 80, 18919–18941. [Google Scholar] [CrossRef]
  10. Ma, S.; Zhao, X.; Wan, L.; Zhang, Y.; Gao, H. A lightweight algorithm for steel surface defect detection using improved YOLOv8. Sci. Rep. 2025, 15, 8966. [Google Scholar] [CrossRef]
  11. Ruan, S.; Zhan, C.; Liu, B.; Wan, Q.; Song, K. A high precision YOLO model for surface defect detection based on PyConv and CISBA. Sci. Rep. 2025, 15, 15841. [Google Scholar] [CrossRef]
  12. Wang, C.; Wang, H.; Jiang, Y.; Yu, L.; Wang, X. CSCP-YOLO: A Lightweight and Efficient Algorithm for Real-Time Steel Surface Defect Detection. IEEE Access 2025, 13, 113517–113528. [Google Scholar] [CrossRef]
  13. Lu, X.; Zhou, Y.-K.; Qin, W.; Yang, W.-W.; Chen, J.-X. A Novel and Compact Dual-Orthogonal-Ridged Dielectric Waveguide Resonator and Its Applications to Bandpass Filters. IEEE Trans. Microw. Theory Tech. 2025, 73, 1671–1679. [Google Scholar] [CrossRef]
  14. Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 2778–2788. [Google Scholar]
  15. Zhang, H.; Miao, Q.; Li, S.; Wang, C.; Chan, S.; Hu, J.; Bai, C. An efficient and real-time steel surface defect detection method based on single-stage detection algorithm. Multimed. Tools Appl. 2024, 83, 90595–90617. [Google Scholar] [CrossRef]
  16. Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. Tood: Task-aligned one-stage object detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; IEEE Computer Society: Washington, DC, USA, 2021; pp. 3490–3499. [Google Scholar]
  17. Li, Y.; Han, Z.; Wang, W.; Xu, H.; Wei, Y.; Zai, G. Steel surface defect detection based on sparse global attention transformer. Pattern Anal. Appl. 2024, 27, 152. [Google Scholar] [CrossRef]
  18. He, Y.; Song, K.; Meng, Q.; Yan, Y. An End-to-End Steel Surface Defect Detection Approach via Fusing Multiple Hierarchical Features. IEEE Trans. Instrum. Meas. 2020, 69, 1493–1504. [Google Scholar] [CrossRef]
  19. Chan, S.; Li, S.; Zhang, H.; Zhou, X.; Mao, J.; Hong, F. Feature optimization-guided high-precision and real-time metal surface defect detection network. Sci. Rep. 2024, 14, 31941. [Google Scholar] [CrossRef]
  20. Zamanidoost, Y.; Ould-Bachir, T.; Martel, S. OMS-CNN: Optimized Multi-Scale CNN for Lung Nodule Detection Based on Faster R-CNN. IEEE J. Biomed. Health Inform. 2025, 29, 2148–2160. [Google Scholar] [CrossRef]
  21. Chai, B.; Nie, X.; Zhou, Q.; Zhou, X. Enhanced Cascade R-CNN for Multiscale Object Detection in Dense Scenes From SAR Images. IEEE Sens. J. 2024, 24, 20143–20153. [Google Scholar] [CrossRef]
  22. Zhong, X. CAL-SSD: Lightweight SSD object detection based on coordinated attention. Signal Image Video Process. 2025, 19, 31. [Google Scholar] [CrossRef]
  23. Wang, Y.; Deng, H.; Wang, Y.; Song, L.; Ma, B.; Song, H. CenterNet-LW-SE net: Integrating lightweight CenterNet and channel attention mechanism for the detection of Camellia oleifera fruits. Multimed. Tools Appl. 2024, 83, 68585–68603. [Google Scholar] [CrossRef]
  24. Wang, C.; Bochkovskiy, A.; Liao, H. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  25. Wang, C.; Yeh, I.; Liao, H. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616. [Google Scholar] [CrossRef]
  26. Gu, Q.; Huang, H.; Han, Z.; Fan, Q. GLFE-YOLOX: Global and Local Feature Enhanced YOLOX for Remote Sensing Images. IEEE Trans. Instrum. Meas. 2024, 73, 1–12. [Google Scholar] [CrossRef]
  27. Zocco, F.; Lin, T.-C.; Huang, C.-I.; Wang, H.-C.; Khyam, M.O.; Van, M. Towards More Efficient EfficientDets and Real-Time Marine Debris Detection. IEEE Robot. Autom. Lett. 2023, 8, 2134–2141. [Google Scholar] [CrossRef]
Figure 1. Overall network architecture.
Figure 2. The structure of ADSConv.
Figure 3. Simplified structure of DWR.
Figure 4. Structure of C2f_DWR.
Figure 5. Structure of Focus Feature.
Figure 6. Structure of MS-FDNet.
Figure 7. Partially labeled NEU-DET dataset.
Figure 8. Sample parameters: (a) distribution of defect aspect ratio; (b) distribution of defect area-to-image area ratio; (c) number of samples with defects; and (d) position of the center point coordinate of the defect target on the image.
Figure 9. The feature heat map: (a) original image; (b) feature heat map of the YOLOv8-n network; and (c) feature heat map after adding the ADSConv module.
Figure 10. Comparison of detection cases before and after C2f module improvement: (a) original image; (b) test results before C2f module improvement; and (c) test results after C2f module improvement.
Figure 11. Comparison of the same-layer feature map before and after introducing the MS-FDNet module: (a) original image; (b) feature map before introducing the MS-FDNet module; and (c) feature map after introducing the MS-FDNet module.
Figure 12. Comparative confusion matrices before and after the introduction of the ADSConv, C2f_DWR, and MS-FDNet modules: (a) confusion matrices before introducing the modules and (b) confusion matrices after introducing the modules.
Figure 13. Detection results of the proposed model: (a) crazing; (b) patches; (c) inclusion; (d) pitted surface; (e) rolled-in scale; and (f) scratches.
Table 1. Inspection results on NEU-DET.

| Method | Cr/% | Pa/% | In/% | Ps/% | Rs/% | Sc/% | mAP/% | FPS/(frame/s) | Volume/MB | P/% | R/% |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Two-stage | | | | | | | | | | | |
| Faster RCNN | 52.0 | 90.4 | 85.9 | 78.1 | 60.4 | 92.2 | 76.5 | 33.2 | 113.2 | 44.7 | 82.4 |
| Cascade | 38.3 | 88.4 | 76.0 | 81.3 | 67.8 | 88.2 | 73.3 | 21.3 | 88.3 | 77.2 | 64.3 |
| One-stage | | | | | | | | | | | |
| SSD | 43.7 | 93.5 | 80.8 | 83.7 | 56.3 | 75.7 | 72.3 | 98.0 | 98.2 | 75.1 | 65.9 |
| CenterNet | 44.2 | 88.8 | 78.8 | 77.5 | 52.0 | 87.1 | 71.4 | 131.6 | 131.0 | 72.5 | 34.5 |
| EfficientDet | 56.9 | 91.7 | 81.8 | 80.9 | 55.4 | 26.5 | 65.6 | 97.8 | 15.8 | 88.2 | 47.8 |
| YOLOX | 37.5 | 90.6 | 82.4 | 75.0 | 58.8 | 90.7 | 72.5 | 136.4 | 36.0 | 86.4 | 42.1 |
| YOLOv5 | 37.0 | 91.1 | 82.6 | 77.3 | 69.6 | 90.5 | 74.7 | 139.8 | 14.5 | 78.6 | 48.9 |
| YOLOv7 | 35.3 | 90.6 | 82.6 | 71.1 | 70.7 | 85.0 | 72.5 | 209.0 | 71.4 | 77.8 | 69.2 |
| YOLOv9 | 40.9 | 93.0 | 80.7 | 79.1 | 70.6 | 90.2 | 75.8 | 106.4 | 122.4 | 76.7 | 63.1 |
| YOLOv8 | 45.6 | 90.0 | 81.0 | 78.4 | 70.3 | 89.3 | 75.8 | 209.4 | 6.2 | 77.0 | 69.5 |
| This Research | 53.0 | 92.5 | 79.6 | 84.8 | 74.4 | 91.3 | 79.3 | 163.2 | 9.6 | 79.3 | 70.7 |
Table 2. Results of ablation study 1.

| Method | mAP/% | Improve/% | FPS | Weight/MB |
|---|---|---|---|---|
| YOLOv8-n | 75.8 | - | 209.4 | 6.2 |
| YOLOv8-n + ADSConv | 76.9 | 1.1 | 146.8 | 5.4 |
| YOLOv8-n + C2f_DWR | 76.5 | 0.7 | 225.5 | 6.2 |
| YOLOv8-n + MS-FDNet | 77.3 | 1.5 | 210.0 | 9.9 |
Table 3. Results of ablation study 2.

| ADSConv | C2f_DWR | MS-FDNet | mAP/% | Improve/% | FPS | Weight/MB |
|---|---|---|---|---|---|---|
| - | - | - | 75.8 | - | 209.4 | 6.2 |
| ✓ | ✓ | - | 78.4 | 2.6 | 117.5 | 6.9 |
| ✓ | - | ✓ | 77.5 | 1.7 | 176.7 | 9.8 |
| - | ✓ | ✓ | 78.5 | 2.7 | 147.5 | 9.8 |
| ✓ | ✓ | ✓ | 79.3 | 3.5 | 163.2 | 9.6 |
Table 4. Performance of different combinations of ADSConv layers. Positions 1–5 indicate which Conv layers are replaced by ADSConv (T = replaced, F = not replaced).

| 1 | 2 | 3 | 4 | 5 | mAP/% |
|---|---|---|---|---|---|
| F | T | T | T | T | 76.3 |
| F | F | T | T | T | 76.6 |
| F | F | F | T | T | 79.3 |
| F | F | F | F | T | 78.2 |
Table 5. Comparison of GFLOPs and parameters before and after replacement.

| Layer/Model | Calculation Amount/GFLOPs | Parameters/M |
|---|---|---|
| Conv-4 | 0.12 | 0.04 |
| Conv-5 | 0.10 | 0.29 |
| ADSConv-4 | 0.18 | 0.07 |
| ADSConv-5 | 0.12 | 0.25 |
| YOLOv8 | 8.5 | 3.02 |
| YOLOv8-n + ADSConv | 8.7 | 3.15 |
Table 6. Performance of different combinations of C2f_DWR layers. Positions 1–4 indicate which C2f layers are replaced by C2f_DWR (T = replaced, F = not replaced).

| 1 | 2 | 3 | 4 | mAP/% |
|---|---|---|---|---|
| F | T | T | T | 76.6 |
| F | F | T | T | 79.3 |
| F | F | F | T | 75.2 |