1. Introduction
As one of the most important basic materials in the iron and steel industry, steel strips are used extensively in automobile manufacturing, locomotives, aerospace, precision instrumentation, and other fields. For thin, wide flat steel, surface defects are the greatest threat to product quality. Even occasional internal defects are very likely to cause morphological changes on the surface. Any quality problem on the steel surface can cause irretrievable economic and reputational losses to both the steel company and the end customer. To address this issue, automated visual inspection (AVI) instruments targeting surface quality have become standard equipment in flat steel mills [1].
Among the many categories of surface defects on hot-rolled steel strips, steel mills rank the roll mark among the most serious. As a typical roll mark, the roller cracking defect makes the surface extremely uneven. After the downstream continuous rolling process, these defects turn into bumps or even holes. Worse still, such defects often occur periodically and continuously. In short, the roll mark is one of the most harmful defects threatening steel surface quality. Consequently, detecting roll marks rapidly and accurately is essential for a surface defect AVI instrument.
Specifically, roll marks have two characteristics: (1) Low contrast: roll marks appear as concave and convex deformations, but the deformation is very shallow. Thus, as shown in Figure 1a, roll marks usually show very low contrast against the background. (2) Large intra-class distance: the appearances of roll marks are diverse, irregular, and multiscale. Figure 1b shows three patches of roll marks with completely distinct appearances. In addition, numerous pseudo defects, random noise, and aperiodic vibration degrade the surface image quality of steel strips in the harsh industrial environment of the hot-rolling line. On the other hand, the fine resolution required to resolve defects and the high rolling speed force the camera to generate massive amounts of image data continuously. In short, the online roll mark detection addressed in this paper is essentially a tiny-target inspection task on high-resolution images captured in a harsh environment.
Conventional computer-vision methods for steel surface defect detection are usually split into three classes: statistical, spectral, and model-based. Neogi et al. [2] proposed a global adaptive percentile thresholding scheme based on gradient images to separate defects selectively. It precisely retains defect edges regardless of defect scale. As a classical operator, the local binary pattern (LBP) is used extensively to characterize local texture features of images, owing to its rotation invariance and gray-scale invariance [3]. Song et al. [4] designed AECLBP, which regards the surrounding gray values as the central gray value. AECLBP achieved 98.93% accuracy on the NEU datasets and strong robustness to noise. Luo et al. [5] proposed GCLBP by first exploiting non-uniform pattern information to enrich the descriptive information; GCLBP achieved 99.11% accuracy on the NEU datasets. However, statistical methods have the following weaknesses: heavy computation, unsatisfactory real-time performance, scale sensitivity, and noise sensitivity [6]. Song et al. [7] adopted the wavelet transform to construct a scattering convolution network (SCN) that improves tolerance to local and linearized deformations; SCN obtained an accuracy of 97.22% on hot-rolled steel strip defect detection. Nonetheless, spectral methods are easily affected by feature correlations between scales and have high computation and memory requirements [8]. Xu et al. [9] designed a hidden Markov tree model called CAHMT, based on the assertion that the correlation of wavelet coefficients of flat steel surface images at different scales satisfies the Markov property. CAHMT’s false detection rate is as low as 3.7%. Fofi et al. [10] designed a non-parametric texture defect detection method using Weibull features. It performs well on the DAGM database. However, it is hard for the Weibull distribution to handle defects with gradual intensity changes or low contrast. Hence, Liu et al. [11] proposed a Haar–Weibull-variance (HWV) model using Haar features from local patches. This method achieved an accuracy of 96.2% on a hot-rolled steel surface defect dataset. Nonetheless, the above model-based methods have the following disadvantages: spatial limitations and failure to detect tiny defects in full images. In conclusion, traditional computer vision methods have achieved good results in steel surface defect detection. However, they take only low-level features into account, which cannot fully characterize image features and semantic details. It is also noteworthy that there are almost no detection methods or datasets dedicated to roll marks.
In the past few years, advances in computing hardware and the continuous expansion of public datasets have facilitated the development of neural networks [12]. Object detection networks based on neural networks fall into two branches: one-stage detection and two-stage detection. One-stage detection networks perform feature extraction and prediction regression sequentially within a single integrated network. Li et al. [13] made YOLO fully convolutional to detect flat steel surface defects and reached an accuracy of 99% at 83 FPS. Liu et al. [14] improved the SSD network and put forward RAF-SSD, which obtained 75.1% mAP on the NEU database. However, these methods fail to cope with multiscale and tiny defects. Two-stage detection networks first generate a certain number of proposal boxes and then classify them with another convolutional neural network. Dong et al. [15] proposed a pyramid feature fusion and global context attention network for pixel-wise surface defect detection, called PGA-Net, which achieved 82.15% mean pixel accuracy on NEU-Seg. Cha et al. [16] designed a structural visual inspection method that reduces the processing time of Faster R-CNN and is capable of detecting multiple classes of defects. The drawback is that the real-time performance of Faster R-CNN is unsatisfactory. He et al. [17] proposed a multi-scale feature fusion network (MFN), which obtained 82.3% mAP on NEU-Det. Song et al. [18] proposed a novel encoder–decoder residual network (EDRNet), which can accurately segment whole defect instances with clear-cut boundaries and effectively filter out irrelevant background noise. Nonetheless, the excellent performance of these deep learning methods was obtained on image patches, such as NEU-Det (1200 samples with a resolution of 200 × 200), which is widely used among peers. These samples were preprocessed by detection, selection, and segmentation based on prior knowledge, so they are considerably easier to handle than our target (detecting tiny defects in high-resolution images). On a real industrial production line, however, the images acquired by the steel defect detection system are wide and high-resolution [19]. For example, the image resolution of the surface defect detection system developed earlier by our research team is 1024 × 4096. Given the lack of public databases in the field of steel strip surface defect inspection, we release, for the first time, a raw defect database of hot-rolled steel strip surfaces, CSU_STEEL, which contains six kinds of defects (roll mark, elastic deformation, wave, inclusions, oxide scale, and scratches) at 1024 × 4096 resolution.
As far as we know, this is the first wide-format, high-resolution dataset of original hot-rolled steel strip surface images. To meet the challenge of tiny-target detection in high-resolution images captured in harsh environments, a novel method, smoothing complete feature pyramid networks (SCFPN), is proposed. Complete intersection over union (CIoU) is applied in feature pyramid networks to obtain faster regression and higher prediction accuracy by suppressing vanishing gradients during training. In addition, label smoothing is employed to improve the generalization ability of the model.
The rest of this paper is organized as follows. Section 2 elaborates the proposed SCFPN in detail. Section 3 introduces our experimental settings. The experiments are then evaluated quantitatively and qualitatively, and the experimental results on defect detection are analyzed in Section 4. Section 5 discusses the results. Finally, Section 6 concludes the paper and discusses future work.
2. Materials and Methods
This paper concentrates on roll mark detection on steel strip surfaces. A targeted two-stage object detection method, smoothing complete feature pyramid networks (SCFPN), is designed; its structure is shown in Figure 2. First, the backbone extracts multi-level features from the input image. ResNets allow the network depth to increase without causing the degradation problem, and deeper networks can extract more abstract features with robust semantic information. Feature maps ranging from the bottom (fine resolution) to the top (coarse resolution) of the pyramid hierarchy are used to construct the feature pyramid (Neck) by fusing multi-scale features. SCFPN takes only a single-scale image of arbitrary size as input and builds the feature pyramid at multiple scales by convolution. Afterwards, Faster R-CNN (Head) performs bounding box regression and classification. Concretely, the loss function for bounding box regression is the complete intersection over union (CIoU) loss, which provides faster fitting and higher prediction accuracy. The loss function for classification is the cross-entropy loss with label smoothing, which is employed to enhance the generalization ability of the model. After these steps, the network outputs images with predicted boxes and labels [20].
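As a rough illustration of the multi-scale pyramid described above, the following sketch computes the spatial size of each pyramid level for a single input image. The strides (4, 8, 16, 32) are the ones commonly associated with ResNet stages in feature pyramid networks and are an assumption here, as is treating 1024 × 4096 as height × width. The function name `pyramid_shapes` is hypothetical and is not part of the authors' implementation.

```python
import math


def pyramid_shapes(height, width, strides=(4, 8, 16, 32)):
    """Return {stride: (h, w)} for each assumed pyramid level.

    Sizes are rounded up, as happens with padded strided convolutions.
    """
    return {s: (math.ceil(height / s), math.ceil(width / s)) for s in strides}


# Example with the image size reported for CSU_STEEL (assumed height x width).
shapes = pyramid_shapes(1024, 4096)
print(shapes)
# {4: (256, 1024), 8: (128, 512), 16: (64, 256), 32: (32, 128)}
```

Under these assumptions the finest level keeps a 256 × 1024 grid, which gives a sense of how much spatial detail remains available for tiny defects.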
3. Experiments
The model is implemented in the TensorFlow framework (version 1.4.1), which provides libraries for building deep learning model architectures. The evaluation datasets are DeepPCB, the NEU datasets, and CSU_STEEL. The FPN series networks (FPN, CFPN, and SCFPN) employ ResNet101 pre-trained on ImageNet as the backbone. During training, the base learning rate is set to 0.001, and a warm-up learning rate and a step learning rate schedule are adopted to stabilize the initial training process. The cross-entropy loss measures the deviation between the predicted class and the ground-truth class. Stochastic gradient descent (SGD) minimizes this deviation and obtains the optimal weight matrix during back propagation. Meanwhile, the momentum algorithm is adopted to accelerate training. Table 1 lists some hyper-parameters used in the FPN series networks. All experiments are performed on a server (12 GB NVIDIA Titan Xp GPU, 2.2 GHz Intel Xeon E5-2630 CPU, 64 GB RAM, Dell, Beijing, China).
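The warm-up and step learning rate strategy mentioned above can be sketched as follows. Only the base learning rate of 0.001 comes from the text; the warm-up length, warm-up factor, decay boundaries, and decay rate are illustrative assumptions and are not the values in Table 1.

```python
def learning_rate(step, base_lr=0.001, warmup_steps=1000, warmup_factor=0.1,
                  decay_steps=(60000, 80000), decay_rate=0.1):
    """Learning rate at a given training step (all defaults except base_lr are assumed)."""
    if step < warmup_steps:
        # Linear warm-up from warmup_factor * base_lr up to base_lr.
        progress = step / warmup_steps
        return base_lr * (warmup_factor + (1.0 - warmup_factor) * progress)
    lr = base_lr
    # Step decay: multiply by decay_rate at every boundary already passed.
    for boundary in decay_steps:
        if step >= boundary:
            lr *= decay_rate
    return lr


for s in (0, 500, 1000, 60000, 80000):
    print(s, learning_rate(s))
```

Starting from a reduced rate avoids large, unstable updates while the detection heads are still randomly initialized; the later steps lower the rate once the loss has flattened.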
3.1. Hot-Rolled Steel Strip Surface Dataset
In the field of steel defect detection, nearly all public datasets have been preprocessed by detection, selection, and segmentation based on prior knowledge. The samples in these datasets do not reflect the real situation on an actual industrial production line, which affects the stability of an algorithm's performance after it is transplanted to the production line. To overcome this bottleneck, we collected and produced a hot-rolled steel strip surface defect database called CSU_STEEL that, for the first time, reproduces the situation on the industrial production line. CSU_STEEL contains 968 original images of hot-rolled steel strip surfaces from the industrial production line, each with a size of 1024 × 4096 pixels. Figure 5 shows samples from CSU_STEEL.
CSU_STEEL has six classes of defects: roll mark, elastic deformation, wave, inclusions, oxide scale, and scratches. Unlike sliced samples, each image in CSU_STEEL contains one or several classes of defects, so the database can be used for both classification and detection. To our knowledge, it is the first wide-format, high-resolution raw image dataset of hot-rolled steel strip surfaces. It provides researchers in the field of defect detection with a public dataset for verifying algorithm performance, which contributes to the development and application of surface defect detection.
3.2. Evaluation Metrics
In our experiments, precision, recall, average precision (AP), mean average precision (mAP), and processing time are employed as evaluation metrics to investigate the performance of each network.
Precision evaluates the percentage of predicted defects that are correctly classified, and is calculated by
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
where True Positive (TP) is the number of samples that the model correctly predicts as the positive class, and False Positive (FP) is the number of samples that the model incorrectly predicts as the positive class.
Recall evaluates the percentage of actual positives that are identified correctly, and is calculated by
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
where False Negative (FN) is the number of samples that the model incorrectly predicts as the negative class.
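The precision and recall definitions above translate directly into code. This is a minimal sketch; the function name `precision_recall` and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
def precision_recall(tp, fp, fn):
    """Compute (precision, recall) from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical counts: 8 correct detections, 2 false alarms, 4 missed defects.
p, r = precision_recall(tp=8, fp=2, fn=4)
print(p, r)
```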
AP is an overall measure of recall and precision: it is the mean of the precision values calculated after each relevant sample is retrieved. For comprehensiveness and simplicity, AP is used to evaluate the detection performance of a model on a given class.
mAP, the average of the AP values over all classes, is adopted to evaluate the overall detection performance of a model across all classes [25].
Processing time, namely the time the network requires to process a single image, is adopted to evaluate the real-time performance of the model in defect detection.
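The AP and mAP definitions above can be sketched as follows, using the non-interpolated form in which precision is averaged over the ranks at which a true positive appears. Benchmarks often use interpolated variants, so this is an illustration of the definition rather than the exact evaluation code used in the experiments; the function names and the example data are hypothetical.

```python
def average_precision(scores, is_tp, num_gt):
    """Non-interpolated AP for one class.

    scores: confidence of each detection
    is_tp:  whether each detection matches a ground-truth defect
    num_gt: number of ground-truth defects of this class
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp_so_far = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if is_tp[i]:
            tp_so_far += 1
            precision_sum += tp_so_far / rank  # precision at this recall point
    return precision_sum / num_gt if num_gt > 0 else 0.0


def mean_average_precision(ap_per_class):
    """mAP is the plain average of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, True], num_gt=4)
print(ap)
```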
3.3. Evaluation Experiments
In this paper, to demonstrate the performance improvement of the proposed methods, we first carry out evaluation experiments on a widely used database, DeepPCB, which contains 1500 image pairs with annotations including the positions of six common types of PCB defects [26]. Similarly, to validate the effectiveness and efficiency of the proposed methods in steel surface defect detection, especially roll mark detection, we conduct further evaluation experiments on NEU and CSU_STEEL. Each database is divided into a training set and a test set by the hold-out method. The proposed methods are compared with state-of-the-art surface defect detection methods, including Faster R-CNN, SSD, YOLOv3, and YOLOv4. The hyper-parameters of each model are adjusted and optimized to obtain peak performance on each dataset.
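A hold-out split of a dataset into training and test sets can be sketched as below. The 80/20 ratio, the fixed random seed, and the file-name pattern are assumptions for illustration; the text does not state the split ratio that was used.

```python
import random


def holdout_split(items, test_fraction=0.2, seed=0):
    """Shuffle once and hold out a fixed fraction of the items as the test set."""
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names for the 968 CSU_STEEL images.
images = ["img_%04d.png" % i for i in range(968)]
train, test = holdout_split(images)
print(len(train), len(test))
```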
The evaluation experiments consist of three parts. First, to verify the influence of CIoU on bounding box regression, the loss comparison experiment compares the loss curves of FPN (with IoU loss), CFPN (with CIoU loss), and SCFPN (with CIoU loss) during training. Next, a quantitative evaluation of SCFPN against other methods is carried out to demonstrate the performance of the proposed SCFPN. The last part is a qualitative evaluation of SCFPN against other methods, which verifies the validity and generalization ability of SCFPN for roll mark detection on hot-rolled steel strip surfaces.
5. Discussion
Based on the analysis of Figure 6 and Figure 7, we can conclude that CIoU gives the network faster convergence and higher accuracy. In theory, the overlap area term and the center distance penalty term provide gradients that point the predicted boxes toward the ground-truth boxes, even if the boxes do not intersect. The α weighting function in the CIoU loss provides gradients with respect to shape, prompting the predicted boxes to fit the ground-truth boxes in shape more quickly. With these adjustments, the vanishing gradient problem caused by the IoU loss function is effectively solved. Moreover, CIoU offers better convergence speed and accuracy for bounding box regression.
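To make the three terms discussed above concrete, here is a minimal sketch of the CIoU loss for a single pair of boxes, following the commonly published definition: an IoU term, a center-distance penalty normalized by the diagonal of the smallest enclosing box, and an aspect-ratio term weighted by α. The box format `(x1, y1, x2, y2)` and the function name `ciou_loss` are assumptions for illustration; this is not the authors' TensorFlow code.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Overlap term: intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Center-distance penalty, normalized by the enclosing-box diagonal.
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency v and its trade-off weight alpha.
    v = (4.0 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1.0 - iou) + v + eps)

    return 1.0 - iou + rho2 / c2 + alpha * v


# Two disjoint predictions: the farther one receives the larger loss.
near = ciou_loss((0, 0, 1, 1), (2, 2, 3, 3))
far = ciou_loss((0, 0, 1, 1), (5, 5, 6, 6))
print(near, far)
```

For two disjoint boxes the plain IoU loss stays constant at 1, whereas the value returned here still grows with the center distance, which is why the regression keeps receiving a useful gradient.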
The comparison of the experimental results in Table 2 and Table 3 indicates that the CIoU loss function improves the accuracy of the network. Theoretically, the α weight function in CIoU prompts the predicted boxes to fit the ground-truth boxes accurately and quickly. Label smoothing further increases the detection accuracy of the network. The underlying reason is that label smoothing restrains overfitting and effectively improves the generalization ability of the model. It is worth mentioning that the experimental settings and computing devices of the cited methods differ. Therefore, their results are quoted for reference rather than for strict comparison.
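Label smoothing, as used in the classification loss, can be sketched as follows: the one-hot target is mixed with a uniform distribution over the classes before the cross-entropy is computed. The smoothing factor ε = 0.1 is a common default and an assumption here, since the text does not state the value used, and the helper names are hypothetical.

```python
import math


def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Target distribution (1 - eps) * one_hot + eps / K."""
    off_value = epsilon / num_classes
    return [(1.0 - epsilon) + off_value if k == true_class else off_value
            for k in range(num_classes)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smoothed_cross_entropy(logits, true_class, epsilon=0.1):
    """Cross-entropy between the smoothed targets and the softmax probabilities."""
    probs = softmax(logits)
    targets = smooth_labels(true_class, len(logits), epsilon)
    # Clamp probabilities to avoid log(0) for extreme logits.
    return -sum(t * math.log(max(p, 1e-12)) for t, p in zip(targets, probs))


# Six classes, as in CSU_STEEL; an over-confident prediction is penalized.
logits = [10.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(smoothed_cross_entropy(logits, 0, epsilon=0.0))
print(smoothed_cross_entropy(logits, 0, epsilon=0.1))
```

With smoothing, an extremely confident prediction no longer drives the loss toward zero, which discourages the network from growing its logits without bound and is the usual explanation for the reduced overfitting.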
In Table 4, the proposed method achieves a higher AP, whereas the performance of the other methods is not very satisfactory. We can conclude that the proposed method is equipped to handle a tiny-target inspection task on high-resolution images captured in a harsh environment. The underlying reason is that the feature pyramid fuses multi-level feature maps and spreads semantic information from high layers to low layers through upsampling and lateral connections to enrich the fused feature maps. Therefore, roll marks with large scale variation and most tiny targets can be detected stably. In contrast to methods based on a multi-level feature pyramid, the other networks lack modules that take full advantage of the features: their extracted features contain only information from the respective scale and are not used in combination. It should be noted that YOLOv3 has feature pyramid modules, but the resolution of its finest fused feature map is 52 × 52. As a result, YOLOv3 performs poorly in tiny defect detection. In addition, the image resolutions of the DeepPCB and NEU datasets are 640 × 640 and 200 × 200, respectively. The scale variance between defects and the whole image is relatively small, and the intra-class distance of defects is also relatively small. Therefore, even methods without feature aggregation modules can handle such detection tasks with satisfactory scores. Nonetheless, the resolution of the CSU_STEEL dataset is 1024 × 4096, so the scale variance between defects and images is larger. Besides, roll marks are characterized by low contrast and large intra-class distance, which creates enormous difficulty for conventional object detection methods. It deserves special attention that SCFPN is a two-stage network, which implements feature extraction and region classification and regression in two networks. YOLOv4 is a one-stage network, which integrates feature extraction and object classification and regression in a single network. Nonetheless, YOLOv4 generates a massive number of anchors, which results in an imbalance between positive and negative samples. As a result, compared with our SCFPN, YOLOv4 consumes less processing time at the expense of detection accuracy. However, both methods satisfy the requirements of real-time roll mark detection in actual steel production.
A comprehensive comparison of these methods in Figure 8 and Figure 9 shows that SCFPN has excellent recognition ability for different roll marks. In particular, SCFPN is more robust for low-contrast and tiny roll marks submerged in noise. More significantly, SCFPN localizes defects more precisely. The comparative experiment further verifies that the proposed SCFPN has more robust feature extraction and generalization ability, which makes it well suited to the industrial application of roll mark detection in steel mills.