Surface Defect Detection Methods for Industrial Products: A Review

Abstract: The comprehensive intelligent development of the manufacturing industry puts forward new requirements for the quality inspection of industrial products. This paper summarizes the current research status of machine learning methods in surface defect detection, a key part of the quality inspection of industrial products. First, according to the surface features used, the application of traditional machine vision methods to industrial product surface defect detection is summarized from three aspects: texture features, color features, and shape features. Second, the research status of industrial product surface defect detection based on deep learning in recent years is discussed from three aspects: supervised methods, unsupervised methods, and weakly supervised methods. Then, the common key problems in industrial surface defect detection and their solutions are systematically summarized; the key problems include the real-time problem, the small sample problem, the small target problem, and the unbalanced sample problem. Lastly, the commonly used datasets of industrial surface defects in recent years are comprehensively summarized, and the latest research methods on the MVTec AD dataset are compared, so as to provide a reference for the further research and development of industrial surface defect detection technology.


Introduction
In the industrial production process, the quality of manufactured products is easily affected by the deficiencies and limitations of existing technology, working conditions, and other factors. Surface defects are the most intuitive manifestation of degraded product quality. Therefore, in order to ensure the qualification ratio and reliable quality, product surface defect detection [1,2] is necessary. A "defect" can generally be understood as an area that is missing or imperfect compared with the normal sample. The comparison between the normal sample and the defective sample of industrial products is shown in Figure 1. Surface defect detection refers to the detection of scratches, defects, foreign body shielding, color contamination, holes, and other defects on the surface of the sample to be tested, so as to obtain relevant information such as the category, contour, location, and size of the surface defects [3]. Manual defect detection was once the mainstream method, but it is inefficient, its results are easily affected by human subjectivity, and it cannot meet the requirements of real-time detection. It has gradually been replaced by other methods.
At present, some scholars have launched relevant research on surface defect detection, involving the latest methods, applications, key issues, and many other aspects [4].
Literature [5] summarizes the current research status of defect detection techniques such as magnetic particle inspection, penetrant inspection, eddy current inspection, ultrasonic inspection, machine vision, and deep learning [6,7]; compares the advantages and disadvantages of these methods; and reviews defect detection technology for electronic components, piping, welded parts, and machinery parts, together with typical applications in quality control. Literature [10] analyzes surface defect detection methods based on deep learning, grouped into supervised learning methods, unsupervised learning methods [8], and other methods [9] (semi-supervised and weakly supervised learning methods), and then discusses three key issues in surface defect detection: real-time performance, small samples, and comparison with traditional image processing-based defect detection methods. After reviewing automatic optical (visual) inspection (AOI) technology, literature [11] systematically describes the steps and related methods used in the technology for surface defect detection. Literature [12] first lists the different objects in the field of defects and then introduces and compares the mainstream technologies and deep learning methods used for defect detection. After that, the applications of ultrasonic detection and deep learning methods in defect detection are analyzed. Finally, the existing applications are investigated and, based on defect detection equipment, several challenges for defect detection are proposed, such as three-dimensional target detection, high precision, high positioning accuracy, rapid detection, and small targets.
Through investigation, it can be found that in the field of surface defect detection of industrial products, there are currently few literature reviews on machine learning methods, and although some reviews summarize the problems and challenges in surface defect detection for industrial products, their solutions and directions are not systematic enough. In addition, in terms of datasets, there is still no comprehensive collation of industrial product surface defect detection datasets. Therefore, in order to address these gaps, this paper first summarizes the research status of industrial product surface defect detection from the perspectives of traditional machine vision methods and deep learning methods. After that, the key problems in industrial surface defect detection (the real-time problem, the small sample problem, the small target problem, and the unbalanced sample problem) are discussed, and some solutions for each problem are given. Finally, industrial surface defect detection datasets are comprehensively summarized, and several new methods using the MVTec AD dataset are compared.
Figure 1. Comparison between normal and defective samples of industrial products [2][3][4][5][6]. Defect categories are shown below the image. Samples taken from the MVTec AD dataset [13]: the first row: leather; the second row: tile; the third row: cable. (MVTec AD is a dataset proposed by MVTec Software GmbH in 2019 for benchmarking anomaly detection methods; it focuses on industrial inspection, and AD is the abbreviation of Anomaly Detection.)
The publication time of the references in this review is mainly concentrated after 2016 because these documents can represent the development of the latest technology. By referring to the related reviews, the order of organization of this paper is determined as traditional methods, latest methods, key problems, and tools (datasets) which is also the scope of this paper. The main contents of this paper are as follows: Section 2, the summary of industrial product surface defect detection methods based on traditional feature-based machine vision algorithm; Section 3, the summary of industrial product surface defect detection methods based on deep learning; Section 4, key problems and their solutions analysis and discussion; Section 5, the collation and summary of the industrial product surface defect detection dataset and the comparison of the latest methods of the MVTec AD dataset.

Traditional Feature-Based Machine Vision Algorithm for Surface Defect Detection
Traditional surface defect detection methods played a major role for a long period of time. This section classifies traditional machine vision-based surface defect detection methods for industrial products at the feature extraction level. According to the features used, they are mainly divided into three categories: texture feature-based methods, color feature-based methods, and shape feature-based methods. The detailed arrangement of the following subsections is shown in Figure 2.

Texture Feature-Based Method
The texture feature reflects the homogeneity phenomenon in the image and can reflect the organization structure and arrangement properties of the image surface through the gray distribution of the pixels and their nearby spatial neighborhoods. Methods based on texture feature can be further divided into four categories: statistical method, signal processing method, structural method, and model method [14,15].
For the statistical method, the main idea is to treat the gray value distribution on the surface of an object as a random distribution, analyze the distribution of random variables from the perspective of statistics, and describe the spatial distribution of gray values through features such as the histogram, gray level co-occurrence matrix, local binary pattern, autocorrelation function, and mathematical morphology.
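As an illustration of the statistical approach, the following minimal sketch computes a gray level co-occurrence matrix (GLCM) and two common statistics derived from it. It assumes an image already quantized to integer gray levels in the range 0 to levels - 1 and a non-negative pixel offset; the function names are chosen here for illustration and do not come from any of the cited works.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for the pixel offset (dx, dy), dx, dy >= 0."""
    img = np.asarray(image, dtype=np.int64)
    h, w = img.shape
    ref = img[:h - dy, :w - dx]   # reference pixels
    nbr = img[dy:, dx:]           # neighbours displaced by (dy, dx)
    counts = np.zeros((levels, levels), dtype=np.float64)
    # Accumulate one count per (reference level, neighbour level) pair.
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    return counts / counts.sum()

def glcm_contrast(p):
    """Contrast: large when neighbouring pixels differ strongly in gray level."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

def glcm_energy(p):
    """Energy (angular second moment): large for very regular textures."""
    return float(np.sum(p ** 2))

flat = np.zeros((4, 4), dtype=int)            # perfectly uniform surface
checker = np.indices((4, 4)).sum(axis=0) % 2  # alternating texture
print(glcm_contrast(glcm(flat, 2)), glcm_contrast(glcm(checker, 2)))
```

A uniform patch yields zero contrast, whereas the alternating pattern yields a high value; in a defect detection setting, such statistics would typically be computed per image block and passed to a classifier or compared against a threshold.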
For the signal processing method, the main idea is to treat the image as a two-dimensional signal and analyze the image from the perspective of signal filter design, so it is also called the frequency spectrum method; the signal processing method includes Fourier transform method, Gabor filter method, wavelet transform method, and other specific methods.
For the structural method, its theoretical basis is the texture primitive theory. The texture primitive theory states that the texture is composed of some minimal patterns (called texture primitives) that appear repeatedly in space according to a certain rule.
For the model method, the models commonly used for detecting surface defects of industrial products include the Markov random field (MRF) model and the fractal model.
For the method based on texture features, this paper summarizes some recent application examples in industrial product surface defect detection, which are arranged according to the classification order in Figure 2, and the details are shown in Table 1.

Color Feature-Based Method
Color features require little computation and depend only weakly on the size, orientation, viewing angle, and other properties of the image itself, so they are highly robust. They are among the visual features most widely used in image retrieval.

Color Histograms
The color histogram describes the proportion of different colors in the entire image, which is the result of global statistics; it does not pay attention to the spatial position of the color and cannot describe the objects in the image.
Features: Insensitive to physical transformations (rotation, scaling, etc.); if the image has multiple areas with obvious differences in the color distribution between the foreground and the background, the color histogram will show double peaks.
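The properties above can be checked with a small sketch. It assumes 8-bit RGB input and uses histogram intersection as the similarity measure; both the quantization into 4 bins per channel and the choice of similarity measure are illustrative assumptions rather than the settings of any cited method.

```python
import numpy as np

def color_histogram(rgb, bins=4):
    """Normalized joint RGB histogram with bins**3 cells (8-bit input assumed)."""
    pixels = np.asarray(rgb, dtype=np.uint8).reshape(-1, 3).astype(np.int64)
    q = (pixels * bins) // 256                       # quantize each channel
    idx = (q[:, 0] * bins + q[:, 1]) * bins + q[:, 2]
    hist = np.bincount(idx, minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical color proportions."""
    return float(np.minimum(h1, h2).sum())

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8, 3)).astype(np.uint8)
rotated = np.rot90(image)                            # same pixels, new positions
print(histogram_intersection(color_histogram(image), color_histogram(rotated)))
```

Because the histogram only counts colors, the rotated image produces exactly the same descriptor, which illustrates both the robustness to rotation and the loss of spatial information noted above.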
Literature [27] proposed a similarity evaluation method based on color histogram for Electrical Resistance Tomography (ERT) image evaluation. For the defect detection of wood surface, literature [28] proposed a classification method based on the percentage color histogram feature and feature vector texture feature of image block; this method has been proved effective by experiments, especially for the defect of junction type. Literature [29] designed a 2-step technological process of particle board defect detection by using SVM and color histogram features to complete the detection of defects and using smoothing and threshold technology to complete defect localization.

Color Moments
The main idea of color moments is that any color distribution in the image can be represented by its moments of each order. Since the information of the color distribution is mainly concentrated in the low-order moments, the first-order moment (mean), the second-order moment (variance), and the third-order moment (skewness) of each color channel are usually sufficient to describe the color distribution of the image surface.
Features: no consideration of pixel spatial position; no need to vectorize the color features; no need for color quantization, smoothing, and other subsequent processing.
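A minimal sketch of a color moment descriptor is given below. It follows the commonly used formulation in which the second moment is reported as a standard deviation and the third as the cube root of the third central moment, so that all nine values share the units of the pixel intensities; this normalization is an assumption made here, not a detail stated in the cited works.

```python
import numpy as np

def color_moments(rgb):
    """Return a 9-D vector: per-channel mean, standard deviation, and skewness term."""
    pixels = np.asarray(rgb, dtype=np.float64).reshape(-1, 3)
    mean = pixels.mean(axis=0)                        # first-order moment
    centered = pixels - mean
    std = np.sqrt((centered ** 2).mean(axis=0))       # root of second central moment
    skew = np.cbrt((centered ** 3).mean(axis=0))      # cube root of third central moment
    return np.concatenate([mean, std, skew])

image = np.zeros((2, 2, 3))
image[1, 1, 0] = 4.0                                  # one bright pixel in the red channel
print(color_moments(image))
```

No color quantization or histogram construction is required, which is why the descriptor is so compact; two images can then be compared by a weighted distance between their moment vectors.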
Literature [30] proposed a method of weighted fusion of color moment features and FSIFT features according to the magnitude of influence, which solved the problem that a single feature does not express the content of tile surface defects obviously. Literature [31] used cosine similarity to test the periodic law of the magneto-optical images, which proved the correctness of the law summarized by the color moment feature, so as to select the appropriate magneto-optical image for welding defect detection and location.

Color Coherence Vector
The color coherence vector is an improvement on the color histogram. Its main idea is to divide the pixels of each color bin in the histogram into two parts, coherent (aggregated) and incoherent (non-aggregated); when comparing the similarity of images, the two parts are compared separately, and the two comparisons are then combined into a single similarity value.
Literature [32] fused the LBP feature and the color coherence vector feature by weighting and combined them with an RBF-based SVM, proposing an image classification method that improves classification accuracy and calculation speed. Literature [33] stored the extracted color coherence vector and texture feature in the form of a feature vector for subsequent network training.

Other Color Features
In addition to color histograms, color moments, and color coherence vectors, color features commonly used for surface defect detection of industrial products include color sets and color correlograms. The color set is also a global color feature and matching method; it is an approximation of the color histogram, expressed as a binary feature vector, and retrieval can be accelerated by constructing a binary search tree. The color correlogram describes the proportion (probability) of pixels of a certain color in the whole image and also reflects the spatial correlation between different color pairs; it usually places high demands on hardware.

Shape Feature-Based Method
The method which is based on the shape effectively uses the target of interest in the image for retrieval. Among them, the contour-based method is the main type of method. The contour-based method obtains the shape parameters of the image by describing the outer boundary features of the object; the representative methods are Hough transform and Fourier shape descriptor.
Hough transform uses the global features of the image to connect the edge pixels to form the closed boundary of the region, and its theoretical basis is the duality of point to line. Literature [34] proposes a method for detecting bottle surface defects; in the ROI extraction stage, fast Hough transform was used to detect the boundary line of light source. By using Gabor filter and Hough transform, literature [35] realized the detection of linear defects (such as indentation and bump) on the middle surface of E-TPU. Literature [36] realized surface defect detection of small camera lens based on circle Hough transform, polar coordinate transformation, weighted Sobel filter, and SVM.
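The point-to-line duality can be made concrete with a small accumulator-based sketch. Each edge pixel (x, y) votes for all lines rho = x cos(theta) + y sin(theta) passing through it, and collinear pixels pile their votes into the same accumulator cell. The one-degree angular resolution and the function names are illustrative assumptions; production code would normally rely on an optimized library routine.

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Vote in (rho, theta) space for every edge pixel of a binary image."""
    edges = np.asarray(edges, dtype=bool)
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))               # largest possible |rho|
    theta_deg = np.arange(n_theta) * 180.0 / n_theta
    thetas = np.deg2rad(theta_deg)
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)
    cols = np.arange(n_theta)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        # One sinusoid in parameter space per edge pixel.
        rho = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rho + diag, cols] += 1
    return acc, theta_deg, diag

def strongest_line(edges):
    """Return (rho, theta in degrees) of the accumulator cell with most votes."""
    acc, theta_deg, diag = hough_lines(edges)
    r, t = np.unravel_index(np.argmax(acc), acc.shape)
    return int(r) - diag, float(theta_deg[t])

edges = np.zeros((40, 100), dtype=bool)
edges[5, :] = True                                    # a horizontal scratch at y = 5
print(strongest_line(edges))
```

For a linear defect such as a scratch, the peak of the accumulator gives its position and orientation directly, even if some pixels along the line are missing.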
The Fourier shape descriptor uses the Fourier transform of the object boundary as the shape description and uses the closure and periodicity of the region boundary to transform a two-dimensional problem into a one-dimensional problem. Literature [37] proposed a method for detecting and locating small defects in aperiodic images, which is based on global Fourier image reconstruction and template matching. Literature [38] proposed a detection method for cutting defects on the surface of magnets; the method uses Fourier transform and Hough transform to reconstruct the image of the magnet surface, and obtains defect information by comparing the gray difference between the reconstructed image and the original image.
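The reduction from a two-dimensional boundary to a one-dimensional signal can be sketched as follows. Boundary points are encoded as complex numbers x + iy and transformed with the FFT; dropping the DC term, taking magnitudes, and dividing by the first harmonic are the usual normalization steps. The sketch assumes an ordered, closed boundary sampled counterclockwise (so that the first harmonic is non-zero); the function name and the number of coefficients are illustrative.

```python
import numpy as np

def fourier_descriptor(boundary_xy, n_coeffs=8):
    """Shape descriptor invariant to translation, rotation, scale, and start point."""
    pts = np.asarray(boundary_xy, dtype=np.float64)
    z = pts[:, 0] + 1j * pts[:, 1]           # 2-D boundary -> 1-D complex signal
    coeffs = np.fft.fft(z)
    mags = np.abs(coeffs[1:n_coeffs + 1])    # skip DC (translation), drop phase (rotation)
    return mags / mags[0]                    # divide by first harmonic (scale)

t = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
r = 1.0 + 0.3 * np.cos(3.0 * t)              # a three-lobed closed contour
shape = np.stack([r * np.cos(t), r * np.sin(t)], axis=1)

angle = 0.7
rot = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle),  np.cos(angle)]])
moved = 2.5 * shape @ rot.T + np.array([10.0, -4.0])

print(np.allclose(fourier_descriptor(shape), fourier_descriptor(moved)))
```

The descriptor of the rotated, scaled, and shifted contour matches that of the original, while a contour with a different outline (for example a plain circle) produces different coefficients.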
In addition to the above three types of features, some other features, such as spatial relationship features, can also be used for the surface defect detection of industrial products. Because the surface of industrial products mostly contains a variety of information, it is usually not enough to use only a single feature or a single class of features. Therefore, in practical applications, multiple features and multiple classes of features are often used in combination.

Surface Defect Detection Method of Industrial Products Based on Deep Learning
The rapid development of deep learning has made it more and more widely used in the field of defect detection. Following the common classification of deep learning into supervised, unsupervised, and weakly supervised methods, this section briefly introduces the current status of research on surface defect inspection of industrial products. The specific defect detection methods are shown in Figure 3.

Supervised Method
The supervised method requires both a training set and a test set, and the samples in the training set must be labeled [39]; the training set is used to learn the inherent laws of the samples, which are then applied to the test set. Supervised methods can be divided into those based on metric learning and those based on representation learning. Among supervised surface defect detection methods, a common model based on metric learning is the Siamese network; according to the three stages of defect detection, methods based on representation learning can be roughly divided into three categories: classification networks, detection networks, and segmentation networks. A commonly used classification network is ShuffleNet; a commonly used detection network is Faster R-CNN; commonly used segmentation networks include FCN and Mask R-CNN. This section takes the network models mentioned above as examples and briefly introduces their research status in surface defect detection tasks. In the task of defect detection, the classification network focuses on the "what" problem, that is, determining the type of the image (whether the image contains defects and what type they are); the detection network focuses on the "where is the defect" problem, that is, obtaining the specific location and category of the defects; the segmentation network focuses on the "how many defects" problem, that is, segmenting the defect area from the background and obtaining the location, category, attributes, and other information of the defect.

Siamese Network
A Siamese network can be used to judge the similarity between two samples; the core idea of its loss function is to make the distance between inputs of the same category as small as possible and the distance between inputs of different categories as large as possible [40].
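One widely used loss with this property is the contrastive loss, sketched below on precomputed embedding pairs. The embeddings are assumed to come from the two weight-sharing branches of the network; the margin value and the omission of the optional factor 1/2 are choices made here for illustration.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Mean contrastive loss over a batch of embedding pairs.

    same[i] is 1 when pair i belongs to the same category and 0 otherwise.
    """
    emb_a = np.asarray(emb_a, dtype=np.float64)
    emb_b = np.asarray(emb_b, dtype=np.float64)
    same = np.asarray(same, dtype=np.float64)
    dist = np.linalg.norm(emb_a - emb_b, axis=1)
    pull = same * dist ** 2                                     # same class: shrink distance
    push = (1.0 - same) * np.maximum(margin - dist, 0.0) ** 2   # different: enforce margin
    return float(np.mean(pull + push))

a = np.array([[0.0, 0.0], [0.0, 0.0]])
b = np.array([[0.0, 0.0], [3.0, 4.0]])
print(contrastive_loss(a, b, same=[1, 0]))   # well-separated pairs incur no loss
```

Pairs of the same category are penalized by their squared distance, whereas pairs of different categories are penalized only while they are closer than the margin, so the network is not pushed to separate them indefinitely.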
Literature [41] proposed a two-stage multi-scale feature similarity measurement model. After using the Siamese network as the backbone architecture to complete the feature extraction of pairwise images, the spatial pyramid pooling network was incorporated into the feature maps of each convolutional module to fuse the multi-scale feature vectors, and then the discriminative feature embedding and similarity metric were obtained by using the contrastive loss during the training process. The test was performed on a PCB dataset with 6 types of defects including short circuit, open circuit, mouse bite, burr, leak, and copper, and the area under the ROC curve of all types was above 0.92. Literature [42] proposed a two-layer neural network for cross-category defect detection (SSIM layer: generating the function of analog SSIM components; SNN layer: composed of a Siamese network connected to the SSIM layer) without re-training. This method learns the differential features from image pairs containing some structural similarities, and it is assumed that different classification objects can share some structural similarities caused by the differences of these learned image pairs. Experiments on an actual factory dataset show that this method has the ability of cross-class defect detection.

ShuffleNet
ShuffleNet, a lightweight network with high computational efficiency, uses two new methods of pointwise group convolution and channel shuffle to ensure the computational accuracy and reduce the computational cost effectively.
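The channel shuffle operation itself is a simple reshape and transpose, as the following sketch shows for a tensor in (batch, channels, height, width) layout. The layout and the function name are assumptions made for illustration; deep learning frameworks implement the same idea on their own tensor types.

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels so that each group sees channels from every other group."""
    n, c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    x = x.reshape(n, groups, c // groups, h, w)   # split channels into groups
    x = x.transpose(0, 2, 1, 3, 4)                # swap group and within-group axes
    return x.reshape(n, c, h, w)                  # flatten back to a channel axis

x = np.arange(6).reshape(1, 6, 1, 1)              # channel i holds the value i
print(channel_shuffle(x, groups=2).ravel())
```

With six channels in two groups, the channels (0, 1, 2 | 3, 4, 5) are reordered so that the next group convolution receives inputs from both original groups, which is what allows information to flow across groups at negligible cost.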
Based on the ShuffleNet V2 framework, literature [43] proposed a novel in-line inspection solution for codes printed on plastic containers; the proposed algorithm can deal with images with complex backgrounds and can be integrated into a practical industrial inspection system. Literature [44] proposed ShuffleDefectNet, a defect detection system based on deep learning, which achieved an average accuracy of 99.75% on the NEU dataset.

Faster R-CNN
Faster R-CNN introduces a region proposal network (RPN) on the basis of Fast R-CNN, moves the region proposal step into the neural network, and realizes an almost cost-free region proposal algorithm in an end-to-end learning mode, which greatly improves the speed of target detection. The RPN generates its proposals by sliding a small network over the shared convolutional feature map.
Literature [45] proposed a cascaded structure based on Faster R-CNN, which converted the defect detection problem of power line insulators into a two-level target detection problem: the first stage was used to locate the insulator area, and the second stage was used to locate the defects within that area. Based on Faster R-CNN, literature [46] proposed a new network for PCB surface defect detection, which used ResNet50 with a feature pyramid as the backbone and used residual units of GRAPN and the residual units of ShuffleNetV2 at the same time.

Fully Convolutional Networks
The FCN is an end-to-end image segmentation method in which all layers of the network are convolutional layers. The network mainly relies on three techniques: convolution, upsampling, and skip connections; the label map is obtained directly by having the network make pixel-level predictions. One of its core ideas is the deconvolution layer, which increases the spatial size of the feature maps so that a prediction at the resolution of the input can be output.
Literature [47] proposed an algorithm that uses a deep neural network to combine an autoencoder and an FCN to distinguish keyboard light leakage defects from dust. The proposed method was tested on a test set composed of 1632 images, and the false positive rate of light leakage defects was reduced from 6.27% to 2.37%. Literature [48] designed a complete system for automatic identification and diagnosis of insulator strings, which combined different deep learning-based components, including one insulator string segmentation component and two diagnostic components for missing and damaged insulator disc units, respectively. Literature [49] proposed a method for defect segmentation of solar cell electroluminescence (EL) images; this method uses an FCN with the U-Net architecture, which can obtain the defect segmentation map in one step, and it obtains results similar to those of repeatedly executing a sliding-window CNN. Literature [50] combined FCN with Faster R-CNN to design an FCN-based deep learning model for defect detection in tunnels; this model can accurately and quickly detect defects such as stains, leaks, and blockages of pipelines.

Mask R-CNN
Mask R-CNN is an extension of Faster R-CNN that integrates object detection and instance segmentation in a two-stage framework: the first stage scans the image and generates proposals (regions that may contain a target), and the second stage classifies the proposals and generates bounding boxes and masks.
Literature [51] proposed an improved model of Mask RCNN-IPCNN. The model first uses a deep residual neural network to extract features from the images of an image pyramid; the extracted features are passed through the feature pyramid to produce pyramid features, which are then processed by the RPN to generate and classify defect bounding boxes, and finally an FCN generates a defect mask within each defect bounding box. Literature [52] designed an end-to-end system that can locate solar panel pollution; the system is based on Mask FCNN (Fully Convolutional Mask RCNN), which consists of a classification network ImageNet and a comprehensive network of bottom-up upsampling of feature images; the effect of information loss is eliminated by upsampling.
In the field of surface defect detection of industrial products, supervised methods are currently the mainstream deep learning methods because of their high accuracy and good adaptability, and their scope of application keeps widening. However, their disadvantages have gradually become prominent in practical applications: labeling the dataset in advance is a huge workload, especially in high-precision scenarios, and the continuous improvement of industrial production has led to a continuous reduction in the number of defective samples, which also limits supervised methods.

Unsupervised Method
In response to the disadvantages of supervised methods, some researchers have begun to look at unsupervised methods. When the input training data are only the data information itself, and there is no label information, the machine learns the pattern of these unlabeled data to obtain some inherent characteristics and connections of the data and automatically classifies the data [53]. Then, when new data are encountered, it can judge which model that new data belong to according to the previously learned model (here, the model refers to the model composed of original data). This process belongs to unsupervised learning.
Among the unsupervised learning methods, the most commonly used surface defect detection methods are the reconstruction-based methods and the embedding similarity-based methods. For the former, the neural network is trained only to reconstruct normal training images, so anomalous images are easy to find because they cannot be reconstructed well; the anomaly score is usually represented by the reconstruction error. The most common reconstruction-based methods are the Autoencoder (AE) and the Generative Adversarial Network (GAN). For the latter, deep neural networks are used to extract meaningful vectors describing the entire image, and the anomaly score is usually represented by the distance between the embedded vector of the test image and the reference vectors representing normality from the training dataset. Typical algorithms include SPADE [54], PaDIM [55], and PatchCore [56]. In addition to these two types, the Deep Belief Network (DBN) and the Self-Organizing Map (SOM) can also be used for surface defect detection.
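The embedding similarity-based scoring described above can be sketched in a few lines. The sketch assumes that embeddings have already been extracted by some pretrained network and scores a test sample by its mean distance to the k nearest normal embeddings; it illustrates the general principle only and is not an implementation of SPADE, PaDIM, or PatchCore.

```python
import numpy as np

def embedding_anomaly_score(test_emb, normal_embs, k=1):
    """Mean Euclidean distance from test_emb to its k nearest normal embeddings."""
    normal_embs = np.asarray(normal_embs, dtype=np.float64)
    test_emb = np.asarray(test_emb, dtype=np.float64)
    dists = np.linalg.norm(normal_embs - test_emb, axis=1)
    return float(np.sort(dists)[:k].mean())

# Hypothetical 2-D embeddings of defect-free training images.
normal_bank = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

print(embedding_anomaly_score([0.1, 0.0], normal_bank))   # close to normality
print(embedding_anomaly_score([5.0, 5.0], normal_bank))   # far from normality
```

A threshold on this score, chosen on held-out normal data, then separates normal from anomalous samples; no defective samples are needed to build the reference set.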
This section will first take the four network models of Autoencoder (AE), Generative Adversarial Network (GAN), Deep Belief Network (DBN), and Self-Organizing Map (SOM) as examples to briefly introduce their research status in surface defect detection task. The introduction of three typical algorithms of the embedding similarity-based methods will be shown in Section 5.

Autoencoder
The encoder and the decoder are the two core parts of the autoencoder. The encoder corresponds to the hidden layer of the network model and is used to learn low-dimensional features of the input signal; the decoder corresponds to the output layer of the model and is used to reproduce the input signal as faithfully as possible. Therefore, the ultimate goals of the autoencoder are to enable the encoder to learn good low-dimensional features of the input signal and to reconstruct the input signal.
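The encoder and decoder roles, and the use of reconstruction error as an anomaly score, can be illustrated with a deliberately simplified stand-in. The sketch below replaces the neural network with a linear autoencoder obtained in closed form by PCA (via SVD), which is an assumption made to keep the example short and deterministic; the class and method names are hypothetical.

```python
import numpy as np

class LinearAutoencoder:
    """Linear encoder/decoder fitted by PCA; a stand-in for a trained neural AE."""

    def __init__(self, n_latent):
        self.n_latent = n_latent

    def fit(self, normal_data):
        x = np.asarray(normal_data, dtype=np.float64)
        self.mean_ = x.mean(axis=0)
        # Rows of vt are principal directions of the normal data.
        _, _, vt = np.linalg.svd(x - self.mean_, full_matrices=False)
        self.components_ = vt[:self.n_latent]
        return self

    def encode(self, x):
        return (np.asarray(x, dtype=np.float64) - self.mean_) @ self.components_.T

    def decode(self, z):
        return z @ self.components_ + self.mean_

    def reconstruction_error(self, x):
        x = np.asarray(x, dtype=np.float64)
        return np.mean((x - self.decode(self.encode(x))) ** 2, axis=1)

t = np.linspace(-1.0, 1.0, 50)
normal = np.outer(t, [1.0, 2.0, 3.0])          # normal samples lie on one line
model = LinearAutoencoder(n_latent=1).fit(normal)
print(model.reconstruction_error(normal).max())
print(model.reconstruction_error([[1.0, -2.0, 0.5]]))
```

Samples that follow the structure seen during training are reconstructed almost perfectly, while a sample that deviates from it leaves a large residual; neural autoencoders apply the same principle with nonlinear encoders and decoders.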
Literature [57] used the SSIM index from traditional image processing as the reconstruction loss in AE-based image reconstruction; tests on a woven texture dataset and a nanofibrous material dataset both showed a significant improvement over the L2 loss. To address the shortcoming that an AE can also reconstruct abnormal samples well, literature [58] converts anomaly detection into a patch-sequence inpainting [59] problem; meanwhile, because this type of method has difficulty covering larger abnormal regions, a transformer network is proposed to reconstruct only the covered patches, and local and global embedding methods are designed for different situations. Literature [60] designed a fully convolutional AE with multi-scale feature clustering (MS-FCAE), which uses multiple FCAE sub-networks at different scales to reconstruct the texture image background, subtracts the texture background from the input image to obtain residual images, and finally merges them to obtain a defect image; each FCAE sub-network uses a fully convolutional neural network to obtain the original feature image directly from the input image and perform feature clustering. Literature [61] proposed a multi-scale convolutional denoising autoencoder (MSCDAE), which uses a multi-modal strategy to synthesize the results of multiple pyramid levels; tests on LCD panels, tiles, and textiles show that this method achieves high accuracy and robustness. Literature [62] uses a convolutional autoencoder (CAE) to inspect the logo images of mobile phones: it extracts the difference between the template image generated by the CAE and the input image and then processes it with mathematical morphology to detect anomalies. Literature [63] proposed a convolutional autoencoder (CAE) for unsupervised feature learning. Each CAE is trained with conventional online gradient descent without additional regularization terms. Good results are obtained on MNIST and CIFAR10.

Generative Adversarial Network
A generative adversarial network consists of two participants: a generator and a discriminator. The generator is used to capture the distribution of the sample data, and the discriminator is used to estimate the probability that a sample comes from the training data rather than from the generator. The ultimate goal of this model is to learn the inherent laws of real data, estimate the distribution or density of real data, and generate new data according to the learned knowledge; that is, a generative adversarial network manufactures data.
Literature [64] proposed a method for fabric defect detection using DCGAN. First, the GAN discriminator is used to generate a defect distribution likelihood map; then, an encoder is introduced into the standard DCGAN to reconstruct the detected image, which is subtracted from the original image to obtain a residual image highlighting the potential defect area; next, the residual map and the defect distribution likelihood map are combined to obtain an enhanced fusion map, and the accurate position of the defect is finally obtained by threshold segmentation of the fusion map. Literature [65] proposed a GAN-based one-class classification method for strip steel surface defect detection, in which the generator G adopts an encoder-decoder structure, and the features of the hidden space obtained by encoding (the penultimate layer output of the GAN generator) are input into an SVM for defect classification; the model achieved good test results on images provided by Handan Iron and Steel Plant. Literature [66] proposes a detection method based on GAN. In the first stage, a novel area is detected using a generative network and a statistical-based representation learning mechanism. In the second stage, the Fréchet distance is used directly in the latent space to distinguish defects from normal samples. The method achieves an accuracy of 93.75% on a solar panel dataset. Literature [67] designed a GAN-based surface vision detection framework, which uses a multi-scale fusion strategy to fuse the responses of the three convolutional layers of the GAN discriminator and then uses Otsu thresholding to segment the fused feature response map and locate the defect. Experiments on wood and road crack datasets have proved the effectiveness of the framework. In order to detect various defects on the fabric surface, literature [68] proposed a model based on the GAN framework. The model first fuses a variety of textures at specific locations, and then the existing fabric defect dataset is continuously updated through a multi-level GAN; thus, the network model is continuously fine-tuned to achieve a better detection effect.

Deep Belief Networks
A deep belief network (DBN) is composed of multiple restricted Boltzmann machines (RBMs), and the whole network is trained by training the RBMs individually, layer by layer.
Literature [69] proposed a defect detection algorithm for solar cells based on DBN. The algorithm takes reconstructed images and training images as supervision data and establishes a good mapping between training samples and non-defective images by fine-tuning the network with the BP algorithm. Literature [70] proposed a DS-DBN-SVM (Differential Search-Deep Belief Network-Support Vector Machine) model to identify the type of bolt defects. In this model, the DS algorithm is used to optimize the weights and thresholds of the DBN; the DS-DBN model is used to extract features from the bolt data, and the extracted features are used as the input of the SVM to identify the bolt defect type.

Self-Organizing Map
The self-organizing map simulates the different division of labor of neural network cells in different regions of the human brain, and classifies the set of input patterns by searching for the optimal set of reference vectors.
Literature [71] proposed a detection method using SOM to distinguish normal wood from defective wood. Suspected defect areas are detected in the first stage, and each defect area is inspected separately in the second stage. Tests on a pine wood dataset obtained relatively good results. Literature [72] combined Otsu with SOM to realize the detection and localization of TSV defects.
The unsupervised method effectively makes up for the deficiencies of the supervised method, but it still has some problems due to its own characteristics. Since only positive (defect-free) examples are used for training, the unsupervised method has no reference for the correct output on defective samples, so it cannot guarantee a good detection effect for every type of defect, none of which appeared during training. Therefore, the accuracy of the unsupervised method still has considerable room for improvement, and in general, the unsupervised method performs better on texture images.

Weakly Supervised Method
Some scholars have combined the characteristics of the supervised and unsupervised methods, giving rise to the weakly supervised method. Compared with supervised and unsupervised methods, the weakly supervised method can achieve better performance while avoiding high annotation costs. At present, the weakly supervised methods commonly used in the inspection of industrial surface defects include the incomplete supervision method and the inexact supervision method.

Incomplete Supervision Method
Incomplete supervision means that most of the training samples are unlabeled and only a small number are labeled, and the labeled samples alone are not enough to train a good model. Among the incomplete supervision methods, the semi-supervised method is often used in the surface defect detection of industrial products.
Semi-supervised methods can automatically exploit unlabeled sample data without manual intervention to improve the learning effect. Literature [73] designed a deep convolutional neural network based on the residual network structure, stacking two layers of residual building modules together to form a 43-layer convolutional neural network; at the same time, the width of the network was appropriately increased to balance network depth and width and to improve accuracy. The network shows good performance on the DAGM, NEU steel, and copper clad plate datasets. Literature [74] proposed a semi-supervised model based on a convolutional autoencoder (CAE) and a semi-supervised generative adversarial network (SGAN). The stacked CAE is trained with unlabeled data; its encoder network is retained and connected to a SoftMax layer to serve as the GAN discriminator, and the GAN generates fake images of steel surface defects to train the discriminator. Literature [75] designed a steel surface defect detection system consisting of sample generation and semi-supervised learning. In the semi-supervised learning part, two classifiers, CDCGAN and ResNet18, were used. Comparative experiments on the NEU-CLS dataset showed that the method is superior to supervised learning and transfer learning. Literature [76] proposed a PCB solder joint defect detection framework. In the classification task of this framework, active learning based on the "sample-query-suggestion" algorithm and semi-supervised learning based on "self-training" are adopted. The framework has been shown to improve classification performance while greatly reducing the amount of annotation.
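The general idea of self-training can be sketched as follows. This is only an illustration under strong simplifying assumptions: the classifier is a nearest-centroid model on 2-D feature vectors (a stand-in for the deep networks used in the cited works), and the confidence measure (distance margin) and its threshold `min_margin` are hypothetical choices.

```python
import math


def centroids(samples):
    """Mean feature vector per class, computed from (features, label) pairs."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(cents, x):
    """Return (label, margin): the nearest centroid and its distance gap to the runner-up."""
    ranked = sorted((math.dist(x, c), y) for y, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def self_train(labeled, unlabeled, min_margin=1.0, max_rounds=10):
    """Repeatedly add confidently pseudo-labeled samples to the training set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident, rest = [], []
        for x in pool:
            y, margin = predict(cents, x)
            (confident if margin >= min_margin else rest).append((x, y))
        if not confident:
            break  # nothing left that the model is confident about
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return centroids(labeled), pool


# Two labeled samples and five unlabeled ones; (2.0, 2.0) is ambiguous by construction.
labeled = [((0.0, 0.0), "normal"), ((4.0, 4.0), "defect")]
unlabeled = [(0.5, 0.2), (3.8, 4.1), (1.0, 0.5), (3.0, 3.5), (2.0, 2.0)]
model, still_unlabeled = self_train(labeled, unlabeled)
```

The ambiguous sample is never pseudo-labeled; in an active learning setting such samples would be natural candidates for manual annotation.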

Inexact Supervision Method
Inexact supervision focuses on situations where supervision information is given but is imprecise, that is, it contains only coarse-grained labels. For tasks that require pixel-level labels, image-level labels are coarse-grained labels.
Literature [77], based on the original ResNet-50 network, deleted the original fully connected layer and pooling layer and added two 1×1 convolutions at the end of the network to obtain the feature map corresponding to the defect, so that the preliminary detection of cracks on solar panels is completed using image-level labels only. Literature [78] developed a WSL framework composed of a localization network (LNet) and a decision network (DNet) for steel surface defect detection. LNet is trained with image-level labels and outputs a heat map of potential defect locations as the input of DNet, and DNet uses RSAM to weight the regions identified by LNet. The performance of the proposed framework was verified on real industrial datasets.
At present, weakly supervised methods are relatively rare in the field of surface defect detection of industrial products, but because they combine the advantages of supervised and unsupervised learning methods, their application prospects are broad.

Summary
In conclusion, among the three types of deep learning methods, the supervised method is the most widely used because of its good accuracy, but it has obvious disadvantages, notably its dependence on labeled data; the unsupervised method is in line with the trend of industrial development but is limited by its own characteristics; the weakly supervised method is not widely used at present, but it has broad development prospects.

Real-Time Problem
In surface defect detection tasks in real industrial scenes, the real-time problem cannot be ignored. In some special scenarios, such as online analysis and online monitoring, it is extremely important. The goal of dealing with the real-time problem is to reduce the detection time and improve the detection efficiency while keeping the accuracy roughly the same. At present, some scholars have carried out research on this problem. For example, literature [79] designed a new 11-layer CNN model for welding defect detection in robotic welding manufacturing. The proposed method provides guidance for the online detection of metal additive manufacturing (AM), that is, the method can meet certain real-time requirements. Literature [80] proposed a two-stage algorithm combining SSIM and MobileNet to detect surface defects on printed circuit boards, which is at least 12 times faster than Faster RCNN while maintaining high accuracy. At present, model acceleration is one of the important ideas for solving the real-time problem. Model acceleration can be carried out mainly from two aspects, algorithm and hardware, as follows:
(1) Algorithm: at the network level, lightweight networks can be used to accelerate the model. Commonly used lightweight models include MobileNet, ShuffleNet, SqueezeNet, and EfficientNet. In addition, distillation and pruning can also be used to accelerate the network. At the computation level, the convolution operation itself can be optimized to accelerate the model. Typical algorithms include FFT, Winograd, etc.
(2) Hardware: the use of GPU, FPGA, DSP, etc. is currently the main way to accelerate the model through hardware.
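To give a rough sense of why lightweight networks are faster, the following sketch compares the number of weights in a standard convolution with that of a depthwise separable convolution, the building block used by MobileNet. Bias terms are ignored, and the layer sizes are hypothetical values chosen only for illustration.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


# Hypothetical layer: 3 x 3 kernel, 128 input channels, 256 output channels.
standard = conv_params(3, 128, 256)
separable = depthwise_separable_params(3, 128, 256)
ratio = separable / standard  # equals 1 / c_out + 1 / k**2
```

For a fixed output resolution the number of multiply-accumulate operations scales in the same proportion, so this layer needs roughly 11.5% of the computation of its standard counterpart.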

Small Sample Problem
In reality, surface defect detection methods based on deep learning often cannot be directly applied to the surface defect detection of industrial products. One of the main reasons is that the continuous optimization of modern industrial processes has led to fewer and fewer defective samples, that is, the number of defect images is very limited. This problem of learning from a small number of samples is usually called the small sample problem [81], and it can easily lead to over-fitting during training. At present, there are four mainstream solutions to the small sample problem:

Data Augmentation
Common methods of data augmentation include translation, rotation, mirroring, contrast adjustment, and data synthesis. Through data augmentation, a large number of sample images can be obtained.
Literature [82] adds synthetic defects to the surface of the defect-free image to complete the expansion of the decorated plastic parts dataset. Literature [83] fuses hand-made features with unsupervised learning features in a complementary way, generating a more discriminative defect representation.
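The geometric and photometric operations listed above can be sketched in a few lines. The example assumes 8-bit grayscale images stored as nested lists; the function names, the fill value, and the contrast formula (linear scaling around a mid-grey level) are illustrative choices rather than the procedure of any cited work.

```python
def mirror(img):
    """Horizontal flip."""
    return [row[::-1] for row in img]


def rotate90(img):
    """Rotate by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def translate_right(img, dx, fill=0):
    """Shift the image dx pixels to the right (0 <= dx <= width), padding with `fill`."""
    return [[fill] * dx + row[:len(row) - dx] for row in img]


def adjust_contrast(img, factor, mid=128):
    """Scale pixel deviations from `mid` by `factor`, clipping to the 8-bit range."""
    return [[min(255, max(0, round(mid + factor * (p - mid)))) for p in row]
            for row in img]


def augment(img):
    """Return the original image together with a few augmented variants."""
    return [img, mirror(img), rotate90(img),
            translate_right(img, 1), adjust_contrast(img, 1.5)]


sample = [[100, 150],
          [200, 50]]
variants = augment(sample)
```

Each defect image yields five training images here; for detection or segmentation tasks the same geometric transform would also have to be applied to the labels.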

Unsupervised/Semi-Supervised Model
One of the advantages of the unsupervised model is that it only needs to be trained with positive samples and requires no negative samples, which provides a direction for solving the small sample problem. In addition, the semi-supervised model, which requires only a small number of labeled samples, is another alternative for solving the small sample problem. For details, please refer to Sections 3.2 and 3.3.1.

Transfer Learning
Through transfer learning, the knowledge already learned from one task can be applied to different but related tasks, which is especially useful when data for the target task is insufficient. In reality, most data or tasks are related. Therefore, transfer learning is also one of the main ideas for solving the small sample problem. Literature [84,85] combined transfer learning and AlexNet to detect surface defects of solar panels and fabrics. Literature [86-90] applied transfer learning with the VGG network to the surface defect detection of emulsion pump bodies, printed circuit boards, transmission line components, steel plates, and wood. Literature [91] applied transfer learning with DenseNet to the surface defect detection of fabrics.

Optimize Network Structure
The optimization of the network structure is also a direction for solving the small sample problem. Taking GAN as an example, the AnoGAN model was proposed in literature [92] in 2017, which used GAN for image anomaly detection for the first time. With the trained generator G fixed, the model iteratively optimizes a point in the latent space to find the generated image closest to the test image; the underlying GAN is a DCGAN. In 2019, literature [93] improved AnoGAN and proposed the f-AnoGAN model. In this model, an encoder is introduced to map the image to a point in the latent space quickly, and WGAN is then used for anomaly detection. The introduction of the encoder removes the time-consuming iterative optimization of AnoGAN. In addition, the GANomaly model (whose overall structure is encoder-decoder-encoder) was proposed in literature [94] in 2018; abnormal samples are detected by comparing the latent variables obtained by encoding the input with the latent variables obtained by re-encoding the reconstruction. It is worth noting that none of the above models require training with negative samples.
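The latent-comparison idea behind encoder-decoder-encoder models can be sketched as follows. The sketch is deliberately abstract: the latent vectors are given as plain lists (in practice they would be produced by trained encoders, which are not modeled here), the squared Euclidean distance is an assumed choice of metric, and the rule "mean plus k standard deviations of the scores of normal samples" for the decision threshold is a hypothetical convention, not the procedure of the cited papers.

```python
import statistics


def latent_score(z, z_hat):
    """Anomaly score: squared distance between the latent code of the input (z)
    and the latent code of its re-encoded reconstruction (z_hat)."""
    return sum((a - b) ** 2 for a, b in zip(z, z_hat))


def fit_threshold(normal_scores, k=3.0):
    """Threshold estimated from defect-free samples only (assumed rule: mean + k * std)."""
    return statistics.mean(normal_scores) + k * statistics.pstdev(normal_scores)


# Hypothetical latent pairs (z, z_hat) from defect-free validation images.
normal_pairs = [([0.10, 0.20], [0.12, 0.18]),
                ([0.30, 0.10], [0.28, 0.13]),
                ([0.20, 0.25], [0.21, 0.22])]
threshold = fit_threshold([latent_score(z, z_hat) for z, z_hat in normal_pairs])

# A well-reconstructed sample and a poorly reconstructed one.
normal_flag = latent_score([0.20, 0.20], [0.21, 0.19]) > threshold
defect_flag = latent_score([0.15, 0.20], [0.90, -0.50]) > threshold
```

Because the threshold is estimated from defect-free samples alone, no negative samples are needed at any stage, which matches the property noted above.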

Small Target Detection Problem
The small target detection problem is also one of the difficulties in the field of surface defect detection of industrial products. A small target is a target that occupies a small area in the image. There are two definitions of "small". One is small in absolute size: a target smaller than 32 × 32 pixels is usually considered a small target. The other is small in relative size: a target is considered small if its size is below a certain proportion of the original image size, such as 0.1. Literature [95] integrated the features of different layers through rainbow concatenation (pooling and deconvolution), increasing the number of feature maps in different layers while strengthening the relationship between the feature maps of different layers, which solves the small target problem to a certain extent. Literature [96] proposed Cascade R-CNN, a multi-stage target detection framework composed of multiple detectors with different IoU thresholds; throughout the framework, the proposals refined by the previous stage are used as the input for training the next stage. The detection results for small targets were significantly improved by this method.
At present, common techniques for solving the small target detection problem can be summarized as follows:
(1) Feature fusion: fuse deep semantic information into shallow feature maps, so that deep features enrich the semantic information while the suitability of shallow features for detecting small targets is retained;
(2) Data augmentation: increase the variety and number of small-target samples in the training set;
(3) Image pyramid and multi-scale sliding window: define several input sizes for the images, randomly select one of them during training, scale the input image to this size, and send it to the network;
(4) Reduced network downsampling rate: lower the downsampling rate to reduce the loss of object information on the feature map; a common approach is to remove the pooling layer and use dilated (atrous) convolution at the same time;
(5) Reasonable anchor design: the main methods include border clustering, that is, clustering a set of suitable anchors from the labels of the training set; statistical experiments, that is, aligning the center points of the anchors and the labels and matching on width and height information only, to find the group of anchors whose aspect ratio distribution is most consistent with the labels; and setting smaller and denser anchors and matching strategies, such as not setting an overly strict IoU threshold for small objects;
(6) Appropriate training method: pre-train on high-resolution images while magnifying the input image, and then fine-tune on low-resolution images;
(7) GAN-based magnification: use a GAN to magnify the small objects and then detect them;
(8) Context information: establish a connection between the target and its context.
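The border clustering mentioned under anchor design can be sketched as a k-means procedure over the (width, height) pairs of the labeled boxes, with IoU used as the similarity measure. This is a minimal illustration: the deterministic initialization, the use of the mean as the cluster center, and the toy box sizes are assumptions made here for reproducibility and are not prescribed by the text.

```python
def wh_iou(a, b):
    """IoU of two boxes given as (width, height) and aligned at a common center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def cluster_anchors(boxes, k, max_iter=50):
    """k-means on (width, height) pairs, assigning each box to the anchor with the highest IoU."""
    # Deterministic initialization: evenly spaced picks from the boxes sorted by area.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    anchors = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for box in boxes:
            nearest = max(range(k), key=lambda i: wh_iou(box, anchors[i]))
            groups[nearest].append(box)
        updated = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else anchors[i]
            for i, g in enumerate(groups)
        ]
        if updated == anchors:
            break  # assignments no longer change
        anchors = updated
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Hypothetical label sizes in pixels: three small defects and three elongated large ones.
boxes = [(10, 12), (12, 10), (11, 11), (60, 30), (64, 28), (58, 32)]
anchors = cluster_anchors(boxes, k=2)
```

The smaller anchor follows the small defects closely, which makes it easier for them to reach the IoU threshold during anchor matching.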

Unbalanced Sample Identification Problem
The identification of unbalanced samples [97,98] is another difficulty when deep learning methods are used in the surface defect detection of industrial products. In deep learning, model training usually requires the number of samples of each category in the sample set to be balanced. However, this ideal situation rarely occurs in reality. More often, "normal" samples make up the majority of the dataset, while "defect" or "abnormal" samples account for only a small part of the total. This phenomenon is called the "unbalanced sample" phenomenon. The problem of unbalanced sample identification mainly exists in supervised learning tasks. It causes the algorithm to pay more attention to the category with the larger data volume and to underestimate the category with the smaller data volume, thus affecting the generalization and prediction ability of the model on the test data.
At present, the identification of unbalanced samples can be dealt with from four aspects: data level, model level, feature level, and evaluation metric level.

Data Level
The idea behind data-level methods is to change the sample distribution of the training set so that it tends to be balanced, that is, the numbers of samples of all categories tend to be consistent. This can be carried out from five aspects: data source, data augmentation, data resampling [99,100], class equalization sampling, and synthetic samples [101], as shown in Figure 4.
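As a minimal sketch of data resampling, the following example performs random oversampling: samples of the smaller categories are duplicated at random until every category has as many samples as the largest one. The function name, the fixed random seed, and the toy data are illustrative assumptions.

```python
import random
from collections import Counter


def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until all classes have equal size."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for s in items + extra:
            out_samples.append(s)
            out_labels.append(y)
    return out_samples, out_labels


# Hypothetical training set: sample ids 0-9 are normal, 10-11 are defective.
samples = list(range(12))
labels = ["normal"] * 10 + ["defect"] * 2
balanced_samples, balanced_labels = oversample(samples, labels)
class_counts = Counter(balanced_labels)
```

Resampling should be applied to the training set only; the test set keeps its original distribution so that the evaluation remains realistic.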

Model Level
(1) Cost-sensitive learning: The idea of cost-sensitive learning is to increase the loss of misclassified small-category samples in the objective function, so that optimizing the objective function shifts the model's attention toward the small categories. There are two main cost-sensitive methods: reconstructing the training set and introducing cost-sensitive factors.
Reconstruct the training set: Without changing the existing algorithm, weights are assigned to each sample in the training set according to the misclassification cost of the sample, and the original sample set is reconstructed according to the weights.
Introduce cost-sensitive factors: Assign higher costs to small-class samples and lower costs to large-class samples to balance the difference in the number of samples. Cost-sensitive factors include the cost-sensitive matrix and the cost-sensitive vector, and the cost-sensitive method requires the cost-sensitive matrix (or vector) to be specified before processing. In practice, the specific values of the misclassification weights in the cost-sensitive matrix (or vector) can usually be specified based on information such as the ratio between sample counts and the confusion matrix of the classification result.
(2) Ensemble learning: There are two main ways of using ensemble learning for defect detection.
(3) Conversion to an anomaly detection problem: When the sample classes are extremely unbalanced, the defect detection problem can be regarded as an anomaly detection problem, and an anomaly detection algorithm (such as One-Class SVM, SVDD, etc.) can be used to build a one-class classifier that detects the anomaly points (i.e., samples of small categories).
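The cost-sensitive vector described under (1) can be illustrated with a class-weighted cross-entropy loss. In this sketch the weights are set inversely proportional to class frequency, which is one common convention for deriving them from the ratio between sample counts; the function names and the toy predictions are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Cost-sensitive vector: weight of class c is n / (num_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class); probs is a list of {class: probability} dicts."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


# Hypothetical batch: 9 normal samples and 1 defective sample.
labels = ["normal"] * 9 + ["defect"]
weights = inverse_frequency_weights(labels)

# A model that always predicts "normal" with probability 0.9.
probs = [{"normal": 0.9, "defect": 0.1}] * len(labels)
uniform = {"normal": 1.0, "defect": 1.0}
plain_loss = weighted_cross_entropy(probs, labels, uniform)
cost_sensitive_loss = weighted_cross_entropy(probs, labels, weights)
```

With uniform weights the missed defect barely affects the loss, whereas the cost-sensitive version penalizes it heavily, which is the behavior the objective function is meant to encourage.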

Feature Level
From the perspective of feature selection, methods can be further divided into three types according to their relationship with the classifier: (1) independent of the classifier (typical algorithm: Filter); (2) dependent on the classifier (typical algorithm: Wrapper); (3) combined with the classifier (typical algorithm: Embedded).

Evaluation Metric Level
Since the problem of unbalanced samples has the greatest impact on accuracy, this indicator is usually not used alone in practice. Some evaluation indicators are better suited to unbalanced samples, such as recall, F1 measure, Kappa coefficient, and ROC (AUC).
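The following sketch shows why accuracy alone is misleading on unbalanced data. It computes accuracy, precision, recall, F1, and Cohen's kappa from a binary confusion matrix in which "defect" is the positive class; the counts describe a hypothetical classifier that labels every sample as normal.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1 and Cohen's kappa from a binary confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Agreement expected by chance, from the marginal totals of the confusion matrix.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


# Hypothetical test set: 990 normal and 10 defective samples, all predicted "normal".
metrics = binary_metrics(tp=0, fp=0, fn=10, tn=990)
```

The accuracy is 0.99 even though no defect is found, while recall, F1, and kappa are all zero.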

Industrial Product Defect Detection Dataset
Datasets are the basis of research work. A good dataset is more conducive to the discovery and summary of problems and thus facilitates their solution. At present, the field of surface defect detection of industrial products does not have a large, unified dataset. Different datasets are often used for specific research objects and research scenarios. According to the different objects and application scenarios, this section categorizes the datasets commonly used in the industrial field and gives related links. As shown in Table 2, these datasets cover a wide range of industrial applications, including hot-rolled steel strips, steel rails, electronic commutators, solar panels, printed circuit boards, magnetic tiles, fabrics, and more. By summarizing the existing datasets, we hope to provide some data sources for scholars' research in this field. In addition, Figure 5 shows samples from some of these datasets. Since the MVTec AD dataset is a defect detection dataset that simulates real-world industrial inspection scenarios and has strong reference significance, this paper takes it as an example dataset and compares the performance of several typical algorithms on it. The comparison results are shown in Table 3. The MVTec AD dataset contains a total of 15 categories, of which 5 categories are different types of textures and the remaining 10 categories are different types of objects. In this dataset, 3629 images are used for training and validation, and 1725 images are used for testing. The training set contains only non-defective images, while the test set contains both non-defective images and various types of defective images. This dataset is often used for unsupervised defect/anomaly detection. It can be seen from Table 3 that most methods achieve a good average performance on the MVTec AD dataset. 
When the evaluation index is AUC, some methods get an evaluation score of more than 0.99 in the optimal category. In addition, it can be seen from the optimal category that most methods can show better performance in the object category, and some methods show better performance in the texture category. In summary, in view of the characteristics of different categories, different methods can be combined to achieve the goal of having a good and stable detection effect on both the object category and the texture category.
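For reference, the AUC used as the evaluation index can be computed directly from anomaly scores without plotting the ROC curve: it equals the probability that a randomly chosen defective sample receives a higher score than a randomly chosen normal sample, with ties counted as one half. The sketch below uses this pairwise formulation; the scores and labels are made-up values.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (label 1 = defect, 0 = normal)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical anomaly scores for two normal and two defective test images.
scores = [0.10, 0.40, 0.35, 0.80]
labels = [0, 0, 1, 1]
auc = roc_auc(scores, labels)
```

The pairwise form is quadratic in the number of samples, which is acceptable for test sets of the size of MVTec AD; rank-based implementations are preferable for much larger sets.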

Summary
Surface defect detection is an indispensable part of intelligent production. Therefore, research on surface defect detection of industrial products has strong practical significance. This article surveys the current state of machine learning methods in the surface defect detection of industrial products. We first discuss the application of traditional machine vision methods and deep learning methods in the field of surface defect detection. We then point out some key problems in the field and summarize their solutions. In addition, we compile a relatively complete list of datasets for surface defect detection of industrial products, which can help researchers conduct more in-depth research in this field.
We support our points by briefly explaining some specific research methods. Considering the length and readability of the article, we only select some research methods for detailed explanation, so the literature covered for some methods may be insufficient or not fully up to date. However, we believe that our review can help researchers understand the related research progress of surface defect detection of industrial products and serve as a useful reference.
Author Contributions: Authors contributed as follows: Conceptualization, Y.D.; methodology, Y.D. and Z.W.; funding acquisition, Y.C. and E.Z.; investigation, Y.C., Y.D., F.Z., and Z.W.; writing-original draft preparation, Y.C. and Y.D.; writing-review and editing, Y.C., Y.D., Z.W., and L.S.; supervision, Y.C. and E.Z. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement: Data from this review can be made available upon request to the corresponding author after executing an appropriate data sharing agreement. The corresponding websites are listed in the manuscript.

Conflicts of Interest:
The authors declare no conflicts of interest.