Fabric Defect Detection System Using Stacked Convolutional Denoising Auto-Encoders Trained with Synthetic Defect Data

Abstract: As defect detection using machine vision diversifies and expands, approaches based on deep learning are increasing. Recently, much research has addressed detecting and classifying defects using image segmentation, image detection, and image classification. These methods are effective but require a large amount of actual defect data, which is very difficult to obtain in industrial settings. To overcome this problem, we propose a method for defect detection using stacked convolutional autoencoders. The proposed autoencoders are trained using only non-defect data and synthetic defect data generated from the characteristics of defects, based on expert knowledge. A key advantage of our approach is that actual defect data is not required, and we verified that its performance is comparable to that of systems trained using real defect data.


Introduction
Recently, research on effective defect detection systems has become increasingly important. Defect detection systems using deep learning outperform conventional feature-based defect detection systems on complex patterns.
Deep learning-based defect detection methods are largely divided into supervised and unsupervised approaches. In supervised learning, methods such as image segmentation [1], classification [2], and image detection [3] are used. In unsupervised learning, methods such as autoencoders are used.
Studies of defect detection using supervised learning are actively progressing. These methods achieve a high defect detection rate, but they need sufficient defect data for training. Because of the nature of industrial settings, data must be accumulated over a long period of time before these methods can actually be applied.
There are also studies on defect detection based on unsupervised learning using autoencoders. These include using autoencoders as a pre-training method [4] and detecting abnormal regions from the output of the autoencoders [5,6]. The latter has the advantage of not requiring labeled defect data but has a lower detection rate than methods using supervised learning.
It is difficult to obtain a large amount of actual defect data. However, experienced workers can describe the shapes and types of defects. In this paper, we propose an efficient approach for identifying defects without actual defect data: artificial defect data are generated from the types and characteristics of defects known to experienced workers and are used to train stacked convolutional autoencoders.

The main contributions of this paper are as follows:
1. We define a defect detection system for environments where it is hard to obtain a large amount of data.
2. We define a method of generating synthetic defects from the characteristics of defects, based on the knowledge of experienced workers. With it, we can train an autoencoder whose input is an artificially generated defect image and whose output is the corresponding clean image.
The paper is organized as follows: Section 2 describes the conventional ways of detecting defects using unsupervised learning. Section 3 describes the proposed system components and implementations. Section 4 describes our practical implementation and test results. Finally, our summary and conclusions are provided in Section 5.

Related Works
Defect detection using computer vision has been widely studied for automated inspection systems and has gradually displaced traditional manual methods. Deep learning-based defect detection is accelerating this replacement. As stated in Section 1, studies of deep learning-based defect detection fall into two groups: supervised and unsupervised learning-based methods. However, supervised methods suffer from the limited availability of defect data. To overcome this problem, various studies have addressed detecting defects using unsupervised learning-based methods.
Studies on unsupervised defect detection have mainly used autoencoders. These studies use the differences between the original image and the image restored by the autoencoder to detect defects. Mei et al. [5] adopted the multi-scale convolutional denoising autoencoder (MSCDAE) architecture. They use images at various scales generated by a Gaussian pyramid and apply a salt-and-pepper noise model to the input image to train denoising autoencoders of various sizes. Defects are detected by combining the outputs of these autoencoders. This method shows 80% accuracy and a 0.65 F-score on fabric datasets.
Bergmann et al. [7] used a perceptual loss function to improve the performance of the autoencoder in defect detection. They note that existing methods lead to large residuals in edge regions that have slight localization inaccuracies. To solve this problem, they proposed a perceptual loss function based on structural similarity, which measures luminance, contrast, and structure between image patches. Besides that, there are various approaches to unsupervised defect detection that do not use autoencoders. These studies detect defects using wavelet transformation [8–10] or Gabor filters [11,12].
There are also studies that combine unsupervised and supervised learning. Li et al. [6] adopted the Fisher criterion-based stacked denoising autoencoder (FCSDA). They designed a Fisher criterion-based loss function in the feature space to overcome the limited defect data problem. They trained two autoencoders and used them as a classification network and a segmentation network. The classification network is trained in a supervised manner with a labeled dataset, and the segmentation network is trained in an unsupervised manner. If the classification network judges an image patch to be defective, the segmentation network locates the defect.
The aforementioned studies on unsupervised defect detection have focused on designing loss functions to train the autoencoders suitably or on ensembling the results of several autoencoders to get better performance. They all trained denoising autoencoders and used them for defect detection. However, they paid little attention to the noise added to the input of the denoising autoencoders in the training phase: they use only typical types of noise, such as Gaussian noise or salt-and-pepper noise.
In our work, we focused on the noise used in the training phase. We added not only Gaussian noise but also generated synthetic defects to the input of the denoising autoencoders for training. Our method achieved significant performance improvement on real-world patterned fabric data.

System Components
Our defect detection system comprises four units: the defect generation unit, the pre-processing unit, the convolutional autoencoders unit, and the post-processing unit. Figure 1 shows the overall architecture of the proposed system. The images coming from the camera are divided into small patches and processed one by one. In the defect generation unit, synthetic defects are generated using the characteristics of defects described by experienced workers. The generated defects are then added to the non-defect training data. The pre-processing unit normalizes the images so that the dataset has a common scale. The trained convolutional autoencoder is used to reconstruct the input patches. When a defective patch is given as input to the autoencoder, the defects can be located using the differences between the defective patch and the reconstructed image. The post-processing unit detects defects using thresholding and morphology filtering.
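As a minimal sketch of the patch-wise processing described above, the following splits an image into square patches. The function name `split_into_patches`, the non-overlapping layout, and the dropping of incomplete border patches are assumptions made for illustration; the text does not specify how the patches are laid out.

```python
def split_into_patches(image, patch_size):
    """Split a 2-D image (a list of pixel rows) into square patches.

    Patches do not overlap; rows and columns that do not fill a whole
    patch are dropped. Both choices are assumptions of this sketch.
    Returns a list of ((top, left), patch) pairs.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(((top, left), patch))
    return patches


# Toy 4 x 4 "image" whose pixel value encodes its position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)
```

Keeping the `(top, left)` offset with each patch makes it possible to map a defect found in a patch back to its position in the full image.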

Defect Generation Unit
The defect generation unit is comprised of two stages: the defect pattern generation stage and the defect merge stage.

Defect Pattern Generation Stage
During the defect pattern generation stage, synthetic defects produced by the experts are expressed as image data. The human experts know both the defects that occur frequently and the defects that are very rare or hard to detect but critical for product quality.
If the defects in the fabric are due to stains, an expert simply draws shapes with the stain's color. Similarly, for defects due to scratches, we simply draw achromatic shapes.
Because synthetic defect patterns are drawn by humans, only a small amount of synthetic defect data is available. To solve this problem, we apply various data augmentation methods to the synthetic defect data, including color inversion, color augmentation, translation, and flipping.
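Three of the augmentations named above can be sketched on a small grayscale pattern as follows. This is an illustration under assumptions, not the authors' implementation: the function names, the 8-bit value range, and the zero fill for pixels shifted in from outside are all hypothetical choices.

```python
def flip_horizontal(pattern):
    """Mirror the pattern left to right."""
    return [row[::-1] for row in pattern]


def invert_colors(pattern, max_value=255):
    """Color inversion for an 8-bit grayscale pattern (assumed range)."""
    return [[max_value - value for value in row] for row in pattern]


def translate(pattern, dx, dy, fill=0):
    """Shift the pattern by (dx, dy) pixels; vacated pixels get `fill`."""
    height, width = len(pattern), len(pattern[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_y, new_x = y + dy, x + dx
            if 0 <= new_y < height and 0 <= new_x < width:
                shifted[new_y][new_x] = pattern[y][x]
    return shifted


# A 2 x 2 pattern with a single bright pixel in the top-right corner.
pattern = [[0, 255],
           [0, 0]]
flipped = flip_horizontal(pattern)
inverted = invert_colors(pattern)
moved = translate(pattern, dx=0, dy=1)
```

Each function returns a new pattern, so several augmentations can be chained to multiply a small set of hand-drawn defects.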

Defect Merge Stage
During the defect merge stage, the outputs of the defect pattern generation stage are combined with the non-defect data. To combine the images, we use Poisson image editing [13] or alpha compositing [14], depending on the shape and type of the defects. In our experiments, we use alpha compositing with high transparency for the synthetic stain defects, and we use alpha compositing with low transparency and Poisson image editing for the synthetic scratch defects. Details about the synthetic defects in our experiments are described in Section 4.4. After combining the images, additive Gaussian noise is added for robust defect detection.
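The alpha compositing branch of the merge stage, followed by additive Gaussian noise, might look like the sketch below. The function names, the single `opacity` parameter (high transparency corresponds to low opacity), and the example values are assumptions for illustration; Poisson image editing is not covered here.

```python
import random


def alpha_composite(background, defect, mask, opacity):
    """Blend `defect` over `background` where `mask` is 1.

    `opacity` lies in [0, 1]; a highly transparent defect has a low opacity.
    All three images are 2-D lists of the same size.
    """
    blended = []
    for bg_row, defect_row, mask_row in zip(background, defect, mask):
        blended.append([
            (1.0 - opacity * m) * bg + opacity * m * d
            for bg, d, m in zip(bg_row, defect_row, mask_row)
        ])
    return blended


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    return [[value + rng.gauss(0.0, sigma) for value in row] for row in image]


background = [[100.0, 100.0]]
defect = [[0.0, 0.0]]          # a dark synthetic defect
mask = [[1, 0]]                # the defect covers only the first pixel
merged = alpha_composite(background, defect, mask, opacity=0.25)

rng = random.Random(0)         # seeded so the sketch is reproducible
noisy = add_gaussian_noise(merged, sigma=2.0, rng=rng)
```

Passing the random generator explicitly keeps the noise reproducible, which is convenient when comparing training runs.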

Pre-Processing Unit
The pre-processing unit normalizes the images to have zero mean and unit standard deviation using z-score normalization [15]. Let the pixels of an image be $\{x_i\}$ and the data matrix be $X = [x_0, x_1, x_2, \cdots, x_n]$. The normalized image data $x'_i$ can be computed as:
$$x'_i = \frac{x_i - \bar{x}}{\sigma} \quad (1)$$
where $\bar{x}$ is the mean value of $\{x_i\}$ and $\sigma$ is the standard deviation of $\{x_i\}$ over the entire dataset.
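A minimal sketch of this normalization, with the mean and standard deviation computed over the entire dataset, is given below. The function names are hypothetical, and the use of the population standard deviation (dividing by the number of pixels) is an assumption.

```python
import math


def dataset_mean_std(images):
    """Mean and population standard deviation over all pixels of all images."""
    pixels = [value for image in images for row in image for value in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((value - mean) ** 2 for value in pixels) / len(pixels)
    return mean, math.sqrt(variance)


def z_score_normalize(image, mean, std):
    """Apply (x - mean) / std to every pixel of a 2-D image."""
    return [[(value - mean) / std for value in row] for row in image]


# Two tiny 1 x 2 "images" standing in for the training set.
images = [[[1.0, 3.0]], [[1.0, 3.0]]]
mean, std = dataset_mean_std(images)
normalized = z_score_normalize(images[0], mean, std)
```

The statistics are computed once and then reused for every image, including images seen at inference time.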

Convolutional Autoencoders Unit
The autoencoder is trained to restore non-defect image data from defective image data. The network architecture of the autoencoder is RED30, proposed in [16].

Convolutional Autoencoders
RED30 consists of an encoder with 15 convolutional layers, a decoder with 15 deconvolutional layers, and element-wise sum layers for skip connections. All convolutional and deconvolutional layers use ReLU as the activation function. The architecture of the convolutional autoencoder is shown in Figure 2. The encoder extracts the features of the image data and converts them into a latent vector. Then, the decoder converts the latent vector into restored image data [17]. In this process, the defects in the image data are removed. The difference between the input image data and the restored image data is used to detect defects.

Skip Connections
As the convolutional autoencoder network goes deeper, its performance degrades because too much detail is already lost in the encoder. To handle this problem, skip connections are added between corresponding convolutional and deconvolutional layers. Let the output of a convolutional layer be $X_1$ and the output of the corresponding deconvolutional layer be $X_2$. The input to the next deconvolutional layer is computed from their element-wise sum, $X_1 + X_2$.
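The element-wise sum at the heart of a skip connection can be illustrated on two small feature maps. This is only a sketch: `skip_sum` is a hypothetical name, the feature maps are plain 2-D lists, and any activation applied after the sum is left out because the text does not specify it.

```python
def skip_sum(conv_output, deconv_output):
    """Element-wise sum of two feature maps with identical shapes."""
    if len(conv_output) != len(deconv_output):
        raise ValueError("feature maps must have the same height")
    summed = []
    for row_1, row_2 in zip(conv_output, deconv_output):
        if len(row_1) != len(row_2):
            raise ValueError("feature maps must have the same width")
        summed.append([a + b for a, b in zip(row_1, row_2)])
    return summed


x1 = [[1.0, -2.0], [0.5, 0.0]]   # output of a convolutional layer
x2 = [[0.5, 0.5], [0.5, 0.5]]    # output of the corresponding deconvolutional layer
merged = skip_sum(x1, x2)
```

Because the sum requires matching shapes, a skip connection can only join layers whose feature maps have the same size.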

Training
In the training phase, the autoencoder learns a mapping from the input image with defects to the original clean image. The autoencoder is trained to minimize the mean squared error (MSE) between original patches and restored patches.
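The training objective can be sketched as follows: the input is the patch with synthetic defects, and the target is the original clean patch. The function name and the toy values are assumptions; in practice the error would be computed by the training framework over batches of patches.

```python
def mean_squared_error(clean_patch, restored_patch):
    """Mean of squared pixel differences between two 2-D patches."""
    squared_diffs = [
        (clean - restored) ** 2
        for clean_row, restored_row in zip(clean_patch, restored_patch)
        for clean, restored in zip(clean_row, restored_row)
    ]
    return sum(squared_diffs) / len(squared_diffs)


clean_patch = [[1.0, 2.0], [3.0, 4.0]]
# Hypothetical autoencoder output that misses one pixel by 2.0.
restored_patch = [[1.0, 2.0], [3.0, 2.0]]
loss = mean_squared_error(clean_patch, restored_patch)
```

A perfect restoration gives a loss of zero, so minimizing the MSE pushes the network to remove the synthetic defect from its output.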

Post-Processing Unit
In our scheme, the difference between the original image $x_i$ and the output $\hat{x}_i$ of the autoencoder is used as a clue for defect detection. Since our system depends on the difference of pixel values, we use a naive thresholding approach for detecting defects. However, defects with small differences and defects with differences only in specific channels are difficult to detect by naive thresholding. To handle defects in specific channels, we first calculate the Euclidean distance $L(x_i, \hat{x}_i)$ over the channels for each pixel, as in Equation (3):
$$L(x_i, \hat{x}_i) = \sqrt{\sum_{k} \left(x_{i,k} - \hat{x}_{i,k}\right)^2} \quad (3)$$
To handle defects with small differences, we apply a logarithmic transformation with a bias to $L(x_i, \hat{x}_i)$. The result represents the abnormality of a pixel, $K(x_i, \hat{x}_i)$. Here, $x_i$ and $\hat{x}_i$ are the pixels in the original image and in the output of the autoencoder, and $k$ is the index of the channel. Additionally, morphology operations are used to remove noise before determining the defective region using thresholding.
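The post-processing steps can be sketched as below. Several details are assumptions rather than the authors' choices: the abnormality is taken as `log(L + bias)` with a hypothetical bias of 1.0, the morphology operation is a 3 x 3 grayscale opening (minimum filter followed by maximum filter), and the threshold level is arbitrary.

```python
import math


def abnormality_map(original, restored, bias=1.0):
    """Per-pixel log(L + bias), where L is the Euclidean distance over channels.

    Both images are H x W x C nested lists.
    """
    scores = []
    for orig_row, rest_row in zip(original, restored):
        row = []
        for orig_px, rest_px in zip(orig_row, rest_row):
            distance = math.sqrt(sum((o - r) ** 2
                                     for o, r in zip(orig_px, rest_px)))
            row.append(math.log(distance + bias))
        scores.append(row)
    return scores


def _filter_3x3(score_map, reduce_fn):
    """Apply `reduce_fn` (min or max) over each 3 x 3 neighbourhood."""
    height, width = len(score_map), len(score_map[0])
    filtered = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [score_map[yy][xx]
                      for yy in range(max(0, y - 1), min(height, y + 2))
                      for xx in range(max(0, x - 1), min(width, x + 2))]
            row.append(reduce_fn(window))
        filtered.append(row)
    return filtered


def grey_opening(score_map):
    """Erosion (min filter) then dilation (max filter); removes small spikes."""
    return _filter_3x3(_filter_3x3(score_map, min), max)


def threshold(score_map, level):
    """Binary mask: 1 where the score exceeds `level`, else 0."""
    return [[1 if value > level else 0 for value in row] for row in score_map]


# A single isolated spike is treated as noise and removed by the opening.
spike = [[0.0] * 5 for _ in range(5)]
spike[2][2] = 9.0
spike_mask = threshold(grey_opening(spike), 0.5)

# A 3 x 3 block of high scores survives the opening.
block = [[9.0 if 1 <= y <= 3 and 1 <= x <= 3 else 0.0 for x in range(5)]
         for y in range(5)]
block_mask = threshold(grey_opening(block), 0.5)
```

The opening is applied to the score map before thresholding, matching the order described above: isolated high scores are suppressed, while compact regions keep their extent.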

Dataset
In this section, we evaluate the performance of our proposed defect detection system by comparing it with a defect detection system trained using real data. We applied the proposed system to real-world fabric samples with various defects. The fabric samples were captured with a VTC-2K10.5G-C19 camera [18] produced by Vieworks. We captured three types of fabric, dataset 1, 2, and 3, as shown in Figure 5. Each dataset comprises 64,000 images of size 256 × 256, which we resized to 128 × 128 for training.
To measure the performance of the system, we also captured 6000 images of real defective fabric samples. The defects in these samples were either produced during the manufacturing process or artificially applied to the fabric using knives, awls, inks, etc.

Generated Synthetic Defects
We assume that the defects described by experts comprise four types: hole, stitching, stain, and misprinting. Considering the characteristics of each type, we generate synthetic defects and apply them to the non-defect images. For hole and stitching defects, we draw achromatic ellipses and lines and apply them to non-defect images by alpha compositing with low transparency or Poisson image editing. Similarly, we draw colored shapes for stain and misprinting defects, which are applied to non-defect images by alpha compositing with high transparency. All of the generated defects are randomly located and scaled to a random size. We also used three data augmentation methods for the synthetic defects: flip, transformation, and color augmentation. Figure 3 shows examples of the generated synthetic defects.
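Drawing a randomly located, randomly sized ellipse, as used for hole-like defects, might be sketched as follows. The function names and the radius range are hypothetical; the resulting binary mask marks where a defect color would be blended into a non-defect patch.

```python
import random


def ellipse_mask(height, width, center_y, center_x, radius_y, radius_x):
    """Binary mask that is 1 inside an axis-aligned ellipse."""
    return [
        [1 if ((y - center_y) / radius_y) ** 2
              + ((x - center_x) / radius_x) ** 2 <= 1.0 else 0
         for x in range(width)]
        for y in range(height)
    ]


def random_ellipse_mask(height, width, rng, min_radius=2, max_radius=10):
    """Ellipse with a random centre inside the patch and random radii."""
    radius_y = rng.randint(min_radius, max_radius)
    radius_x = rng.randint(min_radius, max_radius)
    center_y = rng.randrange(height)
    center_x = rng.randrange(width)
    return ellipse_mask(height, width, center_y, center_x, radius_y, radius_x)


small = ellipse_mask(5, 5, center_y=2, center_x=2, radius_y=1, radius_x=2)

rng = random.Random(42)        # seeded for reproducibility
mask = random_ellipse_mask(128, 128, rng)
```

Line-shaped stitching defects could be generated the same way by replacing the ellipse test with a distance-to-segment test.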

Experimental Conditions
We train our model using 64,000 training images. The training dataset consists of non-defected images. We train RED30 with the Adam optimizer [19], Xavier initialization [20], a learning rate of 0.0001, a weight decay of 0.0005, and a noise level of 0.25. The number of filters is 64 and the filter size of the convolution and deconvolution layers is 3 × 3.

Verification
We compare the performance of our model with that of a defect detection system trained using real data. The baseline system we implemented is U-Net [21]. We trained the U-Net using 13,000 training images with actual defects and used 6000 test images with actual defects to measure its performance. The test images are the same as those used for our system. When training the U-Net, we used the Adam optimizer, Xavier initialization, and a learning rate of 0.0001. The input and output sizes of the U-Net are 128 × 128 and 88 × 88, respectively.
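The reduction from 128 × 128 to 88 × 88 is consistent with unpadded 3 × 3 convolutions, as in the original U-Net [21]. The sketch below works through the size arithmetic. The depth of two pooling steps and two convolutions per level are assumptions that happen to reproduce the reported output size; the text does not state the baseline's exact configuration.

```python
def unet_output_size(input_size, depth, convs_per_level=2, kernel=3):
    """Spatial output size of a U-Net built from unpadded convolutions.

    Each level applies `convs_per_level` unpadded convolutions, each of
    which removes (kernel - 1) pixels, followed by 2x pooling on the way
    down or preceded by 2x upsampling on the way up.
    """
    shrink = convs_per_level * (kernel - 1)
    size = input_size
    for _ in range(depth):              # contracting path
        size -= shrink
        if size % 2:
            raise ValueError("feature map size must be even before pooling")
        size //= 2
    size -= shrink                      # bottleneck convolutions
    for _ in range(depth):              # expanding path
        size = size * 2 - shrink
    return size


baseline_output = unet_output_size(128, depth=2)
original_output = unet_output_size(572, depth=4)
```

With a depth of four, the same arithmetic gives the 572 to 388 reduction of the original U-Net, which serves as a sanity check for the formula.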

Evaluation
To evaluate the performance of the networks, we use recall, precision, and F-score (F1 score). These values are computed as:
$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (5)$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (6)$$
$$\mathrm{F\text{-}score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (7)$$
where $TP$, $FP$, and $FN$ denote the numbers of true positives, false positives, and false negatives. Recall is the proportion of actual defects that the detector indicates as defects, while precision is the proportion of actual defects among the cases that the detector indicates as defects. Table 1 shows the comparison between the performance of the U-Net and that of our proposed system.
Figure 4 shows some examples of the results of our proposed system, and Figure 5 compares the performance of the U-Net with that of our proposed system. With the proposed method, defect detection performance drops when the difference in pixel values is small, whereas this phenomenon rarely appears with the baseline system. This is because our proposed system detects defects using the difference between the input image and the restored image, so defects with a small difference can look like noise.
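As a small worked example of these metrics, the sketch below counts outcomes from binary predictions and labels and then computes recall, precision, and F-score. The function names and the toy labels are illustrative; whether the counts are taken per pixel or per image is not stated in the text and is left open here.

```python
def count_outcomes(predicted, actual):
    """Count true positives, false positives, and false negatives."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 score; zero when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


predicted = [1, 1, 0, 0, 1]   # detector output (1 = defect)
actual = [1, 0, 1, 0, 1]      # ground truth
tp, fp, fn = count_outcomes(predicted, actual)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

Here two of the three predicted defects are real and two of the three real defects are found, so precision, recall, and F-score all equal 2/3.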

Conclusion
In this paper, we proposed a defect detection system based on stacked convolutional autoencoders trained with synthetic defect data. The proposed autoencoder is trained using only non-defect data and synthetic defect data generated from the characteristics of defects described by experts. To verify our method, we compared the proposed system with a U-Net trained with actual defect data and showed that the proposed system, using only non-defect data and synthetic data, detects actual defects with performance comparable to that of the system trained with real defect data. This method can be applied to many industrial and medical applications, such as cancer detection using the knowledge of experienced doctors.
Because our system relies on synthetic defect data generated from experts' knowledge of defect characteristics, it has a low detection rate for unknown defects. However, this limitation is shared with defect detection systems based on real defect data. The system can be improved by iterating the whole process: if the human expert forgets to mention a type of defect and the system makes errors, the expert can add that type and related types.
Our immediate future work includes enhancing performance with larger and more varied training datasets and an enhanced defect generation unit, since the performance of our system can vary widely with different training and test datasets. We can also improve performance by addressing the low recall and precision that occur when the difference between the input image and the restored image is very small. As one solution, we may set an application-specific tolerance threshold that determines the limit of small defects allowed in a given application. Finally, we will apply semi-supervised learning, which uses a large amount of unlabeled data together with a small amount of labeled data that may be available, as in [22].
