1. Introduction
Owing to advances in convolutional neural networks (CNNs) in recent years, CNN models such as ResNeXt [1], Dual Path Networks [2], and EfficientNet [3] have been proposed and have been shown to possess good image recognition abilities. Consequently, researchers have continued to introduce CNN models into inspection applications and have obtained good results, for example in industrial inspection [4,5], cryo-electron tomogram classification [6], lithium-ion battery electrode defect detection [7], solar cell surface defect inspection [8], bolt joint monitoring [9], and robust rolling bearing fault diagnosis [10].
However, training CNNs requires many defect samples and additional manpower for defect labeling, which makes it difficult to introduce deep learning technology into actual production lines. Although many data augmentation methods have been proposed [11,12,13,14], the application of CNNs in industrial inspection remains limited.
To overcome this limitation, anomaly detection, an unsupervised learning approach, has garnered significant research attention. In recent studies, anomaly detection has mainly been realized using an autoencoder (AE) [15] or a generative adversarial network (GAN) [16]. The principle is to train the neural network only on good products so that it learns to reconstruct the features of good products well. On a production line, the anomaly score is then defined as the difference between the reconstructed and original images or features. Anomaly detection methods such as AnoGAN [17], GANomaly [18], Skip-GANomaly [19], and DFR [20] have been shown to possess a certain anomaly detection ability. Their model architectures are shown in Figure 1.
Although these models have a certain ability in industrial anomaly detection, their limited feature extraction and reconstruction abilities restrict their practical application. Recently, scholars have attempted to overcome these problems by introducing skip connections [19], deep feature reconstruction algorithms [20], and multiple autoencoders [21]. In this paper, an anomaly detection model with a skip autoencoder and a deep feature extractor is proposed. The model is shown to possess better inspection ability than previous anomaly detection models on different datasets and under different conditions.
This paper mainly discusses the following issues regarding the proposed method:
The MVTec AD dataset and two production line datasets (furniture wood and mobile phone cover glass) were used to train and verify the proposed model, which was then compared with previous anomaly detection models.
Different feature extractors were used to train the proposed model, and the optimal choice of feature extractor under different requirements was discussed.
The proposed model was trained with different feature extraction layers, and the corresponding effects were discussed.
3. Proposed Method
3.1. Model Architecture
The structure of the proposed method, shown in Figure 2, comprises a pre-trained feature extractor and a skip-connection-based autoencoder. Inspired by DFR [20] and Skip-GANomaly [19], the feature extractor (FE) is a pre-trained CNN model (ResNeXt101 is used instead of the VGG19 used in DFR). It extracts important features from an input image x. After feature extraction, the feature maps are resized and concatenated into a three-dimensional tensor and passed to the autoencoder.
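The resize-and-concatenate step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the nearest-neighbor resizing, the helper names `resize_nearest` and `build_feature_tensor`, and the channel counts and spatial sizes of the three feature maps are all assumptions chosen for illustration, and random arrays stand in for the outputs of a pre-trained extractor.

```python
import numpy as np


def resize_nearest(fmap: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Resize a (C, H, W) feature map to (C, out_h, out_w) by nearest-neighbor sampling."""
    _, h, w = fmap.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return fmap[:, rows[:, None], cols[None, :]]


def build_feature_tensor(fmaps, out_h: int, out_w: int) -> np.ndarray:
    """Resize every layer's feature map to a common size and stack along channels."""
    resized = [resize_nearest(f, out_h, out_w) for f in fmaps]
    return np.concatenate(resized, axis=0)


# Hypothetical outputs of three extractor blocks (shapes are made up).
rng = np.random.default_rng(0)
layer_outputs = [
    rng.standard_normal((8, 32, 32)),
    rng.standard_normal((16, 16, 16)),
    rng.standard_normal((32, 8, 8)),
]
feature_tensor = build_feature_tensor(layer_outputs, 32, 32)
print(feature_tensor.shape)  # (56, 32, 32)
```

The result is a single three-dimensional tensor in which every spatial location carries features from all selected layers, which is the form the autoencoder consumes.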
The proposed method also adopts the skip connections used in Skip-GANomaly. With this structure, the autoencoder (AE) in the proposed method is expected to achieve better feature reconstruction than DFR, and the model is expected to be more stable and to have a higher anomaly detection ability.
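To make the idea of a skip connection concrete, the following toy forward pass concatenates an encoder activation with the matching decoder activation before the final layer. This is a sketch under stated assumptions, not the architecture of the paper: the use of 1x1 convolutions, the two-level depth, the channel sizes, and the untrained random weights are all hypothetical.

```python
import numpy as np


def conv1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Apply a 1x1 convolution, i.e. a per-pixel linear map over channels. x is (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def skip_autoencoder_forward(x: np.ndarray, params: dict) -> np.ndarray:
    """Forward pass of a toy two-level autoencoder with one skip connection."""
    e1 = relu(conv1x1(x, params["enc1"]))        # C   -> C/2
    z = relu(conv1x1(e1, params["enc2"]))        # C/2 -> C/4 (bottleneck)
    d1 = relu(conv1x1(z, params["dec1"]))        # C/4 -> C/2
    d1_skip = np.concatenate([d1, e1], axis=0)   # skip connection: reuse encoder features
    return conv1x1(d1_skip, params["dec2"])      # C   -> C (reconstruction)


rng = np.random.default_rng(0)
c = 8
params = {
    "enc1": rng.standard_normal((c // 2, c)) * 0.1,
    "enc2": rng.standard_normal((c // 4, c // 2)) * 0.1,
    "dec1": rng.standard_normal((c // 2, c // 4)) * 0.1,
    "dec2": rng.standard_normal((c, c)) * 0.1,  # input has C/2 (d1) + C/2 (e1) channels
}
features = rng.standard_normal((c, 4, 4))
reconstruction = skip_autoencoder_forward(features, params)
print(reconstruction.shape)  # (8, 4, 4)
```

Because the decoder sees the encoder's intermediate features directly, fine detail does not have to pass through the bottleneck, which is the intuition behind the expected improvement in reconstruction.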
During the training process, only good samples are input into the model, so the autoencoder learns to reconstruct the features of good products well. During the detection process, once an appropriate residual score is defined to measure the difference between the feature tensor FE(x) and its reconstruction AE(FE(x)), anomaly detection and segmentation can be realized.
Overall, the proposed method combines the advantages of DFR and Skip-GANomaly, which makes it well suited to feature extraction and reconstruction.
3.2. Training Process
In the training process, the training data x are input into the deep feature extractor, which was pre-trained on the ImageNet dataset [22]. The weights of the feature extractor are frozen during training. Because it has been pre-trained on a large amount of image data, the feature extractor can effectively extract salient features from the training data. This step gives the model richer features to learn from and makes it considerably better than a model trained directly on the images.
After feature extraction, the feature maps are resized and concatenated into a three-dimensional tensor and then passed to the skip-connection-based autoencoder. To ensure that the model has the best reconstruction ability for good samples, a loss function termed the contextual loss is applied in model training. This loss measures the difference between the feature tensor FE(x) and the tensor AE(FE(x)) reconstructed by the autoencoder.
Mathematically, the L2 distance is used to define the difference between these two tensors, and the contextual loss adopts this definition. It can be expressed as follows:
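Written out under the L2 definition described above, the contextual loss can be sketched as below. The symbol $L_{con}$ and the choice of the plain (unsquared) norm are assumptions made for illustration.

```latex
L_{con} = \left\lVert FE(x) - AE(FE(x)) \right\rVert_{2}
```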
The aim of training is to minimize the contextual loss, which ensures that the model achieves the best feature reconstruction performance for normal samples and thereby improves its anomaly detection ability.
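As a small numerical illustration of the quantity being minimized, the sketch below computes an L2 distance between a feature tensor and two candidate reconstructions. The function name `contextual_loss` and the toy tensors are hypothetical, and the unsquared norm over all elements is an assumption.

```python
import numpy as np


def contextual_loss(features: np.ndarray, reconstruction: np.ndarray) -> float:
    """L2 distance between the feature tensor and its reconstruction (all elements)."""
    return float(np.linalg.norm(features - reconstruction))


features = np.ones((2, 2, 2))
perfect = features.copy()        # ideal reconstruction of a normal sample
poor = np.zeros_like(features)   # reconstruction that misses every element by 1

print(contextual_loss(features, perfect))  # 0.0
print(contextual_loss(features, poor))     # sqrt(8), about 2.83
```

A well-trained autoencoder drives this value toward zero for normal samples, so larger values at test time indicate features it has not learned to reproduce.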
3.3. Detection Process
In the detection phase, the image x to be tested is first input into the feature extractor FE(·), and the extracted features are resized and concatenated into the feature tensor FE(x). Next, the feature tensor is input into the autoencoder for reconstruction, which outputs the tensor AE(FE(x)). The residual map R(FE(x), AE(FE(x))) is obtained by calculating the L2 distance between the two tensors. This can be expressed as:
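One way to write the residual map is sketched below, assuming the L2 distance is taken over the channel dimension at each spatial location $(i, j)$ so that the map can be used for segmentation. The per-location indexing is an assumption for illustration.

```latex
R\bigl(FE(x), AE(FE(x))\bigr)_{i,j} = \left\lVert FE(x)_{i,j} - AE(FE(x))_{i,j} \right\rVert_{2}
```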
As the proposed method uses only good products for training, it reconstructs the features of good products better than those of defective ones. Therefore, the residual scores in the residual map R(FE(x), AE(FE(x))) are lower for good products than for bad products.
During detection, a threshold value is chosen, and the region where the residual score is greater than the threshold is defined as the abnormal area.
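The detection step can be sketched as follows, assuming the residual score is the per-location L2 distance over channels. The function names `residual_map` and `segment_anomalies`, the toy tensors, and the threshold of 1.0 are hypothetical choices for illustration.

```python
import numpy as np


def residual_map(features: np.ndarray, reconstruction: np.ndarray) -> np.ndarray:
    """Per-location L2 distance over the channel axis; inputs are (C, H, W)."""
    return np.linalg.norm(features - reconstruction, axis=0)


def segment_anomalies(residual: np.ndarray, threshold: float) -> np.ndarray:
    """Mark locations whose residual score exceeds the threshold as abnormal."""
    return residual > threshold


# Toy example: the reconstruction matches everywhere except one location.
features = np.zeros((3, 4, 4))
reconstruction = features.copy()
features[:, 1, 2] = 2.0  # simulated defect that the autoencoder fails to reproduce

residual = residual_map(features, reconstruction)
mask = segment_anomalies(residual, threshold=1.0)  # threshold value is arbitrary here
print(np.argwhere(mask))  # [[1 2]]
```

Raising the threshold shrinks the flagged region and lowers false alarms, while lowering it makes the detector more sensitive, so in practice the value would be tuned on validation data.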
6. Conclusions
In this study, a new anomaly detection and segmentation model was proposed as an improvement on the DFR architecture. Its novelty and contributions are as follows. First, we use a skip-connection architecture to improve the feature reconstruction ability of the autoencoder in DFR. Second, we change the feature extractor from the VGG model to ResNeXt101 and explore the best combination of output blocks. Third, in addition to the open dataset, two sets of production line data collected by our team are used to verify the performance of the model in actual industrial applications. With the skip connection and deep feature extractor, the proposed model exhibits good feature extraction and reconstruction abilities. Consequently, its performance on MVTec AD and on the two production line anomaly detection datasets is significantly better than that of previous models. Furthermore, this study discussed the performance of the model in terms of computing resources. The results indicate that the proposed model maintains good detection and segmentation abilities even when its feature extractor is replaced with a lighter one. This implies that the proposed model is suitable for actual production line inspection tasks. However, the proposed method still has two limitations. First, because the model focuses on the detection of details, it is relatively poor at detecting macro-scale defects. Second, as a deep learning method, it still takes a long time to train. These are the directions our team will study and improve in future research.