Article

A Deep Convolutional Generative Adversarial Networks-Based Method for Defect Detection in Small Sample Industrial Parts Images

Hongbin Gao, Ya Zhang, Wenkai Lv, Jiawei Yin, Tehreem Qasim and Dongyun Wang
1 College of Engineering, Zhejiang Normal University, Jinhua 321005, China
2 Zhejiang Provincial Key Laboratory of Urban Rail Transit Intelligent Operation and Maintenance Technology and Equipment, Zhejiang Normal University, Jinhua 321005, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(13), 6569; https://doi.org/10.3390/app12136569
Submission received: 21 April 2022 / Revised: 22 June 2022 / Accepted: 23 June 2022 / Published: 29 June 2022
(This article belongs to the Special Issue Applications of Deep Learning and Artificial Intelligence Methods)

Abstract:
Online defect detection for small industrial parts is of paramount importance for building closed-loop intelligent manufacturing systems. However, high-efficiency, high-precision detection of surface defects in these systems is a difficult task and poses a major research challenge. The small sample size of industrial part images available for training machine learning algorithms and the low accuracy of computer vision-based inspection algorithms are the bottlenecks that restrict the development of efficient online defect detection technology. To address these issues, we propose a small sample gear face defect detection method based on a Deep Convolutional Generative Adversarial Network (DCGAN) and a lightweight Convolutional Neural Network (CNN). First, we perform data augmentation using DCGAN together with traditional data enhancement methods, which effectively increases the size of the training data. Next, we perform defect classification with a lightweight CNN model derived from the Vgg11 network, into which we introduce the Leaky ReLU activation function and dropout layers. In the experimental evaluation, the proposed framework achieves a classification accuracy of 98.40%, which is better than that of the classic Vgg11 network model. The method proposed in this paper is helpful for detecting defects in industrial parts when the available training sample size is small.

1. Introduction

Small precision parts are in widespread use in aerospace, precision machinery, and instrumentation. High surface quality is a priority for these parts, yet surface defects are difficult to avoid even with careful processing [1]. Surface quality affects the reliability and lifetime of equipment and can potentially lead to serious failures. Therefore, it is imperative to find efficient, non-destructive methods for inspecting part defects. In this paper, we focus on precision pinions [2] and propose an online defect detection algorithm for small parts based on machine vision.
In the field of defect detection, traditional manual inspection has gradually been phased out, while machine learning and deep learning algorithms have become mainstream. The former are mainly represented by random forests (RF) [3], support vector machines (SVMs) [4], and artificial neural networks (ANNs) [5]. Goyal et al. [6] used a support vector machine and an artificial neural network to diagnose faults in industrial parts. Sheng et al. [7] proposed a model based on intrinsic time-scale decomposition and an improved support vector machine, which effectively deals with the non-stationary and nonlinear characteristics of bearing vibration signals. Ikhlef et al. [8] proposed a gearbox state detection method based on the Maximal Overlap Discrete Wavelet Packet Transform (MODWPT), an ant colony optimization algorithm, and a random forest classifier, which effectively reveals vibration-related faults. When faced with defect samples that vary widely within a class, traditional machine learning algorithms can extract only a small number of features, and the generalization performance of the resulting models is poor. As an extension of machine learning, deep learning extracts feature information by stacking the layers of convolutional neural networks (CNNs) and finally compresses the high-dimensional feature information into a one-dimensional feature vector, realizing self-organization and self-learning of features [9]. Consequently, it has achieved great success in target segmentation, detection, and recognition in recent years [10]. Utilizing deep learning for defect detection, Allam et al. [11] reduced the effort required for manual inspection by combining domain expertise with Faster R-CNN; their method identifies gear tooth form defects. Kumar et al. [12] proposed a sparse transfer learning model for settings with few training samples and achieved good classification results on both gear and rotor datasets. In summary, CNN-based deep learning algorithms often perform better when large amounts of sample data are available, and they can detect target defects on the basis of self-extracted features, making them the current mainstream approach.
Although the above methods have achieved encouraging results, challenges such as data collection difficulty and data noise remain. When the amount of raw data is small and the intra-class distribution of defects is wide, the effective features extracted by ML and DL algorithms are very limited, and it is difficult to achieve high classification accuracy [13]. It is therefore necessary to find a suitable data enhancement method to enrich the effective training samples, act as a form of network regularization, and improve model performance [14]. Existing traditional data enhancement methods mainly work through image translation, rotation, random cropping, and the addition of Gaussian noise. However, realistic synthetic images similar to real ones cannot be obtained by these simple methods. The deep convolutional generative adversarial network (DCGAN) [15] is an extension of the GAN [16] that mitigates the inherent instability of GAN training by introducing CNNs to reduce mode collapse, and it has been used successfully for realistic image synthesis in recent years. Chen et al. [17] used a DCGAN that sets its monitoring thresholds automatically, without human intervention, to quantitatively monitor the severity of bearing failures in wind turbines. Luo et al. [18] used a C-DCGAN model to balance mechanical-fault vibration data and validated it on two datasets, improving the accuracy of fault diagnosis and the classification ability of the classifier in the small sample case. In the absence of anomalous samples, Oz et al. [19] used the generator network of a DCGAN to provide fake images to the discriminator, so that the trained discriminator acquired the ability to segment anomalies in images, thereby localizing texture anomalies. DCGAN thus has a wide range of applications in data generation. Compared with traditional data enhancement, it is an unsupervised learning method, it can often generate more realistic and effective data, and the whole process requires no human adjustment, making it an effective means of data enhancement.
Based on the above analysis, in this paper we propose a small sample gear end face defect detection method based on DCGAN and a lightweight CNN, divided into two steps. In step 1, we address the shortage of training data by data augmentation. For this purpose, a DCGAN is fused with traditional data enhancement to increase the quantity and diversity of the data. A DCGAN consists mainly of a generator model (G) and a discriminator model (D). During training, G aims to generate realistic images to deceive D, while D tries to separate the images generated by G from the real images. Figure 1 shows the general framework of a DCGAN.
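This adversarial training corresponds to the standard minimax objective of the original GAN [16], which DCGAN inherits:

\[ \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big] \]

where z is a noise vector drawn from the prior p_z and x is a real training image.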
In step 2, we tackle defect detection and classification using the image data generated through data augmentation. We propose an improved CNN based on the Vgg11 network for defect detection. The proposed method is superior to the original Vgg11 network in terms of accuracy and computational performance. Detailed discussion and analysis are given in Section 3 (Experiments and Analysis).
The rest of this article is organized as follows: in Section 2, construction of the dataset and the proposed defect detection framework are discussed. In Section 3, experimental results and discussion are given. Conclusions and future work are presented in Section 4.

2. Proposed Methods

In this section, we discuss in detail the construction of the dataset by using data augmentation (by employing a DCGAN) and the detection of defects in the dataset images by using a modified CNN which is based on the Vgg11 network. The architectural flowchart of the proposed method is shown in Figure 2.
First, the part to be inspected is selected (here, a small-module gear is used as an example). Second, the upper and lower surfaces of the parts are photographed separately by two cameras, and images with large defect differences are selected to form the original dataset. Third, the image set expanded by random oversampling (ROS) is fed into the DCGAN's D network as real data, so as to improve the quality of the images generated by the G network. Fourth, the generated images are expanded by traditional data enhancement to form the final dataset. Fifth, the final dataset is fed into the proposed model for training and classification. Finally, a series of evaluation results is obtained.

2.1. Construction of Gear End Defect Dataset

The dataset used in this experiment comes from a batch of small-module copper gears purchased online and manufactured by processes such as gear cutting. The tip circle diameter is 13.856 ± 0.009 mm, the root circle diameter is 10.681 + 0.009 mm, and the inner hole diameter is 2.701 + 0.005 mm. Figure 3 shows the front and side views of the gear. For a non-conforming gear, dimensional defects are easy to check with a vision system, but surface defects are harder to detect because of their wide variety. Here, we classify and detect the five common kinds of gear end face conditions.
Gear face defects can be divided into texture defects and shape defects. The texture defects are mainly stains, scratches, and circles, while the shape defects are breakages. As shown in Figure 4, the dataset consists of the above-mentioned four defect types plus the normal gear face.

2.1.1. Image Selection from the Dataset

Gear face images sometimes contain two or more defects at the same time. Therefore, some of the original defect images were not suitable for training. We selected a subset of the dataset based on the following criteria: (1) the contrast between a defect on the gear end face and its background should be large, so that the defect type stands out; (2) the shooting environment should reduce unfavorable factors such as uneven lighting, so that the gear end face is evenly lit; (3) images of uncertain quality should be discussed by the team to decide whether to keep them.
Taking the above points into consideration, only 10 images of each category were included in the selected subset, which is far from enough to meet the requirements of neural network training. We therefore applied data augmentation to build a sufficiently large set of realistic-looking synthetic images for training.

2.1.2. Data Augmentation

In order to generate sufficient image data for CNN training, we utilized a DCGAN and traditional data augmentation. In the original dataset, each category contained 10 images, and an image processing algorithm was used to resize each image to 224 × 224 × 3 to meet the network's input requirements.
In the DCGAN used in this work, G consisted of a fully connected layer and three deconvolutional layers. The first three layers used rectified linear units (ReLU) as the activation function, and the final output layer used the hyperbolic tangent (Tanh) activation function; the hyperparameters of the generator model are shown in Table 1. D consisted of four convolutional layers and a fully connected layer, all of which employed Leaky ReLU as the activation function; the hyperparameters of the discriminator model are shown in Table 2. Both networks used batch normalization to accelerate convergence.
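As an illustration, a minimal Keras sketch of the two networks described in Tables 1 and 2 might look as follows. The latent dimension and the 56 × 56 reshape after the fully connected layer are our assumptions (chosen so that two stride-2 deconvolutions reach the 224 × 224 × 3 image size); the paper does not state them.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(latent_dim=100):
    # Generator per Table 1; latent size and 56x56 reshape are assumptions.
    return tf.keras.Sequential([
        layers.Dense(56 * 56 * 256, input_shape=(latent_dim,)),
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Reshape((56, 56, 256)),
        layers.Conv2DTranspose(128, 5, strides=1, padding="same"),
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(64, 5, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="tanh"),
    ])

def build_discriminator():
    # Discriminator per Table 2: four stride-2 convolutions and one logit.
    model = tf.keras.Sequential()
    model.add(layers.Conv2D(16, 3, strides=2, padding="same",
                            input_shape=(224, 224, 3)))
    model.add(layers.LeakyReLU(0.2))
    for filters in (32, 64, 128):
        model.add(layers.Conv2D(filters, 3, strides=2, padding="same"))
        model.add(layers.BatchNormalization())
        model.add(layers.LeakyReLU(0.2))
    model.add(layers.Flatten())
    model.add(layers.Dense(1))  # real/fake score; sigmoid applied in the loss
    return model
```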
The training of a DCGAN can be seen as alternating optimization of two subnetworks, and the discriminator needs some real image data as its basis in order to drive the generation of corresponding images. In our case, because the original dataset was so small, we used random oversampling (ROS) to increase the number of samples in a single training round to 200 images. It is important to note that the duplicates were copies of a single image from a category, rather than copies of several different images. The reason is that the gear defects are very small and hard to distinguish: if two images of the same category with different defect distributions are used as input, the data generated by the G network contain large amounts of noise, whereas when a single image is ROS-expanded and used as input, the network can generate high-quality defect images after sufficient iterations. During training, the network used a learning rate of 0.0001, a batch size of 8, and 1000 iterations. Figure 5 shows a scratch defect example after 10, 200, 500, and 1000 iterations, where Figure 5a–d are generated from the original input and Figure 5e–h from the ROS-expanded input. It is apparent that the images from the original input fail to converge even after 1000 iterations, whereas the images generated from the ROS input gradually approach the real image and contain the corresponding defect characteristics needed to train the defect classification model. Using DCGAN, the dataset was expanded tenfold, from the original 50 images to 500 images.
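A hedged sketch of the ROS step and one alternating training step (learning rate 0.0001 and batch size 8, as stated above) is given below. The binary cross-entropy formulation is the standard GAN loss, and the helper names are ours, not the authors'.

```python
import numpy as np
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)  # learning rate from the paper
d_opt = tf.keras.optimizers.Adam(1e-4)

def oversample(image, n=200):
    # ROS as described above: duplicate a single image n times.
    return np.repeat(image[np.newaxis], n, axis=0)

@tf.function
def train_step(real_images, generator, discriminator, latent_dim=100):
    noise = tf.random.normal([tf.shape(real_images)[0], latent_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fakes = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fakes, training=True)
        # D separates real from generated; G tries to fool D.
        d_loss = (bce(tf.ones_like(real_logits), real_logits)
                  + bce(tf.zeros_like(fake_logits), fake_logits))
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return g_loss, d_loss
```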
The image data generated by the trained DCGAN follow the data distribution of the real samples, and the image quality is high. However, because the original dataset is small, the generated data are highly similar to the original data; feeding only these data into the classification network would not improve its generalization, so the diversity of the data must be increased. Therefore, simple traditional data enhancement was applied after DCGAN. Combining DCGAN with traditional data enhancement significantly increases the size of the original dataset as well as the quality and diversity of the data. The traditional data enhancement methods are as follows (a code sketch is given after the list):
  • Horizontal and vertical flip;
  • Random pan of −7 to 7 pixels;
  • Random rotation 0°–360°;
  • Image brightness enhancement;
  • Gaussian noise with mean of 0 and standard deviation of 0.05.
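The five augmentations above could be expressed with Keras preprocessing layers as in the following sketch. These layers exist only in recent TensorFlow releases (not the TF 2.1 the authors used), and the brightness factor is our assumption, since the paper only says "enhancement".

```python
from tensorflow import keras
from tensorflow.keras import layers

augmenter = keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),  # horizontal and vertical flip
    layers.RandomTranslation(7 / 224, 7 / 224),    # random pan of about ±7 px
    layers.RandomRotation(0.5),                    # ±180°, i.e., any angle
    layers.RandomBrightness(0.2),                  # brightness enhancement (factor assumed)
    layers.GaussianNoise(0.05),                    # sigma = 0.05, active in training only
])

# augmented = augmenter(images, training=True)
```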
After data enhancement, the images were integrated into the gear face dataset, which at that stage contained 2500 gear end face images, i.e., 500 images per class. We split the images into training, test, and validation sets in a 7:1.5:1.5 ratio, giving 1750, 375, and 375 images, respectively.

2.2. The Proposed CNN Model

The schematic diagram of the CNN used in this work is shown in Figure 6. The proposed CNN is a variant of Vgg11. Vgg11 has up to 128 M parameters; such a large parameter count leads to slow training and can cause over-fitting. For this reason, we modified Vgg11 to make it suitable for gear end face defect detection.
The proposed model uses 7 convolutional layers, 4 pooling layers, 3 fully connected layers, and 2 dropout layers; the structure and parameters are shown in Table 3. We retained the depth of Vgg11 but reduced the number of channels per convolutional kernel. As the network convolves and pools, the feature map keeps shrinking; to retain comprehensive information, the number of convolutional kernels (channels) must grow so that previously learned features can be combined effectively. The channel counts were therefore set to 64-128-256-384, which reduces the number of model parameters while still obtaining effective feature combinations. The convolutional layers are divided into 4 blocks, each consisting of convolution, activation, and max-pooling operations in order. Every convolutional filter is 3 × 3 with a stride of 1 pixel, which lets the network extract image features efficiently. For the activation function we adopted Leaky ReLU which, as a variant of ReLU, avoids the zero gradient that ReLU yields for negative inputs and thus prevents vanishing gradients. This is followed by two 256-channel fully connected layers (Dense 1, Dense 2), each with a dropout layer with a dropout rate of 0.2, which reduces the risk of overfitting. Finally, we attach a 5-channel fully connected layer with the Softmax activation function.
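A minimal Keras sketch of the lightweight Vgg11 variant in Table 3 follows. The Leaky ReLU slope and the use of stride 2 in every pooling stage are our assumptions (Table 3 lists mixed strides), so the parameter count of this sketch will not exactly match the reported 5.76 M.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_classifier(num_classes=5):
    # Sketch of the proposed lightweight Vgg11 variant (Table 3).
    model = tf.keras.Sequential()
    model.add(layers.Input(shape=(224, 224, 3)))
    for block in ([64], [128], [256, 256], [384, 384, 256]):
        for channels in block:  # 3x3 convolutions with stride 1
            model.add(layers.Conv2D(channels, 3, strides=1, padding="same"))
            model.add(layers.LeakyReLU(0.2))
        model.add(layers.MaxPooling2D(pool_size=3, strides=2, padding="same"))
    model.add(layers.Flatten())
    for _ in range(2):  # Dense1 / Dense2, each followed by 20% dropout
        model.add(layers.Dense(256))
        model.add(layers.LeakyReLU(0.2))
        model.add(layers.Dropout(0.2))
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model
```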

3. Experiments and Analysis

We conducted the experiments on a computer with the following specifications: [email protected], 8 GB of RAM, an NVIDIA GeForce GTX 1650 GPU, and the Windows 10 operating system. We used the TensorFlow 2.1 library with Python 3.7.

3.1. Experimental Setup

A flexible, adjustable experimental setup was built to capture gear face information and construct a multi-category gear face dataset, as shown in Figure 7. The device captures images of the upper and lower ends of the defective gears by adjusting the camera and light source, respectively. Light source 1 was a white dome light, light source 2 was a white ring light, and we used two Basler 300,000-pixel industrial cameras. The specific parameters of each component are shown in Table 4. By adjusting each camera's aperture, exposure time, and focal length, the images taken at the two stations were made similar. In addition, to ensure that image edges were sharp and defects obvious, the device was placed in an environment with little interference from external light.

3.2. Evaluation of the Images Generated through DCGAN

The Inception Score (IS) [20] is a common metric for evaluating the quality of generated images. It uses an Inception v3 network [21] pre-trained on ImageNet: the input is an image produced by the generator, and the output is the probability that the sample belongs to each target class. For a good generator, we expect the generated images to be clear, i.e., the conditional entropy H(y|x) of p(y|x) should be low, and at the same time we expect them to be diverse, i.e., the entropy H(y) of p(y) should be high. IS takes into account both the quality and the diversity of the images, and is computed as follows:
\[ \mathrm{IS}(G) = \exp\left\{ \frac{1}{N}\sum_{i=1}^{N} p(y \mid x_i)\left[\log p(y \mid x_i) - \log p(y)\right] \right\} = \exp\left\{ H(y) - H(y \mid x) \right\} \tag{1} \]
where x represents a generated image, y the label predicted by the Inception model, p(y|x) the probability distribution of the image over the categories, and p(y) the marginal distribution of categories over the generated images. A better generator model should therefore have a higher IS value.
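A small NumPy sketch of Equation (1), computing IS from the class probabilities predicted by a classifier (here the CNN of Section 2.2 stands in for Inception v3, as described below):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    # probs: (N, C) array of p(y|x) for N generated images and C classes.
    p_y = probs.mean(axis=0)                                 # marginal p(y)
    kl = probs * (np.log(probs + eps) - np.log(p_y + eps))   # per-sample KL terms
    return float(np.exp(kl.sum(axis=1).mean()))              # exp of mean KL
```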
To evaluate the image generation model, 75 defect samples of each category were randomly selected from the defect images generated by DCGAN to form Test Set 1. As a control group, real defect samples were selected to form Test Set 2. Since the gear end face defects in this article are not among the ready-made image classes of ImageNet, the CNN model of Section 2.2 was used to classify the gear end face defects and compute the corresponding IS. In Table 5, column 2 shows the score when all defect types are considered together (as a multi-class classification); the last four columns give the scores for each defect type evaluated separately (as a binary classification). As can be seen, the IS of the generated images is only slightly lower than that of the real images, indicating that the generated images are clear and similar to the real ones.
However, IS is also flawed: it only considers the confidence of p(y|x) and p(y), not the correctness of the classification. If generated images are assigned wrong labels with high confidence, their IS will still be high. To analyze classification correctness, we calculated the accuracy, i.e., the ratio of correctly predicted samples (the sum of true positives and true negatives) to all samples:
\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{2} \]
In the above equation, TP is the number of defective products correctly classified; FP is the number of non-defective products mislabeled as defective; TN is the number of non-defective products correctly classified; and FN is the number of defective products mislabeled as non-defective.
As shown in Table 6, the detection accuracy of stains, breakages, scratches, and circles is 96.00%, 98.67%, 98.67%, and 97.33%, respectively. The average accuracy is 97.67%. This shows that DCGAN performs well in terms of generating gear defect data.

3.3. Selection of Hyperparameters

Parameters and hyperparameters are two important concepts in deep learning models. During the iterations of a convolutional neural network, the parameters (weights and biases) are updated automatically and do not usually need to be set by hand, whereas hyperparameters are external variables that must be set manually, usually based on experience. Different hyperparameter combinations affect how the network parameters update and whether the model converges.
For hyperparameter search we chose grid search, which takes a long time but can usually reach the best solution within the searched space; accuracy was evaluated on the validation image set. Table 7 shows the hyperparameter evaluation results for our model. It can be seen that the learning rate influenced accuracy more than the batch size did. The optimal settings were a learning rate of 0.0001, a batch size of 16, and the Adam backpropagation algorithm. Optimal hyperparameters were determined for the other network models in the same way; the resulting combinations are shown in Table 8. All subsequent experiments were carried out with these optimal hyperparameters.
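A minimal sketch of the grid search over the Table 7 search space; `train_and_evaluate` is a hypothetical caller-supplied function returning validation accuracy, not part of the authors' code.

```python
import itertools

def grid_search(train_and_evaluate):
    # Search space from Table 7; train_and_evaluate(lr, bs) -> val. accuracy.
    results = {}
    for lr, bs in itertools.product([3e-4, 1e-4, 5e-5], [8, 16, 32]):
        results[(lr, bs)] = train_and_evaluate(lr, bs)
    best = max(results, key=results.get)
    return best, results
```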

3.4. Comparison with Different Models

After setting the hyperparameters, the accuracy and loss of the different network models were evaluated on the validation set. In our experimental setup, the validation images were evaluated once per epoch to measure the accuracy and loss of the model. The cross-entropy loss function was used and the weights of all network models were randomly initialized. For Vgg11 and Vgg16, the last layer was set to 5 channels and the same dataset was used as for the proposed model with GAN, whereas the proposed model without GAN used only traditional data enhancement to expand the dataset.
Under these settings, we obtained the accuracy and loss of the four network models on the training and validation sets, as shown in Figure 8. On the training set (Figure 8a,b), the accuracy of all models stabilized at about 98% after 40 epochs, with the loss stabilizing at about 0.05; these results are respectable. On the validation set, however, the models differ considerably. Figure 8c shows that all four models reached high accuracy as the number of epochs increased. Vgg11, Vgg16, and the proposed model with GAN converged faster, reaching 90% accuracy after only 6 epochs, whereas the proposed model without GAN required 12 epochs. However, the Vgg11 and Vgg16 curves fluctuated strongly and were less stable, probably because of the large number of parameters in those models; there was also a certain risk of overfitting on the small gear end face dataset. In terms of final accuracy, the proposed model with GAN reached 98.40% after 50 epochs, which is 1.33 percentage points, 2.13 percentage points, and 2.93 percentage points higher than Vgg11, Vgg16, and the proposed model without GAN, respectively. Figure 8d shows that the validation loss of the proposed model with GAN converged to 0.11 after 50 epochs, lower than that of the other models. Although the proposed model without GAN was more stable, its reduced image data led to worse final accuracy and loss than the proposed model with GAN. This shows that the proposed model with GAN has significant advantages in identifying gear end face defects.
In the testing phase, the 375 test set images were examined and the normalized confusion matrix of each network model was obtained, as shown in Figure 9. The proposed model with GAN recognized normal, stain, scratch, and circle images with high accuracy (99%), as shown in Figure 9a. We observed that with the proposed model with GAN, a small proportion of the breakage images were sometimes classified as normal; this also happened with the other networks. This is potentially because breakage is a shape defect, while the other gear end face defect types are mostly texture-based, so the neural network pays more attention to texture differences and ignores some hard-to-identify shape defects. Even so, we can subsequently extract the gear foreground by blob analysis and screen out breakages by the size of the foreground area. The other network models performed worse than the proposed model with GAN at identifying the other defect categories. For example, in Figure 9b, the proposed model without GAN predicted 13% of normal images as circles; in Figure 9c, Vgg11 confused the stain and scratch categories, recognizing 4% of stains as scratches and 3% of scratches as stains; and in Figure 9d, Vgg16 identified 13% of scratches as stains.
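The blob-analysis fallback mentioned above could be sketched with OpenCV as follows; the threshold value and area bound are illustrative placeholders, not values from the paper.

```python
import cv2

def breakage_by_area(gray, thresh=60, min_area=30000.0):
    # Threshold the gear foreground, take the largest contour's area, and
    # flag the part as broken if the area falls below the expected range.
    _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    area = max((cv2.contourArea(c) for c in contours), default=0.0)
    return area < min_area
```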
From the above analysis it can be concluded that, through optimization of the CNN structure, the proposed model outperforms the original Vgg network, demonstrating that the improvements proposed in this paper are effective. The deep structure and the small strides and filter sizes enable the network to capture combined features, add invariance and class discrimination, and improve defect classification performance [22,23], while the dropout layers effectively alleviate overfitting by randomly disabling neurons, giving the network higher efficiency and better generalization ability [24].
For binary classification, we used the F1 score as the performance metric, which depends on precision and recall. Precision (P) is the ratio of true positives to the number of positively predicted samples (the sum of true positives and false positives); it indicates the correctness of the results, as shown in Equation (3). Recall (R) is the ratio of true positives to the number of actual positive samples (the sum of true positives and false negatives); it indicates the completeness of the results, as shown in Equation (4). The F1 score is the harmonic mean of precision and recall, taking both correctness and missed detections into account, as shown in Equation (5).
\[ P = \frac{TP}{TP + FP} \tag{3} \]
\[ R = \frac{TP}{TP + FN} \tag{4} \]
\[ F1 = \frac{2 \times P \times R}{P + R} = \frac{2TP}{2TP + FN + FP} \tag{5} \]
The F1 score also has variants for multi-class problems: micro-F1 and macro-F1 are often used to evaluate multi-class models. The two differ considerably on imbalanced multi-class problems [25], but for the balanced dataset used in our experiments the difference is small. Here, the macro-F1 score was used as the evaluation metric. As shown in Equations (6)–(8), macro-precision (P_macro), macro-recall (R_macro), and the macro-F1 score (F1_macro) are the arithmetic means of the precision, recall, and F1 score of each class, respectively. A higher macro-F1 score means that the multi-class results are more accurate and complete.
\[ P_{macro} = \frac{1}{|G|}\sum_{i=1}^{|G|}\frac{TP_i}{TP_i + FP_i} = \frac{\sum_{i=1}^{|G|} P_i}{|G|} \tag{6} \]
\[ R_{macro} = \frac{1}{|G|}\sum_{i=1}^{|G|}\frac{TP_i}{TP_i + FN_i} = \frac{\sum_{i=1}^{|G|} R_i}{|G|} \tag{7} \]
\[ F1_{macro} = \frac{1}{|G|}\sum_{i=1}^{|G|}\frac{2 \times P_i \times R_i}{P_i + R_i} = \frac{\sum_{i=1}^{|G|} F1_i}{|G|} \tag{8} \]
where i indexes the categories, |G| is the total number of categories, and P_i, R_i, and F1_i are the precision, recall, and F1 score of category i, respectively.
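Equations (3)–(8) can be computed directly from a confusion matrix; a minimal sketch (rows = true class, columns = predicted class):

```python
import numpy as np

def macro_scores(cm, eps=1e-12):
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)              # correctly classified per class
    fp = cm.sum(axis=0) - tp      # predicted as class i but actually another class
    fn = cm.sum(axis=1) - tp      # class i samples predicted as something else
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    # Macro-averaging: unweighted mean over the |G| classes.
    return precision.mean(), recall.mean(), f1.mean()
```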
Table 9 shows the macro-precision, macro-recall, macro-F1 score, parameter count, modeling time, and training time of the four network models. As can be seen, after 50 epochs the proposed model with GAN yielded the highest macro-precision, macro-recall, and macro-F1, indicating that its detection results are both accurate and comprehensive.
Compared to Vgg11, the proposed model with GAN reduced the parameters from 128.80 M to 5.76 M, the modeling time from 1.72 h to 0.34 h, and the training time from 0.73 h to 0.32 h, while the macro-F1 increased from 97.06% to 98.40%; the proposed model is thus superior to Vgg11 on all of these indicators. Compared with the proposed model without GAN, the proposed model with GAN was 0.14 h and 0.15 h slower in modeling and training time, respectively, but because the GAN provided ample training data, its macro-F1 increased from 95.36% to 98.40%. The model therefore trades some time cost for greater accuracy, which is usually a desirable trade-off. In summary, the proposed model with GAN effectively compresses the network and removes redundant information while improving performance.

3.5. t-SNE Clustering Visualization

To explain and visualize the working process of the network models, the t-SNE [26] clustering algorithm was used to investigate the causes of misclassification. t-SNE maps high-dimensional features into a low-dimensional space to visualize high-dimensional data. A total of 375 samples from the 5 categories were randomly selected for the demonstration, shown in Figure 10a. After processing by the proposed model, the separation between the dense clusters of mapped feature points was large. In Figure 10b, after processing by the Vgg11 model, the separation between scratches and stains was not obvious and there were overlapping areas, indicating that Vgg11 found it difficult to classify these two defect types. This also explains why scratches and stains were prone to misdetection in the normalized confusion matrix of Vgg11. The t-SNE study helped improve the model; our future work will focus on improving performance on such similar defect types.
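A sketch of this visualization using scikit-learn's t-SNE on the last-layer features; the perplexity and plot styling are our choices, not stated in the paper.

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features, labels, class_names):
    # features: (N, D) activations of the network's last hidden layer;
    # labels: (N,) integer class indices.
    emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
    for c, name in enumerate(class_names):
        pts = emb[labels == c]
        plt.scatter(pts[:, 0], pts[:, 1], s=8, label=name)
    plt.legend()
    plt.show()
```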

4. Conclusions

Acquiring sufficient image data for efficient training of CNNs for defect detection in gear parts is a challenging research problem. We used the ROS method to feed copied data as real input to the D network, allowing the DCGAN to generate high-quality synthetic image data for training a defect detection CNN. The proposed DCGAN, in conjunction with traditional data augmentation techniques, was able to generate a robust training image set in our experiments. The synthetic images generated by the DCGAN achieved an IS of 3.850 ± 0.027 and an average accuracy of 97.67%, indicating that they are clear and meaningful.
We then introduced an improved CNN based on the Vgg11 network, trained on the aforementioned image set. The proposed CNN uses the Leaky ReLU activation function and dropout layers, extracting features efficiently while alleviating overfitting. Compared with the reference network models in our experiments, the proposed model with GAN reached the best macro-F1 score of 98.40%, showing that it performs defect detection with a low error rate and a high detection rate. The proposed model effectively compresses the network without degrading performance.
Finally, we performed t-SNE clustering to visualize and compare the classification capability of the proposed CNN and the Vgg11 network. The t-SNE clustering reinforces our earlier findings, indicating higher detection accuracy for the proposed model with GAN.
The above work achieved high accuracy in identifying gear surface defects, but some shape defects (such as dimensional deviations) still need to be screened in a first step by a preset program. Future work will therefore aim to combine the two into a complete gear defect detection algorithm.

Author Contributions

Methodology, H.G.; supervision, D.W.; validation, W.L. and J.Y.; writing—original draft, H.G.; writing—review and editing, Y.Z. and T.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the Key Research and Development Program of Zhejiang Province, grant numbers 2022C01139 and 2019C01134, and in part by the Key Research and Development Program of Jinhua, grant number 2021-1-001a.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Kumar, A.; Gandhi, C.P.; Zhou, Y.Q.; Kumar, R.; Xiang, J.W. Latest developments in gear defect diagnosis and prognosis: A review. Measurement 2020, 158, 107735. [Google Scholar] [CrossRef]
  2. He, X.S.; Wu, W.Q. A Practical Numerical Approach to Characterizing Non-Linear Shrinkage and Optimizing Dimensional Deviation of Injection-Molded Small Module Plastic Gears. Polymers 2021, 13, 2092. [Google Scholar] [CrossRef] [PubMed]
  3. Speiser, J.L.; Miller, M.E.; Tooze, J.; Ip, E. A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst. Appl. 2019, 134, 93–101. [Google Scholar] [CrossRef] [PubMed]
  4. Wu, Y.; Lu, Y.J. An intelligent machine vision system for detecting surface defects on packing boxes based on support vector machine. Meas. Control 2019, 52, 1102–1110. [Google Scholar] [CrossRef] [Green Version]
  5. Moayedi, H.; Mosallanezhad, M.; Rashid, A.S.A.; Jusoh, W.A.W.; Muazu, M.A. A systematic review and meta-analysis of artificial neural network application in geotechnical engineering: Theory and applications. Neural Comput. Appl. 2020, 32, 495–518. [Google Scholar] [CrossRef]
  6. Goyal, D.; Dhami, S.S.; Pabla, B.S. Non-Contact Fault Diagnosis of Bearings in Machine Learning Environment. IEEE Sens. J. 2020, 20, 4816–4823. [Google Scholar] [CrossRef]
  7. Sheng, J.L.; Dong, S.J.; Liu, Z. Bearing fault diagnosis based on intrinsic time-scale decomposition and improved Support vector machine model. J. Vibroengineering 2016, 18, 849–859. [Google Scholar]
  8. Ikhlef, B.; Rahmoune, C.; Toufik, B.; Benazzouz, D. Gearboxes fault detection under operation varying condition based on MODWPT, Ant colony optimization algorithm and Random Forest classifier. Adv. Mech. Eng. 2021, 13, 16878140211043004. [Google Scholar] [CrossRef]
  9. Liu, Z.H. Soft-shell Shrimp Recognition Based on an Improved AlexNet for Quality Evaluations. J. Food Eng. 2020, 266, 109698. [Google Scholar] [CrossRef]
  10. Zhao, R.; Yan, R.; Chen, Z.; Mao, K.; Wang, P.; Gao, R.X. Deep learning and its applications to machine health monitoring. Mech. Syst. Signal Process. 2019, 115, 213–237. [Google Scholar] [CrossRef]
  11. Allam, A.; Moussa, M.; Tarry, C.; Veres, M. Detecting Teeth Defects on Automotive Gears Using Deep Learning. Sensors 2021, 21, 8480. [Google Scholar] [CrossRef] [PubMed]
  12. Kumar, A.; Vashishtha, G.; Gandhi, C.P.; Tang, H.S.; Xiang, J.W. Sparse transfer learning for identifying rotor and gear defects in the mechanical machinery. Measurement 2021, 179, 109494. [Google Scholar] [CrossRef]
  13. Lv, Q.W.; Song, Y.H. Few-shot Learning Combine Attention Mechanism-Based Defect Detection in Bar Surface. ISIJ Int. 2019, 59, 1089–1097. [Google Scholar] [CrossRef] [Green Version]
  14. Gao, X.; Deng, F.; Yue, X.H. Data augmentation in fault diagnosis based on the Wasserstein generative adversarial network with gradient penalty. Neurocomputing 2020, 396, 487–494. [Google Scholar] [CrossRef]
  15. Radford, A.; Metz, L.; Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv 2015, arXiv:1511.06434. [Google Scholar]
  16. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2014. [Google Scholar]
  17. Chen, P.; Li, Y.; Wang, K.S.; Zuo, M.J.; Heyns, P.S.; Baggerohr, S. A threshold self-setting condition monitoring scheme for wind turbine generator bearings based on deep convolutional generative adversarial networks. Measurement 2021, 167, 108234. [Google Scholar] [CrossRef]
  18. Luo, J.; Huang, J.Y.; Li, H.M. A case study of conditional deep convolutional generative adversarial networks in machine fault diagnosis. J. Intell. Manuf. 2021, 32, 407–425. [Google Scholar] [CrossRef]
  19. Oz, M.A.N.; Mercimek, M.; Kaymakci, O.T. Anomaly localization in regular textures based on deep convolutional generative adversarial networks. Appl. Intell. 2022, 52, 1556–1565. [Google Scholar] [CrossRef]
  20. Salimans, T.; Goodfellow, I.J.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved Techniques for Training GANs. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  21. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 24–27 June 2014. [Google Scholar]
  22. Raghu, M.; Poole, B.; Kleinberg, J.M.; Ganguli, S.; Sohl-Dickstein, J. On the expressive power of deep neural networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017. [Google Scholar]
  23. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  24. Poernomo, A.; Kang, D.K. Biased Dropout and Crossmap Dropout: Learning towards effective Dropout regularization in convolutional neural network. Neural Netw. 2018, 104, 60–67. [Google Scholar] [CrossRef]
  25. Ma, D.; Liu, J.; Fang, H.; Wang, N.; Zhang, C.; Li, Z.; Dong, J. A Multi-defect detection system for sewer pipelines based on StyleGAN-SDM and fusion CNN. Constr. Build. Mater. 2021, 312, 125385. [Google Scholar] [CrossRef]
  26. Van der Maaten, L.; Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
Figure 1. DCGAN general framework.
Figure 2. Flow chart of the overall proposed experimental procedure.
Figure 3. Front view and side view of the gears.
Figure 4. Examples from the gear end face defect dataset: (A1–A5) normal; (B1–B5) stain; (C1–C5) scratch; (D1–D5) circle; (E1–E5) breakage.
Figure 5. Scratch defect image generation results: (a) 10 steps from the original input; (b) 200 steps from the original input; (c) 500 steps from the original input; (d) 1000 steps from the original input; (e) 10 steps from ROS input; (f) 200 steps from ROS input; (g) 500 steps from ROS input; (h) 1000 steps from ROS input.
Figure 6. Schematic diagram of the proposed model.
Figure 7. Experimental equipment layout.
Figure 8. Comparison of results for each network model: (a) course of training accuracy; (b) course of training loss; (c) course of validation accuracy; (d) course of validation loss.
Figure 9. Normalized confusion matrices of gear end classification for different networks: (a) proposed model with GAN; (b) proposed model without GAN; (c) Vgg11; (d) Vgg16.
Figure 10. A 2D feature visualization using the t-SNE algorithm on the last layer of the network: (a) proposed model with GAN; (b) Vgg11.
Table 1. Structure of the generative model.

| Number | Layer Name | Channel | Kernel Size | Stride | Activation Function |
| --- | --- | --- | --- | --- | --- |
| 1 | Fully connected layer | 256 | - | - | ReLU |
| 2 | Deconvolution layer | 128 | 5 × 5 | 1 × 1 | ReLU |
| 3 | Deconvolution layer | 64 | 5 × 5 | 2 × 2 | ReLU |
| 4 | Deconvolution layer | 3 | 5 × 5 | 2 × 2 | Tanh |
Table 2. Structure of the discriminative model.

| Number | Layer Name | Channel | Kernel Size | Stride | Activation Function |
| --- | --- | --- | --- | --- | --- |
| 1 | Convolution layer | 16 | 3 × 3 | 2 × 2 | Leaky ReLU |
| 2 | Convolution layer | 32 | 3 × 3 | 2 × 2 | Leaky ReLU |
| 3 | Convolution layer | 64 | 3 × 3 | 2 × 2 | Leaky ReLU |
| 4 | Convolution layer | 128 | 3 × 3 | 2 × 2 | Leaky ReLU |
| 5 | Fully connected layer | 1 | - | - | Leaky ReLU |
Table 3. Structure of the proposed model.

| Number | Layer Name | Channel | Kernel Size | Stride | Activation Function |
| --- | --- | --- | --- | --- | --- |
| 1 | Block1_conv1 | 64 | 3 × 3 | 1 × 1 | Leaky ReLU |
| 2 | Block1_pool | - | 3 × 3 | 2 × 2 | - |
| 3 | Block2_conv1 | 128 | 3 × 3 | 1 × 1 | Leaky ReLU |
| 4 | Block2_pool | - | 3 × 3 | 1 × 1 | - |
| 5 | Block3_conv1 | 256 | 3 × 3 | 1 × 1 | Leaky ReLU |
| 6 | Block3_conv2 | 256 | 3 × 3 | 1 × 1 | Leaky ReLU |
| 7 | Block3_pool | - | 3 × 3 | 1 × 1 | - |
| 8 | Block4_conv1 | 384 | 3 × 3 | 1 × 1 | Leaky ReLU |
| 9 | Block4_conv2 | 384 | 3 × 3 | 1 × 1 | Leaky ReLU |
| 10 | Block4_conv3 | 256 | 3 × 3 | 1 × 1 | Leaky ReLU |
| 11 | Block4_pool | - | 3 × 3 | 2 × 2 | - |
| 12 | Dense1 | 256 | - | - | Leaky ReLU |
| 13 | Dropout (20%) | - | - | - | - |
| 14 | Dense2 | 256 | - | - | Leaky ReLU |
| 15 | Dropout (20%) | - | - | - | - |
| 16 | Dense3 | 5 | - | - | Softmax |
Table 4. Detailed parameters for each component.

| Name | Setting | Brand | Amount |
| --- | --- | --- | --- |
| Camera 1 | Resolution: 659 px × 494 px; Pixel size: 5.6 µm × 5.6 µm; Frame rate: 120 fps; Color: Black and white | Basler | 1 |
| Camera 2 | Resolution: 659 px × 494 px; Pixel size: 5.6 µm × 5.6 µm; Frame rate: 120 fps; Color: Color | Basler | 1 |
| Dome light source | Size: W140 mm × H73 mm; Voltage: 24 V; Power: 3.4 W/16.6 W; Color: White | HTC | 1 |
| Ring light source | Size: OD90 mm × ID65 mm × H24 mm; Angle: 30°; Voltage: 24 V; Power: 3.8 W/5.8 W; Color: White | HTC | 1 |
Table 5. Inception score of the real and generated images (IS ± std).

| | All | Stain | Breakage | Scratch | Circle |
| --- | --- | --- | --- | --- | --- |
| Generated by DCGAN | 3.850 ± 0.027 | 1.951 ± 0.014 | 1.982 ± 0.009 | 1.965 ± 0.004 | 1.956 ± 0.002 |
| Real images | 3.947 ± 0.040 | 1.999 ± 0.001 | 1.992 ± 0.012 | 1.997 ± 0.002 | 1.995 ± 0.004 |
Table 6. Accuracy analysis of images generated through DCGAN.

| Predicted \ True | Stain | Breakage | Scratch | Circle | Total Number |
| --- | --- | --- | --- | --- | --- |
| Stain | 72 | 1 | 1 | 2 | 76 |
| Breakage | 0 | 74 | 0 | 0 | 74 |
| Scratch | 0 | 0 | 74 | 0 | 74 |
| Circle | 3 | 0 | 0 | 73 | 76 |
| Total number | 75 | 75 | 75 | 75 | 300 |
| Correct number | 72 | 74 | 74 | 73 | 293 |
| Accuracy | 96.00% | 98.67% | 98.67% | 97.33% | 97.67% |
Table 7. Comparison of different hyperparameter combinations.

| Cases | Learning Rate | Batch Size | Accuracy |
| --- | --- | --- | --- |
| Case 1 | 0.0001 | 16 | 97.4% |
| Case 2 | 0.0001 | 32 | 97.3% |
| Case 3 | 0.0001 | 8 | 96.7% |
| Case 4 | 0.00005 | 16 | 96.3% |
| Case 5 | 0.00005 | 32 | 96.0% |
| Case 6 | 0.00005 | 8 | 95.6% |
| Case 7 | 0.0003 | 16 | 94.7% |
| Case 8 | 0.0003 | 32 | 94.4% |
| Case 9 | 0.0003 | 8 | 94.2% |
Table 8. Hyperparameter selection for different models.

| Models | Learning Rate | Batch Size |
| --- | --- | --- |
| Proposed model with GAN | 0.0001 | 16 |
| Proposed model without GAN | 0.0003 | 16 |
| Vgg11 | 0.00005 | 8 |
| Vgg16 | 0.00005 | 8 |
Table 9. Summary of model performance.

| Models | Macro-Precision (%) | Macro-Recall (%) | Macro-F1 Score (%) | Parameters (M) | Modeling Time (h) | Training Time (h) |
| --- | --- | --- | --- | --- | --- | --- |
| Proposed model without GAN | 95.65 | 95.47 | 95.36 | 5.76 | 0.20 | 0.17 |
| Proposed model with GAN | 98.46 | 98.40 | 98.40 | 5.76 | 0.34 | 0.32 |
| Vgg11 | 97.64 | 97.07 | 97.06 | 128.80 | 1.72 | 0.73 |
| Vgg16 | 96.55 | 96.27 | 96.26 | 134.30 | 2.24 | 1.20 |

The best results for each parameter are printed in bold.