Rolling Bearing Fault Diagnosis Based on Conditional Deep Convolutional Generative Adversarial Networks under Small Samples

Aiming at the low fault diagnosis accuracy caused by insufficient samples and an unbalanced sample distribution in bearing fault diagnosis, this paper proposes a rolling bearing fault diagnosis method that uses a conditional deep convolutional generative adversarial network (C−DCGAN) for efficient data augmentation. Firstly, conditional constraints are used to guide and improve the sample generation process of the original generative adversarial network: specific constraints are added to the data generation model so that multi-category fault data can be expanded in a balanced way for small-sample data sets. Secondly, to address the training instability, gradient disappearance and gradient explosion that arise on imbalanced sample sets, the structure of the generative network is optimized with self-defined skip connections and spectral normalization, and the Wasserstein distance with a penalty term replaces cross entropy as the loss function of the generative adversarial network. These changes improve the stable feature extraction ability of the generative network and the training process, so that simulated sample data deviating only slightly from the real data distribution can be generated. Finally, the complete fault data set (the original data mixed with the generated data, giving sufficient fault categories and sample numbers) is input into a one-dimensional convolutional neural network for rolling bearing fault diagnosis. The experimental results show that the proposed method improves the fault classification of rolling bearings by generating balanced and sufficient sample data.


Introduction
Mechanical equipment is composed of many parts, and bearings are important components in heavy industry; over the long-term operation of mechanical equipment, bearings are often easily damaged. In daily operation, a mechanical failure of a bearing risks accidents, even ones resulting in losses to factories and enterprises [1][2][3]. Therefore, if a specific fault diagnosis is performed on a rolling bearing during operation, the fault type of the bearing can be identified in time and the bearing can be repaired or replaced as appropriate; such precautions effectively ensure the operation of the mechanical equipment and avoid accidents [4,5].
In industrial machinery fault diagnosis, there are two main types of fault diagnosis and detection methods. The first is model-based: the output value of a model is compared with the actual signal data collected during the operation of the machine, so as to distinguish the different potential failures of the machine and to infer and classify them according to the failure phenomenon. The second is a data-driven fault diagnosis method, which usually collects a large number of vibration signals describing the machine's state information during actual production; one such approach combines a generative adversarial network with a convolutional neural network to construct WG−CNN, so as to carry out bearing fault diagnosis through data generation.
In summary, this paper proposes a bearing fault diagnosis method based on a conditional deep convolutional generative adversarial network (C-DCGAN), built on DCGAN with an extra constraint condition C. The problem of the unbalanced distribution of generated data is solved by defining the constraint condition C as the classification labels, which constrain the generating process. According to the experimental results on data from the Case Western Reserve University (CWRU) Bearing Data Center, this method can effectively augment multi-category data from small sample sets and improve the accuracy of bearing fault diagnosis while increasing the training stability of the generating network.

DCGAN
A GAN consists of a generator and a discriminator. Random data is used as the input of the generator, and the generator output G(z) and real sample values are fed into the discriminator at the same time to discriminate the generated data. Through the game between the two, the difference between the data distribution of the generated samples and that of the real samples is gradually reduced; once it falls below a certain extent, the generative model can be considered to have converged, and the generator can then be used to produce data whose distribution is similar to that of the real sample data.
The differences between DCGAN and GAN are that: (1) in the discriminator of DCGAN, the pooling layers are removed and convolutional layers are added to the structure; (2) by adding batch normalization, the problem that all fault samples converge to the same point is solved; (3) all the fully connected layers in the GAN structure are removed and replaced by convolutional layers; (4) the activation functions of each layer of the DCGAN generator and discriminator are modified, and the instability and gradient disappearance of the original GAN are mitigated by exploiting the characteristics and definitions of the different activation functions. DCGAN essentially maximizes the discriminator's accuracy with respect to the source of the input value and minimizes the difference between the generator's output and the real samples. This game process can be seen as a maximization-minimization problem, expressed in Formula (1):

min_G max_D V(D, G) = E_{x∼P_data}[log D(x)] + E_{z∼P_z}[log(1 − D(G(z)))]     (1)

Among them, P_data is the data distribution of the real sample values, and P_z is the data distribution of the random noise.
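As a numerical illustration of the minimax objective in Formula (1), the following sketch evaluates V(D, G) from precomputed discriminator outputs on real and generated mini-batches. This is an illustrative computation only, not the paper's implementation; the function name and sample values are assumptions.

```python
import math

def gan_value(d_real, d_fake):
    """Value of the GAN minimax objective V(D, G) of Formula (1),
    given the discriminator's outputs D(x) on real samples and
    D(G(z)) on generated samples (probabilities in (0, 1))."""
    term_real = sum(math.log(p) for p in d_real) / len(d_real)
    term_fake = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return term_real + term_fake

# A confident discriminator (D(x) -> 1, D(G(z)) -> 0) drives V toward 0;
# a fooled one drives the real-sample term toward -infinity.
v = gan_value([0.9, 0.8], [0.1, 0.2])
```

The generator tries to push V up by making D(G(z)) large, while the discriminator pushes it down on fakes and up on reals, which is exactly the game described above.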

CGAN
In 2014, based on GAN, a condition C was added to each input of the network structure, and the conditional generative adversarial network (CGAN) was proposed [27]. Condition C can be any kind of auxiliary information, such as a specific set of sample input values.
Taking the random noise Z and the constraint C as inputs, the sample G(Z|C) is produced by the generator. Using constraint C, the discriminator distinguishes whether the input value is a real sample or not. The objective function V(D, G) is:

min_G max_D V(D, G) = E_{x∼P_data}[log D(x|C)] + E_{z∼P_z}[log(1 − D(G(Z|C)))]     (2)

CGAN makes use of the constraint condition C to constrain the training process of the generating model: the generator outputs G(Z|C), which conforms to the specific constraint condition, while the discriminator takes the real sample x together with the specific constraint condition C and outputs the probability that the input value is a real sample.
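Conditioning the generator amounts to concatenating the noise vector Z with an encoding of C. The sketch below assumes a one-hot encoding of the 10 fault categories used later in this paper and an arbitrary noise length of 100; both the noise length and the function names are illustrative assumptions, not the paper's code.

```python
import random

NUM_CLASSES = 10   # ten bearing fault categories, as in this paper
NOISE_DIM = 100    # assumed noise length, chosen for illustration only

def one_hot(label, num_classes=NUM_CLASSES):
    """One-hot encode a fault-category label as the constraint C."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def generator_input(label, noise_dim=NOISE_DIM):
    """Concatenate random noise Z with the one-hot constraint C,
    forming the conditional input that G receives in CGAN / C-DCGAN."""
    z = [random.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return z + one_hot(label)

x = generator_input(3)
# len(x) == NOISE_DIM + NUM_CLASSES; the last 10 entries encode class 3
```

The discriminator input is conditioned the same way, pairing each real or generated sample with the one-hot label.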

C-DCGAN for Bearing Fault Diagnosis
In this paper, a C-DCGAN network structure is proposed, in which the bearing fault category label is defined as the specific constraint C. The fault category label is processed by one-hot encoding, and the one-hot code of the original label of the fault data is then defined as the constraint C; there are 10 categories in total, so the generated data can be restricted accordingly. Addressing the training instability, gradient disappearance and gradient explosion of the original network model, the solutions are summarized as follows: (1) Using custom residual blocks, the regularization effect of the generating network is optimized, the feature extraction ability of the network is enhanced, and the authenticity of the generated simulated fault data is improved. (2) In the generator and discriminator, spectral normalization is added to improve the stability of network training and to solve the gradient explosion problem. (3) The loss function is set as the W distance with a penalty term, which effectively solves the problem of gradient disappearance.

Conditional Deep Convolutional Generative Adversarial Network Structure
C−DCGAN uses an iterative training process to learn, under the constraint condition C, the mapping between the random noise and the original fault sample data, and also to supervise whether the generator output accords with condition C; the problems of uneven distribution and an insufficient number of bearing fault samples are solved by generating bearing fault data that obey the distribution of the real fault samples.
The structure of the C−DCGAN network is shown in Figure 1.
In C-DCGAN, the objective function is:

min_G max_D V(D, G) = E_{x∼P_data}[log D(x|C)] + E_{z∼P_z}[log(1 − D(G(Z|C)))]     (3)

E(·) is the expectation of the objective function, P_data is the distribution of the real fault data, and P_z is the distribution of the input noise data.
In the generator, the constraint condition C and the random noise Z are input synchronously, and five transposed convolutional layers are connected to each other; between the transposed convolutional layers, the network structure is deepened by the residual structure [28]: a transposed convolutional layer, an activation function layer, a transposed convolutional layer, the summation operation h(x) and an activation function layer are stacked to form a self-defined residual block. Using the skip connection in the residual block, the mapping F(x) and the input value x are superimposed for output, which reduces additional calculations and parameter values. In the output layer, the Tanh function serves as the activation function, which improves the convergence speed and reduces the number of iterations, while in the input layer and the other intermediate layers, the ReLU function is chosen as the activation function, which simplifies the calculation and alleviates gradient disappearance. The deepened network in this experiment is composed of only two residual blocks, because excessive network depth may lead to problems such as unstable training and loss of data features, as shown in Figure 2.
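The skip connection in the residual block computes h(x) = F(x) + x and then applies the activation. The minimal sketch below uses a placeholder callable in place of the learned transposed-convolution mapping F; the function names and the toy mapping are illustrative assumptions, and shapes are assumed to match.

```python
def relu(v):
    """Element-wise ReLU activation."""
    return [max(0.0, a) for a in v]

def residual_block(x, layer):
    """Self-defined residual block: output h(x) = F(x) + x followed by an
    activation, where `layer` stands in for the stacked
    (transposed-)convolution mapping F of the block."""
    fx = layer(x)
    assert len(fx) == len(x), "skip connection needs matching shapes"
    return relu([a + b for a, b in zip(fx, x)])

# Toy F: scale each element (a real block would be two transposed convs
# with an activation in between).
out = residual_block([1.0, -2.0, 3.0], lambda v: [0.5 * a for a in v])
# -> [1.5, 0.0, 4.5]
```

Because the identity path x bypasses F, gradients flow directly through the sum, which is what lets the deepened generator train stably.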
In the discriminator, the constraint condition C, the generator output G(Z|C) and the real sample value x are input first, and features are then extracted through the defined five convolutional layers, with a residual block structure similar to the generator's added between the convolutional layers. Finally, 0 and 1 are used as the false and true judgment values to output the discrimination results for the generated samples. The Leaky ReLU function is defined as the activation function of all layers, which avoids the neuron necrosis phenomenon of the original activation function, as shown in Figure 3.

Spectral Normalization
In generative adversarial networks, the number of parameters grows rapidly with the number of layers, which increases the amplitude of parameter variation and raises the probability of gradient explosion. By adding spectral normalization to the generator and discriminator respectively [29], the gradient upper bound of the function is restricted to make the function smoother, improving the stability of parameter variation and reducing the probability of gradient explosion.
The relationship between the output and input of the nth convolutional layer of the C−DCGAN network presented in this paper is shown in Formula (4):

x_n = D_n W_n x_{n−1}     (4)

where D_n is the diagonal matrix corresponding to the activation function, and W_n is the parameter matrix. From Formula (4), the output function of C-DCGAN is:

f(x) = D_N W_N D_{N−1} W_{N−1} · · · D_1 W_1 x     (5)

The upper bound of the gradient of the function is restricted by using the Lipschitz constraint [30]. From Formula (5), the gradient of the function satisfies:

‖∇_x f(x)‖_2 = ‖∏_{n=1}^{N} D_n W_n‖_2 ≤ ∏_{n=1}^{N} ‖D_n‖_2 ‖W_n‖_2 ≤ ∏_{n=1}^{N} ‖W_n‖_2     (6)

where ‖W‖_2 is the spectral norm of the matrix W, which can also be written as:

‖W‖_2 = σ(W)     (7)

σ(W) is the maximum singular value of the matrix W. Formula (7) is normalized as follows:

W_SN = W / σ(W)     (8)

The essential process of spectral normalization is an SVD performed layer by layer, which is instead approximated by power iteration because of the time cost and computing resources of exact SVD.
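The power iteration mentioned above can be sketched in a few lines: iterating v ← Wᵀ(Wv) converges to the top right singular vector, from which σ(W) is read off and the weight matrix is rescaled. This is a minimal pure-Python sketch (list-of-rows matrices, fixed iteration count), not the paper's implementation.

```python
def spectral_norm(W, iters=50):
    """Approximate sigma(W), the largest singular value of W, by power
    iteration -- the cheap substitute for full SVD used in spectral
    normalization. W is a list of rows; `iters` is an assumed budget."""
    rows, cols = len(W), len(W[0])
    v = [1.0] * cols
    for _ in range(iters):
        # u = W v, then v = W^T u, then renormalize v
        u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        v = [sum(W[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = sum(a * a for a in v) ** 0.5
        v = [a / norm for a in v]
    u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return sum(a * a for a in u) ** 0.5   # ||W v|| = sigma for unit v

W = [[2.0, 0.0], [0.0, 1.0]]
sigma = spectral_norm(W)                       # ~2.0
W_sn = [[w / sigma for w in row] for row in W] # normalized weights, sigma ~1
```

After the division in Formula (8), the spectral norm of W_SN is 1, which is what enforces the per-layer Lipschitz bound.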
After spectral normalization is added, the parameters of the generative network and the discriminative network are shown in Tables 1 and 2.

Loss Function
KL divergence or JS divergence is usually used to calculate the distance in the original GAN. However, because of the asymmetry of the KL divergence, different loss values are obtained for the same difference depending on the direction of calculation. The JS divergence, meanwhile, is constant when the two probability distributions do not overlap, so over-training results in a loss value that converges to a fixed value and cannot yield a gradient.
Unlike the KL and JS divergences, the W (Wasserstein) distance can be understood as the minimum cost under an optimal transport plan, and it can be used to compute the distance between two probability distributions. Even if there is no overlap between the two probability distributions, or the overlap is small, the W distance can still measure the distance between them. Therefore, the W distance not only provides the distance between distributions but also guarantees the reliability of the gradient information. Defining W as the loss function of this experiment can, to a large extent, solve the problems of training instability and gradient disappearance.
The W distance is defined as follows:

W(P_data, P_g) = inf_{γ∈Π(P_data, P_g)} E_{(x,y)∼γ}[‖x − y‖]     (9)

By the Kantorovich-Rubinstein duality, the definition of the W distance is transformed as follows:

W(P_data, P_g) = sup_{‖D‖_L ≤ 1} E_{x∼P_data}[D(x)] − E_{x̃∼P_g}[D(x̃)]     (10)

This indicates that the training of the network needs to satisfy the Lipschitz constraint condition ‖D‖_L ≤ 1; that is, the amplitude of the parameter changes in the training process must be restricted within a certain range.
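For intuition about the definition above, in one dimension the empirical W distance between two equal-sized sample sets has a closed form: after sorting, the optimal transport plan is the identity pairing, so the distance is the mean absolute gap. This sketch is for intuition only (the network itself estimates W through the critic); the function name is an assumption.

```python
def w1_distance(xs, ys):
    """Empirical 1-D Wasserstein (earth mover's) distance between two
    equal-sized sample sets: with sorted samples, the optimal transport
    plan pairs them in order, so W1 is the mean absolute gap."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

# Non-overlapping distributions still yield a meaningful, smoothly varying
# distance, unlike the JS divergence, which saturates at a constant.
d = w1_distance([0.0, 0.1, 0.2], [5.0, 5.1, 5.2])   # -> 5.0
```

This smooth dependence on the gap between distributions is exactly why the W distance keeps providing useful gradients where JS divergence does not.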
In this paper, we modify the loss function of the network by adding a gradient penalty term [31] so that the Lipschitz condition is satisfied. The expression of the gradient penalty term is as follows:

L_gp = λ E_{x̂∼P_x̂}[(‖∇_x̂ D(x̂)‖_2 − 1)^2]     (11)

where λ is the regularization coefficient and x̂ is a random sampling value between the real sample value x and the generated sample value.
According to the above contents and formulas, the loss functions of the generator and discriminator can be obtained, which are Formulas (12) and (13), respectively:

L_G = −E_{z∼P_z}[D(G(z|C))]     (12)

L_D = E_{z∼P_z}[D(G(z|C))] − E_{x∼P_data}[D(x|C)] + λ E_{x̂∼P_x̂}[(‖∇_x̂ D(x̂)‖_2 − 1)^2]     (13)
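Numerically, both losses reduce to simple averages once the critic outputs and the interpolate gradient norms are available. The sketch below takes those as precomputed scalars (autograd is assumed to happen elsewhere); the value lam = 10 is the commonly used WGAN-GP coefficient and is an assumption here, as are the function names.

```python
def generator_loss(d_fake):
    """Generator loss in the style of Formula (12): -E[D(G(z|C))]."""
    return -sum(d_fake) / len(d_fake)

def critic_loss(d_fake, d_real, grad_norms, lam=10.0):
    """Discriminator (critic) loss in the style of Formula (13):
    E[D(G(z|C))] - E[D(x|C)] + lam * E[(||grad D(x_hat)|| - 1)^2].
    grad_norms are the gradient norms at the interpolates x_hat."""
    mean = lambda v: sum(v) / len(v)
    gp = mean([(g - 1.0) ** 2 for g in grad_norms])
    return mean(d_fake) - mean(d_real) + lam * gp

lg = generator_loss([0.2, 0.4])                      # -> -0.3
ld = critic_loss([0.2, 0.4], [1.0, 1.2], [1.0, 1.1]) # -> -0.75
```

Note that the penalty is two-sided: gradient norms both above and below 1 are pushed toward 1, which is what keeps the critic approximately 1-Lipschitz.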

Training Process of C-DCGAN
The optimization and improvement points of C-DCGAN have been described in detail in Sections 2.1 and 2.2 of this paper. Algorithm 1 is the pseudo-code of the improved C-DCGAN algorithm.

Algorithm 1. Training of the improved C-DCGAN.
1. while i < I do
2.   for step = 1, ..., Step do
3.     Sample a mini-batch of n noise samples from the noise distribution Pz(z)
4.     Add the constraints C to the generator and input the noise samples to obtain generated data
5.     Update the discriminator by its stochastic gradient
6.   end for
7.   Sample a mini-batch of n noise samples from the noise distribution Pz(z)
8.   Add the constraints C to the discriminator
9.   Update the generator by its stochastic gradient
10. end while

I is the maximum number of iterations in the training process, and Step is the number of times the discriminator is trained in each iteration.
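The control flow of Algorithm 1 (Step discriminator updates per generator update, for I iterations) can be sketched independently of the actual gradient steps, which are passed in as callables. This skeleton only demonstrates the alternation; the names and the callable interface are assumptions for illustration.

```python
def train_c_dcgan(max_iters, d_steps, update_d, update_g):
    """Skeleton of Algorithm 1: per outer iteration (up to I = max_iters),
    the discriminator is updated `d_steps` times (Step), then the
    generator once. Sampling, conditioning on C and the gradient math
    are delegated to the supplied callables."""
    for _ in range(max_iters):
        for _ in range(d_steps):
            update_d()   # sample noise + real mini-batch, add C, step D
        update_g()       # sample noise, add C, step G

# Count the updates to verify the alternation (I = 5, Step = 3).
counts = {"d": 0, "g": 0}
train_c_dcgan(5, 3,
              update_d=lambda: counts.__setitem__("d", counts["d"] + 1),
              update_g=lambda: counts.__setitem__("g", counts["g"] + 1))
# counts == {"d": 15, "g": 5}
```

Training the critic several times per generator step keeps the W-distance estimate accurate before each generator update, which is the usual rationale for Step > 1.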

One-Dimensional Convolutional Neural Network
The one-dimensional convolutional neural network (1−D−CNN) is a deep feedforward neural network. Since the bearing fault data is one-dimensional, a 1−D−CNN can be used for one-dimensional sequence processing and convolution operations.
Since a convolutional neural network can achieve multi-category classification, this paper uses a 1−D−CNN to classify the faults of rolling bearings, mixing the original fault data and the generated sample data and inputting them into the 1−D−CNN network for classification and diagnosis.
In this paper, the bearing fault sample data is input into the multiple convolutional layers of the 1−D−CNN model; the input data is convolved by each convolutional layer, the ReLU function is used to activate and add nonlinear features to the feature maps, and a max pooling layer is then used to reduce the input data length of each layer and lower the probability of overfitting [32]. Finally, the fault data features are processed through the fully connected layer and input to the softmax layer to calculate the probability of each fault category and output the classification result. The 1−D−CNN is iteratively trained, and the parameters of the model are updated according to the value of the cross-entropy loss function calculated in each iteration until the end of training.
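The per-layer operations just described (1-D convolution, ReLU activation, max pooling) can be shown on a toy signal. This is a pure-Python sketch of the operations themselves, not the paper's network; kernel values and pool size are illustrative assumptions.

```python
def conv1d(x, kernel):
    """Valid-mode 1-D convolution (cross-correlation, as in CNNs)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def relu(v):
    """Element-wise ReLU nonlinearity."""
    return [max(0.0, a) for a in v]

def maxpool1d(x, size=2):
    """Non-overlapping max pooling: shortens the sequence, keeps peaks."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

# conv([1..5], [1,1]) -> [3,5,7,9]; relu unchanged; pool -> [5, 9]
feat = maxpool1d(relu(conv1d([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0])))
```

Stacking several such conv/ReLU/pool stages, then a fully connected layer and softmax, gives the fault-classification pipeline described above.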
The 1−D−CNN parameters in this paper are shown in Table 3.

Fault Diagnosis Algorithm
Because the fault diagnosis classification model needs a sufficiently robust data set to ensure its effectiveness, rich data categories and other characteristics will improve its performance. At the same time, it is necessary to maintain the stability of the training process of the generative adversarial network. To address the problems mentioned above, a fault diagnosis method based on the C−DCGAN network structure is proposed in this paper, and its main flow is shown in Figure 4.

Data Set
For this paper, the experimental data from the rolling bearing's inner ring, outer ring and rolling body were recorded after electric spark machining had been used to seed the rolling bearing with single-point faults of different degrees. Figure 5 shows the test stand; the diameters of the fault damage were 0.18 mm, 0.36 mm, 0.54 mm, and 0.71 mm, respectively. The vibration signals were then collected under loads of 0 HP, 1 HP, 2 HP and 3 HP, with a sampling frequency of 12 kHz. In this experiment, the faults are divided into 10 categories. In order to ensure that the statistical characteristic distribution of the fault samples follows the distribution of the overall fault characteristics, 1024 consecutive sample points are intercepted from the original vibration signal for each fault sample, and three data sets A, B and C are constructed. Data set A is used to train C−DCGAN to generate high-quality fault sample data, so it mainly contains the original fault sample data of the various fault types in the original fault data set. Data set B is used to train the classification model for fault classification and diagnosis: the simulated fault samples generated from data set A are mixed into the original data set A, and the mixed fault data is used as the training set. Data set C is a test set used to test the trained, converged classification model. The numbers of experimental samples are shown in Table 4.
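The sample construction described above (cutting 1024 consecutive points per sample from the raw vibration signal) can be sketched as a windowing function. Whether the paper's windows overlap is not stated, so non-overlapping segments are assumed here; the function name and step parameter are illustrative.

```python
def segment_signal(signal, length=1024, step=1024):
    """Cut a raw vibration signal into fixed-length samples of 1024
    consecutive points, as when building data sets A, B and C.
    step < length would give overlapping windows (non-overlapping
    segmentation is assumed here)."""
    return [signal[i:i + length]
            for i in range(0, len(signal) - length + 1, step)]

# One second of signal at the 12 kHz sampling rate -> 11 full samples.
samples = segment_signal(list(range(12000)))
```

With overlapping windows (smaller step), more training samples can be extracted from the same recording, which is a common alternative when data is very scarce.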

Experimental Results and Comparative Analysis
In order to verify the feasibility of the fault diagnosis method based on the C-DCGAN-extended data set, comprehensive indexes combining positive-case and negative-case accuracy are used to evaluate the method.
"Accuracy" is the ratio of the number of correctly classified fault samples to the total number of fault samples in the fault test set.
"Recall" is the ratio of the number of correctly classified positive fault samples to the number of actual positive fault samples.
"Specificity" is the ratio of the number of correctly classified negative samples to the number of actual negative samples.
G-mean, an index obtained by combining the accuracies of the positive and negative cases, is usually used to evaluate the classification effect when the data distribution is unbalanced.

Accuracy = (TP + TN) / (TP + FN + FP + TN)     (14)

Recall = TP / (TP + FN)     (15)

Specificity = TN / (TN + FP)     (16)

G-mean = √(Recall × Specificity)     (17)

TP refers to the number of real samples judged as true sample categories; TN refers to the number of false samples judged as false sample categories; FP refers to the number of false samples judged as true sample categories, that is, the number of samples that are mistakenly classified; FN refers to the number of real samples judged as false sample categories, that is, the number of real samples omitted.
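Given the confusion counts defined above, the four evaluation indexes are direct ratios; the following sketch computes them together (the function name and the example counts are illustrative, not the paper's results).

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Accuracy, Recall, Specificity and G-mean from the confusion counts
    TP, TN, FP, FN; G-mean is the geometric mean of Recall and
    Specificity, robust to unbalanced class distributions."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    g_mean = math.sqrt(recall * specificity)
    return accuracy, recall, specificity, g_mean

acc, rec, spec, g = classification_metrics(tp=90, tn=80, fp=20, fn=10)
# acc = 0.85, rec = 0.9, spec = 0.8, g ~ 0.849
```

Unlike plain accuracy, G-mean collapses toward 0 whenever either class is classified badly, which is why it is preferred for unbalanced fault data.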
The accuracy and the value of the loss are shown in Figures 6 and 7. It can be seen from Figure 8 that the fault sample data generated by C−DCGAN is not exactly the same as the original data, but its overall distribution is similar to that of the original sample data. At the same time, the generated data expands the diversity of the real sample data, which proves that the generative network can be used to effectively expand an unbalanced data set, so as to solve the problem of data imbalance. As shown in Figure 9, the original unbalanced fault data set and the expanded fault data set are used for fault diagnosis by the same classifier, and T−SNE visualization is performed. Figure 9a is the diagnostic classification result of the original unbalanced fault data set; it can be seen that different types of faults overlap in many cases. Figure 9b shows the diagnostic classification results after the data set is balanced and expanded by the generative model structure in this paper. It can be seen that C−DCGAN can generate multi-category fault sample data, and that mixing the generated simulated fault data into the original fault data set and using the mixed data set for fault diagnosis and classification improves the clustering effect for each type of fault sample data and effectively improves the accuracy of fault diagnosis and classification.
It can be seen from the classification effect in Figure 9 that the sample values of bearing faults of the same category are gathered together, while the fault data of different categories are clearly separated according to fault category, and most of the sample data is correctly classified. According to its definition and formula, the value range of G−mean is (0, 1), and values close to 1 indicate a better classification effect. According to Figure 10, when the original unbalanced bearing fault data is input into the 1−D−CNN network for fault diagnosis, most of the resulting values are too small, indicating that the original unbalanced small-sample fault data cannot train the 1−D−CNN to a high accuracy rate, so the classification effect is poor.
CGAN performs supervised data generation under constraints. The simulated fault sample data generated under the constraint guidance is mixed into the original fault data and input into the classifier for training; after being trained to Nash equilibrium, the classifier is used for classification. It can be seen from the G−mean values that generating valid fault data through CGAN and expanding the data set can reduce the impact of the uneven distribution of the data samples. The performance of C−DCGAN over the ten fault categories is the strongest of the three, with an average value exceeding 0.8, which proves that the network structure described in this paper can better balance and expand the original fault sample data set. By balancing the fault data set, the classifier is trained to achieve high-accuracy fault classification and diagnosis, reducing the classification errors caused by the uneven data distribution. In order to verify the validity and authenticity of the fault sample data generated by the proposed model, the maximum mean discrepancy (MMD) is used for evaluation.
This metric evaluates the authenticity of the generated simulated sample data by calculating the probability distribution distance between the simulated sample data and the original sample data; the calculation formula is shown in Formula (18):

MMD²(X, Y) = E_{x,x′∼P_data}[K(x, x′)] + E_{y,y′∼P_g}[K(y, y′)] − 2 E_{x∼P_data, y∼P_g}[K(x, y)]     (18)

where K is the Gaussian kernel function, which maps the distance between the original and generated data sets into the reproducing-kernel Hilbert space. Fault data of categories 0, 1, 4, and 7 were randomly selected; as shown in Figure 11, in the process of generating data for these four fault categories, the maximum mean discrepancy shows an overall downward trend as the number of training iterations increases, and the distance between the probability distributions of the original fault data and the generated fault samples gradually decreases.
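The MMD evaluation described above can be sketched for 1-D sample sets with a Gaussian kernel. The kernel bandwidth sigma and the function names are assumptions for illustration; the exact form of Formula (18) in the paper may differ.

```python
import math

def mmd_gaussian(xs, ys, sigma=1.0):
    """Squared maximum mean discrepancy between two 1-D sample sets under
    a Gaussian kernel k(a, b) = exp(-(a-b)^2 / (2 sigma^2)), which maps
    the data into a reproducing-kernel Hilbert space. sigma is an
    assumed bandwidth; it is a tuning parameter in practice."""
    k = lambda a, b: math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))
    mean_k = lambda us, vs: (sum(k(u, v) for u in us for v in vs)
                             / (len(us) * len(vs)))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)

# Identical distributions give MMD ~ 0; the value shrinks as the
# generated data approaches the real data over training iterations.
m_same = mmd_gaussian([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])   # -> 0.0
m_far = mmd_gaussian([0.0, 0.1, 0.2], [5.0, 5.1, 5.2])
```

A falling MMD curve over training, as in Figure 11, therefore directly indicates that the generated distribution is closing in on the real one.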
In order to verify that the C−DCGAN fault diagnosis model proposed in this paper is suitable for bearing fault diagnosis in the case of unbalanced sample data, the model is compared with two other diagnostic models based on adversarial generative networks. A fault data set with an imbalance ratio of 1:20 is constructed, and this highly unbalanced data set is input into the three fault diagnosis models trained to Nash equilibrium; each trained diagnostic model is then used to classify the test set, and the classification results are summarized in a confusion matrix, as shown in Figure 12. Under the premise of unbalanced sample distribution, the data generation quality of the generative model is reflected in the classification accuracy of fault diagnosis. As Figure 12 shows, the fault diagnosis model proposed in this paper achieves high accuracy and can therefore be applied to bearing fault diagnosis scenarios with small-sample fault data sets. In order to further verify that the proposed conditional deep convolutional adversarial generative network can improve fault diagnosis classification accuracy through balanced expansion of the small-sample fault data set, common fault diagnosis methods, namely C−DCGAN+SVM, C−DCGAN+LSTM, C−DCGAN+1−D−CNN, infoGAN+1−D−CNN, and CGAN+1−D−CNN, were adopted for a comparison of rolling bearing fault diagnosis. LSTM is a long short-term memory network that detects and classifies faults by learning the temporal information between fault features.
In C−DCGAN+SVM, after the small-sample data set is expanded by the adversarial generative network proposed in this paper, the data set is input to an SVM for classification training, and the trained classifier is used to perform fault diagnosis and classification on the test set. infoGAN+1−D−CNN uses an information-maximizing generative adversarial network for data generation and then a one-dimensional convolutional network for fault feature extraction and fault classification. CGAN is based on GAN with added constraints for data generation.
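The confusion-matrix evaluation used for Figure 12 reduces to tallying true versus predicted labels, from which both overall accuracy and per-class recall follow directly. A minimal sketch, with toy labels that are illustrative rather than taken from the paper's data:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true labels, columns are predicted labels."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred, 3)

accuracy  = np.trace(cm) / cm.sum()          # correct predictions / all samples
per_class = cm.diagonal() / cm.sum(axis=1)   # recall for each fault class
```

Reading down each row shows where a class's samples were misassigned, which is exactly how the misclassifications among the three diagnostic models are compared in Figure 12.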
According to Table 5, the adversarial generative model proposed in this paper can effectively expand the fault data set and be combined with a variety of classifiers for fault diagnosis, with classification accuracy reaching 90%. The results show that, compared with several common fault diagnosis methods based on data augmentation, the fault diagnosis method proposed in this paper can effectively improve the fault diagnosis accuracy of rolling bearings.

Conclusions
In this paper, a conditional deep convolutional generative adversarial network is proposed that can effectively expand small-sample data sets, alleviate the data imbalance of small samples, and improve the accuracy of bearing fault diagnosis. Compared with other fault diagnosis methods, the improved loss function and network structure demonstrated here improve the stability of C−DCGAN and reduce the probability of gradient vanishing and gradient explosion. The experimental results show that the fault diagnosis model designed in this paper can effectively improve the classification accuracy over the ten fault types and better diagnose the more complex fault types arising in the operation of mechanical equipment. However, some concerns remain: if the number of fault categories increases or the mechanical operating environment is disturbed, the classification accuracy of the classifier decreases to a certain extent, and model training struggles to converge to the ideal state. Future research will address the construction of the fault diagnosis model in order to realize the classification of multiple fault types.