Intelligent Fault Diagnosis of Unbalanced Samples Using Optimized Generative Adversarial Network

Abstract: The increasing range of faults encountered by mechanical systems has brought great challenges for conducting intelligent fault diagnosis based on insufficient samples in recent years. To tackle the issue of unbalanced samples, an improved methodology based on a generative adversarial network that uses sample generation and classification is proposed. First, 1D vibration signals are transformed into 2D images according to the features of the vibration signals. Next, the optimized generative adversarial network is constructed for adversarial training to synthesize diverse fake 2D images according to actual sample characteristics, with the generative model as a generator and the discriminative model as a discriminator. Our model uses an attenuated learning rate with a cross-iteration batch normalization layer to enhance the validity of the generator. Last, the discriminative model as a classifier is used to identify the fault states. The experimental results demonstrate that the proposed strategy efficiently improves fault identification accuracy in two cases of sample imbalance.


1. Introduction
The safe operation of automatic equipment is essential in production processes. Equipment failure can lead to injury and severe accidents in manufacturing [1]. It is necessary to improve the diagnosis of faults to discover potential safety hazards in time for maintaining the health conditions of machines [2].
With the development of intelligent computing techniques, effective automatic fault diagnosis has become possible, as has ensuring safety in industrial production [3][4][5]. Fault diagnosis approaches generally process collected vibration signals. Current intelligent methods [4] are divided into two types: traditional machine learning methodologies (e.g., random forests (RFs) [6], back propagation neural networks [7], and support vector machines (SVMs) [8]) and deep learning methodologies (e.g., convolutional neural networks (CNNs) [9,10], residual networks (ResNet) [11], and capsule networks (CapsNet) [12]). For example, Kumar et al. [13] constructed a new CNN model for the identification of machine failures with a wavelet transform technique. The current literature has focused on using plenty of balanced samples to learn a satisfactory training model. However, in real applications, the number of collected samples for individual fault classes is often insufficient, causing unbalanced sampling. This deficiency makes it difficult to form a perfect dataset for learning features, so there is a pressing need to discover better and more effective methods to deal with unbalanced data [14]. Current methods usually employ optimization algorithms to alleviate the problem of unbalanced training data [15]. Jia et al. [16] established a normalized CNN model to enhance training effectiveness and used a weighted Softmax loss to address classification imbalance. Geng et al. [17] designed a deficiency detection strategy with ResNet by employing an imbalance-weighted cross-entropy loss function to resolve the sample imbalance. However, for such optimization strategies, the proportion of the category samples is fixed, so it is difficult to address changes in that proportion.
Data augmentation under small samples is a practical way to overcome the limitation above. As a prospective approach, a generative adversarial network (GAN) [18] is a semi-supervised feature-learning method based on a game scenario. The GAN and its variations have been widely employed in signal generation and recognition [19]. A GAN learns the distribution of existing samples and outputs satisfactory simulated ones. Thus, GANs provide a novel approach for solving the problem of unbalanced and insufficient data in actual fault diagnosis [20][21][22][23]. Recently, Wang et al. [24] presented a conditional variational GAN model for imbalanced fault diagnosis that learns the distribution features from a small fault sample size. Zhou et al. [25] developed a novel GAN method via global optimization to create more fault condition data to deal with the imbalance problem, with a two-hierarchical discriminator designed to process unqualified data. Fan et al. [26] designed a Wasserstein GAN diagnosis approach with a full attention mechanism and gradient normalization to promote the quality of the fake samples. Meng et al. [27] proposed a feasible auxiliary classification GAN methodology with Wasserstein distance and attention modules for information insufficiency. GANs are also suitable for producing two-dimensional (2D) samples. Luo et al. [28] developed an effective generator by using a time-series GAN that learns raw data and an auxiliary classifier GAN for producing multimode time-frequency maps to balance the sample distribution. Liu et al. [29] proposed a multi-scale residual GAN to generate time-frequency maps to balance an unbalanced data distribution and used a classifier based on feature enhancement for fault classification. These studies are all two-step strategies; that is, the generative training of the GAN model and the training of a separate deep learning classification model are independent. This inconsistency between sample generation and sample classification can increase the complexity of the diagnosis model.
This paper proposes a model for diagnosing imbalanced faults with an improved CGAN to ensure the consistency of generation and classification. Because GANs were originally designed for image processing, the vibration signals are first transformed into feature images as inputs of the proposed model. A new generative model is developed as a generator and a discriminative model as a discriminator to generate reasonable discriminatory samples. The discriminator also identifies the fault condition from real and fake samples. The discriminative model as a classifier relies on classification training with sufficient samples to improve the fault classification accuracy of the designed method. Experimental results demonstrate that the advanced strategy is sufficiently robust to synthesize high-quality samples and provides excellent diagnostics with unbalanced data.
The contributions of this paper are summarized as follows: (1) A novel fault diagnosis method based on an improved CGAN is presented to solve the problem of multi-class imbalanced data. The improved model synthesizes more varied samples to balance the data distribution and identify fault states, and it can be applied to both bearing and gearbox datasets.
(2) A generative model as a generator that integrates feature convolution and cross-iteration batch normalization is proposed to overcome the training instability of CGANs. Furthermore, the discriminative model as a discriminator is improved to identify real/fake samples with a skip connection, and it is also used as a classifier to determine fault categories.
(3) Three loss functions are reconstructed for the generator, the discriminator, and the classifier, respectively, to achieve the diagnosis of unbalanced samples and enhance the model performance.
The remainder of this paper is organized as follows. Section 2 illustrates the theoretical basis and gives a detailed description of the proposed approach. The two cases for experimental verification against the compared methods are shown in Section 3. Section 4 presents the conclusion.

2. Methods
2.1. Brief Theory
2.1.1. GAN Principle
A GAN [30] contains a generator G and a discriminator D, as shown in Figure 1. Random noise u generates samples G(u), which follow the actual data distribution P_real. The discriminator distinguishes between real data y and fake data G(u) for the input sample. Suppose P_real(y) is the actual data distribution of y, and P_u(u) is the noise distribution of u. The adversarial training between G and D is realized by the following:

\min_G \max_D F(D, G) = \mathbb{E}_{y \sim P_{real}(y)}[\log D(y)] + \mathbb{E}_{u \sim P_u(u)}[\log(1 - D(G(u)))]

where E(·) is the expectation function, D(y) is the probability of y being predicted as a real sample, and G(u) is the generated sample. However, the original GAN has several problems; for example, the training of a GAN is unstable as it approaches the Nash equilibrium.


Figure 1. Basic structure of GAN: random noise u is input to generator G, and discriminator D judges real samples y and generated samples as real or fake.

2.1.2. Conditional GAN
A conditional generative adversarial network (CGAN) uses additional information to adjust the model and guide the data generation process. An additional condition c is attached to the input layer, which can be any auxiliary input that helps generate data with specific properties, e.g., class labels or text descriptions. The condition c is input to the generator along with the input noise; real data y and condition c are input to the discriminator. The objective function F(D, G) is expressed by the following:

\min_G \max_D F(D, G) = \mathbb{E}_{y \sim P_y(y)}[\log D(y|c)] + \mathbb{E}_{u \sim P_u(u)}[\log(1 - D(G(u|c)))]

The optimization of the objective function F(D, G) in a CGAN is similar to that of a GAN. \mathbb{E}_{y \sim P_y(y)}[\log D(y|c)] is the probability of identifying real data after input data y and condition c are fed to discriminator D; \mathbb{E}_{u \sim P_u(u)}[\log(1 - D(G(u|c)))] is the probability of being real data via discriminator D when random noise is combined with condition c and input to generator G to create samples. Figure 2 shows a simple CGAN structure.
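As a minimal illustration of this conditioning scheme, the following PyTorch sketch concatenates a label embedding with the noise vector before generation (the layer sizes here are hypothetical and unrelated to the model of Section 2.2):

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Minimal CGAN generator: the class label c is embedded and
    concatenated with the noise vector u before the first layer."""
    def __init__(self, noise_dim=100, n_classes=10, out_dim=64 * 64):
        super().__init__()
        self.embed = nn.Embedding(n_classes, n_classes)   # condition c
        self.net = nn.Sequential(
            nn.Linear(noise_dim + n_classes, 256),
            nn.ReLU(),
            nn.Linear(256, out_dim),
            nn.Tanh(),
        )

    def forward(self, u, c):
        # G(u|c): generation is guided by the label condition
        return self.net(torch.cat([u, self.embed(c)], dim=1))

u = torch.randn(8, 100)                # random noise
c = torch.randint(0, 10, (8,))         # class labels as condition
fake = ConditionalGenerator()(u, c)    # eight generated samples (flattened)
```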

2.1.3. Convolutional Neural Network
The classic convolutional neural network (CNN) [31] contains convolution, pooling, and fully connected layers, as shown in Figure 3. Output features are extracted from the output of the previous convolution layer. The number of features is decreased by feature selection in the pooling layer, commonly implemented as maximum or average pooling. The fully connected layer connects all the features of the previous layer, integrating the information with the categories from the pooling layer. Finally, the classifier is employed for the output of the network. Convolution is defined as follows:

y_n^k = \sum_m y_m^{k-1} \ast \kappa_{mn}^k + b_n^k

where b_n^k is the offset, y_n^k is the n-th feature image output of the k-th layer, y_m^{k-1} is the m-th feature image of the (k − 1)-th layer, and \kappa_{mn}^k is the convolution kernel between the m-th input feature image and the n-th output feature image.

2.1.4. Deep Convolutional GAN
Deep convolutional GANs (DCGANs) [32] are a commonly used neural network architecture for training GANs. A DCGAN introduces convolution layers into the GAN structure and uses the feature extraction capability of a CNN to enhance performance. This network has found wide acceptance since it was proposed. Batch normalization (BN) layers are utilized in the model.

2.1.5. Parametric Rectified Linear Unit
The rectified linear unit (ReLU) is a general activation function. It maintains the biological heuristics of the step function, but the derivative is non-zero when the input is positive, permitting gradient-based learning. The ReLU function is expressed by the following:

f(y) = \max(0, y)

However, ReLU has the drawback that neurons with y ≤ 0 may not be activated, causing the related parameters to never be refreshed. Parametric ReLU (PReLU) [33] can adaptively learn the parameters of the corrected linear unit and avoid a zero gradient, and is defined as follows:

f(y_i) = \begin{cases} y_i, & y_i > 0 \\ a_i y_i, & y_i \le 0 \end{cases}

where y_i is the input of the i-th channel and a_i is the learnable slope of one layer, updated according to the gradient of the objective:

\Delta a_i := \mu \Delta a_i + \eta \, \partial \xi / \partial a_i

where \xi represents the objective function, \mu the momentum, and \eta the learning rate. The momentum method [33] is used for updating a_i.
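For reference, PReLU is available directly in PyTorch with one learnable slope per channel; a brief sketch:

```python
import torch
import torch.nn as nn

# PReLU: f(y_i) = y_i if y_i > 0 else a_i * y_i, with a learnable a_i per channel
prelu = nn.PReLU(num_parameters=16, init=0.25)  # one slope a_i per channel
y = torch.randn(4, 16, 32, 32)                  # batch of 16-channel feature maps
out = prelu(y)                                  # negative inputs are scaled by a_i

# a_i is trained by gradient descent with momentum, like any other parameter
opt = torch.optim.SGD(prelu.parameters(), lr=0.01, momentum=0.9)
```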

2.2. Proposed Method
2.2.1. Overview
The structure of the optimized CGAN is presented in Figure 4. The model includes a generator and a discriminator. Firstly, one-dimensional (1D) vibration signals are transformed into feature images by applying the wavelet transform. Secondly, a class label and random noise are fed to the proposed model to synthesize fake samples. Then, all the samples, both real and fake, are classified. Finally, the faults can be recognized.


2.2.2. Data Preprocessing
To obtain rich features of mechanical fault conditions, 2D feature images are produced from the 1D vibration signals with the wavelet transform method [11]. Then, image recognition methods can be used to identify the fault data. The continuous wavelet transform can be expressed as follows:

W(\gamma, \tau) = \frac{1}{\sqrt{\gamma}} \int_{-\infty}^{+\infty} f(t) \, \varphi^*\!\left(\frac{t - \tau}{\gamma}\right) dt

where f(t) is the raw signal, \gamma is the scale factor, \tau is the translation factor, and \varphi(\cdot) is the wavelet basis function.

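This preprocessing step can be sketched with the PyWavelets library as follows (the Morlet basis and the number of scales are assumptions, as the paper does not specify them):

```python
import numpy as np
import pywt

def signal_to_image(sig, n_scales=64):
    """Convert a 1D vibration segment into a 2D time-frequency map
    via continuous wavelet transform (Morlet basis assumed)."""
    scales = np.arange(1, n_scales + 1)          # scale factors (gamma)
    coefs, _ = pywt.cwt(sig, scales, "morl")     # coefs: scales x time
    img = np.abs(coefs)
    return (img - img.min()) / (img.max() - img.min() + 1e-12)  # scale to [0, 1]

segment = np.random.randn(600)       # one 600-point sample, as in Section 3.1
tf_image = signal_to_image(segment)  # 64 x 600 feature image
```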

2.2.3. Generative Model
The generative model as a generator is improved to synthesize high-quality samples from noise and class label information as input, in order to augment and balance the unbalanced dataset. The structure of the improved generator is illustrated in Figure 5. Because the convolution operation is excellent at feature extraction, the proposed generator mainly uses transposed convolution and convolution layers. Unlike the generator of a traditional CGAN, the input of each convolution block is closely linked with the label data as a fault condition to improve the guidance of the label during the learning process. Each convolution layer uses a kernel size of 3 and a stride of 1. A cross-iteration batch normalization (CBN) [34] layer is also introduced to the generator, and the input layer is connected with the following layers directly instead of through a BN layer. To mitigate gradient loss, the ReLU function is replaced with PReLU to speed up convergence.
Finally, the synthesized samples are output through a convolution layer and hyperbolic tangent function.
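A minimal PyTorch sketch of one generator block under these design choices follows (the upsampling kernel size and channel counts are illustrative assumptions, and standard BatchNorm2d stands in for the cross-iteration BN of [34], which is not available in standard libraries):

```python
import torch
import torch.nn as nn

class GenBlock(nn.Module):
    """One generator block: label channels are concatenated with the
    feature maps, then upsampled by a transposed convolution.
    Standard BatchNorm2d stands in for cross-iteration BN [34]."""
    def __init__(self, in_ch, label_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch + label_ch, out_ch,
                                     kernel_size=4, stride=2, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.PReLU()   # PReLU instead of ReLU to avoid zero gradients

    def forward(self, x, label_map):
        # broadcast the label map to the current spatial size and concatenate
        lm = label_map.expand(-1, -1, x.size(2), x.size(3))
        return self.act(self.bn(self.up(torch.cat([x, lm], dim=1))))

x = torch.randn(8, 128, 8, 8)             # intermediate feature maps
label_map = torch.randn(8, 10, 1, 1)      # per-class condition channels
y = GenBlock(128, 10, 64)(x, label_map)   # -> (8, 64, 16, 16)
```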
The loss function of the generator is improved to address the problem of mode collapse and strengthen the stability. Suppose the generator is G, the discriminator is D, the real data are y, and the generated data are ŷ. The GAN loss function L_gan is given by the following:

L_{gan} = \frac{1}{N} \sum_{i=1}^{N} \left[\log D(y_i) + \log(1 - D(\hat{y}_i))\right]

where N is the input number.
For the multi-classification problem with unbalanced samples, the gradient harmonizing mechanism (GHM) loss function L_ghm [35] is employed to reduce the non-hard-sample loss and increase the hard-sample loss. L_ghm is formulated as follows:

L_{ghm} = \sum_{i=1}^{N} \frac{L_{ce}(p_i, p_i^*)}{G_{gd}(g_i)}

where p_i is the prediction distribution, p_i^* is the true distribution, L_ce is the cross-entropy (CE) loss function with p_i and p_i^*, and G_gd is the gradient density function with gradient mode length g_i.
In a traditional GAN, mode collapse may reduce data diversity, which degrades the generalization ability of the model. To overcome this, the mode seeking loss function L_ms [36] is generally employed to improve sample diversity, which is expressed as follows:

L_{ms} = \max_G \frac{d_I(G(c, z_1), G(c, z_2))}{d_z(z_1, z_2)}

where c is the condition for the input mode, G(c, z_1) and G(c, z_2) are the mapped modes with the latent codes z_1 and z_2, respectively, and d_I(\cdot) and d_z(\cdot) are distance metrics.
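A minimal sketch of this term, assuming L1 distances for both d_I and d_z:

```python
import torch

def mode_seeking_loss(g, c, z1, z2, eps=1e-5):
    """Mode seeking regularization [36]: encourage outputs that differ
    as much as their latent codes do. The ratio is maximized for the
    generator, so its negative is returned for minimization."""
    d_img = torch.mean(torch.abs(g(c, z1) - g(c, z2)))  # L1 image distance
    d_z = torch.mean(torch.abs(z1 - z2))                # latent distance
    return -d_img / (d_z + eps)
```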
Then, the final loss of the generator can be given by the following:

L_G = L_{gan} + L_{ghm} + \alpha L_{ms}

where \alpha is the weight of L_ms.

2.2.4. Discriminative Model
In this work, the structure of the discriminative model is used for both discrimination and classification, as shown in Figure 6. The discriminative model is constructed with convolutional, BN, and PReLU layers to improve the model accuracy. With increasing network depth, gradient vanishing and gradient explosion may degrade the model, so a skip connection is employed to make the discriminator much deeper and give it a broader field of view and greater fitting ability.

The discriminative model as a discriminator does not need to be connected to the label data at the input; it directly uses the feature image data as input to avoid the impact of the label data. The output is a 1D probability. Considering that learning can cause model overfitting and gradient sharpness, a gradient penalty can be introduced into the traditional loss function of the discriminator L_D_ge, and the improved loss function L*_D_ge becomes the following:

L_{D\_ge}^* = L_{D\_ge} + \lambda \, \mathbb{E}_{\hat{y}}\big[(\|\nabla_{\hat{y}} D(\hat{y})\|_2 - 1)^2\big]

where \lambda is the penalty parameter.
The discriminative model as a classifier for the fault classification task is convenient for overall feature extraction when the classifier is trained later. In contrast to the discriminator structure, the class number is the output dimension of the last fully connected layer, and a Softmax classifier outputs the results. Entering label information at each layer of the generator guides the output better than adding condition information only at the input layer. For classification, the label smoothing regularization (LSR) loss L_D_cls is employed as follows:

L_{D\_cls} = -\sum_{k=1}^{K} \left[(1 - \varepsilon) p_k^* + \frac{\varepsilon}{K}\right] \log p_k

where \varepsilon is the loss coefficient and K is the number of classes.
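In PyTorch, label smoothing is available directly through the cross-entropy loss; a one-line sketch (ε = 0.1 is an assumed value):

```python
import torch.nn as nn

# LSR: the one-hot target is softened so each wrong class receives eps/K mass
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # eps = 0.1 assumed
```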

2.2.5. Model Training
Model training is divided into two stages in this study. The first stage is the training process of data generation with the generator and discriminator to generate satisfactory samples. The second stage is the training process of data classification with the classifier, based on the saved parameters of the first stage for the discriminative model, to identify fault conditions. Note that the generator uses the structure of the generative model, while the discriminator and classifier both use the structure of the discriminative model.
In the first stage, random noise is fed to the generator to create samples of the same kind as the small sample classes. The generator aims to make the discriminator's output for a generated sample close to 1 (i.e., the synthesized data are similar to the actual data). The discriminator's goal, however, is to push the result for a generated sample as near to 0 as possible and that for a real sample close to 1. During learning, the discriminator would otherwise win the confrontation with the generator easily, leading to a vanishing gradient for the generator. Thus, the discriminator is updated once for every k updates of the generator, so that the discriminator does not quickly reach an approximate optimum and the equilibrium between the generator and the discriminator is kept. The parameter k should be selected according to datasets of different sizes. If k is too small, the discriminator reaches an approximate optimum, and the generator faces a disappearing gradient. If k is too large, the gradient of the generator becomes inaccurate. In this study, the generator and the discriminator are updated with different learning rates and update frequencies to ensure the network converges to local Nash equilibria [37].
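The first-stage schedule can be sketched as follows (a simplified loop assuming k = 4 generator updates per discriminator update and the learning rates reported in Section 3; the loss functions are passed in as callables):

```python
import torch

def train_adversarial(g, d, loader, g_loss_fn, d_loss_fn, k=4, epochs=1):
    """First-stage training: G and D use different Adam learning rates
    (two time-scale update rule [37]); D is updated once per k G updates."""
    opt_g = torch.optim.Adam(g.parameters(), lr=4e-4)
    opt_d = torch.optim.Adam(d.parameters(), lr=2e-4)
    for _ in range(epochs):
        for step, (real, labels) in enumerate(loader):
            opt_g.zero_grad()
            g_loss_fn(g, d, real, labels).backward()   # update generator
            opt_g.step()
            if step % k == 0:                          # update discriminator
                opt_d.zero_grad()
                d_loss_fn(g, d, real, labels).backward()
                opt_d.step()
```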
In the second stage, the trained discriminative model (excluding the last layer) is reused, and its parameters are fine-tuned for the new classifier structure. The classifier only fine-tunes the trained parameters of the first stage with the actual and generated samples for fault classification. Unlike the discriminator structure, the last fully connected layer of the classifier outputs the number of categories. An Adam optimizer is used, and the dropout value is set to 0.5 to avoid overfitting in the fully connected layer during training.
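The second stage can be sketched as follows (the constructor name, checkpoint file, and layer sizes are hypothetical):

```python
import torch
import torch.nn as nn

# Stage 2 (sketch): reuse the stage-1 discriminator weights, swap the final
# layer for a class-number output head, and fine-tune on real + fake samples.
classifier = build_discriminative_model()            # hypothetical constructor
state = torch.load("discriminator_stage1.pt")        # hypothetical checkpoint
classifier.load_state_dict(state, strict=False)      # last layer differs
classifier.fc = nn.Sequential(nn.Dropout(0.5),       # dropout 0.5, as in text
                              nn.Linear(256, 10))    # 10 classes for Case I
optimizer = torch.optim.Adam(classifier.parameters(), lr=2e-4)
```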

3. Results
To demonstrate the viability of the presented strategy, two openly available datasets were employed: the bearing dataset of Case Western Reserve University (CWRU) [38] and the gearbox dataset of Southeast University (SEU) [39]. The parameters of the experimental model, the outcomes of the experiment, and comparison experiments are all described in this section.

3.1. Case I: CWRU Dataset
3.1.1. Data Description
The data for this case are from the CWRU dataset [38], which is widely employed for fault diagnosis. The experimental platform includes electronic control equipment, torque sensors, power meters, and an electric motor. The test bed evaluated the bearings that support the motor. The bearing faults were introduced by electrical discharge machining (EDM). For the experiment, the fault data for the drive-end bearing were chosen, with a speed of 1730 rpm and a sampling frequency of 12 kHz. There are nine failure classes, comprising ball faults, inner ring faults, and outer ring faults with fault diameters of 0.007, 0.014, and 0.021 inches, respectively. Together with the normal condition, ten classes of fault labels are tested.
To ensure that each sample contains at least one period of vibration data, samples are taken 600 data points apart. For the purpose of diagnosing faults, the 1D vibration data were converted into time-frequency images. The preprocessed results of the wavelet transform on the CWRU dataset are shown in Figure 7.
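This segmentation can be sketched as follows (the segment length of 600 points is taken from the text; the record length is illustrative):

```python
import numpy as np

def segment_signal(signal, length=600, stride=600):
    """Cut a long vibration record into samples spaced 600 points apart,
    so each sample spans at least one rotation at 1730 rpm / 12 kHz."""
    n = (len(signal) - length) // stride + 1
    return np.stack([signal[i * stride:i * stride + length] for i in range(n)])

record = np.random.randn(120_000)     # one drive-end channel (illustrative)
samples = segment_signal(record)      # shape: (200, 600)
```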
The CWRU dataset was used to simulate the environment of sample imbalance with three imbalance ratios (IR = 4:1, 8:1, and 16:1). Data preprocessing transforms the 1D raw signals into feature images. Table 1 shows the sample sizes for the different imbalance ratios (IRs) with various classes for Case I. Ten experiments were repeated for each training and test cycle.

3.1.2. Sample Generation Experiment
For this case, the learning rate of the generator was 0.0004, and the learning rate of the discriminator was 0.0002 for data generation. There were 64 samples in each batch. To maintain the adversarial balance, the update ratio of the discriminator to the generator was 1:4. The changes in the loss functions over the course of training are shown in Figure 8. The two networks competed, causing large oscillations, before the generator and discriminator gradually stabilized as training progressed, as shown in Figure 8a,b. It can be seen that the iteration number exceeds 13,000. In the general trend, however, the loss function of the generator increased gradually, while that of the discriminator decreased slightly; in the adversarial process, the discriminator beat the generator by a narrow margin. Some synthesized samples are shown in Figure 9.
To quantitatively analyze the discrepancy between the real and fake images, this study uses histogram matching to calculate the distance between them. The histogram distance k_hm can be defined by the following:

k_{hm} = \frac{\sum_j (H_1(j) - \bar{H}_1)(H_2(j) - \bar{H}_2)}{\sqrt{\sum_j (H_1(j) - \bar{H}_1)^2 \sum_j (H_2(j) - \bar{H}_2)^2}}

where H_1 and H_2 are the histograms of the two images, respectively.
For RGB image samples, the histogram distance D_hm is expressed as follows:

D_{hm} = \frac{1}{3}\left(k_{hm}^R + k_{hm}^G + k_{hm}^B\right)

To demonstrate the method's effectiveness, the first sample of each health condition was selected to produce histograms for the R, G, and B channels, given the similarity of the generated samples. DCGAN was used for comparison of the generated samples, and the resulting histograms are displayed in Figure 10. The histograms obtained with the proposed technique are closer to the actual data than those of DCGAN for the various classes. Although there is a small difference between the real and synthesized samples, Figure 10 also shows that the synthesized samples are learned from real data rather than being simple copies. The superior results are further verified in Figure 11 by comparing the average histogram distance D_hm of the proposed model and DCGAN. Most D_hm values of the proposed technique exceed those of DCGAN, demonstrating that the quality of the samples synthesized by the proposed method is satisfactory.
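This comparison can be sketched with OpenCV as follows (the bin count is an assumption):

```python
import cv2
import numpy as np

def histogram_distance(img1, img2, bins=64):
    """Average correlation between per-channel histograms of two RGB
    images; values near 1 mean the fake sample matches the real one."""
    scores = []
    for ch in range(3):                       # R, G, B channels
        h1 = cv2.calcHist([img1], [ch], None, [bins], [0, 256])
        h2 = cv2.calcHist([img2], [ch], None, [bins], [0, 256])
        scores.append(cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL))
    return float(np.mean(scores))

real = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
fake = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
print(histogram_distance(real, fake))
```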


3.1.3. Fault Classification Experiment
The learning rate was set to 0.0002 in the Adam optimizer for the classifier. The multi-class confusion matrices of the output for the three IRs are displayed in Figure 12. The training samples were augmented into a sufficient dataset. To explore the advantages of the proposed algorithm in fault classification, three comparison classifiers were trained separately with the augmented samples. The data were normalized before training. Table 2 compares the four classifiers under the various imbalance ratios using four general metrics: recall, precision, F1, and accuracy. For the proposed classifier, the average diagnostic accuracy reached more than 99.10%. When IR = 8:1, the accuracy of the proposed method matched ResNet18, but the precision of the proposed method was slightly higher. Table 2 demonstrates that the proposed method is superior to the other methods on all metrics for the 10 fault states. Therefore, the proposed method can effectively improve the accuracy and stability of fault diagnosis under the condition of imbalanced samples.
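These metrics can be computed with scikit-learn, e.g., with macro averaging (an assumption, as the averaging scheme is not stated):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# y_true / y_pred: integer fault labels of the test set (10 classes in Case I)
def report(y_true, y_pred):
    p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                                  average="macro",
                                                  zero_division=0)
    return {"recall": r, "precision": p, "F1": f1,
            "accuracy": accuracy_score(y_true, y_pred)}
```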

3.2. Case II: SEU Dataset
3.2.1. Data Description
Beyond the bearing fault diagnosis of Case I, the proposed model can be applied to other fault applications, such as gearboxes. Thus, the SEU gearbox dataset was used for Case II. The dataset was collected from a dynamic drivetrain simulator (DDS) with the load configuration 20 Hz-0 V. The experimental platform includes a motor, motor controller, planetary gearbox, parallel gearbox, brake, and brake controller. The gearbox dataset contains four failure classes (chipped, miss, root, and surface) and one normal class, so fault diagnosis of the Case II data is a five-class task. One sample contains 1024 sampling points in the x direction.
The 1D vibration signals were transformed into 2D time-frequency feature images with the wavelet transform for fault diagnosis. Ten experiments were repeated for each training and testing cycle. The SEU dataset was used to simulate the environment of sample imbalance with three imbalance ratios (IR = 4:1, 8:1, and 16:1). Table 3 shows the sample sizes of the different imbalance ratios (IRs) for the gearbox data with various classes for Case II.

3.2.2. Sample Generation Experiment
The learning rates were again 0.0004 and 0.0002 for the generator and discriminator, respectively. The batch size was 64, and the generator was updated four times for each update of the discriminator. Figure 13a shows the change in the generator loss function, and Figure 13b shows the variation in the discriminator loss function during data augmentation. It can be seen that the iteration number exceeds 13,000. The generator and discriminator oscillated during the training stage due to the confrontation between the two networks.

Figure 14 compares the results using the histogram distance D_hm for the proposed model and DCGAN in Case II. The average D_hm values of the proposed approach are higher than those of DCGAN, demonstrating that the model learns the fault features effectively and ensures the high quality of the synthesized samples. The training samples were augmented into a sufficient dataset.

3.2.3. Fault Classification Experiment
The multi-class confusion matrices of the output for the three IRs are presented in Figure 15. The accuracy is more than 99% for each IR. The normal class (No. 1) achieves 100% accuracy, so the proposed method can completely separate the normal class from the other failure classes. The results show that the sample synthesis method can effectively reduce the misjudgment rate under the condition of imbalanced samples. To verify the superiority of the proposed approach for fault classification, the three mainstream classifiers mentioned above were also used. Table 4 shows that the proposed approach is superior to the other classifiers according to the four metrics, manifesting the feasibility of the improved method. The accuracy of the models decreases as the imbalance ratio increases, and the classifier has no remarkable advantage over ResNet18 and AlexNet, especially in accuracy at IR = 4:1. However, it is convenient to use the self-owned discriminative model as the classifier, which avoids designing another classifier. This proves the effectiveness and practicability of the proposed approach for diagnosis on the gearbox dataset.


Figure 3. Architecture of CNN.

Figure 5. Detailed architecture of generative model.


Figure 7. Some preprocessed results for Case I.


Figure 8. Trends of loss functions in Case I: (a) trends of the generator loss (L_G); (b) trends of the discriminator loss of generation (L*_D_ge).

Figure 9. Some synthesized samples for Case I.




Figure 10. Histogram matching comparison of real and synthesized samples in Case I (the x axis denotes the number of bins, and the y axis denotes the frequency in the histogram).


Figure 11. Comparison of generated samples with histogram distance in Case I.


Figure 12. Confusion matrix with different imbalance ratios in Case I.



Figure 13. Trends of loss functions in Case II: (a) trends of the generator loss (L_G); (b) trends of the discriminator loss of generation (L*_D_ge).



Figure 14. Comparison of generated samples with histogram distance in Case II.


Table 1. Sample sizes of imbalance ratio in Case I.


Table 2. Comparison of classifier performance for Case I.


Table 3. Sample sizes of imbalance ratio in Case II.


Table 4. Comparison results of classifier accuracy in Case II.