A Novel Deep Learning System with Data Augmentation for Machine Fault Diagnosis from Vibration Signals

Abstract: In real engineering scenarios, it is difficult to collect adequate cases with faulty conditions to train an intelligent diagnosis system. To alleviate the problem of limited fault data, this paper proposes a fault diagnosis method combining a generative adversarial network (GAN) and a stacked denoising auto-encoder (SDAE). The GAN augments the limited measured data, especially for faulty conditions. The generated data are then fed into the SDAE fault diagnosis model. The GAN-SDAE approach improves the accuracy of fault diagnosis from vibration signals, especially when measured samples are few. The usefulness of the method is assessed through two condition-monitoring cases: a classic bearing example and a more general gear failure. The results demonstrate that the diagnosis accuracy for both cases is above 90% under various working conditions, and that the GAN-SDAE system is stable.


Introduction
Fault diagnosis is necessary for mechanical equipment in modern industries [1][2][3]. Through condition monitoring, signals from various sensors are extracted and analyzed for fault diagnosis. Because the response signals are variable and information-rich under harsh operating conditions, it is difficult to identify the failure mode of mechanical systems.
Deep learning algorithms for diagnosing machinery faults have become prevalent owing to their robustness and capacity for adaptation. Deep architectures of computational layers and neurons are the foundation of deep learning. Roozbeh et al. [4] proposed a semi-supervised and deep ladder network (SSDLN); with few labeled samples, the model is trained through a high-dimensional feature space to diagnose gear failures. Zhang et al. [5] presented a novel unsupervised learning algorithm based on the generalized norm of the feature matrix for intelligent fault diagnosis. The feature sparsity was successfully measured by optimizing the objective function. However, in real engineering scenarios, it is difficult to identify useful details in the recorded fault signal. Therefore, it is necessary to reduce the feature dimensions. Khare et al. [6] proposed a spider monkey optimization (SMO)-based deep neural network (DNN) model. Compared to the classic DNN model, the SMO-DNN achieved a dimensionality reduction and accurate classification. As another classic deep learning model, the deep belief network conducts information fusion through data reduction [7]. Wang et al. [8] presented an unsupervised fault-diagnosis model using the sparse-filtering algorithm. The model uses the learned fault features to classify different fault types.
Current deep learning is inherently challenged by unstable network structures, so it is necessary to analyze the robust stability of deep neural networks [9]. Compared to other deep learning methods, the stacked denoising auto-encoder (SDAE) model is built from a series of auto-encoders and has a stable deep network structure with a layer-by-layer nature [10]. The SDAE is widely used in fault diagnosis of rotating machinery, typically gears and bearings [11,12]. Another problem of deep learning is that many training samples are required. Deep learning models are traditionally big-data driven, and a lack of sufficient training data affects the accuracy of the model [13]. Based on a relatively small dataset, Sulikowski et al. [14] presented a deep learning-enhanced framework that had satisfactory accuracy. In practice, however, very few fault samples are available because of cost. In traditional mathematical statistics, small-sample data are defined as samples of size n ≤ 30 [15]. Most fault datasets are small-sample data, which leads to data limitations [16]. This data limitation results in poor model performance, as shown in Figure 1 [17]. When a deep learning fault diagnosis model is trained on only a few samples, the training is often insufficient, resulting in inaccurate fault diagnosis. Data limitation is therefore a major challenge for deep-learning model training. We propose generative adversarial network (GAN) data augmentation to address the data-limitation problem: the GAN generates additional samples so that enough data are available for deep learning training.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 2 of 21
The GAN is a data-augmentation model proposed by Goodfellow et al. [18]. It has been widely applied to image processing. Wang et al. [19] proposed a method called 3D GAN to estimate full-dose images from low-dose ones. The results showed that the 3D GAN method outperformed the benchmark methods. Similarly, Ma et al. [20] presented a GAN method to fuse images with different resolutions. The results demonstrated that the strategy generated clear, clean fused images without the noise caused by up-sampling of infrared information. In the GAN approach, the joint distribution of latent representations was sought by adversarial layers [21,22]. After adversarial learning, the trained latent representations were well aligned to explain the co-occurrence configurations of the fused images.
Ghorban et al. [23] presented a multichannel GAN for generating traffic signs. In contrast to other existing approaches, the proposed method processed multiple channels with different textures and resolutions. Similarly, the principle of GAN has been applied to fault diagnosis. Usually, the testing data from different machine-fault conditions are unavailable for training; however, deep generative neural networks can provide reliable diagnoses by artificially generating training samples [24]. In practical working conditions with data-acquisition equipment, the achievable fault data are very limited. Furthermore, the uneven distribution of fault types also restricts the diagnostic accuracy. To solve these problems, Mao et al. [25] presented an imbalanced fault diagnosis method using GAN. Moreover, a detailed comparative study is provided in this paper. Considering the mode collapse of GAN, Wen et al. [26] presented a deep learning method to identify different failure modes of gears. However, the study did not consider the robustness of the model under different operating and faulty conditions, such as different workloads and fault sizes. In addition, the quality of the generated data needs to be analyzed. Mode collapse is another challenge of the GAN model [27]. A novel GAN model with a gradient penalty term was designed to avoid mode collapse, thereby ensuring model performance.
This paper combines the results of deep learning network applications in other fields with the characteristics of a vibration signal. The SDAE [28] was used to mine implied feature information from complex vibration signals. Compared to basic deep learning models for complex fault diagnosis, the advantage of the SDAE is that it uses its deep structure to mine the essential information. In addition, the SDAE is robust, which reduces the influence of noise on the recognition results and enhances their accuracy. Furthermore, the GAN solves the problem of insufficient SDAE training samples.
We built a novel deep learning system with data augmentation, called GAN-SDAE, based on waterfall model fusion. For fault diagnosis of complex vibration signals, a single method is not consistently accurate or effective given the limited number of fault samples. Model fusion combines models that realize different functions and has been widely used in different fault diagnosis fields [29,30]. A fault diagnosis method based on model fusion can combine the advantages of different models and compensate for their shortcomings. The GAN model addresses the shortage of fault samples, and the SDAE model identifies the fault types. The GAN and SDAE models are therefore integrated and optimized. The GAN-SDAE system can greatly improve the capability and accuracy of fault diagnosis. Moreover, the uncertainty of the GAN-SDAE system is analyzed to ensure system stability.
In this paper, we first review fault diagnosis, which is the basis for building a new model. Secondly, we introduce the GAN data augmentation, and the generated training data are used in the SDAE model training; this process constructs the GAN-SDAE system. The accuracy of the GAN-SDAE system is then analyzed, and the system is tested repeatedly to analyze its uncertainty. Finally, the usefulness of the proposed system is demonstrated using two examples, bearings and gears. The novelties and highlights of this paper are as follows: (1) The existing GAN model is improved by designing a gradient penalty term to avoid mode collapse. (2) Based on waterfall model fusion, the GAN and SDAE models are merged to construct a novel GAN-SDAE system. (3) The performance and uncertainty of the GAN-SDAE system are fully analyzed. (4) The GAN-SDAE system is applied to the fault diagnosis of bearings and gears, which demonstrates the effectiveness and versatility of the proposed system.

Fault Diagnosis
Critical machines often have permanently installed vibration sensors and transducers, and the acquired vibration signal is used to diagnose the machine for maintenance [31,32]. Many vibrations are directly linked to cyclical events in the machine's operation, such as rotating shafts and gear teeth. Such events recur regularly, and their frequency gives a direct indication of the source; hence, many powerful diagnostic techniques rely on frequency analysis.
However, critical industrial machinery usually has a complicated configuration and operates under mixed non-stationary modes. The vibration signals of these machines are complex and are often contaminated by interference and noise. The collected signals are analyzed with traditional signal-processing methods, such as wavelet analysis and envelope spectrum analysis [33]. More importantly, the accuracy of traditional fault diagnosis is low when the data are massive and varied. Therefore, it remains a challenge to accurately and intelligently diagnose machine faults using the recorded vibration signals.

GAN Approach
The GAN approach was inspired by a zero-sum game, and it can be used to generate many fault signals to achieve data augmentation. The GAN approach consists of two models, namely the generative model (G model) and the discriminative model (D model). Usually, both the G and D models use a neural network structure, such as the typical multilayer perceptron.
The input of the G model is a stochastic noise signal Z = (z_1, z_2, ..., z_m) with a data distribution P_z. The output of the G model is the generated signals or samples G(z), whose data distribution P_g should resemble the true distribution P_data of the original experimental signal. The relationships among P_z, P_g and P_data are shown in Figure 2. For example, the generative model can convert a random-noise signal into multiple new signals and, through the loss, move the data in the generated data space toward the true data space.

The input of the D model consists of the generated samples G(z) and the true fault samples X = (x_1, x_2, ..., x_n). The output of the D model is the probability that the generated fault samples are true samples. The likelihood function L(D, G) of G and D is given by Equation (1). During the training procedure in GAN, G is first initialized and fixed; then, D is trained by maximizing the loss function in Equation (2), and D is updated using stochastic gradient ascent.
The trained D is then fixed, and G is updated by minimizing the loss function in Equation (3) using stochastic gradient descent.
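The expressions behind Equations (1) to (3) are not reproduced in this text. As a point of reference only, the standard GAN objective of Goodfellow et al. [18], which the description above appears to follow, can be written as below. This is an assumption: the exact form and numbering used in this paper may differ.

```latex
% Assumed standard GAN objective (Goodfellow et al.); not confirmed by this text.
\begin{align}
% likelihood (value) function of D and G
L(D, G) &= \mathbb{E}_{x \sim P_{data}}\big[\log D(x)\big]
         + \mathbb{E}_{z \sim P_z}\big[\log\big(1 - D(G(z))\big)\big] \\
% D step: G is fixed and D ascends the objective
\max_{D} \;& \mathbb{E}_{x \sim P_{data}}\big[\log D(x)\big]
         + \mathbb{E}_{z \sim P_z}\big[\log\big(1 - D(G(z))\big)\big] \\
% G step: D is fixed and G descends the only term that depends on G
\min_{G} \;& \mathbb{E}_{z \sim P_z}\big[\log\big(1 - D(G(z))\big)\big]
\end{align}
```

Under this standard form, the maximizing discriminator for a fixed G is D*(x) = P_data(x) / (P_data(x) + P_g(x)), so D outputs 0.5 everywhere once P_g = P_data.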
The basis of generating vibration fault samples is to ensure their accuracy. Under this premise, the target of D is to recognize whether fault samples are true or false by outputting a logical value. The G and D models are trained alternately until the GAN model reaches the Nash equilibrium [34]. Combining Equations (2) and (3) gives the cross-entropy loss L(D, G) in Equation (4).
The data distribution of the vibration signals is highly complex and multi-modal, with many peaks. Each peak is called a mode, and each mode represents a concentration of similar failure samples. Mode collapse means that G outputs only some of the failure modes and misses the others. In mode collapse, the generated failure samples belong to a limited set of modes, because G finds that it can fool D by locking onto a single mode. D eventually learns that the samples in that single mode are fake, but G remains locked into it, which fundamentally limits the diversity of the generated samples and degrades the GAN model performance.
Figure 2. Relationships among P_z, P_g and P_data.
According to the manifold distribution law [35], high-dimensional data of the same category are often concentrated near a low-dimensional manifold. The ideal situation for G is to map the input noise onto the manifold where the training data are located, matching the probability distribution of the training data. For example, suppose the probability distribution of a training data set is a simple one-dimensional Gaussian mixture distribution with two peaks, as shown in Figure 3.
However, the ideal situation is almost impossible to achieve. In fact, most of the time, many low-quality samples are generated (marked in red), as shown in Figure 5.
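Both failure cases, low-quality samples that fall far from every mode and generated sets that miss some modes, can be checked numerically. The following is a minimal sketch, assuming a one-dimensional two-peak Gaussian mixture like the example above. The mode centres, the radius, and the function name `sample_quality` are illustrative assumptions, not values or code from the paper.

```python
import random


def sample_quality(samples, mode_centers, radius):
    """Return (low_quality_fraction, mode_coverage) for 1-D samples.

    A sample counts as low quality when it lies farther than `radius`
    from every mode centre; a mode counts as covered when at least one
    sample lies within `radius` of it.
    """
    hits = [0] * len(mode_centers)
    low_quality = 0
    for s in samples:
        # Index of the mode centre closest to this sample.
        nearest = min(range(len(mode_centers)),
                      key=lambda i: abs(s - mode_centers[i]))
        if abs(s - mode_centers[nearest]) <= radius:
            hits[nearest] += 1
        else:
            low_quality += 1
    covered = sum(1 for h in hits if h > 0)
    return low_quality / len(samples), covered / len(mode_centers)


rng = random.Random(0)
centers = [-2.0, 2.0]  # assumed peak locations of the two-mode mixture

# A generator that reaches both modes versus one locked onto a single mode.
healthy = [rng.gauss(rng.choice(centers), 0.3) for _ in range(1000)]
collapsed = [rng.gauss(centers[0], 0.3) for _ in range(1000)]

low_healthy, cov_healthy = sample_quality(healthy, centers, radius=1.0)
low_collapsed, cov_collapsed = sample_quality(collapsed, centers, radius=1.0)
```

A collapsed generator can score well on sample quality while covering only half of the modes, which is why the two quantities are reported separately.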
The target of a GAN is to generate green samples instead of red samples. On the other hand, mode collapse restricts the diversity of the generated samples; that is, the generated samples are repetitive, similar, and miss some modes, as shown in Figure 6.
Because a GAN has many local Nash equilibrium states, its parameter optimization is not a convex optimization problem. Even if a GAN enters a certain Nash equilibrium state and the loss function appears to converge, mode collapse may still occur. Mode collapse is usually accompanied by the following phenomenon: when the discriminator updates its parameters near the training samples, the gradient value is very large. Therefore, the solution to mode collapse is to impose a gradient penalty term δ on the discriminator near the training samples. The proposed method attempts to construct a linear function near the training samples, because a linear function is convex and has a global optimal solution. The gradient penalty term is designed to improve the GAN model and avoid mode collapse.
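The expression for δ is not reproduced in this text. One common form that matches the description, penalising the discriminator's input gradient in a small neighbourhood of the real samples (as in DRAGAN-style regularisers), is δ = λ·E[(|∂D/∂x| − 1)²]. The sketch below assumes that form for one-dimensional samples; `gradient_penalty`, `lam`, `noise_std`, and the target gradient norm of 1 are illustrative assumptions, not the authors' definition.

```python
import random


def gradient_penalty(d, real_samples, lam=10.0, noise_std=0.1, h=1e-5, rng=None):
    """Assumed penalty: lam * mean((|dD/dx| - 1)^2) near the real samples.

    `d` is any callable discriminator on scalars. The derivative is
    estimated with a central finite difference, so no autodiff library
    is needed for this illustration.
    """
    rng = rng or random.Random(0)
    total = 0.0
    for x in real_samples:
        # Evaluate the gradient at a point close to a training sample.
        x_near = x + rng.gauss(0.0, noise_std)
        grad = (d(x_near + h) - d(x_near - h)) / (2.0 * h)
        total += (abs(grad) - 1.0) ** 2
    return lam * total / len(real_samples)


real = [0.0, 1.0, 2.0]
# A discriminator with unit slope near the data incurs (almost) no penalty,
# while a steeper one is penalised in proportion to the squared excess slope.
flat_penalty = gradient_penalty(lambda x: x, real)
steep_penalty = gradient_penalty(lambda x: 3.0 * x, real)
```

In training, such a term would be added to the discriminator loss so that very large gradients near the training samples are discouraged.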
Through the reciprocal adversarial learning between D and G, the capabilities of both D and G improve progressively. It has been proven that when P_g = P_data, the distribution of the samples generated by G ideally matches the original fault-sample distribution. Moreover, the cross-entropy loss value converges to the Nash equilibrium when the output of D equals 0.5 [36]. Algorithm 1 shows the training algorithm of the GAN approach for generating new samples to augment the number of faulty samples.
Algorithm 1. Training process of the generative adversarial network (GAN). Minibatch stochastic gradient descent (SGD) training for the GAN approach. The number of steps to apply to D is a hyperparameter k. In the data augmentation experiments, k = 1 was used, which was the least expensive option.
for number of GAN training iterations do
    for k steps do
        Sample a minibatch of m noise samples from the noise prior P_z(z).
        Sample a minibatch of m examples from the vibration data distribution P_data(x).
        Update D by ascending its stochastic gradient.
    end for
    Sample a minibatch of m noise samples from the noise prior P_z(z).
    Update G by descending its stochastic gradient.
end for
Any standard gradient-based learning rule can be used for the gradient-based updates; momentum was also used in the data augmentation experiments.
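The alternating update order of Algorithm 1 can be sketched on a toy problem. The example below is a minimal illustration, assuming the standard GAN losses, a one-dimensional linear generator G(z) = a·z + b, a logistic discriminator D(x) = sigmoid(w·x + c), and plain SGD without momentum. All names and hyperparameter values are hypothetical; the models in the paper are neural networks trained on vibration signals, and this toy is not expected to produce realistic samples.

```python
import math
import random


def sigmoid(t):
    """Numerically safe logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_gan(real_data, iterations=500, k=1, m=32, lr=0.05, seed=0):
    """Toy 1-D GAN following the loop structure of Algorithm 1."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters
    w, c = 0.1, 0.0  # discriminator parameters
    for _ in range(iterations):
        for _ in range(k):
            z = [rng.gauss(0.0, 1.0) for _ in range(m)]   # noise minibatch
            x = [rng.choice(real_data) for _ in range(m)]  # data minibatch
            g = [a * zi + b for zi in z]
            d_real = [sigmoid(w * xi + c) for xi in x]
            d_fake = [sigmoid(w * gi + c) for gi in g]
            # Ascend the gradient of mean[log D(x) + log(1 - D(G(z)))].
            grad_w = (sum((1 - p) * xi for p, xi in zip(d_real, x))
                      - sum(q * gi for q, gi in zip(d_fake, g))) / m
            grad_c = (sum(1 - p for p in d_real) - sum(d_fake)) / m
            w += lr * grad_w
            c += lr * grad_c
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]        # fresh noise minibatch
        d_fake = [sigmoid(w * (a * zi + b) + c) for zi in z]
        # Descend the gradient of mean[log(1 - D(G(z)))] with D held fixed.
        grad_a = sum(-q * w * zi for q, zi in zip(d_fake, z)) / m
        grad_b = sum(-q * w for q in d_fake) / m
        a -= lr * grad_a
        b -= lr * grad_b
    return (a, b), (w, c)


data_rng = random.Random(1)
real = [data_rng.gauss(3.0, 0.5) for _ in range(200)]  # stand-in for measured data
(a, b), (w, c) = train_gan(real)
```

With k = 1, each outer iteration performs exactly one discriminator update followed by one generator update, which is the least expensive setting mentioned above.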
As mentioned earlier, it is necessary to check the uncertainty of the generated samples. Therefore, a Gaussian-distribution cloud model was integrated into the GAN approach to evaluate this uncertainty. The mean X̄ and variance S^2 of the generated samples are easily obtained. The entropy E_n of the generated samples is then calculated using E_x, where E_x refers to the expectation of the generated samples and is given by E_x = X̄. In the next step, the hyper-entropy term H_e is the uncertainty index of E_n; it reflects the dispersion degree of the cloud droplets that make up the cloud model. In the cloud model, a droplet l is assumed to satisfy l ∼ N(E_x, E_n'^2), where E_n' ∼ N(E_n, H_e^2). The membership degree describing the fuzziness of the generated samples is then computed. The calculated membership function is fed back to the GAN approach and can reduce the uncertainty of the generated samples. In this way, the GAN approach is improved by integrating the Gaussian-distribution cloud model to control data quality. The cloud model is demonstrated in Section 3. The samples generated by the GAN and analyzed with the cloud model provide high-quality training samples for the SDAE model in Section 2.3.
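The cloud-model quantities can be sketched in a few lines. The expressions for E_n, H_e, and the membership degree are not reproduced in this text, so the example below assumes the standard backward normal cloud generator: E_x = mean, E_n = sqrt(π/2)·mean|x − E_x|, H_e = sqrt(|S² − E_n²|), and membership μ(x) = exp(−(x − E_x)² / (2·E_n'²)). The function names and the use of an absolute value under the square root are assumptions for illustration.

```python
import math
import random


def backward_cloud(samples):
    """Estimate (Ex, En, He) from samples with the assumed standard formulas."""
    n = len(samples)
    ex = sum(samples) / n                                   # expectation Ex
    s2 = sum((x - ex) ** 2 for x in samples) / (n - 1)      # sample variance S^2
    en = math.sqrt(math.pi / 2.0) * sum(abs(x - ex) for x in samples) / n
    he = math.sqrt(abs(s2 - en ** 2))                       # hyper-entropy He
    return ex, en, he


def membership(x, ex, en, he, rng):
    """Membership degree of x for one random draw En' ~ N(En, He^2)."""
    en_prime = rng.gauss(en, he)
    if en_prime == 0.0:
        return 1.0 if x == ex else 0.0
    return math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))


generated = [1.0, 2.0, 3.0, 4.0, 5.0]  # stand-in for GAN-generated samples
ex, en, he = backward_cloud(generated)
rng = random.Random(0)
mu_center = membership(ex, ex, en, he, rng)
mu_off = membership(4.0, ex, en, he, rng)
```

A sample at the expectation always has membership 1, and the membership decays as a sample moves away from E_x, so low-membership samples can be treated as more uncertain.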

Stacked Denoising Auto-Encoder
The SDAE consists of multiple denoising auto-encoders (DAEs) stacked together [37]. The SDAE can provide extensive and robust fault-feature extraction from raw input vibration data. The structure of the SDAE is composed of neural networks, as shown in Figure 7.
The auto-encoder includes an encoding network and a decoding network. The encoding network maps the input vector X = {x_1, ..., x_i} to the encoded vector, which is subsequently remapped to the output vector Y by the decoding network. The encoded vector is the characteristic representation of the input vector, whereas the output vector is the reconstruction of the input vector. The dimension of the output vector is equal to that of the input vector. The output of the encoding network is obtained by applying the activation function S to an affine transformation of the input, where W = {w_1, w_2, ..., w_n} and B = {b_1, b_2, ..., b_n} are the weight matrix and the bias matrix, respectively, from the input layer to the hidden layer; W and B constitute the parameter θ = {W, B}. Generally, the activation function S is a sigmoid function. The role of W and B can be summarized as encoding the original data. The SDAE then decodes the output of the hidden layer with the decoding network.
Constructing the SDAE model mainly consists of determining the model parameters. The optimal parameters θ and θ^T are obtained by minimizing the loss function. To obtain the features of the input data, the loss function of the SDAE must be reduced as much as possible. As with the traditional auto-encoder, the SGD algorithm is used to minimize the model error of the SDAE, which is measured by the loss function L(X, Z). The principle of the SDAE is shown in Figure 8.
As a typical deep learning method, the SDAE has its own shortcomings. When the training vibration-fault samples are inadequate, the training of the SDAE model is also insufficient and the model cannot accurately obtain the parameter θ. This means that the SDAE-based fault diagnosis is inaccurate. However, as mentioned above, the GAN can generate many training samples through data augmentation, which compensates for this shortcoming of the SDAE. A joint model combining GAN and SDAE, the GAN-SDAE system, is therefore adopted to improve the accuracy of the fault diagnosis.
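The encode, decode, and loss steps of a single denoising auto-encoder can be sketched as follows. The equations themselves are not reproduced in this text, so the example assumes a common formulation: masking noise on the input, h = S(W·x̃ + B) for encoding, z = S(W'·h + B') for decoding, and the squared reconstruction error L(X, Z) = ½‖X − Z‖² measured against the clean input. The function names, layer sizes, and noise type are illustrative assumptions.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def corrupt(x, drop_prob, rng):
    """Masking noise: set each entry to zero with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else xi for xi in x]


def layer(weights, bias, v):
    """Compute S(W v + b), with W given as a list of rows."""
    return [sigmoid(sum(w * vi for w, vi in zip(row, v)) + b)
            for row, b in zip(weights, bias)]


def dae_forward(x, w_enc, b_enc, w_dec, b_dec, drop_prob, rng):
    """Return (hidden code, reconstruction, loss) for one input vector."""
    x_noisy = corrupt(x, drop_prob, rng)   # the network sees the corrupted input
    h = layer(w_enc, b_enc, x_noisy)       # encoded feature vector
    z = layer(w_dec, b_dec, h)             # reconstruction, same size as x
    loss = 0.5 * sum((xi - zi) ** 2 for xi, zi in zip(x, z))  # vs. clean input
    return h, z, loss


rng = random.Random(0)
x = [0.2, 0.7, 0.1, 0.9]                   # stand-in for a vibration feature vector
n_in, n_hidden = len(x), 2
w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
b_enc = [0.0] * n_hidden
w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
b_dec = [0.0] * n_in

h, z, loss = dae_forward(x, w_enc, b_enc, w_dec, b_dec, drop_prob=0.3, rng=rng)
```

In a stacked model, the hidden code h of one trained DAE serves as the input of the next DAE, and SGD on the loss would update the weights and biases of each layer.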

Model Fusion and Evaluation
Model fusion merges multiple weak models into a strong model, which reduces the tendency of any single model to overfit. Fusing multiple models can improve the generalization ability and avoids the case in which a single model is unstable and has a low fault diagnosis ability. Multiple models can often significantly improve both the fault diagnosis ability and the robustness of the system.
Waterfall model fusion connects multiple models in series, with each recommendation algorithm regarded as one model. Models with different functions are chained one after another, as shown in Figure 9. The result produced by one method is used as the input of the next, step by step, so the function of the fused model is gradually enhanced and the final result set is small but of high quality. The approach is usually used in scenarios that combine models with different functions. In designing a waterfall model-fusion method, the model functions are usually sequenced so that they transition gradually toward the final goal. It is often suitable when there are many candidate recommendation objects but few valuable results, when accuracy requirements are high, and when computing time is limited.
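The serial connection described above can be sketched in a few lines. This is only an illustration of the waterfall idea, in which each stage consumes the output of the previous one; the two stage functions are hypothetical stand-ins and are not the GAN or the SDAE.

```python
from functools import reduce


def waterfall(stages, data):
    """Run the stages in series: each stage receives the previous stage's output."""
    return reduce(lambda out, stage: stage(out), stages, data)


def augment(samples):
    """Stage 1 stand-in: extend the set with slightly shifted copies."""
    return samples + [[v + 0.01 for v in s] for s in samples]


def summarize(samples):
    """Stage 2 stand-in: consume the enlarged set and return per-sample means."""
    return [sum(s) / len(s) for s in samples]


result = waterfall([augment, summarize], [[1.0, 3.0], [2.0, 4.0]])
print(len(result))
```

The order of the list passed to waterfall fixes the order of execution, which mirrors the sequencing of model functions in the fusion design.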
The small fault-sample regeneration model (GAN) and the SDAE model are two single models connected in series, following the waterfall type of model fusion. The GAN-SDAE structure is shown in Figure 10. The GAN-SDAE system based on waterfall model fusion strengthens the model's fault tolerance and improves the accuracy of the integrated learning. For the GAN-SDAE system, the number of neurons matters more than the number of layers when fault samples are few [38]. The GAN and the SDAE each have three or four layers. To construct an optimal GAN-SDAE system structure, the reconstruction error of the deep networks is used as the selection criterion [39]. The reconstruction error of each sample was obtained from the GAN-SDAE during training on a fault diagnostic task. Because the real fault diagnosis result C for each sample is known, the diagnostic error can be calculated. With Z denoting the output of the GAN-SDAE system, the reconstruction error ε in the training process is given by ε = |C − Z|.
Different reconstruction errors for different network structures were calculated. The network structure having the least reconstruction error was finally selected as the optimal structure (the various network structures of GAN-SDAE and corresponding reconstruction errors are analyzed in Section 3). Combining the analyses in Sections 2.2 and 2.3 provides an overall flow chart of using GAN-SDAE for machine fault diagnosis; this is shown in Figure 11.
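The selection rule above (keep the structure with the least reconstruction error) can be sketched as follows. The sketch assumes the per-sample errors ε = |C − Z| are averaged over the training samples before comparison, which the text does not state explicitly; the candidate structures and output values are made up for illustration.

```python
def reconstruction_error(c, z):
    """Per-sample reconstruction error, epsilon = |C - Z|."""
    return abs(c - z)


def mean_error(targets, outputs):
    """Average reconstruction error over all samples (aggregation assumed)."""
    return sum(reconstruction_error(c, z)
               for c, z in zip(targets, outputs)) / len(targets)


def select_structure(targets, outputs_by_structure):
    """Return the candidate structure with the smallest mean error."""
    return min(outputs_by_structure,
               key=lambda s: mean_error(targets, outputs_by_structure[s]))


# Hypothetical true labels C and system outputs Z for two candidate structures,
# each keyed by its neurons per layer.
targets = [1, 2, 3, 2]
outputs_by_structure = {
    (4, 4, 1): [1.1, 2.0, 2.8, 2.1],
    (8, 4, 1): [1.4, 2.5, 2.2, 1.6],
}

best = select_structure(targets, outputs_by_structure)
print(best)
```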
Evaluation indicators are the means by which developers present results to users and make standardized comparisons. They also help to express requirements and scientific information appropriately. Over the years, as fault diagnosis methods have been practiced in various disciplines, various index sets have been established in the field to evaluate model performance [40]. Here, the metric sets are mainly designed to verify the performance of the GAN-SDAE system in its applications. Evaluation indicators are therefore particularly important in the model development phase, where they can be used to improve the fault diagnosis program.
The error is basically a deviation from the desired target, and most accuracy-based metrics are derived directly or indirectly from the error term. The error is expressed as the difference between ŷ_i, the estimated value, and y_i, the expected output value. According to the error definition, the absolute error (AE) is the absolute value of this difference. Considering that there are multiple instances, the mean absolute error (MAE) is the average of the absolute error terms; it measures how close the estimates are to the final results. The mean square error (MSE) is a risk function that calculates the average of the squared errors. The mean absolute deviation from the sample median (MAD) is an estimate that is resistant to changes in the result errors. The root mean square error (RMSE), the square root of the MSE, measures the deviation between the observed value and the true value and is often used as a standard for measuring the output of machine learning models. Together, these performance indexes are used to verify the GAN-SDAE system and to evaluate its fault diagnosis performance on vibration signals.
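The four indexes named above can be computed as in the following minimal sketch. It uses the standard textbook definitions; the sign convention of the error term (estimate minus expected value) is an assumption, and the sample values are illustrative only.

```python
import math
import statistics


def errors(y_est, y_true):
    """Error terms e_i = y_est_i - y_true_i (sign convention assumed)."""
    return [a - b for a, b in zip(y_est, y_true)]


def mae(e):
    """Mean absolute error: average of |e_i|."""
    return sum(abs(v) for v in e) / len(e)


def mse(e):
    """Mean square error: average of e_i squared."""
    return sum(v * v for v in e) / len(e)


def rmse(e):
    """Root mean square error: square root of the MSE."""
    return math.sqrt(mse(e))


def mad(e):
    """Mean absolute deviation of the errors from their sample median."""
    med = statistics.median(e)
    return sum(abs(v - med) for v in e) / len(e)


e = errors([1.0, 2.5, 2.0, 4.0], [1.0, 2.0, 3.0, 4.0])
print(mae(e), mse(e), mad(e), rmse(e))
```

Because the MSE squares each error, a single large deviation affects it (and the RMSE) more strongly than it affects the MAE or MAD.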

Analysis of Uncertainty in a GAN-SDAE
In Section 2.2, a Gaussian-distributed cloud model was built to analyze the uncertainty of the samples generated by the GAN approach. It is equally important to check the uncertainty of the GAN-SDAE system. The system contains a neural network structure whose calculation is essentially a black box; hence, the system has inherent instability. The reliability of a GAN-SDAE for machine fault diagnosis can be ensured by analyzing the uncertainty of the system [41]. As previously demonstrated [42], the system uncertainty can be analyzed through repetitive experiments. For the diagnostic result Z, n independent observations Z_i are performed under repeatable conditions, and the standard deviation S(Z_i) of the fault diagnoses yielded by the GAN-SDAE is calculated from them. This statistically evaluated quantity, S_Z, is the standard uncertainty of the GAN-SDAE system, namely the Type A standard uncertainty u_A.
Evaluating the standard uncertainty of the proposed system by non-statistical methods yields the Type B standard uncertainty u_B. In line with engineering experience and information related to the system, we analyzed the interval [Z − a, Z + a] within which the fault diagnosis results fall, where a denotes the half-width of the confidence interval. Given the confidence level p, the coverage (inclusion) factor k is determined, and u_B is expressed as u_B = a/k. The synthetic standard uncertainty u_C, which characterizes the degree of dispersion of the GAN-SDAE system experiments, is obtained by combining u_A and u_B with the sensitivity coefficients k_1 and k_2, respectively (Equation (20)).
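The uncertainty evaluation above can be sketched numerically. The sketch takes u_B = a/k from the text; the other two forms are assumptions based on common measurement-uncertainty practice rather than on the paper's equations: u_A is taken as the sample standard deviation of the repeated results, and u_C as the root sum of squares of the weighted components. The observations, the half-width a, and the choice k = √3 (a uniform distribution) are illustrative values only.

```python
import math
import statistics


def type_a(observations):
    """u_A as the sample standard deviation of repeated results (assumed form)."""
    return statistics.stdev(observations)


def type_b(a, k):
    """u_B = a / k, with half-width a and coverage factor k."""
    return a / k


def combined(u_a, u_b, k1=1.0, k2=1.0):
    """u_C as the root sum of squares of k1*u_A and k2*u_B (assumed form)."""
    return math.sqrt((k1 * u_a) ** 2 + (k2 * u_b) ** 2)


# Hypothetical diagnostic accuracies from five repeated experiments.
observations = [0.90, 0.92, 0.91, 0.93, 0.89]
u_a = type_a(observations)
u_b = type_b(a=0.03, k=math.sqrt(3))   # uniform distribution assumed for k
u_c = combined(u_a, u_b)
print(u_a, u_b, u_c)
```

If the uncertainty of the mean result were wanted instead, u_A would be divided by the square root of the number of observations.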

Vibration Data Description
The vibration data of the ball bearings were collected from a motor-driven test rig [43]. The test rig mainly consists of an induction motor (Reliance Electric 2HP IQPreAlert), a dynamometer, a torque sensor, an accelerometer, and other components.
The ball bearings have four conditions, and each fault diameter is assigned a label. Each sample contained 2048 data points; because this length is a power of two, it was convenient to apply a fast Fourier transform (FFT) to the samples. All the test and training data in datasets A, B, C, and D were normalized to make the vibration data comparable with each other. Because the original vibration data with bearing faults were limited, GAN data augmentation was applied to the training data.
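The preprocessing described above (an FFT on each 2048-point sample, followed by normalization) can be sketched as follows. A sample length of 2048 = 2^11 suits the radix-2 FFT used here. The normalization method is not specified in the text, so min-max scaling to [0, 1] is an assumption, and the sine wave merely stands in for a measured vibration signal.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def normalized_spectrum(sample):
    """One-sided amplitude spectrum, min-max scaled to [0, 1] (scaling assumed)."""
    mags = [abs(c) for c in fft(sample)[: len(sample) // 2]]
    lo, hi = min(mags), max(mags)
    return [(m - lo) / (hi - lo) for m in mags]


n = 2048
# Synthetic stand-in for a vibration sample: a pure tone at frequency bin 64.
sample = [math.sin(2 * math.pi * 64 * t / n) for t in range(n)]
spectrum = normalized_spectrum(sample)
peak_bin = spectrum.index(max(spectrum))
print(peak_bin)
```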

Parameters of the GAN-SDAE
The training samples were fed into the training network of the GAN-SDAE system. Once training had stabilized, the test samples were fed into the system to test it. The system output was the fault type, and the fault-diagnosis accuracy was also calculated. As mentioned earlier, the accuracy of fault diagnosis depends on the structure of the GAN-SDAE network, that is, on its numbers of layers and neurons. The reconstruction errors were analyzed to determine the optimal network structure; the calculated values are shown in Table 2. In addition, we observed that using the same network structure for G and D performed better than using different structures: it ensured the quality of the generated samples and a high fault-diagnosis accuracy. On the basis of the reconstruction error, G and D each have three layers with four, four, and one neuron(s), respectively. The SDAE network has four layers with three, three, two, and one neuron(s), respectively.

Data Augmentation
Like other deep learning methods, the SDAE model has many parameters. A large number of training samples is necessary to train the SDAE so as to avoid overfitting and improve generalization. In addition, whether the training samples are balanced or imbalanced can cause differences in the diagnostic results [44,45], and the accuracy of fault diagnosis is usually affected by dataset imbalance. Datasets A, B, C, and D, however, were all balanced.
As described in Section 2.2, we needed to check the Nash equilibrium point to verify whether G and D had reached a converged state in training. All the data reached the Nash equilibrium point. The calculation result for the Label 2 type of Dataset A is shown in Figure 12 as an example. The same check was applicable to the other datasets.
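One simple way to automate the convergence check described above is sketched below. It relies on a known property of the original GAN formulation: at the Nash equilibrium the discriminator can no longer tell real from generated samples, so its output approaches 0.5 for both. The function name, the tolerance, and the sample outputs are hypothetical, and the criterion used in Section 2.2 may differ.

```python
def at_equilibrium(d_real, d_fake, tol=0.05):
    """True when the mean discriminator outputs on real and generated
    batches both lie within tol of 0.5."""
    mean_real = sum(d_real) / len(d_real)
    mean_fake = sum(d_fake) / len(d_fake)
    return abs(mean_real - 0.5) <= tol and abs(mean_fake - 0.5) <= tol


# Hypothetical discriminator outputs late and early in training.
converged = at_equilibrium([0.52, 0.49, 0.50], [0.48, 0.51, 0.50])
not_converged = at_equilibrium([0.90, 0.95], [0.10, 0.05])
print(converged, not_converged)
```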

Figure 13 shows the original vibration sample of the Label 2 type of Dataset A and its corresponding generated samples; the generated samples of the other fault types were similar. The generated samples differed from the original sample, so the original training set was extended by the generated ones. The uncertainty analysis for the generated samples of the Label 2 type of Dataset A is shown in Figure 14; the analysis of the other generated fault samples was similar. We concluded that the generated samples conformed to the 3δ principle and that, under the cloud model, the membership function obeyed a Gaussian distribution. Therefore, the generated samples were feasible for SDAE training.
With enough training samples available, we investigated how many were sufficient and how well the SDAE performed. Different numbers of training samples generated by the GAN approach were fed into the SDAE to test its performance: 700, 750, 800, 850, 900, 950, 1000, 1050, and 1100 samples. The results are shown in Figure 15.
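The conformity check on the generated samples mentioned above can be illustrated with a short sketch. It assumes that the 3δ principle refers to the usual three-sigma rule, under which about 99.7% of Gaussian-distributed values lie within three standard deviations of the mean. The data are synthetic Gaussian values, not generated vibration samples, and the cloud-model parameters of the paper are not reproduced.

```python
import random
import statistics


def fraction_within_three_sigma(values):
    """Share of values lying within mean +/- 3 standard deviations."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    inside = sum(1 for v in values if abs(v - mu) <= 3 * sigma)
    return inside / len(values)


rng = random.Random(1)
values = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # synthetic stand-in data
frac = fraction_within_three_sigma(values)
print(frac)
```

A fraction well below the Gaussian expectation would suggest that the generated samples contain too many outliers to be used for training.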
The performance of the SDAE for fault diagnosis improved as the number of training samples increased, with accuracy rising by 30% from 700 samples to 1100 samples. With 1000 training samples, the accuracy of the fault diagnosis was above 80%. When the samples were increased to 1100, the accuracy of the SDAE peaked at 92.01%. Considering the computational cost, we made a compromise by expanding the sample size to 1100 without increasing the amount of data. The random-noise data were fed into the GAN for data augmentation and converted into vibration fault data through adversarial learning. This approach makes up for the shortcomings of other methods, such as oversampling. Because GAN data augmentation increases the number of training samples, the training of the SDAE is strengthened and its generalization is further improved.

Performance of the GAN-SDAE System
Deep learning can adaptively mine vibration-fault features from the signal spectra. To examine the feature-mining capability of the GAN-SDAE, we used t-SNE to visualize the features output by the last layer of the GAN-SDAE. The details of the t-SNE algorithm have been documented [46]. The visual analysis of the categories Label 1 to Label 10 based on t-SNE is shown in Figure 16. The t-SNE projection grouped the faults according to their types, and the ten types were clearly distinguished.
The performance of the GAN-SDAE was tested using four datasets in different load domains. The diagnostic results from the proposed GAN-SDAE were compared with those from the SDAE alone and with those from classic machine learning algorithms, namely support vector machine (SVM), back propagation networks (BPN), and DAE. The comparisons are shown in Figure 17. The accuracy of the GAN-SDAE was above 90% for all four loads and higher than that of the other four methods. The accuracy of the SDAE was above 80%, higher than that of the other traditional machine learning methods. The SVM performed worst in the comparison, with an average accuracy below 75%. In summary, with the support of the GAN, the capability of the SDAE was markedly increased.

Baselines and Evaluation
The baselines for bearing fault diagnosis were set as per References [47][48][49]. The fault diagnosis results for the vibration signals are shown in Figure 18. The GAN-SDAE system proposed in this paper has a higher average accuracy than the baselines in the references, so it improved the accuracy of the fault diagnosis.

The MAE, MSE, MAD, and RMSE for the GAN-SDAE system are shown in Table 3. The evaluation of the proposed method is satisfactory. With the improved data augmentation method, the SDAE model can complete learning and training on demand while significantly improving its performance. As mentioned previously, all current deep learning algorithms inherently face the problem of system uncertainty; furthermore, non-convergence may occur during network training. Therefore, besides the accuracy requirement for fault diagnosis, the stability of the system must be checked, and the uncertainty analysis was performed to ensure the stability of the GAN-SDAE.
The GAN-SDAE system was used to conduct repetitive experiments in various load domains. The synthetic standard uncertainty u_C for the four datasets (four loads) is illustrated in Figure 19. Overall, the value of u_C decreased as the number of repetitive experiments increased. When the number of repetitive experiments exceeded 40, u_C stabilized at around 0.2.

Versatility of GAN-SDAE System
The usefulness of the GAN-SDAE system was confirmed by the classic bearing fault diagnosis. In industrial practice, however, gear fault diagnosis is also common. To verify the applicability of the GAN-SDAE, we applied the system to gear fault diagnosis. The gear fault signals were obtained from the drive-train dynamic simulator (DDS). The different types of faults in the gears are shown in Table 4 [50].

Table 4. Types of faults in the gears [50].

| Fault type | Description | Label |
|---|---|---|
| Normal | The gear works normally | 1 |
| Chipped | Crack in the gear teeth | 2 |
| Miss | Missing a tooth in the gear | 3 |
| Root | Crack in the root of gear teeth | 4 |
| Surface | Wear in the surface of gear | 5 |

The results indicate that the accuracies of both SVM and BPN were below 80%. The DAE achieved a fault diagnosis accuracy of 80.23%, which is only moderate. The accuracy of the SDAE was 86.93%, and data augmentation by the GAN increased this to 90.35% for the proposed system. The uC of the