A Novel Autoencoder with Dynamic Feature Enhanced Factor for Fault Diagnosis of Wind Turbine

Abstract: Due to the complicated operating environment and variable operating conditions, wind turbines (WTs) are extremely prone to failure, and the frequency of faults increases year by year. Therefore, effective condition monitoring and fault diagnosis solutions are urgently demanded. Since vibration signals contain much health condition information, fault diagnosis based on vibration signals has received extensive attention and achieved impressive progress. However, in practice, the collected health condition signals are very similar to one another and contain much noise, which makes the fault diagnosis of WTs more challenging. To handle this problem, this paper proposes a model called denoising stacked feature enhanced autoencoder with dynamic feature enhanced factor (DSFEAE-DF). Firstly, a feature enhanced autoencoder (FEAE) is constructed through feature enhancement so that discriminative features can be extracted. Secondly, a feature enhanced factor that is independent of manual judgment is proposed and embedded into the training process. Finally, the DSFEAE-DF, combining a noise adding scheme, stacked FEAEs, and the dynamic feature enhanced factor, is established. Through experimental comparisons, the superiority of the proposed DSFEAE-DF is verified.


Introduction
Nowadays, wind energy is becoming one of the most effective means to alleviate energy shortages and protect the environment, so wind turbines (WTs) are widely deployed. However, due to the harsh working environment and variable working conditions, WTs are extremely susceptible to failure, resulting in unexpected shutdowns and additional maintenance costs. For example, the shutdown frequency of a wind farm investigated by the Caithness Windfarm Information Forum increased from 156 times per year (2009-2014) to 176 times per year (2015-2019), which causes huge economic losses to investors. To reduce maintenance costs and avoid unplanned shutdowns, it is urgent to develop an effective condition monitoring and fault diagnosis model to detect weak faults as early as possible [1][2][3].
In recent years, the condition monitoring and fault diagnosis of WTs have been greatly developed. In summary, the signals and monitoring means currently employed mainly include the following categories: vibration [4][5][6], acoustic emission [7,8], strain [9], torque, temperature, oil [10], electrical parameters [11,12], supervisory control and data acquisition (SCADA) parameters [1], non-destructive testing, and so forth. Considering several aspects such as monitorable components, installation intrusiveness, installation complexity, installation costs, sampling frequency requirements, and commercialization, vibration monitoring has become the most widely used approach, providing rich data support for the development of data-based health monitoring and fault diagnosis.
Benefiting from the development of signal processing and machine learning, many fault diagnosis models have been formed through feature extraction and feature classification [13][14][15][16][17]. Among them, deep neural networks (DNNs), which extract effective features from complex monitoring data automatically and construct high-reliability models, have gradually become a hotspot in fault diagnosis of WTs [18]. Various DNNs, for example, the autoencoder (AE) [13,19,20], sparse filter [4,21], deep belief network (DBN), convolutional neural network (CNN) [22,23], and recurrent neural network (RNN) [24], have been employed widely for many challenging problems in fault diagnosis. AE, which minimizes the error in reconstructing the input, can adaptively perform feature extraction in a fully unsupervised manner with a simple network structure and few parameters [25][26][27]. Following this line of reasoning, many variants of the autoencoder, for example, the denoising autoencoder (DAE), contractive autoencoder (CAE), variational autoencoder (VAE), K-sparse autoencoder, locally connected autoencoder, and so forth, have been proposed recently. For example, considering the noise and non-linearity in signals, Jiang et al. [28] utilized a stacked multilevel-DAE to extract more robust and discriminative fault features. Shen et al. [29] proposed a stacked CAE for anti-noise and robust fault diagnosis. Martin et al. [30] adopted a fully unsupervised deep VAE method for latent fault feature extraction by variational inference. These studies motivate us to develop a new AE-based fault diagnosis model for WTs. However, AE is a greedy neural network, and its extracted features are usually trivial. Especially for similar faults, the features extracted by AE are not distinctive and lack meaning. Adding constraints can guide the AE to focus on important features.
The L1 regularizer, L2 regularizer, L1L2 regularizer, and KL divergence are widely applied for this purpose, but some of their parameters usually require manual judgment. Meanwhile, the signals from WTs operating in variable conditions often contain much noise, which requires the AE to retain the capability of extracting robust features.
To improve the discriminative and robust feature extraction ability of the traditional AE, the denoising stacked feature enhanced autoencoder with dynamic feature enhanced factor (DSFEAE-DF) for fault diagnosis of wind turbines is proposed. The main contributions of this paper are threefold.
(1) A novel feature enhanced autoencoder (FEAE) is proposed. The FEAE, which introduces feature enhancement, can extract more representative and discriminative features from raw signals.
(2) A dynamic feature enhanced factor is proposed. The factor, which involves the diversity of features and the information amount between feature and input, is smoothly embedded into the training process and calculated without manual judgment.
(3) DSFEAE-DF is proposed for fault diagnosis of WTs, combining a noise adding scheme, stacked FEAEs, and the dynamic feature enhanced factor. Compared to the traditional stacked denoising autoencoder (SDAE), DSFEAE-DF can extract hierarchically discriminative and robust features, and therefore has a better ability for similar fault diagnosis and fault diagnosis in noisy environments.
The remainder of this paper is organized as follows. A brief background of AE is described in Section 2. The proposed model, DSFEAE-DF, is detailed in Section 3 and a set of experiments are conducted for the validity of DSFEAE-DF in Section 4. Finally, the conclusions are drawn in Section 5.

Autoencoder
Autoencoder (AE) [13], as shown in Figure 1a, is a special DNN which can be divided into two parts: an encoding network and a decoding network. Given a training sample set $X = \{x^i\}_{i=1}^{M}$, where $x^i = [x^i_1; x^i_2; \cdots; x^i_t; \cdots; x^i_n] \in \mathbb{R}^{n \times 1}$ and $M$ is the number of training samples, the encoding network extracts a hidden feature $f^i$ from the input sample $x^i$:

$$f^i = g_{en}(W_{en} x^i + b_{en}), \quad (1)$$

where $W_{en} \in \mathbb{R}^{d \times n}$ is the weight and $b_{en} \in \mathbb{R}^{d \times 1}$ is the bias. The decoding network maps the hidden feature $f^i$ to the reconstruction output $z^i$:

$$z^i = g_{de}(W_{de} f^i + b_{de}), \quad (2)$$

where $W_{de} \in \mathbb{R}^{n \times d}$ is the weight and $b_{de} \in \mathbb{R}^{n \times 1}$ is the bias; $g_{en}$ and $g_{de}$ are the activation functions. The training process of AE updates the parameters $\theta = \{W_{en}, b_{en}, W_{de}, b_{de}\}$ by minimizing the reconstruction error between $X = \{x^i\}_{i=1}^{M}$ and $Z = \{z^i\}_{i=1}^{M}$:

$$L(\theta) = \frac{1}{M} \sum_{i=1}^{M} \left\| x^i - z^i \right\|^2. \quad (3)$$
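Equations (1)-(3) can be sketched numerically as follows; this is a minimal illustration, assuming sigmoid activations and illustrative dimensions (all variable names here are hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, M = 8, 4, 16                      # input dim, feature dim, sample count

W_en, b_en = rng.normal(0, 0.1, (d, n)), np.zeros((d, 1))
W_de, b_de = rng.normal(0, 0.1, (n, d)), np.zeros((n, 1))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def encode(x):                          # Equation (1): f = g_en(W_en x + b_en)
    return sigmoid(W_en @ x + b_en)

def decode(f):                          # Equation (2): z = g_de(W_de f + b_de)
    return sigmoid(W_de @ f + b_de)

X = rng.random((n, M))                  # M training samples as columns
Z = decode(encode(X))
loss = np.mean(np.sum((X - Z) ** 2, axis=0))   # Equation (3): reconstruction error
```

In practice the parameters would be updated by gradient descent on `loss`; the sketch only shows the forward pass and objective.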

Denoising
Due to the complicated environment, the collected signals often contain strong background noise, so the performance of AE under noise needs to be improved. To learn more robust features, a noise sample $\tilde{x}^i$ is constructed from the input sample $x^i$ by a stochastic corruption process $q_D$, denoted as $\tilde{x}^i \sim q_D(\tilde{x}^i \mid x^i)$. Then, through the encoding and decoding networks, the noise sample $\tilde{x}^i$ is mapped to the reconstruction output $z^i$. Finally, by minimizing the error in Equation (3), robust features are extracted.
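A possible form of the corruption process $q_D$ is sketched below. The paper later adopts Gaussian noise with a noise adding probability of 0.1; the noise scale `sigma` here is an assumption for illustration:

```python
import numpy as np

def add_noise(x, q_D=0.1, sigma=0.1, rng=None):
    """Corrupt each entry of x with Gaussian noise with probability q_D."""
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) < q_D                 # entries chosen for corruption
    return x + mask * rng.normal(0.0, sigma, x.shape)

x = np.ones(1200)                                    # one 1200-point segment
x_tilde = add_noise(x, q_D=0.1, rng=np.random.default_rng(0))
```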

Proposed Method
In this section, the proposed DSFEAE-DF is firstly presented, including FEAE, dynamic feature enhanced factor and the structure of DSFEAE-DF. Then, the fault diagnosis procedures are detailed.

Feature Enhancement
When observing the features extracted by AEs, they are not discriminatively different; that is, AEs greedily extract relatively trivial features to reconstruct the input samples. To overcome this shortcoming, one approach guides the AE to focus on important features by adding constraints to the training process via mutual competition and enhancement. In competition and enhancement, neurons in the hidden layer compete for the right to respond to the input samples; the specialization of the neurons then increases so that discriminative features can be extracted. Following this idea and inspired by Reference [31], feature enhancement is proposed as two processes, described below and detailed in Algorithm 1.
(1) Competition: In the feature vector $f^i = [f^i_1; f^i_2; \cdots; f^i_t; \cdots; f^i_d]$, the $k$ most competitive neurons with the largest activation values are selected as the "winners" of the competition, while the remaining "losers" are suppressed to 0.
(2) Enhancement: In order to compensate for the energy loss caused by suppressing the "loser" neurons and to make the competition among the neurons more pronounced, the average "loser" neuron energy is redistributed to the "winner" neurons by an energy enhanced factor β, which achieves the enhancement. Given a feature enhanced factor α, the number of most competitive neurons k and the energy enhanced factor β are obtained from Equations (4) and (5).
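The two steps above can be sketched as follows. The exact forms of Equations (4) and (5) are not reproduced here; as assumptions, k is taken as ⌈αd⌉ and β redistributes the mean "loser" energy onto the winners:

```python
import numpy as np

def feature_enhance(f, alpha):
    """Competition (top-k selection) followed by enhancement (energy redistribution)."""
    d = f.size
    k = max(1, int(np.ceil(alpha * d)))       # assumed form of Equation (4)
    idx = np.argsort(f)[::-1]                 # neurons sorted by activation value
    winners, losers = idx[:k], idx[k:]
    h = np.zeros_like(f)                      # "loser" neurons suppressed to 0
    mean_loser = f[losers].mean() if losers.size else 0.0
    # assumed form of Equation (5): winners absorb the mean suppressed energy
    beta = 1.0 + mean_loser / (f[winners].mean() + 1e-12)
    h[winners] = beta * f[winners]            # enhanced "winner" neurons
    return h

f = np.array([0.9, 0.1, 0.7, 0.2, 0.05])
h = feature_enhance(f, alpha=0.4)             # k = 2 winners kept and boosted
```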

Feature Enhanced Autoencoder
According to the descriptions of AE in Section 2.1 and feature enhancement in Section 3.1, the feature enhanced autoencoder (FEAE), as shown in Figure 1b, is formulated by Equations (1), (6), and (7), where $f^i$ denotes the feature and $h^i$ denotes the enhanced feature. FEAE, as an unsupervised feature extractor, can extract discriminative features through feature enhancement. As with AE, the parameters $\theta = \{W_{en}, b_{en}, W_{de}, b_{de}\}$ are trained by minimizing Equation (3).

Dynamic Feature Enhanced Factor
As can be seen in Section 3.1, the feature enhanced factor α, as a hyperparameter, is a key factor of feature enhancement and represents the proportion of the k most competitive neurons among all neurons. With a large α, k is also large, so too many features are enhanced and the significance of feature enhancement decreases. With a small α, k is also small, so few features are enhanced and the remaining features are set to 0, causing feature loss. In the existing method [31], a fixed α is chosen by prior knowledge and human judgment, but due to the complexity and diversity of the features, a fixed α cannot be suitable for all of them. A dynamic feature enhanced factor $\alpha_b$, independent of human judgment, is therefore proposed as follows.
For the training process, let the batch size be B and the current batch be b; the training samples $X_b = \{x^i\}_{i=1}^{B}$ and the corresponding features $F_b = \{f^i\}_{i=1}^{B}$ can then be obtained. The similarity $sim_b$ between $f^i$ and the rest of the features in $F_b$ is computed as in Equation (9). Meanwhile, the information amount $I$ between the feature $f^i$ and the input sample $x^i$ is designed as in Equation (10), where $d$ and $n$ are the dimensions of $f^i$ and $x^i$, respectively. Finally, in the current batch b, the dynamic feature enhanced factor $\alpha_b$ is designed as in Equation (11), and $k_b$ and $\beta_b$ can then be calculated by Equations (4) and (5).
Interpreting Equation (11), the dynamic feature enhanced factor $\alpha_b$ consists of two terms. The first term, $1 - sim_b$, is the average diversity of $F_b$, which can be roughly regarded as the proportion of discriminative features. The second term, $I$, represents the information amount carried by the features from the input samples; it prevents too few features from being enhanced when the average diversity of $F_b$ is small and ensures that enough enhanced features remain to reconstruct the input. As the above description shows, the dynamic feature enhanced factor $\alpha_b$ is well motivated and is acquired adaptively, without prior knowledge, in every training batch.
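The idea can be sketched as follows. This is a hedged illustration, not the paper's exact formulas: it assumes $sim_b$ is the mean pairwise cosine similarity of the batch features, $I$ is the dimension ratio $d/n$, and the two terms are combined multiplicatively:

```python
import numpy as np

def dynamic_alpha(F, n):
    """Adaptive feature enhanced factor for a batch of features F (B x d)."""
    B, d = F.shape
    Fn = F / (np.linalg.norm(F, axis=1, keepdims=True) + 1e-12)
    S = Fn @ Fn.T                              # pairwise cosine similarities
    sim_b = (S.sum() - B) / (B * (B - 1))      # batch mean, excluding self-pairs
    I = d / n                                  # assumed information-amount term
    alpha_b = (1.0 - sim_b) * I                # assumed combination of the two terms
    return float(np.clip(alpha_b, 0.05, 0.95)) # keep alpha in a usable range

F = np.random.default_rng(1).random((100, 200))  # batch of 100 features, d = 200
alpha_b = dynamic_alpha(F, n=1200)               # n = input segment length
```

The point of the sketch is the mechanism: as the batch features become more similar (less diverse), the factor shrinks so that fewer neurons are enhanced.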

DSFEAE-DF Model
The proposed denoising stacked feature enhanced autoencoder with dynamic feature enhanced factor (DSFEAE-DF) model is a DNN with one noise adding scheme, multiple feature enhanced layers, and one softmax layer, as shown in Figure 2.

Input Layer
Feature Enhanced Layer 2 Figure 2. Structure of denoising stacked feature enhanced autoencoder with dynamic feature enhanced factor (DSFEAE-DF) model.
The feature enhanced layers are composed of a set of FEAEs to achieve discriminative and automatic extraction of enhanced features at different layers from the original signals. Assuming the feature enhanced layers comprise L FEAEs, $l \in \{1, 2, 3, \ldots, L\}$ represents the $l$-th FEAE. When $l = 1$, the input of the first FEAE is $x^{i,1} = \tilde{x}^i$, which means that the input of DSFEAE-DF is the noise sample $\tilde{x}^i$. The enhanced features $h^{i,1}$ of the first FEAE are then obtained by updating $\theta^1 = \{W^1_{en}, b^1_{en}, W^1_{de}, b^1_{de}\}$. When $l = 2, 3, \ldots, L$, the input $x^{i,l}$ of the $l$-th FEAE is $h^{i,l-1}$, and $\theta^l = \{W^l_{en}, b^l_{en}, W^l_{de}, b^l_{de}\}$ is updated to obtain the enhanced features $h^{i,l}$. The softmax layer, whose input is $h^{i,L}$, is employed to make the prediction $\hat{y}^i_{\theta_c} \in \mathbb{R}^{C \times 1}$ for the input sample $x^i$. Supposing that the label of $x^i$ is $y^i$, the discrepancy between $\hat{y}^i_{\theta_c}$ and $y^i$, computed by the cross-entropy loss function $L_c$ in Equation (12), is minimized by updating the weights $\theta_c$ of the softmax layer,
where $y^i \in \mathbb{R}^{C \times 1}$ is the one-hot form of the label $y^i$. Consequently, the stacked FEAE layers obtain hierarchical non-linear features: lower layers extract low-level enhanced features and higher layers extract high-level enhanced features. Meanwhile, the dynamic feature enhanced factor $\alpha_{b,l}$ of the $l$-th FEAE in every batch is smoothly embedded into the model and adaptively calculated by Equation (11) during the training process, so that enhanced features are extracted automatically without any human judgment. Furthermore, the input of the first FEAE is the noise sample $\tilde{x}^i$ produced by the noise adding scheme, which corresponds to the denoising described in Section 2.2.
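The softmax prediction and cross-entropy loss of Equation (12) can be sketched as follows, for C = 10 health conditions; the weight values and top-layer feature are illustrative placeholders:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())                  # shift for numerical stability
    return e / e.sum()

C, dL = 10, 200                              # classes, top FEAE feature dimension
rng = np.random.default_rng(0)
theta_c = rng.normal(0, 0.05, (C, dL))       # illustrative softmax-layer weights

h_L = rng.random(dL)                         # enhanced feature h^{i,L} of the top FEAE
y_hat = softmax(theta_c @ h_L)               # predicted class probabilities
y = np.eye(C)[3]                             # one-hot ground-truth label
L_c = -np.sum(y * np.log(y_hat + 1e-12))     # cross-entropy loss for one sample
```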

DSFEAE-DF for Fault Diagnosis
The fault diagnosis procedure based on the DSFEAE-DF model includes two phases: a training phase and a testing phase. During the training phase, vibration signals of different health conditions are collected, segmented, normalized, and fed into the established DSFEAE-DF model, which is then trained to obtain the trained DSFEAE-DF. The detailed training process is shown in Algorithm 2. In the testing phase, new health condition signals are acquired, segmented, normalized, and fed into the trained model, and the diagnosis result is obtained.

Experiments and Verification
In this section, since the bearing is a core component of WTs, a bearing dataset is used to verify the effectiveness of the proposed method. All experiments are conducted on a computer with an AMD A8-5550M APU, a Linux OS, and the TensorFlow toolbox. Each experiment is repeated for 10 trials to avoid one-off chance results.

Data Description
The bearing vibration signals of Case Western Reserve University (CWRU) [32] are employed in this paper. All signals are obtained from artificially damaged bearings in the motor-driven mechanical system shown in Figure 3. The signals under four fault locations (Normal, Ball, Inner race, and Outer race) are collected by acceleration sensors under four different loads with a 48 kHz sampling frequency. For each fault location, three fault severities (0.18 mm, 0.36 mm, and 0.53 mm) are introduced. We use these situations to simulate actual bearing faults in WTs. The resulting dataset is summarized in Table 1. It contains a total of 10 health conditions. For each health condition, there are 800 segments (200 segments under each load), and each segment (also called a sample) contains 1200 data points, so the dataset contains a total of 8000 samples. Figure 4 provides examples of the 10 health condition samples.
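The segmentation described above can be sketched as follows; the record length here is illustrative, and non-overlapping segments are an assumption (the paper does not state the overlap):

```python
import numpy as np

def segment(signal, seg_len=1200):
    """Cut a vibration record into non-overlapping segments of seg_len points."""
    n_seg = len(signal) // seg_len
    return signal[: n_seg * seg_len].reshape(n_seg, seg_len)

record = np.random.default_rng(0).random(1200 * 5)  # illustrative raw record
segs = segment(record)                               # 5 samples of 1200 points each
```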

Experimental Setup
The parameters of the DSFEAE-DF structure and training process affect the testing accuracy. To determine these parameters, exhaustive experiments on the bearing dataset are undertaken, with the testing accuracy as the indicator. The parameters considered cover five aspects, presented as follows.
(1) The structure of DSFEAE-DF. Combining the structure in Reference [33] and the testing accuracy, the structure of DSFEAE-DF includes an input layer, three feature enhanced layers, and a softmax layer. The dimension of the input layer is equal to the dimension of the input sample, and the dimensions of the three feature enhanced layers are 600, 400, and 200, respectively. The number of nodes in the softmax layer is 10, the number of health conditions.
(2) Activation function. Compared with other nonlinear functions, for example, ReLU, Tanh, LogSig, and SoftSign, the Sigmoid function is used for g_en and g_de.
(3) Noise adding probability. Gaussian noise is adopted in this paper. Combining the noise settings in Reference [1] and the testing accuracy, the noise adding probability q_D is set to 0.1.
(4) Optimization. Following Reference [34], Adam is adopted for stochastic optimization. The learning rates of the unsupervised training and the supervised fine-tuning are set to 0.01 and 0.0165, respectively. Considering both the degree and the speed of optimization, the number of epochs is set to 200 and the batch size to 100.
(5) Training sample number. Following the setting in Reference [35], 7200 randomly selected samples are used for training and the rest for testing.
To verify the effectiveness of the dynamic feature enhanced factor α_{b,l}, 9 fixed values of α are embedded into the feature enhancement of the DSFEAE for comparison with DSFEAE-DF in fault diagnosis. Moreover, in order to verify the performance of the proposed model in similar fault diagnosis and fault diagnosis under noise, other AE-based models with the same setup as DSFEAE-DF are employed for comparison. The details are as follows.

Effectiveness of Dynamic Feature Enhanced Factor
In this paper, considering the diversity of the features in a training batch and the information amount between feature and input in the current feature enhanced layer, the dynamic feature enhanced factor α_{b,l} is designed and smoothly embedded into the training process. To verify its effectiveness, 9 fixed values of α are embedded in the feature enhancement of the DSFEAE. The diagnosis results of the manually set α and the dynamic α_{b,l} are shown in Figure 5. As the value increases from 0.1 to 0.4, the testing accuracy continuously increases, while as it increases from 0.4 to 0.9, the testing accuracy continuously decreases. This is because a small value means few features are enhanced and little information is contained in the enhanced features, making it difficult to reconstruct the input samples, while a too large value means most of the features are enhanced, so the benefit of feature enhancement diminishes. DSFEAE-DF uses the dynamic feature enhanced factor α_{b,l}, which adaptively selects and enhances features. The accuracy of DSFEAE-DF is higher than that of DSFEAE with any fixed α, and its standard deviation is also smaller. These results verify the effectiveness of the dynamic feature enhanced factor.

Performance of Fault Diagnosis
In this subsection, the performance of fault diagnosis, including similar fault diagnosis, is discussed. Table 2 shows the diagnosis results of the five models. The SAE achieves the worst accuracy of 91.30% with a standard deviation of 2.34%. K-sparse SAE, SDAE, and DSFEAE-0.4 acquire intermediate accuracies of 94.72%, 97.14%, and 98.70%, with standard deviations of 2.07%, 1.89%, and 0.84%, respectively. DSFEAE-DF obtains the highest accuracy of 99.37% and the lowest standard deviation of 0.47%, which indicates that the fault diagnosis performance of DSFEAE-DF is both superior and stable. To further explore the performance on similar faults, the confusion matrices of the testing results, which detail the classification of each health condition, are drawn in Figure 6. According to Figure 4, the samples of label 4 and label 9 are similar, so distinguishing them is challenging. In Figure 6, 14.8% of the samples of label 9 are misclassified as label 4 by SAE, 11.25% by K-sparse SAE, 10.30% by SDAE, and 3.6% by DSFEAE-0.4, but only 1.08% by DSFEAE-DF. Similar situations also occur between labels 1 and 2 and between labels 3 and 4, which confirms the capability of DSFEAE-DF to classify similar faults. Meanwhile, not only the average accuracy but also the accuracy of each label in DSFEAE-DF is generally higher than in the other models. All these results illustrate the superiority of DSFEAE-DF in similar fault diagnosis.

In Figure 7, t-distributed stochastic neighbor embedding (t-SNE) [38] is used to map the features of the last enhanced feature layer to two-dimensional features, displaying the fault diagnosis results more intuitively. The features of SAE, K-sparse SAE, and SDAE partially overlap, and the clustering effect is not satisfying. In DSFEAE-0.4, the clustering effect is good, but some similar faults overlap slightly, which matches the confusion matrix shown in Figure 6e.
For DSFEAE-DF, the features of the same health condition are clustered together, and the separation between the features of different health conditions is clearer. These observations verify the superiority of DSFEAE-DF.

Performance Under Noise Environment
To verify the superiority of the proposed method for real WTs, additive white Gaussian noise is added to the testing samples to synthesize signals with different signal-to-noise ratios (SNRs), simulating actual working conditions. The SNR is defined as

$$\mathrm{SNR_{dB}} = 10 \log_{10} \left( \frac{P_{signal}}{P_{noise}} \right),$$

where $P_{signal}$ and $P_{noise}$ are the power of the original signal and of the noise, respectively. Experiments with SAE, K-sparse SAE, SDAE, DSFEAE-0.4, and DSFEAE-DF under different noise environments are conducted; the results are shown in Table 2 and Figure 8. According to Figure 8, when the SNR is 0, the accuracy of DSFEAE-DF reaches 91.55%, much higher than that of the other four models. When the SNR is less than 0, the accuracy gaps between DSFEAE-DF and the other four models are obvious, and the standard deviations of DSFEAE-DF are smaller, indicating strong stability. When the SNR is greater than 0, the accuracies of DSFEAE-DF are still much higher than those of SAE, K-sparse SAE, and SDAE. Meanwhile, the accuracy gap between DSFEAE-DF and DSFEAE-0.4 becomes tiny, but DSFEAE-DF still has an advantage: when SNR = 10, the accuracies of DSFEAE-DF and DSFEAE-0.4 are 98.20% and 98.00%, respectively. These results verify the superiority of the proposed method under noisy environments.
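Synthesizing a noisy test signal at a target SNR, per the definition above, can be sketched as follows; the clean segment here is an illustrative sinusoid:

```python
import numpy as np

def add_awgn(x, snr_db, rng=None):
    """Add white Gaussian noise so that the result has the target SNR in dB."""
    rng = rng or np.random.default_rng()
    p_signal = np.mean(x ** 2)                     # signal power
    p_noise = p_signal / (10 ** (snr_db / 10))     # noise power for the target SNR
    noise = rng.normal(0.0, np.sqrt(p_noise), x.shape)
    return x + noise

x = np.sin(np.linspace(0, 20 * np.pi, 1200))       # illustrative clean segment
x_noisy = add_awgn(x, snr_db=0, rng=np.random.default_rng(0))  # SNR = 0 dB case
```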

Visualization of Network
In order to visualize the reactions of the neurons and gain some insight into feature enhancement, the features of 200 random testing samples at each hidden layer of the DSFEAE-DF model and the SDAE model are shown in Figure 9.

Conclusions
In this paper, a novel model called denoising stacked feature enhanced autoencoder with dynamic feature enhanced factor (DSFEAE-DF) is proposed for fault diagnosis. The model, integrating a noise adding scheme, stacked feature enhanced autoencoders, and a dynamic feature enhanced factor, discovers more discriminative features from raw signals. Compared with traditional approaches such as SAE, K-sparse SAE, SDAE, and DSFEAE-0.4, the proposed method achieves superior performance in similar fault diagnosis and fault diagnosis under noisy environments. In addition, the reactions of the neurons are visualized to provide insight into feature enhancement.