Anomaly Detection of Wind Turbines Based on Deep Small-World Neural Network

Li, Meng; Wang, Shuangxin; Fang, Shanxiang; Zhao, Juchao

doi:10.3390/app10041243


¹ School of Mechanical, Electronic and Control Engineering, Beijing Jiaotong University, Beijing 100044, China

² Department of Mechanical Engineering, National University of Singapore, Singapore 119077, Singapore

* Author to whom correspondence should be addressed.

Appl. Sci. 2020, 10(4), 1243; https://doi.org/10.3390/app10041243

Submission received: 19 January 2020 / Revised: 6 February 2020 / Accepted: 10 February 2020 / Published: 12 February 2020


Abstract

Accurate and efficient condition monitoring is key to enhancing the reliability and security of wind turbines. In recent years, intelligent anomaly detection methods based on deep learning networks have received increasing attention. Since accurately labeled data are usually difficult to obtain in real industrial settings, this paper proposes a novel Deep Small-World Neural Network (DSWNN) based on unsupervised learning to detect early failures of wind turbines. During network construction, a regular auto-encoder network with multiple restricted Boltzmann machines is first constructed and pre-trained using unlabeled wind turbine data. The trained network is then transformed into a DSWNN model by a random add-edge method, and the network parameters are fine-tuned using a minimal amount of labeled data. To guard against changes and disturbances in wind speed and to reduce false alarms, an adaptive threshold based on extreme value theory is presented as the criterion for anomaly judgment. The DSWNN model excels at deeply mining data characteristics and at measuring error accurately. Finally, two failure cases of wind turbine anomaly detection are given to demonstrate the validity and accuracy of the proposed methodology in comparison with the deep belief network and the deep neural network.

Keywords:

wind turbine; supervisory control and data acquisition (SCADA) data; fault diagnosis; deep small-world neural network (DSWNN); adaptive threshold

1. Introduction

A changeable and severe environment is the main cause of wind turbine (WT) failures, and the resulting frequent malfunctions inevitably lead to low availability and expensive maintenance [1]. In general, various types of condition monitoring sensors are installed in different WT components, and their multi-dimensional state parameters, such as wind speed, pitch angle, and hydraulic oil temperature, are recorded and saved by the WT supervisory control and data acquisition (SCADA) system [2]. Once an exception occurs, its fault information is reflected in the multi-dimensional sensor parameters of the SCADA system [3]; these parameters are referred to as “SCADA data”. Hence, using SCADA data for early anomaly detection helps to achieve condition assessment and fault warning for wind turbines.

Generally, fault detection and isolation (FDI) approaches have two critical processes: (1) extracting effective features from the complex data, and (2) using prior knowledge or machine learning techniques to classify failures. It is worth noting that both steps need explicit label information (normal or abnormal labels) as an essential element for training the intelligent classification algorithms [4,5]. In actual WT operation, however, it is almost impossible to acquire enough labeled data. This is because the normal operation time is much longer than the fault occurrence time, which makes fault data inevitably sparse. In addition, the monitoring data and fault information of the SCADA system are mostly recorded separately, and the two are correlated only manually after a fault occurs. Such manual recording and classification work is enormous and unrealistic. Therefore, using imperfect labels to complete FDI of wind turbines is fraught with challenges.

Deep learning methods [6] have been widely reported in data mining and intelligent fault diagnosis due to their excellent nonlinear approximation performance [7]. Compared with shallow neural networks, deep learning algorithms [8], such as the multi-layer perceptron (MLP), deep neural network (DNN), long short-term memory (LSTM) network, or convolutional neural network (CNN), can mine valuable information from the data more faithfully and effectively by imitating the human learning process. Jiang and He [9] proposed a multi-scale convolutional neural network to effectively learn high-level fault features and obtain rich diagnostic information at different vibration frequencies. Yang et al. used an LSTM-based recurrent neural network (RNN) to mine the spatial or temporal relationships hidden in SCADA signals, so as to realize the fault classification of WT gears [10]. Faced with the large amount of textual data collected in the SCADA system, recent research has transformed them into image or map form and then used image recognition algorithms to detect and identify anomalies. For instance, Yu et al. converted the vibration signal into time-domain spectra and then applied a CNN to extract the fault features [11]. Du [12] first arranged the multidimensional SCADA data and then slid a window function over them to intercept a digital matrix of equal length and width. Each value in this digital matrix is equivalent to a pixel value in an image. On this basis, a multilayer CNN is used for deep-level anomaly detection. The methods mentioned above provide more possibilities for the realization of WT fault diagnosis. However, implementing these networks requires a supervised learning environment with sufficient labels, which is difficult to achieve in practical applications.

The techniques of using unsupervised learning in FDI are also relatively mature. The corresponding algorithms are mainly deep learning networks [13] constructed with restricted Boltzmann machines (RBMs) or auto-encoders (AEs). The deep belief network (DBN) [14], denoising auto-encoder (DAE) [15], and stacked auto-encoder (SAE) [16] are the most representative unsupervised deep learning networks. In the field of fault diagnosis, the auto-encoder is the most widely used. An auto-encoder first explores the fusion rules of hidden features from the unlabeled input data by encoding, and then interprets the hidden features through decoding to give an output more prominent than the original data. Qin [17] used a DBN to diagnose faults of the planetary gearbox in wind turbines and proposed an improved Logistic-sigmoid unit and a pulse characteristic method to solve the problem that the DBN is prone to vanishing gradients in back propagation. Bai and Qin built a deep stacked auto-encoder network based on a multilayer back propagation (BP) neural network [18]. First, the SAE network neurons are trained layer by layer through unsupervised greedy learning using the bearing data collected from the SCADA system. Then the trained weights are used as the initial weights of the deep BP neural network, and the labeled data are used to continue training the network in a supervised manner. In addition, Zhao proposed a deep auto-encoder network based on multiple RBMs to implement early abnormality detection and fault detection of wind turbine components [19].

However, deep learning networks with different structures are not born equal; one kind of network may be superior to others for a specific task. For example, a convolutional neural network is suitable for image recognition, while a recurrent neural network is adept at speech analysis. Hence the structures of neural networks are improved as the application field changes. At present, the improvement methods for neural networks mainly include deepening the network, dropout and drop-connect, batch normalization, and so on. Among them, adding connection edges between non-adjacent hidden layers (add-edge) is a current research focus for alleviating vanishing gradients. From the viewpoint of network structure, the add-edge improvement turns a traditional neural network into a small-world neural network, whose topology lies between regularity and disorder. The authors of this paper have previously improved a BP neural network into a small-world one and used it to predict wind power [20]. On this basis, a selective ensemble strategy combining multiple small-world neural networks was proposed to diagnose and detect WT pitch failures [21]. Considering the characteristics of wind turbine SCADA data, this paper proposes a deep small-world neural network (DSWNN) for anomaly detection of wind turbines. First, a DSWNN prototype using multiple restricted Boltzmann machines (RBMs) was constructed based on the classical deep auto-encoder network. Unlabeled SCADA data from wind turbines were used to pre-train this DSWNN prototype to extract implied features. Then, the regular neural network was transformed into a small-world one through a random add-edge method, and labeled data were used to train the reconstructed DSWNN model in the supervised case to fine-tune the global parameters of the network. Because of the acute changes and disturbances of wind speed in actual operation, fixed thresholds for judging failures are often unreasonable and can cause false alarms. Therefore, an adaptive threshold determined by extreme value theory was presented as the criterion of anomaly judgment. Finally, the effectiveness of the proposed method was verified by two failure cases of wind turbine pitch systems.

The remainder of this paper is organized as follows. Section 2 describes the proposed DSWNN model and its training method; Section 3 gives the estimation method for the adaptive threshold; Section 4 presents and discusses two failure cases of a wind turbine pitch system. Finally, Section 5 concludes the study. All abbreviations and symbols used in this paper are listed in Table A1 and Table A2 of Appendix A.

2. Deep Small-World Neural Network (DSWNN)

A deep auto-encoder network is a deep learning network with multiple hidden layers between the input and output layers, which can model complex non-linear relationships among multiple types of variables. The parameters of this network are initialized layer by layer through unsupervised learning on the input data, and supervised learning is then used for fine-tuning. In this network framework, deep learning models yield more complex features at higher layers, and the learned complex features are invariant to changes in the input data [22,23]. In this paper, the proposed DSWNN model is an improved deep auto-encoder network that has small-world characteristics. This improvement adds neuron connections between non-adjacent hidden layers on top of the original network structure. In particular, the DSWNN model is composed of multiple stacked RBMs, and Figure 1 gives an example of the DSWNN structure with four hidden layers. The training process of the DSWNN model includes three phases: pre-training, small-world transformation, and fine-tuning, which are applied to obtain the model parameters.

2.1. Pre-Training of the DSWNN Prototype

As mentioned above, the DSWNN prototype is a deep auto-encoder network with multiple RBMs. An RBM is a specific energy-based stochastic model [24] with one visible layer and one hidden layer, in which the visible neurons v = (v₁, v₂,…, v_n) are fully connected to the hidden neurons h = (h₁, h₂,…, h_m). Figure 2 shows the mechanisms of an RBM.

In the RBM, the energy of the joint configuration units E(v, h) is shown as Equation (1), and the joint probability P(v, h) between units based on the energy model is described as Equation (2).

E (v, h) = - \sum_{i = 1}^{n} \sum_{j = 1}^{m} w_{i j} v_{i} h_{j} - \sum_{i = 1}^{n} a_{i} v_{i} - \sum_{j = 1}^{m} b_{j} h_{j}

(1)

P (v, h) = \frac{e^{- E (v, h)}}{\sum \sum e^{- E (v, h)}}

(2)

where w_ij is the symmetric weight between visible neuron i and hidden neuron j, v_i and h_j are their binary states, and a_i and b_j are their biases, respectively. The conditional probabilities used to draw unbiased samples of v_i and h_j, given a hidden vector h and a visible vector v, respectively, can be calculated as follows:

P (v_{i} = 1 |h) = f (\sum_{j = 1}^{m} w_{i j}^{T} h_{j} + a_{i})

(3)

P (h_{j} = 1 |v) = f (\sum_{i = 1}^{n} w_{i j}^{T} v_{i} + b_{j})

(4)

where f(•) is an activation function that can be taken as a logistic Sigmoid function or tanh function, which are defined as Equation (5).

f (x) = \begin{cases} \mathrm{Sigmoid} (x) = \frac{1}{1 + e^{- x}} \\ \tanh (x) = \frac{1 - e^{- 2 x}}{1 + e^{- 2 x}} \end{cases}

(5)
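The two activation functions in Equation (5) can be written down directly. The sketch below is only illustrative; the function names are hypothetical, and the tanh variant is coded exactly in the exponential form of Equation (5) so that it can be compared with the library implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid from Equation (5): 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_from_exp(x: float) -> float:
    """tanh written in the exponential form used in Equation (5)."""
    e = math.exp(-2.0 * x)
    return (1.0 - e) / (1.0 + e)
```

The two functions are related by tanh(x) = 2·Sigmoid(2x) − 1, so they differ only in output range: (0, 1) for the sigmoid and (−1, 1) for tanh.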

The essence of the activation function is to retain the characteristics of the activated neuron and map it out. The derivative of the log-likelihood with respect to weight w_ij is shown in Equation (6).

\frac{\partial \ln P (v)}{\partial w_{i j}} = {〈v_{i} h_{j}〉}_{data} - {〈v_{i} h_{j}〉}_{model}

(6)

where, 〈v_ih_j〉_data represents the expectation with respect to the data distribution, and 〈v_ih_j〉_model represents the expectation with respect to the model distribution.

However, getting an accurate description of 〈v_ih_j〉_model is computationally intractable, so Hinton proposed the Contrastive Divergence (CD) algorithm to crudely approximate the gradient [25]. First, the CD algorithm uses the input data to initialize the visible layer, and then calculates the hidden layer based on the conditional distribution rules. Second, the visible layer is in turn calculated from the hidden layer. Through this repeated calculation, a reconstruction of the input data is obtained, and the weights are updated according to Equation (7).

\Delta w_{i j} = η ({〈v_{i} h_{j}〉}_{data} - {〈v_{i} h_{j}〉}_{recon})

(7)

where, η is the learning rate, 〈v_ih_j〉_recon is the expectation of reconstruction states, which is calculated by Gibbs sampler based on the initialized input data.
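As a minimal sketch of the update in Equation (7), the following code performs one CD step with a single Gibbs sweep (CD-1) for one training vector. Several choices here are assumptions rather than details given in the text: the sigmoid is used as f(·), the reconstruction and the negative phase use probabilities instead of sampled states, and the bias updates follow the usual CD convention. All function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample(probs, rng):
    """Draw binary states from Bernoulli probabilities."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def hidden_probs(v, W, b):
    """P(h_j = 1 | v) for every hidden unit j, as in Equation (4)."""
    return [sigmoid(sum(W[i][j] * v[i] for i in range(len(v))) + b[j])
            for j in range(len(b))]


def visible_probs(h, W, a):
    """P(v_i = 1 | h) for every visible unit i, as in Equation (3)."""
    return [sigmoid(sum(W[i][j] * h[j] for j in range(len(h))) + a[i])
            for i in range(len(a))]


def cd1_step(v0, W, a, b, lr, rng):
    """Update W, a, b in place from one training vector; return the reconstruction."""
    ph0 = hidden_probs(v0, W, b)        # data-driven hidden probabilities
    h0 = sample(ph0, rng)               # sampled hidden states
    pv1 = visible_probs(h0, W, a)       # reconstruction of the visible layer
    ph1 = hidden_probs(pv1, W, b)       # hidden probabilities of the reconstruction
    for i in range(len(a)):
        for j in range(len(b)):
            # Equation (7): <v_i h_j>_data - <v_i h_j>_recon
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
    for i in range(len(a)):
        a[i] += lr * (v0[i] - pv1[i])   # assumed visible-bias update
    for j in range(len(b)):
        b[j] += lr * (ph0[j] - ph1[j])  # assumed hidden-bias update
    return pv1
```

In practice this step would be repeated over all training vectors for several epochs; mini-batching and momentum are omitted to keep the sketch short.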

The DSWNN prototype needs to be pre-trained first, which is regarded as the process of training multiple RBMs layer by layer. The output of each RBM is used as the input of the next, higher-level RBM to pass on the learning results. Once an RBM is trained, the weights between its two layers of neurons are determined and locked. This procedure is conducted in an unsupervised environment, and the weights of the whole network are obtained after the layer-by-layer pre-training. These weights are recorded and used as the prior values for the subsequent supervised training (fine-tuning).

2.2. Small-World Transformation of DSWNN

A small-world network is an intermediate network between a completely random and a completely regular one; it was originally proposed by Watts in 1998 to describe the natural distribution of biological, technological and social networks [26]. Since then, various studies have applied the characteristics of small-world networks to the structural improvement of artificial neural networks (ANNs) [21,22,27]. Summarizing the current research, we consider that the small-world transformation can be performed in two ways (see Figure 3): reconstruct-edge transformation and add-edge transformation.

Taking a four-layer BP neural network with four neurons in each layer as an example, the reconstruct-edge transformation first randomly disconnects connections between adjacent-layer neurons, and then reconstructs new connections between nonadjacent-layer neurons. The add-edge transformation has no disconnecting procedure; it only adds new connections between nonadjacent-layer neurons without changing any original edges. The positions of the newly added connections are randomly distributed among all network neurons, and the degree of network randomization is described by the probability p.

p = n_{a d} / n_{o r}

(8)

where, n_ad and n_or are the number of the newly added connections and original connections, respectively.

Figure 4 gives the random add-edge procedure from a regular network to a random one. When p = 0, the network is completely regular, and when p = 1, it is completely random. When p is between 0 and 1, the network has the small-world property. To probe the intermediate region 0 < p < 1, the characteristic path length L(p) and the clustering coefficient C(p) are used to quantify the small-world structural properties.

The characteristic path length L(p) is a global property that measures the average shortest path length between pairs of neurons in the network, and the clustering coefficient C(p) is a local property that describes the density of connected edges in local areas. Figure 5 shows how the normalized values of L(p) and C(p) change with p for the four-hidden-layer DSWNN described in Figure 1. It can be observed that as p increases, L(p) drops sharply while C(p) descends relatively slowly. When p approaches 0.1, a large C(p) and a small L(p) are obtained, which indicates that the topology of the DSWNN has the best small-world properties. Therefore, the number of newly added connections is set to p = 0.1 times the number of original connections.
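To illustrate how L(p) responds to added edges, the sketch below builds the four-layer, four-neuron example network, adds random edges between non-adjacent layers according to Equation (8), and computes the mean shortest-path length by breadth-first search. It assumes the standard graph-theoretic definition of the characteristic path length and treats connections as undirected; the exact definitions used to produce Figure 5 are not given in the text, and the clustering coefficient is not computed here. All function names are hypothetical.

```python
import random
from collections import deque
from itertools import combinations


def layered_graph(sizes):
    """Fully connect adjacent layers; return (adjacency dict, list of layers)."""
    layers, start = [], 0
    for s in sizes:
        layers.append(list(range(start, start + s)))
        start += s
    adj = {u: set() for layer in layers for u in layer}
    for lower, upper in zip(layers, layers[1:]):
        for u in lower:
            for w in upper:
                adj[u].add(w)
                adj[w].add(u)
    return adj, layers


def add_skip_edges(adj, layers, p, rng):
    """Add round(p * n_or) random edges between non-adjacent layers (Equation (8))."""
    n_or = sum(len(nb) for nb in adj.values()) // 2
    candidates = [(u, w)
                  for a, b in combinations(range(len(layers)), 2) if b - a >= 2
                  for u in layers[a] for w in layers[b]]
    n_ad = min(round(p * n_or), len(candidates))
    for u, w in rng.sample(candidates, n_ad):
        adj[u].add(w)
        adj[w].add(u)
    return n_ad


def char_path_length(adj):
    """Mean shortest-path length over all ordered node pairs (connected graph)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


adj, layers = layered_graph([4, 4, 4, 4])
L_regular = char_path_length(adj)                 # p = 0
n_added = add_skip_edges(adj, layers, 0.1, random.Random(0))
L_small_world = char_path_length(adj)             # p = 0.1
```

Because every added edge joins two neurons that were previously at least two steps apart, the path length after the transformation is strictly smaller than for the regular network.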

This paper selects the add-edge transformation to reconstruct the DSWNN model. Suppose that the DSWNN model has H hidden layers, indexed by i = 1, 2,…, H, with N neurons in the ith hidden layer and M neurons in the (i + 1)th hidden layer. When p = 0, the DSWNN model is a regular network and the weight matrix W_i between the ith and (i + 1)th hidden layers is described as Equation (9). Accordingly, the connection matrix W for the entire hidden layers can be expressed as in Equation (10).

W_{i} = \begin{bmatrix} w_{i 11} & w_{i 12} & \dots & w_{i 1 M} \\ w_{i 21} & w_{i 22} & \dots & w_{i 2 M} \\ \vdots & \vdots & \ddots & \vdots \\ w_{i N 1} & w_{i N 2} & \dots & w_{i N M} \end{bmatrix}

(9)

W = \begin{bmatrix} W_{1} & 0 & \dots & 0 & 0 & \dots & 0 \\ 0 & W_{2} & \dots & 0 & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & W_{i} & 0 & \dots & 0 \\ 0 & 0 & 0 & 0 & W_{i + 1} & \dots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & W_{H - 1} \end{bmatrix}

(10)

When p = 0.1, the DSWNN keeps all connections of the original network and only generates new connections between neurons in non-adjacent hidden layers. The diagonal blocks of matrix W are therefore unchanged. The global connection matrix W for the entire hidden layers can then be represented as Equation (11).

W = \begin{bmatrix} W_{1} & W_{1}^{3} & \dots & W_{1}^{i + 1} & W_{1}^{i + 2} & \dots & W_{1}^{H} \\ 0 & W_{2} & \dots & W_{2}^{i + 1} & W_{2}^{i + 2} & \dots & W_{2}^{H} \\ \vdots & \vdots & \ddots & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & W_{i} & W_{i}^{i + 2} & \dots & W_{i}^{H} \\ 0 & 0 & 0 & 0 & W_{i + 1} & \dots & W_{i + 1}^{H} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & 0 & \dots & W_{H - 1} \end{bmatrix}

(11)

where W_a^b is the weight matrix of the added edges between layer a and layer b, and its structure is shown in Equation (12).

W_{a}^{b} = \begin{bmatrix} w_{a 11}^{b} & w_{a 12}^{b} & \dots & w_{a 1 M}^{b} \\ w_{a 21}^{b} & w_{a 22}^{b} & \dots & w_{a 2 M}^{b} \\ \vdots & \vdots & w_{a x y}^{b} & \vdots \\ w_{a N 1}^{b} & w_{a N 2}^{b} & \dots & w_{a N M}^{b} \end{bmatrix}

(12)

where w_axy^b is the weight of the added edge between the xth neuron in layer a and the yth neuron in layer b. N and M represent the number of neurons contained in layers a and b, respectively.

On the basis of the probability p, W_a^b is a sparse matrix in which only the non-zero weights represent the randomly added edges. Therefore, transforming the DSWNN model into a small-world one is equivalent to randomly selecting positions in the global weight matrix and assigning them non-zero values. The pre-trained weights are not changed in this procedure.
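The construction of the sparse blocks W_a^b can be sketched as follows. The code counts the original connections between adjacent layers, draws round(p · n_or) random positions among all non-adjacent layer pairs, and gives each a small random non-zero weight. The layer indices are 0-based, and the uniform initialization range `scale` is an assumption, since the text only states that the new weights are assigned randomly. The function name is hypothetical.

```python
import random


def add_edge_matrices(sizes, p, rng, scale=0.1):
    """Return ({(a, b): W_a^b}, n_ad) for all layer pairs with b >= a + 2."""
    # Original connections: full connectivity between adjacent layers.
    n_or = sum(n * m for n, m in zip(sizes, sizes[1:]))
    pairs = [(a, b) for a in range(len(sizes)) for b in range(a + 2, len(sizes))]
    # Every possible position (a, b, x, y) for an added edge.
    slots = [(a, b, x, y) for a, b in pairs
             for x in range(sizes[a]) for y in range(sizes[b])]
    n_ad = min(round(p * n_or), len(slots))          # Equation (8)
    skip = {(a, b): [[0.0] * sizes[b] for _ in range(sizes[a])] for a, b in pairs}
    for a, b, x, y in rng.sample(slots, n_ad):
        # "or scale" guards against the unlikely draw of exactly 0.0.
        skip[(a, b)][x][y] = rng.uniform(-scale, scale) or scale
    return skip, n_ad


# Five hidden layers with 30 neurons each, p = 0.1.
skip, n_ad = add_edge_matrices([30] * 5, 0.1, random.Random(1))
```

For this configuration there are 4 × 900 = 3600 original hidden-to-hidden connections, so 360 edges are added. The pre-trained adjacent-layer matrices are never touched, which matches the statement that the pre-trained weights remain unchanged.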

2.3. Fine-Tuning of the DSWNN Parameters

After the layer-wise pre-training and add-edge small-world transformation, the DSWNN model will be fine-tuned by using BP algorithm. BP is a classical method commonly used for supervised learning to improve the representation of data features and optimize the parameters of hidden layers in the fine-tuning. In this process, the initial parameters of the DSWNN model are composed of the pre-training parameters (weights and bias) and the edge-weights obtained by the small-world transformation. Because the fine-tuning only completes the local search based on these superior initial parameters, the convergence time of the optimization is significantly shortened in this process. After fine-tuning, the globally optimized parameters of the DSWNN model are obtained.

In this paper, the tanh function is regarded as the activation function for hidden layers, and the Softmax as the activation function for the top classifier layer. Moreover, the Cross Entropy [28] shown in Equation (13) is chosen as the cost function C to measure error.

C = - \frac{1}{k} \sum_{j = 1}^{k} [t_{j} \log (a_{j}) + (1 - t_{j}) \log (1 - a_{j})]

(13)

where k is the number of neurons in the output layer, and t_j and a_j are the expected output and the actual output of the jth output neuron, respectively.
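A small numerical sketch of the cost in Equation (13) is given below. The clipping constant `eps` is an added safeguard against log(0) and is not part of the formula; the function name is hypothetical.

```python
import math


def cross_entropy(targets, outputs, eps=1e-12):
    """Mean binary cross-entropy over the k output neurons (Equation (13))."""
    k = len(targets)
    total = 0.0
    for t, a in zip(targets, outputs):
        a = min(max(a, eps), 1.0 - eps)   # keep log() finite
        total += t * math.log(a) + (1.0 - t) * math.log(1.0 - a)
    return -total / k


uninformed = cross_entropy([1.0, 0.0], [0.5, 0.5])   # equals ln 2
confident = cross_entropy([1.0, 0.0], [0.9, 0.1])    # smaller cost
```

An output of 0.5 on every neuron gives a cost of ln 2 ≈ 0.693, and the cost falls as the outputs move towards the expected values.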

To summarize, the training process of the DSWNN model includes three steps.

(1): Pre-training: In the unsupervised case, each RBM of DSWNN is trained one by one to mine the feature information of the unlabeled input data.
(2): Small-world transformation: Lock the weight values obtained by pre-training, add new edges according to the probability p, and randomly assign the weight values to the newly added edges.
(3): Fine-tuning: Add a classification layer after the last layer of the DSWNN to receive the output feature vector from the last RBM. Train the entire DSWNN model with labeled data in the supervised case, with the global network weights adjusted by the BP algorithm.
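To make the role of the added edges in step (3) concrete, the following sketch shows one possible forward pass through a network with add-edge connections, using tanh in the hidden layers and Softmax in the classifier layer. How the skip contributions enter each layer is an assumption here (they are simply summed with the regular input before the activation); the text does not spell this out. Layer 0 is the input, `weights[l]` connects layer l to layer l + 1, and `skip[(a, b)]` connects layer a to a non-adjacent layer b. All names are hypothetical.

```python
import math


def layer_input(W, x):
    """Weighted sum x·W for a matrix W with len(x) rows."""
    return [sum(W[i][j] * x[i] for i in range(len(x))) for j in range(len(W[0]))]


def softmax(z):
    m = max(z)                               # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(x, weights, biases, skip):
    """Return the class probabilities of the top classifier layer."""
    acts = [x]
    last = len(weights) - 1
    for l, W in enumerate(weights):
        z = layer_input(W, acts[l])
        for (a, b), W_ab in skip.items():
            if b == l + 1:                   # added edges arriving at this layer
                z = [zi + si for zi, si in zip(z, layer_input(W_ab, acts[a]))]
        z = [zi + bi for zi, bi in zip(z, biases[l])]
        acts.append(softmax(z) if l == last else [math.tanh(v) for v in z])
    return acts[-1]


identity = [[1.0, 0.0], [0.0, 1.0]]
weights = [identity, identity, identity]
biases = [[0.0, 0.0]] * 3
x = [1.0, -1.0]
base = forward(x, weights, biases, {})
with_skip = forward(x, weights, biases, {(0, 2): [[2.0, 0.0], [0.0, 0.0]]})
```

During fine-tuning, BP would compute gradients for both the pre-trained matrices and the non-zero entries of the added-edge matrices; that step is not shown here.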

Figure 6 gives the three training processes of the DSWNN model. The process of RBM training can be regarded as the initialization of the weight parameters for the DSWNN model, which overcomes the shortcomings of the network easily falling into local optimum and long training time due to the random initialization of the weight parameters. For the detection of WT abnormality, a large number of unlabeled SCADA data can be used for pre-training, and a small number of labeled data can be used for fine-tuning. In this way, the trained DSWNN is more optimized than the deep auto-encoder network or BP network alone.

3. Anomaly Detection Based on Adaptive Threshold Estimating Method

In anomaly detection, the trained DSWNN model is used to predict the future values of the SCADA signals, and the prediction error (PE) between predicted value and actual value is used to judge the abnormalities. The PE is defined as Equation (14).

P E = \frac{\hat{X} - X}{X} \times 100 \%

(14)

where X is the actual value of the SCADA data, and \hat{X} is the output of the DSWNN model.

Generally, setting a threshold and comparing it with the PE is the most common and effective way to evaluate whether the wind turbine is in failure. When a wind turbine operates normally, the SCADA signals are all within the threshold range. Once an abnormality occurs, the implicit relationship between these monitoring signals will be broken, and one or more signal values will suddenly exceed the threshold to give an alarm. The rule for determining the abnormal condition is defined as Equation (15):

D = \begin{cases} 1 \ (\text{alarm}), & P E \geq R_{t h} \\ 0 \ (\text{normal}), & P E < R_{t h} \end{cases}

(15)

where R_th is the threshold.
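Equations (14) and (15) amount to a few lines of arithmetic, sketched below. The sample readings and the threshold value are invented for illustration only, and the function names are hypothetical. Note that Equation (14) is undefined when the actual value X is zero, which a practical implementation would need to handle.

```python
def prediction_error(predicted, actual):
    """Relative prediction error in percent (Equation (14)); actual must be non-zero."""
    return (predicted - actual) / actual * 100.0


def alarm(pe, threshold):
    """Decision rule of Equation (15): 1 = alarm, 0 = normal."""
    return 1 if pe >= threshold else 0


# Hypothetical (predicted, actual) pairs and a hypothetical threshold of 4 %.
readings = [(10.1, 10.0), (10.5, 10.0), (9.8, 10.0)]
flags = [alarm(prediction_error(p, x), 4.0) for p, x in readings]
```

With these illustrative numbers only the second reading, whose error is about 5 %, raises an alarm.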

However, thresholds are often set over a wide range and remain unchanged once set. On the one hand, faults that stay within the threshold will not be diagnosed; on the other hand, occasional fluctuations caused by the random wind speed will be misdiagnosed as faults. Therefore, this paper presents an adaptive threshold estimation method based on extreme value theory to monitor the trend of the PE and detect its anomalous variation.

Suppose X₁, X₂,…, X_n are n sample vectors of independent and identically distributed random variables whose distribution function is F(x). Each sample vector X_i contains a certain number of values over a period of time. M_n = max(X₁, X₂,…, X_n) represents the maximum of the n sample vectors. For a set of M_n, the probability distribution function can be described as Equation (16).

P_{r} (M_{n} \leq x) = P_{r} (X_{1} \leq x, X_{2} \leq x, \dots, X_{n} \leq x) = F^{n} (x), x \in R .

(16)
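Equation (16) can be checked numerically with a short Monte Carlo experiment. The sketch below uses Uniform(0, 1) samples, for which F(x) = x, purely as an illustrative assumption; the actual distribution of the prediction errors is unknown. The function name is hypothetical.

```python
import random


def empirical_max_cdf(n, x, trials, rng):
    """Fraction of trials in which the maximum of n Uniform(0,1) draws is <= x."""
    hits = sum(max(rng.random() for _ in range(n)) <= x for _ in range(trials))
    return hits / trials


rng = random.Random(42)
n, x = 5, 0.8
estimate = empirical_max_cdf(n, x, 20000, rng)
exact = x ** n   # F(x) = x for Uniform(0,1), so F^n(x) = x^n
```

The simulated frequency should lie close to the exact value 0.8⁵ = 0.32768, which also shows how quickly Fⁿ(x) shrinks as n grows.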

In general, Fⁿ(x) is unknown, so we need to replace Fⁿ(x) with the extreme distribution function of the maximum or minimum values. When n → ∞, Fⁿ(x) → 0, so the extreme distribution function should be normalized to avoid the degeneration of M_n to a point. Assume that there are two normalization parameters a_n and b_n that satisfy the non-degenerate distribution function H(x) [29].

\lim_{n \to \infty} P_{r} \{\frac{M_{n} - b_{n}}{a_{n}} \leq x\} \to H (x), x \in R

(17)

H (x) = \exp \left(- {\left[1 + β \left(\frac{x - b_{n}}{a_{n}}\right)\right]}^{- \frac{1}{β}}\right), \quad a_{n} > 0, \ - \infty < b_{n} < \infty, \ - \infty < β < \infty

(18)

where, a_n and b_n are the scale parameter and the location parameter, respectively, β is the shape parameter.

A large number of normal SCADA data are used to train the DSWNN model and are also used to calculate the PE values. As the data are mostly normal, the mean value of the PE will be stable, but its variance is expected to be non-stationary. Therefore, the scale parameter a_n and the location parameter b_n can be obtained as follows.

a_{n} (t) = \exp (δ_{0} + δ_{t} g (t))

(19)

b_{n} (t) = δ_{0}

(20)

where δ₀ and δ_t are the constant coefficients, g(t) is a function describing the variable operating condition, which is affected by the changing SCADA data. Then the final adaptive thresholds for determination of warning can be calculated by:

R_{t h} = b_{n} - \frac{a_{n}}{β} [1 - {\{- \ln (1 - p)\}}^{- β}]

(21)

where p is the confidence limit, which is calculated by the cumulative distribution function (CDF) method. The parameters a_n, b_n and β can be estimated by the maximum likelihood estimation approach.
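The threshold calculation of Equations (19) to (21) can be sketched as follows. All numerical values, and the choice g(t) = t, are placeholders for illustration; in the paper the parameters come from maximum likelihood estimation. The check at the end assumes the standard generalized extreme value CDF, in which the shape parameter β multiplies the standardized variable; under that form the threshold of Equation (21) is the point where H(R_th) = 1 − p. All function names are hypothetical.

```python
import math


def scale_param(t, delta0, delta_t, g):
    """Time-varying scale a_n(t) = exp(delta0 + delta_t * g(t)) (Equation (19))."""
    return math.exp(delta0 + delta_t * g(t))


def adaptive_threshold(a_n, b_n, beta, p):
    """Equation (21); requires beta != 0 and 0 < p < 1."""
    return b_n - (a_n / beta) * (1.0 - (-math.log(1.0 - p)) ** (-beta))


def gev_cdf(x, a_n, b_n, beta):
    """Standard GEV distribution function (assumed form, beta != 0)."""
    s = 1.0 + beta * (x - b_n) / a_n
    if s <= 0.0:
        return 0.0 if beta > 0 else 1.0   # x lies outside the support
    return math.exp(-s ** (-1.0 / beta))


# Placeholder parameters: delta0 = 0.1, delta_t = 0.05, g(t) = t, beta = 0.2, p = 0.01.
a_t = scale_param(2.0, 0.1, 0.05, lambda t: t)
b_n = 0.1
r_th = adaptive_threshold(a_t, b_n, 0.2, 0.01)
```

Because a_n(t) follows the operating condition through g(t), the threshold widens when the prediction error becomes more variable and tightens when it is steady.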

4. Case Studies

The DSWNN model proposed in this paper is a novel complex neural network with small-world characteristics, which was trained using an unsupervised learning technique. In order to test the performance of the proposed DSWNN model, the following case analyses related to fault identification, prediction and classification are given in this section. The experimental data were one year’s SCADA data, which were collected from the SCADA system of thirty 2-MW wind turbines in a wind farm. Additionally, to increase the contrast, the classical deep belief network (DBN) model and deep neural network (DNN) model were used as the comparison methods.

Pitch failures are mainly categorized into pitch sensor faults and pitch actuator faults. Pitch sensor faults arise from dust on the encoder disc, misadjustment of the blade pitch bearing, temperature beyond the acceptable range, humidity, or improper calibration. These causes can result in unbalanced rotation of the rotor due to sensor bias and to outputs fixed at the last measurements. Faults of the blade pitch sensor and actuator appear frequently; they cause structural loading of the turbine due to rotor imbalance and affect the stability of the floating platform. These failures are mainly reflected in the signals of pitch angle, pitch torque, pitch motor, and so on. The specific monitored SCADA parameters are used to train the DSWNN, DBN and DNN models, and the parameter information is listed in Table 1. In the following case analyses, the DSWNN, DBN, and DNN all used the same network structure and the same training data.

4.1. Prediction Analysis for the Pitch Abnormalities (Case A)

The three blades of the wind turbine have three groups of pitch driving devices, which are independent and synchronous. When the wind speed changes suddenly and the blades need to change the pitch angles frequently, the failure of synchronous action of the three pitch drives often leads to multiple failures occurring at the same time. Table 2 shows a record of multiple alarms that occurred on 1 January 2017. It is confirmed that the main reason was that the pitch action of the three blades was out of sync after the last maintenance. This kind of failure is caused by typical mechanical wear, and its fault characteristics will be hidden in SCADA data. Therefore, the relevant signals monitored can be expressed as Equation (22). We applied the dataset X_A to train the DSWNN, DBN and DNN models, then calculated the prediction errors by using the recorded failure SCADA data, as shown in Figure 7.

X_{A} = [v_{0}, θ_{1}, θ_{2}, θ_{3}, T_{1}, T_{2}, T_{3}, Ω, P] .

(22)

It can be seen from Figure 7 that the prediction error obtained by the DSWNN fluctuated within the adaptive threshold range before time T3. After T3, the error approached its upper limit and then gradually exceeded it. According to the alarm record, the first fault, named “Pitch angle 1 out of sync”, occurred at 07:58:12. This means that the DSWNN can detect an abnormality approximately 3 h ahead of the actual downtime, which provides sufficient time to take maintenance actions for the pitch system. In addition, Figure 7a,b shows that the DNN and DBN models can also predict failures 1.1 h and 2.3 h in advance, respectively. The proposed DSWNN can thus detect incipient faults earlier than the DBN and DNN. Moreover, the error calculated by the DSWNN was the smallest among the three models, which shows that the DSWNN model can extract more sufficient dynamic features from normal SCADA data. Hence, on the one hand, the DSWNN model describes the dynamic behavior of the pitch system more accurately. On the other hand, the adaptive threshold can effectively track the prediction error, which increases the adaptability of the prediction models and provides a more accurate judgment basis for reducing false alarms and omissions.

4.2. Performance Comparison of Pitch Fault Classification (Case B)

To verify the accuracy of the proposed method for multi-fault classification, nine typical or frequent pitch faults were selected as the classification targets according to the real alarm information from the SCADA system. Table 3 lists the specific failures and their descriptions, in which F1–F9 represent the nine fault alarms and F10 stands for the fault-free status. Similarly, DBN and DNN were used as comparison models, and the three models adopted the same network structure: (i) The number of the input neurons was 12, which corresponded to the 12-dimensional parameters described in Table 1; (ii) The number of the output neurons was 10, corresponding to the 10 classifications in Table 3 respectively; (iii) Five hidden layers and 30 neurons in each layer were selected for all three models. Specifically, the DSWNN model had five RBMs in the pre-training process, and the probability in the small-world transformation process was set as p = 0.1. Compared with the DSWNN, the DBN model had no process of small-world transformation, and it only included two processes: pre-training and fine-tuning. The DNN model was a standard multilayer feed-forward neural network, and its training process followed the error back-propagation principle.

In terms of data preparation, the experimental data were divided into 20,000 fragments of training data and 6000 fragments of validation data. Moreover, each failure category contained a certain amount of fault data, with the data distribution shown in Table 4. It is worth noting that all the data fragments provided were labeled according to the corresponding fault types. During model training, the DSWNN and DBN were first pre-trained on the data fragments with the labels removed, and then fine-tuned with the labeled data fragments. By contrast, the DNN model used all labeled data for training and validation, both conducted in a supervised environment.

Figure 8 gives the classification accuracy of the three models. Judging from the misclassifications shown in the confusion matrices, the DNN model easily misjudged the faults F3, F5, F6, and F8, which is critical because these false negatives could cause serious consequences. F3 and F5 were electrical failures, whose occurrence is strongly random and contingent. F6 and F8 were related to the pitch angle, which is supposed to be monitored through wind speed, wind power, pitch encoder, blade root torque, etc. These failures arise because the random wind speed exerts a strong impact on the blades. Diagnosing the above faults essentially requires that the classification algorithm be able to mine implied features from multiple operating data. However, this was exactly what the DNN model lacked, because its network parameters were generated by random initialization without any theoretical basis, and the parameters directly affect the classification results. In contrast, the DSWNN and DBN models used pre-training to obtain better network parameters, and their classification accuracy for all failures was better than that of the DNN model (see Figure 8b,c). However, their accuracy in diagnosing faults F4 and F5 was also lower than for the other faults, which may have been caused by the shortage of F4 and F5 samples in the training data (see Table 4).
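For readers who want to reproduce this kind of reading from a confusion matrix, the sketch below computes the per-class accuracy (recall) and the number of fault samples predicted as fault-free. The 3-class matrix is hypothetical and is not taken from Figure 8; the function names are illustrative.

```python
def per_class_recall(cm):
    """Recall per true class; rows are true labels, columns are predictions."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


def missed_faults(cm, fault_free_index):
    """For each fault class, count the samples predicted as fault-free."""
    return [
        row[fault_free_index]
        for i, row in enumerate(cm)
        if i != fault_free_index
    ]


# HYPOTHETICAL counts for two fault classes and one fault-free class (last).
cm = [
    [8, 1, 1],   # true fault A
    [2, 6, 2],   # true fault B
    [0, 0, 10],  # true fault-free
]

recall = per_class_recall(cm)
missed = missed_faults(cm, fault_free_index=2)

print("per-class recall:", recall)
print("faults predicted as fault-free:", missed)
```

In this toy matrix, fault B has the lowest recall and the most samples predicted as fault-free, which is the pattern the text describes for the DNN on F3, F5, F6, and F8.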

In addition, we recorded the global training errors of the three models during the training process, as shown in Figure 9. It can be seen from the figure that the convergence times of DSWNN, DBN, and DNN were 255 s, 210 s, and 168 s, respectively. DSWNN took the longest because it had the small-world transformation process between pre-training and fine-tuning. In the 0–110 s interval, the training errors of DSWNN and DBN almost stabilized, because the networks were in the pre-training stage and their multiple RBMs had reached energy equilibrium. In the range of 110–130 s, DSWNN and DBN switched from unsupervised training to supervised training, and the training error dropped sharply until it converged to a new, lower value. Note that at 124 s, the training error of DSWNN increased briefly. This was due to the additional weights of the edges added to the network structure by the small-world transformation: the DSWNN had to retrain these randomly added edges, which led to the short-term error increase.

As seen from the above case studies, the advantages of the DSWNN model appeared mainly in three aspects: (1) the learning ability of the DSWNN model was better than that of the DBN method, and much better than that of the traditional DNN method; (2) the DSWNN model had very good sensitivity and accuracy in reflecting the condition changes of the wind turbine pitch system; (3) although the DSWNN model was not dominant in time cost, it had a stronger ability to mine deeper feature information from the same data source.

5. Conclusions

This paper presents a novel DSWNN model based on unsupervised learning for early anomaly detection and fault detection of wind turbines. The DSWNN model is a combination of a deep auto-encoder network and a small-world neural network, which is more accurate in simulating the dynamic behavior of wind turbines because it more closely mimics the working process of a natural brain. Analysis results of an actual case study confirm the following conclusions:

(1): The case study shows that the adaptive threshold can effectively track the prediction errors and reduce false alarms. Therefore, the proposed adaptive threshold method based on extreme value theory can be used in real-time monitoring of wind turbines to reduce the impact of wind speed fluctuations and external interference on anomaly detection.
(2): Compared with the DBN and DNN algorithms, the proposed DSWNN model has better performance in error prediction, fault classification, and learning ability, which benefits from its unique small-world characteristics and unsupervised pre-training. Although it takes slightly longer to train, it still has broad application prospects and research value.
(3): In addition, the strategy combining the DSWNN model with the adaptive threshold was shown in the case study to predict pitch system failures 3 h in advance, and it can be used for subsequent anomaly detection and fault diagnosis of wind turbines.

Author Contributions

S.W. planned and supervised the whole project; M.L. developed the DSWNN algorithm, designed the criterion and performed the simulation and experiments; M.L. and S.W. contributed to discussing the results and writing the manuscript. S.F. and J.Z. contributed to the application of adaptive threshold theory and English proofreading. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 50776005 and 51577008.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Acronyms list.

Acronyms	Full Name	Acronyms	Full Name
WT	Wind turbine	BP	back propagation
DSWNN	deep small-world neural network	MLP	multi-layer perceptron
SCADA	supervisory control and data acquisition	LSTM	long-short-term memory
FDI	fault detection and isolation	RNN	recurrent neural network
RBM	restricted Boltzmann machine	DAE	denoise auto-encoder
DBN	deep belief network	SAE	stacked auto-encoder
DNN	deep neural network	CD	contrastive divergence
AE	auto-encoder	ANN	artificial neural network

Table A2. Symbols list.

Symbols	Meaning	Symbols	Meaning
v	the visible neurons in RBM	W	weight matrix
h	the hidden neurons in RBM	W_a	global weight matrix
E(v, h)	the energy of joint configuration units in RBM	$W_{a}^{b}$	added weight matrix between a and b
P(v, h)	the joint probability	a, b	the ath and the bth hidden layer
w_ij	the symmetric weight	$w_{a x y}^{b}$	the weight of the added edge
v_i, h_j	the binary states	M	neuron numbers in the (i + 1)th hidden layer
a_i, b_j	the biases of the binary states	C	the Cross Entropy
Sigmoid	Sigmoid function	k	neuron number of the output layer
tanh	tanh function	t	the expected output
〈v_ih_j〉_data	expectation of data distribution	PE	prediction error
〈v_ih_j〉_model	expectation of model distribution	X	the actual value of SCADA data
〈v_ih_j〉_recon	expectation of reconstruction states	$\hat{X}$	the output of the DSWNN model
η	learning rate	R_th	threshold
p	probability	X_i	sample vectors
n_ad	the number of the newly added connections	F(x)	distribution function
n_or	the number of the original connections	M_n	the maximum sample vectors
L(p)	characteristic path length	H(x)	non-degenerate distribution function
C(p)	clustering coefficient	a_n, b_n	normalization parameters
H = 1, 2, 3,…, i	the number of hidden layers in DSWNN	β	shape parameter
i	the ith hidden layer	δ₀, δ_t	constant coefficients
N	neuron numbers in the ith hidden layer	g(t)	Conditional function

Figure 1. Structure of the deep small-world neural network (DSWNN) model.

Figure 2. Structure of a Restricted Boltzmann Machine (RBM).

Figure 3. The ways of small-world transformation.

Figure 4. The random add-edge procedure from a regular back propagation (BP) neural network to a random one.

Figure 5. L(p) and C(p) for the four-hidden-layer DSWNN described in Figure 1.

Figure 6. Three processes for training the proposed DSWNN model.

Figure 7. Fault prediction errors (a) by deep neural network (DNN); (b) by denoise auto-encoder (DAE); (c) by DSWNN.

Figure 8. Comparison of three neural networks in classification accuracy of 10 types of faults.

Figure 9. Training errors of the DSWNN, DBN, and DNN.

Table 1. Description of wind turbine pitch system signals.

No.	Signals	Notation	No.	Signals	Notation
1	Wind speed	v₀	7	Pitch moment 3	M₃
2	Pitch angle 1	θ₁	8	Pitch motor temperature 1	T₁
3	Pitch angle 2	θ₂	9	Pitch motor temperature 2	T₂
4	Pitch angle 3	θ₃	10	Pitch motor temperature 3	T₃
5	Pitch moment 1	M₁	11	Generator speed	Ω
6	Pitch moment 2	M₂	12	Power	P

Table 2. Fault alarm records for the pitch system of the wind turbine.

Device Name	Wind Speed (m/s)	Abnormal Message	Set Time	Reset Time
FKD_F088	8.42	Pitch angle 1 out of sync	1 January 2017 07:58:12	1 January 2017 08:37:04
FKD_F088	5.05	Pitch 1 drive error	1 January 2017 07:58:35	1 January 2017 08:37:04
FKD_F088	5.04	Pitch 2 drive error	1 January 2017 07:58:35	1 January 2017 08:37:04
FKD_F088	6.76	Storage relay not reset	1 January 2017 08:00:04	1 January 2017 08:37:04

Table 3. Fault list.

No.	Fault Names	No.	Fault Names
F1	Pitch drive error	F6	Brake position error
F2	Hub drive error	F7	Over speed during braking
F3	Storage relay not reset	F8	Pitch position error delay
F4	Yaw brake fuse	F9	Pitch position out of sync
F5	Power interruption	F10	Fault free

Table 4. Distribution of the experimental data across fault categories.

Data Fragment	F1	F2	F3	F4	F5	F6	F7	F8	F9	F10	Total
Training data	233	238	224	84	74	245	232	229	238	18,203	20,000
Validation data	79	73	58	28	41	91	67	72	68	5423	6000

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
