A Multiscale Spatio-Temporal Convolutional Deep Belief Network for Sensor Fault Detection of Wind Turbine

Sensor fault detection of wind turbines plays an important role in improving the reliability and stable operation of turbines. The supervisory control and data acquisition (SCADA) system of a wind turbine provides promising insights into sensor fault detection due to the accessibility of the data and the abundance of sensor information. However, SCADA data are essentially multivariate time series with inherent spatio-temporal correlation characteristics, which have not been well considered in existing wind turbine fault detection research. This paper proposes a novel classification-based fault detection method for wind turbine sensors. To better capture the spatio-temporal characteristics hidden in SCADA data, a multiscale spatio-temporal convolutional deep belief network (MSTCDBN) was developed to perform feature learning and classification for sensor fault detection. A major advantage of the proposed method is that it can not only learn the spatial correlation information between different variables but also capture the temporal characteristics of each variable. Furthermore, its multiscale learning capability allows it to extract interactive characteristics between variables at different filter scales. A generic wind turbine benchmark model was used to evaluate the proposed approach. The comparative results demonstrate that the proposed method can significantly enhance the fault detection performance.


Introduction
Recently, wind energy as an inexhaustible and fast-growing clean renewable energy source has received considerable attention. As critical equipment for wind power generation, wind turbines have been widely distributed around the world. In practice, these turbines are usually situated in far-flung regions and always suffer from harsh operating environments, which can easily cause various failures and even shutdowns in severe cases [1]. Specifically, sensors, such as pitch angle, power, and speed sensors, which are widely equipped in wind turbines for monitoring and controlling the operation of the entire turbine, are extremely prone to various faults. Statistically, sensor failures account for approximately 15% of the total wind turbine failures [2]. Furthermore, sensor failures can cause signal corruption for condition monitoring and fault diagnosis, which in turn affect the health status of other key subassemblies, thereby reducing the reliability of the turbine and increasing economic losses [3,4]. As a result, it is particularly important and challenging to research effective and valuable fault detection approaches for wind turbine sensors.
Up to now, numerous fault detection techniques for wind turbine sensors have been proposed and discussed. On the one hand, physical-model based approaches have been proved to be effective

1.
A novel MSTCDBN method is proposed to overcome the limitations of the traditional CDBN, which lacks the ability to capture the spatio-temporal correlations inherent in multivariate time series and cannot realize multiscale feature learning. In other words, the proposed MSTCDBN combines spatio-temporal dependence extraction with multiscale feature characterization. To the best of our knowledge, this is the first time CDBN has been applied to the analysis and processing of multivariate time series.

2.
Specifically, the spatio-temporal dependences hidden in the multivariate time series are considered by designing different forms of convolution kernels in a cascade way. Furthermore, the interactive and complementary representations between sensor variables are extracted at multiple different scales of filters in a parallel fashion. The proposed MSTCDBN with the multiscale spatio-temporal feature learning ability enables us to enhance the classification performance greatly.

3.
A generic wind turbine benchmark model is utilized to evaluate the effectiveness of the proposed method in the fault detection of wind turbine sensors, and comparative studies are performed.
The remainder of this paper is organized as follows. Section 2 briefly describes the theory of the standard CDBN and introduces the proposed MSTCDBN approach for fault detection of wind turbine sensors in detail. Section 3 describes the experiment and the acquisition and preprocessing of the multivariate time series signals. Section 4 gives the comparative detection results to evaluate the effectiveness of the proposed method. Conclusions are provided in Section 5.

Standard CDBN Methodology
The standard CDBN is a hierarchical probabilistic generative model constituted by several stacked convolutional restricted Boltzmann machines (CRBMs). This model takes the two-dimensional input structure into account and has the advantages of weight sharing and unsupervised feature learning. Generally, each CRBM contains one visible layer (the input layer, typically binary or Gaussian) and one hidden layer. Owing to the convolution operation, the connection weights between these two layers are shared, and the most significant features of each local area can be extracted. In addition, in order to improve computational efficiency and retain the most useful information, a probabilistic max-pooling layer is usually added after the hidden layer. Figure 1 displays the typical structure of the CRBM model, and for simplicity, only the kth hidden group and the pooling layer are shown.
Assume that the input layer of the CRBM consists of an $N_V \times N_V$ matrix. The hidden layer is composed of $K$ groups, each of which is an $N_H \times N_H$ matrix, so the CRBM contains $N_H^2 K$ hidden units. Each hidden group is associated with an $N_W \times N_W$ filter, where $N_W = N_V - N_H + 1$. In particular, in order to deal with the real-valued input variables of the SCADA system, Gaussian visible units should be adopted. The energy function of the Gaussian CRBM can be expressed as

$$E(v, h) = \frac{1}{2} \sum_{i,j=1}^{N_V} v_{i,j}^2 - \sum_{k=1}^{K} \sum_{i,j=1}^{N_H} \sum_{r,s=1}^{N_W} h^k_{i,j} W^k_{r,s} v_{i+r-1,\,j+s-1} - \sum_{k=1}^{K} b_k \sum_{i,j=1}^{N_H} h^k_{i,j} - c \sum_{i,j=1}^{N_V} v_{i,j}$$

where $v$ and $h$ denote the visible units and hidden units, respectively. $v_{i,j}$ is the element in the $i$th row and the $j$th column of the matrix $v$, $h^k_{i,j}$ is the element in the $i$th row and the $j$th column of the $k$th hidden group, and $W^k_{r,s}$ is the element in the $r$th row and $s$th column of the $k$th filter. $c$ represents the shared bias of all visible units and $b_k$ is the bias of each hidden group.
Then, the conditional distributions of the Gaussian CRBM are calculated according to block Gibbs sampling [27], which can be described as follows

$$P(h^k_{i,j} = 1 \mid v) = \sigma\big((\tilde{W}^k * v)_{i,j} + b_k\big)$$

$$P(v_{i,j} \mid h) = \mathcal{N}\Big(\big(\textstyle\sum_{k=1}^{K} W^k * h^k\big)_{i,j} + c,\ 1\Big)$$

where $\sigma(x) = 1/(1 + \exp(-x))$ represents the logistic sigmoid function, $\mathcal{N}(\mu, \sigma^2)$ is the Gaussian distribution with mean $\mu$ and variance $\sigma^2$, $*$ refers to the convolution operation, and $\tilde{W}^k_{i,j} = W^k_{N_W-i+1,\,N_W-j+1}$ is the filter flipped in both dimensions.

After obtaining the features of training samples through the convolution operation, the probabilistic max-pooling layer is usually introduced to further reduce the computational complexity and retain the most useful feature information. The pooling layer also has $K$ groups, each of which is an $N_P \times N_P$ matrix. For each $k \in \{1, \dots, K\}$, the hidden group $H^k$ is partitioned into multiple blocks of size $C \times C$, where $C$ generally refers to a small integer like one, two, or three. Each block is associated with one unit in the pooling layer, so that $N_P = N_H / C$. Finally, a contrastive divergence method [31] is used to obtain the optimal model parameters $\{W^k, c, b_k\}$ of the Gaussian CRBM. As a result, the learning process of the hierarchical generative algorithm CDBN can be accomplished by successively training several individual CRBMs.
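As an illustration of the conditional distributions described above, the following NumPy sketch computes the hidden activation probabilities and the mean of the Gaussian visible units for a single CRBM, and draws one block Gibbs sample. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`hidden_probs`, `visible_mean`), the array layout, the toy sizes, and the unit-variance visible units are choices made here for illustration, and pooling and contrastive divergence are omitted.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


def hidden_probs(v, W, b):
    """P(h^k_ij = 1 | v) for every hidden group k.

    v: (N_V, N_V) visible matrix, W: (K, N_W, N_W) filters, b: (K,) biases.
    Convolving v with the flipped filter equals a 'valid' cross-correlation
    with the original filter, which is what the loops below compute.
    """
    K, n_w, _ = W.shape
    n_h = v.shape[0] - n_w + 1  # N_H = N_V - N_W + 1
    act = np.empty((K, n_h, n_h))
    for k in range(K):
        for i in range(n_h):
            for j in range(n_h):
                act[k, i, j] = np.sum(W[k] * v[i:i + n_w, j:j + n_w]) + b[k]
    return sigmoid(act)


def visible_mean(h, W, c):
    """Mean of the Gaussian P(v | h): 'full' convolution of h^k with W^k, plus c."""
    K, n_h, _ = h.shape
    n_w = W.shape[1]
    mean = np.full((n_h + n_w - 1, n_h + n_w - 1), float(c))
    for k in range(K):
        for i in range(n_h):
            for j in range(n_h):
                mean[i:i + n_w, j:j + n_w] += h[k, i, j] * W[k]
    return mean


rng = np.random.default_rng(0)
v = rng.normal(size=(6, 6))            # hypothetical 6 x 6 input
W = 0.1 * rng.normal(size=(2, 3, 3))   # K = 2 filters of size 3 x 3
b = np.zeros(2)

p_h = hidden_probs(v, W, b)                          # shape (2, 4, 4)
h = (rng.random(p_h.shape) < p_h).astype(float)      # sample the hidden units
v_mean = visible_mean(h, W, c=0.0)                   # shape (6, 6)
v_sample = v_mean + rng.normal(size=v_mean.shape)    # unit-variance Gaussian sample
```

The explicit loops are kept for readability; a vectorized convolution routine would normally replace them for inputs of realistic size.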

MSTCDBN Architecture
The main idea of the new MSTCDBN method is to incorporate spatio-temporal characteristic representation and multiscale feature learning into the conventional CDBN structure to enhance fault detection performance. It is worth noting that the fault detection method for wind turbine sensors presented in this paper is a classification-based supervised detection method, and the task is treated as a binary classification problem. To be specific, the different sensor failure scenarios are uniformly defined as the fault condition. The overall schematic is given in Figure 2, and the general implementation process is as follows.

1.
Collect SCADA datasets with various health conditions of wind turbine sensors. For each health condition, data is preprocessed and further divided into several two-dimensional fragments to acquire training and testing sets separately.

2.
Multiple CDBN models with different structures are integrated for multiscale spatio-temporal feature learning. In particular, this learning process is realized in a typical unsupervised manner. Then, the obtained multiscale spatio-temporal characteristics are input to a classifier to detect the condition of wind turbine sensors.

3.
Testing sets are fed into the well-learned MSTCDBN-based model to perform multiscale spatio-temporal feature extraction and produce detection results.
Inspired by the Inception structure [32,33], one advantage of this approach is that it extracts useful interactive signatures at multiple filter scales in parallel. In addition, owing to the inherent spatio-temporal characteristics of multivariate time series, feature extraction along different dimensions can make the model more interpretable and improve its performance [34,35]. In view of this, another property of the approach is that it mines spatial and temporal correlation information in a cascade way by designing different forms of convolution kernels. In general, the method consists of three consecutive stages: multiscale spatial feature learning, multiscale temporal feature learning, and classification.

Multiscale Spatial Feature Learning
Consider an input sensor measurement matrix $X \in \mathbb{R}^{S \times T}$, where $S$ denotes the number of sensor variables and $T$ the number of sampling points. For this multivariate time series matrix, three filter scales, two by one, three by one, and five by one, are used in parallel to extract the interactive characteristics among multiple variables. Note that each filter is designed to slide only along the variable axis in this stage. In other words, three different CDBN modules are initially executed to process the input data, thereby extracting abstract multiscale spatial correlation information.
Each CDBN module includes two hidden layers followed by a pooling layer, and as mentioned above, the convolution and pooling operations are performed only in the spatial dimension. Once this process is completed, the local spatial feature maps yielded by these CDBN modules are concatenated along the variable axis, which keeps the temporal dimension unchanged for the subsequent multiscale temporal feature extraction.
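To make the shape bookkeeping of this stage concrete, the sketch below filters a 15 by 100 input along the variable axis at the three scales and concatenates the results along that axis. It is a simplified illustration under stated assumptions: each branch is reduced to a single linear filter with random weights, whereas each branch in the proposed method is a CDBN module with two hidden layers and a pooling layer, and the helper name `conv_along_variables` is hypothetical.

```python
import numpy as np


def conv_along_variables(x, kernel):
    """'Valid' filtering with an (m x 1) kernel that slides only along axis 0."""
    m = kernel.shape[0]
    s, t = x.shape
    out = np.zeros((s - m + 1, t))
    for i in range(s - m + 1):
        # Weighted sum of m neighbouring sensor rows; the time axis is untouched.
        out[i] = kernel @ x[i:i + m]
    return out


rng = np.random.default_rng(0)
x = rng.random((15, 100))                        # S = 15 sensors, T = 100 time steps
scales = (2, 3, 5)                               # filter sizes 2x1, 3x1, 5x1
kernels = [rng.normal(size=m) for m in scales]   # random stand-ins for learned filters

branches = [conv_along_variables(x, k) for k in kernels]
spatial_maps = np.concatenate(branches, axis=0)  # stack along the variable axis
print([b.shape for b in branches], spatial_maps.shape)
# [(14, 100), (13, 100), (11, 100)] (38, 100)
```

Because every branch leaves the time axis at 100 points, the three outputs can be stacked along the variable axis without padding or cropping.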

Multiscale Temporal Feature Learning
This phase is designed to extract valuable temporal characteristics from the learned spatial maps at different filter scales in parallel. Similar to the spatial feature learning, three different scales of filters are adopted. However, the difference is that each filter is designed to move only along the time axis of each variable, and the sizes are set to one by two, one by three, and one by five, respectively. This implies that three additional CDBN modules are applied to mine useful and different temporal correlation information. In the same way, each CDBN module contains two hidden layers and a pooling layer. Then, the local temporal correlations extracted at each filter scale are concatenated along the time axis for the final fault classification detection.
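The temporal stage can be sketched in the same spirit, this time with filters that slide only along the time axis and with concatenation along that axis. As before, this is an assumption-laden illustration: single random linear filters stand in for the two-hidden-layer CDBN modules with pooling, the 38 by 100 input is a hypothetical spatial-stage output, and `conv_along_time` is a name introduced here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_along_time(x, kernel):
    """'Valid' filtering with a (1 x n) kernel that slides only along axis 1."""
    windows = sliding_window_view(x, window_shape=kernel.shape[0], axis=1)
    # windows has shape (rows, T - n + 1, n); contract the window axis.
    return windows @ kernel


rng = np.random.default_rng(1)
spatial_maps = rng.random((38, 100))                 # hypothetical spatial-stage output
kernels = [rng.normal(size=n) for n in (2, 3, 5)]    # filter sizes 1x2, 1x3, 1x5

branches = [conv_along_time(spatial_maps, k) for k in kernels]
temporal_maps = np.concatenate(branches, axis=1)     # join along the time axis
print([b.shape for b in branches], temporal_maps.shape)
# [(38, 99), (38, 98), (38, 96)] (38, 293)
```

Here the variable axis is the one left unchanged, so the branch outputs differ only in their time length and can be joined along the time axis.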

Classification
The fault detection of wind turbine sensors considered in this work is a binary classification problem, which indicates whether the sensors are in a healthy condition. The high-level multiscale spatio-temporal representations learned in the feature extraction phase are first transformed into a two-dimensional matrix and then fed directly into the final softmax function, which converts the prediction for each class into a conditional probability. For the training of the proposed method, the cross-entropy function is selected as the loss function, as shown in Equation (4),

$$H(p, q) = -\sum_{i} p(i) \log q(i) \qquad (4)$$

where $p(i)$ refers to the true distribution, and $q(i)$ stands for the estimated distribution. Finally, after adequate training of the proposed method, the testing set is used for performance evaluation.
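The softmax and cross-entropy steps of the classification stage can be written in a few lines. The following is a minimal sketch with made-up scores for two samples; the function names, the small constant `eps` that guards the logarithm, and the averaging over samples are assumptions made here rather than details given in the paper.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(p_true, q_pred, eps=1e-12):
    """Mean over samples of -sum_i p(i) log q(i)."""
    return float(-np.mean(np.sum(p_true * np.log(q_pred + eps), axis=1)))


logits = np.array([[2.0, 0.5], [0.1, 1.2]])   # hypothetical scores: [normal, fault]
labels = np.array([[1.0, 0.0], [0.0, 1.0]])   # one-hot true distribution p
q = softmax(logits)                           # estimated distribution q
loss = cross_entropy(labels, q)
```

With one-hot labels the sum over classes reduces to the negative log-probability assigned to the true class, so the loss approaches zero as the predicted probability of the correct class approaches one.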

Available Data
In order to apply the proposed approach to wind turbine sensor fault detection, the generic 5 MW offshore wind turbine benchmark model presented in [36] was employed. This benchmark models a realistic three-bladed, variable-speed, horizontal-axis wind turbine using the fatigue, aerodynamics, structures, and turbulence (FAST) aeroelastic simulator, and has been extensively applied to evaluate a variety of fault detection and diagnostic approaches [37,38]. Moreover, the model can generate realistic stochastic wind data series, and the cut-in, rated, and cut-out wind speeds are 3 m/s, 11.4 m/s, and 25 m/s, respectively. A more detailed description of the wind turbine benchmark model is given in [36]. In this work, in order to verify the proposed approach in a more realistic scenario, a mean wind speed of 17 m/s at hub height was used to generate the required data set, and a simulation of this wind speed sequence is shown in Figure 3.
In a wind turbine SCADA system, vast amounts of measurements, such as power output, wind speed, temperature, blade pitch angle, and generator speed, are collected to monitor the operating status of the turbine and its key subsystems. Similarly, the benchmark model provides a total of 15 sensor outputs with measurement noise, as listed in Table 1 [17]. Each output is generated by adding band-limited Gaussian white noise, parameterized by its noise power, to the actual variable value computed by the FAST simulator. Note that all these measured variables were derived from real wind turbine SCADA systems. Based on this benchmark model, realistic sensor failure scenarios of wind turbines can also be defined. Overall, six sensor operating conditions with different kinds of faults were involved in this work, and the details of all these conditions are displayed in Table 2.

Data Collection and Preprocessing
In this section, in order to acquire more realistic time series measurements, different wind data sets were used to generate the numerical simulations. The sampling time of each data set was 0.0125 s, and the duration was 630 s. In total, 420 simulations were carried out, of which 210 were in the normal condition and 45 for each failure scenario.
However, in practice, the sampling time of 0.0125 s is much shorter than that of a real SCADA system. Therefore, in order to reflect a realistic, longer sampling time [39], the raw measured values from all conditions were down-sampled to 1 s. Furthermore, because the initialization of the numerical simulation can result in transients [40], the first 30 s of each simulation were deleted. Since different variables have different units and ranges, the measurements were normalized to a consistent range of [0,1]. After that, in order to train the proposed method on the multivariate time series, the normalized signals were divided by a sliding window into non-overlapping segments of 100 sampling points, so that each time series segment represents a sequence of 100 s. At this point, there were 1260 samples for the normal condition and 270 samples for each failure scenario, and these samples were manually labeled to indicate whether the sensor was in a healthy condition. In this paper, a commonly used random selection approach was adopted for performance evaluation, which reduces the impact of chance and of particular samples on the diagnosis results. In addition, considering that data bias could severely deteriorate the evaluation results, the data sets were intentionally divided to make the classes balanced. In particular, 1000 randomly selected samples of the normal condition were used for model training and 250 samples for testing. For each fault condition, 200 randomly selected samples were used for training and 50 samples for testing, which gives 1000 fault samples for training and 250 for testing over the five fault conditions. The detailed description of the sample distribution is shown in Table 3, and the dimension of each multivariate time series sample was 15 by 100.

Table 1 (excerpt): azimuth angle (low speed), unit rad, noise power $10^{-3}$; sensors 10, 11, 12, blade root moment of the ith blade, unit Nm, noise power $10^{3}$.
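The preprocessing steps described in this section can be sketched as a single function. This is a sketch under stated assumptions rather than the authors' code: down-sampling is done by plain decimation (keeping every 80th sample), min-max scaling is applied per sensor within one simulation, and the function name `preprocess` and the synthetic input are introduced here for illustration. The paper does not state how the down-sampling was performed or over which data the normalization range was computed.

```python
import numpy as np


def preprocess(raw, dt=0.0125, target_dt=1.0, drop_seconds=30, window=100):
    """Down-sample, drop the start-up transient, min-max scale, and segment.

    raw: (n_sensors, n_samples) array from one simulation.
    Returns an array of shape (n_segments, n_sensors, window).
    """
    step = int(round(target_dt / dt))            # 80 raw samples per second
    x = raw[:, ::step]                           # plain decimation (an assumption)
    x = x[:, int(drop_seconds / target_dt):]     # remove the first 30 s
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    x = (x - lo) / np.where(hi > lo, hi - lo, 1.0)   # scale each sensor to [0, 1]
    n_seg = x.shape[1] // window
    x = x[:, :n_seg * window]
    # Split the time axis into non-overlapping windows: (n_seg, n_sensors, window).
    return x.reshape(x.shape[0], n_seg, window).transpose(1, 0, 2)


rng = np.random.default_rng(0)
raw = rng.normal(size=(15, 630 * 80))   # synthetic stand-in for one 630 s run
segments = preprocess(raw)
print(segments.shape)                   # (6, 15, 100)
```

With 600 s retained per simulation, each run yields six segments, which agrees with the 1260 normal samples from 210 runs and the 270 samples from 45 runs per failure scenario reported above.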

Results
In this section, in order to verify the effectiveness of the proposed approach and reduce the influence of randomness in the model training process on the detection results, ten trials were conducted to evaluate the overall performance. The advantage of the proposed MSTCDBN fault detection method was demonstrated by comparison with the traditional CDBN and its other variants. Four commonly adopted evaluation metrics, namely classification accuracy, precision, recall, and F1 score, were used for performance evaluation and comparison. They are defined as follows

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

where TP is the number of samples correctly classified as positive, TN is the number of samples correctly classified as negative, FP is the number of samples misclassified as positive, and FN is the number of samples misclassified as negative.
As described in Section 2, in order to extract interactive spatio-temporal features for sensor fault detection, six different forms of CDBN were designed in the proposed MSTCDBN method. Each CDBN module consists of two hidden layers and a pooling layer, and the numbers of filter groups for the two hidden layers were set to 9 and 16, respectively. The detailed structures are listed in Table 4. In addition, in the process of model training, the batch sizes of each CDBN were 100 and 10, and the stride length was selected as 1 by 1. In the final classification phase, the output size of the MSTCDBN was two, corresponding to the normal and fault conditions of the sensor.
In addition, several CDBN structures were investigated to explore the capability of the presented method in wind turbine sensor fault detection from different perspectives, including the standard CDBN, single-scale temporal CDBN (STCDBN), single-scale spatial CDBN (SSCDBN), single-scale spatio-temporal CDBN (SSTCDBN), multiscale temporal CDBN (MTCDBN), and multiscale spatial CDBN (MSCDBN).
Specifically, the first three methods each contain one CDBN module and only consider the correlations on a single scale. For the standard CDBN, conventional square filters are adopted. The filters of STCDBN and SSCDBN are specified to slide only along the time and spatial axes, respectively. The SSTCDBN contains two cascaded CDBN modules, taking into account the correlations in both temporal and spatial dimensions on a single scale. The latter two methods, MTCDBN and MSCDBN, consist of three parallel CDBN modules, extracting temporal and spatial information on multiple scales, respectively. In these experiments, the same input, two-hidden-layer CDBN structure, and softmax classifier as in the proposed method were used, and the stride length and batch sizes were also set to 1 by 1 and to 100 and 10, respectively. The detailed structures are shown in Table 5. Moreover, all models were run in Matlab on an Intel Core (TM) i5-4300 CPU with 8 GB of RAM. The average testing performance (mean ± standard deviation) of all methods over ten trials is given in Figure 4, and the average testing time is shown in Table 6.
It can be seen from Figure 4 that the first six methods yield similar detection performance. However, compared with the standard CDBN with conventional square filters, STCDBN and SSCDBN perform better, which shows the benefit of temporal and spatial information extraction.
Moreover, in terms of the mean value, MTCDBN is superior to STCDBN in all evaluation metrics. Although MSCDBN is slightly inferior to SSCDBN, the proposed method clearly outperforms these two methods and the other CDBN structures. This is mainly because MSTCDBN, with its different filter scales, can extract and learn the interactive spatio-temporal correlation information that is beneficial for classification. Specifically, better and more stable results are obtained by MSTCDBN than by the other CDBN variants in accuracy, precision, and F1-score. Overall, the method presented in this paper results in an enhanced fault detection performance. Likewise, Table 6 shows that although the structure of the proposed method is relatively complex, it costs less computing time than MTCDBN and MSCDBN, owing to the introduction of the pooling operation and the need for fewer filter groups.
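The accuracy, precision, recall, and F1 score reported above follow directly from the four confusion-matrix counts. The short sketch below computes them; the counts are made up for a 500-sample test set and are not results from the paper, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for a 500-sample test set (not results from the paper).
m = classification_metrics(tp=230, tn=240, fp=10, fn=20)
```

For these counts the accuracy is 470/500 = 0.94, the recall is 230/250 = 0.92, and the F1 score equals 2TP/(2TP + FP + FN) = 460/490, which is an equivalent way of writing the harmonic mean of precision and recall.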
From another perspective, in order to better understand the classification performance of the different approaches, the testing classification results over ten trials are given as confusion matrices in Figure 5, where 0 and 1 represent the normal and fault conditions, respectively. Each confusion matrix reports the number of correctly classified and misclassified samples for the normal and fault conditions, the percentage of each condition that is correctly and incorrectly classified, and the percentage of correct and incorrect classifications for each predicted label. Compared with the other methods, fewer samples in total are misclassified when the proposed MSTCDBN method is employed, resulting in better detection performance.
As described in Section 3, this paper focuses on six health conditions of sensors, namely the normal condition and five different fault patterns. Therefore, it is necessary to quantitatively evaluate the detection performance for each individual condition based on the overall detection results. The comparative results in terms of average classification accuracy are given in Table 7.
It can be observed from Table 7 that for the normal condition and fault types three and four, all methods achieved relatively good performance, with classification accuracies above 98%, 93%, and 92%, respectively. Moreover, the proposed method and SSTCDBN outperform the other models in classifying fault type one. For fault type five, the recognition accuracy of MSCDBN was only 62.8%, which is significantly lower than that of the other methods. It is worth mentioning that for fault type two, the approach presented in this paper achieved the highest classification accuracy of 83.6%, whereas the best result among the other CDBN variants was only 60.4%, which clearly shows the advantage of the proposed method in fault identification. It also indirectly indicates that fault type two is probably the most difficult of the six health conditions to detect. In short, the multiscale and spatio-temporal feature learning capability of the proposed method plays an important role in sensor fault detection and leads to the highest overall accuracy.
In order to further demonstrate the ability of the proposed method in fault detection of wind turbine sensors, a comparison with a traditional ANN and a deep belief network (DBN) was carried out. For these two networks, four-layer structures consisting of an input layer, two hidden layers, and an output layer were used for classification. It should be noted that, unlike the CDBN model, DBN and ANN process the data as one-dimensional inputs. Table 8 shows the average comparison results between the proposed MSTCDBN and these two methods in terms of the four evaluation metrics over ten trials. It can be seen from Table 8 that the accuracy, precision, recall, and F1-score of the proposed MSTCDBN method were higher than those of the ANN and DBN models.
This means that the proposed method achieves the best overall performance of the three methods, indicating its superiority in fault detection. Furthermore, it can also be seen that the DBN performs better than the ANN. This is mainly because, compared with the ANN model, the DBN has a powerful unsupervised feature learning ability that can handle complex relationships between variables, thus resulting in relatively better performance. However, the DBN ignores the two-dimensional structure of the input, making it difficult to extract the spatio-temporal correlations hidden in the multivariate time series. In contrast, the proposed MSTCDBN method takes into account spatio-temporal features at different filter scales, which contributes to the enhancement of the fault detection performance.
In summary, since the sensor measurements generated by the benchmark model can realistically reflect wind turbine SCADA data, the proposed MSTCDBN method has the potential to be an alternative method for sensor fault detection in real wind farms. Meanwhile, in practice, it is reasonable to train an independent model for each turbine, because operating conditions and environments differ from turbine to turbine. Likewise, the condition labels, which are essential for fault detection, should be assigned carefully so that the proposed method can be widely adopted.

Conclusions
In this paper, a novel MSTCDBN approach is presented to address the challenging task of fault detection of wind turbine sensors by considering the temporal and spatial correlations hidden in the multivariate time series. Firstly, a major feature of the presented approach is that different forms of convolution kernels are designed to extract the spatio-temporal characteristics between sensor variables in a cascade way. Secondly, another contribution is to learn interactive and rich fault features by incorporating different scales of filters in a parallel manner. Accordingly, the proposed method can enhance the feature learning capacity and fault detection performance. Furthermore, the effectiveness of the MSTCDBN model was investigated by comparison with the traditional CDBN and its other variants, as well as with ANN and DBN methods. In particular, the developed method combining spatio-temporal feature extraction and multiscale learning is better than the other models in terms of classification performance, which provides a new insight for detecting wind turbine sensor faults and can be extended to other research applications.
However, due to the limited SCADA data, this study is only validated on a generic benchmark model. In future work, the developed MSTCDBN will be applied to real wind turbine sensors, and other critical components in wind turbines will be considered. From the perspective of pattern recognition, a fault diagnosis system can quickly identify and locate fault categories when abnormalities occur in the wind turbine. Therefore, the proposed method will be further used for sensor fault diagnosis to identify the specific fault type of the sensor. Based on the sensor fault detection and diagnosis, it is necessary to adopt corresponding fault-tolerant control strategies to ensure that the turbine is in normal operation, thereby improving the safety and reliability of the turbine. Meanwhile, more advanced feature extraction and learning approaches will be established to further mine the spatio-temporal correlations in SCADA multivariate time series. In this paper, a data-driven method is used to realize the fault detection, and the combination of data-driven and model-based methods is expected to become a new research direction [41].