Energies 2019, 12(3), 453; https://doi.org/10.3390/en12030453

Article
Feature Selection Algorithms for Wind Turbine Failure Prediction
1 Data and Signal Processing Group, University of Vic—Central University of Catalonia, c/ de la Laura 13, 08500 Vic, Catalonia, Spain
2 Smartive-ITESTIT SL, Carretera BV-1274, Km1, 08225 Terrassa, Catalonia, Spain
* Author to whom correspondence should be addressed.
Received: 3 December 2018 / Accepted: 28 January 2019 / Published: 31 January 2019

Abstract

Each year the wind sector loses profit because of wind turbine failures and operation and maintenance costs, so the operations related to them are crucial for wind farm operators and associated companies. A key step in predicting wind turbine failures from SCADA data is selecting the optimal or near-optimal set of inputs to feed the failure prediction (prognosis) algorithm. Because the number of possible predictors is high (from tens to hundreds), obtaining the optimal set of inputs by exhaustive search is not viable in the majority of cases. To tackle this issue, to show the viability of prognosis, and to select the best set of variables from more than 200 analog variables recorded at intervals of 5 or 10 min by the wind farm’s SCADA, this paper presents a thorough study of automatic input selection algorithms for wind turbine failure prediction and proposes an exhaustive-search-based quasi-optimal (QO) algorithm, which is used as a reference. Performance is evaluated with a k-NN classification algorithm. The results show that the best automatic feature selection method in our case study is conditional mutual information (CMI), while the worst is mutual information feature selection (MIFS). The effect of the number of neighbours (k) is also tested: the experiments show that k = 1 is the best option when the number of features is higher than 3. The experiments use measurements taken over an entire year from the gearbox and transmission systems of Fuhrländer wind turbines.
Keywords: feature selection; failure prediction; wind energy; health monitoring; sensing systems; wind farms; condition monitoring; SCADA data

1. Introduction

Each year, the wind sector loses profit to wind turbine failures, with losses ranging from around 200 M€ in Spain or 700 M€ in Europe to 2200 M€ in the rest of the world. If operation costs are taken into account, these losses can triple. Given the volume of losses and the current economic situation of the sector, with no generation bonuses and with selling prices restricted by new energy directives (see for example [1,2]), tasks related to improving maintenance and operation are key for wind farm operators, maintenance companies, financial institutions, insurance companies and investors.
The operating and environmental conditions of virtually all wind turbines (WT) in use today are recorded by the turbines’ “supervisory control and data acquisition” (SCADA) system in 10-min intervals [3]. The number of signals available to the turbine operator varies considerably between turbines of different manufacturers, as well as between generations of turbines from the same manufacturer. This variability stems from the complex nature of these systems, as indicated by the IEC [4]. The minimum data-set typically includes 10-min average values of wind speed, wind direction, active power, reactive power, ambient temperature, pitch angle and rotational speed (rotor and/or generator). An example of these sensors is depicted in Figure 1.
One of the main tasks of the Operation and Maintenance (O&M) process is to find the possible causes of a fault manifested by a specific alarm, or a set of alarms, that stops the wind turbine production. This process is crucial to reduce downtime and to detect critical faults at earlier stages. Methodologies and tools that support this type of process can help wind farm owners not only increase availability and production but also reduce costs.
The earliest O&M processes were corrective, meaning that maintenance was carried out when turbines broke down and faults were detected. This is an expensive strategy because of the lack of planning. By contrast, preventive maintenance tries to either repair or replace components before they fail, but it is also expensive because maintenance tasks are carried out more frequently than is strictly necessary. Condition-based maintenance (CBM) is a trade-off between these two strategies, in which continuous monitoring and inspection techniques are employed to detect incipient faults early and to determine any necessary maintenance tasks ahead of failure [5]. This is achieved using condition monitoring systems (CMS), which involve the acquisition, processing, analysis and interpretation of data using the SCADA systems.
In modern wind turbines, however, the SCADA data often comprise hundreds of signals, including temperature values from a variety of measurement positions in the turbine, pressure data (for example from the gearbox lubrication system), electrical quantities such as line currents and voltages or pitch-motor currents, and tower vibration, among many others [6,7,8,9]. Comprehensive SCADA data-sets often contain not only the 10-min or even 5-min averaged values, but also minimum, maximum and standard deviation values for each interval. Because of the high number of available variables and data, analyzing them can be a very time-consuming task [10,11,12], and when only well-known related variables are analyzed, hidden or uncommon causes are hard or impossible to find. As these data are already being collected and are available for condition monitoring, some research has been carried out in recent years on detecting and predicting faults in a non-invasive manner.
Within state-of-the-art research, some authors focus on methods for signal analysis, mathematical models or an ensemble of sequentially connected statistical methods. Authors such as Shafiee et al. [13] develop methods to calculate the number of redundant converters and to determine the number of failures needed before reporting a maintenance task in the case of offshore turbines located in hard-to-reach places. On the other hand, Hameed et al. [14] apply transformations for spectral analysis with the aim of detecting deviations before failures. Astolfi et al. [15] use statistical methods to extract indicators showing misalignment of the nacelle with respect to the wind direction; these indicators are checked against real SCADA data. The same authors, in Astolfi et al. [16], show different algorithms that generate performance indicators (Malfunctioning index, Stationarity index and Misalignment index) for the analyzed turbine. Unlike other authors, Qiu et al. [17] work with alarm data and also introduce methods of temporal and probabilistic analysis, generating a diagnosis of the current state of the WT and a forecast of its future state. There are also authors who have focused on creating a physical-statistical model to detect faults [18]. A statistical analysis of the duration of each type of alarm can be found in [19].
In the area of artificial intelligence (AI) there is a wide variety of techniques, largely based on support-vector machines (SVM) and artificial neural networks (ANN). One of the first examples of a system based on ANN is SIMAP (Intelligent System for Predictive Maintenance) [20], developed for detecting and diagnosing gearbox faults. The system was able to detect a gearbox fault two days before the actual failure, which is an interesting result, but the system is not developed enough to be used for other types of applications. In 2007, Singh et al. [21] also used an ANN approach for wind turbine power generation forecasting, showing that over a monthly period the ANN offered a much more accurate estimate of the actual generated power than the traditional method. Zaher et al. [22] propose an ANN-based automated analysis system. The study describes a set of techniques that can be used for early fault identification in the main components of a WT, interpreting the large volume of SCADA data and highlighting the important aspects of interest before presenting them to the operator.
Neural networks are used in [23] for the estimation of the wind farm’s power curve. This curve links the wind speed to the power that is produced by the whole wind farm, which is non-linear and non-stationary. The authors model the power curve of an entire wind farm using a self-supervised neural network called GMR (Generalized Mapping Regressor). The model allows them to estimate the reference power curve (on-line profile) for monitoring the performance of the wind farm as a whole. Another example related to forecasting wind parameters can be found in [24], where a combination of Wavelet Decomposition (WD), Empirical Mode Decomposition (EMD) and Elman Neural Networks is presented for wind speed forecasting.
An ANN is also used in the work of Bangalore and Tjernberg [25] and Cui et al. [26], with four continuous variables as input and one as output. The objective is to compare the output of the model with the real data. In the training step they obtain the threshold above which a positive output will be generated. This threshold is determined from the error distribution, using a p-value of 0.05. In another work, Bangalore and Tjernberg [27] present a different methodology that detects the deviation from the ANN model using the Mahalanobis distance, with the threshold obtained for a p-value of 0.01.
In Mazidi et al. [28] the authors propose to use an ANN, again with continuous variables, in order to detect anomalies. As in the previous work, the input variables are selected manually. Pearson correlation is used to eliminate the most correlated ones. They define various error indicators that are compared to an experimentally derived threshold. A post-analysis based on PCA is then performed to identify the variable that exceeds the threshold. In a later study, Mazidi et al. [29] improve this methodology. First they apply PCA to visualize the correlations between variables and to select some of them by means of the Pearson correlation, Spearman correlation, Kendall correlation, Mutual Information, RReliefF or Decision-Trees. Then, based on experiments, they choose the variables to be used as inputs for the ANN model, which has the power as output variable. The output error is used to create a stress model that indicates the status of the WT. We refer the reader to [30], where a detailed explanation of these techniques can be found.
Authors such as [31] have used a different type of ANN, the Adaptive Neuro-Fuzzy Inference System (ANFIS), to characterize normal behaviour models in order to detect abnormal behaviour of the captured signals, using the prediction error to indicate component malfunctions or faults; while [32] use an ANN to perform a regression using two to four input variables and one output variable.
On the SVM side, authors such as Vidal et al. [33] focus on using a multiclass SVM classifier to detect different failures. They pre-analyze the contribution of each variable by means of PCA. It should be noted that these authors work with data simulated by the FAST system [34], which does not suffer from the noise and low data quality of real datasets. The authors of [35] use an SVM classifier with five output classes. An important contribution of that work is that it carries out the cleaning and sampling tasks that are necessary when dealing with real data, although the selection of variables is done manually. Works such as [36] use an ensemble of models based on ANN, Boosting Tree Algorithm (BTA), Support Vector Machine (SVM), Random Forest (RF) or Standard Classification and Regression Tree (CART), generating an interval of probability of failure. Leahy et al. [37] also use an ensemble of SVM, RF, Logistic regression (LR) and ANN to generate a model capable of classifying 3 classes (Fault, No Fault, Previous Fault) from SCADA data and alarms. The authors achieve a prediction rate of 71% up to 35 hours in advance, in some cases.
We can also find works that use clustering-based models such as SOM (Self Organizing Maps) in Du et al. [38], which sets the target variable (power) and selects the input variables by correlation. Then, a SOM map is created from a WT in good condition. Using this map, the distribution of distances to the BMU (Best Matching Unit) is generated and the threshold is established as the quartile value. The data of new wind turbines are mapped to this SOM, obtaining the distance to the BMU and determining the points that are out of normality. To determine the origin, they compute a statistic of which variable has contributed most to the distance from the BMU. Continuing with SOM techniques, authors such as Blanco-M. et al. [39] propose a process that applies a clustering technique to the SOM output for the turbines, in order to identify their health status. Other authors, such as Leahy et al. [40], focus on clustering groups of alarms, detecting particular sequences before a failure. Gonzalez et al. [41] use similarity measurements between turbines, KNN, RF and Quantile Regression Forests to determine the error and dispersion of data from each turbine in order to detect an anomaly. SCADA alarms are used to find the system that generated it.
In many state-of-the-art papers the selection of the variables is done manually by an expert, or is based on the perception of the author according to the subsystem to be analyzed. Some authors, such as [29,33,42,43], include some type of reduction stage based on correlations or PCA, but either do not compare selection methods or compare only methods that ignore the interaction of more than two variables, unlike those presented in this paper.
As we have seen in previous studies, choosing the optimal and adequate number of variables related to a failure is a key step when making the model. To address this issue, this paper explores the possibility of using automatic methods for feature selection and studies their performance in real SCADA data. In this work, an exhaustive search-based quasi-optimal algorithm (QO), which has been used as a reference for the automatic algorithms, is proposed. This will allow us to consider the whole set of variables of the subsystem and automatically select the smallest subset of relevant variables, which in turn will simplify the models and permit a graphical representation of their time evolution.
The paper is organized as follows: Section 2 reviews and presents the automatic feature selection algorithms based on Information Theory measures; Section 3 describes a QO algorithm for feature selection that defines a reference for the experiments; Section 4 details the study case and methodology; Section 5 is devoted to the experimental results and discussion. Finally, Section 6 provides the conclusions of the work.

2. Automatic Feature Selection Algorithms

When dealing with classification systems, selecting the optimal features is of great importance because, even if having more features should in theory give more discriminating power, in real-world scenarios this is not always the case. The reason is that some features can be irrelevant for predicting the class, or redundant with other features (highly correlated, sharing mutual information, etc.), which can decrease the performance of the classification system.
Because testing all the possible combinations of the available features is infeasible, feature selection algorithms are needed to sort the features according to a balance between their relevance and their redundancy. As the goal is to solve a classification problem from a subset of variables, the employed algorithms should automatically provide the smallest subsets of non-redundant and most-relevant features.
One way to do this is to apply a criterion that assigns a score to each feature $X_k$ using information theory measures. Denoting the score function by $J$, the score of each feature $X_k$ is $J(X_k)$, and sorting these scores in descending order yields a ranking of the features.
One of the first and simplest heuristic rules to score features employs the Mutual Information (MI) measure $I(X_k;Y)$, where $Y$ is the class label and $X_k$ is the feature under analysis. Then $J(X_k) = I(X_k;Y)$ scores every feature $X_k$ according to its individual mutual information content [44], and feature selection is performed by choosing the first $K$ features, according to the needs of a given application. The term $I(X_k;Y)$ measures the relevance of a feature, so it is sometimes known as the relevance index (RI). In a feature selection stage for a classification problem, the use of the RI is only optimal when the features are mutually independent. When features are interdependent, this criterion is known to be sub-optimal because it can select a set of individually relevant features that are also redundant with each other [45].
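The relevance-index ranking described above can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the function names and the toy data are hypothetical, the estimator is a simple plug-in count over discrete values, and continuous SCADA signals would first have to be discretized (the binning step is assumed, not shown).

```python
from collections import Counter
from math import log2

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

def rank_by_relevance(features, y):
    """Sort feature names by J(X_k) = I(X_k;Y), highest score first."""
    scores = {name: mutual_information(col, y) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data (hypothetical): 'copy' reproduces the label, 'noise' is independent of it.
y = [0, 0, 1, 1]
features = {"noise": [0, 1, 0, 1], "copy": [0, 0, 1, 1]}
ranking = rank_by_relevance(features, y)
```

Choosing the first K entries of `ranking` gives the RI-based selection; as noted above, this ignores any redundancy between the selected features.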
To overcome that limitation, other criteria have been proposed that also take the possible redundancy of features into account. One way to do this is to consider not only the RI of a new feature but also the mutual information it shares with the previously selected features (referred to as $S$), so that only its own contribution is added to the set. This is what the Mutual Information Feature Selection (MIFS) criterion implements [46]. Its score function $J_{MIFS}(X_k)$ is shown in Equation (2). Its first term is again $I(X_k;Y)$, which accounts for the relevance of $X_k$. Its second term, which enters with a negative sign, is $\sum_{X_j \in S} I(X_k;X_j)$ and accumulates the mutual information of $X_k$ with every $X_j$ already selected in $S$. This term introduces a penalty that enforces low correlation with the previously selected features $X_j \in S$. Note that in Equation (2) the term $\sum_{X_j \in S} I(X_k;X_j)$ grows with the number of selected features, whereas $I(X_k;Y)$ remains constant. Therefore, when dealing with a large set of features, the second term can become the predominant one.
A further refinement is to add to $S$, at each step, the feature that most increases the complementary information with respect to the previously selected features. That criterion is fulfilled by the Joint Mutual Information (JMI) [47,48]. In that case, the JMI score function for $X_k$ is $J_{JMI}(X_k) = \sum_{X_j \in S} I(X_k X_j;Y)$, which computes the mutual information between the targets $Y$ and the joint random variable $X_k X_j$, defined by pairing the candidate $X_k$ with each $X_j \in S$. After some mathematical manipulation, $J_{JMI}(X_k)$ can be written as shown in the right part of Equation (4), in which the RI term appears, followed by the term that penalizes redundancy (also present in the MIFS approach) and finally a new term: $\sum_{X_j \in S} I(X_k;X_j|Y)$. This last term enters $J_{JMI}$ with a positive sign, increasing the score with the class-conditional dependence of $X_k$ on the features already in $S$. This means that including some correlated features can improve feature selection performance thanks to the complementarity of the newly added features with those already present in $S$. A similar term can be observed in Equation (3). The improvement in feature selection performance that this third term brings in some data-sets was also reported by [45].
Notably, according to the framework presented in Brown et al. [45], although many other criteria have been reported in the literature, most of the linear score functions can be rewritten as a linear combination of the three terms described above, as follows:
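The manipulation behind the JMI decomposition can be sketched with the chain rule of mutual information. The steps below are a reconstruction under standard identities, not taken from the source, and they assume $|S| \geq 1$. Expanding $I(X_k; Y X_j)$ in two ways gives

$$I(X_k;Y|X_j) = I(X_k;Y) - I(X_k;X_j) + I(X_k;X_j|Y),$$

and the chain rule applied to the joint variable gives

$$I(X_k X_j;Y) = I(X_j;Y) + I(X_k;Y|X_j).$$

Summing over the selected features:

$$\sum_{X_j \in S} I(X_k X_j;Y) = \sum_{X_j \in S} I(X_j;Y) + |S|\, I(X_k;Y) - \sum_{X_j \in S} \left[ I(X_k;X_j) - I(X_k;X_j|Y) \right].$$

The first sum does not depend on the candidate $X_k$, so dropping it and dividing by $|S|$ leaves the ranking of candidates unchanged and yields the three-term form: relevance, minus averaged redundancy, plus averaged class-conditional dependence.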
$$J_x(X_k) = I(X_k;Y) - \beta \sum_{X_j \in S} I(X_k;X_j) + \gamma \sum_{X_j \in S} I(X_k;X_j|Y) \tag{1}$$
where β and γ are configurable parameters.
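As an illustration of how this linear family can drive a greedy forward selection, the following Python sketch scores each remaining candidate with the three terms and adds the best one to $S$. It is a simplified stand-in, not the FEASTR code: the helper names, the plug-in entropy estimator and the toy data are assumptions made for the example.

```python
from collections import Counter
from math import log2

def entropy(*cols):
    """Joint Shannon entropy (bits) of one or more discrete sequences."""
    n = len(cols[0])
    return -sum((c / n) * log2(c / n) for c in Counter(zip(*cols)).values())

def mi(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)

def cmi(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

def greedy_select(features, y, n_select, beta=1.0, gamma=1.0):
    """Forward selection with J = I(Xk;Y) - beta*sum I(Xk;Xj) + gamma*sum I(Xk;Xj|Y)."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < n_select:
        def score(k):
            redundancy = sum(mi(features[k], features[j]) for j in selected)
            conditional = sum(cmi(features[k], features[j], y) for j in selected)
            return mi(features[k], y) - beta * redundancy + gamma * conditional
        best = max(remaining, key=score)  # ties go to the first candidate listed
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (hypothetical): 'a_dup' duplicates 'a'; the class is determined by (a, b).
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
features = {"a": a, "a_dup": list(a), "b": b}
y = [2 * ai + bi for ai, bi in zip(a, b)]

pure_relevance = greedy_select(features, y, 2, beta=0.0, gamma=0.0)
with_penalty = greedy_select(features, y, 2, beta=1.0, gamma=0.0)
```

With β = γ = 0 the duplicate feature is picked second because only relevance counts; with β = 1 the redundancy penalty steers the selection to `b` instead.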
Not all the methods found in the literature have all three terms. It is also clear that the performance of the different criteria depends on the statistical properties of each feature data-set. Consequently, in order to find the best criterion for our data-set, different methods have been employed in the feature selection stage.
The next subsection details the expressions of the information-theory-based feature selection algorithms used in this work. For all these algorithms, Table 1 lists the acronym, name and references, and indicates whether the method employs a second term to avoid redundancy between features or has some way to capture the inter-class correlation that improves the classification performance (as observed in some data-sets). A detailed description of all these algorithms can be found in [45].

2.1. Compilation of Used Criteria

The feature selection algorithms used in the experiments are mainly described as functions of the Mutual Information and the Conditional Mutual Information. Given the discrete variables $X$, $Y$ and $Z$, these functions are denoted by $I(X;Y)$ and $I(X;Y|Z)$ respectively. Both can be written in terms of Shannon entropies [53], which are also used directly in Equation (6) as a normalization parameter. In the following expressions $X_k$ is the feature under analysis and $Y$ is the class label. The group of previously selected features is denoted by $S$. All sums run over the features already included in $S$, which is written as $X_j \in S$. The symbol $|S|$ stands for the cardinality of $S$; it is employed in Equations (4) and (5) so that, as the cardinality of $S$ increases, its inverse reduces the effect of the term it multiplies. Note that Equations (8) and (9), corresponding to the Conditional Mutual Information Maximization (CMIM) and Interaction Capping (ICAP) criteria, are non-linear because of the max and min operations, and therefore their interpretation is not as straightforward as in the linear case.
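For reference, the standard entropy identities alluded to above are listed here. They are textbook definitions for discrete variables, not expressions copied from the source:

$$H(X) = -\sum_{x} p(x) \log p(x)$$

$$I(X;Y) = H(X) + H(Y) - H(X,Y)$$

$$I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)$$

In practice the probabilities are replaced by relative frequencies counted on the discretized data, so every criterion listed below reduces to a combination of joint entropies.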
Mutual Information Feature Selection
$$J_{MIFS}(X_k) = I(X_k;Y) - \sum_{X_j \in S} I(X_k;X_j) \tag{2}$$
Conditional Mutual Information
$$J_{CMI}(X_k) = I(X_k;Y) - \sum_{X_j \in S} I(X_k;X_j) + \sum_{X_j \in S} I(X_k;X_j|Y) \tag{3}$$
Joint Mutual Information
$$J_{JMI}(X_k) = \sum_{X_j \in S} I(X_k X_j;Y) = I(X_k;Y) - \frac{1}{|S|} \sum_{X_j \in S} \left[ I(X_k;X_j) - I(X_k;X_j|Y) \right] \tag{4}$$
Minimum-Redundancy Maximum-Relevance
$$J_{mRMR}(X_k) = I(X_k;Y) - \frac{1}{|S|} \sum_{X_j \in S} I(X_k;X_j) \tag{5}$$
Double Input Symmetrical Relevance
$$J_{DISR}(X_k) = \sum_{X_j \in S} \frac{I(X_k X_j;Y)}{H(X_k X_j Y)} \tag{6}$$
Conditional Mutual Information Maximization
$$J_{CMIM}(X_k) = \min_{X_j \in S} \left[ I(X_k;Y|X_j) \right] \tag{7}$$
or:
$$J_{CMIM}(X_k) = I(X_k;Y) - \max_{X_j \in S} \left[ I(X_k;X_j) - I(X_k;X_j|Y) \right] \tag{8}$$
Interaction Capping
$$J_{ICAP}(X_k) = I(X_k;Y) - \sum_{X_j \in S} \max \left[ 0,\; I(X_k;X_j) - I(X_k;X_j|Y) \right] \tag{9}$$
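The two CMIM expressions are equivalent, because for every $X_j$ the chain rule gives $I(X_k;Y|X_j) = I(X_k;Y) - [I(X_k;X_j) - I(X_k;X_j|Y)]$. The Python sketch below checks this numerically on made-up discrete data. The helper names and the data are hypothetical, the entropies are plug-in estimates, and both functions assume a non-empty set of selected features.

```python
from collections import Counter
from math import isclose, log2

def entropy(*cols):
    """Joint Shannon entropy (bits) of one or more discrete sequences."""
    n = len(cols[0])
    return -sum((c / n) * log2(c / n) for c in Counter(zip(*cols)).values())

def mi(x, y):
    """I(X;Y) from joint entropies."""
    return entropy(x) + entropy(y) - entropy(x, y)

def cmi(x, y, z):
    """I(X;Y|Z) from joint entropies."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

def cmim_min_form(xk, selected, y):
    """CMIM as the minimum over S of I(Xk;Y|Xj)."""
    return min(cmi(xk, y, xj) for xj in selected)

def cmim_max_form(xk, selected, y):
    """CMIM as I(Xk;Y) minus the maximum over S of I(Xk;Xj) - I(Xk;Xj|Y)."""
    return mi(xk, y) - max(mi(xk, xj) - cmi(xk, xj, y) for xj in selected)

# Made-up discrete data, only used to compare the two forms.
xk = [0, 1, 1, 0, 1, 0, 1, 1]
selected = [[0, 0, 1, 1, 0, 1, 1, 0], [1, 1, 0, 0, 1, 0, 1, 0]]
y = [0, 1, 1, 1, 0, 0, 1, 1]

score_min = cmim_min_form(xk, selected, y)
score_max = cmim_max_form(xk, selected, y)
```

Both scores agree up to floating-point rounding, which is why either form can be used when ranking candidates.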
To perform the experiments, the original code from [45] was adapted to the R language and the speed of the calculations was optimized. In addition, a new capability was added to the feature selection functions: a set of features can be supplied as mandatory, after which the algorithm adds other features and ranks them according to the optimization process. This functionality was not provided by the original code. The R code of the library (FEASTR) is freely available at http://mon.uvic.cat/data-signal-processing/software/.

3. Exhaustive-Search-Based Quasi-Optimal Algorithm

In this section a quasi-optimal (QO) algorithm for feature selection is presented, in order to establish a reference or gold standard for the rest of the experiments performed with automatic feature selection algorithms. Optimal feature selection implies testing all possible combinations and selecting the one that gives the best classification rate. Unfortunately this is only possible when the number of features is sufficiently small, because the number of possible combinations grows exponentially with the number of features. This effect is known as the curse of dimensionality. Indeed, the number of combinations of $n$ features taken $k$ at a time (without repetition) is equal to the binomial coefficient $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$.
In our specific case each sub-system has 4 variables (minimum value, maximum value, average value, standard deviation) which gives us 36 features (4 variables × 9 sub-systems) coming from the gearbox, transmission and nacelle wind sensors systems of wind turbines (see Table 2 for the exact list of variables). This implies, for example, that we have 7140 combinations of three features, 58,905 combinations of four features and 376,992 combinations of five features. The worst case, when taking 18 features, gives a total of 9,075,135,300 combinations.
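The combination counts quoted above follow directly from the binomial coefficient and can be reproduced with the Python standard library (a quick check, with variable names chosen for this example):

```python
from math import comb

n_features = 36  # 4 statistics x 9 sub-systems

# Number of distinct feature subsets of each size, without repetition.
counts = {k: comb(n_features, k) for k in (3, 4, 5, 18)}

for k, c in counts.items():
    print(f"{k:>2} features: {c:,} combinations")
```

The count peaks at k = 18, half the number of features, which is the worst case mentioned in the text.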
Therefore, all the possible combinations of 1, 2 and 3 features will be calculated and a QO strategy for 4, 5, and 6 features will be implemented. In all cases, the criterion for selecting the best combination is the classification rate obtained with the k-NN classifier. The following strategy (see Figure 2 for a block diagram) details how the QO feature selection is implemented. Suppose you want to determine the best combination of n characteristics. Then:
  • Calculate the frequency of selection of the characteristics for the case n-1 using the best 500 results.
  • Sort the features according to their frequency.
  • Select the subset of S features with highest frequency.
  • Calculate all possible combinations of these S features taking n at a time (without repetition).
  • Select the best combination based on the classification rate obtained with the k-NN classifier.
For the case n = 4 the 20 most frequent features (S = 20) of the case n = 3 will be used, generating a total of 4845 combinations of 4 characteristics. For the case n = 5 the best 15 features (S = 15) of the case n = 4 will be used, generating a total of 3003 combinations of 5 characteristics. Finally, for the case n = 6 the best 15 features (S = 15) of the case n = 5 will be used, generating a total of 5005 combinations of 6 characteristics.
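The QO steps listed above can be sketched in Python as follows. This is an illustrative outline, not the authors' code: `evaluate` is a hypothetical callback standing in for the k-NN classification rate of a feature combination, and the toy scoring function at the end exists only to exercise the logic.

```python
from collections import Counter
from itertools import combinations

def exhaustive(features, n, evaluate):
    """Score every n-feature combination; return (score, combo) pairs, best first."""
    results = [(evaluate(c), c) for c in combinations(features, n)]
    return sorted(results, key=lambda r: r[0], reverse=True)

def quasi_optimal_step(prev_results, n, subset_size, evaluate, top=500):
    """One QO step: build the candidate pool from the best `top` results of
    size n-1, then search all n-combinations of that pool exhaustively."""
    # Frequency with which each feature appears among the best previous results.
    freq = Counter(f for _, combo in prev_results[:top] for f in combo)
    # Keep only the `subset_size` most frequent features.
    pool = [f for f, _ in freq.most_common(subset_size)]
    return exhaustive(pool, n, evaluate)

# Toy stand-in for the classification rate (hypothetical): higher index is better.
toy_features = list(range(10))
toy_evaluate = sum

triples = exhaustive(toy_features, 3, toy_evaluate)
quads = quasi_optimal_step(triples, 4, subset_size=6, evaluate=toy_evaluate, top=20)
best_score, best_combo = quads[0]
```

Chaining `quasi_optimal_step` with the (n, S) pairs given above would reproduce the progression from 3 to 6 features, at the cost of possibly missing combinations that involve features outside the pool.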
The advantage of optimal feature selection is that all possible combinations (interactions) between features are tested. The disadvantage is that the number of combinations becomes intractable when the number of characteristics is large and a substantial number of characteristics is to be considered in each group. The QO strategy presented above approximates the selection of optimal features, but some combinations that could be better are probably ignored, and even though the number of combinations decreases, there are still many cases to try with the classification algorithm. On the other hand, one is usually interested in a fast algorithm for automatic characteristic selection that can deal with all 36 characteristics and rank them according to their importance for the classification problem. Therefore, the aim is to replace the QO characteristic selection with an automatic characteristic selection algorithm without losing performance, while allowing all available characteristics to be exploited.

4. Study Case and Methodology

In this section, the data-set used in the experiments and the classification system employed are detailed. The general scheme of the experiments is depicted in Figure 3.

4.1. Data-Set Description

The data-set used in this work covers an entire year (2014) of a farm with five Fuhrländer wind turbines in Catalonia. The original set of more than 200 variables comes from the wind farm’s SCADA in 5-min format for analog variables and as a record of events for digital data (alarms). Among all these features, a subset related to the wind turbine gearbox and transmission system was selected for the experiments. The events are labeled as 0 for normal functioning, 1 for warning and 2 for alarm. The difference between a warning and an alarm lies in the state of the wind turbine: it keeps working in the warning state but is stopped in the alarm state. Considering that a warning signals that something wrong may occur, warnings and alarms are merged, and the developed system focuses on improving the classification of events between the operating and fault conditions (warning or alarm).

4.2. Classification System

The k nearest neighbours (k-NN) method is one of the simplest and oldest classification methods. It assigns an unknown observation to the class of the majority of its neighbouring observations, where the proximity between observations is defined by a distance metric [54]. Among its advantages, k-NN is a simple method that offers comparable results and sometimes even outperforms more sophisticated machine learning (ML) strategies. However, characteristics of the data that do not contain useful information, which commonly appear in high-dimensional problems, decrease its performance. Improvements have been obtained by employing ensemble techniques, as reported in [55,56,57,58]. Analyzing big data-sets can consume huge computational resources and execution time. Taking into account that not all characteristics of the data contribute equally to the final results, it is reasonable to try to identify the main contributing characteristics and use them instead of the whole set of features. Features with a low contribution can therefore be eliminated to reduce complexity and computational time.
In general, in k-NN classification k = 1 is often not the best choice, as the classification accuracy can easily be degraded by noise. As k increases, multiple nearest neighbours help to improve the classification accuracy. However, if k is very large, the classification accuracy of k-NN tends to decrease, because the nearest and farthest neighbours are assigned equal weights in the decision-making process. To sum up, the classification accuracy of the k-NN algorithm goes through a rise-peak-drop process, and in practical situations it is important to determine the optimal k value. The value used in this work is discussed in Section 5.
To measure the performance of our system, the Classification Rate (CR) and the F1-score (F1) are used. The CR is calculated as the number of correctly classified instances divided by the total number of instances, expressed as a percentage, while the F1 is obtained as the harmonic mean of precision and recall. In order to have statistically consistent results, 100 different cases are computed. These cases are obtained by randomly splitting the database into two subsets: the first for deriving the model (training subset) and the second for testing it (test subset). Because the wind turbines (WT) are in the normal state almost all the time, the database is clearly biased and contains a high number of instances of this class. Therefore the training set is balanced by keeping the same number of instances for each class (down-sampling the majority class). Because the splitting process is completely random, the instances are expected to all be used over the course of the 100 experiments.
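The sensitivity of k = 1 to noise can be illustrated with a minimal k-NN sketch in Python. The Euclidean distance and the toy points are assumptions made for the example; the source does not state which distance metric was used.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, query, k=1):
    """Classify `query` by majority vote among its k nearest training points
    (Euclidean distance is assumed here)."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): the point (0.2, 0.2) carries a noisy label 1
# inside a region otherwise labelled 0.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.2, 0.2), (5.0, 5.0)]
train_y = [0, 0, 0, 1, 1]
query = (0.3, 0.3)

label_k1 = knn_predict(train_X, train_y, query, k=1)  # follows the noisy neighbour
label_k3 = knn_predict(train_X, train_y, query, k=3)  # majority vote smooths it out
```

Whether a larger k helps on real data depends on the data-set, which is why the value of k is examined experimentally.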
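A minimal Python sketch of the two metrics and of the balanced random split is given below. Several details are assumptions not stated in the source: the fault class is taken as the positive class for F1, the training fraction is a free parameter, and all function names are hypothetical.

```python
import random

def classification_rate(y_true, y_pred):
    """Percentage of instances whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    # 2*tp / (2*tp + fp + fn) equals 2*P*R / (P + R); defined as 0 when tp == 0.
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def balanced_train_indices(labels, train_fraction, rng):
    """Random training indices with equal class counts (majority down-sampled)."""
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    per_class = int(train_fraction * min(len(v) for v in by_class.values()))
    train = []
    for idx in by_class.values():
        train.extend(rng.sample(idx, per_class))
    return sorted(train)

# Toy usage: 90 normal instances (0) and 10 fault instances (1).
labels = [0] * 90 + [1] * 10
train_idx = balanced_train_indices(labels, 0.5, random.Random(0))
cr = classification_rate([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
f1 = f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Repeating the split with 100 different seeds and averaging CR and F1 over the corresponding test subsets would mirror the evaluation protocol described above.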

5. Experimental Results and Discussion

All the experiments (see Figure 3) use the data-set presented in Section 4.1, which contains 36 features, and each instance carries a label indicating normal state, warning state or alarm state. Warnings and alarms are merged into a single class, so the task becomes a binary classification problem. The selection of the best features to be used as input to the classification system is implemented as detailed in Section 2. Several experiments were performed using all the WT, and the best features, from 1 to 6, were obtained through several feature selection algorithms. Panel (a) of Figure 4 shows the CR against the number of features for the quasi-optimal algorithm and all the WT. Results are very good for all the WT, with the CR exceeding 85% when the number of features is 3 or higher. Adding new features slightly increases the CR, but for more than 4 features the change is almost imperceptible. Numerical results for these experiments (in terms of CR and F1) are detailed in Table 3. All results are obtained with k = 1, and the F1-score is close to 1 and highly correlated with the CR results.
The specific features selected by the algorithms are included in Table 3, coded with a letter and a number. The letter indicates the group of the feature, while the number stands for the exact variable code (1: average; 2: min; 3: max; 4: sdv (standard deviation)). Table 2 contains the translation from the variable code to the variable name. For instance, in Table 3 and using only one feature, the best result for WT1 is 91.79% with the feature A1. Table 2 indicates that this feature is “WGDC.TrfGri.PwrAt.cVal.avgVal”, meaning the active power (letter A), averaged value (number 1).
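The coding scheme can be made concrete with a small lookup sketch. The group descriptions are copied from Table 2; the dictionary and function names are hypothetical helpers introduced only for illustration.

```python
# Group letters and descriptions as listed in Table 2.
GROUPS = {
    "A": "Active power",
    "B": "Main bearing 1 temperature",
    "C": "Main bearing 2 temperature",
    "D": "Main bearing oil pressure (inside bearing)",
    "E": "Gearbox oil pressure",
    "F": "Main bearing oil pressure (inlet hose)",
    "G": "Wind speed sensor 1",
    "H": "Wind direction sensor 1",
    "I": "Wind direction sensor 2",
}
# Variable codes: 1 average, 2 min, 3 max, 4 sdv (standard deviation).
STATISTICS = {1: "average", 2: "min", 3: "max", 4: "standard deviation"}

def decode_feature(code):
    """Translate a code such as 'A1' into (group description, statistic)."""
    return GROUPS[code[0]], STATISTICS[int(code[1])]

def decode_feature_set(codes):
    """Translate a space-separated feature set such as 'A2 E3'."""
    return [decode_feature(code) for code in codes.split()]

best_single_wt1 = decode_feature("A1")
best_pair_wt1 = decode_feature_set("A2 E3")
```

For example, the two-feature set "A2 E3" decodes to the minimum of the active power and the maximum of the gearbox oil pressure.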

5.1. Quasi-Optimal vs. Automatic Feature Selection

The next step is to look for feature selection algorithms able to obtain similar results with a small number of features. Results for those feature selection algorithms are presented in panels (b) to (f) of Figure 4. Each panel corresponds to a WT and contains the result obtained with the quasi-optimal method (as a reference, dashed line) and the results obtained with all the other algorithms for this WT. As can be observed, some WT are easy to model (see for example WT4) while others are more challenging (see for example WT5). Numerical results for all the experiments are detailed in Table 4, again showing the CR and the F1. When comparing the results obtained by the quasi-optimal exploratory method with those of the automatic feature selection methods, QO results are always the best, as expected, but several automatic methods also obtain very good results.
Among all the automatic algorithms, CMI emerges as stable across all the WT and (almost) always obtains a very good result, comparable to that obtained with the quasi-optimal method when the number of features is equal to or higher than 4.
By exploring all possible combinations of features, the optimal number of features is determined. As can be seen, CR saturates at 6 features, so the system will not increase its performance by adding new features. It is important to keep the number of features as small as possible in order to develop less complex classification systems. Besides, less complex systems are easier to train and carry a lower risk of overfitting. Finally, a small number of features (up to 3) allows the information to be represented graphically. This is of great importance as a tool in the front-end of real applications for the managers of the wind farms. Hence, CMI with 3 or 4 features is a good choice in the experiment, with CR and F1 comparable to the quasi-optimal ones for all WT.
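To give a sense of the cost of exploring feature combinations, the following sketch counts the subsets of 1 to 6 features that can be drawn from the 36 available features. It assumes a fully exhaustive enumeration, which is only an upper bound: the quasi-optimal algorithm used as a reference may visit fewer subsets.

```python
import math

N_FEATURES = 36  # number of features in the data-set

# Number of distinct feature subsets of each size, from 1 to 6 features.
subsets_per_size = {size: math.comb(N_FEATURES, size) for size in range(1, 7)}
total_subsets = sum(subsets_per_size.values())

for size, count in subsets_per_size.items():
    print(f"{size} feature(s): {count} subsets")
print(f"total: {total_subsets} subsets")
```

The count grows from 36 single features to almost two million six-feature subsets, each of which would need its own k-NN evaluation, which is why fast automatic selection criteria are attractive.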

5.2. Effect of the Number of Neighbors Considered

To analyze the effect of the number of neighbors in the k-NN algorithm, experiments exploring all the cases for k = 1 to k = 50 in all the algorithms are performed, using the best combination of features for each case.
When analyzing the quasi-optimal case, k = 1 is the best option for all the WT. When using any of the automatic feature selection algorithms, if the number of features is small then the number of neighbors affects the CR and k = 1 is usually not the best choice. Nevertheless, even when the number of neighbors is increased, the CR obtained is lower than in the QO case for the number of features analyzed. If the number of features increases, and therefore the CR also increases, k = 1 again becomes the best option and the CR tends to that of the QO case. In other words, the benefit of adding neighbors is matched by adding features. This effect can be observed in Figure 5: the left column presents the evolution of the CR as a function of k for the quasi-optimal set of features (1 to 6) for WT1 and WT3. The right column shows the same WT, but now using the features obtained with the best feature selection algorithm among all those analyzed. Note that increasing the number of neighbors is only useful for the CMI algorithm when the number of features used is small (1 or 2), but does not help to increase the CR when the number of features is larger. For the quasi-optimal feature selection algorithm, k = 1 is (almost) always the best option regardless of the number of features. Therefore, changing the number of neighbors only has an impact when using 1 or 2 features in the CMI algorithm, and it degrades the CR when the number of features is large or when the QO method is used.
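A sweep over the number of neighbors of the kind described in this section can be sketched as follows. The synthetic two-class data, the Euclidean distance and all function names are assumptions for illustration; the sketch does not reproduce the data or the results of Figure 5.

```python
import math
import random
from collections import Counter

def neighbour_labels(train, query):
    """Training labels ordered from nearest to farthest (Euclidean distance, an assumption)."""
    return [label for _, label in sorted((math.dist(x, query), label) for x, label in train)]

def cr_by_k(train, test, k_values):
    """Classification rate (%) of k-NN on `test` for every k in `k_values`."""
    # Sort the neighbours of each test point once and reuse the order for every k.
    ordered = [(neighbour_labels(train, x), y) for x, y in test]
    rates = {}
    for k in k_values:
        correct = 0
        for labels, truth in ordered:
            # Ties (possible for even k) go to the tied label that appears first
            # in the neighbour list, i.e. the one with the nearer neighbour.
            predicted = Counter(labels[:k]).most_common(1)[0][0]
            correct += predicted == truth
        rates[k] = 100.0 * correct / len(ordered)
    return rates

rng = random.Random(0)

def sample(label, centre, n):
    """Hypothetical 2-D Gaussian cluster standing in for one class."""
    return [((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label) for _ in range(n)]

train = sample("normal", 0.0, 40) + sample("fault", 2.0, 40)
test = sample("normal", 0.0, 20) + sample("fault", 2.0, 20)

rates = cr_by_k(train, test, range(1, 51))  # k = 1 to k = 50, as in the experiments
best_k = max(rates, key=rates.get)
```

Plotting `rates` against k for feature sets of different sizes would give curves analogous to those in Figure 5.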

6. Conclusions

In this paper, several methods for automatic feature selection for wind turbine failure prediction are explored and their performances are compared against the proposed quasi-optimal feature selection method detailed in Section 4.2. Experimental results using the 36 sensor variables listed in Table 2 show that the CMI algorithm obtains a good CR for all the wind turbines with up to six features and only one neighbour. Therefore, the speed of the system can be increased by using this algorithm instead of the exhaustive-search-based quasi-optimal strategy. Its advantages are its low computational cost and its speed in finding the best subset of features for wind turbine failure prediction. Although our study confirms that a selected set of three to six of the most discriminant variables is required to obtain the best prognosis performance, a selection of more than three variables is rather difficult to represent. This is why sets of three selected variables, which admit a 3D Cartesian plot, become interesting. In this scenario, time evolution can be included by generating plot animations. These dynamic representations provide powerful and intuitive insights into the behaviour of the variables 21 days before a failure occurs and become a useful tool to improve the models used for prognosis. In future work, the dynamic representations of three features will be explored, making it possible to visualize the interactions between them, with the aim of simplifying and facilitating the management of wind farms.

Author Contributions

Conceptualization, P.M.-P. and J.S.-C.; methodology, P.M.-P., J.J.C., J.C. and J.S.-C.; software, A.B.-M.; validation, A.B.-M., J.J.C. and J.C.; formal analysis, P.M.-P., A.B.-M. and J.S.-C.; investigation, J.J.C. and J.C.; resources, J.J.C. and J.C.; data curation, A.B.-M. and J.J.C.; writing–original draft preparation, P.M.-P., A.B.-M., J.C. and J.S.-C.; writing–review and editing, P.M.-P., A.B.-M., J.J.C., J.C. and J.S.-C.; supervision, P.M.-P. and J.S.-C.; project administration, J.C.; funding acquisition, P.M.-P., J.C. and J.S.-C.

Funding

Research partially funded by Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR) of the Catalan Government (Project reference: 2014-DI-032).

Acknowledgments

The authors would like to thank the anonymous reviewers for their detailed and helpful comments on the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. European Commission. European Commission Guidance for the Design of Renewables Support Schemes; Official Journal of the European Union: Brussels, Belgium, 2013; p. 439.
  2. The European parliament and the council of the European Union. Guidelines on State Aid for Environmental Protection and Energy 2014–2020; Official Journal of the European Union: Brussels, Belgium, 2014; pp. 1–55.
  3. David Bailey, E.W. Practical SCADA for Industry; Elsevier: Amsterdam, The Netherlands, 2003.
  4. IEC. International Standard IEC 61400-25-1; Technical Report; International Electrotechnical Commission: Geneva, Switzerland, 2006.
  5. García Márquez, F.P.; Tobias, A.M.; Pérez, J.M.P.; Papaelias, M. Condition monitoring of wind turbines: Techniques and methods. Renew. Energy 2012, 46, 169–178.
  6. Romero, A.; Lage, Y.; Soua, S.; Wang, B.; Gan, T.-H. Vestas V90-3MW Wind Turbine Gearbox Health Assessment Using a Vibration-Based Condition Monitoring System. Shock Vib. 2016, 2016, 18.
  7. Weijtjens, W.; Devriendt, C. High frequent SCADA-based thrust load modeling of wind turbines. Wind Energy Sci. 2017.
  8. Wilkinson, M. Use of Higher Frequency SCADA Data for Turbine Performance Optimisation; Technical Report; DNV GL, EWEA: Brussels, Belgium, 2016.
  9. Vestas R&D Department. General Specification VESTAS V90 3.0MW; Technical Report; Vestas Wind Systems: Ringkobing, Denmark, 2004.
  10. Tyagi, P. The Case for an Industrial Big Data Platform; Technical Report; General Electric (GE): Boston, MA, USA, 2013.
  11. Henry Louie, A.M. Lossless Compression of Wind Plant Data. IEEE Trans. Sustain. Energy 2012, 2012, 598–606.
  12. Vestas&IBM. Turning Climate into Capital with Big Data; Technical Report; International Business Machines Corporation (IBM): Armonk, NY, USA, 2011.
  13. Shafiee, M.; Patriksson, M.; Strömberg, A.B.; Tjernberg, L.B. Optimal redundancy and maintenance strategy decisions for offshore wind power converters. Int. J. Reliab. Qual. Saf. Eng. 2015, 22, 1550015.
  14. Hameed, Z.; Hong, Y.; Cho, Y.; Ahn, S.; Song, C. Condition monitoring and fault detection of wind turbines and related algorithms: A review. Renew. Sustain. Energy Rev. 2009, 13, 1–39.
  15. Astolfi, D.; Castellani, F.; Scappaticci, L.; Terzi, L. Diagnosis of Wind Turbine Misalignment through SCADA Data. Diagnostyka 2017, 18, 17–24.
  16. Astolfi, D.; Castellani, F.; Garinei, A.; Terzi, L. Data mining techniques for performance analysis of onshore wind farms. Appl. Energy 2015, 148, 220–233.
  17. Qiu, Y.; Feng, Y.; Tavner, P.; Richardson, P.; Erdos, G.; Chen, B. Wind turbine SCADA alarm analysis for improving reliability. Wind Energy 2012, 15, 951–966.
  18. Gray, C.S.; Watson, S.J. Physics of failure approach to wind turbine condition based maintenance. Wind Energy 2010, 13, 395–405.
  19. Bartolini, N.; Scappaticci, L.; Garinei, A.; Becchetti, M.; Terzi, L. Analysing wind turbine states and scada data for fault diagnosis. Int. J. Renew. Energy Res. 2017, 7, 323–329.
  20. Garcia, M.C.; Sanz-Bobi, M.A.; del Pico, J. SIMAP: Intelligent System for Predictive Maintenance: Application to the health condition monitoring of a windturbine gearbox. Comput. Ind. 2006, 57, 552–568.
  21. Singh, S.; Bhatti, T.; Kothari, D. Wind power estimation using artificial neural network. J. Energy Eng. 2007, 133, 46–52.
  22. Zaher, A.; McArthur, S.; Infield, D.; Patel, Y. Online wind turbine fault detection through automated SCADA data analysis. Wind Energy 2009, 12, 574–593.
  23. Marvuglia, A.; Messineo, A. Monitoring of wind farms’ power curves using machine learning techniques. Appl. Energy 2012, 98, 574–583.
  24. Liu, H.; Tian, H.; Liang, X.; Li, Y. Wind speed forecasting approach using secondary decomposition algorithm and Elman neural networks. Appl. Energy 2015, 157, 183–194.
  25. Bangalore, P.; Tjernberg, L.B. Self evolving neural network based algorithm for fault prognosis in wind turbines: A case study. In Proceedings of the 2014 International Conference on Probabilistic Methods Applied to Power Systems (PMAPS), Durham, UK, 7–10 July 2014; pp. 1–6.
  26. Cui, Y.; Bangalore, P.; Tjernberg, L.B. An Anomaly Detection Approach Based on Machine Learning and SCADA Data for Condition Monitoring of Wind Turbines. In Proceedings of the 2018 IEEE International Conference on Probabilistic Methods Applied to Power Systems (PMAPS), Boise, ID, USA, 24–28 June 2018; pp. 1–6.
  27. Bangalore, P.; Tjernberg, L.B. An artificial neural network approach for early fault detection of gearbox bearings. IEEE Trans. Smart Grid 2015, 6, 980–987.
  28. Mazidi, P.; Bertling Tjernberg, L.; Sanz-Bobi, M.A. Performance Analysis and Anomaly Detection in Wind Turbines based on Neural Networks and Principal Component Analysis. In Proceedings of the 12th Workshop on Industrial Systems and Energy Technologies, Madrid, Spain, 23–24 September 2015.
  29. Mazidi, P.; Tjernberg, L.B.; Bobi, M.A.S. Wind turbine prognostics and maintenance management based on a hybrid approach of neural networks and a proportional hazards model. Proc. Inst. Mech. Eng. Part O J. Risk Reliab. 2017, 231, 121–129.
  30. Mazidi, P. From Condition Monitoring to Maintenance Management in Electric Power System Generation with focus on Wind Turbines. Ph.D. Thesis, Universidad Pontificia Comillas, Madrid, Spain, 2018.
  31. Schlechtingen, M.; Santos, I.F.; Achiche, S. Wind turbine condition monitoring based on SCADA data using normal behavior models. Part 1: System description. Appl. Soft Comput. 2013, 13, 259–270.
  32. Astolfi, D.; Scappaticci, L.; Terzi, L. Fault diagnosis of wind turbine gearboxes through temperature and vibration data. Int. J. Renew. Energy Res. 2017, 7, 965–976.
  33. Vidal, Y.; Pozo, F.; Tutivén, C. Wind turbine multi-fault detection and classification based on SCADA data. Energies 2018, 11, 3018.
  34. NREL. NWTC Information Portal (FAST). 2018. Available online: https://nwtc.nrel.gov/FAST (accessed on 10 January 2019).
  35. Leahy, K.; Hu, R.L.; Konstantakopoulos, I.C.; Spanos, C.J.; Agogino, A.M.; O’Sullivan, D.T.J. Diagnosing and predicting wind turbine faults from SCADA data using support vector machines. Int. J. Progn. Health Manag. 2018, 9, 1–11.
  36. Kusiak, A.; Li, W. The prediction and diagnosis of wind turbine faults. Renew. Energy 2011, 36, 16–23.
  37. Leahy, K.; Gallagher, C.; O’Donovan, P.; Bruton, K.; O’Sullivan, D.T.J. A Robust Prescriptive Framework and Performance Metric for Diagnosing and Predicting Wind Turbine Faults Based on SCADA and Alarms Data with Case Study. Energies 2018, 11, 1738.
  38. Du, M.; Tjernberg, L.B.; Ma, S.; He, Q.; Cheng, L.; Guo, J. A SOM based Anomaly Detection Method for Wind Turbines Health Management through SCADA Data. Int. J. Progn. Health Manag. 2016, 7, 1–13.
  39. Blanco-M, A.; Gibert, K.; Marti-Puig, P.; Cusidó, J.; Solé-Casals, J. Identifying Health Status of Wind Turbines by Using Self Organizing Maps and Interpretation-Oriented Post-Processing Tools. Energies 2018, 11, 723.
  40. Leahy, K.; Gallagher, C.; O’Donovan, P.; O’Sullivan, D.T.J. Cluster analysis of wind turbine alarms for characterising and classifying stoppages. IET Renew. Power Gener. 2018, 12, 1146–1154.
  41. Gonzalez, E.; Stephen, B.; Infield, D.; Melero, J. On the Use of High-Frequency SCADA Data for Improved Wind Turbine Performance Monitoring; Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2017; Volume 926, p. 012009.
  42. Zhao, Y.; Li, D.; Dong, A.; Kang, D.; Lv, Q.; Shang, L. Fault Prediction and Diagnosis of Wind Turbine Generators Using SCADA Data. Energies 2017, 10, 1210.
  43. Wang, K.S.; Sharma, V.S.; Zhang, Z.Y. SCADA data based condition monitoring of wind turbines. Adv. Manuf. 2014, 2, 61–69.
  44. Lewis, D.D. Feature selection and feature extraction for text categorization. In Proceedings of the Workshop on Speech and Natural Language; Association for Computational Linguistics: Stroudsburg, PA, USA, 1992; pp. 212–217.
  45. Brown, G.; Pocock, A.; Zhao, M.J.; Luján, M. Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. J. Mach. Learn. Res. 2012, 13, 27–66.
  46. Battiti, R. Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Netw. 1994, 5, 537–550.
  47. Yang, H.H.; Moody, J.E. Data Visualization and Feature Selection: New Algorithms for Nongaussian Data; Advances in Neural Information Processing Systems (NIPS); MIT Press: Boston, MA, USA, 1999; Volume 99, pp. 687–693.
  48. Meyer, P.E.; Bontempi, G. On the use of variable complementarity for feature selection in cancer classification. In Applications of Evolutionary Computing; Springer: Berlin, Germany, 2006; pp. 91–102.
  49. Cheng, H.; Qin, Z.; Feng, C.; Wang, Y.; Li, F. Conditional mutual information-based feature selection analyzing for synergy and redundancy. ETRI J. 2011, 33, 210–218.
  50. Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238.
  51. Fleuret, F. Fast binary feature selection with conditional mutual information. J. Mach. Learn. Res. 2004, 5, 1531–1555.
  52. Jakulin, A. Machine Learning Based on Attribute Interactions. Ph.D. Thesis, Fakulteta za racunalništvo in informatiko, Univerza v Ljubljani, Ljubljana, Slovenia, June 2005.
  53. Thomas, J.A.; Cover, T. Elements of Information Theory; Wiley: New York, NY, USA, 2006; Volume 2.
  54. Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
  55. Domeniconi, C.; Yan, B. Nearest neighbor ensemble. In Proceedings of the 17th International Conference on Pattern Recognition, Cambridge, UK, 23–26 August 2004; Volume 1, pp. 228–231.
  56. Zhou, Z.H.; Yu, Y. Adapt bagging to nearest neighbor classifiers. J. Comput. Sci. Technol. 2005, 20, 48–54.
  57. Hall, P.; Samworth, R.J. Properties of bagged nearest neighbour classifiers. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2005, 67, 363–379.
  58. Samworth, R.J. Optimal weighted nearest neighbour classifiers. Ann. Stat. 2012, 40, 2733–2763.
Figure 1. Example of Wind Turbine sensors types. Adapted from TE connectivity (available via license: CC BY 4.0) http://www.te.com/.
Figure 2. Proposed exhaustive-search-based quasi-optimal algorithm.
Figure 3. General scheme of the experiments.
Figure 4. Evolution of the CR(%) against the number of features. (a) Quasi-optimal feature selection algorithm, all WT. (b–f) Specific results for each WT and all the automatic feature selection algorithms analyzed. The dashed line in each panel corresponds to the quasi-optimal result for that specific WT.
Figure 5. Effect of the number of neighbors for WT1 and WT3. Each colored curve corresponds to a specific number of features, from 1 to 6. Only the QO and the CMI feature selection algorithms are reported here.
Table 1. Information-based criteria used in the experiments.
| Criterion | Full Name | Authors | Relevance/Redundance |
|---|---|---|---|
| MIFS | Mutual Information Feature Selection | [46] | no |
| CMI | Conditional Mutual Information | [49] | yes |
| JMI | Joint Mutual Information | [47] | yes |
| mRMR | Min-Redundancy Max-Relevance | [50] | no |
| DISR | Double Input Symmetrical Relevance | [48] | yes |
| CMIM | Conditional Mutual Info Maximisation | [51] | yes |
| ICAP | Interaction Capping | [52] | yes |
Table 2. Variable code to variable name.
| Group | Variable Code | Variable Name | Description |
|---|---|---|---|
| A | 1 | WGDC.TrfGri.PwrAt.cVal.avgVal | Active power |
| | 2 | WGDC.TrfGri.PwrAt.cVal.minVal | |
| | 3 | WGDC.TrfGri.PwrAt.cVal.maxVal | |
| | 4 | WGDC.TrfGri.PwrAt.cVal.sdvVal | |
| B | 1 | WTRM.TrmTmp.Brg1.avgVal | Main bearing 1 Temperature |
| | 2 | WTRM.TrmTmp.Brg1.minVal | |
| | 3 | WTRM.TrmTmp.Brg1.maxVal | |
| | 4 | WTRM.TrmTmp.Brg1.sdvVal | |
| C | 1 | WTRM.TrmTmp.Brg2.avgVal | Main bearing 2 Temperature |
| | 2 | WTRM.TrmTmp.Brg2.minVal | |
| | 3 | WTRM.TrmTmp.Brg2.maxVal | |
| | 4 | WTRM.TrmTmp.Brg2.sdvVal | |
| D | 1 | WTRM.Brg.OilPres.avgVal | Main bearing oil pressure (inside bearing) |
| | 2 | WTRM.Brg.OilPres.minVal | |
| | 3 | WTRM.Brg.OilPres.maxVal | |
| | 4 | WTRM.Brg.OilPres.sdvVal | |
| E | 1 | WTRM.Gbx.OilPres.avgVal | Gearbox oil pressure |
| | 2 | WTRM.Gbx.OilPres.minVal | |
| | 3 | WTRM.Gbx.OilPres.maxVal | |
| | 4 | WTRM.Gbx.OilPres.sdvVal | |
| F | 1 | WTRM.Brg.OilPresIn.avgVal | Main bearing oil pressure (inlet hose) |
| | 2 | WTRM.Brg.OilPresIn.minVal | |
| | 3 | WTRM.Brg.OilPresIn.maxVal | |
| | 4 | WTRM.Brg.OilPresIn.sdvVal | |
| G | 1 | WNAC.WSpd1.avgVal | Wind Speed sensor 1 |
| | 2 | WNAC.WSpd1.minVal | |
| | 3 | WNAC.WSpd1.maxVal | |
| | 4 | WNAC.WSpd1.sdvVal | |
| H | 1 | WNAC.Wdir1.avgVal | Wind direction sensor 1 |
| | 2 | WNAC.Wdir1.minVal | |
| | 3 | WNAC.Wdir1.maxVal | |
| | 4 | WNAC.Wdir1.sdvVal | |
| I | 1 | WNAC.Wdir2.avgVal | Wind direction sensor 2 |
| | 2 | WNAC.Wdir2.minVal | |
| | 3 | WNAC.Wdir2.maxVal | |
| | 4 | WNAC.Wdir2.sdvVal | |
Table 3. CR (a) and F1-score (b) numerical results for the best features of the quasi-optimal feature selection algorithm. Results are grouped in sub-tables for each WT and each sub-table contains the top 5 results for this WT. The selected features are coded with the variable codes detailed in Table 2.
(a) CR(%)
| WT | CR(%) | 1F | CR(%) | 2F | CR(%) | 3F | CR(%) | 4F | CR(%) | 5F | CR(%) | 6F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WT1 | 91.79 | A1 | 93.67 | A2 E3 | 93.71 | A2 B2 B3 | 93.71 | A1 B1 B2 B3 | 93.73 | A3 A4 B1 B3 B4 | 93.66 | A1 A2 A3 B1 B2 B3 |
| | 91.78 | A3 | 93.66 | A1 E3 | 93.70 | A3 B4 E2 | 93.70 | A1 A3 B4 E3 | 93.69 | A1 A2 A3 B4 E3 | 93.64 | A1 A4 B1 B2 B3 B4 |
| | 91.71 | A2 | 93.65 | A3 B1 | 93.70 | A2 B1 B3 | 93.68 | A1 A2 A3 E3 | 93.68 | A1 A3 A4 B4 E3 | 93.61 | A1 A2 A3 A4 B4 E2 |
| | 81.70 | B3 | 93.64 | A3 E3 | 93.69 | A1 A3 E3 | 93.68 | A3 A4 B2 B3 | 93.67 | A1 A4 B1 B3 B4 | 93.61 | A1 A3 A4 B1 B2 B3 |
| | 81.63 | B2 | 93.62 | A2 B3 | 93.69 | A1 B1 B2 | 93.67 | A2 A4 B2 B3 | 93.65 | A3 B1 B2 B3 B4 | 93.60 | A1 A2 A3 A4 B1 B2 |
| WT2 | 88.01 | B3 | 95.48 | A3 C2 | 96.10 | A2 C2 D1 | 96.43 | B1 C2 D1 G3 | 96.67 | A3 B1 C2 D2 G3 | 96.77 | A2 A3 B3 C2 D2 H1 |
| | 87.87 | B1 | 95.46 | A1 C2 | 96.05 | A3 C2 D1 | 96.42 | A3 C2 D1 G3 | 96.62 | A2 B2 C2 D1 G3 | 96.74 | A1 A3 B1 C2 D2 H1 |
| | 87.85 | B2 | 95.31 | A2 C2 | 95.99 | A3 C2 D2 | 96.38 | A2 C2 D1 H1 | 96.56 | A1 A3 C2 D2 H1 | 96.73 | A1 A2 B3 C2 D1 G3 |
| | 85.83 | C2 | 95.20 | B2 C2 | 95.89 | A1 C2 D1 | 96.38 | B1 C2 D2 G3 | 96.55 | A2 A3 C2 D1 H1 | 96.73 | A2 A3 B1 C2 D2 G3 |
| | 85.60 | E1 | 94.99 | B3 C2 | 95.77 | A2 C2 D2 | 96.38 | A1 C2 D2 G3 | 96.55 | A3 B3 C2 D1 G1 | 96.73 | A1 A2 B1 C2 D1 H1 |
| WT3 | 87.02 | C3 | 91.54 | A2 E3 | 91.74 | A3 B1 E3 | 92.45 | A3 C1 D3 E3 | 92.67 | B3 C1 C3 D2 E3 | 92.89 | B3 C1 C3 D2 E1 E3 |
| | 86.90 | C2 | 91.44 | A1 E3 | 91.73 | B1 C3 E3 | 92.36 | A1 C1 D3 E3 | 92.66 | B3 C1 C3 D2 E1 | 92.85 | B1 C1 C3 D2 E1 E3 |
| | 79.33 | B1 | 91.37 | A3 E3 | 91.67 | A2 B3 E3 | 92.23 | B1 C1 D1 E3 | 92.61 | A3 C1 D2 E1 E3 | 92.82 | A2 C1 C3 D2 E1 E3 |
| | 78.95 | B2 | 91.10 | B2 E3 | 91.65 | A3 A4 E3 | 92.18 | B3 C1 D3 E3 | 92.58 | B1 C1 C3 D2 E3 | 92.80 | A3 C1 C3 D2 E1 E3 |
| | 78.79 | B3 | 91.01 | B1 E3 | 91.62 | B3 C3 E3 | 92.17 | B2 C1 D2 E3 | 92.58 | B2 C1 C3 D2 E1 | 92.78 | B1 B4 C1 C3 D2 E3 |
| WT4 | 93.30 | C2 | 94.44 | C2 D2 | 95.18 | B1 C2 D2 | 95.56 | B1 C2 D2 E2 | 95.56 | B1 B2 C2 D2 H3 | 95.74 | B1 C2 D2 D3 E2 H3 |
| | 92.27 | C3 | 94.32 | D1 E2 | 95.14 | C2 D2 H3 | 95.47 | B1 C2 D2 H3 | 95.54 | B3 C2 D2 E2 H3 | 95.59 | A4 B1 C2 D2 D3 H3 |
| | 91.46 | C1 | 94.32 | D2 E2 | 94.97 | B3 C2 D2 | 95.37 | B1 B4 C2 D2 | 95.42 | B1 B3 C2 D2 D3 | 95.55 | B2 B3 B4 C2 D2 E2 |
| | 91.29 | D2 | 94.22 | C2 D1 | 94.94 | C2 D1 H3 | 95.30 | B1 B3 C2 D2 | 95.42 | B1 B4 C2 D2 E2 | 95.55 | B1 B2 C2 D1 D2 E2 |
| | 90.98 | D3 | 93.74 | B3 C2 | 94.92 | D1 E2 H3 | 95.29 | B2 C2 D2 H3 | 95.40 | B1 C2 D2 D3 H3 | 95.47 | A4 B3 C2 D2 E2 H3 |
| WT5 | 67.37 | A2 | 86.25 | A1 E2 | 90.23 | A3 C3 E2 | 90.70 | A2 C3 E2 E3 | 91.23 | A1 B2 C3 E2 E3 | 91.49 | A1 B3 C1 C3 E3 G1 |
| | 67.28 | A3 | 86.08 | A3 E2 | 90.12 | A2 C3 E2 | 90.64 | A3 C3 E2 E3 | 91.22 | A3 B2 C3 E2 E3 | 91.47 | A2 B3 C1 C3 E3 G1 |
| | 67.21 | A1 | 86.05 | A2 E2 | 90.12 | A1 C3 E2 | 90.63 | A1 C3 E2 E3 | 91.22 | A2 B3 C3 E2 E3 | 91.46 | A2 B1 C1 E2 E3 G1 |
| | 66.31 | B3 | 85.96 | A3 E3 | 90.01 | A2 C2 E3 | 90.62 | A1 B1 C3 E2 | 91.22 | A1 B1 C3 E2 E3 | 91.42 | A3 B3 C1 C3 E3 G1 |
| | 66.27 | B2 | 85.92 | A3 E1 | 89.98 | A2 C3 E3 | 90.59 | A1 B3 C3 E2 | 91.22 | A1 B3 C3 E2 E3 | 91.42 | A2 B2 C1 C3 E2 E3 |
(b) F1-score
| WT | F1-Score | 1F | F1-Score | 2F | F1-Score | 3F | F1-Score | 4F | F1-Score | 5F | F1-Score | 6F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WT1 | 0.9238 | A1 | 0.9397 | A2 E3 | 0.9403 | A2 B2 B3 | 0.9403 | A1 B1 B2 B3 | 0.9404 | A3 A4 B1 B3 B4 | 0.9398 | A1 A2 A3 B1 B2 B3 |
| | 0.9237 | A3 | 0.9396 | A1 E3 | 0.9398 | A3 B4 E2 | 0.9399 | A1 A3 B4 E3 | 0.9399 | A1 A2 A3 B4 E3 | 0.9395 | A1 A4 B1 B2 B3 B4 |
| | 0.9231 | A2 | 0.9397 | A3 B1 | 0.9402 | A2 B1 B3 | 0.9398 | A1 A2 A3 E3 | 0.9397 | A1 A3 A4 B4 E3 | 0.9398 | A1 A2 A3 A4 B4 E2 |
| | 0.8448 | B3 | 0.9394 | A3 E3 | 0.9399 | A1 A3 E3 | 0.9400 | A3 A4 B2 B3 | 0.9398 | A1 A4 B1 B3 B4 | 0.9393 | A1 A3 A4 B1 B2 B3 |
| | 0.8442 | B2 | 0.9395 | A2 B3 | 0.9401 | A1 B1 B2 | 0.9399 | A2 A4 B2 B3 | 0.9397 | A3 B1 B2 B3 B4 | 0.9392 | A1 A2 A3 A4 B1 B2 |
| WT2 | 0.8875 | B3 | 0.9557 | A3 C2 | 0.9616 | A2 C2 D1 | 0.9646 | B1 C2 D1 G3 | 0.9671 | A3 B1 C2 D2 G3 | 0.9680 | A2 A3 B3 C2 D2 H1 |
| | 0.8862 | B1 | 0.9553 | A1 C2 | 0.9612 | A3 C2 D1 | 0.9646 | A3 C2 D1 G3 | 0.9666 | A2 B2 C2 D1 G3 | 0.9677 | A1 A3 B1 C2 D2 H1 |
| | 0.8858 | B2 | 0.9539 | A2 C2 | 0.9606 | A3 C2 D2 | 0.9642 | A2 C2 D1 H1 | 0.9659 | A1 A3 C2 D2 H1 | 0.9677 | A1 A2 B3 C2 D1 G3 |
| | 0.8730 | C2 | 0.9526 | B2 C2 | 0.9596 | A1 C2 D1 | 0.9642 | B1 C2 D2 G3 | 0.9659 | A2 A3 C2 D1 H1 | 0.9677 | A2 A3 B1 C2 D2 G3 |
| | 0.8555 | E1 | 0.9505 | B3 C2 | 0.9584 | A2 C2 D2 | 0.9642 | A1 C2 D2 G3 | 0.9658 | A3 B3 C2 D1 G1 | 0.9676 | A1 A2 B1 C2 D1 H1 |
| WT3 | 0.8825 | C3 | 0.9198 | A2 E3 | 0.9205 | A3 B1 E3 | 0.9264 | A3 C1 D3 E3 | 0.9289 | B3 C1 C3 D2 E3 | 0.9309 | B3 C1 C3 D2 E1 E3 |
| | 0.8827 | C2 | 0.9190 | A1 E3 | 0.9194 | B1 C3 E3 | 0.9255 | A1 C1 D3 E3 | 0.9288 | B3 C1 C3 D2 E1 | 0.9306 | B1 C1 C3 D2 E1 E3 |
| | 0.8229 | B1 | 0.9182 | A3 E3 | 0.9196 | A2 B3 E3 | 0.9244 | B1 C1 D1 E3 | 0.9285 | A3 C1 D2 E1 E3 | 0.9301 | A2 C1 C3 D2 E1 E3 |
| | 0.8197 | B2 | 0.9158 | B2 E3 | 0.9207 | A3 A4 E3 | 0.9239 | B3 C1 D3 E3 | 0.9281 | B1 C1 C3 D2 E3 | 0.9299 | A3 C1 C3 D2 E1 E3 |
| | 0.8189 | B3 | 0.9150 | B1 E3 | 0.9185 | B3 C3 E3 | 0.9242 | B2 C1 D2 E3 | 0.9279 | B2 C1 C3 D2 E1 | 0.9300 | B1 B4 C1 C3 D2 E3 |
| WT4 | 0.9369 | C2 | 0.9453 | C2 D2 | 0.9521 | B1 C2 D2 | 0.9559 | B1 C2 D2 E2 | 0.9562 | B1 B2 C2 D2 H3 | 0.9578 | B1 C2 D2 D3 E2 H3 |
| | 0.9261 | C3 | 0.9442 | D1 E2 | 0.9518 | C2 D2 H3 | 0.9551 | B1 C2 D2 H3 | 0.9556 | B3 C2 D2 E2 H3 | 0.9562 | A4 B1 C2 D2 D3 H3 |
| | 0.9179 | C1 | 0.9441 | D2 E2 | 0.9499 | B3 C2 D2 | 0.9541 | B1 B4 C2 D2 | 0.9544 | B1 B3 C2 D2 D3 | 0.9560 | B2 B3 B4 C2 D2 E2 |
| | 0.9157 | D2 | 0.9431 | C2 D1 | 0.9499 | C2 D1 H3 | 0.9533 | B1 B3 C2 D2 | 0.9546 | B1 B4 C2 D2 E2 | 0.9557 | B1 B2 C2 D1 D2 E2 |
| | 0.9124 | D3 | 0.9383 | B3 C2 | 0.9500 | D1 E2 H3 | 0.9534 | B2 C2 D2 H3 | 0.9543 | B1 C2 D2 D3 H3 | 0.9551 | A4 B3 C2 D2 E2 H3 |
| WT5 | 0.7532 | A2 | 0.8767 | A1 E2 | 0.9072 | A3 C3 E2 | 0.9115 | A2 C3 E2 E3 | 0.9159 | A1 B2 C3 E2 E3 | 0.9165 | A1 B3 C1 C3 E3 G1 |
| | 0.7526 | A3 | 0.8755 | A3 E2 | 0.9062 | A2 C3 E2 | 0.9109 | A3 C3 E2 E3 | 0.9160 | A3 B2 C3 E2 E3 | 0.9163 | A2 B3 C1 C3 E3 G1 |
| | 0.7522 | A1 | 0.8752 | A2 E2 | 0.9063 | A1 C3 E2 | 0.9108 | A1 C3 E2 E3 | 0.9158 | A2 B3 C3 E2 E3 | 0.9163 | A2 B1 C1 E2 E3 G1 |
| | 0.7472 | B3 | 0.8742 | A3 E3 | 0.9053 | A2 C2 E3 | 0.9104 | A1 B1 C3 E2 | 0.9159 | A1 B1 C3 E2 E3 | 0.9159 | A3 B3 C1 C3 E3 G1 |
| | 0.7469 | B2 | 0.8680 | A3 E1 | 0.9050 | A2 C3 E3 | 0.9100 | A1 B3 C3 E2 | 0.9159 | A1 B3 C3 E2 E3 | 0.9177 | A2 B2 C1 C3 E2 E3 |
Table 4. CR (a) and F1-score (b) numerical results for best features for the automatic feature selection algorithms analyzed and each WT. Results are grouped in sub-tables for each algorithm, and each row of each sub-table corresponds to wind turbines (WT1 to WT5). The selected features are coded with the variable codes detailed in Table 2.
(a) CR(%)
| Algorithm | WT | CR(%) | 1F | CR(%) | 2F | CR(%) | 3F | CR(%) | 4F | CR(%) | 5F | CR(%) | 6F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CMI | WT1 | 64.73 | E1 | 66.93 | E1 E4 | 83.19 | E1 E4 F1 | 85.89 | E1 E4 F1 H1 | 88.52 | A1 E1 E4 F1 H1 | 89.9 | A1 C4 E1 E4 F1 H1 |
| | WT2 | 53.58 | E4 | 91.76 | C2 E4 | 92.72 | C2 E4 H1 | 94.68 | A2 C2 E4 H1 | 95.51 | A2 C2 D3 E4 H1 | 95.26 | A2 C2 D3 E2 E4 H1 |
| | WT3 | 66.03 | D3 | 82.92 | B1 D3 | 86.97 | B1 C2 D3 | 89.24 | B1 C2 D3 G3 | 90.31 | B1 C2 D3 E3 G3 | 89.90 | B1 C2 D3 E3 F4 G3 |
| | WT4 | 91.62 | D2 | 90.45 | D2 F3 | 93.27 | D2 E2 F3 | 93.15 | D2 E2 E3 F3 | 92.95 | A1 D2 E2 E3 F3 | 92.50 | A1 D2 E2 E3 F3 H4 |
| | WT5 | 53.24 | E2 | 70.03 | C3 E2 | 85.71 | C3 E2 H3 | 84.16 | C3 E2 F4 H3 | 85.03 | C3 E2 F4 H1 H3 | 86.72 | A1 C3 E2 F4 H1 H3 |
| CMIM | WT1 | 64.68 | E1 | 66.74 | E1 E4 | 67.66 | E1 E2 E4 | 83.59 | C1 E1 E2 E4 | 84.94 | C1 C2 E1 E2 E4 | 85.46 | C1 C2 E1 E2 E3 E4 |
| | WT2 | 53.68 | E4 | 89.29 | D1 E4 | 93.73 | A1 D1 E4 | 94.64 | A1 D1 E2 E4 | 94.78 | A1 D1 E2 E3 E4 | 95.14 | A1 D1 E1 E2 E3 E4 |
| | WT3 | 66.02 | D3 | 84.37 | C3 D3 | 88.71 | B1 C3 D3 | 85.15 | B1 C3 D3 H3 | 86.13 | B1 C3 D3 F1 H3 | 86.25 | A1 B1 C3 D3 F1 H3 |
| | WT4 | 91.60 | D2 | 92.63 | D2 E3 | 93.55 | D2 E2 E3 | 92.91 | A1 D2 E2 E3 | 93.21 | A1 D2 E2 E3 F4 | 93.03 | A1 D2 E2 E3 F3 F4 |
| | WT5 | 53.24 | E2 | 56.31 | E2 E3 | 71.53 | E2 E3 F4 | 72.64 | E1 E2 E3 F4 | 72.62 | E1 E2 E3 E4 F4 | 81.87 | C1 E1 E2 E3 E4 F4 |
| DISR | WT1 | 64.84 | E1 | 66.9 | E1 E4 | 66.98 | B4 E1 E4 | 79.69 | B4 C4 E1 E4 | 80.83 | B4 C4 E1 E2 E4 | 80.72 | A4 B4 C4 E1 E2 E4 |
| | WT2 | 53.62 | E4 | 53.05 | A4 E4 | 62.10 | A4 C4 E4 | 92.83 | A4 C2 C4 E4 | 94.40 | A1 A4 C2 C4 E4 | 94.46 | A1 A4 C1 C2 C4 E4 |
| | WT3 | 65.84 | D3 | 65.91 | A4 D3 | 84.76 | A4 C3 D3 | 84.57 | A4 C3 D1 D3 | 86.08 | A4 C1 C3 D1 D3 | 86.43 | A4 C1 C3 D1 D2 D3 |
| | WT4 | 91.52 | D2 | 91.19 | A4 D2 | 91.25 | A4 D1 D2 | 92.07 | A4 D1 D2 D3 | 91.96 | A4 B4 D1 D2 D3 | 93.05 | A4 B4 D1 D2 D3 E3 |
| | WT5 | 53.19 | E2 | 70.07 | C3 E2 | 69.99 | C3 E1 E2 | 70.51 | C3 E1 E2 E3 | 70.80 | C2 C3 E1 E2 E3 | 70.89 | C2 C3 C4 E1 E2 E3 |
| ICAP | WT1 | 64.64 | E1 | 66.84 | E1 E4 | 82.66 | C1 E1 E4 | 83.48 | C1 E1 E3 E4 | 86.50 | C1 E1 E3 E4 G1 | 89.53 | A1 C1 E1 E3 E4 G1 |
| | WT2 | 53.65 | E4 | 89.30 | D1 E4 | 93.45 | A1 D1 E4 | 94.84 | A1 D1 E2 E4 | 95.02 | A1 D1 E1 E2 E4 | 95.08 | A1 D1 E1 E2 E3 E4 |
| | WT3 | 66.28 | D3 | 84.43 | C3 D3 | 88.25 | B1 C3 D3 | 85.13 | B1 C3 D3 H3 | 86.34 | B1 C3 D3 F1 H3 | 86.55 | A1 B1 C3 D3 F1 H3 |
| | WT4 | 92.08 | D2 | 92.80 | D2 E3 | 92.71 | A1 D2 E3 | 92.31 | A1 D2 E3 F4 | 91.65 | A1 D2 E3 F3 F4 | 92.54 | A1 D2 E3 F3 F4 H1 |
| | WT5 | 53.23 | E2 | 56.35 | E2 E3 | 71.69 | E2 E3 F4 | 73.97 | C4 E2 E3 F4 | 82.60 | C1 C4 E2 E3 F4 | 79.92 | C1 C4 E2 E3 F2 F4 |
| JMI | WT1 | 64.67 | E1 | 66.82 | E1 E4 | 67.75 | E1 E2 E4 | 68.35 | E1 E2 E3 E4 | 81.13 | C4 E1 E2 E3 E4 | 85.78 | C2 C4 E1 E2 E3 E4 |
| | WT2 | 53.30 | E4 | 91.96 | C2 E4 | 94.45 | A1 C2 E4 | 95.17 | A1 C2 D1 E4 | 95.07 | A1 A2 C2 D1 E4 | 94.99 | A1 A2 C2 D1 E2 E4 |
| | WT3 | 66.26 | D3 | 82.39 | B1 D3 | 88.40 | B1 C3 D3 | 89.12 | B1 C3 D2 D3 | 88.44 | B1 C3 D1 D2 D3 | 89.94 | B1 C1 C3 D1 D2 D3 |
| | WT4 | 91.43 | D2 | 91.30 | D2 F3 | 92.02 | D2 D3 F3 | 92.73 | D2 D3 E3 F3 | 92.84 | D1 D2 D3 E3 F3 | 93.49 | D1 D2 D3 E2 E3 F3 |
| | WT5 | 53.28 | E2 | 69.95 | C3 E2 | 69.96 | C3 E1 E2 | 81.29 | C3 E1 E2 F4 | 82.09 | C3 E1 E2 E3 F4 | 82.68 | C2 C3 E1 E2 E3 F4 |
| MIFS | WT1 | 64.68 | E1 | 64.76 | B4 E1 | 65.05 | A4 B4 E1 | 71.76 | A4 B4 D4 E1 | 72.57 | A4 B4 D4 E1 G4 | 82.56 | A4 B4 C4 D4 E1 G4 |
| | WT2 | 53.62 | E4 | 53.54 | A4 E4 | 53.11 | A4 B4 E4 | 69.82 | A4 B4 E4 G4 | 72.06 | A4 B4 E4 F4 G4 | 86.47 | A4 B4 D4 E4 F4 G4 |
| | WT3 | 66.27 | D3 | 66.10 | B4 D3 | 66.43 | A4 B4 D3 | 72.47 | A4 B4 C4 D3 | 74.91 | A4 B4 C4 D3 G4 | 81.55 | A4 B4 C4 D3 G1 G4 |
| | WT4 | 91.71 | D2 | 91.48 | A4 D2 | 91.77 | A4 B4 D2 | 91 | A4 B4 D2 G4 | 91.81 | A4 B4 C4 D2 G4 | 92.56 | A4 B4 C4 D2 G3 G4 |
| | WT5 | 53.23 | E2 | 53.42 | A4 E2 | 54.09 | A4 B4 E2 | 66.56 | A4 B4 E2 G4 | 76.26 | A4 B4 E2 G4 H2 | 80.36 | A4 B4 C4 E2 G4 H2 |
| mRMR | WT1 | 64.83 | E1 | 64.94 | B4 E1 | 64.74 | A4 B4 E1 | 78.19 | A4 B4 C4 E1 | 81.16 | A4 B4 C4 D4 E1 | 83.89 | A4 B4 C4 D4 E1 H1 |
| | WT2 | 53.44 | E4 | 53.26 | A4 E4 | 69.94 | A4 E4 G4 | 70.05 | A4 B4 E4 G4 | 71.76 | A4 B4 E4 F4 G4 | 86.77 | A4 B4 D4 E4 F4 G4 |
| | WT3 | 65.81 | D3 | 66.14 | B4 D3 | 66.14 | A4 B4 D3 | 72.29 | A4 B4 C4 D3 | 74.63 | A4 B4 C4 D3 G4 | 81.33 | A4 B4 C4 D3 G1 G4 |
| | WT4 | 91.45 | D2 | 91.32 | A4 D2 | 91.88 | A4 B4 D2 | 90.68 | A4 B4 D2 G4 | 91.37 | A4 B4 C4 D2 G4 | 93.12 | A4 B4 C4 D2 G3 G4 |
| | WT5 | 53.24 | E2 | 53.44 | A4 E2 | 54.16 | A4 B4 E2 | 66.37 | A4 B4 E2 G4 | 76.29 | A4 B4 E2 G4 H2 | 80.40 | A4 B4 C4 E2 G4 H2 |

(b) F1-score

| Method | F1-Score | 1F | F1-Score | 2F | F1-Score | 3F | F1-Score | 4F | F1-Score | 5F | F1-Score | 6F |
|--------|----------|----|----------|----|----------|----|----------|----|----------|----|----------|----|
| CMI  | 0.7015 | E1 | 0.7198 | E1 E4 | 0.8326 | E1 E4 F1 | 0.8608 | E1 E4 F1 H1 | 0.8874 | A1 E1 E4 F1 H1 | 0.9010 | A1 C4 E1 E4 F1 H1 |
|      | 0.6630 | E4 | 0.9195 | C2 E4 | 0.9279 | C2 E4 H1 | 0.9477 | A2 C2 E4 H1 | 0.9555 | A2 C2 D3 E4 H1 | 0.9528 | A2 C2 D3 E2 E4 H1 |
|      | 0.7341 | D3 | 0.8359 | B1 D3 | 0.8731 | B1 C2 D3 | 0.8966 | B1 C2 D3 G3 | 0.9072 | B1 C2 D3 E3 G3 | 0.9040 | B1 C2 D3 E3 F4 G3 |
|      | 0.9178 | D2 | 0.9051 | D2 F3 | 0.9333 | D2 E2 F3 | 0.9322 | D2 E2 E3 F3 | 0.9298 | A1 D2 E2 E3 F3 | 0.9252 | A1 D2 E2 E3 F3 H4 |
|      | 0.6812 | E2 | 0.7618 | C3 E2 | 0.8597 | C3 E2 H3 | 0.8436 | C3 E2 F4 H3 | 0.8522 | C3 E2 F4 H1 H3 | 0.8695 | A1 C3 E2 F4 H1 H3 |
| CMIM | 0.7015 | E1 | 0.7185 | E1 E4 | 0.7262 | E1 E2 E4 | 0.8403 | C1 E1 E2 E4 | 0.8537 | C1 C2 E1 E2 E4 | 0.8592 | C1 C2 E1 E2 E3 E4 |
|      | 0.6633 | E4 | 0.8953 | D1 E4 | 0.9385 | A1 D1 E4 | 0.9472 | A1 D1 E2 E4 | 0.9484 | A1 D1 E2 E3 E4 | 0.9520 | A1 D1 E1 E2 E3 E4 |
|      | 0.7338 | D3 | 0.8480 | C3 D3 | 0.8901 | B1 C3 D3 | 0.8567 | B1 C3 D3 H3 | 0.8637 | B1 C3 D3 F1 H3 | 0.8683 | A1 B1 C3 D3 F1 H3 |
|      | 0.9188 | D2 | 0.9273 | D2 E3 | 0.9363 | D2 E2 E3 | 0.9295 | A1 D2 E2 E3 | 0.9325 | A1 D2 E2 E3 F4 | 0.9314 | A1 D2 E2 E3 F3 F4 |
|      | 0.6812 | E2 | 0.6933 | E2 E3 | 0.7382 | E2 E3 F4 | 0.7489 | E1 E2 E3 F4 | 0.7490 | E1 E2 E3 E4 F4 | 0.8302 | C1 E1 E2 E3 E4 F4 |
| DISR | 0.7022 | E1 | 0.7194 | E1 E4 | 0.7194 | B4 E1 E4 | 0.8088 | B4 C4 E1 E4 | 0.8201 | B4 C4 E1 E2 E4 | 0.8188 | A4 B4 C4 E1 E2 E4 |
|      | 0.6638 | E4 | 0.6584 | A4 E4 | 0.7063 | A4 C4 E4 | 0.9302 | A4 C2 C4 E4 | 0.9449 | A1 A4 C2 C4 E4 | 0.9455 | A1 A4 C1 C2 C4 E4 |
|      | 0.7330 | D3 | 0.7319 | A4 D3 | 0.8515 | A4 C3 D3 | 0.8484 | A4 C3 D1 D3 | 0.8637 | A4 C1 C3 D1 D3 | 0.8681 | A4 C1 C3 D1 D2 D3 |
|      | 0.9179 | D2 | 0.9146 | A4 D2 | 0.9140 | A4 D1 D2 | 0.9223 | A4 D1 D2 D3 | 0.9210 | A4 B4 D1 D2 D3 | 0.9313 | A4 B4 D1 D2 D3 E3 |
|      | 0.6810 | E2 | 0.7620 | C3 E2 | 0.7572 | C3 E1 E2 | 0.7612 | C3 E1 E2 E3 | 0.7623 | C2 C3 E1 E2 E3 | 0.7629 | C2 C3 C4 E1 E2 E3 |
| ICAP | 0.7009 | E1 | 0.7188 | E1 E4 | 0.8310 | C1 E1 E4 | 0.8389 | C1 E1 E3 E4 | 0.8666 | C1 E1 E3 E4 G1 | 0.8970 | A1 C1 E1 E3 E4 G1 |
|      | 0.6627 | E4 | 0.8949 | D1 E4 | 0.9358 | A1 D1 E4 | 0.9490 | A1 D1 E2 E4 | 0.9508 | A1 D1 E1 E2 E4 | 0.9514 | A1 D1 E1 E2 E3 E4 |
|      | 0.7358 | D3 | 0.8480 | C3 D3 | 0.8858 | B1 C3 D3 | 0.8562 | B1 C3 D3 H3 | 0.8696 | B1 C3 D3 F1 H3 | 0.8715 | A1 B1 C3 D3 F1 H3 |
|      | 0.9226 | D2 | 0.9291 | D2 E3 | 0.9275 | A1 D2 E3 | 0.9234 | A1 D2 E3 F4 | 0.9174 | A1 D2 E3 F3 F4 | 0.9252 | A1 D2 E3 F3 F4 H1 |
|      | 0.6811 | E2 | 0.6935 | E2 E3 | 0.7394 | E2 E3 F4 | 0.7617 | C4 E2 E3 F4 | 0.8371 | C1 C4 E2 E3 F4 | 0.8129 | C1 C4 E2 E3 F2 F4 |
| JMI  | 0.7006 | E1 | 0.7186 | E1 E4 | 0.7272 | E1 E2 E4 | 0.7324 | E1 E2 E3 E4 | 0.8229 | C4 E1 E2 E3 E4 | 0.8623 | C2 C4 E1 E2 E3 E4 |
|      | 0.6613 | E4 | 0.9212 | C2 E4 | 0.9454 | A1 C2 E4 | 0.9523 | A1 C2 D1 E4 | 0.9514 | A1 A2 C2 D1 E4 | 0.9505 | A1 A2 C2 D1 E2 E4 |
|      | 0.7350 | D3 | 0.8307 | B1 D3 | 0.8872 | B1 C3 D3 | 0.8950 | B1 C3 D2 D3 | 0.8883 | B1 C3 D1 D2 D3 | 0.9024 | B1 C1 C3 D1 D2 D3 |
|      | 0.9167 | D2 | 0.9133 | D2 F3 | 0.9204 | D2 D3 F3 | 0.9276 | D2 D3 E3 F3 | 0.9285 | D1 D2 D3 E3 F3 | 0.9354 | D1 D2 D3 E2 E3 F3 |
|      | 0.6814 | E2 | 0.7613 | C3 E2 | 0.7568 | C3 E1 E2 | 0.8242 | C3 E1 E2 F4 | 0.8319 | C3 E1 E2 E3 F4 | 0.8377 | C2 C3 E1 E2 E3 F4 |
| MIFS | 0.7013 | E1 | 0.7014 | B4 E1 | 0.7035 | A4 B4 E1 | 0.7238 | A4 B4 D4 E1 | 0.7259 | A4 B4 D4 E1 G4 | 0.8275 | A4 B4 C4 D4 E1 G4 |
|      | 0.6635 | E4 | 0.6610 | A4 E4 | 0.6579 | A4 B4 E4 | 0.6979 | A4 B4 E4 G4 | 0.7203 | A4 B4 E4 F4 G4 | 0.8659 | A4 B4 D4 E4 F4 G4 |
|      | 0.7351 | D3 | 0.7329 | B4 D3 | 0.7335 | A4 B4 D3 | 0.7326 | A4 B4 C4 D3 | 0.7528 | A4 B4 C4 D3 G4 | 0.8213 | A4 B4 C4 D3 G1 G4 |
|      | 0.9195 | D2 | 0.9172 | A4 D2 | 0.9199 | A4 B4 D2 | 0.9103 | A4 B4 D2 G4 | 0.9180 | A4 B4 C4 D2 G4 | 0.9258 | A4 B4 C4 D2 G3 G4 |
|      | 0.6812 | E2 | 0.6815 | A4 E2 | 0.6841 | A4 B4 E2 | 0.6888 | A4 B4 E2 G4 | 0.7639 | A4 B4 E2 G4 H2 | 0.8058 | A4 B4 C4 E2 G4 H2 |
| mRMR | 0.7025 | E1 | 0.7027 | B4 E1 | 0.7004 | A4 B4 E1 | 0.7948 | A4 B4 C4 E1 | 0.8169 | A4 B4 C4 D4 E1 | 0.8414 | A4 B4 C4 D4 E1 H1 |
|      | 0.6618 | E4 | 0.6594 | A4 E4 | 0.6994 | A4 E4 G4 | 0.6988 | A4 B4 E4 G4 | 0.7182 | A4 B4 E4 F4 G4 | 0.8689 | A4 B4 D4 E4 F4 G4 |
|      | 0.7322 | D3 | 0.7331 | B4 D3 | 0.7315 | A4 B4 D3 | 0.7301 | A4 B4 C4 D3 | 0.7507 | A4 B4 C4 D3 G4 | 0.8199 | A4 B4 C4 D3 G1 G4 |
|      | 0.9168 | D2 | 0.9159 | A4 D2 | 0.9212 | A4 B4 D2 | 0.9071 | A4 B4 D2 G4 | 0.9137 | A4 B4 C4 D2 G4 | 0.9316 | A4 B4 C4 D2 G3 G4 |
|      | 0.6812 | E2 | 0.6816 | A4 E2 | 0.6843 | A4 B4 E2 | 0.6869 | A4 B4 E2 G4 | 0.7646 | A4 B4 E2 G4 H2 | 0.8063 | A4 B4 C4 E2 G4 H2 |
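
As a reading aid for panel (b): the F1-score is the harmonic mean of precision and recall. The sketch below shows that computation for a binary problem. It is illustrative only: the function name `f1_score`, the convention that label 1 marks the failure class, and the toy labels are assumptions, not the authors' evaluation code.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1-score; `positive` is the label treated as the failure class (assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # Precision or recall is zero (or undefined); report 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
```

Unlike the plain classification rate, this measure ignores true negatives, so it is less flattering when the non-failure class dominates the data.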

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Energies, EISSN 1996-1073. Published by MDPI AG, Basel, Switzerland.