A Data-Mining Approach for Wind Turbine Fault Detection Based on SCADA Data Analysis Using Artificial Neural Networks

Abstract: Wind energy has shown significant growth in terms of installed power in the last decade. However, one of the most critical problems for a wind farm is represented by Operation and Maintenance (O&M) costs, which can represent 20–30% of the total costs related to power generation. Various monitoring methodologies targeted to the identification of faults, such as vibration analysis or analysis of oils, are often used. However, they have the main disadvantage of involving additional costs as they usually entail the installation of other sensors to provide real-time control of the system. In this paper, we propose a methodology based on machine learning techniques using data from SCADA systems (Supervisory Control and Data Acquisition). Since these systems are generally already implemented on most wind turbines, they provide a large amount of data without requiring extra sensors. In particular, we developed models using Artificial Neural Networks (ANN) to characterize the behavior of some of the main components of the wind turbine, such as gearbox and generator, and predict operating anomalies. The proposed method is tested on real wind turbines in Italy to verify its effectiveness and applicability, and it was demonstrated to be able to provide significant help for the maintenance of a wind farm.


Introduction
The increasingly evident climate change and the need to increase the amount of energy produced from renewable sources, dictated by national and international strategic objectives [1], have led to growing interest in the development of technology that allows the utilization of wind as an energy resource. For instance, in the last decade in Europe, wind has been the source characterized by the most significant growth in terms of installed power [2] and, although this number is already very high today, it still seems destined to increase. At the same time, the development of wind technology has been characterized by a continuous growth in the size of turbines, up to over 10 MW. Such large investments require ever-higher levels of reliability and availability.
Furthermore, the search for profitable wind conditions leads to new investments in remote locations that are usually difficult to reach, such as offshore and high-altitude sites. In these conditions, intervention times are long; thus, the occurrence of unexpected critical failures can generate, in addition to the standard cost of intervention, very high costs related to non-availability. In fact, one of the biggest problems for a wind farm is represented by O&M (Operation and Maintenance) costs, which can reach 20-30% of the total costs related to power generation [3].
The wind turbine is a complex system made up of numerous components and subcomponents, each subject to different failure types that are often difficult to locate and may affect the health of other components.
In recent years, scientific research has paid great attention to the study of specific components such as the gearbox and generator, characterized by high replacement and repair costs and longer downtime in case of failure [4].
Long downtime for the wind turbine, in a scenario where the demand for availability is increasingly high, can lead to unacceptable production losses. To prevent such situations, it is important to research methodologies that can identify malfunctions and failures in their initial state, so that the loss of productivity can be minimized.
This paper aims to develop a monitoring system based on the use of data from SCADA systems ("Supervisory Control and Data Acquisition"). In condition monitoring and fault detection for wind turbines, the use and analysis of SCADA data have recently been one of the most investigated methods. These systems are nowadays implemented on all wind turbines and make available a large amount of data, sampled with a relatively high frequency (typically 1 Hz) and recorded with their average values every 10 min [5].
We proceed by identifying, among these data, extended periods where the turbine has been free from failures to develop a model that is representative of its "healthy state". This model will be taken as a reference during normal operating conditions and used to highlight the presence of abnormal behavior.
In this work, we address all the phases leading to the development of a methodology aimed at identifying failures: data pre-processing, model development and data postprocessing. Several levels of wind turbine models are proposed: the turbine as a whole, and components such as the generator and the gearbox, which are among the most critical.
The main tools used are ANN (Artificial Neural Networks) to develop models to describe the natural behavior of the system and its components and control charts to support the prompt identification of malfunctions.
The developed methodology is tested on a real case study represented by a wind farm currently operating in an Italian location.
To conclude the introduction of the paper, the diagram in Figure 1 describes the methodological steps followed in the proposed research.

Background
At present, the maintenance policy typically adopted in wind farms is preventive, either scheduled or condition-based according to the development of simple static threshold alerts, with the aim of identifying possible faults in a timely manner to avoid further problems.

Typical scheduled maintenance interventions are, for example, the purging of the generator bearings, the cleaning of the slip ring, the cleaning of the lubrication system filters and inspections of the gearbox with a video endoscope. This type of approach does not, however, guarantee the avoidance of critical failures. Besides, a failure of a component, albeit not yet critical, could cause the turbine to work in non-optimal conditions, causing significant efficiency losses that could not be detected by the normal performance monitoring systems.
In recent years, various established monitoring methodologies targeted to the identification of faults have been transported to the wind turbine sector: vibration analysis [6][7][8][9][10][11][12][13], the analysis of acoustic emissions [14,15], MCSA (Machine Current Signature Analysis) [16][17][18][19][20], the analysis of oils [21,22] have been applied with interesting results. These techniques have the main disadvantage of requiring additional costs as they entail the installation of additional sensors if real-time control of the system is required [23,24]. Moreover, these methods have effectively been proven on high-speed rotation machines, but their sensitivity, validity and feasibility still need to be further verified on wind turbines in which some components are characterized by slow variation speeds and large dynamic loadings [25].
Also, techniques based on image detection, such as thermal image analysis [26], or microscope analysis [27] can find application in wind turbine fault detection [25]. Despite their effectiveness, the images of the failure modes need to be captured, stored and analyzed, and this requires an extra set up as well as advanced data analysis techniques [28].
Another type of data-driven approach, on which the methodology proposed in this article is based, utilizes the data from the SCADA systems. Most MW-scale wind turbines are already equipped with SCADA systems; therefore, one of the main advantages of these methods compared to those previously mentioned is that they do not require extra sensors. They show significant cost-effectiveness and are considered to be one of the most efficient solutions for wind turbine condition monitoring [29].
Typical condition-based maintenance through SCADA control systems, which are normally used as a support by the maintenance function and are capable of generating alert signals (for example, when the monitored parameters exceed static threshold values), is not always effective because it often does not leave enough time for intervention to prevent critical scenarios.
To overcome these limitations, it is therefore necessary to move towards condition-based maintenance developed through more complex models, able to assess the interaction between the operating variables and the boundary conditions and to identify anomalous operating conditions in a predictive manner, before significant performance losses are generated.
Physical models, regression-based models, artificial neural networks and other machine learning techniques are widely used [30].
Unlike physical models, which use physical and thermodynamic relations to derive exactly determined output variables, data-driven models use historical data to identify the relationships between the defined input and output variables. From this point of view, therefore, the approaches based on physical models require a thorough knowledge of the specific structure of the system and its behavior in different operating conditions, which is often very difficult to obtain [31,32], and are therefore not easily feasible.
In contrast, data-driven models have the advantage of achieving good modeling accuracy without the need for extensive interaction with the end-user of the instrument [31].
The success of such an approach aimed at identifying failures is determined by the accuracy of the model developed. Several tools, such as K-nearest Neighbors [29,33], clustering algorithms [34], Support Vector Machines [35][36][37][38], both static and dynamic neural networks [39][40][41][42][43][44][45][46][47][48][49], and even deep learning approaches [50,51], have been proven very effective in modeling the relations between the parameters of a wind turbine. In the methodology proposed in this paper, the principal tools used to model the turbine behavior are ANNs, as these have been shown to be very promising in numerous applications, especially in those where different methods are compared [52,53].
To obtain more information about the different applications and techniques used in condition-monitoring and fault detection for wind turbines refer to Appendix A, where a more in-depth analysis of the publications investigated is reported.
Additionally, one of the objectives of this work is also to contribute to the scientific literature by addressing the current absence of methodological support that explains in detail the configuration of the tools and models to be developed as these phases significantly determine the system's ability to detect operating anomalies. These phases are described in the next section.

Methodology
This paper aims to propose a comprehensive methodology to design and apply a clear and effective approach based on the use of ANN and SPC (Statistical Process Control) for the fault detection of wind turbines.
The methodology defines all the steps to follow to create and deploy a fault detection control system, integrating tools from different fields (i.e., supervised and unsupervised machine learning techniques to develop data-driven models and techniques and multiple control charts from statistical process control), and which can be reliably applied to different scenarios.
While only a few of the scientific contributions highlighted in Appendix A present a general approach and only a few researchers have tried to improve their analysis with the support of statistical control charts, as has been done in this work, none of them have integrated all these steps and tested the final applicability of the resulting methodology on a real case study application.
The main steps of the approach here presented are the following:

1. Data acquisition and data pre-processing: the data are acquired, cleaned and prepared to be suitable for subsequent processing;
2. Model processing: the different models of the turbine and its components are developed and configured;
3. Post-processing: the deviations are evaluated using the control chart.
Large databases, generally regarding several years of operation of the wind turbine, and information about the maintenance interventions carried out are required. We then proceed by identifying among these data extended periods where the turbine has been free from failures to develop a model that is representative of its state of health (model training phase). These models will be the reference in the testing phase, where, with a new dataset, this time representative of general operating conditions of the turbine (therefore not excluding any possible failure), the system's ability to identify the presence of anomalies will be validated. Only after this last phase is the model deemed ready to be used on real-time data. Figure 2 represents, respectively, the process to elaborate the model and the application of the model in the control phase.

Data Acquisition and Data Pre-Processing
In order to build representative models for the system, it is necessary to have the historical monitoring data of an adequate timeframe for the customization of the models. The width of the time interval will be influenced by the frequency of available data and the typical behavior of the system (i.e., the data used should be representative of all possible conditions of the system examined).
It is also important to have accurate and detailed information on maintenance interventions (both preventive and corrective) performed on the system for the same period.
Finally, any alarm signals generated automatically by the measurement system are considered useful support to model processing. During this activity, an assessment of the quality and quantity of information available is also made in order to highlight any need for additional information before the start of the model processing phase.
Figure 2. (a) Elaboration of the model: the process starts from the acquisition of the historical Supervisory Control and Data Acquisition (SCADA) data from the wind turbines with the aim to obtain a trained Artificial Neural Network (ANN) model able to represent the turbine in its "healthy state". (b) Use of the trained model in the control phase: receiving as inputs the new measurements, the model elaborates the predicted values of the output variable that is then compared with the actual values measured by the measurement system at the same time. The deviation between the two values (actual and estimated by the model) is evaluated statistically to assess the health state of the turbine.

Data Cleaning
During the training of the models, when the relationship that binds inputs and outputs is identified, the presence of outliers is particularly dangerous. Indeed, in the definition of the "healthy behavior" of the system examined, the presence of an anomaly in the data can strongly affect the accuracy of the resulting models, and therefore operations aimed at cleaning the dataset are necessary.
The data cleaning phase can consist of several steps. Once the potentially relevant variables for the model processing phase have been identified, the following steps should be performed:

• Removal of samples in which at least one input or output signal is missing;
• Removal of samples in which the wind turbine output power is zero;
• Removal of samples where one or more variables are out of the range of normal variation (it is also essential to identify the cause of such an occurrence).
In addition to sensor errors, a good part of the anomalous behaviors can be due to the artificial power reductions to which the wind farms can be subjected. The power limitations may be due to maintenance requirements, but mostly they are due to constraints imposed by the national power grid to overcome dispatching problems. These behaviors are not considered normal, and samples affected by these restrictions have to be removed (information about the power limitation is generally present in SCADA data).
Although a simple preliminary filter is often sufficient to remove most of the outliers, it is suggested to consider using more specific techniques in case the preliminary cleaning phase fails to exclude the presence of all outliers. Thus, the data set for training can be further cleaned up to ensure better accuracy of the model.
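The preliminary filter described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the record layout, the signal names (`wind_ms`, `power_kw`, `gear_temp_c`), the `power_limited` flag and the range bounds are all hypothetical and would have to be mapped onto the actual SCADA export.

```python
def clean_samples(samples, required, ranges,
                  power_key="power_kw", limited_key="power_limited"):
    """Apply the preliminary cleaning rules to a list of SCADA samples.

    samples     -- list of dicts, one per 10-minute record (assumed layout)
    required    -- names of the input/output signals that must be present
    ranges      -- {signal: (low, high)} bounds of normal variation (assumed known)
    limited_key -- hypothetical flag marking samples under power limitation
    """
    kept = []
    for s in samples:
        # 1. Drop samples in which a required input or output signal is missing.
        if any(s.get(name) is None for name in required):
            continue
        # 2. Drop samples in which the turbine output power is zero.
        if s.get(power_key) == 0:
            continue
        # 3. Drop samples with a variable outside its range of normal variation.
        if any(s.get(name) is not None and not (lo <= s[name] <= hi)
               for name, (lo, hi) in ranges.items()):
            continue
        # 4. Drop samples recorded under an artificial power limitation.
        if s.get(limited_key):
            continue
        kept.append(s)
    return kept


samples = [
    {"wind_ms": 8.0, "power_kw": 900.0, "gear_temp_c": 55.0, "power_limited": False},
    {"wind_ms": 8.5, "power_kw": 0.0, "gear_temp_c": 40.0, "power_limited": False},
    {"wind_ms": None, "power_kw": 700.0, "gear_temp_c": 52.0, "power_limited": False},
    {"wind_ms": 9.0, "power_kw": 950.0, "gear_temp_c": 140.0, "power_limited": False},
    {"wind_ms": 12.0, "power_kw": 1000.0, "gear_temp_c": 58.0, "power_limited": True},
]
clean = clean_samples(
    samples,
    required=["wind_ms", "power_kw", "gear_temp_c"],
    ranges={"gear_temp_c": (0.0, 100.0)},
)
```

In this toy input only the first record survives; the others are rejected for zero power, a missing signal, an out-of-range temperature and a power limitation, respectively.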

Clustering and Mahalanobis Distance
For this purpose, we propose a clustering method that, albeit with some variations, has been applied with excellent results to similar problems [34,45]. The method is based on the removal of outliers using the evaluation of the Mahalanobis distance. The Mahalanobis distance is defined as the distance of a point from a distribution, measured in terms of standard deviations from the mean; it takes into account the correlation in the data since it is calculated using the inverse of the variance-covariance matrix of the data set of interest [54].
To improve the identification of outliers, it is useful to divide the dataset into smaller groups [34]. Looking at the characteristic curve of a turbine reported in Figure 3, we realize how it behaves differently in the different areas of the power curve. In the criterion proposed by [43,45], it is recommended to divide the parameters considered in clustering into intervals where the turbine behavior changes. A simple method of dividing observations into a given number of groups is to use K-means clustering.
After subdividing the samples, to determine the outliers, we proceed to calculate the Mahalanobis distance for each observation using the following formula [54]:

$$MD_i = \sqrt{\left(x_i - C_{n_i}\right)^{T} \, \mathrm{cov}(x)^{-1} \, \left(x_i - C_{n_i}\right)}$$

where $MD_i$ represents the Mahalanobis distance of the i-th sample $x_i$, $C_{n_i}$ represents the coordinates of the center of the n-th cluster (with $n = 1 \ldots k$) corresponding to the sample $x_i$, while $\mathrm{cov}(x)$ is the covariance matrix of $x$.
To determine the outliers, we can use a simple method proposed by [34]. This method consists of setting a threshold value δ of the distance MD such that about 10-15% of the points are considered anomalous.
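The clustering and distance-based removal can be illustrated with a short NumPy sketch. It is only an approximation of the procedure under stated assumptions: the data are synthetic (a cubic, saturating power curve with noise), the signals are standardized before clustering, the covariance is computed over the whole data set as in the formula above, and the function names `kmeans`, `mahalanobis_to_centers` and `outlier_mask` are hypothetical.

```python
import numpy as np


def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()

    def nearest(points):
        # Index of the closest center (Euclidean distance) for each point.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = nearest(x)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return centers, nearest(x)


def mahalanobis_to_centers(x, centers, labels):
    """Distance of each sample from the center of its own cluster."""
    cov_inv = np.linalg.inv(np.cov(x, rowvar=False))
    diff = x - centers[labels]
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))


def outlier_mask(md, fraction=0.10):
    """Flag roughly the `fraction` of samples with the largest distance."""
    delta = np.quantile(md, 1.0 - fraction)  # threshold value delta
    return md > delta


# Synthetic stand-in for (wind speed, power) pairs; not real SCADA data.
rng = np.random.default_rng(1)
wind = rng.uniform(3.0, 20.0, size=500)
power = 2000.0 * np.clip((wind / 12.0) ** 3, 0.0, 1.0) + rng.normal(0.0, 30.0, size=500)
x = np.column_stack([wind, power])
z = (x - x.mean(axis=0)) / x.std(axis=0)  # standardize both signals

centers, labels = kmeans(z, k=3)
md = mahalanobis_to_centers(z, centers, labels)
is_outlier = outlier_mask(md, fraction=0.10)
cleaned = x[~is_outlier]
```

Setting δ through a quantile is one simple way to obtain the desired share of anomalous points; the number of clusters and the fraction would be tuned to the turbine at hand.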

Model Processing
In the model processing phase, an Artificial Neural Network is defined. Neural networks are a powerful modeling tool. The choice of their use is dictated by the fact that they have been proven to be very capable of modeling the complex non-linear relationships between the characteristic parameters of a wind turbine.
Regardless of the type of architecture chosen, the configuration of models based on neural networks has essential common points: the choice of the number of layers, the number of neurons in each layer and the selection of the algorithm to be used in the training phase.
In general, increasing the number of layers and using a more significant number of neurons within them increases the capacity of the neural network; however, it requires greater computational complexity and increases the probability of overfitting [55].
Overfitting or overtraining is one of the most typical problems encountered in the creation of models based on neural networks: the network performs well during the training phase but fails to replicate as good results when it is working with new data.
There are no rules that allow choosing which architecture is the best, nor the number of layers, nor the number of neurons that compose them. The best method, albeit expensive in computational terms, is to rely on a trial and error procedure: different configurations are tested, and the one that gives the best results is selected [55].
Since the space of possible configurations is unbounded, it is impossible to test all of them to find the best solution: it is therefore of fundamental importance to have an idea of which configurations are most used for the application considered. In this phase, the support of the bibliographic research carried out is therefore fundamental.
For wind turbine applications, feed-forward neural networks (FFNN) are by far the most used. There are also numerous applications of recursive networks, which in several cases have shown better performance. As regards the configurations of the structure, the most adopted provide a single hidden layer with a number of neurons varying between 10 and 30, which, despite being a very simple configuration, has shown satisfactory results. In all the applications viewed, the Marquardt-Levenberg algorithm is used to train the network. It is an evolution of the backpropagation algorithm that has been proven to be faster and more efficient than other standard algorithms for neural networks composed of a few hundred neurons [56].
The idea of our work was to start the experimentation of these tools starting from feed-forward architectures, both static and dynamic, chosen for their simplicity, and the non-linear auto-regressive networks with exogenous inputs (NARX) as representative of recursive networks, selected for their positive applications [43,45].
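To make the idea of a NARX model concrete, the sketch below builds the regressors for the generic form y(t) = f(y(t-1), ..., y(t-n_y), u(t-1), ..., u(t-n_u)), where u is an exogenous input and y the modeled output. The function name `narx_regressors` and the lag orders are hypothetical, and the exact lag structure used in the paper is not stated, so this is an assumption for illustration only.

```python
def narx_regressors(u, y, n_u=2, n_y=2):
    """Build (features, targets) pairs from an input series u and output series y.

    Each feature row holds the n_y most recent past outputs followed by the
    n_u most recent past inputs; the target is the current output y[t].
    """
    start = max(n_u, n_y)
    features, targets = [], []
    for t in range(start, len(y)):
        past_y = [y[t - k] for k in range(1, n_y + 1)]
        past_u = [u[t - k] for k in range(1, n_u + 1)]
        features.append(past_y + past_u)
        targets.append(y[t])
    return features, targets


u = [1, 2, 3, 4, 5]       # e.g. an exogenous signal such as wind speed
y = [10, 20, 30, 40, 50]  # e.g. the monitored output
features, targets = narx_regressors(u, y)
```

A static feed-forward network would instead receive only the current inputs, which is the main difference between the two families considered here.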
In order to perform the training of the model, the available dataset will be separated into three parts:

1. Training set (used to effectively train the model, defining the parameters of the ANN);
2. Validation set (necessary to overcome the overfitting problem);
3. Test set (final set, never seen by the trained model, used instead to assess its real performance).
Typical percentages of division for the dataset are 70-15-15. The model thus created will then be used in the monitoring and control phase: receiving as inputs the measurements of some variables of the system (both operational and environmental; their choice is to be determined through the study of technical and scientific literature), the model generates the value of the output variable, characterizing the system in its healthy condition.
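The 70-15-15 division can be sketched in a few lines. The random shuffling is an assumption (the text does not say whether the split is random or chronological), and `split_dataset` is a hypothetical helper name.

```python
import random


def split_dataset(samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle the samples and split them into training, validation and test sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumed: random rather than chronological
    n_train = round(fractions[0] * len(shuffled))
    n_val = round(fractions[1] * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_dataset(range(1000))
```

For strongly autocorrelated 10-minute data, a chronological split may give a more conservative estimate of the real performance; which option suits a given turbine is a design choice.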
The output variable generated by the model is then compared with the actual value measured by the measurement system at the same time. The deviation between the two values (actual and estimated by the model) is evaluated statistically using control charts to identify anomalies in place in the system. The preliminary analysis of control charts is necessary to have a first evaluation of the performance of the models in highlighting anomalous behaviors in the past. The choice of variables should be made in order to be able to characterize the "healthy" behavior of the system completely. Inputs and outputs will significantly characterize the system's ability to monitor components and identify faults. The inputs and outputs that can allow adequate visibility of abnormal behaviors are not known a priori and are not easy to determine. A careful analysis of the system is needed, and important skills are required to estimate the mutual influence to which the parameters of a wind turbine are subjected.
For this phase, the support of bibliographic research is essential to understand which of the variables made available by SCADA systems are the best for implementing a turbine monitoring system through its components. The determination of the link between these quantities is generally based on the combined use of data reduction techniques (e.g., Principal Component Analysis) and engineering knowledge.
In Table 1, the different inputs and outputs for the proposed models are presented. Their choice is the result of research in the scientific literature aimed at identifying the possible variables to use.

Post-Processing
After having processed the model and having evaluated its accuracy, it is necessary to analyze the behavior of the resulting deviations.
The deviations are calculated as the difference between the actual value and the value estimated by the model. Each deviation is evaluated statistically through the use of a control chart to identify anomalies in the system. In particular, in this approach, the Shewhart control chart is used.
The deviations are plotted on the chart showing their evolution over time. Two control limits are added to simplify the evaluation of anomalous behaviors [58]: the Upper Control Limit (UCL) and the Lower Control Limit (LCL).
These values define the sensitivity of the control chart and are often set as multiples of the standard deviation of the deviations' distribution, σ. The standard deviation is calculated from the moving range MR, defined as the difference between the i-th deviation and the previous one. When the system examined shows a healthy behavior (compliant with the model), the deviations on the chart will show a normal statistical distribution with a mean equal to zero; on the contrary, the presence of non-random patterns (e.g., points outside the control limits, mixtures or shifts of the average) is a signal of non-conformity with the model and therefore of anomalous behavior.
Energies 2021, 14, 1845
Once the model has been validated, by ascertaining that the anomalies detected are real faults, the model can be used to enact real-time fault detection. Figure 4 summarizes the proposed method to perform fault detection in wind turbines.
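The control-chart step can be sketched as below, assuming the standard individuals (moving range) formulation of the Shewhart chart: the moving range is the absolute difference between consecutive deviations, σ is estimated as the mean moving range divided by the constant d2 = 1.128, and the limits are placed at ±3σ around the mean of a healthy reference period. The source does not spell out these formulas, so the constants, the function names and the toy numbers are assumptions.

```python
from statistics import mean


def individuals_chart_limits(reference, n_sigma=3.0):
    """Return (LCL, center line, UCL) from deviations of a healthy period."""
    # Moving range: absolute difference between consecutive deviations.
    moving_ranges = [abs(b - a) for a, b in zip(reference, reference[1:])]
    # d2 = 1.128 is the usual constant for ranges of two observations.
    sigma = mean(moving_ranges) / 1.128
    center = mean(reference)
    return center - n_sigma * sigma, center, center + n_sigma * sigma


def out_of_control(deviations, lcl, ucl):
    """Indices of the deviations falling outside the control limits."""
    return [i for i, d in enumerate(deviations) if d < lcl or d > ucl]


# Deviations = actual value minus value estimated by the model (toy numbers).
reference = [0.2, -0.1, 0.0, 0.3, -0.2, 0.1, -0.3, 0.0]
lcl, center, ucl = individuals_chart_limits(reference)

new_deviations = [0.1, -0.4, 1.5, 0.2, -1.2]
alarms = out_of_control(new_deviations, lcl, ucl)
```

Points outside the limits are only one of the non-random patterns mentioned in the text; run rules for shifts of the average would need additional checks.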

Case Study Application
The application of the proposed methodology is carried out on a selection of wind turbines from a wind farm in southern Italy. The turbine model is the Vestas V90 2 MW.
The data available are in different formats:

• SCADA data recorded every 10 min, from 1 January 2015 to 9 January 2018, for a total of 192 sampled variables;
• Service report, in which for each month from January 2015 to October 2016, the records of the maintenance interventions carried out are collected.
The models were created with historical data and were then applied to the following period to assess the tool. In order to do so, therefore, maintenance information has been essential.
In the following paragraphs, all the considerations that emerged in the different phases are reported: the issues of data pre-processing, the choice of ideal configurations and the selection of the most suitable models. Finally, the capacity of the developed system has been tested for the identification of faults that, in the period investigated, were found in the turbines of the wind farm.
As previously specified, we followed two different approaches: the first, at a higher level, based on monitoring the turbine as a whole via the power output, and the second based on the development of more specific models for the major components, in particular the generator and the gearbox. The input and output variables of the developed models, determined thanks to the literature review, are shown in Table 1.
In Table 2, it is possible to observe a list with the most critical faults concerning the components on which we have concentrated. Below, for each component, we will see how the system reacts.

Data Pre-Processing
The first general filtering operation is aimed at avoiding the presence of anomalous points that can have a negative impact on the accuracy of the model. The following categories of data have been excluded:

• Instances in which the output power is zero;
• Instances in which at least one of the measures of the relevant variables is missing;
• Instances in which the turbine is working under a regime of limited power.
Figure 5a shows the initial state of the data for one of the turbines, where the presence of numerous outliers is obvious. Already in this first phase, one of the characteristics that will have a great impact on our system emerges: about 50% of the data are affected by a limited power regime. The power limitations are mainly due to constraints imposed by the national power grid to overcome dispatching problems and, since these behaviors are not considered normal, the samples affected by these restrictions have to be removed.
An example of applying the first general filter is shown in Figure 5b. For the selection of an appropriate set of data to be used for the training of the different models, the maintenance records were analyzed. Through the service reports the history of the individual turbines in the wind farm can be reconstructed in terms of the failures that occurred. For the different models, a part of the dataset is manually selected where the turbine has been free from faults that may have had repercussions on the monitored variables. There are no general rules for establishing the ideal size of the dataset to be used in training, but it should contain all the natural variability of the quantities used as inputs and outputs. In this regard, where possible, an annual interval of operation for the wind turbine is used.
To increase the performance of the models, the training set is subjected to a second filtering operation: data outside the normal operating range of the turbine are removed with a clustering method that uses the K-means algorithm to divide the data into subgroups and then the Mahalanobis distance to detect and delete abnormal points.
Figure 5c shows an example of outlier removal using this method. The number of clusters was set to 12, while the threshold distance was chosen so that 5% of the points are considered anomalies.
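A minimal sketch of this second filter is given below, using NumPy and synthetic data. Several details are assumptions rather than the authors' implementation: the K-means routine is a plain Lloyd's iteration, the features are standardized before clustering, the distance of each point is measured to the centroid of its own cluster, and the 5% threshold is applied globally rather than per cluster.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def mahalanobis_to_own_cluster(X, labels):
    """Mahalanobis distance of each point to the centroid of its cluster."""
    dist = np.zeros(len(X))
    for j in np.unique(labels):
        idx = labels == j
        pts = X[idx]
        if len(pts) <= X.shape[1]:
            continue  # too few points for a covariance estimate: distance stays 0
        diff = pts - pts.mean(axis=0)
        inv = np.linalg.pinv(np.cov(pts, rowvar=False))  # tolerant of singular cov
        d2 = np.einsum("ij,jk,ik->i", diff, inv, diff)
        dist[idx] = np.sqrt(np.maximum(d2, 0.0))
    return dist


def remove_outliers(X, k=12, outlier_fraction=0.05):
    """Drop the points whose distance exceeds the (1 - fraction) quantile."""
    labels = kmeans(X, k)
    dist = mahalanobis_to_own_cluster(X, labels)
    threshold = np.quantile(dist, 1.0 - outlier_fraction)
    keep = dist <= threshold
    return X[keep], keep


# Synthetic power-curve-like data (wind speed in m/s, power in kW), illustration only
rng = np.random.default_rng(42)
wind = rng.uniform(3.0, 15.0, size=600)
power = 2000.0 / (1.0 + np.exp(-(wind - 9.0))) + rng.normal(0.0, 40.0, size=600)
X = np.column_stack([wind, power])
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize so K-means treats both axes alike
_, keep = remove_outliers(Z, k=12, outlier_fraction=0.05)
X_clean = X[keep]
```

Clustering first lets the Mahalanobis distance adapt to the local shape of the power curve, which a single global covariance could not capture.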
Finally, Figure 5d shows the clean training dataset, ready to be used in training the model. Figure 6 shows a graphic representation of the output variables chosen for the four models, developed for one of the turbines analyzed during the training phase.

Model Processing
To develop the models, both static and dynamic FFNN and recursive networks (NARX) have been considered.
As there are no general rules for the optimal configuration to be used, the number of layers and the number of neurons inside them are determined through an experimental campaign, widely varying these two characteristics for the different types of networks tested. Although increasing the size of the network generally results in better performance, the results obtained show that using neural networks with more than two hidden layers and with more than 30 neurons does not lead to a substantial improvement of the models.
To ease training and avoid overfitting, to which large neural networks are particularly prone, this application favors neural networks with at most one hidden layer and no more than 30 neurons, with the exact configuration established case by case through an iterative procedure. Furthermore, to improve the network's ability to generalize, in addition to the standard early stopping methodology, with the data divided into training, validation and test sets of 70%, 15% and 15% respectively, the network has been tested on an additional independent test set equal to 20% of the set used in the entire training. The network with the best performance has been selected.
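The data partition described above can be sketched as follows. The random assignment of samples, the total sample count and the reading of "20% of the set used in the entire training" as an extra hold-out sized at 20% of the main set are all assumptions made for illustration.

```python
import random


def split_indices(n, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly split range(n) into train / validation / test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


n_total = 1200                   # hypothetical number of filtered samples
n_main = round(n_total / 1.2)    # set used for training with early stopping
n_extra = n_total - n_main       # independent hold-out, 20% of the main set

order = list(range(n_total))
random.Random(1).shuffle(order)
main, extra = order[:n_main], order[n_main:]

# Map the positions returned by split_indices back to the original sample ids
train, val, test = ([main[i] for i in part] for part in split_indices(len(main)))
```

The additional hold-out never takes part in early stopping, so it gives a check on generalization that is independent of the validation set used to stop training.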
Although the literature indicates that dynamic feed-forward networks or recursive networks such as NARX are particularly suitable for this problem [43,45], in our application these tools did not produce the expected results: the best model performances were in fact obtained with the simplest neural networks, the static feed-forward ones. One reason why these regressive approaches did not prove suitable is certainly their sensitivity to "missing" data, in this case caused by the copious, but inevitable, removal of the numerous outliers.
All neural network models were developed using MATLAB.
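The sensitivity of regressive models to removed samples can be illustrated with a small counting exercise. A static model can use every retained record, whereas a model with lagged inputs needs an unbroken run of consecutive samples for each target. The 10-minute sampling step, the two lags and the timestamps below are hypothetical.

```python
def usable_lagged_targets(timestamps, step, n_lags):
    """Indices whose n_lags predecessors exist at exactly `step` spacing."""
    usable = []
    for i in range(n_lags, len(timestamps)):
        window = timestamps[i - n_lags:i + 1]
        if all(b - a == step for a, b in zip(window, window[1:])):
            usable.append(i)
    return usable


# Timestamps in minutes; the records at 30, 40 and 80 were removed as outliers
kept = [0, 10, 20, 50, 60, 70, 90, 100, 110, 120]

static_count = len(kept)  # a static model can train on all 10 records
narx_count = len(usable_lagged_targets(kept, step=10, n_lags=2))
```

Here removing three records out of thirteen leaves ten samples for a static network but only four valid targets for a two-lag regressive model, and the loss grows with the number of lags.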

Wind Turbine Model
The model used for monitoring the output power is an FFNN that takes as inputs, in addition to wind speed and ambient temperature, the wind direction and the standard deviation of the wind speed, which contribute to the performance of the model, albeit to a lesser extent. Figure 7 presents a graphical representation of the FFNN model. An example of the use of this model is reported in Figure 8, which shows a control chart of the power output deviations.
The proposed model has been tested on several turbines and compared with another type of model common in the literature, based on a non-linear regression taken from [59], in which v represents the wind speed and P_max the nominal wind turbine power. Figure 9 presents a control chart of the power output deviations using this reference model.
In Table 3, a comparison between the two models is reported. The comparison makes clear that the first one, the ANN model, is able to overcome the issue of seasonality, ensuring a better representation of the behavior of the wind turbine.
Although the output power model provides performances, calculated in terms of Root-Mean Squared Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) perfectly in line with results from the literature, it was not able to identify the occurrence of faults in specific components. Therefore, specific models were created for the critical components: gearbox and generator.
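For reference, the three error metrics named above can be computed as in the following sketch, which uses their standard definitions. The measured and predicted values are hypothetical and serve only to show the arithmetic.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; undefined if a true value is 0."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


measured = [50.0, 60.0, 55.0, 65.0]   # hypothetical temperatures in °C
predicted = [49.0, 61.0, 55.0, 63.0]

print(rmse(measured, predicted))  # about 1.22
print(mae(measured, predicted))   # 1.0
print(mape(measured, predicted))  # about 1.69
```

RMSE weights large deviations more heavily than MAE, while MAPE expresses the error relative to the magnitude of the measured value.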

Gearbox Model
The ANN model is an FFNN with one hidden layer and 27 neurons (RMSE = 0.68 °C, MAE = 0.48 °C, MAPE = 0.93%, calculated during the training phase). The control chart of the gearbox bearing temperature model is reported in Figure 10.
To assess its application, we should refer to the maintenance history of the wind turbine.
... the replacement of the Intermediate Shaft (IMS) bearings of 27 September 2016. Therefore, it seems that this maintenance intervention might have been predicted.
Although the value of the deviations decreases, there are still points outside the control limits. However, there are still not enough elements to be able to predict the last maintenance action on the gearbox in February 2017.

Wind Turbine WT01
The ANN model for the generator bearing temperature is an FFNN with one hidden layer and 28 neurons (RMSE = 3.80 °C, MAE = 2.11 °C, MAPE = 4.54%, calculated during the training phase). The control chart for this application is reported in Figure 11.
To assess its application, we should refer to the maintenance history of the wind turbine. The turbine is subjected to a replacement of the bearings and a replacement of the generator. The latter generated one of the most critical scenarios for wind farm maintenance, keeping the turbine stationary for fourteen days. There is no information on alarms or interventions except for the request to replace the bearings, which was submitted on 12 May 2016 and carried out from 16 May 2016 to 19 May 2016. No alarm is detected in the following months, but the turbine is in a stopped state from 16 August 2016 to 28 August 2016, a period in which, following a request submitted on 22 August, the replacement of the generator is conducted, ending on 28 August 2016.
The control chart of the deviations of the generator bearing temperature model shows an evident change in variability, with numerous points out of control from 5 November 2015 to the replacement of the bearings on 16 May 2016. In particular, the last point out of control dates back to 12 May 2016, when the replacement was ordered. The proposed monitoring system, therefore, seems to predict the anomaly well in advance. Following the replacement of the bearings, the deviations change significantly, presenting a shift of the average, while the variability seems to have returned to normal. Although the shift of the average can be justified by the use of a different type of bearing with different specifications, the growing trend that stops only after replacing the generator is still anomalous.
To assess how early the system could have predicted the anomaly, the first three weeks after the replacement of the bearings are considered "normal" and the new average of the deviations is calculated: 100% of the following points lie above this average. Referring to the rules created by Western Electric, which consider eight points on the same side of the control chart to be anomalous, the proposed system would have identified the anomaly on 25 June 2016, approximately two months before the generator's replacement.

Wind Turbine WT02
Now, it is possible to observe another application on a different wind turbine. The ANN models for the generator bearing temperature and the generator slip ring temperature are FFNNs with one hidden layer and, respectively, 24 neurons (RMSE = 2.46 °C, MAE = 1.28 °C, MAPE = 3.00%, calculated during the training phase) and 30 neurons (RMSE = 0.88 °C, MAE = 0.70 °C, MAPE = 3.14%, calculated during the training phase). The control charts for these applications are reported in Figure 12 (generator bearing temperature model) and in Figure 13 (generator slip ring temperature model).

From the analysis of the control charts of the generator bearing temperature deviations (Figure 12), several points beyond the upper limit are noted in the first period.
The first alarm dates back to 8 June 2016 and has been repeated fourteen more times until 1 August 2016, the date on which the clogged grease drain channel was cleaned. In this case, the system has shown good forecasting capacity, also providing prediction in reference to the second alarm of 5 September 2016, about the high temperature of the bearings.
Since October 2016, there are repeated points beyond the upper limit with very high deviation values. The anomalous behavior appeared to end on 18 February 2017, and then the deviations go back out of control several times until the bearings were replaced on 11 May 2017.
Unfortunately, as of October 2016, maintenance reports were no longer available, preventing more specific considerations regarding the actual detection of anomalies.
However, it is possible to assume that there have been interventions on the turbine, probably when the deviations drop, since the generator temperatures exceeded 100 °C, certainly generating alarms from the SCADA system. The numerous points out of control in the months preceding the replacement of the bearings seem to be non-random and are potentially signals able to predict the need for intervention.
Although to a much lesser extent, the temperature of the slip ring shows several points out of control between October 2016 and February 2017, as reported in Figure 13. Besides, from 12 March 2017, the model is stably beyond the control limits. Deviations that fall within limits correspond to the exact time of replacement of the bearings.
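The two kinds of signal used in the WT01 and WT02 analyses, points beyond the control limits and a run of eight consecutive points on the same side of the center line (Western Electric rule), can be sketched as below. The way the control limits are computed is an assumption: the sketch uses classic Shewhart limits at the mean plus or minus three standard deviations of a reference period, and all residual values are invented for illustration.

```python
import statistics


def control_limits(reference, n_sigma=3.0):
    """Assumed Shewhart limits from residuals of a healthy reference period."""
    center = statistics.fmean(reference)
    sigma = statistics.stdev(reference)
    return center - n_sigma * sigma, center, center + n_sigma * sigma


def beyond_limits(values, lower, upper):
    """Indices of the points that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def first_run_alarm(values, center, run_length=8):
    """Index completing `run_length` consecutive points on one side, else None."""
    run, side = 0, 0
    for i, v in enumerate(values):
        s = (v > center) - (v < center)  # +1 above, -1 below, 0 on the line
        if s != 0 and s == side:
            run += 1
        else:
            side, run = s, (1 if s != 0 else 0)
        if run >= run_length:
            return i
    return None


# Hypothetical deviations (measured minus predicted temperature, in °C)
reference = [-1.0, 0.5, 0.0, 1.0, -0.5, 0.5, -0.5, 0.0]
lower, center, upper = control_limits(reference)

spikes = [0.3, -0.8, 4.0, 0.1, -3.5]
flagged = beyond_limits(spikes, lower, upper)        # [2, 4]

drift = [0.2, -0.1, 0.3, 0.4, 0.5, 0.6, 0.4, 0.7, 0.8, 0.9, 1.0]
alarm_index = first_run_alarm(drift, center)         # 9
```

The run rule fires on the drifting series even though none of its points exceeds the limits, which mirrors the WT01 case where a shifted average, rather than isolated spikes, revealed the anomaly.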

Discussion
In this experimental application, all the steps of the proposed fault detection methodology have been tested.
In the first steps, data acquisition and pre-processing, some difficulties were encountered because of the numerous outliers. This situation is typical in the operational context of the investigated application. However, despite the large number of outliers, the proposed clustering method, which combines the K-Means algorithm with the Mahalanobis distance, proved quite efficient. Indeed, it made it possible to obtain an adequate dataset for the subsequent phases, and the procedure has the additional advantage of being easily automatable to support large-scale applications on wind farms.
Two different monitoring approaches have been undertaken. The first, at a higher level, based on the turbine monitoring via the power output, showed excellent results from the point of view of the model performance but has not been proven capable of signaling the presence of anomalies in the turbine, thus fostering the development of more specific models for the major components, in particular for the generator and the gearbox.
The best results for the detection of faults and operating anomalies were obtained for the generator, where the proposed approach showed evidence of the applicability in the prediction of the occurrence of critical failures.
In addition, the system was able to predict minor interventions, such as purging and cleaning of the bearings, and failures in the ventilation system.
The proposed approach has the notable advantage of being tailored to only use SCADA data that are generally present and are already transmitted in real-time, whereas the other relevant predictive techniques cited before require additional measurement systems that cannot be continuously performed.
Besides, the use of data-driven models, as opposed to physical models, makes it possible to achieve good modeling accuracy without extensive knowledge of the specific structure of the system and its behavior in every operating condition.
The experimental application has been successfully carried out in the case study presented, but the fundamental importance of the data acquisition and data collection phases in this approach should be highlighted. Indeed, historical data that are not extensive enough to be representative of the wind turbine's healthy state would prove detrimental to the application's success.

Conclusions
In this paper, all the phases that lead to the realization of a system aimed at the monitoring and the identification of anomalies of a wind turbine and its main components, such as the generator and the gearbox, have been described. The proposed approach is based on the use of data collected by SCADA acquisition systems. The main tools used for the development of the fault detection methodology are ANN for the development of the models and SPC for the identification and analysis of operating anomalies.
The proposed methodology has the objective to implement a fault detection system for wind turbines on several levels: monitoring the performance of the turbine as a whole, while also monitoring two critical components such as the generator and the gearbox.
The methodological approach has been applied to a real case study regarding two wind turbines to test its effectiveness, and it was successful in identifying abnormal behaviors before the onset of faults. Thus, the system developed has proven to be a valuable support to the maintenance of a wind farm, providing additional information to evolve the current maintenance policy based on time-based scheduled inspections and alarms related to the exceedance of static threshold values, which often do not allow sufficient time to prevent critical scenarios.
The proposed method has the possibility of being extended to other major components. A future development of this approach is the evolution of said methodology towards fault diagnosis. In addition to identifying the abnormal behavior, in order to define with precision the cause, faults that occurred should be associated with specific patterns on the control charts, thus laying the foundations for the subsequent automation of the fault diagnosis. To do so, however, deep cooperation with industrial actors is necessary since not only a long-term trial is mandatory but also the support of experts and maintenance personnel is deemed critical to successfully tailor this further step. This aspect would have interesting consequences, even regarding the scheduling optimization of maintenance jobs.
In more general terms, then, the increased awareness of the real health of the system generated by the implementation of such tools would have advantages also concerning the very current theme of power forecasting for renewable energy sources [60].
Author Contributions: All authors contributed equally to the idea and the design of the methodology proposed; A.S. and D.D. were responsible for the case study application; A.S. and D.D. prepared the original draft; A.S. and V.I. contributed to the review and editing and V.I. was responsible for the project supervision. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A
This appendix reports the most relevant contributions examined in the scientific literature review performed in this research. It aims to summarize the different applications and techniques used in condition monitoring and fault detection for wind turbines. The literature review was performed using keywords such as "wind turbines", "condition monitoring", "fault detection" and "neural networks". Table A1 lists the relevant scientific publications analyzed. For each application, the following are reported: the investigated components, the main tools and methods used, the presence of a real case study in which the proposed tools and techniques are validated, and the type of approach followed.
In particular, for the column "Real Case Study", only studies involving wind turbines operating in real conditions were counted as real case studies (no prototypes, single components or simulation approaches).