Wind Fleet Generator Fault Detection via SCADA Alarms and Autoencoders

Featured Application: Novel approach to wind fleet generator fault detection using Supervisory Control and Data Acquisition (SCADA) data and alarm logs.
Abstract: A hybrid health monitoring system for wind turbine generators is introduced. The novelty of this research lies in addressing a fleet of 115 wind turbines through the fusion of multiple sources of information. Analog SCADA data is analyzed by an autoencoder, which identifies anomalous patterns within the input variables. Alarm logs are processed and merged with the anomaly detection output, creating a reliable health estimator of generator conditions. The proposed methodology has been tested on a fleet of 115 wind turbines from four different manufacturers located in various locations around Europe. The solution has been compared with other existing data modeling techniques and outperformed them on the fleet. An accuracy of 82% and a Kappa of 56% were obtained. The detailed methodology is presented using one of the available windfarms, composed of 13 onshore wind turbines rated at 2 MW. The rigorous evaluation of the results, the use of real data and the heterogeneity of the dataset support the validity of the system and its applicability in an online operating scenario.


Introduction
Wind energy is one of the main enablers of the ongoing renewable energy revolution. It was reported by WindEurope that in 2016, wind energy production overtook coal as the second largest form of power capacity in Europe, right behind natural gas. The strong increasing trend suggests that it is just a matter of time for wind energy to take the lead [1].
Many challenges are yet to be solved to increase wind energy profitability, and operation and maintenance (O&M) in particular has to be improved. It was reported that unexpected breakdowns typically cause 10-15% of production losses, with extreme peaks of 30% [2]. These losses cripple the profit of energy companies, thus it is not surprising to find optimization of O&M through big data, cloud solutions and innovative technologies as one of the top priorities of the industry [3].
Historically, maintenance has been performed via a reactive approach, based on preventive inspections and corrective interventions once failures were acknowledged. New approaches providing predictive maintenance solutions have emerged in both academia and industry.
Turbines are commonly equipped with a Supervisory Control and Data Acquisition (SCADA) system, which was initially installed to monitor and operate the system but has lately been used to assess and predict the health status of the turbines as well. SCADA data is recorded by a network of sensors located in the main components of the turbine. The typical sampling interval is 10 min, making the data relatively cheap to collect, transmit and store in a database. All wind fleet operators collect data in their centralized control systems. SCADA data is collected and stored in Structured Query Language (SQL) databases from SCADA providers or in the OSIsoft PI System.
Early fault detection can be achieved, as shown by Schlettingen and Santos, by building a model that captures normal operation of the system and by comparing the difference between predicted and measured values of a key variable, to detect anomalies [4]. This approach does not fully take advantage of the high dimensionality of the SCADA dataset and focuses only on the behavior of a single key variable, while component failures are typically complex and can manifest themselves in different failure modes.
The literature is rich with examples based on power curve modeling of wind turbines [5][6][7][8]. This approach tracks the relation between wind speed and output power: the function that describes this relation can be inferred from operational data and compared to the one provided by the manufacturer, and significant deviations from the theoretical power curve can hint at problems in the turbine. Different algorithms, as well as the introduction of context variables, have been studied in order to get a reliable picture of the turbine behavior. The main drawback of this approach is its inability to determine which component is causing underperformance, since the turbine is studied as a whole.
Solutions based on condition monitoring systems (CMS) are available and have been studied in the literature [9][10][11]. These analyses typically use vibration, sound and acceleration measurements to detect anomalies in the behavior of bearings, gearboxes and other mechanical components. The sampling frequency of the data used for these analyses is much higher than that of typical SCADA data, thus bringing more information for the detection of failures. That being said, most turbines are not provided with vibration sensors, and the installation of these instruments disrupts the operation of the turbine and can cost a windfarm owner thousands of euros per turbine. The authors of Reference [12] presented a thorough analysis of the available monitoring techniques for wind turbines; regarding CMS, they highlighted as main challenges the financial cost, the difficulty of interpreting the results, the non-trivial integration with existing monitoring systems and scalability.
For these reasons, solutions based on the usage of SCADA data can be particularly interesting for owners of old turbines, since no installation of additional sensors or interruption of their operations is needed. Value can be created from the large quantity of unutilized SCADA data stored in their databases.
The rapid growth of the Deep Learning field led many researchers to apply neural networks to solve data challenges. Autoencoders in particular appear to be a good fit for anomaly detection. Autoencoders have been applied in multiple practical applications, such as anomaly detection of seasonal Key Performance Indicators (KPIs) in web application [13], cyber-security monitoring [14] and monitoring of gas turbine conditions [15].
In the wind energy sector, Jiang et al. stacked multiple autoencoders to extract new representations of vibration data in the event of gearbox failures [16]. Successively, they also utilized denoising autoencoders, enriched with temporal information to assess turbine conditions in a laboratory and online scenario [17]. Finally, autoencoders have been successfully used for ice-detection on turbines' blades by Liu et al. [18].
Alarms and events records have been used to determine the remaining useful life of wind turbines [19]. In Reference [20], the time-sequence of the alarms is analyzed to detect relations between the different alarms, determining the causal relationship between the different events and helping to determine the root-cause of failures.
This research aims to explore the capabilities of autoencoders and SCADA alarms as a hybrid fault detection system for wind turbine generators. While the literature contains examples of predictive strategies based only on SCADA data or only on alarms, no holistic approach using both sources of information is present. This paper reports a methodology that takes advantage of both SCADA data and alarm logs in the same algorithm.
Appl. Sci. 2020, 10, 8649
As a benchmark, other typical anomaly detection algorithms are implemented, and their results are compared with the autoencoder's results. Additionally, the overall methodology is compared to a normality model, one of the most common predictive maintenance approaches available in the literature. Given the practical nature of the project, SCADA and alarm logs of existing windfarms are used. Results are validated using maintenance logs and verifying the concordance between the predictions and the available information.
A key aspect of this investigation is the thorough analysis of real data from a heterogeneous sample of data. The dataset includes four different turbine brands, from seven different windfarms, located in different nations and climates (Spain, United Kingdom and Poland). Moreover, the size of the sample is remarkable, as more than a hundred turbines are studied. These factors are rare in the relevant literature, as most of the time, a single turbine or windfarm is analyzed. All these considerations support the applicability of the approach in real-life scenarios and its ability to generalize results to heterogeneous conditions.

Data Description
The sources of information used for this research are the SCADA and alarm datasets, as inputs to the model, and the maintenance task logs, as ground-truth material to evaluate the effectiveness of the methodology. Two years of operation data for more than 100 turbines rated at 2 MW and from different manufacturers were available. Data was received directly from the windfarm operator in the form of comma-separated values (csv) and text archives and uploaded to an SQL database.
The dataset was split into a training and a test set, maintaining a train/test split ratio of 70-30%. The last 9 months of data were used as the test dataset, and the remaining data was used for training the algorithms. The data is a real-life dataset of various windfarms operating under common conditions; it is not the result of a simulation. As a consequence, the data required thorough cleaning and pre-processing to remove inconsistencies due to sensor errors and communication malfunctions.

SCADA Dataset
The SCADA dataset contains more than 300 variables, as the main systems of the turbine are all monitored (pitch, main shaft bearing, gearbox, generator, etc.). The sampling interval is 10 min, and quantities such as the arithmetic mean, minimum, maximum and standard deviation are computed from the data acquired during each interval. The format of the SCADA dataset, as well as the names of the variables and the positions of the sensors, may vary according to the manufacturer of the turbine. An example of the dataset used in this research is provided in Table 1.
Alarms are typically triggered whenever an operating parameter, most typically a temperature, exceeds its normal operating range. Table 2 is an example of the information contained in the alarm dataset. The alarm description field contains standardized text data generated by the control system of the turbines.

Work Orders Dataset
All the maintenance tasks that have been carried out in the windfarm, including inspections, regular checks and extraordinary interventions, are registered in the work order logs. An example of the available work order logs is provided in Table 3. This information has been used for labeling the turbines' SCADA data. Records preceding critical interventions on the turbines have been removed from the training dataset. Work orders have also been used for the evaluation of the predictions. The information in the work orders is not provided in any form to the predicting algorithm; it is used only to process data, assign labels and, finally, evaluate the predictions, thus serving as the ground truth for the algorithm.
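The removal of records preceding critical interventions can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the 30-day window, the record layout `(turbine, timestamp, value)` and the function name `drop_pre_intervention` are all assumptions, since the source does not state how long before an intervention the data was discarded.

```python
from datetime import datetime, timedelta

def drop_pre_intervention(records, interventions, window_days=30):
    """Drop records within `window_days` before an intervention on the same turbine.

    The window length is a hypothetical choice, not a value from the paper.
    """
    window = timedelta(days=window_days)

    def precedes_intervention(turbine, timestamp):
        return any(
            turbine == wt and when - window <= timestamp <= when
            for wt, when in interventions
        )

    return [r for r in records if not precedes_intervention(r[0], r[1])]

# Hypothetical records: (turbine, timestamp, stator temperature)
records = [
    ("WT11", datetime(2018, 3, 1), 61.0),
    ("WT11", datetime(2018, 5, 20), 88.0),  # 12 days before the intervention
    ("WT07", datetime(2018, 5, 20), 60.0),  # different turbine, kept
]
interventions = [("WT11", datetime(2018, 6, 1))]
training_records = drop_pre_intervention(records, interventions)
```

Only the work order dates are used here, which is consistent with the statement that work orders serve to process and label the data but are never passed to the predicting algorithm.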

Autoencoder Anomaly Detection
Anomaly detection via autoencoder is performed by providing the network with a training dataset composed of normal data, which can be represented as $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$. Autoencoders can be divided into two parts: an encoder and a decoder.
The encoder's goal is to reduce the dimension of the data by mapping it into lower-dimensional spaces, reducing the number of neurons in each successive layer until the bottleneck is reached. The number of layers and neurons in the network is determined by a tradeoff between the compression of the input information and the ability to reconstruct the input sufficiently well. Neurons are activated by an activation function such as the one presented by the following equation [21]:

$$a_i^{(j)} = f\left(\sum_{k} W_{ik}^{(j-1)} a_k^{(j-1)} + b_i^{(j-1)}\right)$$

where $a_i^{(j)}$ is the activation of unit $i$ in layer $j$, $f$ is the activation function, $W$ and $b$ are the weights and biases of the model, the indexes $i$ and $j$ denote the unit and the layer, respectively, and $k$ runs over the units of the previous layer. Non-linear activation functions are typically used to allow the network to represent non-linear characteristics of the data. In this research, the rectified linear unit (ReLU) function has been used, and is defined as follows:

$$f(z) = \max(0, z)$$

The decoder's function is to reconstruct the encoded data as well as possible. The entire structure, encoder and decoder, is optimized by minimizing the following cost function, presented in Reference [21]:

$$J(W, b) = \frac{1}{m} \sum_{k=1}^{m} \frac{1}{2} \left\| h_{W,b}\left(x^{(k)}\right) - x^{(k)} \right\|^2 + \frac{\lambda}{2} \sum_{l=1}^{n_l - 1} \sum_{p=1}^{s_l} \sum_{q=1}^{s_{l+1}} \left( W_{qp}^{(l)} \right)^2$$

in which $h_{W,b}(x)$ is the output of the network for input $x$, $n_l$ is the number of layers, $s_l$ is the number of units in layer $L_l$ and $\lambda$ is the regularization parameter that keeps a balance between the memorization and generalization capabilities of the network. As the equation shows, a larger and more complex network is penalized by the factor $\lambda$. The first part of the equation defines the difference between the input and output vectors, and thus a priority of the network is to minimize this difference.
As explained in Reference [22], anomaly detection using autoencoders can be seen as a semi-supervised learning problem. The autoencoder is trained with normal data and learns its representation in a reduced dimensional space. The reconstruction error is utilized as a metric to determine abnormal data. Data that does not fit the representation learned in the training phase results in higher reconstruction error and can be marked as anomalous.
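The forward pass, the regularized cost and the reconstruction error used for scoring can be illustrated with a small pure-Python sketch. It assumes a toy 2-1-2 network with hand-picked weights (hypothetical, not trained and not the architecture used in this research) and applies ReLU on every layer; the function names are illustrative only.

```python
def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases):
    # weights[i][k] connects input k to unit i of this layer
    return [relu(sum(w * a for w, a in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    a = x
    for weights, biases in layers:
        a = dense(a, weights, biases)
    return a

def reconstruction_error(x, layers):
    # Squared Euclidean distance between the input and its reconstruction
    return sum((h - xi) ** 2 for h, xi in zip(forward(x, layers), x))

def cost(samples, layers, lam):
    # Mean half squared reconstruction error plus weight-decay penalty
    fit = sum(0.5 * reconstruction_error(x, layers) for x in samples) / len(samples)
    decay = 0.5 * lam * sum(w ** 2 for weights, _ in layers
                            for row in weights for w in row)
    return fit + decay

# Hypothetical 2-1-2 autoencoder that averages its inputs at the bottleneck
layers = [
    ([[0.5, 0.5]], [0.0]),         # encoder: 2 inputs -> 1 unit
    ([[1.0], [1.0]], [0.0, 0.0]),  # decoder: 1 unit -> 2 outputs
]
normal_samples = [[1.0, 1.0], [2.0, 2.0]]

normal_cost = cost(normal_samples, layers, lam=0.0)
regularized_cost = cost(normal_samples, layers, lam=0.1)
anomaly_error = reconstruction_error([3.0, 0.0], layers)
```

Samples lying on the pattern the toy network encodes (both inputs equal) are reconstructed exactly, while the off-pattern sample `[3.0, 0.0]` yields a large reconstruction error, which is the property the anomaly detector relies on.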

Methodology
Fusion of multiple sources of information, namely SCADA data anomaly detection and alarm registers, is the core of this research. First, the initial processing of the SCADA data is presented, then the processing of the alarms and the final step of merging the autoencoder and alarms' predictions in unique indicators are discussed separately.

SCADA Data Processing
Of the entire dataset, a subset of six variables is used to model the generator: active and reactive power, nacelle and generator stator temperature, and wind and generator speed. Although the dataset contains more than 300 variables, only a small selection was kept. Processing all the variables would result in very long computation times and would likely lead to overfitting; interpreting the predictions would also be non-trivial, since the number of inputs would be very large. The variables were selected by choosing measurements related to the system under evaluation (generator speed, generator stator temperature) as well as context signals that determine the operating status of the turbine (active and reactive power, nacelle temperature and wind speed).
The dataset is split into a training and a test set: the first 70% of the data is used for training and the remaining 30% for testing. Data shuffling has been avoided, since the dataset is composed of time series and random selection of data could result in information leakage.
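A chronological split without shuffling can be sketched as below. The record layout `(timestamp, value)` and the function name `chronological_split` are assumptions made for illustration.

```python
def chronological_split(records, train_fraction=0.7):
    """Split time-ordered records so that every training record precedes every test record."""
    ordered = sorted(records, key=lambda r: r[0])
    cut = round(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

# Hypothetical records: (timestamp index, measurement)
records = [(t, 0.1 * t) for t in range(10)]
train, test = chronological_split(records)
```

Because the cut is made on the time axis, no test record is older than a training record, which avoids the leakage that a random split of a time series could introduce.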
Analysis of the maintenance and alarm logs makes it possible to filter abnormal operating conditions out of the training set and to remove outliers caused by sensor malfunctions, thus creating a training set composed only of normal operation records. No imputation of missing data was performed. To filter data, pre-processing algorithms [23] are applied. In practice, a range of acceptable values is defined for each input variable of the model, and all data falling outside this range is treated as a communication error and filtered out.
A crucial part of pre-processing is the normalization of the data: the training set is used to determine the minimum and maximum value of each input variable, and these values are stored to be applied later to the test set.
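The range filter and the min-max normalization fitted on the training set only can be sketched as follows. The variable names and the acceptable ranges in `VALID_RANGES` are hypothetical values chosen for illustration, not those used in this research.

```python
# Hypothetical acceptable ranges per input variable
VALID_RANGES = {
    "stator_temp_c": (-20.0, 150.0),
    "active_power_kw": (-50.0, 2200.0),
}

def in_range(record, ranges):
    return all(lo <= record[name] <= hi for name, (lo, hi) in ranges.items())

def fit_min_max(train_records, names):
    # Minimum and maximum are learned from the training set only
    return {n: (min(r[n] for r in train_records), max(r[n] for r in train_records))
            for n in names}

def apply_min_max(record, scaler):
    scaled = {}
    for name, (lo, hi) in scaler.items():
        span = hi - lo
        scaled[name] = (record[name] - lo) / span if span else 0.0
    return scaled

train_raw = [
    {"stator_temp_c": 40.0, "active_power_kw": 0.0},
    {"stator_temp_c": 80.0, "active_power_kw": 2000.0},
    {"stator_temp_c": 999.0, "active_power_kw": 1000.0},  # out of range, filtered
]
train_clean = [r for r in train_raw if in_range(r, VALID_RANGES)]
scaler = fit_min_max(train_clean, list(VALID_RANGES))
test_scaled = apply_min_max({"stator_temp_c": 90.0, "active_power_kw": 1000.0}, scaler)
```

Note that a test value above the training maximum is scaled to more than 1 (here 1.25 for the stator temperature); this is expected when the scaler is frozen after training, and such values tend to increase the reconstruction error.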

Autoencoder Architecture Selection and Training Process
To determine the optimal architecture of the autoencoder (number of layers and neurons, activation function, etc.), a grid search approach is used: multiple configurations are tested and the one obtaining the lowest reconstruction error is chosen, also taking into account the training time and the complexity of the network. This trial-and-error process is necessary to determine the best structure for the available data; thus, a different dataset could result in a different network structure. The best network layout is a fully connected network composed of six layers, having respectively 7-12-4-12-7 neurons activated by the rectified linear unit (ReLU) function. The mean squared error is used to measure the distance between the input and the output, and the optimization algorithm is "adam".
Having found the best network layout, its predictions on the training data are computed to obtain the distribution of the reconstruction error, which is the difference between the original and the reconstructed data. The assumption that the reconstruction error does not contain systematic errors is verified by analyzing its distribution, which resembles a normal distribution. Using this information, it is possible to determine a critical value to identify anomalies: three standard deviations from the central value are used.
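The three-sigma rule on the training reconstruction error can be sketched as follows. The error values are invented for illustration, and using the mean as the "central value" together with a two-sided test is an assumption, since the source does not specify either.

```python
import statistics

def fit_error_model(train_errors):
    """Return the central value and spread of the training reconstruction errors."""
    return statistics.mean(train_errors), statistics.stdev(train_errors)

def is_anomalous(error, center, spread, n_sigmas=3.0):
    # Flag errors lying more than n_sigmas standard deviations from the center
    return abs(error - center) > n_sigmas * spread

# Hypothetical reconstruction errors obtained on normal training data
train_errors = [0.010, 0.012, 0.011, 0.009, 0.013, 0.010, 0.011, 0.012]
center, spread = fit_error_model(train_errors)
flags = [is_anomalous(e, center, spread) for e in (0.011, 0.050)]
```

The center and spread are estimated once on the training set and then reused unchanged when scoring the test period.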

Alarms' Processing
Alarm data is processed by selecting, from all the alarms available in the dataset, the ones that are most relevant for the generator assembly, such as high temperature, overspeed and overload of the generator or of its auxiliaries, such as cooling fans. The alarm description field of the dataset was analyzed by keywords: terms such as "high-temperature", "error", "warning", "over speed" and "overload" were searched. In this step, expert knowledge played an important role in restricting the initial selection to alarms that represent truly critical conditions, excluding simple communication errors.
Once the list of alarms has been defined, it is possible to count how many times any of the selected alarms has occurred during the period under evaluation. In this research, the authors decided not to assign a different weight to the various alarms and simply counted the occurrences. More refined strategies involving rankings of the alarms, as well as detection of patterns or study of the time separating two consecutive alarms, could be implemented in future studies. According to this indicator, turbines having a higher number of alarms should be prioritized for maintenance.
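The keyword selection and the unweighted alarm count can be sketched as below. The alarm descriptions are invented examples, the keyword tuple is a subset of the terms quoted above, and the simple "generator" substring check stands in for the expert review described in the text.

```python
from collections import Counter

KEYWORDS = ("high-temperature", "over speed", "overload")  # subset of the quoted terms
COMPONENT = "generator"  # hypothetical filter standing in for expert selection

def is_relevant(description):
    text = description.lower()
    return COMPONENT in text and any(keyword in text for keyword in KEYWORDS)

def count_alarms(alarm_log):
    """Count relevant alarms per turbine, with the same weight for every alarm."""
    return Counter(turbine for turbine, description in alarm_log
                   if is_relevant(description))

# Hypothetical alarm log: (turbine, alarm description)
alarm_log = [
    ("WT01", "Generator high-temperature stator"),
    ("WT01", "Generator overload"),
    ("WT02", "Communication error gateway"),
    ("WT02", "Generator cooling fan overload"),
    ("WT03", "Gearbox oil high-temperature"),
]
counts = count_alarms(alarm_log)
```

A weighted variant would replace the `Counter` with a sum of per-alarm weights, which corresponds to the ranking strategies mentioned as future work.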

Indicators' Merging Process
The health predictions are made for the entire period covered by the test set, and the information is aggregated to construct a generator health indicator. Anomalies are summarized at a weekly resolution by comparing the number of anomalies detected in each turbine with that of the rest of the windfarm. The distribution of anomalies within the windfarm is calculated, and turbines lying more than two standard deviations from the central value are considered anomalous. This is done because particular external conditions can lead the entire windfarm to behave anomalously without a real fault occurring in the generator system.
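The windfarm-relative comparison for a single week can be sketched as follows. The weekly counts are invented, and the choice of the mean as central value, of the population standard deviation and of a one-sided test (only turbines with more anomalies than the farm are flagged) are assumptions not stated in the source.

```python
import statistics

def outlier_turbines(weekly_counts, n_sigmas=2.0):
    """Return turbines whose weekly anomaly count exceeds the farm mean by n_sigmas."""
    values = list(weekly_counts.values())
    center = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []
    return sorted(turbine for turbine, count in weekly_counts.items()
                  if count - center > n_sigmas * spread)

# Hypothetical week in which a single turbine stands out
normal_week = {"WT01": 2, "WT02": 3, "WT03": 2, "WT04": 3, "WT05": 2, "WT06": 40}
# Hypothetical week in which the whole farm behaves anomalously
storm_week = {"WT01": 40, "WT02": 41, "WT03": 40, "WT04": 41, "WT05": 40, "WT06": 41}

flagged_normal = outlier_turbines(normal_week)
flagged_storm = outlier_turbines(storm_week)
```

In the second week every turbine shows a high anomaly count, yet none is flagged, which is the behavior the relative comparison is meant to achieve.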
The generator's health indicator is a vector defined in a two-dimensional space. Its components are the processed output of the autoencoder and the count of key alarms per turbine during the period of the analysis; the modulus of the vector is calculated as the Euclidean norm of the two components. A threshold is defined to determine and prioritize the turbines that require maintenance. Alarm data is used directly in the model, hybridizing and complementing the results of the numerical analysis performed with the autoencoder. The generated status vector considers both anomalies in the numerical data and information from the alarm system. Figure 1 summarizes all the steps of the proposed methodology, showing data reception, storage and preprocessing and the predicting algorithm.
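The modulus of the health vector and the threshold-based prioritization can be sketched as below. The component values and the threshold are hypothetical, and the sketch assumes both components are already expressed on comparable scales, which the source does not detail.

```python
import math

def health_indicator(anomaly_score, alarm_count):
    """Euclidean norm of the (anomaly score, alarm count) health vector."""
    return math.hypot(anomaly_score, alarm_count)

def prioritize(turbines, threshold):
    """Return turbines above the threshold, worst first."""
    scored = {name: health_indicator(a, c) for name, (a, c) in turbines.items()}
    flagged = [name for name, score in scored.items() if score > threshold]
    return sorted(flagged, key=lambda name: scored[name], reverse=True)

# Hypothetical components: (processed autoencoder output, key alarm count)
turbines = {"WT01": (3.0, 4.0), "WT02": (0.5, 0.0), "WT03": (6.0, 8.0)}
ranking = prioritize(turbines, threshold=4.0)
```

A turbine can exceed the threshold through either component alone, so a fault that produces anomalies but no alarms can still be reported.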


Results
The methodology has been proven on a fleet of more than 100 wind turbines, from four different manufacturers, located in very different geographical locations ranging from hot climates, such as south of Spain, to colder ones such as Poland and the United Kingdom. While adjustments were required due to the different variables and characteristics of the turbines, the overall methodology was not modified.

KPIs Definition
A brief explanation of the indicators utilized for the presentation of the results is provided in this subsection.
In order to assess the predictive power of the models, the confusion matrix (CM) has been used as the basic unit of evaluation. The CM consists of four labels given to each prediction according to its veracity. In summary, these labels are true positives (TP, a failure occurs when a failure was predicted), false positives (FP, no failure when a failure was predicted), true negatives (TN, no failure when no failure was predicted) and false negatives (FN, failure when no failure was predicted). Using the counts of these basic evaluation units, the main KPIs are calculated.
The main KPIs used in this project are sensitivity, specificity, accuracy, Kappa, precision and F1 score. Sensitivity (or recall) is the ratio of correctly predicted events over the total number of events:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

Specificity is the ratio of correctly predicted negative events over the total number of negative events:

$$\text{Specificity} = \frac{TN}{TN + FP}$$

Accuracy is the ratio of the total number of correctly predicted observations over the total number of observations:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Cohen's Kappa is defined as follows:

$$\kappa = \frac{p_0 - p_e}{1 - p_e}$$

where $p_0$ is the relative observed agreement among raters, which is analogous to accuracy, and $p_e$ is the hypothetical probability of chance agreement, using the observed data to calculate the probabilities of each observer randomly seeing each category. For categories $k$, number of items $N$ and $n_{ki}$, the number of times rater $i$ predicted category $k$, $p_e$ can be calculated as follows:

$$p_e = \frac{1}{N^2} \sum_{k} n_{k1} n_{k2}$$

A value of $\kappa$ close to zero means that there is no agreement among the raters other than what would be expected by chance. A value of $\kappa$ close to one is an indication of good performance of the classifier. Precision is the ratio of correctly predicted events over the total number of positive predictions:

$$\text{Precision} = \frac{TP}{TP + FP}$$

The F1 score is defined as the harmonic mean of precision and recall, and it is typically used to measure the accuracy of a test:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
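The KPIs can be computed from the four confusion matrix counts as in the following sketch. The counts passed in the example are hypothetical and are not taken from the result tables; the function does not guard against empty classes, which would cause a division by zero.

```python
def kpis(tp, fp, tn, fn):
    """Compute the evaluation KPIs from confusion matrix counts."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Chance agreement: product of predicted and actual marginals for each class
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
    }

# Hypothetical confusion matrix counts
scores = kpis(tp=8, fp=2, tn=8, fn=2)
```

In this balanced example the accuracy is 0.8 while Kappa is 0.6, which shows how Kappa discounts the share of agreement that would be expected by chance.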

Autoencoder and Alarms Results
As one of the main goals of this research is to demonstrate the advantages of merging different sources of information, the results of the autoencoder and of an alarm-based predictive system are presented and compared to those obtained using a single predictor made by the fusion of the two individual methods. Table 4 presents the results obtained using the autoencoder as the sole predictor of the generator status. It can be seen that various failures are anticipated, but the rates of FPs and FNs are both quite high. Table 5 shows the results obtained using an alarm-based predictor. The results are not very different from those of the autoencoder: a slightly higher Kappa is achieved by this method and one more TP was found, while the FP rate is almost equal. It is clear that neither of the two techniques, on its own, would be sufficiently reliable in a real-life scenario.

Overall Results
As the results of the individual predictors are not sufficiently good, the authors present a hybrid technique that merges the two systems into a more complete predictor, as detailed in Section 2.3. Table 6 presents a summary of the results. The turbines obtaining the highest values of the health KPIs were reported, and the results table was compiled by comparing the reported turbines with the maintenance log. During the test period, problems such as broken generators, worn generator brushes and generator bearing damage were encountered.
It can be seen that most of the reported turbines were found to have some problems; moreover, the results across the various windfarms are consistent. The accuracy never drops below 70% and the overall Kappa is 56%. The advantages of using a hybrid predictor are clear when its results are compared to those of the autoencoder and alarm predictors: the number of TPs increased substantially and, remarkably, the number of FPs was halved. The two components of the composed predictor are complementary, allowing for more accurate and reliable predictions.
The Receiver Operating Characteristic (ROC) curve is calculated to represent the predictive power of the proposed methodology and its response to adjustments in the cutoff value applied to the health status vector. In Figure 2, the ROC curves of the different windfarms are presented. The cutoff values are adjusted for each windfarm to obtain optimal results.
Figure 3 represents the dataset as a whole, without distinction between the different windfarms, simulating the effect of a unique cutoff value. The two dashed lines define the values of the false positive rate and true positive rate that can be obtained by selecting the optimal cutoff value for each windfarm. It can be seen that fixing a unique threshold value yields good results while being a simpler decision strategy, but in applications where the reliability of the prediction is the key objective, the additional complexity provides better outputs.
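The construction of a ROC curve by sweeping the cutoff over the health indicator can be sketched as follows. The scores and labels are invented, and the function names are illustrative only.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs for decreasing cutoffs."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    # Trapezoidal rule over consecutive ROC points
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical health indicator values and ground truth (1 = generator fault)
scores = [9.0, 7.5, 6.0, 3.0, 2.0, 1.0]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
auc = area_under_curve(points)
```

Running the sweep per windfarm gives one curve per farm, as in Figure 2, while pooling all turbines before the sweep corresponds to the single-cutoff view of Figure 3.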


Normality Model Comparison
An additional validation of the results is presented. A normality model using the same input data is trained and used to make health predictions for the generators. Details on how to build a normality model are available in Reference [4]. The value of the generator stator temperature is predicted by a ridge regression model, and the prediction error is used as a metric for the generator status. Details on the algorithm can be found in Reference [24]; the decision to use this algorithm is dictated by its capacity to deal with multicollinearity in the inputs. The results of the normality model are presented in Table 7. One can see that while the normality model yields reasonable results, it scores lower overall values for the tracked indicators when compared to the presented methodology. In particular, it should be noted that the number of FPs is more than double that of the proposed solution and the total number of TPs is lower. The only case in which the normality model performed better is WF7, where two additional TPs are found.
The presented results were obtained using a large sample of real data. The sample is extremely heterogeneous since it represents four different turbine brands, and the windfarms are located in different geographical locations (Poland, Spain and United Kingdom), characterized by very different climates and wind conditions. Such results are rare in the literature, as many algorithms have been tested either in laboratories or in a reduced sample of turbines.
In Section 4, the detailed analysis of windfarm 5 is presented. This windfarm was chosen since it has a high prevalence of generator failures and two predictions were classified as FP, so it is useful to analyze them in detail to determine the reason why the alarms were raised.

Discussion
The last 9 months of data available were used as a test set. The performance of the autoencoder as an anomaly detector was compared to other algorithms that have been widely used for anomaly detection tasks. Isolation forest and one-class support vector machine were tested. Details on these algorithms can be found in References [25,26].
The same post-processing methodology was applied to all algorithms. Results are presented in Figure 4. Three risk-areas were identified based on the generator's health indicator value distribution. Table 8 provides the information to assess the accuracy of the predictions, and major component replacements that took place during the testing phase are reported.

Figure 4. Comparison of the results obtained by the three implemented algorithms hybridized with alarm information. The higher the distance from the origin, the worse the condition of the generator. Three areas are identified according to the health status: healthy (green), warning (yellow) and danger (red). The shape indicates the presence and type of fault that occurred.
Table 8. Principal maintenance interventions that occurred during the testing phase.

| Turbine | Maintenance Description | Component |
| WT13 | Bearing High Speed Shaft replacement | Gearbox-Generator |
| WT11 | Generator brushes replaced | Generator |
| WT11 | Generator bearing Non-Drive End replaced | Generator |
| WT10 | Generator bearing Non-Drive End replaced | Generator |
| WT08 | Generator bearing Non-Drive End replaced | Generator |
| WT07 | Generator brushes replaced | Generator |

All three algorithms, when merged with alarm information, are able to satisfactorily isolate faulty turbines from the rest. The autoencoder is selected as the algorithm of choice to analyze SCADA data, since it is better able to diagnose faulty turbines even in the absence of alarm data, as in the case of turbine WT13. Moreover, the autoencoder better identifies the high-speed shaft bearing fault, whereas isolation forest could not separate it sufficiently and one-class Support Vector Machine (SVM) positioned it on the frontier between the warning and safe areas. The ability to identify various failure modes holds large relevance in the selection of the algorithm. Analyzing the results of the autoencoder, it can be noticed that most of the turbines in the danger (red) and warning (yellow) areas required replacement of the bearings or brushes of the generator. None of the turbines located in the healthy (green) area required maintenance.
A detailed study of the data of WT09 and WT12 was carried out because of their high anomaly count and the absence of maintenance interventions. The distributions of all input signals and of some other key generator variables were reviewed thoroughly to understand why the autoencoder found these turbines to be anomalous. The relationships most relevant to generator failure are presented and discussed here.
Figure 5 shows the probability density of the temperature difference across the two sides of the generator bearing of turbine WT09, compared with the mean of the windfarm, together with the characteristic curve of this temperature difference with respect to nominal power. The behavior of turbine WT09 is markedly different from that of the rest of the windfarm. These considerations lead us to categorize this prediction as an early fault alert for the generator bearing rather than a false alarm.
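A comparison of this kind, between one turbine and the windfarm, can be sketched with a normalized histogram and a simple shift measure. The temperature values, bin edges and function names below are hypothetical and chosen only to make the example self-contained; the study does not state how the densities in the figure were computed.

```python
from statistics import mean, stdev


def shift_in_fleet_sigmas(turbine_values, fleet_values):
    """Offset of the turbine mean from the fleet mean, in fleet standard deviations."""
    return (mean(turbine_values) - mean(fleet_values)) / stdev(fleet_values)


def density_histogram(values, edges):
    """Normalized histogram: bar areas sum to 1 for values inside the edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last_bin = i == len(counts) - 1
            # Bins are half-open, except the last one, which includes its right edge.
            if edges[i] <= v < edges[i + 1] or (is_last_bin and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / (total * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]


# Made-up bearing temperature differences in degrees Celsius.
fleet_dt = [4, 5, 6, 5, 4, 6, 5, 5]
turbine_dt = [9, 10, 11, 10]
edges = [0, 4, 8, 12]

shift = shift_in_fleet_sigmas(turbine_dt, fleet_dt)
fleet_density = density_histogram(fleet_dt, edges)
turbine_density = density_histogram(turbine_dt, edges)

print(f"shift = {shift:.2f} fleet standard deviations")
print("fleet density:  ", fleet_density)
print("turbine density:", turbine_density)
```

A large shift combined with densities concentrated in different bins is the kind of evidence that would support treating a flagged turbine as an early fault candidate rather than a false alarm.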
Appl. Sci. 2020, 10, x FOR PEER REVIEW 12 of 15

Figure 5. On the left, the characteristic curve relating active power to the temperature difference between the two sides of the generator bearing. On the right, the probability density of the temperature difference across the generator bearing. Data from turbine WT09 are shown in red and the windfarm mean in black.

Figure 6 shows that turbine WT12 is characterized by an anomalous distribution of the generator stator temperature: the standard deviation of its recorded values is larger than that of the windfarm, meaning that the generator of this turbine is subjected to less stable operating conditions. This case can also be considered anomalous and worthy of a technical review of the generator.

Merging the alarm information with the anomalies provides a more comprehensive picture of generator health. The plots show that alarms isolate most of the faulty turbines. However, there are also cases in which few or no alarms were raised and the turbine was nonetheless found to be faulty. The problems of WT08 are detected mainly by the alarm counter, whereas WT13 is diagnosed purely by the anomaly count; the remaining faults are found by a mix of the two information sources. Ultimately, merging the information from alarms and SCADA data proved a rewarding strategy that better separates turbines according to their health status while using available and easily accessible data.
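To make the idea of "which source detected the fault" concrete, the short sketch below labels a turbine according to the share of its combined score that comes from alarms. The normalized scores, the margin of 0.2 and the function name are assumptions introduced for illustration; the study does not define such a rule.

```python
def dominant_source(anomaly_score, alarm_score, margin=0.2):
    """Label the information source that contributes most to a detection.

    Both scores are assumed to be normalized to the range [0, 1].
    """
    total = anomaly_score + alarm_score
    if total == 0:
        return "none"
    alarm_share = alarm_score / total
    if alarm_share >= 0.5 + margin:
        return "alarms"
    if alarm_share <= 0.5 - margin:
        return "anomalies"
    return "both"


# Hypothetical (anomaly score, alarm score) pairs, for illustration only.
examples = {
    "alarm-driven": (0.1, 0.9),
    "anomaly-driven": (0.8, 0.0),
    "mixed": (0.5, 0.6),
    "quiet": (0.0, 0.0),
}

for label, (anomaly_score, alarm_score) in examples.items():
    print(label, "->", dominant_source(anomaly_score, alarm_score))
```

A rule like this is only a reading aid for the scatter plot: it does not change the detection itself, but it shows at a glance whether a flagged turbine would have been missed by one of the two sources alone.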

Conclusions
A hybrid fault detection system based on SCADA alarm logs and an anomaly detection autoencoder was presented and validated on a fleet of more than 100 wind turbines from four different manufacturers, located in different parts of Europe. Real operating data were used, and most of the raised alarms corresponded to generator problems that required replacement of the component or of some of its parts (bearings, brushes).
A detailed explanation of the most critical windfarm was presented to show how the methodology can be applied in practice and the kind of analyses that were carried out to corroborate the results.
It has been observed that the alarm counter is a valid tool to distinguish faulty turbines from healthy ones. That being said, the alarm counter alone cannot anticipate all failures. The fusion of anomalies and alarms information complements the individual approaches, providing a more reliable system.
All five failures that occurred during the test phase were correctly detected. Of the two "false positive" predictions that were obtained, detailed analyses suggested that they are likely early fault detections, rather than errors. Ultimately, this methodology provides windfarm operators a reliable tool to assess the health of generators and improve operation and maintenance of the turbines.
The results of the autoencoder as an anomaly detector were compared with other common algorithms in the literature, namely isolation forest and one-class support vector machine. The results showed that, while the other two algorithms provide acceptable results, autoencoders are more confident in their predictions in cases where alarm information offers little help in separating faulty from healthy turbines. Autoencoders, having more tunable parameters and allowing for more elaborate structures, are better able to model non-linear data, such as that of a turbine's generator. Additionally, the overall methodology was tested against a normality model, and the results clearly showed that the proposed solution ranks better for all the tracked statistics.
This research presents a novel methodology that uses data analysis techniques for anomaly detection and consolidates the results by merging the anomaly predictions with information from the alarm system. The large size and diversity of the dataset support the approach as a general solution that can work well in real-life conditions and is not limited to a niche of turbines.
Different network architectures, such as those that include temporal information and denoising autoencoders, should be explored in future research to improve the accuracy of the system. The interpretability of results is a key aspect that requires further improvement to ensure acceptance of this methodology in the market.