On Real-time Fault Detection in Wind Turbines: Sensor Selection Algorithm and Detection Time Reduction Analysis

In this paper, we address the problem of real-time fault detection in wind turbines. Starting from a data-driven fault detection method, the contribution of this paper is twofold. First, a sensor selection algorithm is proposed with the goal of reducing the computational effort of the fault detection method. Second, an analysis is performed to reduce the data acquisition time needed by the fault detection method, that is, to reduce the fault detection time. The proposed methods are tested on a benchmark wind turbine in which different actuator and sensor failures are simulated. The results demonstrate the performance and effectiveness of the proposed algorithms, which dramatically reduce the number of sensors and the fault detection time.


Introduction
Wind energy, in contrast to burning fossil fuels, is a clean and inexhaustible renewable energy source. It is on the way to becoming a leading electricity-generating technology in Europe. There is now 142 GW of installed wind power capacity, with 131 GW onshore and 11 GW offshore [1]. Wind turbines have become the largest rotating machines on earth, while plans for future turbines show even larger diameters, up to 200 m (10 to 15 MW), and research is needed to cope with the many challenges that technology upscaling implies [2]. One of the key challenges is to make wind energy one of the most cost-efficient energy sources. Avoiding unexpected failures seems to be the crux of the matter because of their impact on downtime and thus on the cost of energy [3]. Continuous monitoring of wind turbine health using automated failure detection algorithms can improve turbine reliability and reduce maintenance costs by detecting failures before they reach a catastrophic stage and by eliminating unnecessary scheduled maintenance.
Nowadays, a tremendous amount of information is available to support the operation of wind turbines. Through the use of lidars, wind data become available both upstream of and within the turbines. All turbines continuously collect large amounts of data from hundreds of sensors, ranging from gearbox oil temperature to stresses in the blade root. All control actions use sensor data as inputs, with increasing resolution and complexity. Condition monitoring relies on clever processing of these data. Proper selection and reduction of the amount of data is of the utmost importance [3].
Traditionally, condition monitoring systems for wind turbines have focused on the detection of failures in the main bearing, generator, and gearbox, some of the highest-cost components of a wind turbine. Two widely used methods are vibration analysis and oil monitoring [4]. These are standalone systems that require the installation of dedicated sensors and hardware. In this paper, the fault detection strategy presented in [5] is used. It uses online SCADA (supervisory control and data acquisition) data already available at an industrial wind turbine to provide advance warning of failures.
In this work, sensor selection is first investigated based on principal component analysis. The goal is to select a reduced number of sensors to be used by the fault detection method. From a practical point of view, a reduced number of sensors installed in the wind turbine leads to reduced installation and maintenance costs. In addition, from a computational point of view, fewer sensors imply less computational effort. Because of these practical issues (hardware costs, storage and physical space limitations, and computational and communication burden), sensor selection is currently being studied and applied in several research areas. For instance, Chepuri and Leus [6] propose convex relaxation techniques to select the best subset of sensors that guarantees a prescribed performance. Similarly, Bai et al. [7] suggest a sensor selection schedule to minimize the observation cost of a sequential probability ratio test (SPRT). In a more specific application, Miao et al. [8] investigated the performance of combinations of several metal-oxide sensors for discriminating among a set of ginsengs. Closer to the present work, in the field of wind turbines, Wang et al. [9] review and apply standard techniques proposed by Jolliffe [10] for sensor selection in wind turbine condition monitoring.
The contribution of this paper is twofold. On the one hand, a sensor selection algorithm based on principal component analysis (PCA) is used to select the sensors that best separate the healthy and the faulty wind turbine for fault detection purposes, thus reducing the computational and communication effort. On the other hand, a reduction in the data acquisition time needed by the fault detection method is investigated. In this second case, the goal is to reduce the fault detection time.
To test the proposed sensor selection algorithm as well as the fault detection time reduction, we used data from simulations with the comprehensive wind turbine simulator FAST (fatigue, aerodynamics, structures, and turbulence) for a 5 MW wind turbine [11]. Different actuator and sensor failures are simulated following the benchmark model proposed in [12]. In this benchmark challenge, a more sophisticated wind turbine model (using FAST) and updated fault scenarios are presented. This higher-fidelity model also allows the use of more realistic wind inputs that vary spatially across the rotor plane in addition to temporally.
This paper is organized as follows. In Section 2, the reference wind turbine, the wind model, and the generator-converter and pitch actuator models are recalled. The fault scenarios are described in Section 3. Next, in Section 4, the proposed sensor selection algorithm is presented, together with results showing the effect on the fault detection strategy of using the reduced number of sensors. Thereafter, in Section 5, an analysis is performed to reduce the number of time instants needed by the fault detection method, thus reducing the fault detection time. Then, Section 6 analyzes the performance of the fault detection strategy when a reduced number of sensors is used together with a reduced fault detection time. Finally, conclusions are drawn in Section 7.

Reference Wind Turbine
The National Renewable Energy Laboratory (NREL) offshore 5-MW baseline wind turbine [13] is used in the simulations. This model is used as a reference by research teams throughout the world to standardize baseline offshore wind turbine specifications and to quantify the benefits of advanced land- and sea-based wind energy technologies. In this work, the wind turbine is operated in its onshore version and in the above-rated wind speed range. The main properties of this turbine are listed in Table 1.
Table 1. Gross properties of the wind turbine [13].
In this work, the proposed fault detection method is based on SCADA data, that is, it uses data already collected by the wind turbine controller. In particular, Table 2 presents the data assumed to be available on a MW-scale commercial wind turbine, which are used in this work by the fault detection method. The reference wind turbine has a conventional variable-speed, variable blade-pitch-to-feather configuration. In such wind turbines, the conventional approach for controlling power-production operation relies on the design of two basic control systems: a generator-torque controller and a rotor-collective blade-pitch controller. In this work, the baseline torque and pitch controllers are used, but the generator-converter and the pitch actuators are modeled and implemented externally, i.e., apart from the embedded FAST code. This facilitates modeling different types of faults in the generator and the pitch actuators. The next subsections recall these models as well as the wind model used to generate the wind data.

Wind Modeling
The TurbSim stochastic inflow turbulence tool (National Wind Technology Center, Boulder, CO, USA) [14] has been used.It provides the ability to drive design code (e.g., FAST) simulations of advanced turbine designs with simulated inflow turbulence environments that incorporate many of the important fluid dynamic features known to adversely affect turbine aeroelastic response and loading.
The generated wind data have the following characteristics: Kaimal turbulence model with turbulence intensity set to 10%, logarithmic wind profile, mean wind speed of 18.2 m/s at hub height, and surface roughness length of 0.01 m.
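The logarithmic profile mentioned above scales the hub-height mean speed to other heights. The sketch below uses the neutral-stability log law $u(z) = u_{hub}\ln(z/z_0)/\ln(z_{hub}/z_0)$; the 90 m hub height and 126 m rotor diameter are assumptions taken from the NREL 5-MW reference specification (Table 1 is not reproduced here), and TurbSim's own profile options may include additional atmospheric-stability terms.

```python
import math

def log_profile(z, u_hub, z_hub=90.0, z0=0.01):
    """Mean wind speed (m/s) at height z (m) from the neutral logarithmic law,
    scaled so that u(z_hub) = u_hub. z_hub = 90 m is an assumed hub height."""
    if z <= z0:
        raise ValueError("height must exceed the roughness length z0")
    return u_hub * math.log(z / z0) / math.log(z_hub / z0)

# Assumed 126 m rotor around a 90 m hub: blade tips sweep from 27 m to 153 m.
for z in (27.0, 90.0, 153.0):
    print(f"z = {z:5.1f} m -> mean speed {log_profile(z, u_hub=18.2):.2f} m/s")
```

The profile only sets the mean shear across the rotor; the turbulent fluctuations (Kaimal spectrum, 10% intensity) are superimposed on it by TurbSim.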
In this work, each simulation is run with a different wind data set.

Generator-Converter Actuator Model and Pitch Actuator Model
The generator-converter and the pitch actuators are modeled apart from the embedded FAST code in order to ease the modeling of different types of faults in these parts of the wind turbine.
On the one hand, the generator-converter can be modeled by a first-order differential system [12]:
$$\dot{\tau}_r(t) = -\alpha_{gc}\,\tau_r(t) + \alpha_{gc}\,\tau_c(t),$$
where $\tau_r$ and $\tau_c$ are the real generator torque and its reference (given by the controller), respectively, and we set $\alpha_{gc} = 50$ [13]. The power produced by the generator, $P_e(t)$, can be modeled by [12]:
$$P_e(t) = \eta_g\,\omega_g(t)\,\tau_r(t),$$
where $\eta_g$ is the efficiency of the generator and $\omega_g$ is the generator speed. In the numerical experiments, $\eta_g = 0.98$ is used [12].
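As a hedged illustration of the generator-converter model and the power equation above, the following sketch discretizes the first-order lag exactly over one sampling period, assuming a zero-order hold on the torque reference and the 80 Hz sampling rate used later in the paper. The torque step and the generator speed in the usage lines are hypothetical values, not benchmark data.

```python
import math

ALPHA_GC = 50.0        # converter parameter alpha_gc (1/s), as in the text
ETA_G = 0.98           # generator efficiency, as in the text
DT = 1.0 / 80.0        # sampling time (s), 80 Hz as used later in the paper

def simulate_converter(tau_ref, tau0=0.0, alpha=ALPHA_GC, dt=DT):
    """Exact discretization of d(tau_r)/dt = -alpha*tau_r + alpha*tau_c, holding
    tau_c constant over each sampling period; returns tau_r at each sample."""
    a = math.exp(-alpha * dt)
    tau_r, out = tau0, []
    for tau_c in tau_ref:
        out.append(tau_r)
        tau_r = a * tau_r + (1.0 - a) * tau_c
    return out

def electrical_power(tau_r, omega_g, eta=ETA_G):
    """P_e = eta_g * omega_g * tau_r (W when tau_r is in N m and omega_g in rad/s)."""
    return eta * omega_g * tau_r

tau_c = [40e3] * 80                                         # hypothetical 1 s torque step (N m)
tau_r = simulate_converter(tau_c)
p_e = [electrical_power(t, omega_g=120.0) for t in tau_r]   # hypothetical speed (rad/s)
print(f"tau_r after 1 s: {tau_r[-1]:.1f} N m, P_e: {p_e[-1] / 1e6:.3f} MW")
```

With $\alpha_{gc} = 50$ 1/s the time constant is 20 ms, so the real torque tracks the reference within a few samples at 80 Hz.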
On the other hand, each of the three pitch actuators is modeled as a second-order linear differential equation relating the pitch angle $\beta_i(t)$ to its reference $u(t)$ (given by the collective-pitch controller) [12]:
$$\ddot{\beta}_i(t) + 2\xi\omega_n\dot{\beta}_i(t) + \omega_n^2\beta_i(t) = \omega_n^2 u(t),$$
where $\omega_n$ and $\xi$ are the natural frequency and the damping ratio, respectively. In the fault-free case, these values are set to $\omega_n = 11.11$ rad/s and $\xi = 0.6$.
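The pitch actuator model can be integrated numerically to see how $\omega_n$ and $\xi$ shape the response. This is a minimal sketch using a fixed-step fourth-order Runge-Kutta scheme (a choice made here, not taken from [12]); the 1° step reference and the 1 ms step size are hypothetical. It compares the fault-free parameters with those of the slow hydraulic leakage described in Section 3.

```python
def simulate_pitch(u, t_end, omega_n=11.11, xi=0.6, dt=1e-3):
    """Integrate beta'' + 2*xi*omega_n*beta' + omega_n**2 * beta = omega_n**2 * u(t)
    from rest with classical RK4; returns beta sampled every dt seconds."""
    def deriv(t, beta, dbeta):
        return dbeta, omega_n ** 2 * (u(t) - beta) - 2.0 * xi * omega_n * dbeta

    beta, dbeta, t = 0.0, 0.0, 0.0
    trace = [beta]
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(t, beta, dbeta)
        k2 = deriv(t + dt / 2, beta + dt / 2 * k1[0], dbeta + dt / 2 * k1[1])
        k3 = deriv(t + dt / 2, beta + dt / 2 * k2[0], dbeta + dt / 2 * k2[1])
        k4 = deriv(t + dt, beta + dt * k3[0], dbeta + dt * k3[1])
        beta += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        dbeta += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
        trace.append(beta)
    return trace

def step(t):
    """Hypothetical 1 deg pitch reference step applied at t = 0."""
    return 1.0

healthy = simulate_pitch(step, t_end=2.0)
leakage = simulate_pitch(step, t_end=2.0, omega_n=3.42, xi=0.9)
print(f"beta(0.2 s): fault-free {healthy[200]:.3f} deg, leakage {leakage[200]:.3f} deg")
```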

Fault Description
In this paper, the different faults proposed in the fault tolerant control benchmark [15] are considered, as gathered in Table 3. The faults selected by the benchmark cover different parts of the wind turbine, different fault types and classes, and different levels of severity. Pitch systems usually use either an electric or a fluid power actuator. The fluid power subsystem, however, has lower failure rates and a better capability of handling extreme loads than electrical systems. Therefore, fluid power pitch systems are preferred on multi-MW and offshore turbines. Nevertheless, general issues such as leakage, contamination, component malfunction, and electrical faults make current systems work suboptimally [16]. In this work, faults in the pitch actuator are considered in the hydraulic system; they result in changed dynamics due to either a high air content in the oil (fault 1) or a drop in pressure in the hydraulic supply system due to pump wear (fault 2) or hydraulic leakage (fault 3) [17]. Pitch position sensor faults (faults 5-7) are considered as well.
Pump wear (fault 2) is an irreversible process, developing slowly over the years, that results in low pump pressure. As this wear is irreversible, the only possible fix is to replace the pump, which happens once pump wear reaches a certain level. Meanwhile, the pump keeps operating and the system dynamics change slowly, while the turbine structure should be able to withstand the effects of this fault. Pump wear after approximately 20 years of operation might reduce the pressure to 75% of the rated pressure, which is reflected by the faulty natural frequency $\omega_n = 7.27$ rad/s and the faulty damping ratio $\xi = 0.75$.
Hydraulic leakage (fault 3) is another irreversible incipient fault, but it develops considerably faster than pump wear. Leakage of pitch cylinders can be internal or external [16]. When this fault reaches a certain level, system repair is necessary. If the leakage is too fast (normally due to external leakage), it leads to a pressure drop, and a preventive procedure is deployed to shut down the turbine before the blade gets stuck in an undesired position: if the hydraulic pressure is too low, the hydraulic system cannot move the blades, so the actuator remains stuck in its current position and the blade seizes. The fast pressure drop is easy to detect (even visually, as it is normally related to external leakage) and requires immediate reaction. The slow hydraulic leakage, however, slows down the dynamics of the pitch system; for a reduction to 50% of the nominal pressure, the natural frequency under this fault condition is reduced to $\omega_n = 3.42$ rad/s and the corresponding damping ratio is $\xi = 0.9$. In this work, the slow (internal) hydraulic leakage is studied.
In contrast to pump wear and hydraulic leakage, high air content in the oil (fault 1) is an incipient reversible process, which means that the air content in the oil may decrease again without any repair to the system. The nominal air content in the oil is 7%, whereas high air content corresponds to 15%. The effect of such a fault is expressed by the new natural frequency $\omega_n = 5.73$ rad/s and the damping ratio $\xi = 0.45$.
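To make the effect of faults 1-3 on the pitch dynamics concrete, the standard step-response formulas of an underdamped second-order system can be evaluated for each $(\omega_n, \xi)$ pair given above. This sketch assumes a unit step reference; the $4/(\xi\omega_n)$ settling time is the usual 2% approximation rather than an exact value.

```python
import math

# (omega_n in rad/s, xi) of the pitch actuator, as given in the text.
PITCH_PARAMS = {
    "fault-free": (11.11, 0.60),
    "fault 1 (high air content)": (5.73, 0.45),
    "fault 2 (pump wear)": (7.27, 0.75),
    "fault 3 (hydraulic leakage)": (3.42, 0.90),
}

def step_characteristics(omega_n, xi):
    """Unit-step figures of an underdamped (0 < xi < 1) second-order system."""
    root = math.sqrt(1.0 - xi ** 2)
    overshoot = math.exp(-xi * math.pi / root)     # fractional peak overshoot
    t_peak = math.pi / (omega_n * root)            # time of the first peak (s)
    t_settle = 4.0 / (xi * omega_n)                # approximate 2% settling time (s)
    return overshoot, t_peak, t_settle

chars = {name: step_characteristics(*p) for name, p in PITCH_PARAMS.items()}
for name, (mp, tp, ts) in chars.items():
    print(f"{name:28s} overshoot {100 * mp:5.1f}%  t_peak {tp:.3f} s  t_settle ~{ts:.2f} s")
```

All three hydraulic faults lengthen the settling time with respect to the fault-free actuator, and the high air content additionally roughly doubles the overshoot.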
The generator speed is measured using encoders. The gain factor fault (fault 4) is introduced when the encoder reads more marks on the rotating part than are actually present, which can happen as a result of dirt or other false markings on the rotating part.
Faults in the pitch position measurement (pitch position sensor faults) are also considered. These are among the most important failure modes found in actual systems [16,18]. The origin of these faults is either electrical or mechanical, and they can result in either a fixed value (faults 5 and 6) or a changed gain factor (fault 7) in the measurements. In particular, the fixed-value fault should be easy to detect, and, therefore, it is important that a fault detection, isolation, and accommodation scheme be able to deal with it. If not handled correctly, these faults will influence the pitch reference position, because the pitch controller is based on these pitch position measurements.
Finally, a converter torque offset fault is considered (fault 8). This fault is difficult to detect internally (by the electronics of the converter controller). However, at the wind turbine level, it can be detected, isolated, and accommodated because it changes the torque balance in the wind turbine power train.
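Faults 4-8 act on measurements or on the torque reference by changing a gain, freezing a value, or adding an offset. The helpers below are a hypothetical sketch of these three fault classes; the magnitudes and onset index in the usage lines are illustrative assumptions, not the benchmark's fault parameters.

```python
def gain_fault(values, gain):
    """Gain factor fault (faults 4 and 7): the measurement is scaled."""
    return [gain * v for v in values]

def fixed_value_fault(values, value):
    """Fixed-value fault (faults 5 and 6): the sensor output is frozen."""
    return [value for _ in values]

def offset_fault(values, offset):
    """Offset fault (fault 8): a constant bias is added to the converter torque."""
    return [v + offset for v in values]

def inject(signal, fault, start, **params):
    """Apply `fault` from sample index `start` onward; earlier samples are unchanged."""
    return list(signal[:start]) + fault(list(signal[start:]), **params)

# Hypothetical pitch measurements (deg) with the sensor frozen at 5 deg from sample 2 on.
pitch = [10.0, 10.5, 11.0, 11.5]
print(inject(pitch, fixed_value_fault, start=2, value=5.0))
```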

Sensor Selection
The goal of this section is to present a method to select a reduced number of sensors to be used in the fault detection method. Classical approaches to sensor or variable selection can be summarized with the following example. Let us assume that we have N sensors or variables that are measuring during (L − 1)∆ seconds, where ∆ is the sampling time and L ∈ ℕ. The discretized measurements of each sensor can be arranged as a column vector $x^i = (x^i_1, x^i_2, \ldots, x^i_L)^T$, $i = 1, \ldots, N$, so that we can build the L × N matrix
$$X = \begin{pmatrix} x^1_1 & x^2_1 & \cdots & x^N_1 \\ x^1_2 & x^2_2 & \cdots & x^N_2 \\ \vdots & \vdots & \ddots & \vdots \\ x^1_L & x^2_L & \cdots & x^N_L \end{pmatrix} \in \mathcal{M}_{L\times N}(\mathbb{R}). \qquad (2)$$
It is worth noting that each column of matrix X in Equation (2) represents the measurements of a single sensor or variable. In general, when principal component analysis is applied and a large number of variables or sensors is available, the results usually change only slightly if just a subset of the sensors is used [10]. Consequently, a simple approach is to calculate the subset of σ sensors that maximizes the multiple correlation of the N − σ non-selected sensors with respect to the σ selected sensors. A similar approach, based on principal component analysis (PCA) and also used in the field of feature extraction, is to compute the first principal components and observe the coefficients of the corresponding eigenvectors. More precisely, if $(\alpha_1, \ldots, \alpha_N)^T$ is the unit eigenvector related to the largest eigenvalue, the sensor associated with the coefficient of smallest magnitude, $\min_{i=1,\ldots,N} |\alpha_i|$, can be neglected. A comprehensive list of methods for deciding which variables or sensors to reject can be found in [10]. However, when multiway principal component analysis is applied to data coming from N sensors at L discretization instants and n experimental trials, the information can be stored in an unfolded n × (N × L) matrix as follows:
$$X = \begin{pmatrix} x^{(1)}_{11} & \cdots & x^{(1)}_{1L} & x^{(2)}_{11} & \cdots & x^{(2)}_{1L} & \cdots & x^{(N)}_{11} & \cdots & x^{(N)}_{1L} \\ \vdots & & \vdots & \vdots & & \vdots & & \vdots & & \vdots \\ x^{(1)}_{n1} & \cdots & x^{(1)}_{nL} & x^{(2)}_{n1} & \cdots & x^{(2)}_{nL} & \cdots & x^{(N)}_{n1} & \cdots & x^{(N)}_{nL} \end{pmatrix} \in \mathcal{M}_{n\times(N\cdot L)}(\mathbb{R}). \qquad (3)$$
In this case, a column of matrix X in Equation (3) no longer represents the values of a variable at different time instants, but the measurements of a variable at one particular time instant across the whole set of experimental trials. Consequently, even though PCA can be applied to this kind of matrix as a way to reduce the dimensionality of the data and to create a new coordinate space where the data are best represented, the eigenvalues and eigenvectors of the covariance matrix $C_X = \frac{1}{n-1} X^T X$ cannot be directly used to infer which variables or sensors could be neglected. In addition, we are interested not only in the sensors that best model the healthy wind turbine, but in the sensors that best discriminate the faulty wind turbine. This is one of the main differences between the work of Wang et al. [9] and the strategy presented here: while in [9] the authors use principal component analysis to reduce the number of inputs (sensors) needed to model the active and reactive powers of a wind turbine as a multiple-input multiple-output linear system, in this paper PCA is used to find the sensors that best separate the healthy and the faulty wind turbine for fault detection purposes.
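The unfolding in Equation (3) corresponds to reshaping a three-way array indexed by (trial, sensor, time) so that each row stacks the L samples of sensor 1, then sensor 2, and so on. The NumPy sketch below, with arbitrarily chosen toy sizes, checks this column layout and that $\frac{1}{n-1}X^TX$ of the column-centered matrix matches NumPy's covariance with rows as observations.

```python
import numpy as np

n, N, L = 3, 2, 4                          # toy sizes: trials, sensors, time instants
rng = np.random.default_rng(0)
data = rng.normal(size=(n, N, L))          # data[i, s, j]: trial i, sensor s, time j

# Row i holds sensor 1 at times 1..L, then sensor 2 at times 1..L, ... (Equation (3)).
X = data.reshape(n, N * L)

# Column s*L + j collects sensor s at time j over all trials (0-based indices).
s, j = 1, 2
assert np.array_equal(X[:, s * L + j], data[:, s, j])

# Covariance of the columns: center each column, then divide by n - 1 (n rows).
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / (n - 1)
assert np.allclose(C, np.cov(X, rowvar=False))
print(X.shape, C.shape)
```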
The overall strategy to select the best subset of sensors to discriminate between the healthy and the faulty wind turbine is to create a multiway PCA model from measurements of a healthy wind turbine. With this model, and for each fault scenario, we measure the Euclidean distance between the arithmetic mean of the projections onto the PCA model of the data coming from the healthy wind turbine and the mean of the projections of the data coming from the faulty one. The subset of sensors with the maximum distance between the means of each pair of projections is selected. The detailed algorithmic procedure is described in the next subsection.

Sensor Selection Algorithm
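Sensor selection can essentially be viewed as a combinatorial problem involving some kind of performance criterion over all possible options. In the following algorithm, some parameters must be selected, such as the cardinality N of the initial set of variables or sensors, the number σ of sensors to be combined, and the number ℓ of principal components:
1. Consider a set $S = \{s_1, s_2, \ldots, s_{13}\}$ of N = 13 sensors as in Table 2.
2. Consider a number of sensors σ to be combined, σ = 2, ..., N.
Arrange the collected data coming from the σ sensors in a matrix $Y_h \in \mathcal{M}_{n_h\times(\sigma\cdot L)}(\mathbb{R})$ as follows:
$$Y_h = \begin{pmatrix} y^{(1)}_{11} & \cdots & y^{(1)}_{1L} & \cdots & y^{(\sigma)}_{11} & \cdots & y^{(\sigma)}_{1L} \\ \vdots & & \vdots & & \vdots & & \vdots \\ y^{(1)}_{n_h 1} & \cdots & y^{(1)}_{n_h L} & \cdots & y^{(\sigma)}_{n_h 1} & \cdots & y^{(\sigma)}_{n_h L} \end{pmatrix}.$$
Perform a sensor-based group scaling and project the data onto the principal component space using the matrix product $T_h = Y_h P \in \mathcal{M}_{n_h\times\ell}(\mathbb{R})$. Define $t^i_h \in \mathbb{R}^\ell$, $i = 1, \ldots, n_h$, as the row vectors of matrix $T_h$. Note that $n_h$ is a natural number not necessarily equal to n in Step 4.
9. Measure, from a faulty wind turbine, sensors $s_{(1)}, s_{(2)}, \ldots, s_{(\sigma)}$ during $(n_f L - 1)\Delta$ seconds.
Arrange the collected data coming from the σ sensors in a matrix $Y_f \in \mathcal{M}_{n_f\times(\sigma\cdot L)}(\mathbb{R})$, in the same way as $Y_h$.
Perform a sensor-based group scaling and project the data onto the principal component space using the matrix product $T_f = Y_f P \in \mathcal{M}_{n_f\times\ell}(\mathbb{R})$. Define $\mu_f$ as the mean vector of the row vectors $t^i_f$, $i = 1, \ldots, n_f$, of $T_f$ and, analogously, $\mu_h$ as the mean vector of the $t^i_h$. The selected combination maximizes the Euclidean distance between these means,
$$\kappa_\sigma = \arg\max_{\kappa} \left\lVert \mu_h^{\kappa} - \mu_f^{\kappa} \right\rVert_2,$$
where $\mu_h^{\kappa}$ and $\mu_f^{\kappa}$ are the means obtained for the κ-th combination $S^\sigma_\kappa$ of σ sensors. Therefore, given a particular fault scenario, the σ sensors in the set $S^\sigma_{\kappa_\sigma}$ are the sensors that best separate the data coming from the healthy wind turbine from the data coming from the faulty one.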
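A compact sketch of the selection procedure described above: for every σ-sensor combination, build a group-scaled PCA model from healthy data, project healthy and faulty data onto its first ℓ components, and keep the combination whose mean projections are farthest apart in Euclidean distance. The function names, the 0-based sensor indices, and the synthetic data are assumptions for illustration only; the faulty data set is built as a shifted copy of the healthy one so that the expected outcome is easy to verify.

```python
import itertools
import numpy as np

def unfold(D, combo):
    """(trials, sensors, L) -> (trials, len(combo) * L), sensor blocks side by side."""
    sub = D[:, list(combo), :]
    return sub.reshape(sub.shape[0], -1)

def fit_group_scaling(X, sigma, L):
    """Column means, plus one standard deviation per sensor block of L columns."""
    mean = X.mean(axis=0)
    Z = X - mean
    std = np.array([Z[:, s * L:(s + 1) * L].std() for s in range(sigma)])
    return mean, np.repeat(std, L)

def pca_model(Z, ell):
    """First `ell` eigenvectors of the covariance matrix, by decreasing eigenvalue."""
    C = Z.T @ Z / (Z.shape[0] - 1)
    eigval, eigvec = np.linalg.eigh(C)          # eigh returns ascending eigenvalues
    return eigvec[:, np.argsort(eigval)[::-1][:ell]]

def select_sensors(model_data, healthy, faulty, sigma, ell):
    """Return the sigma-sensor combination whose healthy/faulty mean projections are farthest apart."""
    L = model_data.shape[2]
    best, best_dist = None, -np.inf
    for combo in itertools.combinations(range(model_data.shape[1]), sigma):
        X = unfold(model_data, combo)
        mean, scale = fit_group_scaling(X, sigma, L)
        P = pca_model((X - mean) / scale, ell)
        mu_h = (((unfold(healthy, combo) - mean) / scale) @ P).mean(axis=0)
        mu_f = (((unfold(faulty, combo) - mean) / scale) @ P).mean(axis=0)
        dist = np.linalg.norm(mu_h - mu_f)
        if dist > best_dist:
            best, best_dist = combo, dist
    return best, best_dist

# Synthetic example (not wind turbine data): 4 sensors, L = 10 time instants.
rng = np.random.default_rng(1)
model_data = rng.normal(size=(40, 4, 10))
healthy = rng.normal(size=(30, 4, 10))
faulty = healthy.copy()
faulty[:, 2, :] += 3.0                          # hypothetical offset fault on sensor index 2
best, dist = select_sensors(model_data, healthy, faulty, sigma=2, ell=3)
print(best, round(float(dist), 3))
```

Because the faulty set differs from the healthy set only on sensor index 2, every combination without that sensor has distance exactly zero, so the selected pair must contain it. With 13 sensors and σ = 6 the loop visits 1716 combinations, which is still cheap.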

Results of the Sensor Selection
The results of the sensor selection are summarized in Table 4 for σ = 6 sensors to be combined and ℓ = 10 principal components.
More precisely, with respect to Table 4, it is worth noting that sensors 5, 6, and 7 (the first, second, and third pitch angles) are selected in all eight fault scenarios. The set of six sensors is completed as follows:
• in fault scenarios 1, 2, 3, and 7, with sensors 9, 11, and 13 (side-to-side accelerations at the tower bottom, mid-tower, and tower top, respectively);
• in fault scenario 4, with sensors 1, 2, and 3 (generated electrical power, rotor speed, and generator speed, respectively);
• in fault scenario 5, with sensors 1, 2, and 13 (generated electrical power, rotor speed, and side-to-side acceleration at the tower top, respectively);
• in fault scenario 6, with sensors 1, 3, and 13 (generated electrical power, generator speed, and side-to-side acceleration at the tower top, respectively);
• in fault scenario 8, with sensors 1, 11, and 13 (generated electrical power and side-to-side accelerations at mid-tower and tower top, respectively).
From a physical point of view, when σ = 6 sensors are used, it is interesting to note that the three pitch angles and the side-to-side accelerations (and not the fore-aft accelerations, which are in the same direction as the wind speed) are the most important signals for detecting faults.
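The selections listed above can be tallied to check which sensors are shared across scenarios. The data in this sketch are transcribed from that list (sensor numbers as in Table 2); the code only counts them.

```python
from collections import Counter

# Sensors selected for each fault scenario with sigma = 6, as listed in the text.
SELECTED = {
    1: {5, 6, 7, 9, 11, 13}, 2: {5, 6, 7, 9, 11, 13}, 3: {5, 6, 7, 9, 11, 13},
    4: {1, 2, 3, 5, 6, 7},   5: {1, 2, 5, 6, 7, 13},  6: {1, 3, 5, 6, 7, 13},
    7: {5, 6, 7, 9, 11, 13}, 8: {1, 5, 6, 7, 11, 13},
}

counts = Counter(s for sensors in SELECTED.values() for s in sensors)
in_all = sorted(s for s, c in counts.items() if c == len(SELECTED))
union = sorted(counts)
print("in all scenarios:", in_all)            # [5, 6, 7]
print("frequency:", counts.most_common())
print("union:", union, f"({len(union)} of 13 sensors)")
```

Taken together, the eight scenario-specific selections involve 9 of the 13 sensors.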

Fault Detection with a Reduced Number of Sensors
To analyze the effect on the overall performance of the fault detection strategy with a reduced number of sensors, we will study a total of 24 samples of ν = 50 elements each, corresponding to the following distribution: • 16 samples of a healthy wind turbine, and • Eight samples of a faulty wind turbine with respect to each of the eight particular fault scenarios defined in Table 3.
In this section, we will consider the following combination of σ = 6 sensors: • sensors 1, 2, 4, 5, 6 and 7, that is, we will measure and collect the information provided by the generated electrical power, rotor speed, generator torque and the first, second and third pitch angles.
For this combination of sensors, each sample of ν = 50 elements is formed by the measurements gathered from the sensors during (ν × L − 1)∆ = 312.4875 seconds, where L = 500 and the sampling rate is 1/∆ = 80 Hz. The fault detection strategy is based on the work by Pozo and Vidal [5], where multiway principal component analysis (MPCA) is applied first and the so-called Welch-Satterthwaite method [19] is then used to test for the equality of means. One of the key issues of the strategy presented in [5] is the way the data are collected and arranged in a matrix. To create the baseline pattern or PCA model, we measure the σ = 6 sensors from a healthy wind turbine and organize all the collected data in a matrix $X \in \mathcal{M}_{n\times(\sigma\cdot L)}(\mathbb{R})$ as follows:
$$X = \begin{pmatrix} x^{(1)}_{11} & x^{(1)}_{12} & \cdots & x^{(1)}_{1L} & \cdots & x^{(\sigma)}_{11} & \cdots & x^{(\sigma)}_{1L} \\ x^{(1)}_{21} & x^{(1)}_{22} & \cdots & x^{(1)}_{2L} & \cdots & x^{(\sigma)}_{21} & \cdots & x^{(\sigma)}_{2L} \\ \vdots & \vdots & & \vdots & & \vdots & & \vdots \\ x^{(1)}_{n1} & x^{(1)}_{n2} & \cdots & x^{(1)}_{nL} & \cdots & x^{(\sigma)}_{n1} & \cdots & x^{(\sigma)}_{nL} \end{pmatrix} = \left( X^{(1)} \mid X^{(2)} \mid \cdots \mid X^{(\sigma)} \right), \qquad (7)$$
where the superindex s = 1, ..., σ of each element $x^{(s)}_{ij}$, i = 1, ..., n, j = 1, ..., L, denotes the sensor. For the sake of completeness, we summarize in the following list the fault detection strategy presented in [5], which is also illustrated in Figure 1:
• Sensor group scaling is applied to matrix X in Equation (7) so that the mean of each column is 0 and the standard deviation of each sensor submatrix $X^{(s)}$, s = 1, ..., σ, is 1.
• The covariance matrix $C_X$ is computed according to the expression $C_X = \frac{1}{n-1} X^T X$.
• The eigenvalues and eigenvectors of matrix $C_X$ are computed. The eigenvectors (principal components), sorted by decreasing eigenvalue, constitute the columns of the transformation matrix P, also called the PCA model or baseline pattern.
• With respect to the first principal component, for instance, the baseline sample is defined as the set of numbers $\{\tau^i_1\} := X(i,:)\, P\, e_1$, i = 1, ..., n, where $e_1$ is the first vector of the canonical basis.
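• When the measurements of the σ = 6 sensors are obtained from the current wind turbine to be diagnosed, a new matrix Y is constructed as in Equation (7):
$$Y = \begin{pmatrix} y^{(1)}_{11} & \cdots & y^{(1)}_{1L} & \cdots & y^{(\sigma)}_{11} & \cdots & y^{(\sigma)}_{1L} \\ \vdots & & \vdots & & \vdots & & \vdots \\ y^{(1)}_{\nu 1} & \cdots & y^{(1)}_{\nu L} & \cdots & y^{(\sigma)}_{\nu 1} & \cdots & y^{(\sigma)}_{\nu L} \end{pmatrix} \in \mathcal{M}_{\nu\times(\sigma\cdot L)}(\mathbb{R}). \qquad (8)$$
Note that the number ν of rows of matrix Y in Equation (8) is a natural number not necessarily equal to the number of rows n of matrix X in Equation (7).
• Matrix Y in Equation (8) is scaled with respect to matrix X in Equation (7); that is, we subtract from the κ-th column of matrix Y, κ = 1, ..., σ × L, the mean of the κ-th column of matrix X.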
Likewise, each element of the submatrix $Y^{(s)}$, s = 1, ..., σ, is divided by the standard deviation of the submatrix $X^{(s)}$.
• With respect to the first principal component, for instance, the sample of the current wind turbine to be diagnosed is defined as the set of numbers $\{t^i_1\} := Y(i,:)\, P\, e_1$, i = 1, ..., ν.
• The Welch-Satterthwaite test for the equality of means [19] is applied to the samples $\{\tau^i_1\}$ and $\{t^i_1\}$ to classify the current wind turbine to be diagnosed as healthy or not.
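The list of steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the implementation of [5]: it uses NumPy for the PCA model and SciPy's `stats.ttest_ind(..., equal_var=False)`, which performs Welch's t-test with Welch-Satterthwaite degrees of freedom. The synthetic data (a per-row common level plus noise, with an offset on the first sensor for the faulty sample) and the helper names are hypothetical.

```python
import numpy as np
from scipy import stats

def fit_baseline(X, sigma, L):
    """Group-scale the healthy matrix X (n x sigma*L) and build the PCA model P."""
    mean = X.mean(axis=0)
    Z = X - mean
    block_std = np.array([Z[:, s * L:(s + 1) * L].std() for s in range(sigma)])
    scale = np.repeat(block_std, L)
    Z = Z / scale
    C = Z.T @ Z / (Z.shape[0] - 1)                    # covariance matrix C_X
    eigval, eigvec = np.linalg.eigh(C)
    P = eigvec[:, np.argsort(eigval)[::-1]]           # columns by decreasing eigenvalue
    return mean, scale, P, Z @ P                      # last item: baseline scores

def diagnose(Y, mean, scale, P, baseline_scores, score=0, alpha=0.36):
    """Scale Y with the healthy statistics, project, and run Welch's t-test on one score."""
    t = ((Y - mean) / scale) @ P[:, score]
    p_value = stats.ttest_ind(baseline_scores[:, score], t, equal_var=False).pvalue
    return ("faulty" if p_value < alpha else "healthy"), p_value

rng = np.random.default_rng(7)
sigma, L = 2, 25

def sample(rows, offset=0.0):
    """Hypothetical data: a per-row common level plus sensor noise."""
    Y = rng.normal(0.0, 2.0, size=(rows, 1)) + rng.normal(0.0, 0.3, size=(rows, sigma * L))
    Y[:, :L] += offset                                # hypothetical offset on sensor 1
    return Y

mean, scale, P, tau = fit_baseline(sample(60), sigma, L)
healthy_result = diagnose(sample(50), mean, scale, P, tau)
faulty_result = diagnose(sample(50, offset=10.0), mean, scale, P, tau)
print(healthy_result, faulty_result)
```

With α = 36%, a healthy sample is expected to be flagged as faulty in roughly 36% of repetitions, which is why only the faulty case gives a predictable outcome in this toy run.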
As stated before, a particular configuration of σ = 6 sensors has been considered. Table 5 summarizes how the results in Table 6 are organized. More precisely, using the measurements of sensors 1, 2, 4, 5, 6, and 7, Table 6 includes the number of samples of the healthy wind turbine correctly classified by the test as healthy (correct decision); the number of samples of the faulty wind turbine correctly classified as faulty (correct decision); the number of samples of the faulty wind turbine wrongly classified as healthy (type II error or missed fault); and the number of samples of the healthy wind turbine wrongly classified as faulty (type I error or false alarm). It is worth noting that, for this configuration, type I errors (false alarms) and type II errors (missed faults) occur only when scores 2, 3, or 4 are considered; when the test is based solely on the first score, all the classifications are correct.
Figure 1. The fault detection strategy is based on testing for notable changes in the distributions of the sample coming from the healthy wind turbine and the sample coming from the current wind turbine to diagnose [5].
The sensitivity and specificity can also be used here as two statistical measures to analyze the performance of the test. On the one hand, the sensitivity or power of the test is defined as the proportion of samples from the faulty wind turbine that are correctly classified. Therefore, if the false negative rate is denoted by γ, the sensitivity is computed as 1 − γ. On the other hand, the specificity of the test can be defined as the percentage of samples from the healthy wind turbine correctly identified as such, and it is usually expressed as 1 − α, where α is the false positive rate. These two measures are calculated in Table 8, organized as detailed in Table 7, with respect to the 24 samples and for the first four scores. The results in Table 8 show that the sensitivity 1 − γ of the test is, on average, 94%, which is very close to 100%. Sensitivities of 100% are achieved when scores 1 and 3 are considered. The mean value of the specificity is 72%, which is above the expected value of 1 − α = 64%, since the level of significance used in the test is α = 36%. When only score 1 is considered, the specificity increases to 100%. The reliability of the fault detection strategy can also be measured using the true rate of false negatives and the true rate of false positives. These two quantities are based on Bayes' theorem [20]. On the one hand, the true rate of false positives is defined as the proportion of samples from the faulty wind turbine among the samples that have been classified by the test as healthy, that is,
$$\text{true rate of false positives} = \frac{\#\{\text{faulty samples classified as healthy}\}}{\#\{\text{samples classified as healthy}\}}.$$
On the other hand, the true rate of false negatives is defined as the proportion of samples from the healthy wind turbine among those samples that have been classified by the test as faulty, that is,
$$\text{true rate of false negatives} = \frac{\#\{\text{healthy samples classified as faulty}\}}{\#\{\text{samples classified as faulty}\}}.$$
For the sensor configuration proposed in this section, the results, organized as described in Table 9, are summarized in Table 10.
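The four quantities above can be computed from the counts of a 2 × 2 classification table. In this sketch the argument names are assumptions; the first call reproduces the score-1 case reported in the text (16 healthy and 8 faulty samples, all classified correctly), while the second uses hypothetical counts.

```python
def detection_metrics(tn, fp, fn, tp):
    """tn: healthy classified healthy; fp: healthy classified faulty (type I error);
    fn: faulty classified healthy (type II error); tp: faulty classified faulty."""
    sensitivity = tp / (tp + fn)            # 1 - gamma
    specificity = tn / (tn + fp)            # 1 - alpha (empirical)
    # True rate of false positives: faulty samples among those classified as healthy.
    true_fp_rate = fn / (fn + tn) if fn + tn else 0.0
    # True rate of false negatives: healthy samples among those classified as faulty.
    true_fn_rate = fp / (fp + tp) if fp + tp else 0.0
    return sensitivity, specificity, true_fp_rate, true_fn_rate

print(detection_metrics(tn=16, fp=0, fn=0, tp=8))   # score 1 case: (1.0, 1.0, 0.0, 0.0)
print(detection_metrics(tn=12, fp=4, fn=1, tp=7))   # hypothetical counts
```

Unlike sensitivity and specificity, the two "true rates" are conditioned on the test outcome, so they also depend on how many healthy and faulty samples are in the pool.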
Table 9. Relationship between the proportion of false negatives and false positives and scheme for the presentation of the results in Table 10.

(Table 9 body: rows Fail to reject H0 and Reject H0; columns Healthy Sample (H0) and Faulty Sample (H1); true rates of false positives and false negatives.)
Table 10. True rate of false positives and false negatives for each of the four scores when the size of the samples to diagnose is ν = 50 and the sensors used are numbers 1, 2, 4, 5, 6, and 7.
A final study is developed here based on receiver operating characteristic (ROC) curves to demonstrate the overall accuracy of the fault detection strategy applied to a configuration with a reduced number of sensors. In general, these curves illustrate the compromise between the sensitivity and the false positive rate. More precisely, for a given level of significance and for each score, the pair (false positive rate, sensitivity) is represented on a plane. As in [5], we have considered 49 levels of significance, α_1, ..., α_49, within the range [0.02, 0.98], where α_i = 0.02 × i, i = 1, ..., 49. The position of these points can be understood as follows. Since the ultimate goal is to reduce the number of false positives while increasing the number of true positives, the points should ideally lie in the upper-left half-plane, above the diagonal. Therefore, a method is considered satisfactory if the points lie within that upper-left half-plane. In this sense, Figure 2 illustrates the ROC curves for the four scores when the size of the samples to diagnose is ν = 50 and the sensors used are numbers 1, 2, 4, 5, 6, and 7. The ROC curve for score 1 (in red in Figure 2) is particularly remarkable. The overall behavior of scores 2 and 4 is still acceptable, while the ROC curve for the third score must be considered unsatisfactory.
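The ROC construction can be sketched as a sweep over the 49 significance levels α_i = 0.02 × i: for each level, a sample is declared faulty when its p-value falls below α_i. The p-values in the usage lines are hypothetical and serve only to illustrate how the points and their position relative to the diagonal are computed.

```python
def roc_points(p_healthy, p_faulty, levels=None):
    """Return (alpha, false positive rate, sensitivity) for each significance level.
    A sample is declared faulty when its p-value is below alpha."""
    if levels is None:
        levels = [0.02 * i for i in range(1, 50)]     # 0.02, 0.04, ..., 0.98
    points = []
    for a in levels:
        fpr = sum(p < a for p in p_healthy) / len(p_healthy)
        tpr = sum(p < a for p in p_faulty) / len(p_faulty)
        points.append((a, fpr, tpr))
    return points

def share_above_diagonal(points):
    """Fraction of ROC points with sensitivity strictly greater than false positive rate."""
    return sum(tpr > fpr for _, fpr, tpr in points) / len(points)

# Hypothetical p-values: 16 healthy samples spread over (0, 1), 8 faulty samples all tiny.
p_healthy = [(k + 0.5) / 16 for k in range(16)]
p_faulty = [0.001] * 8
pts = roc_points(p_healthy, p_faulty)
print(len(pts), share_above_diagonal(pts))
```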
Compared with the results in [5], where all 13 sensors are used, the results of the fault detection strategy presented in this section, with respect to score 1 and using a reduced number of sensors (6 out of 13), do not present any performance degradation. That is, in both cases, 100% accuracy is achieved in the detection of the 24 samples (healthy and faulty), but with about 54% fewer sensors.
In Figure 2, the pair (false positive rate, sensitivity) ∈ ℝ² is emphasized with a black border for the four scores when the level of significance is α = 36%.

Detection Time Reduction Analysis
In the previous section, we have considered a strategy to select a reduced number of sensors for the fault detection strategy. It has been shown, for instance, that with about 54% fewer sensors we achieve the same detection accuracy as with the full set of 13 sensors. To analyze the effect of the sample size ν on the accuracy of the fault detection strategy, the sensitivity and specificity have been studied for 10 different scenarios, corresponding to 10 different values of ν. As defined in Section 4.3, the specificity is the percentage of correct decisions when the sample studied is healthy (a healthy sample is classified as healthy, hence failing to reject H0), whereas the sensitivity is the percentage of faulty samples (H1) for which H0 is rejected. As can be seen in Figure 3, the sensitivity and specificity change drastically when ν is varied. Therefore, there is a direct connection between the correct decisions and the size of the sample. When the size ν decreases, the specificity (that is, out of the 8 × 48001/(ν_i × L) healthy samples, how many are detected as healthy) decreases rapidly from its maximum (100% effectiveness at ν = 50) to values around 50% when the size is half its initial value. Therefore, the results get worse as soon as the size of the sample decreases below 50. In contrast, the sensitivity (how many of the 48001/(ν_i × L) faulty samples are detected correctly) remains high for sizes between 25 and 50, but then decreases to approximately 75%.
If we examine the ROC curves in Figures 4-6 for three different values of ν (ν = 50, ν = 30, and ν = 10, respectively), we see that, when ν = 50, the first score performs perfectly, as it always detects both faulty and healthy situations. However, as ν decreases, all scores degrade and, therefore, the results obtained are not as good as desired.
As a conclusion to this study, it is suggested that the size of the sample, that is, the number of rows of matrix Y in Equation (8), cannot be reduced below 50. This value is the minimum one that attains the maximum accuracy, and the results get much worse as soon as it is reduced. This result is consistent with the general theory of statistical hypothesis testing [19], where the power or sensitivity of a test and the sample size are shown to be related.
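The link between sample size and power invoked above can be illustrated with the normal approximation to a two-sided, two-sample test of equal means with equal group sizes. This sketch is a textbook approximation, not the Welch-Satterthwaite test used in the paper, and the effect size of 0.5 standard deviations is a hypothetical value.

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha):
    """Normal-approximation power of a two-sided two-sample test of equal means.
    effect_size: difference of means in units of the common standard deviation."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

sizes = (10, 20, 30, 40, 50)
powers = [approx_power(0.5, n, alpha=0.05) for n in sizes]   # hypothetical effect size
for n, p in zip(sizes, powers):
    print(f"n = {n:2d}: power ~ {p:.3f}")
```

For a fixed effect size and significance level, the power grows monotonically with the sample size, which is the qualitative behavior observed for the sensitivity when ν is reduced.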

On the Time Instants per Row (L)
The second possibility to reduce the time of diagnosis is to decrease the number of columns of the original matrix Y in Equation (8). Each column represents a different time instant; therefore, the more columns the matrix has, the more time instants the sensors must record before the data can be processed. To analyze the effect on the accuracy of the fault detection strategy, the number of columns is reduced while the number of rows is kept at ν = 50.
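To make the roles of ν and L concrete, the sketch below arranges contiguous sensor records into a matrix with ν rows, where each row holds L consecutive time instants from every selected sensor. The layout (rows consecutive in time, sensor blocks concatenated column-wise) and the helper name `build_sample_matrix` are assumptions for illustration; the exact construction of Y is the one given in Equation (8).

```python
def build_sample_matrix(records, sensor_ids, nu, L):
    """Arrange sensor records into a nu x (len(sensor_ids) * L) matrix (list of rows).

    records    -- dict mapping a sensor number to its list of readings
    sensor_ids -- sensors to keep, e.g. the reduced subset 1, 2, 4, 5, 6, 7
    nu         -- number of rows (sample size)
    L          -- consecutive time instants per row, taken from each sensor
    """
    needed = nu * L  # readings consumed per sensor
    for sid in sensor_ids:
        if len(records[sid]) < needed:
            raise ValueError(f"sensor {sid} has fewer than {needed} readings")
    # Row i holds time instants i*L, ..., i*L + L - 1 of every selected sensor,
    # so consecutive rows are consecutive windows in time.
    return [
        [x for sid in sensor_ids for x in records[sid][i * L:(i + 1) * L]]
        for i in range(nu)
    ]


records = {1: list(range(100)), 2: list(range(100, 200))}
print(build_sample_matrix(records, [1, 2], nu=2, L=3))
```

Under this layout each sample consumes ν·L readings per sensor, so halving L halves the time each sensor must record before Y can be formed, while dropping sensors shrinks the row length without changing that time.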
To analyze the effect of a reduced number of columns on the overall performance of the fault detection strategy, we study a total of 19 different scenarios, corresponding to 19 different values of L. For each scenario, we study a total of 24 samples of ν = 50 elements each, distributed as follows:
• 16 samples of a healthy wind turbine, and
• eight samples of a faulty wind turbine, one for each of the eight fault scenarios defined in Table 3.
The results of this analysis can be summarized as follows:
• There is no direct relation between the reduction in the number of time instants and the specificity and sensitivity. Detection remains 100% effective for both healthy and faulty samples (Figure 7), except for L = 200 and L = 250, where the false-alarm rate is 6.25% (one of the 16 healthy samples), which can still be considered good performance, and for values of L smaller than 25.
• Figure 7 also shows that the first principal component (score 1) recognizes the faulty samples perfectly: its detection rate is always 100% when L is greater than or equal to 25.
• As can be inferred from the ROC curves in Figures 8 and 9, the first principal component keeps a very good overall performance in both cases.

As a conclusion to this study, the number of columns (L, time instants) can be reduced to 25 without degrading the overall performance. Therefore, the detection time can be reduced from 312.4875 seconds to only 15.6125 seconds, that is, a reduction of 95% of the original detection time.

Fault Detection with a Reduced Number of Sensors and a Reduced Number of Time Instants (L)
In this section, we present two examples of the fault detection strategy with a reduced number of sensors, as stated in Section 4, and also a reduced number of time instants ($L_a = 50$ and $L_b = 25$), as considered in Section 5.2. The purpose of these examples is to show that both simplifications combined still achieve the fault detection objective, with neither missed faults nor false alarms, while clearly reducing the detection time and computational effort.
To analyze the effect on the overall performance of the fault detection strategy with a reduced number of sensors and with $L_a = 50$ and $L_b = 25$ time instants per row, we study, in each case, a total of 24 samples of ν = 50 elements each, distributed as follows:
• 16 samples of a healthy wind turbine, and
• eight samples of a faulty wind turbine, one for each of the eight fault scenarios defined in Table 3.
In both cases, the following combination of σ = 6 sensors is considered: sensors 1, 2, 4, 5, 6 and 7, that is, we measure and collect the information provided by the generated electrical power, the rotor speed, the generator torque, and the first, second and third pitch angles.
For the first case, each sample of ν = 50 elements is formed by the measurements gathered from the sensors during $(\nu \times L_a - 1)\Delta = 31.2375$ seconds, where $L_a = 50$ and the sampling rate is $1/\Delta = 80$ Hz. For the second case, each sample of ν = 50 elements is formed by the measurements gathered from the sensors during $(\nu \times L_b - 1)\Delta = 15.6125$ seconds, where $L_b = 25$ and the sampling rate is the same as in the first case.
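The acquisition window spans from the first to the last of the ν·L consecutive time instants, i.e. $(\nu \times L - 1)\Delta$. The short check below reproduces the durations discussed in this section, assuming $\Delta = 1/80$ s; the value L = 500 for the original configuration is an inference from the 312.4875 s quoted earlier and is not stated explicitly here. The helper `acquisition_time` is hypothetical.

```python
def acquisition_time(nu: int, L: int, fs_hz: float = 80.0) -> float:
    """Seconds spanned by nu * L consecutive samples taken at fs_hz."""
    delta = 1.0 / fs_hz  # sampling period in seconds
    return (nu * L - 1) * delta


t_orig = acquisition_time(50, 500)  # assumed original L (inferred from 312.4875 s)
t_a = acquisition_time(50, 50)      # L_a = 50
t_b = acquisition_time(50, 25)      # L_b = 25
reduction = 100.0 * (1.0 - t_b / t_orig)
print(f"{t_orig:.4f} s, {t_a:.4f} s, {t_b:.4f} s, reduction {reduction:.1f}%")
```

This prints 312.4875 s, 31.2375 s and 15.6125 s, and a reduction of 95.0% between the original window and the one with $L_b = 25$.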
For the first case ($L_a = 50$), the results of the fault detection strategy are summarized in Tables 12 and 13. These results clearly show that the first principal component detects all the faulty samples and, at the same time, correctly states that all 16 healthy samples come from a wind turbine operating under normal conditions. Thus, there are neither missed faults, which are a major problem in wind turbines, nor false alarms.
For the second case ($L_b = 25$), the results for the first score are summarized in Table 14. Even in this case, when the fault detection scheme is based on measurements gathered from the sensors during less than 16 seconds, the strategy correctly classifies 100% of the samples coming from both the healthy and the faulty wind turbine. More precisely, with 54% fewer sensors and 95% less detection time than in [5], the fault detection strategy is still able to accurately detect whether the wind turbine is healthy or faulty.

Conclusions
The proposed strategy, using only six sensors with ν = 50 and L = 25, detects all the studied faults with a detection time of 15.6125 s and with neither false positives nor missed detections.
On the one hand, compared to the five solutions to the problem given in [15], our strategy reduces the detection time for all the pitch actuator faults (that is, faults 1, 2 and 3) and for the scaling pitch angle sensor fault (fault 7). In the case of faults 4, 5, 6 and 8, our detection time is neither the best nor the worst (again, compared to the five solutions given in [15]). However, it is noteworthy that our fault detection scheme is the only one that achieves 100% sensitivity and specificity for all the studied faults.
On the other hand, compared to [5], with 54% fewer sensors and 95% less detection time, real-time application of the fault detection method is now possible, while the method is still able to detect whether the wind turbine is healthy or faulty with 100% sensitivity and specificity.

Figure 2. The receiver operating characteristic (ROC) curves for the four scores when the size of the samples to diagnose is ν = 50 and the sensors used are numbers 1, 2, 4, 5, 6 and 7. The pair (false positive rate, sensitivity) $\in \mathbb{R}^2$ is highlighted with a black border for the four scores when the significance level is α = 36%.

Figure 3. Specificity (green) and sensitivity (red) of the test as a function of the size of the sample (ν), that is, the number of rows of matrix Y in Equation (8), when the first score is used.

Figure 4 .Figure 5 .Figure 6 .
Figure 4. The ROC curves for the four scores when the size of the samples to diagnose is ν = 50 and all the sensors are used.

Figure 7 .Figure 8 .Figure 9 .
Figure 7. Specificity (green) and sensitivity (red) of the test as a function of the number of time instants per row (L) when the first score is used.

Table 2 .
Assumed available measurements.These sensors are representative of the types of sensors that are available on a MW-scale commercial wind turbine.
$n_f$ each row vector of matrix $T_f$. Note again that $n_f$ is a natural number not necessarily equal to $n$ in Step 4 nor to $n_h$ in Step 8. 10. Define $\mu_h$ as the mean vector value of $t^i_h \in \mathbb{R}$, $i = 1, \ldots, n_h$, that is,

Table 4 .
Results of the sensor selection when the number of sensors to be combined is σ = 6 for each of the eight fault scenarios described in Table 3.

Table 5 .
Scheme for the presentation of the results in Table 6.

Table 6 .
Categorization of the samples with respect to the presence or absence of a fault and the result of the test for each of the four scores when the size of the samples to diagnose is ν = 50 and the sensors used are numbers 1, 2, 4, 5, 6 and 7.

Table 7 .
Scheme for the presentation of the results in Table 8 (specificity and sensitivity).

Table 8 .
Sensitivity and specificity of the test for each of the four scores when the size of the samples to diagnose is ν = 50 and the sensors used are numbers 1, 2, 4, 5, 6 and 7.

Table 12 .
Categorization of the samples with respect to the presence or absence of a fault and the result of the test for each of the four scores when the size of the samples to diagnose is ν = 50, $L_a = 50$, and the sensors used are numbers 1, 2, 4, 5, 6 and 7.

Table 13 .
True rate of false positives and false negatives for each of the four scores when the size of the samples to diagnose is ν = 50 and the sensors used are numbers 1, 2, 4, 5, 6 and 7.

Table 14 .
Categorization of the samples with respect to the presence or absence of a fault and the result of the test for the first score when the size of the samples to diagnose is ν = 50, $L_b = 25$, and the sensors used are numbers 1, 2, 4, 5, 6 and 7.