Wind Turbine Operation Curves Modelling Techniques

Wind turbines are machines operating in non-stationary conditions, and the power of a wind turbine depends non-trivially on environmental conditions and working parameters. For these reasons, wind turbine power monitoring is a complex task, which is typically addressed through data-driven methods for constructing a normal behavior model. On these grounds, this study is devoted to the analysis of meaningful operation curves, which are rotor speed-power, generator speed-power and blade pitch-power. A key point is that these curves are analyzed in the appropriate operation region of the wind turbines: the rotor and generator curves are considered for moderate wind speed, when the blade pitch is fixed and the rotational speed varies (Region 2); the blade pitch curve is considered for higher wind speed, when the rotational speed is rated (Region 2 1/2). The selected curves are studied through a multivariate Support Vector Regression with Gaussian Kernel on the Supervisory Control And Data Acquisition (SCADA) data of two wind farms sited in Italy, featuring in total 15 wind turbines with 2 MW of rated power each. An innovative aspect of the selected models is that the minimum, maximum and standard deviation of the independent variables of interest are fed as input to the models, in addition to the typically employed average values: using the additional covariates proposed in this work, the error metrics decrease by about one third with respect to what would be obtained by employing only the average values of the independent variables as regressors. In general, for all the considered curves, the prediction of the power is characterized by error metrics which are competitive with the state of the art in the literature for multivariate wind turbine power curve analysis: in particular, for one test case, a mean absolute percentage error of the order of 2.5% is achieved.
Furthermore, the approach presented in this study provides a superior capability of interpreting wind turbine performance in terms of the behavior of the main sub-components and eliminates as much as possible the dependence on nacelle anemometer data, whose use is problematic because of issues related to site complexity.


Introduction
Operation and maintenance of wind turbines represent an important fraction (up to 20-25%) of the life-cycle costs of an industrial plant. Furthermore, the recent trends towards increasing rotor sizes [1,2] and offshore exploitation [3][4][5] complicate access to wind turbine sites and increase the intervention costs in case of unexpected wind turbine stops.
For these reasons, and also in light of the non-stationary conditions to which wind turbines are subjected, remote control and monitoring [6,7] of wind farms is an objective whose importance has been continuously growing. Fortunately, the widespread diffusion of Supervisory Control And Data Acquisition (SCADA) control systems and of Turbine Condition Monitoring (TCM) systems guarantees the availability of most of the data which are necessary for wind turbine control and monitoring. Nevertheless, transforming the available information into knowledge about the health status or the performance of wind turbines is particularly complex. For example, as regards the detection of mechanical damage to gears and bearings through vibration data analysis, it is very difficult to isolate the signatures associated with incipient faults [8][9][10][11].
As regards the monitoring of wind turbine performance, the main issue is that the power of a wind turbine has a multivariate dependence on environmental conditions [12,13] and working parameters. SCADA control systems typically record and store (with a sampling time of some minutes) the main working parameters and also the internal temperatures collected at meaningful wind turbine sub-components. From this point of view, therefore, the main issue is efficiently incorporating this kind of information for constructing normal behavior models for the performance of the wind turbines [14][15][16][17], which can be useful for condition monitoring purposes as well [18,19].
The simplest data-driven model employed for wind turbine performance monitoring is the power curve [20][21][22][23], which is the relation between the average wind speed and the extracted power. On the grounds of the above discussion, a recent trend in wind energy literature regards multivariate approaches to the power curve: in [24][25][26], the point of view is to include additional environmental variables (either measured, such as ambient temperature, humidity and wind direction as in [24], or estimated by a Numerical Weather Prediction model as in [25]); in [27][28][29][30], the approach consists in including the most important operation variables (such as the blade pitch and the rotor speed) in the multivariate modelling of the power curve.
An interesting approach to wind turbine power monitoring consists in the analysis of other operation curves, mainly regarding the rotor speed and the blade pitch. In [31,32], the wind speed-blade pitch curve of wind turbines is analyzed using Support Vector Regression and the results are compared against the binning method. In [33], two fundamental operation curves are analyzed through Gaussian process methods: the wind speed-blade pitch and the wind speed-rotor speed curves.
A further development of these concepts regards the analysis of operation curves which do not involve the wind speed measured at the nacelle of the wind turbines. The rationale for this consideration is that in general the nacelle wind speed is measured behind the rotor span and the undisturbed wind speed is estimated through a nacelle transfer function: this introduces possible drawbacks, especially in complex environments. Furthermore, it is not rare that nacelle anemometers are subject to failures or bias [34]. These considerations have even led to the formulation of the concept of rotor equivalent wind speed [35,36]: the idea is that, since the rotor speed is controlled on the grounds of the torque and not on the grounds of the nacelle anemometer, the rotor itself can be considered as a probe for estimating the wind speed.
Therefore, this study is devoted to the data-driven analysis of three operation curves, which are the rotor speed-power, generator speed-power and blade pitch-power curves. To the best of the author's knowledge, despite the potential interest of these curves in wind turbine power monitoring, this topic has not been addressed systematically in the literature, except for the intuition contained in [37] that these curves can be particularly meaningful for wind turbine aging estimation.
The objective of this study is formulating a methodology for monitoring the performance of wind turbines using the above indicated curves: this in practice has been realized by verifying the robustness of a non-linear multivariate regression, which can therefore be employed as a normal behavior model for comparing the expected performance against the actual performance.
The data analysis is conducted in light of the principles of the control of wind turbines: for moderate wind speed (approximately between 5 m/s and 9 m/s), the blade pitch is kept fixed and the wind turbine attains the highest possible aerodynamic efficiency by regulating the rotor speed; for higher wind speed (approximately between 9 m/s and 13 m/s), the wind turbine operates in partial aerodynamic load by keeping the rotational speed constant and by regulating the blade pitch. These two operation regions are typically indicated together as Region 2 of the power curve of a wind turbine, because they lie between cut-in and rated: in [37], in order to distinguish them, they are indicated as Region 2 and Region 2 1/2 and the same notation is employed in this study. Therefore, in this work, the selected operation curves are analyzed in the regions where they are of most interest for performance monitoring purposes: rotor and generator speed curves in Region 2, blade pitch curve in Region 2 1/2. The interest in the blade pitch-power curve also regards the fact that, when the rotor speed saturates, the rotor equivalent wind speed becomes meaningless and therefore it is conceivable to use the blade pitch curve as the basis for a sort of pitch equivalent wind speed.
Due to the non-linear relation between the input variables and the power output, the selected methodology for the analysis of the curves is a Support Vector Regression with Gaussian Kernel. An innovative aspect of this study deals with the selection of the set of covariates for the regression. In the literature about wind turbine power monitoring, the selected covariates are typically the average values as stored by the SCADA control system, but it should be noticed that SCADA systems also store the minimum, the maximum and the standard deviation in the sampling interval for each measured channel. In this study, the possibility of also including the minimum, maximum and standard deviation of the selected input variables has been explored, in order to investigate to what extent the normal behavior model improves. Incidentally, this simple idea, which enriches the covariates structure rather than complicating the model structure, could be inspiring for multivariate power curve modelling, because the critical point is exactly reproducing the observed variability of the extracted power, given average conditions.
The main result of this work therefore consists in the verification that, using the proposed methodology, it is possible to model the power of a wind turbine with a precision which is competitive with the state of the art in the literature, but without using the nacelle wind speed as input variable: this is definitely non-trivial because, as discussed for example in [27], the wind speed explains up to 98% of the variance of the power of a wind turbine.
Summarizing, the structure of the manuscript is therefore the following: in Section 2, the test case and the data sets used for the present study are described; Section 3 is devoted to the methods; the results are collected and discussed in Section 4; conclusions are drawn and further directions of the present work are indicated in Section 5.

The Test Cases
The selected test cases were two wind farms (named WF1 and WF2) sited in southern Italy on gentle terrain, featuring respectively six and nine wind turbines whose rated power was 2 MW. The rotor diameter was 92 m at WF1 and 82 m at WF2. The layouts are reported in Figures 1 and 2: from these Figures, it arises that the inter-turbine distance at WF1 was considerably larger than at WF2. Therefore, the impact of wakes on wind turbine operation was low for WF1, while it was relevant for WF2 (as discussed, for example, in [38,39]).
Operation data spanning the years 2017 and 2018 were used, courtesy of the ENGIE Italia company. The measurements used were the following:
• nacelle wind speed v;
• power production P;
The data had 10 min of sampling time. For each channel, the average, minimum, maximum and standard deviation over the 10 min interval were used.
Data were pre-processed by filtering on wind turbine operation using the appropriate run time counter available in the SCADA data set. Subsequently, each data set was divided according to the operation region, as indicated in Table 1, on the grounds of the nacelle wind speed v. The same notation as in [37] was adopted: the regime when the wind turbine operated at full aerodynamic load, with variable rotational speed and fixed pitch, is indicated as Region 2; instead, the regime characterized by rated rotational speed and variable pitch is indicated as Region 2 1/2.
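As a minimal sketch of the pre-processing described above, the following Python fragment keeps only the records in which the turbine was running for the whole interval and assigns each record to an operation region on the basis of the average nacelle wind speed. The field names (`run_seconds`, `v_avg`), the 600 s run-time criterion and the thresholds of 5, 9 and 13 m/s are assumptions made for illustration: the thresholds are taken from the approximate ranges quoted in the Introduction, whereas the conditions actually applied are those of Table 1.

```python
from typing import Optional

# Assumed thresholds (m/s), from the approximate ranges quoted in the
# Introduction; the conditions actually applied are those of Table 1.
V_MIN_REGION_2 = 5.0
V_SPLIT = 9.0
V_MAX_REGION_2_HALF = 13.0
FULL_INTERVAL_SECONDS = 600  # assumed: turbine running for the full 10 min


def classify_region(v_avg: float) -> Optional[str]:
    """Return the operation region for an average nacelle wind speed, or None."""
    if V_MIN_REGION_2 <= v_avg < V_SPLIT:
        return "Region 2"
    if V_SPLIT <= v_avg < V_MAX_REGION_2_HALF:
        return "Region 2 1/2"
    return None


def split_by_region(records):
    """Filter on the run-time counter, then group the records by region."""
    regions = {"Region 2": [], "Region 2 1/2": []}
    for rec in records:
        if rec["run_seconds"] < FULL_INTERVAL_SECONDS:
            continue  # turbine not operating for the whole interval
        region = classify_region(rec["v_avg"])
        if region is not None:
            regions[region].append(rec)
    return regions
```

Records outside both ranges are simply discarded, which mirrors the choice of analyzing each curve only in the region where it is meaningful.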

Region Condition
Examples of the operation curves under analysis are reported for a sample wind turbine (T1) of each test case in Figure 3 (generator speed-power), Figure 4 (rotor speed-power) and Figure 5 (blade pitch-power): from these Figures, it arises that the curves were qualitatively similar and therefore the logic of the control was the same.

The Method
The objective of the study was formulating a reliable methodology for wind turbine power monitoring using the selected operation curves, depending on the working region. This passes through the construction of a normal behavior model for the curves and consequently consists in the comparison between the model estimates and the measurements in the validation data set. In this study, a Support Vector Regression with Gaussian Kernel was selected for constructing the normal behavior models because this kind of regression has proven to be effective for multivariate modeling of wind turbine power [37,40]. In the following, therefore, the general principles of the Support Vector Regression are briefly recapped and, subsequently, it will be discussed how to apply them for monitoring the operation curves of interest.
Given a matrix X of input variables, where the covariates are grouped according to the columns and the observations are grouped according to the rows, a linear model is posed in Equation (1), where β is the vector of regression coefficients and b is the intercept vector. The Support Vector Regression is a methodology for estimating the β parameters. It relies on the constraint that the residuals between the measurement Y and the model estimate f(X) are lower than a threshold for each n-th observation, as in Equation (2).
The optimization problem can be rephrased in the Lagrange dual formulation: the function to minimize is L(α), given in Equation (3), with the constraints of Equation (4), where C is the box constraint. The solution for the β parameters in terms of the observations matrix X and of the coefficients α_n and α*_n is given in Equation (5). In a nutshell, the optimization passes through the data-driven selection of the most meaningful rows of the observations matrix X, which for this reason are named support vectors and which are weighted through the α_n and α*_n coefficients. Once the β coefficients have been computed on a reference data set, they can be used for predicting new values through Equation (6), given the input variables matrix X.
The non-linear Support Vector Regression is obtained by replacing the products between the rows of the observations matrix with a non-linear Kernel function, as in Equation (7), where ϕ is a transformation mapping the X observations into the feature space. A Gaussian Kernel selection is given in Equation (8), where κ is the Kernel scale, and has been widely employed in wind energy literature for non-linear regression problems [31]. Then, Equation (3) rewrites as in Equation (9) and Equation (6) for predicting rewrites as in Equation (10).
In this work, the hyperparameters of the regression (κ and C) have been automatically optimized, based on the evaluation of the cross-validation loss over 30 model calls.
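For reference, a standard form of the ε-insensitive Support Vector Regression consistent with the symbols used above can be written as follows. This is a sketch of the textbook formulation, numbered to mirror the equation references in the text, and not necessarily the exact notation of the study. The symbol ε for the residual threshold, the index n = 1, ..., N over the observations (with x_n the n-th row of X written as a column vector) and, in particular, the convention by which the Kernel scale κ enters the Gaussian Kernel (here as a divisor of the distance) are assumptions.

```latex
\begin{align}
f(X) &= X\beta + b \tag{1}\\
\left| y_n - \left( x_n^{T}\beta + b \right) \right| &\le \varepsilon \quad \forall n \tag{2}\\
L(\alpha) &= \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}
  (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\, x_i^{T} x_j
  + \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^{*})
  + \sum_{i=1}^{N} y_i (\alpha_i^{*} - \alpha_i) \tag{3}\\
\sum_{n=1}^{N} (\alpha_n - \alpha_n^{*}) &= 0, \qquad
  0 \le \alpha_n \le C, \qquad 0 \le \alpha_n^{*} \le C \tag{4}\\
\beta &= \sum_{n=1}^{N} (\alpha_n - \alpha_n^{*})\, x_n \tag{5}\\
f(x) &= \sum_{n=1}^{N} (\alpha_n - \alpha_n^{*})\, x_n^{T} x + b \tag{6}\\
G(x_1, x_2) &= \langle \phi(x_1), \phi(x_2) \rangle \tag{7}\\
G(x_1, x_2) &= \exp\!\left( -\frac{\lVert x_1 - x_2 \rVert^{2}}{\kappa^{2}} \right) \tag{8}\\
L(\alpha) &= \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}
  (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\, G(x_i, x_j)
  + \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^{*})
  + \sum_{i=1}^{N} y_i (\alpha_i^{*} - \alpha_i) \tag{9}\\
f(x) &= \sum_{n=1}^{N} (\alpha_n - \alpha_n^{*})\, G(x_n, x) + b \tag{10}
\end{align}
```

Only the observations with a non-zero difference between α_n and α*_n contribute to the sums in (5), (6) and (10), which is why they are named support vectors.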
In order to appreciate the usefulness of the selected non-linear regression for the problem of this study, a comparison is set up against a multivariate linear model: Principal Component Regression [41].
The ordinary least squares estimate of the coefficients of the linear model of Equation (1) is given in Equation (11). If the input variables of the matrix X are strongly correlated, the estimate of the β coefficients is affected by large uncertainty: for this reason, the idea of the Principal Component Regression is constructing an input variables matrix having mutually orthogonal columns.
Given the singular value decomposition of X in Equation (12), the columns of U and V are orthonormal sets of vectors denoting the left and right singular vectors of X and ∆ is a diagonal matrix, whose elements are the singular values of X. Therefore, the matrix $X^{T}X$ can be decomposed as in Equation (13), where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ and $\lambda_1 \geq \cdots \geq \lambda_p \geq 0$. The Principal Component Regression poses an ordinary least squares model between the transformed data matrix W = XV and the target Y. Therefore, the estimate of β is given in Equation (14). Once β_PCR has been calculated using training data, the model can be used for predicting through Equation (15), which is the counterpart of Equation (10) for the Principal Component Regression.
The data sets used for this study were employed as follows for the regression:
• The data from the year 2017 were used for optimizing and training the regression and are named as D1;
• The validation data set D2 was constituted by the data from the year 2018.
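The Principal Component Regression steps described above can be summarized compactly as follows. This is a sketch of the standard formulation, numbered to mirror the equation references in the text; retaining all p principal components and omitting the intercept are simplifying assumptions, and the hat notation for the estimates is introduced here for illustration.

```latex
\begin{align}
\hat{\beta}_{OLS} &= (X^{T}X)^{-1} X^{T} Y \tag{11}\\
X &= U \Delta V^{T} \tag{12}\\
X^{T}X &= V \Delta^{2} V^{T} = V \Lambda V^{T} \tag{13}\\
\hat{\beta}_{PCR} &= V\, (W^{T}W)^{-1} W^{T} Y, \qquad W = XV \tag{14}\\
f(X) &= X \hat{\beta}_{PCR} \tag{15}
\end{align}
```

Since the columns of W are mutually orthogonal, the matrix $W^{T}W = \Lambda$ is diagonal and its inversion does not suffer from the collinearity of the original covariates, provided that the retained eigenvalues are not close to zero.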
The quality of the regression was analyzed through common error metrics, which were the Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE).
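A minimal Python sketch of the three metrics is given below. It assumes that the residual is the measurement minus the model estimate, that the MAPE is normalized by the measured power and expressed in percent, and that the RMSE is computed about the average residual, as suggested by the definitions in the text; the function names are illustrative.

```python
import math


def residuals(measured, estimated):
    """Residuals R = Y - f(X) between measurements and model estimates."""
    return [y - f for y, f in zip(measured, estimated)]


def mae(measured, estimated):
    """Mean absolute error, in the same unit as the power."""
    r = residuals(measured, estimated)
    return sum(abs(x) for x in r) / len(r)


def mape(measured, estimated):
    """Mean absolute percentage error, in percent of the measured value.

    Assumes that no measured value is zero.
    """
    r = residuals(measured, estimated)
    return 100.0 * sum(abs(x / y) for x, y in zip(r, measured)) / len(r)


def rmse(measured, estimated):
    """Root mean square of the residuals about their average value."""
    r = residuals(measured, estimated)
    r_mean = sum(r) / len(r)
    return math.sqrt(sum((x - r_mean) ** 2 for x in r) / len(r))
```

With this convention, the RMSE measures the dispersion of the residuals, while a systematic offset between model and measurements shows up in the average residual.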
Given the measurements Y(X) for the data set D2 and the model estimates f(X), the residuals are defined in Equation (16). The MAE is defined in Equation (17), where N is the number of samples in the D2 data set. The MAPE is defined in Equation (18) and the RMSE is defined in Equation (19), where $\bar{R}$ is the average residual in the data set D2.
Finally, the selected structure of the regression was the following: for each curve, the columns of the X matrix were constituted by the corresponding independent variable (generator speed, rotor speed and blade pitch, respectively). The main novelty proposed in this manuscript as regards covariates selection was to include also the minimum, maximum and standard deviation as further regressors in addition to the average value. Therefore, for the generator speed-power curve regression, for example, the X matrix was constituted by the average, minimum, maximum and standard deviation of the generator speed and the output Y was the power P. In the following, this kind of model is indicated as M. In order to provide a benchmark more in line with previous findings in the literature, a reduced model (indicated as M_red) was analyzed as well: in this case, the only covariate was the average value of the independent variable of interest (for example, the generator speed for the generator speed-power curve) and therefore the model was univariate non-linear. Furthermore, for the M multivariate models, the Support Vector Regression and the Principal Component Regression were set up and the results were compared, while the univariate linear model was discarded because it was too simple for the problem.
Summarizing, for each curve a comparison of univariate and multivariate non-linear models was performed (M against M_red), in order to appreciate the importance of the additional covariates proposed in this study. Furthermore, for the multivariate models, a comparison of non-linear (Support Vector) and linear (Principal Component) regression was performed, in order to appreciate the importance of non-linearity. The structure of the employed regressions is summarized in Table 2.
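The difference between the full model M and the reduced model M_red lies only in the columns of the input matrix, which can be sketched in Python as follows. The record layout and the field names (for example `generator_speed_avg` and `power_avg`) are hypothetical and chosen for illustration.

```python
STATS_FULL = ("avg", "min", "max", "std")  # covariates of the full model M
STATS_REDUCED = ("avg",)                   # covariate of the reduced model M_red


def build_design_matrix(records, variable, stats):
    """Return (X, Y): one row per 10 min record, one column per statistic.

    `records` is a list of dictionaries with hypothetical keys such as
    "generator_speed_avg" and "power_avg".
    """
    X = [[rec[f"{variable}_{s}"] for s in stats] for rec in records]
    Y = [rec["power_avg"] for rec in records]
    return X, Y


# Illustrative records with made-up values (not taken from the data sets).
records = [
    {"generator_speed_avg": 1400.0, "generator_speed_min": 1350.0,
     "generator_speed_max": 1460.0, "generator_speed_std": 25.0,
     "power_avg": 800.0},
    {"generator_speed_avg": 1500.0, "generator_speed_min": 1470.0,
     "generator_speed_max": 1540.0, "generator_speed_std": 15.0,
     "power_avg": 1000.0},
]
X_full, Y = build_design_matrix(records, "generator_speed", STATS_FULL)
X_red, _ = build_design_matrix(records, "generator_speed", STATS_REDUCED)
```

The resulting matrices can be passed to any regression routine with a Gaussian Kernel, for instance `SVR(kernel="rbf")` from scikit-learn, which is mentioned here only as one possible tool and is not necessarily the implementation used in the study.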

WF1
In Figures 6-8, examples of measured and simulated curves are reported. The results for the complete models M are reported on the left, while the results for the corresponding reduced models M_red are reported on the right. A sample wind turbine (T1) was selected and the employed data set was the validation one (D2). As regards all the curves, it arises that the full models M were more capable of reproducing the actual dispersion of the measured curves: therefore, the inclusion of the minimum, maximum and standard deviation of the input as further covariates for the regression proved to be decisive for a more realistic modelling of the curves.
The qualitative results of Figures 6-8 were confirmed by the quantitative analysis reported in Tables 3-5, where the error metrics are reported for the validation of the non-linear univariate and multivariate models. A substantial decrease of the error metrics occurred when the models also included in the input variables matrix the minimum, maximum and standard deviation of the independent variable. In order to appreciate how much the multivariate models improved with respect to the univariate reduced models, Table 6 reports the percentage decrease ∆ of each error metric, on average for the wind farm, when the full model M was employed with respect to the corresponding reduced model M_red. It arises that in general the improvement of the error metrics was of the order of one third. In Table 7, the average wind farm error metrics for the multivariate linear models are reported and it arises that they were several times higher than the corresponding metrics in Tables 3-5: this supports that it is fundamental to employ a non-linear multivariate model such as the selected one.
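The percentage decrease ∆ discussed above can be sketched in Python as follows. Taking the wind farm average of the metrics before computing the decrease is one possible reading of the text and is an assumption, as are the function names; the numbers in the example are made up for illustration and are not taken from the tables.

```python
def percentage_decrease(metric_reduced, metric_full):
    """Percentage decrease of an error metric from M_red to M."""
    return 100.0 * (metric_reduced - metric_full) / metric_reduced


def farm_average_decrease(reduced_by_turbine, full_by_turbine):
    """Decrease between the wind farm averages of a metric (assumed reading)."""
    avg_reduced = sum(reduced_by_turbine) / len(reduced_by_turbine)
    avg_full = sum(full_by_turbine) / len(full_by_turbine)
    return percentage_decrease(avg_reduced, avg_full)


# Illustrative values only: an error of 30 for M_red and of 20 for M
# corresponds to a decrease of one third.
delta = percentage_decrease(30.0, 20.0)  # about 33.3
```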
It is very important to notice that, for all three curves, the MAPE was between 2.5% and 3%: this means that it is conceivable to monitor the power of a wind turbine with this order of precision using the selected curves, that is, without using the wind speed measurements. This result is far from trivial: as argued in [27], the wind speed explains up to 98% of the variance of the data employed for multivariate power curve analysis and it is therefore remarkable to obtain a comparable regression quality without using the most important covariate (wind speed). Consider that, for example, in [25] multivariate models for the wind turbine power curve are analyzed and the best performing one provides a MAPE of the order of 7%, which is more than double the values obtained in this work. Putting together Region 2 and Region 2 1/2, an overall MAE of the order of 20 kW was obtained in this work along all the power curve span: this result is similar to those in [30], with the difference that the percentage errors are lower in the present study because the test case wind turbines had 2 MW of rated power (while in [30] the rated power was 1.4 MW).
In order to appreciate the quality of the obtained results in the context of wind turbine power monitoring, a Support Vector Regression was set up for a multivariate model of the power curve, including the nacelle wind speed v. The selected input variables were nacelle wind speed v, blade pitch β, rotor speed Ω and generator speed ω; the output of the model was the produced power P. Data from cut-in to rated were included and only the average values of the above indicated variables were fed as input to the regression, which was validated according to the same procedure as described in Section 3 and applied in this section. The rationale for this analysis was comparing the operation curves regression proposed in this study against a benchmark multivariate power curve model, which included the most important operation variables but used only the average values (as typical in the literature): for this reason, the same model structure (Support Vector Regression) was considered. It arises that the wind farm average MAE for the multivariate power curve Support Vector Regression was of the order of 20 kW, as for the combination of the curves selected in this study. This supports that it can be more convenient for wind turbine power monitoring to divide the data into appropriate working regions and to consider different operation curves separately: there is no drawback (the error metrics were comparable to those of a multivariate power curve) and the main advantages are that nacelle anemometer data are not needed and that the performance of the wind turbine is more easily interpretable in terms of the behavior of the sub-components.

WF2
In Figures 9-11, the measured and simulated operation curves are reported for a sample wind turbine (T1), as has been done in Section 4.1 for WF1. Similar considerations arise at this level of analysis, because it was evident that the multivariate non-linear models (including minimum, maximum and standard deviation of the input variables) allowed reproducing the dispersion of the actual curves more realistically.
In Tables 8-10, the results for the regressions are reported and in Table 11 the results are reported for the comparison between the univariate non-linear models M_red and the multivariate non-linear models proposed in this study (M). Employing the non-linear multivariate models M, a considerable decrease was achieved as regards all the error metrics (of the order of 30%). In Table 12, the average wind farm error metrics for the multivariate linear models are reported and it arises that they were up to four times higher than for the models proposed in this work: this supports that non-linearity is fundamental to incorporate the variability of the wind turbine operation parameters.
Concerning the absolute values of the error metrics when the multivariate non-linear models M were adopted, it can be noticed that for WF2 they were slightly higher than for WF1: for example, the MAPE of the full model was of the order of 4% (against the order of 2.5% for WF1). This can likely be interpreted as the combined effect of the different technology and of the fact that WF2 was a wind farm characterized by frequent operation under wake: therefore, modelling its behavior reliably was more complicated, especially as regards the pitch control. Nevertheless, it is important to notice that, also for this challenging test case, the adoption of the further covariates proposed in this work improved the quality of the regression considerably and the obtained results were competitive with the state of the art in the literature as regards wind turbine power curve modelling.
Table 8. Results of the models validation for the generator speed-power curve in Region 2: WF2.

Conclusions
The objective of the present study was contributing to the SCADA data analysis techniques for wind turbine power monitoring. The main innovative points are substantially two:
• instead of analyzing the power curve from cut-in to rated, other meaningful operation curves are considered and each of them is studied in the appropriate working region of the wind turbines;
• the curves are studied through a multivariate Support Vector Regression with Gaussian Kernel and the set of covariates has been augmented by including in the input of the models the minimum, maximum and standard deviation of the independent variables.
Three curves have been selected, which have not been addressed in detail in the literature before:
• generator speed-power;
• rotor speed-power;
• blade pitch-power.
A model for the curves of interest has been set up using a Support Vector Regression with Gaussian Kernel. The data sets employed for training and testing come from 15 wind turbines with 2 MW of rated power, belonging to two industrial wind farms (owned by ENGIE Italia): the two test cases have been selected because one is characterized by frequent operation under wakes and the other is not.
The main result of this study is that, targeting the curves and using the covariates selected in this study, it is possible to model the power of a wind turbine with results which are competitive with the state of the art in the literature as regards multivariate power curve modelling. Therefore, there is no drawback in giving up the nacelle wind speed measurements for wind turbine power monitoring, if the appropriate operation curves are selected and if the models are set up appropriately. Instead, the possible advantages of adopting this point of view are several:
• using operation curves which do not employ the nacelle wind speed as independent variable, it is possible to compare the performance of wind turbines of the same model which are sited in different environments. This is in general not possible using the power curve, because the nacelle transfer function depends on wind shear, turbulence, atmospheric stability and so on;
• the selected operation curves are particularly appropriate for interpreting the performance of wind turbines in relation to the behavior of the main sub-components and this represents an added value with respect to the power curve, which connects directly the wind flow to the power output.
As regards the latter point, it should be noticed that the generator speed-power curve has been analyzed in the context of wind turbine aging analysis in [37]: although the methodology employed in that study was simpler than the analysis proposed in this work, it was sufficient to highlight a considerable under-performance of a wind turbine because of generator efficiency aging. Another potential application of this kind of methodology is the analysis of wind turbine optimization technology, which likely intervenes with slight modifications of the characteristic operation curves of the wind turbines (see for example [42][43][44][45][46]), involving the rotational speed and/or the blade pitch control. A practical approach for assessing the net effect of this kind of technology upgrade is training a model (similar to those presented in this work) with data before the technology upgrade, validating on two target data sets (one before and one after the upgrade) and analyzing how the statistical properties of the residuals between model estimates and measurements change.
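The residual-based assessment of an upgrade outlined above can be sketched in Python as follows. The function names are illustrative, `model` stands for any callable trained on data before the upgrade, and using the shift of the average residual as the statistic of interest is an assumption: other properties of the residuals could be compared in the same way.

```python
from statistics import mean


def mean_residual(model, inputs, measured_power):
    """Average residual (measurement minus model estimate) on a data set."""
    return mean(y - model(x) for x, y in zip(inputs, measured_power))


def upgrade_effect(model, pre_inputs, pre_power, post_inputs, post_power):
    """Shift of the average residual from the pre- to the post-upgrade set.

    `model` is assumed to have been trained on pre-upgrade data only, so a
    positive shift indicates more power than expected after the upgrade.
    """
    return (mean_residual(model, post_inputs, post_power)
            - mean_residual(model, pre_inputs, pre_power))
```

Subtracting the pre-upgrade average residual removes the bias that the model may already have on unseen data, so that only the change attributable to the later period remains.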
Another fruitful result of the present study is the analysis of how much the error metrics for the modelling of the selected curves diminish when the set of covariates includes the minimum, maximum and standard deviation of the independent variable (in addition to the average value): a percentage decrease of the order of one third is achieved with respect to the same kind of model employing only the average of the variable of interest. Since the variability of the main working parameters (for example, the rotational speed) is substantially connected to the turbulence intensity, by incorporating the additional covariates proposed in this work it is possible to construct normal behavior models which are site-specific, because they take into account the conditions which are measured on site and the response of the machine. This suggests, as a further direction, adopting a similar approach also for multivariate wind turbine power curve modelling: there is a wide literature about the improvement of the model structure, while the discussion about the use of further covariates is at its early stages. In [24,25,27,28,29], multivariate models for the power curve have been proposed, which include further environmental and operation variables in addition to the average nacelle wind speed, but at present there are no studies dealing with the inclusion in the models of the minimum, maximum and standard deviation of the main variables. Finally, it would be extremely interesting to analyze the operation curves selected in this study by using time-resolved wind turbine data [47] having a much lower sampling time (of the order of seconds).