Wind Turbine Power Curve Modeling with a Hybrid Machine Learning Technique

Featured Application: The proposed method can provide accurate wind turbine power curves even in the presence of outliers.

Abstract: A power curve of a wind turbine describes the nonlinear relationship between wind speed and the corresponding power output. It shows the generation performance of a wind turbine. It plays vital roles in wind power forecasting, wind energy potential estimation, wind turbine selection, and wind turbine condition monitoring. In this paper, a hybrid power curve modeling technique is proposed. First, fuzzy c-means clustering is employed to detect and remove outliers from the original wind data. Then, different extreme learning machines are trained with the processed data. The corresponding wind power forecasts can also be obtained with the trained models. Finally, support vector regression is used to take advantage of different forecasts from different models. The results show that (1) the five-parameter logistic function is superior to the others among the parametric models; (2) generally, nonparametric power curve models perform better than parametric models; (3) the proposed hybrid model can generate more accurate power output estimations than the other compared models, thus resulting in better wind turbine power curves. Overall, the proposed hybrid strategy can also be applied in power curve modeling, and is an effective tool to get better wind turbine power curves, even when the collected wind data are corrupted by outliers.


Introduction
The sense of crisis brought about by the depletion of fossil energy has prompted the global energy revolution [1,2]. Vigorously developing renewable energy is an important measure to protect the environment and meet the energy demand of social development [1,2]. Recently, due to advances in technology, the stimulation of energy policies, and environmental pressures, the proportion of renewable energy in the power system of many countries and regions has been rising [1,2]. Wind power, as one of the clean and renewable energy sources, has received great attention from all countries in the world [1–3].
Currently, many researchers focus on studying the performance of different wind turbines due to the large-scale development and utilization of wind energy [1,4]. Usually, a wind turbine power curve (WTPC), shown in Figure 1, can be used to describe the performance of a wind turbine, namely the power output of a wind turbine at a specific wind speed [4]. When the wind speed $V$ is less than the cut-in wind speed $V_{\text{cut-in}}$, the wind turbine does not work and no wind power is generated. When $V$ is located in the interval $[V_{\text{cut-in}}, V_{\text{rated}}]$, the output power increases as the wind speed increases. The output power reaches a constant, the rated wind power $P_{\text{rated}}$, when the wind speed is larger than the rated wind speed $V_{\text{rated}}$ but less than the cut-out wind speed $V_{\text{cut-out}}$. The wind turbine will be shut down to avoid defects and damage when the wind speed is larger than $V_{\text{cut-out}}$ [1,5].
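The four operating regions described above can be sketched as a piecewise function. This is only an idealized illustration: the default speeds and rated power are hypothetical values, and the cubic ramp between cut-in and rated speed is an assumption, since the text only states that power increases with wind speed in that interval.

```python
def ideal_power(v, v_cut_in=3.0, v_rated=12.0, v_cut_out=25.0, p_rated=2000.0):
    """Idealized power curve (kW). All default values are hypothetical."""
    if v < v_cut_in or v > v_cut_out:
        # No generation below cut-in; turbine shut down above cut-out.
        return 0.0
    if v >= v_rated:
        # Constant rated power between rated and cut-out speed.
        return p_rated
    # Assumed cubic ramp between cut-in and rated speed.
    frac = (v**3 - v_cut_in**3) / (v_rated**3 - v_cut_in**3)
    return p_rated * frac


if __name__ == "__main__":
    for speed in (2.0, 8.0, 15.0, 30.0):
        print(speed, round(ideal_power(speed), 1))
```

A real WTPC estimated from operational data deviates from this shape, which is the motivation for the data-driven models discussed in the rest of the paper.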
Appl. Sci. 2019, 9, x
Often, the turbine manufacturer provides a theoretical WTPC, which is measured under ideal meteorological and topographical conditions [6]. However, due to the climate variability in the observed wind farm, the theoretical WTPC cannot describe the performance of a wind turbine accurately [7]. So, it is essential to estimate the actual WTPC from the real operational data of a wind turbine. Accurate WTPCs play important roles in wind power forecasting, wind turbine condition monitoring, wind energy potential estimation, wind turbine selection, and so forth [1,8,9]. Currently, there are two types of power curve estimation methods: parametric models and nonparametric models [10].
Parametric models often consider the shape of power curves. Therefore, many S-shaped functions are employed to model WTPCs, mainly double exponential-based functions and logistic functions [1]. A simple double exponential function and an adjusted double exponential function were employed to model real WTPCs in [11,12], respectively. Many types of logistic functions were also employed to model WTPCs. In [13], a three-parameter logistic function (3-PLF) was used to evaluate the performance of the Gamesa G90-2.0 MW wind turbine. Many researchers suggested using a four-parameter logistic function (4-PLF) to model WTPCs [14,15]. When a variable asymmetry factor is added to the 4-PLF, a five-parameter logistic function (5-PLF) is obtained, and a 5-PLF degenerates into a 4-PLF when the asymmetry factor equals 1 [1]. In [15,16], a 5-PLF was used, while the authors in [17] employed a logistic function with six parameters (6-PLF) to simulate WTPCs. In [10], Taslimi-Renani et al. even used a modified hyperbolic tangent (MHTan) with nine parameters to approximate the nonlinear relationship between wind speed and power output of a wind turbine.
Unlike parametric models, nonparametric models do not consider the shape of the WTPC. They only consider the complex nonlinear relationship between wind speed and wind power. Currently, many artificial intelligence-based models have been employed to learn this relationship. In [18], three different types of neural networks (self-supervised neural network, multilayer perceptron, and general regression neural network) were used, and their performances in power curve modeling were compared. In [19], the authors showed that an artificial neural network (ANN) generated a better power curve than some parametric models. In [20,21], support vector regression (SVR) and Gaussian process (GP) were employed to estimate the real WTPCs. Moreover, GP was also used to obtain probabilistic WTPCs in [21,22]. In [23], Wang et al. proposed two Bayesian-based models, heteroscedastic and robust spline regression models, to fit deterministic and probabilistic power curves in different seasons, respectively. They also proposed two asymmetric spline regression models to obtain accurate power curves in different wind farms and different seasons in [9]. Besides, monotonic regression [24], the K-nearest neighbor model (KNN) [25], the adaptive neuro-fuzzy inference system (ANFIS) [26], fuzzy-based models [27], and a copula model [28] were also used to solve the problem of power curve modeling.
From the above literature review, it can be found that only one power curve model is selected and used to fit the real WTPC with the measured wind data. However, due to the complex expression of the WTPC, or the complex nonlinear relationship between wind speed and power output, one model cannot describe it comprehensively. In many applications, such as wind speed forecasting [29] and load forecasting [30,31], it has been shown that a combination of several models performs better than single models. In power curve modeling, there is no paper that takes advantage of different power forecasts generated by several power curve models. Besides, it is reported that there are many outliers in the collected wind data [32–34]. They have adverse effects on the learning process of power curve models and prevent us from obtaining accurate WTPCs [34]. So, it is essential to process these outliers in the original wind data first.
In this paper, a hybrid wind turbine power curve modeling technique is proposed. Firstly, fuzzy c-means clustering (FCM) is employed to detect outliers. In each cluster, if the distance between a sample and the cluster center is larger than a threshold, the sample is flagged as an outlier. The outliers are removed from the original wind data. Secondly, with the processed wind data, three types of extreme learning machines, the original extreme learning machine (ELM) [35], the weighted regularized extreme learning machine (WELM) [36], and the outlier-robust extreme learning machine (ORELM) [37], are employed to get the power forecasts. Finally, SVR [38] is used to generate the final forecasts, which combine the forecasts obtained from the three ELM-based models. The performance of the proposed hybrid model is compared with some popular parametric and nonparametric models on different wind turbines. The results show that the proposed hybrid model can produce a more accurate power forecast at a given wind speed, resulting in a better power curve.
The organization of the paper is as follows. Section 1 is the introduction. Some popular power curve models are introduced briefly in Section 2. Section 3 presents the methodologies related to the proposed hybrid model and the proposed strategy for power curve modeling. The power curve modeling results of different wind turbines are shown in Section 4. Section 5 concludes the paper.

Popular Power Curve Models
In this section, some popular power curve models are introduced briefly. Usually, they can be divided into two categories: parametric models and nonparametric models.

Parametric Models
Parametric models, which are constructed from several mathematical expressions with several parameters, are often used to describe the WTPC [23]. Some representative parametric models such as 3-PLF, 4-PLF, 5-PLF, 6-PLF, and MHTan are introduced here. Supposing $V$ and $P$ represent wind speed and wind power, respectively, the calculations of different parametric models are presented in Table 1.
As for 5-PLF, it degenerates into 4-PLF when the asymmetry factor is equal to 1. MHTan degenerates into a hyperbolic tangent by assuming $\alpha_1 = \cdots = \alpha_8 = 1$ and $\alpha_9 = 0$, while it becomes a hyperbolic sine when

Table 1. Parametric wind turbine power curve models.

3-PLF: $P = \dfrac{k y_0 e^{rV}}{k + y_0 \left(e^{rV} - 1\right)}$
$k$: the capacity of the system; $r$: the rate of increase; $y_0$: no special meaning.

6-PLF
$\delta$: the lower asymptote; $\alpha$: the upper asymptote; $\beta$: the growth rate; $\gamma$: the closest asymptote to the maximum growth part of the curve; $V_0$: the value $P(0)$; : the value around 1.
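As a small illustration of how a parametric model from Table 1 is evaluated, the sketch below implements the 3-PLF expression, $P = k y_0 e^{rV} / (k + y_0(e^{rV} - 1))$. The parameter values in the usage example are hypothetical and are not fitted to any dataset in this paper.

```python
import math


def plf3(v, k, r, y0):
    """Three-parameter logistic power curve.

    k: capacity of the system, r: rate of increase, y0: value at v = 0.
    """
    e = math.exp(r * v)
    return k * y0 * e / (k + y0 * (e - 1.0))


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    k, r, y0 = 2000.0, 0.8, 5.0
    for v in (0.0, 5.0, 10.0, 15.0, 20.0):
        print(v, round(plf3(v, k, r, y0), 2))
```

The function equals $y_0$ at $V = 0$ and approaches $k$ for large $V$, which gives the S-shape that parametric models rely on.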

Nonparametric Models
For a wind turbine, there is a nonlinear relationship between power output and other variables, such as wind speed and wind direction. It is usually described by a wind turbine power curve and expressed by the following function,

$$P = f(V, \theta) + \zeta$$

where $\theta$ denotes the set of variables that affect the power output of a wind turbine, such as wind direction and pressure, $\zeta$ is the error term, and $f(\cdot)$ is the nonlinear function.
Due to the superior nonlinear fitting ability of many artificial intelligence methods, they have been widely utilized in wind turbine power curve modeling, namely in estimating the unknown nonlinear function $f(\cdot)$. Some popular ones are SVR [20], GP [21,22], and ANNs [19,26].

Proposed Wind Power Curve Model
In this section, related methods for the proposed model are introduced first. Then, the proposed strategies for deterministic and probabilistic power curve modeling are presented in detail.

Fuzzy C-Means Clustering
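FCM was first proposed by Bezdek in 1981 [39]. In FCM, each sample is not strictly assigned to a single class; instead, it has a degree of membership for each class. Generally, the objective function for FCM can be expressed as [40]

$$J(U, v_1, \ldots, v_c) = \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij}^{m} d_{ij}^{2}$$

where $U = [u_{ij}]_{c \times N}$ is the membership matrix, $c$ is the number of clusters, $N$ is the number of samples, $m$ is the weighting exponent, whose value is usually selected as 2, $u_{ij}$ denotes the degree to which the $j$th sample belongs to the $i$th cluster, with $0 \le u_{ij} \le 1$ and $\sum_{i=1}^{c} u_{ij} = 1$, and $d_{ij} = \|x_j - v_i\|$ is the Euclidean distance between the sample $x_j$ and the cluster center $v_i$.
Considering the constraint on $u_{ij}$, the objective function can be rewritten as the following function by introducing Lagrange multipliers $\lambda_j$,

$$F = \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij}^{m} d_{ij}^{2} + \sum_{j=1}^{N} \lambda_j \left( \sum_{i=1}^{c} u_{ij} - 1 \right)$$

Setting the partial derivatives of $F$ with respect to $u_{ij}$ and $v_i$ equal to 0, the update formulas for $u_{ij}$ and $v_i$ can be expressed as [40]

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{kj} \right)^{2/(m-1)}}$$

$$v_i = \frac{\sum_{j=1}^{N} u_{ij}^{m} x_j}{\sum_{j=1}^{N} u_{ij}^{m}}$$

FCM employs the above two formulas to update the membership matrix $U$ and all cluster centers so as to make the objective function reach its minimum.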
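The alternating updates of memberships and cluster centers can be sketched as follows. This is a minimal NumPy illustration of standard FCM, not the authors' implementation; the function name `fcm`, the random initialization, and the stopping tolerance are assumptions.

```python
import numpy as np


def fcm(X, c=3, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means. X has shape (N, D); returns (U, centers)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    U = rng.random((c, N))
    U /= U.sum(axis=0, keepdims=True)  # memberships of each sample sum to 1
    for _ in range(n_iter):
        Um = U ** m
        # Center update: membership-weighted mean of the samples.
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Euclidean distances d_ij between center i and sample j, shape (c, N).
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], float)
    U, centers = fcm(X, c=2)
    print(U.argmax(axis=0), centers.round(2))
```

Taking the arg-max of each column of `U` gives a hard cluster assignment, which is what a per-cluster outlier test needs.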

Extreme Learning Machine and Its Variants
In this subsection, the original ELM and its variants are introduced briefly.

Extreme Learning Machine
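ELM is a new single hidden layer feed-forward neural network that was first proposed by Huang et al. [35] in 2004. Given the $i$th sample pair $(x_i, t_i)$ with $x_i \in R^{D}$, $i = 1, \ldots, N$, the output of an ELM with $M$ hidden nodes is

$$t_i = \sum_{m=1}^{M} \beta_m H\!\left(w_m^{T} x_i + \alpha_m\right) \qquad (6)$$

where $H(\cdot)$ is the activation function, $\beta_m$ is the $m$th output weight, $w_m \in R^{D}$ is the input weight vector, and $\alpha_m$ is the corresponding bias. Equation (6) can be rewritten in matrix form,

$$T = \mathbf{H}\beta \qquad (7)$$

where $T = [t_1, \ldots, t_N]^{T}$, $\beta = [\beta_1, \ldots, \beta_M]^{T}$, and $\mathbf{H}$ is the hidden layer output matrix, which is represented as

$$\mathbf{H} = \begin{bmatrix} H(w_1^{T} x_1 + \alpha_1) & \cdots & H(w_M^{T} x_1 + \alpha_M) \\ \vdots & \ddots & \vdots \\ H(w_1^{T} x_N + \alpha_1) & \cdots & H(w_M^{T} x_N + \alpha_M) \end{bmatrix}$$

The optimization of ELM is to solve the following minimization problem,

$$\min_{\beta} \|\mathbf{H}\beta - T\|_2^2$$

where $\|\cdot\|_2^2$ denotes the squared $l_2$-norm of a vector. Then, the output weight vector $\beta$ can be estimated by

$$\hat{\beta} = \mathbf{H}^{\dagger} T$$

where $\mathbf{H}^{\dagger}$ denotes the Moore-Penrose generalized inverse of $\mathbf{H}$.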
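A minimal sketch of ELM training, under stated assumptions: the input weights and biases are drawn at random and left untrained, a sigmoid is used as the activation function, and the output weights come from the Moore-Penrose pseudo-inverse. The names `elm_fit` and `elm_predict` and the uniform sampling range are illustrative choices, not taken from the paper.

```python
import numpy as np


def _hidden(X, W, b):
    # Sigmoid activation applied to the random projection of the inputs.
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def elm_fit(X, T, n_hidden=20, seed=0):
    """X: (N, D) inputs, T: (N,) targets. Returns (W, b, beta)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))  # random, never trained
    b = rng.uniform(-1.0, 1.0, n_hidden)
    H = _hidden(X, W, b)
    beta = np.linalg.pinv(H) @ T  # least-squares output weights
    return W, b, beta


def elm_predict(X, W, b, beta):
    return _hidden(X, W, b) @ beta


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
    t = x[:, 0] ** 2
    W, b, beta = elm_fit(x, t)
    print(float(np.mean((elm_predict(x, W, b, beta) - t) ** 2)))
```

Because only `beta` is solved for, training reduces to one linear least-squares problem, which is the efficiency argument usually made for ELM.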

Weighted Regularized Extreme Learning Machine (WELM)
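The objective function for a regularized ELM can be expressed as [41]

$$\arg\min_{\beta} \; \frac{C}{2} \|\varepsilon\|_2^2 + \frac{1}{2} \|\beta\|_2^2 \quad \text{s.t.} \quad T - \mathbf{H}\beta = \varepsilon \qquad (11)$$

where $C$ is the regularization parameter. By introducing a vector of Lagrangian multipliers $\alpha \in R^{N}$, the corresponding Lagrangian function is defined as

$$L(\beta, \varepsilon, \alpha) = \frac{C}{2} \|\varepsilon\|_2^2 + \frac{1}{2} \|\beta\|_2^2 + \alpha^{T} \left( T - \mathbf{H}\beta - \varepsilon \right)$$

The corresponding solution for the above problem is [37,41]

$$\hat{\beta} = \left( \mathbf{H}^{T}\mathbf{H} + \frac{I}{C} \right)^{-1} \mathbf{H}^{T} T$$

where $I$ is an identity matrix. Deng et al. [36] developed WELM to reduce the adverse effects of outliers. They weighted the $i$th error term $\varepsilon_i$ by the weighting factor $\eta_i$. Thus, the error term $\|\varepsilon\|_2^2$ in Equation (11) becomes $\|\eta\varepsilon\|_2^2$. The objective function for WELM is expressed as [36,37]

$$\arg\min_{\beta} \; \frac{C}{2} \|\eta\varepsilon\|_2^2 + \frac{1}{2} \|\beta\|_2^2 \quad \text{s.t.} \quad T - \mathbf{H}\beta = \varepsilon$$

where $\eta = \mathrm{diag}(\eta_1, \ldots, \eta_N)$. Similar to the optimization of the regularized ELM, Lagrangian multipliers are also introduced. The solution for WELM is given by [36,37]

$$\hat{\beta} = \left( \mathbf{H}^{T}\eta^{2}\mathbf{H} + \frac{I}{C} \right)^{-1} \mathbf{H}^{T}\eta^{2} T \qquad (15)$$

There are several weighting functions that can be used to construct the weighting matrix $\eta$. As suggested in [36,37,41], the Hampel weighting function is given by

$$\eta_i = \begin{cases} 1, & |\varepsilon_i / \hat{s}| \le c_1 \\ \dfrac{c_2 - |\varepsilon_i / \hat{s}|}{c_2 - c_1}, & c_1 < |\varepsilon_i / \hat{s}| \le c_2 \\ 10^{-4}, & \text{otherwise} \end{cases} \qquad (16)$$

where $\hat{s}$ is the robust estimate of the standard deviation of the error variables generated by the regularized ELM,

$$\hat{s} = \frac{\mathrm{IQR}}{2 \times 0.6745}$$

The term IQR denotes the inter-quartile range, which is the difference between the 75th percentile and the 25th percentile. The constants $c_1$, $c_2$ are typically set as $c_1 = 2.5$ and $c_2 = 3$ [37].
On the whole, there are three steps to construct a WELM model. Firstly, the regularized ELM model is used to generate the error series $\{\varepsilon_1, \ldots, \varepsilon_N\}$. Secondly, according to the computed error series, the weighting matrix is calculated by Equation (16). Finally, the output weight vector $\beta$ is obtained via Equation (15).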
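The second of the three steps, turning residuals into Hampel weights, can be sketched as below. The constants `0.6745` and `1e-4` are the values commonly used with this weighting scheme and should be treated as assumptions here; the function name `hampel_weights` is illustrative.

```python
import numpy as np


def hampel_weights(residuals, c1=2.5, c2=3.0):
    """Map regularized-ELM residuals to weights in [1e-4, 1]."""
    r = np.asarray(residuals, dtype=float)
    q75, q25 = np.percentile(r, [75, 25])
    s_hat = (q75 - q25) / (2.0 * 0.6745)  # robust scale estimate from the IQR
    z = np.abs(r / s_hat)
    w = np.full_like(z, 1e-4)             # gross outliers get a tiny weight
    mid = (z > c1) & (z <= c2)
    w[mid] = (c2 - z[mid]) / (c2 - c1)    # linear taper between c1 and c2
    w[z <= c1] = 1.0                      # ordinary samples keep full weight
    return w


if __name__ == "__main__":
    print(hampel_weights([-1.0, -0.5, 0.0, 0.5, 1.0, 50.0]))
```

The resulting vector would form the diagonal of the weighting matrix used when solving for the WELM output weights.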

Outlier-Robust Extreme Learning Machine (ORELM)
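Owing to the presence of outliers, the training error will show the sparse characteristic [37]. Sparsity is better reflected by the $l_0$-norm than by the $l_2$-norm. So, to deal with outliers, the objective function can be modified by replacing $\|\varepsilon\|_2^2$ with $\|\varepsilon\|_0$. However, the modified objective function becomes a non-convex problem, which is difficult to optimize. With this in mind, Zhang and Luo [37] replaced $\|\varepsilon\|_0$ by the $l_1$-norm-based loss function, $\|\varepsilon\|_1$. Thus, the objective function is given by

$$\arg\min_{\beta} \; \|\varepsilon\|_1 + \frac{1}{C} \|\beta\|_2^2 \quad \text{s.t.} \quad T - \mathbf{H}\beta = \varepsilon \qquad (18)$$

Equation (18) not only guarantees sparsity but also keeps the overall minimization convex [37]. Then, the augmented Lagrange multiplier (ALM) method is used to optimize Equation (18), and the corresponding Lagrangian function is expressed as

$$L_{\mu}(\varepsilon, \beta, \alpha) = \|\varepsilon\|_1 + \frac{1}{C} \|\beta\|_2^2 + \alpha^{T} \left( T - \mathbf{H}\beta - \varepsilon \right) + \frac{\mu}{2} \|T - \mathbf{H}\beta - \varepsilon\|_2^2 \qquad (19)$$

where $\mu$ is a penalty parameter, set as $\mu = 2N / \|T\|_1$ in [37]. The optimal solutions for $\varepsilon$, $\beta$, $\alpha$ can be obtained by iteratively minimizing Equation (19). The corresponding iterative functions are

$$\begin{aligned} \beta_{k+1} &= \left( \mathbf{H}^{T}\mathbf{H} + \frac{2}{C\mu} I \right)^{-1} \mathbf{H}^{T} \left( T - \varepsilon_k + \frac{\alpha_k}{\mu} \right) \\ \varepsilon_{k+1} &= \max\left\{ \left| T - \mathbf{H}\beta_{k+1} + \frac{\alpha_k}{\mu} \right| - \frac{1}{\mu},\; 0 \right\} \circ \mathrm{sign}\left( T - \mathbf{H}\beta_{k+1} + \frac{\alpha_k}{\mu} \right) \\ \alpha_{k+1} &= \alpha_k + \mu \left( T - \mathbf{H}\beta_{k+1} - \varepsilon_{k+1} \right) \end{aligned}$$

where $\circ$ means element-wise multiplication, and $\mathrm{sign}(\cdot)$ is the sign function. The optimal output weights $\beta$ are obtained when the number of iterations reaches a predefined maximum [37].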
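The ALM iteration can be sketched as a short loop over three updates: a ridge-type solve for the output weights, a soft-thresholding step for the error vector, and a multiplier update. This is an illustrative sketch of the $l_1$-loss scheme described above, with a hypothetical function name `orelm_fit` and arbitrary default values for `C` and `n_iter`; `H` is any hidden layer output (or design) matrix.

```python
import numpy as np


def orelm_fit(H, T, C=2.0**10, n_iter=20):
    """Solve min ||eps||_1 + (1/C)||beta||^2 s.t. T - H beta = eps via ALM."""
    N, M = H.shape
    mu = 2.0 * N / np.abs(T).sum()        # penalty parameter 2N / ||T||_1
    eps = np.zeros(N)
    alpha = np.zeros(N)
    # The linear system is the same in every iteration, so solve it once.
    P = np.linalg.solve(H.T @ H + (2.0 / (C * mu)) * np.eye(M), H.T)
    beta = np.zeros(M)
    for _ in range(n_iter):
        beta = P @ (T - eps + alpha / mu)
        r = T - H @ beta + alpha / mu
        # Soft-thresholding keeps the error vector sparse.
        eps = np.maximum(np.abs(r) - 1.0 / mu, 0.0) * np.sign(r)
        alpha = alpha + mu * (T - H @ beta - eps)
    return beta


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 21)
    H = np.column_stack([np.ones_like(x), x])
    T = 1.0 + 2.0 * x
    T[10] += 20.0                          # one gross outlier
    print(orelm_fit(H, T, n_iter=100))
```

In the usage example a plain linear design matrix stands in for the hidden layer output, which makes it easy to see that the fitted weights stay close to the outlier-free line.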

Support Vector Regression
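SVR was first proposed by Vapnik based on structural risk minimization [38]. The main idea is to find a nonlinear mapping function $\varphi$, which maps low-dimensional data into a high-dimensional feature space.
Given the training data $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, the aim is to get an estimate $f(x)$ of the real output $y$, where $f(x) = w^{T}\varphi(x) + b$ and $w$, $b$ are the two model parameters. The objective function for SVR is

$$\min_{w, b} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} l\left( f(x_i) - y_i \right)$$

where $l(\cdot)$ is the loss function and $C$ is the trade-off parameter. For the original SVR, the $\varepsilon$-insensitive loss function is employed. When introducing the slack variables $\xi_i$, $\xi_i^{*}$, the primal problem for SVR is

$$\begin{aligned} \min_{w, b, \xi, \xi^{*}} \quad & \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right) \\ \text{s.t.} \quad & f(x_i) - y_i \le \varepsilon + \xi_i, \\ & y_i - f(x_i) \le \varepsilon + \xi_i^{*}, \\ & \xi_i \ge 0, \; \xi_i^{*} \ge 0, \quad i = 1, \ldots, n \end{aligned}$$

The dual problem for SVR can be obtained by applying the Karush-Kuhn-Tucker (KKT) conditions. Then, with $\alpha_i$ and $\alpha_i^{*}$ denoting the Lagrange multipliers of the first and second constraints, the decision function $f(x)$ for a new sample $x$ is

$$f(x) = \sum_{i=1}^{n} \left( \alpha_i^{*} - \alpha_i \right) K(x_i, x) + b$$

where $K(x_i, x) = \varphi(x_i)^{T} \varphi(x)$ is the kernel function. Usually, the Gaussian kernel is used in many applications due to its superior performance.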
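The building blocks named above can be illustrated without a full solver: the $\varepsilon$-insensitive loss, a Gaussian kernel, and the kernel expansion used for prediction. The sketch does not solve the dual problem; `dual_coef` stands for already-computed differences of multipliers, and the width parameter `gamma` and all function names are hypothetical.

```python
import math


def eps_insensitive(y, f, eps=0.1):
    """Loss is zero inside the eps-tube and grows linearly outside it."""
    return max(0.0, abs(y - f) - eps)


def gaussian_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2) for equal-length sequences."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svr_decision(x, support, dual_coef, b, gamma=0.5):
    """Kernel expansion over support vectors plus the bias term."""
    return sum(c * gaussian_kernel(s, x, gamma)
               for s, c in zip(support, dual_coef)) + b


if __name__ == "__main__":
    print(eps_insensitive(1.0, 1.05), eps_insensitive(1.0, 1.5))
    print(svr_decision((0.0,), [(0.0,), (1.0,)], [2.0, -1.0], 0.5))
```

In practice the coefficients and bias come from a quadratic programming solver, and the kernel width and $C$ are tuned, for example by grid search.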

Proposed Strategy for Wind Power Curve Modeling
In our collected wind data, there are many outliers, which may be caused by over dating, pitch malfunction, pitch controller malfunction, wind speed under reading, dirt, bugs or icing on blades, down rating, and so on [42]. So, it is essential to detect and remove those outliers in our collected wind data. In this paper, several steps, which are described in Figure 2, are taken to generate accurate wind power curves.
Step 1: Clustering the collected wind data. Different environmental conditions or different working conditions lead to different patterns and characteristics of the collected wind data. Here, FCM is employed to divide the data into different clusters. The samples in the same cluster may have some common characteristics.
Step 2: Detecting and removing outliers. In a given cluster, the Mahalanobis distance is employed to measure the distance between a sample and the cluster center. Generally, outliers are far from normal samples, and also farther from the cluster center than normal samples. So, a sample is considered an outlier when the distance between the sample and the cluster center is larger than a given threshold. Then, the detected outliers are removed from the data. The Mahalanobis distance can be computed by

$$D_M(x) = \sqrt{(x - \mu)^{T} \Sigma^{-1} (x - \mu)} \qquad (24)$$

where $x$ is the sample in the observed cluster, and $\mu$, $\Sigma$ are the corresponding cluster center and the covariance of all samples in the same cluster.
Step 3: Training ELM-based power curve models. After removing the outliers from the collected data, the processed data are employed to train different wind power curve models. However, one cannot ensure that all outliers are detected and removed from the collected wind data, so some less obvious outliers may remain in the processed data. It is wise to use robust wind power curve models to reduce the adverse effect of hidden outliers on the power curve modeling. Moreover, considering the efficiency of ELM, the original ELM and its two robust variants are selected as basic power curve models to describe the nonlinear relationship between wind speed and power output. The number of hidden nodes in ELM-based models greatly affects their performance. So, in their training phases, validation sets are employed to help select the optimal model parameters, which are obtained when the forecasting error on a validation set reaches its minimum.
Step 4: Producing the deterministic wind power curve. Occasionally, a single model is unable to describe the complex relationship comprehensively. So, in this paper, alongside the wind speed, the wind power forecasts on the validation set obtained in Step 3 are also used as inputs to train a regression model, SVR. Namely, SVR is utilized to approximate the function $f(\cdot)$ in the following equation,

$$P_{real} = f\left( V, P_{elm}, P_{welm}, P_{orelm} \right) + \varepsilon \qquad (25)$$

where $P_{real}$ is the real wind power, $P_{elm}$, $P_{welm}$, $P_{orelm}$ denote the forecasted wind power obtained by ELM, WELM, and ORELM, respectively, at wind speed $V$, and $\varepsilon$ is the error term. Then, at any unknown wind speed, the corresponding wind power forecast can be obtained using the trained SVR. Thus, a deterministic wind power curve can be derived.
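The outlier test of Step 2 can be sketched for a single cluster as follows. This is a minimal illustration, assuming each sample is a (wind speed, power) pair and that the cluster center is supplied by the caller; the default threshold of 10 matches the value used later in the experiments, and the function name `mahalanobis_mask` is hypothetical.

```python
import numpy as np


def mahalanobis_mask(X, center, threshold=10.0):
    """Return a boolean mask that is True for samples kept as inliers.

    X: (N, D) samples of one cluster; center: (D,) cluster center.
    """
    cov = np.cov(X, rowvar=False)          # covariance of the cluster samples
    inv_cov = np.linalg.inv(cov)
    diff = X - center
    # Squared distance (x - mu)^T Sigma^{-1} (x - mu) for every row at once.
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(d2) <= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(size=(200, 2)), [[100.0, 100.0]]])
    keep = mahalanobis_mask(X, X.mean(axis=0))
    print(int(keep.sum()), "samples kept of", len(X))
```

Because the covariance is estimated from data that still contain the outliers, very small clusters can never produce large distances, so the threshold has to be chosen with the cluster size in mind.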

Wind Turbine Power Curve Modeling in Different Wind Farms
In this section, to evaluate the performances of different power curve models under various geographical environments, a comparative study is conducted in different wind farms.

Data Description
Four wind datasets were taken to assess the performance of different power curve models. Dataset A and Dataset B were collected from the Xichang and Hunan wind farms in China, respectively. Dataset C and Dataset D were collected from different wind turbines of the same wind farm. The whole dataset for each wind farm was divided into three separate parts: training set, validation set, and test set. The training set was used to train different power curve models, the validation set was employed to select the optimal model parameters, and the performances of different models were evaluated on the test set. The information of the four datasets is presented in Table 2. In all datasets, only two variables, wind speed and wind power, were measured and retained. So, the only input for all power curve models is wind speed.

Experiment Setting
In order to make a comprehensive comparative study, five parametric models (3-PLF, 4-PLF, 5-PLF, 6-PLF, and MHTan) and five nonparametric models (back-propagation neural network (BPNN), ANFIS, ELM, WELM, and ORELM) were utilized. For all models, the only input was wind speed, and the output was wind power. Moreover, all models were trained with the same processed data generated by the FCM-based model for a fair comparison.
In the paper, similar to [10], the backtracking search algorithm (BSA) was employed to optimize the model parameters in all parametric models with the least squares loss function. Grid search was employed to tune the parameters in SVR. For BPNN, the number of neurons in the hidden layer was determined by the Hecht-Nielsen method [9]. For ELM-based regression models, the optimal number of hidden nodes is obtained when the modeling error of the observed model on the validation set reaches its minimum. The candidate number of hidden nodes is selected from the set $\{20, 40, 60, \ldots, 200\}$.
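The validation-based choice of the number of hidden nodes amounts to a one-dimensional search over the candidate set. The sketch below assumes a hypothetical callback `train_and_score(n)` that trains a model with `n` hidden nodes and returns its validation error; it is not tied to any particular ELM implementation.

```python
def select_hidden_nodes(train_and_score, candidates=range(20, 201, 20)):
    """Return (best_n, scores), where best_n has the lowest validation error."""
    scores = {n: train_and_score(n) for n in candidates}
    best_n = min(scores, key=scores.get)
    return best_n, scores


if __name__ == "__main__":
    # Stand-in scoring function, for illustration only.
    best, scores = select_hidden_nodes(lambda n: (n - 120) ** 2)
    print(best, len(scores))
```

Any of the four error indicators could serve as the validation error; which one is used to pick the optimum is a design choice.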
In order to test the performances of different models, four error indicators, mean absolute error (MAE), root mean square error (RMSE), normalized mean absolute percentage error (NMAPE), and the coefficient of determination ($R^2$), were employed. For all models, the values of different error indicators were computed based on the same test set. Assuming that $y_l$ denotes the $l$th measured value in the test set, $\hat{y}_l$ is the corresponding forecast ($l = 1, \ldots, L$), $\bar{y}$ is the mean of all measured values, and $L$ is the number of samples in the test set, the calculations for the four error indicators are presented in Table 3. For MAE, NMAPE, and RMSE, a smaller value indicates a better estimator. However, the closer the value of $R^2$ is to 1, the better the model.

Table 3. Performance evaluation metrics and the corresponding definitions.

Indicator | Description
MAE | The smaller the better
RMSE | The smaller the better
NMAPE | The smaller the better
R^2 | The bigger the better
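The four indicators can be computed as in the sketch below. MAE, RMSE, and $R^2$ follow their standard definitions. For NMAPE the normalization is an assumption: the mean absolute error is divided by the maximum measured value and expressed in percent, which is one common convention and may differ from the definition used in Table 3.

```python
import math


def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def nmape(y, y_hat):
    # Assumed normalization: divide by the largest measured value.
    return 100.0 * mae(y, y_hat) / max(y)


def r2(y, y_hat):
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    y, y_hat = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]
    print(mae(y, y_hat), rmse(y, y_hat), nmape(y, y_hat), r2(y, y_hat))
```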

Results of Wind Turbine Power Curve Modeling
According to the steps described in Section 3.3, the original wind data are first clustered by FCM. Then, in each cluster, if the Mahalanobis distance (Equation (24)) of a sample is larger than 10, the sample is considered an outlier. The refined data and outliers are presented in Figure 3. As Figure 3 shows, most outliers in the original wind data can be detected by the FCM-based outlier detection model.
After some outliers are detected and removed from the four datasets, the processed wind data are used to train different power curve models. For ELM-based power curve models, the number of neurons in the hidden layer has a great effect on their performance. So, the validation set in each dataset helps to select the optimal number of neurons among the given candidates. In the four datasets, as the number of neurons in the hidden layer changed, the performances of ELM-based power curve models on the validation set were calculated, and these are presented in Tables 4-7. From Table 4, in Dataset A, the optimal numbers of hidden nodes for ELM, WELM, and ORELM are 120, 120, and 80, respectively. According to the results in Table 5, 40, 40, and 40 were selected as the optimal numbers of hidden nodes in ELM, WELM, and ORELM, respectively, for Dataset B. From Table 6, the optimal ones are 120, 120, and 200 for Dataset C, while from Table 7, they are 20, 60, and 80 for Dataset D.
After the optimal numbers of hidden nodes in ELM-based models were decided, the trained models were used to forecast the power output of a wind turbine at an unknown wind speed.The results of ELM, WELM, and ORELM in different datasets are presented in Tables 8 and 9.
According to the Step 4 of the proposed hybrid model described in Section 3.3, the wind power forecasts of the optimal ELM, WELM, and ORELM on the validation set are used as inputs, together with the corresponding wind speed, to train SVR.Then, the forecasted power output can be obtained by the trained SVR.The results are presented in Tables 8 and 9.Moreover, the results of the other ten compared models (five parametric models and five nonparametric models) are also shown in Tables 8 and 9.According to Table 8, the proposed hybrid model provides better power forecasts than the other compared models in Dataset A. Moreover, the majority of nonparametric models outperform parametric models.However, 5-PLF performs better than ANFIS.In five parametric models, 5-PLF is the best one.With the exception of the hybrid model, among the five nonparametric models, no model outperforms the others in terms of four error indicators.Specifically, WELM performs the best in terms of MAE and NMAPE, while the best one is BPNN in terms of RMSE and R 2 .
From the results on Dataset B and Dataset D presented in Tables 8 and 9, respectively, the best model is also the proposed hybrid model in terms of all error indicators.Similar to the results in Dataset A, most nonparametric models perform better than parametric models.Among all parametric models, 5-PLF usually generates better power forecasts.
According to the results in Dataset C presented in Table 9, the proposed hybrid model is superior to the other models in terms of MAE, RMSE, and NMAPE, while BPNN is the best model in terms of R². Moreover, in terms of MAE and RMSE, ELM, WELM, and ORELM perform better than the other compared models. However, in terms of NMAPE and R², ELM, WELM, and ORELM perform worse than the other compared models.
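Since the rankings above differ by indicator, a small computation of the four indicators on the same errors may be useful. The formulas below are commonly used forms and are assumptions here, since the definitions appear outside this section; in particular, NMAPE is assumed to be the MAE normalised by the maximum observed power and expressed in percent. The function name and the numbers are hypothetical.

```python
import math

def error_indicators(y_true, y_pred):
    """Return MAE, RMSE, NMAPE (%) and R2 for paired observations and forecasts."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sse / n)
    nmape = 100.0 * mae / max(y_true)  # assumed normalisation: max observed power
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - sse / sst
    return {"MAE": mae, "RMSE": rmse, "NMAPE": nmape, "R2": r2}

# Made-up observed and forecast power values (kW).
metrics = error_indicators([0.0, 50.0, 100.0], [10.0, 50.0, 90.0])
```

RMSE weights large errors more heavily than MAE does, so a model can lead on MAE while trailing on RMSE, which is consistent with the mixed rankings reported for some datasets.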
According to the results in Tables 8 and 9, the average ranks of all wind turbine power output estimation models were computed, and these are presented in Figure 4. As Figure 4 shows, the average ranks of the models differ across error indicators. In terms of MAE, RMSE, and NMAPE, the proposed hybrid model ranks the highest among all models. However, in terms of R², BPNN ranks first and the proposed hybrid model ranks second. Moreover, in terms of MAE and NMAPE, all nonparametric models rank higher than all parametric models. In terms of RMSE, all nonparametric models except ANFIS perform better than the parametric models.
The average ranks of all models over the four error indicators, based on the results in Tables 8 and 9, are shown in Figure 5. According to Figure 5, the proposed hybrid model performs the best among all models selected in this paper. Generally, the performance of nonparametric models is better than that of parametric models. However, 5-PLF is sometimes superior to ANFIS in wind turbine power output estimation.
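The average-rank comparison can be reproduced with a few lines of code. The sketch below is a hypothetical illustration: the model names `m1` to `m3`, the dataset names, and all scores are invented for the example and are not taken from Tables 8 and 9. It assumes rank 1 is best, that lower is better for every indicator except R², and that there are no ties.

```python
HIGHER_IS_BETTER = {"R2"}  # for every other indicator, lower is better

def ranks(scores, higher_is_better=False):
    """Map each model to its rank (1 = best) for one dataset and indicator."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {model: pos for pos, model in enumerate(ordered, start=1)}

def average_ranks(results):
    """Average each model's rank over all datasets and indicators."""
    totals, count = {}, 0
    for per_metric in results.values():
        for metric, scores in per_metric.items():
            for model, r in ranks(scores, metric in HIGHER_IS_BETTER).items():
                totals[model] = totals.get(model, 0) + r
            count += 1
    return {model: total / count for model, total in totals.items()}

# Invented scores for three hypothetical models on two hypothetical datasets.
results = {
    "dataset_1": {"MAE": {"m1": 30.0, "m2": 25.0, "m3": 40.0},
                  "R2": {"m1": 0.97, "m2": 0.98, "m3": 0.95}},
    "dataset_2": {"MAE": {"m1": 20.0, "m2": 22.0, "m3": 35.0},
                  "R2": {"m1": 0.96, "m2": 0.99, "m3": 0.94}},
}

avg = average_ranks(results)
```

Restricting `results` to a single indicator gives per-indicator averages of the kind shown in Figure 4, while passing all indicators together gives an overall average of the kind shown in Figure 5.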
For different wind turbines, the estimated wind turbine power curves generated by the proposed hybrid model are presented in Figure 6. It can be seen from Figure 6 that the estimated power curves fit the data of the different wind turbines well.
Based on the above analysis, it can be concluded that the proposed hybrid model is an effective technique for wind turbine power curve modeling, and that it performs well even when the collected wind data contain some outliers.

Conclusions
Wind power curves play important roles in many applications, such as wind power forecasting, wind turbine condition monitoring, wind energy potential estimation, and wind turbine selection. In this paper, a hybrid model is proposed in order to obtain a more accurate power curve. Firstly, an FCM-based outlier detection method is employed to process the collected wind data and eliminate the adverse effects of outliers on wind power curve modeling. Secondly, different types of ELM-based regression models are trained and used to estimate the power output of wind turbines. Finally, the power forecasts generated by these regression models on a validation set are used to train SVR, and the trained SVR is employed to forecast the power output of a wind turbine at different wind speeds. It can be concluded from the power curve modeling results that: (1) most nonparametric models perform better than parametric models; (2) among the parametric models selected in this paper, 5-PLF is superior to the others in most cases; (3) the proposed hybrid model can generate more accurate power curves than single parametric and nonparametric models for different wind turbines. In practice, even if there are numerous outliers in the collected wind data, the proposed hybrid model remains a good choice for power curve modeling.

Figure 1. A typical wind turbine power curve.

Figure 2. The flowchart of the proposed hybrid model.

Figure 3. Different kinds of data in four datasets.

Figure 4. Average ranks of different models on all datasets based on four error indicators.

Figure 5. Average ranks of different models based on all datasets and all error indicators.

Figure 6. The estimated power curves for different wind turbines.

Table 2. The numbers of different types of samples in four datasets.

Table 4. Power curve modeling results of ELM-based models on validation set of Dataset A.

Table 5. Power curve modeling results of ELM-based models on validation set of Dataset B.

Table 6. Power curve modeling results of ELM-based models on validation set of Dataset C.

Table 7. Power curve modeling results of ELM-based models on validation set of Dataset D.

Table 8. Results of all power curve models on test sets of Dataset A and Dataset B.

Table 9. Results of all power curve models on test sets of Dataset C and Dataset D.