A Combined Fuzzy GMDH Neural Network and Grey Wolf Optimization Application for Wind Turbine Power Production Forecasting Considering SCADA Data

Abstract: The trend toward cost-effective and efficient wind energy production leads to larger wind turbine generators and drives the need for more accurate forecast models. This paper proposes a combined forecasting model that consists of empirical mode decomposition, a fuzzy group method of data handling (GMDH) neural network, and the grey wolf optimization algorithm. A combination of K-means clustering and the identification of density-based local outliers is applied to detect and remove the outliers of the raw supervisory control and data acquisition (SCADA) data. Moreover, the empirical mode decomposition is employed to decompose the signals as part of the data pre-processing. The fuzzy GMDH neural network is the forecasting engine that estimates the future energy production of the wind turbines, while the grey wolf optimization is used to optimize the fuzzy GMDH neural network parameters in order to achieve a lower forecasting error. The model has been applied to actual data from a pilot onshore wind farm in Sweden. The obtained results indicate that the proposed model has a higher accuracy than others in the literature, and results are provided for single and combined forecasting models at different time steps ahead and in different seasons.


Introduction
The wind power industry has expanded tremendously and is expected to grow at a compound annual growth rate (CAGR) of 5.2% between 2020 and 2027. This expansion has established wind energy, in terms of produced power cost, as one of the most significant renewable and low-carbon energy resources. Wind power generation is currently one of the principal forms of renewable power generation [1][2][3][4]. Wind energy is stochastic, uncertain, and discontinuous, which adversely affects the safe and stable operation of the power grid and the quality of the power supply [5]. The stochasticity and discontinuity of wind power can reduce the reliability of the prediction system and the quality of wind power [6].
A potential answer to these issues is to improve the forecast accuracy of wind generation. Several studies [7][8][9][10][11][12] have sought to characterize the distribution of the wind power prediction, and diverse scientific methodologies have been applied to improve its accuracy.
Energies 2021, 14, 3459
Other studies have proposed more complex models, such as the Laplace distribution [9], the Beta distribution [10], the hyperbolic distribution [11], the Levy α-stable distribution [12], and the flexible likelihood distribution [13], to improve the fitting precision of the wind power prediction. In the previous decade, many studies have been presented that assess and predict various aspects of energy management and power systems, for example, wind and solar power generation forecasting [14][15][16], condition monitoring of wind turbines [17], the electricity market [18], and load forecasting [19,20]. Amjady et al. (2011) provided a short-term wind power prediction based on the ridgelet neural network (RNN), which has a high estimation capability. They suggested a differential evolution algorithm with a new crossover and selection mechanism to train the network [21]. Han et al. (2017) proposed combined models based on the autoregressive integrated moving average (ARIMA) and non-parametric models for wind speed forecasting [22].
The results demonstrated that non-parametric combined models usually perform better than other models. Jonas C. Pelajo et al. (2019) developed a model to predict wind speed and energy price in order to determine the optimal maintenance planning of a real wind farm in the Brazilian Northeast [23]. Osório et al. (2014) proposed a combined forecasting model based on mutual information, wavelet transform, particle swarm optimization, and an adaptive neuro-fuzzy inference system to predict short-term wind power and electricity market prices [24]. Gallego-Castillo et al. (2016) provided a quantile regression model based on the reproducing kernel Hilbert space (RKHS) framework for probabilistic wind power prediction. Furthermore, they implemented two types of models (online and offline) for a real wind farm [25]. Xiao et al. (2017) developed an electrical power system prediction model using a wavelet neural network (WNN) and an improved cuckoo search algorithm. The results showed that the proposed model substantially reduced the prediction error relative to other comparable models [26].
Kunpeng Shi et al. (2018) provided a combined model based on two-stage feature selection and improved random forest models for short-term wind power forecasting [27]. Van Quang Doan et al. (2019) presented a mesoscale ensemble model to predict wind speed ramps. The proposed model was applied at real wind farms in Japan [28]. Duan et al. (2021) developed a combined intelligent model based on improved variational mode decomposition and a correntropy long short-term memory neural network to predict wind power. The model was evaluated using two wind farms in China at different sampling intervals [29]. Yildiz et al. (2021) presented a new two-step deep learning approach based on the variational mode decomposition (VMD) method and a modified residual-based deep convolutional neural network for wind power forecasting [30]. Jafarzadeh et al. (2021) provided a modified fuzzy wavelet neural network for short-term wind power forecasting that considers weather and power plant parameters. The Mnjil wind power plant in Iran was used to evaluate the model [31].
In addition, GIS-based models play an important role in renewable energy potential assessment and prediction [32,33]. Furthermore, the behavior and performance of renewable energy systems can be estimated using GIS models [34,35].
Generally, modelling wind turbine power production requires a combined intelligent solution. That is, the data should first be treated by a combined data pre-processing model, and a combined intelligent strategy should then analyze the processed data. This type of strategy plays an essential role in managing the energy production of wind farms.
In this research, we propose an integrated strategy that couples empirical mode decomposition, a fuzzy GMDH (group method of data handling) neural network, and the grey wolf optimization (GWO) algorithm to forecast the power produced by wind turbines. Furthermore, a combination of K-means and the identification of density-based local outliers (LOF) is applied to detect and clean outliers.
The main contributions and novelty of this paper are as follows:
(a) K-means is one of the most well-known clustering methods and a fast and efficient unsupervised learning technique. However, it suffers from some deficiencies: (i) the number of clusters and the centers must be defined in advance, (ii) it cannot handle noisy data and outliers properly, and (iii) it is not suitable for clusters with non-convex shapes. In order to deal with these issues, we propose a combination of the identification of density-based local outliers (LOF) and K-means for cleaning the raw supervisory control and data acquisition (SCADA) data as the initial stage of the pre-processing.
(b) As wind speed and power forecasting involves the non-linear power curve and the stochastic and noisy behavior of the recorded wind data, the empirical mode decomposition (EMD) method is proposed to deal with these uncertainties and increase the modelling accuracy.
(c) Fuzzy-GMDH neural networks are considered one of the most effective methods for modelling time-series data with high-level noise and short input sampling. However, initializing the hyper-parameters of fuzzy-GMDH is challenging and time-consuming. To adjust the hyper-parameters, we apply a robust and fast search method, the grey wolf optimization (GWO) algorithm. Applying GWO as a hyper-parameter tuner improved the accuracy and reliability of the proposed model in forecasting wind power.
(d) The proposed combined forecasting model has been successfully verified on the SCADA datasets of two actual wind turbines. In addition, the proposed forecasting model is compared with other valid combined forecasting models.

Materials and Methods
After a brief description of the SCADA system and data gathering, this section illustrates the artificial intelligence methods proposed in this paper.

SCADA System
The SCADA system, which provides remote supervision and control of the wind turbines in a wind farm, plays a significant role in wind power forecasting models. The SCADA data collected and applied in this paper come from a large wind farm located in Sweden. The input data include the power output of the wind turbines and the wind speed (short-term, with an interval of 10 min) for one year, from Jan to Dec 2015. Furthermore, in order to evaluate and compare the performance of the proposed hybrid model, we applied the SCADA data of two wind turbines (wind turbine 1 (WT1) and wind turbine 2 (WT2)).

Proposed Wind Power Forecasting Strategy
In this study, a multi-step hybrid intelligent model has been proposed as a means to predict wind power production (see Figure 1). Given the wide range of intelligent methods such as neural networks and metaheuristic optimization algorithms, we present a hybrid forecasting model based on FGMDH and GWO for wind power production forecasting. The GWO algorithm can perform the training step of the neural network (FGMDH) well and optimize the values of the network parameters. Therefore, GWO contributes constructively to the performance of the proposed model for predicting wind power production.
Since the structure of the input matrix plays a significant role in determining the output and accuracy of the model, in the first place, the various input signals (wind power and wind speed) are decomposed through the EMD method to different high and low frequencies (see Figure 2).
Five types of decomposed components (IMF1, IMF2, IMF3, IMF4, and the residual) are selected and applied, delayed by one time unit (t-1), as inputs of the model (see Figure 3, inputs and output data structure). In addition, the lagged values (lags 1 to 5) of the original wind power signal and the actual wind speed signal are used as input parameters (Figure 3, inputs and output data structure). In the next step, the FGMDH method is employed to predict the wind turbine power.
The FGMDH model structure includes different neurons. In each neuron, the parameters of the Gaussian functions and the weights of the fuzzy rules are unknown. In this paper, the GWO algorithm is applied to optimize these unknown FGMDH variables.
In this study, in order to evaluate the performance and reliability of the forecasting models, the wind turbine power production is predicted for different seasons at two time steps (10-min and 1-h). The framework of the proposed model is represented in Figure 3.
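The input structure described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_samples`, the column ordering, and the use of plain Python lists are hypothetical choices, and the decomposed components are assumed to be already available from the EMD step.

```python
def build_samples(power, speed, power_imfs, speed_imfs, n_lags=5):
    """Assemble (inputs, target) pairs for one-step-ahead forecasting.

    power, speed: original signals (lists of floats, same length).
    power_imfs, speed_imfs: decomposed components of each signal
    (IMF1..IMF4 and the residual), each as long as the original signal.
    The column order is an assumption made for this sketch.
    """
    X, y = [], []
    for t in range(n_lags, len(power)):
        row = [comp[t - 1] for comp in power_imfs]    # power components at t-1
        row += [comp[t - 1] for comp in speed_imfs]   # speed components at t-1
        row += [power[t - lag] for lag in range(1, n_lags + 1)]  # power lags 1..5
        row += [speed[t - lag] for lag in range(1, n_lags + 1)]  # speed lags 1..5
        X.append(row)
        y.append(power[t])                            # target: power at time t
    return X, y
```

With five components per signal and five lags, each sample has 20 input values; the first usable time index is `n_lags`, because earlier samples lack a full lag history.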

Data Cleaning
The raw SCADA datasets usually include different forms of noise that directly and negatively affect the accuracy of the forecasting process. One of the most notable outliers is a negative wind turbine power output, observed when the wind speed is very low or during a failure. The distribution of the raw SCADA dataset is plotted in Figure 4, where an abnormal distribution of wind power can be seen. In the pre-processing stage, it is recommended [36] that these negative powers be set to zero. In addition, to remove the impact of the data scale, Min-Max normalization is applied for feature scaling. Meanwhile, each wind turbine has a unique power curve that presents the average efficiency of the turbine without reference to particular mechanical components; Figure 5 is plotted to show this characteristic for the first wind turbine in this research. The scattered data points indicate the outliers.
The proposed data cleaning method combines K-means clustering with the identification of density-based local outliers (LOF) [37]. In the first step, K-means clustering is employed to partition the SCADA data into clusters. Then, within each cluster, the local density-based method is adopted to eliminate potential noise. The clean data obtained after applying K-means clustering and the LOF method are illustrated in Figure 6.
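The two-step cleaning procedure can be illustrated with a dependency-free sketch. It is not the implementation used in this paper: the function names, the deterministic center initialization, the neighbourhood size, and the LOF threshold of 1.5 are all assumptions made for illustration, and the LOF variant below uses exactly `n_neighbors` neighbours rather than handling distance ties.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    ordered = sorted(points)
    # Assumed initialization: evenly spaced points of the sorted data.
    centers = [ordered[(2 * c + 1) * len(ordered) // (2 * k)] for c in range(k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

def lof_scores(points, n_neighbors):
    """Local outlier factor of each point within the given point set."""
    n = len(points)
    neigh = []
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j)
                       for j in range(n) if j != i)
        neigh.append(dists[:n_neighbors])
    k_dist = [nb[-1][0] for nb in neigh]
    lrd = []  # local reachability density
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / (sum(reach) or 1e-12))
    return [sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
            for i in range(n)]

def clean_scada(points, k=10, n_neighbors=3, threshold=1.5):
    """Indices of the points kept after per-cluster LOF filtering."""
    labels = kmeans(points, k)
    keep = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if len(idx) <= n_neighbors:
            keep.extend(idx)  # too few points to estimate a density
            continue
        scores = lof_scores([points[i] for i in idx], n_neighbors)
        keep.extend(i for i, s in zip(idx, scores) if s <= threshold)
    return sorted(keep)
```

Here each point is a tuple of normalized (wind speed, power) values. Points whose LOF is close to 1 have a density similar to that of their neighbours, while a point far below the power curve of its cluster receives a much larger score and is dropped.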

Empirical Mode Decomposition
EMD is a signal decomposition method that can analyse non-linear and non-stationary time series. It is also more accessible and easier to interpret than wavelet decomposition [38,39]. In addition, unlike wavelet decomposition, EMD does not require a mother function to be chosen in advance. The most important characteristic of EMD is that it is fully data-driven: signals are broken down into independent components according to the local characteristics of the signal.
The concept of EMD is to decompose the initial signal into a finite number of oscillatory functions, the intrinsic mode functions (IMFs), and a residual. The IMFs must satisfy the following conditions: (1) the number of extrema and the number of zero crossings must be equal or differ by at most one; (2) the mean value of the envelopes defined by the local maxima and local minima must be zero at every point.
EMD is a sifting method that extracts the IMFs and the residual from a real signal. The calculation can be given as the following stages [38,39]:
Stage 1: Identify all local maxima and local minima of the time series $S(t)$.
Stage 2: Connect all local maxima and all local minima with cubic splines to produce the upper envelope $U(t)$ and the lower envelope $L(t)$.
Stage 3: Calculate the point-by-point mean envelope $m(t)$ from the upper and lower envelopes:
$$m(t) = \frac{U(t) + L(t)}{2}$$
Stage 4: Compute the difference between the actual signal and the mean envelope:
$$h(t) = S(t) - m(t)$$
Stage 5: Check whether $h(t)$ is an intrinsic mode function (IMF). If so, it is treated as the $i$th IMF and the time series $S(t)$ is replaced by the residual $r(t) = S(t) - h(t)$. If not, $S(t)$ is replaced by $h(t)$.
Stage 6: Repeat Stages 1-5 until the standard deviation of two consecutive sifting results is lower than the predefined stopping criterion.
Using this sifting process, IMFs are obtained from high frequency to low frequency, so that the signal is decomposed into several IMFs and a residual:
$$S(t) = \sum_{i=1}^{n} C_i(t) + r_n(t)$$
where $r_n(t)$ is the final residual, $n$ is the number of IMFs, and $C_i(t)$ ($i = 1, 2, \ldots, n$) denotes the individual IMFs.
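A single sifting pass (Stages 1 to 4) can be sketched as below. To stay dependency-free, this sketch replaces the cubic spline envelopes with piecewise-linear interpolation and holds the envelopes constant outside the first and last extremum; both are simplifying assumptions, and the function names are hypothetical.

```python
import math

def local_extrema(s):
    """Indices of the strict local maxima and minima of the sequence s."""
    maxima = [i for i in range(1, len(s) - 1) if s[i - 1] < s[i] > s[i + 1]]
    minima = [i for i in range(1, len(s) - 1) if s[i - 1] > s[i] < s[i + 1]]
    return maxima, minima

def envelope(idx, s):
    """Piecewise-linear envelope through the points (i, s[i]) for i in idx.

    A cubic spline would normally be used here; linear interpolation is a
    simplification made for this sketch.
    """
    env = []
    for t in range(len(s)):
        if t <= idx[0]:
            env.append(s[idx[0]])        # hold the first extremum value
        elif t >= idx[-1]:
            env.append(s[idx[-1]])       # hold the last extremum value
        else:
            j = next(j for j in range(len(idx) - 1) if idx[j] <= t <= idx[j + 1])
            a, b = idx[j], idx[j + 1]
            w = (t - a) / (b - a)
            env.append((1 - w) * s[a] + w * s[b])
    return env

def sift_once(s):
    """One sifting pass: h(t) = S(t) - m(t), or None if s has no oscillation."""
    maxima, minima = local_extrema(s)
    if not maxima or not minima:
        return None                      # monotone signal: treat as the residual
    upper = envelope(maxima, s)          # U(t)
    lower = envelope(minima, s)          # L(t)
    mean_env = [(u + l) / 2 for u, l in zip(upper, lower)]   # m(t)
    return [x - m for x, m in zip(s, mean_env)]              # h(t)
```

For a sine wave riding on a constant offset, one pass removes the offset and leaves the oscillation; a full EMD would repeat the pass until the IMF conditions and the stopping criterion are met, then subtract the IMF and continue with the residual.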

Fuzzy-GMDH Model
The FGMDH is a machine learning strategy with a hierarchical structure [40]. In this model, every neuron has two inputs and one output. The general structure of the FGMDH system is shown in Figure 3 (the FGMDH forecasting model). In this figure, the output of each neuron in a layer serves as an input to the following layer. The final output is determined as the mean of the outputs of the last layer.
The FGMDH structure in Figure 3 shows that the inputs of the $m$th model in the $p$th layer are the outputs of the $(m-1)$th and $m$th models in the $(p-1)$th layer. The output variable of the $m$th model in the $p$th layer, $y^{pm}$, is computed from these two inputs using, for each fuzzy rule $k$, a weight parameter and the $k$th Gaussian function, where $a_k^{pm}$ and $b_k^{pm}$ are the Gaussian parameters [41]. The final output is then calculated from the outputs of the last layer. The learning procedure of the feed-forward FGMDH is an iterative technique for solving complex problems.
A simplified fuzzy logic rule has been provided by [40] to improve the GMDH neural network: if $x_1 = F_{k1}$ and $x_2 = F_{k2}$, then output $y = w_k$.
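As a rough illustration of the neuron described above, the following sketch assumes the Gaussian membership form that is commonly used in the fuzzy GMDH literature: each rule contributes its weight multiplied by a Gaussian function of the two inputs, and the final output is the mean of the last layer. The exact expression of [41] is not reproduced here, so the formula, the function names, and the wrap-around used for the first neuron of a layer are assumptions.

```python
import math

def fuzzy_neuron(x1, x2, rules):
    """Output of one two-input fuzzy neuron.

    rules: list of (a1, b1, a2, b2, w) tuples, one per fuzzy rule k, where
    a and b are the assumed Gaussian center and width parameters and w is
    the rule weight.
    """
    out = 0.0
    for a1, b1, a2, b2, w in rules:
        mu = math.exp(-((x1 - a1) ** 2) / b1 - ((x2 - a2) ** 2) / b2)
        out += mu * w
    return out

def fgmdh_forward(inputs, layers):
    """Feed-forward pass; layers[p][m] holds the rules of neuron m in layer p.

    Neuron m reads outputs m-1 and m of the previous layer. For m = 0 the
    index m-1 wraps around to the last output (an assumption of this sketch).
    """
    values = list(inputs)
    for layer in layers:
        values = [fuzzy_neuron(values[m - 1], values[m], rules)
                  for m, rules in enumerate(layer)]
    return sum(values) / len(values)     # mean of the last layer outputs
```

The Gaussian parameters and the rule weights in `rules` are the unknowns that the optimization step has to determine.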

Grey Wolf Optimization
The GWO algorithm is a meta-heuristic algorithm based on swarm intelligence, proposed by Mirjalili et al. [42].
GWO is inspired by grey wolves. Four types of grey wolves, namely alpha, beta, delta, and omega, are used to replicate the leadership hierarchy. In addition, the main steps of grey wolf hunting (encircling prey, hunting, attacking prey, and searching for prey) are performed during the operation [43].
Encircling prey: The encircling behaviour of each agent of the group is computed by a position-update formula that involves two coefficient vectors, $\vec{a}$ and $\vec{c}$.
Hunting: For a mathematical simulation of the hunting behaviour of grey wolves, it is assumed that the $\alpha$, $\beta$, and $\delta$ wolves have better information about the possible location of the prey.
Attacking the prey and searching for the prey: $\vec{a}$ is a random value in the interval $[-2a, 2a]$. When the magnitude of this random value is less than 1, the grey wolves are forced to attack the prey; when it is greater than 1, the grey wolves are forced to diverge from the prey.
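The steps above can be sketched as a minimal optimizer loop. The update rules follow the commonly cited formulation of Mirjalili et al. (coefficients $A = 2a r_1 - a$ and $C = 2 r_2$, with $a$ decreasing linearly from 2 to 0). The population size, iteration count, bound clipping, and the tracking of the best solution found so far are assumptions of this sketch, and the objective passed in is a placeholder for the FGMDH forecasting error.

```python
import random

def gwo(objective, dim, bounds, n_wolves=20, n_iter=200, seed=0):
    """Minimise objective over [lo, hi]^dim with a basic grey wolf optimizer."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best = min(wolves, key=objective)[:]
    for it in range(n_iter):
        wolves.sort(key=objective)
        if objective(wolves[0]) < objective(best):
            best = wolves[0][:]
        # The three best wolves lead the hunt; copy them before moving the pack.
        alpha, beta, delta = (w[:] for w in wolves[:3])
        a = 2.0 * (1 - it / n_iter)          # decreases linearly from 2 to 0
        for w in wolves:
            for d in range(dim):
                total = 0.0
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a   # |A| > 1 favours exploration
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - w[d])  # distance to the leader
                    total += leader[d] - A * D
                w[d] = min(hi, max(lo, total / 3)) # average of the three moves
    final = min(wolves, key=objective)
    return final if objective(final) < objective(best) else best
```

In the setting of this paper, each wolf position would encode the unknown Gaussian parameters and rule weights of the FGMDH neurons, and the objective would be a forecasting error measured on training data.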

Error Indicators
In order to assess the accuracy and reliability of the proposed forecasting model, different error indicators have been used in this paper: the mean absolute percentage error (MAPE), the sum squared error (SSE), the root mean squared error (RMSE), and the mean absolute error (MAE). All error indicators are based on error percentage (unitless).
In these indicators, $x_i^{\mathrm{real}}$ and $x_i^{\mathrm{for}}$ are the actual value and the predicted value, respectively, and $m$ is the number of data points.
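For reference, the four indicators can be computed as in the sketch below, which assumes their conventional textbook definitions; any normalization specific to this paper is not reproduced. The function name is hypothetical, and the MAPE term assumes that no actual value is zero.

```python
import math

def error_indicators(actual, forecast):
    """Conventional MAPE (in percent), SSE, RMSE, and MAE of a forecast."""
    m = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    sse = sum(e * e for e in errors)
    return {
        # MAPE divides by the actual value, so zero actuals must be excluded.
        "MAPE": 100.0 / m * sum(abs(e) / abs(a) for e, a in zip(errors, actual)),
        "SSE": sse,
        "RMSE": math.sqrt(sse / m),
        "MAE": sum(abs(e) for e in errors) / m,
    }
```

Because negative powers are set to zero during cleaning, zero-power samples would have to be filtered out, or a normalized series used, before the MAPE term is evaluated.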

Results and Discussion
As discussed in the previous sections, in order to evaluate the performance and efficiency of the proposed method, the real onshore SCADA dataset for two wind turbines (WT1 and WT2) has been used in this paper. Following the framework of the proposed model (Figure 3), several frequency components of wind speed ($\mathrm{IMF1}_{Speed}$, $\mathrm{IMF2}_{Speed}$, $\mathrm{IMF3}_{Speed}$, $\mathrm{IMF4}_{Speed}$, $\mathrm{Res}_{Speed}$) and wind turbine power ($\mathrm{IMF1}_{Power}$, $\mathrm{IMF2}_{Power}$, $\mathrm{IMF3}_{Power}$, $\mathrm{IMF4}_{Power}$, $\mathrm{Res}_{Power}$) are considered as input parameters of the model. In addition, several forecasting models, namely MI-CNN [44], MRMR-HNES [45], MI-CNEA [46], GRNN, and FGMDH, have been applied as benchmarks for the proposed model. First, the 1-year dataset is selected as the prediction model data, and the predicted results are evaluated with the error indicators presented in the previous section. The results are shown in Table 1, where the prediction results are calculated for two wind turbines at two different time steps (10-min and 1-h). Wind turbine power production is highly dependent on the wind speed, and wind speeds vary greatly from day to day. Thus, the forecasting time intervals for power production have been chosen from days in the following four months, which have the highest fluctuation in power production: February, May, August, and November.

Original and decomposition signals
According to the results in Table 1, the proposed model performs better than the other models at both time steps and for both wind turbines. In addition, the results indicate that the forecasting models perform better at the 10-min time step than at the 1-h time step. Table 2 and Figure 7 present the results of the proposed forecasting model and the other models for wind turbine power production forecasting (WT1). In Table 2 and Figure 7, the performance of the forecasting models is evaluated at different time steps (10-min and 1-h) and in different seasons (winter, spring, summer, and fall). Based on these results, the proposed model can predict the wind turbine power more reliably and more accurately, at multiple time steps ahead, than the other valid forecasting models (GRNN, FGMDH, MI-CNN, MRMR-HNES, and MI-CNEA).

Conclusions
Considering the highly volatile and nonlinear process of wind turbine power production, a hybrid intelligent system has been proposed to improve the accuracy and efficiency of wind turbine power prediction. As the initial step, the hybrid K-means-LOF and EMD methods were applied as pre-processing to remove the outliers and to decompose the SCADA data, respectively. The processed data were then given to the forecasting model (FGMDH), and the future power of the wind turbine was calculated. Furthermore, the GWO algorithm was used, in parallel, as an optimization method to tune the FGMDH parameters. The SCADA data of two wind turbines in a real wind farm located in Sweden were used to measure the performance and reliability of the proposed model.
The new forecasting model has been applied to predict the power of wind turbines for two time intervals ahead (10-min and 1-h) over 1 year and in different seasons. The obtained results indicate that the proposed method (EMD-FGMDH-GWO) has higher accuracy and reliability at different time intervals than other available methods such as GRNN, FGMDH, MI-CNN, MRMR-HNES, and MI-CNEA. The MAPE obtained for GRNN, FGMDH, MI-CNN, MRMR-HNES, MI-CNEA, and the proposed model is 20.818, 11.883, 3.981, 5.105, 5.754, and 3.012, respectively.

Figure 1. General schematic of the proposed forecasting model.

Figure 3. The framework of the proposed model.

Figure 4. The distribution of the raw SCADA data of WT1 (wind speed and power).

Figure 5. The power curve model of WT1.

Figure 6. The K-means clustering method performance, with the applied data (WT1) divided into 10 clusters (left). The dark points represent the clean SCADA data after applying the LOF (right).

Figure 7. The wind turbine power forecasting results of the comparative models for WT1.

Table 1. Comparison of the wind turbine power forecasting errors of the models for two wind turbines (WT1 and WT2).

Table 2. Comparison of the wind turbine power forecasting errors of the models for the different seasons of a year.