A Fuzzy Group Forecasting Model Based on Least Squares Support Vector Machine (LS-SVM) for Short-Term Wind Power

Abstract: Many models have been developed to forecast wind farm power output. It is generally difficult to determine whether one model consistently outperforms another under all circumstances. Motivated by this finding, we aimed to integrate groups of models into an aggregated model using fuzzy theory to obtain further performance improvements. First, three groups of least squares support vector machine (LS-SVM) forecasting models were developed: univariate LS-SVM models, hybrid models using auto-regressive integrated moving average (ARIMA) and LS-SVM, and multivariate LS-SVM models. The models in each group are selected by a decorrelation maximisation method, and the remaining models can be regarded as experts in forecasting. Next, fuzzy aggregation and a defuzzification procedure are used to combine all of these forecasting results into the final forecast. Using sample randomization, we statistically compare the models. Results show that this group-forecasting model performs well in terms of accuracy and consistency.


Introduction
Along with science and technology in general, wind power technology has developed rapidly. Because wind power technology is mature, many medium- and large-sized wind farms have been built and put into operation. Wind power has become an important energy source for the entire power system; worldwide, the installed wind power capacity was 157.9 GW in 2009, representing an annual growth of 20% over the preceding 10 years. Wind energy resources available in China are estimated at 1000 GW, ranking the country third after Russia and the U.S. In recent years, wind power has experienced rapid development in China, as the capacity increased from 0.34 to 25.8 GW between 2000 and 2009. In 2020, the total installed capacity of wind power is expected to reach 150 GW [1].
Wind power output fluctuates constantly because wind is volatile and intermittent. When the power output exceeds a certain value, it significantly affects power quality, power system security and the stability of operations. If an accurate short-term wind power output forecast is available, the power dispatching department can adjust scheduling in accordance with changes in wind power output to ensure power quality and reduce the system's excess capacity and power system cost. Therefore, short-term wind power forecasts are of key importance [2][3][4].
Modern wind farms usually incorporate remote monitoring systems in wind turbines so that the signals of all turbines can be captured and recorded. The real-time output data from wind generators can be used directly for wind power forecasts without any additional cost, which reduces the cost and improves the quality of data collection and increases forecast accuracy. The existing forecasting methods can be classified into two groups. The first group consists of univariate forecasting models based on historical and real-time power data, in which changes in wind speed are not considered. The second group consists of multivariate models, in which forecasts are based on the relationship between weather data and output power [5]. The numerical weather prediction (NWP) model is popular for short-term wind power prediction because of its accuracy, but it needs more weather information [6]. Specific algorithms include time series methods, such as the auto-regressive moving average (ARMA) and the auto-regressive conditional heteroskedasticity (ARCH) models [7,8], the linear regression model [9], the grey theory model [10,11], the support vector machine (SVM) [12,13], adaptive fuzzy logic algorithms [14,15] and artificial neural networks (ANNs) [16,17], among others [18].
For the above-mentioned individual models, it is difficult to determine whether one model consistently outperforms another under all circumstances. Typically, a number of different models are utilised, and the model with the most accurate results is selected. However, the selected model may not necessarily be the best for future use because of potentially influential factors, such as sampling variation, model uncertainty and structure change. It is almost universally agreed upon in the forecasting literature that no single method is best in every situation, primarily because a real-world problem is often complex in nature and because any single model may not be able to capture different patterns equally well. Therefore, combinations of forecasts have been studied, such as an adaptive combination of forecasts [19] and an optimal combination of wind power forecasts [20]. Motivated by this finding, we aimed to integrate multiple models into an aggregated model to obtain further performance improvement. To this end, several intelligent SVM forecasting models were developed. The models are selected by a decorrelation maximisation method, and the remaining models can be regarded as experts in forecasting. Then, fuzzy theory is used to combine all of these forecasting results into the final forecast.
The remainder of this paper is organised as follows: Section 2 describes the three groups of models. In Section 3, real datasets are used to test these models statistically. Finally, conclusions are presented in Section 4.

Principle of Least Squares SVM (LS-SVM)
In this study, SVM was selected as the basic algorithm with which to construct forecasting models because this algorithm is often viewed as a "universal approximator". It has been proven to approximate any continuous function to arbitrary accuracy. Therefore, the model is used here to simulate mutual relationships between historical data and the forecast power output. The models have the ability to provide flexible mapping between inputs and outputs. The SVM model of a data set is given by the formulas described below.
Consider a set of data $\{(x_1, y_1), \ldots, (x_N, y_N)\}$, where $x_i$ is the $i$th input vector, $y_i$ is the corresponding desired output, $i = 1, 2, \ldots, N$, and $N$ is the size of the sample. The estimating function assumes the following form:

$$f(x) = w^{T}\phi(x) + b$$

where $w$ is the weight vector, $b$ is the bias, $\phi(x)$ is the nonlinear mapping from the input space to the high-dimensional feature space, and $w^{T}\phi(x)$ is the inner product of $w$ and $\phi(x)$. This leads to the optimisation problem associated with standard SVM:

$$\min_{w,\,b}\ \frac{1}{2}\,w^{T}w + \gamma \sum_{i=1}^{N} L\big(y_i, f(x_i)\big)$$

where $\gamma$ is a positive real constant that determines the penalty for estimation errors and $L(\cdot)$ is the loss function that measures the estimation error in the empirical risk. Usually, the $\varepsilon$-insensitive loss function is adopted because of its excellent sparsity:

$$L_{\varepsilon}\big(y, f(x)\big) = \max\big(0,\ |y - f(x)| - \varepsilon\big)$$

For least-squares SVM (LS-SVM), the squared two-norm of the estimation error is adopted as the loss function in the objective function, and equality constraints are used instead of inequality constraints. Therefore, the optimisation problem is described as:

$$\min_{w,\,b,\,\xi}\ J(w, \xi) = \frac{1}{2}\,w^{T}w + \frac{\gamma}{2}\sum_{i=1}^{N}\xi_i^{2} \quad \text{s.t.} \quad y_i = w^{T}\phi(x_i) + b + \xi_i,\quad i = 1, 2, \ldots, N$$

where $\xi_i$ is the error (slack) variable of the $i$th sample. In the standard SVM, non-negative slack variables are added to the inequality constraints; in LS-SVM, $\xi_i$ enters an equality constraint and measures the estimation error directly.
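After the introduction of Lagrange multipliers $\alpha_i$, the Lagrange function is constructed as:

$$L(w, b, \xi, \alpha) = \frac{1}{2}\,w^{T}w + \frac{\gamma}{2}\sum_{i=1}^{N}\xi_i^{2} - \sum_{i=1}^{N}\alpha_i\big[w^{T}\phi(x_i) + b + \xi_i - y_i\big]$$

According to the Karush-Kuhn-Tucker (KKT) conditions, the optimum is defined by:

$$\frac{\partial L}{\partial w} = 0,\qquad \frac{\partial L}{\partial b} = 0,\qquad \frac{\partial L}{\partial \xi_i} = 0,\qquad \frac{\partial L}{\partial \alpha_i} = 0$$

The following equations can then be obtained:

$$w = \sum_{i=1}^{N}\alpha_i\,\phi(x_i),\qquad \sum_{i=1}^{N}\alpha_i = 0,\qquad \alpha_i = \gamma\,\xi_i,\qquad w^{T}\phi(x_i) + b + \xi_i - y_i = 0$$

After eliminating $w$ and $\xi$, we obtain:

$$\begin{bmatrix} 0 & \Theta \\ \Theta^{T} & \Omega + \gamma^{-1} I \end{bmatrix}\begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix} \tag{7}$$

where $\Theta = [1, \ldots, 1]_{1 \times N}$, $\alpha = [\alpha_1, \ldots, \alpha_N]^{T}$, $y = [y_1, \ldots, y_N]^{T}$, $I$ is a unit matrix, and $\Omega$ is a square matrix whose elements are expressed as:

$$\Omega_{ij} = \phi(x_i)^{T}\phi(x_j) = K(x_i, x_j)$$

By solving Equation (7), the values of $\alpha$ and $b$ are obtained. According to Mercer's condition, there exists a kernel function with a value that is equal to the inner product of the two vectors $\phi(x_i)$ and $\phi(x_j)$ in the feature space; that is, $K(x_i, x_j) = \phi(x_i)^{T}\phi(x_j)$. Then, the LS-SVM model for regression is expressed as:

$$f(x) = \sum_{i=1}^{N}\alpha_i\,K(x, x_i) + b$$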
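The linear system in Equation (7) can be solved directly. The following is a minimal sketch, assuming NumPy and a Gaussian kernel of the form $\exp(-\|x_i - x_j\|^2 / (2\sigma^2))$; the kernel parameterisation, the function names and the values of `gamma` and `sigma` are illustrative assumptions, not the settings used in this study.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def lssvm_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, Omega + I/gamma]] [b; alpha] = [0; y]."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                                   # Theta
    A[1:, 0] = 1.0                                   # Theta^T
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                           # b, alpha

def lssvm_predict(X_train, alpha, b, X_new, sigma):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy data (illustrative only): one period of a sine wave.
gamma, sigma = 100.0, 1.0
x = np.linspace(0.0, 2.0 * np.pi, 40)[:, None]
y = np.sin(x).ravel()
b, alpha = lssvm_fit(x, y, gamma, sigma)
y_hat = lssvm_predict(x, alpha, b, x, sigma)
```

On the training points, the residual $y_i - f(x_i)$ equals $\alpha_i / \gamma$, which follows from the condition $\alpha_i = \gamma \xi_i$ and gives a quick check of the solution.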

Group 1: Diversified Univariate LS-SVM Model
The first group is the univariate forecasting model. It is based on historical and real-time power data; other weather data, such as wind speed, are not considered. Many experimental results have shown that the generalisation of individual networks is not unique. Even for some simple problems, different SVMs with different settings (e.g., different network architectures and different initial conditions) may result in different generalisation results. Diverse models are generated by selecting different core learning algorithms, such as the steepest-descent algorithm, the Levenberg-Marquardt algorithm and other training algorithms [21]. Finally, 10 different univariate least squares support vector machine (LS-SVM) models are formulated [22,23]. All of these models use the Gaussian function as the kernel function, and the output is the one-hour-ahead forecasted wind power output. Other parameters are shown in Table 1.

Because real-world time series are rarely purely linear or nonlinear, researchers have found that hybrid models that combine two or more different algorithms can produce forecasts of higher accuracy than those produced by individual models. ARIMA and LS-SVM models have different capabilities of capturing data characteristics in linear and nonlinear domains; therefore, the hybrid model proposed in this study is composed of an ARIMA component and an LS-SVM component. Because ARIMA is a linear model [24] and LS-SVM [22,25] is a nonlinear model, the hybrid approach is expected to capture both linear and nonlinear patterns in wind park power time series with improved overall forecasting performance. Experimental results with real data sets indicate that the hybrid model can be an effective means by which to improve forecasting accuracy over that achieved by either of the models separately.
Based on the structure proposed by [26], the hybrid model $y_t$ can be represented as:

$$y_t = L_t + N_t$$

where $L_t$ denotes the linear component and $N_t$ denotes the nonlinear component. These two components must be estimated from the data. First, ARIMA is used to model the linear component, so that the residuals from the linear model contain only the nonlinear relationship. The residual at time $t$ from the linear model is denoted as $e_t$:

$$e_t = y_t - \hat{L}_t$$

where $\hat{L}_t$ is the forecast value at time $t$ from the ARIMA model. The specification of the (1, 0, 0) × (0, 1, 1) model is given in Equation (11). The residuals are also important: by modelling them using LS-SVM, nonlinear relationships can be discovered. With $n$ input nodes, the LS-SVM model for the residuals is:

$$e_t = f(e_{t-1}, e_{t-2}, \ldots, e_{t-n}) + \varepsilon_t$$

where $f$ is the nonlinear function determined by the LS-SVM and $\varepsilon_t$ is the random error. The proposed hybrid method is applied to forecast wind power output, i.e., the LS-SVM model is used to model the nonlinearity of the residuals obtained from the ARIMA model. As mentioned in Section 2.1, to generate the diverse models, the structure of the above LS-SVM can be varied by changing the number of nodes in the input layer and the second layer. Because the number of input nodes is changed, different training data are required. These data can be acquired by re-sampling and pre-processing the data. There are many techniques that can be used to obtain diverse training data sets, such as bagging, noise injection, cross-validation and stacking. With these different training datasets and structures, 10 diverse hybrid models are generated using ARIMA and LS-SVM models, as described in Table 2. For all of these models, the linear parts use the ARIMA model described above and the nonlinear parts use different LS-SVMs. All of these LS-SVM models use the Gaussian function as the kernel function, and the output is the forecasted error. Other parameters are shown in Table 2.

In the third group, which consists of multivariate methods, the relationship between weather data and power output is considered. There are five fundamental variables that impact wind power output. The first, $w_1$, is the wind speed, measured in metres/second (m/s); the second, $w_2$, is the wind direction, measured as the angle between the incoming wind and the north; the third, $w_3$, is the air temperature, measured in °C; the fourth, $w_4$, is the atmospheric pressure in Pa; and the fifth, $w_5$, is the relative humidity. These five fundamental variables are used as input data, and the wind power output is the output of the LS-SVM model.
To generate the diverse models, the structure of the above LS-SVM model is varied by changing the number of nodes in the second layer. Different initial conditions can also create diversity in models; these initial conditions include random weights, learning rates and momentum rates from which each network is trained. With these different initial conditions and structures, 10 diverse LS-SVMs are generated. All of these models use the Gaussian function as the kernel function, and the output is the one-hour-ahead forecasted wind power output. Other parameters are shown in Table 3.
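The hybrid decomposition of the second group (a linear forecast plus an LS-SVM forecast of its residuals) can be sketched as follows. This is a hedged illustration, assuming NumPy: the linear component is an AR(1) model fitted by least squares, used here only as a simple stand-in for the ARIMA model; the series is synthetic; and `n_lags`, `gamma` and `sigma` are arbitrary illustrative values. The comparison is in-sample only.

```python
import numpy as np

def rbf(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def lssvm_fit(X, y, gamma, sigma):
    """Solve the LS-SVM linear system for the bias b and the multipliers alpha."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = A[1:, 0] = 1.0
    A[1:, 1:] = rbf(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]

def lagged(series, n_lags):
    """Rows [s[t-n], ..., s[t-1]] paired with the target s[t]."""
    X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
    return X, series[n_lags:]

# Synthetic hourly-like series (illustrative only, not the wind park data).
rng = np.random.default_rng(0)
t = np.arange(300)
y = 20.0 + 5.0 * np.sin(2.0 * np.pi * t / 24.0) ** 3 + rng.normal(0.0, 0.5, t.size)

# Linear component: AR(1) with intercept (assumed stand-in for ARIMA).
design = np.column_stack([np.ones(t.size - 1), y[:-1]])
coef, *_ = np.linalg.lstsq(design, y[1:], rcond=None)
L_hat = design @ coef            # linear forecasts of y[1:]
e = y[1:] - L_hat                # residuals e_t = y_t - L_hat_t

# Nonlinear component: LS-SVM on n lagged residuals.
n_lags, gamma, sigma = 3, 10.0, 1.0
X_e, e_target = lagged(e, n_lags)
b, alpha = lssvm_fit(X_e, e_target, gamma, sigma)
N_hat = rbf(X_e, X_e, sigma) @ alpha + b

# Hybrid forecast = linear forecast + forecast of the residual.
y_hat = L_hat[n_lags:] + N_hat
actual = y[1 + n_lags:]
rmse_linear = np.sqrt(np.mean((actual - L_hat[n_lags:]) ** 2))
rmse_hybrid = np.sqrt(np.mean((actual - y_hat) ** 2))
```

On the training sample the hybrid error cannot exceed the linear-only error, because the LS-SVM objective is no worse than the trivial solution with zero residual forecast; out-of-sample performance has to be checked on held-out data.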

Group Model Based on LS-SVM
As mentioned above, each group consists of 10 forecasting models. We need to select a subset of representatives to improve ensemble efficiency. Diversity among the models is a necessary requirement for making fuzzy group decisions. In this study, a decorrelation maximisation method was used to select the appropriate number of ensemble members. The basic starting point of the decorrelation maximisation algorithm is the principle of ensemble model diversity; that is, the correlations between the selected models should be as small as possible. If there are $p$ models $(f_1, f_2, \ldots, f_p)$ with $n$ forecast values, an error matrix $(e_1, e_2, \ldots, e_p)$ of the $p$ predictors can be represented by:

$$E = \begin{bmatrix} e_{11} & e_{12} & \cdots & e_{1p} \\ e_{21} & e_{22} & \cdots & e_{2p} \\ \vdots & \vdots & & \vdots \\ e_{n1} & e_{n2} & \cdots & e_{np} \end{bmatrix}_{n \times p} \tag{15}$$

From the matrix, the mean, variance and covariance of $E$ can be calculated as:

Mean:

$$\bar{e}_i = \frac{1}{n}\sum_{k=1}^{n} e_{ki} \quad (i = 1, 2, \ldots, p) \tag{16}$$

Variance:

$$V_{ii} = \frac{1}{n}\sum_{k=1}^{n} (e_{ki} - \bar{e}_i)^2 \quad (i = 1, 2, \ldots, p) \tag{17}$$

Covariance:

$$V_{ij} = \frac{1}{n}\sum_{k=1}^{n} (e_{ki} - \bar{e}_i)(e_{kj} - \bar{e}_j) \quad (i, j = 1, 2, \ldots, p) \tag{18}$$

Considering Equations (17) and (18), we can obtain a variance-covariance matrix:

$$V_{p \times p} = (V_{ij}) \tag{19}$$

Based on the variance-covariance matrix, the correlation matrix $R$ can be calculated using the following equations:

$$R = (r_{ij}) \tag{20}$$

$$r_{ij} = \frac{V_{ij}}{\sqrt{V_{ii} V_{jj}}} \tag{21}$$

where $r_{ij}$ is the correlation coefficient, representing the degree of correlation between models $f_i$ and $f_j$. Subsequently, the plural-correlation coefficient between model $f_i$ and the other $p - 1$ models can be computed based on the results of Equations (20) and (21). To calculate the plural-correlation coefficient, the correlation matrix $R$ can, for convenience, be represented by a block matrix; that is:

$$R \rightarrow \begin{bmatrix} R_{-i} & r_i \\ r_i^{T} & 1 \end{bmatrix} \tag{22}$$

where $R_{-i}$ denotes the deleted correlation matrix (the matrix $R$ without its $i$th row and column) and $r_i$ is the vector of correlation coefficients between $f_i$ and the other models. It should be noted that $r_{ii} = 1$ $(i = 1, 2, \ldots, p)$. Next, the plural-correlation coefficient can be calculated by:

$$\rho_i^{2} = r_i^{T} R_{-i}^{-1} r_i \quad (i = 1, 2, \ldots, p) \tag{23}$$

For a pre-specified threshold $\theta$, if $\rho_i^{2} > \theta$, then model $f_i$ should be removed from the $p$ models. Otherwise, model $f_i$ should be retained. Generally, the decorrelation maximisation algorithm can be summarised in the following steps: (1) Compute the variance-covariance matrix $V_{ij}$ and the correlation matrix $R$ with Equations (19) and (20). (2) For the $i$th model $(i = 1, 2, \ldots, p)$, calculate the plural-correlation coefficient $\rho_i$ using Equation (23). (3) For a pre-specified threshold $\theta$, if $\rho_i^{2} > \theta$, remove the $i$th model from the $p$ models; conversely, if $\rho_i^{2} \le \theta$, retain the $i$th model. For each group of models, we select eight as the representatives for the subsequent step.
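The selection steps above can be sketched in a few lines. This is a minimal illustration, assuming NumPy, a single pass over the models and an arbitrary threshold `theta`; the function names and the synthetic error matrix are hypothetical and are not taken from the study.

```python
import numpy as np

def plural_correlations(E):
    """Return rho_i^2 = r_i^T R_{-i}^{-1} r_i for each column of the (n, p) error matrix E."""
    R = np.corrcoef(E, rowvar=False)          # p x p correlation matrix
    p = R.shape[0]
    rho2 = np.empty(p)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        r_i = R[others, i]                    # correlations between model i and the rest
        R_minus = R[np.ix_(others, others)]   # R without row and column i
        rho2[i] = r_i @ np.linalg.solve(R_minus, r_i)
    return rho2

def select_models(E, theta):
    """Keep the models whose plural-correlation coefficient does not exceed theta."""
    rho2 = plural_correlations(E)
    return [i for i in range(E.shape[1]) if rho2[i] <= theta]

# Synthetic forecast errors (illustrative only): model 2 nearly duplicates model 0.
rng = np.random.default_rng(1)
e0 = rng.normal(size=500)
e1 = rng.normal(size=500)
e2 = e0 + 0.05 * rng.normal(size=500)
E = np.column_stack([e0, e1, e2])

rho2 = plural_correlations(E)
selected = select_models(E, theta=0.5)
```

Because every coefficient is computed against the full set of models, a single pass removes both members of a nearly identical pair. Removing the model with the largest coefficient and recomputing would be one alternative; the text does not state which variant was used.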

Fuzzy Group Prediction
For a specified forecasting problem, different experts usually give different estimations based on a set of criteria $X = (c_1, c_2, \ldots, c_m)$. Some experts give optimistic estimates, some prefer pessimistic estimates, and others present the most likely estimates. To incorporate these different judgements into the final forecasting result and to make full use of the different estimates, a process of fuzzification is used. In this paper, a typical triangular fuzzy number is used to describe the forecasting results provided by the experts; that is, $\tilde{Z}_i = (z_{i1}, z_{i2}, z_{i3})$ = (the lowest forecast value, the most likely forecast value, the highest forecast value), where $i$ represents the numerical index of experts.
Like human experts, individual LS-SVM forecasting groups can also generate different forecasting results by using different parameter settings and training sets. For example, the first forecasting group (the univariate LS-SVM model group) generates eight different forecasting results from its eight models (selected from the initial 10 models; Section 2.3) with different hidden neurons or different initial weights. The entire first group can be considered an expert in forecasting. Assume that this expert produces $k$ different results, $f_1^{i}(A), f_2^{i}(A), \ldots, f_k^{i}(A)$, for a specified forecasting point $A$ over the set of models in this group. To make full use of all of the information provided by these results, without loss of generality, we use the triangular fuzzy number to construct the fuzzy opinion for consistency; that is, the smallest, average and largest of the $k$ forecasting results are used as the left-, medium- and right-membership degrees, respectively. In other words, the smallest and largest values are seen as optimistic and pessimistic evaluations, respectively, and the average forecasting result is considered to be the most likely value. Of course, the median can also be used as the most likely value to construct the triangular fuzzy number. However, that approach can cause the loss of certain useful information because some of the other values are ignored. Therefore, the average is selected as the most likely power output to incorporate the full information from all of the models into the fuzzy judgement. Using this fuzzification method, the expert can make a fuzzy forecast for each point. More precisely, the triangular fuzzy number used for forecasting can be represented as:

$$\tilde{Z}_i = (z_{i1}, z_{i2}, z_{i3}) = \left(\min\big[f_1^{i}(A), f_2^{i}(A), \ldots, f_k^{i}(A)\big],\ \frac{1}{k}\sum_{j=1}^{k} f_j^{i}(A),\ \max\big[f_1^{i}(A), f_2^{i}(A), \ldots, f_k^{i}(A)\big]\right) \tag{25}$$

Suppose there are $p$ experts, and let $\tilde{Z} = \psi(\tilde{Z}_1, \tilde{Z}_2, \ldots, \tilde{Z}_p)$ be the aggregation of the $p$ fuzzy judgements, where $\psi(\cdot)$ is an aggregation function. Many methods have been developed to determine the aggregation function. Usually, the fuzzy judgements of the $p$ group members are aggregated by using a common linear additive procedure; that is:

$$\tilde{Z} = \sum_{i=1}^{p} w_i \tilde{Z}_i = \left(\sum_{i=1}^{p} w_i z_{i1},\ \sum_{i=1}^{p} w_i z_{i2},\ \sum_{i=1}^{p} w_i z_{i3}\right)$$

where $w_i$ is the weight of the $i$th fuzzy judgement, $i = 1, 2, \ldots, p$. The weights usually satisfy the following normalisation condition:

$$\sum_{i=1}^{p} w_i = 1$$

At this point, the goal is to determine the weight $w_i$ of the $i$th fuzzy expert. In this study, three groups of models are used as experts, and we give them the same weight of 1/3 each. After completing the aggregation, a fuzzy group consensus $\tilde{Z} = (z_1, z_2, z_3)$ is obtained from the linear additive procedure. To obtain a crisp forecast value for decision-making purposes, we use a defuzzification procedure. According to Bortolan and Degani, the defuzzified value of a triangular fuzzy number can be determined by its centroid, which is computed by:

$$z = \frac{z_1 + z_2 + z_3}{3}$$

At this point, a final group consensus has been computed using the above process. To summarise, the proposed intelligent-agent-based fuzzy group forecasting model comprises five steps: (1) Three forecasting groups are formed, and each group has eight models with, for example, varied structures and initial data. (2) Based on the datasets, each forecasting group produces eight different forecasting results from its different models. (3) For the different forecasting results, Equation (25) is used to fuzzify the judgements of the intelligent agents into fuzzy opinions. (4) The fuzzy opinions are aggregated into a group consensus using the weighted linear additive procedure described above. (5) The aggregated fuzzy group consensus is defuzzified into a crisp value. This defuzzified value can be used as the final forecasting result.
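The fuzzification, aggregation and defuzzification steps can be traced with a small worked example. The sketch below uses only the Python standard library; the function names and the forecast values (in MW) are made up for illustration, and each group is shown with three forecasts rather than eight to keep the arithmetic easy to follow.

```python
def fuzzify(forecasts):
    """Triangular fuzzy number (lowest, average, highest) of one group's forecasts."""
    return (min(forecasts), sum(forecasts) / len(forecasts), max(forecasts))

def aggregate(fuzzy_numbers, weights):
    """Linear additive aggregation: component-wise weighted sum of the fuzzy numbers."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return tuple(
        sum(w * z[k] for w, z in zip(weights, fuzzy_numbers)) for k in range(3)
    )

def defuzzify(z):
    """Centroid of a triangular fuzzy number (z1, z2, z3)."""
    return (z[0] + z[1] + z[2]) / 3.0

# Hypothetical one-hour-ahead forecasts from the three groups (MW).
group_forecasts = [
    [10.0, 12.0, 14.0],   # univariate LS-SVM group
    [11.0, 13.0, 15.0],   # hybrid ARIMA and LS-SVM group
    [9.0, 12.0, 15.0],    # multivariate LS-SVM group
]
opinions = [fuzzify(g) for g in group_forecasts]
consensus = aggregate(opinions, [1.0 / 3.0] * 3)   # equal weights, as in the text
crisp = defuzzify(consensus)
```

Here the three opinions are (10, 12, 14), (11, 13, 15) and (9, 12, 15); the consensus is approximately (10, 12.33, 14.67), and its centroid is 37/3, or about 12.33 MW.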
To illustrate and verify the proposed intelligent-agent-based fuzzy group forecasting model, the following section presents an illustrative numerical example using real-world data. The flow chart of the entire procedure is shown in Figure 1.

Forecasting Results
In this study, we collected wind power output data from the Changshun wind park in Huade County, Inner Mongolia Autonomous Region, China. This wind park is located on the slopes of hills and mountains within an area of 260 km². Details of the park's geographical information are provided in Table 4. This wind park was completed in May 2010 and has a capacity of 49.5 MW. Its wind power output data from 1 January 2011 to 31 December 2011 were collected, as shown in Figure 2. The short-term forecasting model for predicting hourly power output over a 24-hour horizon was tested. Other input data, such as the actual climate information, were collected from local environmental stations. The data from 1 January 2011 to 31 October 2011 are used for constructing and training the models. The data from November 2011 are used to test the models and select the group models according to Section 2.3. The results are presented in Table 5.
The data from December 2011 are used in the testing of the models and in the model analysis. There are 24 points for each day. To judge the accuracy of the models, the individual models and the combined fuzzy forecasting model are compared using the following mean absolute percentage error (MAPE):

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{\hat{p}_i - p_i}{p_i}\right| \times 100\%$$

where $\hat{p}_i$ is the forecast data, $p_i$ is the real-time data, and $N$ is the number of time points used in determining the forecast. The relative error is also adopted to evaluate the performance of the models. The error is calculated as follows:

$$e_i = \frac{\hat{p}_i - p_i}{p_i} \times 100\%$$

The MAPEs of the individual models and the combined fuzzy forecasting model are calculated. The results are shown in Table 5.
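Both error measures are straightforward to compute. The following sketch uses only the Python standard library; the function names and the sample values are illustrative, and the sign convention (forecast minus actual) is an assumption. Points with zero actual output would have to be excluded or handled separately, since both measures divide by the actual value.

```python
def relative_errors(actual, forecast):
    """Signed relative error of each forecast, in percent."""
    return [100.0 * (f - a) / a for a, f in zip(actual, forecast)]

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    errors = relative_errors(actual, forecast)
    return sum(abs(e) for e in errors) / len(errors)

# Made-up hourly values (MW), for illustration only.
actual = [20.0, 25.0, 40.0]
forecast = [22.0, 20.0, 40.0]
rel = relative_errors(actual, forecast)
score = mape(actual, forecast)
```

For these values the relative errors are 10%, -20% and 0%, so the MAPE is 10%.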

Statistical Test
The best individual model is DLS-SVM-6, and the second best is H-AR-LS-7 in terms of MAPE. A statistical test is carried out between the GFSVM model and these two models. Following the methods in reference [27], a comparison is made between the GFSVM model and the best individual model, DLS-SVM-6.
Here, $\{y_{it}\}_{t=1}^{T}$ is the historical data series and $\{\hat{y}_{it}\}_{t=1}^{T}$ is the series of results from the GFSVM model. The comparison result between the GFSVM model and the DLS-SVM-6 model is shown in Figure 3 and Table 6.
The same comparison is made between the GFSVM model and the H-AR-LS-7 model, and the result is shown in Figure 4 and Table 7.
In Table 6 and Table 7, T is the sample size, ρ is the contemporaneous correlation, and θ is the serial correlation. All tests are at the 10% level. We perform 260 replications.
For the comparison between the GFSVM model and the DLS-SVM-6 model, we obtain S 1 = 11.74 and S 2a = 10.67, which imply p-values of 0.089 and 0.076, respectively. Thus, for the sample at hand, we do not reject at the conventional level the hypothesis that the GFSVM model is more accurate than the DLS-SVM-6 model. In a similar way, we can also statistically conclude that the GFSVM model is better than the H-AR-LS-7 model.
From the above, we can draw the statistical conclusion that the GFSVM model is better than both the DLS-SVM-6 model and the H-AR-LS-7 model.

Result Discussions
From Table 5, it can be observed that the fuzzy group forecasting model (GFSVM) performs best in terms of MAPE, with a MAPE of only 15.27%. The average MAPEs of the 8 models in groups 1, 2 and 3 are 21.61, 18.79 and 19.6%, respectively; all of these MAPEs are higher than that of the GFSVM. The best and second best individual models are DLS-SVM-6 and H-AR-LS-7, and their relative errors for all testing points are shown in Figure 5 and Figure 6, respectively. From these two figures, it can be observed that the range of the relative errors from the fuzzy group forecasting model GFSVM is smaller than that for DLS-SVM-6 and H-AR-LS-7. This means that the GFSVM is more reliable than the other models. Table 8 presents the percentage of predictions with errors within ±10%, ±20%, ±30% and ±40% for DLS-SVM-6, H-AR-LS-7 and GFSVM. For example, for the GFSVM model, 47.3% of the predictions have errors within ±10%, whereas for the DLS-SVM-6 model, 34.1% of the errors are within the same margin, and for the H-AR-LS-7 model, only 30.5% of the errors are within the same margin. Thus, the accuracy of the GFSVM model is the best among these three models. From Figure 7, it can be seen that the GFSVM follows the actual wind power output with high accuracy.
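The error distribution summarised in Table 8 can be derived from the relative errors by counting the share of predictions that fall inside each margin. The sketch below assumes that the margins are cumulative (an error within ±10% is also counted within ±20%), which is an interpretation of the table rather than something stated in the text; the function name and the sample errors are illustrative.

```python
def share_within(rel_errors, margins=(10, 20, 30, 40)):
    """Percentage of relative errors (in percent) whose magnitude is within each margin."""
    n = len(rel_errors)
    return {
        m: 100.0 * sum(1 for e in rel_errors if abs(e) <= m) / n
        for m in margins
    }

# Made-up relative errors in percent, for illustration only.
sample_errors = [5.0, -12.0, 25.0, -8.0, 45.0]
distribution = share_within(sample_errors)
```

For the five sample errors, two lie within ±10%, three within ±20% and four within ±30% and ±40%, giving shares of 40%, 60%, 80% and 80%.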

Figure 2. Time series plots of hourly wind power output.

Figure 5. Wind power forecast relative errors of GFSVM model and DLS-SVM-6 model.

Table 2. Ten diverse hybrid models using ARIMA and LS-SVM.

Table 4. Wind park geographical information. Note: The very low minimum temperature is the extreme low temperature in this area; the lowest temperature in this wind park is −27 °C, in January. There was no shutdown in 2011 due to low temperature.

Table 5. The MAPEs of individual models and the combined fuzzy forecasting model.
[27] … non-Gaussian; 3. serially correlated; 4. contemporaneously correlated. The null hypothesis is a positive median loss differential: med(g(e it ) − g(e jt )) < 0. So, we introduce two test statistics from reference [27], S 1 and S 2a , as follows:

Table 8. Distribution of wind power forecast errors for the three models (% of errors in each margin).