A New Hybrid Prediction Method of Ultra-Short-Term Wind Power Forecasting Based on EEMD-PE and LSSVM Optimized by the GSA

.


Introduction
As a clean renewable energy, wind energy is regarded as a good alternative to deal with environmental problems and energy crises [1,2].According to a report published by the World Wind Energy Association (WWEA), worldwide wind capacity reached 54 GW by the end of 2017, with a growth rate of 11.8% [3].The total installed capacity is reported in Figure 1.The intermittent nature of wind power generation has posed a big challenge for maximizing the utilization of the wind power industry [4].It is of practical significance to optimize the wind power prediction algorithm and make it more suitable for the operation and wind conditions of a specific wind farm.Wind power forecasting is difficult to achieve due to its intermittency and stochastic fluctuation, which brings great challenges to power system operation and control [5,6].Over the past few decades, a large amount of research has been devoted to the development of effective and reliable wind speed/power forecasting methods, models, and tools [7].Generally, these methods can be broadly divided into four major models [8][9][10][11]: (a) physical models, (b) statistical models, (c) hybrid models, and (d) spatial correlation models.Physical models take into account parameters such as topography, temperature, and pressure, at the location of the wind farm, which are often utilized in short-term forecasting 6-72 h ahead and long-term forecasting with multiple weather variables.Typical physical wind power forecasting systems, among others, are the Prediktor system, developed by the Risoe National Laboratory in Denmark [12]; Previento, developed by the University of Oldenburg in Germany [13]; and eWind, developed by AWS True Wind Inc. (New York, NY, USA) [14].Statistical models are built based on historical power/speed time series data, which establishes a functional relationship between historical data and forecast data [15].These models analyze the relationship between various explanatory variables and online measurements.The well-known pure statistical models are the autoregressive (AR) model [16], autoregressive moving average (ARMA) model [17], autoregressive integrated moving average (ARIMA) model [18], seasonal autoregressive integrated moving average (SARIMA) model [19], and the autoregressive integrated moving average with exogenous variables (ARMAX) model [20].However, statistical models based on the assumption that linear structures exist among time series data cannot capture non-linear patterns very well.
Individual models lack the ability to deal with big data and fail to capture the majority of the complex characteristics of the original wind power series data [21].To make use of the advantages of statistical and physical models, a number of hybrid models with data pre-processing techniques, error post-processing techniques, and parameter selection and optimization techniques, have been proposed.
Data pre-processing techniques involve analyzing and processing data make original time series into multiple sequences or matrices, which have more obvious characteristics.Therefore, to some extent, pre-processing techniques can improve forecasting accuracy.Wavelet decomposition [22][23][24][25][26] and empirical mode decomposition [27][28][29] are the prevailing data pre-processing techniques, which can analyze the original wind power series in time and frequency domains.De et al. [25] compared the hybrid artificial neural networks (ANN) method.Case studies show that wavelet decomposition (WD)-LSSVM performs better than WD-ANN.Zhang et al. [29] used variational mode decomposition (VMD) to process original wind power series, then established a novel combined model based on machine learning methods.Zhao et al. [30] analyzed the characteristics of the outliers caused by wind curtailment, then, a data-driven outlier elimination approach, combining quartile method and density-based clustering method was proposed; however, variational mode decomposition (VMD) is prone to mode mixing problems.Niu et al. [31] used empirical mode decomposition (EMD) to decompose original wind speed data, then, a novel Wind power forecasting is difficult to achieve due to its intermittency and stochastic fluctuation, which brings great challenges to power system operation and control [5,6].Over the past few decades, a large amount of research has been devoted to the development of effective and reliable wind speed/power forecasting methods, models, and tools [7].Generally, these methods can be broadly divided into four major models [8][9][10][11]: (a) physical models, (b) statistical models, (c) hybrid models, and (d) spatial correlation models.Physical models take into account parameters such as topography, temperature, and pressure, at the location of the wind farm, which are often utilized in short-term forecasting 6-72 h ahead and long-term forecasting with multiple weather variables.Typical physical wind power forecasting systems, among others, are the Prediktor system, developed by the Risoe National Laboratory in Denmark [12]; Previento, developed by the University of Oldenburg in Germany [13]; and eWind, developed by AWS True Wind Inc. (New York, NY, USA) [14].Statistical models are built based on historical power/speed time series data, which establishes a functional relationship between historical data and forecast data [15].These models analyze the relationship between various explanatory variables and online measurements.The well-known pure statistical models are the autoregressive (AR) model [16], autoregressive moving average (ARMA) model [17], autoregressive integrated moving average (ARIMA) model [18], seasonal autoregressive integrated moving average (SARIMA) model [19], and the autoregressive integrated moving average with exogenous variables (ARMAX) model [20].However, statistical models based on the assumption that linear structures exist among time series data cannot capture non-linear patterns very well.
Individual models lack the ability to deal with big data and fail to capture the majority of the complex characteristics of the original wind power series data [21].To make use of the advantages of statistical and physical models, a number of hybrid models with data pre-processing techniques, error post-processing techniques, and parameter selection and optimization techniques, have been proposed.
Data pre-processing techniques involve analyzing and processing data make original time series into multiple sequences or matrices, which have more obvious characteristics.Therefore, to some extent, pre-processing techniques can improve forecasting accuracy.Wavelet decomposition [22][23][24][25][26] and empirical mode decomposition [27][28][29] are the prevailing data pre-processing techniques, which can analyze the original wind power series in time and frequency domains.De et al. [25] compared the hybrid artificial neural networks (ANN) method.Case studies show that wavelet decomposition (WD)-LSSVM performs better than WD-ANN.Zhang et al. [29] used variational mode decomposition (VMD) to process original wind power series, then established a novel combined model based on machine learning methods.Zhao et al. [30] analyzed the characteristics of the outliers caused by wind curtailment, then, a data-driven outlier elimination approach, combining quartile method and density-based clustering method was proposed; however, variational mode decomposition (VMD) is prone to mode mixing problems.Niu et al. [31] used empirical mode decomposition (EMD) to decompose original wind speed data, then, a novel hybrid forecasting model based on the general regression neural network (GRNN) method, optimized by the fruit fly optimization algorithm (FOA), was proposed.Ye et al. [32] discussed EMD, EEMD, complementary ensemble empirical mode decomposition (CEEMD), and complete empirical mode decomposition with adaptive noise (CEEMDAN).Their results showed that the proposed CEEMDAN-support vector regression (CEEMDAN-SVR) model out-performed the other models.
Error post-processing (EP-P) techniques use estimated error, which is obtained from a forecasting model, to correct final forecasting results.Huang et al. [33] proposed a new, real-time decomposition model based on the feature selection and error correction of wind speed forecasting, which improved prediction accuracy.Platon et al. [34] used an advanced technique to estimate surface wind gusts, then, combined dynamic and statistical techniques into the wind power forecasting model.Liang et al. [35] improved wind speed forecasting performance using a correlation analysis method to analyze multi-step forecast errors and proposed a novel hybrid wind speed prediction model based on error forecast correction.Federica et al. [36] employed a principal component analysis (PCA), combined with post-processing, to reduce computational costs and forecast errors.Li et al. [37] proposed a new combined approach based on Extreme Learning Machine (ELM) and an error correction model, which improved prediction accuracy over a short-term time scale (6-72 h).
Parameter selection and optimization techniques can improve prediction accuracy and reduce prediction time through the training model.Xiao et al. [38] employed a new hybrid prediction model based on a modified bat algorithm (BA) with the conjugate gradient (CG) method to multi-step wind speed prediction, which optimized the initial weights of the neural networks.Wang et al. [39] proposed a novel combined forecasting model based on a multi-objective bat algorithm (MOBA), multi-step-ahead wind speed forecasting.Huang et al. [40] proposed a novel forecasting model, using a quantum particle swarm optimization (PSO) algorithm, to receive higher forecast accuracy levels.Chang et al. [41] compared the persistence method, the back propagation artificial neural network (BP) model, and radial basis neural network (RBF) model.Case studies showed that the proposed forecasting method was more accurate and reliable than the other three models.The clonal selection algorithm (CSA) [42], gravitational search algorithm (GSA) [43], particle swarm optimization (PSO) [44,45], simplified swarm optimization (SSO) [46], and cuckoo search algorithm (CS) [47], among others, are the prevailing methods to optimize the parameters of wind power/speed forecasting models.
Spatial correlation models characterize the relationship between the wind power or speed of a target wind farm and a reference wind farm at different spatial locations.Zhou et al. [48] proposed a spatial and temporal correlation model and it was found that this model could improve ultra-short-term wind power forecasting accuracy.Tascikaraoglu et al. [49] proposed a novel method, which first utilized a Wavelet Transform (WT) method to decompose the wind speed data into more stationary components and then used a spatio-temporal model on each of the subseries to incorporate both the temporal and spatial information for wind speed forecasting.Ye et al. [50] analyzed uncertainty and dependence in wind power output, and employed a physical spatio-temporal correlation model.They found that this method outperformed statistical models.
In this paper, a novel combine model is proposed based on ensemble empirical mode decomposition, permutation entropy, least squares support vector machine, and gravitational search algorithm for ultra-short wind power forecasting.To investigate the effectiveness of the model, the proposed method will be thoroughly tested and benchmarked on real wind power data from Hebei, China.The main contributions of this research will be as follows: (1) Using pre-processing techniques to deal with the complex wind power time series.
The ensemble empirical mode decomposition-permutation entropy will be used to analyze the original wind power series, by which the original wind power time series will be translated into some new, relatively stable subsequences.Ensemble empirical mode decomposition can decompose original wind power time series into a series of intrinsic mode functions (IMF) with different characteristic scales; however, it fails to capture weak changes in time series.Permutation entropy will be used to reconstitute subsequences by similar principles, which can promote weak time signals.
LSSVM will be employed as the basic forecasting model, due to the features of regression for wind power prediction.To improve the forecasting accuracy and stability of LSSVM directly, the hyper-parameters of LSSVM will, firstly, be optimized by GSA to obtain the best hyper-parameters.
(3) Using comprehensive error metrics to assess the performance of the proposed model.
The error indicators, in this paper, will include the normalized mean absolute of errors (NMAE), normalized root mean square error (NRMSE), and Pearson correlation coefficient (R).In this paper, we will also introduce two improvement percentage error indexes, ξ NMAE(%) and ξ NRMSE(%) .
The remainder of the paper will be organized as follows: the details of the proposed hybrid model based on EMD-PE-LSSVM-GSA for wind power forecasting will be illustrated in Section 2. Forecasting performance evaluation indicators will be described in Section 3. Experimental examples will be presented in Section 4. The resulting analysis and forecasting performance of the proposed method, compared with other methods, will be given in Section 5. Finally, conclusions will be given in Section 6.

Proposed Methodology
The approaches used, including ensemble empirical mode decomposition, permutation entropy, the least squares support vector machine model, and gravity search algorithm, are described in this section.The EEMD-PE-LSSVM-GSA wind power prediction process is shown in Figure 2.
Energies 2018, 11, x FOR PEER REVIEW 4 of 24 will be used to reconstitute subsequences by similar principles, which can promote weak time signals.
LSSVM will be employed as the basic forecasting model, due to the features of regression for wind power prediction.To improve the forecasting accuracy and stability of LSSVM directly, the hyper-parameters of LSSVM will, firstly, be optimized by GSA to obtain the best hyper-parameters.
(3) Using comprehensive error metrics to assess the performance of the proposed model.
The error indicators, in this paper, will include the normalized mean absolute of errors (NMAE), normalized root mean square error (NRMSE), and Pearson correlation coefficient (R).In this paper, we will also introduce two improvement percentage error indexes, NMAE(%) The remainder of the paper will be organized as follows: the details of the proposed hybrid model based on EMD-PE-LSSVM-GSA for wind power forecasting will be illustrated in Section 2. Forecasting performance evaluation indicators will be described in Section 3. Experimental examples will be presented in Section 4. The resulting analysis and forecasting performance of the proposed method, compared with other methods, will be given in Section 5. Finally, conclusions will be given in Section 6.

Proposed Methodology
The approaches used, including ensemble empirical mode decomposition, permutation entropy, the least squares support vector machine model, and gravity search algorithm, are described in this section.The EEMD-PE-LSSVM-GSA wind power prediction process is shown in Figure 2.

Ensemble Empirical Mode Decomposition (EEMD)
EMD is frequently subject to a mode mixing problem, where a portion of the IMF may have properties that are quite similar to adjacent IMFs.EEMD is based on EMD and is an algorithm-based method of processing signals, which can be used to developed to solve the mode mixing problem [51].White noise is added to the wind power time series at different scales.In order to solve the EMD mode mixing problem, a detailed explication is given in [52], as follows: (1) Add white noise series to the original wind power series: where x(t) is the original wind power series, and w i (t) is the white noise series.Then, find the corresponding EMD components.(2) Find the local maxima and minima of x new,i (t).
(3) Find the upper envelope x new,iu (t) and lower envelope x new,il (t).(4) Calculate the mean of the wind power time series with white noise and the difference between x new,iu (t) and x new,il (t).
(5) Repeat Steps 1-3 with d(i) instead of x new,i (t), until m(i) ≤ δ (where δ is the acceptable error).Then, take , and the residual is as follows: (6) The wind power time series x(t) can be decomposed as follows: where c m i represents the IMFs, and w n i is the final residue.

Permutation Entropy (PE)
In the case of nonlinear analysis, the complexity of the signal can be effectively determined according to its entropy values [53], such as scale entropy, sample entropy, and multi-scale entropy.Permutation entropy is widely used in sequence complexity and nonlinear analysis because of its high robustness, efficiency, and simplicity.This method's motivation is the classification of the complex system.The larger the permutation entropy value, the higher the time series randomness of the sequence and the more likely another pattern will occur.Conversely, the smaller the permutation entropy value, the lower the time series randomness of the subsequence and the less likely another pattern will occur.The algorithm implementation process of PE is given below.
where m is the embed dimension of the wind power time series, and τ is the delay time. [ Energies 2018, 11, 697 6 of 23 where j 1 , j 2 , . . ., j m represents the index number of the column in which each element in the reconstruction vector resides.Each vector can be mapped to a set of symbols.
where g = 1, 2, . . ., k, k ≤ m!We calculated the probability of occurrence for each symbol sequence, P 1 , P 2 , . . ., P k , ∑ k l=1 P l , where: In the form of Shannon entropy, the permutation entropy of the wind power time series can be expressed as: When P l = 1 m! , H p (m) reaches the maximum ln(m!), the standardized processing can be achieved by: Permutation entropy values were used to evaluate the complexity of each IMFs signal, and the adjacent entropy values were used to reconstitute IMFs into new subsequences (RS).

Least Squares Support Vector Machine (LSSVM)
The support vector machine (SVM) is an effective machine algorithm for data classification and regression [54].SVM can overcome data over-fitting problems and improve generalization performance by minimizing structural risk instead of empirical risk.The standard SVM uses nonnegative errors in the cost function and inequality constraints, while the LSSVM uses square errors and equality constraints.Therefore, LSSVM is a variation of the standard SVM.
Considering the wind power training dataset, (x i , y j ), i = 1, . . ., NL, NL is the number of training datasets, x i ∈ R d is the input vector, y i ∈ R is the corresponding output, and d is the dimension of x i .The optimal decision function can be constructed by mapping the input space into the high-dimension feature space as follows: where ϕ(x) is the nonlinear function, ω is the weight, and b is the bias.
where ω is the model complexity, γ is the regularization parameter to balance the complex degree and approximation accuracy of the model, and R e is the empirical risk function.The objective function of LSSVM can be framed: where λ i (1, 2, . . ., N) represents the Lagrange multipliers.
Energies 2018, 11, 697 Based on the Karush-Kuhn-Tucker (KKT) conditions, Equation ( 15) is given by: Based on Equation (15) the following expression can be derived: where T is the coefficient matrix, y = [y 1 , y 2 , . . . ,y t ] T is the output matrix, K(x i , y j ) = ϕ(x i ) T ϕ(x j ), and K is the kernel function on the basis of Mercer's condition.The regression function of the LSSVM model can be described as: The radial basis function is selected as the kernel function, which is given as follows: where σ is the kernel parameter.

Gravitational Search Algorithm (GSA)
The GSA was first proposed in 2009 [55] and is a population optimization algorithm based on the law of gravity and Newton's second law.The algorithm searches for the optimal solution by moving the particle position of the population.That is, as the algorithm iterates, the particles move continuously in the search space by the gravitation between them.
Assuming that the optimization problem can be given in ( 14) and ( 15), the particle's position is the solution.The position of particle i is defined as: Step 1: Initialize the speed and position of random particles and calculate the fitness of each particle.
Step 2: Calculate the gravitational constant G(t) and the inertia mass of each particle: where G 0 is the initial gravitational constant, α is the decay rate, T is the maximum generation, and Energies 2018, 11, 697 8 of 23 Step 3: Calculate the resultant particle force, which can be given as: where F d ij (t) is gravitation with the particles i and j, with dimension, d, at the t generation; M pi (t) is the passive gravitational mass related to particle i; ε is the small constant; x d i (t) and x d j (t) is the position of dimension, d, of particles i and j at the t generation; R ij (t) = x i (t), x i (t) 2 is the Euclidean distance between particles i and j.
Step 4: Calculate accelerated speed.According to Newton's second law, the acceleration is obtained as follows: Step 5: Update speed and position: where rand j is a random number with a uniform distribution [0,1].
Step 6: Check the termination condition.Terminate the optimization if the stopping criteria requirements are met, and, if not, repeat the procedure from step 2 to 5 until the termination condition requirements are met.

The Proposed Method for Wind Power Forecasting
The flowchart of the proposed hybrid model based on EEMD-PE-LSSVM-GSA is illustrated in Stage 1: EEMD process To build an effective prediction model, the features of the original wind power datasets must be fully analyzed and considered.EEMD techniques can be used to decompose the original wind power time series, x(t), into new, relatively stable subsequences, x i (t), (i = 1, 2, 3, . ..).
(1): Initialize: Setting the parameters of GSA, the particle number is L, gravitational constant is G 0 , attenuation rate is α, and the dimensions of GSA are d.
(2): Calculate: Calculate the fitness function F f itness as follows: where x i is the real wind power value, xi is the forecasting value, and N is the number of samples.
(3): Update: The states are updated as follows: where x d i (t) is the position of the particle, v d i (t) is the speed of search, and rand i is a uniform random variable, with a value in the range of [0, 1].
(4): Selection: If the iteration reaches its maximum, or the F f itness reaches its minimum, the best hyper parameters (σ and γ) and corresponding kernel parameters can be found.
(5): Validation: Output wind power prediction values for every new subsequence.The wind power forecasting errors, in terms of different criteria, are computed to validate the method.The results are compared with that of other methods.The best parameters of the optimized model will be obtained.
Stage 4: Hybrid process Combine all the reconstituted subsequences of forecasting results and output the final forecasting results.
(2): Calculate: Calculate the fitness function fitness F as follows: where i x is the real wind power value,  i x is the forecasting value, and N is the number of samples.
(3): Update: The states are updated as follows: ( 1) ( ) ( 1) ( 1) ( ) ( ) where ( ) x t is the position of the particle, ( ) (4): Selection: If the iteration reaches its maximum, or the fitness F reaches its minimum, the best hyper parameters (σ and γ ) and corresponding kernel parameters can be found.
(5): Validation: Output wind power prediction values for every new subsequence.The wind power forecasting errors, in terms of different criteria, are computed to validate the method.The results are compared with that of other methods.The best parameters of the optimized model will be obtained.
Stage 4: Hybrid process Combine all the reconstituted subsequences of forecasting results and output the final forecasting results.
x nTs x nTs w nTs = +

Performance Criterion
In this paper, the error indexes include the normalized mean absolute of errors (NMAE), which reflects actual prediction error, normalized root mean square errors (NRMSE), which reflects large forecasting deviations, and the Pearson correlation coefficient.They are defined, respectively, as follows: where x i is the actual wind power value, xi is the forecasting value, P Inst is the installed wind power capacity, and N is the number of samples.Additionally, this paper introduces two percentage error indexes, which are defined as follows: where

Dataset Description
In this paper, a total of 5760 samples were collected from a wind farm in Hebei, China.Considering the influence of seasonal factors, the whole dataset was divided into four parts, Datasets A, B, C, and D, which were independently used to verify the effectiveness of the proposed method.Dataset A was from 1-15 January 2016, Dataset B was from 1-15 April, Dataset C was from 1-15 July, and Dataset D was from 1-13 October.Wind power generation data were 15 min averaged values.The forecasting methods were applied over very short time horizons, of up to 4 steps (i.e., 1 h) ahead, with each step being 15 min.The samples are shown in Figure 4.

Data Processing
EEMD-PE was used to analyze the wind power series, by which the wind power time series could be translated into new, relatively stable subsequences.EEMD was also used to decompose the wind power time series into a series of IMFs with different characteristic scales.Then, PE was used to analyze the IMFs.
There were two important EEMD parameters: the number of the ensemble and the amplitude of the added white noise.In this experiment, 200 groups of white noise signals were added, with a standard deviation was 0.2.There were twelve independent IMF compositions.Decomposition results are shown in Figure 5.

Data Processing
EEMD-PE was used to analyze the wind power series, by which the wind power time series could be translated into new, relatively stable subsequences.EEMD was also used to decompose the wind power time series into a series of IMFs with different characteristic scales.Then, PE was used to analyze the IMFs.
There were two important EEMD parameters: the number of the ensemble and the amplitude of the added white noise.In this experiment, 200 groups of white noise signals were added, with a standard deviation was 0.2.There were twelve independent IMF compositions.Decomposition results are shown in Figure 5.The PE parameters ( m and τ ) have an impact on simulation time and prediction accuracy; if the parameter m is too large, prediction accuracy will be reduced, and if the parameter τ is too small, the simulation time will become longer.Therefore, we discuss the processing of the two parameters.
A number of different parameter values were chosen to forecast the series.It is known that the higher the embedded dimensions, the more complex the structure will be and the more modeling time will be spent.Considering the forecast time and model complexity, shown in Table 1.
In Table 1, the NRMSE of   The PE parameters and τ) have an impact on simulation time and prediction accuracy; if the parameter m is too large, prediction accuracy will be reduced, and if the parameter τ is too small, the simulation time will become longer.Therefore, we discuss the processing of the two parameters.
A number of different parameter values were chosen to forecast the series.It is known that the higher the embedded dimensions, the more complex the structure will be and the more modeling time will be spent.Considering the forecast time and model complexity, m = 3, 4, 5, 6, 7 and τ = 0.1, 0.5, 1 models are discussed for (1-4)-step-ahead wind power forecasting, and the errors are shown in Table 1.
In Table 1, the NRMSE of m = 3 and τ = 1 are the smallest among all parameters; then, error increases slowly with the dimension of embedding.Performance evaluation, using NMAE and NRMSE, shows that m = 3 and τ = 1 is better than other parameters.When m = 7 and τ = 1, the NRMSE of the testing sample is 3.0773, which is the largest level of error, and the modeling time is 10.9929.Comparative analysis shows that increasing the embedded dimension and computation time impacts on the prediction at different time scales.Considering prediction accuracy and simulation time, m = 3 and τ = 1 are selected to predict wind power.Wind power time series data has nonlinear and non-stationary features.It can be seen from Figure 5 that there are a lot of IMF components after decomposition.If the LSSVM model is used to build each component respectively, the computing time will increase significantly.PE technology can be used to evaluate complexity of each IMF signal.
In order to forecast ultra-short-term wind power effectively, this paper used PE technology to analyze the complexity features of each IMF component.The PE values of all IMFs are shown in Figure 6.In Figure 6, the IMF component frequency decreases from high to low, and the PE value also decreases, which verifies that the PE theory is effective.The PE value indicates the stochastic degree of the time series, where a smaller PE value means more regular time series, and a larger PE value means more randomness.To reduce the computing complexity of the proposed method, according to the PE values, the IMFs were classified and merged to reconstituted subsequences.From IMF 1 to IMF 11 and residue (r), the PE values gradually decreased from 1.7916 to 0.6320.IMF 1 was assigned to RS I, since it had the highest frequency.IMF 2 and IMF 3 PE value differences were about 0.2~0.3,so they could be set as RS II.IMF 4 and IMF 5 PE value differences were about 0.1, and, thus, could be set as RS III.IMF 5~IMF 11+r PE value differences were about 0.02~0.06,and were set as RS IV.The reconstituted subsequences processed by PE are shown in Figure 7.
Energies 2018, 11, x FOR PEER REVIEW 13 of 24 Wind power time series data has nonlinear and non-stationary features.It can be seen from Figure 5 that there are a lot of IMF components after decomposition.If the LSSVM model is used to build each component respectively, the computing time will increase significantly.PE technology can be used to evaluate complexity of each IMF signal.
In order to forecast ultra-short-term wind power effectively, this paper used PE technology to analyze the complexity features of each IMF component.The PE values of all IMFs are shown in Figure 6.In Figure 6, the IMF component frequency decreases from high to low, and the PE value also decreases, which verifies that the PE theory is effective.The PE value indicates the stochastic degree of the time series, where a smaller PE value means more regular time series, and a larger PE value means more randomness.To reduce the computing complexity of the proposed method, according to the PE values, the IMFs were classified and merged to reconstituted subsequences.From IMF 1 to IMF 11 and residue (r), the PE values gradually decreased from 1.7916 to 0.6320.IMF 1 was assigned to RS I, since it had the highest frequency.IMF 2 and IMF 3 PE value differences were about 0.2~0.3,so they could be set as RS II.IMF 4 and IMF 5 PE value differences were about 0.1, and, thus, could be set as RS III.IMF 5~IMF 11+r PE value differences were about 0.02~0.06,and were set as RS IV.The reconstituted subsequences processed by PE are shown in Figure 7.The experimental parameters [43] are shown in Table 2.
Energies 2018, 11, x FOR PEER REVIEW 14 of 24  The optimal parameters, which result from using RBF kernel functions in the LSSVM model, is shown in Table 3.The optimal parameters, which result from using RBF kernel functions in the LSSVM model, is shown in Table 3.In Figure 8, for Datasets A and B, the NRMSE values for six models tended to decrease with the length of the training dataset.

Types of Kernel Function Penalty Factor Kernel Function
For Dataset A, the NRMSE of each method varied irregularly when the length of the training dataset increased from 100 to 700.The values in this interval could not be selected as the length of the training dataset.In the range of 700-1400, the trend of NRMSE became flat.The proposed model was most insensitive to training dataset length, and the NRMSE of the proposed approach remained almost unchanged as the dataset length was greater than 700, which shows the proposed model was a simple, but powerful forecasting method.
For Dataset B, the NRMSE for each method varied irregularly when the length of training dataset increased from 100 to 600.Similarly, the values in this interval could not be selected as the length of the training dataset.In the range of 600-1400, the trend of NRMSE became flat, and the other five methods remained unchanged after 1000.However, the EEMD-PE-LSSVM model kept decreasing at 1000.
Taking into account the sensitivity of each method to the data, 1000 data points, as the length of the training set, was appropriate.

Results and Discussion
The proposed hybrid model was employed to forecast ultra-short-term wind power, and the corresponding results from the proposed model and other contrast models are discussed in the following section.

Experiment 1: The Comparison Results of the Proposed Model and Other Models
Ultra-short-term wind power for 1-step, 2-step, 3-step and 4-step-ahead prediction was implemented for Datasets A, B, C, and D. Results from the analyses will be clearly demonstrated in Tables 4-7 to reveal the effectiveness of each model.
To further verify the applicability, performance, and superiority of the proposed hybrid model, the wind power data from Datasets A, B, C, and D were employed for modeling, with five alternative forecasting models (the SVM model, RBF model, LSSVM model, EEMD-LSSVM model, and EEMD-PE-LSSVM model) that were compared with the proposed hybrid model.The results are shown in .
In Figures 9-12, it can be seen that the error of prediction results for the proposed model was much smaller than the SVM model, RBF model, LSSVM model, EEMD-LSSVM model, and EEMD-PE-LSSVM, which implies that the proposed method performs much better than other five models.The prediction results of the EEMD-PE-LSSVM model lagged behind the proposed model In Figure 8, for Datasets A and B, the NRMSE values for six models tended to decrease with the length of the training dataset.
For Dataset A, the NRMSE of each method varied irregularly when the length of the training dataset increased from 100 to 700.The values in this interval could not be selected as the length of the training dataset.In the range of 700-1400, the trend of NRMSE became flat.The proposed model was most insensitive to training dataset length, and the NRMSE of the proposed approach remained almost unchanged as the dataset length was greater than 700, which shows the proposed model was a simple, but powerful forecasting method.
For Dataset B, the NRMSE for each method varied irregularly when the length of training dataset increased from 100 to 600.Similarly, the values in this interval could not be selected as the length of the training dataset.In the range of 600-1400, the trend of NRMSE became flat, and the other five methods remained unchanged after 1000.However, the EEMD-PE-LSSVM model kept decreasing at 1000.
Taking into account the sensitivity of each method to the data, 1000 data points, as the length of the training set, was appropriate.

Results and Discussion
The proposed hybrid model was employed to forecast ultra-short-term wind power, and the corresponding results from the proposed model and other contrast models are discussed in the following section.It can be observed from Tables 4-7 that the NRMSE and NMAE values of the proposed method were the lowest and the R values were the highest, compared with other models for (1-4)-step-ahead prediction during the entire evaluation period, which demonstrates the superior performance of the proposed method.For all the forecasting horizons investigated in this paper, the proposed method always reached the minimum values of NRMSE and NMAE and the maximum value of R.This indicates that the proposed model significantly outperforms benchmark models.Thus, the proposed model is an effective tool for wind power forecasting.In order to test the accuracy of wind power forecasting, NRMSE, NMAE, R, ξ NMAE(%) , and ξ NRMSE(%) were used in this paper, for Datasets A, B, C, and D with 1-step-ahead and 4-steps-ahead.Detailed numerical analysis is given in Tables 4-7.
It can be observed from Tables 4-7 that the NRMSE and NMAE values of the proposed method were the lowest and the R values were the highest, compared with other models for (1-4)-step-ahead prediction during the entire evaluation period, which demonstrates the superior performance of the proposed method.For all the forecasting horizons investigated in this paper, the proposed method always reached the minimum values of NRMSE and NMAE and the maximum value of R.This indicates that the proposed model significantly outperforms benchmark models.Thus, the proposed model is an effective tool for wind power forecasting.In order to compare performance differences between the combined model and other benchmark models, ξ NMAE(%) and ξ NRMSE(%) were utilized in this study.By using this type of criterion, the improvement percentage values of the proposed model and benchmark models are given in Figure 13.
The histogram of ξ NMAE(%) and ξ NRMSE(%) for all models regarding Datasets A, B, C, and D for (1-4)-step-ahead wind power forecasting are shown in Figure 13.For Dataset A, except for the 1-and 4-step-ahead forecasting, the ξ NRMSE(%) values of the proposed method compared with EEMD-PE-LSSVM were positive, which shows that the performance of the proposed model was worse than the EEMD-PE-LSSVM model.In the 1-step-ahead forecasting, the negative value of ξ NRMSE(%) for the proposed method, compared with the RBF model, was minimal, which indicates that the forecasting performance of the proposed model was powerful.In the 2-, 3-and 4-step-ahead forecasting, the proposed method, compared with the EEMD-LSSVM model, performed best.
In (1-3)-step-ahead forecasting, the proposed method had a positive value of ξ NMAE(%) compared with EEMD-PE-LSSVM, which shows that the proposed model was worse than the EEMD-PE-LSSVM model.In the 4-step ahead forecasting, the proposed model was slightly worse than the RBF, SVM, and LSSVM models.
There were similar results for Datasets B, C, and D, which shows that the proposed model was effective for ultra-short-term wind power prediction.
As demonstrated in Figure 13, we can derive the following conclusions: (a) heuristic algorithms have good optimization capabilities in wind power forecasting; (b) hybrid models obtain better performance compared with individual and other combined models without optimization; and (c) the proposed model performed the best among all of the studied models.
The simulation time of the 4-step-ahead wind power forecasting, with regard to Datasets A, B, C, and D, for all methods, is given in Table 8.Although the simulation time of the proposed method had higher time consumption than the other prediction models, it achieved the best prediction accuracy, and this simulation time is acceptable in practical implementation.

Conclusions
A new hybrid prediction method of ultra-short-term wind power forecasting, based on EEMD-PE and LSSVM optimization by GSA, was proposed in this paper.The EEMD method was used to decompose raw wind power data time series into a series of IMFs with different scales to solve the mode mixing problem.To effectively reduce the computational complexity of the combined forecasting method, PE was introduced into the complexity assessment of each IMF component, based on the PE value, then the IMF components were recombined to generate new subsequences with significant differences in complexity.The GSA model was utilized to optimize the parameters of LSSVM, which avoided the choice of parameters; then, the optimized model was used in wind power forecasting, which improved regression prediction accuracy.For a fair, clear, comparative study, the proposed method was tested on a practical wind farm (in Hebei, China) and compared with several other models, including the EEMD-PE-LSSVM, EEMD-LSSVM, LSSVM, SVM, and RBF models.The results of the experiments indicated that the proposed model satisfactorily forecasted ultra-short-term wind power for the different datasets.

Figure 1 .
Figure 1.Total installed wind power from 2005 to 2017 worldwide.

Figure 1 .
Figure 1.Total installed wind power from 2005 to 2017 worldwide.

Figure 2 .
Figure 2. The procedure of the new proposed prediction model.

Figure 2 .
Figure 2. The procedure of the new proposed prediction model.
the speed of search, and i rand is a uniform random variable, with a value in the range of [0, 1].

•
update gravitation constant, the best and the worst particle • calculate M and a for each agent

Figure 3 .
Figure 3.The overall framework of the proposed model.

Figure 3 .
Figure 3.The overall framework of the proposed model.

Figure 5 .
Figure 5. Decomposition by EEMD of wind power series.
for (1-4)-step-ahead wind power forecasting, and the errors are

3 m = and 1 τ
= are the smallest among all parameters; then, error increases slowly with the dimension of embedding.Performance evaluation, using NMAE and NRMSE, shows that 3 m = and 1 τ = is better than other parameters.When 7 m = and 1 τ = , the NRMSE of the testing sample is 3.0773, which is the largest level of error, and the modeling time is 10.9929.Comparative analysis shows that increasing the embedded dimension and computation time impacts on the prediction at different time scales.Considering prediction accuracy and simulation time, 3 m = and 1 τ = are selected to predict wind power.

Figure 5 .
Figure 5. Decomposition by EEMD of wind power series.

4. 3 .
Parameter and Training Dataset Settings 4.3.1.The Parameter Setting of the Forecasting Models The simulation was done on a Windows 7 PC with a 64-bit, 2.20 GHz Intel Core i3 2330M CPU, and 6 GB of memory.The wind power forecasting experiments were employed in MATLAB

2 .
Length of Training DatasetsThe length of training datasets is an important factor affecting prediction accuracy.The 1-step-ahead NRMSE of different forecasting methods, with training Datasets A and B, are presented in Figure8.

Table 3 . 24 Figure 8 .
Figure 8.The NRMSE of 1-step-ahead forecasting by different models with two training datasets.

Figure 8 .
Figure 8.The NRMSE of 1-step-ahead forecasting by different models with two training datasets.

Figure 12 .
Figure 12.Comparison between the forecast and real values for Dataset D. (a) 1-step-ahead result; (b) 4-step-ahead result.

Table 1 .
The NRMSE and NMAE of the testing sample with different values of m and τ .

Table 1 .
The NRMSE and NMAE of the testing sample with different values of m and τ.

Table 2 .
Setting the experimental parameters.

Table 3 .
Optimal kernel function parameters.

Table 2 .
Setting the experimental parameters.

Table 4 .
The values of NRMSE, NMAE and R for six models in Dataset A.

Table 5 .
The values of NRMSE, NMAE and R for six models in Dataset B.

Table 6 .
The values of NRMSE, NMAE and R for six models in Dataset C.

Table 7 .
The values of NRMSE, NMAE and R for six models in Dataset D. Experiment II: Comparison Results of Improvement Percentage Error Indexes and Modeling Time