Forecasting Models for Wind Power Using Extreme-Point Symmetric Mode Decomposition and Artificial Neural Networks

The randomness and volatility of wind power pose a serious threat to the stability, continuity, and adjustability of the power system when it is connected to the grid. Accurate short-term wind power prediction methods have important practical value for achieving high-precision prediction of wind farm power generation and safe and economic dispatch. Therefore, this paper proposes a novel combined model to improve the accuracy of short-term wind power prediction, which involves grey correlation degree analysis, ESMD (extreme-point symmetric mode decomposition), sample entropy (SampEn) theory, and a hybrid prediction model based on three prediction algorithms. The meteorological data at different times and altitudes are first selected as the influencing factors of wind power. Then, the wind power sub-series obtained by the ESMD method are reconstructed into three wind power characteristic components, namely PHC (high frequency component of wind power), PMC (medium frequency component of wind power), and PLC (low frequency component of wind power). Similarly, the wind speed sub-series obtained by the ESMD method are reconstructed into three wind speed characteristic components, called SHC (high frequency component of wind speed), SMC (medium frequency component of wind speed), and SLC (low frequency component of wind speed). Subsequently, the Bat-BP model, Adaboost-ENN model, and ENN (Elman neural network), which have high forecasting accuracy, are selected to predict PHC, PMC, and PLC, respectively. Finally, the prediction results of the three characteristic components are aggregated into the final prediction values of the original wind power series. To evaluate the prediction performance of the proposed combined model, 15-min wind power and meteorological data from a wind farm in China are adopted as case studies. The results show that the combined model performs better in short-term wind power prediction than the other models.


Introduction
With the worsening of energy shortages and environmental pollution, the development and utilization of renewable energy is receiving more and more attention worldwide. Wind power, one of the most promising renewable energy sources, has experienced the most growth in the past several decades from a global perspective [1]. However, wind power generation is characterized by volatility and intermittency, which has a seriously negative influence on electrical energy quality and on the safe and stable operation of a power system [2]. Therefore, it is necessary to predict wind power accurately to reduce the negative influence of the integration of wind turbines into power systems.
The increasing interest in the integration of wind power plants has heightened the need for accurate wind power prediction methods, which mainly include physical methods, statistical methods, intelligent methods, and combined models. Physical methods, based on meteorological and geographical information [3], have higher prediction accuracy when the environment of the wind power plant remains stable. For example, Wu et al. [4] developed a prediction system by combining statistical models and physical models to test the data of the Wattle Point wind farm in Australia and found that the system was effective for predicting the power output of wind farms. Cheng et al. [5] assimilated anemometer wind speed observations from wind farm turbines into a numerical weather forecast system, which can effectively improve the accuracy of wind power and wind speed prediction. Statistical methods establish a mapping relationship between the input of the Numerical Weather Prediction (NWP) system and the wind power, and mainly include the autoregressive (AR) model [6], the auto-regressive moving average (ARMA) approach [7], the auto-regressive integrated moving average (ARIMA) approach [8], and the seasonal autoregressive integrated moving average (SARIMA) model [9]. Liu et al. [10] used the autoregressive moving average-generalized autoregressive conditional heteroscedasticity (ARMA-GARCH) method to simulate the mean and volatility of wind speed at an observation site in Colorado, USA. The results showed that the method can effectively capture the trend change of the mean and volatility of wind speed. Matallah et al. [11] developed a new wind speed forecasting model by combining the Hammerstein model and an autoregressive model, which showed good prediction performance.
In addition, intelligent methods have good capabilities, such as a strong nonlinear fitting ability, simple learning rules, and high robustness, and they have been extensively applied by researchers for wind power forecasting. In [12], a least squares support vector machine (LSSVM) model optimized by PSOSA was used to predict wind power, and actual calculation examples demonstrated that the prediction method has high prediction precision. Wang et al. [13] used the support vector machine (SVM) based on the structure risk minimization principle to predict the short-term wind power. Zhang et al. [14] employed a radial basis function neural network (RBFNN) and a multi-objective optimization method to perform interval forecasting of wind speed and ultimately achieved a higher forecasting precision. The back propagation neural network (BPNN) has a long history in prediction, and it has made outstanding contributions to forecasting, especially in uncovering nonlinearity between the inputs and outputs, even with a lack of sufficient information about the relationship between them [15]. For instance, Wang et al. [16] proposed a wind power range prediction model based on the multiple output property of BPNN. The simulation results of a practical example showed that the proposed wind power range prediction model can effectively forecast the output power interval. Sun et al. [17] developed a novel wind speed prediction model by combining fast ensemble empirical mode decomposition (FEEMD), phase space reconstruction, and improved BPNN, which ultimately obtained a higher forecasting accuracy. BPNN can approximate complex nonlinear functions, but it easily falls into local minima and often exhibits over-fitting. To overcome the shortcomings of BPNN, some optimization algorithms have been applied.
Chao et al. [18] introduced a method called IS-PSO-BP that combines PSO-BP with comprehensive parameter selection to predict wind speed, and the experimental results clearly show that the proposed method achieves a much better forecasting performance than the BPNN and ARIMA models. Wang et al. [19] proposed a forecasting method based on improved empirical mode decomposition (EMD) and the GA-BP neural network for the prediction of wind speed on a wind farm in Inner Mongolia, China. The simulation with MATLAB shows that the proposed method can improve the forecasting accuracy and computational efficiency. A novel hybrid forecasting model called E-SA-BP, which combines ensemble empirical mode decomposition (EEMD), a simulated annealing (SA) algorithm, and BPNN, was developed to perform wind speed forecasting in [20]. Although the optimization algorithms mentioned above can improve the prediction performance of BPNN to a certain extent, they still suffer from a slow convergence speed, a tendency to fall into a local optimum, and a long training process. Therefore, this paper uses the Bat algorithm to update the learning rule and network weights of BPNN. Compared with the optimization algorithms mentioned above, the Bat algorithm has the advantages of fewer parameters and better global optimization ability [21]. Additionally, the mathematical model involved in the Bat algorithm is relatively simple and computationally efficient.
As a recurrent neural network, ENN has been proven to be helpful for the prediction of discrete-time series, owing to its advantages in modeling nonlinear dynamic systems and learning time-varying patterns [22], and it has been widely applied in the prediction field. Zhang et al. [23] introduced a novel EMD-ENN approach to forecast wind speed, and the simulation results show that the proposed approach consistently has the minimum statistical error. Liu et al. [24] presented a novel hybrid model combining FEEMD, wavelet packet decomposition, and ENN, which had a desirable performance in multi-step ahead wind speed forecasting. Yu et al. [25] designed a new hybrid model combining improved wavelet transform (IWT) and ENN, which exhibited satisfactory performance in wind speed forecasting. Although ENN has made many contributions in the field of wind power and wind speed prediction, wind power remains intermittent and volatile. Therefore, based on ENN, the Adaboost algorithm is used to further enhance the prediction ability of ENN for nonlinear, chaotic, and volatile data, which has quickly become a new research hotspot [26]. Liu and Tian et al. [27] applied multilayer perceptron (MLP) neural networks optimized by the Adaboost algorithm to predict wind speed, and the prediction results show that the Adaboost algorithm improved the forecasting performance of MLP neural networks considerably and that the Adaboost-MLP model is effective for wind speed prediction. Shao et al. [28] proposed a novel solution using the AdaBoost neural network in combination with wavelet decomposition to address low accuracy and enhance model robustness, and the experimental evaluation demonstrates that the proposed strategy can significantly enhance model robustness and effectively improve the prediction accuracy.
Xiao et al. [29] developed a reliable combination model for wind speed forecasting based on an improved Adaboost algorithm named the time-vary-forecasting-effectiveness (TW-FE-Adaboost) algorithm to improve the overall forecasting accuracy. The increasing application of the Adaboost algorithm can be attributed to two aspects: one is that it can improve the prediction performance by combining multiple predictor models; the other is that the Adaboost algorithm has the advantages of simple calculation and small error. Therefore, ENN optimized by the Adaboost algorithm is applied to predict short-term wind power, which can maximize the merits of the Adaboost algorithm and enhance the prediction ability of ENN for nonlinear, chaotic, and volatile data.
Over the past few decades, numerous wind power forecasting approaches have been presented, which have enhanced the prediction accuracy of wind power series. However, owing to the relatively noisy and unstable characteristics of wind power data, wind power prediction that directly uses the original data would lead to substantial forecasting errors and poor performance [30]. Hence, signal decomposition techniques have been considered and applied for wind power forecasting to improve prediction performance, especially EMD [31] and EEMD [32]. Although these two techniques have improved the forecasting performance to a certain extent, they still have some disadvantages, such as the mode mixing problem in EMD and the residual noise in EEMD [33]. To overcome these defects of EMD and EEMD, a novel technique called ESMD, proposed by Wang et al. [34], is employed for reducing the noise and uncertainty of wind speed and wind power series. Moreover, compared with some classical time-frequency transform methods [35][36][37], the ESMD method proposes a "direct difference (DI) method" for data, breaking with the traditional concept of using integral transforms. Additionally, the ESMD method has been applied in many fields at present, such as climate change issues [38] and seismology [39].
A review of the previous literature indicates that the prediction methods discussed above have some inherent disadvantages. The shortcomings of these methods are summarized as follows:
(1) Physical methods are extremely weak in coping with short-term horizons; therefore, they do not give accurate and effective results in short-term forecasting [33]. Moreover, physical methods are characterized by high cost and complexity.
(2) Relative to physical methods, statistical methods based on a single data series are relatively simple; however, their ability to handle abrupt information is poor. Additionally, wind speed and wind power prediction are affected by meteorological data, but statistical methods are usually applied to a single data series.
(3) Unlike other methods, intelligent methods have been widely researched and applied to address complicated relationships and effectively perform forecasting, and they can successfully capture hidden nonlinear relationships in the given historical data [40]. However, intelligent methods still have many disadvantages, for example, easily falling into a local optimum, over-fitting, and a relatively low convergence rate [41].
(4) Individual prediction models do not take into account the necessity and importance of data preprocessing techniques, so they cannot achieve high prediction accuracy or meet the requirements of time series prediction.
Therefore, since each model has its inevitable shortcomings, the above prediction methods cannot always capture the wind speed or wind power trend and cannot be applied in all cases. Consequently, combined models have often been taken into consideration, as they are deemed an excellent way to utilize the advantages of individual approaches and obtain a higher forecasting accuracy [42].
Based on the analysis above, this study introduces a novel combined model to improve the accuracy of short-term wind power prediction. It combines meteorological data (humidity, pressure, temperature, wind direction, and wind speed), the signal decomposition technique, SampEn theory, and several forecasting algorithms, namely the Bat-BP model, Adaboost-ENN model, and ENN, and it exploits the advantages of each prediction model. More specifically, considering that wind power is influenced by meteorological data at different times and altitudes (10 m, 30 m, 50 m, 70 m, and 100 m), we use grey correlation degree analysis to select meteorological data at different altitudes in the same location as part of the input variables of the prediction model. Then, to deal with the randomness and instability of the wind speed and wind power time series, the ESMD method is employed to decompose each original series into several sub-series, called Intrinsic Mode Functions (IMFs), and a residue R. Subsequently, SampEn theory is employed to calculate the complexity of each wind power sub-series obtained by the ESMD method, and the sub-series are reconstructed into three characteristic components, namely PHC, PMC, and PLC. Similarly, the wind speed sub-series are reconstructed into SHC, SMC, and SLC. Then, the training samples of PHC, PMC, and PLC are determined by PACF theory. Next, Bat-BP, Adaboost-ENN, and ENN, which have high prediction accuracy, are selected as forecasting models to predict PHC, PMC, and PLC, respectively. Finally, the prediction results of each characteristic component are aggregated into the final prediction values of the original wind power series.
On the whole, the novelty of this study can be described as follows:
(1) Based on the historical wind power data, the meteorological data at different times and altitudes are selected as the input variables of the wind power prediction to improve the forecasting accuracy. Grey correlation degree analysis is used to determine the influencing factors of wind power, which allows the influence of the external environment on wind power to be considered more fully.
(2) The ESMD method, based on the direct difference (DI) method, is adopted to handle the complexity and volatility of wind speed and wind power series for the first time, which can smooth the data and extract the main characteristics of the data. The original wind power and wind speed series are decomposed and reconstructed into three characteristic components to decrease the instability of the wind power and wind speed series.
(3) The proposed combined model based on three prediction algorithms improves the accuracy of wind power prediction to a certain extent. The proposed combined model utilizes the advantages of each individual model and overcomes the limitation of the low accuracy and instability of a single model.
The remainder of this paper is structured as follows. Section 2 describes the specific prediction models, including the Bat-BP model, Adaboost-ENN model, and ENN. Section 3 presents the framework of the combined model. Section 4 discusses the prediction results and prediction performance of the proposed combined model. Finally, Section 5 concludes the study.

Methodology
This section aims to provide a brief introduction to the methods used in this study, including the ESMD method, Bat-BP model, and Adaboost-ENN model.

Extreme-Point Symmetric Mode Decomposition
ESMD, a data processing method proposed by Wang and Li, is a new development of the Hilbert-Huang transform (HHT) method [34]. The ESMD method is similar to the EMD method in that it smooths complex signals and yields several IMFs and a residue R. The basic idea of ESMD draws on the EMD method, but it replaces the cubic spline interpolation of the upper and lower envelopes in EMD with internal extreme-point symmetric interpolation. At the same time, the ESMD method uses the least squares method to optimize R as the "adaptive global average" of the entire data to determine the optimal number of screenings. The specific algorithm of ESMD is as follows:
Step 1: Find all the extreme points (local maxima and minima) of the original data, $Y_t$, and record them as $E_i$ ($i = 1, 2, \ldots, n$).
Step 2: Connect the adjacent extreme points, $E_i$, with line segments and mark the midpoint of each line segment as $F_i$ ($i = 1, 2, \ldots, n-1$).
Step 3: Use the linear interpolation method to add the midpoints of the left and right borders, $F_0$ and $F_n$, respectively.
Step 4: Construct $p$ interpolation curves, $L_1, L_2, \ldots, L_p$ ($p \geq 1$), using the $n + 1$ midpoints, and obtain the average curve $L^* = (L_1 + L_2 + \cdots + L_p)/p$.
Step 5: Repeat steps 1 to 4 on $Y_t - L^*$ until $|L^*| \leq \varepsilon$ is satisfied or the number of screenings reaches the preset maximum value, $K$; the first mode, $IMF_1$, is then obtained. In general, the permitted error, $\varepsilon$, is set to $0.001\sigma_0$, where $\sigma_0$ is the standard deviation of the original data, $Y_t$.
Step 6: Repeat steps 1 to 5 for $Y_t - IMF_1$ to obtain $IMF_2, \ldots, IMF_n$ and a residual $R$ (the number of extreme points of $R$ is set to at least 4).
Step 7: Let $K$ vary between $K_{\min}$ and $K_{\max}$ and repeat steps 1 to 6. Then, calculate the relative standard deviation, $\sigma$, of $Y_t - R$ for each value of $K$.
Step 8: Find the $K_0$ corresponding to the minimum variance ratio, $v = \sigma/\sigma_0$ (for which $R$ is the best fit curve of the data), and repeat steps 1 to 6 with $K_0$ to output the decomposition results.
It is worth noting that in step 4, ESMD takes three forms according to the number of interpolation curves, $p$, which are called ESMD_I, ESMD_II, and ESMD_III. It has been shown in practice that ESMD_II has better signal decomposition characteristics [43]. Therefore, this paper chooses ESMD_II as the signal decomposition method. In step 8, the minimum variance ratio, $v$, reflects how well the trend term follows the original signal. The smaller the value, the better the decomposition result.
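The sifting loop in steps 1 to 5 can be illustrated with a minimal sketch. It is a simplification made under stated assumptions, not the authors' implementation: it uses a single interpolation curve (p = 1, as in ESMD_I, rather than the ESMD_II variant chosen in this paper), linear interpolation through the midpoints instead of splines, and a flat extension in place of the boundary midpoints F_0 and F_n. The function names `local_extrema`, `midpoint_curve`, and `sift_one_imf` are hypothetical.

```python
import numpy as np

def local_extrema(y):
    """Indices of interior local maxima and minima (step 1)."""
    d = np.diff(y)
    # A sign change in the first difference marks an extreme point.
    return np.where(d[:-1] * d[1:] < 0)[0] + 1

def midpoint_curve(y):
    """Average curve L* through the midpoints of adjacent extrema (steps 2-4, p = 1)."""
    idx = local_extrema(y)
    if idx.size < 2:
        return np.zeros(y.size)
    t_mid = (idx[:-1] + idx[1:]) / 2.0
    f_mid = (y[idx[:-1]] + y[idx[1:]]) / 2.0
    # Assumption: np.interp holds the first/last midpoint value flat outside
    # their range, a crude stand-in for the boundary midpoints F_0 and F_n.
    return np.interp(np.arange(y.size), t_mid, f_mid)

def sift_one_imf(y, sigma0=None, max_sift=50, tol_ratio=0.001):
    """Extract one mode by repeated screening (step 5)."""
    y = np.asarray(y, dtype=float)
    sigma0 = np.std(y) if sigma0 is None else sigma0
    h = y.copy()
    for _ in range(max_sift):                 # at most K screenings
        mean_curve = midpoint_curve(h)
        if np.max(np.abs(mean_curve)) <= tol_ratio * sigma0:
            break                             # |L*| <= 0.001 * sigma_0
        h = h - mean_curve
    return h

# Synthetic example: a fast oscillation riding on a linear trend.
t = np.linspace(0.0, 1.0, 500)
signal = np.sin(2 * np.pi * 40 * t) + 2.0 * t
imf1 = sift_one_imf(signal)
trend = signal - imf1
```

Applying `sift_one_imf` again to `signal - imf1` would give the next mode, which mirrors step 6.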

Bat-BP Neural Networks
As a multi-layer feed-forward network, BPNN is trained by the error back propagation algorithm, and its network topology mainly consists of an input layer, hidden layer, and output layer [44,45]. In the BPNN algorithm, the weights and thresholds of the network are usually adjusted along the negative gradient direction of the network error, until the network error reaches a minimum value. Although BPNN is widely used, it has the disadvantages of a poor global search ability, a slow convergence rate, and a tendency to fall into local minima. To overcome these shortcomings, the bat optimization algorithm is used to optimize the parameters of BPNN and further improve its prediction performance.
The bat algorithm is a new type of bionic algorithm. Owing to its simplicity and robustness, it is well suited to finding good solutions to complex problems, and it is widely used in various fields, such as optimization and classification. The main principle of the bat algorithm is to simulate the process of bats searching for prey. The update formulas of frequency, velocity, and position are shown in Equations (1)-(3):
$$f_i = f_{\min} + (f_{\max} - f_{\min})\,\alpha \tag{1}$$
$$V_i^{t+1} = V_i^{t} + (X_i^{t} - X^{*})\,f_i \tag{2}$$
$$X_i^{t+1} = X_i^{t} + V_i^{t+1} \tag{3}$$
where $\alpha \in [0, 1]$ is a uniformly distributed random number; $f_i$ represents the update frequency of bat $i$ in the range $[f_{\min}, f_{\max}]$; $X^{*}$ indicates the current global best location; $V_i^{t+1}$ and $V_i^{t}$ are the velocities of bat $i$ at time steps $t + 1$ and $t$; and $X_i^{t+1}$ and $X_i^{t}$ represent the locations of bat $i$ at time steps $t + 1$ and $t$, respectively. A local search is used to improve the performance of the bat algorithm. Once a solution is selected among the current best solutions, each bat generates a new local solution by a random walk:
$$X_{new} = X_{old} + \varepsilon A^{t} \tag{4}$$
where $\varepsilon \in [0, 1]$ is a random number and $A^{t}$ is the average loudness of all bats at the same time step. At the start of the search, the loudness, $A_i$, is high and the rate of pulse emission, $r_i$, is low, which helps to expand the search space. After the prey is discovered, the loudness is gradually reduced and the rate of pulse emission is increased, which helps to locate the prey accurately. This search feature is expressed by Equations (5)-(6):
$$A_i^{t+1} = \alpha A_i^{t} \tag{5}$$
$$r_i^{t+1} = r_i^{0}\left[1 - \exp(-\gamma t)\right] \tag{6}$$
where $A_i^{t+1}$ and $A_i^{t}$ are the loudness of bat $i$ at time steps $t + 1$ and $t$; $\alpha \in [0, 1]$ is here the attenuation coefficient of loudness (distinct from the random number in Equation (1)); $r_i^{t+1}$ indicates the rate of pulse emission of bat $i$ at time step $t + 1$; $r_i^{0}$ represents the maximum rate of pulse emission of bat $i$; and $\gamma > 0$ is the increasing coefficient of the rate of pulse emission.
The specific algorithm of Bat-BP is briefly described as follows:
Step 1: Construct the BPNN model and determine its structure through repeated experiments.
Step 2: Set the number of bats, n, and generate the initial bat population. Initialize the bat position, x_i, velocity, v_i, loudness, A_i, and pulse rate, r_i.
Step 3: Evaluate the fitness function, f(X), at the location of each individual bat.
Step 4: Determine whether the algorithm has reached the maximum number of iterations. If yes, the algorithm ends and the best solution is output; otherwise, go to step 5.
Step 5: Update the velocity and location of each bat by Equations (1)-(3).
Step 6: Generate a random number, rand1. If rand1 > r_i, a new solution is obtained by Equation (4).
Step 7: Evaluate the quality of the solution.
Step 8: Generate a random number, rand2. If rand2 < A_i and f(X_i^t) < f(X*), the bat individual is updated, A_i is reduced by Equation (5), and r_i is increased by Equation (6).
Step 9: If the termination condition is met, output the target value and terminate the program. Otherwise, return to step 3.
Step 10: Use the weights and thresholds obtained by the bat algorithm to train the established BPNN model.
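To make the search loop in steps 2 to 9 concrete, the following is a minimal sketch of a bat algorithm minimizing a generic fitness function. It is an illustration under stated assumptions, not the authors' code: the fitness here is a simple sphere function, whereas in Bat-BP it would presumably be the BPNN training error as a function of the flattened weights and thresholds; the acceptance test compares a candidate with the bat's own previous fitness, a common implementation choice; the random-walk step is drawn from [-1, 1]; and all names (`bat_minimize`, `f_min`, `f_max`, `decay`, `gamma`) and default values are hypothetical.

```python
import numpy as np

def bat_minimize(fitness, dim, n_bats=20, n_iter=200, f_min=0.0, f_max=2.0,
                 decay=0.9, gamma=0.9, lower=-5.0, upper=5.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lower, upper, (n_bats, dim))    # positions
    v = np.zeros((n_bats, dim))                     # velocities
    loud = np.ones(n_bats)                          # loudness A_i
    r0 = rng.uniform(0.0, 1.0, n_bats)              # maximum pulse rates r_i^0
    rate = np.zeros(n_bats)                         # pulse rates r_i
    fit = np.array([fitness(xi) for xi in x])
    best = x[np.argmin(fit)].copy()
    best_fit = float(fit.min())

    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()   # frequency update
            v[i] = v[i] + (x[i] - best) * freq              # velocity update
            cand = x[i] + v[i]                              # position update
            if rng.random() > rate[i]:
                # Local random walk around the current best solution,
                # scaled by the average loudness of all bats.
                cand = best + rng.uniform(-1.0, 1.0, dim) * loud.mean()
            cand = np.clip(cand, lower, upper)
            cand_fit = fitness(cand)
            if rng.random() < loud[i] and cand_fit < fit[i]:
                x[i], fit[i] = cand, cand_fit
                loud[i] *= decay                                  # loudness decreases
                rate[i] = r0[i] * (1.0 - np.exp(-gamma * t))      # pulse rate increases
            if cand_fit < best_fit:
                best, best_fit = cand.copy(), float(cand_fit)
    return best, best_fit

def sphere(z):
    """Toy fitness with its minimum of 0 at the origin."""
    return float(np.sum(z ** 2))

best_x, best_val = bat_minimize(sphere, dim=3)
```

In a Bat-BP setting, `best_x` would be reshaped into the initial weights and thresholds of the network before gradient-based training (step 10).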

Adaboost-ENN Model
ENN is a recurrent neural network with local memory units and feedback connections. Compared with feed-forward networks, ENN has not only an input layer, a hidden layer, and an output layer, but also a special context layer. The neurons in the input layer only transmit signals, and the output layer, which applies the purelin function, performs linear weighting. The hidden layer, a single-layer network structure, adopts the tansig function, which can reduce the running time by improving the convergence speed of the ENN model. The context layer acts as a one-step delay operator that provides memory, so that the system can adapt to time-varying characteristics. Moreover, it can directly reflect the characteristics of the dynamic process system. Although ENN improves on feed-forward networks, it still has deficiencies due to its inherent characteristics. For example, ENN uses the gradient descent method to modify the weights and thresholds between neurons in each layer, which reduces the convergence accuracy of the network [46]. To address these inherent defects, the Adaboost algorithm is applied to optimize ENN and improve the prediction performance of the model.
The Adaboost algorithm, a typical example of the boosting algorithm, was developed by Schapire for regression with time series data [47]. The Adaboost algorithm generates different weak learners by repeatedly training on the same data set and then combines these weak learners into a strong learner to improve the prediction accuracy and generalization ability. The core idea of the Adaboost algorithm is to emphasize the samples with large prediction errors and the weak learners with good performance, that is, to increase the weights of samples with a poor training effect and of weak learners with a strong learning ability. Conversely, it reduces the weights of samples with a good training effect and of weak learners with a weak learning ability.
According to Wang et al. [48], the detailed prediction process of the Adaboost-ENN model is as follows:
Step 1: Preprocess the original data by data quantification and normalization.
Step 2: Assume the training set $X = \{(x_1, y_1), \ldots, (x_m, y_m)\}$, and initialize the distribution weight of each sample on the training set: $D_1(i) = 1/m$. The neural network structure is determined by the input and output dimensions, and the weights and thresholds of ENN are initialized.
Step 3: Find the weak predictors, $h_j$ ($j = 1, 2, \ldots, T$). When the $j$th weak predictor is trained, the ENN is trained with the training set and the prediction results are output. Then, the sum of the prediction errors, $\varepsilon_j$, of the prediction series, $h(j)$, is obtained from the prediction results, $h_j(x_i)$, and the expected values, $y_i$.
Step 4: Update the weights. The weight of the series is calculated according to $\varepsilon_j$. Then, the weights of the next training samples are adjusted, with $Z_j$ as the normalization factor.
Step 5: Obtain $T$ weak prediction functions, $h_j(x)$ ($j = 1, 2, \ldots, T$), through $T$ rounds of training, and combine them into a strong prediction function.
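The boosting loop in steps 2 to 5 can be sketched as follows. Because the exact error, weight, and combination formulas of the paper are not reproduced here, this sketch substitutes the widely used AdaBoost.R2 update with a linear loss and a weighted-average combination; that substitution is an assumption and may differ from the authors' formulas. A weighted regression stump (`fit_stump`, a hypothetical helper) stands in for the ENN weak predictor so that the example stays self-contained.

```python
import numpy as np

def fit_stump(x, y, w):
    """Weighted regression stump: a stand-in for the ENN weak predictor."""
    best = None
    for thr in np.unique(x)[:-1]:            # keep both sides non-empty
        left = x <= thr
        c_left = np.average(y[left], weights=w[left])
        c_right = np.average(y[~left], weights=w[~left])
        err = np.sum(w * (y - np.where(left, c_left, c_right)) ** 2)
        if best is None or err < best[0]:
            best = (err, thr, c_left, c_right)
    _, thr, c_left, c_right = best
    return lambda q: np.where(np.asarray(q) <= thr, c_left, c_right)

def adaboost_regression(x, y, n_rounds=10):
    """AdaBoost.R2-style boosting (assumed variant, linear loss)."""
    m = x.size
    d = np.full(m, 1.0 / m)                  # D_1(i) = 1/m
    learners, alphas = [], []
    for _ in range(n_rounds):
        h = fit_stump(x, y, d)
        err = np.abs(h(x) - y)
        if err.max() == 0.0:                 # perfect weak predictor
            learners.append(h)
            alphas.append(1.0)
            break
        loss = err / err.max()               # per-sample loss in [0, 1]
        eps = float(np.sum(d * loss))        # weighted error of round j
        if eps >= 0.5:                       # weak predictor too poor to use
            if not learners:
                learners.append(h)
                alphas.append(1.0)
            break
        beta = eps / (1.0 - eps)
        learners.append(h)
        alphas.append(np.log(1.0 / beta))    # weight of the weak predictor
        d = d * beta ** (1.0 - loss)         # well-fitted samples lose weight
        d = d / d.sum()                      # normalization (role of Z_j)
    alphas = np.array(alphas)

    def predict(q):
        preds = np.array([h(q) for h in learners])
        return alphas @ preds / alphas.sum() # weighted combination
    return predict

x_train = np.linspace(0.0, 1.0, 60)
y_train = np.sin(2 * np.pi * x_train)
strong = adaboost_regression(x_train, y_train, n_rounds=20)
y_fit = strong(x_train)
```

Replacing `fit_stump` with a routine that trains an Elman network on the weighted samples would give the Adaboost-ENN structure described in the text.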

The Framework of the Combined Model
In this study, a novel combined model is developed to improve the effectiveness of wind power prediction. The main operational steps of the proposed combined model are as follows, and the corresponding flow chart is shown in Figure 1.
(1) Use the grey correlation degree analysis to screen meteorological data at different times and altitudes to determine the influencing factors of the original wind power series.
(2) Adopt the ESMD method to decompose the wind speed and wind power series into several sub-series, called wind power sub-series and wind speed sub-series, respectively.
(3) Utilize sample entropy theory to calculate the complexity of each wind power sub-series obtained by the ESMD method, and reconstruct them into three wind power characteristic components in decreasing order of complexity, namely PHC, PMC, and PLC. Meanwhile, the complexity of each wind speed sub-series is calculated by sample entropy theory, and the sub-series are likewise reconstructed into SHC, SMC, and SLC.
(4) Apply PACF theory to determine the input-output samples of PHC, PMC, and PLC to improve the accuracy of wind power prediction.
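Step (3) relies on sample entropy to rank the sub-series by complexity. The following is a minimal sketch of the standard SampEn(m, r) computation; the embedding dimension m = 2 and tolerance r = 0.2 times the standard deviation are common defaults assumed here, not values stated in the paper, and the function name `sample_entropy` is hypothetical.

```python
import numpy as np

def sample_entropy(x, m=2, r_ratio=0.2):
    """SampEn(m, r) with Chebyshev distance and r = r_ratio * std(x)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    r = r_ratio * np.std(x)

    def count_matches(length):
        # Use the same n - m template vectors for both lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Distance from template i to every later template (no self-matches).
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b = count_matches(m)        # matches of length m
    a = count_matches(m + 1)    # matches of length m + 1
    if a == 0 or b == 0:
        return np.inf
    return -np.log(a / b)

rng = np.random.default_rng(0)
noise = rng.standard_normal(300)                 # irregular series
sine = np.sin(np.linspace(0.0, 20 * np.pi, 300)) # regular series
se_noise = sample_entropy(noise)
se_sine = sample_entropy(sine)
```

A more irregular sub-series gives a larger value, so sub-series with similar sample entropy could be grouped into the high, medium, and low frequency components.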

Datasets
To validate the prediction performance of the proposed combined prediction model, 15-min wind power and meteorological data were collected from a wind farm in China. The data were generated from 09:00 24 April 2016 to 09:00 9 May 2016, a total of 15 days. The meteorological data were measured at heights of 10 m, 30 m, 50 m, 70 m, and 100 m from the ground. The rated installed capacity of the wind farm is 49.5 MW. Each dataset was divided into a training set and a testing set: the data of the first 14 days were used as the training set to train the prediction model, and the data of the last day were used as the testing set to estimate the prediction performance of the model [49]. The statistical description of the wind power and meteorological data is shown in Table A1, and the description of the training set and testing set is presented in Table A2. Although the wind power and meteorological data for the 15 days are continuous without missing data points, there are still a small number of outliers. Therefore, we used fractal interpolation to replace the outliers that were rejected [50].
Sustainability 2019, 11, 650
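As a quick check of the sample counts implied by this split, the sketch below assumes exactly 96 samples per day (24 h at 15-min resolution) and uses a placeholder array in place of the real measurements, which are not reproduced here. The names `series`, `train`, and `test` are hypothetical.

```python
import numpy as np

SAMPLES_PER_DAY = 24 * 60 // 15      # 96 points per day at 15-min resolution
N_DAYS = 15
TRAIN_DAYS = 14

# Placeholder standing in for the wind power series (not real data).
series = np.arange(SAMPLES_PER_DAY * N_DAYS, dtype=float)

split = SAMPLES_PER_DAY * TRAIN_DAYS
train, test = series[:split], series[split:]   # chronological split, no shuffling
```

Under this assumption the training set holds 1344 points and the testing set holds 96 points.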

Evaluation Criteria
In this paper, three error metrics were used to evaluate the prediction performance of the proposed model, namely MAPE (mean absolute percent error), NMAE (normalized mean absolute error), and NRMSE (normalized root mean square error). The specific formula of each error metric is as follows:
$$MAPE = \frac{1}{N}\sum_{t=1}^{N}\left|\frac{y_t - \hat{y}_t}{y_t}\right| \times 100\%$$
$$NMAE = \frac{1}{N}\sum_{t=1}^{N}\frac{\left|y_t - \hat{y}_t\right|}{P_{inst}} \times 100\%$$
$$NRMSE = \frac{1}{P_{inst}}\sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(y_t - \hat{y}_t\right)^2} \times 100\%$$
where $y_t$ is the real wind power, $\hat{y}_t$ represents the predicted wind power, $P_{inst}$ is the rated installed capacity of the wind farm, and $N$ indicates the number of test samples for the prediction model. Moreover, we used three percentage error indices to compare the prediction performance of the built model, called $P_{MAPE}$ (promoting percentages of mean absolute percentage error), $P_{NMAE}$ (promoting percentages of normalized mean absolute error), and $P_{NRMSE}$ (promoting percentages of normalized root mean square error). Their detailed formulas are as follows.
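The three error metrics can be computed as in the sketch below, which assumes their usual definitions: MAPE normalized by the real power, and NMAE and NRMSE normalized by the installed capacity. The helper `promoting_percentage` is a hypothetical reading of the percentage improvement indices as the relative reduction of a metric from a comparison model to the proposed model; the paper's exact formula may differ. The numbers in the example are made up for illustration.

```python
import numpy as np

def mape(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)   # requires y != 0

def nmae(y, y_hat, p_inst):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)) / p_inst * 100.0)

def nrmse(y, y_hat, p_inst):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)) / p_inst * 100.0)

def promoting_percentage(metric_baseline, metric_proposed):
    """Assumed form: relative reduction of an error metric, in percent."""
    return (metric_baseline - metric_proposed) / metric_baseline * 100.0

# Illustrative values only (MW), with a hypothetical 50 MW capacity.
y_true = [10.0, 20.0]
y_pred = [9.0, 22.0]
m1 = mape(y_true, y_pred)
m2 = nmae(y_true, y_pred, p_inst=50.0)
m3 = nrmse(y_true, y_pred, p_inst=50.0)
gain = promoting_percentage(8.0, 6.0)
```

Note that MAPE is undefined when the real power is zero, so zero-output samples would need separate handling in practice.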

Grey Correlation Degree Analysis
To determine the influencing factors of wind power, we used grey correlation degree analysis to measure the relatedness between the wind power series and meteorological data at different times and altitudes. As a multi-factor statistical analysis method, grey correlation degree analysis describes the relationship between variables. The grey relational degree ranges from 0 to 1, and a value greater than 0.5 is generally taken to indicate a strong correlation. As illustrated in Table 1, the grey relational degree between the meteorological data and wind power series exceeds 0.6 at different times and altitudes, which indicates that the correlation between the meteorological data and wind power is significant. Among the five types of meteorological data, the correlation between wind speed and wind power is the strongest, especially for the wind speed at 50 m. Additionally, humidity (30 m) and temperature (100 m) are also strongly correlated with wind power. In addition, the pressure series at different times and altitudes are parallel and the wind direction series at different times and altitudes are the same, so the correlation of these two types of meteorological data with wind power is 0.60583 and 0.63686, respectively, at all times and altitudes. In view of the above analysis, we selected humidity (30 m), pressure (50 m), temperature (100 m), wind direction (50 m), and wind speed (50 m) as the influencing factors of wind power.
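A minimal sketch of the grey relational degree calculation follows. It assumes Deng's classical formulation with mean-value normalization and a distinguishing coefficient of 0.5; the paper does not state which normalization or coefficient was used, and the function name `grey_relational_degree` and the toy numbers are hypothetical.

```python
import numpy as np

def grey_relational_degree(reference, factors, rho=0.5):
    """Grey relational degree of each factor series with the reference series."""
    ref = np.asarray(reference, dtype=float)
    fac = np.asarray(factors, dtype=float)            # shape (n_factors, n_samples)
    # Mean-value normalization removes the units of each series (assumed choice).
    ref_n = ref / ref.mean()
    fac_n = fac / fac.mean(axis=1, keepdims=True)
    delta = np.abs(fac_n - ref_n)                     # absolute differences
    d_min, d_max = delta.min(), delta.max()           # global extremes; needs d_max > 0
    xi = (d_min + rho * d_max) / (delta + rho * d_max)  # relational coefficients
    return xi.mean(axis=1)                            # average over samples

# Toy series: the first factor is proportional to the reference, the second is not.
power = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = [[2.0, 4.0, 6.0, 8.0, 10.0],
              [5.0, 1.0, 4.0, 2.0, 3.0]]
degrees = grey_relational_degree(power, candidates)
```

A factor that is exactly proportional to the reference attains the maximum degree of 1, and less closely related factors score lower.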

ESMD Decomposition
The grey correlation degree analysis shows that wind power is mainly affected by wind speed, which is unstable and intermittent. Therefore, the ESMD method was employed to decompose the original wind power and wind speed (50 m) series. Before decomposing the original time series, we determined the number of screening times corresponding to the minimum variance ratio, v, by repeatedly adjusting the number of residual component extreme points and the number of iterations. After repeated tests and comparisons, the number of residual component extreme points of wind power was set to 20 and the number of iterations to 56. Similarly, the number of residual component extreme points of wind speed was set to 20 and the number of iterations to 73. The variance ratio of wind speed and wind power for each number of iterations is presented in Figure 2. It can be seen from Figure 2 that the variance ratio of wind speed is smallest at the 72nd iteration, so the best number of screening times for wind speed is 72. Similarly, the best number of screening times for wind power is 13, which corresponds to its minimum variance ratio. With the best number of screening times determined, the wind speed and wind power series were each decomposed into six IMFs and a residue. The decomposition results of the wind power and wind speed series are shown in Figure 3.
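The selection of the best number of screening times can be illustrated with a short Python sketch. It does not implement ESMD itself; it assumes that the residual curve R has already been obtained for each candidate number of screening times, and that the variance ratio is defined as in the original ESMD formulation, i.e., the root mean square of Y − R divided by the standard deviation of Y. The function names and the dictionary layout are hypothetical.

```python
import numpy as np

def variance_ratio(y, residual):
    """Variance ratio v = sigma / sigma0 (assumed ESMD definition).

    sigma:  root mean square of the series minus its residual curve R.
    sigma0: standard deviation of the original series about its mean.
    """
    y = np.asarray(y, float)
    r = np.asarray(residual, float)
    sigma = np.sqrt(np.mean((y - r) ** 2))
    sigma0 = np.std(y)
    return float(sigma / sigma0)

def best_screening_times(y, residuals_by_k):
    """Pick the number of screening times whose residual gives the smallest v.

    residuals_by_k maps each candidate number of screening times K to the
    residual curve produced by an ESMD run with that K (computed elsewhere).
    """
    ratios = {k: variance_ratio(y, r) for k, r in residuals_by_k.items()}
    best = min(ratios, key=ratios.get)
    return best, ratios
```

A residual equal to the series mean gives v = 1, so any useful adaptive trend should yield a ratio below 1; the candidate with the smallest ratio is kept.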

Sample Entropy Theory
Sample entropy, proposed by Richman in 2000, is a measure of the complexity of a time series and has been widely used in many fields [51]. Specifically, sample entropy is robust to noise and interference, and is suitable for mixed signals composed of random components [52]. Because the wind power series is unstable and volatile, we used the sample entropy method to calculate the sample entropy value of each wind power sub-series decomposed by the ESMD method, and grouped the sub-series into PHC, PMC, and PLC. According to [53], the specific grouping rules are as follows:
Step 1: Calculate the sample entropy values of the original wind power series and each wind power sub-series, denoted as SampEn_data and SampEn_sub-series.
Step 2: Record SampEn_data as the initial threshold. If SampEn_data > SampEn_sub-series, the sub-series is grouped into one characteristic component; otherwise, it is grouped into another characteristic component.
Step 3: Set a threshold, u, which defines an upper bound line (SampEn_sub-series − SampEn_data = u) and a lower bound line (SampEn_data − SampEn_sub-series = u).
Step 4: According to the results of Step 2 and Step 3, the wind power sub-series with similar sample entropy values are grouped into PHC, PMC, and PLC.
This paper set the u value as 0.3, and the detailed grouping process is shown in Figure 4. The IMFs that make up each characteristic component are given in Table 2. After the sample entropy value of each wind power sub-series was calculated, the sub-series in each characteristic component were aggregated to reconstruct that component, as shown in Figure 5. The original wind power series was thus reconstructed into three new wind power components, which helps to improve the prediction performance. Similarly, we used the sample entropy method to calculate the complexity of each wind speed sub-series obtained by the ESMD method, and reconstructed them into SHC, SMC, and SLC, as shown in Table 2 and Figure 5.
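The grouping steps can be sketched in Python as follows. The sample entropy function uses an embedding dimension of 2 and a tolerance of 0.2 times the standard deviation, which are common defaults and are assumptions here. The grouping function reflects one reading of Steps 2 to 4: sub-series above the upper bound line form the high frequency component, those below the lower bound line form the low frequency component, and the rest form the medium frequency component. All names are hypothetical.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy SampEn(m, r) with r = r_factor * std(x) (assumed defaults)."""
    x = np.asarray(x, float)
    n = len(x)
    r = r_factor * np.std(x)

    def count_pairs(length):
        # Use the first n - m templates for both lengths so counts are comparable.
        t = np.array([x[i:i + length] for i in range(n - m)])
        # Chebyshev distance between every pair of templates.
        dist = np.max(np.abs(t[:, None, :] - t[None, :, :]), axis=2)
        # Remove self-matches on the diagonal and count each pair once.
        return (np.sum(dist <= r) - len(t)) / 2

    b = count_pairs(m)        # matching pairs of length m
    a = count_pairs(m + 1)    # matching pairs of length m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined for too few matches
    return float(-np.log(a / b))

def group_by_entropy(sub_entropies, data_entropy, u=0.3):
    """Split sub-series indices into high/medium/low groups around data_entropy."""
    groups = {"high": [], "medium": [], "low": []}
    for idx, s in enumerate(sub_entropies):
        if s - data_entropy > u:
            groups["high"].append(idx)      # above the upper bound line
        elif data_entropy - s > u:
            groups["low"].append(idx)       # below the lower bound line
        else:
            groups["medium"].append(idx)    # between the two bound lines
    return groups
```

A regular signal such as a sine wave yields a lower sample entropy than white noise, which is why the high frequency sub-series tend to fall in the high-entropy group.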

PACF Theory
The input-output samples of each wind power component were determined by PACF (partial autocorrelation function) theory, which is a useful tool for analyzing the correlation between candidate variables and historical datasets [54]. After normalizing the training samples, the PACF values of each wind power characteristic component were calculated, as shown in Figure 6. It can be seen from Figure 6 that the input dimensions of the three wind power characteristic components (PHC, PMC, and PLC) are seven, five, and eight, respectively.
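To make the lag selection concrete, the following Python sketch computes the sample PACF with the Durbin-Levinson recursion and lists the lags whose PACF value lies outside the approximate 95% confidence band of ±1.96/√N. The use of this band as the selection criterion is an assumption, since the section does not state how the dimensions were read from Figure 6; the function names are hypothetical.

```python
import numpy as np

def pacf(x, max_lag):
    """Sample partial autocorrelations for lags 0..max_lag (Durbin-Levinson)."""
    x = np.asarray(x, float) - np.mean(x)
    n = len(x)
    # Biased sample autocovariances, then autocorrelations r[0..max_lag].
    acov = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])
    r = acov / acov[0]
    phi = np.zeros((max_lag + 1, max_lag + 1))
    out = np.zeros(max_lag + 1)
    out[0] = 1.0
    for k in range(1, max_lag + 1):
        if k == 1:
            phi[1, 1] = r[1]
        else:
            prev = phi[k - 1, 1:k]                      # phi_{k-1,1..k-1}
            num = r[k] - np.dot(prev, r[k - 1:0:-1])    # r_k - sum phi_{k-1,j} r_{k-j}
            den = 1.0 - np.dot(prev, r[1:k])            # 1 - sum phi_{k-1,j} r_j
            phi[k, k] = num / den
            phi[k, 1:k] = prev - phi[k, k] * prev[::-1]
        out[k] = phi[k, k]
    return out

def significant_lags(x, max_lag):
    """Lags whose PACF exceeds the approximate 95% band (assumed criterion)."""
    bound = 1.96 / np.sqrt(len(x))
    values = pacf(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(values[k]) > bound]
```

For a first-order autoregressive series, the PACF is close to the autoregressive coefficient at lag 1 and close to zero at higher lags, so lag 1 would be selected as an input.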
Once the input-output samples of each wind power component were determined, the corresponding data format of PHC was defined as shown in Figure 7. The data formats of PMC and PLC are omitted for reasons of space. It is worth noting that the training set and testing set both use the same rolling prediction mechanism. Accordingly, the rolling prediction mechanism was adopted to predict wind power for one day on the basis of one-step prediction results.
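The construction of input-output samples and the rolling mechanism can be sketched in Python. The sketch assumes that each input vector holds the previous `dim` values of a component and that, after each one-step prediction, the window slides forward using the newly observed value; whether observed or predicted values are fed back is not stated in this section. With 15-min data, one day corresponds to 96 such steps. The function names and the `predict` callable are hypothetical stand-ins for a trained model.

```python
import numpy as np

def make_samples(series, dim):
    """Build (input, output) pairs: `dim` past values predict the next value."""
    series = np.asarray(series, float)
    inputs = np.array([series[i:i + dim] for i in range(len(series) - dim)])
    outputs = series[dim:]
    return inputs, outputs

def rolling_one_step(history, actual, dim, predict):
    """One-step rolling prediction over the horizon covered by `actual`.

    history: values known before the horizon starts (at least `dim` of them).
    actual:  observed values over the horizon, revealed one step at a time.
    predict: callable mapping a window of length `dim` to the next value.
    """
    window = list(history[-dim:])
    preds = []
    for obs in actual:
        preds.append(predict(np.array(window, dtype=float)))
        window = window[1:] + [obs]   # slide the window by one step
    return np.array(preds)
```

For PHC, for example, `dim` would be seven according to the PACF analysis, and `predict` would wrap the trained Bat-BP model.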

The Parameter Setting of the Prediction Models
In this paper, the main procedures were performed on a 32-bit Windows 7 PC with a 2.60 GHz Intel Core i5-3230M CPU and 4 GB of memory. The wind power prediction experiments were run in MATLAB R2014a. The experimental parameters are shown in Table 3.

Analysis of Forecast Results and Comparisons of Different Models
In this study, the wind power sub-series obtained by the ESMD method were reconstructed into PHC, PMC, and PLC, whose data characteristics differ. PHC fluctuates nonlinearly and abruptly; PMC fluctuates nonlinearly and chaotically, with relatively moderate volatility; and PLC fluctuates nonlinearly but relatively flatly [55]. Bat-BP, Adaboost-ENN, and ENN are all suitable for dealing with nonlinear, fluctuating, abrupt, and chaotic data, and have been used successfully in wind power prediction. Therefore, to select the best prediction model for each wind power characteristic component, we performed a set of comparative experiments using Bat-BP, Adaboost-ENN, and ENN on each characteristic component; the prediction results are shown in Table 4. It can be seen from Table 4 that the Bat-BP model is the best for PHC prediction. Similarly, the Adaboost-ENN model and the ENN model are the most suitable for PMC and PLC prediction, respectively.
To estimate the prediction accuracy of the proposed combined model, we set up four different experiments. The comparisons of MAPE for different prediction models and the specific results of the four groups of experiments are shown in Figure 8, parts A and B. It can be seen from Figure 8 that the proposed model outperforms the other models in wind power forecasting, and its prediction error is lower than that of the others in terms of MAPE. The specific purpose of each experiment and the corresponding analysis are as follows.
The first group of experiments was designed to validate that meteorological data at different times and altitudes influence the accuracy of wind power prediction. The proposed model was compared with four prediction models that did not consider meteorological data (NMD): the NMD combined model, NMD-ESMD-Bat-BP, NMD-ESMD-Adaboost-ENN, and NMD-ESMD-ENN. From Table 5, the proposed model is superior in all prediction results to the models that do not consider meteorological data, with MAPE values of 5.60%, 6.76%, 8.29%, and 8.84%, respectively. In addition, the NMD combined model has the best prediction performance among the remaining four methods, with an NMAE value of 2.96%. This result further illustrates that, even without meteorological data, the combined model based on three prediction models significantly improves the accuracy of wind power prediction. However, the prediction performance of the ESMD-Adaboost-ENN model (or ESMD-Bat-BP model) is better than that of the Adaboost-ENN model (or Bat-BP model), and the prediction performance of the proposed model is further improved. Compared with the EMD method, the EEMD method effectively improves the prediction performance. However, the prediction accuracy and stability of the proposed model are better than those of the EEMD model. For instance, the promoting percentages achieved by the ESMD method are 17.69% for MAPE, 14.14% for NMAE, and 11.95% for NRMSE.

(5) Conduct a group of trial experiments to select the best training algorithm for PHC, PMC, and PLC, respectively.
(6) Employ the Bat-BP model, Adaboost-ENN model, and ENN to perform one-step rolling prediction of PHC, PMC, and PLC, respectively, to forecast wind power in the next 24 h.
(7) Aggregate the one-step rolling prediction results of PHC, PMC, and PLC to obtain the final prediction values of the original wind power series.
(8) To compare the prediction performance, compare the proposed combined prediction model with other models through four sets of comparative experiments, such as the NMD-ESMD-Bat-BP model, ESMD-Adaboost-Elman model, EMD, and ENN.
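The reconstruction and aggregation steps can be illustrated with a short Python sketch. It assumes that a characteristic component is the pointwise sum of the sub-series assigned to it and that the final forecast is the pointwise sum of the three component forecasts, which is consistent with an additive decomposition but is not spelled out in this section. The function names and the index groups in the usage note are hypothetical and do not reproduce Table 2.

```python
import numpy as np

def reconstruct_components(sub_series, groups):
    """Sum the sub-series (IMFs and residue) assigned to each component.

    sub_series: 2-D array, one decomposed sub-series per row.
    groups:     mapping from component name to the row indices it contains.
    """
    sub = np.asarray(sub_series, float)
    return {name: sub[idx].sum(axis=0) for name, idx in groups.items()}

def aggregate_forecast(component_forecasts):
    """Add the component forecasts pointwise to obtain the final forecast."""
    stacked = [np.asarray(f, float) for f in component_forecasts.values()]
    return np.sum(stacked, axis=0)
```

With an illustrative grouping such as `{"PHC": [0, 1], "PMC": [2], "PLC": [3]}`, summing the reconstructed components recovers the sum of all sub-series, so the same aggregation applied to the component forecasts yields a forecast of the original series.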

Figure 1. The overall framework for the combined prediction model.

Figure 2. The number of iterations corresponding to the variance ratio of wind speed and wind power.

Figure 3. The decomposed results of wind power and wind speed series using the ESMD method: (A) wind power series is decomposed by the ESMD method into six IMFs and a residue R (from top to bottom); (B) wind speed series is decomposed by the ESMD method into six IMFs and a residue R (from top to bottom).

Figure 4. The SampEn value of each sub-series for the wind power series obtained by ESMD.

Figure 5. Reconstructed characteristic components of wind power and wind speed series: (A) wind power series is reconstructed into PHC, PMC, and PLC; (B) wind speed series is reconstructed into SHC, SMC, and SLC.

Figure 6. The PACF values of three wind power characteristic components: (a) the PACF values of PHC; (b) the PACF values of PMC; (c) the PACF values of PLC.

Figure 7. The input and output data format for PHC.

Table 1. The grey correlation degree between wind power series and meteorological data at different times and altitudes.

Table 2. The composition of the sub-series number in different characteristic components.

Table 3. Setting the experimental parameters.

Table 4. The results of the prediction model experiment.

Table 5. The prediction performances for three different experiments.

Table 6. The promoting percentages for three different experiments.

Figure 8. Forecast results and comparison of MAPE between different models: (A) comparison of MAPE (100%) in different models from four different experiments (from top to bottom in part A: first, second, third, and fourth group of experiments); (B) comparison of forecasts by four different experiments (from top to bottom in part B: first, second, third, and fourth group of experiments).

Table A2. The statistical descriptions of the training set and testing set.