Day-Ahead Wind Power Forecasting Based on Wind Load Data Using Hybrid Optimization Algorithm

Abstract: Accurate wind power forecasting is essential to reduce the negative impact of wind power on grid operation and on the operating cost of the power system. Day-ahead wind power forecasting plays an important role in the day-ahead electricity spot trading market. However, the instability of the wind power series makes forecasting difficult. To improve forecast accuracy, this study establishes a hybrid optimization algorithm that combines variational mode decomposition (VMD), the maximum relevance and minimum redundancy algorithm (mRMR), a long short-term memory neural network (LSTM), and the firefly algorithm (FA). Firstly, the original historical wind power sequence is decomposed into several characteristic mode functions with VMD. Then, mRMR is applied to obtain the best feature set by analyzing the correlation between each component. Finally, the FA is used to optimize the parameters of the LSTM. The final forecast is obtained by summing the forecasting results of all sub-sequences. The results show that the proposed hybrid algorithm is superior to the other six comparison algorithms. In addition, a further case is provided to verify the adaptability and stability of the proposed hybrid model.


Introduction
Nowadays, the global environment is deteriorating and energy resources are experiencing persistent shortages. The development of renewable energy has become an issue of increasing concern to the international community, and the proportion of wind power capacity in the power grid continues to increase. China's wind power has developed rapidly in recent years. According to official data from the National Energy Administration of China, China's grid-connected installed capacity of wind power increased steadily from 2013 to 2019, with a compound annual growth rate of 20.94% [1]. By the end of 2019, the cumulative grid-connected installed capacity was 210 million kW and wind power generation was 405.7 billion kWh, accounting for 5.5% of total power generation [2]. As wind power continues to develop, the relationship between wind power and the power grid is becoming ever closer. However, because wind energy is intermittent, uncontrollable, and volatile, the grid connection of large-scale wind power faces many difficulties. Therefore, it is necessary to adopt accurate and effective wind power forecasting technology to ensure the security and stability of the power grid. Through short-term wind power forecasting, the electric power department can make timely scheduling plans in advance for changes in wind farm output power, thereby reducing the system's reserve capacity, the operating cost of the power system, and the adverse impact on the grid.
According to the literature, wind power forecasting methods are mainly divided into two categories. The first is the physical method: Numerical Weather Prediction (NWP) data, together with related information about the surroundings and wind speed, are used to establish a correlation model, and the forecasting result is obtained after tedious calculation [3]. However, this method requires high-precision and complete data. The correlation model is relatively rough, and the forecasting accuracy is poor. In addition, it is more suitable for long-term wind power forecasting. The second category is statistical methods, which mainly include time series, regression analysis, Kalman filtering, and so on [4][5][6]. The advantage of statistical methods is that the forecast adapts spontaneously to the location of the wind farm and thus automatically reduces the system error. However, they require long-term measurement data and additional training, that is, testing under various weather conditions and correcting the forecasts. Due to the instability and non-linearity of wind energy itself, the predicted results are unsatisfactory.
In recent years, as research has deepened, artificial intelligence methods have emerged as a new branch of statistical methods. They have been successfully used in wind power forecasting, and their forecasting results have been recognized [7][8][9]. These methods include the support vector machine (SVM) [5], the long short-term memory (LSTM) model [10], the artificial neural network (ANN) [11], and so on. However, as a branch of statistical methods, they share the same disadvantages. A single artificial intelligence model struggles to capture the law of wind power change, so it cannot meet the required forecasting accuracy. Because wind power data are random, volatile, and non-stationary, the data must be preprocessed by other means; otherwise, the accuracy of the forecast results will be seriously affected.
Various data pre-processing methods can effectively improve forecasting accuracy [12]. Decomposition methods, as one class of data pre-processing methods, mainly include wavelet decomposition [13], empirical mode decomposition (EMD) [14], variational mode decomposition (VMD) [15,16], Fourier decomposition, ensemble empirical mode decomposition (EEMD), and so on. Decomposition methods pre-process the complex and changeable original wind power sequence to obtain more regular mode characteristics. However, the decomposition results have certain shortcomings. For example, the results obtained by wavelet decomposition contain residual noise. EMD produces modal aliasing during decomposition, which degrades the decomposition performance and reduces forecasting accuracy. This problem can be solved by EEMD, but its reconstructed components remain noisy. Fourier decomposition has poor adaptability, whereas the wind power sequence is a non-stationary signal affected by many factors. In contrast, VMD is a good way to decompose and process the original data: it reduces the non-stationarity of the wind power sequence and improves the anti-interference ability and robustness of the model. Compared with other decomposition models such as EMD and EEMD, VMD can decompose the original data into fewer sub-sequences as long as an appropriate convergence function is selected, thereby reducing the difficulty of modeling. In addition, minimal redundancy and maximal relevance (mRMR) can be used to recognize patterns and select features after decomposition. mRMR has the advantage that it considers not only the correlation between features and target variables but also the redundancy among features. mRMR has previously been applied to wind speed forecasting and to global solar radiation forecasting.
Nowadays, more and more researchers use hybrid algorithms to reduce forecasting error by combining the advantages of multiple methods [23][24][25][26]. Sun et al. [27] proposed a combination model of EEMDCAN, ARFIMA, and PSOSVM to forecast wind power. The results showed that the EEMDCAN-ARFIMA-PSOSVM hybrid model effectively improved forecasting accuracy. Zhao and Huang [23] built an ultra-short-term power forecasting model that combines EMD with support vector regression (SVR) optimized by the simulated annealing (SA) algorithm (EMD-SA-SVR). This combined method had higher forecasting accuracy and stronger forecasting ability, and its optimization time was significantly shorter than that of other algorithms. Zhang et al. [28,29] employed a singular spectrum analysis (SSA) algorithm to decompose the original wind power sequence and then optimized the support vector machine (SVM) through the least square support algorithm to predict wind power. Wang et al. [30] proposed forecasting models integrating the back propagation (BP) algorithm, wavelet decomposition, and SVM, and used Gaussian cloud models to reflect the uncertainty in the forecasting process. The simulation results showed that each single forecasting method had its limitations, each of which can cause large errors, while the combined forecasting model was significantly better than the single forecasting models. Wang et al. [31] proposed a wind power forecasting method based on SSA, the opposition transition state transition algorithm (OTSTA), the Laguerre polynomial, and a neural network. Comparison with other popular methods showed that the combination of SSA, OTSTA, and other methods indirectly or directly improved the forecasting accuracy of the model. Lang et al. [32] developed an improved long short-term memory (LSTM) model based on VMD. This combination method had higher forecasting accuracy than other forecasting methods, but still had a relatively large error. Liang [33] used a multi-variable stacking LSTM model to predict uncertain short-term wind speeds, which allowed the ingestion of multiple weather parameters for real-time weather forecasting. Lopez et al. [34] established an LSTM-ESN (Echo State Network) hybrid algorithm, an improved LSTM training process, to predict wind power and wind generator power.
The LSTM model is a development of the recurrent neural network (RNN). It solves the inherent 'gradient dispersion' (vanishing gradient) problem of RNNs on long sequences, which greatly improves time series forecasting capability and enables single-point value forecasting. This compares favorably with traditional machine learning techniques such as the BP neural network and SVM, which treat the wind speed forecasting problem only as a static modeling problem. The LSTM model adds a recurrent structure that increases the connection between hidden layers, giving it strong nonlinear mapping ability and a memory function. However, the LSTM model has the drawback that the hidden layer is overloaded, which lowers its efficiency [35]. To solve this problem, the forecast model can be tuned by an optimization algorithm, which directly optimizes the parameters and improves the search ability of the forecasting model. Candidate optimizers include the genetic algorithm, the cuckoo search algorithm, the firefly algorithm (FA), and so on. The FA is used in this study to improve efficiency: it is relatively simple, imposes no strict continuity, differentiability, or other conditions, and has high calculation efficiency compared with the genetic and cuckoo search algorithms. This paper not only considers the inherent randomness and uncertainty of wind energy but also aims to improve the structure of the LSTM.
On this basis, a multi-step hybrid wind power forecasting model is proposed to obtain more accurate wind power forecasts. Firstly, VMD, a new signal decomposition method, is used to decompose the wind power data and obtain more regular mode features. Then, the mRMR algorithm, which considers both the correlation and the redundancy between features, is used to select features. Finally, the improved LSTM model combined with the FA is applied to forecast wind power; the FA can be used directly to improve the neural structure of the network without any interaction.
The innovations of this article are as follows:
(1) An improved LSTM model optimized by the FA is proposed for the first time to predict wind power. The FA is simple to implement and has few parameters. Its high efficiency can therefore compensate for the comparatively low efficiency of the LSTM model, which is caused by the excessive hidden layer load. Moreover, the FA is feasible and effective for optimization in both continuous and discrete spaces, which can improve forecast accuracy.
(2) A hybrid VMD-mRMR-LSTM-FA model is constructed for the first time. It comprises data decomposition, feature selection, model optimization, and forecasting. This multi-step structure is easy to follow and lays a solid foundation for subsequent research.
The rest of the paper is organized as follows: Section 2 introduces the theory of the methods used in the hybrid wind power forecasting model, namely VMD, mRMR, FA, and LSTM. Section 3 establishes the hybrid forecasting model and tests it in a case study. Section 4 compares the proposed model with the SVM, LSTM, FA-LSTM, and mRMR-FA-LSTM methods to demonstrate its effectiveness. Section 5 presents a further case study to validate the adaptability of the proposed model. The conclusion is presented in Section 6.

Theoretical Framework
The second part introduces the methodology used in this paper, covering the principles and processes of each method.

Variational Mode Decomposition
VMD is a recently proposed non-recursive technique that decomposes a multi-component signal into several band-limited modes. Unlike EMD, VMD overcomes shortcomings such as modal aliasing and the end effect. The specific process is as follows [40][41][42]:
(1) Assume that each observed signal is the original signal with independent Gaussian noise superimposed. Firstly, perform noise reduction and reconstruction on the sampled signal $f_0$. The objective function is expressed as:
$$f_0 = f + \eta$$
where $\eta$ is the noise and $f$ is obtained through the regularization method:
$$\min_{f} \left\{ \left\| f - f_0 \right\|_2^2 + \alpha \left\| \partial_t f \right\|_2^2 \right\}$$
where $\alpha$ is the regularization weight.
(2) Calculate the analytic signal of each mode $u_k$ by the Hilbert transform, in order to obtain the unilateral frequency spectrum of the mode components.
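(3) Estimate the center frequency by mixing the mode function $u_k$ with a complex exponential; the spectrum of each mode is then shifted according to its estimated center frequency.
(4) Apply Gaussian smoothing to the demodulated signal to estimate the bandwidth of each mode function. The constrained variational problem is as follows:
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)$$
where $u_k$ is the subsequence; $\omega_k$ is the center frequency; $K$ is the total number of subsequences; $\delta(t)$ is the Dirac distribution; and $f(t)$ is the original signal.
(5) To solve the above constrained problem, a quadratic penalty term with factor $\alpha$ and the Lagrange multiplier $\lambda$ are introduced, which can be expressed as:
$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle$$
where $*$ is the convolution operator.
(6) Through the alternating direction method of multipliers (ADMM), the above problem can be solved by the following update expressions:
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha \left( \omega - \omega_k^{n} \right)^2}$$
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}$$
$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right)$$
where $\hat{u}_k$, $\hat{f}$, and $\hat{\lambda}$ are the Fourier transforms of $u_k$, $f$, and $\lambda$; $n$ is the iteration index; and $\tau$ is the update step of the multiplier.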
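The ADMM iteration of step (6) can be sketched in a few lines of NumPy using the standard VMD update rules (a Wiener-filter update of each mode, a power-weighted mean for each center frequency, and dual ascent on the multiplier). This is a simplified illustration rather than the implementation used in this paper: the function name `vmd`, the defaults `K=2`, `alpha=2000`, `tau=0`, the fixed iteration count, and the omission of signal mirroring at the boundaries are all assumptions made for brevity.

```python
import numpy as np

def vmd(signal, K=2, alpha=2000.0, tau=0.0, n_iter=200):
    """Simplified VMD sketch. Returns (modes, center_freqs), with
    frequencies in cycles per sample, sorted from low to high."""
    f = np.asarray(signal, dtype=float)
    T = f.size
    freqs = np.fft.fftfreq(T)            # normalized frequencies in [-0.5, 0.5)
    pos = freqs >= 0

    # One-sided spectrum: drop negative frequencies, double the positive ones
    # (the Nyquist bin is dropped as well, a simplification of this sketch).
    f_hat = np.fft.fft(f)
    f_hat[~pos] = 0.0
    f_hat[freqs > 0] *= 2.0

    u_hat = np.zeros((K, T), dtype=complex)    # spectra of the K modes
    omega = (np.arange(K) + 0.5) * 0.5 / K     # initial center frequencies
    lam = np.zeros(T, dtype=complex)           # spectrum of the multiplier

    for _ in range(n_iter):
        for k in range(K):
            # Mode update: filter the residual around the center frequency.
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (f_hat - others + lam / 2.0) / (
                1.0 + 2.0 * alpha * (freqs - omega[k]) ** 2)
            # Center-frequency update: power-weighted mean frequency.
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.sum(freqs[pos] * power) / (np.sum(power) + 1e-12)
        # Dual ascent on the reconstruction constraint (inactive if tau == 0).
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))

    modes = np.real(np.fft.ifft(u_hat, axis=1))
    order = np.argsort(omega)
    return modes[order], omega[order]

# Toy usage: separate two superimposed tones.
t = np.arange(1000)
toy = np.cos(2 * np.pi * 0.05 * t) + 0.5 * np.cos(2 * np.pi * 0.2 * t)
toy_modes, toy_freqs = vmd(toy, K=2)
```

In practice the number of modes `K` and the penalty `alpha` would need to be tuned to the wind power series; the values above are only placeholders.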

Max-Relevance and Min-Redundancy
The mRMR is a feature selection method based on mutual information, which selects features according to the maximum statistical dependence criterion. It evaluates the features through mutual information and then finds the features in the feature space that have the greatest correlation with the target category and the least redundancy among themselves. The details are as follows:
(1) Define the maximum correlation and minimum redundancy [44,45]:
$$\max D(S, c), \quad D = \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c)$$
$$\min R(S), \quad R = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j)$$
where $S$ is the feature set, $c$ is the target category, and the function $I$ represents the mutual information between two variables:
$$I(x; y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \, dx \, dy$$
(2) The feature selection criterion of mRMR is as follows:
$$\max \Phi(D, R), \quad \Phi = D - R$$
(3) Based on the principle of maximum correlation and minimum redundancy, the optimal feature set $S_m$ is selected. Assuming that the feature set $S_{m-1}$ composed of $m-1$ features has been obtained, the $m$-th feature is found by the operator in the following formula:
$$\max_{x_j \in X - S_{m-1}} \left[ I(x_j; c) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j; x_i) \right]$$
where $X$ is the original feature set and $x_j$ ranges over the features of $X$ that are not in $S_{m-1}$.
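The incremental search in step (3) can be sketched as a greedy loop: pick the most relevant feature first, then repeatedly add the feature whose relevance to the target minus its mean redundancy with the already selected features is largest. The sketch below is illustrative only. The histogram-based mutual information estimator, the bin count, and the names `mutual_information` and `mrmr_select` are assumptions, not details taken from this paper.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram estimate of the mutual information I(x; y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    mask = p_xy > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))

def mrmr_select(X, y, n_select, bins=10):
    """Greedy mRMR: returns the column indices of X in selection order."""
    n_features = X.shape[1]
    relevance = np.array(
        [mutual_information(X[:, j], y, bins) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]   # start with the most relevant
    while len(selected) < n_select:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # Mean redundancy of candidate j with the features chosen so far.
            redundancy = np.mean(
                [mutual_information(X[:, j], X[:, i], bins) for i in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Toy usage: column 1 is a noisy copy of column 0, column 2 is complementary.
rng = np.random.default_rng(0)
a, b, e = rng.standard_normal((3, 5000))
target = a + 0.7 * b
features = np.column_stack([a, a + 0.5 * e, b])
chosen = mrmr_select(features, target, n_select=2)
```

In the toy data, the redundant copy in column 1 is individually more relevant than column 2, yet the redundancy penalty should lead the greedy search to prefer column 2 as the second feature.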

Firefly Algorithm
The firefly algorithm (FA) was proposed by Yang in 2008. Light intensity and attractiveness are the two key quantities in the FA [46][47][48]. The mathematical description and analysis of the algorithm are as follows:
Hypothesis: For any two fireflies, the brighter one attracts the other, but the perceived brightness weakens as the distance increases. Consequently, the brightest firefly, which has nothing brighter to move toward, moves randomly. This connects the brightness with the objective function.
(1) Define the brightness function:
$$I = I_0 e^{-\gamma r^2}$$
where $I_0$ is the maximum fluorescence emitted by the fireflies; $\gamma$ is the light intensity absorption coefficient; and $r$ is the distance between the two fireflies.
(2) Define the attractiveness function:
$$\beta = \beta_0 e^{-\gamma r^2}$$
where $\beta_0$ is the maximum attraction of fireflies.
(3) During the attraction of two fireflies $i$ and $j$, the distance used in the position update is the Cartesian distance:
$$r_{ij} = \left\| x_i - x_j \right\| = \sqrt{\sum_{k=1}^{d} \left( x_{i,k} - x_{j,k} \right)^2} \tag{15}$$
Every time the location is updated, the following formula is applied:
$$x_i \leftarrow x_i + \beta \left( x_j - x_i \right) + \alpha \left( \mathrm{rand} - \tfrac{1}{2} \right)$$
where $d$ is the dimension of the search space, $\alpha$ is the step factor, and rand is a random number generated in [0,1].
(4) Design the fitness function. In most intelligent algorithms, the design of the fitness function directly affects the convergence speed of the algorithm and the choice of the optimal solution, so designing a reasonable fitness function is of great significance. The fitness function here is built from the following components: (a) the distance between the individual fireflies and the target point, which can be obtained by Formula (15); (b) the mean absolute error (MAE) between each point; (c) the root mean square error (RMSE) between each point; and (d) the final construction of the fitness function.
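The attraction and position-update rules above can be sketched as a small minimizer. This is an illustrative version, not the configuration used in this paper: the name `firefly_minimize`, the parameter defaults, the scaling of the random step by the width of each bound, the geometric decay of the step factor, and leaving the brightest firefly in place (rather than moving it randomly) are all assumptions. A lower objective value is treated as higher brightness.

```python
import math
import random

def firefly_minimize(objective, bounds, n_fireflies=20, n_iter=100,
                     beta0=1.0, gamma=0.1, alpha=0.2, seed=0):
    """Minimal firefly algorithm sketch. Returns (best_position, best_cost)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_fireflies)]
    cost = [objective(p) for p in pos]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:        # firefly j is brighter than i
                    # Squared Cartesian distance between fireflies i and j.
                    r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                    beta = beta0 * math.exp(-gamma * r2)   # attractiveness
                    for d, (lo, hi) in enumerate(bounds):
                        step = alpha * (rng.random() - 0.5) * (hi - lo)
                        x = pos[i][d] + beta * (pos[j][d] - pos[i][d]) + step
                        pos[i][d] = min(max(x, lo), hi)    # stay within bounds
                    cost[i] = objective(pos[i])
        alpha *= 0.97   # assumed decay: shrink the random step over time

    best = min(range(n_fireflies), key=cost.__getitem__)
    return pos[best], cost[best]

# Toy usage: minimize a two-dimensional sphere function.
best_pos, best_cost = firefly_minimize(
    lambda p: sum(v * v for v in p), [(-5.0, 5.0), (-5.0, 5.0)])
```

When the objective is a forecasting error evaluated on a validation set, each call to `objective` would train and score one candidate LSTM configuration, so the population size and iteration count dominate the running time.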

Long Short-Term Memory
LSTM was proposed by Hochreiter and Schmidhuber [49] to learn long-term dependence information. It is a time-recurrent neural network based on the recurrent neural network (RNN). The RNN differs from a general (feedforward) neural network mainly in how its neurons are connected: information in a general neural network flows in one direction, whereas information transmission in an RNN contains a directed loop. The LSTM improves on the RNN by replacing its hidden layer neurons with memory units and introducing "gates" that select and control which information is discarded or added. Through the gate structure acting on the unit state, the neural network can choose to remember or forget information. The gate units determine whether data are updated or discarded, which solves the problems of gradient disappearance and gradient explosion in the later stage of RNN training. The sequence principle of LSTM is shown in Figure 1. The core calculation formulas of the LSTM model are as follows [52]:
(1) Define the functions of the input gate, forget gate, and output gate, respectively:
$$i_t = \sigma \left( W_i [h_{t-1}, x_t] + b_i \right)$$
$$f_t = \sigma \left( W_f [h_{t-1}, x_t] + b_f \right)$$
$$o_t = \sigma \left( W_o [h_{t-1}, x_t] + b_o \right)$$
where $x_t$ is the input data for the current time step; $h_{t-1}$ is the hidden layer output of the previous time step; $[h_{t-1}, x_t]$ is the long vector composed of the two vectors; and $\sigma$ is the sigmoid activation function.
(2) In the input gate, the new information is selectively recorded into the cell state. During the process, the target is the memory cell:
$$\tilde{c}_t = \tanh \left( W_c [h_{t-1}, x_t] + b_c \right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
where $W_{f,i,c,o}$ and $b_{f,i,c,o}$ are the weight matrices and bias vectors, respectively; $c_t$ is the memory cell; and $\odot$ is element-by-element multiplication between vectors.
(3) In the hidden layer output, the required output value $h_t$ is determined:
$$h_t = o_t \odot \tanh(c_t)$$
where $\tanh$ is the hyperbolic tangent nonlinear function.
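A single time step of the LSTM cell, using the standard gate formulation (input, forget, and output gates acting on the concatenated vector of the previous hidden state and the current input), can be sketched in NumPy as follows. The function name `lstm_step`, the dictionary layout of the weights, and the toy dimensions are assumptions made for illustration. A practical model would be built with a deep learning framework rather than by hand.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W and b are dicts keyed by 'i', 'f', 'o', 'c';
    each W[k] has shape (hidden, hidden + input)."""
    z = np.concatenate([h_prev, x_t])          # the vector [h_{t-1}, x_t]
    i_t = sigmoid(W["i"] @ z + b["i"])         # input gate
    f_t = sigmoid(W["f"] @ z + b["f"])         # forget gate
    o_t = sigmoid(W["o"] @ z + b["o"])         # output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])     # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde         # new memory cell
    h_t = o_t * np.tanh(c_t)                   # new hidden output
    return h_t, c_t

# Toy usage: run a window of 6 time steps through a cell with 4 hidden units.
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = {k: 0.1 * rng.standard_normal((n_hidden, n_hidden + n_in)) for k in "ifoc"}
b = {k: np.zeros(n_hidden) for k in "ifoc"}
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.standard_normal((6, n_in)):
    h, c = lstm_step(x_t, h, c, W, b)
```

Because the output gate lies in (0, 1) and the hyperbolic tangent lies in (-1, 1), every component of the hidden output stays strictly inside (-1, 1).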

Case Study
The third part covers data processing and empirical analysis. It uses actual data to verify the hybrid model and compares it with other single models to further verify the accuracy and stability of the hybrid model.

Wind Power Sequence Decomposition
This paper uses the measured wind power of Beijing Lumingshan Wind Power Plant from 10 May to 28 May 2016 as the research object. The data sampling interval was 5 min, and the original wind power sequence was decomposed by VMD. The original sequence is shown in Figure 2, and the decomposition results are shown in Figure 3.

Finding the Best Feature Set Using mRMR
Based on the wind power scenario, the key influencing factors of wind energy are screened out to improve the efficiency of load forecasting; they are listed in Table 1.
Table 1. Key influencing factors of wind energy.

Feature           Description
AL_{t-n}          Load at the time period t-n
TP_t              Temperature at the predicted period t
AP_t              Air pressure at the predicted period t
HP_t              Humidity at the predicted period t
WS_t, WD_t        Hub-height wind speed and wind direction at the predicted period t
TS_t, TD_t        10 m wind speed and wind direction at the predicted period t
THS_t, THD_t      30 m wind speed and wind direction at the predicted period t
FS_t, FD_t        50 m wind speed and wind direction at the predicted period t
SS_t, SD_t        70 m wind speed and wind direction at the predicted period t

After calculation, the best feature numbers of the three components P1, P2, and P3 are 12, 17, and 18, respectively. The corresponding optimal input feature sets are shown in Table 2.

Table 2. The best input feature sets (columns: Load Component; The Best Input Feature Sets).

Load Forecasting Based on FA-LSTM
This paper selects the data from 10 May 2016 to 27 May 2016 as the training set and uses the remaining data from 28 May 2016 as the test set. After the best input feature set of each component was determined, the firefly algorithm was used to optimize the weights and thresholds of the LSTM. The simulation environment was Python 3.7. The parameter settings of the firefly algorithm are shown in Table 3. The change in the optimal individual fitness value during the optimization is shown in Figure 4. It can be seen from Figure 4 that, with a population of 50, the firefly algorithm converges to the optimal fitness value of 0.06 after 60 iterations. This shows that the firefly algorithm can find the optimal parameters of the LSTM neural network within a limited number of iterations. After the optimization, the parameter combination of the LSTM neural network was determined as follows: the number of hidden layer neurons is 120, the time window step is 6, the number of training epochs is 160, and the learning rate is 0.015.
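The chronological split and the time window described above can be sketched as follows. With a 5 min sampling interval there are 288 points per day, so 19 days of data give 18 training days and one test day. The helper names `chronological_split` and `make_windows` and the placeholder series are assumptions for illustration. The actual model also feeds the mRMR-selected features, which this sketch omits.

```python
def chronological_split(series, points_per_day, test_days=1):
    """Hold out the last `test_days` days as the test set (no shuffling)."""
    cut = len(series) - test_days * points_per_day
    return series[:cut], series[cut:]

def make_windows(series, window=6):
    """Turn a series into (inputs, targets): each input holds `window`
    consecutive values and the target is the value that follows them."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets

points_per_day = 24 * 60 // 5                  # 288 samples per day at 5 min
# Placeholder series standing in for 19 days of one decomposed component.
series = [float(i) for i in range(19 * points_per_day)]
train, test = chronological_split(series, points_per_day)
X_train, y_train = make_windows(train, window=6)
```

Splitting by time rather than at random keeps every test sample strictly later than every training sample, which matches the day-ahead setting.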
The forecasting results of the respective components are summed up to obtain the final forecasting result, as shown in Figure 5.

Comparison Analysis
In this section, SVM, LSTM, FA-LSTM, and mRMR-FA-LSTM are compared with the proposed VMD-mRMR-FA-LSTM to verify the forecasting performance of the proposed model. Table 4 lists the parameter settings and input options of the comparison models. The forecasting results of each model are shown in Figure 6. It can be seen from Figure 6 that the VMD-mRMR-FA-LSTM short-term load forecasting model approximates the true value more closely and has better forecasting accuracy.
In order to quantitatively analyze the forecasting accuracy of each model, five evaluation indicators were introduced: the coefficient of determination ($R^2$), mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and Theil inequality coefficient (TIC). They can be defined as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{m} \left( y_i - \bar{y} \right)^2}$$
$$\mathrm{MAE} = \frac{1}{m} \sum_{i=1}^{m} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{MAPE} = \frac{1}{m} \sum_{i=1}^{m} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2}$$
$$\mathrm{TIC} = \frac{\sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2}}{\sqrt{\frac{1}{m} \sum_{i=1}^{m} y_i^2} + \sqrt{\frac{1}{m} \sum_{i=1}^{m} \hat{y}_i^2}}$$
where $m$ is the number of data points; $y_i$ and $\hat{y}_i$ are the actual and forecast values, respectively; and $\bar{y}$ is the mean of the actual values. The smaller the values of RMSE, MAE, and MAPE are, the more accurate the forecasting result is. $R^2$ measures the regression fitting effect of the model; the larger $R^2$ is, the better the fitting effect of the model will be. TIC measures the predictive ability of the model. Finally, based on the above, we can summarize the following points:
(1) In terms of forecasting accuracy, both the VMD decomposition and the firefly algorithm effectively improve the forecasting accuracy of the model. At the same time, LSTM has a greater advantage in wind power forecasting than the traditional SVM model. After mRMR is used for feature selection, the forecasting accuracy of FA-LSTM decreases slightly because the indicators selected in the cases of this paper were chosen based on experience.
(2) In terms of computing efficiency, the application of mRMR, which performs feature extraction with maximum correlation and minimum redundancy, significantly accelerates the calculation speed of the model and reduces the calculation scale, thereby improving forecasting efficiency.
(3) After the FA is used to optimize the LSTM model, the influence of the initial value selection is reduced. The initial parameters of the model can be set more flexibly, which avoids the shortcomings of manual selection of model parameters.
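The five evaluation indicators named in this section can be computed with a short helper. The sketch below uses the standard textbook definitions, which is an assumption: the exact forms used in this paper (for example, whether MAPE is reported in percent or how TIC is normalized) cannot be confirmed from the text. The function name `evaluate` is hypothetical.

```python
import math

def evaluate(actual, predicted):
    """Return R2, MAE, MAPE (in percent), RMSE, and TIC for two equal-length
    sequences. Actual values must be non-zero for MAPE to be defined."""
    m = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / m
    ss_res = sum(e * e for e in errors)                      # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)     # total sum of squares
    rmse = math.sqrt(ss_res / m)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in errors) / m,
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / m,
        "RMSE": rmse,
        "TIC": rmse / (math.sqrt(sum(a * a for a in actual) / m)
                       + math.sqrt(sum(p * p for p in predicted) / m)),
    }

# Toy usage with two points.
scores = evaluate([2.0, 4.0], [1.0, 5.0])
```

Because wind power output can be close to zero, MAPE may become very large or undefined on real data, so it is worth checking the actual values before relying on it.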

Further Case Study
To verify the adaptability of the hybrid model, another case is included, which uses additional load data (from 10 June 2016 to 16 June 2016) from Lumingshan Wind Power Plant.

Conclusions
Aiming at the instability of day-ahead wind power forecasting, a hybrid short-term wind power forecasting model, namely the VMD-mRMR-FA-LSTM model, is proposed. Firstly, VMD decomposes the original load, and mRMR is then applied to obtain the best feature set by analyzing the correlation between each component and the features, including temperature, wind speed, wind direction, and so on. Secondly, a separate LSTM forecasting model is constructed for each new sequence based on the mRMR selection result. Finally, the FA is used to optimize the parameters of the LSTM. The case study of the proposed hybrid model shows that:
(1) The hybrid model has higher forecasting accuracy than the benchmark models and has broad application prospects in day-ahead wind power forecasting.
(2) Compared with the single LSTM model, the FA optimizes the parameters of the LSTM to obtain higher forecasting accuracy, which indicates that the FA-LSTM model has stronger global search ability and more stable forecasting performance.
(3) Compared with other data preprocessing strategies, VMD-mRMR performs better and effectively improves the forecasting accuracy. The hybrid model proposed in this paper can be well applied to day-ahead wind power forecasting.
Notation for the LSTM formulas: $[h_{t-1}, x_t]$ is the long vector composed of two vectors; $\sigma$ is the sigmoid activation function; $W_{f,i,c,o}$ and $b_{f,i,c,o}$ are the weight matrices and bias vectors, respectively; $c_t$ is the memory cell; and $\odot$ is the element-by-element multiplication symbol between vectors.

Figure 4. The fitness curve of the firefly algorithm.

Figure 6. Load forecasting curves of each model.
The additional case uses data through 16 June 2016 from Lumingshan Wind Power Plant. The training error of each model is shown in Figures 7-11.

Table 3. Parameter settings of the firefly algorithm.

Table 5. Comparison of the forecasting results for power load.
Comparing the FA-LSTM model with the LSTM model shows that the firefly algorithm improved the forecasting results on the five indicators by 18.3%, 69.1%, 34.0%, 54.1%, and 40.8%, respectively. Comparing the LSTM model with the classic SVM model shows that the forecasting results of LSTM improved by 8.5%, 34.3%, 75.6%, 37.4%, and 61.2% on the five indicators, respectively. The running times of FA-LSTM and mRMR-FA-LSTM were 241.3697 s and 193.3369 s, respectively. In this training, the application of mRMR therefore reduced the training time by 19.9% relative to FA-LSTM (equivalently, FA-LSTM took 24.8% longer than mRMR-FA-LSTM).