Hourly Day-Ahead Wind Power Prediction Using the Hybrid Model of Variational Model Decomposition and Long Short-Term Memory

A more accurate hourly prediction of day-ahead wind power can effectively reduce the uncertainty of wind power integration and improve the competitiveness of wind power in power auction markets. However, due to the inherent stochastic and intermittent nature of wind energy, it is very difficult to sharply improve multi-step wind power forecasting (WPF) accuracy. Following the theory of direct and recursive multi-step prediction, this study first proposes the R (recursive)-VMD (variational mode decomposition)-LSTM (long short-term memory) and D (direct)-VMD-LSTM models for the hourly forecast of day-ahead wind power, combining a deep neural network forecasting model (LSTM) with the variational mode decomposition (VMD) technique. The models were tested on two real-world wind power series from a wind farm located in Henan, China. The experimental results show that LSTM can achieve more precise predictions than traditional neural networks, and that VMD has a good self-adaptive ability to remove stochastic volatility and retains more adequate data information than empirical mode decomposition (EMD). Secondly, the R-VMD-LSTM and D-VMD-LSTM models are compared to analyze the accuracy of each step. The results verify the effectiveness of combining the two models: the R-VMD-LSTM model provides a more accurate prediction at the beginning of a day, while the D-VMD-LSTM model provides a more accurate prediction at the end of a day.


Introduction
Wind energy is one of the fastest growing renewable energy sources in the world, and wind power is an environmentally friendly and rapidly developing technology [1]. Because weather, environment, and power generation equipment all influence the output of wind power generation, its power curve is strongly volatile and intermittent. When a certain threshold is exceeded, this has a great impact on power quality and grid operation reliability after grid connection [2,3]. Therefore, it is of great practical significance to carry out more accurate forecasting in advance, so as to extend the forecasting step of wind power production, adjust the dispatching plan in time, reduce the operating cost of the power system, and determine the appropriate wind power price [4].
Wind power forecasts (WPFs) can be divided into four different types: very short-term (a few seconds to 30 min), short-term (30 min to 6 h), medium-term (6 h to 24 h), and long-term (one day and more) [5]. The China National Energy Bureau (NEB) enacted a regulation in 2011 that requires the hourly prediction of day-ahead WPFs for dispatching preparation. In addition, the maximum error of the daily forecast curve should not exceed 25%, and the root mean square error (RMSE) of the all-day forecast results should be less than 20% [6]. Due to the randomness of wind power output, wind power has brought new demands to the safe operation of the power system. Hence, day-ahead WPFs, especially WPFs for 1-24 h ahead, have become a hot topic in wind power systems and new energy domains with the implementation of large-scale wind power projects [7].
The mainstream WPF methods are generally divided into physical methods and machine learning methods [8]. Physical methods aim to describe the physical process of the transformation from wind power to electric energy, and physical models rely on numerical weather prediction (NWP) [9]. Since multiple parameters are involved in NWP, and wind farms are located in sparsely populated regions, complete data can hardly be guaranteed [10]. The statistical methods applied in the WPF field are mostly time-series-based approaches, in which the future value of wind power is expressed as a linear or nonlinear function of its historical data [11,12]. Machine learning methods, in contrast, can characterize a nonlinear and complicated relationship between input data and output data, and provide a WPF by applying various algorithms to this network. The artificial neural network (ANN), employing different structures such as feed-forward neural networks (FFNNs), extreme learning machines (ELMs), and support vector machines (SVMs), has been used to learn exemplar patterns of wind power. However, multi-step WPFs using statistical and machine learning methods have rarely been studied, because a greater number of forecasting steps corresponds to a lower accuracy. Specifically, conventional shallow neural network (SNN) models often exhibit several shortcomings such as slow convergence and overfitting. Moreover, they are easily trapped in local minima when applied to high-dimensional and complex problems [13]. One effective way to address the shallow model issue is the use of deep learning, which has the ability to discover the inherent abstract features and hidden high-level invariant structures in data. The characteristics that are specific to feature extraction make deep learning much more attractive for WPF methods. In summary, the unsatisfactory feature mining of the shallow model and the existing problems of individual forecasters inspire us to rethink the WPF problem based on deep learning architecture and the ensemble technique.
To address the shallow model issues, researchers have realized that the deep neural network (DNN), which includes the convolutional neural network (CNN), deep belief network (DBN), and recurrent neural network (RNN), can be applied to handle complex nonlinear relations and dynamics [14,15]. Unlike the traditional ANN, the RNN can discover the inherent abstract features and hidden high-level invariant structures in data. The characteristics that are specific to feature extraction make deep learning much more attractive for WPF methods. Specifically, the output of the RNN depends on previous computations as well as calculations of the current time step. However, due to the difficulty of learning long-range dependencies, the training of the RNN can be extremely challenging. This problem is commonly known as the vanishing/exploding gradient problem [16]. A previous work proposed a novel prediction model for recursive multi-step wind speed forecasting based on long short-term memory (LSTM). As a special RNN model, an LSTM network can avoid gradient vanishing and gradient explosion in the RNN training process to a great extent, thereby making full use of large amounts of training data for classifying and forecasting as well as clustering analyses [17]. LSTM is an advanced approach in natural language processing, which considers not only the current word, but also other adjoining words in the sentence or even paragraph. Data with this kind of contextual information is called sequential data. Stimulated by the success of LSTM in machine translation, a few previous works have explored the power of LSTM for time-series prediction, and obtained promising results. For example, in reference [18], Qu et al. employed principal component analysis (PCA) for data dimension reduction and established an LSTM-based short-term wind power prediction model. According to their results, the prediction accuracy of LSTM was significantly enhanced compared to the results of back propagation (BP, a training algorithm for multilayer perceptrons) and support vector machine (SVM). In reference [19], a wind speed prediction model was presented based on variational mode decomposition (VMD), singular spectrum analysis (SSA), an LSTM network, and extreme learning machine (ELM), in which the LSTM was employed as the predictor. That work first proposed a novel prediction model for recursive small multi-step wind speed forecasting based on the LSTM, while our current study focuses on direct and recursive forecasting models for wind power prediction of up to 24 steps.
In terms of prediction, purely statistical forecasts can perform well under certain conditions, but such performance is usually unavailable beyond 6 h [20,21]. Apart from these basic statistical methods, preprocessing methods such as time-series decomposition (TSD) also play important roles in forecasting. The multiple frequency components that always exist in a wind power series are considered the challenging part of big multi-step forecasting. To address this problem, empirical mode decomposition (EMD) has been widely applied to decompose the original complex time series into several simply structured series, before the datasets are constructed and input into the basic predicting model [22]. The EMD method can divide the signal into several so-called Intrinsic Mode Function (IMF) components. Since the decomposition is based on the local characteristic time scale of the data, it can be applied to a non-stationary time series of produced wind power. The number of IMFs can change according to the harmonic content of signals, which has also been seen as the main disadvantage of EMD [23-25]. For this reason, a decomposition method named variational mode decomposition (VMD) was introduced for the purpose of improving big multi-step prediction accuracy for hourly day-ahead wind power generation. Unlike EMD, VMD decomposes the signal non-recursively and has a good self-adaptive ability to remove stochastic volatility [26].
Notably, hourly day-ahead WPF plays a vital role in the wind industry and power markets, because day-ahead forecasting can effectively capture the dynamic behavior of wind power in the future, which is crucial for improving the security and economic benefits of a wind power system. Since multi-step day-ahead forecasting may accumulate forecast errors as the number of horizons increases in real applications, big multi-step forecasting is much more difficult and complicated than multi-step WPF below six steps. In order to improve the forecasting accuracy, this paper proposes R (recursive)-VMD-LSTM and D (direct)-VMD-LSTM models on the basis of VMD and LSTM. In contrast to the aforementioned research, this study makes three novel contributions to the field. (1) The proposed combined method pursues the multi-step prediction accuracy of wind power. Specifically, the VMD technique and the combination of several base methods are implemented simultaneously to develop a multi-leveled, combined method. (2) As a well-known deep learning algorithm with satisfactory performance on long- and short-term dependencies, the LSTM network is used to complete the forecasting for the sub-layers obtained by the VMD. Owing to the advantages of LSTM, the outputs can directly depend on previous computations as well as calculations of the current time step. (3) To evaluate the hourly day-ahead wind power prediction performance of the combined method, the direct and recursive multi-step prediction theories are first adopted to establish the R-VMD-LSTM model and D-VMD-LSTM model for comparison. The two LSTM models can learn the correlation relationships by integrating the decomposed modes into the input of one model, which improves the overall forecast accuracy.
The rest of this paper is organized as follows. Section 2 introduces the procedure of the proposed methods and gives brief descriptions of the required individual algorithms. In Section 3, the R-VMD-LSTM and D-VMD-LSTM models are described. Section 4 provides the experimental results of two series. Finally, Section 5 concludes this paper.

The Whole Process of the VMD-LSTM Method
The combined method to forecast wind power is established based on the concept of decomposition and deep learning. In this method, future wind power is estimated from historical data. The structure of the VMD-LSTM method is depicted in Figure 1, the process of which can be explained as follows [26]:
(1) The LSTM neural network, with three hidden layers serving as efficient and fast learning machine tools, is adopted to complete the forecasting for each decomposed mode obtained by VMD. LSTM is utilized as the prediction core of the proposed forecasting engine to extract the complicated input-output mapping between historical and forecasting data. Thus, the number of required LSTMs is equal to the number of decomposed modes. The output of each LSTM denotes the forecasted wind power of the same mode order. The LSTM neural network is described in Section 2.2.
(2) VMD is adopted to decompose the original wind power series into m modes with different frequencies. The analysis of the time series of wind power can be helpful for the precise modeling of its intermittent characteristics. These modes are then used to construct training patterns and forecasted outputs. The details of VMD are presented in Section 2.3.
(3) The prediction results of the mode values are summed as the ultimate prediction of wind power by the wind farm.
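The decompose, forecast, and sum structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `decompose` is a hypothetical moving-average stand-in for VMD, and `persistence_forecast` is a hypothetical stand-in for a trained LSTM. Only the data flow (one forecaster per mode, forecasts summed at the end) mirrors the text.

```python
import numpy as np

def decompose(series, n_modes):
    """Hypothetical stand-in for VMD: split a series into additive components
    by repeated moving-average smoothing. The components sum to the input."""
    residual = np.asarray(series, dtype=float)
    modes = []
    for k in range(n_modes - 1):
        window = 2 ** (n_modes - 1 - k) + 1          # shrinking odd window length
        kernel = np.ones(window) / window
        smooth = np.convolve(residual, kernel, mode="same")
        modes.append(smooth)
        residual = residual - smooth
    modes.append(residual)                           # remaining fast fluctuations
    return np.array(modes)

def persistence_forecast(mode, horizon):
    """Hypothetical stand-in for a trained LSTM: repeat the last observed value."""
    return np.full(horizon, mode[-1])

def hybrid_forecast(series, n_modes, horizon):
    """One forecaster per mode; the final forecast is the sum over modes."""
    modes = decompose(series, n_modes)
    per_mode = [persistence_forecast(m, horizon) for m in modes]
    return np.sum(per_mode, axis=0)
```

Because the stand-in decomposition is additive and the stand-in forecaster is linear, the summed forecast here simply equals the persistence forecast of the original series. With a real decomposition and nonlinear per-mode models, the sum would generally differ.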

LSTM Network
Because computing power was limited and large datasets were difficult to acquire, deep learning has only received wide attention and application in recent years. LSTM is a type of deep learning algorithm that is an extended version of recurrent neural networks (RNNs). Unlike the RNN, LSTM has three gates and a memory cell, which are used to control the information in the hidden neuron. The bidirectional RNN (BRNN) is derived from the directional neural network structure, and can be used for training forward neural networks and backward neural networks. Hence, the output is calculated based on the output state of the hidden layers of forward neural networks and backward neural networks [18]. Figure 2 is a schematic diagram of the structure of the bidirectional recurrent neural network, which is one of the earliest RNN architectures. Figure 3 shows the directional recurrent neural network structure and its unfolded structure. However, the gradient problem in the RNN, which exists widely in forward propagation and backward propagation, may occur when the training gradient cannot be transmitted through a long sequence. Eventually, the neural network will not be able to learn the complete internal information of the acquired data. It is noted that the special structure of LSTM neurons successfully solves the gradient problems (gradient explosion and gradient vanishing) of the traditional recurrent neural network during model training [18]. Figure 4 shows the core difference between RNN and LSTM, and displays the special hidden unit of LSTM.
As shown in Figure 4, the input gate permits inputs to modify the memory cell state, the output gate allows or obstructs the memory cell state from influencing other neurons, and the memory cell can impede outside interference, which further allows the LSTM model to learn time series with long spans. The $i$, $f$, and $o$ represent the input, forget, and output gates, respectively. The $c$ and $\tilde{c}$ represent the memory cell state and the new memory cell gate, respectively. The mathematical representation of LSTM [27] consists of the following components:
Input gate ($i_t$, which decides the information to be added to the cell)
Forget gate ($f_t$, which decides the information to be deleted from the cell)
Output gate ($o_t$, which decides the information to be exported from the cell)
Cell
Output vector
Here, $x_t$ is the input vector. $W$, $U$, $V$, and $b$ are parameters for weights and biases. $\circ$ denotes the element-wise (Hadamard) product of two vectors; $\sigma_g$ is the sigmoid function; and $\sigma_h$ and $\sigma_c$ are the hyperbolic tangent functions. For a given input $z$, the output of the hyperbolic tangent function is $\tanh(z) = (e^{z} - e^{-z})/(e^{z} + e^{-z})$.
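For reference, one common way to write these components is the peephole form of the LSTM, sketched below in LaTeX. This is an assumption rather than a transcription of the formulation in [27]: the use of $V$ as peephole weights on the cell state, the symbol $h_t$ for the output vector, and the subscripts on the weight matrices are all illustrative choices that happen to match the symbols named in the text.

```latex
\begin{aligned}
i_t &= \sigma_g\left(W_i x_t + U_i h_{t-1} + V_i c_{t-1} + b_i\right) \\
f_t &= \sigma_g\left(W_f x_t + U_f h_{t-1} + V_f c_{t-1} + b_f\right) \\
o_t &= \sigma_g\left(W_o x_t + U_o h_{t-1} + V_o c_t + b_o\right) \\
\tilde{c}_t &= \sigma_c\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \circ c_{t-1} + i_t \circ \tilde{c}_t \\
h_t &= o_t \circ \sigma_h\left(c_t\right)
\end{aligned}
```

In this form, the forget gate scales the previous cell state and the input gate scales the candidate $\tilde{c}_t$, so the cell state is updated additively. This additive path is what lets gradients survive over long sequences.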

VMD
VMD is a newly developed multi-resolution technique for adaptive and non-recursive signal decomposition. Its overall solution process is the construction and solution of a variational problem. It decomposes a real-valued signal into different modes $u_k$, which have specific sparsity properties while reproducing the main signal. Each mode is compact around a center frequency $\omega_k$, which is determined during the decomposition process, and has limited bandwidth [28-31]. The bandwidth of each sub-signal $u_k(t)$ is obtained as follows:
(1) According to the Hilbert transform, the corresponding analytic signal of each mode $u_k(t)$ is calculated, so that the single-sided spectrum can be obtained.
(2) By mixing the analytic signal of each mode with the exponential $e^{-j\omega_k t}$ tuned to its center frequency $\omega_k$, the spectrum of $u_k(t)$ is shifted to the baseband.
(3) The squared $L^2$ norm of the gradient of the demodulated signal is calculated to estimate the bandwidth of $u_k(t)$.
The variational problem with constraint is:
$$\min_{\{u_k\},\{\omega_k\}} \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \quad \text{s.t.} \quad \sum_k u_k(t) = f(t) \tag{8}$$
Here, the quadratic penalty factor $\alpha$ and the Lagrange multiplier $\lambda$ are used to transform Equation (8) into an unconstrained optimization problem. The formula is as follows:
$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_k u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_k u_k(t) \right\rangle \tag{9}$$
where $\alpha$ is the balance parameter, which is used to guarantee the reconstruction accuracy of the signal in the presence of Gaussian noise, and $\lambda$ is the Lagrange multiplier, which is used to maintain the rigor of the constraint conditions.
The minimization problem of Equation (8) is transformed into a series of sub-optimization problems, which are expressed by Equation (9). The alternating direction method of multipliers is used to solve the problem, and the solution is as follows:
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i^{n}(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha\left(\omega - \omega_k^{n}\right)^2} \tag{10}$$
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega} \tag{11}$$
where $\hat{f}(\omega)$, $\hat{u}_i^{n}(\omega)$, $\hat{\lambda}(\omega)$, and $\hat{u}_k^{n+1}(\omega)$ are the Fourier transforms of the corresponding time-domain quantities, and $\omega_k^{n+1}$ represents the focus center (center frequency) of the current modal function. $n$ is the number of iterations. The steps of variational mode decomposition are as follows:
(1) Initialize $\hat{u}_k^{1}$, $\omega_k^{1}$, and $\hat{\lambda}^{1}$, and set the number of iterations to 1.
(2) For each sub-signal, apply the updates of Equations (10) and (11) to obtain $\hat{u}_k^{n+1}(\omega)$ and $\omega_k^{n+1}$.
(3) Update the Lagrange multiplier according to Equation (12):
$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_k \hat{u}_k^{n+1}(\omega) \right) \tag{12}$$
where $\tau$ is the update parameter.
(4) The decomposition process ends when the convergence condition is satisfied; otherwise, the iteration number is updated, and the process returns to Step 2.
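Steps (1)-(4) can be sketched in NumPy as below. This is a simplified illustration under stated assumptions, not the authors' code and not a reference VMD implementation: it works on the one-sided spectrum from `np.fft.rfft`, omits the mirror extension that full implementations use to reduce boundary effects, uses the most recent estimates of the other modes within each sweep, and initializes the center frequencies uniformly. The function name `vmd_sketch`, the default parameter values, and the relative-change stopping rule are illustrative choices.

```python
import numpy as np

def vmd_sketch(signal, n_modes, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD on the one-sided spectrum (no mirror extension)."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    f_hat = np.fft.rfft(signal)                    # spectrum of the input signal
    freqs = np.fft.rfftfreq(n)                     # cycles per sample, 0 to 0.5
    u_hat = np.zeros((n_modes, freqs.size), dtype=complex)
    omega = 0.5 * np.arange(n_modes) / n_modes     # Step 1: initial center frequencies
    lam = np.zeros(freqs.size, dtype=complex)      # Step 1: Lagrange multiplier

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(n_modes):
            # Step 2: Wiener-filter-like update of mode k around omega[k]
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Step 2: new center frequency = power-weighted mean frequency
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-30)
        # Step 3: multiplier update with step tau (tau = 0 disables it)
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        # Step 4: stop when the modes change very little between sweeps
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if change < tol:
            break

    modes = np.fft.irfft(u_hat, n=n, axis=1)       # back to the time domain
    return modes, omega
```

For example, applied to the sum of two cosines at 0.05 and 0.2 cycles per sample with `n_modes=2`, the returned center frequencies settle close to those two values. The mode number still has to be chosen in advance, as discussed later in the parameter selection.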

Performance Evaluation Index
Three evaluation indexes were chosen to evaluate the prediction performance of the proposed model [5].
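The mean absolute error (MAE) can be defined as:
$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left| e_{t+k/t} \right| \tag{14}$$
where $N$ is the number of forecast samples, and $e_{t+k/t}$ denotes the prediction error, which can be defined as:
$$e_{t+k/t} = V_{t+k} - \hat{V}_{t+k/t} \tag{15}$$
where $V_{t+k}$ and $\hat{V}_{t+k/t}$ are the actual and predicted wind power (MW).
The mean absolute percentage error (MAPE) can be defined as:
$$\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \frac{\left| e_{t+k/t} \right|}{P_{cap}} \times 100\% \tag{16}$$
where $P_{cap}$ denotes the installed capacity of the wind farm.
The RMSE can be defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} e_{t+k/t}^{2}} \tag{17}$$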
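The three indexes can be computed with a few lines of NumPy. The sketch below assumes the usual mean-based definitions and, as in the text, normalizes the MAPE by the installed capacity rather than by the actual value; the function names are illustrative.

```python
import numpy as np

def forecast_errors(actual, predicted):
    """Prediction errors: actual minus predicted wind power (MW)."""
    return np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)

def mae(actual, predicted):
    """Mean absolute error in MW."""
    return np.mean(np.abs(forecast_errors(actual, predicted)))

def mape_capacity(actual, predicted, p_cap):
    """Mean absolute error as a percentage of installed capacity p_cap (MW)."""
    return 100.0 * np.mean(np.abs(forecast_errors(actual, predicted)) / p_cap)

def rmse(actual, predicted):
    """Root mean square error in MW."""
    return np.sqrt(np.mean(forecast_errors(actual, predicted) ** 2))
```

Normalizing by capacity avoids division by near-zero actual power, which occurs regularly in wind power series during calm periods.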

R-VMD-LSTM and D-VMD-LSTM Models
Neural network models for small multi-step wind power prediction have been studied extensively, but big multi-step methods are relatively few, mainly due to the limitations of traditional neural network internal structures and computer operating performance. With the rapid development of parallel computing technology and the widespread use of deep learning, the LSTM neural network was introduced for big multi-step prediction due to its special network structure, as described in detail in Section 2.2. Therefore, on the basis of the combined method, R-VMD-LSTM and D-VMD-LSTM are proposed in this study for hourly day-ahead WPFs. The model architectures are shown in Figure 5.

R-VMD-LSTM
Hourly wind power data are arranged as a time series. This time series $[P_1, P_2, \ldots, P_N]$, which is composed of $N$ observations, is used for forecasting wind power for $s$ steps (hours) ahead, given as $[P_{N+1}, P_{N+2}, \ldots, P_{N+s}]$, where $s$ denotes the absolute forecasting horizon. The basis of any forecasting strategy is that the predicted power ($\hat{P}$) can be expressed as a function of past values ($P$).
In the R-VMD-LSTM model, first, a single model $f_r(\cdot)$ is trained to perform a one-step-ahead forecast, i.e.,
$$\hat{P}_{t+1} = f_r\left(P_t, P_{t-1}, \ldots, P_{t-d+2}, P_{t-d+1}\right) + \omega \tag{18}$$
where $f_r(\cdot)$ represents the recursive relationship between $\hat{P}_{t+1}$ and $P_t, P_{t-1}, \ldots, P_{t-d+2}, P_{t-d+1}$; $\hat{P}_{t+1}$ denotes the first-step forecasted wind power; $P_t, P_{t-1}, \ldots, P_{t-d+2}, P_{t-d+1}$ denote the historical wind power; $d$ denotes the number of previous inputs of the series; and $\omega$ denotes the bias. To forecast $s$ steps, step one is predicted by using the model in Equation (18). Afterwards, the forecasted value is included as the latest entry of the input data to predict the next step using the same trained model. This procedure is repeated, with the input length kept at $d$, until the horizon $s$ is reached. The established recursive forecasting models are described as:
$$\begin{aligned}
\hat{P}_{t+1} &= f_r\left(P_t, P_{t-1}, \ldots, P_{t-d+1}\right) \\
\hat{P}_{t+2} &= f_r\left(\hat{P}_{t+1}, P_t, \ldots, P_{t-d+2}\right) \\
&\;\;\vdots \\
\hat{P}_{t+s} &= f_r\left(\hat{P}_{t+s-1}, \ldots, \hat{P}_{t+1}, P_t, \ldots, P_{t+s-d}\right)
\end{aligned}$$
where $\hat{P}_{t+1}, \hat{P}_{t+2}, \ldots, \hat{P}_{t+s}$ denote the one-step-ahead, two-steps-ahead, and $s$-steps-ahead forecasted wind power, respectively (the last line is written for $s \le d$; beyond that, all inputs are forecasted values); and 24 was selected for $s$ in the recursive day-ahead forecasting model.
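The recursive strategy can be sketched independently of the underlying one-step model. In the Python sketch below, `one_step_model` is any callable mapping a window of the last d values to the next value; `toy_model` is a hypothetical stand-in for the trained LSTM, used only to make the example runnable.

```python
import numpy as np

def recursive_forecast(one_step_model, history, d, s):
    """Apply a one-step model s times, feeding each forecast back as input."""
    window = list(history[-d:])              # last d observations, oldest first
    forecasts = []
    for _ in range(s):
        next_value = one_step_model(np.array(window))
        forecasts.append(next_value)
        window = window[1:] + [next_value]   # drop the oldest, append the forecast
    return np.array(forecasts)

def toy_model(window):
    """Hypothetical stand-in for the trained LSTM: mean of the last two inputs."""
    return float(np.mean(window[-2:]))
```

With `history=[1.0, 3.0]`, `d=2`, and `s=3`, the sketch returns 2.0, 2.5, and 2.25: from the second step onward the window contains earlier forecasts, so any error made early propagates into later steps.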

D-VMD-LSTM
The D-VMD-LSTM model is based on the principle that each step is independent from the others. Hence, a separate model is trained for each forecasting horizon, giving a total of s models for the entire forecasting horizon. For big multi-step forecasts, a recursive model may be inaccurate because forecast error accumulates with each forecasted value; the direct strategy avoids this because no forecasted value is fed back as input.
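The direct strategy can be sketched as follows, assuming one independent model per horizon. To keep the example self-contained, each model is a least-squares linear regression fitted with `np.linalg.lstsq`; this is a deliberate stand-in for the LSTM used in the paper, and the function names are illustrative.

```python
import numpy as np

def make_windows(series, d, s):
    """Build inputs of length d and targets covering the next s steps."""
    series = np.asarray(series, dtype=float)
    n_samples = series.size - d - s + 1
    X = np.array([series[i:i + d] for i in range(n_samples)])
    Y = np.array([series[i + d:i + d + s] for i in range(n_samples)])
    return X, Y

def fit_direct_models(series, d, s):
    """Fit s independent linear models, one per horizon (stand-in for s LSTMs)."""
    X, Y = make_windows(series, d, s)
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])    # append a bias column
    return [np.linalg.lstsq(X1, Y[:, h], rcond=None)[0] for h in range(s)]

def direct_forecast(models, last_window):
    """Each model predicts its own horizon from the same observed window."""
    x1 = np.append(np.asarray(last_window, dtype=float), 1.0)
    return np.array([x1 @ w for w in models])
```

Every horizon is predicted from observed values only, at the cost of training and storing s models instead of one.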

Experimental Data Description
This paper adopts two series of hourly wind power, collected in summer (1-31 August 2017) and spring (1-31 March 2017) from a wind farm located in Henan, China, to evaluate the effectiveness of the proposed models (Figure 6). Due to the different climatic features in summer and spring, these series have clearly different characteristics. Hence, the sets adopted in this paper can comprehensively and systematically evaluate the effectiveness and practicability of the proposed models. In each dataset, observations 1-576 (24 days) and 577-744 (7 days) are adopted as the training and testing sets, respectively. It should be noted that all of the prediction simulations are implemented in Python 3.x.

Parameter Selection
In this study, BP, ELM, and SVM are utilized to evaluate the performance of the proposed models. Table 1 lists the main parameters of BP, ELM, SVM, and LSTM. To demonstrate the superiority of the R-VMD-LSTM and D-VMD-LSTM models over the benchmark models, comparisons are made between the proposed models and EMD-BP, EMD-ELM, EMD-SVM, EMD-LSTM, VMD-BP, VMD-ELM, and VMD-SVM. In addition to the parameters of the above-mentioned models, the selection of candidate input also has a crucial impact on the prediction accuracy of a given model. In the proposed models, the forecast interval is 1 h, and the forecast horizon is 24 h. Thus, the time interval between two consecutive observations of wind power is 1 h, and the forecast engine predicts the next 24 h of wind farm-produced power. The hourly lagged values of wind farm-produced power in the past 24 h (one day), 48 h (two days), and 72 h (three days) are considered as candidate input. The candidate input vectors have lengths of 24 points, 48 points (24 × 2), and 72 points (24 × 3), respectively. The comparison and evaluation of the three candidate input lengths based on the LSTM model are shown in Table 2. The RMSE, MAE, and MAPE of 48 points are 7.6 MW, 6.4 MW, and 21%, showing that 48 points provided the best performance of the three candidate input lengths. In order to improve the overall predicting performance, this paper adopts the VMD technique to decompose the original series into a number of components. The data decomposition results are shown in Figures 7 and 8. Specifically, the first 576 observations of each mode are used as the training set for the forecast model.
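The train/test split and the three candidate input lengths translate into the following sample shapes. The sketch below uses a synthetic placeholder series of 744 points, not the wind farm data, and the helper name `lagged_windows` is illustrative.

```python
import numpy as np

HOURS_PER_DAY = 24
TRAIN_END = 576   # observations 1-576 (24 days) form the training set

def lagged_windows(series, input_len, horizon=HOURS_PER_DAY):
    """Pair each window of input_len lagged values with the next horizon values."""
    series = np.asarray(series, dtype=float)
    n = series.size - input_len - horizon + 1
    X = np.stack([series[i:i + input_len] for i in range(n)])
    Y = np.stack([series[i + input_len:i + input_len + horizon] for i in range(n)])
    return X, Y

rng = np.random.default_rng(0)
power = rng.uniform(0.0, 50.0, size=744)          # synthetic placeholder data
train, test = power[:TRAIN_END], power[TRAIN_END:]

shapes = {}
for days in (1, 2, 3):                            # 24, 48, and 72 lagged points
    X, Y = lagged_windows(train, input_len=days * HOURS_PER_DAY)
    shapes[days * HOURS_PER_DAY] = (X.shape, Y.shape)
```

Longer inputs give the model more context but leave fewer training samples from the same 576 observations, which is one reason the intermediate length can perform best.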
VMD is a completely non-recursive decomposition model, in which all of the modes are extracted concurrently. However, the model requires a preset mode number k, which limits the adaptability of the method, since a large deviation in the number of modes will cause some modes to be discarded or cause the mixing of modes. Hence, the ratio of residual energy to the original signal energy, where the residual is the error remaining after the decomposition of the original signal, was introduced as an optimization index in the VMD technique [31]. The final mode number was determined when the ratio was less than 1%. The ratio of residual energy under different mode numbers by VMD is presented in Table 3. In the formula for this ratio, f and u_k(t) denote the original wind power series and the k-th mode, respectively.
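The mode-number selection rule can be sketched as below. The ratio is assumed here to be the sum of squared residuals divided by the sum of squared signal values, which is one natural reading of "residual energy to original signal energy"; the exact formula used in the paper may differ. `top_k_fourier_modes` is a hypothetical stand-in for VMD so that the example runs on its own.

```python
import numpy as np

def residual_energy_ratio(signal, modes):
    """Energy of (signal minus sum of modes) relative to the signal energy."""
    signal = np.asarray(signal, dtype=float)
    residual = signal - np.sum(modes, axis=0)
    return np.sum(residual ** 2) / np.sum(signal ** 2)

def select_mode_number(signal, decompose, candidates, threshold=0.01):
    """Return the first mode number whose residual ratio is below the threshold."""
    for k in candidates:
        if residual_energy_ratio(signal, decompose(signal, k)) < threshold:
            return k
    return None

def top_k_fourier_modes(signal, k):
    """Hypothetical stand-in for VMD: one mode per strongest Fourier bin."""
    spectrum = np.fft.rfft(signal)
    strongest = np.argsort(np.abs(spectrum))[-k:]
    modes = []
    for idx in strongest:
        single = np.zeros_like(spectrum)
        single[idx] = spectrum[idx]
        modes.append(np.fft.irfft(single, n=len(signal)))
    return np.array(modes)
```

For a test signal made of three sinusoids with amplitudes 1, 0.5, and 0.05, one mode leaves roughly 20% of the energy unexplained, while two modes leave about 0.2%, so the rule selects two modes at the 1% threshold.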

Recursive Hourly Day-Ahead Forecasting
To validate the prediction performance of the R-VMD-LSTM model, eight models are employed for comparison: the LSTM model, the ELM model, the SVM model, the EMD-ELM model, the EMD-SVM model, the EMD-LSTM model, the VMD-SVM model, and the VMD-ELM model. The evaluation indexes of the proposed and comparison models are presented in Table 4. Figure 9 shows the scatter plot of the observed values and forecasting values for different recursive models. Also, the forecasting results based on the R-VMD-LSTM, VMD-ELM, and VMD-SVM models are shown in Figure 10.
Based on the forecasting error results shown in Table 4, the following can be observed.

Direct Hourly Day-Ahead Forecasting
In the direct forecasting model, BP and LSTM are used to establish the hybrid methods (EMD-BP, VMD-BP, EMD-LSTM, and D-VMD-LSTM) for comparison. The evaluations are summarized in Table 5, and the forecasting results are shown in Figures 11 and 12. From Table 5, the following can be observed.

Contrast Analysis between R-VMD-LSTM and D-VMD-LSTM Models
As a further investigation, this section applies the R-VMD-LSTM and D-VMD-LSTM models to wind power Series 1 and 2 to compare and analyze recursive and direct hourly day-ahead predictions. Table 6 and Figure 13 show the errors of the R-VMD-LSTM and D-VMD-LSTM models from Step 1 to Step 24 for Series 1 and 2, respectively. To analyze the results of the D-VMD-LSTM model more clearly, the multi-step histograms of its forecasting errors for Series 1 and 2 are shown in Figures 14 and 15.

Discussion and Conclusions
With the large-scale popularization of wind energy, the liberalized power market is undergoing fierce competition. Accurate WPF plays a vital role in energy auction markets and efficient resource planning. Efficient day-ahead forecasting models have to be applied to mitigate the uncertainty of integrating wind energy into the grid. For existing power systems, although various forecasting models could supply a straightforward solution, their refined planning and the operation of smart grids represent significant constraints.
This paper proposed the R-VMD-LSTM and D-VMD-LSTM models for hourly predictions of day-ahead wind power. Compared with other works, this work focuses more on the characteristics of direct and recursive forecasting and extends the research to hourly day-ahead forecasts. The combined technique shows that the VMD algorithm integrates well with the forecasting model and has a good self-adaptive ability to remove stochastic volatility.
This study also demonstrated that LSTM is able to provide precise time-series predictions for wind power plants. LSTM can be a powerful tool for managers or engineers of electric power systems, who can take advantage of accurate predictions to improve their decision-making. Intelligent wind farms will become the backbone of smart grids. In recent years, deep learning methods have been increasingly successful in many fields. In [31], the authors presented a wind speed prediction model based on VMD and a convolutional neural network (CNN), in which the CNN

Figure 1. The whole process of the variational model decomposition (VMD)-long short-term memory (LSTM) method.

Figure 3. Directional loop neural network structure development diagram.

LSTM is implemented using Keras and TensorFlow. Keras is a Python-based high-level deep learning library that runs on top of TensorFlow or Theano. TensorFlow is an open-source deep learning library released by Google in 2015 under the Apache 2.0 license, so all developers can use it [28].

Figure 6. Two original wind power series.

Figure 7. Decomposition results of wind power in Series 1.

Figure 8. Decomposition results of wind power in Series 2.
(a) The comparison among prediction models without decomposition, with EMD, and with VMD shows that the models combined with VMD are effective in boosting forecasting accuracy. This is because VMD helps to model the stochastic volatility of the data.
(b) A comparison is conducted among LSTM, ELM, and SVM. The results indicate that the deep learning approach is more effective than the conventional machine learning methods in enhancing forecasting accuracy, owing to its directional loop neural network structure and special hidden unit.
(c) The R-VMD-LSTM model performs better than the VMD-ELM and VMD-SVM models in recursive forecasting. However, as the step increases, the accuracy of the R-VMD-LSTM model decreases: from Step 1 to Step 24, its RMSE rises from 0.35 MW to 7.16 MW, its MAE from 0.28 MW to 6.44 MW, and its MAPE from 1% to 22%.
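The three error indexes quoted above (RMSE, MAE, and MAPE) can be computed as in the following sketch. It assumes the standard textbook definitions, since the paper's own formulas do not appear in this section; the function names and the sample values are illustrative only and are not taken from Table 4.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, in the unit of the data (MW for wind power)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error, in the unit of the data."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; undefined if any observation is zero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_pred - y_true) / y_true)) * 100.0)

# Made-up observations and forecasts in MW, for illustration only.
observed = [10.0, 20.0, 30.0]
forecast = [12.0, 18.0, 33.0]

example_rmse = rmse(observed, forecast)
example_mae = mae(observed, forecast)
example_mape = mape(observed, forecast)
```

Because MAPE divides by the observed value, hours with near-zero wind power output can dominate it, which is one reason to report it alongside RMSE and MAE.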

Figure 9. High-density scatter plot of true values and forecasting values of recursive hourly day-ahead forecasting models for Series 1.

(a) As in the recursive hourly day-ahead prediction study, VMD and LSTM are effective in direct forecasting.
(b) The D-VMD-LSTM model performs better than the VMD-BP model across the forecasting horizons. Specifically, from Step 1 to Step 24, its RMSE rises from 1.56 MW to 5.91 MW, its MAE from 1.26 MW to 4.66 MW, and its MAPE from 4% to 16%. The results indicate that the D-VMD-LSTM model can greatly improve the accuracy of hourly day-ahead predictions.
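The direct strategy discussed above differs from the recursive one in that a separate model is fitted for each forecast step, so no prediction is ever fed back as an input. The sketch below illustrates this under loud assumptions: a least-squares linear model replaces the LSTM, VMD is omitted, and the toy sine series, `n_lags`, and the function names are hypothetical rather than the authors' code.

```python
import numpy as np

def fit_direct_models(series, n_lags, horizon):
    """Fit one linear model per step h, mapping a window to the value h steps ahead."""
    coefs = []
    for h in range(1, horizon + 1):
        n = len(series) - n_lags - h + 1  # number of usable training windows
        X = np.array([series[i:i + n_lags] for i in range(n)])
        y = np.array([series[i + n_lags + h - 1] for i in range(n)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        coefs.append(coef)
    return np.array(coefs)  # shape: (horizon, n_lags)

def direct_forecast(history, coefs):
    """Apply every step-specific model to the same last observed window."""
    window = np.asarray(history[-coefs.shape[1]:])
    return coefs @ window

# Toy hourly series with a 24-hour period (not wind farm data).
t = np.arange(240)
series = np.sin(2 * np.pi * t / 24)
train, test = series[:216], series[216:]

coefs = fit_direct_models(train, n_lags=2, horizon=24)
direct_preds = direct_forecast(train, coefs)
```

The cost of this design is training 24 models instead of one; the benefit is that errors from earlier steps do not propagate into later ones.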

Figure 11. High-density scatter plot of true values and forecasting values for different recursive models of Series 1.

Figure 12. Prediction results of direct hourly day-ahead forecasting models by VMD-BP and D-VMD-LSTM.

Figure 13. (a) Errors from Step 1 to Step 24 based on the R-VMD-LSTM and D-VMD-LSTM models of Series 1. (b) Errors from Step 1 to Step 24 based on the R-VMD-LSTM and D-VMD-LSTM models of Series 2.
From Table 6 and Figures 13 and 14, the following can be concluded.
(a) The R-VMD-LSTM model gives a more accurate prediction at the beginning of a day, while the D-VMD-LSTM model gives a more accurate prediction at the end of a day. For instance, the MAPE of the D-VMD-LSTM model is below 20% from Step 1 to Step 24 for both Series 1 and 2. Therefore, the D-VMD-LSTM model can satisfy the bidding price requirements in a liberalized power market, as proposed by the NEB in 2011.
(b) The prediction error of the D-VMD-LSTM model shows a single-peak distribution that is "high in the middle and low on both sides", with the peak concentrated at zero error. As the prediction horizon extends, the peak becomes lower and flatter, and the error spread widens.
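Conclusion (a) suggests using the recursive model for early steps and the direct model for later ones. The sketch below shows one hypothetical way to locate the switch point from per-step errors. The two MAPE curves are placeholders: straight lines drawn between the Step 1 and Step 24 values quoted in the text for Series 1 (1% to 22% for R-VMD-LSTM, 4% to 16% for D-VMD-LSTM). They are not the per-step values of Table 6, so the resulting crossover step is illustrative only.

```python
import numpy as np

steps = np.arange(1, 25)

# Placeholder per-step MAPE curves (percent), linearly interpolated between
# the quoted endpoints. Real use would substitute measured per-step errors.
mape_recursive = np.linspace(1.0, 22.0, 24)
mape_direct = np.linspace(4.0, 16.0, 24)

def select_per_step(err_recursive, err_direct):
    """Label each step with the model that has the lower error ('R' or 'D')."""
    return np.where(err_recursive <= err_direct, "R", "D")

choice = select_per_step(mape_recursive, mape_direct)

# First step at which the direct model becomes the better choice.
crossover_step = int(steps[choice == "D"][0])

# Error of the combined scheme that uses the better model at every step.
combined_mape = np.minimum(mape_recursive, mape_direct)
```

In practice the selection would have to be made on a validation set rather than on the test errors themselves, otherwise the combined error is optimistically biased.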

Figure 14. Error histogram of the D-VMD-LSTM model for Series 1.

Figure 15. Error histogram of the D-VMD-LSTM model for Series 2.

{u_k} and {ω_k} represent the set of sub-signals and their corresponding central frequencies, k is the total number of sub-signals, δ(t) is the Dirac distribution, and ⊗ represents the convolution operation.
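The symbols defined here match the standard constrained variational problem of VMD, which is given below for reference. The equation itself is not reproduced in this part of the text, so this is the textbook formulation rather than a verified copy of the paper's equation; f(t) is an assumed symbol for the original signal, and j is the imaginary unit.

```latex
\min_{\{u_k\},\{\omega_k\}}
\left\{ \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) \otimes u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\}
\quad \text{s.t.} \quad \sum_{k} u_k(t) = f(t)
```

Each term measures the bandwidth of one sub-signal around its central frequency, and the constraint requires the sub-signals to sum to the original signal.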

Table 2. Comparison and evaluation of different input lengths in the LSTM model. MAPE: mean absolute percentage error.

Table 3. The ratio of residual energy with different mode numbers by VMD.

Table 4. Performance evaluations of recursive hourly day-ahead forecasting models for Series 1. EMD: empirical mode decomposition.

Table 5. Performance evaluations of direct hourly day-ahead forecasting models for Series 1.

Table 6. The hourly day-ahead forecasting results of the D-VMD-LSTM model of Series 1 and 2. MAE: mean absolute error; RMSE: root mean square error.
