Short-Term Wind Power Prediction Based on Data Decomposition and Combined Deep Neural Network

Abstract: A hybrid short-term wind power prediction model based on data decomposition and a combined deep neural network is proposed, taking into account the fluctuation and randomness of nonlinear signals such as wind speed and wind power. Firstly, variational mode decomposition (VMD) is used to decompose the wind speed and wind power sequences in the input data to reduce the noise in the original signals. Secondly, the decomposed wind speed and wind power sub-sequences are reconstructed, together with other related features, into new data sets that serve as the input of the combined deep neural network; a convolutional neural network (CNN) extracts the implied features of the input data, which are then passed to a long short-term memory neural network (LSTM) for prediction. At the same time, an improved particle swarm optimization algorithm (IPSO) is adopted to optimize the parameters of each prediction model. The predicted wind power is obtained by superimposing the predicted sub-sequences. Simulations of short-term power prediction in different months with very different weather are carried out for a wind farm in Guangdong, China. The simulated results validate that the proposed model has high prediction accuracy and generalization ability.


Introduction
Wind power is a clean and renewable source of power generation, and its installed capacity is increasing year by year. However, wind power is volatile and stochastic, so improving the accuracy of wind power prediction helps to maintain power grid stability and to maximize the wind energy consumed in power grid operation. According to the following references, prediction models based on artificial intelligence (AI) have more potential in short-term prediction. At the same time, to improve performance and to address the limitations of a single AI model in terms of prediction accuracy, combined prediction models have been used in wind power prediction in recent years. Combined prediction models have been improved from multiple perspectives, such as data decomposition [1], feature clustering [2], parameter optimization [3], and error correction [4], which have greatly improved the performance of wind power prediction.
As the first step in combined prediction models, mode decomposition is usually performed to reduce the volatility of the data. The decomposition methods commonly used for wind power prediction are empirical mode decomposition (EMD) [5], ensemble empirical mode decomposition (EEMD) [6], and improved methods based on EMD [7][8][9][10]. There are also wavelet decomposition (WPD) [11], singular spectrum analysis [12], and other methods for smoothing data. Variational mode decomposition (VMD) [13] was proposed by Konstantin Dragomiretskiy et al. in 2014; it can effectively suppress the spectral aliasing that occurs in EMD by controlling its bandwidth [14]. The merits and shortcomings of the above-mentioned mode decomposition methods are shown in Table 1. Compared with the original data, the decomposed sub-series make it easier for the prediction model to mine implied features, thus improving training effectiveness. After one-step decomposition of the original wind speed and wind power, the sub-series may still be highly complex. To further reduce the noise of the original data, a multi-step mode decomposition framework was first proposed in [15] for wind power prediction, and the experimental results verified the effectiveness of the method. In [16], after dividing the data into long and short time scales, the data of the two time scales are processed separately using EEMD and reinforcement learning, and the experimental results verify that the method is feasible.
Parameter optimization uses intelligent algorithms to address the difficulty of determining prediction model parameters. By setting the solution space of the prediction model parameters, the optimal parameter values are searched for iteratively, which makes full use of the potential of intelligent algorithms in parameter optimization [17]. Reference [18] used a combination of Adaboost and particle swarm optimization (PSO) algorithms to optimize the thresholds and weights of an extreme learning machine (ELM), improving the prediction accuracy of the model and avoiding the loss of accuracy caused by manually setting parameters. Applying to short-term wind power forecasting the idea in [19] of using FAR and AR models to forecast uncertain components, the decomposed components of the data can be divided into deterministic and non-deterministic components; handling the deterministic component with a combined forecasting model and the uncertain component with an autoregressive model helps forecasting accuracy.
Wind power prediction is related not only to the input at the current moment but also to past inputs and outputs. Since LSTM is equipped with a memory function and captures the temporal correlation between wind power and other meteorological features, such as wind speed and temperature, it is more suitable for wind power prediction [20]. A detailed and in-depth study of the application of artificial intelligence techniques (deep learning models) to wind speed and wind power is presented in [21], which verifies the applicability of combined deep models to wind power prediction from different perspectives, such as data pre-processing, data feature mining, and data relationship learning.
In summary, the models used in wind power prediction mostly consider multiple perspectives and combine prediction models to keep prediction accuracy at a high level. Table 2 gives a detailed comparison of this paper with some other references. In this paper, a novel hybrid short-term wind power prediction model is proposed, based on data decomposition (VMD) and a combined deep neural network (CNN-LSTM). Firstly, the model uses VMD to decompose wind speed and wind power, with the aim of smoothing these volatile time series. The decomposed wind speed and wind power sub-series are reconstructed with other correlated feature variables (temperature and humidity) into several new data sets that serve as the multi-feature inputs to the prediction model. Secondly, to dig deeper into the implied relationships between the variables, features are extracted from the input data using CNN. A prediction model is constructed independently for each new data set, and the network structure parameters are optimized using an improved particle swarm optimization algorithm (IPSO). Then, the optimized parameters are substituted back into the prediction model for wind power prediction. The final wind power is obtained by superimposing the prediction components. The simulated results verify the improvement of the proposed model in prediction accuracy and the effectiveness of using IPSO to optimize the network structure parameters.
This paper is organized as follows: Section 2 describes the methodology. Section 3 presents the construction of the wind power prediction model and the evaluation metrics. Section 4 provides a case study. The conclusion is drawn in Section 5.

Variational Mode Decomposition
VMD is an adaptive and completely non-recursive mode decomposition method, which essentially smooths the sequence signal [23]. VMD iteratively searches for the optimal solution of a variational model to select the center frequency and bandwidth of each component, so that the decomposed modes reproduce the original sequence while each mode is smooth after demodulation to baseband. In this paper, VMD is chosen as the decomposition method because it can effectively reduce the non-smoothness of time series by overcoming the mode mixing problem in EMD. The specific decomposition process and the algorithmic steps are described in References [24][25][26][27][28]. VMD is used as a pre-processing tool for time series, and the decomposed modes, which have different characteristics, can be applied as input to subsequent analysis methods for further in-depth analysis.
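For orientation, the constrained variational problem that VMD solves is usually written as below. This is the standard formulation from the VMD literature rather than a reproduction of this paper's notation; the symbols u_k (the k-th mode), \omega_k (its center frequency), K (the number of modes), and f (the original signal) are assumed here.

```latex
\min_{\{u_k\},\{\omega_k\}} \sum_{k=1}^{K}
  \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2
\quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)
```

The constraint is what allows the modes to be summed back into the original sequence, while the squared norm penalizes the bandwidth of each mode around its center frequency.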

Convolutional Neural Network
CNN has features such as shared weights and local connectivity [29], which make it an efficient tool for data feature extraction. CNNs can be classified into 1D, 2D, and 3D, and each of the three types is applicable to different categories of data. The data sets used for wind power prediction are essentially time series composed of multiple feature variables, and 1D CNN is mainly used for feature extraction from time series data. In this paper, the time series of temperature and humidity related to wind power are combined with the sub-series of wind speed and wind power decomposed by VMD to form the multi-feature input of the 1D-CNN, which aims to further explore the implied relationships between the feature variables and to improve the training efficiency of the model. Considering the small number of features in the data of this paper, no pooling layer is set in this CNN. The computation process of the 1D CNN is given in Equation (1). Figure 1 shows the CNN-LSTM combined neural network model used in this paper.
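To make the 1D convolution over a multi-feature time series concrete, here is a minimal pure-Python sketch with no pooling layer. It is illustrative only: the function name `conv1d`, the ReLU activation, the toy weights, and the reading of the filter step as a stride are all assumptions and are not taken from the paper.

```python
def conv1d(series, kernels, biases, stride=1):
    """Valid 1D convolution over a multi-feature time series, followed by ReLU.

    series:  list of time steps, each a list of C feature values
    kernels: list of F filters, each a list of K taps, each tap a list of C weights
    biases:  list of F bias values
    returns: list of output steps, each a list of F activations
    """
    width = len(kernels[0])
    out = []
    for start in range(0, len(series) - width + 1, stride):
        window = series[start:start + width]
        step = []
        for kernel, bias in zip(kernels, biases):
            total = bias
            for tap, row in zip(kernel, window):
                total += sum(w * x for w, x in zip(tap, row))
            step.append(max(0.0, total))  # ReLU (assumed activation)
        out.append(step)
    return out


# Toy input: 6 time steps, 2 features (e.g. one wind-speed mode and temperature).
series = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0], [6.0, 1.0]]
# One hypothetical filter of width 2 that sums both features over both taps.
kernels = [[[1.0, 1.0], [1.0, 1.0]]]
features = conv1d(series, kernels, biases=[0.0], stride=2)
print(features)  # [[4.0], [8.0], [12.0]]
```

Each filter shares its weights across all window positions, which is the weight sharing and local connectivity mentioned above; the resulting feature sequence is what would be handed to the LSTM.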
Energies 2022, 15, x FOR PEER REVIEW 6 of 17

Long Short-Term Memory Neural Network
LSTM is an improved recurrent neural network (RNN) that aims to overcome the problems of gradient explosion and gradient vanishing in traditional RNNs. The internal structure of LSTM is mainly composed of three gates: the forget gate f_t, the input gate i_t, and the output gate o_t. LSTM has a memory function and can capture the correlation between past and future information, so it is well suited to time series [30]. Considering that wind speed and wind power data are typically nonlinear and volatile time series, the prediction model uses LSTM as the neural network for modeling. The structure of LSTM is shown in Figure 2.
Firstly, the role of the forget gate f_t is to decide, according to the size of its output value, whether to keep or delete the information in the previous cell state C_{t−1}. The inputs of f_t are the current input X_t and the previous output h_{t−1}. The input gate i_t determines which part of X_t is retained in C_t. The inputs of i_t are likewise passed through the sigmoid function to produce a value between 0 and 1, which controls how much of the new candidate cell state C̃_t, generated by the tanh function, is admitted. The cell state C_t at the current moment is updated by the combined action of the forget and input gates. The output of the LSTM is controlled by the output gate o_t: the output h_t at the current moment is equal to the product of the output gate value and the pending output. The relationships among the variables in the LSTM are shown in Equations (2)-(7).
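For reference, the gate relationships described in this section correspond to the standard LSTM formulation sketched below. The weight matrices W and biases b are assumed symbols, and the paper's own Equations (2)-(7) may differ in notation.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, X_t] + b_f\right) \\
i_t &= \sigma\left(W_i [h_{t-1}, X_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C [h_{t-1}, X_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\left(W_o [h_{t-1}, X_t] + b_o\right) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
```

Here \sigma is the sigmoid function and \odot denotes element-wise multiplication; in this reading, the pending output is \tanh(C_t).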

Adaptive Weighted Particle Swarm Algorithm Combined with Elimination Mechanism
The original data set is decomposed into different data sub-sets. Manually setting the optimal network structure parameters for each data sub-set increases the time cost, and the prediction error caused by manually set parameters grows as the number of data sub-sets increases. Therefore, the particle swarm optimization (PSO) algorithm, which has a strong search capability, is introduced to optimize the network structure parameters [31]. The particle swarm continuously updates its positions during the search process to finally obtain the globally optimal solution. During each iteration, the velocity and position of the particles are updated as shown in Equations (8) and (9).
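The standard PSO update rules, which Equations (8) and (9) are assumed to follow, can be written as below. The symbols p_i^k (the best position found so far by particle i), g^k (the best position found by the swarm), and the uniform random numbers r_1, r_2 in [0, 1] are assumptions of this sketch.

```latex
\begin{aligned}
v_i^{k+1} &= \omega v_i^{k} + c_1 r_1 \left(p_i^{k} - x_i^{k}\right) + c_2 r_2 \left(g^{k} - x_i^{k}\right) \\
x_i^{k+1} &= x_i^{k} + v_i^{k+1}
\end{aligned}
```

The first term is the inertial component governed by the inertia weight \omega, and c_1 and c_2 are the learning factors that weight the pull toward the personal and global best positions.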
When the standard PSO algorithm is used to optimize the parameters, multiple particles usually take part in the iterations to fully utilize the search capability of the swarm. In each iteration, some particles move to positions with a better fitness value, while others move to positions with a worse fitness value. If a particle moved to a position with a worse fitness value in the previous iteration, its inertial component in the current iteration tends to carry it toward a still worse fitness value, which hinders convergence to the globally optimal solution. If the inertia of each particle is dynamically adjusted in each iteration according to the fitness value obtained in the previous iteration, particle convergence benefits considerably. Therefore, the inertia weights are adjusted adaptively, as shown in Equation (10),
where f^k_average is the average fitness of all particles at the kth iteration and f^k_min is the minimum fitness of all particles at the kth iteration.
To take further advantage of the global search capability of the particle swarm, this paper introduces the elimination mechanism of the genetic algorithm into the standard PSO algorithm [32]. After the adaptive adjustment of the inertia weights, the particles are ranked by their fitness values in the current round; the particles ranked in the bottom 50% are eliminated, while the particles ranked in the top 50% define the boundary. A new 50% of the particles are randomly generated within this boundary and combined with the retained top 50% to form a new population for the next round of iterations, thus improving the global optimization capability of the algorithm. When the IPSO algorithm is used to optimize the network structure parameters, the positions of the particles represent the number of LSTM layers, the number of DENSE layers, and the number of neurons in each layer of the prediction model. The flowchart of the IPSO algorithm for optimizing the network structure parameters is shown in Figure 3.
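The following is a minimal, self-contained sketch of a PSO with adaptive inertia weights and the 50% elimination step described above. Several details are assumptions rather than the paper's method: the exact form of the adaptive rule (Equation (10) is not reproduced here), the lower inertia bound `w_min=0.4`, treating lower fitness as better, resetting velocity and personal best for regenerated particles, and the toy `sphere` objective that stands in for the validation error of a network.

```python
import random


def ipso(fitness, bounds, n_particles=10, n_iter=200, c1=2.0, c2=2.0,
         w_max=0.9, w_min=0.4, seed=0):
    """Minimize `fitness` over the box `bounds` with an IPSO-style search."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_point(box):
        return [rng.uniform(lo, hi) for lo, hi in box]

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    pos = [random_point(bounds) for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    fit = [fitness(p) for p in pos]
    pbest, pbest_fit = [p[:] for p in pos], fit[:]
    best = min(range(n_particles), key=lambda i: fit[i])
    gbest, gbest_fit = pos[best][:], fit[best]

    for _ in range(n_iter):
        f_avg, f_min = sum(fit) / n_particles, min(fit)
        for i in range(n_particles):
            # Assumed adaptive rule: better-than-average particles get a
            # smaller inertia weight, the rest keep w_max.
            if fit[i] <= f_avg and f_avg > f_min:
                w = w_min + (w_max - w_min) * (fit[i] - f_min) / (f_avg - f_min)
            else:
                w = w_max
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
            pos[i] = clip([x + v for x, v in zip(pos[i], vel[i])])
            fit[i] = fitness(pos[i])
            if fit[i] < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit[i]
            if fit[i] < gbest_fit:
                gbest, gbest_fit = pos[i][:], fit[i]

        # Elimination: keep the better half, resample the worse half inside
        # the bounding box spanned by the survivors.
        order = sorted(range(n_particles), key=lambda i: fit[i])
        keep, drop = order[:n_particles // 2], order[n_particles // 2:]
        box = [(min(pos[i][d] for i in keep), max(pos[i][d] for i in keep))
               for d in range(dim)]
        for i in drop:
            pos[i] = random_point(box)
            vel[i] = [0.0] * dim
            fit[i] = fitness(pos[i])
            pbest[i], pbest_fit[i] = pos[i][:], fit[i]
            if fit[i] < gbest_fit:
                gbest, gbest_fit = pos[i][:], fit[i]
    return gbest, gbest_fit


def sphere(x):
    """Toy objective standing in for the validation MSE of a network."""
    return sum(v * v for v in x)


best_x, best_f = ipso(sphere, bounds=[(-5.0, 5.0)] * 3, n_iter=50)
print(best_x, best_f)
```

In a real run, `fitness` would train a candidate network and return its mean square error, so each evaluation is expensive; that cost is one reason to keep the swarm small.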

VMD-CNN-IPSO-LSTM Prediction Model
In this paper, the VMD-CNN-IPSO-LSTM prediction model is used for short-term wind power prediction. The procedure is divided into six steps, and the prediction process is shown in Figure 4.
Step 1: Remove and repair the outliers and missing values of the data set, then decompose the original wind speed and wind power using VMD;
Step 2: After obtaining the sub-sequences, reconstruct new datasets with the other correlated meteorological features (temperature, humidity) and normalize the datasets;
Step 3: Construct a CNN-LSTM combined prediction model for each dataset;
Step 4: In each model to be trained, optimize the number of LSTM layers, the number of DENSE layers, and the number of neurons in each layer using IPSO;
Step 5: Use the mean square error between the true and predicted values as the fitness function, obtain the optimal network structure parameters of each prediction model according to the fitness value, and substitute them back into the corresponding prediction model;
Step 6: Add all prediction components to obtain the final wind power.
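Steps 2 and 6 above can be illustrated with a short sketch. Min-max scaling is an assumed choice of normalization (the paper does not state which one it uses), the helper names are hypothetical, and the toy mode values stand in for the per-mode model outputs.

```python
def min_max_scale(values):
    """Scale a sequence to [0, 1]; return the scaled values and (min, max)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values], (lo, hi)
    return [(v - lo) / span for v in values], (lo, hi)


def inverse_scale(scaled, lo_hi):
    """Undo min_max_scale using the stored (min, max) pair."""
    lo, hi = lo_hi
    return [s * (hi - lo) + lo for s in scaled]


def superimpose(component_predictions):
    """Step 6: add the per-mode predictions element-wise."""
    return [sum(step) for step in zip(*component_predictions)]


# Three toy wind-power modes; pretend each per-mode model predicts perfectly
# in the normalized space, so the round trip can be checked.
modes = [[10.0, 12.0, 11.0], [1.0, -1.0, 0.5], [0.2, 0.1, -0.3]]
predictions = []
for mode in modes:
    scaled, lo_hi = min_max_scale(mode)
    predictions.append(inverse_scale(scaled, lo_hi))
total = superimpose(predictions)
print(total)  # approximately [11.2, 11.1, 11.2]
```

Each mode keeps its own scaling parameters, so predictions must be mapped back to physical units before they are summed.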


Evaluation Metrics
In this paper, four evaluation metrics are used to measure the prediction performance of the model: mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and adjusted R-square (adj-R²). The smaller the values of MAE, RMSE, and MAPE, the better the prediction performance of the model. The value of adj-R² lies in [0, 1], and the closer it is to 1, the better the model performs.
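The four metrics can be computed as in the sketch below, which uses their conventional textbook definitions; the paper's own equations may differ in notation. The argument `n_features` (the number of input features used in the adjusted R-square correction) and the toy values are assumptions of this example.

```python
import math


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; undefined if a true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def adjusted_r2(y_true, y_pred, n_features):
    """R-square penalized for the number of input features."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)


y_true = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
y_pred = [12.0, 18.0, 33.0, 37.0, 50.0, 62.0]
print(mae(y_true, y_pred))             # 2.0
print(rmse(y_true, y_pred))            # about 2.236
print(mape(y_true, y_pred))            # about 8.472
print(adjusted_r2(y_true, y_pred, 4))  # about 0.914
```

Because MAPE divides by the true value, samples where the measured wind power is zero need separate handling before this metric is applied.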

Case Study
In this paper, simulations are conducted with historical data from a wind farm in Guangdong, and the dataset includes wind speed, temperature, humidity, and wind power. The dataset covers 1 January 2019 to 31 January 2019, with a sampling interval of 15 min. The first 29 days are used as the training set, and the last 2 days as the test set.
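As a quick check of the sample counts implied by this split, the sketch below performs a chronological split by whole days. The helper name `split_by_days` and the placeholder `records` list are hypothetical.

```python
SAMPLES_PER_DAY = 24 * 60 // 15  # 15-minute sampling gives 96 samples per day


def split_by_days(records, train_days, test_days):
    """Chronological split: first `train_days` for training, next `test_days` for testing."""
    n_train = train_days * SAMPLES_PER_DAY
    n_test = test_days * SAMPLES_PER_DAY
    if len(records) < n_train + n_test:
        raise ValueError("not enough records for the requested split")
    return records[:n_train], records[n_train:n_train + n_test]


records = list(range(31 * SAMPLES_PER_DAY))  # placeholder for 31 days of January rows
train, test = split_by_days(records, train_days=29, test_days=2)
print(len(train), len(test))  # 2784 192
```

The split is kept chronological rather than shuffled so that the test set consists of the final 48 h, which the model never sees during training.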

Decomposition of Wind Speed and Wind Power Series Using VMD
The wind speed and wind power series are decomposed using VMD to reduce the inherent volatility of the original series. The decomposition results for wind speed and wind power are shown in Figures 5 and 6, respectively. After VMD processing, wind speed and wind power are each decomposed into five modes, and the decomposed modes maintain the characteristics of the original series while reducing their non-smoothness.
Before presenting the model, three prediction models (standard LSTM, EMD-LSTM, and VMD-LSTM) are introduced for comparison. Figure 7 shows that the prediction curves after EMD and VMD decomposition improve on the undecomposed prediction curve and are closer to the true values in terms of trend. Table 3 gives the MAE, RMSE, MAPE, and adj-R² error analysis for the predictions of the three models. The LSTM model without data decomposition is the worst in MAE, RMSE, and adj-R², so it is reasonable to apply data decomposition as data pre-processing; meanwhile, the prediction model with VMD is more accurate than that with EMD, which indicates that VMD allows the model to learn deeper information from the data set used in this paper.
The decomposed wind speed and wind power sub-series were combined with the other feature variables to reconstruct the data into five datasets, and five CNN-LSTM combined prediction models were constructed accordingly. The combined neural network contains four 1D-CNN layers with 4, 4, 8, and 16 filters, respectively, and the filter step size is 2.
The number of IPSO training iterations is set to 200; the number of particles is set to 10; the initial inertia weight ω is set to 0.9; and c_1 and c_2 are set to 2.0. For the five combined neural networks, the numbers of LSTM layers and DENSE layers are each searched in the range [1, 3] and the number of neurons in each layer in the range [8, 96]; the optimal numbers of layers and neurons are found by IPSO. After 200 generations of training, the optimal network structure parameters of the five prediction models are shown in Table 4.
To verify the effectiveness of this model, the VMD-LSTM model and the VMD-CNN-LSTM model are introduced for comparison. Both models have two LSTM layers and two DENSE layers; the two LSTM layers have 32 and 16 neurons, and the two DENSE layers have 100 and 50 neurons, respectively.
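One way to connect a continuous particle position to the IPSO search space (1 to 3 LSTM and DENSE layers, 8 to 96 neurons per layer) is to round and clamp each coordinate, as sketched below. The position layout and the function name `decode_particle` are assumptions made for illustration; the paper does not specify its encoding.

```python
def decode_particle(position, layer_range=(1, 3), neuron_range=(8, 96), max_layers=3):
    """Map a continuous particle position to an integer network structure.

    Assumed layout: [n_lstm, n_dense, lstm units x max_layers, dense units x max_layers].
    """
    def to_int(value, bounds):
        lo, hi = bounds
        return min(max(int(round(value)), lo), hi)

    n_lstm = to_int(position[0], layer_range)
    n_dense = to_int(position[1], layer_range)
    lstm_units = [to_int(v, neuron_range) for v in position[2:2 + max_layers]]
    dense_units = [to_int(v, neuron_range)
                   for v in position[2 + max_layers:2 + 2 * max_layers]]
    # Only the first n_lstm / n_dense unit counts are used by the network.
    return {"lstm_units": lstm_units[:n_lstm], "dense_units": dense_units[:n_dense]}


structure = decode_particle([2.4, 0.7, 31.6, 120.0, 8.2, 64.0, 50.2, 3.0])
print(structure)  # {'lstm_units': [32, 96], 'dense_units': [64]}
```

With this layout every particle has the same dimension regardless of how many layers it encodes, which keeps the velocity and position updates uniform across the swarm.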
Figure 8 shows the predicted values of each model compared with the true values. The predicted values obtained by the proposed model fit the true values best, demonstrating the best prediction performance. Table 5 shows the prediction performance of each model under the evaluation metrics. Firstly, the accuracy of the model is improved after VMD processing. Using CNN to further capture the data features after VMD processing leads to still better prediction performance. However, manually set model structure parameters are not necessarily optimal, so IPSO is used to search for the optimal structure parameters of the prediction model corresponding to each VMD-CNN processed dataset. After the optimal parameters in Table 4 are substituted back into each prediction model, all the metrics of the VMD-CNN-IPSO-LSTM model are the best among the compared models. Compared with the VMD-CNN-LSTM model, the MAE is reduced by 25.7% and the RMSE by 20.87%, and the MAPE also shows a substantial decrease relative to the comparison models, while adj-R² = 0.97349 also indicates that the proposed model is helpful for short-term wind power prediction.

Verify the Generalization Ability of the Model under the New Data Set
To further evaluate the prediction performance and the generalization ability of this model, it was validated under different meteorological conditions at the same wind farm. Considering the large differences in weather in Guangdong between January and June, the period from 1 June 2019 to 30 June 2019 was chosen as the new data set for simulation.
The optimal network structure parameters after IPSO optimization for the five prediction models on the new dataset are shown in Table 6, while the wind power curves of the three models and the real values over 48 h are shown in Figure 9. Since this model adopts a deep neural network, it can capture the temporal correlation between the features in the new data set and thus quickly learn the coupling relationships implied by each feature. The prediction curve of the VMD-CNN-IPSO-LSTM model still fits the real wind power curve well in terms of both trend and turning points, as seen in Figure 9, demonstrating that the prediction performance of the model remains robust on the new data set. The prediction error analysis of each model on the new data set is given in Table 7. In the new data set, the MAE, RMSE, MAPE, and adj-R² of all three models have increased. However, the VMD-CNN-IPSO-LSTM model still outperforms the comparison models in all error metrics. Compared with the model processed by VMD-CNN only, the MAE of this model is reduced by 8.55%, the RMSE by 10.5%, and the MAPE by 43.1%. Additionally, the adj-R² of the proposed model is greater than 0.96 for both the January and June data sets, which indicates that the VMD-CNN-IPSO-LSTM model has strong generalization ability for short-term wind power prediction.

Conclusions
In this paper, a hybrid short-term wind power prediction model based on data decomposition (VMD) and a combined deep neural network (CNN-LSTM) is proposed for wind speed and wind power series, taking their nonlinearity and volatility into account. The effectiveness of the model is confirmed through comparative simulation analysis with data from a wind farm in Guangdong, China. The main observations can be summarized as follows:
(1) The volatility and noise of the wind speed and wind power series are effectively reduced after variational mode decomposition;
(2) The combined deep neural network has a strong ability to learn the implicit features of the data. However, its overall computation time is longer than that of a single model, so it is important to choose the number of decomposed modes reasonably; too many or too few modes will affect the hybrid model;
(3) The network structure parameters of the prediction model can be optimized by the improved PSO algorithm to further enhance prediction accuracy;
(4) To verify the generalization potential of the proposed model, the output of the same wind farm in two months with large climate differences is investigated. The analysis of the prediction results confirms that the model has strong generalization ability;
(5) For future research, the authors intend to use a multi-step decomposition model in which the sub-sequences obtained from the decomposition are passed through different neural networks for feature extraction, such as graph convolutional neural networks for implicit feature mining, and to examine through simulation whether this improves accuracy over the model proposed in this paper.

Figure 3. Flow chart of optimizing parameters for combined neural network structure by IPSO.

Figure 4. Flow chart of the prediction model.

Figure 5. Decomposed sub-sequences of wind speed by VMD.

Figure 6. Decomposed sub-sequences of wind power by VMD.

Figure 7. Predicted and real values for different decomposition models.

Figure 8. Wind power predicted value and real value of each model.

Figure 9. Wind power predicted value and real value of each model in the new data set.

Table 1. Merits and demerits of different decomposition methods.

Table 2. Cont. Columns: This work; Anfeng Zhu et al. [7]; Chao Zhang et al. [9]; Hao Yin et al. [15]; Rasoul Rahmani et al. [22].
… 1 according to the size … is the current-moment input X_t and the previous-moment output … used to determine which part of X_t can be retained in C_t. The sigmoid function … to generate an output value of size between … function to generate a new candidate cell state …. The cell state … is updated by the combined action of the forget and input gates … is controlled by the output gate o_t. The output h_t of the LSTM …

Table 3. Prediction performance analysis of single model and decomposed combined model.

Table 4. Optimal structural parameters of each prediction model.

Table 5. Analysis of the prediction performance of each model.

Table 6. Optimal structural parameters of each prediction model in the new data set.

Table 7. Prediction performance analysis of each model in the new data set.