A Hybrid Framework for Short Term Multi-Step Wind Speed Forecasting Based on Variational Model Decomposition and Convolutional Neural Network

Wind speed is an important factor in wind power generation. Wind speed forecasting is complicated due to its highly nonstationary character. Therefore, this paper presents a hybrid framework for multi-step wind speed forecasting based on variational mode decomposition and convolutional neural networks. In the first step of signal pre-processing, the variational mode decomposition approach decomposes the wind speed data into several independent modes around different center pulsations. The vibrations of the decomposed modes are useful for accurate wind speed forecasting. Then, the influence of the number of modes and the input length of the convolutional neural network is discussed, and the optimal values are selected by calculating the errors. During the regression step, each mode is treated as a channel of the input to the forecasting model. The convolution operations in convolutional neural networks extract helpful local features in each mode, as well as the relationships between modes, for forecasting. We take advantage of the convolutional neural network and directly output multi-step forecasting results. In order to show the forecasting and generalization performance of the proposed method, wind speed data from two wind farms in Inner Mongolia, China and Sotavento Galicia, Spain with different statistical information were employed. Some classic statistical approaches were adopted for comparison. The experimental results show satisfactory performance for all of the methods in single-step forecasting and the advantages of using decomposed modes. The root mean squared errors range from 0.79 m/s to 1.64 m/s for all of the methods. In the case of multi-step forecasting, our proposed method achieves an outstanding improvement compared with the other methods. The root mean squared error of our proposed method was 1.30 m/s while the worst performance of the other methods was 9.68 m/s.
The proposed method is able to directly predict the variation trend of wind speed based on historical data with minor errors. Hence, the proposed forecasting schemes can be utilized for wind speed multi-step forecasting to cost-effectively manage wind power generation.


Introduction
In recent years, sustainability transitions, which aim to create more sustainable consumption and production for socio-technical systems, have become a significant issue. In the energy sector, the scarcity of fossil fuels and environmental pollution have become important issues for the continued development of human society. According to the current European roadmap, a reduction in greenhouse gas emissions is necessary to limit the temperature increase to the politically agreed two degrees. To achieve this target, renewable energy as a percentage of gross final energy consumption should reach 20% [1]. Therefore, the rapid development of renewable energy resources such as wind, solar and wave energy has been the common choice globally. Among these resources, wind power has been growing fast worldwide and will continue to grow in response to the energy crisis. Like other parameters that affect a power system, such as the load control mode and power system operation, wind speed has a significant influence on wind power systems. However, natural wind speed is nonstationary and random. Therefore, accurate forecasting of wind speed is a significant and challenging task.
Wind forecasting methods can generally be divided into the following categories: the persistence method, the physical approach, the statistical approach and the hybrid approach [2]. The persistence method is suitable for very short to short term forecasting. It assumes that the wind speed at the next time step will be the same as the current value. Although the performance of this method for long-term forecasting is unsatisfactory, it can still serve as a benchmark for comparison. In physical methods, numerical weather prediction (NWP) models are used to forecast wind speed and climate variables. Landberg utilized a Wind Atlas Application and Analysis program for corrections of wind speed predictions [3]. Lazić et al. applied the regional atmospheric numerical weather prediction Eta model and effectively predicted wind power [4]. Negnevitsky et al. introduced an adaptive neuro-fuzzy inference system to forecast a wind speed time series and obtained accurate predictions when weather conditions were stable [5]. However, these physical models consume a large amount of computing time and the forecasting errors can be significant under complex conditions.
Statistical methods consider the relationship between the forecast wind speed and historical data. These models, including time-series approaches and machine learning approaches, can be collectively called data-driven models. Time-series approaches such as autoregressive moving average models [6], autoregressive integrated moving average models [7] and others are widely adopted in the literature [8]. The models mentioned above are based on the premise that the wind speed or power follows a normal distribution. Considering the randomness and non-stationarity of wind speed, it is difficult to make accurate long-term predictions with these time-series approaches. Machine learning approaches such as artificial neural networks (ANN) [7,9], recurrent neural networks (RNN) [10][11][12], extreme learning machine (ELM) [13][14][15] and support vector regression (SVR) [15,16] exhibit great nonlinear fitting ability through modeling from the historical data. In addition, signal processing approaches have also been applied to improve the performance of machine learning forecasting results. Wang et al. proposed a three-phase signal decomposition technique to decompose wind speed and predicted multi-step ahead wind speed through feature extraction and a weighted regularized extreme learning machine. Four real wind speed prediction cases verified the effectiveness of their proposed hybrid model [17]. Liu et al. investigated hybrid methods using wavelet packet decomposition (WPD), empirical mode decomposition (EMD) and the ELM for wind speed prediction. Wind speed series were decomposed into low frequency and high frequency sub-layers. Different models including data-driven models and hybrid models were compared with each other and the results indicated that the hybrid model had the best predicting performance [18]. Barbounis et al. employed meteorological information and different networks to deal with the problem of long-term wind speed and power forecasting. Simulation results demonstrated that the recurrent models outperformed the static ones [10]. Senjyu et al. also confirmed the validity of neural networks (NN) for predicting wind speed by computer simulations and proposed an application of RNN for wind speed prediction [19]. Abdoos decomposed wind speed through variational mode decomposition (VMD) and selected features based on Gram-Schmidt orthogonalization. Then, an ELM was trained using the selected features for efficient and fast prediction. The results justified the superiority of their proposed method in accurate forecasting and saving computational time [20]. Naik et al. combined variational mode decomposition with low rank multi-kernel ridge regression for short-term forecasting. They constructed prediction intervals with different confidence levels for wind speed and wind power [21].
Although these methods achieved some satisfactory predicting results, most of them predicted single-step points owing to the model structure. The accumulated errors of the aforementioned methods may be significant when forecasting multi-step data through iterative prediction. Meanwhile, predictive control and the dispatched mode for renewable power have been combined with multi-step forecasting. The precise results obtained from multi-step forecasting can help power systems make dispatch plans ahead and improve their competitiveness in energy markets. The peak-shaving and valley-filling ability of some renewable energy sources also benefits from precise multi-step forecasting and creates more potential financial benefits. Therefore, multi-step forecasting has become one of the research "hotspots" in wind speed forecasting.
Recently, convolutional neural networks (CNNs) have shown an outstanding ability to discover useful representations for classification and regression tasks such as generative modelling [22], image classification [23], fault diagnosis [24], sentence modelling [25], and so on. The output of the convolutional layer is multidimensional, which offers the potential to address the challenges of multi-step wind speed forecasting. In this paper, a hybrid framework is proposed to predict the wind speed for a period of time instead of one point in time. The VMD approach is applied to decompose the wind speed series into modes that serve as different channels of the input. The decomposition of wind speed helps to enhance the forecasting performance. Then, the CNN is developed to address the problem of multi-step wind speed forecasting. We compared the single-step and multi-step forecasting results for wind speed data from two wind farms in different areas against several classical benchmark methods, and the results demonstrated the advantage of our proposed method.
The main contributions of our work for the field of multi-step wind speed forecasting are as follows:
(1) This work combined a signal decomposition approach with deep learning methods to improve the multi-step predicting accuracy. Compared with the CNN method without decomposition, our proposed VMD-CNN method performed better in both multi-step and single-step forecasting. The simpler vibration modes obtained through VMD brought about more precise predictions. Owing to the advantages of the CNN, the output layer can be directly mapped to the data in the next period of time. Experiments showed the outstanding fitting ability of the proposed method for multi-step forecasting compared with other methods.
(2) The proposed method provided a way to consider the local features and relationships of the decomposed modes. Common hybrid approaches train independent models for each decomposed mode and simply aggregate the forecasting results. The convolutional operation in the CNN can learn the correlations between modes by integrating the decomposed modes into the input of one model, which enhanced the performance of multi-step forecasting.
The remainder of this paper is organized as follows. In Section 2, the basic methodology of VMD and CNNs is reviewed. In Section 3, we introduce the system framework and forecasting model in detail. The wind speed forecasting procedure is also illustrated. Then, in Section 4, the time series of wind speed from different wind farms are applied to demonstrate the validity of the proposed method. Finally, Section 5 presents our conclusions.

Variational Mode Decomposition
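VMD was first proposed by Dragomiretskiy and Zosso in 2014 as a signal processing method [26]. It is a state-of-the-art adaptive and quasi-orthogonal method that decomposes the signal $f(t)$ into independent modes $u_k$ ($k = 1, 2, \ldots, K$). Each mode is compact around a center pulsation $\omega_k$, and $H^1$ Gaussian smoothness is utilized to estimate the bandwidth. The VMD can be formulated as a constrained variational problem as follows:
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{1}$$
where $\delta(t)$ is the Dirac distribution, $\partial_t$ is the partial derivative with respect to time and $*$ denotes convolution. A quadratic penalty term and a Lagrangian multiplier $\lambda$ are introduced to translate the problem into an unconstrained one:
$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{2}$$
where $\alpha$ denotes the balancing parameter of the data-fidelity constraint. In order to solve this problem, the alternate direction method of multipliers (ADMM) is adopted and the iterative process can be written as:
(1) Minimization of $u_k$:
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2} \tag{3}$$
(2) Minimization of $\omega_k$:
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k(\omega) \right|^2 d\omega} \tag{4}$$
where $\hat{u}_k^{n+1}(\omega)$, $\hat{u}_i(\omega)$, $\hat{f}(\omega)$ and $\hat{\lambda}(\omega)$ are the Fourier transforms of $u_k^{n+1}(t)$, $u_i(t)$, $f(t)$ and $\lambda(t)$, respectively, and $n$ is the number of iterations. The main procedures of VMD can be summarized as follows:
Step 1: Initialize $\{\hat{u}_k^1\}$, $\{\omega_k^1\}$, $\hat{\lambda}^1$ and set $n = 0$;
Step 2: Update $u_k$ and $\omega_k$ according to Equations (3) and (4), respectively;
Step 3: Update $\lambda$ by the following equation:
$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right) \tag{5}$$
where $\tau$ is the update parameter;
Step 4: Repeat Steps 2 and 3 until the convergence criterion $\sum_{k=1}^{K} \left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2 / \left\| \hat{u}_k^{n} \right\|_2^2 < \varepsilon$ is satisfied.
Similar to many other signal processing methods, some parameters of VMD need to be determined in advance, such as the mode number $K$, the mode frequency bandwidth control parameter $\alpha$, the noise tolerance $\tau$, the tolerance of the convergence criterion $\varepsilon$ and the maximum number of iterations $N$. Many studies in the literature have shown that these parameters have significant impacts on noise robustness and decomposition efficiency. When the parameter $K$ is set too low, under-segmentation may appear: a few modes may be merged into other modes or disappear. Conversely, the time-frequency distributions of the modes may overlap each other when the signal is decomposed into too many modes. In addition, the parameter $\alpha$ is related to the data-fidelity constraint. Therefore, many scholars have combined intelligent algorithms to search for optimal VMD parameters [13,[27][28][29]. In this paper, the employed deep learning method has great adaptivity and is insensitive to the different parameters. The influence of the parameters was investigated and is discussed in Section 4.2.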
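The update loop summarized above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the paper: it works on the one-sided real FFT, skips the mirror extension of the signal that full VMD implementations commonly apply, and the function name `vmd_sketch`, the default values of `alpha`, `tau` and `eps_tol`, and the uniform initialization of the center frequencies are all assumptions made for the example.

```python
import numpy as np

def vmd_sketch(f, K=4, alpha=2000.0, tau=0.0, eps_tol=1e-7, max_iter=500):
    """Simplified VMD. Returns (modes, center frequencies in cycles/sample)."""
    f = np.asarray(f, dtype=float)
    T = f.size
    freqs = np.fft.rfftfreq(T)            # one-sided frequency axis, 0 .. 0.5
    f_hat = np.fft.rfft(f)                # one-sided spectrum of the signal
    u_hat = np.zeros((K, freqs.size), dtype=complex)
    omega = 0.5 * np.arange(K) / K        # assumed uniform initialization
    lam = np.zeros(freqs.size, dtype=complex)
    tiny = np.finfo(float).eps            # guards against division by zero
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Mode update: residual filtered around the current center frequency.
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Center frequency update: power-weighted mean frequency of the mode.
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + tiny)
        # Multiplier update (tau = 0 disables exact-reconstruction enforcement).
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        # Relative change of the modes, used as the convergence criterion.
        change = sum(
            np.sum(np.abs(u_hat[k] - u_prev[k]) ** 2)
            / (np.sum(np.abs(u_prev[k]) ** 2) + tiny)
            for k in range(K)
        )
        if change < eps_tol:
            break
    order = np.argsort(omega)             # sort modes from low to high frequency
    modes = np.array([np.fft.irfft(u_hat[k], n=T) for k in order])
    return modes, omega[order]

# Toy signal: two cosines at 0.02 and 0.2 cycles/sample (not wind data).
t = np.arange(500)
signal = np.cos(2 * np.pi * 0.02 * t) + 0.5 * np.cos(2 * np.pi * 0.2 * t)
modes, centers = vmd_sketch(signal, K=2)
print(modes.shape, centers)
```

On this toy signal the two recovered center frequencies should settle near the two cosine frequencies, which is a quick sanity check before applying the routine to real wind speed data.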

Convolutional Neural Networks
CNNs have proved highly capable in classification and regression tasks. They can extract helpful local features from the input through the convolution operation. More details about CNNs can be found in the literature [23]. The layers adopted in our forecasting model are described below.

Convolutional Layer
The convolutional layers in the convolutional neural network can be regarded as filters through the convolution kernels [23]. In each filter, neurons multiply the data points by their weights. The kernels in each filter share weights to reduce the computation overhead. One filter extracts one frame for the next layer. Assume the local region of layer $l$ is $X^{l(r_j)}$ and the weights of the $i$-th filter kernel are $W_i^l$; then the corresponding output can be calculated as follows:
$$y^{l(i,j)} = W_i^l * X^{l(r_j)}$$
where $*$ denotes the convolutional operation.
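As an illustration of the operation described above, the following NumPy sketch applies a bank of filters to a multi-channel 1-D input. It is a hedged example rather than the paper's code: the function name `conv1d`, the "valid" (no padding) boundary handling and the array layouts are assumptions, and, as in most deep learning libraries, the kernel is applied without flipping (cross-correlation).

```python
import numpy as np

def conv1d(x, kernels, stride=1):
    """1-D convolution layer without padding.

    x:       array of shape (length, channels)
    kernels: array of shape (n_filters, width, channels)
    returns: array of shape (n_out, n_filters)
    """
    length, channels = x.shape
    n_filters, width, k_channels = kernels.shape
    assert channels == k_channels, "kernel and input channel counts must match"
    n_out = (length - width) // stride + 1
    y = np.empty((n_out, n_filters))
    for j in range(n_out):
        region = x[j * stride : j * stride + width]   # local region r_j
        # Every filter reuses the same weights at each position (weight sharing).
        y[j] = np.tensordot(kernels, region, axes=([1, 2], [0, 1]))
    return y

# Four input channels (e.g. four decomposed modes), 16 filters of width 3.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4))
w = rng.normal(scale=0.1, size=(16, 3, 4))
print(conv1d(x, w, stride=1).shape)   # one output frame per filter
```

Because each kernel spans all channels, every output value mixes information from all modes within the local window, which is how a single model can pick up relationships between modes.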

Activation Layer
Nonlinear activation functions, such as the hyperbolic tangent, sigmoid and rectified linear unit (ReLU) functions, are usually applied to the output of each layer to enhance the representation ability. In recent years, ReLU has been widely used to make the network more trainable. The formula of ReLU can be written as follows:
$$a^{l(i,j)} = \max\left\{0,\; y^{l(i,j)}\right\}$$
where $a^{l(i,j)}$ is the activation output of $y^{l(i,j)}$.

Flatten Layer
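The input and output of convolutional layers are multidimensional. To obtain the multi-step forecasting output, flatten layers are used to flatten the input matrices into vectors.

Suppose the input is $X = \begin{bmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{m1} & \cdots & x_{mn} \end{bmatrix}$; the output of the flatten layer will be $[x_{11}, \ldots, x_{1n}, x_{21}, \ldots, x_{mn}]$.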

Upsampling Layer
The convolution operation in a CNN leads to dimension reduction compared to the input data. Similar to some interpolation methods, the upsampling layer repeats the input matrices along a chosen axis to obtain an output of the expected dimension. For example, we can upsample the input $X = [x_1; x_2; \ldots; x_m]$, with rows $x_i$, on the first dimension by step $s$, and the corresponding output is $[x_1; \ldots; x_1; x_2; \ldots; x_2; \ldots; x_m; \ldots; x_m]$, in which each row is repeated $s$ times.
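The activation, flatten and upsampling operations described in this section are simple enough to write directly in NumPy. The sketch below is illustrative only; the helper names are hypothetical, and the row-by-row flattening order and the repeat-each-row form of upsampling are assumptions about the intended behavior.

```python
import numpy as np

def relu(y):
    """Rectified linear unit: negative values are set to zero."""
    return np.maximum(y, 0.0)

def flatten(x):
    """Flatten a matrix into a vector, row by row."""
    return np.asarray(x).reshape(-1)

def upsample(x, s):
    """Repeat each entry along the first axis s times."""
    return np.repeat(x, s, axis=0)

x = np.array([[1.0, -2.0],
              [-3.0, 4.0]])
print(relu(x))         # [[1. 0.] [0. 4.]]
print(flatten(x))      # [ 1. -2. -3.  4.]
print(upsample(x, 2))  # rows become x_1, x_1, x_2, x_2
```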

The Novel Hybrid Variational Mode Decomposition (VMD)-Convolutional Neural Network (CNN) Data-Driven Model
The flowchart of the proposed method is illustrated in Figure 1, in which the signal processing and artificial intelligence approaches are systematically integrated to predict wind speed. The original data are decomposed into different modes, which constitute the input data and determine the architecture of the CNN forecasting model. Then, the input data set is divided into training and testing data sets to train the CNN forecasting model and evaluate the forecasting performance, respectively.

Data Decomposition
Wind speed time series often contain different frequency components, including the trend term, cycle term and stochastic term, which hinders accurate modeling and forecasting. Figure 2 shows the historical wind speed data from a wind farm in Inner Mongolia, China together with the four modes obtained through VMD. The original wind speed is extremely random and complex. Each decomposed mode is compact around a center pulsation, and the higher order modes contain higher frequency components. The features of the trend components can be extracted from the low frequency modes. The cycle term and stochastic term of wind speed can be predicted through modeling of the higher order modes.

Constitution of Input and Output Matrices
After decomposition of the wind speed data, modes in different frequency scales are obtained. As introduced in Section 2, the convolutional layers in a CNN can be regarded as filters that extract local features from the input data. Therefore, the input and output matrices are constituted intuitively, as illustrated in Figure 3. Let $M_k = [m_{k,1}, m_{k,2}, \ldots, m_{k,L}] \in \mathbb{R}^{1 \times L}$, $k = 1, 2, \ldots, K$, denote the $k$-th decomposed mode and $S = [s_1, s_2, \ldots, s_L] \in \mathbb{R}^{1 \times L}$ denote the original wind speed signal, where $L$ is the total number of observed points. Then, the input matrix Input and output matrix Output can be formulated as follows:
$$\mathrm{Input}_i = \begin{bmatrix} m_{1,i} & m_{1,i+1} & \cdots & m_{1,i+i_l-1} \\ m_{2,i} & m_{2,i+1} & \cdots & m_{2,i+i_l-1} \\ \vdots & \vdots & \ddots & \vdots \\ m_{K,i} & m_{K,i+1} & \cdots & m_{K,i+i_l-1} \end{bmatrix}, \qquad \mathrm{Output}_i = \left[ s_{i+i_l},\, s_{i+i_l+1},\, \ldots,\, s_{i+i_l+o_l-1} \right]$$
where $i_l$ and $o_l$ denote the input length and output length, respectively, and $i$ is the index of the sampling point.
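The construction of input and output samples amounts to sliding a window over the decomposed modes and the original signal. The following sketch shows one plausible way to do this in NumPy; the function name `build_samples`, the exact index offsets and the (time steps, channels) layout of each input sample are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def build_samples(modes, signal, input_len, output_len):
    """Build sliding-window samples.

    modes:  array of shape (K, L), one decomposed mode per row
    signal: array of shape (L,), the original wind speed series
    returns X of shape (n, input_len, K) and Y of shape (n, output_len)
    """
    modes = np.asarray(modes, dtype=float)
    signal = np.asarray(signal, dtype=float)
    K, L = modes.shape
    n = L - input_len - output_len + 1   # number of complete windows
    # Each input holds input_len consecutive points of all K modes (as channels).
    X = np.stack([modes[:, i : i + input_len].T for i in range(n)])
    # Each target holds the output_len points of the original signal that follow.
    Y = np.stack([signal[i + input_len : i + input_len + output_len] for i in range(n)])
    return X, Y

# Toy data: 2 modes of length 10, input length 4, output length 2.
toy_modes = np.arange(20).reshape(2, 10)
toy_signal = np.arange(10)
X, Y = build_samples(toy_modes, toy_signal, input_len=4, output_len=2)
print(X.shape, Y.shape)   # (5, 4, 2) (5, 2)
```

Note that the targets are taken from the original signal rather than from the modes, so the network learns the mapping from decomposed history to undecomposed future values in one step.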

Forecasting Model Structure
In this paper, a 1-D CNN is adopted for wind speed forecasting. Common CNNs process images and accept input tensors in three dimensions, which are the width, height and number of color channels of the images. Similarly, the 1-D CNN accepts time-series input in two dimensions: the time steps and the number of channels. In fact, when the height of the images is 1, the common CNN reduces to a 1-D CNN. The CNN structure adopted in this paper is shown in Figure 4. The four channels in the input data match the four decomposed modes of the wind speed signal. Then, multiple filters extract features at different scales to capture the mapping between input data and output data. Strided convolution is employed instead of pooling (e.g., max pooling) because strided convolution is fully differentiable and allows the network to learn its own downsampling [17].

Data Description
In this paper, wind speed observation data gathered from a wind farm in Inner Mongolia, China were employed to demonstrate the efficiency of the proposed model. There was a total of 5760 points with a sampling interval of half an hour. We chose the first 4600 observations as the training set, and the remaining points were used to test the model performance. The statistical information for each data set is given in Table 1. Before training, the training set was normalized to enhance the CNN training performance. In addition, several classical statistical methods and their hybrid variants were constructed for comparison with our proposed model.
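The split and normalization step can be sketched as follows. The text does not state which normalization was used, so min-max scaling with statistics taken from the training set only is an assumption here, as are the function name `split_and_normalize` and the synthetic Weibull-distributed series that stands in for the real measurements.

```python
import numpy as np

def split_and_normalize(series, n_train=4600):
    """Chronological split followed by min-max scaling fitted on the training part."""
    series = np.asarray(series, dtype=float)
    train, test = series[:n_train], series[n_train:]
    lo, hi = train.min(), train.max()
    if hi == lo:
        raise ValueError("training series is constant; cannot scale")
    def scale(x):
        return (x - lo) / (hi - lo)
    # The test set reuses the training statistics to avoid information leakage.
    return scale(train), scale(test), (lo, hi)

# Synthetic stand-in for 5760 half-hourly wind speed observations (m/s).
rng = np.random.default_rng(0)
series = 8.0 * rng.weibull(2.0, size=5760)
train_n, test_n, (lo, hi) = split_and_normalize(series)
print(len(train_n), len(test_n))   # 4600 1160
```

Predictions made in the scaled space can be mapped back to m/s with `pred * (hi - lo) + lo` before computing errors.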

Model Establishment
Two parameters mainly influence the forecasting results of the proposed method: the number of decomposed modes and the input length. The training data set was used to determine the structure of the CNN. All the following experiments were run in Python 2.7. The Central Processing Unit (CPU) of the runtime environment was an Intel Xeon E5-2650 v2 and the size of the Random Access Memory (RAM) was 128 GB. The training data set was processed as described in Section 3. The parameters of each layer in the network were initialized by sampling from a Gaussian distribution with zero mean and a standard deviation of 0.1. In each experiment, the number of training iterations and the output predicting length were kept the same, at 100 epochs and 32, respectively. Considering that the multi-step prediction output is a vector, the root mean squared error (RMSE) defined below is adopted to judge the predicting performance:
$$\mathrm{RMSE} = \sqrt{\frac{1}{M\,o_l} \sum_{m=1}^{M} \sum_{d=1}^{o_l} \left( \hat{s}_d^m - s_d^m \right)^2} \tag{10}$$
where $M$ is the number of testing samples, $o_l$ is the output length, and $\hat{s}_d^m$ and $s_d^m$ are the $d$-th predicted and true outputs of the $m$-th sample, respectively.
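For vector-valued predictions, the RMSE can be computed over all samples and all forecast steps at once. The snippet below is a minimal sketch that assumes the squared errors are averaged jointly over both axes before taking the square root; the function name `rmse` is hypothetical.

```python
import numpy as np

def rmse(pred, true):
    """Root mean squared error over all samples and all forecast steps."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))

# Two samples with two forecast steps each; only one entry is off (by 2).
pred = [[1.0, 2.0], [3.0, 4.0]]
true = [[1.0, 2.0], [3.0, 6.0]]
print(rmse(pred, true))   # 1.0
```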

Number of Decomposed Modes
As mentioned above, mode mixing may appear and cause inaccurate prediction when the number of decomposed modes is too small. On the other hand, too many modes give rise to a complicated forecasting method and unnecessary computing overhead. To determine the number of channels of the input layer, an experiment was first conducted to observe the influence of the number of decomposed modes. As illustrated in Figure 5a, the error reaches its minimum when the number of decomposed modes is four. As the number of modes increases further, the predicting error increases from 1.59 to 2.78. When the wind speed signal is decomposed into 3 modes, the predicting error increases to 3.13. Therefore, the number of decomposed modes is set to 4 for the rest of the experiments.

Input Length
The determination of the input length is another critical issue, which not only decides the input layer of the adopted forecasting structure but also influences the prediction results. As strided convolution is employed in this model, the input length should be a power of 2, which simplifies construction of the network. Hence, we chose 256, 128, 64 and 32 as the input lengths for comparison. In Figure 5b, the prediction error decreases from 3.43 to 1.81 when the input length is reduced from 256 to 64. This indicates that long input tensors introduce extra noise that hinders accurate forecasting. However, when the input length is the same as the output length, the prediction error increases to 2.11, which means that essential information about the relationship between the current data and historical data is lacking in this situation.
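The preference for power-of-2 input lengths can be seen from the way strided convolution shrinks the time axis. The sketch below is an illustration under stated assumptions: "same" padding, so that each layer maps a length n to ceil(n / stride), a stride of 2, and a hypothetical count of three strided layers (the actual layer configuration is given in Table 2 and is not reproduced here).

```python
import math

def strided_lengths(input_len, n_layers, stride=2):
    """Sequence lengths after each strided convolution with 'same' padding."""
    lengths = [input_len]
    for _ in range(n_layers):
        lengths.append(math.ceil(lengths[-1] / stride))
    return lengths

print(strided_lengths(64, 3))    # [64, 32, 16, 8]
print(strided_lengths(100, 3))   # [100, 50, 25, 13]
```

With an input length of 64 every layer halves the length exactly, so upsampling by the same factor restores the earlier sizes. With a length such as 100 the third layer rounds 12.5 up to 13, and upsampling by 2 would give 26 instead of 25, so the shapes no longer line up.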
Therefore, the details of the adopted model are determined through the experiments described above. The number of decomposed modes is 4 and the input length is set to 64. The details of the forecasting model are shown in Table 2. As illustrated in Figure 5, although the VMD parameters and the input length influence the forecasting results, the CNN-based forecasting model generally achieved satisfactory results. Owing to its strong nonlinear learning and fitting ability, the proposed CNN is significantly better than some hybrid forecasting methods that are sensitive to changes in parameters. The Adam optimization algorithm [30] is adopted to update the parameters in each layer.

Forecasting Results and Analysis
To verify the effectiveness of the proposed VMD-CNN method, the following experiments were conducted to compare it with some existing methods that have proved feasible for wind speed forecasting. The SVR is an efficient machine learning method for regression. The input matrices of the SVRs were determined by partial autocorrelation function (PACF) values [13], and the parameters of the SVRs were selected through grid search (GS). The ELM is a feed-forward network with a single hidden layer and is easy to train due to its fast convergence speed. In the case of ELM forecasting, the input matrices were the same as for the SVRs. GS was also used to optimize the number of hidden nodes, ranging from 20 to 1000. An NN, as a simplification of the CNN, was also employed for comparison. Unlike the SVR and ELM, the output of the NN can have multiple dimensions. The topology of the NN was 64-100-100-32, and both hidden layers used the ReLU activation function. The number of training epochs and the optimization algorithm were the same as for the proposed method. The CNN method without VMD had one channel instead of four and accepted the original data as input. The rest of the CNN layers were the same as in the proposed VMD-CNN model.
In the case of multi-step forecasting, we selected 16 h as the forecast horizon, which means all the methods were tested by generating 32 predictions from historical data. To obtain multi-step prediction results, iterative prediction was employed for the ELM and SVR methods: each forecast point was appended to the input to constitute a new testing example until the length of the predicted output matched that of the proposed method. The output dimension of the NN was directly set to the output length, while in the case of single-step forecasting, the output dimension of the NN was 1 and the outputs of the CNN models were sent to a fully connected layer with one output. For each aforementioned forecasting approach, a VMD-based hybrid method was also built for comparison and to exhibit the effectiveness of decomposition. For example, VMD-SVR means that four SVR models were trained for the four decomposed modes and the prediction results were obtained by aggregating the individual prediction values.
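The iterative prediction scheme used for the single-step models can be sketched as a simple loop in which each forecast is fed back as input. The code is illustrative only: `iterative_forecast` and the two toy one-step models are hypothetical stand-ins, not the SVR or ELM models of the experiments.

```python
import numpy as np

def iterative_forecast(one_step_model, history, steps):
    """Roll a one-step-ahead model forward by feeding predictions back as inputs."""
    window = list(history)
    preds = []
    for _ in range(steps):
        nxt = float(one_step_model(np.asarray(window)))
        preds.append(nxt)
        window = window[1:] + [nxt]   # drop the oldest value, append the forecast
    return np.array(preds)

# Toy one-step models: persistence, and a model with a 5% bias per step.
persistence = lambda w: w[-1]
biased = lambda w: 1.05 * w[-1]

history = [9.0, 9.5, 10.0]
print(iterative_forecast(persistence, history, 4))
print(iterative_forecast(biased, history, 32)[-1])   # bias compounds over 32 steps
```

The biased toy model shows why small one-step errors matter: a 5% error per step is compounded at every iteration, whereas a model that outputs all 32 steps at once does not feed its own errors back into its input.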
Three common evaluation indicators were adopted to assess the performance of the forecasting models. The RMSE is defined in Equation (10), while the mean absolute error (MAE) and mean absolute percentage error (MAPE) are formulated as follows:
$$\mathrm{MAE} = \frac{1}{M\,o_l} \sum_{m=1}^{M} \sum_{d=1}^{o_l} \left| \hat{s}_d^m - s_d^m \right|$$
$$\mathrm{MAPE} = \frac{1}{M\,o_l} \sum_{m=1}^{M} \sum_{d=1}^{o_l} \left| \frac{\hat{s}_d^m - s_d^m}{s_d^m} \right| \times 100\%$$
where $M$ is the number of testing samples, $o_l$ is the output length, and $\hat{s}_d^m$ and $s_d^m$ are the $d$-th predicted and true outputs of the $m$-th sample. In addition, the improvement percentages of RMSE, MAE and MAPE are used to describe intuitively the degree of enhancement from model 1 to model 2. The definitions are as follows:
$$P_{\mathrm{RMSE}} = \frac{\mathrm{RMSE}_1 - \mathrm{RMSE}_2}{\mathrm{RMSE}_1} \times 100\%, \quad P_{\mathrm{MAE}} = \frac{\mathrm{MAE}_1 - \mathrm{MAE}_2}{\mathrm{MAE}_1} \times 100\%, \quad P_{\mathrm{MAPE}} = \frac{\mathrm{MAPE}_1 - \mathrm{MAPE}_2}{\mathrm{MAPE}_1} \times 100\%$$
where $\mathrm{RMSE}_1$, $\mathrm{MAE}_1$, $\mathrm{MAPE}_1$ and $\mathrm{RMSE}_2$, $\mathrm{MAE}_2$, $\mathrm{MAPE}_2$ are the errors of model 1 and model 2, respectively. The final results are shown in Table 3 and illustrated in Figure 6. In the single-step forecasting experiment, all the VMD-based hybrid methods performed better than their non-hybrid counterparts. The RMSE decreases by 0.61, 0.75, 0.35 and 0.18 for SVR, ELM, NN and CNN, respectively, when VMD is added. This shows that signal decomposition is helpful for single-step forecasting because the simpler vibration patterns of the decomposed modes are easier to predict. It can be seen that the VMD-SVR hybrid method performs the best in RMSE, MAE and MAPE. Although the errors of the other methods are greater, the maximum RMSE, MAE and MAPE values among them are still satisfactory, at 1.64, 1.36 and 22.88, respectively. The forecasting results of all the hybrid methods are presented in Figure 7. All of them can approximately fit the test data with very little error. Therefore, multi-step forecasting is more worthy of research and development for short-term wind speed forecasting.
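The remaining indicators and the improvement percentage can be computed as in the sketch below. The function names are hypothetical, the averaging over all samples and steps is an assumption, and MAPE is undefined whenever a true value is zero. The usage line plugs in two RMSE values quoted in the text for multi-step forecasting (6.36 for VMD-SVR and 1.30 for VMD-CNN) purely as a worked example.

```python
import numpy as np

def mae(pred, true):
    """Mean absolute error over all samples and forecast steps."""
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    return float(np.mean(np.abs(pred - true)))

def mape(pred, true):
    """Mean absolute percentage error in percent (true values must be nonzero)."""
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    return float(np.mean(np.abs((pred - true) / true)) * 100.0)

def improvement(err_model_1, err_model_2):
    """Percentage reduction in error when moving from model 1 to model 2."""
    return (err_model_1 - err_model_2) / err_model_1 * 100.0

print(mae([1.0, 2.0], [2.0, 4.0]))    # 1.5
print(mape([1.0, 2.0], [2.0, 4.0]))   # 50.0
print(round(improvement(6.36, 1.30), 2))
```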
In the multi-step forecasting experiment, SVR and ELM, which are single-step prediction methods, underperformed significantly. The iterative prediction approach they require accumulates a small error at each step, which eventually leads to a large error. By contrast, the NN and CNN can directly output multi-dimensional predictions owing to their architecture. Thus, in multi-step forecasting, the NN- and CNN-based methods performed much better than SVR and ELM. As illustrated in Figure 6 and Table 3, the proposed VMD-CNN method achieved the best result among all the methods. The RMSE, MAE and MAPE of VMD-CNN are 1.3 m/s, 1.04 m/s and 20.31%, respectively, which are roughly the same as in the single-step experiment. By contrast, the RMSE, MAE and MAPE of VMD-SVR increase to 6.36 m/s, 5.04 m/s and 89.22%, respectively.

In addition, the percentage improvements of the proposed VMD-CNN method over the other methods are shown in Table 4. The proposed method improves RMSE by 72.22%, 79.33%, 40.91% and 70.18% relative to SVR, ELM, VMD-NN and CNN, respectively. A large improvement also exists for the MAE and MAPE indexes. This result verifies the superiority of the proposed method in multi-step wind speed forecasting. It is worth noting that SVR and ELM perform better than their hybrid VMD-based counterparts, which means that their multi-step forecasting performance on decomposed modes is also not good enough. By contrast, the hybrid VMD-based NN and CNN methods improve RMSE by 52.99% and 72.18% compared to the NN and CNN methods, respectively.

Finally, some random test examples were chosen, and their multi-step prediction results for the SVR, ELM, VMD-NN and proposed VMD-CNN methods are shown in Figure 8. In Figure 8, the accumulation of errors for SVR and ELM is clearly visible. At the beginning of the iterations, the SVR and ELM methods predict the future values accurately. As the iterations proceed, their predictions drift far from the true values. The VMD-NN and VMD-CNN methods can broadly forecast the change in wind speed, and VMD-CNN performs better in the majority of examples. In fact, the proposed VMD-CNN method accurately predicts the variation trends in wind speed, ignoring the slight randomness. Therefore, these results strongly support the value of the proposed method for multi-step wind speed forecasting.

In order to describe the prediction results of the different models more specifically, a Taylor diagram is employed, as illustrated in Figure 9. The standard deviation, centered root-mean-square difference and pattern correlation with the observation of each prediction in testing example 1 are shown. In Figure 9, it can be seen that the VMD-NN and VMD-CNN models have roughly the same correlation coefficient with the observation. However, the VMD-CNN model has the smallest RMSE and the same standard deviation as the observation, whereas the VMD-NN model has little variability, considering its smaller standard deviation. Of the poorer-performing models, the ELM model is the most correlated with the observation, while the standard deviation of the SVR model is close to zero, which means that the SVR prediction stays nearly constant. In general, our proposed VMD-CNN model outperforms the compared models on all of the statistical characteristics.
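
The three quantities summarized by a Taylor diagram can be computed directly from a prediction and its observation. The following is a minimal sketch, not the authors' code: the function name `taylor_statistics` and the synthetic series are assumptions made for illustration, and population statistics (NumPy's default `ddof=0`) are assumed so that the standard Taylor identity holds exactly.

```python
import numpy as np


def taylor_statistics(observed, predicted):
    """Return (std_obs, std_pred, correlation, centered RMS difference)."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    std_obs = obs.std()    # population standard deviation (ddof=0)
    std_pred = pred.std()
    # Pattern correlation; it is undefined (NaN) if either series is constant.
    corr = np.corrcoef(obs, pred)[0, 1]
    # Centered RMS difference: RMS error after removing both means.
    diff = (pred - pred.mean()) - (obs - obs.mean())
    crmsd = np.sqrt(np.mean(diff ** 2))
    return std_obs, std_pred, corr, crmsd


# Synthetic 32-step example (illustrative only, not data from the paper).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0 * np.pi, 32)
obs = 6.0 + 2.0 * np.sin(t) + rng.normal(0.0, 0.3, t.size)
pred = 6.0 + 1.5 * np.sin(t)

std_obs, std_pred, corr, crmsd = taylor_statistics(obs, pred)

# Taylor identity: E'^2 = s_p^2 + s_o^2 - 2 * s_p * s_o * R
lhs = crmsd ** 2
rhs = std_pred ** 2 + std_obs ** 2 - 2.0 * std_pred * std_obs * corr
print(f"std_obs={std_obs:.3f} std_pred={std_pred:.3f} R={corr:.3f} E'={crmsd:.3f}")
```

The identity checked at the end is what allows the three statistics to be drawn on a single two-dimensional diagram. A prediction whose standard deviation is close to zero, as reported above for SVR, plots near the origin, and its correlation becomes numerically unstable.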

Additional Forecasting Case
In order to further verify the effectiveness of the proposed method, the 10-min wind speed data of the widely studied wind farm in Sotavento Galicia, Spain [31], collected from 1 March 2018 to 31 March 2018, were used as an additional multi-step forecasting case. The number of wind speed observations included in this study is 4280. The first 3200 points were chosen for training, and the remaining data were used to verify the performance of the models. The original observations are shown in Figure 10 and the statistical information is listed in Table 5. We chose the SVR and ELM models for comparison in this study because they performed better than their hybrid models in the multi-step forecasting experiment. As in the experiment on wind speed data from Inner Mongolia, China, the output length of all models was 32, which corresponds to a forecast horizon of 5.3 h. The RMSE, MAE and MAPE of the different methods are listed in Table 6. As illustrated in Table 6, the ranking of the forecasting results is similar to that in Section 4.3. The proposed method still has the best performance among all of the tested models. These results demonstrate that the proposed method has great potential for multi-step wind speed forecasting.
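
The data split and forecast horizon described above can be illustrated with a short sketch. It is not the authors' pipeline: the helper `make_windows`, the input length of 64 and the synthetic series are assumptions, the windows are built on the raw series rather than on VMD modes, and windowing each split separately is one possible convention among several.

```python
import numpy as np

N_TOTAL = 4280       # observations in the Sotavento case (from the text)
N_TRAIN = 3200       # first points used for training (from the text)
OUTPUT_LENGTH = 32   # forecast steps per sample (from the text)
SAMPLE_MINUTES = 10  # sampling interval (from the text)
INPUT_LENGTH = 64    # hypothetical value; not stated in this section


def make_windows(series, input_length, output_length):
    """Slice a 1-D series into (input window, following output window) pairs."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - input_length - output_length + 1
    inputs = np.stack([series[i:i + input_length] for i in range(n_samples)])
    outputs = np.stack([
        series[i + input_length:i + input_length + output_length]
        for i in range(n_samples)
    ])
    return inputs, outputs


# Synthetic stand-in for the wind speed series (illustrative only).
rng = np.random.default_rng(0)
series = np.abs(6.0 + rng.normal(0.0, 2.0, N_TOTAL))

# Chronological split: no shuffling, so the test data follow the training data.
train, test = series[:N_TRAIN], series[N_TRAIN:]
X_train, Y_train = make_windows(train, INPUT_LENGTH, OUTPUT_LENGTH)
X_test, Y_test = make_windows(test, INPUT_LENGTH, OUTPUT_LENGTH)

# 32 steps at 10-min resolution give the forecast horizon in hours.
horizon_hours = OUTPUT_LENGTH * SAMPLE_MINUTES / 60.0
print(X_train.shape, Y_train.shape, X_test.shape, Y_test.shape)
print(f"forecast horizon: {horizon_hours:.1f} h")
```

With these settings, each output row holds the 32 values that immediately follow its input window, which is the target layout a direct multi-output model needs.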

Conclusions
This paper proposes a hybrid method for multi-step wind speed forecasting. The proposed VMD-CNN method employs VMD to decompose the wind speed signal into different modes under different center pulsations. By taking advantage of the structure of the CNN, each mode is regarded as one channel of the input. Then, the filters in each CNN layer are trained to extract the local features and the relationships between modes. Finally, the output layer of the CNN is set to multiple dimensions to directly forecast the future wind speed. Several experiments were conducted to demonstrate the effectiveness of the proposed method. Comparison of the statistical approaches with their hybrid VMD-based methods showed that the decomposed modes are helpful for accurate prediction. Although the ELM and SVR methods performed better in single-step forecasting, the proposed method also predicted the next wind speed value accurately. In the case of multi-step forecasting, the proposed method achieved significantly better results, while the SVR-based and ELM-based methods performed poorly. The VMD-CNN method predicted the general variation trend of wind speed. For every evaluation indicator, our proposed VMD-CNN achieved the lowest value. However, the number and size of the filters of the proposed method should be optimized to adapt to more complicated wind speed. The influence of other signal processing approaches should be discussed

$u_k^1$, $\omega_k^1$ and maximum number of iterations $N$. Set $n = 1$
into different modes which constitute the input data and determine the architecture of the CNN forecasting model. Then, the input data set is divided into training and testing data sets to train the CNN forecasting model and evaluate the forecasting performance, respectively.

Figure 1. The architecture of the proposed variational mode decomposition (VMD)-convolutional neural network (CNN) hybrid data-driven forecasting model.

Figure 2. Wind speed signal and the decomposed modes.

Energies 2018, 11, x FOR PEER REVIEW 8 of 18

Figure 3. Constitution of input and output matrices.

Figure 4. Structure of the CNN model adopted for forecasting.

Figure 5. Comparison of different model parameters: (a) number of modes, (b) input length.

Figure 8. The multi-step forecasting results of the methods.

Figure 9.

Figure 10. Wind speed time-series of Sotavento Galicia wind farm.

Table 1. Statistical information for each data set.

Table 3. Multi-step and single-step forecasting results of different methods.

Table 4. Improved percentages of different methods in multi-step forecasting.

Table 5. Statistical information of wind speed from Sotavento Galicia wind farm.

Table 6. Multi-step forecasting results of models in Sotavento Galicia wind farm.
