A Novel Decomposition-Optimization Model for Short-Term Wind Speed Forecasting

Due to the inherent randomness and fluctuation of wind speeds, it is very challenging to develop an effective and practical model for accurate wind speed forecasting, especially over long forecasting horizons. This paper presents a new decomposition-optimization model created by integrating Variational Mode Decomposition (VMD), Backtracking Search Algorithm (BSA), and Regularized Extreme Learning Machine (RELM) to enhance forecasting accuracy. The observed wind speed time series is first decomposed by VMD into several relatively stable subsequences. Then, an emerging optimization algorithm, BSA, is utilized to search for the optimal parameters of the RELM. Subsequently, the well-trained RELM is used for multi-step (1-, 2-, 4-, and 6-step) wind speed forecasting. Experiments have been carried out with the proposed method as well as several benchmark models using several datasets from a widely studied wind farm, Sotavento Galicia in Spain. Additionally, the effects of the decomposition and optimization methods on the final forecasting results are analyzed quantitatively, whereby the importance of the decomposition technique is emphasized. Results reveal that the proposed VMD-BSA-RELM model achieves significantly better performance than its rivals on both single- and multi-step forecasting, with at least 50% average improvement, which indicates it is a powerful tool for short-term wind speed forecasting.


Introduction
With the massive consumption of fossil fuels and the increasing pressure of environmental protection, wind energy, one of the major sustainable and clean energy sources, has attracted increasing attention in recent decades due to its remarkable features, such as broad distribution and abundant reserves [1]. Therefore, wind energy is a promising substitute in many parts of the world. As the Global Wind Energy Council (GWEC) has reported, over 54 GW of clean and sustainable wind power was installed across the global market in 2016, which now comprises over 90 countries, including nine with over 10,000 MW installed and 29 that have exceeded the 1000 MW mark. Cumulative capacity increased by 12.6% to reach a total of 486.8 GW [1]. However, affected by various factors (e.g., terrain, air pressure, temperature), wind energy is highly intermittent, random, non-linear, and non-stationary, which is not conducive to the large-scale grid-connected operation of wind farms and can cause serious problems for the safe and stable operation of power systems. Fortunately, accurate and reliable wind speed forecasting can effectively mitigate the negative impacts of wind energy on the power grid. Thus, much effort has been devoted to wind speed forecasting to achieve higher wind energy utilization rates and the safe and stable operation of power grids, and thereby to gain more economic profit.
At present, various forecasting models have been developed and applied in many fields [2][3][4][5][6]. Weron [3] provided a thorough review of the strengths, weaknesses, and future of state-of-the-art forecasting methods. Models used in wind speed/power forecasting can be divided into four main types: physical models, statistical models, machine learning (ML) models, and hybrid models. Physical models are established according to hydrodynamic and thermodynamic equations. They usually require various meteorological and geographic information, such as wind speed, wind direction, temperature, humidity, barometric pressure, air density, and elevation. Therefore, the input dimension of physical models is extremely high, and their implementation is correspondingly complex. These two features limit the use of physical models in practical engineering applications.
Unlike physical models, statistical models are constructed using relatively little historical data by analyzing the relationships between points in the observed wind speed series. The most commonly used statistical models are the autoregressive (AR) model [7], the autoregressive moving average (ARMA) model [8], the autoregressive integrated moving average (ARIMA) model [9], and their variants. These models have simple structures, but they are often ineffective when handling time series with highly nonlinear and non-stationary characteristics, which are two essential features of wind speed series. Therefore, machine learning (ML) models are exploited in this field due to their remarkable nonlinear learning and generalization abilities. Cincotti et al. [6] demonstrated that the ARMA-Generalized AutoRegressive Conditional Heteroscedasticity (GARCH) model is inferior to computational intelligence methods. Artificial neural networks (ANNs), the most popular ML models, have been widely exploited over the last decades. Traditional ANNs mainly include the multi-layer perceptron (MLP) [6,10], back-propagation neural networks (BPNNs) [11][12][13], generalized regression neural networks (GRNNs) [13], radial basis function neural networks (RBFNNs) [13], and Elman neural networks (ENNs) [14,15]. Recently, the extreme learning machine (ELM), a new single-hidden-layer feed-forward network (SLFN), has been developed [16]. Compared with conventional ANNs, the most prominent characteristics of the ELM are its simple structure, fast learning speed, and strong generalization ability [16]. Unfortunately, the standard ELM tends to over-fit and is sensitive to outliers, because it considers only the empirical risk minimization principle during training [17][18][19]. Many researchers have worked on improving the performance of the ELM [17,18]. The most effective way is to introduce regularization into the basic ELM to build the regularized ELM (RELM) model. Compared with the basic ELM, the RELM provides more accurate and stable results, as shown in [5,17,18].
With the rapid development of data mining and computational intelligence techniques, a number of hybrid models combining signal decomposition approaches and/or optimization algorithms have been proposed. Signal decomposition approaches decompose the raw data into a group of subseries that are smoother and easier to predict. Signal decomposition methods such as wavelet decomposition (WD) [20,21], empirical mode decomposition (EMD) [22][23][24], ensemble empirical mode decomposition (EEMD), and variational mode decomposition (VMD) [25,26] have been widely used in recent years. Generally, the WD method depends heavily on the choice of the mother wavelet function, while EMD has several drawbacks, including the lack of a rigorous mathematical formulation, sensitivity to the choice of interpolation method, and susceptibility to mode mixing. Although EEMD is capable of alleviating the mode mixing issue of EMD, it still lacks a mathematical theory, which may reduce its robustness. In contrast, the VMD method can adaptively decompose the raw signal into several modes with specific sparsity properties and is also capable of overcoming the problem of mode mixing [27].
On the other hand, optimization algorithms have become popular in constructing hybrid models, where they tune the parameters of ML models to further enhance forecasting accuracy. For example, Ren et al. [11] applied the particle swarm optimization (PSO) algorithm to optimize the parameters of a BPNN so as to improve the prediction accuracy of wind speed. Similarly, Gao et al. [28] used the firefly algorithm (FA) instead of PSO to adjust the weights and thresholds of the BPNN, and thereby developed a new hybrid model. Further examples of hybrid models based on optimization algorithms in wind speed/power forecasting include a BPNN optimized by a genetic algorithm (GA) [12], an ELM optimized by the crisscross optimization algorithm [29], an MLP optimized by GA [10], an MLP optimized by the mind evolutionary algorithm (MEA) [10], a support vector machine (SVM) optimized by GA [21], a least squares support vector machine (LSSVM) optimized by the gravitational search algorithm (GSA) [30], and an adaptive neuro-fuzzy inference system (ANFIS) optimized by an evolutionary PSO [31]. Although these optimization algorithms have many successful applications, they still suffer from premature convergence and from difficulty in balancing global exploration and local exploitation. Therefore, it is worthwhile to find new efficient algorithms for wind speed forecasting problems. Recently, the backtracking search algorithm (BSA), a novel stochastic search algorithm, was proposed by Civicioglu [32]. Compared with other stochastic population-based algorithms, BSA has only one control parameter to set and is easy to implement. Due to its simple structure and easy operation, BSA has been applied to various complex nonlinear optimization problems [33][34][35], and we therefore use it for the wind speed forecasting problem in this work.
In this study, a novel decomposition-optimization model is proposed by combining RELM, VMD, and BSA to achieve more accurate and reliable ultra-short-term wind speed forecasting. First, VMD is applied to decompose the original wind speed series into a group of relatively stable subseries, reducing the adverse effect of the randomness and fluctuations of the original series on prediction accuracy. Then, an RELM optimized by BSA is established to forecast each subseries. Meanwhile, the partial autocorrelation function (PACF) is utilized to determine the optimal input vector. Finally, the overall forecast is obtained by aggregating the forecasts of the subseries. To demonstrate the effectiveness of the proposed model, it has been thoroughly tested on several real wind speed datasets from the Sotavento Galicia (SG) wind farm in Spain. Experimental results demonstrate that, by using decomposition and optimization techniques together, the forecasting performance of the proposed VMD-BSA-RELM model is significantly better than that of the basic RELM model. Moreover, the decomposition method VMD plays a more important role in the final improvement of the VMD-BSA-RELM model than the optimization method BSA. This clearly shows how important it is to smooth the time series to achieve the desired prediction performance.
The main contributions of this study are as follows: (a) we investigate, for the first time, the ability of the combination of VMD, RELM, and BSA to forecast multi-step short-term wind speed; (b) the proposed model takes full advantage of the signal decomposition approach, the machine learning model, and the optimization algorithm; (c) the positive effects of the decomposition and optimization approaches on the final improvement are quantitatively analyzed.
The rest of the paper is organized as follows: the methods involved in the proposed model including VMD, RELM, and BSA are briefly introduced in Section 2; the framework of the proposed decomposition-optimization model is presented in Section 3; experiments and comprehensive analyses to validate the proposed model are presented in Sections 4 and 5; and Section 6 concludes the paper.

Methodology
In this paper, the proposed hybrid model integrates three components: variational mode decomposition (VMD), regularized ELM (RELM), and the backtracking search algorithm (BSA). Accordingly, this section describes the theory behind each component of the VMD-BSA-RELM model.

Variational Mode Decomposition
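Variational mode decomposition (VMD), developed by Dragomiretskiy and Zosso [27], is a novel adaptive and non-recursive signal processing approach. The core of VMD is decomposing a signal $f(t)$ into a series of modes $u_k$ with specific sparsity characteristics [27]. The sparsity of each mode is characterized by its bandwidth in the spectral domain, which can be estimated using the following steps: (1) apply the Hilbert transform to each mode $u_k$ to produce a unilateral frequency spectrum; (2) shift the frequency spectrum of each mode to baseband by mixing with an exponential tuned to the respective estimated center frequency; and (3) estimate the bandwidth through the $H^1$ Gaussian smoothness of the demodulated signal, i.e., the squared $L^2$-norm of the gradient. Therefore, the decomposition is implemented by solving the following constrained optimization problem:
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \qquad (1)$$
where $\{u_k\}$ and $\{\omega_k\}$ represent the set of all modes and their center frequencies, respectively; $K$ is the number of modes; $f(t)$ denotes the original signal; $\delta(t)$ denotes the Dirac distribution; and $*$ is the convolution operator.
The above problem is transformed into an unconstrained one by adding a quadratic penalty term and a Lagrangian multiplier $\lambda(t)$, as follows:
$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \qquad (2)$$
where $\alpha$ denotes the balancing factor of the data-fidelity constraint.
The above unconstrained optimization problem can be solved by means of the alternating direction method of multipliers (ADMM), which searches for the saddle point of the augmented Lagrangian through a series of iterative sub-optimizations that update $u_k^{n+1}$, $\omega_k^{n+1}$, and $\lambda^{n+1}$ by:
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2} \qquad (3)$$
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega} \qquad (4)$$
$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right) \qquad (5)$$
where $\hat{u}_k^{n+1}(\omega)$, $\hat{u}_i(\omega)$, $\hat{f}(\omega)$, and $\hat{\lambda}(\omega)$ represent the Fourier transforms of $u_k^{n+1}(t)$, $u_i(t)$, $f(t)$, and $\lambda(t)$, respectively; $n$ denotes the iteration number; and $\tau$ is the time step of the dual ascent.
The termination condition of the VMD algorithm is:
$$\sum_{k=1}^{K} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon \qquad (6)$$
where $\varepsilon$ is the tolerance of the convergence criterion. The entire decomposition process of VMD can be described as:
Step 1: Initialize the VMD parameters $\hat{u}_k^1$, $\omega_k^1$, and $\hat{\lambda}^1$, and set the iteration number $n = 1$.
Step 2: Update $\hat{u}_k$ and $\omega_k$ for $k = 1, \ldots, K$ according to Equations (3) and (4).
Step 3: Update the Lagrangian multiplier according to Equation (5) and then set $n = n + 1$.
Step 4: Repeat Steps 2-3 until the termination condition in Equation (6) is met. The final decomposed modes are then obtained.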
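The iteration in Steps 1-4 can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not the implementation used in this study: it works on the one-sided spectrum returned by `numpy.fft.rfft` with frequencies normalized to [0, 0.5], and it omits the mirror extension of the signal used by reference implementations. The function name `vmd_sketch`, the uniform initialization of the center frequencies, and the default values of `alpha`, `tau`, and `tol` are hypothetical choices made for the example.

```python
import numpy as np

def vmd_sketch(f, K, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD (assumed settings; no mirror extension of the signal)."""
    f = np.asarray(f, dtype=float)
    T = f.size
    freqs = np.fft.rfftfreq(T)                 # normalized frequencies in [0, 0.5]
    f_hat = np.fft.rfft(f)                     # one-sided spectrum of the signal
    u_hat = np.zeros((K, freqs.size), dtype=complex)   # spectra of the K modes
    omega = 0.5 * np.arange(K) / K             # assumed uniform initial centers
    lam = np.zeros(freqs.size, dtype=complex)  # Lagrangian multiplier (spectral)

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Mode update: Wiener-like filter centered at omega[k]
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            u_hat[k] = (residual + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Center-frequency update: center of gravity of the mode's power spectrum
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-12)
        # Dual ascent on the multiplier (inactive when tau = 0)
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        # Relative change of the modes; small constant guards the first iteration
        change = sum(
            np.sum(np.abs(u_hat[k] - u_prev[k]) ** 2)
            / (np.sum(np.abs(u_prev[k]) ** 2) + 1e-12)
            for k in range(K)
        )
        if change < tol:
            break

    modes = np.fft.irfft(u_hat, n=T, axis=1)   # back to the time domain
    return modes, omega
```

For instance, applying `vmd_sketch` with `K=2` to the sum of two cosines should return one mode per tone, with the estimated center frequencies close to the two tone frequencies.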

Regularized Extreme Learning Machine
An extreme learning machine (ELM) is a single-hidden-layer feed-forward neural network developed by Huang et al. [16]. The significant feature of an ELM is that it randomly generates the input weights and hidden biases, and then determines its output weights directly according to the Moore-Penrose generalized inverse matrix theory. Given a training set $\{(x_t, y_t)\}_{t=1}^{M}$ of $M$ samples, the output of an ELM with $L$ hidden nodes can be estimated by:
$$\sum_{i=1}^{L} \beta_i \, g_i(w_i \cdot x_t + b_i) = y_t, \quad t = 1, 2, \ldots, M$$
where $g_i(x)$ is the activation function of the $i$th hidden node; $w_i$ is the input weight vector; $b_i$ is the hidden bias; and $\beta_i$ is the output weight connecting the $i$th hidden node and the output node.
The above equation can be rewritten as:
$$H\beta = Y$$
where $\beta = [\beta_1, \ldots, \beta_L]^T$, $Y = [y_1, \ldots, y_M]^T$, and $H$ is the hidden layer output matrix defined as:
$$H = \begin{bmatrix} g_1(w_1 \cdot x_1 + b_1) & \cdots & g_L(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g_1(w_1 \cdot x_M + b_1) & \cdots & g_L(w_L \cdot x_M + b_L) \end{bmatrix}_{M \times L}$$
The output weights are calculated by the least squares method as the optimal solution of:
$$\min_{\beta} \left\| H\beta - Y \right\|$$
The optimal solution can be written as:
$$\hat{\beta} = H^{\dagger} Y$$
where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$, which can be calculated by the following orthogonal projection [16]:
$$H^{\dagger} = (H^T H)^{-1} H^T$$
Due to the numerical instability of the pseudo-inverse, the regularized ELM (RELM) adds a positive value $1/C$ to the diagonal elements of $H^T H$ when calculating the output weights $\beta$. Hence, the estimated output weights $\hat{\beta}$ of the RELM can be written as:
$$\hat{\beta} = \left( H^T H + \frac{I}{C} \right)^{-1} H^T Y$$
where $I$ is the identity matrix. More information about RELM can be found in [18].
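The closed-form training of the RELM can be sketched in a few lines of Python. This is a minimal sketch rather than the implementation used in this study; it assumes a sigmoid activation and input weights and biases drawn uniformly from [-1, 1], and the class name `RELM` and its default hyperparameters are hypothetical.

```python
import numpy as np

class RELM:
    """Regularized ELM sketch: random hidden layer, ridge-regularized output weights."""

    def __init__(self, n_hidden=20, C=1e3, seed=0):
        self.n_hidden = n_hidden   # number of hidden nodes L
        self.C = C                 # regularization parameter (1/C is added to the diagonal)
        self.seed = seed

    def _hidden(self, X):
        # Hidden layer output matrix H with a sigmoid activation (assumed)
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        # Input weights and hidden biases are generated randomly and never trained
        self.W = rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        # beta = (H^T H + I / C)^(-1) H^T Y, solved without forming the inverse
        A = H.T @ H + np.eye(self.n_hidden) / self.C
        self.beta = np.linalg.solve(A, H.T @ y)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

Solving the linear system with `np.linalg.solve` instead of explicitly inverting the matrix is a common numerical choice; it yields the same output weights as the closed-form expression.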

Backtracking Search Optimization Algorithm
The backtracking search optimization algorithm (BSA), put forward by Civicioglu [32], is a stochastic search algorithm for real-valued numerical optimization problems. Compared with other population-based evolutionary algorithms, BSA has achieved good performance in both computation speed and accuracy. The detailed structure of BSA is described as follows.
(1) Initialization. In this stage, the current population $P$ is randomly generated in the search space by:
$$P_{i,j} = low_j + rand(0,1) \cdot (up_j - low_j), \quad i = 1, \ldots, N, \; j = 1, \ldots, D$$
where $N$ and $D$ represent the population size and the individual dimensionality, respectively; $low_j$ and $up_j$ are the lower and upper bounds of the $j$th dimension of the search space; and $rand(0,1)$ generates a random number uniformly distributed in the range (0, 1).
(2) Selection I. The selection strategy in this stage chooses the historical population that guides the search direction in the mutation step. The initial historical population $oldP$ is generated by:
$$oldP_{i,j} = low_j + rand(0,1) \cdot (up_j - low_j)$$
At the beginning of each iteration, $oldP$ is updated by:
$$oldP = \begin{cases} P, & a < b \\ oldP, & \text{otherwise} \end{cases}, \qquad oldP = permuting(oldP)$$
where $a$ and $b$ are two random numbers uniformly distributed in the range (0, 1), and $permuting(oldP)$ means that the order of the individuals in $oldP$ is randomly changed by a shuffling function.
(3) Mutation. In this step, the initial form of the trial population, $Mutant$, is defined as:
$$Mutant = P + F \cdot (oldP - P)$$
where $(oldP - P)$ is the search-direction matrix and $F$ is the mutation factor, which controls the amplitude of $(oldP - P)$.
(4) Crossover. In this step, the final form of the trial population $T$ is generated in two stages. In the first stage, a binary integer-valued matrix $map$ of size $N \times D$, initialized to ones, is updated for each individual $i$ by:
$$\begin{cases} map_{i,\,u(1:\lceil mixrate \cdot rand(0,1) \cdot D \rceil)} = 0, & a < b \\ map_{i,\,randi(D)} = 0, & \text{otherwise} \end{cases}$$
where $u = permuting(D)$ means that the order $1, 2, \ldots, D$ is changed by a random shuffling function; $randi(D)$ is a random integer between 1 and $D$; and $mixrate$ is the only control parameter in BSA (called the mix rate parameter), which controls the number of elements of an individual that will mutate in a trial.
In the second stage, the trial population $T$ is updated by:
$$T_{i,j} = \begin{cases} P_{i,j}, & map_{i,j} = 1 \\ Mutant_{i,j}, & \text{otherwise} \end{cases}$$
Note that several individuals of the final trial population $T$ may exceed the permissible search space, so a boundary control strategy is necessary:
$$T_{i,j} = low_j + rand(0,1) \cdot (up_j - low_j), \quad \text{if } T_{i,j} < low_j \text{ or } T_{i,j} > up_j$$
(5) Selection II. A greedy selection is applied in this stage to update the population $P$: trial individuals with better fitness values than the corresponding individuals in $P$ are retained. Steps 2-5 are repeated until the termination condition is reached.
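The five stages above can be sketched as a compact Python routine. This is a minimal sketch of the BSA as originally described by Civicioglu [32], not the code used in this study. The function name `bsa_minimize`, its default settings, and the choice F = 3·randn for the mutation factor are assumptions made for the example; the crossover map is assumed to be initialized to ones, so entries set to zero take their values from the mutant.

```python
import numpy as np

def bsa_minimize(fitness, low, up, pop_size=30, max_iter=200, mixrate=1.0, seed=0):
    """Backtracking search sketch for minimizing `fitness` within box bounds."""
    rng = np.random.default_rng(seed)
    low, up = np.asarray(low, dtype=float), np.asarray(up, dtype=float)
    D = low.size
    # (1) Initialization of the current and the historical population
    P = low + rng.random((pop_size, D)) * (up - low)
    old_P = low + rng.random((pop_size, D)) * (up - low)
    fit_P = np.array([fitness(x) for x in P])

    for _ in range(max_iter):
        # (2) Selection I: optionally refresh the historical population, then shuffle it
        if rng.random() < rng.random():
            old_P = P.copy()
        old_P = old_P[rng.permutation(pop_size)]
        # (3) Mutation: F scales the search direction (assumed F = 3 * randn)
        F = 3.0 * rng.standard_normal()
        mutant = P + F * (old_P - P)
        # (4) Crossover: True in cmap keeps the value from P, False takes the mutant
        cmap = np.ones((pop_size, D), dtype=bool)
        if rng.random() < rng.random():
            for i in range(pop_size):
                u = rng.permutation(D)
                n_mut = int(np.ceil(mixrate * rng.random() * D))
                cmap[i, u[:n_mut]] = False
        else:
            cmap[np.arange(pop_size), rng.integers(0, D, size=pop_size)] = False
        T = np.where(cmap, P, mutant)
        # Boundary control: re-sample any element that left the search space
        outside = (T < low) | (T > up)
        T = np.where(outside, low + rng.random((pop_size, D)) * (up - low), T)
        # (5) Selection II: greedy replacement of individuals by better trials
        fit_T = np.array([fitness(x) for x in T])
        better = fit_T < fit_P
        P[better] = T[better]
        fit_P[better] = fit_T[better]

    best = int(np.argmin(fit_P))
    return P[best], float(fit_P[best])
```

In a forecasting setting, `fitness` would typically return a validation error of the RELM for a candidate parameter vector; here any callable that maps a vector to a scalar can be used.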

The Proposed Decomposition-Optimization Model
The decomposition-optimization model developed in this study consists of variational mode decomposition (VMD), regularized ELM (RELM), and the backtracking search algorithm (BSA), and is abbreviated as VMD-BSA-RELM. In the VMD-BSA-RELM model, VMD is first used to preprocess the wind speed data by decomposing it into smoother subseries. RELM is adopted as the predictor. Meanwhile, the partial autocorrelation function (PACF) is used to choose a suitable input vector, and BSA is applied to optimize the input weights and hidden thresholds of the RELM model. The detailed procedures of the proposed hybrid VMD-BSA-RELM model are shown in Figure 1. Because multi-step wind speed forecasting can provide more useful information for decision makers, the proposed VMD-BSA-RELM model is applied to multi-step wind speed forecasting. For each forecasting horizon, the model maps the lagged wind speed observations, up to lag d, to the wind speed h steps ahead, where h is the forecasting horizon and d is the suitable lag determined by the PACF.
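The construction of input-output pairs for a given horizon can be illustrated as follows. This is a sketch under an assumed indexing convention, in which the inputs are the d most recent observations and the target is the value h steps after the last input; the helper name `make_supervised` is hypothetical and the exact lag layout used in this study may differ.

```python
import numpy as np

def make_supervised(series, d, h):
    """Build (inputs, target) pairs for direct h-step-ahead forecasting.

    Assumed convention: each input row holds d consecutive observations
    (oldest first) and the target is the value h steps after the last one.
    """
    x = np.asarray(series, dtype=float)
    n = x.size - d - h + 1                      # number of usable samples
    X = np.stack([x[i:i + d] for i in range(n)])
    y = x[d + h - 1:d + h - 1 + n]
    return X, y
```

With this convention, one model per horizon (h = 1, 2, 4, 6) can be trained on each decomposed subseries, and the forecasts of all subseries are summed to obtain the final wind speed forecast.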
Energies 2018, 11, x 7 of 27

Data Collection
In this study, historical wind speed data were collected from the Sotavento Galicia (SG) wind farm (the original wind speed data can be found at http://sotaventogalicia.com/en/real-time-data/historical). The SG wind farm is located in Galicia, in northwest Spain, at latitude 43.354377° N and longitude 7.881213° W. Considering the influence of seasonal factors on forecasting accuracy, four datasets, A, B, C, and D, from different seasons were selected to verify the effectiveness of the proposed VMD-BSA-RELM method. The time periods of the four datasets are 15-21 January, 17-23 April, 13-19 July, and 3-9 October, respectively. Each dataset includes 1008 points sampled at 10 min intervals. Based on our test results and [36][37][38][39], the first 75% of the data in each dataset are selected as training samples to build the prediction model.



Data Decomposition and Parameters Settings
VMD is executed to decompose the raw wind speed series into several relatively stable modes that are easier to predict. Before decomposition, the number of modes K needs to be preset. In this study, the number of modes for each wind speed series is searched in the range [3, 14]. Then, the suitable number of modes is determined from the center frequencies of the decomposed modes [19]. After that, each mode is forecasted by the RELM optimized by BSA (BSA-RELM for short). The input vector of the BSA-RELM is determined by the partial autocorrelation function (PACF). Taking Dataset A as an example, the subseries generated by VMD are shown in Figure 3 and the PACF values with 95% confidence intervals are presented in Figure 4. According to the partial autocorrelograms in Figure 4, the lagged variables whose PACF values fall outside the confidence interval are chosen to form the input vector of the forecasting model. The population size and the maximum number of iterations of BSA are set to 50 and 100, respectively. PACF-based input selection is used for all forecasting models involved in this study to guarantee fair and effective comparisons.
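The PACF-based input selection can be sketched as follows. This is an illustrative sketch, not the procedure code used in this study: the PACF is computed with the Durbin-Levinson recursion, and the 95% confidence bound is approximated by ±1.96/√n, which is a common large-sample assumption. The function names `pacf` and `select_lags` are hypothetical.

```python
import numpy as np

def pacf(x, max_lag):
    """Partial autocorrelations for lags 0..max_lag via the Durbin-Levinson recursion."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    # Sample autocorrelations r[0..max_lag]
    r = np.array([np.dot(x[:n - k], x[k:]) / np.dot(x, x) for k in range(max_lag + 1)])
    phi = np.zeros((max_lag + 1, max_lag + 1))
    out = np.zeros(max_lag + 1)
    out[0] = 1.0
    for k in range(1, max_lag + 1):
        num = r[k] - np.dot(phi[k - 1, 1:k], r[k - 1:0:-1])
        den = 1.0 - np.dot(phi[k - 1, 1:k], r[1:k])
        phi[k, k] = num / den
        # Update the lower-order coefficients for the next recursion step
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, k - 1:0:-1]
        out[k] = phi[k, k]
    return out

def select_lags(x, max_lag):
    """Return the lags whose PACF lies outside the approximate 95% bound."""
    bound = 1.96 / np.sqrt(len(x))
    values = pacf(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(values[k]) > bound]
```

The selected lags then define which past observations of a subseries enter the input vector of its forecasting model.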


Evaluation Indices
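To evaluate the forecasting performance of all models, three commonly used error metrics are used in this study: mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE). They are calculated by:
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}$$
where $y_i$ and $\hat{y}_i$ are the $i$th observed and predicted wind speed, respectively, and $N$ is the number of samples.
To show the improvement of a specific model clearly, the improvement percentages $P_{MAE}$, $P_{RMSE}$, and $P_{MAPE}$ are calculated to express the relative improvement between two models, denoted as Model 1 and Model 2. $P_{MAE}$, $P_{RMSE}$, and $P_{MAPE}$ of Model 2 relative to Model 1 are defined as:
$$P_{MAE} = \frac{\mathrm{MAE}_1 - \mathrm{MAE}_2}{\mathrm{MAE}_1} \times 100\%, \quad P_{RMSE} = \frac{\mathrm{RMSE}_1 - \mathrm{RMSE}_2}{\mathrm{RMSE}_1} \times 100\%, \quad P_{MAPE} = \frac{\mathrm{MAPE}_1 - \mathrm{MAPE}_2}{\mathrm{MAPE}_1} \times 100\%$$
where the subscripts 1 and 2 refer to Model 1 and Model 2, respectively.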
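For reference, the three error metrics and the improvement percentage can be computed as in the following sketch. The function names are hypothetical, and the improvement percentage is assumed to be the relative reduction of a metric from Model 1 to Model 2, expressed in percent.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    """Mean absolute percentage error in percent (observations must be nonzero)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

def rmse(y, y_hat):
    """Root mean square error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def improvement(metric_model1, metric_model2):
    """Improvement (%) of Model 2 over Model 1 for any error metric."""
    return (metric_model1 - metric_model2) / metric_model1 * 100.0
```

As a small worked example, observations (2, 4) and predictions (1, 5) give MAE = 1, RMSE = 1, and MAPE = (50% + 25%) / 2 = 37.5%; reducing an error from 2 to 1 corresponds to a 50% improvement.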

Results and Discussions
Several experimental results are presented in this section to demonstrate the efficiency and applicability of the proposed decomposition-optimization model (VMD-BSA-RELM).These experiments are grouped into three subsections: one-step forecasting results, multi-step forecasting results, and Diebold-Mariano tests and computational time.

One-Step Forecasting Results
This part presents the one-step forecasting performance of the proposed VMD-BSA-RELM forecasting model using four datasets from different seasons. ARIMA, RBF, GRNN, RELM, VMD-RELM, and BSA-RELM are used as comparison models. Akaike's Information Criterion (AIC), which has been widely used in model selection [40,41], is used to determine the appropriate parameters of ARIMA. The RMSE, MAE, and MAPE values obtained by these seven forecasting models on the testing data of all datasets are given in Table 2, where the lowest values of the evaluation indices are highlighted in green. Among the three single neural network models (RELM, GRNN, and RBF), RELM has the best performance and GRNN has the worst performance for all datasets from different seasons. Meanwhile, the forecasting results of ARIMA are close to those of RELM in some cases. Further, comparisons of RELM and VMD-RELM, RELM and BSA-RELM, and RELM and VMD-BSA-RELM suggest that the hybrid models outperform the single model in all cases. This can be seen directly in Table 3 and Figure 5. It is clear that both VMD and BSA have positive effects on forecasting accuracy, although BSA contributes less than VMD. Figure 5 visually indicates that the decomposition-optimization method achieves a remarkable improvement in forecasting accuracy compared with hybrid models based on either the signal decomposition approach or the optimization algorithm alone. This clearly shows how important it is to combine VMD, BSA, and RELM to achieve the desired prediction performance. Concretely, the average improvement percentages of RMSE, MAE, and MAPE between the VMD-BSA-RELM model and the single RELM model are 65.60%, 65.88%, and 66.21%, respectively, indicating a remarkable improvement. Figure 5 also shows that the P RMSE , P MAE , and P MAPE values between VMD-BSA-RELM and VMD-RELM are greater than those between VMD-RELM and RELM. Similarly, the P RMSE , P MAE , and P MAPE values between VMD-BSA-RELM and BSA-RELM are greater than those between BSA-RELM and RELM. These results emphasize the importance of the signal decomposition approach VMD, and show that the proposed model takes full advantage of both decomposition and optimization techniques. Predicted and observed curves as well as the forecasting errors of all forecasting models are shown in Figure 6, where the predicted curves of the VMD-BSA-RELM model are close to the real curves and its forecasting errors are evenly distributed around zero within a narrow range.

Multi-Step Forecasting Results
This section illustrates the efficacy of the proposed model for multi-step wind speed forecasting. RELM, VMD-RELM, and BSA-RELM, which performed best among the competitors, are used as benchmark models in this experiment. Table 4 displays the forecasting performance of these four models for different seasons in 2-step, 4-step, and 6-step forecasting in terms of RMSE, MAE, and MAPE, where the lowest values among the models are highlighted in green. Although the forecasting performance deteriorates as the forecasting horizon increases, the proposed model always outperforms the other forecasting models in all cases and horizons, followed by the VMD-RELM model and, last, the RELM. For instance, for Dataset A (Winter), the RMSE values of the proposed model (VMD-BSA-RELM) in 2-step, 4-step, and 6-step forecasting are 0.2462 m/s, 0.4286 m/s, and 0.7120 m/s, respectively, which are lower than the corresponding values of 0.604 m/s, 0.7544 m/s, and 1.0633 m/s for VMD-RELM. Moreover, the performance of BSA-RELM approaches, or even falls below, that of the RELM as the number of forecasting steps increases. More concretely, for Dataset C, the RMSE and MAE values of the BSA-RELM are 1.2911 m/s and 1.0298 m/s for 6-step forecasting, which are slightly worse than the 1.2893 m/s and 1.0286 m/s of the single RELM model. The 4-step and 6-step wind speed forecasting results as well as the forecasting errors of the different models are presented in Figures 7-10, where the superiority of the VMD-BSA-RELM model is confirmed. In these figures, the proposed VMD-BSA-RELM model always yields a smaller forecasting error range than the other models and accurately captures the variation trend of the real wind speed, even for 6-step forecasting. Additionally, forecasting errors increase as the horizon grows, particularly around peaks and valleys. Overall, the proposed model makes the most of the advantages of the VMD and BSA methods to produce highly accurate results in multi-step forecasting, which is consistent with the conclusion drawn from the one-step forecasting results.

Diebold-Mariano Tests and Computational Time
In this part, the Diebold-Mariano (DM) test [42] is applied to assess whether the differences between the proposed model and its competitors are statistically significant. The DM test results calculated with the squared error loss function are tabulated in Table 6. The DM statistics of the RELM, VMD-RELM, and BSA-RELM exceed the critical value at the 1% significance level for all datasets and all forecasting horizons, which demonstrates that the proposed model is superior to its rivals.
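The DM statistic with a squared error loss can be sketched as follows. This is a minimal sketch of the standard test statistic, not the code used to produce Table 6; the function name `dm_statistic` is hypothetical, and the long-run variance is assumed to be estimated from the autocovariances of the loss differential up to lag h - 1. Under the null hypothesis of equal accuracy the statistic is approximately standard normal, so the two-sided critical value at the 1% level is about 2.576.

```python
import numpy as np

def dm_statistic(e1, e2, h=1):
    """Diebold-Mariano statistic for two forecast error series (squared error loss).

    A positive value indicates that the second model has the lower loss.
    """
    e1, e2 = np.asarray(e1, dtype=float), np.asarray(e2, dtype=float)
    d = e1 ** 2 - e2 ** 2                       # loss differential
    n = d.size
    d_bar = d.mean()
    # Autocovariances of d for lags 0..h-1
    gamma = [np.sum((d[k:] - d_bar) * (d[:n - k] - d_bar)) / n for k in range(h)]
    var_d = (gamma[0] + 2.0 * sum(gamma[1:])) / n
    return float(d_bar / np.sqrt(var_d))
```

Here `e1` would hold the forecast errors of a competitor and `e2` those of the proposed model, so a statistic above the critical value favors the proposed model.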
The average computational times of the various step-ahead wind speed forecasts over Datasets A, B, C, and D are shown in Table 7 for all prediction models. Compared with the other forecasting models, the proposed VMD-BSA-RELM model consumes more time due to the use of the optimization algorithm BSA, but its computational time remains acceptable for real engineering applications. These results show that the VMD-BSA-RELM model provides more accurate wind speed forecasts at the cost of an admissible increase in computational time.

Conclusions
Wind speed forecasting is a crucial part of wind energy generation. However, due to the inherent randomness, high non-linearity, and non-stationarity of wind speed, accurate forecasting is a very challenging task. In this study, a new decomposition-optimization method called VMD-BSA-RELM is proposed for short-term wind speed forecasting. The original wind speed data are decomposed by VMD into a group of relatively stationary modes, each of which is forecasted by an RELM. Suitable parameters of the RELMs are determined by means of BSA. The efficacy of the VMD-BSA-RELM was tested against several benchmark models using several datasets under different forecasting horizons. The results indicate that the VMD-BSA-RELM model significantly outperforms the other models, at the cost of additional computational time that remains acceptable in real applications. Additionally, quantitative analyses of the effects of decomposition and optimization techniques on the final improvement of forecasting

Figure 1. The detailed procedures of the proposed VMD-BSA-RELM model.

Energies 2018, 11, 1752 8 of 27

° N and 7.881213° W. Considering the influence of seasonal factors on forecasting accuracy, four datasets, A, B, C, and D, from different seasons were selected to verify the effectiveness of the proposed VMD-BSA-RELM method. The time periods of the four datasets are 15-21 January, 17-23 April, 13-19 July, and 3-9 October, respectively. Each dataset includes 1008 points sampled at 10 min intervals. Based on our test results and [36-39], in each dataset the first 75% of the data are selected as training samples to build the prediction model, while the remaining 25% are used for testing. The proposed model is applied to obtain 1-step, 2-step, 4-step, and 6-step (1 h) ahead wind speed forecasts. The raw wind speed data are shown in Figure 2, which indicates the non-stationary and nonlinear features of the wind speed series. The statistical information of the four datasets, including the average (Ave.), maximum (Max.), minimum (Min.), standard deviation (Std.), coefficient of variation (Cv), and skewness coefficient (Cs), is listed in Table 1. The standard deviations of all datasets are above 1.49 m/s, and the maximum/minimum values of Datasets A-D are 15.91/0.35 m/s, 19.13/0.64 m/s, 9.94/0.35 m/s, and 9.08/0.74 m/s, respectively. These results also indicate the non-stationary and nonlinear features of the original wind speed series.
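The data preparation described above can be sketched in a few lines. This is an illustrative sketch only: the helper names `chronological_split` and `describe` are hypothetical, and the formulas for the coefficient of variation (sample standard deviation divided by the mean) and the skewness coefficient (third standardized moment) are assumptions, because the paper does not give its exact definitions. Seven days of 10 min samples give 7 × 24 × 6 = 1008 points, so a 75%/25% split yields 756 training and 252 test points, and 6 steps of 10 min correspond to the 1 h horizon.

```python
import numpy as np


def chronological_split(series, train_fraction=0.75):
    """Split a time series in time order: earliest part trains, latest part tests."""
    series = np.asarray(series, dtype=float)
    n_train = int(round(train_fraction * series.size))
    return series[:n_train], series[n_train:]


def describe(series):
    """Summary statistics of the kind listed in Table 1 (assumed definitions)."""
    x = np.asarray(series, dtype=float)
    mean = x.mean()
    std = x.std(ddof=1)  # sample standard deviation (assumption)
    # Skewness as the third standardized moment (assumption).
    skew = np.mean((x - mean) ** 3) / x.std(ddof=0) ** 3
    return {
        "ave": mean,
        "max": x.max(),
        "min": x.min(),
        "std": std,
        "cv": std / mean,  # coefficient of variation
        "cs": skew,        # skewness coefficient
    }


# One week at 10 min resolution: 7 days * 24 hours * 6 samples per hour.
POINTS_PER_DATASET = 7 * 24 * 6
```

Splitting in time order rather than at random keeps every test point strictly later than the training data, which matches the forecasting setting described in the text.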

Figure 2. Wind speed on different time periods.

Figure 3. Decomposed subseries generated by VMD for Dataset A.

Figure 4. PACF values of the original series and the subseries for Dataset A.

Figure 5. Average improved percentage of different error indices for all four datasets under one-step forecasting.

Figure 6. (a) One-step forecasting results for seven forecasting models for Dataset A. (b) One-step forecasting results for seven forecasting models for Dataset B. (c) One-step forecasting results for seven forecasting models for Dataset C. (d) One-step forecasting results for seven forecasting models for Dataset D.

Table 1. Statistical information for the four datasets.

Table 2. One-step forecasting results for different models. The model with the lowest RMSE, MAE, and MAPE values is marked in green.

Table 3. Percentage of improvement of different error indices under one-step forecasting.

Table 4. Multi-step forecasting results for different models. The model with the lowest RMSE, MAE, and MAPE values is marked in green.

Table 6. Results for the DM test.