A Hybrid Approach for Multi-Step Wind Speed Forecasting Based on Multi-Scale Dominant Ingredient Chaotic Analysis, KELM and Synchronous Optimization Strategy

: Accurate wind speed prediction plays a significant role in reasonable scheduling and the safe operation of the power system. However, due to the non-linear and non-stationary traits of the wind speed time series, the construction of an accuracy forecasting model is difficult to achieve. To this end, a novel synchronous optimization strategy-based hybrid model combining multi-scale dominant ingredient chaotic analysis and a kernel extreme learning machine (KELM) is proposed, for which the multi-scale dominant ingredient chaotic analysis integrates variational mode decomposition (VMD), singular spectrum analysis (SSA) and phase-space reconstruction (PSR). For such a hybrid structure, the parameters in VMD, SSA, PSR and KELM that would affect the predictive performance are optimized by the proposed improved hybrid grey wolf optimizer-sine cosine algorithm (IHGWOSCA) synchronously. To begin with, VMD is employed to decompose the raw wind speed data into a set of sub-series with various frequency scales. Later, the extraction of dominant and residuary ingredients for each sub-series is implemented by SSA, after which, all of the residuary ingredients are accumulated with the residual of VMD, to generate an additional forecasting component. Subsequently, the inputs and outputs of KELM for each component are deduced by PSR, with which the forecasting model could be constructed. Finally, the ultimate forecasting values of the raw wind speed are calculated by accumulating the predicted results of all the components. Additionally, four datasets from Sotavento Galicia (SG) wind farm have been selected, to achieve the performance assessment of the proposed model. Furthermore, six relevant models are carried out for comparative analysis. The results illustrate that the proposed hybrid framework, VMD-SSA-PSR-KELM could achieve a better performance compared with other combined models, while the proposed synchronous parameter optimization strategy-based model could achieve an average improvement of 25% compared to the separated optimized VMD-SSA-PSR-KELM model.

In recent decades, wind energy has become more and more important as an emerging renewable energy source.Nevertheless, due to uncertainties such as sunshine, topography, and pressure, as well as the intermittent and random effects of wind speed, wind power possesses a large amount of uncertainties, with which wind power operation would be challenging.However, accurate short-term wind speed forecasting plays a vital role in meeting such a challenge, with which the dispatch plan could be adjusted in time, as well as the optimal unit combination plan being formulated effectively [1].Additionally, due to the fact that the dynamic behavior of future wind speed trends could be excavated by multi-step forecasting, satisfactory multi-step prediction results are important indicators of wind speed forecasting.Therefore, it is necessary to develop an accuracy multi-step short-term wind speed prediction model imminently, to thus improve economic benefits for wind farms.
Over the past few decades, a great variety of methods have been developed to achieve wind speed prediction, which can be roughly divided into four categories [2]: physical models, conventional statistical models, spatial correlation models, and artificial intelligence (AI) models.As a well-known physical model, numerical weather prediction (NWP) [3], achieves wind speed prediction, considering various relevant meteorological factors such as humidity, pressure, wind speed, and direction, etc.However, the drawbacks of the long operation time and the large amount of computing resources make such models difficult to construct.Another popular forecasting approach, namely statistical models could extract potential information contained in the historical wind speed series, among which autoregressive (AR) [4], autoregressive moving average (ARMA) [5], and autoregressive integrated moving average (ARIMA) [6] have been widely investigated.Nevertheless, due to the strong nonlinearity and non-stationarity within the wind speed time series, the capabilities of such models would be restricted significantly.Spatial correlation models that consider the spatial relationship of the wind speed information collected from various wind farms are investigated to be an effective tool for wind speed forecasting [7,8].Whereas the formulation of such models is more difficult to implement than conventional statistical models, due to the difficulty in wind speed data collection and the timely transmission from various space-related sites [2].The other category, namely AI models, have been developed rapidly in the field of wind speed prediction over the past few decades.The superior performance in dealing with nonlinear and non-stationary time series has been approved by a number of scholars.Among the varied methods, artificial neural networks (ANNs) [9,10] possess strong robustness, as well as the ability to fully approximate complex nonlinear relationships, whereas the network structures are difficult to determine, as well as being time consuming for examination.In contrast, the appropriate parameters in support vector regression (SVR) [11,12] are easier to determine, of which the nonlinear forecasting problems could be solved by proper kernel transformations.Compared with the AI approaches mentioned above, extreme learning machine (ELM) [1,13,14] is widely utilized in the field of wind speed forecasting due to the fast computing speed and strong generalization capability.Additionally, to enhance the generalization performance of ELM, the regularization coefficient is employed to solve optimization problems, as well as replacing hidden nodes by kernel functions, thus weakening the randomness of the predicted results [15].
Generally, the forecasting performance obtained by directly applying a prediction model would be yield terrible results, which can be attributed to the non-linear and non-stationary traits within the wind speed time series.To this end, a large number of data preprocessing methods has been developed for reducing the non-stationarity of the raw wind speed data, which has been proven to be beneficial for improving prediction accuracy [16].Time-frequency signal decomposition methods such as wavelet transform (WT) [17], empirical mode decomposition (EMD) [18], and variational mode decomposition (VMD) [19] have been widely utilized in wind speed prediction.Among the approaches, VMD possesses a more adaptive ability than WT, as well as owning a more solid mathematical theoretical basis than EMD [17,18], with which VMD has been widely applied into various fields [20,21].Nevertheless, the decomposition efficiency of VMD is affected by the mode number and the quadratic penalty term as well as updating steps to some extent [21,22], which makes parameter optimizations for VMD necessary.On the basis of the comparisons analyzed above, VMD will be employed in this study to preliminary preprocessing of the raw wind speed time series.
To further enhance the forecasting performance on the basis of the decomposition methods applied, singular spectrum analysis (SSA) was employed, to extract the dominant and residuary ingredients from all sub-series, which has been proven as an effective technique to improve the capabilities of the prediction models when they are combined with time-frequency signal decomposition methods [23,24].Furthermore, in order to exploit the inherent laws of chaotic systems for the preprocessed sub series, phase space reconstruction (PSR) [25], which is considered to be a powerful tool for chaotic time series analysis, is employed to deduce the inputs and outputs of the forecasting engine.Nevertheless, the reconstruction performance for the chaotic system would be affected by the parameters in PSR to some extent, with which the forecasting performance would be restricted [9,26].
In order to achieve better parameter optimization for the approaches mentioned above, an improved hybrid grey wolf optimizer-sine cosine algorithm (IHGWOSCA) is proposed, to handle such problems.Hence, to enhance the accuracy of the multi-step short-term wind speed prediction, a novel hybrid model based on multi-scale dominant ingredient chaotic analysis, kernel extreme learning machine (KELM), and IHGWOSCA-based synchronous optimization strategy is proposed in this paper.The proposed multi-scale dominant ingredient chaotic analysis, combining VMD, SSA, and PSR is implemented to preprocess the raw wind speed data, with which the non-stationarity of the wind speed time series could be significantly weakened.In this phase, the residual of VMD would be accumulated with the residuary ingredients obtained by SSA, to generate an additional forecasting component.Meanwhile, the inputs and outputs of KELM could be deduced by PSR effectively, after which the predictors for all the components could be constructed.Finally, the ultimate prediction values of the original wind speed are calculated by integrating the predicted results of all the components.The optimal parameters of each module could be obtained by repeating the whole process introduced above, in the IHGWOSCA optimizer, with which the best performance could be achieved.Furthermore, the superiority and effectiveness of the proposed model has been testified by comparative experiments, of which four sets of wind speed time series were collected from Sotavento Galicia (SG) as well as other six relevant single and combined models were used for comparative analysis.
The remaining parts of this paper are organized as follows: Section 2 detailed presents the base knowledge for VMD, SSA, PSR and KELM.Section 3 introduces multi-scale dominant ingredient chaotic analysis, the proposed IHGWOSCA algorithm, optimization strategies, and the specific procedure of the proposed model.Section 4 denotes the effectiveness of the proposed model through experimental results and analysis.Section 5 details the perspectives about further investigation directions.The conclusions are summarized in Section 6.The abbreviations of technical terms are listed in Appendix A.

Variational Mode Decomposition
Variational mode decomposition (VMD), proposed by Dragomiretskiy [22] is a novel adaptive signal processing method, where the decomposition components of the given signal can be obtained by determining the center frequency and bandwidth of each component when searching for the optimal solution of a variational problem.In this way, the frequency domain division of the signal, and the effective separation of intrinsic mode functions (IMFs) can be adaptively implemented.Assuming that the original signal f is decomposed into K components, the corresponding constrained variational problem is as follows: where m k and ω k represent the set of all modes and the corresponding center frequencies, δ(t) denotes the Dirac distribution, and * is convolution operator.Then, the quadratic penalty term and Lagrangian multiplication operator β(t) are employed to transform the constraint variational problem into an unconstrained one: where α represents the balancing factor of the data-fidelity constraint.Subsequently, the alternating direction method of the multipliers (ADMM) [27] is employed to search the saddle point of the augmented Lagrangian expressions by updating m k , ω k , and β alternately, as follows: where γ represents the time-step of the dual ascents, and m k n+1 , m i (ω), f (ω), and β (ω) denote the Fourier transform corresponding to m k n+1 , m i (t), f(t), and β(t), respectively.The main procedures of VMD are exhibited below: Step 1: Initialize k 1 , ω k 1 , β 1 and n = 1; Step 2: Update k and ω k by Formulas (3) and (4); Step 3: Update β based on Formula (5); Step 4: < ε stop updating; else n = n + 1, and turn to Step 2.

Singular Spectrum Analysis
Singular spectrum analysis (SSA), combining multivariate statistics and probability theory for time series analysis, is a novel data preprocessing method, which is commonly utilized to identify and extract periodic, quasi-periodic, and oscillatory components from the raw data [28].There exist four main procedures of SSA, i.e., embedding, singular value decomposition (SVD), grouping, and diagonal averaging [29,30].The detailed calculations of SSA are exhibited as follows [31]: (1) Embedding.The original time sequence x = {xi | i =1, 2, ⋯, N} is reconstructed into a Hankel matrix [32] to begin with SSA, which is defined as: where t = N − l + 1 and l denote the window length.
(2) SVD.On the basis of the time series embedded, the i-th eigentriple (σi, Ui, Vi) can be obtained by decomposing the matrix H with SVD, thus deducing the Hankel matrix H as follows: 1 2 ... , where σi is the singular value, and Ui and Vi denote the singular vectors of matrixes HH T and H T H, respectively.

Phase Space Reconstruction
One of the approaches that could restore the original dynamic system in phase space and namely coordinates delay reconstruction method, was proposed by Packard et al. [25], which is used to construct the d-dimensional phase space vector with different delay times τ for a one-dimensional time series.Therefore, the PSR expression with various prediction horizons h for the collected wind speed series x = {xi |i = 1, 2, ⋯, N} can be denoted as follows: ( 1) (10) where L = N − (d − 1)•τ − h, τ and d are the delay time and the embedded dimension, respectively, N represents the total number of wind speed samples, and Xi (i = 1, 2, …, L) denotes the i-th space vector in the phase space.The corresponding output matrix of the forecasting engine could be deduced by the following formula: where Oi represents the forecasting value corresponding to the i-th vector of the phase space matrix.

Kernel Extreme Learning Machine
The Extreme learning machine (ELM) proposed by Huang et al. [33] is a type of single hidden layer feed-forward network (SLFN), of which the input-weights and biases are generated randomly.On the basis of ELM, a modified version of ELM combining kernel functions is proposed by Huang et al. [15].The minimal norm least square method is employed in ELM, to thus deduce the output weights β through solving the set of linear equations Hβ = T, which is shown as below: † = β H T (12) where H † represents the Moore-Penrose generalized inverse of matrix H. Due to the fact that both of the smallest training errors and smallest norms of the output weights are considered in ELM, a better generalization performance for the networks could be obtained.For this purpose, the regularization coefficient C was adopted in the optimization phase, with which the output weights β could be described as [15]: where I denotes an identity matrix of dimension N.For the cases where the hidden layer feature mapping h(•) would be unknown, the kernel matrix for kernel extreme learning machine (KELM) can be defined as [15]: where K (•, •) denotes the kernel functions.According to Equations ( 13) and ( 14), the output functions of ELM can be described as follows: In the previous references [34,35], the radial basis function has been employed as an effective kernel function as well, as it is known as a Gaussian kernel function, which is defined as follows: where σ 2 denotes the kernel parameter.To achieve better generalization for the performance of the networks, the regularization coefficient C and the kernel parameter σ 2 need to be set appropriately [15].

Multi-Scale Dominant Ingredient Chaotic Analysis
Due to the non-linearity, non-stationarity, and random fluctuation characteristics within the wind speed time series, the forecasting performance would be severely restricted.Therefore, VMD is employed to preliminarily decompose the collected wind speed data into several sub-series with various frequency scales, of which the decomposition efficiency of VMD and the prediction accuracy are greatly affected by the parameters K, α and γ [21,22].In order to further reduce the non-stationarity of the decomposed sub-series, SSA is implemented to extract the dominant ingredients and residuary ingredients from the sub-series, thus achieving multi-scale dominant ingredient analysis effectively.In this study, the set of indices Z = {1, 2, …, l} in the grouping phase of SSA is divided into two discrete subsets, that is, Z1 = {1, 2, …, s} and Z2 = {s + 1, s + 2, …, l}, with which the matrix HZ could be represented as HZ = HZ1 + HZ2.It is worth noting that the parameter s that determines the dominant ingredients could affect the prediction accuracy to some extent [36].Additionally, the residual of VMD, i.e., is integrated with all the residuary ingredients of all the sub-series for the ulterior improvement of the forecasting model's capabilities.Subsequently, PSR, which has been widely utilized for chaotic time series analysis [9,37], is implemented to construct the inputs and outputs of the forecasting models, corresponding to each predictive component.Nevertheless, the time delay τ and the embedded dimension d could affect the recovery of the PSR dynamic system, with which the prediction performance would be significant restricted.It can be seen that the key to constructing the proposed hybrid forecasting model is to assign appropriate parameters to each module.To this end, an improved hybrid grey wolf optimizer-sine cosine algorithm (IHGWOSCA)-based synchronous optimization strategy is proposed, to achieve better parameter optimization and forecasting performance, which will be detailed later.

An Improved Hybrid Grey Wolf Optimizer-Sine Cosine Algorithm
The hybrid grey wolf optimizer-sine cosine algorithm (HGWO-SCA) is proposed by Singh et al. [38], of which both the grey wolf optimizer (GWO) and the sine cosine algorithm (SCA) are developed by Mirjalili et al. [39,40].Four categories of grey wolves, α, β, δ, and ω are defined for simulating the leadership hierarchy in normal GWO, which are determined by the top three best positions (fitness value).Then, the equations that could be utilized for mathematically model encircling behavior are defined as below: where t denotes the t-th iteration, X ⃑ p represents the position of the prey, X ⃑ indicates the position vector of the grey wolf, and A ⃑ and C ⃑ are coefficient vectors calculated as follows: where r1 and r2 are random vectors in the scopes of [0, 1]; the components of a ⃑ linearly decrease from 2 to 0 over the course of the iteration in the normal GWO and HGWO-SCA.Besides, the transition between the exploration and exploitation stage depends on the components of a ⃑ and A ⃑ , of which half of the iterations are divided toward exploration, when |A| ⃑ > 1 and the remaining iterations are assigned for exploitation when |A| ⃑ < 1 [39].Hence, to better improve the corresponding abilities of these two phases, a cosine function-based decreasing formula for updating the components of a ⃑ is proposed in this study: where t and T represent the current iteration and the maximum number of iterations, respectively.In addition, the comparison of the proposed function and the original one for updating the components of a ⃑ over the course of the iterations is intuitively exhibited in Figure 1.As can be seen from Figure 1, the values of a ⃑ are generally larger in the proposed function during the first half of the iterations, which could contribute to improving the exploration ability of the algorithm at this stage.The corresponding effects in the exploitation phase would be obtained with smaller values of a ⃑, compared to the original ones.The other stage of the whole algorithm, namely hunting, is usually guided by the α wolf, while the β and δ wolves participate in hunting occasionally [39].To effectively mathematically simulate the hunting behavior, the α, β, and δ wolves are assumed to possess more knowledge about the potential location of prey.In addition, the position updating functions of the α wolf in HGWO-SCA are modified by applying the position updating equations of SCA, thus enhancing the convergence capabilities of GWO [38].The corresponding updating equations of the α, β, and δ wolves are defined as follows: ; ; ; where X ⃑ 1 , X ⃑ 2 , and X ⃑ 3 indicate the positional information owned by the α, β, and δ wolves so far.Due to the fact that individuals in HGWO-SCA are merely updated by simple averaging X ⃑ 1 , X ⃑ 2 , and X ⃑ 3 , some scholars have focused on the updating approaches for the individuals [41].In this study, a weighted averaging strategy is proposed to iterate the individuals, in which α, β, and δ wolves are separately assigned a weight value that is deduced by inversing the corresponding fitness values of the wolves.The detailed calculations are as follows: where fit denotes the fitness of the corresponding individual.Furthermore, the pseudo code of the proposed IHGWOSCA algorithm is exhibited in Algorithm 1.
Algorithm 1.The pseudo code of the proposed IHGWOSCA algorithm 1: Initialize a, A ⃑ , and C ⃑ 3: Calculate the fitness of each search member 4: X ⃑ α : the best search agents, X ⃑ β : the second-best search agent, X ⃑ δ : the third-best search agent 5: While (t < maximum number of iterations) 6: For each search agent: 7: Update the position of the current search agent on the basis of Equations ( 25) and (26) 8: End for: 9: Update a, A ⃑ , and C ⃑ by Equations ( 21), ( 19) and (20), respectively.10: Calculate the fitness of all grey wolves 11: Save the position information owned by the β and δ wolves with Equations ( 23) and ( 24), while the position information for α wolf are updated as below: 12: If rand () < 0.5 13: Then: 14: Else: 16: End else 20: t = t+1 21: End while 22: Return X ⃑ α

Optimization Strategy
In order to effectively optimize the parameters in various modules, as well as construct the hybrid forecasting model, the proposed IHGWOSCA algorithm is adopted to handle this problem.To begin with, the parameters of SSA, PSR, and KELM for all the sub-series are considered to be the same in this study, which could facilitate fast convergence as well as reduce computation.Hence, the total number of the variables is eight, while the corresponding coding strategy of the agents in the proposed IHGWOSCA is described in Figure 2. Additionally, the metric root-mean-square error (RMSE) represented in Equation ( 27) is adopted as the objective function for parameter optimization.

Specific Procedures
The main procedures of the proposed novel hybrid wind speed forecasting model, combined with VMD, SSA, PSR, KELM, and IHGWOSCA-based synchronous optimization strategies are described as follows: Parameters of SSA Step 1: Collect the original wind speed data and initialize the population of IHGWOSCA; Step 2: Calculate the fitness value for each agent; Step 2.1: Decode the population and assign the corresponding parameters for each module, i.e., the parameters K, α, γ for VMD, s for SSA, τ, d for PSR as well as C, and σ 2 for KELM; Step 2.2: Decompose the collected wind speed data into K modes by utilizing VMD, then calculate the residual mr of VMD; Step 2.3: Implement SSA for all the sub-series, then extract the dominant ingredients as well as accumulate all of the residuary ingredients with mr.
Step 2.4: For the k-th (k = 1, …, K, K + 1) component, construct the input and output matrixes for the k-th (k = 1, …, K, K + 1) KELM, applying PSR; Step 2.5: Model the k-th (k = 1, …, K, K + 1) KELM with parameters C and σ 2 .Repeat Steps 2.4 to 2.5 until k = K + 1, then accumulate the predicted results of all the components to obtain the ultimate forecasting value, and then calculate the fitness value by Equation ( 27).
Step 2.6: Repeat Steps 2.1 to 2.5 until the fitness values for all the agents are generated; Step 3: Execute the operators of IHGWOSCA.
Step 4: Repeat Step 2 to 3 until the maximum number of iterations is reached; Step 5: Obtain the optimal parameters for all of the modules by decoding the best individual, as well as calculate the final forecasting values of the collected wind speed data by repeating Steps 2.1 to 2.5.
The overall process of the proposed wind speed forecasting model is depicted in Figure 3.In this study, four cases of short-term wind speed data are collected from Sotavento Galicia (SG) wind farm, of which the data are recorded with a mean time interval of 10 minutes.Furthermore, these four cases are selected from SG for the time periods of 8-14 March, 7-13 June, 22-28 September, 8-14 December in 2018, which are named as SG Mar., SG Jun., SG Sep. and SG Dec. in the later experiments, respectively.The visualizations of all of the collected wind speed time series are shown in Figure 4, and the corresponding statistical information including the maximum (Max.)value, minimum (Min.)value, mean value, skewness (Skew.),kurtosis (Kurt.), and standard deviation (Std.) are exhibited in Table 1.It can be intuitively observed that the raw wind speed time series possesses strong non-linearity and non-stationarity, with which the precise forecasting model would be difficult to construct.Additionally, the phase space matrices corresponding to the forecasting components are deduced by PSR with various parameters, among which the last 288 samples are assigned to be the testing sets in all of the experimental cases.

Experimental Description
In order to testify the availability of the proposed hybrid prediction model based on the VMD, SSA, PSR, KELM, and IHGWOSCA-based synchronous optimization strategies, a set of single models and combined models are carried out for comparative experiments in one-step, three-step, and five-step predictions.The single models, namely SVR and KELM are directly adopted for prediction, of which the parameters in both of these two models are searched by grid search (GS).The remaining four combine comparative models integrating with various assistive technologies, including EMD-KELM, VMD-KELM, EMD-SSA-PSR-KELM, and VMD-SSA-PSR-KELM are utilized to demonstrate the availability of the proposed modified modules.Among these combined models, both EMD-KELM and EMD-SSA-PSR-KELM achieved wind speed forecasting with the EMD-based decomposition method, while the former one used KELM for prediction modeling without SSA, whereas KELM and the proposed dominant ingredient chaotic analysis were adopted in the latter one.Both VMD-KELM and VMD-SSA-PSR-KELM are based on the VMD decomposition technique, while the only difference between the two is that the proposed dominant ingredient chaotic analysis combining SSA and PSR is employed in latter one, while the former one is not implemented.
In order to assess the performance of all the forecasting models quantitatively, three common indexes, including RMSE, mean absolute error (MAE), and mean absolute percentage error (MAPE) are employed, which could measure the deviation between the predicted and collected values [42].The detailed equations of these three metrics are described as follows: ) where N denotes the total number of testing samples, and Y and Y represent the collected and predicted values, respectively.Additionally, the reducing ratios of the indexes RMSE, MAE, and MAPE are employed to evaluate the performance between two different models [1,43,44], with which the improvement degree of the proposed model could be quantitatively expressed.The detailed definitions of these percentage-based indexes, including PRMSE, PMAE, and PMAPE are as follows: = 100 = 100 = 100 where the subscript a denotes the comparative model, and subscript b represents the proposed model in this study.
For the proposed model, the search agents and the maximum iterations of IHGWOSCA are given as 10 and 20, respectively.The predetermined parameters of the window length l for Hankel matrix in SSA is set as 500.The parameters (K, α, γ) in VMD, (s) in SSA, (τ, d) in PSR, and (C, σ 2 ) in KELM are searched for the scopes of [2,10], [0, 1], [1,2000], [1,167], [1,15], [1,40], [1,1000], and [1,1000], orderly.For the relevant comparative models, the regularization coefficient C and the kernel parameter σ 2 of all the SVR-and KELM-based models are optimized by GS, where the searching scopes are in intervals [2 −8 , 2 8 ] and [2 −5 , 2 5 ], respectively.For the VMD-based comparative models, the parameter α is set as the default value of 2000, and the parameters K and γ are optimized by GS [13], of which K is searched in [2,10] with increasing step 1, and γ is searched in [0, 1] with increasing step 0.1.Meanwhile, for the SSA-and PSR-based models, including EMD-SSA-PSR-KELM and VMD-SSA-PSR-KELM, the window length l of the Hankel matrix is given as 500, and the corresponding parameter s is set as 105, as suggested in [36].Besides, the parameters τ and d of PSR are set as 1 and 10, respectively.Furthermore, the optimal parameters within the proposed models are obtained by the proposed IHGWOSCA algorithm in different horizons for all experimental cases, as illustrated in Table 2.

Contrasting Analyses
In this section, the results of the one-step, three-step, and five-step forecasting on all of the experimental cases are discussed in detail.The metrics RMSE, MAE, and MAPE, obtained by all of the comparative models and the proposed model with various prediction horizons are illustrated in Table 3, respectively, of which the proposed models are highlighted in bold.Meanwhile, in order to assess the improvements that are achieved by the proposed model compared with the comparative models, the performance improvements are depicted in Table 4 integrally.As can be observed from Tables 3 and 4, several consequences could be drawn as below: (1) Comparing the metrics RMSE, MAE, and MAPE, obtained by SVR and KELM in all experimental cases, it can be observed that KELM generally possesses lower metrics than SVR, which means that a better forecasting performance could be obtained by KELM.For instance, in the cases of SG Mar. and SG Jun., the one-step prediction results in terms of MAPE for these two models are 11.37%, 10.35%, and 13.69% 12.76%, of which the reducing ratios of MAPE for KELM are 8.92% and 6.75%, respectively.Furthermore, this trend would be pronounced in the multi-step predictions.In the three-step and five-step predictions in the case of SG Sep., the three employed indicators obtained by SVR and KELM are 0.97 m/s, 0.71 m/s, 23.04% (SVR, three-step), 0.96 m/s, 0.71 m/s, 18.91% (KELM, three-step) and 1.27 m/s, 0.97 m/s, 29.97 (SVR, five-step), 1.19 m/s, 0.91 m/s, 23.27% (KELM, five-step) orderly.It can be seen that the decreasing percentages of the index MAPE for KELM in three-step and five-step forecasting are 17.91% and 22.37%, respectively, with which the superiority of KELM could be demonstrated effectively.It is worth noting that satisfactory results could not be directly achieved by the single models such as SVR and KELM, which could be attributed to the strong non-stationarity and non-linearity of the original wind speed time series.To this end, signal preprocessing technologies are necessary to enhance prediction performance.(2) Following the comparison of KLEM, EMD-KELM and VMD-KELM, it can be indicated that time-frequency signal processing approaches could greatly improve the prediction accuracy for wind speed.In the case of SG Dec., the evaluation metrics obtained by EMD-KELM in one-step predictions are 0.63 m/s, 0.50 m/s, 6.76%, which are deceased by 48.37%, 43.45%, and 43.74%, compared to the single-model KELM.Meanwhile, the metrics reducing ratio in the three-step and five-step predictions obtained by comparing KELM and EMD-KELM are 52.37%,50.16%, 49.15% and 47.87%, 46.07%, and 42.20%, respectively.From further comparison of the results of EMD-KELM and VMD-KELM, it can be indicated that the three evaluation indicators obtained by VMD-KELM are decreased by averaging 75.05%, 64.90%, and 56.17% in three experimental prediction horizons, respectively.Hence, it can be concluded that VMD could improve the forecasting accuracy better than EMD.Similar conclusions could be drawn by the same analysis for the remaining experimental cases.(3) The proposed dominant ingredient chaotic analysis combining SSA and PSR could improve the prediction model performance in an ulterior manner.In the case of SG Mar., compared with EMD-KELM, the metrics obtained by EMD-SSA-PSR-KELM are 0.52 m/s, 0.38 m/s, 4.71%, 0.80 m/s, 0.60 m/s, 7.69%, 0.93 m/s, 0.70 m/s, 9.12% in three variously predicted horizons, of which the corresponding decreasing percentages are averaged by 25.67%, 6.50% and 12.04%.Meanwhile, compared with VMD-KELM, the metrics of VMD-SSA-PSR-KELM have been averaged, decreasing by 31.11%,8.02% and 9.30% in different predicted horizons, respectively.The comparisons of EMD-KELM, EMD-SSA-PSR-KELM and VMD-KELM, and VMD-SSA-PSR-KELM indicate that the proposed dominant ingredient chaotic analysis could further enhance the forecasting performance on the basis of the signal decomposition approaches implemented.Nevertheless, the performance of the proposed dominant ingredient chaotic analysis would be restricted by the parameters that are settled in SSA and PSR, which makes parameter optimization necessary.(4) Comparing VMD-SSA-PSR-KELM and the proposed model, both of these two models possess the same frameworks, while the parameters in the proposed one are optimized by the proposed IHGWOSCA algorithm synchronously.In the case of SG Mar. as the example, the metrics decreasing the percentage between VMD-SSA-PSR-KELM and the proposed model in terms of MAPE are 23.05%,8.71%, and 15.94% in three predicted horizons, respectively.It can be concluded that the forecasting performance obtained by the synchronous optimization strategy-based model is much better; in other words, the appropriate parameters in each module could be optimized by the proposed IHGWOSCA effectively.Additionally, for one-step prediction in all cases, the average decline ratios between these two models in terms of RMSE, MAE, and MAPE are 32.86%,32.94%, 31.89%,respectively.Furthermore, compared with SVR models in all experimental cases, the maximum decreasing ratio of MAPE in the one-, three-and five-step predictions are 95.57%(in the case of SG Jun.), 91.62% (in the case of SG Sep.), and 90.79% (in the case of SG Sep.), respectively, with which it can be indicated that a large promotion in performance could be achieved by the proposed model.Additionally, in order to achieve intuitive observation of the forecasting results, the curves of the predicted and collected values, as well as prediction errors of all experimental models were depicted in Figures 5 and 6.As can be observed, the predicted curves of the proposed model in both single-step and multi-step forecasting could better approximate real curves, while the corresponding forecasting errors are evenly distributed around zero with small fluctuations.It can be indicated that the forecasting performance of the proposed model does not decrease significantly with the increase of the prediction step size, with which the superiority of the proposed model could be demonstrated convincingly.
Furthermore, the histograms of the metrics RMSE, MAE, and MAPE obtained by all of the experimental models with various prediction horizons, are shown in Figures 7 and 8, with which the variation tendencies of the evaluation indicators over the course of increasing prediction horizons could be observed intuitively.It can be observed that the indexes values obtained by the proposed model achieved minimums in all of the experiments, while the same framework-based VMD-SSA-PSR-KELM models achieved suboptimal performance in all experiments.Besides, the time-frequency signal decomposition-based models generally possess lower index values in terms of RMSE, MAE, and MAPE, while the performance improvements brought by VMD are much more than EMD, in this study.In addition, the proposed dominant ingredient, chaotic analysis combining SSA and PSR, could act as an ulterior force to enhance the forecasting performance to some extent, while the promotion caused by the analysis approach would be significant if the parameters of SSA and PSR were appropriately set.Furthermore, such parameter-searching problems could be effectively solved by the proposed IHGWOSCA algorithm, based on the synchronous optimization strategy.

Discussion
Following the detailed comparative analyses that are depicted above, the superiority of the proposed hybrid approach combining the VMD, SSA, PSR, KELM, and IHGWOSCA-based synchronous optimization strategies could be demonstrated effectively.It is worth noting that there exist only one kind of forecasting model in the proposed approach; i.e., the AI model, while the strategy of mixing various categories of prediction models such as AI and physical models was not considered in this study.According to the past references [45][46][47], such mixed models have been widely investigated for enhancing the prediction accuracy.Among the models, the advantages corresponding to each model could be maximized, thus making full use of each model to deal with different situations.In addition, the proposed IHGWOSCA algorithm could be optimized with better strategies for further enhancing the convergence speed, as well as the global search capability.Furthermore, multi-objective optimization that has been widely utilized in the field of controlling [48] could be implemented in wind speed forecasting, which could contribute to enhancing the performance of the models [49].Therefore, several perspectives for further investigation directions could be summarized as: (1) the combination of multiple forecasting models would be the focus of our future works, (2) for the proposed IHGWOSCA algorithm, some strategies that could contribute to a jump out of the local optimum could be employed, (3) multi-objective optimization implemented by various algorithms will be investigated in wind speed forecasting in our future studies.

Conclusions
To improve multi-step prediction performance, a novel hybrid based on a multi-scale dominant ingredient chaotic analysis, the KELM-and IHGWOSCA-based synchronous optimization strategy, is proposed in this paper.Specifically, the proposed model possesses a structure of VMD-SSA-PSR-KELM, of which the parameters in each module was synchronously optimized by the proposed IHGWOSCA algorithm.Firstly, VMD was applied to decompose the raw non-stationary wind speed data into several sub-series, while the residual of VMD was calculated concurrently.Then, SSA was employed to extract the dominant and residuary ingredients for each sub-series, while the residuary ones were integrated with the residual of VMD to be an additional forecasting component.Later, PSR was executed, to deduce the inputs and outputs of KELM for all of the forecasting components.Finally, the prediction models for all of the components were constructed by KELM, as well as the forecasting results that were accumulated to obtain the final forecasting values of the raw wind speed data.The whole procedure of the proposed VMD-SSA-PSR-KELM structure was iterated in the parameters searching phase, with which the parameters of each module would be optimized effectively.In the experimental stage, six relevant comparative models were employed to compare with the proposed model.Through an intensive analysis of the prediction results for the one-step and multi-step predictions, it can be concluded that the forecasting performance could be enhanced through an implementation of the proposed dominant ingredient chaotic analysis with appropriate parameters, from which the metrics obtained by EMD-SSA-PSR-KELM in a one-step prediction were decreased to an average of 32.46% compared with EMD-KELM.Besides, the proposed VMD-SSA-PSR-KELM structure achieved satisfactory results compared with other combined models, of which the indicators of such models achieved the second-lowest minimum in all of the experimental cases, and decreased by an average of 78.13%, 73.28% and 63.61%, in various prediction horizons compared to EMD-SSA-PSR-KELM.Nevertheless, the synchronous optimization strategy based on the proposed IHGWOSCA algorithm could maximize the performance of the proposed hybrid structure; i.e., the parameter optimizations for each module could be effectively implemented.The performance improvement brought about by the synchronous optimization strategy achieved an average by 25%, compared with the separated optimized VMD-SSA-PSR-KELM model.In consequence, the proposed novel hybrid approach could be considered as a credible tool for multi-step short-term wind speed forecasting.

Figure 1 .
Figure 1.Comparison of the proposed function and the original one for updating the components of a ⃑ over the course of iterations.

Figure 2 .
Figure 2. Coding strategy of the population in the proposed IHGWOSCA.

Figure 3 .
Figure 3.The procedures of the proposed IHGWOSCA-based synchronous optimization hybrid forecasting approach.

Figure 4 .
Figure 4.The original short-term wind speed time series from SG.

Figure 5 .Figure 6 .
Figure 5.The multi-step forecasting results of various hybrid models in different cases: (a) the case of SG Mar.; (b) the case of SG Jun.

Figure 7 .
Figure 7.Comparison between all of the multi-step experimental results in terms of RMSE, MAE, and MAPE in different cases: (a) the case of SG Mar.; (b) the case of SG Jun.

Figure 8 .
Figure 8.Comparison of all of the multi-step experimental results in terms of RMSE, MAE, and MAPE in different cases: (a) the case of SG Sep.; (b) the case of SG Dec.

Table 1 .
Statistical information for the four datasets from SG.

Table 2 .
Optimal parameters of the proposed models in different prediction horizons for all of the experimental cases.

Table 3 .
Results of multi-step forecasting on the test data in all cases.

Table 4 .
Percentage of the promotion between the comparative models and the proposed model in all experimental cases for multi-step prediction.