A Novel Hybrid Model Based on an Improved Seagull Optimization Algorithm for Short-Term Wind Speed Forecasting

: Wind energy is a clean energy source and is receiving widespread attention. Improving the operating efﬁciency and economic beneﬁts of wind power generation systems depends on more accurate short-term wind speed predictions. In this study, a new hybrid model for short-term wind speed forecasting is proposed. The model combines variational modal decomposition (VMD), the proposed improved seagull optimization algorithm (ISOA) and the kernel extreme learning machine (KELM) network. The model adopts a hybrid modeling strategy: ﬁrstly, VMD decomposition is used to decompose the wind speed time series into several wind speed subseries. Secondly, KELM optimized by ISOA is used to predict each decomposed subseries. The ISOA technique is employed to accurately ﬁnd the best parameters in each KELM network such that the predictability of a single KELM model can be enhanced. Finally, the prediction results of the wind speed sublayer are summarized to obtain the original wind speed. This hybrid model effectively characterizes the nonlinear and nonstationary characteristics of wind speed and greatly improves the forecasting performance. The experiment results demonstrate that: (1) the proposed VMD-ISOA-KELM model obtains the best performance for the application of three different prediction horizons compared with the other classic individual models, and (2) the proposed hybrid model combining the VMD technique and ISOA optimization algorithm performs better than models using other data preprocessing techniques.


Introduction
To achieve global clean energy development, reduce greenhouse gas emissions and prevent the crisis of the depletion of nonrenewable fossil energy reserves, the large-scale use of clean energy has become a global energy development trend [1,2]. Among the various widely used new energies, wind energy is used worldwide due to its wide energy distribution, pollution-free nature and sustainability, and it is of great significance to tap into the potential of wind energy to adjust the traditional energy structure. According to a report released by the Global Wind Energy Association (GWEC) in 2019, the global installed capacity of wind power in 2019 was 60.4 GW, reaching a total of 651 GW. As of the end of 2019, China's cumulative installed wind power capacity reached 210 MW [3]. The chaotic, random and intermittent characteristics of wind speed pose considerable challenges to power systems. The violent fluctuation of wind power in a short period of time causes a short-term imbalance of the power system, which may cause the power system to collapse. Therefore, accurate wind speed forecasting is critical to accurately predicting the output power of wind power and stabilizing the operating state of the power system. complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN)and support vector regression (SVR) as the best wind speed prediction method. Zhou et al. [30] proposed a hybrid framework for multilevel wind speed prediction based on variational model decomposition (VMD) and convolutional neural networks. Furthermore, chaos theory has increasingly attracted attention. Multifractal patterns of wind speed can be obtained through chaotic characteristics analysis. Jiang et al. [31] employed a hybrid linearnonlinear modeling method based on chaos theory to capture the linear and nonlinear factors hidden in wind speed time series, which contained VMD technology to remove the noise in original data. The experimental results showed that the hybrid model was more accurate compared with other models.
Based on the analysis above, artificial intelligence methods have been the most extensive and successful approaches to short-term wind speed prediction, but the prediction ability of a single artificial intelligence method is limited. Hybrid approaches have shown better performance than single models. Therefore, it has gradually become a popular trend to apply data preprocessing techniques before sending wind speed data into forecasting models.
In this study, a novel hybrid strategy is proposed that includes three portions: data preprocessing, optimization and forecasting. Specifically, based on the decomposition and integration strategy, VMD decomposition is used to decompose the original wind speed series into several variational modes to filter out the noise in the original wind speed time series. Then, the KELM prediction network is applied to the problem of wind speed forecasting. At the same time, the improved seagull optimization algorithm is used to optimize the kernel parameters of the KELM network, thereby forming a hybrid model.
The main contributions and innovations of this research are as follows: (1) data preprocessing technology is included to reduce the volatility and randomness of wind speed series and improve the accuracy of prediction. VMD decomposes the original wind speed series into a set of relatively stable modes. (2) In the prediction phase, the kernel function is added to ELM to map the one-dimensional wind speed sequence to the high-dimensional space for prediction, which reduces the difficulty of prediction. (3) An improved seagull optimization algorithm (ISOA) is proposed to determine the two best parameters in KELM simultaneously. In the prediction phase, ISOA continuously searches for the two parameters of the kernel function in KELM. At the same time, each search can retain the optimal approximate solution, so that the KELM network can be optimized, and the prediction accuracy and stability of the prediction are improved. (4) A systematic assessment system is established to evaluate the forecasting ability of our developed hybrid model. Four multistep prediction experiments and three performance indicators are included in this study to compare and analyze the forecasting capacity of the proposed hybrid model in each case.

Methods
The technologies used in the hybrid strategy are introduced in this section, including the data preprocessing technology (VMD), the KELM network and the improved seagull optimization algorithm. In the last part, the workflow of the hybrid strategy is presented.

Variational Mode Decomposition (VMD)
VMD is a novel signal decomposition method that was proposed by Dragomiretskiy and Zosso in 2014 [32], which decomposes a one-dimensional signal into a limited number of modes with a center frequency bandwidth through an iterative search. VMD has good adaptive ability and can overcome modal aliasing. It can decompose nonstationary wind speed time series into subseries called intrinsic mode functions (IMFs). Each subseries contains rich information. The mathematical model of VMD can be expressed as follows: where f is the signal to be decomposed, δ(t) is the impulse function and u k and ω k are the k-th mode component and the corresponding center frequency, respectively. To solve the optimization problem of Formula (1), we introduce the terms of the Lagrange multiplier operator λ and quadratic penalty factor α: The following shows the whole process of VMD decomposition: Step 1: Set the initial values of {û 1 k }, ω 1 k , {λ 1 } and n, whereˆuses the Parseval/Plancherel Fourier equidistant transform for conversion to the frequency domain.

Kernel Extreme Learning Machine
KELM is a single hidden layer feedforward neural network (SLFN). Traditional feedforward neural network training speed is slow and easily falls into local minimums, and the selection of the learning rate is sensitive. ELM randomly generates the connection weight between the input layer and the hidden layer and the threshold of the hidden layer source to obtain a unique optimal solution. For N arbitrarily distinct samples ( the output of an ELM with L hidden neurons can be expressed as where g(·) represents the activation function of the hidden layer, a i = [a i1 , a i2 , · · · , a im ] T is the input weight vector, β ι = [β i1 , β i2 , · · · , β im ] T is the output weight vector and b i is the bias. Equation (7) can be simplified as where where H is called the ELM hidden layer output matrix. Training a network of ELMs can be understood as finding a suitable set ofâ,b andβ satisfying: The regularization coefficient C is introduced and the regularized least square solution is obtained:β Thus, the output function of the ELM model is transformed into: KELM combines the ELM algorithm with a kernel function. The idea of the kernel function is to map the input spatial sample data to the high-dimensional feature space, and replace the inner product operation in the transformed high-dimensional space with the kernel function operation in the original input space.
In the KELM, the HH T of Equation (12) is constructed as follows: Then, we can deduce Equation (15), where K(·, ·) denotes the kernel functions. It can be seen that KELM's output function Θ(x) and the output layer β are as follows: It is worth noting that the Gaussian kernel function is employed in this paper according to the Mercer theorem as follows: where γ 2 represents the parameter of the kernel function. Therefore, there are two parameters that need to be adjusted in KELM, and the accuracy of KELM can be improved by adjusting C and γ. a new type of bioinspired optimization algorithm, the seagull optimization algorithm, by studying the biological characteristics of seagulls. Seagulls live in groups, using their intelligence to find and attack their prey. The most important characteristics of seagulls are migration and aggressive behavior. The mathematical expression of the natural behavior of seagulls is as follows.
During the migration process, seagulls move from one position to another and meet three conditions:

•
Avoid collision: To avoid collisions with other seagulls, variable A is employed to calculate the new position of the search seagull.
where C s (t) represents a new position that does not conflict with other search seagulls, P s (t) represents the current position of the search seagull, t represents the current iteration and A represents the motion behavior of the search seagull in a given search space.
where t = 0, 1, 2, . . . , Max iteration , f c can control the frequency of the variable, and its value drops from 2 to 0. • Best position: After avoiding overlapping with other seagulls, seagulls will move in the direction of the best position.
where M s (t) represents the positions of the search seagull. B is the random number responsible for balancing the global and local search seagull.
where r d is a random number that lies in the range of [0, 1].

•
Close to the best search seagull: After the seagull moves to a position where it does not collide with other seagulls, it moves in the direction of the best position to reach its new position.
where D s (t) represents the best fit search seagull. Seagulls can constantly change their attack angle and speed during their migration. They use their wings and weight to maintain height. When attacking prey, they move in a spiral shape in the air. The motion behavior in the x, y and z planes is described as follows: where r is the radius of the spiral and θ is a random angle in the range of [0, 2π]. u and v are the correlation constants of the spiral shape, and e is the base of the natural logarithm. The attack position of seagulls is constantly updated.
where P s (t) saves the best solution and updates the position of other search seagulls. The SOA algorithm has the advantages of solving large-scale constrained problems, low computational cost, and fast convergence speed. Compared with other optimization algorithms, it has strong advantages. However, the global optimization search process of SOA is linear as shown in Equation (19). This linear search method means that the global search capability of SOA cannot be fully utilized. Therefore, we propose a nonlinear search control formula as shown in Equation (28), which can target the seagull group exploration process stage and improve the speed and accuracy of the algorithm.
where e represents the base of natural logarithm. The specific implementation procedures of the proposed ISOA are shown as below: Step 1: Set the initial parameters of the SOA, including A, B, Max iteration , f c = 2, u = 1, and v = 1.
Step 2: Initialize the seagull population.
Step 3: Use the calculated fitness function to calculate the fitness value of each seagull and select the current best seagull position.
Step 4: Choose different strategies to update seagull migration and attack positions according to the description in Section 2.3.2.
Step 5: Repeat steps 3 and 4 to update the best seagull position and fitness value until the maximum number of iterations is reached.
Step 6: Obtain the final best seagull position and fitness value.

Workflow of the Hybrid Model
Through decomposition-based data preprocessing technology, VMD, SOA and KELM were combined to establish a hybrid method for wind speed prediction. To improve the prediction accuracy and search speed, an improved seagull algorithm was used to synchronously search the optimal parameters C and σ 2 of KELM. The root mean square error was used as the fitness function. The workflow of this study is provided in Figure 1 and detailed explanations are given below.

Data Preprocessing
The original wind speed sequence was volatile and random. At this stage, VMD technology was used to decompose the complex wind speed data. The modes decomposed by VMD had their own center frequencies, which were stable relative to the original wind speed time series.

Hybrid Models Forecasting
The KELM model was used as the basic predictive model of the system because of its advantages of fast learning and a super-nonlinear description ability. The decomposed subseries were respectively predicted by the KELM model. ISOA was used to find the two best parameters of KELM at the same time in the subseries prediction process to ensure that the prediction of each subseries was optimal. The two parameters of each subseries reached the optimal value when the number of iterations reach the maximum. Then, the forecasting results of these models were combined together to obtain the final wind speed forecasting result. The ISOA-KELM process is shown in Figure 1.

Multi-Step ahead Forecasting
The developed combined model was employed in this study to forecasting short-term wind speed. One-step, two-step and three-step forecasts were included in this study. Multistep forecasting was conducted to evaluate the predictive ability of the proposed strategy. The description of multi-step ahead forecasting is as follows: assume that the input datasets are {x(t − 5), x(t − 4), · · · x(t − 1), x(t)} and the output datasets are {x(t + l)}, where t donates a certain moment and l donates the forecast horizon. When l is equal to a positive integer, set the output data toŷ(l) = x(t + l). At this time,ŷ(l) is the l-step ahead forecast value of the original x(t + l).

Multi-Step ahead Forecasting
The developed combined model was employed in this study to forecasting shortterm wind speed. One-step, two-step and three-step forecasts were included in this study. Multi-step forecasting was conducted to evaluate the predictive ability of the proposed strategy. The description of multi-step ahead forecasting is as follows: assume that the input datasets are   where t donates a certain moment and l donates the forecast horizon. When l is equal to a positive integer, set the output data to ˆ( ) ( ) y l x t l   . At this time, ˆ( ) y l is the l -step ahead forecast value of the original ( ) x t l  .

Data Description
The experimental data for this study were taken from the Shanghai (SH) wind farm, which possesses rich wind energy resources. These data sets were collected on April 8, July 4, October 20 and January 15, 2019. All data sets included 1006 points, which were recorded every 10 minutes and lasted approximately a week. The first six datasets were

Data Description
The experimental data for this study were taken from the Shanghai (SH) wind farm, which possesses rich wind energy resources. These data sets were collected on 8 April, 4 July, 20 October and 15 January 2019. All data sets included 1006 points, which were recorded every 10 min and lasted approximately a week. The first six datasets were used for preheating, and the entire dataset was divided into a training set and a test set before the experiment. The first 80% was used for training, and the last 20% was used for testing. The maximum (Max.), minimum (Min.), mean, median (Med.), standard deviation (SD), kurtosis (Kurt.) and skewness (Skew) of the four data sets were also recorded, as shown in Table 1.

Performance Metrics
The value predicted by the model often had an error with regard to the true value. The performance indicator evaluates the prediction effect of different models by evaluating the Processes 2021, 9, 387 9 of 21 error between the observed value and the predicted value. Different evaluation indicators have different evaluation capabilities. In this study, the mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE) were calculated. The calculation methods of MAE and RMSE offset the positive and negative prediction errors, taking into account the average degree of error between the predicted value and the observed value. MAPE is the average value of absolute error and is the most widely implemented indicator used to reflect the effectiveness and reliability of aproposed new model. To explain the performance indicators more clearly, Table 2 lists the definitions and specific formulas of the four error indicators. Here Y o (i) andŶ p (i) represent the actual value and the predicted value, respectively, and N is the sample size. Table 2. Three error metrics.

Metrics
Definition Equation

MAE
Mean absolute error

Different Experiments and Relative Analysis
In this section, a detailed evaluation and analysis of the proposed model are carried out. Two sets of experiments are designed, and the graphs and tables visually show the corresponding prediction results and evaluation indicators. The experimental setup and results are as follows.

Experimental Setup
Two sets of comparative experiments were used to compare the forecasting ability between the proposed model and other comparable models. Experiment 1 compared the proposed combined model with five independent models to investigate its prediction performance. Experiment 2 compared the forecasting accuracy between the proposed model and models using various data preprocessing technologies. The four data sets were tested by all models. The results of multistep ahead forecasting further illustrated the forecasting capability of different models. Three error evaluation indicators were used to quantify the predictive ability. The smaller the value of error criteria, the better the predictive performance.
In Experiment 1, we selected five widely used individual models (BP, SVM, LSTM, ELM and KELM) as the control group of the comparative experiment. In order to compare the developed strategy with the prediction ability based on different data preprocessing technologies, such as discrete wavelet transform (DWT), EMD and complementary ensemble empirical mode decomposition (CEEMD), we conducted experiment 2. Table 3 shows the comparison of the results of the proposed model and the other individual models in the four seasons datasets. Figures 2-4 show the forecasting results of individual forecasting models in SH in April. At the top of the chart, the predicted results versus 10 min interval sampling points for all forecasting models are shown. Below, the error distribution diagram of forecasting and the scatter diagram of each individual model are presented.

Experiment I: Comparison with Other Individual Models
For SH Apr, in the one-step forecasting, the proposed model showed the best MAE, RMSE and MAPE scores at 0.315, 0.408 and 6.606% respectively, followed by the KELM model, whose values for MAE, RMSE and MAPE were 0.888, 1.190 and 17.373% respectively. The worst was the BP neural network, with MAE, RMSE and MAPE scores of 1.247, 1.642 and 30.167%, respectively. When the model forecasting was two-step, the developed model had the best accuracy with an RMSE of 0.436. In the three-step, the proposed model still had the best predictive ability with an RMSE of 0.496, but the second most accurate model was the BP network. Figures 4-6 shows the prediction results of the proposed model and the individual model in the spring experimental series (SH Apr).
sults versus 10 min interval sampling points for all forecasting models are shown. Below, the error distribution diagram of forecasting and the scatter diagram of each individual model are presented.
For SH Apr, in the one-step forecasting, the proposed model showed the best MAE, RMSE and MAPE scores at 0.315, 0.408 and 6.606% respectively, followed by the KELM model, whose values for MAE, RMSE and MAPE were 0.888, 1.190 and 17.373% respectively. The worst was the BP neural network, with MAE, RMSE and MAPE scores of 1.247, 1.642 and 30.167%, respectively. When the model forecasting was two-step, the developed model had the best accuracy with an RMSE of 0.436. In the three-step, the proposed model still had the best predictive ability with an RMSE of 0.496, but the second most accurate model was the BP network. Figures 4-6 shows the prediction results of the proposed model and the individual model in the spring experimental series (SH Apr).  For SH July, when the forecasting is one-step, the proposed VMD-ISOA-KELM hybrid model achieves the highest accuracy with a MAPE value of 3.140%. Comparatively, the individual models have fairly lower MAPE values of 9.792%, 7.434%, 8.561%, 7.355% and 7.342%, respectively. In the two-step and three-step forecasting, the developed combined model is more effective than the other methods for wind speed forecasting. Meanwhile, KELM has the lowest MAPE values at 7.342% and 9.883% in the one-step and twostep among the remaining four individual models. For SH July, when the forecasting is one-step, the proposed VMD-ISOA-KELM hybrid model achieves the highest accuracy with a MAPE value of 3.140%. Comparatively, the individual models have fairly lower MAPE values of 9.792%, 7.434%, 8.561%, 7.355% and 7.342%, respectively. In the two-step and three-step forecasting, the developed combined model is more effective than the other methods for wind speed forecasting. Meanwhile, KELM has the lowest MAPE values at 7.342% and 9.883% in the one-step and two-step among the remaining four individual models. For SH July, when the forecasting is one-step, the proposed VMD-ISOA-KELM hybrid model achieves the highest accuracy with a MAPE value of 3.140%. Comparatively, the individual models have fairly lower MAPE values of 9.792%, 7.434%, 8.561%, 7.355% and 7.342%, respectively. In the two-step and three-step forecasting, the developed combined model is more effective than the other methods for wind speed forecasting. Meanwhile, KELM has the lowest MAPE values at 7.342% and 9.883% in the one-step and twostep among the remaining four individual models.  For SH Oct, according to the evaluation criteria shown in Table 3, the proposed model still outperformed the individual models in the three steps, with MAPE values of 2.367%, 2.541% and 2.844%. According to the obtained MAPE, long short-term memory (LSTM) is ranked as the second most effective model in the three forecasts, with lower MAPE values of 7.731%, 10.557% and 11.753%.
For SH Jan, in all forecasting steps, the developed combined model exceeded the five benchmark models with MAPE values of 3.894%, 4.276% and 4.737%. In the two-step and three-step forecasting, the five individual models performed poorly, and their RMSE values were all over 1.

Experiment II: Comparsion with Other Models Using Different Data Preprocessing Methods
This experiment demonstrated the forecasting performance of the wind speed time series by comparing the VMD-ISOA-model with models using different data preprocessing methods, namely DWT, EMD and CEEMD. The comparison results are listed in Table 4 and Figures 5-8. More details of the experiment are given below: For SH Apr, in the one-step forecasting, the proposed model showed the best performance with a MAPE value of 6.606%. In comparison, the model after pretreatment of VMD ranked as the second most effective model among the other data preprocessing technologies, with MAPE values of 7.089%, 7.412% and 8.340%, respectively, from one-step to three-step forecasting. Correspondingly, the DWT-Model showed the worst forecasting accuracy with MAPE values of 18.12%, 28.585%, and 36.064% from one-step to three-step forecasting.
For SH July, according to the evaluation criteria shown in Table 4 Figure 7.
For SH Jan, when the model forecasting is one-step, the prediction accuracy of the hybrid model, which has the lowest MAE, RMSE and MAPE values of 0.252, 0.333 and 3.894% respectively, was still superior compared to the other models using different preprocessing methods. In addition, the CEEMD -Model showed a better forecasting performance than EMD, with MAPE values of 6.807%, 7.601% and 8.246% respectively when the model forecasting changed from one-step to three-step. Processes 2021, 9, x FOR PEER REVIEW 13 of 21 Figure 5. Forecasting performance of decomposed models in one-, two-and three-step ahead forecasting for the spring dataset.
For SH July, according to the evaluation criteria shown in Table 4 For SH Oct, when the forecasting was one-step, the proposed VMD-ISOA-KELM hybrid model achieved the highest accuracy with a MAPE value of 3.140%. Comparatively, the DWT-Model, EMD-Model, CEEMD-Model and VMD-Model had MAPE values of 5.981%, 6.744%, 3.452%, 7.355% and 7.342%, respectively, which wereinferior to our developed hybrid model. The comparison results of our forecasting strategy and DWT-Model, EMD-Model and CEEMD-Model are shown in Figure 7.
For SH Jan, when the model forecasting is one-step, the prediction accuracy of the hybrid model, which has the lowest MAE, RMSE and MAPE values of 0.252, 0.333 and 3.894% respectively, was still superior compared to the other models using different preprocessing methods. In addition, the CEEMD -Model showed a better forecasting performance than EMD, with MAPE values of 6.807%, 7.601% and 8.246% respectively when the model forecasting changed from one-step to three-step.

Discussion
This section presents an insightful discussion of the experiment results, namely the main contributions, the performance of the employed optimization algorithm, the effectiveness of the proposed model and improvements of the proposed model. The concrete details are as follows.

Main Achievements and Results
Considering the noisy and highly nonlinear features of real wind speed data, this paper mainly proposes an optimized hybrid forecasting strategy based on VMD, KELM and ISOA for short-term wind speed forecasting. VMD decomposition technology has advantages in terms of weakening the non-stationarity of wind speed data, which were found by comparing and analyzing the experimental results of VMD-KELM, EMD-KELM, CEEMD-KELM and DWT-KELM techniques. With regard to wind speed forecasting, KELM is used as a powerful regression core to characterize the relationship between the samples in each subsequence and the expected output. Experiment 1 showed that KELM has a certain advantage in several widely used individual models. However, the prediction accuracy of KELM is sensitive to parameters. For this purpose, a novel algorithm ISOA was proposed to solve optimization issues, transforming the global optimization strategy from linear to non-linear. In order to further improve the prediction, the two parameters of KELM were optimized by the proposed ISOA algorithm. The superiority of the proposed prediction strategy was shown through relative experiments and contrastive analysis.

Performance of the Employed Optimization Algorithm
In this subsection, eight typical benchmark functions were used to measure and verify the proposed ISOA algorithm, including three unimodal functions and five multimodal functions. The unimodal function was used to test the development ability, and the multimodal function was used to test the development ability and avoid falling into the local optimum. These benchmark functions are shown in Table 5. Peak donates the features of the function, Dim donates the dimension of the function, Range donates the definition domain of the function and f min donates the optimal value of the function. Table 5. Description of unimodal, multimodal and fixed-dimension benchmark functions.

Function
Peak Dim Range f min , 600] 0 In addition, seven classic optimization algorithms were selected for comparison with the new algorithm, namely particle swarm optimization (PSO), differential evolution (DE), seagull optimization algorithm (SOA), gray wolf optimizer (GWO), sine cosine algorithm (SCA), moth flame optimization (MFO) and the multiverse optimizer (MVO). All algorithms were run 50 times on each benchmark function and with a maximum of 200 iterations. Figure 9 shows the convergence curve of ISOA and other comparison algorithms with the same dimensions. Compared with SOA, ISOA was closer to the optimal value with the same number of iterations. Among all comparative functions, ISOA had the fastest convergence speed, reflecting ISOA's efficient exploration capability. In order to measure the experimental results, the average value (AVG) and standard deviation (STD) were used to evaluate the results. Note that the best results are presented in bold. The data in Table 6 demonstrate that the optimization result of ISOA was the best among all optimization algorithms. At the same time, the STD values of the solutions were still the smallest, indicating the stability of the ISOA. the fastest convergence speed, reflecting ISOA's efficient exploration capability. In order to measure the experimental results, the average value (AVG) and standard deviation (STD) were used to evaluate the results. Note that the best results are presented in bold. The data in Table 6 demonstrate that the optimization result of ISOA was the best among all optimization algorithms. At the same time, the STD values of the solutions were still the smallest, indicating the stability of the ISOA.   (h) F15.

Effectiveness of the Developed Strategy
To investigate the different effectiveness of the developed model and other comparison models, the Diebold-Mariano (DM) test was employed, which is a statistical hypothesis test. The null hypothesis H 0 and alternative hypothesis H 1 are written as follows: where F is the loss function of forecasting errors, e 1 i and e 2 i are forecasting errors between actual values and forecasted values of the different forecasting models. Then, implementing statistical reasoning by DM test statistics, the DM test statistic values can be computed by where τ 2 denotes the estimation for the variance of F(e 1 i ) − F(e 2 i ). Table 7 lists the mean DM values from one-to three-step forecasting. Regardless of the DM values for one-step, two-step and three-step forecasting, the DM values of the nine comparison models were all obviously significant. For some classic individual models, all DM values were much larger than the upper limits at a 1% significance level. Moreover, when comparing with models applying different data pretreatment technologies, the proposed hybrid model similarly obtains showed a improvement.

Improvements of the Proposed Model
To further discuss and evaluate the degree of improvement in forecasting when comparing a selected model with the proposed mode, we adopted an improvement percentage of the MAPE criteria (P MAPE ), which enabled a comprehensive analysis of the proposed hybrid model. It is defined as

Conclusions
To follow the trend of clean energy development, strive to achieve low-carbon environmental protection, and vigorously develop wind energy resources, this paper proposes a hybrid forecasting model based on VMD, an improved seagull optimization algorithm and KELM. Firstly, VMD is applied to decompose the given non-stationary wind speed data into several subseries with various scales. Then, KELM is used as a powerful regression core to characterize the relationship between the samples in each subsequence and the expected output. To enhance the prediction performance, the proposed ISOA is designed by including a nonlinear formula, which controls the population migration process and attack process of SOA. Subsequently, the proposed ISOA algorithm is applied to the simultaneous optimization of two parameters in the KELM model. Finally, the final predicted value is obtained by summing the results of all subseries. Furthermore, to evaluate the effectiveness and applicability of the developed combined model, different forecasting models are implemented on four datasets. The selected forecasting models includes five classic individual models and four hybrid models. The experimental results of the three metrics show that (1) the VMD is effective in improving the accuracy and stability of the wind speed predictions; (2) compared with the common ANN and SVM models, the KELM models show advantages in capturing the nonlinear characteristics of the wind speed time series; (3) regardless of the forecasting step or the observation datasets, the proposed combined strategy was superior to all of the selected methods with average MAPE values of 3.865%, 4.213% and 4.614% for one-to three-step forecasting.
Author Contributions: Conceptualization, X.C. and Y.L.; writing-original draft preparation, X.C.; writing-review and editing, X.Y. and X.X.; supervision, Y.Z. and F.Z.; All authors have read and agreed to the published version of the manuscript.