Short-Term Wind Power Prediction Based on Improved Grey Wolf Optimization Algorithm for Extreme Learning Machine

: In order to improve the accuracy of wind power prediction and ensure the e ﬀ ective utilization of wind energy, a short-term wind power prediction model based on variational mode decomposition (VMD) and an extreme learning machine (ELM) optimized by an improved grey wolf optimization (GWO) algorithm is proposed. The original wind power sequence is decomposed into series of modal components with di ﬀ erent center frequencies by the VMD method and some new sequences are obtained by phase space reconstruction (PSR). Then, the ELM model is established for di ﬀ erent new time series, and the improved GWO algorithm is used to optimize its parameters. Finally, the output results are weighted and merged as the ﬁnal predicted value of wind power. The root-mean-square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) of the proposed VMD-improved GWO-ELM prediction model in the paper are 5.9113%, 4.6219%, and 13.01% respectively, which are better than these of ELM, back propagation (BP), and the improved GWO-ELM model. The simulation results show that the proposed model has higher prediction accuracy than other models in short-term wind power prediction.


Introduction
Over the past two decades, rising energy demand has driven the development of renewable energy. Wind energy, with its advantages of non-pollution and abundant reserves, has received worldwide attention and development [1]. However, due to the randomness of wind energy, there is no significant breakthrough in wind energy utilization. Therefore, it is particularly essential to predict wind power [2,3] for exploiting wind energy much more efficiently and reasonably. Wind power prediction is one of the most practicable methods.
Currently, there are two categories of wind power predictions approaches which are: the physical method [4] and the statistical method [5], according to different prediction models. The data of wind speed and direction measured by numerical weather forecast are employed as input data to realize the prediction of wind power [6], which is called the physical method. The measured data and historical data are used in the statistical method to establish the functional input-output relationship, which is used for wind power prediction. The statistical method is suitable for short-term wind power prediction.
By contrast, the extreme learning machine (ELM) has the advantages of fewer parameters, higher learning speed, and more excellent generalization performance [7]. However, how to determine the parameters of ELM is a challenging problem which has attracted many scholars from all over the world 1.
The GWO algorithm is improved. Firstly, the nonlinear convergence factor is proposed to improve the convergence precision of the algorithm. Then, the beetle antennae search (BAS) algorithm is used to optimize the GWO algorithm, which increases the environment judgement ability for the beetle individuals and avoids the algorithm falling into the local optimum. 2.
VMD is used to decompose the original wind power sequence. Firstly, the modal components generated by decomposition of the original wind power sequence by the VMD algorithm are autonomously combined into a low-frequency wave, medium-frequency wave, and high-frequency wave respectively, according to the size of the sample entropy value. Moreover, the reconstructed waveforms are respectively predicted by the improved GWO-ELM. Finally, the results are merged into the final prediction results. 3.
Four classical test functions are used to test the optimization ability of the PSO and GWO algorithms, respectively. By comparing the test results, it is found that the improved GWO algorithm can increase the optimization speed of the algorithm and avoid the algorithm falling into local optimum. By comparison with ELM, BP, and the improved GWO-ELM, it shows that the model proposed in this paper has a better prediction effect.
The structure of this paper is organized as follows: In Section 2, two improvements of the GWO algorithm are proposed. The first one is the thought of the linear convergence factor and the other is the combination of the beetle antennae search algorithm and the standard GWO algorithm. In Section 3, several ELM prediction models are established by using the modal components obtained by VMD, and the model parameters are optimized by the improved GWO algorithm. In Section 4, with the actual wind power data, several simulations have been implemented to verify the optimization performance of the improved GWO algorithm and prediction model. Finally, Section 5 summarizes this paper.

Standard Grey Wolf Optimization Algorithm
The grey wolf optimization (GWO) algorithm [15,16], which simulates the hierarchy and predation behavior of wolves, is one sort of swarm intelligence algorithm. Its evident strengths are a simple mechanism, few parameters, and easy implementation. The algorithm separates the entire wolf group into four layers. The first layer is the alpha wolf (α) who is responsible for making decisions about predation, habitat, and schedule. Other wolves must obey the command from the alpha wolf (α). The second layer is the beta wolf (β), who obeys and assists the α. The third layer is the delta wolf (δ), who obeys the α and the β while dominating the rest of the wolf group. The fourth layer is the omega wolf (ω), who must obey other social-level wolves. The former three layers have the best fitness values and lead the group towards the target. Its hunting process mainly includes: tracking, encircling, and attacking.

1.
Tracking process: When the grey wolf searches for prey, it gradually approaches the prey and surrounds it. The distance between the grey wolf and the prey during the tracking process is: where D is the distance between the grey wolf and the prey, t is the current number of iterations, X P (t) is the position of the prey after the t-th iteration (i.e., the position of the optimal solution), X(t) is the position of the grey wolf after the t-th iteration (i.e., the locations of the potential solution), A and C are the coefficient factor, and its calculation equation is: where r 1 and r 2 are the random numbers limited from 0 to 1, and a decreases linearly from 2 to 0 as the number of iterations increases.

2.
Encircling process: The grey wolf can identify the location of the potential prey (optimal solution). The encircling process is mainly done by the guidelines of α, β, and δ. It normally assumes that α, β, and δ have a strong ability to identify the positions of potential prey. During each iteration, the best three grey wolves, α, β, and δ, from the current population are retained. Finally, the positions of ω are updated by their location information.
where D α , D β , and D δ are the distances between α, β, δ wolves and ω wolves. X α , X β , and X δ represent the positions of α, β, and δ in the current population. X 1 , X 2 , and X 3 are the moving step size and directions of the ω wolves toward α, β, and δ, respectively.

3.
Attacking process: When the artificial wolves surround the prey, they are going to start hunting. Their positions are mainly updated by α, β, and δ wolves. where X(t + 1) is the calculated position of the updated ω wolves. In combination with Equation (2), A changes as a changes. When |A| > 1, the grey wolves spread out in each area as far as possible to search for prey. Hence, the algorithm expands its search area. When |A| < 1, the grey wolves focus on one or more particular areas of prey, which results in a decrease of the search scope of the algorithm.
There may still be some problems with the standard GWO algorithm. Firstly, the convergence factor in the standard GWO decreases linearly from 2 to 0 with the increasing numbers of iteration. During this procedure, the exploration and development process of GWO is not fully reflected, which leads to the low convergence accuracy. Secondly, the standard GWO algorithm focuses on the influence of the group on individuals and ignores the judgment of the individuals in the attacking process. Consequently, it is easy to fall into local optimum. Based on the analysis, these issues will be addressed separately.

Nonlinear Convergence Factor
The nonlinear convergence factor a is proposed for the problem of low convergence precision of the GWO algorithm, which determines the global search and local search performance of the GWO algorithm. The larger the a, the greater the global search capability. The smaller the a, the stronger the local development capability [17]. The nonlinear convergence factor a can be expressed as: where t is the current number of iterations and T max is the maximum number of iterations. The comparison of original and improved a is displayed in Figure 1. The convergence factor of improved a nonlinearly decreases from 2 to 0. As the number of iterations increases, the decay rate of the convergence factor gradually decreases, which enhances the local search ability and the local optimal solution is more accurate. Therefore, this nonlinear convergence factor is more consistent with the actual convergence process of the algorithm, which can better balance the global search and local search performance of the algorithm and avoid the algorithm falling into local optimum.
Processes 2020, 8,109 4 of 17 where ( 1) + Xt is the calculated position of the updated  wolves. In combination with Equation (2), A changes as a changes. When |1 A|> , the grey wolves spread out in each area as far as possible to search for prey. Hence, the algorithm expands its search area. When |1 A|< , the grey wolves focus on one or more particular areas of prey, which results in a decrease of the search scope of the algorithm.
There may still be some problems with the standard GWO algorithm. Firstly, the convergence factor in the standard GWO decreases linearly from 2 to 0 with the increasing numbers of iteration. During this procedure, the exploration and development process of GWO is not fully reflected, which leads to the low convergence accuracy. Secondly, the standard GWO algorithm focuses on the influence of the group on individuals and ignores the judgment of the individuals in the attacking process. Consequently, it is easy to fall into local optimum. Based on the analysis, these issues will be addressed separately.

Nonlinear Convergence Factor
The nonlinear convergence factor a is proposed for the problem of low convergence precision of the GWO algorithm, which determines the global search and local search performance of the GWO algorithm. The larger the a , the greater the global search capability. The smaller the a , the stronger the local development capability [17]. The nonlinear convergence factor a can be expressed as: where t is the current number of iterations and max T is the maximum number of iterations.
The comparison of original and improved a is displayed in Figure 1. The convergence factor of improved a nonlinearly decreases from 2 to 0. As the number of iterations increases, the decay rate of the convergence factor gradually decreases, which enhances the local search ability and the local optimal solution is more accurate. Therefore, this nonlinear convergence factor is more consistent with the actual convergence process of the algorithm, which can better balance the global search and local search performance of the algorithm and avoid the algorithm falling into local optimum.

Improved GWO Algorithm Based on the Beetle Antennae Search Algorithm
The beetle antennae search (BAS) algorithm [18][19][20] is an efficient, intelligent optimization algorithm proposed in 2017. Beetles search for food according to the intensity of food odor. If the odor intensity received by the right antenna is larger than that on the left, beetles will fly to the right. Otherwise, they will fly to the left. The intensity of food odor felt by beetles at each point is the value

Improved GWO Algorithm Based on the Beetle Antennae Search Algorithm
The beetle antennae search (BAS) algorithm [18][19][20] is an efficient, intelligent optimization algorithm proposed in 2017. Beetles search for food according to the intensity of food odor. If the odor intensity received by the right antenna is larger than that on the left, beetles will fly to the right. Otherwise, they will fly to the left. The intensity of food odor felt by beetles at each point is the value of fitness function. The purpose of its flight is to find the position where the intensity of food odor is strongest. The BAS algorithm is similar to other optimization algorithms, like the ant colony optimization (ACO) algorithm [21], the particle swarm optimization (PSO) algorithm [22], the cuckoo search optimization (CSO) algorithm [23], etc. However, it can intelligently and efficiently search without knowing the specific form and gradient information of the function.
The specific algorithm flow is as follows: 1.
Randomly initialize the position, x, of the beetle, and the distances between the two antennas of each beetle are d 0 ; 2.
Calculate the fitness value of the beetle f (x); 3.
Randomly generate the direction of the antennas according to Formula (8), and update the step according to Formula (9): where rnd is a random function, D is the spatial dimension, k is the decay factor of the step size, and δ t is the search step of the beetle at time t.

4.
Determine the positions of the left and right antennas according to Formula (10): where x l and x r are positions of the beetle's left and right antennas, x t is the position of the beetle's center of mass at time t, and d t is the distance between the two antennas at time t. In practice, adjustments can be made according to the step size, specifically as in Formula (12): where c is a constant, c ∈ [2, 10].

5.
Calculate the fitness value f (x l ) and f (x r ) of the left and right antennae. 6.
Determine the direction of the next movement of the beetle, and further update the position according to a combination of Formula (13) and the fitness values of left and right antennas:

7.
Judge whether the convergence condition is met, namely, the maximum number of iteration or the set precision. If so, the iteration is terminated. Otherwise, continue the iteration process from step 3.
Compared with the PSO algorithm, ACO algorithm, CSO algorithm, and other algorithms, the BAS algorithm only needs the individual to search optimal solution. Besides, its computational cost is low, the process is simple, and implementation is easy. Therefore, it is appropriate to combine the advantages of the BAS algorithm and GWO algorithm to solve the problem that the GWO algorithm focuses too much on the group. Consequently, the GWO algorithm optimized by the BAS algorithm is proposed. The main ideas are as follows: Firstly, according to the idea of the BAS algorithm, the α, β, δ, and a random grey wolf among the wolves group are described as the elite individuals. In the iterative process, the position update of the wolf group not only solely relies on the guidance of three such kinds of wolves, but also relies on the judgment of the environment from the elite grey wolves in each iteration. Thus, it prevents the GWO algorithm from falling into local optimum. Then, elite individuals compare the fitness function values of the left and right sides during each iteration. The optimal value can be obtained for updating the position of the grey wolf group.
By constructing the elite grey wolf individuals mentioned previously, it is efficient to overcome the poor stability caused by the GWO algorithm and the problem of local optimum.
The flow of the improved GWO algorithm is as follows: 1.
Initialize the number of wolf group, N, dimension, D, number of iteration, m, and wolf group, Calculate the fitness value of the grey wolf individual.

3.
Select α, β, and δ according to the size of the fitness value.

4.
Define α, β, δ, and a wolf randomly selected in the wolf group as beetles. Calculate position x and fitness function value f (x) of the left and right antennae for each beetle. Compare the fitness function of each grey wolvf and select the final α, β, and δ.

5.
Update the position of the ordinary wolf, ω, according to the position iteration equation. 6.
Update parameters a, A, C.

7.
Judge whether the stop condition is satisfied, otherwise move to the second step. 8.
Output the position and fitness value of the α wolf.

Standard Extreme Learning Machine
Standard intelligent learning algorithms, such as the neural network algorithm [24], usually adopt the gradient descent method to adjust weight parameters. However, this may cause algorithms to slow down learning speed and fall into the local minimum. According to these problems, Huang et al. proposed the method of the extreme learning machine (ELM) [25]. ELM reduces the calculation of model parameter selection while preserving the generalization of the neural network.
The main ideas are as follows: Input n samples X = {x 1 , x 2 , x 3 , · · · x n }, one of the outputs among the samples is: where β i is the output weight, g(·) is the activation function, w i is the input weight, b i is the hidden layer bias, and w i · x j is the inner product of w i and x j . Then, the model is trained to minimize the error. It normally assumes that the error is 0, which means the existence of β i , w i , and x j makes the training result equal to the output result.
The specific equation is as follows: where Solving min Hβ − Y 2 , where Y is the actual output, H is determined by b i and w i which are set randomly. Finally, the β is solved to bulid the model.

Optimized ELM based on VMD
The standard ELM has some distinct advantages. Nevertheless, there are still some problems with it, which will cause the inefficient prediction effect of the eventual wind power model if the actual wind power data are directly used to train the limit learning machine due to its violent fluctuation and randomness.
For this reason, variational mode decomposition (VMD) is used to decompose the original wind power sequence in priority. VMD [26] is a new signal decomposition estimation method proposed by Dragomiretskiy et al. in 2014, and its overall framework is a variational problem.
VMD decomposes the input signal into several sub-modes to minimize the sum of the estimated bandwidths of each mode [27]. It normally assumes that each mode is limited bandwidth with a different center frequency. Then, by alternating the direction multiplier method, each mode and center frequency is updated continuously. As a result, each of the modes gradually demodulates to their corresponding fundamental frequency bands. Finally, each of the modes and corresponding center frequencies are extracted.
The process of the VMD algorithm is as follows: 1.
For each mode function u k (t), the analytic signal is obtained through Hilbert transformation for obtaining the one-sided spectrum. [ where δ(t) is the unit pulse function.

2.
Estimate the center frequency of each modal function by adjusting the exponent e − jw k t , and then modulate the spectrum of each mode to the corresponding baseband.
3. Construct a constrained variational problem of Equation (21) by calculating the square of the L2-norm of the demodulated signal gradient.
It also meets the following condition: where f is a time series, u k is a mode signal, k is the number of modes, * presents the convolution, and w k indicates the center frequency of the kth mode.
After that, the phase space reconstruction (PSR) [28,29] is carried out for the above decomposition waveforms, respectively. The idea of PSR is as follows: For the original sequence, X(t) = [x 1 , x 2 , · · · , x n ], the reconstructed phase space sequence is shown in Equation (23): where τ is the delay time and m is the embedded dimension.

Short-Term Wind Power Model based on the VMD-Improved GWO-ELM
For further explaining the main sources of prediction error, the VMD algorithm was used to decompose the original wind power sequence into a series of modal components sequences and then autonomously combine them into high-, medium-, and low-frequency waves according to the size of its sample entropy value. Moreover, improved GWO-ELM prediction is conducted for each reconstructed waveform. Then, phase space reconstruction was carried out for the three kinds of waveforms, and the improved GWO-ELM prediction was implemented for the reconstructed sequence. Finally, the final output results were obtained by weighting the predicted results of the three models.
The specific process is shown in Figure 2.

Short-Term Wind Power Model based on the VMD-Improved GWO-ELM
For further explaining the main sources of prediction error, the VMD algorithm was used to decompose the original wind power sequence into a series of modal components sequences and then autonomously combine them into high-, medium-, and low-frequency waves according to the size of its sample entropy value. Moreover, improved GWO-ELM prediction is conducted for each reconstructed waveform. Then, phase space reconstruction was carried out for the three kinds of waveforms, and the improved GWO-ELM prediction was implemented for the reconstructed sequence. Finally, the final output results were obtained by weighting the predicted results of the three models.
The specific process is shown in Figure 2.

Improved GWO Algorithm Performance Verification
Randomly generating the input weight and hidden layer bias of the standard ELM will lead to its unstable performance and weak generalization ability. So, the improved GWO algorithm was used to optimize the initial parameters of the ELM in this paper.
Four classical test functions were selected to verify its effectiveness, convergence speed, and accuracy. The four functions are all multi-peak functions, which can be used to detect the ability of local searching and the ability to escape from local optimum. The expressions, variable ranges, and theoretical optimal values of each test function are shown in Table 1.

Improved GWO Algorithm Performance Verification
Randomly generating the input weight and hidden layer bias of the standard ELM will lead to its unstable performance and weak generalization ability. So, the improved GWO algorithm was used to optimize the initial parameters of the ELM in this paper.
Four classical test functions were selected to verify its effectiveness, convergence speed, and accuracy. The four functions are all multi-peak functions, which can be used to detect the ability of local searching and the ability to escape from local optimum. The expressions, variable ranges, and theoretical optimal values of each test function are shown in Table 1. Ackley To verify the performance of the improved GWO algorithm proposed in this paper, the Shubert function in Table 1 has been respectively optimized by the standard GWO algorithm, the GWO algorithm improved by the nonlinear convergence factor (a + GWO), the GWO algorithm combined with BAS algorithm (BAS + GWO), and the GWO algorithm improved by both of the improvements. The method of the test is that conducting 50 independent experiments, with 200 times cycle number for each experiment, and the population is 20. Figure 3 shows the convergence curve of Shubert optimization using four algorithms. It is obvious that both the nonlinear convergence factor and the improvement combined with the BAS algorithm can increase the optimization effect of the GWO algorithm. However, the convergence effect of the BAS + GWO algorithm is much better. Meanwhile, the variance of the BAS + GWO algorithm is 1.2541, which is smaller than 3.03569 of the a + GWO algorithm. In comparison of the optimization results' mean value −185.671 of the a + GWO algorithm, the same value of the BAS + GWO algorithm is −186.0797, which is closer to the theoretical optimal solution −186.7309. It further indicates that both improvements of nonlinear convergence factor and the combination with the BAS algorithm can increase the optimization effect of the algorithm, among which, the latter one improves the GWO algorithm better. Therefore, the combination of two improvements can better optimize the performance of the GWO algorithm.  To verify the performance of the improved GWO algorithm proposed in this paper, the Shubert function in Table 1 has been respectively optimized by the standard GWO algorithm, the GWO algorithm improved by the nonlinear convergence factor (a + GWO), the GWO algorithm combined with BAS algorithm (BAS + GWO), and the GWO algorithm improved by both of the improvements. The method of the test is that conducting 50 independent experiments, with 200 times cycle number for each experiment, and the population is 20. Figure 3 shows the convergence curve of Shubert optimization using four algorithms. It is obvious that both the nonlinear convergence factor and the improvement combined with the BAS algorithm can increase the optimization effect of the GWO algorithm. However, the convergence effect of the BAS + GWO algorithm is much better. Meanwhile, the variance of the BAS + GWO algorithm is 1.2541, which is smaller than 3.03569 of the a + GWO algorithm. In comparison of the optimization results' mean value −185.671 of the a + GWO algorithm, the same value of the BAS + GWO algorithm is −186.0797, which is closer to the theoretical optimal solution −186.7309. It further indicates that both improvements of nonlinear convergence factor and the combination with the BAS algorithm can increase the optimization effect of the algorithm, among which, the latter one improves the GWO algorithm better. Therefore, the combination of two improvements can better optimize the performance of the GWO algorithm. Furthermore, with the same method of the above test, the standard PSO algorithm and the standard GWO algorithm were respectively used to optimize four classic test functions in Table 1, and the results were compared. It demonstrated the actual performance of the improved GWO algorithm.
(a), (b), (c), and (d) of Figure 4 are the convergence curves of each function under the standard PSO algorithm, standard GWO algorithm, and improved GWO algorithm, respectively. In Figure 4, compared to the prediction results of the three algorithms, the results of the improved GWO algorithm has the minimum times of deviating from the optimal solution and the smallest amplitude. It demonstrated that the improved GWO algorithm has better optimization accuracy and is more suitable for optimizing the initial parameters of the ELM model. Furthermore, with the same method of the above test, the standard PSO algorithm and the standard GWO algorithm were respectively used to optimize four classic test functions in Table 1, and the results were compared. It demonstrated the actual performance of the improved GWO algorithm.
(a), (b), (c), and (d) of Figure 4 are the convergence curves of each function under the standard PSO algorithm, standard GWO algorithm, and improved GWO algorithm, respectively. In Figure 4, compared to the prediction results of the three algorithms, the results of the improved GWO algorithm has the minimum times of deviating from the optimal solution and the smallest amplitude. It demonstrated that the improved GWO algorithm has better optimization accuracy and is more suitable for optimizing the initial parameters of the ELM model.  In order to further prove the convergence characteristics of the improved GWO algorithm, the optimal value, the worst value, the variance and the excellent rate of the results were compared, and the results are shown in Table 2.
In Table 2, it should be noted that the excellent rates of the three algorithms for 1 f are 100%, and that for 2 f , 3 f , and 4 f are different, among which the improved GWO algorithm is the best one. Furthermore, it can prove that the improved GWO algorithm is more stable due to its smallest variance among the optimization results of the three algorithms in 50 independent experiments. In conclusion, the optimization result of the improved GWO algorithm is the best one, which indicates that the accuracy of the GWO algorithm can be optimized by the proposed method in this paper. In summary, the optimal solution can be obtained more quickly and accurately with the improved GWO algorithm, and it is not easy to fall into the local optimum value during the implementation process. In order to further prove the convergence characteristics of the improved GWO algorithm, the optimal value, the worst value, the variance and the excellent rate of the results were compared, and the results are shown in Table 2.
In Table 2, it should be noted that the excellent rates of the three algorithms for f 1 are 100%, and that for f 2 , f 3 , and f 4 are different, among which the improved GWO algorithm is the best one. Furthermore, it can prove that the improved GWO algorithm is more stable due to its smallest variance among the optimization results of the three algorithms in 50 independent experiments. In conclusion, the optimization result of the improved GWO algorithm is the best one, which indicates that the accuracy of the GWO algorithm can be optimized by the proposed method in this paper. In summary, the optimal solution can be obtained more quickly and accurately with the improved GWO algorithm, and it is not easy to fall into the local optimum value during the implementation process. Figure 5 is the picture of 100 kw medium size wind turbines from a wind field in Henan, China. In this sector, a simulation experiment with the measured wind power data from Henan wind field have been carried out to verify the performance of the proposed short-term wind power prediction model. The 720 groups of data were selected to build the prediction model, and the sampling interval of wind power data was 15 min. Among these data, the former 624 data were used as training samples, and the rest of the 96 data were used as test samples.

Case Analysis
Processes 2020, 8,109 11 of 17 Figure 5 is the picture of 100 kw medium size wind turbines from a wind field in Henan, China. In this sector, a simulation experiment with the measured wind power data from Henan wind field have been carried out to verify the performance of the proposed short-term wind power prediction model. The 720 groups of data were selected to build the prediction model, and the sampling interval of wind power data was 15 min. Among these data, the former 624 data were used as training samples, and the rest of the 96 data were used as test samples.

Simulation Process
The number of decomposition mode number K is determined automatically by the method in Reference [30]. VMD decomposition is performed on the signal, and the correlation coefficient between the last modal component and the decomposition of the original signal is calculated. Firstly, suppose the modal number of VMD decomposition is k, and the correlation coefficient between the decomposition of the last modal component and the original signal is k A . Then, if the decomposition of the modal number is k + 1, the corresponding value is k1 A + . When the difference between k A and k1 A + is less than the threshold value a (a is set as 0.3% in this experiment), stop the decomposition.
Otherwise, increase the value of k and keep decomposing until the stop condition is satisfied. Finally, record the final decomposition modal number k. The flow chart of VMD is as Figure 6:

Simulation Process
The number of decomposition mode number K is determined automatically by the method in Reference [30]. VMD decomposition is performed on the signal, and the correlation coefficient between the last modal component and the decomposition of the original signal is calculated. Firstly, suppose the modal number of VMD decomposition is k, and the correlation coefficient between the decomposition of the last modal component and the original signal is A k . Then, if the decomposition of the modal number is k + 1, the corresponding value is A k+1 . When the difference between A k and A k+1 is less than the threshold value a (a is set as 0.3% in this experiment), stop the decomposition. Otherwise, increase the value of k and keep decomposing until the stop condition is satisfied. Finally, record the final decomposition modal number k. The flow chart of VMD is as Figure 6: Processes 2020, 8,109 11 of 17 Figure 5 is the picture of 100 kw medium size wind turbines from a wind field in Henan, China. In this sector, a simulation experiment with the measured wind power data from Henan wind field have been carried out to verify the performance of the proposed short-term wind power prediction model. The 720 groups of data were selected to build the prediction model, and the sampling interval of wind power data was 15 min. Among these data, the former 624 data were used as training samples, and the rest of the 96 data were used as test samples.

Simulation Process
The number of decomposition mode number K is determined automatically by the method in Reference [30]. VMD decomposition is performed on the signal, and the correlation coefficient between the last modal component and the decomposition of the original signal is calculated. Firstly, suppose the modal number of VMD decomposition is k, and the correlation coefficient between the decomposition of the last modal component and the original signal is k A . Then, if the decomposition of the modal number is k + 1, the corresponding value is k1 A + . When the difference between k A and k1 A + is less than the threshold value a (a is set as 0.3% in this experiment), stop the decomposition.
Otherwise, increase the value of k and keep decomposing until the stop condition is satisfied. Finally, record the final decomposition modal number k. The flow chart of VMD is as Figure 6:  Through the simulation experiment, when the value of k is in the range from 1 to 8, the minimum value among the 8 values of the |A K − A K+1 | is 0.5%, which is greater than a. When k = 9, the value of |A K − A K+1 | is 0.09%, which is less than a. Hence, k = 9.
The decomposition result is shown in Figure 7, which shows the original wind power sequences diagram and series decomposed by VMD. By analyzing the original sequence, it is worth noting that the original wind power sequence has the characteristics of randomness and strong fluctuation, which will affect the accuracy of the prediction if it is directly used as the input of the prediction model. Hence, it is necessary to decompose the original wind power sequence through the VMD to improve the prediction accuracy of the model. Processes 2020, 8, 109 12 of 17 Through the simulation experiment, when the value of k is in the range from 1 to 8, the minimum value among the 8 values of the K K 1 AA + − is 0.5%, which is greater than a. When k = 9, the value of is 0.09%, which is less than a. Hence, k = 9. The decomposition result is shown in Figure 7, which shows the original wind power sequences diagram and series decomposed by VMD. By analyzing the original sequence, it is worth noting that the original wind power sequence has the characteristics of randomness and strong fluctuation, which will affect the accuracy of the prediction if it is directly used as the input of the prediction model. Hence, it is necessary to decompose the original wind power sequence through the VMD to improve the prediction accuracy of the model. According to how close the sample entropy value is, the modal components are autonomously decomposed into high-frequency, medium-frequency, and low-frequency. In this paper, they are u(1)u(2), u(3)u(7) and u(8)u(9), respectively.
Phase space reconstruction (PSR) was carried out for three different waveforms respectively, and input and output variables of the prediction model were set.
Finally, the improved GWO algorithm was used to optimize the initial parameters of the ELM model, and the sum of the absolute value of the training data output error of the ELM model was set as the fitness function. The improved GWO-ELM model was used to predict the three different frequency waveforms, respectively. Then, the prediction results of three waveforms were weighted and combined to obtain the final wind power prediction.

The Analysis of Simulation Results
The prediction results of the VMD-improved GWO-ELM model for different frequency decomposition waves are shown in (a), (b), and (c) of Figure 8.
As can be seen from Figure 8, the group of low-frequency waves have the best prediction result with almost no error. The prediction result of the medium frequency wave is worse with small range error. The prediction result of the high-frequency wave is the worst with a wide range of error.
Since the low-frequency wave changes slowly, the prediction model can accurately predict the wind power in the future. By contrast, the high-frequency wave changes faster, which leads to a significant error in the prediction model.
To further indicate the influence of high-frequency waves on prediction accuracy, wind power sequences with or without high-frequency waves were predicted by the VMD-improved GWO-ELM model, respectively. As can be seen from Figure 9, the fitting curve of wind power prediction with or without high-frequency waves is nearly the same. According to how close the sample entropy value is, the modal components are autonomously decomposed into high-frequency, medium-frequency, and low-frequency. In this paper, they are u(1)~u(2), u(3)~u(7) and u(8)~u(9), respectively.
Phase space reconstruction (PSR) was carried out for three different waveforms respectively, and input and output variables of the prediction model were set.
Finally, the improved GWO algorithm was used to optimize the initial parameters of the ELM model, and the sum of the absolute value of the training data output error of the ELM model was set as the fitness function. The improved GWO-ELM model was used to predict the three different frequency waveforms, respectively. Then, the prediction results of three waveforms were weighted and combined to obtain the final wind power prediction.

The Analysis of Simulation Results
The prediction results of the VMD-improved GWO-ELM model for different frequency decomposition waves are shown in (a), (b), and (c) of Figure 8.
As can be seen from Figure 8, the group of low-frequency waves have the best prediction result with almost no error. The prediction result of the medium frequency wave is worse with small range error. The prediction result of the high-frequency wave is the worst with a wide range of error.
Since the low-frequency wave changes slowly, the prediction model can accurately predict the wind power in the future. By contrast, the high-frequency wave changes faster, which leads to a significant error in the prediction model.
To further indicate the influence of high-frequency waves on prediction accuracy, wind power sequences with or without high-frequency waves were predicted by the VMD-improved GWO-ELM model, respectively. As can be seen from Figure 9, the fitting curve of wind power prediction with or without high-frequency waves is nearly the same.  The predicted evaluation indexes are root-mean-square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The smaller the values of these three indicators, the higher the accuracy of the model. The prediction evaluation indicators are as follows:  The predicted evaluation indexes are root-mean-square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The smaller the values of these three indicators, the higher the accuracy of the model. The prediction evaluation indicators are as follows: The predicted evaluation indexes are root-mean-square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The smaller the values of these three indicators, the higher the accuracy of the model. The prediction evaluation indicators are as follows: where N is the predicted quantity, y t is the actual wind power data, and o t is the predicted wind power data. In detail, the RMSE, MAE, and MAPE values of waves without high frequency are 5.9433, 4.6467, and 13.09. Correspondingly, these values of waves with high frequency are 5.9113, 4.6219, and 13.01, which are all smaller than the former values. It indicates that the prediction performance of wind power sequence with high-frequency waves is better. On the contrary, the wind power sequence without high-frequency waves will affect the prediction accuracy of wind power.
For better analyzing the prediction results, the prediction results of the ELM model, BP neural network model [31], improved GWO-ELM model, and VMD-improved GWO-ELM model were respectively compared.
Set the input node number of the BP neural network [32] model as 5, the number of hidden layer nodes as 8, and the number of output nodes as 1, among which, the number of hidden layer nodes was determined by multiple experiments. The comparison of different prediction results is shown in Figure 10.
As the curves are shown in Figure 10, it is obvious to find that the prediction curve of the VMD-improved GWO-ELM model has the best fitting effect. Correspondingly, the prediction curve of ELM and BP has the worst fitting effect.  (24) where N is the predicted quantity, t y is the actual wind power data, and t o is the predicted wind power data. In detail, the RMSE, MAE, and MAPE values of waves without high frequency are 5.9433, 4.6467, and 13.09. Correspondingly, these values of waves with high frequency are 5.9113, 4.6219, and 13.01, which are all smaller than the former values. It indicates that the prediction performance of wind power sequence with high-frequency waves is better. On the contrary, the wind power sequence without high-frequency waves will affect the prediction accuracy of wind power.
For better analyzing the prediction results, the prediction results of the ELM model, BP neural network model [31], improved GWO-ELM model, and VMD-improved GWO-ELM model were respectively compared.
Set the input node number of the BP neural network [32] model as 5, the number of hidden layer nodes as 8, and the number of output nodes as 1, among which, the number of hidden layer nodes was determined by multiple experiments. The comparison of different prediction results is shown in Figure 10.
As the curves are shown in Figure 10, it is obvious to find that the prediction curve of the VMDimproved GWO-ELM model has the best fitting effect. Correspondingly, the prediction curve of ELM and BP has the worst fitting effect. In order to further illustrate the effectiveness of the proposed model, Table 3 shows the prediction error indexes of the four models. According to Table 3, the prediction error of the improved GWO-ELM model is smaller than that of the ELM model and the BP model, indicating that the prediction accuracy of the model can be improved by optimizing the parameters of the ELM model with the improved GWO. The prediction error of the VMD-improved GWO-ELM model is smaller than that of the improved GWO-ELM model, indicating that the prediction accuracy can be increased by decomposing the original wind power sequence through the method of VMD before prediction. The VMD-improved GWO-ELM model has the smallest prediction error among the four prediction models, which indicates that the VMD-improved GWO-ELM has a great feasibility in wind power prediction. In order to further illustrate the effectiveness of the proposed model, Table 3 shows the prediction error indexes of the four models. According to Table 3, the prediction error of the improved GWO-ELM model is smaller than that of the ELM model and the BP model, indicating that the prediction accuracy of the model can be improved by optimizing the parameters of the ELM model with the improved GWO. The prediction error of the VMD-improved GWO-ELM model is smaller than that of the improved GWO-ELM model, indicating that the prediction accuracy can be increased by decomposing the original wind power sequence through the method of VMD before prediction. The VMD-improved