Short-Term Wind Power Forecasting Based on VMD and a Hybrid SSA-TCN-BiGRU Network

Abstract: Wind power generation is a renewable energy source whose output is influenced by multiple factors such as wind speed, wind direction, meteorological conditions, and the characteristics of wind turbines. Accurate wind power forecasting is therefore critical for grid operation, for the maintenance management of wind power plants, and for the safe operation of power systems. To improve the accuracy of wind power prediction, this paper proposes a hybrid model incorporating variational modal decomposition (VMD), a Sparrow Search Algorithm (SSA), and a temporal-convolutional-network-based bi-directional gated recurrent unit (TCN-BiGRU). The model first uses VMD to break down the raw power data into several modal components, then builds an SSA-TCN-BiGRU model to predict each component, and finally accumulates all the predicted components to obtain the wind power prediction result. The proposed short-term wind power prediction model was validated using measured data from a wind farm in China and compared with benchmark models to verify its practicability and reliability. Compared with the TCN-BiGRU, the VMD-SSA-TCN-BiGRU model reduces the symmetric mean absolute percentage error, the mean absolute error, and the root mean square error by 34.36%, 49.14%, and 55.94%, respectively.


Research Background
With the rapid development of the social economy, energy demand continues to increase, and fossil fuels such as coal, oil, and natural gas are being consumed excessively. This has caused problems such as environmental pollution and resource depletion [1]. Because natural resources are limited [2], vigorously developing new energy is beneficial for sustainable social development [3]. It is therefore imperative to explore new energy sources, promote the development and application of clean energy, and increase the proportion of clean energy in the national energy structure. Consequently, wind power generation, as a clean energy technology, is gaining increasing attention in power systems research. Wind power generation converts mechanical energy into electrical energy without causing environmental pollution and can effectively replace traditional thermal power generation, thereby reducing the consumption of fossil fuels. Wind power generation offers numerous advantages, including the following: (1) Wind energy is a renewable and sustainable clean energy source. (2) Wind power is an environmentally friendly source of electricity. (3) Wind turbines occupy less land and are easy to expand.
Wind energy is a pollution-free renewable energy source that has developed rapidly due to its mature technology [4]. Nevertheless, because of its random fluctuations and uncertainties, ensuring the real-time balance of power generation and consumption in the power system is challenging. Short-term wind power prediction is an effective tool for overcoming the problems caused by intermittent and fluctuating wind power. Therefore, improving the accuracy of short-term wind power prediction has crucial academic value and economic significance [5].

Related Works
Currently, wind power prediction models are mainly categorized as physical methods, statistical models, machine learning methods, and deep learning methods. Among them, deep learning models are the main focus of current research. The physical model does not require historical data but must be combined with the wind farm topography and meteorological conditions. However, the method is complex and computationally demanding. Statistical models are built on historical wind farm data, such as probabilistic autoregression [6] and probability mass bias [7], and establish a relationship between wind power generation and explanatory variables to forecast power generation. These models cope poorly with substantial data uncertainty, and their performance on nonlinear data needs to be improved.
Machine learning models are effective in handling nonlinear data. An improved support vector machine (ISVM) was used to forecast short-term wind power [8]. Li et al. used an improved extreme learning machine (ELM) for wind power prediction [9]. However, these models need to be improved in handling large datasets. Deep learning has become a research hotspot in wind power prediction due to its rapid development in recent years. Common deep learning methods include deep neural networks (DNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs). The long short-term memory network (LSTM) was proposed on the basis of RNNs and can solve the vanishing gradient problem of RNNs. Ewee et al. used a heap-based optimizer (HBO) to optimize LSTM-based wind power prediction [10]. A gated recurrent unit (GRU) solves the vanishing gradient problem as LSTM does, but it has a simpler model structure and a faster training speed and is widely used in wind power prediction. Yang et al. performed multi-step wind prediction using a combined GRU model [11]. CNNs have also been used in time series prediction, with CNN-extracted features enabling accurate wind speed prediction [12]. Yildiz et al. converted the features obtained from variational mode decomposition (VMD) into images and employed an improved residual-based deep CNN to predict wind power [13]. However, CNNs can only extract local features and ignore the relationships within the time series data. For this purpose, Lara et al. used a TCN for time series prediction [14]. A TCN takes the timing characteristics into account and allows the receptive field to be enlarged flexibly. Zhu et al. used a TCN for wind prediction and compared it with recurrent neural networks. The experiments showed that the TCN had better prediction results for time series data [15].
Due to the complexity of time series forecasting, it is difficult for a single model to predict the results accurately, so some scholars have proposed hybrid models to improve accuracy. Hossain et al. proposed a hybrid CNN-GRU model in which a CNN extracts features from the raw data and a GRU predicts wind power from these features; the model's effectiveness was verified by comparing its prediction results with those of single models [16]. Kosana et al. first preprocessed the raw data using a GAN and then proposed a hybrid TCN-GRU model for wind speed prediction, which was evaluated using four metrics [17]. Chen et al. applied CNN-LSTM to predict wind speed and concluded that the combined model can extract temporal and spatial correlation features simultaneously [18]. Because time series data are non-stationary and nonlinear, Li et al. chose the VMD algorithm to decompose the wind speed and used particle swarm optimization (PSO) to optimize a bi-directional long short-term memory (Bi-LSTM) prediction model [19]. Huang et al. used ensemble empirical mode decomposition (EEMD) to decompose the original wind speed, predicted the decomposed components using Gaussian process regression (GPR) and LSTM, and superimposed all the predicted components to obtain the final prediction result; the experimental results show the validity of the proposed model [20].

Main Contributions
This paper proposes a TCN-BiGRU method for wind power prediction based on VMD decomposition and SSA optimization. The original wind power data are first decomposed via VMD to obtain a number of intrinsic mode function (IMF) components, and each component is combined with the remaining features of the original data. TCN-BiGRU is then used to predict each set of combined data separately, with SSA optimizing the number of iterations, learning rate, and number of neurons of TCN-BiGRU to improve the prediction accuracy. Finally, the predictions of all components are summed to obtain the final prediction result. In this paper, the model is used to predict the power of a wind turbine in a domestic wind farm, and the wind power prediction results of different models at a 5 min resolution are compared to verify the effectiveness of the proposed model. The proposed model can accurately predict the wind power output so that scheduling can be prepared in advance to ensure the smooth, safe, and stable operation of the power grid. The following are the major contributions:
(1) By employing the VMD method to decompose the original power data, noise can be reduced, and the lag in the prediction model can be addressed.
(2) The TCN-BiGRU combination model is utilized to forecast the decomposed power components, and the accumulated predicted components yield the final prediction result.
(3) To further enhance the prediction accuracy of the model, the SSA algorithm is employed to optimize the hyperparameters of the TCN-BiGRU combination model. This optimization process helps to obtain the optimal parameters, leading to an improvement in the prediction accuracy of the model.

Methodology
In this paper, VMD decomposes the raw data, TCN-BiGRU is the basic model for wind power prediction, and SSA optimizes the model's hyperparameters. The theories and methods related to the model are described below.

Variational Modal Decomposition
Variational modal decomposition (VMD) is an adaptive signal decomposition method [21] that can effectively suppress the modal aliasing phenomenon that occurs in empirical modal decomposition (EMD) [22]. Unlike EMD, VMD determines each component's center frequency and bandwidth by iteratively searching for the optimal solution. VMD processes the original signal with a non-recursive method. The resulting IMFs are mutually distinct, which prevents modal aliasing and results in better decomposition and higher robustness. The decomposition process of VMD is the construction and solution of a variational problem.
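Assuming that the original power series is $f(t)$ and that it is decomposed into $K$ components, the corresponding constrained variational model is

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{1}$$

where $\{u_k\}$ is the set of subsignals (modal components), $\{\omega_k\}$ is the set of central frequencies corresponding to the subsignals, $K$ is the number of modes in the decomposition, $t$ is the sampling moment, $\partial_t$ is the partial derivative with respect to $t$, $\delta(t)$ is the Dirac function, $*$ is the convolution operation, and $f(t)$ is the original power sequence. To reconstruct the constrained variational problem as an unconstrained variational problem, the augmented Lagrangian function is introduced:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{2}$$

where $\lambda(t)$ is the Lagrange multiplier operator and $\alpha$ is the quadratic penalty factor.
By searching for the saddle point of the augmented Lagrangian function, $u_k^{n+1}$, $\omega_k^{n+1}$, and $\lambda^{n+1}$ are updated alternately, with the following expressions in the frequency domain:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha (\omega - \omega_k)^2} \tag{3}$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega} \tag{4}$$

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left[ \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right] \tag{5}$$

where $\hat{u}_k^{n+1}(\omega)$ is the Wiener filter output for each modal component, $\omega_k^{n+1}$ is the frequency center of the corresponding modal component, and $\tau$ is the update step of the Lagrange multiplier. The $K$ modal components are obtained when the stopping condition of Equation (6) is satisfied:

$$\sum_{k=1}^{K} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon \tag{6}$$

where $\varepsilon$ is the convergence tolerance.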
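The alternating updates of VMD can be illustrated with a short NumPy sketch. It is a simplified reading of the standard VMD iteration, not the implementation used in this paper: the function name `vmd_sketch`, the evenly spaced initial center frequencies, the absence of boundary mirroring, the zero-mean input, and the use of frequencies in cycles per sample are all assumptions made for illustration.

```python
import numpy as np

def vmd_sketch(signal, K=2, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD for a zero-mean real signal (no boundary mirroring)."""
    f = np.asarray(signal, dtype=float)
    T = f.size
    freqs = np.fft.fftfreq(T)              # frequency of each FFT bin, cycles/sample
    pos = freqs >= 0
    f_hat = np.fft.fft(f)
    f_hat[~pos] = 0.0                      # keep the one-sided spectrum only
    omega = 0.5 * np.arange(K) / K         # assumed initial center frequencies
    u_hat = np.zeros((K, T), dtype=complex)
    lam_hat = np.zeros(T, dtype=complex)
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Wiener-filter update of mode k from what the other modes leave over
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (f_hat - others + lam_hat / 2) / (
                1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Center frequency: power-weighted mean frequency of mode k
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.sum(freqs[pos] * power) / np.sum(power)
        # Multiplier update (tau = 0 disables it)
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        # Relative change of all modes, used as the stopping condition
        change = sum(
            np.sum(np.abs(u_hat[k] - u_prev[k]) ** 2)
            / (np.sum(np.abs(u_prev[k]) ** 2) + 1e-12)
            for k in range(K))
        if change < tol:
            break
    modes = 2.0 * np.real(np.fft.ifft(u_hat, axis=1))   # back to the time domain
    order = np.argsort(omega)
    return modes[order], omega[order]

# Two-tone test signal: components at 0.05 and 0.2 cycles per sample
t = np.arange(1000)
demo = np.cos(2 * np.pi * 0.05 * t) + 0.5 * np.cos(2 * np.pi * 0.2 * t)
modes, centers = vmd_sketch(demo, K=2)
```

On this synthetic signal, the two recovered center frequencies should settle near the two tone frequencies, and the modes should add back up to the input. Real power series are not this clean, which is why the choice of the number of modes and of the penalty factor matters in practice.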

Time Convolutional Network
TCN is a convolutional neural network specifically designed for analyzing time series data [23][24][25].It consists of three key components: causal convolution, dilated convolution, and residual connectivity.

Causal Convolution
Causal convolution [26] serves two essential functions in TCN [27]: ensuring that the network generates an output of the same length as the input and preventing leakage from future to past.Causal convolution differs from a standard convolutional neural network in that it utilizes a unidirectional structure, ensuring that the model input and output are the same size.However, it is important to note that causal convolution is often limited by its receptive field, which restricts its ability to incorporate long-term historical information for accurate predictions.

Dilated Convolution
In contrast to ordinary convolution, dilated convolution, as shown in Figure 1, better addresses the restricted receptive field by sampling the input at intervals. For an input $x$ and a filter $f: \{0, 1, 2, \ldots, n-1\} \to \mathbb{R}$, the dilated convolution at element $s$ can be expressed as Equation (7):

$$F(s) = (x *_d f)(s) = \sum_{i=0}^{n-1} f(i) \cdot x_{s - d \cdot i} \tag{7}$$

where $d$ is the expansion (dilation) factor and $n$ is the size of the filter.
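A dilated causal convolution can be written out directly as a sum over past samples. The sketch below is illustrative only: the function names are hypothetical, the input is assumed to be zero before the first sample, and the receptive-field helper assumes one dilated convolution per level (TCN residual blocks that stack two convolutions per level see further back).

```python
def dilated_causal_conv(x, f, d):
    """F(s) = sum_i f[i] * x[s - d*i], with x treated as zero before s = 0."""
    out = []
    for s in range(len(x)):
        total = 0.0
        for i, weight in enumerate(f):
            idx = s - d * i              # only the current and past samples are read
            if idx >= 0:
                total += weight * x[idx]
        out.append(total)
    return out

def receptive_field(kernel_size, dilations):
    """History length seen by a stack with one dilated causal conv per level."""
    return 1 + (kernel_size - 1) * sum(dilations)

# With d = 2 and a two-tap filter, each output adds x[s] and x[s - 2]
y = dilated_causal_conv([1, 2, 3, 4, 5], f=[1, 1], d=2)
rf = receptive_field(2, [1, 2, 4, 8, 16])
```

The output has the same length as the input and never reads a future sample, which are the two properties attributed to causal convolution. Doubling the dilation at each level makes the receptive field grow exponentially with depth while the number of filter weights grows only linearly.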

Residual Connection
Increasing the number of hidden layers can enlarge the TCN model's receptive field. However, when there are too many hidden layers, problems such as vanishing gradients may occur and affect the model's performance. To address these challenges, TCN incorporates residual blocks [28]; the residual connection can effectively reduce the training difficulty, and a 1 × 1 convolution can adjust the width of the residual tensor. Figure 2 shows the basic structure of the residual connection. The residual block operation is shown in Equation (8):

$$o = \mathrm{Activation}\big(x + F(x)\big) \tag{8}$$

where $x$ is the input of the residual block, $F(x)$ is the output of its convolutional branch, and $\mathrm{Activation}(\cdot)$ is the activation function.

Bi-directional Gated Recurrent Unit
Conventional RNNs have a limited feature extraction ability for time series and a weak ability to mine deep-level features. Hence, increasing the number of network layers is often necessary to make up for this deficiency. However, increasing the number of network layers may lead to problems such as vanishing and exploding gradients. To address these challenges, researchers proposed the GRU [29], which is a simplified LSTM model [30][31][32]. The GRU model contains only two gating mechanisms: the reset gate and the update gate. This gating mechanism can effectively extract temporal information, has fewer parameters, and requires fewer computational resources than the LSTM model. The basic structure of the GRU is shown in Figure 3, and its formulas are given in Equations (9)-(12):

$$z_t = \sigma(W_z x_t + U_z h_{t-1}) \tag{9}$$

$$r_t = \sigma(W_r x_t + U_r h_{t-1}) \tag{10}$$

$$\tilde{h}_t = \tanh\big(W x_t + U (r_t \odot h_{t-1})\big) \tag{11}$$

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{12}$$

In order to effectively mine the patterns of forward and backward information in time series data, the BiGRU [33][34][35] model employs a bi-directional recurrent neural network with forward and backward propagation. The structure is shown in Figure 4. In contrast to the unidirectional GRU model, BiGRU can simultaneously consider how the data change before and after each moment and thus mine the time series features more comprehensively. The formula of BiGRU is as follows:

$$\overrightarrow{h}_t = \mathrm{GRU}\big(x_t, \overrightarrow{h}_{t-1}\big) \tag{13}$$

$$\overleftarrow{h}_t = \mathrm{GRU}\big(x_t, \overleftarrow{h}_{t+1}\big) \tag{14}$$

$$h_t = W_{\overrightarrow{h}_t} \overrightarrow{h}_t + W_{\overleftarrow{h}_t} \overleftarrow{h}_t + b_t \tag{15}$$

where $b_t$ is the bias at time $t$.
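The gate equations and the bi-directional combination can be traced in a few lines of NumPy. This is a sketch under stated assumptions rather than the network used in this paper: bias terms inside the gates are omitted, the state blending uses the convention h_t = (1 - z_t) * h_{t-1} + z_t * candidate (some references swap the two weights), and all names, sizes, and random weights are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU update; p holds the matrices W_z, U_z, W_r, U_r, W, U."""
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev)          # update gate
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev)          # reset gate
    h_cand = np.tanh(p["W"] @ x_t + p["U"] @ (r * h_prev))   # candidate state
    return (1.0 - z) * h_prev + z * h_cand                   # assumed blending convention

def run_gru(xs, p):
    """Run the cell over a sequence, starting from a zero hidden state."""
    h = np.zeros(p["U"].shape[0])
    states = []
    for x_t in xs:
        h = gru_step(x_t, h, p)
        states.append(h)
    return np.stack(states)

def bigru(xs, p_fwd, p_bwd, W_f, W_b, b):
    """Weighted sum of forward and backward hidden states plus a bias."""
    h_f = run_gru(xs, p_fwd)
    h_b = run_gru(xs[::-1], p_bwd)[::-1]     # reverse back to forward time order
    return h_f @ W_f.T + h_b @ W_b.T + b

def init_params(n_in, hidden, rng):
    p = {k: rng.normal(scale=0.1, size=(hidden, n_in)) for k in ("W_z", "W_r", "W")}
    p.update({k: rng.normal(scale=0.1, size=(hidden, hidden)) for k in ("U_z", "U_r", "U")})
    return p

rng = np.random.default_rng(0)
xs = rng.normal(size=(20, 7))                # 20 time steps, 7 illustrative features
out = bigru(xs, init_params(7, 16, rng), init_params(7, 16, rng),
            rng.normal(size=(16, 16)), rng.normal(size=(16, 16)), np.zeros(16))
```

The backward pass simply runs the same cell over the reversed sequence, so the output at each step combines information from both the past and the future of the input window.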

Sparrow Search Algorithm
The Sparrow Search Algorithm (SSA) is an intelligent heuristic optimization algorithm that categorizes the sparrow population into discoverers, followers, and vigilantes.
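A discoverer is a well-adapted individual in the population whose task is to search for food sources and provide their location to the followers. Typically, the discoverers account for 10% to 20% of the total population. The location update of a discoverer is as follows:

$$X_{ij}^{t+1} = \begin{cases} X_{ij}^{t} \cdot \exp\left( \dfrac{-i}{\alpha \cdot M} \right), & R_2 < ST \\ X_{ij}^{t} + Q \cdot L, & R_2 \geq ST \end{cases} \tag{16}$$

where $t$ is the number of iterations and $X_{ij}$ is the position of the $i$th sparrow in the $j$th dimension. In addition to the discoverers, there are followers, whose locations are updated as follows:

$$X_{ij}^{t+1} = \begin{cases} Q \cdot \exp\left( \dfrac{X_{wj}^{t} - X_{ij}^{t}}{i^2} \right), & i > n/2 \\ X_{pj}^{t+1} + \left| X_{ij}^{t} - X_{pj}^{t+1} \right| \cdot A^{+} \cdot L, & \text{otherwise} \end{cases} \tag{17}$$

where $X_{pj}$ is the optimal position in the current iteration, $X_{wj}$ is the global worst position in the current iteration, $n$ is the population size, $A$ is a $1 \times j$ matrix whose elements are 1 or −1, and $A^{+} = A^{T}(AA^{T})^{-1}$. When a sparrow senses danger, it issues an alert and acts as a vigilante, and its location is updated as follows:

$$X_{ij}^{t+1} = \begin{cases} X_{bj}^{t} + \beta \cdot \left| X_{ij}^{t} - X_{bj}^{t} \right|, & f_i > f_g \\ X_{ij}^{t} + K \cdot \left( \dfrac{\left| X_{ij}^{t} - X_{wj}^{t} \right|}{(f_i - f_w) + \xi} \right), & f_i = f_g \end{cases} \tag{18}$$

where $X_{bj}$ is the global best position, $\beta$ is the step control parameter, $K$ represents a uniform random number, $\xi$ is a very small constant that avoids division by zero, $f_i$ is the fitness of the current sparrow, and $f_g$ and $f_w$ are the current best and worst fitness values, respectively.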

Algorithm Overview
The nonlinear and non-stationary nature of wind power makes accurate prediction challenging. VMD decomposes the raw wind power data into several modal components, which reduces the noise in the raw data and the complexity of the series. During model training, the hyperparameters of the model affect its performance, so SSA is used to optimize them and obtain the optimal model.

Model Structure and Algorithm Flow
Numerous factors affect wind power, and it is difficult for a traditional single model to effectively extract the temporal features of wind power. To solve this problem, this paper proposes a combined TCN-BiGRU model, which can effectively mine the features of time series data. As Figure 5 shows, the TCN-BiGRU model consists of two layers of TCN blocks, two layers of BiGRU, and fully connected layers. First, the TCN blocks extract the time series feature information, and the feature matrix extracted by the TCN is input into the BiGRU. The BiGRU captures the bi-directional information flow of the time series to learn the dynamic change pattern of the wind power features. Finally, the fully connected layer performs the dimensional change to predict the wind power at the next moment. The TCN-BiGRU model has numerous hyperparameters, and adjusting them manually has a significant impact on the prediction results. Therefore, SSA [36][37][38] is used to search for the number of iterations, the learning rate, the number of neurons, and the convolution kernel size of TCN-BiGRU to obtain the optimal model and improve the prediction accuracy.
The model optimization process can be divided into the following steps:
(1) Determine the TCN-BiGRU parameters to be optimized and initialize the population size of the sparrows, the maximum number of iterations, the percentage of discoverers, the percentage of followers, the warning value, the initial positions of the sparrows, and the optimization interval of the hyperparameters.
(2) Calculate the initial fitness of the individual sparrows and save the optimal fitness and the corresponding sparrow position.
(3) Update the positions of the discoverers, followers, and vigilantes.
(4) Calculate the fitness of the sparrow population and update the optimal fitness and sparrow position.
(5) Judge whether the optimal fitness is satisfied or whether the maximum number of iterations is reached. If the condition is satisfied, the optimal parameters are obtained. Otherwise, return to step (3).
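The steps above can be sketched as a small optimizer loop. This is a minimal illustration, not the authors' code: the function name `ssa_minimize`, the warning threshold `ST=0.8`, the 10% vigilante share, and the clipping to the search interval are assumptions, and a simple sphere function stands in for the fitness, which in this paper would be the validation error of a TCN-BiGRU trained with the candidate hyperparameters. The loop stops only on the iteration limit.

```python
import numpy as np

def ssa_minimize(fitness, lower, upper, pop=30, iters=20,
                 discoverer_frac=0.2, vigilante_frac=0.1, ST=0.8, seed=0):
    """Minimal sparrow search; returns the best position and best-fitness history."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    ones = np.ones(dim)                                  # plays the role of L
    # Steps 1-2: initialize positions, evaluate, remember the best sparrow
    X = rng.uniform(lower, upper, size=(pop, dim))
    fit = np.array([fitness(x) for x in X])
    best_x, best_f = X[fit.argmin()].copy(), float(fit.min())
    history = [best_f]
    n_disc = max(1, int(discoverer_frac * pop))
    n_vig = max(1, int(vigilante_frac * pop))
    for _ in range(iters):
        order = np.argsort(fit)                          # rank: best sparrow first
        X, fit = X[order], fit[order]
        worst_x, f_g, f_w = X[-1].copy(), fit[0], fit[-1]
        # Step 3a: discoverers
        R2 = rng.random()
        for i in range(n_disc):
            if R2 < ST:
                X[i] = X[i] * np.exp(-(i + 1) / (rng.uniform(1e-3, 1.0) * iters))
            else:
                X[i] = X[i] + rng.normal() * ones
        leader = X[0].copy()
        # Step 3b: followers
        for i in range(n_disc, pop):
            if i + 1 > pop / 2:
                X[i] = rng.normal() * np.exp((worst_x - X[i]) / (i + 1) ** 2)
            else:
                A = rng.choice([-1.0, 1.0], size=dim)
                X[i] = leader + (np.abs(X[i] - leader) @ A / dim) * ones
        # Step 3c: vigilantes (fitness values are those of the ranking above)
        for i in rng.choice(pop, size=n_vig, replace=False):
            if fit[i] > f_g:
                X[i] = best_x + rng.normal() * np.abs(X[i] - best_x)
            else:
                K = rng.uniform(-1.0, 1.0)
                X[i] = X[i] + K * np.abs(X[i] - worst_x) / ((fit[i] - f_w) + 1e-12)
        # Step 4: keep positions in range, re-evaluate, update the best solution
        X = np.clip(X, lower, upper)
        fit = np.array([fitness(x) for x in X])
        if fit.min() < best_f:
            best_x, best_f = X[fit.argmin()].copy(), float(fit.min())
        history.append(best_f)
    return best_x, history

best, hist = ssa_minimize(lambda x: float(np.sum(x ** 2)), [-5, -5], [5, 5])
```

Because the best solution found so far is stored separately from the population, the recorded best fitness can never get worse from one iteration to the next. For hyperparameter search, integer-valued parameters such as the number of neurons would additionally need rounding before each fitness evaluation.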
Since wind power generation is nonlinear and non-stationary, the original wind power series is first decomposed into multiple subsequences and residuals through signal decomposition. These subsequences and residuals are then used as the inputs of the prediction model, which reduces the adverse effects of the nonlinear and non-stationary characteristics of wind power on the prediction model and improves the prediction accuracy. To this end, the paper introduces VMD into the above model and constructs the VMD-SSA-TCN-BiGRU model, whose structure is shown in Figure 6. Its algorithmic flow is as follows: (1) Data preprocessing. Since the data extracted from the SCADA system contain some missing values, the missing values are filled to form complete time series data. (2) The power in the original data is decomposed into several components and residuals via VMD, which was introduced in Section 2.1.

Experimental Setup
The experiment code is implemented using the PyTorch framework with PyCharm 2021.3 as the platform. The operating system is Windows 11, and the hardware device is an NVIDIA GeForce RTX 3060 (NVIDIA Corporation, Santa Clara, CA, USA).

Dataset Description
The dataset used in this paper is the measured data of wind turbine 44 at a wind farm in China. This turbine has a cut-in wind speed of 3 m/s, a rated wind speed of 9 m/s, and a rated power of 2.5 MW. The data selected for analysis cover 1 December 2020 to 31 January 2021. The historical SCADA data are sampled at the second level and include the wind speed, power, ambient temperature, nacelle temperature, wind direction, angle to the wind, generator speed, and other characteristics.

Data Preprocessing
To address the issue of uneven sampling intervals in the original data, the data are resampled at 5 min intervals. After resampling, the total number of data samples is 17,857, with some missing values due to a turbine failure on 16 December 2020. In this paper, the missing data are filled with zero values. After the complete time series data are formed, the wind direction is normalized using trigonometric functions: the sin and cos values of the wind direction are taken to jointly characterize the wind direction information. The remaining features are normalized to the [0, 1] interval using the max-min normalization of Equation (19):

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{19}$$

where $x$ is the original sequence, $x_{\max}$ is the maximum value of the sequence, and $x_{\min}$ is the minimum value of the sequence.
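Both normalization steps are short enough to write out. The sketch below is illustrative: the function names are hypothetical, the wind direction is assumed to be given in degrees, and a constant sequence is mapped to zeros to avoid dividing by zero.

```python
import math

def min_max_normalize(values):
    """Scale a sequence to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant sequence: nothing to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def encode_direction(degrees):
    """Represent a wind direction by its sine and cosine."""
    rad = math.radians(degrees)
    return math.sin(rad), math.cos(rad)

speeds = min_max_normalize([3.0, 6.0, 9.0])
north_a = encode_direction(359.0)
north_b = encode_direction(1.0)
```

The sine and cosine pair keeps directions such as 359° and 1° close together, whereas scaling the raw angle would place them at opposite ends of the interval. In a train and test setting, the minimum and maximum would normally be taken from the training set only.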

Feature Selection
To reduce the complexity of the wind power prediction model, this paper employs the Pearson correlation coefficient to analyze the factors influencing wind power. The Pearson correlation coefficient measures the degree of linear correlation between two variables. It is calculated as the ratio of the covariance of the two variables to the product of their standard deviations. A correlation coefficient of 1 indicates a perfect positive correlation, −1 indicates a perfect negative correlation, and 0 indicates no linear correlation between the two variables. As shown in Figure 7, a darker color indicates a stronger correlation. The wind power is strongly correlated with the wind speed, generator speed, impeller speed, and theoretical power, weakly correlated with the wind angle, yaw angle, and absolute wind direction, and negatively correlated with the ambient temperature and nacelle temperature. This article selects the wind power, wind speed, wind direction, generator speed, impeller speed, nacelle temperature, and ambient temperature as the input features of the model. Wind speed has the most significant impact: an increase in the wind speed corresponds to an increase in power, and when the wind speed reaches the rated wind speed, the wind power correspondingly reaches the rated power. The other features also have a certain impact on the wind power, and using multiple features as inputs can make the model's prediction more effective.
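The correlation coefficient behind such a heatmap can be computed as in the following sketch. The function name and the toy sequences are illustrative and are not taken from the dataset.

```python
import math

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

rising = pearson([1, 2, 3, 4], [2, 4, 6, 8])      # perfectly linear, increasing
falling = pearson([1, 2, 3, 4], [8, 6, 4, 2])     # perfectly linear, decreasing
```

The common factor 1/n cancels between the numerator and the denominator, so it is left out. Since the coefficient only captures linear dependence, a feature with a strongly nonlinear effect on power can still show a low value.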

VMD Decomposition Results
To enhance the accuracy and stability of the wind power prediction model, it is necessary to address the significant fluctuations of the original wind power sequence and the strong mutual interference of signals at different frequencies. This paper employs VMD to break down the original sequence into multiple sub-series with different frequencies so that the signal in each sub-series is purer and subject to less interference. This approach effectively reduces the noise and interference in the original sequence, thereby simplifying the power sequence. However, selecting the number of decompositions, K, is crucial to VMD. A value of K that is too large will generate redundant modes and cause modal mixing, while a value of K that is too small will lead to under-decomposition and reduce the model's prediction accuracy. The central frequency method is based on spectral analysis in signal processing, and its central idea is to determine the decomposition structure of a signal from its frequency domain characteristics. In the spectrum of a signal, each peak represents a frequency component, so detecting the peaks in the spectrum determines the number of frequency components with significant energy. Based on this idea, the central frequency method uses the spectral information of the signal to determine the K value of the VMD decomposition, and this article adopts it to determine K. The value of α in VMD also affects the decomposition. The larger the value of α, the smoother the decomposition result, but it may cause aliasing and information loss between modes. The smaller the value of α, the closer the decomposition result is to the original signal, but the decomposed modes may be unsmooth, with noise and oscillation. This article uses a grid search to determine the value of α as 2200.
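One way to apply the central frequency method is to check, for each candidate K, whether adjacent center frequencies are sufficiently far apart. The sketch below uses the 50% relative-increase criterion that this paper applies to its table of center frequencies. The function names and the example frequencies are hypothetical and are not values from the paper.

```python
def relative_increases(center_freqs):
    """Relative increase between each pair of adjacent center frequencies."""
    increases = []
    for low, high in zip(center_freqs, center_freqs[1:]):
        increases.append(float("inf") if low == 0 else (high - low) / low)
    return increases

def modes_are_distinct(center_freqs, threshold=0.5):
    """True if every adjacent pair differs by more than the threshold."""
    return all(inc > threshold for inc in relative_increases(center_freqs))

well_separated = modes_are_distinct([0.01, 0.02, 0.05, 0.10])   # increases: 100%, 150%, 100%
too_similar = modes_are_distinct([0.01, 0.02, 0.027])           # last increase: 35%
```

K would be raised until the check first fails, and the largest K that still passes would be kept.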
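Table 1 shows the relationship between the K value and the center frequency. For a fixed K value, an increase of more than 50% in the center frequency between adjacent modes indicates that the adjacent modes are dissimilar. From Table 1, it can be seen that when the K value is from 1 to 8, the increases in center frequency between adjacent modes are all above 50%, which means that no modal confusion occurs. However, when K = 9, the increase from mode 7 to mode 8 is only 35.02%, so the two modes are considered similar and modal confusion is present. Therefore, a K value of 8 is chosen as the appropriate number of decompositions. Figure 8 illustrates the modal components and residuals obtained from the original power decomposition.

Performance Metrics
In this paper, the symmetric mean absolute percentage error $E_{\mathrm{SMAPE}}$, the mean absolute error $E_{\mathrm{MAE}}$, the root mean square error $E_{\mathrm{RMSE}}$, and the coefficient of determination $R^2$ are used to evaluate the models. In their commonly used forms, they are calculated as follows:

$$E_{\mathrm{SMAPE}} = \frac{100\%}{N} \sum_{t=1}^{N} \frac{\left| \hat{x}_t - x_t \right|}{\left( \left| \hat{x}_t \right| + \left| x_t \right| \right)/2} \tag{20}$$

$$E_{\mathrm{MAE}} = \frac{1}{N} \sum_{t=1}^{N} \left| \hat{x}_t - x_t \right| \tag{21}$$

$$E_{\mathrm{RMSE}} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( \hat{x}_t - x_t \right)^2} \tag{22}$$

$$R^2 = 1 - \frac{\sum_{t=1}^{N} \left( x_t - \hat{x}_t \right)^2}{\sum_{t=1}^{N} \left( x_t - \bar{x} \right)^2} \tag{23}$$

where $N$ is the number of predicted samples, $\hat{x}_t$ and $x_t$ are the predicted and true values of wind power, respectively, and $\bar{x}$ is the average true value of power.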
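The four evaluation indicators can be computed as in the sketch below. The SMAPE variant shown, with the mean of the absolute true and predicted values in the denominator, is an assumption, since several definitions are in use. Skipping terms whose denominator is zero is also an assumption, relevant here because the series contains zero-filled values.

```python
import math

def mae(y_true, y_pred):
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def smape(y_true, y_pred):
    """Symmetric MAPE in percent; terms with a zero denominator are skipped."""
    terms = [abs(p - t) / ((abs(t) + abs(p)) / 2)
             for t, p in zip(y_true, y_pred) if abs(t) + abs(p) > 0]
    return 100.0 * sum(terms) / len(y_true)

def r2(y_true, y_pred):
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

actual = [100.0, 200.0]
predicted = [110.0, 190.0]
scores = (smape(actual, predicted), mae(actual, predicted),
          rmse(actual, predicted), r2(actual, predicted))
```

Lower values are better for the three error measures, while $R^2$ approaches 1 as the predictions explain more of the variance in the measured power.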

Results and Discussion
In this study, the dataset is split into a 70% training set and a 30% testing set. The step size is set to 20, meaning that the historical data from moments t − 20 to t are used to predict the wind power at moment t + 1. The proposed VMD-SSA-TCN-BiGRU model is initialized with the following parameters. The number of VMD decompositions is 8. The population size of the SSA algorithm is 30, the discoverers account for 20%, the number of iterations is 20, and the search dimension is 4, covering the number of iterations, the number of neurons, the learning rate, and the convolutional kernel size. The search range of the number of iterations is [50, 300], the search range of the number of neurons is [50, 150], the search range of the learning rate is [0.001, 0.01], and the search range of the convolutional kernel size is [2, 30]. The expansion factors in the TCN are [1, 2, 4, 8, 16], the number of convolutional filters is 64, and the convolutional kernel size is obtained by the parameter optimization algorithm. The BiGRU has two layers, and its number of neurons is obtained using the optimization algorithm. Adam is used as the optimizer, the batch_size is 512, and the mean square error is used as the loss function. To verify the validity of the model proposed in this paper, comparison experiments are performed with the ELM [39,40], BiLSTM, BiGRU, Transformer [41,42], TCN, N-BEATSX [43], TCN-BiGRU, and SSA-TCN-BiGRU models. $E_{\mathrm{SMAPE}}$, $E_{\mathrm{MAE}}$, $E_{\mathrm{RMSE}}$, and $R^2$ are used in Table 2 to evaluate the advantages and disadvantages of each model. Figure 9 shows the fitted curves of the actual and predicted values as well as local enlargements.
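The construction of supervised samples and the 70/30 split can be sketched as follows. The function names are hypothetical. Two assumptions are made: each sample uses the 20 most recent values to predict the next one, and the split is chronological rather than shuffled, which the text does not state explicitly.

```python
def make_windows(series, history=20):
    """Pair each block of `history` past values with the value that follows it."""
    samples = []
    for end in range(history, len(series)):
        samples.append((series[end - history:end], series[end]))
    return samples

def chronological_split(samples, train_frac=0.7):
    """Earlier samples for training, later samples for testing."""
    cut = round(len(samples) * train_frac)
    return samples[:cut], samples[cut:]

series = list(range(120))                 # stand-in for a normalized power series
samples = make_windows(series, history=20)
train, test = chronological_split(samples)
```

A chronological split keeps every test target later in time than every training target, so the reported errors reflect forecasting on unseen future data.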
Figure 9 shows the fitting curves of the different models. The wind power prediction curves for the last 5 days and for different time periods are drawn. The overall trend of the predicted and actual values of the nine models is roughly the same. However, the fitting performance of the single models ELM, BiLSTM, BiGRU, TCN, N-BEATSX, and Transformer is not as good as that of the combined TCN-BiGRU model, and the combined SSA-TCN-BiGRU model is closer to the actual value than the TCN-BiGRU model. After adding VMD, the overall lag of the prediction results is evidently reduced and the fit of the prediction curve improves. The overall prediction results of the VMD-SSA-TCN-BiGRU model proposed in this paper are better than those of the other models. Based on the evaluation metrics in Table 2, the proposed model outperforms the other models and can predict wind power more accurately.
Figure 10 shows the scatter plots of the actual and predicted power of the nine models: ELM, BiLSTM, BiGRU, Transformer, TCN, N-BEATSX, TCN-BiGRU, SSA-TCN-BiGRU, and VMD-SSA-TCN-BiGRU. The horizontal axis of each subgraph represents the actual power, while the vertical axis represents the power predicted by the corresponding model. The plots show visually that the deviation between the true and predicted values is larger for the single models. For VMD-SSA-TCN-BiGRU, the deviation between the actual and predicted values is significantly smaller and more concentrated than for TCN-BiGRU and SSA-TCN-BiGRU, which indicates that the proposed model has significant advantages in wind power prediction. To illustrate the effectiveness of the proposed model more intuitively, Figure 11 compares the models in terms of $E_{\mathrm{SMAPE}}$, $E_{\mathrm{MAE}}$, $E_{\mathrm{RMSE}}$, and $R^2$. After adding VMD, $R^2$ improves by 0.60%, which shows that the VMD decomposition can effectively reduce the adverse effects of the nonlinear and non-stationary characteristics of wind power on prediction. From the above comparison, it can be observed that the hybrid VMD-SSA-TCN-BiGRU model proposed in this article achieves a significant decrease in the SMAPE, MAE, and RMSE, while the $R^2$ improves. Compared with N-BEATSX [44], a recent time series algorithm, the TCN can extract deep-level features from the time series, while the BiGRU can effectively predict wind power. Although TCN-BiGRU [31] can improve the accuracy of wind power prediction, the model has numerous hyperparameters, and it is difficult to find suitable values manually. Therefore, a hyperparameter optimization algorithm is needed. This paper uses SSA, which has a good global search ability and convergence, to optimize the hyperparameters of the TCN-BiGRU model. Many factors affect wind power, and the original wind power series fluctuates greatly, so it is necessary to decompose the original wind power data. This article uses VMD [45,46] to decompose the original wind power data, which effectively reduces the interference in the wind power data and the overall lag of the model prediction.

Conclusions
This article proposes a short-term wind power prediction method based on VMD-SSA-TCN-BiGRU. The proposed VMD-SSA-TCN-BiGRU prediction model reduces the SMAPE, MAE, and RMSE by 13.24%, 24.03%, and 35.90%, respectively, compared to the SSA-TCN-BiGRU model. The hybrid model was shown to be effective in predicting wind power through four evaluation indicators, and the following conclusions are drawn: (1) VMD can reduce noise in the original power and alleviate the overall lag of the prediction model. (2) The TCN-BiGRU model can effectively extract temporal features and accurately predict wind power. Using SSA to optimize the hyperparameters of the TCN-BiGRU model further improves the performance of the model. (3) This paper verifies the model's validity using the measured data of wind turbine 44 in a domestic wind farm. A comparative analysis with the remaining eight models shows that the model proposed in this paper has better stability and higher accuracy.
Wind power prediction is mainly used for power dispatch, but the information that point prediction provides to power system personnel is limited. In the future, interval prediction [47] based on the point prediction model in this paper can provide more useful information and support better dispatch of the power system.
In the BiGRU formula, $W_{\overrightarrow{h}_t}$ and $\overrightarrow{h}_t$ are the weight and state of the forward hidden layer at time $t$, and $W_{\overleftarrow{h}_t}$ and $\overleftarrow{h}_t$ are the weight and state of the backward hidden layer at time $t$.

In the discoverer update of the SSA, $\alpha$ is a random number, $M$ is the maximum number of iterations, $Q$ is a random number obeying the normal distribution, $R_2$ and $ST$ are the warning and safety values, respectively, and $L$ is a $1 \times j$ matrix whose elements are all 1.

Continuing step (2) of the algorithmic flow, each component and residual are combined with the remaining features of the wind power data to form new data, and the data are then normalized and divided into the training and testing sets. (3) The parameters of SSA and TCN-BiGRU are initialized, the newly composed data are input into the TCN-BiGRU model shown in Figure 5, and the number of iterations, learning rate, number of neurons, and convolution kernel size of the TCN-BiGRU are optimized via SSA to obtain the optimal model parameters and improve the prediction accuracy. (4) The predicted values of all the subsequences are accumulated to output the final result of the model's one-step-ahead prediction, and the constructed model is evaluated.
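The decompose, predict, and accumulate structure of this flow can be sketched in a few lines. The sketch is deliberately minimal and not the authors' pipeline: a persistence rule stands in for the per-component SSA-TCN-BiGRU predictor, a hand-made trend and residual stand in for the VMD components, and all names are hypothetical.

```python
def forecast_by_components(components, predict_next):
    """Predict each decomposed component separately and add the predictions."""
    return sum(predict_next(component) for component in components)

def persistence(component):
    """Placeholder predictor: repeat the last observed value."""
    return component[-1]

power = [10.0, 12.0, 15.0, 14.0]
trend = [11.0, 12.0, 13.0, 13.5]                       # stand-in for a low-frequency mode
residual = [p - s for p, s in zip(power, trend)]       # remainder, so the parts sum to power
next_power = forecast_by_components([trend, residual], persistence)
```

Because the components add up to the original series, summing the per-component forecasts yields a forecast on the scale of the original power. In the full model, each component would have its own trained predictor and its own optimized hyperparameters.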

Figure 7. The heatmap of the correlation coefficients of input variables.

Figure 8. Raw power and VMD decomposition results.

Figure 9. Forecasting curves of different models.

Figure 10. Scatter plot of actual and predicted values.

Figure 11. Comparison of evaluation indicators of different models.
In Equations (9)-(12), $x_t$ and $h_{t-1}$ are the input information at the current moment and the hidden state at the previous moment, respectively, $\sigma$ is the sigmoid function, $W_z$ and $U_z$ are the weights of the update gate, $W_r$ and $U_r$ are the weights of the reset gate, $U$ and $W$ are the weights of $\tilde{h}_t$, $r_t$ is the state of the reset gate, and $\odot$ denotes the Hadamard product.

Table 1. Central frequencies of each mode under different K values.

Table 2. Performance index prediction results.

This paper performs several experiments using identical model parameters and reports the average value over 10 trials. Table 2 shows the comparison of the evaluation metrics of different models. The $E_{\mathrm{RMSE}}$ values of ELM, BiLSTM, BiGRU, Transformer, TCN, N-BEATSX, TCN-BiGRU, and SSA-TCN-BiGRU are 162.763, 156.92, 152.863, 190.164, 148.232, 127.764, 125.071, and 85.972, respectively. The $R^2$ of the VMD-SSA-TCN-BiGRU model reached 0.997. When comparing the combined TCN-BiGRU model to the best single model, N-BEATSX, $E_{\mathrm{SMAPE}}$ decreases by 4.23%, $E_{\mathrm{MAE}}$ decreases by 4.91%, $E_{\mathrm{RMSE}}$ decreases by 2.11%, and $R^2$ improves by 1.76%, which shows that the TCN can effectively extract complex wind power timing features and the BiGRU can learn dynamically changing wind power. After adding the SSA algorithm, $E_{\mathrm{SMAPE}}$ decreases by 24.35%, $E_{\mathrm{MAE}}$ decreases by 33.04%, $E_{\mathrm{RMSE}}$ decreases by 31.26%, and $R^2$ improves by 1.22%, indicating that SSA can find the optimal model parameters and improve the model's performance. After adding VMD, $E_{\mathrm{SMAPE}}$ decreases by 13.24%, $E_{\mathrm{MAE}}$ decreases by 24.03%, $E_{\mathrm{RMSE}}$ decreases by 35.90%, and $R^2$ improves by 0.60%.
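The reported percentages can be cross-checked with a short calculation. The sketch takes the last three quoted values, 127.764, 125.071, and 85.972, to be the RMSE of N-BEATSX, TCN-BiGRU, and SSA-TCN-BiGRU. That assignment is an inference from the order of the list and from the percentages themselves, not something the text states directly.

```python
def percent_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before

rmse_nbeatsx, rmse_tcn_bigru, rmse_ssa_tcn_bigru = 127.764, 125.071, 85.972

step_tcn_bigru = percent_reduction(rmse_nbeatsx, rmse_tcn_bigru)     # adding the hybrid
step_ssa = percent_reduction(rmse_tcn_bigru, rmse_ssa_tcn_bigru)     # adding SSA
# Successive reductions multiply: SSA (31.26%) followed by VMD (35.90%)
combined = 100.0 * (1 - (1 - 0.3126) * (1 - 0.3590))
```

Rounded to two decimals, the two step reductions agree with the 2.11% and 31.26% RMSE improvements quoted for TCN-BiGRU and SSA. Compounding the SSA and VMD reductions gives the 55.94% RMSE improvement over TCN-BiGRU reported in the abstract.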