1. Introduction
As environmental degradation and the global energy shortage worsen, the development of renewable energy has emerged as a crucial avenue for transforming the energy structure [1]. Among renewable sources, wind energy has demonstrated promising development prospects owing to its abundant resource reserves, affordability, and other benefits. However, due to the intermittency and uncertainty of wind energy, large-scale grid integration of wind power poses serious challenges to grid stability. As a result, precise forecasting of wind farm cluster power not only enhances wind power consumption capacity and grid dispatch security but also contributes to more economical power system operation, making it a crucial step in guaranteeing the power system’s safe operation [2].
The non-stationarity and high volatility of meteorological data, including wind speed, present a problem for wind power prediction and have a direct impact on the model’s prediction accuracy.
Wind turbines within wind farms exhibit significant spatiotemporal correlations in power generation, but traditional forecasting methods, which use equal-weight averaging, overlook this characteristic, resulting in insufficient accuracy. Current mainstream models exhibit three major limitations: they focus predominantly on individual turbine characteristics and lack effective modeling of spatial dependencies among turbine groups; their data preprocessing approaches are limited, failing to address wind speed non-stationarity and multi-scale features; and the subjective selection of clustering parameters renders the models sensitive to noise. To increase the accuracy of power forecasts for wind power clusters, it is therefore crucial to precisely decompose wind speed sequences, group wind turbines rationally, and optimize model architectures.
To this end, this paper employs SVMD, TCN, MSA, BiGRU, and GJO while implementing targeted enhancements and tightly coupled designs for each module. During the decomposition preprocessing stage, parameters such as the number of SVMD layers, penalty factors, and bandwidth constraints are first set reasonably based on prior experience and statistical analysis of the wind speed and power sequences, enabling multiscale decomposition of the original non-stationary signals. Subsequently, FOA-DBSCAN is applied to cluster and segment the decomposed wind speed and power samples, mitigating the impact of sample distribution variations and outliers on model training. Each SVMD mode is then fed as an independent feature channel into the subsequent prediction network. At the output stage, a trainable reconstruction layer is introduced, enabling the model to autonomously learn the nonlinear contribution weights of different modes to the overall power. This transforms the traditional fixed reconstruction method of “simple linear superposition after decomposition” into a “data-driven adaptive fusion oriented toward multiscale information.” During the deep spatio-temporal modeling phase, this paper employs TCN as the front-end temporal feature encoder, extracting multi-scale temporal features from each mode sequence via causal and dilated convolutions. Building upon this, the MSA is introduced, transcending single-head attention confined to the temporal dimension. This mechanism simultaneously encodes temporal and turbine information within the TCN feature space, leveraging multi-head attention to characterize correlations across different temporal scales and spatial regions, explicitly modeling coupling relationships between turbines and across different time points within the same turbine. Its structural position between the TCN and the BiGRU effectively mitigates the interference of raw noise on attention weight learning.
Subsequently, the high-order features processed by the TCN and MSA are fed into the BiGRU, which functions as a sequence memory module at the high-level semantic stage. During training, it dynamically models relatively smooth and abstract features by simultaneously utilizing both forward and backward temporal information. This approach reduces the burden of directly handling strongly non-stationary and highly noisy sequences while mitigating gradient vanishing and enhancing training stability. Throughout this process, the GJO algorithm serves as a global hyperparameter optimization tool exclusively for the TCN-BiGRU-MSA model. It searches for optimal configurations of hidden layer size, network depth, convolution kernel width, stride, learning rate, and batch size; by balancing final prediction error against model complexity as a composite objective, it automatically identifies optimal network configurations, achieving synergistic optimization of the deep network architecture and the training strategy.
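The causal, dilated convolution that underpins the TCN encoder described above can be sketched in a few lines of Python. This is a stand-alone illustration, not code from the paper's MATLAB implementation; the function name, kernel values, and input sequence are all hypothetical:

```python
def causal_dilated_conv(x, kernel, dilation):
    """One causal dilated 1-D convolution, the TCN building block:
    the output at time t only sees x[t], x[t-d], x[t-2d], ... (zero-padded),
    so no future information leaks into the prediction."""
    out = []
    for t in range(len(x)):
        s = 0.0
        for i, w in enumerate(kernel):
            idx = t - i * dilation       # taps look strictly backwards in time
            s += w * (x[idx] if idx >= 0 else 0.0)
        out.append(s)
    return out

# Dilation 2 with kernel [1, -1] computes the causal difference x[t] - x[t-2].
y = causal_dilated_conv([1, 2, 3, 4, 5], [1, -1], dilation=2)
```

Stacking such layers with exponentially growing dilations (1, 2, 4, ...) is what gives a TCN its large receptive field over the multiscale mode sequences.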
In light of this, this paper proposes an ultra-short-term power prediction method for wind power clusters based on SVMD and TCN-MSA-BiGRU-GJO. This method uses the GJO algorithm, variational mode decomposition, and a spatiotemporal attention mechanism to improve prediction accuracy and lessen the effect of power uncertainty on grid stability. Simulation results demonstrate that the model performs exceptionally well in handling power sequence volatility and complex dependency relationships, with prediction accuracy significantly outperforming existing methods, thereby validating its effectiveness and feasibility.
This paper’s primary contributions to the aforementioned problems are as follows:
- (1)
The multi-scale decomposition of wind speed data is accomplished using the SVMD method. Its stacked decomposition strategy reduces data non-stationarity, precisely captures signal features, and supplies the model with high-quality input.
- (2)
An improved FOA-DBSCAN density clustering method is proposed, in which the neighborhood radius and density threshold are optimized to address the challenge of parameter selection, thereby significantly enhancing the accuracy and robustness of clustering.
- (3)
A hybrid SVMD-TCN-MSA-BiGRU-GJO prediction model is developed by integrating TCN, BiGRU, and MSA, greatly enhancing the ability to capture temporal dynamics in wind power sequences.
The organizational structure of this study is as follows: Section 2 reviews related work on signal decomposition, clustering, and wind power forecasting. Section 3 provides a detailed explanation of the theoretical foundation of the data processing stage of the proposed prediction method. Section 4 describes the basic structure of the proposed algorithmic model and its evaluation metrics. Section 5 presents experimental data and analysis results, followed by an in-depth discussion of the conclusions drawn. Finally, Section 6 provides a summary and outlook for the study. This paper proposes a hybrid predictive model based on FOA-DBSCAN and SVMD-TCN-MSA-BiGRU-GJO, with the overall model framework shown in
Figure 1.
2. Related Work Discussion
The non-stationarity and high volatility of meteorological data, including wind speed, present a challenge for wind power prediction and directly affect the model’s prediction accuracy. For this reason, researchers have proposed a variety of signal decomposition techniques to preprocess raw wind speed data and improve the performance of prediction models. Common signal decomposition techniques include Empirical Mode Decomposition (EMD) [
3], Variational Mode Decomposition (VMD), Ensemble Empirical Mode Decomposition (EEMD) and Complementary Ensemble Empirical Mode Decomposition (CEEMD) [
4]. These methods can effectively reduce the non-stationarity of the data and extract the implied laws by decomposing the complex signal into several components with different frequency characteristics. However, although these methods can adaptively decompose non-stationary signals, they are prone to problems such as modal aliasing, endpoint effects, and increased computational complexity, which affect the accuracy of the decomposition.
The SVMD method proposed in recent studies can separate different frequency features of input data layer by layer and significantly improve the learning ability of the subsequent model, which provides ideas for solving the above problems [
5]. Combining SVMD with deep learning models and sophisticated optimization algorithms has emerged as a key area of research and development for enhancing the preprocessing and forecast accuracy of meteorological data, including wind speed. It also offers theoretical and methodological support for the prediction of wind farm clusters’ ultra-short-term power [
6]. When predicting the power of large wind power clusters, it is necessary to strike a balance between accuracy and efficiency. If each wind turbine is predicted separately, although a high prediction accuracy can be ensured, the computational and time costs are high and the overall efficiency is low. Although estimating the power of the entire wind cluster just from the output of a single turbine can greatly increase efficiency, it is frequently challenging to guarantee prediction accuracy. Wind turbines with similar features can be grouped using a clustering technique, and each group can then be predicted independently to increase prediction accuracy while accounting for efficiency. The study of clustering algorithms has advanced quickly in the last several years, and many methods exhibit varying clustering performance in various contexts [
7].
Silhouette Coefficient (SC), CH index (Calinski–Harabasz, CH), DB index (Davies–Bouldin, DB), and other frequently used clustering performance metrics are used to scientifically assess the clustering effect. SC serves as a measure of clustering quality for individual sample points, reflecting both their similarity to other points within the same cluster and their dissimilarity to those in different clusters. Its values range from −1 to 1, with values closer to 1 representing better clustering performance. The CH index assesses the clustering effect by comparing the compactness of samples within each cluster with the separation between clusters; the higher the value, the better the clustering effect. The DB index compares each cluster’s average similarity to its most similar cluster; the smaller the value, the more compact the clusters and the more clearly they are separated, indicating a better clustering effect. These indices provide an objective basis for comparing the advantages and disadvantages of different clustering methods and optimizing their parameters. Commonly used clustering methods include Fuzzy C-Means (FCM) [
8], K means, Spectral Clustering (SC), Gaussian Mixture Model (GMM), Affinity Propagation (AP), DBSCAN clustering algorithms, etc. FCM allows data points to belong to multiple clusters at the same time, which is highly adaptive but sensitive to the choice of initial cluster centers. The stability of the clustering effect may be impacted by variations in initiation techniques, which could result in different clustering outcomes [
9]. K-means, a traditional hard clustering technique, allocates data by iteratively optimizing cluster centroids. The approach is best suited to data with a spherical distribution, but it is sensitive to the initial centroid selection and less robust to outliers and noise [
10,
11]. Although spectral clustering (SC) can cluster data well, it is a graph-theoretic method that is computationally complex and inefficient on large datasets and highly sensitive to the choice of similarity matrix [
12]. GMM can better fit complex data distributions by using a mixed model of Gaussian distributions for clustering. In fact, it is frequently challenging for the dataset to fully satisfy the method’s assumption that the data follows a Gaussian distribution, which could result in subpar clustering outcomes [
13,
14]. There is some flexibility because the AP method does not have to predetermine the number of clusters. Nevertheless, the clustering effect is exceptionally sensitive to the parameter settings, and the method’s computation procedure is quite complex [
15].
DBSCAN is more robust than the aforementioned clustering methods in modeling the multidimensional meteorological characteristics and spatio-temporal correlation of wind power clusters, because it imposes no restrictions on cluster geometry and can accurately capture the grouping patterns of units with irregular spatial correlation characteristics in wind farm clusters [
16]. Large wind clusters can be partitioned using DBSCAN to group turbines with comparable operating characteristics, which provides an effective grouping strategy for power prediction because neighboring turbines’ power operating states are typically similar. DBSCAN handles dense data of arbitrary shape and noise extraction well; however, it depends on two important parameters, the neighborhood radius and the minimum density threshold, which greatly affect the clustering results. Inappropriate parameter selection can result in over- or under-clustering, which restricts its use in high-dimensional complex scenarios, so optimizing these parameters is essential to enhance performance [
17].
To fill this gap, the FOA was introduced to dynamically optimize the DBSCAN parameters. FOA is a population intelligence optimization algorithm proposed by Pan et al. [
18], which simulates the foraging behavior of fruit flies and solves the optimization problem with a simple structure, efficient searching capability, and fast convergence properties. In wind cluster power prediction, the grouping strategy and prediction performance can be improved by the FOA optimization parameters [
19].
Since different prediction methods have their own advantages in coping with data characteristics and practical application scenarios, wind power prediction requires targeted modeling. Wind power prediction methods mainly include physical methods, statistical methods, and artificial intelligence based methods [
20,
21]. Complex physical models that accurately simulate wind turbine features and meteorological factors (such as wind speed, wind direction, and air density) are typically the foundation of physical approaches. Typical tools and algorithms that can predict wind power directly from physical principles include analytical techniques based on wind turbine power characteristic curves and Computational Fluid Dynamics (CFD) models [
22]. Physical approaches, however, are ineffective in the face of complex and quickly changing meteorological conditions because they rely heavily on the consistency of environmental conditions and the accuracy of modeling parameters. They also perform poorly in ultra-short-term forecasts.
Statistical methods, on the other hand, focus on learning the intrinsic relationships between variables from historical observations and making predictions by modeling functions between inputs and outputs. Among the most popular statistical techniques are support vector regression (SVR), autoregressive integrated moving average (ARIMA), moving average (MA), and autoregressive model (AR). Among these, SVR can partially capture nonlinear relationships because of the flexibility of its kernel function [
23], while ARIMA shows great prediction accuracy when working with smooth time series data and excels at identifying trends and periodicity in data [
24]. However, wind power data tends to be significantly non-stationary and highly volatile, which limits the fitting ability of statistical models when dealing with complex time series data.
In recent years, artificial intelligence based methods have demonstrated excellent performance in wind power prediction [
25]. These methods are capable of extracting nonlinear features and uncovering hidden patterns within the data, owing to their powerful learning abilities. They are particularly well suited for dealing with highly volatile and non-stationary time series data. Random Forest (RF), Graph Convolutional Network (GCN), Gated Recurrent Unit (GRU), and Long Short-Term Memory (LSTM) are examples of common artificial intelligence algorithms [
26]. Due to its memory function and time dependent modeling capability, LSTM has emerged as a popular option for both short-term and ultra-short-term forecasting, along with other neural network variations and hybrid neural networks [
27]. The RF algorithm enhances wind power forecasting by constructing an ensemble of decision trees, which integrates individual prediction results and automatically selects and combines relevant features from both multidimensional meteorological data and historical power records. This methodology significantly improves both the predictive accuracy and robustness against fluctuations in wind power output [
28]. GRU can effectively capture the long-term dependencies in meteorological time series—such as wind speed and direction—for wind power forecasting. Its gating mechanism simplifies model parameters and enhances computational efficiency, thereby enabling efficient modeling and accurate prediction of power output trends [
29]. By using the graph structure to model the spatial topology of wind farms and the node feature propagation and aggregation mechanism, GCN is able to better characterize the dynamic properties of complex wind farms by capturing the spatial dependence between wind turbine generators (WTGs) [
30]. Improved deep learning models such as BiGRU and TCN further enhance the ability to capture long-term dependencies and spatio-temporal correlations in time series. Deep learning models that incorporate the attention mechanism can also dynamically weigh the effects of each input feature at different time steps, which greatly improves prediction accuracy [
31]. Hang Fan et al. proposed M2WLLM, which uses a large language model to fuse textual prompts and temporal numerical data for ultra-short-term wind power forecasting [
32]. Poonam Dhaka, Mini Sreejeth, and M. M. Tripathi proposed a hybrid ensemble system that combines multivariate signal decomposition, stacked GRU networks, and a Bagging-Boosting error-correction scheme for short-term wind power forecasting [
33]. Erlong Zhao, Shaolong Sun, and Shouyang Wang proposed a scientometric and review-based framework that classifies and analyzes big-data and AI methods for wind energy forecasting over the past two decades [
34]. Overall, these studies demonstrate that integrating signal decomposition, advanced clustering and optimization strategies, and intelligent forecasting models has become an important trend for improving the accuracy and robustness of wind power prediction.
Wind power forecasting is categorized into three time scales: ultra-short-term (minutes/hours), short-term (hours/days), and long-term (months/years). Among these, ultra-short-term forecasting is of critical importance in modern wind power dispatch [
35], providing essential support for real time wind farm operations and grid stability [
36]. Despite challenges posed by high dynamic data and nonlinearity, AI technologies such as TCN and LSTM have significantly improved prediction accuracy and response speed by capturing spatio temporal features [
37]. In contrast, short-term forecasting often suffers from weak generalization due to the need to model turbine distribution and sudden changes in wind speed; long-term forecasting, meanwhile, faces limited accuracy due to the extended timeframe and high uncertainty of resources.
3. Data Feature Construction and Preprocessing
3.1. SVMD
A stacked variational mode decomposition technique called SVMD was proposed in 2019 [
38]. It is based on conventional VMD and uses a stacked decomposition strategy with the goal of more precisely breaking signals down into finer sub-modes in order to extract more significant feature information. The main concept of SVMD is to apply VMD with added constraints to decompose complex signals, increasing the flexibility and stability of deep decomposition and information extraction. The specific optimization steps are as follows:
Step 1: Objective function construction. SVMD adopts a stacking mechanism in the decomposition process to decompose the original signal $f(t)$ into a combination of multiple components $u_k(t)$ (i.e., a component set $\{u_k\}$) and its corresponding decoupled center-frequency set $\{\omega_k\}$. The objective equation is as follows:

$$f(t) = \sum_{k=1}^{K} u_k(t) \tag{1}$$

The goal of SVMD is to use incremental variational decomposition and loss-constrained optimization to construct the component set that meets the optimal decomposition condition in order to constrain the solution. Its optimization objective can be stated as follows:

$$\min_{\{u_k\},\{\omega_k\}} J = \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j \omega_k t} \right\|_2^2 \tag{2}$$

In Equation (2), $J$ denotes the variational objective function, $\partial_t$ denotes the bandwidth measurement (differentiation) operator, $\omega_k$ denotes the center frequency, $u_k(t)$ is the decomposed sub-modal sequence, and $\delta(t)$ is the unit impulse function.
Step 2: Construction of the Lagrange optimization problem. In order to merge the constraint $f(t) = \sum_k u_k(t)$, SVMD introduces a Lagrange multiplier $\lambda(t)$ and defines the penalty coefficient $\alpha$ to enhance the convergence of the constraints. The Lagrange optimization problem is as follows:

$$\mathcal{L}\big(\{u_k\}, \{\omega_k\}, \lambda\big) = \alpha \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j \omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k} u_k(t) \right\rangle \tag{3}$$

In Equation (3), $\alpha$ is the penalty coefficient used to enhance the reconstruction accuracy of the components. The decomposition of the wind speed series into multiple modes reduces the noise effect of wind speed, and the modes can be used as inputs to the prediction model in combination with other meteorological data.
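To make the layer-by-layer idea concrete, the following stdlib-only Python sketch extracts one "mode" by band-passing a narrow DFT band around the dominant frequency and returns the residual for the next layer. It mimics SVMD's successive extraction only in spirit: the actual method solves the variational problem described in this section, and every name and parameter here is illustrative:

```python
import math, cmath

def extract_mode(signal, bandwidth=2):
    """One 'layer' of a successive-decomposition sketch: locate the dominant
    DFT frequency bin and keep a narrow band around it (plus conjugate bins,
    so the output stays real). Returns (mode, residual)."""
    n = len(signal)
    # Naive O(n^2) DFT, adequate for a short demonstration signal.
    spec = [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    k0 = max(range(1, n // 2), key=lambda k: abs(spec[k]))  # dominant non-DC bin
    keep = set()
    for k in range(k0 - bandwidth, k0 + bandwidth + 1):
        keep.add(k % n)
        keep.add((-k) % n)            # conjugate-symmetric bins keep output real
    mode = [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in keep).real / n
            for t in range(n)]
    residual = [s - m for s, m in zip(signal, mode)]
    return mode, residual

# Two-tone test signal: the first extracted mode should carry the stronger tone.
n = 128
sig = [2.0 * math.sin(2 * math.pi * 5 * t / n) + 0.5 * math.sin(2 * math.pi * 20 * t / n)
       for t in range(n)]
mode1, res1 = extract_mode(sig)
```

Calling `extract_mode` again on `res1` would peel off the next-strongest component, which is the "stacking" intuition behind decomposing wind speed into IMF-like sub-series.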
3.2. PCA
In this study, PCA [
39] is employed to reduce data dimensionality and extract key features. As a widely adopted method, PCA utilizes linear transformations to minimize information loss and retain essential information [
40]. The dimensionality reduction phases for PCA are as follows:
Step 1: Construct the data matrix with a sliding time window. Segment the wind turbine operating power data using sliding windows. Assume that the wind farm contains $n$ turbines, and that the operating power of turbine $i$ in time period $t$ is recorded as $p_i(t)$. Utilizing a sliding time window of width $w$, the data matrix $X$ is constructed:

$$X = \begin{bmatrix} p_1(t-w+1) & p_2(t-w+1) & \cdots & p_n(t-w+1) \\ \vdots & \vdots & \ddots & \vdots \\ p_1(t) & p_2(t) & \cdots & p_n(t) \end{bmatrix} \tag{4}$$

where $w$ denotes the sliding window width and $n$ is the number of turbines.
Step 2: Calculate the covariance matrix. Subtract the mean power of each turbine from the corresponding column of the wind power data matrix $X$ to obtain the centered matrix $\tilde{X}$:

$$\tilde{X} = X - \bar{X} \tag{5}$$

Using the centered matrix $\tilde{X}$, the covariance matrix $C$ is calculated:

$$C = \frac{1}{w-1} \tilde{X}^{\mathsf{T}} \tilde{X} \tag{6}$$

Step 3: Eigenvalue decomposition and principal component extraction. Eigenvalue decomposition is performed on the covariance matrix $C$ to obtain the eigenvalues $\lambda_i$ and the corresponding eigenvectors $v_i$:

$$C v_i = \lambda_i v_i \tag{7}$$

The eigenvalues are ranked in descending order $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, and the first $k$ principal components whose cumulative eigenvalue contribution reaches 90% are retained:

$$\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{n} \lambda_i} \ge 90\% \tag{8}$$

Step 4: Dimensionality-reduction representation. Based on the selected $k$ eigenvalues and their corresponding eigenvectors, construct the projection matrix $V_k = [v_1, v_2, \ldots, v_k]$. The original data $X$ is finally converted into the dimensionality-reduced feature representation:

$$Y = \tilde{X} V_k \tag{9}$$
The dimensionality-reduced wind power data matrix retains most of the operational features while significantly lowering the dimensionality, providing optimized inputs for subsequent DBSCAN clustering. With PCA, the feature dimensionality and information redundancy are markedly reduced while the key characteristics are retained, which speeds up the clustering computation and enhances the interpretability of the clustering results.
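The four PCA steps above can be sketched with stdlib Python for the two-dimensional case, where the eigenvalues of the 2 × 2 covariance matrix have a closed form and stand in for a full eigendecomposition; the function name, the test data, and the 90% threshold handling are illustrative, not the paper's implementation:

```python
import math

def pca_2d(data, var_threshold=0.90):
    """Minimal 2-D PCA: center the data, build the covariance matrix,
    solve its eigenvalues in closed form, and report how many principal
    components are needed to reach the cumulative-variance threshold."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    centered = [(x - mx, y - my) for x, y in data]
    # Covariance matrix entries (unbiased estimate).
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    # Eigenvalues of [[cxx, cxy], [cxy, cyy]] in closed form.
    tr, det = cxx + cyy, cxx * cyy - cxy * cxy
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    lam1, lam2 = tr / 2 + disc, tr / 2 - disc   # lam1 >= lam2
    # Number of components whose cumulative variance reaches the threshold.
    k = 1 if lam1 / (lam1 + lam2) >= var_threshold else 2
    return lam1, lam2, k

# Strongly elongated cloud: one component should explain >= 90% of the variance.
pts = [(t, 0.1 * ((-1) ** i)) for i, t in enumerate(range(-10, 11))]
lam1, lam2, k = pca_2d(pts)
```

For the real $w \times n$ power matrix one would use a full eigensolver, but the retention logic (sort eigenvalues, keep components until 90% cumulative contribution) is exactly the rule described above.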
3.3. DBSCAN
DBSCAN is a density-based clustering method that uses density connectivity to quickly identify data clusters of various shapes. In this process, wind power data are first extracted and preprocessed using PCA to obtain feature vectors for clustering. Then, under a given neighborhood radius $\varepsilon$ and density threshold $MinPts$, if a data point $p$ has at least $MinPts$ points within its $\varepsilon$-neighborhood, $p$ is defined as a core object; non-core points may become border points according to their relationship with a core object, otherwise they are regarded as noise points. Finally, clusters are generated by recursively linking data points via the density reachability of core points. In this way, DBSCAN can effectively detect clusters in datasets with complex structures and heterogeneous density. The algorithm’s basic flowchart is presented in
Figure 2.
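The density-reachability procedure can be condensed into a minimal stdlib-Python DBSCAN; `eps` and `min_pts` play the roles of the neighborhood radius and density threshold, the neighborhood count includes the point itself (a common convention, assumed here), and the sample points are illustrative:

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: label each 2-D point with a cluster id (0, 1, ...)
    or -1 for noise, using Euclidean density reachability."""
    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    UNSEEN = None
    labels = [UNSEEN] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:           # not a core point: provisional noise
            labels[i] = -1
            continue
        cluster += 1                      # grow a new cluster from core point i
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:           # noise reachable from a core -> border
                labels[j] = cluster
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:    # j is also a core point: expand through it
                seeds.extend(j_nbrs)
    return labels

# Two dense groups plus one isolated point that is flagged as noise.
pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (20, 20)]
labels = dbscan(pts, eps=0.5, min_pts=3)
```

In the paper's pipeline the inputs would be the PCA feature vectors rather than raw 2-D points, but the core/border/noise logic is the same.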
The performance of DBSCAN clustering is highly dependent on the appropriate selection of two parameters: the neighborhood radius $\varepsilon$ and the density threshold $MinPts$. Traditional manual parameter tuning is often computationally intensive and may not reliably achieve optimal accuracy. In this paper, we enhance DBSCAN by introducing FOA to globally optimize these two key parameters, thereby significantly improving clustering performance. The procedure for FOA-based optimization of DBSCAN is summarized as follows:
Step 1: Parameter initialization. Initialize the fruit fly population size $N$ and the search step $L$, and take the key DBSCAN parameters $\varepsilon$ and $MinPts$ as the optimization variables of each fruit fly individual. Set the starting position $(X_0, Y_0)$, the search-space range, and the maximum number of iterations.
Step 2: Initialize the search direction of the fruit fly individuals. Execute the random search movement according to Equation (10) and calculate the corresponding fitness distribution:

$$X_i = X_0 + L \cdot \mathrm{rand}, \quad Y_i = Y_0 + L \cdot \mathrm{rand} \tag{10}$$

Step 3: Obtain the smell concentration judgment value for the current state. Take the reciprocal of the distance between the current fruit fly position and the origin as the smell concentration judgment value $S_i$:
- (1)
Calculate the Euclidean distance between the current position of the fruit fly and the origin:

$$D_i = \sqrt{X_i^2 + Y_i^2} \tag{11}$$

In Equation (11), $D_i$ is the distance between the current fruit fly position and the origin.
- (2)
Apply the smell concentration formula (proportional to the inverse of the distance):

$$S_i = \frac{1}{D_i} \tag{12}$$

Step 4: Find the current best individual. Calculate the smell concentration $Smell_i$ at the location of the $i$-th fruit fly according to Equation (13), where $F(\cdot)$ is the fitness (clustering score) function, and determine the individual with the highest smell concentration $Smell_{best}$:

$$Smell_i = F(S_i) \tag{13}$$

Step 5: Global search for the optimum. Record the fruit fly with the highest smell concentration found in Step 4 and its position coordinates; the remaining individuals converge toward that site, and the swarm position is updated using Equations (14) and (15):

$$X_{axis} = w \, X_{best}, \quad Y_{axis} = w \, Y_{best} \tag{14}$$

$$w = w_{\max} - (w_{\max} - w_{\min}) \frac{t}{T_{\max}} \tag{15}$$

In Equation (15), $w_{\max}$ and $w_{\min}$ are the maximum and minimum values of the weighting coefficient, respectively, and $T_{\max}$ is the maximum number of iterations.
Step 6: Repeat Steps 3 through 5 until the maximum number of iterations is reached. At that point, the position of the best fruit fly individual gives the optimal parameters.
Through FOA optimization, DBSCAN is able to analyze the wind power clustering characteristics more efficiently and accurately, thus reducing the cost of manual parameter adjustment, improving the clustering effect, and laying the foundation for the subsequent tasks, such as large-scale wind turbine power prediction.
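The iterative loop of Steps 1–6 can be sketched as follows. For brevity, the two DBSCAN parameters are represented as a two-dimensional position, and the clustering score is replaced by a toy quadratic fitness with a known peak; in the paper's method the fitness would instead be a clustering validity score of the resulting DBSCAN partition, and all names here are illustrative:

```python
import random

def foa(fitness, dim=2, pop=20, step=0.5, iters=60, seed=0):
    """Minimal fruit fly optimization: each generation, flies take random
    steps around the swarm location; the best-smelling position found so far
    becomes the new swarm center (greedy convergence)."""
    rng = random.Random(seed)
    center = [rng.uniform(-5, 5) for _ in range(dim)]
    best_pos, best_fit = list(center), fitness(center)
    for _ in range(iters):
        for _ in range(pop):
            cand = [c + step * rng.uniform(-1, 1) for c in center]
            f = fitness(cand)
            if f > best_fit:              # higher smell concentration is better
                best_fit, best_pos = f, list(cand)
        center = list(best_pos)           # swarm converges on the best fly
    return best_pos, best_fit

# Toy stand-in for the clustering score, peaking at (eps, MinPts) = (2, 4).
score = lambda p: -((p[0] - 2) ** 2 + (p[1] - 4) ** 2)
pos, fit = foa(score)
```

Swapping `score` for "run DBSCAN with these parameters and return a validity index" turns this skeleton into the FOA-DBSCAN search described above.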
To objectively validate the clustering performance of DBSCAN optimized by FOA, it is necessary to conduct a quantitative quality assessment of the obtained clustering results. Three widely used internal cluster validity indices are adopted in this study: the SC, the DB index, and the CH index.
- (1).
SC. For each sample $i$, let $a(i)$ denote the average distance between sample $i$ and all other samples within the same cluster (intra-cluster dissimilarity), and let $b(i)$ denote the minimum average distance between sample $i$ and all samples in any other cluster (inter-cluster dissimilarity). The silhouette value of sample $i$ is defined as:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \tag{16}$$

The overall SC for a clustering result is obtained by averaging $s(i)$ over all samples. A larger SC value indicates that samples are closer to their own clusters and farther away from other clusters, and thus corresponds to better clustering quality.
- (2).
DB. Let $k$ be the number of clusters. For cluster $i$, we first compute its intra-cluster dispersion $S_i$, defined as the average distance of the samples in cluster $i$ to its centroid. For any two clusters $i$ and $j$, the similarity measure $R_{ij}$ is given by:

$$R_{ij} = \frac{S_i + S_j}{d_{ij}} \tag{17}$$

where $c_i$ and $c_j$ are the centroids of clusters $i$ and $j$, respectively, and $d_{ij} = \lVert c_i - c_j \rVert$ denotes the center-to-center distance between clusters $i$ and $j$, reflecting the degree of separation between different clusters. The DB index is then defined as:

$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} R_{ij} \tag{18}$$

A smaller DB value indicates that clusters are more compact and better separated from each other.
- (3).
CH Index. Let $n$ be the total number of samples and $k$ be the number of clusters. The CH index is defined as the ratio between the between-cluster dispersion and the within-cluster dispersion:

$$CH = \frac{\mathrm{tr}(B_k)}{\mathrm{tr}(W_k)} \cdot \frac{n - k}{k - 1} \tag{19}$$

where $\mathrm{tr}(B_k)$ denotes the trace of the between-cluster dispersion matrix and $\mathrm{tr}(W_k)$ denotes the trace of the within-cluster dispersion matrix. Intuitively, a larger CH value means that cluster centers are far apart from each other while samples inside each cluster are relatively close to their own center, which corresponds to better clustering performance under this criterion.
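As a concrete check of the first of these indices, a minimal stdlib-Python silhouette computation for a toy two-cluster layout is sketched below; the data, function name, and cluster labels are illustrative:

```python
def silhouette(points, labels):
    """Average silhouette coefficient s(i) = (b(i) - a(i)) / max(a(i), b(i))
    over all samples, using Euclidean distances. a(i) is the mean distance to
    the sample's own cluster; b(i) is the mean distance to the nearest other
    cluster."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = [q for q in clusters[l] if q != p]
        a = sum(dist(p, q) for q in own) / len(own)                 # intra-cluster
        b = min(sum(dist(p, q) for q in clusters[m]) / len(clusters[m])
                for m in clusters if m != l)                        # nearest other cluster
        total += (b - a) / max(a, b)
    return total / len(points)

# Two well-separated pairs: the silhouette should be close to 1.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
sc = silhouette(pts, [0, 0, 1, 1])
```

The DB and CH indices are computed from the same pairwise-distance ingredients (centroids, within- and between-cluster scatter), so a single distance helper serves all three in practice.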
5. Experimental Analysis
To validate the effectiveness of the proposed forecasting method, the model was tested using a wind farm cluster dataset. This dataset comprises measured data from seven onshore wind farms in Inner Mongolia, China. All wind farms are equipped with approximately 2-megawatt horizontal-axis variable-speed wind turbines, primarily located on flat or gently sloping grasslands without complex mountainous or offshore environments. For each wind farm, aggregated active power and corresponding wind speed data were recorded at a 10 min time resolution and synchronized with numerical weather prediction (NWP) data at the same resolution. The dataset spans from 1 March to 30 December 2022, with samples divided into a training set (70%) and a test set (30%).
5.1. SVMD Based Wind Speed Decomposition Results
In order to further explore the detailed features of the wind speed data in the time series, the decomposition is carried out using SVMD, and the results are shown in
Figure 8.
As illustrated in
Figure 8, IMF1 to IMF6 are the sub-series obtained by applying SVMD to the wind speed series from the seven wind farms. Each sub-series captures a distinct characteristic, with IMF1 representing the low-frequency trend and IMF6 reflecting high-frequency localized fluctuations. This technique preserves the wind speed sequence features while suppressing modal aliasing.
5.2. DBSCAN Wind Turbine Clustering Method Based on FOA Optimization
Initially, PCA is applied to reduce the data dimensionality, and the resulting principal components are used as feature vectors. Subsequently, FOA is employed to globally optimize the parameters $\varepsilon$ and $MinPts$ of DBSCAN, thereby constructing an FOA-DBSCAN clustering model for effective clustering of the feature vectors.
5.2.1. Parameter Optimization Process
As can be seen from
Figure 9, the FOA performs best in terms of convergence speed and final score, obtaining a high clustering score within fewer iterations. The PSO algorithm converges second fastest and outperforms the GA. Overall, the FOA surpasses both PSO and GA and ultimately converges to the global optimal solution, which verifies its effectiveness.
5.2.2. Clustering Results
Figure 10 and
Table 1 show that the results of clustering algorithms varied significantly. DBSCAN combined with FOA effectively differentiated the data points in the 3D PCA scatterplot, whereas the K-means and spectral clustering overlapped more (especially between data1 and data2).
As shown in Table 1, FOA-DBSCAN achieves SC, CH, and DB indices of 0.5009, 9.1585, and 0.8888, respectively, outperforming the other approaches and confirming its efficacy. FOA optimizes the DBSCAN parameters to overcome the traditional parameter-setting problem, and the experimental results show that the algorithm is both efficient and accurate.
In summary, a higher SC and CH and a lower DB correspond to better clustering quality, i.e., higher intra-cluster compactness and inter-cluster separation. Therefore, the clustering method with larger SC and CH values and smaller DB values is regarded as providing more favorable clusters for the subsequent forecasting task.
5.2.3. Selection of Representative Turbines Within Clusters
After obtaining the final clustering results, representative turbines are further selected within each cluster to reduce the computational burden of subsequent modeling. Specifically, for each cluster, we compute the cluster-average power output series by averaging the active power of all turbines in that cluster at each time step. Then, for every turbine in the cluster, we calculate the Pearson correlation coefficient between its individual power sequence and the cluster-average sequence. The turbine with the highest correlation coefficient is selected as the representative turbine of the cluster. In this study, one representative turbine is selected per cluster, and the proposed forecasting model is trained on the representative turbines. The cluster-level power prediction is finally obtained by mapping the predictions of the representative turbines back to their corresponding clusters.
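This selection rule can be sketched in a few lines (illustrative Python; the array shape and the synthetic power series are our own assumptions):

```python
import numpy as np

def representative_turbine(powers):
    """powers: (n_turbines, T) array of active-power series for one cluster.
    Returns the index of the turbine whose series has the highest Pearson
    correlation with the cluster-average series, as described in the text."""
    cluster_avg = powers.mean(axis=0)
    corrs = [np.corrcoef(p, cluster_avg)[0, 1] for p in powers]
    return int(np.argmax(corrs))

# Three synthetic turbines sharing one underlying profile; turbine 0
# carries the least noise and so should track the cluster average best
t = np.linspace(0, 4 * np.pi, 200)
base = np.sin(t) + 2
rng = np.random.default_rng(0)
powers = np.stack([
    base + 0.05 * rng.normal(size=200),
    base + 0.5 * rng.normal(size=200),
    base + 0.5 * rng.normal(size=200),
])
rep = representative_turbine(powers)
```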
5.3. SVMD-TCN-BiGRU-MSA Model Prediction Results
The proposed model was implemented on the MATLAB R2023b platform, where training and testing were completed. Key hyperparameters were finely tuned based on a series of preliminary experiments. During training, the Adam optimizer was employed with mean squared error (MSE) as the loss function. The maximum number of training iterations was set to 200, with an initial learning rate of 0.001.
To mitigate overfitting risks in highly volatile, ultra-short-term wind power scenarios, multiple regularization strategies were incorporated during training. Dropout layers were added after both the TCN and BiGRU modules, with a dropout rate of 0.2 for the TCN layer and 0.3 for the fully connected output layer. Additionally, L2 weight regularization (weight decay) with a coefficient of 1 × 10−4 was applied to all trainable parameters to penalize excessively large weights and enhance the model’s generalization capability.
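The effect of the L2 penalty on the training objective can be sketched as follows (illustrative Python, not the MATLAB implementation; only the MSE loss and the 1 × 10−4 coefficient are taken from the text, the toy values are our own):

```python
import numpy as np

def regularized_mse(y_true, y_pred, weights, weight_decay=1e-4):
    """MSE loss plus an L2 penalty on all trainable weights, mirroring
    the weight-decay regularization described in the text."""
    mse = np.mean((y_true - y_pred) ** 2)
    l2 = sum(np.sum(w ** 2) for w in weights)  # sum of squared weights
    return mse + weight_decay * l2

# Toy targets, predictions, and two small weight tensors
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.2])
weights = [np.array([[0.5, -0.5]]), np.array([0.1])]

loss = regularized_mse(y_true, y_pred, weights)
# loss = 0.02 (MSE) + 1e-4 * 0.51 (L2) = 0.020051
```

Because the penalty grows with the squared magnitude of the weights, gradient descent on this loss shrinks large weights toward zero, which is the generalization mechanism the paragraph describes.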
Furthermore, to validate the effectiveness of SVMD-based data decomposition on prediction performance, the decomposed wind power sequence and the original undecomposed sequence are separately input into the TCN-BiGRU-MSA model for comparative analysis.
The prediction results are shown in Figure 11.
To ensure fairness and reproducibility in the comparative experiments, the data preprocessing procedures for all baseline models were kept consistent with those of the proposed model. The RF baseline employed 100 trees, with all other parameters left at the default values of the implementation environment. TCN-BiGRU and TCN-BiLSTM share the same TCN backbone, comprising three residual TCN blocks with a convolution kernel size of 5, 128 channels, and dilation factors of 1, 2, and 4, respectively. Each convolution layer is followed by LayerNorm and ReLU activation, with dropout at a rate of 0.08 applied after every two convolutions within each residual block. The TCN output feeds sequentially into a forward GRU (or LSTM) layer and a backward GRU (or LSTM) layer, each with 64 hidden units; the forward and backward outputs are concatenated along the feature dimension before passing through a self-attention layer and a fully connected regression output layer. The GCN-BiGRU model employs a 2-layer GCN with 64 channels per layer, ReLU activation, and a dropout rate of 0.05; its output likewise connects to forward and backward GRU layers (64 units each), concatenation, a self-attention layer, and a fully connected regression layer. All hyperparameters of the baseline models, including optimizers, loss functions, and training epochs, follow the unified training configuration of the proposed model, and all models are evaluated on identical training and testing datasets to ensure fair and comparable results.
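The causal, dilated convolutions at the heart of the shared TCN backbone can be sketched as follows (illustrative Python; with the kernel size of 5 and dilations 1, 2, and 4 given above, three stacked layers cover a receptive field of 1 + (5 − 1)·(1 + 2 + 4) = 29 time steps; the toy kernel below is our own):

```python
import numpy as np

def causal_dilated_conv(x, kernel, dilation):
    """1-D causal dilated convolution: the output at time t depends only
    on inputs at t, t-d, t-2d, ... Left zero-padding keeps the length."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

# Toy example: kernel [1, 1] with dilation 2 -> out[t] = x[t] + x[t-2]
x = np.arange(6, dtype=float)
out = causal_dilated_conv(x, np.array([1.0, 1.0]), dilation=2)
# out = [0, 1, 2, 4, 6, 8]
```

Stacking such layers with growing dilation is what lets the TCN see multi-scale history without ever leaking future samples into a prediction.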
The figure shows that the prediction curve of the SVMD-based input model is closer to the actual values, indicating a better prediction effect. To further verify the model’s performance, ablation experiments were carried out against TCN-BiGRU, TCN-BiGRU-MSA (GJO), and SVMD-TCN-BiGRU-MSA, and the results are shown in Figure 12.
As can be seen from Figure 12, the proposed model shows the highest agreement between the prediction curve (red) and the true values (black), demonstrating strong predictive ability: it captures the trends of the true signal, especially during highly fluctuating periods. In contrast, the predictions of the TCN-BiGRU-MSA (GJO) model (green), the SVMD-TCN-BiGRU-MSA model (yellow), and the TCN-BiGRU model (purple) are comparatively poor, with data points deviating from the true values.
As shown in Figure 12, Figure 13 and Table 2, all experiments were repeated five times under different random seeds to eliminate the effects of random model initialization and data shuffling; the results in Table 2 are presented as “mean ± standard deviation.” The proposed model demonstrates high overall accuracy and low bias, effectively suppressing prediction errors. Compared to the traditional baselines SVR, BiLSTM, and RF, it reduces MAE by 46.8%, 33.9%, and 21.0%; RMSE by 45.3%, 17.3%, and 19.7%; and MAPE by 58.6%, 27.3%, and 24.8%, respectively. Compared to the deep learning models TCN-BiGRU, TCN-BiLSTM, and GCN-BiGRU, it still achieves stable improvements, reducing MAE by 18.5%, 14.2%, and 10.0%; RMSE by 16.6%, 13.0%, and 9.5%; and MAPE by 17.8%, 12.3%, and 8.3%, respectively. Furthermore, compared to the improved hybrid models TCN-BiGRU-MSA (GJO) and SVMD-TCN-BiGRU-MSA, the proposed model achieves additional reductions of 13.4% and 5.4% in MAE, 11.5% and 5.6% in RMSE, and 10.9% and 5.8% in MAPE, respectively. Among all comparison models, the proposed model attains the minimum values of all three error metrics (MAE, RMSE, and MAPE), demonstrating superior prediction accuracy and stronger generalization capability. It also exhibits the highest R2 value (0.959), indicating excellent overall fitting performance. Although the absolute improvement in each error metric is relatively modest, the enhancement remains consistently stable across the repeated experiments.
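The four evaluation metrics used throughout this section can be computed from their standard definitions as follows (an illustrative Python sketch; the toy arrays are our own):

```python
import numpy as np

def mae(y, yhat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - yhat)))

def rmse(y, yhat):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mape(y, yhat):
    """Mean absolute percentage error (in %); y must be nonzero."""
    return float(np.mean(np.abs((y - yhat) / y)) * 100)

def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1 - ss_res / ss_tot)

y = np.array([2.0, 4.0, 6.0])
yhat = np.array([2.5, 3.5, 6.0])
# mae = 1/3, rmse = sqrt(1/6), mape = 12.5, r2 = 0.9375
```

Note that MAPE requires strictly positive targets, which is why wind power evaluations typically exclude zero-output periods when reporting it.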
6. Conclusions
In this study, a complete solution is formed through innovative data preprocessing techniques, an improved clustering method, and the construction of a hybrid prediction model. These innovations improve prediction accuracy and computational efficiency, provide reliable technical support for grid-connected wind power scheduling, and carry both theoretical significance and practical value. The specific results are mainly reflected in the following three aspects:
To reduce data non-stationarity, the SVMD algorithm is applied to decompose the wind speed signal into six intrinsic mode functions, IMF1 to IMF6, separating the low-frequency trend (IMF1) from the high-frequency fluctuations (IMF6). As a result, the MAE of the model is reduced by 6.4% to 22.9%. In the comparison experiments on seven wind farms, the R2 of the prediction curve fit with SVMD-processed data improves to 0.959, verifying its advantage in suppressing modal aliasing.
The FOA-DBSCAN algorithm dynamically adjusts the neighborhood radius and density threshold via the FOA. Tested on seven wind farms, it converges in only 15 iterations, faster than PSO and the GA, and significantly outperforms these traditional optimization algorithms. The method effectively solves DBSCAN’s parameter-sensitivity problem and lays a reliable data foundation for subsequent power prediction tasks.
The proposed model integrates the advantages of the TCN, BiGRU, and MSA. In the ablation experiments, its MAE and RMSE decrease by more than 21.5%, the MAPE drops to 15.34% relative to the benchmark, and the coefficient of determination R2 reaches 0.959. Through the adaptive parameter tuning of the GJO algorithm, the model converges within 200 epochs, exhibiting higher prediction accuracy and smaller prediction error.
Although the model performs well in wind power prediction, there are still limitations. This study mainly considers key meteorological variables such as wind speed, temperature, and air pressure. However, wind power output is also affected by more complex external factors, including grid scheduling strategies, equipment maintenance conditions, and diverse terrain characteristics. In future work, a multi-source data fusion prediction framework will be developed to integrate meteorological data, SCADA measurements, grid operation information, and static geographic features so as to further improve the adaptability and reliability of the model in practical applications. At the algorithmic level, the model will be continuously refined by incorporating more advanced architectures, such as the M2WLLM algorithm for multi-modal wind power large-model learning, and by introducing adaptive feature selection mechanisms to automatically identify highly informative variables under different operating scenarios. These enhancements are expected to further strengthen the generalization capability of the proposed approach and enhance its engineering application value.