Article

Sustainability-Oriented Ultra-Short-Term Wind Farm Cluster Power Prediction Based on an Improved TCN–BiGRU Hybrid Model

1 College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, China
2 College of New Energy, Inner Mongolia University of Technology, Ordos 010051, China
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(23), 10719; https://doi.org/10.3390/su172310719
Submission received: 29 October 2025 / Revised: 21 November 2025 / Accepted: 27 November 2025 / Published: 30 November 2025

Abstract

With the large-scale integration of wind power into the grid, the accuracy of wind farm cluster power prediction has become a key factor for the sustainability of modern power systems. Reliable ultra-short-term forecasts support the secure dispatch of high-penetration renewable energy, reduce wind curtailment, and improve the low-carbon and economical operation of power systems. To address the significant differences in wind turbine characteristics, this paper proposes a prediction method based on an improved density-based spatial clustering of applications with noise (DBSCAN) and a hybrid deep learning model. First, the wind speed signal is decomposed at multiple scales using successive variational mode decomposition (SVMD) to reduce non-stationarity. Subsequently, the DBSCAN parameters are optimized by the fruit fly optimization algorithm (FOA), and dimensionality reduction is performed by principal component analysis (PCA) to achieve efficient clustering of wind turbines. Next, the representative turbines with the highest correlation are selected in each cluster to reduce computational complexity. Finally, the SVMD-TCN-BiGRU-MSA-GJO hybrid model is constructed, and long-term dependence is extracted using a temporal convolutional network (TCN); the temporal features are captured by bidirectional gated recurrent units (BiGRUs); the feature weights are optimized by a multi-head self-attention mechanism (MSA), and the hyper-parameters are, in turn, optimized by golden jackal optimization (GJO). The experimental results show that this method reduces the MAE, RMSE, and MAPE by 14.02%, 12.9%, and 13.84%, respectively, and improves R2 by 3.9% on average compared with the traditional model, significantly improving prediction accuracy and stability. These improvements enable more accurate scheduling of wind power, lower reserve requirements, and enhanced stability and sustainability of power system operation under high renewable penetration.

1. Introduction

As environmental degradation and global energy constraints worsen, the growth of renewable energy has emerged as a crucial avenue for changing the energy structure [1]. Among renewable sources, wind energy has demonstrated particularly promising development prospects because of its abundant resource reserves, affordability, and other benefits. However, due to the intermittency and uncertainty of wind energy, large-scale grid integration of wind power will bring serious challenges to the stability of the power grid. As a result, precise forecasting of wind farm cluster power not only enhances wind power consumption capacity and grid dispatch security but also contributes to more economical power system operation, which is a crucial step in guaranteeing the power system’s safe operation [2].
The non-stationarity and high volatility of meteorological data, including wind speed, present a problem for wind power prediction and have a direct impact on the model’s prediction accuracy.
Wind turbines within wind farms exhibit significant spatiotemporal correlations in power generation, but traditional forecasting methods, which use equal-weight averaging, overlook this characteristic, resulting in insufficient accuracy. Current mainstream models exhibit three major limitations: they focus predominantly on individual turbine characteristics and lack effective modeling of spatial dependencies among turbine groups; their data preprocessing approaches are limited, failing to address wind speed non-stationarity and multi-scale features; and the subjective selection of clustering optimization parameters renders the models sensitive to noise. To increase the accuracy of power forecasts for wind power clusters, it is now crucial to precisely decompose wind speed sequences, group wind turbines scientifically, and optimize model architectures.
To this end, this paper employs SVMD, TCN, MSA, BiGRU, and GJO while implementing targeted enhancements and tightly coupled designs for each module. During the decomposition preprocessing stage, parameters such as the number of SVMD layers, penalty factors, and bandwidth constraints are first reasonably set based on prior experience and statistical analysis of the wind speed and power sequences, enabling multiscale decomposition of the original non-stationary signals. Subsequently, FOA-DBSCAN is applied to cluster and segment the decomposed wind speed and power samples, mitigating the impact of sample distribution variations and outliers on model training. Each SVMD modality is then fed as an independent feature channel into the subsequent prediction network. At the output stage, a trainable reconstruction layer is introduced, enabling the model to autonomously learn the nonlinear contribution weights of different modalities to the overall power. This transforms the traditional fixed reconstruction method of “simple linear superposition after decomposition” into a “data-driven adaptive fusion oriented toward multiscale information.” During the deep spatio-temporal modeling phase, this paper employs TCN as the front-end temporal feature encoder, focusing on extracting multi-scale temporal features from each modality sequence via causal and dilated convolutions. Building upon this, the MSA is introduced, transcending single-head attention confined to the temporal dimension. This mechanism simultaneously encodes temporal and unit information within the TCN feature space, leveraging multi-head attention to characterize correlations across different temporal scales and spatial regions, explicitly modeling coupling relationships between turbines and across different time points within the same turbine. Its structural position between TCN and BiGRU effectively mitigates the interference of raw noise on attention weight learning.
Subsequently, the high-order features processed by TCN and MSA are fed into BiGRU, which functions as a sequence memory module at the high-level semantic stage. During training, it dynamically models the relatively smooth and abstract features by simultaneously utilizing both forward and backward temporal information. This approach reduces the pressure of directly handling strongly non-stationary and highly noisy sequences while mitigating gradient vanishing and enhancing training stability. Throughout this process, the GJO algorithm serves as a global hyperparameter optimization tool exclusively for the TCN-BiGRU-MSA model. It searches over hidden layer size, network depth, convolution kernel width, stride, learning rate, and batch size and, balancing final prediction error against model complexity as a composite objective, automatically identifies the optimal network configuration, achieving synergistic optimization of the deep network architecture and the training strategy.
In light of this, this paper proposes an ultra-short-term power prediction method for wind power clusters based on SVMD and TCN-MSA-BiGRU-GJO. This method uses the GJO algorithm, successive variational mode decomposition, and a spatio-temporal attention mechanism to improve prediction accuracy and lessen the effect of power uncertainty on grid stability. Simulation results demonstrate that the model performs exceptionally well in handling power sequence volatility and complex dependency relationships, with prediction accuracy significantly outperforming existing methods, thereby validating its advanced nature and feasibility.
This paper’s primary contributions to the aforementioned problems are as follows:
(1) The multi-scale decomposition of wind speed data is accomplished using the SVMD method with a stacking strategy, which minimizes data non-stationarity, precisely captures signal features, and supplies the model with high-quality input.
(2) An improved FOA-DBSCAN density clustering method is proposed, in which the neighborhood radius and density threshold are optimized to address the challenge of parameter selection, thereby significantly enhancing the accuracy and robustness of clustering.
(3) A hybrid SVMD-TCN-MSA-BiGRU-GJO prediction model is developed by integrating TCN, BiGRU, and MSA, greatly enhancing the ability to capture temporal dynamics in wind power sequences.
The remainder of this study is organized as follows: Section 2 reviews related work on signal decomposition, clustering, and wind power prediction. Section 3 provides a detailed explanation of the theoretical foundation of the data-processing stage of the proposed prediction method. Section 4 describes the basic structure of the proposed algorithmic model and its evaluation metrics. The subsequent sections present the experimental data and analysis results, followed by an in-depth discussion of the conclusions drawn and, finally, a summary and outlook for the study. This paper proposes a hybrid prediction model based on FOA-DBSCAN and SVMD-TCN-MSA-BiGRU-GJO, with the overall model framework shown in Figure 1.

2. Related Work Discussion

Because the non-stationarity and high volatility of meteorological data, including wind speed, directly affect a model’s prediction accuracy, scholars at home and abroad have proposed a variety of signal decomposition techniques to preprocess the raw wind speed data and improve the performance of the prediction model. Common signal decomposition techniques include Empirical Mode Decomposition (EMD) [3], Variational Mode Decomposition (VMD), Ensemble Empirical Mode Decomposition (EEMD), and Complementary Ensemble Empirical Mode Decomposition (CEEMD) [4]. These methods can effectively reduce the non-stationarity of the data and extract the implied laws by decomposing the complex signal into several components with different frequency characteristics. However, although these methods are able to adaptively decompose non-stationary signals, they are prone to problems such as modal aliasing, end-point effects, and increased computational complexity, which affect the accuracy of the decomposition.
The SVMD method proposed in recent studies can separate different frequency features of input data layer by layer and significantly improve the learning ability of the subsequent model, which provides ideas for solving the above problems [5]. Combining SVMD with deep learning models and sophisticated optimization algorithms has emerged as a key area of research and development for enhancing the preprocessing and forecast accuracy of meteorological data, including wind speed. It also offers theoretical and methodological support for the prediction of wind farm clusters’ ultra-short-term power [6]. When predicting the power of large wind power clusters, it is necessary to strike a balance between accuracy and efficiency. If each wind turbine is predicted separately, although a high prediction accuracy can be ensured, the computational and time costs are high and the overall efficiency is low. Although estimating the power of the entire wind cluster just from the output of a single turbine can greatly increase efficiency, it is frequently challenging to guarantee prediction accuracy. Wind turbines with similar features can be grouped using a clustering technique, and each group can then be predicted independently to increase prediction accuracy while accounting for efficiency. The study of clustering algorithms has advanced quickly in the last several years, and many methods exhibit varying clustering performance in various contexts [7].
The Silhouette Coefficient (SC), the Calinski–Harabasz (CH) index, the Davies–Bouldin (DB) index, and other frequently used clustering performance metrics are used to scientifically assess the clustering effect. SC serves as a measure of clustering quality for individual sample points, reflecting both their similarity to other points within the same cluster and their dissimilarity to those in different clusters. Its values range from −1 to 1, where values closer to 1 represent better clustering performance. By comparing the proximity of samples within a cluster with the separation between clusters, the CH index assesses the clustering effect; the higher the value, the stronger the clustering effect. The DB index compares each cluster’s average similarity to its most similar cluster; the smaller the value, the closer the samples are to their cluster, the more clearly the clusters are separated, and the better the clustering effect. These indices provide an objective basis for comparing the advantages and disadvantages of different clustering methods and optimizing their parameters. Commonly used clustering methods include Fuzzy C-Means (FCM) [8], K-means, spectral clustering, the Gaussian Mixture Model (GMM), Affinity Propagation (AP), and the DBSCAN clustering algorithm. FCM allows data points to belong to multiple clusters at the same time, which is highly adaptive but sensitive to the choice of initial cluster centers. The stability of the clustering effect may be impacted by variations in initialization techniques, which can result in different clustering outcomes [9]. K-means, a traditional hard clustering technique, allocates data by iteratively optimizing the clusters’ centers of mass. The approach is better suited to handling data with a spherical distribution, but it is susceptible to the initial center-of-mass selection. K-means is also less resilient to outliers and noise [10,11].
Despite its ability to cluster data well, spectral clustering is a graph-theoretic clustering method that is computationally complex, inefficient when working with large datasets, and highly susceptible to the choice of similarity matrix [12]. GMM can better fit complex data distributions by using a mixture of Gaussian distributions for clustering. In practice, it is frequently challenging for a dataset to fully satisfy the method’s assumption that the data follow a Gaussian distribution, which can result in subpar clustering outcomes [13,14]. The AP method offers some flexibility because it does not require the number of clusters to be predetermined. Nevertheless, its clustering effect is exceptionally sensitive to the parameter settings, and its computation procedure is quite complex [15].
DBSCAN is more robust in modeling the multidimensional meteorological characteristics and spatio-temporal correlation of wind power clusters than the aforementioned clustering methods, because it does not impose any restrictions on cluster geometry and can accurately capture the grouping patterns of units with irregular spatial correlation characteristics in wind farm clusters [16]. Large wind clusters can be clustered using DBSCAN to group turbines with comparable operating characteristics into a single group, which provides an effective grouping technique for power prediction because neighboring turbines’ power operating states are typically similar. DBSCAN is effective at handling dense data with arbitrary shapes and at extracting noisy data; however, it depends on two important parameters, the neighborhood radius and the minimum density threshold, which greatly affect the clustering results. Inappropriate parameter selection can result in over- or under-clustering, which restricts its use in high-dimensional complex scenarios, so it is essential to optimize these parameters to enhance performance [17].
To fill this gap, the FOA was introduced to dynamically optimize the DBSCAN parameters. FOA is a population intelligence optimization algorithm proposed by Pan et al. [18], which simulates the foraging behavior of fruit flies and solves the optimization problem with a simple structure, efficient searching capability, and fast convergence properties. In wind cluster power prediction, the grouping strategy and prediction performance can be improved by the FOA optimization parameters [19].
Since different prediction methods have their own advantages in coping with data characteristics and practical application scenarios, wind power prediction requires targeted modeling. Wind power prediction methods mainly include physical methods, statistical methods, and artificial intelligence based methods [20,21]. Complex physical models that accurately simulate wind turbine features and meteorological factors (such as wind speed, wind direction, and air density) are typically the foundation of physical approaches. Typical tools and algorithms that can predict wind power directly from physical principles include analytical techniques based on wind turbine power characteristic curves and Computational Fluid Dynamics (CFD) models [22]. Physical approaches, however, are ineffective in the face of complex and quickly changing meteorological conditions because they rely heavily on the consistency of the environmental conditions and the accuracy of the modeling parameters. They also perform poorly in ultra-short-term forecasts.
Statistical methods, on the other hand, focus on learning the intrinsic relationships between variables from historical observations and making predictions by modeling functions between inputs and outputs. Among the most popular statistical techniques are support vector regression (SVR), autoregressive integrated moving average (ARIMA), moving average (MA), and autoregressive model (AR). Among these, SVR can partially capture nonlinear relationships because of the flexibility of its kernel function [23], while ARIMA shows great prediction accuracy when working with smooth time series data and excels at identifying trends and periodicity in data [24]. However, wind power data tends to be significantly non-stationary and highly volatile, which leads to the limited fitting ability of statistical models when dealing with complex time series data.
In recent years, artificial intelligence based methods have demonstrated excellent performance in wind power prediction [25]. These methods are capable of extracting nonlinear features and uncovering hidden patterns within the data, owing to their powerful learning abilities. They are particularly well suited for dealing with highly volatile and non-stationary time series data. Random Forest (RF), Graph Convolutional Network (GCN), Gated Recurrent Unit (GRU), and Long Short-Term Memory (LSTM) are examples of common artificial intelligence algorithms [26]. Due to its memory function and time dependent modeling capability, LSTM has emerged as a popular option for both short-term and ultra-short-term forecasting, along with other neural network variations and hybrid neural networks [27]. The RF algorithm enhances wind power forecasting by constructing an ensemble of decision trees, which integrates individual prediction results and automatically selects and combines relevant features from both multidimensional meteorological data and historical power records. This methodology significantly improves both the predictive accuracy and robustness against fluctuations in wind power output [28]. GRU can effectively capture the long-term dependencies in meteorological time series—such as wind speed and direction—for wind power forecasting. Its gating mechanism simplifies model parameters and enhances computational efficiency, thereby enabling efficient modeling and accurate prediction of power output trends [29]. By using the graph structure to model the spatial topology of wind farms and the node feature propagation and aggregation mechanism, GCN is able to better characterize the dynamic properties of complex wind farms by capturing the spatial dependence between wind turbine generators (WTGs) [30]. Improved deep learning models such as BiGRU and TCN further enhance the ability to capture long-term dependencies and spatio-temporal correlations in time series.
Deep learning models that incorporate the attention mechanism can also dynamically trade off the effects of each input feature at different time steps, which greatly improves the accuracy of prediction [31]. Hang Fan et al. proposed M2WLLM, which uses a large language model to fuse textual prompts and temporal numerical data for ultra-short-term wind power forecasting [32]. Poonam Dhaka, Mini Sreejeth, and M. M. Tripathi proposed a hybrid ensemble system that combines multivariate signal decomposition, stacked GRU networks, and a Bagging-Boosting error-correction scheme for short-term wind power forecasting [33]. Erlong Zhao, Shaolong Sun, and Shouyang Wang proposed a scientometric and review-based framework that classifies and analyzes big-data and AI methods for wind energy forecasting over the past two decades [34]. Overall, these studies demonstrate that integrating signal decomposition, advanced clustering and optimization strategies, and intelligent forecasting models has become an important trend for improving the accuracy and robustness of wind power prediction.
Wind power forecasting is categorized into three time scales: ultra-short-term (minutes/hours), short-term (hours/days), and long-term (months/years). Among these, ultra-short-term forecasting is of critical importance in modern wind power dispatch [35], providing essential support for real-time wind farm operations and grid stability [36]. Despite challenges posed by highly dynamic data and nonlinearity, AI technologies such as TCN and LSTM have significantly improved prediction accuracy and response speed by capturing spatio-temporal features [37]. In contrast, short-term forecasting often suffers from weak generalization due to the need to model turbine distribution and sudden changes in wind speed; long-term forecasting, meanwhile, faces limited accuracy due to the extended timeframe and high uncertainty of resources.

3. Data Feature Construction and Preprocessing

3.1. SVMD

A successive (stacked) variational mode decomposition technique called SVMD was proposed in 2019 [38]. It is based on conventional VMD and uses a stacked decomposition strategy with the goal of more precisely breaking signals down into finer submodes in order to extract more significant feature information. The main concept of SVMD is to use VMD with additional constraints to decompose complex signals, increasing the flexibility and stability of deep decomposition and information extraction. The specific optimization steps are as follows:
Step 1: Objective function construction. SVMD adopts a stacking mechanism in the decomposition process to decompose the original signal $f(t)$ into a combination of multiple components (i.e., a set of components $\{u_k(t)\}_{k=1}^{K}$) and the corresponding set of decoupled center frequencies $\{\omega_k\}_{k=1}^{K}$. The objective equation is as follows:
$$f(t)=\sum_{k=1}^{K}u_k(t),\qquad u_k(t)\in\{u_k(t)\}_{k=1}^{K},\quad \omega_k\in\{\omega_k\}_{k=1}^{K}\tag{1}$$
The goal of SVMD is to use incremental variational decomposition and loss-constrained optimization to create the set of components that meet the optimal decomposition condition in order to confine the solution. Its optimization objective can be stated as follows:
$$J_1=\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)\ast u_k(t)\right]e^{-j\omega_k t}\right\|_2^2\tag{2}$$
In Equation (2), $J_1$ denotes the variational objective function, $\partial_t$ denotes the bandwidth measurement operator, $\omega_k$ denotes the center frequency, $u_k(t)$ is the decomposed submodal sequence, and $\delta(t)$ is the unit impulse function.
Step 2: Construction of the Lagrange optimization problem. In order to merge the constraint $\sum_{k=1}^{K}u_k(t)=f(t)$, SVMD introduces a Lagrange multiplier $\lambda(t)$ and defines the penalty-term coefficient $\rho$ to enhance the convergence of the constraints. The Lagrange optimization problem is as follows:
$$J_2=\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)\ast u_k(t)\right]e^{-j\omega_k t}\right\|_2^2+\rho\left\|\sum_{k=1}^{K}u_k(t)-f(t)\right\|_2^2\tag{3}$$
In Equation (3), $\rho$ is the penalty-term coefficient used to enhance the reconstruction accuracy of the components. The decomposition of the wind speed series into multiple modes reduces the noise effect of wind speed, and the resulting modes, combined with other meteorological data, serve as inputs to the prediction model.
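The layer-by-layer extraction idea behind SVMD can be illustrated with a deliberately simplified sketch. Instead of solving the variational problem of Equations (2) and (3), the sketch below peels one "mode" at a time off the residual using a Gaussian band-pass filter centered on the dominant frequency; the function names, the synthetic two-tone signal, and the Gaussian-filter shortcut are illustrative assumptions, not the actual SVMD optimization.

```python
import numpy as np

def extract_dominant_mode(signal, fs, bandwidth=1.0):
    """Extract the narrowband component around the dominant frequency via
    a Gaussian window in the frequency domain (a simplified stand-in for
    one successive extraction step, not the variational optimization)."""
    n = len(signal)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Dominant (non-DC) frequency of the current residual
    peak = freqs[1 + np.argmax(np.abs(spectrum[1:]))]
    window = np.exp(-((freqs - peak) ** 2) / (2 * bandwidth ** 2))
    mode = np.fft.irfft(spectrum * window, n=n)
    return mode, peak

def successive_decompose(signal, fs, n_modes=3, bandwidth=1.0):
    """Peel modes off the residual one by one, mimicking the successive
    (stacked) strategy of SVMD."""
    residual = np.asarray(signal, dtype=float).copy()
    modes, centers = [], []
    for _ in range(n_modes):
        mode, peak = extract_dominant_mode(residual, fs, bandwidth)
        modes.append(mode)
        centers.append(peak)
        residual -= mode
    return np.array(modes), centers, residual

# Synthetic signal with two oscillatory components (2 Hz and 9 Hz)
fs = 64.0
t = np.arange(0, 8, 1 / fs)
x = 2.0 * np.sin(2 * np.pi * 2 * t) + 1.0 * np.sin(2 * np.pi * 9 * t)
modes, centers, res = successive_decompose(x, fs, n_modes=2)
```

By construction the extracted modes plus the final residual reconstruct the input exactly, which is the property the trainable reconstruction layer described in the Introduction relaxes into a learned fusion.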

3.2. PCA

In this study, PCA [39] is employed to reduce data dimensionality and extract key features. As a widely adopted method, PCA utilizes linear transformations to minimize information loss and retain essential information [40]. The dimensionality reduction phases for PCA are as follows:
Step 1: Constructing the sliding-time-window data matrix. Segment the wind turbine operating power data using sliding windows. Assume that the wind farm contains $N$ turbines, and the operating power of each turbine is recorded as $x=[x_1,x_2,\ldots,x_T]^{T}$ over the time period $t=1,2,\ldots,T$. Using the sliding time window, the data matrix $X$ is constructed:
$$X=\begin{bmatrix}x_{1,1}&x_{1,2}&\cdots&x_{1,W}\\x_{2,1}&x_{2,2}&\cdots&x_{2,W}\\\vdots&\vdots&\ddots&\vdots\\x_{N,1}&x_{N,2}&\cdots&x_{N,W}\end{bmatrix}\tag{4}$$
where $W$ denotes the sliding window width and $N$ is the number of wind turbines.
Step 2: Calculate the covariance matrix. Subtract the mean from each row of the wind power data matrix $X$ to obtain the centered matrix $\bar{X}$:
$$\bar{X}=X-\frac{1}{W}X\mathbf{1}\mathbf{1}^{T}\tag{5}$$
Using the centered matrix $\bar{X}$, the covariance matrix $R$ is calculated:
$$R=\frac{1}{N-W+1}\bar{X}^{T}\bar{X}\tag{6}$$
Step 3: Eigenvalue decomposition and principal component extraction. Eigenvalue decomposition is performed on the covariance matrix $R$ to obtain the eigenvalues $\lambda_i$ and the corresponding eigenvectors $v_i$:
$$Rv_i=\lambda_i v_i,\qquad i=1,2,\ldots,N\tag{7}$$
The eigenvalues are ranked in descending order, $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_N$, and the first $K$ principal components whose cumulative contribution reaches 90% of the total eigenvalue sum are retained:
$$K=\arg\min_{K}\left\{\frac{\sum_{i=1}^{K}\lambda_i}{\sum_{j=1}^{N}\lambda_j}\ge 90\%\right\}\tag{8}$$
Step 4: Dimensionality reduction representation. Based on the selected $K$ eigenvalues and their corresponding eigenvectors, construct the dimensionality-reduction matrix $V=[v_1,v_2,\ldots,v_K]$. The original data $X$ is finally converted into the dimensionality-reduced feature representation:
$$Y=V^{T}\bar{X}\tag{9}$$
The downscaled wind power data matrix retains most of the operational features and significantly reduces the dimensionality, providing optimized inputs for subsequent DBSCAN clustering. With PCA, the feature dimensionality is significantly reduced, the information redundancy is reduced, and key characteristics are retained. This approach improves the speed of clustering calculation and enhances the interpretability of clustering results.
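As a rough illustration of Steps 1–4, the following numpy sketch builds a turbine-by-window matrix from synthetic power data, centers it, eigendecomposes the covariance, and keeps the smallest $K$ reaching 90% cumulative contribution. The data are invented, and the sketch uses the standard $1/(N-1)$ covariance scaling and a common column-centering convention for simplicity; a positive scaling constant changes neither the eigenvectors nor the contribution ratios.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: N turbine power segments, each of window width W (Step 1).
N, W = 40, 16
base = np.sin(np.linspace(0, 4 * np.pi, W))            # shared wind pattern
X = np.array([a * base + 0.05 * rng.standard_normal(W)
              for a in np.linspace(0.5, 2.0, N)])

# Step 2: center the window features and form their covariance matrix.
X_bar = X - X.mean(axis=0, keepdims=True)
R = X_bar.T @ X_bar / (N - 1)

# Step 3: eigendecomposition, sort descending, keep smallest K reaching 90%.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
ratio = np.cumsum(eigvals) / eigvals.sum()
K = int(np.argmax(ratio >= 0.90) + 1)

# Step 4: project each turbine segment onto the leading K eigenvectors.
Y = X_bar @ eigvecs[:, :K]
```

Because the synthetic turbines are scaled copies of one shared pattern plus small noise, almost all variance falls on a single principal direction, so the reduced representation `Y` is far narrower than the original window width.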

3.3. DBSCAN

DBSCAN is a density-based clustering method that uses density connectivity to quickly identify data clusters of various shapes. In this process, wind power data are first extracted and preprocessed using PCA to obtain feature vectors for clustering. Then, under a given neighborhood radius $\varepsilon$ and density threshold $p_{\min}$, if a data point $x$ has at least $p_{\min}$ points in its $\varepsilon$-neighborhood $N_\varepsilon(x)$, $x$ is defined as a core object; non-core points may become boundary points according to their relationship with core objects, and otherwise they are regarded as noise points. Finally, clusters are generated by recursively linking data points via the density reachability of core points. In this way, DBSCAN can effectively detect clusters in datasets with complex structures and heterogeneous density. The algorithm’s basic flowchart is presented in Figure 2.
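The core/border/noise distinction above can be made concrete with a small numpy sketch; the function name and the toy coordinates are illustrative, and the sketch only classifies points rather than performing the full density-reachability cluster expansion.

```python
import numpy as np

def classify_points(points, eps, p_min):
    """Label each point 'core', 'border', or 'noise' under DBSCAN's
    definitions: a core point has at least p_min points (itself included)
    within radius eps; a border point is not core but falls inside some
    core point's eps-neighborhood; everything else is noise."""
    pts = np.asarray(points, dtype=float)
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    neighbors = dists <= eps
    core = neighbors.sum(axis=1) >= p_min
    border = ~core & (neighbors & core[None, :]).any(axis=1)
    return np.where(core, "core", np.where(border, "border", "noise"))

# Five tightly packed points, one reachable edge point, one far outlier
pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
       (0.25, 0), (5, 5)]
labels = classify_points(pts, eps=0.2, p_min=4)
```

The edge point at (0.25, 0) has too few neighbors to be core, but it lies within the radius of a core point, so it is absorbed into the cluster as a border point, while the distant point is flagged as noise.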
The performance of DBSCAN clustering is highly dependent on the appropriate selection of two parameters: the neighborhood radius $\varepsilon$ and the density threshold $p_{\min}$. Traditional manual parameter tuning is often computationally intensive and may not reliably achieve optimal accuracy. In this paper, we enhance DBSCAN by introducing FOA to globally optimize its two key parameters, thereby significantly improving clustering performance. The procedure for FOA-based optimization of DBSCAN is summarized as follows:
Step 1: Parameter initialization. Initialize the fruit fly population size $sizepop$, the search step $\Delta R$, and the key DBSCAN parameters $\varepsilon$ and $p_{\min}$ as the optimization variables of the fruit fly individuals. Set the starting position $(X_{\text{axis}},Y_{\text{axis}})$, the search space range $FR$, and the maximum number of iterations $max\_iter$.
Step 2: Initialize the search direction of the fruit fly individuals. Execute the fruit fly search movement according to Equation (10) and calculate the corresponding fitness distribution.
$$X_i=X_{\text{axis}}+\Delta R,\qquad Y_i=Y_{\text{axis}}+\Delta R\tag{10}$$
Step 3: Obtain the smell concentration determination value for the current state. Take the reciprocal of the distance between the current fruit fly position and the origin as the smell concentration determination value $S_i$:
(1) Calculate the Euclidean distance between the current position of the fruit fly and the origin:
$$Dist_i=\sqrt{X_i^2+Y_i^2}\tag{11}$$
In Equation (11), $Dist_i$ is the distance between the current fruit fly position and the origin.
(2) Apply the smell concentration formula (proportional to the inverse of the distance):
$$S_i=1/Dist_i\tag{12}$$
Step 4: Find the current best individual. Calculate the smell concentration at the location of the $i$th fruit fly individual according to Equation (13), denoted $C_i$, and then determine the individual with the highest smell concentration, $C_{best}$.
$$C_i=f(S_i),\qquad [C_{best},\,bestIndex]=\max(C)\tag{13}$$
Step 5: Global search for the optimum. The fruit fly with the highest smell concentration and its position coordinates, found in Step 4, are recorded; the remaining individuals converge toward that site, and the position data are updated using Equations (14) and (15):
$$X_i=wX_i+X_{\text{axis}}+\Delta R,\qquad Y_i=wY_i+Y_{\text{axis}}+\Delta R\tag{14}$$
$$w=w_s-(w_s-w_e)\times it/max\_it\tag{15}$$
In Equation (15), $w_s$ and $w_e$ are the maximum and minimum values of the weighting coefficient, respectively, $it$ is the current iteration number, and $max\_it$ is the maximum number of iterations.
Step 6: Repeat Steps 3 through 5 until the maximum number of iterations is reached. At that point, the position data of the best fruit fly individual give the optimal parameters.
Through FOA optimization, DBSCAN is able to analyze the wind power clustering characteristics more efficiently and accurately, thus reducing the cost of manual parameter adjustment, improving the clustering effect, and laying the foundation for the subsequent tasks, such as large-scale wind turbine power prediction.
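The six steps above can be sketched as a minimal numpy loop. For clarity, the fitness here is a hypothetical one-dimensional cost standing in for a clustering-quality objective (in the actual method it would be a DBSCAN validity score evaluated at the decoded parameters), and this sketch minimizes a cost rather than maximizing a concentration; everything else about the data and function names is illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def foa_minimize(cost, sizepop=30, max_it=100, step=1.0):
    """Minimal fruit fly optimization loop mirroring Steps 1-6: random
    smell-based search around the swarm location (Eq. 10), distance and
    smell concentration (Eqs. 11-12), fitness evaluation (Eq. 13), and
    swarm relocation to the best individual found so far."""
    x_axis, y_axis = rng.uniform(0.0, 10.0, size=2)       # Step 1: init swarm
    best_s, best_c = None, np.inf
    for _ in range(max_it):
        xs = x_axis + step * rng.uniform(-1, 1, sizepop)  # Step 2: search moves
        ys = y_axis + step * rng.uniform(-1, 1, sizepop)
        dist = np.sqrt(xs ** 2 + ys ** 2)                 # Step 3, Eq. (11)
        s = 1.0 / dist                                    # Eq. (12)
        c = cost(s)                                       # Step 4: fitness
        i = int(np.argmin(c))
        if c[i] < best_c:                                 # Step 5: relocate swarm
            best_c, best_s = c[i], s[i]
            x_axis, y_axis = xs[i], ys[i]
    return best_s, best_c                                 # Step 6: best found

# Hypothetical stand-in for a clustering-quality objective whose optimum
# is at s = 0.2 (imagine s decoded into, e.g., a candidate eps value).
s_star, c_star = foa_minimize(lambda s: (s - 0.2) ** 2)
```

The characteristic FOA trick is visible in the loop: candidate solutions are never stored directly but are decoded from the swarm's spatial position through the distance and its reciprocal.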
To objectively validate the clustering performance of DBSCAN optimized by FOA, it is necessary to conduct a quantitative quality assessment of the obtained clustering results. Three widely used internal cluster validity indices are adopted in this study: the SC, the DB index, and the CH index.
(1) SC. For each sample $s$, let $a(s)$ denote the average distance between sample $s$ and all other samples within the same cluster (intra-cluster dissimilarity), and let $b(s)$ denote the minimum average distance between sample $s$ and all samples in any other cluster (inter-cluster dissimilarity). The silhouette value of sample $s$ is defined as:
$$S(s)=\frac{b(s)-a(s)}{\max\{a(s),b(s)\}},\qquad S(s)\in[-1,1]\tag{16}$$
The overall SC for a clustering result is obtained by averaging $S(s)$ over all samples. A larger SC value indicates that samples are closer to their own clusters and farther away from other clusters, and thus corresponds to better clustering quality.
(2) DB. Let $k$ be the number of clusters. For cluster $c$, we first compute its intra-cluster dispersion $C_c$, defined as the average distance of the samples in cluster $c$ to its centroid. For any two clusters $c$ and $d$, the similarity measure $R_{cd}$ is given by:
R c d = C c + C d | | w c w d | | 2
where w c and w d are the centroids of clusters c and d , respectively. w c w d 2 denotes the center-to-center distance between each pair of clusters c and d , reflecting the degree of separation between different clusters. The D B is then defined as:
D B = 1 k c = 1 k max d c R c d
A smaller DB value indicates that clusters are more compact and better separated from each other.
(3).
CH Index. Let n be the total number of samples and k be the number of clusters. The CH index is defined as the ratio between the between-cluster dispersion and the within-cluster dispersion:
$$CH = \frac{\mathrm{Tr}(B_k)/(k-1)}{\mathrm{Tr}(W_k)/(n-k)} \tag{19}$$
where Tr ( B k ) denotes the trace of the between-cluster dispersion matrix, and Tr ( W k ) denotes the trace of the within-cluster dispersion matrix. Intuitively, a larger CH value means that cluster centers are far apart from each other while samples inside each cluster are relatively close to their own center, which corresponds to better clustering performance under this criterion.
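Given a set of cluster labels, all three indices can be computed directly. The sketch below uses scikit-learn's implementations on a toy two-cluster example; the data and labels are illustrative, not taken from the paper's wind farm dataset.

```python
import numpy as np
from sklearn.metrics import (silhouette_score, davies_bouldin_score,
                             calinski_harabasz_score)

# Illustrative data: two compact, well-separated clusters in 2-D feature space
X = np.array([[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
              [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

sc = silhouette_score(X, labels)         # larger is better, in [-1, 1]
db = davies_bouldin_score(X, labels)     # smaller is better
ch = calinski_harabasz_score(X, labels)  # larger is better
print(sc > 0.9, db < 0.1, ch > 100)      # → True True True
```

Because the two clusters are tight and far apart, SC is close to 1, DB is close to 0, and CH is very large, matching the interpretations given above.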

4. Algorithmic Models

4.1. TCN

TCN, proposed by Bai et al. [41], is a neural network architecture designed for time series data. By integrating causal and dilated convolutions, TCN effectively addresses the limitations of traditional RNNs and captures long-term dependencies more efficiently. This structure significantly expands the receptive field, enabling the model to efficiently extract information from long time series. TCN not only excels in modeling long-term dependencies but also eliminates the recursive computation bottleneck of RNNs, making it especially suitable for time series prediction tasks.

4.1.1. Causal Convolution

Causal convolution ensures that the output of each time step of the model depends only on the current and previous time step data, following the causality of the time series. Compared with ordinary convolution, causal convolution fulfills the need for time series forecasting by preventing future inputs from being mixed into the computation of the current time step through strict temporal constraints. Its mathematical expression is as follows:
$$F(t) = \sum_{i=0}^{k-1} f(i) \cdot x_{t-i} \tag{20}$$
In Equation (20), $F(t)$ denotes the output value at time step $t$, $f(i)$ denotes the weight of the $i$th unit of the convolution kernel, $k$ is the size of the convolution kernel, and $x_{t-i}$ is the value at time step $t-i$ in the input sequence. This mechanism ensures that the network does not leak future information.

4.1.2. Dilated Convolution

Dilated convolution expands the receptive field by introducing a dilation factor into the ordinary convolution operation, resulting in sparse connections across time steps. As the dilation factor increases, the interval between sampling points in each convolutional layer widens, allowing the network to capture a longer temporal range within its receptive field. This property greatly enhances the model’s ability to extract information from long sequences. The mathematical formulation is as follows:
$$F(t) = \sum_{i=0}^{k-1} f(i) \cdot x_{t - d \cdot i} \tag{21}$$
In Equation (21), $F(t)$ is the output value at time step $t$, $f(i)$ is the weight of the $i$th unit of the convolution kernel, $k$ is the size of the convolution kernel, $x_{t-d \cdot i}$ is the dilated input sample, and $d$ is the dilation factor, which determines the extent of the receptive field. With multiple stacked dilated convolution layers, the receptive field grows exponentially and can cover a long data range with few convolutional layers. Dilated convolution thus significantly enlarges the receptive field of the model while preserving feature details and avoiding an increase in complexity. Dilated causal convolution is constructed by integrating causal convolution with dilated convolution. This design strictly enforces causality in time-series modeling by ensuring that the output at each time step depends solely on the current and previous input data, while the dilation factors significantly expand the receptive field and allow the model to capture long-range dependencies effectively. The architecture of this convolution is depicted in Figure 3. In this paper, the parameters are set to $d = 1, 2, 4$ and $k = 3$.
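Equations (20) and (21) can be illustrated with a direct NumPy implementation; setting d = 1 recovers plain causal convolution, and the input sequence and kernel below are illustrative toy values.

```python
import numpy as np

def dilated_causal_conv(x, f, d=1):
    """F(t) = sum_{i=0}^{k-1} f(i) * x[t - d*i] (Equations (20)-(21)).

    Inputs before the start of the sequence are treated as zero, so each
    output depends only on the current and earlier time steps; d = 1
    recovers plain causal convolution.
    """
    k = len(f)
    y = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(k):
            j = t - d * i
            if j >= 0:               # causal constraint: never read the future
                y[t] += f[i] * x[j]
    return y

x = np.arange(1.0, 9.0)              # toy input sequence
f = np.array([1.0, 0.5, 0.25])       # illustrative kernel, k = 3
print(dilated_causal_conv(x, f, d=2))  # each output sees x[t], x[t-2], x[t-4]
```

With k = 3 and d = 2, the receptive field at each step spans five time steps (t, t−2, t−4), illustrating how dilation widens coverage without adding kernel weights.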

4.1.3. Residual Module

As seen in Figure 4, the TCN introduces a residual network structure to mitigate gradient vanishing and convergence difficulties in deep network training. The residual module superimposes the input on the convolutional output via a skip connection, which improves gradient propagation efficiency. Each module combines causal convolution, dilated convolution, ReLU activation, normalization, and dropout, and multiple modules are stacked to form a deep structure. This design improves the stability of feature extraction and significantly reduces the impact of the gradient vanishing problem. The formula for the residual module is as follows:
$$y_t = x_t + F(x_t) \tag{22}$$
In Equation (22), $x_t$ is the input feature and $F(x_t)$ is the feature after convolutional filtering.
Figure 4. Residual module.
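A minimal NumPy sketch of Equation (22): two dilated causal convolutions with ReLU form $F(x)$, and the identity skip connection adds the input back. Normalization and dropout from the full module are omitted for brevity, and the kernels and input signal are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def causal_conv(x, f, d):
    # Dilated causal convolution; out-of-range inputs are treated as zero
    y = np.zeros(len(x))
    for t in range(len(x)):
        y[t] = sum(f[i] * x[t - d * i] for i in range(len(f)) if t - d * i >= 0)
    return y

def residual_block(x, f1, f2, d):
    """y_t = x_t + F(x_t), Equation (22): two dilated causal convolutions
    with ReLU form F, and the identity skip connection adds x back."""
    h = relu(causal_conv(x, f1, d))
    return x + relu(causal_conv(h, f2, d))

x = np.sin(np.linspace(0, 3, 16))                        # toy input signal
y = residual_block(x, np.array([0.2, 0.3]), np.array([0.1, 0.4]), d=2)
print(y.shape)  # output keeps the input length
```

Because the skip connection is the identity, a block whose convolutions output zero simply passes the input through unchanged, which is what makes deep stacks of such blocks easy to train.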

4.2. MSA Model

The main role of the attention mechanism is to highlight the importance of key time points in wind power signals and provide accurate feature characterization for subsequent prediction [42]. Based on the wind power features extracted by TCN, combined with the MSA, the feature relationships at different spatio-temporal locations are effectively captured by parallel attention layers, as shown in Figure 5. Figure 5 illustrates the computation process of the scaled dot-product attention (left) and its extension to multi-head self-attention with h = 2 heads (right). The arrows and lines indicate the flow of information from the input queries Q , keys K , and values   V through matrix multiplications, masking, normalization, and concatenation to produce the final attention output. The attention output at each layer is assigned a weight and serves as the predictive basis of the model [43]. The attention mechanism employs a scaled dot product as its weighted scoring function, and the resulting output vector is given by:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V \tag{23}$$
$$H_i = \mathrm{Attention}(Q W_Q^i, K W_K^i, V W_V^i), \qquad \mathrm{MSA}(Q, K, V) = \mathrm{concat}(H_1, H_2, \ldots, H_h) W_0 \tag{24}$$
In the above equations, $K$ and $V$ are the key–value pairs (Key, Value), $Q$ is the query, and $d_k$ denotes the dimension of $Q$ and $K$. For self-attention, the computation requires $Q = K = V$; $h$ is the number of attention heads and $d_v$ is the dimension of $V$. The attention weight matrices satisfy $W_0 \in \mathbb{R}^{h d_v \times dim}$, $W_Q^i, W_K^i \in \mathbb{R}^{dim \times d_k}$, and $W_V^i \in \mathbb{R}^{dim \times d_v}$.
Figure 5. Network structure of the MSA.
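The scaled dot-product attention and its multi-head extension described above can be sketched in NumPy as follows. The sizes (h = 2 heads, a head dimension of 4) mirror the two-head example in Figure 5, while the random projection matrices and input are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def multi_head_self_attention(X, WQ, WK, WV, WO):
    # Self-attention uses Q = K = V = X; one (WQ, WK, WV) triple per head,
    # then the head outputs are concatenated and projected by WO
    heads = [attention(X @ wq, X @ wk, X @ wv) for wq, wk, wv in zip(WQ, WK, WV)]
    return np.concatenate(heads, axis=-1) @ WO

rng = np.random.default_rng(0)
T, dim, h, dk = 6, 8, 2, 4          # sequence length, model dim, heads, head dim
X = rng.standard_normal((T, dim))
WQ = [rng.standard_normal((dim, dk)) for _ in range(h)]
WK = [rng.standard_normal((dim, dk)) for _ in range(h)]
WV = [rng.standard_normal((dim, dk)) for _ in range(h)]
WO = rng.standard_normal((h * dk, dim))
print(multi_head_self_attention(X, WQ, WK, WV, WO).shape)  # (6, 8)
```

Each head attends to the sequence independently, and the final projection by $W_0$ maps the concatenated head outputs back to the model dimension.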

4.3. BiGRU Network

The GRU is an efficient recurrent neural network that reduces computational complexity by using update and reset gates while effectively capturing long-term dependencies. Here, the BiGRU, a bidirectional extension of the GRU, is employed to model bidirectional dependencies and to capture comprehensive semantic information from both the forward and backward directions of the sequence. By concatenating the features of the forward and backward GRUs, the BiGRU provides a comprehensive characterization of complex sequence information. Its architecture, shown in Figure 6, significantly improves the accuracy of feature representation and is suitable for complex time series modeling tasks. The specific computational mechanism is defined by the following equations:
$$\overrightarrow{h}_t = \mathrm{GRU}(\overrightarrow{h}_{t-1}, x_t) \tag{25}$$
$$\overleftarrow{h}_t = \mathrm{GRU}(\overleftarrow{h}_{t+1}, x_t) \tag{26}$$
$$h_t = [\overrightarrow{h}_t; \overleftarrow{h}_t] \tag{27}$$
In the above equations, $\overrightarrow{h}_t$ denotes the hidden state of the forward GRU at time step $t$, $\overleftarrow{h}_t$ denotes the hidden state of the backward GRU at time step $t$, and $x_t$ is the input feature at time step $t$. The final output $h_t$ concatenates the forward and backward hidden states at each time step, which facilitates more effective extraction of global information from complex sequences.
Figure 6. BiGRU structure.
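The forward and backward recursions above can be sketched compactly in NumPy: one common GRU-cell variant is run forward and backward over the sequence, and the two hidden states are concatenated at each step. The weights and input below are random and purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(h, x, P):
    """One GRU step (a common convention): update gate z, reset gate r."""
    z = sigmoid(P["Wz"] @ x + P["Uz"] @ h)
    r = sigmoid(P["Wr"] @ x + P["Ur"] @ h)
    h_tilde = np.tanh(P["Wh"] @ x + P["Uh"] @ (r * h))
    return (1 - z) * h + z * h_tilde

def bigru(X, P, hidden):
    T = len(X)
    hf, hb = np.zeros(hidden), np.zeros(hidden)
    fwd, bwd = [], [None] * T
    for t in range(T):                  # forward pass over t = 0 .. T-1
        hf = gru_step(hf, X[t], P)
        fwd.append(hf)
    for t in reversed(range(T)):        # backward pass over t = T-1 .. 0
        hb = gru_step(hb, X[t], P)
        bwd[t] = hb
    # concatenate forward and backward hidden states at every time step
    return np.stack([np.concatenate([f, b]) for f, b in zip(fwd, bwd)])

rng = np.random.default_rng(0)
din, dh, T = 3, 4, 5                    # input dim, hidden dim, sequence length
P = {k: 0.5 * rng.standard_normal((dh, din if k[0] == "W" else dh))
     for k in ["Wz", "Uz", "Wr", "Ur", "Wh", "Uh"]}
H = bigru(rng.standard_normal((T, din)), P, dh)
print(H.shape)  # (5, 8): forward and backward states concatenated
```

For simplicity the two directions share one weight set here; in the actual BiGRU the forward and backward GRUs have independent parameters.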

4.4. TCN-BiGRU-MSA Network Structure

Combining the TCN, the BiGRU network, and the MSA mechanism, a TCN-BiGRU-MSA hybrid neural network model is constructed for wind power temporal prediction. The model exploits the long-term feature extraction of the TCN, the bidirectional dynamic information capture of the BiGRU, and the feature-weight optimization of the MSA, significantly improving the modeling of the complex spatio-temporal features of wind power. The structure is shown in Figure 7.

5. Experimental Analysis

To validate the effectiveness of the proposed forecasting method, the model was tested using a wind farm cluster dataset. This dataset comprises measured data from seven onshore wind farms in Inner Mongolia, China. All wind farms are equipped with approximately 2-megawatt horizontal-axis variable-speed wind turbines, primarily located on flat or gently sloping grasslands without complex mountainous or offshore environments. For each wind farm, aggregated active power and corresponding wind speed data were recorded at a 10 min time resolution and synchronized with numerical weather prediction (NWP) data at the same resolution. The dataset spans from 1 March to 30 December 2022, with samples divided into a training set (70%) and a test set (30%).

5.1. SVMD Based Wind Speed Decomposition Results

In order to further explore the detailed features of the wind speed data in the time series, the decomposition is carried out using SVMD, and the results are shown in Figure 8.
As illustrated in Figure 8, IMF1 to IMF6 are the results obtained by applying SVMD to decompose the wind speed series from seven wind farms. Each subsequence captures a distinct characteristic, with IMF1 representing the low-frequency trend and IMF6 reflecting localized fluctuation rates. This technique preserves the wind speed sequence features and suppresses modal aliasing.

5.2. DBSCAN Wind Turbine Clustering Method Based on FOA Optimization

Initially, PCA is applied to reduce the data dimensionality, and the resulting principal components are used as feature vectors. Subsequently, FOA is employed to globally optimize the parameters ε and p m i n of DBSCAN, thereby constructing an FOA-DBSCAN clustering model for effective clustering of the feature vectors.

5.2.1. Parameter Optimization Process

As can be seen from Figure 9, the FOA performs best in terms of convergence speed and final score, obtaining a high clustering score in fewer iterations. PSO converges second fastest and is more effective than the GA. Overall, the FOA outperforms both PSO and the GA and converges to the global optimal solution, which verifies its effectiveness.

5.2.2. Clustering Results

Figure 10 and Table 1 show that the results of the clustering algorithms vary significantly. DBSCAN combined with the FOA effectively separates the data points in the 3D PCA scatterplot, whereas K-means and spectral clustering show more overlap (especially between data1 and data2). For FOA-DBSCAN, the SC, CH, and DB indices in Table 1 are 0.5009, 9.1585, and 0.8888, respectively; it thus performs better than the other approaches, confirming its efficacy. The FOA optimizes the DBSCAN parameters and thereby solves the traditional manual-setup problem, and the experimental results show that the algorithm is efficient and accurate.
In summary, a higher SC and CH and a lower DB correspond to better clustering quality, i.e., higher intra-cluster compactness and inter-cluster separation. Therefore, the clustering method with larger SC and CH values and smaller DB values is regarded as providing more favorable clusters for the subsequent forecasting task.

5.2.3. Selection of Representative Turbines Within Clusters

After obtaining the final clustering results, representative turbines are further selected within each cluster to reduce the computational burden of subsequent modeling. Specifically, for each cluster C k , we compute the cluster-average power output series P k ( t ) by averaging the active power of all turbines in that cluster at each time step. Then, for every turbine i C k , we calculate the Pearson correlation coefficient ρ i , k between its individual power sequence P i ( t ) and the cluster-average sequence P k ( t ) . The turbine with the highest correlation coefficient ρ i , k is selected as the representative turbine of cluster C k . In this study, one representative turbine is selected per cluster, and the proposed forecasting model is trained on the representative turbines. The cluster-level power prediction is finally obtained by mapping the prediction of representative turbines back to their corresponding clusters.
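The correlation-based selection described above can be sketched as follows; the synthetic power series below (a shared trend plus per-turbine noise of different magnitudes) stand in for the measured turbine data.

```python
import numpy as np

def representative_turbine(cluster_power):
    """cluster_power: array of shape (n_turbines, T) holding each turbine's
    active-power series P_i(t). Returns the index of the turbine whose series
    has the highest Pearson correlation with the cluster-average series."""
    avg = cluster_power.mean(axis=0)                    # cluster average P_k(t)
    rho = [np.corrcoef(p, avg)[0, 1] for p in cluster_power]
    return int(np.argmax(rho)), rho

# Synthetic cluster: a shared trend plus per-turbine noise (illustrative)
rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, 8, 500))
turbines = np.array([base + rng.normal(0, s, 500) for s in (0.05, 0.4, 0.8)])
idx, rho = representative_turbine(turbines)
print(idx)  # the least-noisy turbine tracks the cluster average best
```

The turbine whose output deviates least from the shared cluster trend receives the highest correlation coefficient and is therefore chosen as the cluster representative.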

5.3. SVMD-TCN-BiGRU-MSA Model Prediction Results

The proposed model was implemented on the MATLAB R2023b platform, where training and testing were completed. Key hyperparameters were finely tuned based on a series of preliminary experiments. During training, the Adam optimizer was employed with mean squared error (MSE) as the loss function. The maximum number of training iterations was set to 200, with an initial learning rate of 0.001.
To mitigate overfitting risks in highly volatile, ultra-short-term wind power scenarios, multiple regularization strategies were incorporated during training. Dropout layers were added after both the TCN and BiGRU modules, with a dropout rate of 0.2 for the TCN layer and 0.3 for the fully connected output layer. Additionally, L2 weight regularization (weight decay) with a coefficient of 1 × 10−4 is applied to all trainable parameters to penalize excessively large weights and enhance the model’s generalization capability.
Furthermore, to validate the effect of SVMD-based decomposition on prediction performance, the decomposed wind power sequence and the original undecomposed sequence were separately input into the TCN-BiGRU-MSA model for comparative analysis. The prediction results are shown in Figure 11.
To ensure fairness and reproducibility in comparative experiments, the data preprocessing procedures for all baseline models were consistent with those of the proposed model. The RF baseline model employed 100 trees, with all other parameters set to the default values of the implementation environment. TCN-BiGRU and TCN-BiLSTM share the same TCN backbone architecture: comprising three residual TCN blocks with a convolution kernel size of 5, 128 channels, and dilation factors of 1, 2, and 4, respectively. Each convolution layer is followed by LayerNorm and ReLU activation, with dropout applied after every two convolutions within each residual block at a rate of 0.08. The TCN output sequentially feeds into a forward GRU (or LSTM) layer and a backward GRU (or LSTM) layer, each with 64 hidden units. The forward and backward outputs are concatenated along the feature dimension before passing through a self-attention layer and a fully connected regression output layer. The GCN-BiGRU model employs a 2-layer GCN with 64 channels per layer, using ReLU activation and a dropout rate of 0.05. Its output similarly connects to forward and backward GRU layers (64 units each), concatenation, a self-attention layer, and a fully connected regression layer. All hyperparameters—including optimizers, loss functions, and training epochs—for baseline models align with the unified training configuration of the proposed model. All models are evaluated on identical training and testing datasets to ensure fair and comparable results.
The figure shows that the prediction curve of the SVMD-processed input is closer to the actual values, indicating that the SVMD-based input yields a better prediction effect. To verify the prediction performance of the model, ablation experiments were carried out against TCN-BiGRU, TCN-BiGRU-MSA (GJO), and SVMD-TCN-BiGRU-MSA, and the results are shown in Figure 12.
As can be seen from Figure 12, the model in this paper has the highest agreement between the prediction curve (red) and the true value (black). This shows a strong prediction ability; it captures the trend of the true signals, especially in the fluctuating time period. In addition, the prediction results of the TCN-BiGRU-MSA(GJO) model (green), the SVMD-TCN-BiGRU-MSA model (yellow), and the TCN-BiGRU model (purple) are relatively poor, with the data points deviating from the true values.
As shown in Figure 12 and Figure 13 and Table 2, to eliminate the effects of random model initialization and data shuffling, all experiments were repeated five times under different random seeds, and the results in Table 2 are presented as "mean ± standard deviation." The proposed model demonstrates high overall accuracy and low bias, effectively suppressing prediction errors. Compared with the traditional baselines SVR, BiLSTM, and RF, it reduces MAE by 46.8%, 33.9%, and 21.0%, RMSE by 45.3%, 17.3%, and 19.7%, and MAPE by 58.6%, 27.3%, and 24.8%, respectively. Compared with the deep learning models TCN-BiGRU, TCN-BiLSTM, and GCN-BiGRU, it still achieves stable improvements, reducing MAE by 18.5%, 14.2%, and 10.0%, RMSE by 16.6%, 13.0%, and 9.5%, and MAPE by 17.8%, 12.3%, and 8.3%, respectively. Furthermore, compared with the improved hybrid models TCN-BiGRU-MSA(GJO) and SVMD-TCN-BiGRU-MSA, the proposed model achieves additional reductions of 13.4% and 5.4% in MAE, 11.5% and 5.6% in RMSE, and 10.9% and 5.8% in MAPE, respectively. Among all comparison models, it attains the minimum values of all three error metrics (MAE, RMSE, and MAPE), demonstrating superior prediction accuracy and stronger generalization, and it exhibits the highest R2 value (0.959), indicating excellent overall fitting performance. Although the absolute improvement in each error metric is relatively modest, the enhancement remains consistently stable across the repeated experiments.

6. Conclusions

In this study, a complete set of solutions is formed through innovative data preprocessing techniques, improved clustering methods and construction of hybrid prediction models. These innovations improve prediction accuracy and calculation efficiency, provide reliable technical support for wind power grid-connected scheduling, and have important theoretical significance and practical value. The specific results are mainly reflected in the following three aspects:
  • To reduce data non-stationarity, the SVMD algorithm applies successive variational mode decomposition, decomposing the wind speed signal into six intrinsic mode functions, IMF1 to IMF6, and separating the low-frequency trend (IMF1) from the high-frequency fluctuation (IMF6). As a result, the MAE of the input data is reduced by 6.4% to 22.9%. In the comparison experiments on seven wind farms, the prediction curve fit R2 of the SVMD-processed data improves to 0.959, which verifies its advantage in suppressing modal aliasing.
  • The FOA-DBSCAN algorithm dynamically adjusts neighborhood radius ε and density threshold p m i n using the FOA. It was tested on seven wind farms and showed superior performance. The algorithm converges in only 15 iterations, making it faster than PSO and the GA. It significantly outperforms these traditional optimization algorithms. The method solves DBSCAN’s parameter sensitivity problem effectively. It lays a reliable data foundation for future power prediction tasks.
  • The model integrates the advantages of the TCN, BiGRU and MSA. In ablation experiments, this model’s MAE and RMSE decrease by more than 21.5%. Compared with the benchmark, the MAPE drops to 15.34%. The coefficient of determination, R2, reaches a high value of 0.959. Through the adaptive parameter tuning of the GJO algorithm, the model converges quickly within 200 epochs, showing higher prediction accuracy and smaller prediction error.
Although the model performs well in wind power prediction, there are still limitations. This study mainly considers key meteorological variables such as wind speed, temperature, and air pressure. However, wind power output is also affected by more complex external factors, including grid scheduling strategies, equipment maintenance conditions, and diverse terrain characteristics. In future work, a multi-source data fusion prediction framework will be developed to integrate meteorological data, SCADA measurements, grid operation information, and static geographic features so as to further improve the adaptability and reliability of the model in practical applications. At the algorithmic level, the model will be continuously refined by incorporating more advanced architectures, such as the M2WLLM algorithm for multi-modal wind power large-model learning, and by introducing adaptive feature selection mechanisms to automatically identify highly informative variables under different operating scenarios. These enhancements are expected to further strengthen the generalization capability of the proposed approach and enhance its engineering application value.

Author Contributions

Conceptualization, R.G.; methodology, Z.Z.; software, R.G.; formal analysis, R.G.; investigation, Y.G.; data curation, W.L.; writing—original draft preparation, R.G.; supervision, K.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Inner Mongolia Autonomous Region “Open Competition Mechanism to Select the Best Candidates” Project (grant number 2022JBGS0045), the Inner Mongolia Autonomous Region Science and Technology Major Special Program Project (grant number 2021ZD0032).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors greatly appreciate the comments from the reviewers, whose comments helped improve the quality of the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SVMD: Successive Variational Modal Decomposition
PCA: Principal Component Analysis
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
FOA: Fruit Fly Optimization Algorithm
TCN: Temporal Convolutional Network
BiGRU: Bidirectional Gated Recurrent Unit
MSA: Multi-head Self-attention Mechanism
GJO: Golden Jackal Optimization
SC: Silhouette Coefficient
CH: Calinski–Harabasz Index
DB: Davies–Bouldin Index

References

  1. World Earth Day: Protecting the Earth’s Green Future with Renewable Energy. Xinhua Daily Telegraph, 23 April 2025.
  2. Sun, R.F.; Zhang, T.; He, Q.; Xu, H. Review on Key Technologies and Applications in Wind Power Forecasting. High Volt. Eng. 2021, 47, 1129–1143. [Google Scholar] [CrossRef]
  3. Hou, B.; Wang, D.; Xia, T.; Peng, Z.; Tsui, K.-L. Difference mode decomposition for adaptive signal decomposition. Mech. Syst. Signal Process. 2023, 191, 110203. [Google Scholar] [CrossRef]
  4. Chen, S.Q.; Peng, Z.K.; Zhou, P. Review of Signal Decomposition Theory and Its Applications in Machine Fault Diagnosis. J. Mech. Eng. 2020, 56, 91–107. [Google Scholar] [CrossRef]
  5. He, J.L.; Hao, J.X.; Su, C.F.; Tu, Z.Z. Ultra-short-term photovoltaic power prediction based on SVMD-BO-BiTCN. Distrib. Energy 2024, 9, 22–31. [Google Scholar] [CrossRef]
  6. Parri, S.; Teeparthi, K. SVMD-TF-QS: An efficient and novel hybrid methodology for the wind speed prediction. Expert Syst. Appl. 2024, 249, 123516. [Google Scholar] [CrossRef]
  7. Ji, Q.; Sun, Y.F.; Hu, Y.L.; Yin, B.C. Review of Clustering With Deep Learning. J. Beijing Univ. Technol. 2021, 47, 912–924. [Google Scholar] [CrossRef]
  8. Su, J.; Zheng, S.; Yan, G.; Xiong, G.; Cai, T. Research on Day-ahead Forecast of Wind Power Based on FCM-Equivalent Wind Speed Model. Power Syst. Clean Energy 2022, 38, 110–120. [Google Scholar]
  9. Fan, H.; Zhen, Z.; Liu, N.; Sun, Y.; Chang, X.; Li, Y.; Wang, F.; Mi, Z. Fluctuation pattern recognition based ultra-short-term wind power probabilistic forecasting method. Energy 2022, 266, 126420. [Google Scholar] [CrossRef]
  10. Ou, Z.; Lan, W.; Zhang, L.; Tong, Z.; Liu, D.; Liu, Z. Comparative Analysis of Offshore Wind Power Prediction Models and Clustering-Based Daily Output Classification. In Proceedings of the 2024 IEEE 2nd International Conference on Power Science and Technology, ICPST 2024, Dali, China, 9–11 May 2024; IEEE: New York, NY, USA, 2024; pp. 1482–1487. [Google Scholar] [CrossRef]
  11. Yuan, G.L.; Wu, Z.M.; Liu, H.Q.; Yu, J.F.; Fang, F. Short-Term Wind Power Prediction Based On Deep Belief Network. Acta Energiae Solaris Sin. 2022, 43, 451–457. [Google Scholar] [CrossRef]
  12. Zhang, Y.; Wang, S. An innovative forecasting model to predict wind energy. Environ. Sci. Pollut. Res. 2022, 29, 74602–74618. [Google Scholar] [CrossRef]
  13. He, X.; Lei, Z.; Jing, H.; Zhong, R. Short-Term Probabilistic Forecasting Method for Wind Speed Combining Long Short-Term Memory and Gaussian Mixture Model. Atmosphere 2023, 14, 717. [Google Scholar] [CrossRef]
  14. Wang, Y.; Liu, H.; Song, P.; Hu, Z.; Wu, L. Short-term Power Forecasting Method of Wind Farm Based on Gaussian Mixture Model Clustering. Autom. Electr. Power Syst. 2021, 45, 37–43. [Google Scholar]
  15. Zhao, F.; Zhang, T.X. Multi-Regional Composite Short-Term Wind Power Prediction Based on Adaptive Optimization Ap Clustering and Bp Weighted Network. Acta Energiae Solaris Sin. 2024, 45, 634–640. [Google Scholar]
  16. Xu, C.; Yang, P.; Huang, Y. Data Assimilation Method Based on Wind Farm Data and DBSCAN-OI Algorithm. In Proceedings of the 2017 2nd International Conference on Power and Renewable Energy (ICPRE), Chengdu, China, 20–23 September 2017; IEEE: New York, NY, USA, 2017; pp. 407–411. Available online: https://webofscience.clarivate.cn/wos/alldb/summary/764f19ea-15c4-4e8f-be11-a8f4b5d60d6d-0130d23539/relevance/1 (accessed on 3 December 2024).
  17. Wang, G.; Lin, G.Y. Improved Adaptive Parameter DBSCAN Clustering Algorithm. Comput. Eng. Appl. 2020, 56, 45–51. [Google Scholar] [CrossRef]
  18. Pan, W.C. Using Fruit Fly Optimization Algorithm Optimized General Regression Neural Network to Construct the Operating Performance of Enterprises Model. J. Taiyuan Univ. Technol. (Soc. Sci. Ed.) 2011, 29, 1–5. [Google Scholar]
  19. Ranjanr, K.; Kumar, V. A Systematic Review on Fruit Fly Optimization Algorithm and Its Applications. Artif. Intell. Rev. 2023, 56, 13015–13069. [Google Scholar] [CrossRef]
  20. Wang, Q.; Wang, Y.; Zhang, K.; Liu, Y.; Qiang, W.; Wen, Q.H. Artificial Intelligent Power Forecasting for Wind Farm Based on Multi-Source Data Fusion. Processes 2023, 11, 1429. [Google Scholar] [CrossRef]
  21. Xu, D.; Shao, H.; Deng, X.; Wang, X. The Hidden-Layers Topology Analysis of Deep Learning Models in Survey for Forecasting and Generation of the Wind Power and Photovoltaic Energy. CMES-Comput. Model. Eng. Sci. 2022, 131, 567–597. [Google Scholar] [CrossRef]
  22. Cui, J.; Yang, J.Y.; Yang, L.J.; Gao, K.M.; Song, Z.C.; Gao, Z.A. Wind farm power prediction method based on improved CFD and wavelet hybrid neural network combination. Power Syst. Technol. 2017, 41, 79–85. [Google Scholar] [CrossRef]
  23. Liu, H.; Mi, X.; Li, Y.; Duan, Z.; Xu, Y. Smart wind speed deep learning based multi-step forecasting model using singular spectrum analysis, convolutional Gated Recurrent Unit network and Support Vector Regression. Renew. Energy 2019, 143, 842–854. [Google Scholar] [CrossRef]
  24. Biswas, A.K.; Ahmed, S.I.; Bankefa, T.; Ranganathan, P.; Salehfar, H. Performance analysis of short and mid-term wind power prediction using ARIMA and hybrid models. In Proceedings of the 2021 IEEE Power and Energy Conference at Illinois (PECI), Urbana, IL, USA, 1–2 April 2021; pp. 1–7. [Google Scholar]
  25. Lipu, M.S.H.; Miah, S.; Hannan, M.A.; Hussain, A.; Sarker, M.R.; Ayob, A.; Saad, M.H.M.; Mahmud, S. Artificial Intelligence Based Hybrid Forecasting Approaches for Wind Power Generation: Progress, Challenges and Prospects. IEEE Access 2021, 9, 102460–102489. [Google Scholar] [CrossRef]
  26. Qian, Y.S.; Shao, J.; Ji, X.X.; Li, X.R.; Mo, C.; Chen, Q.Y. Short-Term Wind Power Forecasting Based on LSTM-Attention Network. Electr. Mach. Control. Appl. 2019, 46, 95–100. [Google Scholar]
  27. Tuerxun, W.; Xu, C.; Guo, H.; Guo, L.; Zeng, N.; Gao, Y. A Wind Power Forecasting Model Using LSTM Optimized by the Modified Bald Eagle Search Algorithm. Energies 2022, 15, 2031. [Google Scholar] [CrossRef]
  28. Liu, X.; Wang, Y.; Ji, Z.C. Short-term Wind Power Prediction Method Based on Random Forest. J. Syst. Simul. 2021, 33, 2606–2614. [Google Scholar] [CrossRef]
  29. Chen, W.; Qi, W.; Li, Y.; Zhang, J.; Zhu, F.; Xie, D.; Ru, W.; Luo, G.; Song, M.; Tang, F. Ultra-Short-Term Wind Power Prediction Based on Bidirectional Gated Recurrent Unit and Transfer Learning. Front. Energy Res. 2021, 9, 808116. [Google Scholar] [CrossRef]
  30. Tang, J.; Liu, Z.; Hu, J. Spatial-Temporal Wind Power Probabilistic Forecasting Based on Time-Aware Graph Convolutional Network. IEEE Trans. Sustain. Energy 2024, 15, 1946–1956. [Google Scholar] [CrossRef]
  31. Xiao, F.; Chai, L.; Xu, W.; Peng, R. Multifactor Short-Term Wind Power Interval Prediction Based on IFC-TCN-BiGRU-attention. In Proceedings of the 2024 IEEE 2nd International Conference on Power Science and Technology, ICPST 2024, Dali, China, 9–11 May 2024; IEEE: New York, NY, USA, 2024; pp. 1529–1535. [Google Scholar] [CrossRef]
  32. Fan, H.; Li, M.; Zhang, Z.; Cheng, L.; Ye, Y.; Liu, W.; Liu, D. M2WLLM: Multi-modal multi-task ultra-short-term wind power prediction algorithm based on large language model. Inf. Fusion 2026, 126 Pt A, 103541. [Google Scholar] [CrossRef]
  33. Dhaka, P.; Sreejeth, M.; Tripathi, M.M. Wind power forecasting using multivariate signal decomposition and stacked GRU ensembles with error correction. Future Gener. Comput. Syst. 2026, 175, 108105. [Google Scholar] [CrossRef]
  34. Zhao, E.; Sun, S.; Wang, S. New developments in wind energy forecasting with artificial intelligence and big data: A scientometric insight. Data Sci. Manag. 2022, 5, 84–95. [Google Scholar] [CrossRef]
  35. Ye, L.; Ren, C.; Zhao, Y.N.; Rao, R.S.; Teng, J.Z. Stratification Analysis Approach of Numerical Characteristics for Ultra-short-term Wind Power Forecasting Error. Proc. CSEE 2016, 36, 692–700. [Google Scholar] [CrossRef]
  36. Liu, X.L.; Mo, Y.C.; Wu, Z.; Yan, K. Hybrid Deep Learning Model Based on Super-Short-Term Wind Power Forecasting. J. Huaqiao Univ. (Nat. Sci.) 2022, 43, 668–676. [Google Scholar] [CrossRef]
  37. Chen, L.; Huang, K.Y.; Zhang, Y.; Cai, K.Z.; Chen, Y.; Zhang, Z.R. Ultra-Short-Term Wind Power Forecasting Based on Information Recombination and TCN-LSTM-MHSA. South. Power Syst. Technol. 2025, 19, 1–10. [Google Scholar]
  38. Nazari, M.; Sakhaeis, M. Successive Variational Mode Decomposition. Signal Process. 2020, 174, 107610. [Google Scholar] [CrossRef]
  39. Zhao, Q. A review of principal component analysis methods. Softw. Eng. 2016, 19, 1–3. [Google Scholar]
  40. Zhou, S.L.; Mao, M.Q.; Su, J.H. Wind power prediction based on principal component analysis and artificial neural network. Power Syst. Technol. 2011, 35, 128–132. [Google Scholar] [CrossRef]
  41. Bai, S.; Kolter, J.Z.; Koltun, V. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar]
  42. Ren, H.; Wang, X.G. A survey of attention mechanisms. J. Comput. Appl. 2021, 41, 1–6. [Google Scholar]
  43. Zhu, Z.L.; Rao, Y.; Wu, Y.; Li, J.; Wang, H.; Chen, X. Research progress of attention mechanism in deep learning. J. Chin. Inf. Process. 2019, 33, 1–11. [Google Scholar]
Figure 1. Overall modeling framework.
Figure 2. DBSCAN algorithm flow.
Figure 3. Dilated causal convolution structure.
Figure 7. TCN-BiGRU-MSA network structure.
Figure 8. Wind speed decomposition for seven wind farms. (a) Wind speed decomposition of Wind Farm 1; (b) Wind speed decomposition of Wind Farm 2; (c) Wind speed decomposition of Wind Farm 3; (d) Wind speed decomposition of Wind Farm 4; (e) Wind speed decomposition of Wind Farm 5; (f) Wind speed decomposition of Wind Farm 6; (g) Wind speed decomposition of Wind Farm 7.
Figure 9. Comparison of convergence curves of different optimization algorithms.
Figure 10. Comparison of results of different clustering algorithms. (a) Result of wind turbine clustering using the FOA-DBSCAN algorithm shown in three-dimensional PCA space; (b) Result of wind turbine clustering using the K-means algorithm in three-dimensional PCA space; (c) Result of wind turbine clustering using the spectral clustering algorithm in three-dimensional PCA space.
Figure 11. Comparison of prediction results with and without data decomposition.
Figure 12. Results of ablation experiments.
Figure 13. Comparison of prediction results between the proposed model and other models.
Table 1. Comparison of performance evaluation indexes of different clustering algorithms.

Clustering Method      SC (silhouette)    CH (Calinski–Harabasz)    DB (Davies–Bouldin)
FOA-DBSCAN             0.500              99.1585                   0.8888
k-means                0.4237             36.6729                   1.6220
Spectral clustering    0.4594             46.1081                   1.4999
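For readers reproducing the clustering comparison, the three validity indices in Table 1 are available in scikit-learn. The sketch below is illustrative only: synthetic 2-D blobs stand in for the PCA-reduced turbine features, and the eps/min_samples values are arbitrary placeholders, not the FOA-optimized parameters of the paper.

```python
# Illustrative sketch (not the authors' code): computing SC, CH and DB indices
# for two clustering algorithms with scikit-learn on synthetic data.
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (calinski_harabasz_score,
                             davies_bouldin_score,
                             silhouette_score)

# Synthetic stand-in for the PCA-reduced turbine feature matrix.
X, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.8, random_state=0)

results = {}
for name, labels in {
    "DBSCAN": DBSCAN(eps=0.9, min_samples=5).fit_predict(X),
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
}.items():
    mask = labels != -1                # exclude DBSCAN noise points (label -1)
    if len(set(labels[mask])) < 2:     # the indices need at least two clusters
        continue
    results[name] = (
        silhouette_score(X[mask], labels[mask]),         # SC, higher is better
        calinski_harabasz_score(X[mask], labels[mask]),  # CH, higher is better
        davies_bouldin_score(X[mask], labels[mask]),     # DB, lower is better
    )
    print(name, results[name])
```

Note that noise points must be removed before scoring, since the indices assume every sample belongs to a cluster.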
Table 2. Comparison of prediction results between the proposed model and other models.

Predictive Model        MAE              RMSE             MAPE            R2
SVR                     0.730 ± 0.039    0.856 ± 0.034    36.59 ± 1.33    0.791 ± 0.066
BiLSTM                  0.587 ± 0.024    0.566 ± 0.022    20.85 ± 0.72    0.901 ± 0.049
RF                      0.491 ± 0.016    0.583 ± 0.024    20.17 ± 0.82    0.907 ± 0.023
TCN-BiGRU               0.476 ± 0.013    0.561 ± 0.019    18.45 ± 0.91    0.916 ± 0.018
TCN-BiLSTM              0.452 ± 0.019    0.538 ± 0.017    17.29 ± 0.98    0.921 ± 0.012
TCN-BiGRU-MSA(GJO)      0.448 ± 0.012    0.529 ± 0.021    17.02 ± 0.79    0.923 ± 0.014
GCN-BiGRU               0.431 ± 0.018    0.517 ± 0.032    16.54 ± 1.26    0.928 ± 0.026
SVMD-TCN-BiGRU-MSA      0.410 ± 0.011    0.496 ± 0.017    16.10 ± 0.66    0.943 ± 0.021
Proposed model          0.388 ± 0.015    0.468 ± 0.020    15.16 ± 0.81    0.959 ± 0.017
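As a reference for how the four metrics in Table 2 are defined, a minimal NumPy sketch is given below; the actual/predicted series are toy values, not the paper's data.

```python
# Minimal NumPy sketch (toy data, not the paper's series) of the four
# evaluation metrics reported in Table 2: MAE, RMSE, MAPE and R^2.
import numpy as np

def evaluate(y_true, y_pred):
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                          # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))                   # root mean squared error
    mape = np.mean(np.abs(err / y_true)) * 100          # assumes no zeros in y_true
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}

# Hypothetical actual vs. predicted cluster power values.
y_true = np.array([2.0, 2.5, 3.0, 2.8, 2.2])
y_pred = np.array([2.1, 2.4, 3.2, 2.7, 2.0])
print(evaluate(y_true, y_pred))
```

In practice, MAPE requires care near zero-power periods (calm wind), which is why many wind-power studies normalize by installed capacity instead.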