Triple-Flow Dynamic Graph Convolutional Network for Wind Power Forecasting

Bin Li; Bo Ding; Wei Pang; Hongyin Ni

doi:10.3390/sym17122026


¹ School of Computer Science, Northeast Electric Power University, Jilin 132012, China

² School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh EH14 4AS, UK

* Author to whom correspondence should be addressed.

Symmetry 2025, 17(12), 2026; https://doi.org/10.3390/sym17122026

This article belongs to the Special Issue Symmetry in Deep Learning Networks and Its Applications in the Real World


Abstract

Wind energy is a clean but intermittent and volatile energy source, and its large-scale integration into power systems poses major challenges for safe and stable operation, scheduling optimization, and effective energy planning. Accurate wind power forecasting is an effective way to mitigate the impact of wind power instability on power systems. Wind power data typically take the form of multivariate time series. Existing wind power forecasting research often models the temporal and spatial characteristics of these coupled time series directly, ignoring the heterogeneity of time and space and thereby limiting the model’s expressive power. To address this problem, we propose a triple-flow dynamic graph convolutional network (TFDGCN) for short-term wind power forecasting. TFDGCN is a symmetric dynamic graph neural network with three branches that decouple and learn features along three dimensions: within a wind power variable sequence, between sequences, and between wind turbines. TFDGCN constructs dynamic sparse graphs from cosine similarities within variable sequences, between variable sequences, and between wind turbine nodes, and feeds each graph into its own dynamic graph convolution module. It then uses linear attention encoders that combine local position encoding (LePE) and rotary position encoding (RoPE) to learn global dependencies within variable sequences, between sequences, and between wind turbines, and produces the forecasts. Extensive experiments on two real-world datasets show that TFDGCN outperforms other state-of-the-art methods. On the SDWPF and SD23 datasets, TFDGCN achieved mean absolute errors of 37.16 and 14.63, respectively, and root mean square errors of 44.84 and 17.56, respectively.

Keywords:

wind power prediction; spatiotemporal graph neural network; dynamic sparse graph; triple-flow network structures; symmetric structured network

1. Introduction

Wind energy has received widespread attention because it is renewable, environmentally friendly, and sustainable. However, as a clean energy source with significant intermittency and fluctuation, its large-scale grid connection poses major challenges for the safe and stable operation of the power system, dispatch optimization, and energy planning []. Accurate wind power forecasting (WPF) can guide power dispatching and grid deployment [,] and is therefore an effective way to mitigate the impact of wind power instability on power systems [].

In recent years, deep learning has been widely used in wind power forecasting because of its powerful nonlinear fitting and complex data modeling capabilities. Liu et al. [] proposed a short-term wind power forecasting framework based on wavelet transform (WT) denoising and multi-feature long short-term memory (LSTM) networks. Li et al. [] decomposed wind power sequences using variational mode decomposition (VMD), grouped the resulting subsequences by complexity, and forecast them with a dual-channel network composed of an Informer and a temporal convolutional network (TCN)-Informer. Wang et al. [] divided wind farm clusters based on empirical orthogonal functions and combined multi-level quantiles with a WaveNet equipped with multi-head self-attention to achieve ultra-short-term multi-step interval prediction of power. Zhu et al. [] proposed a wind speed prediction method that combines a non-stationary Transformer with dynamic data distillation and wake effect correction. Although these methods have achieved some success in capturing the temporal dependencies of wind power data, wind power prediction is not a simple time series prediction problem. The power generated by a wind turbine is determined by both its own historical pattern and the conditions of related wind turbines []. Wind power prediction is therefore essentially a spatiotemporal problem that requires characterizing temporal patterns and spatial characteristics simultaneously.

Because graph neural networks (GNNs) are well suited to modeling spatial dependencies, many studies have used GNNs to capture dependencies between wind turbines []. These methods use gated recurrent units (GRUs), convolutional neural networks (CNNs), or their variants to capture temporal features, which are combined with the spatial features captured by GNNs to form joint spatiotemporal features. For example, Daenens et al. [] constructed a graph structure based on line-of-sight visibility and used generalized graph convolution (GENConv) and LSTM to extract spatiotemporal dependencies for offshore wind farm power prediction. An et al. [] constructed a graph structure from electrical connections and embedded diffusion graph convolution into a GRU to capture spatiotemporal dependencies for ultra-short-term power prediction. Yang et al. [] divided wind farm clusters using deep attention embedded graph clustering (DEAGC) and used TimesNet to capture multi-period temporal features after data correction and screening. Zhao et al. [] constructed a multi-graph structure and integrated a TCN with a spatiotemporal attention mechanism for interpretable ultra-short-term wind power prediction. Zhao et al. [] modeled wind farms as nodes, computed edge weights from distance and the Pearson correlation coefficient to build a static adjacency matrix, and used two spatiotemporal graph convolutional (STGCN) layers and an output layer for ultra-short-term wind power prediction. All of these methods construct static graphs with wind turbines as nodes and information such as correlation and geographic distance as edges. However, the actual spatial connections between wind turbines are determined not only by their locations but also by time-varying meteorological conditions such as wind speed, wind direction, and humidity. Static graphs with fixed structures and edge weights may not capture these dynamic changes well.

Because static graphs struggle to capture time-varying spatiotemporal correlations, researchers have begun to apply dynamic graph models to wind power prediction. For example, in [], the authors integrated geographic distance graphs, semantic distance graphs, and learnable parameter matrices, dynamically optimized the graph structure through a spatial attention mechanism, and captured spatiotemporal dependencies by combining an Attentional Graph Convolutional Recurrent Network (AGCRN) with LSTM. Yang et al. [] proposed an ultra-short-term wind power prediction method for large-scale wind farm clusters based on dynamic spatiotemporal graph convolution. This method abstracts the wind farms into a dynamically adjusted graph, combines spatial graph convolution with temporal graph convolution to model temporal and spatial dependencies, and introduces multi-task learning (MTL) to output multi-step predictions for multiple wind farms simultaneously. Han et al. [] used dynamic graph convolution to capture the dynamic spatial correlation of wind farm clusters and used BiLSTM and Transformer to mine bidirectional temporal dependencies in wind power, thereby forecasting power for wind farm clusters. Song et al. [] proposed a wind power forecasting method that uses a multi-resolution CNN and dynamic graph convolution to extract temporal and spatial features, respectively. Dong et al. [] extracted the spatial and temporal features of wind power data through directed graph convolution and TCN, respectively, to forecast the power of wind farm clusters.
Although dynamic graphs can capture spatial features adaptively, recent studies on time series modeling have shown that temporal and spatial dependencies are heterogeneous []: temporal dependencies usually appear as dynamics within the sequence of each wind turbine variable (such as changes in wind speed, wind direction, or temperature over time), whereas spatial dependencies appear as cross-sequence relationships between variables (such as the correlation between wind speed and wind direction) or as dependencies between wind turbines. Existing dynamic graph-based wind power forecasting methods often extract features jointly from the coupled temporal and spatial dimensions. Because these two types of dependency differ fundamentally, modeling them jointly can entangle their representations during feature learning and limit the model’s expressive power. Furthermore, existing methods construct dense adjacency graphs with numerous edges, and the redundant, noisy edges may interfere with the learning of core dependencies and reduce forecasting performance.

To address the above issues, this research proposes a short-term wind power forecasting method based on a triple-flow dynamic graph convolutional network (TFDGCN). Our contributions are as follows:

We propose a method for constructing dynamic sparse graphs for wind turbines based on cosine similarity. This method dynamically constructs three sparse graphs based on similarities within variable sequences, between variable sequences, and between nodes (wind turbines). This method effectively describes the relationships within wind turbine variable sequences, between sequences, and between wind turbines while reducing computational complexity.
We propose TFDGCN, which features a symmetric triple-flow architecture. It decouples and learns relationships across three dimensions, namely, within variable sequences, between sequences, and between wind turbines, represented by three sparse dynamic graphs. TFDGCN also uses linear attention to capture global information across the three dimensions.
We conduct extensive experiments on two real-world wind farm datasets to validate the effectiveness of TFDGCN. TFDGCN outperforms the baseline methods on both wind farm datasets. Furthermore, we demonstrate the effectiveness of various components of TFDGCN through ablation experiments.

The rest of this paper is organized as follows: Section 2 describes the problem studied in this research and formally defines the wind power prediction problem. Section 3 presents the proposed TFDGCN, Section 4 reports the experimental results, and Section 5 concludes the paper.

2. Problem Statement

Wind farm power data exhibit three types of correlation: (1) Variable correlation: the characteristic variables of wind turbines, such as wind speed, wind direction, and temperature, do not act independently; they are interrelated, influence each other, and jointly determine wind power. (2) Temporal correlation: continuous wind power readings form a time series, so the data observed at the current moment are correlated with data observed at past moments, which allows future power to be forecast from historical data. (3) Spatial correlation: different wind turbine nodes within the same wind farm may have similar characteristics (e.g., wind speed, wind direction, temperature, and power) at the same moment; the correlation between turbines with similar characteristics constitutes spatial correlation.

We capture the three types of correlation described above through three dynamic graphs: $G_{V}$, built from variable correlation; $G_{T}$, built from temporal similarity; and $G_{N}$, built from spatial correlation. The construction of these three dynamic graphs is detailed in Section 3.1.

Consider a multidimensional dataset $\chi = \{X^{1}, X^{2}, \dots, X^{N}\} \in \mathbb{R}^{N \times T \times F}$ observed in a wind farm, where $N$ is the number of wind turbines in the wind farm, $T$ is the total number of time steps, and $F$ is the number of variables measured at each wind turbine. $X^{i} \in \mathbb{R}^{1 \times T \times F}$ denotes the data observed by the $i$th wind turbine. Our goal is to learn a function $f$ that takes the historical observations $\chi' \in \mathbb{R}^{N \times S \times F}$ over a time window of length $S$ and predicts the wind power over a future time window of length $S'$, as shown below:

\begin{bmatrix} \hat{X}^{1}_{(t+1)} & \hat{X}^{1}_{(t+2)} & \cdots & \hat{X}^{1}_{(t+S')} \\ \hat{X}^{2}_{(t+1)} & \hat{X}^{2}_{(t+2)} & \cdots & \hat{X}^{2}_{(t+S')} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{X}^{N}_{(t+1)} & \hat{X}^{N}_{(t+2)} & \cdots & \hat{X}^{N}_{(t+S')} \end{bmatrix} = f\left( \begin{bmatrix} X^{1}_{(t-S+1)} & X^{1}_{(t-S+2)} & \cdots & X^{1}_{(t)} \\ X^{2}_{(t-S+1)} & X^{2}_{(t-S+2)} & \cdots & X^{2}_{(t)} \\ \vdots & \vdots & \ddots & \vdots \\ X^{N}_{(t-S+1)} & X^{N}_{(t-S+2)} & \cdots & X^{N}_{(t)} \end{bmatrix}, G_{V}, G_{T}, G_{N} \right)

(1)

In Equation (1), the left-hand side lies in $\mathbb{R}^{N \times S' \times 1}$, and its $i$th row $\{\hat{X}^{i}_{(t+1)}, \hat{X}^{i}_{(t+2)}, \dots, \hat{X}^{i}_{(t+S')}\}$ is the wind power predicted for the $i$th wind turbine over the future window of length $S'$ following time $t$.
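To make the input and output shapes of Equation (1) concrete, the sketch below slices an array of shape $N \times T \times F$ into supervised pairs of $S$-step inputs and $S'$-step power targets with a stride-1 sliding window. It is a minimal illustration under stated assumptions: the function name `make_windows`, the `power_idx` column that holds wind power, and the stride of 1 are all hypothetical choices, not details given in the paper.

```python
import numpy as np

def make_windows(data: np.ndarray, S: int, S_out: int, power_idx: int):
    """Slice a wind-farm array of shape (N, T, F) into supervised samples.

    Hypothetical helper: stride-1 sliding window; power_idx is the column
    assumed to hold wind power. Returns inputs of shape (B, N, S, F) and
    targets of shape (B, N, S_out, 1), where B = T - S - S_out + 1.
    """
    N, T, F = data.shape
    num = T - S - S_out + 1
    if num <= 0:
        raise ValueError("series too short for the requested windows")
    # Input window: the S steps ending at time t (all variables)
    xs = np.stack([data[:, t:t + S, :] for t in range(num)])
    # Target window: the next S_out steps of the power column only
    ys = np.stack([data[:, t + S:t + S + S_out, power_idx:power_idx + 1]
                   for t in range(num)])
    return xs, ys
```

For example, with $N = 3$, $T = 20$, $F = 5$, $S = 8$, and $S' = 4$, the helper yields 9 samples whose inputs have shape (9, 3, 8, 5) and whose targets have shape (9, 3, 4, 1).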

3. Methodology

The proposed TFDGCN architecture is shown in Figure 1. Its input is multivariate wind power time series data, and the coordinate system in Figure 1 illustrates the variable, temporal, and spatial correlations inherent in these data. The Variable axis represents wind turbine variables such as wind speed, wind direction, ambient temperature, temperature inside the nacelle, nacelle direction, blade pitch angle, reactive power, and active power; it describes the relationships between the multivariate time series. The Time axis represents each variable’s sequence over time and reflects how the variable changes across time slices. The Node axis represents the characteristics of the wind turbine nodes at the current moment, namely turbine power and the wind speed, wind direction, and temperature at the turbine’s location; it reflects the spatial correlation between wind turbine nodes at the current moment.

Figure 1. TFDGCN network architecture.

TFDGCN is a triple-flow graph convolutional network (GCN) consisting of a variable flow, a time flow, and a node flow, with a symmetric design that applies the same processing logic in all three branches. The statistical properties of time series data, such as mean and variance, change over time, so the training and test distributions can differ; this distribution drift can seriously degrade the generalization ability of the model. TFDGCN therefore first standardizes the input with a reversible instance normalization (RevIN) layer []. The standardized data are then fed into the three branches. Each branch first embeds the data and then constructs an adjacency graph that represents variable, temporal, or node relationships, respectively. TFDGCN then uses dynamic graph convolution to propagate and aggregate features over the dynamic graph and linear attention to capture long-range dependencies. The outputs of the three branches are concatenated and passed through a fully connected layer to produce the final prediction.
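The RevIN step can be sketched as follows: statistics are computed per sample and per variable over the time axis, stored, and reused to map the forecast back to the original scale. This is a simplified sketch, not the paper’s implementation: it omits RevIN’s learnable affine parameters, and the class layout, the (batch, N, S, F) tensor layout, and the `var_idx` argument are assumptions made for illustration.

```python
import numpy as np

class RevIN:
    """Simplified reversible instance normalization (no learnable affine).

    Assumed layout: x has shape (batch, N, S, F); statistics are taken over
    the time axis S, separately for every sample, turbine, and variable.
    """

    def __init__(self, eps: float = 1e-5):
        self.eps = eps
        self.mean = None
        self.std = None

    def normalize(self, x: np.ndarray) -> np.ndarray:
        # Store per-instance statistics so the forecast can be de-normalized
        self.mean = x.mean(axis=2, keepdims=True)                 # (B, N, 1, F)
        self.std = np.sqrt(x.var(axis=2, keepdims=True) + self.eps)
        return (x - self.mean) / self.std

    def denormalize(self, y: np.ndarray, var_idx: int) -> np.ndarray:
        # y: (B, N, S_out, 1) forecast of one variable (e.g. power, hypothetical index)
        mean = self.mean[..., var_idx:var_idx + 1]                 # (B, N, 1, 1)
        std = self.std[..., var_idx:var_idx + 1]
        return y * std + mean
```

Because the normalization is undone with the same per-instance statistics, a shift in the level or spread of a test window is removed before the network sees it and restored afterwards.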

3.1. Spatiotemporal Sequence Embedding and Construction of Dynamic Sparse Graph

We propose a method for constructing dynamic sparse graphs. By using variable embedding, time embedding, and node embedding, we represent variables, time, and nodes as sparse graphs, respectively, thereby decoupling the dependencies among the three dimensions of variables, time, and nodes.

3.1.1. Variable Embedding and Construction of Sparse Graph Between Sequences

As mentioned above, the variables that affect wind power do not act independently; they are interrelated, influence each other, and jointly determine the final wind power output. Variable embedding aims to represent the differences between the input variables of the wind power forecasting task. Specifically, for the observed data $\chi'$, each variable of the $i$th turbine is treated as a node of the graph and mapped to a high-dimensional vector space through a linear transformation:

E^{V} = \mathrm{Embedding}_{V}(\chi')

(2)

E^{V} = \begin{Bmatrix} E^{V1}_{1} & E^{V1}_{2} & \cdots & E^{V1}_{F} \\ E^{V2}_{1} & E^{V2}_{2} & \cdots & E^{V2}_{F} \\ \vdots & \vdots & \ddots & \vdots \\ E^{VN}_{1} & E^{VN}_{2} & \cdots & E^{VN}_{F} \end{Bmatrix}

(3)

Here, $E^{Vi}_{j} \in \mathbb{R}^{D}$ ($i = 1, 2, \dots, N$; $j = 1, 2, \dots, F$) is the embedding vector of the $j$th variable of the $i$th wind turbine, and the embedding function maps $\chi' \in \mathbb{R}^{N \times F \times S}$ (the input window with its time and variable axes transposed) to $\mathbb{R}^{N \times F \times D}$. Mapping the variables to a high-dimensional vector space through this linear transformation enables the model to capture complex relationships that are difficult to represent in the original low-dimensional space and improves its understanding of the changing patterns of wind power.
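A minimal sketch of the variable embedding: each variable’s length-$S$ history becomes one token, and a linear map projects it to $D$ dimensions, turning an $N \times S \times F$ window into an $N \times F \times D$ tensor. The function name and the choice to share one weight matrix `W` and bias `b` across all turbines and variables are assumptions; the paper states only that the mapping is linear.

```python
import numpy as np

def variable_embedding(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Embed each variable's length-S history as one D-dimensional token.

    x: (N, S, F) input window; W: (S, D); b: (D,). Returns (N, F, D).
    Sharing W and b across turbines and variables is an assumption.
    """
    x_t = np.transpose(x, (0, 2, 1))   # (N, F, S): one row per variable
    return x_t @ W + b                # linear map S -> D for every row
```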

Dependencies between variables should be inferred dynamically from the observed data rather than from preset assumptions or fixed rules. TFDGCN uses cosine similarity to measure the similarity between variables and constructs a dynamic sparse graph between variable sequences, as shown below:

\{\Theta^{1}_{V}, \Theta^{2}_{V}, \dots, \Theta^{N}_{V}\} = \mathrm{COS}(E^{V}, E^{V})

(4)

\Theta^{i}_{V} = \begin{bmatrix} \theta^{i}_{1,1} & \theta^{i}_{1,2} & \cdots & \theta^{i}_{1,F} \\ \theta^{i}_{2,1} & \theta^{i}_{2,2} & \cdots & \theta^{i}_{2,F} \\ \vdots & \vdots & \ddots & \vdots \\ \theta^{i}_{F,1} & \theta^{i}_{F,2} & \cdots & \theta^{i}_{F,F} \end{bmatrix}

(5)

\theta^{i}_{j,k} = \frac{\sum_{n=1}^{D} E^{Vi}_{j}(n)\, E^{Vi}_{k}(n)}{\sqrt{\sum_{n=1}^{D} E^{Vi}_{j}(n)^{2}}\, \sqrt{\sum_{n=1}^{D} E^{Vi}_{k}(n)^{2}}}

(6)

where $\mathrm{COS}(\cdot)$ is the cosine similarity function, $\Theta^{i}_{V}$ is the cosine similarity matrix of the variables of the $i$th wind turbine, $i$ is the wind turbine index, $j$ and $k$ are variable indices, $E^{Vi}_{j}(n)$ is the $n$th component of $E^{Vi}_{j}$, and $\theta^{i}_{j,k}$ is the cosine similarity between variables $j$ and $k$. From the similarity matrix in Equation (5), the $k$ most relevant variables are selected for each variable (Equation (7); here $k$ denotes the number of retained neighbors rather than a variable index), and the inter-sequence sparse graph set $M^{V}$ is established (Equation (8)), as shown below:

M^{V}_{i} = \mathrm{TopK}(\Theta^{i}_{V})

(7)

M^{V} = \{M_{1}^{V}, M_{2}^{V}, \dots, M_{N}^{V}\}

(8)

where $M^{V}_{i}$ is the sparse graph between the variable sequences of the $i$th wind turbine, and $M^{V} \in \mathbb{R}^{N \times F \times F}$ is the set of sparse graphs of all wind turbines in the variable dimension.
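Equations (4) to (8) can be sketched as a batched cosine-similarity matrix followed by row-wise top-$k$ sparsification. This is an illustrative sketch, not the authors’ code: the function names are hypothetical, and keeping the similarity values on the retained edges (rather than binary 0/1 edges) is an assumption, since the paper does not say which form TopK returns. Because every embedding has similarity 1 with itself, each node always keeps its own self-edge among its $k$ neighbors.

```python
import numpy as np

def cosine_similarity_matrix(E: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Pairwise cosine similarity of the rows of E along the last axis.

    E: (..., M, D) -> (..., M, M); e.g. E^V of shape (N, F, D) gives (N, F, F).
    """
    norm = np.linalg.norm(E, axis=-1, keepdims=True)
    En = E / np.maximum(norm, eps)             # unit-length rows
    return En @ np.swapaxes(En, -1, -2)        # dot products of unit rows

def topk_sparse_graph(theta: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest entries in each row of theta and zero the rest.

    Retaining similarity values as edge weights is an assumption.
    """
    idx = np.argsort(-theta, axis=-1)[..., :k]  # indices of the k largest
    mask = np.zeros(theta.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return np.where(mask, theta, 0.0)
```

Applied to a variable embedding of shape (N, F, D), the two functions return a similarity tensor and a sparse graph set of shape (N, F, F), with $k$ edges per row; the storage is the same as a dense graph here, but only $k$ neighbors per node contribute to the later graph convolution.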

3.1.2. Temporal Embedding and Construction of Sparse Graph Within Sequence

In wind power forecasting, the goal of time embedding is to characterize the multivariate time series along the time dimension, including short-term power trends, intraday periodic fluctuations, and sudden short-term fluctuations. To capture these temporal characteristics, we follow PatchTST [] and divide the continuous time series into several time patches with local correlation. Patching not only captures the details of variable fluctuations within each patch but also captures trends or cyclical patterns across time periods by modeling correlations between patches. It also shortens the sequence while retaining key temporal information, allowing the model to focus on the local characteristics relevant to short-term prediction and improving the ability of the time embedding to characterize short-term variation in wind power. Specifically, a window $\chi'$ of length $S$ is divided into $P = \lfloor S/l \rfloor + 1$ subsequences, where $l$ is the patch length and the $+1$ accounts for padding the last patch to full length. The time embedding can be expressed as follows:

\begin{Bmatrix} L^{1}_{1} & L^{1}_{2} & \cdots & L^{1}_{P} \\ L^{2}_{1} & L^{2}_{2} & \cdots & L^{2}_{P} \\ \vdots & \vdots & \ddots & \vdots \\ L^{N}_{1} & L^{N}_{2} & \cdots & L^{N}_{P} \end{Bmatrix} = \mathrm{Patch}(\chi')

(9)

E^{T} = \begin{Bmatrix} E^{T1}_{1} & E^{T1}_{2} & \cdots & E^{T1}_{P} \\ E^{T2}_{1} & E^{T2}_{2} & \cdots & E^{T2}_{P} \\ \vdots & \vdots & \ddots & \vdots \\ E^{TN}_{1} & E^{TN}_{2} & \cdots & E^{TN}_{P} \end{Bmatrix} = \mathrm{Embedding}_{T}\left( \begin{Bmatrix} L^{1}_{1} & L^{1}_{2} & \cdots & L^{1}_{P} \\ L^{2}_{1} & L^{2}_{2} & \cdots & L^{2}_{P} \\ \vdots & \vdots & \ddots & \vdots \\ L^{N}_{1} & L^{N}_{2} & \cdots & L^{N}_{P} \end{Bmatrix} \right)

(10)

In the above, $E^{T} \in \mathbb{R}^{N \times F \times P \times D}$ is the time flow embedding, $L^{i}_{j} \in \mathbb{R}^{F \times l}$ ($i = 1, 2, \dots, N$; $j = 1, 2, \dots, P$) is the $j$th subsequence of the $i$th wind turbine, and $E^{Ti}_{j} \in \mathbb{R}^{F \times D}$ ($i = 1, 2, \dots, N$; $j = 1, 2, \dots, P$) is the embedding of the $j$th subsequence of the $i$th wind turbine, with embedding dimension $D$. $\mathrm{Embedding}_{T}$ captures features along the time dimension by mapping the wind power information in each patch into a high-dimensional space.
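The patching step behind $P = \lfloor S/l \rfloor + 1$ can be sketched as follows. The sketch assumes non-overlapping patches (stride equal to $l$) and replication padding of the last time step, as used in PatchTST; with $l$ extra padded steps, exactly $\lfloor S/l \rfloor + 1$ full patches fit. The function name and the (N, F, P, l) output layout are illustrative assumptions, chosen so that `patches[i, :, j, :]` is the $F \times l$ patch $L^{i}_{j}$.

```python
import numpy as np

def patchify(x: np.ndarray, l: int) -> np.ndarray:
    """Split a window into P = floor(S / l) + 1 non-overlapping patches.

    x: (N, S, F). The window is padded by repeating its last time step l
    times (replication padding, assumed here), so the last patch is always
    complete. Returns (N, F, P, l).
    """
    N, S, F = x.shape
    pad = np.repeat(x[:, -1:, :], l, axis=1)     # (N, l, F) copies of last step
    xp = np.concatenate([x, pad], axis=1)         # (N, S + l, F)
    P = S // l + 1
    xp = xp[:, :P * l, :]                          # drop steps beyond P full patches
    patches = xp.reshape(N, P, l, F)               # (N, P, l, F)
    return np.transpose(patches, (0, 3, 1, 2))    # (N, F, P, l)
```

With $S = 10$ and $l = 4$, this yields $P = 3$ patches, the last of which contains steps 8 and 9 followed by two copies of step 9.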

The time-dimension (intra-sequence) sparse graph is constructed in the same way as the inter-variable sparse graph:

\{\Theta^{1}_{T}, \Theta^{2}_{T}, \dots, \Theta^{N}_{T}\} = \mathrm{COS}(E^{T}, E^{T})

(11)

\Theta^{i}_{T} = \begin{bmatrix} \theta^{i}_{1,1} & \theta^{i}_{1,2} & \cdots & \theta^{i}_{1,P} \\ \theta^{i}_{2,1} & \theta^{i}_{2,2} & \cdots & \theta^{i}_{2,P} \\ \vdots & \vdots & \ddots & \vdots \\ \theta^{i}_{P,1} & \theta^{i}_{P,2} & \cdots & \theta^{i}_{P,P} \end{bmatrix}

(12)

where $\theta^{i}_{j,l}$ is the cosine similarity between the $j$th and $l$th time patches of the $i$th wind turbine. From the similarity matrix in Equation (12), we select the $k$ most similar time patches for each patch (Equation (13)) and establish the intra-sequence sparse graph set $M^{T}$ (Equation (14)):

M^{T}_{i} = \mathrm{TopK}(\Theta^{i}_{T})

(13)

M^{T} = \{M_{1}^{T}, M_{2}^{T}, \dots, M_{N}^{T}\}

(14)

Here, $M^{T}_{i}$ is the sparse graph within the variable sequences of the $i$th wind turbine, and $M^{T} \in \mathbb{R}^{N \times P \times P}$ is the set of sparse graphs of all wind turbines in the time dimension.

3.1.3. Node Embedding and Construction of Sparse Graph of Node Relationships

The purpose of node embedding is to learn the relative spatial relationships between wind turbines. A turbine’s own state parameters, such as nacelle yaw angle and blade pitch angle, reflect the real-time operating status of that single turbine, and their correlation with other wind turbines has no clear physical meaning. Using state parameters to define the connections between nodes would therefore introduce complex, hard-to-interpret coupling into the graph structure. Instead, we use environmental parameters such as wind direction and wind speed, together with the prediction target, as inputs to node embedding, which more directly reflects the correlations between wind turbines caused by environmental factors. For example, changes in wind speed and power at adjacent wind turbines are often synchronous or propagate from one turbine to the next. A graph constructed from these parameters is more consistent with the physics of wind farm power generation and makes it easier for the model to capture the dynamic influence between nodes, improving the node embedding’s ability to capture the coordinated operation of wind turbines. Node embedding can be expressed as follows:

\chi'_{\mathrm{Select}} = \mathrm{Select}(\chi')

(15)

E^{N} = \begin{Bmatrix} E^{N1}_{1} & E^{N1}_{2} & \cdots & E^{N1}_{N} \\ E^{N2}_{1} & E^{N2}_{2} & \cdots & E^{N2}_{N} \\ \vdots & \vdots & \ddots & \vdots \\ E^{NF'}_{1} & E^{NF'}_{2} & \cdots & E^{NF'}_{N} \end{Bmatrix} = \mathrm{Embedding}_{N}(\chi'_{\mathrm{Select}})

(16)

where $\chi'_{\mathrm{Select}} \in \mathbb{R}^{N \times F' \times S}$ is the input window tensor after the environmental features are selected, $E^{Ni}_{j} \in \mathbb{R}^{D}$ ($i = 1, 2, \dots, F'$; $j = 1, 2, \dots, N$) is the embedding vector of the $j$th wind turbine under the $i$th variable, and $E^{N} \in \mathbb{R}^{N \times F' \times D}$ is the node flow embedding.

To model correlations between wind turbines, previous studies often constructed geographic graph structures using preset distance thresholds or attenuation functions. However, the influence between wind turbines is not determined solely by geographic distance; it changes dynamically with external conditions, and static distance graphs cannot reflect this time-varying dependence. In addition, some studies [] randomly initialize an embedding matrix and learn a dynamic graph from it during training. This can lead to unstable convergence, causing the learned graph structure to fluctuate considerably across training epochs and reducing prediction stability. To address these problems, TFDGCN constructs the dynamic sparse graph between wind turbine nodes in a data-driven manner:

\{\Theta^{1}_{N}, \Theta^{2}_{N}, \dots, \Theta^{F'}_{N}\} = \mathrm{COS}(E^{N}, E^{N})

(17)

\Theta^{i}_{N} = \begin{bmatrix} \theta^{i}_{1,1} & \theta^{i}_{1,2} & \cdots & \theta^{i}_{1,N} \\ \theta^{i}_{2,1} & \theta^{i}_{2,2} & \cdots & \theta^{i}_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ \theta^{i}_{N,1} & \theta^{i}_{N,2} & \cdots & \theta^{i}_{N,N} \end{bmatrix}

(18)

where $\Theta^{i}_{N}$ is the cosine similarity matrix between wind turbines under the $i$th environmental variable, and $\theta^{i}_{j,l}$ is the cosine similarity between the $j$th and $l$th wind turbines. From the similarity matrix in Equation (18), the $k$ most similar wind turbines are selected for each wind turbine (Equation (19)), and the inter-node sparse graph set $M^{N}$ is established (Equation (20)):

M^{N}_{i} = \mathrm{TopK}(\Theta^{i}_{N})

(19)

M^{N} = \{M_{1}^{N}, M_{2}^{N}, \dots, M_{F^{'}}^{N}\}

(20)

Here, $M^{N}_{i}$ is the sparse graph between nodes under the $i$th environmental variable, and $M^{N} \in \mathbb{R}^{F' \times N \times N}$ is the set of sparse graphs between nodes under all environmental variables.
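The node branch differs from the variable branch mainly in which axis forms the graph: for each selected variable, the $N$ turbines are the nodes. The sketch below chains Equations (15) to (20) under stated assumptions: the column indices in `env_idx`, the shared linear embedding `W`, and keeping similarity values on retained edges are all hypothetical choices, and the embedding is laid out as (F′, N, D) so that one N × N graph is built per variable.

```python
import numpy as np

def node_sparse_graphs(x: np.ndarray, env_idx, W: np.ndarray, k: int) -> np.ndarray:
    """Build one top-k turbine graph per selected variable.

    x: (N, S, F) window; env_idx: hypothetical indices of the environmental
    and target columns (F' of them); W: (S, D) shared linear embedding
    (an assumption). Returns M^N with shape (F', N, N).
    """
    xs = x[:, :, env_idx]                          # Select: (N, S, F')
    E = np.transpose(xs, (2, 0, 1)) @ W            # (F', N, D): one token per turbine
    En = E / np.maximum(np.linalg.norm(E, axis=-1, keepdims=True), 1e-8)
    theta = En @ np.swapaxes(En, -1, -2)           # (F', N, N) cosine similarities
    idx = np.argsort(-theta, axis=-1)[..., :k]     # k most similar turbines per row
    mask = np.zeros(theta.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return np.where(mask, theta, 0.0)
```

Because the graph is recomputed from each input window rather than learned as a free parameter, its structure follows the current operating conditions.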

3.2. Triple-Flow GCN

TFDGCN captures dependencies along the variable, time-patch, and node dimensions by feeding the three dynamic sparse graphs constructed in Section 3.1 into their corresponding GCNs.

Specifically, TFDGCN uses a variable graph convolutional network (Variable-GCN), a time patch graph convolutional network (Time patch-GCN), and a node graph convolutional network (Node-GCN) to capture the dependencies between variables, the dynamic dependencies along the time dimension, and the dynamic dependencies between nodes, respectively. In this way, TFDGCN learns the spatiotemporal features of the data and improves forecasting performance.

3.2.1. Variable-GCN

We construct the sparse graph between sequences for Variable-GCN using the method described in Section 3.1.1. Graph convolution updates each node’s representation with a weighted average of its own features and those of its neighbors, as shown below:

E^{V}_{G} = \sigma\left(\hat{M}^{V} E^{V} W^{V}\right)

(21)

\hat{M}^{V} = D^{-\frac{1}{2}} \left(M^{V} + I\right) D^{-\frac{1}{2}}

(22)

where $E^{V}_{G} \in \mathbb{R}^{N \times F \times D}$ denotes the features learned through graph convolution; $\hat{M}^{V}$ is the normalized adjacency matrix, with self-loops, of the sparse graph between variable sequences (Equation (22)); $M^{V}$ is the sparse graph set from Section 3.1.1; $I$ is the identity matrix; and $D$ is the degree matrix of $M^{V} + I$. This normalization helps stabilize graph convolution training and mitigates vanishing or exploding gradients. $W^{V}$ is a learnable parameter matrix, and $\sigma$ is the activation function, which introduces nonlinearity and enhances the expressive power of the model.
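A minimal sketch of one graph convolution of this form, $\sigma(D^{-1/2}(M + I)D^{-1/2} E W)$, batched over leading axes so the same code serves all three branches. Two choices here are assumptions not stated in the paper: ReLU is used for $\sigma$, and negative edge weights are clipped to zero so that every degree is positive (cosine similarities retained by TopK can in principle be negative).

```python
import numpy as np

def normalize_adjacency(M: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^{-1/2} (M + I) D^{-1/2} for shape (..., n, n).

    Negative weights are clipped to zero (an assumption) so degrees stay positive.
    """
    n = M.shape[-1]
    A = np.maximum(M, 0.0) + np.eye(n)        # add self-loops
    deg = A.sum(axis=-1)                       # degree of M + I, shape (..., n)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Row scaling by d_i^{-1/2} and column scaling by d_j^{-1/2}
    return d_inv_sqrt[..., :, None] * A * d_inv_sqrt[..., None, :]

def gcn_layer(M: np.ndarray, E: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One graph convolution sigma(M_hat E W); E: (..., n, D_in), W: (D_in, D_out).

    ReLU stands in for sigma (an assumption).
    """
    return np.maximum(normalize_adjacency(M) @ E @ W, 0.0)
```

For a graph with no edges, the normalized matrix reduces to the identity, so each node keeps only its own features; for two mutually connected nodes, every entry becomes 0.5.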

3.2.2. Time Patch-GCN

We construct the sparse graph within the sequence for Time patch-GCN using the method in Section 3.1.2 and use graph convolution to explore dynamic dependencies between time slices:

E^{T}_{G} = \sigma\left(\hat{M}^{T} E^{T} W^{T}\right)

(23)

\hat{M}^{T} = D^{-\frac{1}{2}} \left(M^{T} + I\right) D^{-\frac{1}{2}}

(24)

where $E^{T}_{G} \in \mathbb{R}^{N \times F \times P \times D}$ denotes the features learned by graph convolution; $\hat{M}^{T}$ is the normalized adjacency matrix, with self-loops, of the sparse graph within the variable sequence (Equation (24)); $M^{T}$ is the sparse graph set from Section 3.1.2; $I$ is the identity matrix; $D$ is the degree matrix of $M^{T} + I$; $W^{T}$ is a learnable parameter matrix; and $\sigma$ is the activation function.

3.2.3. Node-GCN

We construct the sparse graph of node relationships for Node-GCN using the method in Section 3.1.3; graph convolution updates each wind turbine’s features with a weighted average of its own features and those of its neighbors:

E^{N}_{G} = \sigma\left(\hat{M}^{N} E^{N} W^{N}\right)

(25)

\hat{M}^{N} = D^{-\frac{1}{2}} \left(M^{N} + I\right) D^{-\frac{1}{2}}

(26)

where $E^{N}_{G} \in \mathbb{R}^{N \times F' \times D}$ denotes the features learned by graph convolution; $\hat{M}^{N}$ is the normalized adjacency matrix, with self-loops, of the sparse graph between nodes (Equation (26)); $M^{N}$ is the sparse graph set from Section 3.1.3; $I$ is the identity matrix; $D$ is the degree matrix of $M^{N} + I$; $W^{N}$ is a learnable parameter matrix; and $\sigma$ is the activation function.

3.3. Triple-Flow Linear Attention Layer

Variable-GCN, Time patch-GCN, and Node-GCN are each followed by a linear attention encoder with rotary position encoding (RoPE) and local position encoding (LePE). The linear attention mechanism learns global dependencies between variable sequences, within variable sequences, and between wind turbines at reduced computational cost, improving the model’s representational capability. The detailed structure of the linear attention module is shown in Figure 2. The features learned by the dynamic graph convolution in Section 3.2 serve as the input for the query Q, key K, and value V. Q and K are projected by linear layers and activated with the ELU function; RoPE then injects position information into Q and K, and LePE adds local positional information to V, which compensates for the weak local modeling of global attention []. The attention output then passes through a residual connection and layer normalization.

Figure 2. Linear Attention Module.

3.3.1. Variable Flow Attention Correlation

The linear attention mechanism [,] computes the global correlations between variables and uses them for weighted feature aggregation, while reducing the model's computational complexity. The update is as follows:

$$E_{G}^{V} = \begin{bmatrix} E_{G1}^{V1} & E_{G2}^{V1} & \cdots & E_{GF}^{V1} \\ E_{G1}^{V2} & E_{G2}^{V2} & \cdots & E_{GF}^{V2} \\ \vdots & \vdots & \ddots & \vdots \\ E_{G1}^{VN} & E_{G2}^{VN} & \cdots & E_{GF}^{VN} \end{bmatrix} \to E_{A}^{V} = \begin{bmatrix} E_{A1}^{V1} & E_{A2}^{V1} & \cdots & E_{AF}^{V1} \\ E_{A1}^{V2} & E_{A2}^{V2} & \cdots & E_{AF}^{V2} \\ \vdots & \vdots & \ddots & \vdots \\ E_{A1}^{VN} & E_{A2}^{VN} & \cdots & E_{AF}^{VN} \end{bmatrix} \tag{27}$$

Through the linear attention mechanism, we update the feature $E_{G}^{V}$ learned by Variable-GCN (Section 3.2.1) to $E_{A}^{V}$. In Equation (28), $E_{Ai}^{Vm}$ is the updated feature of the $i$th variable for wind turbine $m$, and $L_{j}^{Vm}$ is the learnable local position encoding of the $j$th variable for wind turbine $m$. The relationship coefficient $a_{ij}^{m}$, defined in Equation (29), determines how much information the $i$th variable receives from the $j$th variable:

$$E_{Ai}^{Vm} = \sum_{j=1}^{F} a_{ij}^{m} \left(W_{v}^{Vm} E_{Gj}^{Vm} + L_{j}^{Vm}\right) \tag{28}$$

$$a_{ij}^{m} = \frac{\kappa\left(\mathrm{RoPE}_{i}\left(W_{q}^{Vm} E_{Gi}^{Vm}\right)\right)^{\top} \kappa\left(\mathrm{RoPE}_{j}\left(W_{k}^{Vm} E_{Gj}^{Vm}\right)\right)}{\sum_{l=1}^{F} \kappa\left(\mathrm{RoPE}_{i}\left(W_{q}^{Vm} E_{Gi}^{Vm}\right)\right)^{\top} \kappa\left(\mathrm{RoPE}_{l}\left(W_{k}^{Vm} E_{Gl}^{Vm}\right)\right)} \tag{29}$$

Here, $\mathrm{RoPE}_{i}$ applies the rotational position encoding for position $i$, $\kappa$ is the kernel function (ELU), and $W_{q}^{Vm}$, $W_{k}^{Vm}$, and $W_{v}^{Vm}$ are learnable parameters.
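Equations (28) and (29) can be evaluated without forming the F × F coefficient matrix, which is where linear attention saves computation: because the numerator factorizes as $\kappa(q_i)^{\top}\kappa(k_j)$, the sums over $j$ can be computed once and shared by every query. The sketch below assumes the queries and keys have already been projected and RoPE-rotated and that the values already include the local position encoding. It uses $\kappa(x) = \mathrm{ELU}(x) + 1$, the positive feature map of Katharopoulos et al., in place of plain ELU, because plain ELU can make the denominator zero or negative; that substitution is our assumption. The factorized result is checked against the explicit $a_{ij}$ form.

```python
import math


def elu_plus_one(x):
    """Positive feature map phi(x) = ELU(x) + 1 (assumed kernel, see lead-in)."""
    return x + 1.0 if x > 0 else math.exp(x)


def linear_attention(queries, keys, values):
    """Eqs. (28)-(29) in O(n * d * d_v): out_i = phi(q_i)^T S / phi(q_i)^T z."""
    phi_q = [[elu_plus_one(x) for x in q] for q in queries]
    phi_k = [[elu_plus_one(x) for x in k] for k in keys]
    d, d_v = len(phi_k[0]), len(values[0])
    # S = sum_j phi(k_j) v_j^T (d x d_v) and z = sum_j phi(k_j) are shared by all queries.
    s = [[sum(pk[a] * v[b] for pk, v in zip(phi_k, values)) for b in range(d_v)] for a in range(d)]
    z = [sum(pk[a] for pk in phi_k) for a in range(d)]
    out = []
    for pq in phi_q:
        denom = sum(pq[a] * z[a] for a in range(d))
        out.append([sum(pq[a] * s[a][b] for a in range(d)) / denom for b in range(d_v)])
    return out


def explicit_attention(queries, keys, values):
    """Same result through explicit coefficients a_ij, at O(n^2) cost."""
    phi_q = [[elu_plus_one(x) for x in q] for q in queries]
    phi_k = [[elu_plus_one(x) for x in k] for k in keys]
    out = []
    for pq in phi_q:
        scores = [sum(a * b for a, b in zip(pq, pk)) for pk in phi_k]
        total = sum(scores)
        weights = [sc / total for sc in scores]  # a_ij, each row sums to 1
        out.append([sum(w * v[b] for w, v in zip(weights, values)) for b in range(len(values[0]))])
    return out


qs = [[0.2, -0.5], [1.0, 0.3], [-0.7, 0.1]]
ks = [[0.4, 0.4], [-1.1, 0.6], [0.0, -0.2]]
vs = [[1.0, 2.0, 0.0], [0.5, -1.0, 3.0], [2.0, 0.0, 1.0]]
print(linear_attention(qs, ks, vs))
```

For the node flow with N = 134 turbines, the explicit form builds a 134 × 134 coefficient matrix per variable, whereas the factorized form only builds a d × d_v summary.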

3.3.2. Time Flow Attention Correlation

The linear attention mechanism computes the global correlations between time slices at reduced complexity and uses them for weighted feature aggregation:

$$E_{Ai}^{Tm} = \sum_{j=1}^{P} a_{ij}^{m} \left(W_{v}^{Tm} E_{Gj}^{Tm} + L_{j}^{Tm}\right) \tag{30}$$

In Equation (30), $E_{Ai}^{Tm}$ is the updated feature of wind turbine $m$ at the $i$th time slice, and $L_{j}^{Tm}$ is the local position encoding of wind turbine $m$ at the $j$th time slice. The local position encoding helps the model capture short-term, high-frequency temporal dependencies by emphasizing nearby time slices, which reduces interference from less relevant history in long sequences. The relationship coefficient $a_{ij}^{m}$, defined in Equation (31), determines how much information the $i$th time slice receives from the $j$th time slice:

$$a_{ij}^{m} = \frac{\kappa\left(\mathrm{RoPE}_{i}\left(W_{q}^{Tm} E_{Gi}^{Tm}\right)\right)^{\top} \kappa\left(\mathrm{RoPE}_{j}\left(W_{k}^{Tm} E_{Gj}^{Tm}\right)\right)}{\sum_{l=1}^{P} \kappa\left(\mathrm{RoPE}_{i}\left(W_{q}^{Tm} E_{Gi}^{Tm}\right)\right)^{\top} \kappa\left(\mathrm{RoPE}_{l}\left(W_{k}^{Tm} E_{Gl}^{Tm}\right)\right)} \tag{31}$$

Here, $\mathrm{RoPE}_{i}$ applies the rotational position encoding for position $i$, $\kappa$ is the kernel function (ELU), and $W_{q}^{Tm}$, $W_{k}^{Tm}$, and $W_{v}^{Tm}$ are learnable parameters.
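The text describes $L_{j}^{Tm}$ only as a learnable local position encoding that emphasizes neighboring time slices. A common way to realize such an encoding is a depthwise one-dimensional convolution over the value vectors, so that each position sees only a small window of neighbors. The sketch below assumes that form, with zero padding and one odd-length kernel per channel; the function name, kernel size, and padding are hypothetical choices, not details from the paper.

```python
def local_position_encoding(values, kernels):
    """Depthwise 1-D convolution over positions with zero padding.

    values:  n vectors (one per time slice), each of length d.
    kernels: d odd-length weight lists, one per channel.
    Returns L with L[i][c] = sum_t kernels[c][t] * values[i + t - r][c], r = len // 2.
    """
    n, d = len(values), len(values[0])
    if len(kernels) != d:
        raise ValueError("need one kernel per channel")
    out = [[0.0] * d for _ in range(n)]
    for c, w in enumerate(kernels):
        if len(w) % 2 == 0:
            raise ValueError("kernel length must be odd")
        r = len(w) // 2
        for i in range(n):
            for t, wt in enumerate(w):
                j = i + t - r  # neighbor position covered by tap t
                if 0 <= j < n:  # zero padding outside the sequence
                    out[i][c] += wt * values[j][c]
    return out


# Four time slices, one channel, a width-3 averaging-style kernel.
print(local_position_encoding([[1.0], [2.0], [3.0], [4.0]], [[1.0, 1.0, 1.0]]))
```

With a width-3 kernel, the encoding at slice $i$ depends only on slices $i-1$, $i$, and $i+1$, which is the locality the text attributes to LePE.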

3.3.3. Node Flow Attention Correlation

The node flow also uses the linear attention mechanism:

$$E_{Ai}^{Nm} = \sum_{j=1}^{N} a_{ij}^{m} \left(W_{v}^{Nm} E_{Gj}^{Nm} + L_{j}^{Nm}\right) \tag{32}$$

In Equation (32), $E_{Ai}^{Nm}$ is the updated feature of the $i$th wind turbine for environmental variable $m$, and $L_{j}^{Nm}$ is the local position encoding of the $j$th wind turbine for environmental variable $m$. This encoding strengthens the dependencies between nearby wind turbines, reflecting that turbines at short distances influence each other much more strongly than turbines far apart. The relationship coefficient $a_{ij}^{m}$, defined in Equation (33), determines how much information the $i$th wind turbine receives from the $j$th wind turbine:

$$a_{ij}^{m} = \frac{\kappa\left(\mathrm{RoPE}_{i}\left(W_{q}^{Nm} E_{Gi}^{Nm}\right)\right)^{\top} \kappa\left(\mathrm{RoPE}_{j}\left(W_{k}^{Nm} E_{Gj}^{Nm}\right)\right)}{\sum_{l=1}^{N} \kappa\left(\mathrm{RoPE}_{i}\left(W_{q}^{Nm} E_{Gi}^{Nm}\right)\right)^{\top} \kappa\left(\mathrm{RoPE}_{l}\left(W_{k}^{Nm} E_{Gl}^{Nm}\right)\right)} \tag{33}$$

Here, $\mathrm{RoPE}_{i}$ applies the rotational position encoding for position $i$, $\kappa$ is the kernel function (ELU), and $W_{q}^{Nm}$, $W_{k}^{Nm}$, and $W_{v}^{Nm}$ are learnable parameters.

3.4. Fusion Prediction Module

After the triple-flow linear attention layer, we obtain features along three dimensions: variable, time, and node. The fusion prediction module combines these three sets of features into the final prediction using concatenation followed by a fully connected layer. Before concatenation, a linear transformation maps each set of features into a space of the same dimension, so that the concatenation is well defined and the subsequent fusion is stable. Specifically, the features are transformed as follows:

$$E_{A}^{V} \in \mathbb{R}^{N \times F \times D} \to \mathbb{R}^{N \times D \times 1}, \quad E_{A}^{T} \in \mathbb{R}^{N \times F \times P \times D} \to \mathbb{R}^{N \times D \times 1}, \quad E_{A}^{N} \in \mathbb{R}^{N \times F' \times D} \to \mathbb{R}^{N \times D \times 1}$$

We then concatenate the three feature tensors along the second dimension; $E_{\mathrm{cat}}$ denotes the concatenated feature:

$$E_{\mathrm{cat}} = \mathrm{Concat}\left(E_{A}^{V}, E_{A}^{T}, E_{A}^{N}\right) \in \mathbb{R}^{N \times 3D \times 1} \tag{34}$$

Finally, a fully connected layer maps the hidden dimension of each wind turbine to the output time window, giving the final prediction:

$$E_{\mathrm{out}} = E_{\mathrm{cat}} W^{\mathrm{out}} \in \mathbb{R}^{N \times S' \times 1} \tag{35}$$

where the weight matrix $W^{\mathrm{out}}$ is applied along the second (feature) dimension.

This design retains the feature information of the variable, time, and node flows, while the fully connected layer learns more expressive fused features from them. Together, these two properties improve the model's performance.
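Equations (34) and (35) amount to concatenating three D-dimensional vectors per turbine and applying one linear map from 3D to S' outputs. The sketch below starts from features already projected to a common size D (the projection layers are not specified in this section, so they are left out) and omits a bias by default, as Equation (35) does; the function name and argument layout are hypothetical.

```python
def fuse_and_predict(e_var, e_time, e_node, w_out, bias=None):
    """Concatenate per-turbine flow features and map 3D -> S' (Eqs. 34-35).

    e_var, e_time, e_node: N lists of D floats, already projected to a common D.
    w_out: 3D x S' weight matrix; bias: optional list of S' floats.
    Returns N lists of S' predictions.
    """
    if not (len(e_var) == len(e_time) == len(e_node)):
        raise ValueError("all flows must cover the same N turbines")
    s_out = len(w_out[0])
    preds = []
    for v, t, n in zip(e_var, e_time, e_node):
        cat = v + t + n  # one row of E_cat, length 3D
        if len(cat) != len(w_out):
            raise ValueError("w_out must have 3D rows")
        row = [sum(cat[k] * w_out[k][s] for k in range(len(cat))) for s in range(s_out)]
        if bias is not None:
            row = [r + b for r, b in zip(row, bias)]
        preds.append(row)
    return preds


# Two turbines, D = 1, S' = 2 output steps.
print(fuse_and_predict([[1.0], [0.0]], [[2.0], [1.0]], [[3.0], [2.0]],
                       [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```

Because the fully connected layer sees all 3D features of a turbine at once, each output step can weight the variable, time, and node flows differently.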

3.5. Loss Function

TFDGCN is trained with the mean squared error (MSE) loss, defined as follows:

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N \times S'} \sum_{i=1}^{N} \sum_{j=1}^{S'} \left(E_{\mathrm{out},i,j} - E_{\mathrm{True},i,j}\right)^{2} \tag{36}$$

where $E_{\mathrm{True}}$ is the actual wind power, $N$ is the number of wind turbines, and $S'$ is the length of the prediction time window.

4. Experiments

4.1. Data Description

We conduct experiments on two datasets, SDWPF and SD23, collected from two real wind farms. (1) SDWPF [] contains real wind power data from Longyuan Power Group. It has 4,727,520 records sampled every 10 min over 245 days from a wind farm with 134 wind turbines. Each record has 13 attributes, including environmental features such as wind speed, wind direction, and temperature, and turbine state features such as nacelle temperature and nacelle orientation. (2) SD23 contains wind power data from a wind farm in Shandong Province, China. It has 2,170,800 records sampled every 10 min, covering 335 days of data from 45 wind turbines between 1 January 2023 and 31 December 2023. Each record has eight attributes, including wind speed, yaw angle, and actual power.
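The record counts follow directly from the sampling rate: one record every 10 min gives 144 records per turbine per day, so each total equals turbines × days × 144. A quick check:

```python
STEPS_PER_DAY = 24 * 60 // 10  # 10-min sampling -> 144 records per turbine per day


def expected_records(turbines, days):
    """Total records for a complete turbine-by-time grid."""
    return turbines * days * STEPS_PER_DAY


print(expected_records(134, 245))  # SDWPF
print(expected_records(45, 335))   # SD23
```

Both products match the stated totals exactly, so each dataset forms a complete 10-min grid over its stated days and turbines.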

For SDWPF, we use one day of history to predict the wind power of the next day: the previous 144 time steps (one day at 10-min resolution) are used to predict the next 144 time steps. The training, validation, and test sets span 155, 30, and 60 days, respectively. For SD23, we likewise use one day of history to predict the next day, and split the data into 235, 67, and 33 days for training, validation, and testing, respectively.
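The day-ahead setup can be sketched as a chronological split by day, followed by pairing each day of history with the next day's 144 steps. The one-day stride between samples and the rule that input and target stay inside the same split are assumptions for illustration; the text does not state how samples were drawn, and the helper names are hypothetical.

```python
STEPS_PER_DAY = 144  # 10-min resolution


def chronological_split(num_days, train_days, val_days):
    """Return (train, val, test) day-index ranges in time order."""
    if num_days - train_days - val_days <= 0:
        raise ValueError("split leaves no test days")
    return (range(0, train_days),
            range(train_days, train_days + val_days),
            range(train_days + val_days, num_days))


def day_ahead_pairs(series, days):
    """Pair day d (input) with day d + 1 (target), both inside `days`.

    series: per-step values for one turbine; days: consecutive day indices.
    """
    days = list(days)
    pairs = []
    for d in days[:-1]:
        start = d * STEPS_PER_DAY
        x = series[start:start + STEPS_PER_DAY]                      # previous 144 steps
        y = series[start + STEPS_PER_DAY:start + 2 * STEPS_PER_DAY]  # next 144 steps
        pairs.append((x, y))
    return pairs


train, val, test = chronological_split(245, 155, 30)  # SDWPF split
print(len(train), len(val), len(test), test.start)
```

Splitting by contiguous blocks of days, rather than shuffling samples, keeps the test period strictly after the training period.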

Both datasets can be used for short-term, medium-term, and long-term forecasting tasks. Short-term forecasting covers horizons from a few hours to a few days and supports economic dispatch decisions, reserve requirements, and electricity market bidding strategies. Medium-term forecasting covers a few days to a few weeks and supports unit commitment and maintenance scheduling. Long-term forecasting covers a few weeks to several months or even years and is mainly used for wind power capacity planning and power system planning. Our day-ahead task is a short-term forecasting task.

For the SDWPF and SD23 datasets, the Top_k values for the time flow, variable flow, and node flow were set to [8, 8, 8] and [8, 3, 8], respectively. The time patch length (patch_len) was 8, the hidden feature dimension was 128, and the number of attention heads was 2. The dropout rate of the linear attention encoder was 0.1, and the initial learning rate was 1 × 10⁻⁴. The model was trained with the Adam optimizer to minimize the MSE loss, and training was terminated by early stopping. The model with the best validation performance was selected as the final model and evaluated on the test set. Table 1 lists the hyperparameters and their meanings.
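For reference, the hyperparameters reported above can be gathered into one configuration. Keys other than those in Table 1 (patch_len, dropout, n_heads, top_k) are hypothetical names, and the (time, variable, node) ordering of top_k follows the order used in the text.

```python
# Hyperparameters from Section 4.1; "d_model", "lr", "optimizer", and "loss" are assumed key names.
CONFIG = {
    "patch_len": 8,      # length of time patches in the time flow
    "d_model": 128,      # hidden feature dimension
    "n_heads": 2,        # attention heads
    "dropout": 0.1,      # linear attention encoder dropout
    "lr": 1e-4,          # initial learning rate
    "optimizer": "Adam",
    "loss": "MSE",
    "top_k": {           # (time flow, variable flow, node flow)
        "SDWPF": (8, 8, 8),
        "SD23": (8, 3, 8),
    },
}

print(CONFIG["top_k"]["SD23"])
```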

Table 1. Hyperparameters and their meanings.

4.2. Evaluation Metrics

We use the mean absolute error (MAE) and root mean square error (RMSE) metrics to evaluate the performance of all methods on the entire wind farm. These two metrics are defined as follows:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left|\mathrm{pred}_{i} - \mathrm{gt}_{i}\right| \tag{37}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(\mathrm{pred}_{i} - \mathrm{gt}_{i}\right)^{2}} \tag{38}$$

where $N$ is the number of wind turbines, $\mathrm{pred}_{i}$ is the predicted value for wind turbine $i$, and $\mathrm{gt}_{i}$ is the actual value for wind turbine $i$. Note that SDWPF [] specifies conditions under which data values are considered abnormal; for example, an actual power value below 0 is treated as abnormal. Abnormal and missing values are therefore excluded from model computation and evaluation.
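A minimal sketch of Equations (37) and (38) that skips invalid entries. Only the validity rule stated above is implemented (missing values and negative actual power are excluded); SDWPF defines further abnormality conditions that this hypothetical helper does not cover. NaN is treated as missing because `nan >= 0` is false in Python.

```python
import math


def masked_mae_rmse(preds, truth):
    """MAE and RMSE over entries whose ground truth is present and non-negative."""
    errors = [p - g for p, g in zip(preds, truth) if g is not None and g >= 0]
    if not errors:
        raise ValueError("no valid entries to evaluate")
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# The None entry (missing) and the -1.0 entry (abnormal) are skipped.
print(masked_mae_rmse([1.0, 2.0, 3.0, 4.0], [1.5, None, -1.0, 2.0]))
```

Here $N$ in Equations (37) and (38) becomes the number of valid entries, so excluded values do not dilute the averages.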

4.3. Model Comparison

We compare TFDGCN with eight methods: four time-series forecasting models, LSTM [], GRU [], LSTM + Transformer [], and TSAT [], and four spatiotemporal forecasting models, MTGNN [], DCRNN [], Bi-STAT [], and AGCRN []. AGCRN and Bi-STAT are based on static graph networks and take as input static graphs constructed from Euclidean distances. MTGNN and DCRNN are based on dynamic graph networks, in which the graph adjacency changes over time. All methods use the data splits described in Section 4.1.

Table 2 and Table 3 report the performance of all models on the SDWPF and SD23 datasets, respectively. TFDGCN achieves the lowest MAE and RMSE on both datasets, indicating good predictive performance and generalization. In Table 2, the spatiotemporal models AGCRN, Bi-STAT, and DCRNN achieve lower MAE than the temporal models LSTM, GRU, and LSTM + Transformer, and MTGNN achieves lower MAE than LSTM and GRU. This suggests that, by integrating temporal and spatial information, spatiotemporal models can capture the complex patterns of wind power data more comprehensively than purely temporal models. In Table 3, the dynamic-graph models MTGNN and DCRNN outperform the static-graph models AGCRN and Bi-STAT in both MAE and RMSE, indicating that dynamic graphs better capture the time-varying spatiotemporal dependencies in wind farms.

Table 2. Performance of different models on the SDWPF dataset.

Table 3. Performance of different models on the SD23 dataset.

To further evaluate TFDGCN, we visualized its predictions on the SDWPF dataset. We summed the power of all turbines from 4 to 5 August 2023 to obtain the total power of the wind farm and plotted it together with the TFDGCN predictions for the same period. Figure 3 shows the error between the predicted and actual power.

Figure 3. The error between the predicted and actual power.

4.4. Analysis of Top_k Values

We studied the choice of Top_k for the variable flow, time flow, and node flow of TFDGCN on both datasets. In each experiment, we fixed the Top_k values of two flows and varied the Top_k value of the remaining flow. With the Top_k values of the variable flow and time flow fixed, Figure 4a,b show how the node-flow Top_k value affects MAE and RMSE on SDWPF, and Figure 4c,d show the same on SD23. As Figure 4 shows, on both datasets MAE and RMSE first decrease and then increase as Top_k grows; as shown in Figure 4a–c, the optimal Top_k value is 8. A small Top_k means few neighbors, so graph convolution aggregates limited information from neighboring nodes. A large Top_k means many neighbors, so aggregation is affected by neighbors that are irrelevant to the target node; these irrelevant neighbors interfere with learning and degrade wind power prediction. We studied the Top_k values of the variable flow and time flow in the same way. On the SDWPF dataset, the effects of the variable-flow and time-flow Top_k values on MAE and RMSE are shown in Figure 5a,b and Figure 6a,b, respectively. Based on these results, the Top_k values of the time flow, variable flow, and node flow were all set to 8 for SDWPF.
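To make the role of Top_k concrete, the sketch below builds a sparse graph by keeping, for each node, its k most cosine-similar other nodes. It is a simplified stand-in for the construction in Section 3.1, which is not reproduced here: the 0/1 edge weights, the exclusion of self-edges (self-loops are added later through $M^{N} + I$), tie-breaking by index, and the function names are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def topk_cosine_graph(features, k):
    """0/1 adjacency keeping each node's k most cosine-similar other nodes."""
    n = len(features)
    sims = [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # sorted() is stable, so ties keep the lower index first.
        ranked = sorted((j for j in range(n) if j != i), key=lambda j: sims[i][j], reverse=True)
        for j in ranked[:k]:
            adj[i][j] = 1
    return adj


# Two pairs of similar turbines: with k = 1 each node links only to its twin.
print(topk_cosine_graph([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], k=1))
```

Raising k adds progressively less similar nodes to each neighborhood, which is the mechanism behind the rise in error at large Top_k described above.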

Figure 4. Selection of the Top_k value for the node flow on the SDWPF and SD23 datasets.

Figure 5. Selection of the Top_k value for the variable flow on the SDWPF dataset.

Figure 6. Selection of the Top_k value for the time flow on the SDWPF dataset.

4.5. Ablation Studies

4.5.1. Ablation Study on the Effectiveness of the Triple-Flow Structure

To assess the effectiveness of the triple-flow architecture of TFDGCN, we compared it with three alternative architectures: T&N removes the variable flow and retains the time flow and node flow; T&V removes the node flow and retains the time flow and variable flow; and T&V&StaticN keeps all three branches but replaces the dynamic graph of the node flow with a static graph based on Euclidean distance. Table 4 presents the results. Removing the variable flow, removing the node flow, or using a static node graph all degrade performance. On SDWPF, the MAE of T&N and T&V is about 1.9% and 1.1% higher than that of TFDGCN, respectively (about 4.1% and 1.6% on SD23). This indicates that both the variable flow and the node flow improve the performance of TFDGCN. In addition, T&V&StaticN performs worse than TFDGCN on both datasets, which shows the benefit of dynamic graphs.
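The percentages above can be recomputed from the MAE columns of Table 4:

```python
def relative_increase(value, reference):
    """Percentage by which `value` exceeds `reference`."""
    return 100.0 * (value - reference) / reference


# MAE values from Table 4; TFDGCN is the reference in each dataset.
MAE = {
    "SDWPF": {"TFDGCN": 37.16, "T&V": 37.56, "T&N": 37.88, "T&V&StaticN": 37.35},
    "SD23": {"TFDGCN": 14.63, "T&V": 14.86, "T&N": 15.23, "T&V&StaticN": 14.75},
}

for dataset, rows in MAE.items():
    ref = rows["TFDGCN"]
    for variant in ("T&N", "T&V", "T&V&StaticN"):
        print(f"{dataset} {variant}: +{relative_increase(rows[variant], ref):.1f}% MAE")
```

The gaps are larger on SD23 than on SDWPF for both T&N and T&V, so the contribution of each branch appears to depend on the dataset.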

Table 4. Ablation study of the effectiveness of the triple flow structure.

4.5.2. Ablation Study on the Effectiveness of the Attention Mechanism

We removed the linear attention module from TFDGCN; Table 5 shows the results. Without the module, MAE increases from 37.16 to 38.87 on SDWPF and from 14.63 to 14.98 on SD23, while RMSE increases slightly (from 44.84 to 44.86 and from 17.56 to 17.63). This indicates that the linear attention module captures global dependencies effectively and contributes to the performance of TFDGCN.

Table 5. Ablation study on the effectiveness of attention mechanism.

To verify the inference-time advantage of linear attention, we compared the inference time of TFDGCN using the linear attention encoder with that using a global attention encoder. The experiment used 60 batches of data from day 185 to day 245, where each batch contains one day of monitoring data for 134 wind turbines with 13 fields. As Table 6 shows, the linear attention encoder reduces the average inference time per batch from 41.168 ms to 26.685 ms.
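The values in Table 6 imply a reduction of about 35% in average inference time, roughly a 1.54× speedup. Dividing each total by its average gives about 59 rather than 60; one possible reading, which is our assumption and not stated in the text, is that 60 test days yield 59 consecutive day-ahead input/target pairs.

```python
# Table 6 values, in milliseconds.
AVG_LINEAR, TOTAL_LINEAR = 26.685, 1574.421
AVG_GLOBAL, TOTAL_GLOBAL = 41.168, 2428.925


def summarize():
    """Relative saving, speedup, and implied batch counts from Table 6."""
    reduction_pct = 100.0 * (AVG_GLOBAL - AVG_LINEAR) / AVG_GLOBAL
    speedup = AVG_GLOBAL / AVG_LINEAR
    batches = (TOTAL_LINEAR / AVG_LINEAR, TOTAL_GLOBAL / AVG_GLOBAL)
    return reduction_pct, speedup, batches


reduction_pct, speedup, batches = summarize()
print(f"average time reduced by {reduction_pct:.1f}% ({speedup:.2f}x faster)")
print(f"total / average ratio: {batches[0]:.2f} and {batches[1]:.2f}")
```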

Table 6. Inference time.

5. Conclusions

We propose TFDGCN, a novel spatiotemporal model for short-term wind power forecasting. TFDGCN has a three-branch structure that models the variable, time, and node dimensions separately to capture the heterogeneous dependencies in spatiotemporal sequence data. Experimental results show that TFDGCN outperforms traditional temporal models and the state-of-the-art baselines evaluated. The ablation experiments confirm the effectiveness of the three-branch structure: removing the variable flow or the node flow, or replacing the dynamic node graph with a static graph, degrades performance. Removing the linear attention module also reduces prediction accuracy, which indicates the role of linear attention in extracting global dependencies.

Recent research shows that applying wind power forecasting jointly with power dispatch optimization can improve the quality of dispatch strategies. In future work, we will study the synergy between wind power forecasting and power dispatch optimization to further reduce the operating costs of the power grid and ensure the safe and stable operation of the power system.

Author Contributions

Conceptualization, methodology, B.L.; software, validation, visualization, B.D.; writing—original draft preparation, writing—review and editing, W.P. and H.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Research Project of Jilin Provincial Department of Education under grant no. JJKH20240153KJ.

Data Availability Statement

Data are available in a publicly accessible repository. The SDWPF dataset used in this study is openly available at https://aistudio.baidu.com/competition/detail/152/0/datasets. The SD23 dataset was obtained from a third party and is available from the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Karmakar, S.D.; Chattopadhyay, H. A comprehensive look into the sustainability of wind power. Renew. Sustain. Energy Rev. 2025, 217, 115694.
Zhuang, Y.; Cheng, L.; Wan, C.; Xie, R.; Qi, N.; Chen, Y. A weighted predict-and-optimize framework for power system operation considering varying impacts of uncertainty. IEEE Trans. Power Syst. 2025; submitted.
Qu, K.; Si, G.Q.; Wang, Q.Y.; Xu, M.L.; Shan, Z.H. Improving economic operation of a microgrid through expert behaviors and prediction intervals. Appl. Energy 2025, 383, 125391.
Ullah, F.; Zhang, X.; Khan, M.; Mastoi, M.S.; Munir, H.M.; Flah, A.; Said, Y. A comprehensive review of wind power integration and energy storage technologies for modern grid frequency regulation. Heliyon 2024, 10, e30466.
Liu, Z.-H.; Wang, C.-T.; Wei, H.-L.; Zheng, B.; Li, M.; Song, X.-P. A wavelet-LSTM model for short-term wind power forecasting using wind farm SCADA data. Expert Syst. Appl. 2024, 247, 123237.
Li, Q.; Ren, X.; Zhang, F.; Gao, L.; Hao, B. A novel ultra-short-term wind power forecasting method based on TCN and Informer models. Comput. Electr. Eng. 2024, 120, 109632.
Wang, C.; Lin, H.; Yang, M.; Chen, L. Ultra-short-term wind farm cluster interval power prediction based on cluster division and MQ-WaveNet-MSA. Electr. Power Syst. Res. 2025, 244, 111577.
Zhu, G.; Jia, W.; Cheng, L.; Xiang, L.; Hu, A. A Non-stationary Transformer model for power forecasting with dynamic data distillation and wake effect correction suitable for large wind farms. Energy Convers. Manag. 2025, 324, 119292.
Hou, G.; Li, Q.; Huang, C. Spatiotemporal forecasting using multi-graph neural network assisted dual domain transformer for wind power. Energy Convers. Manag. 2025, 325, 119393.
Yang, M.; Ju, C.; Huang, Y.; Guo, Y.; Jia, M. Short-Term Power Forecasting of Wind Farm Cluster Based on Global Information Adaptive Perceptual Graph Convolution Network. IEEE Trans. Sustain. Energy 2024, 15, 2063–2076.
Daenens, S.; Verstraeten, T.; Daems, P.-J.; Nowé, A.; Helsen, J. Spatio-temporal graph neural networks for power prediction in offshore wind farms using SCADA data. Wind Energy Sci. 2025, 10, 1137–1152.
An, Y.; Zhang, Y.; Lin, J.; Yi, Y.; Fan, W.; Cai, Z. Ultra-Short-Term Power Prediction of Large Offshore Wind Farms Based on Spatiotemporal Adaptation of Wind Turbines. Processes 2024, 12, 696.
Yang, M.; Han, C.; Zhang, W.; Wang, B. A short-term power prediction method for wind farm cluster based on the fusion of multi-source spatiotemporal feature information. Energy 2024, 294, 130770.
Zhao, Y.; Liao, H.; Pan, S.; Zhao, Y. Interpretable multi-graph convolution network integrating spatio-temporal attention and dynamic combination for wind power forecasting. Expert Syst. Appl. 2024, 255, 124766.
Zhao, H.; Li, G.; Chen, R.; Zhen, Z.; Wang, F. Ultra-short-term Power Forecasting of Wind Farm Cluster Based on Spatio-temporal Graph Neural Network Pattern Prediction. In Proceedings of the 2022 IEEE Industry Applications Society Annual Meeting (IAS), Detroit, MI, USA, 9–14 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–25.
Han, C.; Chen, Z. A Short-Term Power Prediction Method for Wind Farm Cluster Based on Time Dynamic Graph Network and Improved Transformer Model. In Proceedings of the 2025 International Russian Smart Industry Conference (SmartIndustryCon), Sochi, Russia, 24–28 March 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 285–290.
Song, Y.; Tang, D.; Yu, J.; Yu, Z.; Li, X. Short-Term Forecasting Based on Graph Convolution Networks and Multiresolution Convolution Neural Networks for Wind Power. IEEE Trans. Ind. Inform. 2023, 19, 1691–1702.
Dong, X.; Sun, Y.; Li, Y.; Wang, X.; Pu, T. Spatio-temporal Convolutional Network Based Power Forecasting of Multiple Wind Farms. J. Mod. Power Syst. Clean Energy 2022, 10, 388–398.
Ye, J.; Li, J.; Su, R.; Yang, S.; Huang, Y.; Zhao, C. DFGCN: Decoupled dual-flow dynamic graph convolutional network for multivariate time series forecasting. Knowl.-Based Syst. 2025, 323, 113720.
Kim, T.; Kim, J.; Tae, Y.; Park, C.; Choi, J.-H.; Choo, J. Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual, 25–29 April 2022.
Nie, Y.Q.; Nguyen, N.H.; Sinthong, P.W.; Kalagnanam, J. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In Proceedings of the International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
Liu, G.; Zhang, Y.; Zhang, P.; Gu, J. Spatiotemporal Graph Contrastive Learning for Wind Power Forecasting. IEEE Trans. Sustain. Energy 2025, 16, 1889–1902.
Han, D.; Wang, Z.; Xia, Z.; Han, Y.; Pu, Y.; Ge, C.; Song, J.; Song, S.; Zheng, B.; Huang, G. Demystify Mamba in Vision: A Linear Attention Perspective. In Advances in Neural Information Processing Systems 37 (NeurIPS 2024), Proceedings of the 38th Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024; Curran Associates, Inc.: Red Hook, NY, USA, 2024.
Su, J.; Ahmed, M.; Lu, Y.; Pan, S.; Bo, W.; Liu, Y. RoFormer: Enhanced transformer with Rotary Position Embedding. Neurocomputing 2024, 568, 127063.
Katharopoulos, A.; Vyas, A.; Pappas, N.; Fleuret, F. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. In Proceedings of the 37th International Conference on Machine Learning (ICML 2020), Virtual Conference, 13–18 July 2020; PMLR: London, UK, 2020; pp. 5156–5165.
Zhou, J.; Lu, X.; Xiao, Y.; Su, J.; Lyu, J.; Ma, Y.; Dou, D. SDWPF: A Dataset for Spatial Dynamic Wind Power Forecasting over a Large Turbine Array. Sci. Data 2024, 11, 649.
Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. In Advances in Neural Information Processing Systems 27 (NIPS 2014), Proceedings of the 2014 Neural Information Processing Systems Workshop on Deep Learning and Representation Learning, Montreal, QC, Canada, 8–13 December 2014; Curran Associates, Inc.: Red Hook, NY, USA, 2014.
Zhu, M.; Li, Z.; Lin, Q.; Ding, L. Fast-Powerformer: A Memory-Efficient Transformer for Accurate Mid-Term Wind Power Forecasting. arXiv 2025, arXiv:2504.10923v1.
Ng, W.T.; Siu, K.; Cheung, A.C.; Ng, M.K. Expressing Multivariate Time Series as Graphs with Time Series Attention Transformer. arXiv 2022, arXiv:2208.09300.
Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Chang, X.; Zhang, C. Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 23–27 August 2020; ACM: New York, NY, USA, 2020; pp. 7538–7548.
Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
Chen, C.; Liu, Y.; Chen, L.; Zhang, C. Bidirectional Spatial-Temporal Adaptive Transformer for Urban Traffic Flow Forecasting. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 6913–6925.
Bai, L.; Yao, L.; Li, C.; Wang, X.; Wang, C. Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020), Proceedings of the 34th Conference on Neural Information Processing Systems, Vancouver, Canada (Virtual), 6–12 December 2020; Curran Associates, Inc.: Red Hook, NY, USA, 2020.

Figure 1. TFDGCN network architecture.


Table 1. Hyperparameters and their meanings.

Hyperparameter	Description
patch_len	Length of time series patches in the time flow
dropout	Dropout probability, used to prevent model overfitting
n_heads	The number of heads in the attention mechanism
top_k	Number of nearest neighbors for constructing the adjacency matrix

Table 2. Performance of different models on the SDWPF dataset.

Method	MAE (MW)	RMSE (MW)
LSTM	41.23	46.15
GRU	40.92	46.40
LSTM + Transformer	39.21	46.32
TSAT	38.03	44.91
AGCRN	38.45	45.92
Bi-STAT	38.34	45.75
MTGNN	40.43	46.61
DCRNN	38.33	46.52
TFDGCN	37.16	44.84

Table 3. Performance of different models on the SD23 dataset.

Method	MAE (MW)	RMSE (MW)
LSTM	15.94	17.73
GRU	16.38	18.03
LSTM + Transformer	15.58	17.63
TSAT	15.27	17.79
AGCRN	15.81	17.83
Bi-STAT	15.63	17.82
MTGNN	15.37	17.65
DCRNN	15.49	17.75
TFDGCN	14.63	17.56

Table 4. Ablation study of the effectiveness of the triple flow structure.

Method	SDWPF MAE (MW)	SDWPF RMSE (MW)	SD23 MAE (MW)	SD23 RMSE (MW)
TFDGCN (T&V)	37.56	45.49	14.86	17.59
TFDGCN (T&N)	37.88	45.32	15.23	17.61
TFDGCN (T&V&StaticN)	37.35	44.93	14.75	17.63
TFDGCN	37.16	44.84	14.63	17.56

Table 5. Ablation study on the effectiveness of attention mechanism.

Method	SDWPF MAE (MW)	SDWPF RMSE (MW)	SD23 MAE (MW)	SD23 RMSE (MW)
TFDGCN (w/o attention)	38.87	44.86	14.98	17.63
TFDGCN	37.16	44.84	14.63	17.56

Table 6. Inference time.

Method	Average inference time per batch	Total inference time
TFDGCN (LinearAttention)	26.685 ms	1574.421 ms
TFDGCN (GlobalAttention)	41.168 ms	2428.925 ms

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
