A Novel Wind Power Prediction Model That Considers Multi-Scale Variable Relationships and Temporal Dependencies

Xu, Zhanyang; Zhao, Hong; Xu, Chengxi; Shi, Hongyan; Xu, Jian; Wang, Zhe

doi:10.3390/electronics13183710

by Zhanyang Xu ¹, Hong Zhao ¹,*, Chengxi Xu ², Hongyan Shi ¹, Jian Xu ¹ and Zhe Wang ¹

¹ School of Software, Nanjing University of Information Science and Technology, Nanjing 210044, China

² Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA 92092, USA

* Author to whom correspondence should be addressed.

Electronics 2024, 13(18), 3710; https://doi.org/10.3390/electronics13183710

Submission received: 22 August 2024 / Revised: 6 September 2024 / Accepted: 11 September 2024 / Published: 19 September 2024

(This article belongs to the Special Issue Applications of Machine Learning and Artificial Intelligence in Modern Power and Energy Systems)

Abstract

Wind power forecasting is a critical technology for promoting the effective integration of wind energy. To improve forecasting accuracy, this paper introduces a novel wind power prediction model that considers the evolving relationships among multi-scale variables as well as temporal dependencies. A multi-scale frequency decomposition module first splits the raw data into high-frequency and low-frequency components. Features are then extracted from the low-frequency component using a multi-scale temporal graph neural network combined with an adaptive graph learning module, and from the high-frequency component using an improved bidirectional temporal network. Finally, the features are integrated through a cross-attention mechanism. To validate the proposed model, extensive experiments were conducted on a wind power dataset provided by the State Grid. The results indicate that the MSE of the proposed model is on average 7.1% lower than that of the state-of-the-art model and 48.9% lower than that of the conventional model. Moreover, the improvement becomes more pronounced as the prediction horizon increases.

Keywords:

wind power forecasting; multi-scale modeling; graph neural network; temporal convolutional network

1. Introduction

As global warming intensifies and energy crises escalate, the need to seek clean and sustainable energy solutions has become an international consensus. Wind power generation, as a green energy technology, is becoming key to the global energy transition due to its renewable nature and low carbon emissions [1]. Figure 1 provides a detailed depiction of the workflow of a modern wind turbine, from the capture of wind energy to the output of electricity. The left side of the image illustrates the main components of a wind turbine. Wind is captured by the turbine blades and converted into mechanical energy through transmission gears, which drive the generator to produce electricity. The right side of the image then describes the further processing and distribution of electricity. Power is first transmitted to a charge controller, then flows to a battery bank for storage or directly to an inverter, which supplies power to the grid or directly to the load. According to data released by the Global Wind Energy Council (GWEC) [2], global cumulative installed wind power capacity grew from 433 GW in 2015 to 906 GW in 2022, with an annual compound growth rate of 11.12%. Accurate wind power prediction is crucial for realizing precise scheduling and efficient power supply, thus making wind power prediction technology essential. However, the variability and intermittency of wind power present significant challenges to the planning, operation, and control of power systems [3].

Wind power prediction methods are primarily divided into physical models, statistical models, and machine learning algorithms. Physical models depend on numerical weather predictions (NWP), utilizing meteorological data and surface information to solve equations of fluid dynamics and thermodynamics for predicting wind speed and direction, and subsequently calculating the power output of wind farms [4]. However, these models are computationally complex and sensitive to initial conditions, which limits their application in short-term high-precision forecasting. Statistical methods predict by establishing a relationship between input and output based on historical data [5]. Despite being computationally efficient, statistical models have drawbacks such as difficulty in model order selection and sensitivity to climate changes, which limit their effectiveness in long-term applications and environments with severe climate variability.

With the development of artificial intelligence technology, potential applications in energy forecasting have emerged. Traditional machine learning methods such as Artificial Neural Networks (ANN), Support Vector Machines (SVM), Extreme Gradient Boosting (XGBoost), and Multi-Layer Perceptrons (MLPs) have been widely utilized. The literature [6,7,8] proposes hybrid neural network architectures combining modal decomposition, optimization algorithms, and Extreme Learning Machines (ELM) to further improve prediction accuracy. The literature [9] introduces a novel wind power curve model integrating Isolation Forest anomaly detection, Asymmetric Fuzzy Mean Radial Basis Function Neural Network modeling, and meta-heuristic algorithm optimization. Although these traditional machine learning methods have high training efficiency, they still require extensive feature engineering and have limitations in handling long-term dependency relationships. Deep learning, as a branch of machine learning, has shown significant advancements in the field of sequence data prediction. The literature [10,11,12] proposes models integrating Recurrent Neural Networks (RNN) and Long Short Term Memory (LSTM) based on reinforcement learning, a wind power prediction method based on Gated Recurrent Units (GRU), and a model combining Self-Attention Temporal Convolutional Networks (SATCN) with LSTM for wind power prediction, respectively. These models aim to enhance prediction accuracy and reduce computational costs. Despite their strong fitting capabilities, they face issues like vanishing gradients, exploding gradients, and sensitivity to noise, which can lead to poor prediction outcomes.

In recent years, due to the ability of Graph Neural Networks (GNNs) to effectively model interactions between nodes through weighted edges, their application in wind power forecasting has gradually increased. The literature [13] proposed a Multi-dimensional Spatio-Temporal Graph Neural Network, which combines Wind Transformer for single-point wind speed prediction and utilizes GNNs to integrate wind speed information from multiple points in the spatial dimension, thereby improving wind speed prediction accuracy. Additionally, another study [14] combined Graph Convolutional Networks (GCN) with multiresolution convolution neural networks to dynamically extract spatial and temporal features among input variables, thus enhancing prediction accuracy. The literature [15] combined GNN with an improved Bootstrap technique to model the spatio-temporal characteristics between wind farms and meteorological factors, improving the precision and reliability of ultra-short-term predictions. The literature [16] integrated GNN with a Gated Dilated Inception Network, specifically considering the blockage effects between wind turbines, which further improved prediction accuracy.

With the introduction of the Vanilla Transformer [17], the attention mechanism it employs has greatly enhanced the model’s ability to handle long-distance sequence dependencies, heralding a new era in time series prediction. Models that integrate attention mechanisms and Transformer-based variants have become the focus of current research. The literature [18,19,20,21] demonstrates that integrating attention mechanisms with models such as CNN, LSTM, and GRU has enhanced the accuracy of wind power prediction to varying degrees. The Vanilla Transformer, because it computes the relevance of every position in the input sequence to all others, has complexity that is quadratic in the sequence length. In contrast, the Informer [22] uses a Kullback–Leibler divergence measure to select the dominant queries, reducing the complexity to the order of $O(N \log N)$, and designs a generative decoder that produces results directly, avoiding the cumulative errors of step-by-step prediction. However, its reliance on single-step time calculations may result in the loss of crucial information under highly volatile wind power data. The Autoformer [23] attempts to decompose seasonal-trend features and uses auto-correlation attention for patch-level connections, but its essentially manual design may fail to capture all semantic information within the patch. The FEDformer [24], by transforming time series into the frequency domain to compute attention, achieves linear computational complexity, though this method may overlook detailed temporal dependencies, affecting prediction accuracy. The iTransformer [25] applies attention mechanisms and feed-forward networks in inverted dimensions, effectively capturing multivariate correlations and learning nonlinear representations.

Through in-depth observation of wind power sequence data, we found that power data present diverse time patterns at different sampling scales. For instance, wind power data recorded hourly reveal intra-day variations in wind speed, while data sampled daily highlight seasonal fluctuations. From a macro perspective, quarterly climate trends dominate the patterns of annual average power. These observations suggest that understanding complex temporal variations requires a multi-scale analysis approach, which can capture both short-term changes and long-term climate patterns, thereby offering a more comprehensive perspective for wind power prediction. Regarding references [13,14,15,16], most of the GNNs primarily focus on modeling the interactions between wind turbines within wind farms from a spatial dimension. These methods typically adopt a data-driven approach for constructing the graph structure; while capturing correlations between time series to some extent, they exhibit limited flexibility as they often only learn a specific type of dependency between variables. Consequently, they struggle to effectively handle complex temporal features across different time scales. This paper, from the perspective of capturing dependencies at various scales, combines a multi-scale temporal graph neural network with an adaptive graph learning module and optimizes the synergistic effects of features across different time scales through a fusion module. Furthermore, we introduce an improved temporal convolutional network, which enhances prediction capability by aggregating information from different scales and leveraging the complementarity of multi-scale observations.

The contributions of this paper are as follows:

(1): We developed a multi-scale frequency decomposition (MSF-Decomp) module that effectively extracts seasonal and trend changes from data of different sampling sizes, transforming them into high and low-frequency components for independent modeling.
(2): The Multi-Scale Temporal Graph Convolutional Network (MST-GCN) was designed to use low-frequency components as inputs, capturing correlations across multi-scale sequences.
(3): A Bidirectional Temporal Gated Convolution Network (Bi-TGCN) was introduced, utilizing high-frequency components to effectively handle dependencies within multi-scale sequences.
(4): By using a multi-head cross-attention mechanism to fuse the prediction results of two models and comparing them with several benchmark models, the advantages of our method in terms of robustness and accuracy were validated.

The remainder of this paper is organized as follows: Section 2 provides a detailed explanation of the proposed model’s construction and key techniques. Section 3 introduces the data sources and preprocessing steps. Section 4 covers the experimental design, result analysis, and performance comparison of the model. Finally, Section 5 summarizes the research findings and limitations and discusses future research directions.

2. Proposed Methodology

We propose a novel method for processing and predicting multi-scale time series data, whose overall framework is displayed in Figure 2. This method begins by decomposing the original series into low and high-frequency components using the MSF-Decomp module. The high-frequency component, which contains rapid change information, reflects local fluctuations and is processed through the designed Bi-TGCN to capture internal sequence dependencies. The low-frequency component, which reflects overall trends, is handled by the MST-GCN, utilizing graph node propagation to reveal deep trends and temporal patterns. Finally, the outputs of the two modules are integrated through a multi-head cross-attention mechanism to form a comprehensive prediction result.

2.1. Multi-Scale Decomposition Module

When analyzing the original wind power data, we observed that even at the finest granularity of sampling, the data exhibited mixed temporal patterns containing both seasonal and trend changes. The superposition and coupling of these patterns increase the complexity and variability of the sequence. Inspired by seasonal and trend decomposition methods [26], we propose a multi-scale frequency decomposition strategy. This strategy separates high-frequency signals, which include abrupt changes and noise, from low-frequency signals, modeling them separately. This approach enhances the model’s ability to capture long-term trends and improves prediction stability.

Previous time series decomposition methods primarily used a moving average strategy to smooth the original data. Specifically, given an input sequence $X \in \mathbb{R}^{L \times d}$, where $L$ represents the total length of the sequence and $d$ represents the number of dimensions, the decomposition steps are as follows:

$$X_{t} = \mathrm{AvgPool}(P(X))_{\mathrm{kernel}} \tag{1}$$

$$X_{s} = X - X_{t} \tag{2}$$

where $X_{t}, X_{s} \in \mathbb{R}^{L \times d}$ represent the trend component and seasonal component, respectively, $\mathrm{AvgPool}(\cdot)$ denotes the average pooling operation, and $P(\cdot)$ represents the padding operation. However, the fixed pool size in this method may not be adequate for adapting to the dynamic mixed modes in wind power sequences. Moreover, using the difference to extract the seasonal component $X_{s}$ overlooks the nonlinear coupling relationship between trend and seasonal components.
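As an illustration of Equations (1) and (2), the following minimal sketch decomposes a one-dimensional series with a fixed moving-average kernel. The function name, the kernel size, and the choice of edge replication for the padding $P(\cdot)$ are assumptions made for this example; the text does not specify them.

```python
def moving_average_decompose(x, kernel=5):
    """Split a 1-D series into a trend (moving average) and a seasonal residual."""
    pad = (kernel - 1) // 2
    # Assumed padding P: replicate the end points so the trend keeps length L.
    padded = [x[0]] * pad + list(x) + [x[-1]] * (kernel - 1 - pad)
    # Eq. (1): average pooling with stride 1 over the padded series.
    trend = [sum(padded[i:i + kernel]) / kernel for i in range(len(x))]
    # Eq. (2): the seasonal part is whatever the trend does not explain.
    seasonal = [xi - ti for xi, ti in zip(x, trend)]
    return trend, seasonal


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical power readings
trend, seasonal = moving_average_decompose(series, kernel=3)
```

Because the seasonal part is defined as a plain difference, trend and seasonal always add back to the input exactly, which is the linear coupling the paragraph above criticizes.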

Based on the aforementioned method, we designed the MSF-Decomp module shown in Figure 3. This module uses average and max pooling to process data, where average pooling captures general trends, and max pooling highlights peaks or extreme values. It also incorporates various-sized pooling kernels, enhancing the separation of high-frequency and low-frequency components. Specifically, the module concatenates components from different-sized pooling operations. To maintain the original temporal data, it expands the data by one dimension, transforming it into a vector $X_{L}$, with the last dimension being 3. Integration of various components is achieved through a weighted sum operation. For the input sequence $X \in \mathbb{R}^{L \times d}$, the expression is as follows:

$$X_{low} = \mathrm{sum}\left(\mathrm{AvgPool}(P(X))_{k_{1}}, \dots, \mathrm{AvgPool}(P(X))_{k_{n}} X_{L}\right) \tag{3}$$

$$X_{high} = \mathrm{sum}\left(\mathrm{MaxPool}(P(X))_{k_{1}}, \dots, \mathrm{MaxPool}(P(X))_{k_{n}} X_{L}\right) \tag{4}$$

where $X_{low}, X_{high} \in \mathbb{R}^{L \times d}$ respectively represent the low-frequency and high-frequency components, and $\mathrm{MaxPool}(\cdot)$ denotes the max pooling operation. Through this design, the MSF-Decomp module provides a basis for subsequent modeling by extracting high and low-frequency features.
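The multi-kernel pooling idea can be sketched as follows. This is only a simplified reading of Equations (3) and (4): the kernel sizes, the uniform weights (standing in for the weighted sum), and the edge-replication padding are hypothetical, and the role of $X_{L}$ is not modeled here.

```python
def pool1d(x, kernel, reduce):
    """Same-length pooling; edge replication plays the role of the padding P."""
    pad = (kernel - 1) // 2
    padded = [x[0]] * pad + list(x) + [x[-1]] * (kernel - 1 - pad)
    return [reduce(padded[i:i + kernel]) for i in range(len(x))]


def mean(window):
    return sum(window) / len(window)


def msf_decompose(x, kernels=(3, 5, 7), weights=None):
    """Combine average-pooled (low) and max-pooled (high) views over several kernels."""
    n = len(kernels)
    weights = weights or [1.0 / n] * n  # assumption: uniform instead of learned weights
    lows = [pool1d(x, k, mean) for k in kernels]
    highs = [pool1d(x, k, max) for k in kernels]
    low = [sum(w * s[t] for w, s in zip(weights, lows)) for t in range(len(x))]
    high = [sum(w * s[t] for w, s in zip(weights, highs)) for t in range(len(x))]
    return low, high


low, high = msf_decompose([0.0, 0.0, 9.0, 0.0, 0.0, 1.0, 0.0, 0.0])
```

In this sketch a single spike is spread out and damped in `low`, while `high` keeps its full height over the window, which mirrors the trend-versus-peak roles described above.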

2.2. Multi-Scale Graph Neural Network Modeling

2.2.1. Dynamic Graph Learning

Wind power data are considered a unique form of graph-structured data, with nodes representing different variables connected by an adjacency matrix that depicts their correlations. Traditional methods [27,28,29] typically learn a single adjacency matrix, capturing only significant shared temporal patterns and often overlooking the diversity of time scales.

In response, this paper adopts a Dynamic Graph Learning strategy with $M$ scale layers [30]. In this strategy, continuous values at different time granularities are used to quantify the associations between variables. Specifically, the parameter $E_{nodes} \in \mathbb{R}^{L \times d}$ is shared across all scales, while $E_{scale} \in \mathbb{R}^{M \times d}$ represents the embeddings within each scale. Ultimately, embeddings from both spaces are fused using the Hadamard product to generate scale-specific weight information $E_{space}$. The formula is as follows:

$$E_{space}^{m} = E_{nodes} \odot E_{scale}^{m} \tag{5}$$

where $E_{scale}^{m} \in \mathbb{R}^{1 \times d}$, with $m \in \{1, \dots, M\}$, represents the embedding at scale $m$. Then, the proximity similarity between nodes $a$ and $b$ is computed as follows:

$$D_{a}^{m} = \tanh\left(E_{space}^{m} \varphi_{1}^{m}\right) \tag{6}$$

$$D_{b}^{m} = \tanh\left(E_{space}^{m} \varphi_{2}^{m}\right) \tag{7}$$

$$A^{m} = \sigma\left((D_{a}^{m})^{T} D_{b}^{m} - D_{a}^{m} (D_{b}^{m})^{T}\right) \tag{8}$$

where $\varphi_{1}^{m}, \varphi_{2}^{m} \in \mathbb{R}^{1 \times 1}$ are learnable parameters used to capture the features between nodes $D_{a}^{m}$ and $D_{b}^{m}$. $A^{m} \in \mathbb{R}^{N \times N}$ is normalized by the activation function $\sigma$ into the range $[0, 1]$, indicating node connection weights. To generate a continuous sparse adjacency matrix, we introduce the Smoothed Sparse Unit (SSU), which is defined by the following formula:

$$A_{m}^{\prime} = \mathrm{Softmax}\left(\alpha \tanh\left(\beta (A^{m})\right)\right) \tag{9}$$

where $\alpha$ and $\beta$ are learnable parameters. The SSU balances reconstruction error and sparsity by adjusting the regularization coefficient $\lambda$, aiming to align the actual sparsity $\rho^{\prime}$ of the adjacency matrix with the target sparsity $\rho$ [31]. The formula is as follows:

$$\min_{A^{m}, \alpha, \beta} \left(\left\| A^{m} - A_{m}^{\prime} \right\|_{F}^{2} + \lambda \, \mathrm{KL}\left(\rho \,\|\, \rho^{\prime}\right)\right) \tag{10}$$

where $\left\| A^{m} - A_{m}^{\prime} \right\|_{F}^{2}$ is the squared Frobenius norm of the reconstruction error, $\mathrm{KL}(\cdot)$ calculates the Kullback–Leibler divergence, and $\rho^{\prime}$ is the actual sparsity of $A^{m}$, defined as $\rho^{\prime} = \frac{1}{N^{2}} \sum_{ij} A_{ij}^{m}$. Ultimately, this yields a set of scale-specific sparse adjacency matrices $\{A^{1}, \dots, A^{m}, \dots, A^{M}\}$.
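To make the Smoothed Sparse Unit and the sparsity penalty more concrete, here is a small sketch under stated assumptions: the Softmax in Equation (9) is applied row-wise, and the KL term in Equation (10) is taken between two Bernoulli distributions with means $\rho$ and $\rho^{\prime}$, as is common for sparsity penalties. Neither choice is spelled out in the text, and all names below are hypothetical.

```python
import math


def smoothed_sparse_unit(adj, alpha=1.0, beta=1.0):
    """Row-wise Softmax(alpha * tanh(beta * A)), one reading of Eq. (9)."""
    out = []
    for row in adj:
        scores = [alpha * math.tanh(beta * a) for a in row]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def mean_sparsity(adj):
    """rho' = (1 / N^2) * sum_ij A_ij."""
    n = len(adj)
    return sum(sum(row) for row in adj) / (n * n)


def bernoulli_kl(rho, rho_hat, eps=1e-12):
    """KL(rho || rho_hat) for Bernoulli means (assumed form); rho must lie in (0, 1)."""
    rho_hat = min(max(rho_hat, eps), 1 - eps)
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))


A = [[0.9, 0.1, 0.2], [0.3, 0.8, 0.1], [0.2, 0.4, 0.7]]  # hypothetical A^m
penalty = bernoulli_kl(0.2, mean_sparsity(A))
```

The penalty is zero when the measured sparsity matches the target and grows as the two drift apart, which is what lets $\lambda$ trade reconstruction error against sparsity.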

2.2.2. Multi-Scale Temporal Graph Convolution Network

Based on the Dynamic Graph Learning strategy, we have proposed MST-GCN to generate and process multi-scale adjacency matrices. Figure 4 illustrates the overall framework of the network.

In this framework, low-frequency components enter the Multi-Scale Block and are processed through multi-layer convolution operations for scale division. Specifically, the initial input data $X_{low} \in \mathbb{R}^{N \times T}$, where $N$ represents the feature dimensions and $T$ represents the time steps, first undergo a $1 \times 1$ Conv operation that preserves sequence information. They then pass through successive Conv layers, each of which halves the time dimension, yielding the outputs $\{X_{low}^{1}, \dots, X_{low}^{m}, \dots, X_{low}^{M}\}$, where $X_{low}^{m} \in \mathbb{R}^{N \times \frac{T}{2^{m-1}}}$. The lower right of Figure 4 shows the processing flow in the Multi-Scale Block, with kernel sizes $1 \times 1$, $1 \times 7$, $1 \times 6$, and $1 \times 3$ for the convolution layers.
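The halving of the time dimension across scales can be illustrated with a toy pyramid. In this sketch, pairwise averaging stands in for the learned strided convolutions, and the series length of 96 steps (one day at a 15 min interval) and the choice of four scales are assumptions for the example only.

```python
def halve(series):
    """Stand-in for a stride-2 Conv layer: average each pair of neighbours."""
    return [(series[i] + series[i + 1]) / 2 for i in range(0, len(series) - 1, 2)]


def multi_scale_pyramid(series, num_scales):
    """Return [X^1, ..., X^M], where X^m has length T / 2^(m-1)."""
    scales = [list(series)]
    for _ in range(num_scales - 1):
        scales.append(halve(scales[-1]))
    return scales


day = [float(t % 24) for t in range(96)]  # hypothetical single-variable input
pyramid = multi_scale_pyramid(day, num_scales=4)
lengths = [len(s) for s in pyramid]
```

Each coarser level summarizes a window twice as long as the one before it, so later graph layers see progressively slower dynamics.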

In the graph constructor, the original information of nodes and edges is encoded into graph representations, and an activated sparse adjacency matrix is generated using Formulas (5)–(10).

GCNs effectively utilize the topological structure of graphs to propagate features and capture local connectivity patterns between nodes. The mathematical formula is as follows:

$$H^{l} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \hat{A} \tilde{D}^{-\frac{1}{2}} H^{l-1} W^{l}\right) \tag{11}$$

where $H^{l}, H^{l-1} \in \mathbb{R}^{N \times d}$ are the feature results of the $l$-th and $(l-1)$-th layers, respectively, $\hat{A} = A + I$ includes self-connections, $\tilde{D}$ is the degree matrix of $\hat{A}$, and $W^{l}$ is the learnable weight matrix. As the number of GCN layers increases, node representations often converge, leading to an over-smoothing phenomenon. To address this, we introduce the Mix-hop Propagation mechanism [32], which integrates neighbor information from multiple hop counts, thoroughly considering both the direct connections of nearby nodes and the indirect influences of more distant neighboring relationships. Its structure is shown in the upper right of Figure 4, where $\hat{A}^{i}$ represents the $i$-th power of $\hat{A}$ with $i = 0, 1, 2$, and the operator × has a fixed relative position. Specifically, given input features $X$ and the normalized adjacency matrix $\hat{A}$, node features are updated through $g$ layers of depth propagation. In each layer $i$, node features $H$ are updated as follows:

$$H^{i+1} = \alpha X + (1 - \alpha) \cdot \left(\hat{A} H^{i} W^{i}\right) \tag{12}$$

where $H^{i}$ is the feature after $i$ iterations, and $\alpha$ is the mixing factor controlling the proportion of input information retained.

After g iterations, features across all scales are concatenated along the feature dimension, and an output feature matrix h is generated through an MLP:

$$h = \mathrm{MLP}\left(\mathrm{Concat}\left[H^{0}, H^{1}, \dots, H^{g}\right]\right) \tag{13}$$
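The propagation in Equations (12) and (13) can be sketched in plain Python as below. The row normalization of the adjacency matrix, the square weight matrices, and the value of $\alpha$ are assumptions for illustration, and the final MLP is left out so that only the propagate-and-concatenate pattern is shown.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalize_rows(adj):
    """Add self-loops (A + I) and row-normalise; the normalisation is an assumed choice."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in with_loops]


def mix_hop(x, adj, weights, alpha=0.05):
    """Apply Eq. (12) once per weight matrix, then concatenate H^0..H^g per node."""
    a_hat = normalize_rows(adj)
    h, hops = x, [x]
    for w in weights:  # g propagation layers
        propagated = matmul(matmul(a_hat, h), w)
        # Keep a fraction alpha of the original input to limit over-smoothing.
        h = [[alpha * xi + (1 - alpha) * pi for xi, pi in zip(x_row, p_row)]
             for x_row, p_row in zip(x, propagated)]
        hops.append(h)
    # Concat along the feature axis; the MLP of Eq. (13) is omitted in this sketch.
    return [sum((hop[i] for hop in hops), []) for i in range(len(x))]


identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # 3 nodes, 2 features (hypothetical)
adjacency = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
mixed = mix_hop(features, adjacency, [identity, identity])
```

With two propagation layers each node ends up with the features of hops 0, 1, and 2 side by side, so the downstream MLP can weight near and distant neighbours separately.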

Finally, in Gated Fusion, importance weights of different scales are learned to fuse feature representations across multiple scales.

2.3. Multi-Scale Temporal Convolution Modeling

In wind power prediction, we often have access to covariates for the upcoming period, such as weather forecast data and time indicators. To better utilize this forward-looking information, we have designed the Bi-TGCN based on the TCN architecture. We aim for the current state to be related not only to past observations but also to integrate a perspective on the upcoming period, accurately predicting rapid fluctuations in high-frequency components and quickly identifying key power change points [33]. For this, we perform forward and backward dilated convolutions on the data in the temporal dimension, concatenate them, and then pass the result through a linear projection to achieve a specific size. This is added to the vector $X_{r}$ after Residual Conv, and finally, the output is obtained after linear processing and Dropout. The left side of Figure 5 shows the overall framework of the Bi-TGCN, which is expressed as follows:

$$X_{output} = \mathrm{Dropout}\left(\mathrm{Linear}\left(\mathrm{Concat}(FTC, BTC)\right) + X_{r}\right) \tag{14}$$

where $FTC$ and $BTC$ represent Forward Temporal Convolution and Backward Temporal Convolution, respectively.
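The forward and backward branches can be pictured with a one-channel dilated convolution run in both time directions. This is a minimal sketch: the kernel values are hypothetical, zero padding at the sequence edges is assumed, and the linear projection, residual path, and Dropout of Equation (14) are left out.

```python
def causal_dilated_conv(x, kernel, dilation):
    """y[t] = sum_j kernel[j] * x[t - j * dilation], treating missing history as zero."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def bidirectional_conv(x, kernel, dilation):
    """Pair a forward (past-looking) and a backward (future-looking) convolution per step."""
    forward = causal_dilated_conv(x, kernel, dilation)
    # Running the same causal filter on the reversed series looks into the future.
    backward = causal_dilated_conv(x[::-1], kernel, dilation)[::-1]
    return list(zip(forward, backward))  # per-step concatenation of FTC and BTC


pairs = bidirectional_conv([1.0, 2.0, 3.0, 4.0], kernel=[1.0, 1.0], dilation=1)
```

At each step the first entry summarizes the recent past and the second the near future, which is how forecast covariates for the upcoming period can inform the current state.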

We introduce a gating mechanism in TCN, designing the Gated-TCN component to enhance the model’s control over information flow and optimize attention to key temporal points. The process is divided into two branches. First, the input data are initialized using two dilated causal convolution layers, then processed through the tanh and Sigmoid functions for activation, and their outputs are merged. This is followed by layer normalization, activation via the ReLU function, and Dropout processing to produce the final output. The core formula is expressed as follows:

$$h(x) = \tanh\left(x * \theta_{1} + b\right) + \mathrm{Sigmoid}\left(x * \theta_{2} + c\right) \tag{15}$$

where $b$ and $c$ are bias terms for the different convolution kernels $\theta_{1}$ and $\theta_{2}$, respectively; $\tanh(\cdot)$ and $\mathrm{Sigmoid}(\cdot)$ control the speed of information flow and the proportion of information retained. The right side of Figure 5 displays the structure of the Temporal Convolution Block and Gated-TCN, where the Temporal Convolution Block consists of the outputs from three Gated-TCNs with different dilation factors added together.

Figure 6 depicts the bidirectional architecture of a three-layer TCN, which utilizes forward and backward dilated convolutions to extend its coverage across a broader time span without compromising resolution. As indicated, dilation factors of 1, 2, and 4 are employed to effectively capture seasonal fluctuations in wind power data over multiple time scales. The outputs from these dilated convolutions are concatenated and then subjected to a linear projection to generate the forecast. To enhance the model’s robustness, a skip connection is implemented that reintroduces the linear projection results back to the original input, countering potential vanishing gradient issues caused by layer accumulation.

2.4. Fusion Mechanism

In multi-scale time series forecasting, various scale patterns play distinct roles in influencing prediction outcomes. To effectively merge feature representations from MST-GCN and Bi-TGCN, a multi-head cross-attention mechanism is employed instead of simpler methods such as concatenation or global pooling. This approach effectively integrates and balances high and low-frequency information from different scales for the final prediction. As depicted on the right side of Figure 2, MST-GCN processes the low-frequency component $X_{low}$ to produce $H_{low} \in \mathbb{R}^{N \times d_{1}}$, while Bi-TGCN processes the high-frequency component $X_{high}$ to generate $H_{high} \in \mathbb{R}^{N \times d_{2}}$. In this setup, the high-frequency features act as the query, with the low-frequency features forming the key-value set. Through the multi-head cross-attention mechanism, the final multi-scale fused feature representation $H_{fuse} \in \mathbb{R}^{N \times d}$ is computed as follows:

$$H_{fuse} = \mathrm{Concat}\left(head_{1}, head_{2}, \dots, head_{h}\right) W^{O} \tag{16}$$

$$head_{i} = \mathrm{CrossAttention}\left(Q_{i}, K_{i}, V_{i}\right) \tag{17}$$

$$\mathrm{CrossAttention}\left(Q_{i}, K_{i}, V_{i}\right) = \mathrm{Softmax}\left(\frac{Q_{i} K_{i}^{T}}{\sqrt{d_{k}}}\right) V_{i} \tag{18}$$

$$Q_{i} = H_{high} W_{i}^{Q}, \quad K_{i} = H_{low} W_{i}^{K}, \quad V_{i} = H_{low} W_{i}^{V} \tag{19}$$

where $W^{O} \in \mathbb{R}^{h d_{k} \times d}$, $W_{i}^{Q} \in \mathbb{R}^{d_{2} \times d_{k}}$, $W_{i}^{K} \in \mathbb{R}^{d_{1} \times d_{k}}$, and $W_{i}^{V} \in \mathbb{R}^{d_{1} \times d_{k}}$ are learned projection matrices, $h$ is the number of attention heads, $i \in \{1, \dots, h\}$ indexes the heads, and $d_{k}$ is the dimension of the queries, keys, and values in each head.
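A single attention head of this fusion step can be sketched as follows, with the high-frequency features supplying the queries and the low-frequency features supplying keys and values. The tiny shapes, the identity projections, and the function names are illustrative assumptions; a full implementation would run several heads and apply the output projection $W^{O}$.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(h_high, h_low, w_q, w_k, w_v):
    """One head: queries from H_high, keys and values from H_low."""
    q = matmul(h_high, w_q)
    k = matmul(h_low, w_k)
    v = matmul(h_low, w_v)
    d_k = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


eye = [[1.0, 0.0], [0.0, 1.0]]                   # hypothetical projection matrices
h_high = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_low = [[0.5, 0.1], [0.2, 0.9], [0.4, 0.4]]
fused, attn = cross_attention(h_high, h_low, eye, eye, eye)
```

Each row of `attn` shows how strongly one high-frequency node draws on each low-frequency node, so the fused output is a trend summary tailored to that node's short-term behaviour.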

3. Data Collection and Analysis

3.1. Data Collection

The dataset used in this study comes from the Chinese State Grid (CSG) [34], including actual power output and corresponding weather data for wind energy generation from 2019 to 2020. The dataset comprises over seventy thousand data points, with a time interval of 15 min. These data were collected via Supervisory Control and Data Acquisition (SCADA) systems and involved wind farms located in various terrains across North, Central, and Northwest China. A detailed description of all feature variables can be found in Table 1.

3.2. Data Preprocessing and Analysis

For the collected raw dataset, we first used linear interpolation to fill in missing values, then applied box plot methods to identify and analyze outliers. The trends in wind power generation and the fluctuations in wind speed at hub height at wind farm site 1 for the year 2019 are shown in Figure 7. These trends indicate that the power series is influenced by various environmental factors, displaying non-stationarity and high volatility, while the wind speed is higher in the summer and lower in the winter, demonstrating clear seasonal variations. Figure 8a illustrates the close relationship between wind speed and power output, showing that once the wind speed reaches a certain critical point, the power output no longer increases, which is associated with the turbine’s rated power and cut-out wind speed. Additionally, Figure 8b reveals significant differences in wind speeds from different directions. When the angle difference between the wind direction and the turbine orientation is minimized, the turbine’s rotational speed increases significantly, thereby enhancing the efficiency of wind power output.

To further reveal the dependencies between power output and various variables, we evaluate these relationships using Pearson correlation coefficients (PCC) and Maximum Information Coefficients (MIC). The mathematical formulas are as follows:

$$f_{PCC}(x, y) = \frac{\sum x_{i} y_{i} - n \bar{x} \bar{y}}{(n - 1) s_{x} s_{y}} \tag{20}$$

$$I(x, y) = \int p(x, y) \log_{2} \frac{p(x, y)}{p(x) p(y)} \, dx \, dy \tag{21}$$

$$f_{MIC}(x, y) = \max_{a \times b < B} \frac{I(x, y)}{\log_{2} \min(a, b)} \tag{22}$$

where $x, y$ represent different variables, $\bar{x}$ and $\bar{y}$ are their sample means, $s_{x}$ and $s_{y}$ are their sample standard deviations, $p(x, y)$ is the joint probability density of $x$ and $y$, $a$ and $b$ are the numbers of grid cells along each axis, and $B$ is set to the 0.6 power of the total data count. Based on the analysis results, we created heatmaps in Figure 9. The results indicate that wind speed is highly correlated with power output, while wind direction and environmental factors have weaker correlations with power but moderate correlations with wind speed. Relative humidity has the lowest correlation with both wind speed and power output and, therefore, is not considered in the analysis.
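Equation (20) can be checked numerically with a few lines of Python. In this sketch $s_{x}$ and $s_{y}$ are computed as sample standard deviations (with $n - 1$ in the denominator), which is what keeps the ratio within $[-1, 1]$; the sample values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation in the computational form of Eq. (20)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sample standard deviations, using n - 1 in the denominator.
    s_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / (n - 1))
    s_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / (n - 1))
    numerator = sum(a * b for a, b in zip(x, y)) - n * mean_x * mean_y
    return numerator / ((n - 1) * s_x * s_y)


wind_speed = [3.0, 5.0, 7.0, 9.0, 11.0]        # hypothetical m/s readings
power = [12.0, 35.0, 80.0, 140.0, 190.0]       # hypothetical MW readings
r = pearson(wind_speed, power)
```

A value close to 1 would match the strong wind speed and power relationship reported in the heatmaps, whereas MIC is needed to pick up dependencies that are not linear.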

4. Experimental Case Studies

4.1. Evaluation Metrics

To comprehensively assess the performance of the wind power prediction model, this study has chosen Mean Squared Error (MSE), Mean Absolute Error (MAE), and the coefficient of determination $R^{2}$ as key evaluation metrics [35]. The specific formulas for these metrics are as follows:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left(Y_{i} - \hat{Y}_{i}\right)^{2} \tag{23}$$

where the $Y_{i}$ are the actual values, the $\hat{Y}_{i}$ are the predicted values, and $n$ is the number of observations.

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| Y_{i} - \hat{Y}_{i} \right| \tag{24}$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} \left(Y_{i} - \hat{Y}_{i}\right)^{2}}{\sum_{i=1}^{n} \left(Y_{i} - \bar{Y}\right)^{2}} \tag{25}$$

where $\bar{Y}$ is the mean of the actual values. The model’s predictive results are normalized to facilitate direct comparison and evaluation with actual wind power generation values.
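The three metrics in Equations (23) to (25) translate directly into code. The sketch below uses plain lists and made-up values purely to show the arithmetic.

```python
def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - residual / total


y_true = [3.0, 5.0, 7.0, 9.0]   # hypothetical actual power values
y_pred = [2.0, 5.0, 8.0, 9.0]   # hypothetical forecasts
scores = (mse(y_true, y_pred), mae(y_true, y_pred), r_squared(y_true, y_pred))
```

For these values the residual sum of squares is 2 and the total sum of squares is 20, so MSE and MAE are both 0.5 and $R^{2}$ is 0.9.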

Additionally, to compare the performance of two time series forecasting models, we employed the Diebold-Mariano (DM) test [36], which is used to detect whether there is a significant difference in the forecasting errors between the two models. If the forecasting errors of models A and B are $e_{t}^{A}$ and $e_{t}^{B}$, respectively, the difference sequence is defined as $d_{t} = e_{t}^{A} - e_{t}^{B}$. The average of the difference sequence is then calculated as $\bar{d} = \frac{1}{L} \sum_{t=1}^{L} d_{t}$, and the autocovariance estimate at lag $h = 0$ is computed as $\hat{\gamma}(h = 0) = \frac{1}{L} \sum_{t=1}^{L} d_{t}^{2}$.

The DM test statistic is given by the following:

$$DM = \frac{\bar{d}}{\sqrt{\frac{\hat{\gamma}(h = 0)}{L}}} \tag{26}$$

where $L$ is the length of the time series. The null hypothesis $H_{0}$ of the DM test is that there is no significant difference in the forecasting errors between the two models, while the alternative hypothesis $H_{1}$ suggests a significant difference. The DM statistic is compared to the standard normal distribution, and the resulting p-value determines the outcome: if the p-value is greater than 0.05, we fail to reject the null hypothesis, indicating that the two models perform similarly; if the p-value is less than 0.05, we reject the null hypothesis, indicating that the two models perform differently. Typically, a positive DM test value indicates that model B outperforms model A, while a negative value suggests that model A outperforms model B.
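A compact version of the statistic in Equation (26) is sketched below. Two choices are assumptions rather than statements from the text: the difference series is built from squared errors (a common loss for this test), and the two-sided p-value is taken from the standard normal distribution. The lag-0 estimate follows the uncentred form given above; textbook versions subtract $\bar{d}$ and add further autocovariance terms for multi-step horizons.

```python
import math


def dm_test(errors_a, errors_b):
    """Diebold-Mariano statistic per Eq. (26), with an assumed squared-error loss."""
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    d_bar = sum(d) / n
    gamma0 = sum(v ** 2 for v in d) / n  # uncentred lag-0 estimate, as in the text
    if gamma0 == 0:
        return 0.0, 1.0  # identical losses: no evidence of a difference
    stat = d_bar / math.sqrt(gamma0 / n)
    # Two-sided p-value under N(0, 1): 2 * (1 - Phi(|stat|)) = erfc(|stat| / sqrt(2)).
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value


errors_model_a = [2.0, -2.0, 2.0, -2.0]   # hypothetical forecast errors
errors_model_b = [1.0, -1.0, 1.0, -1.0]
statistic, p = dm_test(errors_model_a, errors_model_b)
```

Here model A has the larger squared errors at every step, so the statistic is positive, in line with the reading that a positive value favours model B.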

4.2. Experiment Parameter

The experiments were run on Ubuntu 22.04 with 64 GB of memory and a GeForce RTX 4090 D graphics card (NVIDIA, Santa Clara, CA, USA). The deep learning framework is PyTorch 1.9.0, and PyCharm 2024.1.4 was used for experimental simulation and plotting. The dataset is divided into training, validation, and test sets with a ratio of 7:1:2. We explored the parameter space systematically using grid search and Bayesian optimization, combined with trial and error, to determine the optimal parameters for multi-step-ahead forecasting. Specific parameters are provided in Table 2.
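The 7:1:2 split can be expressed as index ranges, as in the sketch below. The text gives only the ratio, so the chronological ordering (training on the earliest samples, testing on the latest) and the sample count of 70,000 are assumptions for this example.

```python
def chronological_split(n_samples, ratios=(7, 1, 2)):
    """Return index ranges for train/validation/test in time order (assumed ordering)."""
    total = sum(ratios)
    train_end = n_samples * ratios[0] // total
    val_end = train_end + n_samples * ratios[1] // total
    return range(0, train_end), range(train_end, val_end), range(val_end, n_samples)


train_idx, val_idx, test_idx = chronological_split(70_000)
sizes = (len(train_idx), len(val_idx), len(test_idx))
```

Keeping the ranges contiguous avoids leaking future observations into training, which matters for any forecasting benchmark.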

4.3. Case 1: Wind Farm Site 1

To validate the effectiveness of the proposed model, we designed two types of comparative experiments. In the first category, we selected five popular baseline models: RF, BiLSTM, CNN-LSTM, GRU, and TCN. The second category includes time series prediction models based on improvements to the Transformer architecture, which encompasses the Informer, Autoformer, FEDformer, and iTransformer. Among them, iTransformer is the current state-of-the-art (SOTA) model in the field of long-term time series forecasting. The short-term forecasting results can be observed in Table 3, where the prediction steps for both Case 1 and Case 2 are set to 2 steps (half an hour ahead). Our proposed model outperformed others across all evaluation metrics, with an MSE of 11.231, MAE of 2.445, and an $R^{2}$ of 0.981. This showcases the model’s robustness and accuracy in wind power prediction tasks.

Figure 10 plots radar charts of the metrics on a common scale and shows that the proposed model leads on each of them. The results of the first category of experiments show that, in short-term wind power prediction, RF performed the worst in terms of MSE and MAE (28.856 and 5.221 in Table 3); relative to RF, the proposed model's values of 11.231 and 2.445 are roughly 61% and 53% lower, respectively. This weakness of RF is due to its limited capability to handle highly volatile time series. Although the other four models featured improvements in feature extraction, they failed to fully capture the impact of wind speed at different scales on wind power output. The Transformer-based models in the second category generally performed far better than the common baseline models of the first category. Notably, the performance of Autoformer and FEDformer was significantly worse than their performance on general datasets, mainly because their full-sequence self-attention mechanisms overly focus on internal dependencies of variables, leading to high computational costs and excessive sensitivity to local noise. This makes them poorly suited to the high volatility of wind power data. The iTransformer showed performance similar to the proposed model, but the proposed model, by integrating MST-GCN and Bi-TGCN, optimized the capture of dependencies both between and within variables, making it more effective in handling real wind power data.
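For readers who want to reproduce percentage comparisons from Table 3, the relative reduction of one error against a baseline can be computed as below. This is a small sketch; the helper name `relative_reduction` is an assumption, and only the MSE values are copied from Table 3.

```python
def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - model_error) / baseline_error


# MSE values copied from Table 3 (wind farm site 1).
mse = {"RF": 28.856, "iTransformer": 11.743, "Proposed": 11.231}
for name in ("RF", "iTransformer"):
    print(name, round(relative_reduction(mse[name], mse["Proposed"]), 2))
# RF 61.08
# iTransformer 4.36
```

The denominator is the baseline's error, so the result reads as "the proposed model's MSE is this much lower than the baseline's", which differs from stating how much higher the baseline is than the proposed model.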

Figure 11 displays the visualized prediction results of the top five performing models. Specifically, Figure 11a shows the prediction curves of each model, while Figure 11b provides a magnified view of the prediction curves to facilitate the observation of details. The results show that the proposed model excels in capturing the fluctuation trends of wind power, validating its effectiveness. Interestingly, when the true value is 0, the predictions of the proposed model fluctuate above and below zero, which indicates that the model has difficulty predicting exactly 0. Additionally, in cases where the power consumption of SCADA leads to negative power values, the predicted values show significant errors. These issues will be addressed and further optimized in future research.

In short-term wind power prediction, model stability and robustness are crucial. In Figure 12, we present the PCC between each model and the actual wind power, along with the corresponding scatter plots. The proposed model achieves the highest PCC value, indicating the strongest linear correlation between its predictions and the actual power output. Notably, it demonstrates high stability and accuracy in the low-value regions (bottom right of the figure), as well as a high degree of fit with the actual values in the peak regions (top right of the figure). The DM test results in Table 4 support this comparison: the p-value is far below 0.05 for every comparison model, and the DM statistic is negative in each case, which favors the proposed model. The statistic with the largest magnitude, −37.23, is obtained against RF.
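The PCC used in this comparison can be sketched as follows. This is a plain-Python illustration with made-up numbers; the function name `pearson_cc` is hypothetical and the code is not taken from the paper.

```python
import math


def pearson_cc(pred, actual):
    """Pearson correlation coefficient between predictions and observations."""
    n = len(pred)
    if n != len(actual) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_p = sum(pred) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(pred, actual))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0.0 or var_a == 0.0:
        raise ValueError("PCC is undefined for a constant series")
    return cov / math.sqrt(var_p * var_a)


# Hypothetical values: predictions offset from the truth by a constant 10.
print(round(pearson_cc([1.0, 2.0, 3.0], [11.0, 12.0, 13.0]), 6))
```

Because the PCC measures linear association only, a constant offset between predictions and observations still yields a coefficient of 1, which is why it is read alongside MSE and MAE rather than in place of them.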

4.4. Case 2: Wind Farm Site 4

In this subsection, we evaluate the proposed model using the dataset of wind farm site 4. The evaluation results are presented in Table 5, focusing on two key performance indicators: MSE and MAE. The proposed model achieved the lowest prediction errors among all models evaluated, with an MSE of 13.730 and an MAE of 2.511. The iTransformer’s metrics were slightly higher than those of the proposed model, ranking second in performance, while traditional popular baseline models exhibited higher MSE and MAE values. These results show that the proposed model has good generalization performance in the field of wind power prediction.

In the analysis for Case 2, we selected the five best-performing models for visualization. Figure 13 clearly shows that the proposed model continues to maintain the best performance, which demonstrates its strong fitting capabilities. We plotted violin plots in Figure 14 to compare the probability densities and distributions of the predicted values against the actual values across different models. In Figure 14, the central part is a boxplot, with the white dot representing the median. The violin plot and kernel density plot of the proposed model more closely match the actual data distribution, exhibiting the least volatility and the best stability, further validating the high robustness of the proposed model.

4.5. Model Analysis

To further validate the effectiveness of the proposed model, this subsection conducts an ablation study to analyze the importance and contributions of each component, with specific results detailed in Table 6.

Initially, the MSF-Decomp module was removed (Proposed-D), and the raw data were fed directly into the Bi-TGCN and MST-GCN for feature extraction. Without the MSF-Decomp module, the model cannot effectively separate complex modalities, which reduces its performance (the MSE rises from 11.231 to 14.868 in Table 6). Further removing Bi-TGCN (Proposed-DT) or MST-GCN (Proposed-DG) from the Proposed-D configuration degraded performance again. The absence of the Bi-TGCN prevents the model from effectively capturing the periodicity and abrupt changes in high-frequency sequences. Meanwhile, without MST-GCN, the model struggled to thoroughly analyze the correlations and evolutionary patterns among nodes (variables) across various time scales.

In summary, the effective integration of the above three modules significantly enhances the model’s capability to handle complex dependencies in wind power data, thereby effectively improving prediction accuracy and robustness.

4.6. Seasonal Variation Analysis

To assess the impact of seasonal changes, we extracted data from 2020 and divided them into four seasonal segments: Spring (March to May), Summer (June to August), Autumn (September to November), and Winter (December to February). We then re-evaluated the proposed model and selected baseline models for each segment.
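The seasonal segmentation described above can be sketched with the standard library alone. The month-to-season mapping follows the text; the record layout (timestamp, value) and the helper names are assumptions made for illustration.

```python
from datetime import datetime

# Month-to-season mapping as stated in the text.
SEASON_BY_MONTH = {
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
    12: "Winter", 1: "Winter", 2: "Winter",
}


def split_by_season(records):
    """Group (timestamp, value) pairs by season.

    The (timestamp, value) layout is an assumption for this sketch.
    """
    segments = {"Spring": [], "Summer": [], "Autumn": [], "Winter": []}
    for timestamp, value in records:
        segments[SEASON_BY_MONTH[timestamp.month]].append((timestamp, value))
    return segments


# Hypothetical samples from 2020.
samples = [
    (datetime(2020, 1, 15), 10.0),
    (datetime(2020, 4, 15), 20.0),
    (datetime(2020, 7, 15), 30.0),
    (datetime(2020, 10, 15), 40.0),
    (datetime(2020, 12, 15), 50.0),
]
segments = split_by_season(samples)
print({season: len(items) for season, items in segments.items()})
```

With a single calendar year, the Winter segment joins January and February with December, so it is not contiguous in time; the text does not say how this is handled, and the sketch simply keeps both parts in one segment.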

The results in Table 7 show the advantages of the proposed model in adapting to seasonal changes, especially in the transitional seasons of Spring and Autumn. In these seasons, the proposed model recorded the lowest MSE and MAE values: 12.563 and 2.547 for Spring, and 11.564 and 2.45 for Autumn. These values were slightly better than those of the iTransformer model and significantly better than those of the other baseline models. Moving into Summer, as the climate and wind patterns become relatively stable, all models generally showed improved performance. The iTransformer’s capability to learn nonlinear representations for each variable individually allowed it to exhibit higher predictive efficiency and accuracy in the relatively stable summer, slightly leading the proposed model with MSE and MAE values lower by 0.204 and 0.08, respectively. As for Winter, which is characterized by high wind speeds and variable wind directions that increase prediction difficulty, all models saw a rise in errors. The proposed model still achieved the lowest MSE (14.115) and the highest $R^{2}$ (0.972), although its MAE (2.842) was slightly above that of the iTransformer (2.799), indicating that its predictions closely matched the actual wind power outputs and showcasing its robustness and adaptability.

Overall, the seasonal analysis highlights the effectiveness of the proposed model’s MST-GCN and Bi-TGCN networks in capturing and adapting to the nonlinear dynamics of seasonal wind changes, further demonstrating the model’s potential value and superior performance in practical wind power forecasting applications.

4.7. Discussion of Multi-Step Ahead Forecasting

In wind power forecasting and grid scheduling, short-term predictions are insufficient to support mid-to-long-term decisions such as grid load balancing and energy storage scheduling. Therefore, this subsection focuses on evaluating the performance of the proposed model in mid-to-long-term forecasting scenarios.

The error of all prediction methods increases with the number of forecasting steps, reflecting the challenges of mid-to-long-term forecasting. For the comparative experiments, we chose the better-performing baseline models of the second category. The mid-to-long-term prediction results are displayed in Table 8. Although the proposed model’s MSE and MAE rose to 47.752 and 4.265 for mid-term forecasting (12 steps, 3 h ahead) and to 89.818 and 5.847 for long-term forecasting (24 steps, 6 h ahead), it still performed better than the comparison models. The iTransformer showed similar metrics to those of the proposed model in short-term forecasts but had clearly higher errors in mid-to-long-term forecasts (MSE of 55.184 and 100.31). In summary, the proposed model keeps its advantage in mid-to-long-term wind power forecasting, which makes it a useful forecasting tool for wind farm management and grid scheduling.
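To make the horizons concrete, the sketch below converts steps to hours and builds input and target windows for multi-step forecasting. The 15-minute step is inferred from Table 8 (12 steps correspond to 3 h); the input length, the helper names, and the choice to predict the whole horizon window at once are assumptions, since the text does not specify them.

```python
def steps_to_hours(steps, minutes_per_step=15):
    """Convert a forecasting horizon in steps to hours (15-min step inferred)."""
    return steps * minutes_per_step / 60


def make_windows(series, input_len, horizon):
    """Build (input window, target window) pairs for multi-step forecasting.

    Each target covers the `horizon` steps that directly follow its input.
    """
    pairs = []
    for start in range(len(series) - input_len - horizon + 1):
        x = series[start:start + input_len]
        y = series[start + input_len:start + input_len + horizon]
        pairs.append((x, y))
    return pairs


print(steps_to_hours(2), steps_to_hours(12), steps_to_hours(24))  # 0.5 3.0 6.0

# Placeholder series of 10 samples, input length 4, horizon 2.
pairs = make_windows(list(range(10)), input_len=4, horizon=2)
print(len(pairs), pairs[0], pairs[-1])
# 5 ([0, 1, 2, 3], [4, 5]) ([4, 5, 6, 7], [8, 9])
```

Longer horizons leave fewer usable windows for a series of fixed length and move the targets further from the observed inputs, which is consistent with the growth in error seen in Table 8.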

5. Conclusions

Due to the uncertainty and high volatility of wind power, the accuracy of wind power prediction is often unsatisfactory. To address this issue, this paper proposes a new prediction model. First, the MSF-Decomp module is used to decompose the wind power data into high- and low-frequency components, mitigating the impact of wind power uncertainty. Then, feature extraction is performed using the Bi-TGCN and MST-GCN networks, and the results are fused to obtain the final prediction. Thanks to its multi-scale processing capabilities, the proposed model, in comparison experiments, reduced the MSE by an average of 7.1% compared to state-of-the-art models and by an average of 48.9% compared to traditional models. The results demonstrate that the proposed model can effectively capture and integrate data dynamics across different time scales, showing practical value in short-term data analysis and mid-to-long-term decision support and proving its advantages in handling complex forecasting tasks.

Although the wind power prediction model proposed in this paper performs well in handling multi-scale data and capturing temporal dependencies, it still has some limitations. Firstly, the model’s ability to predict zero or negative wind power values has not been fully optimized, which is crucial for ensuring reliability in practical applications as these values can represent critical operational scenarios. Secondly, due to the integration of MSF-Decomp, Bi-TGCN, and MST-GCN, the complexity of the model might pose challenges in real-time processing, particularly when deployed in online systems that require quick decision-making. Finally, although the model shows higher accuracy and stability than the baseline models, its behavior across the different application scenarios encountered in actual operation still needs to be examined.

Future research will focus on optimizing the model’s accuracy in predicting zero values and negative power and exploring methods to enhance the precision of mid-to-long-term forecasts. Additionally, we plan to develop an online wind power prediction system for integration into actual power scheduling systems, thus further advancing the practicality of wind power technology.

Author Contributions

Conceptualization, Z.X. and H.Z.; methodology, H.Z.; software, C.X.; validation, H.S., J.X. and Z.W.; investigation, H.Z., Z.X. and H.S.; resources, C.X.; data curation, H.S.; writing—original draft preparation, H.Z.; writing—review and editing, Z.X. and H.Z.; visualization, H.S. and J.X.; supervision, Z.W.; project administration, Z.W.; funding acquisition, Z.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Science and Technology Project of State Grid of China, grant number 5108-202218280A-2-289-XG.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

MLP	Multi-Layer Perceptron
RF	Random Forest
LSTM	Long Short Term Memory
GRU	Gated Recurrent Units
MSF-Decomp	Multi-Scale Frequency Decomposition
MST-GCN	Multi-Scale Temporal Graph Convolutional Network
Bi-TGCN	Bidirectional Temporal Gated Convolution Network
CSG	Chinese State Grid
SCADA	Supervisory Control and Data Acquisition
PCC	Pearson correlation coefficients
MIC	Maximum Information Coefficients
MSE	Mean Squared Error
MAE	Mean Absolute Error

References

1. Li, N.; Dong, J.; Liu, L.; Li, H.; Yan, J. A novel EMD and causal convolutional network integrated with Transformer for ultra short-term wind power forecasting. Int. J. Electr. Power Energy Syst. 2023, 154, 109470.
2. Zhen, Z.; Qiu, G.; Mei, S.; Wang, F.; Zhang, X.; Yin, R.; Li, Y.; Osório, G.J.; Shafie-khah, M.; Catalão, J.P. An ultra-short-term wind speed forecasting model based on time scale recognition and dynamic adaptive modeling. Int. J. Electr. Power Energy Syst. 2022, 135, 107502.
3. Yang, T.; Yang, Z.; Li, F.; Wang, H. A short-term wind power forecasting method based on multivariate signal decomposition and variable selection. Appl. Energy 2024, 360, 122759.
4. Ouarda, T.B.; Charron, C. Non-stationary statistical modelling of wind speed: A case study in eastern Canada. Energy Convers. Manag. 2021, 236, 114028.
5. Ding, Y.; Ye, X.W.; Guo, Y.; Zhang, R.; Ma, Z. Probabilistic method for wind speed prediction and statistics distribution inference based on SHM data-driven. Probabilistic Eng. Mech. 2023, 73, 103475.
6. Fu, W.; Fu, Y.; Li, B.; Zhang, H.; Zhang, X.; Liu, J. A compound framework incorporating improved outlier detection and correction, VMD, weight-based stacked generalization with enhanced DESMA for multi-step short-term wind speed forecasting. Appl. Energy 2023, 348, 121587.
7. Xiong, J.; Peng, T.; Tao, Z.; Zhang, C.; Song, S.; Nazir, M.S. A dual-scale deep learning model based on ELM-BiLSTM and improved reptile search algorithm for wind power prediction. Energy 2023, 266, 126419.
8. Hua, L.; Zhang, C.; Peng, T.; Ji, C.; Shahzad Nazir, M. Integrated framework of extreme learning machine (ELM) based on improved atom search optimization for short-term wind speed prediction. Energy Convers. Manag. 2022, 252, 115102.
9. Li, T.; Liu, X.; Lin, Z.; Morrison, R. Ensemble offshore Wind Turbine Power Curve modelling—An integration of Isolation Forest, fast Radial Basis Function Neural Network, and metaheuristic algorithm. Energy 2022, 239, 122340.
10. Lin, L.; Li, M.; Ma, L.; Baziar, A.; Ali, Z.M. Hybrid RNN-LSTM deep learning model applied to a fuzzy based wind turbine data uncertainty quantization method. Ad Hoc Netw. 2021, 123, 102658.
11. Kisvari, A.; Lin, Z.; Liu, X. Wind power forecasting—A data-driven method along with gated recurrent neural network. Renew. Energy 2021, 163, 1895–1909.
12. Xiang, L.; Liu, J.; Yang, X.; Hu, A.; Su, H. Ultra-short term wind power prediction applying a novel model named SATCN-LSTM. Energy Convers. Manag. 2022, 252, 115036.
13. Wu, Q.; Zheng, H.; Guo, X.; Liu, G. Promoting wind energy for sustainable development by precise wind speed prediction based on graph neural networks. Renew. Energy 2022, 199, 977–992.
14. Song, Y.; Tang, D.; Yu, J.; Yu, Z.; Li, X. Short-Term Forecasting Based on Graph Convolution Networks and Multiresolution Convolution Neural Networks for Wind Power. IEEE Trans. Ind. Inform. 2023, 19, 1691–1702.
15. Liao, W.; Wang, S.; Bak-Jensen, B.; Pillai, J.R.; Yang, Z.; Liu, K. Ultra-short-term Interval Prediction of Wind Power Based on Graph Neural Network and Improved Bootstrap Technique. J. Mod. Power Syst. Clean Energy 2023, 11, 1100–1114.
16. Qiu, H.; Shi, K.; Wang, R.; Zhang, L.; Liu, X.; Cheng, X. A novel temporal–spatial graph neural network for wind power forecasting considering blockage effects. Renew. Energy 2024, 227, 120499.
17. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 4–9 December 2017; NIPS’17. pp. 6000–6010.
18. Xiong, B.; Lou, L.; Meng, X.; Wang, X.; Ma, H.; Wang, Z. Short-term wind power forecasting based on Attention Mechanism and Deep Learning. Electr. Power Syst. Res. 2022, 206, 107776.
19. Niu, Z.; Yu, Z.; Tang, W.; Wu, Q.; Reformat, M. Wind power forecasting using attention-based gated recurrent unit network. Energy 2020, 196, 117081.
20. Huang, B.; Liang, Y.; Qiu, X. Wind Power Forecasting Using Attention-Based Recurrent Neural Networks: A Comparative Study. IEEE Access 2021, 9, 40432–40444.
21. Ma, Z.; Mei, G. A hybrid attention-based deep learning approach for wind power prediction. Appl. Energy 2022, 323, 119608.
22. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 7–11 August 2021; Volume 35, pp. 11106–11115.
23. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–14 December 2021; Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2021; Volume 34, pp. 22419–22430.
24. Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting. In Proceedings of the 39th International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; Proceedings of Machine Learning Research. Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S., Eds.; PMLR: Westminster, UK, 2022; Volume 162, pp. 27268–27286.
25. Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. arXiv 2024, arXiv:2310.06625. Available online: http://arxiv.org/abs/2310.06625 (accessed on 26 May 2024).
26. Wang, H.; Peng, J.; Huang, F.; Wang, J.; Chen, J.; Xiao, Y. MICN: Multi-scale Local and Global Context Modeling for Long-term Series Forecasting. In Proceedings of the International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
27. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A Comprehensive Survey on Graph Neural Networks. arXiv 2019, arXiv:1901.00596. Available online: http://arxiv.org/abs/1901.00596 (accessed on 26 May 2024).
28. Veličković, P. Everything is connected: Graph neural networks. Curr. Opin. Struct. Biol. 2023, 79, 102538.
29. Gao, C.; Zheng, Y.; Li, N.; Li, Y.; Qin, Y.; Piao, J.; Quan, Y.; Chang, J.; Jin, D.; He, X.; et al. A Survey of Graph Neural Networks for Recommender Systems: Challenges, Methods, and Directions. ACM Trans. Recomm. Syst. 2023, 1, 1–51.
30. Chen, L.; Chen, D.; Shang, Z.; Zhang, Y.; Wen, B.; Yang, C. Multi-Scale Adaptive Graph Neural Network for Multivariate Time Series Forecasting. arXiv 2022, arXiv:2201.04828. Available online: http://arxiv.org/abs/2201.04828 (accessed on 26 May 2024).
31. Peters, B.; Martins, A.F.T. Smoothing and Shrinking the Sparse Seq2Seq Search Space. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; Toutanova, K., Rumshisky, A., Zettlemoyer, L., Hakkani-Tur, D., Beltagy, I., Bethard, S., Cotterell, R., Chakraborty, T., Zhou, Y., Eds.; ACL Anthology: Melbourne, Australia, 2021; pp. 2642–2654.
32. Abu-El-Haija, S.; Perozzi, B.; Kapoor, A.; Alipourfard, N.; Lerman, K.; Harutyunyan, H.; Steeg, G.V.; Galstyan, A. MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Chaudhuri, K., Salakhutdinov, R., Eds.; Proceedings of Machine Learning Research. PMLR: Westminster, UK, 2019; Volume 97, pp. 21–29.
33. Sprangers, O.; Schelter, S.; de Rijke, M. Parameter-efficient deep probabilistic forecasting. Int. J. Forecast. 2023, 39, 332–345.
34. Chen, Y.; Xu, J. Solar and wind power data from the Chinese State Grid Renewable Energy Generation Forecasting Competition. Sci. Data 2022, 9, 577.
35. González-Sopeña, J.; Pakrashi, V.; Ghosh, B. An overview of performance evaluation metrics for short-term statistical wind power forecasting. Renew. Sustain. Energy Rev. 2021, 138, 110515.
36. Diebold, F.X.; Mariano, R.S. Comparing Predictive Accuracy. J. Bus. Econ. Stat. 2002, 20, 134–144.

Figure 1. Wind turbine workflow.

Figure 2. The overall framework of the proposed model.

Figure 3. Framework diagram of MSF-Decomp module.

Figure 4. The primary structure of MST-GCN.

Figure 5. Overall architecture diagram of Bi-TGCN.

Figure 6. Bidirectional architecture diagram of three-layer TCN.

Figure 7. Data distribution of power and wind speed.

Figure 8. Scatter plot of power versus wind speed and direction.

Figure 9. Correlation analysis results.

Figure 10. Prediction accuracy comparison.

Figure 11. Visualization of forecast results in wind farm site 1. The section within the red circle in Figure 11a is magnified and displayed in Figure 11b.

Figure 12. PCC scatter plot of predicted and true values in wind farm site 1.

Figure 13. Visualization of forecast results in wind farm site 4. The section within the red circle in Figure 13a is magnified and displayed in Figure 13b.

Figure 14. Violin plots of predicted and true values in wind farm site 4.

Table 1. Description of the feature variables for CSG.

Heading Name	Shortened Name
Wind speed at height of x meters (m/s)	WS_x
Wind direction at height of x meters (°)	WD_x
Air temperature (°C)	Air_T
Atmospheric pressure (hPa)	Air_P
Relative humidity (%)	Air_H
Power output (MW)	Power (MW)

Table 2. Parameter list.

Parameter	Value
GCN depth	2
Bi-TGCN scale	[12,12,12]
Dimensions of embedding	256
Number of heads of attention	6
Batch size	32
Optimizer	Adam
Dropout	0.2
Initial learning rate	$1 \times 10^{-5}$

Table 3. Models prediction error results in wind farm site 1.

Model	MSE	MAE	R²
RF	28.856	5.221	0.785
BiLSTM	25.371	4.873	0.812
CNN-LSTM	23.790	4.210	0.819
GRU	23.182	3.789	0.931
TCN	20.530	3.940	0.940
Informer	16.458	2.991	0.968
Autoformer	18.131	3.514	0.963
FEDformer	19.415	3.661	0.941
iTransformer	11.743	2.532	0.980
Proposed	11.231	2.445	0.981

Table 4. DM detection results of the proposed method and the comparison model.

Methods	DM Statistic	p Value
Proposed vs. RF	−37.23	1.26 × 10⁻¹⁶⁸
Proposed vs. BiLSTM	−25.00	3.15 × 10⁻⁶⁷
Proposed vs. CNN-LSTM	−18.89	9.25 × 10⁻¹²⁰
Proposed vs. GRU	−19.97	2.14 × 10⁻¹⁴⁸
Proposed vs. TCN	−21.23	7.25 × 10⁻¹²²
Proposed vs. Informer	−14.90	1.67 × 10⁻⁴⁰
Proposed vs. Autoformer	−16.23	6.78 × 10⁻⁴⁸
Proposed vs. FEDformer	−17.29	1.27 × 10⁻⁵⁴
Proposed vs. iTransformer	−8.91	4.73 × 10⁻¹⁹

Table 5. Models prediction error results in wind farm site 4.

Model	MSE	MAE	R²
RF	30.043	5.034	0.783
BiLSTM	25.991	4.427	0.833
CNN-LSTM	25.774	4.029	0.845
GRU	24.101	4.384	0.883
TCN	22.451	3.848	0.937
Informer	18.119	2.907	0.951
Autoformer	19.857	3.582	0.952
FEDformer	21.534	3.661	0.939
iTransformer	14.005	2.540	0.975
Proposed	13.730	2.511	0.978

Table 6. Results of ablation study.

Model	MSE	MAE	R²
Proposed-D	14.868	2.896	0.961
Proposed-DT	21.501	4.080	0.937
Proposed-DG	17.543	3.767	0.949
Proposed	11.231	2.445	0.981

Table 7. Seasonal Variation in Model Performance.

Season	Model	MSE	MAE	R²
Spring	Informer	18.512	3.42	0.948
	Autoformer	16.12	3.581	0.957
	FEDformer	17.638	3.593	0.951
	iTransformer	13.107	2.551	0.978
	Proposed	12.563	2.547	0.978
Summer	Informer	16.018	2.589	0.971
	Autoformer	17.416	2.471	0.977
	FEDformer	16.555	2.4	0.975
	iTransformer	10.75	2.139	0.989
	Proposed	10.954	2.219	0.988
Autumn	Informer	16.589	3.41	0.971
	Autoformer	18.964	3.471	0.96
	FEDformer	19.564	3.51	0.947
	iTransformer	11.69	2.578	0.981
	Proposed	11.564	2.45	0.982
Winter	Informer	21.04	4.224	0.937
	Autoformer	20.417	4.346	0.931
	FEDformer	22.471	4.851	0.935
	iTransformer	14.864	2.799	0.969
	Proposed	14.115	2.842	0.972

Table 8. Medium and long-term forecast results.

Model	MSE (12-step, 3 h ahead)	MAE (12-step, 3 h ahead)	R² (12-step, 3 h ahead)	MSE (24-step, 6 h ahead)	MAE (24-step, 6 h ahead)	R² (24-step, 6 h ahead)
Informer	58.531	4.976	0.787	126.458	7.457	0.688
Autoformer	66.544	5.446	0.731	145.581	7.575	0.649
FEDformer	62.998	5.22	0.743	131.912	7.22	0.653
iTransformer	55.184	4.492	0.827	100.31	6.656	0.71
Proposed	47.752	4.265	0.833	89.818	5.847	0.737

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

