A Dual-Branch Transformer Network with Multi-Scale Attention Mechanism for Microgrid Wind Turbine Power Forecasting

Wu, Jie; Chang, Zhengwei; Zhang, Linghao; Chen, Mingju; Li, Senyuan; Qiu, Fuhong

doi:10.3390/electronics14132566

Open AccessArticle

A Dual-Branch Transformer Network with Multi-Scale Attention Mechanism for Microgrid Wind Turbine Power Forecasting

by

Jie Wu

¹,

Zhengwei Chang

¹,

Linghao Zhang

^1,*,

Mingju Chen

^2,*,

Senyuan Li

^2,*

and

Fuhong Qiu

²

¹

Research Institute of Electric Power Science, State Grid Corporation of Sichuan Province, Chengdu 610041, China

²

School of Automation and Information Engineering, Sichuan University of Science and Engineering, Yibin 644000, China

^*

Authors to whom correspondence should be addressed.

Electronics 2025, 14(13), 2566; https://doi.org/10.3390/electronics14132566

Submission received: 5 May 2025 / Revised: 8 June 2025 / Accepted: 23 June 2025 / Published: 25 June 2025

(This article belongs to the Special Issue Real-Time Monitoring and Intelligent Control for a Microgrid)

Download

Browse Figures

Versions Notes

Abstract

Wind power generation provides clean and renewable electricity for microgrids, but its intermittency and uncertainty pose challenges to the operation and power quality of microgrids. Accurate forecasting is conducive to maintaining the stability of microgrids and improving the efficiency of energy management. Therefore, this study proposes a dual-branch frequency transformer (DBFformer), which leverages multi-scale spectral transformation and the multi-head attention mechanism to improve the prediction accuracy of microgrid wind turbines. In the encoder, two parallel branches are designed to extract the global features and local dynamic features of meteorological data based on Fourier transform and wavelet transform, respectively. In the decoder, an exponential smoothing attention (ESA) mechanism and a frequency attention (FA) mechanism are introduced to extract multi-scale temporal features. ESA enhances the model’s ability to capture long-term growth trends, whereas FA focuses on periodic pattern recognition. Additionally, to further optimize the model’s performance, a periodic weight coefficient (PWC) mechanism is employed to dynamically adjust the fusion coefficients to further improve the fusion performance and prediction accuracy. The factors influencing wind turbine power are analyzed; then, the most relevant factors are selected for the experiment. According to the experimental results, the proposed DBFformer accurately predicts the output power of wind turbines and exhibits superior performance. It achieves lower mean squared error (MSE) and mean absolute error (MAE) values than other state-of-the-art models. Specifically, its MSE values are 0.195, 0.216, 0.457, and 0.583, and the corresponding MAE values are 0.318, 0.335, 0.474, and 0.503 for different rated wind turbines. Furthermore, comprehensive ablation experiments validate that the dual-branch structure, frequency transformations, dual-attention mechanisms, and PWC module have a positive impact on the proposed model. Therefore, this research offers a novel and effective approach for wind power forecasting and supports the broader goal of integrating clean energy into microgrids.

Keywords:

power forecasting; microgrid wind turbine; frequency-domain transformer; multi-head exponential smoothing attention

1. Introduction

Microgrids construct a multi-energy complementary network by integrating wind turbines, energy storage devices, and controllable loads. Such a system dynamically adjusts wind power integration strategies through prediction technologies while suppressing wind power fluctuations via control techniques such as virtual synchronous generators (VSGs), thereby achieving coordinated optimization of source-grid load storage. Wind power generation provides clean and renewable electricity for microgrids, but its intermittency and uncertainty pose challenges to the operation and power quality of microgrids. Phasor measurement units (PMUs) and state estimation (SE) can capture the instantaneous fluctuations of wind power generation and the dynamic behavior of microgrids, enabling better control and integration of wind power into microgrids and achieving adaptive energy dispatch under different wind conditions [1]. Accurate wind power forecasting is conducive to improving the ability of PMUs to achieve adaptive energy dispatch under different wind conditions, increasing renewable energy integration, and providing a robust foundation for optimal microgrid configuration and demand-side response [2].

Wind power generation forecasting methods are generally categorized into three groups: physical modeling, statistical analysis, and machine learning approaches [3]. Physical modeling effectively analyzes real-time meteorological parameters and geographical characteristics in the vicinity of wind turbine units while incorporating turbine output characteristic curves to predict power output, thereby reducing dependence on historical data. Numerical Weather Prediction (NWP) technology maps meteorological data to hub-height wind speeds through micro-meteorological modeling and fluid dynamics simulations, thereby forming a physical framework for wind power generation forecasting [4]. Although this framework offers fundamental predictive support by inversely deriving wind energy characteristics based on physical mechanisms, it requires significant computational resources to obtain high-precision meteorological data. In theory, physical models can achieve relatively high accuracy; however, their high computational complexity and strong dependence on high-quality meteorological data present considerable limitations [5].

Compared to physical models, statistical methods are simpler in their approach. They primarily rely on the historical correlation between wind turbine power outputs and the surrounding meteorological variables to establish a mapping relationship. By conducting in-depth analyses and pattern recognition based on historical data, statistical models predict future wind power outputs over a given time horizon [6]. Among them, the Autoregressive Moving Average (ARMA) model is widely used for time-series analysis. However, ARMA models impose strict requirements on the stationarity of input data. In contrast, wind energy data are characterized by high volatility and randomness, often making them unsuitable for the application prerequisites of ARMA models [7].

As the demand for highly accurate wind power forecasting grows, the applicability of traditional statistical methods in this domain has gradually diminished. Statistical methods are generally suitable for short-term forecasting, especially when data fluctuations are small. However, they often perform poorly in long-term prediction tasks [8].

In this context, researchers have increasingly turned to machine learning approaches to extract features from historical wind power data and multidimensional meteorological parameters. These methods leverage adaptive learning mechanisms to capture the underlying association rules within the data [9]. Support vector machines (SVMs) use kernel transformations to enhance data separability. As a result, they show excellent predictive performance in specific scenarios. However, their effectiveness heavily depends on expert experience for kernel function selection and parameter tuning, and they are highly sensitive to wind speed data quality, posing challenges for practical deployment. Extreme Learning Machines (ELMs), with their rapid training speed and minimal human intervention, have exhibited strong performance in multivariate time-series forecasting tasks. Nevertheless, they still face common issues associated with single-model architectures, such as gradient vanishing and limited generalization capabilities [10].

Compared to traditional machine learning methods, deep neural networks, featuring deep convolutional architectures and temporal recurrent networks, have significantly enhanced nonlinear modeling capabilities and are emerging as key technologies for wind power system forecasting. For instance, in wind speed prediction, Chen et al. [11] integrated spatial and temporal correlations by employing convolutional neural networks (CNNs) to extract spatial features and long short-term memory (LSTM) networks to capture temporal dependencies. Imani et al. [12] developed a short-term wind power prediction (WPP) model specifically designed for newly constructed, expanded, or reconstructed meteorological feature representations surrounding wind turbine units by combining stacked denoising autoencoders (SDAEs) with multi-layer transfer learning. This approach addresses the challenges of insufficient operational data and high prediction errors in newly established wind farms. Erick et al. [13] proposed a wind speed prediction method based on an Echo State Network (ESNs) and LSTM networks, wherein the hidden units of the ESN were replaced with LSTM blocks. The entire network was jointly trained, and experimental results demonstrated that this method outperformed traditional ESN approaches in wind speed forecasting. To effectively capture long-range dependencies within wind power data, Qu et al. [14] introduced a novel Transformer-based forecasting model. By optimizing the model structure, constructing dedicated matrices, and recalculating multi-head attention (MHA) values, their approach proved effective for multi-unit wind power forecasting, demonstrating a strong capability in capturing correlations among meteorological conditions surrounding different wind turbine units, thereby achieving highly accurate predictions. Furthermore, to better leverage localized patterns within the meteorological data surrounding wind turbine units, Nascimento et al. [15] proposed a deep neural network architecture that combines wavelet transform with the Transformer framework. By enhancing features through wavelet transform, their method further improved forecasting accuracy.

In summary, the intermittent volatility and randomness of weather data surrounding wind turbine units are critical factors affecting the accuracy of wind power generation forecasting. This study aims to further exploit the multi-scale characteristics of the meteorological data surrounding wind turbine units and mitigate the impact of volatility. In order to enhance prediction accuracy, a novel model named dual-branch frequency transformer (DBFformer) based on multi-head attention is proposed. DBFformer adopts a unique dual-branch architecture. With this architecture, the network can capture and enhance both local features and global patterns within time-series data, thereby improving the forecasting performance. The main contributions of this paper are outlined as follows:

(1): A dual-branch architecture is constructed to integrate both discrete Fourier transform and wavelet transform modules, enabling multi-scale modeling of sequence information in terms of both local and global features. In this architecture, the Fourier transform branch is employed to extract global representations of the time series, while the wavelet transform is designed to capture short-term localized features. This design significantly enhances the model’s ability to detect complex periodic structures within non-stationary time-series data.
(2): A multi-scale attention mechanism that integrates multi-head exponential smoothing attention (MH-ESA) and frequency attention (FA) is designed to capture dependencies across multiple temporal scales. Specifically, ESA enhances the model’s sensitivity to long-term trends through exponential smoothing, while FA leverages frequency-domain transformations to identify periodic patterns embedded in local segments of the data. This combination enables effective extraction of multi-scale characteristics inherent in time-series data.
(3): To further optimize the performance of the DBFformer model and improve the accuracy of wind power generation forecasting, a periodic weight coefficient (PWC) mechanism is introduced. This mechanism adjusts the contribution of features from both global and local frequencies. As a result, it improves the integration of long-term trend features with short-term fluctuations.

2. Related Work

In this study, weather data surrounding wind turbine units is used as input to forecast wind power generation through the proposed dual-branch frequency transformer. In recent years, deep learning-based predictive networks have demonstrated excellent adaptability, especially in handling complex time-series data. This section introduces wind power prediction based on deep learning and analyzes the factors affecting wind turbine power generation.

2.1. Deep Learning Networks for Wind Turbine Power Forecasting

Deep learning frameworks show strong capabilities in capturing nonlinear relationships and have demonstrated excellent performance in forecasting weather data surrounding wind turbine units [16]. Shen et al. [17] proposed a novel hybrid neural network framework that combines a convolutional neural network with a long short-term memory network for multi-step wind speed forecasting. The framework comprises a data preprocessing module and a model training module. Experimental results on three benchmark datasets demonstrate that the proposed method significantly improves prediction accuracy and enhances the model’s generalization capability compared to various baseline approaches. Li et al. [18] proposed a photovoltaic power forecasting model based on LSTNet, achieving high accuracy and robustness in the prediction of highly stochastic and volatile photovoltaic power generation. To address the issue of limited training data in newly deployed wind turbine units, Peng et al. [19] developed a short-term wind power generation forecasting model based on stacked denoising autoencoders and multi-level transfer learning. By hierarchically transferring highly correlated samples, the model significantly reduced forecasting errors and demonstrated superior performance across multiple wind turbine units. To effectively utilize the time-frequency information of the meteorological data surrounding wind turbine units, Cai et al. [20] proposed a short-term load forecasting model that integrates the Hunter–Prey Optimizer (HPO) algorithm with an LSTM network. By employing the HPO algorithm to automatically optimize the hyperparameters of the LSTM model, the approach eliminates the uncertainty associated with manual and empirical parameter tuning. Experimental results demonstrate that the proposed model significantly enhances forecasting accuracy and overall model performance in short-term load prediction tasks, outperforming several existing benchmark methods. Furthermore, Memarzadeh et al. [21] integrated the crow search algorithm (CSA), wavelet transform (WT), and a mutual information-based feature selection method (MI-FS) with an LSTM network to effectively mine the hidden temporal and feature information in wind speed data, thereby enhancing short-term wind speed forecasting accuracy. Zhang et al. [22] utilized BiLSTM networks to capture long-term dependencies in time series, combined with random forest models for nonlinear modeling and feature selection, which improved both prediction accuracy and global search capability. Yang et al. [23] proposed a short-term wind speed forecasting model that integrates CEEMDAN decomposition, the RIME optimization algorithm, multi-head self-attention (MHSA), and BiLSTM networks, demonstrating strong adaptability across different geographic locations and seasonal conditions.

Recurrent neural networks often encounter problems such as gradient vanishing or explosion. When they model long sequences, they tend to overlook important global features and the variability of local information across different time steps. This oversight limits their forecasting performance. To address these issues, Transformer architectures based on the self-attention mechanism have been proposed in recent years. These models can effectively capture global dependencies within input sequences and mitigate the information decay problem encountered by RNNs when dealing with long sequences, achieving widespread application in time-series forecasting tasks [24]. Farsani et al. [25] proposed a Transformer-based neural network that demonstrates strong robustness and accuracy in long-term forecasting tasks. Yan et al. [26] introduced a Transformer model by integrating frequency-domain decomposition and global signal enhancement, achieving high-precision long-term wind power generation forecasting. Xu et al. [27] developed a power load forecasting model based on the Informer architecture, which processes long-range dependencies through a sparse self-attention mechanism and outperforms traditional RNN models in both forecasting accuracy and computational efficiency for long time series. Wu et al. [28] proposed an Autoformer-based power load forecasting model that leverages series decomposition and auto-correlation mechanisms to capture periodic consumption patterns, significantly improving long-term electrical demand prediction accuracy while reducing computational requirements. Zhou et al. [29] introduced FEDformer, a frequency enhanced decomposed transformer that combines seasonal-trend decomposition with frequency-domain transformations to capture both global patterns and detailed structures in power-load time series, achieving superior long-term forecasting accuracy with linear computational complexity. Dual-branch architectures have demonstrated remarkable success across various domains by enabling models to process information through multiple complementary pathways. In computer vision, Simonyan and Zisserman [30] pioneered the two-stream CNN architecture for action recognition, where one stream processes spatial information and the other captures temporal dynamics. Similarly, in audio processing, attention-based models have shown strong capabilities in handling spectrogram data with complex time-frequency relationships, inspiring the application of similar architectures to wind power data that exhibit comparable multi-scale temporal patterns [31]. The integration of frequency-domain transformations with deep learning architectures has recently gained increasing attention. In audio processing, for instance, Lu et al. [32] proposed SpecTNT, which models spectrograms along both time and frequency axes through a dual-branch design: spectral Transformers extract frequency-related features, while temporal Transformers capture temporal dependencies. This architecture demonstrates the effectiveness of jointly modeling spectral and temporal information. In image processing, hybrid networks combining spectral layers with attention mechanisms have also been developed to enhance both global frequency perception and spatial interaction [33]. Inspired by these successful applications in the image and audio domains, this study adapts the dual-branch paradigm and frequency-domain modeling to tackle the unique challenges of wind power forecasting. Unlike static images or audio signals, wind power time series exhibit strong non-stationarity, with both short-term fluctuations and long-term trends. The proposed DBFformer is specifically designed to address these characteristics, leveraging a dual-branch structure that simultaneously captures global frequency representations and local temporal features, thereby improving forecasting accuracy in the presence of volatility and intermittency.

2.2. Influencing Factors of Wind Turbine Power

Wind speed, as the core driving parameter for wind energy conversion, has a decisive impact on power output due to its intermittent and stochastic nature [34]. In addition to wind speed, factors such as wind direction, air temperature, atmospheric pressure, and relative humidity are also closely related to wind power generation [35]. The spatiotemporal coupling effects among these variables may lead to significant nonlinear characteristics in power output, thereby increasing the complexity of prediction modeling.

To validate the correlation of the above-mentioned factors with wind power generation, meteorological and operational data from wind turbine units in a northwestern Chinese city (2019–2020) were analyzed. The dataset includes four typical turbine types with rated capacities of 36 MW, 66 MW, 99 MW, and 200 MW—representative of the primary models deployed in the region. These values reflect the actual installed units in operation during the data collection period. Notably, wind turbine power output is a continuous variable influenced by meteorological conditions, yet classifying turbines by rated capacity enables a systematic investigation of how environmental factors correlate with turbines of different scales and design specifications. Rated capacity provides a basis for grouping turbines, while actual power output varies continuously within each unit’s operational range. This classification facilitates structured, comparative analysis across turbine types, aligning with practical deployment patterns and enabling cross-type modeling under varying meteorological influences.

The dataset used in this study was obtained from operational SCADA systems of four wind farms located in the capital city of a northwestern province in China. The data spans from 2019 to 2020 and includes meteorological and power generation records collected at 15 min intervals. The four wind farms selected for analysis correspond to wind turbines with rated power outputs of 36 MW, 66 MW, 99 MW, and 200 MW, which represent the primary turbine capacities deployed in the region. Table 1 summarizes the amount of available data and the observed atmospheric pressure ranges.

The wind data involved in the correlation analysis included wind speed (m/s) at 10 m, wind direction (°) at 10 m, wind speed (m/s) at 30 m, wind direction (°) at 30 m, wind speed (m/s) at 50 m, wind direction (°) at 50 m, wind speed (m/s) at rotor height, wind direction angle (°) at rotor height, air temperature (°C), atmospheric pressure (hPa), relative humidity (%), and the electrical power output (MW) of the wind turbines. The analysis results are displayed in Figure 1.

WS_10, WS_30, WS_50, and WS_C represent wind speeds (m/s) at heights of 10 m, 30 m, 50 m, and rotor height, respectively. WD_10, WD_30, WD_50, and WD_C represent wind direction angles (°) at the same heights. Air_T, Air_P, and Air_H represent air temperature (°C), atmospheric pressure (hPa), and relative humidity (%), respectively. Power represents the final electrical power output (MW).

It can be seen from the figure that wind speed demonstrates a robust correlation with the power outputs of all wind turbines with different capacities, while wind direction maintains a moderate correlation among units with different rated power. Temperature exhibits a comparatively weaker association. Given that atmospheric pressure and humidity show negligible impacts on power generation (approaching zero correlation), these parameters were excluded from the predictive input dataset to reduce model computation. Consequently, this study selects the wind speed, the wind direction and the temperature as input variables for the power prediction model to forecast the power output of the wind turbine.

3. DBFformer

3.1. Traditional Transformer

The Transformer architecture centers on the attention mechanism. It abandons traditional sequential recursion and convolutional operations for computing input and output representations. Instead, it focuses on extracting long-range dependencies within sequences [36]. Unlike conventional models that process data in a strictly sequential manner, the Transformer architecture leverages attention mechanisms to capture global dependencies across the entire sequence. For complex and highly dynamic time-series data, the Transformer model assigns higher attention weights to key components. This approach improves predictive accuracy, particularly in challenging wind power generation forecasting tasks.

The overall structure of the Transformer model, as illustrated in Figure 2, consists of multiple stacked encoder and decoder layers. Each module incorporates a multi-head attention (MHA) mechanism and a Feed-Forward Network (FFN) [37]. The encoder receives the input sequence and uses the multi-head attention mechanism to capture relationships and semantic information among the sequence elements. It then applies a feed-forward neural network for nonlinear feature transformations, progressively extracting more abstract and high-level feature representations through stacked layers. The decoder utilizes the encoded representations. It then applies multi-head attention mechanisms to simultaneously focus on the encoder’s output and the partially generated outputs. This dual attention allows the decoder to generate the target sequence step by step, with each step processed through the feed-forward network to transform the features into output probabilities.

3.2. Proposed DBFformer

To effectively utilize the characteristics of the complexity of long-term sequences and the periodic variations of short-term time series, DBFformer is proposed as illustrated in Figure 3. The network adopts a unique dual-branch structure. In the encoder phase, Fourier transforms are applied to extract and process global sequence features, while wavelet transforms handle local sequence features.

The decoder incorporates two specialized modules. The multi-head exponential smoothing attention module, which combines the concept of exponential smoothing with a multi-head attention mechanism, captures growth trends in the meteorological data. The frequency attention module, which focuses on analyzing frequency components of the data, extracts periodic information from the data surrounding wind turbine units. Furthermore, during the decoding process, the periodic weight coefficient module balances global features with local features; then, the feed-forward network enhances these multi-scale representations. This comprehensive approach ultimately enables accurate predictions of wind power generation.

To address real-world challenges such as abrupt fluctuations in wind speed and complex temporal patterns in power generation data, DBFformer integrates domain-adaptive modules that explicitly model both global and local dynamics. The Fourier transform component captures dominant global periodicities, enabling the model to robustly track seasonal or cyclical trends, which are often obscured by transient disturbances. This makes the model less sensitive to short-term noise and more capable of maintaining stability across varying temporal scales. Conversely, the wavelet transform is particularly effective in detecting localized anomalies and short-term shifts, such as sudden wind gusts or rapid drops in temperature. By decomposing the signal at multiple resolutions, the wavelet branch allows the model to adaptively respond to non-stationary features, enhancing sensitivity to dynamic changes.

The Multi-Head Exponential Smoothing Attention (MH-ESA) module introduces temporal prioritization, assigning greater weights to more recent data while smoothing noisy signals, thereby improving the model’s responsiveness to sudden wind changes. In parallel, the frequency attention (FA) module enhances the extraction of periodic components by operating in the frequency domain, which allows the model to distinguish between short-term volatility and true recurring patterns. This dual-path and multi-scale attention design ensures that the model not only captures long-term regularities but also remains responsive to irregular or rapidly changing inputs, making DBFformer well-suited for real-world wind power forecasting tasks with high volatility and complexity.

3.2.1. Dual-Branch Encoder

In the encoder, the global branch receives the long-term trend sequence (

X_{g r o w t h}

) from the long-term series and extracts the global information through the embedded Fourier transform-based global module, denoted as

Z_{g l o b a l}

; the local branch receives the short-term seasonal sequence (

X_{s e a s o n a l}

) from the short-term series and extracts the local information through the embedded wavelet transform-based local module, denoted as

Z_{l o c a l}

. It is expressed as follows:

Z_{global} = B r a n c h_{global} (X_{g r o w t h}),

(1)

Z_{local} = B r a n c h_{local} (X_{s e a s o n a l}),

(2)

This study addresses the high memory demands of Transformer models through a specialized design. The approach incorporates two key components: a Fourier transform module for extracting global periodic information and a wavelet transform module for capturing local periodic information. This design allows model complexity to grow sub-linearly with sequence length. As a result, the number of parameters required for long-term time-series forecasting is reduced, significantly improving the model’s overall efficiency.

3.2.2. Multi-Scale Attention Mechanism Decoder

The task of the decoder is to generate predictions for the next h steps. The final prediction consists of the horizontal prediction (

E_{t + h}^{(n)}

), the growth trend prediction (

B_{t + h}^{(n)}

), and the seasonal trend prediction (

S_{t + h}^{(n)}

), which are defined as follows:

E_{t + h}^{(n)} = α * (E_{t}^{(n - 1)} - Linear (S_{t}^{(n)})) + (1 - α) * (E_{t - 1}^{(n)} + Linear (B_{t - 1}^{(n)})),

(3)

where

α \in R^{m}

is a learnable smoothing parameter,

*

represents an element-wise multiplication term, and

Linear (\cdot) : R^{d} \to R^{m}

maps the representation to the observation space. The level extracted in the final layer can correspond to the level of the look-back window.

B_{t + h}^{(n)} = MH - ESA (LN (Z_{t}^{(n - 1)} + FF (Z_{t : t}^{(n - 1)}))),

(4)

S_{t + h}^{(n)} = {FA}_{t} (Z_{t}^{(n - 1)} - S_{t}^{(n)}),

(5)

where LN denotes the normalization layer,

FF (x) = Linear (σ (Linear (x)))

represents the position-wise feed-forward network,

σ (\cdot)

is the sigmoid function, MH-ESA stands for the multi-head exponential smoothing attention module, and FA refers to the frequency attention module.

After defining the individual predictions, the final h-step-ahead prediction is generated by combining the horizontal, growth, and seasonal predictions, defined as follows:

X_{t + h} = E_{t + h} + Linear (\sum_{n = 1}^{N} (B_{t + h}^{(n)} + S_{t + h}^{(n)})),

(6)

In the equation,

E_{t + H}

,

B_{t + H}^{(n)}

, and

S_{t + H}^{(n)}

represent the horizontal prediction, growth trend prediction, and seasonal trend prediction, respectively.

In this paper, to enhance the feature extraction capability, novel multi-head exponential smoothing attention and frequency attention mechanisms are introduced for each branch to replace traditional attention mechanisms.

(1): MH-ESA

The multi-head attention mechanism parallelizes the attention process by using multiple attention heads to capture information from different subspaces. It smooths the attention weights, assigning higher weights to more recent features, thereby enhancing the model’s ability to extract horizontal, growth, and seasonal features. The query (

Q

), key (

K

), and value (

V

) vectors are projected into multiple lower-dimensional subspaces, resulting in

Q_{i}

,

K_{i}

, and

V_{i}

for each head (where

i

denotes the index of the head). The attention weights (

A_{i}

) for each head are calculated as follows:

A_{i} = s o f t m a x (\frac{Q_{i} K_{i}^{T}}{\sqrt{d_{k}}}),

(7)

where

d_{k}

is the dimension of the key vector and the

s o f t m a x

function normalizes the weights.

For the meteorological data surrounding wind turbine units, there exist stationary seasonal features and non-stationary characteristics [38]. To effectively integrate both stationary and non-stationary information, the attention weights (

A_{i}

) calculated for each head are smoothed using an exponential smoothing algorithm, assigning higher weights to more recent data. Let

A_{i, t}

represent the

i - t h

multi-head attention weight at time

t

. The exponential smoothing update is formulated as follows:

A_{i, t} (V) = α A_{i, t} + (1 - α) A_{i, t - 1} {(V)}_{t - 1} = \sum_{j = 0}^{t - 1} α {(1 - α)}^{j} V_{t - j} + {(1 - α)}^{t} v_{0},

(8)

In this equation,

0 < α < 1

and

v_{0}

represent the smoothing parameter and the initial state, respectively. This method enables the weights to be smoother and highlights the importance of more recent data. By constructing the attention matrix (

A_{E S}

) and performing matrix multiplication on the input sequence, the MH-ESA algorithm is directly implemented, as described below:

A_{ES} (V) = [\begin{matrix} A_{1, t} (V) \\ ⋮ \\ A_{n, t} {(V)}_{L} \end{matrix}] = A_{ES} \cdot [\begin{matrix} v_{0}^{T} \\ V \end{matrix}],

(9)

This paper utilizes the unique structure of the multi-head exponential smoothing attention matrix (

A_{E S}

) to implement MH-ESA. The core idea is to use the

A_{E S}

matrix as a fundamental building block to develop MH-ESA for extraction of potential growth features. In practice, the growth representation is obtained by taking the consecutive differences of the residuals:

{\tilde{Z}}_{t - L : t}^{(n)} = Linear (Z_{t - L : t}^{(n - 1)}),

(10)

B_{t - L : t}^{(n)} = A_{ES} ({\tilde{Z}}_{t - L : t}^{(n)} - [{\tilde{Z}}_{t - L : t - 1}^{(n)}, v_{0}^{(n)}]),

(11)

B_{t - L : t}^{(n)} : = Linear (B_{t - L : t}^{(n)}),

(12)

In this equation,

v_{0}^{(n)}

represents the initial state of the MH-ESA mechanism.

(2): FA

The frequency attention mechanism can perform anti-seasonal processing on the input features, which is beneficial for this branch to focus on modeling horizontal and growth information. To effectively automate the extraction of seasonal features, a discrete Fourier transform is used to construct the FA mechanism, performing periodic detection on the power spectral density estimation. The following equation is established:

Φ_{k, i} = ϕ (F {(Z_{t - L : t}^{(n - 1)})}_{k, i}), A_{k, i} = | F {(Z_{t - L : t}^{(n - 1)})}_{k, i} |,

(13)

κ_{i}^{(1)}, \dots, κ_{i}^{(K)} = \arg Top - K {A_{k, i}},

(14)

S_{j, i}^{(n)} = \sum_{i = 1}^{K} A_{κ_{i}^{(k)}, i} [\cos (2 π f_{κ_{i}^{(k)}} j + Φ_{κ_{i}^{(k)}, i}) + \cos (2 π {\bar{f}}_{κ_{i}^{(k)}} j + {\bar{Φ}}_{κ_{i}^{(k)}, i})],

(15)

In the equation,

Φ_{k, i}

and

A_{k, i}

represent the phase/amplitude of the

K - t h

frequency in the

i

dimension.

a r g T o p - K

returns the parameters of the previous

K

amplitudes, where

K

is a hyperparameter and

f_{k}

is the Fourier transform frequency of the corresponding index;

{\bar{f}}_{k}

and

{\bar{Φ}}_{k, i}

are the Fourier transform frequencies of the conjugate index.

3.2.3. PWC

The feed-forward network performs a nonlinear transformation on the output information of MH-ESA. By applying different weight matrices and activation functions, it processes the input features to capture more complex relationships between features and extract richer information. The output of the feed-forward network is combined with the weighted outputs from the two branches through periodic weighting coefficients:

X_{t + H} = α^{n} X_{f} + (1 - α^{n}) X_{w} + X_{l},

(16)

Here, n represents the hyperparameter that regulates the contribution of

α

.

Multivariate time series encompass both stable long-term dependencies and short-term or instantaneous interactions. To enhance prediction accuracy, the model must simultaneously capture stable long-term patterns and dynamic short-term fluctuations, achieving an effective balance. To improve the collaborative adaptability between the dual-branch structures, this paper introduces a periodicity-weighted coefficient, denoted as

α

, strengthening the mutual adaptability between the dual-branch structures. The expression for the PWC is as follows:

\begin{matrix} α = \frac{1}{C} \sum_{i = 1}^{C} \frac{\max {(a_{i}^{2})}_{i = 1, \dots, m}}{\sum_{i = 1}^{m} a_{i}^{2}} \end{matrix},

(17)

Here,

a_{i}

represents the amplitude of the

i - t h

frequency after Fourier transformation. The adaptability of

α

makes it highly versatile. Values close to 1 emphasize global periodicity by utilizing Fourier features, while values close to 0 focus on local behavior using wavelet features.

4. Experimental Analysis

To validate the advantages of the proposed DBFformer network in wind power generation forecasting, comparative experiments and ablation studies were conducted. The comparative experiment analyzed the prediction performance of DBFformer by comparing it with the current mainstream time-series forecasting models. The ablation study investigated the impact of various components, such as the dual-branch structure, frequency transformation, exponential smoothing attention mechanism, and frequency attention mechanism, on the overall network performance.

4.1. Data Preprocessing

The dataset uses data from wind turbines with rated capacities of 36 MW, 66 MW, 99 MW, and 200 MW. The sampling interval of the data is denoted as

Δ t

, where

Δ t

= 15 min. The anomalies in the meteorological data surrounding wind turbine units primarily stem from the combined effects of equipment malfunctions and external environmental factors. Based on the equipment itself, aging, damage, or malfunction of wind turbine components can lead to abnormal power generation. Environmental factors also impact wind power generation. Since wind power relies on wind energy, the instability, frequent fluctuations, and significant variations in wind speed are key environmental factors contributing to power generation anomalies. Extreme weather conditions, such as heavy rainfall, snowstorms, and sandstorms, affect the aerodynamic performance of the blades, thereby reducing power generation and leading to abnormalities in power output. Therefore, it is essential to employ appropriate methods to enhance the efficiency of data utilization and improve forecasting accuracy.

Analysis of wind power system data reveals that the main anomalies stem from data corruption (e.g., sensor errors) and missing values, primarily caused by sensor failures, abrupt turbine shutdowns, or external environmental interference (e.g., electromagnetic noise). To address these issues, this study employed a comprehensive preprocessing strategy combining error correction and imputation techniques. For erroneous data points, anomalies were first detected by defining valid data ranges and analyzing temporal trends. Short-term corruptions (less than or equal to three consecutive time steps) were corrected using linear interpolation to maintain temporal continuity. For longer-term anomalies or clearly inconsistent values, power output data were corrected using local neighborhood averaging within a ±3 time step window or estimated through trend-based extrapolation to preserve dynamic patterns. For missing meteorological data such as wind speed, wind direction, and temperature, cubic spline interpolation and historical statistical profiles were used to ensure smoothness and consistency. These preprocessing methods collectively enhance data quality and ensure that the forecasting model receives reliable and physically meaningful input sequences.

4.2. Experimental Parameters and Evaluation Metrics

All comparative experiments of the algorithms were conducted on the same computer to ensure a fair evaluation of the predictive performance of each network. The hardware specifications used in the experiments are shown in Table 2.

Based on the correlation analysis presented earlier, this study uses wind speed, wind direction, temperature, and the electrical power output of the wind turbine as inputs for model prediction. The model is trained for 10 epochs, with input data step sizes set at 96, 192, 336, and 720, which correspond to 24, 48, 84, and 180 h of historical data, respectively, given a 15 min sampling interval. The model dimension is 512, with a dropout rate of 0.05. The activation function is GELU, early stopping is applied with a patience of 3, the learning rate is set to 0.0001, and the loss function is MSE. The parameters are detailed in Table 3.

The dataset is divided into training, testing, and validation sets with a distribution ratio of 7:2:1. The neural network model is trained and validated, and predictions are then made using the validation set. The evaluation metrics used in the experiment are MSE and MAE. The formula for calculating the mean squared error is expressed as follows:

MSE = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}

(18)

The formula for calculating the Mean Absolute Error is expressed as follows:

MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\hat{y}}_{i} |,

(19)

where

n

is the number of samples,

y_{i}

is the true value, and

{\hat{y}}_{i}

is the predicted value. The closer the values of both evaluation metrics are to 0, the smaller the model’s error, the higher the accuracy, and the better the performance.

4.3. Experimental Validation Results

To evaluate the accuracy of wind power generation forecasting using the proposed model, we compared DBFformer with six baseline models: Autoformer, Informer, FEDformer, Transformer, LSTNet, and LSTM. The dataset was split into training, validation, and testing sets with a ratio of 7:2:1, where the models were trained to forecast the power generation of the final 10% of the data. Four different input sliding windows were adopted, with window lengths (

Δ T

) of 96, 192, 336, and 720 data points. Given the 15 min data sampling interval, these correspond to 24 h (1 day), 48 h (2 days), 84 h (3.5 days), and 180 h (7.5 days) of historical input data, respectively. The forecasting task was configured as a multi-step-ahead prediction problem, where the models predicted the next 24 time steps—equivalent to a 6 h-ahead forecasting horizon (24 × 15 min = 6 h). This time frame is practically significant for wind farm operators in tasks such as energy dispatch planning and grid balancing, offering a good trade-off between operational value and prediction accuracy. Figure 4 presents the prediction curves of different models on turbines with varying rated capacities under a sliding window size of 96.

Table 4 presents the power prediction results for wind turbines with a rated power of 36 MW, 66 MW, 99 MW, and 200 MW using different sliding window lengths. Among the models, the DBFformer model proposed in this study stands out with its remarkable performance, achieving the lowest MSE and MAE values for turbines with different power ratings under sliding windows of different scales. When the sliding window length is 96, the MSE values of DBFformer are 0.195, 0.216, 0.457, and 0.583, and the MAE values are 0.318, 0.335, 0.474, and 0.503. Taking the 36 MW turbine with a sliding window length of 96 as an example, the MSE of DBFformer decreases by 0.119, 0.208, 0.319, 0.433, 1.236, and 1.516, and its MAE decreases by 0.106, 0.196, 0.298, 0.391, 1.277, and 1.597 compared to Autoformer, Informer, FEDformer, Transformer, LSTNet, and LSTM, respectively. The comparative experiments demonstrate that the DBFformer model proposed in this study exhibits superior prediction accuracy.

Figure 5 shows a comparison between predicted wind power values and the corresponding ground truth under different sliding window lengths (

Δ T

= 96, 192, 336, and 720). The visualized results clearly show that the length of the sliding window significantly affects the model’s ability to track and reproduce power fluctuations. When shorter windows (e.g.,

Δ T

= 96 or 192) are used, the model demonstrates greater responsiveness to sudden changes, accurately capturing rapid variations and abrupt shifts in power output. In contrast, longer windows (e.g.,

Δ T

= 720) result in smoother prediction curves, which can be advantageous in relatively stable intervals but often fail to react promptly in highly dynamic regions due to over-smoothing. Moreover, excessively long input sequences may introduce redundant historical information, hindering the model’s focus on relevant short-term patterns and reducing overall prediction accuracy. Overall, the results shown in the figure suggest that the choice of sliding window length should be carefully tailored to the volatility of the target data. Shorter windows tend to strike a better balance between accuracy and adaptability, thereby enhancing the model’s ability to capture the temporal dynamics of wind power series effectively.

4.4. Ablation Experiments

To assess the impact of each module on the final model, an ablation experiment was conducted, recording the prediction accuracy evaluation metrics of MSE, MAE, and training time. Table 5 presents the prediction results for a 36 MW-rated wind turbine with a window length of 336. The results from the ablation experiment show that incorporating Fourier and wavelet frequency transformations based on the dual-branch structure further improves the model’s accuracy. Specifically, these frequency transformations help reduce the complexity of model training, leading to a decrease in training time.

Furthermore, the use of the NH-ESA and FA attention mechanisms contributes to the improvement of the model’s prediction accuracy. However, it also increases the number of model parameters, resulting in a longer training time.

After the introduction of the PWC module, both MSE and MAE are reduced, demonstrating that the PWC module effectively integrates local and global features by adaptively weighting different feature representations, further enhancing prediction accuracy.

5. Conclusions

The escalating integration of wind power generation within microgrid systems necessitates accurate prediction of wind power output. This accurate prediction is crucial for optimizing grid control, enhancing operational stability, and improving wind energy accommodation capacity. However, current wind power forecasts confront significant challenges due to the inherent diversity and stochastic fluctuations of meteorological parameters that govern wind power generation dynamics.

To address these issues, this paper proposes DBFformer, a dual-branch frequency-domain model based on the Transformer architecture. The model separately extracts long-term trend features and short-term local variations, which enhances its ability to capture multi-scale characteristics in time series. This design mitigates the Transformer’ architecture’s limitations in modeling long-term dependencies and ultimately improves the forecasting accuracy of wind power generation. In the dual-branch encoder, Fourier transform and wavelet transform are incorporated to exploit the sparse representation properties in the frequency domain. The incorporation of these transforms enables effective compression and extraction of features at both global and local scales, significantly improving inference efficiency while maintaining accuracy. The decoder incorporates a dual attention mechanism that integrates ESA and FA, which facilitates the extraction of long-term trends and periodic patterns from the data. This integration enhances the model’s ability to model growth and periodic information, improving forecasting accuracy in complex non-stationary time series. To further improve the coordination and adaptability between the two branches, the PWC mechanism is introduced. This PWC mechanism dynamically balances global and local feature information, effectively enhancing the stability and robustness of long-term dependency modeling.

Experimental results clearly show that DBFformer outperforms six mainstream baseline models—Autoformer, Informer, FEDformer, Transformer, LSTNet, and LSTM—under varying sliding window lengths and turbine capacities. For instance, in the case of a 36 MW wind turbine with an input window length of 96, DBFformer’s MSE decreases by 0.119, 0.208, 0.319, 0.433, 1.236, and 1.516 compared to Autoformer, Informer, FEDformer, Transformer, LSTNet, and LSTM, respectively. Similarly, its MAE decreases by 0.106, 0.196, 0.298, 0.391, 1.277, and 1.597 relative to the same baseline models. Ablation experiments further demonstrate that the Fourier and wavelet components contribute significantly to both prediction accuracy and training efficiency. Meanwhile, the NH-ESA and FA mechanisms enhance temporal feature extraction, and the PWC module boosts robustness by improving the fusion of global and local features. These findings confirm the effectiveness and generalizability of the proposed architecture in addressing the challenges of wind power prediction under highly variable meteorological conditions.

While the dataset used in this study was collected from utility-scale wind farms and solar stations in China, its 15 min sampling interval, diverse generation capacities, and inclusion of real-time weather parameters closely reflect the characteristics typical of microgrid systems. The DBFformer model, designed to effectively capture multi-scale temporal features and handle the stochastic nature of renewable generation, is architecture-agnostic and highly adaptable to microgrid scenarios. Therefore, although this study does not directly utilize microgrid data, the modeling approach and results strongly indicate that DBFformer can be transferred to microgrid wind power forecasting tasks. Future work will further investigate this extrapolation by validating the model on dedicated microgrid datasets.

6. Future Work

Although the proposed DBFformer model has achieved promising results in wind power generation forecasting, there are still several research directions that require further research to promote the development of microgrids.

Future work could explore the incorporation of more advanced frequency transformation methods or alternative feature extraction techniques. These approaches could help uncover multi-scale latent features within the data. Additionally, this paper uses wind speed, wind direction, and temperature data for forecasting. Future research should consider integrating additional multi-source data related to wind power, such as satellite remote sensing data, terrain and topography data, and power-grid operation data. These data contain rich information about the surrounding environment of wind turbine units, which can help more comprehensively depict the influencing factors of wind power and improve the accuracy and reliability of predictions. Finally, microgrids need to be designed to withstand sudden fluctuations in wind power generation. This could involve developing more advanced control strategies for energy storage systems, ensuring that the microgrid can maintain a stable power supply, even during periods of low wind.

Moreover, accurate forecasting alone is insufficient to ensure reliable microgrid operation under high levels of renewable penetration. It is critical to incorporate the impact of PMUs and SE, particularly how optimal PMU and SE placement affects observability and control of wind energy generation. Subsequent research will introduce real-time PMU measurement data as additional input features into the deep learning framework and reconstruct the full system state through the state estimation module to enhance the deep learning model’s ability to capture dynamic microgrid characteristics. Meanwhile, combined with the optimal PMU configuration algorithm, the sensor deployment scheme will be optimized to maximize system observability while minimizing hardware costs. Additionally, the closed-loop collaborative mechanism between the prediction model and SE will be explored to construct an integrated “prediction–measurement–control” microgrid operation framework, enhancing the model’s reliability and engineering applicability in practical scenarios and providing more comprehensive technical support for the stable operation of microgrids with high levels of wind power penetration.

Author Contributions

Conceptualization, Z.C. and J.W.; methodology, M.C.; software, M.C. and F.Q.; validation, J.W., Z.C., and L.Z.; formal analysis, L.Z., S.L., and F.Q.; investigation, F.Q. and S.L.; resources, M.C. and J.W.; data curation, Z.C., F.Q., and S.L.; writing—original draft preparation, Z.C. and L.Z.; writing review and editing, S.L., M.C., and F.Q.; visualization, J.W.; supervision, M.C. and L.Z.; project administration S.L., M.C., and L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Sichuan Science and Technology Program (grant number 2023NSFSC1987) and the Science & Technology Project of Sichuan Province Electric Power Company (grant number 52199723001S).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Authors Jie Wu, Zhengwei Chang and Linghao Zhang were employed by the company State Grid Corporation of Sichuan Province. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Theodorakatos, N.P.; Babu, R.; Moschoudis, A.P. The Branch-and-Bound Algorithm in Optimizing Mathematical Programming Models to Achieve Power Grid Observability. Axioms 2023, 12, 1040. [Google Scholar] [CrossRef]
Darmis, O.; Korres, G.N. Forecasting-Aided Power System State Estimation Using Multi-Source Multi-Rate Measurements. IEEE Trans. Instrum. Meas. 2025; Early Access. [Google Scholar] [CrossRef]
Tang, X.; Dai, Y.; Wang, T.; Chen, Y. Short-Term Power Load Forecasting Based on Multi-Layer Bidirectional Recurrent Neural Network. IET Gener. Transm. Distrib. 2019, 13, 3847–3854. [Google Scholar] [CrossRef]
Donadio, L.; Fang, J.; Porté-Agel, F. Numerical Weather Prediction and Artificial Neural Network Coupling for Wind Energy Forecast. Energies 2021, 14, 338. [Google Scholar] [CrossRef]
Hanifi, S.; Liu, X.; Lin, Z.; Lotfian, S. A Critical Review of Wind Power Forecasting Methods—Past, Present and Future. Energies 2020, 13, 3764. [Google Scholar] [CrossRef]
Li, N.; Wang, Y.; Ma, W.; Xiao, Z.; An, Z. A Wind Power Prediction Method Based on De-Bp Neural Network. Front. Energy Res. 2022, 10, 844111. [Google Scholar]
Zhang, F.; Li, P.-C.; Gao, L.; Liu, Y.-Q.; Ren, X.-Y. Application of Autoregressive Dynamic Adaptive (Arda) Model in Real-Time Wind Power Forecasting. Renew. Energy 2021, 169, 129–143. [Google Scholar] [CrossRef]
Yang, Y.; Lou, H.; Wu, J.; Zhang, S.; Gao, S. A Survey on Wind Power Forecasting with Machine Learning Approaches. Neural Comput. Appl. 2024, 36, 12753–12773. [Google Scholar] [CrossRef]
Wang, Y.; Zou, R.; Liu, F.; Zhang, L.; Liu, Q. A Review of Wind Speed and Wind Power Forecasting with Deep Neural Networks. Appl. Energy 2021, 304, 117766. [Google Scholar] [CrossRef]
Feng, B.-f.; Xu, Y.-s.; Zhang, T.; Zhang, X. Hydrological Time Series Prediction by Extreme Learning Machine and Sparrow Search Algorithm. Water Supply 2022, 22, 3143–3157. [Google Scholar] [CrossRef]
Chen, Y.; Zhang, S.; Zhang, W.; Peng, J.; Cai, Y. Multifactor Spatio-Temporal Correlation Model Based on a Combination of Convolutional Neural Network and Long Short-Term Memory Neural Network for Wind Speed Forecasting. Energy Convers. Manag. 2019, 185, 783–799. [Google Scholar]
Imani, M.; Ghassemian, H. Residential Load Forecasting Using Wavelet and Collaborative Representation Transforms. Appl. Energy 2019, 253, 113505. [Google Scholar] [CrossRef]
López, E.; Valle, C.; Allende, H.; Gil, E. Long Short-Term Memory Networks Based in Echo State Networks for Wind Speed Forecasting. In Proceedings of the Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 22nd Iberoamerican Congress, CIARP 2017, Valparaíso, Chile, 7–10 November 2017; Proceedings 22. Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
Qu, K.; Si, G.; Shan, Z.; Kong, X.; Yang, X. Short-Term Forecasting for Multiple Wind Farms Based on Transformer Model. Energy Rep. 2022, 8, 483–490. [Google Scholar] [CrossRef]
Nascimento, E.G.S.; de Melo, T.A.; Moreira, D.M. A Transformer-Based Deep Neural Network with Wavelet Transform for Forecasting Wind Speed and Wind Energy. Energy 2023, 278, 127678. [Google Scholar] [CrossRef]
Peesapati, R.; Kumar, N. Electricity Price Forecasting and Classification through Wavelet–Dynamic Weighted Pso–Ffnn Approach. IEEE Syst. J. 2017, 12, 3075–3084. [Google Scholar]
Shen, Z.; Fan, X.; Zhang, L.; Yu, H. Wind Speed Prediction of Unmanned Sailboat Based on Cnn and Lstm Hybrid Neural Network. Ocean Eng. 2022, 254, 111352. [Google Scholar] [CrossRef]
Li, X.; Yang, G.; Gou, J. Pv Power Forecasting in the Hexi Region of Gansu Province Based on Ap Clustering and Lstnet. Int. Trans. Electr. Energy Syst. 2024, 2024, 6667756. [Google Scholar] [CrossRef]
Peng, X.; Yang, Z.; Li, Y.; Wang, B.; Che, J. Short-Term Wind Power Prediction Based on Stacked Denoised Auto-Encoder Deep Learning and Multi-Level Transfer Learning. Wind Energy 2023, 26, 1066–1081. [Google Scholar] [CrossRef]
Cai, J.; Li, Q.; Cheng, Z.; Wang, R. Short-Term Power Load Forecasting Method Based on Hpo-Lstm Model. In Proceedings of the 2023 Panda Forum on Power and Energy (PandaFPE), Chengdu, China, 27–30 April 2023; IEEE: Piscataway, NJ, USA, 2023. [Google Scholar]
Memarzadeh, G.; Keynia, F. A New Short-Term Wind Speed Forecasting Method Based on Fine-Tuned Lstm Neural Network and Optimal Input Sets. Energy Convers. Manag. 2020, 213, 112824. [Google Scholar] [CrossRef]
Zhang, W.; Yan, H.; Xiang, L.; Shao, L. Wind Power Generation Prediction Using Lstm Model Optimized by Sparrow Search Algorithm and Firefly Algorithm. Energy Inform. 2025, 8, 35. [Google Scholar] [CrossRef]
Yang, W.; Zhang, Z.; Meng, K.; Wang, K.; Wang, R. Ceemdan-Rime–Bidirectional Long Short-Term Memory Short-Term Wind Speed Prediction for Wind Farms Incorporating Multi-Head Self-Attention Mechanism. Appl. Sci. 2024, 14, 8337. [Google Scholar] [CrossRef]
Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 19–21 May 2021. [Google Scholar]
Mohammadi Farsani, R.; Pazouki, E. A Transformer Self-Attention Model for Time Series Forecasting. J. Electr. Comput. Eng. Innov. (JECEI) 2020, 9, 1–10. [Google Scholar]
Yan, L.; Wu, S.; Li, S.; Chen, X. Seaformer: Frequency Domain Decomposition Transformer with Signal Enhanced for Long-Term Wind Power Forecasting. Neural Comput. Appl. 2024, 36, 20883–20906. [Google Scholar] [CrossRef]
Xu, H.; Peng, Q.; Wang, Y.; Zhan, Z. Power-Load Forecasting Model Based on Informer and Its Application. Energies 2023, 16, 3086. [Google Scholar] [CrossRef]
Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 22419–22430. [Google Scholar]
Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. Fedformer: Frequency Enhanced Decomposed Transformer for Long-Term Series Forecasting. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; PMLR: Cambridge, MA, USA, 2022. [Google Scholar]
Simonyan, K.; Zisserman, A. Two-Stream Convolutional Networks for Action Recognition in Videos. Adv. Neural Inf. Process. Syst. 2014, 1, 568–576. [Google Scholar]
Gong, Y.; Chung, Y.-A.; Glass, J. Ast: Audio Spectrogram Transformer. arXiv 2021, arXiv:2104.01778. [Google Scholar]
Lu, W.-T.; Wang, J.-C.; Won, M.; Choi, K.; Song, X. Spectnt: A Time-Frequency Transformer for Music Audio. arXiv 2021, arXiv:2110.09127. [Google Scholar]
Patro, B.N.; Namboodiri, V.P.; Agneeswaran, V.S. Spectformer: Frequency and Attention Is What You Need in a Vision Transformer. In Proceedings of the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, AZ, USA, 26 February–6 March 2025; IEEE: Piscataway, NJ, USA, 2025. [Google Scholar]
Abate Mitaw, A.; Tadesse Kassie, A.; Shiferaw Negash, D. Dynamic Programming Strategy in Optimal Controller Design for a Wind Turbine System. Cogent Eng. 2024, 11, 2340212. [Google Scholar] [CrossRef]
Zhang, H.; Hu, Y.; Wang, W. Wind Tunnel Experimental Study on the Aerodynamic Characteristics of Straight-Bladed Vertical Axis Wind Turbine. Int. J. Sustain. Energy 2024, 43, 2305035. [Google Scholar] [CrossRef]
Wang, H.; Li, B.; Xue, Z.; Fan, S.; Liu, X. Powerformer: A Temporal-Based Transformer Model for Wind Power Forecasting. Energy Rep. 2024, 11, 736–744. [Google Scholar]
Tian, Z.; Liu, W.; Jiang, W.; Wu, C. Cnns-Transformer Based Day-Ahead Probabilistic Load Forecasting for Weekends with Limited Data Availability. Energy 2024, 293, 130666. [Google Scholar] [CrossRef]
Woo, G.; Liu, C.; Sahoo, D.; Kumar, A.; Hoi, S. Etsformer: Exponential Smoothing Transformers for Time-Series Forecasting. arXiv 2022, arXiv:2202.01381. [Google Scholar]

Figure 1. Correlation between the factors of the output power of wind turbines.

Figure 2. Overall structure of the Transformer model.

Figure 3. Overall structure of DBFformer network structure.

Figure 4. Experimental results of different models for different rated wind turbines (ΔT = 96).

Figure 5. Visualization of wind power prediction results under different sliding window lengths (

Δ T

= 96, 192, 336, and 720).

Figure 5. Visualization of wind power prediction results under different sliding window lengths (

Δ T

= 96, 192, 336, and 720).

Table 1. Comparison of data from different wind turbines.

Turbine Capacity (MW)	Data Length (Samples)	Atmospheric Pressure Range (hPa)
36 MW	55,850	896–898
66 MW	66,170	889–890
99 MW	65,832	1088–1090
200 MW	62,475	884–885

Table 2. Parameters of wind turbine power prediction experimental equipment.

Hardware	Parameters
CPU model	Intel i5-12500H (Intel Corporation, Santa Clara, CA, USA)
GPU model	GeForce RTX 3060 (NVIDIA Corporation, Santa Clara, CA, USA)
Memory	16 G
Operating system	Windows 11
GPU accelerator	CUDA 12.4
Driver version	551.61

Table 3. Parameters of the experimental model for wind turbine power prediction.

Parameter	Setup
Training Epochs	10
Batch Size	32
Input Sequence Length	96-192-336-720
Start Token Length	48
Prediction Sequence Length	24
Dimension of Model	512
Dropout	0.05
Activation	gelu
Early Stopping Patience	3
Optimizer Learning Rate	0.0001
Loss Function	mse

Table 4. Comparative experimental results of wind turbine power prediction.

Model		DBFformer		Autoformer		Informer		FEDformer		Transformer		LSTNet		LSTM
$Δ T$		MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE
36 MW	96	0.195	0.318	0.314	0.424	0.403	0.514	0.514	0.616	0.628	0.709	1.431	1.595	1.711	1.915
	192	0.216	0.343	0.311	0.459	0.405	0.555	0.511	0.653	0.593	0.731	1.372	1.605	1.674	1.922
	336	0.257	0.372	0.366	0.473	0.453	0.564	0.548	0.658	0.629	0.768	1.443	1.649	1.742	1.955
	720	0.283	0.392	0.374	0.499	0.473	0.605	0.559	0.703	0.658	0.786	1.435	1.693	1.746	1.993
66 MW	96	0.216	0.335	0.337	0.423	0.417	0.537	0.532	0.624	0.635	0.743	1.439	1.638	1.725	1.937
	192	0.243	0.342	0.325	0.458	0.408	0.543	0.515	0.632	0.623	0.723	1.449	1.613	1.731	1.914
	336	0.265	0.364	0.352	0.455	0.469	0.544	0.588	0.661	0.674	0.753	1.493	1.687	1.784	1.971
	720	0.278	0.376	0.361	0.472	0.475	0.577	0.556	0.684	0.671	0.776	1.458	1.689	1.758	1.977
99 MW	96	0.457	0.474	0.522	0.568	0.652	0.683	0.719	0.789	0.837	0.877	1.637	1.769	1.932	2.085
	192	0.514	0.518	0.583	0.601	0.725	0.696	0.801	0.795	0.934	0.885	1.695	1.803	2.012	2.086
	336	0.554	0.502	0.654	0.597	0.737	0.701	0.852	0.823	0.902	0.901	1.751	1.799	2.011	2.097
	720	0.599	0.531	0.692	0.659	0.778	0.728	0.869	0.795	0.953	0.897	1.769	1.825	2.075	2.112
200 MW	96	0.583	0.503	0.678	0.633	0.772	0.692	0.864	0.795	0.999	0.826	1.761	1.757	2.006	2.072
	192	0.581	0.492	0.697	0.588	0.803	0.683	0.928	0.805	1.018	0.869	1.785	1.776	2.174	2.083
	336	0.601	0.506	0.728	0.622	0.785	0.732	0.923	0.819	0.999	0.921	1.772	1.817	2.278	2.123
	720	0.643	0.511	0.738	0.654	0.829	0.726	0.897	0.832	0.983	0.927	1.779	1.852	2.379	2.143

Table 5. Results of ablation experiment on DBFformer network. (“√” indicates that the corresponding module is included in the ablation setting.)

Basic Model	PWC	Fourier–Wavelet	NH-ESA	FA	MSE	MAE	Times (min)
Transformer					1.092	1.126	258
	√				0.823	0.896	214
	√	√			0.575	0.624	181
	√		√		0.616	0.726	235
	√			√	0.634	0.698	223
	√	√	√		0.515	0.613	205
	√	√		√	0.426	0.538	228
	√		√	√	0.325	0.419	327
	√	√	√	√	0.216	0.343	296

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wu, J.; Chang, Z.; Zhang, L.; Chen, M.; Li, S.; Qiu, F. A Dual-Branch Transformer Network with Multi-Scale Attention Mechanism for Microgrid Wind Turbine Power Forecasting. Electronics 2025, 14, 2566. https://doi.org/10.3390/electronics14132566

AMA Style

Wu J, Chang Z, Zhang L, Chen M, Li S, Qiu F. A Dual-Branch Transformer Network with Multi-Scale Attention Mechanism for Microgrid Wind Turbine Power Forecasting. Electronics. 2025; 14(13):2566. https://doi.org/10.3390/electronics14132566

Chicago/Turabian Style

Wu, Jie, Zhengwei Chang, Linghao Zhang, Mingju Chen, Senyuan Li, and Fuhong Qiu. 2025. "A Dual-Branch Transformer Network with Multi-Scale Attention Mechanism for Microgrid Wind Turbine Power Forecasting" Electronics 14, no. 13: 2566. https://doi.org/10.3390/electronics14132566

APA Style

Wu, J., Chang, Z., Zhang, L., Chen, M., Li, S., & Qiu, F. (2025). A Dual-Branch Transformer Network with Multi-Scale Attention Mechanism for Microgrid Wind Turbine Power Forecasting. Electronics, 14(13), 2566. https://doi.org/10.3390/electronics14132566

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Dual-Branch Transformer Network with Multi-Scale Attention Mechanism for Microgrid Wind Turbine Power Forecasting

Abstract

1. Introduction

2. Related Work

2.1. Deep Learning Networks for Wind Turbine Power Forecasting

2.2. Influencing Factors of Wind Turbine Power

3. DBFformer

3.1. Traditional Transformer

3.2. Proposed DBFformer

3.2.1. Dual-Branch Encoder

3.2.2. Multi-Scale Attention Mechanism Decoder

3.2.3. PWC

4. Experimental Analysis

4.1. Data Preprocessing

4.2. Experimental Parameters and Evaluation Metrics

4.3. Experimental Validation Results

4.4. Ablation Experiments

5. Conclusions

6. Future Work

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI