Prediction Model of Offshore Wind Power Based on Multi-Level Attention Mechanism and Multi-Source Data Fusion

Xu, Yuanyuan; Lin, Yixin; Li, Shuhao; Gao, Xiutao

doi:10.3390/electronics14163183

College of Information Science and Engineering, Huaqiao University, Xiamen 361021, China

* Author to whom correspondence should be addressed.

Electronics 2025, 14(16), 3183; https://doi.org/10.3390/electronics14163183

Submission received: 23 June 2025 / Revised: 27 July 2025 / Accepted: 8 August 2025 / Published: 10 August 2025

Abstract

In response to the strong coupling and nonlinear interactions among complex meteorological and marine variables in offshore wind power generation—and given the implicit, topologically intricate nature of multi-source data—this paper introduces a novel multi-source data fusion model that combines a multi-layer attention mechanism (AM) with a bidirectional gated recurrent unit (BiGRU) network. For the spatio-temporal forecasting of offshore wind power, we embed the AM within a deep BiGRU framework to construct a hierarchical attention architecture that jointly learns spatial and temporal dependencies. This architecture dynamically uncovers latent correlations between wind farm outputs and diverse input features, yielding adaptive importance weights across both dimensions. The empirical validation on an offshore wind farm dataset demonstrates that the proposed model achieves superior predictive accuracy and stability compared with benchmark methods.

Keywords:

offshore wind power; prediction; AM; BiGRU; interpretability

1. Introduction

The rapid expansion of the global economy and the concomitant surge in the energy demand have precipitated a dramatic increase in greenhouse gas emissions. In this context, wind energy—characterized by its cleanliness, renewability, and broad geographic availability—has assumed an increasingly prominent role within the worldwide energy portfolio [1]. According to projections by the Global Wind Energy Council, wind power is expected to supply one fifth of the global electricity by 2030 and to grow by an additional two-thirds by 2050 [2]. Offshore wind resources, which exceed those available on land, have driven the migration of new wind farm developments to marine environments. The proximity to major load centers facilitates more efficient power delivery, rendering offshore installations a strategic focus for future expansion [3]. Moreover, offshore farms occupy no terrestrial acreage, exhibit higher mean wind speeds, and operate with zero on-site emissions; their capacity factors are typically 20–40% greater than those of comparable onshore projects [4]. Despite the rapidly growing global offshore wind market, the inherent variability of wind generation continues to challenge grid integration [5]. In particular, the offshore wind farm output is commonly collected at sea and transmitted to coastal distribution networks—areas that often already bear a substantial local load—which exacerbates the impact of generation fluctuations on system stability [6]. Consequently, enhancing the precision of wind power forecasting is vital for maximizing farm utilization and ensuring reliable grid operation [7].

Forecasting methods for offshore wind power can be broadly categorized into physical, statistical, intelligent, and hybrid approaches [8]. Physical models leverage meteorological and physical variables—such as Numerical Weather Prediction (NWP) outputs and the ambient temperature—to simulate the underlying aerodynamics and convert them into power estimates [9]. Although physically grounded, these models often entail substantial computational overhead due to the need to solve complex mathematical formulations [10]. In contrast, statistical approaches—exemplified by autoregressive (AR) models—rely solely on historical power measurements to extrapolate future outputs; while simpler to implement, they frequently exhibit limited predictive accuracy and poor stability under volatile conditions [11,12].

In recent years, artificial intelligence (AI) techniques have been increasingly adopted to capture the nonlinear dependencies inherent in wind power time series. Notable examples include artificial neural networks (ANNs) [13], backpropagation neural networks (BPNNs) [14], and extreme learning machines (ELMs) [15]. Hybrid models, which combine these learning algorithms with signal processing, feature selection, or optimization methods, aim to exploit the complementary strengths of individual techniques and have demonstrated an enhanced accuracy and robustness relative to standalone models [16]. Recognizing that wind power generation is ultimately driven by atmospheric kinetic energy—and is thus influenced not only by turbine parameters but also by meteorological and geographic factors [17,18]—researchers have begun incorporating multimodal data streams into these hybrid frameworks. For instance, Hanifi et al. developed a hybrid scheme integrating wavelet packet decomposition (WPD), long short-term memory networks (LSTMs), and convolutional neural networks (CNNs) to boost the forecasting precision [19,20]. Additionally, Qin et al. proposed a dual-stage attention-based recurrent neural network (DA-RNN) that adaptively extracts relevant driving series through an input attention mechanism and selects key encoder hidden states via a temporal attention mechanism, providing an effective approach for modeling dependencies in multivariate time series [21]. Meng et al. proposed a hybrid EEMD-BA-RGRU-CSO model that integrates Ensemble Empirical Mode Decomposition (EEMD), a bi-attention mechanism (BA), a residual gated recurrent unit (RGRU), and the crisscross optimization algorithm (CSO), demonstrating an excellent performance in multi-step wind power prediction [22].

Existing approaches to offshore wind power forecasting excel at uncovering local feature correlations but often fail to capture broader, global dependencies. They typically employ static weighting schemes for inputs, thereby overlooking how feature–target relationships evolve over time [23]. Addressing this shortcoming requires a modeling framework capable of simultaneously learning long-range spatial and temporal dependencies across multiple data modalities. Recent breakthroughs in deep learning—most notably the attention mechanism (AM)—provide just such a framework [24]. Drawing inspiration from the brain’s selective allocation of cognitive resources, attention dynamically highlights the most informative components of the input, yielding significant gains in accuracy. Consequently, the AM has been successfully applied across computer vision, natural language processing, and time series forecasting domains.

To this end, this paper draws on attention-based multimodal fusion and, building on a deep BiGRU network, constructs a short-term wind power prediction model with multimodal fusion based on a multi-layer attention mechanism.

The novel contributions of this study are as follows:

A short-term wind power prediction model, MAM-BiGRU, is proposed that performs multimodal fusion by combining a multi-layer attention mechanism (MAM) with a bidirectional gated recurrent unit (BiGRU).
A dual spatial attention mechanism (DSAM) module is constructed to realize the effective fusion of complex multidimensional data.
A BiGRU module based on the temporal attention mechanism (TAM) is constructed to capture the significant features of changes in multivariate wind power time series.
Multiple input features, namely the power, wind speed, temperature, wind direction, humidity, and barometric pressure, are considered in the modeling.

The rest of this paper is organized as follows:

In Section 2, we describe the algorithmic basis of the attention mechanism. In Section 3, we describe the proposed multimodal fusion wind power prediction problem and detail our multi-layer attention prediction model. In Section 4, we present a comprehensive case study and discussion. Finally, in Section 5 we summarize the paper.

2. Attention Mechanism Algorithmic Foundations

Multi-step wind speed forecasting—where a historical sequence of wind speed measurements is used to predict a sequence of future values—is a canonical sequence-to-sequence learning task. The attention mechanism, first introduced by Bahdanau et al. at ICLR 2015, enables a model to learn which elements of the input sequence are most relevant when generating each output step.

In the context of multimodal fusion for wind power prediction, a standalone recurrent neural network (RNN) may struggle to discern the relative importance of each input variable. By incorporating an attention module, the model can assign adaptive weights to different features, thereby enhancing the forecasting accuracy. Since the attention mechanism was originally designed for sequence modeling and RNNs excel at capturing temporal dependencies, most implementations couple the attention mechanism directly with RNN architectures [25].

Researchers generally agree that the attention mechanism evolved from the original encoder–decoder model. In that model, the encoder compresses all inputs X into a single fixed-length context vector C, which the decoder then uses to generate the output. This approach forces the model to treat every part of the input equally, since it must distill all information into one uniform representation.

The attention mechanism overcomes this limitation by replacing the lone context vector C with a set of context vectors $\{C_{1}, C_{2}, \dots, C_{T}\}$, each corresponding to a different part of the input sequence. During decoding, the model computes a weight for each $C_{i}$ that reflects its relevance to the current decoding step. In effect, the decoder attends more to certain parts of the input and less to others, producing a weighted combination of the $C_{i}$ vectors rather than relying on a single summary.

With this modification, the encoder–decoder framework becomes what is shown in Figure 1:

To better understand the attention mechanism (AM), it can be considered in isolation from the encoder–decoder framework, as shown in Figure 2:

As illustrated in Figure 2, when the input information is represented as key–value pairs, the entire source can be written as $\{(K_{1}, V_{1}), (K_{2}, V_{2}), \dots, (K_{N}, V_{N})\}$, where each “key” $K_{i}$ governs how much attention that piece of input should receive, and each “value” $V_{i}$ carries the actual content to be aggregated.

The calculation process of the AM can be summarized into three stages, as shown in Figure 3.

Stage 1: Calculate the relevance of the query and the key to get the attention score.

S_{i} = F (Q, K_{i})

(1)

The computational methods include dot product modeling, finding the cosine similarity, and additive modeling, as shown in Formulas (2)–(4).

S_{i} (Q, K_{i}) = Q \cdot K_{i}

(2)

S_{i} (Q, K_{i}) = \frac{Q \cdot K_{i}}{‖Q‖ \cdot ‖K_{i}‖}

(3)

S_{i} (Q, K_{i}) = V^{T} \tanh (W Q + U K_{i})

(4)

where W, U, and V are learnable network parameters.

Stage 2: The scores obtained in Stage 1 are normalized by the softmax function, as shown in Formula (5).

a_{i} = \mathrm{softmax}(S_{i}) = \frac{e^{S_{i}}}{\sum_{j = 1}^{N} e^{S_{j}}}

(5)

where $a_{i}$ is the weight coefficient corresponding to $V_{i}$, and $S_{i}$ is the similarity score computed in Stage 1.

Stage 3: A weighted summation of the values $V_{i}$ with the weights $a_{i}$ yields the attention value, as shown in Formula (6):

\mathrm{Attention}((K, V), Q) = \sum_{i = 1}^{N} a_{i} \cdot V_{i}

(6)
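The three stages can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names and the toy query, keys, and values are assumptions made for the example, and the additive score of Formula (4) is left out because it requires learned parameters.

```python
import math

def dot_score(q, k):
    """Formula (2): dot-product similarity between a query and a key."""
    return sum(qi * ki for qi, ki in zip(q, k))

def cosine_score(q, k):
    """Formula (3): cosine similarity between a query and a key."""
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    norm_k = math.sqrt(sum(ki * ki for ki in k))
    return dot_score(q, k) / (norm_q * norm_k)

def softmax(scores):
    """Formula (5), with the maximum subtracted for numerical stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values, score=dot_score):
    """Stages 1-3: score every key, normalize, then sum the weighted values."""
    scores = [score(query, k) for k in keys]           # Stage 1
    weights = softmax(scores)                          # Stage 2
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values))
               for d in range(dim)]                    # Stage 3, Formula (6)
    return weights, context

# Hypothetical toy data: three key-value pairs and one query.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]
weights, context = attention(query, keys, values)
```

The key most aligned with the query receives the largest weight, so the attention value is pulled toward its associated value.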

3. Short-Term Wind Power Prediction Model with Multi-Layer Attention

3.1. A Description of the Multimodal Fusion Wind Power Prediction Problem

A subset of multimodal measurements from an offshore wind farm is used to illustrate six concurrent time series variables—the power output, wind speed, temperature, humidity, wind direction, and barometric pressure—collected over a specified interval.

The proposed network architecture incorporates a two-stage attention mechanism. In the first stage, a spatial attention module learns pairwise correlations among the input features at each time step, thereby highlighting the most informative sensors or variables. In the second stage, a temporal attention module selectively weights the spatially attended hidden representations across time to capture long-range dependencies. These temporally filtered states are then aggregated into context vectors that jointly encode spatial and temporal relationships. By alternating the spatial and temporal attention, the model effectively learns both inter-variable interactions at each instant and their evolution over extended horizons [26].

We formulate the multimodal fusion prediction problem as follows. Given $n$ ($n > 1$) external time series X (including wind direction, temperature, etc.) and a target series Y (wind power), we denote these sequences and their variables as follows:

x^{(k)} = (x_{1}^{(k)}, x_{2}^{(k)}, \dots, x_{T}^{(k)})^{T} \in R^{T}

(7)

The expression here represents the $k$-th external series over a window of length T, for $k = 1, 2, \dots, 5$. In our multimodal input, we take $k = 1$ to be the wind speed, $k = 2$ the wind direction, $k = 3$ the air pressure, $k = 4$ the temperature, and $k = 5$ the humidity, so that

x = (x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}, x^{(5)})^{T}

(8)

X = (x_{1}, x_{2}, \dots, x_{T})^{T} \in R^{5 \times T}

(9)

collects all five external series over the same window.

Y = (y_{1}, y_{2}, \dots, y_{T})^{T} \in R^{T}

(10)

denotes the observed wind power output over the window.

\hat{Y} = (\hat{y}_{T + 1}, \hat{y}_{T + 2}, \dots, \hat{y}_{T + τ})^{T} \in R^{τ}

(11)

denotes the predicted values of the target series, where $τ$ is the forecasting horizon.

Hence, given the historical external sequence $(x_{1}, x_{2}, \dots, x_{T})$, $x_{t} \in R^{5}$, and the wind power history $(y_{1}, y_{2}, \dots, y_{T})$, $y_{t} \in R$, the $τ$-step-ahead wind power predictions $\hat{Y} = (\hat{y}_{T + 1}, \hat{y}_{T + 2}, \dots, \hat{y}_{T + τ})^{T}$ are modeled by

\hat{y}_{T + 1}, \hat{y}_{T + 2}, \dots, \hat{y}_{T + τ} = F (y_{1}, y_{2}, \dots, y_{T}, x_{1}, x_{2}, \dots, x_{T})

(12)

where F(⋅) is the nonlinear mapping function to be learned.
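To make the formulation concrete, the following sketch builds training samples from aligned series, pairing a window of length T of external features and power with the next τ power values. The helper name `make_windows` and the toy data are assumptions for illustration, and the paper does not state how its windows were actually constructed.

```python
def make_windows(x, y, T, tau):
    """Slice aligned series into (x_window, y_window, y_future) samples.

    x: list of per-step feature vectors (five external variables each)
    y: list of power values, one per step
    T: history length; tau: forecasting horizon
    """
    samples = []
    for start in range(len(y) - T - tau + 1):
        x_window = x[start:start + T]            # x_1 ... x_T
        y_window = y[start:start + T]            # y_1 ... y_T
        y_future = y[start + T:start + T + tau]  # targets y_{T+1} ... y_{T+tau}
        samples.append((x_window, y_window, y_future))
    return samples

# Hypothetical toy series with ten time steps.
n_steps = 10
x = [[float(t)] * 5 for t in range(n_steps)]
y = [float(t) for t in range(n_steps)]
samples = make_windows(x, y, T=4, tau=2)
```

Each sample corresponds to one evaluation of the mapping F in Formula (12): the first two elements are its inputs and the third is the target it should reproduce.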

3.2. Aerodynamic Characteristics and Mechanical Performance Analysis of Inflatable Savonius Wind Turbines

Today, wind turbines are the primary equipment for harnessing wind energy. This article analyzes them from two perspectives: their aerodynamic characteristics and their mechanical performance. Starting from the aerodynamic characteristics, the output power can be written as [27]:

P_{out} = \frac{1}{2} C_{p} (λ) ρ D H V^{3}

(13)

λ = \frac{ω R}{V}

(14)

where $P_{out}$ indicates the output power; $C_{p} (λ)$ represents the power coefficient, with typical peaks ranging from 0.15 to 0.25 at a corresponding $λ$ between 0.8 and 1.2; $ρ$ refers to the density of air; D stands for the rotor diameter; H is the blade height; V represents the incoming wind speed; $λ$ represents the tip-speed ratio (TSR); $ω$ is the angular velocity; and R is the radius.
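As a worked example of Formulas (13) and (14), the sketch below evaluates the tip-speed ratio and the output power for one operating point. All numerical values are hypothetical, and the power coefficient is passed in as a fixed number inside the quoted 0.15 to 0.25 range rather than computed from a $C_{p}(λ)$ curve, which the text does not provide.

```python
def tip_speed_ratio(omega, radius, wind_speed):
    """Formula (14): blade tip speed divided by the incoming wind speed."""
    return omega * radius / wind_speed

def output_power(cp, rho, diameter, height, wind_speed):
    """Formula (13): power in watts for a rotor with swept area D * H."""
    return 0.5 * cp * rho * diameter * height * wind_speed ** 3

# Hypothetical operating point (SI units).
rho = 1.225      # air density, kg/m^3
D, H = 2.0, 2.0  # rotor diameter and blade height, m
V = 8.0          # incoming wind speed, m/s
omega = 8.0      # angular velocity, rad/s
cp = 0.2         # assumed power coefficient at this tip-speed ratio

tsr = tip_speed_ratio(omega, D / 2.0, V)
power = output_power(cp, rho, D, H, V)
```

Because the power scales with the cube of the wind speed, doubling V at a fixed power coefficient multiplies the output by eight.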

The key factors influencing the wind turbine power output are the overlap ratio and the tip-speed ratio. The overlap ratio, defined as the horizontal overlap distance e between the two semi-cylindrical blades divided by the rotor diameter D, governs the startup torque: a larger e/D yields a higher initial torque at low wind speeds and lowers the cut-in speed. However, an excessive overlap also increases airflow leakage, reducing effective pressure during steady operation. Meanwhile, by adjusting the generator’s load—either electrically or mechanically—and maintaining the operation around the optimal tip-speed ratio, the electrical energy production can be maximized.

Regarding the mechanical performance, the equivalent moment of inertia $J$ of the impeller and its rotational kinetic energy can be written as follows:

J = m_{blade} \frac{R^{2}}{2}

(15)

E_{k} = \frac{1}{2} J ω^{2}

(16)

where $m_{blade}$ represents the mass of the impeller, and $E_{k}$ represents the kinetic energy of the rotating machinery. A larger moment of inertia requires greater wind force or more time to reach the operating speed, while a smaller moment of inertia allows the turbine to respond more quickly to sudden wind speed changes (e.g., gusts) by accelerating or decelerating. When wind speeds fluctuate, higher inertia stores more kinetic energy, smoothing out output variations. However, too much inertia compromises the system agility and adds structural weight. Thus, to maximize the power generation efficiency, these factors must be carefully balanced.
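Formulas (15) and (16) can be checked numerically with a short sketch. The mass, radius, and angular velocity below are hypothetical values chosen for easy arithmetic, not measurements from the paper.

```python
def moment_of_inertia(m_blade, radius):
    """Formula (15): equivalent moment of inertia of the impeller."""
    return m_blade * radius ** 2 / 2.0

def kinetic_energy(inertia, omega):
    """Formula (16): kinetic energy stored in the rotating impeller."""
    return 0.5 * inertia * omega ** 2

# Hypothetical impeller: 20 kg, 1 m radius, spinning at 8 rad/s.
J = moment_of_inertia(20.0, 1.0)
E_k = kinetic_energy(J, 8.0)

# A heavier impeller at the same speed stores proportionally more energy.
E_k_heavy = kinetic_energy(moment_of_inertia(40.0, 1.0), 8.0)
```

The comparison between `E_k` and `E_k_heavy` illustrates the trade-off described above: more inertia buffers more energy against gusts but also means more energy must be supplied before the rotor reaches its operating speed.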

3.3. The General Framework of the Model

In this paper, we combine and reconfigure attention mechanisms and propose MAM-BiGRU, a short-term wind power prediction model with multimodal fusion based on a multi-layer attention mechanism and a bidirectional gated recurrent unit (BiGRU). The model extracts the spatial and temporal dependencies between the power series and the external sequences. The overall framework of the MAM-BiGRU prediction model is shown in Figure 4.

The prediction model in this paper contains three stages:

Spatial modeling: We introduce a dual-stage spatial attention mechanism (DSAM) built upon a BiGRU backbone. The first layer of the DSAM captures local feature correlations among auxiliary time series inputs, while the second layer models their global relationships with the wind power output. By learning adaptive weights for each variable, the DSAM explicitly quantifies the contribution of every external measurement to the power generation.

Temporal modeling: A BiGRU-based temporal attention mechanism (TAM) is applied to the spatially attended representations. The TAM selectively emphasizes past hidden states that exhibit strong long-term dependencies and periodic patterns, enabling the network to learn both trend and seasonality effects inherent in multivariate wind power series.

Prediction output: The context vectors produced by the DSAM and TAM are concatenated with recent wind power observations and passed through a final BiGRU layer. This module generates the multi-step power output sequence, leveraging the fused spatio-temporal embeddings to enhance the prediction accuracy over extended horizons.

3.3.1. Hierarchical Structure

To capture the spatial dependencies between auxiliary input sequences and the target power series, we propose a BiGRU-based, two-layer spatial attention mechanism (DSAM). The first sub-module (SAM1) attends exclusively to the external variables, extracting local inter-feature correlations and producing fine-grained attention weights. The second sub-module (SAM2) concatenates the wind power sequence with the SAM1-filtered representations, thereby learning their global spatial relationships and generating aggregated response weights. By stacking these two attention layers, the DSAM ensures a robust, comprehensive, and effective extraction of multivariate spatial features.

(1): First layer of spatial attention mechanism SAM1

This layer of the attention module is used to extract the local spatial correlation between the external sequence data, and its model structure is shown in Figure 5:

Given the $k$-th attribute vector $x^{k}$ of the external sequences, the attention weight $α_{t}^{k}$ can be calculated as follows:

e_{t}^{k} = v_{f}^{T} \mathrm{ReLU} (W_{f} h_{t - 1}^{f} + U_{f} x^{k} + b_{f})

(17)

α_{t}^{k} = \frac{\exp (e_{t}^{k})}{\sum_{j = 1}^{n} \exp (e_{t}^{j})}

(18)

where ReLU is the selected activation function, and $v_{f}, b_{f} \in R^{T}$, $W_{f} \in R^{T \times m}$, and $U_{f} \in R^{T \times T}$ are the parameters to be learned. $h_{t - 1}^{f} \in R^{m}$ is the hidden state of the previous BiGRU unit, and m is the number of hidden units in the BiGRU cell. The attention weight $α_{t}^{k}$ is determined by the historical hidden state and the current input, and it represents the impact of each attribute on the prediction.

Since at each moment in time the individual sequence data have corresponding weights, the output of the first-stage spatial attention SAM1 can be expressed as follows:

{\tilde{x}}_{t} = (α_{t}^{1} x_{t}^{1}, α_{t}^{2} x_{t}^{2}, \dots, α_{t}^{n} x_{t}^{n})^{T}

(19)
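One SAM1 step, Formulas (17) to (19), can be sketched as follows. This is an illustrative sketch under stated assumptions: the parameters are random placeholders rather than trained weights, the previous hidden state is supplied directly instead of coming from a BiGRU, and all function names are hypothetical.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def softmax(scores):
    """Formula (18): normalize scores into weights that sum to one."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sam1_step(series, h_prev, t, W_f, U_f, v_f, b_f):
    """Return the attention weights and reweighted inputs at time index t.

    series: n windows x^k, each a list of T values
    h_prev: previous hidden state of length m
    """
    Wh = matvec(W_f, h_prev)
    scores = []
    for x_k in series:
        Ux = matvec(U_f, x_k)
        hidden = [max(0.0, a + b + c) for a, b, c in zip(Wh, Ux, b_f)]  # ReLU
        scores.append(sum(v * h for v, h in zip(v_f, hidden)))  # Formula (17)
    alpha = softmax(scores)
    x_tilde = [a * x_k[t] for a, x_k in zip(alpha, series)]     # Formula (19)
    return alpha, x_tilde

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: n = 5 external series, window T = 4, hidden size m = 3.
rng = random.Random(0)
n, T, m = 5, 4, 3
series = random_matrix(n, T, rng)
h_prev = [rng.uniform(-0.5, 0.5) for _ in range(m)]
W_f = random_matrix(T, m, rng)
U_f = random_matrix(T, T, rng)
v_f = [rng.uniform(-0.5, 0.5) for _ in range(T)]
b_f = [0.0] * T
alpha, x_tilde = sam1_step(series, h_prev, 2, W_f, U_f, v_f, b_f)
```

The shapes follow the dimensions given for Formula (17): each series is scored from its whole window, while the resulting weight scales only that series' value at the current time step.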

(2): Second-level spatial attention mechanism SAM2

In this stage, the attention module is used to extract the global spatial correlation between the target sequence Y and the external sequence features, and its model structure is shown in Figure 6.

In this module, the target sequence $y$ is spliced with the external sequence features $\tilde{x}_{t}$ at the corresponding time to construct the vector $z = [\tilde{x}; y] \in R^{(n + 1) \times T}$, which serves as the input data for this module. Then, the attention weight for the $k$-th row $z^{k}$ of $z$ is calculated as follows:

s_{t}^{k} = v_{S}^{T} \mathrm{ReLU} (W_{S} h_{t - 1}^{S} + U_{S} z^{k} + b_{S})

(20)

β_{t}^{k} = \frac{\exp (s_{t}^{k})}{\sum_{j = 1}^{n + 1} \exp (s_{t}^{j})}

(21)

where ReLU is the selected activation function, and $v_{S}, b_{S} \in R^{T}$, $W_{S} \in R^{T \times q}$, and $U_{S} \in R^{T \times T}$ are the parameters to be learned. $h_{t - 1}^{S} \in R^{q}$ is the hidden state of the previous BiGRU cell, and q is the number of hidden units in the BiGRU cell.

The output of the second level of the spatial attention SAM2 can be represented as follows:

\tilde{z}_{t} = (β_{t}^{1} z_{t}^{1}, β_{t}^{2} z_{t}^{2}, \dots, β_{t}^{n + 1} z_{t}^{n + 1})^{T}

(22)

where $z_{t}^{k}$ denotes the $k$-th component of the spliced input at time t, with the first n components taken from $\tilde{x}_{t}$ and the last one being the power value $y_{t}$.
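The splice-and-reweight step of SAM2 can be sketched as follows. To keep the example short, the scores of Formula (20) are passed in as given numbers instead of being computed from learned parameters; the function name `sam2_step` and every numerical value are assumptions for illustration.

```python
import math

def softmax(scores):
    """Formula (21): normalize the n + 1 scores into weights summing to one."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sam2_step(x_tilde_t, y_t, scores):
    """Splice the target onto the SAM1 output and reweight every component."""
    z_t = list(x_tilde_t) + [y_t]  # n external features plus the power value
    if len(scores) != len(z_t):
        raise ValueError("one score is required per spliced component")
    beta = softmax(scores)
    z_tilde_t = [b * z for b, z in zip(beta, z_t)]  # Formula (22)
    return beta, z_tilde_t

# Hypothetical SAM1 output for five external features and one power value.
x_tilde_t = [0.2, 0.1, 0.4, 0.05, 0.25]
y_t = 0.8
# Hypothetical scores that favor the power history over each external feature.
scores = [0.0, 0.0, 0.0, 0.0, 0.0, math.log(5.0)]
beta, z_tilde_t = sam2_step(x_tilde_t, y_t, scores)
```

With these scores the power component receives half of the total weight and each external feature one tenth, which shows how the second layer can rebalance the target against the auxiliary inputs.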

3.3.2. Temporal Attention TAM-BiGRU

Within the two-layer DSAM framework, spatial attention modules capture both inter-variable correlations among auxiliary inputs and their relationships with the power series over a fixed time window T, thereby learning some short-range temporal patterns. However, these limited temporal contexts may be insufficient for modeling longer-term dependencies. To remedy this, we introduce a BiGRU-based temporal attention module (TAM-BiGRU), whose architecture is illustrated in Figure 7.

For the $i$-th hidden state of the spatial stage, the temporal attention weight $γ_{t}^{i}$ can be obtained through the attention mechanism:

d_{t}^{i} = v_{d}^{T} \mathrm{ReLU} (W_{d} h_{t - 1}^{o} + U_{d} h_{i}^{s} + b_{d})

(23)

γ_{t}^{i} = \frac{\exp (d_{t}^{i})}{\sum_{j = 1}^{T} \exp (d_{t}^{j})}

(24)

where ReLU is the selected activation function, and $v_{d}, b_{d} \in R^{p}$, $W_{d} \in R^{p \times p}$, and $U_{d} \in R^{p \times q}$ are the parameters to be learned. $h_{t - 1}^{o} \in R^{p}$ is the hidden state of the previous BiGRU cell, and p is the number of hidden units in the BiGRU cell. $h_{i}^{s} \in H^{s}$ denotes the $i$-th hidden state of the second layer of the spatial attention module SAM2.

The context vector $c_{t}$ at moment t is then the weighted summation of all hidden states, and the formula can be expressed as follows:

c_{t} = \sum_{i = 1}^{T} γ_{t}^{i} h_{i}^{s}

(25)

Finally, the context vector $c_{t}$ is concatenated with the hidden state $o_{t}$ of the BiGRU to form the new hidden state, which is fed to a fully connected layer and linearly transformed. The final multi-step prediction result of the model is

\hat{y}_{T + 1}, \hat{y}_{T + 2}, \dots, \hat{y}_{T + τ} = v_{y}^{T} (W_{y} [o_{t}; c_{t}] + b_{y}) + b_{y}^{'}

(26)

where $W_{y} \in R^{p \times 2 p}$ and $b_{y} \in R^{p}$ map $[o_{t}; c_{t}] \in R^{2 p}$ to the size of the decoder hidden state, $v_{y} \in R^{p \times τ}$ is the model weight, and $b_{y}^{'} \in R^{τ}$ is the model bias.
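The temporal attention and the output layer, Formulas (23) to (26), can be sketched together. This sketch makes several assumptions that should be read as such: the context vector is taken as the attention-weighted sum of the SAM2 hidden states, p and q are set equal so that the concatenation has length 2p, the output weight is stored already transposed with one row per forecast step, and all parameters are random placeholders.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_context(h_o_prev, H_s, W_d, U_d, v_d, b_d):
    """Score each spatial hidden state and return the weights and context."""
    Wh = matvec(W_d, h_o_prev)
    scores = []
    for h_i in H_s:
        Uh = matvec(U_d, h_i)
        hidden = [max(0.0, a + b + c) for a, b, c in zip(Wh, Uh, b_d)]  # ReLU
        scores.append(sum(v * h for v, h in zip(v_d, hidden)))  # Formula (23)
    gamma = softmax(scores)                                     # Formula (24)
    c_t = [sum(g * h_i[d] for g, h_i in zip(gamma, H_s))
           for d in range(len(H_s[0]))]                         # Formula (25)
    return gamma, c_t

def output_layer(o_t, c_t, W_y, b_y, V_y, b_out):
    """Formula (26): map the concatenation [o_t; c_t] to tau forecasts."""
    hidden = [a + b for a, b in zip(matvec(W_y, o_t + c_t), b_y)]
    return [a + b for a, b in zip(matvec(V_y, hidden), b_out)]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: window T = 4, hidden sizes p = q = 3, horizon tau = 2.
rng = random.Random(1)
T, p, tau = 4, 3, 2
H_s = random_matrix(T, p, rng)     # stand-in for the SAM2 hidden states
o_t = [rng.uniform(-0.5, 0.5) for _ in range(p)]
gamma, c_t = temporal_context(o_t, H_s, random_matrix(p, p, rng),
                              random_matrix(p, p, rng),
                              [rng.uniform(-0.5, 0.5) for _ in range(p)],
                              [0.0] * p)
y_hat = output_layer(o_t, c_t, random_matrix(p, 2 * p, rng), [0.0] * p,
                     random_matrix(tau, p, rng), [0.0] * tau)
```

Because the weights are a softmax output, the context vector is a convex combination of the hidden states, so each of its components lies between the smallest and largest value of that component across the window.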

4. Experimental Tests and Analysis of Results

The multimodal datasets for this study were drawn from two offshore wind farms on China’s southeastern coast, situated more than 100 km apart to minimize the overlap in regional environmental characteristics. Each dataset comprises six sensor streams—the power output, wind speed, wind direction, atmospheric pressure, ambient temperature, and humidity—recorded at 15 min intervals. The first dataset was used for the model training and validation, while the second, geographically independent dataset was employed to further assess the proposed model’s generalization capability across different offshore wind farm environments.

4.1. Experimental Design

Four baseline models, namely the GRU, BiGRU, TAM-BiGRU, and STAM-BiGRU, were compared with the MAM-BiGRU model proposed in this paper. The experiments adopt a recursive multi-step prediction strategy for the multimodal wind power time series and use a sliding window over the multivariate data to forecast the short-term wind power 1–6 steps ahead.
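The recursive strategy can be illustrated with a small sketch in which each one-step prediction is appended to the window and the oldest value is dropped before the next step. The one-step predictor used here is a deliberately trivial stand-in (the mean of the last two values), not any of the models compared in this section, and only the power series is fed back; how future external variables were handled is not specified in the text.

```python
def recursive_forecast(one_step_model, history, steps):
    """Roll a one-step predictor forward by feeding predictions back in."""
    window = list(history)
    predictions = []
    for _ in range(steps):
        y_next = one_step_model(window)
        predictions.append(y_next)
        window = window[1:] + [y_next]  # slide the window by one step
    return predictions

def mean_of_last_two(window):
    """Hypothetical placeholder for a trained one-step model."""
    return (window[-1] + window[-2]) / 2.0

history = [1.0, 2.0, 3.0, 4.0]
predictions = recursive_forecast(mean_of_last_two, history, steps=3)
```

Since later steps depend on earlier predictions rather than on observations, errors can accumulate with the horizon, which is one reason accuracy is reported separately for each number of steps ahead.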

In this paper, the Bidirectional wrapper layer of the deep learning framework Keras 2.0.2 is used to create the BiGRU neural network layer. The specific setup of the MAM-BiGRU model proposed in this paper is shown in Table 1.

4.2. Experimental Results

To verify the effectiveness of the proposed model, the MAM-BiGRU is compared with the GRU, BiGRU, TAM-BiGRU, and STAM-BiGRU. The experimental results for these five models, including the loss curves over training iterations, the multi-step prediction plots, and the multi-step prediction accuracy, are as follows.

4.2.1. Loss Function Training Iteration Plot

The training loss of each model, measured by the MAE, is plotted against the training iterations in Figure 8.

From Figure 8, it can be seen that the MAM-BiGRU model has a significant advantage in both the convergence speed and convergence accuracy, indicating that the model can learn the data well.

4.2.2. Plots of the Multi-Step-Ahead Prediction Results

Comparison plots of the prediction results for multiple steps ahead (one, three, and five steps ahead) for each model are shown in Figure 9, Figure 10 and Figure 11.

Figures 9–11 show that the predicted curves of the two models, the MAM-BiGRU and STAM-BiGRU, fit well with the actual power curve, while the curves predicted by the BiGRU and TAM-BiGRU deviate somewhat from the original data.

4.2.3. Comparison Table of Prediction Accuracy of Models

Here, this paper evaluates the prediction accuracy of each model. The prediction error evaluation indexes of each model for multiple steps ahead on the first dataset are shown in Table 2 and Table 3, while those on the second dataset are presented in Table 4, which is used to illustrate the generalization ability of the model.

As can be seen from Table 2 and Table 3, the MAM-BiGRU exhibits a good prediction performance across multiple accuracy evaluation metrics for multi-step-ahead prediction. Meanwhile, the indicators in Table 4, which presents the prediction errors on the second geographically independent dataset, also demonstrate that the model has a good generalization ability.
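For reference, the two error measures named in the abbreviation list, MAE and RMSE, can be computed as in the sketch below. Whether the tables report exactly these two indexes is an assumption, and the sample values are hypothetical rather than taken from the paper's results.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error, which penalizes large deviations more."""
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical observed and predicted power values.
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [11.0, 12.0, 13.0, 18.0]
mae_value = mae(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
```

The RMSE is never smaller than the MAE on the same data, and a wide gap between the two points to a few large errors rather than uniformly small ones.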

Analyzing the results of the ablation experiments described above leads to the following observations:

Attention models are generally better: Models incorporating attention (MAM-BiGRU, STAM-BiGRU, TAM-BiGRU) consistently outperform vanilla GRU and BiGRU networks.
Temporal attention is effective but insufficient: While the TAM-BiGRU effectively captures long-term dependencies—yielding a higher accuracy than the GRU/BiGRU—it still neglects spatial interactions among input variables.
Limitations of single-layer spatio-temporal attention: The STAM-BiGRU improves short-term forecasts but degrades markedly over multi-step horizons, as its single spatial attention layer cannot fully model complex inter-variable relationships and is vulnerable to irrelevant feature noise.
Advantages of multi-layer spatio-temporal attention: The proposed MAM-BiGRU, with its stacked spatial and temporal attention modules, more comprehensively extracts multimodal features, preserves critical dependencies between the target and auxiliary sequences, and achieves the highest accuracy and stability in short-term multi-step predictions under complex offshore conditions.

5. Conclusions

In light of the limitations inherent in single-modality forecasting—where conventional models rely solely on power or wind speed series and thus neglect the interdependencies with multidimensional meteorological factors (e.g., temperature, barometric pressure, wind direction)—and given that the naïve aggregation of multimodal inputs can introduce redundancy and overfitting, we propose a novel fusion framework, the MAM-BiGRU. This architecture comprises two key components: (1) a double spatial attention module (DSAM), built on a bidirectional GRU, which learns both intra- and inter-modal spatial correlations to generate adaptive feature weights, and (2) a multi-layer temporal attention mechanism (TAM), also leveraging the BiGRU, which extracts long-term temporal dependencies and cyclic patterns. The experimental evaluation confirms that the MAM-BiGRU delivers superior forecasting accuracy and stability compared to benchmark methods.

Author Contributions

Conceptualization, Y.X.; methodology, Y.X.; software, X.G.; validation, Y.X., Y.L. and X.G.; formal analysis, S.L.; investigation, Y.X.; resources, Y.X.; data curation, X.G.; writing—original draft preparation, Y.L.; writing—review and editing, Y.L. and S.L.; visualization, S.L.; supervision, Y.X.; project administration, Y.X.; funding acquisition, Y.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the External Cooperation Program of Science and Technology Planning of Fujian Province (Grant No. 2022I0015) and supported by the Scientific Research Funds of Huaqiao University.

Data Availability Statement

The research data underlying these findings may be accessed by contacting yyxu@hqu.edu.cn with a reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AM	Attention Mechanism
NWP	Numerical Weather Prediction
AR	Autoregressive
AI	Artificial Intelligence
ANN	Artificial Neural Network
BPNN	Backpropagation Neural Network
ELM	Extreme Learning Machine
WPD	Wavelet Packet Decomposition
LSTM	Long Short-Term Memory Network
CNN	Convolutional Neural Network
ARMA	Autoregressive Moving Average
DWT	Discrete Wavelet Transform
EEMD	Ensemble Empirical Mode Decomposition
BA	Bi-Attention Mechanism
CSO	Crisscross Optimization Algorithm
MAM	Multi-Layer Attention Mechanism
BiGRU	Bidirectional Gated Recurrent Unit
DSAM	Dual Spatial Attention Mechanism
TAM	Temporal Attention Mechanism
RNN	Recurrent Neural Network
GRU	Gated Recurrent Unit
STAM	Spatio-Temporal Attention Mechanism
MAE	Mean Absolute Error
RMSE	Root Mean Square Error
MSTAM	Multi-Layer Spatio-Temporal Attention Mechanism

References

1. Desalegn, B.; Gebeyehu, D.; Tamrat, B.; Tadiwose, T.; Lata, A. Onshore versus offshore wind power trends and recent study practices in modeling of wind turbines’ life-cycle impact assessments. Clean. Eng. Technol. 2023, 17, 100691.
2. Global Wind Energy Council. Global Wind Energy Council Report 2019. 2020. Available online: http://arxiv.org/abs/1704.02971 (accessed on 26 July 2025).
3. Abdel-Aty, A.-H.; Nisar, K.S.; Alharbi, W.R.; Owyed, S.; Alsharif, M.H. Boosting wind turbine performance with advanced smart power prediction: Employing a hybrid AR–MA–LSTM technique. Alex. Eng. J. 2024, 96, 58–71.
4. de Castro, M.; Salvador, S.; Gómez-Gesteira, M.; Costoya, X.; Carvalho, D.; Sanz-Larruga, F.J.; Gimeno, L. Europe, China and the United States: Three different approaches to the development of offshore wind energy. Renew. Sustain. Energy Rev. 2019, 109, 55–70.
5. Lu, S.; Gao, Z.; Xu, Q.; Jiang, C.; Zhang, A.; Wang, X. Class-imbalance privacy-preserving federated learning for decentralized fault diagnosis with biometric authentication. IEEE Trans. Ind. Inform. 2022, 18, 9101–9111.
6. Li, M.; Jiang, X.; Carroll, J.; Negenborn, R.R. A multi-objective maintenance strategy optimization framework for offshore wind farms considering uncertainty. Appl. Energy 2022, 321, 119284.
7. Choi, Y.; Park, S.; Choi, J.; Lee, G.; Lee, M. Evaluating offshore wind power potential in the context of climate change and technological advancement: Insights from Republic of Korea. Renew. Sustain. Energy Rev. 2023, 183, 113497.
8. Hanifi, S.; Liu, X.; Lin, Z.; Lotfian, S. A critical review of wind power forecasting methods—Past, present and future. Energies 2020, 13, 3764.
9. Zhang, W.; He, Y.; Yang, S. A multi-step probability density prediction model based on Gaussian approximation of quantiles for offshore wind power. Renew. Energy 2023, 202, 992–1011.
10. Wu, Z.; Xia, X.; Xiao, L.; Liu, Y. Combined model with secondary decomposition-model selection and sample selection for multi-step wind power forecasting. Appl. Energy 2020, 261, 114345.
11. Poggi, P.; Muselli, M.; Notton, G.; Cristofari, C.; Louche, A. Forecasting and simulating wind speed in Corsica by using an auto-regressive model. Energy Convers. Manag. 2003, 44, 3177–3196.
12. Liu, H.; Tian, H.; Liang, X.; Li, Y. Wind speed forecasting approach using secondary decomposition algorithm and Elman neural networks. Appl. Energy 2015, 157, 183–194.
13. Rahimilarki, R.; Gao, Z.; Zhang, A.; Binns, R.R. Robust neural network fault estimation approach for nonlinear dynamic systems with applications to wind turbine systems. IEEE Trans. Ind. Inform. 2019, 15, 6302–6312.
14. Yan, L.; Hu, P.; Li, C.; Yao, Y.; Xing, L.; Lei, F.; Zhu, N. The performance prediction of ground source heat pump system based on monitoring data and data mining technology. Energy Build. 2016, 127, 1085–1095.
15. Zhao, Y.; Ye, L.; Li, Z.; Song, X.; Lang, Y.; Su, J. A novel bidirectional mechanism based on time series model for wind power forecasting. Appl. Energy 2016, 177, 793–803.
16. Yang, Z.; Wang, J. A hybrid forecasting approach applied in wind speed forecasting based on a data processing strategy and an optimized artificial intelligence algorithm. Energy 2018, 160, 87–100.
17. Lu, P.; Ye, L.; Zhao, Y.; Dai, B.; Pei, M.; Li, Z. Feature extraction of meteorological factors for wind power prediction based on variable weight combined method. Renew. Energy 2021, 179, 1925–1939.
18. Chen, H. Cluster-based ensemble learning for wind power modeling from meteorological wind data. Renew. Sustain. Energy Rev. 2022, 167, 112652.
19. Liu, H.; Yang, L.; Zhang, B.; Zhang, Z. A two-channel deep network based model for improving ultra-short-term prediction of wind power via utilizing multi-source data. Energy 2023, 283, 128510.
20. Hanifi, S.; Zare-Behtash, H.; Cammarano, A.; Lotfian, S. Offshore wind power forecasting based on WPD and optimised deep learning methods. Renew. Energy 2023, 218, 119241.
21. Qin, Y.; Song, D.; Chen, H.; Cheng, W.; Jiang, G.; Cottrell, G. A dual-stage attention-based recurrent neural network for time series prediction. arXiv 2017, arXiv:1704.02971.
22. Meng, A.; Chen, S.; Ou, Z.; Ding, W.; Zhou, H.; Fan, J.; Yin, H. A hybrid deep learning architecture for wind power prediction based on bi-attention mechanism and crisscross optimization. Energy 2022, 238, 121795.
23. Wang, X.; Cai, X.; Li, Z. Ultra-short-term wind power forecasting method based on a cross LOF preprocessing algorithm and an attention mechanism. Power Syst. Prot. Control 2020, 48, 92–99.
24. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. Paper No. 1409.0473.
25. Shi, Y.; Meng, J.; Wang, J. Seq2seq model with RNN attention for abstractive summarization. In Proceedings of the 2019 International Conference on Computational Linguistics & Intelligent Text Processing, Santa Fe, NM, USA, 10–16 April 2019.
26. Yin, R.; Zhang, Y.; Zhou, X.; Wang, L.; Li, Q.; Chen, S. Time series computational prediction of vaccines for Influenza A H3N2 with recurrent neural networks. J. Bioinform. Comput. Biol. 2020, 18, 1023–1039.
27. Lin, J.; Wang, Y.; Yu, H.; Jian, L. Conceptual design of inflatable Savonius wind turbine and performance investigation of varying thickness and arc angle of blade. In Proceedings of the 2023 IEEE 7th Conference on Energy Internet and Energy System Integration (EI2), Hangzhou, China, 15–18 December 2023; pp. 1370–1376.

Figure 1. Extended encoder–decoder model diagram.

Figure 2. Basic principle diagram of attention mechanism.

Figure 3. The process chart of the attention mechanism computation.

Figure 4. Overall framework of MAM-BiGRU prediction model.

Figure 5. The structure of the first layer of the spatial attention mechanism model.

Figure 6. The structure of the second layer of the spatial attention mechanism model.

Figure 7. Structure of temporal attention mechanism model.

Figure 8. Iterative validation error curve of MAE model.

Figure 9. Comparison of one-step prediction results of different models.

Figure 10. Comparison of three-step prediction results of different models.

Figure 11. Comparison of five-step prediction results of different models.

Table 1. Summary of MAM-BiGRU model parameter settings.

Parameter	Parameter Value
Input Dimension	96 × 6
Time Steps	96
BiGRU Layer Length	64
BiGRU Activation Function	ReLU
Attention Dimension	3
Batch Size	64
Learning Rate	0.0001
Epochs	100
Dropout Probability	0.3
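
As an illustration, the settings in Table 1 can be gathered into a single configuration object. The sketch below is a minimal example and not the authors' code: the class name and field names are hypothetical, the reading of the 96 × 6 input dimension as 96 time steps by 6 features is an interpretation of the table, and the doubled output width assumes that the forward and backward GRU states are concatenated.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MamBiGruConfig:
    """Values transcribed from Table 1; all field names are hypothetical."""
    time_steps: int = 96            # "Time Steps"
    num_features: int = 6           # second factor of "Input Dimension" (interpretation)
    bigru_units: int = 64           # "BiGRU Layer Length"
    bigru_activation: str = "relu"  # "BiGRU Activation Function"
    attention_dim: int = 3          # "Attention Dimension"
    batch_size: int = 64
    learning_rate: float = 1e-4
    epochs: int = 100
    dropout_prob: float = 0.3

    @property
    def input_shape(self) -> Tuple[int, int]:
        # Shape of one input sample: (time steps, features).
        return (self.time_steps, self.num_features)

    @property
    def bigru_output_dim(self) -> int:
        # Assumption: forward and backward hidden states are concatenated.
        return 2 * self.bigru_units


config = MamBiGruConfig()
print(config.input_shape)
print(config.bigru_output_dim)
```

Keeping the values in one frozen object makes it harder for a training script and an evaluation script to drift apart on a setting such as the window length.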

Table 2. Prediction errors (MAE) of each model on the first dataset.

Predicted Step Size	GRU	BiGRU	TAM-BiGRU	STAM-BiGRU	MAM-BiGRU
Step 1 (15 min)	0.0258	0.0253	0.0240	0.0237	0.0216
Step 3 (45 min)	0.0316	0.0310	0.0294	0.0290	0.0265
Step 5 (75 min)	0.0568	0.0557	0.0529	0.0521	0.0475
Step 8 (2 h)	0.0791	0.0775	0.0736	0.0725	0.0659
Step 16 (4 h)	0.0834	0.0817	0.0776	0.0765	0.0692
Step 32 (8 h)	0.0905	0.0887	0.0843	0.0830	0.0757
Step 48 (12 h)	0.0920	0.0902	0.0857	0.0844	0.0766
Step 96 (24 h)	0.1121	0.1099	0.1044	0.1023	0.0932
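
For reference, MAE and RMSE can be computed as in the sketch below, which uses their standard definitions. The function names and the sample values are made up for illustration and are not drawn from the wind farm datasets.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: the average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Made-up values for illustration only.
actual = [0.2, 0.5, 0.9]
predicted = [0.1, 0.5, 1.1]
print(f"MAE  = {mae(actual, predicted):.4f}")
print(f"RMSE = {rmse(actual, predicted):.4f}")
```

Because RMSE squares each error before averaging, a few large misses raise it more than they raise MAE.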

Table 3. Prediction errors (RMSE) of each model on the first dataset.

Predicted Step Size	GRU	BiGRU	TAM-BiGRU	STAM-BiGRU	MAM-BiGRU
Step 1 (15 min)	0.0315	0.0309	0.0293	0.0289	0.0264
Step 3 (45 min)	0.0376	0.0368	0.0350	0.0345	0.0315
Step 5 (75 min)	0.0623	0.0611	0.0580	0.0571	0.0520
Step 8 (2 h)	0.0756	0.0741	0.0704	0.0693	0.0629
Step 16 (4 h)	0.0829	0.0812	0.0796	0.0785	0.0697
Step 32 (8 h)	0.0960	0.0941	0.0894	0.0880	0.0803
Step 48 (12 h)	0.1176	0.1152	0.1095	0.1078	0.0979
Step 96 (24 h)	0.1392	0.1364	0.1296	0.1252	0.1168
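
One way to read Table 3 is to express the gap between the GRU baseline and MAM-BiGRU as a relative error reduction at each horizon. The sketch below transcribes those two columns; the helper name and the 15 min step length (taken from the row labels) are the only additions.

```python
# RMSE values transcribed from Table 3 (first dataset), keyed by prediction step.
rmse_gru = {1: 0.0315, 3: 0.0376, 5: 0.0623, 8: 0.0756,
            16: 0.0829, 32: 0.0960, 48: 0.1176, 96: 0.1392}
rmse_mam = {1: 0.0264, 3: 0.0315, 5: 0.0520, 8: 0.0629,
            16: 0.0697, 32: 0.0803, 48: 0.0979, 96: 0.1168}

STEP_MINUTES = 15  # each prediction step spans 15 min, per the row labels


def relative_reduction(baseline, proposed):
    """Fraction by which `proposed` lowers the error relative to `baseline`."""
    return (baseline - proposed) / baseline


reductions = {step: relative_reduction(rmse_gru[step], rmse_mam[step])
              for step in rmse_gru}

for step, value in reductions.items():
    print(f"step {step:>2} ({step * STEP_MINUTES:>4} min): {value:.1%} lower RMSE")
```

For these values the reduction lies between roughly 15.9% and 16.8% at every horizon, so the relative gain over the GRU baseline is nearly constant even as the absolute error grows with the prediction step.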

Table 4. Prediction errors (MAE and RMSE) of each model on the second dataset.

Predicted Step Size	GRU (MAE)	BiGRU (MAE)	MAM-BiGRU (MAE)	GRU (RMSE)	BiGRU (RMSE)	MAM-BiGRU (RMSE)
Step 1 (15 min)	0.0254	0.0248	0.0238	0.0320	0.0307	0.0261
Step 3 (45 min)	0.0313	0.0306	0.0292	0.0373	0.0370	0.0312
Step 5 (75 min)	0.0573	0.0554	0.0534	0.0628	0.0609	0.0518
Step 8 (2 h)	0.0789	0.0780	0.0733	0.0761	0.0739	0.0635
Step 16 (4 h)	0.0839	0.0815	0.0782	0.0826	0.0817	0.0694
Step 32 (8 h)	0.0901	0.0893	0.0848	0.0965	0.0938	0.0807
Step 48 (12 h)	0.0926	0.0899	0.0854	0.1172	0.1158	0.0975
Step 96 (24 h)	0.1118	0.1105	0.1041	0.1398	0.1361	0.1173

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

