Article

Prediction of Blade Root Loads for Wind Turbine Based on RBMO-VMD and TCN-BiLSTM-Attention

1 The School of Electrical Engineering, Xinjiang University, Urumqi 830017, China
2 Engineering Research Centre of Renewable Energy Generation and Grid Control, Ministry of Education, Urumqi 830017, China
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(2), 218; https://doi.org/10.3390/math14020218
Submission received: 18 November 2025 / Revised: 23 December 2025 / Accepted: 29 December 2025 / Published: 6 January 2026
(This article belongs to the Collection Applied Mathematics for Emerging Trends in Mechatronic Systems)

Abstract

To address the challenges associated with wind turbine blade root loads, including nonlinearity, strong coupling effects, high computational complexity, and the limitations of conventional mathematical-physical modeling approaches, this paper proposes a blade root load prediction model that integrates Variational Mode Decomposition (VMD), optimized by the Red-billed Blue Magpie Optimizer (RBMO), with a combined Temporal Convolutional Network (TCN), Bidirectional Long Short-Term Memory (BiLSTM), and Attention mechanism. First, the RBMO algorithm optimizes the VMD parameters, and VMD decomposes the load data into multiple sub-sequences, which are combined with environmental and operational parameters to form the input components of the TCN-BiLSTM-Attention ensemble prediction model. The RBMO algorithm then determines the optimal hyperparameter configuration for the combined model. Finally, the prediction outputs for each component are aggregated and reconstructed to yield the final blade root load prediction. The predictions are compared against actual data and the results of other forecasting models. The results demonstrate the superior predictive performance of the proposed model, which effectively enhances the accuracy of blade root load prediction for wind turbines.

1. Introduction

The global demand for electricity continues to grow, and wind energy, as a clean and renewable source of energy, has been recognized as a viable solution to this growing demand. Wind turbine blades are key components for wind energy utilization, and their performance directly affects the efficiency and reliability of power generation. The harsh service environment and large flexible structure pose severe challenges to the reliability and safety of wind turbines [1]. As the key part of the blade-hub connection, the blade root is subjected to extremely complex and critical load conditions. Precise prediction of the blade root load is crucial for optimizing blade structural design, evaluating fatigue life, and ensuring the safe and stable operation of the entire system.
Traditional forecasting methods encompass machine learning and neural network algorithms. Conventional machine learning approaches include the Naive Bayes algorithm [2], K-Nearest Neighbors algorithm [3], and decision trees [4]. Additionally, many researchers employ traditional neural network methods such as Extreme Learning Machines (ELM) [5] and backpropagation (BP) neural network models [6] for forecasting. While these approaches yield satisfactory prediction results, their performance significantly deteriorates when applied to complex, multi-factor interacting, ultra-long forecasting sequences like blade root loads of wind turbines. Consequently, both prediction accuracy and overall effectiveness become suboptimal.
In recent years, deep learning technology has made rapid progress. Because it possesses strong learning capabilities and data representation abilities, it is well suited to processing nonlinear, high-dimensional, and spatiotemporal sequence data. Currently, Long Short-Term Memory (LSTM) networks [7] and Convolutional Neural Networks (CNN) [8] have been successfully applied to wind turbine load prediction. Among these, LSTM has gained popularity in time series forecasting due to its strong learning capabilities and generalization ability. However, it struggles to comprehensively capture data features when predicting complex data. Kaitong, W. et al. [9] improved upon LSTM by applying a bidirectional long short-term memory (BiLSTM) network to process data simultaneously in both forward and backward directions, demonstrating a significant enhancement in predictive performance. However, single prediction models often have limitations. The predictive accuracy of ensemble models, which integrate multiple algorithms or models, significantly outperforms that of single prediction models [10]. Qian et al. [11] proposed a hybrid model combining CNN with BiLSTM. Experimental data demonstrated the model's superior generalization capability. However, CNN models are relatively constrained in processing sequential data and lack sufficient modeling capacity for long-term dependencies. Laitsos, V. et al. [12] applied a Temporal Convolutional Network (TCN) to achieve improved load forecasting results, addressing CNNs' inability to capture long-term dependencies.
The blade root load sequence of wind turbines is influenced by multiple factors, exhibiting non-stationary and fluctuating characteristics. To enhance the predictive model’s capability in handling complex, non-stationary data, researchers have proposed forecasting methods that integrate signal decomposition techniques with deep learning [13,14]. Zhang Yunqin et al. [15] employed Empirical Mode Decomposition (EMD) to decompose time series into modal components at different time scales, but this approach may suffer from issues such as modal artifacts and over-decomposition. Shen Haibo et al. [16] proposed a combined forecasting model based on CEEMDAN-PCA-BiLSTM-LSTNet. Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) is an enhancement of EMD, but the empirical mode functions extracted by CEEMDAN may be unstable, potentially exhibiting duplicate modes or over-decomposition. Refs. [17,18] propose that VMD can effectively address the shortcomings of EMD and CEEMDAN while offering adaptability and reconstructive capability, achieving satisfactory prediction results. However, it is worth noting that the network parameters of these combined prediction models and the parameters of signal decomposition methods are mostly determined empirically, which cannot guarantee optimal model performance across all scenarios.
To this end, this paper proposes a novel blade root load prediction method for wind turbines. Specifically, it introduces a blade root load prediction method based on RBMO-VMD and TCN-BiLSTM-Attention. First, the RBMO algorithm is employed to determine the parameters of VMD. Subsequently, the data is decomposed into multiple sub-sequences, which are then combined with environmental parameters and operational status parameters to form the input components. After data processing, each component is input into the combined TCN-BiLSTM-Attention model for separate prediction, and RBMO is employed to determine the optimal combination of hyperparameters for each input component network. Finally, the prediction results from each input component are summed and reconstructed to obtain the final blade root load prediction for the wind turbine. This prediction result is compared with actual data and predictions from other models. The results demonstrate that the proposed model exhibits excellent prediction accuracy for blade root loads in wind turbines.

2. Basic Principle

2.1. Blade Element Theory

Blade element theory is a commonly used method in blade aerodynamic load analysis. In this theory, the wind turbine blade is divided into several micro-elements, each of which is called a blade element, as illustrated in Figure 1.
Consider a blade section of length dr at radius r. Under the resultant wind speed w, an aerodynamic force dF acts on the blade element. This force can be decomposed either into a lift dL perpendicular to, and a drag dD aligned with, the relative velocity w, or into a force dF_x along the direction of blade rotation and a force dF_y perpendicular to the plane of rotation. Here C_l is the lift coefficient, C_d is the drag coefficient, I is the inflow angle, i is the angle of attack, and e is the chord length. The force dF_x in the swinging direction and dF_y in the oscillation direction are given as follows:
$$dF_x = \frac{1}{2}\rho w^2 e \left( C_l \sin I - C_d \cos I \right) dr$$
$$dF_y = \frac{1}{2}\rho w^2 e \left( C_l \cos I + C_d \sin I \right) dr$$
The oscillating moment M_x and the waving moment M_y [19,20] at the blade root are given as follows:
$$M_x = \frac{1}{2}\rho \int_{r_0}^{R} w^2 e \left( C_l \sin I - C_d \cos I \right) r \, dr$$
$$M_y = \frac{1}{2}\rho \int_{r_0}^{R} w^2 e \left( C_l \cos I + C_d \sin I \right) r \, dr$$
where r_0—the hub radius; R—the blade length.

2.2. Red-Billed Blue Magpie Optimizer Algorithm

The RBMO intelligent optimization method draws inspiration from a series of predatory behaviors exhibited by the red-billed blue magpie population [21]. Its optimization process comprises the following steps: population initialization, foraging phase, prey attack, and food storage.

2.2.1. Population Initialization

Initially, random positions are generated for each red-billed blue magpie, which can be represented by the following matrix. This matrix captures the coordinates corresponding to the starting locations of the individual magpies within the optimization framework.
$$X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,j} & \cdots & x_{1,dim} \\ x_{2,1} & \cdots & x_{2,j} & \cdots & x_{2,dim} \\ \vdots & & x_{i,j} & & \vdots \\ x_{n,1} & \cdots & x_{n,j} & \cdots & x_{n,dim} \end{bmatrix}$$
where n—population size; dim—number of optimized variables; x_{i,j}—position of the ith red-billed blue magpie in the jth dimension.
$$x_{i,j} = (ub - lb) \times rand_1 + lb$$
where ub—upper limit; lb—lower limit; rand_1—random number from 0 to 1.
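The initialization in Equations (5) and (6) can be sketched in a few lines of Python/NumPy. This is an illustrative sketch only (the function and variable names are ours, not from the paper), using the paper's later bounds for K and α as the example search space:

```python
import numpy as np

def init_population(n, dim, lb, ub, rng=None):
    """Random initial positions x_ij = (ub - lb) * rand + lb, per Eq. (6)."""
    rng = rng or np.random.default_rng(0)
    lb = np.asarray(lb, dtype=float)
    ub = np.asarray(ub, dtype=float)
    return (ub - lb) * rng.random((n, dim)) + lb

# Example: 40 magpies over the two VMD parameters, K in [4, 10], alpha in [100, 3000]
X = init_population(40, 2, lb=[4, 100], ub=[10, 3000])
```

Each row of the resulting matrix is one individual's position, matching the population matrix in Equation (5).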

2.2.2. Foraging Phase

The following formula is used when red-billed blue magpies forage in small groups:
$$X_i(t+1) = X_i(t) + \left( \frac{1}{p} \sum_{m=1}^{p} X_m(t) - X_{rs}(t) \right) \times rand_2$$
where t—current iteration number; X_i(t+1)—position of the ith new search agent; p—number of red-billed blue magpies in a small group during a small-scale search, a random integer from 2 to 5; X_m(t)—the mth randomly selected individual within the subgroup, used to compute the group's average position; X_i(t)—position of the ith individual; X_{rs}(t)—an individual randomly selected from the entire population at each iteration, serving as the reference for the difference term.
The subsequent formula is applied when red-billed blue magpies engage in foraging within groups:
$$X_i(t+1) = X_i(t) + \left( \frac{1}{q} \sum_{m=1}^{q} X_m(t) - X_{rs}(t) \right) \times rand_3$$
where q—number of individuals in the large cluster used for large-scale searches, ranging from 10 to n.

2.2.3. Attacking Prey

In small group foraging activities, the primary targets typically consist of small prey or plant material. This is represented by the mathematical expression in Equation (9). Conversely, when red-billed blue magpies forage collectively in groups, they have the capacity to target larger prey, including sizable insects or small vertebrates. This scenario is illustrated by the mathematical expression in Equation (10).
$$X_i(t+1) = X_{food}(t) + CF \times \left( \frac{1}{p} \sum_{m=1}^{p} X_m(t) - X_i(t) \right) \times randn_1$$
$$X_i(t+1) = X_{food}(t) + CF \times \left( \frac{1}{q} \sum_{m=1}^{q} X_m(t) - X_i(t) \right) \times randn_2$$
where X_{food}—the position of the food (the best solution found so far); randn—a randomly generated number drawn from the standard normal distribution with a mean of 0 and a standard deviation of 1; CF = (1 - t/T)^{2t/T}, where T represents the maximum number of iterations.

2.2.4. Food Storage

Besides searching for and capturing food, red-billed blue magpies also store excess food in tree hollows or other hidden spots for future use. This behavior not only guarantees a consistent food supply during times of scarcity but also retains information pertinent to their foraging tactics, thus helping individuals to identify the best strategies in their resource search. The subsequent equation offers a mathematical representation of this process:
$$X_i(t+1) = \begin{cases} X_i(t), & \text{if } fitness_{old}^{i} \le fitness_{new}^{i} \\ X_i(t+1), & \text{otherwise} \end{cases}$$
where fitness_{old}^{i}—the fitness value of the ith red-billed blue magpie prior to the position update; fitness_{new}^{i}—the fitness value of the ith red-billed blue magpie after the position update.
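Taken together, the four phases can be sketched as a single update step. The snippet below is a simplified illustration under our own assumptions, not the authors' code: minimization is assumed, the choice between foraging and prey attack is randomized, the small-group and large-group variants are collapsed into one branch, and bound handling is omitted. X_food denotes the best-so-far position (the "food").

```python
import numpy as np

def rbmo_step(X, fit, fitness_fn, X_food, t, T, rng):
    """One simplified RBMO iteration: foraging (Eqs. (7)/(8)), prey attack
    (Eqs. (9)/(10)), and greedy food storage (Eq. (11)). Minimization assumed."""
    n, dim = X.shape
    CF = (1 - t / T) ** (2 * t / T)            # decreasing control factor
    X_new = np.empty_like(X)
    for i in range(n):
        p = rng.integers(2, 6)                 # small-group size, 2..5
        group = X[rng.choice(n, size=p, replace=False)]
        if rng.random() < 0.5:                 # foraging: explore around the group mean
            X_new[i] = X[i] + (group.mean(axis=0) - X[rng.integers(n)]) * rng.random()
        else:                                  # prey attack: exploit around the food
            X_new[i] = X_food + CF * (group.mean(axis=0) - X[i]) * rng.standard_normal()
    # food storage: keep whichever position has the better (lower) fitness
    fit_new = np.array([fitness_fn(x) for x in X_new])
    better = fit_new < fit
    X[better], fit[better] = X_new[better], fit_new[better]
    return X, fit
```

Because the storage step is greedy, the best fitness in the population never worsens across iterations.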

2.3. VMD Decomposition Principle

VMD is an adaptive, fully non-recursive variational signal decomposition method that decomposes complex raw data into a series of intrinsic mode functions (IMFs) [22]. The VMD algorithm performs the decomposition by establishing a constrained variational model. First, a Lagrange multiplier and a quadratic penalty term are introduced to transform the constrained problem into an unconstrained one; subsequently, the bandwidth and center frequency of each decomposed component are iteratively updated to seek the optimal solution and complete the decomposition of the original signal. When performing VMD decomposition, attention must be paid to the actual data characteristics and analytical requirements, particularly the selection of the decomposition number K and the penalty coefficient α. Consequently, VMD decomposition often necessitates the integration of intelligent optimization algorithms to determine the optimal values of (K, α).
The blade root loads investigated in this study result from the combined effects of multiple dynamic factors, exhibiting complex characteristics such as non-stationarity, multi-scale behavior, strong coupling, and significant noise interference. It is quite challenging to accurately predict the load directly on the blade root. Therefore, this study attempts to decompose signals using VMD optimized by intelligent optimization algorithms, breaking down the acquired long-term payload data to reduce the impact of interference terms and enhance the stability and predictability of the data sequence.

3. TCN-BiLSTM-Attention Model

3.1. Temporal Convolutional Neural Network

TCNs are capable of capturing long-term temporal dependencies through a stack of convolutional layers, offering advantages over traditional recurrent neural networks, particularly in addressing the issue of gradient vanishing [23]. This methodology has gained widespread acceptance within the realm of time series analysis, owing to its superior modeling capabilities, high computational efficiency, and adaptability in managing a variety of inputs while processing time series data.
A TCN is a neural network model combining dilated causal convolutions (DCC) and residual connections (RC) [24]. TCN ensures strict causality in time series prediction through causal convolutions, preventing leakage of future information [25]; it utilizes dilated convolutions to expand the receptive field without increasing parameters, thereby capturing long-term dependencies; and it introduces residual modules to mitigate gradient issues in deep networks, comprehensively enhancing the accuracy and stability of time series predictions. The configuration is illustrated in Figure 2 and Figure 3.
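The core building block, a dilated causal convolution, can be illustrated with a minimal NumPy sketch. This is illustrative only; a real TCN stacks such layers with increasing dilation, weight normalization, and residual connections:

```python
import numpy as np

def dilated_causal_conv(x, w, dilation=1):
    """1-D dilated causal convolution: y[t] depends only on x[t], x[t-d], ...,
    so no future information leaks into the prediction (left zero-padding)."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
                     for t in range(len(x))])

# The receptive field grows as the dilation doubles per layer (1, 2, 4, ...)
y = dilated_causal_conv(np.arange(8.0), w=[0.5, 0.5], dilation=2)
```

Here y[t] averages x[t] and x[t-2]; early outputs see only zero padding, never future samples.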

3.2. Bidirectional Long and Short-Term Memory Network

BiLSTM is an improvement on LSTM. The LSTM network, a unique variant of temporal recurrent neural networks, plays a crucial role in addressing the issue of long-term dependencies while avoiding gradient vanishing and gradient explosion [26]. LSTM uses a mechanism of 'gates' to control the flow of information through input gates, forget gates, and output gates. The LSTM network structure is shown in Figure 4.
The steps of LSTM model for prediction are as follows:
First, the sigmoid layer of the 'forget gate' assigns a value between 0 and 1 to each neuron in the state vector C_{t-1} from the previous time step, based on the output h_{t-1} from the previous time step and the input x_t from the current time step. A value of '1' denotes that the associated information is 'fully preserved,' whereas a value of '0' indicates that the information is 'entirely disregarded.'
$$f_t = \sigma\left( W_f \cdot [h_{t-1}, x_t] + b_f \right)$$
where σ—sigmoid function; W_f—forget gate weight matrix; b_f—forget gate bias vector.
In the subsequent step, the sigmoid layer of the ‘input gate’ determines which values should be modified, while the tanh layer is employed to produce a new candidate value that is then incorporated into the neuron state.
$$i_t = \sigma\left( W_i \cdot [h_{t-1}, x_t] + b_i \right)$$
$$\tilde{C}_t = \tanh\left( W_c \cdot [h_{t-1}, x_t] + b_c \right)$$
where W_i—input gate weight matrix; W_c—candidate state weight matrix; b_i—input gate bias vector; b_c—candidate state bias vector.
Then update the old neuron state to the new neuron state.
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
Ultimately, a sigmoid layer is used to determine which parts of the neuron state to output; the state is passed through a tanh layer and multiplied by the sigmoid gate to produce the output.
$$o_t = \sigma\left( W_o \cdot [h_{t-1}, x_t] + b_o \right)$$
$$h_t = o_t \odot \tanh(C_t)$$
where W o —output gate weight matrix; b o —output gate bias vector.
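Equations (12)-(17) map directly onto a single-step implementation. The NumPy sketch below is illustrative (the dictionary layout for the gate parameters is our own convention):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step following Eqs. (12)-(17). W maps the concatenated
    [h_{t-1}, x_t] vector; W and b hold the four gate parameter sets."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate, Eq. (12)
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate, Eq. (13)
    C_hat = np.tanh(W["c"] @ z + b["c"])    # candidate state, Eq. (14)
    C_t = f_t * C_prev + i_t * C_hat        # state update, Eq. (15)
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate, Eq. (16)
    h_t = o_t * np.tanh(C_t)                # hidden output, Eq. (17)
    return h_t, C_t
```

A BiLSTM simply runs two such recurrences, one over the sequence forward in time and one backward, and concatenates their hidden states.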
The BiLSTM architecture consists of two LSTM units, namely the forward LSTM and the backward LSTM, which are interconnected to enable the model to handle information from both the past and the future. This dual processing capability allows for more effective capture of temporal information in time series data [27]. The configuration of the BiLSTM network is illustrated in Figure 5.

3.3. Attention Mechanism

Traditional neural networks exhibit notable limitations when processing input sequences of user features, often encountering issues such as information confusion, information loss, and restricted learning capacity. The incorporation of the attention mechanism effectively addresses these challenges. This mechanism allows for the weighting and summarization of inputs, enabling the model to more accurately capture important information pertinent to user features within the sequence. Consequently, it enhances the focus on these critical elements, thereby improving overall performance.
The fundamental concept of the attention mechanism is to assign varying weights to the hidden layer states within the neural network, effectively distributing attention across different pieces of input information. The mathematical model for the allocation of weights by the attention mechanism is shown in Equations (18) and (19).
$$e_t = \tanh(w_a h_t + p_c)$$
$$a_t = \frac{\exp(e_t)}{\sum_{g=1}^{t} \exp(e_g)}$$
where h_t—input feature sequence at time t; a_t—weight assigned to the features by the attention module; w_a—weight matrix; p_c—bias term; e_t—unnormalized attention score.
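Equations (18) and (19) amount to scoring each hidden state and softmax-normalizing the scores. A minimal NumPy sketch (shapes and names are illustrative, not the authors' implementation):

```python
import numpy as np

def attention_weights(H, w_a, p_c):
    """Score each hidden state e_t = tanh(w_a . h_t + p_c) (Eq. (18)) and
    normalize the scores with a softmax to obtain the weights a_t (Eq. (19))."""
    e = np.tanh(H @ w_a + p_c)        # one score per time step
    a = np.exp(e) / np.exp(e).sum()   # softmax normalization
    return a

H = np.array([[0.1, 0.3], [0.9, -0.2], [0.4, 0.4]])   # 3 time steps, 2 features
a = attention_weights(H, w_a=np.array([1.0, 0.5]), p_c=0.0)
```

The resulting weights are positive and sum to one, so the attended output is a convex combination of the hidden states.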

3.4. TCN-BiLSTM-Attention Model Construction

The TCN-BiLSTM-Attention model consists of an input layer, a TCN layer, a Dropout layer, a BiLSTM layer, an Attention layer, and a Flatten layer. Specifically, the TCN layer primarily extracts features inherent to individual time series, while the BiLSTM layer further uncovers long-term and short-term dependencies among these features. Then, the Flatten layer with Attention mechanism and the fully connected layer capture the correlations between sequences, filtering out irrelevant information that may cause interference, thereby producing better prediction sequences. The TCN-BiLSTM layer can extract features and dependency relationships for each sequence, thereby accelerating processing speed. After processing through the TCN-BiLSTM layer, the feature dimension is significantly reduced, facilitating the Attention mechanism to select useful features for prediction. The TCN-BiLSTM-Attention model architecture is shown in Figure 6.
In Figure 6, z t , z t + 1 , z t + 2 , and z t + 3 represent the input data for TCN at time t , t + 1 , t + 2 , and t + 3 , respectively; while y t , y t + 1 , y t + 2 , y t + 3 represent the input data for the Attention mechanism at time steps t , t + 1 , t + 2 , and t + 3 , respectively. Furthermore, a t , a t + 1 , a t + 2 , a t + 3 denote the output weights for the Attention mechanism at time steps t , t + 1 , t + 2 , and t + 3 , respectively.

4. VMD-TCN-BiLSTM-Attention Prediction Model Optimized by RBMO Algorithm

4.1. VMD Optimized Based on RBMO Algorithm

In the process of VMD decomposition, the selection of the penalty coefficient α and the number of decomposition layers K will directly affect the decomposition effect of the VMD method. Setting the value of K too low can result in insufficient decomposition and modal aliasing, while opting for a value that is too high may produce spurious components. The penalty coefficient α governs the bandwidth of the IMF component. Too large or too small a value may cause the IMF component to contain other component signals or signal loss. Therefore, the values of K and α need to be chosen reasonably.
Entropy is an index describing the sparsity of a signal [28]. The Sample Entropy (SE) value is lower when the IMF component derived from VMD decomposition is more regular and less complex; conversely, a higher SE value indicates a more irregular and complex IMF component. Therefore, sample entropy is chosen as the fitness function for VMD parameter optimization. Iteration employs an early termination strategy: if the fitness value shows no significant decrease for 10 consecutive iterations, or if the maximum iteration count is reached, the iteration process terminates. The specific process is as follows:
(1)
Initialization: Initialize the population using Equations (5) and (6). Set population size n = 40 and dimension dim = 2, where j = 1 and j = 2 represent K and α, respectively. Set the maximum iteration count T = 60, with K in the range [4, 10] and α in the range [100, 3000];
(2)
Calculate the sample entropy corresponding to each individual;
(3)
During the foraging phase, each individual adjusts its position based on the difference between the average position of the reference group and the position of a randomly selected individual within the population. That is, the search area is updated according to Equations (7) and (8).
(4)
During the prey attack phase, the swarm detects small prey (local optima) and conducts a fine-grained search within a small radius centered on the prey. Upon discovering large prey (potentially a global optimum), the swarm concentrates its efforts on an intensive search around the prey. Specifically, the values of K and α are updated according to Equations (9) and (10).
(5)
During the food storage phase, retain the parameter combination with the superior fitness value according to Equation (11);
(6)
If the iteration termination condition is not met, continue iterating; otherwise, output the optimal solutions for K and α.
The specific flow of the RBMO algorithm to determine the combination of VMD parameters is shown in Figure 7.
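The fitness function used above, sample entropy, can be computed directly from its definition SampEn = -ln(A/B), where B counts matching template pairs of length m within tolerance r and A counts pairs of length m + 1. The sketch below is a plain (unoptimized) NumPy implementation with the common tolerance r = 0.2σ; it is illustrative rather than the authors' exact code:

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy SampEn(m, r) = -ln(A / B); lower values indicate a
    more regular, more predictable series."""
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()

    def count(mm):
        templates = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        # Chebyshev distance between every pair of templates
        d = np.abs(templates[:, None, :] - templates[None, :, :]).max(axis=2)
        return (d <= r).sum() - len(templates)   # exclude self-matches

    B, A = count(m), count(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf
```

A smooth sinusoid yields a much lower SE than white noise, which is exactly the property the RBMO fitness exploits when judging IMF regularity.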

4.2. RBMO Optimization of Hyperparameters for the TCN-BiLSTM-Attention Model

In the study of blade root load prediction for wind turbines based on the TCN-BiLSTM-Attention model, hyperparameters such as the number of BiLSTM hidden layers, TCN layers, and learning rate directly influence the model architecture, convergence, and prediction accuracy. However, traditional methods like manual tuning are inefficient and resource-intensive, often becoming stuck in suboptimal local minima, making it difficult to determine suitable hyperparameter combinations. To this end, this study employs the RBMO algorithm, which possesses robust global search capabilities and adaptive local exploration, making it highly suitable for high-dimensional hyperparameter optimization. The RBMO algorithm can fully explore the implicit relationships between the model network’s hyperparameters and the dataset to obtain suitable hyperparameters, thereby improving prediction accuracy. Iteration also employs early stopping. If the fitness value shows no significant decrease for 10 consecutive iterations, or if the maximum iteration count is reached, the iteration process terminates. Its specific steps are as follows:
(1)
Initialize the population using Equations (5) and (6), with a population size n = 60 and maximum iteration count T = 100. Set the ranges for each hyperparameter; detailed values are shown in Table 1;
(2)
Design Fitness Function;
(3)
During the foraging phase, exploration is conducted using population-level social information. The position of each individual within the search population is updated according to Equations (7) and (8), thereby updating the search area;
(4)
During the prey attack phase, based on Equations (9) and (10), perform fine-grained development around the current optimal solution and update the model hyperparameters;
(5)
Model training: Construct corresponding TCN-BiLSTM-Attention models based on hyperparameters for each group, evaluate them using the validation set, and recalculate individual fitness values;
(6)
During the food storage phase, according to Equation (11), the fitness value of the new position is compared with that of the previous update, and the optimal information is retained;
(7)
If the iteration termination condition is not met, continue iterating; otherwise, output the optimal solution for the hyperparameters.
The flowchart of the RBMO algorithm optimizing the hyperparameters of the TCN-BiLSTM-Attention model is shown in Figure 8.

4.3. Constructing Prediction Models

The total flow of constructing the combined VMD-TCN-BiLSTM-Attention prediction model optimized based on the RBMO algorithm is shown in Figure 9, and the detailed steps are as follows:
(1)
Data preprocessing (including data cleaning and data normalization);
(2)
Determination of K and α values for VMD by RBMO algorithm;
(3)
Employ the VMD algorithm to decompose non-stationary sequences into multiple stationary sequences;
(4)
Combine each sub-sequence with environmental parameters and operational state parameters (wind speed, rotor speed, pitch angle, etc.) to form input components;
(5)
Divide the data into training, validation, and test sets;
(6)
Determine the optimal hyperparameter combinations for each component of the TCN-BiLSTM-Attention prediction model using the RBMO algorithm (including training the model on the training set and evaluating performance on the validation set);
(7)
Substitute each component into the TCN-BiLSTM-Attention model for prediction;
(8)
Sum and reconstruct the predicted values for each component, then renormalize the data to obtain the final predicted blade root load for the wind turbine.

4.4. Evaluation Indicators for the Model

The mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R2) are selected as the evaluation indexes of the model. For the same model, smaller MAE, MSE, RMSE, and MAPE values indicate better prediction performance; R2 ranges over [0, 1], and the closer its value is to 1, the closer the fitted curve is to the real values. The five evaluation indexes are calculated as follows:
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left| \hat{y}_i - y_i \right|$$
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left( \hat{y}_i - y_i \right)^2$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left( \hat{y}_i - y_i \right)^2}$$
$$MAPE = \frac{1}{n}\sum_{i=1}^{n}\left| \frac{\hat{y}_i - y_i}{y_i} \right| \times 100\%$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left( \hat{y}_i - y_i \right)^2}{\sum_{i=1}^{n}\left( \bar{y} - y_i \right)^2}$$
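The five indexes are straightforward to compute; a small helper, assuming NumPy (helper name is ours), might look like:

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute MAE, MSE, RMSE, MAPE (%), and R2 for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.abs(err).mean()
    mse = (err ** 2).mean()
    rmse = np.sqrt(mse)
    mape = np.abs(err / y_true).mean() * 100     # assumes no zero targets
    r2 = 1 - (err ** 2).sum() / ((y_true - y_true.mean()) ** 2).sum()
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}
```

Note that MAPE is undefined when a target value is zero, which is not an issue for blade root load magnitudes.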

4.5. Model Parameter Settings

The hardware configuration of the experimental platform used in this study is as follows: an Intel Core i5-11320H processor running at 3.20 GHz, 16 GB of RAM, and an NVIDIA GeForce GTX 1650 graphics card. The models employed in the experiments and their key parameter settings are detailed in Table 2.

5. Method Validation

5.1. Data Selection

Actual wind power data from a wind farm in the Xinjiang Uygur Autonomous Region, China, are selected for validation. The blade root load data were collected from an 8.2 MW wind turbine operated by a company in the region. The sampled wind speed ranged from 12 to 22 m/s under turbulent conditions, with uniform pitch control applied during operation. The loads were measured with fiber-optic grating sensors. Data collection was conducted from 18 August to 6 September 2024, spanning 20 days, with one sample every 10 min, yielding a total of 2880 data points. After data cleaning, 2500 valid data points were retained and divided into training, validation, and test sets in a 6:2:2 ratio.
Figure 10 shows wind speed data for a certain period.
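Because this is time series data, the 6:2:2 split mentioned above should preserve chronological order rather than shuffle samples; a minimal sketch (function name is ours):

```python
def split_622(data):
    """Chronological 6:2:2 split into training, validation, and test sets,
    keeping time order so the test set lies strictly after the training data."""
    n = len(data)
    n_train, n_val = int(n * 0.6), int(n * 0.2)
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]

train, val, test = split_622(list(range(2500)))   # 1500 / 500 / 500 points
```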

5.2. Data Cleansing

During normal power generation, the load on wind turbines is significantly affected by surrounding obstacles (other wind turbines). Based on whether the incoming flow is affected by obstacles, it can be classified as free-flow or non-free-flow (affected by wake). This paper predicts the load on wind turbines under free-flow conditions during normal power generation. Based on data validity, data collected under free-flow conditions during normal power generation undergoes cleaning. The data to be cleaned includes: (1) data from abnormal power generation states, (2) data affected by wake turbulence, (3) data with incomplete information, (4) values exceeding physical limits, and (5) anomalous data.
After data cleaning, the statistical results are shown in Table 3.

5.3. Data Normalization

To enhance the training speed and prediction accuracy of the model, the raw data is first normalized using the min-max method, as shown in the following equation:
$$X' = \frac{X - \min(X)}{\max(X) - \min(X)}$$
where X represents the original data; min(X) and max(X) denote the minimum and maximum values in the sample data, respectively; and X' represents the normalized data.
When comparing the test set data with the original data, since the input data has undergone normalization, it is necessary to perform denormalization on the normalized data, as shown in the following equation:
$$X = X' \left( \max(X) - \min(X) \right) + \min(X)$$
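The normalization and denormalization pair can be sketched as follows (helper names are illustrative); the key point is that the min and max must come from the training data and be reused for the inverse transform:

```python
import numpy as np

def minmax_fit(x):
    """Record the scaling constants from the sample data."""
    return float(np.min(x)), float(np.max(x))

def normalize(x, lo, hi):
    """Min-max scaling to [0, 1]."""
    return (x - lo) / (hi - lo)

def denormalize(xn, lo, hi):
    """Inverse transform back to physical units."""
    return xn * (hi - lo) + lo
```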

5.4. Sliding Window Design

To convert the root-of-blade load sequence of a wind turbine into supervised data required for prediction, this paper employs a sliding window method to construct root-of-blade load samples, thereby determining the input sequence and predicted values. Figure 11 illustrates the process of constructing supervised learning training samples based on the sliding window approach. As shown, the load data moves along the time axis with a configured sliding step size, constructing input features and output labels to form the supervised sample data needed for prediction. The window length is 10 time steps, the prediction step size is 1 time step, and the sliding step size is 1 time step.
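The sliding-window construction described above can be sketched as follows (parameter names are ours; window = 10, horizon = 1, and step = 1 match the settings in the text):

```python
import numpy as np

def make_samples(series, window=10, horizon=1, step=1):
    """Build supervised (input, label) pairs by sliding a window along the series:
    each input is `window` consecutive points, each label the point `horizon`
    steps after the window ends."""
    X, y = [], []
    for start in range(0, len(series) - window - horizon + 1, step):
        X.append(series[start:start + window])
        y.append(series[start + window + horizon - 1])
    return np.array(X), np.array(y)
```

For a series of 15 points with the settings above, this yields 5 samples of shape (10,) whose labels are points 10 through 14.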

5.5. Data Pre-Processing

The RBMO algorithm is used to optimize the VMD parameters for the original blade root load data of the wind turbine, with the Grey Wolf Optimizer (GWO), Whale Optimization Algorithm (WOA), and Particle Swarm Optimization (PSO) used for comparison with RBMO; the iteration results are shown in Figure 12. The experimental results are presented in Table 4.
As shown in Figure 12, GWO and WOA settle on a solution early in the optimization and fall into local optima, resulting in slow convergence and poor accuracy; PSO, although converging faster, suffers from the same problems of local optima and poor accuracy. Compared with these three intelligent optimization algorithms, RBMO is more advantageous: it finds a suitable solution early in the iteration and has the fastest convergence rate, converging to the optimal solution after four iterations. RBMO is therefore adopted as the optimization algorithm.
From Table 4, the SE value is smallest when K = 6, so K = 6 is selected as the number of decomposition modes. The VMD parameter combination obtained by RBMO optimization is [K, α] = [6, 2668]. With these optimal parameters supplied to the VMD, the waveform of the original blade root load data processed by the VMD algorithm is shown in Figure 13 (using the My data as an example).
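The SE criterion used to select K is assumed here to be a sample entropy measure of the decomposed modes; the following is a simplified pure-Python sketch of sample entropy (our own minimal variant, not necessarily the authors' exact fitness function):

```python
import math

def sample_entropy(x, m=2, r=0.2):
    """Simplified sample entropy SampEn(m, r*std): lower values
    indicate a more regular, more predictable signal."""
    n = len(x)
    mu = sum(x) / n
    tol = r * (sum((v - mu) ** 2 for v in x) / n) ** 0.5  # tolerance r * std

    def matches(dim):
        # Count template pairs whose Chebyshev distance is within tol.
        tpl = [x[i:i + dim] for i in range(n - dim + 1)]
        return sum(
            1
            for i in range(len(tpl))
            for j in range(i + 1, len(tpl))
            if max(abs(a - b) for a, b in zip(tpl[i], tpl[j])) <= tol
        )

    b, a = matches(m), matches(m + 1)
    return math.log(b / a) if a and b else float("inf")

# A strictly periodic signal should score low (highly regular).
se_periodic = sample_entropy([0.0, 1.0] * 15)
```

In a K-selection loop, one would decompose the signal for each candidate K, compute the mean SE over the resulting modes, and keep the K with the smallest value, as Table 4 does.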
As seen in Figure 13a, the data are decomposed into six Intrinsic Mode Functions (IMFs), IMF1 through IMF6, ordered from the low-frequency to the high-frequency signal. The high-frequency components IMF4 to IMF6 are first removed, and the remaining low- and mid-frequency components are reconstructed. The resulting curve exhibits a progressively smoother profile than the original curve, demonstrating that the VMD algorithm preserves the essential eigenvalues and significant information of the original dataset while successfully eliminating high-frequency noise components. As seen in Figure 13b, no modal aliasing is observed in the spectrograms of the IMF components, which ensures accurate separation of the high, medium, and low frequencies and also verifies the feasibility of applying the TCN-BiLSTM-Attention model with different parameters to each component.
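The denoising step described above (discarding the high-frequency IMFs and summing the rest) can be sketched with NumPy; `imfs` below is a toy stand-in for the K × N array a VMD implementation would return, not the paper's data:

```python
import numpy as np

# Toy stand-in for the K x N matrix of IMFs produced by VMD
# (rows ordered low frequency -> high frequency, as in Figure 13a).
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
imfs = np.stack([
    np.sin(2 * np.pi * 1 * t),         # IMF1: low frequency
    0.5 * np.sin(2 * np.pi * 5 * t),   # IMF2: mid frequency
    0.3 * np.sin(2 * np.pi * 12 * t),  # IMF3: mid frequency
    0.1 * rng.standard_normal(200),    # IMF4-IMF6: high-frequency noise
    0.1 * rng.standard_normal(200),
    0.1 * rng.standard_normal(200),
])

# Keep IMF1-IMF3, discard the noisy high-frequency IMF4-IMF6,
# then reconstruct the denoised load signal by summation.
denoised = imfs[:3].sum(axis=0)
```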

5.6. Experimental Results and Analysis

5.6.1. Analysis of Prediction Results After Adding VMD

In the presence of extreme wind conditions, such as turbulent winds, considerable noise pollution data is produced, resulting in suboptimal evaluation metrics for the TCN-BiLSTM-Attention prediction model. To address this issue, the VMD algorithm is employed to decompose the data into smooth, stable signals, thereby mitigating data noise and enhancing prediction accuracy. To validate the effectiveness of VMD in data decomposition tasks, a VMD-TCN-BiLSTM-Attention model was constructed. Its predictive performance was then compared with the TCN-BiLSTM-Attention, EMD-TCN-BiLSTM-Attention, and CEEMDAN-TCN-BiLSTM-Attention models. The comparison of prediction results is shown in Figure 14.
As shown in Figure 14, incorporating different signal decomposition techniques further enhances the prediction accuracy of the TCN-BiLSTM-Attention model, indicating that decomposing complex non-stationary data into stationary data at different frequencies can improve prediction precision. Moreover, the prediction curve of the VMD-TCN-BiLSTM-Attention model most closely approximates the actual load curve. The statistical values of R2, MAE, MSE, RMSE, and MAPE for each model are shown in Figure 15 and Table 5.
As shown in Table 5, compared to the TCN-BiLSTM-Attention, EMD-TCN-BiLSTM-Attention, and CEEMDAN-TCN-BiLSTM-Attention models, the VMD-TCN-BiLSTM-Attention model achieved MAE reductions of 11.7%, 3.3%, and 8.8%, respectively; MSE decreased by 26.2%, 10.4%, and 8.2%; RMSE decreased by 14.1%, 10.5%, and 6.7%; MAPE decreased by 9.6%, 0.3%, and 7%; and the R2 value moved closer to 1. These results demonstrate that signal decomposition techniques effectively address the poor forecasting performance caused by non-stationary data, and that under identical forecasting models, the VMD algorithm yields superior prediction accuracy compared to the EMD and CEEMDAN algorithms.
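The evaluation metrics used throughout (MAE, MSE, RMSE, MAPE, R2) can be computed as below; this is a generic sketch of the standard definitions, not the authors' code:

```python
import math

def metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, MAPE (%) and R2 for paired sequences."""
    n = len(y_true)
    err = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in err) / n
    mse = sum(e * e for e in err) / n
    rmse = math.sqrt(mse)
    # MAPE assumes no zero targets in y_true.
    mape = 100 * sum(abs(e / t) for e, t in zip(err, y_true)) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - sum(e * e for e in err) / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}

perfect = metrics([1.0, 2.0, 4.0], [1.0, 2.0, 4.0])  # ideal fit: errors 0, R2 = 1
```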

5.6.2. Analysis of Prediction Results Incorporating Different Intelligent Optimization Algorithms

After incorporating the VMD algorithm, RBMO was selected as the intelligent optimization algorithm to further enhance the model's prediction accuracy. The RBMO algorithm was employed to optimize the parameters of VMD and the hyperparameters of the TCN-BiLSTM-Attention model, respectively, to achieve optimal results. A comparison of prediction results using the VMD-TCN-BiLSTM-Attention, PSO-VMD-TCN-BiLSTM-Attention, GWO-VMD-TCN-BiLSTM-Attention, WOA-VMD-TCN-BiLSTM-Attention, and RBMO-VMD-TCN-BiLSTM-Attention models is shown in Figure 16.
As shown in Figure 16, compared with the other four models, the proposed RBMO-optimized VMD-TCN-BiLSTM-Attention model delivers the best prediction performance, with its prediction curve most closely matching the actual load curve. The statistical results for R2, MAE, MSE, RMSE, and MAPE values across all models are presented in Figure 17 and Table 6.
As shown in Table 6, the RBMO-VMD-TCN-BiLSTM-Attention model achieved a significant reduction in MAE compared to the VMD-TCN-BiLSTM-Attention, PSO-VMD-TCN-BiLSTM-Attention, GWO-VMD-TCN-BiLSTM-Attention, and WOA-VMD-TCN-BiLSTM-Attention models. Specifically, the MAE values decreased by 51.7%, 29.6%, 25.2%, and 30.6%, respectively; the MSE values decreased by 55%, 34.3%, 37.2%, and 44.2%, respectively; RMSE decreased by 52.6%, 19.8%, 24.2%, and 17.6%, respectively; MAPE% decreased by 32%, 19.6%, 20.8%, and 27.1%, respectively, with R2 values approaching 1. The results indicate that employing RBMO enables more accurate determination of VMD parameters and combined model hyperparameters, thereby enhancing prediction accuracy.

5.6.3. Comparison with Other Models

To validate the reliability, stability, and accuracy of the RBMO-VMD-TCN-BiLSTM-Attention model for predicting blade root loads in wind turbines, a comparative analysis was conducted using the ELM, BiGRU, BiLSTM, Transformer, and Informer models. The prediction results are shown in Figure 18.
As shown in Figure 18, the combined RBMO-VMD-TCN-BiLSTM-Attention model demonstrates the best performance in tracking the blade root load trend of wind turbines, with details that more closely align with actual values. The statistical values of R2, MAE, MSE, RMSE, and MAPE for each model are shown in Figure 19 and Table 7.
As shown in Table 7, the RBMO-VMD-TCN-BiLSTM-Attention model achieved MAE reductions of 81.1%, 76.5%, 71%, 67.8%, and 67.7% compared to ELM, BiGRU, BiLSTM, Transformer, and Informer, respectively. MSE values decreased by 81.6%, 79.4%, 78.1%, 74.1%, and 74.7%; RMSE values decreased by 80.1%, 73.4%, 68.7%, 63.7%, and 63.5%; MAPE decreased by 81.1%, 73.8%, 68.9%, 58.4%, and 57.2%; and the R2 value approached 1 most closely. The RBMO-VMD-TCN-BiLSTM-Attention model demonstrated optimal performance across all metrics, with the combined prediction model showing clear advantages over the individual models.

5.6.4. Ablation Experiment

To validate the effectiveness of the proposed ensemble model and assess the contribution of each component, the proposed model RBMO-VMD-TCN-BiLSTM-Attention was compared with VMD-TCN-BiLSTM-Attention, TCN-BiLSTM-Attention, TCN-BiLSTM, and BiLSTM models. The prediction results are shown in Figure 20.
As shown in Figure 20, all five models can generally fit the overall trend of blade root loads in wind turbines. However, the VMD-TCN-BiLSTM-Attention, TCN-BiLSTM-Attention, TCN-BiLSTM, and BiLSTM models exhibit suboptimal prediction performance in both peak and trough regions. The RBMO-VMD-TCN-BiLSTM-Attention model effectively captures the variation patterns of blade root loads in both peak and trough regions, achieving a superior fit to actual values with smaller prediction errors. The statistical values of R2, MAE, MSE, RMSE, and MAPE for each model are shown in Table 8.
As shown in Table 8, compared to the BiLSTM model, TCN-BiLSTM leverages structural complementarity to balance local feature extraction with global temporal modeling, performing better on long sequences and high-noise patterns; its MAE, MSE, RMSE, and MAPE fell by 12.7%, 17.4%, 5.8%, and 17.6%, respectively, while R2 improved by 1.6%. Compared to TCN-BiLSTM, the TCN-BiLSTM-Attention model weights input information differentially through the attention mechanism, assigning greater weight to key information while reducing interference from redundant data; its MAE, MSE, RMSE, and MAPE fell by 21.9%, 20%, 18.3%, and 38.6%, respectively, and R2 improved by 0.9%. Compared to TCN-BiLSTM-Attention, the VMD-TCN-BiLSTM-Attention model incorporates a signal decomposition algorithm that decomposes the acquired long-term load sequence, reducing the impact of interference terms and enhancing the stability and predictability of the data, thereby further improving prediction accuracy; its MAE, MSE, RMSE, and MAPE fell by 11.7%, 26.2%, 14.1%, and 9.6%, respectively, while R2 improved by 1.2%. Finally, compared to VMD-TCN-BiLSTM-Attention, optimizing the VMD parameters and the TCN-BiLSTM-Attention hyperparameters with the RBMO algorithm further improved predictive performance: MAE, MSE, RMSE, and MAPE fell by 51.7%, 55%, 52.6%, and 32%, respectively, and R2 increased by 1.1%.

5.7. Statistical Analysis

In addition to the aforementioned evaluation indicators, this section employs the Wilcoxon signed-rank test to further demonstrate the superiority and validity of the proposed model’s predictive performance from a statistical perspective. The Wilcoxon signed-rank test is a common nonparametric statistical method used to compare differences between paired samples.
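As a hedged illustration, such a paired comparison can be run with SciPy's `wilcoxon` (assuming SciPy is available; the numbers below are toy data, not the paper's results):

```python
from scipy.stats import wilcoxon

# Toy paired samples: model predictions vs. observed values
# (illustrative numbers only).
observed  = [10.2, 11.5, 9.8, 12.1, 10.9, 11.7, 10.4, 11.0]
predicted = [10.0, 11.9, 9.6, 12.3, 11.1, 11.5, 10.6, 10.8]

# The test ranks the absolute paired differences and checks whether
# positive and negative differences are symmetrically distributed.
stat, p_value = wilcoxon(observed, predicted)
# A p_value > 0.05 would indicate no significant difference between
# predictions and observations at the 0.05 level.
```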
The results of the three sets of comparative experiments and ablation experiments in Section 5.6 were subjected to Wilcoxon signed-rank tests. The significance probability values (p-values) of the Wilcoxon signed-rank tests are shown in Table 9.
The Wilcoxon signed-rank test indicates that, in the comparison between the proposed model and the actual values, the p-value is 0.3826 > 0.05. This suggests that at the 0.05 significance level, there is insufficient evidence to reject the null hypothesis. It can be concluded that the prediction results obtained by the proposed model are essentially consistent with the actual blade root load data, showing no significant difference.
In comparisons with the other benchmark models, the proposed model consistently achieved p-values below 0.05, indicating that the differences between its predictions and those of the benchmarks are statistically significant rather than coincidental, and that its superior performance is systematic and reproducible.

6. Conclusions

To enhance the accuracy of wind turbine blade root load prediction, we propose a VMD-TCN-BiLSTM-Attention prediction model with RBMO optimization, which integrates heuristic optimization, deep learning, and data decomposition methods. A case study verifies that the proposed model performs well when applied to the prediction of wind turbine blade root loads. The conclusions can be summarized as follows:
(1)
The RBMO-VMD-TCN-BiLSTM-Attention model demonstrates superior performance in predicting blade root loads for wind turbines compared to single models such as ELM, BiGRU, BiLSTM, Transformer, and Informer.
(2)
The VMD-TCN-BiLSTM-Attention model outperforms the TCN-BiLSTM-Attention model, showing that the VMD algorithm can overcome the high volatility and non-stationarity of the original wind turbine sequence. This helps the prediction model extract the hidden information in the data and thereby improves the prediction accuracy of the blade root load of wind turbines.
(3)
To further enhance the prediction accuracy of blade root loads in wind turbines, the RBMO algorithm was employed to optimize the parameters of the VMD and the hyperparameters of the TCN-BiLSTM-Attention model, resulting in a significant improvement in prediction accuracy. Experiments demonstrate that the combined prediction model based on RBMO-VMD and TCN-BiLSTM-Attention achieves high prediction accuracy.
The combined prediction model based on RBMO-VMD and TCN-BiLSTM-Attention achieved the lowest MAE, MSE, RMSE, and MAPE, with an R2 value closest to 1, and its predicted blade root load curves closely follow the trend of the actual curves. These findings demonstrate that the proposed model exhibits strong predictive performance and generalization capability, making it a valuable tool for estimating the blade root load of wind turbines. It also offers a theoretical foundation for ensuring stable turbine operation and delivers accurate blade root load data for effective wind turbine control, thereby contributing to the reliable integration of wind turbines into the power grid. Future work can focus on two directions. First, explore lightweight modeling and edge deployment to enable real-time online prediction of blade root loads, with deep integration of multi-source operational data from the wind turbine SCADA system. Second, enhance the model's predictive robustness under uncertain environmental disturbances such as turbulence and gusts, and provide risk perception for operational decisions through methods such as uncertainty quantification.

Author Contributions

Conceptualization, Y.L. and J.C.; Methodology, Y.L.; Software, Y.L. and J.C.; Validation, Y.L. and J.C.; Formal analysis, Y.L.; Investigation, Y.L.; Resources, J.C.; Data curation, Y.L. and J.C.; Writing—original draft, Y.L.; Writing—review and editing, Y.L. and J.C.; Visualization, Y.L.; Supervision, J.C.; Project administration, J.C.; Funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key R&D Program Foundation of China (2021YFB1506902); Major Science and Technology Special Project of Xinjiang Uygur Autonomous Region (2022A01001-4); Key Research and Development Project of Xinjiang Uygur Autonomous Region (2022B01003-3).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Schematic diagram of blade element stress.
Figure 2. Causal dilated convolution.
Figure 3. TCN Residual Block.
Figure 4. LSTM network structure diagram.
Figure 5. BiLSTM network structure diagram.
Figure 6. TCN-BiLSTM-Attention Model Architecture.
Figure 7. Flowchart for optimizing VMD with the RBMO algorithm.
Figure 8. Flowchart for optimizing model hyperparameters with the RBMO algorithm.
Figure 9. General process of the predictive model.
Figure 10. Ambient air velocity.
Figure 11. Window Sample Construction Process.
Figure 12. The process of optimizing VMD parameters.
Figure 13. (a) VMD Decomposition Waveform; (b) Spectrogram of each modal component.
Figure 14. Predicted Load Curves for Blade Root Under Different Decomposition Algorithms.
Figure 15. Comparison of Model Metrics Across Different Decomposition Algorithms.
Figure 16. Prediction of Blade Root Load Curves Under Different Intelligent Optimization Algorithms. (a) Predicted results of Mx under different intelligent optimization algorithms. (b) Predicted results of My under different intelligent optimization algorithms.
Figure 17. Comparison of Model Metrics Across Different Intelligent Optimization Algorithms.
Figure 18. Blade Root Load Prediction Curve.
Figure 19. Model Evaluation Metrics.
Figure 20. Comparison of Model Metrics Across Different Decomposition Algorithms.
Table 1. Hyperparameter Range.
Hyperparameter Name | Search Scope
Number of TCN layers | [3, 6]
Size of the TCN convolution kernel | [3, 7]
Number of BiLSTM layers | [1, 3]
Number of hidden layer units in BiLSTM | [64, 256]
Attention dimension | [64, 256]
Batch size | [32, 128]
Learning rate | [1 × 10−4, 1 × 10−3]
Dropout rate | [0.2, 0.5]
Table 2. Model Parameter Settings.
Parameter Category | Parameter Name | Value
Model parameters | Number of TCN layers | 3
 | Size of the TCN convolution kernel | 5
 | TCN dropout rate | 0.2
 | Expansion factor of TCN | [1, 2, 4]
 | Number of BiLSTM layers | 2
 | Number of hidden layer units in BiLSTM | 128
 | BiLSTM dropout rate | 0.3
 | Attention dimension | 128
Training parameters | Batch size | 64
 | Learning rate | 1 × 10−3
 | Optimizer | Adam
 | Loss function | MSE
Table 3. Statistical Analysis of Experimental Data.
Sample | Quantity | Load Type | Maximum Value | Minimum Value | Average | Standard Deviation
All samples | 2500 | oscillating moment | 11,624.9 | −8001.9 | 556.3 | 6150.2
 | | waving moment | 19,883.8 | 11,217.1 | 13,411.2 | 1865.3
Training samples | 1500 | oscillating moment | 11,624.9 | −8001.9 | 589.3 | 6147.3
 | | waving moment | 19,883.8 | 11,217.1 | 13,529.8 | 1892.3
Verification sample | 500 | oscillating moment | 11,592.9 | −7658.3 | 466.2 | 6135.9
 | | waving moment | 19,728.9 | 11,296.5 | 13,125.9 | 1749.3
Test sample | 500 | oscillating moment | 11,589.3 | −7892.7 | 546.6 | 6173.7
 | | waving moment | 19,668.2 | 11,396.5 | 13,341.5 | 1861.9
Table 4. SE value of the VMD decomposition number.
Number of Decompositions (K) | Value of SE
4 | 0.212
5 | 0.245
6 | 0.208
7 | 0.217
8 | 0.225
9 | 0.233
10 | 0.249
Table 5. Evaluation Metrics for Each Model.
Model | R2 | MAE | MSE | RMSE | MAPE%
TCN-BiLSTM-Attention | 0.9752 | 6.8217 | 98.5476 | 7.9526 | 3.5503
EMD-TCN-BiLSTM-Attention | 0.9769 | 6.2275 | 81.1586 | 7.6315 | 3.2175
CEEMDAN-TCN-BiLSTM-Attention | 0.9775 | 6.6072 | 79.2239 | 7.3143 | 3.4507
VMD-TCN-BiLSTM-Attention | 0.9865 | 6.0232 | 72.7055 | 6.8276 | 3.2092
Table 6. Evaluation Metrics for Each Model.
Model | R2 | MAE | MSE | RMSE | MAPE%
VMD-TCN-BiLSTM-Attention | 0.9865 | 6.0232 | 72.7055 | 6.8276 | 3.2092
PSO-VMD-TCN-BiLSTM-Attention | 0.9893 | 4.1326 | 49.8872 | 4.0319 | 2.7132
GWO-VMD-TCN-BiLSTM-Attention | 0.9885 | 3.8875 | 52.1537 | 4.2681 | 2.7567
WOA-VMD-TCN-BiLSTM-Attention | 0.9881 | 4.1893 | 58.6631 | 3.9269 | 2.9954
RBMO-VMD-TCN-BiLSTM-Attention | 0.9972 | 2.9073 | 32.7528 | 3.2356 | 2.1825
Table 7. Evaluation Metrics for Each Model.
Model | R2 | MAE | MSE | RMSE | MAPE%
ELM | 0.8922 | 15.3508 | 178.2559 | 16.2632 | 11.5507
BiGRU | 0.9326 | 12.3618 | 158.6603 | 12.1662 | 8.3215
BiLSTM | 0.9512 | 10.0116 | 149.1436 | 10.3265 | 7.0179
Transformer | 0.9601 | 9.0345 | 126.2586 | 8.9122 | 5.2518
Informer | 0.9603 | 9.0173 | 129.5139 | 8.8642 | 5.0937
RBMO-VMD-TCN-BiLSTM-Attention | 0.9972 | 2.9073 | 32.7528 | 3.2356 | 2.1825
Table 8. Evaluation Metrics for Each Model.
Model | R2 | MAE | MSE | RMSE | MAPE%
BiLSTM | 0.9512 | 10.0116 | 149.1436 | 10.3265 | 7.0179
TCN-BiLSTM | 0.9663 | 8.7396 | 123.1519 | 9.7283 | 5.7816
TCN-BiLSTM-Attention | 0.9752 | 6.8217 | 98.5476 | 7.9526 | 3.5503
VMD-TCN-BiLSTM-Attention | 0.9865 | 6.0232 | 72.7055 | 6.8276 | 3.2092
RBMO-VMD-TCN-BiLSTM-Attention | 0.9972 | 2.9073 | 32.7528 | 3.2356 | 2.1825
Table 9. Wilcoxon signed-rank test.
Experimental Group | Model A | Model B | p-Value | Significant
 | RBMO-VMD-TCN-BiLSTM-Attention | Observed values | 0.3826 | No
Experiment 1 | VMD-TCN-BiLSTM-Attention | TCN-BiLSTM-Attention | 0.0089 | Yes
 | | EMD-TCN-BiLSTM-Attention | 0.0128 | Yes
 | | CEEMDAN-TCN-BiLSTM-Attention | 0.0177 | Yes
Experiment 2 | RBMO-VMD-TCN-BiLSTM-Attention | VMD-TCN-BiLSTM-Attention | 0.0052 | Yes
 | | PSO-VMD-TCN-BiLSTM-Attention | 0.0236 | Yes
 | | GWO-VMD-TCN-BiLSTM-Attention | 0.0213 | Yes
 | | WOA-VMD-TCN-BiLSTM-Attention | 0.0228 | Yes
Experiment 3 | RBMO-VMD-TCN-BiLSTM-Attention | ELM | <0.0001 | Yes
 | | BiGRU | <0.0001 | Yes
 | | BiLSTM | <0.0001 | Yes
 | | Transformer | <0.0001 | Yes
 | | Informer | <0.0001 | Yes
Ablation experiment | RBMO-VMD-TCN-BiLSTM-Attention | BiLSTM | <0.0001 | Yes
 | | TCN-BiLSTM | <0.0001 | Yes
 | | TCN-BiLSTM-Attention | 0.0002 | Yes
 | | VMD-TCN-BiLSTM-Attention | 0.0052 | Yes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Liu, Y.; Cheng, J. Prediction of Blade Root Loads for Wind Turbine Based on RBMO-VMD and TCN-BiLSTM-Attention. Mathematics 2026, 14, 218. https://doi.org/10.3390/math14020218