Ultra-Short-Term Wind Power Forecasting Based on Improved TTAO Optimization and High-Frequency Adaptive Weighting Strategy

Wang, Xiaoming; Huang, Yan; Pu, Jing; Yang, Youqing; Zhang, Lin; Bai, Xiaolong; Fan, Haoran; Lin, Sheng

doi:10.3390/electronics15020363

Open Access Article


by Xiaoming Wang ¹, Yan Huang ¹, Jing Pu ², Youqing Yang ¹, Lin Zhang ¹, Xiaolong Bai ¹, Haoran Fan ³ and Sheng Lin ³,*

¹ Chengdu Chengdian Power Design Co., Ltd., Chengdu 610213, China

² Sichuan Keride Industrial Group Co., Ltd., Chengdu 610045, China

³ School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(2), 363; https://doi.org/10.3390/electronics15020363

Submission received: 14 November 2025 / Revised: 2 January 2026 / Accepted: 13 January 2026 / Published: 14 January 2026

(This article belongs to the Section Systems & Control Engineering)


Abstract

Accurate ultra-short-term wind power forecasting (WPF) is essential for maintaining power grid stability and minimizing economic risks, yet the inherent volatility of wind speed poses significant modeling challenges. To address this, this study proposes an ensemble framework integrating an Improved Triangular Topology Aggregation Optimizer (ITTAO) and a high-frequency adaptive weighting strategy. Methodologically, the ITTAO incorporates multi-strategy mechanisms to overcome the premature convergence of the traditional TTAO, thereby enabling precise hyperparameter optimization for the variational mode decomposition (VMD) and BiLSTM networks. Furthermore, in the reconstruction stage, a dynamic weighting strategy is introduced to modulate the contribution of high-frequency sub-sequences, thereby enhancing the capture of rapid fluctuations. Experimental results across multi-seasonal datasets demonstrate that the proposed hybrid model consistently outperforms representative baselines. Notably, in the most volatile scenarios, the model achieves an NMAE of 1.33%, an NRMSE of 2.20%, and an $R^{2}$ of 98.18%. The results demonstrate that the proposed model achieves superior forecasting accuracy, enhancing the operational stability of wind farms and the secure integration of wind energy into the power grid.

Keywords:

wind power forecasting; optimization; triangulation topology aggregation optimizer; decomposition; long short-term memory; error stability index

1. Introduction

Wind energy, as one of the most widely used renewable energy sources in modern power systems, has experienced rapid growth worldwide in recent years. According to the Global Wind Energy Council, worldwide installed wind capacity grew from 318 GW in 2013 to 1021 GW by 2023 [1]. However, wind power is inherently random, volatile, and intermittent. Large-scale integration of wind power can introduce random power fluctuations into the grid and threatens the stability of system frequency [2]. Crucially, in deregulated electricity markets, these fluctuations also impose significant economic burdens, as grid operators often enforce strict penalties for deviations between scheduled and actual power generation. Therefore, developing highly accurate and reliable wind power forecasting (WPF) technologies is crucial for enabling the scientific allocation of spinning reserve capacity, facilitating timely peak load regulation, and ensuring the secure and stable operation of the power grid [3].

To achieve higher accuracy in the WPF domain, many WPF models have been developed. At present, WPF techniques are commonly categorized into five types: physical models, statistical models, artificial intelligence (AI)-based models, deep learning models, and hybrid models [4]. Table 1 provides a clear overview of these five categories of models, and outlines their key methods, main contributions, limitations, and related references. Physical methods rely on numerical weather prediction (NWP) data and replicate the physical process of converting wind energy into electricity to develop forecasting models [5]. They require highly accurate meteorological inputs and detailed turbine parameters, while the complexity of the modeling process restricts their application in ultra-short-term WPF. Statistical methods use historical data to uncover the correlations between input features and the output variable [6]. Traditional statistical models, such as ARMA and ARIMA [7], rely on linear assumptions, which limit their ability to capture the nonlinear characteristics of wind power data [8].

With the rapid advancement of AI technologies, AI-based approaches have been increasingly applied to WPF. These methods are generally classified into machine learning and deep learning methods [9]. Machine learning methods, including neural networks [10,11], extreme learning machines (ELMs) [12], and support vector machines (SVMs) [13], can model nonlinear relationships. However, their network structures make them prone to local optima, overfitting, and slow convergence [14]. In contrast, deep learning models have become the mainstream choice for WPF due to their multilayer architectures and strong feature extraction capability [15,16]. Among them, convolutional neural networks (CNNs) [17], temporal convolutional networks (TCNs) [18], and long short-term memory networks (LSTM) [19] have demonstrated strong performance in both wind power and wind speed forecasting. In particular, LSTM networks have gained widespread adoption in recent years because they effectively mitigate the gradient instability issues present in traditional recurrent neural networks (RNNs).

Wind power generation is susceptible to various external meteorological factors such as weather conditions and temperature, making it challenging for a single model to accurately capture the original feature information within wind power series. To address this limitation, hybrid models combining distinct techniques have been widely adopted to enhance prediction accuracy [20]. Generally, these hybrid models can be categorized into two primary types. The first category involves applying signal decomposition techniques to partition the original signal, thereby mitigating the nonlinear characteristics of the raw wind power time series. Common decomposition methods include wavelet decomposition [21], singular value decomposition [22], and empirical mode decomposition along with its variants [23]. These methods typically decompose the original wind power series into a sequence of sub-modes with different frequencies and model each sub-mode individually. However, a critical defect persists in this methodology. The decomposed high-frequency subsequences often retain significant noise and fluctuations, and directly predicting these components frequently leads to error accumulation. To mitigate this issue, existing literature typically adopts one of two approaches. The first is applying secondary decomposition to the high-frequency components [24], and the second is directly discarding them [25]. However, discarding components results in the loss of valid fluctuation details, whereas secondary decomposition inevitably imposes a substantial computational burden and significantly increases the training time.

The second category integrates optimization methods with deep learning predictors or decomposition algorithms to search for optimal hyperparameter settings, as the performance of hybrid models relies heavily on these configurations. For example, reference [26] developed a hybrid PSO-CNN-LSTM model for short-term WPF, while reference [27] utilized the firefly algorithm to adjust the parameter settings of the LSTM network to enhance the adaptability and stability of the model. Additionally, reference [28] proposed a model that employs the Sparrow Search Algorithm (SSA) to optimize the VMD decomposition process. Nevertheless, a critical common limitation persists across these studies. These basic algorithms often suffer from insufficient population diversity and are prone to premature convergence when solving such high-dimensional and non-convex optimization problems. Consequently, the potential for maximizing model accuracy is hindered by the inability to find the global optimal parameters.

Table 1. Classification of WPF models with their respective advantages and disadvantages.

Category	Model	Refs.	Advantages	Disadvantages
Physical methods	NWP	[5]	Effective for long-term forecasting, ability to handle real-time observations.	Require high computational resources and large datasets.
Statistical methods	ARIMA-ANN	[6]	Computationally efficient and perform reliably with limited data.	Have limited capability in modeling nonlinear features.
Statistical methods	ARIMA-Kalman	[7]		Have limited capability in modeling nonlinear features.
Machine learning models	BP	[10,11]	Capture complex patterns and non-linear features, adaptability to dynamic trends.	Require higher computation and careful parameter tuning, with risk of overfitting.
	ELM	[12]
	SVM	[13]
Deep learning models	CNN	[17]	Learn temporal structures and long-range dependencies.	Incur high computation cost and long training time, and remain sensitive to parameter tuning and overfitting.
	TCN	[18]
	LSTM	[19]
Hybrid models	PSO-CNN-LSTM	[21]	Combine complementary algorithms, resulting in improved generalization and forecasting ability.	More complex to design and dependent on larger, high-quality datasets.
	FA-LSTM	[22]
	WT-MLPNN	[23]
	SVD-TCN	[24]
	EMD-BaNN	[25]
	VMD-ConvLSTM	[26]
	Secondary Decomposition	[27]
	VMD-IMPA-SVM	[28]

Despite extensive research in this field, existing WPF models still have some research gaps that need to be filled:

(1): Most existing hybrid models rely on standard metaheuristic algorithms to determine key hyperparameters. However, these algorithms lack sufficient capability to escape local optima when addressing complex non-convex optimization problems, limiting the further improvement of prediction accuracy.
(2): The processing strategies for high-frequency components in decomposition-based models are often inefficient. Existing methods either sacrifice valid information by discarding components or incur excessive computational costs through secondary decomposition, lacking a strategy that balances efficiency and accuracy.

To address the identified research gaps, this study proposes a novel wind power forecasting framework based on an Improved Triangular Topology Aggregation Optimizer (ITTAO) and a high-frequency adaptive weighting strategy. The main contributions of this work are twofold.

(1): An advanced ITTAO algorithm is proposed to solve high-dimensional non-convex optimization problems. By integrating Logistic–Tent chaotic mapping, the Golden-Sine strategy, and lens-imaging learning, ITTAO effectively overcomes the premature convergence of standard metaheuristics, ensuring precise global optimization for model hyperparameters.
(2): A high-frequency adaptive weighting strategy is designed to balance reconstruction accuracy and computational efficiency. By dynamically adjusting weights based on error stability, this strategy suppresses high-frequency noise and minimizes error accumulation without the computational burden of secondary decomposition or the information loss of discarding.

The remainder of this paper is organized as follows. Section 2 describes the algorithmic principles employed in this study. Section 3 outlines the construction process of the proposed forecasting model. Section 4 introduces the data sources and presents the experimental results. Section 5 summarizes the conclusions and key findings.

2. Theoretical Framework

2.1. Improved TTAO

The triangulation topology aggregation optimizer (TTAO) is a metaheuristic optimization algorithm proposed in 2024 [29]. Inspired by the geometric properties of similar triangles, it constructs triangular topology units (TTUs) with three vertices and an inner point, where optimization is achieved through information exchange and aggregation among individuals. The optimization process of TTAO consists of four stages: initialization, TTU construction, generic aggregation, and local aggregation.

2.1.1. Initialization

Set the population size of the optimization task to N and the number of decision variables to D. The algorithm divides the population into $\lfloor N / 3 \rfloor$ TTUs, where $\lfloor \cdot \rfloor$ denotes the floor operation. Each unit is then randomly generated within the feasible domain, following the initialization strategy given below:

X_{i, 1} = (B_{u} - B_{l}) \times rand (0, 1) + B_{l}

(1)

where $X_{i, 1}$ is the first vertex of the i-th TTU; $rand (0, 1)$ generates a random number in the range $[0, 1]$; $B_{u}$ and $B_{l}$ are the upper and lower bounds of the variable, respectively.

2.1.2. TTU Construction

The construction principle of TTUs is illustrated in Figure 1. After the first vertex is determined, a direction vector of length L is randomly generated from this point in the spherical coordinate system. The second vertex $X_{i, 2}$ is obtained by converting this vector into Cartesian coordinates. Rotating the vector counterclockwise by $π / 3$ and repeating the conversion gives the third vertex $X_{i, 3}$, forming a symmetric TTU. The coordinates of each vertex are expressed as follows:

X_{i, 2} = X_{i, 1} + L \cdot f (θ)

(2)

X_{i, 3} = X_{i, 1} + L \cdot f (θ + \frac{π}{3})

(3)

where $f (θ)$ and $f (θ + π / 3)$ are the direction vectors extended from the first vertex. Each TTU generates a fourth vertex through internal aggregation. This point is formed using a linear weighted combination and is given as:

X_{i, 4} = r_{1} X_{i, 1} + r_{2} X_{i, 2} + r_{3} X_{i, 3}

(4)

where $r_{1}$, $r_{2}$, and $r_{3}$ are random weights in $[0, 1]$ that satisfy $r_{1} + r_{2} + r_{3} = 1$. The fourth vertex is therefore a convex combination of the three vertices and lies within the triangle.
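
The construction in Equations (1)–(4) can be illustrated with a minimal Python sketch. It is restricted to two decision variables, so that the direction vector $f (θ)$ reduces to $(cos θ, sin θ)$; this restriction, the function name `build_ttu`, and the normalisation used to obtain weights summing to one are assumptions made for illustration and are not taken from the original TTAO implementation.

```python
import math
import random


def build_ttu(lower, upper, length, rng):
    """Build one 2-D triangular topology unit following Eqs. (1)-(4)."""
    # Eq. (1): first vertex drawn uniformly inside the bounds.
    x1 = [lo + (hi - lo) * rng.random() for lo, hi in zip(lower, upper)]

    # Random direction; in 2-D, f(theta) is taken as (cos theta, sin theta).
    theta = rng.uniform(0.0, 2.0 * math.pi)

    # Eq. (2): second vertex at distance `length` along f(theta).
    x2 = [x1[0] + length * math.cos(theta), x1[1] + length * math.sin(theta)]

    # Eq. (3): third vertex after rotating the direction by pi/3.
    phi = theta + math.pi / 3.0
    x3 = [x1[0] + length * math.cos(phi), x1[1] + length * math.sin(phi)]

    # Eq. (4): weights in [0, 1] that sum to one (assumed: normalised draws).
    raw = [1.0 - rng.random() for _ in range(3)]  # values in (0, 1]
    total = sum(raw)
    r1, r2, r3 = (v / total for v in raw)
    x4 = [r1 * x1[d] + r2 * x2[d] + r3 * x3[d] for d in range(2)]
    return x1, x2, x3, x4
```

Because the two edges leaving the first vertex have equal length and enclose an angle of $π / 3$, the resulting triangle is equilateral. The sketch does not handle vertices that fall outside the bounds.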

2.1.3. Generic Aggregation

This stage focuses on enhancing the algorithm’s global search ability by generating new candidate solutions from the best individuals of each TTU, as illustrated in Figure 2. Let $X_{i, best}^{q}$ denote the best vertex of the i-th TTU at the q-th iteration. The new feasible solution is given by:

X_{i, new 1}^{q + 1} = r_{4} X_{i, best}^{q} + (1 - r_{4}) X_{rand, best}^{q}

(5)

where $r_{4}$ is a random value distributed in $[0, 1]$, and $X_{rand, best}^{q}$ is the best vertex of a randomly selected TTU.

2.1.4. Local Aggregation

This stage focuses on enhancing the algorithm’s local search capability. It perturbs the current best solution using the difference vector between the best and second-best solutions, and adjusts the best solution’s position locally to explore better solutions within a small neighborhood. The calculation is expressed as follows:

X_{i, new 2}^{q + 1} = X_{i, best}^{q + 1} + ζ (X_{i, best}^{q + 1} - X_{i, sbest}^{q + 1})

(6)

where $ζ$ controls the perturbation amplitude and gradually decreases during the iterations to approach the optimal solution.
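
The two aggregation rules in Equations (5) and (6) can be sketched in a few lines of Python. The function names are hypothetical, and the decay schedule of $ζ$ is not given in this section, so the sketch takes `zeta` as an argument rather than assuming a schedule.

```python
import random


def generic_aggregation(best_i, best_rand, rng):
    """Eq. (5): random convex combination of two TTU best vertices."""
    r4 = rng.random()  # r4 in [0, 1)
    return [r4 * a + (1.0 - r4) * b for a, b in zip(best_i, best_rand)]


def local_aggregation(best, second_best, zeta):
    """Eq. (6): move the best vertex along (best - second best)."""
    return [b + zeta * (b - s) for b, s in zip(best, second_best)]
```

With `zeta = 0` the local step leaves the best vertex unchanged, which matches the intent that a shrinking $ζ$ narrows the search around the current best solution.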

2.1.5. Multi-Strategy Improvements

According to Equation (1), the traditional TTAO initializes the population by randomly generating TTUs using $rand (\cdot)$. However, this approach may lead to an uneven distribution of solutions in the search space and make the algorithm prone to local optima in subsequent iterations. To reduce the dependence on initial solutions, this study introduces a Logistic–Tent map at the population initialization stage [30]. Each random factor is perturbed by the chaotic sequence, which produces candidate solutions that are more evenly distributed across the search space. The Logistic–Tent-based chaotic sequence is defined as:

s_{k + 1} = \begin{cases} mod [r s_{k} (1 - s_{k}) + \frac{(4 - r) s_{k}}{2}], & s_{k} < 0.5 \\ mod [r s_{k} (1 - s_{k}) + \frac{(4 - r) (1 - s_{k})}{2}], & s_{k} \geq 0.5 \end{cases}

(7)

where $mod (\cdot)$ denotes the modulo operation and $r \in [0, 4]$ represents the chaotic parameter. To guarantee robust chaotic ergodicity and population diversity, the control parameter is set to $r = 0.59$, and the initial value $s_{1}$ is generated randomly within the interval $(0, 1)$.
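
A minimal Python sketch of the chaotic initialization in Equation (7) is given here. Two points are assumptions rather than statements from the text: the modulus is taken to be 1, as in the usual Logistic–Tent system, and the chaotic value is substituted directly for $rand (0, 1)$ in Equation (1). The function names are hypothetical.

```python
import random


def logistic_tent_sequence(n, r=0.59, s1=None, rng=None):
    """Generate n values of the Logistic-Tent map of Eq. (7), modulo 1."""
    rng = rng or random.Random()
    # Initial value in the open interval (0, 1) unless supplied.
    s = s1 if s1 is not None else rng.uniform(1e-6, 1.0 - 1e-6)
    sequence = []
    for _ in range(n):
        if s < 0.5:
            s = (r * s * (1.0 - s) + (4.0 - r) * s / 2.0) % 1.0
        else:
            s = (r * s * (1.0 - s) + (4.0 - r) * (1.0 - s) / 2.0) % 1.0
        sequence.append(s)
    return sequence


def chaotic_first_vertex(lower, upper, chaos):
    """Eq. (1) with rand(0, 1) replaced by chaotic values (assumption)."""
    return [lo + (hi - lo) * s for lo, hi, s in zip(lower, upper, chaos)]
```

For example, `chaotic_first_vertex([0, 0], [10, 10], logistic_tent_sequence(2, s1=0.3))` yields one first vertex inside the bounds.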

During the generic aggregation phase, the original aggregation strategy relies solely on linear combinations among individuals. As a result, the information from high-quality solutions cannot be effectively transmitted and diffused, which limits the adjustment of perturbation amplitude near the optimal region and may cause the search to stagnate. To address this, a Golden-Sine strategy is incorporated to strengthen the algorithm’s global exploration ability [31]. The enhanced position update formula based on Equation (5) is defined as:

X_{i, new 1}^{q + 1} = X_{i}^{q} | sin r_{4} | + r_{5} sin r_{4} | x_{1} X_{i, best}^{q} - x_{2} X_{rand, best}^{q} |

(8)

\begin{cases} x_{1} = - π + (1 - τ) 2 π \\ x_{2} = - π + τ 2 π \end{cases}

(9)

where $τ$ represents the golden ratio coefficient, fixed at $τ = (\sqrt{5} - 1) / 2 \approx 0.618$, which iteratively narrows the search space coefficients $x_{1}$ and $x_{2}$. $X_{i}$ indicates the location of the i-th triangular topology unit, $r_{4}$ is a random number in the range $[0, 2 π]$, and $r_{5}$ is a random number in the range $[0, π]$.
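
The Golden-Sine update of Equations (8) and (9) can be sketched as follows. The sketch evaluates $x_{1}$ and $x_{2}$ once from Equation (9) and keeps them constant; whether the authors update them during the run is not specified here, so this is an assumption, as are the function and constant names.

```python
import math
import random

TAU = (math.sqrt(5.0) - 1.0) / 2.0             # golden ratio coefficient
X1 = -math.pi + (1.0 - TAU) * 2.0 * math.pi    # Eq. (9), first coefficient
X2 = -math.pi + TAU * 2.0 * math.pi            # Eq. (9), second coefficient


def golden_sine_update(x_i, best_i, best_rand, rng):
    """Eq. (8): Golden-Sine position update for one TTU position."""
    r4 = rng.uniform(0.0, 2.0 * math.pi)
    r5 = rng.uniform(0.0, math.pi)
    s = math.sin(r4)
    return [
        x * abs(s) + r5 * s * abs(X1 * b - X2 * g)
        for x, b, g in zip(x_i, best_i, best_rand)
    ]
```

With these definitions $x_{1} = - x_{2}$, so the two coefficients place the best vertices symmetrically around zero before the distance term is formed.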

In addition, in the local aggregation stage, a lens-imaging learning strategy is employed to derive the inverse solution $X_{i, new 2}^{q + 1 *}$, which enables the algorithm to escape from the current search region and expand the search range [32]. In the lens-imaging learning algorithm, the inverse position is calculated as:

X_{i, new 2}^{q + 1 *} = \frac{a_{j} + b_{j}}{2} + \frac{a_{j} + b_{j}}{2 r_{c}} - \frac{X_{i, new 2}^{q + 1}}{r_{c}}

(10)

where $a_{j}$ and $b_{j}$ are the upper and lower boundaries of the search space, respectively, and $r_{c}$ is the scaling factor. The value of $r_{c}$ is updated according to the following formula:

r_{c} = {(1 + \sqrt{\frac{q}{Q}})}^{10}

(11)

where Q is the maximum number of iterations. According to this schedule, the value of $r_{c}$ increases as the algorithm proceeds. Since $r_{c}$ acts as a denominator in the scaling term, its increase effectively compresses the search range of the inverse solution, allowing the algorithm to transition from exploration to fine-grained exploitation. Furthermore, to ensure the quality of the population, a greedy selection strategy is employed, where the generated inverse solution replaces the original individual only if it achieves a better fitness value. Overall, the workflow of the proposed multi-strategy ITTAO is shown in Figure 3.
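
The lens-imaging step of Equations (10) and (11), followed by greedy selection, can be sketched in Python as below. The function names are hypothetical, and the fitness function is assumed to be minimized.

```python
import math


def lens_scaling(q, q_max):
    """Eq. (11): scaling factor r_c at iteration q of q_max."""
    return (1.0 + math.sqrt(q / q_max)) ** 10


def lens_image(x, lower, upper, r_c):
    """Eq. (10): inverse (lens-imaged) position of x within the bounds."""
    return [
        (a + b) / 2.0 + (a + b) / (2.0 * r_c) - xj / r_c
        for xj, a, b in zip(x, lower, upper)
    ]


def greedy_lens_step(x, lower, upper, q, q_max, fitness):
    """Keep the inverse solution only if it has a lower fitness value."""
    candidate = lens_image(x, lower, upper, lens_scaling(q, q_max))
    return candidate if fitness(candidate) < fitness(x) else x
```

At $q = 0$ the factor is $r_{c} = 1$ and Equation (10) reduces to the reflection $a_{j} + b_{j} - x_{j}$; at $q = Q$ it reaches $2^{10} = 1024$, so the inverse solution is pulled close to the midpoint of the bounds.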

2.2. VMD

Variational mode decomposition (VMD) is an adaptive signal processing technique that decomposes a signal into multiple intrinsic modes without recursion [33]. It introduces a bandwidth constraint into the EMD framework to obtain IMFs with limited bandwidth and non-overlapping spectra. Each IMF is treated as an analytic signal with amplitude and frequency modulation, and its instantaneous frequency is concentrated around a specific center frequency. This design enables VMD to effectively capture the local time–frequency characteristics of nonstationary signals. To achieve this, VMD constructs a constrained variational framework that minimizes the cumulative bandwidth of all modes while preserving the reconstruction of the original signal. The variational model of VMD is given as:

\begin{cases} \min_{\{u_{k}\}, \{ω_{k}\}} \left\{ \sum_{k} \left\| \partial_{t} \left[ \left( δ (t) + \frac{j}{π t} \right) u_{k} (t) \right] e^{- j ω_{k} t} \right\|_{2}^{2} \right\} \\ \text{s.t.} \ \sum_{k} u_{k} = f (t) \end{cases}

(12)

where $u_{k}$ is the k-th mode component, and $ω_{k}$ is the center frequency corresponding to $u_{k}$.

To solve Equation (12), VMD introduces a Lagrange multiplier and a quadratic penalty factor, reformulating the constrained variational problem into an equivalent unconstrained optimization problem, as presented in Equation (13).

\begin{aligned} L (\{u_{k}\}, \{ω_{k}\}, λ) = {} & α \sum_{k} \left\| \partial_{t} \left[ \left( δ (t) + \frac{j}{π t} \right) u_{k} (t) \right] e^{- j ω_{k} t} \right\|_{2}^{2} \\ & + \left\| f (t) - \sum_{k} u_{k} (t) \right\|_{2}^{2} \\ & + \left\langle λ (t), f (t) - \sum_{k} u_{k} (t) \right\rangle \end{aligned}

(13)

where $L (\cdot)$ is the augmented Lagrangian function, $α$ is the penalty factor, $λ$ is the Lagrange multiplier, and $\langle \cdot \rangle$ denotes the inner product operation.

2.3. BiLSTM

LSTM is an enhanced RNN designed to address the vanishing and exploding gradient problems that occur in conventional RNNs when processing long sequences [34]. By incorporating a forget gate, input gate, memory cell, and output gate, LSTM maintains long-term dependencies in sequential data and delivers more accurate predictions. The basic structure of the LSTM model is shown in Figure 4.

In Figure 4, $h_{t}$ and $h_{t - 1}$ denote the hidden states at time t and $t - 1$, and $c_{t}$ and $c_{t - 1}$ represent the corresponding cell states; $x_{t}$ is the input vector at time t; $σ$ represents the sigmoid activation function with a range of $(0, 1)$; tanh represents the hyperbolic tangent activation function with a range of $(- 1, 1)$. $W_{f}$, $W_{i}$, $W_{c}$, and $W_{o}$ are the weight matrices for the forget gate $f_{t}$, input gate $i_{t}$, memory cell $c_{t}$, and output gate $o_{t}$, respectively. The expressions of the LSTM gates are given as follows:

f_{t} = σ (W_{f} [h_{t - 1}, x_{t}] + b_{f})

(14)

i_{t} = σ (W_{i} [h_{t - 1}, x_{t}] + b_{i})

(15)

{\tilde{c}}_{t} = tanh (W_{c} [h_{t - 1}, x_{t}] + b_{c})

(16)

o_{t} = σ (W_{o} [h_{t - 1}, x_{t}] + b_{o})

(17)

where $b_{f}$, $b_{i}$, $b_{c}$, and $b_{o}$ are the bias terms of the corresponding equations. The cell state at time t is updated as:

c_{t} = f_{t} ⊙ c_{t - 1} + i_{t} ⊙ {\tilde{c}}_{t}

(18)

where ⊙ represents element-wise (Hadamard) multiplication, and the hidden state output at time t is given as follows:

h_{t} = o_{t} ⊙ tanh (c_{t})

(19)

A BiLSTM consists of a forward LSTM and a backward LSTM that process the input sequence in opposite directions. By combining the hidden states of the two directions, it draws on both past and future context at each time step, which improves the efficiency of feature extraction. This architecture increases the model’s representational capacity and reduces the risk of underfitting, even when only limited training data are available. The computational formulation of the BiLSTM is given as:

\begin{cases} \vec{h_{t}} = LSTM (x_{t}, \vec{h_{t - 1}}), \\ \overset{\leftarrow}{h_{t}} = LSTM (x_{t}, \overset{\leftarrow}{h_{t - 1}}), \\ y_{t} = σ (W_{y} [\vec{h_{t}}, \overset{\leftarrow}{h_{t}}] + b_{y}) . \end{cases}

(20)

where $LSTM (\cdot)$ denotes the conventional LSTM computation process; $\vec{h_{t}}$ and $\overset{\leftarrow}{h_{t}}$ represent the forward and backward hidden states at time t, respectively; $W_{y}$ is the weight matrix for the combined hidden states and $b_{y}$ is the bias term.
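
To make Equations (14)–(19) and the bidirectional pass concrete, the following Python sketch implements a toy LSTM with a single scalar input and a single hidden unit. The scalar setting, the parameter layout (a dictionary of `(w_h, w_x, b)` tuples per gate), and all function names are assumptions chosen for readability; a practical model would use a deep learning library with vector-valued states, and the output layer of Equation (20) is omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One scalar LSTM step; params maps gate name -> (w_h, w_x, b)."""
    def gate(name, activation):
        w_h, w_x, b = params[name]
        return activation(w_h * h_prev + w_x * x_t + b)

    f_t = gate("f", sigmoid)                 # Eq. (14): forget gate
    i_t = gate("i", sigmoid)                 # Eq. (15): input gate
    c_tilde = gate("c", math.tanh)           # Eq. (16): candidate cell state
    o_t = gate("o", sigmoid)                 # Eq. (17): output gate
    c_t = f_t * c_prev + i_t * c_tilde       # Eq. (18): cell state update
    h_t = o_t * math.tanh(c_t)               # Eq. (19): hidden state
    return h_t, c_t


def run_lstm(xs, params):
    """Run the cell over a sequence, starting from zero states."""
    h, c, hidden = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, params)
        hidden.append(h)
    return hidden


def bilstm_states(xs, params_fwd, params_bwd):
    """Pair forward states with backward states computed on the reversed sequence."""
    forward = run_lstm(xs, params_fwd)
    backward = run_lstm(xs[::-1], params_bwd)[::-1]
    return list(zip(forward, backward))
```

The forward and backward passes take separate parameter sets, so each direction learns its own weights; the paired states are what the output layer would combine.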

3. Prediction Model Construction

3.1. ITTAO Optimized VMD

Due to the randomness of wind power output, it is hard for WPF models to capture the nonlinear and nonstationary patterns in raw data. VMD is used to decompose the wind power series into several components distributed over different frequency bands. However, the performance of VMD is highly sensitive to the selection of the mode number k and the penalty factor $α$. Conventional parameter selection methods rely on empirical tuning, which is inefficient and can lead to mode mixing and loss of frequency information. To tackle this problem, this study applies ITTAO to optimize k and $α$. Envelope entropy is used as the fitness function, with the optimization objective set to minimize its value to identify the optimal parameter combination. A lower envelope entropy indicates more distinct signal features. The expression of envelope entropy is given as:

\begin{cases} E_{p} = - \sum_{j = 1}^{N} p_{j} lg (p_{j}) \\ p_{j} = \frac{a (j)}{\sum_{j = 1}^{N} a (j)} \end{cases}

(21)

where N is the length of the IMF component, $a (j)$ is the envelope of the IMF component obtained via the Hilbert transform, and $p_{j}$ is the normalized form of $a (j)$.
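
Equation (21) can be sketched in Python as below. The sketch assumes that the envelope $a (j)$ has already been computed (for instance as the magnitude of the analytic signal returned by a Hilbert transform routine) and that $lg$ denotes the base-10 logarithm; the function name is hypothetical.

```python
import math


def envelope_entropy(envelope):
    """Eq. (21): entropy of the normalised envelope a(j) of one IMF."""
    total = sum(envelope)
    if total <= 0.0:
        raise ValueError("envelope must contain positive values")
    probabilities = [a / total for a in envelope]
    # Terms with p_j = 0 contribute nothing (limit of p * log p is 0).
    return -sum(p * math.log10(p) for p in probabilities if p > 0.0)
```

A flat envelope of length N gives the maximum value $lg (N)$, while an envelope concentrated in a single sample gives zero, which is consistent with lower entropy indicating more distinct signal features.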

The pseudocode of the ITTAO–VMD optimization is provided in Algorithm 1. The process begins by initializing the population and setting the search ranges of $[k, α]$. Then, the population is divided into several TTUs, and the fitness of each individual is calculated. During generic aggregation, new solutions are created through information exchange to improve search diversity. During local aggregation, perturbations are applied around the best individual to refine local search. The optimal solution is updated until the termination condition is satisfied. Finally, the best parameters are obtained and used for VMD decomposition.

Algorithm 1: ITTAO–VMD Optimization Algorithm

3.2. ITTAO Optimized BiLSTM

The performance of neural network models largely depends on hyperparameters such as the learning rate and the training epochs. To improve accuracy and reduce manual tuning, ITTAO is used to optimize the BiLSTM model. The pseudocode of ITTAO for optimizing BiLSTM is shown in Algorithm 2.

Algorithm 2: ITTAO–BiLSTM Hyperparameter Optimization

3.3. Adaptive Error-Based Weighting Strategy

The absolute prediction error of a forecasting component at the i-th sample is given as:

e_{i} = |h (i) - y (i)|

(22)

A sliding error sequence $W_{i}$ of length $L_{m}$, denoted as $W_{i} = \{e_{i - L_{m}}, e_{i - L_{m} + 1}, \dots, e_{i}\}$, is constructed, and the error stability index $δ_{i}$ is defined as:

δ_{i} = \frac{σ (W_{i})}{μ (W_{i})}

(23)

where $σ (\cdot)$ and $μ (\cdot)$ represent the standard deviation and mean operations. A smaller $δ_{i}$ indicates lower error fluctuations. The value of $δ_{i}$ is then mapped to a dynamic weighting factor $w_{i}$ using a sigmoid function:

w_{i} = \frac{1}{1 + exp [- λ (δ_{i} - δ_{0})]}

(24)

where $λ$ is the slope factor that controls the sensitivity of the weight adjustment, and $δ_{0}$ is the decision threshold. When $δ_{i}$ equals $δ_{0}$, $w_{i}$ is 0.5. Finally, the weighted prediction value of the component is given as:

{\hat{y}}_{w} (i) = w_{i} \cdot y (i)

(25)

To determine the optimal values of $L_{m}$, $λ$, and $δ_{0}$, a grid search algorithm can be employed with the objective of minimizing the MAE on the validation set.

(L_{m}, λ, δ_{0}) = arg min_{L_{m}, λ, δ_{0}} (\frac{1}{m} \sum_{i = 1}^{m} |h (i) - y (i)|)

(26)
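
Equations (22)–(25) can be sketched in Python as follows. Several details are assumptions because the text does not fix them: the population standard deviation is used for $σ (\cdot)$, windows are truncated at the start of the series, a zero mean error is mapped to $δ_{i} = 0$, and the window follows the listed indices $e_{i - L_{m}}, \dots, e_{i}$. The function names are hypothetical, and the sketch does not address how the errors are obtained at forecast time.

```python
import math
import statistics


def stability_index(errors):
    """Eq. (23): standard deviation of the windowed errors over their mean."""
    mean = statistics.fmean(errors)
    if mean == 0.0:
        return 0.0  # assumption: perfectly accurate window counts as stable
    return statistics.pstdev(errors) / mean


def dynamic_weights(actual, predicted, window, slope, threshold):
    """Eqs. (22)-(24): one sigmoid weight per sample of a component."""
    errors = [abs(h - y) for h, y in zip(actual, predicted)]  # Eq. (22)
    weights = []
    for i in range(len(errors)):
        recent = errors[max(0, i - window): i + 1]  # e_{i-L_m}, ..., e_i
        delta = stability_index(recent)
        weights.append(1.0 / (1.0 + math.exp(-slope * (delta - threshold))))
    return weights


def weighted_prediction(predicted, weights):
    """Eq. (25): scale each predicted value by its dynamic weight."""
    return [w * y for w, y in zip(weights, predicted)]
```

A grid search in the sense of Equation (26) would call `dynamic_weights` for each candidate triple of `window`, `slope`, and `threshold` and keep the one with the lowest validation MAE.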

3.4. Forecasting Workflow

Based on the above analysis, an ultra-short-term WPF model is developed by integrating the ITTAO algorithm with a high-frequency adaptive weighting strategy. The overall framework is presented in Figure 5, and the main steps are summarized as follows:

(1): Detect and remove outliers in the wind power data using a method based on the variance change rate and the interquartile range. Missing values are then filled through cubic spline interpolation.
(2): Calculate the maximal information coefficient (MIC) between each meteorological feature and wind power, and retain the features that exhibit strong correlation with wind power according to their MIC values.
(3): Use ITTAO to optimize the VMD parameters $[k, α]$, and then decompose the wind power series into multiple IMFs with different frequencies.
(4): Calculate the SE of each IMF, and then apply K-means clustering to reconstruct the IMFs into low-, mid-, and high-frequency sequences.
(5): Use ITTAO to optimize the BiLSTM hyperparameters and determine the parameters of the high-frequency dynamic weighting mechanism via grid search. The BiLSTM models are then trained on the training set using the optimal configuration.
(6): Input the test sets of each component into the trained models to obtain predictions. Apply dynamic weights to the high-frequency components and fuse them with the medium- and low-frequency predictions to obtain the final forecast results.

4. Experiment Analysis

4.1. Data Description

The data used for the experimental analysis come from the SCADA system of a wind farm located in North China. This wind farm has an installed capacity of 66 MW, with SCADA measurements taken at 15 min intervals. Twelve variables are collected at each time step, comprising wind speed and direction (°) measured at 10 m, 30 m, 50 m, and hub height, together with air temperature (°C), atmospheric pressure (hPa), relative humidity (%), and generated wind power (MW). The dataset is publicly available, and details can be found in [35]. It covers the period from 1 January 2019, to 31 December 2020. In this study, to comprehensively evaluate the model’s performance under different seasonal conditions, data from January, April, July, and October of 2020 are selected as representative samples for winter, spring, summer, and autumn, respectively. For each month, 70% is allocated for training, 15% for validation, and 15% for testing. The forecasting task is configured to predict the wind power output for the next time step (i.e., the upcoming 15 min) based on the historical data from the previous 4 time steps (spanning 1 h).
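
The forecasting setup described above (four past steps as input, the next step as target, and a 70/15/15 split) can be sketched in Python as below. The split is assumed to be chronological, which the text does not state explicitly, and the function names are hypothetical.

```python
def make_windows(series, lookback=4):
    """Pair each run of `lookback` past values with the next value."""
    inputs, targets = [], []
    for t in range(lookback, len(series)):
        inputs.append(series[t - lookback:t])
        targets.append(series[t])
    return inputs, targets


def chronological_split(items, train=0.70, val=0.15):
    """Split into train, validation, and test parts in time order."""
    n = len(items)
    n_train = round(n * train)
    n_val = round(n * val)
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )
```

At a 15 min resolution, `lookback=4` corresponds to the one hour of history used as model input.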

4.2. Data Processing and Decomposition

Given the consistency of data processing patterns across different seasons, the January dataset is selected as a representative case to illustrate the preprocessing and decomposition procedures. Wind power data often contain outliers from sensor faults, hardware wear, or communication errors, which can reduce forecasting accuracy if not handled. The variance change rate combined with the interquartile range-based method [36] is therefore used to detect and remove anomalies within each wind speed range. Detected anomalies are marked as missing and reconstructed using cubic spline interpolation [37]. All data are normalized to the $[0, 1]$ range using Equation (27).

y^{'} = \frac{y - y_{min}}{y_{max} - y_{min}}

(27)

where $y^{'}$ is the normalized value, y is the original value, and $y_{min}$ and $y_{max}$ are the minimum and maximum values of the variable, respectively.
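
Equation (27) and its inverse can be sketched in Python as below. The function names are hypothetical, and returning the minimum and maximum alongside the scaled values is a design choice of this sketch so that forecasts can be mapped back to the original units.

```python
def min_max_normalize(values):
    """Eq. (27): scale values to [0, 1]; also return the bounds used."""
    y_min, y_max = min(values), max(values)
    span = y_max - y_min
    if span == 0:
        raise ValueError("cannot normalize a constant series")
    return [(y - y_min) / span for y in values], y_min, y_max


def denormalize(scaled, y_min, y_max):
    """Invert Eq. (27) to recover values in the original units."""
    return [s * (y_max - y_min) + y_min for s in scaled]
```

For example, `min_max_normalize([0.0, 33.0, 66.0])` maps the three values to 0, 0.5, and 1.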

To reduce feature redundancy and computational overhead, MIC is used to quantify the correlations between meteorological variables and wind power [38]. The quantitative results are presented in Table 2. As observed, wind speed variables across different heights consistently exhibit dominant correlations, e.g., 0.8018 at the hub height. In contrast, other meteorological features, including wind direction at the hub height (0.1929), temperature (0.1877), pressure (0.1581), and relative humidity (0.1431), demonstrate significantly weaker dependencies. To strike an optimal balance between prediction accuracy and computational efficiency, only the wind speed is retained as the input feature.

The population size of ITTAO is set to 15, and the maximum number of iterations is set to 30. These parameters are selected based on preliminary trials to balance computational efficiency and optimization accuracy. Further increasing the population size or the number of iterations led to a significant increase in runtime with negligible improvement in the optimization results. The search ranges for the VMD parameters are $k \in [3, 13]$ and $α \in [100, 2500]$. Based on Algorithm 1, the optimal values are $k = 12$ and $α = 2129$. Applying these values to VMD yields several IMFs, and their time-domain waveforms and spectra are shown in Figure 6. The results show that the ITTAO-optimized VMD strategy cleanly separates complex signals, assigns components with close frequencies to distinct bands, and keeps each spectrum free of aliasing.

Table 2. MIC values between meteorological variables and wind power.

Variable	MIC Value	Variable	MIC Value
Wind speed at 10 m	0.8393	Wind direction at 10 m	0.2954
Wind speed at 30 m	0.8054	Wind direction at 30 m	0.2735
Wind speed at 50 m	0.8044	Wind direction at 50 m	0.2064
Wind speed at hub height	0.8018	Wind direction at hub height	0.1929
Air temperature	0.1877	Atmospheric pressure	0.1581
Relative humidity	0.1431

To improve modeling efficiency and reduce redundancy among similar components, the decomposed IMFs are grouped based on their complexity degree. The SE of each IMF is computed, and K-means clustering is performed based on the SE values to reconstruct the IMFs [39]. The corresponding SE values and clustering results are illustrated in Figure 7. As seen from Figure 7, IMF1–IMF5 with smooth variations and similar SE values are grouped as one category, IMF6–IMF9 with moderate SE values as another, and IMF10–IMF12 with sharp variations as the third. These categories are reconstructed as the low-frequency ${IMF}_{lf}$, mid-frequency ${IMF}_{mf}$, and high-frequency ${IMF}_{hf}$ components.
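The grouping step can be sketched in a few lines, assuming that SE denotes sample entropy with the common settings $m = 2$ and $r = 0.2σ$, and that clustering is ordinary one-dimensional K-means with three clusters. The initialization rule, the function names, and the twelve SE values below are hypothetical and are not read from Figure 7.

```python
import math
import random
import statistics


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy with tolerance r = r_factor * std(x), Chebyshev distance."""
    n = len(x)
    r = r_factor * statistics.pstdev(x)

    def count_matches(length):
        # Use the same n - m template start points for both template lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        hits = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    hits += 1
        return hits

    b, a = count_matches(m), count_matches(m + 1)
    return float("inf") if a == 0 or b == 0 else -math.log(a / b)


def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's algorithm on scalars with deterministic quantile-based initial centers."""
    ordered = sorted(values)
    centers = [ordered[(2 * c + 1) * len(ordered) // (2 * k)] for c in range(k)]
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels, centers


# A smooth sine should score lower than white noise.
rng = random.Random(0)
se_sine = sample_entropy([math.sin(2 * math.pi * i / 50) for i in range(200)])
se_noise = sample_entropy([rng.random() for _ in range(200)])

# Hypothetical SE values for IMF1..IMF12 (illustrative only).
se_values = [0.05, 0.06, 0.07, 0.08, 0.09, 0.30, 0.32, 0.35, 0.33, 0.80, 0.85, 0.90]
labels, centers = kmeans_1d(se_values, k=3)
```

With these illustrative values the three clusters correspond to the first five, the next four, and the last three components; summing the IMFs within each cluster would then give the low-, mid-, and high-frequency series.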

4.3. Experimental Setup and Evaluation Metrics

To guarantee numerical stability, the BiLSTM model incorporates a fixed dropout rate of 0.1 and a gradient clipping threshold of 1. The optimization process jointly targets the adaptive weighting parameters and BiLSTM hyperparameters, including the learning rate, batch size, hidden neurons, and L2 coefficient, as detailed in Table 3. In addition, to mitigate stochastic effects, all reported results reflect the statistical aggregate of 10 independent trials conducted with distinct random seeds.

To evaluate the predictive accuracy of the proposed WPF model, this study employs the Normalized Mean Absolute Error ($e_{NMAE}$), the Normalized Root Mean Square Error ($e_{NRMSE}$), and the coefficient of determination ($R^{2}$) as performance metrics. Their mathematical definitions are as follows:

e_{NMAE} = \frac{1}{m} \sum_{i = 1}^{m} \frac{| h_{i} - y_{i} |}{P_{cap}} \times 100\%

(28)

e_{NRMSE} = \frac{1}{P_{cap}} \sqrt{\frac{1}{m} \sum_{i = 1}^{m} {(h_{i} - y_{i})}^{2}} \times 100\%

(29)

R^{2} = 1 - \frac{\sum_{i = 1}^{m} {(h_{i} - y_{i})}^{2}}{\sum_{i = 1}^{m} {(h_{i} - \frac{1}{m} \sum_{j = 1}^{m} h_{j})}^{2}}

(30)

where $h_{i}$ and $y_{i}$ denote the actual and predicted values of the $i$-th sample, respectively, $m$ represents the total number of samples, and $P_{cap}$ represents the installed capacity of the wind farm.
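Equations (28)–(30) translate directly into code. The sketch below is a plain-Python rendering of the three definitions; the function names and the toy numbers are illustrative and do not come from the experiments.

```python
import math


def nmae(actual, predicted, p_cap):
    """Equation (28): mean absolute error normalized by installed capacity, in %."""
    m = len(actual)
    return sum(abs(h - y) for h, y in zip(actual, predicted)) / (m * p_cap) * 100.0


def nrmse(actual, predicted, p_cap):
    """Equation (29): root mean square error normalized by installed capacity, in %."""
    m = len(actual)
    mse = sum((h - y) ** 2 for h, y in zip(actual, predicted)) / m
    return math.sqrt(mse) / p_cap * 100.0


def r_squared(actual, predicted):
    """Equation (30): coefficient of determination, as a fraction (not %)."""
    mean_h = sum(actual) / len(actual)
    ss_res = sum((h - y) ** 2 for h, y in zip(actual, predicted))
    ss_tot = sum((h - mean_h) ** 2 for h in actual)
    return 1.0 - ss_res / ss_tot


# Toy example: three samples from a hypothetical 100 MW wind farm.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 30.0]
toy_nmae = nmae(actual, predicted, p_cap=100.0)
toy_nrmse = nrmse(actual, predicted, p_cap=100.0)
toy_r2 = r_squared(actual, predicted)
```

Normalizing by $P_{cap}$ rather than by the instantaneous power keeps the error bounded when the actual output is close to zero, which is why both error metrics remain comparable across months with very different mean output.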

4.4. Performance Comparison with Benchmark Models

This section evaluates the forecasting performance of the proposed ITTAO-VMD-BiLSTM framework equipped with a high-frequency adaptive weighting strategy. The model is benchmarked against five baseline methods: BP, BiLSTM, CNN-BiLSTM, TCN-BiGRU, and CEEMDAN-BiLSTM. Figure 8 and Table 4 provide visual and quantitative comparisons across four representative months (January, April, July, and October).

As illustrated in Figure 8, the proposed model aligns most closely with the actual wind power values across all seasonal scenarios. In contrast, standalone models such as BP and BiLSTM deviate noticeably from the ground truth. Furthermore, although hybrid deep learning models, e.g., CNN-BiLSTM and TCN-BiGRU, enhance prediction accuracy to a certain extent through feature extraction mechanisms, they still underperform compared to decomposition-based approaches and the proposed framework.

The numerical results in Table 4 further confirm these observations. The proposed framework consistently achieves the lowest error metrics across all seasons. Compared with traditional benchmarks and advanced hybrid models, the proposed method substantially reduces both $e_{NMAE}$ and $e_{NRMSE}$ while maintaining high $R^{2}$ values. Specifically, in Month 10, which features relatively smooth power profiles, the model yields an $e_{NMAE}$ of 1.40%, an $e_{NRMSE}$ of 2.03%, and an $R^{2}$ of 99.70%. Furthermore, even in Month 7, which represents the most volatile period, the proposed model retains an $e_{NMAE}$ of 1.33% and an $e_{NRMSE}$ of 2.20%. These figures are markedly lower than those of the BP (2.62%, 4.96%) and CEEMDAN-BiLSTM (1.51%, 2.57%) models. These results across diverse conditions underscore the robust generalization and superior stability of the proposed framework for ultra-short-term WPF.

To demonstrate the robustness of the reported improvements, January and July are selected as representative case studies for a detailed statistical analysis, comprising Mean ± Std, 95% confidence intervals, and p-values. The corresponding results are summarized in Table 5 and Table 6, respectively.

Table 4. Comparison of prediction accuracy metrics for different forecasting models (all values in %).

Month	Indicator	BP	BiLSTM	CNN-BiLSTM	TCN-BiGRU	CEEMDAN-BiLSTM	Proposed
1	$e_{NMAE}$	2.22	1.83	1.52	1.33	0.92	0.74
	$e_{NRMSE}$	3.17	2.61	2.13	1.92	1.17	0.94
	$R^{2}$	94.59	96.42	97.59	98.05	99.27	99.53
4	$e_{NMAE}$	1.64	1.10	0.96	0.99	0.93	0.82
	$e_{NRMSE}$	3.69	2.25	1.95	1.88	1.56	1.37
	$R^{2}$	87.37	95.17	96.35	96.52	97.64	98.21
7	$e_{NMAE}$	2.62	1.95	1.53	1.56	1.51	1.33
	$e_{NRMSE}$	4.96	3.65	3.07	2.82	2.57	2.20
	$R^{2}$	91.72	95.06	96.34	96.97	97.41	98.18
10	$e_{NMAE}$	4.41	3.64	2.19	1.78	1.78	1.40
	$e_{NRMSE}$	6.23	5.63	3.39	2.87	2.73	2.03
	$R^{2}$	97.32	97.82	99.20	99.43	99.46	99.70

Table 5. Statistical significance analysis for WPF models based on the January dataset.

Model	$e_{NMAE}$ Mean ± Std	$e_{NMAE}$ 95% CI	$e_{NMAE}$ $p$-value	$e_{NRMSE}$ Mean ± Std	$e_{NRMSE}$ 95% CI	$e_{NRMSE}$ $p$-value
BP	2.22 ± 0.28	[2.09, 2.36]	<0.001	3.16 ± 0.57	[2.90, 3.43]	<0.001
BiLSTM	1.82 ± 0.14	[1.76, 1.89]	<0.001	2.60 ± 0.21	[2.51, 2.70]	<0.001
CNN-BiLSTM	1.52 ± 0.14	[1.46, 1.59]	<0.001	2.13 ± 0.19	[2.04, 2.22]	<0.001
TCN-BiGRU	1.33 ± 0.05	[1.31, 1.36]	<0.001	1.92 ± 0.03	[1.91, 1.94]	<0.001
CEEMDAN-BiLSTM	0.91 ± 0.12	[0.86, 0.97]	<0.001	1.17 ± 0.14	[1.11, 1.24]	<0.001
Proposed	0.74 ± 0.08	[0.70, 0.78]	/	0.94 ± 0.10	[0.90, 0.99]	/

Table 6. Statistical significance analysis for WPF models based on the July dataset.

Model	$e_{NMAE}$ Mean ± Std	$e_{NMAE}$ 95% CI	$e_{NMAE}$ $p$-value	$e_{NRMSE}$ Mean ± Std	$e_{NRMSE}$ 95% CI	$e_{NRMSE}$ $p$-value
BP	2.62 ± 0.09	[2.58, 2.67]	<0.001	4.97 ± 0.18	[4.89, 5.05]	<0.001
BiLSTM	1.87 ± 0.13	[1.81, 1.93]	<0.001	3.65 ± 0.24	[3.54, 3.76]	<0.001
CNN-BiLSTM	1.57 ± 0.08	[1.53, 1.61]	<0.001	3.08 ± 0.16	[3.00, 3.15]	<0.001
TCN-BiGRU	1.57 ± 0.04	[1.55, 1.59]	<0.001	2.83 ± 0.03	[2.82, 2.84]	<0.001
CEEMDAN-BiLSTM	1.52 ± 0.03	[1.50, 1.53]	<0.001	2.51 ± 0.06	[2.49, 2.54]	<0.001
Proposed	1.33 ± 0.03	[1.32, 1.34]	/	2.20 ± 0.04	[2.18, 2.22]	/

According to Table 5 and Table 6, the p-values for comparisons between the proposed model and all baseline methods are below 0.001, indicating that the observed improvements are statistically significant. These results support the conclusion that the proposed ITTAO-VMD-BiLSTM provides improved forecasting accuracy compared with state-of-the-art benchmarks.
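For readers who want to produce the same kind of summary from repeated runs, the sketch below shows one common convention: the sample mean, the sample standard deviation, and a two-sided 95% confidence interval based on Student's t distribution. This section does not state how the intervals or $p$-values in Tables 5 and 6 were computed, so the convention, the function name, and the ten run values are assumptions, and the tabulated intervals may follow a different rule.

```python
import math
import statistics

T_CRIT_DF9 = 2.262  # two-sided 95% Student-t critical value for 9 degrees of freedom


def summarize_runs(errors, t_crit=T_CRIT_DF9):
    """Return (mean, sample std, 95% CI); the default t_crit is valid for 10 runs only."""
    n = len(errors)
    mean = statistics.mean(errors)
    std = statistics.stdev(errors)
    half_width = t_crit * std / math.sqrt(n)
    return mean, std, (mean - half_width, mean + half_width)


# Hypothetical NMAE values (%) from 10 seeded runs; not data from the paper.
runs = [0.70, 0.72, 0.74, 0.76, 0.78, 0.71, 0.73, 0.75, 0.77, 0.74]
mean, std, ci = summarize_runs(runs)
```

A significance test between two models would additionally compare their run-level errors, for example with a paired or Welch t-test; that step is omitted here because it needs the t distribution's cumulative function, which the standard library does not provide.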

In addition, to provide a comprehensive insight into the error distribution and tail behavior, Figure 9 presents the boxplots of absolute errors for the representative months of January and July. As observed in the figures, the proposed method exhibits a much more compact error distribution with significantly fewer and smaller outliers. This quantifiably demonstrates the model’s robustness to extreme fluctuations and its superior ability to handle volatile wind power ramps.

4.5. Ablation Experiment

To verify the effectiveness of each module in the hybrid model, an ablation study is conducted on the July dataset, and the results are shown in Table 7. Using BiLSTM alone yields an $e_{NMAE}$ of 1.95%, an $e_{NRMSE}$ of 3.65%, and an $R^{2}$ of 94.96%. After adding VMD decomposition and IMF clustering, the $e_{NMAE}$ decreases to 1.60%, the $e_{NRMSE}$ decreases to 2.62%, and the $R^{2}$ improves to 97.35%, indicating that signal decomposition and clustering significantly enhance prediction performance. Using TTAO to optimize the hyperparameters of VMD and BiLSTM further lowers the $e_{NMAE}$ to 1.51% and the $e_{NRMSE}$ to 2.47%, and raises the $R^{2}$ to 97.46%. Replacing TTAO with the multi-strategy ITTAO improves the three metrics to 1.45%, 2.43%, and 97.79%, respectively. Finally, integrating the adaptive weighting strategy brings the $e_{NMAE}$ to 1.35%, the $e_{NRMSE}$ to 2.21%, and the $R^{2}$ to 98.18%. Each added module therefore improves all three metrics, which demonstrates the effectiveness and necessity of each module in the proposed method.

4.6. Comparative Analysis of Optimization Algorithms

The robust optimization performance of the proposed ITTAO is assessed by benchmarking it against the original TTAO and several mainstream alternatives, including Particle Swarm Optimization (PSO), Grey Wolf Optimizer (GWO), and Whale Optimization Algorithm (WOA). First, a comprehensive evaluation is carried out on six standard CEC benchmark functions, covering both unimodal and multimodal cases, as listed in Table 8. Each algorithm is executed independently 30 times. The population size is uniformly set to 30, and the maximum number of iterations is fixed at 500.
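The six functions in Table 8 are commonly known as the sphere, Schwefel 2.22, Schwefel 1.2, quartic-with-noise, Rastrigin, and Ackley functions. A minimal sketch of them is given below; the noise term $ε$ in $f_{4}$ is assumed to be uniform on $[0, 1)$, which is the usual convention but is not defined in this section.

```python
import math
import random


def f1(x):
    """Sphere."""
    return sum(v * v for v in x)


def f2(x):
    """Schwefel 2.22: sum plus product of absolute values."""
    abs_x = [abs(v) for v in x]
    return sum(abs_x) + math.prod(abs_x)


def f3(x):
    """Schwefel 1.2: sum of squared prefix sums."""
    total, running = 0.0, 0.0
    for v in x:
        running += v
        total += running ** 2
    return total


def f4(x, rng=random):
    """Quartic with noise; the noise term is assumed uniform on [0, 1)."""
    return sum(i * v ** 4 for i, v in enumerate(x, start=1)) + rng.random()


def f5(x):
    """Rastrigin."""
    return sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) + 10.0 for v in x)


def f6(x):
    """Ackley."""
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    mean_cos = sum(math.cos(2.0 * math.pi * v) for v in x) / n
    return -20.0 * math.exp(-0.2 * math.sqrt(mean_sq)) - math.exp(mean_cos) + 20.0 + math.e


origin = [0.0] * 30  # global minimizer of all six functions in 30 dimensions
```

All six functions attain their minimum at the origin, with a minimum value of zero apart from the noise term in $f_{4}$. The first four are unimodal, while $f_{5}$ and $f_{6}$ have many local minima, so the last two probe an optimizer's ability to escape local optima.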

Table 9 presents the statistical results, including the mean and standard deviation of the optimization solutions. As observed, the proposed ITTAO consistently achieves the lowest mean fitness values and standard deviations across all test functions. In contrast, standard algorithms such as PSO and the original TTAO tend to be trapped in local optima, leading to slower convergence. Figure 10 further illustrates the convergence curves of the different algorithms. ITTAO exhibits a faster convergence rate, reaching the theoretical optimal value in fewer iterations than the baselines. These empirical results confirm that the introduced improvement strategies significantly enhance the algorithm’s global search capability and stability, making it well suited to complex parameter optimization.

The effectiveness of ITTAO in optimizing VMD and BiLSTM hyperparameters is further evaluated using the July dataset. To ensure a fair comparison, the VMD-BiLSTM framework and the adaptive weighting strategy remain consistent across all tested algorithms. The statistical results are summarized in Table 10. As observed in Table 10, the prediction accuracies across different methods are relatively close. This is because all optimization algorithms successfully converged to effective parameter configurations. Nevertheless, the ITTAO-optimized model consistently outperforms those based on PSO, WOA, GWO, and the original TTAO. Specifically, compared with the original TTAO, ITTAO achieves reductions of 7.64% in $e_{MAE}$ and 7.95% in $e_{RMSE}$, along with an absolute improvement of 0.43% in $R^{2}$. These results confirm the effectiveness of the proposed optimization algorithm for WPF.
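The quoted gains follow directly from the TTAO and ITTAO rows of Table 10. The short check below recomputes them; the helper name is illustrative.

```python
def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Values copied from Table 10.
ttao = {"e_MAE": 1.44, "e_RMSE": 2.39, "R2": 97.75}
ittao = {"e_MAE": 1.33, "e_RMSE": 2.20, "R2": 98.18}

mae_gain = round(relative_reduction(ttao["e_MAE"], ittao["e_MAE"]), 2)     # 7.64
rmse_gain = round(relative_reduction(ttao["e_RMSE"], ittao["e_RMSE"]), 2)  # 7.95
r2_gain = round(ittao["R2"] - ttao["R2"], 2)                               # 0.43
```

The two error gains are relative changes, whereas the $R^{2}$ gain is a difference in percentage points, so the three numbers are not on the same scale.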

5. Conclusions

This study proposes an innovative ensemble forecasting method designed to achieve more accurate ultra-short-term WPF by integrating a multi-strategy improved TTAO, signal decomposition and reconstruction, and a high-frequency adaptive weighting technique. Compared with traditional hybrid models, this work optimizes both data decomposition and predictive modeling. Specifically, ITTAO overcomes premature convergence to achieve precise global parameter optimization for VMD and BiLSTM. Simultaneously, the high-frequency adaptive weighting strategy dynamically adjusts volatile sequences, simplifying the prediction task while facilitating the deep mining of intrinsic features.

Experimental results across four seasons demonstrate that the proposed ITTAO-VMD-BiLSTM model consistently outperforms five baselines. Notably, on the highly volatile July dataset, the model achieved an NMAE of 1.33%, an NRMSE of 2.20%, and an $R^{2}$ of 98.18%. Compared to the classical BP neural network and the advanced CEEMDAN-BiLSTM model, the proposed method reduced the NMAE by 49.24% and 11.92%, decreased the NRMSE by 55.65% and 14.40%, and improved the $R^{2}$ by 6.46 and 0.77 percentage points, respectively. These quantitative findings confirm that the combined optimization and high-frequency adaptive weighting strategies effectively suppress noise and enhance prediction accuracy, establishing the model as a robust tool for wind power forecasting.

It is important to note that forecast accuracy is not merely a technical metric but a critical economic determinant. In modern electricity markets, large deviations between forecasted and actual power generation incur substantial financial penalties. Therefore, minimizing error metrics such as NMAE and NRMSE directly correlates with reduced penalty costs and optimized operational revenue. While this study focuses on the technical improvement of forecasting models, the translation of these accuracy gains into specific financial values remains a promising avenue for future research.

Author Contributions

Conceptualization, X.W. and S.L.; methodology, X.W. and Y.H.; software, X.W. and Y.H.; validation, X.W. and J.P.; writing—original draft preparation, Y.Y. and L.Z.; writing—review and editing, X.B. and H.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (52477127) and the Science and Technology Project of Sichuan Keride Industrial Group Co., Ltd. (U22010032ZJGCS005202311160003).

Data Availability Statement

The data that support the findings of this study are openly available in Scientific Data at https://doi.org/10.1038/s41597-022-01696-6.

Conflicts of Interest

Authors Xiaoming Wang, Yan Huang, Youqing Yang, Lin Zhang and Xiaolong Bai were employed by the company Chengdu Chengdian Power Design Co., Ltd. Author Jing Pu was employed by the company Sichuan Keride Industrial Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. Regarding the commercial funding, the funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.

References

1. Lee, J.; Zhao, F. Global Wind Report 2024; Global Wind Energy Council: Brussels, Belgium, 2024.
2. Wang, S.; Chang, L.; Liu, H.; Chang, Y.; Xue, Q. Short-term prediction of wind power based on temporal convolutional network and the informer model. IET Gener. Transm. Distrib. 2024, 18, 941–951.
3. Li, L.L.; Zhao, X.; Tseng, M.L.; Tan, R.R. Short-term wind power forecasting based on support vector machine with improved dragonfly algorithm. J. Clean. Prod. 2020, 242, 118447.
4. Giebel, G.; Kariniotakis, G. Wind power forecasting—A review of the state of the art. In Renewable Energy Forecasting; Kariniotakis, G., Ed.; Woodhead Publishing: Kidlington, UK, 2017; pp. 59–109.
5. Al-Yahyai, S.; Charabi, Y.; Gastli, A. Review of the use of numerical weather prediction (NWP) models for wind energy assessment. Renew. Sustain. Energy Rev. 2010, 14, 3192–3198.
6. Liu, H.; Tian, H.; Li, Y. Comparison of two new ARIMA-ANN and ARIMA-Kalman hybrid methods for wind speed prediction. Appl. Energy 2012, 98, 415–424.
7. Eldali, F.A.; Hansen, T.M.; Suryanarayanan, S.; Chong, E.K.P. Employing ARIMA models to improve wind power forecasts: A case study in ERCOT. In Proceedings of the 2016 North American Power Symposium (NAPS), Denver, CO, USA, 18–20 September 2016; pp. 1–6.
8. Rajagopalan, S.; Santoso, S. Wind power forecasting and error analysis using the autoregressive moving average modeling. In Proceedings of the 2009 IEEE Power & Energy Society General Meeting, Calgary, AB, Canada, 26–30 July 2009; pp. 1–6.
9. Hanifi, S.; Liu, X.; Lin, Z.; Lotfian, S. A critical review of wind power forecasting methods—Past, present and future. Energies 2020, 13, 3764.
10. Hu, Y.L.; Chen, L. A nonlinear hybrid wind speed forecasting model using LSTM network, hysteretic ELM and Differential Evolution algorithm. Energy Convers. Manag. 2018, 173, 123–142.
11. Karasu, S.; Altan, A.; Saraç, Z.; Hacıoğlu, R. Estimation of fast varied wind speed based on NARX neural network by using curve fitting. Int. J. Energy Appl. Technol. 2017, 4, 137–146.
12. Viet, D.T.; Phuong, V.V.; Duong, M.Q.; Tran, Q.T. Models for short-term wind power forecasting based on improved artificial neural network using particle swarm optimization and genetic algorithms. Energies 2020, 13, 2873.
13. Li, N.; He, F.; Ma, W. Wind power prediction based on extreme learning machine with kernel mean p-power error loss. Energies 2019, 12, 673.
14. Karasu, S.; Altan, A.; Saraç, Z.; Hacıoğlu, R. Estimation of wind speed by using regression learners with different filtering methods. In Proceedings of the 1st International Conference on Energy Systems Engineering, Karabuk, Turkey, 2–4 November 2017.
15. Wang, H.; Lei, Z.; Zhang, X.; Zhou, B.; Peng, J. A review of deep learning for renewable energy forecasting. Energy Convers. Manag. 2019, 198, 111799.
16. Wang, Y.; Zou, R.; Liu, F.; Zhang, L.; Liu, Q. A review of wind speed and wind power forecasting with deep neural networks. Appl. Energy 2021, 304, 117766.
17. Yildiz, C.; Acikgoz, H.; Korkmaz, D.; Budak, U. An improved residual-based convolutional neural network for very short-term wind power forecasting. Energy Convers. Manag. 2021, 228, 113731.
18. Hu, S.; Xiang, Y.; Huo, D.; Jawad, S.; Liu, J. An improved deep belief network based hybrid forecasting method for wind power. Energy 2021, 224, 120185.
19. Zhao, Y.; Li, L.; Guo, Y.; Shi, B.; Sun, H. Short-term wind power prediction based on combined long short-term memory. IET Gener. Transm. Distrib. 2024, 18, 931–940.
20. Duan, J.; Wang, P.; Ma, W.; Fang, S.; Hou, Z. A novel hybrid model based on nonlinear weighted combination for short-term wind power forecasting. Int. J. Electr. Power Energy Syst. 2022, 134, 107452.
21. Azimi, R.; Ghofrani, M.; Ghayekhloo, M. A hybrid wind power forecasting model based on data mining and wavelets analysis. Energy Convers. Manag. 2016, 127, 208–225.
22. Zhao, Y.; Jia, L. A short-term hybrid wind power prediction model based on singular spectrum analysis and temporal convolutional networks. J. Renew. Sustain. Energy 2020, 12, 5.
23. Abedinia, O.; Lotfi, M.; Bagheri, M.; Sobhani, B.; Shafie-Khah, M.; Catalão, J.P.S. Improved EMD-based complex prediction model for wind power forecasting. IEEE Trans. Sustain. Energy 2020, 11, 2790–2802.
24. Yue, Y.; Zheng, W.; Wu, A.; Jin, X.; Huang, Z.; Zhang, H. Ultra-short-term wind speed forecasting based on secondary decomposition and Transformer-MLR combined model. Electr. Power Syst. Res. 2025, 246, 111702.
25. Liu, J.; Deng, J.; Gao, P.; Liu, H.; Sun, S. Ultra short term wind power prediction based on VMD-IMPA-SVM. Electr. Predict. Optim. 2024, 52, 24–31+79.
26. Lv, Q.; Zhang, J.; Zhang, J.; Zhang, Z.; Zhou, Q.; Gao, P.; Zhang, H. Short-term wind power prediction model based on PSO-CNN-LSTM. Energies 2025, 18, 3346.
27. Qin, G.; Yan, Q.; Zhu, J.; Xu, C.; Kammen, D.M. Day-ahead wind power forecasting based on wind load data using hybrid optimization algorithm. Sustainability 2021, 13, 1164.
28. Gao, X.; Guo, W.; Mei, C.; Sha, J.; Guo, Y.; Sun, H. Short-term wind power forecasting based on SSA-VMD-LSTM. Energy Rep. 2023, 9, 335–344.
29. Zhao, S.; Zhang, T.; Cai, L.; Yang, R. Triangulation topology aggregation optimizer: A novel mathematics-based meta-heuristic algorithm for continuous optimization and engineering applications. Expert Syst. Appl. 2024, 238, 121744.
30. Liu, J.; Deng, Y.; Liu, Y.; Chen, L.; Hu, Z.; Wei, P.; Li, Z. A logistic-tent chaotic mapping Levenberg Marquardt algorithm for improving positioning accuracy of grinding robot. Sci. Rep. 2024, 14, 9649.
31. Tanyildizi, E.; Demir, G. Golden sine algorithm: A novel math-inspired algorithm. Adv. Electr. Comput. Eng. 2017, 17, 2.
32. Tizhoosh, H.R. Opposition-based reinforcement learning. J. Adv. Comput. Intell. Intell. Inform. 2006, 10, 578–585.
33. Dragomiretskiy, K.; Zosso, D. Variational mode decomposition. IEEE Trans. Signal Process. 2013, 62, 531–544.
34. Siami-Namini, S.; Tavakoli, N.; Namin, A.S. The performance of LSTM and BiLSTM in forecasting time series. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 3285–3292.
35. Chen, Y.; Xu, J. Solar and wind power data from the Chinese state grid renewable energy generation forecasting competition. Sci. Data 2022, 9, 577.
36. Zhao, Y.; Ye, L.; Wang, W.; Sun, H.; Ju, Y.; Tang, Y. Data-driven correction approach to refine power curve of wind farm under wind curtailment. IEEE Trans. Sustain. Energy 2017, 9, 95–105.
37. Cheng, M.; Qian, W.; He, X. The research on wind power prediction based on wind speed correction. In Proceedings of the 42nd Chinese Control Conference (CCC), Xiamen, China, 24–26 July 2023; pp. 6309–6314.
38. Liang, T.; Zhang, Q.; Liu, X.; Lou, C.; Liu, X.; Wang, H. Time-frequency maximal information coefficient method and its application to functional corticomuscular coupling. IEEE Trans. Neural Syst. Rehabil. Eng. 2020, 28, 2515–2524.
39. Sun, Z.; Zhao, S.; Zhang, J. Short-term wind power forecasting on multiple scales using VMD decomposition, K-means clustering and LSTM principal computing. IEEE Access 2019, 7, 166917–166929.

Figure 1. Diagram of TTUs in two- and three-dimensional domains. (a) two-dimensional space. (b) three-dimensional space.

Figure 2. Generic aggregation of triangular topological units.

Figure 3. Flowchart of the ITTAO algorithm.

Figure 4. LSTM basic framework.

Figure 5. Overall flowchart of the proposed WPF method.

Figure 6. ITTAO–VMD decomposition results.

Figure 7. SE values and clustering results of the IMFs.

Figure 8. Forecasting curves of different models across different months. (a) January, (b) April, (c) July, (d) October.

Figure 9. Boxplots of error metrics for different forecasting models in representative months. (a) January, (b) July.

Figure 10. Convergence curves of the test functions. (a) $f_{1}$, (b) $f_{2}$, (c) $f_{3}$, (d) $f_{4}$, (e) $f_{5}$, (f) $f_{6}$.

Table 3. Hyperparameter search ranges and optimal settings.

Parameter	Search Range	Optimal Value
learning rate	$[0.001, 0.02]$	0.012
number of hidden-layer neurons	$[20, 128]$	128
regularization coefficient	$[0.001, 0.01]$	0.01
batch size	$[16, 128]$	32
$L_{m}$	$[2, 10]$	4
$λ$	$[0.1, 10]$	2.1
$δ_{0}$	$[0.2, 1.2]$	0.62

Table 7. Ablation experiment of network structure.

Models	$e_{NMAE}$ /%	$e_{NRMSE}$ /%	$R^{2}$ /%
BiLSTM	1.95	3.65	94.96
VMD-BiLSTM	1.60	2.62	97.35
TTAO-VMD-BiLSTM	1.51	2.47	97.46
ITTAO-VMD-BiLSTM	1.45	2.43	97.79
Proposed method	1.35	2.21	98.18

Table 8. Details of the benchmark test functions.

Benchmark Test Function	Dimension	Range
$f_{1} (x) = \sum_{i = 1}^{n} x_{i}^{2}$	30	$[- 100, 100]$
$f_{2} (x) = \sum_{i = 1}^{n} | x_{i} | + \prod_{i = 1}^{n} | x_{i} |$	30	$[- 10, 10]$
$f_{3} (x) = \sum_{i = 1}^{n} {(\sum_{j = 1}^{i} x_{j})}^{2}$	30	$[- 100, 100]$
$f_{4} (x) = \sum_{i = 1}^{n} i x_{i}^{4} + ε$	30	$[- 1.28, 1.28]$
$f_{5} (x) = \sum_{i = 1}^{n} [x_{i}^{2} - 10 \cos (2 π x_{i}) + 10]$	30	$[- 5.12, 5.12]$
$f_{6} (x) = - 20 \exp (- 0.2 \sqrt{\frac{1}{n} \sum_{i = 1}^{n} x_{i}^{2}}) - \exp (\frac{1}{n} \sum_{i = 1}^{n} \cos (2 π x_{i})) + 20 + e$	30	$[- 32, 32]$

Table 9. Optimization results of each algorithm.

Function	Metric	PSO	WOA	GWO	TTAO	ITTAO
$f_{1}$	Mean	$4.17 \times 10^{2}$	$2.20 \times 10^{- 75}$	$9.16 \times 10^{- 28}$	$6.32 \times 10^{- 1}$	$0.00$
$f_{1}$	Std	$2.16 \times 10^{2}$	$6.93 \times 10^{- 75}$	$1.13 \times 10^{- 27}$	$4.21 \times 10^{- 1}$	$0.00$
$f_{2}$	Mean	$1.80 \times 10^{1}$	$6.58 \times 10^{- 50}$	$9.44 \times 10^{- 17}$	$5.36 \times 10^{0}$	$0.00$
$f_{2}$	Std	$9.64 \times 10^{0}$	$2.88 \times 10^{- 49}$	$7.55 \times 10^{- 17}$	$3.16 \times 10^{0}$	$0.00$
$f_{3}$	Mean	$8.30 \times 10^{3}$	$4.08 \times 10^{4}$	$2.01 \times 10^{- 5}$	$6.36 \times 10^{2}$	$0.00$
$f_{3}$	Std	$4.89 \times 10^{3}$	$1.57 \times 10^{4}$	$5.09 \times 10^{- 5}$	$2.14 \times 10^{2}$	$0.00$
$f_{4}$	Mean	$2.43 \times 10^{0}$	$3.88 \times 10^{- 3}$	$2.07 \times 10^{- 3}$	$2.90 \times 10^{- 1}$	$2.54 \times 10^{- 4}$
$f_{4}$	Std	$5.12 \times 10^{0}$	$5.16 \times 10^{- 3}$	$1.62 \times 10^{- 3}$	$8.56 \times 10^{- 2}$	$1.97 \times 10^{- 4}$
$f_{5}$	Mean	$1.95 \times 10^{2}$	$1.89 \times 10^{- 15}$	$2.70 \times 10^{0}$	$5.34 \times 10^{1}$	$0.00$
$f_{5}$	Std	$3.21 \times 10^{1}$	$1.04 \times 10^{- 14}$	$5.00 \times 10^{0}$	$1.26 \times 10^{1}$	$0.00$
$f_{6}$	Mean	$6.43 \times 10^{0}$	$3.76 \times 10^{- 15}$	$9.53 \times 10^{- 14}$	$6.92 \times 10^{0}$	$0.00$
$f_{6}$	Std	$2.72 \times 10^{0}$	$2.63 \times 10^{- 15}$	$1.97 \times 10^{- 14}$	$1.82 \times 10^{0}$	$0.00$

Table 10. Comparison of error metrics of different optimization algorithms.

Optimization Algorithms	$e_{MAE}$ /MW	$e_{RMSE}$ /MW	$R^{2}$ /%
PSO	1.71	2.84	96.99
WOA	1.39	2.26	98.06
GWO	1.42	2.31	98.01
TTAO	1.44	2.39	97.75
ITTAO	1.33	2.20	98.18
