1. Introduction
With the intensive exploitation of fossil fuels and other traditional non-renewable energy sources, global warming and energy overconsumption have become increasingly severe issues. Wind power, as a clean and green energy form, is widely distributed and produces no greenhouse gas emissions, minimizing negative environmental impacts. Consequently, it has gained extensive attention and application worldwide [1,2,3]. However, the wind power generation process is influenced by various factors, exhibiting strong volatility and randomness, which significantly affect grid stability during large-scale integration. The complex dynamic behaviours inherent in wind power time series essentially reflect the dynamic equilibrium between symmetry and symmetry breaking in natural systems. Examples include the recurring “stationary-disturbance” symmetric patterns in short-term fluctuations, the asymmetric characteristics of power responses under different wind conditions, and the hierarchical symmetric correlations among multi-scale fluctuations. Exploring these symmetry correlations is crucial for revealing the evolutionary laws of wind power and for moving beyond the passive fitting to “disorder” that characterizes traditional forecasting models. At the same time, identifying symmetric structures within the fluctuations can effectively reduce the modelling dimensionality of complex systems, thereby enhancing prediction accuracy. Accurate wind power prediction is of great significance for energy planning and grid dispatching. Furthermore, it provides technical support for the safe, stable, and economical operation of power systems and wind farms [4,5].
Currently, wind power prediction methods mainly include physical methods, statistical methods, and artificial intelligence (AI) methods [6]. Physical methods rely on specific geographic information and Numerical Weather Prediction (NWP) data to compute power forecasts through physical modelling. Although theoretically sound, these methods are complex to implement and often yield poor predictions with low robustness. Statistical methods predict future wind power from relationships in historical data, employing traditional models such as the Autoregressive Moving Average (ARMA) [7] and Moving Average (MA). These models capture linear characteristics within the data but struggle with the complexity of wind power data, resulting in limited applicability and mediocre performance. With the continuous advancement of AI technology, an increasing number of researchers are adopting AI-based approaches for wind power prediction. Compared with traditional statistical and physical methods, AI methods can automatically extract latent feature correlations, efficiently process high-dimensional and nonlinear data, and eliminate the need for complex modelling, offering greater flexibility and robustness [8].
Wind power prediction time scales are generally categorized into ultra-short-term, short-term, and medium- to long-term predictions [9]. Ultra-short-term prediction forecasts the next four hours at a 15 min resolution, primarily serving real-time dispatch in power systems. As the proportion of stochastic wind power increases, frequency fluctuations in power systems accelerate, making 15 min resolution predictions more challenging [10].
Current widely adopted wind power prediction methods predominantly employ ensemble approaches, integrating multiple models and techniques to comprehensively enhance prediction accuracy. Preprocessing input data through analysis and decomposition can significantly improve data reliability and prediction precision. Given the high volatility and noise levels in wind power data, signal decomposition techniques have attracted considerable attention in this field. Commonly used decomposition algorithms include VMD, EEMD, and CEEMDAN. VMD effectively addresses issues such as frequency overlap and large recursive errors found in Empirical Mode Decomposition (EMD). However, VMD requires parameter tuning, which researchers optimize using various algorithms to improve its handling of nonlinear and non-smooth signals [11]. For instance, Reference [12] utilized Grey Wolf Optimizer (GWO)-optimized VMD to decompose sequences, reducing non-stationarity and improving prediction accuracy when feeding the decomposed sequences into a constructed prediction model. Reference [13] applied the Sparrow Search Algorithm (SSA) to optimize VMD parameters, overcoming the limitations of manual parameter setting, and achieved enhanced prediction accuracy using LSTM. Reference [14] employed EEMD to decompose wind power sequences into subcomponents of different frequencies, mitigating the impact of non-smoothness on prediction accuracy. Reference [15] decomposed the original wind power sequence with CEEMDAN into multiple sub-modes and a residual, reconstructed the subsequences according to their sample entropy, and then predicted the high- and low-frequency sequences with a Transformer model and a BiGRU-Attention model, respectively, achieving favourable forecasting results. Reference [16] employed a combination of the Improved Northern Goshawk Optimization (INGO) algorithm and a subtractive optimizer to determine the number of modes in VMD, enhancing both accuracy and interpretability. Reference [17] first decomposed the wind power sequence by integrating CEEMDAN with a rolling decomposition strategy, reconstructed the sequences into high-, medium-, and low-frequency categories based on sample entropy, and made predictions using an Improved Dung Beetle Optimizer (IDBO)-optimized LSTM model. Reference [18] first applied CEEMDAN decomposition to extract Intrinsic Mode Functions (IMFs), addressing the nonlinearity and non-stationarity in the data; high-frequency and complex signals were then further refined through VMD before being fed into a parallel prediction model for forecasting.
VMD recasts signal decomposition as a constrained variational optimization problem, thereby avoiding the subjective shortcomings of EMD-type methods, which rely on the empirical selection of extreme points and interpolation to generate envelope curves. This leads to more stable and repeatable decomposition results. By presetting the number of modes K and optimizing the bandwidth constraints, VMD forces each IMF to focus on a single centre frequency, significantly reducing the risk of mode mixing. Wind power sequences are typical non-stationary, nonlinear stochastic processes. Their fluctuations are driven by the coupling of multiple factors, such as wind speed randomness, turbine mechanical inertia, and grid dispatching, and manifest as the superposition of multi-time-scale characteristics.
Traditional EMD-type methods are prone to confusing these features due to mode mixing, leading to error accumulation in subsequent modelling. In contrast, VMD proactively separates different frequency components through variational optimization, enabling a clearer extraction of dominant patterns (e.g., periodic trends, random fluctuations) in wind power sequences. However, after the initial VMD, two types of insufficiently captured components may remain: (1) minor fluctuations that are merged due to the limitation of the preset number of modes K; and (2) local high-frequency components caused by extreme fluctuations in wind power data. CEEMDAN, as a secondary decomposition tool, can both suppress boundary effect noise potentially introduced by VMD decomposition and further refine the multi-scale features, thereby avoiding the loss of effective information due to insufficient initial decomposition.
By leveraging the stability and anti-mode-mixing capability of variational optimization, VMD addresses the “coarse decomposition” challenge of multi-scale separation in wind power data. CEEMDAN achieves “fine decomposition” of residual sequences through adaptive noise suppression and complete component retention. The combination of the two methods avoids both the empirical dependence of traditional EMD-type methods and the limitations of a single decomposition algorithm. This integration facilitates the further extraction of hidden multi-scale features and ensures the completeness of the decomposition.
Furthermore, combining multiple machine learning techniques leverages their respective strengths while mitigating individual shortcomings. Reference [
19] proposed a VMD-CNN-GRU hybrid model, demonstrating that VMD reduces wind speed sequence volatility, CNN extracts complex spatial features, and GRU captures temporal features, collectively outperforming single models. Reference [
20] introduced a combined Temporal Convolutional Network (TCN) and Informer-based model, where TCN extracts hidden temporal features, and Informer encodes them for wind power prediction, achieving superior accuracy compared to standalone models. Reference [
21] explored two novel hybrid models: CNN-ABiLSTM and CNN-Transformer-MLP. In these models, CNN captures short-term patterns in solar and wind energy data, while ABiLSTM and Transformer-MLP handle long-term patterns, excelling in daily, weekly, and monthly predictions.
The selection of hyperparameters in prediction models significantly influences their accuracy and robustness. Passing optimal parameters to the model fully exploits its predictive potential, achieving the best results. Reference [
22] developed a photovoltaic power prediction method with PSO-optimized BiLSTM, yielding precise predictions. Reference [
23] applied CEEMDAN to extract local features and time-frequency characteristics, optimized by an Improved Whale Optimization Algorithm (IWOA) for BiLSTM network training, observing enhanced prediction performance. Reference [
24] utilized the RIME algorithm to optimize hyperparameters in an AM-TCN-BiLSTM model, achieving better prediction accuracy and lower error values.
Table 1 provides a survey and comparative synthesis of previous research in this field.
Current wind power prediction technologies primarily focus on improving prediction accuracy through model combinations or optimizations. However, existing research predominantly relies on a single signal decomposition technique (such as standalone VMD, CEEMDAN, or EEMD) to process wind power data. Because the strong non-stationarity of wind power arises from the superposition of multi-scale physical processes, a single decomposition struggles to fully separate fluctuation components from different sources, often producing mode mixing or information loss and leaving the decomposed subsequences with high residual complexity, which directly impairs the input quality for subsequent forecasting models. Performing secondary decomposition on certain sub-sequences can further reduce this complexity, ensuring input data accuracy and reliability. Additionally, most existing studies rely on manual parameter settings for VMD, such as centre-frequency observation methods, which require extensive experimentation and analysis; optimization algorithms that tune only a single metric, such as sample entropy, envelope entropy, or information entropy, yield uncertain decomposition results. Furthermore, most wind power prediction models employ unidirectional deep learning networks, limiting their ability to extract bidirectional hidden information. The optimization of model hyperparameters also remains an open issue: traditional optimization algorithms often exhibit slow convergence and limited exploration capability, necessitating improvements in algorithm performance.
Furthermore, most current studies employ unidirectional forecasting models, which cannot simultaneously exploit the temporal dependencies on both sides of abrupt changes. Regarding model parameter optimization, traditional heuristic algorithms are commonly used; these are prone to local optima and slow convergence, and are especially inefficient at global optimization in high-dimensional hyperparameter spaces. Together, these gaps lead to insufficient characterization of multi-scale wind power fluctuations and limited generalization capability in forecasting models, making it difficult to meet the ultra-short-term forecasting requirements of high accuracy and strong adaptability.
This study addresses the shortcomings in the entire workflow of “decomposition-reconstruction-modelling-optimization” for ultra-short-term wind power forecasting in existing research: residual complexity after single decomposition, empirical and single-metric dependence for VMD parameter determination, the neglect of bidirectional temporal dependencies in unidirectional models, and the limited optimization capability of hyperparameter tuning algorithms. Wind power inherently exhibits strong volatility, high nonlinearity, and non-stationarity, and is susceptible to instantaneous influences from complex meteorological factors. For ultra-short-term wind power forecasting, rapid analysis of high-frequency data is required. The proposed model employs a BiTCN-BiGRU bidirectional structure to simultaneously learn contextual information from both past and future, accurately capturing minute-level power fluctuations. The VMD-CEEMDAN signal decomposition preprocessing reduces data non-stationarity, and combined with sample entropy reconstruction, enables the model to focus on key short-term features while avoiding interference from redundant noise. The causal convolution in BiTCN relies solely on historical data, preventing information leakage from the future; its parallel convolution computation enhances feature extraction speed, meeting the demand for rapid response. The proposed hybrid modelling framework systematically addresses the high-dimensional non-stationary characteristics of wind power, fully exploits bidirectional temporal dependencies, and achieves optimized model parameters.
The main innovations of this study are summarized as follows:
(1) A novel ultra-short-term wind power prediction model is proposed, predicting future 10 min wind power outputs based on the past 150 min of wind power data. Compared with single models and benchmark models, the proposed model demonstrates superior predictive performance.
(2) An improved decomposition method is introduced based on SSA and VMD. The SSA is employed to optimize the parameters of VMD adaptively, including the number of decomposition modes and penalty parameters, enhancing the quality of input data for the prediction model.
(3) A multi-layer data decomposition and reconstruction technique is proposed. After VMD, sequences are reconstructed using sample entropy theory. Sequences containing more information are subjected to secondary CEEMDAN decomposition and reconstruction, followed by combining all reconstructed sequences as final inputs. This approach reduces noise while preserving data features.
(4) A BiTCN-BiGRU prediction model is proposed and constructed for wind power prediction. The BiTCN network integrates TCN with bidirectional processing mechanisms, capturing bidirectional temporal dependencies to enhance feature extraction from complex time-series data. The extracted information is then fed into the BiGRU network for prediction, significantly improving wind power prediction accuracy.
(5) The traditional GWO algorithm is improved by incorporating strategies such as Golden Sine, Opposition-Based Learning, and Lévy Flight. These enhancements accelerate convergence speed and optimization capability. The IMGWO is applied to optimize multiple key hyperparameters of the prediction model, ensuring optimal performance.
The remaining sections of this paper are organized as follows:
Section 2 describes the methods and relevant theories applied in this study.
Section 3 introduces the proposed methodology, model framework, and technical flowchart.
Section 4 verifies the algorithm using actual operational data from the Sotavento wind farm in Spain, analyzes the simulation results for each season, and evaluates the effectiveness of the wind power prediction model. Finally, Section 5 concludes the study.
2. Model Principles
2.1. VMD
VMD, proposed in 2014, is a non-recursive, adaptive signal decomposition method. Under specific constraints, it decomposes an original complex signal into multiple IMFs, and it particularly excels at processing non-stationary and nonlinear time series [25]. The specific steps are as follows:
(1) Construction of the Variational Model:
VMD aims to minimize the sum of the estimated bandwidths of the modal components. The corresponding constrained variational problem is:

$$\min_{\{u_k\},\{\omega_k\}}\left\{\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_k(t)\right]e^{-j\omega_k t}\right\|_2^2\right\}\quad \text{s.t.}\quad \sum_{k=1}^{K}u_k(t)=f(t)$$

where $u_k(t)$—the $k$th modal component of the decomposition; $\omega_k$—the centre frequency of the corresponding component; $\partial_t$—the gradient operation; $*$—the convolution operation; $\delta(t)$—the Dirac distribution at time t; $f(t)$—the original signal; j—imaginary unit; t—time.
(2) Solution of Constrained Variational Problems:
The Lagrange multiplier $\lambda(t)$ and penalty coefficient $\alpha$ are introduced so that, in scenarios involving Gaussian noise interference, the constrained variational problem is converted into an unconstrained one. This simplifies the solving process and mitigates noise effects. The resulting augmented Lagrangian is:

$$L\left(\{u_k\},\{\omega_k\},\lambda\right)=\alpha\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_k(t)\right]e^{-j\omega_k t}\right\|_2^2+\left\|f(t)-\sum_{k=1}^{K}u_k(t)\right\|_2^2+\left\langle\lambda(t),f(t)-\sum_{k=1}^{K}u_k(t)\right\rangle$$

where $\alpha$—the penalty coefficient; $\lambda(t)$—the Lagrange multiplier; $\omega$—the centre frequency.
(3) The Fourier isometric transform is combined with the alternating direction method of multipliers to update $u_k$, $\omega_k$ and $\lambda$ alternately:

$$\hat{u}_k^{n+1}(\omega)=\frac{\hat{f}(\omega)-\sum_{i\neq k}\hat{u}_i(\omega)+\frac{\hat{\lambda}(\omega)}{2}}{1+2\alpha\left(\omega-\omega_k\right)^2}$$

$$\omega_k^{n+1}=\frac{\int_0^{\infty}\omega\left|\hat{u}_k(\omega)\right|^2 d\omega}{\int_0^{\infty}\left|\hat{u}_k(\omega)\right|^2 d\omega}$$

$$\hat{\lambda}^{n+1}(\omega)=\hat{\lambda}^{n}(\omega)+\tau\left(\hat{f}(\omega)-\sum_{k=1}^{K}\hat{u}_k^{n+1}(\omega)\right)$$

where $\tau$—the noise tolerance, which preserves decomposition fidelity; $\hat{u}_k^{n+1}(\omega)$—the Wiener filter of the current residual; $\omega_k^{n+1}$—the centre of gravity of the power spectrum of the modal function; $\hat{f}(\omega)$—the Fourier transform of $f(t)$.
2.2. Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN)
When processing complex signals, the EMD method may encounter mode mixing, in which different signal modes interfere with one another during decomposition, leading to inaccurate results. CEEMDAN integrates adaptive noise with ensemble decomposition concepts. By introducing noise adaptively at each decomposition step, CEEMDAN precisely separates the different frequency components of the signal, reducing computational load and accelerating decomposition. Adaptively adjusting the noise level mitigates the mode mixing problem. The CEEMDAN process involves the following steps [26]:
(1) White noise $\varepsilon_0\omega^{i}(t)$ ($i=1,2,\dots,M$), with zero mean and unit variance, is added to the original wind power signal L(t) to produce M distinct new sequences, each of which is decomposed by EMD. The first IMF and the first residual of CEEMDAN are then obtained as:

$$IMF_1(t)=\frac{1}{M}\sum_{i=1}^{M}E_1\left(L(t)+\varepsilon_0\omega^{i}(t)\right),\qquad r_1(t)=L(t)-IMF_1(t)$$

where the operator $E_1(\cdot)$ extracts the first IMF obtained through EMD.

(2) For the following N − 1 IMFs of CEEMDAN, the process is slightly different. First, white noise is added to the residual $r_{j-1}(t)$ to produce M distinct new residuals. These new residuals are then decomposed by EMD to obtain the jth IMF and the jth residual of CEEMDAN.

(3) Step (2) is repeated until no meaningful IMFs can be extracted from the residual signal, yielding the final residual term of CEEMDAN.
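The first-stage computation described in step (1) can be sketched in a few lines of Python. This is a toy illustration only: `first_imf` is a hypothetical stand-in that removes a moving-average trend instead of performing true cubic-spline EMD sifting, and the function names, ensemble size `M`, and noise amplitude `eps` are assumptions, not the implementation used in this study.

```python
import numpy as np

def first_imf(signal, win=5):
    """Toy stand-in for EMD's first IMF: the signal minus a moving-average
    trend. A real implementation would use envelope-based sifting."""
    kernel = np.ones(win) / win
    trend = np.convolve(signal, kernel, mode="same")
    return signal - trend

def ceemdan_first_stage(L, M=50, eps=0.05, seed=0):
    """Step (1) of CEEMDAN: average the first IMF over M noise-perturbed
    copies of the signal, then form the first residual."""
    rng = np.random.default_rng(seed)
    imf1 = np.mean(
        [first_imf(L + eps * rng.standard_normal(L.size)) for _ in range(M)],
        axis=0,
    )
    r1 = L - imf1          # first residual r1(t) = L(t) - IMF1(t)
    return imf1, r1

t = np.linspace(0, 1, 256)
x = np.sin(2 * np.pi * 3 * t) + 0.3 * np.sin(2 * np.pi * 40 * t)
imf1, r1 = ceemdan_first_stage(x)
```

By construction, the extracted IMF and the residual always sum back to the original signal, which is the completeness property that motivates using CEEMDAN here.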
2.3. Grey Wolf Optimizer (GWO) Algorithm
GWO is a swarm intelligence optimization algorithm inspired by the social hierarchy and hunting behaviour of grey wolves [27]. Its advantages include few tunable parameters, intuitive principles, and strong global search capability.
In the Grey Wolf Optimization algorithm, the wolf pack’s social structure is divided into four ranks: α, β, δ, and ω. The α wolf represents the leader and symbolizes the current best solution; the β wolf supports the α wolf’s decisions and represents the second-best solution; the δ wolf follows the commands of α and β wolves and represents the third-best solution; and the ω wolves rank lowest, obeying the commands of α, β, and δ wolves. The hunting behaviour of wolves consists of two phases: encircling prey and hunting. These behaviours can be mathematically modelled.
In step 1, after discovering the prey, the pack surrounds it. The distance $D$ between each grey wolf and the prey, and the updated grey wolf position, are, respectively:

$$D=\left|C\cdot X_p(t)-X(t)\right|,\qquad X(t+1)=X_p(t)-A\cdot D$$

where t denotes the current iteration number; $X(t)$ denotes the current grey wolf position; $X_p(t)$ denotes the prey position; and $A$ and $C$ denote the direction vectors, given by:

$$A=2a\cdot r_1-a,\qquad C=2\cdot r_2$$

where $r_1$ and $r_2$ denote random vectors in [0,1], and a denotes the decay factor, which decreases linearly from 2 to 0 as the number of iterations increases.
Step 2: After simulating the encircling behaviour, the α, β, and δ wolves guide the entire pack to gradually shrink the encirclement range, achieving the hunting objective. This process can be described mathematically.
$$D_\alpha=\left|C_1\cdot X_\alpha-X\right|,\quad D_\beta=\left|C_2\cdot X_\beta-X\right|,\quad D_\delta=\left|C_3\cdot X_\delta-X\right|$$

$$X_1=X_\alpha-A_1\cdot D_\alpha,\quad X_2=X_\beta-A_2\cdot D_\beta,\quad X_3=X_\delta-A_3\cdot D_\delta$$

$$X(t+1)=\frac{X_1+X_2+X_3}{3}$$

where $X_\alpha$, $X_\beta$, $X_\delta$ denote the positions of the α, β and δ wolves; $D_\alpha$, $D_\beta$, $D_\delta$ denote the distances of individual grey wolves from the α, β and δ wolves; $A_1$, $A_2$, $A_3$ and $C_1$, $C_2$, $C_3$ are directional random vectors; $X_1$, $X_2$, $X_3$ denote the positions updated from the α, β and δ wolves, respectively; and $X(t+1)$ denotes the final updated position of an individual grey wolf at the end of the tth iteration.
At the end of each iteration, the fitness values of all grey wolves are recalculated. Comparisons determine new α, β, and δ wolves, initiating the next round of iterations and progressively approaching the global optimum.
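The encircling and hunting behaviour described above can be sketched as a minimal numpy implementation. This is an illustrative sketch under simplifying assumptions (a sphere test function, fixed pack size, and helper names of our choosing), not the configuration used later in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def gwo_step(wolves, f, a):
    """One GWO iteration: rank the pack, then move each wolf to the mean of
    three candidate positions derived from the alpha, beta, delta leaders."""
    order = np.argsort([f(w) for w in wolves])        # minimization problem
    leaders = wolves[order[:3]]                       # alpha, beta, delta
    new = np.empty_like(wolves)
    for i, X in enumerate(wolves):
        candidates = []
        for X_l in leaders:
            A = 2 * a * rng.random(X.size) - a        # A = 2a*r1 - a
            C = 2 * rng.random(X.size)                # C = 2*r2
            D = np.abs(C * X_l - X)                   # distance to leader
            candidates.append(X_l - A * D)            # X1 / X2 / X3
        new[i] = np.mean(candidates, axis=0)          # X(t+1) = (X1+X2+X3)/3
    return new

f = lambda x: float(np.sum(x ** 2))                   # sphere benchmark
wolves = rng.uniform(-5, 5, (20, 2))
T = 60
for t in range(T):
    a = 2 - 2 * t / T                                 # a decays linearly 2 -> 0
    wolves = gwo_step(wolves, f, a)
best = min(f(w) for w in wolves)
```

As `a` decays, `|A|` shrinks below 1 and the pack contracts around the leaders, which is what drives late-stage convergence.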
2.4. Improved Grey Wolf Optimizer (IMGWO) Algorithm
(1) Golden Sine Strategy
To enhance exploration of the solution space, the Golden Sine strategy is introduced to update the positions of the grey wolves. Derived from the Golden Sine Algorithm (Golden-SA) proposed by Tanyildizi et al. in 2017 [28], it replaces constant parameters in the original algorithm with sine functions to thoroughly explore the neighbourhoods of local optima, improving exploration capability. Simultaneously, incorporating the Golden Ratio enhances the dynamic search and increases algorithm coverage [29]. After introducing the Golden Sine strategy, the position update of the grey wolves proceeds as follows.
First, let $p_0=0.5$ and let $p\in[0,1]$ be a random number. When $p<0.5$, the wolf is in the unobstructed state, and its position is updated as:

$$X(t+1)=X(t)\left|\sin r_1\right|+r_2\sin r_1\left|x_1 X_\alpha(t)-x_2 X(t)\right|$$

where $r_1\in[0,2\pi]$ and $r_2\in[0,2\pi]$ are random numbers, and $x_1$ and $x_2$ are the golden section coefficients, specified as:

$$x_1=a\tau+b(1-\tau),\qquad x_2=a(1-\tau)+b\tau$$
where $\tau=\frac{\sqrt{5}-1}{2}$ is the golden ratio, and the initial values of a and b are −π and π, respectively. The update rules of a and b are as follows: if the current solution is better than the optimal solution, the value of $x_2$ is assigned to b, the value of $x_1$ is assigned to $x_2$, and $x_1$ is recomputed from its golden-section formula; otherwise, the value of $x_1$ is assigned to a, the value of $x_2$ is assigned to $x_1$, and $x_2$ is recomputed from its golden-section formula.
When $p\geq 0.5$, the wolf is in the obstructed state, and its position is updated according to the corresponding obstacle-state form of the Golden Sine rule.
(2) Opposition-Based Learning Strategy
This strategy updates individual positions with a certain probability during the algorithm’s iterations, exploiting the information carried by opposite individuals. It not only enhances population randomness but also improves convergence. Opposition-based learning generates the opposite of the current solution within the same space, compares both solutions, and keeps the better one. Assuming $x_i$ is a solution within the interval $[a_i,b_i]$, its opposite solution is given by:

$$\hat{x}_i=a_i+b_i-x_i$$

where $a_i$ and $b_i$ represent the lower and upper bounds of the solution space.
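The opposite-solution construction is a one-liner; the sketch below (with illustrative bounds) shows that mirroring inside the bounds is an involution, so applying it twice recovers the original solution.

```python
import numpy as np

def opposite(x, lo, hi):
    """Opposition-based learning: mirror a solution inside its bounds,
    x_hat_i = a_i + b_i - x_i (element-wise)."""
    return lo + hi - x

x = np.array([1.0, 4.0])
lo, hi = np.array([0.0, 0.0]), np.array([5.0, 10.0])
x_opp = opposite(x, lo, hi)          # -> [4.0, 6.0]
```

In the optimizer, both `x` and `x_opp` would be evaluated and the fitter of the two retained.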
(3) Lévy Flight Strategy
Traditional GWO algorithms suffer from reduced population diversity when all grey wolves move toward the α wolf, especially if the α wolf does not represent the global optimum.
To address this issue, the Lévy Flight strategy is adopted to expand the search domain and increase solution diversity. Lévy Flight is a random walk process where step lengths follow a Lévy distribution, characterized by heavy tails, enabling long-distance jumps in the search space to discover new regions.
The Lévy Flight strategy updates the grey wolf position $X(t)$ as follows:

$$X(t+1)=X(t)+r\oplus \mathrm{Levy}(\beta)$$

where $X(t+1)$ denotes the individual solution after this iteration; $\oplus$ denotes element-wise multiplication; r denotes the step-control weight; and $\beta\in(1,3)$, taken as 1.5 in this paper. The stochastic search path $\mathrm{Levy}(\beta)$ is:

$$\mathrm{Levy}(\beta)=\frac{u}{|v|^{1/\beta}}\left(X(t)-X_{best}\right)$$

where $X_{best}$ denotes the historical optimal solution, and the variables u and v obey normal distributions:

$$u\sim N\left(0,\sigma_u^2\right),\qquad v\sim N\left(0,\sigma_v^2\right)$$

where $\sigma_u$ and $\sigma_v$ are the standard deviations of u and v, respectively:

$$\sigma_u=\left\{\frac{\Gamma(1+\beta)\sin\left(\frac{\pi\beta}{2}\right)}{\Gamma\left(\frac{1+\beta}{2}\right)\beta\,2^{\frac{\beta-1}{2}}}\right\}^{1/\beta},\qquad \sigma_v=1$$

where $\Gamma$ is the gamma function.
In this study, the Lévy Flight strategy is applied exclusively to the α wolf, guiding other wolves indirectly and influencing the entire group dynamics. This approach significantly reduces computational time.
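The heavy-tailed step can be generated with Mantegna's algorithm, which is the standard way to sample a Lévy-distributed step of exponent β. The sketch below applies such a step to a hypothetical α-wolf position; the step weight `0.01` and the toy positions are assumptions for illustration.

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(dim, beta=1.5, rng=None):
    """Mantegna's algorithm: a Levy-distributed step u / |v|^(1/beta)."""
    rng = rng or np.random.default_rng()
    sigma_u = (gamma(1 + beta) * sin(pi * beta / 2)
               / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
               ) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, dim)     # u ~ N(0, sigma_u^2)
    v = rng.normal(0.0, 1.0, dim)         # v ~ N(0, 1)
    return u / np.abs(v) ** (1 / beta)

rng = np.random.default_rng(0)
x_alpha = np.zeros(2)                     # current alpha-wolf position (toy)
x_best = np.ones(2)                       # historical best position (toy)
step = 0.01 * levy_step(2, rng=rng) * (x_alpha - x_best)
x_alpha_new = x_alpha + step
```

Most draws yield small steps, but the heavy tail occasionally produces a long jump, which is exactly the diversity-injection effect the strategy is meant to provide.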
The optimization-solving steps of the Improved Grey Wolf Algorithm are illustrated in Figure 1 [30].
The primary solving steps of the Improved Grey Wolf Algorithm are as follows:
(1) Initialize the positions of grey wolves (pos) and calculate initial fitness values. Ensure all grey wolves remain within the defined search space. At the start of each iteration, update the positions of α, β, and δ (the top three optimal solutions) based on the current population’s fitness values.
(2) Update each grey wolf’s position using linearly decreasing parameter a and random coefficients A and C. Each grey wolf updates its position based on weighted averages of α, β, and δ. When ∣A∣ ≥ 1, the grey wolf explores randomly; when ∣A∣ < 1, it gradually approaches the optimal solution.
(3) Golden Sine Strategy (GSS):
Apply this strategy during each iteration to enhance search capability and avoid falling into local optima.
(4) Opposition-Based Learning Strategy:
Apply this strategy every five iterations to further enhance exploration of the search space.
(5) Lévy Flight Strategy:
Randomly select 20% of grey wolves for Lévy Flight during each iteration to introduce long-distance jumps and enhance global search capability.
(6) Boundary Checks and Fitness Evaluation:
After each position update, ensure all grey wolves remain within the defined search space. Recalculate fitness values and update personal best (pBest) and global best (gBest). Terminate the process when the maximum number of iterations is reached, outputting the global optimum.
2.5. Bidirectional Temporal Convolutional Neural Network (BiTCN)
Temporal Convolutional Networks (TCNs) are architectures based on convolutional neural networks designed for time-series problems, effectively capturing long-term dependencies [31]. Compared with traditional convolutional networks, TCNs possess strong feature extraction capabilities, combining causal convolutions, dilated convolutions, and residual connections into a new network structure.
In TCNs, causal convolutions are specialized operations that consider only preceding moment features during convolution, avoiding leakage of future information and maintaining causality in time-series data processing. To capture both forward and backward features in the data, this study employs a BiTCN structure to extract bidirectional features, achieving higher training model precision. The architecture of the bidirectional dilated causal convolution network is shown in
Figure 2 [32].
However, simply increasing the number of network layers or enlarging the convolution kernel to broaden the receptive field significantly increases computational cost. To address this, blank values are inserted into the convolution kernels, a method known as dilated convolution. Dilated convolutions effectively expand the receptive field of the network layers, allowing higher-level nodes to cover broader input information without additional computational burden. For time-series data x ∈ Rⁿ and a convolution kernel f: {0, 1, …, k − 1} → R, the dilated convolution at position s is calculated as:

$$F(s)=\left(x*_d f\right)(s)=\sum_{i=0}^{k-1}f(i)\cdot x_{s-d\cdot i}$$

where d is the dilation factor, k is the size of the convolution kernel, and $s-d\cdot i$ indexes the past direction of the time series.
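A direct (unoptimized) implementation of the dilated causal convolution makes the causality explicit: every output sample draws only on current and past inputs, strided by the dilation factor. Names and the toy kernel are illustrative.

```python
import numpy as np

def dilated_causal_conv(x, f, d):
    """F(s) = sum_i f(i) * x[s - d*i], zero-padded where s - d*i < 0,
    so each output depends only on current and past inputs (causal)."""
    k = len(f)
    out = np.zeros(len(x))
    for s in range(len(x)):
        for i in range(k):
            idx = s - d * i
            if idx >= 0:
                out[s] += f[i] * x[idx]
    return out

x = np.arange(8, dtype=float)           # inputs 0,1,...,7
f = np.array([1.0, 1.0])                # kernel of size k = 2
y = dilated_causal_conv(x, f, d=2)      # y[s] = x[s] + x[s-2]
```

With k = 2 and d = 2, each output sums the current sample and the one two steps back, so `y` is `[0, 1, 2, 4, 6, 8, 10, 12]`; stacking layers with d = 1, 2, 4, … grows the receptive field exponentially at constant cost per layer.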
The TCN neural network consists of stacked TCN residual blocks, each combining dilated causal convolutions and neural network processing layers. Residual blocks include dilated causal convolutions, weight normalization, ReLU activation functions, and Dropout, as shown in
Figure 3 [32].
2.6. Bidirectional GRU Neural Network
GRU can effectively learn and remember dependencies over long time steps [33]. The structure of the GRU is shown in Figure 4.
The GRU computations at time t are:

$$r_t=\sigma\left(W_r\cdot[h_{t-1},x_t]\right)$$

$$z_t=\sigma\left(W_z\cdot[h_{t-1},x_t]\right)$$

$$\tilde{h}_t=\tanh\left(W_{\tilde{h}}\cdot[r_t\odot h_{t-1},x_t]\right)$$

$$h_t=(1-z_t)\odot h_{t-1}+z_t\odot\tilde{h}_t$$

where $r_t$ and $z_t$ are the reset gate and update gate, respectively; $\sigma$ is the Sigmoid activation function; W is the weight matrix; $\tilde{h}_t$ is the intermediate memory state; $x_t$ is the input at time t; and $h_t$ is the hidden state at time t.
The Bidirectional GRU (BiGRU) consists of two GRU hidden layers combined into a bidirectional neural network, allowing the current moment’s output to connect with both the previous and subsequent moments’ states, facilitating deeper feature extraction. The structure of BiGRU is shown in Figure 5.
The output at time t is the weighted sum of the forward and backward hidden layer outputs, calculated as follows:

$$\overrightarrow{h_t}=G\left(x_t,\overrightarrow{h_{t-1}}\right),\qquad \overleftarrow{h_t}=G\left(x_t,\overleftarrow{h_{t+1}}\right)$$

$$y_t=w_t\overrightarrow{h_t}+v_t\overleftarrow{h_t}+b_t$$

where G(·) is the state of the corresponding vector-encoded GRU hidden layer; $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$ are the forward and backward hidden layer output states, respectively; $w_t$ and $v_t$ are the output weights of the corresponding hidden layers; and $b_t$ is the bias of the hidden layer state at time t.
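A single GRU step and the bidirectional combination can be sketched in numpy. This is a simplified sketch: biases are omitted, the two directions share weights, and the per-step outputs are combined by plain addition rather than learned output weights, all of which are assumptions made to keep the example short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wr, Wz, Wh):
    """One GRU step on the concatenated input [h_prev, x_t]; biases omitted."""
    hx = np.concatenate([h_prev, x_t])
    r = sigmoid(Wr @ hx)                                      # reset gate
    z = sigmoid(Wz @ hx)                                      # update gate
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))  # candidate state
    return (1 - z) * h_prev + z * h_cand                      # new hidden state

rng = np.random.default_rng(0)
n_h, n_x, T = 4, 3, 5
Wr, Wz, Wh = [rng.standard_normal((n_h, n_h + n_x)) * 0.1 for _ in range(3)]
seq = rng.standard_normal((T, n_x))

fwd, bwd = [], []
h = np.zeros(n_h)
for t in range(T):                        # forward pass over the sequence
    h = gru_step(seq[t], h, Wr, Wz, Wh)
    fwd.append(h)
h = np.zeros(n_h)
for t in reversed(range(T)):              # backward pass over the sequence
    h = gru_step(seq[t], h, Wr, Wz, Wh)
    bwd.append(h)
y = np.array(fwd) + np.array(bwd[::-1])   # combine both directions per step
```

Because each time step's output mixes a state computed from the past with one computed from the future, features on both sides of an abrupt power change contribute to the prediction at that step.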
2.7. Improved Variational Modal Decomposition
When using VMD to decompose a time series, the parameters need to be manually set. Among these, the decomposition number K and the penalty factor α significantly influence the decomposition performance. If the decomposition number K is set too high, over-decomposition may occur, leading to redundant components. Conversely, if K is set too low, some signals may be lost. The penalty parameter α affects the bandwidth of each modal component; a smaller α results in wider bandwidths for the IMF components, while a larger α narrows the signal bandwidth of the IMF components. Therefore, it is necessary to use the SSA to adaptively search for the optimal parameter combination [K,α].
When optimizing parameters with the SSA method, an optimal solution must be selected. To achieve this, a fitness function tailored to wind power sequences needs to be constructed. A single metric cannot comprehensively and accurately reflect the characteristics of a time-series signal. Therefore, a composite index based on sample entropy and mutual information is established, and the fitness function is defined as the minimum value of this composite index [34].
Sample entropy is used to assess the complexity and variation in a time series, i.e., the probability that a new pattern may arise in the sequence given a change in dimension.
Given a time series of length N, the m-dimensional template vectors are formed as:

$$X(i)=\left[x(i),x(i+1),\dots,x(i+m-1)\right]$$

where i = 1, 2, …, N − m + 1.

Define d[X(i),X(j)] (i ≠ j) as the maximum distance between the corresponding elements of X(i) and X(j), i.e.,

$$d[X(i),X(j)]=\max_{k\in[0,m-1]}\left|x(i+k)-x(j+k)\right|$$

For a known threshold r (r > 0), count the number of pairs with d[X(i),X(j)] < r and take its ratio to the total number of vectors N − m, i.e.,

$$B_i^m(r)=\frac{\mathrm{num}\left\{d[X(i),X(j)]<r\right\}}{N-m},\qquad B^m(r)=\frac{1}{N-m+1}\sum_{i=1}^{N-m+1}B_i^m(r)$$

The sample entropy of the sequence is obtained by increasing the dimension to m + 1, computing $B^{m+1}(r)$ in the same way, and taking:

$$SampEn(m,r)=\lim_{N\to\infty}\left\{-\ln\frac{B^{m+1}(r)}{B^m(r)}\right\}$$

When N is a finite value, the estimate is:

$$SampEn(m,r,N)=-\ln\frac{B^{m+1}(r)}{B^m(r)}$$
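A compact sample entropy estimator can be written as follows. This sketch uses a slightly simplified template count (the same number of templates for m and m + 1) and an absolute threshold `r`; in practice r is often taken as a fraction of the series' standard deviation.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r) = -ln(A/B): B and A count template matches of length
    m and m+1 under the Chebyshev distance, excluding self-matches."""
    x = np.asarray(x, dtype=float)
    N = len(x)

    def count(m):
        templates = np.array([x[i:i + m] for i in range(N - m)])
        c = 0
        for i in range(len(templates)):
            d = np.max(np.abs(templates - templates[i]), axis=1)
            c += np.sum(d < r) - 1          # exclude the self-match
        return c

    B, A = count(m), count(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf

rng = np.random.default_rng(0)
regular = np.sin(np.linspace(0, 20 * np.pi, 500))   # predictable signal
noisy = rng.standard_normal(500)                    # irregular signal
```

A smooth periodic signal yields far more repeated patterns at length m + 1 than white noise does, so its sample entropy is much lower, which is exactly why SampEn ranks decomposed subsequences by complexity.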
Mutual information (MI) is mainly used in information theory to indicate the degree of correlation between two events [
35]. It is not easily disturbed by external factors, and its expression is as follows:
I(X; Y) = H(Y) − H(Y|X)
where X and Y are different events, H(Y) is the entropy of Y, and H(Y|X) is the conditional entropy of Y when X is known. The MI is then normalized to a common scale.
The larger the normalized MI value, the stronger the correlation between two events. As far as an IMF component is concerned, the richer the signal feature information it contains, the larger its MI value.
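A histogram-based sketch of mutual information between two equally long discrete sequences is shown below. Normalizing by H(Y) is one common choice adopted here for illustration; the paper's exact normalization formula is not reproduced in this text:

```python
import math
from collections import Counter

def normalized_mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) = H(Y) - H(Y|X), divided by H(Y).

    Result lies in [0, 1] and grows with the correlation of x and y.
    """
    n = len(x)
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    mi = entropy(px) + entropy(py) - entropy(pxy)
    h_y = entropy(py)
    return mi / h_y if h_y > 0 else 0.0
```

Identical sequences give a value of 1, and statistically independent sequences give 0.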
A composite indicator SI is then established based on the sample entropy and the mutual information.
This indicator takes into account both the complexity of the IMF component and its feature information: when the IMF component contains richer feature information, the value of the composite indicator SI is smaller, so its minimum value is used as the fitness function of the SSA search.
4. Experimental Analysis
4.1. Data Source
Real operational data from the Sotavento Galicia wind farm in Spain during 2020 were used to validate the proposed method and model. The data used in this study can be obtained from the following website:
https://www.sotaventogalicia.com/en/technical-area/real-time-data/historical/ (accessed on 1 October 2024). To effectively verify the accuracy of the proposed model, four groups of wind power data from different seasons were selected, including data from March 1 to 31, May 1 to 31, August 1 to 31, and November 1 to 30. The data sampling interval was 10 min. The model in this study focuses on ultra-short-term wind power prediction. It inputs historical wind power data from the past 150 min to forecast wind power for the next 10 min, with the data organized and fed into the model using a sliding window approach.
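The sliding-window organization described above can be sketched as follows. With the 10-min sampling interval, a history of 15 points covers the past 150 min, and a horizon of 1 point is the next 10-min value; `make_windows` is an illustrative helper, not the paper's code:

```python
def make_windows(series, history=15, horizon=1):
    """Build (input, target) pairs with a sliding window over a power series."""
    inputs, targets = [], []
    for start in range(len(series) - history - horizon + 1):
        inputs.append(series[start:start + history])          # past 150 min
        targets.append(series[start + history + horizon - 1]) # next 10-min value
    return inputs, targets
```

For example, a 4019-point seasonal dataset yields 4004 input/target pairs with these defaults.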
During data collection and processing, some zero and missing data points may occur, significantly affecting the accuracy of wind power prediction. Therefore, it is necessary to preprocess the raw wind power data using zero removal and mean interpolation. After preprocessing, the spring dataset contained 4019 data points, the summer dataset contained 3930 data points, the autumn dataset contained 3413 data points due to the removal of numerous zero values, and the winter dataset contained 3980 data points.
To systematically validate the prediction performance of the proposed model, the study employed a train-validation-test three-way split of the dataset, strictly adhering to the principle of “independent partitioning, no crossover” to mitigate the risk of data leakage. Specifically: the training set was used for supervised learning of model parameters, the validation set for optimization and calibration of hyperparameters, and the test set served as independent, unseen data to evaluate the model’s generalization capability.
Considering the seasonal fluctuations characteristic of wind power, the study selected representative samples from the original data across four seasons, constructing four seasonal datasets (spring, summer, autumn, winter) to test the model’s adaptability under different climatic scenarios. For outliers in the data, negative power values were first uniformly replaced with 0. For other anomalies, namely points that deviated significantly from surrounding values or fell well outside the normal range, a correction was applied by substituting the average of the two adjacent data points.
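These cleaning rules can be sketched in a few lines. The deviation threshold used to flag an anomaly is a hypothetical value, since the text does not give a numeric criterion:

```python
def preprocess(power, threshold=3.0):
    """Clean a raw power sequence following the rules described in the text.

    Negative readings are replaced with 0. A point deviating from BOTH
    neighbours by more than `threshold` (a hypothetical limit, in the data's
    units) is replaced by the mean of the two adjacent points.
    """
    cleaned = [max(v, 0.0) for v in power]
    for i in range(1, len(cleaned) - 1):
        prev_v, next_v = cleaned[i - 1], cleaned[i + 1]
        if abs(cleaned[i] - prev_v) > threshold and abs(cleaned[i] - next_v) > threshold:
            cleaned[i] = (prev_v + next_v) / 2.0
    return cleaned
```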
For the preprocessed datasets, the partitioning rules for each seasonal dataset are as follows:
Spring, Summer, and Winter datasets: Divided into training and test sets in a 9:1 ratio based on the total sample size. Additionally, 10% of the training set was extracted to form the validation set for hyperparameter optimization and selection.
Autumn dataset: Due to a relatively smaller sample size after preprocessing, the training-to-validation ratio was adjusted to 7.5:2.5 (i.e., of the non-test samples, the training set accounts for approximately 75% and the validation set for about 25%), while the test set remained at 10% of the total samples.
The dataset division is shown in
Figure 8.
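The partitioning rules above can be sketched as a chronological split; contiguous blocks avoid leakage between adjacent sliding windows. The helper and its defaults (the spring/summer/winter ratios) are illustrative; the autumn dataset would use different fractions:

```python
def split_dataset(samples, test_frac=0.1, val_frac_of_train=0.1):
    """Chronological train/validation/test split ("independent, no crossover").

    Defaults match the spring/summer/winter rule: the final 10% of samples
    form the test set, and 10% of the remaining training portion is held
    out (at its end) as the validation set.
    """
    n = len(samples)
    n_test = int(n * test_frac)
    train_val = samples[:n - n_test]
    test = samples[n - n_test:]
    n_val = int(len(train_val) * val_frac_of_train)
    train = train_val[:len(train_val) - n_val]
    val = train_val[len(train_val) - n_val:]
    return train, val, test
```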
4.2. Data Preprocessing and Experimental Settings
First, the data were normalized. This process helps improve the convergence speed of the model. Our model adopts the min-max normalization method, defined by the following formula:
x_norm = (x − x_min) / (x_max − x_min)
where x represents a data point in the dataset, i.e., the value to be normalized; x_norm is the normalized value; and x_min and x_max represent the minimum and maximum values of the data to be normalized, respectively.
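A minimal sketch of this normalization, including the inverse mapping needed to report predictions back in the original power units (the helper names are illustrative):

```python
def minmax_fit(values):
    """Return (x_min, x_max); statistics should come from the training data only."""
    return min(values), max(values)

def minmax_transform(x, x_min, x_max):
    """Scale a value into [0, 1] using the fitted statistics."""
    return (x - x_min) / (x_max - x_min)

def minmax_inverse(x_norm, x_min, x_max):
    """Map a normalized prediction back to the original units."""
    return x_norm * (x_max - x_min) + x_min
```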
The experiments were conducted on a PC equipped with an AMD Ryzen 9 7950X 16-Core Processor (4.50 GHz) and 64.00 GB of RAM, using Python 3.9. The BiTCN-BiGRU module was developed with the TensorFlow/Keras framework and trained using the Adam optimizer to balance training efficiency and stability. The model comprises five trainable layers (including the output layer), arranged in the order of data flow as follows: a Bidirectional Temporal Convolutional Network Layer (BiTCN Layer) to extract local temporal features; a Bidirectional Gated Recurrent Unit Layer (BiGRU Layer) to capture bidirectional long-term temporal dependencies; a Dense Layer for feature dimensionality reduction and nonlinear transformation; a LeakyReLU activation layer to introduce nonlinearity and alleviate gradient vanishing; and an Output Layer to generate prediction results. The loss function is set to the Mean Squared Error (MSE) metric.
For all four seasons, uniform default parameters are configured as follows: nb_filters = 64, kernel_size = 2, BiGRU_units = 50, num_epochs = 30, and batch_size = 32. Subsequently, the IMGWO optimization algorithm is employed to optimize a subset of the model’s hyperparameters, aiming to achieve optimal prediction performance.
4.3. SSA-VMD-SE-CEEMDAN Multi-Layer Data Decomposition and Reconstruction
The number of modes determines the performance of VMD. When the number of modes is small, VMD tends to filter out important information in the wind power sequence, affecting the performance of the prediction model. Conversely, when the number of modes is large, centre frequencies of some IMFs may overlap, leading to mode mixing or generating additional noise [
36]. By using the SSA to solve and optimize the constructed composite optimization index, both sample entropy and mutual information are considered simultaneously, yielding the optimal VMD parameters.
The wind power sequence is decomposed into multiple subsequences to eliminate noise in the original data, extract main features, and perform adaptive decomposition. Through multiple SSA-VMD experiments, the optimal combination of the decomposition mode number K and the quadratic penalty term α was determined. For the data decomposition and reconstruction steps of the proposed method, all operations are uniformly performed using MW as the unit. The population size of the SSA (Sparrow Search Algorithm) solver is set to 30, and the maximum number of iterations is set to 20. The optimization objective of the SSA is the composite evaluation metric constructed in this study, with the search intervals for K and α set to [5, 10] and [1000, 2500], respectively. The SSA-optimized VMD parameter results are summarized in
Table 2, and the VMD decomposition results for each season are shown in
Figure 9.
After performing SSA-VMD on the wind power sequence, the sample entropy of each IMF for each season was calculated, as shown in
Table 3. By measuring the complexity of the decomposed sequences and reconstructing them, redundant information is reduced, further improving prediction performance. As shown in
Table 3, for the spring decomposition results, the complexity of the subsequences generally increases gradually, with IMF1 having the lowest SE value and complexity (0.0136) and IMF6 having the highest SE value and complexity (0.5565). Based on the similarity of SE values, IMF3 and IMF4 were merged into a new component IMF3. For the summer season, nine IMFs were obtained after SSA-VMD decomposition. Due to the similarity of SE values between IMF1 and IMF2, they were merged into a new component IMF1; similarly, IMF5 and IMF6 were merged into IMF4, and IMF7 and IMF8 were merged into IMF5. For the autumn season, seven IMFs were obtained after SSA-VMD decomposition, and IMF1 and IMF2 were merged into IMF1 due to their similar SE values. For the winter season, nine IMFs were obtained after SSA-VMD decomposition, with IMF1 and IMF2 merged into IMF1, IMF5 and IMF6 merged into IMF4, and IMF7 and IMF8 merged into IMF5. By merging VMD-decomposed components based on sample entropy similarity, prediction complexity is reduced, avoiding redundancy.
Through the reconstruction of VMD components with similar sample entropy values, six reconstructed components were obtained. The residual sequence IMF6 retains a high SE value after reconstruction, and, as can be seen from the VMD decomposition diagrams for each season, it exhibits complex sequence information and significant fluctuations. Directly predicting this high-complexity sequence may increase the error of the prediction model; therefore, CEEMDAN is used to perform a secondary decomposition on it to reduce its complexity.
Similarly, in order to save computational costs and reduce data redundancy, the sample entropy values of the sequences after CEEMDAN decomposition are calculated for component reconstruction. The statistics of sample entropy and the reconstructed sequences after secondary decomposition for each season are shown in
Table 4. The first few components after CEEMDAN decomposition have higher complexity, while the complexity of the subsequent components decreases gradually. Based on the theory of sample entropy, the decomposed components are reconstructed according to their similar SE values. For the spring season, nine decomposition components were obtained after CEEMDAN decomposition. IMF5~IMF9 have high similarity in SE values and low complexity; therefore, IMF5~IMF9 are merged into a new component IMF5. For the summer season, twelve decomposition components were obtained after CEEMDAN decomposition. IMF3~IMF5 have high similarity in SE values and similar complexity; thus, IMF3~IMF5 are merged into a new component IMF3. Similarly, since the SE values of IMF7~IMF12 are small and their complexity is low, IMF7~IMF12 are merged into a new component IMF5. By reconstructing components with similar SE values, five sequences are obtained after reconstruction. For the autumn season, nine decomposition components were obtained after CEEMDAN decomposition. IMF3 and IMF4 have high similarity in SE values and similar complexity; thus, IMF3 and IMF4 are merged into a new component IMF3. IMF5 and IMF6 also have high similarity in SE values and similar complexity; thus, IMF5 and IMF6 are merged into a new component IMF4. Similarly, since the SE values of IMF7~IMF9 are small and their complexity is low, IMF7~IMF9 are merged into a new component IMF5. After reconstruction based on components with similar SE values, five sequences are obtained. For the winter season, ten decomposition components were obtained after CEEMDAN decomposition. IMF3 and IMF4 have high similarity in SE values and similar complexity; thus, IMF3 and IMF4 are merged into a new component IMF3. IMF5 and IMF6 also have high similarity in SE values and similar complexity; thus, IMF5 and IMF6 are merged into a new component IMF4. 
Similarly, since the SE values of IMF7~IMF10 are small and their complexity is low, IMF7~IMF10 are merged into a new component IMF5. After reconstruction based on components with similar SE values, five sequences are obtained. The results of the multilayer decomposition of the sample entropy are shown in
Figure 10. The sequences after CEEMDAN decomposition and reconstruction for each season are shown in
Figure 11,
Figure 12,
Figure 13 and
Figure 14.
The components obtained after the first VMD, the secondary CEEMDAN decomposition, and the reconstruction based on similar sample entropy are combined, resulting in a total of 10 decomposed components. These components serve as the final input sequences for the prediction model. The final reconstructed and combined sequences for each season are shown in
Figure 15,
Figure 16,
Figure 17 and
Figure 18.
4.4. Verification of the Performance of the IMGWO Optimization Algorithm
To evaluate and measure the optimization performance and generalization ability of the proposed IMGWO algorithm, it was compared with several commonly used optimization algorithms. A comparative analysis was conducted using a control group of five optimization algorithms: PSO, DBO, GA, SSA, and GWO. To ensure a fair comparison, a uniform parameter set was applied to all optimizers, with the population size and maximum iterations fixed at 100 and 1000, respectively. Additionally, multiple test functions from CEC2005 were employed to further validate the optimization performance and convergence speed of the algorithms. The selected test functions are shown in
Table 5, and their function graphs are illustrated in
Figure 19.
After conducting multiple simulation optimization experiments, the performance of the proposed IMGWO optimization algorithm was further verified. Eight test functions were selected from the test function set for optimization testing. Optimization fitness iteration curves of the IMGWO algorithm and the control group algorithms are shown in
Figure 20. As can be observed from
Figure 20, the proposed IMGWO optimization algorithm demonstrates faster convergence speed compared to the control group algorithms and exhibits a stronger ability to explore global optima. Moreover, compared to the unimproved GWO algorithm, the improved IMGWO algorithm shows significantly better optimization performance across all test functions.
The IMGWO optimization algorithm proposed in this article was applied to optimize the hyperparameters of the constructed prediction model. By effectively optimizing the hyperparameters within the given range, the optimized parameters were assigned to the prediction model to enhance its predictive performance. This approach ensures the rationality and effectiveness of the model’s parameter selection, avoiding a decline in prediction performance caused by inappropriate hyperparameter choices. Thus, the proposed IMGWO algorithm was utilized to optimize the hyperparameters of the constructed BiTCN-BiGRU prediction model.
To achieve optimal performance for the BiTCN-BiGRU prediction model, this study employs an Improved Grey Wolf Optimization algorithm (GWO-GSS-Levy) for the automatic optimization of key hyperparameters. The validation set Root Mean Square Error (RMSE) is adopted as the fitness function, with the objective of minimizing the model’s prediction error. The hyperparameters to be optimized fall into two categories, model structure and training parameters, totaling four variables: (1) the number of TCN filters (nb_filters), which controls the feature extraction channels of the temporal convolutional network, with a range set to [32, 128]; (2) the number of BiGRU hidden units (BiGRU_units), which determines the nonlinear expressive capacity of the bidirectional gated recurrent unit, with a range of [30, 200]; (3) the number of training epochs (num_epochs), i.e., the number of model iteration rounds, with a range of [20, 50]; and (4) the batch size (batch_size), representing the number of samples per training batch, with a range of [8, 32]. The hyperparameter boundaries are explicitly defined in problem_dict as bounds = [(32, 128), (30, 200), (20, 50), (8, 32)], ensuring that the search space covers a reasonable parameter range.
An improved GWO algorithm incorporating multiple strategies is utilized to search for optimal hyperparameters. The process begins with population initialization: N = 10 individuals with dimension dim = 4 are randomly generated within the hyperparameter boundaries. In each generation, grey wolf positions are updated through the following mechanisms: basic GWO updating (with linearly decreasing parameter a balancing exploration and exploitation, guided by Alpha, Beta, and Delta positions), the Golden Sine Strategy (GSS, which introduces sine function periodicity to adjust positions and enhance local search), the Opposition-Based Learning Strategy (which generates opposition-based solutions for 20% of individuals every 5 generations to broaden the search range), and the Lévy Flight Strategy (which randomly selects 20% of individuals and superimposes heavy-tailed distribution step sizes to enhance global exploration). The fitness function of the new positions is computed in each generation, updating both the individual historical best and the global optimal positions.
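For illustration, the basic GWO update that these strategies extend can be sketched in pure Python (demonstrated on a sphere function; the Golden Sine, opposition-based learning, and Lévy flight steps of the full IMGWO are omitted, and all names here are illustrative):

```python
import random

def gwo_minimize(objective, bounds, n_wolves=10, max_iter=50, seed=0):
    """Minimal basic Grey Wolf Optimizer.

    The control parameter `a` decays linearly from 2 to 0 to shift from
    exploration to exploitation; each wolf moves toward positions proposed
    by the three best wolves (alpha, beta, delta).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]

    def clip(pos):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(pos, bounds)]

    for t in range(max_iter):
        ranked = sorted(wolves, key=objective)
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]
        a = 2.0 * (1.0 - t / max_iter)  # linearly decreasing 2 -> 0
        new_wolves = []
        for w in wolves:
            pos = []
            for d in range(dim):
                pulls = []
                for leader in (alpha, beta, delta):
                    A = a * (2.0 * rng.random() - 1.0)
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - w[d])
                    pulls.append(leader[d] - A * D)
                pos.append(sum(pulls) / 3.0)  # average of the three pulls
            new_wolves.append(clip(pos))
        wolves = new_wolves
    return min(wolves, key=objective)
```

In hyperparameter tuning, `objective` would be the validation-set RMSE of a model trained with the candidate parameters, and `bounds` the hyperparameter ranges listed above.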
For the IMGWO optimization applied across all four seasons, the maximum number of iterations is set to 13 and the population size to 10; since the optimization is performed in a four-dimensional space, these settings were determined through multiple experimental trials. The optimization ranges for each parameter, as well as the optimized hyperparameters obtained through the IMGWO algorithm, are summarized in
Table 6.
4.5. Analysis of Wind Power Prediction Results
To further validate the effectiveness and correctness of the model presented in this article, multiple comparative models were selected to compare their performance with the proposed model. Additionally, predictions from models that did not undergo secondary decomposition or only underwent single decomposition were considered for performance comparison. These included models such as TCN-BiGRU, BiTCN-BiGRU, CNN-BiLSTM-AM, XGBOOST, VMD-BiTCN-BiGRU, CEEMDAN-BiTCN-BiGRU, and VMD-CEEMDAN-BiTCN-BiGRU. The proposed model was compared with these models on the test datasets used in this article for wind power prediction, which included data from spring, summer, autumn, and winter. By separately constructing models and making predictions for each season’s dataset, the effectiveness and generalization performance of the proposed method could be more effectively validated.
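The four evaluation metrics used to score the models can be computed as follows (a minimal sketch; `regression_metrics` is an illustrative helper, not the paper's code):

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, RMSE, R^2) for a pair of equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")
    return mse, mae, rmse, r2
```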
The prediction errors for the spring season are shown in
Table 7. Compared to other widely used models currently studied, the model proposed in this article demonstrated the best predictive performance. Specifically, the MSE, MAE, and RMSE metrics were 8.1, 69.3, and 89.9, respectively, while the R2 value reached 0.9905. To intuitively display the error performance of the model, bar charts and radar charts were used for error comparisons, as shown in
Figure 21 and
Figure 22. Compared to other single models, the proposed model in this article achieved the smallest MSE, MAE, and RMSE values and the highest R2 value, demonstrating the effectiveness of the proposed model.
Similarly, for the summer season, the prediction errors are shown in
Table 8. The proposed model in this article demonstrated the best predictive performance compared to other comparative models. Specifically, the MSE, MAE, and RMSE metrics were 6.8, 64.9, and 81.7, respectively, while the R2 value reached 0.9978. Compared to the control group models, all error metrics were better controlled, and prediction accuracy was effectively improved. To intuitively display the error performance of the model, bar charts and radar charts were used for error comparisons, as shown in
Figure 23 and
Figure 24. Compared to other single models, the proposed model in this article achieved the smallest MSE, MAE, and RMSE values and the highest R2 value, demonstrating the effectiveness of the proposed model.
For the wind power prediction in autumn, the prediction errors are shown in
Table 9. The proposed model in this article demonstrated the best predictive performance compared to other comparative models. Specifically, the MSE, MAE, and RMSE metrics were 14.9, 83.4, and 122.4, respectively, while the R2 value reached 0.9963. Compared to the control group models, all error metrics were better controlled, and prediction accuracy was effectively improved. To intuitively display the error performance of the model, bar charts and radar charts were used for error comparisons, as shown in
Figure 25 and
Figure 26. Compared to other single models, the proposed model in this article achieved the smallest MSE, MAE, and RMSE values and the highest R2 value, demonstrating the effectiveness of the proposed model.
For the wind power prediction in winter, the prediction errors are shown in
Table 10. The proposed model in this article demonstrated the best predictive performance compared to other comparative models. Specifically, the MSE, MAE, and RMSE metrics were 0.0116, 0.0755, and 0.1079, respectively, while the R2 value reached 0.9988. Compared to the control group models, all error metrics were better controlled, and prediction accuracy was effectively improved.
To intuitively display the error performance of the model, bar charts and radar charts were used for error comparisons, as shown in
Figure 27 and
Figure 28. Compared to other single models, the proposed model in this article achieved the smallest MSE, MAE, and RMSE values and the highest R2 value, demonstrating the effectiveness of the proposed model.
4.6. Ablation Study
To further validate the rationality of the proposed method and the effectiveness of interactions between the introduced modules, ablation experiments were conducted by designing baseline models and performing modular comparative tests. The aim is to verify the contribution of each component to the overall model performance. The specific experimental configurations are as follows: ① Baseline + VMD, ② Baseline + CEEMDAN decomposition, ③ Baseline + two-layer decomposition and reconstruction, ④ Baseline + IMGWO optimization algorithm. Here, the baseline model is defined as the BiTCN-BiGRU prediction model. The results of the ablation experiments are summarized in
Table 11.
The ablation study reveals that each added module plays a critical role in enhancing the overall prediction accuracy of the baseline BiTCN-BiGRU forecasting model for wind power. The baseline prediction model itself demonstrates high prediction accuracy across all four seasons. Furthermore, by incorporating a two-layer decomposition and reconstruction technique and using the reconstructed sequence data for prediction, the model’s accuracy is further improved compared to using a single decomposition method. Specifically, the R2 values for the baseline model combined with the two-layer decomposition and reconstruction technique reach 0.9791, 0.9976, 0.9949, and 0.9982 for the four seasons, respectively.
After optimizing the model’s hyperparameters using the IMGWO proposed in this study, the prediction errors for each season are further reduced. For spring, the MSE, MAE, and RMSE values are reduced to 8.1, 69.3, and 89.9, respectively, with an R2 of 0.9905. For summer, the MSE, MAE, and RMSE values are 6.8, 64.9, and 81.7, respectively, with an R2 of 0.9978. For autumn, the MSE, MAE, and RMSE values are 14.9, 83.4, and 122.4, respectively, with an R2 of 0.9963. For winter, the MSE, MAE, and RMSE values are 11.6, 75.5, and 107.9, respectively, with an R2 of 0.9988. Through the ablation experiments conducted for each module, the superiority of the proposed technical approach is further validated. The model and methods introduced in this study contribute to enhancing the prediction accuracy of wind power forecasting.
5. Conclusions
This study proposes a prediction method and model tailored for ultra-short-term wind power forecasting. By constructing a hybrid ultra-short-term wind power prediction model based on multi-layer data decomposition–reconstruction and IMGWO-optimized BiGRU-BiTCN, the accuracy of ultra-short-term wind power forecasting is effectively improved.
The VMD technique was employed to perform the primary decomposition of raw wind power data, reducing the noise interference present in the data. The SSA was used to optimize and solve for the optimal values of K and α. Subsequently, the sample entropy theory was applied to reconstruct the decomposed sequences. For the last decomposed sequence containing complex information, the CEEMDAN technique was used for secondary decomposition, followed by sequence reconstruction.
To prevent the reliability of the prediction model from decreasing due to improper parameter settings, an optimization algorithm was utilized to solve for multiple hyperparameters of the BiTCN-BiGRU prediction model. The GWO algorithm was systematically enhanced with multiple strategies to improve its global search performance and convergence rate, leading to a marked increase in optimization efficacy.
The evaluation employed actual operational data from a wind farm to verify the predictive performance of the proposed method through simulations. Four evaluation metrics were employed, and datasets covering four seasons—spring, summer, autumn, and winter—were constructed for model validation. For spring, the MSE, MAE, and RMSE prediction error metrics reached 8.1, 69.3, and 89.9, respectively, with an R2 value of 0.9905. For summer, the MSE, MAE, and RMSE values were 6.8, 64.9, and 81.7, respectively, with an R2 of 0.9978. For autumn, the MSE, MAE, and RMSE values were 14.9, 83.4, and 122.4, respectively, with an R2 of 0.9963. For winter, the MSE, MAE, and RMSE values were 11.6, 75.5, and 107.9, respectively, with an R2 of 0.9988.
Although the model proposed in this study demonstrates strong performance in wind power forecasting tasks, it also has certain limitations. The forecasting model constructed in this work relies heavily on computational resources. Its high performance depends on a complex network architecture and bidirectional feature fusion mechanisms, which significantly increase computational complexity. Additionally, the model training process requires high-quality time-series data. Moreover, due to the use of optimization algorithms for hyperparameter tuning—which necessitates repeated calls to the model training interface—parameter calibration on large-scale datasets can be time-consuming. The parameters obtained and the model constructed may also be susceptible to potential overfitting issues.
In the future, the forecasting method proposed in this study will be further applied to more wind farms to validate its generalization performance. Subsequent research can focus on addressing these challenges and limitations.