1. Introduction
Wind power has expanded rapidly worldwide in recent years and has become a leading sustainable alternative to traditional fossil fuels owing to its environmental advantages [1]. However, the inherent intermittency and variability of wind energy pose significant challenges to the operational safety and stability of large-scale grid-integrated wind power systems [2]. As the share of wind energy in power systems increases, accurate, timely, and reliable wind power forecasting becomes critical for effective power system planning, dispatch, and secure grid operation [3].
Wind power forecasting (WPF) methods are typically categorized into four main approaches: physical models, statistical models, artificial intelligence-based techniques, and hybrid forecasting methods [4]. Physical models rely on numerical weather prediction (NWP) data. Chang et al. [5] propose a long-term hybrid WPF model that corrects the NWP wind speed and uses multi-scale deep learning regression to exclude excessive NWP data. However, the accuracy of physical models relies heavily on the precision of the input meteorological data and is highly sensitive to fluctuations in weather conditions. Statistical models use historical data to derive relationships between wind speed and power output. Commonly employed techniques include the autoregressive integrated moving average (ARIMA) [6], linear regression [7], and Kalman filtering [8]. These methods are particularly effective for short-term and very short-term forecasting when high-quality historical data are available. Chen [9] proposed a statistical downscaling technique for meteorological wind models, demonstrating that while statistical models are generally straightforward to implement and computationally efficient, their performance can deteriorate under complex nonlinear dynamics or rapidly changing weather conditions.
Artificial intelligence (AI)-based prediction techniques encompass a wide range of models, including artificial neural networks (ANNs) [10], support vector machines (SVMs) [11], and deep learning (DL) models [12]. Traditional ANNs, such as the feedforward neural network (FNN) [13], multilayer perceptron (MLP) [14], backpropagation neural network (BPNN) [15], and radial basis function neural network (RBFNN) [16], are effective at capturing the temporal and spatial correlations within wind power datasets. However, their performance may degrade significantly on large-scale datasets because of the increased data complexity, which limits model scalability and computational efficiency.
DL, an advanced paradigm within machine learning, has emerged as a powerful and versatile tool for wind power forecasting owing to its capacity for automatic feature extraction and for modeling intricate nonlinear dependencies in high-dimensional datasets. The predominant DL architectures deployed in this domain fall into four principal categories: deep neural networks (DNNs) [17], convolutional neural networks (CNNs) [18], recurrent neural networks (RNNs) [19], and enhanced RNN variants, namely long short-term memory (LSTM) [20] and the gated recurrent unit (GRU) [21], which are specifically engineered to mitigate vanishing gradients in long-term wind sequence modeling. CNNs exhibit robust feature extraction capabilities and computational efficiency, making them well suited for spatial–temporal pattern analysis in wind datasets. As a time-series-adapted variant of CNNs, temporal convolutional networks (TCNs) [22] are designed to capture both short- and long-term temporal dependencies more effectively, thereby enhancing the accuracy and reliability of wind power predictions. Complementing these approaches, generative adversarial networks (GANs) have emerged as effective frameworks for addressing data scarcity and distributional uncertainty in wind power forecasting tasks, particularly through semi-supervised learning paradigms [23]. Recently, Transformer architectures have been applied to wind power forecasting, using multi-head self-attention to model localized fluctuations and global trend correlations simultaneously. Erick et al. [24] introduced a Transformer-based architecture with adaptive positional encoding, specifically optimized for wind power sequences. This design has demonstrated superior accuracy and reliability in long-term forecasting, establishing Transformers as a state-of-the-art methodology in the domain.
By combining the strengths of individual forecasting models, hybrid forecasting models have become a key approach across various forecasting domains [25]. This integrative framework retains the benefits of each constituent model while reducing the uncertainty that arises from relying on a single methodology. As a key subcategory of hybrid forecasting frameworks, signal decomposition-based combined models enhance wind power forecasting accuracy by systematically reducing the complexity of the input data. Common signal decomposition techniques include univariate and multivariate algorithms. Univariate algorithms comprise wavelet decomposition (WD) [26], variational mode decomposition (VMD) [27], empirical mode decomposition (EMD) [28], and enhanced EMD variants such as ensemble empirical mode decomposition (EEMD) [29] and complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) [30]. Bisoi et al. [31] demonstrate VMD’s superiority over EMD, particularly in noise robustness and feature extraction precision for predictive modeling applications.
However, these univariate decomposition algorithms are ineffective for processing multivariate data. In wind power forecasting, datasets are typically multidimensional, comprising multiple correlated time series such as wind speed, temperature, and pressure. Consequently, the prediction accuracy of such methods is inherently limited. Unlike univariate decomposition, multivariate techniques (MEMD [32], MVMD [33]) and their hybrid derivatives (e.g., MEMD-GRU [34], MVMD-Transformer [35], MVMD-CNN-BiLSTM [36]) can effectively capture cross-variable dependencies, enabling more robust system modeling and superior prediction accuracy compared to traditional approaches. While effective, these multivariate decomposition-based hybrid models incur significantly higher training costs in terms of time and energy consumption, thereby limiting their applicability in sustainable forecasting tasks.
To address these limitations, we propose an accurate yet computationally efficient short-term wind power prediction framework that combines MVMD with our novel Series-Core Fused Time Series (SOFTS) approach. While MVMD delivers superior prediction accuracy, its computational demands remain substantial. The proposed SOFTS technique effectively mitigates this computational burden while preserving predictive performance. The key contributions of this work include the following:
- (1)
High prediction accuracy: In the data processing stage, we apply the MVMD algorithm to simultaneously decompose the meteorological data series and the wind power data series, effectively addressing the frequency mismatch between the meteorological and wind power sequences. This approach enables time–frequency synchronized analysis of both meteorological variables and wind power generation series, thereby supporting high prediction accuracy.
- (2)
Low computational cost: In the prediction training stage, we adopt the SOFTS framework, which employs a STAR aggregate–redistribute module within a centralized architecture. The STAR module aggregates all series to generate a global core representation, which is subsequently redistributed and fused with individual series representations, enabling efficient cross-channel interactions. Its computational complexity primarily scales with the number of input channels rather than the input sequence length. Notably, we provide a theoretical analysis of the computational complexity in comparison with existing methods (see Table 1). This analysis shows that the core computational complexity of the proposed method is lower than that of LSTM, whose hidden states must be updated sequentially, and that of Transformer architectures, whose self-attention cost grows quadratically with the sequence length.
- (3)
Practical simulation validation: A real-world dataset from the Xinjiang Guohua Jingxia North Wind Farm was used to compare the MVMD-SOFTS model with eight benchmark models, including the advanced Transformer model. The results demonstrate that the MVMD-SOFTS model achieves superior performance in both single-step and multi-step ahead forecasting.
The remainder of this paper is organized as follows. Section 2 introduces the overall framework and methodology of the proposed model. Section 3 describes the data preparation process and the evaluation metrics employed. Section 4 presents the experimental setup and results, including detailed comparisons with baseline methods. Section 5 concludes the paper and outlines potential directions for future research.
2. Materials and Methods
2.1. Multivariate Variational Mode Decomposition
As a multivariate extended signal decomposition algorithm based on VMD, MVMD has recently gained popularity. MVMD can simultaneously decompose meteorological data series and wind power time series, allowing for the capture of dynamic characteristics of wind power while effectively incorporating the influence of meteorological factors on wind power fluctuations. In contrast to traditional univariate decomposition methods, MVMD overcomes the limitations of single-signal processing by providing more comprehensive time-frequency information, improving the robustness and accuracy of the forecasting model.
The MVMD algorithm was initially proposed by Naveed ur Rehman and Hania Aftab in 2019 [33]. The MVMD decomposition process is outlined as follows:
- (1)
Define input data. The input data consists of the wind power series along with the meteorological data sequences, mathematically expressed as
$$\mathbf{x}(t) = \left[x_1(t), x_2(t), x_3(t), x_4(t), x_5(t), x_6(t)\right]^{T}$$
where $x_1(t)$, $x_2(t)$, $x_3(t)$, $x_4(t)$, $x_5(t)$, and $x_6(t)$ denote wind power, wind speed, wind direction, temperature, atmospheric pressure, and humidity, respectively. The variable t denotes time.
- (2)
Signal decomposition model. The goal is to decompose the original multivariate input signal $\mathbf{x}(t)$ into an ensemble of K multivariate modulated oscillatory components $\mathbf{u}_k(t) = \left[u_{k,1}(t), \dots, u_{k,C}(t)\right]^{T}$ while meeting the following requirements: (i) the cumulative bandwidth of the extracted modes is as small as possible; (ii) the sum of the extracted modes exactly reconstructs the original signal. The constrained optimization problem can be formulated as
$$\min_{\{u_{k,c}\},\{\omega_k\}} \sum_{k=1}^{K}\sum_{c=1}^{C} \left\| \partial_t \left[ u_{+}^{k,c}(t)\, e^{-j\omega_k t} \right] \right\|_2^2 \quad \text{s.t.} \quad \sum_{k=1}^{K} u_{k,c}(t) = x_c(t), \quad c = 1, \dots, C$$
where K and C denote the number of IMFs and channels, respectively; $\partial_t$ denotes the partial derivative with respect to time; $u_{+}^{k,c}(t)$ denotes the analytic signal of $u_{k,c}(t)$, which has a unilateral frequency spectrum and is obtained using the Hilbert transform; $\omega_k$ represents the central frequency of the kth IMF set $\{u_{k,c}(t)\}_{c=1}^{C}$, which is shared by all channels; and $x_c(t)$ is the input signal of the cth data channel, encompassing both the wind power time series and the meteorological data sequences.
- (3)
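Form augmented Lagrangian function. By introducing Lagrangian multipliers and quadratic penalty terms, the aforementioned constrained optimization problem can be converted to an augmented Lagrangian function as
$$\mathcal{L}\left(\{u_{k,c}\},\{\omega_k\},\{\lambda_c\}\right) = \alpha \sum_{k=1}^{K}\sum_{c=1}^{C} \left\| \partial_t \left[ u_{+}^{k,c}(t)\, e^{-j\omega_k t} \right] \right\|_2^2 + \sum_{c=1}^{C} \left\| x_c(t) - \sum_{k=1}^{K} u_{k,c}(t) \right\|_2^2 + \sum_{c=1}^{C} \left\langle \lambda_c(t),\; x_c(t) - \sum_{k=1}^{K} u_{k,c}(t) \right\rangle$$
where $\alpha$ serves as the weighting factor for the penalty and $\lambda_c(t)$ is the Lagrangian multiplier of the cth channel.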
- (4)
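Alternating Direction Method of Multipliers (ADMM) iterations. Using ADMM, the complete optimization problem is decomposed into a sequence of iterative sub-optimization problems. Note that problem (3) only contains equality constraints, which allows the ADMM iterations to yield closed-form solutions to the subproblems, thus reducing the difficulty of the solution process. The closed-form update equations for the modes $u_{k,c}$ and the center frequency $\omega_k$ are presented below:
$$\hat{u}_{k,c}^{m+1}(\omega) = \frac{\hat{x}_c(\omega) - \sum_{i \neq k} \hat{u}_{i,c}(\omega) + \hat{\lambda}_c^{m}(\omega)/2}{1 + 2\alpha\left(\omega - \omega_k^{m}\right)^2}$$
$$\omega_k^{m+1} = \frac{\sum_{c=1}^{C} \int_0^{\infty} \omega \left|\hat{u}_{k,c}^{m+1}(\omega)\right|^2 d\omega}{\sum_{c=1}^{C} \int_0^{\infty} \left|\hat{u}_{k,c}^{m+1}(\omega)\right|^2 d\omega}$$
where $\hat{x}_c(\omega)$, $\hat{u}_{k,c}(\omega)$, and $\hat{\lambda}_c(\omega)$ represent the Fourier transforms of $x_c(t)$, $u_{k,c}(t)$, and $\lambda_c(t)$, and m denotes the current iteration. Ultimately, after executing the aforementioned processing steps, six sets of sub-series $\{u_{k,1}(t)\}_{k=1}^{K}$, $\{u_{k,2}(t)\}_{k=1}^{K}$, $\{u_{k,3}(t)\}_{k=1}^{K}$, $\{u_{k,4}(t)\}_{k=1}^{K}$, $\{u_{k,5}(t)\}_{k=1}^{K}$, $\{u_{k,6}(t)\}_{k=1}^{K}$ are obtained.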
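The decomposition steps above can be illustrated with a minimal NumPy sketch. It is a simplified illustration under stated assumptions rather than the implementation used in this paper: it works on the one-sided spectrum from `rfft` instead of building the analytic signal explicitly, omits the mirror extension of the signal, and uses `tau = 0` (no multiplier update) by default. The function name `mvmd_sketch` and its parameters are hypothetical. The key point it shows is that every channel is filtered around the same center frequency, which is updated from the power pooled over all channels.

```python
import numpy as np

def mvmd_sketch(x, K, alpha=2000.0, tau=0.0, n_iter=200, tol=1e-7):
    """Simplified MVMD. x has shape (C, T); returns modes (K, C, T) and
    center frequencies (K,) in cycles per sample, sorted by frequency."""
    C, T = x.shape
    x_hat = np.fft.rfft(x, axis=1)             # one-sided spectrum, shape (C, F)
    freqs = np.fft.rfftfreq(T)                 # frequencies in [0, 0.5]
    u_hat = np.zeros((K, C, freqs.size), dtype=complex)
    omega = 0.5 * np.arange(K) / K             # uniformly spread initial center frequencies
    lam = np.zeros_like(x_hat)                 # Lagrangian multipliers per channel
    for _ in range(n_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # residual of every channel after removing all other modes
            residual = x_hat - u_hat.sum(axis=0) + u_hat[k]
            # Wiener-filter-like mode update around the shared center frequency
            u_hat[k] = (residual + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # center frequency: power-weighted mean frequency pooled over channels
            power = np.abs(u_hat[k]) ** 2
            omega[k] = (freqs * power).sum() / (power.sum() + 1e-12)
        # dual ascent on the reconstruction constraint (inactive when tau == 0)
        lam = lam + tau * (x_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if change < tol:
            break
    modes = np.fft.irfft(u_hat, n=T, axis=2)   # back to the time domain
    order = np.argsort(omega)
    return modes[order], omega[order]

# Synthetic two-channel example: both channels share components at 0.02 and 0.2 cycles/sample.
t = np.arange(1000)
x = np.stack([
    np.cos(2 * np.pi * 0.02 * t) + 0.5 * np.cos(2 * np.pi * 0.2 * t),
    0.8 * np.sin(2 * np.pi * 0.02 * t) + np.cos(2 * np.pi * 0.2 * t),
])
modes, omega = mvmd_sketch(x, K=2)
print(modes.shape, omega)
```

In the wind power setting, `x` would hold the six channels (wind power plus the five meteorological variables), and each `modes[k]` would be one multivariate IMF passed on to the forecaster.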
2.2. Series-Core Fused Time Series (SOFTS) Model
To address the computational complexity issues arising from MVMD, this paper presents an efficient MLP-based model, the series-core fused time series (SOFTS) model [37]. The architecture of the SOFTS model is depicted in Figure 1 and comprises the following four components.
(1) Reversible Instance Normalization. Normalization is a fundamental preprocessing step in time series forecasting models. In SOFTS, reversible instance normalization is employed to enhance the stability of the prediction process. Initially, the historical time series are normalized by centering them to zero mean and scaling them to unit variance. This normalization effectively removes the local statistical dependencies within the data, thereby facilitating more stable and reliable predictions by the base forecaster. Once the forecasting is completed, the normalization is reversed to restore the original statistical properties of the predicted series. This approach has been widely adopted in state-of-the-art models to improve performance and ensure the model’s adaptability to various time series characteristics.
(2) Series Embedding. Series embedding projects each channel of the input time series into a hidden-dimensional space through a linear transformation. This transformation prepares the time series data for subsequent processing while preserving the essential temporal dependencies inherent in the series. In our approach, we apply series embedding to the input historical data by linearly projecting $X \in \mathbb{R}^{C \times L}$ into $S_0 \in \mathbb{R}^{C \times H}$, where L denotes the number of historical time steps used for forecasting, and H is the dimensionality of the hidden layer.
(3) STAR Module. A star-shaped aggregate-redistribute module, STAR for short, is used to exchange information between different data channels and represents the core innovation of SOFTS. Unlike traditional methods such as attention, which involve pairwise comparisons between channels, STAR uses a centralized structure: it aggregates the information from all series to obtain a comprehensive core representation and then distributes the core information to each channel, as shown in Figure 2. This interaction pattern reduces the complexity and inefficiency of distributed interactions and improves robustness when some channels are abnormal. The input data $S_0$ from the series embedding is refined in sequence through N layers of the STAR module. Each layer processes the representation from the previous layer, capturing increasingly complex patterns and dependencies within the multivariate time series. The output at the nth layer is updated as follows:
$$S_n = \mathrm{STAR}(S_{n-1}), \quad n = 1, 2, \dots, N$$
Specifically, the nth layer STAR module first extracts the core representation of the multivariate time series when provided with the series representations of each channel as input. The core representation O is defined as follows:
$$O = f(s_1, s_2, \dots, s_C)$$
where f denotes an arbitrary function, and $s_1, s_2, \dots, s_C$ represent the input multivariate series comprising C channels.
The core representation encodes the global information across all the data channels. We employ stochastic pooling [38] to obtain the core representation by aggregating the representations of the C channels:
$$O_n = \mathrm{Stoch\_Pool}\left(\mathrm{MLP}_1(S_{n-1})\right)$$
where $\mathrm{MLP}_1$ transforms the series representation from the hidden dimension H of the series embedding to the core dimension H′ using the GELU activation function, so that $A = \mathrm{MLP}_1(S_{n-1}) \in \mathbb{R}^{C \times H'}$ with entries $a_{c,j}$. $\mathrm{Stoch\_Pool}$ refers to stochastic pooling, which combines the advantages of max pooling and average pooling. Specifically, it applies a softmax over the channel activations in each dimension j to derive a probability distribution, where each channel’s activation value corresponds to a probability p:
$$p_{c,j} = \frac{e^{a_{c,j}}}{\sum_{i=1}^{C} e^{a_{i,j}}}$$
During training, the core value $o_j$ in dimension j is obtained by randomly sampling a channel c according to the probabilities p. Because the selection follows the activation probabilities, it enhances the model’s generalization ability:
$$o_j = a_{c^{*},j}, \quad c^{*} \sim \mathrm{Multinomial}\left(p_{1,j}, \dots, p_{C,j}\right)$$
During the testing phase, a weighted summation is used to obtain the core representation for each dimension to ensure model stability:
$$o_j = \sum_{c=1}^{C} p_{c,j}\, a_{c,j}$$
Subsequently, we fuse the representations of the core and all the associated series, consolidating the information from these distinct components into a unified representation for further analysis:
$$F_n = \mathrm{Repeat\_Concat}(S_{n-1}, O_n), \qquad S_n = \mathrm{MLP}_2(F_n) + S_{n-1}$$
where the Repeat_Concat operation concatenates the core representation $O_n$ with each individual series representation (as shown in Figure 2), resulting in a new representation $F_n$, i.e., $F_n \in \mathbb{R}^{C \times (H + H')}$. Subsequently, $\mathrm{MLP}_2$ is used to project the concatenated representation back into the hidden dimension, fusing the information from both the core and the series representations and resulting in the fused representation $S_n$ ($S_n \in \mathbb{R}^{C \times H}$).
(4) Linear Predictor. After applying N layers of the STAR module in sequence, we obtain the fused representation at the Nth layer, denoted by $S_N$. Then, we use a linear predictor ($\mathrm{Linear}$), which maps each channel from the hidden dimension to the forecast horizon, to generate the forecasting results, given by the following formula:
$$\hat{Y} = \mathrm{Linear}(S_N)$$
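The four components described above can be traced end to end in a short NumPy forward pass. This is a minimal sketch under stated assumptions, not the trained model from this paper: the weights are random stand-ins, each MLP is reduced to a single linear layer, the learnable affine parameters often used with reversible instance normalization are omitted, and the names `init_linear`, `star_layer`, `softs_forward`, `init_params`, and `H_core` are hypothetical.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def softmax(a, axis):
    a = a - a.max(axis=axis, keepdims=True)    # shift for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def init_linear(rng, n_in, n_out):
    # hypothetical helper: random weights standing in for trained parameters
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out)

def star_layer(s, layer, rng, training):
    """One STAR layer on series representations s of shape (C, H)."""
    (w1, b1), (w2, b2) = layer["mlp1"], layer["mlp2"]
    a = gelu(s @ w1 + b1)                      # (C, H'): candidate core values per channel
    p = softmax(a, axis=0)                     # probabilities over channels for each core dimension
    n_channels, n_core = a.shape
    if training:
        # stochastic pooling: sample one channel per core dimension
        idx = np.array([rng.choice(n_channels, p=p[:, j]) for j in range(n_core)])
        core = a[idx, np.arange(n_core)]
    else:
        core = (p * a).sum(axis=0)             # deterministic weighted sum at test time
    # Repeat_Concat: attach the same core vector to every channel, shape (C, H + H')
    fused = np.concatenate([s, np.tile(core, (n_channels, 1))], axis=1)
    return fused @ w2 + b2 + s                 # project back to H and add the residual

def softs_forward(x, params, rng, training=False):
    """x: history of shape (C, L). Returns a forecast of shape (C, horizon)."""
    mean = x.mean(axis=1, keepdims=True)       # instance normalization statistics
    std = x.std(axis=1, keepdims=True) + 1e-5
    s = ((x - mean) / std) @ params["embed"][0] + params["embed"][1]   # series embedding
    for layer in params["star"]:
        s = star_layer(s, layer, rng, training)
    y = s @ params["head"][0] + params["head"][1]                      # linear predictor
    return y * std + mean                      # reverse the normalization

def init_params(rng, L, H, H_core, horizon, n_layers):
    return {
        "embed": init_linear(rng, L, H),
        "star": [{"mlp1": init_linear(rng, H, H_core),
                  "mlp2": init_linear(rng, H + H_core, H)} for _ in range(n_layers)],
        "head": init_linear(rng, H, horizon),
    }

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 96))                   # 6 channels, 96 historical steps
params = init_params(rng, L=96, H=32, H_core=16, horizon=4, n_layers=2)
y_eval = softs_forward(x, params, rng, training=False)
y_train = softs_forward(x, params, rng, training=True)
print(y_eval.shape)  # (6, 4)
```

Note that no operation compares channels pairwise: each layer costs a fixed number of matrix products per channel plus one pooling step across channels, which is what keeps the interaction cost linear in the number of channels.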
2.3. MVMD-SOFTS Framework Structure
The framework of the proposed MVMD-SOFTS forecasting model is depicted in Figure 3, and the specific steps are outlined as follows.
Step 1: Data decomposition. The input data comprises the wind power generation time series and the meteorological data time series such as the wind speed, wind direction, temperature, air pressure, and humidity. Based on the MVMD algorithm, the input multivariate signals are decomposed into a predefined number (denoted as K) of IMFs. This decomposition process separates the complex non-stationary data into simpler oscillatory components with distinct frequencies, thereby capturing the underlying patterns and trends in both the wind power generation and meteorological data. In this case study, the input variables are decomposed into eight distinct IMFs, each corresponding to a different frequency. These IMFs are crucial for subsequent analysis and forecasting, as they offer a more manageable and interpretable representation of the temporal dynamics inherent in the input data.
Step 2: Model prediction. For each IMF, we use the SOFTS architecture to capture the temporal dependencies and the correlations among the wind power and meteorological variable channels, and thereby forecast the future behavior of wind power generation and the meteorological variables at each frequency scale. These forecasted IMFs are subsequently used in the following step to reconstruct the final prediction.
Step 3: Reconstruction and evaluation. Summing all the forecasted IMFs produces a comprehensive prediction for wind power generation and the meteorological variables. Following reconstruction, error analysis is conducted using evaluation metrics such as the coefficient of determination ($R^2$), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE). These metrics quantify the discrepancies between the predicted and actual values, providing a thorough assessment of the model’s performance and accuracy. This step is essential for identifying potential areas for improvement and ensuring the reliability of the forecasting model.
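Step 3 can be sketched in a few lines of NumPy. The metric formulas below are the standard textbook definitions and are assumed here rather than taken from this paper's own metric definitions; the helper names `reconstruct` and `evaluate` and the toy numbers are hypothetical.

```python
import numpy as np

def reconstruct(imf_forecasts):
    """Sum forecasted IMFs of shape (K, T) into a single forecast of shape (T,)."""
    return np.asarray(imf_forecasts, dtype=float).sum(axis=0)

def evaluate(y_true, y_pred):
    """Standard R^2, MAE, RMSE, and MAPE (in percent)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    return {
        "R2": 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2),
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        # MAPE is undefined when y_true contains zeros (e.g., zero power output)
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),
    }

# Toy example with K = 2 forecasted IMFs over 4 time steps
y_true = [10.0, 20.0, 30.0, 40.0]
imf_forecasts = [[8.0, 18.0, 33.0, 38.0],
                 [1.0, 1.0, -1.0, 1.0]]
y_pred = reconstruct(imf_forecasts)        # [9, 19, 32, 39]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Because MAPE divides by the actual value, periods with zero or near-zero power output need to be masked or handled separately before this metric is computed.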
2.4. Computation Complexity Comparison
Table 1 outlines the theoretical complexity of the LSTM, Transformer, and SOFTS models. Each complexity formulation includes three components: input encoding, core computation (recurrent-based, attention-based, or MLP-based), and multi-step forecasting output. Here, C denotes the number of input channels, L represents the length of the input historical sequence, d is the hidden dimension, and H refers to the length of the forecast horizon (in this subsection, H denotes the horizon rather than the hidden dimensionality used in Section 2.2).
For the LSTM model, the input-encoding term arises from projecting a multivariate input sequence of length L with C channels into a hidden space. The main computational cost results from the recurrent hidden-to-hidden transformations, which are carried out sequentially over the time steps. The output term corresponds to mapping the hidden states to H forecast steps, with each step producing outputs across all C channels through a fully connected layer.
For the Transformer model, the input-encoding term accounts for embedding a multivariate input sequence with C channels and length L into a d-dimensional representation. The primary computational burden comes from the encoder’s self-attention mechanism, whose cost grows quadratically with L because of pairwise interactions across all input positions. Furthermore, the decoder contributes an additional cross-attention cost proportional to the product of H and L, as each of the H forecast steps attends to the entire encoded sequence. The output term results from transforming the decoder outputs into final predictions, where each step generates C features through a fully connected layer.
For the SOFTS model, the input-encoding term reflects the temporal encoding of each input channel over the historical sequence. The core computational load stems from the STAR module, where inter-channel interactions are captured through parallel MLP operations. This design avoids the sequential dependencies of recurrent models and the pairwise interactions of attention-based models, enabling efficient and fully parallel computation. The final term corresponds to producing multi-step predictions from the learned representations using a shared output layer.
Overall, LSTM involves sequential computation, where hidden states are updated step by step, so its dominant cost grows with the sequence length and cannot be parallelized across time steps. Transformer requires intensive computation because the encoder’s self-attention cost is quadratic in L, and the decoder’s cross-attention mechanism adds a cost that grows with both H and L. While the encoder supports full parallelism, the decoder remains partially sequential during prediction. SOFTS offers a more efficient structure, with all operations being parallelizable. Its overall complexity grows linearly with the sequence length L, the channel count C, and the prediction horizon H, and it avoids both quadratic attention costs and recursive updates.
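The scaling behavior described above can be made concrete with a small calculation. The expressions below are common leading-order operation counts with constants dropped; they are illustrative assumptions and are not the entries of Table 1. The function name `dominant_costs` is hypothetical.

```python
def dominant_costs(C, L, d, H):
    """Illustrative leading-order operation counts (assumed, constants dropped)."""
    return {
        # recurrent hidden-to-hidden updates, one per time step
        "lstm_core": L * d * d,
        # encoder self-attention over all pairs of input positions
        "transformer_self_attention": L * L * d,
        # decoder cross-attention: each of H forecast steps attends to L positions
        "transformer_cross_attention": H * L * d,
        # per-channel embedding, STAR MLPs, and output projection
        "softs_total": C * L * d + C * d * d + C * d * H,
    }

base = dominant_costs(C=6, L=96, d=64, H=4)
longer = dominant_costs(C=6, L=192, d=64, H=4)   # double the input length
for name in base:
    print(name, longer[name] / base[name])
```

Under these assumed counts, doubling L quadruples the self-attention term, doubles the recurrent and cross-attention terms, and less than doubles the SOFTS total, since only its embedding term depends on L.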