Series-Core Fusion Based Multivariate Variational Mode Decomposition for Short-Term Wind Power Prediction Using Multiple Meteorological Data

Lu, Wentian; Lu, Zhenming; Liu, Wenjie; Cao, Yifeng

doi:10.3390/forecast8010015


School of Mechanical and Electric Engineering, Guangzhou University, Guangzhou 510006, China


Forecasting 2026, 8(1), 15; https://doi.org/10.3390/forecast8010015

Submission received: 2 January 2026 / Revised: 29 January 2026 / Accepted: 10 February 2026 / Published: 12 February 2026

(This article belongs to the Collection Energy Forecasting)


Highlights

What are the main findings?

An innovative MVMD algorithm extracts informative time–frequency features via joint analysis of wind power and meteorological data, boosting short-term wind power prediction accuracy.
A SOFTS framework integrated with the STAR aggregation and redistribution mechanism streamlines computation, enhances efficiency and suits real-time wind power forecasting scenarios well.

What are the implications of the main findings?

The MVMD-SOFTS model breaks the dual bottlenecks of traditional single-variable decomposition and high Transformer computational cost, providing a new high-accuracy, high-efficiency paradigm for renewable energy forecasting.
The MVMD-SOFTS model maintains excellent multi-step prediction accuracy and real-time computational efficiency, meeting practical short-time-scale forecasting needs in power grid scheduling.

Abstract

Accurate wind power forecasting is critical for enhancing the operational efficiency and stability of electrical power grids. Conventional forecasting methods based on single-variable signal decomposition ignore the coupling between wind power and multiple meteorological variables, which limits prediction accuracy. This study proposes an accurate and fast short-term wind power prediction approach based on series-core fusion that takes multiple meteorological variables into account. In the data preprocessing stage, the multivariate variational mode decomposition (MVMD) algorithm decomposes wind power and meteorological variables into the same predefined number of frequency-aligned intrinsic mode functions (IMFs), thereby enhancing feature representation and improving forecasting accuracy. During the training stage, the series-core fused time series (SOFTS) model captures, for each IMF, the connections between the wind power channel and the meteorological variable channels, achieving fast convergence through its streamlined and parallel structure. In the forecasting stage, the final wind power prediction is generated by reconstructing all predicted IMFs. Furthermore, we conducted a comprehensive performance evaluation by comparing the proposed MVMD-SOFTS model with eight alternative models: CNN, TCN, LSTM, GRU, Transformer, SOFTS, CEEMDAN-SOFTS, and VMD-SOFTS. The results indicate that MVMD-SOFTS outperformed all other models, demonstrating its effectiveness in capturing the multifaceted relationships in wind power forecasting.

Keywords:

wind power forecasting; meteorological data; multivariate variational mode decomposition; series-core fused time series model

1. Introduction

Wind power, as a quintessential renewable energy source, has undergone remarkably rapid global expansion in recent years, establishing itself as a leading sustainable alternative to traditional fossil fuels, due to its substantial advantages in sustainability and environmental performance [1]. However, the inherent intermittency and variability of wind energy—characteristics that define it as an intermittent power source—pose significant challenges to the operational safety and stability of large-scale grid-integrated wind power systems [2]. As the integration of wind energy into power systems increases, accurate, timely, and reliable wind power forecasting becomes critical for effective power system planning, dispatch, and secure grid operations [3].

Wind power forecasting (WPF) methods are typically categorized into four main approaches: physical models, statistical models, artificial intelligence-based techniques, and hybrid forecasting methods [4]. Physical models derive power forecasts from numerical weather prediction (NWP) data. For example, Chang et al. [5] propose a long-term hybrid WPF model that corrects NWP wind speed and uses multi-scale deep learning regression to exclude excessive NWP data. However, the accuracy of physical models relies heavily on the precision of the input meteorological data and is highly sensitive to fluctuations in weather conditions. Statistical models use historical data to derive relationships between wind speed and power output. Commonly employed techniques include autoregressive integrated moving average (ARIMA) [6], linear regression [7], and Kalman filtering [8]. These methods are particularly effective for short-term and very short-term forecasting when high-quality historical data are available. Chen [9] proposed a statistical downscaling technique for meteorological wind models, demonstrating that while statistical models are generally straightforward to implement and computationally efficient, their performance can deteriorate under complex nonlinear dynamics or rapidly changing weather conditions.

Artificial intelligence (AI)-based prediction techniques encompass a wide range of models, including artificial neural network (ANN) [10], support vector machine (SVM) [11], and deep learning models (DL) [12]. Traditional ANNs—such as feedforward neural network (FNN) [13], multilayer perceptron (MLP) [14], backpropagation neural network (BPNN) [15], and radial basis function neural network (RBFNN) [16]—are highly effective at capturing the inherent temporal and spatial correlations within wind power datasets. However, their performance may degrade significantly when processing large-scale datasets due to the increased data complexity, presenting substantial challenges for model scalability and computational efficiency. Deep learning (DL), an advanced paradigm within machine learning, has emerged as a powerful and versatile tool for wind power forecasting due to its superior capacity for autonomous feature extraction and modeling intricate nonlinear dependencies within high-dimensional datasets. The predominant DL architectures deployed in this domain fall into four principal categories: deep neural networks (DNNs) [17], convolutional neural networks (CNNs) [18], recurrent neural networks (RNNs) [19], and enhanced RNN variants—long short-term memory (LSTM) [20] and gated recurrent unit (GRU) [21]—specifically engineered to mitigate vanishing gradient challenges in long-term wind sequence modeling. CNNs exhibit robust feature extraction capabilities and computational efficiency, making them well-suited for spatial–temporal pattern analysis in wind datasets. As a time-series-adapted variant of CNNs, Temporal Convolutional Networks (TCNs) [22] are specifically designed to capture both short- and long-term temporal dependencies more effectively, thereby enhancing the accuracy and reliability of wind power predictions. 
Complementing these approaches, generative adversarial networks (GANs) have emerged as effective frameworks for addressing data scarcity and distributional uncertainty in wind power forecasting tasks, particularly through semi-supervised learning paradigms [23]. Recently, Transformer architectures have revolutionized wind power forecasting through multi-head self-attention mechanisms to simultaneously model localized fluctuations and global trend correlations. Erick et al. [24] introduced a transformer-based architecture with adaptive positional encoding, specifically optimized for wind power sequences. This innovation has demonstrated superior accuracy and reliability in long-term forecasting, solidifying Transformers as a state-of-the-art methodology in the domain.

By combining individual forecasting models’ benefits, hybrid forecasting models have become a key approach across various forecasting domains [25]. This integrative framework can retain the benefits of each model individually while effectively reducing the uncertainty arising from exclusive reliance on single methodologies. As a key subcategory within hybrid forecasting frameworks, signal decomposition-based combined models significantly enhance wind power forecasting accuracy by systematically reducing input data complexity. Common signal decomposition technologies include univariate and multivariate algorithms. Univariate algorithms comprise wavelet decomposition (WD) [26], variational mode decomposition (VMD) [27], empirical mode decomposition (EMD) [28], and enhanced EMD variants—ensemble empirical mode decomposition (EEMD) [29], complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) [30], etc. Ranjeeta Bisoi et al. [31] demonstrate VMD’s superiority over EMD, particularly in noise robustness and feature extraction precision for predictive modeling applications.

However, these univariate decomposition algorithms are ineffective for processing multivariate data. In wind power forecasting, datasets are typically multidimensional, comprising multiple correlated time series such as wind speed, temperature, and pressure. Consequently, the prediction accuracy of such methods is inherently limited. Unlike univariate decomposition, multivariate techniques (MEMD [32], MVMD [33]) and their hybrid derivatives (e.g., MEMD-GRU [34], MVMD-Transformer [35], MVMD-CNN-BiLSTM [36]) can effectively capture cross-variable dependencies, enabling more robust system modeling and superior prediction accuracy compared to traditional approaches. While effective, these multivariate decomposition-based hybrid models incur significantly higher training costs in terms of time and energy consumption, thereby limiting their applicability in sustainable forecasting tasks.

To address these limitations, we propose an accurate yet computationally efficient short-term wind power prediction framework that combines MVMD with the Series-Core Fused Time Series (SOFTS) model. While MVMD delivers superior prediction accuracy, its computational demands remain substantial. The SOFTS model mitigates this computational burden while preserving predictive performance. The key contributions of this work include the following:

(1): High prediction accuracy: In the data processing stage, we use the MVMD algorithm to simultaneously decompose the meteorological data series and the wind power data series, effectively addressing the frequency mismatch between the meteorological and wind power sequences. This approach enables time–frequency synchronized analysis of both the meteorological variables and the wind power series, which improves prediction accuracy.
(2): Low computational cost: In the prediction training stage, we adopt the SOFTS framework, which employs a STAR aggregate–redistribute module within a centralized architecture. The STAR module aggregates all series to generate a global core representation, which is subsequently redistributed and fused with the individual series representations, enabling efficient cross-channel interactions. Its computational complexity primarily scales with the number of input channels rather than the input sequence length. We also provide a theoretical analysis of the computational complexity in comparison with existing methods (see Table 1). This analysis shows that the core computational complexity of the proposed method is $O (C d^{2})$, which is lower than the $O (L d^{2})$ complexity of LSTM whenever $C < L$ and which, unlike the $O (L^{2} d + H L d)$ complexity of Transformer architectures, does not grow with the input length.
(3): Practical simulation validation: A real-world dataset from the Xinjiang Guohua Jingxia North Wind Farm was used to compare the MVMD-SOFTS model with eight benchmark models, including the advanced Transformer model. The results demonstrate that the MVMD-SOFTS model achieves superior performance in both single-step and multi-step ahead forecasting.

The remainder of this paper is organized as follows. Section 2 introduces the overall framework and methodology of the proposed model. Section 3 describes the data preparation process and the evaluation metrics employed. Section 4 presents the experimental setup and results, including detailed comparisons with baseline methods. Section 5 concludes the paper and outlines potential directions for future research.

2. Materials and Methods

2.1. Multivariate Variational Mode Decomposition

As a multivariate extended signal decomposition algorithm based on VMD, MVMD has recently gained popularity. MVMD can simultaneously decompose meteorological data series and wind power time series, allowing for the capture of dynamic characteristics of wind power while effectively incorporating the influence of meteorological factors on wind power fluctuations. In contrast to traditional univariate decomposition methods, MVMD overcomes the limitations of single-signal processing by providing more comprehensive time-frequency information, improving the robustness and accuracy of the forecasting model.

The MVMD algorithm was initially proposed by Naveed ur Rehman and Hania Aftab in 2019 [33]. The MVMD decomposition process is outlined as follows:

(1): Define input data. The input data consists of the wind power series along with meteorological data sequences, mathematically expressed as

$x (t) = [W P (t), W S (t), W D (t), T (t), P (t), H (t)]$

(1)

where $W P (t)$ , $W S (t)$ , $W D (t)$ , $T (t)$ , $P (t)$ , and $H (t)$ denote wind power, wind speed, wind direction, temperature, atmospheric pressure, and humidity, respectively. The variable t denotes time.
(2): Signal decomposition model. The goal is to decompose the original multivariate input signal ${\{x_{c} (t)\}}_{c = 1}^{C}$ into an ensemble of K multivariate modulated oscillatory components ${\{{\{u_{k, c} (t)\}}_{k = 1}^{K}\}}_{c = 1}^{C}$ while meeting the following requirements: (i) the cumulative bandwidth of the extracted modes is as small as possible; (ii) the aggregate of the extracted modes precisely reconstructs the original signal. The constrained optimization problem can be formulated as

$\begin{aligned} \min_{\{u_{k, c}\}, \{ω_{k}\}} \quad & \sum_{k = 1}^{K} \sum_{c = 1}^{C} {∥\partial_{t} [u_{+}^{k, c} (t) e^{- j ω_{k} t}]∥}_{2}^{2} \\ \text{subject to} \quad & x_{c} (t) = \sum_{k = 1}^{K} u_{k, c} (t), \quad c = 1, 2, \dots, C \end{aligned}$

(2)

where K and C denote the number of IMFs and channels, respectively; $\partial_{t}$ denotes the partial derivative with respect to time; $u_{+}^{k, c} (t)$ denotes the analytic signal of $u_{k, c} (t)$, which has a one-sided frequency spectrum and is obtained via the Hilbert transform; $ω_{k}$ represents the central frequency of the kth IMF set ${\{u_{k, c} (t)\}}_{c = 1}^{C}$, which is shared across all channels; $x_{c}$ is the input signal of the cth data channel, covering both the wind power time series and the meteorological data sequences.
(3): Form augmented Lagrangian function. By introducing Lagrangian multipliers and quadratic penalty terms, the aforementioned constrained optimization problem can be converted to an augmented Lagrangian function as

$\begin{aligned} L (\{u_{k, c}\}, \{ω_{k}\}, λ_{c}) = {} & α \sum_{k = 1}^{K} \sum_{c = 1}^{C} {∥\partial_{t} [u_{+}^{k, c} (t) e^{- j ω_{k} t}]∥}_{2}^{2} \\ & + \sum_{c = 1}^{C} {∥x_{c} (t) - \sum_{k = 1}^{K} u_{k, c} (t)∥}_{2}^{2} \\ & + \sum_{c = 1}^{C} \langle λ_{c} (t), x_{c} (t) - \sum_{k = 1}^{K} u_{k, c} (t) \rangle \end{aligned}$

(3)

where $α$ serves as the weighting factor for the penalty.
(4): Alternating Direction Method of Multipliers (ADMM) iterations. Using ADMM, the complete optimization problem is decomposed into a sequence of iterative sub-optimization problems. Because the problem only contains equality constraints, each ADMM subproblem admits a closed-form solution, which simplifies the solution process. The closed-form update equations for the modes ${\hat{u}}_{k, c} (ω)$ and the center frequencies $ω_{k}$ are presented below:

${\hat{u}}_{k, c}^{m + 1} (ω) = \frac{{\hat{x}}_{c} (ω) - \sum_{n \neq k} {\hat{u}}_{n, c} (ω) + \frac{{\hat{λ}}_{c} (ω)}{2}}{1 + 2 α {(ω - ω_{k})}^{2}}$

(4)

$ω_{k}^{m + 1} = \frac{\sum_{c = 1}^{C} \int_{0}^{\infty} ω {|{\hat{u}}_{k, c}^{m + 1} (ω)|}^{2} d ω}{\sum_{c = 1}^{C} \int_{0}^{\infty} {|{\hat{u}}_{k, c}^{m + 1} (ω)|}^{2} d ω}$

(5)

where ${\hat{x}}_{c} (ω)$, ${\hat{λ}}_{c} (ω)$, and ${\hat{u}}_{k, c} (ω)$ represent the Fourier transforms of $x_{c} (t)$, $λ_{c} (t)$, and $u_{k, c} (t)$, respectively, and m denotes the iteration index. After these steps have converged, six sets of sub-series $W P_{k} (t)$, $W S_{k} (t)$, $W D_{k} (t)$, $T_{k} (t)$, $P_{k} (t)$, and $H_{k} (t)$, $k = 1, 2, \dots, K$, are obtained.
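The two closed-form updates in Equations (4) and (5) can be sketched on a discretized one-sided frequency grid. This is a minimal NumPy illustration rather than the authors’ implementation: the function names, the (channels × frequency bins) array layout, the toy spectra, and the replacement of the integrals by sums over bins are all assumptions made for the example.

```python
import numpy as np

def update_mode(x_hat, other_modes_sum, lam_hat, omega, omega_k, alpha):
    """Eq. (4): Wiener-filter-like update of mode k for all channels.

    x_hat, other_modes_sum, lam_hat: complex arrays of shape (C, F).
    omega: frequency grid of shape (F,); omega_k: current centre frequency.
    """
    numerator = x_hat - other_modes_sum + lam_hat / 2.0
    denominator = 1.0 + 2.0 * alpha * (omega - omega_k) ** 2
    return numerator / denominator  # the (F,) denominator broadcasts over channels

def update_center_frequency(u_hat_k, omega):
    """Eq. (5): power-weighted mean frequency, pooled over all channels."""
    power = np.abs(u_hat_k) ** 2    # shape (C, F)
    return float((power * omega).sum() / power.sum())

# Toy example (hypothetical data): two channels whose spectra peak at the same bin.
omega = np.linspace(0.0, 0.5, 6)    # normalised frequencies 0, 0.1, ..., 0.5
x_hat = np.zeros((2, 6), dtype=complex)
x_hat[0, 2] = 1.0                   # channel 0: energy at omega = 0.2
x_hat[1, 2] = 3.0                   # channel 1: energy at omega = 0.2
zeros = np.zeros_like(x_hat)

u_hat = update_mode(x_hat, zeros, zeros, omega, omega_k=0.1, alpha=50.0)
omega_new = update_center_frequency(u_hat, omega)
```

Because the numerator and denominator of Equation (5) sum over all channels, a single center frequency is obtained per mode, which is what keeps the IMFs of different variables frequency-aligned.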

2.2. Series-Core Fused Time Series (SOFTS) Model

To address the computational complexity issues arising from MVMD, this paper adopts an efficient MLP-based model, the series-core fused time series (SOFTS) model [37]. The architecture of the SOFTS model is depicted in Figure 1 and comprises the following four components.

(1) Reversible Instance Normalization. Normalization is a fundamental preprocessing step in time series forecasting models. In SOFTS, reversible instance normalization is employed to enhance the stability of the prediction process. Initially, the historical time series are normalized by centering them to zero mean and scaling them to unit variance. This normalization effectively removes the local statistical dependencies within the data, thereby facilitating more stable and reliable predictions by the base forecaster. Once the forecasting is completed, the normalization is reversed to restore the original statistical properties of the predicted series. This approach has been widely adopted in state-of-the-art models to improve performance and ensure the model’s adaptability to various time series characteristics.
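The normalize-then-restore idea can be illustrated with a few lines of NumPy. This is a simplified sketch under stated assumptions: it works on a single (C × L) window, omits the optional learnable affine parameters that reversible instance normalization may include, and the function names and sample values are hypothetical.

```python
import numpy as np

def instance_normalize(x, eps=1e-5):
    """Normalise each channel of one input window to zero mean and unit variance.

    x: array of shape (C, L). Returns the normalised window together with the
    per-channel statistics needed to undo the transform on the forecast.
    """
    mean = x.mean(axis=1, keepdims=True)
    std = np.sqrt(x.var(axis=1, keepdims=True) + eps)
    return (x - mean) / std, mean, std

def instance_denormalize(y_norm, mean, std):
    """Map a normalised array of shape (C, L') back to the original scale."""
    return y_norm * std + mean

# Hypothetical two-channel window with four time steps.
window = np.array([[10.0, 12.0, 14.0, 16.0],
                   [1.0, 1.5, 2.0, 2.5]])
normed, mean, std = instance_normalize(window)
restored = instance_denormalize(normed, mean, std)
```

In a forecasting model, the statistics computed on the input window would be applied to the model output rather than to the input itself; the round trip above only demonstrates that the transform is reversible.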

(2) Series Embedding. Series embedding projects each channel of the input time series into a hidden-dimensional space through a linear transformation. This transformation serves to prepare the time series data for subsequent processing while preserving the essential temporal dependencies inherent in the series. In our approach, we apply series embedding to the input historical data by linearly projecting $X \in \mathbb{R}^{C \times L}$ into $S_{0} \in \mathbb{R}^{C \times H}$, where L denotes the length of the historical time steps used for forecasting, and H is the dimensionality of the hidden layer:

$S_{0} = \text{Series embedding} (X)$

(6)

(3) STAR Module. A star-shaped aggregate-redistribute module (STAR for short) handles the information exchange between data channels and is the core innovation of SOFTS. Unlike attention-style methods, which compare channels pairwise, STAR uses a centralized structure: it aggregates the information from all series into a comprehensive core representation and then distributes the core information back to each channel, as shown in Figure 2. This interaction pattern reduces the complexity and inefficiency of distributed interactions and improves robustness when some channels are abnormal. The embedding $S_{0}$ is refined in sequence through N layers of the STAR module. Each layer processes the representation from the previous layer, capturing increasingly complex patterns and dependencies within the multivariate time series. The output of the nth layer is updated as follows:

$S_{n} = \text{STAR} (S_{n - 1}), \quad n = 1, 2, \dots, N$

(7)

Specifically, given the series representation of each channel as input, the nth STAR layer first extracts the core representation of the multivariate time series. The core representation $O_{n}$ is defined as follows:

$O_{n} = f (s_{1}, s_{2}, \dots, s_{C}), \quad n = 1, 2, \dots, N$

(8)

where f denotes an arbitrary function, and $S_{n} = \{s_{1}^{n}, s_{2}^{n}, \dots, s_{C}^{n}\}$ represents the input multivariate series comprising C channels.

The core representation encodes the global information across all the data channels. We employ stochastic pooling [38] to obtain the core representation by aggregating the representations of the C channels:

$O_{n} = \text{Stoch\_Pooling} ({\text{MLP}}_{1} (S_{n - 1})),$

(9)

where ${\text{MLP}}_{1} : \mathbb{R}^{C \times H} \mapsto \mathbb{R}^{C \times H^{'}}$ uses the GELU activation function and transforms the series representation from the hidden dimension H of the series embedding to the core dimension H′. Stoch_Pooling refers to stochastic pooling, which combines the advantages of max pooling and average pooling. Specifically, writing $A = {\text{MLP}}_{1} (S_{n - 1})$ for the activations, a softmax over the channel axis converts them into a probability distribution, so that each channel’s activation in dimension j is assigned a probability $p_{c j}$:

$p_{c j} = \frac{e^{A_{c j}}}{\sum_{i = 1}^{C} e^{A_{i j}}}, \quad c = 1, 2, \dots, C, \quad j = 1, 2, \dots, H^{'}$

(10)

During training, the core value $o_{j}$ in each dimension j is obtained by randomly sampling a channel c according to the probabilities $p_{c j}$ and taking that channel’s activation. Sampling in proportion to the activation probabilities enhances the model’s generalization ability:

$o_{j} = A_{c j}, \quad \text{where } c \sim P (p_{1 j}, p_{2 j}, \dots, p_{C j})$

(11)

During the testing phase, the core value in each dimension is instead computed as a probability-weighted sum to ensure model stability:

$o_{j} = \sum_{c = 1}^{C} p_{c j} A_{c j}$

(12)
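Equations (10) to (12) can be sketched in NumPy as follows. This is an illustrative reading of the text, not the reference implementation: the function name, the `training` flag, the use of a NumPy random generator, and the toy activation matrix are assumptions.

```python
import numpy as np

def stochastic_pooling(A, training, rng=None):
    """Aggregate channel activations A of shape (C, H') into a core of shape (H',)."""
    # Eq. (10): softmax over the channel axis, separately for each dimension j.
    shifted = A - A.max(axis=0, keepdims=True)   # subtract the max for numerical stability
    exp_a = np.exp(shifted)
    p = exp_a / exp_a.sum(axis=0, keepdims=True)
    if training:
        # Eq. (11): for each dimension j, sample one channel c with probability p[c, j].
        rng = np.random.default_rng() if rng is None else rng
        n_channels, core_dim = A.shape
        chosen = np.array([rng.choice(n_channels, p=p[:, j]) for j in range(core_dim)])
        return A[chosen, np.arange(core_dim)]
    # Eq. (12): probability-weighted sum over channels.
    return (p * A).sum(axis=0)

# Hypothetical activations for C = 3 channels and H' = 3 core dimensions.
A = np.array([[0.0, 1.0, -2.0],
              [0.0, 3.0, 0.5],
              [0.0, 2.0, 1.0]])
core_eval = stochastic_pooling(A, training=False)
core_train = stochastic_pooling(A, training=True, rng=np.random.default_rng(0))
```

In training mode every core entry is one of the activations in its column, whereas in evaluation mode it is a deterministic blend that lies between the column’s smallest and largest activation.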

Subsequently, the core is fused with each of the associated series representations, consolidating the information from these distinct components into a unified representation for further analysis:

$F_{n} = \text{Repeat\_Concat} (S_{n - 1}, O_{n}),$

(13)

$S_{n} = {\text{MLP}}_{2} (F_{n}) + S_{n - 1},$

(14)

where the Repeat_Concat operation concatenates the core representation $O_{n} = \{o_{1}^{n}, o_{2}^{n}, \dots, o_{H^{'}}^{n}\}$ with each individual series representation (as shown in Figure 2, $f_{c}^{n} = [s_{c}^{n}, O_{n}]$), resulting in a new representation $F_{n} \in \mathbb{R}^{C \times (H + H^{'})}$, i.e., $F_{n} = \{f_{1}^{n}, f_{2}^{n}, \dots, f_{C}^{n}\}$. Subsequently, ${\text{MLP}}_{2} : \mathbb{R}^{C \times (H + H^{'})} \mapsto \mathbb{R}^{C \times H}$ projects the concatenated representation back into the hidden dimension, fusing the information from both the core and the series representations into the fused representation $S_{n} \in \mathbb{R}^{C \times H}$.

(4) Linear Predictor. After N STAR layers have been applied in sequence, we obtain the fused representation of the Nth layer, denoted by $S_{N} \in \mathbb{R}^{C \times H}$. A linear predictor ($\mathbb{R}^{C \times H} \mapsto \mathbb{R}^{C \times L^{'}}$, where L′ is the forecast length) then generates the forecasting results:

$Y = \text{Linear} (S_{N})$

(15)
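Putting Equations (6) to (15) together, one inference-mode forward pass can be sketched as below. This is a shape-level illustration under several assumptions: each MLP is reduced to a single dense layer followed by GELU, normalization is omitted, the weights are random rather than trained, and all names and sizes other than the input length of 24 used later in the experiments are hypothetical.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def star_layer(S_prev, W1, b1, W2, b2):
    """One STAR layer in inference mode.

    S_prev: (C, H); W1: (H, H'); W2: (H + H', H).
    """
    A = gelu(S_prev @ W1 + b1)                        # MLP_1: (C, H) -> (C, H')
    exp_a = np.exp(A - A.max(axis=0, keepdims=True))
    p = exp_a / exp_a.sum(axis=0, keepdims=True)      # Eq. (10): softmax over channels
    core = (p * A).sum(axis=0)                        # Eq. (12): core of shape (H',)
    repeated = np.tile(core, (S_prev.shape[0], 1))    # copy the core for every channel
    F = np.concatenate([S_prev, repeated], axis=1)    # Eq. (13): (C, H + H')
    return gelu(F @ W2 + b2) + S_prev                 # Eq. (14): MLP_2 plus residual

def softs_forward(X, params):
    S = X @ params["W_embed"] + params["b_embed"]     # Eq. (6): (C, L) -> (C, H)
    for W1, b1, W2, b2 in params["star"]:             # Eq. (7): N stacked STAR layers
        S = star_layer(S, W1, b1, W2, b2)
    return S @ params["W_out"] + params["b_out"]      # Eq. (15): (C, H) -> (C, L')

rng = np.random.default_rng(0)
C, L, H, H_core, L_out, N = 6, 24, 16, 8, 4, 2        # hypothetical sizes except L = 24
params = {
    "W_embed": rng.normal(scale=0.1, size=(L, H)), "b_embed": np.zeros(H),
    "star": [(rng.normal(scale=0.1, size=(H, H_core)), np.zeros(H_core),
              rng.normal(scale=0.1, size=(H + H_core, H)), np.zeros(H))
             for _ in range(N)],
    "W_out": rng.normal(scale=0.1, size=(H, L_out)), "b_out": np.zeros(L_out),
}
X = rng.normal(size=(C, L))                           # one window: 6 channels, 24 steps
Y = softs_forward(X, params)                          # forecasts of shape (C, L')
```

Note that no operation in `star_layer` compares channels pairwise: channels interact only through the shared core vector, which is why the cost of the layer grows linearly with the number of channels.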

2.3. MVMD-SOFTS Framework Structure

The framework of the proposed MVMD-SOFTS forecasting model is depicted in Figure 3, and the specific steps are outlined as follows.

Step 1: Data decomposition. The input data comprises the wind power generation time series and the meteorological data time series such as the wind speed, wind direction, temperature, air pressure, and humidity. Based on the MVMD algorithm, the input multivariate signals are decomposed into a predefined number (denoted as K) of IMFs. This decomposition process separates the complex non-stationary data into simpler oscillatory components with distinct frequencies, thereby capturing the underlying patterns and trends in both the wind power generation and meteorological data. In this case study, the input variables are decomposed into eight distinct IMFs, each corresponding to a different frequency. These IMFs are crucial for subsequent analysis and forecasting, as they offer a more manageable and interpretable representation of the temporal dynamics inherent in the input data.

Step 2: Model prediction. For each IMF, we use the SOFTS architecture to capture the temporal dependencies and the correlations between the wind power and meteorological variable channels, producing forecasts of wind power generation and the meteorological variables at each frequency scale. These forecasted IMFs are then used in the next step to reconstruct the final prediction.

Step 3: Reconstruction and evaluation. Summing all the forecasted IMFs produces a comprehensive prediction for wind power generation and the meteorological variables. Following reconstruction, error analysis is conducted using evaluation metrics such as the coefficient of determination ($R^{2}$), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE). These metrics quantify the discrepancies between the predicted and actual values, providing a thorough assessment of the model’s performance and accuracy. This step is essential for identifying potential areas for improvement and ensuring the reliability of the forecasting model.
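The decompose, predict, and reconstruct flow of Steps 1 to 3 can be illustrated with a toy example. Everything here is hypothetical: the three hand-built components stand in for the MVMD output, and a simple persistence rule stands in for a trained SOFTS model, so the sketch shows only how per-IMF forecasts are summed.

```python
import numpy as np

def reconstruct_forecast(imf_forecasts):
    """Sum per-IMF forecasts of shape (K, horizon) into one forecast of shape (horizon,)."""
    return np.asarray(imf_forecasts).sum(axis=0)

def persistence_forecaster(imf_history, horizon):
    """Hypothetical stand-in for a trained per-IMF model: repeat the last value."""
    return np.full(horizon, imf_history[-1])

# Toy wind-power signal built from K = 3 components that sum to the original.
t = np.arange(48)
imfs = np.stack([
    50.0 + 0.0 * t,                      # constant baseline
    10.0 * np.sin(2 * np.pi * t / 24),   # slow oscillation
    2.0 * np.cos(2 * np.pi * t / 4),     # fast oscillation
])
signal = imfs.sum(axis=0)

horizon = 4
imf_forecasts = [persistence_forecaster(imf, horizon) for imf in imfs]
wind_power_forecast = reconstruct_forecast(imf_forecasts)
```

Because the IMFs sum to the original series, summing their forecasts yields a forecast on the original scale of the wind power signal.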

2.4. Computation Complexity Comparison

Table 1 outlines the theoretical complexity of the LSTM, Transformer, and SOFTS models. Each complexity formulation includes three components: input encoding, core computation (recurrent-based, attention-based, or MLP-based), and multi-step forecasting output. Here, C denotes the number of input channels, L represents the length of the input historical sequence, d is the hidden dimension (written H in Section 2.2), and H refers to the length of the forecast horizon (written L′ in Section 2.2).

For the LSTM model, the complexity term $O (L d C)$ arises from projecting a multivariate input sequence of length L with C channels into a hidden space. The main computational cost, $O (L d^{2})$, results from the recurrent hidden-to-hidden transformations, which are carried out sequentially over time steps. The output complexity $O (C d H)$ corresponds to mapping the hidden states to H forecast steps, with each step producing feature outputs across all C channels through a fully connected layer.

For the Transformer model, the complexity term $O (C L d)$ accounts for embedding a multivariate input sequence with C channels and length L into a d-dimensional representation. The primary computational burden comes from the encoder’s self-attention mechanism, which incurs a complexity of $O (L^{2} d)$ due to pairwise interactions across all input positions. Furthermore, the decoder contributes an additional cost of $O (H L d)$ through cross-attention, as each of the H forecast steps attends to the entire encoded sequence. The output complexity $O (C d H)$ results from transforming decoder outputs into final predictions, where each step generates C features through a fully connected layer.

For the SOFTS model, the complexity term $O (C L d)$ reflects the temporal encoding of each input channel over the historical sequence. The core computational load, $O (C d^{2})$, stems from the STAR module, where inter-channel interactions are captured through parallel MLP operations. This design avoids both the sequential dependencies of recurrent models and the pairwise attention computations of Transformer models, enabling efficient and fully parallel computation. The final term $O (C d H)$ corresponds to producing multi-step predictions from the learned representations using a shared output layer.

Overall, LSTM involves sequential computation, where hidden states are updated step by step, resulting in a dominant cost of $O (L d^{2})$. Transformer requires intensive computation due to the encoder’s self-attention mechanism with complexity $O (L^{2} d)$, and an additional cost of $O (H L d)$ is introduced by the decoder’s cross-attention mechanism. While the encoder supports full parallelism, the decoder remains partially sequential during prediction. SOFTS offers a more efficient structure, with all operations being parallelizable. Its overall complexity grows linearly with the sequence length L, the channel count C, and the prediction horizon H, and it avoids both quadratic attention costs and recursive updates.

3. Data Preparation and Evaluation Metrics

3.1. Data Preparation

To evaluate the performance of the proposed model, a real-world dataset was utilized, obtained from the Guohua Jingxia North Wind Farm located in Xinjiang, China, covering the period from 1 January to 31 December 2019. The dataset was collected by the Supervisory Control and Data Acquisition (SCADA) system of the wind farm, which recorded measurements at 15 min intervals. The variables included the wind speed at hub height, wind direction, temperature, humidity, atmospheric pressure, and actual power output. The wind turbines used in the wind farm are China Haizhuang HZ111/2000L models, manufactured by CSSC Haizhuang Wind Power (Chongqing, China), with a rated capacity of 2 MW and a hub height of 70 m.

The power generation data exhibited seasonal variations influenced by local geographical and meteorological factors. Consequently, the dataset was partitioned into four seasonal subsets: Spring (1 March to 31 May 2019), Summer (1 June to 31 August 2019), Autumn (1 September to 30 November 2019), and Winter (1 January to 28 February 2019, and 1 December to 31 December 2019). This partitioning allows a detailed analysis of the seasonal behavior of the wind farm, and the statistical characteristics of each subset are presented in Table 2.

Each seasonal dataset was split into 90% for training and 10% for testing to preserve temporal and seasonal patterns. Missing meteorological values caused by turbine faults were filled via linear interpolation. All input variables were then linearly normalized to [0, 1] to ensure training stability.
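The preprocessing steps described above (gap filling, a 90/10 split, and scaling to [0, 1]) can be sketched with NumPy as follows. The function names and sample values are hypothetical, and two details are assumptions not stated in the text: the split is taken chronologically, and the scaling bounds are computed from the training portion only.

```python
import numpy as np

def fill_missing_linear(x):
    """Fill NaN entries of a 1-D series by linear interpolation between valid neighbours."""
    x = np.asarray(x, dtype=float).copy()
    missing = np.isnan(x)
    idx = np.arange(x.size)
    x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    return x

def chronological_split(x, train_fraction=0.9):
    """Split a series into a leading training segment and a trailing test segment."""
    cut = int(len(x) * train_fraction)
    return x[:cut], x[cut:]

def min_max_scale(x, lo, hi):
    """Linearly map values so that lo -> 0 and hi -> 1."""
    return (x - lo) / (hi - lo)

# Hypothetical wind-speed readings (m/s) with three missing samples.
wind_speed = np.array([4.0, np.nan, 6.0, 7.0, np.nan, np.nan, 10.0, 9.0, 8.0, 5.0])
filled = fill_missing_linear(wind_speed)
train, test = chronological_split(filled)
lo, hi = train.min(), train.max()        # assumption: bounds taken from the training split
train_scaled = min_max_scale(train, lo, hi)
test_scaled = min_max_scale(test, lo, hi)
```

With bounds taken from the training split, test values outside the training range would fall outside [0, 1]; computing the bounds over the full series instead would guarantee the interval at the cost of using test-set information.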

3.2. Evaluation Metrics

To evaluate the performance of the proposed method, four widely used evaluation metrics are employed: $R^{2}$, MAE, RMSE, and MAPE. $R^{2}$ measures the proportion of variance in the dependent variable explained by the model, with values closer to 1 indicating better fit. MAE quantifies the average magnitude of errors, providing a straightforward interpretation of the forecasting accuracy. RMSE penalizes larger errors, making it sensitive to outliers and reflecting the overall prediction quality. Owing to the presence of near-zero actual wind power values in the dataset, the standard MAPE tends to exhibit disproportionately large errors. To address this issue, this study employs a modified MAPE formulation based on the mean of the actual values. The specific calculation formulas for $R^{2}$, MAE, RMSE, and MAPE are defined as follows:

$R^{2} = 1 - \frac{\sum_{m = 1}^{M} {(y_{m} - {\tilde{y}}_{m})}^{2}}{\sum_{m = 1}^{M} {(y_{m} - \bar{y})}^{2}},$

(16)

$\text{MAE} = \frac{1}{M} \sum_{m = 1}^{M} |y_{m} - {\tilde{y}}_{m}|,$

(17)

$\text{RMSE} = \sqrt{\frac{\sum_{m = 1}^{M} {(y_{m} - {\tilde{y}}_{m})}^{2}}{M}},$

(18)

$\text{MAPE} = \frac{1}{M} \sum_{m = 1}^{M} \left| \frac{y_{m} - {\tilde{y}}_{m}}{\bar{y}} \right| \times 100 \%,$

(19)

where $y_{m}$ denotes the true value at the m-th time step, ${\tilde{y}}_{m}$ represents the predicted value at the same time step, $\bar{y}$ indicates the mean of all true values, and M is the total number of forecasted data points used for evaluation.
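Equations (16) to (19) translate directly into NumPy. The sketch below is one straightforward implementation; the function name and the sample values are hypothetical.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute R^2, MAE, RMSE, and the mean-normalised MAPE of Eqs. (16)-(19)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    y_bar = y_true.mean()
    return {
        "R2": 1.0 - np.sum(err ** 2) / np.sum((y_true - y_bar) ** 2),  # Eq. (16)
        "MAE": np.mean(np.abs(err)),                                   # Eq. (17)
        "RMSE": np.sqrt(np.mean(err ** 2)),                            # Eq. (18)
        "MAPE": np.mean(np.abs(err / y_bar)) * 100.0,                  # Eq. (19), in percent
    }

# Hypothetical true and predicted values; only the last point is mispredicted.
metrics = evaluate([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

For these values the mean of the true series is 2.5, so the single error of 1 gives an MAE of 0.25, an RMSE of 0.5, an $R^{2}$ of 0.8, and a MAPE of 10%. Dividing by the mean rather than by each $y_{m}$ keeps the MAPE finite even when individual true values are zero.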

4. Experiments and Analysis

To evaluate the effectiveness of the proposed MVMD-SOFTS model, comprehensive comparative experiments and detailed discussions were carried out. The superiority of the SOFTS model and the efficacy of the MVMD decomposition were systematically verified through experimental validation. All experiments were implemented in Python 3.11 using the TensorFlow and Keras frameworks. The training was performed on a workstation equipped with an Intel Core i9-13900K CPU and 64 GB RAM. The basic parameter configurations of each model are summarized in Table 3, including structural settings such as the layer configuration and internal parameters such as the hidden size and model dimension, all of which were selected by grid search to identify the best-performing configuration within the search space. The input sequence length was fixed at 24. Each model was trained using a batch size of 64 for 50 epochs with the Adam optimizer, and the loss function was set to mean squared error (MSE).

4.1. Comparative Experiments of Single Forecasting Models

In this section, the performance of the proposed SOFTS model was evaluated by comparing it with several commonly used forecasting benchmarks. Six models were constructed for this comparison: CNN, TCN, LSTM, GRU, Transformer, and the proposed SOFTS. The evaluation metrics used were $R^{2}$, MAE, RMSE, and MAPE. Figure 4 displays a bar chart comparing the SOFTS model with the other models, while the detailed prediction accuracy results are shown in Table 4.

Owing to its inherent architectural characteristics, the CNN model is able to capture certain local dependencies through convolutional kernels but struggles to handle long-term dependencies. The enhanced TCN model, integrating residual connections and dilated convolutions, achieves notable improvements in time-series processing. However, its predictive performance remains inferior to that of the LSTM model. This is primarily because LSTM’s recurrent architecture, with its internal memory cells and gating mechanisms, enables it to capture and store long-term dependencies more effectively. Across all four datasets, as shown in Table 4 and Figure 4, the LSTM model consistently outperforms the CNN model and the TCN model. For example, in the spring dataset, the LSTM model reduces the MAE, RMSE, and MAPE by 2.2150 MW, 1.9937 MW, and 2.4874%, respectively, compared to the TCN model. The GRU model, by reducing the number of memory units and gating mechanisms compared to the LSTM model, features a simpler structure with fewer parameters. This streamlined architecture yields slightly better predictive performance than the LSTM model on most metrics, although the LSTM model retains a lower MAE and MAPE on the spring dataset and a lower RMSE on the summer dataset. Compared to the recurrent structure of the LSTM model, the Transformer model utilizes a self-attention mechanism, which does not rely on sequential processing. By calculating the attention weights between each time step and all other time steps in the sequence, the Transformer can capture long-range dependencies directly. For instance, in the summer and winter datasets, the Transformer model outperformed the LSTM model, with reductions in the MAE, RMSE, and MAPE of 0.3783 MW, 0.0526 MW, and 0.5732% for the summer dataset, and 1.4405 MW, 2.1119 MW, and 1.3871% for the winter dataset, respectively. These results highlight the strong performance of the Transformer model in handling complex forecasting tasks.
Compared to the traditional Transformer model, the SOFTS model replaces the attention mechanism with the STAR module, which employs distributed interactions to reduce computational complexity and enhance robustness. Among all models, the SOFTS model consistently delivers the best performance across all four seasonal datasets. Its $R^{2}$ values are closest to 1, while its MAE, RMSE, and MAPE are the lowest among all single forecasting models. For instance, on the summer dataset, the SOFTS model achieves an $R^{2}$ of 0.9727, an MAE of 5.2262 MW, an RMSE of 8.6153 MW, and a MAPE of 7.8813%, demonstrating its superior ability to capture long-term dependencies even under conditions of high uncertainty and fluctuation in wind speed, resulting in wind power output predictions that align more accurately with the actual values.
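The aggregate-and-redistribute idea behind the STAR module can be sketched in a few lines of NumPy. This is a simplified illustration, not the implementation used in the study: mean pooling is swapped in for the pooling operator of the original SOFTS design, single linear projections stand in for the MLPs, and the weights are random. The names `star_block`, `w_core`, and `w_fuse` are hypothetical.

```python
import numpy as np

def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def star_block(series, w_core, w_fuse):
    """One simplified STAR step.

    series : (C, d) embedding of each of the C channels
    w_core : (d, d_core) projection applied before aggregation
    w_fuse : (d + d_core, d) projection applied after redistribution
    """
    # Aggregate: project every channel, then pool over channels into one core vector.
    # (Mean pooling is an assumption made here for simplicity.)
    core = gelu(series @ w_core).mean(axis=0)            # (d_core,)
    # Redistribute: append the shared core to each channel and fuse.
    repeated = np.tile(core, (series.shape[0], 1))        # (C, d_core)
    fused = gelu(np.concatenate([series, repeated], axis=1) @ w_fuse)
    return series + fused                                 # residual connection

rng = np.random.default_rng(0)
C, d, d_core = 6, 64, 64          # six channels; dimensions of 64 as in Table 3
series = rng.standard_normal((C, d))
w_core = rng.standard_normal((d, d_core)) * 0.1
w_fuse = rng.standard_normal((d + d_core, d)) * 0.1
out = star_block(series, w_core, w_fuse)                  # shape (C, d)
```

Channels interact only through the shared core vector, so the cost of one step grows linearly with the number of channels C instead of requiring pairwise comparisons.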

Additionally, the comparative analysis of training times for both Transformer and SOFTS models is presented in Table 5. SOFTS demonstrates superior training efficiency compared to Transformer across different sequence lengths (L = 24, 48, 96). In particular, in medium- to long-term forecasting tasks (L = 48 and L = 96), as the input sequence length increases, the training time of Transformer grows rapidly due to its inherent computational complexity. In contrast, the efficiency advantage of the SOFTS model becomes increasingly evident, with its training duration remaining stable even as the sequence length extends. This observation aligns well with the theoretical complexity analysis in Table 1: SOFTS is primarily affected by the number of input channels rather than the sequence length, demonstrating its efficiency advantage over Transformer across different sequence lengths.
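To make the scaling argument concrete, the core-computation terms of Table 1 can be evaluated for the three input lengths. The following is a rough sketch: the values d = 64 (Table 3) and C = 6 (wind power plus five meteorological variables) follow the paper, while H = 1 and the reading of H as the forecast horizon are assumptions. The numbers are operation-count proxies, not predicted training times.

```python
def transformer_core_cost(L, d, H):
    """Core term of Table 1 for the Transformer: L^2 * d + H * L * d."""
    return L ** 2 * d + H * L * d

def softs_core_cost(C, d):
    """Core term of Table 1 for SOFTS: C * d^2 (no dependence on L)."""
    return C * d ** 2

d, C, H = 64, 6, 1   # H = 1 is an assumed single-step horizon
costs = {L: (transformer_core_cost(L, d, H), softs_core_cost(C, d))
         for L in (24, 48, 96)}

for L, (transformer, softs) in costs.items():
    print(f"L={L:3d}  Transformer core ~ {transformer:7d}  SOFTS core ~ {softs:6d}")
```

The Transformer proxy grows roughly fourfold each time L doubles, whereas the SOFTS proxy is constant. The measured times in Table 5 grow more slowly than the proxy suggests, which is expected because training time also includes costs outside the core computation.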

4.2. Comparative Experiments of Different Decomposition Forecasting Methods

To evaluate the effectiveness of MVMD on wind power forecasting, we compared three distinct signal decomposition methods: complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) [30], VMD [27], and MVMD [33]. CEEMDAN has been widely applied in wind power forecasting due to its strong capability in handling nonlinear and non-stationary signals. Therefore, in this study, CEEMDAN is applied to the wind power sequence alone to reflect its performance within a typical univariate modeling framework. In contrast, VMD decomposes the wind power sequence and each meteorological variable independently, whereas MVMD performs joint decomposition of all channel data, including wind power and meteorological variables, within a unified framework. This experimental setup facilitates a step-by-step comparison, progressing from traditional univariate decomposition (CEEMDAN), to independent multivariate decomposition (VMD), and ultimately to joint multivariate decomposition (MVMD), so that the contribution of joint multivariate decomposition can be isolated.

In both the VMD and MVMD methods, the number of modes K significantly affects the decomposition quality and model performance. When K is set too low, the modes are too few to capture the main components of the signal, which reduces the prediction accuracy; conversely, an excessively high K produces redundant modes and increases the computational burden. Given the non-stationary, nonlinear, and uncertain characteristics of wind power data, setting K too low is inadvisable. Through repeated experiments with K values ranging from 6 to 10, K = 8 was selected for both VMD and MVMD to balance predictive accuracy and computational efficiency. Table 6 lists the central frequencies of the IMFs obtained by VMD and MVMD for the spring and summer datasets, which increase with decomposition order. In VMD, the separate decomposition of wind power and meteorological variables (e.g., wind speed, temperature) results in inconsistent IMF central frequencies across variables; for example, in the spring dataset, IMFs of the same order have distinct frequencies for wind power and the associated meteorological signals (the central frequency of IMF8 ranges from 0.1524 for wind power to 0.4583 for wind direction), reflecting a limited capability to model multivariate interdependencies. By contrast, MVMD maintains uniform IMF central frequencies across variables in both the spring and summer datasets: by processing correlated signals simultaneously, MVMD ensures that corresponding IMFs (e.g., IMF1, IMF2) for wind power, wind speed, and temperature share consistent frequencies, preserving inter-variable correlations. This coherence in the decomposed components highlights MVMD’s advantage over VMD’s disjointed single-variable decomposition in providing robust input features for predictive modeling.
Finally, Figure 5 presents the MVMD decomposition results of the wind power sequence and the five meteorological data series: taking the wind power series as an example, IMF1 has the lowest central frequency and captures the sequence’s trend; IMF2 (with the second lowest central frequency) reflects its periodic characteristics; and IMF3–IMF8 (with progressively higher central frequencies) represent the short-term fluctuations.
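For readers who wish to inspect decomposed components, the central frequency of a mode can be estimated after the fact as the power-weighted mean of its spectrum. This is a sketch under stated assumptions: in VMD and MVMD the central frequencies are variables of the optimization itself, so the helper `central_frequency` below is only a post hoc check, and the two sinusoids are synthetic stand-ins for a low-order and a high-order IMF.

```python
import numpy as np

def central_frequency(mode):
    """Power-weighted mean frequency of a 1-D component, in cycles per sample."""
    mode = np.asarray(mode, dtype=float)
    power = np.abs(np.fft.rfft(mode)) ** 2
    freqs = np.fft.rfftfreq(mode.size, d=1.0)   # normalized range 0 to 0.5
    return float(np.sum(freqs * power) / np.sum(power))

t = np.arange(1000)
slow_mode = np.sin(2 * np.pi * 0.01 * t)   # stand-in for a low-order IMF
fast_mode = np.sin(2 * np.pi * 0.20 * t)   # stand-in for a high-order IMF

f_slow = central_frequency(slow_mode)
f_fast = central_frequency(fast_mode)
```

Applied to the components of each channel, an estimate of this kind would show whether modes of the same order share a frequency band, which is the property Table 6 reports for MVMD.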

Subsequently, to verify the superiority of the MVMD method, we employed SOFTS as the forecasting model to establish three hybrid models: CEEMDAN-SOFTS, VMD-SOFTS, and MVMD-SOFTS. The forecasting performance of these three hybrid models for wind power is presented in Table 7. The CEEMDAN-SOFTS model outperformed the VMD-SOFTS model across most datasets. For instance, in the autumn and winter datasets, the CEEMDAN-SOFTS model achieved higher $R^{2}$ values than the VMD-SOFTS model, with MAE and RMSE reductions of 0.3415 MW, 0.5771 MW, and 0.6359 MW, 0.9247 MW, respectively. This advantage stems from CEEMDAN’s adaptive noise mechanism, which efficiently isolates major components without complex parameter tuning, enabling the predictive model to capture fluctuations and trends in wind power more effectively. Among the three models, the MVMD-SOFTS model consistently exhibited the highest forecasting accuracy across all four seasonal datasets: its $R^{2}$ values were closest to 1, while its MAE, RMSE, and MAPE were the lowest. On the spring dataset, for example, the MVMD-SOFTS model reduced the MAE and RMSE by 0.3361 MW and 0.9170 MW relative to the CEEMDAN-SOFTS model, and by 0.4384 MW and 1.3730 MW relative to the VMD-SOFTS model. To provide a clearer comparison of single-point forecasting errors, Figure 6, Figure 7, Figure 8 and Figure 9 depict the wind power forecasting curves and their absolute error curves obtained by the three hybrid models. Notably, among all models, the MVMD-SOFTS model demonstrated the best performance. Specifically, the box plots in Figure 6d, Figure 7d, Figure 8d and Figure 9d illustrate that the MVMD-SOFTS model exhibits lower outlier values than the other two models, indicating greater robustness. These results indicate that, compared with CEEMDAN and VMD, the MVMD decomposition method’s superior multivariate processing capability, decomposition stability, noise resistance, and accuracy in capturing low-frequency trends better meet the complex signal requirements of wind power forecasting.
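The MVMD-SOFTS error reductions quoted above match the spring-dataset rows of Table 7, as the following arithmetic check shows. The dictionary layout and the helper `reduction` are illustrative; the numerical values are copied from Table 7.

```python
# MAE and RMSE (MW) for the spring dataset, copied from Table 7.
spring = {
    "CEEMDAN-SOFTS": {"MAE": 3.2279, "RMSE": 5.2888},
    "VMD-SOFTS":     {"MAE": 3.3302, "RMSE": 5.7448},
    "MVMD-SOFTS":    {"MAE": 2.8918, "RMSE": 4.3718},
}

def reduction(baseline, metric, proposed="MVMD-SOFTS"):
    """Absolute reduction of a metric achieved by the proposed model, in MW."""
    return round(spring[baseline][metric] - spring[proposed][metric], 4)

for baseline in ("CEEMDAN-SOFTS", "VMD-SOFTS"):
    print(baseline,
          "MAE reduction:", reduction(baseline, "MAE"),
          "RMSE reduction:", reduction(baseline, "RMSE"))
```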

4.3. Comparative Experiments of Multi-Step Forecasting

In practical applications, wind power forecasting is not limited to single-step prediction; multi-step forecasting is equally important. In this section, multi-step forecasting experiments are conducted with forecast horizons of 2 steps (30 min), 3 steps (45 min), and 4 steps (60 min) ahead. Nine models are selected for comparative experiments: CNN, TCN, LSTM, GRU, Transformer, SOFTS, CEEMDAN-SOFTS, VMD-SOFTS, and the proposed MVMD-SOFTS. The forecasting performance of these nine models across the seasonal datasets at each horizon is presented in Table 8 and Table 9, from which the following main conclusions are drawn:

(1): Multi-step forecasting poses significant challenges compared to single-step forecasting, primarily due to the cumulative error that tends to increase with each additional prediction step in most models. Compared to other single forecasting models, the SOFTS model demonstrated superior performance in both single-step and multi-step forecasting tasks. In experiments conducted on the seasonal datasets, the SOFTS model achieved $R^{2}$ values closer to 1 and had the lowest MAE, RMSE, and MAPE values among all the single forecasting models. Therefore, employing the SOFTS model as a baseline forecasting method is conducive to enhancing the accuracy and robustness of subsequent experiments.
(2): Compared to single forecasting models, hybrid models based on signal decomposition algorithms exhibit superior performance in multi-step forecasting tasks. Across all seasonal datasets, the signal decomposition-based hybrid models generally outperform single forecasting models in terms of MAE, RMSE, and MAPE metrics in multi-step forecasting experiments. Single forecasting models struggle to effectively capture the complex dynamic behavior of wind power sequences during multi-step forecasting due to the inherent volatility and uncertainty of wind power data. The hybrid signal decomposition algorithm addresses this issue by decomposing the original wind power signal into multiple sub-sequences with improved stationarity and specific frequency characteristics, making each sub-sequence easier to model. This approach reduces the burden of complexity and noise handling for each sub-model, thereby significantly enhancing the overall stability and robustness of the prediction.
(3): Compared to VMD-SOFTS and CEEMDAN-SOFTS, MVMD-SOFTS demonstrates significant advantages in multi-step forecasting. Taking the spring dataset as an example, in the two-step ahead forecasting, the $R^{2}$ value of MVMD-SOFTS is higher than that of the other two models. Relative to VMD-SOFTS and CEEMDAN-SOFTS, respectively, the MAE is reduced by 1.7369 MW and 0.6250 MW, the RMSE by 3.2509 MW and 1.8042 MW, and the MAPE by 1.9360% and 0.6966%. MVMD is capable of jointly decomposing multiple input variables, effectively suppressing noise and filtering out irrelevant information. This multivariate signal decomposition approach facilitates better extraction of intrinsic correlations between features, significantly improving the stationarity and distinctiveness of the decomposed sub-sequences, thereby enhancing the training effectiveness and prediction accuracy of the subsequent forecasting model.
(4): Overall, MVMD-SOFTS demonstrated superior experimental results in both single-step and multi-step ahead forecasting, achieving optimal performance in error metrics across all datasets. Taking the four-step ahead forecasting as an example, the average MAE and RMSE values of MVMD-SOFTS across the four datasets were 3.9387 MW and 5.7280 MW, respectively, while the average MAPE was 4.5273%. Compared to other models, MVMD-SOFTS exhibited better prediction accuracy, with the lowest values for all error metrics, indicating its higher accuracy and robustness in capturing wind power fluctuation trends and addressing random variations in the data.

5. Conclusions

In this paper, we propose a novel hybrid model for wind power forecasting, the MVMD-SOFTS model, which is the first application of the SOFTS model in wind power forecasting. The model is evaluated using real-world data from the Guohua Jingxia North Wind Farm and is compared with several commonly used benchmark models through three sets of experiments: single-model comparison, hybrid-model comparison, and multi-step ahead prediction. The performance is analyzed using four evaluation metrics: $R^{2}$, MAE, RMSE, and MAPE. The following conclusions are drawn:

(1): The MVMD method overcomes the limitations of traditional single-variable decomposition techniques (e.g., VMD, CEEMDAN) by effectively capturing the complex multivariate coupling relationships hidden in wind-power time-series data—such as the dynamic interactions between wind speed, wind direction, and power output—thereby enhancing the quality of the input data and ultimately improving the forecasting accuracy.
(2): By replacing the Transformer’s self-attention mechanism with the STAR module, the SOFTS architecture makes its core computation scale with the number of input channels C rather than the sequence length L (Table 1). This channel-oriented scaling avoids the cost of self-attention, which grows quadratically with L, making SOFTS markedly faster to train than a standard Transformer (Table 5).
(3): In the multi-step ahead prediction experiment, the MVMD-SOFTS model demonstrates superior performance compared to all other models, as it successfully maintains high forecasting accuracy over multiple time steps by combining effective data decomposition and advanced time-series modeling techniques, making it ideal for short-term and real-time forecasting applications in power grid operations.
(4): The key MVMD hyperparameters (e.g., the number of modes K and the regularization factor $α$), as well as the convergence behavior and termination criteria of the ADMM solver, have been discussed systematically in the literature (see [33] for the method and [34,35,36] for engineering settings and application practices). In prior engineering studies, the reported K values vary with the spectral characteristics of the signals and the task requirements, generally ranging from approximately 5 to 11. Due to space limitations, we do not provide an exhaustive comparison across different K settings in the main text; instead, we focus on the proposed model architecture and its end-to-end forecasting performance. Future work will conduct dedicated sensitivity and stability analyses with respect to K and $α$ and further evaluate the ADMM iteration cost and its impact on overall efficiency.

Author Contributions

Conceptualization, Z.L. and W.L. (Wentian Lu); methodology, Z.L.; software, Z.L.; validation, Z.L., W.L. (Wentian Lu) and W.L. (Wenjie Liu); formal analysis, Z.L.; investigation, Z.L. and Y.C.; resources, W.L. (Wentian Lu); data curation, Z.L. and W.L. (Wenjie Liu); writing—original draft preparation, Z.L.; writing—review and editing, W.L. (Wentian Lu) and W.L. (Wenjie Liu); visualization, Z.L. and Y.C.; supervision, W.L. (Wentian Lu); project administration, W.L. (Wentian Lu). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant 52107093 and in part by the Basic and Applied Basic Research Fund of Guangdong Province under Grant 2022A1515240038.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets analyzed in this study are proprietary unpublished data of our research team and are currently reserved for follow-up research projects. For academic research purposes that comply with ethical and academic norms, interested researchers may contact the corresponding author (Email: [lwj1993@gzhu.edu.cn]) to negotiate reasonable access to the data after the completion of the follow-up research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ANN	Artificial Neural Network
CEEMDAN	Complete Ensemble Empirical Mode Decomposition with Adaptive Noise
CNN	Convolutional Neural Network

DNN	Deep Neural Network
EMD	Empirical Mode Decomposition
EEMD	Ensemble Empirical Mode Decomposition
GAN	Generative Adversarial Network
GRU	Gated Recurrent Unit
LSTM	Long Short-Term Memory
MAE	Mean Absolute Error
MAPE	Mean Absolute Percentage Error
MEMD	Multivariate Empirical Mode Decomposition
MVMD	Multivariate Variational Mode Decomposition
$R^{2}$	Coefficient of Determination
RMSE	Root Mean Squared Error
SCADA	Supervisory Control and Data Acquisition
SVM	Support Vector Machine
SOFTS	Series-Core Fused Time Series Forecaster
TCN	Temporal Convolutional Network
VMD	Variational Mode Decomposition
WD	Wavelet Decomposition

References

Li, J.D.; Chen, S.J.; Wu, Y.Q.; Wang, Q.H.; Liu, X.; Qi, L.J.; Lu, X.Y.; Gao, L. How to make better use of intermittent and variable energy? A review of wind and photovoltaic power consumption in China. Renew. Sustain. Energy Rev. 2021, 137, 110626.
Wang, Y.; Zou, R.M.; Liu, F.; Zhang, L.J.; Liu, Q.Y. A review of wind speed and wind power forecasting with deep neural networks. Appl. Energy 2021, 304, 117766.
Liu, L.; Liu, J.C.; Ye, Y.; Liu, H.; Chen, K.; Li, D.; Dong, X.; Sun, M.Z. Ultra-short-term wind power forecasting based on deep Bayesian model with uncertainty. Renew. Energy 2023, 205, 598–607.
Lu, P.; Ye, L.; Zhao, Y.N.; Dai, B.H.; Pei, M.; Tang, Y. Review of meta-heuristic algorithms for wind power prediction: Methodologies, applications and challenges. Appl. Energy 2021, 301, 117446.
Chang, Y.; Yang, H.; Chen, Y.X.; Zhou, M.R.; Yang, H.B.; Wang, Y.; Zhang, Y.R. A Hybrid Model for Long-Term Wind Power Forecasting Utilizing NWP Subsequence Correction and Multi-Scale Deep Learning Regression Methods. IEEE Trans. Sustain. Energy 2023, 15, 263–275.
Jin, J.L.; Wen, Q.L.; Zhao, L.Y.; Zhou, C.Y.; Guo, X.J. Measuring environmental performance of power dispatch influenced by low-carbon approaches. Renew. Energy 2023, 209, 325–339.
Capelletti, M.; Raimondo, D.M.; De Nicolao, G. Wind power curve modeling: A probabilistic Beta regression approach. Renew. Energy 2024, 223, 119970.
Monjazeb, M.R.; Amiri, H.; Movahedi, A. Wholesale electricity price forecasting by Quantile Regression and Kalman Filter method. Energy 2024, 290, 129925.
Chen, H. A novel wind model downscaling with statistical regression and forecast for the cleaner energy. J. Clean. Prod. 2024, 434, 140217.
Ikram, R.M.A.; Ewees, A.A.; Parmar, K.S.; Yaseen, Z.M.; Shahid, S.; Kisi, O. The viability of extended marine predators algorithm-based artificial neural networks for streamflow prediction. Appl. Soft Comput. 2022, 131, 109739.
Liu, Y.Q.; Sun, Y.; Infield, D.; Zhao, Y.; Han, S.; Yan, J. A hybrid forecasting method for wind power ramp based on orthogonal test and support vector machine (OT-SVM). IEEE Trans. Sustain. Energy 2016, 8, 451–457.
Ng, K.W.; Huang, Y.F.; Koo, C.H.; Chong, K.L.; El-Shafie, A.; Ahmed, A.N. A review of hybrid deep learning applications for streamflow forecasting. J. Hydrol. 2023, 625, 130141.
Guo, Z.H.; Zhao, W.G.; Lu, H.Y.; Wang, J.Z. Multi-step forecasting for wind speed using a modified EMD-based artificial neural network model. Renew. Energy 2012, 37, 241–249.
Liu, H.; Tian, H.Q.; Li, Y.F.; Zhang, L. Comparison of four Adaboost algorithm based artificial neural networks in wind speed predictions. Energy Convers. Manag. 2015, 92, 67–81.
Wang, L.; Zeng, Y.; Chen, T. Back propagation neural network with adaptive differential evolution algorithm for time series forecasting. Expert Syst. Appl. 2015, 42, 855–863.
Deng, W.X.; Zhou, H.; Zhou, J.; Yao, J.Y. Neural Network-Based Adaptive Asymptotic Prescribed Performance Tracking Control of Hydraulic Manipulators. IEEE Trans. Syst. Man Cybern. Syst. 2023, 53, 285–296.
Khodayar, M.; Kaynak, O.; Khodayar, M.E. Rough deep neural architecture for short-term wind speed forecasting. IEEE Trans. Ind. Inform. 2017, 13, 2770–2779.
Hong, Y.Y.; Satriani, T.R.A. Day-ahead spatiotemporal wind speed forecasting using robust design-based deep learning neural network. Energy 2020, 209, 118441.
Shi, Z.C.; Liang, H.; Dinavahi, V. Direct interval forecast of uncertain wind power based on recurrent neural networks. IEEE Trans. Sustain. Energy 2017, 9, 1177–1187.
Ewees, A.A.; Al-Qaness, M.A.A.; Abualigah, L.; Abd Elaziz, M. HBO-LSTM: Optimized long short term memory with heap-based optimizer for wind power forecasting. Energy Convers. Manag. 2022, 268, 116022.
Fantini, D.G.; Silva, R.N.; Siqueira, M.B.B.; Pinto, M.S.S.; Guimarães, M.; Junior, A.B. Wind speed short-term prediction using recurrent neural network GRU model and stationary wavelet transform GRU hybrid model. Energy Convers. Manag. 2024, 308, 118333.
Meka, R.; Alaeddini, A.; Bhaganagar, K. A robust deep learning framework for short-term wind power forecast of a full-scale wind farm using atmospheric variables. Energy 2021, 221, 119759.
Zhou, B.; Duan, H.R.; Wu, Q.W.; Wang, H.Z.; Or, S.W.; Chan, K.W.; Meng, Y.F. Short-term prediction of wind power and its ramp events based on semi-supervised generative adversarial network. Int. J. Electr. Power Energy Syst. 2021, 125, 106411.
Nascimento, E.G.S.; de Melo, T.A.C.; Moreira, D.M. A transformer-based deep neural network with wavelet transform for forecasting wind speed and wind energy. Energy 2023, 278, 127678.
Ahmadi, M.; Khashei, M. Current status of hybrid structures in wind forecasting. Eng. Appl. Artif. Intell. 2021, 99, 104133.
Yu, C.J.; Li, Y.L.; Chen, Q.; Lai, X.P.; Zhao, L.Y. Matrix-based wavelet transformation embedded in recurrent neural networks for wind speed prediction. Appl. Energy 2022, 324, 119692.
Jiang, W.J.; Liu, B.; Liang, Y.; Gao, H.X.; Lin, P.F.; Zhang, D.Q.; Hu, G. Applicability analysis of transformer to wind speed forecasting by a novel deep learning framework with multiple atmospheric variables. Appl. Energy 2024, 353, 122155.
Li, N.; Dong, J.; Liu, L.Y.; Li, H.; Yan, J. A novel EMD and causal convolutional network integrated with Transformer for ultra short-term wind power forecasting. Int. J. Electr. Power Energy Syst. 2023, 154, 109470.
He, Y.Y.; Wang, Y. Short-term wind power prediction based on EEMD–LASSO–QRNN model. Appl. Soft Comput. 2021, 105, 107288.
Karijadi, I.; Chou, S.Y.; Dewabharata, A. Wind power forecasting based on hybrid CEEMDAN-EWT deep learning method. Renew. Energy 2023, 218, 119357.
Bisoi, R.; Dash, P.K.; Parida, A.K. Hybrid variational mode decomposition and evolutionary robust kernel extreme learning machine for stock price and movement prediction on daily basis. Appl. Soft Comput. 2019, 74, 652–678.
Rehman, N.; Mandic, D.P. Multivariate empirical mode decomposition. Proc. R. Soc. A 2010, 466, 1291–1302.
ur Rehman, N.; Aftab, H. Multivariate variational mode decomposition. IEEE Trans. Signal Process. 2019, 67, 6039–6052.
Gupta, P.; Singh, R. Combining a deep learning model with multivariate empirical mode decomposition for hourly global horizontal irradiance forecasting. Renew. Energy 2023, 206, 908–927.
Fang, J.J.; Yang, L.S.; Wen, X.H.; Yu, H.J.; Li, W.D.; Adamowski, J.F.; Barzegar, R. Ensemble learning using multivariate variational mode decomposition based on the Transformer for multi-step-ahead streamflow forecasting. J. Hydrol. 2024, 636, 131275.
Yang, T.; Yang, Z.N.; Li, F.; Wang, H.Y. A short-term wind power forecasting method based on multivariate signal decomposition and variable selection. Appl. Energy 2024, 360, 122759.
Han, L.; Chen, X.Y.; Ye, H.J.; Zhan, D.C. SOFTS: Efficient Multivariate Time Series Forecasting with Series-Core Fusion. arXiv 2024, arXiv:2404.14197.
Zeiler, M.D.; Fergus, R. Stochastic Pooling for Regularization of Deep Convolutional Neural Networks. arXiv 2013, arXiv:1301.3557.

Figure 1. The architecture of the SOFTS model.

Figure 2. The working principle of the STAR module.

Figure 3. The architecture of the MVMD-SOFTS model.

Figure 4. The error metrics of the SOFTS model compared to other models across the four seasonal datasets: (a) $R^{2}$; (b) MAE; (c) RMSE; (d) MAPE.

Figure 5. The MVMD decomposition results for the four datasets: (a) wind power; (b) wind speed; (c) wind direction; (d) temperature; (e) air pressure; and (f) humidity.

Figure 6. The forecasting results of three signal decomposition strategies combined with the SOFTS model on the spring data: (a) wind power forecasting results; (b) wind power forecasting scatter plot; (c) wind power absolute error plot; (d) wind power boxplot of absolute errors.

Figure 7. The forecasting results of three signal decomposition strategies combined with the SOFTS model on the summer data: (a) wind power forecasting results; (b) wind power forecasting scatter plot; (c) wind power absolute error plot; (d) wind power boxplot of absolute errors.

Figure 8. The forecasting results of three signal decomposition strategies combined with the SOFTS model on the autumn data: (a) wind power forecasting results; (b) wind power forecasting scatter plot; (c) wind power absolute error plot; (d) wind power boxplot of absolute errors.

Figure 9. The forecasting results of three signal decomposition strategies combined with the SOFTS model on the winter data: (a) wind power forecasting results; (b) wind power forecasting scatter plot; (c) wind power absolute error plot; (d) wind power boxplot of absolute errors.

Table 1. Comparative analysis of model computational complexity and parallelization capability.

Model	Exact Complexity	Core Computation	Parallelization
LSTM	$O (L d C + L d^{2} + C d H)$	$O (L d^{2})$	Low (sequential)
Transformer	$O (C L d + L^{2} d + H L d + C d H)$	$O (L^{2} d + H L d)$	Medium (partial)
SOFTS	$O (C L d + C d^{2} + C d H)$	$O (C d^{2})$	High (fully parallel)

Table 2. The detailed description of wind power-related datasets.

Dataset	Variable	Size	Max	Min	Mean	Std	Skewness	Kurtosis
Spring dataset	Wind power (MW)	8832	200.01	0.03	91.35	69.33	0.16	−1.43
	Wind speed (m/s)		22.76	0.32	8.76	4.71	0.33	−0.68
	Wind direction (°)		355.56	3.99	144.01	85.78	0.65	−1.15
	Temperature (°C)		32.34	−5.04	13.43	7.58	−0.15	−0.57
	Air pressure (hPa)		878.62	855.47	868.17	3.99	−0.33	−0.22
	Humidity (%)		94.53	4.83	26.32	15.54	1.45	2.18
Summer dataset	Wind power (MW)	8832	199.31	0.07	68.35	57.08	0.38	−1.14
	Wind speed (m/s)		20.60	0.32	7.62	4.23	0.40	−0.64
	Wind direction (°)		354.43	0	146.66	88.66	0.62	−1.14
	Temperature (°C)		40.13	13.75	26.71	5.04	0.17	−0.69
	Air pressure (hPa)		875.04	856.89	865.95	2.76	0.29	0.43
	Humidity (%)		93.04	7.63	28.28	15.04	1.45	2.32
Autumn dataset	Wind power (MW)	8736	200.00	0.04	65.55	62.99	0.58	−1.03
	Wind speed (m/s)		21.62	0.32	7.35	4.70	0.57	−0.62
	Wind direction (°)		355.56	0	160.34	85.16	0.20	−1.55
	Temperature (°C)		36.01	−14.47	11.55	10.65	−0.03	−0.77
	Air pressure (hPa)		882.59	859.96	871.58	4.33	−0.34	−0.47
	Humidity (%)		95.24	9.50	35.23	16.72	1.01	0.73
Winter dataset	Wind power (MW)	8640	200.05	0.03	55.59	68.64	1.05	−0.38
	Wind speed (m/s)		19.78	0.32	6.06	4.58	0.88	−0.10
	Wind direction (°)		355.61	0	177.63	87.71	−0.16	−1.58
	Temperature (°C)		9.96	−18.66	−6.78	4.94	0.40	0.06
	Air pressure (hPa)		882.02	857.17	869.86	4.73	−0.06	−0.39
	Humidity (%)		95.27	11.97	58.60	15.33	−0.18	−0.22

Table 3. The parameter settings of the prediction models.

Models	Parameters
CNN	Number of layers: 1; kernel size: 1; filters: 32; activation function: ReLU.
TCN	Number of layers: 1; kernel size: 1; filters: 32; activation function: ReLU.
LSTM	Number of layers: 2; hidden sizes: [128, 64]; activation function: ReLU.
GRU	Number of layers: 2; hidden sizes: [128, 64]; activation function: ReLU.
Transformer	Encoder layers: 2; heads: 4; model dimension: 64; feedforward dimension: 128.
SOFTS	STAR blocks: 2; core dimension: 64; feedforward dimension: 128; activation function: GeLU.

Table 4. Comparison of single forecasting models on wind power datasets (L = 24).

Dataset	Model	$R^{2}$	MAE (MW)	RMSE (MW)	MAPE (%)
Spring dataset	CNN	0.8570	11.3172	19.6229	12.7090
	TCN	0.9432	8.3824	12.3682	9.4133
	LSTM	0.9600	6.1674	10.3745	6.9259
	GRU	0.9602	6.2534	10.3540	7.0224
	Transformer	0.9627	5.4812	10.0261	6.1553
	SOFTS	0.9648	5.1321	9.7475	5.7475
Summer dataset	CNN	0.8648	13.3190	19.1517	20.1789
	TCN	0.9635	6.7929	9.9548	10.2916
	LSTM	0.9687	6.0819	8.7648	9.2144
	GRU	0.9714	5.7387	9.2089	8.6943
	Transformer	0.9720	5.7036	8.7122	8.6412
	SOFTS	0.9727	5.2262	8.6153	7.8813
Autumn dataset	CNN	0.9323	12.1398	18.6317	12.2910
	TCN	0.9689	9.0879	12.6255	9.2012
	LSTM	0.9784	6.0780	10.5238	6.1538
	GRU	0.9831	5.7694	9.3113	5.8413
	Transformer	0.9827	6.5001	9.4693	6.6582
	SOFTS	0.9864	4.6673	8.3294	4.7435
Winter dataset	CNN	0.8900	17.7266	23.3937	17.0917
	TCN	0.9651	9.7085	13.8147	9.3608
	LSTM	0.9690	7.5356	12.4233	7.2657
	GRU	0.9765	6.8146	10.8050	6.5705
	Transformer	0.9786	6.0951	10.3114	5.8786
	SOFTS	0.9810	5.4381	9.7437	5.2346

Table 5. Training time vs. input length.

Dataset	Model	Training time (s), $L = 24$	Training time (s), $L = 48$	Training time (s), $L = 96$
Spring dataset	Transformer	87.82	110.35	180.75
Spring dataset	SOFTS	66.13	67.82	68.33
Summer dataset	Transformer	85.31	110.81	179.45
Summer dataset	SOFTS	57.41	61.67	59.40
Autumn dataset	Transformer	84.50	111.13	176.55
Autumn dataset	SOFTS	56.47	60.41	61.11
Winter dataset	Transformer	87.99	111.06	179.82
Winter dataset	SOFTS	60.15	62.44	62.03
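
The scaling behaviour in Table 5 can be summarised by the ratio of the training time at $L = 96$ to that at $L = 24$. The following is a minimal sketch that only re-uses the values printed in the table; the names `TRAIN_TIME_S` and `growth_factor` are illustrative and do not come from the original experiments.

```python
# Training times in seconds copied from Table 5, ordered as (L=24, L=48, L=96).
TRAIN_TIME_S = {
    ("Spring", "Transformer"): (87.82, 110.35, 180.75),
    ("Spring", "SOFTS"): (66.13, 67.82, 68.33),
    ("Summer", "Transformer"): (85.31, 110.81, 179.45),
    ("Summer", "SOFTS"): (57.41, 61.67, 59.40),
    ("Autumn", "Transformer"): (84.50, 111.13, 176.55),
    ("Autumn", "SOFTS"): (56.47, 60.41, 61.11),
    ("Winter", "Transformer"): (87.99, 111.06, 179.82),
    ("Winter", "SOFTS"): (60.15, 62.44, 62.03),
}


def growth_factor(times):
    """Ratio of the training time at L = 96 to the training time at L = 24."""
    return times[-1] / times[0]


# Hypothetical helper table: growth factor per (dataset, model) pair.
growth = {key: growth_factor(times) for key, times in TRAIN_TIME_S.items()}

for (dataset, model), factor in growth.items():
    print(f"{dataset:<8}{model:<12}{factor:.2f}x")
```

With these values, the Transformer training time roughly doubles on every dataset when the input length is quadrupled, whereas the SOFTS training time changes by less than 10%.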

Table 6. Center frequencies of the IMFs decomposed by VMD and MVMD for the seasonal datasets (autumn and winter results are omitted for brevity).

Dataset	Method	Sequence	Center Frequency of the Decomposed IMFs
			IMF1	IMF2	IMF3	IMF4	IMF5	IMF6	IMF7	IMF8
Spring dataset	VMD	Wind power (MW)	0.0002	0.0087	0.0176	0.0279	0.0440	0.0718	0.1064	0.1524
		Wind speed (m/s)	0.0001	0.0098	0.0220	0.0446	0.0843	0.1502	0.3327	0.4398
		Wind direction (°)	0.0001	0.0100	0.0278	0.0574	0.1062	0.1830	0.2774	0.4583
		Temperature (°C)	0.0000	0.0141	0.0301	0.0413	0.0534	0.0752	0.1096	0.1669
		Air pressure (hPa)	0.0000	0.0064	0.0203	0.0295	0.0434	0.0712	0.1184	0.1796
		Humidity (%)	0.0001	0.0097	0.0189	0.0306	0.0501	0.0759	0.1152	0.1853
	MVMD	All sequences	0.0000	0.0112	0.0312	0.0680	0.1239	0.2491	0.3768	0.4622
Summer dataset	VMD	Wind power (MW)	0.0002	0.0097	0.0196	0.0320	0.0497	0.0704	0.1091	0.1543
		Wind speed (m/s)	0.0001	0.0096	0.0223	0.0450	0.0766	0.1267	0.2227	0.4004
		Wind direction (°)	0.0001	0.0112	0.0310	0.0685	0.1190	0.1986	0.3077	0.4206
		Temperature (°C)	0.0000	0.0105	0.0299	0.0479	0.0639	0.0865	0.1264	0.1685
		Air pressure (hPa)	0.0000	0.0100	0.0208	0.0369	0.0632	0.1032	0.1511	0.3506
		Humidity (%)	0.0001	0.0091	0.0135	0.0259	0.0429	0.0657	0.1019	0.1508
	MVMD	All sequences	0.0000	0.0118	0.0357	0.0803	0.1613	0.2496	0.3560	0.4482
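
The frequency mismatch of per-variable VMD in Table 6 can be quantified as the spread (maximum minus minimum) of the center frequencies across the six variables for each IMF index. The sketch below does this for the spring dataset using only the tabulated values; `VMD_SPRING` and `spread_per_imf` are illustrative names, not part of the original method.

```python
# Center frequencies from Table 6 (spring dataset, VMD applied to each variable separately).
VMD_SPRING = {
    "Wind power": [0.0002, 0.0087, 0.0176, 0.0279, 0.0440, 0.0718, 0.1064, 0.1524],
    "Wind speed": [0.0001, 0.0098, 0.0220, 0.0446, 0.0843, 0.1502, 0.3327, 0.4398],
    "Wind direction": [0.0001, 0.0100, 0.0278, 0.0574, 0.1062, 0.1830, 0.2774, 0.4583],
    "Temperature": [0.0000, 0.0141, 0.0301, 0.0413, 0.0534, 0.0752, 0.1096, 0.1669],
    "Air pressure": [0.0000, 0.0064, 0.0203, 0.0295, 0.0434, 0.0712, 0.1184, 0.1796],
    "Humidity": [0.0001, 0.0097, 0.0189, 0.0306, 0.0501, 0.0759, 0.1152, 0.1853],
}


def spread_per_imf(freqs_by_variable):
    """Max minus min center frequency across variables, for each IMF index."""
    columns = zip(*freqs_by_variable.values())
    return [max(col) - min(col) for col in columns]


vmd_spread = spread_per_imf(VMD_SPRING)

for k, s in enumerate(vmd_spread, start=1):
    print(f"IMF{k}: spread across variables = {s:.4f}")
```

The spread grows with the IMF index: for IMF8 the wind power mode is centered at 0.1524 while the wind direction mode is centered at 0.4583, so the two modes occupy different bands. MVMD reports a single center frequency per IMF for all sequences, so the corresponding spread is zero by construction.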

Table 7. Comparison of hybrid forecasting models on wind power datasets (L = 24).

Dataset	Model	Evaluation Metrics
		$R^{2}$	MAE (MW)	RMSE (MW)	MAPE (%)
Spring dataset	CEEMDAN-SOFTS	0.9896	3.2279	5.2888	3.5968
	VMD-SOFTS	0.9877	3.3302	5.7448	3.7108
	MVMD-SOFTS	0.9929	2.8918	4.3718	3.2223
Summer dataset	CEEMDAN-SOFTS	0.9914	3.1817	4.8365	4.8260
	VMD-SOFTS	0.9925	3.0380	4.5030	4.6080
	MVMD-SOFTS	0.9927	3.2620	4.4365	4.9478
Autumn dataset	CEEMDAN-SOFTS	0.9961	2.7287	4.4725	2.7543
	VMD-SOFTS	0.9950	3.0702	5.0496	3.0989
	MVMD-SOFTS	0.9973	2.6514	3.7367	2.6762
Winter dataset	CEEMDAN-SOFTS	0.9934	3.5460	5.7395	3.4355
	VMD-SOFTS	0.9911	4.1819	6.6642	4.0516
	MVMD-SOFTS	0.9952	3.4919	4.8814	3.3831
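
The differences in Table 7 are easier to compare as relative reductions in MAE and RMSE of MVMD-SOFTS with respect to each baseline. The following sketch is plain arithmetic on the tabulated values; `TABLE7` and `relative_reduction` are illustrative names introduced here.

```python
# (MAE, RMSE) in MW copied from Table 7.
TABLE7 = {
    "Spring": {"CEEMDAN-SOFTS": (3.2279, 5.2888), "VMD-SOFTS": (3.3302, 5.7448), "MVMD-SOFTS": (2.8918, 4.3718)},
    "Summer": {"CEEMDAN-SOFTS": (3.1817, 4.8365), "VMD-SOFTS": (3.0380, 4.5030), "MVMD-SOFTS": (3.2620, 4.4365)},
    "Autumn": {"CEEMDAN-SOFTS": (2.7287, 4.4725), "VMD-SOFTS": (3.0702, 5.0496), "MVMD-SOFTS": (2.6514, 3.7367)},
    "Winter": {"CEEMDAN-SOFTS": (3.5460, 5.7395), "VMD-SOFTS": (4.1819, 6.6642), "MVMD-SOFTS": (3.4919, 4.8814)},
}


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline` (negative if it is higher)."""
    return 100.0 * (baseline - proposed) / baseline


# Maps (season, baseline) to (MAE reduction in %, RMSE reduction in %).
reductions = {}
for season, rows in TABLE7.items():
    mae_p, rmse_p = rows["MVMD-SOFTS"]
    for baseline in ("CEEMDAN-SOFTS", "VMD-SOFTS"):
        mae_b, rmse_b = rows[baseline]
        reductions[(season, baseline)] = (
            relative_reduction(mae_b, mae_p),
            relative_reduction(rmse_b, rmse_p),
        )

for (season, baseline), (d_mae, d_rmse) in reductions.items():
    print(f"{season:<7} vs {baseline:<14} MAE {d_mae:+.1f}%  RMSE {d_rmse:+.1f}%")
```

On these numbers the RMSE of MVMD-SOFTS is lower than both baselines in every season. The summer dataset is the exception for MAE, where MVMD-SOFTS (3.2620 MW) is higher than both VMD-SOFTS (3.0380 MW) and CEEMDAN-SOFTS (3.1817 MW).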

Table 8. Comparative experiments of multi-step forecasting for the spring dataset (L = 24).

Model	2-Step				3-Step				4-Step
	$R^{2}$	MAE (MW)	RMSE (MW)	MAPE (%)	$R^{2}$	MAE (MW)	RMSE (MW)	MAPE (%)	$R^{2}$	MAE (MW)	RMSE (MW)	MAPE (%)
CNN	0.7405	17.0644	26.4334	19.1631	0.6300	20.7190	30.5648	23.2671	0.5714	22.1235	33.9684	24.8443
TCN	0.8676	11.1425	18.8784	12.5129	0.7740	14.0880	24.6652	15.8205	0.7016	17.6033	28.3444	19.7602
LSTM	0.8709	10.4355	18.6450	11.7301	0.7686	14.2536	24.9618	16.0065	0.6838	17.5598	29.1786	19.7193
GRU	0.8793	9.9493	18.0261	11.1728	0.7841	14.2133	24.1097	15.9669	0.6952	16.6860	28.6454	18.7381
Transformer	0.8587	10.7625	19.5031	12.0861	0.7664	14.5267	25.0804	16.3132	0.6676	17.6698	29.9146	19.8429
SOFTS	0.8831	9.6120	17.7453	10.7142	0.7930	12.8176	23.6259	14.2914	0.7006	15.8282	28.4243	17.6313
CEEMDAN-SOFTS	0.9770	4.7624	7.8779	5.3085	0.9564	6.7900	10.8466	7.5079	0.9384	8.1839	12.9666	9.1265
VMD-SOFTS	0.9846	3.6505	6.4312	4.0691	0.9785	4.3768	7.6061	4.8801	0.9745	4.9761	8.2975	5.4982
MVMD-SOFTS	0.9921	3.0255	4.6270	3.3725	0.9907	3.2537	5.0183	3.6279	0.9871	3.8649	5.9108	4.3101

Table 9. Comparative experiments of multi-step forecasting for the summer dataset (L = 24).

Model	2-Step				3-Step				4-Step
	$R^{2}$	MAE (MW)	RMSE (MW)	MAPE (%)	$R^{2}$	MAE (MW)	RMSE (MW)	MAPE (%)	$R^{2}$	MAE (MW)	RMSE (MW)	MAPE (%)
CNN	0.8091	14.4060	22.7560	21.8257	0.7053	19.3453	28.2768	29.3091	0.6125	21.7539	32.4525	32.9532
TCN	0.8997	10.5936	16.4916	16.0498	0.8150	14.3249	22.4014	21.7029	0.7235	17.7496	27.3864	26.8915
LSTM	0.8997	10.3511	16.4931	15.6884	0.7974	14.7626	23.4437	22.3661	0.7053	18.1328	28.2743	27.4720
GRU	0.9089	9.9358	15.7238	15.0519	0.8243	14.3810	21.8308	21.7878	0.7406	17.2930	26.5298	26.1997
Transformer	0.8978	10.6547	16.6473	16.1424	0.8179	14.1368	22.2290	21.4178	0.7149	17.6960	28.0309	26.1320
SOFTS	0.9145	9.5241	15.2423	14.4202	0.8416	12.9301	20.7475	19.5415	0.7353	16.0452	25.8832	24.2027
CEEMDAN-SOFTS	0.9849	4.2965	6.4028	6.5052	0.9749	5.8023	8.2575	8.7691	0.9607	7.3909	10.3364	11.1485
VMD-SOFTS	0.9902	3.4907	5.1507	5.2852	0.9870	4.0863	5.9344	6.1758	0.9854	4.5407	6.3092	6.8493
MVMD-SOFTS	0.9920	3.4080	4.6653	5.1600	0.9917	3.4630	4.7524	5.3379	0.9885	3.9501	5.6042	5.9584

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Lu, W.; Lu, Z.; Liu, W.; Cao, Y. Series-Core Fusion Based Multivariate Variational Mode Decomposition for Short-Term Wind Power Prediction Using Multiple Meteorological Data. Forecasting 2026, 8, 15. https://doi.org/10.3390/forecast8010015
