Article

Series-Core Fusion Based Multivariate Variational Mode Decomposition for Short-Term Wind Power Prediction Using Multiple Meteorological Data

School of Mechanical and Electric Engineering, Guangzhou University, Guangzhou 510006, China
*
Author to whom correspondence should be addressed.
Forecasting 2026, 8(1), 15; https://doi.org/10.3390/forecast8010015
Submission received: 2 January 2026 / Revised: 29 January 2026 / Accepted: 10 February 2026 / Published: 12 February 2026
(This article belongs to the Collection Energy Forecasting)

Highlights

What are the main findings?
  • An MVMD algorithm extracts informative time–frequency features through joint analysis of wind power and meteorological data, improving short-term wind power prediction accuracy.
  • A SOFTS framework integrated with the STAR aggregation and redistribution mechanism streamlines computation and improves efficiency, making it well suited to real-time wind power forecasting.
What are the implications of the main findings?
  • The MVMD-SOFTS model addresses two bottlenecks, the limits of traditional single-variable decomposition and the high computational cost of Transformers, providing a high-accuracy, high-efficiency approach to renewable energy forecasting.
  • The MVMD-SOFTS model maintains high multi-step prediction accuracy and real-time computational efficiency, meeting practical short-time-scale forecasting needs in power grid scheduling.

Abstract

Accurate wind power forecasting is critical for enhancing the operational efficiency and stability of electrical power grids. Conventional single-variable signal decomposition forecasting methods ignore the coupling between wind power and multiple meteorological variables, which limits prediction accuracy. This study proposes an accurate and fast short-term wind power prediction approach based on series-core fusion that takes multiple meteorological data into account. In the data preprocessing stage, the multivariate variational mode decomposition (MVMD) algorithm decomposes wind power and meteorological variables into the same predefined number of frequency-aligned intrinsic mode functions (IMFs), which enriches the feature representation and improves forecasting accuracy. During the training stage, the series-core fused time series (SOFTS) model establishes connections between the wind power channel and the meteorological variable channels for each IMF, achieving fast convergence through its streamlined and parallel structure. In the forecasting stage, the final wind power prediction is generated by reconstructing all IMFs. Furthermore, we conducted a comprehensive performance evaluation by comparing the proposed MVMD-SOFTS model with eight alternative models: the CNN, TCN, LSTM, GRU, Transformer, SOFTS, CEEMDAN-SOFTS, and VMD-SOFTS models. The results indicate that MVMD-SOFTS outperformed all other models, demonstrating its effectiveness in capturing the multifaceted relationships in wind power forecasting.

1. Introduction

Wind power, as a quintessential renewable energy source, has expanded rapidly worldwide in recent years, establishing itself as a leading alternative to traditional fossil fuels owing to its advantages in sustainability and environmental performance [1]. However, the inherent intermittency and variability of wind energy pose significant challenges to the operational safety and stability of large-scale grid-integrated wind power systems [2]. As the integration of wind energy into power systems increases, accurate, timely, and reliable wind power forecasting becomes critical for effective power system planning, dispatch, and secure grid operations [3].
Wind power forecasting (WPF) methods are typically categorized into four main approaches: physical models, statistical models, artificial intelligence-based techniques, and hybrid forecasting methods [4]. Chang et al. [5] propose a novel long-term WPF hybrid model that corrects numerical weather prediction (NWP) wind speed and uses multi-scale deep learning regression prediction to exclude excessive NWP data. However, the accuracy of physical models is heavily reliant on the precision of input meteorological data and is highly sensitive to fluctuations in weather conditions. Statistical models utilize historical data to derive relationships between the wind speed and power output. Commonly employed techniques include autoregressive integrated moving average (ARIMA) [6], linear regression [7], and Kalman filtering [8]. These methods are particularly effective for short-term and very short-term forecasting under the condition of high-quality historical data. Chen [9] proposed an innovative statistical downscaling technique for meteorological wind models, demonstrating that while statistical models are generally straightforward to implement and computationally efficient, their performance can deteriorate under complex nonlinear dynamics or rapidly changing weather conditions.
Artificial intelligence (AI)-based prediction techniques encompass a wide range of models, including artificial neural network (ANN) [10], support vector machine (SVM) [11], and deep learning models (DL) [12]. Traditional ANNs—such as feedforward neural network (FNN) [13], multilayer perceptron (MLP) [14], backpropagation neural network (BPNN) [15], and radial basis function neural network (RBFNN) [16]—are highly effective at capturing the inherent temporal and spatial correlations within wind power datasets. However, their performance may degrade significantly when processing large-scale datasets due to the increased data complexity, presenting substantial challenges for model scalability and computational efficiency. Deep learning (DL), an advanced paradigm within machine learning, has emerged as a powerful and versatile tool for wind power forecasting due to its superior capacity for autonomous feature extraction and modeling intricate nonlinear dependencies within high-dimensional datasets. The predominant DL architectures deployed in this domain fall into four principal categories: deep neural networks (DNNs) [17], convolutional neural networks (CNNs) [18], recurrent neural networks (RNNs) [19], and enhanced RNN variants—long short-term memory (LSTM) [20] and gated recurrent unit (GRU) [21]—specifically engineered to mitigate vanishing gradient challenges in long-term wind sequence modeling. CNNs exhibit robust feature extraction capabilities and computational efficiency, making them well-suited for spatial–temporal pattern analysis in wind datasets. As a time-series-adapted variant of CNNs, Temporal Convolutional Networks (TCNs) [22] are specifically designed to capture both short- and long-term temporal dependencies more effectively, thereby enhancing the accuracy and reliability of wind power predictions. 
Complementing these approaches, generative adversarial networks (GANs) have emerged as effective frameworks for addressing data scarcity and distributional uncertainty in wind power forecasting tasks, particularly through semi-supervised learning paradigms [23]. Recently, Transformer architectures have revolutionized wind power forecasting through multi-head self-attention mechanisms to simultaneously model localized fluctuations and global trend correlations. Erick et al. [24] introduced a transformer-based architecture with adaptive positional encoding, specifically optimized for wind power sequences. This innovation has demonstrated superior accuracy and reliability in long-term forecasting, solidifying Transformers as a state-of-the-art methodology in the domain.
By combining individual forecasting models’ benefits, hybrid forecasting models have become a key approach across various forecasting domains [25]. This integrative framework can retain the benefits of each model individually while effectively reducing the uncertainty arising from exclusive reliance on single methodologies. As a key subcategory within hybrid forecasting frameworks, signal decomposition-based combined models significantly enhance wind power forecasting accuracy by systematically reducing input data complexity. Common signal decomposition technologies include univariate and multivariate algorithms. Univariate algorithms comprise wavelet decomposition (WD) [26], variational mode decomposition (VMD) [27], empirical mode decomposition (EMD) [28], and enhanced EMD variants—ensemble empirical mode decomposition (EEMD) [29], complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) [30], etc. Ranjeeta Bisoi et al. [31] demonstrate VMD’s superiority over EMD, particularly in noise robustness and feature extraction precision for predictive modeling applications.
However, these univariate decomposition algorithms are ineffective for processing multivariate data. In wind power forecasting, datasets are typically multidimensional, comprising multiple correlated time series such as wind speed, temperature, and pressure. Consequently, the prediction accuracy of such methods is inherently limited. Unlike univariate decomposition, multivariate techniques (MEMD [32], MVMD [33]) and their hybrid derivatives (e.g., MEMD-GRU [34], MVMD-Transformer [35], MVMD-CNN-BiLSTM [36]) can effectively capture cross-variable dependencies, enabling more robust system modeling and superior prediction accuracy compared to traditional approaches. While effective, these multivariate decomposition-based hybrid models incur significantly higher training costs in terms of time and energy consumption, thereby limiting their applicability in sustainable forecasting tasks.
To address these limitations, we propose an accurate yet computationally efficient short-term wind power prediction framework that combines MVMD with our novel Series-Core Fused Time Series (SOFTS) approach. While MVMD delivers superior prediction accuracy, its computational demands remain substantial. The proposed SOFTS technique effectively mitigates this computational burden while preserving predictive performance. The key contributions of this work include the following:
(1)
High prediction accuracy: In the data processing stage, we adopt the MVMD algorithm to simultaneously decompose the meteorological data series and the wind power data series, effectively addressing the frequency mismatch between the meteorological and wind power sequences. This approach enables time–frequency synchronized analysis of both meteorological variables and wind power generation series, thereby supporting high prediction accuracy.
(2)
Low computational cost: In the prediction training stage, we use the SOFTS framework, which employs a STAR aggregate–redistribute module within a centralized architecture. The STAR module aggregates all series to generate a global core representation, which is subsequently redistributed and fused with individual series representations, enabling efficient cross-channel interactions. Its computational complexity primarily scales with the number of input channels rather than the input sequence length. We also provide a theoretical analysis of the computational complexity in comparison with existing methods (see Table 1). The analysis shows that the core computational complexity of the proposed method is $O(Cd^2)$, compared with the $O(Ld^2)$ complexity of LSTM and the $O(L^2 d + HLd)$ complexity of Transformer architectures. The reduction relative to LSTM holds when the number of channels $C$ is smaller than the sequence length $L$, as in this study (six channels, input length 24).
(3)
Practical simulation validation: A real-world dataset from the Xinjiang Guohua Jingxia North Wind Farm was used to compare the MVMD-SOFTS model with eight benchmark models, including the advanced Transformer model. The results demonstrate that the MVMD-SOFTS model achieves superior performance in both single-step and multi-step ahead forecasting.
The remainder of this paper is organized as follows. Section 2 introduces the overall framework and methodology of the proposed model. Section 3 describes the data preparation process and the evaluation metrics employed. Section 4 presents the experimental setup and results, including detailed comparisons with baseline methods. Section 5 concludes the paper and outlines potential directions for future research.

2. Materials and Methods

2.1. Multivariate Variational Mode Decomposition

As a multivariate extended signal decomposition algorithm based on VMD, MVMD has recently gained popularity. MVMD can simultaneously decompose meteorological data series and wind power time series, allowing for the capture of dynamic characteristics of wind power while effectively incorporating the influence of meteorological factors on wind power fluctuations. In contrast to traditional univariate decomposition methods, MVMD overcomes the limitations of single-signal processing by providing more comprehensive time-frequency information, improving the robustness and accuracy of the forecasting model.
The MVMD algorithm was initially proposed by Naveed ur Rehman and Hania Aftab in 2019 [33]. The MVMD decomposition process is outlined as follows:
(1)
Define input data. The input data consists of the wind power series along with meteorological data sequences, mathematically expressed as
$$x(t) = [WP(t),\, WS(t),\, WD(t),\, T(t),\, P(t),\, H(t)]$$
where $WP(t)$, $WS(t)$, $WD(t)$, $T(t)$, $P(t)$, and $H(t)$ denote wind power, wind speed, wind direction, temperature, atmospheric pressure, and humidity, respectively. The variable $t$ denotes time.
(2)
Signal decomposition model. The goal is to decompose the original multivariate input signal $\{x_c(t)\}_{c=1}^{C}$ into an ensemble of $K$ multivariate modulated oscillatory components $\{u_{k,c}(t)\}_{k=1,\,c=1}^{K,\,C}$ while meeting the following requirements: (i) the cumulative bandwidth of the extracted modes is as small as possible; (ii) the aggregate of the extracted modes precisely reconstructs the original signal. The constrained optimization problem can be formulated as
$$\min_{\{u_{k,c}\},\,\{\omega_k\}} \left\{ \sum_{k=1}^{K} \sum_{c=1}^{C} \left\| \partial_t \left[ u_{+}^{k,c}(t)\, e^{-j\omega_k t} \right] \right\|_2^2 \right\} \quad \text{subject to} \quad x_c(t) = \sum_{k=1}^{K} u_{k,c}(t), \quad c = 1, 2, \ldots, C$$
where $K$ and $C$ denote the number of IMFs and channels, respectively; $\partial_t$ denotes the partial derivative with respect to time; $u_{+}^{k,c}(t)$ denotes the analytic signal of $u_{k,c}(t)$, which has a unilateral frequency spectrum and is obtained with the Hilbert transform; $\omega_k$ represents the central frequency of the $k$th IMF set $\{u_{k,c}(t)\}_{c=1}^{C}$, which is shared across all channels; $x_c$ is the input signal of the $c$th data channel, encompassing both wind power time series and meteorological data sequences.
(3)
Form augmented Lagrangian function. By introducing Lagrangian multipliers and quadratic penalty terms, the aforementioned constrained optimization problem can be converted to an augmented Lagrangian function as
$$\mathcal{L}\left(\{u_{k,c}\}, \{\omega_k\}, \lambda_c\right) = \alpha \sum_{k=1}^{K} \sum_{c=1}^{C} \left\| \partial_t \left[ u_{+}^{k,c}(t)\, e^{-j\omega_k t} \right] \right\|_2^2 + \sum_{c=1}^{C} \left\| x_c(t) - \sum_{k=1}^{K} u_{k,c}(t) \right\|_2^2 + \sum_{c=1}^{C} \left\langle \lambda_c(t),\; x_c(t) - \sum_{k=1}^{K} u_{k,c}(t) \right\rangle,$$
where α serves as the weighting factor for the penalty.
(4)
Alternating Direction Method of Multipliers (ADMM) iterations. Using ADMM, the complete optimization problem is decomposed into a sequence of iterative sub-optimization problems. Note that problem (3) only contains equality constraints, which allow the ADMM iterations to form a type of closed-form solution to the subproblems, thus reducing the difficulty of the solution process. The closed-form update equations for the modes $\hat{u}_{k,c}(\omega)$ and the center frequency $\omega_k$ are presented below:
$$\hat{u}_{k,c}^{m+1}(\omega) = \frac{\hat{x}_c(\omega) - \sum_{n \neq k} \hat{u}_{n,c}(\omega) + \dfrac{\hat{\lambda}_c(\omega)}{2}}{1 + 2\alpha \left(\omega - \omega_k\right)^2}$$
$$\omega_k^{m+1} = \frac{\sum_{c=1}^{C} \int_0^{\infty} \omega \left| \hat{u}_{k,c}^{m+1}(\omega) \right|^2 d\omega}{\sum_{c=1}^{C} \int_0^{\infty} \left| \hat{u}_{k,c}^{m+1}(\omega) \right|^2 d\omega}$$
where $\hat{x}_c(\omega)$, $\hat{\lambda}_c(\omega)$, and $\hat{u}_{k,c}(\omega)$ represent the Fourier transforms of $x_c(t)$, $\lambda_c(t)$, and $u_{k,c}(t)$, respectively, and $m$ denotes the iteration index. After these steps, six sets of sub-series $WP_k(t)$, $WS_k(t)$, $WD_k(t)$, $T_k(t)$, $P_k(t)$, and $H_k(t)$, with $k = 1, \ldots, K$, are obtained.
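The two closed-form updates in step (4) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names, the discretized frequency grid `freqs`, the value of `alpha`, and the replacement of the integrals by sums over a uniform grid are all assumptions made for the example.

```python
import numpy as np

def update_mode(x_hat, other_modes_sum, lam_hat, freqs, omega_k, alpha):
    """One closed-form update of mode k for all channels.

    x_hat, other_modes_sum, lam_hat: complex arrays of shape (C, F) holding
    the spectra of the input, of the sum of all other modes, and of the
    Lagrangian multiplier. freqs: real array of shape (F,) with the
    non-negative frequency grid (hypothetical discretization).
    """
    numerator = x_hat - other_modes_sum + lam_hat / 2.0
    # The same narrow-band filter centred on omega_k is applied to every channel.
    denominator = 1.0 + 2.0 * alpha * (freqs - omega_k) ** 2
    return numerator / denominator

def update_center_frequency(u_hat_k, freqs):
    """Power-weighted mean frequency of mode k, pooled over all channels."""
    power = np.abs(u_hat_k) ** 2  # shape (C, F)
    # On a uniform grid the integration step cancels between numerator and denominator.
    return float((power * freqs).sum() / power.sum())

# Toy example with hypothetical values: C = 2 channels, F = 6 frequency bins.
freqs = np.linspace(0.0, 0.5, 6)
x_hat = np.ones((2, 6), dtype=complex)
zeros = np.zeros_like(x_hat)
u_hat = update_mode(x_hat, zeros, zeros, freqs, omega_k=freqs[2], alpha=2000.0)
omega_new = update_center_frequency(u_hat, freqs)
```

Because the center frequency is computed from the power pooled over all channels, every channel of mode k shares one value of ω_k, which is what keeps the IMFs frequency-aligned across wind power and the meteorological variables.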

2.2. Series-Core Fused Time Series (SOFTS) Model

To address the computational complexity issues arising from MVMD, this paper employs an efficient MLP-based model, the series-core fused time series (SOFTS) model [37]. The architecture of the SOFTS model, depicted in Figure 1, comprises the following four components.
(1) Reversible Instance Normalization. Normalization is a fundamental preprocessing step in time series forecasting models. In SOFTS, reversible instance normalization is employed to enhance the stability of the prediction process. Initially, the historical time series are normalized by centering them to zero mean and scaling them to unit variance. This normalization effectively removes the local statistical dependencies within the data, thereby facilitating more stable and reliable predictions by the base forecaster. Once the forecasting is completed, the normalization is reversed to restore the original statistical properties of the predicted series. This approach has been widely adopted in state-of-the-art models to improve performance and ensure the model’s adaptability to various time series characteristics.
(2) Series Embedding. Series embedding projects each channel of the input time series into a hidden-dimensional space through a linear transformation. This transformation serves to prepare the time series data for subsequent processing while preserving the essential temporal dependencies inherent in the series. In our approach, we apply series embedding to the input historical data by linearly projecting $X \in \mathbb{R}^{C \times L}$ into $S_0 \in \mathbb{R}^{C \times H}$, where $L$ denotes the length of the historical time steps used for forecasting, and $H$ is the dimensionality of the hidden layer.
$$S_0 = \text{Series embedding}(X)$$
(3) STAR Module. A star-shaped aggregate-redistribute module, the STAR module for short, is used to exchange information between different data channels, and it represents the core innovation of SOFTS. Unlike traditional methods such as attention, which involve pairwise comparisons between channels, STAR uses a centralized structure to aggregate the information from all series into a comprehensive core representation and then distributes the core information to each channel, as shown in Figure 2. This interaction pattern reduces the complexity and inefficiency of distributed interactions and also improves robustness when some channels are abnormal. The input data $S_0$ from the series embedding is refined in sequence through $N$ layers of the STAR module. Each layer iteratively processes the embedding from the previous layer, capturing increasingly complex patterns and dependencies within the multivariate time series. The output at the $n$th layer is updated as follows:
$$S_n = \text{STAR}(S_{n-1}), \quad n = 1, 2, \ldots, N.$$
Specifically, the $n$th layer STAR module first extracts the core representation of the multivariate time series when provided with the series representations of each channel as input. The core representation $O$ is defined as follows:
$$O_n = f(s_1, s_2, \ldots, s_C), \quad n = 1, 2, \ldots, N$$
where $f$ denotes an arbitrary function, and $S_n = \{s_1^n, s_2^n, \ldots, s_C^n\}$ represents the input multivariate series comprising $C$ channels.
The core representation encodes the global information across all the data channels. We employ stochastic pooling [38] to obtain the core representation by aggregating the representations of the $C$ channels:
$$O_n = \text{Stoch\_Pooling}\left(\text{MLP}_1(S_{n-1})\right),$$
where $\text{MLP}_1: \mathbb{R}^{C \times H} \to \mathbb{R}^{C \times H'}$ transforms the series representation from the hidden dimension $H$ of the series embedding to the core dimension $H'$ using the GELU activation function. Stoch_Pooling refers to stochastic pooling, which combines the advantages of max pooling and average pooling. Specifically, it applies a softmax over the channel dimension to the activations $A = \text{MLP}_1(S_{n-1})$ to derive a probability distribution, in which each channel's activation value corresponds to a probability $p_{cj}$:
$$p_{cj} = \frac{e^{A_{cj}}}{\sum_{i=1}^{C} e^{A_{ij}}}, \quad c = 1, 2, \ldots, C, \quad j = 1, 2, \ldots, H'$$
During training, for each dimension $j$, a channel $c$ is sampled at random according to the probabilities $p_{cj}$, and its activation is taken as the core value $o_j$. This stochastic selection enhances the model's generalization ability:
$$o_j = A_{cj}, \quad \text{where } c \sim P(p_{1j}, p_{2j}, \ldots, p_{Cj}).$$
During the testing phase, a weighted summation is used to obtain the core representation for each dimension to ensure model stability:
$$o_j = \sum_{c=1}^{C} p_{cj} A_{cj}.$$
Subsequently, we use the following form to fuse the representations of the core and all the associated series, consolidating the information from these distinct components into a unified representation for further analysis:
$$F_n = \text{Repeat\_Concat}(S_{n-1}, O_n),$$
$$S_n = \text{MLP}_2(F_n) + S_{n-1},$$
where the Repeat_Concat operation involves concatenating the core representation $O_n = \{o_1^n, o_2^n, \ldots, o_{H'}^n\}$ with each individual series representation (as shown in Figure 2, $f_c^n = [s_c^n, O_n]$), resulting in a new representation $F_n \in \mathbb{R}^{C \times (H + H')}$, i.e., $F_n = \{f_1^n, f_2^n, \ldots, f_C^n\}$. Subsequently, $\text{MLP}_2: \mathbb{R}^{C \times (H + H')} \to \mathbb{R}^{C \times H}$ is utilized to project the concatenated representation back into the hidden dimension, effectively fusing the information from both the core and series representation, resulting in the fused representation $S_n \in \mathbb{R}^{C \times H}$.
(4) Linear Predictor. After applying $N$ STAR layers in sequence, we obtain the fused representation at the $N$th layer, denoted by $S_N \in \mathbb{R}^{C \times H}$. A linear predictor ($\mathbb{R}^{C \times H} \to \mathbb{R}^{C \times L}$) then generates the forecasting results:
$$Y = \text{Linear}(S_N).$$
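The aggregate-redistribute step of a single STAR layer can be sketched in NumPy as below. This is an illustrative forward pass under simplifying assumptions, not the SOFTS reference code: MLP_1 is reduced to one dense layer with GELU, MLP_2 to one linear layer, the weights are random placeholders, and the dimensions (C = 6, H = 16, H' = 8) are hypothetical.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def stochastic_pooling(A, training, rng=None):
    """Pool activations A of shape (C, H_core) over channels into shape (H_core,)."""
    e = np.exp(A - A.max(axis=0, keepdims=True))  # shift for numerical stability
    p = e / e.sum(axis=0, keepdims=True)          # softmax over the channel axis
    if training:
        # Sample one channel per core dimension according to p.
        rng = rng if rng is not None else np.random.default_rng()
        n_channels, n_core = A.shape
        picked = np.array([rng.choice(n_channels, p=p[:, j]) for j in range(n_core)])
        return A[picked, np.arange(n_core)]
    # At test time, use the probability-weighted sum instead of sampling.
    return (p * A).sum(axis=0)

def star_layer(S, W1, b1, W2, b2, training=False, rng=None):
    """S: (C, H) series representations; returns the fused (C, H) representations."""
    A = gelu(S @ W1 + b1)                                  # simplified MLP_1: (C, H) -> (C, H')
    core = stochastic_pooling(A, training, rng)            # core representation, shape (H',)
    F = np.concatenate([S, np.tile(core, (S.shape[0], 1))], axis=1)  # Repeat_Concat: (C, H + H')
    return F @ W2 + b2 + S                                 # simplified MLP_2 plus residual

# Hypothetical sizes and random placeholder weights.
rng = np.random.default_rng(0)
C, H, H_core = 6, 16, 8
S0 = rng.normal(size=(C, H))
W1, b1 = 0.1 * rng.normal(size=(H, H_core)), np.zeros(H_core)
W2, b2 = 0.1 * rng.normal(size=(H + H_core, H)), np.zeros(H)
S_eval = star_layer(S0, W1, b1, W2, b2)
S_train = star_layer(S0, W1, b1, W2, b2, training=True, rng=rng)
```

Each channel interacts only with the shared core vector rather than with every other channel, which is why the cost of the layer grows with the number of channels C rather than with C squared.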

2.3. MVMD-SOFTS Framework Structure

The framework of the proposed MVMD-SOFTS forecasting model is depicted in Figure 3, and the specific steps are outlined as follows.
Step 1: Data decomposition. The input data comprises the wind power generation time series and the meteorological data time series such as the wind speed, wind direction, temperature, air pressure, and humidity. Based on the MVMD algorithm, the input multivariate signals are decomposed into a predefined number (denoted as K) of IMFs. This decomposition process separates the complex non-stationary data into simpler oscillatory components with distinct frequencies, thereby capturing the underlying patterns and trends in both the wind power generation and meteorological data. In this case study, the input variables are decomposed into eight distinct IMFs, each corresponding to a different frequency. These IMFs are crucial for subsequent analysis and forecasting, as they offer a more manageable and interpretable representation of the temporal dynamics inherent in the input data.
Step 2: Model prediction. For each IMF, we use the SOFTS architecture to capture the temporal dependencies and the correlations among the wind power and meteorological variable channels, and to forecast the future behavior of wind power generation and the meteorological variables at each frequency scale. These forecasted IMFs are then used in the following step to reconstruct the final prediction.
Step 3: Reconstruction and evaluation. Summing all the forecasted IMFs produces the final prediction for wind power generation and the meteorological variables. Following reconstruction, error analysis is conducted using evaluation metrics such as the coefficient of determination ($R^2$), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE). These metrics quantify the discrepancies between the predicted and actual values, providing a thorough assessment of the model's performance and accuracy. This step is essential for identifying potential areas for improvement and ensuring the reliability of the forecasting model.
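The three steps above amount to forecasting each IMF separately and summing the results. The sketch below shows only this control flow: the random array standing in for the MVMD output, the `persistence` stand-in forecaster (used in place of SOFTS), the function names, and the choice of channel 0 for wind power are assumptions made for illustration.

```python
import numpy as np

def forecast_by_modes(imfs, forecast_fn, horizon):
    """Forecast every IMF separately and sum the forecasts.

    imfs: array of shape (K, C, T) with K modes, C channels, T time steps.
    forecast_fn: callable mapping a (C, T) mode to a (C, horizon) forecast.
    """
    per_mode = [forecast_fn(mode, horizon) for mode in imfs]
    return np.sum(per_mode, axis=0)  # reconstruction: sum over the K modes

def persistence(mode, horizon):
    """Stand-in forecaster: repeat the last observed value of every channel."""
    return np.repeat(mode[:, -1:], horizon, axis=1)

# Placeholder for the MVMD output: K = 8 modes, C = 6 channels, T = 96 steps.
rng = np.random.default_rng(0)
imfs = rng.normal(size=(8, 6, 96))
signal = imfs.sum(axis=0)                  # the modes add up to the original signal
forecast = forecast_by_modes(imfs, persistence, horizon=4)
wind_power_forecast = forecast[0]          # assuming channel 0 holds wind power
```

With a linear stand-in such as persistence, the summed per-mode forecast equals the forecast of the undecomposed signal, so any gain from decomposition comes from using a nonlinear forecaster fitted to each frequency band.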

2.4. Computation Complexity Comparison

Table 1 outlines the theoretical complexity of LSTM, Transformer, and SOFTS models. Each complexity formulation includes three components: input encoding, core computation (recurrent-based, attention-based, or MLP-based), and multi-step forecasting output. Here, C denotes the number of input channels, L represents the length of the input historical sequence, d is the hidden dimension, and H refers to the length of the forecast horizon.
For the LSTM model, the complexity term $O(LdC)$ arises from projecting a multivariate input sequence of length $L$ and $C$ channels into a hidden space. The main computational cost $O(Ld^2)$ results from the recurrent hidden-to-hidden transformations, which are carried out sequentially over time steps. The output complexity $O(CdH)$ corresponds to mapping the hidden states to $H$ forecast steps, with each step producing feature outputs across all $C$ channels through a fully connected layer.
For the Transformer model, the complexity term $O(CLd)$ accounts for embedding a multivariate input sequence with $C$ channels and length $L$ into a $d$-dimensional representation. The primary computational burden comes from the encoder's self-attention mechanism, which incurs a complexity of $O(L^2 d)$ due to pairwise interactions across all input positions. Furthermore, the decoder contributes an additional cost of $O(HLd)$ through cross attention, as each of the $H$ forecast steps attends to the entire encoded sequence. The output complexity $O(CdH)$ results from transforming decoder outputs into final predictions, where each step generates $C$ features through a fully connected layer.
For the SOFTS model, the complexity term $O(CLd)$ reflects the temporal encoding of each input channel over the historical sequence. The core computational load $O(Cd^2)$ stems from the STAR module, where inter-channel interactions are captured through parallel MLP operations. This design avoids the sequential dependencies present in recurrent or attention-based models, enabling efficient and fully parallel computation. The final term $O(CdH)$ corresponds to producing multi-step predictions from the learned representations using a shared output layer.
Overall, LSTM involves sequential computation, where hidden states are updated step by step, resulting in a dominant cost of $O(Ld^2)$. Transformer requires intensive computation due to the encoder's self-attention mechanism with complexity $O(L^2 d)$, and an additional cost of $O(HLd)$ is introduced by the decoder's cross-attention mechanism. While the encoder supports full parallelism, the decoder remains partially sequential during prediction. SOFTS offers a more efficient structure, with all operations being parallelizable. Its overall complexity grows linearly with the sequence length $L$, the channel count $C$, and the prediction horizon $H$, and it avoids both quadratic attention costs and recursive updates.
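To make the scaling concrete, the core terms can be evaluated for one setting. In the sketch below, C = 6 and L = 24 follow the number of input variables and the input length reported in this paper, while d = 64 and H = 4 are hypothetical values chosen only for illustration. The function name is likewise an assumption.

```python
def core_costs(C, L, d, H):
    """Core complexity terms with constant factors dropped (not runtimes)."""
    return {
        "LSTM": L * d ** 2,                      # sequential hidden-to-hidden updates
        "Transformer": L ** 2 * d + H * L * d,   # self-attention plus cross-attention
        "SOFTS": C * d ** 2,                     # STAR module over C channels
    }

costs = core_costs(C=6, L=24, d=64, H=4)
for name, value in costs.items():
    print(f"{name}: {value}")
```

Here the SOFTS term is one quarter of the LSTM term, since their ratio is C/L = 6/24. Big-O terms omit constant factors and parallelism, so these counts indicate scaling behavior only.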

3. Data Preparation and Evaluation Metrics

3.1. Data Preparation

To evaluate the performance of the proposed model, a real-world dataset was utilized, obtained from the Guohua Jingxia North Wind Farm located in Xinjiang, China, covering the period from 1 January to 31 December 2019. The dataset was collected by the Supervisory Control and Data Acquisition (SCADA) system of the wind farm, which recorded high-frequency measurements at 15 min intervals. The variables included the wind speed at hub height, wind direction, temperature, humidity, atmospheric pressure, and actual power output. The wind turbines used in the wind farm are China Haizhuang HZ111/2000L models, manufactured by CSSC Haizhuang Wind Power, Chongqing, China with a rated capacity of 2 MW, and the hub height is 70 m.
The power generation data exhibited seasonal variations influenced by local geographical and meteorological factors. Consequently, the dataset was partitioned into four seasonal subsets: Spring (1 March to 31 May 2019), Summer (1 June to 31 August 2019), Autumn (1 September to 30 November 2019), and Winter (1 January to 28 February 2019, and 1 December to 31 December 2019). The partitioning allowed a detailed analysis of the seasonal behavior of the wind farm, and the statistical characteristics of each subset are presented in Table 2.
Each seasonal dataset was split into 90% for training and 10% for testing to preserve temporal and seasonal patterns. Missing meteorological values caused by turbine faults were filled via linear interpolation. All input variables were then linearly normalized to [0, 1] to ensure training stability.
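The preprocessing described above can be sketched in NumPy as follows. The function names and toy data are hypothetical, and two details are assumptions rather than statements from the paper: the split is taken chronologically, and the min-max statistics are computed on the training portion only.

```python
import numpy as np

def season_of(month):
    """Map a calendar month (1-12) to the seasonal subsets used in this study."""
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    if month in (9, 10, 11):
        return "Autumn"
    return "Winter"

def fill_missing_linear(values):
    """Fill NaN entries of a 1-D series by linear interpolation between known points."""
    values = np.asarray(values, dtype=float)
    idx = np.arange(values.size)
    known = ~np.isnan(values)
    return np.interp(idx, idx[known], values[known])

def chronological_split(data, train_fraction=0.9):
    """Assumed chronological split: the first 90% of rows train, the rest test."""
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]

def min_max_scale(train, test):
    """Scale to [0, 1] using training statistics only (an assumption)."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (train - lo) / span, (test - lo) / span

# Toy data: 10 time steps, 2 variables, with two missing values.
raw = np.column_stack([np.arange(1.0, 11.0), np.arange(10.0, 101.0, 10.0)])
raw[1, 0] = np.nan
raw[2, 1] = np.nan
filled = np.column_stack([fill_missing_linear(raw[:, i]) for i in range(raw.shape[1])])
train, test = chronological_split(filled)
train_scaled, test_scaled = min_max_scale(train, test)
```

When the scaling is fitted on the training portion only, test values can fall slightly outside [0, 1]. Fitting on the full series instead would keep every value inside the interval, at the cost of using test-period information.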

3.2. Evaluation Metrics

To evaluate the performance of the proposed method, four widely used evaluation metrics are employed: $R^2$, MAE, RMSE, and MAPE. $R^2$ measures the proportion of variance in the dependent variable explained by the model, with values closer to 1 indicating better fit. MAE quantifies the average magnitude of errors, providing a straightforward interpretation of the forecasting accuracy. RMSE penalizes larger errors, making it sensitive to outliers and reflecting the overall prediction quality. Owing to the presence of near-zero actual wind power values in the dataset, the standard MAPE tends to exhibit disproportionately large errors. To address this issue, this study employs a modified MAPE formulation based on the mean of the actual values. The specific calculation formulas for $R^2$, MAE, RMSE, and MAPE are defined as follows:
$$R^2 = 1 - \frac{\sum_{m=1}^{M} (y_m - \tilde{y}_m)^2}{\sum_{m=1}^{M} (y_m - \bar{y})^2},$$
$$\text{MAE} = \frac{1}{M} \sum_{m=1}^{M} \left| y_m - \tilde{y}_m \right|,$$
$$\text{RMSE} = \sqrt{\frac{\sum_{m=1}^{M} (y_m - \tilde{y}_m)^2}{M}},$$
$$\text{MAPE} = \frac{1}{M} \sum_{m=1}^{M} \left| \frac{y_m - \tilde{y}_m}{\bar{y}} \right| \times 100\%,$$
where $y_m$ denotes the true value at the $m$-th time step, while $\tilde{y}_m$ represents the predicted value at the same time step. Additionally, $\bar{y}$ indicates the mean of all true values, and $M$ is the total number of forecasted data points used for evaluation.
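The four metrics can be computed directly from their definitions. The sketch below is a minimal NumPy version with a hypothetical function name and toy values. It uses the modified MAPE described above, which divides by the mean of the actual values rather than by each individual value.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return R2, MAE, RMSE, and the mean-normalized MAPE (in percent)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    y_mean = y_true.mean()
    return {
        "R2": 1.0 - np.sum(err ** 2) / np.sum((y_true - y_mean) ** 2),
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        # Dividing by the mean avoids blow-ups when individual y_true values are near zero.
        "MAPE": np.mean(np.abs(err) / abs(y_mean)) * 100.0,
    }

# Toy values: errors are 1, 0, and -2 around a mean of 4.
scores = evaluate([2.0, 4.0, 6.0], [1.0, 4.0, 8.0])
# scores: R2 = 0.375, MAE = 1.0, RMSE = sqrt(5/3), MAPE = 25.0
```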

4. Experiments and Analysis

To evaluate the effectiveness of the proposed MVMD-SOFTS model, comprehensive comparative experiments and detailed discussions were carried out. The superiority of the SOFTS model and the efficacy of the MVMD decomposition were systematically verified through experimental validation. All experiments were implemented in Python 3.11 using the TensorFlow and Keras frameworks. The training was performed on a workstation equipped with an Intel Core i9-13900K CPU and 64 GB RAM. The basic parameter configurations of each model are summarized in Table 3, including layer configurations and internal parameters such as hidden size and model dimension, all of which were selected by grid search over the candidate configurations. The input sequence length was fixed at 24. Each model was trained using a batch size of 64 for 50 epochs with the Adam optimizer, and the loss function was set to mean squared error (MSE).
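The paper fixes the input length at 24 but does not describe how training samples are cut from the series. One common approach, shown here as an assumption, is a sliding window that pairs each block of 24 past steps with the following `horizon` steps. The function name, the toy series, and the horizon of 2 are hypothetical.

```python
import numpy as np

def make_windows(series, input_len=24, horizon=1):
    """Cut a (T, C) series into inputs of shape (N, input_len, C) and targets (N, horizon, C)."""
    series = np.asarray(series, dtype=float)
    n = series.shape[0] - input_len - horizon + 1
    if n <= 0:
        raise ValueError("series is too short for the requested window sizes")
    X = np.stack([series[i:i + input_len] for i in range(n)])
    Y = np.stack([series[i + input_len:i + input_len + horizon] for i in range(n)])
    return X, Y

# Toy series: 30 time steps, 6 channels.
series = np.arange(30 * 6, dtype=float).reshape(30, 6)
X, Y = make_windows(series, input_len=24, horizon=2)
```

With 30 steps, an input length of 24, and a horizon of 2, this yields 5 samples. Setting `horizon` to 1 corresponds to single-step forecasting and larger values to multi-step forecasting.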

4.1. Comparative Experiments of Single Forecasting Models

In this section, the performance of the proposed SOFTS model was evaluated by comparing it with several commonly used forecasting benchmarks. Six models were constructed for this comparison: CNN, TCN, LSTM, GRU, Transformer, and the proposed SOFTS. The evaluation metrics used were $R^2$, MAE, RMSE, and MAPE. Figure 4 displays a bar chart comparing the SOFTS model with the other models, while the detailed prediction accuracy results are shown in Table 4.
Owing to its architecture, the CNN model captures certain local dependencies through its convolutional kernels but struggles with long-term dependencies. The TCN model, which integrates residual connections and dilated convolutions, achieves notable improvements in time-series processing, yet its predictive performance remains inferior to that of the LSTM model. This is primarily because the recurrent architecture of LSTM, with its internal memory cells and gating mechanisms, captures and stores long-term dependencies more effectively. Across all four datasets, as shown in Table 4 and Figure 4, the LSTM model consistently outperforms the CNN and TCN models. For example, in the spring dataset, the LSTM model reduces the MAE, RMSE, and MAPE by 2.2150 MW, 1.9937 MW, and 2.4874%, respectively, compared to the TCN model. The GRU model, by reducing the number of memory units and gating mechanisms compared to the LSTM model, features a simpler structure with fewer parameters. This streamlined architecture yields slightly better results than the LSTM model on most datasets and metrics in Table 4, although not on all of them (for example, the spring MAE and the summer RMSE are marginally higher). In contrast to the recurrent structure of the LSTM model, the Transformer model uses a self-attention mechanism that does not rely on sequential processing. By calculating the attention weights between each time step and all other time steps in the sequence, the Transformer can capture long-range dependencies directly. For instance, in the summer and winter datasets, the Transformer model outperformed the LSTM model, with reductions in the MAE, RMSE, and MAPE of 0.3783 MW, 0.0526 MW, and 0.5732% for the summer dataset, and 1.4405 MW, 2.1119 MW, and 1.3871% for the winter dataset, respectively. These results indicate the advantage of the Transformer model on these datasets.
Compared to the traditional Transformer model, the SOFTS model replaces the attention mechanism with the STAR module, which employs distributed interactions to reduce computational complexity and enhance robustness. The SOFTS model delivers the best performance across all four seasonal datasets: its $R^2$ values are closest to 1, while its MAE, RMSE, and MAPE are the lowest among all single forecasting models. For instance, on the summer dataset, the SOFTS model achieves an $R^2$ of 0.9727, an MAE of 5.2262 MW, an RMSE of 8.6153 MW, and a MAPE of 7.8813%. This indicates that it captures long-term dependencies well even under high uncertainty and fluctuation in wind speed, so that its wind power predictions align more closely with the actual values.
Additionally, Table 5 compares the training times of the Transformer and SOFTS models. SOFTS trains faster than the Transformer at every input sequence length tested (L = 24, 48, 96). As the input sequence length increases to L = 48 and L = 96, the training time of the Transformer grows rapidly owing to the quadratic cost of self-attention; on the spring dataset, for example, it rises from 87.82 s at L = 24 to 180.75 s at L = 96. In contrast, the training time of SOFTS remains nearly constant (66.13 s to 68.33 s on the same dataset), so its efficiency advantage widens with the sequence length. This observation agrees with the theoretical complexity analysis in Table 1: the core computation of SOFTS scales with the number of input channels rather than with the sequence length.
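The scaling behavior in Table 1 can be illustrated by evaluating the core-computation terms for the three input lengths. This is a rough sketch with constants dropped, so the numbers are term counts rather than measured costs. The settings are assumptions: d = 64 follows the model and core dimensions in Table 3, C = 6 corresponds to wind power plus five meteorological channels, and H = 1 assumes that H in Table 1 is the forecast horizon in steps.

```python
def transformer_core_terms(L, d, H):
    """Core term of the Transformer in Table 1: L^2 * d + H * L * d."""
    return L * L * d + H * L * d

def softs_core_terms(C, d):
    """Core term of SOFTS in Table 1: C * d^2 (independent of L)."""
    return C * d * d

d, C, H = 64, 6, 1  # assumed settings, see the lead-in above

for L in (24, 48, 96):
    t = transformer_core_terms(L, d, H)
    s = softs_core_terms(C, d)
    print(f"L={L:3d}  Transformer={t:7d}  SOFTS={s:6d}  ratio={t / s:.2f}")
```

The measured training times in Table 5 grow more slowly than these term counts, which is expected because wall-clock time also includes costs that do not depend on L.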

4.2. Comparative Experiments of Different Decomposition Forecasting Methods

To evaluate the effectiveness of MVMD on wind power forecasting, we compared three distinct signal decomposition methods: complete EEMD with adaptive noise (CEEMDAN) [30], VMD [27], and MVMD [33]. CEEMDAN has been widely applied in wind power forecasting due to its strong capability in handling nonlinear and non-stationary signals. Therefore, in this study, CEEMDAN is applied to the wind power sequence alone to reflect its performance within a typical univariate modeling framework. In contrast, VMD decomposes the wind power sequence and each meteorological variable independently, whereas MVMD performs joint decomposition of all channel data—including wind power and meteorological variables—within a unified framework. This experimental setup facilitates a step-by-step comparison, progressing from traditional univariate decomposition (CEEMDAN), to independent multivariate decomposition (VMD), and ultimately to joint multivariate decomposition (MVMD), thereby demonstrating the superior capabilities of integrated multivariate signal decomposition.
In both the VMD and MVMD methods, the number of modes K significantly affects the decomposition quality and model performance. When K is set too low, the modes are too few to capture the main components of the signal, which reduces the prediction accuracy; conversely, an excessively high K produces redundant modes and increases the computational burden. Given the non-stationary, nonlinear, and uncertain characteristics of wind power data, setting K too low is inadvisable. Through repeated experiments with K values ranging from 6 to 10, K = 8 was selected for both VMD and MVMD to balance predictive accuracy and computational efficiency. Table 6 lists the central frequencies of the IMFs obtained by VMD and MVMD in the spring and summer datasets; the central frequency increases with the IMF index. In VMD, the separate decomposition of wind power and meteorological variables (e.g., wind speed, temperature) results in inconsistent IMF central frequencies across variables; for example, in the spring dataset, IMF1 exhibits distinct frequencies for wind power and the associated meteorological signals, reflecting a limited capability to model multivariate interdependencies. By contrast, MVMD maintains uniform IMF central frequencies across variables in both the spring and summer datasets: by processing the correlated signals simultaneously, MVMD ensures that corresponding IMFs (e.g., IMF1, IMF2) for wind power, wind speed, and temperature share consistent frequencies, preserving inter-variable correlations. This coherence makes the MVMD components better-aligned input features for predictive modeling than those produced by VMD's separate per-variable decomposition.
Finally, Figure 5 presents the MVMD decomposition results of the wind power sequence and the five meteorological data series. Taking the wind power series as an example, IMF1 has the lowest central frequency and captures the trend of the sequence; IMF2, with the second-lowest central frequency, reflects its periodic characteristics; and IMF3 to IMF8, with progressively higher central frequencies, represent the short-term fluctuations.
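Central frequencies such as those in Table 6 can be approximated for any decomposed mode as the power-weighted mean frequency of its spectrum. The sketch below is an illustrative estimator written with NumPy; it is not the MVMD solver, and the function names and synthetic modes are assumptions. Frequencies are in cycles per sample when `fs=1.0`.

```python
import numpy as np

def center_frequency(x, fs=1.0):
    """Power-weighted mean frequency of a real-valued mode."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return float(np.sum(freqs * power) / np.sum(power))

def order_modes(modes, fs=1.0):
    """Sort modes of shape (K, N) by ascending center frequency."""
    modes = np.asarray(modes, dtype=float)
    cf = np.array([center_frequency(m, fs) for m in modes])
    idx = np.argsort(cf)
    return modes[idx], cf[idx]

# Synthetic stand-ins for two modes (not real IMFs from the paper).
n = np.arange(1000)
slow = np.sin(2 * np.pi * 0.01 * n)
fast = np.sin(2 * np.pi * 0.05 * n)
sorted_modes, sorted_cf = order_modes([fast, slow])
print(sorted_cf)
```

Applying the same estimator to the k-th mode of every channel gives a quick check of frequency alignment: after a joint decomposition the estimates should nearly coincide across channels, whereas after separate per-variable decompositions they generally differ.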
Subsequently, to verify the superiority of the MVMD method, we employed SOFTS as the forecasting model to establish three hybrid models: CEEMDAN-SOFTS, VMD-SOFTS, and MVMD-SOFTS. The forecasting performance of these three hybrid models for wind power is presented in Table 7. The CEEMDAN-SOFTS model outperformed the VMD-SOFTS model across most datasets. For instance, in the autumn and winter datasets, the CEEMDAN-SOFTS model achieved higher $R^2$ values than the VMD-SOFTS model, with MAE and RMSE reductions of 0.3415 MW, 0.5771 MW, and 0.6359 MW, 0.9247 MW, respectively. This advantage stems from CEEMDAN’s adaptive noise mechanism, which efficiently isolates major components without complex parameter tuning, enabling the predictive model to capture fluctuations and trends in wind power more effectively. Among the three models, the MVMD-SOFTS model consistently exhibited the highest forecasting accuracy across all four seasonal datasets: its $R^2$ values were closest to 1, while its MAE, RMSE, and MAPE were the lowest. Compared to the CEEMDAN-SOFTS model, the MVMD-SOFTS model showed MAE and RMSE reductions of 0.3361 MW and 0.9170 MW; compared to the VMD-SOFTS model, these reductions were 0.4384 MW and 1.3730 MW. To provide a clearer comparison of single-point forecasting errors, Figure 6, Figure 7, Figure 8 and Figure 9 depict the wind power forecasting curves and the absolute error curves obtained by the three hybrid models. Specifically, the box plots in Figure 6d, Figure 7d, Figure 8d and Figure 9d show that the absolute errors of the MVMD-SOFTS model have smaller outliers than those of the other two models, indicating greater robustness.
These results indicate that, compared with CEEMDAN and VMD, the MVMD decomposition method’s superior multivariate processing capability, decomposition stability, noise resistance, and accuracy in capturing low-frequency trends better meet the complex signal requirements of wind power forecasting.
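The decompose, forecast, and reconstruct workflow of the hybrid models can be sketched as follows. This is a simplified illustration under two stated assumptions: the final forecast is taken as the sum of the per-IMF wind power forecasts, and a naive persistence rule stands in for the trained SOFTS model. The names `persistence_forecaster` and `hybrid_one_step` are hypothetical, and the decomposition itself is not implemented here.

```python
import numpy as np

def persistence_forecaster(window):
    """Hypothetical stand-in for a trained SOFTS model.

    window: array of shape (C, lookback) holding one IMF of every channel.
    Returns the last wind power value (channel 0) as the next-step forecast.
    """
    return float(window[0, -1])

def hybrid_one_step(modes, forecaster, lookback=24):
    """One-step wind power forecast from jointly decomposed channels.

    modes: array of shape (K, C, N), i.e. K IMFs for each of C channels.
    Each IMF is forecast separately and the K forecasts are summed
    (assumed reconstruction rule).
    """
    modes = np.asarray(modes, dtype=float)
    if modes.ndim != 3:
        raise ValueError("modes must have shape (K, C, N)")
    per_imf = [forecaster(imf[:, -lookback:]) for imf in modes]
    return float(np.sum(per_imf))

# Toy input: K = 2 modes, C = 2 channels, N = 30 samples.
toy = np.zeros((2, 2, 30))
toy[0, 0, :] = 5.0                  # slow "trend" mode of wind power
toy[1, 0, :] = 0.1 * np.arange(30)  # second mode of wind power
print(hybrid_one_step(toy, persistence_forecaster))
```

Passing all C channels of the same IMF index to the forecaster is what lets a multivariate model exploit frequency-aligned meteorological inputs; with a univariate decomposition only the wind power channel would be available per mode.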

4.3. Comparative Experiments of Multi-Step Forecasting

In practical applications, wind power forecasting is not limited to single-step prediction; multi-step forecasting is of equal importance. In this section, multi-step forecasting experiments are conducted with forecast horizons set as 2-step ahead prediction (30 min), 3-step ahead prediction (45 min), and 4-step ahead prediction (60 min). A total of nine models are selected for comparative experiments: CNN, TCN, LSTM, GRU, Transformer, SOFTS, CEEMDAN-SOFTS, VMD-SOFTS, and the proposed MVMD-SOFTS. The forecasting performances of these nine models across the seasonal datasets at different time steps are presented in Table 8 and Table 9. As shown in Table 8 and Table 9, the main conclusions drawn from the multi-step ahead forecasting experiment are as follows:
(1) Multi-step forecasting poses significant challenges compared to single-step forecasting, primarily due to the cumulative error that tends to increase with each additional prediction step in most models. Compared to other single forecasting models, the SOFTS model demonstrated superior performance in both single-step and multi-step forecasting tasks. In experiments conducted on the seasonal datasets, the SOFTS model achieved $R^2$ values closer to 1 and had the lowest MAE, RMSE, and MAPE values among all the single forecasting models. Therefore, employing the SOFTS model as a baseline forecasting method is conducive to enhancing the accuracy and robustness of subsequent experiments.
(2) Compared to single forecasting models, hybrid models based on signal decomposition algorithms exhibit superior performance in multi-step forecasting tasks. Across all seasonal datasets, the signal decomposition-based hybrid models generally outperform single forecasting models in terms of MAE, RMSE, and MAPE metrics in multi-step forecasting experiments. Single forecasting models struggle to effectively capture the complex dynamic behavior of wind power sequences during multi-step forecasting due to the inherent volatility and uncertainty of wind power data. The hybrid signal decomposition algorithm addresses this issue by decomposing the original wind power signal into multiple sub-sequences with improved stationarity and specific frequency characteristics, making each sub-sequence easier to model. This approach reduces the burden of complexity and noise handling for each sub-model, thereby significantly enhancing the overall stability and robustness of the prediction.
(3) Compared to VMD-SOFTS and CEEMDAN-SOFTS, MVMD-SOFTS demonstrates significant advantages in multi-step forecasting. Taking the spring dataset as an example, in the two-step ahead forecasting, the $R^2$ value of MVMD-SOFTS is higher than that of the other two models, with the MAE and RMSE reduced by 1.7369 MW and 0.6250 MW, as well as 3.2509 MW and 1.8042 MW, respectively. Meanwhile, the MAPE is reduced by 1.9360% and 0.6966%, respectively. MVMD is capable of jointly decomposing multiple input variables, effectively suppressing noise and filtering out irrelevant information. This multivariate signal decomposition approach facilitates better extraction of intrinsic correlations between features, significantly improving the stationarity and distinctiveness of the decomposed sub-sequences, thereby enhancing the training effectiveness and prediction accuracy of the subsequent forecasting model.
(4) Overall, MVMD-SOFTS demonstrated superior experimental results in both single-step and multi-step ahead forecasting, achieving the best error metrics across all datasets. Taking the four-step ahead forecasting as an example, the average MAE and RMSE values of MVMD-SOFTS across the four datasets were 3.9387 MW and 5.7280 MW, respectively, while the average MAPE was 4.5273%. Compared to other models, MVMD-SOFTS exhibited better prediction accuracy, with the lowest values for all error metrics, indicating its higher accuracy and robustness in capturing wind power fluctuation trends and addressing random variations in the data.
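The multi-step setting can be made concrete by showing how training samples might be built for a given forecast horizon. The sketch below assumes a direct strategy, in which each sample pairs the previous 24 observations of all channels with the wind power value h steps after the last input; the article does not state which multi-step strategy was used, and the function name `make_windows` is illustrative. With the 15 min resolution implied by the horizons above, h = 2, 3, 4 corresponds to 30, 45, and 60 min.

```python
import numpy as np

def make_windows(series, lookback=24, horizon=1, target=0):
    """Build (input window, h-step-ahead target) pairs.

    series: array of shape (N, C) with one column per channel.
    Returns X of shape (S, lookback, C) and y of shape (S,), where y[i]
    is the target channel `horizon` steps after the end of X[i].
    """
    series = np.asarray(series, dtype=float)
    n = series.shape[0]
    n_samples = n - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested lookback and horizon")
    X = np.stack([series[i:i + lookback] for i in range(n_samples)])
    first = lookback + horizon - 1
    y = series[first:first + n_samples, target]
    return X, y

# Toy single-channel series 0, 1, ..., 39 (illustrative only).
demo = np.arange(40, dtype=float).reshape(-1, 1)
for h in (2, 3, 4):
    X, y = make_windows(demo, lookback=24, horizon=h)
    print(h, X.shape, y[0], y[-1])
```

Longer horizons leave fewer usable samples and a larger gap between the last input and the target, which is one reason errors typically grow with the horizon.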

5. Conclusions

In this paper, we propose a novel hybrid model for wind power forecasting, the MVMD-SOFTS model, which is the first application of the SOFTS model in wind power forecasting. The model is evaluated using real-world data from the Guohua Jingxia North Wind Farm and is compared with several commonly used benchmark models through three sets of experiments: single-model comparison, hybrid-model comparison, and multi-step ahead prediction. The performance is analyzed using four error evaluation metrics: $R^2$, MAE, RMSE, and MAPE. The following conclusions are drawn:
(1) The MVMD method overcomes the limitations of traditional single-variable decomposition techniques (e.g., VMD, CEEMDAN) by effectively capturing the complex multivariate coupling relationships hidden in wind-power time-series data, such as the dynamic interactions between wind speed, wind direction, and power output, thereby enhancing the quality of the input data and ultimately improving the forecasting accuracy.
(2) By replacing the Transformer’s self-attention mechanism with the STAR module, the SOFTS architecture makes its core computation independent of the sequence length: as listed in Table 1, it grows as $O(Cd^2)$ with the number of input channels C rather than quadratically with the sequence length L. This channel-oriented scaling removes the quadratic cost of self-attention, making SOFTS markedly faster to train than a standard Transformer.
(3) In the multi-step ahead prediction experiment, the MVMD-SOFTS model demonstrates superior performance compared to all other models. By combining effective data decomposition with advanced time-series modeling, it maintains high forecasting accuracy over multiple time steps, making it well suited to short-term and real-time forecasting applications in power grid operations.
(4) It should be noted that the key MVMD hyperparameters (e.g., the number of modes K and the regularization factor $\alpha$), as well as the convergence behavior and termination criteria of the ADMM solver, have been systematically discussed in the literature (see [33], and the engineering settings and application practices in [34,35,36]). In prior engineering studies, the reported K values typically vary with the spectral characteristics of the signals and task requirements, generally ranging from approximately 5 to 11. Due to space limitations, we do not provide an exhaustive comparison across different K settings in the main text; instead, we focus on the proposed model architecture and its end-to-end forecasting performance. Future work will conduct dedicated sensitivity and stability analyses with respect to K and $\alpha$ and further evaluate the ADMM iteration cost and its impact on overall efficiency.

Author Contributions

Conceptualization, Z.L. and W.L. (Wentian Lu); methodology, Z.L.; software, Z.L.; validation, Z.L., W.L. (Wentian Lu) and W.L. (Wenjie Liu); formal analysis, Z.L.; investigation, Z.L. and Y.C.; resources, W.L. (Wentian Lu); data curation, Z.L. and W.L. (Wenjie Liu); writing—original draft preparation, Z.L.; writing—review and editing, W.L. (Wentian Lu) and W.L. (Wenjie Liu); visualization, Z.L. and Y.C.; supervision, W.L. (Wentian Lu); project administration, W.L. (Wentian Lu). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant 52107093 and in part by the Basic and Applied Basic Research Fund of Guangdong Province under Grant 2022A1515240038.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets analyzed in this study are proprietary unpublished data of our research team and are currently reserved for follow-up research projects. For academic research purposes that comply with ethical and academic norms, interested researchers may contact the corresponding author (email: lwj1993@gzhu.edu.cn) to negotiate reasonable access to the data after the completion of the follow-up research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ANN: Artificial Neural Network
CEEMDAN: Complete Ensemble Empirical Mode Decomposition with Adaptive Noise
CNN: Convolutional Neural Network
DNN: Deep Neural Network
EMD: Empirical Mode Decomposition
EEMD: Ensemble Empirical Mode Decomposition
GAN: Generative Adversarial Network
GRU: Gated Recurrent Unit
LSTM: Long Short-Term Memory
MAE: Mean Absolute Error
MAPE: Mean Absolute Percentage Error
MEMD: Multivariate Empirical Mode Decomposition
MVMD: Multivariate Variational Mode Decomposition
$R^2$: Coefficient of Determination
RMSE: Root Mean Squared Error
SCADA: Supervisory Control and Data Acquisition
SVM: Support Vector Machine
SOFTS: Series-Core Fused Time Series Forecaster
TCN: Temporal Convolutional Network
VMD: Variational Mode Decomposition
WD: Wavelet Decomposition

References

  1. Li, J.D.; Chen, S.J.; Wu, Y.Q.; Wang, Q.H.; Liu, X.; Qi, L.J.; Lu, X.Y.; Gao, L. How to make better use of intermittent and variable energy? A review of wind and photovoltaic power consumption in China. Renew. Sustain. Energy Rev. 2021, 137, 110626.
  2. Wang, Y.; Zou, R.M.; Liu, F.; Zhang, L.J.; Liu, Q.Y. A review of wind speed and wind power forecasting with deep neural networks. Appl. Energy 2021, 304, 117766.
  3. Liu, L.; Liu, J.C.; Ye, Y.; Liu, H.; Chen, K.; Li, D.; Dong, X.; Sun, M.Z. Ultra-short-term wind power forecasting based on deep Bayesian model with uncertainty. Renew. Energy 2023, 205, 598–607.
  4. Lu, P.; Ye, L.; Zhao, Y.N.; Dai, B.H.; Pei, M.; Tang, Y. Review of meta-heuristic algorithms for wind power prediction: Methodologies, applications and challenges. Appl. Energy 2021, 301, 117446.
  5. Chang, Y.; Yang, H.; Chen, Y.X.; Zhou, M.R.; Yang, H.B.; Wang, Y.; Zhang, Y.R. A Hybrid Model for Long-Term Wind Power Forecasting Utilizing NWP Subsequence Correction and Multi-Scale Deep Learning Regression Methods. IEEE Trans. Sustain. Energy 2023, 15, 263–275.
  6. Jin, J.L.; Wen, Q.L.; Zhao, L.Y.; Zhou, C.Y.; Guo, X.J. Measuring environmental performance of power dispatch influenced by low-carbon approaches. Renew. Energy 2023, 209, 325–339.
  7. Capelletti, M.; Raimondo, D.M.; De Nicolao, G. Wind power curve modeling: A probabilistic Beta regression approach. Renew. Energy 2024, 223, 119970.
  8. Monjazeb, M.R.; Amiri, H.; Movahedi, A. Wholesale electricity price forecasting by Quantile Regression and Kalman Filter method. Energy 2024, 290, 129925.
  9. Chen, H. A novel wind model downscaling with statistical regression and forecast for the cleaner energy. J. Clean. Prod. 2024, 434, 140217.
  10. Ikram, R.M.A.; Ewees, A.A.; Parmar, K.S.; Yaseen, Z.M.; Shahid, S.; Kisi, O. The viability of extended marine predators algorithm-based artificial neural networks for streamflow prediction. Appl. Soft Comput. 2022, 131, 109739.
  11. Liu, Y.Q.; Sun, Y.; Infield, D.; Zhao, Y.; Han, S.; Yan, J. A hybrid forecasting method for wind power ramp based on orthogonal test and support vector machine (OT-SVM). IEEE Trans. Sustain. Energy 2016, 8, 451–457.
  12. Ng, K.W.; Huang, Y.F.; Koo, C.H.; Chong, K.L.; El-Shafie, A.; Ahmed, A.N. A review of hybrid deep learning applications for streamflow forecasting. J. Hydrol. 2023, 625, 130141.
  13. Guo, Z.H.; Zhao, W.G.; Lu, H.Y.; Wang, J.Z. Multi-step forecasting for wind speed using a modified EMD-based artificial neural network model. Renew. Energy 2012, 37, 241–249.
  14. Liu, H.; Tian, H.Q.; Li, Y.F.; Zhang, L. Comparison of four Adaboost algorithm based artificial neural networks in wind speed predictions. Energy Convers. Manag. 2015, 92, 67–81.
  15. Wang, L.; Zeng, Y.; Chen, T. Back propagation neural network with adaptive differential evolution algorithm for time series forecasting. Expert Syst. Appl. 2015, 42, 855–863.
  16. Deng, W.X.; Zhou, H.; Zhou, J.; Yao, J.Y. Neural Network-Based Adaptive Asymptotic Prescribed Performance Tracking Control of Hydraulic Manipulators. IEEE Trans. Syst. Man Cybern. Syst. 2023, 53, 285–296.
  17. Khodayar, M.; Kaynak, O.; Khodayar, M.E. Rough deep neural architecture for short-term wind speed forecasting. IEEE Trans. Ind. Inform. 2017, 13, 2770–2779.
  18. Hong, Y.Y.; Satriani, T.R.A. Day-ahead spatiotemporal wind speed forecasting using robust design-based deep learning neural network. Energy 2020, 209, 118441.
  19. Shi, Z.C.; Liang, H.; Dinavahi, V. Direct interval forecast of uncertain wind power based on recurrent neural networks. IEEE Trans. Sustain. Energy 2017, 9, 1177–1187.
  20. Ewees, A.A.; Al-Qaness, M.A.A.; Abualigah, L.; Abd Elaziz, M. HBO-LSTM: Optimized long short term memory with heap-based optimizer for wind power forecasting. Energy Convers. Manag. 2022, 268, 116022.
  21. Fantini, D.G.; Silva, R.N.; Siqueira, M.B.B.; Pinto, M.S.S.; Guimarães, M.; Junior, A.B. Wind speed short-term prediction using recurrent neural network GRU model and stationary wavelet transform GRU hybrid model. Energy Convers. Manag. 2024, 308, 118333.
  22. Meka, R.; Alaeddini, A.; Bhaganagar, K. A robust deep learning framework for short-term wind power forecast of a full-scale wind farm using atmospheric variables. Energy 2021, 221, 119759.
  23. Zhou, B.; Duan, H.R.; Wu, Q.W.; Wang, H.Z.; Or, S.W.; Chan, K.W.; Meng, Y.F. Short-term prediction of wind power and its ramp events based on semi-supervised generative adversarial network. Int. J. Electr. Power Energy Syst. 2021, 125, 106411.
  24. Nascimento, E.G.S.; de Melo, T.A.C.; Moreira, D.M. A transformer-based deep neural network with wavelet transform for forecasting wind speed and wind energy. Energy 2023, 278, 127678.
  25. Ahmadi, M.; Khashei, M. Current status of hybrid structures in wind forecasting. Eng. Appl. Artif. Intell. 2021, 99, 104133.
  26. Yu, C.J.; Li, Y.L.; Chen, Q.; Lai, X.P.; Zhao, L.Y. Matrix-based wavelet transformation embedded in recurrent neural networks for wind speed prediction. Appl. Energy 2022, 324, 119692.
  27. Jiang, W.J.; Liu, B.; Liang, Y.; Gao, H.X.; Lin, P.F.; Zhang, D.Q.; Hu, G. Applicability analysis of transformer to wind speed forecasting by a novel deep learning framework with multiple atmospheric variables. Appl. Energy 2024, 353, 122155.
  28. Li, N.; Dong, J.; Liu, L.Y.; Li, H.; Yan, J. A novel EMD and causal convolutional network integrated with Transformer for ultra short-term wind power forecasting. Int. J. Electr. Power Energy Syst. 2023, 154, 109470.
  29. He, Y.Y.; Wang, Y. Short-term wind power prediction based on EEMD–LASSO–QRNN model. Appl. Soft Comput. 2021, 105, 107288.
  30. Karijadi, I.; Chou, S.Y.; Dewabharata, A. Wind power forecasting based on hybrid CEEMDAN-EWT deep learning method. Renew. Energy 2023, 218, 119357.
  31. Bisoi, R.; Dash, P.K.; Parida, A.K. Hybrid variational mode decomposition and evolutionary robust kernel extreme learning machine for stock price and movement prediction on daily basis. Appl. Soft Comput. 2019, 74, 652–678.
  32. Rehman, N.; Mandic, D.P. Multivariate empirical mode decomposition. Proc. R. Soc. A 2010, 466, 1291–1302.
  33. ur Rehman, N.; Aftab, H. Multivariate variational mode decomposition. IEEE Trans. Signal Process. 2019, 67, 6039–6052.
  34. Gupta, P.; Singh, R. Combining a deep learning model with multivariate empirical mode decomposition for hourly global horizontal irradiance forecasting. Renew. Energy 2023, 206, 908–927.
  35. Fang, J.J.; Yang, L.S.; Wen, X.H.; Yu, H.J.; Li, W.D.; Adamowski, J.F.; Barzegar, R. Ensemble learning using multivariate variational mode decomposition based on the Transformer for multi-step-ahead streamflow forecasting. J. Hydrol. 2024, 636, 131275.
  36. Yang, T.; Yang, Z.N.; Li, F.; Wang, H.Y. A short-term wind power forecasting method based on multivariate signal decomposition and variable selection. Appl. Energy 2024, 360, 122759.
  37. Han, L.; Chen, X.Y.; Ye, H.J.; Zhan, D.C. SOFTS: Efficient Multivariate Time Series Forecasting with Series-Core Fusion. arXiv 2024, arXiv:2404.14197.
  38. Zeiler, M.D.; Fergus, R. Stochastic Pooling for Regularization of Deep Convolutional Neural Networks. arXiv 2013, arXiv:1301.3557.
Figure 1. The architecture of the SOFTS model.
Figure 2. The working principle of the STAR module.
Figure 3. The architecture of the MVMD-SOFTS model.
Figure 4. The error metrics of the SOFTS model compared to other models across the four seasonal datasets: (a) $R^2$; (b) MAE; (c) RMSE; (d) MAPE.
Figure 5. The MVMD decomposition results for the four datasets: (a) wind power; (b) wind speed; (c) wind direction; (d) temperature; (e) air pressure; and (f) humidity.
Figure 6. The forecasting results of three signal decomposition strategies combined with the SOFTS model on the spring data: (a) wind power forecasting results; (b) wind power forecasting scatter plot; (c) wind power absolute error plot; (d) wind power boxplot of absolute errors.
Figure 7. The forecasting results of three signal decomposition strategies combined with the SOFTS model on the summer data: (a) wind power forecasting results; (b) wind power forecasting scatter plot; (c) wind power absolute error plot; (d) wind power boxplot of absolute errors.
Figure 8. The forecasting results of three signal decomposition strategies combined with the SOFTS model on the autumn data: (a) wind power forecasting results; (b) wind power forecasting scatter plot; (c) wind power absolute error plot; (d) wind power boxplot of absolute errors.
Figure 9. The forecasting results of three signal decomposition strategies combined with the SOFTS model on the winter data: (a) wind power forecasting results; (b) wind power forecasting scatter plot; (c) wind power absolute error plot; (d) wind power boxplot of absolute errors.
Table 1. Comparative analysis of model computational complexity and parallelization capability.
| Model | Exact Complexity | Core Computation | Parallelization |
| --- | --- | --- | --- |
| LSTM | $O(LdC + Ld^2 + CdH)$ | $O(Ld^2)$ | Low (sequential) |
| Transformer | $O(CLd + L^2d + HLd + CdH)$ | $O(L^2d + HLd)$ | Medium (partial) |
| SOFTS | $O(CLd + Cd^2 + CdH)$ | $O(Cd^2)$ | High (fully parallel) |
Table 2. The detailed description of wind power-related datasets.
| Dataset | Variable | Size | Max | Min | Mean | Std | Skewness | Kurtosis |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Spring dataset | Wind power (MW) | 8832 | 200.01 | 0.03 | 91.35 | 69.33 | 0.16 | −1.43 |
| | Wind speed (m/s) | | 22.76 | 0.32 | 8.76 | 4.71 | 0.33 | −0.68 |
| | Wind direction (°) | | 355.56 | 3.99 | 144.01 | 85.78 | 0.65 | −1.15 |
| | Temperature (°C) | | 32.34 | −5.04 | 13.43 | 7.58 | −0.15 | −0.57 |
| | Air pressure (hPa) | | 878.62 | 855.47 | 868.17 | 3.99 | −0.33 | −0.22 |
| | Humidity (%) | | 94.53 | 4.83 | 26.32 | 15.54 | 1.45 | 2.18 |
| Summer dataset | Wind power (MW) | 8832 | 199.31 | 0.07 | 68.35 | 57.08 | 0.38 | −1.14 |
| | Wind speed (m/s) | | 20.60 | 0.32 | 7.62 | 4.23 | 0.40 | −0.64 |
| | Wind direction (°) | | 354.43 | 0 | 146.66 | 88.66 | 0.62 | −1.14 |
| | Temperature (°C) | | 40.13 | 13.75 | 26.71 | 5.04 | 0.17 | −0.69 |
| | Air pressure (hPa) | | 875.04 | 856.89 | 865.95 | 2.76 | 0.29 | 0.43 |
| | Humidity (%) | | 93.04 | 7.63 | 28.28 | 15.04 | 1.45 | 2.32 |
| Autumn dataset | Wind power (MW) | 8736 | 200.00 | 0.04 | 65.55 | 62.99 | 0.58 | −1.03 |
| | Wind speed (m/s) | | 21.62 | 0.32 | 7.35 | 4.70 | 0.57 | −0.62 |
| | Wind direction (°) | | 355.56 | 0 | 160.34 | 85.16 | 0.20 | −1.55 |
| | Temperature (°C) | | 36.01 | −14.47 | 11.55 | 10.65 | −0.03 | −0.77 |
| | Air pressure (hPa) | | 882.59 | 859.96 | 871.58 | 4.33 | −0.34 | −0.47 |
| | Humidity (%) | | 95.24 | 9.50 | 35.23 | 16.72 | 1.01 | 0.73 |
| Winter dataset | Wind power (MW) | 8640 | 200.05 | 0.03 | 55.59 | 68.64 | 1.05 | −0.38 |
| | Wind speed (m/s) | | 19.78 | 0.32 | 6.06 | 4.58 | 0.88 | −0.10 |
| | Wind direction (°) | | 355.61 | 0 | 177.63 | 87.71 | −0.16 | −1.58 |
| | Temperature (°C) | | 9.96 | −18.66 | −6.78 | 4.94 | 0.40 | 0.06 |
| | Air pressure (hPa) | | 882.02 | 857.17 | 869.86 | 4.73 | −0.06 | −0.39 |
| | Humidity (%) | | 95.27 | 11.97 | 58.60 | 15.33 | −0.18 | −0.22 |
Table 3. The parameter settings of the prediction models.
| Models | Parameters |
| --- | --- |
| CNN | Number of layers: 1; kernel size: 1; filters: 32; activation function: ReLU. |
| TCN | Number of layers: 1; kernel size: 1; filters: 32; activation function: ReLU. |
| LSTM | Number of layers: 2; hidden sizes: [128, 64]; activation function: ReLU. |
| GRU | Number of layers: 2; hidden sizes: [128, 64]; activation function: ReLU. |
| Transformer | Encoder layers: 2; heads: 4; model dimension: 64; feedforward dimension: 128. |
| SOFTS | STAR blocks: 2; core dimension: 64; feedforward dimension: 128; activation function: GeLU. |
Table 4. Comparison of single forecasting models on wind power datasets (L = 24).
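| Dataset | Model | R² | MAE (MW) | RMSE (MW) | MAPE (%) |
|---|---|---|---|---|---|
| Spring dataset | CNN | 0.8570 | 11.3172 | 19.6229 | 12.7090 |
| | TCN | 0.9432 | 8.3824 | 12.3682 | 9.4133 |
| | LSTM | 0.9600 | 6.1674 | 10.3745 | 6.9259 |
| | GRU | 0.9602 | 6.2534 | 10.3540 | 7.0224 |
| | Transformer | 0.9627 | 5.4812 | 10.0261 | 6.1553 |
| | SOFTS | 0.9648 | 5.1321 | 9.7475 | 5.7475 |
| Summer dataset | CNN | 0.8648 | 13.3190 | 19.1517 | 20.1789 |
| | TCN | 0.9635 | 6.7929 | 9.9548 | 10.2916 |
| | LSTM | 0.9687 | 6.0819 | 8.7648 | 9.2144 |
| | GRU | 0.9714 | 5.7387 | 9.2089 | 8.6943 |
| | Transformer | 0.9720 | 5.7036 | 8.7122 | 8.6412 |
| | SOFTS | 0.9727 | 5.2262 | 8.6153 | 7.8813 |
| Autumn dataset | CNN | 0.9323 | 12.1398 | 18.6317 | 12.2910 |
| | TCN | 0.9689 | 9.0879 | 12.6255 | 9.2012 |
| | LSTM | 0.9784 | 6.0780 | 10.5238 | 6.1538 |
| | GRU | 0.9831 | 5.7694 | 9.3113 | 5.8413 |
| | Transformer | 0.9827 | 6.5001 | 9.4693 | 6.6582 |
| | SOFTS | 0.9864 | 4.6673 | 8.3294 | 4.7435 |
| Winter dataset | CNN | 0.8900 | 17.7266 | 23.3937 | 17.0917 |
| | TCN | 0.9651 | 9.7085 | 13.8147 | 9.3608 |
| | LSTM | 0.9690 | 7.5356 | 12.4233 | 7.2657 |
| | GRU | 0.9765 | 6.8146 | 10.8050 | 6.5705 |
| | Transformer | 0.9786 | 6.0951 | 10.3114 | 5.8786 |
| | SOFTS | 0.9810 | 5.4381 | 9.7437 | 5.2346 |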
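For readers who want to compute the four evaluation metrics of Table 4 on their own forecasts, the sketch below uses the standard textbook definitions of R², MAE, RMSE and MAPE. It is an assumption that these match the definitions used for the table; in particular, the plain MAPE shown here divides by the observed value and is undefined when an observation is zero, so a different normalization may have been applied to the wind power data. The function name `evaluate` is hypothetical.

```python
import math


def evaluate(y_true, y_pred):
    """Compute R^2, MAE, RMSE and MAPE (%) using textbook definitions.

    Assumption: MAPE divides by the observed value, so y_true must not
    contain zeros.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "MAPE": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


# Toy values for illustration only (not data from the paper).
metrics = evaluate([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
print(metrics)
```

Because RMSE squares the errors before averaging, it penalizes large individual misses more heavily than MAE, which is why the two metrics can rank models differently.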
Table 5. Training time vs. input length.
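| Dataset | Model | Training time (s), L = 24 | Training time (s), L = 48 | Training time (s), L = 96 |
|---|---|---|---|---|
| Spring dataset | Transformer | 87.82 | 110.35 | 180.75 |
| | SOFTS | 66.13 | 67.82 | 68.33 |
| Summer dataset | Transformer | 85.31 | 110.81 | 179.45 |
| | SOFTS | 57.41 | 61.67 | 59.40 |
| Autumn dataset | Transformer | 84.50 | 111.13 | 176.55 |
| | SOFTS | 56.47 | 60.41 | 61.11 |
| Winter dataset | Transformer | 87.99 | 111.06 | 179.82 |
| | SOFTS | 60.15 | 62.44 | 62.03 |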
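The trend in Table 5 is easier to see as a growth factor. The short sketch below divides each model's training time at L = 96 by its time at L = 24, using only the values reported in the table; the variable names are hypothetical.

```python
# Training times in seconds from Table 5, ordered as (L = 24, L = 48, L = 96).
training_time = {
    "Spring": {"Transformer": (87.82, 110.35, 180.75), "SOFTS": (66.13, 67.82, 68.33)},
    "Summer": {"Transformer": (85.31, 110.81, 179.45), "SOFTS": (57.41, 61.67, 59.40)},
    "Autumn": {"Transformer": (84.50, 111.13, 176.55), "SOFTS": (56.47, 60.41, 61.11)},
    "Winter": {"Transformer": (87.99, 111.06, 179.82), "SOFTS": (60.15, 62.44, 62.03)},
}

# Growth factor: time at L = 96 divided by time at L = 24.
growth = {
    season: {model: times[2] / times[0] for model, times in models.items()}
    for season, models in training_time.items()
}

for season, ratios in growth.items():
    print(season, {model: round(r, 2) for model, r in ratios.items()})
```

On all four datasets the Transformer's training time slightly more than doubles when the input length is quadrupled, whereas the SOFTS time changes by less than 10%.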
Table 6. Central frequencies of IMFs decomposed by VMD and MVMD in seasonal datasets (Note: Autumn and winter results are omitted for brevity).
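Center frequency of the decomposed IMFs:

| Dataset | Method | Sequence | IMF1 | IMF2 | IMF3 | IMF4 | IMF5 | IMF6 | IMF7 | IMF8 |
|---|---|---|---|---|---|---|---|---|---|---|
| Spring dataset | VMD | Wind power (MW) | 0.0002 | 0.0087 | 0.0176 | 0.0279 | 0.0440 | 0.0718 | 0.1064 | 0.1524 |
| | | Wind speed (m/s) | 0.0001 | 0.0098 | 0.0220 | 0.0446 | 0.0843 | 0.1502 | 0.3327 | 0.4398 |
| | | Wind direction (°) | 0.0001 | 0.0100 | 0.0278 | 0.0574 | 0.1062 | 0.1830 | 0.2774 | 0.4583 |
| | | Temperature (°C) | 0.0000 | 0.0141 | 0.0301 | 0.0413 | 0.0534 | 0.0752 | 0.1096 | 0.1669 |
| | | Air pressure (hPa) | 0.0000 | 0.0064 | 0.0203 | 0.0295 | 0.0434 | 0.0712 | 0.1184 | 0.1796 |
| | | Humidity (%) | 0.0001 | 0.0097 | 0.0189 | 0.0306 | 0.0501 | 0.0759 | 0.1152 | 0.1853 |
| | MVMD | All sequences | 0.0000 | 0.0112 | 0.0312 | 0.0680 | 0.1239 | 0.2491 | 0.3768 | 0.4622 |
| Summer dataset | VMD | Wind power (MW) | 0.0002 | 0.0097 | 0.0196 | 0.0320 | 0.0497 | 0.0704 | 0.1091 | 0.1543 |
| | | Wind speed (m/s) | 0.0001 | 0.0096 | 0.0223 | 0.0450 | 0.0766 | 0.1267 | 0.2227 | 0.4004 |
| | | Wind direction (°) | 0.0001 | 0.0112 | 0.0310 | 0.0685 | 0.1190 | 0.1986 | 0.3077 | 0.4206 |
| | | Temperature (°C) | 0.0000 | 0.0105 | 0.0299 | 0.0479 | 0.0639 | 0.0865 | 0.1264 | 0.1685 |
| | | Air pressure (hPa) | 0.0000 | 0.0100 | 0.0208 | 0.0369 | 0.0632 | 0.1032 | 0.1511 | 0.3506 |
| | | Humidity (%) | 0.0001 | 0.0091 | 0.0135 | 0.0259 | 0.0429 | 0.0657 | 0.1019 | 0.1508 |
| | MVMD | All sequences | 0.0000 | 0.0118 | 0.0357 | 0.0803 | 0.1613 | 0.2496 | 0.3560 | 0.4482 |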
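The frequency misalignment that Table 6 documents for channel-wise VMD can be quantified directly from the tabulated values. The sketch below computes, for each IMF index in the spring dataset, the spread (maximum minus minimum) of the VMD center frequencies across the six variables; under MVMD all variables share one center frequency per IMF, so the corresponding spread is zero by construction. The variable names are hypothetical.

```python
# Spring-dataset VMD center frequencies from Table 6 (IMF1 to IMF8 per variable).
vmd_spring = {
    "Wind power":     [0.0002, 0.0087, 0.0176, 0.0279, 0.0440, 0.0718, 0.1064, 0.1524],
    "Wind speed":     [0.0001, 0.0098, 0.0220, 0.0446, 0.0843, 0.1502, 0.3327, 0.4398],
    "Wind direction": [0.0001, 0.0100, 0.0278, 0.0574, 0.1062, 0.1830, 0.2774, 0.4583],
    "Temperature":    [0.0000, 0.0141, 0.0301, 0.0413, 0.0534, 0.0752, 0.1096, 0.1669],
    "Air pressure":   [0.0000, 0.0064, 0.0203, 0.0295, 0.0434, 0.0712, 0.1184, 0.1796],
    "Humidity":       [0.0001, 0.0097, 0.0189, 0.0306, 0.0501, 0.0759, 0.1152, 0.1853],
}

# Shared MVMD center frequencies for the same dataset (one value per IMF).
mvmd_spring = [0.0000, 0.0112, 0.0312, 0.0680, 0.1239, 0.2491, 0.3768, 0.4622]

# zip(*...) regroups the rows by IMF index, giving one column per IMF.
spread = [max(col) - min(col) for col in zip(*vmd_spring.values())]

for k, (s, f) in enumerate(zip(spread, mvmd_spring), start=1):
    print(f"IMF{k}: VMD spread across variables = {s:.4f}, shared MVMD frequency = {f:.4f}")
```

The spread widens with the IMF index and is largest for IMF8, where the wind power mode sits at 0.1524 while the wind direction mode sits at 0.4583, so same-index VMD modes of different variables do not describe the same frequency band.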
Table 7. Comparison of hybrid forecasting models on wind power datasets (L = 24).
| Dataset | Model | R² | MAE (MW) | RMSE (MW) | MAPE (%) |
|---|---|---|---|---|---|
| Spring dataset | CEEMDAN-SOFTS | 0.9896 | 3.2279 | 5.2888 | 3.5968 |
| | VMD-SOFTS | 0.9877 | 3.3302 | 5.7448 | 3.7108 |
| | MVMD-SOFTS | 0.9929 | 2.8918 | 4.3718 | 3.2223 |
| Summer dataset | CEEMDAN-SOFTS | 0.9914 | 3.1817 | 4.8365 | 4.8260 |
| | VMD-SOFTS | 0.9925 | 3.0380 | 4.5030 | 4.6080 |
| | MVMD-SOFTS | 0.9927 | 3.2620 | 4.4365 | 4.9478 |
| Autumn dataset | CEEMDAN-SOFTS | 0.9961 | 2.7287 | 4.4725 | 2.7543 |
| | VMD-SOFTS | 0.9950 | 3.0702 | 5.0496 | 3.0989 |
| | MVMD-SOFTS | 0.9973 | 2.6514 | 3.7367 | 2.6762 |
| Winter dataset | CEEMDAN-SOFTS | 0.9934 | 3.5460 | 5.7395 | 3.4355 |
| | VMD-SOFTS | 0.9911 | 4.1819 | 6.6642 | 4.0516 |
| | MVMD-SOFTS | 0.9952 | 3.4919 | 4.8814 | 3.3831 |
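A relative error reduction makes the hybrid-model comparison in Table 7 easier to read. The sketch below computes the percentage reduction in MAE and RMSE of MVMD-SOFTS relative to VMD-SOFTS from the tabulated values; the helper name `relative_reduction` is hypothetical, and a negative result means the error increased.

```python
# MAE and RMSE (MW) from Table 7 for the two VMD-family hybrid models.
hybrid = {
    "Spring": {"VMD-SOFTS": {"MAE": 3.3302, "RMSE": 5.7448},
               "MVMD-SOFTS": {"MAE": 2.8918, "RMSE": 4.3718}},
    "Summer": {"VMD-SOFTS": {"MAE": 3.0380, "RMSE": 4.5030},
               "MVMD-SOFTS": {"MAE": 3.2620, "RMSE": 4.4365}},
    "Autumn": {"VMD-SOFTS": {"MAE": 3.0702, "RMSE": 5.0496},
               "MVMD-SOFTS": {"MAE": 2.6514, "RMSE": 3.7367}},
    "Winter": {"VMD-SOFTS": {"MAE": 4.1819, "RMSE": 6.6642},
               "MVMD-SOFTS": {"MAE": 3.4919, "RMSE": 4.8814}},
}


def relative_reduction(baseline, candidate):
    """Percentage by which `candidate` lowers the error of `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


reduction = {
    season: {
        metric: relative_reduction(models["VMD-SOFTS"][metric],
                                   models["MVMD-SOFTS"][metric])
        for metric in ("MAE", "RMSE")
    }
    for season, models in hybrid.items()
}

for season, values in reduction.items():
    print(season, {metric: round(v, 1) for metric, v in values.items()})
```

RMSE falls on every dataset, while the summer MAE is the one case in this comparison where MVMD-SOFTS is slightly worse than VMD-SOFTS.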
Table 8. Comparative experiments of multi-step forecasting for the spring dataset (L = 24).
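| Model | 2-Step R² | 2-Step MAE | 2-Step RMSE | 2-Step MAPE (%) | 3-Step R² | 3-Step MAE | 3-Step RMSE | 3-Step MAPE (%) | 4-Step R² | 4-Step MAE | 4-Step RMSE | 4-Step MAPE (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CNN | 0.7405 | 17.0644 | 26.4334 | 19.1631 | 0.6300 | 20.7190 | 30.5648 | 23.2671 | 0.5714 | 22.1235 | 33.9684 | 24.8443 |
| TCN | 0.8676 | 11.1425 | 18.8784 | 12.5129 | 0.7740 | 14.0880 | 24.6652 | 15.8205 | 0.7016 | 17.6033 | 28.3444 | 19.7602 |
| LSTM | 0.8709 | 10.4355 | 18.6450 | 11.7301 | 0.7686 | 14.2536 | 24.9618 | 16.0065 | 0.6838 | 17.5598 | 29.1786 | 19.7193 |
| GRU | 0.8793 | 9.9493 | 18.0261 | 11.1728 | 0.7841 | 14.2133 | 24.1097 | 15.9669 | 0.6952 | 16.6860 | 28.6454 | 18.7381 |
| Transformer | 0.8587 | 10.7625 | 19.5031 | 12.0861 | 0.7664 | 14.5267 | 25.0804 | 16.3132 | 0.6676 | 17.6698 | 29.9146 | 19.8429 |
| SOFTS | 0.8831 | 9.6120 | 17.7453 | 10.7142 | 0.7930 | 12.8176 | 23.6259 | 14.2914 | 0.7006 | 15.8282 | 28.4243 | 17.6313 |
| CEEMDAN-SOFTS | 0.9770 | 4.7624 | 7.8779 | 5.3085 | 0.9564 | 6.7900 | 10.8466 | 7.5079 | 0.9384 | 8.1839 | 12.9666 | 9.1265 |
| VMD-SOFTS | 0.9846 | 3.6505 | 6.4312 | 4.0691 | 0.9785 | 4.3768 | 7.6061 | 4.8801 | 0.9745 | 4.9761 | 8.2975 | 5.4982 |
| MVMD-SOFTS | 0.9921 | 3.0255 | 4.6270 | 3.3725 | 0.9907 | 3.2537 | 5.0183 | 3.6279 | 0.9871 | 3.8649 | 5.9108 | 4.3101 |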
Table 9. Comparative experiments of multi-step forecasting for the summer dataset (L = 24).
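| Model | 2-Step R² | 2-Step MAE | 2-Step RMSE | 2-Step MAPE (%) | 3-Step R² | 3-Step MAE | 3-Step RMSE | 3-Step MAPE (%) | 4-Step R² | 4-Step MAE | 4-Step RMSE | 4-Step MAPE (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CNN | 0.8091 | 14.4060 | 22.7560 | 21.8257 | 0.7053 | 19.3453 | 28.2768 | 29.3091 | 0.6125 | 21.7539 | 32.4525 | 32.9532 |
| TCN | 0.8997 | 10.5936 | 16.4916 | 16.0498 | 0.8150 | 14.3249 | 22.4014 | 21.7029 | 0.7235 | 17.7496 | 27.3864 | 26.8915 |
| LSTM | 0.8997 | 10.3511 | 16.4931 | 15.6884 | 0.7974 | 14.7626 | 23.4437 | 22.3661 | 0.7053 | 18.1328 | 28.2743 | 27.4720 |
| GRU | 0.9089 | 9.9358 | 15.7238 | 15.0519 | 0.8243 | 14.3810 | 21.8308 | 21.7878 | 0.7406 | 17.2930 | 26.5298 | 26.1997 |
| Transformer | 0.8978 | 10.6547 | 16.6473 | 16.1424 | 0.8179 | 14.1368 | 22.2290 | 21.4178 | 0.7149 | 17.6960 | 28.0309 | 26.1320 |
| SOFTS | 0.9145 | 9.5241 | 15.2423 | 14.4202 | 0.8416 | 12.9301 | 20.7475 | 19.5415 | 0.7353 | 16.0452 | 25.8832 | 24.2027 |
| CEEMDAN-SOFTS | 0.9849 | 4.2965 | 6.4028 | 6.5052 | 0.9749 | 5.8023 | 8.2575 | 8.7691 | 0.9607 | 7.3909 | 10.3364 | 11.1485 |
| VMD-SOFTS | 0.9902 | 3.4907 | 5.1507 | 5.2852 | 0.9870 | 4.0863 | 5.9344 | 6.1758 | 0.9854 | 4.5407 | 6.3092 | 6.8493 |
| MVMD-SOFTS | 0.9920 | 3.4080 | 4.6653 | 5.1600 | 0.9917 | 3.4630 | 4.7524 | 5.3379 | 0.9885 | 3.9501 | 5.6042 | 5.9584 |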
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Lu, W.; Lu, Z.; Liu, W.; Cao, Y. Series-Core Fusion Based Multivariate Variational Mode Decomposition for Short-Term Wind Power Prediction Using Multiple Meteorological Data. Forecasting 2026, 8, 15. https://doi.org/10.3390/forecast8010015

