Article

An Adaptive Multi-Scale Heterogeneous Ensemble Framework for Interpretable Wind Power Forecasting in Sustainable Grids

1 School of Mathematics and Statistics, Beijing Technology and Business University, Beijing 102488, China
2 School of Economics and Management, Shanghai Maritime University, Shanghai 201306, China
3 School of Mathematical Sciences, Capital Normal University, Beijing 100048, China
4 School of Chemical Engineering, Sichuan University, Chengdu 610065, China
* Author to whom correspondence should be addressed.
Symmetry 2026, 18(6), 921; https://doi.org/10.3390/sym18060921
Submission received: 29 April 2026 / Revised: 21 May 2026 / Accepted: 22 May 2026 / Published: 27 May 2026

Abstract

Reliable short-term wind power forecasting is crucial for smart grid stability. However, high-dimensional noise and stochastic fluctuations in wind sequences often degrade the accuracy of traditional forecasting models. Moreover, wind power time series typically exhibit asymmetric rising and decaying patterns, which further complicate accurate modeling. To address these challenges, this study proposes a hybrid intelligent system that integrates three components: data preprocessing, heterogeneous ensemble learning, and probabilistic interval forecasting. First, we build a multi-stage preprocessing workflow. Adaptive DBSCAN and Local Outlier Factor (LOF) remove spatial and density anomalies. Then multivariate variational mode decomposition (MVMD) synchronously separates multi-scale oscillatory patterns while preserving cross-channel correlations and frequency-domain symmetry across input variables. SHAP analysis quantifies feature importance, ensuring interpretability. The selected features are fed into a heterogeneous ensemble model consisting of Transformer, BPNN, ELM, XGBoost, and QRLSTM, which collectively capture multi-scale temporal dependencies and diverse data patterns. The ensemble weights are dynamically optimized by a modified multi-objective dragonfly algorithm (MMODA) that balances forecast accuracy and stability. Based on this ensemble, we apply MMODA to tune kernel density estimation for generating high-quality forecast intervals, maximizing coverage while minimizing interval width. Experiments on two wind farms in Shandong show that our MMODA-optimized ensemble reduces mean absolute percentage error by about 44.7% compared to single models, and ablations confirm that MVMD preprocessing adds a further 10.7% reduction. The proposed system provides an interpretable and reliable decision-support tool for sustainable grid operations.

1. Introduction

1.1. Motivation and Challenges

With the intensification of global climate change, promoting energy structure transformation and achieving sustainable development has become an international consensus [1]. Various countries are actively promoting the “dual carbon” goals, and have issued a series of policy documents [2], laying a solid policy foundation for the development of clean energy such as wind power and photovoltaics, making them the core direction of energy transformation [3]. Wind energy, as a clean and renewable energy source with abundant reserves, is widely valued worldwide, and its installed capacity continues to grow rapidly. According to the Global Wind Energy Council, the world’s newly installed wind power capacity will reach 167 gigawatts in 2025, setting a new historical high [4].
However, the large-scale development of wind power faces significant technical bottlenecks. The intermittency, variability, and uncertainty of wind energy make wind power forecasting challenging, and direct grid connection can seriously affect power system dispatching, safe and stable operation, and power quality [5]. Wind power time series often exhibit asymmetric patterns, and fluctuations across multiple time scales reveal hierarchical, self-similar structures. These symmetry-related properties complicate accurate modeling. Therefore, high-precision wind power forecasting is key to enhancing the grid’s consumption capacity, improving the economics of wind power, and ensuring reliable operation of the power system. Although progress has been made in wind power forecasting research, model accuracy and reliability still need to be improved. Wind speed is affected by multiple factors such as terrain, meteorology, and turbine status. The forecasting capabilities under multiple time scales and extreme weather scenarios are still insufficient to meet the needs of grid dispatching and market-oriented trading. Moreover, wind power time series typically exhibit asymmetric rising and decaying patterns: the ramp-up and ramp-down processes often have different durations and gradients. Such asymmetry complicates accurate forecasting and cannot be properly captured by models that assume symmetric error distributions or linear dynamics.

1.2. Literature Review

The central challenge in short-term wind power forecasting lies in the intrinsic characteristics of wind speed and power sequences: high-dimensional noise, strong stochastic volatility, and complex nonlinear coupling among multiple meteorological variables. These characteristics collectively degrade the forecasting performance of conventional modeling paradigms. To address these challenges, a wide range of methodologies has been proposed, which can be broadly categorized into physical methods, statistical time series models, intelligent algorithms, and hybrid approaches [6,7]. Table 1 provides a comparative summary of these four paradigms.
Physical methods, primarily based on Numerical Weather Prediction (NWP) models, exploit atmospheric dynamics and granular meteorological information to achieve long-term forecasting capabilities [8]. Although NWP-driven correction strategies have demonstrated effectiveness in reducing systematic biases, these methods require massive computational resources and high-quality multi-source data inputs [9], making them complex and time-consuming for short-term operational contexts [10]. Statistical time series methods [11], such as ARIMA [12], LSTM [13], GARCH [14], Holt–Winters exponential smoothing [15], and SARIMA [16], offer computational efficiency and mature theoretical foundations. However, their linear parametric nature fundamentally limits their capacity to capture the complex nonlinear dependencies prevalent in wind power data [17,18]. Moreover, these methods typically assume symmetric error distributions and cannot handle the asymmetric rising/decaying patterns or the scale-symmetric correlations across different frequencies that are common in wind power sequences.
Intelligent algorithms, particularly artificial neural networks (ANNs) [19], deep learning architectures [20], and attention mechanisms [21], have emerged as powerful alternatives. By autonomously identifying latent patterns from historical data, these data-driven approaches can extract implicit nonlinear dynamics with strong fault tolerance and generalization capability [22]. Unlike traditional statistical models that assume symmetric error distributions, neural networks can capture both symmetric and asymmetric fluctuation patterns in wind power sequences. Transformer [23], Back Propagation Neural Network (BPNN) [24], and LSTM architectures have achieved state-of-the-art results in various forecasting scenarios. However, individual intelligent models remain susceptible to overfitting, require large amounts of high-quality training data, and exhibit poor interpretability, known as the “black box” problem [25]. Furthermore, a single model architecture rarely excels at capturing all relevant temporal scales and heterogeneous data patterns simultaneously, including the scale-symmetric correlations across different frequencies, leading to performance bottlenecks in complex real-world applications [26].
In response to these limitations, hybrid methods that integrate multiple complementary techniques have gradually become the mainstream paradigm in wind power forecasting [27]. By combining data preprocessing, optimization algorithms, and multiple learning architectures, hybrid models can partially compensate for the weaknesses of individual components while leveraging their respective strengths. Hanifi et al. systematically compared the hyperparameter optimization performance of Scikit-opt, Optuna, and Hyperopt on CNN and LSTM models and found that Optuna was the most efficient at parameter tuning. They also analyzed, for the first time, the impact of random initialization on the stability of neural networks [28]. Guo et al. proposed the IMK hybrid forecasting system, which integrates data preprocessing, the HHO algorithm, and multi-objective chaotic mapping optimization, and verified its forecasting performance on wind farm data from Türkiye [29]. Wang et al. proposed a hybrid model that integrates an improved optimization algorithm, a decomposition algorithm, and a QR-CNN-BiGRU attention mechanism [30]; their results show that it significantly outperforms five benchmark models on multiple error metrics. Kumar et al. constructed the EFD-LSTM-GWO model, which addresses the non-stationarity and nonlinearity of wind power sequences [31]. On five wind farm datasets in India, it reduced the average forecast error by more than 10%, demonstrating the synergy between decomposition and optimization in forecasting.
However, there are still key gaps in current hybrid forecasting research, which hinder the realization of fully reliable and practical forecasting systems. First, the effectiveness of even sophisticated hybrid models is severely constrained by the quality of input data. The high-dimensional noise and non-stationary fluctuations inherent in raw wind power sequences can propagate through the modeling pipeline, masking genuine patterns and degrading forecast accuracy. To mitigate this, researchers have explored signal decomposition techniques such as Empirical Mode Decomposition (EMD) [32] and Ensemble Empirical Mode Decomposition (EEMD) [33]. While these approaches offer partial improvements, challenges including residual noise and mode mixing remain unresolved. Variational mode decomposition (VMD) provides a more theoretically robust alternative by reformulating the decomposition as a constrained variational optimization problem [34]. Unlike EMD-based methods, VMD produces modes that are symmetric in the frequency domain, each centered around a specific frequency. However, standard VMD processes each input channel independently and therefore fails to capture the cross-channel correlations that naturally exist in multivariate meteorological datasets. A more advanced framework that achieves simultaneous multi-channel decomposition with shared frequency components is therefore required to fully exploit the coupling structures among input variables. Second, the determination of optimal ensemble weights in hybrid forecasting systems remains an unresolved multi-objective challenge. The scientific allocation of model weights is essential to ensemble effectiveness, as it governs the trade-off between the forecasting accuracy and the output stability of the combined system [35].
Existing studies predominantly rely on single-objective optimization algorithms, which are fundamentally incapable of resolving the inherent tension between these competing indicators—improving accuracy often comes at the cost of increased variance, and vice versa [36]. Although multi-objective optimization algorithms such as the multi-objective dragonfly algorithm (MODA) have been proposed, conventional implementations suffer from premature convergence, decelerated exploration in later iterations, and an insufficient balance between global search and local exploitation [37]. An enhanced optimization algorithm with adaptive mechanisms that can dynamically navigate the accuracy-stability Pareto frontier under complex wind conditions is urgently needed. Third, the predominant focus of current research on deterministic point forecasting falls short of the reliability requirements demanded by modern power grid operations. Point forecasts provide only a single numerical output and cannot describe the inherent uncertainty structure embedded in the forecasting process, thus limiting their practical value for risk assessment and dispatch decision making [38]. Probabilistic interval forecasting, which quantifies the range of possible future values with specified confidence levels, has gained increasing attention as a decision-support tool. Yet this critical dimension remains underexplored in the literature, and an effective framework that tightly integrates multi-objective optimization with density estimation for high-quality interval generation has yet to be established.
To systematically address the three interconnected challenges identified above, this study develops a synergistic soft computing framework that tightly couples intelligent data refinement, heterogeneous ensemble learning, and multi-objective evolutionary optimization within a unified architecture. The framework is conceptually organized into three coordinated functional modules, each targeting a specific bottleneck in the existing research landscape.
The first module establishes a robust and interpretable data preprocessing pipeline. Rather than treating noise removal and feature extraction as disconnected steps, the pipeline integrates a dual-filtering anomaly detection mechanism—jointly employing density-based clustering and local outlier factor analysis—with multivariate variational decomposition that synchronously separates multi-channel signals into band-limited intrinsic modes. This coordinated design simultaneously suppresses high-dimensional noise, preserves cross-variable coupling information, and enhances the signal-to-noise ratio of the input features. Moreover, by introducing Shapley Additive Explanations (SHAP) to quantify the marginal contribution of each input variable, the preprocessing workflow becomes auditable and transparent, ensuring that subsequent modeling decisions are driven by physically meaningful inputs rather than statistical artifacts. The second module constructs a diversified heterogeneous ensemble of forecasting models with distinct architectural biases. By combining architectures specialized in deep temporal feature extraction, gradient boosting, rapid nonlinear approximation, and recurrent memory, the ensemble comprehensively captures both global long-range dependencies and local high-frequency fluctuations across multiple time scales. To optimally calibrate the weight distribution among these heterogeneous predictors, an improved multi-objective optimization algorithm is developed. The algorithm incorporates an elite opposition-based learning mechanism to broaden exploration and an adaptive exponential step-size strategy to prevent stagnation in high-dimensional weight space. This design enables the optimizer to dynamically navigate the Pareto trade-off between forecasting accuracy and model stability, producing robust and well-balanced ensemble outputs that are unattainable through single-objective optimization schemes. 
The third module extends the framework from deterministic forecasting to reliability-oriented probabilistic inference. By embedding the improved multi-objective optimization algorithm within a kernel density estimation framework, the system simultaneously maximizes forecast interval coverage probability while minimizing interval width. This dual-objective mechanism overcomes the constraints of traditional quantile regression approaches, achieving an organic integration of evolutionary search and non-parametric density estimation. The resulting forecast intervals provide high-confidence uncertainty bounds that are directly actionable for grid dispatch and risk management decision making.
The design philosophy underpinning this three-module architecture is the establishment of a closed-loop coupling between data preprocessing and algorithmic self-tuning. Unlike conventional sequential approaches where each stage operates in isolation, the proposed framework allows the downstream optimization process to inform and be informed by the upstream data refinement, creating a coherent and self-adaptive forecasting system. Furthermore, wind power sequences possess inherent scale symmetry across multiple temporal resolutions, meaning that patterns observed at finer scales resemble those at coarser scales. Leveraging this property, our framework employs MVMD to decompose the signal into symmetric multi-scale components before ensemble learning.
Based on the proposed framework, the main contributions of this study are summarized as follows:
(1) A robust multi-stage data refinement scheme with built-in interpretability. An integrated preprocessing protocol is developed that combines dual-filtering anomaly detection (Adaptive DBSCAN and Local Outlier Factor) with multivariate variational mode decomposition (MVMD) for synchronized multi-channel analysis, effectively eliminating outliers while preserving cross-variable correlations. SHAP analysis is introduced to quantify feature importance, ensuring that the entire preprocessing pipeline is auditable and that modeling decisions are driven by physically meaningful inputs.
(2) A heterogeneous ensemble construction methodology with improved multi-objective optimization. A diversified ensemble comprising Transformer, BPNN, ELM, XGBoost, and QRLSTM is architected to comprehensively capture multi-scale temporal dependencies and heterogeneous data patterns. A modified multi-objective dragonfly algorithm (MMODA) is developed, integrating elite opposition-based learning and adaptive exponential step-size strategies to dynamically determine optimal ensemble weights. This methodology effectively overcomes the limitations of conventional single-objective optimization and achieves a superior balance between forecasting accuracy and model stability.
(3) An integrated MMODA-KDE framework for reliability-oriented probabilistic interval forecasting. The proposed framework combines multi-objective optimization with kernel density estimation to simultaneously optimize forecast interval coverage probability and interval width. By generating well-calibrated forecast intervals at specified confidence levels, the framework provides a practical decision-support tool for grid dispatch and energy market risk management.
The rest of this article is organized as follows. Section 2 introduces the basic principles of the methods used in the proposed model. Section 3 describes the experimental setup and presents six numerical experiments and the forecasting results on two datasets. Section 4 discusses the forecasting performance of the developed model in depth. Section 5 presents the conclusions and future work.

2. Methodology

2.1. Data Preprocessing

In response to the common issue of outliers in wind power time series, this study constructs a two-stage, completely unsupervised DBSCAN-LOF outlier screening model. This model combines the advantages of a density-based DBSCAN clustering algorithm and Local Outlier Factor (LOF) detection method.
DBSCAN divides data points into core points, boundary points, and noise points by defining a neighborhood radius and a minimum number of points. It can identify clusters of arbitrary shape and handle noisy data effectively, making it suitable for the complex distribution of wind power data. Local Outlier Factor (LOF) is a density-based anomaly detection algorithm that identifies outliers by comparing the local density of each data point with that of its neighbors. The core is to calculate the LOF value of each data point: a value significantly greater than 1 indicates that the point may be an outlier; a value close to 1 indicates that its density is similar to that of its surroundings; and a value less than 1 indicates that it lies in a relatively dense area. The calculation proceeds as follows. First, the Euclidean distance between point $\alpha$ and its k-th nearest neighbor is denoted as $d_k(\alpha)$. Based on this, the reachability distance of point $\alpha$ with respect to point $o$ is defined as the maximum of the k-th distance of $o$ and the Euclidean distance between the two points, that is
$$rd_k(\alpha, o) = \max\{ d_k(o),\ d(\alpha, o) \}$$
Furthermore, the local reachability density of point $\alpha$ within its k-neighborhood is the reciprocal of the average reachability distance from $\alpha$ to all points in its neighborhood, denoted as
$$hd_k(\alpha) = \frac{|N_k(\alpha)|}{\sum_{o \in N_k(\alpha)} rd_k(\alpha, o)}$$
where $N_k(\alpha)$ denotes the k-neighborhood point set of $\alpha$. The LOF value of $\alpha$ is then the average ratio of the local reachability density of its neighbors to its own:
$$LOF_k(\alpha) = \frac{1}{|N_k(\alpha)|} \sum_{o \in N_k(\alpha)} \frac{hd_k(o)}{hd_k(\alpha)}$$
After two stages of processing, a more regular and predictable wind power sequence is obtained. The main advantages of this integrated method include: (1) high computational efficiency; (2) unsupervised learning, with no need to label anomalies; (3) strong robustness to interference; and (4) automatic outlier removal and feature selection preprocessing, which eliminates the manual threshold adjustment and special assumptions that plague traditional methods.
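The LOF quantities defined above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `knn` and `lof_scores` are hypothetical, the neighborhood is simplified to exactly k nearest points (ties at the k-th distance are not expanded), and the points are assumed to be distinct so that no reachability sum is zero.

```python
import math

def knn(points, idx, k):
    """Return (distance, index) pairs for the k nearest neighbours of points[idx]."""
    dists = sorted(
        (math.dist(points[idx], p), j)
        for j, p in enumerate(points)
        if j != idx
    )
    return dists[:k]

def lof_scores(points, k):
    """Compute LOF_k for every point using the definitions in the text."""
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    # d_k(o): distance from each point to its k-th nearest neighbour
    k_dist = [nb[-1][0] for nb in neigh]
    # hd_k: local reachability density = |N_k| / sum of reachability distances
    lrd = []
    for i in range(n):
        reach_sum = sum(max(k_dist[j], d) for d, j in neigh[i])
        lrd.append(len(neigh[i]) / reach_sum)
    # LOF_k: mean ratio of the neighbours' density to the point's own density
    return [
        sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]

# Toy data: a tight square of four points plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = lof_scores(points, k=2)
for p, s in zip(points, scores):
    print(p, round(s, 3))
```

In this toy example the four clustered points receive scores close to 1, while the isolated point receives a much larger score, which matches the interpretation of LOF values given above.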

2.2. SHapley Additive exPlanations (SHAP)

SHapley Additive exPlanations (SHAP) is a model interpretation technique based on the Shapley value from game theory, which aims to quantify the contribution of each input feature in a machine learning model to a specific forecast [39]. This method treats the model forecast as the total payoff and each feature as a player, and distributes the payoff fairly by calculating the Shapley value $\varphi_i$ of each feature i. SHAP constructs an additive explanatory model by adding these $\varphi_i$ to a baseline forecast $\varphi_0$ to approximate the behavior of the original complex model.
In practical applications, the SHAP value directly reflects the influence of a feature: a positive value pushes the forecast up, a negative value pushes it down, and the absolute value indicates the strength of the influence. More importantly, SHAP provides a key basis for feature selection through the mean absolute SHAP value of each feature over all samples. Screening for highly important features can significantly improve the accuracy and generalization ability of wind power forecasting models and reduce computational complexity. Therefore, SHAP has become a powerful tool for optimizing model inputs, improving forecasting performance, and improving model transparency. The formulas are as follows:
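$$\varphi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left[ f(S \cup \{i\}) - f(S) \right]$$
$$m(x) = \varphi_0 + \sum_{j=1}^{N} \varphi_j x_j$$
where $F$ is the full feature set, $S$ is a subset of features that excludes feature i, $f(S)$ is the model output when only the features in $S$ are used, $N$ is the number of features, and $x_j \in \{0, 1\}$ indicates whether feature j is present.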
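To make the Shapley formula concrete, the following sketch enumerates all subsets exactly for a tiny toy value function. The names `shapley_values` and `toy_value` are hypothetical, and exact enumeration is only feasible for a handful of features; practical SHAP tooling relies on model-specific or sampling-based approximations instead.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    features = range(n_features)
    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight |S|! (|F| - |S| - 1)! / |F|! for subsets of this size
            weight = (
                factorial(size) * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition S
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi.append(total)
    return phi

def toy_value(s):
    """Toy model: main effects for features 0 and 1 plus an interaction; feature 2 is unused."""
    return 2.0 * (0 in s) + 3.0 * (1 in s) + 1.0 * (0 in s and 1 in s)

phi = shapley_values(toy_value, 3)
print(phi)
```

The interaction term is split equally between features 0 and 1, the unused feature receives zero, and the values sum to the difference between the full-model output and the empty-set baseline, which is the additivity property expressed by $m(x)$.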

2.3. Multivariate Variational Mode Decomposition

MVMD is a multivariate extension of traditional variational mode decomposition (VMD). Its core lies in jointly processing multi-channel correlated signals, rather than analyzing individual time series in isolation [40]. The essence of MVMD is a constrained variational optimization problem that can synchronously decompose multi-channel input signals into a series of band-limited intrinsic mode functions (IMFs). By sharing the frequency component framework, the common oscillation patterns of all input channels are identified, effectively revealing the inherent dynamic correlations and coupling mechanisms of multiple variables in complex systems. Unlike conventional decomposition methods that process each channel independently, MVMD enforces a shared set of center frequencies across all channels, ensuring that the extracted IMFs are symmetric in the frequency domain, with each mode compactly centered around its respective frequency. This frequency-domain symmetry property is fundamental to preserving cross-channel coherence and facilitates interpretable multi-scale analysis. For a multivariate input signal $X(t) = [x_1(t), x_2(t), \ldots, x_C(t)]$ with C channels, the optimization problem is formulated as:
$$\min_{\{u_{c,k}\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \sum_{c=1}^{C} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_{c,k}(t) \right] e^{-j \omega_k t} \right\|_2^2 \right\}$$
subject to the constraint
$$\sum_{k=1}^{K} u_{c,k}(t) = x_c(t), \quad c = 1, 2, \ldots, C.$$
The symbol $*$ denotes convolution, and $\delta(t) + j/(\pi t)$ is the kernel of the Hilbert transform that converts a real-valued mode into its analytic signal. The term $\left(\delta(t) + j/(\pi t)\right) * u_{c,k}(t)$ thus represents the analytic extension of the mode $u_{c,k}(t)$. Minimizing this quantity simultaneously enforces compactness around the shared center frequency $\omega_k$ for the k-th mode across all C channels, where $u_{c,k}(t)$ represents the k-th mode of the c-th channel, and $\omega_k$ denotes the shared center frequency of the k-th mode. This sharing mechanism serves as the core mathematical foundation for MVMD to capture common frequency components across multiple variables. The decomposition procedure of MVMD is implemented iteratively using the Alternating Direction Method of Multipliers (ADMM), and the detailed steps are as follows:
  • Initialization: Set the number of modes K and the penalty parameter $\alpha$. Initialize all modes $\hat{u}_{c,k}^{1}(\omega)$ (typically in the frequency domain), the shared center frequencies $\omega_k^{1}$, and the Lagrangian multipliers $\hat{\lambda}_c^{1}(\omega)$. Set the iteration index n = 0.
  • Mode Update: For each mode k and each channel c, update the mode estimate in the frequency domain using a Wiener-like filter:
    $$\hat{u}_{c,k}^{n+1}(\omega) = \frac{\hat{x}_c(\omega) - \sum_{i \neq k} \hat{u}_{c,i}^{n}(\omega) + \dfrac{\hat{\lambda}_c^{n}(\omega)}{2}}{1 + 2\alpha \left( \omega - \omega_k^{n} \right)^2}$$
    Here, $\alpha$ is the bandwidth penalty parameter, regulating the compactness of the modes.
  • Center frequency update: Update the shared center frequency based on the current estimated mode of all channels.
    $$\omega_k^{n+1} = \frac{\sum_{c=1}^{C} \int_0^{\infty} \omega \left| \hat{u}_{c,k}^{n+1}(\omega) \right|^2 d\omega}{\sum_{c=1}^{C} \int_0^{\infty} \left| \hat{u}_{c,k}^{n+1}(\omega) \right|^2 d\omega}$$
    This step ensures that each center frequency represents the common spectral center for that oscillatory mode across all variables.
  • Lagrange multiplier update: Update the Lagrange multiplier for each channel based on the current reconstruction error to enhance convergence:
    $$\hat{\lambda}_c^{n+1}(\omega) = \hat{\lambda}_c^{n}(\omega) + \tau \left( \hat{x}_c(\omega) - \sum_{k=1}^{K} \hat{u}_{c,k}^{n+1}(\omega) \right)$$
    where $\tau$ serves as the noise tolerance parameter.
  • Convergence Check: Iterate steps 2 to 4 until the convergence criterion is satisfied. The criterion is often based on a threshold for the reconstruction error or the magnitude of change in modes between successive iterations:
    $$\sum_{c=1}^{C} \left\| \sum_{k=1}^{K} \hat{u}_{c,k}^{(n+1)} - \hat{x}_c \right\|_2^2 < \varepsilon$$
  • Output: Upon convergence, the algorithm outputs the complete set of IMFs for all channels $u_{c,k}(t)$ and the final set of center frequencies $\omega_k$.
Through this coordinated decomposition, MVMD can distinctly separate common oscillatory modes of different time scales within the signal.
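The three frequency-domain updates in the steps above can be sketched as small pure-Python functions. This is only an illustration under stated assumptions: spectra are plain lists of complex numbers sampled on a discrete non-negative frequency grid, the integrals are approximated by sums over that grid, and the function names are hypothetical. It omits the FFT, the outer iteration loop, and the convergence check, so it is not a complete MVMD implementation.

```python
def update_mode(x_hat, other_modes_sum, lam_hat, freqs, omega_k, alpha):
    """Wiener-like filter update for one channel c and one mode k.

    x_hat, other_modes_sum and lam_hat hold the channel spectrum, the sum of
    all other modes of that channel, and the Lagrange multiplier, per frequency.
    """
    return [
        (x - o + lam / 2) / (1 + 2 * alpha * (w - omega_k) ** 2)
        for x, o, lam, w in zip(x_hat, other_modes_sum, lam_hat, freqs)
    ]

def update_center_frequency(modes_by_channel, freqs):
    """Shared center frequency: power-weighted mean frequency pooled over channels."""
    num = sum(
        w * abs(u) ** 2
        for mode in modes_by_channel
        for u, w in zip(mode, freqs)
    )
    den = sum(abs(u) ** 2 for mode in modes_by_channel for u in mode)
    return num / den

def update_multiplier(lam_hat, x_hat, modes_sum, tau):
    """Dual ascent step on the reconstruction residual of one channel."""
    return [lam + tau * (x - m) for lam, x, m in zip(lam_hat, x_hat, modes_sum)]

# Toy spectra: channel 1 has energy at 0.1, channel 2 at 0.3.
freqs = [0.0, 0.1, 0.2, 0.3]
mode_ch1 = [0, 1 + 0j, 0, 0]
mode_ch2 = [0, 0, 0, 1 + 0j]
print(update_center_frequency([mode_ch1, mode_ch2], freqs))
```

Because the center frequency pools spectral power from every channel, the two toy channels with equal energy at different frequencies yield a single shared frequency midway between them, which is the coupling mechanism that distinguishes MVMD from channel-wise VMD.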

2.4. Intelligent Optimization Algorithm

Multi-Objective Dragonfly Algorithm (MODA)

The multi-objective dragonfly algorithm (MODA), proposed by Mirjalili in 2016, mimics the static local foraging and dynamic long-distance migration of dragonfly swarms. Five behavioral rules govern how individuals move through the search space:
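Definition 1
(Separation). Avoid spatial conflicts with neighboring individuals by maintaining a safe distance between them. The repulsive force vector generated by this behavior is calculated as:
$$S_i = -\sum_{j=1}^{N} \left( x - x_j \right)$$
where $x$ is the position of the current individual, $x_j$ is the position of the j-th neighboring individual, and $N$ is the number of neighbors.
Definition 2
(Alignment). Synchronize the velocity of the individual with that of its neighborhood to keep the group movement consistent:
$$A_i = \frac{\sum_{j=1}^{N} V_j}{N}$$
where $V_j$ is the velocity of the j-th neighboring individual.
Definition 3
(Cohesion). Drive individuals to move towards the centroid of the group and maintain cluster integrity:
$$C_i = \frac{\sum_{j=1}^{N} x_j}{N} - x$$
Definition 4
(Attraction). Guide individuals to move towards the food source (current optimal solution):
$$F_i = F_{loc} - x$$
where $F_{loc}$ is the location of the food source.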
Definition 5
(Distraction). Drive individuals away from natural enemies (inferior solutions):
$$E_i = E_{loc} - x$$
where $E_{loc}$ is the location of the natural enemy, which is dynamically selected from the high-density regions of the archive. This mechanism helps avoid falling into local Pareto optima.
In MODA, position updating is achieved through the interplay of the step vector $\Delta x$ and the position vector $x$. The step vector is determined by the five behavioral terms and an inertia term: $\Delta x_{t+1} = (s S_i + a A_i + c C_i + f F_i + e E_i) + w \Delta x_t$, where s, a, c, f, and e are the weights of the five behaviors and w is the inertia weight applied to the previous step vector. This design allows MODA to achieve a good balance between convergence and diversity, making it well-suited for multi-objective parameter optimization in wind power forecasting.
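A minimal sketch of how the behavioral terms combine into a step vector is given below. It follows the conventions of the original dragonfly algorithm as an assumption: the separation term carries a leading minus sign, the inertia weight multiplies the previous step vector, and the new position is the old position plus the step. The distraction term is passed in precomputed as `enemy_term`, since its sign convention differs between presentations. All function and argument names are hypothetical.

```python
def step_vector(x, neighbors_x, neighbors_v, food, enemy_term, prev_step,
                s, a, c, f, e, w):
    """Combine separation, alignment, cohesion, attraction, distraction and inertia."""
    n = len(neighbors_x)
    dim = len(x)
    # Separation: S = -sum_j (x - x_j)
    sep = [-sum(x[d] - nx[d] for nx in neighbors_x) for d in range(dim)]
    # Alignment: A = mean velocity of the neighbours
    ali = [sum(nv[d] for nv in neighbors_v) / n for d in range(dim)]
    # Cohesion: C = neighbourhood centroid - x
    coh = [sum(nx[d] for nx in neighbors_x) / n - x[d] for d in range(dim)]
    # Attraction: F = food location - x
    att = [food[d] - x[d] for d in range(dim)]
    return [
        s * sep[d] + a * ali[d] + c * coh[d] + f * att[d]
        + e * enemy_term[d] + w * prev_step[d]
        for d in range(dim)
    ]

def move(x, step):
    """Position update: x_{t+1} = x_t + step."""
    return [xi + si for xi, si in zip(x, step)]

step = step_vector(
    x=[0.0, 0.0],
    neighbors_x=[[1.0, 0.0], [0.0, 1.0]],
    neighbors_v=[[1.0, 1.0], [3.0, 1.0]],
    food=[2.0, 2.0],
    enemy_term=[0.0, 0.0],
    prev_step=[1.0, 1.0],
    s=1.0, a=1.0, c=1.0, f=1.0, e=1.0, w=0.5,
)
print(step, move([0.0, 0.0], step))
```

In practice the behavior weights are varied over the iterations to shift the swarm from exploration to exploitation; the constant weights here are chosen only to keep the arithmetic easy to follow.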

2.5. Modified Multi-Objective Dragonfly Algorithm (MMODA)

Although the basic MODA possesses advantages such as process simplification, wide application, strong search capability, and outstanding robustness, it shares a common deficiency with most optimization algorithms: it is prone to falling into local optima and exhibits a decreasing convergence rate in the later stages of iteration. To alleviate these inherent limitations, this study adopts an improved multi-objective dragonfly algorithm that incorporates elite opposition-based learning techniques and an exponential function-controlled step-size strategy; the specific theoretical formulas for these two advanced strategies are presented below.

2.5.1. Elite Opposition Learning Strategy

The elite opposition-based learning method is an advanced intelligent search paradigm. Its principle is to construct opposing solutions from elite individuals, thereby broadening the exploration scope of the algorithm. In this study, the dragonfly individuals best adapted to the environment are designated as elites, represented as $ex_m^t = (ex_{m,1}^t, ex_{m,2}^t, \ldots, ex_{m,D}^t)$, $m = 1, 2, \ldots, EN$, where $ex_{m,j}^t$ represents the solution of the m-th elite individual in the j-th dimension, and D is the dimension of the search space. For any dragonfly individual in the current population $x_i^t = (x_{i,1}^t, x_{i,2}^t, \ldots, x_{i,D}^t)$, its elite opposition solution $ex_i^t = (ex_{i,1}^t, ex_{i,2}^t, \ldots, ex_{i,D}^t)$ is computed as:
$$ex_{i,j}^t = k \cdot \left( ea_j^t + eb_j^t \right) - x_{i,j}^t$$
$$ea_j^t = \min_m \left( ex_{m,j}^t \right), \quad eb_j^t = \max_m \left( ex_{m,j}^t \right)$$
Boundary constraints are enforced via:
$$ex_{i,j}^t = rand \cdot \left( eb_j^t - ea_j^t \right) + ea_j^t, \quad \text{if } ex_{i,j}^t < Lb_j$$
Here, $k = rand(0,1)$ follows a uniform distribution on the interval [0,1], and its function is to generate diverse elite opposition individuals; $rand \in [0,1]$ is likewise a uniformly distributed random number, and $Lb_j$ is the lower bound of the j-th dimension. Its function is to dynamically adjust the exponential scaling factor of the step size.
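The elite opposition step can be sketched as follows. The function name `elite_opposition` and its arguments are hypothetical. As an assumption beyond the text, the sketch resamples a coordinate inside the elite interval when it violates either the lower or the upper bound, whereas the boundary rule above only states the lower-bound case.

```python
import random

def elite_opposition(x, elites, lb, ub, k=None, rng=random):
    """Return the elite opposition solution of individual x.

    elites is a list of elite position vectors; lb and ub are per-dimension
    search bounds. k is drawn uniformly from [0, 1) when not supplied.
    """
    if k is None:
        k = rng.random()
    dim = len(x)
    # Dynamic interval spanned by the elites in each dimension
    ea = [min(e[j] for e in elites) for j in range(dim)]
    eb = [max(e[j] for e in elites) for j in range(dim)]
    opposite = []
    for j in range(dim):
        v = k * (ea[j] + eb[j]) - x[j]
        # Out-of-bounds coordinates are resampled inside [ea_j, eb_j]
        # (upper-bound handling is an assumption of this sketch).
        if v < lb[j] or v > ub[j]:
            v = ea[j] + rng.random() * (eb[j] - ea[j])
        opposite.append(v)
    return opposite

elites = [[1.0, 2.0], [3.0, 6.0]]
print(elite_opposition([2.0, 3.0], elites, lb=[-10.0, -10.0], ub=[10.0, 10.0], k=0.5))
```

Fixing `k` makes the example reproducible; in an optimizer it would be redrawn for every individual so that the opposition population stays diverse.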

2.5.2. Exponential Function-Based Step-Size Strategy

In standard MODA, the parameters are adjusted randomly and adaptively, so individual dragonflies update their positions with random linear steps during the iteration process. This strategy cannot guarantee the optimality of the solution and converges slowly. To address this issue, this study introduces an exponential step-size strategy that constructs adaptive update rules by embedding an exponential component in the step size. The step-size rule is:
$$\mu = (rand - 0.5) \times 2^{rand}, \quad rand \in [0,1]$$
$$\Delta = \mu \, \Delta x_{t+1} = (rand - 0.5) \times 2^{rand} \times \Delta x_{t+1}$$
where rand denotes a uniformly distributed random scalar in the interval [0,1], sampled independently each time it appears. The formula for updating the dragonfly position vector is:
$$x_{t+1} = x_t + \mu \, \Delta x_{t+1}$$
where t denotes the current iteration index, and $x_t$ represents the position vector at the t-th iteration. This mechanism progressively accelerates the step-size adjustment throughout the iterative process, which not only helps avoid local optima but also facilitates identification of the global optimum.
In summary, MMODA provides an effective solution for the optimization of the wind power forecast model by virtue of its enhanced global search capability and fast convergence characteristics.

2.6. Methodology and Implementation Flow

The pioneering research of Bates et al. in the 1960s showed that combining the forecasts of two or more models gave significantly better results than relying on a single model [41]. Based on this principle, our research introduces an enhanced combination model specifically designed to capture the inherent linear and nonlinear features in wind speed data. Figure 1 illustrates its structure and process, and each stage is explained in detail in the procedures that follow.
  • Procedure 1: Data Pretreatment
To effectively extract the dominant patterns of wind speed data and lay the foundation for high-precision forecasting, this study designed a complete data preprocessing workflow. Initially, the DBSCAN and LOF algorithms were employed to clean the raw data and remove outliers. Subsequently, MVMD was applied to obtain the intrinsic mode functions and residual terms, extracting stable and effective sequence features. To overcome the challenge posed by high-dimensional input, a feature selection model was further constructed to screen the most predictive feature subset, thereby reducing data dimensionality and eliminating information redundancy. This preprocessing workflow enhanced the stability and representativeness of the input data for the subsequent forecasting models, which is an important prerequisite for accurate wind power forecasting.
  • Procedure 2: SHAP-based Feature Selection
In order to improve computational efficiency, this study adopted a feature selection method based on XGBoost-SHAP for dimensionality reduction. The Shapley value is used to quantify the marginal contribution of each feature to the forecast results. The analysis results (see Figure 2) indicate that the hub height and wind speed measurements at heights of 50 m, 30 m, and 10 m are the main influencing factors. Based on this analysis, the most relevant features are selected as model inputs to improve model performance and interpretability.
To further explore the mechanism relationship between input variables and model behavior, this study drew a feature dependency chart to reveal the interrelationships between meteorological parameters. As shown in Figure 3, the horizontal axis of each subgraph represents the measured feature values, the vertical axis represents the corresponding SHAP values, and the color bars display the auxiliary variable values with the strongest correlation with the main feature.
The analysis shows a significant dependency between wind speeds at different heights. The contribution of the 10 m wind speed depends mainly on the 30 m wind speed: a higher 30 m wind speed consistently strengthens the positive impact of the 10 m wind speed on the power forecast. Similarly, the 30 m and 50 m wind speeds are strongly correlated, and a higher 50 m wind speed amplifies the positive SHAP value of the 30 m wind speed throughout the entire observation range. These dependencies reflect the vertical consistency of the wind speed observations and their common impact on the wind power forecast, and the color gradient shows how the interaction between measurements at different heights adjusts their respective contributions to the model output.
  • Procedure 3: Theoretical Rationale for Heterogeneous Base Model Selection
In order to comprehensively extract the inherent linear patterns and nonlinear dependencies in wind speed data, a group of diversified and heterogeneous base models, namely Transformer [42], BPNN [43], ELM [44], XGBoost [45], and QRLSTM [46], is strategically integrated into a hybrid forecast architecture. Combining robust data preprocessing strategies with these individual forecast models allows the subsequent forecast analysis to be carried out [47].
The effectiveness of an ensemble depends critically on the diversity among its base learners. Ensemble learning theory establishes that when base models exhibit low forecast covariance—that is, when their errors are distributed across different regions of the feature space—the ensemble can achieve substantial error reduction even if individual models possess only moderate accuracy. Conversely, combining highly correlated models yields minimal improvement regardless of ensemble size. This principle directly motivates our selection strategy: the base models must represent structurally heterogeneous learning paradigms with distinct inductive biases.
To this end, five architectures spanning three learning paradigms were selected. First, the Transformer, built upon the self-attention mechanism, captures global long-range dependencies by directly modeling relationships between any two temporal positions—a capability structurally distinct from recurrent models that process information sequentially. Second, XGBoost, a tree-based gradient boosting algorithm, models complex nonlinear feature interactions through decision-tree logic that differs fundamentally from the continuous activation functions of neural networks. Third, within the neural network paradigm, three architectures with further structural sub-diversity are included: BPNN provides stable, general-purpose nonlinear mapping as a classical feedforward network; ELM employs randomly assigned and fixed hidden-layer weights, generating a generalization landscape distinct from gradient-optimized networks and contributing valuable error decorrelation; and QRLSTM integrates recurrent gating with quantile regression, endowing it with native temporal memory for short-term fluctuations while naturally bridging the deterministic forecasting and subsequent interval generation stages.
Collectively, these models are distributed across distinct regions of the architectural design space—tree-based versus neural-based, attention-based versus recurrent versus feedforward—ensuring low inter-model forecast covariance. Each model contributes a unique inductive bias that, when optimally combined, enables comprehensive capture of patterns ranging from high-frequency local fluctuations to long-range seasonal trends.
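The role of error correlation can be made concrete with a standard textbook identity. It is given here only as an illustration under simplifying assumptions that are not part of the original analysis: M base models with zero-mean errors $\varepsilon_m$, a common error variance $\sigma^2$, a common pairwise error correlation $\rho$, and equal weights.

```latex
\operatorname{Var}\!\left(\frac{1}{M}\sum_{m=1}^{M}\varepsilon_m\right)
= \frac{1}{M^{2}}\left[M\sigma^{2} + M(M-1)\rho\,\sigma^{2}\right]
= \rho\,\sigma^{2} + \frac{1-\rho}{M}\,\sigma^{2}
```

For M = 5, highly correlated learners with $\rho = 0.9$ leave an ensemble error variance of $0.92\sigma^2$, whereas $\rho = 0.2$ leaves $0.36\sigma^2$. Under these assumptions the term $\rho\sigma^2$ cannot be removed by adding more models, which is why structural diversity matters more than ensemble size.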
  • Procedure 4: Model Hyperparameters
To ensure full reproducibility of the multi-step wind power forecasting experiments, all architectural details and training-related hyperparameters used in the five base models (Transformer, XGBoost, BPNN, ELM, and QRLSTM) are explicitly documented in this subsection. For each model, the configuration—including the main architecture, activation functions, dropout rates, optimizer settings, learning rate, batch size, training epochs, and loss function—has been systematically listed in Table 2. These specifications provide the necessary information for independent verification of the model structures and training procedures adopted in this study. It should be noted that the ELM employs a regularized pseudo-inverse solution rather than gradient-based optimization, and the XGBoost model uses a tree-based learning rate and a fixed number of boosting rounds in place of conventional training epochs.
  • Procedure 5: Framework of the Proposed Integrated Wind Power Forecasting System
In order to improve the performance of the proposed model, this study introduced a new weight allocation mechanism after the framework construction was completed. The core of this mechanism is to apply optimization algorithms to globally search for the most effective weight distribution. To complete this optimization, the complete time series forecast value generated in the early forecast stage is divided into two parts: the first part of the forecast value is selected as the training dataset, which is specially used to train and determine the weight coefficient of the hybrid model. The remaining portion is used as an independent test set for subsequent evaluation of model performance. After obtaining the best weight coefficient, the forecast results of each individual model are weighted and integrated to finally realize the forecast of wind speed.
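A minimal sketch of the weighted integration and of the two criteria that the optimizer trades off (the MAPE and the standard deviation of the errors, as named in Algorithm 1) might look as follows. The function names are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics

def combine(weights, forecasts):
    """Weighted sum of base forecasts; forecasts[m][i] is model m at time i."""
    n = len(forecasts[0])
    return [sum(w * f[i] for w, f in zip(weights, forecasts)) for i in range(n)]

def objectives(weights, forecasts, actual):
    """Return (MAPE in percent, standard deviation of the forecast errors)."""
    combined = combine(weights, forecasts)
    errors = [a - c for a, c in zip(actual, combined)]
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / len(actual)
    # Assumption: population standard deviation; the text does not specify.
    return mape, statistics.pstdev(errors)
```

In the workflow described above, `objectives` would be evaluated only on the portion of the base forecasts reserved for weight training, and the resulting weights would then be applied unchanged to the held-out portion.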
  • Procedure 6: Forecasting Performance and Validation of a Combined Model
Using the integrated hybrid forecasting model, one-step and multi-step ahead forecasts of the original wind speed data are carried out, and the output sequence of the model is obtained. The overall effectiveness of the model is then systematically evaluated from two perspectives: first, point forecast performance, which measures the closeness between the forecast values and the actual values; second, interval forecast performance. The detailed modeling steps are shown in Figure 4. Correspondingly, the pseudocode of the proposed integrated wind power forecasting system is shown in Algorithm 1.
Algorithm 1. Pseudocode of the proposed integrated wind power forecasting system
Input: Raw wind farm time-series data, K (decomposition modes), α (penalty factor), $Iter_{max}$ (max iterations), population size SN, elite number EN, base models M.
Output: Optimized weight vector w, multi-step deterministic forecasts, probabilistic forecasting intervals.
//Objective Function Formulation
1: Define multi-objective function: $\min F(w) = [obf_1(w), obf_2(w)]^T$
2: $obf_1(w) = MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{v_i - \hat{v}_i(w)}{v_i} \right|$  // Accuracy Criterion
3: $obf_2(w) = std(v_i - \hat{v}_i(w))$  // Stability Criterion
//Phase 1: Intelligent Data Refinement
4: Detect global outliers using Adaptive DBSCAN with elbow-point epsilon detection.
5: Filter local anomalies via Local Outlier Factor (LOF) on DBSCAN inliers.
6: Apply Mutual Information (MI) and SHAP to select the most significant meteorological features.
//Phase 2: Feature Decomposition & Base Prediction
7: Execute Multivariate VMD (MVMD) to extract robust sub-modes from non-stationary signals.
8: Construct recursive datasets for 1-step, 2-step, and 3-step horizons.
9: Train heterogeneous ensemble M (Transformer, BPNN, ELM, XGBoost, and QRLSTM).
10: Generate prediction matrix $\hat{V}$ where each column represents a base model's output.
//Phase 3: MMODA Weight Optimization
11: Initialize dragonfly population X and initialize an empty Pareto Archive.
12: While $t < Iter_{max}$ do
13:  if t > 0 then  // Elite Opposition-Based Learning (EOBL) Strategy
14:   Select EN elite solutions from Archive using Roulette Wheel Selection.
15:   Compute dynamic search boundaries $[ea_j, eb_j]$ based on the current elite set.
16:   Generate opposite solutions: $\bar{X}_{i,j} = k \cdot (ea_j + eb_j) - X_{i,j}$.
17:   Re-evaluate F(w) and update X using non-dominated sorting.
18:  end if
19:  Update social coefficients {s, a, c, f, e} and inertia weight w using adaptive decay.
20:  for each individual i = 1 to SN do
21:   if neighbors exist within radius R then
22:    Calculate behaviors: Separation ($S_i$), Alignment ($A_i$), Cohesion ($C_i$), Food ($F_i$), Enemy ($E_i$).
23:    $\Delta x_{t+1} = (s S_i + a A_i + c C_i + f F_i + e E_i) + w \, \Delta x_t$.
24:    Update position: $x_{t+1} = x_t + \mu \, \Delta x_{t+1}$ (exponential step size, Section 2.5.2).
25:   else
26:    Perform Lévy flight-based stochastic search.  // Exploration through Random Walk
27:   end if
28:   Boundary handling: $x_{t+1} = \max(\min(x_{t+1}, Ub), Lb)$.
29:  end for
30:  Update Pareto Archive and prune dense regions using Crowding Distance.
31:  t = t + 1.
32: end while
//Phase 4: Probabilistic Synthesis & Output
33: Retrieve X (Best compromise weights) from the final Pareto Archive.
34: Calculate final deterministic forecast: $V_{final} = \sum_i w_i \cdot \hat{V}_i$.
35: Estimate Forecasting Intervals (FIs) using MMODA-optimized Kernel Density Estimation (KDE).
36: Return $V_{final}$, w, and interval evaluation metrics (PICP, PINAW).
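Two building blocks of the archive handling in Algorithm 1 can be sketched as follows: Pareto dominance filtering for minimization, and the choice of a single compromise solution from the final front. The algorithm does not state how the best compromise is chosen, so the rule used here (the point nearest to the ideal point after min-max normalization) is purely an assumption, as are the function names.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Keep only the objective vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def best_compromise(front):
    """Assumed rule: nearest point to the ideal after min-max normalization."""
    dims = range(len(front[0]))
    lo = [min(p[d] for p in front) for d in dims]
    hi = [max(p[d] for p in front) for d in dims]

    def score(p):
        return sum(
            ((p[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0) ** 2
            for d in dims
        )

    return min(front, key=score)
```

With objective pairs of the form (MAPE, error standard deviation), `non_dominated` would give the archive content and `best_compromise` one candidate weight vector; other selection rules, such as fuzzy membership ranking, would fit the same interface.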

3. Experimental Analysis

This section conducts four comprehensive experiments on two wind power datasets to thoroughly verify the forecast performance of the proposed synergistic forecasting framework. Before these experiments, basic preparations such as data collection and preprocessing, establishment of evaluation criteria, and model configuration were systematically completed to ensure a solid methodological foundation.

3.1. Data Source

The original wind speed sequences used in this study were collected from two wind farms located in Shandong, China, with a time interval of 15 min. Site 1 has a capacity of 99 MW and Site 2 has a capacity of 200 MW; both datasets were exported from the SCADA system. The two sites differ in terrain and power curve characteristics, enabling assessment of model robustness and generalization. For each dataset, a continuous segment of 20,000 data points was selected to form the experimental time series, and the forecasting framework adopts the multi-step forecasting method. To rigorously evaluate model generalization and prevent data leakage, a strict temporal partitioning strategy was employed. For each dataset, the first 70% of the data points were allocated as the training set, the following 15% as the validation set for hyperparameter tuning, and the final 15% as the test set for final performance evaluation. This sequential partitioning preserves the temporal order of the data, ensuring that future information is never used to predict past observations, which is a critical requirement for realistic forecasting scenarios. All model development, including parameter selection and ensemble weight optimization, was conducted exclusively on the training and validation sets. The test set was held out completely until the final evaluation stage, after all model configurations were frozen, to provide an unbiased estimate of generalization performance.
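The sequential 70/15/15 partition can be expressed as a short helper. This is a minimal sketch with a hypothetical function name; integer percentages are used so that the split points are exact.

```python
def temporal_split(series, train_pct=70, val_pct=15):
    """Split a time-ordered sequence into train, validation, and test parts."""
    n = len(series)
    train_end = n * train_pct // 100
    val_end = train_end + n * val_pct // 100
    # Slices keep the original order, so no later sample precedes an earlier one.
    return series[:train_end], series[train_end:val_end], series[val_end:]
```

For a segment of 20,000 points this gives 14,000 training, 3,000 validation, and 3,000 test samples, with every test index later than every training and validation index.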

3.2. Evaluation Criteria for Experimental Validation

To objectively assess the forecasting performance, well-defined evaluation criteria must be established. In this research, four performance indicators are used to systematically evaluate the forecast accuracy of the proposed model. The definitions and calculation formulas of each indicator are given in Table 3. In addition, to summarize the composition of all hybrid models used in the experiments for later reference, Table 4 lists the specific combination of decomposition, forecast, and optimization models used by each of them.
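Assuming the four indicators in Table 3 are the MAE, MAPE, RMSE, and SSE named in Experiment I and that they follow their standard definitions, they can be computed as in the sketch below. The function name and the dictionary layout are illustrative choices.

```python
import math

def point_metrics(actual, forecast):
    """Standard point-forecast error metrics for two equal-length sequences."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    sse = sum(e * e for e in errors)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE is undefined when an actual value is zero.
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "RMSE": math.sqrt(sse / n),
        "SSE": sse,
    }
```

Note that SSE grows with the number of samples, so it is only comparable between models evaluated on the same test segment.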

3.3. Comparison Forecasting Experiments

3.3.1. Experiment I: Comparison with Other MVMD-Based Models

A systematic evaluation was conducted on the proposed combined forecasting model and several benchmark models employing the same MVMD decomposition technique, using datasets from two target wind farms (Site 1 and Site 2). The specific parameter configurations for all forecasting models are detailed in Table 5. To comprehensively assess forecasting performance, four key evaluation metrics were adopted: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and Sum of Squared Errors (SSE). Detailed experimental results for different forecasting horizons (one-step, two-step, and three-step ahead) are summarized in Table 6.
The forecasting results at Site 1 confirmed the distinct superiority of the proposed combined model in both single-step and multi-step scenarios. Specifically, the MAPE values for one-step, two-step, and three-step forecasting were 1.9633%, 2.9035%, and 4.0199%, respectively, which were the smallest among all compared models. A summary of the relevant model characteristics for this site is provided in Figure 5, where the x-axis denotes the sample index at 15 min resolution.
Turning to Site 2, the proposed model also exhibited excellent performance. Taking one-step forecasting as an example, the proposed model achieved the optimal MAPE of 3.0607%, whereas the poorest-performing model, MVMD-XGBOOST, reached 10.4055%. Furthermore, in multi-step forecasting, the proposed model consistently outperformed others across all evaluation metrics, further validating its robustness in multi-step forecasting tasks.
Remark: The experimental results from both sites indicate that the proposed hybrid model is unequivocally effective for short-term wind speed forecasting. Compared to other MVMD-based hybrid models, the proposed model achieved the lowest average MAPE across the two datasets among all participating models and performed best on all four evaluation criteria, confirming its superior capability in reducing forecast errors and enhancing forecast reliability.

3.3.2. Experiment II: Benchmarking Against Models with Diverse Data Pretreatment

This study benchmarks the forecasting performance of hybrid models constructed upon four distinct data preprocessing methodologies: MVMD, VMD, EWT, and SSA. All models were tested on identical datasets collected from two wind farms (designated as Site 1 and Site 2). Table 7 provides the parameter configuration of these forecast models, and Table 8 summarizes the complete evaluation indicators.
For Site 1, the MVMD-based composite model (MVMD-MODA) shows the best forecast accuracy at all steps. Specifically, in the one-step forecast, it achieved the lowest MAE, MAPE, and RMSE, at 0.1953%, 1.9633%, and 0.3089%, respectively. Its MAPE is 9.539 percentage points lower than that of the second-best model. This advantage persists in the two-step and three-step forecasts. Among the other preprocessing techniques, the VMD-based model shows unstable errors, the SSA-based model performs moderately, and the EWT-based model ranks lowest in forecast accuracy.
For Site 2, the proposed MVMD-based model likewise secured the most favorable outcomes across all three forecast steps. In the one-step forecast, its MAPE reached 3.0032%, which is 5.8531%, 15.2797%, and 6.0832% lower than those of the forecast systems based on VMD, EWT, and SSA, respectively. A visual comparison between the forecast results generated by the MVMD-based combination strategy and the other preprocessing techniques for this site is provided in Figure 6, where the x-axis denotes the sample index at 15 min resolution.
Remark: The outcomes of Experiment II indicate that the proposed MVMD-based combination strategy consistently outperforms strategies employing other data preprocessing techniques, regardless of the forecast step size or the site location. This assertion is supported by the experimental data, which show that our model obtained markedly lower values for MAE, MAPE, RMSE, and SSE than the benchmark models. The average MAPE of the model over the three forecast steps is 5.8979%, which is the lowest among all models and 10.7% lower than that of the second-best model. These findings confirm the robustness and consistency of MVMD preprocessing in improving short-term wind speed forecasts.

3.3.3. Experiment III: The Advancement of MMODA in Addressing Multi-Objective Optimization Problems

In order to systematically evaluate the performance of MMODA in terms of convergence behavior and solution diversity, three classic ZDT benchmark functions were selected for comparative analysis against three mainstream multi-objective optimization techniques. The evaluation uses three performance indicators to examine the approximate Pareto front from multiple perspectives. The inverted generational distance (IGD) mainly reflects the closeness between the non-dominated solution set obtained by the algorithm and the true Pareto front. The hypervolume (HV) metric measures the volume of the objective space dominated by the solution set; the larger the value, the higher the overall quality of the solution set. The spacing (SP) metric evaluates the uniformity of the solution distribution, with lower values typically corresponding to more balanced distributions. In the experiment, all comparison algorithms were configured with a population size of 100 and independently executed 50 times on each test function to obtain statistical results.
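Two of these indicators are simple enough to sketch directly. The code below uses common textbook forms: IGD as the mean Euclidean distance from each reference point to its nearest obtained point, and Schott's spacing based on nearest-neighbour L1 distances. Whether the study uses exactly these variants is an assumption, and the function names are illustrative.

```python
import math

def igd(reference_front, approx_front):
    """Mean distance from each reference point to its nearest obtained point."""
    total = sum(min(math.dist(r, a) for a in approx_front) for r in reference_front)
    return total / len(reference_front)

def spacing(front):
    """Schott's spacing: spread of nearest-neighbour L1 distances in the front."""
    d = [
        min(
            sum(abs(x - y) for x, y in zip(p, q))
            for j, q in enumerate(front)
            if j != i
        )
        for i, p in enumerate(front)
    ]
    mean = sum(d) / len(d)
    return math.sqrt(sum((mean - di) ** 2 for di in d) / (len(d) - 1))
```

For ZDT1, whose true front satisfies $f_2 = 1 - \sqrt{f_1}$ on $f_1 \in [0, 1]$, a reference set can be sampled from that curve and passed as `reference_front`; a front that coincides with the reference gives an IGD of zero, and perfectly even spacing gives an SP of zero.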
Data presented in Table 9 clearly demonstrate that MMODA achieved the best overall performance across all test benchmarks. In terms of convergence, the algorithm obtained mean IGD values of 0.0045, 0.0056, and 0.0036, respectively, significantly lower than those of the other algorithms, demonstrating its superior convergence performance. In addition, MMODA achieved the highest HV value, with average improvements of $\bar{\Delta}HV_{MOGWO} = 30.31\%$, $\bar{\Delta}HV_{MOWA} = 19.20\%$, and $\bar{\Delta}HV_{MODA} = 12.29\%$ over the three compared algorithms. Meanwhile, its SP value also decreased, indicating that the non-dominated solution set is more uniformly distributed.
A visual comparison of the Pareto fronts in Figure 7 reveals that MMODA can generate non-dominated solution sets that are closest to the true front and evenly distributed across all three test functions. In contrast, although MODA and MOWA manage to approximate the true front to some extent, their obtained solution sets exhibit deficiencies in distribution range and density; meanwhile, MOGWO demonstrates significant deviation in the ZDT1 test, along with poor distribution uniformity. Comprehensive analysis indicates that the multiple enhancement mechanisms introduced in MMODA effectively improve the solution accuracy and distribution characteristics of the original MODA algorithm.
Building on these results, we further examined the effects of different optimization algorithms on the efficacy of the combined forecasting model. Parameter settings for each model are detailed in Table 10, and the corresponding performance metrics are summarized in Table 11. Data analysis indicates that in the single-step forecast task, the MMODA-optimized combined model achieved forecasting effectiveness comparable to models using other optimization algorithms, with their Mean Absolute Percentage Error values all remaining within the 3% to 3.5% range.
In addition, regardless of the dataset or the forecast horizon, the models optimized by MOGWO, MODA, and MOWA show highly similar forecast performance.
Remark: Synthesizing all experimental scenarios and evaluation metrics, the forecasting model employing the MMODA optimization algorithm demonstrates leading advantages based on MAE, MAPE, RMSE, and SSE. This fully attests to the model’s superior adaptability in short-term forecasting tasks. This enhancement directly translates to lower error rates and higher precision, clearly distinguishing it from the foundational MODA algorithm.

3.3.4. Experiment IV: Benchmarking Against Classic Individual Models

This section conducts a comprehensive investigation on the actual performance of the proposed short-term wind speed forecast hybrid system by comparing with five representative independent forecast models. Table 12 summarizes the parameter configurations of all comparison models. Table 13 provides detailed experimental results.
In the validation conducted for Site 1, the proposed model shows excellent forecast ability. In the one-step forecast, it obtained 1.9633% MAPE, which is much lower than the value recorded in other models. With the expansion of the forecast range to two and three steps, the model maintained its leading position. Compared with the BPNN model recording the highest MAPE value, the combined model achieved an average reduction of 44.7591% in MAPE within the range of one to three steps of forecast. Figure 8 provides an intuitive comparison of forecast effectiveness between different models at Site 1.
In the tests performed for Site 2, the proposed model achieves the highest accuracy in the one-step forecast, and all evaluation indicators are significantly better than those of the single models. When the forecast range is extended to two or three steps, the performance indicators of the model remain significantly better than those of all benchmark methods. This shows that the developed short-term wind speed forecasting framework offers significantly improved reliability and stability.
Remark: The comparison results highlight the clear advantages of the proposed model over traditional individual forecast techniques. The key evaluation metrics of the proposed model consistently take the most favorable values across all forecast ranges. This experimental evidence strongly supports the conclusion that, compared with traditional single models, the proposed combined forecast model is more competitive and effective in short-term wind speed forecasting.

3.4. Interval Forecasting

In wind power forecasting research, the traditional point forecast usually provides only a single numerical output. It can neither describe the inherent uncertainty of the forecasting process nor reflect potential errors and fluctuation characteristics, which limits its value in practical decision making. For this reason, the interval forecast method based on kernel density estimation (KDE) has attracted increasing attention. Compared with a point forecast, the interval estimate provided by KDE not only covers the range of possible future values but also displays the uncertainty structure, which is especially useful for wind power and other scenarios with strong volatility and complex distributions. Kernel density estimation is a non-parametric density estimation technique, expressed as:
$$f(x) = \frac{1}{Nh} \sum_{j=1}^{N} K_2\left(\frac{x - x_j}{h}\right)$$
where N represents the sample size, h is the bandwidth parameter, and $K_2(\cdot)$ represents the kernel function, whose specific form is:
$$K_2\left(\frac{x - x_j}{h}\right) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{(x - x_j)^2}{2h^2}\right]$$
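The following sketch evaluates the Gaussian kernel density estimate above and derives a central interval from it. How the study turns the density into a forecast interval is not spelled out here, so fitting the KDE to forecast residuals and adding its quantiles to the point forecast is an assumption, as are all function names.

```python
import math

def kde_density(samples, h, x):
    """Gaussian KDE: f(x) = 1/(N h) * sum_j K((x - x_j) / h)."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-((x - xj) ** 2) / (2.0 * h * h)) for xj in samples)

def kde_cdf(samples, h, x):
    """CDF of the same estimate: the average of normal CDFs centred at x_j."""
    scale = h * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - xj) / scale)) for xj in samples) / len(samples)

def kde_quantile(samples, h, p, tol=1e-9):
    """Invert the KDE CDF by bisection on a bracket that covers the samples."""
    lo, hi = min(samples) - 10.0 * h, max(samples) + 10.0 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kde_cdf(samples, h, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def kde_interval(point_forecast, residuals, h, level=0.95):
    """Assumed usage: shift the point forecast by the residual KDE quantiles."""
    tail = (1.0 - level) / 2.0
    lower = point_forecast + kde_quantile(residuals, h, tail)
    upper = point_forecast + kde_quantile(residuals, h, 1.0 - tail)
    return lower, upper
```

A smaller bandwidth h makes the density follow the residual samples more closely, while a larger one smooths it and tends to widen the interval, which is why h is the natural parameter to tune.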
The bandwidth parameter h of kernel density estimation has a key impact on the performance of the forecast interval. Therefore, this study uses the optimization algorithm to optimize the bandwidth h in the interval forecast based on KDE, and takes PICP and CWC as the optimization objectives. PICP is mainly used to evaluate the coverage of forecast interval to actual value and reflect the reliability of interval estimation. CWC comprehensively considers the coverage and width characteristics of the forecast interval to measure the stability of the interval. From the perspective of decision making, the high-quality forecast interval needs to pursue higher PICP and lower CWC. However, in the optimization process, the two often constrain each other.
To balance this contradiction, this study introduces a multi-objective optimization method to synchronously process these two indicators and optimize the key parameters of KDE. The core concept of this method is to construct PICP and CWC as multi-objective optimization problems and search for bandwidth parameters that can achieve the best balance between coverage range and interval narrowness. The optimization objective is defined by the following function:
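$$\max \; PICP = \frac{1}{N_t} \sum_{i=1}^{N_t} C_i^{(\alpha)}, \quad PICP \geq \alpha$$
$$\min \; CWC = PINAW \cdot \left(1 + \gamma \, e^{-\eta (PICP - \mu)}\right), \quad \eta > 0$$
$$PINAW = \frac{1}{N_t \cdot R} \sum_{i=1}^{N_t} \left[ U_i^{(\alpha)}(x_i) - L_i^{(\alpha)}(x_i) \right]$$
$$\gamma = \begin{cases} 0, & PICP \geq \mu \\ 1, & PICP < \mu \end{cases}$$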
In the equations, R indicates the actual distribution range of the forecast results; $U_i^{(\alpha)}(x_i)$ and $L_i^{(\alpha)}(x_i)$ correspond to the upper and lower limits of the forecast interval of the i-th sample under the confidence level α, respectively. This method can improve the coverage of the forecast interval, reduce the interval width, enhance the stability of the model, and yield excellent interval forecast performance and stronger generalization ability. The interval forecast results of dataset 1 are shown in Figure 9.
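The interval criteria above can be sketched as follows. Two points are assumptions rather than statements from the text: the coverage indicator is taken as 1 when the observation lies inside the interval and 0 otherwise, and R is taken as the range (maximum minus minimum) of the observed values.

```python
import math

def picp(actual, lower, upper):
    """Share of observations that fall inside their forecast interval."""
    hits = sum(1 for y, lo, hi in zip(actual, lower, upper) if lo <= y <= hi)
    return hits / len(actual)

def pinaw(actual, lower, upper):
    """Average interval width normalized by the range R of the observations."""
    r = max(actual) - min(actual)  # assumption: R is the observed target range
    return sum(hi - lo for lo, hi in zip(lower, upper)) / (len(actual) * r)

def cwc(actual, lower, upper, mu, eta):
    """PINAW with an exponential penalty that applies only when PICP < mu."""
    p = picp(actual, lower, upper)
    gamma = 0.0 if p >= mu else 1.0
    return pinaw(actual, lower, upper) * (1.0 + gamma * math.exp(-eta * (p - mu)))
```

When the coverage target is met, CWC reduces to PINAW, so narrower intervals are rewarded; once coverage drops below μ, the exponential term dominates and quickly penalizes the solution.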
Data from Table 14 show that the forecast intervals obtained using the MMODA-KDE model are of the highest quality, with corresponding CWC values of 0.0199 and 0.0363 at the 95% confidence level. The model's CWC results are consistently lower than those of the other comparative models, reflecting the efficacy of its forecasting strategy in raising the overall quality of the forecast intervals. Moreover, MMODA-KDE achieves the lowest CRPS values across all confidence levels and both datasets, indicating that its forecast distribution is not only well calibrated but also sharper than those obtained by competing methods. MMODA therefore balances the PICP and CWC objectives more effectively while also delivering superior CRPS.

4. Discussion

This section is structured into five key parts to comprehensively discuss the proposed forecasting model: the evaluation of its practical value, forecast accuracy, improvement and reliability, and the sensitivity analysis of the integrated forecasting system proposed.

4.1. Model Significance Testing: Diebold–Mariano (DM) Test

In order to evaluate the forecast performance from a statistical point of view, the Diebold–Mariano test was used to carry out a significance analysis of the forecast effect of the proposed model. The purpose of this test is to determine whether there is a statistically significant difference in forecast accuracy between the proposed combined forecast model and a benchmark model. The theoretical basis is as follows: under the given significance level α, if there is no significant difference in forecast performance between the proposed model and the comparison model, the null hypothesis $H_0$ is not rejected; otherwise, $H_0$ is rejected and the alternative hypothesis $H_1$ is accepted. The hypotheses are stated as:
$$H_0: E[L(error_1)] = E[L(error_2)]$$
$$H_1: E[L(error_1)] \neq E[L(error_2)]$$
Here, L denotes the loss function of the prediction errors, while $error_1$ and $error_2$ represent the forecast errors of the proposed model and a comparative model, respectively. The DM test statistic is defined as:
$$DM = \frac{\frac{1}{n} \sum_{i=1}^{n} \left[ L(error_{1,i}) - L(error_{2,i}) \right]}{\sqrt{S^2 / n}}$$
where $S^2$ denotes the estimated variance of the loss differential series $d_i = L(error_{1,i}) - L(error_{2,i})$.
After obtaining the DM statistic, compare it with the critical value Z α / 2 . If the absolute value of the statistic exceeds the critical value, the null hypothesis is rejected and it is considered that there is a significant difference between the proposed model and the comparative model. The average DM value of each forecast step is shown in Table 15, from which the following conclusions can be drawn:
First, across all forecast steps, the forecast performance of the proposed model differs significantly from that of the benchmark models. The weakest case is the three-step forecast for the XGBOOST model, whose DM value of 1.8101 rejects the null hypothesis only at the 10% significance level. Secondly, the MVMD-MMODA strategy is significantly superior to the combination models using other data preprocessing techniques: its DM statistics exceed the critical value at the 1% significance level. Compared with models using different optimization algorithms, the proposed model shows the same clear advantage. Finally, compared with the traditional single models, the DM statistics exceed the critical value at the 1% significance level in the one-step and two-step forecasts. In the three-step forecast, the MVMD-MMODA integration strategy remains significantly superior to the other methods at the 5% and 10% significance levels.
Comprehensive analysis shows that in most comparisons, the combination model proposed in this research has significant performance differences compared to other related models. This verifies that the model has better forecast accuracy, and the effectiveness and practical value of the research method in wind speed forecast.
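The DM statistic defined in this subsection can be computed as in the sketch below. The squared-error loss and the plain sample variance of the loss differential are assumptions: the text leaves the loss generic, and multi-step applications often replace $S^2$ with a long-run variance that includes autocovariance terms.

```python
import math
from statistics import NormalDist

def dm_test(errors_1, errors_2, loss=lambda e: e * e):
    """Return the DM statistic and a two-sided p-value from the normal law."""
    d = [loss(e1) - loss(e2) for e1, e2 in zip(errors_1, errors_2)]
    n = len(d)
    mean_d = sum(d) / n
    # Assumption: S^2 is the sample variance of the loss differential series.
    s2 = sum((di - mean_d) ** 2 for di in d) / (n - 1)
    stat = mean_d / math.sqrt(s2 / n)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return stat, p_value
```

With this sign convention, a negative statistic means the first model has the lower average loss, and the null hypothesis is rejected when the absolute value exceeds $Z_{\alpha/2}$ (about 1.645, 1.96, and 2.576 for the 10%, 5%, and 1% levels).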

4.2. Performance Improvements of the Proposed Model

In this section, four percentage improvement indicators (PMAE, PMAPE, PRMSE, PSSE) are used to comprehensively evaluate the forecast effect of the model. As shown in Table 16, these indicators correspond to the improvement levels of MAE, MAPE, RMSE, and SSE, respectively. The error improvement percentages between the proposed model and multiple comparison models were calculated, and the results are summarized in Table 17. The results show that the proposed strategy is superior to the benchmark methods in forecast accuracy. The main conclusions are as follows:
(a) Compared with the non-optimized hybrid models, the improvement is largest relative to MVMD-XGBoost. For Site 1, PMAE, PMAPE, PRMSE, and PSSE reach 76.7336%, 74.7581%, 77.3163%, and 89.2955%, respectively, which shows that the proposed model clearly outperforms the traditional hybrid models in forecasting ability.
(b) Compared with the hybrid models built on different data preprocessing methods, the improvement is equally significant. For example, at Site 2, the improvement rates relative to the integrated models based on EWT, SSA, and VMD reach 70.0690% in MAPE, 62.3612% in MAE, and 39.1098% in RMSE, respectively.
(c) Compared with the hybrid models built on different optimization algorithms, the improvement is also consistent. Specifically, relative to MODA, the MAE improvement rate at Site 1 reaches 31.111%, and the SSE improvement rate at Site 2 is 35.0400%, indicating a clear advantage for the proposed method. The comparisons with MOGWO, MOWA, and the other hybrid models show the same trend.
(d) Compared with the traditional single models, the proposed model shows the largest improvement. At Site 1, the improvement rate of every indicator relative to the Transformer model exceeds 93%. At Site 2, the improvement rate relative to the QRLSTM model also exceeds 89%, indicating that the combined strategy substantially improves forecast accuracy and is an effective wind power forecasting method.
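As a rough illustration of how such an improvement percentage can be computed, the sketch below assumes the conventional definition, (baseline error − proposed error) / baseline error × 100. Table 16 is not reproduced here, so this formula and the function name `improvement_percent` are assumptions. The example uses the one-step MAPE values of models #2 and #1 for Site 1 from Table 6 and is not intended to reproduce the aggregated figures in Table 17.

```python
def improvement_percent(baseline_error, proposed_error):
    """Relative error reduction of the proposed model, in percent.

    Assumed definition: (baseline - proposed) / baseline * 100.
    """
    if baseline_error == 0:
        raise ValueError("baseline error must be non-zero")
    return (baseline_error - proposed_error) / baseline_error * 100


# One-step MAPE at Site 1: MVMD-Transformer (#2) vs. proposed model (#1), Table 6
print(round(improvement_percent(3.87, 1.96), 2))  # 49.35
```

A positive value means the proposed model has the lower error; a negative value would indicate that the baseline performs better.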

4.3. Sensitivity Analysis

To evaluate the stability of the combined forecast model under different parameter configurations, a sensitivity analysis was carried out on the key parameters of the data processing module and the optimization algorithm. The effect of each parameter on forecast accuracy was assessed by observing how the model output fluctuates as the parameter changes, with the standard deviations of MAE, MAPE, RMSE, and SSE taken as the sensitivity indices (Table 18). The baseline parameter settings are the same as before, and the analysis uses the controlled-variable method, varying one parameter at a time while the others are held fixed. The specific results are shown in Table 19. The analysis shows that:
(a) Changes in the MVMD parameters have a limited impact on the forecast results. Taking the three-step forecast of Site 1 as an example, when K, α, and the maximum iteration count are adjusted, the corresponding standard deviations are 5.1783, 3.0638, and 4.4872, respectively. The overall fluctuation is controllable, indicating that MVMD adapts well to parameter changes.
(b) The parameter sensitivity of the MMODA optimization algorithm is significantly lower than that of MVMD, and the changes in evaluation indicators caused by parameter variations are minimal, indicating stable algorithm performance.
(c) As the forecast horizon increases, the sensitivity of the model to parameter variations tends to rise. Among all evaluation metrics, the standard deviation of three-step forecasts is higher than that of two-step forecasts, which in turn is higher than that of one-step forecasts. This indicates that in multi-step forecasting tasks, the rationality of parameter selection significantly affects the final forecast accuracy and requires special attention.
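The sensitivity index described above can be sketched as follows. The text does not state whether the sample or the population standard deviation is used, so the sample version is an assumption here, and the numbers in the example are placeholders chosen for easy checking rather than experimental results.

```python
from statistics import stdev


def sensitivity_index(metric_by_setting):
    """Sample standard deviation of one error metric across parameter settings.

    metric_by_setting maps a parameter value (for example K) to the error
    obtained with that value while all other parameters stay fixed.
    """
    return stdev(list(metric_by_setting.values()))


# Placeholder values only: an error metric observed for K = 4, 6, 8
print(sensitivity_index({4: 1.0, 6: 2.0, 8: 3.0}))  # 1.0
```

A smaller index means the metric changes less as the parameter is varied, which is how the standard deviations in Table 19 are interpreted.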

4.4. Seasonal Forecasting Performance Evaluation

To test the adaptability of the model to seasonal fluctuations in wind energy, this study conducted a simulation analysis over a longer and wider time range. Four representative seasonal intervals at Site 2 (January, March, July, and September) were selected as the evaluation dataset, and several forecast models that performed well in the previous experiments were selected as reference benchmarks. The parameter configurations of each model are shown in Table 20, and the final comparison results are shown in Table 21.
The data in Table 21 show that the proposed model outperforms the other comparison models on all evaluation indicators. Specifically, in the one-step forecast, the average MAPE of the proposed model over the four seasons is 4.3163%, whereas the MVMD-QRLSTM model, the VMD-based composite model, and the MOGWO-based composite model have MAPE values of 6.0323%, 11.8375%, and 4.8963%, respectively. From the perspective of seasonal performance, the model forecasts best in spring and winter, followed by autumn, while the forecast accuracy in summer is relatively low, which is consistent with the actual climate characteristics. In summary, the proposed model shows better adaptability, higher forecast accuracy, and greater stability when handling wind power forecasting tasks over a longer time range. This conclusion is further supported by Figure 10.

4.5. Generalization Ability and Overfitting Analysis

To further evaluate the generalization ability of the proposed framework under different geographical and climatic conditions, we collected an additional dataset from a third wind farm located in Gansu Province. This site has a total installed capacity of 66 MW and a 15 min sampling interval, but its terrain and climate differ from those of the two Shandong sites. The same temporal partitioning and multi-step forecasting strategy were applied. On this new dataset, the proposed model consistently achieved the lowest errors across one- to three-step forecasts among all compared models. For example, the one-step MAPE of the proposed model is 3.69%, which is 0.74 percentage points lower than that of the second-best model. These results indicate that the framework maintains strong forecasting performance under diverse geographic and climatic environments.
To further address concerns about possible overfitting, we performed a rolling time series cross-validation on the Site 1 dataset. Using a five-fold rolling temporal split, the mean one-step MAPE of the proposed model across the five test folds was 2.14%, with a standard deviation of only 0.27%. This result is highly consistent with the original test set performance. Moreover, the maximum cross-validation error was still significantly lower than the one-step MAPE of the benchmark MVMD–Transformer model (3.87%), indicating that the proposed model does not suffer from overfitting.
In addition, in the seasonal analysis experiment (Section 4.4), the proposed model achieved the lowest error across all seasons and forecast steps using the same set of parameters, without any targeted retraining or re-optimization. This further demonstrates the model's strong generalization ability and robustness to different seasonal weather patterns. Taken together, the cross-validation, third-site validation, and seasonal analysis indicate that the proposed framework, despite integrating multiple modules (MVMD decomposition, multi-objective optimization, and heterogeneous ensemble learning), does not overfit the experimental dataset. It maintains stable forecasting performance under different temporal divisions, meteorological conditions, and geographic settings, and thus offers good practical deployment value.
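The five-fold rolling temporal split used for the cross-validation can be sketched as below. The exact fold boundaries are not given in the text, so this expanding-window layout, where each fold trains on all data before its test block, is an assumption, and `rolling_splits` is a hypothetical helper name.

```python
def rolling_splits(n_samples, n_folds=5):
    """Yield (train_range, test_range) pairs for an expanding-window split.

    The series is cut into n_folds + 1 consecutive blocks. Fold k trains on
    the first k blocks and tests on block k + 1, so test data always lie
    after the training data in time.
    """
    block = n_samples // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(1, n_folds + 1):
        train_end = k * block
        # The last fold absorbs any remainder left by integer division
        test_end = n_samples if k == n_folds else train_end + block
        yield range(0, train_end), range(train_end, test_end)


for train_idx, test_idx in rolling_splits(60):
    print(len(train_idx), test_idx.start, test_idx.stop)
```

Averaging the per-fold MAPE and taking its standard deviation then gives summary figures of the kind reported above.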

4.6. Run Time

To analyze the computational efficiency of the proposed model, we recorded the running time of all relevant models. The average running time of the proposed model across all experimental scenarios was approximately 960 s. For comparison, the average running times of three representative benchmark models were approximately 500 s for MVMD-QRLSTM, 720 s for the VMD-based hybrid model, and 800 s for the MOGWO-based hybrid model. The proposed framework thus incurs a higher computational cost, which is expected given its integration of multiple computationally intensive components, including multi-channel joint decomposition via MVMD, heterogeneous ensemble learning with five base models, and the multi-objective iterative optimization process of MMODA. All experiments were conducted in the PyCharm 21.0.3+13-b509.11 integrated development environment. Future work may focus on improving computational efficiency through higher-performance hardware, parallel processing, or model compression strategies such as knowledge distillation.
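To put these figures in relative terms, the short calculation below derives the extra running time of the proposed model over each benchmark from the approximate averages quoted above. The dictionary keys and the helper name `overhead_percent` are illustrative.

```python
proposed_s = 960  # approximate average running time of the proposed model
benchmarks_s = {
    "MVMD-QRLSTM": 500,
    "VMD-based hybrid": 720,
    "MOGWO-based hybrid": 800,
}


def overhead_percent(proposed, baseline):
    """Extra running time of the proposed model relative to a baseline, in percent."""
    return (proposed - baseline) / baseline * 100


overheads = {
    name: round(overhead_percent(proposed_s, seconds), 1)
    for name, seconds in benchmarks_s.items()
}
print(overheads)
# {'MVMD-QRLSTM': 92.0, 'VMD-based hybrid': 33.3, 'MOGWO-based hybrid': 20.0}
```

By this measure, the proposed framework takes roughly 20% to 92% longer than the three benchmarks.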

5. Conclusions

As a cornerstone of the global renewable energy transition, wind power integration requires high-precision forecasting to mitigate the inherent stochasticity and non-stationarity of wind speed sequences. This research develops a soft computing framework that integrates advanced signal processing, heterogeneous ensemble learning, and evolutionary multi-objective optimization. An intelligent data refinement scheme based on Adaptive DBSCAN-LOF and XGBoost-driven SHAP analysis removes anomalies and quantifies feature importance, while multivariate variational mode decomposition (MVMD) decouples complex fluctuations into stable, band-limited modes to suppress forecast noise. To overcome the performance bottlenecks of individual predictors, a diverse ensemble comprising Transformer, BPNN, ELM, XGBoost, and QRLSTM was constructed to capture multi-scale temporal dependencies and nonlinear features simultaneously. The ensemble weights are determined by an improved optimizer, MMODA, which introduces an adaptive exponential step-size mechanism and Elite Opposition-Based Learning (EOBL) to prevent premature convergence. Furthermore, MMODA-optimized kernel density estimation (KDE) balances interval coverage against interval width, providing high-confidence probabilistic intervals for grid dispatching.
Empirical validation on real-world datasets shows that the proposed model achieved the lowest average MAPE values, 2.9622% and 4.4210% for Site 1 and Site 2, respectively, consistently outperforming hybrid models based on single neural networks or alternative denoising strategies. Benchmarking on ZDT functions further confirmed that MMODA attains the most accurate approximation of the true Pareto front while keeping practical forecasting MAPE values between 1.9% and 3.1%, which surpasses the results obtained with MOGWO, MODA, and MOWA. These findings, supported by statistical significance tests, sensitivity analyses, and seasonality experiments, indicate that the framework maintains a robust advantage across different environmental distributions.
Despite its high performance, this study has certain limitations. First, although the current framework excels in short-term forecasting, its computational complexity, driven by the MVMD decomposition and the multi-model ensemble, may pose challenges for real-time applications with extremely high-frequency data. Second, the current weight optimization focuses mainly on historical accuracy and may not fully adapt to sudden extreme weather events. Future research will focus on developing a more lightweight version of the framework to improve computational efficiency, and on integrating physical constraints from numerical weather prediction (NWP) to improve the model's physical interpretability and robustness under extreme conditions. Overall, the proposed integrated wind power forecasting system provides a theoretical and technical foundation for subsequent scheduling and decision-making processes, offering a stable and interpretable decision-support tool for modern energy market operations.

Author Contributions

Conceptualization, Z.S., H.X. and J.H.; methodology, J.G., Z.S., H.X., J.L. and J.H.; validation, J.G., H.Z. and H.X.; formal analysis, J.G.; investigation, J.G., H.Z., Z.S., H.X., J.L. and J.H.; data curation, J.G., H.Z., Z.S., H.X. and J.L.; writing—original draft, J.G. and J.H.; writing—review and editing, J.G., H.Z., Z.S., H.X., J.L. and J.H.; visualization, J.G., J.L. and J.H.; supervision, J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China under Grant 72103186, BTBU Research Foundation for Youth Scholars under Grant RFYS2025, and BTBU Digital Business Platform Project by BMEC.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, Y.; Guo, C.H.; Chen, X.J.; Jia, L.Q.; Guo, X.N.; Chen, R.S.; Zhang, M.-S.; Chen, Z.-Y.; Wang, H.-D. Carbon peak and carbon neutrality in China: Goals, implementation path and prospects. China Geol. 2021, 4, 720–746. [Google Scholar] [CrossRef]
  2. Shi, C.; Zhi, J.; Yao, X.; Zhang, H.; Yu, Y.; Zeng, Q.; Li, L.; Zhang, Y. How can China achieve the 2030 carbon peak goal—A crossover analysis based on low-carbon economics and deep learning. Energy 2023, 269, 126776. [Google Scholar] [CrossRef]
  3. Shi, H.; Heng, J.; Duan, H.; Li, H.; Chen, W.; Wang, P.; Cui, L.; Wang, S. Critical mineral constraints pressure energy transition and trade toward the Paris Agreement climate goals. Nat. Commun. 2025, 16, 4496. [Google Scholar] [CrossRef]
  4. Global Wind Energy Council. Global Wind Statistics. 2024. Available online: https://www.gwec.net/reports/globalwindreport#Download (accessed on 21 May 2026).
  5. Wang, Y.; Zou, R.; Liu, F.; Zhang, L.; Liu, Q. A review of wind speed and wind power forecasting with deep neural networks. Appl. Energy 2021, 304, 117766. [Google Scholar] [CrossRef]
  6. Mayer, M.J.; Yang, D. Pairing ensemble numerical weather prediction with ensemble physical model chain for probabilistic photovoltaic power forecasting. Renew. Sustain. Energy Rev. 2023, 175, 113171. [Google Scholar] [CrossRef]
  7. Tang, Y.; Guo, Y.; Shen, G.; Zhang, Z. Wind Power Forecasting: A Multi-task Learning Framework Based on Hybrid Model and Knowledge Sharing. Appl. Soft Comput. 2025, 189, 114532. [Google Scholar] [CrossRef]
  8. He, B.; Ye, L.; Pei, M.; Lu, P.; Dai, B.; Li, Z.; Wang, K. A combined model for short-term wind power forecasting based on the analysis of numerical weather prediction data. Energy Rep. 2022, 8, 929–939. [Google Scholar] [CrossRef]
  9. Choi, S.; Jung, E.S. Optimizing numerical weather prediction model performance using machine learning techniques. IEEE Access 2023, 11, 86038–86055. [Google Scholar] [CrossRef]
  10. Yang, M.; Guo, Y.; Huang, Y. Wind power ultra-short-term prediction method based on NWP wind speed correction and double clustering division of transitional weather process. Energy 2023, 282, 128947. [Google Scholar] [CrossRef]
  11. Tyass, I.; Khalili, T.; Rafik, M.; Abdelouahed, B.; Raihani, A.; Mansouri, K. Wind Speed Prediction Based on Statistical and Deep Learning Models. Int. J. Renew. Energy Dev. 2023, 12, 288–299. [Google Scholar] [CrossRef]
  12. Yan, S.; Hu, M. A multi-stage planning method for distribution networks based on ARIMA with error gradient sampling for source–load prediction. Sensors 2022, 22, 8403. [Google Scholar] [CrossRef]
  13. Mandal, A.K.; Sen, R.; Goswami, S.; Chakraborty, B. Comparative study of univariate and multivariate long short-term memory for very short-term forecasting of global horizontal irradiance. Symmetry 2021, 13, 1544. [Google Scholar] [CrossRef]
  14. Huang, Y.; Dai, X.; Wang, Q.; Zhou, D. A hybrid model for carbon price forecasting using GARCH and long short-term memory network. Appl. Energy 2021, 285, 116485. [Google Scholar] [CrossRef]
  15. Pleños, M. Time series forecasting using holt-winters exponential smoothing: Application to abaca fiber data. Zesz. Nauk. SGGW W Warszawie-Probl. Rol. Swiat. 2022, 22, 17–29. [Google Scholar] [CrossRef]
  16. Zhang, W.; Lin, Z.; Liu, X. Short-term offshore wind power forecasting-A hybrid model based on Discrete Wavelet Transform (DWT), Seasonal Autoregressive Integrated Moving Average (SARIMA), and deep-learning-based Long Short-Term Memory (LSTM). Renew. Energy 2022, 185, 611–628. [Google Scholar] [CrossRef]
  17. Tyass, I.; Bellat, A.; Raihani, A.; Mansouri, K.; Khalili, T. Wind speed prediction based on seasonal ARIMA model. In E3S Web of Conferences; EDP Sciences: London, UK, 2022; Volume 336, p. 00034. [Google Scholar]
  18. Al-Gounmeein, R.S.; Ismail, M.T.; Al-Hasanat, B.N.; Awajan, A.M. Improving models accuracy using kalman filter and holt-winters approaches based on ARFIMA Models. IAENG Int. J. Appl. Math. 2023, 53, 98–107. [Google Scholar]
  19. Zhang, S.; Liu, M.; Xie, M.; Lin, S. Two-stage short-term wind power probabilistic prediction using natural gradient boosting combined with neural network. Appl. Soft Comput. 2024, 159, 111669. [Google Scholar] [CrossRef]
  20. Ay, A.; Önal, K.; Top, A.; Haydaroğlu, C.; Kılıç, H.; Yıldırım, Ö. Comparative Deep Learning Models for Short-Term Wind Power Forecasting: A Real-World Case Study from Tokat Wind Farm, Türkiye. Symmetry 2025, 18, 11. [Google Scholar] [CrossRef]
  21. Hou, X.; Li, Y.; Liu, Z. An Enhanced Polar Lights Optimization Algorithm with Symmetry Mechanisms for Global Optimization and Its Application to Wind Power Forecasting. Symmetry 2025, 18, 61. [Google Scholar] [CrossRef]
  22. Khazaei, S.; Ehsan, M.; Soleymani, S.; Mohammadnezhad-Shourkaei, H. A high-accuracy hybrid method for short-term wind power forecasting. Energy 2022, 238, 122020. [Google Scholar] [CrossRef]
  23. Qu, K.; Si, G.; Shan, Z.; Kong, X.; Yang, X. Short-term forecasting for multiple wind farms based on transformer model. Energy Rep. 2022, 8, 483–490. [Google Scholar] [CrossRef]
  24. Zheng, X.; Guan, S.; Zhou, Y. Attention-enhanced kolmogorov–arnold networks for accurate wind power prediction. Appl. Soft Comput. 2026, 190, 114637. [Google Scholar] [CrossRef]
  25. Suo, L.; Peng, T.; Song, S.; Zhang, C.; Wang, Y.; Fu, Y.; Nazir, M.S. Wind speed prediction by a swarm intelligence based deep learning model via signal decomposition and parameter optimization using improved chimp optimization algorithm. Energy 2023, 276, 127526. [Google Scholar] [CrossRef]
  26. Hu, J.; Deng, Y.; Che, J. A novel wind power interval prediction method based on neural ensemble search and dynamic conformalized quantile regression. Appl. Soft Comput. 2025, 180, 113476. [Google Scholar] [CrossRef]
  27. Zhou, D.; Jia, Y.; Liu, G.; Li, J.; Xi, K.; Wang, Z.; Wang, X. Research on an Ultra-Short-Term Wind Power Forecasting Model Based on Multi-Scale Decomposition and Fusion Framework. Symmetry 2026, 18, 253. [Google Scholar] [CrossRef]
  28. Hanifi, S.; Cammarono, A.; Zare-Behtash, H. Advanced hyperparameter optimization of deep learning models for wind power prediction. Renew. Energy 2024, 221, 119700. [Google Scholar] [CrossRef]
  29. Guo, H.; Wang, J.; Li, Z.; Jin, Y. A multivariable hybrid prediction system of wind power based on outlier test and innovative multi-objective optimization. Energy 2022, 239, 122333. [Google Scholar] [CrossRef]
  30. Wang, J.; Wu, X.; Chen, M. Dynamic ensemble point-interval wind speed prediction system for data drift: Adaptive real-time feature decoupling and multi-level information fusion quantile regression. Appl. Soft Comput. 2026, 190, 114615. [Google Scholar] [CrossRef]
  31. Kumar, B.; Yadav, N. A novel hybrid algorithm based on Empirical Fourier decomposition and deep learning for wind speed forecasting. Energy Convers. Manag. 2024, 300, 117891. [Google Scholar] [CrossRef]
  32. Xiong, Z.; Yao, J.; Huang, Y.; Yu, Z.; Liu, Y. A wind speed forecasting method based on EMD-MGM with switching QR loss function and novel subsequence superposition. Appl. Energy 2024, 353, 122248. [Google Scholar] [CrossRef]
  33. He, Y.; Wang, Y. Short-term wind power prediction based on EEMD–LASSO–QRNN model. Appl. Soft Comput. 2021, 105, 107288. [Google Scholar] [CrossRef]
  34. Lei, P.; Ma, F.; Zhu, C.; Li, T. LSTM short-term wind power prediction method based on data preprocessing and variational modal decomposition for soft sensors. Sensors 2024, 24, 2521. [Google Scholar] [CrossRef]
  35. Xie, B.; Shi, S.; Liu, W. Integrated forecasting of marine renewable power: An adaptively Bayesian-optimized MVMD-LSTM framework for wind-solar-wave energy. Sustain. Energy Technol. Assess. 2025, 84, 104753. [Google Scholar] [CrossRef]
  36. Xian, H.; Che, J. Unified whale optimization algorithm based multi-kernel SVR ensemble learning for wind speed forecasting. Appl. Soft Comput. 2022, 130, 109690. [Google Scholar] [CrossRef]
  37. Mirjalili, S. Dragonfly algorithm: A new meta-heuristic optimization technique for solving single-objective, discrete, and multi-objective problems. Neural Comput. Appl. 2016, 27, 1053–1073. [Google Scholar] [CrossRef]
  38. Wu, H.; Gao, X.Z.; Li, K.; Heng, J.N. An improved brain-motivated network for forecasting day-ahead stock prices of electricity companies. Knowl.-Based Syst. 2025, 325, 114040. [Google Scholar] [CrossRef]
  39. Shambour, Q.; Al-Zyoud, M.; Almomani, O. Quantum-inspired hybrid metaheuristic feature selection with SHAP for optimized and explainable spam detection. Symmetry 2025, 17, 1716. [Google Scholar] [CrossRef]
  40. Jiang, T.; Liu, B.; Li, X.; Mazza, A.; Li, G.; Pons, E.; Huang, T. Subsynchronous oscillation source location in power system with high penetration of wind power using multivariate variational mode decomposition. IEEE Trans. Power Syst. 2025, 40, 4971–4983. [Google Scholar] [CrossRef]
  41. Bates, J.M.; Granger, C.W. The combination of forecasts. J. Oper. Res. Soc. 1969, 20, 451–468. [Google Scholar] [CrossRef]
  42. Nascimento, E.G.S.; de Melo, T.A.; Moreira, D.M. A transformer-based deep neural network with wavelet transform for forecasting wind speed and wind energy. Energy 2023, 278, 127678. [Google Scholar] [CrossRef]
  43. Shang, Y.; Miao, L.; Shan, Y.; Gnyawali, K.R.; Zhang, J.; Kattel, G. A hybrid ultra-short-term and short-term wind speed forecasting method based on CEEMDAN and GA-BPNN. Weather Forecast. 2022, 37, 415–428. [Google Scholar] [CrossRef]
  44. Wang, J.; Niu, X.; Zhang, L.; Liu, Z.; Huang, X. A wind speed forecasting system for the construction of a smart grid with two-stage data processing based on improved ELM and deep learning strategies. Expert Syst. Appl. 2024, 241, 122487. [Google Scholar] [CrossRef]
  45. Wu, X.; Cheng, H.; Wu, B.; Jiang, N.; Huang, Y. A regional XGBoost-based predictive model for the cumulative hours of wind speed levels. Atmos. Res. 2026, 327, 108336. [Google Scholar] [CrossRef]
  46. Zhu, J.; He, Y. A Large-Scale Multiobjective Evolutionary Quantile Estimation Model for Wind Power Probabilistic Forecasting. IEEE Trans. Evol. Comput. 2024, 29, 2244–2257. [Google Scholar] [CrossRef]
  47. Liu, Z.; Jiang, P.; Zhang, L.; Niu, X. A combined forecasting model for time series: Application to short-term wind speed forecasting. Appl. Energy 2020, 259, 114137.1–114137.25. [Google Scholar] [CrossRef]
Figure 1. Framework of the multi-model forecasting system.
Figure 2. Feature contribution and selection results quantified by SHAP interaction analysis.
Figure 3. Correlation diagram between various features.
Figure 4. Flowchart of the proposed integrated wind power forecasting system.
Figure 5. Performance comparison of multi-step forecast in Experiment I for Site 1.
Figure 6. Performance comparison of multi-step forecast in Experiment II for Site 2.
Figure 7. Comparison of Pareto fronts obtained by MMODA and other multi-objective optimizers on ZDT test suites.
Figure 8. Performance comparison of multi-step forecast in Experiment IV for Site 1.
Figure 9. Interval forecasting results of Site 1.
Figure 10. The performance of the proposed model across the four seasons.
Table 1. Comparative analysis of mainstream wind power forecasting paradigms.
| Method Category | Feature | Advantage | Shortcoming |
|---|---|---|---|
| Physics-based methods (such as NWP) | Based on meteorological principles and fluid dynamics | (1) Good long-term forecast effect; (2) Clear physical meaning | (1) The calculation is complex and time-consuming; (2) High demand for data; (3) Short-term forecast |
| Time series methods (such as ARIMA, GARCH) | Linear model, parameter estimation | (1) High efficiency of short-term forecast; (2) Mature mathematical theory; (3) Only historical power data is needed | (1) Difficult to capture complex nonlinear relationships; (2) Poor precision of long-term forecast; (3) Weak handling of multiple variables |
| Intelligent algorithms (such as machine learning, deep learning) | Deep structure, automatic feature extraction | (1) Powerful nonlinear fitting ability; (2) Excellent fault tolerance and generalization; (3) Accurate capture of temporal dependencies | (1) Requires access to extensive, high-quality data; (2) High risk of overfitting; (3) Black-box nature and poor interpretability |
| Mixed methods | Multi-technology integration, complementing each other's strengths and weaknesses | (1) Complementary advantages, balancing bias and variance; (2) Strong flexibility; (3) Better robustness | (1) High system complexity; (2) The computational cost may be higher; (3) Subjective weight allocation |
Table 2. Architectural and training hyperparameter settings of the base models.
| Parameter | Transformer | XGBoost | BPNN | ELM | QRLSTM |
|---|---|---|---|---|---|
| Network Architecture | 2 Attention blocks + Flatten + 2 Dense (256, 128) | Gradient boosting trees (1500 trees, max depth 7) | 3 Dense layers (256, 128) | Single hidden-layer feedforward (500 hidden neurons) | Bidirectional LSTM (192 units) + Attention + Dense (128, 64) |
| Activation Function | ReLU | - | tanh | ReLU | Swish |
| Dropout Rate | 0.3 | - | - | - | 0.2 |
| Optimizer | Adam | - | Adam | Adam | Adam |
| Learning Rate | 0.0005 | 0.015 (tree learning rate) | 0.001 | - | 0.0005 |
| Batch Size | 128 | - | 200 | - | 128 |
| Loss Function | MAE | MAE | MAE | MAE | MAE |
| Output Layer | Dense | Weighted sum of tree leaves | Dense | Linear weighted sum | Dense |
Table 3. Performance metrics used in experimental evaluation.
| Metric | Definition | Equation |
|---|---|---|
| MAE | Average value of absolute error between forecast value and true value | $\mathrm{MAE} = \frac{1}{M}\sum_{i=1}^{M}\left|\hat{e}_i - e_i\right|$ |
| MAPE | Average percentage of relative error between forecast value and true value | $\mathrm{MAPE} = \frac{1}{M}\sum_{i=1}^{M}\left|\frac{e_i - \hat{e}_i}{e_i}\right| \times 100\%$ |
| RMSE | The square root of the mean value of the square sum of the errors between the forecast value and the true value | $\mathrm{RMSE} = \sqrt{\frac{1}{M}\sum_{i=1}^{M}\left(\hat{e}_i - e_i\right)^2}$ |
| SSE | Sum of squares of errors between forecast value and true value | $\mathrm{SSE} = \sum_{i=1}^{M}\left(e_i - \hat{e}_i\right)^2$ |
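The four metrics in Table 3 can be written out directly, as in the minimal sketch below. Here `y_true` and `y_pred` stand for the true values $e_i$ and the forecast values $\hat{e}_i$; the function names are illustrative, and the MAPE assumes no true value is zero.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be non-zero)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true) * 100


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def sse(y_true, y_pred):
    """Sum of squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


y_true = [2.0, 4.0]
y_pred = [1.0, 5.0]
print(mae(y_true, y_pred), mape(y_true, y_pred), rmse(y_true, y_pred), sse(y_true, y_pred))
# 1.0 37.5 1.0 2.0
```

Note that SSE grows with the number of test samples, whereas the other three metrics are averages, which is why the SSE columns in the result tables are orders of magnitude larger.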
Table 4. Composition of different hybrid models used in the experiments.
| Model | Decomposition Model | Prediction Model | Optimization Model |
|---|---|---|---|
| #1 | MVMD | Transformer, BPNN, ELM, XGBoost, QRLSTM | MMODA |
| #2 | MVMD | Transformer | - |
| #3 | MVMD | BPNN | - |
| #4 | MVMD | ELM | - |
| #5 | MVMD | XGBoost | - |
| #6 | MVMD | QRLSTM | - |
| #7 | VMD | Transformer, BPNN, ELM, XGBoost, QRLSTM | MMODA |
| #8 | EWT | Transformer, BPNN, ELM, XGBoost, QRLSTM | MMODA |
| #9 | SSA | Transformer, BPNN, ELM, XGBoost, QRLSTM | MMODA |
| #10 | MVMD | Transformer, BPNN, ELM, XGBoost, QRLSTM | MOGWO |
| #11 | MVMD | Transformer, BPNN, ELM, XGBoost, QRLSTM | MOWA |
| #12 | MVMD | Transformer, BPNN, ELM, XGBoost, QRLSTM | MODA |
| #13 | - | Transformer | - |
| #14 | - | BPNN | - |
| #15 | - | ELM | - |
| #16 | - | XGBoost | - |
| #17 | - | QRLSTM | - |
Table 5. Parameter configurations for the proposed and benchmark models in Experiment I.
| Algorithm | Parameters | Value |
|---|---|---|
| MMODA | Iteration Number | 200 |
| MMODA | Archive Size | 100 |
| MMODA | Dragonfly Number | 200 |
| MVMD | Number of Modes (K) | 6 |
| MVMD | Penalty Factor (α) | 200 |
| MVMD | Convergence Tolerance (τ) | $1 \times 10^{-7}$ |
| MVMD | Iteration Number | 800 |
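Table 6. Comparison table of forecasting performance between the proposed model and the hybrid model using the same preprocessing technique (Experiment I).
| Dataset | Model | 1-Step MAE | 1-Step MAPE | 1-Step RMSE | 1-Step SSE | 2-Step MAE | 2-Step MAPE | 2-Step RMSE | 2-Step SSE | 3-Step MAE | 3-Step MAPE | 3-Step RMSE | 3-Step SSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Site 1 | #2 | 0.51 | 3.87 | 0.70 | 1938.86 | 0.64 | 4.74 | 0.76 | 2651.08 | 0.84 | 6.07 | 1.06 | 4034.98 |
| Site 1 | #3 | 0.36 | 2.64 | 0.41 | 949.71 | 0.56 | 3.97 | 0.55 | 1869.65 | 0.69 | 4.78 | 0.72 | 3106.50 |
| Site 1 | #4 | 0.81 | 8.85 | 1.12 | 6768.19 | 1.11 | 10.03 | 1.51 | 9129.16 | 1.33 | 12.52 | 1.95 | 13,659.40 |
| Site 1 | #5 | 0.93 | 9.71 | 1.68 | 8272.10 | 1.45 | 11.60 | 2.04 | 10,202.56 | 1.64 | 13.18 | 2.34 | 16,831.88 |
| Site 1 | #6 | 0.32 | 2.51 | 0.41 | 893.30 | 0.50 | 3.63 | 0.60 | 1859.47 | 0.63 | 4.57 | 0.78 | 3058.16 |
| Site 1 | #1 | 0.20 | 1.96 | 0.31 | 612.05 | 0.31 | 2.90 | 0.46 | 1349.40 | 0.46 | 4.02 | 0.63 | 1939.05 |
| Site 2 | #2 | 1.79 | 5.94 | 2.25 | 12,211.10 | 2.15 | 6.22 | 2.78 | 19,690.33 | 2.68 | 7.17 | 3.37 | 27,806.99 |
| Site 2 | #3 | 1.45 | 4.07 | 1.86 | 9696.00 | 1.93 | 5.86 | 2.55 | 17,294.83 | 2.27 | 6.61 | 3.02 | 25,610.00 |
| Site 2 | #4 | 2.44 | 7.80 | 3.11 | 20,123.00 | 2.82 | 9.28 | 4.35 | 32,993.84 | 3.29 | 10.86 | 5.48 | 54,264.30 |
| Site 2 | #5 | 3.92 | 10.41 | 4.78 | 58,981.02 | 4.37 | 12.30 | 6.50 | 72,845.71 | 6.44 | 13.06 | 8.05 | 102,647.72 |
| Site 2 | #6 | 1.39 | 3.92 | 1.98 | 9846.33 | 1.90 | 5.33 | 2.71 | 17,541.32 | 2.33 | 6.45 | 3.16 | 25,513.92 |
| Site 2 | #1 | 1.03 | 3.06 | 1.45 | 7677.29 | 1.62 | 4.61 | 2.33 | 15,598.02 | 2.00 | 5.59 | 2.65 | 22,066.52 |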
Table 7. Parameter settings of models with a variety of data preprocessing methods in Experiment II.
| Algorithm | Parameters | Value |
|---|---|---|
| MMODA | Iteration Number | 200 |
| MMODA | Archive Size | 100 |
| MMODA | Dragonfly Number | 200 |
| MVMD | Number of Modes (K) | 6 |
| MVMD | Penalty Factor (α) | 200 |
| MVMD | Convergence Tolerance (τ) | $1 \times 10^{-7}$ |
| MVMD | Iteration Number | 800 |
| VMD | Number of Modes (K) | 6 |
| VMD | Penalty Factor (α) | 2000 |
| VMD | Convergence Tolerance (τ) | $1 \times 10^{-7}$ |
| EWT | Number of decomposed modes | 6 |
| SSA | Window Length | 24 |
| SSA | Principal Component Decomposition Number | 6 |
Table 8. Comparison table of forecasting performance between hybrid models using different preprocessing techniques and the proposed model (Experiment II).
| Dataset | Model | 1-Step MAE | 1-Step MAPE | 1-Step RMSE | 1-Step SSE | 2-Step MAE | 2-Step MAPE | 2-Step RMSE | 2-Step SSE | 3-Step MAE | 3-Step MAPE | 3-Step RMSE | 3-Step SSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Site 1 | #7 | 1.87 | 11.55 | 2.79 | 28,527.41 | 2.03 | 12.35 | 3.03 | 31,768.26 | 2.10 | 13.17 | 3.06 | 34,091.31 |
| Site 1 | #8 | 4.00 | 24.19 | 5.81 | 98,205.49 | 4.49 | 28.46 | 6.49 | 126,804.46 | 5.00 | 31.54 | 7.20 | 158,426.32 |
| Site 1 | #9 | 2.34 | 13.13 | 3.61 | 46,432.62 | 2.42 | 13.56 | 3.74 | 50,770.42 | 2.41 | 13.62 | 4.00 | 52,759.14 |
| Site 1 | #1 | 0.20 | 1.96 | 0.31 | 612.05 | 0.31 | 2.90 | 0.46 | 1349.40 | 0.46 | 4.02 | 0.63 | 1939.05 |
| Site 2 | #7 | 3.31 | 8.86 | 5.08 | 55,075.76 | 3.86 | 10.83 | 5.80 | 60,986.13 | 4.04 | 12.46 | 6.66 | 66,625.38 |
| Site 2 | #8 | 6.04 | 18.28 | 8.46 | 84,618.16 | 7.15 | 19.15 | 9.25 | 92,519.74 | 8.22 | 21.66 | 11.02 | 110,212.15 |
| Site 2 | #9 | 3.57 | 9.09 | 5.02 | 59,150.76 | 3.81 | 10.064 | 5.81 | 68,135.43 | 4.03 | 10.96 | 6.95 | 70,474.49 |
| Site 2 | #1 | 0.75 | 3.00 | 1.18 | 5146.20 | 1.25 | 6.54 | 2.54 | 25,524.13 | 2.29 | 8.16 | 3.34 | 34,405.89 |
Table 9. Average results over 50 runs of the performance metrics for MOT.
| Function | Algorithm | IGD | HV | SP |
|---|---|---|---|---|
| ZDT1 | MOGWO | 0.0298 | 0.8331 | 0.0254 |
| ZDT1 | MOWA | 0.0176 | 0.8512 | 0.0154 |
| ZDT1 | MODA | 0.0160 | 0.8538 | 0.0136 |
| ZDT1 | MMODA | 0.0045 | 0.8910 | 0.0050 |
| ZDT3 | MOGWO | 0.0186 | 0.7093 | 0.0216 |
| ZDT3 | MOWA | 0.0248 | 0.8138 | 0.0304 |
| ZDT3 | MODA | 0.0284 | 0.8932 | 0.0673 |
| ZDT3 | MMODA | 0.0056 | 0.9687 | 0.0074 |
| ZDT6 | MOGWO | 0.1605 | 0.3635 | 0.1494 |
| ZDT6 | MOWA | 0.0342 | 0.4186 | 0.1568 |
| ZDT6 | MODA | 0.0525 | 0.4649 | 0.1336 |
| ZDT6 | MMODA | 0.0036 | 0.6239 | 0.0338 |
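For readers unfamiliar with the IGD column, the sketch below computes inverted generational distance in its common form: the mean Euclidean distance from each point of a reference Pareto front to the nearest obtained solution, so lower values are better. Whether the study uses exactly this variant is an assumption. The reference front is sampled from the known ZDT1 front $f_2 = 1 - \sqrt{f_1}$, and the shifted set is a synthetic stand-in for an optimizer's output, not data from the experiments.

```python
import math


def igd(reference_front, obtained_front):
    """Mean distance from each reference point to its nearest obtained point."""
    total = 0.0
    for ref in reference_front:
        total += min(math.dist(ref, sol) for sol in obtained_front)
    return total / len(reference_front)


# True Pareto front of ZDT1 sampled at 101 points: f2 = 1 - sqrt(f1), f1 in [0, 1]
reference = [(i / 100, 1 - math.sqrt(i / 100)) for i in range(101)]

# Synthetic approximation: the same points shifted away from the front
shifted = [(f1 + 0.01, f2 + 0.01) for f1, f2 in reference]

print(igd(reference, reference))  # 0.0
print(igd(reference, shifted) > 0)  # True
```

A perfect approximation gives an IGD of zero, and any gap or offset from the true front increases it, which is why the smallest IGD values in Table 9 indicate the closest fit.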
Table 10. Comparative model parameter specifications in Experiment III.
| Algorithm | Parameters | Value |
|---|---|---|
| MMODA | Iteration Number | 200 |
| MMODA | Archive Size | 100 |
| MMODA | Dragonfly Number | 200 |
| MODA | Iteration Number | 200 |
| MODA | Archive Size | 100 |
| MODA | Dragonfly Number | 200 |
| MOGWO | Iteration Number | 200 |
| MOGWO | Archive Size | 100 |
| MOGWO | Dragonfly Number | 200 |
| MOWA | Iteration Number | 200 |
| MOWA | Archive Size | 100 |
| MOWA | Dragonfly Number | 200 |
| MVMD | Number of Modes (K) | 6 |
| MVMD | Penalty Factor (α) | 200 |
| MVMD | Convergence Tolerance (τ) | $1 \times 10^{-7}$ |
Table 11. Table of forecasting performance for combined models using different optimization algorithms and the proposed model (Experiment III).

| Dataset | Model | 1-Step MAE | 1-Step MAPE | 1-Step RMSE | 1-Step SSE | 2-Step MAE | 2-Step MAPE | 2-Step RMSE | 2-Step SSE | 3-Step MAE | 3-Step MAPE | 3-Step RMSE | 3-Step SSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Site 1 | #10 | 0.25 | 2.00 | 0.38 | 639.54 | 0.40 | 3.01 | 0.59 | 1502.85 | 0.52 | 4.08 | 0.71 | 2085.38 |
| Site 1 | #11 | 0.29 | 2.02 | 0.40 | 645.18 | 0.46 | 3.21 | 0.64 | 1576.21 | 0.55 | 4.10 | 0.76 | 2132.40 |
| Site 1 | #12 | 0.31 | 2.19 | 0.43 | 671.21 | 0.49 | 3.35 | 0.67 | 1636.48 | 0.59 | 4.12 | 0.82 | 2433.19 |
| Site 1 | #1 | 0.1953 | 1.96 | 0.31 | 612.05 | 0.31 | 2.90 | 0.46 | 1349.40 | 0.46 | 4.02 | 0.63 | 1939.05 |
| Site 2 | #10 | 0.82 | 3.27 | 1.21 | 5419.10 | 1.74 | 6.90 | 3.01 | 32,934.90 | 2.42 | 8.77 | 3.76 | 42,103.85 |
| Site 2 | #11 | 0.86 | 3.40 | 1.22 | 5510.77 | 1.96 | 7.19 | 3.11 | 34,188.46 | 2.51 | 8.91 | 3.94 | 47,131.00 |
| Site 2 | #12 | 0.90 | 3.55 | 1.31 | 6067.09 | 2.11 | 7.47 | 3.32 | 39,913.54 | 2.63 | 9.11 | 4.08 | 54,190.97 |
| Site 2 | #1 | 0.75 | 3.00 | 1.18 | 5146.20 | 1.25 | 6.54 | 2.54 | 25,524.13 | 2.29 | 8.16 | 3.34 | 34,405.89 |
Table 12. Key parameter specifications of the five benchmark models in Experiment IV.

| Algorithm | Parameters | Value |
|---|---|---|
| MMODA | Iteration Number | 200 |
| MMODA | Archive Size | 100 |
| MMODA | Dragonfly Number | 200 |
| MVMD | Number of Modes (K) | 6 |
| MVMD | Penalty Factor (α) | 200 |
| MVMD | Convergence Tolerance (τ) | 1 × 10⁻⁷ |
| MVMD | Iteration Number | 800 |
Table 13. Comparison table of forecasting performance between classic individual model and proposed model (Experiment IV).

| Dataset | Model | 1-Step MAE | 1-Step MAPE | 1-Step RMSE | 1-Step SSE | 2-Step MAE | 2-Step MAPE | 2-Step RMSE | 2-Step SSE | 3-Step MAE | 3-Step MAPE | 3-Step RMSE | 3-Step SSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Site 1 | #13 | 6.95 | 40.90 | 8.97 | 314,039.60 | 7.48 | 44.25 | 9.45 | 401,050.73 | 8.28 | 48.48 | 10.50 | 492,449.14 |
| Site 1 | #14 | 6.81 | 42.75 | 9.19 | 320,392.14 | 7.79 | 48.39 | 10.51 | 414,616.51 | 8.39 | 52.02 | 11.48 | 500,144.95 |
| Site 1 | #15 | 7.14 | 43.93 | 9.27 | 328,114.26 | 7.79 | 48.87 | 10.53 | 416,331.02 | 8.66 | 53.40 | 11.83 | 512,383.10 |
| Site 1 | #16 | 6.65 | 39.88 | 8.61 | 298,294.13 | 6.88 | 46.02 | 9.38 | 393,719.17 | 7.26 | 50.40 | 10.56 | 497,638.88 |
| Site 1 | #17 | 7.06 | 40.44 | 9.12 | 325,055.42 | 7.54 | 42.75 | 10.60 | 402,526.19 | 8.18 | 49.18 | 11.69 | 509,645.05 |
| Site 1 | #1 | 0.20 | 1.96 | 0.31 | 612.05 | 0.31 | 2.90 | 0.46 | 1349.40 | 0.46 | 4.02 | 0.63 | 1939.05 |
| Site 2 | #13 | 20.77 | 60.71 | 26.53 | 2,007,934.75 | 22.88 | 69.17 | 30.52 | 2,347,069.38 | 24.77 | 76.25 | 32.91 | 2,797,115.64 |
| Site 2 | #14 | 30.01 | 55.22 | 34.40 | 3,056,516.70 | 31.01 | 59.39 | 39.29 | 3,703,863.33 | 33.59 | 66.96 | 41.29 | 4,150,954.83 |
| Site 2 | #15 | 29.61 | 69.11 | 34.43 | 2,893,836.89 | 30.31 | 71.00 | 37.73 | 3,080,380.00 | 31.61 | 73.24 | 40.34 | 3,515,398.96 |
| Site 2 | #16 | 25.27 | 52.93 | 29.10 | 2,781,294.43 | 27.27 | 57.39 | 32.22 | 2,990,227.05 | 28.76 | 65.96 | 34.94 | 3,268,052.06 |
| Site 2 | #17 | 20.70 | 49.84 | 25.29 | 2,056,827.05 | 22.87 | 53.72 | 28.11 | 2,305,925.69 | 23.78 | 60.87 | 31.29 | 2,698,243.36 |
| Site 2 | #1 | 0.75 | 3.00 | 1.18 | 5146.20 | 1.25 | 6.54 | 2.54 | 25,524.13 | 2.29 | 8.16 | 3.34 | 34,405.89 |
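Table 14. Summary of interval forecasting quality metrics for different models.

| Dataset | Model | Confidence Level | PICP | PINAW | CWC | CRPS |
|---|---|---|---|---|---|---|
| Site 1 | MMODA-KDE | 95% | 0.9719 | 0.0199 | 0.0199 | 0.2324 |
| Site 1 | MMODA-KDE | 90% | 0.9500 | 0.0156 | 0.0156 | 0.2215 |
| Site 1 | MMODA-KDE | 85% | 0.9227 | 0.0125 | 0.0125 | 0.2134 |
| Site 1 | MORIME-KDE | 95% | 0.9465 | 0.0199 | 0.0199 | 0.2385 |
| Site 1 | MORIME-KDE | 90% | 0.9055 | 0.0178 | 0.0178 | 0.2279 |
| Site 1 | MORIME-KDE | 85% | 0.8525 | 0.0155 | 0.0155 | 0.2177 |
| Site 1 | MOWOA-KDE | 95% | 0.9536 | 0.0254 | 0.0254 | 0.2496 |
| Site 1 | MOWOA-KDE | 90% | 0.9090 | 0.0204 | 0.0204 | 0.2348 |
| Site 1 | MOWOA-KDE | 85% | 0.8536 | 0.0174 | 0.0174 | 0.2217 |
| Site 1 | Thumb KDE | 95% | 0.9588 | 0.0224 | 0.0224 | 0.2423 |
| Site 1 | Thumb KDE | 90% | 0.9361 | 0.0187 | 0.0187 | 0.2326 |
| Site 1 | Thumb KDE | 85% | 0.8943 | 0.0164 | 0.0164 | 0.2195 |
| Site 2 | MMODA-KDE | 95% | 0.9635 | 0.0363 | 0.0363 | 0.6840 |
| Site 2 | MMODA-KDE | 90% | 0.9279 | 0.0293 | 0.0293 | 0.6635 |
| Site 2 | MMODA-KDE | 85% | 0.8988 | 0.0243 | 0.0243 | 0.6319 |
| Site 2 | MORIME-KDE | 95% | 0.9508 | 0.0404 | 0.0404 | 0.7139 |
| Site 2 | MORIME-KDE | 90% | 0.9112 | 0.0319 | 0.0319 | 0.6972 |
| Site 2 | MORIME-KDE | 85% | 0.8524 | 0.0271 | 0.0271 | 0.6615 |
| Site 2 | MOWOA-KDE | 95% | 0.9591 | 0.0403 | 0.0403 | 0.7055 |
| Site 2 | MOWOA-KDE | 90% | 0.9176 | 0.0326 | 0.0326 | 0.6836 |
| Site 2 | MOWOA-KDE | 85% | 0.8870 | 0.0277 | 0.0277 | 0.6519 |
| Site 2 | Thumb KDE | 95% | 0.9586 | 0.0411 | 0.0411 | 0.6936 |
| Site 2 | Thumb KDE | 90% | 0.9198 | 0.0330 | 0.0330 | 0.6794 |
| Site 2 | Thumb KDE | 85% | 0.8923 | 0.0281 | 0.0281 | 0.6483 |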
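For readers who wish to reproduce interval metrics of the kind reported in Table 14, the sketch below computes PICP and PINAW under their commonly used definitions: PICP is the fraction of observations falling inside the forecast interval, and PINAW is the mean interval width normalized by the range of the observations. The function names and the five-point data set are hypothetical and purely illustrative; CWC and CRPS are omitted because their exact forms depend on choices not restated here.

```python
def picp(observed, lower, upper):
    """Prediction interval coverage probability: share of covered targets."""
    covered = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return covered / len(observed)


def pinaw(observed, lower, upper):
    """Mean interval width divided by the range of the observed targets."""
    target_range = max(observed) - min(observed)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(observed)
    return mean_width / target_range


# Hypothetical observations and interval bounds (not data from the study).
observed = [10.0, 12.0, 15.0, 11.0, 20.0]
lower = [9.0, 11.0, 16.0, 10.0, 18.0]
upper = [11.0, 13.0, 18.0, 12.0, 21.0]

coverage = picp(observed, lower, upper)   # 4 of 5 targets are covered
width = pinaw(observed, lower, upper)     # mean width 2.2 over a range of 10
```

A good interval forecast combines a PICP at or above the nominal confidence level with a small PINAW, which is the trade-off the multi-objective tuning of the kernel density estimator targets.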
Table 15. DM test statistics for forecasting performance among models.

| Model | 1-Step | p-Value | 2-Step | p-Value | 3-Step | p-Value |
|---|---|---|---|---|---|---|
| #13 | 7.9816 * | <0.001 | 4.8600 * | <0.001 | 2.0191 ** | <0.001 |
| #14 | 6.5231 * | <0.001 | 3.7945 * | <0.001 | 2.0071 ** | <0.001 |
| #15 | 6.5540 * | <0.001 | 5.2375 * | <0.001 | 1.9901 ** | <0.001 |
| #16 | 5.9304 * | <0.001 | 3.2345 * | <0.001 | 1.8101 *** | <0.001 |
| #17 | 11.8434 * | <0.001 | 8.5244 * | <0.001 | 8.7250 * | <0.001 |
| #2 | 18.5871 * | <0.001 | 15.3642 * | <0.001 | 12.1811 * | <0.001 |
| #3 | 10.5961 * | <0.001 | 8.7512 * | <0.001 | 8.1906 * | <0.001 |
| #4 | 20.8580 * | <0.001 | 17.6470 * | <0.001 | 14.1079 * | <0.001 |
| #5 | 14.6819 * | <0.001 | 13.5279 * | <0.001 | 12.4746 * | <0.001 |
| #6 | 6.4978 * | <0.001 | 6.0049 * | <0.001 | 6.7112 * | <0.001 |
| #7 | 14.8275 * | <0.001 | 12.3509 * | <0.001 | 10.2415 * | <0.001 |
| #8 | 15.9509 * | <0.001 | 12.5613 * | <0.001 | 10.7608 * | <0.001 |
| #9 | 14.3171 * | <0.001 | 11.7988 * | <0.001 | 9.9118 * | <0.001 |
| #10 | 5.9151 * | <0.001 | 9.7631 * | <0.001 | 5.6462 * | <0.001 |
| #11 | 4.8856 * | <0.001 | 7.5892 * | <0.001 | 7.7838 * | <0.001 |
| #12 | 4.6665 * | <0.001 | 6.7574 * | <0.001 | 6.3387 * | <0.001 |

Note: * 1% significance level; ** 5% significance level; *** 10% significance level.
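To make the statistic in Table 15 concrete, the following minimal sketch computes a Diebold-Mariano (DM) statistic for two error series. It assumes a squared-error loss, a plain variance estimate without autocorrelation correction, and a two-sided normal approximation for the p-value; these choices, the function name `dm_test`, and the error values are assumptions for illustration and may differ from the configuration used in the experiments.

```python
import math
from statistics import NormalDist, fmean


def dm_test(errors_benchmark, errors_proposed):
    """Return (DM statistic, two-sided p-value) under squared-error loss.

    A positive statistic means the benchmark incurs the larger average loss.
    """
    # Loss differential series d_t = e_benchmark^2 - e_proposed^2.
    diffs = [eb ** 2 - ep ** 2 for eb, ep in zip(errors_benchmark, errors_proposed)]
    n = len(diffs)
    mean_diff = fmean(diffs)
    # Plain variance of d_t (assumption: no long-run/HAC correction).
    variance = sum((d - mean_diff) ** 2 for d in diffs) / n
    statistic = mean_diff / math.sqrt(variance / n)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(statistic)))
    return statistic, p_value


# Hypothetical forecast errors (not data from the study).
benchmark_errors = [2.0, -1.5, 1.8, -2.2, 1.9, -1.7]
proposed_errors = [0.3, -0.2, 0.4, -0.1, 0.2, -0.3]

dm_stat, dm_p = dm_test(benchmark_errors, proposed_errors)
```

Under the normal approximation, two-sided critical values are roughly 1.645, 1.96, and 2.576 at the 10%, 5%, and 1% levels, which is how the asterisks in the note above are conventionally assigned.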
Table 16. Performance improvement based on four evaluation criteria.

| Parameters | Definition | Formula |
|---|---|---|
| $P_{MAE}$ | Improvement percentage of MAE. | $P_{MAE} = \frac{MAE_1 - MAE_2}{MAE_2} \times 100\%$ |
| $P_{MAPE}$ | Improvement percentage of MAPE. | $P_{MAPE} = \frac{MAPE_1 - MAPE_2}{MAPE_2} \times 100\%$ |
| $P_{RMSE}$ | Improvement percentage of RMSE. | $P_{RMSE} = \frac{RMSE_1 - RMSE_2}{RMSE_2} \times 100\%$ |
| $P_{SSE}$ | Improvement percentage of SSE. | $P_{SSE} = \frac{SSE_1 - SSE_2}{SSE_2} \times 100\%$ |
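As a worked example of the improvement percentage, the sketch below applies the MAE formula to the Site 1 values of model #13 and the proposed model #1 from Table 13. Two points are assumptions rather than statements from the text: that the denominator (subscript 2) is the benchmark model and the magnitude of the ratio is reported, and that the per-horizon percentages are averaged over the three forecasting steps. The function name `improvement_pct` is likewise hypothetical.

```python
def improvement_pct(proposed, benchmark):
    """Relative reduction of an error metric, in percent of the benchmark."""
    return abs(proposed - benchmark) / benchmark * 100.0


# Site 1 MAE at 1-, 2-, and 3-step horizons, taken from Table 13.
proposed_mae = [0.20, 0.31, 0.46]    # model #1 (proposed)
benchmark_mae = [6.95, 7.48, 8.28]   # model #13

per_step = [improvement_pct(p, b) for p, b in zip(proposed_mae, benchmark_mae)]

# Assumption: the tabulated figure is an average over the three horizons.
average_improvement = sum(per_step) / len(per_step)
```

Under these assumptions the averaged value is about 95.8%, close to the $P_{MAE}$ of 95.7912 listed for model #13 at Site 1 in Table 17; the small gap is consistent with the rounded inputs used here.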
Table 17. Relative performance improvement over benchmark models.

| Model | Site 1 $P_{MAE}$ | Site 1 $P_{MAPE}$ | Site 1 $P_{RMSE}$ | Site 1 $P_{SSE}$ | Site 2 $P_{MAE}$ | Site 2 $P_{MAPE}$ | Site 2 $P_{RMSE}$ | Site 2 $P_{SSE}$ |
|---|---|---|---|---|---|---|---|---|
| #13 | 95.7912 | 93.3516 | 95.1592 | 99.6817 | 93.7246 | 91.4207 | 92.1637 | 99.0967 |
| #14 | 95.8245 | 93.7926 | 95.5046 | 99.6823 | 95.4686 | 90.2627 | 93.8646 | 99.4074 |
| #15 | 95.9437 | 93.9233 | 95.5635 | 99.6913 | 95.3155 | 91.7119 | 93.7311 | 99.3184 |
| #16 | 95.4028 | 93.4890 | 95.0927 | 99.6702 | 94.7173 | 89.9604 | 92.6723 | 99.2862 |
| #17 | 95.8077 | 93.2912 | 95.5452 | 99.6877 | 93.6213 | 89.2496 | 91.6757 | 99.0847 |
| #2 | 53.2718 | 40.6035 | 45.1847 | 56.4905 | 30.6753 | 32.1186 | 24.2289 | 26.1946 |
| #3 | 41.2437 | 22.7344 | 17.9302 | 33.6526 | 18.9154 | 20.4956 | 14.2164 | 14.8247 |
| #4 | 71.3815 | 72.2564 | 69.7971 | 87.3393 | 46.4905 | 53.1934 | 50.3785 | 57.9732 |
| #5 | 76.7336 | 74.7581 | 77.3163 | 89.2955 | 68.5071 | 63.4344 | 66.9235 | 81.3684 |
| #6 | 35.2528 | 18.0092 | 22.3722 | 31.8305 | 18.2345 | 16.2256 | 18.7393 | 15.5495 |
| #7 | 84.0491 | 76.0345 | 84.1825 | 95.8716 | 61.6712 | 44.9687 | 39.1025 | 64.3803 |
| #8 | 92.9103 | 89.4428 | 92.8138 | 98.9805 | 79.9307 | 70.0690 | 62.8969 | 76.7083 |
| #9 | 86.6527 | 77.9517 | 87.6426 | 97.4013 | 62.3617 | 41.2578 | 40.0198 | 67.0922 |
| #10 | 18.5600 | 2.2880 | 16.7930 | 7.7400 | 13.6730 | 6.5700 | 11.6000 | 19.1176 |
| #11 | 26.4259 | 4.8163 | 22.2714 | 10.4121 | 19.2930 | 9.2300 | 14.7100 | 25.0500 |
| #12 | 31.1111 | 7.9917 | 26.6549 | 17.7267 | 23.8353 | 12.0700 | 18.8980 | 35.0400 |
Table 18. Comparative sensitivity analysis of forecasting models using four metrics.

| Metric | Definition | Equation |
|---|---|---|
| $S_{MAE}$ | Standard deviation of MAE over $n$ forecasting runs. | $S_{MAE} = \mathrm{Std}(MAE_1, MAE_2, \ldots, MAE_n)$ |
| $S_{MAPE}$ | Standard deviation of MAPE over $n$ forecasting runs. | $S_{MAPE} = \mathrm{Std}(MAPE_1, MAPE_2, \ldots, MAPE_n)$ |
| $S_{RMSE}$ | Standard deviation of RMSE over $n$ forecasting runs. | $S_{RMSE} = \mathrm{Std}(RMSE_1, RMSE_2, \ldots, RMSE_n)$ |
| $S_{SSE}$ | Standard deviation of SSE over $n$ forecasting runs. | $S_{SSE} = \mathrm{Std}(SSE_1, SSE_2, \ldots, SSE_n)$ |
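The sensitivity metrics above reduce to a standard deviation taken across repeated runs. The short sketch below shows the calculation for $S_{MAE}$; the MAE values are hypothetical, and the use of the population standard deviation (dividing by $n$ rather than $n - 1$) is an assumption, since the text does not state which estimator is applied.

```python
from statistics import pstdev


def metric_std(values):
    """Population standard deviation of one metric across n runs."""
    return pstdev(values)


# Hypothetical MAE values from five repeated forecasting runs in which a
# single parameter (for example, the number of modes K) is varied.
mae_runs = [0.20, 0.21, 0.19, 0.20, 0.22]

s_mae = metric_std(mae_runs)
```

A smaller value indicates that the forecast error changes little when the parameter is perturbed, that is, the model is less sensitive to that parameter.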
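Table 19. Sensitivity analysis of the proposed model’s parameters.

| Step | Parameter | Site 1 $S_{MAE}$ | Site 1 $S_{MAPE}$ | Site 1 $S_{RMSE}$ | Site 1 $S_{SSE}$ | Site 2 $S_{MAE}$ | Site 2 $S_{MAPE}$ | Site 2 $S_{RMSE}$ | Site 2 $S_{SSE}$ |
|---|---|---|---|---|---|---|---|---|---|
| 1-Step | K | 0.0097 | 0.1308 | 0.0179 | 3.3151 | 0.0197 | 0.2008 | 0.0239 | 4.7244 |
| 1-Step | alpha | 0.0048 | 0.1048 | 0.0119 | 1.7832 | 0.0108 | 0.1549 | 0.0128 | 2.9956 |
| 1-Step | max_iter | 0.0079 | 0.0749 | 0.0098 | 2.1845 | 0.0279 | 0.0920 | 0.0065 | 3.3345 |
| 1-Step | Dragonfly Number | 0.0018 | 0.0125 | 0.0015 | 0.7920 | 0.0059 | 0.0268 | 0.0034 | 0.8173 |
| 1-Step | Iteration Number | 0.0027 | 0.0179 | 0.0032 | 0.9879 | 0.0062 | 0.0243 | 0.0071 | 1.5736 |
| 1-Step | Archive Size | 0.0033 | 0.0083 | 0.0069 | 0.6290 | 0.0074 | 0.0115 | 0.0084 | 0.9582 |
| 2-Step | K | 0.0112 | 0.1879 | 0.0513 | 4.1537 | 0.0292 | 0.2018 | 0.0452 | 5.0028 |
| 2-Step | alpha | 0.0092 | 0.1397 | 0.0193 | 2.4870 | 0.0172 | 0.1932 | 0.0246 | 3.7596 |
| 2-Step | max_iter | 0.0132 | 0.1639 | 0.0295 | 3.2835 | 0.0312 | 0.2039 | 0.0374 | 3.0946 |
| 2-Step | Dragonfly Number | 0.0045 | 0.0256 | 0.0072 | 0.9351 | 0.0103 | 0.0341 | 0.0066 | 1.0567 |
| 2-Step | Iteration Number | 0.0063 | 0.0096 | 0.0071 | 1.3185 | 0.0076 | 0.0147 | 0.0047 | 1.2588 |
| 2-Step | Archive Size | 0.0049 | 0.0108 | 0.0124 | 1.3376 | 0.0083 | 0.0237 | 0.0177 | 2.0984 |
| 3-Step | K | 0.0188 | 0.2374 | 0.0996 | 5.1783 | 0.0428 | 0.3391 | 0.1033 | 6.6853 |
| 3-Step | alpha | 0.0138 | 0.1964 | 0.0220 | 3.0638 | 0.0238 | 0.2585 | 0.0385 | 4.0076 |
| 3-Step | max_iter | 0.0178 | 0.3847 | 0.0520 | 4.4872 | 0.0378 | 0.4001 | 0.0573 | 5.8329 |
| 3-Step | Dragonfly Number | 0.0073 | 0.0391 | 0.0078 | 1.8274 | 0.0173 | 0.0380 | 0.0092 | 1.7599 |
| 3-Step | Iteration Number | 0.0097 | 0.0278 | 0.0095 | 2.3387 | 0.0137 | 0.0221 | 0.0122 | 3.3618 |
| 3-Step | Archive Size | 0.0035 | 0.0175 | 0.0171 | 1.9646 | 0.0153 | 0.0295 | 0.0178 | 3.0262 |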
Table 20. Comparative parameter settings of benchmark and proposed models for four-season forecasting.

| Algorithm | Parameters | Value |
|---|---|---|
| MMODA | Iteration Number | 200 |
| MMODA | Archive Size | 100 |
| MMODA | Dragonfly Number | 200 |
| MOGWO | Iteration Number | 200 |
| MOGWO | Archive Size | 100 |
| MOGWO | Dragonfly Number | 200 |
| MVMD | Number of Modes (K) | 6 |
| MVMD | Penalty Factor (α) | 200 |
| MVMD | Convergence Tolerance (τ) | 1 × 10⁻⁷ |
| MVMD | Iteration Number | 800 |
| VMD | Number of Modes (K) | 6 |
| VMD | Penalty Factor (α) | 2000 |
| VMD | Convergence Tolerance (τ) | 1 × 10⁻⁷ |
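Table 21. Seasonal forecasting results: proposed vs. benchmark models.

| Period | Model | 1-Step MAE | 1-Step MAPE | 1-Step RMSE | 1-Step SSE | 2-Step MAE | 2-Step MAPE | 2-Step RMSE | 2-Step SSE | 3-Step MAE | 3-Step MAPE | 3-Step RMSE | 3-Step SSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Spring | #6 | 1.43 | 4.10 | 1.83 | 2925.32 | 2.39 | 7.47 | 3.60 | 7913.18 | 3.20 | 10.02 | 4.34 | 11,914.00 |
| Spring | #7 | 3.80 | 8.90 | 4.53 | 14,311.10 | 4.01 | 10.02 | 5.34 | 17,187.97 | 4.87 | 13.13 | 6.17 | 22,130.54 |
| Spring | #10 | 1.29 | 3.95 | 1.70 | 1896.79 | 1.92 | 7.17 | 3.27 | 6822.77 | 3.07 | 9.44 | 4.53 | 10,958.71 |
| Spring | #17 | 21.47 | 50.05 | 26.94 | 591,453.12 | 23.89 | 55.36 | 27.71 | 757,489.65 | 25.07 | 58.37 | 29.08 | 900,471.88 |
| Spring | #1 | 0.70 | 3.10 | 1.27 | 1756.20 | 1.25 | 6.54 | 2.54 | 6295.1265 | 2.29 | 8.26 | 3.44 | 9643.89 |
| Summer | #6 | 2.33 | 4.60 | 2.74 | 5913.89 | 3.07 | 7.67 | 4.26 | 8867.9674 | 3.87 | 8.91 | 4.72 | 12,630.94 |
| Summer | #7 | 5.05 | 11.32 | 6.09 | 19,024.62 | 5.44 | 12.17 | 7.47 | 25,024.6246 | 6.63 | 14.99 | 8.02 | 30,585.44 |
| Summer | #10 | 1.98 | 4.02 | 2.32 | 4085.85 | 2.84 | 7.05 | 3.80 | 7919.8749 | 3.55 | 8.63 | 4.22 | 12,116.73 |
| Summer | #17 | 22.09 | 52.98 | 27.49 | 653,871.37 | 24.35 | 57.91 | 28.95 | 845,721.90 | 27.49 | 60.36 | 30.92 | 994,782.29 |
| Summer | #1 | 1.41 | 3.71 | 1.81 | 3812.50 | 2.69 | 6.74 | 3.70 | 7295.64 | 3.33 | 8.47 | 4.07 | 11,643.67 |
| Autumn | #6 | 2.74 | 8.56 | 3.19 | 6218.86 | 3.96 | 10.05 | 5.03 | 14,296.86 | 4.59 | 14.50 | 6.19 | 19,514.98 |
| Autumn | #7 | 4.92 | 14.38 | 7.10 | 24,378.50 | 5.27 | 17.55 | 7.49 | 28,434.46 | 6.31 | 19.64 | 8.50 | 35,217.48 |
| Autumn | #10 | 2.40 | 6.22 | 2.98 | 5128.94 | 3.73 | 8.88 | 4.72 | 12,899.32 | 4.22 | 12.94 | 5.87 | 16,458.45 |
| Autumn | #17 | 25.64 | 55.92 | 30.71 | 723,977.22 | 29.47 | 60.74 | 30.64 | 913,672.46 | 32.75 | 63.47 | 34.81 | 1,093,749.92 |
| Autumn | #1 | 2.04 | 5.71 | 2.59 | 4812.50 | 3.63 | 8.00 | 4.17 | 11,343.43 | 4.10 | 12.46 | 5.21 | 15,666.52 |
| Winter | #6 | 1.66 | 6.8669 | 2.11 | 3142.05 | 2.06 | 8.97 | 3.61 | 8800.14 | 3.10 | 11.91 | 4.66 | 13,183.58 |
| Winter | #7 | 3.01 | 12.75 | 4.29 | 13,547.03 | 3.67 | 14.90 | 5.13 | 16,597.64 | 4.58 | 18.90 | 5.74 | 20,633.42 |
| Winter | #10 | 1.28 | 5.39 | 1.91 | 2397.20 | 1.56 | 7.39 | 2.96 | 7508.10 | 2.69 | 9.03 | 4.08 | 11,048.53 |
| Winter | #17 | 22.84 | 52.36 | 27.93 | 618,392.84 | 24.63 | 56.81 | 29.19 | 801,637.48 | 26.73 | 59.84 | 30.94 | 953,829.95 |
| Winter | #1 | 1.15 | 4.74 | 1.68 | 2028.00 | 1.48 | 6.90 | 2.69 | 6985.69 | 2.51 | 8.41 | 3.57 | 10,031.51 |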
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Gao, J.; Zhang, H.; Sun, Z.; Xu, H.; Li, J.; Heng, J. An Adaptive Multi-Scale Heterogeneous Ensemble Framework for Interpretable Wind Power Forecasting in Sustainable Grids. Symmetry 2026, 18, 921. https://doi.org/10.3390/sym18060921