1. Introduction
The utilization and expansion of renewable energy resources is recognized as a critical strategy for mitigating future climate change and environmental degradation [1,2]. A wide range of clean power options exists worldwide [3,4]. Among these, wind power has attracted increasing interest because of its low emissions and abundant availability. However, because wind energy is inherently variable and intermittent [5], integrating it stably into the grid poses a series of challenges. Accurate wind power forecasting is therefore essential for optimizing energy management in power systems, improving power supply reliability, and reducing the costs of system reserve capacity [6].
Wind power prediction methods are typically divided into three groups: physical modelling techniques, statistical methods, and artificial intelligence (AI) algorithms [7,8]. Physical models take into account parameters such as wind field topography and atmospheric dynamics and are therefore better suited to medium- and long-term forecasting. Nevertheless, their application is generally constrained by uncertainties in model inputs, parameter estimation, and structural configuration [9,10]. Statistical methods, by contrast, predict wind power from historical data by building time series or regression models [11]. Conventional statistical approaches include the Autoregressive Integrated Moving Average (ARIMA) model as well as Support Vector Regression (SVR) [12,13]. These methods are computationally fast and forecast linear patterns well, but they struggle with nonlinear and complex relationships. Compared with physical and statistical methods, AI approaches such as recurrent neural networks (RNNs) [14] and long short-term memory (LSTM) networks [15] leverage large datasets to achieve higher accuracy and broader applicability.
Because a single AI predictive model has limited capability to handle the highly complex sequences encountered in wind power forecasting, many studies have gradually adopted techniques such as data preprocessing, algorithm optimization, and ensemble modelling [16,17,18]. For feature analysis in the preprocessing phase, reference [19] applies grey relational analysis to assess how different features affect wind power generation and to identify the key determinants of output. Reference [20] employs the Pearson correlation coefficient to mine important features in the dataset, eliminating features with minimal impact on wind power output. However, each correlation analysis method carries its own inherent bias in its evaluation metric, and how these biases affect model prediction results requires further research.
Regarding wind speed data preprocessing, reference [21] corrects forecast wind speeds by curve fitting and screens wind speeds by meteorological type, thereby excluding anomalous values. Nevertheless, curve fitting depends strongly on the data and is susceptible to outliers when data quality is poor. Reference [22] shows that applying a Kalman filter to smooth the multi-scale signals obtained by decomposing a wind speed time series effectively reduces noise and outliers in wind speed data. However, this approach performs poorly in real time on large-scale data and is highly sensitive to model parameter selection, which may lead to suboptimal filtering. Reference [23] uses a One-Class Support Vector Machine (OneSVM) to construct a boundary describing the feature space of normal data and identifies anomalous samples as those that deviate significantly from this boundary. As an unsupervised learning method, this approach handles high-dimensional and nonlinear data effectively while maintaining robust performance.
For the optimization of predictive model parameters, reference [24] uses the Grey Wolf Optimization (GWO) algorithm to find an optimal combination of hyperparameters for a Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) network. Reference [25] tunes the parameters of a Support Vector Machine (SVM) with Particle Swarm Optimization (PSO) to increase the accuracy of wind power forecasts. Nevertheless, both algorithms tend to become trapped in local optima, which limits their global search ability and calls for further refinement. More generally, many existing swarm intelligence and metaheuristic optimization algorithms share several drawbacks in high-dimensional, multimodal hyperparameter spaces: they may converge prematurely because of insufficient population diversity, they struggle to balance global exploration and local exploitation, and they are highly sensitive to initialization and control parameters. These limitations significantly constrain their ability to achieve stable and reliable global optimization in complex wind power forecasting tasks. The Zebra Optimization Algorithm (ZOA), introduced by Trojovská et al. in 2022 [26], is a novel optimization method that mimics zebra behaviour and has been reported to offer strong optimization capability and rapid convergence. Owing to its search mechanism, the ZOA can better maintain population diversity and balance exploration and exploitation, reducing the risk of becoming trapped in local optima and improving global search performance relative to traditional algorithms such as PSO and GWO in complex optimization scenarios. Reference [27] applies an improved ZOA to the hyperparameters of an LSTM network and shows that the improved ZOA converges faster than competing algorithms. Elite Opposition-Based Learning (EOBL) improves the global exploration efficiency and convergence rate of optimization algorithms by generating opposite solutions, which helps them escape local optima.
Reference [28] improves the GWO algorithm with EOBL, and the resulting optimized LSTM network predicts better than the baseline model. These studies suggest that combining the ZOA with EOBL could further enhance convergence speed and global optimization capability, making it a promising candidate for hyperparameter optimization in advanced wind power forecasting models.
Among studies of single predictive models, reference [29] employs XGBoost to predict wind power. XGBoost is a highly efficient and powerful distributed gradient boosting algorithm whose computation is reported to be up to ten times faster than that of traditional gradient boosting algorithms. However, XGBoost is sensitive to anomalous wind speed values among the wind power features, and excessive anomalies in wind speed data may degrade its performance. Reference [30] uses an LSTM network to estimate wind power. LSTM networks are widely used in sequential prediction tasks because they can capture historical information in time series data. In wind power forecasting, however, LSTM networks struggle to handle the multi-scale temporal features of wind speed fluctuations. Reference [31] selects a Residual Neural Network (ResNet) for wind power prediction. The results indicate that, owing to its convolutional residual architecture, the ResNet extracts features well from two-dimensional features and time series data. However, compared with other neural networks, the ResNet tends to overlook correlations between local and global features, because each convolutional layer typically covers only a limited range of time steps. Reference [32] employs a ridge regression (RR) model for wind power forecasting. RR addresses multicollinearity by adding an L2 regularization term to the loss function, which keeps the model weights from becoming excessively large. This controls model complexity and mitigates overfitting. However, the highly nonlinear relationships among wind power features often reduce the predictive capacity of the RR model. These differences suggest that the above models are not redundant but complementary. Specifically, LSTM is suited to learning long-term temporal dependence and sequential evolution patterns in wind power series; ResNet extracts local, hierarchical, and fluctuation-related features from the input data through convolutional residual learning; RR provides a stable and interpretable linear baseline that captures approximately linear relationships and mitigates multicollinearity among meteorological variables; and XGBoost models nonlinear feature interactions and complex decision boundaries while remaining computationally efficient. Combining these highly heterogeneous base learners can therefore allow the forecasting framework to capture temporal dependence, local structural patterns, linear trends, and nonlinear interactions simultaneously, improving both prediction accuracy and robustness.
For hybrid ensembles of predictive models, reference [33] combines a ResNet with an LSTM and achieves an average accuracy improvement of 10.97% over single ResNet and LSTM models. Nonetheless, its accuracy remains inferior to that of other ensemble models, indicating the need for more comprehensive ensemble strategies. Reference [34] uses a stacking framework for multi-model fusion in load forecasting, and the results show that the stacking framework is more accurate than both single models and ensembles without stacking. However, its accuracy is still lower than that of multi-model fusion frameworks enhanced with optimization algorithms. It is therefore necessary to design stacking frameworks that incorporate strongly heterogeneous base learners and are equipped with more powerful optimization algorithms, so as to fully exploit model complementarity and further improve forecasting performance.
Taking into account the strengths and weaknesses of the methods discussed above, this study proposes a short-term, multi-feature wind power forecasting model that combines an improved ZOA with a stacking ensemble structure. The model comprises multi-feature data processing, wind speed anomaly detection and filtering, and a parameter optimization algorithm for a stacked deep learning architecture. The research objectives are as follows:
Feature Analysis and Selection: To limit computational cost and the risk of overfitting, grey relational analysis and Pearson correlation coefficient analysis are used to evaluate each feature that influences wind power output. Because each correlation analysis method may bias its evaluation metric in ways that affect prediction accuracy, the scores from the two methods are averaged for each feature. The degree of association of each feature is then determined, and features are selected accordingly. If the wind speed feature is retained, the OneSVM model is used to screen and eliminate anomalous wind speed data. Compared with existing data preprocessing methods, this approach not only assesses the various wind power features quantitatively but also handles wind speed anomalies effectively.
Ensemble Model Construction: Because single models and some composite models still show relatively poor prediction accuracy, more comprehensive ensemble models and combination strategies are needed. Using the stacking ensemble learning strategy, multiple forecasting models, namely LSTM, XGBoost, RR, and ResNet, are integrated to forecast wind power generation. These base learners are deliberately selected for their heterogeneity and complementarity: LSTM captures long-term temporal dependencies, ResNet extracts local and hierarchical fluctuation features, RR models linear relationships under multicollinearity, and XGBoost learns nonlinear interactions among multiple features. By combining these models within a stacking framework, the proposed method aims to exploit their complementary strengths and improve forecasting accuracy and robustness.
Hyperparameter Optimization: Because the ZOA has shown stronger optimization capability and faster convergence than other optimization algorithms, it is adopted to optimize the hyperparameters of the proposed model. Specifically, an improved ZOA incorporating Elite Opposition-Based Learning (EOBL) is employed to further enhance global exploration and avoid premature convergence. The improved ZOA is a metaheuristic optimizer that uses EOBL to tune each model within the stacking ensemble architecture, with the aim of finding the parameter set that best fits each predictive model and thus improving the precision of the forecasts.
The remainder of this paper is organized as follows:
Section 2 describes the development of the preprocessing module of the IZOA-Stacking wind power forecasting model.
Section 3 describes the IZOA optimization algorithm, the stacking ensemble learning strategy, and the framework of the IZOA-Stacking model.
Section 4 introduces the model evaluation metrics and compares the forecasting results of multiple models, thereby validating the feasibility of the proposed model.
Finally, Section 5 concludes this study.
2. Data Preprocessing and Filtering
The raw wind speed data collected from wind farms contain abnormal values with short-term drastic fluctuations, which mask the true characteristics of the wind speed. Moreover, the various characteristics of wind power generation affect generation to different degrees in each period. Applying the raw data directly to wind power forecasting therefore makes it difficult to achieve optimal prediction accuracy. This study proposes a wind power prediction model, the Improved Zebra Optimization Algorithm-Stacking (IZOA-Stacking) model, which integrates a preprocessing module for initial data handling and preparation. The module is intended to improve the precision of wind power forecasts, reduce forecast uncertainty, and evaluate the impact of the various wind power features. The preprocessing module of the IZOA-Stacking model is developed as follows:
1. A correlation analysis is conducted on the characteristics that influence wind power to determine the impact of each one. The characteristics are then screened, and those highly correlated with wind power are retained. The retained characteristics are normalized and weighted for use as input features of the forecasting model.
2. Wind Speed Anomaly Data Processing
If the wind speed characteristic is retained after the first screening step, the second step is carried out: anomalous wind speed data are identified and removed so that the remaining data reflect the true properties of the wind speed. The removed points are then filled with adjacent normal data to obtain a regular, continuous wind speed series.
2.1. Analysis and Processing of Different Characteristics of Wind Power
Grey relational analysis (GRA) evaluates the closeness of relationships from the geometric similarity between sequences: the closer two sequence curves are, the higher their degree of correlation. It avoids several limitations of traditional statistical analysis in system evaluation: it is insensitive to sample size and to whether the data follow a regular distribution, its computation is simple, and it reduces inconsistencies between quantitative calculation and qualitative evaluation.
Pearson correlation coefficient analysis is widely used in data analysis and modelling in many fields because it is intuitive and reliable. It measures the linear association between two variables and is particularly effective when the variables follow a normal or approximately normal distribution.
Before GRA and Pearson correlation analysis are applied, the wind power characteristics are made dimensionless so that the correlation analysis is more accurate. The correlation analysis of the characteristics influencing wind power proceeds as follows:
A 60-day wind power dataset with a 15 min time resolution is obtained for a specific wind farm from the Xihe Energy Meteorological Big Data Platform. The wind power series serves as the reference sequence $X_0 = \{x_0(1), x_0(2), \ldots, x_0(N)\}$, and the other influencing factors serve as comparison sequences $X_i = \{x_i(1), x_i(2), \ldots, x_i(N)\}$, $i = 1, 2, \ldots, m$, where $m$ represents the number of features.
Both the reference sequence and the comparison sequences are standardized with the Z-score method:
$$x_i'(k) = \frac{x_i(k) - \bar{x}_i}{\sigma_i}, \qquad x_0'(k) = \frac{x_0(k) - \bar{x}_0}{\sigma_0}, \qquad k = 1, 2, \ldots, N$$
where $N$ denotes the total number of data points; $x_i(k)$ and $x_0(k)$ denote the values of the $i$-th feature and of wind power generation, respectively, at the $k$-th data point; $\bar{x}_i$ and $\bar{x}_0$ are the mean values of the $i$-th feature and wind power generation over all data points, and $\sigma_i$ and $\sigma_0$ are their respective standard deviations; and $x_i'(k)$ and $x_0'(k)$ are the standardized values of the $k$-th data point for the $i$-th feature and wind power, respectively.
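Grey relational analysis is conducted on each feature that affects wind power:
$$\xi_i(k) = \frac{\min\limits_{i}\min\limits_{k}\left|x_0'(k) - x_i'(k)\right| + \rho \max\limits_{i}\max\limits_{k}\left|x_0'(k) - x_i'(k)\right|}{\left|x_0'(k) - x_i'(k)\right| + \rho \max\limits_{i}\max\limits_{k}\left|x_0'(k) - x_i'(k)\right|}, \qquad r_i = \frac{1}{N}\sum_{k=1}^{N} \xi_i(k)$$
where $\xi_i(k)$ is the grey relational coefficient of the $k$-th data point for the $i$-th feature, $\rho$ is the distinguishing coefficient within the range [0, 1], and $r_i$ is the grey relational grade of the $i$-th feature.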
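Pearson correlation coefficient analysis is performed on each feature that influences wind power, and the absolute value is taken:
$$P_i = \left| \frac{\sum_{k=1}^{N}\left(x_i(k) - \bar{x}_i\right)\left(x_0(k) - \bar{x}_0\right)}{\sqrt{\sum_{k=1}^{N}\left(x_i(k) - \bar{x}_i\right)^2}\,\sqrt{\sum_{k=1}^{N}\left(x_0(k) - \bar{x}_0\right)^2}} \right|$$
where $P_i$ is the Pearson correlation coefficient of the $i$-th characteristic, and $\bar{x}_i$ and $\bar{x}_0$ denote the mean values of the $i$-th feature and wind power output, respectively.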
After the relevance of each feature has been determined, it is first normalized. The normalized coefficients from the two correlation analysis methods are then averaged for each feature, and the resulting correlation is analyzed:
$$R_i = \frac{r_i' + P_i'}{2}$$
where $r_i'$ and $P_i'$ are the normalized grey relational grade and Pearson correlation coefficient of the $i$-th feature, and $R_i$ is the averaged correlation coefficient of the $i$-th feature. Too many input features increase model complexity, raising computational cost and the risk of overfitting, whereas too few may discard crucial information and degrade prediction performance. Accordingly, the two features with the lowest correlation with wind power were excluded, and the remaining features were used as model inputs to attain the intended predictive performance.
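The correlation-analysis procedure above can be illustrated with a short, dependency-free Python sketch. It is not the authors' code and rests on several assumptions: the Z-score uses the population standard deviation, the distinguishing coefficient defaults to ρ = 0.5 and must be strictly positive here to avoid division by zero, and min-max scaling is assumed as the normalization applied before averaging, since the text does not name one. The feature names and series are hypothetical.

```python
import math

def z_score(seq):
    """Z-score standardization (population standard deviation assumed)."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / n)
    return [(v - mean) / std for v in seq]

def grey_relational_grades(ref, comps, rho=0.5):
    """Grey relational grade r_i of each standardized comparison sequence."""
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    diffs = [[abs(a - b) for a, b in zip(ref, comp)] for comp in comps]
    d_min = min(min(row) for row in diffs)  # two-level minimum over features and points
    d_max = max(max(row) for row in diffs)  # two-level maximum over features and points
    if d_max == 0.0:
        return [1.0] * len(comps)           # every sequence coincides with the reference
    # Mean of the grey relational coefficients xi_i(k) over all points k
    return [sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
            for row in diffs]

def abs_pearson(x, y):
    """Absolute Pearson correlation coefficient |P_i|."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov / (sx * sy))

def min_max(vals):
    """Min-max normalization to [0, 1] (assumed normalization method)."""
    lo, hi = min(vals), max(vals)
    return [1.0] * len(vals) if hi == lo else [(v - lo) / (hi - lo) for v in vals]

def select_features(power, features, n_drop=2, rho=0.5):
    """Average normalized GRA and |Pearson| scores; drop the n_drop lowest features."""
    names = list(features)
    comps = [z_score(features[name]) for name in names]
    gra = min_max(grey_relational_grades(z_score(power), comps, rho))
    pear = min_max([abs_pearson(features[name], power) for name in names])
    scores = {name: (g + p) / 2 for name, g, p in zip(names, gra, pear)}
    dropped = set(sorted(names, key=scores.get)[:n_drop])
    return [name for name in names if name not in dropped], scores

# Hypothetical series: one perfectly related feature, one nearly related, two unrelated.
power = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
features = {
    "wind_speed": [2.0 * p + 1.0 for p in power],
    "temperature": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 7.0],
    "noise_a": [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0],
    "noise_b": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
}
kept, scores = select_features(power, features, n_drop=2)
```

In this toy example the two synthetic noise features receive the lowest averaged scores and are dropped, mirroring the exclusion of the two least-correlated features.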
2.2. Wind Speed Outlier Processing
If the wind speed feature is retained after feature selection, wind speed outliers are processed. Outliers are values that differ significantly from the other data points; they may be caused by equipment errors, input errors, or very rare weather conditions. If they are not processed before model training, they may distort the model's learning process and reduce its prediction accuracy under normal conditions. OneSVM applies SVM techniques to anomaly detection and is designed to be trained on normal samples only. It identifies abnormal samples as data points that fall outside the learned decision boundary. As an unsupervised learning algorithm, OneSVM is efficient and robust on high-dimensional datasets with complex nonlinear features. The OneSVM-based screening of wind speed outliers proceeds as follows:
The 60-day historical wind speed series $V = \{v_1, v_2, \ldots, v_N\}$ with a 15 min time resolution is obtained for the wind farm from the Xihe Energy Meteorological Big Data Platform. The Radial Basis Function (RBF) kernel is used to compute the similarity between each pair of data points; implicitly mapping the data into a higher-dimensional feature space allows OneSVM to handle nonlinear problems effectively:
$$K(v_i, v_j) = \exp\left(-\frac{\lVert v_i - v_j \rVert^2}{2\sigma^2}\right)$$
where $v_i$ and $v_j$ are the wind speeds at the $i$-th and $j$-th time points, respectively, with $i, j = 1, 2, \ldots, N$; $\sigma$ is the width parameter of the RBF kernel; and $K(v_i, v_j)$ is the similarity between $v_i$ and $v_j$.
The OneSVM objective function is solved to obtain the Lagrange multiplier $\alpha_i$ for each time point:
$$\boldsymbol{\alpha}^* = \arg\min_{\boldsymbol{\alpha}} \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j K(v_i, v_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{N} \alpha_i = 1$$
where $\boldsymbol{\alpha}^*$ is the column vector of optimal Lagrange multipliers for all time points; $\alpha_i$ and $\alpha_j$ are the Lagrange multipliers of any two data points in the 60-day historical wind speed data; and $C$ is a positive constant that regulates the model's flexibility.
The decision function is then used for screening: if its argument is greater than 0 at a time point, the point is normal; if it is less than 0, the point is an outlier:
$$f(v_i) = \operatorname{sgn}\left(\sum_{j=1}^{N} \alpha_j^* K(v_j, v_i) - b\right)$$
where $f(v_i)$ is the decision function that determines whether the wind speed at the $i$-th data point is an anomaly; $\operatorname{sgn}(\cdot)$ is the signum function, which outputs 1 or −1 depending on the sign of its argument; and $b$ is the offset of the decision boundary.
After the outliers are removed, the resulting gaps are filled with adjacent normal values. Let $A$ denote the set of outlier wind speed data, and let $v_{i-1}$, $v_i$, and $v_{i+1}$ be the wind speeds before updating at time points $i-1$, $i$, and $i+1$. The updated value $v_i'$ at time point $i$ equals $v_i$ if $v_i \notin A$; otherwise, it is derived from the adjacent normal values $v_{i-1}$ and $v_{i+1}$. If an adjacent point is itself an outlier, a farther normal data point is selected instead. The updated wind speed dataset is denoted as $V'$.
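The screening and filling steps can be sketched as follows. Solving the dual problem for the multipliers is left to a quadratic-programming solver (for example scikit-learn's `OneClassSVM`, which uses the ν-parameterization; in that form the box bound corresponds to C = 1/(νN)). The multipliers, offset, and kernel width below are hypothetical values chosen only to exercise the decision function. The filling rule assumes that an outlier is set to the mean of the nearest normal value on each side, which is one reading of replacement with adjacent normal values.

```python
import math

def rbf_kernel(vi, vj, sigma):
    """RBF similarity K(v_i, v_j) = exp(-(v_i - v_j)^2 / (2 sigma^2))."""
    return math.exp(-((vi - vj) ** 2) / (2.0 * sigma ** 2))

def is_normal(v, support, alphas, b, sigma):
    """Sign of sum_j alpha_j K(v_j, v) - b; a zero score is treated as normal here."""
    score = sum(a * rbf_kernel(vj, v, sigma) for vj, a in zip(support, alphas)) - b
    return score >= 0.0

def fill_outliers(speeds, outlier):
    """Replace each outlier with the mean of the nearest normal value on each side."""
    n = len(speeds)
    filled = list(speeds)
    for i in range(n):
        if not outlier[i]:
            continue
        # Step outward past neighbouring outliers to the nearest normal points
        left = next((speeds[j] for j in range(i - 1, -1, -1) if not outlier[j]), None)
        right = next((speeds[j] for j in range(i + 1, n) if not outlier[j]), None)
        neighbours = [v for v in (left, right) if v is not None]
        if not neighbours:
            raise ValueError("no normal wind speed available for replacement")
        filled[i] = sum(neighbours) / len(neighbours)
    return filled

# Hypothetical multipliers (summing to 1) and offset; in practice they come from the QP.
support, alphas, b, sigma = [5.0, 6.0, 7.0], [0.3, 0.4, 0.3], 0.5, 1.0
speeds = [5.0, 5.4, 30.0, 29.0, 6.2]
outlier = [not is_normal(v, support, alphas, b, sigma) for v in speeds]
cleaned = fill_outliers(speeds, outlier)
```

The two consecutive spikes are both flagged; each is filled from 5.4 and 6.2, the nearest normal values on either side, so the second spike skips over the first when searching to its left.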
3. IZOA-Stacking Wind Power Prediction Model
3.1. IZOA Optimization Algorithm
The Improved Zebra Optimization Algorithm (IZOA) is a swarm intelligence-based optimization method. In this study, it optimizes the parameters of each base learner in the stacking ensemble to improve predictive performance. Compared with alternative algorithms, the IZOA offers faster convergence, more efficient global search, and greater robustness. The IZOA comprises the following steps:
Step 1: Randomly initialize a zebra population of $N_p$ individuals, each with $D$-dimensional position information.
Step 2: Improve the ZOA through EOBL by introducing a dynamic opposite point for each individual. In this operation, the random coefficients are uniformly distributed in (0, 0.5); $a_j$ and $b_j$ are the minimum and maximum values, respectively, of the $j$-th dimensional position data; $x_{i,j}^*$ is the $j$-th dimensional position of the $i$-th zebra after the dynamic opposite point is applied; and $x_{i,j}$ is the $j$-th dimensional position of the $i$-th zebra before it is applied, with $i = 1, 2, \ldots, N_p$ and $j = 1, 2, \ldots, D$.
This yields an opposite zebra population of $N_p$ individuals, each with $D$-dimensional position information, generated from the dynamic opposite points of EOBL.
Step 3: Compare the fitness values of the individuals in the initial population and in its dynamic opposite population, rank them from best to worst, and select the $N_p$ individuals with the best fitness to form the final initialized population. The zebra population improved by EOBL is then expressed as:
$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,D} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,D} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N_p,1} & x_{N_p,2} & \cdots & x_{N_p,D} \end{bmatrix}$$
where $D$ is the dimension of each zebra's position, i.e., the number of parameters to be optimized in the stacking model, namely the parameters of XGBoost, LSTM, ResNet, and RR.
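The fitness is calculated as:
$$F_i = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}$$
where $F_i$ is the fitness of the $i$-th zebra; $n$ is the total number of wind power prediction samples; $y_t$ is the true power value at time $t$; and $\hat{y}_t$ is the corresponding predicted power value.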
Step 4: At iteration $t$, perform the zebra foraging behaviour, i.e., the first stage, which imitates zebras searching for food to update the population. Zebras feed mainly on grasses and sedges; when food is scarce, they may also eat other plant material, including shoots, fruit, bark, roots, and leaves. Depending on the quality and availability of plants, foraging can occupy 60 to 80% of a zebra's daily time. Plains zebras, in particular, tend to graze the taller, less nutritious grasses first, creating opportunities for other species that seek shorter, more nutritious forage. In the IZOA, the best-performing member of the population is regarded as the pioneer zebra, which leads the other members toward its position in the search space. The position update during foraging is therefore modelled as:
$$x_{i,j}^{\text{new},t} = x_{i,j}^{t} + r\left(PZ_j^{t} - x_{i,j}^{t}\right), \qquad x_{i,j}^{t+1} = \begin{cases} x_{i,j}^{\text{new},t}, & F_i^{\text{new},t} < F_i^{t} \\ x_{i,j}^{t}, & \text{otherwise} \end{cases}$$
where $x_{i,j}^{\text{new},t}$ is the provisional new position of the $i$-th zebra in the $j$-th dimension at the $t$-th iteration; $r$ is a random number between 0 and 1; $x_{i,j}^{t}$ is the current position of the $i$-th zebra in the $j$-th dimension at the $t$-th iteration; $PZ_j^{t}$ is the position of the fittest zebra (the pioneer zebra) in the $j$-th dimension at the $t$-th iteration; and $x_{i,j}^{t+1}$ is the new position of the $i$-th zebra in the $j$-th dimension after the $t$-th iteration. If the fitness $F_i^{\text{new},t}$ of the new position of the $i$-th zebra is better (lower) than the fitness $F_i^{t}$ of its current position, the zebra moves to the new position; otherwise, it remains where it is. The Root Mean Square Error (RMSE) is used as the fitness measure.
Step 5: When the number of iterations reaches 75, the zebras engage in defensive behaviour against predators, which constitutes the second stage. In this stage, the zebras' responses to predator attacks are used to update the positions of the individuals in the IZOA search space. When attacked by lions, zebras flee along an erratic zigzag route with unpredictable lateral shifts. When facing smaller predators such as hyenas and wild dogs, however, zebras become more aggressive and use their collective strength to confuse and intimidate the predators. In the IZOA, these two scenarios (escape from lions and collective defence against smaller predators) are assumed to occur with equal probability.
Step 6: Update the zebra positions according to Steps 4 and 5. If the iteration count exceeds the maximum number of iterations, the search ends and outputs the position of the global best zebra in all dimensions, that is, the optimized parameters of XGBoost, LSTM, ResNet, and RR. Otherwise, return to Step 4 and continue the parameter optimization.
Note that the IZOA optimizes hyperparameters using only the training set (1 April–20 May) with 5-fold cross-validation. The test set (21 May–30 May) remains unseen throughout tuning and training and is used only once, for the final evaluation of the optimized stacking model.
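A compact sketch of Steps 1 to 4 and 6 follows, with a toy fitness in place of training the stacking model. In the real pipeline, a zebra's position would be decoded into XGBoost, LSTM, ResNet, and RR hyperparameters and scored by cross-validated RMSE on the training set. Several choices are assumptions rather than the paper's exact method: the opposite point uses the generic elite-opposition form x* = k(a_j + b_j) − x with k ~ U(0, 1) instead of the dynamic-opposite formula with coefficients in (0, 0.5); the random factor r is drawn per dimension; the original ZOA [26] is commonly stated with an extra random integer factor I ∈ {1, 2} on the current position in the foraging update, which is omitted here; and the predator-defence stage (Step 5) is left out for brevity. All function names are illustrative.

```python
import math
import random

def rmse(y_true, y_pred):
    """Fitness F_i: root mean square error over n samples (lower is better)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def elite_opposition_init(pop, fitness, rng):
    """Steps 2-3: build an opposite population and keep the best N_p individuals.

    Assumed opposite point: x* = k * (a_j + b_j) - x with k ~ U(0, 1) per individual,
    where a_j, b_j are the current min/max of dimension j (generic EOBL form).
    """
    dim = len(pop[0])
    lows = [min(z[j] for z in pop) for j in range(dim)]
    highs = [max(z[j] for z in pop) for j in range(dim)]
    opposite = []
    for z in pop:
        k = rng.random()
        opp = [k * (lows[j] + highs[j]) - z[j] for j in range(dim)]
        # Points leaving [a_j, b_j] are resampled uniformly inside the bounds
        opp = [x if lows[j] <= x <= highs[j] else rng.uniform(lows[j], highs[j])
               for j, x in enumerate(opp)]
        opposite.append(opp)
    return sorted(pop + opposite, key=fitness)[:len(pop)]

def foraging_step(pop, fitness, rng):
    """Step 4: move towards the pioneer zebra; keep a move only if fitness improves."""
    scores = [fitness(z) for z in pop]
    pioneer = pop[scores.index(min(scores))]
    new_pop = []
    for z, old in zip(pop, scores):
        cand = [x + rng.random() * (pz - x) for x, pz in zip(z, pioneer)]
        new_pop.append(cand if fitness(cand) < old else z)
    return new_pop

def optimize(fitness, dim, n_pop=10, max_iter=50, lo=-5.0, hi=5.0, seed=0):
    """Steps 1-4 and 6, without the predator-defence stage (Step 5)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_pop)]  # Step 1
    pop = elite_opposition_init(pop, fitness, rng)                          # Steps 2-3
    history = [min(fitness(z) for z in pop)]
    for _ in range(max_iter):                                               # Steps 4 and 6
        pop = foraging_step(pop, fitness, rng)
        history.append(min(fitness(z) for z in pop))
    return min(pop, key=fitness), history

# Toy problem: a position w = (intercept, slope) is scored by the RMSE of w[0] + w[1] * x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
best, history = optimize(lambda w: rmse(ys, [w[0] + w[1] * x for x in xs]), dim=2)
```

Because each zebra only accepts strictly better positions, the best fitness recorded in `history` never increases from one iteration to the next.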
3.2. Stacking Ensemble Learning Strategy
The stacking ensemble learning strategy, often referred to as stacked generalization, is a method for combining multiple models in a hierarchical manner. Its workflow is summarized as follows:
Define the input dataset of the stacking ensemble. At a 15 min time scale, the dataset contains 5760 samples, each consisting of a feature vector $\mathbf{x}_k$ and its corresponding wind power $y_k$, forming $\mathcal{D} = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_{5760}, y_{5760})\}$.
Split the dataset into a training set and a test set.
Build multiple base learners for the first layer and apply 5-fold cross-validation on the training set. The training set is divided into five folds of equal size; for each base learner, one fold is used for validation and the remaining four for training, rotating over all five folds. For each base learner, the predictions on the validation fold and on the test set are recorded, yielding a set of out-of-fold validation predictions and a set of test set predictions for each base learner.
Build the second-layer learner, i.e., the meta-learner. It uses the out-of-fold validation predictions of all first-layer base learners as training data and the averaged test set predictions as new test data. The meta-learner is trained on these data to form the final prediction model.
By employing this approach, the results obtained from each phase of learning serve as the input for the subsequent phase, ultimately improving the model’s generalization capabilities.
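The two-layer procedure can be sketched without external libraries. To keep the example self-contained, two toy base learners (a mean predictor and ordinary least squares) stand in for XGBoost, LSTM, ResNet, and RR, and least squares also stands in for the XGBoost meta-learner; the folds are contiguous and unshuffled, in line with the deterministic cross-validation mentioned in the text. All function names are illustrative.

```python
def fit_mean(X, y):
    """Toy base learner: always predicts the training mean."""
    m = sum(y) / len(y)
    return lambda rows: [m for _ in rows]

def fit_ols(X, y):
    """Ordinary least squares with intercept, solved from the normal equations."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        if abs(M[c][c]) < 1e-12:
            raise ValueError("singular design matrix")
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    w = [M[i][p] / M[i][i] for i in range(p)]
    return lambda rows: [w[0] + sum(wi * x for wi, x in zip(w[1:], row)) for row in rows]

def kfold(n, k):
    """Contiguous, unshuffled folds (deterministic cross-validation)."""
    start = 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        val = list(range(start, start + size))
        yield [i for i in range(n) if not start <= i < start + size], val
        start += size

def stack(X_tr, y_tr, X_te, base_fitters, meta_fitter, k=5):
    """Two-layer stacking: out-of-fold base predictions train the meta-learner."""
    oof = [[0.0] * len(base_fitters) for _ in X_tr]        # meta training features
    test_meta = [[0.0] * len(base_fitters) for _ in X_te]  # fold-averaged test features
    for tr_idx, val_idx in kfold(len(X_tr), k):
        X_a, y_a = [X_tr[i] for i in tr_idx], [y_tr[i] for i in tr_idx]
        X_v = [X_tr[i] for i in val_idx]
        for b, fit in enumerate(base_fitters):
            model = fit(X_a, y_a)
            for i, pred in zip(val_idx, model(X_v)):
                oof[i][b] = pred
            for row, pred in zip(test_meta, model(X_te)):
                row[b] += pred / k
    meta = meta_fitter(oof, y_tr)
    return meta(test_meta)

X_train = [[float(i)] for i in range(20)]
y_train = [2.0 * i + 1.0 for i in range(20)]
X_test = [[20.0], [21.0]]
predictions = stack(X_train, y_train, X_test, [fit_mean, fit_ols], fit_ols)
```

On the exactly linear toy data, the meta-learner learns to rely on the least-squares base learner, so the test predictions come out very close to 41 and 43.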
To mitigate overfitting on the limited 60-day dataset, the LSTM uses dropout (0.3) and recurrent dropout (0.3) together with early stopping (patience = 10). The ResNet includes a dropout layer (0.3) after each residual block and uses batch normalization. These regularization strategies, combined with the stacking ensemble and deterministic cross-validation, help ensure stable generalization.
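The early-stopping rule with patience = 10 can be expressed independently of any framework, as below. This is a generic sketch; the deep-learning framework used by the authors is not stated, and `min_delta` is an added, hypothetical parameter that by default requires any strict improvement.

```python
def early_stopping(val_losses, patience=10, min_delta=0.0):
    """Return (best_epoch, stop_epoch) under a patience-based early-stopping rule.

    Training stops once `patience` consecutive epochs fail to improve the best
    validation loss by more than `min_delta`.
    """
    best, best_epoch, wait = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Hypothetical validation-loss curve: improves for three epochs, then plateaus.
losses = [1.0, 0.8, 0.7] + [0.75] * 12
best_epoch, stop_epoch = early_stopping(losses, patience=10)
```

Here the best epoch is 2 and training halts at epoch 12, after ten epochs without improvement; a training loop would typically keep the weights saved at the best epoch.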
3.3. IZOA-Stacking Ensemble Algorithm Framework Design
In constructing the IZOA-Stacking ensemble learning framework, the first task is to select base learners that are both high-quality and diverse. XGBoost is an extremely efficient and accurate distributed gradient-boosted tree algorithm, with computation reported to be up to ten times faster than traditional gradient boosting methods. However, XGBoost is sensitive to outliers in the wind speed features of wind power data, and a prevalence of such outliers can degrade its performance. LSTMs, which use their deep learning capabilities to capture historical information in time series data, are widely used in time series forecasting tasks. In wind power forecasting, however, LSTM models struggle to manage the multi-scale temporal characteristics of wind speed fluctuations. ResNets, with their distinctive convolutional residual architecture, extract features robustly from grid-structured data such as images and time series and are regarded as a significant breakthrough in deep learning. However, compared with other neural network architectures, the ResNet tends to overlook correlations between local and global aspects of the various wind power features, because each convolutional layer typically covers only a limited range of time steps. RR addresses multicollinearity by adding an L2 regularization term to the loss function, which keeps the model weights from becoming excessively large; this controls model complexity and mitigates overfitting. Nevertheless, the typically complex and nonlinear associations between wind power and its characteristics make the prediction accuracy of the RR model unstable.
Figure 1 depicts the structure of each model within the stacking ensemble framework.
Based on these considerations, the base learners are selected according to a task-oriented principle: the aim is to match the characteristics of wind power data with complementary model families rather than to search for a single universally best algorithm. Methodologically, the framework does not introduce a new learning algorithm; its contribution lies in the task-oriented design of the ensemble components. The four base learners cover distinct model families, namely tree-based boosting, recurrent neural networks, convolutional residual networks, and linear regularized regression, so that the stacking ensemble can simultaneously capture nonlinear interactions, temporal dynamics, multi-scale patterns, and approximate linear trends. This diversity matters for wind power forecasting, where meteorological variables, turbine characteristics, and operational constraints interact at multiple scales. By embedding IZOA-based hyperparameter optimization and the tailored preprocessing module into this heterogeneous ensemble, the framework aims to provide a practical and reproducible recipe for short-term wind power prediction.
In the IZOA-Stacking ensemble strategy adopted in this study, the first-level (base) learners are XGBoost, LSTM, ResNet, and RR, and the second-level (meta) learner is XGBoost. The model hyperparameters are optimized with the Improved Zebra Optimization Algorithm (IZOA), and 5-fold cross-validation is used to safeguard the generalization ability and prediction accuracy of the model. The proposed IZOA-Stacking wind power forecasting model is illustrated in Figure 2.
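The key mechanism of stacking is that the meta-learner is trained on out-of-fold predictions: each base learner is refit on four folds and predicts the held-out fifth, so the meta-learner never sees predictions a learner made on its own training data. The sketch below illustrates only this mechanism. It uses two deliberately simple, hypothetical base learners (a persistence forecast and a one-variable least-squares line) in place of XGBoost, LSTM, ResNet and RR, and a grid-searched blending weight as a stand-in for the XGBoost meta-learner; all function names, the seed and the synthetic series are assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (lagged inputs, target power)
Model = Callable[[List[float]], float]
Learner = Callable[[Sequence[Sample]], Model]


def persistence_learner(train: Sequence[Sample]) -> Model:
    """Hypothetical base learner 1: repeat the most recent lag (no fitting needed)."""
    return lambda features: features[-1]


def last_lag_regression_learner(train: Sequence[Sample]) -> Model:
    """Hypothetical base learner 2: least-squares line y = a + b * last_lag."""
    xs = [features[-1] for features, _ in train]
    ys = [target for _, target in train]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = y_mean - b * x_mean
    return lambda features: a + b * features[-1]


def out_of_fold_predictions(data: Sequence[Sample], learners: Sequence[Learner],
                            k: int = 5, seed: int = 42) -> List[List[float]]:
    """oof[i][m] is learner m's prediction for sample i, produced by a model
    fitted on the other k-1 folds; the fixed seed makes the folds reproducible."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::k] for f in range(k)]
    oof = [[0.0] * len(learners) for _ in data]
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        for m, learner in enumerate(learners):
            model = learner(train)
            for i in held_out:
                oof[i][m] = model(data[i][0])
    return oof


def fit_blend_weight(oof: List[List[float]], targets: List[float]) -> float:
    """Stand-in meta-learner: grid-search alpha in [0, 1] for alpha*p0 + (1-alpha)*p1."""
    def mse(alpha: float) -> float:
        return sum((alpha * p[0] + (1 - alpha) * p[1] - t) ** 2
                   for p, t in zip(oof, targets)) / len(targets)
    return min((step / 20 for step in range(21)), key=mse)


# Synthetic power-like series and 3-lag samples (illustrative only).
rng = random.Random(0)
series = [10 + 5 * math.sin(t / 8) + rng.gauss(0, 0.5) for t in range(200)]
data = [(series[t - 3:t], series[t]) for t in range(3, len(series))]
learners = [persistence_learner, last_lag_regression_learner]
oof = out_of_fold_predictions(data, learners)
alpha = fit_blend_weight(oof, [target for _, target in data])
print(f"meta-learner weight on persistence: {alpha:.2f}")
```

After the meta-learner has been fitted on the out-of-fold predictions, each base learner is refit on the full training set, and its test-set predictions are passed through the fitted meta-learner.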
4. Case Analysis
This study selected wind farms from the Xihe Energy Meteorological Big Data Platform as case study subjects, using wind power data from 1 April 2024 to 30 May 2024. Each wind farm has a total capacity of 50 MW, with data recorded every 15 min, giving 96 data points per day. The study performs short-term prediction of wind power output, using the data from the preceding three time intervals to estimate the generation for the next interval. Data from 1 April 2024 to 20 May 2024 form the training set, and data from 21 May 2024 to 30 May 2024 form the test set. The experiments were carried out in PyCharm 2021.
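With 15-min resolution and three lagged observations as inputs, each sample pairs the values at t-3, t-2 and t-1 with the target at t. The sketch below builds such samples and splits them chronologically, so that targets from 1 April to 20 May (50 days) fall in the training set and targets from 21 to 30 May (10 days, 960 points) in the test set. The function names and the synthetic series are hypothetical, the series is assumed to be a gap-free list of power values, and only power lags are used here, whereas the model's actual inputs also include the selected meteorological features.

```python
from typing import List, Tuple

POINTS_PER_DAY = 24 * 60 // 15  # 96 readings per day at 15-min resolution
N_LAGS = 3                      # preceding three intervals used as inputs

Window = Tuple[List[float], float]


def make_windows(series: List[float], start: int, end: int,
                 n_lags: int = N_LAGS) -> List[Window]:
    """Build (lags, target) pairs for target indices in [start, end).

    Lags may reach back before `start`, but targets never cross the boundary,
    so no test-period value is ever used as a training target.
    """
    first = max(start, n_lags)
    return [(series[t - n_lags:t], series[t]) for t in range(first, end)]


def chronological_split(series: List[float],
                        train_days: int) -> Tuple[List[Window], List[Window]]:
    """Split into training and test samples at a day boundary (no shuffling)."""
    cut = train_days * POINTS_PER_DAY
    return make_windows(series, 0, cut), make_windows(series, cut, len(series))


# Hypothetical 60-day series covering 1 April to 30 May 2024: 60 * 96 = 5760 points.
series = [float(i % POINTS_PER_DAY) for i in range(60 * POINTS_PER_DAY)]
train, test = chronological_split(series, train_days=50)  # 1 April to 20 May
print(len(train), len(test))  # 4797 960
```

The first three training points serve only as lags, which is why the training set has 4797 rather than 4800 samples.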
4.1. Model Evaluation Metrics
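To assess the predictive accuracy, model fit, and reliability of the proposed wind power prediction model, the following criteria are used: the root-mean-square error (RMSE) measures precision, the standard deviation of the training error (SD) measures robustness, and the coefficient of determination (R²) indicates the quality of fit. Their mathematical formulas are:
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}$$
$$R^2=1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}$$
$$\mathrm{SD}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(e_i-\bar{e}\right)^2},\qquad e_i=y_i-\hat{y}_i$$
where $y_i$ represents the actual wind power value at the $i$-th data point; $\hat{y}_i$ denotes the predicted wind power value at the $i$-th data point; $\bar{y}$ represents the average of the actual wind power values; $n$ is the number of data points; $e_i$ is the error at the $i$-th data point; and $\bar{e}$ represents the average training error.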
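The three metrics can be computed directly from the actual and predicted series using their standard definitions. The sketch below is a minimal, dependency-free version; it assumes the population (1/n) normalization for SD, since the normalization used in the study is not stated, and the example values are hypothetical.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the data (MW here)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def error_sd(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Standard deviation of the errors e_i = y_i - y_hat_i (1/n normalization)."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_error = sum(errors) / len(errors)
    return math.sqrt(sum((e - mean_error) ** 2 for e in errors) / len(errors))


actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 13.0, 15.0, 17.0]  # constant bias of +1 MW
print(rmse(actual, predicted), r_squared(actual, predicted), error_sd(actual, predicted))
```

A constant bias raises RMSE but leaves SD at zero, which is why SD is read as a measure of the consistency (robustness) of the errors rather than of their size.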
4.2. Analysis of Results of the Preprocessing Module Based on the IZOA-Stacking Wind Power Prediction Model
Because the various wind power features differ markedly in their influence on wind power output, the data preprocessing module of the proposed model was used to analyze and screen each feature. Both grey relational analysis and Pearson correlation analysis were applied to every feature. The grey relational degree diagram and the Pearson radar chart are displayed in Figure 3 and Figure 4, respectively, and the findings are summarized in Table 1. As indicated in Table 1, wind speed has the greatest impact on wind power output, followed by atmospheric pressure, humidity, and temperature.
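The two relevance criteria can be combined as follows. The sketch below computes a grey relational grade (with min-max normalization and a distinguishing coefficient of 0.5, a common default) and the absolute Pearson coefficient for each feature, divides each criterion by its maximum over the features, and averages the two. The normalization choices, the 0.6 selection threshold, the helper names and the toy data are all assumptions for illustration; the study's exact settings are not given in this section.

```python
import math
from typing import Dict, List


def min_max(values: List[float]) -> List[float]:
    """Scale a sequence to [0, 1]; a constant sequence maps to zeros."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def grey_relational_grades(target: List[float], features: Dict[str, List[float]],
                           rho: float = 0.5) -> Dict[str, float]:
    """Grey relational grade of each feature with respect to the target series."""
    reference = min_max(target)
    deltas = {name: [abs(r - x) for r, x in zip(reference, min_max(series))]
              for name, series in features.items()}
    d_min = min(min(d) for d in deltas.values())
    d_max = max(max(d) for d in deltas.values())
    if d_max == 0.0:
        return {name: 1.0 for name in features}  # every feature matches the target exactly
    # Relational coefficient per time step, averaged into a grade per feature.
    return {name: sum((d_min + rho * d_max) / (d_k + rho * d_max) for d_k in d) / len(d)
            for name, d in deltas.items()}


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def combined_scores(target: List[float], features: Dict[str, List[float]]) -> Dict[str, float]:
    """Average of the two criteria, each divided by its maximum over the features."""
    grades = grey_relational_grades(target, features)
    abs_r = {name: abs(pearson(series, target)) for name, series in features.items()}
    g_max, r_max = max(grades.values()), max(abs_r.values())
    return {name: 0.5 * (grades[name] / g_max + abs_r[name] / r_max) for name in features}


# Toy data (hypothetical): power follows a capped cubic of wind speed.
wind_speed = [3.0, 5.0, 7.0, 9.0, 11.0, 12.0, 8.0, 6.0]
power = [min(50.0, 0.04 * v ** 3) for v in wind_speed]  # capped at 50 MW
temperature = [15.0, 14.0, 16.0, 15.0, 14.0, 16.0, 15.0, 14.0]
scores = combined_scores(power, {"wind_speed": wind_speed, "temperature": temperature})
selected = [name for name, s in scores.items() if s >= 0.6]  # hypothetical threshold
print(scores, selected)
```

Dividing each criterion by its maximum puts the grey relational grade and the Pearson coefficient on a comparable scale before averaging, so neither criterion dominates simply because of its range.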
With wind speed retained as a feature, anomaly detection and correction are then applied to the wind speed data. To evaluate how handling anomalous wind speed values affects the correlation with wind power generation, a correlation analysis is performed for different degrees of wind speed data processing.
Figure 5 presents a scatter plot comparison between wind speed-associated power generation prior to outlier treatment and the predicted power generation corresponding to different levels of wind speed processing.
The figure shows that data cleaned with the One-Class SVM (OneSVM) algorithm preserve the intrinsic wind power characteristics of the wind farm while eliminating anomalous power records and data recorded during wind curtailment periods.
Table 2 shows the correlation with wind power after 10%, 50%, and 100% of the wind speed data are randomly processed. As shown in Table 2, with wind speed retained as a feature, all three processing levels yield higher correlation coefficients than the unprocessed wind speed data, and processing 100% of the wind speed data gives the highest correlation with wind power generation.
To quantify the contribution of the preprocessing module to the final forecasting performance, an ablation study is conducted. Five configurations are considered: Model 1 uses all raw features without feature selection or wind speed outlier processing; Model 2 performs feature selection only; and Models 3–5 apply both feature selection and wind speed outlier processing, where 10%, 50%, and 100% of the wind speed samples are corrected, respectively. All other components of the stacking framework and the IZOA optimization procedure are kept identical, so that performance differences can be attributed to the preprocessing strategy. The parameter search ranges are summarized in Table 3, and the prediction results of the five configurations are reported in Figure 6 and Figure 7.
The RMSE, R², and SD of each model's predictions are shown in Table 4. As shown in Table 4, after feature selection, the RMSE, R², and SD of the IZOA-Stacking wind power prediction model decrease by 0.307 MW, increase by 1.27, and decrease by 0.036 MW, respectively. As 10%, 50%, and 100% of the wind speed data are randomly processed, the RMSE decreases progressively by an average of 0.161 MW, R² increases progressively by an average of 0.57, and SD increases progressively by an average of 0.019 MW. These results show that the preprocessing module of the IZOA-Stacking model reduces wind power prediction error and improves prediction accuracy and goodness of fit.
4.3. Power Prediction Result Analysis Based on the IZOA-Stacking Wind Power Prediction Model
The preprocessing module of the IZOA-Stacking model analyzes and processes the wind farm data to extract features relevant to wind power, including the processed wind speed data, atmospheric pressure, humidity, and temperature. These features are combined with the wind power data to form the multi-dimensional input of the IZOA-Stacking model. To assess the efficiency and accuracy of the proposed approach, the XGBoost, LSTM, ResNet, RR, stacking, and IZOA-Stacking models are compared. The forecast results are depicted in Figure 8, and Figure 9 presents boxplots for each forecasting model, where models a–f correspond to XGBoost, LSTM, ResNet, RR, stacking, and IZOA-Stacking, respectively. Figure 8 and Figure 9 show that the IZOA-Stacking forecasts align more closely with the actual measurements than those of the alternative approaches, indicating superior prediction accuracy.
Table 5 shows the RMSE, R², and SD of each model's predictions. Compared with the XGBoost, LSTM, ResNet, RR, and stacking models, the proposed model reduces RMSE by 2.749 MW, 3.58 MW, 1.905 MW, 2.23 MW, and 2.211 MW and SD by 2.12 MW, 1.762 MW, 1.859 MW, 1.641 MW, and 2.202 MW, respectively, and improves R² by 14.39%, 20.96%, 8.77%, 10.81%, and 10.69%, respectively. These results demonstrate superior predictive performance and more stable forecasting results.
4.4. Discussion
This study uses wind power data only from 1 April 2024 to 30 May 2024, covering a two-month spring period. This limited scope does not capture seasonal variations in wind intensity and turbulence. The model’s performance under winter, summer, or autumn conditions remains untested, which restricts generalizability. Future work should extend the dataset to a full year.
2. Comparison with Existing Stacking Ensemble Literature
A methodologically similar approach can be found in Yang et al. [35]. Compared with their work, IZOA-Stacking introduces three distinct contributions: (1) a OneSVM-based outlier replacement module that corrects 100% of the anomalous wind speed data, measurably increasing feature correlation; (2) an Improved Zebra Optimization Algorithm (IZOA) for hyperparameter tuning; and (3) a task-oriented selection of four complementary model families (XGBoost, LSTM, ResNet, and ridge regression). In our experiments, the full framework with preprocessing outperforms both the raw-feature baseline and the unoptimized stacking ensemble.
3. Computational Cost and Practical Deployment Trade-offs
Beyond forecasting accuracy, the computational cost of the framework must also be considered. IZOA-Stacking is more demanding than the individual base models because it repeatedly trains multiple learners during cross-validation and hyperparameter search. In our implementation, training takes approximately 5 to 10 times longer than for a single XGBoost model on the same hardware, which is a clear computational constraint. The forecasting time for new samples, however, remains modest, because inference only requires a forward pass through the trained base learners and the meta-learner. In practice, this trade-off makes the method suitable for scenarios in which models are updated offline while near-real-time predictions are required online. Conversely, the repeated training overhead may become prohibitive in highly resource-constrained environments that require frequent retraining. For online real-time forecasting, where future data are unavailable, any newly arriving data point that the OneSVM model flags as anomalous is replaced by the most recent normal value. This allows the model to operate in real-world dispatch scenarios without look-ahead bias.
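The online replacement rule can be sketched as a generator that only ever looks backward. In the sketch below, the detector is passed in as a callable; the threshold function `is_implausible_wind_speed` is a hypothetical stand-in used only to make the example runnable, whereas in the described framework the callable would wrap the fitted One-Class SVM (for instance, scikit-learn's `OneClassSVM.predict` returns -1 for outliers). The optional `fallback` value for anomalies arriving before any normal reading is also an assumption.

```python
from typing import Callable, Iterable, Iterator, Optional


def online_clean(stream: Iterable[float], is_anomalous: Callable[[float], bool],
                 fallback: Optional[float] = None) -> Iterator[float]:
    """Yield a cleaned value for each arriving reading, using only past data.

    A reading flagged by `is_anomalous` is replaced by the most recent value
    judged normal; no future observations are consulted (no look-ahead).
    """
    last_normal = fallback
    for value in stream:
        if is_anomalous(value):
            # Before any normal reading (and without a fallback), pass the value through.
            yield last_normal if last_normal is not None else value
        else:
            last_normal = value
            yield value


def is_implausible_wind_speed(v: float) -> bool:
    """Hypothetical stand-in detector: flags physically implausible speeds (m/s)."""
    return v < 0.0 or v > 40.0


readings = [5.2, 5.8, 48.0, 6.1, -1.0, 6.4]
print(list(online_clean(readings, is_implausible_wind_speed)))
# [5.2, 5.8, 5.8, 6.1, 6.1, 6.4]
```

Because each output depends only on the current and earlier readings, the same function can be applied to a live data feed and to a historical replay with identical results.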
4. Implications for Practice, Policy, and Society
Implications for practice and economic impact: More accurate 15-min-ahead forecasts can reduce imbalance penalties, reserve costs, and wind curtailment, thereby improving wind farm profitability. Implications for policy and grid management: Reliable forecasting supports higher renewable penetration, reduces the need for fossil-fuel reserves, and enhances grid stability; policymakers could encourage advanced ensemble forecasting methods through grid codes. Implications for society: Stable and cost-effective wind power can help lower electricity prices, reduce pollution, and accelerate the renewable energy transition.
5. Contributions to Research and Education
This work provides a reproducible case study with fixed random seeds for data splitting and 5-fold cross-validation, so that the reported metrics (RMSE, R², and SD) can be reproduced. The framework can serve as a benchmark in graduate-level courses on renewable energy forecasting and ensemble learning, and the ablation experiments quantify the marginal contribution of each component.
5. Conclusions
This paper develops a short-term wind power forecasting framework, termed IZOA-Stacking, that combines a dedicated multi-feature preprocessing module, a set of complementary base learners, and an Improved Zebra Optimization Algorithm within a stacking ensemble architecture. Rather than proposing an entirely new learning paradigm, the work derives its strength from a task-oriented design that explicitly matches complementary model families to the complex, multi-scale characteristics of the meteorological and operational variables involved in wind power forecasting, and from a systematic evaluation of the effect of each component.
First, the preprocessing module jointly applies grey relational analysis and Pearson correlation analysis to quantify the relevance of meteorological and operational variables to wind power output. By averaging the normalized relevance scores from the two criteria and discarding weakly correlated variables, the module mitigates the imbalance of feature scales and reduces the risk of overfitting during model training. When wind speed is retained, a One-Class SVM is used to detect and correct anomalous wind speed values, which further improves the consistency between the input features and the actual operating conditions of the wind farm.
Second, XGBoost, LSTM, ResNet, and ridge regression are combined as the first-level learners of the stacking framework. These models are selected deliberately for their complementary inductive biases: XGBoost efficiently handles tabular data and nonlinear relationships, LSTM captures sequential temporal dependencies, ResNet enhances multi-scale feature extraction, and ridge regression provides a simple, low-variance linear baseline. This task-oriented selection allows the ensemble to capture nonlinear interactions, temporal dynamics, multi-scale patterns, and approximate linear trends simultaneously, which is essential for wind power data characterized by fluctuating meteorological variables and operational constraints. Using XGBoost as the meta-learner allows the framework to exploit the heterogeneous base-model outputs and to compensate for the weaknesses of individual models.
Third, the Improved Zebra Optimization Algorithm is employed to tune the hyperparameters of all base learners and of the meta-learner. The elite opposition-based strategy in IZOA enhances global exploration and accelerates convergence, reducing the sensitivity of the overall framework to manual parameter selection. Comparative experiments show that, on the same training and test sets, the IZOA-Stacking model achieves lower RMSE and SD and a higher R² than the individual base models, the conventional stacking ensemble, and other reference approaches. To clarify the role of each component, ablation experiments are conducted on the preprocessing and ensemble modules. When feature selection is applied without wind speed outlier processing, predictive performance improves relative to using all raw features but remains inferior to that of the full preprocessing pipeline. Correcting different proportions of the wind speed data further strengthens the correlation between wind speed and power output and improves overall forecasting accuracy. Similarly, the comparison among the single models, the unoptimized stacking ensemble, and the IZOA-Stacking model confirms that combining stacking with IZOA-based hyperparameter tuning yields the most accurate and stable forecasts.
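For readers unfamiliar with elite opposition-based learning, the sketch below shows one step of its generic formulation for a minimization problem: for each elite solution x, an opposite point k(a_j + b_j) - x_j is generated inside the elite's dynamic bounds [a_j, b_j], with k drawn uniformly from [0, 1], and the best candidates from the union of the population and the opposites are kept. This is the textbook form of the strategy, not necessarily the exact update used inside IZOA; the function name, the sphere test function and the population settings are assumptions.

```python
import random
from typing import Callable, List

Vector = List[float]


def elite_opposition_step(population: List[Vector], fitness: Callable[[Vector], float],
                          n_elite: int, rng: random.Random) -> List[Vector]:
    """One generic elite opposition-based learning step (minimization).

    Opposite points of the elite solutions are generated inside the elite's
    dynamic bounds [a_j, b_j]; the best len(population) candidates are kept.
    """
    ranked = sorted(population, key=fitness)
    elite = ranked[:n_elite]
    dims = len(population[0])
    lower = [min(e[j] for e in elite) for j in range(dims)]
    upper = [max(e[j] for e in elite) for j in range(dims)]
    opposites = []
    for e in elite:
        k = rng.random()  # generalized-opposition coefficient k ~ U(0, 1)
        candidate = []
        for j in range(dims):
            x = k * (lower[j] + upper[j]) - e[j]
            if not lower[j] <= x <= upper[j]:
                # Outside the dynamic bounds: resample uniformly inside them.
                x = rng.uniform(lower[j], upper[j])
            candidate.append(x)
        opposites.append(candidate)
    # Greedy selection over the union keeps the population size constant.
    return sorted(population + opposites, key=fitness)[:len(population)]


def sphere(v: Vector) -> float:
    """Hypothetical test objective with its minimum at the origin."""
    return sum(x * x for x in v)


rng = random.Random(1)
pop = [[rng.uniform(-5.0, 5.0) for _ in range(2)] for _ in range(10)]
new_pop = elite_opposition_step(pop, sphere, n_elite=3, rng=rng)
print(min(map(sphere, pop)), min(map(sphere, new_pop)))
```

Because the original population is part of the union, the best fitness can never get worse after a step, while the opposite points probe new regions near the current elite.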
Despite these advantages, several limitations should be acknowledged. The case study is based on data from a single wind farm within a relatively short time span, which may restrict the generalization of the results to other locations, seasons, and turbine configurations. In addition, the multi-stage framework, including feature analysis, anomaly detection, and meta-learning, introduces additional training-time computational cost compared with simpler single-model approaches. Although the prediction stage remains efficient once the model is trained, future work will consider a more detailed complexity analysis and potential model simplifications for real-time or resource-constrained deployments. Moreover, the current experimental design, while including multiple benchmark models, does not yet cover the full range of state-of-the-art optimization algorithms and ensemble strategies.
Because the current study relies on a single 50 MW wind farm with only two months of data (April–May 2024), its findings may not generalize across seasons, geographies, and turbine configurations. In future research, we plan to extend the methodology to longer forecasting horizons and multiple wind farms, and to incorporate additional baseline models and optimization techniques for a more comprehensive comparison. Another promising direction is to formulate multi-objective optimization schemes that explicitly balance forecast accuracy, model complexity, and computational cost. Beyond these extensions, the real-time use of wind power predictions could be enhanced by coupling them with intelligent control systems. For instance, integrating the proposed forecasting model with reinforcement learning controllers in off-grid microgrids could optimize hydrogen storage utilization and minimize unmet electrical loads [36]. Such a combination would enable adaptive decision-making under uncertainty, using forecasted wind power to schedule hydrogen production and storage operations dynamically. In addition, more advanced learning paradigms, such as meta-reinforcement learning, could accelerate adaptation to new wind farms or rapidly changing weather patterns with minimal retraining [37]. These directions, together with the multi-objective optimization schemes mentioned above, are expected to provide power system operators with more robust, interpretable, and operationally aware tools for planning and real-time control in renewable-dominated power systems.