An Adaptive, Data-Driven Stacking Ensemble Learning Framework for the Short-Term Forecasting of Renewable Energy Generation

With the increasing integration of wind and photovoltaic power, the security and stability of power system operations are greatly influenced by the intermittency and fluctuation of these renewable energy sources. Accurate and reliable short-term forecasting of renewable energy generation can effectively reduce the impact of uncertainty on the power system. In this paper, we propose an adaptive, data-driven stacking ensemble learning framework for short-term output power forecasting of renewable energy. Five base-models are adaptively selected from twelve candidate models according to the coefficient of determination (R²). Cross-validation is then used to increase data diversity, and Bayesian optimization is used to tune the hyperparameters. Finally, the base-models, with weights determined by minimizing the cross-validation error, are combined using a linear model. Four seasonal datasets from a wind farm and a photovoltaic power station are used to verify the proposed model. The results show that the proposed stacking ensemble model for renewable energy power forecasting adapts to dynamic changes in the data and achieves better prediction precision and stronger generalization performance than the benchmark models.


Introduction
With increasing global warming and environmental issues, renewable energy sources, especially wind and solar power, are receiving increasing attention. Due to the randomness and intermittency of wind and solar resources, high penetration of wind and photovoltaic (PV) power generation introduces uncertainty into the power system. Accurate and stable short-term forecasting of wind and PV output power is crucial for maintaining the balance between supply and demand, optimizing the configuration of spinning reserve capacity, and making dispatching decisions in the power market environment [1,2]. Data-driven prediction models for wind and solar energy based on artificial intelligence and machine learning are widely used, owing to their strong ability to mine historical data [3].
For data-driven renewable energy generation prediction, a complex nonlinear mapping between the input features and the output power usually needs to be constructed. Traditional time series models such as the autoregressive (AR), autoregressive moving average (ARMA), and autoregressive integrated moving average (ARIMA) models only define a linear mapping between input and output, and their prediction error grows as the forecast horizon lengthens [4]. Advanced machine learning methods are capable of building a strong nonlinear input-output mapping through a black-box approach [5,6]. A number of regression models use black-box mapping, e.g., artificial neural networks (ANN) [7] and support vector regression (SVR) [8]. ANNs mimic the biological neural networks of the brain and consist of many connected neurons that carry and transmit signals. Deep neural network methods, such as autoregressive neural networks [9], convolutional neural networks [10], and long short-term memory neural networks [11,12], have developed rapidly due to their strong feature-capturing ability with little prior knowledge. Nevertheless, deep learning architectures are relatively complex, require a large amount of training data, and may fail to outperform other prediction models when only a small sample is available. SVR uses a kernel function to map the original feature space into a high-dimensional space and then constructs a linear mapping there, overcoming the dimensionality problem and achieving good results on small datasets. Therefore, SVR is selected as a candidate model in this paper.
Energies 2023, 16, 1963
In recent years, tree ensemble machine learning models [13,14], such as extreme gradient boosting [15] and gradient boosting trees [16], have received increasing attention in industrial and academic research due to their open architecture, low computing cost, and robustness. The authors of [17] compared the performance of random forest (RF), extra trees (ET) and support vector regression (SVR) for photovoltaic power prediction; the ET model achieved the best performance in terms of forecasting accuracy, computational cost, and stability. The authors of [18] described the advantages of tree ensemble learning models, including RF, gradient boosting regression trees (GBRT), and extreme gradient boosting (XGB), over the SVR method for wind speed and solar radiation prediction. The authors of [19] evaluated the performance of the XGB and GBRT methods for solar irradiance prediction. However, using a single model for renewable energy forecasting, as in the studies above, may lead to low prediction accuracy and insufficient generalizability when processing various non-stationary datasets.
A hybrid model based on ensemble learning can combine the advantages of different models to improve prediction accuracy and stability. Such models are more robust than a single model and are widely applied in energy generation prediction. The authors of [20] proposed a hybrid model combining ET with a deep neural network for hourly solar irradiance prediction. The authors of [21] combined a long short-term memory neural network with a convolutional neural network to predict solar irradiance. The authors of [22] adopted a stacking fusion framework based on RF, adaptive boosting (ADA), and XGB for photovoltaic power prediction and achieved improved prediction accuracy. The authors of [23,24] built hybrid models based on multiple deep learning methods for wind power prediction. These combined models improve prediction accuracy and stability to some extent but ignore the complex, dynamically changing characteristics of the datasets. The factors affecting wind and PV output power are complicated, and the collected meteorological and historical data are high-dimensional and heterogeneous. Therefore, an ensemble learning framework that adaptively selects the optimal base-models according to the data characteristics is a key technology for improving the accuracy and generalization performance of prediction models.
In this paper, we propose an adaptive, data-driven stacking ensemble learning framework for predicting renewable energy output power through deep mining of historical data. Twelve diverse regression models that have been successfully used to mine information hidden in raw renewable forecasting datasets are applied as candidate models [25][26][27]. To reduce the negative effects of the uncertainty hidden in the historical data and to enhance generalization performance, an adaptive ensemble framework is developed that selects the five best models according to evaluation indices. The hyperparameters of each base-model are tuned using Bayesian optimization, and linear regression is employed as a meta-model to combine the five selected base-models. The weight of each base-model is obtained adaptively by cross-validation. Case studies based on actual data from a wind farm and a PV station located in central China verify the effectiveness of the proposed adaptive stacking ensemble learning model for renewable energy output power forecasting. In summary, the key contributions of this paper are as follows:
(1) A novel, data-driven, adaptive stacking ensemble learning framework is developed for output power forecasting of renewable energy. The stacking structure and the different base-models deeply explore the information hidden in the raw data, thereby boosting the regression ability for multi-dimensional heterogeneous datasets.
(2) Twelve independent candidate regression models, including bagging, boosting, linear, K-nearest neighbor and SVR methods, are comprehensively compared. The five best-performing models are then selected adaptively to form the stacking ensemble structure. The diversity among the base-models helps ensure the stability and generalization performance of the stacking model.
(3) A meta-model is constructed using linear regression. The weights of the base-models are determined by minimizing the cross-validation risk of the base-model estimators.
(4) The hyperparameters of the base-models and the meta-model are tuned using the Bayesian global optimization method, which further enhances the forecasting accuracy of the proposed model.

Adaptive Ensemble Learning Framework for Renewable Energy Forecast
Twelve methods that have performed well for renewable energy power prediction in the current literature are used as candidate models: the boosting algorithms adaptive boosting (ADA), GBRT, XGB and light gradient boosting machine (LGBM); the tree and bagging algorithms decision tree (DT), bagging, RF and extra trees (ET); and linear regression (LR), K-nearest neighbor regression (KNN), elastic net regression (ELAN) and SVR.
Algorithms with different principles and structures view the data from different perspectives and thus complement each other. The diversity and forecasting ability of the base-models are crucial to the generalization and regression performance of the stacking ensemble learning framework. Generally, the first layer of a stacking framework uses three to five base learners. Too few learners bring little benefit to the integrated model, while too many make the model structure redundant and increase the computing cost without improving prediction accuracy. In this paper, the 12 candidate models are trained and tested on the same dataset, and the 5 models with the best prediction performance in terms of the R² index are selected as base learners. As shown in the base-model selection module of Figure 1, the adaptively selected base-models may vary across datasets.
K-fold cross-validation is applied to prevent the meta-model from overfitting the training data and to enhance the generalization performance of the model. Cross-validation is a resampling method used to evaluate machine learning models; K-fold means that the data are split into K separate folds. In each round, K-1 folds are used to train the model and the remaining fold is used for validation, and the final estimate is obtained by averaging the results of the K evaluations [28]. Because each fold serves exactly once as the validation set, every training sample receives one out-of-fold prediction from each base-model, and these out-of-fold predictions form the input data of the meta-model. The overall framework of the proposed ensemble model for renewable energy output power forecasting is displayed in Figure 1; the procedure can be summarized as follows:
(1) Twelve candidate models are trained and tested, and five base-models are selected by evaluating the R² index.
For each base-model:
a. Select a 5-fold split of the training dataset;
b. Evaluate using 5-fold cross-validation;
c. Tune hyperparameters using the Bayesian optimization method;
d. Store all out-of-fold predictions.
(2) Fit a meta-model on the out-of-fold predictions by linear regression.
(3) Evaluate the model on a holdout prediction dataset.
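The procedure above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the authors' code: the base-models are scikit-learn stand-ins (XGB and LGBM would require the separate xgboost and lightgbm packages), the data are synthetic, and the Bayesian tuning of step (1)c is omitted. `cross_val_predict` produces the out-of-fold predictions of step (1)d, and `LinearRegression` plays the role of the meta-model in step (2); `fit_stacking` and `predict_stacking` are hypothetical helper names.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsRegressor


def fit_stacking(base_models, X_train, y_train, n_splits=5, seed=0):
    """Two-layer stacking: out-of-fold base-model predictions -> linear meta-model."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Each training sample gets exactly one out-of-fold prediction per base-model
    oof = np.column_stack(
        [cross_val_predict(model, X_train, y_train, cv=kf) for model in base_models]
    )
    # The meta-model learns one weight per base-model by least squares on the OOF matrix
    meta = LinearRegression().fit(oof, y_train)
    # Refit every base-model on the full training set for use at prediction time
    fitted = [model.fit(X_train, y_train) for model in base_models]
    return fitted, meta


def predict_stacking(fitted, meta, X):
    base_preds = np.column_stack([model.predict(X) for model in fitted])
    return meta.predict(base_preds)


# Usage on synthetic data with an 80/20 holdout split
X, y = make_regression(n_samples=400, n_features=4, n_informative=4,
                       noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = X[:320], X[320:], y[:320], y[320:]
bases = [GradientBoostingRegressor(random_state=0),
         RandomForestRegressor(n_estimators=100, random_state=0),
         KNeighborsRegressor(n_neighbors=5)]
fitted, meta = fit_stacking(bases, X_tr, y_tr)
y_hat = predict_stacking(fitted, meta, X_te)
print("meta-model weights:", meta.coef_)
print(f"holdout R^2: {r2_score(y_te, y_hat):.3f}")
```

Fitting the linear meta-model on out-of-fold rather than in-sample predictions keeps it from simply rewarding whichever base-model overfits the training data most.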

Methodology
Ensemble learning is a machine learning method that combines a series of base learners according to certain rules to obtain a strong learner, which is usually more robust than a single model. Ensemble techniques, including bagging, boosting and stacking, are popular and widely used in renewable energy generation prediction and load forecasting [29][30][31].

Regression Method Based on Boosting Learning
Boosting methods fit weak learners sequentially on different (reweighted) versions of the training dataset and combine their predictions with different weights until a suitably strong learner is obtained [32]. Tree-based boosting methods mainly include ADA, GBRT, XGB and LGBM.
AdaBoost uses the CART tree as the base learner and learns over multiple iterations, adjusting the weights of the samples and of the base learners at each iteration to minimize the loss [27,32]. GBRT builds on the boosting idea of ADA with a gradient boosting algorithm, in which each new tree is fitted to the negative gradient of the loss, and follows a shrinkage and regularization approach, which effectively improves the accuracy and stability of the prediction [27,33].
The XGB method adds several optimizations and refinements to the original GBRT, making the construction of ensembles more straightforward and more general. Details of XGB can be found in [20,22,27]. LGBM is a gradient boosting algorithm proposed by Microsoft in 2017 as an improvement over XGB. Gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB) are used to enhance its histogram algorithm and decision tree growth strategy, improving computing speed, stability, and robustness without reducing accuracy [18]. Taking LGBM as an example, consider a dataset $D = \{(x_i, y_i): i = 1, \dots, N\}$ with input time series $x_i$ and output $y_i$, for which the nonlinear mapping $y = f(x)$ is to be constructed. With the loss function $L(y, f(x)) = (y - f(x))^2$, the objective of model training is to find

$$f^*(x) = \arg\min_{f} \mathbb{E}_{y,x}\left[L(y, f(x))\right].$$

The steps of the LGBM algorithm are summarized in Algorithm 1. In step (d), the optimal weight of each regression tree $T(x; \Theta_m)$ is calculated, starting from the initial weight of the regression tree, where $\Theta$ denotes the parameters of a regression tree.
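To make the squared-loss objective concrete, the sketch below implements plain gradient boosting of regression trees, the GBRT idea that XGB and LGBM refine. It is an illustrative sketch, not Algorithm 1: LGBM's histogram binning, GOSS, EFB and leaf-wise growth are not included, and the helper names and settings (`n_trees`, `learning_rate`, `max_depth`) are hypothetical. With $L = \tfrac{1}{2}(y - f)^2$, which differs from the loss above only by a constant factor, the negative gradient is the residual $y - f$, so each new tree is fitted to the current residuals.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def gradient_boost_fit(X, y, n_trees=100, learning_rate=0.1, max_depth=3, seed=0):
    """Plain gradient boosting for the squared loss (GBRT-style, not LightGBM)."""
    f0 = float(np.mean(y))                 # constant model minimizing the squared loss
    pred = np.full(len(y), f0)
    trees = []
    for m in range(n_trees):
        residual = y - pred                # negative gradient of 0.5 * (y - f)^2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=seed + m)
        tree.fit(X, residual)
        pred += learning_rate * tree.predict(X)   # shrinkage: take a damped step
        trees.append(tree)
    return f0, trees


def gradient_boost_predict(f0, trees, X, learning_rate=0.1):
    # Must use the same learning_rate as during fitting
    return f0 + learning_rate * sum(tree.predict(X) for tree in trees)


# Usage on a synthetic nonlinear target
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 2))
y = np.sin(6 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, 300)
f0, trees = gradient_boost_fit(X, y)
train_mse = np.mean((y - gradient_boost_predict(f0, trees, X)) ** 2)
baseline_mse = np.mean((y - f0) ** 2)
print(f"baseline MSE {baseline_mse:.3f} -> boosted MSE {train_mse:.3f}")
```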

Regression Method Based on Bagging Learning
The bagging ensemble uses bootstrap replicates (sampling with replacement) to obtain multiple different samples of the same training dataset as new training sets, and fits a decision tree on each new set. Because each tree is trained on a perturbed sample, averaging the predictions of all the trees reduces variance. Combining the predictions in this way improves accuracy and helps the bagging method avoid overfitting [34,35].
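A from-scratch sketch of bagging, assuming scikit-learn's `DecisionTreeRegressor` as the unpruned base tree and synthetic data; `bagging_fit` and `bagging_predict` are hypothetical helper names. It compares the holdout error of one fully grown tree with that of an average over 50 bootstrap-trained trees.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def bagging_fit(X, y, n_trees=50, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)   # bootstrap replicate: n draws with replacement
        tree = DecisionTreeRegressor(random_state=seed).fit(X[idx], y[idx])  # unpruned
        trees.append(tree)
    return trees


def bagging_predict(trees, X):
    return np.mean([tree.predict(X) for tree in trees], axis=0)  # average the trees


# Usage: single fully grown tree vs. bagged ensemble on noisy synthetic data
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(400, 3))
y = np.sin(4 * X[:, 0]) + X[:, 1] + rng.normal(0, 0.3, 400)
X_tr, X_te, y_tr, y_te = X[:300], X[300:], y[:300], y[300:]

single = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
single_mse = np.mean((y_te - single.predict(X_te)) ** 2)
trees = bagging_fit(X_tr, y_tr)
bag_mse = np.mean((y_te - bagging_predict(trees, X_te)) ** 2)
print(f"single tree MSE {single_mse:.3f}, bagged MSE {bag_mse:.3f}")
```

scikit-learn's `BaggingRegressor` packages the same idea as a ready-made estimator.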
Random forest (RF) is an extension of bagging that also uses bootstrap sampling to build a large number of training sets and fits a different decision tree to each. Unlike bagging, to make the individual trees differ further, RF randomly selects a subset of the input features as split candidates at each node [35]. Out-of-bag (OOB) samples, i.e., those left out of a tree's bootstrap replicate, are used while constructing the forest to obtain a nearly unbiased estimate of the generalization error, and averaging over many trees reduces forecast variance [36,37].
Extra trees (ET), or extremely randomized trees, were developed as an extension of the RF approach; ET employs a classical top-down procedure to construct an ensemble of unpruned regression trees. As in RF, a random subset of features is considered when splitting each node. Unlike RF, ET draws the split threshold at random for each candidate feature and then chooses the best of these random splits to split the node. Additionally, ET uses the whole training dataset, rather than a bootstrap replicate, to train each regression tree in the forest [36]. These differences are likely to reduce overfitting, as explained in [38].
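The differences between RF and ET map directly onto estimator settings in scikit-learn, which is assumed here as the implementation; the paper does not state its library or parameter values, so `max_features=0.5` and `n_estimators=200` are illustrative only. `oob_score=True` asks the random forest for the out-of-bag estimate discussed above (reported as an R² value for regressors), and `bootstrap=False`, the default for `ExtraTreesRegressor`, trains every extra tree on the whole training set.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=6, n_informative=4,
                       noise=5.0, random_state=0)

# RF: bootstrap replicates + random feature subset at each split;
# the out-of-bag samples give a built-in generalization estimate
rf = RandomForestRegressor(n_estimators=200, max_features=0.5, bootstrap=True,
                           oob_score=True, random_state=0).fit(X, y)

# ET: random feature subset + random split thresholds; no bootstrap,
# so every tree sees the whole training set
et = ExtraTreesRegressor(n_estimators=200, max_features=0.5, bootstrap=False,
                         random_state=0).fit(X, y)

print(f"RF out-of-bag R^2: {rf.oob_score_:.3f}")
```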

Other Regression Models
Linear regression is widely used in statistics to quantitatively analyze the dependence between two or more variables. Basic linear regression describes a linear relationship between variables, and the least-squares method is commonly used to train it. Elastic net is an extension of linear regression that adds both L1 and L2 regularization terms, combining the benefits of the least absolute shrinkage and selection operator (lasso) and ridge regression, which results in better prediction performance [39].
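For reference, one common way to write the two training objectives is shown below. The elastic net form, with penalty strength $\alpha$ and mixing parameter $\rho$ (`l1_ratio`), is the parameterization used by scikit-learn's `ElasticNet`; it is assumed here because the paper does not state its own. Setting $\rho = 1$ recovers the lasso penalty and $\rho = 0$ a ridge penalty.

```latex
\hat{w}_{\mathrm{LS}} = \arg\min_{w}\ \lVert y - Xw \rVert_2^2
  = \left(X^\top X\right)^{-1} X^\top y
  \qquad (\text{if } X^\top X \text{ is invertible})

\hat{w}_{\mathrm{EN}} = \arg\min_{w}\ \frac{1}{2N}\lVert y - Xw \rVert_2^2
  + \alpha\rho\,\lVert w \rVert_1
  + \frac{\alpha(1-\rho)}{2}\,\lVert w \rVert_2^2
```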
K-nearest neighbor regression (KNN) makes predictions by measuring the distances between a sample and the training samples. KNN finds the K nearest neighbors of a sample and assigns the mean of their target values to the sample; in other words, this mean is the predicted value of the sample. Wind power and PV power time series exhibit correlation in the time dimension, so KNN is in principle suitable for wind and PV power forecasting, and it has been applied to renewable energy forecasting [40][41][42].
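The KNN rule can be sketched in a few lines of NumPy. Using lagged power values as features is an assumption made for illustration (the paper's KNN inputs are the meteorological features), the toy series is hypothetical, and `knn_regress` is a hypothetical helper name. The example predicts the last point of the series from its two preceding values with K = 2.

```python
import numpy as np


def knn_regress(X_train, y_train, X_query, k=5):
    """Predict each query as the mean target of its k nearest training samples."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]   # indices of the k closest samples
    return y_train[nearest].mean(axis=1)     # average their target values


# Hypothetical normalized power series; features are the two previous values
series = np.array([0.20, 0.25, 0.30, 0.28, 0.35, 0.40, 0.38, 0.45, 0.50, 0.48])
lags = 2
X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
y = series[lags:]

# Predict the last target from all earlier (lag pattern, next value) pairs
pred = knn_regress(X[:-1], y[:-1], X[-1:], k=2)
print(pred)
```

Here the two lag patterns nearest to (0.45, 0.50) are (0.38, 0.45) and (0.40, 0.38), whose next values 0.50 and 0.45 average to 0.475.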
Support vector regression (SVR) solves regression problems by adopting kernel functions to construct a nonlinear mapping: the input space is mapped into a higher-dimensional feature space, and linear regression is performed in that feature space. The traditional empirical risk minimization principle only minimizes the training error. In contrast, SVR follows the structural risk minimization principle and minimizes an upper bound on the total generalization error with a certain confidence level. SVR is highly effective for nonlinear problems, even with small samples, and is popular in wind and PV power forecasting [36,43].
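The structural-risk formulation can be made concrete with the standard ε-insensitive SVR problem, written below as a reference; the RBF kernel is an assumption, since the paper does not state which kernel it uses. The term $\tfrac{1}{2}\lVert w\rVert^2$ controls model complexity (the flatness of $f$), $C$ weights the slack variables $\xi_i, \xi_i^*$ of samples lying outside the ε-tube, and $\alpha_i, \alpha_i^*$ are the resulting dual coefficients.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^*\right)\\
\text{s.t.}\quad & y_i - w^\top\phi(x_i) - b \le \varepsilon + \xi_i,\\
& w^\top\phi(x_i) + b - y_i \le \varepsilon + \xi_i^*,\\
& \xi_i,\ \xi_i^* \ge 0,\qquad i = 1,\dots,N,
\end{aligned}

f(x) = \sum_{i=1}^{N}\left(\alpha_i - \alpha_i^*\right)K(x_i, x) + b,
\qquad K(x, x') = \exp\!\left(-\gamma\lVert x - x'\rVert^2\right)
```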

Stacking Ensemble
Stacking ensemble trains different base-models on the same dataset and then uses a meta-model to combine the predictions generated by the base-models into the final prediction [44]. The two-layer stacking ensemble learning framework is displayed in Figure 2. The first layer consists of multiple different base learners, whose input is the original training set. The second layer is called the meta-learner: the predictions of the first-layer models are fed to the meta-model to make the final prediction. The meta-learner integrates the prediction abilities of the base learners to improve the performance of stacking ensemble learning. Given the input dataset $D = \{(x_i, y_i): i = 1, \dots, N\}$, the dataset is divided into training, validation and test datasets. $Z_h$ is the $h$-th base-model of the first layer. The prediction output of $Z_h$ on the validation dataset is $Z_h(x_i)$, and its prediction on the test dataset is denoted $Z_h^*(x_i)$. The outputs $Z_h(x_i)$ of the first-layer models form a new training set for the meta-model $Z$, and $Z_h^*(x_i)$ serve as the test input of the meta-model. The final forecasting result can be written as follows:

$$\hat{y}_i = Z\left(Z_1^*(x_i), Z_2^*(x_i), \dots, Z_H^*(x_i)\right),$$

where $H$ is the number of base-models.

Bayesian Hyperparameters Optimization
Bayesian optimization is derived from Bayes' theorem: it uses a probabilistic surrogate model to fit the objective function and selects the most promising evaluation point by maximizing an acquisition function. This procedure reduces unnecessary sampling and makes full use of the complete history of evaluations to improve search efficiency, obtaining an approximate global optimum at low evaluation cost [45,46]. Traditional optimization algorithms, such as grid search, particle swarm optimization and simulated annealing, are not well suited to machine learning methods with many hyperparameters because of their high computing cost [46].
In this paper, the hyperparameters of the base-models and meta-model are tuned using Bayesian optimization, as shown in Figure 3. First, a hyperparameter space $\Lambda$ is defined, with $\Theta \in \Lambda$, covering hyperparameters such as the number of leaf nodes and the depth of the trees. Given the dataset of previous evaluations $D = \{(x_0, y_0), \dots, (x_{i-1}, y_{i-1})\}$, Bayesian global optimization can be described as $\Theta^* \in \arg\min_{\Theta \in \Lambda} F(\Theta)$, where $\Theta^*$ is the optimal hyperparameter set and $F(\Theta)$ is the objective function, i.e., the validation loss of the model with hyperparameters $\Theta$. Since $F(\Theta)$ cannot be observed directly, it is only available through noisy observations $Y(\Theta) = F(\Theta) + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2_{\text{noise}})$. The construction of a surrogate function and the selection of an acquisition function are the key components of Bayesian optimization: the surrogate function expresses assumptions about the function to be optimized, and the acquisition function determines the next evaluation point. In this paper, the tree Parzen estimator (TPE) is employed, which models the densities with kernel density estimators instead of directly modeling the objective function $F$ with a probabilistic model $p(f \mid D)$ [47,48]. More details about Bayesian optimization are discussed in [45][46][47][48].
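The TPE idea can be illustrated with a deliberately simplified one-dimensional version in NumPy. It is a sketch under stated assumptions, not the implementation used in the paper: the kernel bandwidth is fixed, only one continuous hyperparameter is tuned, and `tpe_minimize` and the quadratic `val_loss` curve are hypothetical. Observed trials are split at the γ-quantile of the loss into a "good" density l(Θ) and a "bad" density g(Θ), candidates are drawn from l, and the candidate with the largest ratio l/g is evaluated next; in TPE, maximizing this ratio is equivalent to maximizing the expected improvement. Full implementations are available in libraries such as hyperopt and Optuna.

```python
import numpy as np


def tpe_minimize(objective, low, high, n_init=10, n_iter=40, gamma=0.25,
                 n_candidates=24, seed=0):
    """Simplified 1-D tree-structured Parzen estimator (TPE) for minimization."""
    rng = np.random.default_rng(seed)
    thetas = list(rng.uniform(low, high, n_init))   # random warm-up trials
    losses = [objective(t) for t in thetas]
    bw = 0.1 * (high - low)                          # fixed kernel bandwidth (assumption)

    def parzen(x, centers):
        # Parzen (Gaussian kernel) density estimate centred on the given trials
        z = (x[:, None] - np.asarray(centers)[None, :]) / bw
        return np.exp(-0.5 * z ** 2).mean(axis=1) / (bw * np.sqrt(2 * np.pi))

    for _ in range(n_iter):
        order = np.argsort(losses)
        n_good = max(1, int(np.ceil(gamma * len(losses))))
        good = [thetas[i] for i in order[:n_good]]   # trials defining l(theta)
        bad = [thetas[i] for i in order[n_good:]]    # trials defining g(theta)
        # Sample candidates from l(theta): pick a good trial and add kernel noise
        cand = rng.choice(good, n_candidates) + rng.normal(0.0, bw, n_candidates)
        cand = np.clip(cand, low, high)
        # Evaluate the candidate with the largest density ratio l/g next
        ratio = parzen(cand, good) / (parzen(cand, bad) + 1e-12)
        theta_next = float(cand[np.argmax(ratio)])
        thetas.append(theta_next)
        losses.append(objective(theta_next))
    best = int(np.argmin(losses))
    return thetas[best], losses[best]


def val_loss(theta):
    # Hypothetical validation-loss curve with its minimum at theta = 0.3
    return (theta - 0.3) ** 2


best_theta, best_loss = tpe_minimize(val_loss, 0.0, 1.0)
print(f"best theta = {best_theta:.3f}, loss = {best_loss:.5f}")
```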

Data
Wind speed (WS) and wind direction (WD), the main meteorological features affecting wind output power, are selected as the inputs of the wind power prediction model, and wind power (WP) is the output. The data were collected from the SCADA system of a wind farm located in central China. The installed capacity of the wind farm is 200 MW, and the rated power of each wind turbine is 2 MW. The historical data cover the whole year of 2020 with a 15-min time resolution and are divided into four seasonal datasets with 8832 samples each. Figure 4 gives an example of the historical dataset in spring.

In the PV power model, the main meteorological features affecting PV output power are selected as the inputs of the prediction model: total irradiance (T_irr), normal vertical irradiance (V_irr), horizontal irradiance (H_irr) and temperature (Tem). The data are derived from a 130 MW PV power station located in central China. Because PV output is concentrated in daylight hours, only the historical data from 07:00 to 18:00 are defined as effective; they cover the whole year of 2020 with a 15-min time resolution. The dataset of each season contains 4095 time points. An example of the historical data for spring is shown in Figure 5.
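The seasonal sample counts are consistent with simple arithmetic, which the short Python check below makes explicit. It assumes that both 07:00 and 18:00 are included and that a season spans 91 days; neither is stated in the source, and the start date is hypothetical. Under these assumptions each day contributes 45 points, and 45 × 91 = 4095.

```python
from datetime import datetime, timedelta

start = datetime(2020, 3, 1)   # hypothetical first day of a season
n_days = 91                    # assumed season length
stamps = [start + timedelta(minutes=15 * k) for k in range(96 * n_days)]  # 15-min steps

# Keep only the "effective" daylight window, 07:00 to 18:00 inclusive
daylight = [t for t in stamps if (7, 0) <= (t.hour, t.minute) <= (18, 0)]
points_per_day = len(daylight) // n_days
print(points_per_day, len(daylight))
```

The same reasoning applied to the wind data gives 96 points per day × 92 days = 8832 samples per season.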
Figures 4 and 5 show that the characteristics of wind power and solar power differ. The wind power time series is random, whereas the solar power time series follows a regular daily pattern: PV power is generated only when the PV cells are irradiated by the sun, and at night the output power of the PV station is 0. The differences between the two datasets can be used to verify the generality of the model.

Data Standardization and Evaluation Indices
To reduce interference from outliers and from the different dimensions (units and scales) of the data and to ensure a fair comparison, min-max normalization to [0, 1] is applied:

$$x'_{ij} = \frac{x_{ij} - x_{i,\min}}{x_{i,\max} - x_{i,\min}}, \qquad (i = 1, 2, \dots, m;\ j = 1, 2, \dots, n)$$

where $x_{ij}$ is the $j$-th sample of the $i$-th variable and $x'_{ij}$ is the corresponding normalized value; $x_{i,\max}$ and $x_{i,\min}$ are the maximum and minimum values of the $i$-th variable, respectively; and $m$ and $n$ are the numbers of variables and samples. Root-mean-square error (RMSE), mean absolute error (MAE) and the determination coefficient R² are commonly used as evaluation indices of prediction models [49,50]. The smaller the RMSE and MAE values, the smaller the prediction error. R² measures how well the predicted values agree with the actual values; the larger the value, the better the model fit. These indices are defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_i - \hat{P}_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|P_i - \hat{P}_i\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(P_i - \hat{P}_i\right)^2}{\sum_{i=1}^{N}\left(P_i - \bar{P}\right)^2}$$

where $P_i$ and $\hat{P}_i$ are the measured and predicted values, respectively, $\bar{P}$ is the mean of the measured values, and $N$ is the number of samples.
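The normalization and the three indices translate directly into NumPy. In this sketch, an assumption for illustration, variables are stored as columns (so the paper's $x_{ij}$ is `x[j, i]`), a constant column would cause a division by zero and is not handled, and the small measured/predicted arrays are made-up values.

```python
import numpy as np


def min_max(x):
    """Scale each column (variable) to [0, 1] using its own min and max."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return (x - x_min) / (x_max - x_min)


def rmse(p, p_hat):
    return float(np.sqrt(np.mean((p - p_hat) ** 2)))


def mae(p, p_hat):
    return float(np.mean(np.abs(p - p_hat)))


def r2(p, p_hat):
    # 1 - (residual sum of squares) / (total sum of squares around the mean)
    return float(1 - np.sum((p - p_hat) ** 2) / np.sum((p - np.mean(p)) ** 2))


scaled = min_max([[1, 10], [3, 30], [2, 20]])   # two variables, three samples
p = np.array([0.2, 0.5, 0.9, 0.4])              # measured (normalized) power
p_hat = np.array([0.25, 0.45, 0.8, 0.5])        # predicted (normalized) power
print(scaled)
print(f"RMSE={rmse(p, p_hat):.3f}  MAE={mae(p, p_hat):.3f}  R2={r2(p, p_hat):.3f}")
```

For these values, RMSE ≈ 0.079, MAE = 0.075 and R² ≈ 0.904.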

Model Selection and Hyperparameter Optimization
As shown in Figure 1 of Section 2, the 12 candidate models are evaluated on the four seasonal datasets to select 5 base-models. The original data are divided into a training dataset (80% of the data) and a validation dataset (20%). For each of the spring, summer, autumn, and winter datasets, the five models with the highest scores are adaptively selected as base-models according to the R² index. If two models have the same R² score, the RMSE and MAE indices are used for further evaluation. The proposed model is trained and tested using Python 3.6 on a computer with an Intel(R) Core(TM) i7-8565 CPU @ 1.80 GHz and 8.00 GB of RAM.
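The adaptive selection step can be sketched as ranking candidates by validation R², breaking ties with RMSE and then MAE. This is a minimal sketch under stated assumptions: scikit-learn estimators with default settings, synthetic data, and R² rounded to three decimals (the precision reported above) before comparing. Only seven of the twelve candidates are included to keep the example short, and `select_base_models` is a hypothetical name.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor


def select_base_models(candidates, X, y, n_select=5, seed=0):
    # 80% training / 20% validation split
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=seed)
    scores = []
    for name, model in candidates.items():
        y_hat = model.fit(X_tr, y_tr).predict(X_va)
        rmse = float(np.sqrt(mean_squared_error(y_va, y_hat)))
        mae = float(mean_absolute_error(y_va, y_hat))
        scores.append((name, r2_score(y_va, y_hat), rmse, mae))
    # Highest R^2 first (rounded to 3 decimals); ties -> lower RMSE, then lower MAE
    scores.sort(key=lambda s: (-round(s[1], 3), s[2], s[3]))
    return [s[0] for s in scores[:n_select]], scores


X, y = make_regression(n_samples=300, n_features=4, n_informative=4,
                       noise=10.0, random_state=0)
candidates = {
    "LR": LinearRegression(), "ELAN": ElasticNet(), "KNN": KNeighborsRegressor(),
    "DT": DecisionTreeRegressor(random_state=0),
    "RF": RandomForestRegressor(random_state=0),
    "ADA": AdaBoostRegressor(random_state=0),
    "GBRT": GradientBoostingRegressor(random_state=0),
}
selected, table = select_base_models(candidates, X, y)
print("selected base-models:", selected)
```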
The results of wind power prediction on the validation dataset are displayed in Table 1. Five base-models with higher R² scores and lower RMSE and MAE values are selected. For the spring dataset, the selected base-models are LGBM, GBRT, XGB, ADA, and RF, with R² values of 0.754, 0.746, 0.731, 0.701, and 0.698, respectively. For the summer dataset, the base-models are SVR, LGBM, GBRT, XGB, and ADA, with R² values of 0.689, 0.673, 0.667, 0.648 and 0.604, respectively. The base-models and their R² scores for the autumn dataset are XGB (0.869), GBRT (0.868), LGBM (0.867), RF (0.854), and KNN (0.853). For the winter dataset, they are GBRT (0.667), LGBM (0.662), ADA (0.633), SVR (0.634), and XGB (0.628). The R² scores of the same model, such as LGBM, GBRT, and XGB, vary greatly across datasets, which indicates that a single forecasting model has certain limitations on different data. In addition, for the winter and summer datasets, the R² scores of all models are lower and the RMSE and MAE values are higher than for the spring and autumn datasets, which is closely related to the fluctuation characteristics of the original wind speed, direction, and power data.
The results for PV power forecasting are listed in Table 2. For the spring dataset, the five models with the highest R² scores are bagging (0.791), LGBM (0.762), RF (0.758), SVR (0.746), and ADA (0.743). Similarly, the base-models with higher R² scores are bagging (0.791), LGBM (0.762), RF (0.758), SVR (0.746), and ADA (0.743). For the autumn dataset, the highest R² scores are RF (0.615), GBRT (0.613), KNN (0.611), XGB (0.604), and bagging (0.581). For the winter dataset, the highest R² scores are GBRT (0.908), KNN (0.906), XGB (0.904), RF (0.896), and LGBM (0.894). The R² values of the ELAN model for both wind power and PV power prediction are negative on all datasets, which indicates that the model is unsuitable for renewable energy prediction. As with wind power forecasting, the evaluation indices of each model for PV power forecasting differ across datasets. For all 12 models, the R² scores on the winter dataset are the highest and the RMSE and MAE values are the lowest, followed by summer, spring, and autumn.
Because the wind power and PV power time series differ significantly, the selected base-models also differ, indicating that the suitability of each algorithm depends on the data. For example, the RF model is selected as a base-model on all four datasets for PV power prediction, whereas it is selected only on the spring and autumn datasets for wind power forecasting, indicating that the RF method has certain limitations for data with stronger fluctuations. Similarly, the bagging method is selected only in PV power forecasting. The variations among the base-models for the different cases can be seen in Tables 1 and 2.
With the base-models selected, the next step is to select the meta-model. Taking wind power prediction as an example, the RF, XGB, GBRT, LGBM, and LR models, with higher R² scores on the four datasets in the above base-model experiments, are each tested as the meta-model. The results are shown in Table 3. With the linear model as the meta-model, the RMSE and MAE values are lower and the R² scores higher on every dataset than with the other models; therefore, the linear model is selected as the meta-model in this paper. Likewise, for PV power prediction in all four seasons, the LR meta-model yields better prediction accuracy than the other models. To improve the prediction performance of the base learners, the Bayesian global optimization method is adopted to optimize their main parameters, and the parameter ranges are preset as listed in Table 4. The optimal parameters of a model may differ across datasets. In practical applications, hyperparameter optimization can be performed offline and prediction online, which saves computation and improves prediction efficiency.

Wind Power Forecasting and Results Analysis
The single base-models are used as benchmarks for comparison with the proposed stacking ensemble model. The evaluation index values on the four test datasets are shown in Figure 6. The last day of each season, namely 29 February, 31 May, 31 August, and 31 December, is selected as the forecast day. The wind power forecast curves are shown in Figure 7.
In Figure 6, the adaptively selected base-models differ among the four seasonal datasets. Furthermore, the RMSE and MAE values of the stacking ensemble method are lower than those of all the selected single base-models. In winter, the RMSE and MAE values of the stacking ensemble method are 0.152 and 0.102, respectively, the largest of the four seasons. Nevertheless, its prediction error is still smaller than that of the benchmarks SVR, XGB, GBRT, ADA, and LGBM, whose RMSE and MAE values are 0.169 and 0.13, 0.181 and 0.132, 0.164 and 0.122, 0.168 and 0.134, and 0.17 and 0.124, respectively, indicating its stability and robustness. The GBRT model is selected as a base-model for all four datasets; although its errors are higher and its R² scores lower than those of the stacking ensemble model, its repeated selection indicates that GBRT predicts with a certain degree of stability and robustness. In addition, the R² scores of the stacking ensemble model are higher than those of the single base-models for all datasets. Taking the winter case as an example, the R² score of the stacking ensemble method is 0.702, the lowest of the four seasons, yet still higher than those of the benchmark models, reflecting improved prediction accuracy and generalization ability. From Figure 6, the prediction error is smallest for autumn, followed by summer, spring, and winter, which is consistent with the weaker fluctuations in the corresponding data. This suggests that the prediction accuracy of the stacking ensemble model still needs improvement when the input data fluctuate strongly at some time points. However, compared with all the benchmark models on the different datasets, the proposed method still performs better.
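The three evaluation indices compared above can be computed as follows. This is a minimal pure-Python sketch, assuming the measured and predicted power values are given as equal-length sequences on the same (possibly normalized) scale; the function names and the sample data are hypothetical.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted values for four time points
actual = [0.20, 0.35, 0.50, 0.40]
predicted = [0.25, 0.30, 0.45, 0.40]
print(rmse(actual, predicted), mae(actual, predicted), r2_score(actual, predicted))
```

Applied to the winter wind figures quoted above, the stacking RMSE of 0.152 against 0.164 for GBRT, the best listed benchmark, is a relative reduction of (0.164 - 0.152) / 0.164 ≈ 7.3%, and the MAE of 0.102 against 0.122 is a reduction of about 16.4%.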
The prediction curves of the stacking ensemble model and the benchmark models over the 96 time points of the selected prediction day in each of the four seasons are shown in Figure 7. The stacking ensemble model tracks the trend of the actual output power more closely than the single benchmarks, indicating better prediction performance. In Figure 7a,b,d (spring, summer, and winter), the prediction curves are flat in some time periods owing to the weak fluctuation of the input data, including wind speed and direction, so the predicted values closely follow the actual values. In Figure 7c (autumn), the measured power values of the predicted day fluctuate more strongly. According to the input data, wind speed and direction vary randomly over time points 48-96, and the wind speed reaches up to 14-15 m/s at some time points. Therefore, the predicted power values during this period deviate from the measured power. However, compared with the benchmark models, the prediction curve of the stacking ensemble model is closer to the measured values. This demonstrates that the stacking ensemble model, which integrates multiple algorithms with different principles, adaptively tracks changes in the datasets. Compared with the benchmark models, the proposed model for wind power forecasting achieves a better fit and more accurate point predictions, along with better generalization performance and stability.
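To quantify the deviation described above, the error can be computed separately for the calm and the fluctuating parts of the forecast day. A minimal sketch, assuming one day of 96 evenly spaced points (that is, a 15-minute resolution) and reading the "48-96" range as the 0-based half-open window [48, 96); the function name and the data are hypothetical.

```python
import math
from typing import Sequence

POINTS_PER_DAY = 96  # 24 h at an assumed 15-min resolution


def window_rmse(y_true: Sequence[float], y_pred: Sequence[float],
                start: int, stop: int) -> float:
    """RMSE over time points [start, stop) of one forecast day (0-based, half-open)."""
    if len(y_true) != POINTS_PER_DAY or len(y_pred) != POINTS_PER_DAY:
        raise ValueError("expected one day of 96 points")
    if not 0 <= start < stop <= POINTS_PER_DAY:
        raise ValueError("invalid window")
    squared = [(y_true[k] - y_pred[k]) ** 2 for k in range(start, stop)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical day: first half predicted exactly, second half off by 0.1
actual = [0.3] * 48 + [0.6] * 48
predicted = [0.3] * 48 + [0.5] * 48
calm = window_rmse(actual, predicted, 0, 48)
volatile = window_rmse(actual, predicted, 48, 96)
```

Comparing `calm` and `volatile` for each model would show whether the stacking model's advantage is concentrated in the high-fluctuation window.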

PV Power Forecasting and Results Analysis
Similar to the wind power forecasting cases, the proposed stacking ensemble model is further validated by forecasting the output power of a PV station. The dataset division and the selection of the forecast day are the same as in the wind power case. The evaluation index values and prediction curves are presented in Figures 8 and 9.
In Figure 8, the base-models adaptively selected for PV power prediction differ from those selected for wind power prediction. For example, in spring, the base-models for PV prediction are SVR, bagging, LGBM, ADA, and RF, whereas those for wind power prediction are ADA, XGB, GBRT, RF, and LGBM, reflecting the different abilities of the models to mine each type of data. Furthermore, the proposed stacking ensemble model has a lower prediction error and higher R² scores than the other comparison models in all the study cases. Taking the autumn dataset in Figure 8c as an example, the RMSE and MAE values of the stacking ensemble model are 0.098 and 0.062, respectively, and its R² score of 0.762 is the lowest of the four seasons. Nevertheless, compared with the benchmark models, its forecasting error is the lowest and its R² score is the highest, indicating the superior prediction performance of the proposed method.
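The adaptive choice of five base-models from twelve candidates by their R² indices can be sketched as a simple ranking. This assumes every candidate is scored on a common validation set and the five highest R² values are kept; the function name, the tie-breaking rule, and the scores are hypothetical, and the paper's exact selection rule may differ.

```python
from typing import Dict, List


def select_base_models(r2_by_model: Dict[str, float], k: int = 5) -> List[str]:
    """Return the k candidates with the highest R², ties broken alphabetically."""
    if k > len(r2_by_model):
        raise ValueError("k exceeds the number of candidates")
    ranked = sorted(r2_by_model.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Illustrative validation R² scores only; not values reported in this paper
scores = {"model_a": 0.81, "model_b": 0.84, "model_c": 0.86, "model_d": 0.79,
          "model_e": 0.85, "model_f": 0.70, "model_g": 0.62}
chosen = select_base_models(scores)
```

Because the ranking is recomputed for each dataset, different seasons and different energy sources can yield different base-model sets, as observed for the spring wind and PV cases.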
Owing to the diversity of the data characteristics, the prediction error and fitting score vary across seasons. In spring, summer, autumn, and winter, the RMSE values are 0.104, 0.099, 0.098, and 0.079, respectively; the MAE values are 0.063, 0.069, 0.062, and 0.05, respectively; and the R² scores are 0.894, 0.895, 0.762, and 0.942, respectively. These results illustrate the ability of the data-driven stacking ensemble model to mine latent patterns in the data.
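Tabulating the reported values makes the seasonal ranking explicit: winter has the lowest RMSE and MAE and the highest R², while autumn has the lowest R² even though its RMSE and MAE are not the highest. The short script below only re-reads the numbers quoted in this paragraph.

```python
# Reported test-set indices of the stacking model for PV power (from the text above)
pv_results = {
    "spring": {"RMSE": 0.104, "MAE": 0.063, "R2": 0.894},
    "summer": {"RMSE": 0.099, "MAE": 0.069, "R2": 0.895},
    "autumn": {"RMSE": 0.098, "MAE": 0.062, "R2": 0.762},
    "winter": {"RMSE": 0.079, "MAE": 0.050, "R2": 0.942},
}

best_rmse = min(pv_results, key=lambda s: pv_results[s]["RMSE"])
worst_rmse = max(pv_results, key=lambda s: pv_results[s]["RMSE"])
worst_mae = max(pv_results, key=lambda s: pv_results[s]["MAE"])
worst_r2 = min(pv_results, key=lambda s: pv_results[s]["R2"])
```

Because R² divides the residual sum of squares by the variance of the measured power, a season with less variable output can have a lower R² at a similar RMSE; whether this explains the autumn case is not established here.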
Energies 2023, 16, x FOR PEER REVIEW 15 of 20

Conclusions
In this paper, an adaptive, data-driven stacking ensemble model is proposed for predicting the output power of renewable energy, including wind power and PV power. The proposed model is validated using datasets collected from an actual wind farm and PV station. The following conclusions can be drawn: (1) Models based on different algorithmic principles can mine the spatial and structural characteristics of multi-dimensional heterogeneous datasets from multiple perspectives, so that the algorithms complement one another. The proposed stacking ensemble learning framework can track dynamic changes in the data and combine multiple base-models to improve forecasting accuracy, generalization ability, and adaptability.
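Combining base-models through a linear meta-model whose weights minimize the cross-validation error, as stated in conclusion (3), can be sketched in pure Python. The assumptions are: contiguous K folds without shuffling, an ordinary least-squares meta-model with no intercept and no non-negativity constraint, and two toy base-models (`fit_mean`, `fit_slope`) standing in for learners such as SVR or GBRT. All names are hypothetical, and the paper's exact weighting procedure may differ.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[float], float]
Fitter = Callable[[Sequence[float], Sequence[float]], Predictor]


def kfold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k contiguous, nearly equal validation folds (no shuffling)."""
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_predictions(fitters: Sequence[Fitter], xs: Sequence[float],
                            ys: Sequence[float], k: int) -> List[List[float]]:
    """Row i holds each base-model's prediction for sample i, made by a model
    trained without the fold that contains sample i."""
    n = len(xs)
    preds = [[0.0] * len(fitters) for _ in range(n)]
    for fold in kfold_indices(n, k):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        for h, fit in enumerate(fitters):
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in fold:
                preds[i][h] = model(xs[i])
    return preds


def fit_linear_weights(preds: Sequence[Sequence[float]], ys: Sequence[float]) -> List[float]:
    """Least-squares meta-model weights (no intercept): solve (P^T P) w = P^T y by
    Gauss-Jordan elimination with partial pivoting. Assumes P^T P is nonsingular."""
    m = len(preds[0])
    # Augmented matrix [P^T P | P^T y]
    a = [[sum(row[r] * row[c] for row in preds) for c in range(m)]
         + [sum(row[r] * y for row, y in zip(preds, ys))] for r in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    # The matrix is now diagonal, so each weight is a simple ratio
    return [a[r][m] / a[r][r] for r in range(m)]


# Two toy base-models (hypothetical stand-ins for the real learners)
def fit_mean(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_slope(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: slope * x


xs = [float(i) for i in range(1, 13)]
ys = [0.1 * x + (0.05 if i % 2 else -0.05) for i, x in enumerate(xs)]
oof = out_of_fold_predictions([fit_mean, fit_slope], xs, ys, k=4)
weights = fit_linear_weights(oof, ys)
stacked = [sum(w * p for w, p in zip(weights, row)) for row in oof]
```

Because each single base-model corresponds to a unit weight vector, the least-squares weights can never give a larger out-of-fold squared error than the best single base-model; this is the sense in which the linear meta-model minimizes the cross-validation error.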

Figure 1. An adaptive stacking ensemble framework for renewable energy output power forecasting.

Figure 2. The framework of stacking ensemble learning. Given the input dataset $D = \{x_1, \ldots, x_i, \ldots, x_m\}$, the dataset is divided into the training dataset, test dataset, and validation dataset. $Z_h$ is the $h$-th base-model of the first layer. The prediction output of the $Z_h$ model on the validation set is $Z_h(x_i)$, and the prediction result

density estimator, instead of directly modeling the objective function $F$ by a probabilistic model $p(f \mid D)$ [47,48]. More details about Bayesian optimization are discussed in [45-48].

Figure 3. Flow chart of the prediction model with Bayesian optimization.

Figure 4. Historical data example for spring from the wind farm.

Figure 5. Historical data example for spring from the PV station.

Figure 6. Comparison results of the different prediction models for wind power: (a) spring, (b) summer, (c) autumn, and (d) winter.

Figure 7. Wind power prediction curve of the different comparison models: (a) spring, (b) summer, (c) autumn, and (d) winter.

Figure 8. Comparison results of the different prediction models for PV power: (a) spring, (b) summer, (c) autumn, and (d) winter.

Figure 9 shows the prediction curves of the stacking ensemble model and the comparison models on the prediction day. In sub-figures (b) summer and (c) autumn, the measured PV power varies little, and the prediction curves of the stacking ensemble model closely follow the true output power curves, indicating high prediction accuracy. In sub-figures (a) spring and (d) winter, the actual power on the predicted day fluctuates more strongly owing to variations in the input data. Therefore, there is a certain gap between the predicted and measured values, although the overall trend of the prediction curve follows the changes in the measured power curve, indicating the effectiveness and adaptiveness of the stacking ensemble method for PV power forecasting. In addition, for all datasets, at times of low PV output power, the predictions of the proposed stacking model are similar to those of the benchmark models, indicating that prediction at low power points is difficult. However, at times of high PV output, especially in periods with large fluctuations (black boxes in sub-figures (a) spring and (d) winter), the prediction curves of the proposed stacking model follow the true power curve more closely, indicating the superiority and reliability of the proposed method for PV power prediction.
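The contrast between low- and high-output periods can be checked numerically by splitting the error according to the measured output level. A minimal sketch; the threshold of 0.2 on normalized power, the function names, and all data are hypothetical assumptions rather than values from this study.

```python
import math
from typing import List, Sequence, Tuple


def _mean(values: List[float]) -> float:
    """Arithmetic mean, or NaN for an empty group."""
    return sum(values) / len(values) if values else math.nan


def mae_by_output_level(y_true: Sequence[float], y_pred: Sequence[float],
                        threshold: float) -> Tuple[float, float]:
    """MAE on low-output points (actual < threshold) and high-output points (actual >= threshold)."""
    low = [abs(t - p) for t, p in zip(y_true, y_pred) if t < threshold]
    high = [abs(t - p) for t, p in zip(y_true, y_pred) if t >= threshold]
    return _mean(low), _mean(high)


# Hypothetical normalized PV output for six time points
actual = [0.0, 0.05, 0.4, 0.8, 0.6, 0.1]
predicted = [0.02, 0.05, 0.35, 0.7, 0.65, 0.1]
low_mae, high_mae = mae_by_output_level(actual, predicted, threshold=0.2)
```

Running the same split for the stacking model and each benchmark would show whether their errors differ mainly in the high-output group.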

(2) Cross-validation and Bayesian hyperparameter optimization are used in model training, which effectively improves the model's prediction accuracy. (3) A linear model is employed as the meta-model to integrate the base-models. The weight of each base-model is determined by the minimum cross-validation error principle,

Table 1. Evaluation indices of the base-models for wind power prediction.

Table 2. Evaluation indices of the independent models for solar power prediction.

Table 3. Evaluation indices of the different meta-models.

Table 4. Hyperparameters of the different base learner models.