Solar Irradiation Forecasting Using Ensemble Voting Based on Machine Learning Algorithms

Abstract: This paper proposes an ensemble voting model for solar radiation forecasting based on machine learning algorithms. Several ensemble models are assessed using a simple average and a weighted average, combining the following algorithms: random forest, extreme gradient boosting, categorical boosting, and adaptive boosting. A clustering algorithm is used to group data according to the weather, and feature selection is applied to choose the most-related inputs and their past observation values. Prediction performance is evaluated by several metrics using a real-world Brazilian database, considering different prediction time horizons of up to 12 h ahead. Numerical results show the weighted average voting approach based on random forest and categorical boosting has superior performance, with an average reduction of 6% for MAE, 3% for RMSE, 16% for MAPE, and 1% for R² when predicting one hour in advance, outperforming individual machine learning algorithms and other ensemble models.


Introduction
Nowadays, solar energy is receiving significant attention as one of the main renewable energy sources, with great potential for contributing to the reduction of fossil fuel consumption and CO₂ emissions [1]. As reported by [2], solar energy capacity grew by 137 GW (+19%) in 2021, reaching a total of 854 GW and accounting for 28% of the renewable generation portfolio.
Solar power generation depends on the amount of available solar irradiation. It is an intermittent energy resource, sensitive to random and uncontrollable weather changes. This intermittency creates many challenges for the appropriate integration of solar power generation into the power grid, especially at high penetration levels. Solar forecasting errors can cause transmission congestion, forced solar generation curtailment, or the activation of an expensive set of generators, implying extra costs [3]. Accurate solar irradiation forecasting is therefore an essential task in guaranteeing the reliable and safe operation of power systems. Additionally, it helps maintain power quality, avoids unexpected expenditure on expensive energy resources, reduces the need for large backup energy storage, and increases the penetration of solar-powered systems [4].
In general, forecasting methods can be classified into physical, statistical, and artificial intelligence (AI) methods [5]. In physical methods, forecasting is carried out based on numerical weather prediction (NWP), cloud observations by satellite, or total sky imagers (TSIs), using physical data such as temperature, pressure, humidity, and cloudiness. In contrast, statistical methods are based on historical series of meteorological data, which makes them simpler than physical methods. Autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), and exponential smoothing (ES) are examples of statistical methods [6].
With the gradual emergence of AI, machine learning (ML) methods have become one of the most popular approaches for solar irradiation forecasting, presenting promising results. ML is a branch of AI that uses algorithms to automatically learn insights and recognize patterns from large datasets [7]. The most-used algorithms are neural networks (NN), long short-term memory (LSTM), support vector regression (SVR), random forest (RF), k-nearest neighbors (kNN), and decision trees (DT).
Several studies can be found in the literature using ML algorithms to forecast solar irradiation [8]. The authors in [9] compared three different ML algorithms to predict hourly solar irradiance: SVR, nonlinear autoregressive (NAR), and NN. They applied k-means clustering to classify data according to the weather and highlighted that the SVR model performed best. In [10], the authors examined the potential of different ML models to forecast hourly and daily solar radiation. The models used were NN, a recurrent neural network (RNN), gated recurrent units (GRUs), LSTM, and SVR. GRUs presented slightly superior performance compared to the other models. The authors in [11] employed different ML algorithms to predict hourly solar irradiance: NN, SVR, a fuzzy inference system (FIS), and an adaptive neuro-fuzzy inference system (ANFIS). The algorithms' performance was verified using only past time-series values of solar radiation as input and, alternatively, meteorological data as explanatory variables. Results showed the algorithms performed better when using meteorological data as input. The authors in [12] proposed a hybrid deep learning model for hourly solar irradiance forecasting, combining wavelet packet decomposition (WPD), a convolutional neural network (CNN), LSTM, and a multilayer perceptron (MLP). Results showed the proposed hybrid model has better prediction accuracy than the other methods tested. The authors in [13] presented a combination of an auto-encoder (AE) and LSTM for long-term solar radiation forecasting. The results showed the proposed hybrid method has superior performance compared with state-of-the-art models such as LSTM, GRU, and RF.
More recently, ensemble methods have increasingly been used to improve prediction performance [14]. Ensemble methods combine multiple algorithms into one enhanced predictor. An ensemble is homogeneous when it uses the same type of base learning algorithm and heterogeneous when it combines different types of algorithms. Different techniques can be used to combine the base algorithms, such as bagging, boosting, stacking, or voting.
Several ensemble approaches have been proposed to forecast solar irradiation. The authors in [15] proposed a multistep-ahead solar radiation forecasting model based on the light gradient boosting machine (LightGBM), which is a boosting homogeneous ensemble learning technique. They compared the results with several tree-based ensembles and deep learning methods, and the proposed model achieved better performance. The authors in [16] proposed a heterogeneous ensemble model based on stacking for day-ahead solar power forecasting, combining the following ML algorithms: RF, extreme gradient boosting (XGBT), adaptive boosting (AdaBoost), and extra trees regressor (ETR). The authors in [17] proposed a stacking heterogeneous ensemble model combining XGBT and deep neural networks (DNNs) to forecast hourly solar irradiance. The input dataset included meteorological parameters and the clear-sky index. In [18], the authors investigated the performance of homogeneous ensemble models based on bagging and boosting for solar radiation forecasting, such as boosted trees, bagged trees, RF, and generalized random forest. In [19], the authors proposed a homogeneous ensemble model using RF to forecast solar generation. Cluster analysis was first applied, and predictions were weighted by ridge regression to obtain the final prediction. In [20], the authors proposed a hybrid model for solar generation prediction that combined a statistical method with ML models. The ML models included LSTM, GRU, AE LSTM, and AE GRU. Several ensembles were explored with simple averaging and weighted averaging using linear, non-linear, and inverse approaches.
Table 1 summarizes the main differences between previous works and the proposed work in several aspects, including the forecasting variable, algorithms used, type of ensembles tested, cluster analysis, feature selection, and forecasting horizon.

In all the references mentioned, the results show ensemble models offer superior prediction performance compared to individual regressor models. Although several papers have investigated ensemble methods based on ML algorithms to forecast solar irradiation, to the best of the authors' knowledge, only one study has combined algorithms through ensemble voting [20]. However, that study does not explore cluster analysis or feature selection. Furthermore, our literature review shows that some ML algorithms with great performance, such as categorical boosting (CatBoost), have not yet been applied to solar energy forecasting.
This paper addresses this knowledge gap by proposing an ensemble voting method based on several ML algorithms to forecast solar irradiation in a city in Brazil using historical meteorological data. Feature selection was applied to choose the most important inputs and their delay values, and a clustering algorithm was used to group data with similar weather patterns. The ensemble voting was constructed using the following algorithms: random forest, extreme gradient boosting, categorical boosting, and adaptive boosting. First, the performance of each ML algorithm was assessed with several commonly used metrics: MAE, MAPE, RMSE, and R². Then, the ensemble voting model was built, combining the ML algorithms using two approaches: the simple average and the weighted average. Several ensembles were tested, combining all the ML algorithms and successively discarding the algorithm with the lowest performance until only the two algorithms that individually produced the best results remained. The prediction accuracy was evaluated for different forecast horizons, from 1 h up to 12 h ahead, and a well-known dataset for time-series forecasting was used to validate the results. Moreover, the Diebold-Mariano statistical test was applied to compare the proposed ensemble voting algorithm against the other methods and check whether accuracy differences between the models were statistically significant.
The key contributions of this paper can be summarized as follows:
• Propose an ensemble voting model combining random forest, extreme gradient boosting, categorical boosting, and adaptive boosting, which had never before been implemented for solar irradiation forecasting;
• Apply a clustering algorithm to group data with similar weather patterns;
• Propose an ensemble feature selection method to select the most significant input variables and their delay values;
• Evaluate the performance of the algorithms for different forecasting horizons.
This work is organized as follows: Section 2 presents the basic theory of the ML algorithms used, Section 3 presents the proposed methodology and the database, and Section 4 shows the results, followed by our main conclusions in Section 5.

Machine Learning Algorithms
ML is a field of AI that builds algorithms for automated data analysis, providing more comprehensive insights into data. ML handles large amounts of data efficiently and effectively and can even act on the extracted information, which has increased the demand for it.
ML algorithms use historical data as input to find and learn patterns and predict new output values. The input historical dataset is divided into training and testing datasets.
The training dataset has an output variable that needs to be predicted or classified. ML algorithms infer patterns from the training dataset and apply them to the test dataset for prediction or classification. The workflow of supervised ML algorithms is shown in Figure 1. The algorithms used in this paper to compose the voting ensemble are RF, XGBT, CatBoost, and AdaBoost.

Random Forest (RF)
RF is a widely used ML algorithm, trademarked by Leo Breiman [21], that combines the output of multiple DTs to reach a single result. RF is a set of individual trees in which each tree predictor is trained on a different random subset of the training set, sampled using bagging or pasting. The prediction of a regression tree is simply the mean target value of the training samples reaching the leaf node, and the prediction of an RF regressor is obtained by averaging the predictions of the individual regression trees. This aggregation has yielded substantial improvements in both classification and regression.


Extreme Gradient Boosting (XGBT)
XGBT was first proposed by Tianqi Chen and Carlos Guestrin and has been continuously optimized and improved in follow-up studies by many scientists [22]. The model is a learning framework based on boosted tree models. XGBT is a decision tree-based ensemble ML algorithm: a predictor built out of many small predictors. It builds a sequential series of weak learners, in which each learner tries to complement the others and correct the residuals in the predictions made by all previous learners. It handles missing data and employs regularization to prevent overfitting of individual predictors. It uses parallel processing, considerably improving training time and making it an advanced algorithm.

Categorical Boosting (CatBoost)
CatBoost is a gradient-boosting framework developed by Prokhorenkova et al. [23] in 2017 that uses a binary decision tree as its base predictor. CatBoost has two main differences compared with other boosting algorithms. First, it uses ordered boosting, a random permutation approach that trains the model on one subset of the data while calculating residuals on another, thus preventing overfitting. Second, the same splitting criterion is used at all nodes of a tree, always creating symmetric trees. These trees are balanced and less prone to overfitting, which significantly speeds up model execution. CatBoost is, however, sensitive to hyperparameter tuning.

Adaptive Boosting (AdaBoost)
AdaBoost is the first boosting meta-learning algorithm, proposed by Freund and Schapire [24]. It is based on the idea that a better model can be created by sequentially combining multiple "weak" models, so that the mistakes of earlier models are learned by their successors. Each model is trained using the same dataset, with a different weight assigned to each model based on its accuracy. Many researchers have enhanced the algorithm to obtain better performance, lower computational cost, and higher speed. However, AdaBoost has a convex loss function and is sensitive to noise and outliers in the data, and is thus prone to overfitting.

Ensemble Voting
Ensemble voting can combine algorithms using a simple average or a weighted average [25]. In the simple averaging approach, the final forecasted irradiation is obtained by taking the mean value of the forecast results of the individual ML models, as shown in (1):

$$\hat{y} = \frac{1}{m}\sum_{j=1}^{m}\hat{y}_j \quad (1)$$

where m is the number of ML algorithms used in the ensemble, $\hat{y}_j$ is the predicted value of the j-th algorithm, and $\hat{y}$ is the final prediction value of solar irradiation.
In the weighted average approach, the final forecasted irradiation is obtained from the weighted arithmetic mean, assigning different weights to the ML algorithms based on their accuracy. In this case, weights are assigned as integers starting at 1, depending on the individual performance of each ML algorithm. The forecasted irradiation is then evaluated as shown in (2):

$$\hat{y} = \frac{\sum_{j=1}^{m} w_j\,\hat{y}_j}{\sum_{j=1}^{m} w_j} \quad (2)$$

where $w_j$ is the weight of the j-th algorithm.
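As a minimal illustration of Equations (1) and (2), the sketch below applies both averaging schemes to hypothetical predictions from three models (the numbers and integer weights are purely illustrative):

```python
import numpy as np

# Hypothetical predictions of three ML models for four test samples.
preds = np.array([
    [1.92, 1.85, 1.98, 1.80],   # model 1 (e.g., CatBoost)
    [1.88, 1.79, 2.02, 1.77],   # model 2 (e.g., RF)
    [1.95, 1.83, 1.96, 1.82],   # model 3 (e.g., XGBT)
])

# Equation (1): simple average voting.
y_hat_simple = preds.mean(axis=0)

# Equation (2): weighted average voting with integer weights
# assigned according to individual accuracy (illustrative values).
w = np.array([3.0, 2.0, 1.0])
y_hat_weighted = (w[:, None] * preds).sum(axis=0) / w.sum()
```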

Proposed Methodology
This section describes the proposed methodology for solar irradiation forecasting. First, data is pre-processed with cleaning, normalization, correlation analysis, and splitting into training, validation, and test sets. Then, a clustering technique is applied to group data with similar weather patterns. Next, feature selection is performed to choose the most important inputs and delay values. After that, ensemble learning methods are applied to forecast solar irradiation. Finally, results are analyzed using several performance metrics. Each step of the proposed approach is explained in detail next. The flowchart of the proposed method is presented in Figure 2.

Data Description
Simulations were conducted using real-world data collected from the Brazilian National Institute of Meteorology (INMET) [26]. INMET's weather stations are equipped with devices to measure temperature (thermometer), wind (anemometer), rain (pluviometer), atmospheric pressure (barometer), and solar irradiation (pyranometer), installed in locations considered strategic and in areas of interest for each Brazilian state. The database used is from the city of Salvador, located on the Brazilian coast (12°58′28.9992″ S, 38°28′35.9940″ W) at 8 m altitude, as shown in Figure 3. It has a tropical climate characterized by high temperatures ranging from 22 °C to 31 °C, high humidity, and rainfall all year round. The period covered by the database is from 1 January 2015 to 23 August 2022, with a sampling interval of 1 h. Since global solar irradiation is measured only while there is sunlight on the sensor, only daytime samples were considered, from 7:00 a.m. to 5:00 p.m. Table 2 shows all variables in the database.
Figure 4 depicts the time series of solar irradiation values over the period covered in this study. The graph shows the strong seasonal pattern found in solar irradiation data. For this reason, the clustering technique was applied to divide the data according to seasonality, using solar and meteorological parameters.

Pre-Processing
Missing values and outliers were replaced using linear interpolation, and data normalization was applied using the min-max normalization method to scale the data into [0, 1]. The dataset was divided into training, validation, and test sets, with a split of 70%-10%-20%, respectively. The training set was used to construct the forecasting model, and the test set was used to evaluate the performance of the model. The validation set was used to evaluate the model while tuning the hyperparameters. Next, Pearson's correlation analysis was performed [28]. Figure 5 shows the correlation matrix between the input variables. There is a high linear correlation between variables and their minimum and maximum values. Therefore, the following variables were removed: maximum and minimum atmospheric pressure, maximum and minimum temperature, maximum and minimum dew point temperature, and maximum and minimum relative humidity.
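A minimal sketch of this pre-processing pipeline is shown below; the file name and column layout are hypothetical, and fitting the scaler on the training portion only is our assumption, since the paper does not state which portion the normalization statistics were computed on:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical hourly INMET dataset with a datetime index.
df = pd.read_csv("inmet_salvador.csv", index_col=0, parse_dates=True)

# Replace missing values (and outliers previously masked as NaN)
# by linear interpolation.
df = df.interpolate(method="linear")

# 70%-10%-20% chronological split.
n = len(df)
train_end, val_end = int(0.70 * n), int(0.80 * n)

# Min-max normalization to [0, 1], fit on the training portion.
scaler = MinMaxScaler()
train = scaler.fit_transform(df.iloc[:train_end])
val = scaler.transform(df.iloc[train_end:val_end])
test = scaler.transform(df.iloc[val_end:])
```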

Clustering
Clustering techniques consist of analyzing data to group similar samples into clusters. In this work, the k-means clustering algorithm was used to group data with similar weather patterns and capture seasonality [28]. Three indices were used to determine the number of clusters: Calinski-Harabasz, silhouette, and Davies-Bouldin. Figure 6 shows the results. The number of clusters k was varied from 2 to 10. For Calinski-Harabasz and silhouette, the best number of clusters is the one with the highest index value, achieved with three clusters. For Davies-Bouldin, the best number is the one with the smallest value, also achieved with three clusters. The three metrics coincide at the optimum k = 3. Therefore, in this paper the dataset was divided into three clusters. Cluster 1 corresponds to 31.09% of the data, Cluster 2 to 30.05%, and Cluster 3 to 38.86%.
The average value of daily solar irradiation is 1.34 MJ/m² in Cluster 1, 1.46 MJ/m² in Cluster 2, and 2.03 MJ/m² in Cluster 3. As can be seen, Cluster 3 presents higher levels of solar irradiation, meaning it holds the sunnier days, in contrast to Cluster 1, which presents lower levels of solar irradiation. Figure 7 shows the percentage of days per month in each cluster. Data was grouped into clusters based on the meteorological conditions of each day, and some months have data belonging to two or three clusters. Thus, data are grouped differently compared to the traditional division that follows the seasons of the year.
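A minimal sketch of this index-based selection of k, assuming a feature matrix X of normalized daily solar and meteorological parameters, could look as follows:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import (calinski_harabasz_score, silhouette_score,
                             davies_bouldin_score)

def evaluate_k(X, k_range=range(2, 11)):
    """Score k-means clusterings for k = 2 ... 10 with the three indices."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = {
            "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
            "silhouette": silhouette_score(X, labels),                # higher is better
            "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
        }
    return scores
```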

Feature Selection
Feature selection is a key step to improve the prediction performance of ML algorithms, reducing data size and model complexity. In this paper, three algorithms were used to select the most important inputs and their delay values: RF [21], mutual information (MI), and relief [29,30]. Each algorithm evaluates and assigns an importance value to each variable. These values are normalized, and the final variable importance ranking is obtained by averaging the values of the three algorithms, as sketched below. The threshold between selected and discarded variables is found empirically using the training and validation sets. Figure 8 shows the feature importance ranking for each cluster. The importance of several lags was tested for each variable, from 1 to 72 (X_t−1 ... X_t−72). The final dataset with the selected variables and their lags is shown in Table 3.
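A minimal sketch of the ensemble importance ranking is given below; scikit-learn provides the RF and MI scores, while the ReliefF estimator is assumed to come from the third-party skrebate package:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import mutual_info_regression
from skrebate import ReliefF  # assumed third-party implementation of relief

def ensemble_importance(X, y):
    """Average of three normalized importance rankings, as in the text."""
    imp_rf = RandomForestRegressor(n_estimators=200, random_state=0) \
        .fit(X, y).feature_importances_
    imp_mi = mutual_info_regression(X, y, random_state=0)
    imp_rel = ReliefF(n_neighbors=10).fit(X, y).feature_importances_

    def norm(v):
        v = np.asarray(v, dtype=float)
        return (v - v.min()) / (v.max() - v.min() + 1e-12)

    return (norm(imp_rf) + norm(imp_mi) + norm(imp_rel)) / 3.0
```

Features whose averaged importance exceeds the empirically chosen threshold are kept; the rest are discarded.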


Hyperparameter Optimization
In ML, it is important to find the optimal hyperparameter values for a given algorithm so that its performance is maximized. In this paper, the hyperparameters were selected using the GridSearchCV technique from the scikit-learn library, which combines grid search and cross-validation [31]. GridSearchCV tries all combinations of hyperparameter values predefined by the user and evaluates the model for each combination using cross-validation. Cross-validation is a resampling method that uses different portions of the data to train and test the model. In this paper, k-fold cross-validation was adopted with k = 5. The set of hyperparameters that provided the highest accuracy was considered the best. The hyperparameters of the forecasting models are presented in Table 4.
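A minimal sketch of this search for one of the models is shown below (an illustrative CatBoost grid; the actual search spaces are those listed in Table 4, and the training data here are placeholders):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from catboost import CatBoostRegressor

# Placeholder data standing in for the selected features and target.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((500, 8)), rng.random(500)

param_grid = {
    "depth": [4, 6, 8],
    "learning_rate": [0.03, 0.1],
    "iterations": [500, 1000],
}

search = GridSearchCV(
    estimator=CatBoostRegressor(verbose=0, random_state=0),
    param_grid=param_grid,
    cv=5,                                   # k-fold cross-validation, k = 5
    scoring="neg_root_mean_squared_error",  # highest score = lowest RMSE
)
search.fit(X_train, y_train)
best_model = search.best_estimator_
```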


Performance Metrics
The performance of the algorithms is analyzed using several metrics: the mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and the coefficient of determination (R²). These metrics are evaluated as shown in Equations (3)-(6), where F_i is the forecasted value, O_i is the observed value, Ō is the mean value of the observations, and n is the number of samples. The lower the error, the better the prediction, and an R² equal to 1 indicates the model performs perfectly on unseen data.
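With this notation, the standard formulations of the four metrics are:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|F_i - O_i\right| \quad (3)$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{F_i - O_i}{O_i}\right| \quad (4)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(F_i - O_i\right)^2} \quad (5)$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(F_i - O_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \overline{O}\right)^2} \quad (6)$$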

Results and Discussion
In this section, the forecasting accuracy of each ML algorithm was first evaluated using the test dataset. Then, different ensemble models were constructed, combining the ML algorithms using the voting average approach and the weighted average approach. Finally, the ensemble models were tested for different prediction horizons.

Machine Learning Algorithms
Table 5 shows the performance metrics for each algorithm and cluster. The best results are highlighted in bold. Results show that CatBoost outperformed the other ML algorithms in all metrics and all clusters. AdaBoost presented the poorest performance in all metrics and all clusters. Figure 9 shows the histograms of absolute forecasting errors obtained with the best and worst models, CatBoost and AdaBoost. For CatBoost, the error distribution is sharply peaked around zero in all clusters, indicating small errors in most predictions. For AdaBoost, the error distribution is clearly more dispersed, indicating larger prediction errors.

Voting Ensemble
This paper tested three approaches of voting average (VOA) and one approach of voting weighted average (VOWA). In the simple average voting approach, the first ensemble, VOA1, was built combining all the ML algorithms investigated. The second ensemble, VOA2, discarded AdaBoost, the algorithm with the lowest performance. The third ensemble, VOA3, discarded the two worst-performing algorithms, retaining CatBoost and RF. The weighted average approach, VOWA, combined the same two algorithms that produced the best results, CatBoost and RF. In summary, the following ensemble methods were explored:
• VOA1: simple average of RF, XGBT, CatBoost, and AdaBoost;
• VOA2: simple average of RF, XGBT, and CatBoost;
• VOA3: simple average of RF and CatBoost;
• VOWA: weighted average of RF and CatBoost.
It is important to evaluate the computational performance of an algorithm when dealing with real-world applications. Table 6 shows the mean value over 10 runs of the performance metrics and the learning time in seconds for each ensemble algorithm and cluster. All experiments were performed on a computer with an Intel i5-1035G1 CPU (1.19 GHz) and 8.0 GB of RAM.
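One way to realize these four ensembles is scikit-learn's VotingRegressor (a sketch assuming the xgboost and catboost packages; the hyperparameters from Table 4 are omitted for brevity):

```python
from sklearn.ensemble import (VotingRegressor, RandomForestRegressor,
                              AdaBoostRegressor)
from xgboost import XGBRegressor
from catboost import CatBoostRegressor

rf = RandomForestRegressor(random_state=0)
xgbt = XGBRegressor(random_state=0)
cat = CatBoostRegressor(verbose=0, random_state=0)
ada = AdaBoostRegressor(random_state=0)

voa1 = VotingRegressor([("rf", rf), ("xgbt", xgbt), ("cat", cat), ("ada", ada)])
voa2 = VotingRegressor([("rf", rf), ("xgbt", xgbt), ("cat", cat)])
voa3 = VotingRegressor([("rf", rf), ("cat", cat)])
# VOWA: CatBoost receives two votes and RF one, as in Table 7.
vowa = VotingRegressor([("cat", cat), ("rf", rf)], weights=[2, 1])
# Each ensemble is then trained per cluster: vowa.fit(X_train, y_train).
```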

The comparison of the developed models shows that VOWA outperformed the other ensembles, with lower error metrics except for R² in Cluster 1. It also presented better results than the individual ML algorithms. VOWA's average learning time was 69.4 s, which is acceptable for planning purposes.
Based on Table 6, all the forecasting ensemble models had a high coefficient of determination, close to 1, with an average of 0.85. The learning time of the VOWA model in Cluster 3 was larger than in the other clusters because Cluster 3 holds more samples from the dataset.
Figure 10 shows the observed and forecasted hourly solar irradiation, along with the residuals obtained using the VOWA ensemble model. The results show that the forecast model can follow variations in solar irradiation. Since data are grouped into clusters based on the meteorological conditions of each day, they lose continuity; for this reason, the figure presents the last 30 days of the test set of each cluster. The weights used in VOWA are shown in Table 7; the model with the best individual performance received two votes and the other received one vote. The forecasting errors obtained in Cluster 3 were lower than in the other clusters because Cluster 3 holds data with higher levels of solar irradiation: sunnier days with lower irradiance variability. Cluster 1 holds data with lower levels of solar irradiation, i.e., cloudier days with higher irradiance variability. Although its absolute error curve had a higher error peak than the other clusters, at around 2 MJ/m², the error metrics presented in Table 6 indicate this cluster achieved a lower overall error. Since no previous studies have applied the voting regressor method to this problem, comparisons with previous studies are not possible.

Statistical Analysis
In this section, the proposed voting ensemble method is compared against the other ML algorithms using the widely used Diebold-Mariano (DM) statistical test [32]. This test analyzes whether any difference in accuracy between two forecasting models is statistically significant.
Define the forecast errors of the two competing algorithms as

$$e_{jt} = \hat{y}_{jt} - y_t, \quad j = 1, 2, \quad t = 1, \dots, n,$$

where $\hat{y}_{jt}$ is the forecasted value of algorithm j, $y_t$ is the observed value, and n is the number of samples. The loss function of the forecast error, $g(e_{jt})$, is usually taken as the squared error or the absolute error.
The DM test is based on the loss differential $d_t$ between the two competing forecasts, defined as

$$d_t = g(e_{1t}) - g(e_{2t}).$$

The two forecasts have equal accuracy if, and only if, the loss differential has zero expectation for all t. The null hypothesis ($H_0$) states that the two forecasts have equivalent accuracy ($H_0$: $E(d_t) = 0$ for all t). The alternative hypothesis ($H_a$) states that the two forecasts have different levels of accuracy ($H_a$: $E(d_t) \neq 0$). In the DM test, a significance level of 0.05 is established, and the decision whether to reject the null hypothesis is based on the resulting p-value. If the p-value is greater than 0.05, the test fails to reject the null hypothesis, and the differences observed between the two forecasting models are not significant. Otherwise, if the p-value is less than 0.05, the null hypothesis is rejected, and the differences observed between the two forecasting models are significant.
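A minimal one-step version of the test (without the autocovariance correction needed for multi-step horizons) can be sketched as follows:

```python
import numpy as np
from scipy import stats

def diebold_mariano(e1, e2, loss="squared"):
    """DM statistic and two-sided p-value for two forecast-error series."""
    g = np.square if loss == "squared" else np.abs
    d = g(np.asarray(e1)) - g(np.asarray(e2))   # loss differential d_t
    n = d.size
    # Under H0 the statistic is asymptotically standard normal.
    dm_stat = d.mean() / np.sqrt(d.var(ddof=1) / n)
    p_value = 2 * (1 - stats.norm.cdf(abs(dm_stat)))
    return dm_stat, p_value
```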
Table 8 shows the results of the DM test, comparing the performance of the proposed VOWA voting model against each of the other forecasting algorithms. The p-values are less than the threshold of 0.05 in all cases, allowing the null hypothesis $H_0$ to be rejected. This indicates that the observed differences are significant and the proposed VOWA model is significantly more accurate than the other models.

Different Forecast Horizons
The proposed VOWA ensemble model was evaluated for different forecast horizons, from 1 h to 12 h ahead. Figure 11 shows the results. As expected, the forecasting errors increase and the coefficient of determination decreases as the prediction horizon grows. Note that the performance indices vary more significantly for forecast windows of up to 3 h ahead. Beyond this point, the forecasts do not deteriorate further, which is positive. In addition, Cluster 1 shows larger errors and greater deterioration in forecasting performance. This can be explained by the fact that this cluster holds data from rainier days, with higher levels of precipitation and lower levels of solar irradiation.

Comparison with Benchmark Dataset
In this section, a well-known dataset for time-series forecasting is used to validate the proposed algorithm. The temperature time series contains the mean monthly air temperature measured at Nottingham Castle from 1920 to 1939 [33].
The dataset is divided into training and test sets, with a split of 70% and 30%, respectively. The VOWA approach was obtained following the same procedure adopted previously: combining the two ML algorithms that produced the best results, in this case AdaBoost and RF. The forecasting results are presented in Table 9. On the benchmark, VOWA outperformed the other ML algorithms in all metrics, confirming its superior performance.

Conclusions
This paper proposed an ensemble voting method using ML algorithms to forecast solar irradiation. Several ensemble models were evaluated using average voting and weighted voting based on RF, XGBT, CatBoost, and AdaBoost. Feature selection was performed to select inputs and their corresponding delay values, and a clustering algorithm was used to group data according to weather characteristics.
First, the performance of the ML models was tested individually. Results showed that CatBoost had the best forecasting performance among the ML models, presenting the following average performance metrics across the three clusters: MAE of 0.259, RMSE of 0.379, MAPE of 26.283%, and R² of 0.845. Although AdaBoost has practical advantages, such as low implementation complexity and fewer tuning parameters, it presented the worst forecasting performance, with the following average metrics across the three clusters: MAE of 0.327, RMSE of 0.435, MAPE of 45.650%, and R² of 0.798. Then, different voting ensemble models were tested. Weighted average voting based on CatBoost and RF presented superior accuracy compared to the single algorithms and the other ensembles tested, with the following average metrics: MAE of 0.256, RMSE of 0.377, MAPE of 25.659%, and R² of 0.848. The Diebold-Mariano statistical test was applied to compare the weighted average voting algorithm against the other methods. Results showed that the proposed model is significantly more accurate than the other models.
The performance of weighted average voting was also tested for different forecast horizons, from 1 h to 12 h ahead. Results showed that accuracy deteriorates more significantly for forecast windows of up to 3 h ahead and remains almost constant from that point forward. A well-known dataset for time-series forecasting was used to validate the results and confirmed the superior performance of the weighted average voting algorithm.
This study presented interesting results, but several issues still need further investigation. The selection of appropriate weights in the weighted average voting approach remains a challenging task, and optimization algorithms can be tested in future studies. Another direction for future work is to apply the proposed methodology to datasets from places with different weather conditions. Furthermore, the proposed methodology can be applied to other forecasting problems, such as wind speed prediction.


Figure 6. Clustering evaluation for different k.

Figure 7. Percentage of days per month in each cluster.


Figure 11. Solar irradiation for different forecast horizons using VOWA.


Table 1. Literature review of recent papers using ML algorithms for solar forecasting.

Table 3. Selected Set of Input Features for ML Algorithms.

Table 4. Selected Hyperparameters for ML Algorithms.

Table 5. Forecasting Accuracy of ML Models.

Table 6. Forecasting Accuracy of Ensemble Models.

Table 7. Weights of Ensemble Models.

Table 9. Forecasting Accuracy of ML Models for Benchmark Dataset.