Ensemble Machine-Learning Models for Accurate Prediction of Solar Irradiation in Bangladesh

Alam, Md Shafiul; Al-Ismail, Fahad Saleh; Hossain, Md Sarowar; Rahman, Syed Masiur

doi:10.3390/pr11030908


¹ Applied Research Center for Environment & Marine Studies, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia

² Department of Electrical Engineering, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia

³ Interdisciplinary Research Center of Renewable Energy and Power Systems (IRC-REPS), King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia

⁴ Department of EEE, International Islamic University Chittagong (IIUC), Chittagong 4318, Bangladesh

* Author to whom correspondence should be addressed.

Processes 2023, 11(3), 908; https://doi.org/10.3390/pr11030908

Submission received: 3 February 2023 / Revised: 6 March 2023 / Accepted: 13 March 2023 / Published: 16 March 2023

(This article belongs to the Special Issue Advanced Technologies and Materials for Sustainability in Energy Systems and Environmental Processes)


Abstract

Improved irradiance forecasting enables precise solar power generation forecasts, resulting in smoother operation of the distribution grid. Empirical models estimate irradiation from a wide range of data and country- or region-specific parameters. In contrast, algorithms based on Artificial Intelligence (AI) are becoming increasingly popular and effective for estimating solar irradiance. Although there has been significant development in this area elsewhere, the use of AI models to investigate irradiance in Bangladesh has been limited. This research forecasts solar radiation in Bangladesh using ensemble machine-learning models. The meteorological data, collected from 32 stations, contain maximum temperature, minimum temperature, total rain, humidity, sunshine, wind speed, cloud coverage, and irradiance. Ensemble machine-learning algorithms, namely AdaBoost regression (ABR), gradient-boosting regression (GBR), random forest regression (RFR), and bagging regression (BR), are developed to predict solar irradiance. With the default parameters, GBR provides the best performance, as it has the lowest standard deviation of errors. The important hyperparameters of GBR are then tuned with a grid-search algorithm to further improve prediction accuracy. On the testing dataset, the optimized GBR has the highest coefficient of determination ($R^{2}$), with a value of 0.9995. The same approach also has the lowest root mean squared error (0.0007), mean absolute percentage error (0.0052), and mean squared logarithmic error (0.0001), implying superior performance. The absolute prediction error lies within a narrow range, further indicating good performance. Overall, ensemble machine-learning models are an effective method for forecasting irradiance in Bangladesh: they attain high accuracy and robustness and provide significant information for the assessment of solar energy resources.

Keywords:

solar irradiance; machine-learning; ensemble models; performance metrics; prediction error; hyperparameters

1. Introduction

Renewable energy sources (RESs) are seen as a more environmentally friendly and sustainable alternative to fossil fuels. Greater use of renewable energy can reduce greenhouse gas emissions, save money, increase reliability and scalability, improve energy security, and diversify the energy mix. Many governments have set goals for increasing renewable energy use. The European Union, for example, has set binding 2030 targets for the share of renewable energy in its energy consumption [1] as part of its plan to cut net greenhouse gas emissions by at least 55% by 2030. Governments, utilities, and academic communities are collaborating to create intelligent power systems capable of better integrating renewable energy sources into the grid [2]. However, large-scale grid integration of renewable energy faces many challenges, including reduced reliability due to intermittency, lower grid inertia, high fault currents, and reduced grid stability [3,4,5]. Moreover, integrating renewable energy into the grid requires real-time monitoring and control, which can be difficult where the existing grid infrastructure needs upgrading. The challenges of renewable energy integration, especially of solar and wind power, can be addressed by accurately predicting solar irradiation and wind speed.

A region's solar radiation must be known to evaluate its solar energy potential. One of the outstanding challenges for robust planning, management, and investment in the solar energy industry is successfully predicting the amount of solar radiation reaching the ground [6]. If the energy generated by these investments is sold commercially, accurate solar radiation predictions, particularly short-term ones, are critical in both the day-ahead and intraday markets, because forecast accuracy directly affects energy suppliers' profit margins in these markets [7,8]. Empirical models are simple to calculate and widely used to predict solar radiation [9,10,11]. Although empirical models have been widely used to estimate monthly average daily global solar radiation, they cannot reliably predict short-term solar radiation because of rapid changes in meteorological conditions such as cloud cover, rain, and humidity. According to some researchers, these models cannot capture the complex, nonlinear relationships between dependent and independent variables in humid regions, where solar radiation is heavily influenced by thick cloud cover on rainy days [12,13]. Previous research found that empirical models produced only partially satisfactory predictions of daily global solar radiation [14,15,16].

Artificial intelligence (AI) has grown in popularity in almost all technical domains in recent decades, driven by technological advances [17,18,19]. Alongside empirical models, AI methods such as support vector machines (SVM), deep learning (DL), k-nearest neighbors (k-NN), artificial neural networks (ANN), and genetic algorithms (GA) have become popular for predicting solar radiation [20,21]. Choosing which DL hyperparameters to tune, so as to reduce the search space, and then applying hyperparameter optimization are important steps in AI applications [22,23]. Previous research has found that AI algorithms predict solar radiation more accurately than empirical models. Reference [24] describes a method for predicting the solar irradiance of photovoltaic power plants using wavelet decomposition and an extreme learning machine; irradiance is forecast one day ahead at a 15 min resolution. Simulations based on data observed at a photovoltaic power station in Gansu province suggest that the proposed model is more accurate than traditional ones. In [25], a combination of particle swarm optimization and least squares support vector regression is used to predict solar irradiance, and the model is found to achieve a high performance index. Other state-of-the-art gradient-boosting models, such as LightGBM and CatBoost, have also been applied to solar irradiance prediction [26,27,28]; LightGBM compared favorably with other AI approaches such as support vector regression (SVR) and multiple linear regression (MLR) across the reported performance metrics. Several feasibility studies show that Bangladesh has high potential to meet its electric power demand with solar-wind hybrid systems if solar irradiance is accurately predicted [29,30]. However, little attention has been given to predicting solar irradiance in Bangladesh with machine-learning algorithms.

Overall, machine-learning algorithms can be used effectively to predict solar irradiance, and various algorithms have performed well in different studies. The choice of algorithm and parameters, however, is likely to depend on the specific dataset and application.

As the above discussion shows, a range of machine-learning strategies have been applied to forecast solar irradiance. Ensemble machine-learning algorithms, however, have received less attention. Ensemble prediction, which combines the predictions of numerous models, is gaining popularity as a way to improve the accuracy of solar irradiation predictions. The main research gaps addressed by this study are as follows.

Solar irradiation data from 32 stations are collected and preprocessed for machine-learning models, whereas the existing literature [31,32] uses data from only a few stations.
Four different ensemble machine-learning techniques are employed to forecast solar irradiance in Bangladesh. The best model is selected as the one with the lowest standard deviation of prediction errors, and tuning its hyperparameters further improves prediction accuracy.
Unlike many studies that simply compare prediction accuracy across measures, this study explains algorithm performance in detail using five criteria: $R^{2}$, mean absolute percentage error (MAPE), root mean squared error (RMSE), mean absolute error (MAE), and mean squared logarithmic error (MSLE).

In addition, the proposed prediction approach is compared with other global irradiance prediction approaches based on machine-learning and empirical methods, and it shows better performance in terms of several performance metrics. The remainder of the paper is organized as follows. Section 2 describes the data. Ensemble algorithms for predicting solar irradiance are presented in Section 3. Section 4 summarizes the performance evaluation metrics. Results and discussion are provided in Section 5. Finally, Section 6 presents the conclusions and future directions of this research.

2. Data Descriptions and Visualizations

In data-driven models, feature engineering is crucial to predictive performance. Feature engineering is the process of choosing, extracting, and transforming significant features of the raw data so that they provide meaningful inputs to the model [33]. Correlation maps depict the relationships between two or more variables graphically; they are frequently used in data analysis to investigate correlations between variables and to find patterns and trends in the data [34]. In statistical modeling, a variable is a property or feature that can take different values or categories, and these values can be used to characterize and explain variation in a specific outcome. Statistical models, such as regression models, express the relationship between the outcome and such variables, with each variable serving as a predictor of the outcome. From a physical standpoint, variables are also crucial to understanding the behavior of complex real-world systems: in physics, they denote quantifiable parameters of a physical system, such as temperature, pressure, velocity, and mass.
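A correlation map is a heat-map rendering of a correlation matrix. The sketch below assumes pandas and uses synthetic data with hypothetical column names (it does not reproduce the station dataset); it computes a Pearson correlation matrix and ranks the inputs by their absolute correlation with the target, which is the kind of reading later drawn from Figure 2.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the meteorological data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
sunshine = rng.uniform(0, 10, n)                         # sunshine hours
cloud = 8 - 0.7 * sunshine + rng.normal(0, 0.5, n)       # cloud cover, anti-correlated with sunshine
humidity = rng.uniform(60, 95, n)                        # unrelated to the target by construction
irradiance = 2.0 + 0.4 * sunshine - 0.1 * cloud + rng.normal(0, 0.1, n)

df = pd.DataFrame({
    "sunshine": sunshine,
    "cloud_cover": cloud,
    "humidity": humidity,
    "solar_radiation": irradiance,
})

# Pearson correlation matrix: the numbers a correlation heat map displays.
corr = df.corr()

# Rank the input features by the strength of their correlation with the target.
target_corr = (corr["solar_radiation"]
               .drop("solar_radiation")
               .abs()
               .sort_values(ascending=False))
print(target_corr)
```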

Bangladesh, a country in South Asia, spans 147,570 km² and lies between latitudes 20.59° N and 26.63° N and longitudes 88.01° E and 92.67° E. The country has three distinct seasons: a hot summer, a cool winter, and the monsoon. The monsoon season is characterized by high humidity, heavy rainfall, and powerful winds. Because Bangladesh lies in a low-lying delta region, it is particularly vulnerable to floods during the monsoon. Owing to shifting climate patterns, the country has seen an increase in the frequency and intensity of natural disasters such as cyclones and floods in recent years. The data were collected from the meteorological department for 32 stations spread across the country. The dataset spans 1990 to 2017 and has 3060 rows and 10 columns: station, date, maximum temperature, minimum temperature, total precipitation, humidity, sunshine hours, wind speed, cloud cover, and solar radiation, where solar radiation is the output. Missing values in some samples were filled by forward filling, i.e., by propagating the last valid observation forward. A total of 2448 samples were used for training and 612 samples for testing (an 80/20 split). Figure 1 shows a scatter plot of the solar irradiance data in Bangladesh; from 1990 to 2017, most of the data show a fluctuating trend. Figure 2 shows a correlation matrix plot of the variables. It shows that solar irradiance is strongly correlated with sunshine and cloud coverage and only weakly correlated with the other features.
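The preprocessing described above (forward filling, then an 80/20 split into 2448 training and 612 testing rows) can be sketched with pandas and scikit-learn. This is a minimal sketch under stated assumptions: the data are synthetic, the column names are hypothetical, filling is done per station in date order, a back fill covers gaps at the very start of a station's record, and the split is random rather than chronological. The text does not specify any of these details.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the station records; column names are hypothetical.
rng = np.random.default_rng(1)
n_rows = 3060                                    # dataset size reported in the text
df = pd.DataFrame({
    "station": rng.integers(0, 32, n_rows),      # 32 stations
    "date": pd.date_range("1990-01-01", periods=n_rows, freq="D"),
    "max_temp": rng.normal(31.0, 3.0, n_rows),
    "sunshine": rng.uniform(0.0, 10.0, n_rows),
    "solar_radiation": rng.uniform(2.0, 6.0, n_rows),
})
# Blank out about 5% of two feature columns to mimic missing readings.
feature_cols = ["max_temp", "sunshine"]
for col in feature_cols:
    df.loc[rng.random(n_rows) < 0.05, col] = np.nan

# Forward fill within each station in date order (assumption); the back fill
# only handles gaps at the very start of a station's record.
df = df.sort_values(["station", "date"])
df[feature_cols] = df.groupby("station")[feature_cols].transform(lambda s: s.ffill().bfill())

X = df.drop(columns=["date", "solar_radiation"])
y = df["solar_radiation"]
# 20% of 3060 rows gives 612 test rows and 2448 training rows.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))
```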

3. Ensemble Methods for Solar Irradiation Prediction

This section provides details on several ensemble machine-learning models to forecast solar irradiance in Bangladesh.

3.1. Gradient-Boosting Regressor

Solar irradiance forecasting is a supervised regression problem. Boosting is a learning technique that was initially developed for classification problems but has since been applied effectively to regression problems as well. Gradient-boosting regression (GBR) is a machine-learning technique for predicting numerical values. It is an ensemble method that combines several simple regression models into a more complex and robust model, as shown in Figure 3. It works by iteratively adding new regression models to the ensemble, with each new model correcting the errors made by the earlier ones [35,36]. The algorithm's goal is to minimize a loss function that measures the difference between predicted and actual values. In each iteration, the algorithm takes a gradient-descent step in function space: the new model is fitted to the negative gradient of the loss with respect to the current predictions, which for squared-error loss is simply the residuals. The technique is highly adaptable and can handle a wide range of regression tasks, including linear and nonlinear regression. Depending on the implementation and the loss function (for example, a robust loss such as the Huber loss), GBR models can also cope with missing data and outliers in the training data. In principle, a wide range of base learners can be used, including decision trees, linear regression, and neural networks. However, the algorithm is computationally demanding and may require substantial computing resources [37].

When making predictions, this model "boosts" an ensemble of weak prediction models, usually decision trees, to create a more dependable model. For a GBR with $T$ trees, the prediction for an input $x_{j}$ is

f_{T}(x_{j}) = \sum_{t=1}^{T} \gamma_{t} h_{t}(x_{j})

(1)

where $h_{t}$ is the $t$-th weak learner and $\gamma_{t}$ is its scaling factor.
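Eq. (1) can be made concrete with a minimal from-scratch sketch for squared-error loss, assuming NumPy and scikit-learn decision trees as the weak learners $h_{t}$. Two simplifications are assumptions of this sketch, not details from the text: a constant initial prediction (the target mean) is added in front of the sum, as common implementations do, and every $\gamma_{t}$ is a fixed shrinkage equal to the learning rate. The data are synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbr(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """Minimal gradient boosting for squared-error loss (illustrative only)."""
    f0 = float(np.mean(y))                     # constant initial prediction
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residual = y - pred                    # negative gradient of 0.5 * (y - f)^2
        h_t = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        h_t.fit(X, residual)                   # weak learner fitted to the residuals
        pred += learning_rate * h_t.predict(X) # gamma_t = learning_rate (constant)
        trees.append(h_t)
    return {"f0": f0, "gamma": learning_rate, "trees": trees}

def predict_gbr(model, X):
    # f_T(x) = f0 + sum_t gamma_t * h_t(x), i.e. Eq. (1) plus the initial constant.
    return model["f0"] + model["gamma"] * np.sum([h.predict(X) for h in model["trees"]], axis=0)

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 2))
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] + rng.normal(0, 0.1, 300)

train_mse = {}
for T in (5, 100):
    model = fit_gbr(X, y, n_trees=T)
    train_mse[T] = float(np.mean((y - predict_gbr(model, X)) ** 2))
print(train_mse)
```

With squared-error loss and a learning rate of at most 1, each added tree can only lower the training error, which is why the 100-tree model fits the training data more closely than the 5-tree one; held-out data are needed to judge generalization.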

3.2. Adaboost Regressor

Adaptive boosting (AdaBoost) builds an ensemble of weak-learner decision trees, each of which need only be marginally better than random guessing. The adaptive nature of AdaBoost lies in how it passes information from earlier trees to later ones: after each tree is trained, the weights of the training samples that it predicted poorly are increased, so that the subsequent tree concentrates on reducing those errors. This sequential learning over many stages produces a strong learner. The final prediction is a weighted combination of the forecasts made by the individual trees, as shown in Figure 4, with more accurate trees receiving larger weights (the widely used AdaBoost.R2 variant for regression takes a weighted median). Because each new tree focuses on the training samples that are hardest to forecast, the method is highly flexible [38]; the same reweighting, however, can make it sensitive to noisy samples and outliers.
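The reweighting step can be illustrated for one boosting round of AdaBoost.R2 (Drucker, 1997) with its linear loss; scikit-learn's `AdaBoostRegressor` implements this variant, with `loss="linear"` as the default. The sketch below is written from the published algorithm as a NumPy illustration of the idea, not a copy of any library's internals, and the four data points are made up.

```python
import numpy as np

def adaboost_r2_step(weights, y_true, y_pred):
    """One AdaBoost.R2 reweighting step with the linear loss (illustrative sketch)."""
    abs_err = np.abs(y_true - y_pred)
    max_err = abs_err.max()
    if max_err == 0:
        raise ValueError("perfect fit: boosting would stop here")
    loss = abs_err / max_err                      # per-sample loss scaled to [0, 1]
    avg_loss = np.sum(weights * loss)             # weighted average loss (weights sum to 1)
    if avg_loss >= 0.5:
        raise ValueError("learner too weak: average loss >= 0.5")
    beta = avg_loss / (1.0 - avg_loss)            # beta < 1 for a useful learner
    new_weights = weights * beta ** (1.0 - loss)  # well-predicted samples shrink the most
    new_weights /= new_weights.sum()              # renormalise to a distribution
    learner_weight = np.log(1.0 / beta)           # this tree's weight in the final combination
    return new_weights, learner_weight

w0 = np.full(4, 0.25)
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 2.0, 3.0, 5.0])           # the last sample is badly predicted
w1, alpha = adaboost_r2_step(w0, y_true, y_pred)
print(w1.round(3), round(float(alpha), 3))
```

After one round, the badly predicted last sample carries the largest weight, so the next tree is pushed to fit it; this is also why a single noisy sample can dominate later rounds.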

3.3. Random Forest Regressor

Random forest (RF) is an ensemble technique that combines the outputs of several decision trees to classify or predict the value of a variable [39]. By growing each tree on a different training subset created through a process known as bagging, and by considering only a random subset of the features at each split, RF promotes diversity and reduces the correlation among the trees, as shown in Figure 5. Bagging creates each training subset by sampling the original dataset at random with replacement.

Let $p$ denote an input from the training set, and let RF grow $Z$ regression trees $\{T_{z}(p)\}_{z=1}^{Z}$. The predictor is then given by

\hat{f}_{rf}^{Z}(p) = \frac{1}{Z} \sum_{z=1}^{Z} T_{z}(p)

(2)

One of the primary features of the RF is its resistance to overfitting. The model can limit the impact of outliers and noise in the data by employing an ensemble of trees with random subsamples of the data and features. This makes it less prone to overfitting. Furthermore, RF can handle both categorical and continuous data, making it adaptable to a wide range of applications.
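Eq. (2) can be checked numerically. The sketch assumes scikit-learn's `RandomForestRegressor` (a tooling assumption) and synthetic data; the fitted trees are exposed through the `estimators_` attribute, so averaging the $Z$ per-tree predictions by hand should reproduce the forest's own `predict` output.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 4))
y = 0.5 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.1, 300)

Z = 50  # number of regression trees
rf = RandomForestRegressor(n_estimators=Z, max_features=0.5, random_state=0).fit(X, y)

# Eq. (2): average the Z individual tree predictions T_z(p) for a few inputs p.
per_tree = np.stack([tree.predict(X[:5]) for tree in rf.estimators_])  # shape (Z, 5)
manual = per_tree.mean(axis=0)
print(np.allclose(manual, rf.predict(X[:5])))
```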

3.4. Bagging Regressor

Bootstrap aggregation, often called bagging and proposed by Breiman, is a machine-learning technique that can be combined with various classification and regression algorithms to reduce prediction variance and improve predictions. The concept is straightforward: many bootstrap samples are drawn from the existing data, a prediction method is applied to each bootstrap sample, and the results are combined (by averaging for regression and by simple voting for classification) to obtain the overall prediction, with the averaging reducing the variance (Figure 6).
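The procedure above can be written out from scratch in a few lines, assuming NumPy and scikit-learn decision trees as the base learner (scikit-learn's `BaggingRegressor` packages the same idea). The sketch also makes the variance-reduction argument concrete: because squared error is convex, the MSE of the averaged prediction can never exceed the average MSE of the individual trees, and the gap equals the average spread of the member predictions. The data are synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagging_fit(X, y, n_models=25, seed=0):
    """Fit one fully grown tree per bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)          # bootstrap: n draws with replacement
        models.append(DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    # Regression: simple (unweighted) average of the members' predictions.
    return np.mean([m.predict(X) for m in models], axis=0)

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(400, 3))
y = np.sin(X[:, 0]) + 0.2 * X[:, 1] + rng.normal(0, 0.3, 400)
X_tr, X_te, y_tr, y_te = X[:300], X[300:], y[:300], y[300:]

models = bagging_fit(X_tr, y_tr)
bagged_mse = np.mean((y_te - bagging_predict(models, X_te)) ** 2)
member_mse = np.mean([np.mean((y_te - m.predict(X_te)) ** 2) for m in models])
print(f"average single-tree MSE: {member_mse:.3f}, bagged MSE: {bagged_mse:.3f}")
```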

Bagging is a committee-based strategy for increasing the precision of regression or classification techniques. Whereas bagging generates an overall forecast by simply averaging the predictions, boosting uses a weighted average of the outcomes obtained by applying a prediction method to numerous samples. Moreover, in boosting, the samples used at each step are not all drawn at random from the same population; instead, cases that were predicted inaccurately in the previous step are given more weight in the next step. Thus, unlike bagging, which relies on a simple average of predictions, boosting is an iterative process that incorporates weights [40,41].

AI models cannot explain the physical and chemical mechanisms of solar radiation transfer in the atmosphere: they are trained on data with algorithms, and the kind of data and techniques employed determines the model's capabilities. Physical models based on the laws of physics and chemistry are needed to understand these mechanisms. A new empirical model of global solar irradiance was constructed from solar radiation and climatic data observed at a subtropical forest in China; it can calculate global solar irradiance both at the ground and at the top of the atmosphere [42]. Other empirical models have been evaluated by comparing their predicted solar irradiance with ground-based measurements [43].

3.5. Experimental Setup and Parameter Tuning

Figure 7 depicts the experimental setup for the proposed AI model for predicting solar irradiance in Bangladesh. The data obtained from the 32 stations include several input features and the output. The data are preprocessed to clean them, fill in missing values, normalize them, and split them into training and testing sets. The best model is selected as the one with the minimum standard deviation of errors on the testing data. After the best model is selected, its hyperparameters are tuned, including the number of trees, learning rate, maximum features, and maximum depth. This study uses a parameter-tuning approach known as grid search, which evaluates the model for each combination of parameter values specified in a grid [44,45]. Grid search is a brute-force approach for assessing how well a model performs with different sets of hyperparameters, i.e., adjustable parameters that are chosen before training rather than learned by the model during training. To find the best set of hyperparameters, grid search specifies a range of values for each hyperparameter and then iterates over all possible combinations of these values. Although this method is computationally expensive, it enables a thorough exploration of the specified hyperparameter space, which may improve model performance. The tuned ensemble model is then tested on the validation dataset, and several performance metrics are calculated to compare the effectiveness of the proposed model with that of existing models.
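A grid search over the four hyperparameters named above might look like the following. This is a sketch that assumes scikit-learn's `GridSearchCV` and `GradientBoostingRegressor`; the grid values are illustrative and are not the grid used in the study, and the seven input columns are synthetic stand-ins for the meteorological features. With two values per hyperparameter, the search fits 2 × 2 × 2 × 2 = 16 combinations, each evaluated with 5-fold cross-validation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(400, 7))        # 7 synthetic meteorological inputs
y = 0.4 * X[:, 4] - 0.1 * X[:, 6] + rng.normal(0, 0.1, 400)

# Illustrative grid over the four hyperparameters tuned in the text.
param_grid = {
    "n_estimators": [100, 200],
    "learning_rate": [0.05, 0.1],
    "max_features": [0.6, 0.9],              # fraction of features tried at each split
    "max_depth": [2, 3],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=5,                                    # every combination is scored by 5-fold CV
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
print(-search.best_score_)                   # cross-validated RMSE of the best combination
```

The cost grows multiplicatively with every value added to the grid, which is the computational burden the text refers to.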

4. Model Evaluation Processes

4.1. Performance Evaluation Metrics

To assess the prediction effectiveness of the models, a variety of performance metrics were employed: mean absolute percentage error (MAPE), root-mean-square error (RMSE), R-squared ($R^{2}$), mean absolute error (MAE), and mean squared logarithmic error (MSLE). MAPE is the mean of the absolute prediction errors, each expressed as a percentage of the corresponding actual value. A lower MAPE indicates better model performance.

MAPE = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{x_{i} - y_{i}}{x_{i}} \right| \times 100

(3)

where $y_{i}$ is the prediction, $x_{i}$ is the actual value, and $N$ is the number of samples.

The RMSE is a popular statistic for measuring the discrepancies between a model's predicted and observed values. It is calculated as the square root of the mean of the squared differences between the predicted and observed values in a given sample.

RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_{i} - y_{i})^{2}}

(4)

where $x_{i}$ is the actual value, $y_{i}$ is the predicted value, and $N$ is the number of data points.

The MAE quantifies the average magnitude of the forecasting errors without considering their direction. It measures accuracy for continuous variables.

MAE = \frac{\sum_{i = 1}^{N} | y_{i} - x_{i} |}{N}

(5)

$R^{2}$ is a statistical measure of the proportion of the variance in the dependent variable of a regression model that is explained by the independent variables. It is given by

R^{2} = 1 - \frac{SSR}{SST}

(6)

where $SSR = \sum_{i=1}^{N} (x_{i} - y_{i})^{2}$ is the sum of squared residuals and $SST = \sum_{i=1}^{N} (x_{i} - \bar{x})^{2}$ is the total sum of squares, with $\bar{x}$ the mean of the actual values.
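The five metrics can be computed directly from their definitions, with $x$ the actual values and $y$ the predictions, and cross-checked against scikit-learn, which is assumed here only as a reference implementation. MSLE is listed above without a formula; the sketch uses its standard form, the mean of $(\ln(1+x_{i}) - \ln(1+y_{i}))^{2}$. Note that scikit-learn's MAPE returns a fraction, so it is compared with Eq. (3) divided by 100. The five irradiance values are made up.

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_absolute_percentage_error,
                             mean_squared_error, mean_squared_log_error, r2_score)

def evaluate(x_true, y_pred):
    """Metrics of Section 4.1; x = actual values, y = predictions."""
    x = np.asarray(x_true, dtype=float)
    y = np.asarray(y_pred, dtype=float)
    err = x - y
    return {
        "MAPE": 100.0 * np.mean(np.abs(err / x)),                   # Eq. (3), in percent
        "RMSE": np.sqrt(np.mean(err ** 2)),                         # Eq. (4)
        "MAE": np.mean(np.abs(err)),                                # Eq. (5)
        "R2": 1.0 - np.sum(err ** 2) / np.sum((x - x.mean()) ** 2), # Eq. (6): 1 - SSR/SST
        "MSLE": np.mean((np.log1p(x) - np.log1p(y)) ** 2),          # standard MSLE form
    }

actual = [4.2, 5.1, 3.8, 6.0, 4.9]       # made-up irradiance values
predicted = [4.0, 5.3, 3.9, 5.8, 5.0]
m = evaluate(actual, predicted)

# Cross-check against scikit-learn (its MAPE is a fraction, not a percentage).
checks = {
    "MAPE": np.isclose(m["MAPE"] / 100, mean_absolute_percentage_error(actual, predicted)),
    "RMSE": np.isclose(m["RMSE"], np.sqrt(mean_squared_error(actual, predicted))),
    "MAE": np.isclose(m["MAE"], mean_absolute_error(actual, predicted)),
    "R2": np.isclose(m["R2"], r2_score(actual, predicted)),
    "MSLE": np.isclose(m["MSLE"], mean_squared_log_error(actual, predicted)),
}
print(m)
print(checks)
```

By construction, RMSE can never be smaller than MAE for the same set of errors, which is a quick sanity check on any reported pair of values.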

4.2. Model Evaluation by k-Fold Cross-Validation

In addition to the above metrics, the performance of the proposed models in predicting solar irradiance is evaluated with the k-fold cross-validation technique, an ML approach for evaluating model performance. It partitions the dataset into k equal portions, or folds; one fold serves as the validation set and the remaining k-1 folds serve as the training set. This procedure is repeated k times, with each fold serving as the validation set exactly once, and the model's performance is averaged over the k iterations to give a more reliable estimate of its generalization performance. k-fold cross-validation helps detect overfitting and is frequently employed when the dataset is small and the need for model validation is high. The value of k depends on the size of the dataset and the available computational resources; common values are 5 and 10, although higher values are also used. In this study, 10-fold cross-validation is used to evaluate the performance of the ensemble machine-learning models.
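The 10-fold procedure can be spelled out as an explicit loop. This sketch assumes scikit-learn's `KFold` with shuffling and a default `GradientBoostingRegressor`, scored with $R^{2}$ on synthetic data; `cross_val_score` wraps the same loop in a single call.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 7))
y = 0.4 * X[:, 4] - 0.1 * X[:, 6] + rng.normal(0, 0.1, 300)

kf = KFold(n_splits=10, shuffle=True, random_state=0)
fold_scores = []
for train_idx, val_idx in kf.split(X):
    model = GradientBoostingRegressor(random_state=0)
    model.fit(X[train_idx], y[train_idx])                     # train on k-1 folds
    score = r2_score(y[val_idx], model.predict(X[val_idx]))   # validate on the held-out fold
    fold_scores.append(score)

print(f"mean R2 = {np.mean(fold_scores):.4f} +/- {np.std(fold_scores):.4f}")
```

Reporting the spread across folds alongside the mean shows how much the score depends on which samples happen to land in the validation set.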

5. Predicted Results and Discussions

A pipeline approach evaluates the four ensemble models with their default parameters to select the best one. A machine-learning pipeline is an end-to-end design that orchestrates the flow of data into and out of a set of models. The gradient-boosting regressor obtains the best score for predicted solar irradiance, as depicted in Figure 8, so its hyperparameters are then tuned with a grid-search approach. For the gradient-boosting regressor, two parameters are most important: the number of trees in the ensemble and the size of the random subset of features considered at each split (maximum features). The best performance is obtained with 450 trees and a maximum-features value of 0.9 [46]. In addition, the learning rate and maximum depth of the gradient-boosting regressor are tuned to further improve prediction accuracy. The hyperparameters of the ensemble models are shown in Table 1.
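The model-selection step can be sketched as follows, assuming scikit-learn estimators with default settings, a `MinMaxScaler` as the normalization stage of each pipeline, and synthetic data. The selection rule follows the text (lowest standard deviation of test errors); which model wins on this toy data says nothing about the result reported for the Bangladesh dataset.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostRegressor, BaggingRegressor,
                              GradientBoostingRegressor, RandomForestRegressor)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 7))
y = 0.4 * X[:, 4] - 0.1 * X[:, 6] + 0.2 * np.sin(X[:, 0]) + rng.normal(0, 0.05, 500)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

candidates = {
    "ABR": AdaBoostRegressor(random_state=0),
    "GBR": GradientBoostingRegressor(random_state=0),
    "RFR": RandomForestRegressor(random_state=0),
    "BR": BaggingRegressor(random_state=0),
}
error_std = {}
for name, estimator in candidates.items():
    pipe = make_pipeline(MinMaxScaler(), estimator)   # scaling + model, default hyperparameters
    pipe.fit(X_train, y_train)
    errors = y_test - pipe.predict(X_test)
    error_std[name] = float(np.std(errors))

best = min(error_std, key=error_std.get)              # lowest spread of test errors
print(error_std, "->", best)
```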

After the important parameters of the gradient-boosting regressor are tuned, the model is trained with the tuned hyperparameters. The trained ensemble model is then serialized with pickle, a standard Python means of serializing objects, and saved to a file for later use in predicting irradiance from unseen data. Table 2 shows the statistical performance of the developed model on the validation dataset. The model achieves an excellent $R^{2}$ value of 0.9995 in predicting solar irradiance. The root mean squared error of around 0.0007 over the 612 test observations also indicates good statistical performance. The other metrics, namely MAPE, MAE, and MSLE, are also provided in Table 2. To obtain a more robust assessment of model performance and guard against overfitting, 10-fold cross-validation is also adopted. Because a single train/test split may not be representative, the cross-validation process gives the average $R^{2}$ score over repeated sub-sampling, which is around 0.999767. To check that the data-driven model is not overfitting, the RMSE is also calculated on the training dataset; at around 0.00065, it is close to the testing RMSE. The performance of the tuned gradient-boosting ensemble model is further examined with several visualizations. Figure 9 shows a scatter plot of the actual versus predicted solar irradiance; the predicted solar irradiance is very close to the actual irradiance for almost all observations.
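Saving and reloading the trained model with pickle can be sketched as below. The sketch assumes scikit-learn's `GradientBoostingRegressor` and synthetic data; `n_estimators=450` and `max_features=0.9` follow the text, while `learning_rate` and `max_depth` are placeholders because the tuned values of Table 1 are not reproduced here, and the file name is hypothetical. A temporary directory keeps the example self-contained.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 7))
y = 0.4 * X[:, 4] + rng.normal(0, 0.1, 200)

# n_estimators and max_features follow the text; learning_rate and max_depth
# are placeholders, not the tuned values.
model = GradientBoostingRegressor(n_estimators=450, max_features=0.9,
                                  learning_rate=0.1, max_depth=3, random_state=0)
model.fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gbr_irradiance.pkl")   # hypothetical file name
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    with open(path, "rb") as fh:                     # only unpickle files from trusted sources
        restored = pickle.load(fh)

X_new = rng.uniform(0, 10, size=(3, 7))              # stand-in for unseen data
same = np.array_equal(model.predict(X_new), restored.predict(X_new))
print(same)
```

A pickled model should generally be loaded with the same library version that produced it; pickle files are not guaranteed to be portable across versions.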

The actual and predicted solar irradiance for the testing dataset are plotted in Figure 10. For most observations, the predicted irradiance closely follows the actual irradiance. The absolute error plot (Figure 11) shows that the error exceeds 0.25 kWh/m² for only a few observations; the x-axes of Figure 10 and Figure 11 index the data samples. For most observations, the error is below 0.05 kWh/m². Thus, the proposed ensemble model for predicting solar irradiance in Bangladesh shows excellent performance. Table 3 summarizes a comparison with existing machine-learning and other approaches to solar irradiance prediction; the proposed approach outperforms the existing approaches in the literature in terms of several performance metrics.

6. Conclusions

In Bangladesh, the steady increase in energy demand calls for more clean energy production from solar PV, and accurate prediction of solar irradiance is therefore needed to help integrate more solar energy into the utility grid. This study used ensemble machine-learning algorithms to forecast solar irradiance from a subset of meteorological factors: maximum temperature, minimum temperature, total rain, humidity, sunshine, wind speed, and cloud coverage. Four ensemble models were tested using the pipeline approach, and gradient-boosting regression provided the best performance with the default parameters. To achieve more accurate predictions, the hyperparameters of the gradient-boosting regression, namely the number of trees, maximum features, learning rate, and maximum depth, were tuned with the grid-search approach. The tuned gradient-boosting regression achieved an $R^{2}$ score of 0.9995. The algorithm performed well in predicting solar irradiance in Bangladesh, providing the best performance metrics (RMSE = 0.000688938, MAPE = 0.005203076, MAE = 0.021723261, MSLE = 0.00014126) on the validation dataset. The visualization of predicted versus actual values, together with the absolute error plot, demonstrates the excellent performance of the proposed ensemble model. Ensemble models are therefore useful tools for Bangladeshi researchers and practitioners in the solar energy industry, and the appropriate authorities can use such models to plan solar power generation and to operate the distribution grid smoothly while minimizing power fluctuations. Although all the machine-learning models performed well, their results might be improved with more variables and meteorological data. The use of the computationally expensive grid-search algorithm is another limitation of this work. Finally, this study could be extended by examining other promising machine-learning methodologies, such as transparent (interpretable) models and deep-learning models, and by predicting solar irradiance in more prospective areas of the country.

Author Contributions

Conceptualization, M.S.A.; methodology, M.S.A. and M.S.H.; formal analysis, M.S.A., S.M.R. and F.S.A.-I.; writing—original draft preparation, M.S.A. and M.S.H.; writing—review and editing, M.S.A., F.S.A.-I. and S.M.R.; supervision, M.S.A. and S.M.R.; and funding acquisition, M.S.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to acknowledge the support provided by King Fahd University of Petroleum & Minerals (KFUPM) through Direct Funded Project No. ER221005.

Conflicts of Interest

The authors declare no conflict of interest.

References

Renewable Energy Targets. Available online: https://energy.ec.europa.eu/topics/renewable-energy/renewable-energy-directive-targets-and-rules/renewable-energy-targets_en (accessed on 22 January 2023).
Shafiullah, G.; Oo, A.M.; Jarvis, D.; Ali, A.S.; Wolfs, P. Potential challenges: Integrating renewable energy with the smart grid. In Proceedings of the 2010 20th Australasian Universities Power Engineering Conference, Christchurch, New Zealand, 5–8 December 2010; pp. 1–6.
Alam, M.S.; Al-Ismail, F.S.; Salem, A.; Abido, M.A. High-level penetration of renewable energy sources into grid utility: Challenges and solutions. IEEE Access 2020, 8, 190277–190299.
Alam, M.S.; Abido, M.A.Y.; El-Amin, I. Fault current limiters in power systems: A comprehensive review. Energies 2018, 11, 1025.
Alam, M.S.; Chowdhury, T.A.; Dhar, A.; Al-Ismail, F.S.; Choudhury, M.; Shafiullah, M.; Hossain, M.I.; Hossain, M.A.; Ullah, A.; Rahman, S.M. Solar and Wind Energy Integrated System Frequency Control: A Critical Review on Recent Developments. Energies 2023, 16, 812.
Prasad, R.; Ali, M.; Kwan, P.; Khan, H. Designing a multi-stage multivariate empirical mode decomposition coupled with ant colony optimization and random forest model to forecast monthly solar radiation. Appl. Energy 2019, 236, 778–792.
Dong, N.; Chang, J.F.; Wu, A.G.; Gao, Z.K. A novel convolutional neural network framework based solar irradiance prediction method. Int. J. Electr. Power Energy Syst. 2020, 114, 105411.
Sigauke, C.; Chandiwana, E.; Bere, A. Spatio-Temporal Forecasting of Global Horizontal Irradiance Using Bayesian Inference. Appl. Sci. 2022, 13, 201.
Nikolaeva, V.; Gordeev, E. SPAM: Solar Spectrum Prediction for Applications and Modeling. Atmosphere 2023, 14, 226.
Choi, Y.; Kwun, H.; Kim, D.; Lee, E.; Bae, H. Residual Life Prediction for Induction Furnace by Sequential Encoder with s-Convolutional LSTM. Processes 2021, 9, 1121.
Sibtain, M.; Li, X.; Saleem, S.; Asad, M.S.; Tahir, T.; Apaydin, H. A multistage hybrid model ICEEMDAN-SE-VMD-RDPG for a multivariate solar irradiance forecasting. IEEE Access 2021, 9, 37334–37363.
Munoz, M.N.; Ballantyne, E.E.; Stone, D.A. Development and evaluation of empirical models for the estimation of hourly horizontal diffuse solar irradiance in the United Kingdom. Energy 2022, 241, 122820.
Oyewola, O.M.; Patchali, T.E.; Ajide, O.O.; Singh, S.; Matthew, O.J. Global solar radiation predictions in Fiji Islands based on empirical models. Alex. Eng. J. 2022, 61, 8555–8571.
Sharifi, S.S.; Rezaverdinejad, V.; Nourani, V. Estimation of daily global solar radiation using wavelet regression, ANN, GEP and empirical models: A comparative study of selected temperature-based approaches. J. Atmos. Sol. Terr. Phys. 2016, 149, 131–145.
Djaafari, A.; Ibrahim, A.; Bailek, N.; Bouchouicha, K.; Hassan, M.A.; Kuriqi, A.; Al-Ansar, N.; El-Kenawy, E.S.M. Hourly predictions of direct normal irradiation using an innovative hybrid LSTM model for concentrating solar power projects in hyper-arid regions. Energy Rep. 2022, 8, 15548–15562.
Jiang, Y. Prediction of monthly mean daily diffuse solar radiation using artificial neural networks and comparison with other empirical models. Energy Policy 2008, 36, 3833–3837.
Ray, P.P. A review on TinyML: State-of-the-art and prospects. J. King Saud Univ. Comput. Inf. Sci. 2021, 34, 1595–1623.
Li, D.; Tang, Z.; Kang, Q.; Zhang, X.; Li, Y. Machine Learning-Based Method for Predicting Compressive Strength of Concrete. Processes 2023, 11, 390.
Shafiullah, M.; AlShumayri, K.A.; Alam, M.S. Machine learning tools for active distribution grid fault diagnosis. Adv. Eng. Softw. 2022, 173, 103279.
Ren, Y.; Suganthan, P.; Srikanth, N. Ensemble methods for wind and solar power forecasting—A state-of-the-art review. Renew. Sustain. Energy Rev. 2015, 50, 82–91.
Ray, P.K.; Bharatee, A.; Puhan, P.S.; Sahoo, S. Solar Irradiance Forecasting Using an Artificial Intelligence Model. In Proceedings of the 2022 International Conference on Intelligent Controller and Computing for Smart Power (ICICCSP), Hyderabad, India, 21–23 July 2022; pp. 1–5.
Guo, H.; Zhuang, X.; Chen, P.; Alajlan, N.; Rabczuk, T. Stochastic deep collocation method based on neural architecture search and transfer learning for heterogeneous porous media. Eng. Comput. 2022, 38, 5173–5198.
Guo, H.; Zhuang, X.; Chen, P.; Alajlan, N.; Rabczuk, T. Analysis of three-dimensional potential problems in non-homogeneous media with physics-informed deep collocation method using material transfer learning and sensitivity analysis. Eng. Comput. 2022, 38, 5423–5444.
Dong, H.; Yang, L.; Zhang, S.; Li, Y. An Improved Prediction Approach on Solar Irradiance of Photovoltaic Power Station. TELKOMNIKA Indones. J. Electr. Eng. 2014, 12, 1720–1726.
Ghazvinian, H.; Mousavi, S.F.; Karami, H.; Farzin, S.; Ehteram, M.; Hossain, M.S.; Fai, C.M.; Hashim, H.B.; Singh, V.P.; Ros, F.C.; et al. Integrated support vector regression and an improved particle swarm optimization-based model for solar radiation prediction. PLoS ONE 2019, 14, e0217634.
Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 1–9. Available online: https://proceedings.neurips.cc/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf (accessed on 22 January 2023).
Dorogush, A.V.; Ershov, V.; Gulin, A. CatBoost: Gradient boosting with categorical features support. arXiv 2018, arXiv:1810.11363.
Chaibi, M.; Benghoulam, E.; Tarik, L.; Berrada, M.; Hmaidi, A.E. An interpretable machine learning model for daily global solar radiation prediction. Energies 2021, 14, 7367.
Lipu, M.S.H.; Uddin, M.S.; Miah, M.A.R. A feasibility study of solar-wind-diesel hybrid system in rural and remote areas of Bangladesh. Int. J. Renew. Energy Res. 2013, 3, 892–900.
Rashid, F.; Hoque, M.E.; Aziz, M.; Sakib, T.N.; Islam, M.T.; Robin, R.M. Investigation of optimal hybrid energy systems using available energy sources in a rural area of Bangladesh. Energies 2021, 14, 5794.
Rabbi, K.M.; Nandi, I.; Saleh, A.S.; Faisal, F.; Mojumder, S. Prediction of solar irradiation in Bangladesh using artificial neural network (ANN) and data mapping using GIS technology. In Proceedings of the 2016 4th International Conference on the Development in the in Renewable Energy Technology (ICDRET), Dhaka, Bangladesh, 7–9 January 2016; pp. 1–6.
Shuvho, M.B.A.; Chowdhury, M.A.; Ahmed, S.; Kashem, M.A. Prediction of solar irradiation and performance evaluation of grid connected solar 80KWp PV plant in Bangladesh. Energy Rep. 2019, 5, 714–722.
Heaton, J. An empirical analysis of feature engineering for predictive modeling. In Proceedings of the SoutheastCon 2016, Norfolk, VA, USA, 30 March–3 April 2016; pp. 1–6.
Hossain, M.A.; Gray, E.; Lu, J.; Islam, M.R.; Alam, M.S.; Chakrabortty, R.; Pota, H.R. Optimized forecasting model to improve the accuracy of very short-term wind power prediction. IEEE Trans. Ind. Inform. 2023, 3230726.
Bhatnagar, M.; Yadav, A.; Swetapadma, A. Enhancing the resiliency of transmission lines using extreme gradient boosting against faults. Electr. Power Syst. Res. 2022, 207, 107850.
Li, K.; Chang, F.; Shi, S.; Jiang, C.; Bai, Y.; Dong, H.; Meng, X.; Wu, J.C.; Zhang, X. A new method of Ionic Fragment Contribution-Gradient Boosting Regressor for predicting the infinite dilution activity coefficient of dichloromethane in ionic liquids. Fluid Phase Equilibria 2023, 564, 113622.
Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Cham, Switzerland, 2009; Volume 2.
Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
Guo, L.; Chehata, N.; Mallet, C.; Boukir, S. Relevance of airborne lidar and multispectral image data for urban scene classification using Random Forests. ISPRS J. Photogramm. Remote Sens. 2011, 66, 56–66.
Sutton, C.D. Classification and regression trees, bagging, and boosting. Handb. Stat. 2005, 24, 303–329.
Kovačević, M.; Ivanišević, N.; Petronijević, P.; Despotović, V. Construction cost estimation of reinforced and prestressed concrete bridges using machine learning. Građevinar 2021, 73, 1–13.
Bai, J.; Zong, X. Global solar radiation transfer and its loss in the atmosphere. Appl. Sci. 2021, 11, 2651.
Bi, J.; Huang, J.; Fu, Q.; Ge, J.; Shi, J.; Zhou, T.; Zhang, W. Field measurement of clear-sky solar irradiance in Badain Jaran Desert of Northwestern China. J. Quant. Spectrosc. Radiat. Transf. 2013, 122, 194–207.
Bou-Rabee, M.; Lodi, K.A.; Ali, M.; Ansari, M.F.; Tariq, M.; Sulaiman, S.A. One-month-ahead wind speed forecasting using hybrid AI model for coastal locations. IEEE Access 2020, 8, 198482–198493.
Shi, Y.; Zhao, G.; Wang, M.; Xu, Y. An adaptive grid search algorithm for fitting spherical target of terrestrial LiDAR. Measurement 2022, 198, 111430.
Shi, R.; Xu, X. A train arrival delay prediction model using xgboost and bayesian optimization. In Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, 20–23 September 2020; pp. 1–6.
Jamil, B.; Bellos, E. Development of empirical models for estimation of global solar radiation exergy in India. J. Clean. Prod. 2019, 207, 1–16.
Deng, F.; Su, G.; Liu, C.; Wang, Z. Prediction of solar radiation resources in China using the LS-SVM algorithms. In Proceedings of the 2010 The 2nd International Conference on Computer and Automation Engineering (ICCAE), Singapore, 26–28 February 2010; pp. 31–35.
Solano, E.S.; Dehghanian, P.; Affonso, C.M. Solar Radiation Forecasting Using Machine Learning and Ensemble Feature Selection. Energies 2022, 15, 7049.
Feng, Y.; Gong, D.; Zhang, Q.; Jiang, S.; Zhao, L.; Cui, N. Evaluation of temperature-based machine learning and empirical models for predicting daily global solar radiation. Energy Convers. Manag. 2019, 198, 111780.
Arora, I.; Gambhir, J.; Kaur, T. Data normalisation-based solar irradiance forecasting using artificial neural networks. Arab. J. Sci. Eng. 2021, 46, 1333–1343.
Antonopoulos, V.Z.; Papamichail, D.M.; Aschonitis, V.G.; Antonopoulos, A.V. Solar radiation estimation methods using ANN and empirical models. Comput. Electron. Agric. 2019, 160, 160–167.
Bounoua, Z.; Chahidi, L.O.; Mechaqrane, A. Estimation of daily global solar radiation using empirical and machine-learning methods: A case study of five Moroccan locations. Sustain. Mater. Technol. 2021, 28, e00261.

Figure 1. Scatter plot of solar irradiance for the full dataset.

Figure 2. Irradiance data correlation plot.

Figure 3. Gradient-boosting regression model.

Figure 4. Adaboost regression model.

Figure 5. Random forest regression model.

Figure 6. Bagging regression model.

Figure 7. The structure of the proposed ensemble algorithms for solar irradiance prediction.

Figure 8. Performance comparison for several ensemble models.

Figure 9. Scatter plot of predicted versus actual irradiance.

Figure 10. Actual and predicted irradiance for the testing dataset.

Figure 11. Absolute error for the testing dataset with the optimized GBR algorithm.

Table 1. Hyperparameters for the ensemble machine-learning models.

Ensemble Models	Hyperparameters
Adaboost Regression	number of trees = 50, learning rate = 0.95, loss function = linear
Gradient-Boosting Regression	number of trees = 450, maximum features = 0.9, learning rate = 0.7, maximum depth = 6
Random Forest Regression	number of trees = 100, minimum samples to split = 2, maximum features = 1
Bagging Regression	number of trees = 10, maximum samples = 1, maximum features = 1, base estimator = deprecated
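The Table 1 settings can be mapped onto scikit-learn's ensemble estimators roughly as shown below. This is a minimal sketch, not the authors' code: the choice of scikit-learn, the hypothetical `build_models` helper, and the `random_state` seeds are assumptions. The entries "maximum features = 1" and "maximum samples = 1" are read here as the fraction 1.0 (use all features or all samples), which is scikit-learn's default, rather than the integer 1. The value "base estimator = deprecated" appears to be the placeholder default of `base_estimator` in scikit-learn 1.2, so the sketch omits it and keeps the default decision-tree base learner.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.model_selection import train_test_split


def build_models(seed: int = 0) -> dict:
    """Hypothetical helper: the four ensemble regressors with Table 1 settings.

    Assumption: "1" for maximum features / maximum samples is interpreted as
    the float fraction 1.0 (all features / all samples), not the integer 1.
    """
    return {
        "ABR": AdaBoostRegressor(
            n_estimators=50, learning_rate=0.95, loss="linear", random_state=seed
        ),
        "GBR": GradientBoostingRegressor(
            n_estimators=450,
            max_features=0.9,
            learning_rate=0.7,
            max_depth=6,
            random_state=seed,
        ),
        "RFR": RandomForestRegressor(
            n_estimators=100, min_samples_split=2, max_features=1.0, random_state=seed
        ),
        # base_estimator is not passed; the default base learner is a decision tree.
        "BR": BaggingRegressor(
            n_estimators=10, max_samples=1.0, max_features=1.0, random_state=seed
        ),
    }


if __name__ == "__main__":
    # Synthetic stand-in for seven meteorological inputs; NOT the study's data.
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(500, 7))
    y = 2.0 * X[:, 4] - 0.5 * X[:, 6] + 0.1 * rng.normal(size=500)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    for name, model in build_models().items():
        model.fit(X_tr, y_tr)
        print(name, "test R^2:", round(model.score(X_te, y_te), 3))
```

The synthetic data only exercise the estimators; the printed scores say nothing about the paper's dataset or results.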

Table 2. Statistical performance metrics for irradiance prediction.

RMSE	MAPE	MAE	MSLE	$R^{2}$ Score
0.0007	0.0052	0.0217	0.0001	0.9995
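The five metrics in Table 2 can be computed for any pair of actual and predicted irradiance series with `sklearn.metrics`, as in the hedged sketch below. The helper name `regression_report` is hypothetical. RMSE is taken as the square root of `mean_squared_error`, and MAPE follows scikit-learn's convention of returning a fraction rather than a percentage. The helper also checks that RMSE is at least as large as MAE, which holds for any set of errors. By that check, an RMSE of 0.0007 alongside an MAE of 0.0217 cannot describe the same error set, so one of the two entries may have been reported on a different scale (for example, MSE rather than RMSE).

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    mean_squared_log_error,
    r2_score,
)


def regression_report(y_true, y_pred) -> dict:
    """Hypothetical helper: RMSE, MAPE, MAE, MSLE and R^2 for one prediction set.

    MAPE is a fraction (scikit-learn convention), not a percentage.
    MSLE requires non-negative values, which holds for irradiance.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    report = {
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "MAPE": float(mean_absolute_percentage_error(y_true, y_pred)),
        "MAE": float(mean_absolute_error(y_true, y_pred)),
        "MSLE": float(mean_squared_log_error(y_true, y_pred)),
        "R2": float(r2_score(y_true, y_pred)),
    }
    # Sanity check: RMSE >= MAE for any error vector (tolerance for float rounding).
    assert report["RMSE"] >= report["MAE"] - 1e-12
    return report


if __name__ == "__main__":
    # Toy values for illustration only; R2 is exactly 0.75 for this input.
    print(regression_report([4.0, 5.0, 6.0], [4.0, 5.5, 5.5]))
```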

Table 3. Performance comparison of the proposed approach with the existing approaches.

Reference	Location	Prediction Models	Best Model	Hyperparameter Tuning	Performance Metrics
[47]	India	32 empirical models in 4 different categories	cubic and quartic models	N/A	$R^{2}$ = 0.9953, RMSE = 2.916, MAE = 1.3267
[48]	China	LS-SVM	LS-SVM	No	$R^{2}$ = 0.9832, RMSE = 0.7278
[49]	Brazil	SVR, XGBT, CatBoost, and VOA	VOA	Yes	$R^{2}$ = 0.848, RMSE = 0.3418, MAE = 0.2417, MAPE = 27.2163
[50]	Global	ANN, MEA-ANN, RF, WNN, Empirical	MEA-ANN	No	$R^{2}$ = 0.885, RMSE = 2.814, rRMSE = 19.6
[7]	China	Chaotic hybrid CNN, novel CNN	Novel CNN	Yes	MSE = 0.014, MAE = 0.063, AER = 0.011
[51]	India	ANN	ANN	Yes	$R^{2}$ = 0.9882, MAPE = 6.63141, MAE = 0.03069
[52]	Global	MLR, ANN, Empirical	ANN	Yes	$R^{2}$ = 0.884, RMSE = 3.166
[28]	Global	LightGBM, SVR, MLR	LightGBM	No	$R^{2}$ = 0.9377, RMSE = 0.4827, MAE = 0.3614
[53]	Morocco	Boosted trees, bagged trees, RF, Empirical	RF	No	$R^{2}$ = 0.9620, RMSE = 0.0785, MAE = 0.0584
This Work	Bangladesh	ABR, GBR, RFR, BR	GBR	Yes	$R^{2}$ = 0.9995, RMSE = 0.0007, MAPE = 0.0052, MSLE = 0.0001

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Alam, M.S.; Al-Ismail, F.S.; Hossain, M.S.; Rahman, S.M. Ensemble Machine-Learning Models for Accurate Prediction of Solar Irradiation in Bangladesh. Processes 2023, 11, 908. https://doi.org/10.3390/pr11030908

AMA Style

Alam MS, Al-Ismail FS, Hossain MS, Rahman SM. Ensemble Machine-Learning Models for Accurate Prediction of Solar Irradiation in Bangladesh. Processes. 2023; 11(3):908. https://doi.org/10.3390/pr11030908

Chicago/Turabian Style

Alam, Md Shafiul, Fahad Saleh Al-Ismail, Md Sarowar Hossain, and Syed Masiur Rahman. 2023. "Ensemble Machine-Learning Models for Accurate Prediction of Solar Irradiation in Bangladesh" Processes 11, no. 3: 908. https://doi.org/10.3390/pr11030908

APA Style

Alam, M. S., Al-Ismail, F. S., Hossain, M. S., & Rahman, S. M. (2023). Ensemble Machine-Learning Models for Accurate Prediction of Solar Irradiation in Bangladesh. Processes, 11(3), 908. https://doi.org/10.3390/pr11030908

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.

