Benchmarking Tree-Based Artificial Intelligence Models for Multi-Resolution Solar Irradiance Forecasting Across Various Sky Conditions in Arid Climates
Abstract
1. Introduction
1.1. Related Works
1.2. Identified Research Gaps
1.3. The Key Contribution of This Work
- We assess forecasting models using a large-scale dataset collected over 13 years, which is processed and enhanced through feature engineering, strictly utilizing lagged inputs to prevent data leakage, across five distinct short-term resolutions: 5, 10, 15, 30 and 60 min.
- We quantify how model accuracy changes across the different time resolutions to guide model selection.
- We evaluate model reliability by testing models under specific sky conditions (clear, partly cloudy, and overcast) to identify the best-performing ML model for each weather pattern at each time resolution.
- We compare all models against a Persistence baseline to set a clear standard. This gives grid operators a practical guide for short-term solar forecasting in arid areas, helping to increase solar energy use in these regions.
2. Dataset, Data Collection and Processing
2.1. Study Area and Data Collection
2.2. Data Quality and Statistical Analysis
2.3. Data Preprocessing and Feature Engineering
- Lagged GHI ($GHI(t)$): This parameter is the most recent observed value, i.e., the value one time step before the target, making it a key feature for short-term forecasts [48]. The Persistence model relies entirely on this feature because it assumes that conditions stay the same from one step to the next [49]. This model is expressed by the following [48]:

  $$\widehat{GHI}(t + \Delta t) = GHI(t)$$

  where $\Delta t$ is the time resolution of the dataset. Therefore, $GHI(t)$ provides vital autoregressive information for statistical and ML models, capturing the inherent temporal inertia of GHI.
- Clear sky GHI ($GHI_{cs}$): The theoretical GHI under ideal clear sky conditions is calculated using the Ineichen–Perez model, as described in [50]. It provides the essential baseline for quantifying atmospheric attenuation.
- GHI gradient ($\nabla GHI$): This factor measures the most recent rate of change in GHI, providing the model with information about recent trends. To strictly prevent data leakage, this gradient is calculated using only historical values available at the time of prediction. It is defined as follows [48]:

  $$\nabla GHI(t) = \frac{GHI(t) - GHI(t - \Delta t)}{\Delta t}$$

  where $\Delta t$ corresponds to the time resolution (e.g., 5, 10, or 60 min). A positive value indicates a recent increase in irradiance, while a negative value suggests the onset of cloud cover or sunset. Both $GHI(t)$ and $GHI(t - \Delta t)$ are historical values known at the moment of the forecast; thus, this feature is completely safe from data leakage.
- Clear sky index ($k_c$): To classify the weather, GHI is normalized by the clear sky GHI to create a clear sky index. The index is calculated from the following [51]:

  $$k_c = \frac{GHI}{GHI_{cs}}$$
- Lagged cloud opacity: This variable measures the cloud density to help determine sky conditions [48]. By using the lagged value, the model checks the sky’s previous state to anticipate sudden solar irradiance fluctuations, making it highly useful for spotting incoming clouds in the volatile weather of the study site.
- Temporal features (hour, zenith, azimuth): These variables track the sun’s position in the sky. The hour captures the daily cycle of solar intensity. The zenith angle measures the sun’s angular distance from the vertical (the complement of its elevation above the horizon), which controls how much atmosphere the radiation must pass through. The azimuth gives the compass direction of the sun. All of these are basic requirements for calculating solar geometry [48]. These parameters were calculated from the date, time, and geographical coordinates (latitude and longitude) of the study site using standard solar position algorithms.
- DNI components (clear sky DNI and lagged DNI): DNI represents the portion of solar radiation coming directly from the solar disk. This parameter gives the forecasting model precise information about the solar resource available for direct-beam applications, as well as cloud types and thicknesses [52]. The clear sky DNI is the theoretical value under cloudless conditions and offers a baseline for potential direct irradiance, while the lagged actual DNI provides immediate feedback on previous atmospheric transparency. Lagged versions of these components were used to prevent data leakage [48].
- Current time ($t$): This is the moment when the forecast is made, at which the historical data (e.g., the latest GHI and cloud opacity observations) are available.
- Target time ($t + \Delta t$): This is the future time step for which the GHI value is predicted.
- Forecast horizon ($\Delta t$): This is the gap between the current time and the target time. Since this study uses datasets with different resolutions, the horizon is equal to the time step of each dataset (i.e., 5, 10, 15, 30, or 60 min).
- Available input variables: The available input variables include the following:
  - Past data: Stochastic meteorological variables (GHI, DNI, cloud opacity) from time $t$ and earlier.
  - Future data: Deterministic variables (zenith, azimuth, clear sky GHI) calculated for the target time $t + \Delta t$. These are known in advance because they depend on astronomical equations, not weather.
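The leak-free feature construction described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the study's pipeline: the function name `build_features`, the dictionary keys, and the toy numbers in the usage note are all hypothetical, and the series are assumed to be evenly spaced at the dataset resolution.

```python
def build_features(ghi, ghi_clear, step_min):
    """Build one feature row per forecast time (index i).

    Inputs use only observations at or before index i; the target is the
    observation at index i + 1. Names and layout are illustrative assumptions.
    """
    rows = []
    for i in range(1, len(ghi) - 1):
        lag = ghi[i]                                  # last observed GHI (W/m^2)
        gradient = (ghi[i] - ghi[i - 1]) / step_min   # W/m^2 per minute
        # Clear sky index of the last observation; undefined when clear sky GHI is 0
        k_c = ghi[i] / ghi_clear[i] if ghi_clear[i] > 0 else None
        rows.append({
            "ghi_lag": lag,
            "ghi_gradient": gradient,
            "clear_sky_index": k_c,
            "ghi_clear_target": ghi_clear[i + 1],  # deterministic, known in advance
            "persistence": lag,                    # Persistence forecast for i + 1
            "target": ghi[i + 1],
        })
    return rows
```

For example, `build_features([100, 110, 130, 120], [200, 220, 260, 240], 5)` yields two rows; the first has a lagged GHI of 110, a gradient of 2.0 W/m² per minute, and a target of 130. The only value taken from the target time is the clear sky GHI, which is deterministic.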
2.4. Weather Classification Framework
- Clear sky (): The sky is mostly clear, allowing maximum sunlight to reach the surface.
- Partly cloudy (): Clouds pass frequently, causing sudden and unpredictable changes in solar radiation.
- Cloudy (): Thick clouds block the sun, keeping irradiance levels very low.
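A threshold-based classification of this kind can be sketched as follows. The threshold values used below (0.8 and 0.3) are illustrative assumptions, not the cut-offs used in this study, and the function names are hypothetical.

```python
def classify_sky(k_c, clear_min=0.8, cloudy_max=0.3):
    """Map a clear sky index to a sky class (thresholds are assumed values)."""
    if k_c >= clear_min:
        return "clear"
    if k_c <= cloudy_max:
        return "cloudy"
    return "partly_cloudy"


def class_shares(indices, clear_min=0.8, cloudy_max=0.3):
    """Return the percentage of samples falling in each sky class."""
    counts = {"clear": 0, "partly_cloudy": 0, "cloudy": 0}
    for k_c in indices:
        counts[classify_sky(k_c, clear_min, cloudy_max)] += 1
    total = len(indices)
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}
```

Applying `class_shares` to the clear sky index series of each resolution would give a class distribution per dataset, which shows how many samples each weather-specific evaluation rests on.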
3. ML Models and Performance Metrics
3.1. Evaluated ML Models
- Random Forest: This ensemble method builds many separate decision trees, each trained on a bootstrap sample of the data. Each tree makes a prediction, and the model averages them to get the final result. This averaging lowers the variance of the prediction and reduces overfitting, making the model very stable.
- Gradient Boosting: This model builds decision trees one at a time. Each new tree is trained to correct the residual errors of the ensemble built so far. This cycle repeats until a set number of trees is reached or the validation error stops dropping. However, because it keeps fitting ever smaller residuals, it can easily overfit if the training data are noisy.
- Histogram-Based Gradient Boosting (HistGradientBoosting): This model is a faster version of the standard Gradient Boosting model. It groups the continuous values of each feature into a small number of discrete ‘bins’, which reduces the number of candidate split points and cuts down the processing time. This makes it much quicker to build and test accurate models.
- XGBoost: This model is a highly efficient boosting algorithm that works well on large datasets [12,16]. It includes built-in regularization that can be adjusted based on the specific problem, reducing the need for early stopping. This helps prevent overfitting and typically leads to better results on new data.
- LightGBM: This model is a fast Gradient Boosting framework that uses smart techniques like gradient-based one-side sampling and exclusive feature bundling to train much faster and use less memory than XGBoost, especially on large datasets [16]. Instead of growing trees level-by-level, this model grows them leaf-by-leaf. This helps it to find complex patterns quickly, though it requires careful parameter tuning to avoid overfitting.
3.2. Experimental Setup and Performance Metrics
- A surrogate model: In this step, a Gaussian process estimates the validation loss for different hyperparameters. It combines prior assumptions with observed results to predict both the expected loss and its uncertainty for any setting.
- An acquisition function: This function uses the surrogate model’s predictions to pick the next hyperparameters to test. The expected improvement method is a common choice for this step [55], given by

  $$EI(x) = \mathbb{E}\left[\max\left(f_{best} - f(x),\, 0\right)\right] \tag{5}$$

  where $EI(x)$ is the expected improvement for the hyperparameter configuration, $x$ is the vector of hyperparameters being evaluated, $\mathbb{E}$ denotes the expectation operator, $f_{best}$ is the best observed objective function value (minimum validation error) found so far, and $f(x)$ is the objective function value (validation error) for the hyperparameters $x$. The expectation is calculated over the surrogate model’s predictive distribution of $f(x)$.

This step-by-step approach efficiently searches through different hyperparameters until it finds the best settings to minimize the validation error. For example, consider an iteration where the best validation RMSE found thus far is $f_{best}$. The surrogate model evaluates a new hyperparameter set $x_{new}$ and predicts the expected error for this set together with its uncertainty. Equation (5) then calculates the expected improvement $EI(x_{new})$, which is the amount by which this new set is expected to lower the error below $f_{best}$ (not merely the probability that it does so). If $EI(x_{new})$ is high, the algorithm selects $x_{new}$ for actual training.

The Bayesian optimization process was allocated a fixed budget of 100 iterations for each model to ensure fair comparison. This optimization process was performed independently for each forecasting horizon (5, 10, 15, 30, and 60 min) to ensure that the hyperparameters were tailored to the specific dynamics of each time resolution. The optimized hyperparameter settings obtained from this process for each evaluated model are detailed in Table 4. It should be noted that the values in this table are representative examples for the 60 min horizon only, as listing all configurations for every horizon would be too extensive.
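When the surrogate is a Gaussian process, its prediction at a candidate setting is a normal distribution, and the expected improvement for a minimization problem has a well-known closed form. The sketch below assumes that Gaussian form; the function name and the arguments `mu` (predicted mean error), `sigma` (predicted standard deviation), and `f_best` are illustrative and not taken from the study's code.

```python
from math import erf, exp, pi, sqrt


def expected_improvement(mu, sigma, f_best):
    """Expected improvement for minimization under a N(mu, sigma^2) prediction.

    EI = (f_best - mu) * Phi(z) + sigma * phi(z), with z = (f_best - mu) / sigma.
    """
    if sigma <= 0.0:
        # No uncertainty: the improvement is deterministic
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    cdf = 0.5 * (1.0 + erf(z / sqrt(2.0)))       # standard normal CDF
    pdf = exp(-0.5 * z * z) / sqrt(2.0 * pi)     # standard normal PDF
    return (f_best - mu) * cdf + sigma * pdf
```

A candidate whose predicted mean equals the current best can still have a positive expected improvement if its uncertainty is large, which is how the acquisition function balances exploitation against exploration.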
4. Overall Multi-Resolution Performance
5. Model Performance Under Distinct Weather Conditions
5.1. Comparison with Existing Studies
5.2. Limitations and Future Work
6. Discussion and Conclusions
- Very-short-term (5–15 min) horizons: The Gradient Boosting and XGBoost models provide the highest overall accuracy during this timeframe. The Persistence model remains a strong alternative for clear days, but ML models are necessary to handle passing clouds.
- Short-term (30–60 min): The HistGradientBoosting and XGBoost models are the top choices here. At a 60 min resolution, HistGradientBoosting reduces the overall test RMSE from 150.75 W/m² for the Persistence baseline to 49.77 W/m², an error reduction of roughly 67%.
- Weather-dependent strategy: Model selection must change based on the sky and weather conditions. For partly cloudy conditions, the best model shifts depending on the exact forecast window (e.g., LightGBM at 15 min resolutions or the HistGradientBoosting model at 60 min resolutions). However, for fully overcast days, the data clearly show that relying on a simple Persistence model is the safest option to avoid ML models chasing chaotic noise.
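The error reduction relative to Persistence quoted above can be reproduced with a short calculation. The sketch below uses the 60 min test RMSE values reported in the overall performance table; the function and variable names are illustrative.

```python
def rmse_reduction(rmse_model, rmse_persistence):
    """Fractional RMSE reduction relative to the Persistence baseline."""
    return 1.0 - rmse_model / rmse_persistence


# Test RMSE (W/m^2) at the 60 min resolution, from the overall results table
test_rmse_60min = {
    "GradientBoosting": 50.26,
    "HistGradientBoosting": 49.77,
    "LightGBM": 50.31,
    "RandomForest": 50.88,
    "XGBoost": 50.23,
}
persistence_rmse_60min = 150.75

best_model = min(test_rmse_60min, key=test_rmse_60min.get)
best_reduction_pct = 100.0 * rmse_reduction(
    test_rmse_60min[best_model], persistence_rmse_60min
)
```

Here `best_model` is HistGradientBoosting and `best_reduction_pct` rounds to 67%. All five tree-based models fall within about 1 W/m² of each other at this resolution, so the ranking among them is much less decisive than their common margin over Persistence.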
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| ANN | Artificial neural network |
| DNI | Direct normal irradiance |
| GHI | Global horizontal irradiance |
| LSTM | Long short-term memory |
| MAE | Mean absolute error |
| ML | Machine learning |
| MSE | Mean squared error |
| R2 | Coefficient of determination |
| RMSE | Root mean squared error |
| sMAPE | Symmetric mean absolute percentage error |
References
1. Schaeffer, R.; Schipper, E.L.F.; Ospina, D.; Mirazo, P.; Alencar, A.; Anvari, M.; Artaxo, P.; Biresselioglu, M.E.; Blome, T.; Boeckmann, M.; et al. Ten New Insights in Climate Science 2024. One Earth 2025, 8, 101285.
2. Chatterjee, S.; Khan, P.W.; Byun, Y.C. Recent Advances and Applications of Machine Learning in the Variable Renewable Energy Sector. Energy Rep. 2024, 12, 5044–5065.
3. Ye, J.; Chen, J. The Ultimate Meteorological Question from Observational Astronomers: How Good Is the Cloud Cover Forecast? Mon. Not. R. Astron. Soc. 2013, 428, 3288–3294.
4. Shi, C.; Wang, T.; Wang, G.; Letu, H. The Net Warming Effect of Clouds on Global Surface Temperature May Be Weakening or Even Disappearing. Geosci. Front. 2025, 16, 102107.
5. Adar, M.; Babay, M.A.; Boussif, M.; Khaouch, Z.; Abbassi, Z.; Najih, Y.; Mabrouki, M. Optimization of Photovoltaic System Modelling: A Comparative Study and Experimental Validation Using Bond Graph Methodology and a Genetic Algorithm. In Applied Mathematics, Modeling and Computer Simulation; IOS Press: Amsterdam, The Netherlands, 2024; pp. 723–730.
6. Kut, P.; Pietrucha-Urbanik, K. Forecasting Short-Term Photovoltaic Energy Production to Optimize Self-Consumption in Home Systems Based on Real-World Meteorological Data and Machine Learning. Energies 2025, 18, 4403.
7. El-Amarty, N.; El Fadili, H.; Bennani, S.D. Accurate Short-Term Solar Irradiance Forecasting with TinyML on Edge Device. In Proceedings of the 2024 International Conference on Circuit, Systems and Communication (ICCsC), Fez, Morocco, 28–29 June 2024; pp. 1–6.
8. Sward, J.A.; Ault, T.R.; Zhang, K.M. Genetic Algorithm Selection of the Weather Research and Forecasting Model Physics to Support Wind and Solar Energy Integration. Energy 2022, 254, 124367.
9. Chen, D.; Shi, X.; Jiang, M.; Zhu, S.; Zhang, H.; Zhang, D.; Chen, Y.; Yan, J. Selecting Effective NWP Integration Approaches for PV Power Forecasting with Deep Learning. Sol. Energy 2025, 301, 113939.
10. Zhang, F.; Hong, X.; Zhao, Z.; Gan, Z.; Ouyang, P.; Xiao, H.; Zhang, R.; Wei, X.; Cai, M.; Lu, F. Short-Term Forecasting of Cloud Physical Properties Based on Fourier Neural Operator Method. Geophys. Res. Lett. 2026, 53, e2025GL119553.
11. Mohanty, P.; Subhadarshini, K.; Nayak, R.; Pati, U.C.; Mahapatra, K. Exploring Data-Driven Multivariate Statistical Models for the Prediction of Solar Energy. In Computer Vision and Machine Intelligence for Renewable Energy Systems; Elsevier: Amsterdam, The Netherlands, 2025; pp. 85–101.
12. Aksoy, N.; Genc, I. Predictive Models Development Using Gradient Boosting Based Methods for Solar Power Plants. J. Comput. Sci. 2023, 67, 101958.
13. Nguyen, H.N.; Tran, Q.T.; Ngo, C.T.; Nguyen, D.D.; Tran, V.Q. Solar Energy Prediction Through Machine Learning Models: A Comparative Analysis of Regressor Algorithms. PLoS ONE 2025, 20, e0315955.
14. Hassija, V.; Chamola, V.; Mahapatra, A.; Singal, A.; Goel, D.; Huang, K.; Scardapane, S.; Spinelli, I.; Mahmud, M.; Hussain, A. Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence. Cogn. Comput. 2024, 16, 45–74.
15. Azman, M.A.; Jantan, H.; Bahrin, U.F.M.; Kadir, E.A. Solar Power Production Forecasting Model Using Random Forest Algorithm. In Intelligent Systems Design and Applications; Springer: Cham, Switzerland, 2023; pp. 135–144.
16. Vargas, J.; Martinez, R.; Loo, L. Enhancing Photovoltaic Energy Forecasting with Machine Learning: A Comparison Study of XGBoost, LightGBM and LSTM Models. In Proceedings of the 2024 IEEE Latin American Conference on Computational Intelligence (LA-CCI), Bogota, Colombia, 13–15 November 2024; pp. 1–6.
17. Costa, T.; Falcão, B.; Mohamed, M.A.; Annuk, A.; Marinho, M. Employing Machine Learning for Advanced Gap Imputation in Solar Power Generation Databases. Sci. Rep. 2024, 14, 23801.
18. Singh, R.; Singh, S.; Gupta, S.; Alotaibi, M.A.; Malik, H. Forecasting Rooftop Photovoltaic Solar Power Using Machine Learning Techniques. Energy Rep. 2025, 13, 3616–3630.
19. Mellit, A.; Massi Pavan, A.; Ogliari, E.; Leva, S.; Lughi, V. Advanced Methods for Photovoltaic Output Power Forecasting: A Review. Appl. Sci. 2020, 10, 487.
20. Kina, C.; Tanyildizi, H.; Al Bakri Abdullah, M.M.; Razak, R.A.; Imjai, T. Comparison of Deep LSTM and Machine Learning Models for Predicting Compressive Strength of Fly Ash/Slag-Based Geopolymer Concrete. Sci. Rep. 2025, 15, 32871.
21. Levent, I.; Sahin, G.; Isik, G.; van Sark, W.G. Comparative Analysis of Advanced Machine Learning Regression Models with Advanced Artificial Intelligence Techniques to Predict Rooftop PV Solar Power Plant Efficiency Using Indoor Solar Panel Parameters. Appl. Sci. 2025, 15, 3320.
22. Ramirez-Rivera, F.A.; Guerrero-Rodriguez, N.F. Ensemble Learning Algorithms for Solar Radiation Prediction in Santo Domingo: Measurements and Evaluation. Sustainability 2024, 16, 8015.
23. Saigustia, C.; Pijarski, P. Time Series Analysis and Forecasting of Solar Generation in Spain Using eXtreme Gradient Boosting: A Machine Learning Approach. Energies 2023, 16, 7618.
24. Ledmaoui, Y.; El Maghraoui, A.; El Aroussi, M.; Saadane, R.; Chebak, A.; Chehri, A. Forecasting Solar Energy Production: A Comparative Study of Machine Learning Algorithms. Energy Rep. 2023, 10, 1004–1012.
25. Van Poecke, A.; Tabari, H.; Hellinckx, P. Unveiling the Backbone of the Renewable Energy Forecasting Process: Exploring Direct and Indirect Methods and Their Applications. Energy Rep. 2024, 11, 544–557.
26. Pan, F. Forecasting Solar Energy Generation Using Machine Learning Techniques and Hybrid Models Optimized by War SO. Informatica 2025, 49.
27. Onaiwu, G.E.; Ayidu, J.N. Impact of Aerosols on Climate Change and Radiative Forcing. Environ. Policy Rev. 2025.
28. Gawusu, S.; Zhang, X.; Jamatutu, S.A.; Yeboah, E.K.; Tizumah, M.W.; Yakubu, S. Adaptive Solar Energy Modeling for Sustainable Urban Infrastructure: Addressing Non-Linear Conversion Challenges. Environ. Dev. Sustain. 2025, 1–35.
29. Sharma, H.; Kaur, S. Deep Learning for Sustainable Development Across Climate, Energy, Agriculture and Urban Systems. Discov. Sustain. 2025, 6, 1408.
30. Abdessadak, A.; Ghennioui, H.; El Bhiri, B.; Thirion-Moreau, N.; Abraim, M.; Merzouk, S. Assessing the Effects of Dust on Solar Panel Performance: A Comprehensive Review and Future Directions. Eng. Proc. 2025, 112, 9.
31. Amara, M.B.; Rdhaounia, E.; Balghouthi, M. Adaptive Solar Irradiance Forecasting in Arid Regions: Enhancing Accuracy with Localized Atmospheric Adjustments. J. Eng. Res. 2024, 13, 2663–2679.
32. Mardani, M.; Hoseinzadeh, S.; Garcia, D.A. Developing Particle-Based Models to Predict Solar Energy Attenuation Using Long-Term Daily Remote and Local Measurements. J. Clean. Prod. 2024, 434, 139690.
33. Atiea, M.A.; Shaheen, A.M.; Alassaf, A.; Alsaleh, I. Enhanced Solar Power Prediction Models with Integrating Meteorological Data Toward Sustainable Energy Forecasting. Int. J. Energy Res. 2024, 2024, 8022398.
34. Sutarna, N.; Tjahyadi, C.; Oktivasari, P.; Dwiyaniti, M.; Tohazen. Feature Optimization for Short-Term Solar Power Forecasting Using Bidirectional LSTM Networks. In Proceedings of the 2024 7th International Conference of Computer and Informatics Engineering (IC2IE), Bali, Indonesia, 12–13 September 2024; pp. 1–6.
35. Ali, M.; Souahlia, A.; Rabahi, A.; Guermoui, M.; Teta, A.; Tibermacine, I.E.; Rabahi, A.; Benghanem, M. A Robust Deep Learning Approach for Photovoltaic Power Forecasting Based on Feature Selection and Variational Mode Decomposition. J. Niger. Soc. Phys. Sci. 2025, 7, 2795.
36. Zhang, X.; Bose, I. Reliability Estimation for Individual Predictions in Machine Learning Systems: A Model Reliability-Based Approach. Decis. Support Syst. 2024, 186, 114305.
37. Hewamalage, H.; Ackermann, K.; Bergmeir, C. Forecast Evaluation for Data Scientists: Common Pitfalls and Best Practices. Data Min. Knowl. Discov. 2023, 37, 788–832.
38. Jailani, N.L.M.; Dhanasegaran, J.K.; Alkawsi, G.; Alkahtani, A.A.; Phing, C.C.; Baashar, Y.; Capretz, L.F.; Al-Shetwi, A.Q.; Tiong, S.K. Investigating the Power of LSTM-Based Models in Solar Energy Forecasting. Processes 2023, 11, 1382.
39. Sales, V.G.; Strobl, E.; Elliott, R.J. Cloud Cover and Its Impact on Brazil’s Deforestation Satellite Monitoring Program: Evidence from the Cerrado Biome of the Brazilian Legal Amazon. Appl. Geogr. 2022, 140, 102651.
40. Emmerson, K.M.; Thatcher, M.; Osbrough, S.; Clarke, J.M. Quantifying Natural Emissions and Their Impacts on Air Quality in a 2050s Australia. Atmos. Environ. 2025, 349, 121144.
41. Solcast API Documentation. Available online: https://solcast.com/ (accessed on 1 December 2025).
42. Al-Timimi, Y.J.; Al-Khudhairy, D.H. Analysis of Air Temperature Trends in Iraq. J. Environ. Earth Sci. 2015, 5, 14–25.
43. Al-Salihi, A.M.; Mohammed, H.A. Analysis of the Relationship Between Meteorological Parameters and Aerosol Optical Depth over Iraq. Atmos. Res. 2020, 239, 104923.
44. Voyant, C.; Notton, G.; Kalogirou, S.; Nivet, M.L.; Paoli, C.; Motte, F.; Fouilloy, A. Machine Learning Methods for Solar Radiation Forecasting: A Review. Renew. Energy 2017, 105, 569–582.
45. Mohamed, M.; Mahmood, F.E.; Abd, M.A.; Chandra, A.; Singh, B. Dynamic Forecasting of Solar Energy Microgrid Systems Using Feature Engineering. IEEE Trans. Ind. Appl. 2022, 58, 7857–7869.
46. Al-Musaylh, M.S.; Al-Daffaie, K.; Downs, N.; Ghimire, S.; Ali, M.; Yaseen, Z.M.; Igoe, D.P.; Deo, R.C.; Parisi, A.V.; Jebar, M.A. Multi-Step Solar Ultraviolet Index Prediction: Integrating Convolutional Neural Networks with Long Short-Term Memory for a Representative Case Study in Queensland, Australia. Model. Earth Syst. Environ. 2025, 11, 77.
47. Mugware, F.W.; Ravele, T.; Sigauke, C. Short-Term Predictions of Global Horizontal Irradiance Using Recurrent Neural Networks, Support Vector Regression, Gradient Boosting Random Forest and Advanced Stacking Ensemble Approaches. Computation 2025, 13, 72.
48. Chou, J.S.; Krang, J.; Limantonio, D.N. Regional Solar Generation Prediction with Metaheuristically Optimized Artificial Intelligence for Sustainable Grid Management. Renew. Energy 2025, 231, 124641.
49. Mohanasundaram, V.; Rangaswamy, B. Photovoltaic Solar Energy Prediction Using the Seasonal-Trend Decomposition Layer and ASOA Optimized LSTM Neural Network Model. Sci. Rep. 2025, 15, 4032.
50. Al-Hilfi, H.A.; Abu-Siada, A.; Shahnia, F. Estimating Generated Power of Photovoltaic Systems During Cloudy Days Using Gene Expression Programming. IEEE J. Photovolt. 2020, 11, 185–194.
51. Terrén-Serrano, G.; Martínez-Ramón, M. Deep Learning for Intra-Hour Solar Forecasting with Fusion of Features Extracted from Infrared Sky Images. Inf. Fusion 2023, 95, 42–61.
52. Marion, B. A Model for Deriving the Direct Normal and Diffuse Horizontal Irradiance from the Global Tilted Irradiance. Sol. Energy 2015, 122, 1037–1046.
53. Kakou, P.C.K.; Laouali, D.; Aka, B.; Osei, J.A.; Ette, N.F.K.; Frey, G. Multi-Timescale Validation of Satellite-Derived Global Horizontal Irradiance in Côte d’Ivoire. Remote Sens. 2025, 17, 998.
54. Psiloglou, B.E.; Kambezidis, H.D.; Kaskaoutis, D.G.; Karagiannis, D.; Polo, J.M. Comparison Between MRM Simulations, CAMS and PVGIS Databases with Measured Solar Radiation Components at the Methoni Station, Greece. Renew. Energy 2020, 146, 1372–1391.
55. Madhiarasan, M. Bayesian Optimisation Algorithm Based Optimised Deep Bidirectional Long Short Term Memory for Global Horizontal Irradiance Prediction in Long-Term Horizon. Front. Energy Res. 2025, 13, 1499751.
56. Mystakidis, A.; Koukaras, P.; Tsalikidis, N.; Ioannidis, D.; Tjortjis, C. Energy Forecasting: A Comprehensive Review of Techniques and Technologies. Energies 2024, 17, 1662.
57. Al-Hilfi, H.A.; Abu-Siada, A.; Shahnia, F. Combined ANFIS-Wavelet Technique to Improve the Estimation Accuracy of the Power Output of Neighboring PV Systems During Cloud Events. Energies 2020, 13, 1613.






| Weather Parameter | Unit | Mean | Standard Deviation | Minimum | First Quartile (25%) | Median (50%) | Third Quartile (75%) | Maximum |
|---|---|---|---|---|---|---|---|---|
| Global horizontal irradiance (ghi) | W/m² | 440.07 | 295.07 | 1.00 | 163.00 | 436.00 | 690.00 | 1055.00 |
| Direct normal irradiance (dni) | W/m² | 357.74 | 270.84 | 0.00 | 49.00 | 391.00 | 591.00 | 949.00 |
| Diffuse horizontal irradiance (dhi) | W/m² | 192.14 | 108.17 | 1.00 | 110.00 | 189.00 | 271.00 | 595.00 |
| Global tilted irradiance (gti) | W/m² | 504.74 | 321.99 | 0.00 | 184.00 | 552.00 | 799.00 | 1088.00 |
| Air temperature | °C | 31.51 | 10.78 | −1.00 | 22.00 | 32.00 | 41.00 | 52.00 |
| Relative humidity | % | 28.46 | 20.88 | 4.40 | 12.70 | 20.70 | 37.70 | 100.00 |
| Wind speed at 10 m | m/s | 4.80 | 2.54 | 0.00 | 2.80 | 4.50 | 6.50 | 14.60 |
| Wind direction at 10 m | ° | 254.90 | 89.67 | 0.00 | 185.00 | 298.00 | 315.00 | 360.00 |
| Cloud opacity | % | 8.21 | 18.63 | 0.00 | 0.00 | 0.00 | 3.20 | 97.00 |
| Albedo | - | 0.27 | 0.03 | 0.21 | 0.25 | 0.28 | 0.29 | 0.30 |
| Clear sky index | - | 0.92 | 0.19 | 0.02 | 0.97 | 1.00 | 1.00 | 1.00 |
| Rank | Feature Name | Frequency * | Avg. Importance # | Description |
|---|---|---|---|---|
| 1 | Lagged GHI | 20 | 61.28 | GHI value from the previous timestep (lagged GHI) |
| 2 | Clear sky GHI | 20 | 22.52 | Theoretical maximum GHI under clear skies |
| 3 | GHI gradient | 20 | 5.43 | Instantaneous rate of change in GHI |
| 4 | Clear sky index | 20 | 1.73 | Ratio of measured GHI to clear sky GHI |
| 5 | Lagged cloud opacity | 20 | 1.56 | Cloud density from the previous timestep |
| 6 | hour | 20 | 1.03 | Hour of the day |
| 7 | zenith | 20 | 0.9 | Solar zenith angle |
| 8 | Clear sky DNI | 20 | 0.75 | Theoretical Direct Normal Irradiance |
| 9 | Lagged DNI | 20 | 0.66 | Direct Normal Irradiance from previous timestep |
| 10 | azimuth | 20 | 0.63 | Solar azimuth angle |
| Resolution (min) | Clear Sky (%) | Partly Cloudy (%) | Cloudy (%) |
|---|---|---|---|
| 5 | 90.86 | 5.17 | 3.97 |
| 10 | 90.85 | 5.14 | 4.01 |
| 15 | 81.55 | 7.43 | 11.03 |
| 30 | 91.24 | 4.92 | 3.84 |
| 60 | 91.19 | 4.80 | 4.01 |
| Models | Key Parameters | Search Range (Min, Max) | Values |
|---|---|---|---|
| Random Forest | n_estimators | (50, 300) | 100 |
| | max_depth | (5, 20) | 15 |
| | min_samples_split | (2, 10) | 2 |
| | min_samples_leaf | (1, 10) | 1 |
| | max_features | (1, 10) | 1 |
| Gradient Boosting | learning_rate | (0.01, 0.3) | 0.1 |
| | n_estimators | (50, 300) | 100 |
| | max_depth | (3, 10) | 5 |
| | subsample | (0.7, 1.0) | 1 |
| HistGradientBoosting | learning_rate | (0.01, 0.3) | 0.1 |
| | max_depth | (5, 20) | 15 |
| | max_iter | (50, 300) | 100 |
| | min_samples_leaf | (10, 50) | 20 |
| XGBoost | learning_rate | (0.01, 0.3) | 0.1 |
| | max_depth | (3, 10) | 5 |
| | n_estimators | (50, 300) | 100 |
| | subsample | (0.7, 1.0) | 1 |
| | colsample_bytree | (0.7, 1.0) | 1 |
| | reg_alpha | (0, 1.0) | 0 |
| | reg_lambda | (0, 5.0) | 1 |
| LightGBM | learning_rate | (0.01, 0.3) | 0.1 |
| | num_leaves | (20, 50) | 31 |
| | max_depth | (3, 10) | 5 |
| | feature_fraction | (0.7, 1.0) | 1 |
| | bagging_fraction | (0.7, 1.0) | 1 |
| | min_data_in_leaf | (10, 50) | 20 |
| Resolution (min) | Model | RMSE (Training) | RMSE (Validation) | RMSE (Test) | MAE (Training) | MAE (Validation) | MAE (Test) | sMAPE (Training) | sMAPE (Validation) | sMAPE (Test) | R2 (Training) | R2 (Validation) | R2 (Test) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | GradientBoosting | 10.53 | 14.82 | 17.65 | 4.07 | 5.28 | 5.6 | 2.82 | 3.71 | 4.1 | 0.999 | 0.998 | 0.996 |
| | HistGradientBoosting | 10.51 | 14.98 | 17.78 | 4.59 | 5.86 | 6.19 | 3.72 | 4.52 | 4.89 | 0.999 | 0.997 | 0.996 |
| | LightGBM | 10.59 | 14.84 | 17.69 | 4.15 | 5.35 | 5.68 | 3.19 | 4.03 | 4.48 | 0.999 | 0.998 | 0.996 |
| | Persistence | 15.84 | 18.88 | 21.32 | 12.43 | 13.45 | 14.04 | 9.9 | 10.53 | 11.02 | 0.997 | 0.996 | 0.995 |
| | RandomForest | 7.69 | 14.98 | 17.77 | 2.62 | 4.7 | 4.91 | 1.38 | 2.82 | 3.2 | 0.999 | 0.997 | 0.996 |
| | XGBoost | 10.19 | 15.12 | 17.86 | 4.44 | 5.87 | 6.2 | 3.28 | 4.17 | 4.52 | 0.999 | 0.997 | 0.996 |
| 10 | GradientBoosting | 19.46 | 24.07 | 26.69 | 7.31 | 8.74 | 8.67 | 4.32 | 5.16 | 5.45 | 0.996 | 0.993 | 0.992 |
| | HistGradientBoosting | 19.19 | 24.19 | 26.77 | 7.6 | 9.17 | 9.12 | 5.2 | 6.04 | 6.3 | 0.996 | 0.993 | 0.992 |
| | LightGBM | 19.63 | 24.02 | 26.65 | 7.32 | 8.73 | 8.66 | 4.6 | 5.31 | 5.59 | 0.996 | 0.994 | 0.992 |
| | Persistence | 30.96 | 33.72 | 36.01 | 24.57 | 25.62 | 26.17 | 17.72 | 18.38 | 18.93 | 0.989 | 0.987 | 0.985 |
| | RandomForest | 13.28 | 24.5 | 26.87 | 4.67 | 8.2 | 7.93 | 2.38 | 4.52 | 4.74 | 0.998 | 0.993 | 0.992 |
| | XGBoost | 18.46 | 24.53 | 26.95 | 7.75 | 9.65 | 9.56 | 4.77 | 5.7 | 5.98 | 0.996 | 0.993 | 0.992 |
| 15 | GradientBoosting | 27.35 | 28.54 | 33.98 | 10.38 | 11.13 | 11.6 | 5.41 | 5.95 | 6.58 | 0.991 | 0.991 | 0.987 |
| | HistGradientBoosting | 26.84 | 28.87 | 34.25 | 10.54 | 11.44 | 12 | 6.33 | 6.91 | 7.35 | 0.992 | 0.991 | 0.987 |
| | LightGBM | 27.72 | 28.57 | 33.93 | 10.43 | 11.11 | 11.57 | 5.61 | 6.06 | 6.59 | 0.991 | 0.991 | 0.987 |
| | Persistence | 45.9 | 45.35 | 49.7 | 36.63 | 36.52 | 38.05 | 24.25 | 24.72 | 25.26 | 0.976 | 0.977 | 0.972 |
| | RandomForest | 17.92 | 29.54 | 34.51 | 6.38 | 10.71 | 10.91 | 3.17 | 5.64 | 6 | 0.996 | 0.99 | 0.987 |
| | XGBoost | 25.2 | 30.17 | 34.69 | 10.55 | 12.44 | 12.7 | 6.01 | 6.79 | 7.28 | 0.993 | 0.99 | 0.986 |
| 30 | GradientBoosting | 36.62 | 42.33 | 42.8 | 14.93 | 16.57 | 15.5 | 7.12 | 7.93 | 8.03 | 0.985 | 0.98 | 0.98 |
| | HistGradientBoosting | 36.65 | 42.75 | 43.13 | 15.13 | 16.84 | 15.82 | 7.94 | 8.32 | 8.37 | 0.985 | 0.98 | 0.98 |
| | LightGBM | 37.44 | 42.18 | 42.55 | 15.08 | 16.46 | 15.4 | 7.28 | 7.83 | 7.91 | 0.984 | 0.98 | 0.98 |
| | Persistence | 81.02 | 82.5 | 83.82 | 68.46 | 69.67 | 70.26 | 40.11 | 40.13 | 40.5 | 0.924 | 0.924 | 0.923 |
| | RandomForest | 20.63 | 42.91 | 43.69 | 8.16 | 16.05 | 14.93 | 3.95 | 7.41 | 7.34 | 0.995 | 0.98 | 0.979 |
| | XGBoost | 32.7 | 43.9 | 44.34 | 14.58 | 17.92 | 16.96 | 7.3 | 8.43 | 8.43 | 0.988 | 0.979 | 0.979 |
| 60 | GradientBoosting | 42.39 | 50.5 | 50.26 | 17.36 | 20.37 | 18.7 | 8.07 | 9.1 | 8.92 | 0.98 | 0.972 | 0.972 |
| | HistGradientBoosting | 41.07 | 50.45 | 49.77 | 17.3 | 20.55 | 18.81 | 9.23 | 9.57 | 9.23 | 0.981 | 0.972 | 0.973 |
| | LightGBM | 43.61 | 50.12 | 50.31 | 17.66 | 20.28 | 18.81 | 8.26 | 9.12 | 8.95 | 0.979 | 0.972 | 0.972 |
| | Persistence | 145.44 | 148.53 | 150.75 | 125.74 | 128.63 | 130.49 | 64.15 | 62.78 | 62.3 | 0.762 | 0.755 | 0.751 |
| | RandomForest | 22.78 | 50.93 | 50.88 | 9.18 | 20.05 | 18.31 | 4.16 | 8.8 | 8.58 | 0.994 | 0.971 | 0.972 |
| | XGBoost | 36.71 | 51.29 | 50.23 | 16.34 | 21.49 | 19.36 | 8.24 | 9.68 | 9.3 | 0.985 | 0.971 | 0.972 |

| Resolution (min) | Model | Weather | RMSE | MAE | sMAPE | R² |
|---|---|---|---|---|---|---|
| 5 | Persistence | Overall (Daytime) | 21.32 | 14.04 | 11.02 | 0.995 |
| 5 | Persistence | Clear Sky | 18.2 | 13.13 | 9.25 | 0.996 |
| 5 | Persistence | Partly Cloudy | 45.15 | 26.24 | 29.26 | 0.881 |
| 5 | Persistence | Cloudy | 32.02 | 17.36 | 27.42 | 0.683 |
| 5 | RandomForest | Overall (Daytime) | 17.77 | 4.91 | 3.2 | 0.996 |
| 5 | RandomForest | Clear Sky | 13.33 | 3.12 | 1.56 | 0.998 |
| 5 | RandomForest | Partly Cloudy | 44.8 | 25.1 | 17.01 | 0.883 |
| 5 | RandomForest | Cloudy | 33.23 | 18.2 | 23.84 | 0.659 |
| 5 | GradientBoosting | Overall (Daytime) | 17.65 | 5.6 | 4.1 | 0.996 |
| 5 | GradientBoosting | Clear Sky | 13.3 | 3.89 | 2.31 | 0.998 |
| 5 | GradientBoosting | Partly Cloudy | 44.24 | 24.84 | 18.24 | 0.886 |
| 5 | GradientBoosting | Cloudy | 33.04 | 18.46 | 28.45 | 0.662 |
| 5 | HistGradientBoosting | Overall (Daytime) | 17.78 | 6.19 | 4.89 | 0.996 |
| 5 | HistGradientBoosting | Clear Sky | 13.47 | 4.54 | 3.09 | 0.998 |
| 5 | HistGradientBoosting | Partly Cloudy | 44.25 | 24.82 | 20.28 | 0.886 |
| 5 | HistGradientBoosting | Cloudy | 33.17 | 18.41 | 27.31 | 0.66 |
| 5 | XGBoost | Overall (Daytime) | 17.7 | 5.72 | 4.18 | 0.996 |
| 5 | XGBoost | Clear Sky | 13.35 | 4.03 | 2.43 | 0.998 |
| 5 | XGBoost | Partly Cloudy | 44.29 | 24.86 | 18.15 | 0.885 |
| 5 | XGBoost | Cloudy | 33.1 | 18.4 | 27.94 | 0.661 |
| 5 | LightGBM | Overall (Daytime) | 17.69 | 5.68 | 4.48 | 0.996 |
| 5 | LightGBM | Clear Sky | 13.34 | 3.99 | 2.67 | 0.998 |
| 5 | LightGBM | Partly Cloudy | 44.28 | 24.82 | 19.98 | 0.885 |
| 5 | LightGBM | Cloudy | 33.07 | 18.34 | 27.16 | 0.662 |
| 10 | Persistence | Overall (Daytime) | 36.01 | 26.17 | 18.93 | 0.985 |
| 10 | Persistence | Clear Sky | 32.06 | 25.08 | 16.48 | 0.988 |
| 10 | Persistence | Partly Cloudy | 67.4 | 41.46 | 44.22 | 0.735 |
| 10 | Persistence | Cloudy | 54.05 | 29 | 41.22 | 0.099 |
| 10 | RandomForest | Overall (Daytime) | 26.87 | 7.93 | 4.74 | 0.992 |
| 10 | RandomForest | Clear Sky | 20.08 | 5.08 | 2.37 | 0.995 |
| 10 | RandomForest | Partly Cloudy | 64.13 | 37.76 | 23.36 | 0.76 |
| 10 | RandomForest | Cloudy | 59 | 33.43 | 37.13 | −0.074 |
| 10 | GradientBoosting | Overall (Daytime) | 26.69 | 8.67 | 5.45 | 0.992 |
| 10 | GradientBoosting | Clear Sky | 19.8 | 5.89 | 2.96 | 0.995 |
| 10 | GradientBoosting | Partly Cloudy | 64.26 | 37.65 | 24.25 | 0.759 |
| 10 | GradientBoosting | Cloudy | 58.85 | 33.57 | 40.76 | −0.069 |
| 10 | HistGradientBoosting | Overall (Daytime) | 26.77 | 9.12 | 6.3 | 0.992 |
| 10 | HistGradientBoosting | Clear Sky | 19.86 | 6.37 | 3.67 | 0.995 |
| 10 | HistGradientBoosting | Partly Cloudy | 64.18 | 37.7 | 27.48 | 0.759 |
| 10 | HistGradientBoosting | Cloudy | 59.47 | 33.91 | 41.21 | −0.091 |
| 10 | XGBoost | Overall (Daytime) | 26.61 | 8.7 | 5.41 | 0.992 |
| 10 | XGBoost | Clear Sky | 19.81 | 5.96 | 2.91 | 0.995 |
| 10 | XGBoost | Partly Cloudy | 63.9 | 37.45 | 24.57 | 0.761 |
| 10 | XGBoost | Cloudy | 58.3 | 33.15 | 40.27 | −0.049 |
| 10 | LightGBM | Overall (Daytime) | 26.65 | 8.66 | 5.59 | 0.992 |
| 10 | LightGBM | Clear Sky | 19.81 | 5.9 | 3.16 | 0.995 |
| 10 | LightGBM | Partly Cloudy | 64.17 | 37.66 | 24.57 | 0.759 |
| 10 | LightGBM | Cloudy | 58.51 | 33.04 | 39.18 | −0.056 |
| 15 | Persistence | Overall (Daytime) | 49.7 | 38.05 | 25.26 | 0.972 |
| 15 | Persistence | Clear Sky | 45.16 | 37.38 | 14.52 | 0.972 |
| 15 | Persistence | Partly Cloudy | 81.55 | 56.32 | 37.77 | 0.653 |
| 15 | Persistence | Cloudy | 50.13 | 28.53 | 101.33 | 0.192 |
| 15 | RandomForest | Overall (Daytime) | 34.51 | 10.91 | 6 | 0.987 |
| 15 | RandomForest | Clear Sky | 23.98 | 6.23 | 1.86 | 0.992 |
| 15 | RandomForest | Partly Cloudy | 71.34 | 42.29 | 19.12 | 0.734 |
| 15 | RandomForest | Cloudy | 54.36 | 22.86 | 28.6 | 0.05 |
| 15 | GradientBoosting | Overall (Daytime) | 33.98 | 11.6 | 6.58 | 0.987 |
| 15 | GradientBoosting | Clear Sky | 23.1 | 7.08 | 2.18 | 0.993 |
| 15 | GradientBoosting | Partly Cloudy | 71.58 | 42.35 | 19.23 | 0.733 |
| 15 | GradientBoosting | Cloudy | 53.84 | 22.76 | 31.63 | 0.068 |
| 15 | HistGradientBoosting | Overall (Daytime) | 34.25 | 12 | 7.35 | 0.987 |
| 15 | HistGradientBoosting | Clear Sky | 23.28 | 7.41 | 2.33 | 0.993 |
| 15 | HistGradientBoosting | Partly Cloudy | 71.66 | 42.87 | 19.88 | 0.732 |
| 15 | HistGradientBoosting | Cloudy | 54.77 | 23.56 | 37.47 | 0.035 |
| 15 | XGBoost | Overall (Daytime) | 33.9 | 11.66 | 6.89 | 0.987 |
| 15 | XGBoost | Clear Sky | 23.14 | 7.17 | 2.21 | 0.993 |
| 15 | XGBoost | Partly Cloudy | 71.26 | 42.09 | 19.13 | 0.735 |
| 15 | XGBoost | Cloudy | 53.51 | 22.86 | 34.52 | 0.079 |
| 15 | LightGBM | Overall (Daytime) | 33.93 | 11.57 | 6.59 | 0.987 |
| 15 | LightGBM | Clear Sky | 23.2 | 7.08 | 2.18 | 0.993 |
| 15 | LightGBM | Partly Cloudy | 71.23 | 42.21 | 19.24 | 0.735 |
| 15 | LightGBM | Cloudy | 53.59 | 22.6 | 31.72 | 0.076 |
| 30 | Persistence | Overall (Daytime) | 83.82 | 70.26 | 40.5 | 0.923 |
| 30 | Persistence | Clear Sky | 81.2 | 70.47 | 37.57 | 0.925 |
| 30 | Persistence | Partly Cloudy | 113.24 | 76.68 | 75.81 | 0.28 |
| 30 | Persistence | Cloudy | 96.33 | 53.3 | 62.49 | −2.071 |
| 30 | RandomForest | Overall (Daytime) | 43.69 | 14.93 | 7.34 | 0.979 |
| 30 | RandomForest | Clear Sky | 30.7 | 9.83 | 3.77 | 0.989 |
| 30 | RandomForest | Partly Cloudy | 100.62 | 62 | 32.88 | 0.432 |
| 30 | RandomForest | Cloudy | 123.14 | 78.38 | 64.5 | −4.018 |
| 30 | GradientBoosting | Overall (Daytime) | 42.8 | 15.5 | 8.03 | 0.98 |
| 30 | GradientBoosting | Clear Sky | 29.56 | 10.54 | 4.33 | 0.99 |
| 30 | GradientBoosting | Partly Cloudy | 100 | 61.11 | 34.91 | 0.439 |
| 30 | GradientBoosting | Cloudy | 122.11 | 77.52 | 66.77 | −3.934 |
| 30 | HistGradientBoosting | Overall (Daytime) | 43.13 | 15.82 | 8.37 | 0.98 |
| 30 | HistGradientBoosting | Clear Sky | 29.52 | 10.73 | 4.67 | 0.99 |
| 30 | HistGradientBoosting | Partly Cloudy | 100.26 | 62.1 | 35.36 | 0.436 |
| 30 | HistGradientBoosting | Cloudy | 125.55 | 80.17 | 66.79 | −4.217 |
| 30 | XGBoost | Overall (Daytime) | 42.56 | 15.46 | 8.01 | 0.98 |
| 30 | XGBoost | Clear Sky | 29.31 | 10.5 | 4.33 | 0.99 |
| 30 | XGBoost | Partly Cloudy | 99.71 | 61.01 | 34.73 | 0.442 |
| 30 | XGBoost | Cloudy | 121.7 | 77.48 | 66.29 | −3.901 |
| 30 | LightGBM | Overall (Daytime) | 42.55 | 15.4 | 7.91 | 0.98 |
| 30 | LightGBM | Clear Sky | 29.43 | 10.43 | 4.26 | 0.99 |
| 30 | LightGBM | Partly Cloudy | 100.39 | 61.57 | 33.99 | 0.434 |
| 30 | LightGBM | Cloudy | 119.8 | 76.92 | 66.53 | −3.75 |
| 60 | Persistence | Overall (Daytime) | 150.75 | 130.49 | 62.3 | 0.751 |
| 60 | Persistence | Clear Sky | 152.2 | 134.62 | 59.54 | 0.735 |
| 60 | Persistence | Partly Cloudy | 144.53 | 102.63 | 99.6 | −0.224 |
| 60 | Persistence | Cloudy | 117.26 | 63.51 | 76.73 | −3.515 |
| 60 | RandomForest | Overall (Daytime) | 50.88 | 18.31 | 8.58 | 0.972 |
| 60 | RandomForest | Clear Sky | 37.05 | 12.48 | 4.56 | 0.984 |
| 60 | RandomForest | Partly Cloudy | 111.9 | 70.45 | 36.06 | 0.266 |
| 60 | RandomForest | Cloudy | 139.71 | 92.16 | 73.11 | −5.41 |
| 60 | GradientBoosting | Overall (Daytime) | 50.26 | 18.7 | 8.92 | 0.972 |
| 60 | GradientBoosting | Clear Sky | 36.17 | 12.85 | 4.8 | 0.985 |
| 60 | GradientBoosting | Partly Cloudy | 111.97 | 70.34 | 36.5 | 0.265 |
| 60 | GradientBoosting | Cloudy | 139.2 | 93.66 | 76.19 | −5.363 |
| 60 | HistGradientBoosting | Overall (Daytime) | 49.77 | 18.81 | 9.23 | 0.973 |
| 60 | HistGradientBoosting | Clear Sky | 35.95 | 13.05 | 5.12 | 0.985 |
| 60 | HistGradientBoosting | Partly Cloudy | 111.27 | 70.3 | 37.98 | 0.274 |
| 60 | HistGradientBoosting | Cloudy | 136.39 | 91.68 | 74.4 | −5.109 |
| 60 | XGBoost | Overall (Daytime) | 50.21 | 18.72 | 8.9 | 0.972 |
| 60 | XGBoost | Clear Sky | 36.23 | 12.91 | 4.77 | 0.985 |
| 60 | XGBoost | Partly Cloudy | 111.82 | 70.06 | 36.76 | 0.267 |
| 60 | XGBoost | Cloudy | 138.4 | 93.25 | 75.86 | −5.29 |
| 60 | LightGBM | Overall (Daytime) | 50.31 | 18.81 | 8.95 | 0.972 |
| 60 | LightGBM | Clear Sky | 36.32 | 13 | 4.88 | 0.985 |
| 60 | LightGBM | Partly Cloudy | 111.5 | 69.8 | 36.29 | 0.271 |
| 60 | LightGBM | Cloudy | 139.33 | 93.87 | 75.31 | −5.375 |

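
The negative R² values in the Cloudy rows of the per-condition table above follow directly from how R² is defined: within a subset whose observations vary little around their own mean, even moderate errors can exceed that variance, and R² drops below zero. The sketch below illustrates this with plain Python. The sample values are hypothetical (not taken from the dataset), and the sMAPE form shown is one common definition bounded by 200%; the exact definition used in this work is assumed rather than confirmed here.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def smape(y_true, y_pred):
    """Symmetric MAPE in percent (assumed form, range 0 to 200); skips 0/0 pairs."""
    terms = [abs(t - p) / ((abs(t) + abs(p)) / 2.0)
             for t, p in zip(y_true, y_pred) if abs(t) + abs(p) > 0]
    return 100.0 * sum(terms) / len(terms)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical low-variance overcast sample: observations stay within
# 10 units of their mean, while every forecast misses by 15 units.
obs = [100.0, 110.0, 90.0, 105.0, 95.0]
pred = [115.0, 95.0, 105.0, 90.0, 110.0]

print(f"RMSE={rmse(obs, pred):.2f}  MAE={mae(obs, pred):.2f}  "
      f"sMAPE={smape(obs, pred):.2f}%  R2={r2(obs, pred):.3f}")
```

Here SS_res is 1125 and SS_tot is 250, so R² is 1 − 4.5 = −3.5 even though the absolute errors are modest. For this reason, RMSE and MAE are the more informative columns when comparing models within a low-variance sky condition.
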
| Resolution (min) | Clear Sky | Cloudy | Overall (Daytime) | Partly Cloudy |
|---|---|---|---|---|
| 5 | GradientBoosting | Persistence | GradientBoosting | GradientBoosting |
| 10 | GradientBoosting | Persistence | XGBoost | XGBoost |
| 15 | GradientBoosting | Persistence | XGBoost | LightGBM |
| 30 | XGBoost | Persistence | LightGBM | XGBoost |
| 60 | HistGradientBoosting | Persistence | HistGradientBoosting | HistGradientBoosting |
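
The selections in the table above are consistent with choosing, for each sky condition, the model with the lowest RMSE. That criterion is inferred from the reported values rather than stated in this section. The sketch below applies it to the 5 min RMSE values copied from the per-condition results table; the dictionary layout and the helper name `best_by_rmse` are illustrative assumptions.

```python
# RMSE values for the 5 min resolution, copied from the per-condition results table.
rmse_5min = {
    "Persistence": {"Overall (Daytime)": 21.32, "Clear Sky": 18.2,
                    "Partly Cloudy": 45.15, "Cloudy": 32.02},
    "RandomForest": {"Overall (Daytime)": 17.77, "Clear Sky": 13.33,
                     "Partly Cloudy": 44.8, "Cloudy": 33.23},
    "GradientBoosting": {"Overall (Daytime)": 17.65, "Clear Sky": 13.3,
                         "Partly Cloudy": 44.24, "Cloudy": 33.04},
    "HistGradientBoosting": {"Overall (Daytime)": 17.78, "Clear Sky": 13.47,
                             "Partly Cloudy": 44.25, "Cloudy": 33.17},
    "XGBoost": {"Overall (Daytime)": 17.7, "Clear Sky": 13.35,
                "Partly Cloudy": 44.29, "Cloudy": 33.1},
    "LightGBM": {"Overall (Daytime)": 17.69, "Clear Sky": 13.34,
                 "Partly Cloudy": 44.28, "Cloudy": 33.07},
}

def best_by_rmse(table):
    """Return {condition: model with the lowest RMSE} (hypothetical helper)."""
    conditions = next(iter(table.values())).keys()
    return {c: min(table, key=lambda model: table[model][c]) for c in conditions}

best_5min = best_by_rmse(rmse_5min)
for condition, model in best_5min.items():
    print(f"{condition}: {model}")
```

For the 5 min data this picks GradientBoosting for the overall, clear-sky, and partly cloudy cases and Persistence for cloudy skies, matching the first row of the table. The margins between the boosting models are small (often a few hundredths in RMSE), so a ranking by MAE or sMAPE could differ.
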
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Al-Hilfi, H.A.H.; Shahnia, F.; Celtek, S.A.; Yazdani, A.; Wang, H. Benchmarking Tree-Based Artificial Intelligence Models for Multi-Resolution Solar Irradiance Forecasting Across Various Sky Conditions in Arid Climates. Energies 2026, 19, 3065. https://doi.org/10.3390/en19133065

