A Hybrid Ensemble Learning Framework for Accurate Photovoltaic Power Prediction
Abstract
1. Introduction
2. Literature Review
1. Many models are trained on site-specific or small datasets, undermining generalizability.
2. The temporal structure of solar data remains underutilized.
3. Hybrid ensemble models remain underexplored.
3. Proposed Methodology
3.1. Data Preparation
3.1.1. Dataset Collection
3.1.2. Data Visualization
3.1.3. Data Preprocessing
3.1.4. Model Hyperparameters and Implementation Details
3.1.5. Tree-Based Models and Ensemble Strategy
3.1.6. Feature Selection
3.1.7. Data Splitting
3.2. Principle of ML Algorithm
3.2.1. Linear Regression (LR)
3.2.2. Random Forest
3.2.3. Support Vector Regression
3.2.4. K-Nearest Neighbors (KNN)
3.2.5. Convolutional Neural Network (CNN)
3.2.6. Extreme Gradient Boosting (XGBoost)
3.2.7. Gradient Boosting
3.2.8. Light Gradient Boosting Machine (LightGBM)
3.2.9. CatBoost
3.2.10. Lasso Regression
3.2.11. Ensemble Method
4. Performance Analysis
Experimental Setup
5. Results and Discussion
5.1. Station-Wise Performance Evaluation
5.2. Robustness Analysis Under Time-Aware, Rolling-Origin, and Station-Wise Evaluations
5.3. Comparative Description of All Models
5.4. Model Performance Comparison with Other Studies
5.5. Model Interpretability Using SHAP
6. Limitations and Future Directions
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Wang, Y.; Wang, R.; Tanaka, K.; Ciais, P.; Penuelas, J.; Balkanski, Y.; Sardans, J.; Hauglustaine, D.; Liu, W.; Xing, X.; et al. Accelerating the energy transition towards photovoltaic and wind in China. Nature 2023, 619, 761–767.
2. Sepúlveda-Oviedo, E.H. Impact of environmental factors on photovoltaic system performance degradation. Energy Strategy Rev. 2025, 59, 101682.
3. Cui, S.; Lyu, S.; Ma, Y.; Wang, K. Improved informer PV power short-term prediction model based on weather typing and AHA-VMD-MPE. Energy 2024, 307, 132766, Correction in Energy 2024, 310, 133290.
4. Aguilar, D.; Quinones, J.J.; Pineda, L.R.; Ostanek, J.; Castillo, L. Optimal scheduling of renewable energy microgrids: A robust multi-objective approach with machine learning-based probabilistic forecasting. Appl. Energy 2024, 369, 123548.
5. Yan, J.; Möhrlen, C.; Göçmen, T.; Kelly, M.; Wessel, A.; Giebel, G. Uncovering wind power forecasting uncertainty sources and their propagation through the whole modelling chain. Renew. Sustain. Energy Rev. 2022, 165, 112519.
6. Kim, J.M. Integrating Copula-Based Random Forest and Deep Learning Approaches for Analyzing Heterogeneous Treatment Effects in Survival Analysis. Mathematics 2025, 13, 1659.
7. Cha, Y.J.; Ali, R.; Lewis, J.; Büyüköztürk, O. Deep learning-based structural health monitoring. Autom. Constr. 2024, 161, 105328.
8. Khouili, O.; Hanine, M.; Louzazni, M.; Flores, M.A.L.; Villena, E.G.; Ashraf, I. Evaluating the impact of deep learning approaches on solar and photovoltaic power forecasting: A systematic review. Energy Strategy Rev. 2025, 59, 101735.
9. Liu, Y.; Li, L.; Zhou, S. Ensemble Forecasting Frame Based on Deep Learning and Multi-Objective Optimization for Planning Solar Energy Management: A Case Study. Front. Energy Res. 2021, 9, 764635.
10. Zhou, Y.; Li, Y.; Wang, D.; Liu, Y. A multi-step ahead global solar radiation prediction method using an attention-based transformer model with an interpretable mechanism. Int. J. Hydrogen Energy 2023, 48, 15317–15330.
11. Hamad, S.A.; Ghalib, M.A.; Munshi, A.; Alotaibi, M.; Ebied, M.A. Evaluating machine learning models comprehensively for predicting maximum power from photovoltaic systems. Sci. Rep. 2025, 15, 10750.
12. Yao, T.; Wang, J.; Wu, H.; Zhang, P.; Li, S.; Wang, Y.; Chi, X.; Shi, M. A photovoltaic power output dataset: Multi-source photovoltaic power output dataset with Python toolkit. Sol. Energy 2021, 230, 122–130.
13. Tripathi, A.K.; Aruna, M.; Elumalai, P.; Karthik, K.; Khan, S.A.; Asif, M.; Rao, K.S. Advancing solar PV panel power prediction: A comparative machine learning approach in fluctuating environmental conditions. Case Stud. Therm. Eng. 2024, 59, 104459.
14. Van Tai, D. Solar photovoltaic power output forecasting using machine learning technique. J. Phys. Conf. Ser. 2019, 1327, 012051.
15. Varughese, R.A.; Karpagam, R. Prediction of Solar Power Generation Based on Machine Learning Algorithm. In Proceedings of the 2022 3rd International Conference on Intelligent Computing, Instrumentation and Control Technologies: Computational Intelligence for Smart Systems, ICICICT 2022, Kannur, India, 11–12 August 2022; pp. 396–400.
16. Li, Z. Extracting spatial effects from machine learning model using local interpretation method: An example of SHAP and XGBoost. Comput. Environ. Urban Syst. 2022, 96, 101845.
17. Chaaban, A.K.; Alfadl, N. A comparative study of machine learning approaches for an accurate predictive modeling of solar energy generation. Energy Rep. 2024, 12, 1293–1302.
18. Bashir, A.; Ahmad, E.; Dulawat, S.; Abba, S.I. Harnessing synergy of machine learning and nature-inspired optimization for enhanced compressive strength prediction in concrete. Hybrid Adv. 2025, 9, 100404.
19. Thanh, H.V.; Dai, Z.; Rahimi, M. Data-driven explainable machine learning approaches for predicting hydrogen adsorption in porous crystalline materials. J. Alloys Compd. 2025, 1028, 180709.
20. Sekar, S.; Kaviya, V.; Vivekanandan, G.; Nithya, P. Solar Panel Power Prediction Using Machine Learning Technique. In Proceedings of the 2023 Intelligent Computing and Control for Engineering and Business Systems, ICCEBS 2023, Chennai, India, 14–15 December 2023.
21. Subeshan, B.; Atayo, A.; Asmatulu, E. Machine learning applications for electrospun nanofibers: A review. J. Mater. Sci. 2024, 59, 14095–14140.
22. Subramanian, E.; Karthik, M.M.; Krishna, G.P.; Prasath, D.V.; Kumar, V.S. Solar Power Prediction Using Machine Learning. 2023. Available online: https://arxiv.org/pdf/2303.07875 (accessed on 29 June 2025).
23. Suanpang, P.; Jamjuntr, P. Machine Learning Models for Solar Power Generation Forecasting in Microgrid Application Implications for Smart Cities. Sustainability 2024, 16, 6087.
24. Ali, M.; Rabehi, A.; Souahlia, A.; Guermoui, M.; Teta, A.; Tibermacine, I.E.; Rabehi, A.; Benghanem, M.; Agajie, T.F. Enhancing PV power forecasting through feature selection and artificial neural networks: A case study. Sci. Rep. 2025, 15, 22574.
25. Jiang, T.; Gao, Y.; Liu, G.; Zhao, G.; Hu, J.; Sa, R.; Gao, P. Dipole-Moment-Knowledge-Guided Molecular Design for Perovskite Surface Passivation: A Gemma-Language-Model and DFT-Driven Framework. Adv. Theory Simul. 2025, 8, e01318.
26. Idogho, C.; Abah, E.O.; Onuhc, J.O.; Harsito, C.; Omenkaf, K.; Samuel, A.; Ejila, A.; Idoko, I.P.; Ali, U.E. Machine Learning-Based Solar Photovoltaic Power Forecasting for Nigerian Regions. Energy Sci. Eng. 2025, 13, 1922–1934.
27. Atiea, M.A.; Abdelghaffar, A.A.; Ben Aribia, H.; Noureddine, F.; Shaheen, A.M. Photovoltaic power generation forecasting with Bayesian optimization and stacked ensemble learning. Results Eng. 2025, 26, 104950.
28. Li, X.; Ma, L.; Chen, P.; Xu, H.; Xing, Q.; Yan, J.; Lu, S.; Fan, H.; Yang, L.; Cheng, Y. Probabilistic solar irradiance forecasting based on XGBoost. Energy Rep. 2022, 8, 1087–1095.
29. Kumar, M.; Borgohain, S.; Panda, K.P.; Thokchom, S.; Panda, G. Smart Solar Forecasting: Machine Learning Approaches for Predicting Solar Power. In Proceedings of the IEEE Region 10 Annual International Conference, Proceedings/TENCON, Singapore, 1–4 December 2024; pp. 1344–1349.
30. Kareem, A.K.; Nafea, A.A.; Mansour, M.T.; Jamil, N.N. Accurate Photovoltaic Power Prediction Using Machine Learning and Deep Learning: A Comparative Study Across Multiple Locations. ITEGAM-J. Eng. Technol. Ind. Appl. 2025, 11, 87–98.
31. Yao, T.; Wang, J.; Wu, H.; Zhang, P.; Li, S.; Wang, Y.; Chi, X.; Shi, M. PVOD v1.0: A Photovoltaic Power Output Dataset. Science Data Bank Datasets. Available online: https://www.scidb.cn/en/detail?dataSetId=f8f3d7af144f441795c5781497e56b62 (accessed on 12 July 2025).
32. Sahin, G.; Isik, G.; van Sark, W.G. Predictive modeling of PV solar power plant efficiency considering weather conditions: A comparative analysis of artificial neural networks and multiple linear regression. Energy Rep. 2023, 10, 2837–2849.
33. Jogunuri, S.; Josh, F.T.; Stonier, A.A.; Peter, G.; Jayaraj, J.; Jaganathan, S.; Joseph, J.J.; Ganji, V. Random forest machine learning algorithm based seasonal multi-step ahead short-term solar photovoltaic power output forecasting. IET Renew. Power Gener. 2024, 19, e12921.
34. Zulkafli, N.A.; Bundak, C.E.A.; Rahman, M.A.A.; Yap, C.C.; Chong, K.-K.; Tan, S.T. Prediction of device performance in SnO2 based inverted organic solar cells using Machine learning framework. Sol. Energy 2024, 278, 112795.
35. Du, K.-L.; Jiang, B.; Lu, J.; Hua, J.; Swamy, M.N.S. Exploring Kernel Machines and Support Vector Machines: Principles, Techniques, and Future Directions. Mathematics 2024, 12, 3935.
36. Gadi, S.N.; Aly, H.H.; Cada, M. A proposed hybrid model of ANN and KNN for solar cell defects detection and temperature prediction using fuzzy image segmentation. Heliyon 2024, 10, e31774.
37. Zang, H.; Chen, D.; Liu, J.; Cheng, L.; Sun, G.; Wei, Z. Improving ultra-short-term photovoltaic power forecasting using a novel sky-image-based framework considering spatial-temporal feature interaction. Energy 2024, 293, 130538.
38. Qiu, R.; Liu, C.; Cui, N.; Gao, Y.; Li, L.; Wu, Z.; Jiang, S.; Hu, M. Generalized Extreme Gradient Boosting model for predicting daily global solar radiation for locations without historical data. Energy Convers. Manag. 2022, 258, 115488.
39. Krishnan, N.; Kumar, K.R.; Anirudh, R.S. Solar radiation forecasting using gradient boosting based ensemble learning model for various climatic zones. Sustain. Energy Grids Netw. 2024, 38, 101312.
40. Aksoy, N.; Genc, I. Predictive models development using gradient boosting based methods for solar power plants. J. Comput. Sci. 2023, 67, 101958.
41. Banik, R.; Biswas, A. Improving Solar PV Prediction Performance with RF-CatBoost Ensemble: A Robust and Complementary Approach. Renew. Energy Focus 2023, 46, 207–221.
42. Farhat, M.; Dekhane, A.; Djellad, A.; Takruri, M.; Al-Qaisi, A.; Barambones, O. Optimizing photovoltaic performance: Data-driven maximum power point prediction via advanced regression models. Results Control Optim. 2025, 20, 100586.
43. Anuradha, K.; Erlapally, D.; Karuna, G.; Srilakshmi, V.; Adilakshmi, K. Analysis of Solar Power Generation Forecasting Using Machine Learning Techniques. E3S Web Conf. 2021, 309, 01163.
44. Nguyen, H.N.; Tran, Q.T.; Ngo, C.T.; Nguyen, D.D.; Tran, V.Q. Solar energy prediction through machine learning models: A comparative analysis of regressor algorithms. PLoS ONE 2025, 20, e0315955.
45. Balal, A.; Jafarabadi, Y.P.; Demir, A.; Igene, M.; Giesselmann, M.; Bayne, S. Forecasting Solar Power Generation Utilizing Machine Learning Models in Lubbock. Emerg. Sci. J. 2023, 7, 1052–1062.

| Ref. | Year of Publication | Model/Architecture Type Used | Findings | Limitations |
|---|---|---|---|---|
| [15] | 2020 | XGBoost, Random Forest, Linear Regression | XGBoost demonstrated highest R² and lowest MAE in predicting solar power output. | Focused only on historical weather and irradiance data; external factors like shading ignored. |
| [17] | 2021 | KNN, ANN, SVR, Random Forest | Random Forest and SVR offered higher accuracy than KNN and ANN for solar power prediction. | Models not tuned extensively; limited evaluation metrics. |
| [20] | 2022 | ANN, SVM, XGBoost | XGBoost achieved highest accuracy; ANN showed good nonlinear mapping. | Hyperparameter tuning details limited; small dataset size. |
| [22] | 2023 | Linear Regression, Decision Tree, Random Forest, Gradient Boosting | Random Forest and Gradient Boosting performed best among traditional models. | Limited comparison to more advanced deep learning models. |
| [23] | 2024 | LGBM, KNN | LGBM outperformed KNN in accuracy (R² = 0.84 vs. 0.77), RMSE (5.77 vs. 6.93), and MAE (3.93 vs. 4.34), but with higher memory and training time. | LGBM requires longer training time and more memory; only two models compared. |
| [28] | 2024 | XGBoost | XGBoost showed promising performance in daily radiation prediction, validated across multiple stations. | XGBoost may be sensitive to data distribution shifts; limited ablation studies. |
| [29] | 2024 | Multiple Regression, Decision Tree, Random Forest | Random Forest yielded best results in terms of R² and MAE. | No ensemble or hybrid deep models included for comparison. |
| [30] | 2025 | ML + DL models (SVR, RF, GBR, ANN, 1D-CNN) | Demonstrates that 1D-CNN and tree-based ensembles outperform linear and single-tree models, emphasizing the benefit of nonlinear architectures. | Limited to two sites; does not implement explicit hybrid or weighted ensembles across models. |
| [25] | 2025 | Feature-selection-enhanced ensemble | Feature selection improves generalization and reduces model complexity, while ensemble learners provide competitive accuracy across varying conditions. | Focuses on feature-selection effects; does not explore multi-station open benchmarks or temporal feature engineering in depth. |
| [25] | 2025 | Feature selection + ensemble ML | Uses feature selection with ensemble learners to improve forecasting accuracy and reduce redundancy. | Single-site, proprietary data; no multi-station evaluation and ensemble design is less explicitly specified. |
| [26] | 2025 | ML models (tree-based, kernel methods) | Compares various ML models for PV forecasting across distinct climates, showing benefits of tree-based and kernel methods. | Region-specific, non-open data; no standardized multi-station benchmark such as PVOD v1.0 and no explicit hybrid weighted ensemble. |

| Station File | Mean Power (kW) | Max Power (kW) | Std Dev (kW) | Count |
|---|---|---|---|---|
| station00.csv | 0.83 | 5.52 | 1.28 | 28,896 |
| station01.csv | 3.68 | 20.00 | 5.55 | 33,408 |
| station02.csv | 2.58 | 16.05 | 4.00 | 30,432 |
| station03.csv | 3.37 | 17.42 | 4.97 | 14,688 |
| station04.csv | 4.53 | 26.77 | 6.84 | 33,408 |
| station05.csv | 7.06 | 35.12 | 9.75 | 9,696 |
| station06.csv | 1.69 | 11.74 | 2.66 | 31,104 |
| station07.csv | 2.62 | 17.28 | 4.11 | 32,928 |
| station08.csv | 2.88 | 17.87 | 4.45 | 33,120 |
| station09.csv | 1.35 | 12.04 | 2.19 | 24,288 |
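
The per-station columns above (mean, maximum, standard deviation, and record count of the power series) can be reproduced with a few lines of standard-library Python. The following is a minimal sketch, not the authors' code: the function name `summarize_power` and the toy readings are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the source does not state which estimator was used.

```python
import statistics


def summarize_power(power_kw):
    """Return the summary columns of the station table for one power series (kW)."""
    return {
        "mean_kw": round(statistics.fmean(power_kw), 2),
        "max_kw": round(max(power_kw), 2),
        # Assumption: sample standard deviation (n - 1); the source does not say which.
        "std_dev_kw": round(statistics.stdev(power_kw), 2),
        "count": len(power_kw),
    }


# Hypothetical toy readings in kW; these are NOT values from the PVOD dataset.
toy_readings = [0.0, 0.0, 1.5, 4.0, 2.5]
summary = summarize_power(toy_readings)
print(summary)
```

In practice each station file would be loaded into a list (or a DataFrame column) of power readings before being passed to the function. The large standard deviations relative to the means in the table are consistent with series that contain many near-zero night-time readings.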

| Model | Key Hyperparameters |
|---|---|
| Random Forest | 300 trees, bootstrap enabled |
| XGBoost | 400 trees, learning rate = 0.05, max depth = 6, subsample = 0.8 |
| CatBoost | 400 iterations, depth = 6, learning rate = 0.05, loss = RMSE |
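
One way to express these settings in code is a plain configuration dictionary that is later unpacked into each library's regressor constructor. This is a sketch under stated assumptions rather than the authors' implementation: the dictionary name `HYPERPARAMS`, the helper `build_models`, and the mapping of "trees" to `n_estimators` are illustrative choices, and `verbose=0` is added only to silence CatBoost's training log. The imports are deferred so that the configuration can be inspected even when the three libraries are not installed.

```python
# Values copied from the hyperparameter table; keyword names follow the
# scikit-learn, XGBoost, and CatBoost regressor constructors.
HYPERPARAMS = {
    "random_forest": {"n_estimators": 300, "bootstrap": True},
    "xgboost": {
        "n_estimators": 400,
        "learning_rate": 0.05,
        "max_depth": 6,
        "subsample": 0.8,
    },
    "catboost": {
        "iterations": 400,
        "depth": 6,
        "learning_rate": 0.05,
        "loss_function": "RMSE",
    },
}


def build_models(params=HYPERPARAMS):
    """Instantiate the three tree-based regressors (hypothetical helper)."""
    # Deferred imports: only needed when the models are actually built.
    from sklearn.ensemble import RandomForestRegressor
    from xgboost import XGBRegressor
    from catboost import CatBoostRegressor

    return {
        "random_forest": RandomForestRegressor(**params["random_forest"]),
        "xgboost": XGBRegressor(**params["xgboost"]),
        # verbose=0 is not in the source table; it only suppresses log output.
        "catboost": CatBoostRegressor(**params["catboost"], verbose=0),
    }
```

Calling `build_models()` returns unfitted estimators; any setting not listed in the table stays at the library default, which may differ between library versions.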

| Station | MAE (kW) | RMSE (kW) | R² |
|---|---|---|---|
| station00 | 0.153 | 0.316 | 0.974 |
| station01 | 0.498 | 1.105 | 0.977 |
| station02 | 0.326 | 0.532 | 0.973 |
| station03 | 0.305 | 0.471 | 0.985 |
| station04 | 1.105 | 1.177 | 0.983 |
| station05 | 0.330 | 0.821 | 0.989 |
| station06 | 0.304 | 0.478 | 0.971 |
| station07 | 0.442 | 1.089 | 0.972 |
| station08 | 0.254 | 0.438 | 0.988 |
| station09 | 0.875 | 2.199 | 0.483 |
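
For reference, the three metrics reported per station follow the standard definitions: MAE is the mean absolute error, RMSE is the square root of the mean squared error, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes them in plain Python; the function name `regression_metrics` and the toy values are hypothetical and are not taken from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and R² for two equal-length sequences (kW)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical toy values in kW, for illustration only.
y_true = [0.0, 2.0, 4.0, 6.0]
y_pred = [0.5, 2.0, 3.5, 6.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Because R² is normalized by the variance of the observed series, a station can show a moderate absolute error yet a low R² when its target variance is small or when the errors are concentrated in a few periods, so the three columns are best read together.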

| Model | R² (%) | MAE (kW) | RMSE (kW) |
|---|---|---|---|
| Linear Regression | 83.01 | 1.310 | 2.009 |
| Lasso Regression | 71.73 | 1.552 | 2.591 |
| Random Forest | 98.15 | 0.228 | 0.663 |
| Gradient Boosting | 95.83 | 0.485 | 0.995 |
| Support Vector Regression | 97.45 | 0.285 | 0.778 |
| KNN | 97.19 | 0.336 | 0.817 |
| XGBoost | 98.02 | 0.276 | 0.685 |
| LightGBM | 97.57 | 0.306 | 0.759 |
| CatBoost | 98.09 | 0.271 | 0.673 |
| HybrEnNet | 99.30 | 0.227 | 0.629 |
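
The excerpt does not state how HybrEnNet combines its base learners, so the following is only an illustrative sketch of one common weighted-ensemble scheme: each model's weight is proportional to the inverse of its validation RMSE, and the ensemble prediction is the weighted average. The function names, the inverse-RMSE rule, and the toy predictions are all assumptions. The RMSE values are copied from the comparison table purely as stand-ins; in a real pipeline the weights should come from a held-out validation split, not from test results.

```python
def inverse_rmse_weights(val_rmse):
    """Normalize 1/RMSE scores so the weights sum to 1 (assumed scheme)."""
    inverse = {name: 1.0 / rmse for name, rmse in val_rmse.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def weighted_ensemble(predictions, weights):
    """Combine per-model prediction lists into one weighted-average list."""
    names = list(predictions)
    n_samples = len(predictions[names[0]])
    return [
        sum(weights[name] * predictions[name][i] for name in names)
        for i in range(n_samples)
    ]


# Stand-in RMSE values (kW) from the comparison table; illustration only.
val_rmse = {"random_forest": 0.663, "xgboost": 0.685, "catboost": 0.673}
weights = inverse_rmse_weights(val_rmse)

# Hypothetical base-model predictions (kW) for two samples.
preds = {
    "random_forest": [1.0, 2.0],
    "xgboost": [1.2, 2.2],
    "catboost": [0.8, 1.8],
}
combined = weighted_ensemble(preds, weights)
print(weights, combined)
```

With base learners whose errors are this close, inverse-RMSE weights are nearly uniform, so any gain from the ensemble would come mainly from averaging out partly uncorrelated errors rather than from the weighting itself.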
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Ali, W.; Akhtar, F.; Ullah, A.; Kim, W.Y. A Hybrid Ensemble Learning Framework for Accurate Photovoltaic Power Prediction. Energies 2026, 19, 453. https://doi.org/10.3390/en19020453

