Short-Term Forecasts of Energy Generation in a Solar Power Plant Using Various Machine Learning Models, along with Ensemble and Hybrid Methods
Abstract
1. Introduction
2. Data and Forecasting Methods
2.1. Data
2.1.1. Preprocessing of “Raw” PV Generation of Electrical Energy
2.1.2. Statistical Analysis of Time Series of PV Generation of Electrical Energy and Meteorological Measurement Data (15 min Periods)
2.1.3. Development of Input Variable SETS for Forecasting Models Including Feature Engineering
- The most important input variables are recent lagged values of electricity generation and solar irradiance; they exhibit an almost linear relationship with the output variable EG(T).
- The relationship between EG(T) and its one-day lag, EG(T-96), is nonlinear.
- Air temperature and wind speed show relatively weak correlation with EG(T), and their relationship with it is nonlinear.
- Input variables with a nonlinear relationship to the output call for models capable of capturing nonlinearity. For simple models with a limited number of inputs that are linearly or nearly linearly related to the output, predictive models that capture linear dependencies, such as ARIMA or Multiple Linear Regression, are sufficient.
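The linear-versus-nonlinear distinction above can be screened with Pearson's R (the coefficient listed in the Abbreviations): a lagged generation input should score close to 1, while a weakly and nonlinearly related input such as air temperature should score near 0. The sketch below uses synthetic stand-in series, not the plant's measurements:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the series discussed above (hypothetical data):
# EG(T) depends almost linearly on EG(T-1) and only weakly, nonlinearly,
# on air temperature.
eg_lag1 = rng.uniform(0.0, 1.0, 1000)            # EG(T-1), per-unit generation
air_temp = rng.uniform(-5.0, 30.0, 1000)         # AT(T-1) in deg C
eg_t = 0.9 * eg_lag1 + 0.05 * np.sin(air_temp / 5.0) + rng.normal(0, 0.01, 1000)

def pearson_r(x, y):
    """Pearson linear correlation coefficient R."""
    return np.corrcoef(x, y)[0, 1]

r_lag = pearson_r(eg_lag1, eg_t)    # near-linear input -> |R| close to 1
r_temp = pearson_r(air_temp, eg_t)  # weak, nonlinear input -> |R| close to 0
```

An input with |R| near 1 is a good candidate for MLR or ARIMA; a low |R| does not rule the variable out, since a nonlinear model (MLP, RF, etc.) may still exploit it.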
2.1.4. Dataset Division
2.2. Forecasting Methods
3. Results
- The best results, as measured by the nRMSE error metric, were achieved by heterogeneous ensemble models. The hybrid model was slightly worse than the two homogeneous ensemble models (RF and XGBOOST), although the quality differences between these models are minimal.
- The nonlinear single models KNN and SVR perform significantly worse than the ensemble models.
- Particular attention should be given to the very poor performance of the single SVR model, which is even worse than the linear single models (MLR and ARIMA).
- The linear single models (MLR and ARIMA) are distinctly inferior in quality to the nonlinear models.
- Notably, the nonlinear single models MLP and LSTM offer quality comparable to that of both homogeneous and heterogeneous ensemble models while being significantly less complex to build.
4. Discussion
- The “Weighted averaging ensemble based on different methods” model is marginally better than the “Averaging ensemble based on different methods” model; the difference is extremely small.
- The best “Ensemble Averaging Without Extremes” model produces forecasts worse than both the best “Weighted averaging ensemble based on different methods” model and the best “Averaging ensemble based on different methods” model.
- All analyzed heterogeneous ensemble models achieved a smaller nRMSE error than the best single models and the best homogeneous ensemble model (RF).
- The smallest nMAE of all analyzed models was achieved by the “Hybrid with two models connected in series” (XGBOOST->RF).
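The three averaging schemes compared above differ only in how the base predictors' forecasts are combined. A minimal sketch, with illustrative forecast values rather than the paper's data:

```python
import numpy as np

# Forecasts from three hypothetical base predictors for five 15-min periods
# (columns ordered as RF, XGBOOST, MLP; values purely illustrative).
preds = np.array([
    [0.50, 0.52, 0.48],
    [0.30, 0.28, 0.35],
    [0.70, 0.71, 0.69],
    [0.10, 0.12, 0.20],
    [0.90, 0.88, 0.91],
])

# AVE: plain averaging over predictors.
ave = preds.mean(axis=1)

# AVE_W: weighted averaging; in practice the weights would be tuned on
# validation data (these weights are an assumption).
weights = np.array([0.4, 0.35, 0.25])
ave_w = preds @ weights

# AVE_OUT_EXT: drop the minimum and maximum forecast in each period and
# average the rest (with three predictors this keeps only the median).
trimmed = np.sort(preds, axis=1)[:, 1:-1]
ave_out_ext = trimmed.mean(axis=1)
```

With well-chosen weights, AVE_W can only match or improve on AVE, which is consistent with the marginal gain reported above.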
5. Conclusions
- The preferred model when considering nRMSE error as the most critical quality criterion is AVE_W(RF, XGBOOST, MLP). This model shows an improvement of 11.370% over the naive method and uses SET1 (the most extensive set) and SET3 as input data.
- The preferred model when considering nMAE error as the most critical quality criterion is the XGBOOST->RF HYBRID. This model shows an improvement of 3.901% over the naive method and uses SET1 and SET3 as input data.
- The preferred model when both nRMSE and nMAE errors are considered the most critical quality criteria is INT_MLP(RF, XGBOOST, MLP). This is the most complex of all the models and requires the most time to build and to fine-tune its hyperparameters. The improvements over the naive method are 10.990% and 2.411%, respectively, using SET1 and SET3 as input data.
- The preferred model when considering nRMSE error as the most critical quality criterion is the RF model. This model shows an improvement of 10.562% over the naive method and uses SET1 as input data.
- The preferred model when considering nMAE error as the most critical quality criterion is the KNN model. This model shows an improvement of 1.439% over the naive method and uses SET4 as input data.
- The preferred model when both nRMSE and nMAE errors are considered as the most critical quality criteria is the XGBOOST model. The improvements over the naive method are 10.018% and 0.211%, respectively, using SET3 as input data.
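The percentage improvements quoted in these conclusions follow directly from the error metrics in the results table. A short sketch of the calculation (the normalization by installed capacity in nRMSE/nMAE is a common convention and an assumption here, since the metric values below are taken ready-made from the table):

```python
import numpy as np

def nrmse(actual, forecast, capacity=1.0):
    """Normalized root mean squared error."""
    return np.sqrt(np.mean((actual - forecast) ** 2)) / capacity

def nmae(actual, forecast, capacity=1.0):
    """Normalized mean absolute error."""
    return np.mean(np.abs(actual - forecast)) / capacity

def improvement_over_naive(err_model, err_naive):
    """Percentage improvement of a model's error relative to the naive model."""
    return 100.0 * (err_naive - err_model) / err_naive

# Cross-check against the table: AVE_W(RF, XGBOOST, MLP) vs NAIVE on nRMSE.
imp = improvement_over_naive(0.0210397, 0.0237388)  # ~11.370%
```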
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
ACF | Autocorrelation function |
ANN | Artificial neural network |
ARIMA | Autoregressive Integrated Moving Average |
CNN | Convolutional Neural Network |
E | Error |
GCN | Graph Convolutional Network |
GGRU | Graph Gated Recurrent Unit |
KNN | K-Nearest Neighbors |
LR | Linear regression |
LSTM | Long Short-Term Memory |
MLP | Multilayer Perceptron |
MLR | Multiple Linear Regression |
nAPEmax | Maximum Normalized Absolute Percentage Error |
NGBoost | Natural Gradient Boosting |
nMAE | Normalized Mean Absolute Error |
nMBE | Normalized Mean Bias Error |
nRMSE | Normalized Root Mean Squared Error |
NWP | Numerical Weather Prediction |
PSO | Particle Swarm Optimization |
PV | Photovoltaics |
R | Pearson linear correlation coefficient |
RES | Renewable Energy Source |
RF | Random Forest |
RNN | Recurrent Neural Network |
SHAP | Shapley Additive Explanation |
SVR | Support Vector Regression |
WNN | Wavelet Neural Network |
XGBOOST | eXtreme Gradient Boost Decision Tree |
Output data description | Code |
---|---|
Generation in period T [p.u.] | EG(T) |

Input data description | Code |
---|---|
Month | Month |
Hour | Hour |
Rising solar irradiance | R_SI |
Declining solar irradiance | D_SI |
Smoothed generation in period T-1 [p.u.] | SEG(T-1) |
Generation in period T-n, n = 1, 2…6, 96, 192 [p.u.] | EG(T-n) |
Solar irradiance T-n, n = 1, 2…6, 96, 192 [W/m²] | SI(T-n) |
Air temperature T-1 [°C] | AT(T-1) |
Air temperature T-2 [°C] | AT(T-2) |
Wind speed T-1 [m/s] | WS(T-1) |
Code of Set | Codes of Input Data and Additional Comments |
---|---|
SET 1 (24 inputs) | Month, Hour, R_SI, D_SI, SEG(T-1), EG(T-1), EG(T-2), EG(T-3), EG(T-4), EG(T-5), EG(T-6), EG(T-96), EG(T-192), SI(T-1), SI(T-2), SI(T-3), SI(T-4), SI(T-5), SI(T-6), SI(T-96), SI(T-192), AT(T-1), AT(T-2), WS(T-1). All available/created input data including endogenous variables, exogenous variables, seasonality markers, daily variability markers, and process trend markers (increasing/decreasing) |
SET 2 (12 inputs) | SEG(T-1), EG(T-1), EG(T-2), EG(T-3), EG(T-4), EG(T-5), EG(T-6), EG(T-96), SI(T-1), SI(T-2), SI(T-3), SI(T-4)—12 highest ranked input data (from 24 input data) based on the final balancing ranking of the importance of input data |
SET 3 (13 inputs) | Month, Hour, R_SI, D_SI, SEG(T-1), EG(T-1), EG(T-2), EG(T-3), EG(T-4), EG(T-5), EG(T-6), EG(T-96), EG(T-192). Only endogenous variables and seasonality markers, daily variability markers, and process trend markers (increasing/decreasing) |
SET 4 (9 inputs) | SEG(T-1), EG(T-1), EG(T-2), EG(T-3), EG(T-4), EG(T-5), EG(T-6), EG(T-96), EG(T-192). Only endogenous variables, without markers |
SET 5 (1 input) | EG(T-1) |
SET 6 (p inputs) | EG(T-1), …, EG(T-p); for ARIMA models, p is tested from 2 to 8 |
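The lagged inputs EG(T-n) in the sets above are built by shifting the 15-min generation series. A sketch of a SET4-style input matrix (minus the smoothed term SEG(T-1)), using a hypothetical series; the helper name is illustrative:

```python
import numpy as np

def build_lagged_inputs(eg, lags):
    """Build an input matrix of lagged values EG(T-n) for the given lags,
    with the target EG(T) aligned to each row."""
    max_lag = max(lags)
    rows = [[eg[t - n] for n in lags] for t in range(max_lag, len(eg))]
    X = np.array(rows)
    y = np.array(eg[max_lag:])
    return X, y

# Hypothetical 15-min generation series; lags 1..6 plus the one- and two-day
# lags (96 and 192 steps at 15-min resolution) mirror SET4 above.
series = np.sin(np.linspace(0, 40, 500)) ** 2
X, y = build_lagged_inputs(series, lags=[1, 2, 3, 4, 5, 6, 96, 192])
```

Each row of `X` holds the lagged values available at forecast time T, and the matching entry of `y` is the target EG(T); the first two days of the series are consumed by the longest lag.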
Model Name/ Model Code | Type | Tested Sets of Input Data | Hyperparameters/Parameters Tuned |
---|---|---|---|
Persistent model/ NAIVE | Linear | SET5 | - |
Autoregressive Integrated Moving Average/ ARIMA | Linear | SET6 | p—autoregressive order; d—differencing order; q—moving average order; c—constant term (total number of tested model variants: 32) |
Multiple Linear Regression/ MLR | Linear | SET1…SET4 | β0—constant term; β1, β2, …, βn—coefficients (total number of tested model variants: 8) |
Multilayer Perceptron/ MLP | Nonlinear | SET1…SET5 | Number of hidden layers and neurons in each layer, activation functions in each layer, optimizer type including optimizer-specific parameters, weight initialization type, number of epochs, batch size (total number of tested model variants: 498) |
Long Short-Term Memory/ LSTM | Nonlinear | SET1…SET5 | Number of hidden layers and neurons in each layer, activation functions in each layer, optimizer type including optimizer-specific parameters, weight initialization type, early stopping patience, batch size (total number of tested model variants: 240) |
Support Vector Regression/ SVR | Nonlinear | SET1…SET5 | Type of kernel; C—penalty term; ϵ—error tolerance (total number of tested model variants: 3400)
K-Nearest Neighbors/ KNN | Nonlinear | SET1…SET5 | K—number of neighbors, distance metric (total number of tested model variants: 450) |
Model Name/ Model Code | Operating Method | Ensemble Prediction Calculation | Hyperparameters Tuned |
---|---|---|---|
eXtreme Gradient Boost Decision Tree/ XGBOOST | Boosting | Averaging | The number of decision trees, maximal depth of each decision tree, learning rate (total number of tested model variants: 500) |
Random Forest/ RF | Bagging | Averaging | The number of randomly chosen input data for each decision tree individually, the number of decision trees, minimum number of samples in a node subject to splitting, the maximum number of levels, the maximum number of nodes, minimum samples per leaf (total number of tested model variants: 192) |
Model Name/ Model Code | Operating Method | Ensemble/Hybrid Prediction Calculation | Tested Sets of Predictors |
---|---|---|---|
Averaging ensemble based on different methods/ AVE (predictor 1, predictor 2, …, predictor n) | Combining different architectures | Averaging | (RF, XGBOOST, MLP), (RF, XGBOOST, MLP, LSTM), (RF, XGBOOST, LSTM), (RF, XGBOOST), (RF, MLP), (RF, LSTM) |
Weighted averaging ensemble based on different methods/ AVE_W (predictor 1, predictor 2, …, predictor n) | Combining different architectures | Weighted averaging | (RF, XGBOOST, MLP), (RF, XGBOOST, MLP, LSTM), (RF, XGBOOST, LSTM), (RF, XGBOOST), (RF, MLP), (RF, LSTM) |
Ensemble Averaging Without Extremes/ AVE_OUT_EXT (predictor 1, predictor 2, …, predictor n) | Combining different architectures | Averaging without extreme forecasts for each prediction | (RF, XGBOOST, MLP), (RF, XGBOOST, LSTM), (RF, XGBOOST, MLP, LSTM) |
Hybrid with MLP as “meta-model”/ INT_MLP (predictor 1, predictor 2, …, predictor n) | Stacking (stacked generalization) | Output from MLP “meta-model” | (RF, XGBOOST, MLP), (RF, XGBOOST), (RF, XGBOOST, LSTM, MLP), (RF, XGBOOST, LSTM) |
Hybrid with two models connected in series/ predictor 1->predictor 2 | Stacking (stacked generalization) | Output from predictor 2 “meta-model” | XGBOOST->RF, MLP->RF, LSTM->RF |
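The "two models connected in series" hybrid in the last row is a stacking arrangement: predictor 1 forecasts the target, then predictor 2 (the meta-model) is trained on the original inputs augmented with predictor 1's forecast. A minimal sketch with plain least-squares models standing in for XGBOOST and RF, on synthetic data, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 3))
y = X @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 0.01, 200)

# Stage 1: fit predictor 1 and obtain its in-sample forecasts.
w1, *_ = np.linalg.lstsq(X, y, rcond=None)
stage1_pred = X @ w1

# Stage 2: predictor 2 sees the original inputs plus stage 1's output,
# so it can learn to correct stage 1's systematic errors.
X_aug = np.column_stack([X, stage1_pred])
w2, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
hybrid_pred = X_aug @ w2
```

In practice the stage-1 forecasts fed to the meta-model should come from a held-out or cross-validated split rather than in-sample fitting, to avoid leaking training error into stage 2.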
Model Variant (Code) | Input Data Code | nRMSE (p.u.) | nMAE (p.u.) | nAPEmax (%) | nMBE (p.u.) |
---|---|---|---|---|---|
AVE_W(RF,XGBOOST,MLP) | SET1, SET3 | 0.0210397 | 0.009777 | 19.118 | −0.000612 |
AVE(RF,XGBOOST,MLP) | SET1, SET3 | 0.0210399 | 0.009776 | 19.120 | −0.000611 |
AVE_OUT_EXT(RF,XGBOOST,MLP) | SET1, SET3 | 0.0210681 | 0.009814 | 19.356 | −0.000520
INT_MLP(RF,XGBOOST,MLP) | SET1, SET3 | 0.0211300 | 0.009646 | 19.460 | −0.000175 |
RF | SET1 | 0.0212316 | 0.010263 | 18.800 | −0.000711 |
XGBOOST | SET3 | 0.0213606 | 0.009863 | 19.706 | −0.000524 |
XGBOOST->RF HYBRID | SET1, SET3 | 0.0213951 | 0.009498 | 20.034 | −0.000446 |
MLP | SET3 | 0.0214949 | 0.009953 | 19.356 | −0.000599 |
LSTM | SET3 | 0.0215378 | 0.010164 | 18.563 | −0.000436 |
KNN | SET4 | 0.0221539 | 0.009742 | 19.667 | −0.000670 |
MLR | SET3 | 0.0222672 | 0.011174 | 19.591 | −0.000427 |
ARIMA | SET6 | 0.0226611 | 0.011380 | 19.767 | −0.000870 |
SVR | SET3 | 0.0228177 | 0.012782 | 17.247 | −0.000903 |
NAIVE | SET5 | 0.0237388 | 0.009884 | 20.292 | 0.000347 |
Share and Cite
Piotrowski, P.; Kopyt, M. Short-Term Forecasts of Energy Generation in a Solar Power Plant Using Various Machine Learning Models, along with Ensemble and Hybrid Methods. Energies 2024, 17, 4234. https://doi.org/10.3390/en17174234