Forecasting of the Capacity Factor of a Photovoltaic System Using Artificial Intelligence and Machine Learning Modeling

Jbeily, Victoras; Moustris, Konstantinos; Spyropoulos, Georgios

doi:10.3390/eesp2025035031

Open Access Proceeding Paper

Forecasting of the Capacity Factor of a Photovoltaic System Using Artificial Intelligence and Machine Learning Modeling^†

by Victoras Jbeily ¹, Konstantinos Moustris ¹,* and Georgios Spyropoulos ¹,²

¹ Air Pollution Laboratory, Mechanical Engineering Department, University of West Attica, 250 Thivon and P. Ralli Str., GR-12244 Athens, Greece
² Soft Energy Applications & Environmental Protection Laboratory, University of West Attica, 250 Thivon and P. Ralli Str., GR-12244 Athens, Greece
* Author to whom correspondence should be addressed.
† Presented at the 17th International Conference on Meteorology, Climatology, and Atmospheric Physics—COMECAP 2025, Nicosia, Cyprus, 29 September–1 October 2025.

Environ. Earth Sci. Proc. 2025, 35(1), 31; https://doi.org/10.3390/eesp2025035031

Published: 16 September 2025


Abstract

Accurate forecasting of the Capacity Factor (CF) of Photovoltaic (PV) systems is vital for optimizing energy output, grid stability, and economic performance. This study applies Artificial Neural Network (ANN) modeling in the MATLAB environment, using seven years (2018–2024) of data from the Renewables.ninja open database, for Athens, Greece. Inputs include meteorological parameters, irradiance patterns, and system performance. The models are evaluated for prediction accuracy, computational efficiency, and adaptability. Results show that ANN modeling significantly improves CF forecasts, offering critical insights for energy planners and stakeholders, and supporting data-driven strategies in sustainable energy management and grid planning.

Keywords: capacity factor; photovoltaic system; artificial neural networks; modeling; forecasting; Greece

1. Introduction

The global shift towards renewable energy sources has underscored the critical role of photovoltaic (PV) systems in sustainable power generation. As the penetration of PV systems into the energy mix increases, accurately forecasting their performance becomes paramount for ensuring grid stability, optimizing energy output, and enhancing economic viability. Central to this forecasting is the Capacity Factor (CF), a metric that represents the ratio of actual energy produced by a system to its potential output over a specific period. Accurate CF predictions enable energy planners and stakeholders to make informed decisions regarding resource allocation, infrastructure development, and policy formulation.
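The definition above can be written compactly as follows. The symbols E_actual, P_rated and T are notation introduced here for illustration and do not appear in the original text.

```latex
\mathrm{CF} = \frac{E_{\text{actual}}}{P_{\text{rated}} \cdot T}
```

Here E_actual is the energy produced over the period (kWh), P_rated is the rated capacity of the system (kW) and T is the length of the period (h). At hourly resolution (T = 1 h), the CF is simply the mean power delivered during that hour divided by the rated capacity, so it lies between 0 and 1.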

Traditional methods for CF forecasting often rely on deterministic or statistical models that may not adequately capture the nonlinear and complex relationships between influencing factors such as meteorological parameters and system-specific features. In this context, Artificial Neural Networks (ANNs) have emerged as powerful alternatives due to their capacity to learn from historical data and model non-linear interactions among inputs and outputs [1,2].

Numerous studies have demonstrated the advantages of ANNs in solar energy forecasting tasks. In [3], a multi-parameter ANN approach was used to predict PV output power, achieving high levels of accuracy. Similarly, a review by Mahrouch et al. [2] summarized the performance of several ANN architectures—such as multilayer perceptron (MLP), recurrent neural networks (RNNs), and long short-term memory (LSTM)—in modeling photovoltaic behavior, concluding that neural networks significantly outperform traditional forecasting techniques. Moreover, ensemble learning and optimization strategies, such as genetic algorithms or dynamic structure refinement, have been successfully applied to improve model performance, as highlighted in [4,5].

In addition, recent advancements in deep learning have enabled the development of highly accurate forecasting models based on attention mechanisms and spatio-temporal relationships. These methods, explored in [6,7], offer promising avenues for multi-site or real-time prediction scenarios, particularly in environments with high meteorological variability.

Greece, and in particular Athens, presents a favorable setting for PV deployment due to its high solar irradiance levels. However, accurate forecasting in this region remains a challenge because of seasonal and daily variations in solar input and atmospheric conditions. The availability of high-resolution, long-term datasets such as those from Renewables.ninja provides a valuable opportunity for training and validating data-driven forecasting models tailored to specific geographic contexts.

This study proposes the use of ANNs implemented in MATLAB for forecasting the CF of PV systems in Athens, using a seven-year dataset (2018–2024). Key inputs include air temperature, solar irradiance, cloud cover fraction, and PV system characteristics. The developed models are evaluated based on predictive accuracy, computational efficiency, and adaptability to changing weather conditions. The outcomes provide essential insights for sustainable energy management, enhance operational planning for grid operators, and contribute to policy development aligned with Europe’s decarbonization goals.

2. Data and Methodology

The data for this study were obtained from Renewables.ninja (https://www.renewables.ninja/), an online platform that generates simulated hourly power output data for wind farms and photovoltaic (PV) systems worldwide. The study focuses on Athens, the capital of Greece, and covers the period from 2018 to 2024.

Given that this study aims to forecast the Capacity Factor (CF) of a PV system, key meteorological variables were incorporated, including air temperature (°C), ground-level solar irradiance (W/m²), and cloud cover fraction. PV system data were derived from the MERRA-2 dataset, using the following fixed parameters:

Capacity: 1 kW
System Loss Fraction: 0.1
Tracking: None
Tilt Angle: 35°
Azimuth Angle: 180°

The data extracted from Renewables.ninja were initially in CSV format and subsequently converted into Apple Numbers files for preprocessing. Individual yearly datasets were then merged and segmented into two datasets: a training set (2018–2022) and a testing set (2023–2024). To enable temporal feature extraction, a sliding window approach was applied. Specifically, each input instance included data from the three days preceding the prediction day (x = 3). The window shifted by one time step for each subsequent instance, progressively omitting the earliest entry in each window. This restructuring allowed the Artificial Neural Network (ANN) to learn temporal dependencies for effective next-day CF prediction.
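The sliding window described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' preprocessing. It assumes that one "time step" is one hour and, for brevity, uses a single hourly series, whereas the study also feeds meteorological variables. The function name make_windows and its parameters are hypothetical.

```python
def make_windows(series, history_days=3, hours_per_day=24, step=1):
    """Build (input, target) pairs for next-day forecasting.

    series       : list of hourly values in chronological order
    history_days : number of preceding days fed to the model (x = 3 in the text)
    step         : how far the window moves between instances (assumed: 1 hour)
    """
    n_in = history_days * hours_per_day   # 72 hourly inputs
    n_out = hours_per_day                 # 24 hourly targets (the next day)
    inputs, targets = [], []
    last_start = len(series) - n_in - n_out
    for start in range(0, last_start + 1, step):
        # Inputs: the three days before the prediction day.
        inputs.append(series[start:start + n_in])
        # Targets: the 24 hourly values that follow the input window.
        targets.append(series[start + n_in:start + n_in + n_out])
    return inputs, targets


# Usage with a synthetic 5-day series (120 hourly values).
demo = [float(h) for h in range(120)]
X, y = make_windows(demo)
print(len(X), len(X[0]), len(y[0]))  # 25 72 24
```

Each new instance drops the earliest entry of the previous window and appends one later entry, which is the behaviour described in the text.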

After preprocessing, the datasets were exported twice, first into TSV format and then into TXT format, with decimal commas replaced by decimal points to ensure compatibility with MATLAB (R2025a). The Neural Net Fitting App in MATLAB was employed to develop the forecasting model. Upon importing the training dataset, 70% of the data was used for training, while the remaining 30% was split equally between validation and testing (15% each). The developed artificial neural network is a multilayer perceptron (MLP) whose architecture was selected by trial and error. The best-performing architecture had one hidden layer with ten artificial neurons and was trained using the Levenberg–Marquardt backpropagation algorithm. A sigmoid function was chosen as the activation function.
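The network structure and data division described above can be illustrated with a small pure-Python sketch. It is not the MATLAB model itself. It assumes a random 70/15/15 division, a logistic sigmoid in the hidden layer and a linear output layer with 24 outputs; the text specifies only "sigmoid" and the hidden-layer size. Training with Levenberg–Marquardt is not shown, and all function names are hypothetical.

```python
import math
import random


def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Randomly divide sample indices into training, validation and test sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_mlp(n_inputs, n_hidden=10, n_outputs=24, seed=0):
    """Create small random weights for a one-hidden-layer MLP (untrained)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_outputs)]
    b2 = [0.0] * n_outputs
    return w1, b1, w2, b2


def forward(x, params):
    """Forward pass: sigmoid hidden layer, linear output layer (assumed)."""
    w1, b1, w2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]


# Usage: 72 inputs (three days of hourly values) mapped to 24 hourly outputs.
train_idx, val_idx, test_idx = split_indices(1000)
params = init_mlp(n_inputs=72)
prediction = forward([0.5] * 72, params)
print(len(train_idx), len(val_idx), len(test_idx), len(prediction))  # 700 150 150 24
```

With ten hidden neurons the model stays small, which keeps second-order training methods such as Levenberg–Marquardt computationally practical.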

The primary objective of this work was to develop a forecasting model capable of predicting the CF of a PV system one day ahead at hourly resolution. The resulting model provides the 24 hourly CF values of the next day. To evaluate the predictive ability of the developed forecasting model, four well-known statistical evaluation indices were used: the mean bias error (MBE), the root mean square error (RMSE), the coefficient of determination (R²) and the index of agreement (IA) [8].
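The four indices can be computed as in the sketch below. The text does not give their formulas, so the definitions here are assumptions: MBE is taken as the mean of predicted minus observed, R² as one minus the ratio of residual to total sum of squares, and IA as Willmott's index of agreement. The function names are hypothetical.

```python
import math


def mbe(obs, pred):
    """Mean bias error, assumed sign convention: predicted minus observed."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination, assumed as 1 - SS_res / SS_tot."""
    mean_o = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def index_of_agreement(obs, pred):
    """Willmott's index of agreement (assumed definition of IA)."""
    mean_o = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


# Usage with a tiny made-up example (not data from the study).
observed = [0.0, 0.2, 0.4, 0.6]
predicted = [0.1, 0.3, 0.5, 0.7]
print(mbe(observed, predicted), rmse(observed, predicted))
print(r_squared(observed, predicted), index_of_agreement(observed, predicted))
```

MBE shows the direction of any systematic error, RMSE its typical magnitude, while R² and IA summarize how closely the predictions track the observations on a dimensionless scale.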

3. Results and Discussion

CF hourly values covering the time period 2018–2022 were used for the training of the developed forecasting model. Figure 1 presents the scatterplots concerning the training phase of the developed model. Based on the scatterplots in Figure 1, the performance of the ANN model during the training phase appears to be strong. The data points are closely clustered around the diagonal line (the line of perfect prediction), which indicates that the model’s outputs closely match the target values. This alignment suggests a high level of accuracy and minimal deviation in predictions. Additionally, the regression line and the correlation coefficient (R) shown on the plot further support the conclusion that the ANN model achieved a good fit to the training data, indicating successful learning and strong generalization potential.

Figure 2 presents the histogram of residuals for the training, validation and testing phases of the developed ANN forecasting model. The histogram of the differences between observed and predicted CF values reveals a near-normal distribution centered around zero, indicating that the trained ANN model is well calibrated with minimal systematic bias. Most errors are concentrated within a narrow range, suggesting high prediction accuracy and a strong generalization capability of the model within the training dataset. The symmetric shape of the error distribution further supports the absence of overfitting or skewed predictions. These findings confirm that the ANN model effectively captures the underlying patterns governing CF variability, thus providing a reliable tool for short-term photovoltaic performance forecasting.
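A residual histogram of this kind can be built as in the following sketch, where residuals are taken as observed minus predicted, following the caption of Figure 2. The bin width of 0.05 and the function names are illustrative assumptions, and the values used are made up rather than taken from the study.

```python
import math


def residual_histogram(obs, pred, bin_width=0.05):
    """Count residuals (observed - predicted) in fixed-width bins.

    Returns a dict mapping bin index k to a count, where bin k covers
    the interval [k * bin_width, (k + 1) * bin_width).
    """
    counts = {}
    for o, p in zip(obs, pred):
        k = math.floor((o - p) / bin_width)
        counts[k] = counts.get(k, 0) + 1
    return counts


def mean_residual(obs, pred):
    """Mean of observed minus predicted; close to zero for an unbiased model."""
    return sum(o - p for o, p in zip(obs, pred)) / len(obs)


# Usage with made-up values.
observed = [0.50, 0.50, 0.50, 0.50]
predicted = [0.49, 0.52, 0.44, 0.58]
for k, n in sorted(residual_histogram(observed, predicted).items()):
    print(f"[{k * 0.05:+.2f}, {(k + 1) * 0.05:+.2f}): {n}")
```

A histogram whose counts peak in the bins adjacent to zero and fall off evenly on both sides corresponds to the near-normal, zero-centered shape described above.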

As previously noted, data from the period 2018–2022 were utilized for the training phase, while the remaining two years (2023–2024) were reserved exclusively for testing. The data corresponding to the testing period were entirely unseen by the developed ANN model during training, ensuring an unbiased evaluation of its forecasting capability.
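The chronological hold-out described above can be sketched as a split by calendar year. The record layout, a list of (timestamp, value) tuples, and the function name split_by_year are assumptions made for illustration.

```python
from datetime import datetime


def split_by_year(records, train_years=range(2018, 2023), test_years=range(2023, 2025)):
    """Split (timestamp, value) records into training and testing sets by year.

    Defaults follow the text: 2018-2022 for training, 2023-2024 for testing.
    """
    train = [r for r in records if r[0].year in train_years]
    test = [r for r in records if r[0].year in test_years]
    return train, test


# Usage with made-up hourly records.
records = [
    (datetime(2018, 1, 1, 12), 0.41),
    (datetime(2022, 12, 31, 12), 0.38),
    (datetime(2023, 1, 1, 12), 0.40),
    (datetime(2024, 6, 1, 12), 0.72),
]
train, test = split_by_year(records)
print(len(train), len(test))  # 2 2
```

Splitting by year rather than at random keeps every test sample later in time than every training sample, which is what makes the test years genuinely unseen.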

Table 1 lists the statistical evaluation indices of the developed forecasting model for the testing period 2023–2024.

The developed forecasting model demonstrates excellent predictive performance based on the provided evaluation indices. The MBE of 0.002 indicates that the model has virtually no systematic bias, suggesting high accuracy in its average predictions. The RMSE of 0.082 reflects a low level of overall error, reinforcing the model’s reliability. Furthermore, the IA of 0.970 implies a very high degree of agreement between observed and predicted values, while the coefficient of determination (R²) of 0.889 confirms that nearly 89% of the variance in the observed data is explained by the model. Collectively, these metrics highlight a robust and dependable forecasting capability.
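Because the simulated system has a rated capacity of 1 kW (Section 2), errors expressed in CF units translate directly into average hourly power. The short sketch below makes this conversion explicit. The function name is hypothetical, and the conversion assumes that the indices in Table 1 are expressed in CF units.

```python
def cf_error_to_watts(cf_error, capacity_kw=1.0):
    """Convert an error in capacity-factor units to watts for a given rated capacity."""
    return cf_error * capacity_kw * 1000.0


# Values from Table 1, applied to the 1 kW system described in Section 2.
print(round(cf_error_to_watts(0.082), 1))  # 82.0 (RMSE)
print(round(cf_error_to_watts(0.002), 1))  # 2.0 (MBE)
```

For a system of a different size, the same CF errors scale linearly with the rated capacity.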

Figure 3 shows the scatterplot of the observed CF values against the corresponding predicted values. Based on this scatterplot for the period 2023–2024, the developed forecasting model exhibits strong predictive capability. The points are densely concentrated around the ideal 1:1 line, indicating a high level of agreement between observed and predicted values. This close clustering suggests that the model captures the underlying patterns effectively, with minimal systematic deviation. The consistency across the range of values further implies that the model generalizes well and maintains forecasting accuracy throughout the examined timeframe. Overall, the model demonstrates reliable performance in predicting the capacity factor during the testing period.

4. Conclusions

In this study, an artificial intelligence-based forecasting model was developed within the MATLAB environment, aiming to predict the hourly Capacity Factor values of a PV system, one day ahead. The results demonstrated that the developed model exhibits remarkable forecasting accuracy. Overall, the findings suggest that artificial intelligence can play a significant role in predicting solar energy production, enabling more efficient management by energy source operators. Further research is required to enhance the predictive capabilities of AI models and to explore their broader application across other renewable energy sources.

Author Contributions

Conceptualization, methodology, validation, formal analysis, investigation, resources, data curation, writing—original draft preparation, writing—review and editing, visualization, supervision, V.J., K.M. and G.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was fully funded by the University of West Attica, funding decision number: P.A.D.A.—NO.PROT: 68893.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Khan, W.; Walker, S.; Zeiler, W. Improved Solar Photovoltaic Energy Generation Forecast Using Deep Learning Ensemble Models. Energy 2022, 240, 122812.
2. Mahrouch, A.; Asghar, R.; Fulginei, F.R.; Quercio, M. Artificial Neural Networks for Photovoltaic Power Forecasting: A Review of Five Promising Models. IEEE Access 2024, 12, 90461–90481.
3. Duranay, Z.B.; Guldemir, H. Power Prediction in Photovoltaic Systems with Neural Networks: A Multi-Parameter Approach. Appl. Sci. 2025, 15, 3615.
4. Díaz-Bello, D.; Vargas-Salgado, C.; Alcazar-Ortega, M.; Alfonso-Solar, D. Optimizing Photovoltaic Power Plant Forecasting with Dynamic Neural Network Structure Refinement. Sci. Rep. 2025, 15, 3337.
5. Simeunović, J.; Schubnel, B.; Alet, P.J.; Carrillo, R.E. Spatio-Temporal Graph Neural Networks for Multi-Site PV Power Forecasting. arXiv 2021, arXiv:2107.13875.
6. Kharlova, E.; May, D.; Musilek, P. Forecasting Photovoltaic Power Production Using a Deep Learning Sequence to Sequence Model with Attention. arXiv 2020, arXiv:2008.02775.
7. Balfaqih, H.; Alghamdi, A. A comprehensive review of deep learning-based photovoltaic power forecasting. Heliyon 2024, 10, e123456.
8. Fazakis, P.; Moustris, K.; Spyropoulos, G. Development of Air Pollution Forecasting Models Applying Artificial Neural Networks in the Greater Area of Beijing City, China. Sustainability 2024, 16, 8721.

Figure 1. Scatterplots concerning the training phase in MATLAB environment of the developed forecasting model. Period 2018–2022.

Figure 2. Observed–Predicted CF values (residuals). Training period 2018–2022.

Figure 3. Observed CF values vs. the corresponding forecasted CF values. Testing period 2023–2024.

Table 1. Statistical evaluation indices for the testing data set (2023–2024).

MBE	RMSE	IA	R²
0.002	0.082	0.970	0.889

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
