Article

Predictive Energy and Exergy Assessment of Photovoltaic Systems Under Dynamic Environmental Conditions Using Machine Learning

1
Municipality of Dronten, De Rede, 1, 8251 ER Dronten, The Netherlands
2
Copernicus Institute of Sustainable Development, Utrecht University, Princetonlaan 8A, 3584 CB Utrecht, The Netherlands
3
Department of Computer Science and Media Technology, Malmö University, 205 06 Malmö, Sweden
4
Sustainable Digitalisation Research Centre, Malmö University, 205 06 Malmö, Sweden
5
Biofilms Research Center for Biointerfaces (BRCB), Malmö University, 205 06 Malmö, Sweden
6
Department of Computer Engineering, Bitlis Eren University, Bitlis 13100, Türkiye
*
Authors to whom correspondence should be addressed.
Appl. Sci. 2026, 16(10), 5049; https://doi.org/10.3390/app16105049
Submission received: 14 April 2026 / Revised: 26 April 2026 / Accepted: 4 May 2026 / Published: 19 May 2026

Abstract

This study evaluates the performance of a commercial silicon-based photovoltaic (PV) module under varying environmental conditions, including solar irradiance, module and ambient temperatures, humidity, and wind speed. Key performance indicators such as daily and lifetime energy output, CO2 reduction, and potential income were analyzed. Machine learning techniques, including Linear Regression (LR), Artificial Neural Networks (ANN), Random Forest (RF), and XGBoost, were employed to predict photovoltaic (PV) efficiency under varying environmental conditions. The results indicate that solar irradiance is the primary driver of energy production, while elevated temperatures and high humidity reduce efficiency, and wind speed provides minor cooling benefits. Among the models, XGBoost achieved the highest predictive accuracy (Test R2 = 0.9967), followed by RF and ANN, whereas LR underperformed due to a limited ability to capture nonlinear interactions. These findings highlight the critical influence of environmental and electrical factors on PV performance and demonstrate the effectiveness of advanced machine learning techniques, particularly XGBoost, in optimizing energy output and supporting sustainable energy planning.

1. Introduction

The global demand for energy continues to increase rapidly due to population growth, industrial expansion, and ongoing technological development. At the same time, the urgent need to reduce greenhouse gas emissions has accelerated the global transition toward renewable energy sources [1,2]. Among these alternatives, solar energy has become one of the most promising solutions because of its abundance, sustainability, and the continuously decreasing costs of photovoltaic (PV) technologies [3,4]. Photovoltaic systems convert solar radiation directly into electrical energy, providing a clean, reliable, and decentralized power generation option. However, the performance of PV systems is strongly influenced by various environmental and operational parameters such as solar irradiance, ambient temperature, module temperature, humidity, and wind speed [5,6]. Understanding the influence of these factors is essential for optimizing PV system design, improving energy production, and ensuring long-term operational sustainability. Recent research has emphasized the importance of integrating both energy and exergy analyses to evaluate the overall performance of photovoltaic systems [7,8]. While energy analysis focuses on the quantity of electrical energy generated from the incident solar radiation, exergy analysis evaluates the quality of energy conversion by considering thermodynamic losses and system irreversibilities. Therefore, exergy analysis provides deeper insight into the inefficiencies within PV systems and helps identify opportunities for performance improvement [9]. In parallel with these thermodynamic approaches, predictive modeling using machine learning techniques has gained significant attention for forecasting PV system performance under varying climatic conditions. Machine learning techniques have been widely applied to photovoltaic (PV) performance prediction. 
Linear Regression has been used as a baseline approach in PV studies to model the relationship between solar irradiance and PV output due to its simplicity and interpretability [10]. Artificial Neural Networks (ANN) have been applied to model complex nonlinear relationships between environmental variables such as temperature, humidity, and solar irradiance, and PV efficiency, providing improved predictive accuracy compared to linear methods [11,12]. Random Forest (RF) has been widely adopted for PV performance prediction due to its robustness against overfitting and its ability to capture nonlinear interactions among multiple environmental variables [13]. More recently, XGBoost has emerged as a highly effective method for PV efficiency prediction, as it combines gradient boosting with regularization techniques to improve both accuracy and generalization performance under varying environmental conditions [14]. Despite these advancements, several challenges remain, particularly in addressing environmental variability, temperature-dependent degradation effects, and the complex interactions among multiple climatic factors. Accurate prediction of PV efficiency therefore requires comprehensive datasets and advanced analytical frameworks that integrate thermodynamic analysis with statistical and machine learning approaches.
In this context, the present study investigates the performance of a commercial rooftop photovoltaic system by combining energy and exergy analyses with predictive modeling techniques. The objectives of this research are threefold: (i) to evaluate the influence of environmental and electrical parameters on PV system efficiency, (ii) to compare the predictive performance of different machine learning models, and (iii) to provide practical insights for optimizing PV system design and operation under real-world climatic conditions.

2. Materials and Methods

In this study, machine learning techniques, including Linear Regression (LR), Artificial Neural Networks (ANN), Random Forest (RF), and XGBoost (version 2.0.3), were employed to predict photovoltaic (PV) efficiency under varying environmental conditions. These models were integrated within a broader framework that combines thermodynamic analysis with predictive modeling. Both energy and exergy analyses were applied to evaluate not only the quantity of electricity generated but also the quality and efficiency of the energy conversion process. The analysis uses environmental and operational parameters, including solar irradiation, module temperature, ambient temperature, wind speed, atmospheric pressure, and humidity. System performance indicators such as power output (kW), daily energy production (kWh), lifetime energy generation (kWh), CO2 emission reduction (kg), and income (USD) are used only as output variables for environmental and economic impact assessment. The selected machine learning models were used to capture complex nonlinear interactions between climatic variables and PV efficiency, enabling accurate and reliable predictions under varying environmental conditions.

2.1. Data and Evaluation Criteria

The data used in this study were obtained from a rooftop photovoltaic (PV) installation located in Kars Sarıkamış, Turkey, to evaluate panel performance under real operating conditions, as illustrated in Figure 1. The dataset consists of environmental and operational parameters, including solar irradiation, module temperature, ambient (air) temperature, humidity, atmospheric pressure, and wind speed, which are used as input variables for the predictive models. PV efficiency is defined as the dependent variable. In addition, CO2 emission reduction (kg) and income (USD) are considered as output indicators representing environmental and economic performance. These variables are not used as input features in any machine learning model, which avoids data leakage and keeps the predictions physically meaningful. The internal electrical characteristics of the PV panel, including open circuit voltage (Voc), short circuit current (Isc), maximum power (Pmpp), and efficiency range, are summarized in Table 1. These parameters provide essential information about the operational limits and performance behavior of the PV system. Before model development, the relationships among independent variables were analyzed using a correlation matrix. In addition, multicollinearity was evaluated using Variance Inflation Factor (VIF) analysis to ensure model stability and prevent bias in regression-based interpretations. Pairwise relationships between variables were further examined using scatter plots to identify potential nonlinear interactions. To ensure transparency and reproducibility, a structured summary of the dataset is provided in Table 2, including collection period, sampling frequency, preprocessing steps, and data splitting strategy.
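The multicollinearity check described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the variable names, value ranges, and the assumed dependence of module temperature on irradiance and ambient temperature are hypothetical. The VIF of predictor j is computed as 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + remaining predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic stand-in for the measured predictors (all values are assumptions).
rng = np.random.default_rng(0)
n = 500
irradiance = rng.uniform(100, 1000, n)   # W/m^2
ambient_temp = rng.uniform(-5, 30, n)    # deg C
# Module temperature is made to depend on the two variables above,
# so it should show a high VIF.
module_temp = ambient_temp + 0.03 * irradiance + rng.normal(0, 1.0, n)
wind_speed = rng.uniform(0, 10, n)       # m/s, independent of the others

names = ["irradiance", "ambient_temp", "module_temp", "wind_speed"]
X = np.column_stack([irradiance, ambient_temp, module_temp, wind_speed])

corr = np.corrcoef(X, rowvar=False)      # pairwise Pearson correlation matrix
vifs = vif(X)
for name, value in zip(names, vifs):
    print(f"{name:>13s}  VIF = {value:8.2f}")
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic collinearity; the threshold used in the study is not stated in the text.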

2.2. Energy and Exergy Analysis

Energy analysis involves assessing the total solar energy incident on a PV panel and determining the fraction converted into electrical power [15]. The PV system was exposed to solar radiation under real outdoor conditions at one-hour intervals. Key parameters including solar irradiance, ambient temperature, maximum voltage (Vmax), maximum current (Imax), maximum power output (Pmax), and panel surface temperature were recorded systematically. These data were then used to calculate the energy generation capacity and efficiency of the PV panel in terms of the following quantities:
  • Energy input (Ein), based on the incident solar radiation and the panel area.
  • Electrical output (Eelec), derived from measured current and voltage.
  • Energy efficiency (ηen), defined as the ratio of electrical output to solar energy input.
$$\eta_{en} = \frac{E_{elec}}{E_{in}}$$
To assess the reliability of the calculated energy efficiency values, an uncertainty analysis was conducted for the primary measured parameters. The main sources of uncertainty stem from measurement errors in solar irradiance, voltage, and current sensors. These uncertainties were quantified based on the accuracy specifications provided by the instruments used in the data acquisition system [16].
$$\delta\eta_{en} = \eta_{en}\sqrt{\left(\frac{\delta G}{G}\right)^{2} + \left(\frac{\delta V}{V}\right)^{2} + \left(\frac{\delta I}{I}\right)^{2}}$$
where δG, δV, and δI denote the measurement uncertainties of solar irradiance, voltage, and current, respectively. G represents the measured solar irradiance, while V and I correspond to the voltage and current values used to calculate the electrical power output. The combined uncertainty of the calculated energy efficiency (ηen) was estimated using the standard error propagation method. Considering the instrument specifications and typical operating conditions, the overall uncertainty in the calculated energy efficiency was estimated to be within ±3–5% [17]. This level of uncertainty is generally acceptable for photovoltaic performance studies and supports the reliability of the calculated efficiency values under varying environmental conditions.
Exergy input (Exin) was calculated according to the Petela model, which considers the thermodynamic quality of solar radiation [18]. In this model, G is the solar irradiance, T0 is the reference (ambient) temperature, and Ts is the temperature of the Sun:
$$\psi_{solar} = G\left[1 - \frac{4}{3}\,\frac{T_0}{T_s} + \frac{1}{3}\left(\frac{T_0}{T_s}\right)^{4}\right]$$
Since the Petela exergy model relies on a reference temperature (T0), we examined how changes in ambient temperature affect exergy input and efficiency. Using the measured ambient temperature as T0, we found that higher temperatures slightly reduce exergy input and efficiency, though the effect is minor compared to solar irradiance and module temperature. This confirms that while the model is somewhat sensitive to T0, the overall conclusions remain reliable. Including irradiance and temperature fluctuations in exergy calculations could further improve the thermodynamic robustness of the results [19,20]. Exergy efficiency (ηex) was calculated as the ratio of electrical output to exergy input [21].
$$\eta_{ex} = \frac{E_{elec}}{Ex_{in}}$$
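The energy, uncertainty, and exergy relations of this section can be evaluated with a few lines of Python. The sketch below is illustrative only: the operating point (30 V, 8 A, 800 W/m², 1.6 m² panel area), the sensor uncertainties, and the Sun temperature of 5777 K are assumed values and are not taken from the study. Electrical output is taken as V·I and energy input as G·A; because both are integrated over the same one-hour interval, the ratio is unchanged.

```python
import math

T_SUN = 5777.0  # K, assumed apparent Sun temperature (not stated in the text)

def energy_efficiency(v, i, g, area):
    """eta_en = E_elec / E_in, with E_elec = V*I and E_in = G*A."""
    return (v * i) / (g * area)

def efficiency_uncertainty(eta, g, v, i, dg, dv, di):
    """Root-sum-square propagation of the relative uncertainties of G, V, I."""
    return eta * math.sqrt((dg / g) ** 2 + (dv / v) ** 2 + (di / i) ** 2)

def petela_exergy_flux(g, t0, ts=T_SUN):
    """Petela exergy of solar radiation per unit area (temperatures in K)."""
    r = t0 / ts
    return g * (1.0 - (4.0 / 3.0) * r + (1.0 / 3.0) * r ** 4)

def exergy_efficiency(v, i, g, area, t0, ts=T_SUN):
    """eta_ex = E_elec / Ex_in, with Ex_in = psi_solar * A."""
    return (v * i) / (petela_exergy_flux(g, t0, ts) * area)

# Hypothetical operating point.
V, I, G, AREA, T0 = 30.0, 8.0, 800.0, 1.6, 288.15

eta_en = energy_efficiency(V, I, G, AREA)
# Assumed sensor accuracies: 3 % irradiance, 1 % voltage, 1 % current.
d_eta = efficiency_uncertainty(eta_en, G, V, I, 0.03 * G, 0.01 * V, 0.01 * I)
psi = petela_exergy_flux(G, T0)
eta_ex = exergy_efficiency(V, I, G, AREA, T0)

print(f"eta_en = {eta_en:.4f} +/- {d_eta:.4f}")
print(f"Petela factor = {psi / G:.4f}, eta_ex = {eta_ex:.4f}")
```

With the ratio definition used here, the exergy input is slightly smaller than the energy input (the Petela factor is about 0.93 at ambient temperatures), so the computed exergy efficiency is slightly larger than the energy efficiency for the same electrical output.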
This analysis highlights both the quantity and quality of energy conversion in the PV system, revealing irreversibilities and thermodynamic losses that energy efficiency alone cannot capture. Combining energy and exergy perspectives is essential for optimizing renewable energy systems under varying operational conditions [22,23]. The framework can be further strengthened by incorporating degradation or aging models to better evaluate long-term PV sustainability under material wear and operational stress. Additionally, exergy calculations can account for fluctuations in irradiance and temperature, enhancing thermodynamic robustness and improving the reliability of performance predictions.

2.3. Predictive Modeling of PV Efficiency

To analyze the influence of environmental conditions on photovoltaic system performance, several predictive modeling techniques were implemented, including Linear Regression (LR), Artificial Neural Networks (ANN), Random Forest (RF), and XGBoost. These models were applied to explore the relationships between climatic variables and PV efficiency and to generate reliable performance predictions. Before model implementation, multicollinearity among the independent variables was evaluated using correlation matrix analysis and variance inflation factors (VIF) to ensure that strong correlations between predictors would not negatively affect model accuracy or stability. In addition to statistical modeling, physical factors affecting PV performance were also considered. Temperature-related performance losses were evaluated in relation to semiconductor behavior at higher operating temperatures, while the potential impact of humidity on optical performance was assessed through its effect on panel surface conditions. The cooling influence of wind was interpreted in terms of convective heat transfer on the module. Furthermore, variations in solar irradiation and temperature were incorporated into the exergy input calculations to improve the reliability of thermodynamic assessments under changing environmental conditions. By combining predictive modeling with physical interpretation of environmental effects, this approach provides a comprehensive understanding of the factors influencing PV efficiency.

2.4. Linear Regression (LR)

Linear Regression (LR) was used to model the relationship between PV efficiency and environmental variables such as temperature, humidity, wind speed, and solar irradiance [24,25]. To improve stability and capture mild nonlinear effects, selective polynomial terms and regularization methods like Ridge and Lasso were applied. Preliminary scatter plots and correlation analyses showed that solar irradiance has a roughly linear effect, while temperature and humidity introduce noticeable nonlinearities due to physical PV behavior. The LR model is expressed as:
$$Y = \beta_0 + \sum_{i=1}^{n} \beta_i X_i + \epsilon$$
where βi are regression coefficients, Xi are input variables, and ϵ is the error term [26,27,28]. Despite its simplicity, LR provides clear insight into the main linear dependencies affecting PV efficiency.
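A regularized linear model with selective polynomial terms, as described above, might look like the following scikit-learn sketch. The synthetic data, the coefficient values of the generating function, and the regularization strengths (`alpha`) are assumptions chosen for illustration; the study's actual settings are not given in the text beyond the use of Ridge, Lasso, and degree-2 terms.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (hypothetical ranges and effect sizes).
rng = np.random.default_rng(42)
n = 400
X = np.column_stack([
    rng.uniform(100, 1000, n),  # solar irradiance, W/m^2
    rng.uniform(0, 60, n),      # module temperature, deg C
    rng.uniform(20, 90, n),     # relative humidity, %
    rng.uniform(0, 10, n),      # wind speed, m/s
])
# Efficiency with a roughly linear irradiance effect and a quadratic
# temperature penalty, plus measurement noise.
y = (15 + 0.004 * X[:, 0] - 0.0008 * X[:, 1] ** 2
     - 0.01 * X[:, 2] + 0.05 * X[:, 3] + rng.normal(0, 0.1, n))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "ridge_linear": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "ridge_poly2": make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=2, include_bias=False),
        Ridge(alpha=1.0),
    ),
    "lasso_poly2": make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=2, include_bias=False),
        Lasso(alpha=0.01, max_iter=10_000),
    ),
}

# Test-set R^2 for each variant, all on the same split.
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
for name, r2 in scores.items():
    print(f"{name:>12s}  test R2 = {r2:.4f}")
```

On data of this kind, the degree-2 expansion recovers the curvature that a purely linear fit misses, which is the motivation given in the text for adding polynomial terms.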

2.5. Artificial Neural Network (ANN)

Artificial Neural Networks (ANNs) are widely used for modeling complex nonlinear relationships in photovoltaic system analysis [29,30]. In this study, a feedforward ANN architecture was developed to estimate PV efficiency based on environmental input variables. The network structure consists of an input layer representing the climatic parameters, two hidden layers, and a single output neuron corresponding to PV efficiency. Nonlinear patterns within the data were captured using ReLU activation functions in the hidden layers, while a linear activation function was applied at the output layer. To improve model performance, key hyperparameters were optimized using a grid search approach with cross validation. Different neuron configurations were tested for the hidden layers (Layer 1: 8, 16, 32; Layer 2: 4, 8, 16), along with alternative activation functions such as ReLU and tanh. Model accuracy was evaluated using standard performance metrics including MAE, MSE, RMSE [31,32,33,34,35,36], and the coefficient of determination (R2) [37,38,39]. The results indicate that the ANN model effectively captures nonlinear interactions between environmental variables and PV efficiency while maintaining reliable predictive performance.
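As a rough illustration of the architecture described here, the sketch below builds a two-hidden-layer feedforward regressor with scikit-learn's `MLPRegressor`, using the 16 and 8 neuron configuration, Adam optimizer, learning rate, batch size, and epoch count reported in Section 2.8. The framework the authors used is not stated, so the choice of scikit-learn, the input and target standardization, and the synthetic data are all assumptions. The grid search over neuron counts and activations is omitted for brevity.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical ranges and effect sizes).
rng = np.random.default_rng(7)
n = 1000
X = np.column_stack([
    rng.uniform(100, 1000, n),  # solar irradiance, W/m^2
    rng.uniform(0, 60, n),      # module temperature, deg C
    rng.uniform(20, 90, n),     # relative humidity, %
])
y = (15 + 0.004 * X[:, 0] - 0.0008 * X[:, 1] ** 2
     - 0.01 * X[:, 2] + rng.normal(0, 0.1, n))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

mlp = MLPRegressor(
    hidden_layer_sizes=(16, 8),  # two hidden layers
    activation="relu",           # hidden-layer activation
    solver="adam",
    learning_rate_init=0.001,
    batch_size=32,
    max_iter=200,                # epochs for stochastic solvers
    random_state=0,
)
# Standardize inputs and target (an assumption; preprocessing is listed in
# Table 2 of the article but not detailed in the text).
model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), mlp),
    transformer=StandardScaler(),
)
model.fit(X_tr, y_tr)

fitted_mlp = model.regressor_[-1]
test_r2 = model.score(X_te, y_te)
print("layers:", fitted_mlp.n_layers_, "| output activation:", fitted_mlp.out_activation_)
print(f"test R2 = {test_r2:.4f}")
```

`MLPRegressor` minimizes squared error and always uses an identity (linear) output activation, which matches the regression setup described in the text.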

2.6. Random Forest (RF)

Random Forest (RF) is an ensemble method that enhances PV efficiency prediction by combining multiple decision trees [40,41]. In this study, the RF model used 500 trees with a maximum depth of 15 to capture nonlinear interactions while limiting overfitting. Bootstrap sampling created diverse training subsets, and at each split a random subset of features, with size equal to the square root of the number of input features, was considered for node selection. Sensitivity analysis confirmed that this configuration balanced bias and variance, producing stable and reliable predictions comparable to support vector regression (SVR) under varying environmental conditions [42].

2.7. XGBoost

XGBoost, a gradient boosting algorithm, was applied to predict PV efficiency by sequentially training 500 decision trees to minimize residual errors from previous iterations [43,44]. It effectively models both linear and nonlinear interactions among environmental factors, including solar irradiance and temperature. The model was fine-tuned using grid search and cross validation with the following hyperparameter ranges: learning rate (0.01, 0.05, 0.1, 0.2), maximum tree depth (3, 5, 7, 9), number of trees (100, 200, 300, 500), subsample ratio (0.6, 0.8, 1.0), gamma (0, 0.1, 0.3), and lambda (0.5, 1, 2). A learning rate of 0.05 with exponential decay every 50 iterations, along with regularization (gamma = 0.1, lambda = 1), ensured convergence, reduced overfitting, and maintained model stability under varying climatic conditions. This setup allowed XGBoost to provide accurate, robust, and reliable PV efficiency predictions across different environmental scenarios [45,46].
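To give a sense of the size of this search, the grid can be enumerated with the standard library alone. The parameter names below follow the scikit-learn style interface of XGBoost (`learning_rate`, `max_depth`, `n_estimators`, `subsample`, `gamma`, `reg_lambda`), which is assumed here since the text does not say which interface was used. The selected values and k = 5 folds are those reported in Section 2.8. The decay factor of 0.9 in the step schedule is purely hypothetical, because the text gives the decay interval (50 iterations) but not the rate.

```python
from itertools import product

# Hyperparameter ranges as listed in the text.
param_grid = {
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "max_depth": [3, 5, 7, 9],
    "n_estimators": [100, 200, 300, 500],
    "subsample": [0.6, 0.8, 1.0],
    "gamma": [0, 0.1, 0.3],
    "reg_lambda": [0.5, 1, 2],
}

# Every combination the grid search has to evaluate.
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
n_candidates = len(candidates)   # 4 * 4 * 4 * 3 * 3 * 3
n_fits = n_candidates * 5        # 5-fold cross validation

# Configuration reported as selected in Section 2.8.
selected = {
    "learning_rate": 0.05,
    "max_depth": 7,
    "n_estimators": 500,
    "subsample": 0.8,
    "gamma": 0.1,
    "reg_lambda": 1,
}

def decayed_learning_rate(iteration, base=0.05, decay=0.9, step=50):
    """Step-wise exponential decay: multiply by `decay` every `step` rounds.

    The value decay=0.9 is an assumption for illustration only.
    """
    return base * decay ** (iteration // step)

print(f"{n_candidates} candidates, {n_fits} model fits with 5-fold CV")
print("selected config is in the grid:", selected in candidates)
print([round(decayed_learning_rate(i), 5) for i in (0, 49, 50, 100, 499)])
```

A function of this shape, mapping the boosting round to a learning rate, is the kind of callable that XGBoost's `LearningRateScheduler` callback accepts, although whether the authors used that mechanism is not stated.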

2.8. Model Implementation Details

In order to ensure reproducibility and a fair comparison between the different machine learning approaches, all models were implemented under a consistent experimental framework. The hyper-parameters of each model were carefully selected and tuned using a grid search strategy combined with k-fold cross validation (k = 5). This approach was adopted to reduce overfitting risk and to improve the generalization capability of the models. For the XGBoost model, a systematic hyper-parameter optimization process was carried out. The learning rate was set to 0.05 to ensure a balanced trade-off between convergence speed and model stability. The maximum tree depth was limited to 7 to prevent excessive model complexity, while the number of estimators was set to 500 to allow sufficient learning capacity. In addition, subsampling and column sampling ratios were both set to 0.8 to introduce randomness and improve robustness. The gamma parameter was assigned a value of 0.1 to control node splitting, and L2 regularization (lambda = 1) was applied to reduce overfitting. Furthermore, a learning rate decay strategy was introduced every 50 iterations to enhance convergence stability and improve model performance.
The Random Forest model was configured with 500 decision trees to ensure strong ensemble learning capability. The maximum depth of each tree was limited to 15 to control model complexity. The number of features considered at each split was defined as the square root of the total number of input features, which is a commonly used heuristic in Random Forest implementations. Bootstrap sampling was enabled to improve model diversity, and the mean squared error criterion was used as the splitting function.
For the Artificial Neural Network (ANN), a relatively simple feedforward architecture was designed to balance performance and computational efficiency. The network consisted of two hidden layers with 16 and 8 neurons, respectively. The ReLU activation function was used in the hidden layers to introduce nonlinearity, while a linear activation function was applied in the output layer for regression purposes. The model was trained using the Adam optimizer with a learning rate of 0.001. Training was performed with a batch size of 32 over 200 epochs, and the mean squared error was selected as the loss function.
For the Linear Regression model, both Ridge and Lasso regularization techniques were evaluated to handle potential multicollinearity and improve model generalization. In addition, polynomial feature expansion up to degree 2 was tested where necessary to capture simple nonlinear relationships within the data.
To ensure a fair and unbiased comparison, all models were trained and evaluated using identical training and testing splits. This consistent experimental setup allowed for a reliable assessment of each model’s predictive performance under the same conditions.

2.9. Model Evaluation

The performance of each predictive model was evaluated using several metrics:
  • R2 (Coefficient of Determination): Measures how well the model explains the variance in the dependent variable.
$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}$$
  • Mean Absolute Error (MAE): Measures the average absolute difference between the predicted and actual values.
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
  • Root Mean Squared Error (RMSE): Represents the square root of the average squared differences between predicted and actual values.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}$$
  • Mean Squared Error (MSE): Measures the average of the squared differences between predicted and actual values.
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}$$
The models were evaluated using standard performance metrics to determine the most accurate and reliable predictor of PV efficiency. Residuals, calculated as the difference between predicted and measured efficiencies, were analyzed through distribution plots and error histograms showing errors centered around zero with few extreme deviations. Statistical summaries including mean, standard deviation, skewness, and kurtosis further confirmed model stability and robustness.
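The four metrics and the residual summary statistics can be computed directly from their definitions, as in the NumPy sketch below. The function names and the four-point example are hypothetical. Residuals are taken as predicted minus measured, following the wording above, and kurtosis is reported as excess kurtosis (zero for a normal distribution), which is an assumption because the text does not say which convention was used.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R2, MAE, MSE and RMSE from their textbook definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": np.mean(np.abs(err)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
    }

def residual_summary(y_true, y_pred):
    """Mean, standard deviation, skewness and excess kurtosis of residuals."""
    r = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    mu, sd = r.mean(), r.std()
    z = (r - mu) / sd
    return {
        "mean": mu,
        "std": sd,
        "skewness": np.mean(z ** 3),
        "excess_kurtosis": np.mean(z ** 4) - 3.0,
    }

# Tiny hand-checkable example (efficiency values in percent, hypothetical).
y_true = [16.0, 17.0, 18.0, 19.0]
y_pred = [16.5, 16.5, 18.5, 18.5]

metrics = regression_metrics(y_true, y_pred)
summary = residual_summary(y_true, y_pred)
print(metrics)   # R2 = 0.8, MAE = 0.5, MSE = 0.25, RMSE = 0.5
print(summary)
```

In this example every error has magnitude 0.5, so MAE and RMSE coincide and MSE equals RMSE squared, which is the consistency relation between the two reported error metrics.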
Temperature-Dependent Degradation and Long-Term Performance: Energy and exergy analyses can be enhanced by accounting for temperature- and time-dependent degradation of PV panels. Incorporating semiconductor material properties enables the model to capture accelerated efficiency loss at higher operating temperatures, providing a more accurate assessment of long-term performance and sustainability [47,48,49].

2.10. Feature Importance Validation

To ensure the stability and reliability of environmental parameter rankings derived from tree-based models (Random Forest and XGBoost), permutation testing and bootstrap resampling were implemented.
Permutation Testing: Each environmental feature was randomly shuffled across the dataset, and the models were retrained to measure the impact on predictive performance (R2 and RMSE). Features that caused a significant drop in accuracy upon permutation were confirmed as truly influential, reinforcing confidence in their ranking.
Bootstrap Validation: The dataset was resampled 1000 times with replacement, and the feature importance was recalculated for each bootstrap sample. The mean importance values and 95% confidence intervals were derived to evaluate the consistency of each parameter’s contribution across different samples.
This dual validation strategy demonstrates that solar irradiance, temperature, and humidity consistently rank as the most influential parameters on PV efficiency, while wind speed and atmospheric pressure remain secondary factors. Incorporating these validation methods enhances the interpretability of the models and ensures that the feature importance rankings are robust and reliable for practical PV system analysis.
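The two validation steps can be sketched as follows, here with a Random Forest only. Everything in the example is an assumption made for illustration: the synthetic data and its effect sizes, the reduced forest size (50 trees), and the reduced number of bootstrap resamples (30 instead of the 1000 used in the study, to keep the run short). Following the description above, each feature is shuffled across the whole dataset and the model is retrained before measuring the drop in test R2.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
names = ["irradiance", "module_temp", "humidity", "wind_speed", "pressure"]
n = 400
X = rng.uniform(0, 1, size=(n, len(names)))  # features scaled to [0, 1]
# Hypothetical effects: strong irradiance, weaker temperature and humidity,
# small wind effect, no pressure effect.
y = (3.0 * X[:, 0] - 1.5 * X[:, 1] - 0.5 * X[:, 2] + 0.1 * X[:, 3]
     + rng.normal(0, 0.05, n))

def fit_rf(X_fit, y_fit, seed=0):
    """Small forest for speed; the study reports 500 trees."""
    model = RandomForestRegressor(
        n_estimators=50, max_depth=15, max_features="sqrt", random_state=seed
    )
    return model.fit(X_fit, y_fit)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
baseline = r2_score(y_te, fit_rf(X_tr, y_tr).predict(X_te))

# Permutation test: shuffle one column, retrain on the same split, record
# how much the test R2 drops relative to the baseline.
drops = {}
for j, name in enumerate(names):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    Xp_tr, Xp_te, _, _ = train_test_split(X_perm, y, test_size=0.25, random_state=0)
    drops[name] = baseline - r2_score(y_te, fit_rf(Xp_tr, y_tr).predict(Xp_te))

# Bootstrap validation: refit on resampled data and collect importances.
B = 30  # the study uses 1000 resamples
boot_idx = rng.integers(0, n, size=(B, n))
imps = np.array([
    fit_rf(X[idx], y[idx], seed=b).feature_importances_
    for b, idx in enumerate(boot_idx)
])
mean_imp = imps.mean(axis=0)
ci_low, ci_high = np.percentile(imps, [2.5, 97.5], axis=0)

for j, name in enumerate(names):
    print(f"{name:>12s}  R2 drop = {drops[name]:6.3f}  "
          f"importance = {mean_imp[j]:.3f} [{ci_low[j]:.3f}, {ci_high[j]:.3f}]")
```

Note that this retraining variant differs from the more common permutation importance, which shuffles a feature at prediction time without refitting; the retraining form is used here only because it is what the text describes.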

3. Results

The performance of the photovoltaic (PV) panel was evaluated using four machine learning techniques: Linear Regression (LR), Artificial Neural Networks (ANN), Random Forest (RF), and XGBoost. The main goal was to predict PV efficiency under varying environmental and electrical conditions and to identify the factors that most strongly influence system performance. Initial exploratory analysis using scatter plots and correlation matrices provided an overview of variable relationships. In addition, regression lines were used to better quantify these trends and to strengthen the assessment of interdependencies among environmental, energy, and economic variables. This approach enhances model interpretability, supports more robust conclusions regarding the key drivers of PV efficiency, and allows a direct comparison between linear and nonlinear learning approaches under real environmental conditions.

3.1. Prediction of Photovoltaic (PV) Panel Efficiency and Model Comparison

We systematically compared the predictive performance of Linear Regression (LR), Artificial Neural Networks (ANN), Random Forest (RF), and XGBoost in estimating PV panel efficiency. The evaluation relied on standard metrics including R2, MAE, MSE, and RMSE, providing a comprehensive view of each model’s predictive and explanatory power. The results, summarized in Table 3, reveal clear differences in model performance under real-world conditions. Ensemble tree-based methods, particularly XGBoost, provide both accuracy and reliability in PV efficiency prediction, while ANN captures nonlinear interactions effectively. Simpler models like LR are informative but limited in handling complex environmental effects.
To ensure clarity and consistency, all performance metrics were recalculated using standard definitions. RMSE and MAE represent absolute error metrics in the same unit as PV efficiency, while MSE is derived as the square of RMSE. This guarantees full mathematical consistency between reported evaluation indicators. XGBoost clearly outperforms the other models, achieving the highest test R2 (0.9967) and the lowest errors in both RMSE and MAE, demonstrating the best accuracy and stability. Random Forest ranks second and also provides strong performance. Although ANN achieves a high R2 (0.9612), its error values are slightly higher, indicating that capturing complex nonlinear interactions remains challenging. Linear Regression, as a basic linear model, shows limited performance and struggles to represent nonlinear effects at extreme efficiency values. These results highlight that ensemble tree-based methods, particularly XGBoost and Random Forest, provide both reliable and accurate predictions for PV efficiency, while ANN effectively captures nonlinear interactions.
Moving on to parameter influence, Table 4 presents the effect coefficients obtained from the XGBoost model using only environmental input variables. Solar irradiation emerges as the primary factor influencing PV efficiency, confirming its dominant role in energy generation. Module temperature and air temperature show negative effects, indicating that higher temperatures reduce efficiency due to increased thermal losses in the PV cells. Wind speed has a small positive contribution, which can be attributed to its cooling effect on the panel surface. Humidity exhibits a slight negative impact, likely due to its influence on solar radiation transmission and potential surface condensation effects. Atmospheric pressure shows a relatively minor influence compared to other environmental parameters. This analysis clearly demonstrates that PV efficiency is mainly governed by solar irradiation- and temperature-related variables, while other environmental factors play secondary roles. The exclusion of output-related variables such as CO2 reduction and income ensures that the model remains physically meaningful and free from data leakage.
Figure 2a presents predicted versus actual PV efficiencies for all models. Linear Regression, shown in pink, displays significant scatter, especially at high and low efficiency values, highlighting its limitations in capturing complex environmental interactions. Artificial Neural Network (ANN), represented in green, closely follows the 1:1 line, demonstrating its capability in capturing nonlinear relationships. Random Forest, indicated by an orange line, provides stable and accurate predictions with improved performance compared to linear models. XGBoost, represented by a black line, achieves the most accurate and robust predictions overall. In summary, Linear Regression performs the weakest among the models due to its inability to capture nonlinear behavior, while ANN, Random Forest, and XGBoost show progressively better performance. Figure 2b visualizes parameter effects based on the XGBoost model. Solar irradiation strongly enhances efficiency, whereas rising module and air temperatures have a negative effect. Wind speed contributes minimally, occasionally improving efficiency through cooling. Other parameters such as pressure and humidity show limited influence, suggesting that thermal management may be more critical than other environmental variables.
Figure 3a,b further illustrate the comparison between actual and predicted PV efficiencies over time and across all machine learning models. In the figures, the actual PV efficiency values are represented by the black line and black reference markers, while Linear Regression is shown in pink, ANN in green, Random Forest in orange, and XGBoost in purple. The results indicate that the XGBoost model, represented by the purple line and markers, provides the closest agreement with the actual measured values, demonstrating the highest predictive accuracy and stability among all evaluated models. Random Forest, shown in orange, also produces stable and reliable predictions with relatively small deviations from the actual efficiency values. The ANN model, represented in green, effectively captures nonlinear relationships and follows the overall efficiency trend successfully, although slight deviations are observed under some extreme operating conditions. In contrast, Linear Regression, illustrated in pink, exhibits the largest deviations from the measured data, particularly at very low and very high efficiency values, highlighting its limitations in modeling complex nonlinear environmental interactions. Overall, the comparison demonstrates that advanced machine learning approaches, particularly XGBoost and Random Forest, provide more robust and accurate PV efficiency predictions under varying environmental conditions. ANN also performs effectively in capturing nonlinear interactions between climatic variables and PV efficiency, whereas Linear Regression remains comparatively weaker due to its simplified linear structure. Environmental parameters such as solar irradiation and module temperature are identified as the dominant factors influencing PV efficiency, while humidity, pressure, and wind speed contribute secondary effects. 
These findings provide valuable insight for improving PV system optimization and enhancing predictive model development under real-world climatic conditions.

3.1.1. Time Series and Seasonal Effect Analysis

Hourly, daily, and seasonal PV data were analyzed to uncover trends, cycles, and autocorrelation patterns. Understanding these temporal variations is essential for evaluating model reliability under short-term fluctuations and seasonal shifts, particularly for complex models such as ANN and XGBoost.
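One simple way to quantify the daily cycle and autocorrelation mentioned above is the lagged autocorrelation of the hourly series. The sketch below uses NumPy on a synthetic 60-day hourly signal; the signal shape (a clipped sine for daylight hours with a slow seasonal modulation) and the noise level are assumptions, not the measured data.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(1)
hours = np.arange(24 * 60)  # 60 days of hourly samples

# Daylight-only half sine (zero at night), peaking around midday.
daily = np.clip(np.sin(2 * np.pi * (hours % 24 - 6) / 24), 0, None)
# Slow seasonal modulation over a one-year period.
seasonal = 1 + 0.3 * np.sin(2 * np.pi * hours / (24 * 365))
series = daily * seasonal + rng.normal(0, 0.05, hours.size)

r24 = autocorr(series, 24)  # same hour on the next day
r12 = autocorr(series, 12)  # day versus night
daily_mean = series.reshape(-1, 24).mean(axis=1)  # hourly -> daily aggregation

print(f"lag-24 autocorrelation = {r24:.3f}")
print(f"lag-12 autocorrelation = {r12:.3f}")
print("number of daily means:", daily_mean.size)
```

A strong positive value at lag 24 and a negative value at lag 12 is the signature of a diurnal cycle; the same function applied to the daily means would expose slower seasonal structure.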
Figure 4 presents the daily and monthly variations in PV efficiency alongside solar irradiation. Peaks and troughs reveal clear seasonal effects, showing that energy output fluctuates significantly with changing sunlight and time of year. This confirms that models must account for seasonal variability to accurately predict PV performance.
To further interpret the influence of temporal variability, the role of seasonality in photovoltaic (PV) performance was also considered. PV systems are inherently sensitive to seasonal changes due to variations in solar irradiance, ambient temperature, and atmospheric conditions throughout the year. These variations directly affect both energy production and conversion efficiency. The observed results indicate that PV efficiency tends to increase during periods of higher solar availability and decreases during lower irradiation and unfavorable weather conditions. This confirms that seasonal dynamics are a significant factor in accurately characterizing PV system behavior.
From a modeling perspective, the machine learning algorithms used in this study, particularly XGBoost, Random Forest, and Artificial Neural Networks, are capable of capturing nonlinear relationships between environmental variables and PV efficiency. However, their ability to generalize across climatic regions depends on the diversity of the training dataset. Therefore, while the developed models perform reliably under the climatic conditions of the studied location, applying them directly to regions with significantly different seasonal patterns may require retraining or enriching the dataset with multi-seasonal and multi-location data to ensure robustness and transferability.
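As a minimal sketch of the temporal aggregation and autocorrelation checks described in this subsection, the following Python example builds a synthetic hourly irradiance series, aggregates it to daily and monthly means, and computes the sample autocorrelation at 12 h and 24 h lags. The synthetic series, the helper name autocorr, and all numerical settings are assumptions made for illustration; they are not the KNMI data or the code used in this study.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of a 1-D series at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Synthetic hourly irradiance for one non-leap year (illustrative only):
# a clipped diurnal sine, scaled by a seasonal envelope, plus small noise.
rng = np.random.default_rng(0)
hours = np.arange(365 * 24)
day = hours // 24
diurnal = np.clip(np.sin(2 * np.pi * ((hours % 24) - 6) / 24), 0.0, None)
seasonal = 0.6 + 0.4 * np.sin(2 * np.pi * (day - 80) / 365)
irradiance = 1000.0 * diurnal * seasonal + rng.normal(0.0, 20.0, hours.size)

# Daily means: one row per day, 24 hourly values per row.
daily_mean = irradiance.reshape(365, 24).mean(axis=1)

# Monthly means, using calendar month lengths to slice the daily series.
month_days = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
edges = np.cumsum([0] + month_days)
monthly_mean = np.array(
    [daily_mean[a:b].mean() for a, b in zip(edges[:-1], edges[1:])]
)

acf_24h = autocorr(irradiance, 24)  # same hour on the next day
acf_12h = autocorr(irradiance, 12)  # daytime compared with night-time

print(f"lag-24h ACF = {acf_24h:.2f}, lag-12h ACF = {acf_12h:.2f}")
print("month with highest mean irradiance:", int(np.argmax(monthly_mean)) + 1)
```

In this toy series the 24 h lag is strongly positive and the 12 h lag is negative, which is the signature of a diurnal cycle; the monthly means expose the seasonal envelope. With measured data, the same two steps would be applied to the recorded hourly series instead of the synthetic one.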

3.1.2. Higher-Order Nonlinear Interactions

To capture complex relationships among environmental variables, Polynomial Regression and ANN models were applied, focusing on interactions such as Temperature × Humidity × Solar Irradiation. These higher-order interactions are particularly important under extreme conditions, where linear approximations fail to represent the true behavior of PV systems. We used a feedforward ANN with two hidden layers (16 and 8 neurons) and ReLU activation to visualize these effects.
Figure 5 shows a 3D interaction surface of PV efficiency as predicted by the ANN, with Module Temperature and Solar Irradiation on the axes. Wind Speed, Air Temperature, Atmospheric Pressure, and Humidity were held at their mean values. The surface clearly indicates that efficiency rises with increasing solar irradiation, as expected. However, at higher module temperatures, efficiency starts to drop, illustrating the thermal sensitivity of silicon-based PV panels. This highlights that extreme heat can significantly reduce output even under strong sunlight.
Figure 6 provides an enhanced visualization of the same ANN-predicted surface. The color gradient makes it easier to see efficiency levels across the temperature and irradiation spectrum. Bright colors indicate high efficiency, which occurs at moderate module temperatures and high solar irradiation, while darker colors show reduced performance at elevated temperatures.
From these surfaces, it becomes evident that the interaction between temperature and solar irradiation is nonlinear and critical for accurate prediction. Simple linear models would fail to capture this nuance, emphasizing why ANN and higher-order methods are needed. For PV system optimization, these insights suggest that controlling module temperature (through cooling or ventilation) can meaningfully enhance efficiency during periods of high solar irradiation.
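The surfaces in Figures 5 and 6 rest on a simple construction: two inputs are varied over a grid while all remaining inputs are fixed at their mean values, and the model is evaluated at every grid point. A minimal Python sketch of that construction is given below. The function names, the synthetic input ranges, and the toy_efficiency function are hypothetical; toy_efficiency is a stand-in for a fitted model and is not the ANN trained in this study (its temperature coefficient of 0.4% per °C is an assumed, typical order of magnitude for silicon modules).

```python
import numpy as np

def interaction_surface(predict_fn, X, i, j, n=25):
    """Predict on an n x n grid over columns i and j of X, with every
    other column fixed at its sample mean."""
    X = np.asarray(X, dtype=float)
    gi = np.linspace(X[:, i].min(), X[:, i].max(), n)
    gj = np.linspace(X[:, j].min(), X[:, j].max(), n)
    A, B = np.meshgrid(gi, gj, indexing="ij")
    grid = np.tile(X.mean(axis=0), (n * n, 1))  # all features at their mean
    grid[:, i] = A.ravel()                      # overwrite the two varied features
    grid[:, j] = B.ravel()
    Z = np.asarray(predict_fn(grid), dtype=float).reshape(n, n)
    return gi, gj, Z

def toy_efficiency(X):
    """Hypothetical stand-in for a fitted model's predict method."""
    G, Tm = X[:, 0], X[:, 1]
    return 18.0 * (G / (G + 50.0)) * (1.0 - 0.004 * (Tm - 25.0))

# Synthetic inputs (assumed ranges, for illustration only).
rng = np.random.default_rng(1)
X = np.column_stack([
    rng.uniform(100, 1000, 500),   # solar irradiation, W/m^2
    rng.uniform(10, 65, 500),      # module temperature, deg C
    rng.uniform(0, 35, 500),       # air temperature, deg C
    rng.uniform(30, 95, 500),      # relative humidity, %
    rng.uniform(0, 12, 500),       # wind speed, m/s
    rng.uniform(990, 1030, 500),   # atmospheric pressure, hPa
])

G_axis, T_axis, Z = interaction_surface(toy_efficiency, X, i=0, j=1)
print(Z.shape, float(Z.min()), float(Z.max()))
```

Any fitted regressor can be substituted by passing its prediction function as predict_fn. Z[a, b] is the predicted efficiency at G_axis[a] and T_axis[b], so the array can be handed directly to a 3D surface or heat-map plotting routine.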

3.2. Feature Importance Validation Using Permutation Testing and Bootstrap

Permutation testing and bootstrap resampling were employed on both Random Forest (RF) and XGBoost models to evaluate the robustness of environmental feature rankings. In the permutation test, each input variable was randomly shuffled, and the resulting drop in R2 was used to quantify the importance of that feature. Bootstrap resampling was conducted over 1000 iterations, generating 95% confidence intervals to assess the stability and reliability of the feature rankings.
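The procedure described above can be sketched in a model-agnostic way. In the Python example below, permutation importance is the drop in R2 after shuffling one input column, and the bootstrap resamples the evaluation rows with replacement to obtain percentile-based 95% confidence intervals. Several choices here are assumptions rather than details taken from this study: the rows being resampled, the percentile method for the intervals, the synthetic data, and the use of a least-squares linear model as a dependency-free stand-in for the RF and XGBoost models.

```python
import numpy as np

def r2_score(y, y_hat):
    """Coefficient of determination."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def permutation_importance(predict_fn, X, y, rng):
    """Drop in R2 when each column of X is shuffled in turn."""
    base = r2_score(y, predict_fn(X))
    drops = np.empty(X.shape[1])
    for k in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, k] = rng.permutation(Xp[:, k])
        drops[k] = base - r2_score(y, predict_fn(Xp))
    return drops

def bootstrap_importance(predict_fn, X, y, n_boot=1000, seed=0):
    """Mean importance and percentile 95% CI over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    samples = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # rows drawn with replacement
        samples[b] = permutation_importance(predict_fn, X[idx], y[idx], rng)
    lo, hi = np.percentile(samples, [2.5, 97.5], axis=0)
    return samples.mean(axis=0), lo, hi

# Synthetic example: feature 0 matters most, feature 2 not at all.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=300)

# Stand-in model: ordinary least squares with an intercept.
A = np.column_stack([X, np.ones(len(X))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
predict = lambda Z: Z @ w[:-1] + w[-1]

mean_imp, ci_lo, ci_hi = bootstrap_importance(predict, X, y, n_boot=1000)
for k in range(X.shape[1]):
    print(f"feature {k}: {mean_imp[k]:.3f} [{ci_lo[k]:.3f}, {ci_hi[k]:.3f}]")
```

Non-overlapping intervals between two features indicate that their ranking is stable under resampling, which is the property the error bars are used to assess. For brevity the sketch evaluates the model on the same rows it was fitted on; with a real model, a held-out test set would normally be used.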
In Table 5, the feature importance analysis for both Random Forest (RF) and XGBoost models is presented using only environmental and operational input variables. For the RF model, solar irradiation emerges as the most influential feature, followed by module temperature and air temperature. Humidity shows a moderate effect, while wind speed and atmospheric pressure contribute less significantly. Similarly, the XGBoost model identifies solar irradiation as the dominant factor influencing PV efficiency. Module temperature and air temperature also have notable contributions, whereas humidity, wind speed, and pressure exhibit relatively smaller effects. The confidence intervals obtained from bootstrap resampling confirm that these rankings are stable and consistent across different samples, indicating the robustness of the model interpretations.
Figure 7 illustrates the feature importance rankings for RF and XGBoost models, with error bars representing the 95% confidence intervals obtained from bootstrap resampling. The figure clearly shows that solar irradiation and module temperature are the primary drivers of PV efficiency, followed by air temperature. Humidity has a moderate influence, while wind speed and atmospheric pressure have relatively minor effects. Smaller error bars associated with the top-ranked features indicate high confidence in their importance, whereas larger error bars for lower-ranked variables reflect greater uncertainty. This analysis highlights that optimizing PV system performance should primarily focus on maximizing solar irradiation and controlling module temperature, while other environmental variables play secondary roles.
Figure 7. Feature importance rankings for Random Forest and XGBoost models based on environmental input variables. Error bars represent 95% confidence intervals obtained from bootstrap resampling. Solar irradiation and temperature-related variables are identified as the most influential factors affecting PV efficiency, while humidity, wind speed, and pressure have secondary effects.

3.3. Linear Regression Analysis

In Figure 8a, the Linear Regression model’s predicted efficiency values are plotted against the actual measured efficiencies. Green dots represent the model’s predictions for each observation compared to the true values, while the black dashed line indicates perfect predictions, where predicted equals actual. Most points are clustered near this line, showing that the model generally predicts efficiency accurately. However, some deviations appear at very low or very high efficiency values, suggesting slightly larger errors for these outliers. Overall, the model provides reliable predictions across the typical range of efficiency values. Figure 8b illustrates the influence of environmental input parameters on PV efficiency based on Linear Regression coefficients. Solar irradiation is identified as the most influential factor, showing a strong positive relationship with PV efficiency. Module temperature and air temperature also contribute to the model, but their effect is weaker and partly negative, reflecting thermal losses in PV systems. Humidity shows a slight negative impact on efficiency, indicating reduced performance under moist conditions. Wind speed and atmospheric pressure have relatively small effects compared to other variables. The Linear Regression model confirms that solar irradiation is the dominant driver of PV efficiency, while temperature-related variables and humidity play secondary roles. The model provides a simplified but physically consistent interpretation of PV system behavior under varying environmental conditions.
Figure 8c shows the effect of ambient temperature on PV efficiency, with blue dots representing individual measurements and the red line indicating the linear regression fit. Efficiency is more variable at low to moderate temperatures, and the regression suggests a slight decrease at higher temperatures, indicating that extreme heat may slightly reduce performance. Figure 8d illustrates the effect of solar irradiance on efficiency, showing a clear positive correlation where higher irradiance consistently increases PV output. Individual observations are slightly scattered around the regression line, but the overall trends are clear. Comparing both panels highlights that solar irradiance is the dominant factor affecting PV efficiency, while ambient temperature has a smaller, less consistent impact, demonstrating that irradiance is critical for PV performance, whereas temperature plays a secondary role.
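Comparing the influence of inputs through regression coefficients is only meaningful when the inputs share a common scale. The Python sketch below shows one common way to do this: standardize the inputs and the target, then fit ordinary least squares. Standardization is an assumption made here, since the text does not state how the coefficients in Figure 8b were scaled, and the synthetic data and its coefficients are hypothetical.

```python
import numpy as np

def standardized_coefficients(X, y):
    """OLS coefficients after z-scoring inputs and target, so that
    magnitudes are comparable across features with different units."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    # Centered data, so no intercept column is needed.
    coef, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return coef

# Synthetic data with assumed (hypothetical) effects on efficiency.
rng = np.random.default_rng(7)
n = 500
irradiance = rng.uniform(100, 1000, n)   # W/m^2
module_temp = rng.uniform(10, 65, n)     # deg C
humidity = rng.uniform(30, 90, n)        # %
efficiency = (10.0 + 0.008 * irradiance - 0.05 * module_temp
              - 0.01 * humidity + rng.normal(0.0, 0.3, n))

X = np.column_stack([irradiance, module_temp, humidity])
coef = standardized_coefficients(X, efficiency)
for name, c in zip(["irradiance", "module_temp", "humidity"], coef):
    print(f"{name:12s} {c:+.3f}")
```

The sign of each standardized coefficient gives the direction of the effect, and its magnitude gives the change in efficiency (in standard deviations) per standard deviation of the input. Such coefficients remain a linear summary and do not capture the interaction effects discussed in Section 3.1.2.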

3.4. Model Comparison Across Methods

To avoid redundancy, the performance evaluation of all machine learning models is consolidated into a single comparative analysis. Figure 9 presents a unified comparison of Linear Regression, Artificial Neural Network, Random Forest, and XGBoost models for photovoltaic (PV) efficiency prediction under identical environmental conditions. The results indicate clear differences in model performance. XGBoost achieves the highest predictive accuracy and stability, demonstrating its superior ability to capture complex nonlinear relationships between environmental variables and PV efficiency. Random Forest follows as the second-best model, providing strong and consistent performance due to its ensemble structure. The Artificial Neural Network also performs well by effectively modeling nonlinear interactions, although its accuracy is slightly lower than tree-based ensemble methods. In contrast, Linear Regression shows the weakest performance, as it cannot adequately represent nonlinear dependencies inherent in PV system behavior. Across all models, solar irradiation is consistently identified as the most dominant factor influencing PV efficiency, followed by module temperature. These variables strongly determine energy conversion efficiency, while humidity and wind speed exhibit secondary effects related to environmental cooling and atmospheric conditions. Overall, the consolidated analysis clearly demonstrates that advanced machine learning methods, particularly XGBoost and Random Forest, provide the most reliable predictions for PV performance under real-world climatic variability.
The comparison of different machine learning models for predicting PV panel efficiency is illustrated in Figure 9. Figure 9a shows Linear Regression (LR), which is simple and interpretable but limited in capturing nonlinear behavior under extreme temperatures or very high solar irradiation. Figure 9b presents the Artificial Neural Network (ANN), which reaches an R2 of around 0.95 and captures both linear and nonlinear interactions between temperature, irradiance, humidity, and wind, closely matching measured efficiency values. Figure 9c depicts Random Forest (RF), which models nonlinear relationships and environmental interactions effectively, though it can slightly overfit if its hyperparameters are not optimized. Figure 9d shows XGBoost, which handles high-dimensional data and complex nonlinearities and provides highly accurate and robust predictions, albeit requiring careful tuning. From a physical perspective, the analysis reinforces known PV behavior: temperature negatively impacts efficiency, solar irradiance is the primary driver, humidity reduces output, and wind offers a minor cooling benefit. Taken together, the nonlinear models (XGBoost, RF, and ANN) deliver the most reliable predictions across diverse environmental conditions, with XGBoost the most accurate, while LR offers simpler but less precise insights. Overall, these models illustrate the balance between interpretability, complexity, and accuracy in PV performance modeling, demonstrating how advanced algorithms can optimize system design and operation under varying conditions.

3.5. Energy and Exergy Efficiency Analysis

Analyzing both energy and exergy efficiency in photovoltaic (PV) systems is essential for understanding not only the amount of electricity produced but also the quality of that energy and the inherent thermodynamic losses. Exergy efficiency, unlike energy efficiency, accounts for the quality of energy and the irreversibilities in the conversion process. Sensitivity analysis based on the Petela model shows that ambient temperature has only a minor effect on the exergy input through the reference temperature term (T0/Ts). While higher ambient temperatures reduce the theoretical exergy potential of solar radiation, their impact on the calculated exergy efficiency is relatively small compared to the effects of solar irradiance and module temperature. This indicates that exergy analysis provides critical insight into the true thermodynamic performance of PV systems, complementing traditional energy efficiency metrics. Furthermore, uncertainty analysis suggests that the calculated energy efficiencies have an estimated uncertainty of ±3–5%, confirming the reliability of these evaluations.
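The weak dependence of the exergy input on ambient temperature can be checked numerically. The Python sketch below evaluates the Petela factor, 1 - (4/3)(T0/Ts) + (1/3)(T0/Ts)^4, at two ambient temperatures. The solar temperature of 5777 K, the irradiance of 800 W/m2, and the module area of 1.6 m2 are assumed values chosen for illustration; the values used in this study may differ. The sketch covers only the exergy input, not the full exergy efficiency.

```python
T_SUN_K = 5777.0  # assumed effective solar surface temperature, K

def petela_factor(t0_k, ts_k=T_SUN_K):
    """Petela exergy-to-energy ratio of solar radiation for ambient
    (reference) temperature t0_k and solar temperature ts_k, in kelvin."""
    ratio = t0_k / ts_k
    return 1.0 - (4.0 / 3.0) * ratio + (1.0 / 3.0) * ratio ** 4

def solar_exergy_input(irradiance_w_m2, area_m2, t0_k, ts_k=T_SUN_K):
    """Exergy rate of the incident radiation on the module, in watts."""
    return irradiance_w_m2 * area_m2 * petela_factor(t0_k, ts_k)

f_25 = petela_factor(298.15)   # ambient 25 deg C
f_35 = petela_factor(308.15)   # ambient 35 deg C
relative_change = (f_25 - f_35) / f_25

# Hypothetical operating point: 800 W/m^2 on a 1.6 m^2 module at 25 deg C.
ex_in = solar_exergy_input(800.0, 1.6, 298.15)

print(f"Petela factor at 25 C: {f_25:.4f}")
print(f"Petela factor at 35 C: {f_35:.4f}")
print(f"relative change for +10 K: {100 * relative_change:.2f}%")
print(f"exergy input: {ex_in:.1f} W")
```

Because T0/Ts is only about 0.05, a 10 K rise in ambient temperature changes the factor by well under 1%, which is consistent with the minor sensitivity reported above.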
Figure 10 compares energy and exergy efficiencies, highlighting that most data points fall below the ideal 1:1 line. This clearly indicates that PV systems experience greater exergy losses than energy losses, reflecting unavoidable thermodynamic irreversibilities. The deviation from the ideal line increases at higher temperatures (40 °C and 55 °C), demonstrating that exergy losses are more pronounced under extreme conditions. While systems with high energy efficiency generally also show high exergy efficiency, exergy remains consistently lower, emphasizing the necessity of incorporating exergy analysis to fully assess PV system performance.
Figure 11 presents a comparison of actual exergy efficiencies with XGBoost predictions. The data points cluster closely around the 45° line, demonstrating the high accuracy of the XGBoost model. Minor deviations exist but do not significantly affect overall performance. This confirms that XGBoost effectively captures the complex, nonlinear interactions between environmental variables and exergy efficiency, making it a valuable tool for predicting PV performance under varying conditions.
Figure 12 further explores the relationship between energy and exergy efficiency, with points colored according to module temperature (blue = 25 °C, orange = 40 °C, red = 55 °C). At lower temperatures, energy and exergy efficiencies are closely aligned, reflecting optimal system performance. As temperatures rise, the gap widens, with exergy losses becoming more pronounced. This illustrates the significant influence of temperature on both the quantity and quality of energy production, highlighting how thermodynamic inefficiencies intensify under extreme heat. XGBoost accurately models these complex relationships, revealing that temperature and solar irradiance are the dominant factors, while humidity and wind speed have secondary effects.
Table 6 shows strong positive correlations between energy production, environmental impact, and financial performance. Daily energy production correlates highly with lifetime energy (0.90), CO2 reduction (0.80), and income (0.75), indicating that improvements in PV efficiency directly enhance both environmental and financial outcomes. Lifetime energy and CO2 reduction also correlate strongly (0.85), reinforcing the long-term sustainability benefits of higher energy output. Although correlations with income are slightly lower, they remain positive, showing that efficient PV systems yield both ecological and economic advantages.
Looking at the graphs in Figure 13, the impact of PV panel efficiency on energy production, environmental benefits, and financial returns is clearly visible. Figure 13a shows a strong and nearly linear relationship between efficiency and daily energy production; as efficiency increases, daily energy output rises significantly, directly boosting instantaneous energy generation. Figure 13b illustrates the positive relationship between efficiency and lifetime energy production, although the data is more scattered, indicating that long-term energy output can vary depending on environmental and operational conditions. Figure 13c highlights a strong and nearly linear relationship between efficiency and CO2 reduction; more efficient panels generate more energy and consequently reduce carbon emissions to a greater extent, enhancing environmental benefits. Finally, Figure 13d shows a positive relationship between efficiency and income, though the increase is not linear and exhibits more fluctuation; while higher efficiency generally leads to higher revenue, market conditions and energy prices prevent this growth from being perfectly proportional. Taken together, these graphs clearly demonstrate that PV efficiency is a key parameter influencing energy production, environmental impact, and financial returns.
Figure 14 provides a pairwise analysis of all system parameters. Efficiency shows a strong positive correlation with daily energy, CO2 reduction, and lifetime energy, while income shows a moderately positive but more variable relationship. The scatterplots highlight that increases in efficiency generally lead to gains across environmental and financial metrics. Importantly, this visualization illustrates the multifaceted impact of improving PV efficiency: not only does it enhance energy production, but it also maximizes environmental benefits and financial returns, emphasizing the holistic value of high-performance PV systems.

4. Discussion

This study provides a comprehensive evaluation of a commercial silicon-based photovoltaic (PV) system by integrating energy and exergy analyses with machine learning-based predictive models. An uncertainty analysis confirmed that measured irradiance and electrical parameters yield reliable energy efficiency values, supporting the robustness of the dataset and the subsequent model predictions. The results clearly indicate that solar irradiance is the dominant factor driving PV efficiency and energy production, while high module temperatures and humidity reduce performance. Wind speed provides a modest cooling benefit, helping mitigate thermal losses [50,51]. Among the machine learning models, XGBoost emerged as the most accurate and reliable, achieving a Test R2 of 0.9967. Its superior performance is due to the gradient boosting approach, which sequentially optimizes decision trees to account for complex nonlinear interactions among environmental and system variables [52]. Random Forest and ANN also performed well, with Test R2 values of 0.9724 and 0.9612, respectively, but XGBoost consistently provided more precise predictions across the entire range of observed conditions. Linear Regression highlighted the limitations of assuming linear relationships in inherently nonlinear PV systems, while ensemble tree-based methods captured the complex dependencies effectively [53]. From a practical perspective, these findings suggest several recommendations for PV system design and operation. Optimizing module orientation and providing passive or active cooling can mitigate high-temperature losses, particularly in hot climates [54]. Wind exposure should be considered to enhance natural convective cooling, and areas with high and consistent solar irradiance should be prioritized to maximize energy output and CO2 reduction potential [55]. In regions with high humidity, protective measures such as anti-condensation coatings or panel covers can help maintain efficiency. 
Additionally, predictive models like XGBoost can be integrated into system monitoring to anticipate efficiency drops due to environmental fluctuations, enabling proactive maintenance and performance optimization [56]. The study also highlights the broader environmental and economic implications. Higher PV efficiency directly translates to greater daily and lifetime energy production, which in turn enhances CO2 emission reductions and potential revenue generation. In the correlation analysis (Table 6), daily energy production correlated strongly with lifetime energy (0.90) and CO2 reduction (0.80), lifetime energy correlated strongly with CO2 reduction (0.85), and income exhibited a positive, though slightly lower, correlation with daily energy production (0.75) [57]. These results emphasize that optimizing PV efficiency not only improves energy output but also maximizes environmental benefits and financial returns.
Finally, combining energy and exergy analyses with predictive modeling provides a holistic framework for understanding PV system performance. Exergy analysis revealed that solar irradiance and module temperature remain the dominant factors controlling the quality of energy conversion, while ambient temperature variations have a secondary effect [51]. Integrating these insights with machine learning predictions enables both reliable performance forecasting and evidence-based guidance for system optimization in real-world applications. In conclusion, XGBoost is recommended as the preferred predictive model for complex PV systems due to its high accuracy and ability to capture nonlinear relationships. The findings provide actionable guidance for PV system design, operational management, and policy planning, emphasizing the intertwined technical, environmental, and economic benefits of high-efficiency PV systems [52,53,54,55,56,57].

5. Conclusions

This study investigated the performance of a silicon-based photovoltaic (PV) system under varying environmental conditions using energy and exergy analysis and machine learning models. The results demonstrate that PV efficiency is primarily governed by solar irradiance and module temperature. High irradiance increases energy production and CO2 reduction, while elevated temperature and humidity reduce efficiency. Wind speed provides a minor cooling effect that slightly improves performance. Among the evaluated machine learning models, XGBoost achieved the highest predictive accuracy (R2 = 0.9967), followed by Random Forest (R2 = 0.9724) and Artificial Neural Networks (R2 = 0.9612). Linear and Polynomial Regression showed limited performance due to their inability to capture nonlinear relationships, while KNN and SVR provided moderate accuracy. The results confirm that all three research objectives have been achieved. The study (i) identified the influence of key environmental and electrical parameters on PV efficiency, (ii) compared multiple machine learning models and demonstrated the superiority of ensemble methods, particularly XGBoost, and (iii) provided practical insights for improving PV system design and operational efficiency under real-world climatic conditions. From a methodological perspective, the integrated framework combining energy and exergy analysis with machine learning proved to be robust and effective. However, further improvement could be achieved by extending the dataset to multiple geographic locations and incorporating long-term degradation effects, which would enhance model generalization and predictive reliability under diverse climatic conditions. The proposed approach provides a reliable framework for PV performance prediction and supports the development of more efficient and sustainable solar energy systems.

Author Contributions

Conceptualization, G.Ş. and E.A.; methodology, G.Ş.; software, G.Ş.; validation, G.Ş. and E.A.; formal analysis, G.Ş.; investigation, G.Ş.; resources, G.Ş.; data curation, G.Ş.; writing—original draft preparation, G.Ş.; writing—review and editing, G.Ş.; visualization, G.Ş.; supervision, G.Ş.; project administration, G.Ş.; funding acquisition, E.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this publication were obtained from meteorological data made available by KNMI (Royal Netherlands Meteorological Institute).

Acknowledgments

Only Python 3.12.3 (amd64) was used for the analysis and modeling. No other software, tools, or AI-based systems were utilized during the preparation of this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. World Health Organization (WHO). Air Pollution and Health; WHO: Geneva, Switzerland, 2021.
  2. Gernaat, D.E.H.J.; de Boer, H.S.; Daioglou, V.; Yalew, S.G.; Müller, C.; van Vuuren, D.P. Climate change impacts on renewable energy supply. Nat. Clim. Change 2021, 11, 119–125.
  3. REN21. Renewables Global Status Report 2024; REN21 Secretariat: Paris, France, 2024.
  4. Çengel, Y.A.; Boles, M.A. Thermodynamics: An Engineering Approach, 10th ed.; McGraw-Hill: New York, NY, USA, 2019.
  5. Nur, A.; Gure, B.; Rustemli, S.; Gure, O.B. Wind Speed Prediction in Microgrid Energy Management Using the Random Forest and Artificial Neural Networks Methods. Middle East J. Sci. 2025, 11, 276–289.
  6. IEA. Task 13: Soiling and Its Impact on PV Yield; IEA: Paris, France, 2024.
  7. Sahin, G.; Isik, G.; van Sark, W.G.J.H.M. Predictive modeling of PV solar power plant efficiency considering weather conditions: A comparative analysis of artificial neural networks and multiple linear regression. Energy Rep. 2023, 10, 2837–2849.
  8. Demir, S.; Sahin, E.K. An Investigation of Feature Selection Methods for Soil Liquefaction Prediction Based on Tree-Based Ensemble Algorithms Using AdaBoost, Gradient Boosting, and XGBoost. Neural Comput. Appl. 2023, 35, 3173–3190.
  9. IEA. Soiling and Cleaning Strategies; IEA: Paris, France, 2023.
  10. Rana, M.; Rahman, A. Multiple Steps Ahead Solar Photovoltaic Power Forecasting Based on Univariate Machine Learning Models and Data Re-sampling. Sustain. Energy Grids Netw. 2020, 21, 100286.
  11. Shukla, A.K.; Sudhakar, K.; Baredar, P.; Mamat, R. Solar PV and BIPV System: Barriers, Challenges and Policy Recommendations in India. Renew. Sustain. Energy Rev. 2018, 82, 3314–3322.
  12. Liu, W.; Shen, Y.; Aungkulanon, P.; Ghalandari, M.; Le, B.N.; Alviz-Meza, A.; Cárdenas-Escrocia, Y. Machine learning applications for photovoltaic system optimization in zero green energy buildings. Energy Rep. 2023, 9, 2787–2796.
  13. Asiedu, S.T.; Nyarko, F.K.A.; Boahen, S.; Effah, F.B.; Asaaga, B.A. Machine Learning Forecasting of Solar PV Production Using Single and Hybrid Models over Different Time Horizons. Heliyon 2024, 10, e28898.
  14. Şahin, G.; Levent, I.; Işık, G.; Sark, W.; Rustemli, S. Prediction and comparative analysis of rooftop PV solar energy efficiency considering indoor and outdoor parameters under real climate conditions factors with machine learning model. Comput. Model. Eng. Sci. 2025, 143, 1215–1248.
  15. Kumar, M.; Malik, P.; Chandel, R.; Chandel, S.S. Development of a novel solar PV module model for reliable power prediction under real outdoor conditions. Renew. Energy 2023, 217, 119224.
  16. Attou, A.; Massoum, A.; Chadli, M. Comparison Study of Two Tracking Methods for Photovoltaic Systems. Rev. Roum. Sci. Tech. Électrotech. Énergétique 2015, 60, 205–214.
  17. Royne, A.; Dey, C.J.; Mills, D.R. Cooling of Photovoltaic Cells. Sol. Energy Mater. Sol. Cells 2005, 86, 451–483.
  18. Ratshilengo, M.; Sigauke, C.; Bere, A. Short-term solar power forecasting using genetic algorithms: An application using South African data. Appl. Sci. 2021, 11, 4214.
  19. Kumar, D.; Chauhan, Y.K.; Pandey, A.S.; Srivastava, A.K.; Kumar, V.; Alsaif, F.; Elavarasan, R.M.; Islam, M.R.; Kannadasan, R.; Alsharif, M.H. A Novel Hybrid MPPT Approach for Solar PV Systems Using Particle-Swarm Optimization-Trained Machine Learning and Flying Squirrel Search Optimization. Sustainability 2023, 15, 5575.
  20. Luo, X.; Zhang, D.; Zhu, X. Deep learning based forecasting of photovoltaic power generation by incorporating domain knowledge. Energy 2021, 225, 120240.
  21. de Campos, B.N.; Maionchi, D.d.O.; da Silva, J.G.; Biudes, M.S.; Oliveira, N.N.d.; Palácios, R.d.S. Photovoltaic Energy Modeling Using Machine Learning Applied to Meteorological Variables. Sustainability 2025, 17, 7506.
  22. Green, M.A.; Dunlop, E.D.; Yoshita, M.; Kopidakis, N.; Bothe, K.; Siefer, G.; Hinken, D.; Rauer, M.; Hohl-Ebinger, J.; Hao, X. Solar Cell Efficiency Tables (Version 64). Prog. Photovolt. Res. Appl. 2024, 32, 425–441.
  23. Zhang, L.; Jánošík, D. Enhanced short-term load forecasting with hybrid machine learning models: CatBoost and XGBoost approaches. Expert Syst. Appl. 2024, 241, 122686.
  24. Kaya, F.; Sahin, G.; Alma, M.H. Investigation effects of environmental and operating factors on PV panel efficiency using by multivariate linear regression. Int. J. Energy Res. 2020, 45, 554–567.
  25. Rosen, M.A.; Dincer, I. Exergy as the Confluence of Energy, Environment and Sustainable Development. Exergy Int. J. 2001, 1, 3–13.
  26. Allal, Z.; Noura, H.N.; Salman, O.; Chahine, K. Leveraging the power of machine learning and data balancing techniques to evaluate stability in smart grids. Eng. Appl. Artif. Intell. 2024, 133, 108304.
  27. Zhang, D.; Gong, Y. The comparison of LightGBM and XGBoost coupling factor analysis and prediagnosis of acute liver failure. IEEE Access 2020, 8, 220990–221003.
  28. Kivambe, M.; Abdallah, A.; Figgis, B.; Scabbia, G. Assessing Vertical East-West Bifacial Photovoltaic Systems in Desert Environments: Energy Yield and Soiling Mitigation. Sol. Energy 2024, 279, 112835.
  29. Levent, I.; Sahin, G.; Isık, G.; van Sark, W.G.J.H.M. Comparative Analysis of Advanced Machine Learning Regression Models with Advanced Artificial Intelligence Techniques to Predict Rooftop PV Solar Power Plant Efficiency Using Indoor Solar Panel Parameters. Appl. Sci. 2025, 15, 3320.
  30. Ellabban, O.; Abu-Rub, H.; Blaabjerg, F. Renewable Energy Resources: Current Status, Future Prospects, and Their Enabling Technology. Renew. Sustain. Energy Rev. 2014, 39, 748–764.
  31. Wang, H.; Yi, H.; Peng, J.; Wang, G.; Liu, Y.; Jiang, H.; Liu, W. Deterministic and probabilistic forecasting of photovoltaic power based on deep convolutional neural network. Energy Convers. Manag. 2017, 153, 409–422.
  32. Behera, M.K.; Majumder, I.; Nayak, N. Solar photovoltaic power forecasting using optimized modified extreme learning machine technique. Eng. Sci. Technol. Int. J. 2018, 21, 428–438.
  33. Driesse, A.; Theristis, M.; Stein, J.S. A New Photovoltaic Module Efficiency Model for Energy Prediction and Rating. IEEE J. Photovolt. 2021, 11, 527–534.
  34. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  35. Balachandran, G.B.; Devisridhivyadharshini, M.; Ramachandran, M.E. Comparative Investigation of Imaging Techniques, Pre-processing, and Visual Fault Diagnosis Using Artificial Intelligence Models for Solar Photovoltaic System—A Comprehensive Review. Measurement 2024, 232, 114683.
  36. Kim, J.Y.; Shin, U.H.; Kim, K. Predicting biomass composition and operating conditions in fluidized bed biomass gasifiers: An automated machine learning approach combined with cooperative game theory. Energy 2023, 280, 128138.
  37. Zafar, M.R.; Khan, N. Deterministic local interpretable model-agnostic explanations for stable explainability. Mach. Learn. Knowl. Extr. 2021, 3, 525–541.
  38. Hashemi, B.; Taheri, S.; Cretu, A.M. Systematic analysis and computational intelligence based modeling of photovoltaic power generation in snow conditions. IEEE J. Photovolt. 2022, 12, 406–420.
  39. Mai, C.; Zhang, L.; Hu, X. Combining Dynamic Adaptive Snake Algorithm with Perturbation and Observation for MPPT in PV Systems Under Shading Conditions. Appl. Soft Comput. 2024, 162, 111822.
  40. Blaifi, S.-A.; Mellit, A.; Taghezouit, B.; Moulahoum, S.; Hafdaoui, H. A Simple Non-Parametric Model for Photovoltaic Output Power Prediction. Renew. Energy 2025, 240, 122183.
  41. Tripathi, A.K.; Aruna, M.; Elumalai, P.V.; Karthik, K.; Khan, S.A.; Asif, M.; Rao, K.S. Advancing Solar PV Panel Power Prediction: A Comparative Machine Learning Approach in Fluctuating Environmental Conditions. Case Stud. Therm. Eng. 2024, 59, 104459.
  42. Abdelsattar, M.; AIsmeil, M.; Menoufi, K.; AbdelMoety, A.; Emad-Eldeen, A. Evaluating Machine Learning and Deep Learning Models for Predicting Wind Turbine Power Output from Environmental Factors. PLoS ONE 2025, 20, e0317619.
  43. Thangavelu, M.; Parthiban, V.J.; Kesavaraman, D.; Murugesan, T. Forecasting of Solar Radiation for a Cleaner Environment Using Robust Machine Learning Techniques. Environ. Sci. Pollut. Res. 2023, 30, 30919–30932.
  44. Intergovernmental Panel on Climate Change (IPCC). Climate Change 2022: Mitigation of Climate Change; Cambridge University Press: Cambridge, UK, 2022.
  45. Saidur, R.; BoroumandJazi, G.; Mekhlif, S.; Jameel, M. Exergy Analysis of Solar Energy Applications. Renew. Sustain. Energy Rev. 2012, 16, 350–356.
  46. Kalogirou, S.A. Solar Energy Engineering: Processes and Systems, 2nd ed.; Elsevier: Amsterdam, The Netherlands, 2014.
  47. Jain, P.; Islam, M.T.; Alshammari, A.S. Comparative Analysis of Machine Learning Techniques for Metamaterial Absorber Performance in Terahertz Applications. Alex. Eng. J. 2024, 103, 51–59.
  48. Tsanakas, I.A.; Pilat, E. Hybrid Data-Driven Modeling and Prediction of Photovoltaic Soiling Losses. Sol. RRL 2025, 9, E202500576.
  49. Esram, T.; Chapman, P.L. Comparison of Photovoltaic Array Maximum Power Point Trac-king Techniques. IEEE Trans. Energy Convers. 2007, 22, 439–449. [Google Scholar] [CrossRef]
  50. Oyarzún-Aravena, A.M.; Chen, J.; Brownbridge, G.; Akroyd, J.; Kraft, M. An analysis of renewable energy resources and options for the energy transition in Chile. Appl. Energy 2025, 381, 125107. [Google Scholar] [CrossRef]
  51. Ndiaye, E.H.M.; Ndiaye, A.; Faye MGueye, D.; Ba, A.; Traore, M. Analysis of the impact of irradiance and temperature on photovoltaic production: A statistical and machine learning approach. MethodsX 2025, 15, 103716. [Google Scholar] [CrossRef]
  52. Al Kez, D.; Foley, A.M.; Wong, F.B.M.H.; Dolfi, A.; Srinivasan, G. AI-driven cooling technologies for high-performance data centres: State-of-the-art review and future directions. Sustain. Energy Technol. Assess. 2025, 82, 104511. [Google Scholar] [CrossRef]
  53. Bamisile, O.; Acen, C.; Cai, D.; Huang, Q.; Staffell, I. The environmental factors affecting solar photovoltaic output. Renew. Sustain. Energy Rev. 2025, 208, 115073. [Google Scholar] [CrossRef]
  54. Zhang, F.; Wen, Y.; Jiang, M.; Liu, T.; Li, X. Wind speed profile model of the desert photovoltaic arrays. J. Wind. Eng. Ind. Aerodyn. 2025, 267, 106239. [Google Scholar] [CrossRef]
  55. Yin, Z.; Lu, X.; Nielsen, C.P.; Cui, R.Y.; Ou, Y.; Han, M.; Shi, M.; Ruan, Z.; Wang, J.; Su, Y.; et al. Mitigating inequity risks in China’s net-zero energy transition via an enhanced renewable-guided industrial spatial reconfiguration. Innovation 2026, 7, 1308. [Google Scholar] [CrossRef]
  56. Rashid, A.B.; Karim Kausik, M.A. AI revolutionizing industries worldwide: A comprehensive overview of its diverse applications. Hybrid Adv. 2024, 7, 100277. [Google Scholar] [CrossRef]
  57. Sebestyén, V. Renewable and Sustainable Energy Reviews: Environmental impact networks of renewable energy power plants. Renew. Sustain. Energy Rev. 2021, 151, 111626. [Google Scholar] [CrossRef]
Figure 1. Photovoltaic system installation site used for data acquisition in Kars Sarıkamış.
Figure 2. (a) Predicted versus actual PV efficiency for the four machine learning models (Linear Regression, Artificial Neural Network, Random Forest, and XGBoost); (b) parameter effects based on the XGBoost model.
Figure 3. (a) Predicted and actual PV efficiency over time for all models; (b) comparison of predicted and actual PV efficiency for all models.
Figure 4. Daily and Monthly Trends of PV Efficiency and Solar Irradiation.
Figure 5. 3D Interaction Surface (Module Temperature × Solar Irradiation).
Figure 6. ANN-Predicted PV Efficiency Surface (Enhanced Visualization).
Figure 7. Feature Importance Rankings using Permutation + Bootstrap.
Figure 8. (a) Comparison of measured and predicted PV efficiency based on the Linear Regression model over the monthly period. (b) Influence coefficients of environmental input parameters on PV efficiency obtained from the Linear Regression model. (c) Relationship between ambient temperature and PV efficiency, including the linear regression trend line. (d) Relationship between solar irradiance and PV efficiency, showing the positive linear trend between irradiance and system performance.
Figure 9. Comparison of predicted and actual photovoltaic (PV) efficiency for different machine learning models including (a) Linear Regression, (b) Artificial Neural Network, (c) Random Forest and (d) XGBoost.
Figure 10. Comparison of Energy and Exergy Efficiencies.
Figure 11. Comparison of XGBoost-predicted and actual exergy efficiencies.
Figure 12. Energy and Exergy Efficiency Representation with Temperature Colors.
Figure 13. (a) Relationship between PV efficiency and daily energy production, (b) relationship between PV efficiency and lifetime energy production, (c) relationship between PV efficiency and CO2 reduction, (d) relationship between PV efficiency and income.
Figure 14. Pairwise relationship between parameters.
Table 1. Internal features of the measured panel.
| Feature | Value or range |
| --- | --- |
| Open Circuit Voltage (Voc) [V] | 38.10 |
| Short Circuit Current (Isc) [A] | 14.07 |
| Maximum Power (Pmpp) [W] | 420 |
| Solar Irradiation Spread [m2] | 0.000616–0.008284 |
| Maximum Voltage (Vmpp) [V] | 31.50 |
| Maximum Current (Impp) [A] | 13.34 |
| Fill Factor (FF) [%] | 50–75 |
| Parallel Resistance (Rp) [Ohm] | 0.18–1.20 |
| Series Resistance (Rs) [Ohm] | 0.18–0.97 |
| Module Temperature [°C] | 25 |
| Efficiency [%] | 5–20 |
Table 2. Dataset Description.
| Item | Description |
| --- | --- |
| Location | Kars Sarıkamış, Turkey |
| Collection period | 2023–2024 |
| Sampling frequency | Hourly measurements |
| Total samples | A one-month data set |
| Input variables | Solar irradiation, temperature, humidity, wind speed, pressure |
| Output variables | PV efficiency, CO2 reduction, income |
| Missing data handling | Linear interpolation + outlier removal |
| Train/Test split | 80%/20% |
| Cross-validation | 5-fold CV |
| Normalization | Min–Max scaling |
A detailed description of model hyperparameters and dataset characteristics is provided in Section 2.7 to ensure reproducibility.
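The preprocessing steps listed in Table 2 (Min–Max scaling, an 80%/20% split, and 5-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the sample records, the function names, the random shuffle before splitting, and the choice to fit the scaling bounds on the training portion only are all assumptions made here.

```python
import random

def min_max_fit(rows):
    """Return per-column (min, max) bounds computed from the given rows."""
    return [(min(col), max(col)) for col in zip(*rows)]

def min_max_transform(rows, bounds):
    """Scale each column with the given bounds; constant columns map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle row indices (an assumption) and hold out test_fraction of them."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(rows) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

def k_fold_indices(n, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

# Hypothetical hourly records: [irradiance, module temperature, humidity]
data = [[100.0 + 35.0 * i, 10.0 + 1.5 * i, 80.0 - 2.0 * i] for i in range(20)]

train, test = train_test_split(data, test_fraction=0.2, seed=42)
bounds = min_max_fit(train)                  # bounds come from training rows only
train_scaled = min_max_transform(train, bounds)
test_scaled = min_max_transform(test, bounds)
folds = list(k_fold_indices(len(train_scaled), k=5))
```

Because the bounds are fitted on the training rows alone, scaled test values may fall slightly outside [0, 1]; that is expected and avoids leaking test-set statistics into training. For hourly time series, a chronological split may be preferable to the random shuffle assumed here.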
Table 3. Model performance metrics.
| Model | R2 | RMSE | MAE | MSE |
| --- | --- | --- | --- | --- |
| XGBoost | 0.9967 | 0.0943 | 0.0067 | 0.0089 |
| Random Forest | 0.9724 | 0.1058 | 0.0095 | 0.0112 |
| ANN | 0.9612 | 0.013 | 0.010 | 0.012 |
| Linear Regression | 0.4777 | 0.1297 | 0.0123 | 0.0168 |
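For readers who want to reproduce the kind of metrics reported in Table 3, the sketch below computes R2, RMSE, MAE, and MSE from their standard definitions. It is an illustrative helper, not the authors' evaluation code; the function name and the sample values are hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute R2, RMSE, MAE, and MSE from paired true and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot                     # undefined if y_true is constant
    return {"R2": r2, "RMSE": math.sqrt(mse), "MAE": mae, "MSE": mse}

# Hypothetical efficiency values (%), for illustration only
y_true = [10.0, 12.0, 14.0, 16.0, 18.0]
y_pred = [10.5, 11.5, 14.0, 16.5, 17.5]
metrics = regression_metrics(y_true, y_pred)
```

Since RMSE is by definition the square root of MSE, and MAE can never exceed RMSE, these relations offer a quick consistency check on any reported row; for the XGBoost row, 0.0943 squared is approximately 0.0089.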
Table 4. Effects of parameters on PV efficiency (XGBoost).
| Rank | Parameter | Effect Coefficient |
| --- | --- | --- |
| 5 | Pressure | 0.070400 |
| 4 | Air Temp | 0.062560 |
| 2 | Module Temp | 0.052237 |
| 1 | Solar Irradiation | 0.049422 |
| 3 | Wind Speed | 0.011316 |
| 7 | Lifetime Energy (mWh) | 0.010462 |
| 6 | Humidity | 0.008333 |
| 8 | CO2 Reduction (kt) | 0.596283 |
| 9 | Income (USD) | 0.138987 |
Table 5. Feature Importance Rankings with 95% Confidence Intervals.
| Feature | RF Importance | RF 95% CI | XGBoost Importance | XGBoost 95% CI |
| --- | --- | --- | --- | --- |
| Solar Irradiation | 0.28 | [0.10–0.52] | 0.31 | [0.08–0.60] |
| Module Temperature | 0.22 | [0.08–0.41] | 0.26 | [0.05–0.48] |
| Air Temperature | 0.18 | [0.06–0.35] | 0.21 | [0.04–0.40] |
| Humidity | 0.12 | [0.03–0.28] | 0.09 | [0.02–0.22] |
| Wind Speed | 0.10 | [0.02–0.25] | 0.08 | [0.01–0.20] |
| Pressure | 0.10 | [0.02–0.22] | 0.05 | [0.01–0.15] |
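One way importance scores with 95% confidence intervals of this kind could be obtained is by combining permutation importance with bootstrap resampling, as the caption of Figure 7 suggests. The sketch below is a generic illustration under stated assumptions, not the authors' procedure: the `predict` function is a hypothetical stand-in for a trained model, the data are synthetic, importance is measured as the raw increase in MSE after shuffling one column, and the interval is a simple percentile interval.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, col, rng):
    """Increase in MSE after shuffling the values of one feature column."""
    baseline = mse(y, [predict(row) for row in X])
    shuffled = [row[col] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
    return mse(y, [predict(row) for row in X_perm]) - baseline

def bootstrap_importance(predict, X, y, col, n_boot=200, seed=0):
    """Mean importance and a 95% percentile interval over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(X)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample rows with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        scores.append(permutation_importance(predict, Xb, yb, col, rng))
    scores.sort()
    lo = scores[int(0.025 * (n_boot - 1))]
    hi = scores[int(0.975 * (n_boot - 1))]
    return sum(scores) / n_boot, (lo, hi)

# Hypothetical stand-in for a trained model: feature 0 matters, feature 1 is ignored
def predict(row):
    return 2.0 * row[0]

data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(100)]
y = [2.0 * row[0] for row in X]

imp0, ci0 = bootstrap_importance(predict, X, y, col=0)
imp1, ci1 = bootstrap_importance(predict, X, y, col=1)
```

In this toy setup the ignored feature receives an importance of exactly zero, while the informative one receives a clearly positive score. The values in Table 5 sum to 1 within each model, which suggests they were normalized; this sketch returns unnormalized scores, so a normalization step would be an additional assumption.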
Table 6. Correlation Matrix of PV System Performance Metrics.
| | Daily Energy (kWh) | Lifetime Energy (kWh) | CO2 Reduction (kg) | Income (USD) |
| --- | --- | --- | --- | --- |
| Daily Energy (kWh) | 1.00 | 0.90 | 0.80 | 0.75 |
| Lifetime Energy (kWh) | 0.90 | 1.00 | 0.85 | 0.70 |
| CO2 Reduction (kg) | 0.80 | 0.85 | 1.00 | 0.65 |
| Income (USD) | 0.75 | 0.70 | 0.65 | 1.00 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Cite as: Şahin, G.; Akin, E. Predictive Energy and Exergy Assessment of Photovoltaic Systems Under Dynamic Environmental Conditions Using Machine Learning. Appl. Sci. 2026, 16, 5049. https://doi.org/10.3390/app16105049
