Article

Machine Learning Prediction of Thermal Losses in MonoPERC Solar Modules: A Novel Clustering Approach for Tropical Climate Applications

1 Faculty of Engineering, Fundación Universitaria Los Libertadores, Bogotá 111211, Colombia
2 Electric and Electronics Department, Universidad Nacional de Colombia, Bogotá 111311, Colombia
* Author to whom correspondence should be addressed.
Energies 2025, 18(22), 6029; https://doi.org/10.3390/en18226029
Submission received: 2 October 2025 / Revised: 29 October 2025 / Accepted: 31 October 2025 / Published: 18 November 2025
(This article belongs to the Section A2: Solar Energy and Photovoltaic Systems)

Abstract

Thermal losses significantly reduce the efficiency of photovoltaic modules, particularly under the high-temperature and variable cloud-cover conditions of tropical climates. This study presents a novel thermal clustering methodology for predicting thermal losses in Monocrystalline Passivated Emitter and Rear Cell (MonoPERC) solar modules. Seven machine learning algorithms were evaluated under two approaches, a baseline approach and a thermal clustering approach, with the aim of improving energy yield forecasting and understanding the behavior of PERC modules. The clustering methodology partitions the data into distinct thermal regimes, enabling specialized model training for each temperature operating condition. Without clustering, K-Nearest Neighbors (KNN) was the best model, achieving a correlation of 0.9612 and a mean prediction error of 7.3 W. With the thermal clustering method, the Multi-Layer Perceptron (MLP) was the top performer, with a correlation of 0.9561 and a normalized mean absolute error (NMAE) of 0.1409. Ensemble methods such as XGBoost and Random Forest were also highly effective, whereas linear methods proved inadequate. The results indicate that regime-specific training through thermal clustering can improve thermal loss prediction, with the clearest gain observed for the MLP.

1. Introduction

Climate change is one of the most urgent challenges of the 21st century [1]. Driven largely by the relentless combustion of fossil fuels, the surge in greenhouse gas emissions has intensified the global push for cleaner, more sustainable energy alternatives [2]. Amid this transition, renewable energy sources have experienced unprecedented growth, with solar photovoltaic (PV) technologies at the forefront, offering a scalable and increasingly cost-effective pathway to decarbonization of the energy sector [3].
Photovoltaic solar energy has exhibited exponential growth in recent years, establishing itself as one of the most promising renewable technologies for addressing the climate crisis while meeting the rising global energy demand [4]. Nonetheless, one of the principal challenges associated with photovoltaic technology lies in achieving optimal conversion efficiency of photovoltaic modules, which is markedly influenced by various environmental factors, among which temperature is especially critical [5]. Under real operating conditions, photovoltaic modules typically convert only 15% to 20% of incident solar radiation into electrical energy, while the majority is converted into heat, thereby increasing the module’s temperature [6,7,8].
Previous research has examined how wind flow and temperature distribution affect photovoltaic modules. Studies [9,10,11] report that airflow around a photovoltaic system plays a pivotal role in its overall performance because it governs the thermal dissipation of the modules. Thermal losses in photovoltaic modules, particularly polycrystalline and monocrystalline ones, have been extensively analyzed in prior studies and directly affect system efficiency and energy yield [12,13].
Manufacturers provide temperature coefficients based on performance evaluations across various temperature ranges. Commercial crystalline silicon (c-Si) cells generally experience an efficiency loss of approximately 0.45% per °C increase in temperature, while amorphous silicon (a-Si) cells are less sensitive to temperature, with efficiency losses of around 0.25% per °C [14]. This temperature-induced performance degradation is particularly significant in tropical regions such as Colombia, where elevated ambient temperatures and intense solar irradiance can drive module operating temperatures substantially higher.
Despite significant technological advances in photovoltaic systems, effective thermal management remains a major obstacle to maximizing energy conversion efficiency and extending the operational lifespan of these systems. Recent studies indicate that the operating temperature of a solar cell can rise 20% to 40% above ambient levels, resulting in power losses of up to 25% [15,16]. Furthermore, elevated temperatures not only reduce instantaneous power output but also accelerate aging mechanisms, potentially shortening system life by up to 10% for every 10 °C of sustained operation above standard test conditions [17].
The development of advanced cell architectures, particularly Passivated Emitter and Rear Cell (PERC) technology introduced commercially around 2015, has marked a significant advancement. PERC panels have become the industry standard due to several key advantages, most notably higher conversion efficiencies, with commercial modules typically achieving 19–22% efficiency compared to 15–18% for conventional technologies [18,19]. However, even these improved modules remain vulnerable to performance degradation under high-temperature conditions, especially in tropical climates where elevated ambient temperatures and intense solar irradiance create challenging operating environments.
In this context, the application of machine learning and artificial intelligence techniques in photovoltaic performance prediction has gained significant momentum in recent years, with researchers exploring algorithms such as neural networks, support vector machines, and ensemble methods to forecast solar panel output under different environmental conditions [20,21,22,23,24]. While recent studies such as that of Asiedu et al. [25] have shown the effectiveness of artificial intelligence in predicting PV output, their regression visualizations exhibit certain limitations, including skewed data distributions, absence of uncertainty representation, and a lack of distinction between training and test datasets. These limitations reduce the interpretability and generalizability of the model predictions under high-temperature conditions. Other researchers have explored thermal modeling of PV modules [6] and investigated performance under real conditions [8], yet most studies apply monolithic modeling approaches that fail to capture the non-linear and regime-dependent nature of thermal losses, particularly under high-irradiance, high-temperature conditions where predictive accuracy is most critical.
To address these research gaps, this study proposes a novel thermal clustering methodology for predicting thermal losses in MonoPERC solar modules under real outdoor conditions. This work investigates thermal loss mechanisms using machine learning models to assess how module operating temperature affects energy performance, while evaluating model robustness across multiple data-splitting scenarios. The main contributions include (1) a K-means-based clustering approach that partitions operational data into three distinct temperature regimes (low: 10–25 °C, medium: 25–40 °C, and high: 40–53 °C) to account for non-linear thermal behavior; (2) a comprehensive evaluation of seven machine learning algorithms under both baseline and cluster-enhanced frameworks; (3) implementation and validation using high-resolution experimental data collected from a PV system at the RADIANT laboratory of Fundación Universitaria Los Libertadores in Bogotá, Colombia (4.65° N, 74.07° W, 2640 m above sea level), providing relevant insights for high-altitude tropical urban climates; and (4) demonstration of how regime-specific modeling significantly improves prediction accuracy, enabling better energy yield forecasting and system optimization.
This paper is organized as follows: Section 2 describes the experimental setup, the data collection methodology, the seven machine learning algorithms evaluated, and the proposed thermal clustering approach. Section 3 presents the baseline performance results, the effect of thermal clustering, and a visual analysis of prediction accuracy across all algorithms, and discusses the practical implications of the findings for photovoltaic system optimization in tropical climates. Finally, Section 4 summarizes the key conclusions and outlines future research directions for thermal loss prediction in PERC technology.

2. Materials and Methods

2.1. Experimental Setup and Data Source

A photovoltaic (PV) system was built with a modular, distributed topology of solar panels and Hoymiles microinverters. Unlike conventional systems, in which a single inverter receives the energy generated by the entire PV array, each microinverter receives the energy from four panels, as shown in Figure 1. The system comprises twelve 550 W p-type monocrystalline PERC panels from Yingli Solar (Baoding, China) connected to three 2 kWp microinverters, with four panels per microinverter. The microinverters are linked in a cascade configuration, which allows the system's capacity to be easily modified in the future.
To measure the temperature of the solar panels, a network of DS18B20 sensors was added, controlled by a second ESP32 microcontroller; the wiring scheme for these sensors is detailed in Figure 1. These sensors were chosen because their 1-Wire interface allows multiple sensors to share a single communication line, which helps ensure signal quality and noise immunity. They are also well suited to this application because they can operate in harsh, humid environments, support long cable runs, and function over a wide temperature range of −55 °C to 125 °C.
The twelve p-type PERC panels, each with a maximum output power of 550 W, were installed as shown in Figure 2. The manufacturer's technical specifications are listed in Table 1, including electrical parameters such as the nominal voltage (42 V) and the conversion efficiency (21.29%).
The experimental setup was located in Bogotá (4.65174° N, 74.06630° W) and configured with a 5° tilt angle and a south-facing orientation (180° azimuth) to maximize solar radiation capture. This arrangement enabled comprehensive collection of data on system performance, focusing on how temperature and energy yield patterns varied under Bogotá's high-altitude tropical climate and diverse weather conditions.

2.2. Error Measurement Analysis

The experimental setup described in the previous sections includes the following equipment:
  • DS18B20 digital temperature sensor:
    - Measurement range: −55 °C to 125 °C;
    - Resolution: 12 bits, equivalent to 0.0625 °C;
    - Accuracy: ±0.5 °C (within the −10 °C to 85 °C normal operating range);
    - Technology: semiconductor junction (bandgap principle), with an internal circuit that integrates an analog-to-digital converter (ADC).
  • RS485 Solar Radiation Sensor:
    - Measurement range: 0 to 2000 W/m²;
    - Accuracy: ±5% of reading;
    - Technology: light sensor assembly using a silicon photodiode and a cosine corrector, with a spectral response of 300 to 1100 nm;
    - Resolution: 1 W/m²; non-linearity: <±2%;
    - Temperature coefficient: <±0.15%/°C.
  • Hoymiles HMS20004TA power measurement system:
    - Capacity: 4 × 670 W = 2680 W per unit;
    - Maximum voltage and current: 60 V and 4 × 16 A;
    - Accuracy: ±2%;
    - Resolution: 0.1 W.
During the measurement campaign, module temperature ranged from 10 to 53 °C and solar irradiance from 0 to 1449 W/m², while the maximum power output of each solar panel is 550 W. All sensors were calibrated, and their resolutions are considered sufficient for these ranges.
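To gauge how the sensor accuracies listed above carry over into the temperature-corrected expected power defined in Section 2.6, the sketch below applies first-order root-sum-square error propagation. It is a minimal illustration under stated assumptions: the irradiance (±5%) and temperature (±0.5 °C) errors are treated as independent, the temperature coefficient is taken as γ = −0.35 %/°C, and the function names are hypothetical rather than part of the study's code.

```python
import math

P_STC = 550.0    # W, rated panel power at STC
G_STC = 1000.0   # W/m^2, STC irradiance
T_STC = 25.0     # °C, STC cell temperature
GAMMA = -0.0035  # 1/°C, assumed power temperature coefficient (-0.35 %/°C)


def expected_power(g, t_cell):
    """Temperature-corrected expected power in W."""
    return P_STC * (g / G_STC) * (1.0 + GAMMA * (t_cell - T_STC))


def expected_power_uncertainty(g, t_cell, rel_g=0.05, abs_t=0.5):
    """First-order uncertainty of expected_power, assuming independent errors.

    Returns (P, u_G, u_T, u_total), all in W.
    """
    p = expected_power(g, t_cell)
    # P is proportional to G, so a relative irradiance error maps 1:1 onto power
    u_g = abs(p) * rel_g
    # Sensitivity to temperature: dP/dT = P_STC * (G / G_STC) * GAMMA
    u_t = abs(P_STC * (g / G_STC) * GAMMA) * abs_t
    # Root-sum-square combination of the two independent contributions
    return p, u_g, u_t, math.hypot(u_g, u_t)


p, u_g, u_t, u_total = expected_power_uncertainty(1000.0, 50.0)
print(f"P = {p:.1f} W, u_G = {u_g:.1f} W, u_T = {u_t:.2f} W, total = {u_total:.1f} W")
```

At 1000 W/m² and 50 °C the irradiance term (about 25 W) is far larger than the temperature term (under 1 W), which suggests that, for this quantity, the pyranometer accuracy rather than the DS18B20 accuracy dominates the uncertainty.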

2.3. Data Collection and Preprocessing

The initial dataset underwent a rigorous five-stage cleaning process to ensure data quality: removal of missing values, timestamp standardization, synchronization of data points within a five-minute tolerance, percentile-based outlier detection, and overall data quality validation. This process retained 85% to 95% of the original records, resulting in a reliable dataset of 2844 daily samples collected every 15 s over a seven-day period.
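The specific percentile-based rule used for outlier detection is not stated. As one common possibility, the sketch below applies Tukey's interquartile-range fences computed from the 25th and 75th percentiles; the fence factor k = 1.5, the helper names, and the toy readings are assumptions for illustration only.

```python
def percentile(values, q):
    """Percentile q (0-100) of a non-empty sequence, with linear interpolation."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences)."""
    q1 = percentile(values, 25.0)
    q3 = percentile(values, 75.0)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Example: one spurious spike in an otherwise smooth series of sensor readings
readings = [10, 11, 12, 13, 14, 15, 16, 17, 18, 200]
kept = iqr_filter(readings)
print(f"kept {len(kept)} of {len(readings)} samples")
```

Because the fences are derived from percentiles rather than the mean and standard deviation, a single extreme value cannot widen them enough to hide itself.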
Using a temporal split strategy (60% for training, 20% for validation, and 20% for testing), we trained seven machine learning models: Random Forest, k-NN, MLP, Linear Regression, Ridge Regression, XGBoost, and an optimized Support Vector Regression (SVR) model. To prevent data leakage, the feature scaling parameters for algorithms that require normalized inputs were fitted on the training data only and then applied unchanged to the validation and test sets. Performance was evaluated using R² together with normalized MAE and RMSE, which enable cross-study comparisons.
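A minimal sketch of the two leakage-avoidance steps described above, assuming time-ordered samples: the data are cut into contiguous 60/20/20 blocks without shuffling, and the scaling statistics are estimated on the training block only and then reused for the validation and test blocks. The helper names and the sample temperatures are hypothetical.

```python
import statistics


def temporal_split(rows, train=0.6, val=0.2):
    """Split time-ordered rows into contiguous train/validation/test blocks (no shuffling)."""
    n = len(rows)
    i_train = round(n * train)
    i_val = i_train + round(n * val)
    return rows[:i_train], rows[i_train:i_val], rows[i_val:]


class TrainOnlyScaler:
    """z-score scaler whose mean and standard deviation come from the training block only."""

    def fit(self, values):
        self.mean = statistics.fmean(values)
        self.std = statistics.pstdev(values) or 1.0  # guard against a constant feature
        return self

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]


temps = [12.0, 15.5, 21.0, 28.4, 35.2, 41.7, 47.9, 52.1, 44.3, 30.8]  # time-ordered T_cell samples
train, val, test = temporal_split(temps)
scaler = TrainOnlyScaler().fit(train)  # statistics never see validation or test data
train_z, val_z, test_z = (scaler.transform(part) for part in (train, val, test))
```

Shuffling before splitting would let near-duplicate neighbouring 15 s samples land in both training and test sets, so contiguous blocks give a more honest estimate of forecasting performance.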

2.4. Machine Learning Algorithms

The choice of machine learning algorithm is crucial for predicting thermal losses in photovoltaic modules, a problem with non-linear characteristics and a strong dependence on multiple environmental variables. This study evaluated seven representative algorithms from different learning paradigms, ranging from simple linear approaches to complex ensemble methods, to identify the techniques best suited to modeling the thermal dynamics of MonoPERC modules. The theoretical foundations and specific characteristics of each algorithm implemented in the experimental framework are described below.
Linear Regression is a simple and widely used supervised learning algorithm that models the relationship between input variables (features) and a continuous output (target) as a linear function of the inputs: a straight line for one input, a hyperplane for several. The coefficients are chosen to minimize a loss function, typically the Mean Squared Error (MSE), between predicted and actual values. Despite its simplicity, it serves as a foundation for more complex models and works well when the relationship between variables is linear [26].
Ridge regression is a regularized linear modeling approach that adds an L2 penalty on the coefficients to the least-squares loss. By constraining coefficient magnitudes, it mitigates overfitting and stabilizes the estimates when predictor variables are highly intercorrelated (multicollinearity) [27].
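To make the difference between the two linear models concrete, the sketch below fits a single-feature model in closed form. With one centered feature, minimizing the squared error plus an L2 penalty λβ₁² on the slope gives β₁ = Sxy / (Sxx + λ); λ = 0 recovers ordinary least squares, and a larger λ shrinks the slope toward zero. The function name and the unpenalized intercept are illustrative choices; with a single feature only the shrinkage is visible, not the multicollinearity benefit.

```python
def ridge_1d(x, y, lam=0.0):
    """Fit y ~ b0 + b1*x by minimizing the sum of squared errors + lam * b1**2.

    The intercept b0 is not penalized; lam = 0 gives ordinary least squares.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / (sxx + lam)      # the penalty enlarges the denominator, shrinking the slope
    b0 = y_mean - b1 * x_mean   # intercept keeps the fit passing through the means
    return b0, b1


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]          # exactly y = 1 + 2x
ols = ridge_1d(x, y)              # plain linear regression recovers (1, 2)
ridge = ridge_1d(x, y, lam=5.0)   # same data, slope shrunk by the L2 penalty
```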
Linear SVR is a supervised machine learning algorithm used for predicting continuous values based on a linear relationship between input features and the target variable. It is a variant of Support Vector Machines (SVM) adapted for regression tasks rather than classification. Unlike traditional linear regression, which minimizes the mean squared error, Linear SVR introduces an epsilon-insensitive loss function, meaning it ignores errors that fall within a certain epsilon range around the actual target values. Only deviations greater than this margin are penalized, which makes the model more robust to small fluctuations or noise in the data [28].
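The epsilon-insensitive loss described above can be written in a few lines. The sketch compares it with the squared error on the same residuals; ε = 5 W is an arbitrary illustrative value, and a complete Linear SVR would also include the ‖w‖² regularization term, which is omitted here.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=5.0):
    """Mean of max(0, |y - y_hat| - epsilon): residuals inside the tube cost nothing."""
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


def squared_error(y_true, y_pred):
    """Mean squared error, the loss minimized by ordinary linear regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [100.0, 100.0, 100.0]
y_pred = [103.0, 96.0, 110.0]                         # residuals of 3 W, 4 W and 10 W
eps_loss = epsilon_insensitive_loss(y_true, y_pred)   # only the 10 W residual counts (10 - 5)
mse = squared_error(y_true, y_pred)
```

The small 3 W and 4 W residuals contribute nothing to the epsilon-insensitive loss, while the squared error is dominated by the single 10 W residual.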
The Multilayer Perceptron (MLP) is an artificial neural network composed of multiple interconnected layers of neurons. Unlike the simple perceptron, the MLP can solve non-linear problems thanks to its hidden layers and non-linear activation functions. Its basic architecture consists of an input layer that receives the data, one or more hidden layers that process the information, and an output layer that produces the result [29].
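As a small illustration of why hidden layers with non-linear activations matter, the sketch below evaluates a one-hidden-layer MLP with ReLU units. The weights are hand-picked rather than trained (an assumption purely for illustration) so that the network computes |x| = ReLU(x) + ReLU(−x), a V-shaped function that no single linear model can represent.

```python
def relu(z):
    """Rectified linear unit, a common non-linear activation."""
    return max(0.0, z)


def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Scalar input -> ReLU hidden layer -> linear output neuron."""
    hidden = [relu(w * x + b) for w, b in zip(hidden_w, hidden_b)]
    return sum(v * h for v, h in zip(out_w, hidden)) + out_b


# Hand-picked weights (not learned): one hidden unit for each side of zero
HIDDEN_W, HIDDEN_B = [1.0, -1.0], [0.0, 0.0]
OUT_W, OUT_B = [1.0, 1.0], 0.0

outputs = [mlp_forward(x, HIDDEN_W, HIDDEN_B, OUT_W, OUT_B) for x in (-3.0, 0.0, 2.5)]
```

In a trained MLP the same mechanism, many ReLU units combined linearly, lets the network build piecewise-linear approximations of non-linear thermal behavior.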
The Random Forest Regressor is a powerful machine learning algorithm that combines multiple decision trees to create a more accurate and stable prediction model. It operates by constructing numerous decision trees during training and outputting the mean prediction of the individual trees for regression tasks. Each tree is built from a bootstrap sample of the training data, and a random subset of features is considered when splitting nodes, introducing randomness that helps prevent overfitting [30].
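The bootstrap-and-average idea can be sketched with depth-one trees (decision stumps) to keep the code short. This is a deliberately simplified random forest, an assumption for illustration: each tree is grown on a bootstrap sample, chooses its single split among a random subset of max_features features, and the forest prediction is the mean of the tree predictions. Real implementations grow much deeper trees; the synthetic data are hypothetical.

```python
import random


def fit_stump(X, y, feature):
    """Best single split on one feature by minimum sum of squared errors.

    Returns (sse, feature, threshold, left_mean, right_mean).
    """
    pairs = sorted(zip((row[feature] for row in X), y))
    mean_all = sum(y) / len(y)
    # Fallback: a constant leaf (threshold +inf sends every sample left)
    best = (sum((v - mean_all) ** 2 for v in y), feature, float("inf"), mean_all, mean_all)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if sse < best[0]:
            best = (sse, feature, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    return best


def fit_forest(X, y, n_trees=25, max_features=2, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]          # bootstrap sample (with replacement)
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        candidates = rng.sample(range(d), max_features)     # random feature subset for the split
        _, f, thr, lm, rm = min(fit_stump(Xb, yb, j) for j in candidates)
        forest.append((f, thr, lm, rm))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all stumps."""
    return sum(lm if row[f] <= thr else rm for f, thr, lm, rm in forest) / len(forest)


# Synthetic data: column 0 is cell temperature, columns 1-2 are irrelevant "noise" features
X = [[10.0 + i, float((i * 7) % 5), float((i * 3) % 4)] for i in range(40)]
y = [100.0 if row[0] >= 30.0 else 0.0 for row in X]
forest = fit_forest(X, y)
cool = predict_forest(forest, [12.0, 0.0, 0.0])
hot = predict_forest(forest, [47.0, 0.0, 0.0])
```

Trees whose random feature subset excludes temperature cannot find the informative split, which is why averaging many decorrelated trees, rather than trusting any single one, is central to the method.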
XGBoost, which stands for eXtreme Gradient Boosting, is a highly efficient and scalable implementation of the gradient boosting framework. It was developed by Tianqi Chen and has become widely popular in data science and machine learning due to its performance and flexibility. XGBoost builds models in a sequential manner, where each new model corrects the errors made by the previous ones, making it particularly effective for structured data [31].
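The sequential error-correcting idea behind boosting can be sketched with plain gradient boosting on a single feature, where each new decision stump is fitted to the residuals of the current ensemble (the negative gradient of the squared-error loss). This is not XGBoost itself, which adds second-order gradient information, explicit regularization, and many systems-level optimizations; the names, the learning rate, and the step-shaped toy data are illustrative assumptions.

```python
def fit_stump_1d(x, r):
    """Best threshold split of targets r on scalar feature x (minimum SSE)."""
    pairs = sorted(zip(x, r))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:
        raise ValueError("x needs at least two distinct values")
    return best[1:]  # (threshold, left_mean, right_mean)


def fit_boosting(x, y, n_rounds=50, learning_rate=0.3):
    base = sum(y) / len(y)                      # round 0: predict the mean
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # errors left by previous models
        thr, lm, rm = fit_stump_1d(x, residuals)
        stumps.append((thr, lm, rm))
        # shrink each correction by the learning rate before adding it
        pred = [pi + learning_rate * (lm if xi <= thr else rm) for xi, pi in zip(x, pred)]
    return base, learning_rate, stumps


def predict_boosting(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if xi <= thr else rm for thr, lm, rm in stumps)


x = [float(v) for v in range(20)]
y = [0.0 if v < 10 else 100.0 for v in x]   # a step the stumps must learn gradually
model = fit_boosting(x, y)
```

With a learning rate of 0.3, each round removes 30% of the remaining error on this step, so after one round the left side is predicted at 35 and after 50 rounds the residual is negligible.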
K-Nearest Neighbors Regression (KNN Regression) is a non-parametric, instance-based learning algorithm. It predicts the output value for a new input by finding the K closest data points (neighbors) in the training dataset and averaging their output values [32].
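A minimal sketch of KNN regression as described above, using Euclidean distance and an unweighted average of the k nearest targets. Because distances mix features with different units, inputs would normally be standardized first; the one-feature toy data and the function name are illustrative only.

```python
import math


def knn_predict(X_train, y_train, query, k=3):
    """Average the targets of the k training points closest to query (Euclidean distance)."""
    ranked = sorted(zip((math.dist(x, query) for x in X_train), y_train))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


X_train = [[10.0], [20.0], [30.0], [40.0], [50.0]]     # e.g. cell temperature in °C
y_train = [5.0, 20.0, 45.0, 80.0, 125.0]               # e.g. thermal loss in W
pred_k2 = knn_predict(X_train, y_train, [32.0], k=2)   # neighbours at 30 and 40 °C
pred_k3 = knn_predict(X_train, y_train, [32.0], k=3)   # adds the neighbour at 20 °C
```

Since no model is fitted, all of the computational cost is paid at prediction time, and k controls the trade-off between noise sensitivity (small k) and over-smoothing (large k).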

2.5. Thermal Clustering Methodology

The thermal clustering approach addresses the non-linear relationship between operating temperature and power losses by partitioning the dataset based on thermal characteristics before model training. This methodology is motivated by the observation that thermal loss patterns differ significantly between low-temperature periods (morning/evening operations) and high-temperature periods (midday operations) in Bogotá’s climate conditions.
The algorithm employs seven key thermal characteristics to define the feature space for clustering:
  • Actual cell temperature: $T_{cell}$;
  • Temperature deviation from STC: $\Delta T = T_{cell} - 25\,^{\circ}\mathrm{C}$;
  • Solar irradiance: $G$ [W/m²];
  • Quadratic temperature terms: $T_{cell}^2$ and $\Delta T^2$;
  • Temperature–irradiance interaction: $T_{cell} \times G$;
  • Thermal efficiency: $\eta_{thermal}$;
  • Temperature gradient: $dT/dt$.
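The sketch below shows one way to assemble these characteristics from synchronized raw series. Assumptions, not details confirmed by the study: the temperature gradient is a backward finite difference in °C/s, the thermal efficiency follows the definition in Section 2.6 and is set to 0 when irradiance is zero, and the helper name and the delta_T_squared key are hypothetical (the other keys reuse the variable names shown in Figure 6).

```python
P_STC, G_STC, T_STC = 550.0, 1000.0, 25.0  # W, W/m^2, °C


def build_thermal_features(times_s, t_cell, irradiance, p_measured):
    """Return one dict of clustering features per synchronized sample."""
    rows = []
    for i, (t, g, p) in enumerate(zip(t_cell, irradiance, p_measured)):
        delta_t = t - T_STC
        p_theoretical = (g / G_STC) * P_STC
        # thermal efficiency is undefined without irradiance; 0 is an arbitrary placeholder
        eta = p / p_theoretical if p_theoretical > 0 else 0.0
        # backward finite difference; the first sample has no predecessor
        grad = 0.0 if i == 0 else (t - t_cell[i - 1]) / (times_s[i] - times_s[i - 1])
        rows.append({
            "Tcell": t,
            "delta_T": delta_t,
            "G_irradiance": g,
            "Tcell_squared": t * t,
            "delta_T_squared": delta_t * delta_t,
            "temp_irrad_interaction": t * g,
            "eta_thermal": eta,
            "temp_gradient": grad,
        })
    return rows


features = build_thermal_features(
    times_s=[0, 15, 30], t_cell=[20.0, 23.0, 26.0],
    irradiance=[0.0, 500.0, 1000.0], p_measured=[0.0, 260.0, 520.0],
)
```

Note that the bullet with two quadratic terms yields two columns, so the seven characteristics expand to eight numeric features.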
K-means Clustering Implementation:
Feature normalization using StandardScaler:
$$X_{normalized} = \frac{X - \mu}{\sigma}$$
where $\mu$ and $\sigma$ are the mean and standard deviation of each feature.
Optimal cluster number determination using the silhouette score:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$
where $a(i)$ is the mean intra-cluster distance and $b(i)$ is the mean nearest-cluster distance.
K-means clustering optimization:
$$\arg\min_{C} \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$$
where $C_i$ represents cluster $i$ and $\mu_i$ is the centroid of cluster $i$.
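The three steps above (standardization, K-means, and silhouette-based choice of k) can be sketched in plain Python as follows. This is an illustrative reimplementation under assumptions, not the study's code: it uses k-means++ seeding with a single run, whereas library implementations typically offer several restarts, and the synthetic two-feature data (cell temperature and irradiance) merely mimic three well-separated regimes.

```python
import math
import random
import statistics


def standardize(rows):
    """Column-wise z-scores: (x - mean) / std."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: later centroids are drawn with probability proportional to D(x)^2."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        r, acc = rng.random() * sum(d2), 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centroids.append(list(p))
                break
    return centroids


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: assign to the nearest centroid, then move centroids to cluster means."""
    centroids = kmeans_pp_init(points, k, random.Random(seed))
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # keep the previous centroid if a cluster becomes empty
            updated.append([statistics.fmean(c) for c in zip(*members)] if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels


def silhouette(points, labels):
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)); singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)
            continue
        a = statistics.fmean(same)
        b = min(
            statistics.fmean([math.dist(p, q) for q, lab in zip(points, labels) if lab == other])
            for other in set(labels) - {labels[i]}
        )
        scores.append((b - a) / max(a, b))
    return statistics.fmean(scores)


def choose_k(rows, candidates=(2, 3, 4, 5)):
    """Return the k with the highest mean silhouette, plus all scores."""
    points = standardize(rows)
    scores = {k: silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Synthetic (T_cell in °C, G in W/m^2) samples around three regimes
rows = [[15 + 0.01 * i, 200 + 0.1 * i] for i in range(10)]
rows += [[32 + 0.01 * i, 600 + 0.1 * i] for i in range(10)]
rows += [[47 + 0.01 * i, 1000 + 0.1 * i] for i in range(10)]
best_k, scores = choose_k(rows)
```

Standardizing first matters here: without it, the irradiance column (hundreds of W/m²) would dominate the Euclidean distances and effectively hide the temperature information.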
The clustering process identified distinct thermal regimes:
  • Low-temperature regime: 10–25 °C (morning/evening);
  • Medium-temperature regime: 25–40 °C (transition periods);
  • High-temperature regime: 40–53 °C (midday operations).
Preliminary analysis of the thermal loss dataset revealed that the photovoltaic modules exhibit distinct operational regimes under varying temperature conditions. Rather than training a single model across the entire temperature range (10 °C to 53 °C), the methodology therefore identifies these regimes and trains a specialized model for each operational condition. This enables the algorithms to capture regime-specific thermal dynamics while maintaining computational efficiency for real-time monitoring applications. The mathematical framework of the thermal clustering methodology described above is based on [33].
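The regime-specific training idea can be sketched as follows: one model is fitted per regime, and a new sample is routed to the model of its nearest regime centroid. For brevity this sketch assumes the regime labels are already known (for example from K-means), routes on cell temperature alone rather than the full feature space, and uses a hypothetical straight-line model per regime; the piecewise loss curve is synthetic.

```python
def fit_line(x, y):
    """Ordinary least-squares line, returned as (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def fit_per_regime(x, y, labels):
    """Train one specialized model and compute one centroid per regime label."""
    models, centroids = {}, {}
    for lab in set(labels):
        xs = [a for a, g in zip(x, labels) if g == lab]
        ys = [b for b, g in zip(y, labels) if g == lab]
        models[lab] = fit_line(xs, ys)
        centroids[lab] = sum(xs) / len(xs)
    return models, centroids


def predict_routed(models, centroids, x_value):
    """Send the sample to the regime with the nearest centroid, then use that regime's model."""
    lab = min(centroids, key=lambda c: abs(x_value - centroids[c]))
    intercept, slope = models[lab]
    return intercept + slope * x_value


# Synthetic piecewise loss curve (W) over three temperature regimes (°C)
t = [10, 13, 16, 19, 22, 26, 29, 32, 35, 38, 41, 44, 47, 50, 53]
loss = [0.0] * 5 + [2.0 * (v - 25) for v in t[5:10]] + [30.0 + 5.0 * (v - 40) for v in t[10:]]
regime = ["low"] * 5 + ["medium"] * 5 + ["high"] * 5
models, centroids = fit_per_regime(t, loss, regime)
```

Each regime model only has to describe a locally simple relationship, which is the motivation for partitioning before training; a single global line could not follow the change in slope between regimes.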

2.6. Thermal Loss Calculation and Metrics

The thermal efficiency metric is calculated as the ratio between the measured power and the theoretical power output at 25 °C:
$$\eta_{thermal} = \frac{P_{measured}}{P_{theoretical,25^{\circ}C}}$$
where the theoretical power at 25 °C is given by
$$P_{theoretical,25^{\circ}C} = \frac{G}{G_{STC}} \times P_{STC}$$
with $G_{STC} = 1000$ W/m² and $P_{STC} = 550$ W for the PERC panels analyzed.
Temperature Coefficient Model
The thermal loss patterns follow the standard temperature coefficient equation:
$$P_{corrected} = P_{STC} \times \frac{G}{G_{STC}} \times \left[1 + \gamma\,(T_{cell} - T_{STC})\right]$$
where:
$\gamma = -0.35\,\%/^{\circ}\mathrm{C}$ (temperature coefficient);
$T_{STC} = 25\,^{\circ}\mathrm{C}$ (standard test conditions temperature).
The actual thermal losses are computed as the difference between the temperature-corrected expected power and the measured power:
$$L_{thermal} = P_{expected} - P_{measured}$$
where:
$$P_{expected} = P_{theoretical,25^{\circ}C} \times \left[1 + \gamma\,\Delta T\right]$$
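A direct transcription of the equations above into Python, useful as a sanity check of units and signs. It assumes γ = −0.35 %/°C (power falls as temperature rises), and the operating point in the example (800 W/m², 45 °C, 400 W measured) is hypothetical.

```python
P_STC = 550.0    # W, rated power at STC
G_STC = 1000.0   # W/m^2
T_STC = 25.0     # °C
GAMMA = -0.0035  # 1/°C, i.e. -0.35 %/°C


def p_theoretical_25c(g):
    """Irradiance-scaled STC power, ignoring temperature."""
    return (g / G_STC) * P_STC


def eta_thermal(p_measured, g):
    """Ratio of measured power to the 25 °C theoretical power (undefined at G = 0)."""
    return p_measured / p_theoretical_25c(g)


def p_expected(g, t_cell):
    """Temperature-corrected expected power using the linear coefficient model."""
    return p_theoretical_25c(g) * (1.0 + GAMMA * (t_cell - T_STC))


def thermal_loss(p_measured, g, t_cell):
    """L_thermal = P_expected - P_measured, in W."""
    return p_expected(g, t_cell) - p_measured


g, t_cell, p_meas = 800.0, 45.0, 400.0
print(p_theoretical_25c(g), p_expected(g, t_cell), thermal_loss(p_meas, g, t_cell))
```

For this operating point the theoretical power is 440 W, the expected power is 440 × (1 − 0.0035 × 20) = 409.2 W, and the computed loss is therefore about 9.2 W.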
Evaluating the thermal loss prediction models requires metrics that capture different aspects of model performance. To ensure a robust and comparable assessment across all machine learning algorithms, this study uses a standardized set of metrics that provide both normalized and directly interpretable measures of prediction accuracy. Table 2 presents the mathematical formulations and descriptions of the five metrics used to evaluate thermal loss prediction performance: normalized Mean Absolute Error (NMAE), normalized Root Mean Square Error (NRMSE), the coefficient of determination (R²), the Pearson correlation coefficient (r), and the raw MAE in watts for practical interpretation. Together, these metrics characterize the accuracy, precision, and practical applicability of the models for photovoltaic thermal loss prediction.
where:
  • $n$ = number of observations;
  • $y_i$ = actual thermal loss value for observation $i$ (W);
  • $\hat{y}_i$ = predicted thermal loss value for observation $i$ (W);
  • $\bar{y}$ = mean of actual values;
  • $\bar{\hat{y}}$ = mean of predicted values;
  • $\sigma_y$ = population standard deviation of actual values;
  • $r \in [-1, 1]$.
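Because the formulas of Table 2 are not reproduced in this text, the sketch below rests on an assumption: consistent with the variable list above, which defines σ_y, it normalizes MAE and RMSE by the population standard deviation of the actual values. Under that assumption the five metrics can be computed as follows; the function names and toy values are illustrative.

```python
import math
import statistics


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def evaluate(y_true, y_pred):
    """MAE (W), NMAE, NRMSE, R^2 and Pearson r, normalizing by the population std of y_true."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    sigma_y = statistics.pstdev(y_true)
    y_bar = statistics.fmean(y_true)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return {
        "MAE_W": mae,
        "NMAE": mae / sigma_y,
        "NRMSE": rmse / sigma_y,
        "R2": 1.0 - ss_res / ss_tot,
        "r": pearson_r(y_true, y_pred),
    }


metrics = evaluate([0.0, 10.0, 20.0, 30.0], [2.0, 8.0, 22.0, 28.0])
```

Under this normalization NRMSE = √(1 − R²) holds exactly, because RMSE²/σ_y² equals SS_res/SS_tot; this offers a quick consistency check on reported values.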

2.7. Solar Irradiance and Temperature Profiles

Figure 3 illustrates the solar irradiance measured on 16 April 2025, which reached a maximum value of 1449 W/m² at 11:13 h. The profile follows the expected diurnal pattern, with irradiance increasing rapidly during morning hours, reaching peak values around midday, and gradually decreasing in the afternoon. Notable fluctuations in the irradiance curve, particularly during peak hours, likely indicate passing cloud cover affecting direct solar exposure. The total solar energy received throughout the day was 7.21 kWh/m², with an average irradiance of 531 W/m². These measurements provide crucial input data for modeling photovoltaic system performance, as they represent the available solar resource that drives energy conversion in the PV panels.
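The daily irradiation reported above is the time integral of irradiance. A minimal sketch, assuming timestamped samples in seconds and the trapezoidal rule (the integration scheme actually used is not stated); the function name is hypothetical.

```python
def daily_irradiation_kwh_m2(times_s, irradiance_w_m2):
    """Trapezoidal integral of irradiance (W/m^2) over time (s), returned in kWh/m^2."""
    energy_j_m2 = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        energy_j_m2 += 0.5 * (irradiance_w_m2[i] + irradiance_w_m2[i - 1]) * dt
    return energy_j_m2 / 3.6e6  # 1 kWh = 3.6e6 J


# Check: a constant 1000 W/m^2 for one hour, sampled every 15 s, gives 1 kWh/m^2
times = list(range(0, 3601, 15))
one_hour = daily_irradiation_kwh_m2(times, [1000.0] * len(times))
```

Integrating over the actual timestamps, rather than multiplying a sample count by 15 s, keeps the result correct when samples are dropped during data cleaning.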
The solar resource measurements were conducted with an RS485 Solar Radiation Sensor, a silicon photodiode-based pyranometer designed for PV system monitoring. Its dome-shaped diffuser provides cosine correction, improving the accuracy of irradiance readings across the range of solar elevation angles encountered during the day. With a measurement range of 0–2000 W/m², the sensor covered the full range of irradiance conditions observed at the Bogotá installation site.
Figure 4 presents the hourly temperature profiles of the photovoltaic panel surfaces over the monitoring period. The data reveal substantial daily temperature fluctuations. On all monitored days, the thermal patterns follow the solar irradiance cycle, with rapid morning heating from approximately 10 °C, peak temperatures during midday hours (10:00–14:00), and gradual afternoon cooling. Peak temperatures varied by day: the warmest days peaked at 48–54 °C, with Saturday showing the highest peak of approximately 54 °C, while Thursday and Friday exhibited more moderate profiles with peaks around 35 °C.

2.8. DC Power Output Analysis of a Single PERC Module

Figure 5 presents the DC power output profile of a single 550 W p-type PERC monocrystalline solar panel monitored over a typical day. The data captures the power generation pattern throughout daylight hours (approximately 6:00 to 18:00), with peak power outputs reaching approximately 550 W during optimal irradiance conditions around midday (10:00–14:00 h). The monitoring period reveals a characteristic diurnal curve with significant fluctuations attributed to cloud cover and atmospheric conditions typical of tropical highland regions. The profile shows a gradual power increase during morning hours (6:00–10:00), followed by highly variable output during peak solar hours with pronounced oscillations between 150 W and 550 W, indicating intermittent cloud cover. Notable short-duration drops in power generation are observed throughout the midday period, demonstrating the dynamic nature of solar irradiance under variable weather conditions. The power output gradually decreases during afternoon hours (14:00–18:00) until ceasing at sunset.

3. Results

3.1. Baseline Performance Evaluation of Machine Learning Algorithms

The machine learning algorithms were evaluated for their ability to predict thermal losses, and their performance varied considerably (Table 3). K-Nearest Neighbors (KNN) emerged as the top-performing model, achieving the highest correlation (0.9612) and low normalized errors (NMAE = 0.0967, NRMSE = 0.2776). This corresponds to a mean prediction error of only 7.3 W, or 1.3% of the panel's rated power.
Ensemble methods, specifically XGBoost (NMAE = 0.1452) and Random Forest (NMAE = 0.1469), also performed exceptionally well, falling into the top tier of accuracy. The Multi-Layer Perceptron (MLP) demonstrated good potential with an NMAE of 0.1573, while Support Vector Regression (SVR) showed moderate accuracy (NMAE = 0.1832).
In contrast, linear methods such as Linear Regression and Ridge Regression proved inadequate for this task, with NMAE values of approximately 0.29. This indicates that linear models cannot capture the complex thermal dynamics of the system.
These results define three distinct performance tiers and provide a baseline across algorithmic categories against which the clustering enhancements are evaluated.
To assess statistical validity, all performance indicators are reported as the mean (μ) ± standard deviation (σ) obtained from the cross-validation process (Table 4). The evaluation emphasizes the Normalized Mean Absolute Error (NMAE) as the most appropriate metric for statistical comparison, because it provides a scale-independent measure of error, which is crucial for assessing model generalizability and transferability across systems with different capacities or power ratings.

3.2. Performance with Thermal Clustering

Table 5 reports the results obtained with the thermal clustering methodology, in which K-means identifies three distinct thermal regimes that allow specialized model training. The Multi-Layer Perceptron emerged as the top performer, with a correlation of 0.9561 and NMAE = 0.1409 (down from 0.1573 at baseline), while K-Nearest Neighbors (correlation = 0.9584, NMAE = 0.1032) and XGBoost (correlation = 0.9525, NMAE = 0.1549) also performed strongly. Support Vector Regression improved substantially, with NMAE decreasing from 0.1832 to 0.1725, and the linear methods improved moderately, with Linear Regression NMAE decreasing from 0.2898 to 0.2825 and Ridge from 0.2901 to 0.2835. By contrast, the NMAE values of Random Forest (0.1769), KNN, and XGBoost were higher than their baseline values (0.1469, 0.0967, and 0.1452, respectively). All algorithms reached correlations above 0.90, and the best-performing models kept normalized errors below 0.16 standard deviations. These results indicate that thermal clustering is an effective way to improve photovoltaic thermal loss prediction, particularly for the MLP, SVR, and linear models.
Table 6 presents a robustness analysis for the thermal clustering approach, reporting the standard deviation (σ) and the 95% confidence interval (CI 95%) of the NMAE metric for all algorithms. The σ values for most models are consistently low (between 0.0359 and 0.0484), indicating that the data partitioning yields consistent predictive performance across cross-validation folds. This supports the view that the accuracy gains are structural rather than an artifact of particular data splits. Notably, the KNN clustering model, despite a slightly higher σ (0.0484), has the highest upper bound of its 95% CI ([0.8146, 0.8839]); this relatively wide but high-lying interval supports the robustness of the combined KNN-clustering strategy.
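For reference, the mean ± standard deviation and a 95% confidence interval of a metric across cross-validation folds can be computed as below. This sketch assumes a Student-t interval on the fold scores, with hardcoded two-sided critical values and a crude normal approximation (1.96) for larger numbers of folds. Since folds share data, such intervals are only approximate; the per-fold NMAE values and the function name are invented for illustration.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, indexed by degrees of freedom (folds - 1)
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def summarize_folds(scores):
    """Return mean, sample standard deviation and a 95% CI for per-fold scores."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)             # requires at least two folds
    t_crit = T_CRIT_95.get(n - 1, 1.96)       # normal approximation for many folds
    half_width = t_crit * sd / math.sqrt(n)
    return mean, sd, (mean - half_width, mean + half_width)


nmae_per_fold = [0.10, 0.12, 0.11, 0.09, 0.13]   # illustrative values only
mean, sd, (ci_low, ci_high) = summarize_folds(nmae_per_fold)
print(f"NMAE = {mean:.4f} ± {sd:.4f}, 95% CI [{ci_low:.4f}, {ci_high:.4f}]")
```

With only five folds the t critical value (2.776) is noticeably larger than 1.96, so using the normal value would understate the width of the interval.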

3.3. Thermal Feature Correlation Analysis

Figure 6 presents the correlation matrix between thermal characteristics used in the clustering methodology. The analysis reveals strong interdependencies among thermal variables, with several key patterns emerging from the data.
Perfect Mathematical Relationships: The correlation matrix shows a perfect correlation (r = 1.00) between Tcell and delta_T, as expected from the transformation $\Delta T = T_{cell} - 25\,^{\circ}\mathrm{C}$ used throughout this study; this relationship serves as an internal consistency check of the feature computation.
High Thermal Coupling: Strong correlations are observed between G_irradiance and Tcell_squared (r = 0.98) and between G_irradiance and temp_irrad_interaction (r = 0.97), reflecting the strong coupling between solar irradiance and non-linear temperature effects in PERC modules. These relationships suggest that thermal losses are driven by the interaction between irradiance and temperature rather than by either variable independently.
Thermal Variable Clustering: The primary thermal variables (Tcell, delta_T, G_irradiance, Tcell_squared) form a highly correlated cluster with correlation coefficients ranging from 0.76 to 1.00. This clustering pattern provides scientific justification for the thermal regime identification methodology, as these variables collectively capture the fundamental thermal state of the photovoltaic system.
Independent Information Sources: Notably, eta_thermal exhibits moderate correlations (0.26–0.34) with other thermal variables, indicating that thermal efficiency captures unique performance characteristics not fully explained by temperature and irradiance alone. Similarly, temp_gradient shows minimal correlations (0.04–0.11) across all variables, confirming its role as an independent temporal dynamics indicator.
These correlation patterns support the thermal clustering approach by revealing distinct groupings of related variables, which enable the identification of the thermal regimes used for regime-specific model training. The analysis is consistent with the hypothesis that thermal losses in PERC modules follow complex, non-linear relationships that benefit from regime-specific modeling approaches.
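The perfect Tcell/delta_T correlation follows from the fact that the Pearson coefficient is unchanged when a constant is subtracted from a variable. The sketch below computes a small correlation matrix over a few of the named features to illustrate this; the temperature samples are made up, and the helper names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict mapping feature name -> values."""
    return {(p, q): pearson(columns[p], columns[q]) for p in columns for q in columns}


tcell = [12.0, 18.0, 25.0, 33.0, 41.0, 48.0, 52.0]   # illustrative cell temperatures (°C)
columns = {
    "Tcell": tcell,
    "delta_T": [t - 25.0 for t in tcell],            # a pure shift of Tcell
    "Tcell_squared": [t * t for t in tcell],
}
matrix = correlation_matrix(columns)
```

Over a positive, moderate temperature range the square is also almost linear in Tcell, which is why such quadratic features correlate strongly with the original variable even though they add curvature information.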

3.4. Training Correlation Analysis by ML Algorithm

The comprehensive evaluation of algorithm performance across different training data proportions provides crucial insights into model scalability and data efficiency for photovoltaic thermal loss prediction. Figure 7 presents a systematic analysis of training correlations achieved by seven machine learning algorithms when training data varies from 60% to 85% of the total dataset.

Performance Analysis

Overall Performance Trends: All algorithms improve as the training share increases from 60% to 85%, with correlation coefficients ranging from 0.907 to 0.976. This is consistent with the expected positive relationship between training-set size and model performance.
Algorithm Performance Ranking:
  • K-Nearest Neighbors (KNN) achieves the highest correlation coefficients (0.964–0.976), demonstrating consistent superior performance across all training percentages.
  • Multi-Layer Perceptron (MLP) and XGBoost show competitive performance with correlation coefficients of 0.960–0.972 and 0.957–0.970, respectively.
  • Support Vector Regression (SVR) exhibits good performance (0.957–0.969) with steady improvement as training data increases.
  • Random Forest demonstrates moderate performance (0.948–0.960) across all training percentages.
  • Ridge and Linear Regression show the lowest but still acceptable performance (0.907–0.924).
Training Data Sensitivity: The algorithms differ in their sensitivity to training data size. KNN remains consistently high, varying by only 0.012 between 60% and 85%, a spread comparable to that of the other non-linear models (0.012–0.013). Linear Regression shows the largest improvement (0.017), suggesting that it benefits most from larger datasets.
Overall Fit Quality: All correlation coefficients exceed 0.9, indicating that every tested algorithm fits the data closely. Because these are training correlations, they describe goodness of fit rather than out-of-sample predictive accuracy. The steady improvement with more training data indicates that the experimental setup behaves as expected.
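The training-share sweep summarized in Figure 7 can be sketched as a loop: for each share, split the data, fit a model, and compute the Pearson correlation between observed and predicted losses. Everything specific in this sketch is an assumption for illustration: the synthetic data, the hypothetical loss model with a temperature coefficient of 0.35 %/°C, the single-feature least-squares model standing in for the seven algorithms, and scoring on the held-out remainder rather than on the training set.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least squares fit ys ~ a + b * xs (illustrative stand-in model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def hypothetical_loss(g, t_cell, p_stc=550.0, gamma=0.0035):
    """Assumed loss model P_STC * (G/1000) * gamma * (Tcell - 25) in W; negative below 25 °C."""
    return p_stc * (g / 1000.0) * gamma * (t_cell - 25.0)


random.seed(1)
samples = []
for _ in range(600):
    g = random.uniform(50.0, 1200.0)                  # irradiance, W/m^2
    t = 15.0 + 0.03 * g + random.gauss(0.0, 2.0)      # cell temperature, °C
    samples.append((t, hypothetical_loss(g, t) + random.gauss(0.0, 1.0)))

correlations = {}
for pct in (60, 65, 70, 75, 80, 85):
    n_train = len(samples) * pct // 100
    # Samples are generated i.i.d., so a head/tail split behaves like a random split.
    train, test = samples[:n_train], samples[n_train:]
    a, b = fit_line([t for t, _ in train], [y for _, y in train])
    preds = [a + b * t for t, _ in test]
    correlations[pct] = pearson([y for _, y in test], preds)

for pct, r in correlations.items():
    print(f"{pct}% training data -> r = {r:.3f}")
```

Swapping `fit_line` for any other regressor leaves the loop unchanged, which is what makes this kind of sweep convenient for comparing algorithms on equal terms.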

3.5. Prediction Accuracy Visualization

Figure 8 presents scatter plots of actual versus predicted thermal losses for the seven machine learning algorithms evaluated in this study. These plots reveal each model's prediction patterns across the 0–300 W thermal loss range observed in the MonoPERC modules. The temperature, solar irradiance, and power sensors have accuracies of ±0.5 °C, ±5%, and ±2%, respectively, which are considered sufficient for this analysis.
The scatter plots reveal distinct performance patterns among the algorithms. K-Nearest Neighbors (Figure 8d) shows the highest accuracy, with minimal scatter around the perfect-prediction line, consistent with its quantitative metrics (correlation = 0.9612, NMAE = 0.0967). The Multi-Layer Perceptron (Figure 8c) exhibits good linearity and tight clustering around the diagonal, consistent with its strong performance in both the baseline and clustered evaluations.
The ensemble methods XGBoost (Figure 8g) and Random Forest (Figure 8f) show strong predictive capability across the entire thermal loss range, although Random Forest exhibits slightly more scatter in the mid-range (100–200 W). Support Vector Regression (Figure 8e) achieves good overall correlation but shows increased variance at higher thermal loss values, suggesting reduced accuracy under high-temperature conditions. The linear methods, Linear Regression (Figure 8a) and Ridge Regression (Figure 8b), show the limitations of linear approaches for this application, with notable scatter and systematic deviations from the perfect-prediction line, particularly at higher thermal loss values. This is consistent with the non-linear nature of thermal dynamics in PERC modules and explains the better performance of the non-linear algorithms.
The visualization analysis supports the quantitative findings and provides practical insights for algorithm selection in real-world photovoltaic monitoring applications, where prediction accuracy across the full operational range is crucial for effective thermal loss management.
The results of this study provide significant insights into the application of machine learning algorithms for thermal loss prediction in MonoPERC solar modules under real operating conditions. The superior performance of K-Nearest Neighbors in baseline evaluations, achieving a correlation of 0.9612 and NMAE of 0.0967, demonstrates the effectiveness of instance-based learning for capturing complex thermal dynamics in photovoltaic systems. This finding aligns with previous research highlighting the capability of non-parametric algorithms to model non-linear relationships without making strong assumptions about the underlying data distribution.
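The strong performance of the K-Nearest Neighbors (KNN) algorithm stems from its ability to perform effective local regression. Each prediction is formed from the thermal losses of the k most similar operating points in feature space, so the model adapts to the local shape of the temperature–irradiance–loss relationship without assuming a global functional form. This property suits the non-linear data structure of thermal losses in PV modules.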
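The local-regression argument can be illustrated with a minimal one-dimensional comparison between a uniform-weight KNN regressor and a single global least-squares line. The synthetic quadratic loss curve, the choice k = 5, and the use of cell temperature as the only feature are assumptions for illustration; the study's models use more inputs, and its KNN settings are not given in this section.

```python
import random


def knn_predict(train_x, train_y, x, k=5):
    """Uniform-weight k-nearest-neighbour regression on a single feature."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def fit_line(xs, ys):
    """Global ordinary least squares fit ys ~ a + b * xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def mae(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


random.seed(2)
xs = [random.uniform(10.0, 53.0) for _ in range(400)]                 # cell temperature, °C
ys = [0.05 * (x - 10.0) ** 2 + random.gauss(0.0, 1.0) for x in xs]    # hypothetical loss, W
train_x, test_x, train_y, test_y = xs[:300], xs[300:], ys[:300], ys[300:]

a, b = fit_line(train_x, train_y)
mae_linear = mae(test_y, [a + b * x for x in test_x])
mae_knn = mae(test_y, [knn_predict(train_x, train_y, x) for x in test_x])
print(f"linear MAE = {mae_linear:.2f} W, KNN MAE = {mae_knn:.2f} W")
```

On a curved relationship the global line leaves systematic residuals at both ends of the temperature range, whereas KNN only averages nearby operating points; this is the behavior the paragraph attributes to KNN.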
The thermal clustering methodology is a novel contribution to photovoltaic performance modeling, addressing the challenge of non-linear temperature–power relationships across different operating regimes. Identifying distinct thermal regimes (10–25 °C, 25–40 °C, and 40–53 °C) and training a specialized model for each improved correlation and NMAE for Linear Regression, Ridge, SVR, and MLP, whereas KNN, Random Forest, and XGBoost performed slightly better without clustering (Tables 3 and 5). This approach is particularly relevant for tropical climates such as Colombia's, where large temperature variations significantly affect module performance.
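The regime-specific idea can be sketched with the boundaries quoted above: assign each sample to a thermal regime by its cell temperature, fit one model per regime, and route each prediction to that regime's model. This sketch assumes fixed boundaries at 25 °C and 40 °C and uses least-squares lines as the per-regime models; the study's reference list includes the Hartigan–Wong k-means algorithm, so its regimes may instead come from a data-driven clustering step. The loss curve is hypothetical.

```python
import bisect
import math
import random

# Regime boundaries taken from the text: 10-25 °C, 25-40 °C and 40-53 °C (fixed, assumed).
REGIME_EDGES = (25.0, 40.0)


def regime_of(t_cell):
    """Return regime index 0, 1 or 2; a value exactly on a boundary goes to the upper regime."""
    return bisect.bisect_right(REGIME_EDGES, t_cell)


def fit_line(xs, ys):
    """Ordinary least squares fit ys ~ a + b * xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def rmse(y_true, y_pred):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


random.seed(3)
temps = [random.uniform(10.0, 53.0) for _ in range(600)]                    # °C
losses = [0.05 * (t - 10.0) ** 2 + random.gauss(0.0, 1.0) for t in temps]   # hypothetical, W
train_t, test_t = temps[:450], temps[450:]
train_y, test_y = losses[:450], losses[450:]

# Baseline: one global model for all temperatures.
a, b = fit_line(train_t, train_y)
global_pred = [a + b * t for t in test_t]

# Clustered: one model per thermal regime.
per_regime = {}
for r in range(len(REGIME_EDGES) + 1):
    idx = [i for i, t in enumerate(train_t) if regime_of(t) == r]
    per_regime[r] = fit_line([train_t[i] for i in idx], [train_y[i] for i in idx])


def predict_regime(t):
    a_r, b_r = per_regime[regime_of(t)]
    return a_r + b_r * t


regime_pred = [predict_regime(t) for t in test_t]
rmse_global = rmse(test_y, global_pred)
rmse_regime = rmse(test_y, regime_pred)
print(f"global RMSE = {rmse_global:.2f} W, per-regime RMSE = {rmse_regime:.2f} W")
```

With a simple base model, partitioning by regime lets each piece follow its own slope, which is one reason linear models gain from clustering; flexible models such as KNN already adapt locally and, with less data per regime, may gain little or lose accuracy.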
The strong performance of the ensemble methods, particularly XGBoost and Random Forest, indicates their effectiveness in handling the complexity and variability inherent in real-world photovoltaic data. Their ability to capture non-linear interactions between environmental variables and thermal losses makes them well suited for practical deployment in monitoring and optimization systems.
The limitations of linear methods, evident in their lower correlation coefficients and higher error rates, confirm the inadequacy of simple linear models for capturing the complex thermal dynamics of PERC modules. However, the improvements observed with thermal clustering suggest that even linear approaches can benefit from regime-specific modeling strategies.
The practical implications of this research extend beyond academic interest, offering valuable insights for photovoltaic system operators and manufacturers. The ability to predict thermal losses with high accuracy enables proactive maintenance scheduling, performance optimization, and improved energy yield forecasting, particularly crucial for large-scale solar installations in tropical regions.

4. Conclusions

This study demonstrates the effectiveness of the proposed thermal clustering methodology for improving the prediction of thermal losses in MonoPERC modules under tropical conditions. Although the dataset was limited to one week, it provided sufficient variability to validate the approach and show its feasibility. Among the seven algorithms, whose accuracy varied considerably, K-Nearest Neighbors achieved the best baseline performance (correlation = 0.9612, NMAE = 0.0967, MAE = 7.3 W).
The thermal clustering methodology is a novel contribution to photovoltaic performance modeling, improving prediction accuracy for Linear Regression, Ridge, SVR, and MLP. The results identify two superior approaches: the non-parametric K-Nearest Neighbors (KNN) model and the neural-network-based Multi-Layer Perceptron (MLP). With thermal clustering, KNN remained the overall top performer (correlation = 0.9584, NMAE = 0.1032), demonstrating the effectiveness of local regression on highly homogeneous data. The MLP, the only neural-network architecture evaluated, reached a correlation of 0.9561 and an NMAE of 0.1409, indicating the value of regime-specific modeling for capturing complex, non-linear thermal dynamics.
Key findings include the validation of ensemble methods (XGBoost, Random Forest) as highly effective for photovoltaic thermal modeling, the confirmation that linear methods are inadequate for complex thermal dynamics but benefit from clustering approaches, and the demonstration that all algorithms benefit from increased training data, with performance improvements continuing up to 85% training ratios. As future work, extending the analysis to multi-seasonal and longer-term datasets will further enhance the generalizability of the results and strengthen their applicability across diverse climatic conditions.
This research provides practical implications for photovoltaic system optimization, including accurate thermal loss prediction for maintenance scheduling, improved energy yield forecasting capabilities, and enhanced understanding of PERC module behavior in tropical climates. Future research directions should focus on extending the methodology to other photovoltaic technologies, investigating long-term seasonal variations in thermal performance, and developing real-time implementation frameworks for commercial monitoring systems.

Author Contributions

Conceptualization, Y.G.V.; Methodology, Y.G.V. and A.G.; Software, D.C.C.; Validation, Y.G.V., A.G. and D.C.C.; Formal analysis, Y.G.V. and A.G.; Investigation, Y.G.V. and A.G.; Data curation, Y.G.V. and D.C.C.; Writing—original draft, Y.G.V. and A.G.; Writing—review & editing, Y.G.V. and A.G.; Supervision, F.M.; Funding acquisition, Y.G.V. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Fundación Universitaria Los Libertadores programme “Twelfth Annual Internal Call for Research and Artistic and Cultural Creation Projects 2024”, project “Predictive Model for Solar Photovoltaic Energy Generation Based on Artificial Intelligence Techniques” [grant number: ING-35-25].

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable for this article.

Acknowledgments

The authors acknowledge the support provided by the RADIANT laboratory of Fundación Universitaria Los Libertadores for the experimental setup and data collection. Special thanks to the technical staff for their assistance in maintaining the photovoltaic monitoring system during the data collection period.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
PV: Photovoltaic
PERC: Passivated Emitter and Rear Cell
MonoPERC: Monocrystalline PERC
STC: Standard Test Conditions
MLP: Multi-Layer Perceptron
KNN: K-Nearest Neighbors
SVR: Support Vector Regression
NMAE: Normalized Mean Absolute Error
NRMSE: Normalized Root Mean Square Error
MAE: Mean Absolute Error
RMSE: Root Mean Square Error

References

  1. IPCC. Climate Change 2023: Synthesis Report. Contribution of Working Groups I, II and III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; Technical Report; Intergovernmental Panel on Climate Change (IPCC): Geneva, Switzerland, 2023.
  2. REN21. Renewables 2023 Global Status Report; Technical Report; REN21 Secretariat: Paris, France, 2023; ISBN 978-3-948393-08-8.
  3. International Energy Agency. Solar PV Global Supply Chains; International Energy Agency IEA: Paris, France, 2023.
  4. Blakers, A. Development of the PERC Solar Cell. IEEE J. Photovoltaics 2019, 9, 629–635.
  5. Dubey, S.; Sarvaiya, J.; Seshadri, B. Temperature Dependent Photovoltaic (PV) Efficiency and Its Effect on PV Production in the World—A Review. Energy Procedia 2013, 33, 311–321.
  6. Yang, D.; Xu, L.; Kang, D.; Stein, J.S. Thermal modeling of photovoltaic modules: A systematic review of heat transfer pathways. Sol. Energy 2023, 250, 92–108.
  7. Dash, P.; Gupta, N. Effect of temperature on power output from different commercially available photovoltaic modules. Int. J. Eng. Res. Appl. 2015, 5, 148–151.
  8. Kumari, N.; Kumar Singh, S.; Kumar, S.; Kumar Jadoun, V. Performance Investigation of Monocrystalline and Polycrystalline PV Modules Under Real Conditions. IEEE Access 2024, 12, 169869–169878.
  9. Dhaundiyal, A.; Atsu, D. The effect of wind on the temperature distribution of photovoltaic modules. Sol. Energy 2020, 201, 259–267.
  10. Kaldellis, J.K.; Kapsali, M.; Kavadias, K.A. Temperature and wind speed impact on the efficiency of PV installations. Experience obtained from outdoor measurements in Greece. Renew. Energy 2014, 66, 612–624.
  11. Karim, S.R.; Sarker, D.; Kabir, M.M. Analyzing the impact of temperature on PV module surface during electricity generation using machine learning models. Clean. Energy Syst. 2024, 9, 100135.
  12. Congedo, P.; Malvoni, M.; Mele, M.; De Giorgi, M. Performance measurements of monocrystalline silicon PV modules in South-eastern Italy. Energy Convers. Manag. 2013, 68, 1–10.
  13. Du, Y.; Fell, C.J.; Duck, B.; Chen, D.; Liffman, K.; Zhang, Y.; Gu, M.; Zhu, Y. Modeling of photovoltaic panel temperature in realistic scenarios. Energy Convers. Manag. 2016, 108, 60–67.
  14. Bamisile, O.; Acen, C.; Cai, D.; Huang, Q.; Staffell, I. The environmental factors affecting solar photovoltaic output. Renew. Sustain. Energy Rev. 2024, 208, 115073.
  15. Dhass, A.; Natarajan, E.; Lakshmi, P. An investigation of temperature effects on solar photovoltaic cells and modules. Int. J. Eng. Trans. B Appl. 2014, 27, 1713–1722.
  16. Tripathi, A.K.; Aruna, M.; Murthy, C.S. Output power loss of photovoltaic panel due to dust and temperature. Int. J. Renew. Energy Res. 2017, 7, 439–442.
  17. Amer, K.A.; Fakher, M.; Salem, A.; Ahmad, S.M.; Irhouma, M.A.; Altahbao, S.A.S.; Salim, E. Power losses on PV solar fields: Sensitivity analysis and a critical review. Int. J. Eng. Res. Technol. IJERT 2020, 9, 1000–1007.
  18. Min, B.; Müller, M.; Wagner, H.; Fischer, G.; Brendel, R.; Altermatt, P.P.; Neuhaus, H. A roadmap toward 24% efficient PERC solar cells in industrial mass production. IEEE J. Photovoltaics 2017, 7, 1541–1550.
  19. Research progress of light and elevated temperature-induced degradation in silicon solar cells: A review. J. Alloys Compd. 2022, 912, 165120.
  20. Duranay, Z.B.; Guldemir, H. Power Prediction in Photovoltaic Systems with Neural Networks: A Multi-Parameter Approach. Appl. Sci. 2025, 15, 3615.
  21. Jobayer, M.; Shaikat, M.A.H.; Naimur Rashid, M.; Hasan, M.R. A systematic review on predicting PV system parameters using machine learning. Heliyon 2023, 9, e16815.
  22. Bryan, J.L.; Silverman, T.J.; Deceglie, M.G.; Holman, Z.C. Thermal model to quantify the impact of sub-bandgap reflectance on operating temperature of fielded PV modules. Sol. Energy 2021, 220, 246–250.
  23. Scott, C.; Ahsan, M.; Albarbar, A. Machine learning for forecasting a photovoltaic (PV) generation system. Energy 2023, 278, 127807.
  24. Bayrak, F. Prediction of photovoltaic panel cell temperatures: Application of empirical and machine learning models. Energy 2025, 323, 135764.
  25. Asiedu, S.T.; Nyarko, F.K.; Boahen, S.; Effah, F.B.; Asaaga, B.A. Machine learning forecasting of solar PV production using single and hybrid models over different time horizons. Heliyon 2024, 10, e28898.
  26. Su, X.; Yan, X.; Tsai, C.L. Linear regression. Wiley Interdiscip. Rev. Comput. Stat. 2012, 4, 275–294.
  27. McDonald, G.C. Ridge regression. Wiley Interdiscip. Rev. Comput. Stat. 2009, 1, 93–100.
  28. Ho, C.H.; Lin, C.J. Large-scale linear support vector regression. J. Mach. Learn. Res. 2012, 13, 3323–3348.
  29. Riedmiller, M.; Lernen, A. Multi Layer Perceptron. Machine Learning Lab Special Lecture; University of Freiburg: Freiburg im Breisgau, Germany, 2014; Volume 24, pp. 11–60. Available online: http://machine-learning-lab.com/_media/documents/teaching/ss12/ml/05_mlps.printer.pdf (accessed on 2 April 2025).
  30. Biau, G.; Scornet, E. A random forest guided tour. Test 2016, 25, 197–227.
  31. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K.; Mitchell, R.; Cano, I.; Zhou, T.; et al. Xgboost: Extreme Gradient Boosting; R package version 0.4-2; The R Project for Statistical Computing: Vienna, Austria, 2015; Volume 1, pp. 1–4.
  32. Kramer, O. K-nearest neighbors. In Dimensionality Reduction with Unsupervised Nearest Neighbors; Springer: Berlin/Heidelberg, Germany, 2013; pp. 13–23.
  33. Hartigan, J.A.; Wong, M.A. Algorithm AS 136: A k-means clustering algorithm. J. R. Stat. Soc. Ser. C Appl. Stat. 1979, 28, 100–108.
Figure 1. Topology of the distributed photovoltaic system with Hoymiles microinverters showing the connection of solar panels (2), microinverters (1) and temperature sensors (3).
Figure 2. Installation of P-type PERC solar modules on a rooftop in Bogotá, Colombia: (a) view of monocrystalline panels; (b) Connection scheme of the monitoring system showing: (1) Hoymiles microinverter and PERC panels; (2) DS18B20 temperature sensors; (3) ZTS-300 AL-RA-N01 solar radiation sensor; (4) CMT2300A-E49-900M20S wireless data transmission module.
Figure 3. Solar irradiance measured on 16 April 2025, showing a maximum value of 1449 W/m² at 11:13 h.
Figure 4. Diurnal Variation of Photovoltaic Panel Surface Temperature.
Figure 5. DC power output profile of a single 550 W p-type PERC monocrystalline solar panel recorded on 16 April 2025 at the Bogotá installation site (4.65° N, 74.07° W).
Figure 6. Correlation Matrix Between Thermal Characteristics.
Figure 7. Training correlation heatmap showing algorithm performance across different training data percentages (60–85%).
Figure 8. Actual versus predicted thermal losses (W) for the ML algorithms used to predict thermal losses in PERC solar panels; in each subplot, the diagonal represents a perfect prediction. (a) Linear Regression, (b) Ridge Regression, (c) Multi-Layer Perceptron, (d) K-Nearest Neighbors, (e) Support Vector Regression, (f) Random Forest, (g) XGBoost.
Table 1. Geographic Location and PV System Specifications.

| Parameter | Value |
|---|---|
| Geographic location | |
| Latitude | 4.65174° N |
| Longitude | 74.06630° W |
| Time zone | UTC−5 |
| PV module physical specifications | |
| PV module identifier | yl550d-49e |
| PV technology | Monocrystalline P-type PERC |
| Elevation above sea level | 2640 m |
| PV module tilt (degrees) | |
| PV module azimuth | 180° |
| Weight | 28 kg |
| PV module electrical specifications | |
| Nominal power | 550 W (±3%) |
| Nominal voltage | 42 V |
| Nominal current | 13.97 A |
| Open-circuit voltage | 49.82 V (±3%) |
| Short-circuit current | 13.9 A (±3%) |
| Efficiency | 21.29% |
Table 2. Evaluation Metrics for Thermal Loss Prediction.

| Metric | Equation | Description |
|---|---|---|
| NMAE | $\mathrm{NMAE} = \dfrac{\mathrm{MAE}_{\mathrm{raw}}}{\sigma_y} = \dfrac{\frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2}}$ | Normalized mean absolute error: MAE divided by the population standard deviation of the actual values, enabling dimensionless comparison. |
| NRMSE | $\mathrm{NRMSE} = \dfrac{\mathrm{RMSE}_{\mathrm{raw}}}{\sigma_y} = \dfrac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2}}$ | Normalized root mean square error: RMSE divided by the population standard deviation of the actual values; more sensitive to large errors than MAE. |
| R² | $R^2 = 1 - \dfrac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$ | Coefficient of determination: proportion of the variance in thermal losses that is predictable from the input features. |
| Correlation | $r = \dfrac{\sum_{i=1}^{n}(y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}\,\sqrt{\sum_{i=1}^{n}(\hat{y}_i - \bar{\hat{y}})^2}}$ | Pearson correlation coefficient: strength of the linear relationship between actual and predicted thermal losses. |
| MAE (W) | $\mathrm{MAE}_{\mathrm{raw}} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert$ | Raw mean absolute error in watts, interpretable directly in the original measurement units. |
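The Table 2 definitions translate directly into code. The sketch below is a minimal pure-Python version; the function name `thermal_loss_metrics` is hypothetical. Because NRMSE and R² both normalize by the same population variance, these definitions imply $R^2 = 1 - \mathrm{NRMSE}^2$ whenever both are computed on the same data.

```python
import math


def thermal_loss_metrics(y_true, y_pred):
    """Metrics as defined in Table 2; normalization uses the population std of y_true."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)                 # total sum of squares
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))      # residual sum of squares
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    sigma_y = math.sqrt(ss_tot / n)                                 # population standard deviation
    mae = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    cov = sum((y - mean_y) * (p - mean_p) for y, p in zip(y_true, y_pred))
    return {
        "MAE": mae,
        "NMAE": mae / sigma_y,
        "RMSE": rmse,
        "NRMSE": rmse / sigma_y,
        "R2": 1.0 - ss_res / ss_tot,
        "r": cov / math.sqrt(ss_tot * ss_pred),
    }


m = thermal_loss_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
# m["MAE"] == 2.5 and round(m["R2"], 3) == 0.948 for this four-point example
print({name: round(value, 4) for name, value in m.items()})
```

Note that the Pearson correlation is insensitive to a constant bias or scale error in the predictions, which is why Table 2 pairs it with absolute-error metrics.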
Table 3. Baseline Performance of Machine Learning Algorithms for Thermal Loss Prediction.

| Algorithm | Correlation | R² | MAE (W) | NMAE | RMSE (W) | NRMSE |
|---|---|---|---|---|---|---|
| Linear Regression | 0.8900 | 0.7920 | 21.9 | 0.2898 | 34.6 | 0.4589 |
| Ridge | 0.8899 | 0.7918 | 21.9 | 0.2901 | 34.6 | 0.4591 |
| Random Forest | 0.9520 | 0.9062 | 11.1 | 0.1469 | 23.2 | 0.3082 |
| SVR | 0.9286 | 0.8620 | 13.8 | 0.1832 | 28.2 | 0.3737 |
| MLP | 0.9514 | 0.9051 | 11.9 | 0.1573 | 23.4 | 0.3100 |
| KNN | 0.9612 | 0.9239 | 7.3 | 0.0967 | 20.9 | 0.2776 |
| XGBoost | 0.9555 | 0.9127 | 10.9 | 0.1452 | 22.4 | 0.2973 |
Table 4. Statistical Robustness Analysis: Standard Deviation (σ) and 95% Confidence Interval (CI 95%) for NMAE under the Baseline Approach.

| Algorithm | Mean NMAE (μ) | Standard Deviation (σ NMAE) | CI 95% NMAE |
|---|---|---|---|
| Linear Regression | 0.7968 | 0.0417 | [0.7669, 0.8266] |
| Ridge | 0.7964 | 0.0414 | [0.7668, 0.8261] |
| Random Forest | 0.7938 | 0.0390 | [0.7659, 0.8217] |
| SVR | 0.8244 | 0.0352 | [0.7992, 0.8496] |
| MLP | 0.8132 | 0.0265 | [0.7943, 0.8322] |
| KNN | 0.8340 | 0.0329 | [0.8105, 0.8575] |
| XGBoost | 0.8185 | 0.0420 | [0.7885, 0.8486] |
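The intervals in Table 4 are consistent with a Student-t interval, $\mu \pm t_{0.975,\,n-1}\,\sigma/\sqrt{n}$, with n = 10 repeated runs; the interval widths in Table 6 match the same relation. The number of runs is an inference from the tabulated values, not a figure stated in this section, so the sketch below is a consistency check under that assumption rather than the authors' documented procedure.

```python
import math

# Two-sided 95% Student-t critical value for 9 degrees of freedom (assumes n = 10 runs).
T_975_DF9 = 2.262


def ci95(mean, sd, n=10, t_crit=T_975_DF9):
    """Student-t 95% confidence interval for a mean estimated from n runs."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# (mean NMAE, sigma) pairs transcribed from Table 4.
TABLE_4 = {
    "Linear Regression": (0.7968, 0.0417),
    "Ridge": (0.7964, 0.0414),
    "Random Forest": (0.7938, 0.0390),
    "SVR": (0.8244, 0.0352),
    "MLP": (0.8132, 0.0265),
    "KNN": (0.8340, 0.0329),
    "XGBoost": (0.8185, 0.0420),
}

for name, (mu, sd) in TABLE_4.items():
    lo, hi = ci95(mu, sd)
    print(f"{name:18s} [{lo:.4f}, {hi:.4f}]")
```

Each recomputed bound agrees with the tabulated one to within about 0.0001, the rounding precision of the table.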
Table 5. Performance with Thermal Clustering.

| Algorithm | Correlation | R² | MAE (W) | NMAE | RMSE (W) | NRMSE |
|---|---|---|---|---|---|---|
| Linear Regression | 0.9053 | 0.8196 | 21.3 | 0.2825 | 32.2 | 0.4273 |
| Ridge | 0.9043 | 0.8178 | 21.4 | 0.2835 | 32.4 | 0.4295 |
| Random Forest | 0.9429 | 0.8891 | 13.3 | 0.1769 | 25.3 | 0.3351 |
| SVR | 0.9345 | 0.8731 | 13.0 | 0.1725 | 27.0 | 0.3584 |
| MLP | 0.9561 | 0.9142 | 10.6 | 0.1409 | 22.2 | 0.2947 |
| KNN | 0.9584 | 0.9185 | 7.8 | 0.1032 | 21.7 | 0.2873 |
| XGBoost | 0.9525 | 0.9069 | 11.7 | 0.1549 | 23.2 | 0.3069 |
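Which algorithms benefited from thermal clustering can be read off by differencing the NMAE columns of Tables 3 and 5. The sketch below reuses only the tabulated values; the helper name `nmae_change` is hypothetical.

```python
# NMAE values transcribed from Table 3 (baseline) and Table 5 (thermal clustering).
NMAE_BASELINE = {
    "Linear Regression": 0.2898, "Ridge": 0.2901, "Random Forest": 0.1469,
    "SVR": 0.1832, "MLP": 0.1573, "KNN": 0.0967, "XGBoost": 0.1452,
}
NMAE_CLUSTERED = {
    "Linear Regression": 0.2825, "Ridge": 0.2835, "Random Forest": 0.1769,
    "SVR": 0.1725, "MLP": 0.1409, "KNN": 0.1032, "XGBoost": 0.1549,
}


def nmae_change(baseline, clustered):
    """Return {algorithm: clustered - baseline}; negative values mean clustering helped."""
    return {name: round(clustered[name] - baseline[name], 4) for name in baseline}


delta = nmae_change(NMAE_BASELINE, NMAE_CLUSTERED)
improved = sorted(name for name, d in delta.items() if d < 0)
worse = sorted(name for name, d in delta.items() if d > 0)

# List algorithms from largest NMAE reduction to largest increase.
for name, d in sorted(delta.items(), key=lambda kv: kv[1]):
    print(f"{name:18s} {d:+.4f}")
```

By this comparison, clustering lowered NMAE for Linear Regression, Ridge, SVR, and MLP and raised it for KNN, Random Forest, and XGBoost; the correlation columns of the two tables show the same split.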
Table 6. Robustness Statistics: Standard Deviation (σ) and Confidence Interval (CI 95%) for NMAE under the Thermal Clustering Approach.

| Algorithm | Standard Deviation (σ NMAE) | CI 95% NMAE |
|---|---|---|
| Linear Regression | 0.0383 | [0.7661, 0.8208] |
| Ridge | 0.0383 | [0.7661, 0.8208] |
| Random Forest | 0.0398 | [0.7807, 0.8378] |
| SVR | 0.0359 | [0.7769, 0.8283] |
| MLP | 0.0383 | [0.7702, 0.8250] |
| KNN | 0.0484 | [0.8146, 0.8839] |
| XGBoost | 0.0408 | [0.7985, 0.8569] |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Garcia Vera, Y.; Gallego, A.; Camargo Cala, D.; Mesa, F. Machine Learning Prediction of Thermal Losses in MonoPERC Solar Modules: A Novel Clustering Approach for Tropical Climate Applications. Energies 2025, 18, 6029. https://doi.org/10.3390/en18226029

AMA Style

Garcia Vera Y, Gallego A, Camargo Cala D, Mesa F. Machine Learning Prediction of Thermal Losses in MonoPERC Solar Modules: A Novel Clustering Approach for Tropical Climate Applications. Energies. 2025; 18(22):6029. https://doi.org/10.3390/en18226029

Chicago/Turabian Style

Garcia Vera, Yimy, Andres Gallego, David Camargo Cala, and Fredy Mesa. 2025. "Machine Learning Prediction of Thermal Losses in MonoPERC Solar Modules: A Novel Clustering Approach for Tropical Climate Applications" Energies 18, no. 22: 6029. https://doi.org/10.3390/en18226029

APA Style

Garcia Vera, Y., Gallego, A., Camargo Cala, D., & Mesa, F. (2025). Machine Learning Prediction of Thermal Losses in MonoPERC Solar Modules: A Novel Clustering Approach for Tropical Climate Applications. Energies, 18(22), 6029. https://doi.org/10.3390/en18226029
