Article

Optoelectronic Devices Analytics: Machine Learning-Driven Models for Predicting the Performance of a Dye-Sensitized Solar Cell

1 Physics Discipline, Department of Computational Sciences, Private Bag X1314, Alice 5700, Eastern Cape, South Africa
2 SAMRC Microbial Water Quality Monitoring Centre, University of Fort Hare, Private Bag X1314, Alice 5700, Eastern Cape, South Africa
* Author to whom correspondence should be addressed.
Electronics 2025, 14(10), 1948; https://doi.org/10.3390/electronics14101948
Submission received: 3 April 2025 / Revised: 5 May 2025 / Accepted: 7 May 2025 / Published: 10 May 2025
(This article belongs to the Special Issue Modeling and Design of Solar Cell Materials)

Abstract

Optoelectronic devices, which combine optics and electronics, are vital for converting light energy into electrical energy. Various solar cell technologies, such as dye-sensitized solar cells (DSSCs), silicon solar cells, and perovskite solar cells, among others, belong to this category. DSSCs have gained significant attention due to their affordability, flexibility, and ability to function under low light conditions. The current research incorporates machine learning (ML) models to predict the performance of a modified Eu3+-doped Y2WO6/TiO2 photo-electrode DSSC. Experimental data were collected from the “Dryad Repository Database” to feed into the models, and a detailed data visualization analysis was performed to study the trends in the datasets. The support vector regression (SVR) and Random Forest regression (RFR) models were applied to predict the short-circuit current density (Jsc) and maximum power (Pmax) output of the device. Both models achieved reasonably accurate predictions, with the RFR model attaining the better prediction response: the percentage difference between the experimental data and model prediction was 0.73% and 1.01% for Jsc and Pmax, respectively, while the SVR attained percentage differences of 1.22% and 3.54% for Jsc and Pmax, respectively.

Graphical Abstract

1. Introduction

Clean energy is an essential need for humankind and modern society due to the alarming rate of environmental degradation caused by the burning of fossil fuels. Overreliance on non-renewable energy sources has led to many negative consequences, including air and water pollution, deforestation, and climate change. Transitioning to clean energy sources such as solar, wind, and hydroelectric power is essential to reducing these environmental effects and ensuring that future generations have a sustainable future. Solar energy is a particularly promising energy source that has been essential to reducing environmental pollution and developing renewable energy technologies. One of the ways to harness and process solar energy is with the use of optoelectronic devices such as solar cells [1,2]. Amongst these devices, dye-sensitized solar cells (DSSCs) have gained particularly significant attention due to their affordability, flexibility, and ability to function under low light conditions [3,4,5]. DSSCs consist of a mesoporous semiconductor layer (usually TiO2), a dye-sensitizer, an electrolyte, and a counter electrode. When the solar spectrum irradiates the dye molecules, electrons are lifted to the excited state and then transferred into the conduction band of the semiconductor, generating an electric current.
DSSCs were first introduced by O’Regan and Gratzel in 1991 [6], which represented a breakthrough in the field of photovoltaics. They play a significant role in renewable energy solutions, offering benefits over conventional/traditional silicon-based solar cells. These benefits include flexibility, transparency, and effective performance in low light conditions [7,8]. However, their efficiency and performance are influenced by various factors, including the types of dyes, the kinds of semiconductor materials, and the composition of the electrolytes [9,10,11,12]. Hence, it is crucial to establish a robust framework for predicting the performance of DSSCs before their fabrication and development. Machine learning (ML) emerges as an attractive option in this research area, offering a promising approach to enhancing our understanding and optimization of DSSC technology. ML, a branch of artificial intelligence (AI), applies data-driven algorithms to identify patterns and make predictions. In the context of optoelectronic devices, ML techniques can analyze extensive datasets from both experimental and simulation sources to derive meaningful insights related to performance metrics [13]. By examining parameters such as current density, output voltage, and electrical resistance, ML models provide a deeper understanding of the intricate relationships that influence DSSC power output performance, thereby enhancing their design and efficiency. The development of DSSC technology relies heavily on empirical data gathered from experiments. The significant volume of data and the variability present in experimental conditions can obscure patterns and complicate the optimization process. Optoelectronic device analytics offers a systematic approach to data analysis by employing ML to create predictive models that facilitate the performance enhancement of DSSCs [14]. This synergy can improve research efficiency and lead to quicker advancements in DSSC technologies.
Various ML methodologies are being applied to predict the performance of DSSCs. Kandregula et al. [15] developed a data-driven approach towards identifying dye-sensitizer molecules for higher power conversion efficiency in solar cells. The study identified a robust method for interpreting the quantitative structure–property relationship model of dye molecules via the combination of structural, quantum, and experimental properties in determining the power conversion efficiency of DSSCs using ML and computational methods. The features with the most impact in predicting power conversion efficiency were selected for further analysis in developing various machine learning models such as linear regression, sequential minimal optimization regression, Random Forest, and multilayer perceptron neural networks. Random Forest emerged as the best model with a root mean square error (RMSE) of 0.802. Sutar et al. [16] predicted the properties of a hydrothermally synthesized ZnO-based dye-sensitized solar cell using ML techniques. Their work reported on the prediction of the power conversion efficiency of a ZnO-based DSSC using Random Forest and artificial neural network algorithms, and both models predicted the efficiency of the ZnO-based DSSC with reasonable accuracy. Their analysis revealed that the Random Forest algorithm accurately predicted the power conversion efficiency with an adjusted coefficient of determination, or Adj. R-squared (R2), of 0.7232. Interestingly, the artificial neural network yielded an even more precise result, achieving an Adj. R2 value of 0.7447 in predicting the relationship between experimental power conversion efficiency and model prediction. Notably, a greater concentration of data points aligned closely with the diagonal line when utilizing the artificial neural network, indicating its superior performance in forecasting the power conversion efficiency of the ZnO-based DSSC.
Other ML-driven models, such as the support vector machine (SVM) and Random Forest regression (RFR), have also shown significant promise in predicting the performance of DSSC-based optoelectronic devices [13,17,18,19,20,21]. These models can capture complex patterns and non-linear relationships in data, thereby making them suitable for predicting the performance of DSSC devices [5]. The SVM exhibits robustness in high-dimensional spaces and can be applied to find a hyperplane that best separates the data points into different categories. From the perspective of regression, support vector regression (SVR) aims to find a function that deviates from the actual data points by a value no greater than a specific threshold ε while ensuring that the function is as flat as possible [22]. RFRs are ensemble learning methods that operate by constructing multiple decision trees and outputting the mean prediction (regression) or mode (classification) of the individual trees [23]. RFR models can handle a large number of input features without overfitting. They work by averaging the predictions of many decision trees, each trained on a different subset of the data, to reduce variance and enhance accuracy. All of these properties are helpful in predicting the photovoltaic performance of DSSCs.
In this research, ML-driven models, particularly SVR and RFR, were trained to predict the short-circuit current density ($J_{sc}$) and maximum power ($P_{max}$) output performance of a TiO2-based DSSC, in particular, a modified Eu3+-doped Y2WO6/TiO2 photo-electrode for improved light harvesting in DSSCs. The Eu3+-doped Y2WO6/TiO2 photo-electrode was chosen for the ML-driven models because of its light harvesting capacity, which enhances electron transport properties to yield better DSSC performance. Both models achieved reasonably accurate predictions, and the better model was selected for further analysis. To find the hidden trends in the data, a detailed data visualization analysis was implemented. Correlation analysis was applied to determine the relationships between different features in the datasets. To determine the impact of features on the power output of the model, the SHapley Additive exPlanations (SHAP) technique was applied. Key results show that the RFR model had better predictive performance than the SVR and that the current density was the most impactful feature for power output prediction. The incorporation of ML into optoelectronic devices provides insight into the key features impacting the power performance of DSSCs, and optimizing these features would provide improved DSSC performance. The uniqueness and novelty of this study rest upon the incorporation of SVR and RFR models for predicting the short-circuit current density and maximum power of a DSSC, coupled with SHAP analysis to interpret feature contributions, providing explainable insights that have not been reported in DSSC research. This contribution provides analytical insight that could help advance DSSC design strategies.

2. Theoretical Framework

2.1. Equivalent Circuit Model of DSSCs

In general, the equivalent circuit model of a DSSC contains a diode, a source of photo-generated current density ($J_{Ph}$), a series resistance ($R_s$), a shunt resistance ($R_{Sh}$), and capacitances due to charge carrier transport at the surface of the platinum (Pt) counter electrode ($C_{Pt}$) and in the electrolyte ($C_{Elec}$). The diode represents the interface between the dye-coated semiconductor (TiO2) and the electrolyte. The current source is associated with the creation of a photo-generated current in the cell, which is proportional to the intensity of light. Losses are represented by the series resistance, which is related to the contact resistance of the metal–semiconductor junction, the ohmic resistance of the metal contact, and the ohmic resistance of the semiconductor material. The shunt resistance represents the leakage current along the edges of the cell. $C_{Pt}$ and $R_{Pt}$ form the impedance due to carrier transport at the surface of the counter electrode, where $R_{Pt}$ is the resistance at the surface of the Pt counter electrode. $C_{Elec}$ and $R_{Elec}$ form the impedance due to carrier transport through the ions in the electrolyte at the dye–electrolyte interface, where $R_{Elec}$ is the resistance within the electrolyte. The surface resistance of the cell is represented as $R_{Surf}$. Since the capacitances happen to be quite small compared to the other factors affecting the cell, they can be neglected, so that $R_{Surf}$, $R_{Elec}$, and $R_{Pt}$ can be resolved into $R_s$. A schematic circuit diagram is shown in Figure 1. The effective $R_s$ is obtained as follows [24,25]:
$R_s = R_{Pt} + R_{Elec} + R_{Surf}$ (1)
Kirchhoff’s current law (KCL) states that the total current entering a junction or node in an electrical circuit is equal to the total current leaving the junction. From this law, it can be deduced from the circuit that $J_{Ph}$ enters a junction whereas $J_d$ and $J$ exit it. Therefore, the photo-generated current density $J_{Ph}$ (in mA/cm²) can be expressed as in Equation (2):
$J_{Ph} = J_d + J$ (2)
where $J_d$ and $J$ are the current density diverted through the diode and the auxiliary current density, respectively. The same KCL can be applied on the shunt side of the circuit: $J$ enters the junction of the shunt, whereas $J_{Sh}$ and $J_{Cell}$ exit it. Therefore, on the shunt side of the circuit, the following equation can be deduced, as shown in Equation (3):
$J = J_{Sh} + J_{Cell}$ (3)
where $J_{Sh}$ is the current density through the shunt resistance and $J_{Cell}$ is the output current density. Therefore, the solar cell’s characteristic current density–voltage ($J_{Cell}$–$V_{Cell}$) behavior can be expressed by combining Equations (2) and (3): substituting $J$ in Equation (2) using Equation (3) and then making $J_{Cell}$ the subject of the expression yields Equation (4):
$J_{Cell} = J_{Ph} - J_d - J_{Sh}$ (4)
The current diverted through shunt resistance can be expressed as in Equation (5):
$J_{Sh} = \dfrac{V_{Cell} + J_{Cell} R_s}{R_{Sh}}$ (5)
where $R_s$ is the series resistance within the active area of the cell (Ω·cm²) and $R_{Sh}$ is the shunt resistance within the active area of the cell (Ω·cm²).
The current diverted through the diode is governed by the following:
$J_d = J_0 \left[\exp\!\left(\dfrac{q(V_{Cell} + J_{Cell} R_s)}{nkT}\right) - 1\right]$ (6)
The Shockley equation describes the exponential dependence of the current density as a function of the applied voltage. From the expression of the Shockley diode equation [26], the cell’s net current density can then be expressed as follows:
$J_{Cell} = J_{Ph} - J_0 \left[\exp\!\left(\dfrac{q(V_{Cell} + J_{Cell} R_s)}{nkT}\right) - 1\right] - \dfrac{V_{Cell} + J_{Cell} R_s}{R_{Sh}}$ (7)
where $J_0$, $n$, $T$, $q$, and $k$ are the reverse saturation current density (mA/cm²), ideality factor (unitless), absolute temperature (K), elementary charge (C), and Boltzmann’s constant (J/K), respectively. Since $J_{Ph} - J_d \gg J_{Sh}$, it follows that $J_{Cell} \approx J_{Ph} - J_d$ [27,28,29,30]. When the cell’s output voltage is zero, $J_{Ph}$ is effectively equal to the short-circuit current density ($J_{Sc}$). Equation (7) can then be simplified under the approximation that $R_s \to 0$ and $R_{Sh} \to \infty$:
$J_{Cell} = J_{Sc} - J_0 \left[\exp\!\left(\dfrac{q V_{Cell}}{nkT}\right) - 1\right]$ (8)
On the other hand, when the cell’s output current is equal to zero ($J_{Cell} = 0$), the cell’s output voltage ($V_{Cell}$) equals the open-circuit voltage ($V_{OC}$). Equation (8) can then be re-written as follows:
$V_{OC} = \dfrac{nkT}{q} \ln\!\left(\dfrac{J_{Sc}}{J_0} + 1\right)$ (9)
The power output ($P_{Cell}$) can be expressed as the product of the current density and the voltage over the given active area of the cell:
$P_{Cell} = J_{Cell} \cdot V_{Cell} \cdot A = J_{Sc} V_{Cell} A - J_0 V_{Cell} A \left[\exp\!\left(\dfrac{q V_{Cell}}{nkT}\right) - 1\right]$ (10)
where A is the active area of the solar cell.

2.2. Support Vector Regression (SVR)

Support vector regression (SVR) is a variant of the support vector machine (SVM) algorithm commonly used for regression analysis. The SVM is a popular machine learning model for forecasting DSSC performance, likely owing to its capacity to offer flexible algorithm control [31]. From the perspective of Vapnik [32], the SVM algorithm is the best technique for defining pattern recognition problems. The SVM can be applied to both data classification and regression models. As depicted in Figure 2, the architecture of SVR bears similarities to that of an artificial neural network (ANN). It establishes a connection between the input layer and the output layer through the inclusion of a hidden layer. This hidden layer is critical, as it allows for the automatic computation of relationships based on the dataset, enabling the model to capture non-linear patterns and interactions.
For classification purposes, the SVM aims to determine the optimal hyperplane in N-dimensional space that separates the data points into different classes in the feature space [33]. A hyperplane is a flat subspace of dimension N − 1 in N-dimensional space, where N is the number of dimensions; the dimension of a hyperplane is always one less than that of the ambient space. For instance, in a 2D space, the hyperplane is a line; in a 3D space, it is a plane. The optimal hyperplane is the one that maximizes the margin, defined as the distance between the hyperplane and the nearest data points from each class, called support vectors [34].
Just like the SVM, the SVR applies the concepts of a hyperplane and a margin, but with a slight difference. In SVR, the margin becomes the error tolerance of the model, also known as the ε-insensitive tube. The tube allows for some deviation of the data points from the hyperplane without them being considered errors [35]. Here, the hyperplane becomes the best possible fit to the data that fall within the tube.
The concept of SVR involves transforming the feature vectors of sample data from a lower dimension to a higher dimension. This transformation facilitates the application of a regression analysis in the higher-dimensional space through the use of a kernel function, which allows for the model to capture complex relationships within the data. As illustrated in Figure 3, this transformation is crucial for enabling SVR to effectively learn from the data by finding the optimal hyperplane that best fits the samples.
In essence, the SVR approach leverages the kernel function to project data into a space where it becomes easier to separate and analyze. This enables the regression model to accommodate non-linear relationships between the input features and the output variables. The operation of the support vector regression machine can be mathematically expressed as follows:
$f(x) = w \cdot x + b$ (11)
where $w$ represents the coefficient vector, $x$ is the input feature vector, and $b$ is the bias constant. The $w$ determines the slope, $x$ contains the various attributes of the data, and $b$ allows for the vertical shifting of the function. The formulation of SVR takes the form of a convex optimization problem, as defined in Equation (12). The purpose is to find the function $f(x)$ that is as flat as possible. For linear functions, flatness means a small value of $w$ in the linear regression function. Therefore, minimizing the squared Euclidean norm $\|w\|^2$ enforces a small $w$, meeting the target of the SVR and ensuring the flatness of the regression function. To determine the optimal regression function, the following minimization must be defined.
Minimize: $\dfrac{\|w\|^2}{2} + C \dfrac{1}{N} \sum_{i=1}^{N} L\big(f(x_i),\, y_i\big)$ (12)
where $L(f(x_i), y_i)$ represents the loss function between the predicted output $f(x_i)$ and the true output $y_i$ for each data point $x_i$ in a dataset of size $N$. $C$ is the penalty factor, a regularization term that helps prevent overfitting by penalizing large weights in the model.
The ε-insensitive loss function $L_\varepsilon$ is defined piecewise as follows:
$L_\varepsilon = \begin{cases} 0, & \text{if } |f(x_i) - y_i| \le \varepsilon \\ |f(x_i) - y_i| - \varepsilon, & \text{if } |f(x_i) - y_i| > \varepsilon \end{cases}$ (13)
Here, ε is a threshold that determines the sensitivity of the loss function. The first case indicates that, when the prediction is close to the true value (within the margin defined by ε), the loss is zero, making the model insensitive to small errors. The second case quantifies the loss for predictions that exceed this threshold. The norm $\|w\|$ has to be as small as possible to enhance smoothness, reduce complexity, and avert overfitting [36]. The overall goal is to find the optimal weights $w$ that minimize the combination of the regularization term and the aggregated loss across all training samples, thus ensuring a balance between model complexity and predictive accuracy.
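The piecewise loss of Equation (13) is simple enough to state directly in code. The following is a minimal Python sketch of the ε-insensitive loss; the function name and default ε are illustrative choices, not from the paper.

```python
def eps_insensitive_loss(y_pred, y_true, eps=0.1):
    """Equation (13): zero loss inside the epsilon tube, linear loss outside it."""
    residual = abs(y_pred - y_true)
    return 0.0 if residual <= eps else residual - eps

print(eps_insensitive_loss(1.05, 1.00))  # inside the tube -> 0.0
print(eps_insensitive_loss(1.30, 1.00))  # outside the tube -> |0.30| - 0.1 = 0.2
```

Because deviations smaller than ε contribute nothing, only points on or outside the tube boundary (the support vectors) influence the fitted regression function.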
By formulating the Lagrange equation and applying the Karush–Kuhn–Tucker (KKT) conditions to derive the partial derivatives of the parameters [37], we can successfully obtain the dual mode of the SVR model. This approach not only allows for a more efficient computation of the model’s parameters, but also facilitates the incorporation of constraints inherent in the optimization problem. The resulting decision function is expressed as follows:
$f(x) = \sum_{i=1}^{l} (\alpha_i^* - \alpha_i)\, K(x_i, x) + b$ (14)
Here, $\alpha_i^*$ and $\alpha_i$ are the Lagrange multipliers corresponding to the support vectors $x_i$, while $K(x_i, x)$ represents the kernel function that measures the similarity between the input data points. The term $b$ is again the bias term, which shifts the decision boundary. This formulation enables us to effectively capture the underlying patterns in the data, allowing for accurate predictions in various regression analyses. Moreover, leveraging the dual formulation often leads to computational advantages, particularly when dealing with high-dimensional feature spaces.
References [38,39] reflect the applicability of the linear and radial basis function (RBF) kernels in predicting the non-linear regression of a system. Therefore, both the linear and RBF kernels were chosen in this paper due to their proven effectiveness in handling complex datasets. The flexibility of these kernels is particularly notable, as they allow for the adjustment of the kernel function coefficient γ, which plays a critical role in shaping the model’s behavior. The linear kernel is ideal for scenarios where the relationship between input variables and the response is approximately linear. It provides simplicity and interpretability, making it easier to understand the influence of individual features on the output. In contrast, the RBF kernel is designed to capture more intricate patterns within the data by measuring the similarity between points in a transformed feature space. This capability is essential when dealing with the non-linear relationships commonly observed in the J–V and P–V response data of DSSCs. By adjusting the coefficient γ, we can control the spread of the RBF kernel, impacting how tightly the model fits the training data [40,41]. A small γ value leads to a smoother decision boundary, whereas a larger γ results in a more complex model that can closely follow the training data, but may risk overfitting. The mathematical representation of the RBF kernel is given by the following:
$K(x_i, x_j) = \exp\!\left(-\gamma \|x_i - x_j\|^2\right)$ (15)
where $x_i$ and $x_j$ are the input vectors and $\|x_i - x_j\|$ denotes the Euclidean distance between them. This formulation highlights how the RBF kernel’s flexibility allows it to adapt to varying data distributions, making it particularly suitable for our analysis in predicting the performance of a DSSC. Together, the linear and RBF kernels provide a robust framework for modeling non-linear regression tasks. Their ability to be tuned through the kernel function coefficient (γ) ensures that our approach can effectively capture both the linear and non-linear characteristics of the data, ultimately leading to more accurate predictions and deeper insights into DSSC performance. Most importantly, the penalty factor C, the kernel function coefficient γ, and the maximum deviation ε all influence the outcome of the SVR [38].
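Equation (15) can be sketched in a few lines of Python. The function below is an illustrative stand-alone implementation (scikit-learn provides an equivalent internally); the sample vectors and γ values are arbitrary.

```python
import math

def rbf_kernel(x_i, x_j, gamma=1.0):
    """Equation (15): K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)

a, b = [1.0, 2.0], [2.0, 4.0]
print(rbf_kernel(a, a))              # identical points -> 1.0
print(rbf_kernel(a, b, gamma=0.1))   # similarity decays with squared distance
```

Identical inputs always give a similarity of 1, and increasing γ makes the similarity fall off more sharply with distance, which is exactly the smooth-versus-tight fitting trade-off described above.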

2.3. Random Forest Regression (RFR)

Random Forest regression (RFR) is an ensemble learning technique that builds multiple decision trees and outputs the mean of their predictions. In the field of machine learning, the algorithm stands out for its distinct validity and verifiability. By utilizing random sampling and sophisticated features from clustered sampling approaches, it generates accurate estimates and offers improved generalization capacity [42,43].
RFR consists of $M$ sample trees, denoted as $\{T_m(X),\ m = 1, 2, \ldots, M\}$, and an $N$-dimensional input vector, $X = (x_1, x_2, \ldots, x_N)$, which together form the forest. Each of the $M$ trees produces an output $y_m = T_m(X)$, resulting in $M$ outputs corresponding to the $M$ trees. The final prediction of the algorithm is obtained by averaging the outputs from all of the trees, as shown in Figure 4. Reference [44] noted that, as the number of trees increases, the generalization error converges to a limit for both classification and regression tasks. The training set comprises pairs $(X_i, Y_i)$, where $X_i$ (for $i = 1, \ldots, n$) is the $N$-dimensional input vector and $Y_i$ is the corresponding output [45].
In the Random Forest (RF) algorithm, a new training set is created for each regression tree by drawing bootstrap samples (with replacement) from the original training set, which leads to the exclusion of certain training data. These excluded data points are referred to as out-of-bag (OOB) samples, which typically comprise 20% of the new training set; the remaining 80% are utilized to develop the regression function. This process allows for the creation of a regression tree from a randomly selected training sample each time, while the out-of-bag samples are employed for accuracy assessment. These built-in validation features enhance the ability of Random Forests to generalize to independent test data. The overall learning error, represented as $\bar{y}_e$, is calculated as follows:
$\bar{Y}_i(X_i) = \dfrac{1}{M} \sum_{m=1}^{M} y_m$ (16)
$\bar{y}_e = \dfrac{1}{n} \sum_{i=1}^{n} \left(\bar{Y}_i - Y_i\right)^2$ (17)
In this context, $\bar{Y}_i$, $Y_i$, and $n$ represent the estimated output from the trees generated using out-of-bag samples, the desired output, and the total number of out-of-bag samples (which corresponds to the size of the input dataset), respectively. This error metric reflects the predictive performance of the Random Forest algorithm.
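The bootstrap/OOB mechanism of Equations (16) and (17) can be illustrated with a deliberately simplified toy ensemble. In this sketch each "tree" is reduced to a constant predictor (the mean of its bootstrap sample) so that the bagging and OOB bookkeeping, not tree induction, is what the code demonstrates; all data and sizes are made up.

```python
import random

random.seed(0)
# Toy dataset: (input, noisy target) pairs -- not the paper's data
data = [(float(i), 2.0 * i + random.gauss(0.0, 0.5)) for i in range(20)]
M = 50  # number of "trees" in the toy forest

models, oob_sets = [], []
for _ in range(M):
    # Bootstrap sample: draw n indices with replacement
    idx = [random.randrange(len(data)) for _ in range(len(data))]
    sample_ys = [data[i][1] for i in idx]
    # Toy "tree": predicts the mean of its own bootstrap sample
    models.append(sum(sample_ys) / len(sample_ys))
    # OOB set: the training points this model never saw
    oob_sets.append(set(range(len(data))) - set(idx))

# Equation (16): the ensemble prediction is the average over all members
ensemble_pred = sum(models) / M

# Equation (17): OOB error -- each point is scored only by models that excluded it
sq_errors = []
for i, (_, y_true) in enumerate(data):
    preds = [m for m, oob in zip(models, oob_sets) if i in oob]
    if preds:
        sq_errors.append((sum(preds) / len(preds) - y_true) ** 2)
oob_error = sum(sq_errors) / len(sq_errors)
print(f"ensemble prediction = {ensemble_pred:.2f}, OOB error = {oob_error:.2f}")
```

A real Random Forest replaces the constant predictor with a fitted decision tree, but the averaging in Equation (16) and the held-out OOB scoring in Equation (17) work exactly as shown.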

3. Methods

The dataset used in this study comprises various features including voltage, current density, power, and resistance. The target variable for performance prediction is the power output for SHAP analysis. For predicting the short-circuit current density $J_{sc}$ and maximum power output $P_{max}$ of the DSSCs, only the voltage feature was applied. The data contained 100 data points with no missing values. The aim is to predict the short-circuit current density and maximum power of the DSSC using the given dataset. The dataset was sourced from the “Dryad Repository Database” and was loaded into and processed from a MySQL database [46,47,48,49] to ensure consistency and accuracy. MySQL ensures the consistency and accuracy of the data by guaranteeing that all queries from the database are processed reliably. Features like foreign keys for referential integrity, transaction handling with commit/rollback, and constraints like UNIQUE and NOT NULL help maintain accurate and consistent data throughout the system and ensure that the relationships between the tables in the database remain stable. The predictive modeling involved two machine learning algorithms: the SVR and RFR models. These models were chosen based on their complementary strengths. SVR is known for its kernel-type functionalities suitable for both linear and non-linear datasets, whereas RFR was chosen due to its robustness to noise and its ability to model complex feature interactions without significant risk of overfitting. Hyper-parameter optimization for both models was performed using the GridSearchCV method. The SVR and RFR models were implemented using the Scikit-learn Python library in the PyCharm software, version 2023.2.8. To interpret feature contributions and enhance model transparency, SHAP (SHapley Additive exPlanations) analysis was conducted.
SHAP values allowed us to quantify the contribution of each feature to the model’s predictions, thus providing a foundation for physical insight into the most influential parameters affecting DSSC performance.
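The Shapley values underlying SHAP can be computed exactly by brute force when the feature count is small, which makes the attribution idea concrete. The sketch below uses a toy linear surrogate model (not the paper's RFR) and fills "missing" features from a background point; for a linear model the resulting attribution for feature i reduces to $w_i (x_i - \text{background}_i)$, and the attributions sum to the gap between the model output and the background output (the SHAP efficiency property).

```python
from itertools import combinations
from math import factorial

# Toy linear surrogate over three features (illustrative weights, not fitted)
WEIGHTS = [2.0, -1.0, 0.5]
def model(x):
    return sum(w * v for w, v in zip(WEIGHTS, x))

def shapley_values(x, background):
    """Exact Shapley values; 'missing' features are taken from a background point."""
    n = len(x)
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                # Shapley coalition weight |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in subset or j == i) else background[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else background[j]
                             for j in range(n)]
                phi += weight * (model(with_i) - model(without_i))
        phis.append(phi)
    return phis

x, bg = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phis = shapley_values(x, bg)
print(phis)  # for a linear model: phi_i = w_i * (x_i - bg_i)
```

The SHAP library used in practice estimates these same quantities efficiently for tree ensembles, so that the per-feature contributions to each predicted power output can be read off directly.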

Model Training

Prior to model training, the dataset was checked for completeness and consistency, confirming that no missing values were present. A min–max normalization technique was applied to scale all features to the range (−1, 1) to improve model convergence and performance, particularly for models like SVR. In machine learning [50,51], normalization is commonly applied to mitigate the impact of varying data ranges. In this study, we opted for a normalization technique that scales data to the range of (−1, 1), which can be expressed as follows:
$y = 2 \cdot \dfrac{x - x_{min}}{x_{max} - x_{min}} - 1$ (18)
Here, $y$ is the normalized value and $x_{min}$ and $x_{max}$ denote the minimum and maximum values of the feature $x$. We selected (−1, 1) scaling to balance both the direction and distribution of values, which is vital for kernel-based models like SVR for better convergence and lower MAE values [52]. This approach centers the data on zero and maintains symmetry, conditions under which SVR models often perform better. To evaluate the performance of the SVR and RFR methods, the training and test datasets were randomly divided from the experimental dataset, with 80% of the data used for training and 20% for testing, ensuring a robust evaluation of model performance. This split ratio is widely regarded as effective for developing predictive models of this kind. Using 80% of the data for training allows the models to learn the underlying patterns and relationships within the dataset effectively, while the remaining 20% serves as an independent test set to evaluate generalization to new, unseen data. This approach not only enhances the reliability of the results, but also minimizes the risk of overfitting, a common challenge in predictive modeling [53,54]. The data contained 100 samples, and the 80/20 split was randomized and repeated using 5-fold cross-validation to validate model robustness. The random state was set to “SEED” to allow for the reproducibility of the results for every run. The 5-fold cross-validation is essential for evaluating the performance and reliability of the models [55,56]. Instead of relying on a single train–test split, which might introduce bias or variance depending on how the data are partitioned, 5-fold cross-validation divides the training set into five equal parts, or folds. In each round of training, four of these folds are used to train the model, and the remaining one is used for testing.
This process is repeated five times so that each part serves as the testing/validation set once. This approach makes efficient use of every data point and works well with a small dataset because the model is tested across different subsets of data.
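The two preprocessing steps above, scaling each feature to (−1, 1) per Equation (18) and partitioning the training data into five folds, can be sketched together in plain Python. The function names and the sample voltage column are illustrative; the study itself relies on Scikit-learn's equivalents.

```python
def minmax_scale(values, lo=-1.0, hi=1.0):
    """Equation (18): rescale a feature column to the range (lo, hi)."""
    x_min, x_max = min(values), max(values)
    return [lo + (hi - lo) * (x - x_min) / (x_max - x_min) for x in values]

def k_fold_indices(n_samples, k=5):
    """Build k folds of indices; each fold serves as the validation set once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return [([i for j, f in enumerate(folds) if j != fold for i in f], folds[fold])
            for fold in range(k)]

voltages = [0.0, 0.25, 0.5, 0.75, 1.0]   # illustrative feature column
scaled = minmax_scale(voltages)
print(scaled)                            # -> [-1.0, -0.5, 0.0, 0.5, 1.0]

# With 80 training samples, each CV round trains on 64 and validates on 16
for train_idx, val_idx in k_fold_indices(80):
    assert len(train_idx) == 64 and len(val_idx) == 16
```

Every index lands in exactly one validation fold across the five rounds, which is why the procedure uses each of the 80 training samples for both fitting and validation without ever scoring a model on data it was trained on in that round.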
The “GridSearchCV” approach from the Python library was applied to systematically identify the optimal hyper-parameters for both the SVR and RFR models. This method allows for a complete search of the specified parameter values, ensuring the selection of the best configuration for the optimized prediction of $J_{sc}$ and $P_{max}$ of the DSSC. Once the optimal hyper-parameters were identified, the next step was model training. The SVR model was initialized with the selected values for the regularization parameter (C) and kernel function coefficient (γ). During the training process, the model was fitted to the training data, aiming to find the optimal hyperplane that minimized prediction error while maintaining a level of simplicity. Notably, SVR applies an epsilon-insensitive loss function, which allows the model to ignore small errors within a specified margin (ε) [57,58,59]. This characteristic enables the model to focus on significant deviations while disregarding trivial ones, thereby enhancing robustness against noise in the data.
Kernel types, specifically the linear and radial basis function (RBF) kernels, were explored to capture the underlying patterns in the data effectively. Additionally, we fine-tuned γ by testing values from the set [0.1, 1], which helped balance the model’s complexity and generalization capability. C was also optimized, with the values [1, 10] tested to control the trade-off between achieving a low training error and a low testing error. Furthermore, the epsilon parameter, which defines the width of the epsilon-insensitive zone, was adjusted with values of [0.1, 0.2]. The resulting configuration yielded the best performance metrics for both the Jsc and Pmax predictions.
For the Random Forest model, hyper-parameter tuning was less extensive than for the SVR model, focusing primarily on the number of estimators and the maximum depth of the trees. We evaluated n_estimators with values of [100, 200], which dictates the number of trees in the forest and impacts the robustness and accuracy of predictions. The maximum depth was tested at values of [10, 20] to prevent overfitting while ensuring that the model remained sufficiently complex to capture the patterns in the data. Additionally, we set the random state to [0] to ensure the reproducibility of results. This tuning led to optimal outcomes for both the Jsc and Pmax predictions. Overall, the GridSearchCV methodology provided a rigorous framework for hyper-parameter optimization, ultimately enhancing the predictive performance of both the SVR and RFR models [60]. The selected hyper-parameter ranges for SVR and RFR were determined based on empirical considerations aimed at balancing model flexibility and generalization; they offer a controlled yet effective search space for GridSearchCV, supporting model tuning while mitigating overfitting.
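The grid search over the hyper-parameter ranges quoted above can be sketched as follows; this is a minimal illustration assuming scikit-learn, with synthetic stand-in data rather than the DSSC measurements:

```python
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_regression  # toy stand-in data

X, y = make_regression(n_samples=100, n_features=1, noise=5.0, random_state=0)

# Hyper-parameter grids quoted in the text
svr_grid = {"kernel": ["linear", "rbf"], "C": [1, 10],
            "gamma": [0.1, 1], "epsilon": [0.1, 0.2]}
rfr_grid = {"n_estimators": [100, 200], "max_depth": [10, 20]}

# Exhaustive search with 5-fold cross-validation, scored by MAE
svr_search = GridSearchCV(SVR(), svr_grid, cv=5,
                          scoring="neg_mean_absolute_error")
rfr_search = GridSearchCV(RandomForestRegressor(random_state=0), rfr_grid,
                          cv=5, scoring="neg_mean_absolute_error")
svr_search.fit(X, y)
rfr_search.fit(X, y)
print(svr_search.best_params_)
print(rfr_search.best_params_)
```

`best_params_` then holds the winning configuration for each model, which is what the subsequent training step uses.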
The selection of the SVR and RFR models was based on a combination of accuracy, computational time, and interpretability. SVR is well suited to small- to medium-sized datasets and offers a controlled balance between bias and variance via hyper-parameters such as the regularization parameter, kernel function, and epsilon margin. RFR, on the other hand, delivers robust performance on noisy and non-linear datasets and has strong resilience against overfitting due to ensemble averaging, along with the ability to capture feature interactions with minimal preprocessing [43]. Other models, such as artificial neural networks (ANNs) and gradient boosting trees (GBTs), often require more extensive hyper-parameter tuning, longer computational times, and larger training datasets to prevent overfitting and improve generalizability.
The R-squared (R2) value and mean absolute error (MAE) are crucial metrics for assessing the prediction performance of a model [61]. R2, also known as the coefficient of determination, quantifies the proportion of variance in the dependent variable that can be explained by the independent variables in the model. R2 typically ranges from 0 to 1 (and can be negative for models that perform worse than predicting the mean); an R2 close to 1 indicates that a significant portion of the variability in the measured data is captured by the model, suggesting a strong correlation between the predicted and actual values. The MAE, on the other hand, measures the average magnitude of the errors in a set of predictions, calculated as the average of the absolute differences between the predicted and actual values. A MAE close to 0 indicates that the predictions are very close to the actual observations, reflecting high accuracy in the model’s performance. Together, R2 and MAE provide a comprehensive picture of a model’s predictive capabilities: R2 highlights the strength of the relationship between the predicted and actual values, while MAE quantifies the average prediction error. When R2 approaches 1 and MAE approaches 0, the model is not only fitting the data well but also making precise predictions, making it a reliable tool for predicting the performance of a DSSC.
R2 is mathematically defined as follows:
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
where y_i represents the actual values, \hat{y}_i the predicted values, and \bar{y} the mean of the actual values. MAE is defined as follows:
MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
where n is the number of samples. Both R2 and MAE are essential for evaluating model performance, providing different yet complementary insights into the accuracy and reliability of predictions. The computation process of the RFR and SVR models is depicted in Figure 5.
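Both metrics can be computed directly from their definitions; a minimal NumPy sketch with illustrative values (not the paper’s data):

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def mae_manual(y_true, y_pred):
    """Mean absolute error: average of |y_i - y_hat_i|."""
    return np.mean(np.abs(y_true - y_pred))

# Illustrative J-V-style values (mA/cm^2), not the experimental dataset
y_true = np.array([12.3, 10.1, 7.6, 4.2, 0.0])
y_pred = np.array([12.45, 9.8, 7.9, 4.0, 0.3])
print(r2_score_manual(y_true, y_pred), mae_manual(y_true, y_pred))
```

A perfect prediction gives R2 = 1 and MAE = 0, while predicting the mean of the actual values gives R2 = 0, which anchors the interpretation used throughout the results.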

4. Results and Discussion

4.1. SVR Prediction for Jsc and Pmax

Both the linear and RBF kernel types of the SVR were evaluated for the Jsc and Pmax predictions. The RBF kernel consistently outperformed the linear kernel in terms of R-squared and MAE. This is attributed to the RBF kernel’s ability to capture the non-linear relationships inherent in the J-V and P-V characteristics of DSSCs, which are not well represented by linear functions. The SVR model with the RBF kernel provided better predictions of both Jsc and Pmax, confirming that kernel choice is critical for prediction accuracy. Table 1 summarizes the linear and RBF performance metrics. Accordingly, the RBF kernel was selected for the SVR model graphs in Figure 6 and Figure 7.
A comparison between the SVR-predicted and experimental current density (J) versus voltage characteristics is depicted in Figure 6a, while the power output (P) versus voltage (V) for both the experimental and predicted data is shown in Figure 6b. We observed good agreement between the experimental and predicted values. The experimental Jsc is 12.30 mA/cm2, whereas the predicted value is 12.45 mA/cm2, a minimal deviation of 1.22%, as shown in Table 1b. Jsc corresponds to the point on the J-V curve where the voltage is zero and the current density is at its maximum. The metrics used to quantify the J-V curve prediction are presented in Table 1b; the same metrics apply equally to Jsc, since it is a point on that curve. R-squared values reached 0.8796 and 0.6460 for the training and testing sets, respectively, while MAE values were 1.5674 and 2.2528 for the training and test datasets, respectively.
For the power output, a predicted Pmax of 1.91 mW was achieved, compared to the experimental value of 1.98 mW, a deviation of 3.54%. Pmax is the peak of the P-V curve and occurred at a voltage of approximately 500 mV. As shown in Table 1b, the P-V curve prediction achieved R-squared values of 0.9017 and 0.7536 for the training and test datasets, respectively, while the MAE reached 0.7060 and 1.0545 for the training and test sets, respectively; the same metrics apply to the Pmax prediction, since it is a point on the P-V curve. These performance metrics suggest that the SVR model provided reasonably accurate predictions, attributable to the robustness of the model, although it may still benefit from further refinement to reduce error margins and overfitting. The relationship between current density and voltage follows a non-linear trend, with J decreasing as V increases, the behavior of a typical, conventional DSSC photovoltaic device. Similarly, the P-V characteristic curve is non-linear: power increases with voltage until it reaches a maximum (Pmax), after which it decreases as voltage continues to increase, indicating an optimal operating point for maximum power at approximately 500 mV.
The experimental vs. SVR-predicted current density (J) and power output (P) of the dye-sensitized solar cell (DSSC) are shown in Figure 7a,b. As depicted in the figures, the data points cluster near the line of best fit, suggesting good agreement between the experimental and predicted data. Higher deviations were observed at the lower and upper current ranges, which could be attributed to fewer data points in these ranges; this data imbalance is one of the limitations of the SVR model in this scenario. The mean absolute error (MAE) and coefficient of determination (R2) values are summarized in Table 1b. For the current density prediction, the SVR model achieved R-squared values of 0.8796 and 0.6460 for the training and test datasets, with MAE values of 1.5674 and 2.2528, respectively. For the power output prediction, the R-squared values were 0.9017 and 0.7536, and the MAE values 0.7060 and 1.0545, for the training and test sets, respectively, as depicted in Table 1. Overall, the R2 values close to 1 and low MAE values from the SVR suggest accurate prediction of the DSSC performance metrics. The SVR model predicted reasonably well, with no substantial difference between the predicted and actual data, making it a reliable tool for predicting DSSC performance. Table 1b also presents the difference between the experimental and predicted results. These metrics indicate that the model achieved a decent level of predictive accuracy, effectively capturing the underlying relationships among the input features. However, the model exhibited sensitivity to the choice of hyper-parameters, meaning that performance can vary significantly with the tuning process [62].
The model’s heavy reliance on hyper-parameter settings such as the kernel type, kernel function coefficient, regularization, and epsilon parameters can therefore lead to overfitting or inconsistent performance, since the SVR cannot manage interactions among features without careful parameter specification.
Extensive hyper-parameter optimization was necessary to achieve these results, highlighting another potential limitation of the SVR approach in this context; this requirement for meticulous tuning may hinder the model’s applicability in real-time scenarios, where rapid predictions are essential. The relatively small dataset of 100 points also posed a limitation, as the SVR model exhibited inflated R2 values on the training dataset and reduced generalization ability. While the SVR performed well on the training data, its lower R2 and higher MAE on the test data suggest overfitting, a common concern with small datasets, and highlight the model’s limited ability to generalize to new, unseen data.

4.2. RFR Prediction for Jsc and Pmax

Figure 8a compares the J-V characteristics of the actual data and the RFR model predictions, while the P-V characteristics are shown in Figure 8b. In Figure 8a, the DSSC has an actual Jsc of 12.30 mA/cm2 and a predicted Jsc of 12.39 mA/cm2, a difference of 0.73% from the experimental value. In Figure 8b, the actual Pmax is 1.98 mW, while the predicted value is 1.96 mW, a difference of 1.01%. According to Figure 8a, the relationship between current density and voltage follows an exponential pattern, with the current density decreasing as the voltage increases. The P-V characteristic is non-linear: the power increases with voltage until it reaches the maximum power point, after which it decreases as the voltage increases further, as shown in Figure 8b.
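The percentage differences reported here follow from a simple relative-deviation calculation; using the experimental and RFR-predicted values quoted above:

```python
def percent_difference(experimental, predicted):
    """Relative deviation of the prediction from the experimental value, in %."""
    return abs(predicted - experimental) / experimental * 100.0

# Values reported for the RFR model
jsc_diff = percent_difference(12.30, 12.39)   # Jsc in mA/cm^2
pmax_diff = percent_difference(1.98, 1.96)    # Pmax in mW
print(round(jsc_diff, 2), round(pmax_diff, 2))
```

These evaluate to 0.73% and 1.01%, matching the figures stated in the text.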
For performance evaluation, the experimental versus predicted graphs for the current density (J) and power output (P) are shown in Figure 9a,b, respectively. The RFR model demonstrated strong performance in predicting J, with R-squared values of 0.9252 and 0.9784 for the training and test datasets, respectively. Mean absolute errors of 0.8441 and 0.5611 were achieved on the training and test datasets, as shown in Table 2. For the power output, the RFR model achieved R-squared values of 0.9308 and 0.9792 for the training and test datasets, respectively, with MAE values of 0.4445 and 0.3156, as shown in Table 2. The difference between the predicted and actual Pmax is also given in Table 2. These high R-squared values indicate a strong correlation between the predicted and actual values of J and P, suggesting that the model effectively explains a significant portion of the variability in the data [63]. The low MAE further emphasizes the model’s accuracy in predicting J and P, with minimal errors in this context. The performance metrics of the test set are higher than those of the training set, indicating no model overfitting, as the model generalized well to new, unseen data.
The RFR model’s ability to capture complex interactions made it particularly well suited for predicting J S C and P m a x . Unlike traditional linear models, such as mathematical modeling or simulation, which may struggle with multi-collinearity and interaction effects, the RFR approach utilizes decision trees that can easily accommodate non-linear relationships and feature interactions [64,65]. This flexibility allows for the model to consider various physical parameters simultaneously, leading to predictions that are more refined.
Additionally, the model’s robustness against overfitting due to its ensemble nature ensures reliable performance across different datasets. This aspect is crucial in practical applications where data variability is common. The results not only emphasize the effectiveness of the RFR model in this context, but also highlight its potential applicability in other predictive tasks within applied sciences, materials science, and engineering. Overall, these findings support the idea that advanced machine learning techniques like RFR can significantly advance our understanding and predictive abilities in complex systems [66,67,68].

4.3. Correlation Analysis of DSSC Variables

The correlation analysis provides insight into the relationships between the four key electrical performance measures of DSSCs: resistance, power, voltage, and current density. These connections reveal the fundamental trade-offs and interdependencies that influence the functionality and effectiveness of DSSCs [69]. Figure 10 presents the correlation matrix of the DSSC features. Resistance here refers to the series resistance of the DSSC. Its correlations with the other features were very close to zero, indicating that resistance varies largely independently of them; nevertheless, high series resistance can diminish the contributions of the other features and thereby reduce DSSC performance. The negative correlation between voltage and current density reflects the inherent trade-off in DSSCs: at high voltages, electron mobility is reduced, resulting in lower current density, whereas at low voltages, electron mobility increases, leading to higher current density. This corresponds to the J-V characteristics of DSSCs, in which high currents occur at reduced operating voltages [70,71,72,73].
The positive correlation between voltage and power suggests that increasing voltage typically enhances power output, although only to a certain extent. The power in DSSCs is defined by P = V × J, meaning that an increase in voltage can boost power, as long as the current density does not decrease remarkably. Nonetheless, because V and J have an inverse relationship, there exists an optimum voltage at which the power output reaches its maximum. This optimum voltage aligns with the maximum power point (MPP) in DSSCs [74,75]. Therefore, the correlation analysis highlights the importance of balancing voltage and current density in DSSC optimization in order to achieve high maximum power output.
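The maximum power point implied by P = V × J can be located numerically; the sketch below uses an illustrative exponential J-V curve (not fitted to the paper’s device, with assumed parameters for Jsc, Voc, and curvature):

```python
import numpy as np

# Illustrative single-diode-like J-V curve: Jsc = 12.3 mA/cm^2, Voc = 0.62 V,
# 0.045 V sets the curvature (all values assumed, not from the paper)
V = np.linspace(0.0, 0.65, 500)                                   # volts
J = 12.3 * (1.0 - np.expm1(V / 0.045) / np.expm1(0.62 / 0.045))   # mA/cm^2

P = V * J          # power density, P = V x J (mW/cm^2)
mpp = np.argmax(P) # index of the maximum power point

print(f"MPP: V = {V[mpp]:.3f} V, J = {J[mpp]:.2f} mA/cm^2, "
      f"P = {P[mpp]:.2f} mW/cm^2")
```

Because J falls as V rises, the product V × J peaks at an intermediate voltage between 0 V and Voc, which is the balancing act the correlation analysis describes.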

4.4. Shapley Additive exPlanation

The Shapley Additive exPlanations (SHAP) summary in Table 3 below lists the dataset features and their descriptive statistics, including the number of samples (count) and the mean, standard deviation, minimum, and maximum values of the key parameters: voltage, current density, power, and resistance. The summary plot in Figure 11 highlights the influence of three key features, current density, voltage, and resistance, on the model’s predictions of DSSC power output. Each feature’s SHAP value reflects its contribution to the model’s output, offering insight into its role in DSSC power performance. Figure 11a shows that current density is the most impactful feature, with SHAP values spanning a wide range, densely clustered in the positive region, and the feature ranked at the top of the plot. To clarify this ranking, Figure 11b depicts the average impact of each feature on the model’s output; current density has the highest magnitude relative to the other features. High values of J contribute positively to the model’s output. In DSSCs, J represents the photocurrent generated by the device and is directly tied to its efficiency [76,77,78]. Therefore, improving J through strategies such as optimizing dye coverage, reducing recombination, and enhancing charge transport should be a priority for boosting DSSC power performance [77,79,80,81].
Voltage has a moderate influence on the model’s power output, with SHAP values clustering mostly in the positive range. Low voltage values negatively impacted the predicted power output, while high voltage values showed positive effects. Voltage corresponds to the open-circuit voltage (Voc) of the DSSC, which depends on the energy difference between the TiO2 Fermi level and the electrolyte’s redox potential [82,83,84]. High voltage is typically associated with effective charge separation and minimal recombination, whereas low voltage often indicates significant recombination losses. While improving Voc is beneficial, its impact on DSSC performance is less pronounced than that of current density, as gains in voltage typically yield smaller efficiency improvements.
Resistance has SHAP values consistently clustered in the negative range. Whether resistance is high or low, it reduces the model’s power output, with high resistance having the stronger negative impact. Resistance in DSSCs is attributed to the series resistance (Rs), which contributes to energy loss [85,86,87]. High resistance can stem from poor electrode conductivity, sub-optimal electrolyte composition, or inefficient charge transfer. Minimizing resistance by using low-resistance materials, such as highly conductive electrodes, and optimizing electrolyte formulations can help improve DSSC performance.
Overall, the SHAP analysis highlights the critical importance of current density as the dominant factor influencing DSSC performance, followed by voltage and resistance. While current density was the most impactful feature, its interplay with voltage and resistance governs the physical mechanisms of the DSSC, such as carrier recombination and energy-level matching. High current density combined with high voltage generally suggests effective energy-level matching between the dye, semiconductor, and electrolyte, as well as efficient charge separation. On the other hand, even when the current density is high, high resistance can reduce the maximum power output. The SHAP values showed that resistance, although the least impactful feature overall, still had considerable magnitude. The combined contributions of current density and resistance therefore point to recombination losses reducing the power output even when the current density is high.
For optimization, the focus should be on enhancing J through better dye sensitization and charge carrier generation, while also improving V by ensuring effective charge separation. Reducing R can further complement these efforts, helping to suppress recombination losses and enhance the efficiency of DSSCs. By accurately predicting the SHAP feature contributions that affect DSSC performance, developers can virtually assess different electrode compositions and configurations likely to yield the best current density before fabrication. This reduces experimental workload, shortens development cycles, and guides strategic decisions in selecting high-performance materials.
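SHAP attributions rest on Shapley values from cooperative game theory. As a self-contained illustration of the principle (the study itself presumably used a SHAP library implementation such as a tree explainer), the sketch below computes exact Shapley values by enumerating feature coalitions for a hypothetical toy power model P = V·J − Rs·J², which is not the paper’s model:

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley attributions for one sample: features outside a
    coalition S are held at the baseline, those in S take their values in x."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                S = list(S)
                # Shapley kernel weight |S|! (n-|S|-1)! / n!
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                z_on, z_off = baseline.copy(), baseline.copy()
                z_on[S + [i]] = x[S + [i]]   # coalition S plus feature i
                z_off[S] = x[S]              # coalition S without feature i
                phi[i] += w * (f(z_on) - f(z_off))
    return phi

# Hypothetical toy power model; features ordered as [J, V, Rs]
power = lambda z: z[1] * z[0] - z[2] * z[0] ** 2
x = np.array([12.0, 0.5, 0.01])         # sample to explain (assumed values)
baseline = np.array([6.0, 0.25, 0.02])  # e.g. dataset means (assumed)
phi = shapley_values(power, x, baseline)
print(phi)  # per-feature contributions to power(x) - power(baseline)
```

The attributions always satisfy the efficiency property, summing to the difference between the model output at the sample and at the baseline; this brute-force enumeration is only feasible for a handful of features, which is why practical SHAP tools use model-specific approximations.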

5. Conclusions

The principal focus on Random Forest regression (RFR) and support vector regression (SVR) rests on their effectiveness in predicting the key electrical performance metrics, short-circuit current density (Jsc) and maximum power (Pmax), of a DSSC. While both models achieved reasonably accurate predictions, the percentage differences for Jsc and Pmax indicated that RFR outperformed SVR, primarily due to its ensemble approach, which enhances robustness and the ability to capture intricate relationships among features.
The RFR model achieved the better prediction response, with percentage differences between the actual and predicted values of 0.73% and 1.01% for Jsc and Pmax, respectively. The SVR predictions deviated by 1.22% from the experimental Jsc and 3.54% from the experimental Pmax. Additionally, RFR exhibited higher R-squared values and lower MAE on both the training and testing datasets, reflecting its superior generalization capability. In contrast, SVR required extensive hyper-parameter tuning and showed susceptibility to overfitting. Both linear and RBF kernels were evaluated for the SVR to capture the underlying patterns in the data; the RBF kernel outperformed the linear one and was therefore chosen for the graphical analysis. Other hyper-parameters considered in the fine-tuning were the regularization parameter, the kernel function coefficient, and the epsilon-insensitive zone. Hyper-parameter tuning for the RFR model was less extensive than for the SVR, covering essentially the number of estimators and the maximum tree depth.
These findings carry significant recommendations for researchers in the field of photovoltaics. The preference for RFR reflects a shift toward more robust, less parameter-sensitive modeling techniques, which can save time in training and testing models. The findings also demonstrate how crucial it is to select the modeling approach best suited to the intricacy of the data: ensemble approaches like RFR are likely to produce more insightful predictions in situations with intricate, non-linear relationships between features.
The correlation analysis demonstrates the importance of balancing the voltage and current density of a DSSC to achieve optimum power output, as it reveals the essential trade-offs and interdependencies that affect the functionality and performance of DSSCs. The SHAP plot highlights the key features that notably influence DSSC power output. Current density was the most impactful feature, with most of its SHAP values showing high magnitude. Hence, enhancing current density by optimizing dye loading, minimizing charge carrier recombination, and improving charge carrier transport would boost the power output of a DSSC. That said, the SVR and RFR models are not limited to DSSCs: once retrained and tested with appropriate datasets, they can be applied to predict performance across other solar cell technologies, such as silicon-based or perovskite cells.
In this study, the RFR model demonstrated strong generalization ability, with better test performance metrics than on the training set, highlighting its resilience against overfitting due to its ensemble structure. In contrast, while the SVR model performed well on the training data, its lower R-squared and higher MAE values on the test data indicated overfitting, a common concern with small datasets, and an inability to generalize to unseen data. Future work will focus on expanding the dataset through the incorporation of larger experimental datasets to improve model robustness and prediction generalizability and to mitigate overfitting.

Author Contributions

Conceptualization, E.H.O. and N.L.L.; Methodology, E.H.O.; Software, E.H.O.; Validation, E.H.O., N.L.L. and P.M.; Formal analysis, E.H.O. and N.L.L.; Resources, N.L.L.; Data curation, E.H.O.; Writing—original draft, E.H.O.; Writing—review and editing, N.L.L.; Visualization, E.H.O.; Supervision, N.L.L. and P.M.; Project administration, N.L.L.; Funding acquisition, N.L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Technology Innovation Agency Seed Fund of the department of research and innovation, South Africa. It was also supported by the Research Niche Area (RNA), Renewable Energy Wind in Research Partnerships and Innovation, Postgraduate Studies and Postdoctoral Fellowships, University of Fort Hare, Alice, South Africa.

Data Availability Statement

Data are available upon reasonable request.

Acknowledgments

We acknowledge the assistance of the SAMRC Microbial Water Quality Monitoring Centre of the University of Fort Hare, Alice, South Africa.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ML: Machine learning
DSSC: Dye-sensitized solar cell
Jsc: Short-circuit current density
Pmax: Maximum power
Voc: Open-circuit voltage
SVR: Support vector regression
RFR: Random Forest regression
MAE: Mean absolute error
R2: R-squared (coefficient of determination)
SHAP: Shapley Additive exPlanations

References

  1. Mahmood, A.; Tang, A.; Wang, X.; Zhuo, E. First-principles theoretical designing of planar non-fullerene small molecular acceptors for organic solar cells: Manipulation of noncovalent interactions. Phys. Chem. 2019, 21, 2128–2139. [Google Scholar] [CrossRef] [PubMed]
  2. Abdellatif, S.O.; Josten, S.; Khalil, A.S.; Erni, D.; Marlow, F. Transparency and diffused light efficiency of dye-sensitized solar cells: Tuning and a new figure of merit. IEEE J. Photovolt. 2020, 10, 522–530. [Google Scholar] [CrossRef]
  3. Grätzel, M. Photoelectrochemical Cells. Nature 2001, 414, 338–344. [Google Scholar] [CrossRef] [PubMed]
  4. Mariotti, N.; Bonomo, M.; Fagiolaari, L.; Barbero, N.; Gerbaldi, C.; Bella, F.; Barolo, C. Recent advances in eco-friendly and cost-effective materials towards sustainable dye-sensitized solar cells. Green Chem. 2020, 22, 7168–7218. [Google Scholar] [CrossRef]
  5. Khir, H.; Pandey, A.K.; Saidur, R.; Ahmad, M.S.; Abd Rahim, N.; Dewika, M.; Samykano, M. Recent advancements and challenges in flexible low temperature dye sensitised solar cells. Sustain. Energy Technol. Assess. 2022, 53, 102745. [Google Scholar] [CrossRef]
  6. O’Regan, B.; Grätzel, M. A low-cost, high-efficiency solar cell based on dye-sensitized colloidal TiO2 films. Nature 1991, 353, 737–740. [Google Scholar] [CrossRef]
  7. Kojima, A.; Teshima, K.; Shirai, Y.; Miyasaka, T. Organometal halide perovskites as visible-light sensitizers for photovoltaic cells. J. Am. Chem. Soc. 2009, 131, 6050–6051. [Google Scholar] [CrossRef]
  8. Kokkonen, M.; Talebi, P.; Zhou, J.; Asgari, S.; Soomro, S.A.; Elsehrawy, F.; Halme, J.; Ahmad, S.; Hagfeldt, A.; Hashmi, S.G. Advanced research trends in dye-sensitized solar cells. J. Mater. Chem. A 2021, 9, 10527–10545. [Google Scholar] [CrossRef]
  9. Bach, U.; Lupo, D.; Comte, P.; Moser, J.E.; Weissörtel, F.; Salbeck, J.; Spreitzer, H.; Grätzel, M. Solid-state dye-sensitized mesoporous TiO2 solar cells with high photon-to-electron conversion efficiencies. Nature 1998, 395, 583–585. [Google Scholar] [CrossRef]
  10. Wu, J.; Lan, Z.; Lin, J.; Huang, M.; Huang, Y.; Fan, L.; Luo, G. Electrolytes in dye-sensitized solar cells. Chem. Rev. 2015, 115, 2136–2173. [Google Scholar]
  11. Su’ait, M.S.; Rahman, M.Y.A.; Ahmad, A. Review on polymer electrolyte in dye-sensitized solar cells (DSSCs). Sol. Energy 2015, 115, 452–470. [Google Scholar] [CrossRef]
  12. Agrawal, A.; Siddiqui, S.A.; Soni, A.; Sharma, G.D. Advancements, frontiers and analysis of metal oxide semiconductor, dye, electrolyte and counter electrode of dye sensitized solar cell. Sol. Energy 2022, 233, 378–407. [Google Scholar]
  13. Feng, J.; Wang, H.; Ji, Y.; Li, Y. Molecular design and performance improvement in organic solar cells guided by high-throughput screening and machine learning. Nano Sel. 2021, 2, 1629–1641. [Google Scholar] [CrossRef]
  14. Bhatti, S.; Manzoor, H.U.; Michel, B.; Bonilla, R.S.; Abrams, R.; Zoha, A.; Hussain, S.; Ghannam, R. Revolutionizing low-cost solar cells with machine learning: A systematic review of optimization techniques. Adv. Energy Sustain. Res. 2023, 4, 2300004. [Google Scholar] [CrossRef]
  15. Kandregula, G.R.; Murugaiah, D.K.; Murugan, N.A.; Ramanujam, K. Data-driven approach towards identifying dyesensitizer molecules for higher power conversion efficiency in solar cells. New J. Chem. 2022, 46, 4395–4405. [Google Scholar] [CrossRef]
  16. Sutar, S.S.; Patil, S.M.; Kadam, S.J.; Kamat, R.K.; Kim, D.K.; Dongale, T.D. Analysis and Prediction of Hydrothermally Synthesized ZnO-Based Dye-Sensitized Solar Cell Properties Using Statistical and Machine-Learning Techniques. Acs Omega 2021, 6, 29982–29992. [Google Scholar] [CrossRef] [PubMed]
  17. Varga, Z.; Racz, E. Machine learning analysis on the performance of dye-sensitized solar cell—Thermoelectric generator hybrid system. Energies 2022, 15, 7222. [Google Scholar] [CrossRef]
  18. Xiao, F.; Saqib, M.; Razzaq, S.; Mubashir, T.; Tahir, M.H.; Moussa, I.M.; El-Ansary, H.O. Performance prediction of polymer-fullerene organic solar cells and data mining-assisted designing of new polymers. J. Mol. Model. 2023, 29, 270. [Google Scholar] [CrossRef]
  19. Ghanbari Motlagh, S.; Razi Astaraei, F.; Hajihosseini, M.; Madani, S. Application of Machine Learning Algorithms in Improving Nano-based Solar Cell Technology. J. AI Data Min. 2023, 11, 357–374. [Google Scholar]
  20. Alwadai, N.; Khan, S.U.D.; Elqahtani, Z.M.; Ud-Din Khan, S. Machine learning assisted prediction of power conversion efficiency of all-small molecule organic solar cells: A data visualization and statistical analysis. Molecules 2022, 27, 5905. [Google Scholar] [CrossRef]
  21. Khalifa, Z.; Abdellatif, S.; Fathi, A.; Abdullah, K.; Hassan, M. Investigating the variation in the optical properties of TiO2 thin-film utilized in bifacial solar cells using machine learning algorithm. J. Photonics Energy 2022, 12, 022202. [Google Scholar]
  22. Chalimourda, A.; Schölkopf, B.; Smola, A.J. Experimentally optimal ν in support vector regression for different noise models and parameter settings. Neural Netw. 2004, 17, 127–141. [Google Scholar] [CrossRef] [PubMed]
  23. Mushtaq, Z.; Ramzan, M.F.; Ali, S.; Baseer, S.; Samad, A.; Husnain, M. Voting Classification-Based Diabetes Mellitus Prediction Using Hypertuned Machine-Learning Techniques. Mob. Inf. Syst. 2022, 2022, 6521532. [Google Scholar] [CrossRef]
  24. Han, L.; Koide, N.; Chiba, Y.; Islam, A.; Mitate, T. Modeling of an equivalent circuit for dye-sensitized solar cells: Improvement of efficiency of dye-sensitized solar cells by reducing internal resistance. Comptes Rendus Chim. 2006, 9, 645–651. [Google Scholar] [CrossRef]
  25. Li, J.; Li, R.; Jia, Y.; Zhang, Z. Prediction of I–V characteristic curve for photovoltaic modules based on convolutional neural network. Sensors 2020, 20, 2119. [Google Scholar] [CrossRef]
  26. Han, L.; Koide, N.; Chiba, Y.; Mitate, T. Modeling of an equivalent circuit for dye-sensitized solar cells. Appl. Phys. Lett. 2004, 84, 2433–2435. [Google Scholar] [CrossRef]
  27. Green, M.A.; Dunlop, E.D.; Hohl-Ebinger, J.; Yoshita, M.; Kopidakis, N.; Hao, X. Solar cell efficiency tables (Version 58). Prog. Photovolt. 2021, 29, 657–667. [Google Scholar] [CrossRef]
  28. Green, M.A.; Hishikawa, Y.; Dunlop, E.D.; Levi, D.H.; Hohl-Ebinger, J. Solar cell efficiency tables (Version 52). Prog. Photovolt. 2018, 26, 427–436. [Google Scholar]
  29. Green, M.A. Accurate expressions for solar cell fill factors including series and shunt resistances. Appl. Phys. Lett. 2016, 108, 081111. [Google Scholar] [CrossRef]
  30. Green, M.; Dunlop, E.; Hohl-Ebinger, J.; Yoshita, M.; Kopidakis, N.; Hao, X. Solar cell efficiency tables (version 57). Prog. Photovolt. Res. Appl. 2021, 29, 3–15. [Google Scholar] [CrossRef]
  31. Deng, H.; Fannon, D.; Eckelman, M.J. Predictive modeling for US commercial building energy use: A comparison of existing statistical and machine learning algorithms using CBECS microdata. Energy Build. 2018, 163, 34–43. [Google Scholar] [CrossRef]
  32. Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar]
  33. Patle, A.; Chouhan, D.S. SVM kernel functions for classification. In Proceedings of the 2013 International Conference on Advances in Technology and Engineering (ICATE), Mumbai, India, 23–25 January 2013; pp. 1–9. [Google Scholar]
  34. Blanco, V.; Japón, A.; Puerto, J. Optimal arrangements of hyperplanes for SVM-based multiclass classification. Adv. Data Anal. Classif. 2020, 14, 175–199. [Google Scholar] [CrossRef]
  35. Fuad, M.; Hussain, M. Machine learning based modeling for solid oxide fuel cells power performance prediction. In Proceedings of the 6th International Conference on Process Systems Engineering (PSE ASIA), Kuala Lumpur, Malaysia, 25–27 June 2013; pp. 19–24. [Google Scholar]
  36. Yu, P.S.; Chen, S.T.; Chang, I.F. Support vector regression for real-time flood stage forecasting. J. Hydrol. 2006, 328, 704–716. [Google Scholar] [CrossRef]
  37. Duan, H.; Huang, Y.; Mehra, R.K.; Song, P.; Ma, F. Study on Influencing Factors of Prediction Accuracy of Support Vector Machine (SVM) Model for NOx Emission of a Hydrogen Enriched Compressed Natural Gas Engine. Fuel 2018, 234, 954–964. [Google Scholar] [CrossRef]
  38. Gordon, D.; Norouzi, A.; Blomeyer, G.; Bedei, J.; Aliramezani, M.; Andert, J.; Koch, C.R. Support Vector Machine Based Emissions Modeling Using Particle Swarm Optimization for Homogeneous Charge Compression Ignition Engine. Int. J. Engine Res. 2021, 24, 536–551. [Google Scholar] [CrossRef]
  39. Wang, H.; Ji, C.; Shi, C.; Ge, Y.; Wang, S.; Yang, J. Development of Cyclic Variation Prediction Model of the Gasoline and N-Butanol Rotary Engines with Hydrogen Enrichment. Fuel 2021, 299, 120891. [Google Scholar] [CrossRef]
  40. Anyanwu, G.O.; Nwakanma, C.I.; Lee, J.M.; Kim, D.S. Optimization of RBF-SVM kernel using grid search algorithm for DDoS attack detection in SDN-based VANET. IEEE Internet Things J. 2022, 10, 8477–8490. [Google Scholar] [CrossRef]
  41. Pan, M.; Li, C.; Gao, R.; Huang, Y.; You, H.; Gu, T.; Qin, F. Photovoltaic power forecasting based on a support vector machine with improved ant colony optimization. J. Clean. Prod. 2020, 277, 123948. [Google Scholar] [CrossRef]
  42. Babar, B.; Luppino, L.T.; Boström, T.; Anfinsen, S.N. Random forest regression for improved mapping of solar irradiance at high latitudes. Sol. Energy 2020, 198, 81–92. [Google Scholar] [CrossRef]
  43. Greenaway, R.L.; Jelfs, K.E. Integrating computational and experimental workflows for accelerated organic materials discovery. Adv. Mater. 2021, 33, 2004831. [Google Scholar] [CrossRef] [PubMed]
  44. Boateng, E.Y.; Otoo, J.; Abaye, D.A. Basic tenets of classification algorithms K-nearest-neighbor, support vector machine, random forest and neural network: A review. J. Data Anal. Inf. Process. 2020, 8, 341–357. [Google Scholar] [CrossRef]
  45. Adusumilli, S.; Bhatt, D.; Wang, H.; Bhattacharya, P.; Devabhaktuni, V. A low-cost INS/GPS integration methodology based on random forest regression keywords: Artificial neural network global positioning system inertial navigation system random forest regression. Expert Syst. Appl. 2013, 40, 4653–4659. [Google Scholar] [CrossRef]
  46. Christudas, B. MySQL. In Practical Microservices Architectural Patterns: Event-Based Java Microservices with Spring Boot and Spring Cloud; SpringerLink: Berlin/Heidelberg, Germany, 2019; pp. 877–884. [Google Scholar]
  47. Šušter, I.; Ranisavljević, T. Optimization of MySQL database. J. Process Manag. New Technol. 2023, 11, 141–151. [Google Scholar] [CrossRef]
  48. Schwartz, B.; Zaitsev, P.; Tkachenko, V. High Performance MySQL: Optimization, Backups, and Replication; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2012. [Google Scholar]
  49. Machacek, J.; Prochazka, Z.; Drapela, J. System for measuring and collecting data from solar-cell systems. In Proceedings of the 2007 9th International Conference on Electrical Power Quality and Utilisation, Barcelona, Spain, 9–11 October 2007; pp. 1–4. [Google Scholar]
  50. Liu, J.; Huang, Q.; Ulishney, C.; Dumitrescu, C.E. Comparison of Random Forest and Neural Network in Modelling the Performance and Emissions of a Natural Gas Spark Ignition Engine. J. Energy Resour. Technol. 2022, 144, 032310. [Google Scholar] [CrossRef]
  51. Liu, J.; Ulishney, C.; Dumitrescu, C.E. Machine Learning Assisted Analysis of Heat Transfer Characteristics of a Heavy Duty Natural Gas Engine (No. 2022-01-0473); SAE Technical Paper; SAE International: Washington, DC, USA, 2022; pp. 1–8. [Google Scholar]
  52. Al-Anazi, A.; Gates, I.D. Support-vector regression for permeability prediction in a heterogeneous reservoir: A comparative study. SPE Reserv. Eval. Eng. 2010, 13, 485–495. [Google Scholar] [CrossRef]
  53. Shahpouri, S.; Norouzi, A.; Hayduk, C.; Rezaei, R.; Shahbakhti, M.; Koch, C.R. Hybrid Machine Learning Approaches and a Systematic Model Selection Process for Predicting Soot Emissions in Compression Ignition Engines. Energies 2021, 14, 7865. [Google Scholar] [CrossRef]
  54. Hao, D.; Mehra, R.K.; Luo, S.; Nie, Z.; Ren, X.; Fanhua, M. Experimental Study of Hydrogen Enriched Compressed Natural Gas (HCNG) Engine and Application of Support Vector Machine (SVM) on Prediction of Engine Performance at Specific Condition. Int. J. Hydrog. Energy 2019, 45, 5309–5325. [Google Scholar] [CrossRef]
  55. Wong, T.T.; Yeh, P.Y. Reliable accuracy estimates from k-fold cross validation. IEEE Trans. Knowl. Data Eng. 2019, 32, 1586–1594. [Google Scholar] [CrossRef]
  56. Nti, I.K.; Nyarko-Boateng, O.; Aning, J. Performance of machine learning algorithms with different K values in K-fold cross-validation. Int. J. Inf. Technol. Comput. Sci. 2021, 13, 61–71. [Google Scholar]
  57. Carrasco, M.; López, J.; Maldonado, S. Epsilon-nonparallel support vector regression. Appl. Intell. 2019, 49, 4223–4236. [Google Scholar] [CrossRef]
  58. Nourali, H.; Osanloo, M. Mining capital cost estimation using Support Vector Regression (SVR). Resour. Policy 2019, 62, 527–540. [Google Scholar] [CrossRef]
  59. Tanveer, M.; Rajani, T.; Rastogi, R.; Shao, Y.H.; Ganaie, M.A. Comprehensive review on twin support vector machines. Ann. Oper. Res. 2024, 339, 1223–1268. [Google Scholar] [CrossRef]
  60. Torres-Barrán, A.; Alonso, Á.; Dorronsoro, J.R. Regression tree ensembles for wind energy and solar radiation prediction. Neurocomputing 2019, 326, 151–160. [Google Scholar] [CrossRef]
  61. Imam, A.A.; Abusorrah, A.; Seedahmed, M.M.; Marzband, M. Accurate Forecasting of Global Horizontal Irradiance in Saudi Arabia: A Comparative Study of Machine Learning Predictive Models and Feature Selection Techniques. Mathematics 2024, 12, 2600. [Google Scholar] [CrossRef]
  62. Zhang, Y.; Wang, Q.; Chen, X.; Yan, Y.; Yang, R.; Liu, Z.; Fu, J. The prediction of spark-ignition engine performance and emissions based on the SVR algorithm. Processes 2022, 10, 312. [Google Scholar] [CrossRef]
  63. Dey, K.; Kalita, K. Prediction performance analysis of neural network models for an electrical discharge turning process. Int. J. Interact. Des. Manuf. 2023, 17, 827–845. [Google Scholar] [CrossRef]
  64. Al-Dahidi, S.; Alrbai, M.; Alahmer, H.; Rinchi, B.; Alahmer, A. Enhancing solar photovoltaic energy production prediction using diverse machine learning models tuned with the chimp optimization algorithm. Sci. Rep. 2024, 14, 18583. [Google Scholar] [CrossRef] [PubMed]
  65. Tucci, M.; Piazzi, A.; Thomopulos, D. Machine Learning Models for Regional Photovoltaic Power Generation Forecasting with Limited Plant-Specific Data. Energies 2024, 17, 2346. [Google Scholar] [CrossRef]
  66. Benti, N.E.; Chaka, M.D.; Semie, A.G. Forecasting renewable energy generation with machine learning and deep learning: Current advances and future prospects. Sustainability 2023, 15, 7087. [Google Scholar] [CrossRef]
  67. Tan, H.; Qin, J.; Li, Z.; Wu, W. PSCNet: Long sequence time-series forecasting for photovoltaic power via period selection and cross-variable attention. Appl. Intell. 2025, 55, 1–7. [Google Scholar] [CrossRef]
  68. Mubarak, H.; Hammoudeh, A.; Ahmad, S.; Abdellatif, A.; Mekhilef, S.; Mokhlis, H.; Dupont, S. A hybrid machine learning method with explicit time encoding for improved Malaysian photovoltaic power prediction. J. Clean. Prod. 2023, 383, 134979. [Google Scholar] [CrossRef]
  69. Maddah, H.A. Machine learning analysis on performance of naturally-sensitized solar cells. Opt. Mater. 2022, 128, 112343. [Google Scholar] [CrossRef]
  70. Jena, A.; Mohanty, S.P.; Kumar, P.; Naduvath, J.; Gondane, V.; Lekha, P.; Das, J.; Narula, H.K.; Mallick, S.; Bhargava, P. Dye sensitized solar cells: A review. Trans. Indian Ceram. Soc. 2012, 71, 1–16. [Google Scholar] [CrossRef]
  71. Sarker, S.; Kim, D.M. Measurement and simulation of current-voltage relation in dye-sensitized solar cells with reduced graphene oxide at the counter electrodes. Sol. Energy 2018, 176, 656–662. [Google Scholar] [CrossRef]
  72. Sahu, S.; Patel, M.; Verma, A.K.; Tiwari, S. Analytical study of current density-voltage relation in dye-sensitized solar cells using equivalent circuit model. In Proceedings of the 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing, Chennai, India, 1–2 August 2017; pp. 1489–1493. [Google Scholar]
  73. Cavallo, C.; Di Pascasio, F.; Latini, A.; Bonomo, M.; Dini, D. Nanostructured semiconductor materials for dye-sensitized solar cells. J. Nanomater. 2017, 1, 5323164. [Google Scholar] [CrossRef]
  74. Maçaira, J.; Andrade, L.; Mendes, A. Review on nanostructured photoelectrodes for next generation dye-sensitized solar cells. Renew. Sustain. Energy Rev. 2014, 27, 334–349. [Google Scholar] [CrossRef]
  75. Keppetipola, N.M.; Tada, K.; Olivier, C.; Hirsch, L.; Bessho, T.; Uchida, S.; Segawa, H.; Toupance, T.; Cojocaru, L. Comparative performance analysis of photo-supercapacitor based on silicon, dye-sensitized and perovskite solar cells: Towards indoor applications. Sol. Energy Mater. Sol. Cells 2022, 247, 111966. [Google Scholar] [CrossRef]
  76. Yun, S.; Qin, Y.; Uhl, A.R.; Vlachopoulos, N.; Yin, M.; Li, D.; Han, X.; Hagfeldt, A. New-generation integrated devices based on dye-sensitized and perovskite solar cells. Energy Environ. Sci. 2018, 11, 476–526. [Google Scholar] [CrossRef]
  77. Al-Alwani, M.A.; Mohamad, A.B.; Ludin, N.A.; Kadhum, A.A.; Sopian, K. Dye-sensitised solar cells: Development, structure, operation principles, electron kinetics, characterisation, synthesis materials and natural photosensitisers. Renew. Sustain. Energy Rev. 2016, 65, 183–213. [Google Scholar] [CrossRef]
  78. Patwari, J. Inversion of activity in DSSC for TiO2 and ZnO photo-anodes depending on the choice of sensitizer and carrier dynamics. J. Lumin. 2019, 207, 169–176. [Google Scholar] [CrossRef]
  79. Zheng, D.; Yang, X.; Čuček, L.; Wang, J.; Ma, T.; Yin, C. Revolutionizing Dye-sensitized Solar Cells with Nanomaterials for Enhanced Photoelectric Performance. J. Clean. Prod. 2024, 464, 142717. [Google Scholar] [CrossRef]
  80. Babar, F.; Mehmood, U.; Asghar, H.; Mehdi, M.H.; Khan, A.U.H.; Khalid, H.; ul Huda, N.; Fatima, Z. Nanostructured photoanode materials and their deposition methods for efficient and economical third generation dye-sensitized solar cells: A comprehensive review. Renew. Sustain. Energy Rev. 2020, 129, 109919. [Google Scholar] [CrossRef]
  81. Thavasi, V.; Renugopalakrishnan, V.; Jose, R.; Ramakrishna, S. Controlled electron injection and transport at materials interfaces in dye sensitized solar cells. Mater. Sci. Eng. R Rep. 2009, 63, 81–99. [Google Scholar] [CrossRef]
  82. Omar, A.; Ali, M.S.; Abd Rahim, N. Electron transport properties analysis of titanium dioxide dye-sensitized solar cells (TiO2-DSSCs) based natural dyes using electrochemical impedance spectroscopy concept: A review. Sol. Energy 2020, 207, 1088–1121. [Google Scholar] [CrossRef]
  83. Kushwaha, S. A DSSC with an Efficiency of ∼10%: Fermi Level Manipulation Impacting the Electron Transport at the Photoelectrode-Electrolyte Interface. ChemistrySelect 2016, 1, 6179–6187. [Google Scholar] [CrossRef]
  84. Colella, S.; Orgiu, E.; Bruder, I.; Liscio, A.; Palermo, V.; Bruchmann, B.; Samorì, P.; Erk, P. Titanium Dioxide Mesoporous Electrodes for Solid-State Dye-Sensitized Solar Cells: Cross-Analysis of the Critical Parameters. Adv. Energy Mater. 2014, 4, 1301362. [Google Scholar] [CrossRef]
  85. Mashreghi, A.; Bahrami Moghadam, F. Effect of photoanode active area on photovoltaic parameters of dye sensitized solar cells through its effect on series resistance investigated by electrochemical impedance spectroscopy. J. Solid State Electrochem. 2016, 20, 1361–1368. [Google Scholar] [CrossRef]
  86. Liu, W.; Hu, L.; Dai, S.; Guo, L.; Jiang, N.; Kou, D. The effect of the series resistance in dye-sensitized solar cells explored by electron transport and back reaction using electrical and optical modulation techniques. Electrochim. Acta 2010, 55, 2338–2343. [Google Scholar] [CrossRef]
  87. Aftabuzzaman, M.; Sarker, S.; Lu, C.; Kim, H.K. In-depth understanding of the energy loss and efficiency limit of dye-sensitized solar cells under outdoor and indoor conditions. J. Mater. Chem. A 2021, 9, 24830–24848. [Google Scholar] [CrossRef]
Figure 1. Schematic diagram of the equivalent circuit model of DSSC.
Figure 2. Support vector regression architecture.
Figure 3. Schematic transformation to a linear support vector regression.
Figure 4. Random Forest regression flowchart.
Figure 5. Flow chart of the (a) RFR and (b) SVR models.
Figure 6. (a) Experimental and predicted J-V characteristics. (b) Experimental and predicted P-V characteristics of the DSSC.
Figure 7. Comparison of the actual and SVR predicted (a) current density (J) and (b) power (P) results.
Figure 8. (a) Experimental and predicted J–V characteristics. (b) Experimental and predicted P–V characteristics of the DSSC.
Figure 9. Comparison of the actual and RFR predicted (a) current density (J) and (b) power (P) results.
Figure 10. Correlation matrix of DSSC variables.
Figure 11. SHAP summary plot (a) impact on model output and (b) average impact on model output magnitude.
Table 1. (a) Summary of photovoltaic performance metrics for the SVR model (kernel type: linear). (b) Summary of photovoltaic performance metrics for the SVR model (kernel type: RBF).

(a) SVR performance metrics (kernel type: linear)

Metric                       Train      Test
Current density MAE          3.7562     3.8057
Current density R2           0.7149     0.5599
Power MAE                    2.4852     2.4368
Power R2                     0.0895     0.3725

Difference between the experimental and predicted values (kernel type: linear)

Parameter                                  Experiment   SVR Model   Difference (%)
Short-circuit current density (mA/cm2)     12.30        11.38       7.45
Maximum power (mW)                         1.98         1.79        9.60

(b) SVR performance metrics (kernel type: RBF)

Metric                       Train      Test
Current density MAE          1.5674     2.2528
Current density R2           0.8796     0.6460
Power MAE                    0.7060     1.0545
Power R2                     0.9017     0.7536

Difference between the experimental and predicted values (kernel type: RBF)

Parameter                                  Experiment   SVR Model   Difference (%)
Short-circuit current density (mA/cm2)     12.30        12.45       1.22
Maximum power (mW)                         1.98         1.91        3.54
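The MAE, R2, and experiment-versus-prediction percentage differences reported in Table 1 follow standard definitions; a minimal sketch of how such values can be computed (the helper names and the toy inputs below are illustrative, not the study's data pipeline):

```python
def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the residuals.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def pct_difference(experimental, predicted):
    # Percentage difference between measurement and model prediction.
    return abs(experimental - predicted) / experimental * 100.0

# Headline comparison for the RBF-kernel SVR (values from Table 1b):
jsc_diff = pct_difference(12.30, 12.45)   # ~1.22%
pmax_diff = pct_difference(1.98, 1.91)    # ~3.54%
```

Applied to the Table 1b values, `pct_difference` reproduces the reported 1.22% (Jsc) and 3.54% (Pmax) gaps.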
Table 2. Summary of photovoltaic performance metrics for the RFR model.

Metric                       Train      Test
Current density MAE          0.8441     0.5611
Current density R2           0.9252     0.9784
Power MAE                    0.4445     0.3156
Power R2                     0.9308     0.9792

Difference between the experimental and predicted values

Parameter                                  Experiment   RFR Model   Difference (%)
Short-circuit current density (mA/cm2)     12.30        12.39       0.73
Maximum power (mW)                         1.98         1.96        1.01
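The SVR-versus-RFR comparison summarized in Tables 1 and 2 can be sketched with scikit-learn; this is a minimal illustration, assuming scikit-learn is available, with a synthetic diode-like J-V sweep standing in for the Dryad dataset and hyperparameters (C, epsilon, tree count) chosen arbitrarily rather than taken from the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(42)
# Synthetic diode-like J-V sweep (illustrative, not the study's data).
V = np.linspace(-1000, 1000, 100).reshape(-1, 1)                       # voltage (mV)
J = 12.3 * (1 - np.exp((V.ravel() - 750) / 120)) + rng.normal(0, 0.3, 100)  # J (mA/cm2)

# SVR benefits from feature scaling, hence the pipeline; RFR does not need it.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1)).fit(V, J)
rfr = RandomForestRegressor(n_estimators=100, random_state=0).fit(V, J)

for name, model in (("SVR", svr), ("RFR", rfr)):
    pred = model.predict(V)
    print(f"{name}: MAE={mean_absolute_error(J, pred):.4f}, R2={r2_score(J, pred):.4f}")
```

On data of this shape the ensemble typically fits the training sweep very closely, which mirrors the pattern in Tables 1 and 2 where the RFR attains lower MAE and higher R2 than the SVR.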
Table 3. Dataset features and their descriptive statistics.

Statistic   Voltage (mV)   Current Density (mA/cm2)   Power (mW)   Resistance (Ω·cm2)
count       100            100                        100          100
mean        0.0005         11.5284                    −2.5707      −0.0698
std         586.0926       10.4680                    5.3470       0.8161
min         −1000          −11.8700                   −27.5030     −7.3033
25%         −500           7.8758                     −3.8956      −0.0762
50%         0              12.3885                    −1.1410      −0.0360
75%         500            12.8270                    1.0680       0.0451
max         1000           55.0060                    1.9826       2.5508
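The descriptive statistics in Table 3 (count, mean, sample standard deviation, quartiles) can be generated per feature column; a minimal sketch, where the uniform 100-point voltage sweep below is an illustrative stand-in for one dataset column:

```python
import numpy as np

def describe(x):
    # Per-column descriptive statistics matching the layout of Table 3.
    x = np.asarray(x, dtype=float)
    return {
        "count": x.size,
        "mean": x.mean(),
        "std": x.std(ddof=1),        # sample standard deviation
        "min": x.min(),
        "25%": np.percentile(x, 25),
        "50%": np.percentile(x, 50),
        "75%": np.percentile(x, 75),
        "max": x.max(),
    }

# Illustrative 100-point sweep from -1000 mV to 1000 mV:
voltage = np.linspace(-1000, 1000, 100)
stats = describe(voltage)
print({k: round(float(v), 4) for k, v in stats.items()})
```

For a uniform sweep like this the sample standard deviation comes out near 586, consistent with the voltage column of Table 3.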

Share and Cite

MDPI and ACS Style

Onah, E.H.; Lethole, N.L.; Mukumba, P. Optoelectronic Devices Analytics: MachineLearning-Driven Models for Predicting the Performance of a Dye-Sensitized Solar Cell. Electronics 2025, 14, 1948. https://doi.org/10.3390/electronics14101948
