Toward Sustainable Solar Energy: Predicting Recombination Losses in Perovskite Solar Cells with Deep Learning

Abbas, Syed Raza; Mir, Bilal Ahmad; Ryu, Jihyoung; Lee, Seung Won

doi:10.3390/su17125287

¹ Department of Precision Medicine, School of Medicine, Sungkyunkwan University, Suwon 16419, Republic of Korea
² Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju-si 54896, Republic of Korea
³ Electronics and Telecommunications Research Institute (ETRI), Gwangju 61012, Republic of Korea
⁴ Department of Artificial Intelligence, Sungkyunkwan University, Suwon 16419, Republic of Korea
⁵ Department of Metabiohealth, Sungkyunkwan University, Suwon 16419, Republic of Korea
⁶ Personalized Cancer Immunotherapy Research Center, School of Medicine, Sungkyunkwan University, Suwon 16419, Republic of Korea
⁷ Department of Family Medicine, Kangbuk Samsung Hospital, School of Medicine, Sungkyunkwan University, 29 Saemunan-ro, Jongno-gu, Seoul 03181, Republic of Korea
* Author to whom correspondence should be addressed.

Sustainability 2025, 17(12), 5287; https://doi.org/10.3390/su17125287

Submission received: 7 May 2025 / Revised: 4 June 2025 / Accepted: 5 June 2025 / Published: 7 June 2025

(This article belongs to the Special Issue Emerging Technologies in Silicon Solar Cells for Sustainable Energy Systems)

Abstract

Perovskite solar cells (PSCs) are emerging as leading candidates for sustainable energy generation due to their high power conversion efficiencies and low fabrication costs. However, their performance remains constrained by non-radiative recombination losses, primarily at grain boundaries, at interfaces, and within the perovskite bulk, which are difficult to characterize under realistic operating conditions. Traditional methods such as photoluminescence offer valuable insights but are complex, time-consuming, and often lack scalability. In this study, we present a novel Long Short-Term Memory (LSTM)-based deep learning framework for dynamically predicting dominant recombination losses in PSCs. Trained on light intensity-dependent current–voltage (J–V) characteristics, the proposed model captures temporal behavior in device performance and accurately distinguishes between grain boundary, interfacial, and band-to-band recombination mechanisms. Unlike static ML approaches, our model leverages sequential data to provide deeper diagnostic capability and improved generalization across varying conditions. This enables faster, more accurate identification of efficiency-limiting factors, guiding both material selection and device optimization. While silicon technologies have long dominated the photovoltaic landscape, their high-temperature processing and rigidity pose limitations. In contrast, PSCs, especially when combined with intelligent diagnostic tools like our framework, offer enhanced flexibility, tunability, and scalability. By automating recombination analysis and enhancing predictive accuracy, our framework contributes to the accelerated development of high-efficiency PSCs, supporting the global transition to clean, affordable, and sustainable energy solutions.

Keywords: sustainable; energy; deep learning; solar cells; environment; silicon technologies

1. Introduction

The rapid depletion of fossil fuel reserves and the environmental impact of carbon-intensive energy systems have catalyzed a global push toward cleaner, renewable energy sources. Among these, solar energy stands out as one of the most abundant and sustainable options. Silicon-based photovoltaic (PV) technology currently dominates the commercial solar cell market, achieving power conversion efficiencies (PCEs) of up to 26% for single-crystalline silicon and 22% for polycrystalline silicon devices [1]. However, despite significant cost reductions over the past decade, silicon PV technology still faces major challenges such as high-temperature manufacturing requirements, complex purification processes, and limited mechanical flexibility [2]. These limitations hinder its integration into modern applications demanding lightweight, flexible, and application-specific energy solutions. Among emerging photovoltaic technologies, perovskite solar cells (PSCs) have garnered extraordinary attention due to their remarkable power conversion efficiencies, ease of processing, and tunable optoelectronic properties. Unlike conventional silicon-based technologies, PSCs offer the promise of high performance at low fabrication costs, making them a leading candidate for scalable, next-generation solar energy systems. Their potential to outperform traditional photovoltaics in both efficiency and versatility has placed PSCs at the center of cutting-edge solar research and development.

In response to these limitations, recent studies have explored new system-level and materials-level energy solutions, such as thermodynamic optimization of solar power towers using CO₂-based binary mixtures, which aim to enhance energy conversion efficiency and sustainability in solar applications [3]. Moreover, integrated planning strategies in energy systems, including electric and heating networks, are gaining traction to support reliable and efficient renewable deployment [4]. These system-wide advances reinforce the urgent need for complementary progress in next-generation solar materials like perovskites.

To address these challenges, researchers are actively investigating next-generation photovoltaic materials and architectures. Among them, perovskite solar cells (PSCs) have emerged as particularly promising candidates due to their exceptional optoelectronic properties, tunable bandgaps, and low-cost fabrication processes [2]. Since their initial demonstration in 2009 with a modest efficiency of 3.2%, PSCs have experienced a dramatic rise in performance, reaching certified efficiencies of over 25% within just a decade [5]. Both organic and inorganic perovskites have attracted intense interest for their dual role as efficient light-absorbing and charge-transport materials, potentially offering a path to high-efficiency, low-cost solar devices that can outperform traditional silicon cells in various applications [5]. Recent comprehensive reviews have emphasized the exceptional trajectory of perovskite solar cell research and the strategies for improving stability, efficiency, and scalability [6,7]. Innovations such as defect management via chlorination and bi-interfacial engineering have also been shown to substantially enhance the optoelectronic quality and performance of reduced-dimensional and inverted PSC architectures [8,9]. In addition, investigations into Pb-free alternatives are being actively pursued to align PSCs with environmental safety requirements [10].

In parallel with material innovations, the field of photovoltaics is also undergoing a digital transformation, driven by advances in data science, machine learning (ML), and artificial intelligence (AI). These tools are increasingly being integrated into the material discovery and device optimization pipelines to accelerate experimental workflows and improve the understanding of complex physical phenomena [11,12,13]. ML models have demonstrated the ability to predict the properties of materials, identify optimal fabrication parameters, and even suggest new compositions for performance enhancement [12,14]. In the context of PSCs, machine learning has been applied to investigate layer composition, doping profiles, and charge transport characteristics, with the goal of enhancing stability and PCE [15,16,17]. Furthermore, recent developments in ML-powered experimental planning and automated data analysis have shown promise in reducing trial-and-error-based lab work, enabling more systematic and reproducible research methodologies [18,19,20]. Cross-disciplinary applications of AI techniques in other domains, such as load forecasting [21], environmental monitoring [22], and battery fault diagnosis [23], show the adaptability and power of ML frameworks in managing complex systems with noisy, high-dimensional data. Similarly, classification and time-series models such as LSTM, ResLNet, and hybrid wavelet–LSTM combinations have demonstrated effectiveness in temporal learning tasks ranging from VR-based EEG signal recognition to action recognition and education technology [24,25,26]. These developments underscore the relevance of adopting time-dependent deep learning models for interpreting dynamic behavior in photovoltaic systems.

A critical barrier to further performance enhancement in PSCs lies in understanding and minimizing non-radiative recombination losses, which primarily occur either within the bulk of the perovskite absorber or at its interfaces with transport layers. Hybrid halide perovskites possess advantageous optoelectronic properties, such as large absorption coefficients, high charge carrier mobilities, and long diffusion lengths, which contribute to their suitability for thin-film solar cell applications [27,28]. However, despite advances in fabrication techniques that improve film compactness and the grain size, Shockley–Read–Hall (SRH) trap-assisted recombination remains the dominant loss mechanism, especially at grain boundaries and interfacial regions with electron and hole transport layers (ETLs/HTLs) [29,30,31,32,33]. Traditional characterization tools like photoluminescence (PL) spectroscopy have been employed to probe recombination dynamics and charge transfer kinetics at these interfaces. Although insightful, such techniques are typically carried out under non-standard device conditions and often require multiple device configurations, making them laborious and less scalable [34,35]. As high-efficiency PSCs consistently exhibit interface-limited recombination, there is a growing need for fast, accurate, and automated methods to distinguish between bulk and interfacial recombination losses under realistic operating conditions [36]. This highlights the importance of data-driven approaches that can analyze light intensity-dependent current–voltage (J–V) characteristics and efficiently infer the dominant recombination pathways in perovskite devices.

Furthermore, the field of materials science has seen remarkable progress in applying bio-derived and carbon-based nanomaterials across the energy and biomedical sectors. These advances in renewable material design and nanoscale engineering provide promising avenues for hybrid solar energy solutions and environmentally sustainable device architectures [37,38,39]. Such developments reinforce the broader potential of integrating green materials with AI-driven device optimization strategies in the pursuit of next-generation clean energy systems.

Recent advances in machine learning have opened new pathways to accurately predict dominant recombination mechanisms in PSCs, particularly those arising from grain boundaries (GBs), interfacial defects, and band-to-band recombination. For instance, Le Corre et al. used photovoltaic performance metrics such as the open-circuit voltage (Voc), light intensity, and the ideality factor to develop a model capable of identifying the primary sources of recombination losses in PSC devices [40]. More recently, extreme gradient boosting (XGBoost) has been applied to predict recombination types across large-scale simulated datasets, with SHAP used for model explainability and Optuna for hyperparameter tuning, revealing the influence of various input features on recombination losses [41]. However, despite their effectiveness, these models largely operate on static data inputs and often overlook the temporal dynamics that are inherent in real-world solar cell operation.

To address the limitations of existing static prediction models and contribute toward the development of sustainable, next-generation photovoltaic technologies, this study proposes a novel data-driven framework based on Long Short-Term Memory (LSTM) neural networks. The proposed approach leverages light intensity-dependent current–voltage (J–V) characteristics to dynamically model and predict the dominant recombination losses, whether they occur in the perovskite bulk, at grain boundaries, or at the interfaces with transport layers. By learning temporal patterns across varying illumination conditions, the LSTM framework captures time-dependent recombination dynamics with greater fidelity than traditional machine learning models. This enables more accurate identification of the underlying causes of efficiency loss in perovskite solar cells, ultimately guiding the optimization of material composition and device design. Through this work, we aim to support the broader transition to clean energy systems by enabling faster, more intelligent design of high-performance PSCs aligned with the United Nations Sustainable Development Goal 7: Affordable and Clean Energy.

The main contributions of this study are summarized as follows:

- We propose a novel LSTM-MLP hybrid deep learning framework that dynamically predicts dominant recombination losses (band-to-band, grain boundary, and interface) in perovskite solar cells using light intensity-dependent J–V characteristics.
- We conduct extensive ablation and comparative studies, demonstrating the model’s superiority over traditional ML methods (e.g., Random Forest and XGBoost) in terms of accuracy and generalizability.
- We perform interpretability analysis using permutation importance and ROC curves to validate the physical relevance and diagnostic capabilities of the model.
- Our framework contributes toward automated, scalable recombination diagnostics in PSCs, aligning with global goals for clean and efficient solar energy technologies.

2. Methodology

In this section, we detail the methodology for developing a machine learning-based framework to predict the dominant recombination loss mechanisms in PSCs. The goal of this study is to leverage temporal dependencies in light intensity-dependent current–voltage characteristics to identify the primary loss mechanisms of grain boundary (GB), interface, or band-to-band recombination. Figure 1 illustrates the overall workflow of the study. The process begins with the acquisition of a high-quality dataset comprising material properties, device parameters, and photovoltaic metrics across varying light intensities. Key features are extracted and standardized to ensure consistent input to the model. A hybrid LSTM-MLP classifier is then trained to identify the dominant recombination mechanism (band-to-band, grain boundary, or interface). The model’s performance is thoroughly evaluated using standard metrics, and permutation importance is applied to interpret the most influential features. This end-to-end pipeline ensures both accuracy and physical interpretability, making it well suited for sustainability-focused device diagnostics.

2.1. Dataset and Feature Engineering

The dataset used in this study is derived from the publicly available dataset published by Le Corre et al. [40], which was generated via extensive drift-diffusion simulations using the open-source software SIMsalabim. The dataset includes 2,470,491 data points, each corresponding to a perovskite solar cell (PSC) characterized by light intensity-dependent current–voltage (J–V) behavior. Each instance is labeled with its dominant recombination mechanism (band-to-band, grain boundary (GB), or interface) based on the known physical interpretation described in the original study [40].

The simulations cover a diverse range of material and device parameters including absorber and transport layer (TL) thickness, doping concentrations, mobilities, temperature, voltage bias, and light intensity. These parameters were varied systematically across ranges. To ensure suitability for machine learning, the dataset was cleaned by removing duplicate and null entries and then balanced across the three recombination labels. A subset of 823,497 samples with equal label distribution was selected to mitigate bias. Feature importance was analyzed using Pearson correlation coefficients and validated through SHAP-based interpretability. The dataset was partitioned into 75% for training and 25% for testing, with seven-fold cross-validation applied during training to enhance the generalization and robustness of the models. This extensive and diverse dataset effectively simulates realistic device behaviors under various operational conditions, making it a reliable basis for modeling recombination losses in PSCs using data-driven approaches.

2.1.1. Feature Selection

The selected feature set is critical for capturing the nuances of recombination losses. The input feature vector $X$ for the proposed LSTM-MLP classifier consists of both material parameters and light intensity-dependent photovoltaic performance metrics. These features encapsulate the electrical, physical, and recombination-related behavior of the device. Table 1 summarizes the complete set of input variables used for training and evaluation. For varying light intensities, additional feature instances such as $\mathrm{Voc}_{i}$, $\mathrm{Jsc}_{i}$, and $\mathrm{FF}_{i}$ are computed, where $i \in \mathrm{Gfrac}$ and $i < 1$. $\mathrm{Gfrac}$ represents a series of fractions related to light intensity variations. These features are directly related to the characteristics of the PSCs and their efficiency, making them well suited for predicting recombination losses.
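One way to read the light intensity-dependent features as a sequence is to order the per-intensity triples by intensity fraction, so that each step of the sequence is one illumination level. The following is a minimal sketch under stated assumptions: the fractions in `G_FRAC`, the key pattern (for example `Voc_0.10`), and all numeric values are hypothetical and serve only to illustrate the layout; the text does not list the actual members of Gfrac.

```python
# Hypothetical intensity fractions; the actual members of Gfrac are not listed in the text.
G_FRAC = [0.10, 0.50, 1.00]


def to_sequence(sample, fractions=G_FRAC):
    """Order (Voc_i, Jsc_i, FF_i) triples by increasing light-intensity fraction."""
    seq = []
    for g in sorted(fractions):
        key = f"{g:.2f}"
        seq.append([sample[f"Voc_{key}"], sample[f"Jsc_{key}"], sample[f"FF_{key}"]])
    return seq


# Made-up numbers for a single simulated device (illustration only).
sample = {
    "Voc_0.10": 1.02, "Jsc_0.10": 2.1, "FF_0.10": 0.74,
    "Voc_0.50": 1.08, "Jsc_0.50": 10.6, "FF_0.50": 0.78,
    "Voc_1.00": 1.11, "Jsc_1.00": 21.3, "FF_1.00": 0.79,
}
sequence = to_sequence(sample)  # one step per intensity, three features per step
```

In this layout the "time" axis of the recurrent model is the illumination level rather than wall-clock time, which is how the sequential framing in the text is interpreted here.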

Figure 2 provides an overview of the feature distributions via boxplots, which are segmented by recombination class. These plots reveal significant inter-class variability for features such as $\mathrm{Voc}_{1.00}$ and mob_IR, which highlights their discriminative potential for classification. Some features demonstrate moderate overlap but preserve distinctive statistical tendencies in median and IQR values. This reinforces the importance of multi-dimensional feature analysis and validates the use of normalization prior to model training. These statistical disparities support the underlying assumptions of the neural classifier by supplying diverse and informative input vectors.

2.1.2. Data Scaling

Given the heterogeneous nature of the dataset, we scale the features to ensure that each input variable contributes equally to the learning process. We use standard scaling to normalize the data, which is defined as follows:

$$\hat{x}_{i} = \frac{x_{i} - \mu}{\sigma} \tag{1}$$

where $\mu$ and $\sigma$ represent the mean and standard deviation of the feature $x$ over all training samples.

2.1.3. Data Splitting

The dataset is split into training (75%) and testing (25%) sets using random stratified sampling. The training set is then further divided into training and validation subsets with an 80–20 ratio. This ensures a robust evaluation of the model’s performance during training and helps prevent overfitting.
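The two-stage split can be sketched as follows. The function `stratified_split` is a hypothetical helper written for illustration, and stratifying the second (80–20) split is an assumption; the text only states that the 75/25 split is stratified.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx) with per-class proportions preserved."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# 120 toy labels, 40 per class (illustration only).
labels = ["band-to-band", "GB", "interface"] * 40

# Stage 1: 75% train+validation, 25% test.
train_val_idx, test_idx = stratified_split(labels, 0.25)

# Stage 2: 80-20 split inside the training portion.
# The returned positions index into train_val_idx, so they are mapped back.
tv_labels = [labels[i] for i in train_val_idx]
tr_pos, va_pos = stratified_split(tv_labels, 0.20, seed=1)
train_idx = [train_val_idx[p] for p in tr_pos]
val_idx = [train_val_idx[p] for p in va_pos]
```

With these toy labels the result is 72 training, 18 validation, and 30 test samples, with each class equally represented in every subset.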

2.2. Model Architecture

The core of the proposed framework is a hybrid architecture consisting of Long Short-Term Memory (LSTM) layers for sequential feature extraction followed by a Multi-Layer Perceptron (MLP) classifier for the final prediction. This hybrid approach is ideal for handling time-dependent characteristics of PSCs, such as current–voltage behavior under different light intensities, and predicting the dominant loss mechanism.

2.2.1. Long Short-Term Memory (LSTM)

LSTM is a variant of recurrent neural networks (RNNs) designed to capture long-term dependencies in sequential data. Given the temporal nature of PSC performance, LSTM is highly suitable for modeling the sequential relationships between light intensity and current–voltage characteristics, as they evolve during solar cell operation.

The LSTM model’s equations are as follows:

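$$i_{t} = \sigma(W_{i} \cdot x_{t} + U_{i} \cdot h_{t-1} + b_{i}) \tag{2}$$

$$f_{t} = \sigma(W_{f} \cdot x_{t} + U_{f} \cdot h_{t-1} + b_{f}) \tag{3}$$

$$g_{t} = \tanh(W_{g} \cdot x_{t} + U_{g} \cdot h_{t-1} + b_{g}) \tag{4}$$

$$o_{t} = \sigma(W_{o} \cdot x_{t} + U_{o} \cdot h_{t-1} + b_{o}) \tag{5}$$

$$c_{t} = f_{t} \odot c_{t-1} + i_{t} \odot g_{t} \tag{6}$$

$$h_{t} = o_{t} \odot \tanh(c_{t}) \tag{7}$$

where

- $x_{t}$ is the input vector at time step $t$;
- $h_{t-1}$ is the hidden state from the previous time step;
- $c_{t}$ is the cell state at time step $t$;
- $i_{t}$, $f_{t}$, and $o_{t}$ are the input, forget, and output gates, and $g_{t}$ is the candidate cell update;
- $W$, $U$, and $b$ are the learned input weights, recurrent weights, and biases of each gate;
- $\sigma$ and $\tanh$ are the sigmoid and hyperbolic tangent activation functions, and $\odot$ denotes elementwise multiplication.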
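A single LSTM time step following Equations (2)–(7) can be written out directly. This is a minimal pure-Python sketch for illustration, not the training implementation used in the study; the parameter layout (a dictionary with keys `"i"`, `"f"`, `"g"`, `"o"`, each holding a `(W, U, b)` tuple) and the helper names are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(params, x_t, h_prev):
    """Compute W x_t + U h_{t-1} + b for one gate; params = (W, U, b)."""
    W, U, b = params
    return [
        sum(w * x for w, x in zip(w_row, x_t))
        + sum(u * h for u, h in zip(u_row, h_prev))
        + b_k
        for w_row, u_row, b_k in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, gates):
    """One LSTM time step following Equations (2)-(7)."""
    i_t = [sigmoid(z) for z in affine(gates["i"], x_t, h_prev)]    # input gate, Eq. (2)
    f_t = [sigmoid(z) for z in affine(gates["f"], x_t, h_prev)]    # forget gate, Eq. (3)
    g_t = [math.tanh(z) for z in affine(gates["g"], x_t, h_prev)]  # candidate, Eq. (4)
    o_t = [sigmoid(z) for z in affine(gates["o"], x_t, h_prev)]    # output gate, Eq. (5)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]  # Eq. (6)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                  # Eq. (7)
    return h_t, c_t


def zero_gate(n_in, n_hidden):
    """All-zero (W, U, b) for one gate; used only for the sanity check below."""
    return ([[0.0] * n_in for _ in range(n_hidden)],
            [[0.0] * n_hidden for _ in range(n_hidden)],
            [0.0] * n_hidden)


# Sanity check: with all-zero parameters every sigmoid gate equals 0.5 and the
# candidate is 0, so the cell state is simply halved.
gates = {name: zero_gate(3, 2) for name in "ifgo"}
h, c = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -1.0], gates)
```

The zero-parameter case is useful as a check because the expected result follows from the equations by hand: $c_{t} = 0.5\,c_{t-1}$ and $h_{t} = 0.5 \tanh(c_{t})$.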

2.2.2. Bidirectional LSTM

We employ a bidirectional LSTM architecture, where the input sequence is processed in both the forward and backward directions. This bidirectional approach allows the model to capture both past and future context, providing a richer representation of the sequential data.
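The bidirectional idea can be illustrated independently of the LSTM cell itself: one recurrence reads the sequence in its original order, a second reads it reversed, and their summaries are concatenated. In the sketch below a toy exponential-moving-average step stands in for the LSTM cell, and taking the final state of each direction is an assumption, since the text does not state how the two directions are combined.

```python
def run_direction(step, xs, h0):
    """Apply a recurrent step over xs and return the final hidden state."""
    h = h0
    for x in xs:
        h = step(x, h)
    return h


def bidirectional_summary(step_fwd, step_bwd, xs, h0):
    """Concatenate the final states of a forward pass and a pass over the reversed sequence."""
    h_fwd = run_direction(step_fwd, xs, h0)
    h_bwd = run_direction(step_bwd, list(reversed(xs)), h0)
    return h_fwd + h_bwd


def ema_step(x, h):
    """Toy recurrent step (exponential moving average) standing in for an LSTM cell."""
    return [0.5 * hi + 0.5 * xi for hi, xi in zip(h, x)]


# A sequence whose only non-zero input arrives last.
xs = [[0.0], [0.0], [8.0]]
summary = bidirectional_summary(ema_step, ema_step, xs, [0.0])
```

The forward pass sees the large input last and ends at 4.0, while the backward pass sees it first and decays to 1.0, so the two halves of `summary` carry different views of the same sequence. In a real bidirectional LSTM the two directions have separate learned parameters, which is why the sketch accepts two step functions.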

2.2.3. Multi-Layer Perceptron (MLP)

After the LSTM layer, the hidden state $h_{t}$ is passed through an MLP for classification. The MLP is composed of several fully connected layers with ReLU activation and dropout regularization:

$$\hat{y} = \mathrm{MLP}(h_{t}) \tag{8}$$

The full architecture is summarized in Table 2. Each layer is carefully selected to balance expressiveness and generalization capability, which is particularly important when modeling the complex dependencies of recombination loss mechanisms.

2.3. Training Procedure

The model is trained using the Adam optimizer with a learning rate of $10^{-3}$ and a cross-entropy loss function. To address the issue of class imbalance, we compute class weights based on the frequency of each class in the training set and incorporate these weights into the loss function. The training loss is computed as

$$L(\hat{y}, y) = -\sum_{c=1}^{C} y_{c} \log(\hat{y}_{c}) \tag{9}$$

where $C$ is the number of classes (3 in this case), $y_{c}$ is the true class indicator (1 if $c$ is the true class and 0 otherwise), and $\hat{y}_{c}$ is the predicted probability for class $c$.
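Equation (9) does not show the class weights explicitly, so the sketch below illustrates one common way to combine the two: the loss of each sample is multiplied by the weight of its true class. The inverse-frequency formula in `class_weights` is an assumption (the text only says the weights are based on class frequency), and the labels and probabilities are toy values.

```python
import math
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_c). Assumed form."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def weighted_cross_entropy(probs, label, weights):
    """Equation (9) for a one-hot target, scaled by the weight of the true class."""
    return -weights[label] * math.log(probs[label])


# Toy, deliberately imbalanced labels (illustration only).
labels = ["band-to-band"] * 2 + ["GB"] + ["interface"]
weights = class_weights(labels)

# Hypothetical softmax output for one sample whose true class is GB.
probs = {"band-to-band": 0.7, "GB": 0.2, "interface": 0.1}
loss = weighted_cross_entropy(probs, "GB", weights)

# On perfectly balanced labels every weight is 1 and Equation (9) is recovered.
balanced = class_weights(["band-to-band", "GB", "interface"])
```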

We employ an “early stopping” strategy with a patience of 10 epochs to prevent overfitting. The model is saved whenever the validation loss decreases, and training is halted if no improvement is seen after the specified patience period.
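The early-stopping rule described above can be sketched as a small loop over per-epoch validation losses. The function name and the synthetic loss curve are illustrative; in actual training the checkpoint would be written at the marked line.

```python
def early_stopping_trace(val_losses, patience=10):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Validation loss decreased: this is where the model would be saved.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch  # halt training
    return best_epoch, len(val_losses) - 1


# Synthetic validation curve: improves for five epochs, then plateaus.
losses = [1.0, 0.8, 0.7, 0.65, 0.6] + [0.61] * 20
best_epoch, stop_epoch = early_stopping_trace(losses, patience=10)
```

With this curve the best checkpoint is from epoch 4 (zero-based) and training halts at epoch 14, after ten consecutive epochs without improvement.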

2.4. Evaluation Metrics

The model’s performance is evaluated using standard classification metrics, namely accuracy, precision, recall, and F1-score. These metrics quantitatively assess the agreement between predicted and actual class labels and are crucial for evaluating the classifier’s effectiveness, especially in multi-class settings like recombination loss identification in PSCs.

Given the critical role of accurately diagnosing the dominant recombination mechanism (band-to-band, grain boundary, or interface recombination) in optimizing PSC efficiency, it is important to not only measure overall correctness but also understand the balance between different types of prediction errors.

Accuracy measures the overall correctness of the classifier by computing the ratio of correctly predicted samples to the total number of samples. It provides a general sense of model performance but can be misleading in the presence of class imbalance.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$

Precision represents the proportion of true positive predictions among all instances classified as a particular recombination type. High precision ensures that when the model predicts a specific recombination mechanism, the prediction is likely to be correct, which is critical for avoiding misdiagnosis that could lead to incorrect device optimization steps.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{11}$$

Recall quantifies the model’s ability to identify all actual instances of a given recombination loss type. High recall is essential to ensure that no dominant recombination mechanism goes undetected, which is vital for targeted material and device improvements.

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{12}$$

The F1-score, as the harmonic mean of precision and recall, balances the trade-off between false positives and false negatives. This metric is particularly useful in scenarios where the costs of different misclassifications vary and a balance between precision and recall is necessary.

$$\mathrm{F1\text{-}Score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{13}$$

In these formulas, $TP$, $TN$, $FP$, and $FN$ refer to the numbers of true positives, true negatives, false positives, and false negatives, respectively.
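For a three-class problem, Equations (11)–(13) are usually evaluated per class by treating that class as positive and the other two as negative. The sketch below assumes this one-vs-rest reading and a confusion matrix whose rows are true classes and whose columns are predicted classes; the matrix values are toy numbers, not results from this study.

```python
def per_class_metrics(confusion, k):
    """One-vs-rest precision, recall, and F1 for class k.

    confusion[i][j] counts samples with true class i predicted as class j.
    """
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                   # true class k, predicted otherwise
    fp = sum(row[k] for row in confusion) - tp    # predicted k, true class differs
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / sum(map(sum, confusion))


# Toy confusion matrix (rows: true class, columns: predicted class).
# Class order: band-to-band, GB, interface.
toy = [[8, 1, 1],
       [2, 6, 2],
       [0, 3, 7]]
p, r, f1 = per_class_metrics(toy, 0)
acc = accuracy(toy)
```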

In this study, a comprehensive classification report is generated for each recombination loss class: band-to-band, grain boundary (GB), and interface recombination. In addition to standard metrics, a domain-specific evaluation is performed to analyze the model’s robustness and diagnostic reliability under varying photovoltaic operating conditions. This includes conditional accuracy across different light intensities and detailed confusion matrix analyses, providing insights into the model’s sensitivity to the nuanced feature distributions inherent in PSC recombination phenomena. Such an evaluation is fundamental because the precise identification of recombination mechanisms directly impacts the design strategies for minimizing losses and improving device efficiency, thereby accelerating the development of high-performance perovskite solar cells.

2.5. Suitability of the Approach for the Problem

The hybrid architecture of LSTM and MLP is particularly suitable for this problem due to the following reasons:

- PSCs’ current–voltage characteristics exhibit temporal dependencies, especially under varying light intensities. LSTM is well suited to capture these dependencies, allowing the model to account for the dynamic nature of the solar cell’s performance.
- The recombination losses in PSCs are governed by multiple physical processes, such as grain boundaries, interfaces, and band-to-band recombination. The model’s ability to handle multi-dimensional data with multiple features, including light intensity and voltage characteristics, makes it ideal for differentiating between these loss mechanisms.
- The bidirectional LSTM allows the model to capture information from both past and future time steps, enriching its ability to predict the recombination loss mechanism more accurately. Furthermore, the integration of MLP ensures that the output is a clear classification of the dominant recombination loss.

This approach bridges the gap between experimental observations and machine learning, providing an efficient and scalable solution for identifying and mitigating recombination losses in PSCs, which is critical for improving their overall efficiency and sustainability.

3. Results

3.1. Ablation Study

To evaluate the effectiveness of the LSTM module in our model, we perform an ablation study by comparing the performance of the model with and without the LSTM component. This comparison helps us understand the impact of the LSTM architecture on the classification of recombination losses in PSCs. Specifically, we examine key performance metrics, including precision, recall, F1-score, and MCC, across three classes, namely band-to-band, grain boundary (GB), and interface. The results are summarized in Table 3.

The classification results demonstrate an overall improvement in performance with the inclusion of the LSTM module. The F1-score improves for all three classes compared with the model without LSTM, although precision and recall do not both improve in every class. For the band-to-band class, there is a slight decrease in precision (from 0.9640 to 0.9566) with the LSTM model but a noticeable increase in recall (from 0.9725 to 0.9839). This indicates that the LSTM helps the model recover more of the true band-to-band samples, at the cost of slightly more false positives. The F1-score remains nearly identical (0.9683 without LSTM vs. 0.9700 with LSTM), suggesting that the balance between precision and recall is maintained. In the grain boundary (GB) class, the model with LSTM outperforms the one without LSTM on most metrics. Precision increased from 0.7449 to 0.8122, indicating that fewer samples from other classes are misclassified as GB. Recall decreased slightly (from 0.8712 to 0.8476), but the gain in precision outweighs this loss and leads to an improved F1-score (from 0.8031 to 0.8295). Additionally, MCC increased from 0.6980 to 0.7419, highlighting better classification reliability for this class.

For the interface class, the model with LSTM achieved improvements in both precision and recall. Precision increased from 0.8148 to 0.8222, and recall improved from 0.6697 to 0.7629. These improvements are crucial for accurately classifying recombination losses at the interface. Consequently, the F1-score improved from 0.7352 to 0.7915, reflecting a more balanced classification performance. MCC also saw a notable increase, from 0.6272 to 0.6940, indicating better model performance in this category. Overall, the inclusion of the LSTM module led to a 2.71 percentage-point increase in accuracy (from 0.8377 to 0.8648) and an increase of 0.0373 in MCC (from 0.7604 to 0.7977). These results are consistent with the LSTM layer capturing sequential dependencies in the data, leading to better generalization and more accurate classification of recombination losses. The improvements are especially pronounced for grain boundary and interface recombination losses, which are critical for optimizing the performance of perovskite solar cells.
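The figures quoted in this subsection can be cross-checked with a few lines of arithmetic. The sketch below recomputes each with-LSTM F1-score from the quoted precision and recall using Equation (13), and expresses the changes in overall accuracy and MCC as both absolute and relative differences. Variable names are illustrative; all numbers are copied from the text above.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall, Equation (13)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for the model with LSTM, as quoted in the text.
reported = {
    "band-to-band": (0.9566, 0.9839, 0.9700),
    "GB": (0.8122, 0.8476, 0.8295),
    "interface": (0.8222, 0.7629, 0.7915),
}
# Gap between the recomputed and reported F1 (expected to be rounding-level).
f1_gap = {name: abs(f1(p, r) - f) for name, (p, r, f) in reported.items()}

acc_without, acc_with = 0.8377, 0.8648
mcc_without, mcc_with = 0.7604, 0.7977

acc_gain_points = (acc_with - acc_without) * 100          # percentage points
acc_gain_relative = (acc_with / acc_without - 1) * 100    # percent of the baseline
mcc_gain = mcc_with - mcc_without                         # absolute change
mcc_gain_relative = (mcc_with / mcc_without - 1) * 100    # percent of the baseline
```

Distinguishing percentage points from relative percent matters here: the same accuracy change is 2.71 points in absolute terms and roughly 3.2% relative to the baseline.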

3.2. Comparison with Previously Reported Models

Table 4 presents a comparative analysis of our proposed LSTM-MLP hybrid model with previously reported machine learning classifiers, including Random Forest (RF) and XGBoost. As observed, the proposed model outperforms the traditional methods in overall accuracy (86.48%) while also achieving the highest rate of correct predictions for the band-to-band recombination class (98.39%).

Furthermore, the model maintains competitive performance on grain boundary (GB) classification at 85%, aligning with the best-performing XGBoost model. Importantly, our model reports an MCC of 0.7977, indicating a strong correlation between the predicted and true labels across all classes. MCC was not reported in previous models, highlighting an additional strength of our evaluation pipeline. This reinforces the effectiveness of deep neural architectures, particularly the integration of LSTM layers, for capturing nonlinear and temporal dependencies in perovskite solar cell performance metrics across varying light intensities.
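Since MCC is used throughout the results but not defined in Section 2.4, the sketch below shows the standard multi-class generalization computed from a confusion matrix. Whether the study used exactly this form is an assumption (the per-class values quoted earlier would instead come from a one-vs-rest variant), and the matrices are toy values.

```python
import math


def multiclass_mcc(confusion):
    """Multi-class Matthews correlation coefficient from a confusion matrix.

    confusion[i][j] counts samples with true class i predicted as class j.
    """
    n = len(confusion)
    s = sum(map(sum, confusion))                              # total samples
    c = sum(confusion[k][k] for k in range(n))                # correctly classified
    t = [sum(confusion[k]) for k in range(n)]                 # true count per class
    p = [sum(row[k] for row in confusion) for k in range(n)]  # predicted count per class
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denominator = math.sqrt((s * s - sum(pk * pk for pk in p))
                            * (s * s - sum(tk * tk for tk in t)))
    return numerator / denominator if denominator else 0.0


# Toy confusion matrices (rows: true class, columns: predicted class).
perfect = [[5, 0, 0], [0, 5, 0], [0, 0, 5]]
toy = [[8, 1, 1], [2, 6, 2], [0, 3, 7]]
mcc_perfect = multiclass_mcc(perfect)
mcc_toy = multiclass_mcc(toy)
```

For the toy matrix, accuracy is 0.70 while MCC is 0.55, which illustrates why MCC is a stricter summary than accuracy: it discounts the agreement expected from the class distribution alone.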

Unlike tree-based methods, our approach leverages the LSTM’s ability to model temporal dependencies in light intensity sequences, capturing dynamic behaviors in PSCs that static models cannot. The MLP layer then integrates these temporal features for classification. This fundamentally improves the detection of recombination losses, as demonstrated by superior performance metrics and enhanced interpretability through permutation importance analysis.

4. Discussion

To evaluate the classification performance of the proposed LSTM-MLP model in distinguishing between dominant recombination losses, namely band-to-band, grain boundary (GB), and interface, a series of diagnostic visualizations and interpretability techniques was employed.

Figure 3 presents the receiver operating characteristic (ROC) curves for each class. These curves provide insights into the model’s discriminative ability, plotting the true positive rate (sensitivity) against the false positive rate (1 - specificity) across various classification thresholds. An ROC curve that approaches the top-left corner of the plot indicates a high-performance classifier. The area under the curve (AUC) values reflect the model’s effectiveness in correctly identifying recombination types, particularly for band-to-band losses where the curve nearly touches the ideal boundary. These findings suggest that the model is highly effective at distinguishing among the loss mechanisms, especially in scenarios where subtle feature variations might obscure classification boundaries.
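A per-class ROC curve of this kind can be computed by treating one class as positive and the rest as negative, then sweeping a threshold over the predicted probability for that class. The sketch below is a minimal illustration with made-up scores; the function names are hypothetical, and the one-vs-rest construction is an assumption about how the per-class curves were obtained.

```python
def roc_points(scores, positives):
    """ROC points (FPR, TPR) obtained by sweeping the threshold over the scores.

    scores: predicted probability of the class of interest.
    positives: True where the sample belongs to that class (one-vs-rest).
    """
    n_pos = sum(positives)
    n_neg = len(positives) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    ranked = sorted(zip(scores, positives), key=lambda sp: -sp[0])
    for i, (score, is_pos) in enumerate(ranked):
        tp += is_pos
        fp += not is_pos
        # Emit a point only after the last sample of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up scores for six samples; True marks the band-to-band samples.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
is_band_to_band = [True, True, False, True, False, False]
curve = roc_points(scores, is_band_to_band)
area = auc(curve)
```

In this toy case eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9; a perfect classifier would reach 1.0 and a random one about 0.5.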

Further, the confusion matrix in Figure 4 quantifies model performance at the class level. The strong diagonal dominance signifies that the model achieves high classification accuracy across all classes, particularly for band-to-band recombination, which shows the fewest misclassifications. Some confusion remains between the GB and interface recombination types, indicating a degree of overlap in their feature representations. Nevertheless, the high values along the diagonal confirm the model’s robustness and consistency in capturing the distinct physical patterns associated with each recombination mechanism.
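The relationship between the confusion matrix and class-level performance can be sketched as follows. The labels match the three recombination classes, but the predictions are toy values chosen for illustration, and the helper names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count matrix with rows = true class and columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm

def per_class_recall(cm):
    """Diagonal entry divided by the row sum for each class."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]

labels = ["band-to-band", "GB", "interface"]

# Toy predictions (not the study's data): GB and interface are partly confused.
y_true = ["band-to-band"] * 4 + ["GB"] * 4 + ["interface"] * 4
y_pred = (
    ["band-to-band"] * 4
    + ["GB", "GB", "GB", "interface"]
    + ["interface", "interface", "GB", "GB"]
)

cm = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(label, row)
print(per_class_recall(cm))  # [1.0, 0.75, 0.5]
```

Off-diagonal counts in the GB and interface rows lower the recall of those classes, which is the pattern of confusion described above.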

To complement the performance evaluation, permutation importance analysis was applied to interpret model behavior and highlight critical features influencing classification, as shown in Figure 5. This method quantifies the impact of each input variable on model output by measuring the drop in predictive accuracy when the feature is randomly shuffled. Features such as the diode ideality factor (n) and the photovoltaic parameters under varying light intensities (Voc_i, Jsc_i, and FF_i) emerge as the most influential. These reflect critical device physics, including charge carrier recombination and photogenerated current behavior across operational regimes.

Furthermore, interfacial transport parameters, specifically charge carrier mobilities at the left and right interfaces (mob_IL and mob_IR), demonstrate considerable influence, underscoring the role of interfacial dynamics in dominant recombination behavior. These results not only validate the model’s physical relevance but also establish its capability as a tool for guiding material and interface engineering in perovskite solar cells.
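The permutation importance procedure described above can be sketched in a few lines. This is a minimal, standard-library illustration under stated assumptions: `toy_predict` is a hypothetical stand-in for a trained classifier, accuracy is used as the score, and the two-feature dataset is invented for the example.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the dataset with only this column permuted.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical stand-in for a trained model: it uses feature 0 and ignores feature 1.
def toy_predict(X):
    return [1 if row[0] > 0.5 else 0 for row in X]

X = [[0.9, 0.1], [0.8, 0.7], [0.7, 0.3], [0.6, 0.9],
     [0.4, 0.2], [0.3, 0.8], [0.2, 0.5], [0.1, 0.6]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

importances = permutation_importance(toy_predict, X, y)
print(importances)
```

Because the toy model ignores the second feature, shuffling it leaves the predictions unchanged and its importance is exactly zero, whereas shuffling the first feature degrades accuracy.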

Overall, the ROC analysis, confusion matrix, and feature importance results together indicate that the LSTM-MLP model is not only accurate but also interpretable and consistent with domain knowledge. This makes it well suited to sustainable photovoltaic research and device optimization.

5. Conclusions

This study demonstrates the effectiveness of combining LSTM networks with MLP classifiers for classifying recombination losses in photovoltaic systems. The addition of LSTM enhances the model’s ability to capture temporal dependencies, leading to improvements in most per-class metrics, including F1-score, class accuracy, and MCC for all three classes. Our ablation study highlights a notable increase in performance with LSTM, especially for the challenging “Interface” class, where recall rises from 0.6697 to 0.7629 and the F1-score from 0.7352 to 0.7915 (Table 3). These findings underscore the potential of deep learning in optimizing photovoltaic system performance and advancing sustainable energy solutions. By improving the classification of recombination losses, this approach can contribute to the development of more efficient and sustainable solar energy technologies. This work provides a promising foundation for future research in renewable energy applications, focusing on deep learning techniques for enhancing system efficiency and sustainability.

Author Contributions

Conceptualization, S.R.A. and B.A.M.; Methodology, S.R.A., B.A.M. and J.R.; Validation, S.R.A., B.A.M. and J.R.; Data curation, S.R.A.; Writing—original draft, S.R.A. and B.A.M.; Writing—review & editing, J.R.; Supervision, S.W.L.; Project administration, S.W.L.; Funding acquisition, S.W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by SungKyunKwan University and the BK21 FOUR (Graduate School Innovation) funded by the Ministry of Education (MOE, Korea) and the National Research Foundation of Korea (NRF). This work was also supported by National Research Foundation (NRF) grants funded by the Ministry of Science and ICT (MSIT) and Ministry of Education (MOE), Republic of Korea (NRF[2021-R1-I1A2(059735)]; RS[2024-0040(5650)]; RS[2024-0044(0881)]; RS[2019-II19(0421)]).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Sun, J.; Zuo, Y.; Sun, R.; Zhou, L. Research on the conversion efficiency and preparation technology of monocrystalline silicon cells based on statistical distribution. Sustain. Energy Technol. Assess. 2021, 47, 101482.
2. Maldonado, S. The importance of new “sand-to-silicon” processes for the rapid future increase of photovoltaics. ACS Energy Lett. 2020, 5, 3628–3632.
3. Niu, X.; Ma, N.; Bu, Z.; Hong, W.; Li, H. Thermodynamic analysis of supercritical Brayton cycles using CO₂-based binary mixtures for solar power tower system application. Energy 2022, 254, 124286.
4. Du, Y.; Xue, Y.; Wu, W.; Shahidehpour, M.; Shen, X.; Wang, B.; Sun, H. Coordinated planning of integrated electric and heating system considering the optimal reconfiguration of district heating network. IEEE Trans. Power Syst. 2023, 39, 794–808.
5. Roy, P.; Ghosh, A.; Barclay, F.; Khare, A.; Cuce, E. Perovskite solar cells: A review of the recent advances. Coatings 2022, 12, 1089.
6. Zhou, Q.; Liu, X.; Liu, Z.; Zhu, Y.; Lu, J.; Chen, Z.; Li, C.; Wang, J.; Xue, Q.; He, F.; et al. Annual research review of perovskite solar cells in 2023. Mater. Futur. 2024, 3, 022102.
7. Ouedraogo, N.A.N.; Ouyang, Y.; Guo, B.; Xiao, Z.; Zuo, C.; Chen, K.; He, Z.; Odunmbaku, G.O.; Ma, Z.; Long, W.; et al. Printing Perovskite Solar Cells in Ambient Air: A Review. Adv. Energy Mater. 2024, 14, 2401463.
8. Yu, M.; Qin, T.; Gao, G.; Zu, K.; Zhang, D.; Chen, N.; Wang, D.; Hua, Y.; Zhang, H.; Zhao, Y.B.; et al. Multiple defects renovation and phase reconstruction of reduced-dimensional perovskites via in situ chlorination for efficient deep-blue (454 nm) light-emitting diodes. Light Sci. Appl. 2025, 14, 102.
9. Li, K.; Zhu, Y.; Chang, X.; Zhou, M.; Yu, X.; Zhao, X.; Wang, T.; Cai, Z.; Zhu, X.; Wang, H.; et al. Self-induced bi-interfacial modification via fluoropyridinic acid for high-performance inverted perovskite solar cells. Adv. Energy Mater. 2025, 15, 2404335.
10. Hu, Y.; Zhao, S.; Chen, L. Can Pb-Free Halide Perovskites be Realized by Incorporating the Neutral or Anionic Molecule? Adv. Theory Simul. 2025, 2401546.
11. Ngiam, K.Y.; Khor, W. Big data and machine learning algorithms for health-care delivery. Lancet Oncol. 2019, 20, e262–e273.
12. Carleo, G.; Cirac, I.; Cranmer, K.; Daudet, L.; Schuld, M.; Tishby, N.; Vogt-Maranto, L.; Zdeborová, L. Machine learning and the physical sciences. Rev. Mod. Phys. 2019, 91, 045002.
13. Eesaar, H.; Joe, S.; Rehman, M.U.; Jang, Y.; Chong, K.T. SEiPV-Net: An efficient deep learning framework for autonomous multi-defect segmentation in electroluminescence images of solar photovoltaic modules. Energies 2023, 16, 7726.
14. Cao, B.; Adutwum, L.A.; Oliynyk, A.O.; Luber, E.J.; Olsen, B.C.; Mar, A.; Buriak, J.M. How to optimize materials and devices via design of experiments and machine learning: Demonstration using organic photovoltaics. ACS Nano 2018, 12, 7434–7444.
15. Ahmed, S.; Alshater, M.M.; El Ammari, A.; Hammami, H. Artificial intelligence and machine learning in finance: A bibliometric review. Res. Int. Bus. Financ. 2022, 61, 101646.
16. Bhavsar, P.; Safro, I.; Bouaynaya, N.; Polikar, R.; Dera, D. Machine learning in transportation data analytics. In Data Analytics for Intelligent Transportation Systems; Elsevier: Amsterdam, The Netherlands, 2017; pp. 283–307.
17. Häse, F.; Roch, L.M.; Aspuru-Guzik, A. Next-generation experimentation with self-driving laboratories. Trends Chem. 2019, 1, 282–291.
18. Xia, R.; Brabec, C.J.; Yip, H.L.; Cao, Y. High-throughput optical screening for efficient semitransparent organic solar cells. Joule 2019, 3, 2241–2254.
19. Wilbraham, L.; Sprick, R.S.; Jelfs, K.E.; Zwijnenburg, M.A. Mapping binary copolymer property space with neural networks. Chem. Sci. 2019, 10, 4973–4984.
20. Lampe, C.; Kouroudis, I.; Harth, M.; Martin, S.; Gagliardi, A.; Urban, A.S. Rapid Data-Efficient Optimization of Perovskite Nanocrystal Syntheses through Machine Learning Algorithm Fusion. Adv. Mater. 2023, 35, 2208772.
21. Yuqi, J.; An, A.; Lu, Z.; Ping, H.; Xiaomei, L. Short-term load forecasting based on temporal importance analysis and feature extraction. Electr. Power Syst. Res. 2025, 244, 111551.
22. Fang, C.; Song, K.; Yan, Z.; Liu, G. Monitoring phycocyanin in global inland waters by remote sensing: Progress and future developments. Water Res. 2025, 275, 123176.
23. Zhang, L.; Wang, L.; Zhang, J.; Wu, Q.; Jiang, L.; Shi, Y.; Lyu, L.; Cai, G. Fault diagnosis of energy storage batteries based on dual driving of data and models. J. Energy Storage 2025, 112, 115485.
24. Li, Y.; Zhang, H. EEG Signal Recognition of VR Education Game Players Based on Hybrid Improved Wavelet Threshold and LSTM. Int. Arab J. Inf. Technol. (IAJIT) 2025, 22, 170–181.
25. Wang, T.; Li, J.; Wu, H.N.; Li, C.; Snoussi, H.; Wu, Y. ResLNet: Deep residual LSTM network with longer input for action recognition. Front. Comput. Sci. 2022, 16, 166334.
26. Li, S.; Yang, J.; Bao, H.; Xia, D.; Zhang, Q.; Wang, G. Cost-Sensitive Neighborhood Granularity Selection for Hierarchical Classification. IEEE Trans. Knowl. Data Eng. 2025.
27. Edri, E.; Kirmayer, S.; Mukhopadhyay, S.; Gartsman, K.; Hodes, G.; Cahen, D. Elucidating the charge carrier separation and working mechanism of CH3NH3PbI3−xClx perovskite solar cells. Nat. Commun. 2014, 5, 3461.
28. Stranks, S.D.; Eperon, G.E.; Grancini, G.; Menelaou, C.; Alcocer, M.J.; Leijtens, T.; Herz, L.M.; Petrozza, A.; Snaith, H.J. Electron-hole diffusion lengths exceeding 1 micrometer in an organometal trihalide perovskite absorber. Science 2013, 342, 341–344.
29. Stolterfoht, M.; Wolff, C.M.; Márquez, J.A.; Zhang, S.; Hages, C.J.; Rothhardt, D.; Albrecht, S.; Burn, P.L.; Meredith, P.; Unold, T.; et al. Visualization and suppression of interfacial recombination for high-efficiency large-area pin perovskite solar cells. Nat. Energy 2018, 3, 847–854.
30. Fang, R.; Wu, S.; Chen, W.; Liu, Z.; Zhang, S.; Chen, R.; Yue, Y.; Deng, L.; Cheng, Y.B.; Han, L.; et al. [6,6]-Phenyl-C61-butyric acid methyl ester/cerium oxide bilayer structure as efficient and stable electron transport layer for inverted perovskite solar cells. ACS Nano 2018, 12, 2403–2414.
31. Yang, G.; Chen, C.; Yao, F.; Chen, Z.; Zhang, Q.; Zheng, X.; Ma, J.; Lei, H.; Qin, P.; Xiong, L.; et al. Effective carrier-concentration tuning of SnO2 quantum dot electron-selective layers for high-performance planar perovskite solar cells. Adv. Mater. 2018, 30, 1706023.
32. Sherkar, T.S.; Momblona, C.; Gil-Escrig, L.; Bolink, H.J.; Koster, L.J.A. Improving perovskite solar cells: Insights from a validated device model. Adv. Energy Mater. 2017, 7, 1602432.
33. Momblona, C.; Gil-Escrig, L.; Bandiello, E.; Hutter, E.M.; Sessolo, M.; Lederer, K.; Blochwitz-Nimoth, J.; Bolink, H.J. Efficient vacuum deposited pin and nip perovskite solar cells employing doped charge transport layers. Energy Environ. Sci. 2016, 9, 3456–3463.
34. Leijtens, T.; Eperon, G.E.; Barker, A.J.; Grancini, G.; Zhang, W.; Ball, J.M.; Kandada, A.R.S.; Snaith, H.J.; Petrozza, A. Carrier trapping and recombination: The role of defect physics in enhancing the open circuit voltage of metal halide perovskite solar cells. Energy Environ. Sci. 2016, 9, 3472–3481.
35. Shao, S.; Abdu-Aguye, M.; Qiu, L.; Lai, L.H.; Liu, J.; Adjokatse, S.; Jahani, F.; Kamminga, M.E.; ten Brink, G.H.; Palstra, T.T.; et al. Elimination of the light soaking effect and performance enhancement in perovskite solar cells using a fullerene derivative. Energy Environ. Sci. 2016, 9, 2444–2452.
36. DeQuilettes, D.W.; Zhang, W.; Burlakov, V.M.; Graham, D.J.; Leijtens, T.; Osherov, A.; Bulović, V.; Snaith, H.J.; Ginger, D.S.; Stranks, S.D. Photo-induced halide redistribution in organic–inorganic perovskite films. Nat. Commun. 2016, 7, 11683.
37. Liao, G.; Sun, E.; Kana, E.G.; Huang, H.; Sanusi, I.A.; Qu, P.; Jin, H.; Liu, J.; Shuai, L. Renewable hemicellulose-based materials for value-added applications. Carbohydr. Polym. 2024, 341, 122351.
38. Li, T.; Sun, W.; Qian, D.; Wang, P.; Liu, X.; He, C.; Chang, T.; Liao, G.; Zhang, J. Plant-derived biomass-based hydrogels for biomedical applications. Trends Biotechnol. 2024, 43, 802–811.
39. Liao, G.; Zhang, L.; Li, C.; Liu, S.Y.; Fang, B.; Yang, H. Emerging carbon-supported single-atom catalysts for biomedical applications. Matter 2022, 5, 3341–3374.
40. Le Corre, V.M.; Sherkar, T.S.; Koopmans, M.; Koster, L.J.A. Identification of the dominant recombination process for perovskite solar cells based on machine learning. Cell Rep. Phys. Sci. 2021, 2, 100346.
41. Akbar, B.; Tayara, H.; Chong, K.T. Unveiling dominant recombination loss in perovskite solar cells with a XGBoost-based machine learning approach. iScience 2024, 27, 109200.

Figure 1. Overview of the proposed methodology for classifying dominant recombination losses in perovskite solar cells using LSTM-enhanced MLP architecture. The flow includes data acquisition, feature engineering, model development, training, evaluation, and interpretation.

Figure 2. Boxplots of selected input features across the three recombination classes: band-to-band, GB, and interface. The x-axis represents the recombination type (band-to-band, GB, and interface), and the y-axis indicates the normalized feature values. Each subplot corresponds to a specific feature used in the model.

Figure 3. ROC curves for the classification model under various conditions. This plot demonstrates the performance of the model across different thresholds, illustrating the trade-off between sensitivity and specificity for each class.

Figure 4. Confusion matrix for the classification model, highlighting correct and incorrect classifications. Diagonal values represent correct predictions, while off-diagonal elements indicate class-level misclassifications.

Figure 5. Permutation importance of input features evaluated on the trained LSTM-MLP model. The bars represent the decrease in model performance when each feature is randomly shuffled, indicating its relative importance in predicting recombination losses.

Table 1. Summary of input features used for recombination loss classification.

Feature	Description
`n`	Diode ideality factor (exponential factor in the diode equation)
`doping_left`	Doping concentration in the left region (HTL)
`doping_right`	Doping concentration in the right region (ETL)
`mob_IL`	Charge carrier mobility at the left interface
`mob_IR`	Charge carrier mobility at the right interface
`mun_0`	Initial electron mobility in the bulk
`mup_0`	Initial hole mobility in the bulk
`Voc1.00`, `Jsc1.00`, and `FF1.00`	Standard photovoltaic metrics at full light intensity
`Voc_i`, `Jsc_i`, and `FF_i`	Metrics computed at reduced light intensities $i$, where $i \in \mathrm{Gfrac}$ and $i < 1$

Table 2. Architecture of the Multi-Layer Perceptron (MLP) head.

Layer No.	Description
1	Linear: `Linear(lstm_out_dim, 256)`
2	ReLU Activation
3	Batch Normalization
4	Dropout (rate = 0.1)
5	Linear: `Linear(256, 128)`
6	ReLU Activation
7	Dropout (rate = 0.1)
8	Linear: `Linear(128, 64)`
9	ReLU Activation
10	Output: `Linear(64, num_classes)`
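To give a sense of the size of this head, the sketch below tallies the learnable parameters implied by the layer list in Table 2. Two values are assumptions made for illustration and are not stated in the table: `lstm_out_dim = 128`, and a batch normalization layer with a learnable scale and shift. The value `num_classes = 3` follows from the three recombination classes.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def mlp_head_params(lstm_out_dim, num_classes):
    """Learnable parameters per layer for the head listed in Table 2.

    ReLU and Dropout layers are omitted because they have no parameters.
    """
    return [
        ("Linear(lstm_out_dim, 256)", linear_params(lstm_out_dim, 256)),
        ("BatchNorm(256)", 2 * 256),  # assumed affine: one scale and one shift per unit
        ("Linear(256, 128)", linear_params(256, 128)),
        ("Linear(128, 64)", linear_params(128, 64)),
        ("Linear(64, num_classes)", linear_params(64, num_classes)),
    ]

# lstm_out_dim=128 is a hypothetical value chosen for this example.
layers = mlp_head_params(lstm_out_dim=128, num_classes=3)
for name, count in layers:
    print(f"{name}: {count}")
total = sum(count for _, count in layers)
print("total:", total)  # total: 74883
```

Under these assumptions, most of the parameters sit in the first two linear layers, so the size of the head depends mainly on the LSTM output dimension.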

Table 3. Comparison of classification metrics with and without LSTM.

	Metric	Band-to-Band	GB	Interface
Without LSTM	Precision	0.9640	0.7449	0.8148
	Recall	0.9725	0.8712	0.6697
	F1-Score	0.9683	0.8031	0.7352
	Class Accuracy	0.9788	0.8577	0.8390
	MCC	0.9523	0.6980	0.6272
With LSTM	Precision	0.9566	0.8122	0.8222
	Recall	0.9839	0.8476	0.7629
	F1-Score	0.9700	0.8295	0.7915
	Class Accuracy	0.9797	0.8839	0.8660
	MCC	0.9549	0.7419	0.6940
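As a consistency check on Table 3, the F1-score can be recomputed from the tabulated precision and recall using F1 = 2PR/(P + R). The sketch below does this for every class; the helper name `f1_score` is hypothetical, and differences in the fourth decimal place are expected because the tabulated precision and recall are themselves rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from Table 3.
table3 = {
    ("without LSTM", "band-to-band"): (0.9640, 0.9725, 0.9683),
    ("without LSTM", "GB"): (0.7449, 0.8712, 0.8031),
    ("without LSTM", "interface"): (0.8148, 0.6697, 0.7352),
    ("with LSTM", "band-to-band"): (0.9566, 0.9839, 0.9700),
    ("with LSTM", "GB"): (0.8122, 0.8476, 0.8295),
    ("with LSTM", "interface"): (0.8222, 0.7629, 0.7915),
}

for (variant, cls), (p, r, reported) in table3.items():
    recomputed = f1_score(p, r)
    print(f"{variant:13s} {cls:13s} recomputed={recomputed:.4f} reported={reported:.4f}")
```

The recomputed values agree with the reported F1-scores to within rounding, and the largest gain from adding the LSTM appears in the interface class.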

Table 4. Comparison of the proposed model with previously reported models.

Model	Accuracy	Correct Prediction (Band-to-Band)	Correct Prediction (GB)	MCC
Le Corre et al. (Random Forest) [40]	0.82	0.97	0.74	0.73
Akbar et al. (XGBoost) [41]	0.85	0.97	0.85	0.77
Proposed LSTM-MLP Model	0.8648	0.9839	0.8476	0.7977


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
