Unraveling the Effect of Compositional Ratios on the Kesterite Thin-Film Solar Cells Using Machine Learning Techniques

In the kesterite family, Cu2ZnSn(S,Se)4 (CZTSSe) thin-film solar cells (TFSCs) have demonstrated the highest device efficiency with non-stoichiometric cation composition ratios. These composition ratios have a strong influence on the structural, optical, and electrical properties of the CZTSSe absorber layer. In this work, a machine learning (ML) approach is employed to evaluate the effect of the composition ratios on the device parameters of CZTSSe TFSCs. In particular, the bi-metallic ratios Cu/Sn, Zn/Sn, and Cu/Zn and the overall Cu/(Zn+Sn) cation composition ratio are investigated. To this end, decision tree (DT) and classification and regression tree (CART) algorithms are used. In addition, the output performance parameters of CZTSSe TFSCs are predicted by both continuous and categorical approaches. Artificial neural network (ANN) and XGBoost (XGB) algorithms are employed for the continuous approach, while support vector machine (SVM) and k-nearest neighbors (KNN) algorithms are used for the categorical approach. The analysis shows that the DT and CART algorithms provide a critical composition range well suited for the fabrication of highly efficient CZTSSe TFSCs, while XGB and KNN give the better predictions within the continuous and categorical approaches, respectively. The present work offers valuable guidance towards the integration of the ML approach with experimental studies in the field of TFSCs.


Introduction
The earth-abundant kesterite materials Cu2ZnSn(S,Se)4 (CZTSSe) are gaining attention in chalcogenide-based thin-film solar cells (TFSCs) [1][2][3][4]. Recent developments in the device fabrication process have delivered a record power conversion efficiency (PCE) of 14.9% [5], though this is still lower than that of other chalcogenide-based TFSCs. The larger open-circuit voltage (VOC) loss in kesterite is one of the prime reasons for this gap [6,7]. The high VOC loss originates from potential and bandgap fluctuations, which occur due to the presence of a high density of defects [8][9][10][11]. Most commonly, highly efficient CZTSSe devices exhibit non-stoichiometric Zn-rich and Cu-poor compositions [12]. According to first-principles calculations, it is now well established that the precursor composition plays a vital role in the formation of detrimental antisite defects, their defect clusters, and secondary phases [13][14][15][16]. Therefore, to control the intrinsic defect density in the CZTSSe absorber layer, diverse sets of experiments and in-depth analyses need to be performed. However, this requires extensive human effort and can consume large amounts of resources. Consequently, controlling the intrinsic defect density and the formation of secondary phases through a suitable composition ratio, without external element doping, has remained a challenge.

Machine learning (ML) has found promising applications in photovoltaic (PV) technology, offering opportunities to enhance PCE, reduce costs, and optimize performance [17,18]. ML algorithms can analyze large solar cell datasets, including device parameters, material properties, and diverse device fabrication conditions, and can identify the key factors that affect device performance [17,19]. By uncovering complex patterns and relationships, ML can guide the design and selection of materials for each layer of highly efficient solar cells. Moreover, ML can accelerate the discovery of new materials with the properties required for solar cells [20]. By predicting material properties and performance through ML models, researchers can narrow the search for optimal materials, saving time and resources. Similarly, in kesterite-based TFSCs, a large amount of composition data can be analyzed and a suitable compositional window can be established via ML, saving energy and resources.
In the literature, various ML algorithms have been used for solar cell-based studies [21][22][23]. Among these, decision trees (DTs), random forest (RF), artificial neural networks (ANN), classification and regression trees (CARTs), XGBoost (XGB), k-nearest neighbors (KNN), and support vector machines (SVM) are the most popular. They have been widely studied due to their lower complexity, lower computational cost, and good model accuracy. Kumar et al. [24] predicted the bandgap of TiO2 photoanodes in dye-sensitized solar cells using DT, KNN, and RF techniques. Zhu et al. [25] identified the key governing factors that influence the device performance of CIGS solar cells using RF, GB, and ANN algorithms together with correlation studies. Moreover, the selection of interface passivation materials suitable for perovskites [26], the structure/composition analysis of perovskite thin films that determines their electrical properties [27], and the design of new perovskite materials [28] have also been achieved with these ML techniques.
In this work, different ML algorithms were used to understand the effect of the cation composition ratios of Cu, Zn, and Sn on the device parameters of CZTSSe TFSCs. To predict the device performance, the ANN and XGB algorithms were used for the continuous prediction approach. The KNN and SVM algorithms were used for the categorical prediction approach, with the results evaluated through confusion matrices. For the present study, we examined data points from 1300 devices. The DT and CART approaches provided the most suitable composition windows for the device parameters of highly efficient CZTSSe devices. Moreover, the ANN, XGB, SVM, and KNN algorithms were compared in terms of their accuracy in predicting the CZTSSe device parameters.

Fabrication of CZTSSe TFSCs and Construction of Database
The detailed fabrication workflow of the CZTSSe TFSCs and the precursor fabrication process are described in our previous work [19]. Precursors with diverse composition ratios were prepared by sequentially depositing Zn, Sn, and Cu metal layers via a DC sputtering system on Mo-coated soda lime glass (SLG) substrates. High-purity metal targets produced by Taewon Science Co., Ltd. (i-TASCO), Seoul, South Korea, were used. During sputtering, a power of 30 W was applied to each 3-inch target. Based on the pre-optimized deposition rate, the deposition time was varied from 0.5 h to 1.0 h. For example, to obtain precursors with different Zn composition ratios, the Zn deposition time was varied while the Cu and Sn deposition times were kept constant. In one batch, a maximum of nine samples with a size of 2.5 × 2.5 cm² were coated, with less than 1% composition deviation. During the deposition, the substrates were rotated constantly at 5 rpm, and the base pressure was maintained at around 8 mTorr. The overall thickness of the Cu/Sn/Zn precursor layers on the Mo substrate was maintained at nearly 700 nm. The thin films were then soft-annealed in an inert (Ar) atmosphere for 0.5 h in a tube furnace and sulfo-selenized in a rapid thermal annealing system to obtain the CZTSSe absorber layer. After the fabrication of the absorber layer, an n-type CdS buffer layer was deposited via chemical bath deposition. Subsequently, a window layer (i-ZnO and Al-doped ZnO (AZO)) was deposited through RF sputtering. Finally, an Al top grid was deposited via DC sputtering to complete the device, which had an active area of 0.30 cm². The final device structure was SLG/Mo/CZTSSe/CdS/i-ZnO/AZO/Al, following the device fabrication process established in the laboratory.

Characterizations
The formation of the CZTSSe phase in the fabricated absorber thin films was confirmed through high-resolution X-ray diffraction (XRD, Philips, Amsterdam, The Netherlands). The diffraction measurements were performed over a 2θ range of 5-80° with a step size of 0.5°. The surface morphologies and cross-sectional views of the CZTSSe thin films were analyzed via a field emission scanning electron microscope with a resolution of 0.6 nm (FE-SEM, ZEISS Gemini 500 + EDS (Oxford), Jena, Germany). The J-V curves of the TFSCs were measured with a solar simulator (Wacom, WXS-155S-L2, Yamaguchi, Japan) under air mass 1.5 G conditions. The external quantum efficiency (EQE) spectra of the devices were obtained from 300 to 1300 nm with a step size of 10 nm using a CEP-25BX system (Bunkou Keiki Co., Ltd., Tokyo, Japan). An X-ray fluorescence (XRF) spectrometer (Axios Minerals, Almelo, The Netherlands), operated at 4 kW and calibrated with an internal standard, was used to determine the composition ratios of the Cu/Sn/Zn precursor thin films.

Computational Details
The device parameters, namely short-circuit current density (JSC), open-circuit voltage (VOC), fill factor (FF), PCE, series resistance (RS), and shunt resistance (RSh), were considered as target (output) data, while the cation composition ratios were considered as input data. Before the DT algorithm was applied, the target property data were divided into four classes, namely low, medium, high, and very high (Table S1 in Supplementary Materials). The CART algorithm, on the other hand, can handle continuous data; therefore, no classification of the data was performed for it. The DT and CART algorithms were used to create the best possible decision rules, while the prediction was performed with the ANN, XGB, SVM, and KNN algorithms. RStudio (R version 3.6.2) was used to develop the source code for all ML procedures.
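As a minimal illustration of the input features described above, the four cation composition ratios can be derived from the measured Cu, Zn, and Sn amounts. The sketch below is written in Python purely for illustration (the study itself used R); the function name `cation_ratios` and the example composition are hypothetical and are not taken from the dataset.

```python
def cation_ratios(cu: float, zn: float, sn: float) -> dict:
    """Return the four cation composition ratios from atomic amounts (e.g., at.% from XRF)."""
    if min(cu, zn, sn) <= 0:
        raise ValueError("Cu, Zn, and Sn amounts must be positive")
    return {
        "Cu/(Zn+Sn)": cu / (zn + sn),
        "Cu/Sn": cu / sn,
        "Cu/Zn": cu / zn,
        "Zn/Sn": zn / sn,
    }


# Hypothetical precursor composition in at.% (illustrative values only).
ratios = cation_ratios(cu=38.0, zn=32.0, sn=30.0)
for name, value in ratios.items():
    print(f"{name} = {value:.3f}")
```

Note that only two of the three bi-metallic ratios are independent, since Zn/Sn = (Cu/Sn)/(Cu/Zn); this may be worth keeping in mind when interpreting rules that involve several ratios at once.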

Results and Discussion
The CZTSSe absorber material fabricated in this work is a compound semiconductor, Cu2ZnSn(S,Se)4, consisting of the metallic elements Cu, Zn, and Sn as cations and S and Se as anions. It is well established that the anion ratio (S/(S+Se) or Se/(S+Se)) determines the band gap of the CZTSSe absorber layer, whereas the relative cation composition ratio of Cu, Zn, and Sn determines the material's stoichiometry, phase purity, secondary phase formation, and defect density, and thereby its optical and electronic properties. Here, the S/(S+Se) ratio in the CZTSSe absorber layer was fixed at ~1% throughout the device fabrication process, as it provides high efficiency, so the study targeted the optimization of the metal cation composition ratios and their effect. The ML-guided fabrication strategy for high-performance CZTSSe TFSCs involves identifying potential rules and heuristics based on the compositional ratios Cu/(Zn+Sn), Cu/Sn, Cu/Zn, and Zn/Sn [19]. Thus, the DT and CART algorithms were used to determine the optimal compositional ratios. Since DTs are supervised ML algorithms, they require information about the different classes [29]. Accordingly, four classes were defined for PCE: low (5.01 to 6.53%), medium (6.53 to 7.34%), high (7.34 to 8.16%), and very high (8.16 to 10.24%). The classes for the other device parameters are specified in Table S1.
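The class assignment for PCE can be sketched as a simple binning step. The example below is a Python illustration only (the study used R); the class edges are the PCE ranges quoted above, while the function name `pce_class` and the handling of values that fall exactly on a shared boundary are assumptions, since the text does not specify which class a boundary value belongs to.

```python
from bisect import bisect_right

# Interior class boundaries for PCE (%), taken from the ranges quoted in the text.
PCE_EDGES = [6.53, 7.34, 8.16]
PCE_LABELS = ["low", "medium", "high", "very high"]


def pce_class(pce: float) -> str:
    """Map a PCE value (%) to one of the four classes.

    Assumption: a value exactly on a shared boundary is assigned to the upper class.
    Values outside the reported 5.01-10.24% range fall into the outermost classes.
    """
    return PCE_LABELS[bisect_right(PCE_EDGES, pce)]


for value in (5.5, 6.9, 7.8, 8.61):
    print(f"PCE {value:.2f}% -> {pce_class(value)}")
```

The same pattern could be reused for the other device parameters by swapping in the corresponding class edges from Table S1.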
The sets of rules generated through the DT (Figure 1) can be understood as follows. The DT output obtained for PCE as the target property is divided into two subnodes, which establishes the first decision rule: (i) If the Cu/Sn ratio is ≥1.3, the PCE of the CZTSSe TFSCs tends to be low, while if it is less than 1.3, a very high PCE can be obtained [1,30]. The probabilities of observing low and very high PCE are 0.39 and 0.35, respectively. The subsequent step involves the second rule: (ii) If the Cu/Zn composition ratio is ≥1.4, it corresponds to a low PCE with a probability of 1; otherwise, a medium PCE can be expected with a probability of 0.40. The third rule in the decision-making process also involves the Cu/Zn composition ratio: (iii) If the Cu/Zn ratio is ≥1.2, the DT predicts a high PCE with a probability of 0.35; otherwise, it anticipates a very high PCE with a probability of 0.60. The fourth subrule suggests that (iv) if the Cu/(Zn+Sn) ratio is <0.66, a high PCE can be observed with a probability of 0.36, whereas if it is >0.66, a very high PCE can be obtained with a probability of 0.92. Finally, the fifth subrule suggests that (v) if the Zn/Sn ratio is <1, a high PCE may be achieved with a probability of 0.48; otherwise, a lower PCE can be obtained with a probability of 0.34. The light saffron (low), gray (medium), dark saffron (high), and green (very high) colors show the PCE trend in the DT. Similar rules can be derived using the same methodology for the other DT models (FF, JSC, VOC, RS, and RSh: Figures S1-S5, respectively).
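The probabilities attached to each rule above are the fractions of each class among the devices that reach a given node. A minimal Python sketch of this bookkeeping for a single threshold rule is given below; the toy samples are hypothetical and do not reproduce the probabilities reported for the actual dataset, and the helper names are illustrative only.

```python
from collections import Counter


def class_probabilities(labels):
    """Fraction of each class among the samples that reach a node."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def split_node(samples, feature, threshold):
    """Split samples on the rule `feature >= threshold`; return (yes_branch, no_branch)."""
    yes = [s for s in samples if s[feature] >= threshold]
    no = [s for s in samples if s[feature] < threshold]
    return yes, no


# Toy samples (hypothetical): Cu/Sn ratio and PCE class of eight devices.
samples = [
    {"Cu/Sn": 1.35, "pce_class": "low"},
    {"Cu/Sn": 1.40, "pce_class": "low"},
    {"Cu/Sn": 1.32, "pce_class": "medium"},
    {"Cu/Sn": 1.31, "pce_class": "low"},
    {"Cu/Sn": 1.25, "pce_class": "very high"},
    {"Cu/Sn": 1.20, "pce_class": "high"},
    {"Cu/Sn": 1.28, "pce_class": "very high"},
    {"Cu/Sn": 1.22, "pce_class": "very high"},
]

yes, no = split_node(samples, "Cu/Sn", 1.3)
p_yes = class_probabilities([s["pce_class"] for s in yes])
p_no = class_probabilities([s["pce_class"] for s in no])

# The class reported at a node is the one with the highest probability.
print("Cu/Sn >= 1.3:", max(p_yes, key=p_yes.get), p_yes)
print("Cu/Sn <  1.3:", max(p_no, key=p_no.get), p_no)
```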
In the subsequent phase of our work, a decision model for CZTSSe TFSCs was constructed using the CART algorithm, as shown in Figure 2. CART is a supervised ML algorithm that can be used for both classification and regression tasks. Here, it is used to predict the values of continuous properties based on the input features. At each node, the algorithm selects the best feature and the corresponding threshold to partition the data into subgroups that are homogeneous with respect to the target property. After execution, the CART model divides the root node into two segments. The corresponding CART rules can be interpreted as follows: (i) When the Cu/Sn ratio is ≥1.3 (Yes), an average PCE of 6.7% can be achieved; otherwise, 7.6% can be achieved. (ii) If the Cu/Zn ratio is ≥1.2 (given that Cu/Sn ≥ 1.3), an average PCE of 6.5% may be achieved; otherwise, a relatively higher PCE of 7.0% may be achieved. Beyond the Cu/Sn and Cu/Zn ratios, further critical composition ratios can be determined from the other subrules: (iii) If Cu/Sn is not ≥1.3 (No), then PCEs of 7.4% and 8.2% can be achieved with Cu/Zn ≥ 1.2 and Cu/Zn < 1.2, respectively. (iv) If Zn/Sn is ≥1 and Cu/Zn is <1.2, then a maximum PCE of ~7% can be achieved, while 7.8% can be achieved with Cu/Sn > 1.3. Subrule (iv) has a very low node probability (<10%); thus, this class of nodes can be ignored while making critical decisions. The same criterion can be applied to other nodes. (v) If Zn/Sn < 1, Cu/(Zn+Sn) < 0.66, and Cu/Sn > 1.2, then a PCE of 7.6% can be obtained. (vi) If Cu/Zn < 1.2 and Zn/Sn ≥ 1.1, then a PCE of 7.5% can be achieved; otherwise, 8.8% can be obtained. Briefly, a PCE of more than 8% can be reached in CZTSSe TFSCs with the following composition ratios: Cu/Zn < 1.2 and Zn/Sn ~1.0-1.1. The colors from light to dark blue show the increasing PCE trend in the CART model. Similar procedures can be employed to understand the subsequent rules. The CART models for the other properties (FF, JSC, VOC, RS, and RSh) are shown in the supporting information file (Figures S6-S10 in Supplementary Materials).

Overall, both the DT and CART output models for PCE as the target property suggest that a Cu/Sn ratio higher than 1.3 and a Cu/Zn ratio higher than 1.2 are not helpful in breaking the saturated PCE of CZTSSe TFSCs, whereas slightly lower ratios provide a high PCE. Similar ratios also determine the other device parameters. A Zn/Sn ratio of ~1.0 was optimal in all scenarios with high PCE and RSh. Interestingly, the overall metal composition ratio, Cu/(Zn+Sn), has minimal impact on PCE, while the PCE and other device parameters are more sensitive to the binary intermetallic composition ratios Cu/Zn, Cu/Sn, and Zn/Sn.

In the next stage, the ANN, XGB, KNN, and SVM algorithms were used to predict the device parameters of CZTSSe TFSCs from the compositional ratios. The ANN and XGB algorithms were used to predict the continuous output properties, while the SVM and KNN algorithms were used to predict the categorical output properties. At first, we applied the ANN algorithm to the relevant data and determined the predicted values. The adjusted R² (adj. R²) value for each ANN model was obtained through linear fitting of the predicted against the experimental data. The predicted and experimental values of CZTSSe TFSCs based on the ANN model are shown in Figure 3, and the detailed ANN structures for the various device parameters are shown in Figure S11. The linear fits to the scatter plots exhibited adj. R² values below 0.27, which indicates that the predictive performance of the ANN models was unsatisfactory. This inadequate performance can be ascribed to the extremely heterogeneous dataset. In response, the XGB algorithm was employed to improve the predictions. The prediction results for PCE, FF, JSC, VOC, RS, and RSh obtained using the XGB algorithm are depicted in Figure 4. The adj. R² value of each XGB model was higher than that of the corresponding ANN model, i.e., XGB predicts the device parameters better than the ANN. Nevertheless, the predictions of the output performance parameters with the continuous approach are not satisfactory.

Thus, we grouped the PCE, FF, JSC, VOC, RS, and RSh of the CZTSSe TFSCs into four classes (1 to 4) and attempted to predict these properties using the SVM and KNN algorithms in a categorical approach. SVM and KNN are supervised ML algorithms that are generally used for classification tasks [31]. The compositional ratios were taken as input parameters. Figure 5 shows the SVM-based confusion matrices of the CZTSSe TFSCs. In this case, the accuracies for PCE, FF, JSC, VOC, RS, and RSh were found to be 37.44%, 33.42%, 32.91%, 42.21%, 33.17%, and 21.36%, respectively. Figure 6 shows the KNN-based confusion matrices of the CZTSSe TFSCs. For KNN, the accuracies for PCE, FF, JSC, VOC, RS, and RSh were found to be 46.79%, 45.28%, 47.92%, 54.15%, 43.58%, and 34.90%, respectively. Overall, KNN outperforms SVM in predicting the categorical output properties of CZTSSe TFSCs, just as XGB outperforms the ANN in the continuous approach.

Based on the DT and CART rules, a CZTSSe device was fabricated with an optimal composition of Cu/Zn = 1.15-1.20, Cu/Sn = 1.25-1.30, Zn/Sn = 0.95-1.0, and Cu/(Zn+Sn) = 0.57-0.62. The crystal structure and morphology of the corresponding representative CZTSSe thin films were analyzed via XRD and FE-SEM, respectively (Figure S12 in Supplementary Materials). The CZTSSe thin film was crystalline, with characteristic CZTSSe peaks at 2θ values of around 27.27°, 45.31°, and 53.69° corresponding to the (112), (204), and (312) planes, respectively (Figure S12a). The FE-SEM analysis showed the formation of compact and large grains with an absorber thickness of ~1.65 µm (Figure S12b). Figure 7a shows the current density-voltage (J-V) characteristics measured under standard AM 1.5 G illumination test conditions. The devices fabricated within the ML-optimized composition range exhibited an average PCE of 8.61% with JSC, VOC, and FF of ~33.61 mA/cm², ~479 mV, and ~53.51%, respectively (Table S2), while the best sample exhibited a PCE of 8.89%. The external quantum efficiency measured for the best device exhibited nearly 80% photoresponse in the visible region, as shown in Figure 7b. In addition, the bandgap estimated for a similar device was ~1.09 eV. The present investigation shows that the DT and CART rules determined from ML match well with the experimental results.

Overall, the integration of the ML approach with experimental studies can provide critical composition windows that assist the fabrication of high-efficiency solar cells. It also reveals the composition windows in which the device performance could decrease. Yet, ML algorithms have their strengths and weaknesses: whether the input data are continuous limits the choice of algorithm, the number of observations determines the model reliability, and the prediction accuracy determines how far the output results can be validated. Therefore, it is necessary to carefully analyze the specific data requirements of each algorithm to fully exploit the strength of ML in TFSCs.
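The two figures of merit used above, the adjusted R² of a linear fit between experimental and predicted values and the accuracy of a confusion matrix, can be sketched as follows. This is a Python illustration under stated assumptions (the study used R): a single-predictor least-squares fit is assumed for the adjusted R², accuracy is taken as the fraction of samples on the diagonal of the confusion matrix, and all numbers in the example are toy values rather than results from the paper.

```python
def adjusted_r2(x, y, n_predictors=1):
    """Adjusted R^2 of the least-squares line y = a + b*x.

    adj. R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), with p predictors.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def accuracy(confusion):
    """Fraction of correctly classified samples (diagonal sum over total)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Toy experimental vs. predicted values (hypothetical, arbitrary units).
experimental = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.0, 3.0, 2.0, 5.0, 4.0]
print(f"adj. R^2 = {adjusted_r2(experimental, predicted):.2f}")

# Toy 4x4 confusion matrix (rows: actual class 1-4, columns: predicted class 1-4).
toy_confusion = [
    [5, 2, 1, 0],
    [2, 4, 2, 0],
    [1, 2, 3, 2],
    [0, 1, 2, 5],
]
print(f"accuracy = {accuracy(toy_confusion):.2%}")
```

For a four-class problem with roughly balanced classes, random guessing would give an accuracy near 25%, which is a useful baseline when reading the SVM and KNN accuracies reported above.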

Conclusions
In summary, various ML algorithms were utilized to understand the effect of the compositional ratios (Cu/Zn, Cu/Sn, Cu/(Zn+Sn), and Zn/Sn) on the output device parameters of CZTSSe TFSCs (PCE, FF, JSC, VOC, RS, and RSh). The DT and CART algorithms revealed hidden sets of rules for the fabrication of highly efficient CZTSSe TFSCs. It was observed that the device properties are more sensitive to the bi-metallic composition ratios Cu/Zn, Cu/Sn, and Zn/Sn than to the overall Cu/(Zn+Sn) ratio. Cu/Zn and Cu/Sn ratios > 1.2 (i.e., Cu-rich conditions) were found to be unfavorable for achieving high PCE, whereas a Zn/Sn ratio of ~1 was optimal. Further, the output performance of the CZTSSe TFSCs was predicted using continuous and categorical prediction approaches. In the continuous approach, the XGB algorithm predicts the device parameters better than the ANN algorithm, while in the categorical approach, KNN predicts the output properties better than SVM. Among all tested prediction algorithms, KNN provides the best results. The present work provides guidelines for the possible application of different algorithms in the ML-integrated fabrication of TFSCs and insights to improve the PCE of devices.
Figure 1 shows the DT model obtained after executing the algorithm for the target PCE property of CZTSSe TFSCs. It is important to note that this DT model is specific to PCE; similar DT models for the other properties (JSC, VOC, FF, RS, and RSh) are shown in the supporting information file (Figures S1-S5 in Supplementary Materials).

Figure 1. The DT output model with PCE as the target property for CZTSSe TFSCs. Each node in the DT model contains specific details, such as the decision rule associated with that node, the class of the node, the number of observations used, and the probabilities of the classes. The light saffron, gray, dark saffron, and green colors show low, medium, high, and very high PCE, respectively.

Figure 2. The CART output model with PCE as the target property for CZTSSe TFSCs. Each node in the CART model shows its decision rule, the average target value for that node, and the percentage of the data that follows the rule of that node. The light blue and dark blue colors show lower and higher PCE, respectively.
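The way a regression tree such as the one in Figure 2 chooses a threshold can be sketched for a single feature: candidate thresholds are scanned, and the one that minimizes the summed squared error around the two child-node means is kept. The Python code below is a simplified illustration of this variance-reduction criterion, not the implementation used in the study (which was written in R); the toy Cu/Sn and PCE values are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, mean_below, mean_at_or_above) for the rule x >= threshold
    that minimizes the total squared error of the two child nodes."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2  # midpoint candidate
        below = [yv for xv, yv in pairs if xv < threshold]
        above = [yv for xv, yv in pairs if xv >= threshold]
        cost = sse(below) + sse(above)
        if best is None or cost < best[0]:
            best = (cost, threshold, sum(below) / len(below), sum(above) / len(above))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


# Toy data (hypothetical): Cu/Sn ratio and PCE (%) of six devices.
cu_sn = [1.20, 1.24, 1.28, 1.32, 1.36, 1.40]
pce = [7.8, 7.5, 7.6, 6.8, 6.6, 6.7]

threshold, mean_below, mean_above = best_split(cu_sn, pce)
print(f"rule: Cu/Sn >= {threshold:.2f}")
print(f"  yes -> average PCE {mean_above:.2f}%")
print(f"  no  -> average PCE {mean_below:.2f}%")
```

A full CART model would repeat this search over every feature at every node and stop according to a pruning or minimum-node-size criterion; those details are omitted here.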
Figure S1: DT model for the FF of the CZTSSe TFSCs; Figure S2: DT model for the JSC of the CZTSSe TFSCs; Figure S3: DT model for the VOC of the CZTSSe TFSCs; Figure S4: DT model for the RS of the CZTSSe TFSCs; Figure S5: DT model for the RSh of the CZTSSe TFSCs; Figure S6: CART model for the FF of the CZTSSe TFSCs; Figure S7: CART model for the JSC of the CZTSSe TFSCs; Figure S8: CART model for the VOC of the CZTSSe TFSCs; Figure S9: CART model for the RS of the CZTSSe TFSCs; Figure S10: CART model for the RSh of the CZTSSe TFSCs; Figure S11: ANN structure related to (a) PCE, (b) FF, (c) JSC, (d) VOC, (e) RS, and (f) RSh of the CZTSSe TFSCs; Figure S12: (a) XRD of CZTSSe absorber thin film, (b) FE-SEM surface morphology, inset cross-sectional view; Table S2: Experimental photovoltaic parameters of the CZTSSe TFSCs under standard test conditions.