Investigation of Opto-Electronic Properties and Stability of Mixed-Cation Mixed-Halide Perovskite Materials with Machine-Learning Implementation

The feasibility of mixed-cation mixed-halogen perovskites of formula $\mathrm{A}_x\mathrm{A}'_{1-x}\mathrm{PbX}_y\mathrm{X}'_z\mathrm{X}''_{3-y-z}$ is analyzed from the perspective of structural stability, opto-electronic properties and possible degradation mechanisms. Using density functional theory (DFT) calculations aided by machine-learning (ML) methods, the structurally stable compositions are further evaluated for the highest absorption and optimal stability. Here, we demonstrate the role of halogen mixtures in tuning the contrasting trends of optical absorption and stability. Similarly, binary organic cation mixtures are found to significantly influence the degradation, while they have a smaller, but still visible, effect on the opto-electronic properties. The combined framework of high-throughput calculations and ML techniques such as linear regression, random forests and artificial neural networks provides the necessary grounds for an efficient exploration of multi-dimensional compositional spaces.


Introduction
Halide perovskites are currently regarded as some of the most promising materials for photovoltaic technology, owing to the rapid rise in photoconversion efficiency (PCE), which has reached 25.2%, and to their low fabrication costs; together, these make perovskite solar cells (PSCs) a strong competitor to standard silicon technology [1,2]. Although the high PCEs reflect the excellent opto-electronic properties of halide perovskites, stability issues still hamper commercialization. Therefore, much effort is currently dedicated to mitigating the intrinsic and extrinsic degradation processes without significantly compromising the absorption and photoconversion quality of the optically active materials.
Intrinsic degradation mechanisms concern pristine perovskite materials subjected to temperature [3] and illumination [4], while extrinsic degradation mechanisms are caused by foreign species and molecules such as oxygen [5,6] and water [7]. These are typically addressed by chemical engineering, most notably by using mixtures of cations [8] and halogens [9], doping and passivation [10]. The multitude of new organic cations gives rise to a wide variety of low-dimensional perovskite structures, where ion migration is inhibited by the relatively large cations and their embedded hydrophobic moieties can [...] in Figure 1a. According to their effective radii and the resulting Goldschmidt tolerance factors, these cations can easily fit into a 3D perovskite structure. However, not all of them have so far been shown to adopt a 3D structure, as additional interactions with the inorganic Pb-X cage, particularly hydrogen bonds and van der Waals forces, can destabilize it. Moreover, for cation mixtures it is rather unclear to what extent criteria based on the Goldschmidt tolerance factor and the octahedral factor can give a definite answer regarding the formation of a 3D structure. Larger cations typically give rise to 2D-3D or columnar 1D structures that can be adjusted through the proportions of large and small cations [32]. A more detailed analysis can take into account the structural properties of the cations and alternative descriptors, e.g., the globularity factor [33].

Table 1. A-site cations considered here to form 3D mixed-halide perovskites. The Goldschmidt tolerance factor is calculated for each halogen type. The listed cation effective radii (except FM) were adopted from Kieslich et al. [34]. The ionic radii of the constituent elements, $R^{\mathrm{eff}}_{\mathrm{Pb}} = 119$, $R^{\mathrm{eff}}_{\mathrm{I}} = 220$, $R^{\mathrm{eff}}_{\mathrm{Br}} = 196$, $R^{\mathrm{eff}}_{\mathrm{Cl}} = 188$, $R^{\mathrm{eff}}_{\mathrm{N}} = 146$ and $R^{\mathrm{eff}}_{\mathrm{O}} = 135$, were adopted from Shannon [35]. All ionic radii are given in pm. (1) Used as an additive [36].
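As an illustration of the geometric criteria mentioned above, the Goldschmidt tolerance factor $t = (R_A + R_X)/[\sqrt{2}(R_B + R_X)]$ and the octahedral factor $\mu = R_B/R_X$ can be evaluated directly from the radii listed for Table 1. This is a minimal sketch: the A-site radius of 217 pm used in the example is the value commonly quoted for MA and serves only as an illustration, and the composition-weighted radius for mixed sites is an assumed, commonly used approximation rather than the procedure of this work.

```python
import math

# Ionic radii in pm, as listed in the caption of Table 1.
R_PB = 119.0
R_X = {"I": 220.0, "Br": 196.0, "Cl": 188.0}


def tolerance_factor(r_a: float, r_b: float, r_x: float) -> float:
    """Goldschmidt tolerance factor t = (R_A + R_X) / (sqrt(2) * (R_B + R_X))."""
    return (r_a + r_x) / (math.sqrt(2.0) * (r_b + r_x))


def octahedral_factor(r_b: float, r_x: float) -> float:
    """Octahedral factor mu = R_B / R_X."""
    return r_b / r_x


def mixed_radius(fractions: dict, radii: dict) -> float:
    """Composition-weighted average radius (assumed approximation for mixed sites)."""
    return sum(frac * radii[name] for name, frac in fractions.items())


if __name__ == "__main__":
    r_ma = 217.0  # illustrative effective radius of MA in pm
    for halogen, r_x in R_X.items():
        t = tolerance_factor(r_ma, R_PB, r_x)
        mu = octahedral_factor(R_PB, r_x)
        print(f"MAPb{halogen}3: t = {t:.3f}, mu = {mu:.3f}")
```

Values of t between roughly 0.8 and 1 are usually taken as compatible with a 3D perovskite, which is why the text stresses that this purely geometric test cannot capture hydrogen bonding or van der Waals interactions with the Pb-X cage.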
Besides the MAPbX₃ compounds, the synthesis of single-cation perovskites has been reported for FAPbI₃ [39], EAPbI₃ [40] and GAPbI₃ [41]. In contrast, a recent study indicates that FM was not embedded in the perovskite structure but was rather included as an additive [36], despite its similarity to the FA cation. Using HA cations alone yields structures with face-sharing octahedra that differ from the standard corner-sharing perovskites [37], even though the Goldschmidt tolerance factor indicates that a typical 3D structure is possible; this anomaly was explained by the formation of hydrogen bonds. Additionally, DFT simulations predict that a perovskite structure can still be obtained for HAPbI₃, although with significant distortions of the Pb-I cage [42].
In addition, mixed halides also play an important role with regard to both efficiency and stability [54,55]. Here, we consider all three halogens in different proportions. Introducing lighter halogens (Br, Cl) tunes the bandgap, which is particularly relevant for tandem PSCs. Although the resulting larger bandgaps lower the optical absorption, the stability of bromide-based perovskites is significantly enhanced [56], as confirmed by thermal decomposition studies of MAPbX₃ perovskites [57]. Furthermore, the stability also appears to be linked to the A-site cation mixture, as observed in the case of halide segregation [58]. Using binary mixtures of the aforementioned seven A-site cations and three halogens, we assemble perovskite structures of formula $\mathrm{A}_x\mathrm{A}'_{1-x}\mathrm{PbX}_y\mathrm{X}'_z\mathrm{X}''_{3-y-z}$. We consider five $\mathrm{A}_x$-$\mathrm{A}'_{1-x}$ compositions, corresponding to x = 0, 0.25, 0.5, 0.75, 1, and, for each case, 16 different halogen proportions: 12 of them correspond to binary halogen mixtures of type $\mathrm{X}_y$-$\mathrm{X}'_{1-y}$, with y taking the same values as x (the pure halides included), three correspond to ternary halogen mixtures of type X₀.₂₅-X'₀.₂₅-X''₀.₅₀, and the last one to equal halogen proportions, I₀.₃₃Br₀.₃₃Cl₀.₃₃. This results in a total of 1120 distinct samples to be analyzed. Since the multi-dimensional compositional space makes a concise representation of the data sets difficult, we adopted the scheme depicted in Figure 1b, where each dot represents a given structure and the colors map the values of different quantities derived for each composition (e.g., integrated optical absorption coefficients, formation energies).
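The sample count can be checked with a short enumeration. This is a sketch under our own reading of the description: 7 pure cations plus 21 cation pairs with x = 1/4, 1/2, 3/4 give 70 A-site compositions, and the 16 halogen mixtures are written as X-site fractions (I, Br, Cl); this grouping is chosen because it reproduces the stated total of 70 × 16 = 1120.

```python
from fractions import Fraction
from itertools import combinations

CATIONS = ["MA", "FA", "EA", "GA", "FM", "HA", "HZ"]
HALOGENS = ["I", "Br", "Cl"]
MIXING = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]


def cation_compositions():
    """Pure cations plus binary A_x A'_(1-x) mixtures with x in {1/4, 1/2, 3/4}."""
    comps = {((a, Fraction(1)),) for a in CATIONS}
    for a, b in combinations(CATIONS, 2):
        for x in MIXING:
            comps.add(((a, x), (b, 1 - x)))
    return sorted(comps)


def halogen_compositions():
    """X-site fractions (I, Br, Cl) of the 16 halogen mixtures."""
    comps = set()
    for i in range(3):  # pure halides
        v = [Fraction(0)] * 3
        v[i] = Fraction(1)
        comps.add(tuple(v))
    for i, j in combinations(range(3), 2):  # binary X_y X'_(1-y)
        for y in MIXING:
            v = [Fraction(0)] * 3
            v[i], v[j] = y, 1 - y
            comps.add(tuple(v))
    for k in range(3):  # ternary X_0.25 X'_0.25 X''_0.50
        v = [Fraction(1, 4)] * 3
        v[k] = Fraction(1, 2)
        comps.add(tuple(v))
    comps.add((Fraction(1, 3),) * 3)  # equal proportions
    return sorted(comps)


def all_samples():
    """Every (cation composition, halogen composition) pair."""
    return [(c, h) for c in cation_compositions() for h in halogen_compositions()]


if __name__ == "__main__":
    print(len(cation_compositions()), len(halogen_compositions()), len(all_samples()))
```

Exact fractions avoid duplicate entries that floating-point values such as 0.33 could otherwise create.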

Density Functional Theory Calculations
The perovskite structures are constructed starting from orthorhombic MAPbI₃, which contains 48 atoms in the unit cell, by substituting the A-site cations and halogens on the original MA and I sites, respectively. The initial positions of the organic cations are set to match the center of mass as well as the primary direction of the molecular dipole, set by the C-N bond.
Structural relaxations are performed with DFT as implemented in the SIESTA package [59]. The linear scaling of the computational time with system size, ensured by the strictly localized basis set, is essential for performing the high-throughput DFT calculations efficiently. Norm-conserving Troullier-Martins pseudopotentials are used, and the exchange-correlation potential corresponds to the Ceperley-Alder parametrization of the local density approximation (LDA) [60]. We employ a double-ζ polarized basis set, and the real-space grid is set by a mesh cutoff of 150 Ry. Reproducing the correct bandgap of halide perovskites may in general require hybrid functionals and spin-orbit coupling, which are considerably more computationally demanding; for lead-based perovskites, however, a fortuitous cancellation of errors yields rather accurate band structures. Following the relaxations, optical calculations are performed, which yield the complex dielectric function, and the energy-dependent absorption coefficient is determined in the polycrystal approximation, i.e., averaged over random orientations of the optical (polarization) vector.
These settings offer a good compromise between accuracy and computational efficiency for this specific set of systems, making a rather extensive study feasible and offering a broad perspective on the quantities of interest.
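The step from the complex dielectric function ε = ε₁ + iε₂ to the absorption coefficient follows standard optics: the extinction coefficient is κ = √[(|ε| − ε₁)/2] and α(E) = 2κE/(ħc). The sketch below implements these relations; representing the polycrystal average as one third of the trace of a diagonal dielectric tensor is our simplifying assumption, not a description of the SIESTA internals.

```python
import numpy as np

HBAR_C_EV_CM = 1.973269804e-5  # hbar * c in eV*cm


def absorption_coefficient(energy_ev, eps):
    """Absorption coefficient alpha(E) in cm^-1 from the complex dielectric function.

    kappa = sqrt((|eps| - eps1) / 2) and alpha = 2 * kappa * E / (hbar * c).
    """
    energy_ev = np.asarray(energy_ev, dtype=float)
    eps = np.asarray(eps, dtype=complex)
    # Clip tiny negative values caused by rounding when eps2 is ~0.
    kappa = np.sqrt(np.clip((np.abs(eps) - eps.real) / 2.0, 0.0, None))
    return 2.0 * kappa * energy_ev / HBAR_C_EV_CM


def orientational_average(eps_xx, eps_yy, eps_zz):
    """Isotropic average of the diagonal dielectric tensor components (trace / 3)."""
    return (np.asarray(eps_xx) + np.asarray(eps_yy) + np.asarray(eps_zz)) / 3.0
```

For a transparent region (ε₂ = 0, ε₁ > 0) the function returns α = 0, as expected.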

Assessment of 3D Perovskite Structures, Opto-Electronic Properties and Stability
Optimizing PSCs in terms of efficiency and stability requires enhancing the optical absorption under standard AM 1.5G irradiance while effectively mitigating potential degradation effects.
As already noted, previous experimental evidence shows that not all seven compounds based on the cations considered here (see Figure 1a) form a standard 3D perovskite configuration. Therefore, to identify possible lattice distortions in the relaxed structures, we first perform a Pb-X bond length analysis. We collect all 24 Pb-X bond lengths, corresponding to the four octahedra of the unit cell, and monitor their deviations from the reference structures MAPbX₃ (X = I, Br, Cl), which have a relatively narrow distribution. Typically, the primary effect of a destabilized inorganic cage is a wider spread of bond length values: as one halogen atom tends to localize closer to one lead atom, which may indicate a tendency toward bond breaking, a corresponding increase of the distance to the other nearest-neighbor lead atom is observed. We characterize these deviations by

$$\Delta l_{\min} = l_{\min} - l^{\mathrm{ref}}_{\min}, \qquad \Delta l_{\max} = l_{\max} - l^{\mathrm{ref}}_{\max},$$

where $l_{\min}$ and $l_{\max}$ are the minimum and maximum Pb-X bond lengths of a given structure and $l^{\mathrm{ref}}_{\min}$, $l^{\mathrm{ref}}_{\max}$ are the minimum and maximum Pb-X bond lengths in the MAPbX₃ reference perovskites, respectively. Since the stoichiometry is fixed within a given set of DFT calculations, a transition to other structural configurations that require different Pb/X proportions cannot be observed here. However, our aim is to detect possible issues related to the formation of standard 3D perovskite structures, which are visible in the Pb-X bond length statistics, in particular in the deviations of the minimum and maximum bond lengths. Different quantitative criteria may be imposed for selecting feasible structures. Naturally, in the case of mixed compounds, the proportions of cations of various sizes and chemical behaviors, together with the halogen distribution, determine the likelihood that a perovskite forms a three-dimensional structure.
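A minimal sketch of the bond length statistics, assuming an orthorhombic cell given by its edge lengths and fractional coordinates for the Pb and halogen atoms (a hypothetical input format, not the SIESTA output format). Each Pb atom is assigned its six nearest halogen atoms, searching the 27 neighboring periodic images, so that four octahedra yield the 24 Pb-X bonds mentioned above; the reference values passed to `bond_deviations` are placeholders for the MAPbX₃ data.

```python
import numpy as np


def pb_x_bond_lengths(pb_frac, x_frac, cell_lengths, n_neighbors=6):
    """Pb-X bond lengths in an orthorhombic cell (assumed input format).

    pb_frac, x_frac: fractional coordinates, shapes (n_pb, 3) and (n_x, 3)
    cell_lengths: (a, b, c) in Angstrom
    Returns the n_neighbors shortest Pb-X distances of every Pb atom, concatenated.
    """
    cell = np.asarray(cell_lengths, dtype=float)
    pb = np.asarray(pb_frac, dtype=float) * cell
    x = np.asarray(x_frac, dtype=float) * cell
    # Translation vectors to the 27 neighboring cells (home cell included).
    shifts = np.array(
        [[i, j, k] for i in (-1, 0, 1) for j in (-1, 0, 1) for k in (-1, 0, 1)],
        dtype=float,
    ) * cell
    # All periodic images of the halogen atoms, shape (27 * n_x, 3).
    x_images = (x[None, :, :] + shifts[:, None, :]).reshape(-1, 3)
    lengths = []
    for p in pb:
        d = np.linalg.norm(x_images - p, axis=1)
        lengths.extend(np.sort(d)[:n_neighbors])  # octahedral coordination
    return np.array(lengths)


def bond_deviations(lengths, ref_min, ref_max):
    """Return (dl_min, dl_max) relative to the reference bond lengths."""
    lengths = np.asarray(lengths, dtype=float)
    return lengths.min() - ref_min, lengths.max() - ref_max
```

The search over one shell of images is sufficient as long as every cell edge exceeds about twice the Pb-X bond length, which holds for the 48-atom orthorhombic cell.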
To rank the perovskite structures according to their optical efficiency, we determine the photon flux Φ at a given depth d, using the spectral (solar) irradiance SI(λ) and the calculated wavelength-dependent absorption coefficient α(λ). Applying the Beer-Lambert law, we find

$$\Phi(d) = \int \Phi_0(\lambda)\, e^{-\alpha(\lambda) d}\, \mathrm{d}\lambda, \qquad (1)$$

where $\Phi_0(\lambda) = \mathrm{SI}(\lambda)/E_\lambda$ is the incident photon flux distribution over the relevant wavelength range and $E_\lambda$ is the photon energy. Using Equation (1), the fraction of the total incident light absorbed within depth d is

$$f_a(d) = 1 - \frac{\Phi(d)}{\Phi(0)} = \frac{\int \Phi_0(\lambda)\left[1 - e^{-\alpha(\lambda) d}\right] \mathrm{d}\lambda}{\int \Phi_0(\lambda)\, \mathrm{d}\lambda}.$$

The index $f_a$ is used below to evaluate the absorption of perovskite layers, as it is connected to the maximum number of carriers that could ideally be generated under the solar spectrum.
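The absorption index can be evaluated numerically on a common wavelength grid. A minimal sketch, assuming the irradiance and α(λ) have already been interpolated onto that grid (for example, from the ASTM G173 AM1.5G table, which is not included here); the trapezoidal rule is written out to avoid NumPy version differences, and all unit constants cancel in the ratio.

```python
import numpy as np

HC_EV_NM = 1239.84198  # h * c in eV*nm


def _trapezoid(y, x):
    """Trapezoidal integral of y(x)."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def absorbed_fraction(wavelength_nm, irradiance, alpha_per_cm, depth_cm):
    """Fraction f_a of incident photons absorbed within depth d (Beer-Lambert)."""
    wl = np.asarray(wavelength_nm, dtype=float)
    photon_energy = HC_EV_NM / wl  # E_lambda in eV
    phi0 = np.asarray(irradiance, dtype=float) / photon_energy  # Phi_0 up to a constant
    phi_d = phi0 * np.exp(-np.asarray(alpha_per_cm, dtype=float) * depth_cm)
    return 1.0 - _trapezoid(phi_d, wl) / _trapezoid(phi0, wl)
```

For a wavelength-independent α the result reduces to 1 − exp(−αd), independent of the spectrum, which is a convenient sanity check.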
To compare the stabilities of the perovskite compounds, we consider one intrinsic mechanism, which can be triggered and influenced by factors such as temperature, illumination and bias conditions typically present during PSC operation, and one extrinsic mechanism, which assumes the presence of a foreign species, namely oxygen.
One possible intrinsic degradation mechanism is the decomposition of the perovskite into gaseous phases, composed of organic molecules and HX acids, and a solid precipitate, PbX₂, according to the reaction

$$\mathrm{A}_x\mathrm{A}'_{1-x}\mathrm{PbX}_y\mathrm{X}'_z\mathrm{X}''_{3-y-z} \rightarrow x\,\tilde{\mathrm{A}} + (1-x)\,\tilde{\mathrm{A}}' + \sum_{\mathrm{Y}=\mathrm{X},\mathrm{X}',\mathrm{X}''} \left( x_{\mathrm{HY}}\,\mathrm{HY} + x_{\mathrm{PbY}_2}\,\mathrm{PbY}_2 \right), \qquad (2)$$

where the molar fractions are such that one HY and one PbY₂ are formed per formula unit in total and each halogen is conserved, and Ã, Ã' are the neutral molecules resulting from deprotonation of the respective cations. Equation (2) generalizes one of the degradation mechanisms proposed in Refs. [23,61] for MAPbX₃. Other degradation mechanisms are possible, e.g., the formation of the MAX precursor or the splitting of the organic cation into ammonia in the case of MAPbX₃ perovskites, typically in the presence of moisture. However, the reaction pathway considered here has the advantage of treating all mixed perovskite compounds uniformly, irrespective of the nature of the organic molecule, which allows a direct comparison between the different compounds.
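The stoichiometry of Equation (2) can be made explicit for a given composition. This is a sketch under a hypothetical assumption not stated in the text: each halogen is shared between HY and PbY₂ in proportion to its X-site fraction. The function names and the `"MA~"` labels for the deprotonated molecules are illustrative only; `halogen_balance` verifies that no halogen is created or lost.

```python
from fractions import Fraction


def intrinsic_products(cations, halogen_sites):
    """Molar coefficients of the products of Eq. (2) per formula unit (hypothetical partition).

    cations: A-site fractions, e.g. {"MA": Fraction(1, 2), "FA": Fraction(1, 2)}
    halogen_sites: X-site fractions, e.g. {"I": Fraction(1, 2), "Br": Fraction(1, 2)}
    Assumption: each halogen enters HY and PbY2 in proportion to its site fraction.
    """
    products = {}
    for cation, frac in cations.items():
        products[f"{cation}~"] = frac  # neutral, deprotonated molecule
    for halogen, frac in halogen_sites.items():
        products[f"H{halogen}"] = frac  # one HY in total per formula unit
        products[f"Pb{halogen}2"] = frac  # one PbY2 in total per formula unit
    return products


def halogen_balance(products, halogen_sites):
    """Halogen atoms in the products minus those in the perovskite (should be zero)."""
    return {
        h: products[f"H{h}"] + 2 * products[f"Pb{h}2"] - 3 * frac
        for h, frac in halogen_sites.items()
    }
```

For MAPbI₃ this recovers the familiar CH₃NH₂ + HI + PbI₂ decomposition.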
In addition to the intrinsic mechanism, we consider the extrinsic influence of molecular oxygen, which can be described by the reaction pathway of Ref. [23] (Equation (3)). Although extrinsic agents such as O₂ and moisture can be excluded by proper encapsulation, assessing such degradation mechanisms is important for the fabrication procedure as well as for the long-term reliability of PSC modules.
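The formation energies $E_f$ associated with the reaction pathways are defined in compact form, for the intrinsic and extrinsic degradation mechanisms, respectively, as

$$E^{(i)}_f = \sum_{p_i} x^{(i)}_{p_i} E_{p_i} - E_{\mathrm{prv}}, \qquad E^{(e)}_f = \sum_{p_i} x^{(e)}_{p_i} E_{p_i} - E_{\mathrm{prv}} - x_{\mathrm{O}_2} E_{\mathrm{O}_2},$$

where $x_{\mathrm{O}_2}$ is the molar fraction of O₂ in Equation (3), and the reaction products for the intrinsic and extrinsic mechanisms are, respectively, $p^{(i)}_i$ = Ã, Ã', HX, HX', HX'', PbX₂, PbX'₂, PbX''₂ and $p^{(e)}_i$ = Ã, Ã', X₂, X'₂, X''₂, PbX₂, PbX'₂, PbX''₂.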
The coefficients $x^{(i/e)}_{p_i}$ are the molar fractions of the reaction products in Equations (2) and (3), respectively. The total energies of the perovskite compound and of molecular oxygen are denoted by $E_{\mathrm{prv}}$ and $E_{\mathrm{O}_2}$, respectively. Based on the formation energies, we define scaled stability indices

$$f^{(i/e)}_s = \frac{E^{(i/e)}_f - E^{\min}_f}{E^{\max}_f - E^{\min}_f},$$

which indicate the relative stability of the perovskite compounds, where $E^{\min}_f$ and $E^{\max}_f$ are the minimum and maximum values in the set, respectively.
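Given DFT total energies, the formation energy and the scaled stability index reduce to a few lines. A minimal sketch, assuming product coefficients and energies are supplied as dictionaries keyed by product name (a hypothetical input format); positive formation energies correspond to decomposition costing energy, so larger values mean more stable compounds.

```python
import numpy as np


def formation_energy(coeffs, energies, e_perovskite, x_o2=0.0, e_o2=0.0):
    """E_f = sum_p x_p E_p - E_prv - x_O2 E_O2 (intrinsic case: x_o2 = 0).

    coeffs, energies: dicts keyed by product name (molar fraction, total energy in eV)
    """
    products = sum(coeffs[p] * energies[p] for p in coeffs)
    return products - e_perovskite - x_o2 * e_o2


def stability_index(e_f):
    """Min-max scaled stability indices f_s in [0, 1] over a set of compounds."""
    e_f = np.asarray(e_f, dtype=float)
    span = e_f.max() - e_f.min()
    if span == 0.0:
        raise ValueError("all formation energies are equal; scaling is undefined")
    return (e_f - e_f.min()) / span
```

Because the index is scaled within the set, it ranks compounds relative to each other; the absolute interval (e.g., [2.9, 5.9] eV for the intrinsic case) must be reported alongside it.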

Machine-Learning Models
To select the perovskite structures of interest, a rather extensive exploration of the compositional space is typically required. High-throughput structural relaxations can meet this goal but require significant computing resources. Tuning the cation and/or halogen proportions yields a large degree of similarity, with overlapping properties and tendencies, so that the entire collection of systems can be investigated more efficiently using suitable ML techniques.
The input of an ML approach is typically based on readily accessible information that is correlated with the targeted output. Here, we introduce a flexible feature vector that contains the proportions of all entities defining the compositional space under investigation. By entity we refer either to groups of atoms, as in the case of organic cations (MA, FA, etc.), or to single atoms (e.g., Pb, X). Note that this entity-based approach can be extended to more complex perovskite structures with an arbitrary number of cations, different dimensionalities and other functional groups as halogen replacements. Instead of using species proportions as input vectors (i.e., single elements indexed by their atomic numbers), which would seem more general, this approach takes advantage of already identified functional groups of atoms (i.e., entities) that are directly correlated with the targeted output (e.g., band gap and optical absorption, formation energy). The predicted quantities are the absorption and stability indices, $f_a$ and $f^{(i/e)}_s$, respectively, and the deviations of the minimum and maximum bond lengths, $\Delta l_{\min}$ and $\Delta l_{\max}$. We first check the feasibility of linear regression models, employing multivariate least squares (MLS) fitting. However, owing to their simplicity, linear regression models have limited applicability, and more complex behavior can be captured more accurately by non-linear models, such as random forests (RF) and artificial neural networks (ANNs). Therefore, we compare and assess the suitability of the different ML methods (MLS, RF, ANN) for predicting the targeted quantities.
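The entity-based feature vector can be sketched as a fixed ordering of entities, with zeros for entities absent from a composition. The entity list and the use of per-formula-unit amounts (e.g., I = 2, Br = 1 for PbI₂Br) are our assumptions for illustration; other normalizations, such as site fractions, would work equally well as long as they are applied consistently.

```python
import numpy as np

# Assumed fixed ordering of entities: seven organic cations, Pb and three halogens.
ENTITIES = ["MA", "FA", "EA", "GA", "FM", "HA", "HZ", "Pb", "I", "Br", "Cl"]


def entity_features(composition, entities=ENTITIES):
    """Map {entity: amount per formula unit} onto a fixed-order feature vector."""
    unknown = set(composition) - set(entities)
    if unknown:
        raise ValueError(f"unknown entities: {sorted(unknown)}")
    return np.array([float(composition.get(e, 0.0)) for e in entities])


if __name__ == "__main__":
    # MA0.25FA0.75PbI2Br
    print(entity_features({"MA": 0.25, "FA": 0.75, "Pb": 1.0, "I": 2.0, "Br": 1.0}))
```

Extending the compositional space (e.g., adding Cs for triple-cation systems) only requires appending an entity to the list.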
For all three methods, N_train randomly chosen structures form the training subset, and the same entity-proportion feature vectors are used. The remaining N_test samples serve as the test set for the MLS and RF methods, whereas for the ANN model they are further split into validation and test subsets. The MLS and RF methods are implemented with Scikit-Learn [62]. The RF method uses a random forest regressor, which averages the predictions of an ensemble of regression trees; here, the entire training subset is used to build the trees. The ANN is employed as an alternative method to compare the performance of the different approaches. Its architecture is a fully connected network with three hidden layers of 100, 50 and 25 neurons. We use the ReLU activation function, and the weights and biases are initialized from a random uniform distribution. A typical parameter configuration (learning rate of 10⁻⁵, β₁ = 0.9, β₂ = 0.999) was chosen for the Adam optimizer. The output layer contains a single neuron without an activation function, which provides the targeted quantity. Our ANN implementation uses TensorFlow with the Keras front end [63,64].
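The training and evaluation protocol can be sketched with scikit-learn. This is not the authors' code: the number of trees, `max_iter` and the random seeds are our assumptions, and scikit-learn's `MLPRegressor` stands in for the TensorFlow/Keras network (same layer sizes, ReLU, Adam settings and linear output, but scikit-learn uses its own Glorot-style uniform initialization). The validation split is returned but not used for early stopping here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.neural_network import MLPRegressor


def split(n_samples, n_train, seed=0):
    """Random train / validation / test indices; the non-training remainder is halved."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    rest = idx[n_train:]
    return idx[:n_train], rest[: len(rest) // 2], rest[len(rest) // 2 :]


def build_models(seed=0):
    """MLS, RF and an MLP with the layer sizes and Adam settings quoted in the text."""
    return {
        "MLS": LinearRegression(),
        "RF": RandomForestRegressor(n_estimators=200, random_state=seed),
        "ANN": MLPRegressor(
            hidden_layer_sizes=(100, 50, 25),
            activation="relu",
            solver="adam",
            learning_rate_init=1e-5,
            beta_1=0.9,
            beta_2=0.999,
            max_iter=5000,  # assumption: a small learning rate needs many epochs
            random_state=seed,
        ),
    }


def evaluate(models, X, y, n_train, seed=0):
    """R^2 on held-out data: MLS/RF use the whole remainder, the ANN only the test half."""
    train, val, test = split(len(y), n_train, seed)
    scores = {}
    for name, model in models.items():
        held_out = test if name == "ANN" else np.concatenate([val, test])
        model.fit(X[train], y[train])
        scores[name] = r2_score(y[held_out], model.predict(X[held_out]))
    return scores
```

Here `X` would hold one entity feature vector per composition and `y` one target quantity (f_a, f_s, Δl_min or Δl_max); a separate model is fitted per target.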

Results
The 1120 structures are evaluated from the point of view of structural stability, optical absorption and chemical stability with respect to the two degradation mechanisms.
The distributions of the minimum and maximum Pb-X bond lengths are depicted in Figure 2a,b, respectively, using a red-green-blue encoding to represent different halogen proportions. For $l_{\min}$, one can clearly see three groups, one for each halogen, with the mixtures lying in between. By contrast, the distribution of $l_{\max}$ is broader, with some very large Pb-X bond lengths indicating a significant distortion of the inorganic cage. Based on the bond length statistics, a binary classification of the perovskite structures is performed, which is shown in Figure 3 to depend on the nature of the cations and halogens and on their proportions. If, after structural relaxation, the minimum or maximum Pb-X bond length deviates by more than a fraction $f_{bl}$ from the corresponding bond length in the MAPbX₃ reference systems, i.e., $|\Delta l_{\min/\max}| > f_{bl}\, l^{\mathrm{ref}}_{\min/\max}$, the system is considered unlikely to form the 3D perovskite conformation. Figure 3 shows a matrix-type representation of the 3D classification, where each element follows the layout described in Figure 1b. The rows correspond to the A cations, while the columns are indexed by A'. This convention is kept in the subsequent analyses (e.g., absorption and stability indices). Two criteria are used, set by fractions $f_{bl}$ of 10% and 5% (the stricter criterion), and both indicate the same trend: the MA, FA and EA cations do not induce substantial deformations of the Pb-X cage, as opposed to FM, HA, HZ and, in part, GA, which is the largest cation. This is somewhat contrary to what one would expect based on the ionic radii alone: there are significant size differences, $R^{\mathrm{eff}}_{\mathrm{MA}} < R^{\mathrm{eff}}_{\mathrm{FA}} < R^{\mathrm{eff}}_{\mathrm{EA}} < R^{\mathrm{eff}}_{\mathrm{GA}}$, whereas FM, HA and HZ are all comparable in size to MA and FA.
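The bond-length criterion translates directly into a vectorized filter. A minimal sketch: the arrays of deviations and reference lengths are assumed inputs, and how the reference is chosen for mixed-halide compositions is not specified here, so it is left to the caller.

```python
import numpy as np


def likely_3d(dl_min, dl_max, ref_min, ref_max, f_bl=0.10):
    """True where |dl_min| <= f_bl * ref_min and |dl_max| <= f_bl * ref_max."""
    dl_min = np.asarray(dl_min, dtype=float)
    dl_max = np.asarray(dl_max, dtype=float)
    ok_min = np.abs(dl_min) <= f_bl * np.asarray(ref_min, dtype=float)
    ok_max = np.abs(dl_max) <= f_bl * np.asarray(ref_max, dtype=float)
    return ok_min & ok_max
```

Calling it with `f_bl=0.10` and `f_bl=0.05` reproduces the two classification levels discussed above; the stricter value can only remove structures from the 3D set, never add them.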
However, the interactions with the Pb-X cage seem to have a significant destabilizing effect, which is also reflected in the lack of reports on the synthesis of these particular perovskites. Within the group of FM, HA and HZ, our calculations indicate that FM mixtures are statistically the worst candidates, as already signaled by the absence of the 3D phase in [36], followed by HA and HZ. Surprisingly, however, HZ mixtures with large proportions of MA or EA yield rather undistorted structures. A similar situation was observed experimentally in Ref. [38], where MA₁₋ₓHZₓ mixtures were obtained for x < 0.3. We note that, although these results are statistical in nature, the large number of structures provides a consistent overview of the relatively broad class of mixed-cation mixed-halide perovskites.
Further on, optimizing the absorption spectra is essential for highly efficient PSCs. Although relatively well positioned for the solar spectrum, the bandgap of MAPI (∼1.6 eV) is significantly larger than that of silicon, so a decrease of the bandgap would be desirable. However, only a few options are available to achieve smaller gaps, e.g., changing the species or functional groups on the three possible sites, A, B and X. In fact, when lighter halogens are included, we observe that the gap increases, in correspondence with shorter Pb-X bond lengths. The band structures of the reference MAPbX₃ structures are shown in Figure 4a-c. The absorption spectra in Figure 4d overlap differently with the spectral solar irradiance SI, depending on the halogen type. Substituting Pb by other group-14 elements such as Sn and Ge has reportedly not resulted in similar efficiencies. On the other hand, the influence of the organic cations on the bandgap is typically rather small, but still noticeable in some cases.
For example, slightly larger cations such as FA and EA increase the unit-cell volume from 819.78 Å³ (MAPI) to 823.47 Å³ (FAPI) and 884.45 Å³ (EAPI), which can be correlated with the bandgap decrease. GAPI is a somewhat special case, showing a larger gap and an intermediate volume of 843.81 Å³. However, its absorption coefficient is distributed differently over wavelengths, with larger values at higher photon energies (4-5 eV), which results in an overall optical performance comparable to that of FA- and EA-based perovskites. The absorption spectra of these single-cation perovskites are compared in Figure 4d.
The picture for binary-cation and triple-cation mixtures is more complex, but it follows the same general trends, as depicted in Figure 5. As expected, the halogen proportions have a strong impact on the absorption index $f_a$, the fraction of the total light absorbed within depth d of a perovskite layer. The bandgap sequence corresponding to the different I, Br and Cl proportions is clearly seen in all binary-cation mixtures, as it directly affects the absorption spectra. In particular, Br-Cl mixtures show the lowest efficiency as absorber materials. However, lighter halogens play an active role in enhancing stability, so their potentially detrimental effect on the opto-electronic properties should be viewed in a broader context. A close inspection of the data reveals that there are indeed variations due to the cation mixtures as well, albeit on a smaller scale, and the most promising mixtures rely on the EA, GA and FA cations, which prove to be better than the standard MAPbX₃ compounds: higher x in EAₓMA₁₋ₓ, GAₓMA₁₋ₓ and FAₓMA₁₋ₓ mixtures shows a rather steady trend of improvement.
The primary goal of enhancing the opto-electronic properties must nowadays increasingly be balanced against the mitigation of possible degradation mechanisms, which have a greater impact on the overall performance. Here, we consider the intrinsic mechanism defined by Equation (2) and an extrinsic mechanism involving the interaction of the perovskite with molecular oxygen, described by Equation (3). The stability indices $f^{(i/e)}_s$ are shown in Figure 6, scaled within the reference intervals [2.9, 5.9] eV and [11.0, 14.8] eV for the intrinsic and extrinsic degradation mechanisms, respectively, providing a compact overview of the relative stability of the perovskite mixtures.
Concerning the intrinsic degradation mechanism, the stability sequence Br > I > Cl was found to be a quite general trend. This is consistent with the experimental study of Pistor et al., who recorded in situ XRD patterns during heating and decomposition of MAPbX₃ perovskites [57] and found the highest decomposition temperature for MAPbBr₃ and the lowest for MAPbCl₃. In addition, Brunetti et al. describe MAPbCl₃ as the most unstable compound among the same three perovskites [65], while MAPbI₃ and MAPbBr₃ are reported to be more stable. Ciccioli et al. also find the highest stability for bromide-based compounds with respect to the volatilization reaction [24], and McGovern et al. further attribute the enhanced stability of MAPbBr₃ over MAPbI₃ to a reduced mobility of the MA cations due to shorter Pb-Br bond lengths [56]. With respect to the organic cation mixtures, the perovskites incorporating GA, FA and EA show the highest stability. This is in line with the reported improvements in the overall stability of PSCs for GA-MA mixtures [48] and for EA-MA mixtures, as revealed by first-principles calculations [46], while FA-MA mixtures with additional Cs also show enhanced stability [66]. Triple-cation systems have also been evaluated, starting from the base composition MA₀.₅FA₀.₂₅Cs₀.₂₅PbI₃ and varying which cation has the largest proportion, together with the halogen composition. For these systems, the intrinsic mechanism described by Equation (2) is slightly modified by the appearance of CsX salts as degradation products. With respect to the reference triple-cation system, MA₀.₂₅FA₀.₅Cs₀.₂₅PbI₃ (i.e., with a larger proportion of FA) has a formation energy larger by about 4%, followed by a significant decrease of 37% for MA₀.₂₅ [...]. Our investigation shows that GA-FA and GA-EA are the best candidates.
This is despite the fact that these combinations involve, on average, larger cations than MA and also than FM, HA and HZ, which are shown to yield smaller formation energies for the decomposition into reaction products. Again, this can be correlated with the absence of reported 3D perovskites for these cations, even though such compounds can, at least in theory, be produced. Our calculations also suggest that the chemical stability of FM, HA and HZ can be enhanced by mixing them with GA and FA, as long as the structural analysis indicates a 3D structure. In addition, a degradation mechanism resulting in the formation of precursors has also been suggested [23], although experimental evidence for the cation-halide precursor (A-X) is lacking [24], in contrast to PbI₂, and the prevalence of gaseous phases (Ã, HX) has been pointed out instead. Nevertheless, we are able to confirm the intrinsic instability of MAPbI₃, with a formation energy of −50 meV, similar to Ref. [22]. The extrinsic degradation mechanism conveys a similar picture, except that chlorine-based perovskites are the most stable with respect to oxygen. The same cations, GA and FA, provide stability improvements, while the FM, HA and HZ cations show the lowest stability.
These results can be partly explained by simple processes. Both the intrinsic and extrinsic stability trends suggest that the resistance of the cation to deprotonation is an important factor. To check this, we considered a simplified deprotonation reaction of type AI → Ã + HI, where A is the cation and Ã the deprotonated molecule, and found that the reaction energies are in line with the formation energies of the degradation mechanisms (MA [...] 19 eV). This shows that the GA and FA cations have the largest deprotonation energies, with EA rather close, which translates into higher stability. FM is at the other end, while HA and HZ have intermediate values. Our estimate for MA gives a small value, but still larger than that of FM. On the other hand, since the intrinsic mechanism produces HX, whereas the extrinsic mechanism produces X₂, bromide-based perovskites are more stable for the intrinsic mechanism (following the trend Br > I > Cl), while chloride compounds are more stable for the extrinsic one (Cl > Br > I). This is consistent with HCl being the most stable, followed by HBr and HI, with respect to the decomposition reaction HX → ½H₂ + ½X₂, with reaction energies of 0.94 eV (HCl), 0.57 eV (HBr) and 0.06 eV (HI). Although this relatively simple picture generally holds, the formation energies for the two degradation mechanisms also depend on the total energies of the respective perovskite mixtures.
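Written out as reaction energies (products minus reactants, so that a positive value means the left-hand side is stable), the two simple checks above read as follows. Here E(·) denotes DFT total energies, and the labels ΔE_dp and ΔE_HX are our notation for the deprotonation and HX decomposition energies; the numerical values are those quoted in the text.

```latex
% Deprotonation energy of the cation A (reaction AI -> \tilde{A} + HI)
\Delta E_{\mathrm{dp}}(\mathrm{A}) = E(\tilde{\mathrm{A}}) + E(\mathrm{HI}) - E(\mathrm{AI})

% Decomposition energy of the hydrogen halide (reaction HX -> 1/2 H_2 + 1/2 X_2)
\Delta E_{\mathrm{HX}} = \tfrac{1}{2}E(\mathrm{H_2}) + \tfrac{1}{2}E(\mathrm{X_2}) - E(\mathrm{HX})

% Values reported above
\Delta E_{\mathrm{HCl}} = 0.94\,\mathrm{eV} > \Delta E_{\mathrm{HBr}} = 0.57\,\mathrm{eV} > \Delta E_{\mathrm{HI}} = 0.06\,\mathrm{eV}
```

A large ΔE_dp means the cation holds on to its proton, disfavoring the release of Ã in both Equations (2) and (3); a large ΔE_HX means the halogen is bound more strongly in HX than in X₂, which favors the HX-producing intrinsic pathway and disfavors the X₂-producing extrinsic one.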
The individual impact of cations and halogens on both optical absorption and intrinsic stability can be seen in Figure 7, where they are highlighted in absorption-stability ($f_a$-$f_s$) maps. Although, in general, the individual contributions of cations and halogens cannot be decoupled, some trends become visible at this level too. For instance, high concentrations of FA and GA bring a visible increase in the formation energy, as do higher concentrations of bromine, which translates into enhanced stability. This is in stark contrast to FM-based mixtures. Regarding optical absorption, large proportions of EA and GA are beneficial, with iodine as the best choice in this respect. Assuming a certain partition of the ($f_a$-$f_s$) map, one may define selections of candidate perovskites that are more appropriate for certain technological applications. For instance, aiming at both optical and stability performance, we find that the best candidates mostly rely on EA-GA and I-Br mixtures, as one can see from Table 2, with EA₀.₂₅GA₀.₇₅PbI₃ as the most prominent example. Optimizing absorption only, compounds with high GA proportions mixed with EA and potentially HZ or FM, with predominantly iodine, are found, e.g., EA₀.₂₅ [...]
One should also mention that, compared to realistic conditions, the formation energy calculations are idealized for perfect bulk systems, rather than defective perovskite layers with finite grain sizes and interfaces, which can lead to rather different degradation mechanisms. Such defects are expected to lower the formation energies of the resulting compounds. Additionally, DFT calculations at the LDA level typically tend to overestimate formation energies and slightly underestimate band gaps; however, in this context, a reasonable compromise was achieved between computational efficiency and accuracy.
This enables a rather extensive overview of the mixed perovskite compounds, allowing a direct comparison with respect to both optical efficiency and chemical stability.
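Selecting candidates from the ($f_a$-$f_s$) map can be sketched in two ways: a threshold partition (a quadrant with $f_a \ge a_{\min}$ and $f_s \ge s_{\min}$), in the spirit of the partition described above, and a Pareto front, a swapped-in alternative that keeps every composition not outperformed in both indices at once. The thresholds and function names are hypothetical.

```python
import numpy as np


def pareto_front(f_a, f_s):
    """Indices of compositions not dominated in (f_a, f_s); higher is better for both."""
    f_a = np.asarray(f_a, dtype=float)
    f_s = np.asarray(f_s, dtype=float)
    front = []
    for i in range(len(f_a)):
        better_or_equal = (f_a >= f_a[i]) & (f_s >= f_s[i])
        strictly_better = (f_a > f_a[i]) | (f_s > f_s[i])
        if not np.any(better_or_equal & strictly_better):
            front.append(i)
    return front


def quadrant(f_a, f_s, a_min, s_min):
    """Indices in the region f_a >= a_min and f_s >= s_min of the map."""
    mask = (np.asarray(f_a) >= a_min) & (np.asarray(f_s) >= s_min)
    return np.flatnonzero(mask).tolist()
```

The Pareto front avoids choosing thresholds but returns the full trade-off curve, from absorption-optimized to stability-optimized compositions; a quadrant selection encodes an application-specific preference directly.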
Moreover, under certain experimental conditions it is difficult to separate multiple degradation mechanisms. Taking into account the combined action of temperature and oxygen, an enhanced stability should be obtained by including lighter halogens (Br, Cl) and mixtures containing primarily GA, FA and EA. For these three cations, depending on their sizes and proportions, an abundance of low-dimensional perovskite structures may be formed as in the case of FA-GA mixtures, where 2D layered structures are also observed [67]. Despite their enhanced stability, these low-dimensional perovskites have wider bandgaps which can significantly reduce the light absorption in the case of thin 2D layers. In this respect, 3D perovskites are generally better suited. Therefore, controlling the thickness and chemical composition of the 2D layers remains an active goal for optimizing the overall absorber performance.
The multitude of possible compositions, potentially resulting in rather different structures, cannot be covered by high-throughput calculations alone. As there is a large degree of overlapping information in the analyzed set of structures, regression methods can provide reasonably accurate results using only a relatively small subset. To this end, we investigate to what extent relatively complex predictions accounting for 3D phase formation, optical absorption and chemical stability can be achieved. The ML prediction accuracies, measured by the coefficient of determination R², are summarized in Table 3. For the MLS and RF methods, three different training set sizes were investigated. In this way, we observe that the prediction accuracy levels off at about 100 random samples in the training set, beyond which a further increase in accuracy is not significant. This is important, as quite accurate results can be achieved using less than 10% of the 1120 samples. This is not entirely surprising since, e.g., the effect of the halogens on the electronic gap is the same in all cation mixtures. Linear regression methods such as MLS, using the multi-dimensional input assembled from the entity proportions, generally perform rather well, particularly for the $f_a$ and $f_s$ indices, with average R² > 0.77 and R² > 0.92, respectively, and small variance. However, the structural deformations described by $\Delta l_{\min}$ and $\Delta l_{\max}$ show a more complex behavior, and non-linear methods such as RF perform better, e.g., R² = 0.88 (RF) compared to R² = 0.56 (MLS) for $\Delta l_{\min}$. This holds particularly for $\Delta l_{\min}$, while $\Delta l_{\max}$ has a more complicated relation to the distribution of structure types and the resulting prediction accuracies are lower, as one may infer from Figure 2a,b. The ANN models confirm the prediction accuracies of the previous two methods.
This is illustrated in Figure 8, where the reference DFT values are compared with the ANN predictions. Here, N_train = 100 random structures were used as before, while the remaining set is equally split into validation and test sets. In this case, overtraining effects are rather small, particularly for f_a and f_s^(i/e), so that the validation and test sets yield rather similar values for a fixed number of training epochs. The relatively large fluctuations observed for l_max compared with l_min in the MLS and RF results indicate a certain limitation of the models' predictive capacity, which is also found for the ANN model. By contrast, l_min is a better-suited parameter: it correlates strongly with the structural deformations of a given perovskite composition and can be accurately inferred by the different ML techniques.
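The data partition described above can be sketched as follows. This is a minimal illustration assuming a simple random permutation of sample indices; the function names `split_indices` and `best_epoch`, the seed, and the example loss curve are hypothetical and not taken from the study. With 1120 samples and N_train = 100, the remaining 1020 structures are split into 510 validation and 510 test samples.

```python
import random


def split_indices(n_total, n_train, seed=0):
    """Random split: n_train training samples, the remainder halved into
    validation and test sets (hypothetical helper, not the authors' code)."""
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)
    rest = idx[n_train:]
    half = len(rest) // 2
    return idx[:n_train], rest[:half], rest[half:]


def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss; a validation loss
    that rises after this point would signal overtraining."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)


train, val, test = split_indices(n_total=1120, n_train=100)
print(len(train), len(val), len(test))  # 100 510 510

# Illustrative (made-up) validation-loss curve with its minimum at epoch 2.
print(best_epoch([0.50, 0.30, 0.20, 0.25, 0.40]))  # 2
```

Fixing the seed keeps the split reproducible, so the same train set can be reused across the MLS, RF and ANN comparisons.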

Conclusions
This study encompasses a rather broad class of perovskite materials with mixtures of cations and halogens. The systems are analyzed from the perspective of both optical absorption and chemical stability, which are essential in the development of high-performance PSCs. The proposed methodology enables an efficient and unified description of many instances with varying proportions of cations and halogens. Some known results are recovered (e.g., based on MA, FA, GA), while some of the less studied compositions (e.g., based on EA, FM, HA, HZ) are integrated into the larger picture. Performing a bond-length analysis, we first determine which structures are likely to adopt a 3D perovskite structure, followed by an investigation of their chemical stability. One intrinsic degradation mechanism and one extrinsic mechanism involving the addition of molecular oxygen are evaluated for each structure. The most promising candidates with respect to the intrinsic degradation mechanism were found to contain high proportions of GA, in particular FA-GA mixtures, while bromine is highly influential in enhancing stability. The same cation mixtures provide good mitigation of oxygen-induced degradation, while here chlorine seems to have a larger impact. The cations that are reportedly less likely to form 3D perovskite structures (FM, HA, HZ) may be included in mixtures (e.g., with GA) that improve their structural and chemical stability. It was also established that the stability can be correlated with the deprotonation resistance of the organic cations. On the other hand, optimal absorption can be reached by EA-GA and EA-FA mixtures, potentially with small amounts of FA, while iodine is generally the best choice as it induces a smaller gap than bromine and chlorine. The EA cation, which has not been extensively investigated in previous studies, may prove to be a good candidate for enhanced opto-electronic properties.
Somewhat surprisingly, small proportions of FM and HZ added to a GA-based perovskite may also result in relatively high optical performance.
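The bond-length analysis mentioned above reduces each relaxed structure to its Pb-X bond lengths, from which the extremes l_min and l_max follow. The sketch below is a minimal illustration assuming an orthorhombic cell, Cartesian coordinates in Å, and a hypothetical 4.0 Å cutoff defining a Pb-X bond; the function names and the distorted cubic test cell (a = 6.3 Å) are illustrative choices, not the procedure or data used in this work.

```python
import math
from itertools import product


def pb_x_bond_lengths(pb_sites, x_sites, cell, cutoff=4.0):
    """Pb-X distances shorter than `cutoff` (Angstrom) in an orthorhombic cell.

    Periodic images in the home cell and its 26 neighbours are checked, which
    suffices when the cutoff is shorter than the cell edges (hypothetical
    helper; the cutoff value is an assumption)."""
    lengths = []
    for p in pb_sites:
        for q in x_sites:
            for shift in product((-1, 0, 1), repeat=3):
                image = tuple(q[k] + shift[k] * cell[k] for k in range(3))
                r = math.dist(p, image)
                if r < cutoff:
                    lengths.append(r)
    return lengths


def bond_length_extrema(lengths):
    """Return (l_min, l_max) of a bond-length distribution."""
    return min(lengths), max(lengths)


# Hypothetical distorted cubic cell: one Pb and three halides, one of which
# is displaced by 0.1 Angstrom along x.
a = 6.3
pb = [(0.0, 0.0, 0.0)]
x = [(a / 2 + 0.1, 0.0, 0.0), (0.0, a / 2, 0.0), (0.0, 0.0, a / 2)]
bonds = pb_x_bond_lengths(pb, x, cell=(a, a, a))
l_min, l_max = bond_length_extrema(bonds)
print(len(bonds), round(l_min, 2), round(l_max, 2))  # 6 3.05 3.25
```

The displaced halide shortens one Pb-X bond and lengthens its counterpart, so the spread between l_min and l_max directly reflects the octahedral distortion.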
As high-throughput calculations of this type require extensive numerical resources, we investigated to what extent ML techniques can predict the essential quantities. Using feature vectors based on the entity proportions, which are readily available, random subsets of ∼10% of the calculated structures were found to still provide accurate results. The absorption index f_a and the stability indices f_s^(i/e) can be predicted on such a small subset using multivariate linear regression methods. However, other properties derived from the bond-length distribution, notably its minimum and maximum values, require non-linear regression techniques, which were implemented here using random forests and artificial neural networks. We identify the variation of the minimum bond length as a suitable, predictable parameter for assessing the deformations of the perovskite structures.
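The linear part of this workflow can be sketched as follows: each composition A_x A'_(1-x) Pb X_y X'_z X''_(3-y-z) is encoded as a vector of entity proportions, a multivariate least-squares (MLS) model is fitted on random train sets of increasing size, and the coefficient of determination R² = 1 - SS_res/SS_tot is evaluated on the remaining samples. This is a minimal sketch under stated assumptions: the encoding (two cation fractions x and 1-x, three halide fractions normalised to the three X sites), the cation list (the seven cations named in the conclusions), the helper names, and especially the target values, which are synthetic linear data rather than DFT results, are all hypothetical. The non-linear RF step could be added with, e.g., scikit-learn's `RandomForestRegressor`, which is not shown here.

```python
import numpy as np

# Cations and halides named in the text; the full set used in the study may differ.
CATIONS = ["MA", "FA", "GA", "EA", "FM", "HA", "HZ"]
HALIDES = ["I", "Br", "Cl"]


def random_composition(rng):
    """Entity-proportion feature vector for A_x A'_(1-x) Pb X_y X'_z X''_(3-y-z)
    (hypothetical encoding: cation fractions sum to 1, halide fractions to 1)."""
    feats = np.zeros(len(CATIONS) + len(HALIDES))
    i, j = rng.choice(len(CATIONS), size=2, replace=False)
    x = rng.uniform()
    feats[i], feats[j] = x, 1.0 - x
    feats[len(CATIONS):] = rng.dirichlet(np.ones(len(HALIDES)))
    return feats


def fit_mls(X, y):
    """Multivariate least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    # lstsq tolerates the collinearity (both proportion groups sum to 1).
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mls(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
N_TOTAL = 1120  # sample count quoted in the text
X = np.array([random_composition(rng) for _ in range(N_TOTAL)])
w_true = rng.normal(size=X.shape[1])  # synthetic "ground truth", NOT DFT data
y = X @ w_true + 0.05 * rng.normal(size=N_TOTAL)

perm = rng.permutation(N_TOTAL)
scores = {}
for n_train in (25, 50, 100, 400):
    train, test = perm[:n_train], perm[n_train:]
    coef = fit_mls(X[train], y[train])
    scores[n_train] = r2_score(y[test], predict_mls(coef, X[test]))
    print(f"N_train = {n_train:4d}  test R^2 = {scores[n_train]:.3f}")
```

Because the synthetic targets here are exactly linear in the proportions, the test R² approaches 1 already for small train sets; the levelling-off and the lower accuracies reported in Table 3 for the real DFT targets are not reproduced by such toy data.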
In conclusion, we proposed a methodology for the efficient exploration of candidate perovskite materials, aimed at finding optimal solutions for both the opto-electronic and the stability properties required by PSC technology. DFT-ML schemes of this type, which strike a reasonable compromise between accuracy and efficient exploration of a huge compositional space, are easily scalable and can provide further essential insight into complex perovskite structures, specifically multiple-cation and low-dimensional systems.