Power Conversion Efficiency of Arylamine Organic Dyes for Dye-Sensitized Solar Cells (DSSCs) Explicit to Cobalt Electrolyte: Understanding the Structural Attributes Using a Direct QSPR Approach

The post-silicon solar cell era involves light-absorbing dyes for dye-sensitized solar cells (DSSCs). There is therefore great interest in designing competent organic dyes for DSSCs with high power conversion efficiency (PCE) to bypass some of the disadvantages of silicon-based solar cell technologies, such as high cost, heavy weight, limited silicon resources, and production methods that cause high environmental pollution. The DSSC has the unique feature of a distance-dependent electron transfer step, which depends on the relative position of the sensitizing organic dye in the metal oxide composite system. In the present work, we developed quantitative structure-property relationship (QSPR) models to establish a quantitative relationship between the overall PCE and quantum chemical molecular descriptors. The descriptors were calculated with density functional theory (DFT) and time-dependent DFT (TD-DFT) methods as well as with DRAGON software. This allows the basic electron transfer mechanism to be understood together with the structural attributes of arylamine organic dye sensitizers for DSSCs explicit to cobalt electrolyte. The identified properties and structural fragments are particularly valuable for guiding time-saving synthetic efforts toward efficient arylamine organic dyes with improved power conversion efficiency.


Introduction
Research into renewable energy has become one of the most pressing issues in global energy strategy due to increasing energy consumption and limited fossil resources. The solar energy incident on Earth in one hour exceeds the world's current annual energy consumption. The necessity of cultivating renewable energy sources is growing day by day [1]. Therefore, efficient solar energy conversion is a promising technology for meeting the increasing energy demand caused by fast industrial development [2]. In general, industrial photovoltaic cells are made of monocrystalline silicon, polycrystalline silicon, amorphous silicon, cadmium telluride, copper indium selenide/sulfide, or GaAs-based multi-junction material systems [3].
The possibility of using devices based on molecular components for the construction of a large-scale solar electricity production facility became reality after O'Regan and Grätzel reported the invention of their efficient dye solar cell [4]. Emerging research topics and pre-industrial solar cell technologies, which include light-absorbing dyes for dye-sensitized solar cells (DSSCs), quantum dot solar cells (QDSCs), perovskite solar cells, and organic/polymer solar cells, are still below the performance of silicon devices, with the highest efficiency still below 13% for DSSCs [5]. The estimated theoretical QDSC performance is close to that of silicon-based devices; however, at present the record efficiencies only amount to 4.5% [6]. Despite their present low efficiency and stability, organic solar cells are intensively studied because of their potentially low production cost [7,8]. Specifically, the development of efficient DSSCs by O'Regan and Grätzel in 1991 [4] has challenged solid-state technology with devices operating at the molecular or nanoscale level.
The intricate relationship between the electron injection and dye regeneration processes in DSSCs is governed by redox reactions. The central constituent of the dye regeneration reaction is the electrolyte, where the classic I⁻/I₃⁻ redox couple in organic solvents has been considered one of the high-performance redox mediators in DSSCs [9]. However, the I⁻/I₃⁻ redox couple limits the open-circuit potential, V_OC, during the dye-regeneration process due to its complex redox chemistry. Cobalt (Co(II)/Co(III)) polypyridyl complexes are redox mediators with a more favorable reduction potential than I₃⁻ and can be used to increase the efficiency of DSSCs [10,11]. The redox chemistry of cobalt complexes can be fine-tuned by varying the ligand environment. To increase the efficiency of DSSCs, it is better to keep Co(III) away from the surface of the semiconductor. In the case of organic dyes, this can be accomplished by using longer or bulkier substituents in the donor group [12,13]. Cobalt complexes have been used in Zn-based [10], Ru-based [11], and organic dye-based [12,13] solar cells with very good efficiency. Therefore, we have restricted our study to DSSCs with cobalt electrolytes. Arylamine organic dyes (AOD) with donor (D), π-bridge (π), and acceptor (A) moieties for DSSCs have received great attention in the last decade because of their high molar absorption coefficient, low cost, and structural variety. For DSSCs, a large number of experimental works are available [8], together with some theoretical studies [14]; joint experimental and theoretical [15][16][17] work is becoming the common research strategy. However, the development and optimization of materials for organic solar cells is, in general, not yet rational but rather empirical. The aim of the present work is to determine the macroscopic characteristics of arylamine organic dyes for DSSCs from computed molecular properties and to elucidate the elementary mechanistic steps of electron transfer in such solar cells through DFT and TD-DFT studies, followed by a quantitative structure-property relationship (QSPR) study. The use of in silico methods for chemical property prediction is well established, and the QSPR method has emerged as an important computational tool with a diverse range of applications [18]. Robust and validated QSPR models can predict properties of new or untested molecular structures and provide insights that expedite the design of new compounds with enhanced desired properties [18]. For the properties and energy efficiency of compounds, regulatory agencies worldwide have already accepted the implications of QSPR models, which can explore the central relationship between organic dyes and the composite system of DSSCs. This kind of study not only saves resources but also provides guidance for the synthesis of more efficient dye sensitizers in the future.
A limited number of QSPR models have been developed to date to model the power conversion efficiency (PCE) of solar cells. Although a few QSPR models were developed for the PCE of DSSCs, the only global QSPR model was reported by our group for fullerene derivatives (FDs) as acceptors for polymer-based solar cells (PSCs). Through virtual screening with QSPR models we identified nine FDs with promising PCE values, a 200% increase in PCE compared to existing FDs as acceptors [19]. Venkatraman et al. [20] reported the first successful application of comparative molecular field analysis (CoMFA) and vibrational frequency-based eigenvalue (EVA) descriptors to model molecular structure-photovoltaic performance relationships for a set of 40 coumarin derivatives as dyes for DSSCs. The obtained models provided statistically robust predictions of PCE values for all studied dyes. In another study, Venkatraman et al. [21] investigated a de novo computational design methodology to design a coumarin-based dye sensitizer with improved properties for use in DSSCs. Ip et al. [22] successfully predicted the performance of a set of new dyes through in silico modeling on the basis of the known performance of existing dyes. In a recent work, Li et al. [23] investigated organic dyes from diverse chemical classes and reported an acceptable cascade QSPR model for PCE.
Here we have considered 21 arylamine organic dyes on which experiments have been performed with a TiO2 film and liquid cobalt electrolyte for DSSCs. This allows us to strictly maintain OECD Principle 1 and to model PCE with descriptors computed directly from the dyes' structures, without considering any experimental properties of the DSSCs. The constructed QSPR model enables the identification of the essential structural attributes necessary for quantifying the molecular prerequisites of diverse dyes that are chiefly responsible for the high PCE of DSSCs. The identified properties and structural fragments are particularly valuable for guiding future synthetic efforts toward new organic dyes with improved power conversion efficiency.

Dataset
The dataset includes 21 organic arylamine dyes used as sensitizers with a TiO2 film and liquid cobalt electrolyte for solar cells, with experimental PCE values collected from the literature [24][25][26][27][28][29][30][31][32][33]. This dataset covers all arylamines available in the literature under these experimental conditions. Therefore, although the dataset is small, it is the only one available. The complete workflow of the study is shown in Figure 1, and the molecular structures with observed parameters are given in Table 1.

Table 1 note: * Compounds present in the test set.

Structure Preparation, Molecular and Quantum Chemical Calculations (DFT/TD-DFT)
Li et al. [23] suggested the importance of quantum-mechanical descriptors in the modeling of AOD. Therefore, the molecular geometries of the dye structures considered here were first prepared by molecular mechanics (MM). The MM+ molecular mechanics method, as implemented in the HyperChem 8.07 software package [34], was applied, and geometry optimizations were subsequently performed in the gas phase using DFT and TD-DFT methods with the B3LYP and CAM-B3LYP exchange-correlation functionals, respectively, as implemented in the GAUSSIAN 09 software package [35]. In both cases we chose the same basis set, 6-31G(d,p). The CAM-B3LYP functional was selected for the TD-DFT method because its long-range Coulomb-attenuating scheme comprises 19% HF and 81% B88 exchange interaction at short range, and 65% HF plus 35% B88 at long range [36]. Furthermore, the literature suggests that the CAM-B3LYP method provides reasonable predictions of the excitation energies and absorption spectra of D-π-A molecules [37,38]. Therefore, the vertical excitation energies and electronic absorption spectra were simulated using the TD-CAM-B3LYP method. In addition, the study of Magyar and Tretiak [39] showed that a reliable description of charge-transfer states requires 50% or more HF exchange.
The effects of the solvents (the experimental solvents for the studied compounds were acetonitrile, dichloromethane, and dimethylformamide) on the absorption spectra of the studied dyes were simulated with the conductor-like polarizable continuum model (C-PCM) at the TD-CAM-B3LYP/6-31G(d,p) level of theory, in order to compare the computed spectra with the experimental spectra of the dyes.

Descriptor Selection
From the Gaussian output files of the DFT and TD-DFT calculations, we computed 32 basic quantum-mechanical descriptors. In addition, DRAGON 6 [40] software was used to compute a total of 132 constitutional, ring, functional group count, and atom-type E-state index descriptors from the optimized structures in order to identify the important structural fragments responsible for better PCE. All quantities related to DFT/TD-DFT properties were extracted for building the QSPR models. Quantum-mechanical descriptors such as the HOMO and LUMO energies, the HOMO-LUMO energy difference, valence band, conduction band, hardness, Mulliken electronegativity, electronic chemical potential, and dipole moment are considered important, as suggested by the literature. Properties from the DFT study such as the absorption spectrum, the ability to interact with the semiconductor oxide surface, the excitation energy level, and stability were also considered, as they significantly affect PCE. From TD-DFT calculations, in turn, one obtains energy descriptors that indicate the stability of the molecules and their electron transition properties; excited-state properties such as the maximum absorption wavelength and the width of the spectrum give parameters related to sunlight absorption. The complete list of computed descriptors is given in Table S1 in the Supplementary Materials.

Data Pre-Processing
The complete pool of descriptors was initially pretreated with a variance cutoff of 0.0001 and an inter-correlation cutoff of 0.99 to eliminate near-constant and highly correlated descriptors and to reduce the noise level among the input descriptors. Then, a genetic algorithm (GA) was applied to select the best possible set of descriptors for QSPR modeling from the pretreated pool, which consisted of 139 descriptors [41]. This "descriptor-thinning" procedure led us to identify 10 descriptors from the pool of 139 that were common to the 100 equations generated by the GA for the complete dataset. The identified descriptors are therefore the most relevant for our study, as they can reflect the required properties of all studied dyes.
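The pretreatment step can be illustrated with a minimal Python sketch. It is not the software used in this work; the function names, the use of the population variance, and the rule of keeping the first member of each highly correlated pair are assumptions made only for illustration.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pretreat(columns, var_cut=0.0001, corr_cut=0.99):
    """Return the names of the descriptors that survive both filters.

    columns: dict mapping descriptor name -> list of values (one per dye).
    """
    # Step 1: drop near-constant descriptors (assumed: population variance).
    kept = [name for name, vals in columns.items() if pvariance(vals) >= var_cut]
    # Step 2: keep a descriptor only if it is not highly correlated with
    # any descriptor already selected (assumed tie-breaking rule).
    selected = []
    for name in kept:
        if all(abs(pearson(columns[name], columns[s])) < corr_cut for s in selected):
            selected.append(name)
    return selected
```

With a toy pool in which `b` is an exact multiple of `a` and `c` is constant, only `a` and the uncorrelated `d` would be retained.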

Model Development and Validation
We employed the genetic algorithm (GA) technique [41] as the variable selection tool, as implemented in the Genetic Algorithm 1.4 software package [41]. Next, multiple linear regression (MLR) analysis was performed with the MLR Plus Validation GUI 1.2 software [41], using the training set compounds to develop the QSPR model, followed by validation of the model using the test set compounds. We used the following steering parameters for the GA: total number of iterations, 100; cross-over probability, 1; mutation probability, 0.5; and smoothing parameter (LOF calculation), 1.
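The MLR step itself amounts to an ordinary least-squares fit of PCE on the selected descriptors. The following sketch is a generic illustration and not the MLR Plus Validation GUI implementation; the function name `fit_mlr` and the choice of solving the normal equations by Gauss-Jordan elimination are assumptions.

```python
def fit_mlr(X, y):
    """Ordinary least squares with an intercept.

    X: list of descriptor rows (one row per training compound).
    y: list of observed responses.
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented normal equations [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(p):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][p] / aug[i][i] for i in range(p)]
```

For production work a numerically more stable solver (for example a QR-based least-squares routine) would be preferable; the normal equations are used here only to keep the sketch dependency-free.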

Model Validation and Metrics
Different statistical metrics were employed to ensure the fitness of the in silico models, and internal and external validation methodologies were subsequently employed for model validation. The goodness-of-fit of the equation was judged by the determination coefficient (R²), as well as by the internal validation metric, the leave-one-out cross-validation parameter Q²_LOO, and the external validation metric R²_pred. The r_m² metric [42,43] was also calculated for the present work. The developed QSPR models were also subjected to additional validation tests such as Q²_ext(F2), and Golbraikh and Tropsha's criteria were used to check each model's reliability [43].
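As a hedged illustration of how these metrics differ, the sketch below uses their commonly cited definitions: all are of the form 1 − PRESS/SS, and differ in which predictions and which reference mean are used. The function names are hypothetical, and the leave-one-out predictions are assumed to have been produced beforehand by refitting the model once per omitted training compound.

```python
from statistics import mean


def _one_minus_ratio(obs, pred, ref_mean):
    """1 - sum (obs - pred)^2 / sum (obs - ref_mean)^2."""
    press = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss = sum((o - ref_mean) ** 2 for o in obs)
    return 1.0 - press / ss


def r2(train_obs, train_fitted):
    """Determination coefficient of the fitted training set."""
    return _one_minus_ratio(train_obs, train_fitted, mean(train_obs))


def q2_loo(train_obs, loo_pred):
    """Leave-one-out Q2; loo_pred[i] is predicted with compound i left out."""
    return _one_minus_ratio(train_obs, loo_pred, mean(train_obs))


def r2_pred(test_obs, test_pred, train_obs):
    """External R2pred (also called Q2 F1): reference is the TRAINING mean."""
    return _one_minus_ratio(test_obs, test_pred, mean(train_obs))


def q2_f2(test_obs, test_pred):
    """External Q2 F2: reference is the TEST mean."""
    return _one_minus_ratio(test_obs, test_pred, mean(test_obs))
```

Because R²_pred and Q²(F2) differ only in the reference mean, they coincide when the training and test sets have the same mean response.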

Y-Randomization
The robustness of the models was checked with the Y-randomization technique. For a robust model, the determination coefficient (R²) of the non-random model should exceed the average squared correlation coefficient of the randomized models (R_r²). The model randomization was performed 100 times by shuffling the dependent variable while maintaining the original independent variables. The average R² of the 100 random models was computed and defined as R_r², followed by calculation of the cR_p² parameter [41,43], which penalizes the model R² for small differences between R² and R_r². For an acceptable model, the value of cR_p² should be greater than 0.5.
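The randomization procedure can be sketched as follows. The expression cR_p² = R × sqrt(R² − R_r²) is the commonly cited form of this parameter and is assumed here; `fit_r2` is a hypothetical callable that refits the model and returns its R², and the numerical values in the usage note are illustrative, not results of this study.

```python
import random
from math import sqrt
from statistics import mean


def y_randomization(X, y, fit_r2, n_trials=100, seed=0):
    """Average R2 of models refitted on shuffled responses (R_r^2).

    fit_r2(X, y) is a user-supplied function returning the R2 of a model
    fitted on descriptors X and responses y.
    """
    rng = random.Random(seed)
    r2_random = []
    for _ in range(n_trials):
        y_shuffled = list(y)      # copy, so the original order is untouched
        rng.shuffle(y_shuffled)   # break the structure-response link
        r2_random.append(fit_r2(X, y_shuffled))
    return mean(r2_random)


def c_rp2(r2_model, r2_random_mean):
    """cR_p^2 = R * sqrt(R^2 - R_r^2), floored at zero."""
    return sqrt(r2_model) * sqrt(max(r2_model - r2_random_mean, 0.0))
```

For example, a model with R² = 0.64 and R_r² = 0.15 gives cR_p² = 0.8 × 0.7 = 0.56, which would pass the 0.5 threshold.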

Applicability Domain Study
According to OECD Principle 3, an acceptable QSPR model should possess a defined applicability domain (AD), which represents the chemical space defined by the structural information extracted from the chemicals used in model development, i.e., the training set compounds. Here, the AD of the QSPR model was checked using two different approaches: (a) the Euclidean distance approach [41]; and (b) the standardization-based technique [44].
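A standardization-based AD check can be sketched as below. The decision rules (a threshold of 3 on the standardized descriptor values and the fallback quantity mean + 1.28 × standard deviation) follow our reading of the published technique and should be treated as assumptions; the function name and the use of the sample standard deviation are likewise illustrative.

```python
from statistics import mean, stdev


def in_domain(train, query, cut=3.0):
    """Return True if the query compound lies inside the AD.

    train: list of descriptor rows of the training set.
    query: descriptor row of the compound to be checked.
    Requires at least two descriptors and no constant training column.
    """
    cols = list(zip(*train))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    # Standardized absolute deviation of each descriptor of the query.
    s = [abs(x - m) / sd for x, m, sd in zip(query, mus, sds)]
    if max(s) <= cut:      # every descriptor within 3 SD: inside
        return True
    if min(s) > cut:       # every descriptor beyond 3 SD: outside
        return False
    # Mixed case: decide on mean + 1.28 * SD of the standardized values.
    return mean(s) + 1.28 * stdev(s) <= cut
```

A compound flagged as outside the AD can still be predicted by the equation, but the prediction should then be regarded as an extrapolation.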

Computational Results
Based on the experimental PCE data and the thinned descriptors, we developed four separate QSPR models employing the hybrid GA-MLR tool. The statistical validation metrics are provided in Table 2. It is worth noting that acceptable internal validation alone is not a sufficient criterion for choosing the best QSPR model. Rather, external validation is equally important to balance the quality and the predictability of a model and to confirm accurate prediction for new compounds. Comparing the numerical values, the least effective model is the one developed from the Kennard-Stone-based division (Model 3), as the basic external prediction metric R [...] S1.
The Y-randomization study (Figure 2) was performed to ensure that the models were not the outcome of mere chance, and all four models passed, as their internal qualities are acceptable. However, except for the model developed with K-Medoid clustering, the models lack predictability. The scatter plot (Figure 3) strongly supports the fact that Model 4 is the best among the four models, as not a single test compound fell outside the mean normalized distance of the training set compounds. Additionally, the AD study with the Euclidean distance approach and the standardization-based technique shows that all test compounds are inside the AD and that their predictions are reliable for Model 4. Therefore, 100% of the test compounds can be predicted confidently with the developed model after rigorous validation and AD testing. The Euclidean distance plot is provided in Figure 4. For comparison of model quality, scatter plots and Euclidean distance plots for the other three models are also provided in Figures 3 and 4, respectively. The statistical data as well as all graphical illustrations show that QSPR Model 4 is well fitted and robust enough to reliably predict the PCE of untested compounds. The predicted PCE values for all studied molecules are given in Table 1. For the interpretation and further discussion we have therefore considered only Model 4.
The developed equation for Model 4 is as follows:
PCE(%) = 6.131 (±0.493) + 4.199 (±0.838) × (SssssC) + 1.254 (±0.305) × (nCq) + 0.006 (±0.001) × (D/Dtr11)    (2)
Equation (2) involves three descriptors and explains 81.0% of the variance. Equation (2) was obtained in compliance with OECD principles. Although the complete dataset is extremely diverse in terms of the structural variability of the molecules, satisfactory measures of goodness-of-fit, robustness, and predictability were achieved. Moreover, the small deviation of the predicted PCE values from the corresponding observed ones is further implied by the satisfactory values of all the r_m² metrics. Almost identical values for the Q²(F1) (0.63) and Q²(F2) (0.62) metrics indicate that the test and training sets selected for the development of the QSPR model have similar response distributions. External predictability was further assessed according to Golbraikh and Tropsha's criteria, which were satisfied.
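Equation (2) can be evaluated directly once the three descriptors are known. The short sketch below hard-codes the reported coefficients; the function name and the descriptor values in the usage note are hypothetical and do not correspond to any dye in Table 1.

```python
# Coefficients of Equation (2) as reported above (standard errors omitted).
INTERCEPT = 6.131
B_SSSSSC = 4.199   # atom-type E-state index SssssC
B_NCQ = 1.254      # count of quaternary C(sp3) atoms, nCq
B_DDTR11 = 0.006   # distance/detour ring index of order 11, D/Dtr11


def predict_pce(sssssc, ncq, d_dtr11):
    """Predicted PCE (%) from the three descriptors of Equation (2)."""
    return INTERCEPT + B_SSSSSC * sssssc + B_NCQ * ncq + B_DDTR11 * d_dtr11
```

For a hypothetical dye with SssssC = 0, nCq = 2, and D/Dtr11 = 100, the prediction is 6.131 + 2.508 + 0.600 = 9.239%. Such a prediction is meaningful only if the dye lies within the applicability domain of the model.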
Further, absorption spectra of the investigated dyes were simulated at the TD-CAM-B3LYP/6-31G(d,p) level of theory in the reported experimental solvents using the C-PCM method. The experimental spectra are closely comparable with the computed ones (see Table S2 and Figure S1 in the Supplementary Materials), suggesting that the choice of functional for the present study is reliable.

Interpretation of the Developed Model
The descriptors appearing in Equation (2) obey the following order of significance based on their standardized coefficients: (a) SssssC; (b) nCq; and (c) D/Dtr11, with standardized coefficient values of 1.45, 1.20, and 1.12, respectively. All the modeled descriptors make a positive and almost equivalent contribution to the PCE values. The Pearson correlation between the descriptors (Table 3) was checked to avoid inter-correlation and, notably, no correlation was found. The training set is small in comparison to the number of parameters used for producing the QSPR models, but our model fully respects the descriptor usage rule for QSPR models with respect to the number of compounds (a sample size to descriptor ratio of at least 5:1) [18].
To comply with OECD Principle 5, a mechanistic interpretation should be given for any predictive QSPR model. Here we provide the interpretation, with suitable examples, and justify the importance of each descriptor appearing in the GA-MLR equation. SssssC is an atom-type E-state index that defines the sum of E-state values for the >C< fragments in a molecule, and nCq is a functional group count related to the total number of quaternary C(sp³) atoms. Although they are different descriptors, their impact on PCE is the same. A higher number of alkyl substitutions results in a high E-state (SssssC) value along with an increased count of sp³ carbon atoms. The structure and orientation of the dyes mostly dictate the efficiency of the solar cell via the electron injection dynamics. In this regard, the carboxyl group plays the role of an anchor group, and the longer alkyl chain is one of the most important entities in the dyes. The literature suggests [46] that the formation of J-aggregates becomes easier with increasing length of the alkyl chain. In addition, the intrinsic sensitization or quantum efficiency increases with the length of the alkyl chain due to the bathochromic shift of the absorption band [47]. Moreover, longer alkyl chains attached to the sensitizer dyes allow the dye to bind normally to the TiO2 surface [48]. Therefore, our findings are in agreement with the literature.
Another property, D/Dtr11, is a ring descriptor that defines the distance/detour ring index of order 11. This descriptor is best illustrated by the fragment shown in Figure 5. Compounds such as 1, 13, and 16 contain this fragment along with high values of the other descriptors, and show very high PCE (see Table 1). In contrast, compounds such as 6, 9, and 18 lack this fragment, resulting in lower PCE values (see Table 1). Compound 17 has the ring fragment, but because it has the lowest values of the other two descriptors it shows the lowest PCE value among the studied dyes (see Table 1). This fragment helps to form a composite fused-structure dye system in which favorable molecular orbital energies enable rapid electron injection into the semiconductor and allow efficient regeneration of the oxidized dye. As the effectiveness of DSSCs is influenced by the amount of solar energy absorbed by the dye and by the kinetics of charge transfer across the semiconductor network, this fragment strongly affects performance and, together with the descriptors SssssC and nCq, contributes to a higher PCE value for DSSCs.

Conclusions
We have introduced a QSPR model that allowed us to identify, through a stringent validation approach, the essential fragments and structural features of AOD that are most responsible for the high PCE of DSSCs explicit to cobalt electrolytes. Our findings can be summarized as follows:

• The QSPR model enables identification of the essential structural attributes necessary for quantifying the prime molecular prerequisites of a diverse AOD system, which could guide the design and synthesis of more efficient dyes in the near future. The interpretation of the model revealed that a higher number of alkyl substitutions, along with an increased count of sp³ carbon atoms and the presence of a fragment associated with the distance/detour ring index of order 11, enables rapid electron injection into the semiconductor. This dynamic step allows efficient regeneration of the oxidized dye and helps to achieve a higher PCE value for an arylamine dye-sensitized solar cell explicit to cobalt electrolytes.

• The QSPR model, developed from a set of 21 diverse AOD, is an efficient tool for screening a wide range of AOD, allowing for the identification of dyes with high PCE in a time- and cost-effective manner.

• The developed QSPR model is particularly valuable for predicting and characterizing the nature of the donor:π-bridge:acceptor relationships critical for photoconversion.

Our calculations provide a set of data that will enable scientists to reduce their experimental effort, time, and resources. In addition, the exploratory features may assist in designing more efficient units. In future work, we plan to develop global QSPR models for a large number of AOD for DSSCs explicit to iodine electrolytes, followed by the design of new AOD and their first-principles study with emphasis on electron transfer rates, and an experimental analysis that could further validate our QSPR models.


Figure 1. A complete scheme of the present study.

Selection of the training and test sets plays a crucial role in the construction of a statistically significant QSAR model. The test set molecules should lie within the chemical space occupied by the training set molecules. To reduce the bias introduced by any single way of splitting the dataset, we employed four different techniques to divide it into training and test sets in a 3:1 ratio [41]: (1) activity sorted; (2) Euclidean distance based; (3) Kennard-Stone based; and (4) K-Medoid clustering.
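
One of the four division schemes, the Kennard-Stone rule, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool used in this work: the function name `kennard_stone_split` and the toy descriptor values are hypothetical, the descriptors are assumed to be already scaled, and the selected samples are assumed to form the training set.

```python
import math


def kennard_stone_split(X, n_train):
    """Split sample indices into (train, test) with the Kennard-Stone max-min rule.

    X is a list of descriptor vectors (hypothetical, pre-scaled values).
    """
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")

    # Pairwise Euclidean distances in descriptor space.
    dist = [[math.dist(a, b) for b in X] for a in X]

    # Seed the training set with the two most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    train = [first, second]
    remaining = [k for k in range(n) if k not in train]

    # Repeatedly add the sample that is farthest from its nearest selected neighbour.
    while len(train) < n_train:
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in train))
        train.append(nxt)
        remaining.remove(nxt)

    return train, remaining


# Toy descriptor matrix: 8 hypothetical dyes, 2 descriptors each (not data from this study).
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0),
     (2.0, 2.0), (4.0, 4.0), (1.0, 1.0), (3.0, 3.0)]
train_idx, test_idx = kennard_stone_split(X, n_train=6)  # 3:1 split
print("train:", sorted(train_idx), "test:", sorted(test_idx))
```

Because the rule always picks the most isolated remaining sample, the training set tends to span the edges of the descriptor space, which helps keep the held-out test compounds inside the region covered by the training set.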

Figure 4. Euclidean distance plots of the developed QSPR model for the applicability domain (AD) study. Models were developed with the following dataset splitting methods: (A) Activity sorted; (B) Euclidean distance based; (C) Kennard-Stone based; and (D) K-Medoid clustering.


The Q²(F1) (0.63) and Q²(F2) (0.62) metrics indicate that the test and training sets selected for the development of the QSPR model have similar response distributions. External predictability was further assessed according to Golbraikh and Tropsha's criteria, and the results are highly satisfactory:

Q² = 0.66 > 0.5, Passed
r² = 0.63 > 0.6, 0.31 < 0.1, Passed
0.85 ≤ k = 0.984 ≤ 1.15 or 0.85 ≤ k′ = 1.001 ≤ 1.15, Passed

To check the model's quality and predictability together, we employed another stringent criterion developed by Roy et al. [45]: error-based judgment of test set predictions. The obtained Mean Absolute Error (MAE) is 0.43027, the Standard Deviation of Absolute Error (SD) is 0.25567, and the MAE-based criteria rated the model's predictions as 'GOOD'.
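
The external validation quantities discussed above can be computed with a short script. The following is a sketch using the commonly cited definitions, not the software used in this work: the function name `external_validation` and all numerical values are hypothetical, the assignment of k and k′ to the two regression-through-origin slopes follows one common convention, and the SD is taken as the sample standard deviation (n − 1), which is an assumption.

```python
import math


def external_validation(y_train, y_test, y_pred):
    """Return common external validation metrics for test set predictions."""
    n = len(y_test)
    mean_train = sum(y_train) / len(y_train)
    mean_test = sum(y_test) / n

    # Predictive residual sum of squares over the test set.
    press = sum((o - p) ** 2 for o, p in zip(y_test, y_pred))

    # Q2(F1) references the training set mean, Q2(F2) the test set mean.
    q2_f1 = 1.0 - press / sum((o - mean_train) ** 2 for o in y_test)
    q2_f2 = 1.0 - press / sum((o - mean_test) ** 2 for o in y_test)

    # Slopes of the regression lines through the origin
    # (observed vs predicted, and predicted vs observed).
    sum_op = sum(o * p for o, p in zip(y_test, y_pred))
    k = sum_op / sum(p * p for p in y_pred)
    k_prime = sum_op / sum(o * o for o in y_test)

    # Error-based measures: mean and sample SD of the absolute errors.
    abs_err = [abs(o - p) for o, p in zip(y_test, y_pred)]
    mae = sum(abs_err) / n
    sd = math.sqrt(sum((e - mae) ** 2 for e in abs_err) / (n - 1))

    return {"Q2F1": q2_f1, "Q2F2": q2_f2, "k": k, "k_prime": k_prime,
            "MAE": mae, "SD": sd}


# Hypothetical %PCE values, for illustration only (not data from this study).
y_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_test = [2.0, 4.0, 6.0]
y_pred = [2.5, 3.5, 6.5]
metrics = external_validation(y_train, y_test, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

When the training and test sets have similar mean responses, Q²(F1) and Q²(F2) take nearly the same value, which is why close values of the two metrics are read as a sign of similar response distributions.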


Table 1. Chemical structure of arylamine organic dyes tested in DSSCs employing liquid cobalt electrolytes with their experimental and predicted %PCE values.

mV FF %PCE (Experimental) %PCE (Predicted) Structure
* Compounds present in the test set.
Among the remaining three models, the activity-based model (Model 1) failed the r_m² metrics for the test set, and the Euclidean-based model (Model 2) had r_m² values below the stipulated threshold for both the training and test sets, although all of its internal validation metrics were acceptable. Therefore, Model 4, based on K-Medoid clustering, emerged as the best QSPR model, with acceptable values for internal as well as external validation metrics that support both the model's quality and its predictability for new test compounds not considered in the model development.

Table 2. Statistical quality of QSPR models developed based on different division tools.
* Definitions of descriptors are provided in Table

Table 3. Calculated square correlation among the modeled descriptors for Model 4.