Stability Constant and Potentiometric Sensitivity of Heavy Metal–Organic Fluorescent Compound Complexes: QSPR Models for Prediction and Design of Novel Coumarin-like Ligands

Industrial wastewater often consists of toxic chemicals and pollutants, which are extremely harmful to the environment. Heavy metals are toxic chemicals and considered one of the major hazards to the aquatic ecosystem. Analytical techniques, such as potentiometric methods, are some of the methods to detect heavy metals in wastewaters. In this work, the quantitative structure–property relationship (QSPR) was applied using a range of machine learning techniques to predict the stability constant (logβML) and potentiometric sensitivity (PSML) of 200 ligands in complexes with the heavy metal ions Cu2+, Cd2+, and Pb2+. In result, the logβML models developed for four ions showed good performance with square correlation coefficients (R2) ranging from 0.80 to 1.00 for the training and 0.72 to 0.85 for the test sets. Likewise, the PSML displayed acceptable performance with an R2 of 0.87 to 1.00 for the training and 0.73 to 0.95 for the test sets. By screening a virtual database of coumarin-like structures, several new ligands bearing the coumarin moiety were identified. Three of them, namely NEW02, NEW03, and NEW07, showed very good sensitivity and stability in the metal complexes. Subsequent quantum-chemical calculations, as well as physicochemical/toxicological profiling were performed to investigate their metal-binding ability and developability of the designed sensors. Finally, synthesis schemes are proposed to obtain these three ligands with major efficiency from simple resources. The three coumarins designed clearly demonstrated capability to be suitable as good florescent chemosensors towards heavy metals. Overall, the computational methods applied in this study showed a very good performance as useful tools for designing novel fluorescent probes and assessing their sensing abilities.


Introduction
The aggressive increase of industrial areas without proper waste management systems along rivers has created a huge source of water pollution [1][2][3]. Industrial wastewater normally consists of toxic chemicals and pollutants, which are extremely harmful to people Nitrogen-containing ligands (mainly diamides of pyridine and bipyridine acids) Kanahashi et al. (2022) [35] Gaussian process regression; Best model ( In this work, for the first time, several QSPR models were developed in order to predict both the stability constant (β) and the potentiometric sensitivity (PS) of heterogeneous ionophores towards Cu 2+ , Cd 2+ , and Pb 2+ ions simultaneously. The structure-property relationships (SPRs) extracted from the mathematical models were employed for designing novel coumarin fluorescent probes and assessing their sensing abilities. Finally, the newly designed ligands were evaluated for stability in complex by using quantum chemical calculations. The developability of the proposed chemosensors was also assessed by analyzing their toxicological profiles and effective organic synthesis routes.
The stability constant (β) shows how stable the complex formed between the metal and ligand is. The complex is more stable when this index rises [60]. The equation shown below is for the calculation of β when the ratio of the metal (M): ligand (L) is equal to 1:1.
In this regard, M and L are also considered as Lewis acids and bases, and the complex stability could be explained using the hard/soft acids/bases (HSABs) principle by Pearson.
Potentiometric sensitivity (PS ML ) is determined by the potential difference between reference and indicator electrodes, which are embedded in the individual aqueous solutions of metals in a range of concentrations [40]. It was calculated from the linear parts of the calibration curves obtained from the voltage values following each concentration of ion metal. The higher this index is, the larger the range of voltage potential exhibited. Defining this value will enable us to design more-accurate ion-selective membranes in the future. Each constant was also converted to the logarithm format for developing predictive models.
The ChemDraw 20.1.1 software was used to create the compound structures, which were then translated to the SMILES code so that the Dragon 6.0 software could calculate the molecular descriptors [61]. The data were preprocessed following the same protocols previously published [62,63]. Highly correlated descriptors (R ij > 0.95), constant and near constant (Std. < 0.1), were removed. The data were normalized to avoid bias while training the models. The standard scale normalization of the data was carried out using the Scikit-learn package based on the mean and standard deviation as follows [64]: n−1 (2) in which n is the number of compounds, X j is the mean values of the jth descriptor, and x ij and X ij are the normalized and original values of the jth descriptor of the ith compound. After cleaning the data, they were randomly divided into a training set and a test set for model construction or validation following the protocol implemented in the QSARINS Version 2.2.4 software [65].

Construction of QSPR Models Using Statistical and Machine Learning Techniques
The quantitative structure-properties relationship (QSPR) models in this study were built based on multi-linear regression (MLR) algorithms to predict the value of the logβ and PS ML of the new compounds. In general, MLR can be represented as a linear combination of a subset of X variables (selected descriptor) that best describes the property Y with the regression coefficients αi and the intercept α 0 : Y = α 0 + α 1 X 1 + . . . + α n X n . The optimization approach based on the genetic algorithm (GA) was used for descriptor selection and to calculate the linear parameters of the MLR models [66]. The GA-MLR is based on the principles of evolution in biology, which seeks optimal models by iteratively modifying a set of randomly generated variable combinations using genetic operators, i.e., crossover, mutation, etc. The Q 2 under the influence of K (QUIK) rule in this algorithm enabled us to reduce the dimension of the model by setting a minimum threshold. If (K XY − K XX ) < δK, this model will be excluded. K XY is the correlation between the independent variables Toxics 2023, 11, 595 5 of 26 and the dependent variable; K XX is total of correlation among the descriptors; δK is the threshold that the model trainer selects. This approach helped avoid multicollinearity. This entire procedure was carried out by using the QSARINS v2.2.4 software [65].
In the next steps, machine learning (ML) techniques were applied to examine the non-linear structure-property relationships (SPRs) of the chemical data. To do so, different ML algorithms implemented in the Scikit-learn library were employed using the same variable pools selected by the MLR models for each metal complex. Herein, ten algorithms, including non-linear regression (NR), the Gaussian process (GP), the decision tree (DT), support vector regression (SVR), multilayer perceptron (MLP), the k neighbors regressor (KNR), random forest (FR), the AdaBoost regressor MLR, the AdaBoost regressor DT, and the gradient boosting regressor were used to obtain ML models and then compared to the performance of the GA-MLR model [64]. At this stage, only the best model showing the highest performance in both the cross-validation and external validation was chosen for further screening assays. The runtime parameters corresponding to the ML models developed for each property are provided in Tables S5-S10 (Supplementary Materials). The correlation coefficient (R 2 ) was calculated from the experimental values on the ordinate axis, and models with R 2 < 0.6 were not included in this analysis [67].

Performance Assessment
QSPR models were evaluated through some statistical parameters. The first one is R 2 , which determines the proportion of variance in the dependent variable that can be explained by the independent variable [68] and is calculated by this equation: y i andŷ i are the experimental and predictive values of the dependent variable, respectively; y i is the average value of y i (all of them are in the training set).
However, it should be noted that the MLR model's R 2 increased as the number of variables increased, which gradually resulted in overfitting. Therefore, the quality of the model was assessed using this adjusted parameter [68]: where the n and p parameters are the number of compounds and independent variables in the training set, respectively. In addition, we also checked for the fitness of the MLR models during the evolution process by calculating the Friedman lack-of-fit (LOF) measurement [69]. According to the analysis of Bondarchuk [70,71], this parameter is more resistant to overfitting in comparison to the commonly used least-squares error.
In this form, the SSE is the sum of squares of errors; c is the number of terms in the model; p is the total number of descriptors; N is the number of data rows in the training set; λ is a safety factor; d is a scaled smoothing parameter [70]. Accordingly, a model with a lower LOF value is better. However, there is no universally accepted threshold to determine what constitutes a low LOF value, as this can depend on many factors such as the nature of the data, the quantity of data, and the specific algorithm used. Therefore, to determine whether this value is acceptable or not, we performed the F-test of the LOF, taking into consideration the p-values (<0.01) as the F-statistic for the LOF. In this sense, a p-value < 0.01 indicates that the LOF of selected model is significantly lower than the other models, suggesting a better-fitting model. adj merely demonstrate the goodness-of-fit of QSPR models. Cross-validation strategies such as leave-one-out (LOO) and leave-many-out (LMO) were applied to ensure the model's stability [72]. The result from the LOO approach is the cross-validation correlation coefficient Q 2 LOO , calculated by this formula: y i is the experimental property of the ith compound, andŷ i/i is the predictive one from the model built by n − 1 remaining compounds (not including compound i).
LMO was also performed to overcome the disadvantages of the LOO method, which might exaggerate the predictive power of the model [73]. Then, 70% of the training set was randomly selected to construct the model for the prediction of the remaining 30%. This process was repeated 2000 times and the Q 2 value calculated each time. These values should be close to Q 2 LOO and to their mean (Q 2 LMO ). Another issue might come out that the model accidentally has the accuracy of prediction. To exclude this possibility, Y-randomization was applied by randomly rearranging the values of the dependent variables. Calculate R 2 Y, Q 2 Y each time 2000 times. These values should be far from the original value of the models.

External Validation
The test set was used to validate the predictive ability of the model by calculating the indexes below [65,73]: where y i andŷ i/i are the experimental and predictive values of the property of the compound i th , respectively; y Ext is average of the experimental values of the property in the training set; n Ext and n Tr are the number of compounds in the test set and training set, respectively. In general, if R 2 > 0.6 and Q 2 > 0.5, the model is considered to be a good one. The closer R 2 and Q 2 are to 1, the better the obtained models are.

Applicability Domain
The QSAR/QSPR models need to determine the applicability domain (AD). AD is an area defined by the molecular descriptors and response variables of the training set. We calculated the leverage value to determine if a substance belonged to the AD following this equation [74]: h i is the leverage value of the ith compound; x i is the molecular descriptor of the compound; X is matrix including k variables (k columns) of n compounds (n rows).
The AD is limited by a line of value h* = 3p'/n with p' as the number of variables of the model plus 1 and n as the number of compounds in the training set. If a compound has h i > h* and the predicted value is outside of three-times the standard residual, it may not be predicted well by the model, or the predictive value is not reliable in this circumstance [63].

Design of Novel Coumarin-like Structures
Coumarin fluorescent probes have received significant interest in recent years due to their favorable fluorescence properties, including a high fluorescence quantum yield, long emission wavelengths, and photostability [75]. In this study, several coumarin derivatives were designed based on the precursor probes previously reported using the Marvin Sketch R-group attachment module, ChemAxon (http://www.chemaxon.com, accessed on 5 November 2022). These scaffolds showed very good selectivity in the recognition of most competing metal ions, including Cd 2+ , Cu 2+ , and Pb 2+ . By modifying the groups that have a different effect on the efficiency of charge transfer and the electron configuration of the benzene and naphthalene rings of the coumarin derivatives, it is expected that compounds will be identified with the enhancement of the fluorescence intensity towards all three metals.

Quantum Chemical Calculations
Density functional theory (DFT) calculations were performed for the ground-state gas-phase geometry optimization of the metal complexes using Beck's three-parameter exchange functional Lee-Yang-Parr (B3LYP) methods combined with the LanL2DZ basis set in the gas phase [76]. The consensus measurements were considered for estimating the stability of the complexes formed. The calculations of the newly designed compounds were performed by using the GAMESS quantum chemistry package, and the wide range of parameters were analyzed including the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO) energies, among others [77]. Besides computing the HOMO-LUMO gap ∆E = E LUMO − E HOMO , the Mulliken electronegativity χ and hardness η values could be extracted based on the finite difference approximation, viz. χ = (IE + EA)/2 and η = (IE − EA)/2, where IE is the ionization energy or ionization potential and EA is the electron affinity of the ionic states of the complex system. Based on the semiquantitative approach proposed by Zhan et al., IE, EA, χ, and η can be estimated based on the calculated HOMO and LUMO energies [78]. Other properties of interest included the global softness σ = 1/η, which measures the extent of chemical reactivity, and the chemical potential of the complex system (µ), which is the first derivative of the total energy with respect to the number of electrons [79]. Parr and colleagues proposed a new global chemical reactivity parameter, namely an electrophilicity index (ω), which may be calculated via the equation ω = µ 2 /2η [80]. In order to take into account the diffuse characteristics of the molecular orbitals of heavy elements with a large electron density, all of these derived parameters were made at the B3LYP/LANL2DZ level.

Physicochemical and Toxicological Profiling
Given the broad application of the designed coumarin fluorescent probes, especially in an aqueous medium, it is of pivotal importance to determine the possible effects of the chemosensors for human and aquatic organism health. Important physicochemical properties mainly governing the aqueous solubility, organism permeability, and in vivo dermal and oral bioavailability were calculated using consensus cheminformatics approaches such as SwissADME [81] and ADMETlab 2.0 [82], among others [83][84][85][86]. To do so, chemical SMILES codes were used as the input of the computational tools, and the results were extracted based on more than 100 indices calculated previously. In addition, multiple toxicological endpoints of the newly developed sensors towards aquatic organisms were also investigated using the same tools [81,82].

Theoretical Proposal of Synthetic Accessibility for Promising Candidates
The prediction of the synthetic accessibility (SA) of the potential probe candidates was carried out using the GASA [87], ADMETlab 2.0 [82], and SwissADME tools [81]. GASA employs a rapid graph-based molecular synthetic accessibility prediction method, which has been demonstrated to be comparable to the existing methods, such as SYBA, SCScore, RAscore, and SAscore, among others. Based on the probability of the predictions, compounds can be classified as easy-(ES) or hard-to-synthesize (HS) structures [87]. This model was trained on 640,000 compounds and achieved an accuracy of 0.932 with an AUC of 0.984. In ADMETlab 2.0, the SA scores can be predicted with~90% accuracy using the fragment-based approach developed by Ertl and Schuffenhauer [88], on the basis of 934,046 representative molecules from the PubChem database. In addition, the SwissADME tool allows calculating the SA scores based on a fragmental contribution approach. This method consists of the analysis of nearly 13 million molecules with the hypothesis that the more frequent a chemical fragment, the easier it is to synthesize that molecule [81]. The SA scores computed by SwissADME receive a value ranging from 1 (very easy synthesis) to 10 (very difficult synthesis).

Development and Selection of GA-MLR QSPR Models
A total number of six MLR models were constructed based on the experimental logβ and PS data. Applying an extensive feature selection process of >105 iterations by the GA, 90% of the new generations showed the same fitness results. The best MLR models were selected from the final iteration taking into account Austin's criteria (5 × number of variables ≤ number of observations) to avoid the overfitting problem [66,89]. The best models were further analyzed and compared based on the quality of the statistical parameters, as shown in Table 2. The mathematical formulas of the six models are given by Equations (10) to (15), and more details about the variables selected can be found in Logβ PbL = 4.6561 − 1.1628 × nR05 − 6.1855 × J_Dt + 6.4076 × ATSC1s In general, all six models presented a goodness-of-fit with an R 2 and R 2 adj of approximately 0.9, except for the logβ PbL -only models, whose values were~0.8 ( Figure 1). According to the Friedman LOF parameters, the logβ model of the Cu complex showed the highest value of 7.05, and the lowest value corresponded to the logβ model of Pb (1.35). As can be seen in Table 2, the F-statistics for the F-test of the LOF were highly significant (p-value ranging from 1.095 × 10 −19 to 3.087 × 10 −7 ), suggesting an acceptable level of the LOF. From this test, we can confirm that no overfitting existed in the models, as they presented a good fit with a minimum number of descriptors. Considering the internal validation, the values of Q 2 LOO for almost all models were greater than 0.8 with the model MLR-logβ PbL being the lone exception, owning an approximate Q 2 value of 0.7. For the LMO approach, the Q 2 LMO values after each iteration showed a slight change, being around the Q 2 LOO of the primary model (Mod. Q 2 ) (see Figure S1). This indicated that the Q 2 LOO index in these models does not overate their predictive ability. In addition to the above criteria, the optimal model was chosen based on the lower level of R 2 −Q 2 LOO , the correlation among the independent variables (as low as possible), and the high correlation between the dependent variables and independent variable (high δK values). Alternatively, Y randomization also demonstrated that the accuracy of these models was not an accidental correlation. We found that R 2 , Q 2 , and K XY in each iteration were less than the original model ( Figure S2). The autocorrelation between the descriptors was checked through the Pearson correlation matrix and the correlation among the descriptors (K XX ). As the results show in Figures S3-S8, no significant correlation between the variables was detected, as the average values of the pairwise correlation ranged from 0.25-0.48. All the descriptor values calculated by the Dragon software for each of the training and test set compounds included in the six databases are provided in Tables S21-S26. In the external validation, the Q 2 F1 , Q 2 F2 , and Q 2 F3 of the six models were almost higher than 0.8, except the Pb 2+ models with Q 2 F1 values between 0.6 and 0.7, which are still acceptable according to our criteria ( Table 2). From the results, all the MLR models were able to predict the logβ and PS ML values of the unknown structures with high accuracy.
The applicability domain of each model is shown in Figure 2. If a new chemical is to be predicted, it must fall within the AD of the model. The outliers of the domain included two types: Y outlier and X outlier. The Y outlier indicates that the predictive value is not unreliable, and the X outlier shows that these compounds should not be applied to the model. There were no Y outliers in all six plots, so the quality of these models was ensured. Regarding the X outlier, there were a few compounds that are outside of the domain. Therefore, we need to pay more attention to these structures [63].

Machine Learning Algorithms for Improving Model Performance
As can be revealed from the validation of the MLR models, the statistical linear algorithm still showed weak performances; for example, the models to predict the sensing properties of the Pb 2+ complexes only had a testing Q 2 of 0.64-0.71 (Table 2). To overcome this challenge, machine learning (ML) methods were applied using the same variable subset selected by the GA-MLR. A wide range of 10 ML algorithms were explored adopting the Scikit-learn package in Python using the same script we developed previously [64]. To optimize the model performance, a parameter tuning search was applied, and the entire results are listed in Tables S5-S10, the Supplementary Materials. Note that only models showing an accuracy higher than 0.6 were retained for this analysis, and the models with the highest performance were used for further virtual screening assays. Figure 3 displays the comparison histogram of every ML model against the best MLR models obtained previously. In general, there was a large dissimilarity among the modeling techniques, and the ML algorithms normally outperformed the simple linear models in both k-fold cross-validation on the training set and the external test set. A detailed comparison of all the selected regressors is shown in Tables S12-S17, and the resulting runtime parameters of the selected models are given in Table 3. The applicability domain of each model is shown in Figure 2. If a new chemical is to be predicted, it must fall within the AD of the model. The outliers of the domain included two types: Y outlier and X outlier. The Y outlier indicates that the predictive value is not unreliable, and the X outlier shows that these compounds should not be applied to the model. There were no Y outliers in all six plots, so the quality of these models was ensured.
Regarding the X outlier, there were a few compounds that are outside of the domain. Therefore, we need to pay more attention to these structures [63].

Machine Learning Algorithms for Improving Model Performance
As can be revealed from the validation of the MLR models, the statistical linear algorithm still showed weak performances; for example, the models to predict the sensing properties of the Pb 2+ complexes only had a testing Q 2 of 0.64-0.71 (Table 2). To overcome this challenge, machine learning (ML) methods were applied using the same variable subset selected by the GA-MLR. A wide range of 10 ML algorithms were explored adopting the Scikit-learn package in Python using the same script we developed previously [64]. To optimize the model performance, a parameter tuning search was applied, and the entire   Regarding the models to predict the stability constant (logβ) values of the Cd 2+ -ligand complexes, the histogram shown in Figure 3 indicated that the accuracy of the AdaBoost regressor MLR and non-linear regression were comparable and higher than other models, being 0.922/0.820 and 0.926/0.812 for the training/testing R 2 , respectively. These results were slightly higher than the best MLR model, especially the training process, which resulted in an R 2 value of 0.908. AdaBoost has been described as an ensemble boosting algorithm, which repeatedly combines regressors, i.e., MLR or decision trees (DTs), on the same training set with a special focus on the error of the current prediction and the difficult cases. Meanwhile, the non-linear regression algorithm, unlike the statistical linear estimator, allows for a more complex relationship between the dependent and independent variables [90]. Herein, three functions were tested, including the polynomial, logarithmic, and exponential regressions. Given the importance of external testing, the AdaBoost regressor MLR was selected for further use in the new case prediction.
For the potentiometric sensitivity (PS ML ) data of the Cd 2+ -ligand complexes, seven ML models are shown, the gradient boosting, random forest, and AdaBoost DT being the regressors with the highest accuracy for the training set (R 2~1 .0). However, only gradient boosting and the AdaBoost DT significantly outperformed the other models with a testing R 2 of 0.948 and 0.941, respectively. Unlike AdaBoost, gradient boosting uses the gradient decent algorithm to add the weak learners in a stagewise manner so that the overall prediction error significantly decreases. Consequently, the gradient boosting model was selected for the PS ML value prediction of the Cd 2+ -ligand complexes.
In contrast to the Cd 2+ complex data, most of the ML techniques showed good performance on the Cu 2+ logβ and PS ML data. For the logβ data, the AdaBoost regression DT significantly outperformed the other models on the training (R 2 = 0.984), but only obtained an accuracy of 0.721 on the test set, which was much lower than the MLR model. In this case, MLR, even having a slightly lower coefficient for the training set (R 2 = 0.89) in comparison with the other ML models (training R 2 = 0.90-0.91), surprisingly exhibited the highest accuracy on the test set. We then selected the linear model MLR for further prediction of the Cu 2+ logβ data.
The results of the ML models in predicting the PS data of the Cu 2+ -ligand complexes were somewhat the same situation as the Cu 2+ logβ data. However, we identified support vector regression (SVR) as the best model, owning an accuracy of 0.884 and 0.901 for the training and test sets. In this case, MLR was the model with the lowest performance on both the training (training R 2 = 0.867) and test sets (average testing R 2 = 0.86). More details on the SVR algorithm implemented in the Scikit-learn package can be found in the literature [64]. This model was subsequently used in the next screening assays based on the Cu 2+ PS ML data.
The accuracy results from the Pb 2+ logβ and PS ML data were very encouraging. We found significant improvement when using ML techniques in comparison with the MLR model. For the Pb 2+ logβ data, the gradient boosting regressor once again showed the highest performance, owing to the determinant coefficient for the training and test sets of 1.0 and 0.847, respectively. Another ML algorithm that also performed well on these data was multi-layer perceptron (MLP) with a training and testing accuracy of 0.903 and 0.853, respectively; however, the training results were lower than gradient boosting. Consequently, the gradient boosting model was without a doubt selected for further prediction of the Pb 2+ logβ data.
For the Pb 2+ PS ML data, the AdaBoost regressor MLR displayed high performance with the accuracy on the training and testing sets being 0.922 and 0.816, respectively. One these data, the MLR model also performed well on the training set with an R 2 value of 0.908, but the accuracy on the test set significantly dropped to~0.7, resulting in the worst model in the external validation. Thus, the AdaBoost regressor MLR was selected for predicting the PS ML values of the Pb 2+ -new ligand complexes.

Design of Coumarin-like Chemical Library for Screening of Novel Chemosensors
One of the main applications of QSPR/QSAR approaches is to screen novel active ligands from a large chemical library of a determined scaffold. Bearing in mind that coumarin fluorescence probes for heavy metal ions currently receive increasing interest due to their highly variable size, hydrophobicity, and chelation, in this study, we focused on the design and screening of novel fluorescent probes bearing coumarin moieties [75,91]. In addition, Schiff bases are attracting more and more attention in analytical fields as they provide an electron-rich environment that is favorable to binding with metal ions [92]. When combined with a fluorescence moiety such as coumarin structure, the functional groups can participate in several reactions after being in contact with metal ions so that the Schiff base ligand provides detectable signals such as emission.
In 2017, Fan reported the synthesis of several coumarin-derived Schiff bases, which showed significant fluorescence enhancement owing to inhibiting the photo-induced electron transfer (PET) process [93]. The structures were mainly based on condensing 7-amino-4-methyl-coumarin (AMC) with salicylic aldehyde and 2-hydroxy-1-naphthaldehyde. Fan also evaluated the selectivity of his derivatives upon the addition of several metal ions such as Cd 2+ and Pb 2+ , among others (15 metals). The fluorescent property of these Schiff bases appeared to be dependent on the charge transfer and the electron configuration of the benzene and naphthalene rings. However, Cu 2+ was not tested for the coumarins designed in Fan's study [93]; thus, this core structure was explored in this study.
Going through the literature, we encountered coumarin-pyridin hybrids reported by Jung et al., which showed promising fluorescence-quenching properties in both the femtosecond timeresolved fluorescence (TRF) upconversion technique and ab initio calculations [47]. The authors identified that the n-picolylamide moiety (n = 1-3) allowed efficient tridentate complexation for Cu 2+ preferred over a variety of other common heavy and toxic metal ions. Of interest is also the chemosensor bearing acetyl-7-hydroxycoumarin moiety reported by Chang and colleagues, which exhibited a highly selective and sensitive fluorescence-sensing ability for Cu -+ over other metal ions under physiological conditions [94]. We also consulted the most recent advances in the field from literature reviews [75,91].
Taking all the above-mentioned findings, here, we tried to construct a small library of coumarin derivatives by using a structure-resembling approach using the R-group attachment module implemented int the Marvin Sketch package. As a result, over 129 coumarin structures were created, as shown in Figure 4A,B. All newly designed substances were checked to see if they lied in the range of the AD. Their calculated leverage values (ĥ i ) were all smaller than the threshold value (h*) of all the models, which means that QSPR models could work correctly on the chemical space of the designed compounds as they fell into the AD of the models. sensitive fluorescence-sensing ability for Cu -+ over other metal ions under physiological conditions [94]. We also consulted the most recent advances in the field from literature reviews [75,91]. Taking all the above-mentioned findings, here, we tried to construct a small library of coumarin derivatives by using a structure-resembling approach using the R-group attachment module implemented int the Marvin Sketch package. As a result, over 129 coumarin structures were created, as shown in Figure 4A,B. All newly designed substances were checked to see if they lied in the range of the AD. Their calculated leverage values ( i h ) were all smaller than the threshold value (h*) of all the models, which means that QSPR models could work correctly on the chemical space of the designed compounds as they fell into the AD of the models.

Virtual Screening and Complexation Potential Predictions Using DFT Calculations
In the next step, the newly designed library was screened for identifying coumarinlike florescent probes against each heavy metal using the QSPR models. The screening results are presented in Table S18 (Supplementary Materials) in which the designed compounds are denoted as NEW, and the logβ and PS values of each were estimated using the respectively selected models. Of note, a potential probe must have a high logβ and a low PS, which in turn can also be translated into a high PSML −1 . Figure 4 shows the trade-

Virtual Screening and Complexation Potential Predictions Using DFT Calculations
In the next step, the newly designed library was screened for identifying coumarin-like florescent probes against each heavy metal using the QSPR models. The screening results are presented in Table S18 (Supplementary Materials) in which the designed compounds are denoted as NEW, and the logβ and PS values of each were estimated using the respectively selected models. Of note, a potential probe must have a high logβ and a low PS, which in turn can also be translated into a high PS ML −1 . Figure 4 shows the trade-off of the logβ and PS ML −1 predictions, in which compounds closer to the top-right corner indicate better candidates. According to this criterion, we easily identified eight compounds, viz. NEW02, NEW03, NEW07, NEW21, NEW26, NEW34, NEW51, and NEW80, as potential florescent probes. Their logβ and PS values estimated by all the QSPR models are shown in Table 4, and the chemical SMILES of these eight coumarins are listed in Table S19. Especially, compounds NEW02, NEW03, and NEW07 were consistently predicted to be good ligand candidates in complex with all three heavy metal ions. Resulting from the prediction of the complexing characteristics, three florescent probe candidates were revealed, and they were subsequently examined for their interaction with Cd 2+ , Cu 2+ , and Pb 2+ using DFT calculations with B3LYP exchange functionals. To simplify the calculations, the stoichiometry of the complexes was adopted to be in a 1:1 (M:L) metal-and-coumarin-ligand molar ratio. In accordance with the literature, given the availability of a lone pair on the nitrogen atom, the Schiff base could form mono complexes with many metals; meanwhile, adding other substitutions, such as OH and SH, to the adjacent heterocycles may result in the formation of bidentate or tridentate chelates. This fact could be revealed by observing the partial distribution of the charge on the electrostatic potential surface (ESP) shown in Figure 5. This electrostatic potential map for all three compounds clearly revealed hydrophilic regions, including negative (red) and positive (blue) potentials, and hydrophobic regions, which are normally neutral (mainly the green area). The optimized geometric structures of the coumarin compounds computed via the B3LYP method showed that the reactive sites corresponding to the negative potentials were mainly localized on the O and S atoms, and two compounds, NEW02 and NEW07, showed higher polarity compared to NEW03. The total electron density of the studied derivatives varied from −0.31 to +0.29.
The quantum chemical parameters can provide insight into the chemical nature of the reaction mechanisms [95]. Through the DFT calculations at the B3LYP/LanL2DZ levels of theory, numerous descriptors corresponding to six metal complexes were determined as summarized in Table 5. Figure 6 shows the optimized structures of the metal-coumarin complexes and the corresponding distances of the π contacts between the ligands and metals. The molecular frontier orbitals (HOMO-LUMO) consisted of important descriptors mainly associated with the molecular reactivity.  The quantum chemical parameters can provide insight into the chemical nature of the reaction mechanisms [95]. Through the DFT calculations at the B3LYP/LanL2DZ levels of theory, numerous descriptors corresponding to six metal complexes were determined as summarized in Table 5. Figure 6 shows the optimized structures of the metal-coumarin complexes and the corresponding distances of the π contacts between the ligands and metals. The molecular frontier orbitals (HOMO-LUMO) consisted of important descriptors mainly associated with the molecular reactivity.   The E HOMO descriptor is related to the ability of the molecules to donate electrons to empty molecular orbitals with low energy, and the E LUMO descriptor indicates the electron-accepting affinity [96]. The interacting parts of the molecules were also identified by looking at the HOMO and LUMO visualization (Figure 7). Their different HOMO-LUMO ∆E represent the hard-soft, acid-base characteristics of the molecule. In such a way, the low gap energies identified in this study reflected the high chemical reactivity, optical polarizability, and low kinetic stability of the three metal complexes [97]. In other words, a molecule with a higher HOMO and a lower LUMO energy level has a higher chemical reactivity. Thus, for the Cd 2+ complexes, the energy gap followed the order: NEW03 < NEW02 < NEW07 (see Table 5), indicating that NEW03 had the lowest reactivity (the most-chemically stable) followed by NEW03 and NEW07 (the most-reactive). Likewise, the HOMO-LUMO gap for the Cu 2+ complexes were ranked as NEW03 < NEW07 < NEW02, and those for the Pb 2+ complexes were similar to the Cd 2+ complexes. In addition, the electron affinity (EA) and electronegativity (χ) of all six complexes were quite low compared to the ionization energy (IE). The current molecular orbital analysis also showed that the level of the frontier orbitals and low-energy electronic transitions were almost similar, suggesting the low energy intra-ligand charge transfer states of the metal complexes [95,97]. The EHOMO descriptor is related to the ability of the molecules to donate electrons to empty molecular orbitals with low energy, and the ELUMO descriptor indicates the electronaccepting affinity [96]. The interacting parts of the molecules were also identified by looking at the HOMO and LUMO visualization (Figure 7). Their different HOMO-LUMO ΔE represent the hard-soft, acid-base characteristics of the molecule. In such a way, the low gap energies identified in this study reflected the high chemical reactivity, optical polarizability, and low kinetic stability of the three metal complexes [97]. In other words, a molecule with a higher HOMO and a lower LUMO energy level has a higher chemical reactivity. Thus, for the Cd 2+ complexes, the energy gap followed the order: NEW03 < NEW02 < NEW07 (see Table 5), indicating that NEW03 had the lowest reactivity (the mostchemically stable) followed by NEW03 and NEW07 (the most-reactive). Likewise, the HOMO-LUMO gap for the Cu 2+ complexes were ranked as NEW03 < NEW07 < NEW02, and those for the Pb 2+ complexes were similar to the Cd 2+ complexes. In addition, the electron affinity (EA) and electronegativity (χ) of all six complexes were quite low compared to the ionization energy (IE). The current molecular orbital analysis also showed that the level of the frontier orbitals and low-energy electronic transitions were almost similar, suggesting the low energy intra-ligand charge transfer states of the metal complexes [95,97]. Another descriptor of pivotal interest is the chemical hardness, η, which refers to the resistance of a chemical species to the polarization of the electron cloud. It gives information, along with the electronegativity of the chemical reactivity and stability [79,95,96]. The softness is the reciprocal value of hardness. Previous studies demonstrated that most hard species have an electrostatic interaction, and soft species mainly interact through Another descriptor of pivotal interest is the chemical hardness, η, which refers to the resistance of a chemical species to the polarization of the electron cloud. It gives information, along with the electronegativity of the chemical reactivity and stability [79,95,96]. The softness is the reciprocal value of hardness. Previous studies demonstrated that most hard species have an electrostatic interaction, and soft species mainly interact through covalent bonding, e.g., mixing of orbitals. Of note, the global softness value increased from NEW07 to NEW03, while the chemical hardness and HOMO-LUMO gap values decreased. Evidently, the Schiff bases gained softer characteristics from NEW03 to NEW07. It was revealed that the stability parameters obtained in the QSPR predictions and the rankings obtained in the quantum chemical calculations were in concordance (Tables 4 and 5). In addition, Cu 2+ and Cd 2+ had higher electron donor characteristics (α) compared to Pb 2+ , while Cu 2+ and Pb 2+ showed higher basicity characteristics (β) compared to Cd 2+ . The softness parameter σ can also be calculated via equation σ = α/(α + β), which follows the ranking Cd 2+ > Cu 2+ > Pb 2+ (σ = 0.96 > 0.89 > 0.85, respectively). Thus, according to the Pearson classification for Lewis acids, Cu 2+ and Pb 2+ fall within the hard/soft borderline and Cd 2+ is the softest ion. As soft acids prefer to bind to soft bases, compound NEW03, therefore, can easily interact with the softer Cd 2+ charged ions, but not with the harder Pb 2+ ion. As soft metals normally have a filled d subshell, according to Lancashire (https://chem.libretexts.org/, assessed on 3 March 2023), the π interactions play an important role in metal-to-ligand bonding. Complexes of three coumarins with softer metals are more stable based on the electrostatic interactions [95,97].

Physicochemical and Toxicological Profiling
Given the broad applications in the analytic field, a well-organized chemosensor must be non-toxic to living organisms when it is applied in aqueous environments [15]. Therefore, in this study, the toxicological profiles of the newly designed compounds were assessed using several cheminformatics tools, including SwissADME and ADMETlab 2.0 [81,82].
First, the physicochemical properties that may affect the interactions between chemicals and living organisms were computed for NEW02, NEW03, and NEW07. The radar chart shown in Figure 8 illustrates the multivariate data of all three coumarins compared to the physicochemical profiles of the drug-like chemical space [82]. Being small molecules (MW range from 295.3-327.5), most of the physicochemical properties of these compounds fell into the drug-like chemical space, except the lipophilicity indices (logP and logD). This means that these compounds may have a drug-like behavior including appropriate pharmacokinetic, as well as low toxicological characteristics in the human body [98]. Based on the consensus predictions of SwissADME (Table S20), the solubility of NEW02 and NEW03 was predicted to be low (logS < −6.0 mol/L), while NEW07 had higher solubility (logS = −5.4 mol/L). According to the combination of the three properties (MW, PSA, and logD 7.4 ) as established in 3PRule [86], these coumarins may transport across the biological membrane via passive diffusion with moderate-to-high permeability. The oral bioavailability of these compounds was predicted to be <20%, and the brain disposition of these chemicals appeared to be low due to their low BBB passage [83].
Subsequently, the toxic effects on the human body were forecasted using both cheminformatics and toxicophore approaches. The results shown in Table 6 suggested that all three coumarins have no noticeable effects during exposure to the human body, especially on the skin. The oral acute toxicity was quite low, but for eye contact, they all can cause irritation [75,82]. In addition, biodegradability has been considered fundamental to the assessment of the environmental exposure and risk of chemical products. As can be seen in Table 6, there was no alert fragment in the structures of three coumarins, meaning that all the chemicals are biodegradable and will have no long-term toxic effect on the environment [82].
priate pharmacokinetic, as well as low toxicological characteristics in the human body [98]. Based on the consensus predictions of SwissADME (Table S20), the solubility of NEW02 and NEW03 was predicted to be low (logS < −6.0 mol/L), while NEW07 had higher solubility (logS = −5.4 mol/L). According to the combination of the three properties (MW, PSA, and logD7.4) as established in 3PRule [86], these coumarins may transport across the biological membrane via passive diffusion with moderate-to-high permeability. The oral bioavailability of these compounds was predicted to be <20%, and the brain disposition of these chemicals appeared to be low due to their low BBB passage [83]. Subsequently, the toxic effects on the human body were forecasted using both cheminformatics and toxicophore approaches. The results shown in Table 6 suggested that all three coumarins have no noticeable effects during exposure to the human body, especially on the skin. The oral acute toxicity was quite low, but for eye contact, they all can cause irritation [75,82]. In addition, biodegradability has been considered fundamental to the assessment of the environmental exposure and risk of chemical products. As can be seen in Table 6, there was no alert fragment in the structures of three coumarins, meaning that all the chemicals are biodegradable and will have no long-term toxic effect on the environment [82].
Finally, the ecotoxicological characteristics were assessed using cheminformatics tools implemented in ADMETlab 2.0. The results indicated that they had irrelevant  Finally, the ecotoxicological characteristics were assessed using cheminformatics tools implemented in ADMETlab 2.0. The results indicated that they had irrelevant environmental toxicity, especially for three types of aquatic organisms, e.g., tetrahymena pyriformis (TP), fathead minnow (FM), and daphnia magna (DM). These toxic endpoints are a mandatory preliminary assessment of chemicals' acute toxicity on aquatic species, according to the European regulation on the Registration, Evaluation, Authorization and Restriction of Chemicals (REACH) [99]. As can be seen in Table 6, the lethal concentrations 50% (log unit) were determined to be from −4.5 to −6.6 mg/L, and NEW03 exhibited the lowest aquatic toxicity among the three compounds. The overall results obtained herein confirmed the appropriate organo-kinetic and biosafety level of the designed coumarins against human and environmental health, suggesting their suitability for industrial production as florescent chemosensors.

Theoretical Synthetic Routes Proposed for Designed Coumarins
Given the sensing characteristics' predictions provided by the QSPR and quantum chemical calculations, it is desirable to obtain the florescent probes with the most potential for the experimental evaluations. Even though chemical synthesis was out of the scope of the current study, a proposal of the theoretical synthetic routes may be very helpful to assess the developability of the proposed structures, as well as to serve as a guidance for further developing novel chemosensors. To this end, three coumarins, NEW02, NEW03, and NEW07, were selected for the synthesis predictions.
The theoretical synthetic routes proposed for NEW02 and NEW03 start with the preparation of methyl (3-hydroxyphenyl)carbamate (I) by acylation of the amino group in m-aminophenol with methyl chloroformate in the presence of a base (e.g., potassium bicarbonate). Subsequently, 7-carbethoxyamino-4-methylcoumarin (II) can be synthesized via Pechmann condensation with acetoacetic ester in sulfuric acid. Accordingly, the synthesis of AMC from II can be performed by heating II at reflux in H2SO4 (98%) and glacial acetic acid for around 5 h. After adjusting to a low temperature and to pH ~ 7, the precipitate is then dissolved in the ethanol solvent. The resulting ethanol solution can be used to obtain the target compounds (NEW02 or NEW03) by adding to other ethanol solutions containing 2-mercaptobenzaldehyde (IIIa) or 2,4-mercaptobenzaldehyde (IIIb), then the mixture must be refluxed for 10-12 h under stirring. Once finished, the end product is cooled to room temperature, and after that, it can be collected by filtration, dried, and purified by recrystallization from DMF and ethanol. Considering the previous reports [91], the intermediate AMC can be formed in good yield (~90%); thus, the recrystallized form of NEW02 and NEW03 can be produced in an overall yield greater than 50%. The preparation of the remaining coumarin NEW07 based on the AHMC derivative is described by Scheme 2. The theoretical synthetic route is composed of four steps, starting with the formation of 8-hydroxy-4-methylcoumarin (IV) via a Pechmann condensation by reacting pyrocatechol and ethyl 3-oxobutanoate. Subsequently, nitration of IV with a mixture of nitric acid and sulfuric acid can lead to a separable mixture of both 7-and 8nitrocoumarin derivatives. There are several methods proposed in the literature to reduce o-nitrohydroxy-4-methylcoumarins using different reducing agents [91,94]. In this study, 7-amino-8-hydroxy-4-methylcoumarin (AHMC) was obtained by the reduction of 8-hydroxy-4-methyl-7-nitrocoumarin (V) using zinc chloride in hydrochloric acid and ethyl acetate to achieve a better yield. Finally, the target compound NEW07 can be obtained by reductive amination of AHMC with 2-hydroxybenzaldehyde using sodium cyanoborohydride in methanol and acetic acid (Scheme 2). The theoretical synthetic routes proposed for NEW02 and NEW03 start with the preparation of methyl (3-hydroxyphenyl)carbamate (I) by acylation of the amino group in m-aminophenol with methyl chloroformate in the presence of a base (e.g., potassium bicarbonate). Subsequently, 7-carbethoxyamino-4-methylcoumarin (II) can be synthesized via Pechmann condensation with acetoacetic ester in sulfuric acid. Accordingly, the synthesis of AMC from II can be performed by heating II at reflux in H 2 SO 4 (98%) and glacial acetic acid for around 5 h. After adjusting to a low temperature and to pH~7, the precipitate is then dissolved in the ethanol solvent. The resulting ethanol solution can be used to obtain the target compounds (NEW02 or NEW03) by adding to other ethanol solutions containing 2-mercaptobenzaldehyde (IIIa) or 2,4-mercaptobenzaldehyde (IIIb), then the mixture must be refluxed for 10-12 h under stirring. Once finished, the end product is cooled to room temperature, and after that, it can be collected by filtration, dried, and purified by recrystallization from DMF and ethanol. Considering the previous reports [91], the intermediate AMC can be formed in good yield (~90%); thus, the recrystallized form of NEW02 and NEW03 can be produced in an overall yield greater than 50%.

Conclusions
Heavy metal contamination in wastewater from industrial activities is escalating at alarming levels, and it has now become a major global problem. There has been a great necessity to effectively develop analytical methods for the identification and quantification of heavy metals in the aquatic medium. Florescent chemosensors have attracted major attention of researchers in this field due to their significant advantages related to visual simplicity, low cost, high sensitivity, good selectivity, and rapid response compared to the other methods. In assisting the rational design of potential chemosensors, computational approaches, e.g., the QSPR and DFT, have been widely developed and applied for a variety of heavy metals over the last few decades. In this study, a number of QSPR models were developed using the GA-MLR statistical and 10 machine learning techniques on a large database of ~200 florescent chemosensor structures in complexes with three heavy metals (Cu 2+ , Cd 2+ , and Pb 2+ ) previously reported. The modeling process clearly confirmed the ability of ML algorithms to improve the predictive performance of computational models. Six models based on GA-MLR, the AdaBoost regressor MLR, gradient boosting, and support vector regression, were selected and showed a good goodness-of-fit, robustness, and predictivity with the cross-validation accuracy ranging from >0.9 to 1.0, and those for external validation ranged from >0.8 to 0.95. It is worth noting that, so far, ML techniques have not been commonly used, then the predictive ability of the previously published studies appeared to be limited.
Subsequently, the developed computational models were applied for screening a virtual database of 129 coumarin-like structures, which were automatically generated based on seven potential florescent probes formerly reported in the literature. Interestingly, three novel simple compounds, namely NEW02, NEW03, and NEW07 were discovered with simultaneously high values of the stability constant and potentiometric sensitivity against three metal ions, Cu 2+ , Cd 2+ , and Pb 2+ . The QSPR predictions were then in good agreement with the quantum calculations using DFT/B3LYP approaches. All the HOMO-LUMO levels and other reactivity descriptors were in the same rankings as the QSPR approaches. To go further with the developability of the newly designed coumarin derivatives as chemosensors, their organo-kinetic and toxicological profiles were examined by computing numerous physicochemical and several toxic endpoints against human and environmental health using several cheminformatics tools. The results once again confirmed their safety characteristics for both humans and other species in aquatic ecosystems. Finally, based on retrosynthesis analysis with computational tools, the theoretical synthetic proposal for the three coumarins designed was successfully provided. The compounds can be synthesized by an inexpensive and simple method with good yield. Based on the aforementioned findings, the newly designed coumarins could be considered for further development stages as potential candidates as chemosensors against three metal ions, Cu 2+ , Cd 2+ , and Pb 2+ . The general workflow presented herein is promising, and we believe it can be applied to assist the development of novel chemosensors for screening multiple heavy metals' contamination, which is one of the major environmental problems facing our planet today.
Supplementary Materials: The following Supporting Information can be downloaded at: www.mdpi.com/xxx/s1, Figure S1: Scatter plots of LMO validation approach; Figure S2 The preparation of the remaining coumarin NEW07 based on the AHMC derivative is described by Scheme 2. The theoretical synthetic route is composed of four steps, starting with the formation of 8-hydroxy-4-methylcoumarin (IV) via a Pechmann condensation by reacting pyrocatechol and ethyl 3-oxobutanoate. Subsequently, nitration of IV with a mixture of nitric acid and sulfuric acid can lead to a separable mixture of both 7-and 8-nitrocoumarin derivatives. There are several methods proposed in the literature to reduce o-nitrohydroxy-4-methylcoumarins using different reducing agents [91,94]. In this study, 7amino-8-hydroxy-4-methylcoumarin (AHMC) was obtained by the reduction of 8-hydroxy-4-methyl-7-nitrocoumarin (V) using zinc chloride in hydrochloric acid and ethyl acetate to achieve a better yield. Finally, the target compound NEW07 can be obtained by reductive amination of AHMC with 2-hydroxybenzaldehyde using sodium cyanoborohydride in methanol and acetic acid (Scheme 2).

Conclusions
Heavy metal contamination in wastewater from industrial activities is escalating at alarming levels, and it has now become a major global problem. There has been a great necessity to effectively develop analytical methods for the identification and quantification of heavy metals in the aquatic medium. Florescent chemosensors have attracted major attention of researchers in this field due to their significant advantages related to visual simplicity, low cost, high sensitivity, good selectivity, and rapid response compared to the other methods. In assisting the rational design of potential chemosensors, computational approaches, e.g., the QSPR and DFT, have been widely developed and applied for a variety of heavy metals over the last few decades. In this study, a number of QSPR models were developed using the GA-MLR statistical and 10 machine learning techniques on a large database of~200 florescent chemosensor structures in complexes with three heavy metals (Cu 2+ , Cd 2+ , and Pb 2+ ) previously reported. The modeling process clearly confirmed the ability of ML algorithms to improve the predictive performance of computational models. Six models based on GA-MLR, the AdaBoost regressor MLR, gradient boosting, and support vector regression, were selected and showed a good goodness-of-fit, robustness, and predictivity with the cross-validation accuracy ranging from >0.9 to 1.0, and those for external validation ranged from >0.8 to 0.95. It is worth noting that, so far, ML techniques have not been commonly used, then the predictive ability of the previously published studies appeared to be limited.
Subsequently, the developed computational models were applied for screening a virtual database of 129 coumarin-like structures, which were automatically generated based on seven potential florescent probes formerly reported in the literature. Interestingly, three novel simple compounds, namely NEW02, NEW03, and NEW07 were discovered with simultaneously high values of the stability constant and potentiometric sensitivity against three metal ions, Cu 2+ , Cd 2+ , and Pb 2+ . The QSPR predictions were then in good agreement with the quantum calculations using DFT/B3LYP approaches. All the HOMO-LUMO levels and other reactivity descriptors were in the same rankings as the QSPR approaches. To go further with the developability of the newly designed coumarin derivatives as chemosensors, their organo-kinetic and toxicological profiles were examined by computing numerous physicochemical and several toxic endpoints against human and environmental health using several cheminformatics tools. The results once again confirmed their safety characteristics for both humans and other species in aquatic ecosystems. Finally, based on retrosynthesis analysis with computational tools, the theoretical synthetic proposal for the three coumarins designed was successfully provided. The compounds can be synthesized by an inexpensive and simple method with good yield. Based on the aforementioned findings, the newly designed coumarins could be considered for further development stages as potential candidates as chemosensors against three metal ions, Cu 2+ , Cd 2+ , and Pb 2+ . The general workflow presented herein is promising, and we believe it can be applied to assist the development of novel chemosensors for screening multiple heavy metals' contamination, which is one of the major environmental problems facing our planet today.
Supplementary Materials: The following Supporting Information can be downloaded at: https:// www.mdpi.com/article/10.3390/toxics11070595/s1, Figure S1: Scatter plots of LMO validation approach; Figure S2: Y randomization test compared to the original QSPR model; Figures S3-S8: Correlation matrix of molecular descriptors selected in the six QSPR models; Table S1: Database of experimental potentiometric sensitivity (PS) of complexes between Cu 2+ , Cd 2+ , and Pb 2+ ions and compounds; Tables S2-S4: Experimental stability constant (logβ) of the complex between three metals (Cd 2+ , Cu 2+ , Pb 2+ ) towards compounds in the database; Tables S5-S10: Runtime parameters optimized for the ML algorithm predicting the logβ and PS ML of the complex between three metals (Cd 2+ , Cu 2+ , Pb 2+ ) and compounds in the database; Table S11: Molecular descriptors using in the GA-MLR models; Tables S12-S17: Performance comparison between the GA-MLR and ML models; Table S18: logβ and PS ML parameter predictions for virtual the chemical library based on the corresponding QSPR models; Table S19: SMILES code of the 8 new coumarin-like structures and their leverage value (h i ) computed by the 6 QSPR models; Tables S20-S26. Physicochemical and absorption, distribution, metabolism, and excretion (ADME) properties of three candidates based on the SwissADME tool.