QSTR Modeling to Find Relevant DFT Descriptors Related to the Toxicity of Carbamates

Compounds containing carbamate moieties and their derivatives can generate serious public health threats and environmental problems due their high potential toxicity. In this study, a quantitative structure–toxicity relationship (QSTR) model has been developed by using one hundred seventy-eight carbamate derivatives whose toxicities in rats (oral administration) have been evaluated. The QSRT model was rigorously validated by using either tested or untested compounds falling within the applicability domain of the model. A structure-based evaluation by docking from a series of carbamates with acetylcholinesterase (AChE) was carried out. The toxicity of carbamates was predicted using physicochemical, structural, and quantum molecular descriptors employing a DFT approach. A statistical treatment was developed; the QSRT model showed a determination coefficient (R2) and a leave-one-out coefficient (Q2LOO) of 0.6584 and 0.6289, respectively.


Introduction
For decades, pesticides have been widely employed to either prevent, destroy, attract, repel, or control unwanted pests in plant or animal species [1]. There are different types of pesticides considering their chemical structures, e.g., organochlorides, organophosphates, carbamates, and pyrethroids, among others [2]. Particularly, carbamate compounds, derived from carbamic acid, have been used as pesticides to provide broad-spectrum control of insects around the world due to their broad biological activity and low bioaccumulation [3][4][5]. It is known that the residues from carbamates can interact with the human body through the food chain; due to their wide use to prevent, control, or eliminate diseases, insects, and grasses that are harmful in agricultural production, carbamate compounds could pose a threat to human health [6][7][8]. In this vein, to evaluate the toxicity of chemical compounds, the most common method is based on the lethal dose at 50% (LD 50 ), which is the minimal dose that causes death in 50% of individuals in a sample [2].
From a mechanistic view, several studies have shown that the toxicity of carbamate compounds is mediated by the inhibition of the enzyme acetylcholinesterase (AChE), where the first step in the inhibition process involves the formation of an enzyme-inhibitor complex with its subsequent carbamylation by the serine-hydroxyl group, generating a carbamate on the serine residue that is no longer able to hydrolyze the acetylcholine substrate [9].
Using these data, a quantitative structure-toxicity relationships (QSTR) analysis can be used to generate a mathematical model to predict the toxicity of new or unassessed compounds such as carbamates that are structurally related to the model training set [10,11].
Thus, we describe herein, that the analysis of the chemical reactivity of carbamates can generate a reliable model to relate their local and global DFT descriptors with their toxicity via a QSTR study. The QSTR model obtained, describes a relationship between molecular descriptors and toxicity. The methodology presented here is based on the experimental toxicity from a set of carbamates and the subsequent calculation of their molecular descriptors to model the relationship of physicochemical or structural properties with toxicity.

Structure and Optimization
A set of one hundred seventy-eight carbamate-based compounds (Figure 1 and Figure S1) with LD 50 on rats reported in the ChemID database, were optimized by utilizing DFT with the exchange-correlation functional PBE and the 6-311+G* basis set. The optimized structures of all carbamates used in the present study are shown in Figure S2.
Molecules 2022, 27, 5530 2 of 12 a carbamate on the serine residue that is no longer able to hydrolyze the acetylcholine substrate [9]. Using these data, a quantitative structure-toxicity relationships (QSTR) analysis can be used to generate a mathematical model to predict the toxicity of new or unassessed compounds such as carbamates that are structurally related to the model training set [10,11].
Thus, we describe herein, that the analysis of the chemical reactivity of carbamates can generate a reliable model to relate their local and global DFT descriptors with their toxicity via a QSTR study. The QSTR model obtained, describes a relationship between molecular descriptors and toxicity. The methodology presented here is based on the experimental toxicity from a set of carbamates and the subsequent calculation of their molecular descriptors to model the relationship of physicochemical or structural properties with toxicity.

Structure and Optimization
A set of one hundred seventy-eight carbamate-based compounds (Figures 1 and S1) with LD50 on rats reported in the ChemID database, were optimized by utilizing DFT with the exchange-correlation functional PBE and the 6-311+G* basis set. The optimized structures of all carbamates used in the present study are shown in Figure S2.

Electronic Analysis and DFT Descriptors of Carbamates
A frontier molecular orbital analysis was performed to determine the differences in the reactivity of one hundred seventy-eight carbamates. The chemical structures and graphs of the HOMO and LUMO frontier molecular orbitals were studied, and some examples are displayed in Figure 2; the full set of frontier molecular orbitals is shown in Figure S2. Frontier molecular orbital analysis of all carbamates revealed that HOMO orbital is mainly located in the aromatic ring with a small contribution towards the adjacent

Electronic Analysis and DFT Descriptors of Carbamates
A frontier molecular orbital analysis was performed to determine the differences in the reactivity of one hundred seventy-eight carbamates. The chemical structures and graphs of the HOMO and LUMO frontier molecular orbitals were studied, and some examples are displayed in Figure 2; the full set of frontier molecular orbitals is shown in Figure S2. Frontier molecular orbital analysis of all carbamates revealed that HOMO orbital is mainly located in the aromatic ring with a small contribution towards the adjacent oxygen and nitrogen atoms of the carbamoyl group; on the other hand, when the carbamates lacks of an aromatic ring, the HOMO was mainly located on the oxygen and the adjacent atoms of the carbonyl group of the carbamate moiety, while the LUMO orbital is mainly located on the conjugated carbon atoms and on the carbon atom of the carbamate moiety, which suggests that these sites are labile and allow a nucleophilic attack by the AChE enzyme [1].
To evaluate the relationship between the chemical reactivity and toxicity of the carbamates, DFT calculations were performed. The values of the electron affinity (A), ionization potential (I), chemical potential (μ), hardness (η), electrophilicity (ω), and Hirshfeld charges of the carbon atom in the carbonyl group (qC) are reported in Table 1. All descriptors were calculated using the PBE/6-311+G* level of theory; the results showed that the HOMO energies (EHOMO) have values from -1.23 to 1.83 eV, and LUMO energies (ELUMO) from 6.05 to 9.96 eV, generating a mean gap (ΔE) of 8.08 eV with a standard deviation of 0.81 eV. Figure 2. 3D structure and frontier molecular orbitals of the twelve molecules selected randomly (full set is reported in supplementary materials, Figure S2).  Figure 2. 3D structure and frontier molecular orbitals of the twelve molecules selected randomly (full set is reported in supplementary materials, Figure S2).
To evaluate the relationship between the chemical reactivity and toxicity of the carbamates, DFT calculations were performed. The values of the electron affinity (A), ionization potential (I), chemical potential (µ), hardness (η), electrophilicity (ω), and Hirshfeld charges of the carbon atom in the carbonyl group (qC) are reported in Table 1. All descriptors were calculated using the PBE/6-311+G* level of theory; the results showed that the HOMO energies (E HOMO ) have values from -1.23 to 1.83 eV, and LUMO energies (E LUMO ) from 6.05 to 9.96 eV, generating a mean gap (∆E) of 8.08 eV with a standard deviation of 0.81 eV.

QSTR Modeling
By applying a genetic algorithm over selected reliable molecular descriptors, a model with ten descriptors was obtained. The electronic affinity (EA) was selected as the first descriptor; this property is a global electronic descriptor, and it is associated with the LUMO energy, while qC was selected as the second descriptor, which is an electronic local descriptor related to the Hirshfeld charge of the carbonylic carbon atom.
Furthermore, another eight structural parameters were obtained in the model; these descriptors such as the lopping centric index (LOC) [12], the normalized spectral positive sum from reciprocal squared geometrical matrix (SpPosA_RG), the H autocorrelation of lag 4 weighted by mass [13] (H4m), the number of total tertiary C(sp 3 ) (nCt), the presence (value of 1) or absence (value of 0) of an aliphatic substituent bonded to the sp 3 oxygen of the carbamate group, (nROCON; for example, in Figure 3 there is an example of a value of 1 for this parameter), the presence (value of 1) or absence (value of 0) of C-N and N-O at topological distance of five bonds respectively (B05 [C-N] and B05 [N-O] ; see Figure 3), and, lastly, DLS 05 , that is the fifth drug-like score based on the two rules proposed by Zheng et al. [14], where the first rule is nNO/nC3 in the range 0.10-1.80 which is an index related to the proportion of heteroatoms, defined as the ratio of the total number of oxygen and nitrogen atoms (nNO) over the number of carbon atoms with sp 3 hybridization (nC3) and the second rule is Unsat-p ≤ 0.43, where Unsat-p is a measure of molecule unsaturation and is defined as is the ratio of molecular unsaturation, as defined by the Unsat index, over the number of atoms which do not have bonded hydrogens and halogens. The Unsat index [14] is calculated from Equation (1) where NRG 567 is the number of 5-, 6-, and 7-membered rings, nDB the number of double bonds, nTB the number of triple bonds, and nAB the number of aromatic bonds.

QSTR Modeling
By applying a genetic algorithm over selected reliable molecular descriptors, a model with ten descriptors was obtained. The electronic affinity (EA) was selected as the first descriptor; this property is a global electronic descriptor, and it is associated with the LUMO energy, while qC was selected as the second descriptor, which is an electronic local descriptor related to the Hirshfeld charge of the carbonylic carbon atom.
Furthermore, another eight structural parameters were obtained in the model; these descriptors such as the lopping centric index (LOC) [12], the normalized spectral positive sum from reciprocal squared geometrical matrix (SpPosA_RG), the H autocorrelation of lag 4 weighted by mass [13] (H4m), the number of total tertiary C(sp 3 ) (nCt), the presence (value of 1) or absence (value of 0) of an aliphatic substituent bonded to the sp 3 oxygen of the carbamate group, (nROCON; for example, in Figure 3 there is an example of a value of 1 for this parameter), the presence (value of 1) or absence (value of 0) of C-N and N-O at topological distance of five bonds respectively (B05[C-N] and B05[N-O]; see Figure 3), and, lastly, DLS05, that is the fifth drug-like score based on the two rules proposed by Zheng et al. [14], where the first rule is nNO/nC3 in the range 0.10-1.80 which is an index related to the proportion of heteroatoms, defined as the ratio of the total number of oxygen and nitrogen atoms (nNO) over the number of carbon atoms with sp 3 hybridization (nC3) and the second rule is Unsat-p ≤ 0.43, where Unsat-p is a measure of molecule unsaturation and is defined as is the ratio of molecular unsaturation, as defined by the Unsat index, over the number of atoms which do not have bonded hydrogens and halogens. The Unsat index [14] is calculated from Eq. 1 where NRG567 is the number of 5-, 6-, and 7-membered rings, nDB the number of double bonds, nTB the number of triple bonds, and nAB the number of aromatic bonds Unsat=NRG567 + nDB + 2⋅ nTB + (nAB+1)/2 (1) The best obtained model ( Table 2) was selected considering the criteria for predictive models proposed by Golbraikh et al. [15], with an R 2 = 0.6584 and a Q 2 LOO = 0.6289. The result was obtained using the ten above-described parameters. The data used to generate the model is displayed in Table S1, and the correlation matrix for these descriptors is The best obtained model ( Table 2) was selected considering the criteria for predictive models proposed by Golbraikh et al. [15], with an R 2 = 0.6584 and a Q 2 LOO = 0.6289. The result was obtained using the ten above-described parameters. The data used to generate the model is displayed in Table S1, and the correlation matrix for these descriptors is shown in Table 3. The descriptors showed a low correlation with each other; this implies that all variables are important in the multi-linear regression model (MLR). Additionally, a five-fold cross-validation was performed and the mean values for R squared, the root mean square error (RMSE), and the mean absolute error (MAE) were 0.6442, 0.4855 and 0.3889 respectively, this result confirms the proposed model has a good selection of test and training set, full information of cross-validation models is shown in Table S2. Furthermore, we test several regression approaches such as Ridge, Lasso, Backward-Forward selection, XGBoost and support vector regression (SVR) with the score R 2 in a range of 0.67 to 0.88; although SVR gives a better score it doesn't allow a chemical interpretation of the model. The R 2 value denotes whether the model obtained is viable or not; i.e., if the linear association between the toxicity [quantified by log(1/C)] and the molecular descriptor in the model is strong enough. Thus, as the value of R 2 approaches unity, then the model is considered adequate. Dispersion plots to validate the condition of the linear relation between descriptors and predicted log(1/C) are displayed in Figure 4. All variables are distributed randomly around axis X. Also, in Figure 4 the horizontal and vertical axes indicate the experimental values and the toxicity values obtained for the carbamate compounds, respectively. The graph with leave-one-out (LOO) validation ( Figure 5) shows that the compounds maintain the same trend as in the main model, i.e., quantitatively, the difference R 2 − Q 2 equal to 0.0295 does not exceed the range of values between 0.2 and 0.3, ref. [16] therefore this model is considered as acceptable.
The analysis of some statistical parameters ( Table 2) shows that the variable that most contributes to the model is the nROCON electronic property, with a standardized coefficient of −0.3527 and the p-value equal to 1.308 × 10 −6 which does not exceed the critical value of 0.05 [17]; a similar effect was observed on the other structural descriptors, therefore all variables that integrate the model are accepted.
Visualizing the applicability domain (AD) of the model and demonstrating the relationship between standardized residuals and leverage values (h), a Williams plot was utilized ( Figure 6). The Williams plot displays all the data points that are surrounded by standard deviations (−3.0 σ, 3.0 σ) and most of them are behind the critical leverage value (h*) for the model, revealing the robustness of the AD for the present QSAR models, corroborating the compounds from the internal validation set remain within the applicability domain, with a leverage value lower than h* of 0.217. It should be noted that one compound that is part of the test set has a value closest to the 3.0 σ limit; however, this is not considered as atypical data, ref. [9] confirming that all compounds from the internal validation set remain inside the limit permissible error.  The analysis of some statistical parameters ( Table 2) shows that the variable that mo contributes to the model is the nROCON electronic property, with a standardized coeff cient of −0.3527 and the p-value equal to 1.308 × 10 −6 which does not exceed the critic value of 0.05 [17]; a similar effect was observed on the other structural descriptors, ther fore all variables that integrate the model are accepted.
Visualizing the applicability domain (AD) of the model and demonstrating the rel tionship between standardized residuals and leverage values (h), a Williams plot was ut lized (Figure 6). The Williams plot displays all the data points that are surrounded b standard deviations (−3.0 σ, 3.0 σ) and most of them are behind the critical leverage valu (h*) for the model, revealing the robustness of the AD for the present QSAR models, co roborating the compounds from the internal validation set remain within the applicabili domain, with a leverage value lower than h* of 0.217. It should be noted that one com pound that is part of the test set has a value closest to the 3.0 σ limit; however, this is n considered as atypical data, [9] confirming that all compounds from the internal valid tion set remain inside the limit permissible error.

Binding of Carbamate Compounds to the Catalytic Site of AChE
Molecular docking calculations were performed to generate plausible carbam AChE complex models and evaluate their ability to reach the catalytic triad at th zyme's active site. The binding of a carbamate compound acting as an electrophile t catalytic pocket of the AChE in suitable orientation is considered key to the inhibiti the enzyme by this family of compounds [18]. The inactivation of the enzyme dep upon the chemical reaction among the enzyme residue Ser200 to the carbonyl carbon of the carbamate ligands. From the generated models, it was observed that most o Figure 6. William's plot with an up and low limit of 3σ and −3σ respectively, for the training set (light blue dots) and for the test set (violet dots).

Binding of Carbamate Compounds to the Catalytic Site of AChE
Molecular docking calculations were performed to generate plausible carbamate-AChE complex models and evaluate their ability to reach the catalytic triad at the enzyme's active site. The binding of a carbamate compound acting as an electrophile to the catalytic pocket of the AChE in suitable orientation is considered key to the inhibition of the enzyme by this family of compounds [18]. The inactivation of the enzyme depends upon the chemical reaction among the enzyme residue Ser200 to the carbonyl carbon atom of the carbamate ligands. From the generated models, it was observed that most of the tested compounds achieved proximity to the Ser200 ligand with a suitable orientation to allow a nucleophilic attack from the Ser200 hydroxyl group to the carbamate carbonyl group in each ligand. Figure 7 shows an example of this binding for the compound with Id 0000050077 (in purple). The hydrogen bond network formed between Ser200, His440, and Glu327 (highlighted in green), facilitates the subtraction of the hydroxyl H atom from Ser200, thus favoring the reaction of the Ser200 oxygen atom to the carbamate's carbonyl group located, in this case, at just 2.82 Å distance. Most of the compounds showed distances within 6.0 Å, confirming their potential to inactivate the enzyme by chemical reaction with the catalytic serine residue.
Unfortunately, we could not find any direct relationship between the affinity or the distance to the catalytic center with the toxicity of the studied compounds. Therefore, it is concluded that the binding process would not be the most relevant stage in the inactivation mechanism of AChE by the carbamates herein studied. It is very likely that the reactivity or the activation barrier during the reaction could better explain the toxicity of these compounds. tion with the catalytic serine residue.
Unfortunately, we could not find any direct relationship between the affinity or the distance to the catalytic center with the toxicity of the studied compounds. Therefore, it is concluded that the binding process would not be the most relevant stage in the inactivation mechanism of AChE by the carbamates herein studied. It is very likely that the reactivity or the activation barrier during the reaction could better explain the toxicity of these compounds.

Data Collection and Electronic Descriptors
All carbamate-based compounds were obtained from the ChemID database [19]. The LD50 values (mg/kg) (Table S1) for all compounds were obtained under similar experimental conditions (oral administration in rats). For modeling purposes, LD50 values were converted to molar concentration in mmol/kg (C) and subsequently to negative logarithms (log 1/C) according to the literature [20,21].
3D molecular structures for all carbamate compounds were built using the reported structure from the ChemID database [19]. After, the structures were optimized using DFT, without symmetry constraints by using the Gaussian 09 program [22] with the functional PBE [23] in combination with the 6-311+G* basis set for all atoms. Additionally, the global electronic descriptors such as the energy level of the highest occupied molecular orbital (EHOMO), the energy level of the lowest unoccupied molecular orbital (ELUMO), chemical potential (μ), hardness (η), and electrophilicity (ω) for all carbamates were obtained based on DFT calculations [24,25].

Data Collection and Electronic Descriptors
All carbamate-based compounds were obtained from the ChemID database [19]. The LD 50 values (mg/kg) (Table S1) for all compounds were obtained under similar experimental conditions (oral administration in rats). For modeling purposes, LD 50 values were converted to molar concentration in mmol/kg (C) and subsequently to negative logarithms (log 1/C) according to the literature [20,21].
3D molecular structures for all carbamate compounds were built using the reported structure from the ChemID database [19]. After, the structures were optimized using DFT, without symmetry constraints by using the Gaussian 09 program [22] with the functional PBE [23] in combination with the 6-311+G* basis set for all atoms. Additionally, the global electronic descriptors such as the energy level of the highest occupied molecular orbital (E HOMO ), the energy level of the lowest unoccupied molecular orbital (E LUMO ), chemical potential (µ), hardness (η), and electrophilicity (ω) for all carbamates were obtained based on DFT calculations [24,25].
Furthermore, local electronic descriptors were calculated for all carbamate groups from each molecule; these calculations were performed according to Vela et al. study [26] using Multiwfn [27].

Dragon Molecular Descriptors
Different types of molecular and electronic descriptors were used to develop the QSTR models. The Dragon software [28] was used to obtain the molecular descriptors based on the optimized molecules. Several non-representative descriptors (e.g., those showing the same values for all the compounds) were excluded from the data set. For each pair, if the correlation coefficient was higher than~90%, then the pair was considered inter-correlated and one of both was excluded. The most significant descriptors that enabled the toxicity correlation were isolated from the data set and combined with the electronic descriptors in order to develop a good QSTR model.

QSTR Model Building and Validation
The entire set was split using a random selection of variables, using 15% of the full set as a test set. Potential QSTR models were obtained with the combination of global, local, and structural descriptors using QSARINS software [29], where the reliability of the QSTR model was internally validated by using the coefficient of determination according to Equation (2), where y obs and y obs are the experimental log(1/C) and their average, respectively, while y calc is the calculated log(1/C) obtained via the QSTR model. The statistical analysis was performed with R-4.1.0 software project, available online: http://www.R-project.org/ (accessed on 1 June 2022). The multi-linear regression model was computed using the function lm (fitting linear models), which employs the QR factorization method to solve linear least-squares problems.
The data were divided into two sets: training and test sets using cross-validation and selecting the groups by the median of R 2 value. The validation was performed using the leave-one-out coefficient, Q 2 LOO , in Equation (3), an internal validation that enables the predictability of the model to be ascertained, wherein, if Q 2 LOO > 0.5 considering the molecules from the set used to build the model, then the QSTR model is acceptable. This is achieved by calculating the log(1/C) of a molecule via a model that excludes the molecule which is cycled over the whole set of molecules.
Moreover, the applicability domain (AD) was determined through leverage calculation, wherein all the untested compounds corresponding to the AD of the model will be predictable, while those compounds that do not fall within the AD are extrapolations obtained by the model [30]. In addition, to visualize the AD, a Williams plot was generated for the graphical detection of atypical responses (atypical Y) and the identification of the influence of the compounds in the model (atypical X).

Molecular Docking Calculations
All the studied carbamate compounds were docked into the active site of the target acetylcholinesterase (AChE) enzyme using the Autodock 4.2 program [31]. The AChE PDB model 2WFZ was selected to perform the calculations because it shows high resolution (1.95 Å) and good validation metrics [32]. This model was crystallized in a non-aged state with the organophosphate agent Soman, bonded to the catalytic residue Ser200. All the co-crystallized non-protein ligands were removed from the model and the remaining protein structure was subjected to a standard preparation workflow. Missing atoms in the protein were restored and polar hydrogen atoms were added to be treated explicitly in the calculations. Kollman charges and solvation parameters were assigned, and the structure was used to generate atom affinity maps for the calculation employing cubic grids of 90 × 90 × 90 points separated by 0.375 Å, centered at the enzyme's active site. For ligand structures, only polar hydrogens were considered with Gasteiger atom charges and full ligand flexibility. The docking calculation consisted of 100 runs per ligand of the genetic-Lamarkian algorithm, using 150 individuals as the initial population, 2,500,000 energy evaluations, and 27,000 maximum number of generations. The resulting 100 models per ligand were clustered with a 2 Å cutoff for further analysis. The selected models for each complex were evaluated for their ability to reach the catalytic triad located at the bottom of the binding pocket of the enzyme, i.e., Ser200, His440, and Glu327.

Conclusions
The generated QSTR model to predict the toxicity of carbamate compounds contains ten descriptors, where two of them are DFT descriptors, the Hirshfeld charge of the carbon from carbonyl (qC) and the electronic affinity (EA), which suggests that the DFT parameters play an important role to determine the toxicity in this new model. A comparison of DFT descriptors suggested the best contribution for the model is promoted by the EA, which is associated directly with the interaction of carbamate compounds with its AChE enzyme target according to molecular docking calculations, where a suitable orientation and proximity of the carbamate moiety of most compounds is effectively achieved to favor reactivity with the enzyme's catalytic residues via a nucleophilic attack. On the other hand, the nROCON descriptor generates the highest contribution in the model and indicates that a carbamate compound would be more toxic when an aromatic fragment is bonded to the sp 3 oxygen atom from the carbamate group, and thus, this feature would favor the electrophilic character of the carbamate's carbonyl group, further enhancing the inactivation reaction of AChE.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/molecules27175530/s1, Figure S1: Bidimensional structure of full carbamates set; Figure S2: Tridimensional structure, HOMO and LUMO graphs of full carbamates set; Table S1. Coefficients for quadratic least Squared regression model, Table S2. Coefficients for stepwise selection regression model, Table S3. Coefficients for Ridge and Lasso regularization techniques in regression model.