# In Silico Prediction of the Toxicity of Nitroaromatic Compounds: Application of Ensemble Learning QSAR Approach

## Abstract

_{50}). An initial set of 4885 molecular descriptors was generated and applied to build Support Vector Regression (SVR) models. The two best SVR models, SVR_A and SVR_B, were selected to build an Ensemble Model by means of Multiple Linear Regression (MLR). The obtained Ensemble Model showed improved performance over the base SVR models in the training set (R^{2} = 0.88), validation set (R^{2} = 0.95), and true external test set (R^{2} = 0.92). The models were also internally validated by 5-fold cross-validation and Y-scrambling experiments, showing that the models have high levels of goodness-of-fit, robustness and predictivity. The contribution of descriptors to the toxicity in the models was assessed using the Accumulated Local Effect (ALE) technique. The proposed approach provides an important tool for assessing the toxicity of nitroaromatic compounds, based on the ensemble QSAR model and on the structure–toxicity relationships revealed by analyzing the contributions of the involved descriptors.

## 1. Introduction

_{50}. The partial least squares 2D QSARs showed reasonable performance values with R^{2} = 0.96–0.98 for the training set and R^{2} = 0.89–0.92 for the test set. These authors also showed that hydrophobicity, electrostatic and van der Waals interactions, and the addition of hydroxyl (-OH) and fluorine (H_{2}F and CH_{2}F) groups contribute to the enhancement of toxicity, while the introduction of methyl groups leads to a decrease in toxicity. A non-additive effect was also found, as the toxicity of trinitroaromatic compounds did not show higher values than the toxicity of dinitroaromatic compounds [26].

_{50}). Several QSAR models were developed based on different classes of molecular descriptors including quantum chemical and topological molecular descriptors computed by DRAGON [28], PaDEL [29] and HiT-QSAR [30] software. The resulting best QSAR model was a combination of the unique indices from the different software, and gave reasonable results for the training (R^{2} = 0.81), internal validation (Q^{2} = 0.75) and test (R^{2} = 0.72) sets. It is also important to remark that the authors revealed some structural relationships in terms of functional groups related to toxicity. This is the case for compounds with additional hydroxyl (-OH) and methyl (CH_{3}) groups showing the highest toxicity. The presence of -PO_{4} and -SO_{4} groups increases toxicity, while the presence of -NH_{2} groups can drastically reduce toxicity [27].

^{2}_{train} = 0.719, Q^{2}_{train} = 0.695; R^{2}_{test} = 0.739). Despite these values, the study shows interesting structural relationships to toxicity through the use of the substructures mentioned above. For example, the presence of a heteroatom with 7 out of 14 double bonded oxygens, double bonded oxygen and sp^{2} with double bond increases toxicity. On the other hand, the presence of some substructures such as sp^{3} with branching, heteroaromatic nitrogen, and the presence of oxygen and carbon and NH_{2} groups reduces the toxicity in NACs. More details on the analysis of substructures are provided in the original literature [25].

^{2} = 0.858) and test (R^{2} = 0.857) set. The authors obtained an equation with five parameters for toxicity (−logLD_{50} (M) = 1.599 + 0.4293*nNO_{2} − 0.4165*nS + 1.771*nP + 1.313*Tox^{+} − 2.110*Tox^{−}). Three simple descriptors appear in this equation, two of which contribute positively to toxicity: nNO_{2}, a descriptor related to the number of nitro groups, and nP, the number of phosphorus atoms. The descriptor nS, which accounts for the number of sulphur atoms, contributes negatively to toxicity. In addition, the equation contains two other adjustable parameters, Tox^{+} and Tox^{−}, whose interpretation in relation to the toxicity of NACs is more difficult and therefore affects the interpretation of the mechanism of the other constitutional descriptors in the equation [31].
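As a worked illustration, the five-parameter equation quoted above from [31] can be evaluated directly. This is only a minimal sketch: the function name and the example input (two nitro groups, no sulphur or phosphorus atoms, both correction terms set to zero) are hypothetical, and the rules for assigning Tox^{+} and Tox^{−} are defined in [31], not here.

```python
def neg_log_ld50(n_no2, n_s, n_p, tox_plus=0.0, tox_minus=0.0):
    """Evaluate -logLD50 (M) with the coefficients quoted in the text from [31]."""
    return (1.599
            + 0.4293 * n_no2      # number of nitro groups (positive contribution)
            - 0.4165 * n_s        # number of sulphur atoms (negative contribution)
            + 1.771 * n_p         # number of phosphorus atoms (positive contribution)
            + 1.313 * tox_plus    # correction term Tox+, assumed 0 by default
            - 2.110 * tox_minus)  # correction term Tox-, assumed 0 by default

# Hypothetical compound: two nitro groups, no S or P atoms, no correction terms.
example = neg_log_ld50(n_no2=2, n_s=0, n_p=0)
print(round(example, 4))  # 2.4576
```

Adding one sulphur atom to the same hypothetical input lowers the predicted value by 0.4165, which is the negative contribution of nS described above.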

_{50}), using a dataset of 128 NACs. In this study, seven simple 2D molecular descriptors were selected for the QSAR model after applying the GA-MLR variable selection methods. They reported a squared correlation coefficient R^{2} of 0.748 for the training set (n = 101) and 0.759 for the external test set (n = 27). The most important descriptors were P_VSA_s_1, B06[C-F] and F09[C-N] which were positively related to toxicity, indicating that the higher values of these descriptors contributed to higher toxicity. These descriptors are related to the van der Waals surface area (P_VSA_s_1), the presence of C-F bonds at topological distance 6 (B06[C-F]), and the high frequency of C-N bonds at topological distance 9 (F09[C-N]) [9].

_{50} concentration for rats, which showed high predictive performance. The final model (ensemble model) combines the results of two Support Vector Regression (SVR) models and predicts the −logLD_{50} value of a given NAC with high accuracy. In addition, the Accumulated Local Effect (ALE) approach was used to better understand the mechanistic relationship between the descriptors involved in the models and toxicity (−logLD_{50}) [32]. To the best of our knowledge, this is the first study to use the ALE method for the mechanistic interpretation of a non-linear QSAR model.

## 2. Materials and Methods

#### 2.1. Experimental Data Collection

_{50}, was calculated by converting all LD_{50} values to molar values (mol/kg) and mapping them to a negative logarithm scale. For validation purposes the dataset was split into a training set and a test set, where the training set was used for model generation. Additionally, a set of seven NACs was collected for additional external evaluation of the model performance as a true external test set. These data can be found in Table S2.
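The unit conversion described above can be sketched as follows. This is an illustration under stated assumptions: the source LD_{50} values are taken to be in mg/kg and the molecular weight in g/mol, and both the function name and the example compound are hypothetical.

```python
import math

def neg_log_ld50_molar(ld50_mg_per_kg, mol_weight_g_per_mol):
    """Convert an LD50 in mg/kg to mol/kg and return its negative base-10 logarithm."""
    grams_per_kg = ld50_mg_per_kg / 1000.0           # mg/kg -> g/kg
    mol_per_kg = grams_per_kg / mol_weight_g_per_mol  # g/kg -> mol/kg
    return -math.log10(mol_per_kg)

# Hypothetical compound: LD50 = 500 mg/kg, molecular weight = 250 g/mol,
# i.e. 0.002 mol/kg.
print(round(neg_log_ld50_molar(500.0, 250.0), 3))  # 2.699
```

On this scale a larger value corresponds to a smaller lethal dose, so higher −logLD_{50} means higher toxicity.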

#### 2.2. Generation of Descriptors

^{th} descriptor, x_{ij} and ${X}_{ij}$ are the normalized and original values of the j^{th} descriptor of the i^{th} compound.

#### 2.3. QSAR Modeling and Validation

^{2}), Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). For each created model, the following equations were used to determine the squared correlation coefficient R^{2} (Equation (2)), the Root Mean Square Error (Equation (3)) and the Mean Absolute Error (Equation (4)) to evaluate the goodness of fit, together with the Concordance Correlation Coefficient (CCC, Equation (6)).

^{th} compound, accordingly, and ${\tilde{y}}^{obs}$ is the mean of observed values. We estimated the Mean Absolute Error of cross-validation (MAECV) in each example to assess model stability according to Equation (5). In Equation (6), ${\overline{y}}^{obs}$ and ${\overline{y}}^{pred}$ are the mean values for observed and predicted values.
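The statistics named above can be computed as in the following sketch. It uses common textbook definitions (coefficient of determination for R^{2}, and Lin's concordance correlation coefficient for CCC); since Equations (2)–(6) are not reproduced in this text, these forms are assumptions and may differ in detail from the ones used in the study. The function name and the toy values are hypothetical.

```python
import numpy as np

def regression_metrics(y_obs, y_pred):
    """Return R2, RMSE, MAE and CCC for observed vs. predicted values."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_obs - y_pred
    dev_obs = y_obs - y_obs.mean()
    dev_pred = y_pred - y_pred.mean()

    r2 = 1.0 - np.sum(resid ** 2) / np.sum(dev_obs ** 2)
    rmse = np.sqrt(np.mean(resid ** 2))
    mae = np.mean(np.abs(resid))
    # Concordance correlation coefficient: penalises both scatter and bias.
    ccc = (2.0 * np.sum(dev_obs * dev_pred)
           / (np.sum(dev_obs ** 2) + np.sum(dev_pred ** 2)
              + len(y_obs) * (y_obs.mean() - y_pred.mean()) ** 2))
    return {"R2": r2, "RMSE": rmse, "MAE": mae, "CCC": ccc}

# Toy data: every prediction is 0.5 too high.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
print(metrics)
```

In the toy case the predictions are perfectly correlated with the observations but systematically shifted, so CCC (10/11) stays below 1 even though the Pearson correlation would equal 1.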

^{2}.

#### Support Vector Regression and Ensemble Model

#### 2.4. Analysis of Descriptors in Models
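The idea behind a first-order Accumulated Local Effect curve [32] can be sketched in a few lines. This is a simplified illustration, not the implementation used in the study: the function name `ale_1d`, the quantile binning, and the centring by bin-weighted midpoints are assumptions of this sketch, and `predict` stands for any fitted model's prediction function.

```python
import numpy as np

def ale_1d(predict, X, feature, n_bins=10):
    """First-order ALE of one descriptor; returns (bin edges, centred ALE at the edges)."""
    X = np.asarray(X, dtype=float)
    x = X[:, feature]
    # Quantile-based edges so that bins hold roughly equal numbers of compounds.
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)))
    n_int = len(edges) - 1
    # Interval index of each compound: (edges[k], edges[k+1]], minimum goes to bin 0.
    idx = np.clip(np.searchsorted(edges, x, side="left") - 1, 0, n_int - 1)

    effects = np.zeros(n_int)
    counts = np.zeros(n_int)
    for k in range(n_int):
        mask = idx == k
        counts[k] = mask.sum()
        if counts[k] == 0:
            continue  # empty bin contributes no local effect
        lower, upper = X[mask].copy(), X[mask].copy()
        lower[:, feature] = edges[k]
        upper[:, feature] = edges[k + 1]
        # Local effect: mean change in prediction across the bin,
        # with all other descriptors kept at their observed values.
        effects[k] = np.mean(predict(upper) - predict(lower))

    ale = np.concatenate([[0.0], np.cumsum(effects)])  # accumulate local effects
    centre = np.sum(counts * (ale[:-1] + ale[1:]) / 2.0) / counts.sum()
    return edges, ale - centre

# Hypothetical check with a known linear model: the ALE slope for column 0 is 2.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
linear = lambda A: 2.0 * A[:, 0] + 3.0 * A[:, 1]
edges_demo, ale_demo = ale_1d(linear, X_demo, feature=0)
print(np.diff(ale_demo) / np.diff(edges_demo))
```

Because only differences within each bin are accumulated, the curve reflects the local effect of the descriptor while the other descriptors keep realistic values, which is why ALE is often preferred over partial dependence when descriptors are correlated.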

## 3. Results

#### 3.1. Distribution of Molecular Weights and Toxicity

_{50} for all three data sets. As can be seen in Figure 2, the training data are heterogeneously distributed. It can be observed that the compounds in both the external and the true external test sets share the same chemical space as the training data.

#### 3.2. Ensemble Model

^{2} = 0.92. This can be seen from the fact that the residual errors are smaller than those of SVR_B. In contrast, SVR_B showed better performance on the external test set. When the ensemble model was applied to the external test set, better performance results were obtained, indicating that this model has high predictive power and is well trained. Figure 3 shows the predicted versus experimental −logLD_{50} values for the training set (Figure 3A), the test set (Figure 3B) and the true external test set (Figure 3C). In each scatter plot, the solid black line is the regression line fitted to the data points, which confirms these performance results for the ensemble model.

^{2} (test) of 0.81 (n = 166) were obtained by the ensemble model constructed as a simple average of the predictions of the two best ML models. These individual models yielded RMSE (test) values of 0.70 and 0.72, i.e., R^{2} (test) values of 0.80 and 0.78, respectively. The quality of the models in the mentioned study is expressed by parameters that measure agreement (R^{2}), but also by parameters that estimate the standard error of the estimate or prediction (RMSE and MAE) as the basic model validation measure. It is worth noting that for all models and for all sets (training set, test set and external test set), higher R^{2} values were always associated with lower RMSE values, indicating their consistency and stability. This is a desirable predictive property of the model, especially for external data sets, as Lučić et al. have shown with examples (in Table 2 [49]) that with very small changes in an external dataset it is possible for R^{2} to increase even when RMSE increases, for instance when one additional example is predicted extremely badly, with an error greater than 2*RMSE.

^{2}, confirming that the model is not the result of chance correlation.
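A Y-scrambling test of the kind referred to above can be sketched as follows. To keep the example free of extra dependencies, an ordinary least-squares fit is used as a stand-in for the SVR models of the study; the data are synthetic, and the function names, number of rounds and random seeds are hypothetical.

```python
import numpy as np

def fit_r2(X, y):
    """Fit a linear model with intercept by least squares and return its training R2."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def y_scrambling(X, y, n_rounds=100, seed=0):
    """Refit the model on randomly permuted responses and collect the R2 values."""
    rng = np.random.default_rng(seed)
    return np.array([fit_r2(X, rng.permutation(y)) for _ in range(n_rounds)])

# Synthetic descriptors and response with a genuine structure-response signal.
rng = np.random.default_rng(1)
X_syn = rng.normal(size=(80, 3))
y_syn = X_syn @ np.array([1.0, -0.5, 0.3]) + 0.2 * rng.normal(size=80)

r2_true = fit_r2(X_syn, y_syn)
r2_scrambled = y_scrambling(X_syn, y_syn)
print(r2_true, r2_scrambled.mean(), r2_scrambled.max())
```

If the scrambled models reach R^{2} values close to that of the real model, the original fit is likely a chance correlation; a large gap, as in this synthetic case, argues against it.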

## 4. Conclusions

^{2} = 0.88 for the training set and R^{2} = 0.95 for the test set. Additionally, the contribution of each descriptor to toxicity was discussed using the Accumulated Local Effect (ALE) approach. This novel approach worked very well in this study, as it was able to show the intervals over which the relationship between the descriptors and toxicity is linear for non-linear models such as Support Vector Regression. The developed ensemble QSAR model has eight descriptors showing strong positive effects on toxicity, while five descriptors show negligible effects and three descriptors show negative effects. It is important to emphasize that HATS7s is a common descriptor for SVR_A and SVR_B. The ALE plots of both models show the same pattern for this descriptor. The obtained results describe the structural relationship between toxicity and molecular descriptors in the developed non-linear models, which could be helpful in assessing the toxicity of existing nitroaromatic compounds and in developing less toxic analogues. Moreover, the applied ALE approach might provide some mechanistic explanations to better describe the effects of the molecular descriptors in supervised black-box machine learning models.

## References

1. Bilal, M.; Bagheri, A.R.; Bhatt, P.; Chen, S. Environmental occurrence, toxicity concerns, and remediation of recalcitrant nitroaromatic compounds. J. Environ. Manag. **2021**, 291, 112685.
2. Kovacic, P.; Somanathan, R. Nitroaromatic compounds: Environmental toxicity, carcinogenicity, mutagenicity, therapy and mechanism. J. Appl. Toxicol. **2014**, 34, 810–824.
3. Tiwari, J.; Tarale, P.; Sivanesan, S.; Bafana, A. Environmental persistence, hazard, and mitigation challenges of nitroaromatic compounds. Environ. Sci. Pollut. Res. **2019**, 26, 28650–28667.
4. Kulkarni, M.; Chaudhari, A. Microbial remediation of nitro-aromatic compounds: An overview. J. Environ. Manag. **2007**, 85, 496–512.
5. Zhang, C.-L.; Yu, Y.-Y.; Fang, Z.; Naraginti, S.; Zhang, Y.; Yong, Y.-C. Recent advances in nitroaromatic pollutants bioreduction by electroactive bacteria. Process Biochem. **2018**, 70, 129–135.
6. Deng, K.; Wong, T.Y.; Wang, Y.; Leung, E.M.K.; Chan, W. Combination of precolumn nitro-reduction and ultraperformance liquid chromatography with fluorescence detection for the sensitive quantification of 1-nitronaphthalene, 2-nitrofluorene, and 1-nitropyrene in meat products. J. Agric. Food Chem. **2015**, 63, 3161–3167.
7. Slater, E.C. Mechanism of uncoupling of oxidative phosphorylation by nitrophenols. Comp. Biochem. Physiol. **1962**, 4, 281–301.
8. Strauss, M.J. The Nitroaromatic Group in Drug Design. Pharmacology and Toxicology (for Nonpharmacologists). Ind. Eng. Chem. Prod. Res. Dev. **1979**, 18, 158–166.
9. Hao, Y.; Sun, G.; Fan, T.; Tang, X.; Zhang, J.; Liu, Y.; Zhang, N.; Zhao, L.; Zhong, R.; Peng, Y. In vivo toxicity of nitroaromatic compounds to rats: QSTR modelling and interspecies toxicity relationship with mouse. J. Hazard. Mater. **2020**, 399, 122981.
10. Khan, K.; Roy, K.; Benfenati, E. Ecotoxicological QSAR modeling of endocrine disruptor chemicals. J. Hazard. Mater. **2019**, 369, 707–718.
11. Isayev, O.; Rasulev, B.; Gorb, L.; Leszczynski, J. Structure-toxicity relationships of nitroaromatic compounds. Mol. Divers. **2006**, 10, 233–245.
12. Ding, Y.L.; Lyu, Y.C.; Leong, M.K. In silico prediction of the mutagenicity of nitroaromatic compounds using a novel two-QSAR approach. Toxicol. Vitr. **2017**, 40, 102–114.
13. Cassani, S.; Kovarich, S.; Papa, E.; Roy, P.P.; van der Wal, L.; Gramatica, P. Daphnia and fish toxicity of (benzo)triazoles: Validated QSAR models, and interspecies quantitative activity-activity modelling. J. Hazard. Mater. **2013**, 258–259, 50–60.
14. Tropsha, A. Best practices for QSAR model development, validation, and exploitation. Mol. Inform. **2010**, 29, 476–488.
15. Katritzky, A.R.; Oliferenko, P.; Oliferenko, A.; Lomaka, A.; Karelson, M. Nitrobenzene toxicity: QSAR correlations and mechanistic interpretations. J. Phys. Org. Chem. **2003**, 16, 811–817.
16. Casañola-Martin, G.M.; Le-Thi-Thu, H.; Pérez-Giménez, F.; Marrero-Ponce, Y.; Merino-Sanjuán, M.; Abad, C.; González-Díaz, H. Multi-output model with Box–Jenkins operators of linear indices to predict multi-target inhibitors of ubiquitin–proteasome pathway. Mol. Divers. **2015**, 19, 347–356.
17. Bediaga, H.; Moreno, M.I.; Arrasate, S.; Vilas, J.L.; Orbe, L.; Unzueta, E.; Mercader, J.P.; González-Díaz, H. Multi-output chemometrics model for gasoline compounding. Fuel **2022**, 310, 122274.
18. Litter, M.I. A short review on the preparation and use of iron nanomaterials for the treatment of pollutants in water and soil. Emergent Mater. **2022**, 5, 391–400.
19. Chen, S.; Liu, H. Self-reductive palladium nanoparticles loaded on polydopamine-modified MXene for highly efficient and quickly catalytic reduction of nitroaromatics and dyes. Colloids Surf. A Physicochem. Eng. Asp. **2022**, 635, 128038.
20. Kumunda, C.; Adekunle, A.S.; Mamba, B.B.; Hlongwa, N.W.; Nkambule, T.T.I. Electrochemical Detection of Environmental Pollutants Based on Graphene Derivatives: A Review. Front. Mater. **2020**, 7, 616787.
21. Tiwari, J.; Gandhi, D.; Sivanesan, S.; Naoghare, P.; Bafana, A. Remediation of different nitroaromatic pollutants by a promising agent of Cupriavidus sp. strain a3. Ecotoxicol. Environ. Saf. **2020**, 205, 111138.
22. Wu, Q.; Chen, J.; Liu, Z.; Xu, Y. CO Activation Using Nitrogen-Doped Carbon Nanotubes for Reductive Carbonylation of Nitroaromatics to Benzimidazolinone and Phenyl Urea. ACS Appl. Mater. Interfaces **2020**, 12, 48700–48711.
23. He, L.; Xiao, K.; Zhou, C.; Li, G.; Yang, H.; Li, Z.; Cheng, J. Insights into pesticide toxicity against aquatic organism: QSTR models on Daphnia Magna. Ecotoxicol. Environ. Saf. **2019**, 173, 285–292.
24. Tugcu, G.; Ertürk, M.D.; Saçan, M.T. On the aquatic toxicity of substituted phenols to Chlorella vulgaris: QSTR with an extended novel data set and interspecies models. J. Hazard. Mater. **2017**, 339, 122–130.
25. Mondal, D.; Ghosh, K.; Baidya, A.T.K.; Gantait, A.M.; Gayen, S. Identification of structural fingerprints for in vivo toxicity by using Monte Carlo based QSTR modeling of nitroaromatics. Toxicol. Mech. Methods **2020**, 30, 257–265.
26. Kuz’min, V.E.; Muratov, E.N.; Artemenko, A.G.; Gorb, L.; Qasim, M.; Leszczynski, J. The effects of characteristics of substituents on toxicity of the nitroaromatics: HiT QSAR study. J. Comput.-Aided Mol. Des. **2008**, 22, 747.
27. Gooch, A.; Sizochenko, N.; Rasulev, B.; Gorb, L.; Leszczynski, J. In vivo toxicity of nitroaromatics: A comprehensive quantitative structure–activity relationship study. Environ. Toxicol. Chem. **2017**, 36, 2227–2233.
28. Todeschini, R.; Consonni, V.; Mauri, A.; Pavan, M. Dragon Software for the Calculation of Molecular Descriptors, Version 6 for Windows; Talete SRL: Milan, Italy, 2014.
29. Yap, C.W. PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem. **2011**, 32, 1466–1474.
30. Kuz’min, V.E.; Artemenko, A.G.; Muratov, E.N.; Polischuk, P.G.; Ognichenko, L.N.; Liahovsky, A.V.; Hromov, A.I.; Varlamova, E.V. Virtual Screening and Molecular Design Based on Hierarchical Qsar Technology. Recent Adv. QSAR Stud. **2010**, 8, 127–176.
31. Keshavarz, M.H.; Akbarzadeh, A.R. A simple approach for assessment of toxicity of nitroaromatic compounds without using complex descriptors and computer codes. SAR QSAR Environ. Res. **2019**, 30, 347–361.
32. Apley, D.W.; Zhu, J. Visualizing the effects of predictor variables in black box supervised learning models. J. R. Stat. Soc. Ser. B Stat. Methodol. **2020**, 82, 1059–1086.
33. ChemIDplus: A Web-Based Chemical Search System, Mar-Apr 2000, NLM Technical Bulletin. Available online: https://www.nlm.nih.gov/pubs/techbull/ma00/ma00_chemid.html (accessed on 17 June 2022).
34. Kim, S.; Chen, J.; Cheng, T.; Gindulyte, A.; He, J.; He, S.; Li, Q.; Shoemaker, B.A.; Thiessen, P.A.; Yu, B.; et al. PubChem in 2021: New data content and improved web interfaces. Nucleic Acids Res. **2021**, 49, D1388–D1395.
35. Hypercube Inc., N.t.S., Gainesville, Florida 32601, USA. HyperChem(TM) Professional 8.0. 2019. Available online: http://www.hypercubeusa.com/ (accessed on 13 October 2021).
36. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. **2011**, 12, 2825–2830.
37. OECD. Guidance Document on the Validation of (Quantitative) Structure-Activity Relationship (Q)SAR Models; OECD: Paris, France, 2014.
38. Gramatica, P.; Sangion, A. A Historical Excursus on the Statistical Validation Parameters for QSAR Models: A Clarification Concerning Metrics and Terminology. J. Chem. Inf. Model. **2016**, 56, 1127–1131.
39. Roy, K. On some aspects of validation of predictive quantitative structure-activity relationship models. Expert Opin. Drug Discov. **2007**, 2, 1567–1577.
40. Golbraikh, A.; Tropsha, A. Beware of q2! J. Mol. Graph. Model. **2002**, 20, 269–276.
41. Pratim, R.P.; Paul, S.; Mitra, I.; Roy, K. On two novel parameters for validation of predictive QSAR models. Molecules **2009**, 14, 1660–1701.
42. Erickson, M.E.; Ngongang, M.; Rasulev, B. A refractive index study of a diverse set of polymeric materials by QSPR with quantum-chemical and additive descriptors. Molecules **2020**, 25, 3772.
43. Gramatica, P.; Chirico, N.; Papa, E.; Cassani, S.; Kovarich, S. QSARINS: A new software for the development, analysis, and validation of QSAR MLR models. J. Comput. Chem. **2013**, 34, 2121–2132.
44. Freund, Y.; Schapire, R.E.; Singer, Y.; Warmuth, M.K. Using and combining predictors that specialize. In Proceedings of the Twenty-Ninth Annual ACM Symposium on the Theory of Computing; 1997.
45. Van Der Laan, M.J.; Polley, E.C.; Hubbard, A.E. Super learner. Stat. Appl. Genet. Mol. Biol. **2007**, 6, 25.
46. Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Independently Published-Amazon: Seattle, WA, USA, 2022; ISBN -13 979-841-146-333-0.
47. Hao, Y.; Sun, G.; Fan, T.; Sun, X.; Liu, Y.; Zhang, N.; Zhao, L.; Zhong, R.; Peng, Y. Prediction on the mutagenicity of nitroaromatic compounds using quantum chemistry descriptors based QSAR and machine learning derived classification methods. Ecotoxicol. Environ. Saf. **2019**, 186, 109822.
48. Lovrić, M.; Pavlović, K.; Žuvela, P.; Spataru, A.; Lučić, B.; Kern, R.; Wong, M.W. Machine learning in prediction of intrinsic aqueous solubility of drug-like compounds: Generalization, complexity, or predictive ability? J. Chemom. **2021**, 35, e3349.
49. Lučić, B.; Batista, J.; Bojović, V.; Lovrić, M.; Sović Kržić, A.; Bešlo, D.; Nadramija, D.; Vikić-Topić, D. Estimation of Random Accuracy and Its Use in Validation of Predictive Quality of Classification Models within Predictive Challenges. Croat. Chem. Acta **2019**, 92, 379–391.
50. Cronin, M.T.D.; Gregory, B.W.; Schultz, T.W. Quantitative Structure−Activity Analyses of Nitrobenzene Toxicity to Tetrahymena pyriformis. Chem. Res. Toxicol. **1998**, 11, 902–908.
51. Schmitt, H.; Altenburger, R.; Jastorff, B.; Schüürmann, G. Quantitative Structure−Activity Analysis of the Algae Toxicity of Nitroaromatic Compounds. Chem. Res. Toxicol. **2000**, 13, 441–450.
52. Ukić, Š.; Sigurnjak, M.; Cvetnić, M.; Markić, M.; Stankov, M.N.; Rogošić, M.; Rasulev, B.; Lončarić Božić, A.; Kušić, H.; Bolanča, T. Toxicity of pharmaceuticals in binary mixtures: Assessment by additive and non-additive toxicity models. Ecotoxicol. Environ. Saf. **2019**, 185, 109696.
53. Cvetnic, M.; Juretic Perisic, D.; Kovacic, M.; Ukic, S.; Bolanca, T.; Rasulev, B.; Kusic, H.; Loncaric Bozic, A. Toxicity of aromatic pollutants and photooxidative intermediates in water: A QSAR study. Ecotoxicol. Environ. Saf. **2019**, 169, 918–927.
54. Sizochenko, N.; Mikolajczyk, A.; Jagiello, K.; Puzyn, T.; Leszczynski, J.; Rasulev, B. How the toxicity of nanomaterials towards different species could be simultaneously evaluated: A novel multi-nano-read-across approach. Nanoscale **2018**, 10, 582–591.
55. Toropov, A.A.; Rasulev, B.F.; Leszczynski, J. QSAR modeling of acute toxicity by balance of correlations. Bioorganic Med. Chem. **2008**, 16, 5999–6008.
56. Klein, D.J. Topological Indices and Related Descriptors in QSAR and QSPR. J. Chem. Inf. Comput. Sci. **2002**, 42, 1507.

**Figure 2.** (**A**) Scatter plots of −logLD_{50} values vs. molecular weight. Histogram of molecular weights for (**B**) training set, (**C**) test set, (**D**) external test set.

**Figure 3.** Experimental versus predicted −logLD_{50} values obtained by the ensemble model for the training (**A**), test (**B**) and external test (**C**) sets.

Method/Model | Runtime Parameters |
---|---|
SVR_A and SVR_B | kernel = 'rbf', degree = 3, gamma = 'auto', coef0 = 0.0, tol = 0.001, C = 5.0, epsilon = 0.1, shrinking = True, cache_size = 200, verbose = False, max_iter = −1 |
MLR | fit_intercept = True, normalize = 'False', copy_X = True, n_jobs = −1, positive = False |
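The two-stage scheme summarized by these parameters (two SVR base models whose predictions are combined by MLR) can be sketched with scikit-learn [36] as follows. This is an illustrative sketch only: the data are synthetic, the descriptor subsets `cols_a` and `cols_b` are hypothetical, and only the SVR settings that differ from or matter beyond the library defaults are passed. The `normalize` option listed for MLR is omitted because it is not available in recent scikit-learn releases.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic stand-in for a descriptor matrix and a toxicity response.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))
y = np.sin(X[:, 0]) + 0.5 * X[:, 3] + 0.1 * rng.normal(size=120)

# Hypothetical descriptor subsets for the two base models.
cols_a, cols_b = [0, 1, 2], [2, 3, 4, 5]

# Base learners with the SVR settings listed in the table above.
svr_params = dict(kernel="rbf", gamma="auto", C=5.0, epsilon=0.1, tol=0.001)
svr_a = SVR(**svr_params).fit(X[:, cols_a], y)
svr_b = SVR(**svr_params).fit(X[:, cols_b], y)

# Second stage: MLR on the two base-model predictions.
base_pred = np.column_stack([svr_a.predict(X[:, cols_a]),
                             svr_b.predict(X[:, cols_b])])
mlr = LinearRegression(fit_intercept=True).fit(base_pred, y)
ensemble_pred = mlr.predict(base_pred)

r2_a = r2_score(y, base_pred[:, 0])
r2_b = r2_score(y, base_pred[:, 1])
r2_ens = r2_score(y, ensemble_pred)
print(r2_a, r2_b, r2_ens)
```

On the training data the MLR combination cannot fit worse than either base model, because each base prediction is itself one of the linear combinations the regression can choose; whether the gain carries over to new compounds has to be checked on held-out sets, as is done in the study with the test and true external test sets.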

Parameters | SVR_A (regression model) | SVR_B (regression model) | Ensemble Model |
---|---|---|---|
No. of descriptors | 11 | 8 | - |
R^{2} (training) | 0.83 | 0.81 | 0.88 |
RMSE (training) | 0.111 | 0.127 | 0.093 |
MAE (training) | 0.221 | 0.226 | 0.199 |
MAECV (5-fold) | 0.484 | 0.486 | 0.480 |
R^{2} (test) | 0.92 | 0.85 | 0.95 |
RMSE (test) | 0.056 | 0.096 | 0.041 |
MAE (test) | 0.191 | 0.250 | 0.155 |
CCC (test) | 0.968 | 0.946 | 0.978 |
R^{2} (external test) | 0.74 | 0.88 | 0.92 |
RMSE (external test) | 0.132 | 0.123 | 0.061 |
MAE (external test) | 0.320 | 0.319 | 0.202 |
CCC (external test) | 0.898 | 0.931 | 0.961 |
${Q}_{F1}^{2}$ | 0.945 | 0.906 | 0.960 |
${Q}_{F2}^{2}$ | 0.943 | 0.903 | 0.958 |
${r}_{m}^{2}$ | 0.510 | 0.536 | 0.560 |
$k$ | 0.955 | 0.981 | 0.975 |
$k\prime $ | 1.041 | 1.007 | 1.021 |

Descriptor | SVR_A | SVR_B | Definition and Scope | Descriptor Type |
---|---|---|---|---|
AVS_B(e) | X | X | average vertex sum from Burden matrix weighted by Sanderson electronegativity | 2D matrix-based descriptors |
HATS7s | X | X | leverage-weighted autocorrelation of lag 7/weighted by I-state | GETAWAY descriptors |
Eta_sh_y | X | X | Eta y shape index | ETA indices |
GATS2v | X | | Geary autocorrelation of lag 2 weighted by van der Waals volume | 2D autocorrelations |
GATS8m | X | | Geary autocorrelation of lag 8 weighted by mass | 2D autocorrelations |
P_VSA_LogP_3 | X | | P_VSA-like on LogP, bin 3 | P_VSA-like descriptors |
nHM | X | | number of heavy atoms | Constitutional indices |
RDF060s | X | | Radial Distribution Function - 060/weighted by I-state | RDF descriptors |
Dm | X | | D total accessibility index/weighted by mass | WHIM descriptors |
H8u | X | | H autocorrelation of lag 8/unweighted | GETAWAY descriptors |
O-059 | X | | Al-O-Al | Atom-centred fragments |
B09[C-C] | | X | Presence/absence of C-C at topological distance 9 | 2D Atom Pairs |
SpMax3_Bh(m) | | X | largest eigenvalue n. 3 of Burden matrix weighted by mass | Burden eigenvalues |
CATS2D_05_NL | | X | CATS2D Negative-Lipophilic at lag 05 | CATS 2D |
Eig02_EA(dm) | | X | eigenvalue n. 2 from edge adjacency mat. weighted by dipole moment | Edge adjacency indices |
C-043 | | X | X--CR.X | Atom-centred fragments |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Daghighi, A.; Casanola-Martin, G.M.; Timmerman, T.; Milenković, D.; Lučić, B.; Rasulev, B.
In Silico Prediction of the Toxicity of Nitroaromatic Compounds: Application of Ensemble Learning QSAR Approach. *Toxics* **2022**, *10*, 746.
https://doi.org/10.3390/toxics10120746
