Evaluation of QSAR Equations for Virtual Screening
Abstract
1. Introduction
2. Results
3. Discussion
4. Methods
4.1. Datasets
4.2. Multiple Linear Regression (MLR)
4.3. Random Forest (RF) and Support Vector Machine (SVM)
4.4. A Novel Algorithm for the Derivation of QSAR Equations (Enrichment Optimization Algorithm; EOA)
1. Given a dataset of M compounds (of which L are active), each characterized by N descriptors, proceed as follows.
2. Select K descriptors at random from the N available (K = 7, 10 or 13 in this work).
3. Select K random weights, $w_1, \dots, w_K$.
4. For each compound j, calculate a predicted activity value: $A_j = \sum_{i=1}^{K} w_i d_{ij}$, where $d_{ij}$ is the value of the i-th selected descriptor for compound j.
5. Sort $\{A_j\}$ from highest to lowest.
6. Count the number of known actives within the first L places of the sorted list. Call this number $P$.
7. Optionally select new descriptors with new weights and/or modify the weights of the current descriptors, yielding new descriptor values $d'_{ij}$ and/or new weights $w'_i$.
8. Calculate $A'_j = \sum_{i=1}^{K} w'_i d_{ij}$ (only the weights were modified) or $A'_j = \sum_{i=1}^{K} w'_i d'_{ij}$ (descriptors were replaced).
9. Sort $\{A'_j\}$ from highest to lowest.
10. Count the number of actives within the first L places of the sorted list. Call this number $P'$.
11. If $P' \ge P$, accept the step and set $P = P'$, $d_{ij} = d'_{ij}$ and $w_i = w'_i$.
12. If $P' < P$, accept the step according to the Metropolis MC criterion:
   - A number r between 0 and 1 is generated randomly, and the step is accepted if $r < \exp(-\Delta P / RT)$, where $\Delta P = P - P'$. When Metropolis MC (MMC) simulations are used to obtain the canonical ensemble, R is the gas constant and T is the absolute temperature. When MMC is used as a global optimizer, as in the present case, R and T are constants with no physical meaning whose values simply determine the acceptance rate. Here, the term RT was reduced linearly, following the simulated annealing procedure.
13. If the step is rejected, keep the old descriptors and weights.
14. Go back to step 7.
15. Throughout the simulation, keep the best value of P, $P_{\text{best}}$, together with its associated descriptors and weights. If several solutions lead to the same value of $P_{\text{best}}$, keep them all.
16. Keep a list {B} of the Z best solutions, i.e., those having the best score, $P_{\text{best}}$.
17. For each solution z in {B}, independently normalize the indices (positions in the sorted list) of the L active compounds and of the M − L inactive compounds. Designate the normalized indices as j′ and j″, respectively.
18. For each solution z in {B}, calculate a score $S_z$ as follows:
   - Sum the normalized j′ indices of the active compounds ranked outside the first L places (rank > L): $S_{\text{active}} = \sum_{\text{actives, rank} > L} j'$.
   - Sum the normalized j″ indices of the inactive compounds ranked within the first L places (rank ≤ L): $S_{\text{inactive}} = \sum_{\text{inactives, rank} \le L} j''$.
   - Set $S_z = S_{\text{active}} - S_{\text{inactive}}$.
19. The solution with the lowest score $S_z$ is selected as the best one and is tested on the validation and test sets.
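
A minimal Python/NumPy sketch of the procedure above is given below. It is an illustration under stated assumptions, not the authors' implementation: the descriptor matrix `X` (compounds × descriptors), the boolean vector `is_active`, the initial weight range [−1, 1], the equal odds of replacing a descriptor versus perturbing one weight, the Gaussian step of 0.1, the starting value `rt0` of RT and the cap `keep` on stored tied solutions are all hypothetical choices. The min-max scaling used for the normalized indices j′ and j″ is likewise only one possible reading of the normalization step.

```python
import numpy as np


def top_l_hits(scores, is_active):
    """P: number of known actives among the L highest-scoring compounds (L = number of actives)."""
    n_act = int(is_active.sum())
    order = np.argsort(-scores, kind="stable")  # highest predicted activity first
    return int(is_active[order[:n_act]].sum())


def s_score(scores, is_active):
    """Tie-breaking score S_z = S_active - S_inactive (lower is better)."""
    n_act = int(is_active.sum())
    order = np.argsort(-scores, kind="stable")
    rank = np.empty(len(scores), dtype=int)
    rank[order] = np.arange(1, len(scores) + 1)  # 1-based position in the sorted list

    def normalize(r):  # ASSUMPTION: min-max scaling of positions within each class
        span = r.max() - r.min()
        return (r - r.min()) / span if span else np.zeros(len(r))

    r_act, r_inact = rank[is_active], rank[~is_active]
    s_active = normalize(r_act)[r_act > n_act].sum()          # actives outside the top L
    s_inactive = normalize(r_inact)[r_inact <= n_act].sum()   # inactives inside the top L
    return float(s_active - s_inactive)


def eoa(X, is_active, k, n_steps=5000, rt0=1.0, keep=10, seed=0):
    """Simulated-annealing Metropolis MC search for a k-descriptor linear equation maximizing P."""
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]
    desc = rng.choice(n_desc, size=k, replace=False)  # random descriptors
    w = rng.uniform(-1.0, 1.0, size=k)                 # random weights (assumed range)
    p = top_l_hits(X[:, desc] @ w, is_active)
    best_p, best = p, [(desc.copy(), w.copy())]
    for step in range(n_steps):
        rt = rt0 * (1.0 - step / n_steps)  # RT reduced linearly (simulated annealing)
        new_desc, new_w = desc.copy(), w.copy()
        i = rng.integers(k)
        if rng.random() < 0.5:  # replace one descriptor (and its weight) with an unused one
            unused = np.setdiff1d(np.arange(n_desc), new_desc)
            if unused.size:
                new_desc[i] = rng.choice(unused)
                new_w[i] = rng.uniform(-1.0, 1.0)
        else:                   # perturb one weight of the current equation
            new_w[i] += rng.normal(scale=0.1)
        p_new = top_l_hits(X[:, new_desc] @ new_w, is_active)
        # Accept improvements and ties; otherwise apply the Metropolis criterion.
        if p_new >= p or rng.random() < np.exp((p_new - p) / rt):
            desc, w, p = new_desc, new_w, p_new
            if p > best_p:
                best_p, best = p, [(desc.copy(), w.copy())]
            elif p == best_p and len(best) < keep:
                best.append((desc.copy(), w.copy()))
    # Among the stored solutions with P = P_best, pick the one with the lowest S_z.
    best_desc, best_w = min(best, key=lambda s: s_score(X[:, s[0]] @ s[1], is_active))
    return best_p, best_desc, best_w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 30))        # synthetic data: 200 compounds, 30 descriptors
    is_active = np.zeros(200, dtype=bool)
    is_active[:40] = True
    X[:40, 3] += 4.0                      # make descriptor 3 informative
    best_p, best_desc, best_w = eoa(X, is_active, k=5)
    print(best_p, sorted(best_desc.tolist()))
```

Because P is an integer count, many equations can share the same P_best, which is why the sketch stores the tied solutions and only then ranks them by S_z.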
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
Set | # Actives = L | # Descriptors | MC Steps | Train | | Validation | | | | Test1 | Test2 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | # Actives among L Top Places (Enrichment) | | | | # Actives among L Top Places (Enrichment) | # Actives among L Top Places (Enrichment) | # Actives = L | # Actives among L Top Places (Enrichment) |
M2 | 50 | 7 | 10⁶ | 0.77 | 48 (2.6) | 0.66 | 0.66 | 0.67 | 43 (2.4) | 32 (66.4) | 67 | 47 (54.5) |
M2 | 50 | 10 | 10⁶ | 0.80 | 48 (2.6) | 0.63 | 0.63 | 0.64 | 45 (2.5) | 0 (0) | 67 | 1 (1.2) |
M2 | 50 | 13 | 10⁶ | 0.82 | 48 (2.6) | 0.61 | 0.61 | 0.62 | 42 (2.3) | 0 (0) | 67 | 0 (0) |
H1 | 50 | 7 | 10⁶ | 0.73 | 47 (2.5) | 0.59 | 0.59 | 0.63 | 42 (2.3) | 0 (0) | 42 | 0 (0) |
H1 | 50 | 10 | 10⁶ | 0.78 | 48 (2.6) | 0.45 | 0.45 | 0.49 | 39 (2.1) | 0 (0) | 42 | 0 (0) |
H1 | 50 | 13 | 10⁶ | 0.82 | 49 (2.6) | 0.56 | 0.56 | 0.60 | 41 (2.2) | 0 (0) | 42 | 0 (0) |
5HT2C | 50 | 7 | 10⁶ | 0.58 | 43 (2.4) | 0.14 | 0.14 | 0.18 | 32 (1.8) | 10 (20.8) | 58 | 21 (32.5) |
5HT2C | 50 | 10 | 10⁶ | 0.65 | 44 (2.5) | 0.08 | 0.08 | 0.12 | 34 (1.9) | 1 (2.1) | 58 | 5 (7.7) |
5HT2C | 50 | 13 | 10⁶ | 0.70 | 45 (2.5) | −0.08 | −0.08 | −0.03 | 30 (1.7) | 1 (2.1) | 58 | 6 (9.3) |
hERG | 100 | 7 | 10⁶ | 0.34 | 67 (4.7) | 0.28 | 0.28 | 0.16 | 64 (4.5) | 87 (45.6) | 26 | 21 (160.5) |
hERG | 100 | 10 | 10⁶ | 0.36 | 69 (4.8) | 0.32 | 0.31 | 0.23 | 68 (4.8) | 87 (45.6) | 26 | 23 (175.8) |
hERG | 100 | 13 | 10⁶ | 0.39 | 71 (5.0) | 0.33 | 0.33 | 0.24 | 72 (5.0) | 91 (47.7) | 26 | 22 (168.2) |
M3 | 75 | 7 | 10⁶ | 0.85 | 74 (2.0) | 0.66 | 0.66 | 0.68 | 68 (1.8) | 0 (0) | 4 | 0 (0) |
M3 | 75 | 10 | 10⁶ | 0.89 | 74 (2.0) | 0.68 | 0.68 | 0.70 | 67 (1.8) | 0 (0) | 4 | 0 (0) |
M3 | 75 | 13 | 10⁶ | 0.91 | 75 (2.0) | 0.73 | 0.73 | 0.75 | 70 (1.9) | 0 (0) | 4 | 0 (0) |
D1 | 58 | 7 | 10⁶ | 0.83 | 57 (2.0) | 0.81 | 0.81 | 0.80 | 56 (1.9) | 20 (30.9) | 20 | 2 (25.8) |
D1 | 58 | 10 | 10⁶ | 0.86 | 57 (2.0) | 0.77 | 0.77 | 0.75 | 57 (2.0) | 0 (0) | 20 | 0 (0) |
D1 | 58 | 13 | 10⁶ | 0.88 | 58 (2.0) | 0.74 | 0.74 | 0.72 | 56 (1.9) | 0 (0) | 20 | 0 (0) |
Alpha2C | 57 | 7 | 10⁶ | 0.77 | 53 (1.9) | 0.77 | 0.77 | 0.77 | 56 (2.0) | 33 (52.8) | 1 | 0 (0) |
Alpha2C | 57 | 10 | 10⁶ | 0.80 | 53 (1.9) | 0.70 | 0.70 | 0.70 | 55 (1.9) | 26 (41.6) | 1 | 0 (0) |
Alpha2C | 57 | 13 | 10⁶ | 0.83 | 54 (1.9) | 0.71 | 0.71 | 0.72 | 53 (1.9) | 29 (46.4) | 1 | 0 (0) |
Set | # Descriptors | # Actives = L | MC Steps | # Actives among L Top Places in Best Model (Enrichment) | | | Test2 | |
---|---|---|---|---|---|---|---|---|
| | | | | Train | Validation | Test1 | # Actives = L | # Actives among L Top Places in Best Model (Enrichment) |
M2 | 7 | 50 | 10⁶ | 47 (2.5) | 40 (2.1) | 40 (83.1) | 67 | 56 (65.0) |
M2 | 10 | 50 | 10⁶ | 47 (2.5) | 44 (2.4) | 39 (81.0) | 67 | 54 (62.6) |
M2 | 13 | 50 | 10⁶ | 47 (2.5) | 42 (2.3) | 38 (78.9) | 67 | 54 (62.6) |
H1 | 7 | 50 | 10⁶ | 48 (2.6) | 37 (2.1) | 31 (64.4) | 42 | 23 (67.6) |
H1 | 10 | 50 | 10⁶ | 49 (2.6) | 42 (2.3) | 32 (66.4) | 42 | 24 (70.5) |
H1 | 13 | 50 | 10⁶ | 48 (2.6) | 38 (2.0) | 32 (66.4) | 42 | 22 (64.6) |
5HT2C | 7 | 50 | 10⁶ | 45 (2.5) | 34 (1.9) | 0 (0) | 58 | 6 (9.3) |
5HT2C | 10 | 50 | 10⁶ | 45 (2.5) | 32 (1.8) | 1 (2.1) | 58 | 8 (12.4) |
5HT2C | 13 | 50 | 10⁶ | 47 (2.6) | 33 (1.8) | 0 (0) | 58 | 0 (0) |
hERG | 7 | 100 | 10⁶ | 67 (4.7) | 60 (4.2) | 86 (45.1) | 26 | 22 (168.2) |
hERG | 10 | 100 | 10⁶ | 75 (5.3) | 61 (4.3) | 89 (46.6) | 26 | 23 (175.8) |
hERG | 13 | 100 | 10⁶ | 77 (5.4) | 59 (4.1) | 87 (45.6) | 26 | 23 (175.8) |
M3 | 7 | 75 | 10⁶ | 74 (2.0) | 65 (1.7) | 49 (45.4) | 4 | 3 (964.7) |
M3 | 10 | 75 | 10⁶ | 74 (2.0) | 67 (1.8) | 0 (0) | 4 | 0 (0) |
M3 | 13 | 75 | 10⁶ | 74 (2.0) | 70 (1.9) | 57 (52.9) | 4 | 3 (964.7) |
D1 | 7 | 58 | 10⁶ | 56 (1.9) | 54 (1.9) | 29 (44.8) | 20 | 11 (141.9) |
D1 | 10 | 58 | 10⁶ | 57 (2.0) | 55 (1.9) | 20 (30.9) | 20 | 1 (12.9) * |
D1 | 13 | 58 | 10⁶ | 57 (2.0) | 54 (1.9) | 41 (63.4) | 20 | 17 (219.3) |
Alpha2C | 7 | 57 | 10⁶ | 49 (1.7) | 47 (1.6) | 25 (40.0) | 1 | 0 (0) |
Alpha2C | 10 | 57 | 10⁶ | 49 (1.7) | 45 (1.6) | 25 (40.0) | 1 | 0 (0) |
Alpha2C | 13 | 57 | 10⁶ | 56 (2.0) | 52 (1.8) | 15 (24.0) | 1 | 0 (0) |
Set | Run | # Actives among L Top Places (Enrichment; MCC) | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|
| | | EOA | | | RF | | | SVM | | |
| | | Train | Validation | Test 1 | Train | Validation | Test 1 | Train | Validation | Test 1 |
M2 | 1 | 46 (9.2; 0.91) | 41 (8.2; 0.80) | 50 (103.8; 1.00) | 42 (8.4; 0.88) | 39 (7.8; 0.77) | 39 (81.0; 0.88) | 43 (8.6; 0.92) | 36 (7.2; 0.84) | 36 (74.8; 0.85) |
M2 | 2 | 46 (9.2; 0.91) | 40 (8.0; 0.78) | 48 (99.7; 0.96) | 42 (8.4; 0.83) | 37 (7.4; 0.73) | 37 (76.8; 0.86) | 42 (8.4; 0.83) | 37 (7.4; 0.84) | 37 (76.8; 0.86) |
M2 | 3 | 45 (9.0; 0.89) | 41 (8.2; 0.80) | 47 (97.6; 0.94) | 43 (8.6; 0.86) | 42 (8.4; 0.79) | 42 (87.2; 0.92) | 41 (8.2; 0.84) | 37 (7.4; 0.73) | 37 (76.8; 0.86) |
M2 | 4 | 44 (8.8; 0.87) | 42 (8.4; 0.82) | 48 (99.7; 0.96) | 40 (8.0; 0.81) | 40 (8.0; 0.80) | 40 (83.1; 0.89) | 41 (8.2; 0.82) | 38 (7.6; 0.77) | 38 (78.9; 0.87) |
H1 | 1 | 44 (8.8; 0.87) | 36 (7.2; 0.69) | 38 (78.9; 0.76) | 46 (9.2; 0.92) | 29 (5.8; 0.61) | 29 (60.2; 0.73) | 41 (8.2; 0.88) | 33 (6.6; 0.76) | 33 (68.5; 0.81) |
H1 | 2 | 42 (8.4; 0.82) | 31 (6.2; 0.58) | 47 (97.6; 0.94) | 47 (9.4; 0.91) | 33 (6.6; 0.67) | 33 (68.5; 0.78) | 40 (8.0; 0.85) | 34 (6.8; 0.78) | 34 (70.6; 0.82) |
H1 | 3 | 38 (7.6; 0.73) | 34 (6.8; 0.64) | 42 (87.2; 0.84) | 47 (9.4; 0.88) | 39 (7.8; 0.77) | 39 (81.0; 0.70) | 39 (7.8; 0.86) | 31 (6.2; 0.73) | 31 (64.4; 0.79) |
H1 | 4 | 43 (8.6; 0.84) | 29 (5.8; 0.53) | 39 (81.0; 0.78) | 41 (8.2; 0.82) | 27 (5.4; 0.46) | 27 (56.1; 0.52) | 39 (7.8; 0.79) | 27 (5.4; 0.7) | 27 (56.1; 0.73) |
5HT2C | 1 | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 50 (103.8; 1.00) | 50 (10.0; 0.98) | 50 (10.0; 0.98) | 50 (103.8; 1.00) | 50 (10.0; 1.00) | 48 (9.6; 0.98) | 48 (99.7; 0.98) |
5HT2C | 2 | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 50 (103.8; 1.00) | 50 (10.0; 0.98) | 50 (10.0; 0.99) | 50 (103.8; 1.00) | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 50 (103.8; 1.00) |
5HT2C | 3 | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 50 (103.8; 1.00) | 50 (10.0; 0.99) | 50 (10.0; 0.96) | 50 (103.8; 1.00) | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 50 (103.8; 1.00) |
5HT2C | 4 | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 50 (103.8; 1.00) | 50 (10.0; 0.98) | 50 (10.0; 0.99) | 50 (103.8; 1.00) | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 50 (103.8; 1.00) |
hERG | 1 | 41 (8.2; 0.80) | 29 (5.8; 0.53) | 42 (87.2; 0.84) | 39 (7.8; 0.83) | 23 (4.6; 0.41) | 23 (47.8; 0.31) | 35 (7.0; 0.81) | 19 (3.8; 0.55) | 19 (39.5; 0.61) |
hERG | 2 | 42 (8.4; 0.82) | 22 (4.4; 0.38) | 38 (78.9; 0.76) | 42 (8.4; 0.90) | 26 (5.2; 0.56) | 26 (54.0; 0.33) | 38 (7.6; 0.84) | 19 (3.8; 0.52) | 19 (39.5; 0.61) |
hERG | 3 | 39 (7.8; 0.76) | 23 (4.6; 0.40) | 45 (93.4; 0.90) | 35 (7.0; 0.77) | 32 (6.4; 0.68) | 32 (66.4; 0.64) | 31 (6.2; 0.77) | 22 (4.4; 0.63) | 22 (45.7; 0.66) |
hERG | 4 | 38 (7.6; 0.73) | 29 (5.8; 0.53) | 47 (97.6; 0.94) | 34 (6.8; 0.78) | 19 (3.8; 0.49) | 19 (39.5; 0.55) | 33 (6.6; 0.8) | 18 (3.6; 0.56) | 18 (37.4; 0.60) |
D1 | 1 | 42 (8.4; 0.82) | 37 (7.4; 0.71) | 44 (91.4; 0.88) | 42 (8.4; 0.86) | 33 (6.6; 0.71) | 33 (68.5; 0.69) | 45 (9.0; 0.94) | 44 (8.8; 0.84) | 44 (91.4; 0.94) |
D1 | 2 | 42 (8.4; 0.82) | 33 (6.6; 0.62) | 46 (95.5; 0.92) | 50 (10.0; 0.97) | 31 (6.2; 0.71) | 31 (64.4; 0.55) | 44 (8.8; 0.91) | 35 (7.0; 0.81) | 35 (72.7; 0.84) |
D1 | 3 | 41 (8.2; 0.80) | 41 (8.2; 0.80) | 45 (93.4; 0.90) | 48 (9.6; 0.93) | 37 (7.4; 0.68) | 37 (76.8; 0.71) | 46 (9.2; 0.94) | 36 (7.2; 0.79) | 36 (74.8; 0.85) |
D1 | 4 | 39 (7.8; 0.76) | 37 (7.4; 0.71) | 45 (93.4; 0.90) | 48 (9.6; 0.98) | 40 (8.0; 0.82) | 40 (83.1; 0.87) | 42 (8.4; 0.91) | 37 (7.4; 0.80) | 37 (76.8; 0.86) |
M3 | 1 | 44 (8.8; 0.87) | 42 (8.4; 0.82) | 49 (101.7; 0.98) | 50 (10.0; 0.98) | 50 (10.0; 0.96) | 50 (103.8; 1.00) | 45 (9.0; 0.94) | 45 (9.0; 0.92) | 45 (93.4; 0.95) |
M3 | 2 | 48 (9.6; 0.96) | 44 (8.8; 0.87) | 48 (99.7; 0.96) | 38 (7.6; 0.84) | 35 (7.0; 0.78) | 35 (72.7; 0.84) | 40 (8.0; 0.88) | 35 (7.0; 0.82) | 35 (72.7; 0.84) |
M3 | 3 | 44 (8.8; 0.87) | 41 (8.2; 0.80) | 48 (99.7; 0.96) | 41 (8.2; 0.88) | 32 (6.4; 0.78) | 32 (66.4; 0.80) | 41 (8.2; 0.9) | 33 (6.6; 0.80) | 33 (68.5; 0.81) |
M3 | 4 | 48 (9.6; 0.96) | 44 (8.8; 0.87) | 44 (91.4; 0.88) | 44 (8.8; 0.92) | 37 (7.4; 0.84) | 37 (76.8; 0.86) | 42 (8.4; 0.91) | 37 (7.4; 0.85) | 37 (76.8; 0.86) |
Alpha2C | 1 | 40 (8.0; 0.78) | 34 (6.8; 0.64) | 37 (76.8; 0.74) | 42 (8.4; 0.86) | 35 (7.0; 0.72) | 35 (72.7; 0.84) | 37 (7.4; 0.85) | 33 (6.6; 0.77) | 33 (68.5; 0.81) |
Alpha2C | 2 | 43 (8.6; 0.84) | 29 (5.8; 0.53) | 44 (91.4; 0.88) | 42 (8.4; 0.87) | 38 (7.6; 0.78) | 38 (78.9; 0.87) | 39 (7.8; 0.86) | 34 (6.8; 0.78) | 34 (70.6; 0.82) |
Alpha2C | 3 | 43 (8.6; 0.84) | 35 (7.0; 0.67) | 43 (89.3; 0.86) | 41 (8.2; 0.85) | 38 (7.6; 0.75) | 38 (78.9; 0.79) | 42 (8.4; 0.91) | 34 (6.8; 0.76) | 34 (70.6; 0.82) |
Alpha2C | 4 | 40 (8.0; 0.78) | 37 (7.4; 0.71) | 42 (87.2; 0.84) | 39 (7.8; 0.81) | 40 (8.0; 0.81) | 40 (83.1; 0.89) | 44 (8.8; 0.93) | 41 (8.2; 0.82) | 41 (85.1; 0.90) |
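
For the EOA columns, the reported enrichment and MCC values are consistent with treating the L top-ranked compounds as predicted actives and all other compounds as predicted inactives, so that the numbers of false positives and false negatives are both L − k, where k (notation introduced here) is the number of actives among the top L places. The hypothetical helper `top_l_metrics` below computes both quantities from k, L and the total number of compounds M; for example, 46 actives among the top 50 of a 500-compound training set (50 actives, 450 inactives) gives an enrichment of 9.2 and an MCC of 0.91, as reported for M2, run 1. The RF and SVM MCC values do not follow this relation (42 of 50 would give 0.82, whereas 0.88 is reported for RF on M2, run 1), so this sketch applies to the EOA columns only.

```python
import math


def top_l_metrics(k, n_actives, n_total):
    """Enrichment and MCC when the top-L ranked compounds are called active.

    k: actives among the top L places; n_actives: L; n_total: M (actives + inactives).
    """
    L, M = n_actives, n_total
    tp = k              # actives inside the top L
    fp = L - k          # inactives inside the top L
    fn = L - k          # actives outside the top L
    tn = M - L - fp     # inactives outside the top L
    enrichment = (tp / L) / (L / M)  # equals k * M / L**2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom
    return enrichment, mcc


# M2, run 1 (EOA): training, validation and Test 1 entries
for k, m in [(46, 500), (41, 500), (50, 5191)]:
    ef, mcc = top_l_metrics(k, 50, m)
    print(f"{k} actives in top 50 of {m}: enrichment {ef:.1f}, MCC {mcc:.2f}")
```
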
Set | Run | # Actives among L Top Places (Enrichment; MCC) | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|
| | | EOA | | | RF | | | SVM | | |
| | | Train | Validation | Test 2 | Train | Validation | Test 2 | Train | Validation | Test 2 |
M2 | 1 * | 46 (9.2; 0.91) | 41 (8.2; 0.80) | 6 (857.8; 1.00) | 42 (8.4; 0.88) | 39 (7.8; 0.77) | 5 (714.9; 0.91) | 43 (8.6; 0.92) | 36 (7.2; 0.84) | 5 (714.9; 0.91) |
M2 | 2 | 47 (9.4; 0.93) | 37 (7.4; 0.71) | 6 (857.8; 1.00) | 42 (8.4; 0.83) | 37 (7.4; 0.73) | 5 (714.9; 0.91) | 42 (8.4; 0.83) | 37 (7.4; 0.84) | 5 (714.9; 0.91) |
M2 | 3 | 49 (9.8; 0.98) | 42 (8.4; 0.82) | 5 (714.9; 0.83) | 43 (8.6; 0.86) | 42 (8.4; 0.79) | 1 (143; 0.41) | 41 (8.2; 0.84) | 37 (7.4; 0.73) | 2 (285.9; 0.58) |
M2 | 4 * | 44 (8.8; 0.87) | 42 (8.4; 0.82) | 5 (714.9; 0.83) | 40 (8.0; 0.81) | 40 (8.0; 0.80) | 5 (714.9; 0.91) | 41 (8.2; 0.82) | 38 (7.6; 0.77) | 5 (714.9; 0.91) |
H1 | 1 | 46 (9.2; 0.91) | 36 (7.2; 0.69) | 115 (32.8; 0.84) | 46 (9.2; 0.92) | 29 (5.8; 0.61) | 84 (24; 0.77) | 41 (8.2; 0.88) | 33 (6.6; 0.76) | 81 (23.1; 0.77) |
H1 | 2 * | 42 (8.4; 0.82) | 31 (6.2; 0.58) | 128 (34.5; 0.91) | 47 (9.4; 0.91) | 33 (6.6; 0.67) | 85 (22.9; 0.76) | 40 (8.0; 0.85) | 34 (6.8; 0.78) | 81 (21.8; 0.76) |
H1 | 3 * | 38 (7.6; 0.73) | 34 (6.8; 0.64) | 115 (34.3; 0.86) | 47 (9.4; 0.88) | 39 (7.8; 0.77) | 88 (26.2; 0.72) | 39 (7.8; 0.86) | 31 (6.2; 0.73) | 71 (21.2; 0.73) |
H1 | 4 * | 43 (8.6; 0.84) | 29 (5.8; 0.53) | 120 (33.3; 0.87) | 41 (8.2; 0.82) | 27 (5.4; 0.46) | 90 (25.0; 0.70) | 39 (7.8; 0.79) | 27 (5.4; 0.70) | 97 (26.9; 0.84) |
5HT2C | 1 * | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 98 (53.5; 1.00) | 50 (10.0; 0.98) | 50 (10.0; 0.98) | 98 (53.5; 1.00) | 50 (10.0; 1.00) | 48 (9.6; 0.98) | 97 (52.9; 0.99) |
5HT2C | 2 * | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 97 (54.0; 1.00) | 50 (10.0; 0.98) | 50 (10.0; 0.99) | 97 (54; 1.00) | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 97 (54.0; 1.00) |
5HT2C | 3 * | 50 (10.0; 1.00) | 49 (9.8; 0.98) | 98 (53.5; 1.00) | 50 (10.0; 0.99) | 50 (10.0; 0.96) | 98 (53.5; 1.00) | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 98 (53.5; 1.00) |
5HT2C | 4 * | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 98 (53.5; 1.00) | 50 (10.0; 0.98) | 50 (10.0; 0.99) | 98 (53.5; 1.00) | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 98 (53.5; 1.00) |
hERG | 1 * | 41 (8.2; 0.80) | 29 (5.8; 0.53) | 112 (37.2; 0.89) | 39 (7.8; 0.83) | 23 (4.6; 0.41) | 65 (21.6; 0.47) | 35 (7.0; 0.81) | 19 (3.8; 0.55) | 62 (20.6; 0.70) |
hERG | 2 * | 42 (8.4; 0.82) | 22 (4.4; 0.38) | 110 (78.9; 0.87) | 42 (8.4; 0.90) | 26 (5.2; 0.56) | 72 (23.9; 0.49) | 38 (7.6; 0.84) | 19 (3.8; 0.52) | 51 (16.9; 0.63) |
hERG | 3 | 37 (7.4; 0.71) | 28 (5.6; 0.51) | 103 (34.2; 0.81) | 35 (7.0; 0.77) | 32 (6.4; 0.68) | 70 (23.2; 0.66) | 31 (6.2; 0.77) | 22 (4.4; 0.63) | 61 (20.2; 0.69) |
hERG | 4 | 42 (8.4; 0.82) | 32 (6.4; 0.6) | 111 (36.8; 0.88) | 34 (6.8; 0.78) | 19 (3.8; 0.49) | 42 (13.9; 0.54) | 33 (6.6; 0.80) | 18 (3.6; 0.56) | 47 (15.6; 0.61) |
D1 | 1 * | 42 (8.4; 0.82) | 37 (7.4; 0.71) | 40 (102.4; 0.89) | 42 (8.4; 0.86) | 33 (6.6; 0.71) | 35 (89.6; 0.75) | 45 (9.0; 0.94) | 44 (8.8; 0.84) | 38 (97.3; 0.92) |
D1 | 2 | 42 (8.4; 0.82) | 37 (7.4; 0.71) | 41 (105; 0.91) | 50 (10.0; 0.97) | 31 (6.2; 0.71) | 35 (89.6; 0.64) | 44 (8.8; 0.91) | 35 (7.0; 0.81) | 35 (89.6; 0.88) |
D1 | 3 | 41 (8.2; 0.80) | 37 (7.4; 0.71) | 38 (93.2; 0.82) | 48 (9.6; 0.93) | 37 (7.4; 0.68) | 39 (95.6; 0.77) | 46 (9.2; 0.94) | 36 (7.2; 0.79) | 35 (85.8; 0.87) |
D1 | 4 | 40 (8.0; 0.78) | 37 (7.4; 0.71) | 43 (105.4; 0.93) | 48 (9.6; 0.98) | 40 (8.0; 0.82) | 44 (107.9; 0.96) | 42 (8.4; 0.91) | 37 (7.4; 0.80) | 39 (95.6; 0.92) |
M3 | 1 * | 44 (8.8; 0.87) | 42 (8.4; 0.82) | 65 (77.7; 0.98) | 50 (10.0; 0.98) | 50 (10.0; 0.96) | 66 (78.9; 1.00) | 45 (9.0; 0.94) | 45 (9.0; 0.92) | 55 (65.7; 0.91) |
M3 | 2 * | 48 (9.6; 0.96) | 44 (8.8; 0.87) | 66 (78.9; 1.00) | 38 (7.6; 0.84) | 35 (7.0; 0.78) | 47 (56.2; 0.84) | 40 (8.0; 0.88) | 35 (7.0; 0.82) | 48 (57.4; 0.85) |
M3 | 3 * | 44 (8.8; 0.87) | 41 (8.2; 0.8) | 64 (76.5; 0.97) | 41 (8.2; 0.88) | 32 (6.4; 0.78) | 40 (47.8; 0.78) | 41 (8.2; 0.90) | 33 (6.6; 0.80) | 43 (51.4; 0.81) |
M3 | 4 | 44 (8.8; 0.87) | 41 (8.2; 0.8) | 61 (72.9; 0.92) | 44 (8.8; 0.92) | 37 (7.4; 0.84) | 47 (56.2; 0.84) | 42 (8.4; 0.91) | 37 (7.4; 0.85) | 46 (55; 0.83) |
Alpha2C | 1 * | 40 (8.0; 0.78) | 34 (6.8; 0.64) | 6 (255.5; 0.54) | 42 (8.4; 0.86) | 35 (7.0; 0.72) | 8 (340.6; 0.85) | 37 (7.4; 0.85) | 33 (6.6; 0.77) | 7 (298.0; 0.80) |
Alpha2C | 2 * | 43 (8.6; 0.84) | 29 (5.8; 0.53) | 9 (383.2; 0.82) | 42 (8.4; 0.87) | 38 (7.6; 0.78) | 9 (383.2; 0.90) | 39 (7.8; 0.86) | 34 (6.8; 0.78) | 8 (340.6; 0.85) |
Alpha2C | 3 * | 43 (8.6; 0.84) | 35 (7.0; 0.67) | 8 (340.6; 0.73) | 41 (8.2; 0.85) | 38 (7.6; 0.75) | 7 (298.1; 0.54) | 42 (8.4; 0.91) | 34 (6.8; 0.76) | 7 (298.0; 0.80) |
Alpha2C | 4 * | 40 (8.0; 0.78) | 37 (7.4; 0.71) | 6 (482.7; 0.75) | 39 (7.8; 0.81) | 40 (8.0; 0.81) | 5 (402.3; 0.79) | 44 (8.8; 0.93) | 41 (8.2; 0.82) | 6 (482.7; 0.87) |
Dataset | # Descriptors | Training Set | | | Validation Set | | | Test Set 1 | | | Test Set 2 | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | # Actives | # Inactives | Maximal Enrichment | # Actives | # Inactives | Maximal Enrichment | # Actives | # Random | Maximal Enrichment | # Actives | # Random | Maximal Enrichment |
5HT2C | 7, 10, 13 | 50 | 87 | 2.7 | 50 | 87 | 2.7 | 50 | 5141 | 103.8 | 67 | 5141 | 77.7 |
M2 | 7, 10, 13 | 50 | 84 | 2.7 | 50 | 84 | 2.7 | 50 | 5141 | 103.8 | 42 | 5141 | 123.4 |
H1 | 7, 10, 13 | 50 | 90 | 2.8 | 50 | 90 | 2.8 | 50 | 5141 | 103.8 | 58 | 5141 | 89.6 |
hERG | 7, 10, 13 | 100 | 600 | 7.0 | 100 | 600 | 7.0 | 100 | 5141 | 52.4 | 26 | 5141 | 198.7 |
M3 | 7, 10, 13 | 75 | 75 | 2.0 | 75 | 75 | 2.0 | 75 | 5141 | 69.5 | 4 | 5141 | 1286.3 |
D1 | 7, 10, 13 | 58 | 58 | 2.0 | 58 | 58 | 2.0 | 58 | 5141 | 89.6 | 20 | 5141 | 258.1 |
Alpha2C | 7, 10, 13 | 57 | 57 | 2.0 | 57 | 57 | 2.0 | 57 | 5141 | 91.2 | 1 | 5141 | 5142.0 |
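
The Maximal Enrichment columns are consistent with the standard enrichment factor: the fraction of the L actives recovered among the top L places divided by the fraction of actives in the whole set of M compounds (actives plus inactives or random compounds). Writing k for the number of actives among the top L places (notation introduced here), the maximum is reached at k = L:

```latex
\mathrm{EF} = \frac{k/L}{L/M} = \frac{k\,M}{L^{2}},
\qquad
\mathrm{EF}_{\max} = \mathrm{EF}\big|_{k=L} = \frac{M}{L}
```

For 5HT2C, for example, this gives (50 + 87)/50 ≈ 2.7 for the training set, (50 + 5141)/50 ≈ 103.8 for Test Set 1 and (67 + 5141)/67 ≈ 77.7 for Test Set 2, matching the table. Likewise, 32 actives among the top 50 places of Test Set 1 correspond to 32 × 5191/50² ≈ 66.4, the value reported for M2 with seven descriptors.
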
Dataset | # Descriptors | Training Set | | | Validation Set | | | Test Set 1 | | | Test Set 2 | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | # Actives | # Inactives | Maximal Enrichment | # Actives | # Inactives | Maximal Enrichment | # Actives | # Inactives | Maximal Enrichment | # Actives | # Inactives | Maximal Enrichment |
5HT2C | 7, 10, 13 | 50 | 450 | 10.0 | 50 | 450 | 10.0 | 50 | 5141 | 103.8 | 97–98 * | 5141 | 53.4–54.0 |
M2 | 7, 10, 13 | 50 | 450 | 10.0 | 50 | 450 | 10.0 | 50 | 5141 | 103.8 | 6 | 5141 | 857.8 |
H1 | 7, 10, 13 | 50 | 450 | 10.0 | 50 | 450 | 10.0 | 50 | 5141 | 103.8 | 133–140 * | 5141 | 37.7–39.7 |
hERG | 7, 10, 13 | 50 | 450 | 10.0 | 50 | 450 | 10.0 | 50 | 5141 | 103.8 | 126 | 5141 | 41.8 |
M3 | 7, 10, 13 | 50 | 450 | 10.0 | 50 | 450 | 10.0 | 50 | 5141 | 103.8 | 66 | 5141 | 78.9 |
D1 | 7, 10, 13 | 50 | 450 | 10.0 | 50 | 450 | 10.0 | 50 | 5141 | 103.8 | 45–46 * | 5141 | 112.8–115.2 |
Alpha2C | 7, 10, 13 | 50 | 450 | 10.0 | 50 | 450 | 10.0 | 50 | 5141 | 103.8 | 8–11 * | 5141 | 468.4–643.6 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Spiegel, J.; Senderowitz, H. Evaluation of QSAR Equations for Virtual Screening. Int. J. Mol. Sci. 2020, 21, 7828. https://doi.org/10.3390/ijms21217828