Discovering the Active Ingredients of Medicine and Food Homologous Substances for Inhibiting the Cyclooxygenase-2 Metabolic Pathway by Machine Learning Algorithms
Abstract
1. Introduction
2. Results and Discussion
2.1. Chemical Space and Scaffold Analysis of the MFH Database
2.2. Performances Evaluation and Comparison of Developed Models
2.2.1. Performances of Classification and Ensemble Models on Dataset 1
2.2.2. Performances of Classification and Ensemble Models on Dataset 2
2.2.3. Performances of Regression Models on Dataset 3
2.2.4. Performances of Classification and Ensemble Models on Dataset 4
2.2.5. Performances of Classification and Ensemble Models on Dataset 5
2.2.6. Performances of Regression Models on Dataset 6
2.2.7. Performances of the External Validation Sets
2.3. Virtual Screening on the MFH Database
2.3.1. Potential COX-2 Inhibitors in the MFH Database
Candidates | Origins | IC50 (μM) a | SOM HP b | Effects |
---|---|---|---|---|
cmp_A1 | Radix Salviae | 1.74 | 0.91 | Anti-inflammation [25] |
cmp_A2 | Gigeriae Galli Endothelium | 2.69 | 0.82 | |
cmp_A3 | Panax Ginseng | 3.47 | 1 | COX-2 inhibition [24] |
cmp_A4 | Angelica sinensis Radix | 3.69 | 1 | |
cmp_A5 | Jujubae Fructus | 4.24 | 1 | Antibacterial [28] |
cmp_A6 | Atractylodes macrocephala | 4.37 | 0.91 | Anti-inflammation [26] |
cmp_A7 | Lycii Fructus | 4.60 | 0.91 | Anti-inflammation [27] |
cmp_A8 | Fagopyrum esculentum | 5.20 | 0.82 | |
cmp_A9 | Mori Folium | 5.92 | 0.82 | |
cmp_A10 | Glycyrrhiza glabra L. | 6.97 | 1 | Anti-oxidation [29] |
2.3.2. Potential mPGES-1 Inhibitors in the MFH Database
2.3.3. Molecular Docking on the Potential COX-2 and mPGES-1 Inhibitors
Molecular Docking Analysis on Potential COX-2 Inhibitors
Molecular Docking Analysis on Potential mPGES-1 Inhibitors
3. Materials and Methods
3.1. Construction of the Catalogue for MFH Substances
3.2. Collection and Preparation of Active Ingredients from MFH Substances
3.3. Chemical Space Analysis on the MFH Substances Database
3.4. Construction of Datasets for Building Classification and Regression Models
3.4.1. Datasets for Modeling on COX-2 Inhibitors
3.4.2. Datasets for Modeling on mPGES-1 Inhibitors
3.4.3. Splitting Strategy for Generating the Training/Test Set
3.5. Characterization of Datasets
3.5.1. Binary Fingerprints for Classification Models
3.5.2. Physicochemical Molecular Descriptors for Regression Models
3.6. Supervised Machine Learning Algorithms for Modeling
3.6.1. Modeling with SVM, RF, and XGBoost
3.6.2. Modeling with DNN
3.7. Ensemble Learning Based on Developed Classification Models
3.8. Unsupervised Machine Learning on MFH Substances
3.9. Evaluation of Model Performances
3.10. Pan Assay Interference Compounds (PAINS) Screening
3.11. Molecular Docking
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Sample Availability
References
- Tabas, I.; Glass, C.K. Anti-Inflammatory Therapy in Chronic Disease: Challenges and Opportunities. Science 2013, 339, 166–172.
- Dennis, E.A.; Norris, P.C. Eicosanoid Storm in Infection and Inflammation. Nat. Rev. Immunol. 2015, 15, 511–523.
- Ju, Z.; Li, M.; Xu, J.; Howell, D.C.; Li, Z.; Chen, F.-E. Recent Development on COX-2 Inhibitors as Promising Anti-Inflammatory Agents: The Past 10 Years. Acta Pharm. Sin. B 2022, 12, 2790–2807.
- Zhang, Y.-Y.; Yao, Y.-D.; Luo, J.-F.; Liu, Z.-Q.; Huang, Y.-M.; Wu, F.-C.; Sun, Q.-H.; Liu, J.-X.; Zhou, H. Microsomal Prostaglandin E2 Synthase-1 and Its Inhibitors: Molecular Mechanisms and Therapeutic Significance. Pharmacol. Res. 2022, 175, 105977.
- Xue, H.; Wang, W.; Bian, J.; Gao, Y.; Hao, Z.; Tan, J. Recent Advances in Medicinal and Edible Homologous Polysaccharides: Extraction, Purification, Structure, Modification, and Biological Activities. Int. J. Biol. Macromol. 2022, 222, 1110–1126.
- Hu, Y.; Chen, X.; Hu, M.; Zhang, D.; Yuan, S.; Li, P.; Feng, L. Medicinal and Edible Plants in the Treatment of Dyslipidemia: Advances and Prospects. Chin. Med. J. 2022, 17, 113.
- Yang, M.; Yan, T.; Yu, M.; Kang, J.; Gao, R.; Wang, P.; Zhang, Y.; Zhang, H.; Shi, L. Advances in Understanding of Health-promoting Benefits of Medicine and Food Homology Using Analysis of Gut Microbiota and Metabolomics. Food Front. 2020, 1, 398–419.
- Ju, Z.; Lin, X.; Lu, X.; Tu, Z.; Wang, J.; Kaliyaperumal, K.; Liu, J.; Tian, Y.; Xu, S.; Liu, Y. Botryoisocoumarin A, a New COX-2 Inhibitor from the Mangrove Kandelia Candel Endophytic Fungus Botryosphaeria Sp. KcF6. J. Antibiot. 2015, 68, 653–656.
- Ambati, G.G.; Jachak, S.M. Natural Product Inhibitors of Cyclooxygenase (COX) Enzyme: A Review on Current Status and Future Perspectives. Curr. Med. Chem. 2021, 28, 1877–1905.
- Khan, H.; Rengasamy, K.R.R.; Pervaiz, A.; Nabavi, S.M.; Atanasov, A.G.; Kamal, M.A. Plant-Derived MPGES-1 Inhibitors or Suppressors: A New Emerging Trend in the Search for Small Molecules to Combat Inflammation. Eur. J. Med. Chem. 2018, 153, 2–28.
- Schaible, A.M.; Traber, H.; Temml, V.; Noha, S.M.; Filosa, R.; Peduto, A.; Weinigel, C.; Barz, D.; Schuster, D.; Werz, O. Potent Inhibition of Human 5-Lipoxygenase and Microsomal Prostaglandin E2 Synthase-1 by the Anti-Carcinogenic and Anti-Inflammatory Agent Embelin. Biochem. Pharmacol. 2013, 86, 476–486.
- Hwang, S.H.; Wecksler, A.T.; Wagner, K.; Hammock, B.D. Rationally Designed Multitarget Agents Against Inflammation and Pain. Curr. Med. Chem. 2013, 20, 1783–1799.
- Badwan, B.A.; Liaropoulos, G.; Kyrodimos, E.; Skaltsas, D.; Tsirigos, A.; Gorgoulis, V.G. Machine Learning Approaches to Predict Drug Efficacy and Toxicity in Oncology. Cell Rep. Methods 2023, 3, 100413.
- Wang, M.; Wang, J.; Weng, G.; Kang, Y.; Pan, P.; Li, D.; Deng, Y.; Li, H.; Hsieh, C.Y.; Hou, T. ReMODE: A Deep Learning-Based Web Server for Target-Specific Drug Design. J. Cheminform. 2022, 14, 84.
- Kwon, Y.; Park, S.; Lee, J.; Kang, J.; Lee, H.J.; Kim, W. BEAR: A Novel Virtual Screening Method Based on Large-Scale Bioactivity Data. J. Chem. Inf. Model. 2023, 63, 1429–1437.
- Zhang, L.-X.; Dong, J.; Wei, H.; Shi, S.-H.; Lu, A.-P.; Deng, G.-M.; Cao, D.-S. TCMSID: A Simplified Integrated Database for Drug Discovery from Traditional Chinese Medicine. J. Cheminform. 2022, 14, 89.
- Yang, M.; Chen, J.; Shi, X.; Xu, L.; Xi, Z.; You, L.; An, R.; Wang, X. Development of in Silico Models for Predicting P-Glycoprotein Inhibitors Based on a Two-Step Approach for Feature Selection and Its Application to Chinese Herbal Medicine Screening. Mol. Pharm. 2015, 12, 3691–3713.
- Ali, D.E.; Gedaily, R.A.E.; Ezzat, S.M.; Sawy, M.A.E.; Meselhy, M.R.; Abdel-Sattar, E. In Silico and in Vitro Anti-Inflammatory Study of Phenolic Compounds Isolated from Eucalyptus Maculata Resin. Sci. Rep. 2023, 13, 2093.
- Traditional Chinese Medicine Integrated Database. Available online: http://47.100.169.139/tcmid/ (accessed on 3 June 2023).
- TCMSP—Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform. Available online: https://tcmsp-e.com/tcmsp.php (accessed on 3 June 2023).
- Godden, J.W.; Xue, L.; Bajorath, J. Combinatorial Preferences Affect Molecular Similarity/Diversity Calculations Using Binary Fingerprints and Tanimoto Coefficients. J. Chem. Inf. Comput. Sci. 2000, 40, 163–166.
- Sinaga, K.P.; Yang, M.S. Unsupervised K-Means Clustering Algorithm. IEEE Access 2020, 8, 80716–80727.
- Zhang, R.; Ren, S.; Dai, Q.; Shen, T.; Li, X.; Li, J.; Xiao, W. InflamNat: Web-Based Database and Predictor of Anti-Inflammatory Natural Products. J. Cheminform. 2022, 14, 30.
- Fernandes, E.S.; Passos, G.F.; Medeiros, R.; Da Cunha, F.M.; Ferreira, J.; Campos, M.M.; Pianowski, L.F.; Calixto, J.B. Anti-Inflammatory Effects of Compounds Alpha-Humulene and (−)-Trans-Caryophyllene Isolated from the Essential Oil of Cordia Verbenacea. Eur. J. Pharmacol. 2007, 569, 228–236.
- Lim, Y.; Lee, S.; Kim, J.Y.; Shin, J.H.; Kwon, O. A Phellinus Baumii–Based Supplement Containing Salvia Miltiorrhiza Bunge Improves Atherothrombotic Profiles through Endothelial Nitric Oxide Synthase and Cyclooxygenase Pathways in Vitro and in Vivo. J. Funct. Foods 2016, 24, 231–243.
- Bayala, B.; Bassole, I.H.N.; Gnoula, C.; Nebie, R.; Yonli, A.; Morel, L.; Figueredo, G.; Nikiema, J.-B.; Lobaccaro, J.-M.A.; Simpore, J. Chemical Composition, Antioxidant, Anti-Inflammatory and Anti-Proliferative Activities of Essential Oils of Plants from Burkina Faso. PLoS ONE 2014, 9, e92122.
- Tan, K.C.; Pham, T.X.; Lee, Y.; Lee, J.Y.; Balunas, M.J. Identification of Apocarotenoids as Chemical Markers of in Vitro Anti-Inflammatory Activity for Spirulina Supplements. J. Agric. Food Chem. 2021, 69, 12674–12685.
- Morel, A.F.; Araujo, C.A. Antibacterial Cyclopeptide Alkaloids from the Bark of Condalia buxifolia. Phytochemistry 2002, 61, 561–566.
- Chin, Y.W.; Jung, H.A.; Liu, Y.; Su, B.N.; Castoro, J.A.; Keller, W.J.; Pereira, M.A.; Kinghorn, A.D. Anti-Oxidant Constituents of the Roots and Stolons of Licorice (Glycyrrhiza glabra). J. Agric. Food Chem. 2007, 55, 4691–4697.
- Vitale, P.; Panella, A.; Scilimati, A.; Perrone, M.G. COX-1 Inhibitors: Beyond Structure Toward Therapy: Selective COX-1 inhibitors. Med. Res. Rev. 2016, 36, 641–671.
- Guo, F.; Feng, L.; Huang, C.; Ding, H.; Zhang, X.; Wang, Z.; Li, Y. Prenylflavone Derivatives from Broussonetia Papyrifera, Inhibit the Growth of Breast Cancer Cells in Vitro and in Vivo. Phytochem. Lett. 2013, 6, 331–336.
- Lin, Y.; Kuang, Y.; Li, K.; Wang, S.; Song, W.; Qiao, X.; Sabir, G.; Ye, M. Screening for Bioactive Natural Products from a 67-Compound Library of Glycyrrhiza Inflata. Bioorg. Med. Chem. 2017, 25, 3706–3713.
- Damodar, K.; Kim, J.K.; Jun, J.G. Synthesis and Pharmacological Properties of Naturally Occurring Prenylated and Pyranochalcones as Potent Anti-Inflammatory Agents. Chin. Chem. Lett. 2016, 27, 698–702.
- Nishizawa, M.; Izuhara, R.; Kaneko, K.; Fujimoto, Y. 3-Caffeoyl-4-sinapoylquinic Acid, a Novel Lipoxygenase Inhibitor from Gardeniae Fructus. Chem. Pharm. Bull. 1987, 35, 2133–2135.
- Song, J.H.; Cui, L.; An, L.B.; Li, W.T.; Fang, Z.Z.; Zhang, Y.Y.; Dong, P.-P.; Wu, X.; Wang, L.-X.; Gonzalez, F.J.; et al. Inhibition of UDP-Glucuronosyltransferases (UGTs) Activity by Constituents of Schisandra Chinensis: UGTs’ Inhibition by Schisandra chinensis. Phytother. Res. 2015, 29, 1658–1664.
- Chiu, T.; Ku, C.; Ho, T.; Tsai, K.; Yang, Y.; Ou, H.; Chen, H. Schisanhenol Ameliorates oxLDL-caused Endothelial Dysfunction by Inhibiting LOX-1 Signaling. Environ. Toxicol. 2023, 38, tox.23788.
- Yamakuni, T.; Aoki, K.; Nakatani, K.; Kondo, N.; Oku, H.; Ishiguro, K.; Ohizumi, Y. Garcinone B Reduces Prostaglandin E2 Release and NF-κB-Mediated Transcription in C6 Rat Glioma Cells. Neurosci. Lett. 2006, 394, 206–210.
- Hu, X.; Yu, M.H.; Yan, G.R.; Wang, H.Y.; Hou, A.J.; Lei, C. Isoprenylated Phenolic Compounds with Tyrosinase Inhibition from Morus nigra. J. Asian Nat. Prod. Res. 2018, 20, 488–493.
- Coste, C.; Gérard, N.; Dinh, C.P.; Bruguière, A.; Rouger, C.; Leong, S.T.; Awang, K.; Richomme, P.; Derbré, S.; Charreau, B. Targeting MHC Regulation Using Polycyclic Polyprenylated Acylphloroglucinols Isolated from Garcinia Bancana. Biomolecules 2020, 10, 1266.
- Li, K.; Yu, X.H.; Maskey, A.R.; Musa, I.; Wang, Z.Z.; Garcia, V.; Guo, A.; Yang, N.; Srivastava, K.; Dunkin, D.; et al. Cytochrome P450 3A4 Suppression by Epimedium and Active Compound Kaempferol Leads to Synergistic Anti-Inflammatory Effect with Corticosteroid. Front. Pharmacol. 2023, 13, 1042756.
- Kim, H.W.; Kim, J.B.; Shanmugavelan, P.; Kim, S.N.; Cho, Y.S.; Kim, H.R.; Lee, J.T.; Jeon, W.-T.; Lee, D.J. Evaluation of γ-Oryzanol Content and Composition from the Grains of Pigmented Rice-Germplasms by LC-DAD-ESI/MS. BMC Res. Notes 2013, 6, 149–159.
- Lee, K.-T.; Sohn, I.-C.; Kim, D.-H.; Choi, J.-W.; Kwon, S.-H.; Park, H.-J. Hypoglycemic and Hypolipidemic Effects of Tectorigenin and Kaikasaponin III in the Streptozotocin-Induced Diabetic Rat and Their Antioxidant Activity in Vitro. Arch. Pharm. Res. 2000, 23, 461–466.
- Chen, X.; Zhang, J.; Li, R.; Zhang, H.; Sun, Y.; Jiang, L.; Wang, X.; Xiong, Y. Flos Puerariae-Semen Hoveniae Medicinal Pair Extract Ameliorates DSS-Induced Inflammatory Bowel Disease through Regulating MAPK Signaling and Modulating Gut Microbiota Composition. Front. Pharmacol. 2022, 13, 1034031.
- Sun, Y.; Long, J.; Chen, W.; Sun, Y.; Zhou, L.; Zhang, L.; Zeng, H.; Yuan, D. Alisol B 23-Acetate, a New Promoter for Cholesterol Efflux from Dendritic Cells, Alleviates Dyslipidemia and Inflammation in Advanced Atherosclerotic Mice. Int. Immunopharmacol. 2021, 99, 107956.
- Baell, J.B.; Holloway, G.A. New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays. J. Med. Chem. 2010, 53, 2719–2740.
- Marnett, L.J. Recent Developments in Cyclooxygenase Inhibition. Prostaglandins Other Lipid Mediat. 2002, 68–69, 153–164.
- Romero-Estrada, A.; Boto, A.; González-Christen, J.; Romero-Estudillo, I.; Garduño-Ramírez, M.L.; Razo-Hernández, R.S.; Marquina, S.; Maldonado-Magaña, A.; Columba-Palomares, M.C.; Sánchez-Carranza, J.N.; et al. Synthesis, Biological Evaluation, and Molecular Docking Study of 3-Amino and 3-Hydroxy-Seco A Derivatives of α-Amyrin and 3-Epilupeol as Inhibitors of COX-2 Activity and NF-κB Activation. J. Nat. Prod. 2022, 85, 787–803.
- Shekfeh, S.; Çalışkan, B.; Fischer, K.; Yalçın, T.; Garscha, U.; Werz, O.; Banoglu, E. A Multi-step Virtual Screening Protocol for Identification of Novel Non-acidic Microsomal Prostaglandin E2 Synthase-1 (MPGES-1) Inhibitors. ChemMedChem 2018, 14, 273–281.
- National Health Commission of P. R. China. Management Approach of Food and Chinese Medicine Homologous Catalogue According to Tradition. Available online: http://www.nhc.gov.cn/wjw/yjzj/201411/67ac54fb05ed46929adc63f2db31d4bf.shtml (accessed on 3 June 2023).
- National Health Commission of P. R. China. List of 9 Pilot Food and Chinese Medicine Homologous Substances According to Tradition. Available online: http://www.nhc.gov.cn/sps/s7885/202001/1ec2cca04146450d9b14acc2499d854f.shtml (accessed on 3 June 2023).
- Pharmacopoeia Committee of P. R. China. Pharmacopoeia of People’s Republic of China, 11th ed.; China Medical Science and Technology Press: Beijing, China, 2020.
- CNKI. Available online: https://www.cnki.net/index/ (accessed on 3 June 2023).
- Schaduangrat, N.; Anuwongcharoen, N.; Charoenkwan, P.; Shoombuatong, W. DeepAR: A Novel Deep Learning-Based Hybrid Framework for the Interpretable Prediction of Androgen Receptor Antagonists. J. Cheminform. 2023, 15, 50.
- RDKit: Open-Source Cheminformatics Software. Available online: https://rdkit.org/ (accessed on 3 June 2023).
- Matplotlib: Visualization with Python. Available online: https://matplotlib.org/ (accessed on 3 June 2023).
- Gisbrecht, A.; Schulz, A.; Hammer, B. Parametric Nonlinear Dimensionality Reduction Using Kernel T-SNE. Neurocomputing 2015, 147, 71–82.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Qin, Z.; Xi, Y.; Zhang, S.; Tu, G.; Yan, A. Classification of Cyclooxygenase-2 Inhibitors Using Support Vector Machine and Random Forest Methods. J. Chem. Inf. Model. 2019, 59, 1988–2008.
- European Bioinformatics Institute of European Molecular Biology Laboratory ChEMBL. Available online: https://www.ebi.ac.uk/chembl/g/ (accessed on 3 June 2023).
- Elsevier Reaxys. Available online: https://www.reaxys.com/#/search/quick (accessed on 3 June 2023).
- Chemical Abstracts Service of America SciFinder. Available online: https://scifinder-n.cas.org/ (accessed on 3 June 2023).
- Tian, Y.; Yang, Z.; Wang, H.; Yan, A. Prediction of Bioactivities of Microsomal Prostaglandin E2 Synthase-1 Inhibitors by Machine Learning Algorithms. Chem. Biol. Drug Des. 2023, 101, 1307–1321.
- Kc, G.B.; Bocci, G.; Verma, S.; Hassan, M.M.; Holmes, J.; Yang, J.J.; Sirimulla, S.; Oprea, T.I. A Machine Learning Platform to Estimate Anti-SARS-CoV-2 Activities. Nat. Mach. Intell. 2021, 3, 527–535.
- Riniker, S.; Landrum, G.A. Open-Source Platform to Benchmark Fingerprints for Ligand-Based Virtual Screening. J. Cheminform. 2013, 5, 26.
- CORINA Symphony, version 1.0; Molecular Networks GmbH: Germany, Altamira; LLC: New York, NY, USA, 2018; Available online: https://mn-am.com/products/corinasymphony/ (accessed on 3 June 2023).
- MayaChemTools: A Growing Collection of Perl and Python Scripts, Modules, and Classes to Support a Variety of Day-to-Day Computational Discovery Needs. Available online: http://www.mayachemtools.org/ (accessed on 4 June 2023).
- Noble, W.S. What Is a Support Vector Machine? Nat. Biotechnol. 2006, 24, 1565–1567.
- Belgiu, M.; Drăguţ, L. Random Forest in Remote Sensing: A Review of Applications and Future Directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444.
- Wang, L.; Ding, J.; Shi, P.; Fu, L.; Pan, L.; Tian, J.; Cao, D.; Jiang, H.; Ding, X. Ensemble Machine Learning to Evaluate the in Vivo Acute Oral Toxicity and in Vitro Human Acetylcholinesterase Inhibitory Activity of Organophosphates. Arch. Toxicol. 2021, 95, 2443–2457.
- Krstajic, D.; Buturovic, L.J.; Leahy, D.E.; Thomas, S. Cross-Validation Pitfalls When Selecting and Assessing Regression and Classification Models. J. Cheminform. 2014, 6, 10.
- Petersen, P.; Voigtlaender, F. Optimal Approximation of Piecewise Smooth Functions Using Deep ReLU Neural Networks. Neural Netw. 2018, 108, 296–330.
- Vilares Ferro, M.; Doval Mosquera, Y.; Ribadas Pena, F.J.; Darriba Bilbao, V.M. Early Stopping by Correlating Online Indicators in Neural Networks. Neural Netw. 2023, 159, 109–124.
- Chen, C.H.; Tanaka, K.; Kotera, M.; Funatsu, K. Comparison and Improvement of the Predictability and Interpretability with Ensemble Learning Models in QSPR Applications. J. Cheminform. 2020, 12, 19.
- Lou, C.; Yang, H.; Deng, H.; Huang, M.; Li, W.; Liu, G.; Lee, P.W.; Tang, Y. Chemical Rules for Optimization of Chemical Mutagenicity via Matched Molecular Pairs Analysis and Machine Learning Methods. J. Cheminform. 2023, 15, 35.
- Glielmo, A.; Husic, B.E.; Rodriguez, A.; Clementi, C.; Noé, F.; Laio, A. Unsupervised Learning Methods for Molecular Simulation Data. Chem. Rev. 2021, 121, 9722–9758.
- Tian, Y.; Zhang, S.; Yin, H.; Yan, A. Quantitative Structure-Activity Relationship (QSAR) Models and Their Applicability Domain Analysis on HIV-1 Protease Inhibitors by Machine Learning Methods. Chemom. Intell. Lab. Syst. 2020, 196, 103888.
- Irwin, J.J.; Duan, D.; Torosyan, H.; Doak, A.K.; Ziebart, K.T.; Sterling, T.; Tumanian, G.; Shoichet, B.K. An Aggregation Advisor for Ligand Discovery. J. Med. Chem. 2015, 58, 7076–7087.
- Eberhardt, J.; Santos-Martins, D.; Tillack, A.F.; Forli, S. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. J. Chem. Inf. Model. 2021, 61, 3891–3898.
Fingerprints | Algorithm | Training Set Q (mean ± SD) | Training Set MCC (mean ± SD) | 5-Fold Cross-Validation Q (mean ± SD) | 5-Fold Cross-Validation MCC (mean ± SD) | Test Set Q (mean ± SD) | Test Set MCC (mean ± SD) | External Validation Set A1 Q (mean ± SD) | External Validation Set A1 MCC (mean ± SD) |
---|---|---|---|---|---|---|---|---|---|
Avalon | RF | 0.911 ± 0.005 | 0.817 ± 0.009 | 0.853 ± 0.007 | 0.7 ± 0.015 | 0.863 ± 0.014 | 0.721 ± 0.026 | 0.777 ± 0.027 | 0.524 ± 0.008 |
Avalon | SVM | 0.97 ± 0.007 | 0.939 ± 0.014 | 0.892 ± 0.006 | 0.779 ± 0.012 | 0.893 ± 0.011 | 0.783 ± 0.023 | 0.786 ± 0.037 | 0.531 ± 0.017 |
Avalon | DNN | 0.996 ± 0.002 | 0.991 ± 0.005 | 0.972 ± 0.006 | 0.942 ± 0.012 | 0.883 ± 0.011 | 0.759 ± 0.024 | 0.778 ± 0.033 | 0.532 ± 0.011 |
Avalon | XGBoost | 0.975 ± 0.024 | 0.949 ± 0.054 | 0.8893 ± 0.031 | 0.794 ± 0.061 | 0.89 ± 0.011 | 0.774 ± 0.023 | 0.792 ± 0.036 | 0.541 ± 0.02 |
ECFP4 | RF | 0.91 ± 0.008 | 0.814 ± 0.017 | 0.85 ± 0.005 | 0.691 ± 0.01 | 0.857 ± 0.015 | 0.704 ± 0.031 | 0.763 ± 0.02 | 0.508 ± 0.024 |
ECFP4 | SVM | 0.988 ± 0.007 | 0.975 ± 0.014 | 0.885 ± 0.007 | 0.764 ± 0.015 | 0.885 ± 0.016 | 0.763 ± 0.033 | 0.794 ± 0.023 | 0.537 ± 0.022 |
ECFP4 | DNN | 0.994 ± 0.001 | 0.988 ± 0.002 | 0.975 ± 0.005 | 0.948 ± 0.011 | 0.87 ± 0.019 | 0.732 ± 0.038 | 0.773 ± 0.021 | 0.515 ± 0.017 |
ECFP4 | XGBoost | 0.99 ± 0.003 | 0.981 ± 0.005 | 0.883 ± 0.002 | 0.761 ± 0.004 | 0.887 ± 0.011 | 0.768 ± 0.023 | 0.767 ± 0.02 | 0.531 ± 0.013 |
MACCS | RF | 0.897 ± 0.004 | 0.79 ± 0.008 | 0.844 ± 0.005 | 0.68 ± 0.01 | 0.854 ± 0.012 | 0.701 ± 0.025 | 0.737 ± 0.018 | 0.519 ± 0.011 |
MACCS | SVM | 0.945 ± 0.013 | 0.887 ± 0.027 | 0.874 ± 0.007 | 0.742 ± 0.014 | 0.876 ± 0.013 | 0.746 ± 0.026 | 0.758 ± 0.024 | 0.532 ± 0.011 |
MACCS | DNN | 0.973 ± 0.005 | 0.945 ± 0.01 | 0.939 ± 0.009 | 0.874 ± 0.019 | 0.862 ± 0.012 | 0.716 ± 0.023 | 0.741 ± 0.015 | 0.526 ± 0.01 |
MACCS | XGBoost | 0.968 ± 0.006 | 0.931 ± 0.014 | 0.875 ± 0.005 | 0.745 ± 0.011 | 0.881 ± 0.012 | 0.753 ± 0.025 | 0.775 ± 0.02 | 0.54 ± 0.023 |
Ensemble | RF | 0.933 ± 0.018 | 0.835 ± 0.016 | 0.876 ± 0.021 | 0.717 ± 0.02 | 0.886 ± 0.024 | 0.735 ± 0.029 | 0.787 ± 0.025 | 0.535 ± 0.023 |
Ensemble | SVM | 0.986 ± 0.007 | 0.952 ± 0.013 | 0.901 ± 0.006 | 0.78 ± 0.01 | 0.903 ± 0.009 | 0.782 ± 0.016 | 0.797 ± 0.017 | 0.551 ± 0.012 |
Ensemble | DNN | 0.996 ± 0.002 | 0.986 ± 0.007 | 0.971 ± 0.006 | 0.931 ± 0.012 | 0.883 ± 0.014 | 0.747 ± 0.025 | 0.775 ± 0.013 | 0.535 ± 0.009 |
Ensemble | XGBoost | 0.987 ± 0.008 | 0.963 ± 0.018 | 0.893 ± 0.009 | 0.777 ± 0.018 | 0.896 ± 0.009 | 0.774 ± 0.019 | 0.787 ± 0.017 | 0.546 ± 0.014 |
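The Q and MCC columns in the table above can be computed from confusion-matrix counts. The following is a minimal sketch, assuming that Q denotes overall accuracy (the fraction of correctly classified compounds) and that the ± values are standard deviations over repeated runs. The function names and the example counts are illustrative and do not come from the study.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (Q, MCC) from confusion-matrix counts.

    Assumption: Q is overall accuracy; MCC is the Matthews
    correlation coefficient.
    """
    total = tp + tn + fp + fn
    q = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; return 0.0 by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return q, mcc


def mean_and_std(values):
    """Mean and sample standard deviation over repeated runs.

    Assumption: the sample (n - 1) form is used; the source does not
    say which form of the standard deviation was reported.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    return mean, math.sqrt(var)


# Hypothetical counts for a single test-set evaluation.
q, mcc = accuracy_and_mcc(tp=45, tn=40, fp=10, fn=5)

# Hypothetical Q values from three repeated splits.
q_mean, q_std = mean_and_std([0.85, 0.87, 0.86])
```

MCC is worth reporting next to Q because it stays informative when the active and inactive classes are imbalanced, whereas accuracy alone can look high for a model that mostly predicts the majority class.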
Fingerprints | Algorithm | Training Set Q (mean ± SD) | Training Set MCC (mean ± SD) | 5-Fold Cross-Validation Q (mean ± SD) | 5-Fold Cross-Validation MCC (mean ± SD) | Test Set Q (mean ± SD) | Test Set MCC (mean ± SD) | External Validation Set A1 Q (mean ± SD) | External Validation Set A1 MCC (mean ± SD) |
---|---|---|---|---|---|---|---|---|---|
Avalon | RF | 0.838 ± 0.004 | 0.67 ± 0.008 | 0.786 ± 0.006 | 0.566 ± 0.012 | 0.795 ± 0.014 | 0.586 ± 0.029 | 0.747 ± 0.017 | 0.505 ± 0.01 |
Avalon | SVM | 0.932 ± 0.017 | 0.863 ± 0.035 | 0.814 ± 0.006 | 0.625 ± 0.013 | 0.817 ± 0.008 | 0.632 ± 0.018 | 0.785 ± 0.014 | 0.529 ± 0.016 |
Avalon | DNN | 0.992 ± 0.002 | 0.984 ± 0.003 | 0.943 ± 0.007 | 0.883 ± 0.015 | 0.802 ± 0.012 | 0.602 ± 0.025 | 0.747 ± 0.21 | 0.507 ± 0.01 |
Avalon | XGBoost | 0.97 ± 0.019 | 0.938 ± 0.035 | 0.818 ± 0.007 | 0.628 ± 0.013 | 0.825 ± 0.006 | 0.644 ± 0.014 | 0.762 ± 0.026 | 0.523 ± 0.015 |
ECFP4 | RF | 0.828 ± 0.005 | 0.65 ± 0.011 | 0.778 ± 0.005 | 0.548 ± 0.011 | 0.784 ± 0.014 | 0.561 ± 0.028 | 0.713 ± 0.026 | 0.497 ± 0.013 |
ECFP4 | SVM | 0.948 ± 0.025 | 0.894 ± 0.052 | 0.811 ± 0.003 | 0.616 ± 0.006 | 0.817 ± 0.01 | 0.63 ± 0.022 | 0.757 ± 0.023 | 0.532 ± 0.018 |
ECFP4 | DNN | 0.989 ± 0.002 | 0.977 ± 0.003 | 0.956 ± 0.005 | 0.91 ± 0.01 | 0.8 ± 0.012 | 0.595 ± 0.024 | 0.725 ± 0.012 | 0.509 ± 0.009 |
ECFP4 | XGBoost | 0.947 ± 0.023 | 0.896 ± 0.043 | 0.811 ± 0.006 | 0.617 ± 0.012 | 0.813 ± 0.013 | 0.625 ± 0.028 | 0.755 ± 0.046 | 0.523 ± 0.022 |
MACCS | RF | 0.811 ± 0.006 | 0.617 ± 0.012 | 0.77 ± 0.006 | 0.534 ± 0.013 | 0.773 ± 0.011 | 0.541 ± 0.022 | 0.713 ± 0.013 | 0.497 ± 0.009 |
MACCS | SVM | 0.897 ± 0.016 | 0.792 ± 0.033 | 0.809 ± 0.004 | 0.614 ± 0.01 | 0.812 ± 0.012 | 0.621 ± 0.025 | 0.758 ± 0.027 | 0.529 ± 0.022 |
MACCS | DNN | 0.946 ± 0.01 | 0.893 ± 0.017 | 0.887 ± 0.01 | 0.772 ± 0.02 | 0.79 ± 0.01 | 0.575 ± 0.022 | 0.732 ± 0.011 | 0.506 ± 0.008 |
MACCS | XGBoost | 0.909 ± 0.012 | 0.816 ± 0.025 | 0.807 ± 0.004 | 0.611 ± 0.007 | 0.817 ± 0.009 | 0.626 ± 0.012 | 0.753 ± 0.013 | 0.515 ± 0.01 |
Ensemble | RF | 0.84 ± 0.006 | 0.66 ± 0.008 | 0.792 ± 0.007 | 0.563 ± 0.011 | 0.798 ± 0.011 | 0.576 ± 0.02 | 0.739 ± 0.016 | 0.513 ± 0.01 |
Ensemble | SVM | 0.94 ± 0.009 | 0.864 ± 0.019 | 0.827 ± 0.008 | 0.633 ± 0.011 | 0.83 ± 0.007 | 0.642 ± 0.016 | 0.78 ± 0.014 | 0.544 ± 0.011 |
Ensemble | DNN | 0.989 ± 0.005 | 0.965 ± 0.007 | 0.942 ± 0.006 | 0.868 ± 0.012 | 0.812 ± 0.008 | 0.604 ± 0.015 | 0.748 ± 0.004 | 0.52 ± 0.003 |
Ensemble | XGBoost | 0.953 ± 0.014 | 0.895 ± 0.027 | 0.823 ± 0.005 | 0.63 ± 0.008 | 0.83 ± 0.006 | 0.643 ± 0.012 | 0.767 ± 0.018 | 0.531 ± 0.01 |
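The Ensemble rows in the tables appear to combine, for each algorithm, the models built on the three fingerprints (Avalon, ECFP4, MACCS). The combination rule is not stated in this section, so the sketch below assumes plain soft voting, that is, averaging the predicted probabilities of the active class and thresholding the result. The function name, the 0.5 threshold, and the probabilities are illustrative assumptions, not the authors' procedure.

```python
def soft_vote(prob_by_model, threshold=0.5):
    """Average per-model probabilities of the active class and threshold them.

    prob_by_model maps a model name to a list of predicted probabilities,
    one per compound, all lists in the same compound order.
    """
    n_models = len(prob_by_model)
    n_samples = len(next(iter(prob_by_model.values())))
    averaged = [
        sum(probs[i] for probs in prob_by_model.values()) / n_models
        for i in range(n_samples)
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Hypothetical probabilities from three fingerprint-specific models.
probs = {
    "Avalon": [0.9, 0.4, 0.2],
    "ECFP4": [0.8, 0.6, 0.1],
    "MACCS": [0.7, 0.3, 0.4],
}
averaged, labels = soft_vote(probs)
```

Hard (majority) voting on the predicted labels is an equally common alternative; soft voting is shown here only because it preserves a continuous score that can be ranked during virtual screening.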
Descriptors | Algorithm | Training Set R² (mean ± SD) | Training Set RMSE (mean ± SD) | 5-Fold Cross-Validation R² (mean ± SD) | 5-Fold Cross-Validation RMSE (mean ± SD) | Test Set R² (mean ± SD) | Test Set RMSE (mean ± SD) | External Validation Set A2 R² (mean ± SD) | External Validation Set A2 RMSE (mean ± SD) |
---|---|---|---|---|---|---|---|---|---|
G_3D | RF | 0.847 ± 0.004 | 0.346 ± 0.004 | 0.633 ± 0.01 | 0.537 ± 0.003 | 0.651 ± 0.007 | 0.517 ± 0.021 | 0.539 ± 0.063 | 0.634 ± 0.036 |
G_3D | SVM | 0.941 ± 0.011 | 0.214 ± 0.023 | 0.713 ± 0.007 | 0.475 ± 0.006 | 0.725 ± 0.017 | 0.465 ± 0.014 | 0.553 ± 0.05 | 0.612 ± 0.034 |
G_3D | DNN | 0.951 ± 0.021 | 0.191 ± 0.041 | 0.833 ± 0.06 | 0.354 ± 0.062 | 0.691 ± 0.025 | 0.488 ± 0.018 | 0.536 ± 0.066 | 0.63 ± 0.036 |
G_3D | XGBoost | 0.907 ± 0.022 | 0.268 ± 0.035 | 0.817 ± 0.024 | 0.375 ± 0.026 | 0.684 ± 0.031 | 0.492 ± 0.02 | 0.539 ± 0.05 | 0.634 ± 0.032 |
RDKit | RF | 0.858 ± 0.007 | 0.334 ± 0.009 | 0.679 ± 0.008 | 0.503 ± 0.007 | 0.688 ± 0.025 | 0.485 ± 0.028 | 0.532 ± 0.041 | 0.607 ± 0.021 |
RDKit | SVM | 0.959 ± 0.012 | 0.179 ± 0.027 | 0.744 ± 0.01 | 0.449 ± 0.01 | 0.746 ± 0.022 | 0.436 ± 0.019 | 0.581 ± 0.05 | 0.596 ± 0.038 |
RDKit | DNN | 0.965 ± 0.042 | 0.143 ± 0.076 | 0.862 ± 0.096 | 0.307 ± 0.102 | 0.72 ± 0.026 | 0.471 ± 0.031 | 0.594 ± 0.037 | 0.591 ± 0.03 |
RDKit | XGBoost | 0.988 ± 0.002 | 0.095 ± 0.01 | 0.923 ± 0.012 | 0.243 ± 0.021 | 0.703 ± 0.025 | 0.477 ± 0.015 | 0.594 ± 0.038 | 0.584 ± 0.035 |
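The regression tables report R2 and RMSE for each model. The following is a minimal sketch of both metrics, assuming that R2 is the coefficient of determination (some QSAR studies report the squared Pearson correlation instead, which can differ). The function name and the example activity values are illustrative and are not taken from the study.

```python
import math


def r2_and_rmse(y_true, y_pred):
    """Return (R2, RMSE) for paired observed and predicted values.

    Assumption: R2 = 1 - SS_res / SS_tot (coefficient of determination).
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return r2, rmse


# Hypothetical observed and predicted activities for four compounds.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2, rmse = r2_and_rmse(observed, predicted)
```

RMSE is expressed in the same units as the modeled activity, so it reads directly as a typical prediction error, while R2 indicates how much of the variance in the observed activities the model explains.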
Fingerprints | Algorithm | Training Set Q (mean ± SD) | Training Set MCC (mean ± SD) | 5-Fold Cross-Validation Q (mean ± SD) | 5-Fold Cross-Validation MCC (mean ± SD) | Test Set Q (mean ± SD) | Test Set MCC (mean ± SD) | External Validation Set B1 Q (mean ± SD) | External Validation Set B1 MCC (mean ± SD) |
---|---|---|---|---|---|---|---|---|---|
Avalon | RF | 0.914 ± 0.003 | 0.722 ± 0.012 | 0.882 ± 0.004 | 0.605 ± 0.014 | 0.888 ± 0.006 | 0.627 ± 0.022 | 0.744 ± 0.013 | 0.5 ± 0.013 |
Avalon | SVM | 0.978 ± 0.003 | 0.931 ± 0.01 | 0.916 ± 0.004 | 0.736 ± 0.013 | 0.917 ± 0.006 | 0.739 ± 0.021 | 0.76 ± 0.006 | 0.513 ± 0.009 |
Avalon | DNN | 0.992 ± 0.001 | 0.977 ± 0.005 | 0.966 ± 0.006 | 0.894 ± 0.019 | 0.908 ± 0.006 | 0.711 ± 0.019 | 0.747 ± 0.01 | 0.501 ± 0.011 |
Avalon | XGBoost | 0.99 ± 0.004 | 0.971 ± 0.011 | 0.913 ± 0.004 | 0.73 ± 0.008 | 0.918 ± 0.005 | 0.74 ± 0.014 | 0.747 ± 0.017 | 0.508 ± 0.006 |
ECFP4 | RF | 0.912 ± 0.003 | 0.713 ± 0.012 | 0.878 ± 0.004 | 0.587 ± 0.016 | 0.882 ± 0.006 | 0.603 ± 0.024 | 0.726 ± 0.01 | 0.501 ± 0.013 |
ECFP4 | SVM | 0.993 ± 0.004 | 0.977 ± 0.012 | 0.94 ± 0.003 | 0.812 ± 0.01 | 0.941 ± 0.007 | 0.817 ± 0.022 | 0.77 ± 0.012 | 0.543 ± 0.02 |
ECFP4 | DNN | 0.999 ± 0.001 | 0.998 ± 0.001 | 0.985 ± 0.003 | 0.954 ± 0.008 | 0.925 ± 0.008 | 0.767 ± 0.024 | 0.768 ± 0.012 | 0.533 ± 0.01 |
ECFP4 | XGBoost | 0.995 ± 0.004 | 0.987 ± 0.01 | 0.933 ± 0.004 | 0.796 ± 0.008 | 0.946 ± 0.006 | 0.83 ± 0.009 | 0.778 ± 0.005 | 0.547 ± 0.015 |
MACCS | RF | 0.907 ± 0.005 | 0.698 ± 0.019 | 0.88 ± 0.003 | 0.596 ± 0.011 | 0.885 ± 0.005 | 0.616 ± 0.017 | 0.704 ± 0.012 | 0.504 ± 0.01 |
MACCS | SVM | 0.97 ± 0.008 | 0.906 ± 0.025 | 0.921 ± 0.003 | 0.75 ± 0.011 | 0.924 ± 0.008 | 0.76 ± 0.025 | 0.752 ± 0.016 | 0.536 ± 0.011 |
MACCS | DNN | 0.985 ± 0.002 | 0.955 ± 0.007 | 0.959 ± 0.004 | 0.872 ± 0.012 | 0.912 ± 0.009 | 0.726 ± 0.028 | 0.729 ± 0.008 | 0.529 ± 0.012 |
MACCS | XGBoost | 0.977 ± 0.005 | 0.933 ± 0.013 | 0.92 ± 0.005 | 0.749 ± 0.011 | 0.925 ± 0.005 | 0.765 ± 0.014 | 0.757 ± 0.017 | 0.536 ± 0.014 |
Ensemble | RF | 0.923 ± 0.005 | 0.723 ± 0.011 | 0.892 ± 0.005 | 0.608 ± 0.01 | 0.897 ± 0.007 | 0.627 ± 0.019 | 0.736 ± 0.011 | 0.514 ± 0.01 |
Ensemble | SVM | 0.99 ± 0.004 | 0.949 ± 0.014 | 0.936 ± 0.004 | 0.777 ± 0.007 | 0.938 ± 0.006 | 0.783 ± 0.017 | 0.772 ± 0.01 | 0.542 ± 0.011 |
Ensemble | DNN | 0.997 ± 0.001 | 0.985 ± 0.008 | 0.977 ± 0.013 | 0.905 ± 0.041 | 0.927 ± 0.003 | 0.75 ± 0.008 | 0.76 ± 0.004 | 0.533 ± 0.005 |
Ensemble | XGBoost | 0.997 ± 0.001 | 0.974 ± 0.005 | 0.932 ± 0.002 | 0.769 ± 0.005 | 0.939 ± 0.004 | 0.788 ± 0.006 | 0.771 ± 0.008 | 0.54 ± 0.008 |
Fingerprints | Algorithm | Training Set Q (mean ± SD) | Training Set MCC (mean ± SD) | 5-Fold Cross-Validation Q (mean ± SD) | 5-Fold Cross-Validation MCC (mean ± SD) | Test Set Q (mean ± SD) | Test Set MCC (mean ± SD) | External Validation Set B1 Q (mean ± SD) | External Validation Set B1 MCC (mean ± SD) |
---|---|---|---|---|---|---|---|---|---|
Avalon | RF | 0.913 ± 0.003 | 0.694 ± 0.013 | 0.885 ± 0.003 | 0.583 ± 0.014 | 0.882 ± 0.008 | 0.575 ± 0.032 | 0.699 ± 0.012 | 0.495 ± 0.007 |
Avalon | SVM | 0.967 ± 0.01 | 0.892 ± 0.035 | 0.911 ± 0.003 | 0.703 ± 0.013 | 0.904 ± 0.006 | 0.679 ± 0.022 | 0.712 ± 0.008 | 0.507 ± 0.007 |
Avalon | DNN | 0.99 ± 0.007 | 0.968 ± 0.022 | 0.965 ± 0.007 | 0.884 ± 0.021 | 0.902 ± 0.009 | 0.678 ± 0.03 | 0.718 ± 0.009 | 0.509 ± 0.01 |
Avalon | XGBoost | 0.987 ± 0.005 | 0.962 ± 0.008 | 0.911 ± 0.005 | 0.708 ± 0.013 | 0.908 ± 0.006 | 0.702 ± 0.014 | 0.713 ± 0.012 | 0.511 ± 0.007 |
ECFP4 | RF | 0.904 ± 0.004 | 0.658 ± 0.015 | 0.878 ± 0.003 | 0.552 ± 0.011 | 0.879 ± 0.008 | 0.559 ± 0.033 | 0.718 ± 0.005 | 0.495 ± 0.007 |
ECFP4 | SVM | 0.99 ± 0.005 | 0.969 ± 0.018 | 0.934 ± 0.003 | 0.78 ± 0.009 | 0.932 ± 0.004 | 0.773 ± 0.016 | 0.767 ± 0.011 | 0.538 ± 0.016 |
ECFP4 | DNN | 0.998 ± 0.002 | 0.994 ± 0.007 | 0.986 ± 0.005 | 0.954 ± 0.016 | 0.917 ± 0.005 | 0.722 ± 0.019 | 0.728 ± 0.008 | 0.511 ± 0.008 |
ECFP4 | XGBoost | 0.994 ± 0.003 | 0.981 ± 0.01 | 0.933 ± 0.003 | 0.769 ± 0.011 | 0.934 ± 0.004 | 0.772 ± 0.012 | 0.766 ± 0.01 | 0.537 ± 0.009 |
MACCS | RF | 0.9 ± 0.003 | 0.646 ± 0.013 | 0.876 ± 0.003 | 0.543 ± 0.013 | 0.882 ± 0.007 | 0.571 ± 0.03 | 0.697 ± 0.007 | 0.489 ± 0.008 |
MACCS | SVM | 0.969 ± 0.006 | 0.896 ± 0.019 | 0.92 ± 0.004 | 0.731 ± 0.012 | 0.921 ± 0.008 | 0.733 ± 0.028 | 0.736 ± 0.017 | 0.512 ± 0.006 |
MACCS | DNN | 0.983 ± 0.002 | 0.945 ± 0.006 | 0.959 ± 0.003 | 0.861 ± 0.011 | 0.9 ± 0.01 | 0.67 ± 0.028 | 0.71 ± 0.009 | 0.498 ± 0.008 |
MACCS | XGBoost | 0.976 ± 0.004 | 0.923 ± 0.013 | 0.918 ± 0.005 | 0.727 ± 0.01 | 0.924 ± 0.008 | 0.745 ± 0.025 | 0.737 ± 0.012 | 0.517 ± 0.011 |
Ensemble | RF | 0.916 ± 0.003 | 0.676 ± 0.01 | 0.89 ± 0.002 | 0.569 ± 0.009 | 0.891 ± 0.005 | 0.578 ± 0.022 | 0.715 ± 0.005 | 0.504 ± 0.004 |
Ensemble | SVM | 0.986 ± 0.005 | 0.929 ± 0.017 | 0.932 ± 0.002 | 0.748 ± 0.007 | 0.929 ± 0.004 | 0.738 ± 0.017 | 0.748 ± 0.008 | 0.529 ± 0.007 |
Ensemble | DNN | 0.998 ± 0.002 | 0.979 ± 0.008 | 0.98 ± 0.004 | 0.91 ± 0.014 | 0.917 ± 0.006 | 0.7 ± 0.02 | 0.729 ± 0.005 | 0.516 ± 0.007 |
Ensemble | XGBoost | 0.994 ± 0.004 | 0.964 ± 0.008 | 0.929 ± 0.004 | 0.744 ± 0.01 | 0.931 ± 0.004 | 0.749 ± 0.011 | 0.748 ± 0.01 | 0.531 ± 0.009 |
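
The Q and MCC columns above can be reproduced from a model's predictions. The following is a minimal sketch, assuming that Q denotes overall prediction accuracy, that MCC is the Matthews correlation coefficient, and that each "± " entry is the mean and sample standard deviation over repeated runs. The labels and per-run values are hypothetical illustration data, not results from the study.

```python
import math
from statistics import mean, stdev


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = highly active, 0 = weakly active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy_q(tp, tn, fp, fn):
    """Q: fraction of correctly classified molecules (assumed definition)."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def summarize(values):
    """Mean and sample standard deviation across repeated runs (assumed aggregation)."""
    return mean(values), stdev(values)


# Hypothetical labels for eight molecules.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
q_value = accuracy_q(*counts)
mcc_value = mcc(*counts)

# Hypothetical Q values from three repeated runs.
q_mean, q_std = summarize([0.7, 0.8, 0.9])
```

MCC is reported alongside Q because it stays informative when the two activity classes are imbalanced, which is where accuracy alone can look deceptively high.
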

Descriptors | Algorithm | Training Set R² (mean ± SD) | Training Set RMSE (mean ± SD) | 5-Fold Cross-Validation R² (mean ± SD) | 5-Fold Cross-Validation RMSE (mean ± SD) | Test Set R² (mean ± SD) | Test Set RMSE (mean ± SD) | External Validation Set B2 R² (mean ± SD) | External Validation Set B2 RMSE (mean ± SD) |
---|---|---|---|---|---|---|---|---|---|
G_3D | RF | 0.929 ± 0.023 | 0.175 ± 0.027 | 0.718 ± 0.21 | 0.351 ± 0.016 | 0.731 ± 0.052 | 0.342 ± 0.046 | 0.615 ± 0.052 | 0.46 ± 0.047 |
G_3D | SVM | 0.864 ± 0.041 | 0.241 ± 0.037 | 0.737 ± 0.019 | 0.339 ± 0.016 | 0.75 ± 0.048 | 0.329 ± 0.045 | 0.634 ± 0.048 | 0.444 ± 0.052 |
G_3D | DNN | 0.927 ± 0.029 | 0.174 ± 0.03 | 0.745 ± 0.01 | 0.331 ± 0.008 | 0.727 ± 0.042 | 0.354 ± 0.032 | 0.611 ± 0.042 | 0.472 ± 0.034 |
G_3D | XGBoost | 0.92 ± 0.035 | 0.183 ± 0.037 | 0.746 ± 0.028 | 0.332 ± 0.022 | 0.741 ± 0.044 | 0.339 ± 0.039 | 0.625 ± 0.044 | 0.455 ± 0.045 |
RDKit | RF | 0.925 ± 0.023 | 0.181 ± 0.026 | 0.745 ± 0.015 | 0.335 ± 0.009 | 0.776 ± 0.051 | 0.306 ± 0.029 | 0.619 ± 0.016 | 0.449 ± 0.037 |
RDKit | SVM | 0.874 ± 0.035 | 0.232 ± 0.033 | 0.751 ± 0.019 | 0.33 ± 0.014 | 0.778 ± 0.038 | 0.304 ± 0.029 | 0.63 ± 0.023 | 0.424 ± 0.019 |
RDKit | DNN | 0.939 ± 0.025 | 0.161 ± 0.033 | 0.834 ± 0.038 | 0.265 ± 0.035 | 0.787 ± 0.016 | 0.301 ± 0.016 | 0.625 ± 0.009 | 0.451 ± 0.031 |
RDKit | XGBoost | 0.98 ± 0.018 | 0.079 ± 0.048 | 0.75 ± 0.017 | 0.331 ± 0.011 | 0.773 ± 0.028 | 0.311 ± 0.018 | 0.625 ± 0.012 | 0.445 ± 0.024 |
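
The R² and RMSE columns above summarize how closely predicted activities track measured ones. The sketch below assumes that R² is the coefficient of determination (some QSAR studies report the squared Pearson correlation instead) and that RMSE is the root mean squared error. The activity values are hypothetical.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition of R²)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the modeled activity."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical measured and predicted activity values for four molecules.
measured = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.0, 6.5, 8.0]
r2_value = r2_score(measured, predicted)
rmse_value = rmse(measured, predicted)
```

A gap between training-set and external-set values of these two metrics, as seen in the table, is the usual sign that a model fits its training data more closely than it generalizes.
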
Candidates | Origins | IC50 μM a | SOM HP b | Effects |
---|---|---|---|---|
cmp_B1 | Cannabis Sativa L. | 0.88 | 1 | COX inhibition [30] |
cmp_B2 | Ramulus Mori | 0.25 | 0.93 | |
cmp_B3 | Amomum longiligulare | 0.34 | 0.93 | Anti-breast cancer [31] |
cmp_B4 | Glycyrrhiza glabra L. | 0.80 | 0.93 | PTP1B [32] and nitric oxide (NO) inhibition [33] |
cmp_B5 | Gardeniae Fructus | 0.18 | 0.82 | LOX inhibition [34] |
cmp_B6 | Schisandra chinensis | 0.37 | 0.82 | UDP-glucuronosyltransferase [35], and oxLDL inhibition [36] |
cmp_B7 | Rhizoma Dioscoreae | 0.55 | 0.82 | Prostaglandin E2 reduction [37] |
cmp_B8 | Ramulus Mori | 0.18 | 1 | |
cmp_B9 | Mori Cortex | 0.19 | 1 | Tyrosinase inhibition [38] |
cmp_B10 | Glehniae Radix | 0.20 | 1 | |
cmp_B11 | Colla | 0.37 | 1 | Anti-inflammation [39] |
cmp_B12 | Epimedii Herba | 0.50 | 1 | Anti-inflammation [40] |
cmp_B13 | Coicis Semen | 0.27 | 1 | Anti-inflammation [41] |
cmp_B14 | Radix Puerariae | 0.20 | 1 | Anti-oxidation [42] and enteritis treatment [43] |
cmp_B15 | Alisma Orientale | 0.29 | 1 | COX-2 inhibition [44] |
cmp_A1 & 5KIR a (affinity = −7.57 kcal/mol) | cmp_A2 & 6BL3 (affinity = −11.75 kcal/mol) |
cmp_A3 & 4PH9 (affinity = −7.37 kcal/mol) | cmp_A4 & 4PH9 (affinity = −8.53 kcal/mol) |
cmp_A5 & 6BL4 (affinity = −7.25 kcal/mol) | cmp_A6 & 4PH9 (affinity = −8.68 kcal/mol) |
cmp_A7 & 4PH9 (affinity = −7.69 kcal/mol) | cmp_A8 & 6BL4 (affinity = −11.24 kcal/mol) |
cmp_A9 & 6BL4 (affinity = −9.61 kcal/mol) | cmp_A10 & 6BL3 (affinity = −8.92 kcal/mol) |
cmp_B1 & 4YL0 a (affinity = −7.93 kcal/mol) | cmp_B2 & 4YL1 (affinity = −9.19 kcal/mol) |
cmp_B3 & 4YL1 (affinity = −8.82 kcal/mol) | cmp_B4 & 4AL1 (affinity = −8.08 kcal/mol) |
cmp_B5 & 4AL1 (affinity = −11.25 kcal/mol) | cmp_B6 & 4YL0 (affinity = −7.25 kcal/mol) |
cmp_B7 & 4YL0 (affinity = −7.26 kcal/mol) | cmp_B8 & 5TL9 (affinity = −9.55 kcal/mol) |
cmp_B9 & 5TL9 (affinity = −9.2 kcal/mol) | cmp_B10 & 5K0I (affinity = −10.31 kcal/mol) |
cmp_B11 & 4YL0 (affinity = −6.89 kcal/mol) | cmp_B12 & 4YL0 (affinity = −8.73 kcal/mol) |
cmp_B13 & 4YL0 (affinity = −7.19 kcal/mol) | cmp_B14 & 4YL0 (affinity = −10.71 kcal/mol) |
cmp_B15 & 4YL0 (affinity = −7.34 kcal/mol) |
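
The docking labels above pair each candidate with a PDB entry and a predicted affinity, where a more negative value indicates stronger predicted binding. The sketch below shows one way to parse such labels and rank them. The regular expression and the `parse_panels` helper are illustrative assumptions, not part of the original workflow. Note that the labels use the Unicode minus sign (U+2212), which must be converted before calling `float`.

```python
import re

# Matches labels such as "cmp_A1 & 5KIR a (affinity = −7.57 kcal/mol)".
# The optional " a" is a footnote marker that follows some PDB IDs.
PANEL = re.compile(
    r"(cmp_[AB]\d+) & (\w{4})(?: a)? "
    r"\(affinity\s*=\s*([−-]\d+(?:\.\d+)?) kcal/mol\)"
)


def parse_panels(text):
    """Return (compound, pdb_id, affinity) tuples found in the label text."""
    rows = []
    for compound, pdb_id, value in PANEL.findall(text):
        # Replace the Unicode minus sign with an ASCII hyphen for float().
        rows.append((compound, pdb_id, float(value.replace("−", "-"))))
    return rows


# Four labels copied from the docking panels above.
text = (
    "cmp_A1 & 5KIR a (affinity = −7.57 kcal/mol) | "
    "cmp_A2 & 6BL3 (affinity = −11.75 kcal/mol) |\n"
    "cmp_A5 & 6BL4 (affinity= −7.25 kcal/mol) | "
    "cmp_A6 & 4PH9 (affinity = −8.68 kcal/mol) |"
)
rows = parse_panels(text)

# Most negative affinity first, i.e. strongest predicted binding first.
ranked = sorted(rows, key=lambda row: row[2])
best = ranked[0]
```

Docking scores are estimates from a scoring function, so a ranking like this is better read as a prioritization aid than as a measurement of binding strength.
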
Datasets | Targets | N | Descriptions |
---|---|---|---|
Dataset 1 | COX-2 | 1630 | COX-2 inhibitors for constructing classification models (results in Table 1 and Table S1). Molecules with IC50 > 10 μM are weakly active inhibitors; those with IC50 < 0.1 μM are highly active inhibitors |
Dataset 2 | COX-2 | 2925 | COX-2 inhibitors for constructing classification models (results in Table 2 and Table S2). Molecules with IC50 > 1 μM are weakly active inhibitors; those with IC50 ≤ 1 μM are highly active inhibitors |
Dataset 3 | COX-2 | 1511 | COX-2 inhibitors for constructing QSAR models (results in Table 3 and Table S3). Molecules whose IC50 values were measured in vitro by enzyme-linked immunoassay |
External validation set A1 | COX-2 | 368 | Used to evaluate the classification models for COX-2 inhibitors |
External validation set A2 | COX-2 | 114 | Used to evaluate the regression models for COX-2 inhibitors |
Dataset 4 | mPGES-1 | 3179 | mPGES-1 inhibitors for building classification models (results in Table 4 and Table S4). Molecules with IC50 > 10 μM are weakly active inhibitors; those with IC50 < 0.6 μM are highly active inhibitors |
Dataset 5 | mPGES-1 | 3455 | mPGES-1 inhibitors for building classification models (results in Table 5 and Table S5). Molecules with IC50 ≥ 10 μM are weakly active inhibitors; those with IC50 < 10 μM are highly active inhibitors |
Dataset 6 | mPGES-1 | 735 | mPGES-1 inhibitors for constructing QSAR models (results in Table 6 and Table S6). Molecules whose IC50 values were measured in vitro by homogeneous time-resolved fluorescence assay |
External validation set B1 | mPGES-1 | 217 | Used to evaluate the classification models for mPGES-1 inhibitors |
External validation set B2 | mPGES-1 | 60 | Used to evaluate the regression models for mPGES-1 inhibitors |
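
The IC50 cutoffs in the table above define the class labels for the four classification datasets. The sketch below encodes those cutoffs, assuming that molecules falling between the two cutoffs of Dataset 1 or Dataset 4 are left unlabeled (the table does not say how they are handled). The `RULES` mapping and `label_activity` function are hypothetical names.

```python
from typing import Optional

# Per dataset: (test for "highly active", test for "weakly active").
# Thresholds are IC50 values in μM, taken from the dataset table.
RULES = {
    "Dataset 1": (lambda ic50: ic50 < 0.1, lambda ic50: ic50 > 10),
    "Dataset 2": (lambda ic50: ic50 <= 1, lambda ic50: ic50 > 1),
    "Dataset 4": (lambda ic50: ic50 < 0.6, lambda ic50: ic50 > 10),
    "Dataset 5": (lambda ic50: ic50 < 10, lambda ic50: ic50 >= 10),
}


def label_activity(dataset: str, ic50_um: float) -> Optional[int]:
    """Return 1 (highly active), 0 (weakly active), or None (between cutoffs)."""
    is_high, is_weak = RULES[dataset]
    if is_high(ic50_um):
        return 1
    if is_weak(ic50_um):
        return 0
    # Assumption: molecules in the gap between cutoffs receive no label.
    return None


# Hypothetical IC50 values (μM) labeled under the Dataset 1 cutoffs.
labels = [label_activity("Dataset 1", value) for value in (0.05, 1.0, 25.0)]
```

Datasets 1 and 4 leave a gap between the two classes, which gives a cleaner separation, whereas Datasets 2 and 5 use a single cutoff so that every molecule receives a label.
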
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tian, Y.; Zhang, Z.; Yan, A. Discovering the Active Ingredients of Medicine and Food Homologous Substances for Inhibiting the Cyclooxygenase-2 Metabolic Pathway by Machine Learning Algorithms. Molecules 2023, 28, 6782. https://doi.org/10.3390/molecules28196782