Predicting the Pathway Involvement of Compounds Annotated in the Reactome Knowledgebase
Abstract
1. Introduction
2. Materials and Methods
3. Results
3.1. Main Results
3.2. oMCC and Compound/Pathway Size
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
MLP | Multilayer perceptron |
CV | Cross-validation |
MCC | Matthews correlation coefficient |
oMCC | Overall Matthews correlation coefficient |
KEGG | Kyoto Encyclopedia of Genes and Genomes |
ChEBI | Chemical Entities of Biological Interest |
TP | True positives |
TN | True negatives |
FP | False positives |
FN | False negatives |
L | Level |
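The abbreviations above include the four confusion-matrix counts (TP, TN, FP, FN) from which the MCC is computed. As a reminder of the standard definition, here is a minimal Python sketch. The function name `mcc` and the choice to return 0.0 when the denominator is zero are illustrative assumptions, not details taken from the article.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (an assumed convention
    for the otherwise undefined case).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    print(mcc(tp=50, tn=50, fp=0, fn=0))    # perfect agreement
    print(mcc(tp=25, tn=25, fp=25, fn=25))  # no better than chance
```

The score ranges from -1 (total disagreement) through 0 (chance-level prediction) to 1 (perfect prediction). Because it uses all four counts, it stays informative when positive entries are rare.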
Dataset | # Compounds | # Pathways | # Compound Features | # Pathway Features | # Entries |
---|---|---|---|---|---|
KEGG | 6485 | 502 | 16,509 | 11,321 | 3,255,470 |
Reactome | 1976 | 3985 | 6187 | 5386 | 7,874,360 |
Dataset | # Pathways | # Entries |
---|---|---|
L1+ | 3985 | 7,874,360 |
L2+ | 3700 | 7,311,200 |
L3+ | 3006 | 5,939,856 |
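The entry counts in the two tables above are consistent with one entry per compound-pathway pair, that is, # Entries = # Compounds × # Pathways. The short Python sketch below checks that arithmetic. It assumes the Reactome compound count (1976) is the same for the L2+ and L3+ subsets, which the second table does not state explicitly; the dictionary and function names are hypothetical.

```python
# (compounds, pathways, reported entries), copied from the tables above.
# The compound count for L2+ and L3+ is an assumption: it is taken to be
# the full Reactome compound count of 1976.
DATASETS = {
    "KEGG": (6485, 502, 3_255_470),
    "Reactome L1+": (1976, 3985, 7_874_360),
    "Reactome L2+": (1976, 3700, 7_311_200),
    "Reactome L3+": (1976, 3006, 5_939_856),
}


def expected_entries(n_compounds: int, n_pathways: int) -> int:
    """Number of compound-pathway pairs in a full cross product."""
    return n_compounds * n_pathways


def check_all(datasets: dict) -> dict:
    """Map each dataset name to whether its reported entry count matches."""
    return {
        name: expected_entries(compounds, pathways) == reported
        for name, (compounds, pathways, reported) in datasets.items()
    }


if __name__ == "__main__":
    for name, matches in check_all(DATASETS).items():
        print(f"{name}: {'matches' if matches else 'does not match'}")
```

All four reported counts agree with the product, which supports reading each entry as one compound paired with one pathway.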
Hierarchy Levels Included | Mean MCC | Median MCC | Standard Deviation |
---|---|---|---|
L1+ | 0.916 | 0.919 | 0.0149 |
L2+ | 0.907 | 0.907 | 0.0099 |
L3+ | 0.884 | 0.886 | 0.0134 |
Dataset | Mean MCC | Median MCC | Standard Deviation |
---|---|---|---|
Reactome | 0.916 | 0.919 | 0.0149 |
KEGG | 0.847 | 0.848 | 0.0098 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Huckvale, E.D.; Moseley, H.N.B. Predicting the Pathway Involvement of Compounds Annotated in the Reactome Knowledgebase. Metabolites 2025, 15, 161. https://doi.org/10.3390/metabo15030161