Application of Machine Learning Solutions to Optimize Parameter Prediction to Enhance Automatic NMR Metabolite Profiling
Abstract
1. Introduction
2. Results
2.1. 1H-NMR Metabolite Profiling and Prediction Pipeline of Expected Signal Parameter Values
- A data cleaning step to minimize the influence of inaccurate feature values (possibly due to wrong annotation or suboptimal quantification) during the prediction phase.
- A feature selection step, using the “Boruta” R package, to filter out non-relevant features and thereby reduce the noise in the dataset.
- Finally, we included a further feature engineering step [29], adding the first five principal components (PCs) of the signal parameter dataset to the predictor dataset. The first PCs explain most of the system variance and relegate noise-related variance to later PCs. Consequently, the influence of possibly high noise-related variance in the dataset is minimized and prediction performance is enhanced.
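The feature engineering step in the last item can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`center_columns`, `first_pcs`, `append_pcs`), the toy data, and the use of power iteration with deflation to obtain the PCs are all assumptions made for illustration, and the example uses two PCs rather than five because the toy dataset has only three columns.

```python
import math
import random


def center_columns(rows):
    """Subtract each column mean so the PCs describe variance, not offsets."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_pcs(rows, k, iters=300, seed=0):
    """Return (scores, explained-variance ratios) for the first k PCs.

    Power iteration with deflation is an illustrative choice, not the
    method named in the text.
    """
    rng = random.Random(seed)
    X = center_columns(rows)
    n, p = len(X), len(X[0])
    total = sum(x * x for row in X for x in row)
    scores, ratios = [], []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        start = math.sqrt(sum(c * c for c in v))
        v = [c / start for c in v]
        for _ in range(iters):
            t = [sum(a * b for a, b in zip(row, v)) for row in X]  # X v
            w = [sum(t[i] * X[i][j] for i in range(n)) for j in range(p)]  # X^T X v
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:  # no variance left to explain
                break
            v = [c / norm for c in w]
        t = [sum(a * b for a, b in zip(row, v)) for row in X]
        scores.append(t)
        ratios.append(sum(s * s for s in t) / total if total else 0.0)
        # Deflate: remove the component just extracted before the next PC.
        X = [[x - ti * vj for x, vj in zip(row, v)] for row, ti in zip(X, t)]
    return scores, ratios


def append_pcs(predictors, signal_params, k=5):
    """Append the first k PC scores of signal_params to each predictor row."""
    scores, ratios = first_pcs(signal_params, k)
    augmented = [list(row) + [scores[c][i] for c in range(k)]
                 for i, row in enumerate(predictors)]
    return augmented, ratios


# Hypothetical toy data: three correlated signal parameters plus small noise.
signal_params = [[float(i),
                  2.0 * i + 0.01 * ((7 * i) % 5 - 2),
                  -1.0 * i + 0.01 * ((3 * i) % 5 - 2)] for i in range(10)]
predictors = [[float(i % 3), 1.0] for i in range(10)]
augmented, ratios = append_pcs(predictors, signal_params, k=2)
```

In this toy case nearly all of the variance falls on the first PC, which mirrors the reasoning above: the leading PCs carry the systematic structure, while the small noise terms are pushed to later components.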
2.2. Using Accurate Predicted Values with Narrow PIs That Can Be Used to Maximize Profiling Performance
3. Discussion
- Strict sample preparation requirements or spectrum acquisition limitations. Caveats: difficulty of changing established protocols in laboratories, less flexibility to adapt the spectrum acquisition process to the properties of samples.
- Half bandwidth and chemical shift prediction. Caveats: protein-mediated broadening of the TSP signal, nonlinear patterns in certain signals in complex matrices, inability to handle unidentified metabolites [13].
- Simultaneous lineshape fitting of all the signals of the same metabolite. Caveats: variability in the relative intensity of signals depending on the matrix, challenges when the signal chemical shift is not predicted exactly, inability to handle unidentified metabolites.
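The sensitivity of lineshape fitting to the allowed chemical shift range, mentioned in the caveats above, can be illustrated with a minimal Python sketch. It assumes Lorentzian lineshapes and replaces a full optimizer with a simple grid search over the chemical shift; the function names, the synthetic spectrum, and all numeric values are hypothetical and are not taken from the study.

```python
def lorentzian(x, intensity, center, half_bandwidth):
    """Lorentzian line with peak height `intensity` at `center`;
    `half_bandwidth` is the half width at half maximum."""
    return intensity * half_bandwidth ** 2 / ((x - center) ** 2 + half_bandwidth ** 2)


def fit_within_interval(ppm, spectrum, lower, upper, half_bandwidth, steps=201):
    """Fit one Lorentzian whose chemical shift is restricted to [lower, upper].

    The chemical shift is grid-searched; for each candidate the intensity is
    the non-negative least-squares solution, which has a closed form here.
    """
    best = None
    for s in range(steps):
        center = lower + (upper - lower) * s / (steps - 1)
        shape = [lorentzian(x, 1.0, center, half_bandwidth) for x in ppm]
        denom = sum(b * b for b in shape)
        intensity = max(0.0, sum(y * b for y, b in zip(spectrum, shape)) / denom)
        sse = sum((y - intensity * b) ** 2 for y, b in zip(spectrum, shape))
        if best is None or sse < best[0]:
            best = (sse, center, intensity)
    return best[1], best[2]


# Synthetic region: target signal at 1.480 ppm next to a larger one at 1.520 ppm.
ppm = [1.40 + 0.0005 * i for i in range(401)]
spectrum = [lorentzian(x, 1.0, 1.480, 0.002) + lorentzian(x, 3.0, 1.520, 0.002)
            for x in ppm]

# Narrow allowed range, as a tight prediction interval would provide.
narrow_center, narrow_intensity = fit_within_interval(ppm, spectrum, 1.475, 1.485, 0.002)
# Wide allowed range, as when the chemical shift is poorly predicted.
wide_center, wide_intensity = fit_within_interval(ppm, spectrum, 1.450, 1.550, 0.002)
```

With these synthetic values, the narrow range keeps the fit on the intended signal near 1.480 ppm, whereas the wide range lets the fit lock onto the larger neighboring signal near 1.520 ppm. This is the kind of misassignment that an imprecise chemical shift prediction can cause.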
4. Materials and Methods
4.1. Datasets
4.2. 1H-NMR Metabolite Profiling Workflow
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Holmes, E.; Wilson, I.D.; Nicholson, J.K. Metabolic phenotyping in health and disease. Cell 2008, 134, 714–717.
- Nicholson, J.K. Global systems biology, personalized medicine and molecular epidemiology. Mol. Syst. Biol. 2006, 2, 52.
- Van Duynhoven, J.; van Velzen, E.; Jacobs, D.M. Chapter Three - Quantification of Complex Mixtures by NMR; Webb, G., Ed.; Academic Press: London, UK, 2013; Volume 80, pp. 181–236. ISBN 0066-4103.
- Fiehn, O. Metabolomics - The link between genotypes and phenotypes. Plant Mol. Biol. 2002, 48, 155–171.
- Weljie, A.M.; Newton, J.; Mercier, P.; Carlson, E.; Slupsky, C.M. Targeted Profiling: Quantitative Analysis of 1H NMR Metabolomics Data. Anal. Chem. 2006, 78, 4430–4442.
- Petrakis, L. Spectral line shapes: Gaussian and Lorentzian functions in magnetic resonance. J. Chem. Educ. 1967, 44, 432.
- Laatikainen, R.; Niemitz, M.; Malaisse, W.J.; Biesemans, M.; Willem, R. A computational strategy for the deconvolution of NMR spectra with multiplet structures and constraints: Analysis of overlapping C-13-H-2 multiplets of C-13 enriched metabolites from cell suspensions incubated in deuterated media. Magn. Reson. Med. 1996, 36, 359–365.
- Hao, J.; Liebeke, M.; Astle, W.; De Iorio, M.; Bundy, J.G.; Ebbels, T.M.D. Bayesian deconvolution and quantification of metabolites in complex 1D NMR spectra using BATMAN. Nat. Protoc. 2014, 9, 1416–1427.
- Gómez, J.; Brezmes, J.; Mallol, R.; Rodríguez, M.A.; Vinaixa, M.; Salek, R.M.; Correig, X.; Canellas, N. Dolphin: A tool for automatic targeted metabolite profiling using 1D and 2D 1H-NMR data. Anal. Bioanal. Chem. 2014, 406, 7967–7976.
- Ravanbakhsh, S.; Liu, P.; Bjorndahl, T.C.; Mandal, R.; Grant, J.R.; Wilson, M.; Eisner, R.; Sinelnikov, I.; Hu, X.; Luchinat, C.; et al. Accurate, fully-automated NMR spectral profiling for metabolomics. PLoS ONE 2015, 10, e0124219.
- Roweis, S. Levenberg-Marquardt Optimization. Notes Univ. Tor. 1996. Available online: https://cs.nyu.edu/~roweis/notes/lm.pdf (accessed on 6 February 2022).
- Kanzow, C.; Yamashita, N.; Fukushima, M. Levenberg–Marquardt methods with strong local convergence properties for solving nonlinear equations with convex constraints. J. Comput. Appl. Math. 2004, 172, 375–397.
- Dona, A.C.; Kyriakides, M.; Scott, F.; Shephard, E.A.; Varshavi, D.; Veselkov, K.; Everett, J.R. A guide to the identification of metabolites in NMR-based metabonomics/metabolomics experiments. Comput. Struct. Biotechnol. J. 2016, 14, 135–153.
- Horst, R.; Pardalos, P.M. Nonconvex Optimization and Its Applications. In Handbook of Global Optimization; Springer: Boston, MA, USA, 2013; ISBN 9781461520252.
- Van der Hooft, J.J.J.; Rankin, N. Metabolite Identification in Complex Mixtures Using Nuclear Magnetic Resonance Spectroscopy. In Modern Magnetic Resonance; Webb, G., Ed.; Springer International Publishing: Cham, Switzerland, 2016; pp. 1–32. ISBN 978-3-319-28275-6.
- Vitols, C.; Mercier, P. Correcting Lineshapes in NMR Spectra. Chenomx Appl. Note 2006. Available online: https://www.chenomx.com/wp-content/uploads/2016/01/Correcting-Lineshapes-in-NMR-Spectra.pdf (accessed on 6 February 2022).
- Hu, H.; Van, Q.N.; Mandelshtam, V.A.; Shaka, A.J. Reference deconvolution, phase correction, and line listing of NMR spectra by the 1D filter diagonalization method. J. Magn. Reson. 1998, 134, 76–87.
- Takis, P.G.; Schäfer, H.; Spraul, M.; Luchinat, C. Deconvoluting interrelationships between concentrations and chemical shifts in urine provides a powerful analysis tool. Nat. Commun. 2017, 8, 1662.
- Schleif, F.-M.; Riemer, T.; Börner, U.; Schnapka-Hille, L.; Cross, M. Genetic algorithm for shift-uncertainty correction in 1-D NMR-based metabolite identifications and quantifications. Bioinformatics 2011, 27, 524–533.
- Baran, R. Untargeted metabolomics suffers from incomplete data analysis. bioRxiv 2017, 143818.
- Sokolenko, S.; McKay, R.; Blondeel, E.J.M.; Lewis, M.J.; Chang, D.; George, B.; Aucoin, M.G. Understanding the variability of compound quantification from targeted profiling metabolomics of 1D-1H-NMR spectra in synthetic mixtures and urine with additional insights on choice of pulse sequences and robotic sampling. Metabolomics 2013, 9, 887–903.
- Nassar, M.; Doan, M.; Filby, A.; Wolkenhauer, O.; Fogg, D.K.; Piasecka, J.; Thornton, C.A.; Carpenter, A.E.; Summers, H.D.; Rees, P.; et al. Label-Free Identification of White Blood Cells Using Machine Learning. Cytom. Part A 2019, 95, 836–842.
- Phongpreecha, T.; Fernandez, R.; Mrdjen, D.; Culos, A.; Gajera, C.R.; Wawro, A.M.; Stanley, N.; Gaudilliere, B.; Poston, K.L.; Aghaeepour, N.; et al. Single-cell peripheral immunoprofiling of Alzheimer’s and Parkinson’s diseases. Sci. Adv. 2020, 6, eabd5575.
- Gajera, C.R.; Fernandez, R.; Postupna, N.; Montine, K.S.; Fox, E.J.; Tebaykin, D.; Angelo, M.; Bendall, S.C.; Keene, C.D.; Montine, T.J. Mass synaptometry: High-dimensional multi parametric assay for single synapses. J. Neurosci. Methods 2019, 312, 73–83.
- Phongpreecha, T.; Gajera, C.R.; Liu, C.C.; Vijayaragavan, K.; Chang, A.L.; Becker, M.; Fallahzadeh, R.; Fernandez, R.; Postupna, N.; Sherfield, E.; et al. Single-synapse analyses of Alzheimer’s disease implicate pathologic tau, DJ1, CD47, and ApoE. Sci. Adv. 2021, 7, eabk0473.
- Simmons, A.J.; Banerjee, A.; McKinley, E.T.; Scurrah, C.R.; Herring, C.A.; Gewin, L.S.; Masuzaki, R.; Karp, S.J.; Franklin, J.L.; Gerdes, M.J.; et al. Cytometry-based single-cell analysis of intact epithelial signaling reveals MAPK activation divergent from TNF-α-induced apoptosis in vivo. Mol. Syst. Biol. 2015, 11, 835.
- Ho, W.J.; Erbe, R.; Danilova, L.; Phyo, Z.; Bigelow, E.; Stein-O’Brien, G.; Thomas, D.L., 2nd; Charmsaz, S.; Gross, N.; Woolman, S.; et al. Multi-omic profiling of lung and liver tumor microenvironments of metastatic pancreatic cancer reveals site-specific immune regulatory pathways. Genome Biol. 2021, 22, 1–23.
- Cañueto, D.; Gómez, J.; Salek, R.M.; Correig, X.; Cañellas, N. rDolphin: A GUI R package for proficient automatic profiling of 1D 1H-NMR spectra of study datasets. Metabolomics 2018, 14, 24.
- Kuhn, M.; Johnson, K. Applied Predictive Modeling; Springer: New York, NY, USA, 2013; ISBN 9781461468493.
- Efron, B.; Hastie, T. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science; Institute of Mathematical Statistics Monographs; Cambridge University Press: Cambridge, UK, 2016.
- Gromski, P.S.; Muhamadali, H.; Ellis, D.I.; Xu, Y.; Correa, E.; Turner, M.L.; Goodacre, R. A tutorial review: Metabolomics and partial least squares-discriminant analysis - A marriage of convenience or a shotgun wedding. Anal. Chim. Acta 2015, 879, 10–23.
- Efron, B. Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation. J. Am. Stat. Assoc. 1983, 78, 316–331.
- Efron, B.; Tibshirani, R. Improvements on Cross-Validation: The 632+ Bootstrap Method. J. Am. Stat. Assoc. 1997, 92, 548–560.
- Savorani, F.; Tomasi, G.; Engelsen, S.B. icoshift: A versatile tool for the rapid alignment of 1D NMR spectra. J. Magn. Reson. 2010, 202, 190–202.
- Vu, T.N.; Valkenborg, D.; Smets, K.; Verwaest, K.A.; Dommisse, R.; Lemière, F.; Verschoren, A.; Goethals, B.; Laukens, K. An integrated workflow for robust alignment and simplified quantitative analysis of NMR spectrometry data. BMC Bioinform. 2011, 12, 405.
- Noguera-Julian, M.; Rocafort, M.; Guillén, Y.; Rivera, J.; Casadellà, M.; Nowak, P.; Hildebrand, F.; Zeller, G.; Parera, M.; Bellido, R.; et al. Gut Microbiota Linked to Sexual Preference and HIV Infection. EBioMedicine 2016, 5, 135–146.
- Dieterle, F.; Ross, A.; Schlotterbeck, G.; Senn, H. Probabilistic quotient normalization as robust method to account for dilution of complex biological mixtures. Application in 1H NMR metabonomics. Anal. Chem. 2006, 78, 4281–4290.
- Hernández-Alonso, P.; Giardina, S.; Cañueto, D.; Salas-Salvadó, J.; Cañellas, N.; Bulló, M. Changes in Plasma Metabolite Concentrations after a Low-Glycemic Index Diet Intervention. Mol. Nutr. Food Res. 2019, 63, 1700975.
- Elzhov, T.V.; Mullen, K.M.; Spiess, A.-N.; Maintainer, B.B. Minpack.Lm: R Interface to the Levenberg-Marquardt Nonlinear Least-Squares Algorithm Found in MINPACK, Plus Support for Bounds. R Packag. Version 1.2-1; 2016; pp. 1–14. Available online: https://cran.r-project.org/web/packages/minpack.lm/minpack.lm.pdf (accessed on 6 February 2022).
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cañueto, D.; Salek, R.M.; Bulló, M.; Correig, X.; Cañellas, N. Application of Machine Learning Solutions to Optimize Parameter Prediction to Enhance Automatic NMR Metabolite Profiling. Metabolites 2022, 12, 283. https://doi.org/10.3390/metabo12040283