1. Introduction
Metabolomics has increasingly been utilized in a number of domains, including medical settings, typically to characterize groups based on sets of metabolites that can be used to differentiate between them. Clinical applications generally require metabolites to be identified in order to better understand the disease process, and so targeted approaches are utilized [1,2]. One-dimensional (1D) proton nuclear magnetic resonance (1H-NMR) spectroscopy is a popular metabolomics platform for a number of reasons, such as high reproducibility, preservation of samples, short acquisition time, and quantification of metabolites over a dynamic range [3,4,5,6,7]. However, 1H-NMR-based metabolomics studies require substantial time and effort to obtain reliable identifications and quantifications of known metabolites through manual spectral fitting by an expert before any statistical analyses of the cohort can take place.
A number of automated, non-commercial programs have been developed over the past 10 years to reduce the time and resource requirements for the identification and quantification of NMR spectra as well as to improve the reproducibility of the results. Some automated programs for targeted analyses include BQuant, AQuA, BATMAN, BAYESIL, ASICS, and Dolphin/rDolphin [5,6,8,9,10,11,12,13,14]. These programs employ different algorithms on a variety of platforms, including R, MATLAB, and web-based implementations. However, not all of these programs are still available, and of those that are, not all are actively maintained and written for an open-source platform. Two programs that meet these requirements are BATMAN and rDolphin.
The Bayesian automated metabolite analyzer for 1H-NMR spectra (BATMAN) utilizes existing knowledge about the resonance signatures of peaks from publicly available databases, such as the Human Metabolome Database (HMDB) [15]. Known peaks are assigned to specific metabolites and quantified while unknown peaks are modelled using wavelets to incorporate potential features that contribute to understanding the studied phenomenon, for example, improving classification of the sample spectra. A Markov chain Monte Carlo algorithm estimates the joint posterior distribution of the parameters. The wavelet component is heavily penalized to favor known over unknown peaks in the likelihood during the burn-in phase of model fitting [16].
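As a rough orientation only, the decomposition described above can be written schematically as follows. The notation is an assumption introduced here for illustration and is not taken from the BATMAN publications: y is the observed spectrum, t_m is the library template of metabolite m, β_m is its non-negative relative concentration, w is the wavelet-modelled component that absorbs unassigned signal, and ε is measurement noise.

```latex
\mathbf{y} \;=\; \sum_{m=1}^{M} \beta_m \,\mathbf{t}_m \;+\; \mathbf{w} \;+\; \boldsymbol{\varepsilon},
\qquad \beta_m \ge 0,
\qquad \boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma^{2}\mathbf{I}\right)
```

In this reading, the heavy penalty on the wavelet component corresponds to a prior that keeps w close to zero unless the templates cannot explain the signal, and the Markov chain Monte Carlo algorithm samples the β_m, w, and noise parameters jointly.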
rDolphin reimplements the Dolphin program, originally written in MATLAB (The Mathworks, Inc., Natick, Massachusetts), in the open-source software environment R [17]. Like Dolphin, rDolphin uses the Region of Interest (ROI) approach to identify peak areas where metabolites can be identified. The redesign of Dolphin into rDolphin addressed several limitations of Dolphin, providing improved visualization of spectral regions with high variability across sample spectra meriting scrutiny, options for enhanced metabolite identification, quality and reliability checks to minimize suboptimal quantification, and potential for novel metabolite identification. rDolphin includes a comprehensive graphical user interface (GUI) to facilitate exploratory analyses.
rDolphin and BATMAN share a number of program features which are implemented by different methodologies. Both programs use a targeted approach to identify metabolites and address peak overlap problems in 1D 1H-NMR spectra. In addition, both are written in the R programming language and utilize approaches to detect and potentially identify unknown metabolites or features. Lastly, both programs import information from the Human Metabolome Database (HMDB) for metabolite identification and quantification [15]. BATMAN is based on a Bayesian model that estimates model parameters from the posterior distribution, while rDolphin is based on a 1D line shape fitting approach.
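To make the idea of line shape fitting for overlapping peaks concrete, the following is a minimal sketch and not rDolphin's actual algorithm. It assumes two Lorentzian peaks with known centres and a shared, known half-width inside one region of interest, and recovers only their amplitudes by linear least squares. The function names, peak positions, and synthetic spectrum are all hypothetical.

```python
def lorentzian(x, center, half_width):
    """Unit-height Lorentzian line shape at chemical shift x (ppm)."""
    return half_width ** 2 / ((x - center) ** 2 + half_width ** 2)


def fit_two_amplitudes(x, y, c1, c2, half_width):
    """Least-squares amplitudes of two overlapping Lorentzians with fixed centres."""
    b1 = [lorentzian(v, c1, half_width) for v in x]
    b2 = [lorentzian(v, c2, half_width) for v in x]
    # Normal equations for the linear model y ~ a1*b1 + a2*b2.
    s11 = sum(p * p for p in b1)
    s12 = sum(p * q for p, q in zip(b1, b2))
    s22 = sum(q * q for q in b2)
    t1 = sum(p * v for p, v in zip(b1, y))
    t2 = sum(q * v for q, v in zip(b2, y))
    det = s11 * s22 - s12 * s12
    a1 = (t1 * s22 - t2 * s12) / det
    a2 = (s11 * t2 - s12 * t1) / det
    return a1, a2


# Hypothetical, noise-free region of interest from 1.30 to 1.40 ppm.
x = [1.30 + 0.001 * i for i in range(101)]
true_a1, true_a2 = 3.0, 1.5
y = [true_a1 * lorentzian(v, 1.33, 0.004) + true_a2 * lorentzian(v, 1.34, 0.004)
     for v in x]

a1, a2 = fit_two_amplitudes(x, y, 1.33, 1.34, 0.004)
print(a1, a2)
```

A real fit would also have to estimate peak positions and widths, handle multiplets and baseline, and cope with noise, which turns the problem into a nonlinear optimization; this sketch only shows why overlapping signals can still be separated when their shapes are known.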
The purpose of this pilot study was to compare results obtained by naïve operators who used these two automated, open-source programs (BATMAN and rDolphin) to carry out identification and quantification of 1D 1H-NMR metabolomics spectral data with the results obtained by manual profiling by an experienced spectroscopist. A targeted approach was needed for clinical diagnostic purposes in the original study [18], so this was also adopted here.
3. Discussion
This study evaluated the use of two automated, open-source programs that carry out identification and quantification of 1D 1H-NMR metabolomics spectral data and compared their results with those obtained via manual profiling performed by an experienced spectroscopist using a commercially available program.
Our results revealed that the manual profiling results had the highest sensitivity compared with the rDolphin and BATMAN programs (Table 2). All three approaches had very similar and high specificity. Thus, for biomarkers that have been carefully validated, if specificity is more important than sensitivity in a clinical application, then the automated methods could work very well. For pediatric sepsis diagnosis and triage, high specificity alone is not adequate, because detecting the need for pediatric intensive care unit (PICU) admission early, that is, high sensitivity, is very important in making triage decisions. On this measure, the automated programs performed poorly, and this is reflected in the AUC results discussed next.
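For readers less familiar with these two measures, a minimal sketch of how sensitivity and specificity follow from binary labels is given below. The labels are hypothetical and are not the study's data; what counts as the positive class depends on how Table 2 was constructed, which this section does not spell out.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of true positives that were detected
    specificity = tn / (tn + fp)  # share of true negatives that were correctly rejected
    return sensitivity, specificity


# Hypothetical example: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```

In this made-up example the specificity is high (5 of 6 negatives are correct) while half of the positives are missed, which is the same pattern the paragraph above describes as problematic for triage.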
The AUC results showed the superiority of the manual profiling method, and the comparability of BATMAN to rDolphin. Although the two automated programs had similar performance metrics, the selected metabolites for each were not the same. Both programs identified the same two metabolites from the set of seven metabolites with decreased concentrations in the PICU sepsis cohort, namely, acetate and citrate. From the same set, rDolphin also identified serine whereas BATMAN identified alanine. Both programs also selected only one metabolite from the set of six metabolites with increased concentrations in the PICU sepsis cohort, with rDolphin identifying dimethylamine and BATMAN identifying lysine. The results for the Expert Profiler also showed that only three of the six metabolites in the increased concentration list were selected compared to all of the metabolites in the decreased concentration set. Since all of these results are based on how frequently metabolites were selected in resampled data sets in the BioMark program, this suggests that increased concentrations of some unselected metabolites might vary in this data set. We compared the ratios of the means for each metabolite to the modelling results in Figure 2b but did not find a consistent explanation (see Supplementary Materials Table S1).
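The AUC mentioned at the start of the preceding paragraph can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that rank-based definition follows; the scores are hypothetical, and this is not necessarily how the AUC values in the study were computed.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison; ties count as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores for positive and negative cases.
scores_pos = [0.9, 0.8, 0.4]
scores_neg = [0.7, 0.3, 0.2, 0.1]

area = auc(scores_pos, scores_neg)
print(area)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so two methods with similar AUC values can still rely on different metabolites, as observed for rDolphin and BATMAN.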
Precision results based on the number of correct identifications out of all selected metabolites by the BioMark program were all similar. The BioMark program settings were set to a high level of reproducibility, so this finding that about 30–60% of the biomarkers selected were the correct ones across all three methods suggests that some additional, individual metabolites might be important in this different approach. However, the identification of the list of true biomarkers was based on OPLS-DA, which can take the correlation between metabolites into account, and this may explain the finding. Lastly, the testing accuracy from the Multilayer Perceptrons with Hidden Multipliers (MLPHM) neural net found the manual profiling to be very accurate whereas the two automated methods were poor. This may be due to unclear features and a relatively large amount of noise in the data sets obtained by the two automated methods, indicating that the quality of the metabolite concentrations was not as good. Overall, these findings suggest that neither rDolphin nor BATMAN could perform reasonably well compared to manual profiling.
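The precision measure described above can be sketched as follows. This is an illustration under stated assumptions, not the BioMark implementation: the selection counts, the number of resamples, the 0.85 consistency threshold, and the metabolite labels A to E are all hypothetical placeholders.

```python
def stable_selection(selection_counts, n_resamples, min_fraction):
    """Keep variables selected in at least min_fraction of the resampled data sets."""
    return {name for name, count in selection_counts.items()
            if count / n_resamples >= min_fraction}


def precision(selected, true_biomarkers):
    """Correct identifications divided by all selected variables."""
    if not selected:
        return 0.0
    return len(selected & true_biomarkers) / len(selected)


# Hypothetical selection counts over 100 resamples (placeholder labels).
counts = {"A": 95, "B": 90, "C": 88, "D": 40, "E": 92}
true_biomarkers = {"A", "B", "D"}

selected = stable_selection(counts, n_resamples=100, min_fraction=0.85)
prec = precision(selected, true_biomarkers)
print(sorted(selected), prec)
```

A strict consistency threshold keeps only variables that are selected in nearly every resample, so a moderate precision under such a setting indicates stable selection of variables outside the reference list rather than random noise in the selection.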
Automated open-source programs have a number of important advantages for analysing NMR-based metabolomics data over manual profiling using a commercial program. First, these programs can provide greater consistency between experienced spectroscopists and laboratories, as the results should be highly reproducible using the same input data and parameters. Second, the time required to obtain metabolite concentrations is a matter of seconds up to a few minutes compared with 30–60 min for each 1H-NMR spectrum when performed manually. This gain in analysis time can be important when clinical decisions for treatment need to be made quickly, as in the case of sepsis [19]. Third, greater accuracy over time can be achieved, as manual curve fitting can be tedious and performance can drift due to the repetitiveness of the process. Less experienced users can use these programs with limited training or expertise. Lastly, open-source programs are free, so available study resources can be used for other expenditures rather than obtaining commercial licenses.
However, these open-source, automated programs also come with a number of disadvantages. They are not fully automated, as there still is some input required by the analyst that affects the identification and quantification of the spectra. In rDolphin, for example, an analyst could adjust the spectral fit based on visual assessment and error messages. This could improve performance but at the cost of extra time and the need for an expert profiler. Automated programs are also not as accurate as an expert who can match reference peaks found in the library to what is found in each sample and who can take into account overlapping peaks or superimposition of unknown signals. Based on our experience in the early phase of this project, a number of these open-source programs were not being maintained, would not install on our laptop for multiple reasons, including missing program components, or were no longer available to download. If workflows are built based on a program that is not maintained over time, then consistency and time-savings are not long-lived. The BATMAN program also required substantially more computational resources that might not be available to all users.
In conclusion, this pilot project aimed to answer the question: Can automated, publicly available programs that carry out identification and quantification of 1H-NMR metabolomics spectral data implemented by naïve users achieve acceptable results compared to those obtained by an expert? Our results for this pediatric sepsis data set of serum samples showed that neither the rDolphin nor BATMAN programs performed acceptably well. There might be several reasons for these discrepancies, including program capabilities as well as differences in the pre-processing of the spectra or in the adjustments or lack of adjustments made by human operators. We are investigating why each of the automated methods performed better with the decreased metabolite concentrations to better understand the differences in their approaches regarding quantification as implemented in this study. Automated approaches will not likely replace manual profiling by an expert, but they could be used to quickly determine whether known metabolites distinguish groups of patients. Newer, automated methods are being developed to overcome identification and quantification challenges, so the performance gap is likely to keep closing.
Author Contributions
Conceptualization, H.J.V., B.M. and K.A.K.; methodology, B.M., H.J.V., K.A.K. and X.W.; software, K.A.K. and X.W.; validation, B.M. and X.W.; formal analysis, K.A.K. and X.W.; investigation, X.W.; resources, A.R.J., B.M., G.C.T., H.J.V. and J.B.; data curation, B.M.; writing—original draft preparation, X.W. and K.A.K.; writing—review and editing, A.R.J., B.M., G.C.T., H.J.V., J.B., K.A.K. and X.W.; visualization, X.W.; supervision, B.M. and K.A.K. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
The original study was approved by the Conjoint Health Research Ethics Board of the University of Calgary (Ethics ID 23426) and the Health Research Ethics Board of the University of Alberta (Pro00008797). Use of the anonymized data to explore metabolomic prediction of triage in pediatric sepsis did not require separate ethics approval.
Informed Consent Statement
All patients, or their next of kin, provided written informed consent for participating in this study, part of the Alberta Sepsis Network project examining metabolomics to predict triage in pediatric sepsis.
Data Availability Statement
The datasets analyzed during the current study are not publicly available due to limitations of consent for the original study but could be available from Joffe on reasonable request.
Acknowledgments
The authors gratefully acknowledge the funding for the original study that was supported by an Alberta Innovates-Health Solutions team grant to the Alberta Sepsis Network. We thank the following for their help with this original project: Josee Wong and the Critical Care Epidemiologic and Biologic Tissue Resource for specimen handling, Derrice Knight for her organizational skills as the project manager for the Alberta Sepsis Network, Mandy Tse and the Snyder Translational Laboratory in Critical Care Medicine for protein mediator analysis, and the pediatric research coordinators who managed the consent process and data collection and entry.
Conflicts of Interest
B.M. and H.J.V. hold patent US20140205591 A1 (Metabolite Biomarkers for Diagnosis and Prognosis of Pediatric Septic Shock). None of the other authors has any competing interests or other financial disclosures to make. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
References
1. Weljie, A.M.; Newton, J.; Mercier, P.; Carlson, E.; Slupsky, C.M. Targeted profiling: Quantitative analysis of 1H NMR metabolomics data. Anal. Chem. 2006, 78, 4430–4442.
2. Kohler, I.; Hankemeier, T.; van der Graaf, P.H.; Knibbe, C.A.; van Hasselt, J.C. Integrating clinical metabolomics-based biomarker discovery and clinical pharmacology to enable precision medicine. Eur. J. Pharm. Sci. 2017, 109, S15–S21.
3. Emwas, A.-H.; Roy, R.; McKay, R.T.; Tenori, L.; Saccenti, E.; Gowda, G.; Raftery, D.; Alahmari, F.; Jaremko, L.; Jaremko, M. NMR spectroscopy for metabolomics research. Metabolites 2019, 9, 123.
4. Takis, P.G.; Schäfer, H.; Spraul, M.; Luchinat, C. Deconvoluting interrelationships between concentrations and chemical shifts in urine provides a powerful analysis tool. Nat. Commun. 2017, 8, 1662.
5. Röhnisch, H.E.; Eriksson, J.; Müllner, E.; Agback, P.; Sandström, C.; Moazzami, A.A. AQuA: An automated quantification algorithm for high-throughput NMR-based metabolomics and its application in human plasma. Anal. Chem. 2018, 90, 2095–2102.
6. Zheng, C.; Zhang, S.; Ragg, S.; Raftery, D.; Vitek, O. Identification and quantification of metabolites in 1H NMR spectra by Bayesian model selection. Bioinformatics 2011, 27, 1637–1644.
7. Vignoli, A.; Ghini, V.; Meoni, G.; Licari, C.; Takis, P.G.; Tenori, L.; Turano, P.; Luchinat, C. High-throughput metabolomics by 1D NMR. Angew. Chem. Int. Ed. 2019, 58, 968–994.
8. Spicer, R.; Salek, R.M.; Moreno, P.; Cañueto, D.; Steinbeck, C. Navigating freely-available software tools for metabolomics analysis. Metabolomics 2017, 13, 106.
9. Gómez, J.; Brezmes, J.; Mallol, R.; Rodríguez, M.A.; Vinaixa, M.; Salek, R.M.; Correig, X.; Cañellas, N. Dolphin: A tool for automatic targeted metabolite profiling using 1D and 2D 1H-NMR data. Anal. Bioanal. Chem. 2014, 406, 7967–7976.
10. Cañueto, D.; Gómez, J.; Salek, R.M.; Correig, X.; Cañellas, N. rDolphin: A GUI R package for proficient automatic profiling of 1D 1H-NMR spectra of study datasets. Metabolomics 2018, 14, 24.
11. Hao, J.; Astle, W.; De Iorio, M.; Ebbels, T.M. BATMAN—An R package for the automated quantification of metabolites from nuclear magnetic resonance spectra using a Bayesian model. Bioinformatics 2012, 28, 2088–2090.
12. Ravanbakhsh, S.; Liu, P.; Bjordahl, T.C.; Mandal, R.; Grant, J.R.; Wilson, M.; Eisner, R.; Sinelnikov, I.; Hu, X.; Luchinat, C. Accurate, fully-automated NMR spectral profiling for metabolomics. PLoS ONE 2015, 10, e0124219.
13. Lefort, G.; Liaubet, L.; Canlet, C.; Tardivel, P.; Père, M.-C.; Quesnel, H.; Paris, A.; Iannuccelli, N.; Vialaneix, N.; Servien, R. ASICS: An R package for a whole analysis workflow of 1D 1H NMR spectra. Bioinformatics 2019, 35, 4356–4363.
14. Tardivel, P.J.; Canlet, C.; Lefort, G.; Tremblay-Franco, M.; Debrauwer, L.; Concordet, D.; Servien, R. ASICS: An automatic method for identification and quantification of metabolites in complex 1D 1H NMR spectra. Metabolomics 2017, 13, 109.
15. Wishart, D.S.; Feunang, Y.D.; Marcu, A.; Guo, A.C.; Liang, K.; Vázquez-Fresno, R.; Sajed, T.; Johnson, D.; Li, C.; Karu, N. HMDB 4.0: The human metabolome database for 2018. Nucleic Acids Res. 2018, 46, D608–D617.
16. Astle, W.; De Iorio, M.; Richardson, S.; Stephens, D.; Ebbels, T. A Bayesian model of NMR spectra for the deconvolution and quantification of metabolites in complex biological mixtures. J. Am. Stat. Assoc. 2012, 107, 1259–1271.
17. R Core Team R. A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2015.
18. Mickiewicz, B.; Thompson, G.C.; Blackwood, J.; Jenne, C.N.; Winston, B.W.; Vogel, H.J.; Joffe, A.R. Development of metabolic and inflammatory mediator biomarker phenotyping for early diagnosis and triage of pediatric sepsis. Crit. Care 2015, 19, 320.
19. Mathias, B.; Mira, J.; Larson, S.D. Pediatric sepsis. Curr. Opin. Pediatrics 2016, 28, 380.
20. Mnova, 12.0.3; Mestrelab Research S.L.: Santiago de Compostela, Spain, 2017.
21. Wehrens, H.; Franceschi, P. Meta-statistics for variable selection: The R package BioMark. J. Stat. Softw. 2012, 51, 1–18.
22. Barker, M.; Rayens, W. Partial least squares for discrimination. J. Chemom. A J. Chemom. Soc. 2003, 17, 166–173.
23. Zuber, V.; Strimmer, K. Gene ranking and biomarker discovery under correlation. Bioinformatics 2009, 25, 2700–2707.
24. Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: A basic tool of chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109–130.
25. Friedman, J.; Hastie, T.; Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 2010, 33, 1.
26. Donoho, D.; Jin, J. Higher criticism thresholding: Optimal feature selection when useful features are rare and weak. Proc. Natl. Acad. Sci. USA 2008, 105, 14790–14795.
27. Donoho, D.; Jin, J. Higher criticism for detecting sparse heterogeneous mixtures. Ann. Stat. 2004, 32, 962–994.
28. Meinshausen, N.; Bühlmann, P. Stability selection. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2010, 72, 417–473.
29. Wang, X.; Wang, J.; Zhang, K.; Lin, F.; Chang, Q. Convergence and objective functions of noise-injected multilayer perceptrons with hidden multipliers. Neurocomputing 2020, 452, 796–812.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).