A Review on Differential Abundance Analysis Methods for Mass Spectrometry-Based Metabolomic Data
Abstract
1. Introduction
2. Statistical Methods for DA Analysis
2.1. One-Part Tests
2.1.1. Wilcoxon Rank-Sum Test
2.1.2. Truncated Wilcoxon-Test
2.1.3. Tobit-Model
2.2. Two-Part Tests
2.2.1. Two-Part t-Test
2.2.2. Two-Part Wilcoxon Test
2.2.3. SDA
2.3. Mixture Models
2.3.1. Left-Inflated Mixture Likelihood Ratio Test (LIM-LRT)
2.3.2. DASEV
2.4. Model Comparison
3. Practical Guidelines
4. Discussion
Author Contributions
Funding
Conflicts of Interest
References
1. Oliver, S.G.; Winson, M.K.; Kell, D.B.; Baganz, F. Systematic functional analysis of the yeast genome. Trends Biotechnol. 1998, 16, 373–378.
2. Alseekh, S.; Fernie, A.R. Metabolomics 20 years on: What have we learned and what hurdles remain? Plant J. 2018, 94, 933–942.
3. Trivedi, D.K.; Hollywood, K.A.; Goodacre, R. Metabolomics for the masses: The future of metabolomics in a personalized world. New Horiz. Transl. Med. 2017, 3, 294–305.
4. Liu, X.; Locasale, J.W. Metabolomics: A Primer. Trends Biochem. Sci. 2017, 42, 274–284.
5. Guijas, C.; Montenegro-Burke, J.R.; Warth, B.; Spilker, M.E.; Siuzdak, G. Metabolomics activity screening for identifying metabolites that modulate phenotype. Nat. Biotechnol. 2018, 36, 316–320.
6. Sinem, N.; Abdullah, K. Introductory Chapter: Insight into the OMICS Technologies and Molecular Medicine; Sinem, N., Hakima, A., Eds.; Molecular Medicine; IntechOpen: London, UK, 2019.
7. Alseekh, S.; Aharoni, A.; Brotman, Y.; Contrepois, K.; D’Auria, J.; Ewald, J.; Ewald, J.C.; Fraser, P.D.; Giavalisco, P.; Hall, R.D.; et al. Mass spectrometry-based metabolomics: A guide for annotation, quantification and best reporting practices. Nat. Methods 2021, 18, 747–756.
8. Dunn, W.B. Mass spectrometry in systems biology an introduction. Methods Enzym. 2011, 500, 15–35.
9. Aretz, I.; Meierhofer, D. Advantages and Pitfalls of Mass Spectrometry Based Metabolome Profiling in Systems Biology. Int. J. Mol. Sci. 2016, 17, 632.
10. Saghatelian, A.; Trauger, S.A.; Want, E.J.; Hawkins, E.G.; Siuzdak, G.; Cravatt, B.F. Assignment of endogenous substrates to enzymes by global metabolite profiling. Biochemistry 2004, 43, 14332–14339.
11. Boiteau, R.M.; Hoyt, D.W.; Nicora, C.D.; Kinmonth-Schultz, H.A.; Ward, J.K.; Bingol, K. Structure Elucidation of Unknown Metabolites in Metabolomics by Combined NMR and MS/MS Prediction. Metabolites 2018, 8, 8.
12. Levsen, K.; Schiebel, H.M.; Behnke, B.; Dötzer, R.; Dreher, W.; Elend, M.; Thiele, H. Structure elucidation of phase II metabolites by tandem mass spectrometry: An overview. J. Chromatogr. A 2005, 1067, 55–72.
13. Dunn, W.B.; Broadhurst, D.; Begley, P.; Zelena, E.; Francis-McIntyre, S.; Anderson, N.; Brown, M.; Knowles, J.D.; Halsall, A.; Haselden, J.N.; et al. Procedures for large-scale metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to mass spectrometry. Nat. Protoc. 2011, 6, 1060–1083.
14. Shao, Y.; Li, T.; Liu, Z.; Wang, X.; Xu, X.; Li, S.; Xu, G.; Le, W. Comprehensive metabolic profiling of Parkinson’s disease by liquid chromatography-mass spectrometry. Mol. Neurodegener. 2021, 16, 4.
15. Clarke, C.J.; Haselden, J.N. Metabolic profiling as a tool for understanding mechanisms of toxicity. Toxicol. Pathol. 2008, 36, 140–147.
16. Lapainis, T.; Rubakhin, S.S.; Sweedler, J.V. Capillary electrophoresis with electrospray ionization mass spectrometric detection for single-cell metabolomics. Anal. Chem. 2009, 81, 5858–5864.
17. Prasad, B.; Garg, A.; Takwani, H.; Singh, S. Metabolite identification by liquid chromatography-mass spectrometry. TrAC Trends Anal. Chem. 2011, 30, 360–387.
18. Xiao, J.F.; Zhou, B.; Ressom, H.W. Metabolite identification and quantitation in LC-MS/MS-based metabolomics. TrAC Trends Anal. Chem. 2012, 32, 1–14.
19. Dahal, U.P.; Jones, J.P.; Davis, J.A.; Rock, D.A. Small molecule quantification by liquid chromatography-mass spectrometry for metabolites of drugs and drug candidates. Drug Metab. Dispos. 2011, 39, 2355–2360.
20. Easterling, L.F.; Yerabolu, R.; Kumar, R.; Alzarieni, K.Z.; Kenttämaa, H.I. Factors Affecting the Limit of Detection for HPLC/Tandem Mass Spectrometry Experiments Based on Gas-Phase Ion-Molecule Reactions. Anal. Chem. 2020, 92, 7471–7477.
21. Lu, W.; Su, X.; Klein, M.S.; Lewis, I.A.; Fiehn, O.; Rabinowitz, J.D. Metabolite Measurement: Pitfalls to Avoid and Practices to Follow. Annu. Rev. Biochem. 2017, 86, 277–304.
22. Gleiss, A.; Dakna, M.; Mischak, H.; Heinze, G. Two-group comparisons of zero-inflated intensity values: The choice of test statistic matters. Bioinformatics 2015, 31, 2310–2317.
23. Dakna, M.; Harris, K.; Kalousis, A.; Carpentier, S.; Kolch, W.; Schanstra, J.P.; Haubitz, M.; Vlahou, A.; Mischak, H.; Girolami, M. Addressing the challenge of defining valid proteomic biomarkers and classifiers. BMC Bioinform. 2010, 11, 594.
24. Do, K.T.; Wahl, S.; Raffler, J.; Molnos, S.; Laimighofer, M.; Adamski, J.; Suhre, K.; Strauch, K.; Peters, A.; Gieger, C.; et al. Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies. Metabolomics 2018, 14, 128.
25. Faquih, T.; van Smeden, M.; Luo, J.; le Cessie, S.; Kastenmüller, G.; Krumsiek, J.; Noordam, R.; Van Heemst, D.; Rosendaal, F.R.; Vlieg, A.V.H.; et al. A Workflow for Missing Values Imputation of Untargeted Metabolomics Data. Metabolites 2020, 10, 486.
26. Taylor, S.L.; Leiserowitz, G.S.; Kim, K. Accounting for undetected compounds in statistical analyses of mass spectrometry ‘omic studies. Stat. Appl. Genet. Mol. Biol. 2013, 12, 703–722.
27. Hrydziuszko, O.; Viant, M.R. Missing values in mass spectrometry based metabolomics: An undervalued step in the data processing pipeline. Metabolomics 2011, 8, 161–174.
28. Li, Y.; Fan, T.W.M.; Lane, A.N.; Kang, W.Y.; Arnold, S.M.; Stromberg, A.J.; Wang, C.; Chen, L. SDA: A semi-parametric differential abundance analysis method for metabolomics and proteomics data. BMC Bioinform. 2019, 20, 501.
29. Zhang, D.; Fan, C.; Zhang, J.; Zhang, C.H. Nonparametric methods for measurements below detection limit. Stat. Med. 2009, 28, 700–715.
30. Smyth, G.K. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 2004, 3, 1–25.
31. Wang, P.; Tang, H.; Zhang, H.; Whiteaker, J.; Paulovich, A.G.; McIntosh, M. Normalization regarding non-random missing values in high-throughput mass spectrometry data. Biocomputing 2006, 11, 315–326.
32. Hughes, G.; Cruickshank-Quinn, C.; Reisdorph, R.; Lutz, S.; Petrache, I.; Reisdorph, N.; Bowler, R.; Kechris, K. MSPrep-summarization, normalization and diagnostics for processing of mass spectrometry-based metabolomic data. Bioinformatics 2014, 30, 133–134.
33. Webb-Robertson, B.J.; Wiberg, H.K.; Matzke, M.M.; Brown, J.N.; Wang, J.; McDermott, J.E.; Smith, R.D.; Rodland, K.D.; Metz, T.O.; Pounds, J.G.; et al. Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics. J. Proteome Res. 2015, 14, 1993–2001.
34. Lazar, C.; Gatto, L.; Ferro, M.; Bruley, C.; Burger, T. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies. J. Proteome Res. 2016, 15, 1116–1125.
35. Liaqat, M.; Kamal, S.; Fischer, F.; Zia, N. Zero-inflated and hurdle models with an application to the number of involved axillary lymph nodes in primary breast cancer. J. King Saud Univ.-Sci. 2022, 34, 101932.
36. Zhang, P.; Pitt, D.; Wu, X. A New Multivariate Zero-Inflated Hurdle Model with Applications in Automobile Insurance. ASTIN Bull. 2022, 1–24.
37. Lam, K.F.; Xue, H.; Bun Cheung, Y. Semiparametric Analysis of Zero-Inflated Count Data. Biometrics 2006, 62, 996–1003.
38. Neelon, B.; O’Malley, A.J.; Smith, V.A. Modeling zero-modified count and semicontinuous data in health services research part 2: Case studies. Stat. Med. 2016, 35, 5094–5112.
39. Young, D.S.; Roemmele, E.; Yeh, P. Zero inflated modeling part I: Traditional zero inflated count regression models, their applications, and computational tools. WIREs Comput. Stat. 2020, 14, e1541.
40. Liu, L.; Shih, Y.-C.T.; Strawderman, R.L.; Zhang, D.; Johnson, B.A.; Chai, H. Statistical Analysis of Zero-Inflated Nonnegative Continuous Data: A Review. Stat. Sci. 2019, 34, 253–279.
41. Min, Y.; Agresti, A. Modeling Nonnegative Data with Clumping at Zero: A Survey. J. Iran. Stat. Soc. 2002, 1, 7–33.
42. Wilcoxon, F. Individual Comparisons by Ranking Methods. Biom. Bull. 1945, 1, 80–83.
43. Hallstrom, A.P. A modified Wilcoxon test for non-negative distributions with a clump of zeros. Stat. Med. 2010, 29, 391–400.
44. Wang, W.; Chen, E.Z.; Li, H. Truncated Rank-Based Tests for Two-Part Models with Excessive Zeros and Applications to Microbiome Data. arXiv 2021, arXiv:2110.05368.
45. Taylor, S.; Pollard, K. Hypothesis tests for point-mass mixture data with application to ‘omics data with many zero values. Stat. Appl. Genet. Mol. Biol. 2009, 8, 8.
46. Yang, Y.; Simpson, D.G. Conditional decomposition diagnostics for regression analysis of zero-inflated and left-censored data. Stat. Methods Med. Res. 2012, 21, 393–408.
47. Moulton, L.H.; Halsey, N.A. A mixture model with detection limits for regression analyses of antibody response to vaccine. Biometrics 1995, 51, 1570–1578.
48. Karpievitch, Y.; Stanley, J.; Taverner, T.; Huang, J.; Adkins, J.N.; Ansong, C.; Heffron, F.; Metz, T.O.; Qian, W.-J.; Yoon, H.; et al. A statistical framework for protein quantitation in bottom-up MS-based proteomics. Bioinformatics 2009, 25, 2028–2034.
49. Wu, S.H.; Black, M.A.; North, R.A.; Atkinson, K.R.; Rodrigo, A.G. A statistical model to identify differentially expressed proteins in 2D PAGE gels. PLoS Comput. Biol. 2009, 5, e1000509.
50. Huang, Z.; Lane, A.N.; Fan, T.W.M.; Higashi, R.M.; Weiss, H.L.; Yin, X.; Wang, C. Differential Abundance Analysis with Bayes Shrinkage Estimation of Variance (DASEV) for Zero-Inflated Proteomic and Metabolomic Data. Sci. Rep. 2020, 10, 876.
51. Dwivedi, A.K.; Mallawaarachchi, I.; Alvarado, L.A. Analysis of small sample size studies using nonparametric bootstrap test with pooled resampling method. Stat. Med. 2017, 36, 2187–2205.
52. Mundry, R.; Fischer, J. Use of statistical programs for nonparametric tests of small samples often leads to incorrect P values: Examples from Animal Behaviour. Anim. Behav. 1998, 56, 256–259.
53. Tsonaka, R.; Signorelli, M.; Sabir, E.; Seyer, A.; Hettne, K.; Aartsma-Rus, A.; Spitali, P. Longitudinal metabolomic analysis of plasma enables modeling disease progression in Duchenne muscular dystrophy mouse models. Hum. Mol. Genet. 2020, 29, 745–755.
54. Overmyer, K.A.; Shishkova, E.; Miller, I.J.; Balnis, J.; Bernstein, M.N.; Peters-Clarke, T.M.; Meyer, J.G.; Quan, Q.; Muehlbauer, L.K.; Trujillo, E.A.; et al. Large-Scale Multi-omic Analysis of COVID-19 Severity. Cell Syst. 2021, 12, 23–40.e7.
55. Sindelar, M.; Stancliffe, E.; Schwaiger-Haber, M.; Anbukumar, D.S.; Adkins-Travis, K.; Goss, C.W.; O’Halloran, J.A.; Mudd, P.A.; Liu, W.-C.; Albrecht, R.A.; et al. Longitudinal metabolomics of human plasma reveals prognostic markers of COVID-19 disease severity. Cell Rep. Med. 2021, 2, 100369.
56. Jendoubi, T.; Ebbels, T.M.D. Integrative analysis of time course metabolic data and biomarker discovery. BMC Bioinform. 2020, 21, 11.
57. Berk, M.; Ebbels, T.; Montana, G. A statistical framework for biomarker discovery in metabolomic time course data. Bioinformatics 2011, 27, 1979–1985.
58. Mei, Y.; Kim, S.B.; Tsui, K.-L. Linear-mixed effects models for feature selection in high-dimensional NMR spectra. Expert Syst. Appl. 2009, 36, 4703–4708.
59. Rusilowicz, M.J.; Dickinson, M.; Charlton, A.J.; O’Keefe, S.; Wilson, J. MetaboClust: Using interactive time-series cluster analysis to relate metabolomic data with perturbed pathways. PLoS ONE 2018, 13, e0205968.
60. Gowda, G.A.N.; Zhang, S.; Gu, H.; Asiago, V.; Shanaiah, N.; Raftery, D. Metabolomics-based methods for early disease diagnostics. Expert Rev. Mol. Diagn. 2008, 8, 617–633.
61. Wieder, C.; Frainay, C.; Poupin, N.; Rodríguez-Mier, P.; Vinson, F.; Cooke, J.; Lai, R.P.; Bundy, J.G.; Jourdan, F.; Ebbels, T. Pathway analysis in metabolomics: Recommendations for the use of over-representation analysis. PLoS Comput. Biol. 2021, 17, e1009105.
62. Xia, J.; Wishart, D.S. MetPA: A web-based metabolomics tool for pathway analysis and visualization. Bioinformatics 2010, 26, 2342–2344.
63. Marco-Ramell, A.; Palau-Rodriguez, M.; Alay, A.; Tulipani, S.; Urpi-Sarda, M.; Sanchez-Pla, A.; Andres-Lacueva, C. Evaluation and comparison of bioinformatic tools for the enrichment analysis of metabolomics data. BMC Bioinform. 2018, 19, 1.
64. Jiang, D.; Armour, C.R.; Hu, C.; Mei, M.; Tian, C.; Sharpton, T.J.; Jiang, Y. Microbiome Multi-Omics Network Analysis: Statistical Considerations, Limitations, and Opportunities. Front. Genet. 2019, 10, 995.
| Category | Method | Able to Distinguish TPMVs and BPMVs | Free of Data Normality Assumption | Available R Function/Package | References |
|---|---|---|---|---|---|
| One-part test | Wilcoxon rank-sum test | N | Y | wilcox.test | [42] |
| One-part test | Truncated Wilcoxon test | N | Y | https://rdrr.io/github/chvlyl/ZIR/ | [43,44] |
| One-part test | Tobit-model | N | N | VGAM (https://cran.r-project.org/web/packages/VGAM/index.html) | [22] |
| Two-part test | Two-part t-test | N | N | t.test, binom.test | [22] |
| Two-part test | Two-part Wilcoxon test | N | Y | wilcox.test, binom.test | [22] |
| Two-part test | SDA | N | Y | SDAMS (https://bioconductor.org/packages/release/bioc/html/SDAMS.html) | [28] |
| Mixture model | LIM-LRT | Y | N | https://cemsiis.meduniwien.ac.at/en/kb/science-research/software/statistical-software/limlrt/ | [22,26,46,47] |
| Mixture model | DASEV | Y | N | http://sweb.uky.edu/~cwa236/DASEV.html | [50] |
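
The two-part t-test entry in the table pairs a test on the proportion of zero values with a t-test on the non-zero values. The sketch below illustrates one common way such a statistic can be assembled: the squared statistic for the proportion part and the squared t-statistic for the non-zero part are summed and referred to a chi-square distribution with as many degrees of freedom as there are usable parts. This is a minimal illustration under stated assumptions, not the implementation used in the cited references. The function names `two_part_t_test` and `chi2_sf` are hypothetical, a pooled z-test for two proportions stands in for R's `binom.test`, and the chi-square reference distribution is a large-sample approximation.

```python
import math
from statistics import mean, variance


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable for df in {0, 1, 2}."""
    if df == 2:
        return math.exp(-stat / 2.0)             # exact closed form for 2 df
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))  # P(Z^2 > stat) for standard normal Z
    return 1.0                                   # no usable part: no evidence


def two_part_t_test(x, y):
    """Two-part test for two groups of non-negative intensities with zeros.

    Returns (statistic, degrees_of_freedom, p_value).
    Assumption: zeros mark undetected values; positive values are intensities.
    """
    n1, n2 = len(x), len(y)
    pos1 = [v for v in x if v > 0]
    pos2 = [v for v in y if v > 0]
    m1, m2 = len(pos1), len(pos2)
    stat, df = 0.0, 0

    # Part 1: pooled z-test comparing the proportions of non-zero values.
    p_pool = (m1 + m2) / (n1 + n2)
    var_prop = p_pool * (1.0 - p_pool) * (1.0 / n1 + 1.0 / n2)
    if var_prop > 0:
        z = (m1 / n1 - m2 / n2) / math.sqrt(var_prop)
        stat += z * z
        df += 1

    # Part 2: pooled-variance t-statistic on the non-zero values only.
    if m1 >= 2 and m2 >= 2:
        pooled_var = ((m1 - 1) * variance(pos1) + (m2 - 1) * variance(pos2)) / (m1 + m2 - 2)
        if pooled_var > 0:
            t = (mean(pos1) - mean(pos2)) / math.sqrt(pooled_var * (1.0 / m1 + 1.0 / m2))
            stat += t * t
            df += 1

    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    group_a = [0, 0, 0, 1.0, 2.0, 3.0]
    group_b = [0, 4.0, 5.0, 6.0, 7.0, 8.0]
    print(two_part_t_test(group_a, group_b))
```

When one part cannot be computed (for example, neither group contains zeros), the sketch drops that part and reduces the degrees of freedom to one, so the result falls back to an ordinary comparison of the remaining part. Replacing the t-statistic with a standardized Wilcoxon rank-sum statistic would give the analogous two-part Wilcoxon test.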
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Huang, Z.; Wang, C. A Review on Differential Abundance Analysis Methods for Mass Spectrometry-Based Metabolomic Data. Metabolites 2022, 12, 305. https://doi.org/10.3390/metabo12040305