In this review, we summarize established and recent bioinformatic and statistical methods for the analysis of NMR-based metabolomics data. The analysis of NMR metabolic fingerprints poses several challenges, including unwanted biases, high dimensionality, and typically low sample numbers. Common analysis tasks comprise the identification of differential metabolites and the classification of specimens. However, analysis results depend strongly on how the data are preprocessed, and there is no consensus yet on how to remove unwanted biases and experimental variance prior to statistical analysis. Here, we first review established and new preprocessing protocols, including different data normalizations and transformations, and illustrate their pros and cons. Second, we give a brief overview of state-of-the-art statistical analysis in NMR-based metabolomics. Finally, we discuss a recent development in statistical data analysis that makes data normalization unnecessary. This method, called zero-sum regression, builds metabolite signatures whose estimation and predictions are both independent of prior normalization.
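The normalization independence claimed for zero-sum signatures can be illustrated with a small numerical sketch. It is not the method's implementation. It only checks the invariance property, under two assumptions that are not stated in the text above: intensities are log-transformed, and the unwanted bias is a multiplicative, specimen-specific factor (for example an unknown dilution). All names (`X`, `beta`, `beta0`, `predict`, `scale`) and all numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 5 specimens x 4 spectral features, strictly positive.
X = rng.uniform(1.0, 10.0, size=(5, 4))

# Illustrative coefficients, chosen by hand so that they sum to zero.
beta = np.array([0.5, -1.0, 0.2, 0.3])
beta0 = 1.5


def predict(intensities, coef, intercept):
    """Linear prediction on log-transformed intensities."""
    return intercept + np.log(intensities) @ coef


# One unknown multiplicative factor per specimen (assumed bias model).
scale = rng.uniform(0.1, 5.0, size=(5, 1))

# Scaling specimen i by c_i shifts its prediction by log(c_i) * sum(coef),
# which vanishes when the coefficients sum to zero.
zero_sum_invariant = np.allclose(
    predict(X, beta, beta0), predict(scale * X, beta, beta0)
)

# Contrast: coefficients that do not sum to zero are affected by the scaling.
beta_free = np.array([0.5, -1.0, 0.2, 0.8])
free_invariant = np.allclose(
    predict(X, beta_free, beta0), predict(scale * X, beta_free, beta0)
)

print("zero-sum coefficients invariant:", zero_sum_invariant)
print("unconstrained coefficients invariant:", free_invariant)
```

Under these assumptions, rescaling a specimen only adds a constant to every log-intensity of that specimen, and the zero-sum constraint cancels that constant in the prediction.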
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.