Special Issue "Metabolomics Data Processing and Data Analysis—Current Best Practices"

A special issue of Metabolites (ISSN 2218-1989).

Deadline for manuscript submissions: closed (31 May 2019).

Special Issue Editors

Dr. Kati Hanhineva
Guest Editor
Academy Research Fellow, University of Eastern Finland, Finland
Interests: food and nutritional metabolomics; LC-MS based metabolic profiling approaches; development of data-analytical procedures for metabolomics
Dr. Justin van der Hooft
Guest Editor
Bioinformatics Group, Department of Plant Sciences, Wageningen University, Wageningen, The Netherlands
Tel. +31317482620
Interests: analytical chemistry; mass spectrometry; mass spectrometry fragmentation; metabolism; metabolites; specialized molecules; metabolomics; metabolome mining; linking genes to molecules; nuclear magnetic resonance

Special Issue Information

Dear Colleagues,

Metabolomics data-analytical approaches are developing with accelerating speed, alongside technical improvements in the instrumentation used in the field. There is currently a plethora of vendor-specific and open-source software solutions for various aspects of metabolomics data analysis; some cover the whole workflow, whereas others focus on specific aspects, such as the in silico prediction of metabolite structures. Thus, the choice of methods may be confusing for new scholars entering the field, and the selection of a suitable approach is a tedious process. This Special Issue is devoted to reviewing the current practical aspects of metabolomic data-analytical workflows, starting from data collection all the way to the presentation of publication-ready metabolomics results, to serve as a tutorial on the current best practices. We therefore invite review and viewpoint manuscripts devoted to various aspects of non-targeted metabolite profiling data analysis, with a specific emphasis on peak picking, data preprocessing (e.g., normalization, scaling, imputation), metabolite annotation and identification, as well as visualization practices. Finally, we also invite manuscripts with innovative and integrative solutions towards peak picking and metabolite annotation, which may well become “current practices” in the near future.

The Special Issue is open for submission now. A proper extension may be granted. Please kindly let us know in advance. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website.

Dr. Kati Hanhineva
Dr. Justin van der Hooft
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Metabolites is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • metabolomics
  • data processing
  • data analysis
  • data interpretation: annotation and visualization

Published Papers (12 papers)

Research

Open Access Article
In Silico Optimization of Mass Spectrometry Fragmentation Strategies in Metabolomics
Metabolites 2019, 9(10), 219; https://doi.org/10.3390/metabo9100219 - 09 Oct 2019
Abstract
Liquid chromatography (LC) coupled to tandem mass spectrometry (MS/MS) is widely used in identifying small molecules in untargeted metabolomics. Various strategies exist to acquire MS/MS fragmentation spectra; however, the development of new acquisition strategies is hampered by the lack of simulators that let researchers prototype, compare, and optimize strategies before validation on real machines. We introduce Virtual Metabolomics Mass Spectrometer (ViMMS), a metabolomics LC-MS/MS simulator framework that allows for scan-level control of the MS2 acquisition process in silico. ViMMS can generate new LC-MS/MS data based on empirical data or virtually re-run a previous LC-MS/MS analysis using pre-existing data to allow the testing of different fragmentation strategies. To demonstrate its utility, we show how ViMMS can be used to optimize N for Top-N data-dependent acquisition (DDA), giving results comparable to modifying N on the mass spectrometer. We expect that ViMMS will save method development time by allowing for offline evaluation of novel fragmentation strategies and optimization of the fragmentation strategy for a particular experiment.
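To make the Top-N idea in the abstract above concrete, the following is a minimal sketch of how a Top-N DDA controller might choose precursors from a single MS1 scan. It is not the ViMMS API: the function name `top_n_precursors`, its parameters (`min_intensity`, `excluded_mz`, `mz_tol`), and the toy scan are all hypothetical and chosen for illustration only.

```python
from typing import Iterable, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def top_n_precursors(
    ms1_peaks: Iterable[Peak],
    n: int,
    min_intensity: float = 0.0,
    excluded_mz: Iterable[float] = (),
    mz_tol: float = 0.01,
) -> List[Peak]:
    """Return the n most intense MS1 peaks eligible for fragmentation.

    Hypothetical helper: a peak is eligible if its intensity reaches
    min_intensity and its m/z is not within mz_tol of any value on the
    (assumed) dynamic exclusion list.
    """
    excluded = list(excluded_mz)
    candidates = [
        (mz, intensity)
        for mz, intensity in ms1_peaks
        if intensity >= min_intensity
        and not any(abs(mz - ex) <= mz_tol for ex in excluded)
    ]
    # Most intense first, then keep at most n precursors.
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return candidates[:n]


# Toy MS1 scan (made-up values).
scan = [(100.05, 5e4), (150.10, 2e5), (200.20, 1e3), (250.30, 8e4)]
selected = top_n_precursors(scan, n=2, min_intensity=5e3, excluded_mz=[150.10])
print(selected)
```

In this sketch, changing `n` changes how many MS2 scans follow each MS1 scan, which is the parameter the abstract describes optimizing in simulation.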
Open Access Article
R-MetaboList 2: A Flexible Tool for Metabolite Annotation from High-Resolution Data-Independent Acquisition Mass Spectrometry Analysis
Metabolites 2019, 9(9), 187; https://doi.org/10.3390/metabo9090187 - 17 Sep 2019
Abstract
Technological advancements have permitted the development of innovative multiplexing strategies for data-independent acquisition (DIA) mass spectrometry (MS). Software solutions and extensive compound libraries facilitate the efficient analysis of MS1 data, regardless of the analytical platform. However, the development of comparable tools for DIA data analysis has significantly lagged. This research introduces an update to the former MetaboList R package and a workflow for full-scan MS1 and MS/MS DIA processing of metabolomic data from multiplexed liquid chromatography high-resolution mass spectrometry (LC-HRMS) experiments. Compared to the former version, new functions addressing isolated MS1 and MS/MS workflows, processing of MS/MS data from stepped collision energies, performance scoring of metabolite annotations, and batch job analysis were incorporated into the update. The flexibility and efficiency of this strategy were assessed through the study of the metabolite profiles of human urine, leukemia cell culture, and medium samples analyzed by either liquid chromatography quadrupole time-of-flight (q-TOF) or quadrupole orbital (q-Orbitrap) instruments. This open-source alternative was designed to promote global metabolomic strategies based on recursive retrospective research of multiplexed DIA analysis.
Open Access Article
rMSIKeyIon: An Ion Filtering R Package for Untargeted Analysis of Metabolomic LDI-MS Images
Metabolites 2019, 9(8), 162; https://doi.org/10.3390/metabo9080162 - 02 Aug 2019
Abstract
Many MALDI-MS imaging experiments perform case versus control studies of different tissue regions in order to highlight significant compounds affected by the variables of study. This is a challenge because the tissue samples to be compared come from different biological entities, and therefore they exhibit high variability. Moreover, the statistical tests available cannot properly compare ion concentrations in two regions of interest (ROIs) within or between images. The high correlation between the ion concentrations due to the existence of different morphological regions in the tissue means that the common statistical tests used in metabolomics experiments cannot be applied. Another difficulty with the reliability of statistical tests is the elevated number of undetected MS ions in a high percentage of pixels. In this study, we report a procedure for discovering the most important ions in the comparison of a pair of ROIs within or between tissue sections. These ROIs were identified by an unsupervised segmentation process, using the popular k-means algorithm. Our ion filtering algorithm aims to find the up- or down-regulated ions between two ROIs by using a combination of three parameters: (a) the percentage of pixels in which a particular ion is not detected, (b) the Mann–Whitney U ion concentration test, and (c) the ion concentration fold-change. The undetected MS signals (null peaks) are discarded from the histogram before the calculation of parameters (b) and (c). With this methodology, we found the important ions between the different segments of a mouse brain tissue sagittal section and determined some lipid compounds (mainly triacylglycerols and phosphatidylcholines) in the liver of mice exposed to thirdhand smoke.
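The three-parameter filter described in the abstract above can be sketched as follows. This is an illustrative Python version, not the rMSIKeyIon R package: the function names, the thresholds (`max_null_frac`, `alpha`, `min_abs_log2_fc`), and the way the three criteria are combined (all must pass) are assumptions. The Mann–Whitney p-value uses the plain normal approximation without tie or continuity correction.

```python
import math
from statistics import mean


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, no corrections)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Tied values share the average of the ranks they span.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def keep_ion(roi_a, roi_b, max_null_frac=0.5, alpha=0.05, min_abs_log2_fc=1.0):
    """Return True if an ion passes all three (assumed) filter criteria.

    roi_a, roi_b: lists of per-pixel intensities; 0 means "not detected".
    """
    pixels = roi_a + roi_b
    # (a) fraction of pixels in which the ion is not detected
    null_frac = sum(v == 0 for v in pixels) / len(pixels)
    # Null peaks are discarded before computing (b) and (c).
    a = [v for v in roi_a if v > 0]
    b = [v for v in roi_b if v > 0]
    if null_frac > max_null_frac or not a or not b:
        return False
    p_value = mann_whitney_p(a, b)            # (b) Mann-Whitney U test
    log2_fc = math.log2(mean(b) / mean(a))    # (c) fold-change
    return p_value < alpha and abs(log2_fc) >= min_abs_log2_fc


# Made-up intensities for one ion in two ROIs.
roi_a = [0, 1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3]
roi_b = [4.0, 4.5, 0, 5.1, 3.9, 4.2, 4.8, 4.4]
print(keep_ion(roi_a, roi_b))
```

For real data, a statistics library with exact or tie-corrected p-values would be preferable to the approximation used here.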
Open Access Article
MolNetEnhancer: Enhanced Molecular Networks by Integrating Metabolome Mining and Annotation Tools
Metabolites 2019, 9(7), 144; https://doi.org/10.3390/metabo9070144 - 16 Jul 2019
Cited by 1
Abstract
Metabolomics has started to embrace computational approaches for chemical interpretation of large data sets. Yet, metabolite annotation remains a key challenge. Recently, molecular networking and MS2LDA emerged as molecular mining tools that find molecular families and substructures in mass spectrometry fragmentation data. Moreover, in silico annotation tools obtain and rank candidate molecules for fragmentation spectra. Ideally, all structural information obtained and inferred from these computational tools could be combined to increase the resulting chemical insight one can obtain from a data set. However, integration is currently hampered as each tool has its own output format and efficient matching of data across these tools is lacking. Here, we introduce MolNetEnhancer, a workflow that combines the outputs from molecular networking, MS2LDA, in silico annotation tools (such as Network Annotation Propagation or DEREPLICATOR), and the automated chemical classification through ClassyFire to provide a more comprehensive chemical overview of metabolomics data whilst at the same time illuminating structural details for each fragmentation spectrum. We present examples from four plant and bacterial case studies and show how MolNetEnhancer enables the chemical annotation, visualization, and discovery of the subtle substructural diversity within molecular families. We conclude that MolNetEnhancer is a useful tool that greatly assists the metabolomics researcher in deciphering the metabolome through combination of multiple independent in silico pipelines.
Open Access Article
Visualization and Interpretation of Multivariate Associations with Disease Risk Markers and Disease Risk—The Triplot
Metabolites 2019, 9(7), 133; https://doi.org/10.3390/metabo9070133 - 06 Jul 2019
Abstract
Metabolomics has emerged as a promising technique to understand relationships between environmental factors and health status. Through comprehensive profiling of small molecules in biological samples, metabolomics generates high-dimensional data objectively, reflecting exposures, endogenous responses, and health effects, thereby providing further insights into exposure-disease associations. However, the multivariate nature of metabolomics data contributes to high complexity in analysis and interpretation. Efficient visualization techniques for multivariate data that allow direct interpretation of combined exposures, metabolome, and disease risk are currently lacking. We have therefore developed the ‘triplot’ tool, a novel algorithm that simultaneously integrates and displays metabolites through latent variable modeling (e.g., principal component analysis, partial least squares regression, or factor analysis), their correlations with exposures, and their associations with disease risk estimates or intermediate risk factors. This paper illustrates the framework of the ‘triplot’ using two synthetic datasets that explore associations between dietary intake, plasma metabolome, and incident type 2 diabetes or BMI, an intermediate risk factor for lifestyle-related diseases. Our results demonstrate advantages of triplot over conventional visualization methods in facilitating interpretation in multivariate risk modeling with high-dimensional data. Algorithms, synthetic data, and tutorials are open source and available in the R package ‘triplot’.
Open Access Article
Mass Spectrometry Data Repository Enhances Novel Metabolite Discoveries with Advances in Computational Metabolomics
Metabolites 2019, 9(6), 119; https://doi.org/10.3390/metabo9060119 - 24 Jun 2019
Abstract
Mass spectrometry raw data repositories, including Metabolomics Workbench and MetaboLights, have contributed to increased transparency in metabolomics studies and the discovery of novel insights in biology by reanalysis with updated computational metabolomics tools. Herein, we reanalyzed the previously published lipidomics data from nine algal species, resulting in the annotation of 1437 lipids, a 40% increase in annotation compared to the previous results. Specifically, diacylglyceryl-carboxyhydroxy-methylcholine (DGCC) in Pavlova lutheri and Pleurochrysis carterae, glucuronosyldiacylglycerol (GlcADG) in Euglena gracilis and P. carterae, phosphatidylmethanol (PMeOH) in E. gracilis, and several oxidized phospholipids (oxidized phosphatidylcholine, OxPC; phosphatidylethanolamine, OxPE; phosphatidylglycerol, OxPG; phosphatidylinositol, OxPI) in Chlorella variabilis were newly characterized with the enriched lipid spectral databases. Moreover, we integrated the data from untargeted and targeted analyses from data-independent tandem mass spectrometry (DIA-MS/MS) acquisition, specifically the sequential window acquisition of all theoretical fragment-ion MS/MS (SWATH-MS/MS) spectra, to increase the lipidomic annotation coverage. After the creation of a global library of precursor and diagnostic ions of lipids by the MS-DIAL untargeted analysis, the co-eluted DIA-MS/MS spectra were resolved in MRMPROBS targeted analysis by tracing the specific product ions involved in acyl chain compositions. Our results indicated that the metabolite quantifications based on DIA-MS/MS chromatograms were somewhat inferior to the MS1-centric quantifications, while the annotation coverage outperformed those of the untargeted analysis of the data-dependent and DIA-MS/MS data. Consequently, integrated analyses of untargeted and targeted approaches are necessary to extract the maximum amount of metabolome information, and our results showcase the value of data repositories for the discovery of novel insights in lipid biology.
Open Access Article
Comparison of Bi- and Tri-Linear PLS Models for Variable Selection in Metabolomic Time-Series Experiments
Metabolites 2019, 9(5), 92; https://doi.org/10.3390/metabo9050092 - 09 May 2019
Cited by 1
Abstract
Metabolomic studies with a time-series design are widely used for discovery and validation of biomarkers. In such studies, changes of metabolic profiles over time under different conditions (e.g., control and intervention) are compared, and metabolites responding differently between the conditions are identified as putative biomarkers. To incorporate time-series information into the variable (biomarker) selection in partial least squares regression (PLS) models, we created PLS models with different combinations of bilinear/trilinear X and group/time response dummy Y. In total, five PLS models were evaluated on two real datasets, and also on simulated datasets with varying characteristics (number of subjects, number of variables, inter-individual variability, intra-individual variability and number of time points). Variables showing specific temporal patterns observed visually and determined statistically were labelled as discriminating variables. Bootstrapped-VIP scores were calculated for variable selection, and the variable selection performance of the five PLS models was assessed based on their capacity to correctly select the discriminating variables. The results showed that the bilinear PLS model with group × time response as dummy Y provided the highest recall (true positive rate) of 83–95% with high precision, independent of most characteristics of the datasets. Trilinear PLS models tend to select a small number of variables with high precision but a relatively high false negative rate (lower power). They are also less affected by noise than bilinear PLS models. In datasets with high inter-individual variability, bilinear PLS models tend to provide higher recall while trilinear models tend to provide higher precision. Overall, we recommend bilinear PLS with group × time response Y for variable selection applications in metabolomics intervention time-series studies.
Open Access Article
CFM-ID 3.0: Significantly Improved ESI-MS/MS Prediction and Compound Identification
Metabolites 2019, 9(4), 72; https://doi.org/10.3390/metabo9040072 - 13 Apr 2019
Cited by 6
Abstract
Metabolite identification for untargeted metabolomics is often hampered by the lack of experimentally collected reference spectra from tandem mass spectrometry (MS/MS). To circumvent this problem, Competitive Fragmentation Modeling-ID (CFM-ID) was developed to accurately predict electrospray ionization-MS/MS (ESI-MS/MS) spectra from chemical structures and to aid in compound identification via MS/MS spectral matching. While earlier versions of CFM-ID performed very well, CFM-ID’s performance for predicting the MS/MS spectra of certain classes of compounds, including many lipids, was quite poor. Furthermore, CFM-ID’s compound identification capabilities were limited because it did not use experimentally available MS/MS spectra, nor did it exploit metadata in its spectral matching algorithm. Here, we describe significant improvements to CFM-ID’s performance and speed. These include (1) the implementation of a rule-based fragmentation approach for lipid MS/MS spectral prediction, which greatly improves the speed and accuracy of CFM-ID; (2) the inclusion of experimental MS/MS spectra and other metadata to enhance CFM-ID’s compound identification abilities; (3) the development of new scoring functions that improve CFM-ID’s accuracy by 21.1%; and (4) the implementation of a chemical classification algorithm that correctly classifies unknown chemicals (based on their MS/MS spectra) in >80% of the cases. This improved version, called CFM-ID 3.0, is freely available as a web server. Its source code is also accessible online.
Open Access Feature Paper Article
MetaboAnalystR 2.0: From Raw Spectra to Biological Insights
Metabolites 2019, 9(3), 57; https://doi.org/10.3390/metabo9030057 - 22 Mar 2019
Cited by 6
Abstract
Global metabolomics based on high-resolution liquid chromatography mass spectrometry (LC-MS) has been increasingly employed in recent large-scale multi-omics studies. Processing and interpretation of these complex metabolomics datasets have become a key challenge in current computational metabolomics. Here, we introduce MetaboAnalystR 2.0 for comprehensive LC-MS data processing, statistical analysis, and functional interpretation. Compared to the previous version, this new release seamlessly integrates XCMS and CAMERA to support raw spectral processing and peak annotation, and also features high-performance implementations of mummichog and GSEA approaches for predictions of pathway activities. The application and utility of the MetaboAnalystR 2.0 workflow were demonstrated using a synthetic benchmark dataset and a clinical dataset. In summary, MetaboAnalystR 2.0 offers a unified and flexible workflow that enables end-to-end analysis of LC-MS metabolomics data within the open-source R environment.
Open Access Article
Annotating Nontargeted LC-HRMS/MS Data with Two Complementary Tandem Mass Spectral Libraries
Metabolites 2019, 9(1), 3; https://doi.org/10.3390/metabo9010003 - 23 Dec 2018
Cited by 3
Abstract
Tandem mass spectral databases are indispensable for fast and reliable compound identification in nontargeted analysis with liquid chromatography–high resolution tandem mass spectrometry (LC-HRMS/MS), which is applied to a wide range of scientific fields. While many articles now review and compare spectral libraries, in this manuscript we investigate two high-quality and specialized collections from our respective institutes, recorded on different instruments (quadrupole time-of-flight or QqTOF vs. Orbitrap). The optimal range of collision energies for spectral comparison was evaluated using 233 overlapping compounds between the two libraries, revealing that spectra in the range of CE 20–50 eV on the QqTOF and 30–60 nominal collision energy units on the Orbitrap provided optimal matching results for these libraries. Applications to complex samples from the respective institutes revealed that the libraries, combined with a simple data mining approach to retrieve all spectra with precursor and fragment information, could confirm many validated target identifications and yield several new Level 2a (spectral match) identifications. While the results presented are not surprising in many ways, this article adds new results to the debate on the comparability of Orbitrap and QqTOF data and the application of spectral libraries to yield rapid and high-confidence tentative identifications in complex human and environmental samples.
Open Access Article
Mind the Gap: Mapping Mass Spectral Databases in Genome-Scale Metabolic Networks Reveals Poorly Covered Areas
Metabolites 2018, 8(3), 51; https://doi.org/10.3390/metabo8030051 - 15 Sep 2018
Cited by 9
Abstract
The use of mass spectrometry-based metabolomics to study human, plant and microbial biochemistry and their interactions with the environment largely depends on the ability to annotate metabolite structures by matching mass spectral features of the measured metabolites to curated spectra of reference standards. While reference databases for metabolomics now provide information for hundreds of thousands of compounds, barely 5% of these known small molecules have experimental data from pure standards. Remarkably, it is still unknown how well existing mass spectral libraries cover the biochemical landscape of prokaryotic and eukaryotic organisms. To address this issue, we have investigated the coverage of 38 genome-scale metabolic networks by public and commercial mass spectral databases, and found that on average only 40% of nodes in metabolic networks could be mapped by mass spectral information from standards. Next, we deciphered computationally which parts of the human metabolic network are poorly covered by mass spectral libraries, revealing gaps in the eicosanoids, vitamins and bile acid metabolism. Finally, our network topology analysis based on the betweenness centrality of metabolites revealed the top 20 most important metabolites that, if added to MS databases, may facilitate human metabolome characterization in the future.
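As a rough illustration of the ranking step mentioned in the abstract above, the sketch below computes betweenness centrality on a small undirected, unweighted graph using Brandes' algorithm and sorts nodes by score. The toy network and its node labels are made up, and the paper's actual network construction, edge definitions, and software are not described here, so everything in the example is an assumption.

```python
from collections import deque


def betweenness(graph):
    """Betweenness centrality for an undirected, unweighted graph.

    graph: dict mapping each node to a list of its neighbours.
    Implements Brandes' algorithm; scores are unnormalized pair counts.
    """
    scores = {v: 0.0 for v in graph}
    for source in graph:
        order = []                        # nodes in order of discovery
        preds = {v: [] for v in graph}    # predecessors on shortest paths
        n_paths = {v: 0 for v in graph}   # number of shortest paths from source
        dist = {v: -1 for v in graph}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                      # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in graph}
        for w in reversed(order):         # accumulate from the farthest nodes back
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2 for v, s in scores.items()}


# Hypothetical linear pathway A - B - C - D - E.
toy_network = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
centrality = betweenness(toy_network)
ranked = sorted(centrality, key=centrality.get, reverse=True)
print(ranked[:3])
```

Nodes with the highest scores lie on the most shortest paths between other nodes, which is the sense in which a metabolite can be called topologically important in such an analysis.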

Review

Open Access Review
Metabolic Modeling of Human Gut Microbiota on a Genome Scale: An Overview
Metabolites 2019, 9(2), 22; https://doi.org/10.3390/metabo9020022 - 28 Jan 2019
Cited by 3
Abstract
There is growing interest in the metabolic interplay between the gut microbiome and host metabolism. Taxonomic and functional profiling of the gut microbiome by next-generation sequencing (NGS) has unveiled substantial richness and diversity. However, the mechanisms underlying interactions between diet, gut microbiome and host metabolism are still poorly understood. Genome-scale metabolic modeling (GSMM) is an emerging approach that has been increasingly applied to infer diet–microbiome, microbe–microbe and host–microbe interactions under physiological conditions. GSMM can, for example, be applied to estimate the metabolic capabilities of microbes in the gut. Here, we discuss how meta-omics datasets, such as shotgun metagenomics, can be processed and integrated to develop large-scale, condition-specific, personalized microbiota models in healthy and disease states. Furthermore, we summarize various tools and resources available for metagenomic data processing and GSMM, highlighting the experimental approaches needed to validate the model predictions.