Special Issue "Bioinformatics and Data Analysis"

A special issue of Metabolites (ISSN 2218-1989).

Deadline for manuscript submissions: closed (31 August 2016)

Special Issue Editor

Guest Editor
Dr. Peter D. Karp

Director, Bioinformatics Research Group, SRI International, AE206, 333 Ravenswood Ave, Menlo Park, CA 94025, USA
Phone: 650-859-4358
Interests: metabolic pathway bioinformatics, computational genomics, database integration, biological databases, pathway reconstruction, metabolic modeling, model organism databases

Special Issue Information

Dear Colleagues,

Bioinformatics analysis methods for metabolomics data have undergone considerable improvements in the last decade, and these methods have a strong effect on both the speed and the accuracy of metabolomics studies. Still, it seems unlikely that metabolomics investigations are extracting all potential knowledge from their collected data, and the development of improved bioinformatics methods for analyzing metabolomics data is needed. This Special Issue is devoted to computational techniques for analyzing metabolomics data. Topics covered by this Special Issue include, but are not limited to: statistical methods for analyzing metabolomics samples, metabolite structure identification, visualization of metabolomics data, pathway-based data analysis, metabolomics and metabolic modeling, and metabolomics-related databases.

Dr. Peter D. Karp
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Metabolites is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 850 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • metabolomics data analysis
  • metabolite identification
  • computational metabolomics

Published Papers (7 papers)

Research

Open Access Article: A Conversation on Data Mining Strategies in LC-MS Untargeted Metabolomics: Pre-Processing and Pre-Treatment Steps
Metabolites 2016, 6(4), 40; doi:10.3390/metabo6040040
Received: 15 September 2016 / Revised: 27 October 2016 / Accepted: 27 October 2016 / Published: 3 November 2016
Abstract
Untargeted metabolomic studies generate information-rich, high-dimensional, and complex datasets that remain challenging to handle and fully exploit. Despite the remarkable progress in the development of tools and algorithms, the “exhaustive” extraction of information from these metabolomic datasets is still a non-trivial undertaking. A conversation on data mining strategies for a maximal information extraction from metabolomic data is needed. Using a liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomic dataset, this study explored the influence of collection parameters in the data pre-processing step, scaling and data transformation on the statistical models generated, and feature selection, thereafter. Data obtained in positive mode generated from an LC-MS-based untargeted metabolomic study (sorghum plants responding dynamically to infection by a fungal pathogen) were used. Raw data were pre-processed with MarkerLynx™ software (Waters Corporation, Manchester, UK). Here, two parameters were varied: the intensity threshold (50–100 counts) and the mass tolerance (0.005–0.01 Da). After the pre-processing, the datasets were imported into SIMCA (Umetrics, Umeå, Sweden) for more data cleaning and statistical modeling. In addition, different scaling (unit variance, Pareto, etc.) and data transformation (log and power) methods were explored. The results showed that the pre-processing parameters (or algorithms) influence the output dataset with regard to the number of defined features. Furthermore, the study demonstrates that the pre-treatment of data prior to statistical modeling affects the subspace approximation outcome: e.g., the amount of variation in X-data that the model can explain and predict. The pre-processing and pre-treatment steps subsequently influence the number of statistically significant extracted/selected features (variables). Thus, as informed by the results, to maximize the value of untargeted metabolomic data, understanding of the data structures and exploration of different algorithms and methods (at different steps of the data analysis pipeline) might be the best trade-off, currently, and possibly an epistemological imperative.
(This article belongs to the Special Issue Bioinformatics and Data Analysis)
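The scaling and transformation options named in the abstract above (unit variance, Pareto, log) can be illustrated with a minimal Python sketch. This is not the SIMCA implementation; the function names, the `offset` parameter, and the use of the sample standard deviation are assumptions made for illustration only.

```python
import math
import statistics


def scale_feature(values, method="uv"):
    """Mean-center one feature (one column of the data matrix) and scale it.

    "uv"     : unit-variance scaling, divide by the standard deviation.
    "pareto" : Pareto scaling, divide by the square root of the standard deviation.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    if sd == 0:
        # A constant feature carries no variation; return zeros after centering.
        return [0.0 for _ in values]
    if method == "uv":
        divisor = sd
    elif method == "pareto":
        divisor = math.sqrt(sd)
    else:
        raise ValueError(f"unknown scaling method: {method}")
    return [(v - mean) / divisor for v in values]


def log_transform(values, offset=1.0):
    """Log10-transform intensities; the offset (hypothetical default) avoids log(0)."""
    return [math.log10(v + offset) for v in values]


if __name__ == "__main__":
    intensities = [100.0, 200.0, 300.0, 400.0]
    print(scale_feature(intensities, "uv"))
    print(scale_feature(intensities, "pareto"))
    print(log_transform(intensities))
```

Unit-variance scaling gives every feature equal weight, whereas Pareto scaling shrinks large features less aggressively, so high-intensity peaks keep somewhat more influence on the model.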

Open Access Article: MetMatch: A Semi-Automated Software Tool for the Comparison and Alignment of LC-HRMS Data from Different Metabolomics Experiments
Metabolites 2016, 6(4), 39; doi:10.3390/metabo6040039
Received: 30 August 2016 / Revised: 27 October 2016 / Accepted: 28 October 2016 / Published: 2 November 2016
Abstract
Due to its unsurpassed sensitivity and selectivity, LC-HRMS is one of the major analytical techniques in metabolomics research. However, limited stability of experimental and instrument parameters may cause shifts and drifts of retention time and mass accuracy or the formation of different ion species, thus complicating conclusive interpretation of the raw data, especially when generated in different analytical batches. Here, a novel software tool for the semi-automated alignment of different measurement sequences is presented. The tool is implemented in the Java programming language and features an intuitive user interface; its main goal is to facilitate the comparison of data obtained from different metabolomics experiments. Based on a feature list (i.e., processed LC-HRMS chromatograms with mass-to-charge ratio (m/z) values and retention times) that serves as a reference, the tool recognizes both m/z and retention time shifts of single or multiple analytical datafiles/batches of interest. MetMatch is also designed to account for differently formed ion species of detected metabolites. Corresponding ions and metabolites are matched and chromatographic peak areas, m/z values and retention times are combined into a single data matrix. The convenient user interface allows for easy manipulation of processing results and graphical illustration of the raw data as well as the automatically matched ions and metabolites. The software tool is exemplified with LC-HRMS data from untargeted metabolomics experiments investigating phenylalanine-derived metabolites in wheat and T-2 toxin/HT-2 toxin detoxification products in barley.
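The idea of matching features against a reference list by m/z and retention time can be sketched as follows. This is not MetMatch's algorithm (MetMatch is a Java tool whose internals are not described here), and it does not model shift correction or ion species; the tuple layout, the function names, and the tolerance defaults (5 ppm, 0.2 min) are hypothetical values chosen for illustration.

```python
def ppm_error(mz, ref_mz):
    """Signed mass error of mz relative to ref_mz, in parts per million."""
    return (mz - ref_mz) / ref_mz * 1e6


def match_features(reference, features, ppm_tol=5.0, rt_tol=0.2):
    """Pair each reference feature with the closest candidate in another batch.

    reference, features: lists of (feature_id, mz, retention_time) tuples.
    A candidate qualifies if it lies within ppm_tol (ppm) and rt_tol (minutes)
    of the reference; among qualifying candidates the one with the smallest
    retention-time difference wins. Unmatched references map to None.
    """
    matches = []
    for ref_id, ref_mz, ref_rt in reference:
        best_id = None
        best_rt_diff = None
        for feat_id, mz, rt in features:
            rt_diff = abs(rt - ref_rt)
            if abs(ppm_error(mz, ref_mz)) <= ppm_tol and rt_diff <= rt_tol:
                if best_rt_diff is None or rt_diff < best_rt_diff:
                    best_id, best_rt_diff = feat_id, rt_diff
        matches.append((ref_id, best_id))
    return matches


if __name__ == "__main__":
    reference = [("R1", 166.0863, 5.00), ("R2", 300.0000, 8.00)]
    batch = [("F1", 166.0866, 5.10), ("F2", 166.0863, 6.00), ("F3", 300.0100, 8.00)]
    print(match_features(reference, batch))
```

In this toy input, F2 has the right mass but elutes a full minute late, and F3 elutes on time but is roughly 33 ppm off, so only F1 qualifies as a match.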

Open Access Article: Partial Least Squares with Structured Output for Modelling the Metabolomics Data Obtained from Complex Experimental Designs: A Study into the Y-Block Coding
Metabolites 2016, 6(4), 38; doi:10.3390/metabo6040038
Received: 31 August 2016 / Revised: 20 October 2016 / Accepted: 24 October 2016 / Published: 28 October 2016
Abstract
Partial least squares (PLS) is one of the most commonly used supervised modelling approaches for analysing multivariate metabolomics data. PLS is typically employed as either a regression model (PLS-R) or a classification model (PLS-DA). However, in metabolomics studies it is common to investigate multiple, potentially interacting, factors simultaneously following a specific experimental design. Such data often cannot be considered as a “pure” regression or a classification problem. Nevertheless, these data have often still been treated as a regression or classification problem and this could lead to ambiguous results. In this study, we investigated the feasibility of designing a hybrid target matrix Y that better reflects the experimental design than simple regression or binary class membership coding commonly used in PLS modelling. The new design of Y coding was based on the same principle used by structural modelling in machine learning techniques. Two real metabolomics datasets were used as examples to illustrate how the new Y coding can improve the interpretability of the PLS model compared to classic regression/classification coding.

Open Access Feature Paper Article: Prediction, Detection, and Validation of Isotope Clusters in Mass Spectrometry Data
Metabolites 2016, 6(4), 37; doi:10.3390/metabo6040037
Received: 31 August 2016 / Revised: 29 September 2016 / Accepted: 14 October 2016 / Published: 20 October 2016
Abstract
Mass spectrometry is a key analytical platform for metabolomics. The precise quantification and identification of small molecules is a prerequisite for elucidating metabolism, and the detection, validation, and evaluation of isotope clusters in LC-MS data are important for this task. Here, we present an approach for the improved detection of isotope clusters using chemical prior knowledge and the validation of detected isotope clusters depending on the substance mass using database statistics. We find remarkable improvements regarding the number of detected isotope clusters and are able to predict the correct molecular formula in the top three ranks in 92% of the cases. We make our methodology freely available as part of the Bioconductor packages xcms version 1.50.0 and CAMERA version 1.30.0.
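As a rough illustration of what detecting an isotope cluster involves, the Python sketch below groups peaks whose m/z values are spaced by the 13C/12C mass difference divided by the charge. It is not the method implemented in xcms or CAMERA: it ignores intensities, chemical prior knowledge, and database statistics, and the function name and the `tol` default are assumptions.

```python
# Mass difference between 13C and 12C in Da.
C13_C12_DELTA = 1.0033548


def find_isotope_clusters(mzs, charge=1, tol=0.002):
    """Group peaks into chains spaced by C13_C12_DELTA / charge.

    mzs    : iterable of peak m/z values from one spectrum.
    charge : assumed charge state of the ions.
    tol    : allowed deviation from the expected spacing in Da (hypothetical).
    Returns a list of clusters, each a list of at least two m/z values.
    """
    peaks = sorted(mzs)
    spacing = C13_C12_DELTA / charge
    used = set()
    clusters = []
    for i, mz in enumerate(peaks):
        if i in used:
            continue
        used.add(i)
        cluster = [mz]
        last = mz
        for j in range(i + 1, len(peaks)):
            if j in used:
                continue
            diff = peaks[j] - last
            if abs(diff - spacing) <= tol:
                # Peak j extends the chain by one isotope step.
                cluster.append(peaks[j])
                used.add(j)
                last = peaks[j]
            elif diff > spacing + tol:
                # Peaks are sorted, so no later peak can extend this chain.
                break
        if len(cluster) > 1:
            clusters.append(cluster)
    return clusters


if __name__ == "__main__":
    print(find_isotope_clusters([181.0707, 182.0741, 183.0774, 250.1000]))
```

A realistic detector would also check that the relative intensities of the M+1 and M+2 peaks are plausible for the candidate mass, which is where the chemical prior knowledge described in the abstract comes in.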

Open Access Article: Development of Database Assisted Structure Identification (DASI) Methods for Nontargeted Metabolomics
Metabolites 2016, 6(2), 17; doi:10.3390/metabo6020017
Received: 17 April 2016 / Revised: 26 May 2016 / Accepted: 27 May 2016 / Published: 31 May 2016
Abstract
Metabolite structure identification remains a significant challenge in nontargeted metabolomics research. One commonly used strategy relies on searching biochemical databases using exact mass. However, this approach fails when the database does not contain the unknown metabolite (i.e., for unknown-unknowns). For these cases, constrained structure generation with combinatorial structure generators provides a potential option. Here we evaluated structure generation constraints based on the specification of: (1) substructures required (i.e., seed structures); (2) substructures not allowed; and (3) filters to remove incorrect structures. Our approach (database assisted structure identification, DASI) used predictive models in MolFind to find candidate structures with chemical and physical properties similar to the unknown. These candidates were then used for seed structure generation using eight different structure generation algorithms. One algorithm was able to generate correct seed structures for 21/39 test compounds. Eleven of these seed structures were large enough to constrain the combinatorial structure generator to fewer than 100,000 structures. In 35/39 cases, at least one algorithm was able to generate a correct seed structure. The DASI method has several limitations and will require further experimental validation and optimization. At present, it seems most useful for identifying the structure of unknown-unknowns with molecular weights <200 Da.

Open Access Article: Analysis of Metabolomics Datasets with High-Performance Computing and Metabolite Atlases
Metabolites 2015, 5(3), 431-442; doi:10.3390/metabo5030431
Received: 13 April 2015 / Revised: 7 July 2015 / Accepted: 13 July 2015 / Published: 20 July 2015
Abstract
Even with the widespread use of liquid chromatography mass spectrometry (LC/MS) based metabolomics, there are still a number of challenges facing this promising technique. Many diverse experimental workflows exist, yet there is a lack of infrastructure and systems for tracking and sharing of information. Here, we describe the Metabolite Atlas framework and interface that provides highly efficient, web-based access to raw mass spectrometry data in concert with assertions about chemicals detected to help address some of these challenges. This integration, by design, enables experimentalists to explore their raw data and to specify and refine feature annotations such that they can be leveraged for future experiments. Fast queries of the data through the web using SciDB, a parallelized database for high performance computing, make this process operate quickly. By using scripting containers, such as IPython or Jupyter, to analyze the data, scientists can utilize a wide variety of freely available graphing, statistics, and information management resources. In addition, the interfaces facilitate integration with systems biology tools to ultimately link metabolomics data with biological models.

Open Access Article: Computational Metabolomics Operations at BioCyc.org
Metabolites 2015, 5(2), 291-310; doi:10.3390/metabo5020291
Received: 20 November 2014 / Revised: 28 February 2015 / Accepted: 30 March 2015 / Published: 22 May 2015
Abstract
BioCyc.org is a genome and metabolic pathway web portal covering 5500 organisms, including Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae and Escherichia coli. These organism-specific databases have undergone variable degrees of curation. The EcoCyc (Escherichia coli Encyclopedia) database is the most highly curated; its contents have been derived from 27,000 publications. The MetaCyc (Metabolic Encyclopedia) database within BioCyc is a “universal” metabolic database that describes pathways, reactions, enzymes and metabolites from all domains of life. Metabolic pathways provide an organizing framework for analyzing metabolomics data, and the BioCyc website provides computational operations for metabolomics data that include metabolite search and translation of metabolite identifiers across multiple metabolite databases. The site allows researchers to store and manipulate metabolite lists using a facility called SmartTables, which supports metabolite enrichment analysis. That analysis operation identifies metabolite sets that are statistically over-represented for the substrates of specific metabolic pathways. BioCyc also enables visualization of metabolomics data on individual pathway diagrams and on the organism-specific metabolic map diagrams that are available for every BioCyc organism. Most of these operations are available both interactively and as programmatic web services.
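Over-representation of a pathway's substrates in a metabolite list is commonly scored with a one-sided hypergeometric test. The Python sketch below shows that calculation; the abstract does not state which statistic BioCyc uses, so the choice of test, the function name, and the parameter names are assumptions rather than a description of the SmartTables implementation.

```python
from math import comb


def enrichment_p_value(hits, list_size, pathway_size, universe_size):
    """One-sided hypergeometric p-value, P(X >= hits).

    hits          : metabolites in the user's list that belong to the pathway.
    list_size     : total metabolites in the user's list.
    pathway_size  : metabolites in the universe that belong to the pathway.
    universe_size : all metabolites considered (e.g., all in the organism database).
    """
    total = comb(universe_size, list_size)
    upper = min(list_size, pathway_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 2 of 3 listed metabolites fall in a 4-member pathway,
    # drawn from a universe of 10 metabolites.
    print(enrichment_p_value(2, 3, 4, 10))
```

When many pathways are tested at once, the resulting p-values would normally be adjusted for multiple testing before being interpreted.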