Open Access Article
Metabolites 2016, 6(4), 40;

A Conversation on Data Mining Strategies in LC-MS Untargeted Metabolomics: Pre-Processing and Pre-Treatment Steps

Department of Biochemistry, University of Johannesburg, Auckland Park, Johannesburg 2006, South Africa
Drug Discovery and Development, Biosciences, CSIR, Pretoria 0001, South Africa
Author to whom correspondence should be addressed.
Academic Editor: Peter Karp
Received: 15 September 2016 / Revised: 27 October 2016 / Accepted: 27 October 2016 / Published: 3 November 2016
(This article belongs to the Special Issue Bioinformatics and Data Analysis)

Untargeted metabolomic studies generate information-rich, high-dimensional, and complex datasets that remain challenging to handle and fully exploit. Despite the remarkable progress in the development of tools and algorithms, the “exhaustive” extraction of information from these metabolomic datasets is still a non-trivial undertaking. A conversation on data mining strategies for maximal information extraction from metabolomic data is needed. Using a liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomic dataset, this study explored how collection parameters in the data pre-processing step, and scaling and data transformation in the pre-treatment step, influence the statistical models generated and the subsequent feature selection. Data obtained in positive mode from an LC-MS-based untargeted metabolomic study (sorghum plants responding dynamically to infection by a fungal pathogen) were used. Raw data were pre-processed with MarkerLynx™ software (Waters Corporation, Manchester, UK). Here, two parameters were varied: the intensity threshold (50–100 counts) and the mass tolerance (0.005–0.01 Da). After pre-processing, the datasets were imported into SIMCA (Umetrics, Umeå, Sweden) for further data cleaning and statistical modeling. In addition, different scaling (unit variance, Pareto, etc.) and data transformation (log and power) methods were explored. The results showed that the pre-processing parameters (or algorithms) influence the output dataset with regard to the number of defined features. Furthermore, the study demonstrates that the pre-treatment of data prior to statistical modeling affects the subspace approximation outcome: e.g., the amount of variation in X-data that the model can explain and predict. The pre-processing and pre-treatment steps subsequently influence the number of statistically significant extracted/selected features (variables). Thus, as informed by the results, to maximize the value of untargeted metabolomic data, an understanding of the data structures and an exploration of different algorithms and methods (at different steps of the data analysis pipeline) might currently be the best trade-off, and possibly an epistemological imperative.
Keywords: chemometrics; data mining; metabolomics; pre-processing; pre-treatment; scaling; transformation
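To make the pre-treatment step named in the abstract more concrete, the following is a minimal sketch of unit variance scaling, Pareto scaling, and log and power transformation, followed by a rough measure of how much variation in X the first two principal components capture. It is not the SIMCA implementation: the function names, the synthetic log-normal intensity matrix, the `offset` of 1 in the log transform, and the square root as the power transform are all assumptions made for illustration only.

```python
import numpy as np


def center(X):
    """Mean-center each feature (column)."""
    return X - X.mean(axis=0)


def uv_scale(X):
    """Unit variance scaling: mean-center, then divide by the column SD."""
    return center(X) / X.std(axis=0, ddof=1)


def pareto_scale(X):
    """Pareto scaling: mean-center, then divide by the square root of the column SD."""
    return center(X) / np.sqrt(X.std(axis=0, ddof=1))


def log_transform(X, offset=1.0):
    """Log10 transform; the offset (an assumed value) avoids log(0)."""
    return np.log10(X + offset)


def power_transform(X):
    """Power transform, taken here (as an assumption) to be the square root."""
    return np.sqrt(X)


def explained_variation(X, n_components=2):
    """Fraction of the centered sum of squares captured by the first PCs."""
    s = np.linalg.svd(center(X), compute_uv=False)
    return float((s[:n_components] ** 2).sum() / (s ** 2).sum())


# Synthetic stand-in for a samples x features intensity matrix (not real data).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=5.0, sigma=1.5, size=(12, 30))

results = {
    "centered only": explained_variation(center(X)),
    "unit variance": explained_variation(uv_scale(X)),
    "Pareto": explained_variation(pareto_scale(X)),
    "log10 + Pareto": explained_variation(pareto_scale(log_transform(X))),
    "sqrt + Pareto": explained_variation(pareto_scale(power_transform(X))),
}

for name, fraction in results.items():
    print(f"{name:>15}: {fraction:.3f}")
```

Running the sketch prints one explained-variation fraction per pre-treatment. On real data the values would differ from this synthetic example, but comparing them side by side is one simple way to see that the choice of scaling and transformation changes what a subspace model can explain.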

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Tugizimana, F.; Steenkamp, P.A.; Piater, L.A.; Dubery, I.A. A Conversation on Data Mining Strategies in LC-MS Untargeted Metabolomics: Pre-Processing and Pre-Treatment Steps. Metabolites 2016, 6, 40.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Metabolites, EISSN 2218-1989. Published by MDPI AG, Basel, Switzerland.