Open Access Article
Data 2016, 1(3), 19; doi:10.3390/data1030019

Application of Taxonomic Modeling to Microbiota Data Mining for Detection of Helminth Infection in Global Populations

1 Department of Computer Science, University of Pittsburgh, 6135 Sennott Square, 210 S Bouquet St, Pittsburgh, PA 15260-9161, USA
2 Department of Medicine, Washington University School of Medicine, 660 S Euclid Ave, St. Louis, MO 63110, USA
3 Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Suite 500, Pittsburgh, PA 15206-3701, USA
* Author to whom correspondence should be addressed.
Academic Editor: Pufeng Du
Received: 28 September 2016 / Revised: 5 December 2016 / Accepted: 6 December 2016 / Published: 13 December 2016
(This article belongs to the Special Issue Biomedical Informatics)

Abstract

Human microbiome data from genomic sequencing technologies are accumulating rapidly, giving us insights into the bacterial taxa that contribute to health and disease. Predictive modeling of such microbiota count data to classify human infection by parasitic worms, such as helminths, can help in detection and management across global populations. Real-world microbiome datasets are typically sparse: they contain hundreds of measurements for bacterial species, of which only a few are detected in the bio-specimens analyzed. This sparsity means that more observations are needed for accurate predictive modeling, a challenge that has previously been addressed with various feature reduction methods. To our knowledge, integrative methods such as transfer learning have not yet been explored in the microbiome domain as a way to deal with data sparsity by incorporating knowledge from different but related datasets. One way of incorporating this knowledge is through a meaningful mapping among the features of these datasets. In this paper, we claim that such a mapping exists among the members of each individual cluster, where clusters are formed based on the phylogenetic dependency among taxa and their association with the phenotype. We validate this claim by showing that models incorporating associations in such a grouped feature space show no performance deterioration on the given classification task. We test our hypothesis using classification models that detect helminth infection in the microbiota of human fecal samples obtained from Indonesia and Liberia. In our experiments, we first learn binary classifiers for helminth infection detection using Naive Bayes, Support Vector Machines, Multilayer Perceptrons, and Random Forest methods. In the next step, we add taxonomic modeling by using the SMART-scan module to group the data, and learn classifiers with the same four methods to test the validity of the resulting groupings. We observed performance improvements of 6% to 23% and 7% to 26% in the Area Under the receiver operating characteristic (ROC) Curve (AUC) and Balanced Accuracy (Bacc) measures, respectively, over 10 runs of 10-fold cross-validation. These results show that using phylogenetic dependency to group our microbiota data yields a noticeable improvement in classification performance for helminth infection detection. The promising results of this feasibility study demonstrate that methods such as SMART-scan can be used in the future for knowledge transfer from different but related microbiome datasets via phylogenetically related functional mapping, enabling novel integrative biomarker discovery.
Keywords: helminth infection; microbiota; 16S rRNA gene; taxonomic tree; classification; SMART-scan method; transfer learning
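
The abstract describes two ideas that a small example may make more concrete: grouping sparse taxon counts by shared ancestry in the taxonomic tree, and scoring a binary classifier with AUC and Balanced Accuracy. The sketch below is only an illustration under stated assumptions. It is not the SMART-scan method, which also uses the association with the phenotype; it simply sums counts under a common ancestor. The lineage table, taxon names, labels, and scores are all hypothetical placeholders, and the function names `roll_up`, `balanced_accuracy`, and `auc` are introduced here for illustration.

```python
from collections import defaultdict

# Hypothetical lineage table mapping each species-level feature to its
# (family, genus). The names are placeholders, not taxa from the study.
LINEAGE = {
    "species_a1": ("family_A", "genus_A"),
    "species_a2": ("family_A", "genus_A"),
    "species_b1": ("family_B", "genus_B"),
}


def roll_up(sample_counts, lineage, rank_index):
    """Sum the counts of all taxa that share an ancestor at the chosen rank."""
    grouped = defaultdict(int)
    for taxon, count in sample_counts.items():
        grouped[lineage[taxon][rank_index]] += count
    return dict(grouped)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; assumes both classes are present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy sample: sparse species counts collapsed to genus level (rank_index=1).
sample = {"species_a1": 3, "species_a2": 0, "species_b1": 5}
genus_counts = roll_up(sample, LINEAGE, rank_index=1)

# Toy labels and classifier outputs, invented for illustration only.
labels = [1, 1, 0, 0]
bacc = balanced_accuracy(labels, [1, 0, 0, 0])
roc_auc = auc(labels, [0.9, 0.4, 0.5, 0.1])
```

Collapsing counts to a higher rank reduces the number of features and the share of zero entries, which is one plausible reason a grouped feature space could help on sparse data. Balanced Accuracy averages the per-class recall, so it is less sensitive to class imbalance than plain accuracy.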
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


MDPI and ACS Style

Eshaghzadeh Torbati, M.; Mitreva, M.; Gopalakrishnan, V. Application of Taxonomic Modeling to Microbiota Data Mining for Detection of Helminth Infection in Global Populations. Data 2016, 1, 19.

Data, EISSN 2306-5729. Published by MDPI AG, Basel, Switzerland.