Next Article in Journal
Anaphase A: Disassembling Microtubules Move Chromosomes toward Spindle Poles
Previous Article in Journal
Mechanisms of Chromosome Congression during Mitosis
Article Menu

Export Article

Open AccessArticle
Biology 2017, 6(1), 14; doi:10.3390/biology6010014

MetaGaAP: A Novel Pipeline to Estimate Community Composition and Abundance from Non-Model Sequence Data

School of Earth, Environmental and Biological Sciences, Queensland University of Technology, Brisbane City QLD 4000, Australia
Author to whom correspondence should be addressed.
Academic Editor: Jukka Finne
Received: 1 December 2016 / Revised: 6 January 2017 / Accepted: 7 February 2017 / Published: 17 February 2017
View Full-Text   |   Download PDF [1612 KB, uploaded 17 February 2017]   |  


Next generation sequencing and bioinformatic approaches are increasingly used to quantify microorganisms within populations by analysis of ‘meta-barcode’ data. This approach relies on comparison of amplicon sequences of ‘barcode’ regions from a population with public-domain databases of reference sequences. However, for many organisms relevant ‘barcode’ regions may not have been identified and large databases of reference sequences may not be available. A workflow and software pipeline, ‘MetaGaAP,’ was developed to identify and quantify genotypes through four steps: shotgun sequencing and identification of polymorphisms in a metapopulation to identify custom ‘barcode’ regions of less than 30 polymorphisms within the span of a single ‘read’, amplification and sequencing of the ‘barcode’, generation of a custom database of polymorphisms, and quantitation of the relative abundance of genotypes. The pipeline and workflow were validated in a ‘wild type’ Alphabaculovirus isolate, Helicoverpa armigera single nucleopolyhedrovirus (HaSNPV-AC53) and a tissue-culture derived strain (HaSNPV-AC53-T2). The approach was validated by comparison of polymorphisms in amplicons and shotgun data, and by comparison of predicted dominant and co-dominant genotypes with Sanger sequences. The computational power required to generate and search the database effectively limits the number of polymorphisms that can be included in a barcode to 30 or less. The approach can be used in quantitative analysis of the ecology and pathology of non-model organisms. View Full-Text
Keywords: bioinformatics; baculoviruses; metapopulation; meta-barcoding; MetaGaAP; HaSNPV-AC53; community analysis bioinformatics; baculoviruses; metapopulation; meta-barcoding; MetaGaAP; HaSNPV-AC53; community analysis

Figure 1

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. (CC BY 4.0).

Supplementary material

Scifeed alert for new publications

Never miss any articles matching your research from any publisher
  • Get alerts for new papers matching your research
  • Find out the new papers from selected authors
  • Updated daily for 49'000+ journals and 6000+ publishers
  • Define your Scifeed now

SciFeed Share & Cite This Article

MDPI and ACS Style

Noune, C.; Hauxwell, C. MetaGaAP: A Novel Pipeline to Estimate Community Composition and Abundance from Non-Model Sequence Data. Biology 2017, 6, 14.

Show more citation formats Show less citations formats

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers. See further details here.

Related Articles

Article Metrics

Article Access Statistics



[Return to top]
Biology EISSN 2079-7737 Published by MDPI AG, Basel, Switzerland RSS E-Mail Table of Contents Alert
Back to Top