Technical Note

phylotaR: An Automated Pipeline for Retrieving Orthologous DNA Sequences from GenBank in R

Gothenburg Global Biodiversity Centre, Box 461, SE-405 30 Gothenburg, Sweden
Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE-405 30 Gothenburg, Sweden
Naturalis Biodiversity Center, P.O. Box 9517, 2300 RA Leiden, The Netherlands
Gothenburg Botanical Garden, Carl Skottsbergsgata 22A, SE-413 19 Gothenburg, Sweden
Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford St., Cambridge, MA 02138 USA
Author to whom correspondence should be addressed.
Received: 28 March 2018 / Revised: 26 May 2018 / Accepted: 1 June 2018 / Published: 5 June 2018
(This article belongs to the Special Issue Open Science Phyloinformatics: Resources, Methods, and Analyses)
The exceptional increase in molecular DNA sequence data in open repositories is mirrored by an ever-growing interest among evolutionary biologists in harvesting and using those data for phylogenetic inference. Many quality issues are known, however, and the sheer amount and complexity of the available data can pose considerable barriers to their usefulness. A key issue in this domain is the high frequency of sequence mislabeling encountered when searching for suitable sequences for phylogenetic analysis. These issues include, among others, the incorrect identification of sequenced species, non-standardized and ambiguous sequence annotation, and the inadvertent addition of paralogous sequences by users. Taken together, these issues likely add considerable noise, error or bias to phylogenetic inference, a risk that is likely to increase with the size of phylogenies or of the molecular datasets used to generate them. Here we present a software package, phylotaR, that bypasses the above issues by instead using an alignment search tool to identify orthologous sequences. Our package builds on the framework of its predecessor, PhyLoTa, by providing a modular pipeline for identifying overlapping sequence clusters using up-to-date GenBank data, together with new features, improvements and tools. We demonstrate and test our pipeline’s effectiveness by presenting trees generated from phylotaR clusters for two large taxonomic clades: palms and primates. Given the versatility of this package, we hope that it will become a standard tool for any research aiming to use GenBank data for phylogenetic analysis.
Keywords: BLAST; DNA; open source; phylogenetics; R; sequence orthology

MDPI and ACS Style

Bennett, D.J.; Hettling, H.; Silvestro, D.; Zizka, A.; Bacon, C.D.; Faurby, S.; Vos, R.A.; Antonelli, A. phylotaR: An Automated Pipeline for Retrieving Orthologous DNA Sequences from GenBank in R. Life 2018, 8, 20.

AMA Style

Bennett DJ, Hettling H, Silvestro D, Zizka A, Bacon CD, Faurby S, Vos RA, Antonelli A. phylotaR: An Automated Pipeline for Retrieving Orthologous DNA Sequences from GenBank in R. Life. 2018; 8(2):20.

Chicago/Turabian Style

Bennett, Dominic J., Hannes Hettling, Daniele Silvestro, Alexander Zizka, Christine D. Bacon, Søren Faurby, Rutger A. Vos, and Alexandre Antonelli. 2018. "phylotaR: An Automated Pipeline for Retrieving Orthologous DNA Sequences from GenBank in R" Life 8, no. 2: 20.

