Article
Peer-Review Record

Phylogenetic Signal of Indels and the Neoavian Radiation

Diversity 2019, 11(7), 108; https://doi.org/10.3390/d11070108
by Peter Houde 1,*, Edward L. Braun 2,*, Nitish Narula 1, Uriel Minjares 1 and Siavash Mirarab 3,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 3 June 2019 / Revised: 3 July 2019 / Accepted: 4 July 2019 / Published: 6 July 2019
(This article belongs to the Special Issue Genomic Analyses of Avian Evolution)

Round 1

Reviewer 1 Report

Review for Houde et al. Phylogenetic signal of indels and the neoavian radiation. Diversity.

 

Houde et al. Phylogenetic signal of indels and the neoavian radiation represents a needed contribution to the literature on the deep phylogenetic relationships among living birds. The combination of indel and nucleotide sequence data represents significant progress in testing for hard polytomies among the ordinal branches of the avian tree using genomic data. I would recommend publishing with minor revisions.

 

I have relatively few comments to contribute to this already excellent manuscript. Chief among these, however, concerns indel size variation and the effect of considering all indels, regardless of size, in a single analysis. The authors indicate that homoplasy is negatively correlated with indel length but that the most inclusive indel datasets still exhibit a much lower incidence of homoplasy than nucleotide sequence data (page 5, lines 146-152; page 7, lines 222-223; Figure 3). Additionally, the authors note that Han et al. found that 73% of insertions >40bp were transposable elements (page 7, lines 230-231). The authors also acknowledge the heterogeneous nature of the underlying mutational mechanisms of indels and, compared to nucleotide sequences, the lack of robust models to describe their evolution (page 4, lines 102-122; page 5, lines 139-146; page 7, lines 227-229). What’s more, the authors point to the observation that there is extensive conflict among trees generated from transposable elements (page 21; lines 547-552). The question, therefore, is whether most of the phylogenetic signal of the indels in this study comes from the smaller indels. Since the authors combine all indels, it’s difficult to say. The authors identified nearly 24,000 informative indels >50bp. Even though they suggest that larger indels are less desirable because there are fewer of them (page 5, lines 148-152), I would think a separate analysis of these nearly 24,000 informative indels (many of which are likely TEs) would be useful, if for no other reason than to see if they mirror the results of other TE-specific analyses. Additionally, a major concern the authors rightfully emphasize is the lack of clear mutational models for indel-based analyses (page 4, lines 114-118; page 24, lines 621-622). However, short and long indels are likely under very different mutational mechanisms (e.g., slippage at short tandem repeats for many small indels and TEs for many long indels) and thus may fit very different models. Additional analyses that split short and long indels (perhaps at 50bp) would be useful to determine whether different-sized indels, which presumably evolve by very different mechanisms, contribute different phylogenetic signals. Whether that additional analysis should be in this paper or another may be a matter of debate, but at least more discussion should be given to the topic, perhaps to set the stage for a future follow-up paper.

 

Below are some additional very minor comments:

 

Page 4, lines 113-114: Perhaps this sentence would better read, “Assuming that one accepts indels are useful characters…”

Page 6, line 179: Simmons and Ochoterena 2000 specifically refer to this as “indel coding” not “gap coding”.

 

Page 6, line 191: I’m assuming BS refers to “bootstrapped”. I would make certain all these abbreviations are defined in the paper the first time they are mentioned.

 

Page 6, TABLE 1: The first line of table 1 says there were 2,515 introns in the study while the text mentions 2,516 introns (page 6, line 178). Also I think there is space on the table to spell out “informative” rather than using the abbreviation “Inf”.

 

Page 16, FIGURE 9: I think given the authors have a clear legend associating shades of red with the number of branches it is not necessary to also include those numbers on every cell of the heat map in Figure 9b.

 

Page 21, line 529: A grammatical suggestion would be to change the parenthetical part of this sentence to “(i.e. the MP-EST* TENT binned loci and included exons)”.

 

Page 22, line 557: The sentence “We believe that our analyses say quite a bit.” is unnecessary.


Author Response

We thank the reviewer very much for the careful, thoughtful, and quick review.


Reviewer: "Houde et al. Phylogenetic signal of indels and the neoavian radiation represents a needed contribution to the literature on the deep phylogenetic relationships among living birds. The combination of indel and nucleotide sequence data represents significant progress in testing for hard polytomies among the ordinal branches of the avian tree using genomic data. I would recommend publishing with minor revisions.

 

I have relatively few comments to contribute to this already excellent manuscript. Chief among these, however, concerns indel size variation and the effect of considering all indels, regardless of size, in a single analysis. The authors indicate that homoplasy is negatively correlated with indel length but that the most inclusive indel datasets still exhibit a much lower incidence of homoplasy than nucleotide sequence data (page 5, lines 146-152; page 7, lines 222-223; Figure 3). Additionally, the authors note that Han et al. found that 73% of insertions >40bp were transposable elements (page 7, lines 230-231). The authors also acknowledge the heterogeneous nature of the underlying mutational mechanisms of indels and, compared to nucleotide sequences, the lack of robust models to describe their evolution (page 4, lines 102-122; page 5, lines 139-146; page 7, lines 227-229). What’s more, the authors point to the observation that there is extensive conflict among trees generated from transposable elements (page 21; lines 547-552). The question, therefore, is whether most of the phylogenetic signal of the indels in this study comes from the smaller indels. Since the authors combine all indels, it’s difficult to say. The authors identified nearly 24,000 informative indels >50bp. Even though they suggest that larger indels are less desirable because there are fewer of them (page 5, lines 148-152), I would think a separate analysis of these nearly 24,000 informative indels (many of which are likely TEs) would be useful, if for no other reason than to see if they mirror the results of other TE-specific analyses. Additionally, a major concern the authors rightfully emphasize is the lack of clear mutational models for indel-based analyses (page 4, lines 114-118; page 24, lines 621-622). However, short and long indels are likely under very different mutational mechanisms (e.g., slippage at short tandem repeats for many small indels and TEs for many long indels) and thus may fit very different models. Additional analyses that split short and long indels (perhaps at 50bp) would be useful to determine whether different-sized indels, which presumably evolve by very different mechanisms, contribute different phylogenetic signals. Whether that additional analysis should be in this paper or another may be a matter of debate, but at least more discussion should be given to the topic, perhaps to set the stage for a future follow-up paper."

Response: Of course, we are quite interested in the differences in both the properties and the phylogenetic signal of indels of different size classes. This is part of our ongoing work, but identifying the causes and modeling them represent a collection of complex issues that is beyond the scope of this study and on which we are not yet ready to report. We would add to the reviewer’s observations that even the shortest of indels appear to result from multiple mutational mechanisms. Some of them occur within regions of very low sequence complexity that are virtually impossible to align confidently, often near the 3’ terminus of introns. These are expected to exhibit both high levels of true homoplasy due to recurrent replication slippage and scoring errors due to inaccurate alignment. Other short indels with empirically high parsimony consistency occur at low density within unambiguously aligned regions of relatively high sequence complexity. These differences need to be characterized based on either sequence complexity or alignment consistency so that they can be modeled and/or filtered to provide a larger dataset of parsimony-consistent indel characters to augment the meager dataset of large indels.

That being said, we have added a new section, figure, and table to the Supplemental Materials that address some of these issues and drive home the point that there are simply too few long indels to provide meaningful results, especially in a coalescent framework. We report the results of maximum likelihood analyses on concatenated indels of increasingly larger size classes (i.e., less-inclusive datasets). We show that the larger indels present per intron are far too few to resolve nodes in the gene trees required for coalescent analysis, and that bootstrap support decreases precipitously even in maximum likelihood analysis of the concatenated data. While there may seem to be a large number of indels >50bp in length, the majority are parsimony uninformative, they are divided among a large number of loci, and most occur on long branches. To this we would merely add that gene tree conflict of both short and long indels is similarly high to that of published TEs, as shown in the bar graphs of Figure 7 and Figs. S9-S12.
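The size-class tabulation described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Indel` record, the function names, and the use of an inclusive minimum-length cutoff are all assumptions made for illustration. It treats each indel as a binary presence/absence character across taxa and counts, for each minimum length, how many characters are parsimony informative (each of the two states observed in at least two taxa) and how many such characters remain per locus.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Indel(NamedTuple):
    """Hypothetical record for one binary-coded indel character."""
    locus: str    # name of the intron or other locus carrying the indel
    length: int   # indel length in bp
    states: str   # one symbol per taxon: '1' present, '0' absent, '?' missing


def is_parsimony_informative(states: str) -> bool:
    """True if both states '0' and '1' each occur in at least two taxa."""
    counts = Counter(s for s in states if s in "01")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def summarize_by_min_length(indels: Iterable[Indel], thresholds: Iterable[int]) -> dict:
    """Tabulate indel counts for increasingly restrictive size classes.

    Each threshold t keeps indels with length >= t (an assumed convention).
    """
    indels = list(indels)
    summary = {}
    for t in thresholds:
        subset = [i for i in indels if i.length >= t]
        informative = [i for i in subset if is_parsimony_informative(i.states)]
        loci = {i.locus for i in informative}
        summary[t] = {
            "total": len(subset),
            "informative": len(informative),
            "loci": len(loci),
            # Mean informative characters per locus that has at least one.
            "informative_per_locus": len(informative) / len(loci) if loci else 0.0,
        }
    return summary
```

A table of this kind makes it easy to see how quickly the number of informative characters available per gene tree shrinks as the size cutoff is raised, which is the limitation the response emphasizes for coalescent analyses.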


Reviewer: Below are some additional very minor comments:

Page 4, lines 113-114: Perhaps this sentence would better read, “Assuming that one accepts indels are useful characters…”

Response:  The text is modified accordingly.


Reviewer: Page 6, line 179: Simmons and Ochoterena 2000 specifically refer to this as “indel coding” not “gap coding”.

Response:  The text is modified accordingly.


Reviewer: Page 6, line 191: I’m assuming BS refers to “bootstrapped”. I would make certain all these abbreviations are defined in the paper the first time they are mentioned.

Response:  The first use of the word “bootstrap” is, in fact, immediately followed by the abbreviation “BS” in parentheses on line 73 in the caption to Fig 2 on page 3.


Reviewer: Page 6, TABLE 1: The first line of table 1 says there were 2,515 introns in the study while the text mentions 2,516 introns (page 6, line 178). Also I think there is space on the table to spell out “informative” rather than using the abbreviation “Inf”.

Response: Jarvis et al. (2014) identified 2516 introns, but one of these lacked any indels whatsoever. Thus, the nucleotide dataset includes one locus that the indel dataset does not. This is now made explicit in a newly added table in the Supplemental Materials (Table S-new). On the second point, “Informative” is now spelled out in Table 1.


Reviewer: Page 16, FIGURE 9: I think given the authors have a clear legend associating shades of red with the number of branches it is not necessary to also include those numbers on every cell of the heat map in Figure 9b.

Response:  When creating the heat map, we were of the opinion that it was sometimes difficult to distinguish slight differences in shade using the color scale alone. We opted to include the values for the benefit of readers who might be interested. Certainly, anyone not so inclined can ignore the values and focus on the color scale instead.


Reviewer: Page 21, line 529: A grammatical suggestion would be to change the parenthetical part of this sentence to “(i.e. the MP-EST* TENT binned loci and included exons)”.

Response: We find both the original and the suggested wording awkward because the operator of “binned” is the MP-EST* TENT tree, so we opted for an alternative that we believe is clearer.


Reviewer: Page 22, line 557: The sentence “We believe that our analyses say quite a bit.” is unnecessary.

Response: We agree, and the sentence is removed accordingly.


Reviewer 2 Report

This study made an extensive investigation of the phylogenetic signal in indels, which are rarely used as a source of phylogenetic data. The authors were thorough in comparing the signal across locus trees and estimates of species trees, and were careful in separating indels from introns and UCEs. The primary finding of improved resolution at the base of the bird tree is convincing, such that this report moves several fields forward: indel phylogenetics, bird taxonomy, and phylogenomics broadly.


Due to the great reporting, I have no comments on this manuscript and believe it can be published as is.

Author Response

We thank the reviewer for the quick review and complimentary comments.
