Article
Peer-Review Record

Data Types and the Phylogeny of Neoaves

Birds 2021, 2(1), 1-22; https://doi.org/10.3390/birds2010001
by Edward L. Braun * and Rebecca T. Kimball *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 15 November 2020 / Revised: 14 December 2020 / Accepted: 15 December 2020 / Published: 5 January 2021
(This article belongs to the Special Issue Feature Papers of Birds 2021)

Round 1

Reviewer 1 Report

This study is a logical next step for avian phylogenomics, and it provides an incremental but important advance and a welcome review of the state of knowledge regarding Neoaves phylogeny. The methods and presentation of the data were good. The writing was clear. I have no concerns about it being published as is.

Author Response

This study is a logical next step for avian phylogenomics, and it provides an incremental but important advance and a welcome review of the state of knowledge regarding Neoaves phylogeny. The methods and presentation of the data were good. The writing was clear. I have no concerns about it being published as is.

RESPONSE: WE THANK THE REVIEWER FOR HIS/HER KIND COMMENTS.

Reviewer 2 Report

In the present article, Braun and Kimball expand on prior work which investigated the phylogeny of Neoaves, the most speciose of the three major groups of modern birds. While most of the relationships among lineages within these major groups are moderately-well resolved, the interordinal relationships within Neoaves represent a persistent source of discord among avian systematists. While the present work does not fully resolve these issues, I believe it is a valuable step forward and explicitly tests several predictions that the authors proposed in prior work.

The authors propose that discrepancies between several phylogenetic hypotheses for birds can be attributed to “data-type” effects; that is, coding sequences seem to present a different phylogenetic signal from non-coding sequences. The authors attribute this primarily to model inadequacy for coding sequences. The primary test that the authors conduct asks whether recoding the coding sequence data into a lower-dimensional space that reduces variation on the GC-AT axis produces a phylogenetic signal that is more similar to that obtained from non-coding sequences, which are known to possess lower compositional heterogeneity. This is a nice way to assess the data-type hypothesis, and it seems to validate their predictions. The authors accept the intron signal as their null hypothesis and ask if re-coding the coding sequences causes the coding data signal to collapse toward the non-coding signal. Thus, reducing a source of known systematic error in phylogenetic inference changes the signal from the coding sequences relative to a null hypothesis (the intron signal), and the authors gently imply this should lend additional credence to the non-coding intron signal. Specifically, the authors rely on the presence or absence of “indicator clades” in the maximum likelihood topology estimates to differentiate between various coding and non-coding signals.
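The recoding idea described in this paragraph can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy sequences, and the choice to turn ambiguity codes and gaps into "-" are assumptions made for illustration. RY recoding maps purines (A, G) to R and pyrimidines (C, T) to Y, so each class contains one GC base and one AT base, and differences that consist only of transitions disappear.

```python
# Purine/pyrimidine classes. Anything not listed here (ambiguity codes,
# gaps, unknown characters) becomes "-"; that choice is an assumption
# of this sketch, not a rule taken from the manuscript.
RY_MAP = {"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y", "R": "R", "Y": "Y"}


def ry_recode(seq: str) -> str:
    """Recode a nucleotide sequence into the two-state R/Y alphabet."""
    return "".join(RY_MAP.get(base, "-") for base in seq.upper())


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous A/C/G/T characters."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


# Hypothetical toy alignment: two taxa at opposite ends of the GC-AT axis
# that differ only by transitions (G<->A, C<->T).
alignment = {
    "taxon_GC_rich": "GCGCGGCC",
    "taxon_AT_rich": "ATATAATT",
}

recoded = {name: ry_recode(seq) for name, seq in alignment.items()}

for name, seq in alignment.items():
    print(name, seq, f"GC={gc_content(seq):.2f}", "->", recoded[name])
```

In this toy case the two sequences have GC contents of 1.0 and 0.0 but identical RY strings, which is the sense in which recoding reduces variation on the GC-AT axis. Real alignments also contain transversions, which RY recoding retains.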

While I agree with this approach and the conclusions in general, I have a few points that I think could improve the manuscript.

First, the effect of partitioning (or not) seems notable, and this alone appears to affect some of the patterns associated with compositional heterogeneity (line 279: “only one indicator clade was present in all four trees expected to exhibit a “coding-type” topology”). I find it particularly interesting that the tree distance between partitioned and unpartitioned analyses of coding sequences seems to be much greater than the tree distance between partitioned and unpartitioned analyses of the non-coding sequences (figure 3B), and I wonder if this indicates the effect of partitioning on compositionally heterogeneous sequences. Further, it is notable that in most cases for the non-coding data, it seems like partitioning has the effect of reducing bootstrap support, rather than increasing it. For instance, in Figure 5, it looks like almost all support values presented are lower (sometimes substantially) in partitioned analyses. If we think that partitioning better accommodates heterogeneity in the data (and I think we have reason to believe that it does), then this is an important result (see “The Effects of Partitioning on Phylogenetic Inference”, https://doi.org/10.1093/molbev/msv026; a key takeaway from the abstract: “Most notably, we find occasional instances where the use of a suboptimal partitioning scheme produces highly supported but incorrect nodes in the tree.”). This ties into my next point.

In general, I think it should be emphasized that the bulk of the comparisons made by the authors are based on the maximum likelihood point estimates for each analysis. While the value of the bootstrap as a measure of support is debatable (and has been debated at length in the literature, see the literature on concordance analysis for example), the authors of IQTREE (on their website) indicate that only ultrafast bootstrap values > 95 should be considered a reliable indicator, as these statistics are biased upward relative to regular bootstrap values. Therefore, I think it would be interesting to generate a version of figure 3B based on ML trees that have low support nodes (a threshold determined at the authors’ discretion) collapsed. My guess is that the presently clear distinction between coding and non-coding sequences in analysis of tree-space will be reduced. If the patterns in figure 3B hold after collapsing low support nodes, I think the case here would be stronger. This is not to imply that the authors are not correct in their conclusions or interpretation, but I think it is important to note that these patterns are conditional on the ML point estimates and are not necessarily robust to the standard deviations (bootstraps) of those analyses. The consistency of the ML estimates for different data types is doubtlessly compelling (and consistency is a type of evidence for sure), but if these results for coding and non-coding are within each other’s range of statistical variability — what does that mean? It probably means we should be moving away from the bootstrap as an indicator of support, but I would be curious what the authors think about this potential issue.

Likewise, the existence of Passerea does not seem clearly supported by these analyses. The highest ultrafast bootstrap support value reported for this clade was 93, in one analysis of partitioned and RY coded exons, or in the analysis of the whole RY coded dataset (which are mostly the same datasets). Critically, this clade was not also well supported in the non-coding data (Figure 5, BS 47/29), even though it was compellingly recovered in the ML point estimate. If we are to prefer the non-coding signal, how are we to reconcile these apparently conflicting support values? It could be due to reduced signal in the non-coding data as a consequence of its reduced dataset size, but nonetheless, I would probably disagree with the characterization in the conclusion that the re-analysis of these data “strongly support” the existence of Passerea. It seems equally plausible that the position of the hoatzin as sister to landbirds is accurate, given similar support in the RY analysis of all data (BS > 80), and its similar position in analysis of all sites unpartitioned (albeit with weak BS > 50 support). A similar statement can be made about the sister taxon to other Neoaves: while it is compelling that Mirandornithes appear as sister in several ML estimates, this is not a strong statement because of the weak support values immediately derived from the Neoavian MRCA. A complicating factor in interpreting these results is that even though RY recoding of sequence data that are compositionally heterogeneous on the GC-AT axis across lineages may ameliorate that bias, it may also introduce other sources of positively misleading bias (see https://link.springer.com/article/10.1186/1471-2105-15-S2-S8).

In sum, perhaps the conclusion of Suh (https://doi.org/10.1111/zsc.12213) that the base of Neoaves contains a hard polytomy makes sense of these results (and that these different data type signals reflect different reticulations on the “true” phylogenetic network representation of these relationships, the authors’ “third explanation” in their discussion). Nonetheless, I also think that doing as the authors have done in ruling out sources of possible systematic error is a critical step in falsifying that hypothesis, so this is a valuable step forward.

 

minor points:

Lines 510-520: by “bias” do the authors mean long branch attraction?

Line 419: typos in the first sentence

Line 244: typo on the last line

Line 12: typo after genomic

Line 11: suggest changing “It has recently been proposed” to active voice

 

Author Response

In the present article, Braun and Kimball expand on prior work which investigated the phylogeny of Neoaves, the most speciose of the three major groups of modern birds. While most of the relationships among lineages within these major groups are moderately-well resolved, the interordinal relationships within Neoaves represent a persistent source of discord among avian systematists. While the present work does not fully resolve these issues, I believe it is a valuable step forward and explicitly tests several predictions that the authors proposed in prior work.

The authors propose that discrepancies between several phylogenetic hypotheses for birds can be attributed to “data-type” effects; that is, coding sequences seem to present a different phylogenetic signal from non-coding sequences. The authors attribute this primarily to model inadequacy for coding sequences. The primary test that the authors conduct asks whether recoding the coding sequence data into a lower-dimensional space that reduces variation on the GC-AT axis produces a phylogenetic signal that is more similar to that obtained from non-coding sequences, which are known to possess lower compositional heterogeneity. This is a nice way to assess the data-type hypothesis, and it seems to validate their predictions.

The authors accept the intron signal as their null hypothesis and ask if re-coding the coding sequences causes the coding data signal to collapse toward the non-coding signal. Thus, reducing a source of known systematic error in phylogenetic inference changes the signal from the coding sequences relative to a null hypothesis (the intron signal), and the authors gently imply this should lend additional credence to the non-coding intron signal. Specifically, the authors rely on the presence or absence of “indicator clades” in the maximum likelihood topology estimates to differentiate between various coding and non-coding signals.

RESPONSE: WE THANK THE REVIEWER FOR THE POSITIVE COMMENTS, AND FOR THE THOUGHTFUL REVIEW. THIS REVIEWER OBVIOUSLY SPENT A LOT OF TIME TO REALLY UNDERSTAND OUR RESULTS AND, IN DOING SO, PROVIDED US WITH THOUGHT PROVOKING AND HELPFUL COMMENTS.

While I agree with this approach and the conclusions in general, I have a few points that I think could improve the manuscript.

First, the effect of partitioning (or not) seems notable, and this alone appears to affect some of the patterns associated with compositional heterogeneity (line 279: “only one indicator clade was present in all four trees expected to exhibit a “coding-type” topology”). I find it particularly interesting that the tree distance between partitioned and unpartitioned analyses of coding sequences seems to be much greater than the tree distance between partitioned and unpartitioned analyses of the non-coding sequences (figure 3B), and I wonder if this indicates the effect of partitioning on compositionally heterogeneous sequences.

RESPONSE: THIS WAS AN INTERESTING OBSERVATION, AND WE HAVE NOW ADDED A PARAGRAPH TO THE DISCUSSION TO ADDRESS THIS (AND THE FOLLOWING) OBSERVATIONS ABOUT PARTITIONING.

Further, it is notable that in most cases for the non-coding data, it seems like partitioning has the effect of reducing bootstrap support, rather than increasing it. For instance, in Figure 5, it looks like almost all support values presented are lower (sometimes substantially) in partitioned analyses. If we think that partitioning better accommodates heterogeneity in the data (and I think we have reason to believe that it does), then this is an important result (see “The Effects of Partitioning on Phylogenetic Inference” https://doi.org/10.1093/molbev/msv026 — a key takeaway from the abstract: “Most notably, we find occasional instances where the use of a suboptimal partitioning scheme produces highly supported but incorrect nodes in the tree.”). This ties into my next point —

RESPONSE: WE HAD NOT FULLY DISCUSSED THE ISSUE OF PARTITIONING ORIGINALLY. THE PAPER THE REVIEWER CITED DEMONSTRATES THAT USE OF ALGORITHMIC MODELS FOR PARTITIONING SHOULD LEAD TO APPROPRIATE PARTITIONING SCHEMES. WE USED THE ALGORITHMIC PARTITIONING SCHEME IMPLEMENTED IN IQ-TREE, SO WE HAVE NO REASON TO EXPECT A SUBOPTIMAL PARTITIONING SCHEME. IN ADDITION, SINCE THE TOPOLOGY FOR THE NON-CODING DATA DID NOT CHANGE WHEN WE USED PARTITIONING, IT IS UNLIKELY THAT PARTITIONING LED TO INCORRECT NODES IN THE TREE THAT WERE NOT PRESENT IN THE UNPARTITIONED TREE. HOWEVER, AS INDICATED ABOVE, THESE OBSERVATIONS ABOUT PARTITIONING HAVE ENCOURAGED US TO ADDRESS THEM IN THE DISCUSSION EXPLICITLY.

In general, I think it should be emphasized that the bulk of the comparisons made by the authors are based on the maximum likelihood point estimates for each analysis. While the value of the bootstrap as a measure of support is debatable (and has been debated at length in the literature, see the literature on concordance analysis for example), the authors of IQTREE (on their website) indicate that only ultrafast bootstrap values > 95 should be considered a reliable indicator, as these statistics are biased upward relative to regular bootstrap values. Therefore, I think it would be interesting to generate a version of figure 3B based on ML trees that have low support nodes (a threshold determined at the authors’ discretion) collapsed. My guess is that the presently clear distinction between coding and non-coding sequences in analysis of tree-space will be reduced. If the patterns in figure 3B hold after collapsing low support nodes, I think the case here would be stronger. This is not to imply that the authors are not correct in their conclusions or interpretation, but I think it is important to note that these patterns are conditional on the ML point estimates and are not necessarily robust to the standard deviations (bootstraps) of those analyses. The consistency of the ML estimates for different data types is doubtlessly compelling (and consistency is a type of evidence for sure), but if these results for coding and non-coding are within each other’s range of statistical variability — what does that mean? It probably means we should be moving away from the bootstrap as an indicator of support, but I would be curious what the authors think about this potential issue.

RESPONSE: THE REVIEWER MAKES AN INTERESTING POINT WITH RESPECT TO THE IDEA OF COLLAPSING BRANCHES IN THE TREES BEFORE PERFORMING THE CLUSTERING ANALYSES. WE TRIED THIS USING A CUTOFF OF 50% AND FOUND VERY SIMILAR RESULTS. WHEN WE PERFORMED THAT ANALYSIS, WE FOUND THAT THE RY TREES FOR ALL DATA AND FOR THE EXONS SHIFTED TO A POSITION CLOSER TO NON-CODING TREES BASED ON ANALYSES OF NUCLEOTIDE DATA. WE THANK THE REVIEWER FOR THIS INTERESTING SUGGESTION AND WE HAVE ADDED A SUPPLEMENTARY FIGURE TO SHOW THIS INFORMATION (SUPPLEMENTARY FIGURE S1).
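The collapsing step discussed in this exchange can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis behind Supplementary Figure S1: this record does not say which tree distance or software the authors used, so the Robinson-Foulds distance, the function names, the five-taxon trees, and the support values below are all hypothetical. Each tree is represented as a mapping from an internal bipartition to its support, collapsing a branch means dropping its bipartition, and the distance counts bipartitions found in only one of the two trees.

```python
from typing import Dict, FrozenSet, Iterable

Split = FrozenSet[str]

# Hypothetical five-taxon example; all names and support values are invented.
TAXA: FrozenSet[str] = frozenset({"A", "B", "C", "D", "E"})


def canonical(side: Iterable[str], taxa: FrozenSet[str] = TAXA) -> Split:
    """Return the side of a bipartition that excludes the reference taxon."""
    side = frozenset(side)
    reference = min(taxa)  # alphabetically first taxon acts as the reference
    return taxa - side if reference in side else side


def collapse(tree: Dict[Split, float], threshold: float) -> Dict[Split, float]:
    """Drop internal branches whose support falls below the threshold."""
    return {split: s for split, s in tree.items() if s >= threshold}


def rf_distance(a: Dict[Split, float], b: Dict[Split, float]) -> int:
    """Robinson-Foulds distance: bipartitions present in only one tree."""
    return len(set(a) ^ set(b))


# Two toy trees that agree on a well-supported branch (AB | CDE) and
# disagree on a weakly supported one.
tree_1 = {canonical("AB"): 95.0, canonical("ABC"): 40.0}
tree_2 = {canonical("AB"): 90.0, canonical("ABD"): 45.0}

print("RF, fully resolved:", rf_distance(tree_1, tree_2))
print("RF, collapsed at 50:",
      rf_distance(collapse(tree_1, 50.0), collapse(tree_2, 50.0)))
```

In the toy data the only conflict sits on branches with support below 50, so the distance falls from 2 to 0 after collapsing. With real trees the distance can move in either direction, because collapsing also removes poorly supported branches on which two trees agree.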

THE REVIEWER IS OBVIOUSLY FAMILIAR WITH THE LONGSTANDING DEBATE REGARDING VARIOUS SUPPORT VALUES IN THE PHYLOGENETICS LITERATURE. WE WOULD LIKE TO STRESS THAT, IN ALL CASES, WE HAD A PRIORI EXPECTATIONS THAT WERE ARTICULATED IN PUBLISHED MANUSCRIPTS. WE VIEW THE RESULTS IN FIGURE 3 AS PROVIDING TWO PIECES OF INFORMATION FOR READERS – WHETHER SPECIFIC A PRIORI DEFINED CLADES WERE PRESENT IN THE OPTIMAL TREE AND WHAT THE RELATIVE SUPPORT FOR THAT CLADE IS IN SPECIFIC ANALYSES. WE ARE UNCOMFORTABLE WITH THE NOTION OF ADVOCATING A CUTOFF FOR THE SUPPORT VALUES AND PREFER TO ALLOW INDIVIDUAL READERS TO INTERPRET OUR SUPPORT VALUES AS THEY SEE FIT.

Likewise, the existence of Passerea does not seem clearly supported by these analyses. The highest ultrafast bootstrap support value reported for this clade was 93, in one analysis of partitioned and RY coded exons, or in the analysis of the whole RY coded dataset (which are mostly the same datasets). Critically, this clade was not also well supported in the non-coding data (Figure 5, BS 47/29), even though it was compellingly recovered in the ML point estimate. If we are to prefer the non-coding signal, how are we to reconcile these apparently conflicting support values? It could be due to reduced signal in the non-coding data as a consequence of its reduced dataset size, but nonetheless, I would probably disagree with the characterization in the conclusion that the re-analysis of these data “strongly support” the existence of Passerea.

RESPONSE: THIS IS A GOOD POINT; WE AGREE WITH THE REVIEWER THAT THE SUPPORT FOR PASSEREA IN THIS STUDY DOES NOT RISE TO THE LEVEL OF “STRONG SUPPORT”. HOWEVER, WE WOULD LIKE TO STRESS THAT THIS STUDY WAS CONDUCTED IN A FRAMEWORK WHERE WE HAD CLEAR A PRIORI EXPECTATIONS (I.E., THE RECOVERY OF PASSEREA IN THE JARVIS AND REDDY STUDIES). THUS, WE HAVE REWORDED OUR CONCLUSIONS TO HIGHLIGHT THAT OUR STUDY PROVIDES ADDITIONAL SUPPORT FOR PASSEREA (I.E., THAT IT CORROBORATES THE EXISTENCE OF THE CLADE GIVEN OUR PRIOR KNOWLEDGE) AND WE HAVE TRIED TO AVOID THE IMPLICATION THAT PASSEREA IS STRONGLY SUPPORTED BASED ON THIS STUDY ALONE.

It seems equally plausible that the position of the hoatzin as sister to landbirds is accurate, given similar support in the RY analysis of all data (BS > 80), and its similar position in analysis of all sites unpartitioned (albeit with weak BS > 50 support). A similar statement can be made about the sister taxon to other Neoaves: while it is compelling that Mirandornithes appear as sister in several ML estimates, this is not a strong statement because of the weak support values immediately derived from the Neoavian MRCA. A complicating factor in interpreting these results is that even though RY recoding of sequence data that are compositionally heterogeneous on the GC-AT axis across lineages may ameliorate that bias, it may also introduce other sources of positively misleading bias (see https://link.springer.com/article/10.1186/1471-2105-15-S2-S8).

RESPONSE: REGARDING THE POTENTIAL FOR RY CODING TO LEAD TO BIAS, WE WOULD LIKE TO START BY THANKING THE REVIEWER FOR POINTING US TO AN INTERESTING PAPER WE WERE UNAWARE OF. WE WERE AWARE OF SOME SIMILAR LITERATURE AT THE PROTEIN LEVEL (e.g., C. Kosiol and N. Goldman, "Markovian and non-Markovian protein sequence evolution: aggregated Markov process models," J. Mol. Biol., vol. 411, no. 4, pp. 910-923, Aug. 2011) BUT WE WERE UNAWARE OF A DEMONSTRATION THAT THIS WAS POSSIBLE FOR RY CODING. WE HAVE ADDED A CITATION TO THE PAPER HIGHLIGHTED BY THE REVIEWER AND NOTED THE POTENTIAL FOR BIAS.

WITH THAT SAID, WE FEEL BIAS DUE TO RY CODING IS UNLIKELY TO PLAY A MAJOR ROLE IN OUR RESULTS. OUR MOST FUNDAMENTAL OBSERVATION IS THAT RY CODING THE CODING DATA (AND THE COMPLETE DATASET) RESULTS IN TREES THAT ARE MORE SIMILAR TO THE TREES BASED ON ANALYSIS OF THE NON-CODING DATA USING STANDARD NUCLEOTIDE MODELS. IT CERTAINLY REMAINS POSSIBLE THAT SOME NODES IN THE RY TREES REFLECT BIAS BUT THEY ARE UNLIKELY TO BE THE ONES THAT ARE CONGRUENT WITH THE NODES IN THE NON-CODING TREE; THOSE ARE THE FOCAL NODES FOR THIS STUDY.

In sum, perhaps the conclusion of Suh (https://doi.org/10.1111/zsc.12213) that the base of Neoaves contains a hard polytomy makes sense of these results (and that these different data type signals reflect different reticulations on the “true” phylogenetic network representation of these relationships, the authors’ “third explanation” in their discussion). Nonetheless, I also think that doing as the authors have done in ruling out sources of possible systematic error is a critical step in falsifying that hypothesis, so this is a valuable step forward.

RESPONSE: AS WE HAD INDICATED, AND FURTHER EXPANDED IN THE REVISION, THE NON-CODING AND CODING DATA CAME FROM THE SAME LOCI. THUS, THESE DATA TYPE EFFECTS ARE VERY UNLIKELY TO REPRESENT DIFFERENT RETICULATIONS.

minor points:

Lines 510-520: by “bias” do the authors mean long branch attraction?

RESPONSE: WE HAVE NOW CLARIFIED THAT THE BIAS COULD BE LONG-BRANCH ATTRACTION OR OTHER PROCESSES.

Line 419: typos in the first sentence

RESPONSE: CORRECTED (ALSO NOTED BY REVIEWER 3)

Line 244: typo on the last line

RESPONSE: CORRECTED.

Line 12: typo after genomic

RESPONSE: CORRECTED (ALSO NOTED BY REVIEWER 3)

Line 11: suggest changing “It has recently been proposed” to active voice

RESPONSE: DONE

Reviewer 3 Report

In this manuscript, Braun and Kimball report a detailed exploration of the effects of data type on the resolution of some of the most recalcitrant branches known in the tree of life. Their analyses convincingly reveal an effect of data type where coding regions lead to different results to non-coding regions. The differences are likely to arise from the increased complexity in the molecular evolutionary process of coding regions, as revealed by the RY coding data. The report is well written and executed, so I am happy to recommend it for publication with only a few minor suggestions.

Line 12. "type of genomic data"

Line 24. Perhaps rephrase since the hypothesis itself has not exacerbated the differences among data types.

Figure 3b. It would be ideal to have this "tree of trees" be shown unrooted, which would more accurately reflect what it is showing, and would allow easier interpretation of the distances among lineages traversing the root.

Line 419. Remove full stop near the start of sentence.

Line 425-429. The parentheses at the end of sentence are not closing correctly.

Line 486. Remove full stop at the end of line.

Line 544-546. Phrase seems incomplete. 

Author Response

In this manuscript, Braun and Kimball report a detailed exploration of the effects of data type on the resolution of some of the most recalcitrant branches known in the tree of life. Their analyses convincingly reveal an effect of data type where coding regions lead to different results to non-coding regions. The differences are likely to arise from the increased complexity in the molecular evolutionary process of coding regions, as revealed by the RY coding data. The report is well written and executed, so I am happy to recommend it for publication with only a few minor suggestions.

RESPONSE: WE THANK THE REVIEWER FOR THE POSITIVE COMMENTS ON THE WRITING AND ANALYSES, AND APPRECIATE THE CAREFUL CHECK TO CATCH SOME OF OUR ERRORS.

Line 12. "type of genomic data"

RESPONSE: CORRECTED (ALSO NOTED BY REVIEWER 2).

Line 24. Perhaps rephrase since the hypothesis itself has not exacerbated the differences among data types.

RESPONSE: CORRECTED

Figure 3b. It would be ideal to have this "tree of trees" be shown unrooted, which would more accurately reflect what it is showing, and would allow easier interpretation of the distances among lineages traversing the root.

RESPONSE: WE HAVE CHANGED THE “TREE-OF-TREES” TO AN UNROOTED FIGURE. THE TREE-OF-TREES IS SIMPLY A MEANS TO VISUALIZE TREESPACE SO WE AGREE WITH THE REVIEWER THAT THE ROOT IS NOT NECESSARILY SIGNIFICANT. IT WAS PRESENTED AS A ROOTED TREE BECAUSE WE FELT IT WAS EASIER TO VISUALIZE IN THAT WAY, BUT THE CHOICE OF ROOTED VS UNROOTED IS SOMEWHAT ARBITRARY SO WE ARE HAPPY TO PRESENT THE OTHER ALTERNATIVE.

Line 419. Remove full stop near the start of sentence.

RESPONSE: CORRECTED (ALSO NOTED BY REVIEWER 2).

Line 425-429. The parentheses at the end of sentence are not closing correctly.

RESPONSE: CORRECTED

Line 486. Remove full stop at the end of line.

RESPONSE: CORRECTED.

Line 544-546. Phrase seems incomplete. 

RESPONSE: WE APOLOGIZE, AS WE AGREE IT WAS NOT CLEAR WHAT WE INTENDED. WE HAVE REWRITTEN THE LAST HALF OF THAT PARAGRAPH TO HOPEFULLY PROVIDE A MUCH CLEARER DISCUSSION.

Reviewer 4 Report

In this study, the authors performed a direct test of the data type effects hypothesis for the base of Neoaves by conducting phylogenetic analyses of the coding and non-coding subsets of the Prum data matrix. I found the manuscript very interesting, especially the implications of their results for the theory and practice of phylogenomics. The study falls into the scope of Birds and thus could be published. Before this can be done, a few amendments should be made as specified below.

First, perhaps the authors would find it useful to cite the following recently published paper:

Heiner Kuhl, Carolina Frankl-Vilches, Antje Bakker, Gerald Mayr, Gerhard Nikolaus, Stefan T Boerno, Sven Klages, Bernd Timmermann, Manfred Gahr, An Unbiased Molecular Approach Using 3′-UTRs Resolves the Avian Family-Level Tree of Life, Molecular Biology and Evolution, msaa191, https://doi.org/10.1093/molbev/msaa191

Line 8: “separated birds three main groups” change to “separated birds INTO three main groups”

Lines 36-38: “Using our results, we provide a summary phylogeny that identifies well-corroborated relationships and highlights specific nodes where future efforts should focus”. I understood that future studies should focus on core landbirds (lines 556-557). I suggest that the authors highlight this in the conclusion section.

Line 112: Please, remove bold format in “and the Prum et”

Lines 568-569: “(clade IV),” change to “(clade IV).”

Author Response

In this study, the authors performed a direct test of the data type effects hypothesis for the base of Neoaves by conducting phylogenetic analyses of the coding and non-coding subsets of the Prum data matrix. I found the manuscript very interesting, especially the implications of their results for the theory and practice of phylogenomics. The study falls into the scope of Birds and thus could be published. Before this can be done, a few amendments should be made as specified below.

RESPONSE: WE APPRECIATE THAT THE REVIEWER FOUND THE PAPER INTERESTING AND INFORMATIVE, AND ALSO HIGHLIGHTED SOME OVERSIGHTS WE HAVE NOW CORRECTED.

First, perhaps the authors would find it useful to cite the following recently published paper:

Heiner Kuhl, Carolina Frankl-Vilches, Antje Bakker, Gerald Mayr, Gerhard Nikolaus, Stefan T Boerno, Sven Klages, Bernd Timmermann, Manfred Gahr, An Unbiased Molecular Approach Using 3′-UTRs Resolves the Avian Family-Level Tree of Life, Molecular Biology and Evolution, msaa191, https://doi.org/10.1093/molbev/msaa191

RESPONSE: WE HAD CITED THIS IN THE ORIGINAL VERSION (REFERENCE #16). IT WAS IDENTIFIED 8 TIMES IN THE ORIGINAL VERSION, WITH REFERENCE TO OTHER NON-CODING (AND IN SOME CASES, SPECIFICALLY UTR) TREES.

Line 8: “separated birds three main groups” change to “separated birds INTO three main groups”

RESPONSE: CORRECTED. THANK YOU FOR NOTING THE MISSING WORD.

Lines 36-38: “Using our results, we provide a summary phylogeny that identifies well-corroborated relationships and highlights specific nodes where future efforts should focus”. I understood that future studies should focus on core landbirds (lines 556-557). I suggest that the authors highlight this in the conclusion section.

RESPONSE: THANK YOU FOR NOTING THIS OVERSIGHT, AS FURTHER EXPLORATION OF EARLY DIVERGENCES IN THE LANDBIRDS IS ALSO NEEDED. WE HAVE ADDED A SENTENCE INTO THE CONCLUSIONS TO HIGHLIGHT THIS.

Line 112: Please, remove bold format in “and the Prum et”

RESPONSE: WE COULD NOT IDENTIFY THIS BOLD TEXT. WE LOOKED AT THE ORIGINAL, THE VERSION SUPPLIED FOR REVISION, AND AT LINES WELL BEFORE AND WELL AFTER LINE 112.

Lines 568-569: “(clade IV),” change to “(clade IV).”

RESPONSE: CORRECTED. THANK YOU.
