Article
Peer-Review Record

The Roles of Protein Structure, Taxon Sampling, and Model Complexity in Phylogenomics: A Case Study Focused on Early Animal Divergences

Biophysica 2021, 1(2), 87-105; https://doi.org/10.3390/biophysica1020008
by Akanksha Pandey 1,2 and Edward L. Braun 1,*
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 25 January 2021 / Revised: 18 February 2021 / Accepted: 28 February 2021 / Published: 25 March 2021

Round 1

Reviewer 1 Report

This article by Pandey and Braun takes another look at the phylogenetic signal of amino acid residues with different biophysical properties and its bearing on the root of the metazoan phylogeny. I find the paper to be well designed and mostly clearly written (but see one of my critiques). I do take issue with a few points that I would like to see clarified or removed from the manuscript prior to its acceptance. With these points cleared up, I think this work will make a nice contribution to the field. It is also nice to see an interdisciplinary approach to phylogenetics that is illuminated by biophysical knowledge.

 

Page 3. The statement “The second explanation is more complex and it represents one of the biggest challenges in the field of phylogenomics: different parts of the genome can have distinct evolutionary histories” and the citations therein seem problematic to me. Sure, gene trees and species trees can be discordant due to duplication, loss and horizontal transfer, but individual residues within the same locus must, in principle, have a shared evolutionary history. I can’t think of a plausible evolutionary scenario where buried residues and exposed residues from the same locus could have different evolutionary histories. I think this proposal of distinct evolutionary histories should be eliminated or supported somehow with a proposed mechanism. The two papers cited do not support this particular proposal; they deal with gene tree-species tree discordance, which is well understood. I agree that ILS as an evolutionary process is not likely to drive the data-type discordance observed, but no other evolutionary process is proposed to explain it. The authors go on to discuss how different selection pressures could lead to the data discordance observed. I agree, but that would not indicate different evolutionary histories, only differences in evolutionary pressures that are potentially not accommodated by the model. It seems to me that any discordance observed between the two datasets must have to do with sampling error or model inadequacy.

 

Page 10. “The observation that different structural environments have different proportions of strongly decisive sites would seem to falsify any explanation that for the conflicting strongly decisive sites that rely on true discordance among gene trees.” I have read this sentence several times. I can’t for the life of me decipher its meaning. The sentence that follows this one also appears to be missing some words. Please clarify this passage.

 

Page 13. Again, in the discussion the authors invoke gene flow between stem ctenophores, sponges and Parahoxozoa, and possibly horizontal gene transfer between early metazoans and other opisthokont groups. However, I don’t see how this could explain the difference in phylogenomic performance of buried and exposed sites. As far as I can tell from the manuscript, these different site classes should often be present within the same locus. I don’t see how gene flow, or even extensive recombination, could explain the observed result. I think the differences between the datasets are much more likely to be due to model inadequacies rather than unexplained evolutionary processes.

 

Page 13. I think it’s interesting that your analyses of the Simion dataset, filtered for globular proteins (FSD), yielded support for either T2 or T3, irrespective of whether sites were buried or not, but rarely supported T1, which was the original finding of Simion. Further, your analyses are conducted under site-heterogeneous models, an outcome that advocates of T1 have quite pointedly argued does not occur. It would seem to me that the loci filtered out of the Simion dataset by your selection of globular proteins may have been the drivers of the original Simion finding of T1, suggesting that the data, or their curation, rather than superior models, may drive the T1 result. This may be why T1 is so rarely reported from studies that do not focus on the Simion dataset. More details on how much of the Simion dataset was omitted by your selection criteria would clarify why the FSD does not recover T1 using largely the same data and approximately the same model that originally recovered T1. Furthermore, citing Kapli and Telford is problematic here because their simulation studies were based on the topology reported in Simion. You should have no expectation that such a simulation would show similar results using your FSD-based topology, because your FSD does not find T1 in the first place. In fact, I would expect the asymmetry observed in Kapli and Telford to dissolve if simulations were conducted using your topology. It would be interesting to conduct similar simulations on your topology and see just how asymmetrical phylogenetic errors are under the FSD-based topology. Perhaps that’s another paper, but I think the section contextualizing the present study in light of Simion and Kapli and Telford needs to be rethought, and perhaps some of the points I raise here should be elaborated on in the discussion.

 

 

Author Response

Please see the attachment.

NOTE: this reviewer clicked the box for "I would like to sign my review report" but I was unable to find their name. I would like to thank them by name in the acknowledgements (assuming that clicking the box to sign the review was not an error).

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper explores the effects of protein structure on phylogenetic inference, specifically comparing the reliability of exposed versus buried sites in the protein using a dataset pertaining to relationships among major lineages of Metazoa. Previous phylogenomic analyses have yielded contradictory results with regard to resolution of the deepest splits in this phylogeny. The authors find that the data type effect only strongly affects the resulting topology when the taxon sample is relatively small. They also provide a helpful review of the various potential pitfalls presented by this kind of dataset and describe the kinds of data artifacts that could bias phylogenomic analysis. They discuss the importance of robust taxon sampling in overcoming some of the problems that can result when data are heterogeneous. They propose using the relative number of “strongly decisive sites”, i.e., amino acid positions with log likelihoods greater than or equal to 5 standard deviations from the mean for all sites, as a criterion for distinguishing among alternative tree topologies and evaluating the reliability of tree searches. For the metazoan dataset they found that exposed sites included a higher proportion of strongly decisive sites supporting the topology favored by most analyses, whereas buried sites included a larger number of strongly decisive sites favoring one of the two alternative topologies. The authors conclude that the strongly decisive sites criterion cannot replace standard ML tree searches and more standard measures of tree support, but that it provides an additional, computationally efficient method for evaluating alternative topologies.

This paper should be of broad interest to biologists attempting to resolve phylogenetic relationships using data from protein-coding genes. My one suggestion for improvement is to provide a clearer explanation for how the cut-off likelihood value was chosen for defining a particular amino acid site as “strongly decisive”. Would slightly lower cut-off values be expected to yield similar results and how difficult would it be to conduct an analysis in which all sites are weighted based on their relative decisiveness?
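
To make the cut-off question concrete, the following is a minimal sketch of how such a sensitivity check might look. It is not the authors' implementation. It assumes that each site is summarized by one number, here a hypothetical per-site log-likelihood difference between two candidate topologies, and that a site counts as strongly decisive when that number lies at least `cutoff_sd` standard deviations from the mean over all sites. The function names, the use of the population standard deviation, and the toy data are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def strongly_decisive_sites(delta_lnl, cutoff_sd=5.0):
    """Indices of sites lying at least cutoff_sd SDs from the mean.

    delta_lnl: per-site log-likelihood differences between two
    topologies (a hypothetical input format, assumed for this sketch).
    """
    mu = mean(delta_lnl)
    sigma = pstdev(delta_lnl)  # population SD; the choice is an assumption
    if sigma == 0:
        return []  # no variation among sites, so nothing is decisive
    return [i for i, d in enumerate(delta_lnl)
            if abs(d - mu) >= cutoff_sd * sigma]


def cutoff_sweep(delta_lnl, cutoffs=(3.0, 4.0, 5.0)):
    """Count decisive sites at each cut-off to see how stable the set is."""
    return {c: len(strongly_decisive_sites(delta_lnl, c)) for c in cutoffs}


def decisiveness_weights(delta_lnl):
    """Continuous alternative: weight every site by its absolute z-score."""
    mu = mean(delta_lnl)
    sigma = pstdev(delta_lnl)
    if sigma == 0:
        return [0.0] * len(delta_lnl)
    return [abs(d - mu) / sigma for d in delta_lnl]


if __name__ == "__main__":
    # Toy data: 99 uninformative sites and one site with a large difference.
    toy = [0.0] * 99 + [10.0]
    print(strongly_decisive_sites(toy))
    print(cutoff_sweep(toy))
```

If the counts returned by `cutoff_sweep` change little between, say, 4 and 5 standard deviations, the conclusions are unlikely to hinge on the exact threshold. `decisiveness_weights` illustrates the weighting idea raised above, in which no site is discarded and each contributes in proportion to how far it departs from the mean.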

Author Response

Please see the attachment (responses to both reviewers are in a single Word file, uploaded as a response to the first reviewer).

Author Response File: Author Response.pdf
