Review Reports - Evaluation of Genetic Diversity and Parasite-Mediated Selection of MHC Class I Genes in Emberiza godlewskii (Passeriformes: Emberizidae)

Round 1

Reviewer 1 Report

Reviewer's report

Date: 30 September 2022

Journal: Diversity

Manuscript ID: diversity-1954303

Type of manuscript: Article

Title: Evaluation of genetic diversity and parasite-mediated selection of MHC class I genes in Emberiza godlewskii (Passeriformes: Emberizidae)

Authors: Wei Huang, Xinyi Wang, Boye Liu, Tobias L. Lenz, Yangyang Peng, Lu Dong *, Yanyun Zhang *

Using an NGS-based protocol, the authors successfully genotyped the MHC class I genes in Godlewski’s bunting Emberiza godlewskii and identified 160 functional alleles in a large population of the buntings. The manuscript is clear and well written, with no fundamental flaws or weaknesses, and contains new and interesting data that are sound, adequately described and illustrated, and that may provide important cues to scientists interested in the variability and evolution of MHC genes. Therefore, after the manuscript is amended according to the following suggestions, it will be suitable for publication in Diversity.

Abstract, lines 12-14: The sentence is not clear. Please reformulate.

Introduction, line 22: Change “…pathogens[1].” to “…pathogens [1].” Add a space between “pathogens” and “[1].” Make the same change for all other references throughout the manuscript.

Introduction, lines 54-58: The comparison of “…the number of MHC genes per individual…” with “alleles per individual” is incorrect. Please reformulate the sentence.

Introduction, line 79: Plasmodium and Leucocytozoon must be in italics.

Introduction, line 79: Add species names for the genera Plasmodium and Leucocytozoon.

Material and methods: The title of Section 2.3 includes “NGS sequencing”; however, this section contains no information on the NGS sequencing. Add the relevant information.

Material and methods, lines 117, 118: Change the incorrect abbreviation “ul” to microlitre.

Material and methods, line 117: Add a space between “20” and “ul”. Do the same throughout the manuscript.

Material and methods, line 169: Change “1 x 107” to “1 x 10⁷”.

Material and methods, line 172: Add a space between “212” and “bp”. Do the same throughout the manuscript.

Material and methods, section 2.6: There is no description of the methods used to infer contemporary selection. Add a detailed description of the methods you used to identify and prove contemporary selection.

Material and methods, lines 203-204: The sentence is not clear. Is the cyt b gene from Plasmodium falciparum? Add more details.

Material and methods, line 205: Add one or more references at the end of the sentence “…bioinformatics methods were described in the previous study.”

Material and methods, lines 219-221: The sentence “In addition, we constructed another set of models…” requires a detailed explanation. Clearly explain how you used this approach to test the heterozygote advantage hypothesis.

Results, lines 229-230: You state that some alleles are non-functional. State the clear criteria you used to support this statement. The length of a sequence is not a criterion of non-functionality.

Results, lines 235-236: Change “There were several monophyletic clades identified in the Bayesian tree for the 184 functional alleles…” to “There were several monophyletic clades identified in the Bayesian tree (Figure 1) for the 184 functional alleles…”

Results, line 237: Consider revising the English of the sentence “The 12 alleles with expression evidence from RNA …”

Results, line 248: Clearly define the term “supertypes”.

Results, section 3.2. Historical selection analysis: Please add more details concerning the Maximum likelihood (ML) estimates for site-specific models (d_N/d_S and p values).

Results, line 267: Change “diveristy” to “diversity”.

Results, lines 267-270: The sentence “The results of model with quadratic term of MHC diveristy were consistent…” must be clarified. Explain in detail how you use this information to prove contemporary selection. The analysis of contemporary selection is absent from the Results section.

Figure 1: Enlarge Figure 1 and add more description of the functional and non-functional sequences.

Figure 2: Add to the axes the percentage of the variation explained by each principal component (P1 and P2).

Discussion, lines 331-333: You state that “…MHC class I genes in Godlewski’s buntings evolve under strong balancing selection…” However, balancing selection usually produces a highly polarized haplotype structure, which is probably not true in your case. It would probably be more correct to suggest diversifying selection (see, for instance, Loiseau et al. (2009)). Please add a short note concerning the type of positive selection detected in the MHC genes. It would be useful to add a figure with the polymorphic amino acid sites in all Godlewski’s bunting individuals that clearly shows the character of the variability. This will help in the interpretation of the results.

Conclusions, lines 354-355: The conclusion concerning the contemporary selection is highly speculative and should be reformulated.

References

Loiseau et al. (2009). Diversifying selection on MHC class I in the house sparrow (Passer domesticus). Mol. Ecol., 18(7), 1331–1340. doi:10.1111/j.1365-294x.2009.04105.x

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Huang et al. present a study of the genetic diversity and selection of MHC class I genes in Godlewski’s buntings based on amplicon-sequencing of an impressive sample size of more than 300 birds, of which almost 90% were infected by malaria parasites.

Overall, I found the manuscript well written, with only some sections requiring extra proofreading in terms of language (mainly in the Methods and Results sections). I found a few important issues in the manuscript with respect to the experimental design and the descriptions of methods and results, which led to my recommendation of major revisions (see below).

I am focusing my review on the data generation, experimental design and some of the analyses (BEAST, PAML) as well as reproducibility of the study, as my experience with the MHC region and amplicon sequencing is limited.

Major comments:

1) Several sections of the Materials and Methods and Results are missing crucial information that the authors need to provide to ensure scientific soundness:

Materials and Methods, line 124: The Illumina sequencing has to be described in detail. Specifically, the pooling strategy, the library preparation protocol, the Illumina sequencing platform, whether the reads were paired-end or single-end, and the number of lanes used for sequencing are missing.

Materials and Methods, line 169: When running Bayesian methods such as BEAST it is crucial to check if the analysis has converged. Ideally, one runs more than one chain to make sure all chains converge on the same result. Convergence can be checked in the software Tracer. Explain how you ensured the BEAST analysis had converged.

Results: An overview of the sampling locations of the 326 individuals should be provided, as a table or figure (map). Is anything known about their geographic structure from loci other than MHC genes that could be relevant to this study, e.g. population subdivisions across the sampling range?

Results: A section that describes the expression results is missing. Which MHC alleles were expressed (providing e.g. a table with accession numbers)? What were the expression levels of the different expressed MHC alleles?

Results, line 252: P-values and omega estimates from the likelihood ratio tests need to be provided.

Results, line 253: Please provide a table with the ten sites under positive selection, including their posterior probability thresholds and omega estimates from both models.
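As an illustration of the likelihood ratio test statistics requested above, the following minimal sketch computes the test statistic and p-value from two log-likelihoods. It assumes a nested comparison of site models with two degrees of freedom (as is usual for the commonly used PAML pairs M1a vs M2a and M7 vs M8); the report does not state which model pairs the authors compared, and the function names and log-likelihood values below are hypothetical.

```python
import math


def chi2_sf_even_df(stat: float, df: int) -> float:
    """Chi-square survival function, closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    if stat <= 0:
        return 1.0
    half = stat / 2.0
    term, total = 1.0, 1.0
    # sf(x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int = 2):
    """Return (2 * delta lnL, p-value) for a nested model comparison."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods, not values from the manuscript.
stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-990.0)
print(f"2*dlnL = {stat:.2f}, p = {p:.3g}")
```

Reporting the statistic, the degrees of freedom and the p-value together with the omega estimate of the alternative model would make the selection analysis straightforward to verify.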

2) Discussion: A discussion of the experimental design for the expression analyses is missing, specifically discussing the small sample size of seven samples. Also, some sections could be rephrased and conclusions toned down due to the small sample size for the expression analyses (e.g. lines 316-318).

Additional comments and suggestions

Lines 104-124: Did you apply the same protocol to the cDNA?

Lines 127-129: In case the amplicons were sequenced on Illumina NextSeq or NovaSeq machines, poly-G trimming needs to be run.

Lines 137-139: The parameters used in AmpliSAS need to be described more clearly, otherwise the analysis is not reproducible: 1) I can’t find this parameter, did you mean “Minimum amplicon frequency (%)”? 2) Which parameter was used to remove variants found in only one individual?

Lines 167-169: I suspect that this is a formatting issue, and that you ran the MCMC chain in BEAST for 10,000,000 generations. Please make this clear and also add the frequency of logging.

Line 183: Which empirical Bayes method did you use? A reference for the method is missing.

Figure 1: It would be helpful if you marked the nine supertypes found with the empirical Bayes analysis and DAPC in the tree.

Also, the font size for branch support is too small to be readable, and the figure caption should include information on the method used to generate the tree and the branch support shown/not shown in the tree (many branches lack branch support values).

Lines 295-300: The great tit is a clear outlier in this list of average numbers of functional alleles per individual and should rather be mentioned as “but see great tit”.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The manuscript was significantly improved and now it is suitable for publication in Diversity.

Author Response

We are very glad to have your recognition. Thank you again.

Reviewer 2 Report

Huang et al. have revised their manuscript on MHC class I diversity and selection in Godlewski’s buntings and have addressed many of my comments on the previous manuscript version. A few open questions remain and I found a few new issues that the authors need to address prior to publication.

As previously, I am focusing my review on the data generation, experimental design and some of the analyses (BEAST, PAML) as well as reproducibility of the study, as my experience with the MHC region and amplicon sequencing is limited.

Major comments:

1) Several sections of the Materials and Methods and Results are still missing information that the authors need to provide:

Materials and Methods, lines 121-129: The authors have added a new section on the Illumina sequencing strategy as requested. However, the library preparation protocol is still missing and should be added to the manuscript text. Also, the section is somewhat repetitive regarding the pooling of PCR products.

Materials and Methods, lines 175-177: Please add the requirement for convergence (ESS > 200 for all parameters) to the manuscript text.

Results: The D_exp-depth values of different expressed MHC alleles should be added to the manuscript (either to the text or to Supplementary Table S1).

Results, lines 252-254 and Supplementary Figure S2: Please state whether the clustering of clade 1 with the 214 bp-long alleles is supported by high or low posterior probabilities. Branch support values are missing from the tree in Supplementary Figure S2 and from the figure caption. The figure should be enlarged and the colours made more transparent so that branch support values are visible. Also, please highlight clade 1 in the tree.
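To make the convergence requirement mentioned above (ESS > 200 for all parameters) concrete, here is a minimal sketch of an effective sample size check on MCMC traces. It uses a simple autocorrelation-based estimator that truncates the sum at the first non-positive autocorrelation; this is an assumption for illustration and not necessarily the estimator implemented in Tracer. The function names are hypothetical, and burn-in is assumed to have been removed from the traces beforehand.

```python
def effective_sample_size(trace):
    """Estimate ESS as N / (1 + 2 * sum of positive-lag autocorrelations)."""
    n = len(trace)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        raise ValueError("constant trace: ESS is undefined")
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[t] * centered[t + lag] for t in range(n - lag)) / n
        rho = acov / var
        if rho <= 0.0:  # truncate at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def low_ess_parameters(traces, threshold=200.0):
    """Return {parameter name: ESS} for every parameter below the threshold."""
    result = {}
    for name, trace in traces.items():
        ess = effective_sample_size(trace)
        if ess < threshold:
            result[name] = ess
    return result


# Toy traces: one well-mixed, one stuck in two long blocks.
traces = {
    "well_mixed": [1.0, -1.0] * 200,
    "poorly_mixed": [0.0] * 50 + [1.0] * 50,
}
print(low_ess_parameters(traces))
```

An empty result from `low_ess_parameters` would correspond to the criterion that every logged parameter exceeds the threshold; running the same check on independent chains helps confirm that they converge on the same result.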

2) I’m still missing information on the expression analyses and a proper discussion of the small sample size throughout the manuscript.

Materials and Methods: There is no section on how expression levels of MHC I alleles were determined.

Results and Discussion: The authors did not address the issue about small sample size for the expression analyses to my satisfaction. With a sample size of only seven individuals (without any information on their physiological status, their age, or their geographic origin), the lack of expression of certain alleles does not mean that the alleles are not expressed in other individuals. This should be explained in the discussion, e.g. in lines 330-336. Some sections need to be rephrased to clarify, e.g.

Line 249: Please replace “We found a total of twelve expressed alleles distributed throughout the tree…” with “We found a total of twelve expressed alleles in seven individuals distributed throughout the tree…”

Line 323: “Analysis of MHC gene expression corroborated that…” should be “Analysis of MHC class I gene expression in seven individuals corroborated that…”

Additional comments and suggestions:

Abstract, lines 8-9: “Individual diversity” can be mistaken for nucleotide diversity. Please rephrase to make clear that you refer to the mean number of alleles per individual. I guess that “medium” was a typo and should be “median”?

Materials and methods: In the last version of the manuscript, the sequencing platform was not mentioned, so I advised that in case the amplicons were sequenced on Illumina NextSeq or NovaSeq machines, poly-G trimming needs to be run on the raw sequencing reads. The authors provide the sequencing platform in the revised manuscript (Illumina MiSeq PE250). However, there seems to have been a misunderstanding, because sequencing reads generated with MiSeq platforms do not require poly-G trimming, which the authors state they performed in their reply to my comment. This is only necessary for data generated on platforms using a 2-channel method such as NovaSeq and NextSeq. MiSeq uses the traditional 4-channel method. Please clarify.

Materials and methods, lines 143-145: I was aiming for better reproducibility of the analysis with my previous comment, which was not addressed by the authors. To be able to reproduce the analysis, all parameters that are not default parameters have to be provided. Specifically, the AmpliSAS parameter to remove the variants with “maximum per amplicon frequency depth lower than 1%” is missing - on the AmpliSAS website I can only find “Minimum amplicon frequency (%)” or “Minimum variant depth” among the advanced parameters. Also, please add the AmpliSAS parameter that was used to remove variants found in only one individual (or make clear this was done with a different tool).

Discussion, lines 291-294 & lines 343-344: Please provide references for these sentences.

Throughout the manuscript, language (wording, grammar) should be improved, for example (but not only) in lines 133-135, 228-229, 280-281, 364.

Supplementary Figure S3: It would be interesting to add another panel with amino acid variations in classical alleles to visualise differences between classical alleles and non-classical alleles.
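Regarding the poly-G comment above, the following minimal sketch illustrates the distinction between 2-channel and 4-channel platforms and what trailing poly-G trimming does to a read. The platform list, the function names and the minimum run length of 10 are assumptions chosen for illustration; they are not parameters taken from the manuscript or from any specific trimming tool.

```python
import re

# Platforms named in the report as using a 2-channel method.
TWO_CHANNEL_PLATFORMS = ("nextseq", "novaseq")


def needs_poly_g_trimming(platform: str) -> bool:
    """True if the platform name matches a 2-channel instrument family."""
    name = platform.lower()
    return any(p in name for p in TWO_CHANNEL_PLATFORMS)


def trim_poly_g(seq: str, qual: str, min_run: int = 10):
    """Remove a trailing run of at least min_run G bases (and its qualities)."""
    match = re.search(r"G{%d,}\Z" % min_run, seq)
    if match is None:
        return seq, qual
    return seq[:match.start()], qual[:match.start()]


# Hypothetical read with a 12-base poly-G tail.
read, quality = "ACGT" + "G" * 12, "I" * 16
for platform in ("Illumina MiSeq PE250", "NovaSeq 6000"):
    if needs_poly_g_trimming(platform):
        print(platform, trim_poly_g(read, quality))
    else:
        print(platform, "no poly-G trimming needed")
```

In practice an established read-trimming tool would be used for this step; the sketch only shows why the step applies to NextSeq or NovaSeq data and not to MiSeq data.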

Author Response

Please see the attachment.

Author Response File: Author Response.pdf