Review Reports
- Oxana Galzitskaya1,2,
- Aleksey Lebedev1,3,* and
- Anna Kuznetsova1,*
- et al.
Reviewer 1: Anonymous
Reviewer 2: Brian Thomas Foley
Reviewer 3: Anonymous
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Abstract:
LN 27-28: This says nothing about the features. What features are you referring to?
Methods:
- Phylogenetic tools are constantly being updated with improved statistical models to ensure the reliability of data analysis. The key software, MEGA 6.0, used for this analysis is an outdated version and might not be ideal for the current data analysis. What considerations exist for choosing v6 over the latest versions 10 (available since 2018) or 12?
- How did you validate the subtype/clade assignment of these sequences? The subtype assignment in LANL and other databases is not error-proof. As a minimum, a phylogenetic tree reconstruction with database reference sequences is required to confirm these subtypes/clades. A maximum likelihood tree with aLRT or a fast-bootstrapping method should suffice for this purpose (use recently updated ML tree building software!).
Results:
- The font size in the figures is too small and illegible to read. This made it difficult and frustrating to evaluate the contents of these figures.
- The authors should simplify Figure 1 for easy comprehension. It wasn’t clear whether the bottom panel of Figure 1A is a continuation of the top panel (at a glance, this looks like MSA of three distinct protein sequences). Consider using an alternative chart for depicting the genetic diversity other than an MSA, for Figure 1; otherwise, simplify this figure.
- LN 274: This refers to Figure 2B, and not Figure 3B.
- It will be helpful to reiterate the implications of the WK values here - what low and high levels mean – LN 380-384.
- There is a significant overlap of the results/methodology of subsections 3.1 (AA variability) and 3.5 (WK variability index). Sub-section 3.5 was framed as an add-on to the in-silico modelling in 3.4, which is misleading, as the WK index was derived from the data in 3.1.
Discussion:
- LN 420-436. The discussion section is not the place for a detailed introduction/background on concepts that are not entirely relevant to the results. Reframe to focus specifically on the scope of the study – Vif and Vpr proteins!
The manuscript requires extensive editing and proofreading by a native English speaker or someone with a high proficiency in the English language. The grammatical errors and typos throughout the results and discussion sections made it difficult to read this manuscript.
Author Response
Dear Reviewer 1,
The authors are deeply grateful for your careful attention to our article and for your valuable and fair comments. We have analyzed each of your comments and remarks in detail, and we hope that the article is now clearer and better formulated.
Please find our detailed replies below.
Comment 1:
Abstract:
LN 27-28: This says nothing about the features. What features are you referring to?
Response 1:
The statement «The features of HIV-1 Group M clades in Vif and Vpr proteins have been determined: some of them might influence on functional activity» was changed into:
«In consensus sequences the substitutions, which might influence on HIV-infection progression, have been determined: in Vif - 22H (11_cpx,91_cpx) and 136P(A6,01_AE,15_01B, 59_01B, 89_BF1, 103_01B, 111_01C, 133_A6B), in Vpr – 41N (06_cpx) and 55A (B, 07_BC, 35_01D, 56_cpx, 66_cpx, 66_BF1, 71_BF1, 85_BC, 137_0107 ). In functional motifs were noted CSSs associated with changing in chemical properties of amino acids residues. These findings could be taken in account for the development of therapeutic drugs in the future».
Comment 2:
Methods:
Phylogenetic tools are constantly being updated with improved statistical models to ensure the reliability of data analysis. The key software, MEGA 6.0, used for this analysis is an outdated version and might not be ideal for the current data analysis. What considerations exist for choosing v6 over the latest versions 10 (available since 2018) or 12?
Response 2:
You are right, and we apologize: this was a typo. We used MEGA X and have corrected the text.
Comment 3:
How did you validate the subtype/clade assignment of these sequences? The subtype assignment in LANL and other databases is not error-proof. As a minimum, a phylogenetic tree reconstruction with database reference sequences is required to confirm these subtypes/clades. A maximum likelihood tree with aLRT or a fast-bootstrapping method should suffice for this purpose (use recently updated ML tree building software!).
Response 3:
To make the subtyping more reliable, we included only full-genome sequences that had previously been subtyped in GenBank. We had no reason to doubt virus variant identification based on full-genome analysis. A similar methodology was previously used by Troyano-Hernáez et al. in two studies:
- Troyano-Hernáez, P.; Reinosa, R.; Holguín, Á. Genetic Diversity and Low Therapeutic Impact of Variant Specific Markers in HIV-1 Pol Proteins. Front. Microbiol. 2022, 13, 866705. DOI: 10.3389/fmicb.2022.866705
- Troyano-Hernáez, P.; Reinosa, R.; Holguín, Á. HIV Capsid Protein Genetic Diversity Across HIV-1 Variants and Impact on New Capsid-Inhibitor Lenacapavir. Front. Microbiol. 2022, 13, 854974. DOI: 10.3389/fmicb.2022.854974
Comment 4:
The font size in the figures is too small and illegible to read. This made it difficult and frustrating to evaluate the contents of these figures.
The authors should simplify Figure 1 for easy comprehension. It wasn’t clear whether the bottom panel of Figure 1A is a continuation of the top panel (at a glance, this looks like MSA of three distinct protein sequences). Consider using an alternative chart for depicting the genetic diversity other than an MSA, for Figure 1; otherwise, simplify this figure.
Response 4:
You are right. It seems that the quality of the figures decreased during the conversion process. The original figures are high resolution (1200 dpi) and do not lose quality when scaled (up to 400%). We will draw the editor's attention to this.
The bottom panel of Figure 1A is a continuation of the top panel. This is indicated by the continuous numbering of amino acids and by the division into sections A and B, each of which corresponds to one protein sequence (Vif or Vpr). We find this figure appropriate and consider no changes necessary, apart from the figure quality.
Comment 5:
LN 274: This refers to Figure 2B, and not Figure 3B.
It will be helpful to reiterate the implications of the WK values here - what low and high levels mean – LN 380-384.
Response 5:
Thank you for pointing this out. We have made a correction.
The statement
«In our study, the following criteria were used: WK <10% were classified as low level, from 10 to 20 as an intermediate level, from 20% to 50% as above average level, and 50% or more as high level.»
was placed immediately before the description of the results and after Figure 6.
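The thresholds quoted in this response can be read as a simple lookup. The sketch below is only an illustration of that reading, not the authors' code: the function name `classify_wk` is hypothetical, and because the quoted statement does not say which level a value of exactly 10%, 20%, or 50% belongs to, the lower bound of each band is assumed to be inclusive.

```python
def classify_wk(wk_percent: float) -> str:
    """Map a WK value (in percent) to the level names quoted in the response.

    Assumption: each band includes its lower bound, so 10 is "intermediate",
    20 is "above average", and 50 is "high".
    """
    if wk_percent < 10:
        return "low"
    if wk_percent < 20:
        return "intermediate"
    if wk_percent < 50:
        return "above average"
    return "high"


# Hypothetical WK values for three alignment positions.
levels = {pos: classify_wk(wk) for pos, wk in {22: 4.2, 41: 17.5, 136: 63.0}.items()}
```

With these made-up inputs, position 22 falls in the low band, position 41 in the intermediate band, and position 136 in the high band.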
Comment 6:
There is a significant overlap of the results/methodology of subsections 3.1 (AA variability) and 3.5 (WK variability index). Sub-section 3.5 was framed as an add-on to the in-silico modelling in 3.4, which is misleading, as the WK index was derived from the data in 3.1.
Response 6:
In 3.1, we analyzed only the 37 subtypes/CRFs with more than 8 sequences. We compared the amino acid diversity across clades and, for each clade, identified the most frequently occurring amino acid residue at each position.
In 3.5, the WK coefficient was used to evaluate the variability of each amino acid position with respect to evolutionary replacements. This analysis included all HIV-1 Group M sequences.
Comment 7:
Discussion:
LN 420-436. The discussion section is not the place for a detailed introduction/background on concepts that are not entirely relevant to the results. Reframe to focus specifically on the scope of the study – Vif and Vpr proteins!
Response 7:
Thank you for the feedback. We've completely rewritten that section.
The adjustments we made are highlighted in yellow in the article.
Reviewer 2 Report
Comments and Suggestions for Authors
1). The paper states:
"Consensus sequence calculations were performed employing the Consensus Maker tool (accessible at https://www.hiv.lanl.gov/content/sequence/CONSENSUS/consensus.html, accessed on 28 December 2024), using the entirety of the acquired sequences for each respective HIV-1 clade."
and
"The study was restricted to HIV-1 group M sequences; unique recombinant forms (URFs) were excluded from consideration. To ensure data integrity, sequence inclusion was predicated on only near full-length HIV-1 genomic representation, with subsequent removal of (i) redundant sequences (defined as duplicate sequences originating from the same patient), (ii) sequences exhibiting premature termination codons within the Vif and Vpr coding regions, and (iii) sequences manifesting greater than 5.0% ambiguous amino acid residues. The resultant amino acid sequences underwent alignment against the HXB2 reference strain (GenBank accession number K03455) using the ClustalX algorithm. Post-alignment processing, including analysis and sequence trimming, was performed with MEGA v6.0 software [46]. Manual curation of the sequence alignments ultimately yielded a final dataset comprising 5286 Vif and Vpr protein sequences with documented collection dates spanning the period from 1978 to 2023. A comprehensive listing of GenBank accession numbers, geographical origins, and collection dates pertaining to the HIV-1 sequences incorporated in this study is 116 presented in Table S1."
The LANL HIV Database provides consensus sequences for each subtype and CRF (https://www.hiv.lanl.gov/content/sequence/NEWALIGN/help.html#consensus https://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html ) after selecting one sequence per patient. It can be important not to include all sequences, because in some cases researchers have sequenced hundreds of sequences from one patient and this can bias the consensus toward that patient. Likewise, if all sequences are used, the HIV-1 M group consensus can be biased toward subtype C and/or some of the circulating recombinant forms derived from subtype C, such as CRF07_BC and CRF08_BC. A more balanced consensus can therefore be made from the consensus of each subtype: a "consensus of the subtype consensus sequences". The authors have duplicated this effort.
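The sampling bias described in this comment can be illustrated with a toy calculation. The sketch below is not the LANL Consensus Maker and does not reproduce its rules: it assumes a plain plurality vote per alignment column, and the clade names, sequences, and function names are made up for illustration.

```python
from collections import Counter


def consensus(aligned):
    """Plurality consensus of equal-length aligned sequences (toy rule)."""
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # For each column, keep the most frequent residue.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def consensus_of_consensuses(by_clade):
    """Build one consensus per clade, then a consensus of those."""
    return consensus([consensus(seqs) for seqs in by_clade.values()])


# Hypothetical data: one heavily sampled clade and two sparsely sampled ones.
by_clade = {
    "B": ["MEN"] * 5,
    "A6": ["MDN"],
    "C": ["MDN"],
}
pooled = consensus([seq for seqs in by_clade.values() for seq in seqs])
balanced = consensus_of_consensuses(by_clade)
```

Pooling all sequences lets the heavily sampled clade decide the second position, whereas the two-step consensus gives each clade one vote, which is the balancing effect the comment describes.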
The LANL HIV Database currently contains 53,460 HIV-1 Vif sequences ( https://www.hiv.lanl.gov/components/sequence/HIV/search/search.html ) and 53,534 HIV-1 Vpr sequences. The vast majority are not part of complete genome sequences. Including many more sequences does not help in most analyses of diversity such as this paper. But for some purposes, such as comparing subtype B with over 1000 complete genome sequences to sub-subtype A6 with only around 70 genomes, it could help to test if A6 diversity would be higher if more sequences were included. This is not necessary for this paper, but just a suggestion for future work by the authors.
"For the first time, the comprehensive analysis of the genetic diversity of the Vif and Vpr proteins in HIV-1 group M clades was carried out. The average level of conservation in the HIV-1 group M was 86.4% for Vif and 91.3% for Vpr. In both proteins, sub-subtype A6 showed the lowest amino acid diversity, while subtype B had the highest number of amino acid residue changes. In general, the level of conservation in the functional domains in Vif and Vpr varied across clades."
and
"Moreover, the rate of inter clade diversity could depend on wide spreading. Thus, clades may be under different selective pressures in human subpopulations: different HLA alleles induce different escape-mutations [60]. Subtype B is spreading in different parts of the world: South and North America, North Africa and Middle East, Europe and Oceania, while sub-subtype A6 is distributed in Russia and the former Soviet Union [2,61]."
The major reason for this is that subtype B has roughly 1,300 complete genome sequences, one per patient, included in the analysis, and subtype B has been spreading around the world since the 1970s. Sub-subtype A6 has only about 70 complete genome sequences, one per patient, and has only been spreading in the Former Soviet Union region since the late 1980s. Several papers have suggested that differences in human HLA alleles could drive selection of different subtypes in different human populations. But human HLA distributions are not very different, and the evolution of the HIV-1 M group subtypes and sub-subtypes is mostly due to the epidemiological history of the spread of viruses around the world over time.
Minor English language use improvement:
In recently study the emergence of new recombinant forms (CRF157_A6C and CRF158_0107) were described [6,7].
should be:
In recent studies the emergence of new circulating recombinant forms (CRF157_A6C and CRF158_0107) were described [6,7].
Author Response
Dear Reviewer 2,
The authors are deeply grateful for your careful attention to our article and for your valuable and fair comments. We have tried to address all comments and amendments, and we hope that the article is now clearer and better formulated.
Please find our detailed replies below.
Comments on the Quality of English Language
The manuscript requires extensive editing and proofreading by a native English speaker or someone with a high proficiency in the English language. The grammatical errors and typos throughout the results and discussion sections made it difficult to read this manuscript.
Response
We have corrected the text and hope that it became clearer and better formulated.
Regarding the statements quoted from the paper:
Comments 1:
1."Consensus sequence calculations were performed employing the Consensus Maker tool (accessible at https://www.hiv.lanl.gov/content/sequence/CONSENSUS/consensus.html, accessed on 28 December 2024), using the entirety of the acquired sequences for each respective HIV-1 clade."
and
"The study was restricted to HIV-1 group M sequences; unique recombinant forms (URFs) were excluded from consideration. To ensure data integrity, sequence inclusion was predicated on only near full-length HIV-1 genomic representation, with subsequent removal of (i) redundant sequences (defined as duplicate sequences originating from the same patient), (ii) sequences exhibiting premature termination codons within the Vif and Vpr coding regions, and (iii) sequences manifesting greater than 5.0% ambiguous amino acid residues. The resultant amino acid sequences underwent alignment against the HXB2 reference strain (GenBank accession number K03455) using the ClustalX algorithm. Post-alignment processing, including analysis and sequence trimming, was performed with MEGA v6.0 software [46]. Manual curation of the sequence alignments ultimately yielded a final dataset comprising 5286 Vif and Vpr protein sequences with documented collection dates spanning the period from 1978 to 2023. A comprehensive listing of GenBank accession numbers, geographical origins, and collection dates pertaining to the HIV-1 sequences incorporated in this study is 116 presented in Table S1."
The LANL HIV Database provides consensus sequences for each subtype and CRF (https://www.hiv.lanl.gov/content/sequence/NEWALIGN/help.html#consensus https://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html ) after selecting one sequence per patient. It can be important not to include all sequences, because in some cases researchers have sequenced hundreds of sequences from one patient and this can bias the consensus toward that patient. Likewise, if all sequences are used, the HIV-1 M group consensus can be biased toward subtype C and/or some of the circulating recombinant forms derived from subtype C, such as CRF07_BC and CRF08_BC. A more balanced consensus can therefore be made from the consensus of each subtype: a "consensus of the subtype consensus sequences". The authors have duplicated this effort.
2. The LANL HIV Database currently contains 53,460 HIV-1 Vif sequences ( https://www.hiv.lanl.gov/components/sequence/HIV/search/search.html ) and 53,534 HIV-1 Vpr sequences. The vast majority are not part of complete genome sequences. Including many more sequences does not help in most analyses of diversity such as this paper. But for some purposes, such as comparing subtype B with over 1000 complete genome sequences to sub-subtype A6 with only around 70 genomes, it could help to test if A6 diversity would be higher if more sequences were included. This is not necessary for this paper, but just a suggestion for future work by the authors.
Response 1:
1. We agree, but we chose not to do so, for the following reasons:
- According to their description, the consensus sequences in the LANL HIV Database were built using all available HIV-1 sequences through the end of 2019. Our dataset also included later data (after 2019 until the end of 2023).
- Unfortunately, the description of the consensus sequences does not make clear whether they were built only from near full-length HIV-1 genomes or from all available HIV-1 sequences.
- It is difficult to judge whether sequences with stop codons or, for example, sequences with more than 5.0% ambiguous amino acids had been excluded from the dataset.
We took all of this into account when constructing the consensus sequences, as was done previously in the following studies:
Troyano-Hernáez P., Reinosa R., Holguín Á. HIV Capsid Protein Genetic Diversity Across HIV-1 Variants and Impact on New Capsid-Inhibitor Lenacapavir. Front. Microbiol. 2022;13:854974. doi: 10.3389/fmicb.2022.854974;
Schlösser M, Kartashev VV, Mikkola VH, et al. HIV-1 Sub-Subtype A6: Settings for Normalised Identification and Molecular Epidemiology in the Southern Federal District, Russia. Viruses. 2020;12(4):475. Published 2020 Apr 22. doi:10.3390/v12040475
- We agree. That is why we included only one sequence per individual (the earliest sequence) (LN 106-107). The Group M consensus sequence was obtained from the consensus sequences of all the HIV-1 clades, that is, as a "consensus of the subtype consensus sequences" (LN 126-127).
2. You are absolutely right. Including partial genomic sequences in the analysis may lead to subtyping errors, and including such sequences in the dataset does not help in most analyses of diversity such as this paper. Nevertheless, we will consider your suggestion for future work.
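The inclusion rules discussed in this response (one sequence per individual, taking the earliest; no premature stop codons; no more than 5.0% ambiguous amino acids) can be sketched as a small filter. This is a hedged illustration, not the authors' pipeline: the `Record` fields, the function names, the use of `X` as the only ambiguity symbol, the use of `*` for a stop codon, and the order in which the rules are applied are all assumptions.

```python
from dataclasses import dataclass

AMBIGUOUS = {"X"}  # assumption: only 'X' is counted as an ambiguous residue


@dataclass
class Record:
    accession: str   # hypothetical identifiers, not real GenBank accessions
    patient_id: str
    year: int        # collection year
    protein: str     # translated Vif or Vpr; '*' marks a stop codon


def has_premature_stop(protein: str) -> bool:
    """True if a stop symbol occurs anywhere before the trailing stop(s)."""
    return "*" in protein.rstrip("*")


def ambiguous_fraction(protein: str) -> float:
    residues = protein.rstrip("*")
    if not residues:
        return 1.0  # treat an empty translation as fully ambiguous
    return sum(r in AMBIGUOUS for r in residues) / len(residues)


def filter_records(records, max_ambiguous=0.05):
    # Step 1: keep the earliest sequence per patient.
    earliest = {}
    for rec in records:
        kept = earliest.get(rec.patient_id)
        if kept is None or rec.year < kept.year:
            earliest[rec.patient_id] = rec
    # Steps 2 and 3: drop premature stops and sequences above the ambiguity cutoff.
    return [
        rec for rec in earliest.values()
        if not has_premature_stop(rec.protein)
        and ambiguous_fraction(rec.protein) <= max_ambiguous
    ]


records = [
    Record("S1", "p1", 2001, "MENRW*"),
    Record("S2", "p1", 1999, "MENRWQ*"),        # earlier sample from the same patient
    Record("S3", "p2", 2005, "ME*RW"),          # internal stop codon
    Record("S4", "p3", 2010, "MX" + "A" * 8),   # 10% ambiguous
    Record("S5", "p4", 2012, "MX" + "A" * 18),  # exactly 5% ambiguous
]
kept_accessions = [rec.accession for rec in filter_records(records)]
```

One consequence of applying the per-patient step first is that a patient whose earliest sequence fails a later check is dropped entirely; reversing the order would instead fall back to a later sequence from that patient.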
Comments 2: The major reason for this is that subtype B has roughly 1,300 complete genome sequences, one per patient, included in the analysis, and subtype B has been spreading around the world since the 1970s. Sub-subtype A6 has only about 70 complete genome sequences, one per patient, and has only been spreading in the Former Soviet Union region since the late 1980s. Several papers have suggested that differences in human HLA alleles could drive selection of different subtypes in different human populations. But human HLA distributions are not very different, and the evolution of the HIV-1 M group subtypes and sub-subtypes is mostly due to the epidemiological history of the spread of viruses around the world over time.
Response 2: Thank you for pointing this out. We agree with this comment. Therefore, we have made the correction (addition).
Comments 3:
Minor English language use improvement:
In recently study the emergence of new recombinant forms (CRF157_A6C and CRF158_0107) were described [6,7].
should be:
In recent studies the emergence of new circulating recombinant forms (CRF157_A6C and CRF158_0107) were described [6,7].
Response 3:
Agree. We have made the correction.
The adjustments we made are highlighted in yellow in the article
Reviewer 3 Report
Comments and Suggestions for Authors
Vpr and Vif are important accessory proteins during HIV-1 infection. With the increasing number of global infections, the genetic diversity of HIV-1 has also been gradually rising. Therefore, Galzitskaya et al. analyzed the genetic diversity of Vif and Vpr using data from the HIV Sequence Database maintained by the Los Alamos National Laboratory (available at www.hiv.lanl.gov/, accessed on 28 December 2024). This study demonstrates a certain degree of innovativeness, but there are still some issues that need to be addressed before publication.
1.The authors conducted a single-center study, and comparisons with studies from other countries or regions should be included in the discussion, particularly regarding the genetic diversity associated with Vif and Vpr.
2.The author predicted the interaction between vif mutants and APOBEC3G using bioinformatics software, but whether there are specific changes in their interaction should preferably undergo preliminary experimental verification.
Author Response
Dear Reviewer 3,
The authors are deeply grateful for your careful attention to our article and for your valuable and fair comments. We have tried to address all comments and amendments, and we hope that the article is now clearer and better formulated.
Please find our detailed replies below.
Comments 1: The authors conducted a single-center study, and comparisons with studies from other countries or regions should be included in the discussion, particularly regarding the genetic diversity associated with Vif and Vpr.
Response 1: Thank you for the feedback. We have completely rewritten the beginning of the "Discussion" section, where we now review the existing local studies of the genetic variability of the Vif and Vpr proteins.
Comments 2: The author predicted the interaction between vif mutants and APOBEC3G using bioinformatics software, but whether there are specific changes in their interaction should preferably undergo preliminary experimental verification.
Response 2: Thank you for your comment. It's certainly very interesting and necessary to test the predicted interaction experimentally. We, or other scientists interested in this problem, will verify these interaction results.
The adjustments we made are highlighted in yellow in the article
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have satisfactorily addressed most of my comments, except comment #3. The response given here for not addressing this specific comment is inadequate to ensure the robustness of assessing genetic diversity. For computational constraints, I suggest validating a representative subset of your dataset - the sequences used for generating Table 1. Refer to Comment 3 for the validation and include the results as supplementary data and highlight in the main text that "the phylogenetic reconstruction method using .... revealed that XX/YY were correctly classified as their database-assigned subtypes".
Comment 3:
How did you validate the subtype/clade assignment of these sequences? The subtype assignment in LANL and other databases is not error-proof. As a minimum, a phylogenetic tree reconstruction with database reference sequences is required to confirm these subtypes/clades. A maximum likelihood tree with aLRT or a fast-bootstrapping method should suffice for this purpose (use recently updated ML tree building software!).
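The "XX/YY" summary requested above is a simple concordance count once tree-based labels exist. The sketch below assumes two hypothetical mappings from sequence identifier to subtype, one taken from the database annotation and one read off a reference-sequence tree; it does not build the tree, and the identifiers and labels are invented for illustration.

```python
def subtype_concordance(database_subtype, tree_subtype):
    """Return (agreeing, compared) for sequences present in both mappings."""
    shared = database_subtype.keys() & tree_subtype.keys()
    agreeing = sum(database_subtype[seq] == tree_subtype[seq] for seq in shared)
    return agreeing, len(shared)


# Hypothetical labels for three sequences.
database_subtype = {"seq1": "B", "seq2": "A6", "seq3": "C"}
tree_subtype = {"seq1": "B", "seq2": "A6", "seq3": "07_BC"}

agreeing, compared = subtype_concordance(database_subtype, tree_subtype)
summary = f"{agreeing}/{compared} were classified as their database-assigned subtypes"
```

Sequences missing from either mapping are left out of the denominator, so the reported total should be stated alongside the size of the validated subset.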
Comments on the Quality of English Language
The manuscript requires extensive editing and proofreading by a native English speaker or someone with a high proficiency in the English language. The grammatical errors and typos throughout the results and discussion sections made it difficult to read this manuscript.
Author Response
Dear Reviewer,
We are deeply touched by the attention with which you have treated our work. Following your comments, we carefully corrected the wording and Figures 2 and 3.
As for phylogenetic reconstruction, we accepted the subtype annotations in the LANL HIV database at face value. This is a common methodological approach among studies that require subtype-specific datasets.
The following studies use this approach:
- Kupperman MD, Leitner T, Ke R. A deep learning approach to real-time HIV outbreak detection using genetic data. PLoS Comput Biol. 2022;18(10):e1010598. Published 2022 Oct 14. doi:10.1371/journal.pcbi.1010598
- Gartland M, Arnoult E, Foley BT, et al. Prevalence of gp160 polymorphisms known to be related to decreased susceptibility to temsavir in different subtypes of HIV-1 in the Los Alamos National Laboratory HIV Sequence Database. J Antimicrob Chemother. 2021;76(11):2958-2964. doi:10.1093/jac/dkab257
- Linchangco GV Jr, Foley B, Leitner T. Updated HIV-1 Consensus Sequences Change but Stay Within Similar Distance From Worldwide Samples. Front Microbiol. 2022;12:828765. Published 2022 Jan 31. doi:10.3389/fmicb.2021.82876
- Troyano-Hernáez P, Reinosa R, Holguín Á. HIV Capsid Protein Genetic Diversity Across HIV-1 Variants and Impact on New Capsid-Inhibitor Lenacapavir. Front Microbiol. 2022;13:854974. Published 2022 Apr 12. doi:10.3389/fmicb.2022.854974
- Dampier W, Berman R, Nonnemacher MR, Wigdahl B. Computational analysis of cas proteins unlocks new potential in HIV-1 targeted gene therapy. Front Genome Ed. 2024;5:1248982. Published 2024 Jan 4. doi:10.3389/fgeed.2023.1248982
Additionally, to exclude any possibility of incorrect subtyping, we selected only full-genome sequences and extracted the genome fragments under analysis from them.
On this basis, we consider it acceptable to use the subtype information from the LANL HIV database without further verification of the assigned subtypes.