Genetic Features of HIV-1 Integrase Sub-Subtype A6 Predominant in Russia and Predicted Susceptibility to INSTIs

The increasing use of the integrase strand transfer inhibitor (INSTI) class for the treatment of HIV-infection has pointed to the importance of analyzing the features of HIV-1 subtypes for an improved understanding of viral genetic variability in the occurrence of drug resistance (DR). In this study, we have described the prevalence of INSTI DR in a Russian cohort and the genetic features of HIV-1 integrase sub-subtype A6. We included 408 HIV infected patients who were not exposed to INSTI. Drug resistance mutations (DRMs) were detected among 1.3% of ART-naïve patients and among 2.7% of INSTI-naïve patients. The prevalence of 12 polymorphic mutations was significantly different between sub-subtypes A6 and A1. Analysis of the genetic barriers determined two positions in which subtype A (A1 and A6) showed a higher genetic barrier (G140C and V151I) compared with subtype B, and one position in which subtypes A1 and B displayed a higher genetic barrier (L74M and L74I) than sub-subtype A6. Additionally, we confirmed that the L74I mutation was selected at the early stage of the epidemic and subsequently spread as a founder effect in Russia. Our data have added to the overall understanding of the genetic features of sub-subtype A6 in the context of drug resistance.


Introduction
It was estimated that more than 1 million individuals were living with HIV-1 in the Russian Federation at the end of 2019, which is 0.7% of the total population of the country [1].
Advances in combination antiretroviral therapy (cART) have improved treatment effectiveness for people living with HIV, increasing life expectancy and quality of life.
The HIV-1 integrase strand transfer inhibitor (INSTI) class of antiretroviral drugs is the latest to be approved for treatment and is the favored class recommended by different international guidelines, including the Russian treatment guidelines since 2017, as part of first-line treatment and salvage regimens [2][3][4][5][6]. resistance mutations, native protein activity, structure, and the function of the drug-mediated inhibition of the enzyme, which may have important implications for INSTI therapy [13,22,[34][35][36][37][38][39][40].
Therefore, the expanded use of INSTIs in Russia makes it urgent to analyze the HIV-1 integrase gene. Following this need, the aims of this study were to describe INSTI resistance profiles among INSTI-naïve patients, to analyze the prevalence of naturally occurring polymorphisms and genetic barriers, and also to investigate the dispersal patterns of A6 resistance mutations by means of phylogenetic and phylodynamic analysis.

Study Population
We analyzed integrase sequences that had been obtained as part of routine clinical care from 408 HIV-infected patients between 2007 and 2019 in Russia; 225 treatment-naïve patients and 183 INSTI-naïve patients with virological failure to cART were included. This study was approved by the Ethics Review Committee of the Central Research Institute of Epidemiology (Moscow, Russia).

RNA Extraction and Sequencing Method
An AmpliSens ® HIV-Resist-Seq kit (Central Research Institute of Epidemiology, Moscow, Russia) was used for RNA extraction from the plasma samples and sequencing of the HIV pol-gene region coding integrase (4230-5093 bp according to HXB-2, GenBank accession number K03455).

HIV-1 INSTI Resistance Profile Genotyping
The Stanford HIV Resistance Database (HIVdb Program v 8.9-1 and Calibrated Population Resistance Tool) was used to describe and interpret the INSTI resistance profiles.

Polymorphism Analysis
For this analysis, we used the treatment-naïve A6 sequences obtained in this study and worldwide HIV-1 treatment-naïve A1 sequences (n = 100) (4230-5093 bp) obtained from the Los Alamos HIV Database (www.hiv.lanl.gov accessed 23 March 2020).
Naturally occurring polymorphisms of subtype were defined as any substitution that occurred with a frequency of ≥1% in sequences of this subtype compared with the HXB2 HIV-1 clade B sequence (GenBank accession number K03455). All other positions were defined as conserved. Highly polymorphic positions were defined as positions that had substitutions detected in more than 50% of sequences of this subtype.
Fisher's exact test was used to compare the proportion of amino acid substitutions between groups. Statistical significance was defined as p-values < 0.05.

Genetic Barrier Analysis
For this analysis, we used the treatment-naïve A6 sequences obtained in this study and worldwide HIV-1 treatment-naïve A1 sequences (n = 100) and B sequences (n = 2577) obtained from the Los Alamos HIV Database (www.hiv.lanl.gov accessed 23 March 2020). The calculations of genetic barriers were performed as published previously [32]. According to A.M. Vandamme [44] the transitions (replacement A→G and C↔T) occur 2.5 times more frequently than transversions (replacement A↔C, Viruses 2020, 12, 838 4 of 13 A↔T, G↔C, G↔T). Therefore, transitions were scored as 1 and transversions were scored as 2.5. Because of the high rate of APOBEC-mediated hypermutations (G to A) this type of transition was scored as 0.2 [45]. The missing nucleotides were scored as −777. Python v 3.7 script was used to score the genetic barrier to the drug resistance-associated mutations. The genetic barrier was the sum of the scores for each studied amino acid position.
Pairwise differences between subtypes A1, A6, and B sequences were analyzed with the Mann-Whitney test and the Benjamini-Hochberg method to correct for multiple hypothesis testing.
Phylogenetic trees were estimated from the underlying nucleotide sequences using the approximate maximum likelihood (ML) method with bootstrap evaluation under the generalized time reversible (GTR) model as a nucleotide substitution model-including a Γ distributed rate of heterogeneity among sites-as implemented in RaxML [46]. Further phylogenetic analysis was performed in FastTree v2.1 [47] to verify our results. Tree visualization and annotation were performed using FigTree v1.4.2 and iTOL (https://itol.embl.de/).
The phylodynamic analysis was conducted using a Bayesian approach as implemented in BEAST v2.2 [48]. We analyzed sequences found within the Russian samples by using the GTR as the nucleotide substitution model with gamma heterogeneity and an uncorrelated relaxed clock model with lognormal distribution. A Markov chain Monte Carlo (MCMC) analysis was run for 25 × 106 generations and sampled every 5000 steps, with the first 10% of samples being discarded as burn-in. The MCMC convergence and effective sample sizes (ESS) were confirmed using Tracer v1.5. A consensus tree was built, and the distribution was assessed from the posterior tree using TreeAnnotator v1.8 23.

GenBank Accession Numbers
The HIV-1 integrase sequences generated from this study are available in the NCBI database with GenBank accession numbers MT382663-MT383070.

Study Population
We studied the plasma samples of 408 HIV-1-infected patients from different regions of Russia. The main characteristics (demographic, clinical, and laboratory data) of the study population are described in Table S1. The average age of the patients was 32 years (IQR 26-40 years), and the majority of patients were male (57.6%). The dominant transmission routes were heterosexual contact (40.9%), intravenous drug use (IDU, 25.0%), mother-to-child transmission (MTCT, 4.9%), and men who have sex with men (MSM, 5.6%).
Among ART-naïve patients, DRMs were detected in only 3 samples (1.3%). The major mutation, Q146P, was accompanied by the accessory mutation G163R in one patient and by itself in the second patient. The third patient had only the accessory mutation G163R. Thus, according to interpretation-predicted DR from the HIVdb Program, the first patient had high-level EVG and low-level RAL resistance, the second patient had high-level EVG resistance, and the third patient had low-level resistance to EVG and RAL. No patients in this group had mutations on the list of DRMs used for surveillance of transmitted HIV-1 DR (SDRM) [49].

Prevalence of Naturally Occurring Integrase Polymorphisms
We determined 16 polymorphic mutations detected in more than 50% (highly polymorphic) of A6 sequences, which more often belong to the part of the catalytic core domain.
In addition, we analyzed highly polymorphic positions for the A1 sequences (n = 100) as the closest and most studied clade of subtype A. We compared positions that were highly polymorphic for at least one of the A6 or A1 clades and identified 19 amino acid positions and 20 substitutions in them ( Figure 2, Table S2).
In 11 positions (14, 20, 74, 119, 124, 126, 134, 218, 234, 255, and 283), A6 sequences were significantly different (p < 0.05) from the A1 clade, which can be used as a sub-subtype-specific marker to distinguish them. The most polymorphic positions were identified in the N-terminal domain (20/50, 40%), none of which were in the conserved zinc finger motif. In the catalytic core domain, 39 polymorphic positions (39/162, 24.1%) were determined, all of which were out of the DDE catalytic triad. In the C-terminal domain, 16 polymorphic positions were identified (16/76, 21.1%). Additionally, the amino acids in integrase positions that were identified as critical for interaction with the essential HIV integration cofactor LEDGF/p75 linking integrase to chromatin (128,130,131,161,165,166,168,170,172,173,214, and 216) were determined to be conserved [9]. We determined 16 polymorphic mutations detected in more than 50% (highly polymorphic) of A6 sequences, which more often belong to the part of the catalytic core domain. Five highly polymorphic mutations, R20K [50], I72V [9,33,51], L74I [7,[23][24][25][26], S119P [51,52], and V201I [11,26,38,53], were frequently reported in regard to a small reduction in replication capacity relative to the wild type. It has been reported that in the absence of major mutations, all these polymorphic mutations had little, if any, effect on drug susceptibility in vitro, thus suggesting a secondary role for viral fitness rescue and increasing resistance.
In addition, we analyzed highly polymorphic positions for the A1 sequences (n = 100) as the closest and most studied clade of subtype A. We compared positions that were highly polymorphic for at least one of the A6 or A1 clades and identified 19 amino acid positions and 20 substitutions in them ( Figure 2, Table S2).
A comparative analysis of the sub-subtypes A1 (n = 100), A6 (n = 193), and subtype B (n = 2577) sequences showed that the genetic barrier was similar at almost all positions.
For two positions (140 and 151), subtype A (A1 and A6) had a significantly different genetic barrier from subtype B (Table 1).

Figure 2.
Prevalence of integrase highly polymorphic mutations in A1 and A6 viral clades from treatment-naïve patients. The statistically significant differences (p < 0.05) between A1 and A6 are indicated by *, and the unique A6 polymorphic mutations are indicated by **.
For two positions (140 and 151), subtype A (A1 and A6) had a significantly different genetic barrier from subtype B (Table 1).
Position 140, in most sequences of A1 and A6, is encoded by GGG/GGA, and in subtype B, there is a GGC/GGT codon. These codons determine that, with respect to subtype B, the calculated barrier for the G140C mutation, which is a major mutation and associated with resistance to all INSTIs, was equal to 2.5, and a score of 5 was calculated for subtype A.
Position 151 in most sequences of subtype B was codon GTA, and codon GTG was present in sequences of subtype A. Therefore, for the V151I mutation, which could play a role in the resistance to INSTIs, for both sub-subtypes A1 and A6, the genetic barrier was also higher than that of subtype B (0.4 versus 0.2, respectively).
Additionally, the sequences of sub-subtype A6 were significantly different from those in A1 and B at position 74. At this position, for the vast majority of sequences of sub-subtype A6, the codon ATA was present, which determined the genetic barrier for L74M (score of 1) and L74I (score of 0). In most sequences of sub-subtype A1 and subtype B, this position was encoded by the CTG, which determined the genetic barrier to 74M and 74I with scores of 2.5 and 2.7, respectively. For position L74F, A6 sequences had a score of 5, while for sequences of A1 and B, the barrier had a score of 3.5.
Comparative analysis of codon 74 among different genetic variants of subtype A showed that the frequency of substitutions was significantly different (p < 0.05) between the A6 clade and other sub-subtypes A. Sub-subtype A6 and other genetic variants of subtype A included L74L in 3.7% and 86.4% of cases, L74I in 94.9% and 9.6%, L74M in 1.2% and 3%, and L74V in 0.1% and 1%, respectively.
The phylogenetic tree is shown in Figure 3. Phylogenetic analysis suggests that A6 sequences formed a well-defined monophyletic subcluster among different sub-subtypes of subtype A. Within the A6 subcluster, there are a few branches with L74L, suggesting that in a few cases, L74I can revert to wild type. In addition to the majority of samples from Russia, clade A6 contains sequences from Uzbekistan, Tajikistan, Kazakhstan, Armenia, Ukraine, Georgia, and Belarus, 7 sequences from Cyprus, and single sequences from England and Italy.
Interestingly, there was a small separate cluster of viruses with L74M, including 7 samples from Uzbekistan, one sample from Russia (Tomsk), and one sample from Kuwait. However, analysis of the genetic distance between these samples showed that only 7 sequences from Uzbekistan differ by less than 1% (0.2-1.0%), which suggests an epidemiological link between the samples.
Notably, two outlier sequences from Italy (A6.IT.2002.60000. EU861977) and Russia (A6.RU.2000.RU00051. EF545108) with L74L were placed close to the root of the A6 tree. These sequences can be considered as close links to the potential founder of the monophyletic clade of sub-subtype A6. The sequences with L74L belonged to heterosexuals, identified before the burst of the large epidemic among the injectors. Clearly, the viruses with the L74I mutation spread as a result of a founder effect among injectors across FSU countries.
Additionally, given that the majority of A1 sequences, which are the source of A6, do not have any substitutions in codon 74, the overall picture is that L74I was selected at the early stage of the epidemic among injectors in Russia and subsequently spread as a founder effect in Russia and other regions.
Viruses 2020, 12, x FOR PEER REVIEW  8 of 13 Uzbekistan, Tajikistan, Kazakhstan, Armenia, Ukraine, Georgia, and Belarus, 7 sequences from Cyprus, and single sequences from England and Italy. Interestingly, there was a small separate cluster of viruses with L74M, including 7 samples from Uzbekistan, one sample from Russia (Tomsk), and one sample from Kuwait. However, analysis of the genetic distance between these samples showed that only 7 sequences from Uzbekistan differ by less than 1% (0.2-1.0%), which suggests an epidemiological link between the samples.
Notably, two outlier sequences from Italy (A6.IT.2002.60000. EU861977) and Russia (A6.RU.2000.RU00051. EF545108) with L74L were placed close to the root of the A6 tree. These sequences can be considered as close links to the potential founder of the monophyletic clade of subsubtype A6. The sequences with L74L belonged to heterosexuals, identified before the burst of the large epidemic among the injectors. Clearly, the viruses with the L74I mutation spread as a result of a founder effect among injectors across FSU countries.
Additionally, given that the majority of A1 sequences, which are the source of A6, do not have any substitutions in codon 74, the overall picture is that L74I was selected at the early stage of the epidemic among injectors in Russia and subsequently spread as a founder effect in Russia and other regions.   The phylodynamic analysis revealed that the time of the most recent common ancestor (tMRCA) for sub-subtype A6 in Russia was 1998 (95% HPD: 1989HPD: -2003. tMRCA is considered the approximate time of infection of the potential founder of the sub-subtype A6 epidemic, which was determined before its expansion among IDUs in Russia.

Discussion
The development and expanding use of INSTIs in ARV-naïve and ARV-experienced patients make it increasingly important to survey INSTI resistance. Moreover, this is particularly important for patients with non-B viruses because of the lack of data about features of genetic variants and virological outcomes of treatment in these patients.
In this study, we have described the genetic features of HIV-1 integrase sub-subtype A6, which dominated Russia and spread successfully across FSU countries and evolved into one of the fastest-growing epidemics in the world [39][40][41][42]. As expected, the most frequent viral genetic variant for our data samples was sub-subtype A6 (85.8%).
It should be noted that the dominant transmission routes for the studied patients were sexual contact (46.5%), and only 25% were in the traditional risk group (intravenous drug users), which is typical for the Russian epidemic since 2015.
Because the use of INSTIs was introduced recently in Russia [6], we found a low level of INSTI DR. Overall, the frequency of at least one INSTI resistance mutation was 1.3% and 2.7% in treatment-naïve and INSTI-naïve patients, respectively. SDRMs were detected only in 1 INSTI-naïve patient. These data suggest that the implementation of INSTI drugs for the treatment of HIV-infected ARV-naïve and ARV-experienced patients in Russia will be successful.
However, not only DRMs can affect sensitivity changes to drugs. Previous studies have shown that the virological outcomes of a therapy can be related to subtype-specific differences, pointing to the importance of analyzing the features of HIV subtypes for an improved understanding of viral subtype variability in the occurrence of DR [13,15,16,22,[32][33][34][35][36][37][38].
Subtype-specific differences related to polymorphic mutations that can affect viral fitness and can influence INSTI efficacy, even in the absence of major DRMs, can be emerging threats to the success of cART.
Remarkably, the prevalence of 12 polymorphic mutations (K14R, R20K, L74I, S119P, T124S, T124A, V126F, G134N, T218I, L234I, S255N, and S283G) was significantly different between sub-subtypes A6 and A1. This characteristic can be used as a marker of the A6 variant, allowing discrimination of this sub-subtype from A1 viruses. Taking into account that the A6 virus dominating Russia may be misclassified as A1, these gene features allow us to clarify the viral sub-subtype in samples and correct the data about the circulation of different A-subtype viruses.
Naturally occurring polymorphisms can also impact the genetic barrier to DR by influencing the selection of resistance mutations, enzymatic activity, and replicative capacity [12].
The genetic barrier, defined as the number of mutations required to overcome drug selective pressure, is one of the important factors in the development of HIV DR [32].
Analysis of genetic barriers among genetic variants A1, A6, and B displayed a similar genetic barrier for almost all positions, which is consistent with published data [32]. In a large number of sequences of subtype A, we also showed that subtype B differs from subtype A at positions 140 and 151, and subtype B had a lower genetic barrier to the occurrence of mutations G140C and V151I, which could play a role in INSTI resistance.
Additionally, for position 74, we showed a lower genetic barrier for L74M and L74I mutations in sub-subtype A6. This suggests that important differences at position 74 may lead to a lower genetic barrier for the drug resistance pathway in the A6 sub-subtype epidemics.
The L74I mutation appears to be a key characteristic of sub-subtype A6. Therefore, we estimated the temporal origin and phylogenetic characteristics of the sub-subtype A6 viruses with the L74I mutation and showed that viruses with L74I spread as a result of a founder effect among IDUs across FSU countries, validating our initial hypothesis.
The origin and dispersal of L74I are identical, and the tMRCA of L74I was estimated in 1998, which is in accordance with the previously estimated dates of the A6 epidemic among the injectors.
The findings of this article show the important role of features of viral genotypes in drug resistance and also indicate that there is still a need for a better understanding of resistance mechanisms to INSTIs. The potential contribution of naturally occurring polymorphisms, particularly the L74I mutation, in HIV-1 integrase to the evolution of resistance under the selective pressure of INSTIs may have clinical and virological implications. A lower genetic barrier for L74I mutations in sub-subtype A6 can lead to quicker development of drug resistance than subtype B, which is especially important in light of virological failures in patients with the L74I mutation from Russia in the ATLAS and FLAIR studies [15,16]. In vitro studies and studies in clinical practice are necessary to determine what facilitates INSTI resistance in non-B subtypes and how baseline polymorphic mutations impact clinical or virological outcomes.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/12/8/838/s1, Table S1: Characteristics of the patients, Table S2: The prevalence of integrase highly polymorphic mutations in A1 and A6 viral clades in treatment-naïve patients, Table S3: The prevalence of codons in sequences A1, A6, and B and the number of mutations required to obtain resistant codons.