Potential Associations of Mutations within the HIV-1 Env and Gag Genes Conferring Protease Inhibitor (PI) Drug Resistance

: An increasing number of patients in Africa are experiencing virological failure on a second-line antiretroviral protease inhibitor (PI)-containing regimen, even without resistance-associated mutations in the protease region, suggesting a potential role of other genes in PI resistance. Here, we investigated the prevalence of mutations associated with Lopinavir/Ritonavir (LPV/r) failure in the Envelope gene and the possible coevolution with mutations within the Gag-protease ( gag-PR ) region. Env and Gag-PR sequences generated from 24 HIV-1 subtype C infected patients failing an LPV/r inclusive treatment regimen and 344 subtype C drug-naïve isolates downloaded from the Los Alamos Database were analyzed. Fisher’s exact test was used to determine the differences in mutation frequency. Bayesian network probability was applied to determine the relationship between mutations occurring within the env and gag-PR regions and LPV/r treatment. Thirty-ﬁve mutations in the env region had signiﬁcantly higher frequencies in LPV/r-treated patients. A combination of Env and Gag-PR mutations was associated with a potential pathway to LPV/r resistance. While Env mutations were not directly associated with LPV/r resistance, they may exert pressure through the Gag and minor PR mutation pathways. Further investigations using site-directed mutagenesis are needed to determine the impact of Env mutations alone and in combination with Gag-PR mutations on viral ﬁtness and LPV/r efﬁcacy.


Introduction
While the dispensing of HIV-1 protease inhibitors (PIs) as part of a routine secondline regimen has been a significant turning point in the management of HIV-1 [1], the development of resistance to PIs is also increasing [2].
PIs are known to inhibit the activity of the HIV-1 protease (PR) enzyme responsible for the proteolytic processing of HIV structural Gag and enzymatic Pol polyprotein components [3]. The viral protease (PR) enzyme cleaves the Gag precursor protein (Pr55 Gag ) and the Gag-Pol precursor protein (Pr160 Gag-Pol ), resulting in virion maturation [3][4][5]. This proteolytic process prevents the formation and maturation of infectious HIV particles. The efficacy of the PI is limited by the emergence of resistance mutations that are potentially caused by poor compliance, subtherapeutic systemic levels of the drug, or prolonged treatment with one PI-based regimen during virologic failure [6].
Interestingly, >20% of patients failing a PI-based regimen do not harbor any resistanceassociated mutations in the protease (PR) domain [7][8][9]. While non-adherence has been identified as a key player, some studies have suggested that Gag mutations can indirectly affect the Gag cleavage site and drive PI resistance without any mutations in the protease region [10,11]. These mutations were commonly found close to the cleavage sites, although non-cleavage site mutations have also been linked with PI resistance [12,13]. Verheyen et al. (2006) have linked Gag precursor p7-p1 (A431V, K436R, and I437V) and p1-p6 (L449P, P452S, and P453L) cleavage site mutations with resistance to PIs [14][15][16]. These mutations are thought to compensate for any enzymatic impairment of protease caused by the loss of van der Waals interactions between the inhibitor and binding sites [12,14,15,[17][18][19][20].
Others have suggested that the mechanism for PI failure may involve mutations in the Envelope (env) gene [5,21]. Coetzer et al. (2017) reported Gp41 mutations from both the Heptad repeat (HR) (607T and 641L) and cytoplasmic tail (CT) (721I) that potentially contribute to PI failure [21] Two studies that looked at virological failure during PI exposure both reported that mutations in the CT impacted PI susceptibility [22,23]. Furthermore, Env mutations have been suggested to promote cell to cell transmission, leading to highlevel drug resistance mutations in ARV target genes; by doing so, they increase the level of resistance to a broad panel of ARVs in vitro [24,25]. The HIV-1 Envelope is said to concurrently evolve to escape from both neutralizing antibodies (NABs) and ARVs [25].
Although studies have reported the emergence of Gag and Env mutations in patients failing PIs [21][22][23], to date, no one has reported on the coevolution of Gag and Env in these patients. Here, we investigated the prevalence of Env mutations associated with PI treatment failure and the possibility of coevolution with mutations within Gag, and protease, in full-length Env and Gag-PR sequences from HIV-1 subtype C infected individuals failing a PI-inclusive regimen.

Study Cohort
This retrospective study used 24 stored plasma samples obtained from virologically failing PI-treated patients enrolled in the Protease Cleavage Site (PCS) study (2009)(2010)(2011)(2012)(2013) at McCord and King Edward VIII hospitals Durban, South Africa [26]. All enrolled patients received Lopinavir/Ritonavir therapy for at least six months and had plasma HIV-1 RNA levels > 1000 copies/mL. In addition, sequences from 344 subtype C drug-naïve isolates were downloaded from the Los Alamos HIV-1 Database (http://hiv-web.lanl.gov) accessed on 10 September 2020. Written informed consent was obtained from all study participants. The Biomedical Research Ethics Committee of the University of KwaZulu-Natal (BREC NO: 678/17) approved the study.

Amplification and Sequencing Analyses of the Env Domain
Viral RNAs were extracted from 140 µL of plasma using the QIAamp RNA kit (QIA-GEN Services, Inc., Germantown, MD, USA) and reverse-transcribed using a Thermoscript RT-PCR kit (Invitrogen, Carlsbad, CA, USA). A 1.7 kb Gag-Protease product was amplified by nested PCR using the Takara Ex Taq HS enzyme kit. Specific primers, Gag +1 5 -GAGATCTCTCGACGCAGGAC-3 (HXB2 nucleotide: 675 to 697, forward primer) and 3 rvp 5 -GGAGTGTTATat GGATTTTCAGGCCCAATT-3 (HXB2 positions: 2696 to 2725, reverse primer), were used for the first-round PCR, at 55 • C for 30 min (cDNA synthesis) and 94 • C for 2 min (initial denaturation), followed by 35 cycles of 94 • C for 15 s (denaturation), 55 • C for 30 s (annealing) and 68 • C for 2 min (extension), and ended with a 5 min incubation at 68 • C (final extension). Long fwd 5 -GAC TCG GCT TGC TGA AGC GCG CAC GGC AAG AGG CGA GGG GCG ACT GGT GAGTAC GCC AAA AAT TTT GAC TAG CGG AGG CTA GAA GGA GAGAGA TGG G-3 (HXB2 nucleotide: 695 to 794, forward primer) and Long rev 5 -GGC CCA ATT TTT GAA ATT TTT CCT TCC TTT TCC ATT TCT GTA CAA ATT TCT ACT AAT GCT TTT ATT TTT TCT GTC AAT GGC CAT TGT TTA ACT TTT G-3 (HXB2 nucleotide: 2706 to 2805, reverse primer) were used for nested PCR at the following conditions: 94 • C for 2 min (initial denaturation), 40 cycles of 94 • C for 30 s (denaturation), 60 • C for 30 s (annealing) and 72 • C for 2 min (extension) followed by a 7 min hold at 72 • C (final extension). Primers were complementary to subtype B, gag-protease-deleted NL4-3 plasmid (pNL4-3∆gag-protease) on either side. Amplicons were electrophoresed in a 1% agarose gel to confirm the presence of the 1.7 kb amplicon corresponding to the gag-PR gene. The size of the product was determined using the GeneRuler™ 1 kb DNA Ladder (Fermentas International Inc., Waltham, MA, USA), as shown in Supplementary Figure S1.

Statistical Analysis
Multiple sequence alignments were generated using the ClustalX program (http: //www.clustal.org/) accessed on 16 September 2020, and manually edited using BIOEDIT (Ibis Biosciences, An Abbott Company, Sylmar, CA, USA). The Virulign tool (https:// github.com/rega-cev/virulign) accessed on 11 December 2020, was used to determine the amino acid substitution at each position in Gag-PR and Env on PI exposure and naïve isolate sequences. Fisher's exact test was performed for all positions to identify mutations significantly more prevalent (p-value < 0.05) in PI-experienced patients.

Bayesian Network
Bayesian network learning was performed using B-Course (http:/www.b-course) accessed on May 2021, which modeled direct and indirect associations between Gag, PR, and Env mutations and exposure to PI treatment. Only significant mutations identified using Fisher's exact test were included in the model together with known PI-resistanceassociated mutations. As previously described, associations between mutations learned and graphical representations were evaluated [28,29]. Non-parametric bootstrap analysis was calculated by running 5000 replicates to derive network robustness. Only interactions with bootstrap support over 65% were included in the network.

Coevolution Using CAPS
Gag-PR-Env coevolution analysis was performed using the Coevolution Analysis for Protein Sequences (CAPS, http://bioinf.gen.tcd.ie/caps/) accessed on 26 October 2020 program. CAPS identifies groups of coevolving pairs with a correlation coefficient > 0.5 by using amino acid (AA) sites to compare the correlated variance of their evolutionary rate using the probability scores between two pairs of aligned sequences [30]. The evolutionary rate was estimated using the blocks substitution matrix (BLOSUM). The distribution of 5000 randomly sampled values was used to identify coevolving codons.

Positive Selection
Positive selection pressure in Gag-PR-Env sites was assessed using the rate ratio of non-synonymous to synonymous (ω) substitutions in the CODEML program of the PAML software package (version 4.9) [31]. Analysis was performed using three classes ω ≤ 1, ω = 1, ω ≥ 1 assuming purifying, neutral, and positive selection, respectively. The ω value and likelihood estimates were calculated for three different codon-based ML pairs of site models: M0 (one ω) vs. M3 (discrete), M1a (nearly neutral) vs. M2a (positive selection), and M7 (beta) vs. M8 (beta and ω ≥ 1). Comparison of M0 vs. M3 is a test of site rate variation, while M1 vs. M2 is for positive selection. The likelihood ratio test was used to evaluate the best-fitting model for the data [32]. The Bayes Empirical Bayes method was used to identify specific sites under positive selection.

Participant Characteristics
Twenty-four Gag-Prot-Env sequences from 24 patients failing a PI-inclusive treatment regimen were available for analysis, of whom 14 were females and 10 were males, with the median age of 35 years (interquartile range (IQR) 17-38 years). The median viral load was 4.84 log 10 copies/mL (IQR 4.12-5.51). All patients were infected with HIV subtype C as established by the Virulign tool.

Positive Selected Sites in Envelope and Coevolution with Gag-PR Residues
Here, we determined which amino acid changes were under positive selection pressure following treatment failure using PAML [30]. Figure 2 shows PAML analysis of amino acid changes from participants failing an LPV/r-inclusive treatment regimen and ART-naïve participants that were positively selected. A total of twelve sites were positively selected: V85, T138, Y146, M147, E150, K151, G152, D277, G321, K322, N463, and T465. None of these sites were previously associated with entry and fusion inhibitor or LPV/r failure.

Positive Selected Sites in Envelope and Coevolution with Gag-PR Residues
Here, we determined which amino acid changes were under positive selection pressure following treatment failure using PAML [30]. Figure 2 shows PAML analysis of amino acid changes from participants failing an LPV/r-inclusive treatment regimen and ART-naïve participants that were positively selected. A total of twelve sites were positively selected: V85, T138, Y146, M147, E150, K151, G152, D277, G321, K322, N463, and T465. None of these sites were previously associated with entry and fusion inhibitor or LPV/r failure.

Interactions between Envelope and Gag-PR Mutations with LPV/r Treatment
Using Bayesian network learning, we explored the interactions between Gag, PR, and Env mutations in isolates exposed to LPV/r treatment. Only statistically significant positions according to Fisher's exact test were included in the analysis. Figure 3A shows mutations in the env Gp120 region (S110N-Env, T132S-Env, T138S-Env, P183Q-Env, and Coevolution Analyses for Protein Sequences (CAPS) was used to determine whether amino acids in Gag, PR, and Env sites coevolved in patients failing an LPV/r-inclusive treatment regimen. All positions with amino acid variation greater than 1% were included in the analysis. The coevolution analysis identified seven amino acid coevolving pairs, but these did not correlate significantly. The coevolving pairs were mostly seen between Env and Gag sites (Env: 11, 23, 29,142, 147, 336, 389, 400, 496, 534, 617, and 668; Gag: 108,  225, 256, 335, 372, 388, 478, and 480), while only one was observed between Env and PR (400-Env and 76-PR) sites.

Interactions between Envelope and Gag-PR Mutations with LPV/r Treatment
Using Bayesian network learning, we explored the interactions between Gag, PR, and Env mutations in isolates exposed to LPV/r treatment. Only statistically significant positions according to Fisher's exact test were included in the analysis. Figure 3A shows mutations in the env Gp120 region (S110N-Env, T132S-Env, T138S-Env, P183Q-Env, and Q315R-Env) that were indirectly associated with exposure to LPV/r treatment via known Gag-PR mutations (Q69K-Gag, S111I-Gag, I256V-Gag, and V77I-PR). More specifically, P183Q-Env was indirectly associated with LPV/r treatment via interaction with V77I-PR. T132S-Env also showed a robust connection with Q69K-Gag mutation, which is further related to K20R-PR mutation. Both T138S-Env and S110N-Env showed a strong association with I256V-Gag. Interestingly, a robust direct connection with LPV/r exposure was observed between a known Gag-PR mutation (R76K-Gag) and wildtype variants at codon positions in Gag (Q182-Gag-PR-WT) and PR that have been linked to drug resistance (I50-PR-WT, V77-PR-WT, and L90-PR-WT). Figure 3B shows the association between Gag, PR, Env Gp41 mutations, and LPV/r treatment experience. There was a strong interaction between T536M-Env, S543A-Env, I688V-Env, and P724S-Env and Gag-PR mutations (S12T-PR, P63L-PR, Q182S-Gag, and I256V-Gag). Specifically, P724S-Env was directly associated with P63L-PR and Q182S-Gag and indirectly associated with A431V found at a Gag cleavage site. Another direct association was seen between S534A-Env and I256V-Gag. Both T536M-Env and I688V-Env were directly associated with S12T-PR. There was no direct interaction between treatment experience and Env Gp41 mutations. Figure 3. Annotated Bayesian network based on 5000 bootstraps, showing the association between nodes indicating Env and Gag mutations in subtype C LPV/r-treated sequences. Here, we determined the association between (A) Env Gp120, Gag, PR mutations, and treatment experience and (B) Gp41, Gag, PR, and treatment experience. The color code of nodes is defined based on their link with LPV/r: drug resistance variants (red), treatment associated variants (orange) and natural occurring: Polymorphic variants (gray), wild-type variants (blue). The arcs represent a direct dependency between corresponding nodes, and the thickness of an arc is in proportion to bootstrap support. The arc direction does not represent the accumulation of mutations or causal meaning but may indicate a multivariable effect in the network. Figure 3. Annotated Bayesian network based on 5000 bootstraps, showing the association between nodes indicating Env and Gag mutations in subtype C LPV/r-treated sequences. Here, we determined the association between (A) Env Gp120, Gag, PR mutations, and treatment experience and (B) Gp41, Gag, PR, and treatment experience. The color code of nodes is defined based on their link with LPV/r: drug resistance variants (red), treatment associated variants (orange) and natural occurring: Polymorphic variants (gray), wild-type variants (blue). The arcs represent a direct dependency between corresponding nodes, and the thickness of an arc is in proportion to bootstrap support. The arc direction does not represent the accumulation of mutations or causal meaning but may indicate a multivariable effect in the network.

Discussion
Mutations in the env gene have previously been associated with resistance to PIs [5]; however, no studies have been conducted using subtype C PI failures, nor have they investigated their role in combination with the Gag. In this study, LPV/r-resistanceassociated mutations in Gag, PR, and Env were characterized in HIV-1 subtype C patients from KZN, South Africa, who were failing an LPV/r-inclusive treatment regimen. We identified several amino acids in the env gene that were associated with LPV/r failure. We also found potential pathways leading to LPV/r resistance that involved a combination of Gag, PR, and Env mutations.
In GP41, sequences from LPV/r failures harbored significantly higher frequencies of Heptad repeat (HR) mutations at codons 534, 536, 567, and 632. Consistent with our findings, an HIV-1 subtype A study also observed higher levels of mutations in HR in LPV/r failures [21]. Furthermore, higher mutation frequencies were also seen in the cytoplasmic tail (CT); this is in line with other studies that reported CT mutations associated with virological failure in PI-experienced participants [21][22][23]. As the CT plays a functional role in Env incorporation during virion assembly, this suggests that mutations in this region may influence the efficiency of Env incorporation [34].
Using Bayesian network learning, we found potential pathways leading to LPV/r resistance that involved a combination of Env and Gag mutations. I256V appeared to be an essential mutation as it was shown to be directly associated with three Env mutations (S110N-Env, T138S-Env, and S543A-Env) and indirectly associated with Q315R-Env via S111I-Gag and P724S-Env via Q182S-Gag. Although less is known about the I256V-Gag mutation, it has been previously associated with LPV/r failure in subtype C studies [35,36]. It has also been associated with reducing drug susceptibility to benzodiazepine and benzimidazole in Gag subtype B studies [37]. Mutations T138S-Env and S110N-Env showed a strong interaction with I256V-Gag, forming a pathway (T138S-Env + S110N-Env + I256V + S111C + R76K) to LPV/r exposure. A study by Singh et al. (2015) identified I256V, S111C, and R76K in Gag following LPV/r treatment in subtype C patients in South Africa [35]. Interestingly, T138 was also positively selected in our PAML analysis. Although a wildtype amino acid, T138, is known to play an essential role in antibody binding, and although the change from Threonine (T) to Serine (S) weakened the binding, T138S did not inhibit antibody binding [38].
Another potential pathway for LPV/r resistance included Env mutations Q315R-Env and the same Gag mutations I256V, S111C, and R76K, with the addition of S111I. While Q315R has been identified as a naturally occurring polymorphism in subtype A [39], its role as a drug-resistance-associated mutation in subtype C requires more mechanistic studies. Mutation T132S-Env showed a robust connection with Q69K-Gag, which has previously been associated with reduced susceptibility to PIs [40].
Interestingly, known PI-resistance-associated mutations K20R-PR, I54V-PR, L10F-PR, and L76V-PR were not directly connected to LPV/r exposure in the network, suggesting that immune selection pressure may also be in play.
While not directly associated with LPV/r resistance, Env Gp41 (S543A-Env and P724S-Env) mutations were associated with Gag (Q182S-Gag and I256V-Gag) mutations. Env mutation P724S was strongly associated with Q182S-Gag, which was further associated with I256V. S534A-Env was also directly associated with I256V-Gag, further supporting the role of I256V-Gag in connecting Env in the pathway to drug resistance. Interestingly, most of the mutations in Gag connected to Env mutations were located in the capsid domain (such as I256V), which is similar to other studies [21].
Another potential network to LPV/r resistance involved mutations found in the env in combination with the gag and PR region. Env mutation P183Q-Env was directly associated with V77I-PR, previously associated with PI drug resistance [41][42][43]. In addition, Env mutations T536M and I688V showed a strong direct association with PR mutation S12T-PR, which is known to be selected in LPV/r-and RTV-treated patients in subtype C [40]. Interestingly, the P724S Env mutation was linked to LPV/r failure via the P63L-PR and A431V-Gag pathways. The A431V Gag mutation located in the CS of Gag is known to confer resistance to all PIs except Darunavir (DRV) [15], while P63L-PR was observed at baseline in patients initiating antiretroviral therapy (Lopinavir/r, Lamivudine, and Zidovudine) [15]. These findings suggest that there might be other pathways to LPV/r treatment failure that involve Gag (Matrix and Capsid), minor PR, and Env mutations.

Conclusions
We found a high prevalence of Env mutations in HIV-1 subtype C associated with LPV/r treatment failure. The majority of these associations were in combination with a Gag (Matrix and Capsid) and/or PR mutation. Further investigations using site-directed mutagenesis need to be conducted to determine whether Env mutations alone can affect LPV/r efficacy.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/microbiolres12040071/s1, Figure S1: Gel picture, showing the amplification of Gag-PR. Gag-PR size 1.7 kb. Figure S2: Gel picture showing the amplification of envelope. Envelope size 3.5 kb. Table S1: Sequencing Primers.  Informed Consent Statement: Consent was not applicable since the study was conducted using the stored plasma samples. Data Availability Statement: Data are contained within the article and the Supplementary Material; further information can be obtained from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.