Phylogenetic Characterization of HIV-1 Sub-Subtype A1 in Karachi, Pakistan

(1) Background: HIV-1 sub-subtype A1 is common in parts of Africa, Russia, former Soviet Union countries, and Eastern Europe. In Pakistan, sub-subtype A1 is the predominant HIV-1 subtype. Preliminary evidence suggests that distinct strains of HIV-1 sub-subtype A1 are circulating in Pakistan; however, an in-depth molecular phylogenetic characterization of HIV-1 sub-subtype A1 strains in Pakistan have not been presented. We performed a detailed characterization of the HIV-1 sub-subtype A1 epidemic in Pakistan using state-of-the-art molecular epidemiology and phylodynamics. (2) Methods: A total of 143 HIV-1 sub-subtype A1 gag sequences, including 61 sequences generated specifically for this study from PLHIVs part of our cohort, representing all sub-subtype A1 gag sequences from Pakistan, were analyzed. Maximum-likelihood phylogenetic cluster analysis was used to determine the relationship between Pakistani sub-subtype A1 strains and pandemic sub-subtype A1 strains. Furthermore, we used signature variation, charge distribution, selection pressures, and epitope prediction analyses to characterize variations unique to Pakistani HIV-1 strains and establish the association between signature variations and Gag epitope profile. (3) Results: The HIV-1 sub-subtype A1 sequences from Pakistan formed three main clusters: two that clustered with Kenyan sequences (7 and 10 sequences, respectively) and one that formed a Pakistan-specific cluster of 123 sequences that were much less related to other sub-subtype A1 sequences available in the database. The sequences in the Pakistan-specific cluster and the Kenyan reference strains exhibited several signature variations, especially at amino acid positions 312, 319, 331, 372, 373, 383, and 402. Structural protein modeling suggested that amino acid changes in these positions result in alterations of the Gag protein structure as well as in Gag-specific T-cell epitopes. (4) Conclusions: Our results suggest that the majority of the Pakistan HIV-1 sub-subtype A1 strains were unique to Pakistan and with a specific mutation pattern in Gag.


Introduction
Human immunodeficiency virus type 1 (HIV-1) is a major global health concern and is responsible for approximately 37.6 million infected individuals worldwide [1]. A high level of genetic diversity is a characteristic feature of HIV-1, attributed to error-prone replication, recombination between HIV-1 strains, and evolution under immune selection pressure [2,3]. Over time, HIV-1 has diversified into several hundred lineages (called subtypes, sub-subtypes, or circulating or unique recombinant forms [CRFs or URFs]), often specific to a geographic region, e.g., sub-subtype A1 in East Africa [4,5] and East Europe [6,7], sub-subtype A3 in West Africa [8][9][10], region-specific strains of subtype B in Thailand, Brazil, Korea, Trinidad, and Tobago [11], and HIV-1 subtype C in Brazil [12].
HIV-1 tends to evolve under selection pressure exerted by the host immune system, where human leukocyte antigens (HLAs) play a major role in the CD8+ T-cell immunity [13]. HLA diversity worldwide is shaped by demographics and natural selection [14], and the HLA region on chromosome 6 is a highly polymorphic region of the human genome with >7000 HLA alleles [15]. Moreover, the HLA frequency can differ at the population level and result in differential inter-population selection pressure which may affect the evolution of HIV-1 in a population-specific manner [16].
The HIV-1 epidemic in Pakistan has been expanding rapidly and a large increase (57%) in new HIV-1 infections has been observed during the last ten years [17,18]. These infections have now bridged from high-risk groups, such as people who inject drugs (PWID) and men who have sex with men (MSM), into the low-risk communities, including women and children [19,20]. The HIV-1 epidemic in Pakistan is mainly driven by HIV-1 sub-subtype A1. However, other HIV-1 strains have been found in different outbreaks in the country [19,20]. Several of them have emerged as recombinant forms including sub-subtype A1, e.g., CRF02/A1 [21], indicating an increasing diversity in circulating HIV-1 strains in Pakistan. However, an in-depth molecular characterization of HIV-1 sub-subtype A strains in Pakistan has not been presented. Moreover, the relationship between Pakistani sub-subtype A1 strains and other sub-subtype A1 strains is largely unknown. In the current study, we analyzed HIV-1 gag sequences to characterize the HIV-1 sub-subtype A1 epidemic in Pakistan by state-of-the-art molecular epidemiology and phylodynamics.

Study Population
In total, 143 HIV-1 gag sequences were analyzed. Of these, 61 were generated from whole blood samples (3-5 mL whole blood from each study participant) collected from people living with HIV-1 (PLHIV) enrolled in a Pakistan high-risk community HIV-1 cohort in Karachi-Pakistan. The age of the participants ranged from 18 to 64 years, and most of them were male (92%), with 64% being on antiretroviral therapy (ART, Table 1). Recorded information on risk group included people who inject drugs (PWID, 43%), men who have sex with men (MSM, 13%), spouse of an HIV positive individual (SP, 2%), and heterosexual (5%, Table 1). The majority of participants were diagnosed for HIV-1 infection between 2011 and 2015 (61%), and all samples were collected in 2015 (Table 1, [3]). Samples from PLHIV were recruited after obtaining written informed consent from each participant. Basic information regarding age, sex, ethnicity, ART usage, and high-risk behavior was obtained from each participant. Ethical approval for the study was obtained from the Ethical Review Committees of Aga Khan University (4189-BBS-ERC-16). In addition to the new sequences generated from these samples (as described below), 82 HIV-1 subsubtype A1 gag sequences from Pakistan (previously deposited to the Los Alamos Database [https://www.hiv.lanl.gov]; accessed on 1 October 2018) were included in the study. All accession numbers are listed in Supplementary Table S1.

Subtyping Determination
For subtype determination, gag sequences from Pakistan were aligned with the most recent (2010) HIV-1 subtype reference sequence dataset containing all group M sequences and circulating recombinant forms (CRFs, https://www.hiv.lanl.gov/; accessed on 1 Octo-ber 2018) using the Clustal algorithm, followed by manual editing in Geneious v8.1.9 [22]. The gag alignment was submitted for maximum likelihood (ML) phylogenetic reconstruction in PhyML 3.0 using the following parameters: general time-reversible (GTR) model of nucleotide substitution with a gamma-distributed rate heterogeneity. Branch support was assessed by an approximate likelihood-ratio test based on the Shimodaira-Hasegawa-like procedure (aLRT-SH) [23]. The inferred tree was assessed and annotated in FigTree (v1.4.3; http://tree.bio.ed.ac.uk/software/figtree/; accessed on 10 October 2018). Nodes with aLRT-SH support ≥0.9 were considered significant.

Cluster Analysis
A BLAST approach was used to generate a reference sequence dataset from NCBI Genbank [24] based on similarity with the Pakistani HIV-1 sequences, as previously described [25]. Briefly, the 10 closest matches for each Pakistani sequence were selected, after which duplicate hits were removed using the program skipredundant.exe (threshold 98%) from the EMBOSS package [26]. Information about the collection date, sampling country, and route of transmission was also collected for each sequence. Subsequently, the sequence alignment was used to determine a maximum likelihood (ML) tree as described above.

Time-Scaled Phylogenetic Reconstruction
Time-scaled phylogenetic reconstruction was done using the Bayesian Markov chain Monte Carlo (MCMC) method as implemented in BEAST package v1.10.2 [27] with the following settings: HKY + G nucleotide substitution model with codon partitioning; relaxed clock with an uncorrelated lognormal rate distribution; and SkyGrid demographic model [28]. The prior ulcd. mean was set to uniform with an initial 0.001 (upper limit 1, and lower limit 1.0 × 10 −6 ). The posterior probability ≥0.99 was used to determine monophyletic clusters. MCMC chains were run for 2 × 10 8 generations and sampled every 20,000 steps. Maximum clade credibility (MCC) trees were generated in TreeAnnotator v1.10.2 [29] after 10% burn-in. Time trees were labeled in FigTree v1.4.4 [30]. Two individual runs were performed for each dataset, with log files combined in LogCombiner V1.10.2 [31].

Signature Variation(s) Analysis and Charge Distribution
Consensus nucleotide sequences were generated in Geneious v8.1.9 [22], followed by translation to corresponding amino acid sequences in ExPASy [32] for protein-based analysis. To identify the molecular level variations behind the divergence and temporal changes, the signature variations (defined as variations unique to a particular set of sequences) were identified using the viral epidemiology signature pattern analysis (VESPA) tool available in the Los Alamos HIV Database [33]. The statistical significance of the ratio of the identified unique sites between the strains was determined using the Chi-square test using Graph-Pad (https://www.graphpad.com/quickcalcs/chisquared1/; accessed on 1 January 2019). Amino acid variations were plotted using the WebLogo tool [34]. The effects of signature variations on the biophysical properties of the sequences were determined by calculating the differences in charge distribution on each amino acid along the length of the HIV-1 sub-subtype A1 Gag protein. Charged amino acids (arginine, lysine, glutamic acid, aspartic acid, and histidine) and potential N-linked glycosylation sites (PNGSs) were determined using an in-house Perl script based on the rules set in GLYCOSITE, as described previously [2]. The amino acid positions in HIV-1 Gag were determined using HXB2 (accession number K03455) as a reference. The statistical significance of the net charge and total charge between strains was determined using an unpaired t-test employed in Prism 9 version 9.3.1 (GraphPad Software, San Diego, CA, USA).

Renaissance Counting
To determine sites under selection, ratios of non-synonymous (dN) and synonymous (dS) substitutions were estimated by renaissance counting, as implemented in BEAST v1.10.2 using the HKY85 nucleotide substitution model, three-site codon partitioning, and an uncorrelated relaxed molecular clock with lognormal distribution [2,35]. The MCMC chain length was set at 2 × 10 8 .

Epitope Mapping
To determine the effect of significant variations in cytotoxic T-cell (CTL) and helper T-cell epitopes, pre-defined CTL and helper T-cell epitopes were retrieved from the HIV molecular immunology database (www.hiv.lanl.gov/content/immunology/index). For comparison, epitopes were then mapped on the consensus Gag protein sequences of the Pakistani and reference datasets, respectively. Furthermore, HLA anchoring residues and restricting HLA alleles for each epitope were predicted from consensus sequences using the Motif Scan tool available in the Los Alamos Database [36].

Protein Structure Modeling
Finally, the effect of signature variations on protein structure was determined using protein homology modeling, and secondary and tertiary structures were modeled using the ab-initio protein structure prediction tool QUARK (https://zhanglab.ccmb.med.umich. edu/QUARK, accessed on 1 January 2019) [37]. The similarity between the generated tertiary structure models was determined by superimposing the two protein models using the Click server (http://cospi.iiserpune.ac.in/click/, accessed on 1 January 2019) [38]. Finally, sites with signature variations were marked in Discovery Studio Visualizer V17.2.0 [39].

Subtyping and Cluster Analysis of HIV-1 Sub-Subtype A1 in Pakistan
The subtype analysis indicated that all 143 gag sequences from Pakistan were subsubtype A1 (Supplementary Figure S1). After the removal of duplicate sequences (from the total number of reference sequences obtained from the database), the dataset was reduced to 80 patient-unique reference sequences from Genbank for cluster analysis [24]. Except for two sequences, all Pakistan sequences clustered together in three Pakistani-specific clusters (Clusters 1-3, Figure 1), whereof most sequences were found in one large Pakistaniexclusive cluster (Cluster 3, n = 123 sequences, Figure 1). The remaining Pakistani sequences formed two separate clusters with sequences from Kenya (Cluster 1 and Cluster 2, Figure 1). To assess if the Kenyan reference sequences were representative of the main HIV-1 subsubtype A1 epidemic in Kenya, we reconstructed a separate ML tree based on all available Kenyan sub-subtype A1 sequences (n = 2045) in the Los Alamos Sequence Database (accession numbers in Supplementary Table S3). The analysis showed that the reference sequences were intermingled with the remaining Kenyan sequences, indicating that the reference sequences were representative of the main HIV-1 sub-subtype A1 epidemic in Kenya (Supplementary Figure S2). Due to their close genetic relationship with the Pakistani sequences, these Kenyan sequences were used as references in subsequent comparative analyses.

Date of Origin and Evolutionary Dynamics of HIV-1 Sub-Subtype A1 in Pakistan
To further characterize the HIV-1 sub-subtype A1 epidemic in Pakistan, we performed an in-depth phylodynamic analysis of the main Pakistani cluster, Cluster 3 (n = 123). For comparison and to gain further insight, we also analyzed the Kenyan HIV-1 sub-subtype A1 cluster (n = 51, excluding the 19 Pakistani sequences, Figure 1). Some sequences from the Kenyan reference dataset were removed because of poor temporal signal or possible recombination (based on clustering pattern).  Figure 2D). However, the apparent stabilization after 2000 was accompanied by a large variance and should be interpreted with caution.

Analysis of Molecular Properties between Pakistani and Kenyan HIV-1 Sub-Subtype A1 Sequences
The VESPA analysis indicated seven amino acid sites that differed between Pakistani and Kenyan HIV-1 sub-subtype A1 sequences ( Table 2). The total and net charges of HIV-1 sub-subtype A1 sequences from Pakistan and Kenya were not significantly different (p = 0.886) (Supplementary Table S4

Gag Sites under Selection in Pakistani and Kenyan HIV-1 Sub-Subtype A1 Strains
The ratio of non-synonymous (dN) and synonymous (dS) substitution was determined to assess site-specific selection in Pakistani and Kenyan HIV-1 sub-subtype A1 Gag sequences. The analysis showed that eight sites in the Pakistani sequences and six sites in the Kenyan sequences were under significant positive selection pressure ( Figure 4).

Sequence Variation in HIV-1 Gag T-Cell Epitopes
In the next step, we evaluated the effects of mutations (the amino acids dissimilar between Pakistani and Kenyan sequences (303, 332, 357, 370, 372, 375, and 383)) on CTL and helper T-cell epitope generation (Figure 4). The number of identified epitopes between Kenyan (CD4 + = 7 and CD8 + = 27) and Pakistani (CD4 + = 4 and CD8 + = 15) consensus sequences differed due to amino acid polymorphisms in epitope regions. For example, the presence of threonine (T) at amino acid position 303 in the Kenyan sequence matched five CD4 + and ten CD8 + T-cell epitopes, respectively (Figure 4). In contrast, a valine (V) in this position in the Pakistani sequence only matched two CD8 + T-cell epitopes (Figure 4). In contrast, some signature variations resulted in the prediction of CD4 + T-cell epitopes in Pakistani sequences only, such as sites 332 and 370, which matched two and one CD4 + T-cell epitopes, respectively ( Figure 4). Moreover, amino acid site 375 in the Pakistani strain was represented by two CD8+ T-cell epitopes, whereas the same site in the Kenyan strain did not match any previously described epitopes (Figure 4).

Sequence Variation in HLA Binding Motifs and Epitopes
Next, we evaluated the effects of mutations (the amino acids dissimilar between Pakistani and Kenyan sequences (303, 332, 357, 370, 372, 375, and 383)) on HLA anchorage. In total, 176 epitopes and 20 unique HLA anchoring residues were found in the Pakistani consensus sequence, compared with 175 epitopes and 19 residues in the Kenyan consensus sequence (Table 3 and Supplementary Table S5). The sub-subtype A1 signature sites 370, 372, and 375 in the Pakistani sequence (containing amino acids A, Q, and M, respectively) were associated with 7 unique CD4 + /CD8 + T-cell epitopes, whereas the corresponding sites in the Kenyan sequence (containing amino acids V, H, and I, respectively) were associated with 11 CD4 + /CD8 + T-cell epitopes (Table 3 and Supplementary Table S5). An alanine at position 370 in the Pakistani sequence resulted in three unique CD4 + /CD8 + T-cell epitopes, in contrast to the Kenyan sequence, where a valine at the same position resulted in five unique CD4 + /CD8 + T-cell epitopes (Table 3 and Supplementary Table S5).
The signature position 303 in the Pakistani sequence (containing a valine) was identified as an anchor site in nine unique epitopes restricted by the following type I HLAs: A*0206, B*3501, B*5103, Cw*0601, and Cw*0602; and type II HLAs: DPA1*0102, DPA1*0201, DPB1*0201, DPB1*0401, DRB1*0401, DRB1*0901, and DRB4*0101; whereas a threonine at position 303 in the Kenyan sequence was associated with only one unique epitope of four amino acids (FFKT) with anchor residues in the following type II HLAs: DRB1*0301 and DRB3*0201 (Table 3). Moreover, site 332 in the Pakistani sequence containing a threonine was not associated with any epitope, whereas the presence of a serine at the same site in the Kenyan dataset was associated with the epitope NANPDCKSI that is restricted by HLA B*7801 (Table 3). Table 3. Unique HLA Anchoring Residues and restrictions. The HLA anchoring residue analysis of the possible unique epitopes, their restricted HLA, and their anchoring residues in HIV-1 sub-subtype A1 in Pakistan and Kenyan reference datasets. Columns 1-4 and columns 5-8 represent the data from HIV-1 sub-subtype A1 from Pakistan and the Kenyan reference dataset, respectively. The amino acid in bold and underline represents the signature amino acids between the strains.

Structural Diversity
The structure of the HIV-1 Gag protein was predicted using the ab initio method in the QUARK tool [37] to assess the potential effects of differences in the identified amino acid differences between the Kenyan and Pakistani consensus sequences on the Gag protein structure. The analysis suggested several secondary and tertiary structural differences ( Figure 5). For example, differences in positions 303, 332, 357, 370, and 372 resulted in a longer α-helix in the Kenyan sequence as compared to the Pakistani Gag protein sequence ( Figure 5A,B). Similarly, the signature position 375 in the Kenyan sequence changed the start position of α-helix in Gag protein (start-end: 374-378, with reference to HXB2 position) as compared to the Pakistani sequence (start-end: 375-379, with reference to HXB2 position, Figure 5A,B). Additionally, the signature pattern at position 383 elongated the β-sheet downstream in Gag from the Pakistani HIV-1 strain compared to the Kenyan strain ( Figure 5A,B). The superimposition of tertiary structures of Gag from the Kenyan and Pakistani strains indicated a 64% similarity between the two structures. The α-helix extension caused by differences in sites 303, 332, 357, 370, and 372, in the Kenyan strain, altered the overall folding of the protein and affected the complete superimposition of the protein. The variation at position 383 in the Kenyan strain resulted in a bigger and more exposed loop compared to the Pakistani strain ( Figure 5A,B).

Discussion
In this study, we characterized the HIV-1 sub-subtype A1 epidemic in Pakistan on the basis of the phylogenetic and molecular properties of gag. Our dataset includes all known HIV-1 sub-subtype A1 gag sequences. The analysis showed that the HIV-1 sub-subtype A1 sequences from Pakistan formed three main clusters, whereof two clustered with Kenyan sequences suggesting a close relationship between the Kenyan and the Pakistani HIV-1 sub-subtype A1 epidemics. This is in line with a previous study on pan-epidemic strains of sub-subtype A1 (performed on sub-subtype A1 sequences collected up to 2010) [40]. However, the third and larger cluster was much less related to other available sub-subtype A1 sequences, also suggesting the presence of a Pakistani-specific HIV-1 sub-subtype A1 strain. To the best of our knowledge, this Pakistani-specific strain has not been described before, and this may reflect an evolving and distinct HIV-1 epidemic in Pakistan. The phylodynamic analysis indicated that this strain emerged around 2002 (95% HPD: [2000][2001][2002][2003][2004] and had an approximately four times higher evolutionary rate compared with the Kenyan strain. This could be due to the selection of evolutionary 'fit' and transmissible strain, capable of spreading rapidly to high-risk groups such as PWID and MSM [41][42][43]; however, this could not be ascertained due to a lack of availability of precise sequences and epidemiological information from that period. Moreover, the evolutionary rate of the Pakistani strain was approximately twice as high compared with the HIV-1 gag evolutionary rate in the global sub-subtype A1 epidemic (median rate: 2.3 × 10 −3 s/s/y), as recently determined by Patino-Galindo et al. [44]. This may reflect the recent emergence of a Pakistani-specific HIV-1 strain that still has not fully adapted to the Pakistani population. Indeed, the evolution of population-specific strains is not an unusual phenomenon in HIV-1 infection, and several studies have reported on different subtypes that have evolved into population-specific sub-strains (e.g., the Thai subtype B, Brazilian subtype B, Korean subtype B, Trinidad and Tobagonian subtype B, and Ethiopian and East African subtype C strains) [11,45]. Furthermore, the phylodynamic analysis of the Pakistan-specific lineage suggested a rapid increase in the number of effective infections and transmissions following the emergence of the strain in 2002. This is in agreement with the HIV-1 surveillance data for Pakistan, showing a rapid increase in new cases in the country, especially during the last 10 years [1, 19,41,46].
Next, we compared the consensus sequences of the Pakistani and the Kenyan HIV-1 sub-subtype A1 strains for the presence of mutations and variations unique to Pakistani or Kenyan datasets (referred to as signature sites). Interestingly, some of the identified sites (303, 332, 370, 372, and 375) have previously been shown to be specific for sub-subtype A1, and even specific to Kenyan and Pakistani sequences [16]. However, in contrast to the previous study, which was focused on only Pakistani and Kenyan patient-derived sequences, this study included all available sub-subtype A1 gag sequences deposited from Pakistan (patient-derived and database sequences), which not only confirms the previous findings but also identifies additional sites, such as position 373, to be uniquely different in Kenyan sequence, as compared to Pakistani sequences, because of the presence of PNGSs. Glycosylation is known to affect the function of Gag protein in HIV-1; for example, previous studies have demonstrated that the glycosylated Gag protein mimics the Nef function in nef deficient HIV-1 by restoring and rescuing its infectivity [47]. In addition to glycosylation, changes in the net charge can affect protein stability. For example, in GFP protein, instead of lysine, the presence of arginine, which has three nitrogen atoms and is capable of forming hydrogen bonds with neighboring amino acids (Pro75 and Asp76), leads to protein stability [48]. Interestingly, at signature position 383 in NC (nucleocapsid region), arginine was found in the Kenyan strain, which was replaced with lysine in the Pakistani strain. Our observations cannot ascertain the link between the transmission advantage of the Kenyan compared to the Pakistan strain. However, further studies to investigate the role of amino acid charge differences in Gag structure and functions can provide further insights into how the difference in the charge of Gag amino acids can affect viral infection dynamics.
HIV-1 Gag is relatively conserved compared with other HIV-1 proteins and contains several immunodominant regions [49]. Mutations in the gag gene have previously been associated with increased tolerance against drugs or immune activity. For example, in vitro studies have demonstrated that mutations in the capsid can produce negative effects on virus infectivity [50]. Moreover, mutations in the Gag p2/NC (spacer peptide 1/nucleocapsid) cleavage site can reduce the efficacy of protease inhibitors, which ultimately leads to treatment failure and propagation of the infection [51]. Other effects include changes in epitope sequence, which may abrogate the recognition of epitopes, particularly T-cell epitopes, leading to immune escape [40,52]. An in-depth molecular analysis of Gag indicated signature amino acid variation in Pakistani HIV-1 sub-subtype A1, which may have evolved under immune pressure. The presence of valine at signature position 303 of Gag resulted in the generation of more epitopes, as valine has been predicted to serve as an anchoring residue for five HLA type I and seven HLA type II molecules [36]. On the contrary, the presence of threonine at the same position has been associated with a decreased number of epitopes [16]. In addition, a recent study on HIV-1 subtype C showed an association between the selection pressure exerted by HLA Cw*0303/Cw*0304 and amino acid variation at position 303 in the HLA-restricted HIV-1 Gag epitope YVDRFFKTL [53]. Moreover, the substitution of wild-type amino acid T with A/I/V at position 303 in Gag has been suggested to reduce the CD8+ T-cell recognition of the epitope [53]. Similarly, the mutation T303A/I has been shown to decrease HIV-1 infectivity rates, whereas HIV-1 with T303V showed similar replicative fitness compared with the wild-type strain [53]. In contrast, the signature position 357, with a serine in the Pakistani strain, was found to reduce the number of CD4 + /CD8 + T-cell epitopes compared with the Kenyan strain that had a glycine in this position. This observation is supported by a previous study indicating that the set of four substitutions (79F+228L+286K+357G) can reduce the replicative capacity of NL4-3 (mutant) by 2-fold [54]. However, due to limited in vitro and in vivo data on epitopes specific for HIV-1 sub-subtype A1, a detailed assessment of the relationship between signature mutations and immune escape cannot be established and warrants further in vitro studies to confirm this phenomenon. However, it is important to mention that since more than half of the patients were on ART, and archival HIV-1 DNA genomes were amplified, it may be difficult to assess actual immune escape in Gag in these patients.
The HIV-1 inhibitor Bevirimat (BVM) binds to the Gag spacer regions and blocks the cleavage of the protein [55]. The amino acid positions 370, 372, and 375 found to be different between the Pakistani and Kenyan strains are the known binding residues of Gag inhibitors [55]. In this study, we found a signature variation at position 370 of the Gag protein (alanine and valine in the Kenyan and Pakistani strains). Interestingly, the V370A polymorphism has previously been shown to reduce the BVM susceptibility by 40-fold. Although this has to be verified by in vitro experiments, this opens up the possibility that the Pakistani HIV-1 sub-subtype A1 strain may be resistant to BVM [55].

Conclusions
In summary, this study increases our understanding of the HIV-1 sub-subtype A1 introduction, evolution, and diversification in Pakistan. The results suggest that HIV-1 sub-subtype A1 strains are unique to Pakistan with phylogenetic linkages to the Kenyan strains. Additionally, Pakistani sub-subtype A1 strains have accumulated certain unique mutations in the gag gene that may facilitate HIV-1 adaptation to host selection pressures and more effective transmission of the virus to different at-risk groups. Further studies are needed to fully disentangle the role of the identified mutations in virus transmission dynamics.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/v14102307/s1, Table S1: List of accession number of HIV-1 Gag sequences from Pakistan used in this study.; Table S2: Profile of the people living with HIV-1 (PLHV-1) in Pakistan recruited for this study.; Table S3: The list of accession numbers of all Kenyan HIV-1 subtype A1 sequences from the database used to construct the Kenyan HIV-1 subtype A1 tree.; Table S4: The list of net charges and total charges.; Table S5: List of anchoring residues in HIV-1 subtype A1 gag protein from Kenya and Pakistan.; Table S6: The list of accession numbers along with biophysical properties difference of the sequences different between Kenyan and Pakistani HIV-1 subtype A1.; Figure S1. Genotyping of HIV-1 subtype A1.; Figure S2. Clustering pattern of Kenyan reference dataset sequence with all available Kenyan sequences from the Los Alamos HIV Database.; Figure S3. Amino acid variations within Pakistani and Kenyan cohorts.; Figure S4. Sites under selection pressure in Pakistani and Kenyan reference subtype A1.

Informed Consent Statement:
The study was conducted after obtaining written informed consent from the participants. Written informed consent has been obtained from the participants to publish this paper.

Data Availability Statement:
The data is available within the manuscript or its supplementary files.