Comprehensive Analysis of HIV-1 Integrase Resistance-Related Mutations in African Countries

The growing emergence of non-nucleoside reverse transcriptase inhibitor (NNRTI) HIV drug resistance in sub-Saharan Africa (SSA) led to the World Health Organization (WHO) recommending, in 2018, a transition to dolutegravir (DTG) as a first-line antiretroviral therapy (ART) in SSA. The broad HIV-1 genetic diversity in SSA could shape DTG effectiveness and the pattern of drug resistance mutations (DRMs) in this region. This study evaluated HIV-1 integrase (IN) DRMs and conserved regions among published groups M, N, O, and P HIV-1 sequences spanning forty years of the HIV epidemic during the transition of DTG-based ART. Overall, we found low levels of integrase strand transfer inhibitor (INSTI)-DRMs (<1%) across HIV groups between the years 1983 and 2023; however, it was unexpected to detect DRMs at statistically significantly higher frequencies in pre-INSTI (1983–2007) than in the INSTI (2008–2023) era. The variability of accessory INSTI-DRMs depended on the HIV subtypes, with implications for susceptibility to DTG. Our findings provide new perspectives on the molecular epidemiology and drug resistance profiles of INSTIs in SSA, emphasizing the need for ongoing surveillance and customized treatment approaches to address the continent’s varied HIV subtypes and changing resistance patterns.


Introduction
At the close of 2023, approximately 39 million individuals globally were living with HIV (PLWH) [WHO], and Africa, with 25.6 million infected people, accounting for the highest HIV-1 genetic diversity globally, exhibits different distributions of HIV-1 subtypes and recombinant forms [1,2].This includes subtype B in Northern Africa, subtype C in the South, CFR02_AG in the West and Central, and subtype C and D in the East [1][2][3][4].The extensive genetic diversity of HIV, especially in West and Central Africa, potentially contributes to the emergence of natural polymorphisms associated with resistance and varying susceptibilities to combined antiretroviral therapy (cART) [5].While cART has significantly reduced transmission, morbidity, and mortality among People Living with HIV (PLWH), the rise of drug resistance mutations (DRMs) in the virus increases the risk of virological failure and challenges the effectiveness of cART [5].
In many countries of Sub-Saharan Africa (SSA), high levels of pretreatment drug resistance to nucleotide reverse transcriptase inhibitors (NRTIs), non-nucleoside reverse transcriptase inhibitors (NNRTIs), and protease inhibitors (PIs) have led, in 2018, the World Health Organization (WHO) to recommend a transition to dolutegravir (DTG)-an integrase strand transfer inhibitor (INSTI) with a strong resistance barrier-as a firstline ART component in resource-limited settings [6].INSTIs, targeting the integrase (IN) enzyme crucial for HIV replication, have shown high efficacy in treatment-naïve and -experienced individuals [7].The first INSTI approved by the United States Food and Drug Administration (US FDA) for clinical use was raltegravir (RAL) in 2007, followed by elvitegravir (EVG) in 2012, dolutegravir (DTG) in 2013, bictegravir (BIC) in 2018, and cabotegravir (CAB) in 2021 [6][7][8].
In African countries with prevalent NRTI resistance, a transition to (DTG)-based ART is underway.Optimized use of DTG is vital for cART success in settings lacking genotyping testing or drug resistance surveillance to prevent the emergence of resistant variants.The Stanford HIV Drug Resistance Database (HIVdb) (http://hivdb.stanford.edu,accessed on 25 November 2023).identifies key IN mutations across HIV-1 subtypes at positions T66, E92, G118, E138, G140, Y143, S147, Q148, N155 and R263 [9][10][11][12].Additionally, mutations that reduce susceptibility in combination with major DRMs include polymorphic M50, L74, T97 and nonpolymorphic H51, Q95, F121, Q146, G149, V151, S153, G163, S230, and D232 accessory mutations.The pattern of DRMs varies among HIV-1 subtypes, like the G118R mutation in non-B subtypes and R263K in subtype B [9][10][11][12].Given the significant INSTI susceptibility differences among HIV subtypes and the genetic diversity in SSA, it is crucial to monitor and investigate the pattern of INSTI-DRMs through a surveillance program.This study aims to provide updated insights into the molecular epidemiology and drug resistance mutation profiles of INSTIs in published IN sequences from 1983 to 2023 of the four phylogenetic M, N, O, and P HIV-1 groups in African countries.

Spatio-Temporal HIV-1 Subtype Epidemic Trend
We retrieved 7906 HIV-1 IN sequences, covering all 864 base pairs, collected from both ART-naïve and ART-treated individuals between 1983 and 2022 from a public database maintained by Los Alamos National Laboratory https://www.hiv.lanl.gov/content/index(accessed on 1 October 2023).HIV-1 group M included sequences belonging to subtypes A, B, C, D, F, G, H, J, K, L, and the circulating recombinant forms (CRFs) 01 to 143.A set of 96 sequences referring to HIV-1 groups N, O, P, and untyped "U" were assembled separately from the M group.The temporal distribution of HIV-1 IN subtypes across the SSA was meticulously analyzed through a comprehensive study of these 7906 HIV-1 samples.The selection criteria for samples were stringent, prioritizing those with detailed epidemiological metadata, including the date of sample collection and the country of origin.These HIV-1 strains represent a diverse set of geographical origins, categorized into four major African macro-regions: Western, Eastern, Central, and Southern.This regional classification followed the framework provided by UN-AIDS.The study further explored the nuances of subtype distribution within these regions, revealing patterns and trends indicative of both the biological characteristics of the virus and the socio-epidemiological factors at play.

Analysis of Polymorphisms and Integrase DRMs
All sequences were submitted to the Stanford HIV Drug Resistance Database version 7.0 for quality analysis.Sequence quality control was ensured by excluding any sequence having two or more of the following features: internal stop codons, frame shifts, more than 1% ambiguous nucleotide, hypermutations, insertions, or deletions.To assess the DRM s , we used the Calibrated Population Resistance (CPR) tool version 8.1, available at Stanford University HIV Drug Resistance Database (HIVdb) (https://hivdb.stanford.edu/,(accessed on 1 October 2023)) and Stanford HIVDB 8.9-1 "https://hivdb.stanford.edu/dr-summary/comments/INSTI/ (last updated 25 November 2023)".The analysis also employed the 2022 IAS-USA DRM list using the HIV-1 subtype B reference sequence (GenBank accession number: K03455).

Examination of Differences in Sequence Composition
The analysis involved examining each position along the sequences to determine the presence of mutations.In this study, the polymorphisms were defined as mutations present in ≥0.5% of group M sequences.By comparing each base to a reference sequence, we calculated the frequency of amino acid changes at specific positions as follows: where f (i) represents the mutation frequency at the position i, N is the total number of sequences examined, x j is the amino acid at the position i in the sequence j, I(x j ̸ = a i ) is a function that returns 1 if x j ̸ = a i and 0 otherwise, and L is the length of the reference sequence.This allowed us to identify positions of increased mutational activity, providing insight into the molecular evolution of the sequence.In addition to isolated positions, we constructed cumulative mutation profiles, illustrating the cumulative frequency of mutations over the entire length of the sequences.This facilitated the identification of regions characterized by higher mutation density, offering cues for potential functional domains or regions under selective pressure.
Next, divergence analysis aimed to quantify dissimilarity between the reference sequence and those within specific subtypes, excluding ambiguous or non-contributing elements, to provide a comprehensive perspective on the genetic variability present.For each subtype, the divergence was calculated as: where D represents the average divergence, N and M are the total number of sequences, and d ij represents the divergence between the sequences i and j.
By comparing each sequence to a reference, we calculated the number of amino acid variations, describing the specific nature of genetic divergence at the protein level.The results were saved in a divergence matrix, which systematically captures the dissimilarity between each sequence pair.This matrix facilitated the identification of clusters of sequences with similar divergence profiles, contributing to a more granular understanding of genetic relationships.Finally, to test the statistical significance of the observed mutation frequencies, hypothesis tests, such as the chi-square test, were applied to determine whether the observed mutation patterns deviated significantly from what would be expected by chance alone.This step added a level of robustness to our results, ensuring that the identified mutation frequencies were not simply stochastic fluctuations.All statistical analyses were performed using R (version 4.3.1).A p-value of <0.05 was considered statistically significant.

Distribution of Published HIV Integrase Sequences in SSA
Figure 1 shows the distribution of published sequences of HIV IN across SSA collected over forty years and included in the study.
According to HIV-1 molecular epidemiology studies in the African continent [1-4], the majority of IN samples from 22 countries in SSA since the onset of the epidemic were of subtypes A, B, C, D, G, H, and L. From 1993 onwards, sequences of subtypes F, J, K, CRFs, and the untyped groups N, O, and U were also documented.Subtype C samples have been consistently significant throughout the entire period, especially in Southern and Eastern Africa and, to a lesser extent, in Central Africa.In Eastern Africa, subtypes A and D samples were predominant from 1990 to 2020.In contrast, Western Africa had limited representation, mainly with a few sequences from subtype G and CRFs.Central Africa showcased a diverse range of sequences from all subtypes and HIV groups throughout the period, with a notable expansion of CRFs between 2019 and 2022.A, B, C, D, G, H, and L. From 1993 onwards, sequences of subtypes F, J, K, CRFs, and the untyped groups N, O, and U were also documented.Subtype C samples have been consistently significant throughout the entire period, especially in Southern and Eastern Africa and, to a lesser extent, in Central Africa.In Eastern Africa, subtypes A and D samples were predominant from 1990 to 2020.In contrast, Western Africa had limited representation, mainly with a few sequences from subtype G and CRFs.Central Africa showcased a diverse range of sequences from all subtypes and HIV groups throughout the period, with a notable expansion of CRFs between 2019 and 2022.

HIV Integrase Amino Acid Diversity
Table 1 shows the extent of IN amino acid diversity by groups and subtypes for sequences from two time periods, 1983-2007 and 2008-2023, using subtype B (HXB2) as the reference strain.Statistical analysis did not show significant differences in IN amino acid diversity between the years 1983 and 2007 and 2008 and 2023.However, an increase in amino acid diversity was found in subtypes A (0.06 ± 0.03 vs. 0.07 ± 0.04) and D (0.04 ± 0.02 vs. 0.06 ± 0.05), and notably in the NOPU group (0.12 ± 0.07 vs. 0.17 ± 0.19), while a decrease was noted in subtypes B (0.05 ± 0.03 vs. 0.04 ± 0.03) and C (0.06 ± 0.02 vs. 0.05 ± 0.02) between the years 1983 and 2007 and 2008 and 2023.For subtypes H, J, K, and L, amino acid diversity (0.06 ± 0.03) was estimated only for the 1983-2007 period.No variation was found in subtypes F, G, CRFs, and all HIV groups (M, N, O, P) and U between the two sampling periods.Overall, these results were in agreement with previous estimates of mean amino acid diversity, around 7% between HIV sub-subtypes and 0.11-0.18%between HIV groups [9].Regarding subtypes A and D, the observed divergences between the years 1983 and 2007 and 2008 and 2023 suggested that there may have been a trend toward a genetic diversification over time, which was also much more evident in the NOPU group, that could shape INSTI effectiveness and INSTI DRM in SSA.

Natural Polymorphism Patterns and Conservation Analysis
To evaluate the conservation degree in INSTI-DRMs and polymorphisms of HIV-IN amino acidic sequences in SSA over time, we conducted a chronological analysis of the overall contribution of the entire study population.The dataset was divided into two time periods, 1983-2007 (3491 samples) vs. 2008-2023 (4415 samples), with consideration given to the time before and after approval of INSTI-based treatment regimens [13], and the amino acidic sequences were first aligned and compared to the reference strain B (HXB2) and then compared to each other (Figure 2).
Concerning highly conserved amino acid positions (with <1% variability), sequences between 1983 and 2007 and 2008 and 2023 showed similar levels of variability, accounting for 65.62% (189/288) and 64.63% (187/288), respectively.Among the 288 IN amino acids, there was no significant difference in variability concerning major or accessory INSTI DRMs, except for position G163, which was less conserved in samples from 2008-2023.

Discussion
Limited information is available regarding the evolution of INSTI-DRM resistance in SSA, underscoring the presence of pre-existing DRMs before transitioning to dolutegravirbased ART regimens, which could significantly impact virological outcomes [14,15].Several groups have reported contrasting data on INSTI-DRMs in SSA, with many showing a lack of major DRMs and a low frequency of accessory INSTI-DRMs [15][16][17], while others reported frequencies ranging from 0.8% up to 5.4% of major INSTI-DRMs in INSTI-naïve populations, suggesting a potential threat to the effectiveness of the DTG-based ART programs in SSA [18][19][20].
Our large-scale analysis provided a comprehensive mapping of polymorphisms and conserved amino acid positions within the HIV IN gene across major HIV-1 subtypes and HIV groups, considering both INSTI-naïve and experienced individuals.This analysis also examined the baseline INSTI resistance, encompassing major and accessory DRMs, with implications for the use of DTG as a first-line regimen.
HIV-1 genetic diversity in SSA plays a crucial role in susceptibility to INSTIs and the selection of resistance mutations [18][19][20][21][22].In our study, the majority of the viruses analyzed were subtypes C and A, followed by CRFs, D, F, and G, as well as groups N, O, P, and U, whereas subtypes H, J, F, L, and B were poorly represented, reflecting the distribution of HIV infection in SSA.Despite a significant increase in the number of CRFs and changes in the geographic distribution of M HIV-1 subtypes and groups N, O, P, and U forms over the past forty years, the diversity estimates for IN inter-subtype diversity levels remained consistently low at <0.07, as did inter-group diversity at <0.18, during both the 1983-2007 and 2008-2023 periods.This suggested that the IN region is a conserved drug target [9].
Analysis of the conservation within the genetically conserved key function enzyme amino acids revealed similar levels of variability (<1%) in both the 1983-2007 and 2008-2023 periods, consistent with findings in HIV-1 group M subtypes and previous studies [9,18].In contrast, a higher number of polymorphic positions were detected at a frequency of ≥5% or ≥20% in samples from the 2008-2023 period.Notably, residue G163, where substitutions have been reported to act synergistically with other mutations to reduce susceptibility to INSTIs, appeared to be more polymorphic (≥5% variability) in samples from 1983-2007 [23].
Estimates of INSTI-DRMs are limited within SSA [24][25][26][27][28][29][30][31][32][33][34][35][36][37][38].Most reports indicate that primary INSTI DRMs are rare, and very few INSTI accessory resistances can be detected prior to the initiation of INSTI therapy, possibly due to transmitted INSTI resistances and the cumulative natural occurrence of mutations without selective drug pressure [27,39].DRMs such as G118R, E138K/A/T, G140S/A/C/R, Q148H/R/K, S153 F/Y, N155H, and R263K have been identified as being associated with DTG resistance [34].In our study, the examination of INSTI resistance-associated mutations showed low levels of major DRMs, such as T66A, E92G, G118R, E138K/A, G140R, Y143H, S147G, Q148H, and N155H/T/D, over forty years, and as mentioned above, some of these have been reported to have an effect on DTG.Notably, it was unexpected to detect major INSTI-DRMs at statistically significantly higher frequencies in 1983-2007 compared to 2008-2023.The most common major DRM was T66A, found in Cameroon, Rwanda, and South Africa [38].The rare mutation E92G was present in South Africa; G118, previously described as a DTG-resistance pathway in non-B subtypes, was present in South Africa; the extremely rare mutation G140R was present in South Africa; and the less common mutation Y143H was present in Uganda and Botswana.The appearance of INSTI resistance mutations in the pre-INSTI era, as found in our study, might be explained by low retention in care and poor treatment adherence in SSA, while the decline in the frequency of DRMs in the INSTI era likely reflects the availability of ART regimens and improved adherence in treatment-experienced populations [37,39].
Regarding accessory DRMs, in spite of the broad genetic diversity over forty years, no relevant difference in mutation frequency between 1983-2007 and 2008-2023 was detected, and more than 46% of the study population harbored at least one mutation.Seven accessory INSTI-DRMs that may contribute to drug resistance when combined with major DRMs were found, with M50I, L74I/M, T97A, and E157Q being the most frequent.In particular, M50I, L74I/M, and T97A, in combination with other INSTI drug resistance mutations, may contribute to reduced DTG susceptibility [37,40].These mutations, which have been shown to occur before initiation of cART with different prevalences depending on the subtypes, could facilitate viral evolution of resistance under drug-selective pressure.M50I has been shown to increase DTG resistance in combination with R263K [39].Here, we show a high prevalence of the M50I in HIV subtypes circulating in SSA over forty years, particularly in subtype C and in HIV groups N, O, P, and U, with a higher increase in groups N, O, P, and U in 2008-2023.L74 is located in the catalytic core domain (amino acids 51-212) and is involved in the catalytic processes of the enzyme.L74M, in combination with T79A, may reduce susceptibility to RAL and/or EVG in the absence of primary INIresistance mutations [40].It has also been reported that L74M combined with G118R or E138K mutations increases resistance to DTG [41].To date, the role of L74I substitution is still unclear [42,43], although it has been shown to enhance integrase inhibitor resistance in combination with additional major INSTI mutations [8] and is also being evaluated as a putative cause of DRMs for ART using DTG [41].Our analysis indicated a low prevalence of L74I polymorphisms in subtypes A, D, and F, a relatively high prevalence in subtypes C, G, and CRFs, and a high prevalence in groups N, O, P, and U, particularly in 2008-2023, while L74M remained relatively low except in CFRs, where it reached a prevalence of about 6%.
T97A is a relatively common polymorphism-associated INSTI resistance and is coselected in the presence of

Conclusions
In summary, our study presented a comprehensive analysis of HIV integrase gene variability over forty years.Our data showed a higher prevalence of major DRMs in the years 1983-2007, prior to INSTI-based therapy, than in the years 2008-2023.This could be explained by the low retention in care and the poor treatment adherence in SSA, which favor genetic variability and the high evolution rate of HIV-1.No significant difference in mutation frequency was detected in accessory DRMs in the years 1983-2007 and 2008-2023.Natural HIV-1 variation could explain this because accessory DRMs occur in the absence of therapy, but similarly to major DRMs, other factors, such as poor adherence support and poor retention in care, may also support the level of resistance reported here.The results of this study suggest the need to expand and strengthen surveillance through an HIV drug resistance national survey.

13 Figure 1 .
Figure 1.Temporal sampling of HIV integrase sequences in different African regions (1983-2022).Sampling of all available HIV IN sequences in different African countries through time, annotated according to their subtype assignment.Each dot represents an individual sample, with overlapping subtypes indicating isolation within the same time frame.

Figure 1 .
Figure 1.Temporal sampling of HIV integrase sequences in different African regions (1983-2022).Sampling of all available HIV IN sequences in different African countries through time, annotated according to their subtype assignment.Each dot represents an individual sample, with overlapping subtypes indicating isolation within the same time frame.According to HIV-1 molecular epidemiology studies in the African continent [1-4], the majority of IN samples from 22 countries in SSA since the onset of the epidemic were of subtypesA, B, C, D, G, H, and L. From 1993 onwards, sequences of subtypes F, J, K, CRFs, and the untyped groups N, O, and U were also documented.Subtype C samples have been consistently significant throughout the entire period, especially in Southern and Eastern Africa and, to a lesser extent, in Central Africa.In Eastern Africa, subtypes A and D samples were predominant from 1990 to 2020.In contrast, Western Africa had limited representation, mainly with a few sequences from subtype G and CRFs.Central Africa showcased a diverse range of sequences from all subtypes and HIV groups throughout the period, with a notable expansion of CRFs between 2019 and 2022.

Figure 2 .
Figure 2. Comparison of degree variability between HIV integrase sequences from 1983-2007 vs. 2008-2023.The reference strain B (HXB2), used for the definition of mutations, is colored according to the frequency rate of mutations present in the 1983-2007 and 2008-2023 groups.Domains of IN are colored boxes.Red asterisk indicates major INSTI DRMs, and black asterisk indicates INSTI accessory polymorphic and nonpolymorphic mutations from the latest version of Stanford's list (https://hivdb.stanford.edu/,accessed on 25 November 2023).
major INSTI-DRMs by RAL and EVG and by DTG in INSTIexperienced individuals [44].Our analysis revealed that in subtype A, mutation T97A had a statistically significantly higher prevalence in the years 1983-2007 compared with 2008-2023, and it had a relatively high prevalence in subtype F and the NOPU group, while it was absent in the NOPU group in the years 2008-2023.Our results suggest the existence of major and accessory DRMs prior to INSTI-based therapy that poses a minimal risk of compromised virologic response for patients enrolling in INSTI-based therapy.This study is limited in that (a) our analysis did not distinguish ART-experienced from treatment-naïve individuals in the years 1983-2007 and 2008-2023, (b) the heterogeneity in the number of sequences analyzed for different African regions depends on the number of sequences deposited in the Los Alamos HIV database, (c) the limited number of available IN sequences for subtypes F and G and NOPU groups may have affected the frequency of DRMs and polymorphisms at specific positions.

Table 1 .
The inter-group-subtype diversity of IN sequences (%).: the mean proportion of amino acid difference between all sequence pairs.The number of sequences compared are in brackets. Divergence

Table 2 .
Distribution of major DRMs across subtypes.

Table 3 .
Distribution of INSTI accessory resistance across subtypes.