A Nationwide Study about the Dispersal Patterns of the Predominant HIV-1 Subtypes A1 and B in Greece: Inference of the Molecular Transmission Clusters

Our aim was to investigate the dispersal patterns and parameters associated with local molecular transmission clusters (MTCs) of subtypes A1 and B in Greece (predominant HIV-1 subtypes). The analysis focused on 1751 (28.4%) and 2575 (41.8%) sequences of subtype A1 and B, respectively. Identification of MTCs was based on phylogenetic analysis. The analyses identified 38 MTCs including 2–1518 subtype A1 sequences and 168 MTCs in the range of 2–218 subtype B sequences. The proportion of sequences within MTCs was 93.8% (1642/1751) and 77.0% (1982/2575) for subtype A1 and B, respectively. Transmissions within MTCs for subtype A1 were associated with risk group (Men having Sex with Men vs. heterosexuals, OR = 5.34, p < 0.001) and Greek origin (Greek vs. non-Greek origin, OR = 6.05, p < 0.001) and for subtype B, they were associated with Greek origin (Greek vs. non-Greek origin, OR = 1.57, p = 0.019), younger age (OR = 0.96, p < 0.001), and more recent sampling (time period: 2011–2015 vs. 1999–2005, OR = 3.83, p < 0.001). Our findings about the patterns of across and within country dispersal as well as the parameters associated with transmission within MTCs provide a framework for the application of the study of molecular clusters for HIV prevention.


Introduction
HIV remains a major health challenge to date. Although HIV transmissions have declined over the last few years, incidence remains high (http://www.UNAIDS.org). According to the UNAIDS/WHO, 1.7 million people became newly infected with HIV in 2019 (http://www.UNAIDS.org). HIV-1 genetic diversity is a hallmark of the virus and since the beginning of the epidemic, group M of HIV-1 has been genetically classified into pure subtypes (A-D, F-H, J-L), sub-subtypes, recombinant forms, and unclassified viruses [1,2]. The recombinant forms of the virus are further divided into circulating recombinant forms (CRFs) and unique recombinant forms (URFs) [1]. The prevalence of HIV-1 subtypes and recombinants varies greatly with the subtype C being predominant, followed by subtypes B, A, CRF02_AG, and CRF01_AE [3]. In the absence of an effective vaccine and cure, prevention of HIV-1 transmission provides the only strategy to mitigate its impact and reduce incidence (http://www.UNAIDS.org).
HIV cluster analysis has been performed in many instances and there is increasing evidence that this knowledge can be useful for public health purposes. The benefits of cluster analysis include the ability to identify clustered infections that occurred recently and, therefore, focus on populations at higher risk of getting or transmitting HIV [17]. This knowledge can be extracted from data generated for HIV drug resistance testing, which is being performed widely.
Our previous studies have shown that subtypes A1 and B are the most prevalent, with the former following an increasing trend after 2000 in Greece [19]. The outbreak among people who inject drugs (PWID) in the Athens metropolitan area that ignited in 2011 was a different situation, where most viruses were classified as CRF14_BG followed by CRF35_AD and subtypes A1 and B [15,20]. The local clusters were of considerable importance for the high prevalence of non-nucleoside reverse transcriptase inhibitors (NNRTI) resistance mutations in Greece, where the majority of the most prevalent resistance mutations (e.g., E138A/G, K103N, Y181C) were due to onward transmissions within molecular transmission clusters (MTCs) [21].
Our aim was to provide a framework of HIV cluster analysis for public health action in Greece, by means of molecular epidemiology. Specifically, we aimed to identify the MTCs of subtypes A1 and B, which are the most prevalent clades in Greece, and to investigate the parameters associated with infections within MTCs. We also aimed to infer the within country patterns of HIV-1 dispersal.

Study Population
The study dataset included 6166 HIV-1 sequences available in the protease and partial reverse transcriptase regions (pol gene) obtained from PLHIV during 1999-2015 in Greece. This number accounts for 57.2% of all people diagnosed with HIV-1 during the same period in Greece (n = 10,787). Surveillance data were available from the National Public Health Organization (formerly, Hellenic Centre for Disease Control and Prevention) [22]. Specifically, our study sample consisted of 4790 (77.7%) sequences generated in Athens (sampled from Central/Southern Greece) and 1376 (22.3%) sequences generated in Thessaloniki (1298 (94.3%) sampled from Northern Greece and 78 (5.7%) from Crete). In the case of multiple available sequences per patient (sequences collected at different time points during the patient's follow-up), only the earliest available sequence was included in the analysis. A unique serial number was used to specify each patient, while duplicate patients were detected and excluded from the study population. The current project has been approved by the Ethics Committee of the Medical School, National and Kapodistrian University of Athens.

Subtyping Analysis
HIV-1 subtyping was carried out by using the online automated subtyping tools COMET (COntext-based Modeling for Expeditious Typing) (http://comet.retrovirology.lu) and REGA (http: //dbpartners.stanford.edu/RegaSubtyping/html/indexhbv.html). In addition, the study sequences (n = 6166) were analyzed phylogenetically along with a set of 216 reference sequences representative of all pure subtypes, sub-subtypes, and the majority of CRFs, collected from the HIV-1 sequence database (http://www.hiv.lanl.gov/). Sequence alignment and editing were performed on MEGA X [23]. Editing was conducted manually and according to the coding reading frame. Phylogenetic analysis was performed by the approximate maximum likelihood (ML) method, as implemented in FastTree v2.1 [24], using Generalized Time Reversible (GTR) as a substitution model with CAT approximation. Tree visualization and annotation were performed on the programs Dendroscope v3.5.7 (http://dendroscope.org) and FigTree v1.4 (http://tree.bio.ed.ac.uk/software/figtree/). Furthermore, all the study sequences which remained unclassified from the previous procedures were tested for the presence of recombination by using the bootscanning approach as implemented in SimPlot v3.5.1 [25]. HIV-1 subtyping confirmation by phylogenetic and recombination analysis has focused on the characterization of subtypes A1 and B, which are the most prevalent clades in Greece, as described previously [19,21].

Phylogenetic Analysis
The dispersal patterns of HIV-1 subtypes A1 and B in Greece were investigated by means of phylogenetic analysis. Analysis was performed separately for each subtype and was repeated in five replicates using a different set of randomly selected references, in order to minimize the possibility of overrepresentation of countries or geographic regions with a higher number of available sequences due to more frequent HIV drug resistance testing. Specifically, sequences from our study population (A1: 1751; B: 2575) were analyzed along with a randomly selected globally sampled dataset of sequences (A1: 1500; B: 2000), used as references. Reference sequences were available in the HIV-1 sequence database and their subtype was checked and confirmed by COMET. The phylogenetic trees were inferred by the ML method under the GTR + G4 substitution model as implemented in RAxML v8.2.10 [26].
Thereafter, phylogenetic clusters consisting of at least two sequences sampled from Greece at a proportion greater than 70% compared to the total number of sequences within the cluster (geographic criterion) were defined as tentative MTCs [27,28]. Tentative MTCs that were present and fulfilled the geographic criterion in all five rounds of phylogenetic analysis were defined as MTCs. Robustness of all the large MTCs (i.e., MTCs including more than 10 sequences) was confirmed by ML transfer bootstrap observation (TBE) and Bayesian phylogenetic analysis [29]. Specifically, phylogenetic analysis using either ML or Bayesian methods were performed for each individual MTC using as references the most closely related sequences identified after a BLAST search using the HIV BLAST tool (http://www.hiv.lanl.gov). Only unique sequences identified using 10 BLAST matches to each query were included in the analysis. ML phylogenetic analysis was conducted by the RAxML v8.2.10 program, as specified in the above paragraph. Bayesian analysis was performed by using MrBayes [30] with GTR + G and run for 10 × 10 6 generations (burnin: 10%). The convergence of all parameters of the Markov chain Monte Carlo (MCMC) was assessed by Tracer [31], if the effective sample size (ESS) was higher than 200. MTCs receiving bootstrap support > 75% or posterior probability support > 0.85 were considered as significant.

Phylogeographic Analysis
In order to infer the within country patterns of HIV-1 dispersal, we performed phylogeographic analysis on all the large MTCs of subtypes A1 and B. Specifically, phylogeographic analysis was performed separately on the MTCs of each subtype over a high number of bootstrap trees (n = 408), reconstructed by ML phylogenetic analysis that was conducted on the RAxML v8.2.10 program. Initially, we tested if the HIV transmissions (viral migration events) occurred more frequently among PLHIV from the same geographic area (Central/Southern Greece, Northern Greece, Crete) in a significant manner. The hypothesis testing was performed by character reconstruction using the criterion of parsimony on Mesquite v3.61 [32]. Afterwards, we quantified the viral migration events between the different geographic areas by using PAUP*4.0 [33], and we assessed if they were higher than those expected by chance (panmixis). Further details about the way of estimating viral migration events has been described in detail elsewhere [34][35][36].

Statistical Analysis
Mean and standard deviation (SD)/median and interquartile range (IQR) for continuous variables and absolute and relative frequencies for categorical variables were used to summarize the demographic data of the study. The non-parametric Mann-Whitney U test was used for the comparisons of the relevant distributions, by defining the level of significance at 5% and by adjusting the level of significance according to the Bonferroni correction for multiple comparisons. The demographic parameters associated with the local transmission of subtypes A1 and B in Greece were estimated by multivariable logistic regression models that were fit to a subset of the original data consisting of complete observations (A: 1635; B: 2453). Presence in MTCs was the binary outcome variable, while age, gender, transmission risk group, origin, sampling period, and area of sampling were chosen as possible explanatory variables in both models. Statistical analyses were performed on STATA 13.1 (StataCorp LP).

Sequences Availability
The whole dataset corresponds to a dense sampling of PLHIV in Greece. Therefore, to avoid the risk of PLHIV identification, sequences will be available upon reasonable request.
Our analysis has focused on the detection of the two most prevalent clades (A1 and B) in Greece. The demographic characteristics of the PLHIV infected with subtypes A1 and B are presented in detail in Table 1

Subtypes A1 and B Dispersal Patterns and Regional Transmission
Phylogenetic analysis using large sets of randomly selected reference sequences was performed in five replicates for each one of the most prevalent subtypes (A1 and B) in order to characterize the spread of the HIV-1 epidemic in Greece. Further analysis confirmed the phylogenetic credibility of MTCs with more than 10 sequences (large MTCs). A new version of phylogenetic bootstrap (TBE) was selected since with large phylogenies, the classical Felsenstein's bootstrap proportion (FBP) tends to give low support, especially at deep branches [29]. Estimation of MTCs revealed the levels of regional transmission per subtype by dividing the number of HIV-1 sequences from Greece found within MTCs with the total number of available sequences from Greece. Levels of regional transmission corresponds to PLHIV who have been infected locally.
Analyses revealed that the subtype A1 and B sequences from Greece clustered at different points in the ML trees (Figures 1a and 2a). In addition, it was found that a considerable number of clusters fulfilled the criteria to be considered as MTCs for both subtypes. Specifically, 93.8% (1642 of 1751) of subtype A1 sequences formed 38 MTCs, with a range of 2-1518 sequences (Figure 1b and Figure S1a). For subtype A1, a large MTC was identified (Figure 1b), while for subtype B, several smaller MTCs were detected across the tree (Figure 2b and Figure S1b). Even though the median number of subtype A1 sequences per MTC was 2 (IQR: 2-4), 86.7% (1518) of the subtype A1 sequences fell within a large single MTC (Figure 1b). This finding indicates that the subtype A1 epidemic in Greece is highly monophyletic and corresponds to transmissions that occurred locally. This large single MTC received high TBE bootstrap support (99%), and according to the sensitivity analysis that was performed previously [19], it is a single outbreak. The evolutionary distances in this MTC were quite large since this strain has been introduced at the early stage of the HIV-1 epidemic in Greece. Specifically, the date of the most recent common ancestor of the subtype A1 in Greece was estimated to be 1977.9 (95% highest posterior density interval: 1973.7-1981.9) [19]. In addition, most of the PLHIV infected within the large single MTC of subtype A1 were males (n = 1305; 86.0%) of Greek origin (n = 1219; 80.3%) with a median age of 36 years (IQR: [29][30][31][32][33][34][35][36][37][38][39][40][41][42][43][44][45][46]. The predominant mode of infection in this population was through sex between men (MSM) (n = 999; 65.8%), while most of the sequences from these PLHIV were sampled from Central/Southern Greece (n= 1066; 70.2%). Furthermore, phylogenetic analysis of subtype B sequences revealed the existence of 168 MTCs with a range of 2-218 sequences (Figure 2b and Figure  S1b). The proportion of subtype B sequences within MTCs was 77.0% (1982 of 2575). The high levels of dispersal of subtype B (Figure 2a) and the existence of a large number of MTCs (Figure 2b) indicate that the introduction of this subtype into Greece has occurred from multiple sources. The number of subtype A1 and B introductions into Greece was calculated separately for each subtype by adding the number of sequences found outside the MTCs (unclustered sequences-each corresponding to an independent introduction) and the number of MTCs (e.g., for two MTCs including 5 and 10 sequences, we counted only two introductions-as the number of founder strains within each MTC). The number of introductions was found equal to 147 (38 MTCs + 109 unclustered sequences) and 761 (168 MTCs + 593 unclustered sequences) for subtypes A1 and B, respectively. The proportion of founder strains transmitted locally (MTCs) was similar for subtype A1 (25.8%; 38/147) and B (22.1%; 168/759).

Factors Associated with Local Transmission of Subtypes A1 and B
The results of the statistical analysis (multivariable logistic regression models) for the identification of demographic factors associated with local transmission of subtypes A1 and B in Greece are presented in detail in Table 3. Analysis revealed that transmissions within MTCs of subtype A1 were associated with the MSM risk group (MSM vs. heterosexuals, OR = 5.34, p < 0.001) and Greek origin (Greek vs. non-Greek origin, OR = 6.05, p < 0.001), having been adjusted for the rest of the incorporated variables (Table 3). For subtype B, local transmissions were associated with Greek origin (Greek vs. non-Greek origin, OR = 1.57 p = 0.019) and more recent sampling (sampling period: 2011-2015 vs. 1999-2005, OR = 3.83, p < 0.001) ( Table 3). In addition, it was found that younger age was also associated with an increased probability of local transmission of subtype B (OR = 0.96, p < 0.001) ( Table 3).

Patterns of Geographical Dispersal
To investigate the patterns of virus spread among different geographic areas in Greece, we performed phylogeographic analysis on the sequences found within MTCs that were generated in Athens and sampled from most areas in Greece (area 1), were generated in Thessaloniki and sampled from Northern Greece (area 2), and a small sample of sequences from Crete also generated in Thessaloniki (area 3). Phylogeographic analyses suggested that the total number of viral migration events corresponding to transmissions among the three different areas was 181 (median value; IQR: 178-184) and 136 (median value; IQR: 134-138) for subtypes A1 and B, respectively. These numbers were significantly lower than the total number of viral migrations expected by chance under the simulated scenario of panmixis (i.e., random mixing of sequences assuming that the level of regional transmission is negligible). Specifically, the total number of viral migrations under panmixis were 398 (median value; IQR: 394-402) and 259 (median value; IQR: 257-262) for subtypes A1 and B, respectively, and were significantly higher than the estimated values for the bootstrap trees for both subtypes (p < 0.001). These findings suggest that transmissions occur at the regional level; a conclusion that can also be drawn by the visual inspection of the trees, where the existence of several regional clusters is obvious (Figure 3). Additionally, we quantified the viral migration events among the different areas and we assessed if they were higher than those expected by chance under panmixis. According to the migration matrix (Table 4), the number of transmissions that were exported from Central/Southern Greece to the other localities (Northern Greece: subtype A1-90.9; subtype B-67.6/Crete: subtype A1-11.6; subtype B-15.0) was lower than those expected by chance, and only transmissions exported from Northern Greece to Central/Southern Greece (subtype A1: 35.4; subtype B: 17.4) were significantly higher than panmixis. Similar results were found between Crete and Central/Southern Greece; however, the number of transmissions was very low (subtype A1: 0.1; subtype B: 1.2) ( Table 4).

Discussion
In the current study, we investigated the patterns of HIV-1 transmission across and among different geographic areas, and we estimated the parameters associated with regional transmission for the two most prevalent subtypes in Greece. We found that a high proportion of subtypes A1 and B, which are the predominant clades, occur locally at 93.8% and 77.0%, respectively. These provide some of the largest proportions of regional transmissions compared to approximately 50% that was reported previously [18,[37][38][39][40]. Possible explanations are the dense sample used in our analysis (our study population represents the 57.2% of the total population of PLHIV diagnosed in Greece during 1999-2015) and the method implemented for the identification of MTCs that was based on phylogenetic analysis using a novel bootstrap approach, which is more appropriate for large datasets [29]. Moreover, we did not implement a genetic threshold as a criterion for the identification of MTCs, since our aim was to identify not only PLHIV with close transmission links, but all individuals infected locally. For subtype B, where the dispersal pattern was not monophyletic as was found for subtype A1, the regional transmissions were estimated equal to 77.0%, which is much higher than the estimations of previous studies (reasons mentioned above). Notably, Greece provides one of the few countries in Europe where subtype A1 is found at high prevalence and is associated with the local population and not with the PWID risk group [19,41]. Similarly, it has been reported for Albania and Cyprus [39,42], while for most countries in Western and Central Europe, subtype C and recombinant forms are the most prevailing viruses [3]. The subtype A1 epidemic in Greece differs from the A6 epidemic in Russia and Eastern European countries, where the majority of transmission have been found among the PWID and the origin of infection is from a distinct route [19,[43][44][45][46][47]. Furthermore, a recent study showed that Greece provides the main source of subtype A1 transmissions in Europe and remote locations, such as the USA and Australia [48].
The proportion of subtype A1 and B introductions leading to MTCs were similar, suggesting that approximately one-quarter of cases were associated with different levels of onward transmissions in MTCs. These findings indicate that one of the reasons why the non-B subtypes are less frequently associated with transmission chains is probably that they are less abundant in European countries than subtype B [49] (http://www.hiv.lanl.gov). Although the estimated proportion of strains leading to MTCs can be extrapolated as a proportion of onward transmission for the whole population infected within the MTCs, it provides a proxy of the proportion of cases providing as sources for HIV infection. Given that our study population is a subsample of the total number of PLHIV in Greece, the previous proportion should be considered as the minimum estimate.
Our analysis suggested that regional transmission was associated for both subtypes with Greek origin, while the MSM risk group was identified as an additional associated parameter only for subtype A1 and more recent diagnosis and younger age for subtype B, respectively. These findings are in accordance with findings from Europe and the USA, where the MSM risk group, male gender, younger age, and recent diagnosis were the main factors associated with HIV-1 infections within MTCs [18,37,38,[50][51][52], suggesting that prevention efforts should focus on populations with these characteristics. The exact effect of the time of sampling to MTCs cannot be estimated, since for sequences sampled earlier, identifying their transmission links is less likely and therefore, the likelihood is lower to be placed in MTCs [18].
Regarding the dispersal patterns of the virus, our analyses focused on sequences found within MTCs and we found that regional dispersal dominated within the geographic areas of study. This finding was supported by the significantly lower number of migration events found among the different areas as compared to the migration events expected under the scenario of random mixing of the sequences. The quantification of viral migration events among the different localities suggested a limited number of transmissions from the largest area to the two others, whereas transmissions from Northern Greece to Central/Southern Greece were higher than expected by chance. We should note that the number of transmissions from Central/Southern Greece to the other areas was the largest, but according to statistical phylogeography, they were less than those expected under the random mixing scenario.
The high numbers of transmissions from Central/Southern Greece were probably due to the larger sample size of this area compared to Northern Greece or Crete; however, these numbers were lower than what was expected by chance. These findings indicate that Northern Greece, although it is smaller in sample size than Central/Southern Greece, transmits the virus at high rates to the latter area. Therefore, besides the major local dispersal pattern observed within the three different areas, some cross-regional transmissions occur, with Northern Greece acting as a major source for this type of viral dispersal. A similar pattern has been observed for the dispersal of the most prevalent mutations associated with resistance to antiretroviral drugs (ARV), where the major pattern included regional dispersal with some cases of cross-regional transmissions [21,53]. Similar findings were reported elsewhere, highlighting the important effect of transmission clusters in the increasing prevalence of NNRTI resistance mutations among treatment-naïve populations [37,51,[54][55][56].
In conclusion, our study provides a detailed picture of the characteristics of the HIV-1 epidemic regarding the dispersal patterns of the most prevalent subtypes in Greece. Since we are one step before the implementation of molecular surveillance as a method to identify PLHIV and their links to higher risk for HIV infection, our aim was to perform a comprehensive analysis of MTCs and their characteristics. We identified the MTCs using a novel approach for bootstrapping and we showed that HIV-1 regional transmission dominates in Greece and, also, it is associated with the MSM risk group (subtype A1) and Greek origin (both subtypes). Overall, our findings provide some novel insights that can be important for national policy against HIV. Moreover, we showed that approximately one-quarter of introductions leads to regional dispersal. This estimate might provide a proxy of the percentage of PLHIV contributed to onward transmissions.