Increased HIV Subtype Diversity Reflecting Demographic Changes in the HIV Epidemic in New South Wales, Australia

Changes over time in HIV-1 subtype diversity within a population reflect changes in factors influencing the development of local epidemics. Here we report on the genetic diversity of 2364 reverse transcriptase sequences from people living with HIV-1 in New South Wales (NSW) notified between 2004 and 2018. These data represent >70% of all new HIV-1 notifications in the state over this period. Phylogenetic analysis was performed to identify subtype-specific transmission clusters. Subtype B and non-B infections differed across all demographics analysed (p < 0.001). Infections among females, individuals not born in Australia, and individuals reporting heterosexual transmission were strongly and positively associated with non-B origin. Further, we found an overall increase in non-B infections among men who have sex with men from 50 to 79% in the last 10 years. However, we also found differences between non-B subtypes; heterosexual transmission was positively associated with subtype C only. In addition, the majority of subtype B infections were associated with clusters, while the majority of non-B infections were singletons. However, we found seven non-B clusters (≥5 sequences) indicative of local ongoing transmission. In conclusion, we present how the HIV-1 epidemic has changed over time in NSW, becoming more heterogeneous with distinct subtype-specific demographic associations.


Introduction
The human immunodeficiency virus 1 (HIV-1) circulates globally as different genetically defined subtypes, with prevalence varying between countries. Subtype C is most prevalent in Africa and subtype B is most prevalent in America, most of Europe, and Australia. In addition, over 98 circulating recombinant forms (CRFs) have been identified (Los Alamos National Lab (LANL) database, http://www.hiv.lanl.gov/). One of these, CRF01_AE, is the most prevalent subtype in Southeast Asia [1]. A study by Angelis et al. showed that Thailand is the origin of the global CRF01_AE distribution, and this is consistent with Thailand's popularity as a travel destination [2]. An increase in the proportion of CRF01_AE infections has recently been reported in the Philippines [3] and in Australia [4]. Furthermore, changes in subtype diversity are often related to migration patterns [5]. For example, a study from the USA reported an increase in non-B subtypes for numerous states that was associated with infections among foreign-born individuals [6]. Studies from Germany and Italy found that an increase in non-B subtypes was correlated with an increase in infections reported among individuals born in countries where the corresponding subtypes had a high prevalence [7,8]. Similarly, other studies found changes in subtype prevalence among individuals reporting differing transmission risk factors. For example, a study from China reported a decline in subtype B but an increase in CRF01_AE infections among individuals who reported heterosexual transmission [9]. Another study from Spain found an increase in non-B infections among men who have sex with men (MSM) [10]. These changes show that the HIV-1 epidemic is experiencing increased subtype diversity, which poses further challenges for vaccine design as well as for HIV testing [11].
The prevalence of HIV-1 in Australia is low at 0.14%. The epidemic is characterised by a majority of infections occurring in MSM (63% of all HIV-1 notifications in 2017) and being of subtype B origin (62% in 2016) [4]. In addition, the epidemic is becoming more diverse. Between 2013 and 2017, an 11% decline in new infections associated with MSM transmission was reported, while during the same period a 10% increase in infections associated with heterosexual transmission was seen [4]. Further, subtype prevalence differs between Australian states; the subtype B prevalence varies between 59% in Western Australia and 82% in Queensland [12]. Most recently, South Australia reported a subtype B prevalence of only 36% for 2018 [13]. New South Wales (NSW) is the most populous state in Australia and accounts for approximately 30% of all HIV-1 infections annually [4]. We have previously reported an increase in CRF01_AE subtype prevalence from an average of 16% in the period 2004-2008 to 42% between 2014 and 2017 [14]. We have also shown the presence of local ongoing transmission for both subtype B and CRF01_AE ([15], accepted for publication). However, it is unclear to what extent local ongoing transmission of other non-B subtypes is present in NSW and to what extent demographic factors have changed in recent years for non-B infections. Thus, we report here: (i) the extent to which various demographic factors correlate with transmission patterns within different subtypes, (ii) the factors that are most or least similar between different subtypes, and (iii) how demographic associations have changed over time for different subtypes.

Ethics
Routinely collected sequence and demographic data on all newly notified HIV-1 infections are linked and irreversibly de-identified to enable public health research in NSW as previously described [14].

Sequence Phylogeny and Identification of NSW-Specific Clusters
A total of 2364 sequences of the reverse transcriptase (RT) were retrieved from the local database that includes all HIV-1 sequences in NSW; these represent sequences from new HIV notifications reported between 2004 and 2018. HIV-1 subtype was determined using the Stanford HIV subtyping tool and confirmed via a phylogenetic tree estimated using RAxML v8.2.12 [16,17]. The sequences fell into subtypes B, CRF01_AE, C, A, and multiple individual other subtypes and CRFs. The same NSW sequences were then combined with global sequence data retrieved from the HIV LANL database as described elsewhere [18]. In short, NSW HIV-1 RT nucleotide sequences were compared against global HIV-1 sequences via BLASTN to retrieve the most similar sequences for use as background. Sequences were aligned using MAFFT L-INS-I [19] and the alignment was inspected in Geneious 11.1.3 (https://www.geneious.com) for accuracy. The final alignment consisted of 6300 sequences and was 628 nt in length. A phylogenetic tree was estimated using FastTreeMP v2.1.10 [20]. NSW-specific clusters were defined as nodes whose descendant sequences were 100% NSW sequences (monophyletic). Although it is common to use partial pol including protease for HIV-1 sequence analysis, we decided to use the RT only, as one of the laboratories performing standard diagnostic testing no longer processes the protease. Thus, we could ensure complete coverage for all sequences analysed. An RT alignment of this length has been used previously for phylogenetic analysis and has sufficient signal for accurate phylogenetic inference [21].
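The cluster definition above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the tree is a hand-written toy expressed as nested tuples, the "NSW_" label prefix and the function names are hypothetical, and the size classes assume that a pair has two sequences and a cluster has three or more.

```python
# Toy tree as nested tuples; leaves are strings. Labels starting with "NSW_"
# are hypothetical local sequences, all other labels are global background.

def is_nsw(label):
    return label.startswith("NSW_")

def nsw_clusters(tree):
    """Return the maximal clades whose leaves are all NSW sequences."""
    clusters = []

    def visit(node):
        # Returns (leaves under node, True if every leaf is an NSW sequence).
        if isinstance(node, str):
            return [node], is_nsw(node)
        results = [visit(child) for child in node]
        leaves = [leaf for child_leaves, _ in results for leaf in child_leaves]
        all_nsw = all(flag for _, flag in results)
        if not all_nsw:
            # This clade is mixed, so each all-NSW child clade is maximal.
            for child_leaves, flag in results:
                if flag:
                    clusters.append(sorted(child_leaves))
        return leaves, all_nsw

    leaves, all_nsw = visit(tree)
    if all_nsw:
        clusters.append(sorted(leaves))
    return clusters

def classify(group):
    # Assumed size classes: 1 = singleton, 2 = pair, 3 or more = cluster.
    n = len(group)
    return "singleton" if n == 1 else "pair" if n == 2 else "cluster"

tree = (
    (("NSW_1", "NSW_2"), "NSW_3"),
    ("REF_A", "NSW_4"),
    ("REF_B", ("NSW_5", "NSW_6")),
)
groups = nsw_clusters(tree)
for group in sorted(groups):
    print(classify(group), group)
```

In this toy tree, NSW_4 groups with a background sequence and therefore remains a singleton, which mirrors how requiring 100% NSW sequences keeps clades interrupted by global sequences out of a local cluster.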

Demographic Factors
Demographic data included (i) sex: male, female; (ii) region of birth: Australia, not Australia; (iii) transmission risk factor: MSM (men who have sex with men), heterosexual, people who inject drugs (PWID), other; (iv) age groups: 0-19 years, 20-29 years, 30-39 years, 40-49 years, 50+ years; and (v) residential area based on categories of postcodes according to the proportion of adult gay men: <5%, 5-19.9%, and ≥20% [22]. Disease stage data included stage of infection at diagnosis. Infection stage categories were adapted from the NSW Ministry of Health definitions [23]: early = evidence of an HIV-1 infection acquired within 12 months of diagnosis or CD4+ T-cell count > 500 cells/mm³, CD4+ T-cell count 350-499 cells/mm³, CD4+ T-cell count 200-349 cells/mm³, and advanced = CD4+ T-cell count < 200 cells/mm³ or AIDS defining illness in absence of early diagnosis. Chi-square tests of independence for the association of demographic data with HIV subtype were performed in R v3.6.2 [24] using the gplots and corrplot packages [25,26]. Linear regression and correlation coefficients were calculated in MATLAB® v2020a, and results were regarded as significant at p < 0.05 [27]. To ensure the individuals' data privacy, demographic categories containing <5 sequences per year analysed are not shown. Instead, proportions of sequences rather than total numbers are shown for demographic comparisons, although these proportions are not independent of each other. In addition, demographic changes over time were analysed for the time period 2009-2018 due to the limited sequence data available in previous years (<5 sequences for non-B subtypes per year analysed).
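As an illustration of the chi-square test of independence described above, the following Python sketch computes the test statistic, the degrees of freedom, and the Pearson residuals for a contingency table. It is a sketch only: the analysis in this study was run in R, the function name is hypothetical, and the counts are invented for illustration and are not study data. A positive residual marks a cell that is over-represented relative to independence, which is the sense in which a "positive association" is read here.

```python
import math

def chi_square(table):
    """Chi-square test of independence on a list-of-rows contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
            # Pearson residual: signed contribution of this cell.
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, residuals

# Invented counts, NOT study data.
# Rows: sex (male, female); columns: subtype (B, non-B).
counts = [[90, 30], [10, 20]]
stat, dof, resid = chi_square(counts)

# Critical value of the chi-square distribution for 1 degree of freedom
# at alpha = 0.05.
CRITICAL_1DOF = 3.841
print(stat, dof, stat > CRITICAL_1DOF)
print(resid)
```

For tables with more than one degree of freedom the critical value differs, so in practice the p value would be taken from a statistics package rather than from a fixed threshold.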

Data Availability
A random subset of 10% of the NSW sequences analysed here is available via NCBI under the GenBank accession numbers MW246250-MW246484. This will prevent the potential identification of sequence networks and ensure the individuals' data privacy.

Subtype B and CRF01_AE Dominate the HIV-1 Epidemic in New South Wales
In NSW, the majority of HIV-1 infections are of subtype B origin. Between 2004 and 2018, 70% of sequences were of subtype B, 18% of CRF01_AE, and 6% of subtype C origin (Figure 1A). The remaining sequences included subtype A (2%), subtypes F, G, and D (1%), and other CRFs (2%). Data used in our analysis included only sequences that were linked to a newly notified case of HIV, and thus had demographic data available. The proportion of sequences that could be linked to a notified case increased over time from 34% in 2004 to 51% in 2011 and 83% in 2018. Thus, the number of sequences included in the analysis here was higher in 2018 (n = 233) than in 2006 (n = 64), even though the annual number of new HIV-1 notifications in NSW has been declining (n = 397 in 2006, n = 278 in 2018) (Figure 1B) [23]. The proportion of sequences that could be linked to metadata did not differ across subtypes B, C, and other subtypes and CRFs (p = 0.57, linked proportions of between 57 and 60%). However, a slightly higher proportion of sequences could be linked for CRF01_AE (63%, p < 0.01). Due to this improved data collection, the number of sequences increased significantly over time for all subtypes (Figure 1B), most notably for CRF01_AE, from 0 to 66 sequences between 2004 and 2018.

Subtype C is Strongly Associated with Heterosexual Transmission
The subtypes differed significantly in their associated demographics (Table 1, Figure S1A). First, subtype B dominated the data set with 68% of sequences deriving from infections of this subtype, and for almost all demographics the majority of infections were subtype B. Two exceptions were infections among females and individuals reporting heterosexual transmission, for which only 27% and 35% were of subtype B origin, respectively. Overall, subtype B infections were positively associated with individuals that were male, born in Australia, 40-49 years old, living in postcodes with a gay male population of ≥20%, reporting MSM transmission as their risk factor, or diagnosed during the early stage of infection (p < 0.001). Non-B subtypes were positively associated with infections among individuals that were female, not born in Australia, <30 years of age, living in postcodes with a gay male population of <5%, reporting heterosexual transmission, or diagnosed during the advanced stage of infection (p < 0.001). Demographics also differed within non-B subtypes (Table 1, Figure S1A). For non-B infections, CRF01_AE was the dominant subtype, making up 56% of all non-B infections. Similar to subtype B, CRF01_AE accounted for the majority of non-B infections among individuals that were male, born in Australia, >20 years old, living in postcodes with a gay male population of >5%, or reporting MSM transmission (Table 1). Thirty-eight percent of infections among females were subtype C (p < 0.001, Figure S1A). In our study, we found that a large proportion of people with non-B subtypes reported heterosexual contact (53%, 29%, and 33% for subtype C, CRF01_AE, and other subtypes and CRFs, respectively). However, heterosexual transmission was positively associated with subtype C only (p < 0.001).
In addition, significant differences were also found for age (p < 0.05), postcode according to gay male population (p < 0.001), transmission risk factor (p < 0.001), and stage of infection (p < 0.01) (Figure S1). Subtype C infections were also more likely to be among individuals that were <20 years of age or 40-49 years old, living in postcodes with a gay male population of <5%, or reporting heterosexual transmission. CRF01_AE infections had a positive association with individuals that were male, >50 years old, living in a postcode with a gay male population of 5-19.9%, reporting MSM transmission, or diagnosed during the advanced stage of infection. All other subtypes and CRFs combined had a positive association with individuals that were 20-29 years old or were diagnosed during the early stage of infection.

Changes in Associations between Demographics and Subtype between 2009 and 2018
We investigated changes over time between subtypes and different demographics for the time period 2009-2018 as it contained sufficient data for all subtypes analysed. Between 2009 and 2018, the changes in subtype-specific demographics over time differed for subtype B and non-B infections. This was most notable for region of birth and transmission risk factors ( Figure 2). For subtype B, we found significant changes in three of the five demographics analysed. We observed a decrease in the proportion of B infections from 66% to 54% among those born in Australia (p < 0.05), but an increase from 27% to 46% among those not born in Australia (p < 0.01). We also observed a decrease in infections among individuals aged 20-29 years (p < 0.001) and in parallel a decrease among those aged 40-49 years (p < 0.01). No changes were found for transmission risk factor and stage of infection. For CRF01_AE we found significant changes over time for transmission risk, and stage of infection. We found an increase in infections among individuals reporting MSM transmission combined with a decrease in infections among individuals reporting PWID or other as a transmission risk factor (p < 0.05). Further, we observed an increase in infections among individuals diagnosed during the early stage (p < 0.05) combined with a decrease in infections among individuals diagnosed during the advanced stage (p < 0.01). We also found an increase in infections among MSM for subtype C and other subtypes and CRFs (p < 0.001) and this was combined with a decrease in infections among heterosexuals for subtype C (p < 0.05). No other demographics showed significant changes over time for subtype C and other subtypes and CRFs.
A study by Callander et al. showed the vast discrepancy in the proportion of gay men and lesbian women living in different residential areas in NSW [22]. They showed that 2.7% of postcodes were highly populated by gay men and lesbian women, i.e., 'gayborhoods'. They also identified postcodes with an estimated gay male population of ≥20%. Here, we used this differentiation to identify changes in HIV-1 diversity within and outside these postcodes. Interestingly, for subtype B, we found a decrease from 29% to 19% in the proportion of infections among individuals living in postcodes with a gay male population of ≥20% (p < 0.01). However, we found an increase from 45% to 57% in the proportion of infections among individuals living in postcodes with a gay male population of <5% (p < 0.01). We also found an increase from 15% to 30% for CRF01_AE infections among individuals living in postcodes with a gay male population of 5-19.9% (p < 0.05).
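The changes over time reported in this section rest on linear regression of annual proportions against calendar year. A minimal Python sketch of that calculation is given below. It is illustrative only: the analysis in this study was run in MATLAB, the function name is hypothetical, the annual proportions are invented and are not study data, and the p value calculation is omitted.

```python
import math

def linear_trend(years, proportions):
    """Ordinary least squares slope, intercept, and Pearson correlation."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(proportions) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    syy = sum((y - mean_y) ** 2 for y in proportions)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, proportions))
    slope = sxy / sxx                    # change in proportion per year
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)       # correlation coefficient
    return slope, intercept, r

# Invented annual percentages for one demographic category, NOT study data.
years = list(range(2009, 2019))
proportions = [20, 24, 23, 27, 30, 29, 33, 35, 36, 40]

slope, intercept, r = linear_trend(years, proportions)
print(round(slope, 2), round(r, 3))
```

Because the proportions within one demographic factor sum to 100% in each year, a rise in one category is necessarily accompanied by a fall in another, so trends for categories of the same factor are not independent tests.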

Demographic Differences in Clusters and Singleton Sequences
Patterns of HIV-1 transmission were assessed by investigating sequence clusters for all subtypes. Overall, 232 sequence clusters and 165 sequence pairs were identified. For subtype B, 67% of all infections were associated with a cluster (≥3 sequences) and 10% were associated with a sequence pair (Figure 3A). In contrast, for people with subtype C, CRF01_AE, and other subtypes and CRFs, a much lower proportion of infections were associated with a cluster (37%, 39%, and 28%, respectively). These latter three subtype categories were characterised by a high frequency of singleton sequences (47%, 37%, and 49% for subtype C, CRF01_AE, and other subtypes and CRFs, respectively). The proportion of infections that were associated with a cluster was also compared over time, but no changes were found for any subtype or CRF (Figure 3B). Finally, we compared demographics for infections in clusters and pairs and those that were singletons (Table S1). Interestingly, no differences were found for any of the non-B subtypes, with similar proportions of infections being in a cluster/pair or being singletons for each of the demographics analysed. However, we found significant differences for subtype B (Figure S1B). In concordance with our previous study, we found a strong positive correlation for infections among individuals not born in Australia (p < 0.001), as well as among heterosexuals, for being singletons (p < 0.01). Further, infections from notifications during the advanced stage were more likely to be singletons (p < 0.05). In addition, we found here that subtype B infections among individuals 50 years and older were also more likely to be singletons (p < 0.05).

Discussion
We report here on the growing diversity of HIV-1 in NSW. We show how non-B subtypes have increased over time, particularly among individuals reporting MSM as a transmission risk factor. We also show how subtype-specific demographics differ in general between subtype B and non-B subtypes, which also changed over time. In addition, we showed the importance of sequence data for epidemiological analysis as notification data alone do not represent the changes in genetic variability within an epidemic.
Overall, we found significant differences between subtype B and non-B subtypes for all demographics analysed. Subtype B was strongly associated with male individuals and those born in Australia, while non-B subtypes were associated with female individuals and those not born in Australia. The strongest positive association was found for subtype C among female individuals (p < 0.001). A large proportion of non-B subtype infections among females has also been reported in the adjoining state, Victoria. In their study, Chibo et al. found that 85% of infections among female individuals reported in 2010 were of non-B origin [28]. In addition, 62% of infections of other subtypes and CRFs were among individuals 20-29 years old and not born in Australia. This positive association between young individuals and non-B HIV infection could in part be explained by the relatively high number of international students completing higher education in Australia (~30% of all higher education students were international in 2018) [29]. Further, NSW Health reported a 33% increase in late diagnosis, i.e., diagnosis made >12 months after transmission, among MSM not born in Australia and concluded that these infections are likely to have been acquired outside Australia [23]. However, in our study we found a decrease in the proportion of infections notified during the advanced stage only for CRF01_AE, from 46% to 20% between 2009 and 2018 (p < 0.01), and found no changes over time for any other subtype. The decline in advanced-stage diagnoses for CRF01_AE infections was combined with an increase in diagnoses during the early stage of infection, which indicates an increase in early testing for individuals harboring CRF01_AE.
We observed an overall increase in infections among MSM for all non-B subtypes over time: for subtype C (0% to 50%), for CRF01_AE (23% to 74%), and for other subtypes and CRFs (0% to 79%). This suggests that the observed decline in HIV-1 infections in NSW [23] is driven only by a decrease in subtype B infections among MSM born in Australia, while the number of non-B infections has not changed. However, at this point it is unclear what has caused the dichotomy in number of infections between subtype B and non-B among individuals born in Australia and those not born in Australia, as well as among individuals reporting MSM and heterosexual transmission risk. In Australia, infections with non-B subtypes are often associated with migration or travel and are thus more likely to have been acquired elsewhere. Thus, individuals acquiring these infections might not have been exposed to local prevention strategies, and consequently their numbers would not have changed over time. However, HIV-1 acquisition outside of Australia does not explain the observed increase in subtype B infections among individuals not born in Australia, which could indeed represent local transmission.
Notably, we found changes for infections in individuals living in postcodes with a high or low proportion of gay male population. For subtype B, we found an increase in infections among individuals living in postcodes with a gay male population of <5% and a parallel decrease for infections among individuals living in postcodes with a gay male population of ≥20%, indicating a shift from 'gayborhoods' to suburbs with a low proportion of gay men. Our findings are consistent with the study by Grulich et al. that reported a 52% decline in HIV-1 notifications within postcodes in Sydney, NSW, where >10% of resident men identified as gay compared to 7% in other residential areas [30]. Their study investigated the impact of pre-exposure prophylaxis (PrEP) rollout in NSW on the number of new HIV-1 infections in the state and reported an overall decline of 25% among MSM. However, the study also showed that this decline was lower in postcodes with a gay male population of <10%. Another study found a lower PrEP uptake in gay and bisexual men aged 25 years and younger in NSW [31]. Taken together, these variable trends across different subtypes and demographics indicate potentially variable penetration of public health initiatives, such as those to increase testing rates or to increase access to PrEP in at-risk populations.
Finally, we quantified transmission clusters and found some clusters containing ≥5 sequences, which are indicative of ongoing local transmission. Such subtype B and CRF01_AE clusters have been extensively described elsewhere ([15], accepted for publication) and will not be discussed here. In the current study, we found three subtype C clusters (n = 7 infections, n = 5, n = 5), one CRF02_AG cluster (n = 7), one subtype A cluster (n = 6), and one CRF60_BC cluster (n = 5). However, only two of these clusters contained infections sampled in 2017 or 2018 that are indicative of recent local transmission. In addition, we found that while a majority of subtype B infections were associated with clusters (67%), a majority of non-B infections were either singletons or pairs. This suggests that non-B subtypes lead to ongoing transmission in NSW less often than subtype B.
This study had a number of limitations. Most notably, molecular data only include sequences from NSW; thus, we lack data from other states in Australia and might miss inter-state transmission. Further, we were only able to use the reverse transcriptase gene for phylogenetic analysis, which might limit the depth of genetic clustering found here. However, we used global data as background, and thus clusters are defined by sequences being more closely related to each other than to sequence data from other regions [32]. We also used a very conservative threshold of 100% NSW-specific sequences being part of a cluster and believe that both these parameters increase the specificity of the cluster analysis. We also lack some demographics, e.g., time since arrival in Australia, that would improve associations of HIV-1 acquisition outside Australia and potential transmission in NSW.

Conclusions
In conclusion, we found significant differences in the demographic characteristics of newly diagnosed patients with different HIV subtypes. People with non-B subtypes were mostly foreign born, people with subtype C were mostly heterosexual, and people with other subtypes and CRFs were mostly 20-29 years old. Lastly, though limited, ongoing local transmission for non-B subtypes was observed in NSW. These data may assist with targeted prevention interventions and suggest that ongoing surveillance of HIV-1 molecular data is needed to continue to inform strategies aimed at virtual elimination of HIV in NSW.