Determinants of HIV-1 Late Presentation in Patients Followed in Europe

To control the Human Immunodeficiency Virus (HIV) pandemic, the World Health Organization (WHO) set the 90-90-90 target to be reached by 2020. One major threat to those goals is late presentation, which is defined as an individual presenting with a TCD4+ count lower than 350 cells/mm3 or an AIDS-defining event. The present study aims to identify determinants of late presentation in Europe based on the EuResist database, with HIV-1 infected patients followed up between 1981 and 2019. Our study includes clinical and socio-demographic information from 89,851 HIV-1 infected patients. Statistical analysis was performed using RStudio and SPSS, and a Bayesian network was constructed with the WEKA software to analyze the association between all variables. Among the 89,851 HIV-1 infected patients included in the analysis, the median age was 33 (IQR: 27.0–41.0) years and 74.4% were males. Of those, 28,889 patients (50.4%) were late presenters. Older patients (>56), heterosexuals, patients originating from Africa and patients presenting with log VL >4.1 had a higher probability of being late presenters (p < 0.001). Bayesian networks indicated VL, mode of transmission, age and recentness of infection as variables that were directly associated with LP. This study highlights the major determinants associated with late presentation in Europe and helps to direct prevention measures for this population.


Introduction
At the end of 2019, there were 38.0 million people living with the Human Immunodeficiency Virus (HIV) and 1.7 million people were newly infected worldwide. However, 7.1 million people were still unaware of their HIV status [1].
To control the HIV pandemic, the World Health Organization (WHO) set the 90-90-90 targets for 2020: 90% of people living with HIV know their status, 90% of those receive antiretroviral therapy (ART), and 90% of those on ART achieve viral suppression. These targets were reached in some countries. Globally, by the end of 2019, 81% of people living with HIV knew their status. Of those, 67% were receiving antiretroviral therapy and of those 59% had reached HIV viral suppression. The success of these goals depends on the region of origin, the vulnerability of populations and the national HIV programs that are implemented. Yet, between 2010 and 2019, new infections dropped by 31% [2].
New goals, the 95-95-95 targets, were set to end the pandemic by 2030, using the same definitions as the previous targets. To attain the WHO goals by 2030, early diagnosis is essential [3].
One major concern threatening those goals is late presentation. Late presentation can have consequences in the health and treatment of infected individuals, leading to poorer outcomes and increased health care costs, since it has been shown that late presenters, especially those aged above 50 years old, are at higher risk for developing non-infectious co-morbidities and complex multimorbidity [4]. In addition, late presentation can have a negative impact on the control of the pandemic, increasing the risk of onward HIV transmission in individuals that are not aware of their HIV status. Besides, late presentation to HIV care was shown to be the main reason for virological failure [5,6].
According to the European Late Presenter Consensus working group, late presentation is defined as an individual presenting with a TCD4+ count lower than 350 cells/mm3 or an AIDS-defining event, regardless of TCD4+ cell count [7]. Late presenters (LP) are estimated to account for 40-60% of HIV cases in Europe, 72-83% in Asia, 35-89% in Africa and nearly 40% in Brazil [8][9][10]. Timely diagnosis and linkage to health care are essential tools for the prevention and treatment of HIV [11].
The present study has the objective of identifying determinants of late presentation in Europe. To achieve this goal, we analyzed a population of patients from the EuResist database, a European database.

Characteristics of European Population
Among 89851 HIV-1 infected patients included in the analysis, the median age was 33 (IQR: 27.0-41.0) years and 74.4% were males. Of those, 28889 patients (50.4%) were LP and 28388 (49.6%) were non-late presenters (NLP). The majority of patients with information about treatment status were naïve, 11487 (58.6%). Men who have sex with men (MSM) accounted for 41.9% of patients, and 78.5% originated from Western Europe. The most prevalent subtype in this population was subtype B (64.4%), followed by subtype G (20.4%), CRF 02_AG (15.9%) and subtype A (13.5%). Most of the patients included in this study were classified as Native (75.4%) and as having Chronic Infection (59.8%) based on the ambiguity rate of the first genomic sequence. CD4 count at diagnosis and viral load at diagnosis (log10) presented a median of 348 cells/mm3

Determinants Associated with Late Presentation
In the unadjusted model (Table 2), sex was associated with LP. In the HIV exposure category, significant differences were found for MSM and intravenous drug users (IDU) compared with heterosexuals. Significantly more LP were from Africa and other regions compared to Western Europe. In addition, the variables age at diagnosis, viral load, subtype, recentness of infection and migrant status were significantly associated with LP.
In the adjusted model (Table 2), the first determinant associated with late presentation was age at diagnosis: patients aged 30 or younger had a lower probability of being late presenters, and patients aged above 56 had a higher probability, when compared with patients aged between 31 and 55 (≤18yo: aOR 0.31 (0.20-0.49), p < 0.001; 19-30yo: aOR 0.46 (0.34-0.62), p < 0.001; >56yo: aOR 1.70 (1.49-1.94), p = 0.004). MSM had a lower probability of being late presenters than heterosexuals (aOR 0.74 (0.64-0.86); p < 0.001). Patients originating from Africa and South America had 1.76 and 1.41 times the odds, respectively, of presenting late compared with those from Western Europe (aOR 1.76 (1.37-2.26), p < 0.001; aOR 1.41 (1.07-1.87), p = 0.015, respectively). Patients presenting with a log viral load between 4.1 and 5.0 or higher than 5.1 had a higher probability of being LP than those with a log viral load lower than 4.0 (aOR 1.45 (1.37-1.53) and aOR 3.41 (3.21-3.62); p < 0.001 and p < 0.001, respectively). As expected, and supporting the reliability of our classification of recentness of infection, patients with a recent infection (as classified by the ambiguity rate of the genomic sequence from the first drug resistance test) had a lower probability of being LP than those classified as chronically infected (aOR 0.61 (0.55-0.68); p < 0.001).

Bayesian Network
For the Bayesian network, we used the HillClimber algorithm with a maximum of nine parents per node. This algorithm is based on a "hill climbing adding and deleting arcs with no fixed ordering of variables" [12]. The BN had a LogScore Bayes of −35615.94 and an accuracy of 61%. In the BN (Figure 1), LP is directly linked to viral load, recentness of infection, mode of transmission and age. Region of origin is only indirectly associated with LP, as there is no direct link between those two nodes. Mode of transmission is the variable with the most direct associations, and sex is the only variable that is not associated with LP. This BN is in accordance with our logistic regression model. The variables subtype and migrant status were removed from the logistic regression model due to their conflict with the variable region of origin. As Figure 1 shows, region of origin is directly associated with those two variables, and migrant status is associated only with region of origin.

Ambiguity Rate and CD4 Analysis
We analyzed the association between CD4 count and the ambiguity rate, overall and separately for subtype B, non-B subtypes and subtype G. The correlation was negative in all cases, meaning that higher CD4 counts were associated with lower ambiguity rates. Accordingly, the LP population in this study, which has lower CD4 counts, had higher ambiguity rates in their sequences. We also performed a linear regression to estimate how much of the variation in CD4 count the ambiguity rate could explain. We ran this analysis for the same categories as above; the highest value was obtained for individuals with non-B subtypes, in which the ambiguity rate explained 5% of the variation in CD4 count (Tables A1-A4).
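To make the link between the correlation and the "variation explained" concrete, the following minimal sketch computes Pearson's r and, for a simple linear regression with one predictor, R² as r squared. The data values and function names are hypothetical and serve only as illustration; they are not taken from the EuResist database, and the study itself used RStudio and SPSS.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Share of variance in y explained by a one-predictor linear regression on x."""
    return pearson_r(x, y) ** 2


# Hypothetical values: ambiguity rate (%) and CD4 count (cells/mm3).
ambiguity = [0.1, 0.3, 0.5, 0.9, 1.4, 2.0]
cd4 = [520, 610, 300, 450, 380, 240]

r = pearson_r(ambiguity, cd4)
print(f"r = {r:.2f}, R^2 = {r_squared(ambiguity, cd4):.2f}")
```

Under this relationship, an R² of 0.05 corresponds to a correlation of about |r| = 0.22, which is a weak association even when it is statistically significant.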

Analysis of Late Presenters Rate over Time
We also plotted the rate of LP over time (Figure 2) and calculated the confidence interval for each point. We excluded the first three years (1981-1983) from the analysis, since the total number of patients in those years was low and the confidence intervals were wide. The sample size in 2019 was also small, but we included this year to show the most recent trend of LP in Europe. As the graph shows, the LP rate remained relatively constant over the years: it was 57.5% in 1984, reached its lowest value in 1991 (45.1%), and otherwise stayed between 45 and 60%. From 2017 the rate of LP increased, peaking above 60% in 2019.
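The effect of a small yearly sample on the confidence interval can be sketched as follows. The text does not state which interval method was used, so the Wilson score interval here is an assumption, and the counts are hypothetical rather than taken from Figure 2.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical years with the same 60% LP rate but very different sample sizes.
for label, lp, total in [("small year", 6, 10), ("large year", 600, 1000)]:
    low, high = wilson_interval(lp, total)
    print(f"{label}: {lp}/{total} -> {low:.3f} to {high:.3f}")
```

With the same observed rate, the interval for ten patients spans roughly half of the possible range, which illustrates why years with very few patients carry little information about the trend.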

Discussion
This study had the goal of explaining the determinants of late presentation for HIV-1 infection in Europe.
In our population, late presenters represented 50.4% of the patients. A study in Georgia, using the same definition of late presentation as we used, reported 63.4% of late presenters. Another study analyzing late presentation in different settings indicated a rate of late presentation ranging between 40 and 67%, depending on the region of study. This study corresponded to the Swiss data incorporated in the COHERE study, a Collaboration of Observational HIV Epidemiological Research Europe Study. Our results are concordant with the results reported in these studies [13][14][15].
In our study, late presenters were more frequently males, with heterosexual transmission, from Western Europe and aged between 31 and 55 years old. In a study in East of England, the percentage of late presenters was higher in older patients and patients with heterosexual contact, when compared with homosexual and bisexual contact. Furthermore, according to other studies in Poland and the Netherlands, males were also more prevalent in the late presenters' population. These results are consistent with our study [16][17][18].
Patients originating from Africa had a higher probability of being LPs when compared to patients originating from Western Europe. This percentage of African migrants in the LP population can be explained by lower access to health care. Furthermore, African migrants have a higher probability of experiencing unemployment, poverty and poorer housing conditions, which further increase their barriers to accessing health care. A positive HIV status also stigmatizes individuals, who fear the reactions of their communities, since HIV is mostly associated among these communities with inappropriate and promiscuous sexual behavior [19]. The migrants from South America in our study were mainly from Brazil and the LP rate was lower than the NLPs. This can be explained in two ways: Brazil has a concentrated HIV epidemic among the MSM population, and that population is frequently tested [20]. These results are in accordance with HIV studies about the migrant population [21][22][23].
The results from a previous study showed a statistically significant correlation between late presentation and IDUs [24]. In our study, we found this significant association between LP and IDU in the univariate analysis, but in the logistic regression analysis the only significant association within the mode of transmission was for MSM when compared to heterosexuals. The HIV-positive IDU population is mainly found in Eastern Europe. In our study, the IDU group may be underrepresented, since the larger proportion of cases are from Western Europe, where the major modes of transmission are heterosexual and MSM contacts [25,26].
We also studied the association between CD4 count and the ambiguity rate of the sequences included in this study. Our results show a negative correlation between CD4 count and the ambiguity rate: lower CD4 values were accompanied by higher values of the ambiguity rate. There is still little information regarding this topic, but our results were in accordance with a study about sequence ambiguity and HIV incidence trends [27]. In fact, the ambiguity rate could be an alternative variable to be used for the definition of Late Presentation. As we know, the initial drop of CD4 count in the acute phase of HIV infection can be a cause of bias when we define Late Presentation based on a CD4 count lower than 350 cells/mm3.
The results from the graph showed stable and high values for LPs rate. This indicates that LPs were and remain a big part of the HIV epidemic and represent a major threat to treatment and prevention strategies.
The main goal of this study was to identify determinants associated with late presentation. Those determinants included age at diagnosis, mode of transmission, region of origin, recentness of infection and viral load at diagnosis (Figure 3). Our results were in concordance with other previously published studies [13,28,29]. The last study about late presentation in Europe was published in 2015 and the timeline of the study was between 2010 and 2013. This was an update from the first study published in 2013, with a timeline of analysis between 2000 and 2011 [29,30]. Our study analyzes a European database with a timeline between 1981 and 2019. The main strength of our study was the database used, which is one of the largest datasets and integrates clinical, socio-demographic and viral genotypic information from HIV-1 patients from all over Europe. This large dataset allows for a robust analysis of the data, and up to date information regarding late presentation. In addition, we can analyze trends in the evolution of late presentation in Europe.
The major limitation of our study was the lack of information about the stage of HIV infection and AIDS-defining events. While we used the ambiguity rate to minimize this problem, we only used the definition of a CD4 count below 350 cells/mm3 to define an individual as LP or NLP.
Yet, this study is the most recent update on the HIV epidemic of late presentation in Europe, since the last one was published in 2015.
Since late presentation is a major obstacle to the 95-95-95 targets, it is necessary to reinforce the follow-up of this population. Increased HIV testing is key to reducing late presentation since it results in earlier HIV diagnosis. Prevention measures like targeting the vulnerable populations and increasing screening programs for those populations are the most urgent strategies to halt and decrease the percentage of late presenters. In low- and middle-income countries, point-of-care testing would be a major advance to stop the spread of the virus by those who do not know their serological status, thereby decreasing late presentation at diagnosis.

Study Group
Our study includes clinical and socio-demographic information from 89851 HIV-1 infected patients from the EuResist Integrated Database (EIDB) between 1981 and 2019. The EIDB is one of the largest existing datasets which integrate clinical, socio-demographic and viral genotypic information from HIV-1 patients. It integrates longitudinal, periodically updated data mainly from Italy (ARCA database), Germany (AREVIR database), Spain (CoRIS and IRISCAIXA), Sweden, Belgium, Portugal and Luxembourg [31][32][33].
In this study, information from the ARCA, AREVIR, Luxembourg, IRISCAIXA, Portugal, Russia, United Kingdom and CoRIS databases was used.

Subtyping
The genomic data included HIV-1 protease and reverse transcriptase sequences, generated through routine drug resistance testing and stored in the EuResist database. Only the first HIV genomic sequence per patient was considered.

Study Variables
We used the information from the EuResist database regarding the following variables:  After creating these two variables, for quality control purposes, we only included in the analysis patients for which treatment status at date of first CD4 count and Treatment Status at date of first Drug Resistance test were consistent.

• Recentness of infection: based on the ambiguity rate of genomic sequences. We defined Chronic infection as an ambiguity value higher than 0.45% and Recent infection as an ambiguity value equal to or below 0.45% [36]. Additionally, only genomic sequences larger than 500 nucleotides and with an ambiguity rate lower than 2.5% were considered.
• LP vs. NLP: based on CD4 count, LP were defined as patients with CD4 count lower than 350 cells/mm3 and NLP were defined as patients with CD4 count higher than 350 cells/mm3.
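The two derived variables above can be expressed as simple threshold rules. The sketch below is only an illustration of those rules: the function and constant names are hypothetical, and because the text does not say how a CD4 count of exactly 350 cells/mm3 was handled, treating it as NLP is an assumption.

```python
from typing import Optional

AMBIGUITY_RECENT_MAX = 0.45  # percent; at or below this value => Recent infection
AMBIGUITY_QC_MAX = 2.5       # percent; sequences must be below this to be considered
MIN_SEQUENCE_LENGTH = 500    # nucleotides; sequences must be longer than this
CD4_LP_THRESHOLD = 350       # cells/mm3


def classify_recentness(ambiguity_pct: float, seq_length: int) -> Optional[str]:
    """Return 'Recent' or 'Chronic', or None if the sequence fails quality control."""
    if seq_length <= MIN_SEQUENCE_LENGTH or ambiguity_pct >= AMBIGUITY_QC_MAX:
        return None
    return "Recent" if ambiguity_pct <= AMBIGUITY_RECENT_MAX else "Chronic"


def classify_presentation(cd4_count: float) -> str:
    """Return 'LP' for CD4 below 350 cells/mm3, otherwise 'NLP'.

    Assumption: a count of exactly 350 is treated as NLP.
    """
    return "LP" if cd4_count < CD4_LP_THRESHOLD else "NLP"


print(classify_recentness(0.30, 1200), classify_presentation(280))
print(classify_recentness(1.10, 1200), classify_presentation(510))
```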

Statistical Analysis
The proportion and median (interquartile range, IQR) of LP and non-late presenters (NLP) were calculated for every categorical and continuous variable, respectively. Categorical variables were compared between LP and NLP with the Chi-square test, and continuous variables with the Mann-Whitney U test.
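For a binary categorical variable, the Chi-square comparison between LP and NLP reduces to a 2x2 table. The following minimal sketch computes the Pearson statistic and its p-value for one degree of freedom. It assumes no continuity correction (the text does not say whether one was applied), and the counts are hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column must have a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = two categories of a variable, columns = LP, NLP.
chi2, p = chi_square_2x2(30, 10, 10, 30)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```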
To study the relationship between our dependent variable (LP or NLP) and the independent variables, logistic regression models were calculated. We first present the logistic regression with the unadjusted odds ratios (uOR) and 95% confidence intervals (95% CI), in order to see the probability of our event, the dependent variable (late presentation vs. non-late presentation), for each independent variable individually, e.g., the probability of a woman being a late presenter. Variables with a p-value < 0.05 were considered to enter the model, since this is the most commonly used threshold. The final model for LP vs. NLP was adjusted for sex: this variable was forced into the model regardless of its significance, and the reference class was women. The final model included only the variables that were considered statistically significant (p < 0.05) and the variables that suited the best regression model according to the backward stepwise regression analysis in SPSS. The odds ratios and 95% confidence intervals were calculated for those variables as well. Data were analyzed using RStudio (Version 1.2.5033) and SPSS (Version 26.0.0.0).
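As an illustration of what an unadjusted odds ratio and its 95% CI represent, the sketch below computes them directly from a 2x2 table using the Wald interval on the log-odds scale. For a single binary predictor this coincides with the estimate from a one-variable logistic regression. The counts and names are hypothetical, and this is not the SPSS procedure used in the study.

```python
import math
from typing import Tuple


def odds_ratio_ci(exposed_lp: int, exposed_nlp: int,
                  unexposed_lp: int, unexposed_nlp: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval (z = 1.96 for ~95%)."""
    cells = (exposed_lp, exposed_nlp, unexposed_lp, unexposed_nlp)
    if any(count <= 0 for count in cells):
        raise ValueError("all four cells must be positive")
    odds_ratio = (exposed_lp * unexposed_nlp) / (exposed_nlp * unexposed_lp)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(sum(1 / count for count in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for one exposure category versus its reference class.
or_, low, high = odds_ratio_ci(30, 10, 10, 30)
print(f"uOR = {or_:.2f} ({low:.2f}-{high:.2f})")
```

An interval that excludes 1 corresponds to a statistically significant association at the chosen level; adjusted odds ratios additionally account for the other variables in the model and cannot be obtained from a single 2x2 table.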

Bayesian Networks
A Bayesian network (BN) is a tool that consists of a directed acyclic graph (DAG), made of nodes and directed links between the nodes, which allows us to understand the representation of a probabilistic distribution. Each node is a representation of a variable, and the links indicate that one node is directly influencing another. The lack of a direct link does not mean that one variable is not associated with another. These networks are able to intuitively create causal links between variables since they are built from probability distributions and for prediction [37].
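The probabilistic distribution represented by the DAG can be written as the standard factorization below, where each variable depends only on its parents in the graph. The second line is purely illustrative: it assumes, hypothetically, that the LP node has viral load (VL), recentness of infection (R), mode of transmission (T) and age (A) as parents; the direction of the links in the learned network is not stated in the text.

```latex
% Joint distribution of a Bayesian network over variables X_1, ..., X_n,
% where Pa(X_i) denotes the set of parents of X_i in the DAG.
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)

% Illustrative factor for the LP node under the assumed parent set:
P(\mathrm{LP} \mid \mathrm{VL}, R, T, A)
```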
We constructed a BN to analyze the association between all variables, specifically, we wanted to see how the variables were associated with one another. With the different levels and connections between variables, it is possible to see if they are directly or indirectly associated. We used the WEKA software version 3.8.5. WEKA stands for Waikato Environment for Knowledge Analysis. After the upload of the dataset, the first step is to choose a classifier to start the analysis. We used a statistical-based learning scheme, the Bayes classifier, specifically the BayesNet [38]. After choosing the classifier, we used different search algorithms as a local score structure learning. Our final choice of algorithm was based on the LogScore Bayes value and the percentage of correctly classified instances.

Conclusions
In summary, late presentation still accounts for 50% of new diagnoses in Europe. Its most important determinants are age at diagnosis, mode of transmission, region of origin and viral load at diagnosis (Figure 3). In addition, the evolution of the rate of late presentation through the years was stable, except for the last two years analyzed (2018 and 2019), when that rate showed an increase. This study highlights the major determinants associated with late presenters in Europe, and this will help to strengthen some prevention measures.

Institutional Review Board Statement:
The protocol was in accordance with the Declaration of Helsinki. This database contains anonymized patients' information, including demographic and clinical data from patients from the EuResist Integrated Database (Date of approval: 15 January 2021).

Data Availability Statement:
Restrictions apply to the availability of these data. Data were obtained from the EuResist Network and are available on request through a study application form at https://www.euresist.org/become-a-partner with the permission of the EuResist Network.

Conflicts of Interest:
The authors declare no conflict of interest.