Phylodynamic and Phylogeographic Analysis of Hepatitis Delta Virus Genotype 3 Isolated in South America

The hepatitis delta virus (HDV) is a globally distributed agent, and its genetic variability allows for it to be organized into eight genotypes with different geographic distributions. In South America, genotype 3 (HDV-3) is frequently isolated and responsible for the most severe form of infection. The objective of this study was to evaluate the evolutionary and epidemiological dynamics of HDV-3 over the years and to describe its distribution throughout this continent in an evolutionary perspective. While using Bayesian analysis, with strains being deposited in the Nucleotide database, the most recent common ancestor was dated back to 1964 and phylogenetic analysis indicated that the dispersion may have started in Brazil, spreading to Venezuela and then to Colombia, respectively. Exponential growth in the effective number of infections was observed between the 1950s and 1970s, years after the first report of the presence of HDV on the continent, during the Labrea Black Fever outbreak, which showed that the virus continued to spread, increasing the number of cases decades after the first reports. Subsequently, the analysis showed a decrease in the epidemiological levels of HDV, which was probably due to the implantation of the vaccine against its helper virus, hepatitis B virus, and serological screening methods implemented in the blood banks.


Introduction
The hepatitis delta virus, which is the sole representative of the Deltavirus genus, causes the most severe form of liver infection among all viral hepatitises, with developmental potential for hepatic cirrhosis, hepatocellular carcinoma (HCC), and death [1]. Cirrhotic individuals that were infected with HDV showed a high risk for developing HCC and hepatic decompensation when compared to patients that were infected with hepatitis B virus (HBV) or hepatitis C virus alone [2]. Infection with this virus only occurs when associated with HBV, which makes it a defective and satellite agent, due to

Alignment and Determination of Evolutionary Model
The alignment was performed while using the Clustal W algorithm in MEGA7 software (Molecular Evolutionary Genetic Analysis, [30]). An estimation of the most appropriate substitution model was performed in MEGA7 itself, while using the Model Analyses (ML) tool, through the Automatic (Neighbor-joining tree) tree, showing that the most suitable model for the data set was Tamura three-parameter with Gamma distribution (T92 + G), which was chosen among the others because it has the lowest Bayesian Information Criterion (BIC) value (4187,2987).

Temporal Signal Estimation
To perform Bayesian phylogenetic inference based on a molecular clock, there must be sufficient genetic change between the sampling times of the taxa to reconstruct a statistical relationship between genetic divergence and time [31]. To examine this relationship, an analysis was performed to verify the existence of a temporal signal in the study data set. This analysis was done while using TempEst v.1.5.3 software, through a "non-clock" phylogenetic tree that was obtained while using IQtree-1.6.10. software.

Bayesian Analysis
In the software BEAUti v.1.10.1., the collection date for the Bayesian inference was included to reconstruct the dynamic and evolutionary past of the virus. The uncorrelated relaxed clock was defined by allowing each branch of the phylogenetic tree to have a different evolutionary rate than the others. Tamura-Nei (TN93) (BIC: 4209,630) was used as an evolutionary model, since T92 is not included among this software's models.
The effective sample size (ESS) of parameters post-run of Markovian chains is indicated to be greater than 200. The Bayesian SkyGrid coalescent model was used. 50 Tree Prior parameters were defined, with a final time point of 100 years before the most recent sampling, estimating one population size every two years. The length of the Monte Carlo Markov Chain (MCMC) was set to 5 × 10 7 , with data collection every 5000, which thus creates 10,000 samples by running Markov chains through BEAST v.1.10.1 software, which was sufficient for the convergence of parameters. The analyses were performed in duplicate and were combined while using LogCombiner v.1.8.4 software.
Bayesian reconstruction and the determination of tMRCA were inferred while using Tracer v.1.7.1 software. The phylogenetic tree was summarized and a consensus tree was created using TreeAnotator v.1.10.1., which in turn selects a single "target" tree and records it with the summary information of the entire sample of trees (10,000), excluding 10% of the samples as burn-in, creating a Maximum Clade

Spatial Phylogenetic Reconstruction of the Evolutionary Dynamics
A phylogeographic analysis was performed to infer a dispersion profile from the cartographic point of view. The methodology used resembles that used in the reconstruction of evolutionary dynamics, with small exceptions. The data that refer to the location coordinates of the taxa were implemented during the execution of BEAUti v.1.10.4. The model used was selected from tests and observation of the ESS value to evaluate the convergence of the parameters. Thus, the Cauchy Relaxed Random Walk model was selected for phylogeographic diffusion in continuous space, with an MCMC chain length of 10 × 10 8 and data collection every 10,000 steps. The results of the analysis were summarized while using TreeAnotator v.1.8.4. for building an MCC tree. This, in turn, was used to plot the diffusion results on the map through SPREAD v.1.0.7 software and the result was viewed through Google Earth software. Specific isolation local of sequence were obtained by reading the studies that gave rise to the deposition. Only the sequence KC590319.1 had no specific isolation local defined (only Brazil was informed as the place of origin), however the phylogenetic analysis allowed for observing that it was similar to those that were isolated in the State of Amazonas, Brazil. Therefore, this state was defined as the origin place of this strain. The coordinates used were obtained by searching in Google Earth ( Table 2).

Results and Discussion
It was possible to observe a linear regression curve that shows a positive correlation between genetic divergence and sampling time by exploring the existence of the temporal signal in the study dataset ( Figure 1). This correlation pattern points to the conclusion that the alignment contains enough temporal information to justify a molecular clock approach [31]. Although the temporal signal level is low, as evidenced by the R 2 value (0.24), this parameter should not be used to test the statistical significance of the regression, because the individual data points are not independently distributed and are instead partially correlated due to shared phylogenetic ancestry [31]. Temporal signal secondary analyses were performed, including the full-length genome and R0 region of 55 HDV-1 sequences deposited in the Nucleotide database and exhibited similar temporal signal levels [32] (data not shown).  The graph shows the positive correlation between genetic diversity from root-to-tip (y-axis) and time (x-axis). This effect on the relationship of these variables shows the existence of a temporal signal in the data set that makes it sufficient to perform molecular clock analysis. Evolutionary dynamics analysis being performed in isolated sequences from different regions of South America showed the most recent common ancestor time, dating to 47 years prior to the most recent sample, estimated in 1964 according to the mean of the data (95% of HPD between 1952 and 1972) (Figure 2).
In the phylogenetic tree constructed (Figure 3), it was possible to observe that the HDV sequences that were isolated in the state of Amazonas (Brazil) [27] have a common ancestor with sequences isolated in Colombia [24] dating around the year 1990 (posterior probability of 74.4%). This grouping shares an ancestor dating five years earlier, with three sequences from Venezuela [28]. These Venezuelan HDV strains were isolated from Yucpa indigenous patients from three distinct, but closely located, villages and lined up in a single cluster, which suggests that a single strain of HDV caused dissemination in this population. Figure 1. Temporal signal linear regression graph. The graph shows the positive correlation between genetic diversity from root-to-tip (y-axis) and time (x-axis). This effect on the relationship of these variables shows the existence of a temporal signal in the data set that makes it sufficient to perform molecular clock analysis.
Evolutionary dynamics analysis being performed in isolated sequences from different regions of South America showed the most recent common ancestor time, dating to 47 years prior to the most recent sample, estimated in 1964 according to the mean of the data (95% of HPD between 1952 and 1972) (Figure 2).  In the early 1980s, the Yucpa Amerindians were affected by a devastating epidemic of HDV [28]. In a 1992 study that involved 216 Yucpa Indian patients with HBV under observation between 1983 and 1988, 74 were initially positive for HDV infection and another 35 contracted the infection during the study, which showed evidence of a high prevalence of the virus in this population [33]. Although the years between the results of the Bayesian analysis are not accurately related to prevalence studies, it is worth mentioning that, in the generated MCC tree, the nodes are located according to the average age estimation results in 95% HPD intervals of these same nodes. This range of probabilities varies, the node in question varied between 1975 and 1989, and the mean of data estimated this node as being the year 1885.
The phylogenetic analysis allowed for us to observe that this cluster of HDV strains from Venezuela, Colombia, and Brazil still share ancestors between 1963 and 1970, with HDV strains being isolated from patients who presented fulminant hepatitis and died, deposited by [14], showing that the spread of HDV occurred in this direction: starting in Brazil, spreading to Venezuela, and then to Colombia. However, there is no relation of heterochronology between the sequences of Venezuela and Colombia, which were collected in 1990 and 2007, respectively. This lack of disparity between the intragroup isolation dates does not allow for us to conclude that HDV did, in fact, spread in this way. In addition, Bolivia and Peru are countries that report HDV isolates, but they do not have collection dates that are specified for the deposited sequences.
According to the phylogeographic analysis that was performed in the study, in 1980, the first possible phylogenetic relationships of HDV-3 are detectable, which quite possibly shows that the dispersion started in the region of Boca do Acre (Amazonas, Brazil) and then arrived at Sena Madureira (Acre, Brazil) ( Figure 4A). In the following decade, the Yucpa amerindians of the northwestern region of Venezuela experienced the devastating epidemic of HDV and, according to the results, most probably the strain that caused the HDV infections in this population arose from a focus of dissemination in the municipality of Pauini (Amazonas, Brazil). Given the distance between the points, it is likely that the dissemination occurred through a traveling individual. In this same decade, HDV-3 continued to spread through other points in this region ( Figure 4B). sequences that were isolated in the state of Amazonas (Brazil) [27] have a common ancestor with sequences isolated in Colombia [24] dating around the year 1990 (posterior probability of 74.4%). This grouping shares an ancestor dating five years earlier, with three sequences from Venezuela [28]. These Venezuelan HDV strains were isolated from Yucpa indigenous patients from three distinct, but closely located, villages and lined up in a single cluster, which suggests that a single strain of HDV caused dissemination in this population.  . Bayesian phylogenetic tree. In the maximum clade credibility (MCC) tree generated, the phylogenetic relationship was estimated by Bayesian analysis among 44 strains of HDV-3 isolated in South America from 1978 to 2011. Red color taxa correspond to sequences from [28]; blue corresponds to those from [24]; green to those from [27]; black to those from [22] and purple to that from [29] ( Table 1). In each node the posterior probability rate is shown as percentage data and in the lower part, the time in years is displayed. The time to the Most Recent Common Ancestor (tMRCA) dating back to 1964 is demonstrated for this tree root.   In the last year of the limit of the analysis, 2011, it is already possible to observe that this genotype was even more dispersed in the Amazon region, in the East, West, and North directions. A new focus of dissemination was formed, specifically in the vicinity of the municipality of Eirunepé (Amazonas, Brazil), which, in addition to originating strains that were directed to other municipalities of the same state (Tefé and Ipixuna), also originated those that were directed to the state of Acre, in Tarauacá. The strains of Colombia also originated from ancestors that were linked to the Eirunepé region and arrived in the country causing fulminant hepatitis in patients who were invited to participate in the Alvarado-Mora et al. (2011) [24] study ( Figure 4C).
SkyGrid reconstruction analysis ( Figure 5) showed slow exponential growth that began in 1952 (tMRCA minimum value) and continued through the mid-1980s. A study from 2011 using only HDV-3 sequences that were isolated in Brazil, Colombia, Venezuela, and Peru, found exponential growth between the 1950s and 1970s, through analysis with the Bayesian Skyline Plot model [24] and showed a brief similarity between the results, when the geographic origin of the analyzed strains was considered. plane represent the posterior support of the node, which also accompany a color gradient, ranging from black (lightly supported node) to light blue (well supported node).
SkyGrid reconstruction analysis ( Figure 5) showed slow exponential growth that began in 1952 (tMRCA minimum value) and continued through the mid-1980s. A study from 2011 using only HDV-3 sequences that were isolated in Brazil, Colombia, Venezuela, and Peru, found exponential growth between the 1950s and 1970s, through analysis with the Bayesian Skyline Plot model [24] and showed a brief similarity between the results, when the geographic origin of the analyzed strains was considered.
However, the Bayesian SkyGrid coalescent model that was used for population dynamics inference in the present study presents a more reliable and accurate approach. Presented to the scientific community two years later, in 2013, it has been shown to be more efficient in recovering true population trajectories and it has produced a considerable increase in the accuracy of tMRCA determination [34]. This improvement appears to be especially prominent in multi-loci datasets, such as the analyzed region of the HDV genome, which, in addition to partially comprising the HDAg encoding gene, has nucleotides that correspond to L-HDAg and also a non-coding region. This observation is supported when 95% HPD of the tMRCA value determined by [24] varied by 153 years (1821-1974), while that of the current study varies by only 20 years (1952-1972). Acclivity in genetic diversity/time relationships can be interpreted as an increase in the effective number of infections over time, and the reverse interpretation is possible when a decline occurs on the graph. This association is only possible when the genomic changes of the analyzed organism do not undergo some strong influence of natural selection [35], as occurs, for example, in the cases of resistance mutations in the genome of HBV and HIV, which arise when these viruses are exposed to drugs whose targets are specific to viral replication activity.
It was stated that HDV was related to old cases of hepatitis on the South American continent. One report that is most likely associated with this virus is known as "Sierra Nevada de Santa Marta" However, the Bayesian SkyGrid coalescent model that was used for population dynamics inference in the present study presents a more reliable and accurate approach. Presented to the scientific community two years later, in 2013, it has been shown to be more efficient in recovering true population trajectories and it has produced a considerable increase in the accuracy of tMRCA determination [34]. This improvement appears to be especially prominent in multi-loci datasets, such as the analyzed region of the HDV genome, which, in addition to partially comprising the HDAg encoding gene, has nucleotides that correspond to L-HDAg and also a non-coding region. This observation is supported when 95% HPD of the tMRCA value determined by [24] varied by 153 years , while that of the current study varies by only 20 years (1952-1972). Acclivity in genetic diversity/time relationships can be interpreted as an increase in the effective number of infections over time, and the reverse interpretation is possible when a decline occurs on the graph. This association is only possible when the genomic changes of the analyzed organism do not undergo some strong influence of natural selection [35], as occurs, for example, in the cases of resistance mutations in the genome of HBV and HIV, which arise when these viruses are exposed to drugs whose targets are specific to viral replication activity.
It was stated that HDV was related to old cases of hepatitis on the South American continent. One report that is most likely associated with this virus is known as "Sierra Nevada de Santa Marta" hepatitis, an outbreak that occurred in Colombia [24]. In 1985, 60% of HBsAg-reactive individuals from a village in northern Colombia who had clinical profiles that were consistent with this outbreak were also reactive for anti-HDV [36]. In the following year, a study was performed with 81 patients who presented this hepatitis clinical profile and died, 70% of whom presented the Delta Antigen through autopsy and viscerotomy analyses [37].
The exponential growth between 1952 and 1980 indicates that there was an exorbitant increase in the epidemiological rates of HDV infection during this period. Some years before, precisely in 1938, the presence of HDV in South America was confirmed through histological analysis of hepatic viscerotomy samples and the serology of patients who presented a liver disease, known as Labrea Black Fever, in the southern region of the State of Amazonas, Brazil [38,39]. This outbreak was characterized by severe fast course hepatitis, which resulted in liver failure and death [40]. There are reports of a disease that decimated entire families and affected so-called rubber soldiers and family members, receiving various denominations, including Lábrea Black Fever and Lábrea hepatitis [41], which was later associated with HDV infection. Thus, it is possible to assume that the rate of HDV infection continued to increase among these populations decades after initial reports, until it experienced a stationary period and decline.
The two oldest HDV cases date back to 1938 and 1940, and they were confirmed by histological analysis and Delta antigen detection from patients from the Boca do Acre region of Brazil [38,39]. Dias & Coura (1985) [39] also concluded, histologically, the diagnosis of Lábrea hepatitis in samples of Colombian and Peruvian patients; however there was no detection of the Delta antigen [39]. Currently, it is known that this clinical condition also presents other possible etiologies [40], which makes it impossible to sustain that these diseases were related to HDV. Even if they were, it is not possible to say whether or not the infection was contracted in the country of origin, or whether those patients were residents of Brazil. Therefore, while considering the evolutionary dispersion profile that were observed through the temporal phylogenetic tree (Figure 3) and the order of reports of the presence of HDV in South America-Brazil in 1938, 1940, and 1978, Venezuela in the mid-1980s [28] and Colombia in 1985 and 1986 [36,37]-it is plausible to conclude that the spread of HDV-3 truly began in Brazil.
In September 1989, vaccination against HBV was implemented in the Amazon region and in 1993 the minimum age for immunization was decreased [17,24]. When considering the fact that HDV infection is only possible when associated with HBV, one would expect the relative number of infections to decrease after this event. However, Bayesian analysis showed that the first decline in the genetic diversity/time relationship occurred only after the year 2000. Seven years after the introduction of the vaccine, a decrease in the prevalence rate of HDV was reported in areas that are considered to be hyperendemic in the Amazon region, from 32% in individuals reactive for HBsAg to 22.2% in individuals with reactivity to any marker of hepatitis B [42,43].
A recent study reported successive reductions in the prevalence rate of HDV over time in South America. In the 1980s, the value of this variable was determined to be 40.29% for HBsAg reactive patients and it decreased to 28.01% by the end of the decade, and then to a moderate endemic level in 2000 (13.58%). It also continued to exponentially decline to values that were close to those of the rest of the world in 2010 (6.97%) [44]. Most of the strains that were used in the analysis were isolated in Brazil. Epidemiological data from this country show that there was a recent fall in the number of registered cases of hepatitis Delta from 359 cases in 2013, to 159 in 2017, with a decrease of 55.72% (mean drop of 13.93% per year). Between 1999 and 2017, 75% of registered HDV infections occurred in the northern region of Brazil [13], which, in addition to partially comprising the Western Amazon Basin, is relatively geographically close to Venezuela and Colombia. Bayesian analysis of HDV evolution presented results that corroborate HDV prevalence studies at this study site from a more specific point of view, which observes how much the viral genome has been modified over time, estimating a field of probabilities that reflect the genetic diversity that is related to the time of the epidemiological behavior of the pathogen. In an indirect way, the study suggests that the result of immunization against HBV and the screening of transfusion materials in blood centers was determinant for the fall in HDV epidemiological levels.

Conclusions
The decline in the genetic diversity of HDV in the new millennium may be a direct consequence of the implementation of the HBV vaccine and of more efficient screening methods that are directed at transfusion materials in blood banks, since the blood pathway is a strong form of HBV transmission, and consequently HDV. These considerations regarding changes in the genome of the virus over time are sustained by observations in the established epidemiological rates in previous studies that have shown successive declines in the number of HDV cases. However, HDV infection is still a worrisome health problem, especially in endemic regions, which causes the development of fulminant hepatitis and chronic hepatitis, which can progress to liver cirrhosis, hepatocellular carcinoma, and death.
Regarding the evolutionary history, although the most recent ancestor of the strains analyzed was dated back to the 1960s, previous studies have reported the presence of this virus in South America about 30 years earlier, which indicates that infection by this virus has been a serious problem for almost a century, affecting both isolated Amerindian populations and the general population. The evolutionary profile and chronology of HDV reports in South America indicate a possible geographical origin of the virus in Brazil. However, it is recommended to consider that the Bayesian analysis only showed viral epidemiological behavior after 1952 and there is no way to determine how the spread of the virus occurred during previous periods. Therefore, it is possible that there is another origin, since the first confirmed report was of patients with Labrea Black Fever, and these cases began to be reported only after 1926, presuming then that this is the period of arrival of the virus in the Brazil, or a possible founder effect, which may or may not be associated with indigenous populations.