Dates and Rates of Tick-Borne Encephalitis Virus—The Slowest Changing Tick-Borne Flavivirus

We evaluated the temporal signal and substitution rate of tick-borne encephalitis virus (TBEV) using 276 complete open reading frame (ORF) sequences with known collection dates. According to a permutation test, the TBEV Siberian subtype (TBEV-S) data set has no temporal structure and cannot be applied for substitution rate estimation without other TBEV subtypes. The substitution rate obtained suggests that the common clade of TBEV (TBEV-common), including all TBEV subtypes and louping-ill virus (LIV), is characterized by the lowest rate (1.87 × 10−5 substitutions per site per year (s/s/y) or 1 nucleotide substitution per ORF per 4.9 years; 95% highest posterior density (HPD) interval, 1.3–2.4 × 10−5 s/s/y) among all tick-borne flaviviruses previously assessed. Within TBEV-common, the TBEV European subtype (TBEV-E) has the lowest substitution rate (1.3 × 10−5 s/s/y or 1 nucleotide substitution per ORF per 7.5 years; 95% HPD, 1.0–1.8 × 10−5 s/s/y) as compared with TBEV Far-Eastern subtype (3.0 × 10−5 s/s/y or 1 nucleotide substitution per ORF per 3.2 years; 95% HPD, 1.6–4.5 × 10−5 s/s/y). TBEV-common representing the species tick-borne encephalitis virus diverged 9623 years ago (95% HPD interval, 6373–13,208 years). The TBEV Baikalian subtype is the youngest one (489 years; 95% HPD, 291–697 years) which differs significantly by age from TBEV-E (848 years; 95% HPD, 596–1112 years), LIV (2424 years; 95% HPD, 1572–3400 years), TBEV-FE (1936 years, 95% HPD, 1344–2598 years), and the joint clade of TBEV-S (2505 years, 95% HPD, 1700–3421 years) comprising Vasilchenko, Zausaev, and Baltic lineages.


Introduction
Tick-borne encephalitis virus (TBEV) is a pathogenic and neurotropic RNA-virus belonging to the family Flaviviridae, genus Flavivirus with up to 12,000 tick-borne encephalitis (TBE) cases identified annually from countries where the disease is tracked [1]. The coding region of the TBEV genome consists of one open reading frame (ORF) with a length of 10,245 nt encoding three structural and seven non-structural proteins [2] that is typical for flaviviruses [3].
The common clade of TBEV (TBEV-common) comprises at least five main subtypes: Far-Eastern (TBEV-FE), European (TBEV-E), Siberian (TBEV-S), Baikalian (TBEV-B) [4] and the recently discovered Himalayan (TBEV-H) [5] subtype. Moreover, the scientific community also distinguishes TBEV subtypes into subvariants or subgenotypes: within TBEV-S there is delineation into the Zausaev (TBEV-SZ), Vasilchenko (TBEV-SV), and Baltic (TBEV-SB) subvariants. Moreover, many distinct lineages with relatively long patristic distances are reported (e.g., Omskaya and Bosnia lineages [6] which are close to TBEV-S; the 178-79 strain, close to TBEV-B and -FE; the NL lineage, close to TBEV-E [7], etc.), but it is difficult to reach a consensus about their intraspecies status due to the lack of an accepted definition of subtype. Nevertheless, a 10% nucleotide difference threshold for TBEV subtypes was recently proposed [8]. In addition, louping-ill virus (LIV) primarily circulating in the British Isles is a paraphyletic unit in the structure of TBEV-common [9] and in the term of monophyly can be considered as a member of the tick-borne encephalitis virus species [10] or together with TBEV-E as a single species [11].
Several estimates of TBEV substitution rate based on the different loci (mainly, E gene or complete ORF sequences) have been reported with the values in different studies by an order of magnitude (e.g., median values of 0.8-1.4 × 10 −4 for the E gene [12] and 3.9 × 10 −5 for the complete ORF [9]). There was only one study, to our knowledge, where the temporal signal (or temporal structure [13][14][15]) of TBEV was evaluated [12] but the researchers analyzed E gene sequences (1488 nt) and used an informal root-to-tips regression analysis which tended to give false negative results [15] in the case of an ancient root and relatively narrow sampling windows (time between the most recent and oldest samples in a data set). Thus, it is possible that some of those estimates need to be revised.
In the current work, we aimed to assess the temporal structure of TBEV-common as well as TBEV subtypes separately using complete genome data available at present. We approached this by applying informal regression analysis in TempEst and a formal permutation test when collection dates between samples were permuted. Analysis suggests that TBEV-common representing the tick-borne encephalitis virus species is the oldest and the slowest virus among tick-borne flaviviruses. To the best of the authors' knowledge, this work is the first comprehensive study of the TBEV substitution rate and population dynamics based on the actual complete ORF data set.

Temporal Signal in the Data Sets of the Common Clade and Different Subtypes of TBEV
We assessed the temporal signal using the complete genome data set of the common TBEV clade including LIV and the separate data sets of each TBEV subtype. LIV was not analyzed individually since it has been previously demonstrated with the permutation test that LIV has no temporal structure [16]. Preliminary analysis with TempEst showed ( Figure 1) the lack of a correlation between the collection dates and root-to-tips genetic distances in all data sets except TBEV-FE with R 2 = 0.145 (weak temporal structure). As TempEst tends to infer false negative results (low R 2 values) in the case of relatively old tree roots and narrow sampling windows [15], we relied on the permutation test with ten permuted replicas in each analysis. The permutation test (Figure 2) revealed that the data set comprising TBEV-common samples has a strong temporal signal without overlapping root height 95% HPD intervals of true dates analysis and permuted replicas, despite an extremely low R 2 of 3 × 10 −5 inferred by TempEst (see Figure 1).

Figure 1.
Root-to-tips regression analysis indicating a correlation between genetic distances from a root to tree tips and collection dates of samples. The X-axes represent calendar years where negative values are years BC. The abbreviates in the titles of windows are the common tick-borne encephalitis virus clade (TBEV-common) and TBEV subtypes: Far-Eastern (-FE), European (-E), and Siberian (-S).
As TempEst tends to infer false negative results (low R 2 values) in the case of relatively old tree roots and narrow sampling windows [15], we relied on the permutation test with ten permuted replicas in each analysis. The permutation test ( Figure 2) revealed that the data set comprising TBEV-common samples has a strong temporal signal without overlapping root height 95% HPD intervals of true dates analysis and permuted replicas, despite an extremely low R 2 of 3 × 10 −5 inferred by TempEst (see Figure 1). Moreover, TBEV-FE but not TBEV-S demonstrated the presence of temporal structure. The results obtained validate the use of the common TBEV clade and TBEV-FE complete genome data sets in molecular clock analysis. Thus, TBEV-S is not considered for substitution rate evaluation. For the population dynamics analysis of TBEV-S, we used the substitution rate of TBEV-common. TBEV-E was not used for the permutation test as it was shown previously that the TBEV-E data set contained the temporal signal [16].

Nucleotide Saturation Test
The nucleotide saturation analysis showed that the codons in the TBEV-common The results obtained validate the use of the common TBEV clade and TBEV-FE complete genome data sets in molecular clock analysis. Thus, TBEV-S is not considered for substitution rate evaluation. For the population dynamics analysis of TBEV-S, we used the substitution rate of TBEV-common. TBEV-E was not used for the permutation test as it was shown previously that the TBEV-E data set contained the temporal signal [16].

Nucleotide Saturation Test
The nucleotide saturation analysis showed that the codons in the TBEV-common data set were not critically saturated by substitutions (Iss = 0.271 and Iss.c = 0.571) and, therefore, were applicable for phylogenetic analysis.

Divergence Times
To evaluate the divergence time of TBEV subtypes, we used the data set containing 276 complete genome sequences of TBEV-common. The results of our analysis in BEAST suggest that the root age of TBEV-common is 9623 years (95% HPD, 6373-13208 years) ( Figure 4).
The youngest TBEV subtype is TBEV-B (489 years; 95% HPD, 291-697 years); however, these estimates should be treated with caution, as the number of TBEV-B complete genome sequences (20) is relatively small ( Figure 5). The UCLD clock analysis exhibited 95% of TBEV-E CV posterior density being lower than 0.13, which implied that the data is clock-like. TBEV-common and TBEV-FE demonstrated non-clock-like behavior with CV ranging 0.27-0.42 and 0.34-0.57, respectively, which supports applying UCLD.

Divergence Times
To evaluate the divergence time of TBEV subtypes, we used the data set containing 276 complete genome sequences of TBEV-common. The results of our analysis in BEAST suggest that the root age of TBEV-common is 9623 years (95% HPD, 6373-13,208 years) ( Figure 4).     The ages of the following TBEV subtypes (in ascending order) are significantly different: TBEV-B, -E, LIV and -FE/-S (95% HPD intervals of LIV, TBEV-FE and -S are overlapped substantially).

Relative Genetic Diversity
Bayesian skyline reconstruction of TBEV-common with the strongest population structure (i.e., incorporating samples of main TBEV subtypes such as TBEV-S, -E, -FE, -B, that are isolated geographically) revealed population dynamics with patterns similar to the skyline trends of TBEV-S, -E, and -FE, separately ( Figure 6). The ages of the following TBEV subtypes (in ascending order) are significantly different: TBEV-B, -E, LIV and -FE/-S (95% HPD intervals of LIV, TBEV-FE and -S are overlapped substantially).

Relative Genetic Diversity
Bayesian skyline reconstruction of TBEV-common with the strongest population structure (i.e., incorporating samples of main TBEV subtypes such as TBEV-S, -E, -FE, -B, that are isolated geographically) revealed population dynamics with patterns similar to the skyline trends of TBEV-S, -E, and -FE, separately ( Figure 6).  In other words, the N e × τ trend of TBEV-common consists of the TBEV-S N e × τ uprising in the 1300s followed by the TBEV-S, -E, -FE stable period during the years 1500-1900, and the decline of TBEV-E and -FE relative genetic diversity values in the 1900s. Notably, the TBEV-S skyline trend has no decline close to the present.
Previously [11], we proposed to unite the clades of TBEV-FE, -B, -S, -H, and other single TBEV lineages into the separate virus species F. zilber, and the TBEV-E subtype with LIV into another species F. neudoerfl. This is required by the monophyletic conception ( Figure 4) and the ICTV definition of virus species (https://ictv.global/about/code; accessed on 26 December 2022). Our estimates suggest F. neudoerfl is slightly younger than F. zilber with the time to the most recent common ancestor being 5634 and 8327 years, respectively; however, the 95% HPD intervals are significantly overlapped.
TBEV-common with a strong population structure has demonstrated population dynamics features shared with three main subtypes (TBEV-S, -E, -FE; Figure 6), which highlights insidious aspects of coalescent inference assumptions (such as population structure or isolation influence). In the case of TBEV-common, Bayesian Skyline does not reflect the true changes of relative genetic diversity; therefore, during Bayesian Skyline analysis, TBEV-common as a metapopulation should be subdivided into subpopulations (or demes) according to TBEV subtypes. When analyzing TBEV subtypes separately, it was revealed that only the TBEV-S population has pronounced Ne × τ ascending in the early 1300s and has no Ne × τ decline to the present. In a previous study [12], researchers used the Bayesian SkyGrid model to analyze 551 TBEV E gene nucleotide sequences and obtained an opposite pattern of population dynamics of three main TBEV subtypes (TBEV-S, -E, -FE) with ascending relative genetic diversity up to the present. This discrepancy is Previously [11], we proposed to unite the clades of TBEV-FE, -B, -S, -H, and other single TBEV lineages into the separate virus species F. zilber, and the TBEV-E subtype with LIV into another species F. neudoerfl. This is required by the monophyletic conception ( Figure 4) and the ICTV definition of virus species (https://ictv.global/about/code; accessed on 26 December 2022). Our estimates suggest F. neudoerfl is slightly younger than F. zilber with the time to the most recent common ancestor being 5634 and 8327 years, respectively; however, the 95% HPD intervals are significantly overlapped.
TBEV-common with a strong population structure has demonstrated population dynamics features shared with three main subtypes (TBEV-S, -E, -FE; Figure 6), which highlights insidious aspects of coalescent inference assumptions (such as population structure or isolation influence). In the case of TBEV-common, Bayesian Skyline does not reflect the true changes of relative genetic diversity; therefore, during Bayesian Skyline analysis, TBEV-common as a metapopulation should be subdivided into subpopulations (or demes) according to TBEV subtypes. When analyzing TBEV subtypes separately, it was revealed that only the TBEV-S population has pronounced N e × τ ascending in the early 1300s and has no N e × τ decline to the present. In a previous study [12], researchers used the Bayesian SkyGrid model to analyze 551 TBEV E gene nucleotide sequences and obtained an opposite pattern of population dynamics of three main TBEV subtypes (TBEV-S, -E, -FE) with ascending relative genetic diversity up to the present. This discrepancy is due to the difference in molecular markers compared to parameters of the Skyline and SkyGrid models. We believe that an increase in the number of loci (TBEV ORF contains ten genes) has a better effect on the accuracy of the population dynamics inference than an increase in the number of sequences. This is at least known for the Extended Bayesian Skyline Plot [32]. Evaluation of the number of parsimony-informative sites in the E gene data set with known collection dates (723 samples) and the ORF data set (276 samples) used in the current study showed that despite a relatively low number of sequences, the ORF data set contained almost six times more informative sites than E gene alone (4082 and 658 or 14.8 and 0.9 informative sites per OTU, respectively).
The observed dynamics of relative genetic diversity is associated with changes in the two-host vertebrate-invertebrate (rodent-tick) system. In turn, the climatic changes during this period could have affected rodents and ticks. The time period from the 1300s to the end of the 1900s is known as the Little Ice Age [33,34]. From 950 to 1250, there was a medieval climatic optimum with, respectively, high temperature and humidity in the northern hemisphere [34] followed by a sharp cooling at the first stage of the Little Ice Age (1300-1440) in the Eurasian part of the TBEV distribution area. All three TBE subtypes had, respectively, high relative genetic diversity during this first stage. At the second stage of the Little Ice Age (1440-1560), there was a slight rise in temperature. The third stage (1560-1900) was characterized by a drop in the temperature and humidity associated with the period of low solar activity [34,35]. The Little Ice Age ended at the beginning of the 1900s and since then there has been a rise in temperature in the northern hemisphere. The TBEV-S population increase was detected in the first stage of the Little Ice Age. Some studies based on the historical observations [36] and the population genetic analysis concluded that there was a surge in the number of rodents during the Little Ice Age [37,38]. This surge, in turn, was associated with the liberation of ecological niches by larger mammals and a decrease in the number of predators including poikilotherms. A rise in the rodent population supports the increase of relative genetic diversity of TBEV. An increase in the relative genetic diversity of TBEV-S during the Little Ice Age could also be due to the retention of the sufficient level of humidity on the Eurasian continent [39][40][41]. The consequence of this was the preservation of the suitable habitats for ticks. The decline in the effective population sizes of TBEV-E and TBEV-FE after 1900 is probably associated with the following:

•
Hidden population structure within TBEV-E and -FE (so-called confounding effect of population structure [42]); • Large-scale forest treatment by acaricides including dichlorodiphenyltrichloroethane (DDT) to prevent human infection.
The TBEV-S effective population size shows no decline up to the present due to either a relatively small sample size of TBEV-S or a weaker population structure.

Data Set Design
We searched homologous ORF nucleotide sequences of TBEV with known collection dates in the GenBank database. LIV was also included in the search since together with TBEV it formed a paraphyletic clade in the phylogenetic tree. In total, 276 ORF sequences (10,242 nt) were found and filtered by stop codons. The data set composition is presented in Table 1. Single TBEV lineages (e.g., 178-79 strain, Bosnia and Omskaya lineages of TBEV-S, NL lineage of TBEV-E) were not combined with the main TBEV subtypes in a union clade due to relatively high patristic distances, and, therefore, were not involved in the evaluation of divergence times.

Codon Saturation Measuring
The index of substitution saturation (I ss ) in a set of aligned nucleotide sequences was calculated in DAMBE [44] and compared with critical I ss values (referred to as I ss.c [45]). Minor saturation was observed in the case of I ss < I ss.c .

Temporal Signal Assessment
We assessed temporal signal in the TBEV-common data set using all 276 ORF sequences along with subtypes of TBEV separately. TBEV-S comprised TBEV-SB, -SV, and -SZ variants. We did not evaluate the signal of TBEV-E since it was previously approved with a permutation test by Clark, et al. (2020) [16]. The evidence of a lack of the temporal structure in the LIV complete genome data set was also shown by these authors. Thus, temporal structure was evaluated for TBEV-common, TBEV-FE and TBEV-S data sets.
Firstly, we used TempEst [46] to conduct regression analysis where the correlation between the genetic distances from a tree root to tips (root-to-tips distances) and the collection dates was estimated. This method is informal and tends to give false negative results in the case when a sampling window (Table 1) is narrow relative to root height [15]. Secondly, we then applied intuitive and formal permutation tests where collection dates were permuted between samples. We used 10 permuted replicas, which were analyzed in BEAST. Root height values obtained were compared with the analysis of data sets with true collection dates. In the case of overlapping 95% HPD intervals of true and permuted dates, temporal signal was taken as negative and vice versa. To reduce the computational cost, MCMC runs of permuted replicas were continued until effective sample sizes (ESS) of all parameters reached 100.
A molecular clock model was tested by the coefficient of rate variation (CV) values according to an empirical rule [49] when CV posterior density in the range 0.0-0.1 indicates clock-likeness or strict clock (SC) evidence and, otherwise, a relaxed clock with uncorrelated lognormal prior distributions (UCLD) of substitution rates among lineages. Substitution rate values are indicated in substitutions per site per year (s/s/y) with median values and 95% HPD intervals.

Phylogenetic Analysis in BEAST
Phylogenetic analyses were performed in BEAST v.2.6.3 [50]. Markov chain Monte Carlo (MCMC) analyses were run until ESS values of all parameters were >200 with a sampling frequency at which the total number of trees was at least 40,000. The longest MCMC length of more than one billion iterations was for the TBEV-common data set. MCMC convergence was checked in Tracer v.1.7.2 [51]. Most credibility clade (MCC) trees were estimated with TreeAnnotator.
The BEAST files and fasta alignment used in this study can be found at link: https://doi.org/10.6084/m9.figshare.21780239 (accessed on 26 December 2022)

Bayesian Skyline Reconstruction
To reconstruct population dynamics, we used the Bayesian Skyline method [52] with default priors. Due to the strong population structure of TBEV-common, we additionally subdivided the TBEV-common data set into TBEV-S, -E, and -FE subtypes to compare population dynamics inferences of populations with weak and strong population structures.
As TBEV-S showed the lack of temporal signal in the permutation test, in TBEV-S population dynamics analysis, we applied the substitution rate of TBEV-common.

1.
TBEV has the lowest substitution rate among other tick-borne flavivirus species; 2.
The actual data set of TBEV-S ORF sequences has no temporal structure; 3.
TBEV-common representing the species tick-borne encephalitis virus according to median values diverged 9623 years ago with a 95% HPD interval of 6373-13,208 years; 6.
The population dynamics of TBEV-common as a metapopulation with a strong population structure has a "chimeric" pattern comprising features of dynamics of highly isolated TBEV subtypes (TBEV-S, -E, and -FE) so N e × τ dynamics inferred from TBEV-common cannot be interpreted as changes in the effective number of infections.