Genetic Diversity and Possible Origins of the Hepatitis B Virus in Siberian Natives

A total of 381 hepatitis B virus (HBV) DNA sequences collected from nine groups of Siberian native populations were phylogenetically analyzed along with 179 HBV strains sampled in different urban populations of former western USSR republics and 50 strains from Central Asian republics and Mongolia. Different HBV subgenotypes predominated in various native Siberian populations. Subgenotype D1 was dominant in Altaian Kazakhs (100%), Tuvans (100%), and Teleuts (100%) of southern Siberia as well as in Dolgans and Nganasans (69%), who inhabit the polar Taimyr Peninsula. D2 was the most prevalent subgenotype in the combined group of Nenets, Komi, and Khants of the northern Yamalo-Nenets Autonomous Region (71%) and in Yakuts (36%) from northeastern Siberia. D3 was the main subgenotype in South Altaians (76%) and Buryats (40%) of southeastern Siberia, and in Chukchi (51%) of the Russian Far East. Subgenotype C2 was found in Taimyr (19%) and Chukchi (27%), while subgenotype A2 was common in Yakuts (33%). In contrast, D2 was dominant (56%) in urban populations of the former western USSR, and D1 (62%) in Central Asian republics and Mongolia. Statistical analysis demonstrated that the studied groups are epidemiologically isolated from each other and might have contracted HBV from different sources during the settlement of Siberia.


Introduction
Human hepatitis B virus (HBV) belongs to the Hepadnaviridae family. It has a circular DNA genome of ~3200 nucleotides encoding four major proteins: P, S, C, and X. Differences in the structure of the surface antigen HBsAg, encoded by the S-gene, have allowed researchers to identify nine HBV subtypes: ayw1, ayw2, ayw3, ayw4, ayr, adw2, adw4, adrq+, and adrq− [1][2][3]. The amino acid residues that determine these subtypes have been described [4,5]. Phylogenetically, HBV is classified into at least eight genotypes designated A through H [6][7][8][9][10][11][12]. Some authors additionally distinguish genotype I, which is considered a recombinant between genotypes A, C, and G, and the rare genotype J, which is highly divergent from other human HBV strains and is close to a gibbon HBV [13,14]. Furthermore, within the genotypes, at least 42 subgenotypes have been described, designated with Arabic numerals (A1, A2, etc.) [15][16][17][18][19]. Strains of the same HBsAg subtype may belong to different (sub)genotypes and vice versa, but there is a predominant correlation between serological subtypes and genotypes [14,17]. The subtype and genotype classifications of HBV complement each other and are used in serological research (e.g., studies on vaccine efficacy or serological diagnostics), as well as in genetic studies on the evolution and molecular epidemiology of HBV.
The prevalence of different HBV (sub)genotypes and HBsAg subtypes varies among geographical regions [3,17,20]. Genotype A (subtypes ayw1 and adw2) dominates in northwestern Europe, North America, Africa, and Asia, with subgenotype A1 common in Africa and Asia and A2 in Europe and the USA. Genotypes B (subtypes ayw1 and adw2) and C (subtypes adr, adrq+/−, ayr, and adw2) predominate in southeastern Asia and Oceania. Genotype D is the most widely distributed genotype, but its main subgenotypes vary in different parts of the world: D1 (ayw2) predominates in the Middle East and in Turkic and Maghreb states; D2 (ayw3) is common in Eastern Europe; and D3 (ayw2) circulates as a minor variant in South America (cited from [17,20]). In some regions, such as the Mediterranean, India, and Russia [21][22][23][24][25], these three D-subgenotypes are intermixed and circulate together. Genotype E (subtype ayw4) is prevalent in the East African countries, and genotypes F and H (subtypes adw4 and ayw4) are the main genetic variants in South and Central America [12]. Several isolates of genotype G have been described in the USA, Western Europe, and Asia; genotype I is common from India to Vietnam; and some strains of genotype J have been detected in Japan [11,13][26][27][28][29][30].
The Siberian region, which traditionally includes territories from the Russian Far East to the eastern Urals, covers around 13 million square kilometers (~75% of the Russian Federation) and has ~40 million inhabitants (more than 25% of the Russian Federation's total population). Siberia is inhabited by unique indigenous population groups that have historically lived in geographic and cultural isolation from each other and from the Russian settlers. Some Siberian natives have been assimilated by newly arrived populations and now reside in cities, but many still live in remote settlements, preserving their traditional lifestyle. Due to their isolation, local native groups may be epidemiological reservoirs in which HBV and other pathogens have persisted and evolved for a relatively long time after their introduction without intermixing with strains from other populations. If this is true, analyzing these isolated HBV populations may help to reveal how HBV historically spread throughout Siberia. In this study, we aimed to investigate the prevalence of HBV subgenotypes and HBsAg subtypes in nine groups of native Siberian populations and to deduce the historical origins of the observed pattern of HBV genetic diversity in Siberia.

HBV Strains
A total of 381 HBV DNA sequences collected from 9 groups of native Siberian populations (Figure 1) were analyzed along with 179 HBV strains sampled from different urban populations of the western republics of the former USSR (including Russia) and 50 strains from the former Central Asian USSR republics and Mongolia (Table 1). Some of these sequences were collected by the authors of this paper, while others were retrieved from GenBank (see Table 1 for the references and accession numbers). Also included in the analysis were 124 HBV strains from ancient tombs located in different parts of the world (including modern Russian and Central Asian territories), which were sequenced in [31]. Additionally, 319 prototype strains of different HBV genotypes and subgenotypes were used for general subgenotyping analysis. For them, GenBank accession numbers and countries are listed in Figure S1 in Supplementary Materials.
The analyzed sequences varied in length from 681 nucleotides to 3225 nucleotides. Even the shortest sequences included the entire sequence of the most variable HBV S-gene (nucleotide positions 1564-2244 according to the prototype isolate X02763 [32]). Alignment of all the studied sequences with indicated accession numbers or unique codes, country or region of origin, and year of collection is attached in File S1 in Supplementary Materials in PHYLIP and FASTA formats. All the listed data for the strains (country/city, population group, GenBank accession number, year of collection) are provided in Figure S1 in Supplementary Materials.

Phylogenetic Analysis
Phylogenetic analysis was performed using the maximum likelihood (ML) method with the approximate likelihood ratio test for branches based on the Shimodaira-Hasegawa procedure (aLRT-SH) [33] in the online version of the PhyML 3.0 software [34], http://www.atgc-montpellier.fr/phyml/. For the analysis, the following parameters were used: general time reversible (GTR) substitution model; empirical equilibrium frequencies; estimated proportion of invariable sites; 4 substitution rate categories; gamma shape parameter estimated; and aLRT-SH-like branch support method. A strain was assigned to a specific HBV subgenotype if it clustered into the supported (index ≥ 90) branch corresponding to this subgenotype. The indices are shown in Figure 2 and Figure S1 in Supplementary Materials.
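The assignment rule stated above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record layout, the strain names, and the helper `assign_subgenotype` are hypothetical, and the support values are assumed to be on the same scale as the cutoff quoted in the text (index ≥ 90). If support is reported on a 0 to 1 scale, the equivalent cutoff would presumably be 0.90.

```python
from collections import Counter

# Cutoff as written in the text ("index >= 90"); the scale is an assumption.
SUPPORT_THRESHOLD = 90


def assign_subgenotype(clade, support, threshold=SUPPORT_THRESHOLD):
    """Return the clade label if its branch support meets the cutoff, else None."""
    return clade if support >= threshold else None


# Hypothetical (strain, clade, support) records; not taken from the study data.
records = [
    ("strain_1", "D1", 97),
    ("strain_2", "D2", 92),
    ("strain_3", "D3", 74),  # below the cutoff, so it stays unassigned
    ("strain_4", "D1", 90),  # exactly at the cutoff, so it is assigned
]

# Apply the rule to every strain, then count the assigned subgenotypes.
assignments = {
    name: assign_subgenotype(clade, support) for name, clade, support in records
}
tally = Counter(label for label in assignments.values() if label is not None)

for name, label in assignments.items():
    print(name, label if label is not None else "unassigned")
print(dict(tally))
```

Dividing each tally by the group size would give prevalence figures of the kind summarized in Table 2.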

Determination of HBsAg Subtypes
HBsAg subtypes were determined by analyzing amino acid residues at positions 122, 127, 140, and 160 encoded by the S-gene of the HBV genome, as previously described [5].
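As a simplified illustration of this step, the sketch below reads only the two best-known determinants: residue 122 (Lys gives d, Arg gives y) and residue 160 (Lys gives w, Arg gives r). It is not the full algorithm of [5]: the refinement into w1 to w4 and q+/q−, which involves positions 127 and 140 among others, is deliberately left out. The function name and the synthetic test sequences are hypothetical.

```python
# Residue-to-determinant maps for the two core positions (1-based numbering
# within the S protein). Other residues are treated as undetermined.
D_Y_AT_122 = {"K": "d", "R": "y"}
W_R_AT_160 = {"K": "w", "R": "r"}


def core_hbsag_determinants(s_protein):
    """Return 'adw', 'adr', 'ayw' or 'ayr' from residues 122 and 160.

    Returns None when either residue is not Lys (K) or Arg (R).
    """
    if len(s_protein) < 160:
        raise ValueError("S protein sequence must cover residue 160")
    dy = D_Y_AT_122.get(s_protein[121])  # residue 122 (0-based index 121)
    wr = W_R_AT_160.get(s_protein[159])  # residue 160 (0-based index 159)
    if dy is None or wr is None:
        return None
    return "a" + dy + wr


def synthetic_s_protein(res122, res160, length=226):
    """Build a placeholder sequence; only positions 122 and 160 are meaningful."""
    seq = ["A"] * length
    seq[121] = res122
    seq[159] = res160
    return "".join(seq)


print(core_hbsag_determinants(synthetic_s_protein("R", "K")))  # ayw
print(core_hbsag_determinants(synthetic_s_protein("K", "R")))  # adr
```

In practice the S-gene would first be translated in the correct reading frame, and the remaining positions would be inspected to reach a full subtype such as ayw2 or adrq+.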

Statistical Data Processing
To perform pairwise comparisons for the proportions of the HBV variants in the studied groups, 2-sided Fisher's exact test or Chi-square criterion with Yates correction were used (depending on sample characteristics). The significance threshold was set at p < 0.05; specific p-values are provided in the text.
Table 1 (flattened in extraction; only fragments are recoverable): Chukchi: Chukotka Autonomous Region (northeastern-most Siberia); 123; 1997-2008; author's numbers (chXX, schXX, see Figure S1 in Supplementary Materials) [35]. Urban populations of Russia and western republics of the former USSR (listed from west to east), n = 179.
Figure 1 (caption fragment): The colored arrows designate the main migration pathways during the settlement of Siberia in the past ten millennia (see Section 4).

HBV Viral Populations and Study Design
The main object of the study was to analyze HBV strains in nine geographical locations in Siberia that are settled mostly by indigenous people of different ethnic and language groups. These nine native groups and the regions in which they live are depicted in Figure 1. In total, 381 HBV strains from native populations were analyzed. A list of these and all other sequences studied in this paper is presented with comments and references in Table 1.
Because today Siberian natives live in close contact with Russian-speaking populations (especially in regional capital towns), it is important to know whether HBV populations are common to both the aboriginal and the Russian-speaking populations (hereafter, we will say "Russian populations" to indicate historically newer settlers in Siberia). To answer this question, the study included 179 HBV strains sampled in large cities of Russia, Belarus, and the Baltic states (Table 1). We combined these strains into one reference group because the homogeneity of the HBV samples in urban populations of the Russian Federation and contiguous countries has been reported previously by a number of authors [35,37,38,42].
Many Siberian natives are believed to have descended from ancient Turkic-speaking tribes that migrated to Siberia from Central Asian territories long ago (see below). Because of this, we added a group of 50 HBV strains derived from modern-day Central Asian countries: Uzbekistan and Kazakhstan (which are Turkic-speaking nations), as well as Tajikistan and Mongolia (Table 1). We decided to include the later country in the "Central Asian" group because Mongolian people have a very long and rich history of relations with both the Russian population and the Siberian and Turkic peoples.
Finally, we included in the analysis 124 HBV sequences (Table 1) that have been recently collected in archeological tombs in different parts of the world (including modern Russian and Central Asian territories) [31] to determine whether the descendants of these ancient variants survive in present-day Siberia.
All the listed HBV strains, along with 319 prototype sequences from GenBank with known (sub)genotypes, were used in a single phylogenetic analysis using the ML method with the aLRT-SH test for calculating branch support indices [33,34] (see Section 2). The resulting tree, consisting of 1053 HBV strains, is presented, due to its large size, in Figure S1 in Supplementary Materials. In this complete tree, one can find a branch of any studied strain, its code or accession number, the region and year of collection, and the ethnicity of the donor (Siberian strains). The overall layout of the tree, including the main regional and ethnic clusters, support indices for these clusters, topology, and relations among clusters inside the clades of HBV subgenotypes, is depicted in Figure 2.
The sequences included in the same analysis were of different lengths (681-3200 nt, see Section 2) as they were collected in different studies and by different authors ( Table 1). As the Maximum Likelihood method is applicable for aligned sequences of different lengths [34], we used HBV sequences from 681 to 3200 nt in length in the same analysis to obtain more accurate results of genotyping and to avoid possible bias due to the analysis of trimmed, shorter sequences.
The tree (Figure S1 in Supplementary Materials) was used to determine the subgenotype of all the studied HBV strains. Additionally, for each strain, the HBsAg subtype was deduced from the sequence of its S-gene. The prevalence of the different subgenotypes and subtypes in all the studied groups is summarized in Table 2. This paper is chiefly concerned with comparing the studied groups by proportions of different HBV genotypes and subtypes. From this, we draw conclusions as to whether the populations are epidemiologically homogeneous (i.e., a common population of HBV circulates within them) or whether the HBV strains carried by the populations are of different origins. At the end of the paper, we speculate on the ways in which the current patterns of HBV genetic diversity in Siberia might have formed.

Differences in the HBV Subgenotypes in Siberian Natives and Urban Populations of the Former USSR
The first important conclusion that can be drawn from data in Table 2 (bottom rows) is that the ratio of HBV subgenotypes differs significantly in urban populations of Russia, Belarus, and the Baltic states; in the Central Asian republics; and in Siberian native populations, even when samples from the aboriginal groups are combined together into a synthetic cohort (of course, the distinct aboriginal groups differ from each other, which will be discussed below). Indeed, the main HBV subgenotype in the "Russian population" samples was D2 (56%; Table 2), and its prevalence here was significantly higher than in either the Central Asian republic samples (6%; p < 0.001) or Siberian native samples (24%; p < 0.001). By contrast, in the Central Asia and Mongolia group, the main subgenotype was D1 (62%; Table 2), which was significantly higher than in the "Russian population" (9%; p < 0.001) and the Siberian group (19%; p < 0.001). Finally, in samples from Siberian natives, all three D subgenotypes were present in comparable proportions (Table 2), but the most common subgenotype was D3 (34%), and its prevalence was higher than in the "Russian population" samples (17%; p < 0.001) and the Central Asian samples (14%; p < 0.01). All this suggests that the aboriginal population of Siberia is still epidemiologically isolated from both the European population and the Turkic and Mongolian populations of Central Asia, even though these peoples have lived together and interacted actively in the same vast territory for at least the last thousand years. Furthermore, we will see that the native groups of Siberia are also very different from each other in terms of the HBV variants they host.
Figure 2 (caption fragment): blue: ancient samples from [31]. Strain numbers in the clusters are shown in brackets. Supporting indices, which were calculated using an aLRT-SH-like procedure (see Section 2), are shown near the corresponding clusters and branches. For a detailed tree of all 1053 samples, see Figure S1 in Supplementary Materials.

The rates of HBsAg positives in each group are indicated according to the reported data for Altaians [46], Kazakhs [47], Tuvans [48], Teleuts and Taimyr [25], YNAR [49][50][51], Buryats [36], Yakuts [52,53], Chukchi [35], across Siberia [25], urban population of the western former USSR [35,38,[54][55][56], Central Asian Republics as cited in the systematic review [57], and Mongolia [58].
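The pairwise comparisons of proportions reported above can be sketched with the two tests named in the statistical methods, implemented here with the Python standard library only. This is an illustration under assumptions, not the authors' computation: the function names are hypothetical, and the counts (100 of 179 and 3 of 50) are approximate values back-calculated from the reported D2 percentages (56% and 6%) and group sizes, since the exact counts from Table 2 are not reproduced here.

```python
from math import comb, erfc, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def chi2_yates_p(a, b, c, d):
    """p-value of the chi-square test with Yates correction (1 degree of freedom)."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / margins
    # For 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))


# Approximate, back-calculated counts (assumption): D2 vs. other subgenotypes.
urban_d2, urban_other = 100, 79     # about 56% of 179 urban strains
central_d2, central_other = 3, 47   # 6% of 50 Central Asian strains

p_fisher = fisher_exact_two_sided(urban_d2, urban_other, central_d2, central_other)
p_yates = chi2_yates_p(urban_d2, urban_other, central_d2, central_other)
print(f"Fisher p = {p_fisher:.3g}, Yates chi-square p = {p_yates:.3g}")
```

With these approximate counts both tests give p-values far below 0.001, in line with the significance level reported in the text for this comparison.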

Diversity of the HBV Characteristics and Variants in Different Native Groups in Siberia
Nine native Siberian groups were studied (Tables 1 and 2, Figure 1): in southwestern Siberia, Altaians, Kazakhs (who live in Altai Republic), Tuvans, and Teleuts; in northern Siberia, a combined group of the inhabitants of Yamalo-Nenets Autonomous Region (YNAR), consisting of the Nenets, Khants, Komi, Kets, and Selkups, and a combined group of the inhabitants of the Taimyr Peninsula, including the Nganasans and Dolgans; and in eastern Siberia, Buryats, Yakuts, and Chukchi. Table 2 presents the number of examined samples from each group, the reported estimation of chronic hepatitis B prevalence, and the calculated distribution of the HBV subgenotypes and HBsAg subtypes in each of the listed groups.
The prevalence of HBsAg-positive carriers in Siberian populations has been reported by different authors (see the comments in Table 2 for the citations), who have shown that the studied groups vary greatly in terms of HBV prevalence. Some groups may be considered low-endemic for HBV (e.g., YNAR [0-1.6% HBV prevalence]), but most are highly endemic groups according to the classification of [59], with 8-13% HBV prevalence in Tuvans, Chukchi, Teleuts, Taimyr Peninsula inhabitants, and Altaians and 24% prevalence in Yakuts (Table 2). For comparison, in the general population of Russia, the prevalence of HBsAg carriers was estimated to be 2-4% in the 2000s [56] (also see the references to Table 2). It is still unknown why many Siberian native groups demonstrate such high levels of HBV prevalence and which behavioral or other risk factors lead to the wide transmission of the virus, since no sound epidemiological study has been performed among these groups. Some studies [35,36,53] show that intravenous drug use is not common among any of the Siberian indigenous populations and thus does not strongly contribute to HBV transmission. Additionally, the prevalence of HBsAg-positive carriers increases with age in some groups (Kets, Teleuts, Tuvans, Yakuts, YNAR, and Taimyr inhabitants) [25,50,60,61], indirectly indicating that existing (but not yet identified) risk factors affect the whole population rather than only specific cohorts with risky behavior.
The differences in HBsAg prevalence were the first evidence for the epidemiological heterogeneity of the studied groups. Analysis of the distribution of HBV subgenotypes also showed distinct viral populations circulating in Siberian native populations, even in those with close geographical locations or common ethnic origin (see Figure 1 and Table S1 in Supplementary Materials).

Southwestern Siberia: D3 (ayw2) in Altaians v. D1 (ayw2) in Kazakhs, Tuvans, and Teleuts
In southwestern Siberia, the most prevalent HBV subgenotype in Altaians was D3 (76%) (Table 2). The Altaians are a Turkic-language ethnic group (see Table S1 in Supplementary Materials), with a current population of ~70,000 (here and below, population data are given according to the Report of the Russian Census of 2010 [62]), and they live mostly in Altai Republic (Figure 1). At the same time, another subgenotype, D1, was absolutely dominant in the small group of Teleuts (100%, Table 2), who are close relatives of the Altaians and are even known as the sub-ethnos of "Upper Altaians", living less than 500 km north of the Altai Republic in the Kemerovo Region (see Table 1 and Figure 1). Moreover, D1 was the main HBV subgenotype in another Turkic-language group, the Siberian Kazakhs (100%, Table 2), who live in the district closest to the Altaian settlements in the same republic (the distance between the Altaian and Kazakh locations is about 300 km; Table 1, Figure 1). Finally, the same D1 dominated among the Tuvans (100%, Table 2), who represent another large (more than 260,000 people) Turkic-language group in Siberia. The Tuvans live in Tuva Republic, which shares a border with the Altai Republic. Thus, Tuvans are located rather close to Altaians and Kazakhs, who are ~500 km to the west.
Thus, in three of these four studied groups of people, which are ethnically and geographically close to each other (see Table S1 in Supplementary Materials), D1 was dominant, while D3 was dominant in the fourth. The differences in the prevalence of D1 and D3 among the Altaians, Kazakhs, Teleuts, and Tuvans were statistically significant in all pairwise comparisons between the Altaians and each of the other three groups (p < 0.05-0.001). This supports the existence of some as yet unknown epidemiological barriers that isolate the Altaians (with predominance of D3) from the D1 strains circulating in the closest populations.

Northwestern and Northern Siberia: D2 (ayw3) in YNAR v. D1 (ayw2) in Taimyr
In northwestern Siberia, we examined the YNAR (Tables 1 and 2), a multi-ethnic region north of the Ural Mountains (Figure 1) with extremely low population density; a total of 0.5 million people (including Russian urban populations, which make up about 83% of the general population of the region) live in a territory of 0.77 million square kilometers. Among the non-urban indigenous populations, the most numerous are the Nenets (29,800 people in YNAR), Khants (9500), Komi (5150), and Selkups (2000). All these groups belong to peoples of the Uralic language family; the Komi are members of the Finno-Ugric group, while Nenets, Khants, and Selkups represent different branches of the Samoyedic subfamily of peoples, who are a result of intermixing between the western Ugric tribes and the ancient Siberian Mongoloid tribes. For additional data about language, genetic, and anthropological classifications of Siberian natives, see Table S1 in Supplementary Materials. Another northwestern Siberian group, the Kets, is very different from the others: the Kets descend from the ancient Yeniseian peoples, who have almost disappeared, and are proposed to share common ancestry with Native American groups.
Previous studies [25,36,50] have shown that the HBV types circulating in all the ethnic groups in YNAR, including the groups in our study ( Figure 1, Table 1), are homogenous and do not demonstrate any statistically significant differences. Because of this, we considered these peoples as a single epidemiological group of the YNAR, as shown in Tables 1 and 2. As mentioned above, the aboriginal peoples of YNAR represent a unique group because of the extremely low HBV endemicity (<1.6%; Table 2). In terms of HBV epidemiology, the inhabitants of YNAR are more similar to urban "Russian populations" in our study than to any other aboriginal groups located anywhere in Siberia (see Table 2). Like the "Russian population," the peoples in YNAR demonstrated low presence of HBsAg carriers and predominance of the D2 subgenotype (71%), and, consequently, subtype ayw3 (60%; Table 2); the association between ayw3 and D2, as well as ayw2 and D1/D3, is well known [17]. Thus, the YNAR group statistically significantly differs from all other Siberian groups and the groups of Central Asia ( Table 2) in incidences of D2 and ayw3 (p < 0.05-0.001) but is similar to the "Russian population," in which D2 prevalence is 56% (Table 2).
In our study, two other groups had notable prevalence of D2: the Yakuts (36% D2) and Buryats (31%; Table 2), although the prevalence of D2 in the YNAR (71%) was statistically higher (p < 0.001, p < 0.002). Figure 2 shows that the HBV strains in the D2 clade were intermixed regardless of the population from which they were collected-Russian, Yakuts, Buryats, or YNAR. This suggests that the D2 HBV population in these Siberian and "Russian" groups is common, despite large geographical distances between territories of the YNAR, Yakutia (Sakha Republic), and Buryatia (2000-3000 km, Figure 1).
In another north-Arctic Siberian region, the Taimyr Peninsula (Figure 1), two main indigenous groups were studied, the Nganasans and Dolgans (Table 1). They have different ethnic origins, but now live alongside each other in the same settlements. The Nganasans, having only about 860 representatives, belong to the above-mentioned Uralic Samoyedic language family, while the more numerous group, the Dolgans (about 5500 persons on Taimyr), is a Turkic-language group that arrived in the region later (19th century) and is in fact a northwestern branch of the Yakuts (see below). Previous studies showed no differences in the circulating HBV types between Dolgans and Nganasans [25]; thus, we combined them into one group in our study. In Taimyr, the dominant HBV subgenotype was D1 (69%, Table 2), which, surprisingly, made these populations similar to those located in southwestern Siberia (Kazakhs, Tuvans, Teleuts), but distinguished them from populations with D2 or D3 prevalence (Altaians, YNAR, Yakuts, Buryats, Chukchi) (differences confirmed statistically; p < 0.05-0.001). Interestingly, in this study, a significant prevalence of HBV subgenotype C2 (19%, Table 2, Figure 2) was discovered in Taimyr for the first time. Genotype C is common mostly in southeastern Asia [17,20], and had never previously been reported so far north in Arctic Siberia (the Taimyr Peninsula is the northernmost point of continental Eurasia). Since HBV genotype C has important implications in terms of clinical prognosis and therapy tactics, causing more severe infection with higher risk of cancer development than genotype D (cited by the review [63]), the discovered high prevalence of genotype C in Taimyr and Chukotka (see Figure 2 and below) should be taken into account by physicians and public health specialists in Siberia.

Eastern Siberia: D3 (ayw2) in Buryats, D2 (ayw2) and A2 (adw2) in Yakuts, and D3 (ayw2) and C2 (adrq+) in Chukchi
In southeastern Siberia, the large group of the Buryats was studied (Figure 1, Table 1). The Buryats (population in Russia ~460,000) belong to the numerous Mongolic-language family and are traditionally considered a northern sub-ethnos of Mongolians. In this group, all the HBV subgenotypes (D1, D2, and D3) were circulating simultaneously with 17%, 31%, and 40% prevalence, respectively (Table 2). This absence of statistical differences in subgenotype rates makes the Buryats similar to the synthetic group of "Siberian natives" (with 19%, 24%, and 34% prevalence of D1, D2, and D3; Table 2). This may suggest that the Buryats are not isolated from surrounding aboriginal peoples and probably exchange HBV variants with other Siberian populations.
In eastern Siberia, the population of the Sakha (Yakutia) Republic was studied (Figure 1, Tables 1 and 2). The Yakuts are the biggest non-Russian ethnic group in eastern Siberia, with a population of nearly half a million. The Yakuts are not considered "true" aboriginal natives, since they migrated into their present-day territory only ~400 years ago. As the eastern-most ethnic group of the Turkic-language family in the world, the Yakuts possess pronounced Mongoloid phenotypical traits as a result of intermixing with ancient Mongoloid Tungus tribes that formerly inhabited the Baikal region and Lena River basin.
Prevalence of the HBV subgenotype D2 in the Yakut population (36%, Table 2) makes these people statistically similar to the Buryats (31% D2 prevalence). In addition, as mentioned above, D2 was the main subgenotype in YNAR (71%; Table 2). However, the highest prevalence of HBsAg carriers (up to 24% according to [52]) and significant incidence of HBV genotype A (33%; Table 2) makes the Yakuts sharply different not only from the YNAR population, but from any other Siberian group in our study.
Finally, at the far northeastern end of Siberia, the group of Chukchi was studied (Table 1). The Chukchi represent a relatively numerous (about 16,000 people) and very ancient aboriginal group of the Chukotka peninsula (Figure 1), belonging to the unique Kamchatkan family of peoples. The Paleo-Mongoloid ancestors of the Chukchi apparently moved to Chukotka over 5000-6000 years ago. The anthropological and genetic origins of the Chukchi point to a common ancestor between them and Native Americans. Among Chukchi, the main HBV subgenotype was D3 (51%), as in the eastern Buryats (40%) and even the Altaians (76%; Table 2). The main feature of this group, unusual for Russia, is the high prevalence of subgenotype C2 (27%), which makes the Chukchi different from all the other Siberian groups, except the Taimyr inhabitants (19%; Table 2). Despite the large geographical distance between the Chukotka and Taimyr peninsulas (over 3000 km), all the Siberian strains of genotype C in this study clustered into a joint clade of these two regions with good statistical support (0.86; Figure 2), indicating the common origin of HBV circulating among these groups of people. So, there might exist (or have previously existed) a mode of HBV transmission along the shore of the Arctic Ocean.

Discussion
As demonstrated above, many different HBV types circulate in Siberia, with distinct aboriginal groups often carrying different subgenotypes. But how did this pattern of HBV genetic diversity emerge, and, more widely, how might HBV have invaded Siberia?
HBV is known to have arisen historically not in Siberia, but in regions much farther south, probably North Africa or the Middle East [17]. Thus, it should be assumed that HBV was introduced into Siberia along with the migrations of various populations in the past.
The first known Homo to live in Siberia occupied the region in the Paleolithic era, at least 45,000 years ago. These ancient inhabitants apparently disappeared and did not leave any known modern descendants. Much later, several big waves of settlement in Siberia occurred, which can be described very simplistically as follows (Figure 1) [64].
Since the Neolithic era, the most ancient known populations in eastern Siberia were Paleo-Mongoloids, who migrated there 8000-10,000 years ago. The modern descendants of these Mongoloid tribes are the Chukchi and possibly the Kets (see Table 1 and Table S1 in Supplementary Materials). Of course, there are other ethnic groups of Mongoloid origin in Siberia, including Koryaks, Evenks (Tungusy), and others, but here we discuss only the groups that we studied. It is believed that the Siberian Paleo-Mongoloids also gave rise to Native Americans.
Western Siberia was settled by the Uralic tribes at least 4000 years ago. These Uralic peoples divided into two major branches, giving rise to the Finno-Ugric group of peoples, which further spread into the eastern part of Europe, and the Samoyedic group of peoples, which occupied the territory of northern and central Siberia. Among modern Siberian aborigines, Komi are descendants of the Ugric people, while Samoyedic descendants include Khants, Nenets, Selkups, Nganasans, and some others not included in our study.
Approximately 3000 years ago, the Turkic tribes started active migrations to southern Siberia, eventually also moving to the north (Figure 1). Through intermixing with Siberian Mongoloids, the Turkic people gave rise to many modern aboriginal groups, including the Altaians, Teleuts, Tuvans, and Kazakhs in the south, and even the Yakuts and Dolgans in the very northeast of Siberia. The Turkic migrations in Siberia took a long time; the formation of the Yakut nation was completed only in the 17th century. Late in this period, during the 12th and 13th centuries, the Mongols, one of the descendants of these ancient Turkic tribes (despite the name, the Mongols are not genetically Mongoloid, originating, according to the most popular hypothesis, from the Xiongnu, or Hunnu, people), moved back to the west, sweeping through all of Siberia and Eastern Europe with the conquests of Genghis Khan. The closest Siberian branch of these historical Mongol tribes today is the Buryats.
In the 16th century, Russians began their active conquest of Siberia, which generally continues to this day. Certainly, this massive expansion hugely influenced all the smaller Siberian tribes. Due to these historical migration processes, at least 40 aboriginal groups exist in Siberia today [62], differing greatly in their origin, language, and traditions. Modern genetic and linguistic classification of Siberian natives is very complicated and controversial (see Table S1 in Supplementary Materials). However, as we saw above, overall, four main waves of migration formed the existing diversity of the indigenous peoples of Siberia: Mongoloid, Uralic (Ugric and Samoyedic), Turkic, and European Slavic (Russian) (Figure 1). This may explain why the genetic HBV types collected in Siberia are highly diverse.
It may be assumed that the C2 subgenotype of HBV was introduced into Siberia by the Mongoloids, since genotype C is known to be one of the main HBV types in southeastern Asia, the territory in which Mongoloid peoples originated [17,20]. Surprisingly, HBV genotype C, being rare in the majority of studied populations, appeared to be prevalent in two geographically remote populations, in Chukchi and in Taimyr. Although we could not find any evidence of kinship or other connections between these two populations, there are two possible explanations for such an unexpected similarity in genotype C prevalence. Firstly, this HBV genotype could have been imported to both populations through Yakutia via the ancient route of migration. However, genotype C is not prevalent in modern Yakutia; its prevalence there is only 2%, compared to 19% on Taimyr and 27% on Chukotka. Secondly, genotype C could have been introduced from Chukotka to Taimyr in modern times through seaports on the Arctic shore that have been operating since the beginning of the 20th century. Theoretically, some HBV strain(s) of genotype C might have been imported to Taimyr with sailors and then spread there due to a "founder effect".
Subgenotype D1 was most probably brought into Siberia with the Turkic migrations, since, in the modern world, this subgenotype is most common in the historical homelands of the Turkic tribes [65]: Afghanistan [66], Iran and the Persian Gulf [67,68], and Turkey [69][70][71]. In addition, D1 is the main subgenotype among Turkic populations of the Central Asian republics and Mongolia (Table 2). We can thus hypothesize that the introduction of D1 into Siberia occurred enough times to form distinct phylogenetic clusters, which represent HBV subvariants of the Tuvans and Kazakhs, Teleuts, and Taimyr Peninsula inhabitants (see the upper part of Figure 2).
Subgenotypes D2 and A2 most probably entered Siberia with the Russian expansion, as these HBV types are the most common in Russia and in Eastern European countries (Table 2) [35,37,38,42]. Figure 2 shows that no clusters formed within the D2 and A2 clades according to geographical region or ethnicity of the hosts; rather, strains from urban Russian and Eastern European populations are intermixed with strains from Siberian natives. It may even be suggested that the incoming Russians were the main source of HBV for the native populations of YNAR (where 71% of the strains found were D2) and Yakutia (36% D2 and 33% A2, Table 2). However, this does not explain why the incidence of HBsAg carriers is so high (up to 24%) in Yakutia and so low in YNAR (1.6%) (Table 2); why the prevalence of A2 among the Yakuts is twice as high as among the "Russian population" (33% vs. 17%, p < 0.005); and, finally, why the Dolgans (Taimyr Peninsula people who are the northern relatives of the Yakuts) do not carry D2 and A2 (Table 2) but have a significant incidence of subgenotype C2 (19%), which is very rare in Yakuts (2%, p < 0.005). All these questions remain to be answered in further investigations. For now, we can suppose that, apart from epidemiological factors in HBV spread in Siberia, there were a number of "bottlenecks" when a specific HBV type gained an advantage under as yet unknown circumstances.
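For readers who wish to reproduce pairwise comparisons of subgenotype prevalence such as those quoted above, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table. It is only an illustration: this section does not state which test produced the reported p-values, the function name `fisher_exact_two_sided` is our own, and the counts used in the example (33 of 100 vs. 17 of 100) are hypothetical placeholders chosen to mimic the percentages, not the actual sample sizes from Table 2.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two populations; columns are "carries the subgenotype"
    and "does not carry it".
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# HYPOTHETICAL counts, for illustration only (not taken from the study):
# 33 of 100 strains in one group vs. 17 of 100 in another carry the subgenotype.
p_value = fisher_exact_two_sided(33, 100 - 33, 17, 100 - 17)
print(f"two-sided p = {p_value:.4f}")
```

With the real counts from Table 2 substituted for the placeholders, the same function gives the exact p-value for each pair of populations; for large samples a chi-square test would be a common alternative.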
It is not clear how D3 entered and spread in Siberia. In the modern world, there are no regions with a clear predominance of this HBV subgenotype. As a minor subgenotype, D3 has been reported in South America [72,73], India [74,75], and Egypt [76]. It seems that the Siberian aboriginal population, in which D3 is the most prevalent subgenotype (34%, Table 2), is the main reservoir of D3 worldwide. Moreover, we can conclude that there exists a Siberian variant of D3, strains of which have formed a large and distinct cluster inside D3 (see the upper part of the D3 clade in Figure 2; support index is 0.85). Of course, as mentioned above, it is unlikely that D3 arose in Siberia. Most probably, it came from more southern regions (e.g., India, where it circulates to date [74,75]) but became established in Siberia, having been displaced from southern countries by the more modern subgenotype D1 [17,65]. In turn, from Siberia, D3 may have entered South America with the ancestors of Native Americans. More recently, it may have returned to Europe with migrants from Latin America to the Mediterranean, where this subgenotype is also found today (see the D3 clade in Figure 2 and the full tree in Figure S1 in Supplementary Materials). In addition, the D3 clade in Figure 2 includes some ancient strains from Russia and Kazakhstan [31], supporting the hypothesis that this subgenotype was already common in those regions (as well as in Siberia) 600-1100 years ago.
We fully recognize that the above speculations are purely hypothetical. In this paper, we do not provide scientific evidence that HBV spread in Siberia exactly as we suppose. However, our reasoning does not contradict the static picture of the genetic diversity of HBV in Siberia that we have described. We deliberately did not use any phylodynamic or phylogeographic methods in this study because, to date, we are unable to reasonably choose one of the existing models of the HBV molecular clock, and we believe that the reported mutation rates in the HBV genome are still debatable. In addition, we realize that the phylogenetic methods we used, and especially the fact that we included HBV sequences of different lengths in the same analyses (including some sequences represented only by the small S-gene), are suitable for subgenotyping purposes but cannot resolve precise phylogenetic relationships between the strains. For this reason, we publish all the sequences in File S1 in Supplementary Materials and invite other researchers to analyze them with more sophisticated methods once a commonly recognized model of HBV evolution has been developed.

Data Availability Statement:
The data presented in this study are available in File S1 in Supplementary Materials. Any additional data are available on request from the corresponding author: victormanuilov@medipaltech.ru.