An estimated 250 million people [1] are chronically infected with hepatitis B virus (HBV), which causes an estimated 887,000 deaths annually, mostly from the long-term sequelae liver cirrhosis and hepatocellular carcinoma (HCC) [2]. The viral population can be divided into nine genotypes (A to I) [3], which differ by more than 7.5% in their nucleotide sequences [3] and are further subdivided into subgenotypes with a nucleotide divergence of more than 4% [3]. While genotypes A to H have long been accepted as individual genotypes, two new genotypes (I and J) were proposed more recently [5]. Genotype I was first described in 2008 after isolation from a Vietnamese patient and is a recombinant of genotypes A, C and G [5]. Because its nucleotide divergence, especially from genotype C, is relatively small, whether this strain should be considered a new genotype was debated for a long time [7]. It was finally accepted as an independent genotype after similar HBV strains were identified in isolated native populations in Laos, North India and China, indicating that these strains had circulated in these populations for a long time [8]. Another strain, previously proposed as a new genotype “J”, was isolated from a Japanese patient who had lived on the island of Borneo [6]. Phylogenetic analysis revealed that this strain resembles gibbon HBV more closely than human HBV and may have resulted from recombination with human genotype C [11]. Because further reports of human infections with this strain are lacking, it has so far not been recognized as a relevant HBV genotype [11].
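The divergence thresholds quoted above (more than 7.5% between genotypes, more than 4% between subgenotypes) can be illustrated with a simple pairwise comparison. The Python sketch below is a hypothetical, simplified illustration, not a genotyping method: it assumes two already aligned sequences, computes the uncorrected proportion of differing nucleotides (p-distance) while skipping gaps and ambiguous bases, and compares the result with the two thresholds. Actual genotype assignment relies on phylogenetic analysis against reference sequences, and the function names are invented for this example.

```python
# Hypothetical illustration of the genotype/subgenotype divergence thresholds.
# Real HBV genotyping uses phylogenetic analysis; this is only a p-distance sketch.

GENOTYPE_THRESHOLD = 0.075     # > 7.5% divergence: different genotypes
SUBGENOTYPE_THRESHOLD = 0.04   # > 4% divergence: different subgenotypes


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing nucleotides between two aligned sequences,
    ignoring positions where either sequence has a gap ('-') or an 'N'."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


def classify_divergence(seq_a: str, seq_b: str) -> str:
    """Place a pair of aligned sequences relative to the two thresholds."""
    d = p_distance(seq_a, seq_b)
    if d > GENOTYPE_THRESHOLD:
        return "different genotypes"
    if d > SUBGENOTYPE_THRESHOLD:
        return "same genotype, different subgenotypes"
    return "same subgenotype"
```

For instance, two aligned genomes of about 3,200 nucleotides that differ at 160 comparable positions (5%) would fall between the two thresholds.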
Sequencing and genotyping of HBV isolates are not routinely performed and are rarely reported in, e.g., epidemiological studies. However, HBV genotypes vary in their clinical consequences, including the natural course of infection, disease progression and treatment response (reviewed in [3]). While genotypes B, C and I are associated with more frequent vertical transmission from mother to child, higher transmission rates during sexual contact or injecting drug use have been reported for genotypes A, D and G [3]. A higher chronification rate after infection with genotypes A and C, compared with genotypes B and D, has also been reported [14], but this may also be due to differences in the transmission route. Among chronic HBV carriers, a lower rate of seroconversion to antibodies against the HBV e antigen (anti-HBe) has been proposed for genotype C and D infections [14]. Furthermore, faster disease progression to liver cirrhosis and HCC is associated with infections with genotypes C, D and F [14]. While all genotypes respond similarly to treatment with reverse transcriptase inhibitors, under interferon-α treatment genotypes A and B show a better virological response and higher anti-HBe seroconversion rates than other HBV genotypes [14].
The separation of the HBV population into different genotypes can be dated back to around 30,000 years ago, when the ancestors of modern Homo sapiens dispersed across Africa and Eurasia [18]. The distinct occurrence of genotypes and subgenotypes in certain geographical regions and ethnic groups [3] allows prehistoric human migrations to be confirmed through the transfer of HBV genotypes and subgenotypes from one continent to another [3]. While previous studies described differences in local genotype occurrence, neither the number of infections with each HBV genotype nor the genotype distribution among chronic HBV infections worldwide has been studied. Given the specific characteristics of each HBV genotype, this information could help refine estimates of the HBV disease burden and inform health policy, for example, by predicting prevention and treatment needs and the relevance of specific treatments in a country. Scoping HBV genotypes worldwide may also stimulate further adaptation and rational use of diagnostic tests, as well as the development of new, broadly applicable therapies. To address this need, we performed a literature search for HBV genotyping studies and combined the results with previously published HBV surface antigen (HBsAg) prevalence estimates [1] and United Nations (UN) population data [20] to approximate the number of HBV infections with each genotype per country, per world region and globally.
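The estimation approach described above can be sketched as follows, under stated assumptions: for each country, the number of chronic infections is approximated as population × HBsAg prevalence and split according to the genotype proportions observed in genotyping studies; the country estimates are then summed by world region and globally. The field names, the functions `estimate_infections` and `global_shares`, and any example figures are hypothetical placeholders, not data or code from this study. Countries lacking any of the three parameters are skipped, and the share of the world population that could be covered is returned alongside the estimates.

```python
from collections import defaultdict
from typing import Optional, TypedDict


# Hypothetical per-country input record (field names chosen for this sketch).
class CountryInput(TypedDict):
    region: str
    population: Optional[float]                        # persons, e.g. from UN population data
    hbsag_prevalence: Optional[float]                  # fraction of the population HBsAg-positive
    genotype_proportions: Optional[dict[str, float]]  # genotype -> fraction of genotyped samples


def estimate_infections(countries: dict[str, CountryInput], world_population: float):
    """Estimate chronic HBV infections per genotype by country, region and globally.

    Countries missing any of the three parameters are skipped; the returned
    coverage is the share of the world population living in countries used.
    """
    by_country: dict[str, dict[str, float]] = {}
    by_region = defaultdict(lambda: defaultdict(float))
    global_totals: dict[str, float] = defaultdict(float)
    covered_population = 0.0
    for name, c in countries.items():
        pop, prev, props = c["population"], c["hbsag_prevalence"], c["genotype_proportions"]
        if pop is None or prev is None or not props:
            continue  # all three parameters are needed for an estimate
        covered_population += pop
        chronic = pop * prev  # HBsAg-positive persons as a proxy for chronic infections
        by_country[name] = {g: chronic * share for g, share in props.items()}
        for g, n in by_country[name].items():
            by_region[c["region"]][g] += n
            global_totals[g] += n
    regions = {r: dict(v) for r, v in by_region.items()}
    coverage = covered_population / world_population
    return by_country, regions, dict(global_totals), coverage


def global_shares(global_totals: dict[str, float]) -> dict[str, float]:
    """Fraction of all estimated chronic infections attributable to each genotype."""
    total = sum(global_totals.values())
    return {g: n / total for g, n in global_totals.items()}
```

Because only countries with complete inputs contribute, the resulting global shares describe the covered population; in this study, that corresponded to 125 countries representing 96% of the world population in 2015.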
Information on the contribution of each HBV genotype to the global burden of HBV infection is missing. This scoping review of HBV genotyping results, combined with HBsAg prevalence and population estimates, begins to address this gap. We found that the different distributions of HBV genotypes, together with the varying numbers of chronically HBV-infected individuals across world regions, result in large differences in the global number of infections caused by each genotype: genotypes A to E, each predominant in different parts of the world, were estimated to cause the vast majority of infections, together accounting for 96% of chronic HBV infections worldwide. In contrast, genotypes F to I were each estimated to cause considerably fewer infections, together accounting for less than 2% of chronic HBV infections worldwide.
The number and quality of data identified by our scoping review varied strongly between countries, even within highly developed regions of the world. Every world region contained countries with good-quality data, but also countries with poor-quality data or no data at all. The distribution of HBV genotypes appeared to be related to geographic boundaries (e.g., oceans or the Sahara Desert) or to the dispersal of ethnic groups. For instance, while HBV infections in Sub-Saharan Africa were primarily caused by genotypes A and E, this was not the case in Northern Africa. There, the population, which more closely resembles the populations of West Asia [21], also showed an HBV genotype distribution similar to that region (mainly genotype D). Additionally, in world areas where large proportions of the population descend from migrants from other parts of the world, the genotype distribution reflected the areas from which the migrants originated. This is illustrated, for example, by the relatively high frequencies of genotypes A, B, C and D in Northern America, reflecting migration from Europe and Asia. Another example is the Caribbean, where mostly genotypes A and D were found, correlating with the higher proportion of migrants originating from the African continent. As a consequence of different HBV endemicity levels and population sizes, genotypes that predominate in regions such as Southeast and East Asia (genotypes B and C) and Sub-Saharan Africa (genotypes A and E) caused a large proportion of infections worldwide. In contrast, genotype F, which was dominant in Latin America, was estimated to cause less than 1% of chronic HBV infections globally because of the relatively low HBV endemicity in this area.
Our results need to be interpreted with caution, as several technical limitations are inherent in the underlying data and in the method of extrapolating the genotype distribution from study populations to all HBV infections. One potential source of bias is the genotyping method: the majority of studies sequenced only parts of the viral genome or used non-sequencing-based methods. This can lead to misclassification, especially of recombinant viruses (including genotype I, which is a recombinant of genotypes A, C and G) [22]. In some instances, the tests used (e.g., line probe assays) cannot detect all genotypes. Another influential factor is the year in which samples were analyzed: almost half of the studies were published before 2010, i.e., before genotype I was described or recognized as an independent genotype. While current data confirm that genotype I is rare and found mainly in Southeast Asia, we may have underestimated its frequency because infections with genotype I were missed in earlier studies.
Some limitations also apply to the precision of extrapolating the genotype distribution found in a study population to all HBV-infected individuals in the respective country. Two-thirds of the studies were based on a single town or region in a country. Such a region may not reflect the HBV-infected population of the whole country, especially where ethnic groups that potentially carry different HBV genotypes live in separate geographic areas. Moreover, a considerable number of studies that sampled more than one region did not do so in a way that satisfactorily represented the whole country, which was a particular problem for large countries (e.g., Russia).
In 40% of the included studies, we identified a risk of favoring the selection of certain genotypes. This applies, for example, to studies that exclusively included patients with advanced fibrosis/cirrhosis or HCC, which could favor the selection of genotypes associated with faster disease progression. Additionally, some genotypes were often not detected at all in the studies included for a country, although studies testing specific minorities or non-national immigrants in that country (which we excluded per our exclusion criteria) demonstrated that these genotypes were present, at least at low levels. This suggests that rare genotypes often went undetected, most likely because of small sample sizes or because certain minorities were not represented in the study populations.
Low numbers of genotyped samples could also have resulted in imprecise estimates of the genotype distribution in several countries. For many countries, the sample size could be enlarged by pooling samples from several studies, but this was not possible for all countries. The precision of the global genotype distribution is further compromised because the other two parameters, HBsAg prevalence and population data, are also only estimates, in most cases did not refer to the same time point, and may have changed over time. Moreover, some genotyping studies included samples that were collected several years earlier or over a long period, which calls into question their representativeness for the year 2015. Furthermore, we could only calculate the number of infections per genotype for countries where all three parameters were available. While we were able to retrieve data for 125 countries, which according to the UN World Population Prospects (WPP) represented 96% of the world population in 2015, we could not calculate genotype-specific infections for the remaining countries. Future studies should focus on the countries with the greatest need for HBV genotyping, including countries for which no data could be retrieved or for which we assessed the available data as being of poor quality (Table S1, Supplementary Materials).
Importantly, while we assessed the quality of studies using a scoring system, this served only descriptive purposes. Because of limited data, we did not adjust for any factor other than sample number in our aggregation analyses. We also did not include age as a factor, which is notable because the interrelation of transmission routes and genotypes could lead to varying genotype frequencies between age groups. We omitted these factors from our analysis because, for many countries, only a single source was available, and information on the age of individuals infected with a given genotype was mostly missing. The scarcity of genotyping data compared with seroprevalence studies probably results from the fact that genotyping, at least when performed by sequencing, is labor- and cost-intensive, which is a limitation especially for resource-poor countries. However, we also cannot exclude that our literature search missed available genotyping data. As a consequence of this scarcity, the extrapolation of the genotype distribution was based on only 26,000 genotyped samples, i.e., around 0.01% of chronic HBV infections. Thus, the global HBV genotype distribution calculated in our study carries a risk of high uncertainty, which we did not quantify because of the limited number of data points. The global genotype distribution calculated in this study should therefore be regarded as a first approximation; a more precise estimate, taking other factors and covariates into account and including additional data as they become available, is warranted.
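Pooling several genotyping studies for one country while adjusting only for sample number, as described above, amounts to summing genotype counts across studies before computing proportions. The Python sketch below assumes each study is reported as counts per genotype; the function names are hypothetical, and the unweighted average is included only for contrast. Like the analysis it illustrates, it does not model age, study quality or sampling region.

```python
from collections import Counter


def pooled_genotype_proportions(studies: list[dict[str, int]]) -> dict[str, float]:
    """Pool studies from one country by summing their genotype counts.

    Each study maps genotype -> number of samples with that genotype, so larger
    studies contribute proportionally more (weighting by sample number only).
    """
    pooled: Counter = Counter()
    for counts in studies:
        pooled.update(counts)  # Counter.update adds counts instead of replacing them
    total = sum(pooled.values())
    if total == 0:
        raise ValueError("no genotyped samples to pool")
    return {genotype: n / total for genotype, n in pooled.items()}


def unweighted_mean_proportions(studies: list[dict[str, int]]) -> dict[str, float]:
    """For contrast: average the per-study proportions, ignoring study size."""
    genotypes = {g for counts in studies for g in counts}
    per_study = [
        {g: counts.get(g, 0) / sum(counts.values()) for g in genotypes}
        for counts in studies
    ]
    return {g: sum(p[g] for p in per_study) / len(per_study) for g in genotypes}
```

With one study of 40 samples (30 genotype A, 10 genotype D) and one of 160 samples (10 A, 150 D), pooling yields 20% genotype A, whereas averaging the two study-level proportions would give about 41%, letting the smaller study weigh as much as the larger one.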
We chose not to extend our study to subgenotypes for several reasons: (i) only a few publications included information on subgenotypes; (ii) the majority of studies used genotyping methods that are not suitable for determining subgenotypes, including sequencing of only parts of the viral genome [4]; and (iii) the definition of subgenotypes changed during the period covered by the selected genotyping studies [4]. Thus, we believe the quantity and quality of the available data were not sufficient to estimate the subgenotype distribution with adequate precision.
Despite the limitations described, this study provides up-to-date insight into worldwide HBV genotyping data from recent years and is an important initial step toward quantifying how genotypes contribute to the global burden of chronic HBV infection. Our study identified countries with no or only low-quality genotyping data that urgently require further studies. The wide distribution of HBV genotypes around the world underscores the need to ensure that the diagnostic tests and therapeutic approaches applied address the full variety of HBV genotypes. Most experimental cell culture and animal models currently used to study HBV biology and new treatment options are based on genotype D or A, which by our estimates account for only about one-fifth and one-sixth of infections worldwide, respectively. This reflects the need to expand experimental models to the other HBV genotypes. While efforts are ongoing to establish models for genotypes B and C, genotype E seems to be neglected in this respect, although in terms of the number of infections it seems more important than genotypes A and B. Experimental models for drug development should be expanded to cover at least genotypes A to E in order to represent the vast majority of HBV infections worldwide.