Variable Proportions of Phylogenetic Clustering and Low Levels of Antiviral Drug Resistance among the Major HBV Sub-Genotypes in the Middle East and North Africa

Hepatitis B virus (HBV) infection remains a major public health threat in the Middle East and North Africa (MENA). Phylogenetic analysis of HBV can be helpful to study the putative transmission links and patterns of inter-country spread of the virus. The objectives of the current study were to analyze the HBV genotype/sub-genotype (SGT) distribution, reverse transcriptase (RT), and surface (S) gene mutations and to investigate the domestic transmission of HBV in the MENA. All HBV molecular sequences collected in the MENA were retrieved from GenBank as of 30 April 2021. Determination of genotypes/SGTs and of RT and S mutations was based on the Geno2pheno (hbv) 2.0 online tool. For the most prevalent HBV SGTs, maximum likelihood phylogenetic analysis was conducted to identify the putative phylogenetic clusters, with approximate Shimodaira–Hasegawa-like likelihood ratio test values ≥ 0.90, and genetic distance cut-off values ≤ 0.025 substitutions/site as implemented in Cluster Picker. A total of 4352 HBV sequences, representing 20 MENA countries, were used for genotype/SGT determination, with a majority from Iran (n = 2103, 48.3%), Saudi Arabia (n = 503, 11.6%), Tunisia (n = 395, 9.1%), and Turkey (n = 267, 6.1%). Genotype D dominated infections in the MENA (86.6%), followed by genotype A (4.1%), with SGT D1 as the most common in 14 MENA countries and SGT D7 dominance in the Maghreb. The highest prevalence of antiviral drug resistance was observed against lamivudine (4.5%) and telbivudine (4.3%). The proportion of domestic phylogenetic clustering was the highest for SGT D7 (61.9%), followed by SGT D2 (28.2%) and genotype E (25.7%). The largest fraction of domestic clusters with evidence of inter-country spread within the MENA was seen in SGT D7 (81.3%). Small networks (containing 3-14 sequences) dominated among domestic phylogenetic clusters.
Specific patterns of HBV genetic diversity were seen in the MENA with SGT D1 dominance in the Levant, Iran, and Turkey; SGT D7 dominance in the Maghreb; and extensive diversity in Saudi Arabia and Egypt. A low prevalence of lamivudine, telbivudine, and entecavir drug resistance was observed in the region, with almost an absence of resistance to tenofovir and adefovir. Variable proportions of phylogenetic clustering indicated prominent domestic transmission of SGT D7 (particularly in the Maghreb) and relatively high levels of virus mobility in SGT D1.


Introduction
Chronic hepatitis B (CHB) is one of the major global health care issues, with more than a quarter-billion people chronically infected by hepatitis B virus (HBV), resulting in more than 800,000 deaths as of 2015 [1,2]. Despite the availability of an effective HBV vaccine and the accelerated advent of antiviral therapies to control CHB, HBV has been suggested to cause almost half of all deaths globally from viral hepatitis, half of all deaths from hepatocellular carcinoma (HCC), and a third of all mortalities from hepatic cirrhosis [3][4][5].
The substantial global burden of CHB and its complications mandates continuous and comprehensive research involving its epidemiology, the spread of antiviral drug resistance, and the emergence of vaccine-escape mutations in the virus [6][7][8]. This can help in refining knowledge regarding HBV and CHB. Ultimately, this can aid in achieving the World Health Organization (WHO) goal of eliminating hepatitis B as a public health threat (which entails a 90% reduction in new chronic infections and a 65% reduction in mortality) [9,10].
The genome of HBV is a circular, partially double-stranded DNA with four overlapping open reading frames (ORFs) and a length of approximately 3.2 kilobases (kb) [11]. These four ORFs encode the polymerase (P), surface (S), core (C), and X proteins [11]. This virus is unique among human DNA viruses due to its reverse-transcriptase that lacks a proofreading activity [12,13]. The consequence of such a replication mechanism is a relatively higher evolutionary rate compared to other DNA viruses [14]. The extensive genetic diversity of HBV, besides the rapid development of drug resistance and vaccine escape mutations, are amongst the manifestations of such a rapid evolution [15].
On the other hand, the presence of overlapping ORFs in the viral genome constrains the evolutionary rate of HBV since a synonymous mutation in one ORF can result in a non-synonymous mutation in another ORF [16][17][18]. The relatively swift evolution of HBV has resulted in its diversification into several genotypes (A-J), which are further classified into multiple sub-genotypes (SGTs) designated by Arabic numerals [17][19][20][21][22]. Genotype assignment is based on a pairwise inter-genotype genetic distance of > 8.0%, while SGTs are defined by pairwise genetic distances of 4.0-8.0% [19]. The gold-standard method for HBV genotyping is phylogenetic analysis, preferably of full genomes; however, analysis of the S gene region that overlaps with the P region can be sufficient for accurate genotype determination [19].
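The distance thresholds for genotypes and SGTs can be illustrated with a minimal Python sketch. It assumes two already-aligned sequences of equal length and uses the simple proportion of differing sites (p-distance). The function names and the label for distances below 4.0% are hypothetical choices made for illustration, and actual genotyping relies on phylogenetic analysis rather than on a single pairwise comparison.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def classify_divergence(distance: float) -> str:
    """Apply the thresholds quoted in the text: > 8.0% and 4.0-8.0%."""
    if distance > 0.08:
        return "different genotypes"
    if distance >= 0.04:
        return "different sub-genotypes"
    # Assumption: anything below 4.0% is treated as the same sub-genotype.
    return "same sub-genotype"
```

For example, `classify_divergence(p_distance(seq_a, seq_b))` labels a pair of aligned sequences according to these cut-offs.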
The epidemiologic investigation of HBV infection reveals considerable variability in terms of prevalence, genotype distribution, and transmission patterns. Based on rates of the chronic carriage of hepatitis B surface antigen (HBsAg), countries can be classified as countries with hyperendemicity (HBsAg prevalence of > 8.0%), intermediate endemicity (HBsAg prevalence of 2.0-8.0%), and low endemicity with HBsAg prevalence < 2.0% [23][24][25].
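The endemicity categories above amount to a simple threshold rule, sketched here in Python. The function name and the returned labels are hypothetical, and the input is assumed to be the HBsAg prevalence expressed as a percentage.

```python
def endemicity_level(hbsag_prevalence_pct: float) -> str:
    """Classify a country by chronic HBsAg carriage (percent of population)."""
    if not 0.0 <= hbsag_prevalence_pct <= 100.0:
        raise ValueError("prevalence must be a percentage between 0 and 100")
    if hbsag_prevalence_pct > 8.0:
        return "high"          # hyperendemicity: > 8.0%
    if hbsag_prevalence_pct >= 2.0:
        return "intermediate"  # 2.0-8.0%
    return "low"               # < 2.0%
```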
Five HBV genotypes cause 96% of CHB cases worldwide, with genotype C as the most common (26%), followed by genotype D (22%), genotype E (18%), genotype A (17%), and genotype B (14%) [26]. Genotype distribution varies in different regions, with genotype D predominating in the Mediterranean region, while genotypes C and B are mostly reported in South East Asia, China, and Japan, and genotype A is most commonly found in the United States and Northern Europe [26,27].
The potential clinical value of HBV genotyping has gained much attention recently, with examples including possible association with more severe disease among the patients infected with genotype C compared to those with genotype B infection, and another example pointing to a possible earlier occurrence of HCC among patients infected with genotype D compared to those with genotype A infection [8,[28][29][30][31]. In contrast, other studies found no such link [32].
Transmission of HBV occurs mainly through the parenteral route, besides horizontal, sexual, or vertical routes [33][34][35]. A few studies reported potential links between certain HBV genotypes and possible routes of transmission: the preferred route for genotype A was sexual transmission, while genotype D was mainly transmitted by blood transfusion and transplantation [32]. The Middle East and North Africa (MENA) represent a region with shared demographic, cultural, and economic attributes [36]. The study of virus transmission in the region as a single unit can be invaluable in the epidemiologic investigation of virus spread [37][38][39][40][41][42]. The MENA region can be classified among the regions with intermediate to high endemicity of CHB [41,43]. In addition, CHB was reported as an important cause of HCC in the region [44]. Genotype D was found to dominate infections in the MENA region from various countries in different time periods [45][46][47][48][49][50][51][52][53]. This appears as an expected distribution considering that the MENA region is the putative epicenter for genotype D of the virus, with widespread distribution of its SGTs in the region [54][55][56]. Despite the recent reports indicating that the coverage of the HBV vaccine has increased to exceed 80% for the third dose, the low birth dose coverage remains a challenge in the MENA region [57].
The objectives of the current study were: (1) To characterize the genetic diversity of HBV in the MENA region; (2) To estimate the prevalence of the RT and S genes' mutations in the region; (3) To assess the proportions of phylogenetic clustering in the MENA region as indicators of domestic transmission of the virus.

Study Design and Sequence Inclusion Criteria
Using the search tool in GenBank (the National Institutes of Health (NIH)) genetic sequence database (www.ncbi.nlm.nih.gov/genbank/ accessed on 30 April 2021), a search was conducted for all HBV sequences that were collected from the following countries/ territories of the MENA region: Algeria, Bahrain, Cyprus/Northern Cyprus, Egypt, Iran, Iraq, Jordan, Kingdom of Saudi Arabia (KSA), Kuwait, Lebanon, Libya, Mauritania, Morocco, Oman, Palestine, Qatar, Somalia, Sudan, Syria, Tunisia, Turkey, United Arab Emirates (UAE), and Yemen [58]. The search was completed on 30 April 2021, and the following sequence metadata were also retrieved (if available): year and country of sequence collection, genotype/SGT data, and sequence length.

Determination of HBV Genotype Distribution, Antiviral Resistance and S Gene Mutations
In order to investigate the HBV genotype distribution in the MENA, the final dataset was compiled using the following exclusion criteria: (1) Country of collection other than the MENA countries described in the section above; (2) Sequence length of less than 300 base pairs (bp), based on the assumption that shorter sequences can yield inaccurate sub-genotyping results; or (3) Sequences that did not span a part of the P/S ORFs. The determination of HBV genotypes and SGTs, antiviral drug resistance, and S gene mutations was conducted using the Geno2pheno (HBV) online tool, and the results were compared with GenBank metadata [59]. Results for sub-genotyping were mostly concordant, except for SGT D7 sequences, which were assigned to SGT D4 by the Geno2pheno (HBV) online tool. These sequences were considered as SGT D7 based on the nomenclature used by the original GenBank submitters/authors.

Analysis of HBV Domestic Transmission in the MENA Region
In order to investigate the domestic transmission of the most prevalent HBV SGTs in the MENA (SGTs with > 10 MENA sequences), the final datasets were compiled according to the following inclusion criteria: (1) Molecular sequences covering the genomic region that was selected for the final analysis, which spanned part of the P and S genes (positions 216-755 in relation to HBV reference genome with GenBank accession NC_003977); and (2) Final SGT dataset with at least 10 MENA sequences. The final number of molecular HBV sequences that were considered for phylogenetic clustering analysis was 2384.
Phylogenetic inference of possible transmission links among the MENA HBV sequences was performed using the maximum likelihood (ML) approach as implemented in PhyML 3.0 [60]. A search for similar HBV GenBank sequences was performed using the BLAST tool, looking for the ten most similar sequences, which were included for final ML analysis together with the MENA sequences for each SGT [61]. The global sequences without known country of origin and those with stop codons were excluded from the final analysis.
The criteria to identify phylogenetic clusters indicating putative epidemiologic linkages were: (1) Internal branch support values ≥ 0.90 using the approximate Shimodaira-Hasegawa-like likelihood ratio test (aLRT-SH); (2) An ad hoc mean intra-cluster genetic distance of ≤ 2.5%; (3) At least 75.0% of sequences collected within the MENA countries/territories [62,63]. The genetic distance cut-off was selected in light of the approach used in HIV phylogenetic cluster analysis, together with one study investigating familial clustering of HBV [64,65]. Transmission cluster analysis was performed using the Cluster Picker 1.2 tool to identify the putative monophyletic clades in the MENA region [66]. Based on the cluster size, each cluster was classified as a dyad (two sequences), a small network (3-14 sequences), or a large network (≥15 sequences) [37,67,68].
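The three cluster criteria and the size classes can be summarized in a short Python sketch. This is not the Cluster Picker implementation. It only restates the thresholds given above, assuming that the branch support, the mean intra-cluster distance, and the sequence counts of a candidate clade have already been computed. The `Clade` type and the function names are hypothetical, and the sketch assumes that cluster size refers to the total number of sequences in the cluster.

```python
from dataclasses import dataclass


@dataclass
class Clade:
    support: float        # aLRT-SH support of the internal branch (0-1)
    mean_distance: float  # mean intra-cluster distance, substitutions/site
    n_mena: int           # sequences collected in MENA countries/territories
    n_total: int          # all sequences in the clade


def is_domestic_cluster(clade: Clade,
                        min_support: float = 0.90,
                        max_distance: float = 0.025,
                        min_mena_fraction: float = 0.75) -> bool:
    """Check the three criteria listed in the text for a candidate clade."""
    if clade.n_total < 2:
        return False  # a cluster needs at least two sequences
    return (clade.support >= min_support
            and clade.mean_distance <= max_distance
            and clade.n_mena / clade.n_total >= min_mena_fraction)


def size_class(n_sequences: int) -> str:
    """Dyad (2), small network (3-14), or large network (>= 15)."""
    if n_sequences < 2:
        raise ValueError("a cluster contains at least two sequences")
    if n_sequences == 2:
        return "dyad"
    if n_sequences <= 14:
        return "small network"
    return "large network"
```

A clade with support 0.95, mean distance 0.020, and three of four sequences from the MENA would satisfy all three criteria and be classed as a small network.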

Statistical Analysis
Analysis was conducted with IBM SPSS Statistics for Windows, Version 22.0 (IBM Corp., Armonk, NY). The possible associations between categorical variables were evaluated using the chi-square (χ2) test. Statistical significance was considered for p < 0.010.

Characteristics of the MENA HBV Molecular Dataset
The total number of sequences that spanned part of the P gene and were >300 base pairs in length was 4352, which formed the basis for the final genotyping of HBV in the MENA (Figure 1). Three-quarters of all sequences came from four MENA countries: Iran (n = 2103, 48.3%), KSA (n = 503, 11.6%), Tunisia (n = 395, 9.1%), and Turkey (n = 267, 6.1%; Figure 2). The years of sequence collection ranged from 2000 to 2019, with a total of 1329 sequences that lacked the dates of collection.

Genotype D Dominated HBV Infections in the MENA Despite the Detection of an Extensive Genetic Diversity
In the 20 MENA countries/territories with HBV sequences eligible for HBV genotype analysis, genotype D was the most common in 19 countries. Somalia was the only country with genotype A as the most common HBV genotype (Figure 3). Countries of the Middle East displayed a significantly higher prevalence of genotype D vs. all other genotypes grouped together compared to the MENA countries in Africa (3195/3484 (91.7%) vs. 572/868 (65.9%); p < 0.001, χ2 test, Figure 4).
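The comparison of genotype D prevalence between the Middle East and the MENA countries in Africa can be reproduced approximately with a Pearson chi-square test on the 2 × 2 table built from the counts above. The following Python sketch uses no continuity correction and only the standard library, so the statistic may differ slightly from the SPSS output. The function and variable names are hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the text: genotype D vs. all other genotypes.
middle_east_d, middle_east_total = 3195, 3484
africa_d, africa_total = 572, 868

chi2, p_value = chi_square_2x2(
    middle_east_d, middle_east_total - middle_east_d,
    africa_d, africa_total - africa_d,
)
print(f"chi2 = {chi2:.1f}, p = {p_value:.3g}")
```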
The most extensive genetic diversity of HBV was observed in KSA, where 16 different genotypes/SGTs were found, followed by Egypt (15 different genotypes/SGTs), Tunisia (12 different genotypes/SGTs), and Turkey (10 different genotypes/SGTs, Table 1). Jordan was the only country with the exclusive presence of a single SGT (D1, Table 1).
Sub-genotype D1 was the most common genetic variant of HBV in 14 out of the 20 MENA region countries/territories. Sub-genotype D7 was the most common in the Maghreb countries (Morocco, Algeria, and Tunisia), while in Sudan, genotype E dominated, and in Somalia, SGT A1 was the most common (Figure 5).


Antiviral Drug Resistance and S Gene Mutations in the MENA HBV Sequences
For antiviral drug resistance prediction, only the MENA HBV sequences with a known year of collection were used (n = 3023), excluding the sequences that lacked such data (n = 1329). For each drug, the estimates were grouped into susceptible vs. resistant/partly resistant/limited susceptibility categories, while the remaining categories (unknown, compensatory) were excluded. The overall prevalence of resistance to the five antiviral drugs is shown in Table 2.
Notes to Table 2: 1 Sequences excluded from analysis were those with compensatory mutations (for lamivudine, n = 75; for entecavir and telbivudine, n = 67) and those with mutations of unknown value (for lamivudine, n = 214; for adefovir, n = 81; for tenofovir, n = 75; for entecavir, n = 244; and telbivudine, n = 204); 2 S: Susceptible; 3 R: Resistant, which included sequences assigned to the following categories (resistant/partly resistant/limited susceptibility); 4 n: Number.
No significant temporal changes in resistance among HBV MENA sequences were observed; however, spatial differences were detected, with a higher percentage of resistance to lamivudine, entecavir, and telbivudine in the Middle East compared to North Africa (Table 3). The complete list of mutations detected for each of the five HBV antiviral drugs is summarized in Table 4. For the S gene, the most frequent mutations detected are summarized in Figure 6. The most common S gene mutations included: 143L (n = 373), 144A (n = 327), 126I (n = 323), and 141R (n = 302).

Variable Proportions of Putative Domestic HBV Transmission in the MENA Region for Different SGTs
The final number of molecular HBV sequences that were considered for phylogenetic clustering analysis was 2375, based on the genomic region that was selected for final analysis, which spanned part of the P and S genes (positions 216-755 in relation to HBV reference genome with GenBank accession no. NC_003977). The selection of this region was based on an initial analysis of the HBV genetic region with the highest number of MENA sequences (Figure 7). Sub-genotypes that had at least 10 sequences that were collected in the MENA region included in descending order: D1 (n = 1777), D7 (n = 268), A1 (n = 102), D2 (n = 78), E (n = 73), D3 (n = 48), and A2 (n = 23). The excluded SGTs from transmission cluster analyses included: C2 (n = 3), B1 (n = 1), B4 (n = 1), and C1 (n = 1).
[...] (73, 42.0%). For SGT D2, the total number of MENA sequences enclosed within domestic phylogenetic clusters was 22/78 (28.2%). The clusters comprised five dyads and two small networks, each with five MENA sequences. For genotype E, the total number of MENA sequences enclosed within domestic phylogenetic clusters was 18/73 (25.7%). These domestic clusters were divided into a single dyad (n = 2, 11.1%), a single small network (n = 3, 16.7%), and a large network (n = 13, 72.2%). A single small network was noticed for SGT D3 with an overall proportion of clustering of 4/48 (8.3%). For SGTs A1 and A2, no domestic clusters were identified.
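The sequence counts given at the start of this section can be cross-checked with a few lines of Python. The sketch below only sums the per-SGT counts quoted in the text and provides a helper for expressing a count as a percentage. The variable and function names are hypothetical.

```python
# MENA sequences per genotype/SGT in the clustering dataset (from the text).
included = {"D1": 1777, "D7": 268, "A1": 102, "D2": 78, "E": 73, "D3": 48, "A2": 23}
excluded = {"C2": 3, "B1": 1, "B4": 1, "C1": 1}

# Included plus excluded sequences should add up to the reported total.
total = sum(included.values()) + sum(excluded.values())


def share(count: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * count / denominator, 1)


# Share of each analyzed SGT in the clustering dataset.
shares = {sgt: share(n, total) for sgt, n in included.items()}
print(total, shares)
```

The same helper can be applied to the clustering proportions, for example `share(22, 78)` for SGT D2.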
The largest fraction of clustered sequences belonging to clusters that contained sequences from different MENA countries was seen in SGT D7 (135/166, 81.3%). Inter-country spread of SGT D1 was seen in clusters containing a total of 66 sequences (24.7%), while for SGT D2, inter-country spread comprised 5/22 (22.7%), and it was 3/18 (16.7%) for genotype E. Out of the total 109 MENA clusters identified in this study, 30 clusters contained sequences from two or more MENA countries. A fraction of the transmission clusters with MENA sequences collected in two or more countries involved neighboring countries: Iran and Oman (D1 dyad, D7 large network), Egypt and KSA (D1 small network), Egypt and Sudan (E small network), Iran and Turkey (D1 two small networks), and Algeria, Morocco and Tunisia (D7 two small networks, Table 5). All ML trees used to infer the putative transmission clusters in the MENA region are provided in the Supplementary Materials.
Table 5. Detailed description of the HBV phylogenetic clusters detected in the MENA region stratified by cluster size.

Discussion
The MENA region can be viewed as a region with a relatively high burden of hepatitis B [41,69,70]. In this study, we investigated HBV genetic diversity, antiviral drug resistance, and transmission clustering patterns in the MENA region using a phylogenetic-based approach. Countries of the region can be classified as intermediate to highly endemic in relation to the prevalence of CHB [27,41,71]. The study of viral transmission in a region with a common culture, economic difficulties, belief systems, and behaviors can be valuable to reveal the dynamics of virus spread [37,[72][73][74]. In addition, the study of HBV genotype distribution is valuable considering its potential association with the progression of liver disease, viral load, and viral clearance [75,76].
A major result of this study was the observation of the extensive genetic diversity of HBV in the MENA region. Nevertheless, genotype D dominated infections in the majority of the MENA countries, particularly in the Middle East. Additionally, the results indicated that genotype D emerged as the most significant genetic variant of the virus from an epidemiologic point of view due to its contribution to the local spread of HBV. This result appears plausible considering the previous evidence of genotype D dominance in the Eastern Mediterranean region [55,77,78]. Genotype D was previously reported to be the most dominant variant of HBV in Iran, Iraq, Syria, Jordan, Lebanon, Palestine, Turkey, and Northern Cyprus [46,50,53,[79][80][81]. In Iran, two separate studies by Aghakhani et al. and Pourkarim et al. attributed the dominance of genotype D in the country to its geographical location and ethnic background [82,83].
The extensive genetic diversity of HBV in the MENA was more pronounced in some countries (KSA and Egypt). Saudi Arabia has a high level of international migration within the MENA region countries [84]. Thus, the large diversity of residents in KSA was reflected by the presence of the eight HBV genotypes in the country. A previous study by Al-Qudari et al. demonstrated such an extensive genetic diversity of HBV in KSA [85]. For Egypt-particularly in Cairo-large urban refugee communities existed for a long time, which might have contributed to the huge genetic diversity of the virus in the country [86]. Saudy et al. suggested that being a destination for many tourists and visitors, particularly from the countries where genotype D is prevalent, can explain the considerable genetic diversity and dominance of genotype D in Egypt [87].
Likewise, Sumer and Sayan suggested that the HBV genetic diversity seen in Northern Cyprus can be related to the relatively large fraction of students and workers present there, with similar evidence by Arıkan et al. [53,88]. An important point to be considered in areas with extensive genetic diversity of the virus is the higher possibility of recombination, which might yield novel viral variants, that may impact the prophylaxis, diagnosis, and treatment of the disease [89,90].
Another interesting observation was the dominance of SGT D7 in the Arab Maghreb (Northwest Africa). The sustained presence of SGT D7 in the Maghreb was linked to intrafamilial transmission in early childhood [91]. The dominance of SGT D7 in the Maghreb was previously reported by several original and review papers and is consistent with the results of this study [49,[91][92][93][94].
The early characterization of SGT D7 was reported in Tunisia by Meldal et al., and the evolution of this SGT in Tunisia was further dissected by Ciccozzi et al., who described its peculiar distribution geographically [91,94]. The latter study demonstrated a recent origin of SGT D7 dating back to the 1950s, with an exponential increase in infections through two main routes, namely, familial transmission and the unsafe use of needles [91]. The peculiarity of SGT D7 in this study extended to involve the classification with the Geno2pheno (HBV) online tool, where this SGT was initially classified as SGT D4. The consistent discrepancy between the GenBank sequence metadata and genotyping results, besides the lack of SGT D4 retrieval using the BLAST tool with D7 as the query sequences, demonstrated its genuine classification as D7 rather than D4, which is found mainly in Oceania rather than the MENA region [55]. A possible explanation of SGT D7's initial misclassification as SGT D4 is the shared recent common ancestor for both SGTs D4 and D7, with its possible divergence from SGT D5 that took place in the Maghreb as suggested by Ciccozzi et al. [91]. This result points to the importance of conducting phylogenetic classification using full genomes to achieve reliable conclusions about the genetic diversity of the virus and the continuous need for revising the classification for HBV, which is characterized by a swift evolution among human DNA viruses.
Other studies from both Algeria and Morocco add further evidence to the previous hypothesis of SGT D7 origin in the Maghreb [49,95]. Collaborative efforts are required in the region to help in better characterization of the HBV epidemic in the Maghreb, which in turn can help in the implementation of well-informed preventive public health measures to reduce the burden of CHB in the region [92].
Despite its global distribution, genotype D predominates around the Mediterranean and in the Middle East [15,26]. The clinical significance of genotype D stems from its possible association with poor prognosis, besides its higher correlation with acute liver failure [15,75,[96][97][98]. Previous studies investigating the origin of SGT D1 using the phylogeographic approach pointed to possible origins of this sub-genotype in the region (Syria and Turkey); thus, the predominance of SGT D1 in the Levant appears as an expected outcome [99][100][101]. The dominance of genotype E in Sudan contradicts a previous report with a small sample size in the country, which showed that genotype D was the most common, followed by genotype E [102]. However, the higher number of molecular sequences analyzed in the current study might point to the genuine dominance of genotype E in Sudan, which appears as a possible outcome considering the high prevalence of this genotype in Sub-Saharan Africa [103].
The proportion of phylogenetic clustering, together with inter-country mixing of HBV lineages, was the highest in the Maghreb. This was higher than the proportions previously reported for HIV in Europe and the MENA region [37,73]. Taken together, the results of phylogenetic cluster analysis point to very high levels of viral mobility for SGTs A1 and A2; high levels of viral mobility for SGTs D1 and D3 and genotype E; and high levels of domestic spread with inter-country mixing for SGT D7 viral lineages.
A previous phylogenetic suggestion of HBV movement in the MENA region was reported by Garmiri et al., where Iranian and Egyptian strains clustered together despite the lack of high statistical support using bootstrapping method [104]. The study by Pourkarim et al. that reported the clustering of Iranian HBV strains with isolates from Turkey, Syria, and Lebanon was in line with our results [83].
Another recent study from Jordan demonstrated evidence of possible inter-country spread of HBV in the MENA region as inferred through the observation of intermingling of HBV sequences from Iran, Turkey, Syria, and KSA with Jordanian sequences [50]. Nevertheless, the previous study by Ababneh et al. reported a higher proportion of SGT D1 clustering (30%) compared to the results of this study (15%) [50]. Possible explanations for this discrepancy can be the adoption of a stricter definition of statistical support for the internal nodes in defining the monophyletic clades in this study besides the examination of a larger dataset from a region rather than a single country.
Considering the evidence of domestic viral spread in the region, halting dispersal of HBV requires collaborative efforts and more comprehensive epidemiologic studies to identify possible risk factors of infection and proper vaccine coverage, which can be used to guide well-informed focused preventive measures [105].
Regarding the most common S gene mutations detected in this study, such mutations can have a notable impact on the biologic behavior of the virus as follows: changing antigenicity of the surface antigen, which may affect the specificity of monoclonal antibodies binding to this antigen [106,107]; reduction in the antigenicity of the surface protein of HBV with subsequent reactivation resulting in occult HBV infection [108]; the lower possibility of detection resulting in a higher risk of transmission in association with occult hepatitis B infection [108]; and possible reduction in the ability to detect HBV surface antibodies in immunoassays [109,110]. Such mutations should be monitored closely considering the potential survival advantage of such immune-escape mutants [111].
For the polymerase gene mutations, the overall prevalence of mutations conferring antiviral drug resistance was low (mainly against lamivudine and telbivudine). Nevertheless, continuous monitoring of such mutations is recommended to prevent the selection of resistant mutants, which would hinder the management of CHB [112]. This result should be interpreted with extreme caution because the treatment status of the individuals from whom the HBV sequences analyzed in this study were obtained was not available. Previous reports of the rapid emergence of resistance to lamivudine also cast doubt on this particular result [113–115]. Thus, more studies from various countries of the region are recommended to confirm or refute this uncertain result. Such research is particularly recommended in the Middle East, considering the significantly higher prevalence of resistance to lamivudine, entecavir, and telbivudine observed there.
The RT amino acid substitutions 204I/V were among the most frequent RT substitutions detected in the study. Such amino acid substitutions were reported previously in Iran and Jordan, even among treatment-naïve individuals [50,116,117]. The amino acid substitutions 204I/V can be considered among the signature resistance mutations to lamivudine with cross-resistance to telbivudine [118,119]. The decreased response to lamivudine associated with the selection of drug-resistant mutants was reported in Egypt and Iran and should be considered carefully since this antiviral is considered a cost-effective treatment option widely used in the MENA region [71,120,121].
The current study had several limitations. First, a prerequisite for the determination of HBV SGTs is the analysis of full-length genomes, which were not available in this study. Thus, slight differences in SGT assignments should be expected [122]. Second, any prospective research investigating HBV epidemiology should consider the probable mode of transmission and the serologic profiles of the infected patients, besides their treatment status, none of which were available for the majority of sequences utilized in this study [123]. Third, the use of short genomic regions and sparse sampling were also unavoidable in this study, which could have influenced the estimates of HBV clustering. Fourth, the possibility of recombination in the analyzed sequences was not ruled out. Finally, a few MENA countries lacked HBV molecular sequences in GenBank (Bahrain, Djibouti, Mauritania, and Qatar), and the number of HBV sequences was unequally distributed across countries, with a predominance of sequences from Iran, Saudi Arabia, and Tunisia, which could have biased the results.

Conclusions
The following patterns of HBV genotype distribution were observed in the MENA: SGT D1 dominance in the Levant, Iran, and Turkey; SGT D7 dominance in the Maghreb; genotype E dominance in Sudan; and SGT A1 dominance in Somalia. In addition, extensive genetic diversity of HBV was seen in Saudi Arabia and Egypt. A low prevalence of lamivudine, telbivudine, and entecavir drug resistance was found in the MENA in this study, with a near absence of resistance to tenofovir and adefovir. Transmission cluster phylogenetic analysis indicated variable proportions of phylogenetic clustering. Specifically, prominent domestic transmission of SGT D7 was observed (particularly in the Maghreb), while the lower levels of clustering observed for genotype E and SGTs D1 and D3 suggest higher levels of virus mobility and inter-country spread of the virus in the region. Future epidemiologic studies involving the investigation of risk factors, vaccine coverage, the occurrence of antiviral drug resistance, and vaccine escape mutations are highly recommended to refine the preventive and management strategies in the MENA.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/pathogens10101333/s1, Figure S1: Maximum likelihood phylogenetic trees for the major HBV genotypes/SGTs used to conduct transmission cluster analysis. Visualization of the phylogenetic trees was conducted in the software FigTree, freely available at http://tree.bio.ed.ac.uk/software/figtree/ (accessed on 29 August 2021).
Funding: We declare that we received no funding or financial support/grants from any institutional, private, or corporate entity.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: The complete list of molecular sequences analyzed in this study can be found in GenBank (https://www.ncbi.nlm.nih.gov/genbank/; accessed on 30 April 2021) and by contacting the original submitting authors.