High Frequency of Either Altered Pre-Core Start Codon or Weakened Kozak Sequence in the Core Promoter Region in Hepatitis B Virus A1 Strains from Rwanda

Hepatitis B virus (HBV) is endemic in Rwanda and is a major etiologic agent of chronic liver disease in the country. In a previous analysis of HBV strains from Rwanda, the S genes of most strains segregated into a single clade of subgenotype A1. More than half (55%) of the anti-HBe positive individuals were viremic. In this study, 23 complete HBV genomes and the core promoter region (CP) from 18 additional strains were sequenced. Phylogenetic analysis of complete genomes confirmed that most Rwandan strains formed a single unique clade within subgenotype A1. Strains from 17 of 22 (77%) anti-HBe positive HBV carriers had either a mutated precore start codon (9 strains with either CUG, ACG, UUG, or AAG) or mutations in the Kozak sequence preceding the precore start codon (8 strains). These mutually exclusive mutations were also identified in subgenotype A1 (70/266; 26%), A2 (12/255; 5%), and A3 (26/49; 53%) sequences from GenBank. The results showed that previously rarely described HBV variants, expressing little or no HBeAg, are selected in anti-HBe positive subgenotype A1 carriers from Rwanda and that mutations reducing HBeAg synthesis might be unique to a particular HBV clade, not just to a specific genotype or subgenotype.

The HBV genome has four overlapping open reading frames (ORFs): the preS1/S2/S ORF coding for the surface antigens, the P ORF encoding the polymerase, the precore/C ORF encoding both the circulating e-antigen (HBeAg) and the core antigen, and the X ORF encoding the HBx protein. The genome also has four promoters (preS1, preS2, core, and X) and two enhancer elements (ENI and ENII) located upstream of the core promoter [8]. There are seven polyadenylated and capped viral RNA transcripts which encode the viral proteins [9,10]. The pregenomic RNA (pgRNA), which serves as a template for reverse transcription into HBV DNA, also encodes both the core and the polymerase proteins. HBeAg, discovered in the early 1970s [11], is a suggested T-cell tolerogen [12,13] and is produced by translation of the longest HBV mRNA, initiating 29 codons upstream of, and in frame with, the C start codon [8]. HBeAg is secreted from the hepatocyte into the blood and is a marker of active ongoing replication. Seroconversion to the corresponding antibody, anti-HBe, is often a sign of remission. Some patients, however, show persistence of serum HBV DNA despite seroconversion to anti-HBe, due to the emergence of mutations in the core promoter or precore region.
The core promoter (CP; nt 1575-1849) has an important role in the replication of HBV. It consists of the basal core promoter (BCP; nt 1743-1849), partly overlapping with the precore region, and the upper regulatory region (URR; nt 1613-1742). The BCP is sufficient for accurate initiation of both precore mRNA and pgRNA transcription in vivo, and it contains the direct repeat 1 region, which is required for reverse transcription. The CP thus regulates transcription of the precore mRNA, and hence mutations in this region might reduce the level of HBeAg expression [14-16]. Certain mutations in this region can therefore affect HBeAg synthesis without adversely affecting the ability of HBV to replicate [15,17,18]. The most common mutations in the BCP are the double mutations at nucleotide positions 1762 and 1764, A1762T and G1764A, which are associated with downregulation of HBeAg production [15,18]. These double mutations have been detected more frequently in patients with fulminant hepatitis than in asymptomatic carriers [19]. Other CP mutations have been shown to affect viral DNA replication and HBeAg expression [20]. Mutations C1766T and/or T1768A have been shown to enhance pgRNA synthesis 2.5- to 5-fold and to reduce HBeAg synthesis by the same magnitude, by downregulating the precore mRNA [21]. Insertions within the BCP causing increased pgRNA synthesis have been described in patients with chronic and fulminant hepatitis [22,23].
Mutations in the precore region are often nonsense or frameshift mutations, which might terminate HBeAg expression [24,25]. The most common is the G1896A mutation, which forms a stop codon (TAG) within the precore region and thus abolishes the translation of HBeAg [25,26]. This precore stop codon appears exclusively in strains with thymidine at nucleotide position 1858 and is therefore not found in genotypes A and F (except subgenotype F1) or in subgenotype C2, which have a cytosine at this position. All genotype G strains have two stop codons in this region, at positions 2 and 28. The double BCP mutations and precore stop codons are not mutually exclusive [16]. Lack of HBeAg expression can also be due to different mutations in the translation initiation codon of the precore protein. However, these variants seem to be infrequent and have previously only been described in isolated strains [19,26-28].
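The effect of G1896A on the precore reading frame can be illustrated with a short sketch. It assumes that the precore ORF begins at nucleotide 1814 (the position immediately after the Kozak residues 1809-1813 given later in the text) and that the wild-type codon containing nt 1896 is TGG; both values, as well as the function names, are assumptions made for illustration and are not taken from the study data.

```python
# Illustrative sketch only. PRECORE_START and the wild-type codon "TGG"
# are assumptions for this example, not values reported in the study.
PRECORE_START = 1814          # assumed first nucleotide of the precore ORF
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_number(nt_pos: int) -> int:
    """Return the 1-based precore codon that contains genome position nt_pos."""
    return (nt_pos - PRECORE_START) // 3 + 1


def codon_first_nt(codon_no: int) -> int:
    """Return the genome position of the first nucleotide of a precore codon."""
    return PRECORE_START + 3 * (codon_no - 1)


def apply_substitution(codon: str, codon_no: int, nt_pos: int, new_base: str) -> str:
    """Replace the base at genome position nt_pos inside the given codon."""
    offset = nt_pos - codon_first_nt(codon_no)
    if not 0 <= offset < 3:
        raise ValueError("nt_pos does not fall inside this codon")
    return codon[:offset] + new_base + codon[offset + 1:]


codon_no = codon_number(1896)                       # codon holding nt 1896
mutant = apply_substitution("TGG", codon_no, 1896, "A")
is_stop = mutant in STOP_CODONS
```

Under these assumptions nt 1896 is the middle base of precore codon 28, so the G-to-A change turns TGG into TAG, which agrees with the stop codon and the codon position mentioned above.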
This study investigated mutations in the HBV genome that might explain the high frequency of viremic infections in anti-HBe positive carriers in Rwanda.

Serum Samples
Serum samples from 25 Rwandan individuals with no known liver disease (20 blood donors and 5 healthy individuals) and 16 Rwandan patients with liver disease were used in the study. The patients were attending liver disease clinics at six different hospitals, from all regions of Rwanda, and were diagnosed with either chronic hepatitis or cirrhosis. The persons designated as healthy had no known liver disease. Determinations of HBeAg and anti-HBe, and sequencing of the S genes, in all samples have been described previously and are shown in Table 1 [7,29]. The viral load in the sera has also been described previously [7,29]. The study was approved by The Rwanda National Ethics Committee (RNEC 024/2014).

PCR Amplification
Nucleic acid extraction and virus DNA amplification were performed as described previously [7]. Complete HBV genomes could be amplified for 23 strains (Table 1). The primers used are given in Supplementary Table S1.
The partial CP region could be amplified, with the same primers as those used for complete genome sequencing, in an additional 18 strains (14 from carriers with anti-HBe, 3 from persons with HBeAg, and one from a patient with liver disease lacking markers for HBeAg and anti-HBe).

Sequencing
All amplified PCR products were purified and extracted with the QIAquick PCR Purification Kit (Qiagen, Hilden, Germany), according to the manufacturer's description. The purified products were cycle sequenced in both directions, using 1.6 µM of the same primers as in the PCR, with the BigDye Terminator Cycle Sequencing Ready Reaction kit (Applied Biosystems, Carlsbad, CA, USA), according to the manufacturer's instructions. The sequences were obtained on the 3130xl Genetic Analyzer (Applied Biosystems).

Phylogenetic Analysis
The sequences obtained were analyzed in the SeqMan program in the DNAStar program package version 10.1.2 (DNAStar Inc., Madison, WI, USA). The sequences were aligned with 526 complete genomes, representing all HBV genotypes, obtained from GenBank, including 75 genomes from subgenotype A1 strains from Africa. Phylogenetic analysis was carried out with the PHYLIP package version 3.65. Evolutionary distances were calculated using the F84 algorithm in the DNADIST program, with a transition/transversion ratio of 1.34, with gamma correction and alpha 0.23. Phylogenetic trees were constructed using the unweighted pair-group method with arithmetic averages (UPGMA) and the neighbor-joining method in the NEIGHBOR program in the PHYLIP package. Bootstrap analysis with 1000 replicates was performed with the SEQBOOT and CONSENSE programs in the PHYLIP package. The sequences obtained in this study are deposited in GenBank with accession numbers MK512455-MK512477.
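For readers unfamiliar with UPGMA, the clustering step can be sketched in a few lines of Python. This is not the PHYLIP NEIGHBOR implementation used in the study; the function name, the four taxon labels, and the toy distance values are hypothetical and stand in for an F84 distance matrix.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    names: list of unique taxon labels.
    dist:  dict mapping frozenset({a, b}) to the pairwise distance.
    """
    clusters = {name: (name, 1) for name in names}   # label -> (subtree, size)
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two clusters separated by the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        # Distance to every remaining cluster is the size-weighted mean,
        # which equals the arithmetic average over all member pairs.
        for c in clusters:
            d[frozenset((merged, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        clusters[merged] = ((tree_a, tree_b), n_a + n_b)
    (tree, _size), = clusters.values()
    return tree


# Hypothetical toy distances (not study data).
taxa = ["A", "B", "C", "D"]
toy = {
    frozenset(("A", "B")): 2.0, frozenset(("C", "D")): 4.0,
    frozenset(("A", "C")): 10.0, frozenset(("A", "D")): 10.0,
    frozenset(("B", "C")): 10.0, frozenset(("B", "D")): 10.0,
}
topology = upgma(taxa, toy)
```

The sketch returns only the branching order. Branch lengths and bootstrap support, which the study obtained with SEQBOOT and CONSENSE, are outside its scope.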

Complete HBV Genomes
Complete genomes were obtained for 23 HBV A1 strains. Twelve were from HBeAg positive individuals, eight from anti-HBe positive individuals, and three from individuals with both HBe markers. Phylogenetic analysis of the complete genomes confirmed that most HBV strains from Rwanda, together with two strains from neighboring Tanzania and one from the Democratic Republic of the Congo, formed a single clade within subgenotype A1 (Figure 1).

Figure 1.
Branch from an unweighted pair-group method with arithmetic averages (UPGMA) tree based on 526 complete hepatitis B virus (HBV) genomes. The clades formed by each subgenotype of A are shown in the small tree to the left. One of two branches formed by subgenotype A1 complete HBV genomes is enlarged. The origin of strains from the same country is marked with brackets at the nodes; the origin of the other strains is given at the nodes. Strains with wild-type precore start codon and Kozak sequence preceding the precore start, from patients with unknown HBeAg/anti-HBe status, are marked by black squares at the nodes. Strains from HBeAg positive patients are marked by green squares. Strains from patients with anti-HBe and wild-type precore start codon and Kozak sequence are marked by blue squares. The strains marked with red or orange squares have either a changed precore start codon (red) or a changed Kozak sequence (orange). The figures below the branches refer to bootstrap values from 1000 replicates.
The P gene was 2484 nucleotides long, for all strains, apart from nine, which had deletions in the spacer region (Supplementary Table S2) [30], and one strain, RW2079, with an additional 18 nucleotide deletion between the nucleotide residues 1573-1590, in the RNaseH region of the polymerase [30]. The amino acid Gln334 in the polymerase, which is unique for subgenotype A1 [31], was present in all strains, however, the other unique A1 amino acid Lys338 was lacking in two strains, Rw14-25 from a blood donor and Rw2086 from a patient with liver disease. Both these strains expressed Gln338.
The S genes of the sequenced strains have been described previously [7,29]. The core region was 558 nucleotides long for all strains, except strain rw14-03, which had a C2396T mutation forming a stop codon, which made the core protein for this strain shorter by two amino acids, compared to the other sequenced strains (Supplementary Table S2).
The X gene was 465 nucleotides long for all strains except three. Two had mutations affecting the basal core promoter. Strain Rw14-220 had a 12-nucleotide insertion with a stop codon between nucleotide positions 1766 and 1777, truncating the X gene to 408 nucleotides. The other strain, rw2199, had a 426-nucleotide-long X gene, due to a 39-nucleotide deletion between positions 1736 and 1775. The third strain with a shorter X gene, Rw2079, had the above-mentioned 18-nucleotide deletion between nucleotides 1573 and 1590, which also affected the P gene (Table S2).

Core Promoter and Precore Regions of Complete and Partial Genomes
Due to mutations observed in the CP region in the 23 complete genomes, the region between nucleotides 1747 and 1927 was sequenced in an additional 18 strains (Table 1).
The most common mutations found in the 22 strains from anti-HBe positive individuals in this study were mutations altering the precore start codon, in nine strains (41%; Tables 2 and 3), or weakening the Kozak sequence preceding the precore start codon [32], in eight strains (36%; Tables 2 and 4). The altered precore start codons were CUG, UUG, AAG, and ACG, with CUG and UUG being the most common. The Kozak sequence between residues 1809 and 1813 was not the typical wild-type GCACC sequence found in most genotypes, but TCATC in all strains from the HBeAg positive carriers. This Kozak sequence has also been described in strains from HBeAg and anti-HBe positive individuals from South Africa [33], and is probably the wild-type sequence for subgenotype A1 [21]. There were six different patterns of altered Kozak sequences between residues 1809 and 1813 in 8/42 (19%) sequenced strains, with two Kozak sequences, TTCTC and TCCTC, shared by two strains each (Tables 2 and 4).
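The grouping of strains described above can be expressed as a small classification sketch. It assumes an ungapped genome string in standard HBV numbering, with the precore start codon at nt 1814-1816 directly after the Kozak residues 1809-1813; the function names and the synthetic test genome are hypothetical, and strains with insertions or deletions upstream would need realignment before such coordinates apply.

```python
# Illustrative sketch; coordinates and names are assumptions, not study code.
KOZAK_RANGE = (1809, 1813)       # residues given in the text
START_RANGE = (1814, 1816)       # assumed precore start codon position
WILD_TYPE_KOZAK = {"GCACC", "TCATC"}   # typical and subgenotype A1 sequences


def region(genome: str, first: int, last: int) -> str:
    """Return the 1-based, inclusive slice first..last of an ungapped genome."""
    return genome[first - 1:last]


def classify(kozak: str, start_codon: str) -> str:
    """Assign a strain to one of the three groups described in the text."""
    kozak = kozak.upper().replace("U", "T")
    start_codon = start_codon.upper().replace("U", "T")
    if start_codon != "ATG":
        return "altered precore start codon"
    if kozak not in WILD_TYPE_KOZAK:
        return "altered Kozak sequence"
    return "wild type"


def classify_genome(genome: str) -> str:
    """Extract both regions from a genome string and classify the strain."""
    return classify(region(genome, *KOZAK_RANGE), region(genome, *START_RANGE))


# Synthetic placeholder genome (not a real HBV sequence).
synthetic = "N" * 1808 + "TCATC" + "CTG" + "N" * 20
label = classify_genome(synthetic)
```

Because the two changes were mutually exclusive in this study, the order of the two checks does not affect the grouping of the sequenced strains; a strain carrying both changes would be reported under the start codon category by this sketch.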
None of these mutations in the sequenced strains were associated with liver disease. Among the anti-HBe positive carriers, 11 of 13 healthy persons compared to six out of eight patients with liver disease, had either an altered precore start codon or a weakened Kozak sequence.
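The comparison of 11 of 13 healthy carriers with six of eight patients can be checked with a two-sided Fisher exact test. The text does not state which statistical test, if any, the authors applied, so the sketch below is only one reasonable way to examine the reported counts; the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as probable as the observed one.
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Rows: healthy carriers, patients. Columns: mutation present, absent.
p_value = fisher_exact_two_sided(11, 2, 6, 2)
```

For these counts the p-value is well above 0.05, which is in line with the statement that the mutations were not associated with liver disease.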
In the BCP region, the double mutation A1762T/G1764A was observed in six of the 41 sequenced strains, three were from healthy blood donors and three from patients with liver disease. One of the strains, rw2113 from a patient, also had an altered precore start codon, and one from another patient, rw2216, had a changed Kozak sequence preceding the precore start codon.
The four regulatory regions for the TATA binding proteins, within the BCP, between nucleotides 1750-1755, 1758-1762, 1771-1775, and 1788-1795, were conserved in all sequenced strains except rw14-120, which had a T1758C mutation in the second region and strain rw2199, which had a deletion covering the first three TATA regions. These three regions are important for initiation of the precore mRNA [34]. The fourth region, important for pgRNA synthesis and the pgRNA initiation at nucleotide positions 1821-1828 [34], was conserved in all sequenced strains.
The G1888A/C mutation stabilizing the ε signal [35] was found in 27 (66%) of the 41 strains sequenced in this study. These mutations were found in strains with or without the above-mentioned mutations and from both HBeAg and anti-HBe positive individuals. None of the strains had the G1896A mutation, which forms a stop codon within the precore region and thereby terminates HBeAg expression, or any amino acid substitution in the precore region sequenced.
There was no correlation between viral load and the different mutations in the BCP, although the viral load in strains with altered precore start codon tended to be higher in patients with liver disease than in those without known liver disease (Table 2).

Discussion
In this study, the high prevalence in Rwanda of viremic patients with anti-HBe and without HBeAg reactivity could, in the majority of cases, be explained by otherwise rarely described mutations in the CP region of the infecting HBV genomes. The most common mutations either altered the precore start codon or weakened the Kozak sequence preceding the start codon of the precore gene. Both mutations, which were mutually exclusive, were likely to reduce the synthesis of HBeAg and thereby allow the virus to escape the host's immune response. Mutations in the Kozak sequence have also been identified in some South African, Kenyan, and Asian subgenotype A1 strains [21,33,36]. The change of the precore start codon has rarely been described, and only in a few A1 strains from South Africa and untyped strains from Japan and Europe [19,26-28,36]. In this study, strains with these mutations were found in clades formed by A1 strains from Rwanda and Haiti, and A3 strains from Haiti and Cameroon. The clades formed by the strains with these mutations from Haiti and Cameroon were observed in the phylogenetic tree, but the mutations were not discussed in the publications describing these strains [37,38]. These results indicate that functionally advantageous mutations reducing HBeAg synthesis might be restricted to a particular HBV clade and not just to a specific genotype or subgenotype.
The mutations observed in the CP region of the strains in this study probably reduced the synthesis of the precore mRNAs but left pgRNA synthesis essentially unaffected, since both the TATA-rich region important for initiation of transcription and the initiation site for pgRNA [34] were conserved in all sequenced strains. The change of the precore start codon might abolish or just decrease the expression of HBeAg. Several of the identified mutations have been described as non-canonical start codons used by mammals, with the most efficient being CUG [39,40], which was found in three of the strains in this study, followed by UUG, found in another three strains. Such a change of start codon for an ORF has also been found in other viruses, such as hepatitis E virus and plant RNA viruses [41,42], and non-canonical start codons have been shown to be 2-30% as efficient as AUG as a start codon in mammalian cells [39]. CUG was the most commonly changed start codon found in subgenotype A1, A2, and A3 sequences in GenBank. Although AAG has not been shown to be used as a non-canonical start codon, it was found in two strains in this study.
The efficiency of translation of an open reading frame depends on the preceding Kozak sequence, with a weak Kozak sequence lowering the initiation of translation of the mRNA [42]. The strains from the anti-HBe positive persons in this study had either an altered precore start codon or a weakened Kozak sequence. A possible reason why neither the strains in this study nor the sequences in GenBank had both these changes is that the two mutations together might change the ε signal, or that, for most strains, a low production of HBeAg might be preferential for virus replication and would be completely abolished if both mutations occurred simultaneously.
One of the two most commonly described mutations, the precore G1896A mutation leading to premature termination of the precore protein and thereby preventing HBeAg production [25], was not observed in any strain sequenced, since they all had C1858 [43], which stabilizes the ε signal by base pairing with G1896. The G1896A mutation was, therefore, not observed in strains with C1858. The other commonly described mutation was the double A1762T/G1764A BCP mutation in strains from viremic patients with anti-HBe [15], which occurs frequently in patients with hepatocellular carcinoma [36,44]. This double mutation was found in six strains in this study, three from healthy blood donors, and two with other BCP mutations.
Variants with deletions of parts of the BCP region need coinfection with wild-type virus for replication [45,46], and have in other studies been associated with low viremia levels. This might, however, depend on genotype, since the strains in the former studies were of genotype C, while the strain with a deletion in the BCP in this study belonged to subgenotype A1 and was from an HBeAg positive individual with a high viral load of HBV DNA and without known liver disease. No wild-type sequence was observed, but deep sequencing of the strains might be needed to identify low levels of wild-type virus. The strain with an 11 bp insertion in the BCP region was also from an HBeAg positive individual without known liver disease. This insertion had a binding site for HNF1 and has been described previously in transplanted patients with fulminant hepatitis [23]. These results indicate that this and other BCP mutations that have been associated with disease aggravation might be more pathogenic only in certain viral strains or host settings.

Conclusions
This study showed that previously rarely described HBV variants expressing little or no HBeAg were selected among subgenotype A1 strains from carriers in Rwanda after they had seroconverted to anti-HBe, and that these mutations might be clade specific. The most common mutations concerned the precore ORF, with either an altered start codon or an altered Kozak sequence. The study also confirmed the uniqueness of HBV in Rwanda, where almost all strains from every region of the country belonged to the same clade within subgenotype A1 [7]. The strains originated from all regions of Rwanda, and the low divergence between them indicates either a recent introduction of HBV into Rwanda with a rather rapid spread, or that this strain is more viremic than other introduced strains and thereby spreads faster. No complete HBV genome was available from neighboring Burundi, but two strains from Tanzania and the Democratic Republic of the Congo were found in the Rwandan clade, indicating that the Rwandan strain might either have been imported from or exported to one of these countries.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2073-4425/10/3/182/s1, Figure S1: Branch from a UPGMA tree, based on 526 complete HBV genomes. The clades formed by each subgenotype A are shown in the small tree to the left. The branch formed by subgenotype A3 complete HBV genomes is enlarged, Table S1: Primers used for amplification and sequencing complete HBV genomes. Table S2: Designation, genome length, and position and nucleotide length of the 5 ORFs, in the sequenced complete HBV genomes.