Whole Genome Analysis of Human Rotaviruses Reveals Single Gene Reassortant Rotavirus Strains in Zambia

Rotarix® vaccine was implemented nationwide in Zambia in 2013. In this study, four unusual strains collected in the post-vaccine period were subjected to whole genome sequencing and analysis. The four strains possessed atypical genotype constellations, with at least one reassortant genome segment within the constellation. One of the strains (UFS-NGS-MRC-DPRU4749) was genetically and phylogenetically distinct in the VP4 and VP1 gene segments. Pairwise analyses demonstrated several amino acid disparities in the VP4 antigenic sites of this strain compared to that of Rotarix®. Although the impact of these amino acid disparities remains to be determined, this study adds to our understanding of the whole genomes of reassortant strains circulating in Zambia following Rotarix® vaccine introduction.


Introduction
Group A rotavirus (RVA), a widespread and infectious pathogen that causes dehydrating diarrhea, particularly in children under five years of age, was estimated to have caused approximately 128,000 deaths in 2016. A greater percentage of these deaths (approximately 105,000) occurred in sub-Saharan Africa [1]. The significance of RVA burden of disease led to the development of prophylactic vaccines. In that regard, the World Health Organization (WHO) recommended the use of rotavirus vaccines globally [2]. Four WHO-prequalified rotavirus vaccines (Rotarix ® , RotaTeq ® , ROTAVAC ® , and ROTASIIL ® ) are currently in use in 110 countries worldwide as of 5 April 2021 [3]. The two-dose monovalent vaccine Rotarix ® (RV1; GlaxoSmithKline Biologicals, Belgium) consists of a single human G1P [8] strain [2]. In sub-Saharan Africa, Rotarix ® is used in countries such as Kenya, Mauritania, Namibia, Niger, and Zimbabwe [3]. This vaccine was introduced in Lusaka, Zambia in 2012 as a pilot project and then rolled out nation-wide in 2013 [4,5]. Vaccine coverage in Zambia in 2019 was at 90% [6].
Due to the segmented genome of RVA, it is common for reassortment events to occur, which play a key role in generating the genetic diversity of the virus [11]. It is crucial to understand genetic exchange through reassortment, particularly those belonging to the two major genogroups, as well as various evolutionary mechanisms that contribute to genetic diversity. RVA genomes have high rates of mutation and are subject to frequent reassortment events, which are primarily responsible for rotavirus evolution [11][12][13][14][15][16]. RVA with unusual G-P combinations such as G1P [4], G2P [6], G2P [8], G3P [4], and G8P [4] are known to circulate in human populations as a result of intergenogroup reassortment between co-circulating strains. The G1P [4] and G2P [8] have been shown to circulate among G1P [8] and G2P [4] strains [16][17][18][19][20][21]. Most human RVA strains possess either a typical Wa-like or DS-1-like constellation and are thought to have an evolutionary fitness advantage that allows them to spread widely and persist in human populations [22,23]. Nevertheless, after the isolation of two naturally occurring intergenogroup reassortants between Walike and DS-1-like in Bangladesh in 1985-1986 [24], RVA strains possessing mixed gene constellations of human and/or animal origin have been documented in various parts of the world [25][26][27][28][29][30][31][32][33][34].
RVA strain surveillance based on conventional genotyping of VP7 and VP4 has been conducted in Zambia [35]. Unusual G-and P-combinations such as G1P [6] and G9P [6] were reported in 2011 before Rotarix ® was implemented. On the contrary, only the G2P [6] was reported post-vaccine implementation [35]. However, there is a dearth of Zambian whole genome sequence data. Here we report the whole genomes of four intergenogroup reassortant strains identified between 2014 and 2016 during the ongoing RVA surveillance in Zambia, to understand the mechanisms that result in genetic diversity among Zambian RVA post-Rotarix ® introduction.

Ethics Statement
Ethical approval was obtained from the Health Sciences Research Ethics Committee (HSREC) at the University of the Free State, Bloemfontein, South Africa (Ethics number UFS-HSD2020/0277/2104).

Study Samples
Stool samples were collected in the post-vaccine period from children who presented with acute gastroenteritis. The demographics and clinical profiles of the children from whom the study samples were taken at Arthur Davidson Children's Hospital (ACDH) in Ndola and University Teaching Hospital (UTH) in Lusaka are shown in Table 1. The four Zambian strains analyzed in this study were collected from one female and three male children aged between 5-20 months, as part of the ongoing rotavirus surveillance by the WHO/AFRO. The strains demonstrated a sporadic transmission pattern, devoid of any sign of an outbreak infection, as they were identified in different parts of Ndola and Lusaka. Further, clinical information indicated that all the children had diarrhea that lasted between a day and four days, with varying frequencies. Similarly, three children presented with two to three days of intermittent vomiting, and two children presented with fever. There were two cases of severe dehydration and one case of moderate dehydration because of diarrhea and vomiting. Two of the four children had been vaccinated, while the other two were not vaccinated. However, all the strains were detected post-Rotarix ® vaccine implementation (2014)(2015)(2016). No mortality resulted due to illness, as all children fully recovered.
Thereafter, the samples were screened for presence of RVA antigen using the ProSpecT™ Rotavirus Microplate Assay Kit (Oxoid limited, Basingstoke, Hampshire, UK) at the Virology laboratory in Lusaka, Zambia. The samples were then sent to the Diarrhoeal Pathogens Research unit (DPRU), a WHO Regional Reference Laboratory (Pretoria, South Africa) as part of the annual rotavirus surveillance conducted by the WHO-Regional Office for Africa. The samples were first conventionally genotyped at the DPRU before being shipped to the University of the Free State, Next Generation Sequencing unit (UFS-NGS) for whole genome sequencing and analysis. Four RVA strains from a total of 133 were seen to be reassortant strains after whole genome analysis which formed the basis of this study. To address two Terms of Reference between WHO and UFS-NGS, phylogenetic analysis was performed on all strains, and the possible antigenic disparities between Rotarix ® , global reference strains and one distinct study strain was assessed.

Extraction of Double-Stranded RNA and cDNA Synthesis
Viral double-stranded RNA was extracted and purified using a MinElute ® gel extraction kit (Qiagen, Hilden, Germany) according to already established methods [33,36]. Thereafter, the integrity of purified RNA was determined by electrophoresis on 1% TBE agarose gel stained with ethidium bromide (Sigma-Aldrich, Saint Louis, MO, USA). Before proceeding with synthesis of complementary DNA (cDNA), the purified dsRNA was first quantified on a Biodrop µLite platform (Biochrom, Cambridge, UK). Samples with a 1.8-2.2 absorbance ratio were considered pure for further processing [37]. A Maxima H Minus Double Stranded cDNA kit (Thermo Fisher Scientific, Waltham, MA, USA) was used in the synthesis of cDNA, according to manufacturer's instructions with minor modifications. The variations were captured in the UFS-NGS SOP as follows: denaturation of dsRNA was performed at 95 • C for five minutes and first strand synthesis was carried out at 50 • C for two hours. The generated cDNA was later purified using an MSB ® Spin PCRapace purification kit (Stratec, Invitek molecular, Berlin, Germany).

DNA Library Preparation and Illumina ® Sequencing
A Nextera ® XT DNA library preparation kit (Illumina ® , San Diego, CA, USA) was utilized to construct a DNA library according to the manufacturer's instructions. This process involved fragmenting DNA and subsequent addition of dual barcodes to the DNA fragments. Agencourt AMPure magnetic beads (Beckman Coulter, Indianapolis, Indiana, USA) were used to purify the barcoded libraries while simultaneously selecting for an average insert of 300 bp (range 200-400 bp). Subsequently, the libraries were validated and quantified prior to sequencing on a 2100 Bioanalyzer platform (Agilent Technologies, Santa Clara, CA, USA) and the Qubit ™ 3.0 fluorometer (Invitrogen, Carlsbad, CA, USA), respectively. The validated and quantified libraries were pooled and whole genome sequencing was performed on an Illumina ® MiSeq platform using a V3 MiSeq reagent kit (Illumina ® , San Diego, CA, USA) for 600 cycles to generate 2 × 301 bp paired reads. A 5% PhiX DNA control spike-in was used.

Genome Assembly
Sequence reads obtained from the Illumina ® MiSeq platform in FASTQ format were first trimmed and subsequently assembled using Geneious ® Prime 2019.2.1 (https://www. geneious.com/; accessed 5 April 2021) [38]. Genome assembly comprised both reference mapping as well as de novo assembly.

Identification of Genotype Constellations
The genotype of each of the 11 genome segments of the four Zambian RVA strains were identified on the Virus Pathogen Resource (ViPR), an online bioinformatics database and analysis resource for virological research [39]. The Basic Local Alignment Search Tool (BLAST) was also utilized as a complementary tool for genotype identification [40].

Phylogenetic Analysis
Reference sequences were compiled using BLAST as well as the Virus Variation Resource hosted by the National Centre for Biotechnology Information (NCBI) [40,41]. Multiple alignments were made for each gene using the MAFFT plugin in Geneious ® Prime version 2019.2.1 (https://www.geneious.com/; accessed 5 April 2021) and MUSCLE algorithm that is present in MEGA 6 [38,42,43]. Pairwise nucleotide and amino acid sequence identity matrices were calculated using the p-distance algorithm in MEGA 6 [43]. A maximum likelihood tree was constructed for each genome segment. Substitution models that best fit the data were selected based on corrected Akaike Information Criterion (AICc) in MEGA 6 [44]. The models used in this study were: GTR+G+I (VP1), TN93+G (VP2), GTR+I (VP3 and NSP1), T92+G+I (VP4 and VP7), T92+G (VP6, NSP2, NSP4, and NSP5), and TN93+I (NSP3). Branch support was estimated with 1000 bootstrap replicates [45].

Protein Modelling
Protein modelling was performed using the SWISS MODEL online server (SWISS-MODEL (expasy.org)) [46,47]. The RVA spike protein databank structure, 2dwr.1, was selected from the SWISS MODEL template library and had an X-ray diffraction resolution value of 2.50 Å. The stereochemical quality of the protein structure was assessed using the Structure Assessment feature in SWISS MODEL. The protein structure was modified and visualised using PyMOL (http://www.pymol.org/; accessed 5 April 2021) [48].

Genotyping Based on Whole Genome Constellations
Following Illumina ® MiSeq sequencing, complete or nearly complete nucleotide sequences for each of the 11 genes of the four study strains were obtained. The contig lengths and number of reads after assembly are shown in Table 2  The Wa-like genogroup is represented in green, while the DS-1-like genogroup is represented in red.

Phylogenetic and Sequence Analysis
To understand the genetic relationship of the four Zambian strains with global stains, a phylogenetic tree was resolved for each of the 11 gene segments. For the designation of lineages in the VP7, VP4, and VP1 trees, closely related strains as well as strains on the respective lineages, were selected from the GenBank using previously published articles as reference [49][50][51][52][53][54].

Phylogenetic Analysis of the VP7 Genes (G1 and G2)
Reference RVA strains utilized in this analysis segregated into the known seven G1 lineages and five G2 lineages [51,52]. A multiple sequence alignment and phylogenetic analysis of the VP7 genes of the four study strains showed that the Zambian G1 strains

Comparison of the VP4 Antigenic Epitopes of Zambian G2P[8] to Rotarix ®
The pattern of aa substitution occurring in P [8] strains in each lineage, including the phylogenetically distinct Zambian strain UFS-NGS-MRC-DPRU4749 that was seen to be phylogenetically distinct, was analyzed relative to that of Rotarix ® . It was observed that there were 26 fully conserved aa residues. Overall, most of the aa changes in the P[8]  indicate amino acid changes in the residues that have been shown to escape neutralisation with monoclonal antibodies. The normal dots (.) represent conserved amino acids relative to Rotarix ® .
Comparison of the divergent Zambian P [8] strain against Rotarix ® showed 30 identical aa residues spanning the VP4 antigenic epitopes (Figure 3). Seven aa changes, E150D, N195G, N113D, V115A, S125N, S131R, and N135D, were seen in the study strain relative to Rotarix ® (Figure 3). These changes were located on the surface of the protein structure ( Figure 4). Analysis of the Zambian P [8] strain relative to two selected strains of the most common lineage, lineage P [8] III [55], identified two aa differences (D113N and A115V). Doan et al. [52] established five lineages for global R2 strains. More recently, Agbemabiese et al. [49] proposed 14 lineages for R2 strains which included human and animal RVA strains. Based on this, one of the two DS-1-like Zambian strains, UFS-NGS-MRC-DPRU13327, clustered in lineage R2 V that mainly comprised of African strains ( Figure  5). This strain displayed maximum nt (aa) identities of 99.4% (99.7%) with strains from Zimbabwe and Mozambique (Supplementary data 1). In the VP1 phylogenetic tree, a cluster of strains within the R2 genotype could not be classified under any lineage according to the established designations [49,52] and were therefore named "undefined".
Strain UFS-NGS-MRC-DPRU4749 clustered independently ( Figure 5), and shared the highest similarity to RVA/Human-wt/IND/NIV1416591/2014/G9P [4] that clustered in Lineage R2 V, with nt (aa) identities of 93% (96.6%) (Supplementary data 1). Doan et al. [52] established five lineages for global R2 strains. More recently, Agbemabiese et al. [49] proposed 14 lineages for R2 strains which included human and animal RVA strains. Based on this, one of the two DS-1-like Zambian strains, UFS-NGS-MRC-DPRU13327, clustered in lineage R2 V that mainly comprised of African strains ( Figure 5). This strain displayed maximum nt (aa) identities of 99.4% (99.7%) with strains from Zimbabwe and Mozambique (Supplementary data 1). In the VP1 phylogenetic tree, a cluster of strains within the R2 genotype could not be classified under any lineage according to the established designations [49,52] and were therefore named "undefined".

Discussion
The present study reported on four intergenogroup reassortant strains in Zambia.
Whole genome sequencing and analyses demonstrated that the four study strains possessed mixed genotypes in at least one gene segment within the constellation between Wa-like and DS-1-like genogroups, hence were considered intergenogroup reassortant strains. Such reassortant strains have been detected in countries such as Germany, Japan, Lebanon, Malawi, Rwanda, Senegal, South Africa, and Zimbabwe [28,30,34,[59][60][61][62][63]. A key observation was made regarding strain UFS-NGS-MRC-DPRU4749. This strain was seen to be phylogenetically distinct in the VP4 gene, as it did not cluster into any of the already defined P [8] lineages [51]. VP4 analysis on BLAST and ViPR showed that UFS-NGS-MRC-DPRU4749 possessed a nucleotide variance of almost 10% to the closest strain. The same observation was made in the VP1 gene, whereby the Zambian strain clustered distinctly from other established R2 lineages proposed by Doan et al. [52] and Agbemabiese et al. [49], with a nucleotide variance of 7% to the closest strain. Further, the divergent Zambian strain was supported by bootstrap values of 88% and 79% at the branching node in the VP4 and VP1 phylogenetic trees, respectively. The large genetic distance to other global strains on both nt and aa level concurred with the distinct clustering seen in the VP4 and VP1 phylogenetic trees, thus strain UFS-NGS-MRC-DPRU4749 can be considered as a divergent strain.
The VP4 spike protein is proteolytically cleaved into VP8* and VP5* subunits by trypsin-like proteases present in the gastrointestinal tract of a host, which in the process activates the rotavirus particle [64,65]. The VP5* enables the penetration of the virus by permeabilizing lipid vesicles during infection, while the VP8* is thought to mediate attachment to the host [66][67][68]. Four (8-1 to 8-4) and five (5-1 to 5-5) epitopes are contained in the VP8* and VP5* subunits, respectively, which are targets for neutralizing monoclonal antibodies [55]. Neutralizing antibodies that target the VP8* neutralize infectivity of the virus by inhibiting attachment, while those directed against VP5* are thought to block membrane penetration [69,70]. The VP4 is involved in several important structural and functional roles such as attachment, penetration, and particle maturation. Due to this, the genetic variability is more restricted in human VP4 RVA as compared to the VP7 [7,58,70]. This characteristic is exploited by the current vaccines, Rotarix ® which contains a single human G1P [8] and RotaTeq ® that contains G1-G4 and a P [8] genotype [71]. Therefore, while the higher genetic variability in the VP7 may compromise immunity induced by vaccines, the VP4 component of vaccines may compensate when a human is infected with a P [8] strain. In agreement with the observation of low genetic variability in VP4, around 70% (26/37) of the aa residues belonging to the global human P [8] RVA strains, including Zambian strain UFS-NGS-MRC-DPRU4749, were fully conserved when compared to the Rotarix ® vaccine strain.
Accumulation of point mutations, along with reassortment and other mechanisms of rotavirus evolution, is a key mechanism that generates genetic diversity in RVA over time [11][12][13][14][15]. Seven aa substitutions were identified in the VP8* (8-1 and 8-3) region when the study strain UFS-NGS-MRC-DPRU4749 was compared against Rotarix ® . Of the seven, four were seen to be radical in nature (N195G, N113D, S131R, and N135D). With respect to the nature of aa, the N195G substitution resulted in a change in polarity (polar to non-polar) whereas N113D, S131R, and N135D resulted in a change in charge (polar neutral to acidic polar negative, polar neutral to basic polar positive, and polar neutral to acidic polar negative, respectively) [72]. One peculiar aa difference was the V115A which occurred only in the study strain. This mutation is considered conservative because the charge and polarity of the aa remained unchanged. It is therefore unlikely that this change would affect protein structure and hydrophobicity [72,73]. The impact of such a change on rotavirus transmission and vaccine effectiveness remains to be determined.

Conclusions
This study lends credence to reassortment being a major evolutionary mechanism in RVA. Since the other three Zambian strains were also collected during the post-vaccine period, the discovery of the phylogenetically and genetically divergent Zambian G2P [8] strain was unexpected. Given that this strain was identified in an unvaccinated child, it remains unclear whether the aa mutations present in the VP4 gene would have a negative impact on the effectiveness of the vaccine. Continuous surveillance of circulating RVA, along with whole genome sequencing and analysis is therefore critical in monitoring the impact of such reassortant strains on children, as well as their impact on effectiveness of current vaccine products.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/v13091872/s1, Supplementary data 1: Identity matrices (nucleotide and amino acid) among strains for the VP7, VP4, VP6, VP1, VP2, VP3, NSP1, NSP2, NSP3, NSP4 and NSP5 calculated using the p-distance algorithm in MEGA 6. Supplementary data 2: Additional phylograms containing Figure S1 (VP6), Figure S2 (VP2), Figure S3 (VP3), Figure S4  Informed Consent Statement: Consent was waived for patients, as the diarrhoeal stool samples were collected as part of the diagnostic clinical samples when parents took their child to the hospital for clinical management, which did not require written informed consent.

Data Availability Statement:
The authors confirm that all data relevant to this study is contained in the article and the supplementary materials. Additionally, sequences generated in this study were deposited in the GenBank under accession numbers MZ027412-MZ027455.