Molecular Evolution of GII.P31/GII.4_Sydney_2012 Norovirus over a Decade in a Clinic in Japan

Norovirus (NoV) genogroup II, polymerase type P31, capsid genotype 4, Sydney_2012 variant (GII.P31/GII.4_Sydney_2012) has been circulating at high levels for over a decade, raising the question of whether this strain is undergoing molecular alterations without demonstrating a substantial phylogenetic difference. Here, we applied next-generation sequencing to learn more about the genetic diversity of 14 GII.P31/GII.4_Sydney_2012 strains that caused epidemics in a specific region of Japan, with 12 from Kyoto and 2 from Shizuoka, between 2012 and 2022, with an emphasis on amino acid (aa) differences in all three ORFs. We found numerous notable aa alterations in antigenic locations in the capsid region (ORF2) as well as in other ORFs. In all three ORFs, earlier strains (2013–2016) remained phylogenetically distinct from later strains (2019–2022). This research is expected to shed light on the evolutionary properties of dominating GII.P31/GII.4_Sydney_2012 strains, which could provide useful information for viral diarrhea prevention and treatment.


Introduction
Norovirus (NoV) has remained the leading cause of acute gastroenteritis (AGE) in people of all ages for several decades [1]. In particular, after the introduction of rotavirus (RV) vaccines, NoV has become the primary cause of AGE in children in many countries [2]. Every year, NoV is estimated to cause 699 million illnesses and 219,000 deaths worldwide [1,3]. This small, non-enveloped, single-stranded, positive-sense RNA virus of the Caliciviridae family demonstrates extensive genetic diversity [4]. Its ~7.5 kb RNA genome is organized into three open reading frames, ORF1, ORF2, and ORF3, encoding the nonstructural proteins (e.g., VPg, protease, and polymerase), the major capsid protein (VP1), and the minor capsid protein (VP2), respectively [5]. Of these, VP1 possesses the immunodominant antigenic sites: the hypervariable P2 subdomain induces the majority of the blocking antibody responses, whereas antibodies against the less variable P1 and shell domains are more cross-reactive and do not block [6]. Based on the variations in VP1 amino acid (aa) sequences, NoVs are classified into at least 10 genogroups (GI-GX) and 49 genotypes [7]. Among these, genogroup II genotype 4 (NoV GII.4) has been the most common since the mid-1990s, accounting for 62-80% of all NoV outbreaks worldwide over the last two decades [8]. The predominance of NoV GII.4 is associated with the chronological emergence, at 2-3-year intervals, of phylogenetically distinct variants that are antigenically different due to aa differences at antigenic sites (A-I) located on the outermost surface of VP1, allowing escape from immunity induced by previous infections [9]. Since the mid-1990s, GII.4 has caused six major pandemics, including Grimsby_1995, Farmington Hills_2002, Hunter_2004, Den Haag_2006b, New Orleans_2009, and Sydney_2012, and many epidemics, such as Lanzhou_2002, Sakai_2003, Yerseke_2006a, Osaka_2007, Apeldoorn_2007, and HongKong_2019 [9]. Among these, the Sydney_2012 variant appeared with a completely distinct set of nonstructural proteins, including the P31 polymerase (once known as Pe), soon outcompeted all others, and still causes the maximum number of cases [10]. This variant further acquired a new P16 polymerase, most likely through recombination with the GII.2 [P16] viruses, and produced GII.P16/GII.4_Sydney_2012 strains, which swiftly came to predominate worldwide alongside GII.P31/GII.4_Sydney_2012 from 2015 onward [11].
Although most GII.4 variants circulated for 2 to 4 years, Grimsby_1995 predominated for more than 8 years, and GII.P31/GII.4_Sydney_2012 has been dominant for more than a decade [9]. Several investigations have focused on the key antigenic sites, namely A-G, and have shown that aa alterations in these epitopes are critical for escaping the immune response, resulting in global epidemics, as seen in the GII.4_Sydney_2012 lineage [12]. However, advances in genome sequencing technologies over the last decade have revealed several changes in the major antigenic sites of VP1, even though the overall antigenicity of Sydney_2012 has remained very similar for over a decade. This suggests that the prevalence of GII.4 is not simply due to the antigenic diversity of the capsid protein; nonetheless, comprehensive sequencing is required to understand this further [9]. Several studies have focused on the whole genome sequences of GII.4_Sydney_2012, but only a handful have examined the evolutionary trend of GII.4_Sydney_2012 outbreak strains over time in a particular region. In this study, we analyzed 14 full genomes of GII.P31/GII.4_Sydney_2012 strains collected during epidemics from 2012 to 2022, mostly in Kyoto, Japan, to better understand the evolutionary characteristics that lead to molecular changes over time, which may aid in the development of effective vaccines against this disease.

Sample Characteristics
To gain insights into the evolutionary trend of globally dominating NoV strains, 14 NoV GII.P31/GII.4_Sydney_2012-positive stool samples were selected, of which 12 were collected from a single pediatric outpatient clinic in Kyoto prefecture between 2012 and 2022, while the remaining 2 were collected from Shizuoka prefecture in 2022. Full-length genome sequences (~7500 nt) were obtained from all 14 samples, of which 13 were outbreak strains (Table 1). A strain was considered an outbreak strain if several children (eight or more) in the clinic were found to be sick with NoV GII.P31/GII.4_Sydney_2012 at the same time. Notably, the outbreak strains used in this study were primarily selected from three major outbreaks in Kyoto, each of which lasted for several months. To comprehend the genetic alterations during a single epidemic, several samples from both adjacent and distant time points were chosen from each epidemic. The children were aged from 7 to 40 months. Co-infections with classic Astrovirus 1 (AstV1) were detected in four children in both Kyoto and Shizuoka in 2021-2022.

Phylogenetic Analysis and Nucleotide (nt) Identities of Individual ORFs
All three phylogenetic analyses of individual ORFs, including ORF1 (Figure 1A), ORF2 (Figure 1B), and ORF3 (Figure 1C), showed that all the analyzed strains belonged to the GII.P31/GII.4_Sydney_2012 genotype and were less closely related to other major GII.4 variants such as Farmington_Hills_2002, Hunter_2004, Yerseke_2006a, Den Haag_2006b, and New Orleans_2009. Importantly, in all three ORFs, the analyzed strains were divided into two clusters within the GII.P31/GII.4_Sydney_2012 lineage. Namely, the earlier strains from 2013 to 2016 grouped with reference strains from a similar period in cluster I, while the later strains from 2019 to 2022 and their associates grouped into cluster II, suggesting that the genetic make-up of all three ORFs gradually changed over time. Interestingly, although the studied strains belonged to the GII.P31/GII.4_Sydney_2012 genotype, none of them demonstrated a close association with the original GII.P31/GII.4_Sydney_2012 prototype (JX459908); rather, they were more closely related to AB972502, another GII.P31/GII.4_Sydney_2012 strain, isolated in 2011 in Niigata (Table 2). For earlier strains, the average nt identities in ORF1, ORF2, and ORF3 were 98.9%, 98.4%, and 98.2% with the JX459908 strain and 99.5%, 99.0%, and 99.3% with the AB972502 strain, respectively. For later strains, these nt identities decreased by a similar margin (0.6-1.1%) relative to both reference strains. Together, our data reveal that our strains were genetically closer to the Japanese Sydney_2012 variant (AB972502) than to the original Sydney_2012 prototype (JX459908) and that additional genetic evolution occurred in later strains.
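As an illustration of how average nt identities such as those above can be derived, the following minimal sketch computes the pairwise percent identity between pre-aligned sequences and averages it over a group of strains. The function names and the toy sequences are hypothetical, and skipping gap columns is an assumption; the software and gap handling actually used for Table 2 are not stated here.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def mean_identity(reference: str, strains: list[str]) -> float:
    """Average percent identity of several aligned strains to one reference."""
    return sum(percent_identity(reference, s) for s in strains) / len(strains)


if __name__ == "__main__":
    # Toy sequences for illustration only; not data from this study.
    reference = "ATGAAGATGGCGTCGAATGA"
    early_group = ["ATGAAGATGGCGTCGAATGA", "ATGAAGATGGCATCGAATGA"]
    print(f"{mean_identity(reference, early_group):.1f}%")
```

In practice, each ORF alignment would be read from a FASTA file and the function applied once per reference strain (for example, JX459908 and AB972502) and per group of strains.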

Discussion
This study aimed to investigate the evolutionary changes in the genetics of the globally predominant GII.P31/GII.4_Sydney_2012 variant in a single community from 2012 to 2022. This GII.P31/GII.4_Sydney_2012 variant grew and predominated mostly after the introduction of RV vaccines. In fact, RV vaccines continued to reduce disease severity in RV illness [14][15][16][17] but failed to regulate the diversity in RV genotypes [18][19][20][21][22] and the increasing trends of other diarrheal viruses including NoV [23]. However, we need to investigate carefully how viral gastroenteritis will change after the COVID-19 pandemic and the spread of RV vaccination. AGE viruses persisted in substantial numbers in environmental samples even throughout the COVID-19 pandemic [24][25][26][27]. Therefore, to control the overall burden of childhood diarrhea, it is important to examine the evolutionary changes in this dominant strain along with the adoption of RV vaccines.
This GII.P31/GII.4_Sydney_2012 variant was involved in more than six outbreaks in the Kyoto region during this period. We chose samples from the three main outbreaks in Kyoto, each of which continued for several months, along with one non-epidemic strain (17378). We considered that genetic evolution during an outbreak may have contributed to extending its duration. For this reason, samples from both adjacent and distant time points were chosen from each outbreak. In addition, two strains from the Shizuoka outbreak of 2022 were investigated for comparison. The GII.4_Sydney_2012 genotype prevailed in the Kyoto region during this period. Although the RdRp genotype was not always examined, the recombinant GII.P16/GII.4_Sydney_2012 genotype was detected only rarely in Kyoto and not in any epidemic. Other NoV GII genotypes detected in Kyoto during the 2012-2021 period included GII.2, GII.3, GII.4 (2006b), GII.4 (2008a), GII.4 (2008b), GII.4 (2009), GII.6, GII.14, and GII.17. Among these, GII.3 and GII.17 were involved in outbreaks in Kyoto in 2014 and 2015, respectively.
Although GII.P31/GII.4_Sydney_2012 appeared as a pandemic variant in 2012 and was first reported in Australia (Sydney-NSW0514/2012/AU, accession JX459908) [28], it was also detected in 2010-2011 in several countries, including South Africa [29], Italy [30], the USA [31], and Japan [32,33]. The full genome sequence of a GII.P31/GII.4_Sydney_2012 strain (AB972502) isolated in Niigata, Japan, in 2011 exhibited 99.65%, 98.89%, and 98.13% nt identities in ORF1, ORF2, and ORF3, respectively, with those of the original JX459908 prototype. Interestingly, our strains were closer to AB972502 than to the original prototype, JX459908 (Table 2). A slight divergence of the JX459908 strain from other Sydney_2012 strains has also been noted in other studies [33]. All six strains detected between 2018 and 2022 clustered away from the earlier strains in all three ORFs (Figure 1). In fact, the strains of each epidemic season changed gradually over time, which is consistent with previous findings [33]. Interestingly, a similar pattern of phylogenetic distribution was observed for the studied strains in all three ORFs, suggesting that all three ORFs are prone to evolutionary changes.
Importantly, the aa substitutions included many functionally important positions. For instance, a substitution at residue 393 of ORF2 may affect HBGA recognition [32]. All of our earlier strains, detected before the 2018-2019 season, carried serine at residue 393, as did the AB972502/Niigata strain and many other strains, such as KJ649702/Hu/HKG/2014, KJ451059/2013/TW, LC005734/Hu/JP/2013, and AB933761/Osaka/2011/2009. However, later strains, detected from 2018-2019 onward, carried glycine at position 393, as did the GII.P31/GII.4_Sydney_2012 prototype (JX459908) as well as KM272334/KR/2012, LC133344/Osaka/2014/JP, KX354113/2014/USA, OM373200/2018/CHN, LC699533/Tokyo/2021, OK148516/Hu/GZ19/2018/CHN, and AB933699/Akita/2011/2006b. While single aa substitutions were very common, a few positions (such as 90 and 774 in ORF1; 340, 372, and 373 in the P2 subdomain of ORF2; and 146, 164, and 174 in ORF3) showed two or more aa substitutions and should be regarded as more vulnerable sites (Tables 3-5). Major aa changes were similar among strains from the same epidemic, in the non-epidemic strain 17378/Kyoto/Jan/2018-2019, and in the strains collected from Shizuoka (18958 and 18968/Shizuoka/2021-2022) in the same season. Figure 1 shows that all three ORFs of the Kyoto strains (18792, 18794, 18821) and Shizuoka strains (18958, 18968) of the 2021-2022 epidemic, as well as the 17378 non-epidemic strain, fell in the same cluster II in phylogenetic analyses, although there were a few aa differences among these strains (Tables 3-5). In particular, the 17378 non-epidemic strain exhibited a few aa substitutions, at positions 67, 161, 842, and 1522 in ORF1, 119 in ORF2, and 13, 134, 137, 159, and 181 in ORF3, that were absent from all epidemic strains. Though the precise function of these aa substitutions is still unknown, it is plausible that these unusual aa mutations prevented this strain from becoming fit enough to start an epidemic.
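The kind of position-by-position comparison described above can be sketched as follows. This is a minimal illustration, assuming equal-length, ungapped aa sequences so that string index plus one equals the residue number; the function names and toy peptides are hypothetical and do not come from this study.

```python
from collections import defaultdict


def aa_substitutions(reference: str, strain: str) -> list[str]:
    """List substitutions in the usual notation, e.g. 'S393G' (1-based positions)."""
    if len(reference) != len(strain):
        raise ValueError("sequences must have the same length (ungapped alignment assumed)")
    return [
        f"{ref_aa}{pos}{strain_aa}"
        for pos, (ref_aa, strain_aa) in enumerate(zip(reference, strain), start=1)
        if ref_aa != strain_aa
    ]


def variable_sites(reference: str, strains: list[str]) -> dict[int, list[str]]:
    """Map each substituted position to the sorted set of non-reference residues seen there."""
    variants: dict[int, set[str]] = defaultdict(set)
    for strain in strains:
        if len(strain) != len(reference):
            raise ValueError("sequences must have the same length (ungapped alignment assumed)")
        for pos, (ref_aa, strain_aa) in enumerate(zip(reference, strain), start=1):
            if ref_aa != strain_aa:
                variants[pos].add(strain_aa)
    return {pos: sorted(residues) for pos, residues in sorted(variants.items())}


if __name__ == "__main__":
    # Toy peptides for illustration only.
    reference = "MKSTG"
    strains = ["MKSTG", "MRSTG", "MQSTA"]
    print(aa_substitutions(reference, strains[2]))
    # Positions with two or more distinct replacement residues
    multi = {p: r for p, r in variable_sites(reference, strains).items() if len(r) >= 2}
    print(multi)
```

Filtering for positions with two or more distinct replacement residues mirrors the notion of "more vulnerable sites" used in the text, under the assumptions stated above.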
Finally, this study presented the evolutionary changes at different positions in GII.P31/GII.4_Sydney_2012 outbreak strains. Several informative mutations were identified, but their role in the phenotype remains unknown. The main shortcomings of this study are the lack of antigenic testing to determine the role of the substituted aa and the small sample size, drawn nonrandomly from only three major epidemics. Nonetheless, this study presents all the mutations of the 14 strains and provides a general understanding of the genetic alterations of GII.P31/GII.4_Sydney_2012 strains within the same region and season as well as their variation over time. This information may be helpful in determining the significance of key mutations in the future.

Sample Selection and RNA Extraction
As a part of routine screening of diarrheal viruses, stool samples from children with AGE were collected under the approval of the ethical committees of the University of Tokyo (1139) and Nihon University (25-13-0, 29-9-0, 29-9-1). The samples were investigated for the genetic diversity of 11 AGE viruses: NoV GII and 10 other enteric viruses, namely rotavirus (RV) A, B, and C, NoV GI, sapovirus (SaV), adenovirus (AdV), human astrovirus (AstV), human parechovirus (HPeV), enterovirus (EV), and Aichi virus. These viruses were detected using four sets of primers (A, B, C, and D) in four different multiplex RT-PCR reactions, as described previously [34]. NoV GII was further subjected to polymerase-capsid dual typing by means of sequence analysis of the polymerase-capsid junction region, as described earlier [35].
Finally, 14 NoV GII.P31/GII.4_Sydney_2012-positive stool samples were selected between 2012 and 2022: 12 from Kyoto and 2 from Shizuoka prefecture pediatric outpatient clinics. For next-generation sequencing (NGS), RNA was extracted from the 10% fecal suspensions using the QIAamp Viral RNA mini kit (QIAGEN, Hilden, Germany) following the manufacturer's instructions without using carrier RNA.

Next-Generation Sequencing (NGS)
The extracted RNA was subjected to Illumina MiSeq NGS as described previously [36]. In brief, a 200 bp fragment library ligated with bar-coded adapters was constructed for the 14 NoV GII strains using an NEBNext Ultra RNA Library Prep Kit for Illumina v 1.2 (New England Biolabs, Ipswich, MA, USA) according to the manufacturer's instructions. The cDNA library was purified using Agencourt AMPure XP magnetic beads (Beckman Coulter, Brea, CA, USA). After evaluating the quality and quantity of the purified cDNA library, 151-cycle paired-end nucleotide sequencing was performed using a MiSeq Reagent Kit v2 (Illumina Inc., San Diego, CA, USA). MiSeq sequence data were analyzed using CLC Genomics Workbench v8.0.1 (CLC Bio, Aarhus, Denmark). Contigs were assembled from the trimmed sequence data using de novo assembly. Using the assembled contigs as query sequences, the Basic Local Alignment Search Tool (BLAST) non-redundant nucleotide database was used to determine which contigs represented full-length nucleotide sequences for each gene segment of the 14 NoV strains. To further refine the contigs, sequence reads for each gene segment were mapped back to the assembled contig. The nucleotide sequences of the strains were translated into aa sequences using GENETYX v11 (GENETYX, Tokyo, Japan).

Phylogenetic Analysis
Sequences were segmented into individual ORFs based on the ORF sequences of the Sydney_2012 prototype (JX459908). Reference sequences were obtained from the NCBI GenBank database (https://blast.ncbi.nlm.nih.gov/Blast.cgi), accessed on 31 January 2024. After multiple sequence alignment, phylogenetic trees were constructed in MEGA7 using the neighbor-joining method with the Kimura 2-parameter model, and statistical significance was tested with 1000 bootstrap replicates. The deduced aa sequences were determined using BioEdit v7.2.5 software. ORF1 was segmented into p48, NTPase, p22, VPg, protease, and RdRp based on their genetic mapping in the strain MK934772. ORF2 was segmented into the NTA, S, P1, and P2 domains as described earlier [5]. The variability of the deduced aa sequences was determined using BioEdit v7.2.5 software.
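For readers unfamiliar with the Kimura 2-parameter model named above, the pairwise distance it defines can be sketched as follows. With P the proportion of transitional differences and Q the proportion of transversional differences between two aligned sequences, the distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The function below is an illustrative re-implementation, not the MEGA7 code; ignoring columns that contain gaps or ambiguous bases is an assumption and may differ from the gap treatment selected in MEGA7.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    a_norm = seq_a.upper().replace("U", "T")
    b_norm = seq_b.upper().replace("U", "T")
    for a, b in zip(a_norm, b_norm):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable positions")
    p = transitions / sites
    q = transversions / sites
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0.0 or w2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    # Toy sequences: one transition (A->G) in ten sites.
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

A matrix of such pairwise distances is the input to the neighbor-joining algorithm; bootstrap support is then obtained by resampling alignment columns with replacement and rebuilding the tree for each replicate.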

Nucleotide Accession Number
The whole genome sequences of the 14 GII.P31/GII.4_Sydney_2012 strains determined in this study were deposited in the GenBank database; the accession numbers are shown in Table 1.

Conclusions
This study demonstrated the genetic evolution of GII.P31/GII.4_Sydney_2012 strains that caused epidemics in a single region at different times. By enabling researchers to gain a better understanding of gene activity and the mechanisms underlying reinfection, these data will be helpful in the prevention of infection and the development of effective vaccines. Evolutionary analyses of this globally dominant genotype will need to continue in the future.

Figure 1. Phylogenetic analysis of ORF1 (A), ORF2 (B), and ORF3 (C). These trees were constructed by means of the neighbor-joining method with the Kimura 2-parameter nucleotide substitution model. The statistical significance was tested using 1000 bootstrap replicates, and values ≥ 80% are shown at the branch nodes. The strains detected in this study are shown in bold with underlining. Asterisks represent the prototypes.


Table 1. Whole genome sequences of the 14 GII.P31/GII.4_Sydney_2012 strains determined in this study and deposited in the GenBank database, with their accession numbers.