Phylogenetic Investigation of Norovirus Transmission between Humans and Animals

Norovirus infections are a leading cause of acute gastroenteritis worldwide, affecting people of all ages. There are 10 norovirus genogroups (GI-GX) that infect humans and animals in a host-specific manner. New variants and genotypes frequently emerge, and their origin is not well understood. One hypothesis is that new human infections may be seeded from an animal reservoir, as human noroviruses have occasionally been detected in animal species. The majority of these sequences were identified as older GII.4 variants, but a variety of other GIIs and GIs have been detected as well. All of these sequences share at least 94% nt similarity with human strains, and most are >98% identical. These strains were detected in animals only after human surveillance had shown them to be circulating in humans, which suggests human-to-animal transmission.


Introduction
Noroviruses are an important cause of gastroenteritis in humans and animals [1]. Their genome is 7.5 kb in length and organized in three open reading frames (ORF1-3) [2]. ORF1 encodes a polyprotein that is enzymatically cleaved by the viral protease into six proteins, including RNA-dependent RNA polymerase (RdRp). ORF2 and ORF3 encode the major and minor capsid proteins (VP1 and VP2), which make up the virus capsid. VP1 is composed of the conserved shell-domain and the protruding (p)-domain, which contains the receptor binding sites that recognize histo-blood group antigens (HBGAs), and the antigenic sites [3,4]. Based on phylogenetic analysis of VP1 sequences, 10 genogroups have been identified (GI-GX), which are further divided into 49 genotypes, of which some include several variants [5]. Viruses within genogroups I, II, IV, VIII, and IX infect humans, with GI and GII being the most commonly detected genogroups. Viruses from the other genogroups have been found in a broad range of animals including cattle and sheep (GIII), cats and dogs (GIV, GVI, and GVII), rodents (GV), bats (GX), and harbor porpoises (GNA1). Despite this large number of genotypes, viruses within GII.4 are most commonly detected in humans and are responsible for the majority of outbreaks [6][7][8]. Norovirus diversity is additionally increased by recombination events between ORF1 and ORF2, resulting in new strains. New variants, genotypes, and recombinants frequently emerge in the human population, yet their origin is unknown. One hypothesis is that they originate from an animal reservoir. We have previously systematically reviewed serological evidence of transmission between animals and humans and described that more evidence exists for human-to-animal transmission than vice versa [9]. However, given the presence of host-specific noroviruses, the possibility of serological reactivity due to the presence of cross-reactive antibodies cannot be excluded.
More conclusive evidence can be gained from virological testing, and although viral RNA of animal strains has not been detected in humans, viral RNA of human GI and GII strains has been detected in fecal material of calves, pigs, birds, captive macaques, dogs, and rodents (reviewed in reference [9]). Most of these animals are also susceptible to human noroviruses under experimental conditions [10]. This implies that animals could be a possible reservoir for human noroviruses. To explore this possibility and investigate the genetic relationship of human noroviruses detected in animals and humans, we have analyzed all human norovirus sequences that, to date, have been found in animal stool samples.

Phylogenetic Analyses
Published sequences of human noroviruses detected in animal feces were collected and searched against the entire GenBank nucleotide database using BLASTN. The 20 best hits were downloaded and typed using the Noronet typing tool [11]. BLAST hits that were identical to each other were excluded. Sequences from animal inoculation experiments were also excluded. For the phylogeny we used the BLAST hits as well as sequences of the respective genotypes and variants from the Noronet typing tool reference sequence set (https://www.rivm.nl/mpf/typingtool/norovirus). Alignments were made using MUSCLE [12]. Maximum likelihood trees were created with PhyML v3.0 [13] (http://www.atgc-montpellier.fr/phyml/), with the substitution model selected automatically by Smart Model Selection (SMS) [14] and 100 bootstrap replicates. The trees were visualized using FigTree v1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/).
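The exclusion of identical BLAST hits can be illustrated with a minimal Python sketch. It assumes the hits are already available as (accession, sequence) pairs ordered from best to worst hit, and it treats two hits as identical when their ungapped, case-normalized sequences match exactly. The function name, the input layout, and the toy records are illustrative assumptions and do not describe the procedure actually used.

```python
def drop_identical_hits(hits):
    """Keep the first record for each distinct sequence.

    `hits` is a list of (accession, sequence) tuples, assumed to be ordered
    from best to worst hit. Sequences are compared case-insensitively after
    removing alignment gap characters ("-").
    """
    seen = set()
    unique = []
    for accession, sequence in hits:
        key = sequence.upper().replace("-", "")
        if key not in seen:
            seen.add(key)
            unique.append((accession, sequence))
    return unique


# Toy records (hypothetical accessions and sequences, for illustration only).
example_hits = [
    ("HIT_A", "ATGAAGATGGCG"),
    ("HIT_B", "atgaagatggcg"),  # same sequence as HIT_A, lower case
    ("HIT_C", "ATGAAGATGGCA"),
]
unique_hits = drop_identical_hits(example_hits)
```

Keeping the first occurrence preserves the best-scoring representative of each distinct sequence when the input is sorted by hit rank.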

BEAST Analyses
The GII.7 and GII.17 sequences were the only ones for which the whole VP1 was available and which contained nonsynonymous mutations compared to the most closely related human strains. Therefore, these were used in the BEAST analysis. All complete or near complete GII.7 and GII.17 VP1 sequences were downloaded from GenBank and aligned separately with MUSCLE [12]. The temporal signal of each group of sequences was evaluated with TempEst v1.5.3 [15] and sequence outliers were removed from the final dataset. Bayesian phylogenetic trees based on complete VP1 sequences were inferred using BEAST v1.10.4 [16]. For GII.7 sequences, the final dataset included 29 sequences in the alignment (1560 bp). The general time reversible (GTR) substitution model was used with four gamma categories and three codon-position partitions, under an uncorrelated relaxed molecular clock. The tree prior was set to exponential growth with random sampling. The Markov chain Monte Carlo (MCMC) run was set to 50,000,000 generations to ensure convergence. For GII.17 sequences, the final dataset included 764 sequences in the alignment (1484 bp), corresponding to the period between 2013 and 2018 and belonging to the Kawasaki308 cluster. The HKY substitution model was used, and the population size was assumed to be constant throughout the evolutionary history. The MCMC run was set to 400,000,000 generations to ensure convergence. In both datasets, if the day of the collection date was missing, the day was set to the 15th of the given month. If both day and month were missing, the collection date was set to June 15 of the given year. Log files were analyzed in Tracer v1.7.1 to check that ESS values exceeded the threshold of 200 [17]. The maximum clade credibility tree was constructed with 10% burn-in of the trees using TreeAnnotator v1.10.4. Trees were annotated and visualized using FigTree v1.4.3. The reliability of the branches was supported by 95% highest posterior densities (HPDs).
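The rule for incomplete collection dates can be sketched in a few lines of Python. The sketch assumes dates arrive as ISO-style strings ("YYYY", "YYYY-MM", or "YYYY-MM-DD"), which is an assumption about the input format and not something stated above; the helper names are hypothetical. The conversion to a decimal year is included only because tip dates are often supplied in that form.

```python
from datetime import date


def impute_collection_date(raw):
    """Return a full date, filling missing parts as described in the text.

    Missing day           -> the 15th of the given month.
    Missing day and month -> June 15 of the given year.
    `raw` is assumed to be 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'.
    """
    parts = [int(p) for p in raw.strip().split("-")]
    if len(parts) == 3:
        year, month, day = parts
    elif len(parts) == 2:
        year, month = parts
        day = 15
    elif len(parts) == 1:
        year, month, day = parts[0], 6, 15
    else:
        raise ValueError(f"unrecognized date format: {raw!r}")
    return date(year, month, day)


def decimal_year(d):
    """Express a date as a fractional year, accounting for leap years."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days


full = impute_collection_date("2015-03-02")   # unchanged
no_day = impute_collection_date("2015-03")    # day set to the 15th
year_only = impute_collection_date("2015")    # set to June 15
```

Placing imputed dates near the middle of the unknown interval limits the maximum dating error to roughly half a month or half a year, respectively.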

Mapping of Amino Acid Changes onto 3D Structure
Amino acid changes of GII.7 and GII.17 that were unique to strains found in animals were mapped onto 3D p-particle structures using EzMol v2.1 [18]. The three-dimensional structure of the GII.7 strain was predicted by homology modelling using the SWISS-MODEL server (available at https://swissmodel.expasy.org/interactive) with default settings. The model was built on the basis of the crystal structure of the p-domain of a GII.17 strain (PDB number 5f4o.1). For GII.7 (KT943504 and KT943505) the predicted 3D p-domain structure was used, and for GII.17 (KX356908) the p-domain structure of GII.17 Kawasaki (5LKG) was used. The antigenic epitopes were inferred from those of GII.4 using multiple sequence alignment [17] and information on the HBGA binding site was taken from reference [19].
Viruses 2020, 12, 1287

Norovirus Strains, Closely Related to Human Noroviruses, Are Found in Animals
Published sequences of human noroviruses detected in animal feces were collected [7], and sample information is summarized in Table 1. Human noroviruses have been found in a variety of mostly asymptomatic animals, of which the domestic pig was the most common species. While three whole genomes have been sequenced (two GII.4 Sydney[P31] from dogs and one GII.17[P17] from a rhesus macaque), most published sequences are short, 200-300 bp in length, and cover the 5′ end of VP1, reflecting commonly used targets for diagnostic RT-PCR assays. Single sequences that cover different parts of ORF1 were not used for phylogenetic analysis but are listed in Table 1. Overall, the animal strains are very close or even identical to human strains, ranging from 94% to 100% nt identity. It is worth noting that none of these strains differed enough to be categorized as a new variant. Three sequences belonged to the GI genogroup and all others to GII, most of them to GII.4 (Figure 1). The isolation dates of these samples coincide with the end of the time period in which these strains were circulating in the human population (Table 1). Den Haag 2006b was most prevalent in the human population between 2006 and 2008 [21], but the collection dates of the animal samples fell between 2008 and 2009, with the exception of one RdRp sequence, which was collected in 2005. Two studies that included samples from symptomatic close-contact humans detected identical GII.4 sequences in dogs and their owners: the two full genome GII.4 Sydney sequences found in Thailand and an unassigned GII.4 in Finland [25,26]. Most studies, unfortunately, did not include samples of close-contact humans.
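As an illustration of how a nucleotide identity percentage of this kind can be computed, the following Python sketch compares two pre-aligned sequences of equal length and ignores columns in which either sequence has a gap. This is one common convention and is an assumption here; BLAST reports identity over its own local alignment, so values may differ slightly. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two aligned sequences of equal length.

    Columns containing a gap ("-") in either sequence are skipped, so the
    denominator is the number of columns where both sequences have a base.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy 10-nt fragments differing at one position (not real norovirus data).
identity = percent_identity("ATGCATGCAT", "ATGCATGCAA")
```

For fragments of 200-300 bp, a single mismatch changes the identity by only 0.3-0.5 percentage points, which is worth keeping in mind when interpreting values close to 100%.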
While GII.4 was the most commonly found genotype, other GII and GI noroviruses have been detected as well (Figures 2 and 3). Some strains matched the then-circulating strains in the human population, such as GII.3 and GII.17, of which the latter was one of the most prevalent genotypes in the period 2014-2015 [34]. Other strains are less frequently found in humans, and their discovery in animals was therefore more surprising. These include the GI genotypes as well as GII.1, GII.2, GII.12, and GII.14 viruses. The recent finding of a GI.1 virus, which was identical to the prototype strain first isolated in 1968, is unexpected. This specific GI.1 is not detected in humans anymore, but newer GI.1 variants are sporadically detected in humans and in sewage [7,35,36]. These findings raise the question of whether the less frequently detected GII and GI viruses continue to circulate undetected in humans and animals.

Molecular Clock Phylogeny of GII.7 and GII.17 Genotypes
To investigate the evolutionary relationship between noroviruses detected in humans and animals and to estimate how long ago they diverged, we conducted a BEAST analysis. Of the noroviruses found in animals, complete VP1 sequences were only available for viruses belonging to GII.4 (MK928498-99), GI.1 (KT943503), GI.6 (KC294198), GII.17 (KX356908), and GII.7 (KT943504/5). Of these, two GII.7 sequences and a GII.17 sequence (all found in rhesus macaques) were the only sequences with nonsynonymous mutations compared to the most closely related human strains. To determine the time to the most recent common ancestor (tMRCA) of the rhesus macaque-derived VP1 gene sequences and those found in humans, we performed separate BEAST analyses for these two genotypes. [...] (Figure S1). However, it should be noted that the tMRCA 95% HPD interval is large and does not necessarily predate the tMRCA of solely human GII.17 strains within the same clade.

Figure 2. Genetic characterization of non-GII.4 GII sequences found in animals. A maximum-likelihood tree based on 180 bp GII sequences was inferred with PhyML v3.0 software using the general time reversible nucleotide substitution model (GTR + G). Sequences that were found in animals (red) were aligned with most closely related human sequences (black) and the reference sequences from the Noronet typing tool (blue, black circle). The animal in which norovirus was found as well as the date and country of collection are indicated next to the sequence. The scale bar indicates nucleotide substitutions per site.

Animal GII.7 and GII.17 Sequences Contain Amino Acid Changes That Are Located either in or Adjacent to Antigenic Epitopes
Amino acid changes in the exposed protruding p-domain of the capsid can lead to differences in either HBGA binding or antigenic drift [24]. To identify whether the 13 and 4 amino acid changes found in GII.7 and GII.17 VP1 sequences from macaques are close to an antigenic epitope or receptor binding site, we mapped their location onto the predicted 3D GII.7 structure of the p-domain and a 3D GII.17 p-domain structure, respectively. The antigenic epitopes were predicted based on an alignment with GII.4 sequences. The GII.7 sequence had several amino acid changes that were located either within a predicted antigenic epitope or in close proximity (Figure 5A). Three changes were located directly in the predicted antigenic epitopes (Figure 5C, Supplementary Figure S2): N294S and G295V in epitope A, and N346I in epitope C. Another seven were in close proximity to predicted epitopes: E375G was situated right next to the HBGA binding site, N343G, V290I and I291T were adjacent to epitope C, and V404A, R401Q, and L446M were next to epitope D. Two changes, I478V and Y514H, were on the surface but distant from any epitopes, while T54N was located outside of the p-domain. Of the four changes found in the GII.17 sequence, N342S was the only one in proximity to epitope C (Figure 5B,C, Supplementary Figure S2). Y505H was on the surface but distant from any predicted epitopes. P280S and G282D are not surface exposed.
Thus, some of the observed mutations potentially affect HBGA-binding specificity and antigenicity.
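Substitution labels such as N294S combine the reference residue, the position, and the residue in the compared sequence. A minimal Python sketch of how such labels can be derived from two aligned, gap-free protein sequences is given below. The function name, the `start` parameter, and the toy peptides are assumptions for illustration; the peptides are invented and are not VP1 sequences.

```python
def list_substitutions(reference, query, start=1):
    """List amino acid changes between two aligned, gap-free sequences.

    Each change is written as <reference residue><position><query residue>.
    `start` is the position number assigned to the first residue, which
    allows a fragment to be numbered relative to the full-length protein.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must have the same length")
    changes = []
    for offset, (ref_aa, qry_aa) in enumerate(zip(reference, query)):
        if ref_aa != qry_aa:
            changes.append(f"{ref_aa}{start + offset}{qry_aa}")
    return changes


# Invented five-residue peptides, numbered from position 10.
changes = list_substitutions("MKNGA", "MKSVA", start=10)
```

The resulting labels can then be compared against lists of epitope or binding-site positions to flag changes that fall within or next to them.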

Discussion
Norovirus genome sequences that are very similar or even identical to those of human strains have been detected in animals all over the world. The timing of detection of human-like sequences in animals almost invariably coincides with the circulation of the matching variants in the human population, and most sequences were highly similar, indicating a recent spillover. This was especially visible for GII.4 viruses, which, in the human population, undergo epochal evolution leading to the emergence of antigenically distinct variants every few years that replace the previous viruses [25]. For the GII.4 viral genomes detected in animals, transmission from humans to animals seems the most plausible direction, as the GII.4 variants were circulating in humans before they were found in animals. This was also the case for two studies that analyzed human and animal virus sequences from the same household [18,19].
The epidemiology of non-GII.4 genotype noroviruses is distinct. Non-GII.4 viruses also have a global distribution, and cause sporadic infections and outbreaks [23], but do not evolve as rapidly as GII.4 variants and do not show the pattern of variant replacement [24]. Nevertheless, our analysis showed that most GII sequences found in animals were also very close or identical to human strains, arguing against long-term circulation in animals. It should, however, be noted that sequence information was often limited to very short fragments from conserved regions, which are commonly used as diagnostic targets. It is intriguing that the two longer sequences belonging to GII.7 and GII.17 that were available were the viruses with the most diverged nucleotide sequence compared to human variants. They were both found in captive macaques, but no information about humans or contaminated food from those centers was available. The BEAST analysis placed the most recent common ancestor shared with human isolates four and eight years before their detection in macaques, revealing a considerable temporal and genetic gap for these genotypes. For the GII.17[P17] strain detected in macaques, the tMRCA predated their detection in humans. This can be explained either by lack of knowledge about the GII.7 and GII.17 diversity in humans or by the undetected circulation of these genotypes in a non-human reservoir. GII.7, and to a lesser degree GII.17, had accumulated amino acid changes that were located in regions predicted to define antigenicity of norovirus, thereby possibly resulting in an adapted phenotype. The epitopes in GII.7 and GII.17 were inferred from those of GII.4. It should be noted that these have only been established as antigenic epitopes in GII.4 and not for any other genotype. However, comparison of capsid sequences indicates that GII.17 is evolving at previously defined GII.4 antibody epitopes [37].
In our analysis, the rhesus macaque GII.17 strain only had one mutation near the HBGA binding site compared to the most closely related strains detected in humans. Saliva binding studies using recombinant protein showed that the rhesus macaque GII.17 strain binds to human saliva samples with significantly lower binding signals than a similar human GII.17 strain with two mutations near the HBGA binding site [26]. Thus, animals can harbor human norovirus strains that potentially have antigenic and binding properties that differ from those detected in humans.
As the interface between wildlife and domesticated animals and humans is expanding, the risk of pathogens jumping the species barrier increases. While much of current virus research is focused on transmission from animals to humans, our results show that the reverse should not be neglected, as it might have consequences for pathogen dynamics in humans as well as in animals. How often human-to-animal transmission of norovirus occurs, and whether these are single events or human strains circulate continuously in some animal reservoir, needs to be further addressed. Given the prevalence of host-specific viruses in several of the animal species in which human norovirus sequences were detected, there is, at least in theory, the potential for recombination in case of dual infections. The question of whether human noroviruses in animals or recombinant human-animal norovirus genomes are transmitted back into the human population, and therefore have an impact on (re)-emergence of noroviruses, remains to be answered.