Whole Genome Sequence Analysis of Porcine Astroviruses Reveals Novel Genetically Diverse Genotypes Circulating in East African Smallholder Pig Farms

Astroviruses (AstVs) occur globally and are common causes of gastroenteritis in humans and animals. The genetic diversity and epidemiology of AstVs in Africa are not well known; hence, we aimed to genetically characterize astroviruses in asymptomatic smallholder piglets in East Africa. Twenty-four samples randomly selected from 446 piglets (<6 months old), initially collected for a rotavirus study, were sequenced for metagenomic analysis. Thirteen (13/24) samples had contigs with high identity to the genus Mamastrovirus. Analysis of 7 strains with complete (or near complete) genomes revealed variable nucleotide and amino acid sequence identities with known PoAstV strains. The U083 and K321 strains had nucleotide sequence similarities ranging from 66.4 to 75.4 % to the known PoAstV2 strains, the nucleotide sequence similarity of the U460 strain to known PoAstV3 strains ranged from 57.0 to 65.1 %, while the K062, K366, K451, and K456 strains showed nucleotide sequence similarities of 63.5 to 80 % to the known PoAstV4 strains. The low sequence identities (<90 %) indicate that novel genotypes of PoAstVs are circulating in the study area. Multiple recombination events were detected in our PoAstV4 strains, indicating that the genetic diversity observed in these strains may be due to recombination. Importantly, we identified potential candidate epitopes with conserved peptides in our PoAstV strains that could aid in the design of immunodiagnostic tools and subunit vaccines. Our data provide new insights into the genetic structure of porcine astroviruses in East Africa.


Porcine astroviruses (PoAstV) belong to the family Astroviridae, which consists of the Avastrovirus and Mamastrovirus genera [2]. Astroviruses (AstVs) have been reported in a wide variety of mammals and [...] protein (ORF2), respectively [13]. Genome analysis has shown that ORF1b is the most conserved region, while ORF2 is highly divergent due to selective pressure in this region. Recently, the Astrovirus Study Group of the International Committee on Taxonomy of Viruses (ICTV) (http://ictvonline.org/virusTaxonomy.asp) proposed a classification based on the amino acid sequence of ORF2 [15,16], under which 19 genotype species of Mamastrovirus (MAstV 1-19) have been identified. In pigs, the first porcine astrovirus (PoAstV) was identified by electron microscopy in 1980 [17]. To date, four more PoAstV types (PoAstV2-PoAstV5) have been identified [18-21]. Association of PoAstV with gastroenteritis has been reported [17,19,22,23]; however, PoAstVs have also been found in healthy pigs [21,24-28]. Despite the clinical and agricultural significance of astroviruses, they are among the least studied enteric RNA viruses [29], which could be due to the lack of small animal models and the small number of full genome sequences available for many AstV species. Additionally, there are few established culture systems for propagating AstVs [23,30]. Studies of genetic and antigenic diversity among PoAstV strains in a given location may aid the design of accurate diagnostic assays and vaccines, and hence improve disease prevention [25]. Currently, there are no PoAstV vaccines available commercially. In this study, the complete (or near complete) genomes of seven porcine astroviruses were analyzed and characterized, which revealed novel porcine astrovirus strains.
Additionally, we determined potential linear antigenic epitopes on the capsid region of our PoAstV strains with predicted high antigenicity, which could aid in the design of immunodiagnostic reagents and [...] composition through a BLAST comparison with the RefSeq complete viral genomes protein sequences database from NCBI.

Phylogenetic analysis:
We carried out multiple sequence alignment using the Clustal Omega (ClustalO) web server (http://www.ebi.ac.uk/Tools/msa/clustalo/) to evaluate the percent sequence similarity of our strains with other known AstV genomes at the nucleotide and amino acid levels. Phylogenetic analysis of our assembled genomes and deduced amino acid sequences was performed in MEGA X [1] after building alignments with the ClustalW algorithm, and phylogenetic trees were created using the neighbour-joining method [33] with 1000 bootstrap replicates.
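To make the bootstrap step concrete, the following is a minimal sketch of how bootstrap replicates of an alignment are usually generated: alignment columns are resampled with replacement, and a tree would then be built from each replicate. It is not the MEGA X implementation; the function names and the toy alignment are illustrative assumptions, and tree construction itself is left to the phylogenetics software.

```python
import random
from typing import Dict, List, Optional


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return one bootstrap replicate: columns drawn with replacement.

    The same column indices are used for every sequence, so each
    resampled column is a column of the original alignment.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    width = lengths.pop()
    columns = [rng.randrange(width) for _ in range(width)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_replicates(
    alignment: Dict[str, str],
    n_replicates: int = 1000,
    seed: Optional[int] = None,
) -> List[Dict[str, str]]:
    """Generate n_replicates resampled alignments (1000 in the text above)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    # Toy alignment with hypothetical sequences, for illustration only.
    toy = {"K062": "ACGTAC", "U083": "ACGAAC", "U460": "TCGAAG"}
    for replicate in bootstrap_replicates(toy, n_replicates=3, seed=1):
        print(replicate)
```

The bootstrap support of a branch is then the fraction of replicate trees in which that branch appears.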

Recombination analysis:
A nucleotide alignment was created using ClustalW on the full genome sequences of 28 mamastroviruses, including the seven porcine astroviruses from this study and 21 known astroviruses from swine, bovine, camel and human hosts. Using the Recombination Detection Program (RDP4, version 4.94), potential recombination patterns were screened with the RDP, GENECONV, MaxChi, Chimaera, SiScan, 3Seq and Bootscan methods, following the instruction manual [34], using the step-down correction for multiple comparisons and a p-value cutoff of 0.01. We checked regions of potential recombinant interest using the above methods. We considered a recombination event only when it involved at least one of our 7 newly identified strains and was supported by all the methods with a highest acceptable p-value of 0.05.

Linear antigen epitope prediction:
We examined potential linear antigen epitopes on the capsid protein of our PoAstV strains using the SVMTriP web-based software [35], which predicts linear antigen epitopes using a Support Vector Machine. Astroviruses use the capsid protein for attachment and entry into host cells; therefore, accurate prediction of antigenic epitopes in this gene could be useful in the design of immunodiagnostic assays and/or subunit vaccines. The epitopes predicted by SVMTriP were further analyzed with the IEDB Analysis Resource and the Immunomedicine Group tool, web-based programs that predict protein segments that are likely to be antigenic and to elicit an antibody response [36,37]. We examined the antigenicity of the predicted candidate epitopes using the VaxiJen v2.0 server [38]. This server uses auto cross covariance (ACC) transformation of selected protein sequences based on unique amino acid properties. Its models were derived from sets of 100 known antigens and 100 non-antigens.
The resulting models were tested by leave-one-out cross-validation and overall external validation, with prediction accuracy of up to 89%. Thereafter, we modelled the 3D structure of the capsid proteins using the SWISS-MODEL server, which is fully automated and provides the full stoichiometry and overall structure of the complex as inferred by homology modelling [39]. Conformational B-cell epitopes from the 3D models of our PoAstV proteins were predicted by ElliPro, a web tool that implements Thornton's method together with a residue clustering algorithm, the MODELLER program and the Jmol viewer [40].
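The ACC transformation mentioned above converts a protein of any length into a fixed-length numeric vector, which is what allows sequences of different lengths to be compared by a classifier. The sketch below shows the general form of the transformation only. The descriptor table is supplied by the caller, the toy two-letter alphabet in the example is a placeholder, and neither the descriptors nor the maximum lag are those used by VaxiJen.

```python
from typing import Dict, List, Sequence


def acc_transform(
    sequence: str,
    descriptors: Dict[str, Sequence[float]],
    max_lag: int,
) -> List[float]:
    """Auto cross covariance (ACC) transform of a sequence.

    For each lag l in 1..max_lag and each descriptor pair (j, k):
        ACC_jk(l) = sum_i z[i][j] * z[i + l][k] / (n - l)
    Terms with j == k are auto-covariances, terms with j != k are
    cross-covariances. The output has max_lag * d * d values, where
    d is the number of descriptors per residue.
    """
    n = len(sequence)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must be at least 1 and shorter than the sequence")
    z = [descriptors[residue] for residue in sequence]
    d = len(z[0])
    result: List[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                result.append(total / (n - lag))
    return result


if __name__ == "__main__":
    # Placeholder descriptors for a toy two-letter alphabet (not real amino acid scales).
    toy_descriptors = {"A": (1.0, 0.0), "B": (0.0, 1.0)}
    print(acc_transform("ABAB", toy_descriptors, max_lag=1))
```

Because the vector length depends only on the number of descriptors and the maximum lag, capsid sequences of different lengths map to vectors of the same size.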

Glycosylation analysis:
Glycosylation is a vital post-translational modification that influences protein folding, localization and trafficking, protein solubility, antigenicity, biological activity and half-life, as well as cell-cell interactions. We investigated the distribution of known and predicted N-glycosylation sites across the capsid protein of our PoAstV field strains using the NetNGlyc software [41]. This software uses artificial neural networks that examine the sequence context of Asn-X-Ser/Thr sequons (Asn = asparagine, Ser = serine, Thr = threonine and X = any amino acid except proline) and differentiates glycosylated sequences from non-glycosylated ones. The predictions are only shown for Asn-X-Ser/Thr sequons, since only asparagine residues within Asn-X-Ser/Thr (and in some cases, Asn-X-Cys) are N-glycosylated in vivo.

[...] Table 1. Our strains had a characteristic AstV genomic organization with three ORFs (ORF1a, ORF1b and ORF2), preceded by a 5' untranslated region (UTR) and ending with a 3' UTR, with a ribosomal slippage site between ORF1a and ORF1b. The 5' UTR of the K451 and U460 strains was not assembled. The strains had a frameshift heptamer (AAAAAAC) and a stem-loop structure near the 3' end of ORF1a. The presence of the heptamer indicates a ribosomal frameshift during translation to produce ORF1ab (the replicase polyprotein) [12]. The PoAstVs detected in this study showed the conserved tyrosine residue within the TEEEY motif in the putative viral protein genome-linked (VPg) at the 3' end of ORF1a (PoAstV3 contained SEEEY). Additionally, a classical YGDD motif was conserved in the middle of the predicted RNA-dependent RNA polymerase (RdRp) protein of all our strains [12,42,43]. All the strains also contained a conserved sequence located at the junction of the RdRp and capsid regions (UUUGGAGGGG(A/C)GGACCAAA(G/A)8/11AUGGC), which is a regulatory element used as a promoter for sgRNA transcription [12,16].
Finally, all our strains contained a trypsin-like peptidase domain in nonstructural protein 1a and an astrovirus capsid protein precursor domain. [...] the GenBank (63.5 to 80 %), while low identities (41 to 48 %) were noted with other AstV types (Table S1). The U083 and K321 strains had nucleotide sequence identities of 82 % with each other and 66.4 to 75.4 % with the known PoAstV2 sequences in GenBank, while the U460 strain showed nucleotide sequence similarities ranging from 57.0 to 65.5 % with the known PoAstV3 sequences. These data demonstrate a broad genetic divergence among the PoAstV strains circulating in the East Africa region, and we therefore expect significant serological differences among them [13,44]. Further analysis of the nucleotide and deduced amino acid sequences of the capsid region revealed significant variation among the East African field strains, with identities ranging from 51.7 to 77.8 % and 49 to 76.4 %, respectively, to other PoAstVs in the same group (Tables S2 and S3), suggesting that they are novel PoAstV subtypes, similar to reports in Japan [45]. We carried out phylogenetic analysis using the nucleotide and/or amino acid sequences of the complete genome and capsid region (ORF2) of the PoAstVs reported in this study and those available from GenBank, together with selected AstV sequences from other species, to establish genetic relatedness (Figures 1 and 2). In the phylogenetic trees constructed, the East African strains clustered with astroviruses of the PoAstV2 (U083 and K321), PoAstV3 (U460) and PoAstV4 (K456, K451, K366 and K062) lineages, indicating that they are similar to those strains.
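The percent identities reported above come from pairwise comparison of aligned sequences. As a rough sketch of that calculation, the function below counts matching positions over the columns where neither sequence has a gap. This denominator is an assumption made for illustration; alignment tools differ in how they treat gapped columns, so values may not match the Clustal Omega identity matrix exactly. The sequences in the example are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of the same alignment.

    Columns in which either sequence has a gap are skipped, so the
    denominator is the number of columns where both have a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments: 4 matches over 5 ungapped columns.
    print(percent_identity("ACGT-A", "ACCTTA"))
```

Applying the function to every pair of rows in an alignment yields an identity matrix of the kind summarized in Tables S1 to S3.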

The phylogenetic analysis showed that our PoAstV2 strains were closely related at the nucleotide level, while the PoAstV4 strains were very diverse, consistent with our comparative sequence analysis results. The sequence identity between our novel viruses was mostly greater in the RdRp region than in the NSP1a and capsid regions (data not shown). Capsid proteins are naturally under intense positive selective pressure from the host immune response [46], hence are likely to be more diverse, as shown in this study. According to the ICTV, the capsid protein encoded by ORF2 is used to distinguish genotypes and species of astroviruses. The ICTV describes an amino acid sequence divergence in the capsid gene product of <0.312 within and >0.378 between astrovirus species. Analysis of evolutionary divergence using the capsid region (ORF2) of our strains (Table S4) showed that the PoAstV2 strains were 0.512 divergent among themselves and 0.418 to 0.579 divergent from the known PoAstV2 strains in the same group. [...] have effects in vaccine design [47,48]. We further analyzed our strains for potential recombination, since our previous study identified multiple genotypes of PoAstV in the same pigs and/or the same farms [3,24]. To determine probable recombination events among our strains, we carried out recombination analysis [...] Based on the recombination analysis, we concluded that all four PoAstV4 strains may be of recombinant origin. Event 1, in which K062 is the recombinant, was predicted at the ORF1a-ORF1b overlap, whereas event 2 (K366 is the recombinant) was predicted to start at the ORF1b-ORF2 junction and cover almost the entire ORF2. Event 3 (K456 is the recombinant) was predicted at the 3' end of the capsid region (ORF2). These recombination patterns were supported by the RDP, GENECONV, MaxChi, Chimaera, SiScan, 3Seq, LARD, Phylpro and Bootscan programs, as shown in Table 2.
The potential recombination event 2 reported in this study supports the previous suggestion that the ORF1b/ORF2 junction region is prone to recombination in AstVs [49,50]. The co-existence of different PoAstVs within swine farms is entirely possible given the high prevalence of PoAstV in swine farms [3,12], and this may promote co-infection of one animal with more than one AstV strain at the same time, as observed in this study. Recombination between different strains can therefore occur. As an immune-escape mechanism, virus recombination may lead to the generation of novel virus strains against which the affected host has lower immunity than against the parent strains. [...] (Table S5). Therefore, to narrow down the potential linear antigenic epitope candidates that could be used as immunological targets, we further analyzed our sequences using the IEDB Analysis Resource and the Immunomedicine Group tool, which use different algorithms to predict antigenic epitopes. The antigenic property of the target sequences recognized from these epitopes was predicted with the VaxiJen software using a threshold value of 0.4. Sequences with values below the threshold were regarded as non-antigenic, while sequences with values above the threshold were deemed antigenic. Importantly, using the three analytic tools above (SVMTriP, IEDB and Immunomedicine), we identified, among all the epitopes, three potential candidate motifs that were located at the surface of the structure and were present at the same position on the capsid proteins of all PoAstV2, PoAstV3 and PoAstV4 strains, as shown in Table 3. The antigenicity of these predicted epitopes was further analyzed with the VaxiJen software at a threshold of 0.4, meaning that segments scoring above the threshold were considered potentially antigenic.
Based on the VaxiJen results, we propose that the epitope at amino acid positions 126-161 is the best potential candidate epitope, since it contained a conserved motif in each genotype (Table 3). After predicting the potential linear B-cell epitopes, we constructed a 3D model of the capsid protein of all our strains using SWISS-MODEL [39] in order to predict potential conformational B-cell epitopes. We then predicted the conformational B-cell epitopes of our PoAstV capsid protein models using the ElliPro server [40]. The sequence identity between the capsid proteins of our PoAstV strains and the selected template (Q82452, Human AstV 1 strain) ranged from 39.45 to 42.86 %, a value greater than the 30 % sequence similarity required for creating suitable models [55]. Our predicted candidate epitopes were identified in the capsid protein model by ElliPro and visualized in Jmol to show their 3D structures and the relative orientation of the protein and peptide molecules (Table 4). The amino acid positions of each epitope were also verified in the Jmol viewer. The peptides at these sites were predicted to be highly antigenic (>1) and could be considered for effective vaccination or immunodiagnosis. Furthermore, all the selected epitopes were on the surface of the capsid protein structure and are hence suitable as immunological targets for the diagnosis of PoAstVs in the study region.

Table 3. Predicted antigenic epitopes within the capsid protein (ORF2) of our field strains using three different software tools, and the antigenicity determined for the predicted epitopes

Glycosylation analysis:
Glycosylation is normally required for progeny formation and infectivity of many viruses [56]. High levels of glycosylation serve as a protective shield against the host's immune system, while during virus entry into host cells, host cell glycans act as viral receptors that interact with carbohydrate-binding proteins on the viral surface [57,58]. We observed glycosylation on the capsid proteins of PoAstV2, PoAstV3 and PoAstV4, with more glycosylation sites in PoAstV2 than in PoAstV3 and PoAstV4 (Table 5). Studies have shown that N- and O-linked glycans shield immunodominant epitopes from immune recognition [57-59]. Our analysis of the predicted antigenic epitopes at positions 126-161, common to PoAstV2, PoAstV3 and PoAstV4, showed that they had at least one glycosylation site. Therefore, in-depth studies of glycosylation in AstVs would be an important step in designing suitable antigens for diagnostic tools and vaccine development. Based on these results, we suggest that any approach based on inhibiting the host mechanism that glycosylates astrovirus proteins may offer the best potential approach to developing therapeutics against astrovirus infection.

[...] = any potential crossing the default threshold of 0.5 (predicted glycosylated site); + = N-glycosylated; - = a negative site; * = proline occurs just after the asparagine residue (unlikely to be glycosylated); the jury agreement column indicates how many of the nine networks support the prediction; N = asparagine, S = serine, T = threonine. To pick N-glycosylation sites with high specificity (asparagine residues very likely to be glycosylated), use only (++) predictions (and better) for asparagines that occur within the Asn-X-Ser/Thr triplet (no proline at the X position).
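The sequon rule in the footnote above (Asn-X-Ser/Thr with no proline at X) can be checked with a simple pattern scan. The sketch below only lists candidate sequons and their positions; it does not reproduce the neural network scoring of NetNGlyc, and the function names and example peptide are illustrative assumptions.

```python
import re
from typing import List, Tuple

# N, then any residue except proline, then S or T. The lookahead lets
# overlapping sequons (for example "NNST") all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based position of the asparagine, tripeptide) for each sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


def sequons_in_region(protein: str, start: int, end: int) -> List[Tuple[int, str]]:
    """Sequons whose asparagine lies within the 1-based inclusive range start..end."""
    return [(pos, tri) for pos, tri in find_sequons(protein) if start <= pos <= end]


if __name__ == "__main__":
    # Made-up peptide: "NPS" is skipped because proline follows the asparagine.
    peptide = "MNPSANGTNNST"
    print(find_sequons(peptide))
    print(sequons_in_region(peptide, 5, 9))
```

Given a full capsid sequence, a call such as `sequons_in_region(capsid, 126, 161)` would list the candidate sequons that fall inside the epitope region discussed above; whether each site is actually predicted to be glycosylated still depends on the NetNGlyc score.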

To our knowledge, we report for the first time a detailed genetic characterization of the whole genome of three PoAstV strains from the African region. Our findings give insights into the epidemiology and evolution of PoAstV in the region and will facilitate investigations of the genetic diversity of PoAstV globally. The finding of these novel PoAstV strains reveals how diverse AstVs are in the smallholder pig population in the study region, where there is close contact between pigs and humans. Taken together, our data will be helpful in the development of PoAstV immunodiagnostic tools and vaccines for the African region. However, the high genetic diversity among PoAstV strains (especially at the predicted potential antigenic epitopes) and the high levels of glycosylation in the PoAstVs reported here may pose difficulties for the development of virus detection methods and subunit vaccines, and have implications for epidemiological investigations. Importantly, this study identified three potential linear antigenic epitopes occurring at the surface of the capsid protein structure, which could be used as immunological targets for the proper diagnosis and treatment of PoAstVs in the study region. The information generated can be used to test the predicted function of these peptides through in vitro and in vivo experiments to confirm their immunogenicity and, ultimately, their vaccine properties for preventing PoAstV infections. Finally, understanding the genetic differences of these novel PoAstV variants, which may emerge locally or globally through genetic drift or shift in wild and domestic animals, could lead to early identification of the source of an emerging outbreak, leading to faster and more targeted interventions to control and/or limit the spread of such outbreaks.
Table S1. Pairwise comparison of nucleotide sequence identities of the complete (near complete, U460) genomes of the seven (7) astrovirus field strains (bold) with sequences of other astroviruses available in GenBank
Table S2. Summary of the nucleotide sequence identity matrix of the capsid protein (ORF2) among the seven (7) astrovirus field strains (bold) and the known reference strains in GenBank using Clustal Omega
Table S3. Summary of the amino acid sequence identity matrix of the capsid protein (ORF2) among the seven (7) astrovirus field strains (bold) and the known reference strains in GenBank using Clustal Omega