Genome Sequences and Characterization of Chicken Astrovirus and Avian Nephritis Virus from Tanzanian Live Bird Markets

The enteric chicken astrovirus (CAstV) and avian nephritis virus (ANV) are avastroviruses (AAstVs; genus Avastrovirus, family Astroviridae) capable of causing considerable production losses in poultry. Using next-generation sequencing of a cloacal swab from a backyard chicken in Tanzania, we assembled the genome sequences of ANV and CAstV (6918 nt and 7318 nt in length, respectively, excluding poly(A) tails), which have a typical AAstV genome architecture (5′-UTR-ORF1a-ORF1b-ORF2-3′-UTR). They are most similar to strains ck/ANV/BR/RS/6R/15 (82.72%) and ck/CAstV/PL/G059/14 (82.23%), respectively. Phylogenetic and sequence analyses of the genomes and the three open reading frames (ORFs) grouped the Tanzanian ANV and CAstV strains with Eurasian ANV-5 and CAstV-Aii viruses, respectively. Compared to other AAstVs, the Tanzanian strains have numerous amino acid variations (substitutions, insertions and deletions) in the spike region of the capsid protein. Furthermore, the Tanzanian CAstV-A strain has a 4018 nt recombinant fragment in the ORF1a/1b genomic region, predicted to derive from Eurasian CAstV-Bi and Bvi parental strains. These data should inform future epidemiological studies and options for AAstV diagnostics and vaccines.

Although AAstVs infect diverse avian species worldwide, their prevalence and epidemiology in the African poultry industry remain largely unknown. The vast majority of currently available AAstV genomic data come from Sanger sequencing of polymerase chain reaction (PCR) amplicons. Using nontargeted next-generation sequencing (NGS) of a cloacal sample from an adult backyard chicken from Tanzania, we assembled and molecularly characterized the genome sequences of the ANV and CAstV strains.

Samples, RNA Extraction and NGS
The oropharyngeal (OP) and cloacal (CL) samples used in the current study were part of a consignment of samples from backyard chickens collected at live bird markets (LBMs) in Arusha, Dar es Salaam, Iringa, Mbeya, Morogoro and Tanga in Tanzania during Newcastle disease virus (NDV) surveillance conducted between September 2018 and May 2019. One OP and one CL swab were collected from each bird. During the sampling period, the flocks did not present overt clinical signs consistent with avian diseases, and their vaccination statuses or histories were not available. After collection using standard procedures [36], the samples were shipped to the Southeast Poultry Research Laboratory (SEPRL) in Athens, GA, USA, for total RNA extraction using the MagMAX™-96 AI/ND Viral RNA Isolation Kit (Thermo Fisher Scientific, Waltham, MA, USA) as recently described [37]. Sequencing libraries were prepared (sequence-independent, single-primer amplification [38] and Nextera™ Flex protocols) and paired-end NGS (500-cycle MiSeq Reagent Kit v3) was performed on the Illumina MiSeq platform as previously described [39]. For the current study, 20 birds that were NDV-positive by real-time reverse transcription-polymerase chain reaction (rRT-PCR) were selected (n = 40 samples; one OP and one CL from each of the 20 birds).

Nontargeted Virus Discovery and Genome Assembly
NGS detected AAstV-specific RNA in a CL sample from a chicken (ID IM162) sampled in April 2019 at the Miomboni LBM in the urban district of Iringa, Tanzania; no AAstV RNA was detected in the paired OP swab. The CL sample contained 15,643 and 9393 reads specific to ANV and CAstV, respectively (based on BLASTn), which were assembled de novo into full-length genome sequences of ANV and CAstV strains (6919 and 7318 nt, respectively, excluding the poly(A) tail; each a single contig with a median read depth of 1502× and 479×, respectively). In addition to AAstVs, sequences of bacterial and viral species of avian interest were identified, including Enterococcus spp. (E. cecorum and E. faecium), Gallibacterium anatis, avian leukosis virus and sicinivirus.
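The median read depths above summarize per-position coverage along each contig. As a minimal sketch (not the authors' pipeline, which is not detailed here), the median can be computed from per-base depths in the three-column, tab-separated layout written by `samtools depth -a` for a single BAM file (reference name, 1-based position, depth); the function name and toy contig names are hypothetical.

```python
import statistics
from collections import defaultdict


def median_depth(depth_lines):
    """Return {reference: median depth} from samtools-depth-style lines.

    Each line is assumed to be 'ref<TAB>pos<TAB>depth'. Zero-coverage
    positions should be included (e.g. via samtools depth -a) so that
    they count toward the median.
    """
    depths = defaultdict(list)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        ref, _pos, depth = line.split("\t")
        depths[ref].append(int(depth))
    return {ref: statistics.median(values) for ref, values in depths.items()}


# Hypothetical toy input, not data from this study.
toy = [
    "contig_ANV\t1\t10",
    "contig_ANV\t2\t30",
    "contig_ANV\t3\t20",
    "contig_CAstV\t1\t5",
    "contig_CAstV\t2\t7",
]
print(median_depth(toy))  # {'contig_ANV': 20, 'contig_CAstV': 6.0}
```

The median is less sensitive than the mean to the very high local depths that amplification-based library preparation can produce.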

Genetic Relationships of the Tanzanian Strains with Other AAstVs
The lengths of the gene coding regions of the Tanzanian strains are consistent with those of other AAstVs with full-length genome sequences available in GenBank (Table 1). Based on the complete CP aa sequences, CAstV and ANV phylogenetically group with, but are distinct from, European CAstV-Aii and ANV-5 viruses, respectively (Figure 1; see Figure S1 for details of the taxa in the condensed subtrees). The two Tanzanian AAstVs identified in the current study have been named ck/ANV-5/TZ/IM162/19 and ck/CAstV-A/TZ/IM162/19 and are hereafter abbreviated as ANV-5/IM162/19 and CAstV-A/IM162/19, respectively.
The clustering of the Tanzanian strains in the CP tree was similar to that in the phylogenetic trees based on the full-length genome, nsP and RdRp sequences (Figure S2).
The full-length genome and nsP sequences of the Tanzanian ANV-5/IM162/19 are most similar to those of the Brazilian ANV-8 strain RS/6R/15, with nt identities of 82.72% and 86.22%, respectively. However, its CP sequence is most similar to those of Dutch and Chinese ANV-5 viruses, with nt identities in the range of 72.2-76.68% (Table 2). The CP showed the lowest nt identities to pigeon ANV-6 viruses (42.44-46.20%) and European ANV-1 and ANV-2 viruses (56.68-58.20%). The RdRp sequences showed the highest nt identities amongst the chicken ANVs (87-91%).

Figure 1 (partial caption): … Figure S1. The final data set involved 147 sequences and a total of 577 positions. Genogroups are named as explained in the text. Sequence names include GenBank accession numbers, abbreviated host avian species, and country/strain/strain/year of isolation. Abbreviations: AAstV, Avastrovirus; ANV, avian nephritis virus; CAstV, chicken astrovirus; DAstV, duck astrovirus; GfAstV, guinea fowl astrovirus; GoAstV, goose astrovirus; TAstV, turkey astrovirus.

Alignment of the coding regions of the CP sequences revealed numerous aa variations throughout the protein sequence when comparing ANV-5/IM162/19 with representative strains of ANV-1 to ANV-11. As expected, most of the aa variations are in the spike region of the CP [13] (Figure 3). The variations include substitutions in ANV-5/IM162/19 compared to other ANV-5 viruses, as well as aa insertions and deletions (indels) when comparing ANV-5 viruses to viruses belonging to other ANV subgroups.
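Pairwise identities and aa variations such as those reported above depend on how alignment gaps are treated. The sketch below is a hypothetical helper, not the tool used in this study: it compares two pre-aligned sequences, excludes gapped columns from the identity calculation, and counts substituted, inserted and deleted columns in a query relative to a reference. Other conventions (for example, counting gaps as mismatches) would give lower identities.

```python
def compare_aligned(query, reference, gap="-"):
    """Compare two pre-aligned sequences of equal length.

    Returns percent identity over columns where both sequences carry a
    residue (gapped columns excluded), plus counts of substituted columns
    and of inserted/deleted columns in `query` relative to `reference`.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    identical = compared = subs = ins = dels = 0
    for q, r in zip(query.upper(), reference.upper()):
        if q == gap and r == gap:
            continue  # column is empty in both sequences
        if r == gap:
            ins += 1  # residue present only in the query
        elif q == gap:
            dels += 1  # residue missing from the query
        else:
            compared += 1
            if q == r:
                identical += 1
            else:
                subs += 1
    identity = 100.0 * identical / compared if compared else 0.0
    return {"identity": identity, "substitutions": subs,
            "insertions": ins, "deletions": dels}


# Hypothetical toy alignment, not sequences from this study.
print(compare_aligned("MKT-LVAQ", "MRTGLV-Q"))
```

Here the query has one substitution (K/R), one deletion (missing G) and one insertion (extra A), and 5 of the 6 comparable columns are identical.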

Genomic Characteristics of Strain CAstV-A/IM162/19
The genome architecture of CAstV-A/IM162/19 is typical of CAstVs: ORF1a (3294 nt) and ORF1b (1560 nt) overlap by a 19 nt linker that harbors the RFS (positions 3354-3360, AAAAAAC), and ORF2 (2166 nt) is separated from ORF1b by a 24 nt spacer that contains the AAstV pentamer at positions 4914-4918 (CCGA) (Figure 4). Based on the genome sequence data currently available in GenBank, the 5′-UTR of CAstVs remains poorly described, and there is no consensus on its length. It is therefore possible that the 5′-UTR of the Tanzanian CAstV-A/IM162/19 reported here is incomplete, which we are currently investigating. The three ORFs are flanked by the 5′-UTR and the 3′-UTR (Figure 4).
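Genomic coordinates such as 3354-3360 for the seven-nucleotide RFS are 1-based and inclusive, so a motif of length k starting at position p ends at p + k - 1. A minimal sketch of locating such a motif (here the AAAAAAC heptamer) and reporting its coordinates in that convention follows; the function and the toy sequence are hypothetical, and a real search would be run on the assembled genome.

```python
import re


def find_motif(seq, motif):
    """Return 1-based inclusive (start, end) coordinates of every
    occurrence of `motif` in `seq`, including overlapping hits."""
    seq, motif = seq.upper(), motif.upper()
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [(m.start() + 1, m.start() + len(motif))
            for m in pattern.finditer(seq)]


toy = "GGCAAAAAACUAGG"  # hypothetical RNA fragment, not the IM162 genome
print(find_motif(toy, "AAAAAAC"))  # [(4, 10)]
```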
Heterogeneity in the spike region of the CP (the ORF2 product) was assessed by aligning the translated aa sequences of representative strains of CAstV-A (subgroups Ai to Aiii) and CAstV-B (subgroups Bi to Bvi). The alignment segregated CAstV-A from CAstV-B viruses, with 11 indel regions between the two groups (arbitrarily named A to K in the figure) and 17 aa substitutions between CAstV-A/IM162/19 and the other analyzed CAstV-Aii viruses (Figure 5). CAstV-A and CAstV-B viruses shared only pockets of 1-3 identical aa residues. Overall, the spike region of CAstV-A viruses is more heterogeneous than that of CAstV-B viruses, with 77.0% and 82.2% identical aa sites, respectively.

Figure 3. Variations in the aa residues of the spike (P2) region of the CP sequence of strain ANV-5/IM162/19 (highlighted in red font) compared to representatives of ANV genogroups 1-11 (ANV-1 to ANV-11, shown in square brackets on the left side of the alignment). The aa positions relative to the first (methionine) residue of the CP coding sequence are indicated at the top of the consensus (aligned) sequence. The 1979 Japanese ANV-1 strain G-4260 (highlighted in blue), which has been reported to be widely spread in Japanese and European commercial farms [13], is used as the reference in the alignment, and four of the nine variable regions (VR-F to VR-I) in the P2 region of the CP are shaded in grey. Identical and missing aa residues are indicated by dots (.) and dashes (-), respectively. Sequence names include GenBank accession numbers, abbreviated host avian species, and country/strain/strain/year of isolation.

Figure 4 (partial caption): … and viral genome-linked protein (VPg). The structural features are derived from available genomic datasets on AstVs [5,13,17,31,35,50].
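The percentages of identical aa sites quoted above (77.0% and 82.2%) are group-level statistics: the share of alignment columns at which every sequence in a group carries the same residue. A minimal sketch follows, assuming that any column containing a gap counts as non-identical (the convention used in the study is not stated here); the function and toy alignment are hypothetical.

```python
def percent_identical_sites(alignment, gap="-"):
    """Percentage of alignment columns at which every sequence carries
    the same residue. Columns containing a gap are counted as
    non-identical here; other conventions give other values."""
    lengths = {len(s) for s in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    if n_cols == 0:
        raise ValueError("alignment has no columns")
    identical = 0
    for column in zip(*(s.upper() for s in alignment)):
        if gap not in column and len(set(column)) == 1:
            identical += 1
    return 100.0 * identical / n_cols


toy = ["MKTAY", "MKSAY", "MK-AY"]  # hypothetical, not CAstV sequences
print(percent_identical_sites(toy))  # 80.0
```

Because the statistic requires agreement across all sequences, it tends to fall as more divergent strains are added to a group.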

Analyses of Recombination Events
Given the diversity observed in our analyses and previous reports of recombination in poultry AAstVs [28], we assessed possible recombination events in the Tanzanian AAstVs using the RDP4 suite, with the 78 full-length genome sequences used for the phylogenetic analyses shown in Figure S2. The analyses detected a recombination signal in strain CAstV-A/IM162/19, with the recombinant fragment covering the C-terminal 75% of ORF1a (2461 nt), the entire ORF1b (1560 nt) and 16 of the 24 nt in the ORF1b/ORF2 spacer region (Figure 6).
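The fragment lengths can be reconciled with the ORF sizes reported for CAstV-A/IM162/19. Because ORF1a and ORF1b share a 19 nt overlap, simply adding the 2461 nt of ORF1a, the full 1560 nt ORF1b and the 16 spacer nucleotides counts that overlap twice; subtracting it once gives the 4018 nt fragment. This is our reading of the reported numbers, assuming the whole overlap lies within the fragment, and the helper below is hypothetical.

```python
# Lengths reported in the text for CAstV-A/IM162/19 (nt).
ORF1A_LEN = 3294
ORF1B_LEN = 1560
ORF1A_1B_OVERLAP = 19  # linker shared by ORF1a and ORF1b
ORF1A_PART = 2461      # C-terminal part of ORF1a inside the fragment
SPACER_PART = 16       # ORF1b/ORF2 spacer nucleotides inside the fragment


def fragment_length(orf1a_part, orf1b_len, overlap, spacer_part):
    # Overlapping nucleotides belong to both ORFs, so they are subtracted
    # once to avoid double counting (assumes the whole overlap is inside
    # the recombinant fragment).
    return orf1a_part + orf1b_len - overlap + spacer_part


length = fragment_length(ORF1A_PART, ORF1B_LEN, ORF1A_1B_OVERLAP, SPACER_PART)
share = 100 * ORF1A_PART / ORF1A_LEN  # fraction of ORF1a in the fragment
print(length)           # 4018
print(round(share, 1))  # 74.7
```

The 74.7% share of ORF1a is what the text rounds to 75%.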

Figure 6 (partial caption): … (table at bottom right), with the Swiss strain CH/PB41-SI14/19 (GenBank accession: OM469240) as the "minor parent", i.e., the strain most identical to the recombinant fragment in CAstV-A/IM162/19 (here at 92.6% nt identity). The "major parent" (i.e., the strain most identical to CAstV-A/IM162/19 in the genomic region surrounding the recombination breakpoints) is unknown, but RDP4 inferred it to be the Chinese CAstV-Bi strain GDYHTJ718-6/18 (GenBank accession: MN725026). The vertical dotted line indicates the bootstrap cutoff (75%) used to determine the significance of the breakpoints. Domain features (described in Figure 4) in the recombinant fragment are shown in the top panel.
The predicted minor parent strain, defined here as the strain most identical to the recombinant fragment in CAstV-A/IM162/19, is the Swiss CAstV-Bvi strain CH/PB41-SI14/19 (GenBank accession: OM469240); the 4018 nt recombinant fragment in the Tanzanian strain is 92.6% identical to the homologous region in the Swiss virus. The Swiss virus was detected in an RSS-affected broiler flock [51]. The major parental strain, defined here as the strain whose genomic region surrounding the recombination breakpoints is most similar to the recombinant strain (i.e., CAstV-A/IM162/19), could not be identified amongst the analyzed viruses. However, RDP4 inferred the major parent to be a CAstV-B strain as well (subgroup Bi); the closest match is a Chinese strain (GDYHTJ718-6/18; GenBank accession: MN725026), which was reported as a novel AAstV from a seropositive chicken flock [52]. Our analyses did not detect recombination signals in the two parental strains.
A recombination event signal was also detected in ORF1a of ANV-5/IM162/19 (nt 647-1439, a region containing the viroporin domain), with the British genogroup ANV-3 strain UK/VF14-92-A2 (GenBank accession: MT585643) as the major parent and an unknown minor parental strain. Although this signal was supported by six of the nine RDP4 detection algorithms, it was deemed a misidentification because the beginning breakpoint was uncertain and the putative parent UK/VF14-92-A2 was flagged as the likely actual recombinant.

Discussion
The AAstVs are among the least-studied avian enteric RNA viruses. However, recent advances in NGS technologies have led to the discovery of many novel AAstVs, complicating classification because the current species are demarcated based on genetic distances between CP aa sequences, and the phylogenetic clustering of the viruses does not necessarily correspond to the classification. The vast majority of the AAstV genomic data in public databases are partial sequences of ORF2 (capsid gene) and, to a lesser extent, ORF1b (RdRp gene). Furthermore, only a few complete genome sequences of ANVs and CAstVs are available, most of which are of ANV-3/-6/-8 and CAstV-Biv/vi strains. Several of the AAstV strains in the databases have little or no molecular characterization and remain unclassified. In this study, we identified and characterized the full-length genome sequences of two AAstVs, each assembled de novo into a single contig, from the NGS of a cloacal swab obtained from a backyard chicken originating from rural Tanzania and sold at an urban LBM in the Iringa region of the country.
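Because AAstV species are demarcated on CP aa genetic distances, a minimal sketch of an uncorrected p-distance (with pairwise deletion of gapped sites) and of a group's mean pairwise distance with its sample standard deviation is given below. It is illustrative only: it is not the distance model used in MEGA for the trees in this study, it implies no demarcation threshold, and the function names and sequences are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def p_distance(a, b, gap="-"):
    """Uncorrected p-distance between two aligned sequences, ignoring
    columns where either sequence has a gap (pairwise deletion)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def group_distance(seqs):
    """Mean and sample standard deviation of all pairwise p-distances."""
    d = [p_distance(a, b) for a, b in combinations(seqs, 2)]
    return mean(d), (stdev(d) if len(d) > 1 else 0.0)


toy_group = ["MKTAYL", "MKSAYL", "MRTAYI"]  # hypothetical sequences
print(group_distance(toy_group))
```

For the toy group the three pairwise distances are 1/6, 2/6 and 3/6, so the mean is about 0.333 with a standard deviation of about 0.167.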
Although identified in a single bird, the two Tanzanian strains belong to two different AAstV genogroups, which is not surprising because strains from different genogroups have been reported to simultaneously infect a single avian species [35,53,54]. There is also evidence that coinfection of poultry with multiple enteric viruses can increase the severity of enteritis and the risk of secondary infections with opportunistic microbial pathogens [55]. The cloacal sample from this chicken contained detectable RNA of bacterial species that act as opportunistic pathogens associated with poultry diseases, including E. cecorum, E. faecium, Avibacterium and Gallibacterium anatis. We also detected RNA of an endogenous retrovirus (avian leukosis virus subgroup E; ALV-E) and a picornavirus (sicinivirus type A; SiV-A). However, because no metadata on overt clinical signs of disease were recorded for the sampled birds, we could not associate the Tanzanian AAstV strains with pathology in chickens.
The full-length genome sequences of ANV-5/IM162/19 (6919 nt) and CAstV-A/IM162/19 (7318 nt) and their architecture (5′-UTR-ORF1a-ORF1b-ORF2-3′-UTR) are consistent with those of other AAstVs [35]. The observed variations in the lengths of the full-length genomes and individual ORFs amongst the strains used in our analyses are not unexpected for AAstVs; such variations are largely attributable to viral molecular evolution driven by factors such as recombination events, indels, and the ability of these viruses to be transmitted vertically (via eggs) and across different avian species, as well as other unexplored biological and ecological factors [28,35,56-60]. Recombination events can intrinsically complicate the classification of RNA viruses [60], even more so for AAstVs, for which full-genome data are scarce. Indels are more common in the variable genomic regions of RNA viruses (in AAstVs, specifically the spike region of the capsid protein) than in conserved regions (ORF1a and ORF1b, which encode the nsP and RdRp of AAstVs, respectively), where recombination events occur with higher frequencies between genetically related species [35]. Identities were higher amongst the analyzed nsP and RdRp sequences than amongst the CP sequences, which were much lower and wide-ranging. Numerous aa variations were also observed in the spike region, as expected for the most variable genomic region of AAstVs [11,12,20]. Our analyses of the CP sequences show that the three subgroups of CAstV-A (i-iii) cluster distinctly, unlike the six subgroups of CAstV-B (i-vi), which cluster with guinea fowl AAstVs (GfAstVs). Furthermore, based on the CP sequences, the Chinese AAstV-III duck type 2 (DAstV-2) and goose type 1 (GoAstV-1) viruses phylogenetically cluster with the CAstVs, separately from DAstV-1 and GoAstV-2.
Overall, the phylogenetic clustering of the Tanzanian AAstVs was consistent across the full genome and all three ORF sequences, but the strains appear to be distinct from other members of their respective ANV and CAstV groups.
The recombination event detected in the nsP/RdRp genomic region of CAstV-A/IM162/19 underscores the potential of recombination to complicate the classification of AAstVs. The proposed parental strains of the recombinant fragment in CAstV-A/IM162/19 (CAstV-Aii) are CAstV-B viruses from different subgroups (i.e., Bi and Bvi), which is interesting because, as stated above, recombination events mostly occur between closely related viral species. However, most of the available CAstV full-length genome sequence data are of Eurasian and Canadian genogroup B viruses, so the actual parental strains of the Tanzanian recombinant may not yet have been identified. Many recombinants have been reported from Canada [28]. The classification of AAstVs has hitherto been based on partial CP sequences (as opposed to complete protein coding sequences); our data indicate that extending the phylogenetic analyses and classification of these viruses to the sequences of the full-length genome and all three ORFs should be considered.
Importantly, the European AAstVs to which the Tanzanian strains are most closely related (phylogenetically and in nt/aa identities) have been implicated in gastrointestinal diseases and hatchability problems in commercial poultry production [29]. Our analyses have shown high levels of heterogeneity in the CP of the Tanzanian strains compared to the European viruses. However, the pathological implications of these genetic differences remain unknown. Furthermore, viral pathogenicity can depend on many other factors, such as viral load, the age, breed and immune status of the host birds, and co-infecting pathogens. Nevertheless, because the CP contains the vast majority of the viral epitopes that are exposed to and interact with the host immune system, the numerous aa variations observed in this region amongst AAstVs are likely to be important for future epidemiological investigations of AstVs and for the development of diagnostic tools and vaccines to control these viruses. Given the current difficulty of propagating AAstVs in cell culture, the nontargeted approach, as demonstrated by the data and results of the current study, is an attractive alternative to PCR/Sanger sequencing for molecular studies of these viruses.

Conclusions
We have presented the full-length genome sequences and characterization of CAstV and ANV strains assembled de novo from the NGS of a cloacal swab from a backyard chicken in rural Tanzania, the first such report from the East and Central African regions. Based on comparative sequence analyses and phylogenetics using the currently available genomic data on AAstVs, the two Tanzanian strains belong to two different genogroups and are most closely related to Eurasian viruses, but with numerous aa variations in the spike region of the capsid protein, which harbors the majority of the viral epitopes. We have also demonstrated that the Tanzanian CAstV strain has a CAstV-Aii backbone with a potential recombinant fragment in the conserved genomic region (nsP/RdRp) derived from CAstV-B parental viruses. These data are a valuable addition to the repertoire of AAstV genomic datasets and suggest that the current ICTV taxonomic classification of members of the Avastrovirus genus may need to be revisited. Furthermore, the observed heterogeneity of the CP should inform the future development of diagnostics and vaccines for the management of AAstVs.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/v15061247/s1, Figure S1: Relationships of the Tanzanian ANV-5/IM162/19 and CAstV-A/IM162/19 strains (in bold red) with other AAstVs based on the complete capsid protein (CP) aa sequences. Details of the condensed taxa subtrees in panel A are shown in panel B and vice versa. Mean genetic distances and standard deviations (±) are indicated in brackets for each group. Reconstruction of the phylogenetic tree using MEGA and naming of the AAstV genogroups were performed as explained in the text, with the final dataset involving 147 sequences and 577 positions. Sequence names include GenBank accession numbers, abbreviated host avian species, and country/strain/strain/year of isolation. Abbreviations: AAstV, Avastrovirus; ANV, avian nephritis virus; CAstV, chicken astrovirus; DAstV, duck astrovirus; GfAstV, guinea fowl astrovirus; GoAstV, goose astrovirus; TAstV, turkey astrovirus. Figure S2: Relationships of the Tanzanian ANV-5/IM162/19 and CAstV-A/IM162/19 strains (in bold red) identified in the current study with other AAstVs based on the nucleotide sequences of the full-length genomes, ORF1a and ORF1b. Reconstruction of the phylogenetic trees using MEGA and naming of the AAstV genogroups were performed as explained in the text and involved final datasets of 78 nt sequences and 5925, 898 and 568 positions (genome, ORF1a and ORF1b sequences, respectively).