Genetic and Phylogenetic Characterization of Tataguine and Witwatersrand Viruses and Other Orthobunyaviruses of the Anopheles A, Capim, Guamá, Koongol, Mapputta, Tete, and Turlock Serogroups.

The family Bunyaviridae has more than 530 members that are distributed among five genera or remain to be classified. The genus Orthobunyavirus is the most diverse bunyaviral genus with more than 220 viruses that have been assigned to more than 18 serogroups based on serological cross-reactions and limited molecular-biological characterization. Sequence information for all three orthobunyaviral genome segments is only available for viruses belonging to the Bunyamwera, Bwamba/Pongola, California encephalitis, Gamboa, Group C, Mapputta, Nyando, and Simbu serogroups. Here we present coding-complete sequences for all three genome segments of 15 orthobunyaviruses belonging to the Anopheles A, Capim, Guamá, Koongol, Tete, and Turlock serogroups, and of two unclassified bunyaviruses previously not known to be orthobunyaviruses (Tataguine and Witwatersrand viruses). Using those sequence data, we established the most comprehensive phylogeny of the Orthobunyavirus genus to date, now covering 15 serogroups. Our results emphasize the high genetic diversity of orthobunyaviruses and reveal that the presence of the small nonstructural protein (NSs)-encoding open reading frame is not as common in orthobunyavirus genomes as previously thought.


Introduction
The family Bunyaviridae ranks among the largest families of RNA viruses. The family has more than 530 named members that are either assigned to one of the five included genera (Hantavirus, Nairovirus, Orthobunyavirus, Phlebovirus, or Tospovirus) or remain to be classified [1][2][3]. Many bunyaviruses can cause disease in humans. These diseases commonly manifest as arthritides/rashes, fevers/myalgias, pulmonary diseases, encephalitides, or viral hemorrhagic fevers [4]. In addition, bunyaviruses can cause severe disease in wild and domesticated animals and wild or cultivated crop or ornamental plants (tospoviruses only). The majority of bunyaviruses are transmitted among hosts by arthropods (predominantly mosquitoes, ticks, sandflies, biting midges, or thrips), whereas hantaviruses are  [1,2,32]. Guamá serogroup viruses are mostly endemic to South America. The exceptions are Guamá and Mahogany Hammock viruses, which were isolated in North America. Catú virus and Guamá virus were isolated from humans with fever/myalgia. Viruses of the Guamá serogroup are usually transmitted by culicine mosquitoes among vertebrate hosts including bats, birds, marsupials, and rodents [34,36-39,47]. Here, we present coding-complete sequences for Bimiti, Catú, Guamá, Mahogany Hammock, and Moju viruses.
The Koongol serogroup currently consists of Koongol virus and Wongal virus, both of which are assigned to the species Koongol orthobunyavirus [1,2,32]. Both viruses were originally isolated in 1960 in Queensland, Australia, from Culex mosquitoes [40]. Koongol virus, newly sequenced here, was also obtained from Ficalbia mosquitoes in New Guinea [35]. HI tests suggested that both Koongol and Wongal viruses may be able to infect a range of mammals, marsupials, and possibly birds, and may be widespread throughout Australia. However, these results were not confirmed using NT [48][49][50].
The Mapputta serogroup currently has seven members (Buffalo Creek, Gan Gan, Mapputta, Maprik, Murrumbidgee, Salt Ash, and Trubanaman viruses) that have not yet been assigned to species [1,24,51]. Buffalo Creek and Murrumbidgee viruses [24,52,53] were isolated from Anopheles mosquitoes collected in the Northern Territory and Griffith, Australia, respectively. Mapputta and Trubanaman viruses, which we sequenced during this study, were initially isolated from Anopheles mosquitoes collected in Queensland, Australia, in the 1960s [40,41]. Antibodies against Trubanaman virus have been detected in humans and domestic and wild animals in different parts of Queensland, and the virus is suspected to cause arthritis/rash in humans [42].
Finally, we sequenced Umbre virus, a virus of the Turlock serogroup that is represented by viruses belonging to the two species M'Poko orthobunyavirus (M'Poko virus, Yaba-1 virus) and Turlock orthobunyavirus (Lednice virus, Turlock virus, Umbre virus) [1,2]. Turlock serogroup viruses are distributed in Africa, Asia, Europe, and North and South America. Umbre virus was initially isolated from Culex bitaeniorhynchus mosquitoes collected in Poona (today Pune), India in 1955 [44]. Additional virus isolates were obtained from Culex mosquitoes and birds in India and Malaysia. Anti-Umbre virus antibodies could be detected in sera collected from wild birds and sentinel chickens from Malaysia [35].
In addition, we determined coding-complete genomic sequences of two unclassified members of the family Bunyaviridae, Tataguine virus and Witwatersrand virus (Table 1). Tataguine virus was initially recovered from pooled Culex and Anopheles mosquitoes collected in Senegal in 1962 [57]. This virus is widespread in Africa as evidenced by isolation from areas of today's Cameroon, Central African Republic, Ethiopia, Nigeria, and Senegal [45,58-64]. Tataguine virus is a known human pathogen; infected patients present with fever, headache, rash, and joint pain [63][64][65]. Witwatersrand virus was originally isolated from Culex mosquitoes collected in Germiston, South Africa, in 1958 [46]. Additional isolates were obtained from Culex mosquitoes, sentinel hamsters, and various rodents sampled in Mozambique, South Africa, and Uganda. Antibodies to Witwatersrand virus also have been detected in human sera, although the virus has not been associated with human disease [66].
Our data unequivocally identify both viruses as members of the genus Orthobunyavirus and largely confirm the relationships of the remaining 15 viruses that were previously deduced using serological assays.

Viral Genomic RNA Isolation and Library Preparation
Orthobunyaviruses were obtained from the Russian State Collection of Viruses in the form of lyophilized infected suckling mouse brains (Table 1). Total RNA was isolated from vials with 1 mL of TRI Reagent (Molecular Research Center, Cincinnati, OH, USA). Total RNA was additionally purified using the RNeasy MinElute Cleanup Kit (Qiagen, Hilden, Germany) followed by ribosomal RNA depletion using the GeneRead rRNA Depletion Kit (Qiagen) according to the manufacturers' instructions. Purified RNA was reverse-transcribed with RevertAid Reverse Transcriptase (Thermo Fisher Scientific, Grand Island, NY, USA) using hexameric random primers (Promega, Madison, WI, USA). First-strand cDNA was converted to double-stranded cDNA using the NEBNext Second Strand Synthesis Module (New England BioLabs, Ipswich, MA, USA) according to the manufacturer's instructions. The resulting dsDNA was used to prepare next-generation sequencing libraries using the TruSeq DNA LT Library Prep Kit (Illumina, San Diego, CA, USA). A paired-end 250-bp protocol was used for sequencing indexed libraries on an Illumina MiSeq instrument.

Bioinformatic and Phylogenetic Analyses
Primary analysis of sequencing data and de novo genome assembly were performed with CLC Genomics Workbench 7.0 (CLC bio, Waltham, MA, USA). Open reading frame (ORF) analysis and general work with assembled contigs were performed using the Lasergene 11.0.0 (DNAStar, Madison, WI, USA) software package. After de novo assembly of trimmed reads, BLASTx (BLAST, basic local alignment search tool, open-source software) analysis was performed against orthobunyaviral sequences, and matching contigs were extracted. All resulting contigs contained parts of non-coding terminal regions and full-length ORFs corresponding to their matches in the BLASTx search. Identified ORFs were translated, and the resulting amino acid (aa) sequences were used in further analyses. Deduced aa sequences of the proteins encoded by the sequenced orthobunyaviral genome segments and corresponding sequences of selected representatives of already characterized orthobunyaviruses were aligned using all available multiple sequence alignment methods implemented on the M-Coffee server [67], and only columns with a score of 5 or higher were retained. The final alignment lengths for N, glycoprotein precursor polyprotein, and RdRp were 240, 1412, and 2331 aa residues, respectively. The most suitable models of protein evolution for the three alignments were predicted with the open-source ProtTest 3.2 [68].
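The column-filtering step described above can be illustrated with a minimal Python sketch. It assumes that per-column consensus scores (integers from 0 to 9) have already been extracted from the alignment server output into a plain list; the function name `filter_columns` and the toy sequences and scores are hypothetical and are not taken from the study's pipeline.

```python
from typing import Dict, List


def filter_columns(alignment: Dict[str, str], column_scores: List[int],
                   min_score: int = 5) -> Dict[str, str]:
    """Keep only alignment columns whose consensus score is >= min_score."""
    lengths = {len(seq) for seq in alignment.values()}
    if lengths != {len(column_scores)}:
        raise ValueError("all sequences and the score list must have equal length")
    # Indices of columns that pass the threshold.
    keep = [i for i, score in enumerate(column_scores) if score >= min_score]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy data, invented for illustration only (assumption, not study data).
toy_alignment = {"virus_A": "MSE-LVF", "virus_B": "MTEQLIF"}
toy_scores = [9, 6, 8, 2, 7, 4, 9]
filtered = filter_columns(toy_alignment, toy_scores)
print(filtered)
```

In this toy case the fourth and sixth columns fall below the threshold and are dropped, which shortens every sequence by the same two positions.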
Phylogenetic trees were inferred using MrBayes 3.2.4 [69] under the LG + G + I model for the N alignment and the LG + G + I + F model for the glycoprotein precursor polyprotein and RdRp alignments, with 1,000,000 generations and a 25% burn-in value. Maximum likelihood (ML) phylogenies were inferred using MEGA6 with the same protein evolution models and 1000 bootstrap replicates. Consensus trees were visualized with TreeGraph 2.4 [70].
Signal peptide cleavage sites of orthobunyaviral glycoprotein precursors were determined from deduced aa sequences using the SignalP 4.1 Server [71]. Transmembrane domains of glycoprotein precursors were predicted from the same sequences using the open-source TMHMM Server v 2.0 [72]. Putative N-glycosylation sites were determined with the open-source NetNGlyc 1.0 Server [73].

Results
Segment-specific ORF-based phylogenies of the genus Orthobunyavirus, including previously and newly characterized viruses, are presented in Figure 1.
Figure 1. The LG + I + G model of amino acid (aa) substitution was used for inferring the N protein phylogeny, whereas the LG + I + G + F model was used to investigate glycoprotein precursor polyprotein and L protein phylogenetic relationships. Numbers represent Bayesian posterior probabilities, with maximum likelihood bootstrap values in parentheses. "Herbeviruses" (Herbert virus, Kibale virus, and Taï virus) [5,6] were used to root the phylograms. Trees are drawn to scale measured in substitutions per site. In (a), viruses that encode NSs proteins are marked in blue, whereas viruses that do not are marked in red. Viruses studied in the present work are depicted in bold. GenBank accession numbers and serogroups are to the right of virus names.

Anopheles A Serogroup
Genomic data of the Anopheles A serogroup orthobunyaviruses were thus far limited to S segment sequences of Anopheles A and Tacaiuma viruses [74]. Here, we expanded this sequence space by adding the complete coding sequences of Lukuni virus. Based on the obtained genomic data, Lukuni virus is closely related to Anopheles A virus (71.5% aa nucleocapsid sequence identity) and is more distantly related to Tacaiuma virus (53.9% nucleocapsid sequence identity).

Capim and Guamá Serogroups
The measured divergence of Guamá serogroup viruses is somewhat inconsistent among phylogenies inferred for different viral proteins (Figure 1). For instance, based on the phylogenies obtained for N and L proteins, Guamá, Catú, Moju, and Bimiti viruses are more closely related to each other than to Mahogany Hammock virus, although exhibiting different branching orders inside the group. Additionally, analysis of the glycoprotein precursor polyproteins pairs Moju virus with Guamá virus, Mahogany Hammock virus, and Capim serogroup viruses. Taken together, these discrepancies suggest that all examined Guamá group viruses may be reassortants.
The most obvious example of this observation is the relationship between Catú BeH 151 and Guamá BeAn 277 viruses, which were found to be almost identical when compared by their N protein sequences, differing only by two aa (99.2% sequence identity). However, their glycoprotein precursor polyprotein and L protein sequences are more divergent (64.5% polyprotein identity, 95.8% L protein identity). These data are in agreement with original studies that distinguished these two viruses in NT, but not in CF tests [37]. The observed relationships between Catú and Guamá viruses support the idea that one or both of the viruses may be reassortants.
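Identity values such as those quoted above can be reproduced from a pairwise alignment with a few lines of Python. The sketch below is illustrative only: it counts identical residues over columns in which neither sequence has a gap, which is one common convention, and the denominator actually used in the study is not stated. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy example (invented sequences): one mismatch in ten compared columns.
toy_identity = percent_identity("MSEIVFHDVA", "MSEIVFYDVA")
print(toy_identity)

# Two mismatches over an assumed 240 compared columns give roughly 99.2%.
two_of_240 = round(percent_identity("A" * 238 + "CC", "A" * 238 + "DD"), 1)
print(two_of_240)
```

The second call uses 240 columns only as an assumed denominator (the N alignment length given in the methods); with it, two differing residues correspond to about 99.2% identity, in line with the value reported for the Catú and Guamá N proteins.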
Capim and Guajará viruses fall basal to viruses of the Guamá group in phylogenies inferred for N and L proteins, but are placed inside the Guamá serogroup in the glycoprotein precursor polyprotein phylogeny (Figure 1). Capim and Guajará viruses are one-way reactive with Guamá, and Catú and Guamá viruses, respectively, in NT but not in CF assays. Interestingly, no reactivity between Capim and Guajará viruses was found in NT [35]. Bushbush virus, another member of the Capim serogroup, has similar cross-reactivity in NT but not in the CF assay with Bimiti and Catú viruses of the Guamá serogroup [35]. The observed phylogenetic and antigenic relationships between Guamá and Capim serogroup viruses might indicate that their ancestors were inter-group reassortant viruses that shared an M segment. If validated by further experiments and analyses, this finding may be of significance for orthobunyavirus taxonomy, as thus far reassortment has only been observed among viruses of the same orthobunyaviral species and this restriction has been used as one of the official orthobunyavirus species demarcation criteria [2].

Mapputta Serogroup
The genomic sequences obtained for Mapputta virus MRM186 are more than 99% identical to the sequences of the same isolate reported previously [24], and all identified nucleotide (nt) substitutions are synonymous. These substitutions likely arose due to different passaging and maintenance procedures. Our phylogenetic analysis placed Trubanaman virus firmly inside the Mapputta serogroup and revealed its close relationship to two previously characterized viruses: Buffalo Creek virus and Murrumbidgee virus [24,52,53]. The N protein divergence of Buffalo Creek, Murrumbidgee, and Trubanaman viruses does not exceed 2.1%, suggesting that Buffalo Creek and Murrumbidgee viruses are different isolates of Trubanaman virus rather than distinct viruses. Further characterization of this group of viruses is warranted because Buffalo Creek virus [52] and Trubanaman virus [42] are suspected human pathogens.

Tete Serogroup
Sequence information for this serogroup was limited to the S segment of Tete and Batama viruses [74] and a partial Weldona virus M segment sequence. We provide complete coding sequences for three Tete serogroup viruses: Tete, Bahig, and Matruh viruses. Interestingly, the S segment sequence of Tete virus SAAn 3518 obtained here differs slightly from that previously reported [74]. The conflicting region is located closer to the C-terminus of the N protein and, in our case, is represented by a characteristic amino acid motif (G158-S164) found in all other N protein sequences belonging to Tete group viruses, but not in the Tete virus SAAn 3518 sequence reported earlier.
Our analyses reveal Bahig and Matruh viruses to be closely related, with 98.8%, 99.1%, and 87.2% aa identities among their N, L, and glycoprotein precursor polyprotein sequences, respectively. This observed relationship is in agreement with the results of CF tests, which showed that these viruses were practically indistinguishable, and HI tests, which proved them to be easily distinguishable [35]. The obtained phylogenetic tree topology and branching order of Tete serogroup viruses (Figure 1) are consistent among the three segments.
Two unclassified bunyaviruses, I612045 virus (GenBank HM627179-81) and Oyo virus (GenBank HM639778-80), form a monophyletic group with Tete serogroup viruses, with Oyo virus falling basal to the other viruses in this group. While the measured nt and aa identities of I612045 virus with other orthobunyaviruses clearly indicate its taxonomic status as a member of the Tete serogroup, Oyo virus is clearly distinct and may represent a separate species in the genus Orthobunyavirus.

Koongol and Turlock Serogroups
Here we report coding-complete sequences of all three genomic segments of Koongol and Umbre viruses. The sequence of the Umbre virus IG1424 M segment is more than 99% identical to a previously published M segment of the same isolate [75]. Phylogenetic analysis placed Koongol and Umbre viruses along with unclassified Witwatersrand virus (see below) into a monophyletic group regardless of the protein assayed for tree reconstruction. These viruses also share a last common ancestor with Gamboa serogroup viruses, which are exclusively distributed in North and South America.
Genomic sequence information for the Turlock serogroup was thus far limited to partial Umbre virus M segment sequences. We expanded the sequence space of this serogroup by determining the coding-complete Umbre virus genome sequence and confirmed the relationship of Umbre virus to Koongol and Witwatersrand viruses (Figure 1).

Tataguine and Witwatersrand Viruses (Unassigned Bunyaviruses)
We present sequence information for Tataguine virus, the closest relative of which appears to be Lukuni virus, with 47.5% aa identity for N, 35.5% aa identity for glycoprotein precursor polyprotein, and 53.5% aa identity for L. Anopheles A and Anopheles B group viruses, along with Tataguine virus, form a monophyletic group with Tataguine virus at its base. Our data indicate that Tataguine virus may have to be assigned to a new species in the genus Orthobunyavirus.
As mentioned above, based on the obtained phylogenies, Witwatersrand virus clusters together with Umbre (Turlock serogroup) and Koongol (Koongol serogroup) viruses in the same branching order independent of the analyzed protein. The closest relative of Witwatersrand virus is Koongol virus (48.5% to 59.6% aa identities for the three proteins), indicating that Witwatersrand virus should be classified as an orthobunyavirus.

Characteristics of S Segments: N and NSs Proteins
The N proteins of bunyaviruses encapsidate viral RNA and are major CF determinants [3]. The lengths of the N proteins of the examined orthobunyaviruses range from 234 aa residues for Koongol virus (Koongol serogroup) to 258 aa for Bahig, Matruh, and Tete viruses (Tete serogroup) (Table 2). Our studies confirm that Tete serogroup viruses possess the longest N proteins in the genus Orthobunyavirus with unique extensions predominantly located at the amino termini [74]. Forty-six aa of the N protein are conserved among the previously determined 51 sequences of orthobunyaviruses from the Bunyamwera, California, Group C, and Simbu serogroups [77]. Our analyses reveal that only 11 of these 46 aa are strictly conserved among the N proteins of Anopheles A, Anopheles B, Bunyamwera, Bwamba, California, Capim, Gamboa, Guamá, Group C, Koongol, Turlock, Nyando, Simbu, and Tete serogroup viruses (Figure 2). Five of these 11 aa (F26, P125, G131, K179, W193) are crucial for Bunyamwera virus mini-replicon rescue. Two of those aa, K179 and W193, are likely involved in RNA synthesis, and two other aa residues, P125 and G131, are thought to play a role in ribonucleoprotein packaging [77].
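Strict conservation, as used above, means that every aligned sequence carries the same residue in a given column. A minimal Python sketch of such a check follows; the function name `strictly_conserved_columns` and the toy alignment are hypothetical. Note that the function reports alignment column numbers, which would still need to be mapped to residue numbering in a reference protein (for example, Bunyamwera virus N) before comparison with positions such as F26 or W193.

```python
from typing import List


def strictly_conserved_columns(aligned: List[str], gap: str = "-") -> List[int]:
    """Return 1-based alignment columns where all sequences share one residue."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must be aligned to the same length")
    conserved = []
    for i in range(length):
        column = {seq[i] for seq in aligned}
        # A column is strictly conserved if it holds a single non-gap residue.
        if len(column) == 1 and gap not in column:
            conserved.append(i + 1)
    return conserved


# Toy alignment, invented for illustration only.
toy_n_alignment = ["MFPGKW", "LFPGKW", "MFAGKW"]
conserved_cols = strictly_conserved_columns(toy_n_alignment)
print(conserved_cols)
```

Adding more divergent sequences to the input can only shrink the returned list, which mirrors how the conserved set dropped from 46 to 11 residues once additional serogroups were included.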
A number of researchers have evaluated whether NSs is involved in the immune response to orthobunyavirus infections. Despite not expressing NSs, Tacaiuma virus (Anopheles A serogroup) antagonizes host interferon (IFN) production through a yet unrecognized mechanism [74] and is associated with human febrile illness [35]. Similarly, Mapputta serogroup viruses, such as Maprik virus and Buffalo Creek virus, do not encode NSs, but are linked to human diseases [24]. Two out of 17 orthobunyaviruses studied in the present work, Umbre virus (Turlock serogroup) and Witwatersrand virus (ungrouped orthobunyavirus), encode NSs proteins of 79 and 111 aa, respectively (previously characterized orthobunyaviruses: 62 (Group C serogroup) to 130 aa (Gamboa serogroup)). Neither of the two viruses has been associated with human disease. In contrast, known human pathogens, such as the febrile disease-causing ungrouped Tataguine virus and Guamá and Catú viruses (Guamá serogroup), do not encode NSs. These findings suggest that the presence or absence of an NSs-encoding ORF alone does not predict human pathogenicity. Additionally, the presence of an NSs-encoding ORF is far less common for orthobunyaviruses than previously thought, as only viruses from 8 out of 15 sequenced serogroups encode this nonstructural protein.
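In orthobunyaviruses that encode it, NSs is translated from a reading frame that overlaps the N ORF on the S segment, so checking for its presence amounts to scanning the alternative frames of the coding-sense sequence. The following Python sketch shows one simple way to list forward-strand ORFs per frame. It is an illustration under stated assumptions, not the ORF analysis used in the study: the input is assumed to be the coding-sense cDNA sequence, only ATG start codons are considered, and the function name `find_orfs`, the `min_codons` parameter, and the toy sequence are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 50) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for forward-strand ORFs.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    min_codons counts sense codons (start codon included, stop excluded).
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


# Toy sequence (invented): a frame-0 ORF with a shorter overlapping frame-1 ORF.
toy_s_segment = "ATGGATGGCCTAGCCTAA"
toy_orfs = find_orfs(toy_s_segment, min_codons=2)
print(toy_orfs)
```

For real S segments, a length cut-off in the range of known NSs sizes would be a more sensible choice for `min_codons` than the value of 2 used for the toy sequence.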

Characterization of M Segments: Glycoprotein Precursor Polyprotein and Gn and Gc Proteins Cleavage Products
Each orthobunyavirus M segment contains a single continuous ORF encoding a glycoprotein precursor polyprotein that is cotranslationally cleaved into glycoprotein Gn, the nonstructural protein NSm, and the glycoprotein Gc [78]. Among the polyprotein sequences derived from the sequenced M segments of our study, the Koongol virus glycoprotein precursor polyprotein (1105 aa) is notably smaller than all other glycoprotein precursors of orthobunyaviruses, which range in size from 1370 aa in the case of Mapputta virus to 1448 aa in the case of Witwatersrand virus (Table 1). A strictly conserved arginine residue of the glycoprotein precursor polyprotein sequences of all sequenced orthobunyaviruses (position 302 in the prototype Bunyamwera virus) is believed to mark the cleavage site between Gn and NSm proteins [78]. Therefore, Koongol virus Gn is comparable in size with Gn of other orthobunyaviruses, whereas its Gc and NSm proteins are notably shorter.
Regions highly similar to the fusion peptide identified in La Crosse virus glycoprotein precursor polyprotein (positions 1066-1087) are present in the predicted polyproteins of the sequenced orthobunyaviruses of all serogroups. Ten out of twenty-two La Crosse virus fusion peptide aa are strictly conserved across the genus. This finding indicates that all orthobunyavirus Gc glycoproteins have analogous functions and thereby act as class II fusion proteins [79]. Supporting this hypothesis is the finding that the overall topology of orthobunyaviral glycoprotein precursor polyprotein appears to be conserved based on the number of conserved cysteine residues (ranging from 67 in the case of Matruh virus (Tete serogroup) to 78 in the case of Guamá virus (Guamá serogroup)). Once again, Koongol virus (Koongol serogroup) is the outlier with only 57 cysteine residues [80][81][82].
N-glycosylation of viral membrane proteins plays a crucial role in correct protein folding and functioning, including receptor binding, membrane fusion, and cell-penetration processes [83]. All three predicted N-glycosylation sites of the membrane glycoproteins of Bunyamwera virus (Bunyamwera serogroup) are indeed glycosylated. Glycosylation of Bunyamwera virus Gn's N60 site is essential for correct protein folding of both Gn and Gc [84]. Interestingly, this glycosylation site is highly conserved among almost all previously sequenced orthobunyaviruses, with the notable exception of Maprik virus (Mapputta serogroup) [24]. The NetNGlyc 1.0 server predicted this glycosylation site to be present in the Gn proteins of all viruses sequenced in this study, with the exception of Lukuni virus (Anopheles A serogroup) and the ungrouped Tataguine virus. In general, glycosylation site locations were moderately conserved in orthobunyaviruses belonging to the same serogroup, but were not consistent among all viruses of the genus Orthobunyavirus. Finally, with the exception of Koongol virus, transmembrane prediction using hidden Markov models (TMHMM) 2.0 generally predicted the same distribution and type of transmembrane regions for the glycoprotein precursor polyprotein sequences of the analyzed viruses, which included two transmembrane domains in Gn, three in NSm, and one in Gc. The Koongol virus glycoprotein precursor polyprotein has four predicted transmembrane regions, lacking two domains usually located at the C-terminal half of NSm.
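Candidate N-glycosylation sites are defined by the sequon N-X-S/T, where X is any residue except proline. The Python sketch below lists such candidates in a protein sequence. It is deliberately simpler than NetNGlyc, which scores each sequon with neural networks to predict whether it is likely to be glycosylated; the sketch only locates the motif. The function name `find_sequons` and the toy peptide are hypothetical.

```python
import re
from typing import List, Tuple

# Lookahead so that overlapping sequons (e.g. N-N-T-S) are all reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based position of N, tripeptide) for each N-X-S/T motif, X != P."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


# Toy peptide, invented for illustration only.
toy_sites = find_sequons("MNGSANPTLNNTS")
print(toy_sites)
```

In the toy peptide, the N-P-T tripeptide is skipped because of the proline, while the two overlapping sequons near the C-terminus are both reported.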

Characterization of L Segments: RNA-Dependent RNA Polymerase
The L sequence aa lengths of the studied orthobunyaviruses are comparable to those of previously studied viruses, ranging from 2241 aa in the case of Lukuni virus to 2293 aa in the case of Umbre virus (Table 2). Tete group (Bahig, Matruh, and Tete viruses) L (2280-2281 aa) possesses a serogroup-characteristic 24 aa insertion (E2185-E2208) at the C terminus. Umbre and Witwatersrand virus L possess several aa at the very end of the C terminus that are not conserved among other viruses. All analyzed L sequences have the same well-conserved topology, consisting of four distinct regions with readily distinguishable RNA-dependent RNA polymerase motifs pre-A to E inside the POL III block. The proposed active-site domain SDD1163-5 of Bunyamwera virus [85] is strictly conserved among all previously and newly characterized orthobunyaviruses.

Discussion
Within the classic orthobunyavirus group, genomic sequence information was limited to viruses of 10 of 18 serogroups [15][16][17][18][19][20][21][22][23][24][25][26][27][28]. Our work expands this sequence space by adding coding-complete genomic information on viruses of an additional five serogroups. Our efforts largely confirm the relationships of the studied viruses that had been established previously by non-sequence-based (serological) techniques. Despite the high genetic diversity of the orthobunyaviruses, reflecting their wide geographic distribution and variety of ecological features, the viruses of each serogroup are grouped together within the appropriate lineage on the three phylogenetic trees. Since the members of the Bunyaviridae family possess segmented genomes, the phenomenon of segment reassortment plays a significant role in their evolution [110]. Earlier, it was shown that many members of the genus Orthobunyavirus are intra-group reassortants. Our data show that all examined viruses of the Guamá serogroup are in all likelihood genome segment reassortants. Furthermore, viruses of the Guamá and Capim serogroups form a monophyletic lineage on the tree inferred for the glycoprotein precursor polyprotein (M segment), suggesting inter-group reassortment of M segments in their natural history. Another important observation is that Witwatersrand virus and Tataguine virus are likely members of two novel species in the genus Orthobunyavirus.
Finally, our data show that the presence of an ORF encoding an NSs protein is not a universal feature of orthobunyaviruses. Among the viruses of 15 orthobunyavirus serogroups for which genomic data are now available, only viruses of eight serogroups along with Witwatersrand virus encode NSs proteins. Therefore, the presence or absence of NSs protein should not be considered as a taxonomic characteristic of the genus Orthobunyavirus, but may be important for differentiating pathogenic from nonpathogenic viruses.
Our analyses advance the overall resolution of orthobunyavirus phylogeny. Although sequence information for the 3′ and 5′ genomic segment termini could not be obtained in this study, we are confident that the phylogenetic placement of all viruses studied here will hold. Complementation of our results by genomic sequences of viruses from the remaining unsequenced orthobunyavirus serogroups should allow official taxonomic re-organization of the genus Orthobunyavirus.