Genomic Characterization of the Genus Nairovirus (Family Bunyaviridae)

Nairovirus, one of five bunyaviral genera, includes seven species. Genomic sequence information is limited for members of the Dera Ghazi Khan, Hughes, Qalyub, Sakhalin, and Thiafora nairovirus species. We used next-generation sequencing and historical virus-culture samples to determine 14 complete and nine coding-complete nairoviral genome sequences to further characterize these species. Previously unsequenced viruses include Abu Mina, Clo Mor, Great Saltee, Hughes, Raza, Sakhalin, Soldado, and Tillamook viruses. In addition, we present genomic sequence information on additional isolates of previously sequenced Avalon, Dugbe, Sapphire II, and Zirqa viruses. Finally, we identify Tunis virus, previously thought to be a phlebovirus, as an isolate of Abu Hammad virus. Phylogenetic analyses indicate the need for reassignment of Sapphire II virus to Dera Ghazi Khan nairovirus and reassignment of Hazara, Tofla, and Nairobi sheep disease viruses to novel species. We also propose new species for the Kasokero group (Kasokero, Leopards Hill, Yogue viruses), the Ketarah group (Gossas, Issyk-kul, Keterah/soft tick viruses) and the Burana group (Wēnzhōu tick virus, Huángpí tick virus 1, Tǎchéng tick virus 1). Our analyses emphasize the sister relationship of nairoviruses and arenaviruses, and indicate that several nairo-like viruses (Shāyáng spider virus 1, Xīnzhōu spider virus, Sānxiá water strider virus 1, South Bay virus, Wǔhàn millipede virus 2) require establishment of novel genera in a larger nairovirus-arenavirus supergroup.


Introduction
With over 530 members, Bunyaviridae is one of the largest virus families [1]. Bunyaviruses are characterized by single-stranded RNA genomes that typically consist of separate small (S), medium (M), and large (L) segments, all of which have complementary 3 1 and 5 1 termini. Most bunyavirus genomes

Viruses
The viruses used in this study were obtained from the World Reference Center for Emerging Viruses and Arboviruses at the University of Texas Medical Branch, Galveston, TX, USA. All of these viruses have been described before. Table 1 provides specifics about the viruses and GenBank accession numbers for all newly sequenced and re-sequenced viruses.

Genome Sequencing
Viral stocks were obtained in TRIzol LS (Invitrogen, Carlsbad, CA, USA), and RNAs were extracted using the Direct-zol™ RNA MiniPrep kit (Zymo, Irvine, CA, USA). RNAs were converted to cDNAs and amplified using sequence-independent single primer amplification as described previously [65] with some modifications to resolve the 5 1 and 3 1 ends. An oligonucleotide containing three ribonucleotides (rGTP) at the 3 1 end (GCCGGAGCTCTGCAGATATCGGCCATTAT GGCCrGrGrG) and the FR40RV-T primer [65] were added during first-strand cDNA synthesis. Open-source Cutadapt [66] and Prinseq-lite [67] were used to trim primers and remove poor quality reads, respectively. Reads were assembled into contigs using open-source Ray Meta [68]. Annotation was determined using basic local alignment search tool (BLAST) in combination with custom scripts. Contigs related to nairovirus sequences were used as references.

Phylogenetic Analysis
A set of nairovirus sequences (252 for the N gene of the S segment, 111 for the M segment, and 93 for the L segment) comprising the majority of the nucleotide (nt) sequences from GenBank available on 1 March 2016, were aligned using the CLUSTAL algorithm. Because the nairovirus sequences of all analyzed nairoviruses were so different that the alignment reached substitution saturation (no detection of signal), alignments were instead implemented at the amino acid (aa) level (using MEGA Version 5 [69]). Non-coding regions of S segments therefore had to be excluded. Additional manual editing was performed to ensure the highest possible quality of alignments. Neighbor-joining (NJ) analysis at the aa level was performed due to the observed high variability of the underlying nt sequences. The statistical significance of tree topology was evaluated by bootstrap re-sampling of the sequences 1000 times. Phylogenetic analyses were performed using MEGA Version 5.

Genomic Characterization and Phylogenetic Analysis
Consistent with the genomic organization characteristic for already sequenced nairoviral genomes, each of the 23 viral genomes sequenced during this study is comprised of three RNA segments including (a) a small (S) segment encoding the NP and, in an ambisense orientation, a non-structural protein (NSs); (b) a medium (M) segment encoding a GPC; and (c) a large (L) segment encoding an RNA-dependent RNA polymerase. Fourteen nairovirus genomes were completely characterized. The 3 1 terminal sequences were obtained for 57 segments, and the 5 1 terminal sequences were obtained for 51 segments (Table 1). For most of the viral genomes sequenced in this study, the nine most terminal nucleotides of each segment were identical to those previously reported for nairoviruses (3 1 segment terminus AGAGUUUCU and 5 1 segment terminus AGAAACUCU) [1,3]. However, the The phylogenetic placement of the newly sequenced viruses is largely consistent with their previous serological classification, including recent amendments [42] (Table S1). However, Hazara virus (HAZV) and Tofla virus (TOFV) clustered with each other but not with CCHFV and, therefore, should not be classified in the species Crimean-Congo hemorrhagic fever nairovirus. Likewise, both Kupe virus (KUPEV) and Nairobi sheep disease virus (NSDV) did not cluster with Dugbe virus (DUGV), and, therefore, should be removed from the species Dugbe nairovirus and re-assigned to new species (here proposed as "Hazara nairovirus" (HAZV, TOFV)) and "Nairobi sheep disease virus" (NSDV), respectively). Ganjam virus (GANV) is clearly identified as an isolate of NSDV. Our analysis confirm that Leopards Hill virus (LPHV), Kasokero virus (KAS(O)V), and Yogue virus (YOGV) form a novel nairovirus genogroup (proposed species "Kasokero nairovirus"), as do Keterah virus (KRTV) and Issyk-kul virus (ISKV) (proposed species "Keterah virus") [29,42]. The recently described soft tick bunyavirus [38] is identified as an isolate of KRTV. Genetic characterization of TUNV clearly demonstrates that this virus is a nairovirus and not a phlebovirus as previously described by serological analysis [63]. The TUNV genome represents an isolate of AHV, indicating necessary classification into the species Dera Ghazi Khan nairovirus. Another new species, proposed to be named "Burana nairovirus" should be established for Wēnzhōu tick virus, Huángpí tick virus 1, and Tȃchéng tick virus 1. Finally, the phylogenetic trees demonstrate that several nairo-like viruses with three (Shāyáng spider virus 1, Xīnzhōu spider virus (XSV), Sānxiá water strider virus 1 (SWSV-1), South Bay virus (SBV)) or two genomic segments (SBV, Wȗhàn millipede virus 2) should not be classified in the genus Nairovirus.
Although genomic segment reassortment has been found very frequently among CCHFV strains and lineages [84][85][86][87][88], we were unable to detect any instance of reassortment among the other nairoviruses using RDP, Bootscan, MaxChi, LARD and Phylip Plot. Phylogenetic incongruence was only detected in the case of HAZV: whereas the HAZV M and N open reading frames (ORFs) cluster together with those of NSDV/KUPEV, the HAZV L ORF does not. However, given the genetic distance between these sequences, whether this distance is the result of reassortment or saturation of the phylogenetic signal is not clear.
In the case of CCHFV NP, two positively charged regions are responsible for RNA binding [92]. One region forms a large positively charged crevice (residues K339, K343, K346, R384, K411, H453, and Q457), of which two residues contribute to a conserved nairovirus motif (EH 453 (L/M), (L/F)HQ 457 ). The other region is delineated by residues R134, R140, and Q468. The CCHFV NP stalk region also contains a positively charged region consisting of residues R195, H197, K222, R225, R282, and R286. Only three of the positively charged residues (R134, K222 and K343) are absolutely conserved among all nairovirus NPs, although most of the substitutions observed maintain the overall hydrophobicity profile.
CCHFV NP interacts with the antiviral defense factor MxA [95] and the apoptosis mediator caspase-3 [96]. Thus, we expected some degree of conservation of the NP areas mediating those functions. Using a CCHFV minireplicon system [89], three separate NP residues (K132, Q300, and K411) were identified to be essential for replicon activity, and mutation of another two residues (K90 and H456) resulted in significantly reduced NP functionality. However, only H456 and Q300 are completely conserved among nairovirus NPs.
Because protein structure is evolutionarily conserved to a higher degree compared to the primary aa sequence, we used homology modeling principles and techniques to identify conserved structures among nairovirus proteins. We used the Phyre2 (Protein homology/analogy recognition engine) server to model the structure of the proteins and to align remotely related sequences based on hidden Markov models (HMMs) (Figure 4).
In the case of CCHFV NP, two positively charged regions are responsible for RNA binding [92]. One region forms a large positively charged crevice (residues K339, K343, K346, R384, K411, H453, and Q457), of which two residues contribute to a conserved nairovirus motif (EH453(L/M), (L/F)HQ457). The other region is delineated by residues R134, R140, and Q468. The CCHFV NP stalk region also contains a positively charged region consisting of residues R195, H197, K222, R225, R282, and R286. Only three of the positively charged residues (R134, K222 and K343) are absolutely conserved among all nairovirus NPs, although most of the substitutions observed maintain the overall hydrophobicity profile.
CCHFV NP interacts with the antiviral defense factor MxA [95] and the apoptosis mediator caspase-3 [96]. Thus, we expected some degree of conservation of the NP areas mediating those functions. Using a CCHFV minireplicon system [89], three separate NP residues (K132, Q300, and K411) were identified to be essential for replicon activity, and mutation of another two residues (K90 and H456) resulted in significantly reduced NP functionality. However, only H456 and Q300 are completely conserved among nairovirus NPs.
Because protein structure is evolutionarily conserved to a higher degree compared to the primary aa sequence, we used homology modeling principles and techniques to identify conserved structures among nairovirus proteins. We used the Phyre2 (Protein homology/analogy recognition engine) server to model the structure of the proteins and to align remotely related sequences based on hidden Markov models (HMMs) (Figure 4).  In addition, upon identification of a conserved nairovirus protein structure, we performed mutational sensitivity analysis using the Disease-Susceptibility-based SAV Phenotype Prediction (SuSPect) tool [97], to predict whether missense mutations in a nairovirus protein are likely to functionally affect structure.
Finally, the mutational sensitivity score was plotted for each position against the average percent identity for that position in the nairovirus protein alignment. We assumed that if a mutation would have an influence on the structure of a conserved domain, the frequency of that mutation would be diminished. In summary, we expected that the positions with a higher score in the mutational sensitivity analysis should directly correlate with positions of higher conservation. The validity of this approach was tested initially with nairovirus NP sequences ( Figure 5A,B). As expected, all nairovirus NPs were found to be structurally homologous to CCHFV NP, and the positional analysis revealed a direct correlation between positions of high mutational sensitivity with those of high identity ( Figure 5A). In addition, upon identification of a conserved nairovirus protein structure, we performed mutational sensitivity analysis using the Disease-Susceptibility-based SAV Phenotype Prediction (SuSPect) tool [97], to predict whether missense mutations in a nairovirus protein are likely to functionally affect structure.
Finally, the mutational sensitivity score was plotted for each position against the average percent identity for that position in the nairovirus protein alignment. We assumed that if a mutation would have an influence on the structure of a conserved domain, the frequency of that mutation would be diminished. In summary, we expected that the positions with a higher score in the mutational sensitivity analysis should directly correlate with positions of higher conservation. The validity of this approach was tested initially with nairovirus NP sequences ( Figure 5A,B). As expected, all nairovirus NPs were found to be structurally homologous to CCHFV NP, and the positional analysis revealed a direct correlation between positions of high mutational sensitivity with those of high identity ( Figure 5A). Interestingly, we identified two other nairovirus NP domains that were structurally similar to other known, non-viral structures. The first domain, at approximate position 150-200 of the nairovirus alignment, is similar to the globular tail of myosin-4 motor protein (type V myosin; confidence 60%; Protein Data Bank (PDB) ID: 3mmi). Myosin-4 motor protein is a monomeric myosin with motility uniquely adapted to transport mRNA [98]. Interestingly, SuSPect analysis suggested that the domain was sensitive to aa changes ( Figure 5B). The second domain, located at the C-terminal NP domain (approximately at positions 460-481), is similar to the structures of the cholesterol-binding toxins intermedilysin (confidence 35%; PDB: 1s3r), perfringolysin (PDB: 1pfo) Interestingly, we identified two other nairovirus NP domains that were structurally similar to other known, non-viral structures. The first domain, at approximate position 150-200 of the nairovirus alignment, is similar to the globular tail of myosin-4 motor protein (type V myosin; confidence 60%; Protein Data Bank (PDB) ID: 3mmi). Myosin-4 motor protein is a monomeric myosin with motility uniquely adapted to transport mRNA [98]. Interestingly, SuSPect analysis suggested that the domain was sensitive to aa changes ( Figure 5B). The second domain, located at the C-terminal NP domain (approximately at positions 460-481), is similar to the structures of the cholesterol-binding toxins intermedilysin (confidence 35%; PDB: 1s3r), perfringolysin (PDB: 1pfo) and pneumolysin (PDB: 4qqq), but SuSPect analysis did not show mutational sensitivity at these positions (data not shown).

Medium (M) Segment-Glycoprotein Precursor
ORF analysis of 36 sequenced nairovirus M segments generally yielded single unit polyprotein-encoding ORFs in each case (two units in the case of "Burana nairovirus" M segments). The predicted masses of the encoded unmodified polyproteins, the GPCs, ranged from 143 kDa (Thiafora virus (TFAV)) to 187 kDa (CCHFV). Each nairovirus polyprotein was approximated, via software modeling, to contain a signal peptide at the N-terminus. Experimental evidence garnered from CCHFV GPC processing [99][100][101] was used for functional element assignment in predicted nairovirus GPCs. Cleavage motifs for signal peptidases, furin (RSKR), subtilase SKI-I/S1P-like (RRLL and RKLL), and an unknown convertase that cleaves at the aa sequence RKPL, are highly conserved across CCHFV isolates and are critical post-translational motifs in the viral lifecycle [99][100][101][102]. Interestingly, the RSKR motif appears to be unique to CCHFV. The RKLL motif for SKI-I/S1P protease is conserved among nairovirus GPCs and is found in most members of all established and putative nairoviruses except in AVAV, DUGV, ERVEV, KAS(O)V, KUPEV, NSDV/GANV, and YOGV. The RKLL motif is not confined to any specific region/domain and occurs throughout the nairovirus GPCs-in some instances more than once (e.g., DUGV and members of Qalyub nairovirus and "Keterah nairovirus"). The RRLL motif is the second most prevalent cleavage site with the exception of viruses belonging to "Keterah nairovirus" and Qalyub nairovirus, which do not contain the motif. The RKPL and RRLL motifs are also conserved across nairoviruses of many species.
Using the four cleavage motifs (RSKR, RKPL, RRLL, RKLL), the nairovirus GPs stemming from GPC processing were predicted and annotated numerically, starting at the N-termini and depicted as colored arrows ( Figure 6). We predicted the synthesis of two to five GPs depending on the examined nairovirus (Table S2 and Figure 6). To further explore GPC processing phenotypes, proprotein convertase prediction software was used to identify additional cleavage patterns. The most prevalent motifs predicted with a high degree of probability to mediate post-translation cleavage along the GPC were R-X-X-L, G-X-X-R, Q-X-X-C, and R-X-X-K, (data not shown and Figure 6). Other predicted motifs (with varying degrees of probability) are shown in sky blue and pink boxes within Figure 6.
Other post-translational modifications, such as, Nand O-linked glycosylations, were predicted for all analyzed nairoviruses using software modeling (NetOGlyc 4.0, NetNGlyc 1.0). Averages of N-linked glycosylation sites between viruses of different species ranged from seven to twenty sites along polyproteins. The most N-linked glycosylations were found for members of the "Keterah nairovirus" and Sakhalin nairovirus species, whereas the members of the Dugbe and Hazara nairovirus species had the fewest. The extent of nairovirus glycoprotein O-linked glycosylation ranged from four to over 130 sites (Table S2). The fewest O-glycans were predicted for members of the Hughes nairovirus and Dera Ghazi Khan nairovirus species, ranging from four to 17 sites. Intermediate O-linked glycosylation was predicted for Hazara, Dugbe, Qalyub, Sakhalin, and Thiafora nairovirus species members, with averages that ranged from 29 to 48 sites. The highest number of O-glycans (100 and 135 sites, respectively), were predicted for viruses of the Crimean-Congo hemorrhagic fever nairovirus and "Keterah nairovirus" species. Relatively few sites were predicted towards the C-termini of nairovirus GPC, with the exception of the "Keterah nairovirus" species members, which were predicted to contain a small cluster of O-linked sites between residues 994 and 1033.
Heavily O-glycosylated GPCs were typically characterized by site clustering towards the N-termini. O-glycosylated members of the Dugbe, "Keterah," and Qalyub nairovirus species contain RKPL and RKLL proteolytic motifs in these areas. If used by proteases, these motifs could mediate the production of separate, stand-alone peptides that are O-glycosylated. Such peptides have been identified in the cases of CCHFV and DUGV as mucin-like domains (MLDs) [103]. Additionally, viruses of the "Leopards Hill nairovirus" species encode GPC with shorter regions of O-linked glycosylation clustered towards the N-termini in the vicinity of predicted proteolytic cleavage sites. Of nairovirus genus members containing regions of O-glycosylation/MLDs, unmodified averaged masses ranged from eight to 37 kDa. Notably, of the GPC of seven viruses belonging to the Hughes nairovirus species, none were predicted to be O-glycosylated. masses ranged from eight to 37 kDa. Notably, of the GPC of seven viruses belonging to the Hughes nairovirus species, none were predicted to be O-glycosylated.  We also analyzed the nairovirus GPC for the occurrence, location, and topology of transmembrane regions using software modeling (TopPred2). The number of TMDs varied between one and five domains. Interestingly, viruses of the Qalyub nairovirus species had the fewest transmembrane regions in their glycoprotein precursors (e.g., QYBV has only one such region). By comparison, CCHFV and DUGV GPCs have five TMDs. All nairovirus GPCs were predicted to have at least a single conserved C-terminal transmembrane region approximately 40-60 residues prior to the C-termini of the GPC ( Figure 6 and Table S2).

Large (L) Segment-RNA-Dependent RNA Polymerase
Nairovirus RNA-dependent RNA polymerases (Ls) are substantially larger than other bunyavirus L homologs. All nairovirus L sequences maintain the characteristic RNA-dependent RNA polymerase core motifs described by Poch [104] and Muller et al. [105] comprising residues 2361-2669 of the L gene alignment (domain A), and therefore include the polymerase module pre-A motif through motif E (Figure 7). We also analyzed the nairovirus GPC for the occurrence, location, and topology of transmembrane regions using software modeling (TopPred2). The number of TMDs varied between one and five domains. Interestingly, viruses of the Qalyub nairovirus species had the fewest transmembrane regions in their glycoprotein precursors (e.g., QYBV has only one such region). By comparison, CCHFV and DUGV GPCs have five TMDs. All nairovirus GPCs were predicted to have at least a single conserved C-terminal transmembrane region approximately 40-60 residues prior to the C-termini of the GPC (Figure 6 and Table S2).

Large (L) Segment-RNA-Dependent RNA Polymerase
Nairovirus RNA-dependent RNA polymerases (Ls) are substantially larger than other bunyavirus L homologs. All nairovirus L sequences maintain the characteristic RNA-dependent RNA polymerase core motifs described by Poch [104] and Muller et al. [105] comprising residues 2361-2669 of the L gene alignment (domain A), and therefore include the polymerase module pre-A motif through motif E (Figure 7). Moreover, inter-motif regions are moderately conserved, suggesting structural constrains on their three-dimensional arrangements. The invariant sequences DXX KW and SDD of motifs A and C, respectively, may have metal-binding activities necessary for catalytic functions [106,107] ( Figure S1). A phylogenetic analysis of the nairovirus and nairo-like virus core polymerase modules with the corresponding regions of other bunyaviruses (hantaviruses, orthobunyaviruses, phleboviruses), mammarenaviruses, and orthomyxoviruses is shown in Figure 8.
Our analysis demonstrates that the nairovirus and nairo-like virus RNA-dependent-RNA polymerase core domain is more closely related to the arenavirus domain than to any other bunyaviral domain, supporting the existence of an arenavirus-nairovirus supergroup. Moreover, inter-motif regions are moderately conserved, suggesting structural constrains on their three-dimensional arrangements. The invariant sequences DXX KW and SDD of motifs A and C, respectively, may have metal-binding activities necessary for catalytic functions [106,107] (Figure S1). A phylogenetic analysis of the nairovirus and nairo-like virus core polymerase modules with the corresponding regions of other bunyaviruses (hantaviruses, orthobunyaviruses, phleboviruses), mammarenaviruses, and orthomyxoviruses is shown in Figure 8.
Our analysis demonstrates that the nairovirus and nairo-like virus RNA-dependent-RNA polymerase core domain is more closely related to the arenavirus domain than to any other bunyaviral domain, supporting the existence of an arenavirus-nairovirus supergroup. CCHFV encodes a deubiquitinase (DUB) of the OTU family, which unlike eukaryotic OTU DUBs, also targets interferon-stimulated gene 15 (ISG15) modifications [108,109]. The catalytic motifs characteristics of OTU-like cysteine proteases are clearly detected in all nairoviruses, but not in any of the nairo-like viruses. The role of the CCHFV OTU-like cysteine protease in the cleavage of host cell proteins, and specifically on the ubiquitin-and ISG15-dependent innate immune response, has been widely investigated [108,109]. Several OTU-characteristic residues (P35, D37, G38, C40, Y89, W99, W119, and H151) are highly conserved among all nairoviruses ( Figure S7-2). Interestingly, these residues are not conserved in any of the nairo-like viruses [110]. P35, D37, G38, C40, and H151 are part of OTU's catalytic site ( Figure S2). Y89 is the key aa residue of a conserved site resembling a topoisomerase motif (SXXXY), but its serine residue is not conserved among nairoviruses. Interestingly, the conserved site Y89 is located very close to P77, E78, and R80, which are key residues for the interaction with ubiquitin and ISG15 [111]. Moreover, this area of the nairovirus OTU domain is structurally similar to the catalytic domain of a transferase-like protein (PDB ID: 1k98; methionine synthase protein, confidence 60.8%) and to the DNA/RNA-binding 3-helical bundle fold of "winged CCHFV encodes a deubiquitinase (DUB) of the OTU family, which unlike eukaryotic OTU DUBs, also targets interferon-stimulated gene 15 (ISG15) modifications [108,109]. The catalytic motifs characteristics of OTU-like cysteine proteases are clearly detected in all nairoviruses, but not in any of the nairo-like viruses. The role of the CCHFV OTU-like cysteine protease in the cleavage of host cell proteins, and specifically on the ubiquitin-and ISG15-dependent innate immune response, has been widely investigated [108,109]. Several OTU-characteristic residues (P35, D37, G38, C40, Y89, W99, W119, and H151) are highly conserved among all nairoviruses ( Figure S7-2). Interestingly, these residues are not conserved in any of the nairo-like viruses [110]. P35, D37, G38, C40, and H151 are part of OTU's catalytic site ( Figure S2). Y89 is the key aa residue of a conserved site resembling a topoisomerase motif (SXXXY), but its serine residue is not conserved among nairoviruses. Interestingly, the conserved site Y89 is located very close to P77, E78, and R80, which are key residues for the interaction with ubiquitin and ISG15 [111]. Moreover, this area of the nairovirus OTU domain is structurally similar to the catalytic domain of a transferase-like protein (PDB ID: 1k98; methionine synthase protein, confidence 60.8%) and to the DNA/RNA-binding 3-helical bundle fold of "winged helix" DNA-binding domain proteins. Interestingly, the region in the methionine synthase protein identified as structurally similar to the nairovirus OTU domain is the binding domain for vitamin B12. As expected, positional analysis revealed a correlation between positions with high mutational sensitivity and high identity ( Figure 5C). The presence of the conserved OTU motif in all nairovirus Ls and the structural similarities between topoisomerases and strand-specific recombinases could indicate a role of the OTU domain in RNA strand manipulation.
The majority of nairoviruses pathogenic to humans (CCHFV, DUGV virus, ERVEV, ISKV, KAS(O)V, and NSDV) or other mammals (NSDV/GANV) contain an identifiable topoisomerase I active site motif, whereas most nairoviruses without that domain are only known to infect arthropods or to establish subclinical infections in vertebrates. The deubiquitinylation and deISGylation activities of CCHFV L have been proposed as a mechanism of virus evasion from the innate immune response via efficient interference with antiviral signaling pathways mediated by nuclear factor (NF)-kB, interferon regulatory factor 3 (IRF3), and type 1 interferon (IFN-α/β) [108,109]. These pathways rely on protein ubiquitinylation for their activation-one outcome is the modification of the pathway factor with ISG15. Thus, researchers posit that the CCHFV OTU domain might be an important virulence determinant, as some differences were observed in the functionality of the domain in very virulent CCHFV and the less virulent DUGV [112]. The observation of a potential structural and phylogenetic difference between the nairovirus OTUs of pathogenic and non-pathogenic nairoviruses is therefore intriguing.
Position-specific iterated (PSI)-BLAST searches had previously demonstrated that the region between the nairovirus OTU-like cysteine protease domain and the core region of the RNA-dependent polymerase (domain A) is conserved [42,113] (Figure 7). A region, including the C 2 H 2 -type zinc finger domain and several aa positions conserved among all RNA-dependent RNA polymerase modules of segmented negative-sense RNA viruses, was named domain B [106,107] (Figure 7 and Figure S3). Interestingly, this area of L is structurally similar to an oxidoreductase (PDB: 1mv8, a GDP-mannose 6-hydrogenase; positions 328-433; confidence 86.5%: identity 64%). Although several aa residues essential for the proper structure and function of this domain are conserved among nairovirus and other bunyavirus Ls, no correlation between mutational sensitivity and conservation was detected (data not shown). Downstream, Phyre2 analysis identified an area with similarity to the mammalian suppressor of yeast Sec 4 (Mss4)-like superfamily of proteins, of the family Rab guanine nucleotide exchange factor (GEF) Mss4 (PDB: 2fu5, confidence 67.4%, identity 44%). Interestingly, this area also contains several aa residues that are highly conserved among all nairoviruses (Figure 7 and Figure S3). However, no correlation with mutational sensitivity was detected (data not shown).
A region immediately upstream of the core polymerase region is highly conserved among arenaviruses and bunyaviruses (domain C) (Figure 7 and Figure S4; residues conserved among families are highlighted in orange in the figure) [114]. In the case of CCHFV, domain C includes a leucine zipper motif, which is found in all nairovirus genomes. Leucine zipper domains are a common three-dimensional structural motif of transcription factors, characterized by a periodic repetition of leucine residues at every seventh position over a distance covering eight helical turns [115]. The polypeptide segments containing these periodic arrays of leucine residues were proposed to exist in an alpha-helical conformation, and the leucine side chains from one alpha helix interdigitate with those from the alpha helix of a second polypeptide, facilitating dimerization. Basic-region leucine zippers (bZIPs) are a class of eukaryotic transcription factors including leucine zipper domains of 60 to 80 residues in length with highly conserved DNA binding basic regions [116]. The nairovirus leucine zipper domain surrounds a highly conserved NRRQ domain in the center of the 7 moderately conserved leucine positions. Of note, this motif is also structurally similar to human immunodeficiency virus 1 Nef (PDB: 2xi1, positions 97-110, confidence 11%, identify 43%), which includes the functional conserved motif XR [117,118]. Interestingly, structural analysis of nairovirus domain C revealed a similarity to RNA-binding an endoribonuclease VapD (PDB: 3ui3, positions 69-132, confidence 45.6%, identity 40%).
Finally, domain D, which has been previously described only for orthobunyaviruses (Figure 7 and Figure S5; residues conserved among family members are highlighted in orange in the figure), could also be identified as a conserved feature of nairovirus Ls, but structural similarities could not be detected.
Broadening the bunyaviral sequence space to encompass as many bunyaviruses and bunya-like viruses as possible is necessary to elucidate their true phylogenetic relationships and to establish a modern comprehensive bunyaviral taxonomy that adequately reflects evolution. Here we reported 23 bunyaviral genome sequences (Table 1) with the goal of understanding the relationships of nairoviruses and nairo-like viruses-a group of bunyaviruses for which almost no sequence information was available until very recently [29,42]. While we confirmed the overall monophyly of the current genus Nairovirus, our results indicated that (a) novel species will have to be established; (b) that some nairoviruses will have to be moved between species; (c) that a long-thought phlebovirus is actually a nairovirus; and (d) that several newly discovered bunyaviruses do not directly fall into the nairovirus clade but are nevertheless more closely related to nairoviruses than to other bunyaviruses (Figures 1-3). We propose here that in the absence of at least coding-complete genomic information, "bunyaviruses" should not be classified into bunyaviral taxa but rather be seen as putative members of the family Bunyaviridae. Applying this stringent criterium to the list of "nairoviruses" (Table S1), we propose a new taxonomic organization for the genus Nairovirus in Table 2. This organization is overall in line with a similar, very recent, proposal by Walker et al. [42].
Coding-complete genome sequences of at least five additional nairoviruses (Burana virus, Caspiyi virus, Chim virus, Geran virus, and Tamdy virus) have been determined but are not yet available for analysis [30][31][32][33][34] (Table S1). Therefore, the taxonomy proposed in Table 2 should be considered preliminary until these sequences become available and incorporated. In particular, the Burana virus and Tamdy virus sequences may help to further refine the Burana and Qalyub genogroups.
Genomic reassortment (i.e., the relatively free swapping of S, M, and L segments, between taxonomically diverse bunyaviruses simultaneously infecting the same host and consequently resulting in novel viruses) has been described for orthobunyaviruses, phleboviruses, and tospoviruses [127]. Interestingly, our analysis did not reveal any signs of inter-nairovirus reassortment. This result suggests that distinct nairoviruses may rarely have the opportunity in nature to infect the same host at the same time or that molecular-biological constraints prevent inter-nairovirus reassortment. In vitro experiments should be performed to evaluate these hypotheses. Table 2. Proposed new taxonomy of the genus Nairovirus based on genomic data. Viruses mentioned in Table S1 but not here ought to be considered putative nairoviruses that based on current data cannot/should not be classified. Proposed new taxa are highlighted in red and placed in quotation marks.

Species Virus Members
"Burana Nairovirus" Two recent studies on arthropod samples revealed the existence of at least five bunyaviruses (SWSV-1, Shāyáng spider virus 1, SBV, Wȗhàn millipede virus 2, and XSV) that appeared to be closely related to nairoviruses [37,41]. Interestingly, SBV and Wȗhàn millipede virus 2 genomes were found to consist of only two segments, rather than the bunyavirus/nairovirus-typical three segments. Our phylogenetic analyses (Figures 1-3 and Figure 8) indicate that all five viruses should not be classified in the genus Nairovirus, but confirm that these viruses are more closely related to nairoviruses than to all other bunyaviruses. To further resolve phylogenetic placement of these five viruses, we performed PAirwise Sequence Comparison (PASC) of bunyavirus sequences using the National Center for Biotechnology Information's (NCBIs) PASC tool. Histograms from this analysis demonstrates the distribution of the number of virus genome pairs at each identity percentage. Histogram peaks and valleys can be used to differentiate taxon ranks and to establish taxon demarcation criteria using objective criteria [128,129]. A preliminary PASC analysis with representative bunyaviral genomes indicate that cut-offs of «26% and 31%-34% identity for the M and L segments, respectively, ought to be used to uphold the current bunyaviral division into Hantavirus, Nairovirus, Orthobunyavirus, Phlebovirus, and Tospovirus genera (data not shown). Using these cut-offs, PASC confirms the monophyly of the genus Nairovirus as outlined in Table 2 and indicates the need to establish four nairovirus-like genera for (1) SBV; (2) SWSV-1 and XSV; (3) Shāyáng spider virus 1; and (4) Wȗhàn millipede virus 2 (Figures 1-3; Figure 8: nairoviruses in blue and nairo-like viruses in purple). Interestingly, phylogenetic analyses also indicate that the bisegmented arenaviruses, currently not part of the Bunyaviridae, are more related to these viruses than they are to other bunyaviruses, suggesting the need to establish an arenavirus/nairovirus supergroup within the family (Figure 8).

Supplementary Materials:
The following are available online at www.mdpi.com/link, Table S1: All known and putative nairoviruses and nairo-like viruses and their deduced relationships prior to this study; Table S2: M-segment polyprotein descriptions. Figures S1 through S5: conserved features of nairovirus RNA-dependent RNA polymerases (Ls).