Unmapped RNA Virus Diversity in Termites and Their Symbionts

Despite their ecological importance, nothing is known about the diversity and abundance of RNA viruses in termites (Termitoidae). We used a metatranscriptomics approach to determine the RNA virome structure of 50 diverse species of termite that differ in both phylogenetic position and colony composition. From these samples, we identified 67 novel RNA viruses, characterized their genomes, quantified their abundance and inferred their evolutionary history. These viruses were found within or similar to those from the Togaviridae, Iflaviridae, Polycipiviridae, Flaviviridae, Leviviridae, Narnaviridae, Mitoviridae, Lispivirdae, Phasmaviridae, Picobirnaviridae and Partitiviridae. However, all viruses identified were novel and divergent, exhibiting only 20% to 45% amino acid identity to previously identified viruses. Our analysis suggested that 17 of the viruses identified were termite-infecting, with the remainder likely associated with the termite microbiome or diet. Unclassified sobemo-like and bunya-like viruses dominated termite viromes, while most of the phylogenetic diversity was provided by the picobirna- and mitovirus-like viruses. Of note was the identification of a novel flavi-like virus most closely related to those found in marine vertebrates and invertebrates. Notably, the sampling procedure had the strongest association with virome composition, with greater RNA virome diversity in libraries prepared from whole termite bodies than those that only sampled heads.


Introduction
Invertebrates, particularly insects, are the most diverse lineage of animals [1]. This diversity is likely to be paralleled by the diversity of their viromes, rendering invertebrates a potentially rich resource of novel viruses. The metatranscriptomic (i.e., total RNA sequencing) study of a variety of invertebrate taxa has revealed that these organisms harbor an enormous diversity of RNA viruses, currently accounting for~36% of the complete RNA virus genomes in the RefSeq database [2]. Not only are invertebrate RNA viruses diverse, but the viromes of individual species are often complex with levels of abundance that are often far higher than those seen in vertebrate species [3,4]. Despite this metagenomic revolution, the current sample of the virosphere remains fragmentary, leading to substantial gaps in our understanding of RNA virus diversity, evolution and ecology. Evidently, a simple first step in changing this picture is to characterize more of those RNA viruses present kit (Illumina, San Diego, CA, USA). Libraries were sequenced on an Illumina HiSeq 2500 platform (100 bp paired-end reads) (Illumina, San Diego, CA, USA). The Sequence Read Archive (SRA) accession numbers for these libraries are: SRR8924822-SRR8924831.
The second set of termites comprised 44 samples, of which 10 produced libraries containing credible RNA viruses (libraries 1, 2, 5-10, [16][17]. These were collected as described in Bucek et al. [7] from various geographic locations (Supplementary Table S1) and are designated as "head-only" libraries. Again, termites were collected and stored at −80 • C until RNA extraction. Rather than utilizing whole body tissue samples, the heads of 2-15 termites were removed and RNA extracted using a standard phenol-chloroform procedure with TRIzol reagent (Life Technologies, Grand Island, NY, USA). RNA samples were poly(A)+ enriched and cDNA libraries were sequenced using an Illumina HiSeq 2500 platform (125 bp paired-end reads) (Illumina, San Diego, CA, USA). The SRA accession numbers for these libraries are: SRR12736817, SRR9968561-SRR9968586, SRR9968593-SRR9968607, SRR9968609, and SRR9968615 (Supplementary Table S1).

Assembly and Virus Identification
Sequenced reads were trimmed using Trimmomatic and de novo assembled using Trinity (2.5.1) [25]. BLASTn (2.6.0) and diamond BLASTx (0.9.10) [26] searches on contigs were used to identify contigs that hit to viruses. These putative contigs were then manually parsed to identify viruses. Only contigs with credible, significant BLAST hits (e-value < 1 × 10 −5 ) and without non-viral CD-search [27] hits were retained. GeneMark.hmm PROKARYOTIC v3.26 [28] was used to predict virus open reading frames (ORFs), the amino acid sequences of which were then extracted. To estimate virus abundance in each library, the Expectation-Maximization (RSEM; 1.3.0) [29] tool was used to determine the number of transcripts per million (TPM) for all contigs. Detailed descriptions of the final set of sequences with sufficient identity to viruses that we refer to here as "viruses", including sequenced genome length, abundance and virus group assignment, are provided in Supplementary Table S2. The consensus sequences of all novel viruses identified here have been submitted to GenBank and assigned accession numbers MW052060-MW052147 and are also described in Supplementary Table S2.

Virus Abundance Measurements
The abundance level of each virus was estimated as the percentage of viral reads from the total read count. Abundances greater than 0.01% were arbitrarily considered as representing a "high" virus abundance, values between 0.01% and 0.001% were considered as "moderate" abundance, while those less than 0.001% were considered as "low" abundance. To reduce the impact of false-positives due to index-hopping, we assumed that viruses were the result of contamination from another library if the total read count was less than 0.1% of the highest count for that virus among the other libraries. This led to the removal of one putative Pafsystermes virus (326 reads in library 20 compared to~180,000 in libraries 14 and 15).

Viruses Excluded from the Analysis
A number of potential virus sequences were excluded: contaminating RNA viruses, likely endogenous RNA viruses and all hits to DNA viruses. Contaminating RNA viruses were bovine viral diarrhea virus 1 and human picobirnavirus. Sequences with significant BLAST hits but lacking a complete genome in high abundance and containing indels that interrupt the ORF predictions were considered to be endogenous and excluded. In particular, almost all samples contained several sequences with similarity to viruses in the Mono-Chu group of RNA viruses that appeared to be endogenous: Hubei chuvirus, Hubei chuvirus 4, Hubei oodonate virus 11, Hubei rhabdo-like virus 3, Lampyris noctiluca chuvirus-like virus 1, Wuchan romanomermis nematode virus 3 and Hubei orthoptera virus 5. The majority of these sequences had no significant full length (or close to full length) hits to the National Center for Biotechnology Information (NCBI) conserved domain database (CDD) and/or no gene-sized ORFs. Because of the difficulty in distinguishing DNA viruses from host genes, particularly those with double-stranded genomes, these were similarly excluded.

Phylogenetic Analysis
All viral ORFs containing sequences corresponding to the virus RNA-dependent RNA polymerase (RdRp) were used to infer maximum likelihood phylogenetic trees based on their amino acid sequences. For ease of analysis, prior to the sequence alignment, RdRp amino acid sequences were placed into the following groups defined previously [4]: the Flaviviridae, Hepe-Virga, Bunya-Arena, Mono-Chu, Luteo-Sobemo, Picorna-Calici, Narna-Levi and Partiti-Picobirna groups. Multiple sequence alignments were constructed using MAFFT (v7.402) [30] with 1000 iterative refinements. Alignments were then trimmed using TrimAL (v1.4.1) [31] to remove poorly aligned sequence positions. Two phylogenetic trees were then estimated for each group of viruses: (i) An initial phylogeny to place the termite viruses in their overall context was inferred encompassing the entire virus group as described above, and (ii) a second more specific tree was made for all clades of termite viruses and their closest relatives. Viruses with replicase sequences sharing >95% amino acid similarity were considered to be members of the same species.

Statistical Analysis of Abundance and Diversity
The R packages Vegan (v2.5-6) [36] and Phyloseq (1.30.0) [37] were used to calculate alpha-and beta-diversity statistics for these viromes. The multcomp (v1.4-13) [38] package was used for statistical tests, and ggplot2 (v3.3.0) [39] was used for data visualization. Analysis of variance (using a Chi-square test) of generalized linear models was used to check for significant relationships between termite taxonomy, sampling procedure, termite diet and source country and a variety of ecological measures (total virus abundance and virome diversity and richness).

Statistical Analysis of Abundance and Diversity
Virus names were chosen to provide meaningful information on the host sample in each case, while simultaneously being both succinct and distinctive as recommended by the International Committee on Taxonomy of Viruses (ICTV). Accordingly, all virus names were constructed with a unique, random prefix combined with the "-systermes" suffix-"-sys-", from Greek, meaning "with", and "-termes", from Latin, meaning "woodworm" and the root of the word termite. We therefore use "-systermes" to convey that the named virus was obtained from within termites and/or the community of organisms associated with termites.

Identification and Annotation of RNA Viruses
We collated samples from 50 species of termite, each of which contained 2 to 25 individual termites. From these samples, we constructed 54 RNA sequencing libraries, with most species represented by a single library (Supplementary Table S1). Libraries produced via the head-only procedure enabled a better association of viruses with the termite host, although they generated fewer data. Across both procedures, the 54 samples yielded an average of 42 million and 34 million sequencing paired-reads, which we reconstructed into an average of 497,862 and 166,012 contigs, for the whole-body and head-only libraries, respectively.

RNA Virus Diversity in Termites
Overall, we identified 67 novel viruses (i.e., novel sequences with virus identity) from 20 sequencing libraries ( Figure 1, Table 1 and full details in Supplementary Table S2), which could be placed into 9 phylogenetic groups ( Figure 2). As many viruses had close relatives to taxonomically unassigned viruses determined by Shi et al. [4], we used similar groupings of virus families to aid identification. In total, 33 positive-sense single-stranded (ss) RNA virus genomes were identified and assigned to six groups based on phylogenetic analysis: the Hepe-Virga, Luteo-Sobemo, Picorna-Calici, Narna-Levi and Flaviviridae groups [4]. Similarly, four negative-sense ssRNA virus genomes were recovered and assigned to the Mono-Chu and Bunya-Arena groups, while 30 double-stranded (ds) RNA viruses were identified and assigned to the Partiti-Picobirna group. Within these major phylogenetic groups, the 67 viruses identified here were assigned to 21 clades that enabled a more specific classification and interpretation (Figures 3-7).
Viruses 2020, 12, 1145 6 of 26 Figure 1. Bar graph depicting the number of reads in each sequencing library, below the x-axis, and the percentage of these reads that map to viruses described in this study, above the x-axis. Each library is labeled by name and termite species sampled. Bars are colored by sampling procedure. Sequenced contigs were annotated using BLAST (BLASTx and BLASTn) and conserved domainbased searches (CD-search) to identify putative viral contigs, excluding those matching commonly contaminating RNA viruses, endogenous viruses, retroviruses and any hits to DNA viruses (see Methods). Overall, an average of 68% of contigs per sample was unidentified (no significant BLAST hit), and 27.4% hit host sequences (i.e., from the phylum Arthropoda), 1.3% hit bacterial sequences and 0.024% of all contigs had a significant hit to a virus genome. After predicting open reading frames (ORFs) for contigs determined to represent true virus genomes, CD-search hits were used for annotation, and the predicted replicase genes were incorporated into phylogenetic trees to determine their identity.

Figure 1.
Bar graph depicting the number of reads in each sequencing library, below the x-axis, and the percentage of these reads that map to viruses described in this study, above the x-axis. Each library is labeled by name and termite species sampled. Bars are colored by sampling procedure. Sequenced contigs were annotated using BLAST (BLASTx and BLASTn) and conserved domain-based searches (CD-search) to identify putative viral contigs, excluding those matching commonly contaminating RNA viruses, endogenous viruses, retroviruses and any hits to DNA viruses (see Methods). Overall, an average of 68% of contigs per sample was unidentified (no significant BLAST hit), and 27.4% hit host sequences (i.e., from the phylum Arthropoda), 1.3% hit bacterial sequences and 0.024% of all contigs had a significant hit to a virus genome. After predicting open reading frames (ORFs) for contigs determined to represent true virus genomes, CD-search hits were used for annotation, and the predicted replicase genes were incorporated into phylogenetic trees to determine their identity. We also attempted to infer host-virus relationships (i.e., which of the viruses identified likely infect termites themselves as opposed to elements of their diet and other microorganisms), although these are necessarily tentative. Host assignments were informed by virus abundances and the host associations of related viruses and the procedure used to produce each sequencing library. We assume that viruses at a high abundance are more likely to be infecting the termite host, particularly when these viruses are closely related to arthropod-associated viruses ( Figure 8) and found in head-only samples (where the gut, containing possible symbiont hosts, is excluded). We now discuss each of the major groups of viruses in turn.

Hepe-Virga Viruses
Two novel viruses were identified within the Hepe-Virga group (Figures 2 and 3). Of these, the newly defined "Mohsystermes virus" is grouped with members of the Idaeovirus genus, a sparsely sampled group of plant-infecting viruses, and found at a moderate abundance of 0.0087% of the total reads in the H. ferox library 12 ( Figures 3A and 9). Mohsystermes virus is considered to have a close to complete genome, as it has a similar total genome size (9888 nt) to its relatives Brown algae RNA virus 1 (BAV1), Plasmopara viticola associated virga-like virus 1 (PVV) and Raspberry bushy dwarf virus. Similar to BAV1 and PVV, Mohsystermes virus is lacking a significant hit to a methyl-transferase (Supplementary Figure S1). As Mohsystermes virus has a phylogenetic background with an association with plant hosts and is found at a moderate abundance, we infer the host for this virus to be a constituent of the termite diet.  (Table 2). Phylogenies were inferred using the gene encoding the RdRp, with the virus taxa (usually genera) identified in this study shown in red. Taxonomic groups of interest are labeled on the trees for reference. The sequence alignments used to infer the individual phylogenies are described in Supplementary Table S3. All trees are unrooted with branch length scaled to the number of amino acid substitutions per site.

Figure 2.
Overall phylogenetic trees representing each of the nine groups of RNA viruses (Table 2). Phylogenies were inferred using the gene encoding the RdRp, with the virus taxa (usually genera) identified in this study shown in red. Taxonomic groups of interest are labeled on the trees for reference. The sequence alignments used to infer the individual phylogenies are described in Supplementary  Table S3. All trees are unrooted with branch length scaled to the number of amino acid substitutions per site.
Tohsystermes virus is the second virus identified in this group and has a far greater certainty with respect to both its host association and genome completeness. This virus has a 10,376 nt genome with predicted domains similar to its closest relative, the Myriapod-associated Hubei virga-like virus 8, which has a 10,433 nt genome and 33.8% amino acid identity with Tohsystermes virus ( Figure 3B, Supplementary Figure S1). As Tohsystermes virus falls within the insect-infecting Negev-like virus group that contains insect-associated viruses, has a very high abundance (0.54%) and was obtained from the head-only Constrictotermes cavifrons library, we suggest that it is likely a truly termite-infecting virus.

Luteo-Sobemo Viruses
The Luteo-Sobemo group comprises the plant-infecting Luteoviridae and Sobemoviridae families, although the majority of viruses in this group are formally unassigned ( Figure 2). Indeed, it is striking that the five Luteo-Sobemo viruses identified in this study-Pafsystermes virus, Kofsystermes virus, Mafsystermes virus, Nufsystermes virus and Wifsystermes virus-form a clade that is markedly divergent from either family: These viruses were placed with a set of unassigned arthropod-associated viruses found by Shi et al. [4] ( Figure 3C).
Almost all sobemo-like virus genomes identified are close to complete with a 2500 nt to 3000 nt segment, containing a RdRp domain (CDD: cl02808) and a smaller second segment, with a viral coat domain (CDD: cl29941) (Supplementary Figure S2). Notably, the ORF structure predicted for Wifystermes virus requires the use of a variant genetic code in which UGA is translated as tryptophan. This alternative codon is used in the yeast and mitochondrial genetic codes, both of which can be used to predict ORFs in the other four sobemo-like viruses.             Virus clades are shown on the x-axis by decreasing total abundance, libraries are given on the right y-axis ordered by the phylogenetic relationships of the termites in each library. The abundance, as a percentage of the total reads, of each genome is presented as a heat map while the number of genomes found for each group is labeled for each cell. The schematic phylogenetic tree on the left is adapted from Bucek et al. [7] and represents phylogenetic relationships between termites associated with each library. Tip nodes on this tree are colored by procedure used to collect termites and process them to libraries: Blue nodes represent the "whole-body" procedure, while pink nodes represent the "head-only" procedure. Virus clades are shown on the x-axis by decreasing total abundance, libraries are given on the right y-axis ordered by the phylogenetic relationships of the termites in each library. The abundance, as a percentage of the total reads, of each genome is presented as a heat map while the number of genomes found for each group is labeled for each cell. The schematic phylogenetic tree on the left is adapted from Bucek et al. [7] and represents phylogenetic relationships between termites associated with each library. Tip nodes on this tree are colored by procedure used to collect termites and process them to libraries: Blue nodes represent the "whole-body" procedure, while pink nodes represent the "head-only" procedure.
The five novel viruses identified here often dominate the viromes in which they are detected, with abundance levels greater than 0.003% and as high as 0.94% (Figure 9). Interestingly, Nufsystermes virus and Pafsystermes virus were each detected in the paired libraries from M. darwiniensis and S. intermedius, respectively. Given the very high abundance of these viruses and their phylogenetic relationship to arthropod-associated viruses (Figure 2), we suggest that the novel viruses in this group infect the termite hosts themselves and specifically may be mitochondria-infecting based on their variant genetic code.

Picorna-Calici Viruses
The Picorna-Calici group consists of the large and diverse Picornaviridae and the vertebrate-infecting Caliciviridae, as well as a wide range of other classified and unclassified viruses. We identified seven novel viruses (Moksystermes virus, Hiksystermes virus, Jaksystermes virus, Nuksystermes virus, Laksystermes virus and Feksystermes virus) that fall within the Picorna-Calici group, although these viruses occupied a range of phylogenetic positions and included some of the most divergent ones found in this study (Figures 1 and 8). More precisely, the viruses identified here fall within the insect-associated Iflaviridae ( Figure 4A) and Polycipiviridae families within the Picorna-Calici group ( Figure 4B), as well as in clades of unassigned viruses ( Figure 4C,D). Strikingly, all viruses are found in high abundance (0.017% to 0.45%) (Figure 9), and the majority of the viruses related to the Picorna-Caliciviruses found are associated with arthropods. Hence, we suggest that all of these viruses are termite-infecting.
The majority of the Picorna-Calici group identified here are close to complete based on the length of their genomes in comparison to their documented relatives, although they often lack significant hits to known protein domains (Supplementary Figure S3). Jaksystermes virus is~6000 nt longer than its closest relative, the flea-associated Stamford virus [40]. In addition, Jaksystermes virus contains an unusual hit to HAM1, CDD: cd00515. Nuksystermes virus and Laksystermes virus genomes have a structure which is consistent with that of the Polycipiviridae, with a cluster of four ORFs separated from the replicase ORF [41].

Flaviviridae
The Flaviviridae are notable in that they infect both invertebrates and vertebrates. Some flaviviruses, such as those from the genus Flavivirus, are commonly pathogenic to vertebrates and transmitted among them by arthropod vectors (Figure 2). We identified a single novel member of the Flaviviridae-denoted Waxsystermes virus-that is found in high abundance (0.13%) in a head-only library, and which possesses the genome structure common to members of the genus Flavivirus (Supplementary Figure S4). Interestingly, Waxsystermes virus falls within a flavi-like virus clade that contains two marine vertebrate-associated viruses: Eastern red scorpionfish flavivirus (51.2% amino acid identity) and Wenzhou shark flavivirus (38.1% amino acid identity) ( Figure 5A). The true host for Waxsystermes virus is therefore difficult to determine, and this part of the phylogeny is characterized by a complex mix of invertebrate and vertebrate-associated viruses. While Waxsystermes virus clearly does not infect a marine vertebrate, its high abundance and adherence to the flavivirus genome structure points toward an insect host. We therefore cautiously suggest that Waxsystermes virus is termite-infecting.

Narna-Levi Viruses
The Narna-Levi viruses were the most ubiquitous group of viruses detected in our libraries, with 19 novel viruses identified (Figure 9). The majority of the novel Narna-Levi viruses were placed within and sister to members of the genus Mitovirus that typically infects the mitochondria of fungi and of some plants [42,43] (Figure 5B). All possess small, simplistic, single-segment genomes (Supplementary Figure S5) and were detected in low to moderate abundances ranging from 0.00029% to 0.022%. The exception to this is Wunsystermes virus, present in high abundance (0.37%) in an Occasitermes library. In addition, Wunsystermes virus, as well as Kinsystermes virus, seemingly utilize UGA codons as tryptophan encoding for their predicted ORFs. Such a codon assignment is commonly seen in fungal viruses [42], although it is occasionally seen in plant-infecting mitoviruses [43] and is used in the invertebrate mitochondrial genetic code. Given the strong association of viruses within this group with fungi, the seven novel Narna-Levi viruses that fall within the Mitovirus genus are likely fungus-associated.
In contrast, the further eight novel viruses that fall basal to this clade-Mansystermes virus, Monsystermes virus, Ansystermes virus, Jansystermes virus, Mensystermes virus, Ransystermes virus, Konsystermes virus and Punsystermes virus-are related to arthropod and vertebrate-associated relatives. This, coupled with their low abundances in whole-body libraries, means that we can make no clear inference to their host.
Of the remaining four viruses identified in this group, three fell within the Leviviridae ( Figure 5D): Fonsystermes virus, Vansystermes virus and Ensystermes virus. These viruses have genomes structurally similar to those seen in leviviruses, ranging from 3346 nt to 4931 nt (Supplementary Figure  S5). All were present in low abundances (0.00057% to 0.0023%) in whole-body libraries and fell within a well-known bacteria-infecting family in the phylogenetic analysis, suggesting that they most likely associated with bacteria.
Finally, Cunsystermes virus has the largest genome of this group at 6454 nt and was detected in a clade basal to the genus Narnavirus (Figures 2 and 5C), although its host is unclear. While Cunsystermes virus is present in high abundance in the whole-body M. darwiniensis libraries (0.10% and 0.17%), it is also phylogenetically divergent (sharing only 30% amino acid identity with its closest relative, Wilkie narna-like virus 2) and is placed within an under-sampled clade with mixed host associations. We are therefore unable to make a safe inference for the host of Cunsystermes virus.

Mono-Chu Viruses
Sequences with similarity to the Mono-Chu viruses were relatively commonplace in the termite libraries, but all but one appeared to be endogenous: These sequences had disjointed ORFs and incomplete genomes. Jimsystermes virus is the only Mono-Chu group virus detected that has a credible genome structure indicative of an exogenous virus, with a mononega-like RdRp (CDD: pfam00946), capsid (CDD: pfam14318) and methyl-transferase (CDD: cl27811) domains detected in its largest ORF (Supplementary Figure S6). This virus is detected in both Occasitermes sp. libraries 3 and 4 in high abundances of 0.2% and 0.13% and falls within the Lispiviridae with the other insect-associated viruses ( Figure 6A) [44]. Isopteran arli-related virus, from Coptotermes sp. [22], is the closest relative to Jimsystermes virus with an amino acid identity of 61.3%. Given the clear insect association of the Lispiviridae and the high abundance of Jimsystermes virus, we suggest that this virus is insect-infecting.

Bunya-Arena Viruses
The Bunyavirales and the Arenaviridae, which together comprise the Bunya-Arena group, are associated with both invertebrates and vertebrates. The three novel viruses we detected from this group-denoted Degsystermes virus, Ogsystermes virus and Magsystermes virus-sit phylogenetically with viruses that have a clear association with arthropods, including mosquitoes, fleas, glow-worms and butterflies ( Figure 2, Figure 6B,C). Structurally, all three viruses are similar, with genomes that encode a single, large ORF that contains a Bunyavirales RdRp domain (CDD: cl01709). Notably, these viruses have some of the highest abundances detected in this study, ranging from 0.24% to 0.82% (Figure 9). While only single,~6000 nt segments were detected for most of these novel viruses, a possible second 1974 nt segment in a similar abundance to Degsystermes virus was detected in H. ferox library 14 (Supplementary Figure S7). A second such segment has been observed in the genomes of the related Hubei insect virus 1 and Hubei bunya-like virus 13.
According to our phylogenetic analysis, Ogsystermes virus and Degsystermes viruses fall in a clade of unassigned bunya-like viruses ( Figure 6B), while Magsystermes virus groups within the Phasmaviridae, basal to the newly defined Orthophasmavirus genus [44] (Figure 6C). Notably, these novel phasmaviruses are highly divergent, with less than 28% amino acid identity to their closest relatives (Figure 8). Due to their relatively high abundance as the dominant components of the viromes in which they are detected, as well as their clustering with other invertebrate viruses in the phylogenetic analysis, we suggest that these are termite-infecting viruses.

Partiti-Picobirna Viruses
By far the most numerous and commonly detected group of viruses identified were members of the Partiti-Picobirna group that comprises members of the Partitiviridae and Picobirnaviridae (Figure 2)-non-enveloped dsRNA viruses with~4000 nt genomes. The Partitiviridae are known to infect plants, fungi and protozoa and are transmitted intracellularly. The Picobirnaviridae, however, are more complex. While they are often associated with vertebrate fecal metagenomes [45], there is evidence that they may be infecting bacterial symbionts of vertebrate hosts [46]. We identified 30 novel virus genomes for this group, spanning nine libraries and placed phylogenetically into six clades (Figure 7 and Table 2). No second segment was detected for any virus identified in this group, and genomes detected were typically~1900 nt and only encoded an RdRp domain (Supplementary Figure S8). Importantly, while these viruses were detected in the lowest abundances seen in this study (0.0001% to 0.008%), several were typically detected per each library and only within the whole-body libraries (Figure 9).   The majority of the viruses in this group were found within and sister to the Picobirnaviridae (Figure 2), falling with related viruses sampled from primate feces ( Figure 7A-C) [45,46]. The remaining seven partiti-like viruses fell into three groups: two viruses fell basal to members of the genus Amalgavirus ( Figure 7D), two viruses grouped with unassigned arthropod-associated viruses [4] ( Figure 7E), while three viruses grouped with the fecal-associated Lysoka partiti-like viruses ( Figure 7F). With their low abundance, previous association with feces and representation in whole-body libraries, we suggest that these novel viruses infect termite gut symbionts.

Correlates of RNA Virome Diversity and Abundance
Due to the small sample size and the confounding country and procedure variables (i.e., all whole-body samples were collected in Australia), we could not determine if there was a relationship between virome structure and diet or country. Visual inspection revealed no obvious association between termite phylogeny and virus composition and abundance, aside from members of the same species sharing similar viromes ( Figure 9). Notably, however, the sampling procedure (i.e., whole-body or head-only with poly(A)+ enrichment) used to produce each library had a significant impact on virome composition, with virome abundance (R 2 = 0.33, p = 0.00865), richness (R 2 = 0.61, p = 4.27 × 10 −5 ) and diversity (R 2 = 0.44, p = 0.00144) differing significantly by sampling procedure (Figure 1). Libraries prepared from whole-body samples also had a greater number of reads mapping to RNA viruses: 0.6% of total reads mapping to RNA viruses in whole-body libraries compared to 0.1% in head-only libraries.

Discussion
We present the first survey of the RNA virome of termites, demonstrating that these important organisms harbor a diverse array of RNA viruses. Overall, 50 species of termites were collected, spanning the range of the Termitoidae, yielding 20 sequencing libraries with viruses. From these data, our metatranscriptomic analysis identified 67 novel viruses related to a diverse range of RNA viruses, including the Togaviridae, Iflaviridae, Polycipiviridae, Flaviviridae, Leviviridae, Narnaviridae, Mitoviridae, Lispivirdae, Phasmaviridae, Picobirnaviridae and Partitiviridae (Figure 1 and Supplementary Table S2).
Although our sampling focused on termites, the novel RNA viruses identified likely infect a range of organisms-either the termite itself, one of its symbionts, or an ingested organism. We sought to make inferences for the hosts of all viruses discovered in this study, based on the level of virus abundance and phylogenetic positions of each virus in relation to those previously described ( Figure 8 and Table 2). Although these inferences necessarily range in certainty, we suggest that the Sobemo-like, Negev-like, Picorna-Calici group, Flaviviridae, Lispiviridae and Bunya-Arena group viruses identified in this study most likely directly infect termites. Indeed, these viruses both are found in a high abundance and have relatives that are associated with arthropods [3,4,40,[47][48][49][50][51].
The mitovirus-like viruses, Picobirnaviridae and picobirna-like novel viruses were the least abundant viruses identified, although more phylogenetically diverse. Based on their low abundance and phylogenetic context, as well as their tendency to appear in whole-body libraries, we suggest these groups most likely infect termite gut symbionts. This, in turn, provides a biological explanation for their diversity, as we would expect viruses infecting a gut microbiome to be more diverse than those infecting the termite host itself. However, the Picobirnaviridae are complex as they have been detected in an array of vertebrate fecal material [45,46,52], whole-body invertebrate samples [4] and soil [53]. Indeed, their ubiquity in metatranscriptomic studies may mean that they are in fact often associated with bacterial hosts. Clearly, more work is needed to determine the most likely hosts for these viruses.
An important element of this study was the high level of divergence of the viruses identified: All viruses discussed are novel, and the majority exhibit only 20% to 45% amino acid identity to their closest relatives, and this is independent of their sampling procedure or likely host. An interesting example of this is the divergent flavi-like Waxsystermes virus. This virus is highly abundant and has a genome structure typical of members of the arthropod infecting/vectored genus Flavivirus. It is therefore possible that Waxsystermes virus is indeed termite-infecting, although it is impossible to exclude that it is associated with a symbiont or parasite of the termite, and it is striking that Waxsystermes is most closely related to recently discovered marine-vertebrate associated viruses [54]. As such, establishing the true host of Waxsystermes virus has more broad-ranging implications for our understanding of flavivirus evolution as a whole.
The sampling of termites used in this project was not only designed to identify novel RNA viruses but to enable us to provisionally explore any relationship between these viruses and the ecology and phylogenetic background of the termites sampled. In particular, four species of termite were sampled twice, with samples representing two separate colonies per species (or the soldier and worker caste in the case of the M. darwiniensis samples) ( Table 1). Sample replicates from the same species revealed similar viruses across all these paired samples, and it is important to note that these were the only cases in which the same virus was detected in multiple libraries. Further, this sharing of virome components was strongest in the paired soldier/worker libraries from M. darwiniensis: Two Partiti-Picobirna viruses, three Narna-Levi and the Luteo-Sobemo Nufsystermes virus were detected in both M. darwiniensis libraries 19 and 20. However, colonies sampled were collected from within the same regions, and this cross-infection of colonies may be due to geographic proximity rather than these viruses being species-specific, although the relatedness of viruses discovered from different continents may imply otherwise.
Across the samples as a whole, we detected no association between RNA virus composition or abundance and the background termite phylogeny, indicating that virome compositions lack strong heritability and can change markedly on evolutionary time scales. However, it was also notable that termite-associated representatives of a number of virus groups-Sobemo-like, Iflaviviridae, Mitovirus-like, Bunya-Arena, Picobirnaviridae and Picobirna-like viruses-clustered together on the relevant phylogenies, despite usually being collected at different locations. It may be that some virus lineages display specificity for either termites or their symbionts, as observed for a number of bacterial lineages found in termites [55]. Additional sampling of other insect and animal viromes and comparisons with those found in termites are required to assess this possibility. Strikingly, we saw a significant effect of sampling and sequencing procedure on virome characteristics: Many of our initial head-only samples yielded no virus genomes at all, and of the libraries where RNA virus genomes were identified, those from whole termite bodies had greater virus abundances and richer, more diverse RNA viromes than the head-only samples. Hence, these results show that tissue type may play a more important role in shaping virus compositions, although such a difference may also reflect the use of poly(A)+ enriched sequencing libraries in the head-only samples.
The invertebrate RNA virosphere is clearly immense. This is reflected in our identification of 67 novel, divergent viruses from the first screen of the termite RNA virome. Our work also provides insights into the types of viruses that inhabit the termite and its associated symbiont community.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/12/10/1145/s1, Figure S1: Hepe-Virga virus genome structures, Figure S2: Luteo-Sobemo virus genome structures, Figure S3: Picorna-Calici virus genome structures, Figure S4: Flaviviridae virus genome structures, Figure S5: Narna-Levi virus genome structures, Figure S6: Mono-Chu virus genome structures, Figure S7: Bunya-Arena virus genome structures, Figure S8: Partiti-Picobirna virus genome structures, Table S1: Termite species sampled, country of collection, SRA accessions and total numbers of reads/contigs for all libraries used in this study, Table S2: Abundance, phylogenetic and genome structure information for all viruses identified in this study, Table S3: The number of viral taxa analyzed and alignment length for multiple sequence alignments used in the phylogenetic analysis.