Metatranscriptomic Identification of Diverse and Divergent RNA Viruses in Green and Chlorarachniophyte Algae Cultures

Our knowledge of the diversity and evolution of the virosphere will likely increase dramatically with the study of microbial eukaryotes, including the microalgae within which few RNA viruses have been documented. By combining total RNA sequencing with sequence and structural-based homology detection, we identified 18 novel RNA viruses in cultured samples from two major groups of microbial algae: the chlorophytes and the chlorarachniophytes. Most of the RNA viruses identified in the green algae class Ulvophyceae were related to the Tombusviridae and Amalgaviridae viral families commonly associated with land plants. This suggests that the evolutionary history of these viruses extends to divergence events between algae and land plants. Seven Ostreobium sp-associated viruses exhibited sequence similarity to the mitoviruses most commonly found in fungi, compatible with horizontal virus transfer between algae and fungi. We also document, for the first time, RNA viruses associated with chlorarachniophytes, including the first negative-sense (bunya-like) RNA virus in microalgae, as well as a distant homolog of the plant virus Virgaviridae, potentially signifying viral inheritance from the secondary chloroplast endosymbiosis that marked the origin of the chlorarachniophytes. More broadly, these data suggest that the scarcity of RNA viruses in algae results from limited investigation rather than their absence.


Introduction
Viruses are likely to infect every cellular organism and play fundamental roles in biosphere diversity, evolution, and ecology. Those studies of the global virosphere performed to date have revealed marked heterogeneities in virus composition. For example, while RNA viruses are commonplace in eukaryotes, they are less often found in bacteria and are yet to be conclusively identified in archaea. Rather, both the bacteria and archaea are dominated by DNA viruses [1,2]. It is unclear, however, whether such highly skewed virus distributions reflect fundamental biological, cellular or ecological factors of the hosts in question, or because RNA viruses in bacteria and archaea are often so divergent in sequence that they are difficult to detect using primary sequence comparisons alone. questions: (i) is the deep divergence between the two algae taxa also reflected in their RNA virome compositions? (ii) does their RNA virome provide evidence for complex evolutionary histories, including horizontal transfer events? (iii) is the RNA/DNA virus bias between algae and green plants an artefact of sampling or reflect a more fundamental biological division? More generally, we aimed to broaden our understanding of the biodiversity of the algal virosphere as this may have implications for understanding and managing the roles played by algae in global element cycling, climate forcing and biotechnology, and as reservoirs for genetic novelty [34,35].

Total RNA Extractions
For total RNA extractions, RNAlater was removed by low centrifugation, algal cells were disrupted using thaw/freezing cycles and bead beating (0.1-0.5 mm), and total RNA was extracted using the Qiagen ® RNeasy Plant mini kit following the manufacturer's instructions.

Total RNA Sequencing
RNA quality was assessed and TruSeq stranded libraries were synthetized by the Australian Genome Research Facility (AGRF), using either (i) TruSeq stranded with a eukaryotic rRNA depletion step (RiboZero Gold kit, Illumina) for ALG_2, or (ii) the SMARTer Stranded Total RNA-Seq Kit v2-Pico Input Mammalian libraries (Takara Bio, Mountain View, CA, USA) for ALG_1 and ALG_3 (Table S1), due to the low amount of input RNAs in these libraries. The resulting libraries were sequenced on an Illumina HiSeq2500 (paired-end, 100bp) at the AGRF. Library descriptions and RNA-seq statistics are summarized in Table S1.

Read Depletion and Contig Assembly
The RNA-seq data were first subjected to low-quality read and Illumina adapter filtering using the Trimmomatic v0.36 program [38]. Ribosomal RNA was depleted with the SortmeRNA v3.0.3 program [39] using the SILVA v32 database [40], which removed between 86 and 94% of the total unfiltered reads (Figure S1A/2). Read-depleted libraries were then de novo assembled using the Trinity v2.5.1 program [41] and contigs shorter than 200 nt were removed (the average length of contig assembly is shown in Figure S1B/2). Contig abundances were calculated from the RNA-seq data using the Expectation-Maximization (RSEM) v1.3.1 software [42] and expressed as the expected read counts. An analysis of the assembly quality was attempted by estimating the proportion of full-length transcripts in each library using the "analyze_blastPlus_topHit_coverage.pl" script available in Trinity package. Briefly, this analysis consists in aligning all the transcripts obtained after each de novo assembly against the SwissProt/UniProt database using BLASTx and extracting the number of proteins aligned depending on their level of coverage (percentage of the top-hit sequence).

RNA Virus Detection Using BLASTx and BLASTn
The similarity of contigs to the current NCBI nucleotide (nt) and protein (nr) databases was determined using the BLASTn v2.2.30 and Diamond BLASTx v0.9.32 programs [43], respectively, employing 10 −10 and 10 −05 as e-value cut-offs and the more sensitive option in BLASTx. RNA virus-like sequences were also identified using BLASTx against all RdRp protein sequences available on NCBI/GenBank. False-positive signals for RNA viruses were removed by BLASTing RdRp-like sequences against the nr database and discarding sequences displaying a nonviral sequence as the best hit, based on BLASTx scores.

D Protein Structure Prediction of RdRp-Like Contigs
To infer a structural model for the distant RdRp signals detected using profiles, sequences displaying a RdRp-like signal were subjected to the normal mode search of the Protein Homology/analogY Recognition Engine v 2.0 (Phyre2) web portal [46]. Briefly, this program first compares the submitted amino acid sequence to a curated nonredundant nr20 data set using HHblits [47]. It then converts the conserved secondary structure information as a query against known 3D-structures using HHsearch [48]. A final structural modeling step based on identified structural homologies is performed as described previously [46].

Revealing Host-Virus Associations
A challenge faced by all metagenomic studies is confidently assigning each viral sequence to a particular host in a given sample. We used algal cultures to minimize the number of potential additional cellular hosts. These cultures were, however, nonaxenic (i.e., cultures not purified from other contaminating organisms), with mainly bacteria present. To evaluate the possibility of additional microeukaryotic cells in the sample, we obtained taxonomic identification for contigs in the meta-transcriptome by aligning them to the NCBI nt database using the KMA aligner v1.2.11a and the CCMetagen program [50,51] v1.1.3. Contigs matching an entry in the nt database were displayed as Krona plots and classified based on their taxonomy (using high taxonomic levels for clarity).

Phylogenetic Analysis
RdRp amino acid sequences were aligned using the L-INS-I algorithm and default parameters in the MAFFT program v7.402 [52] and trimmed with TrimAI v1.4.1 (automated1 model). Maximum likelihood phylogenetic trees were then estimated using IQ-TREE v2.0-rc1, employing ModelFinder to obtain the best-fit model of amino acid substitution in each case, with nodal support assessed using 1000 bootstrap replicates and 1000 replicates of SH-like approximate likelihood ratio test (SH-aLRT) [53,54]. For each tree, reference genomes and corresponding RdRp sequences were retrieved from the NCBI viral genome resource (https://www.ncbi.nlm.nih.gov/genome/viruses/). To depict the evolutionary relationships of the newly discovered viruses as meaningfully as possible, the closest unclassified BLASTx homologs were used in the phylogenetic analysis. This resulted in alignments of 237, 76,198, 558 and 325 RdRp protein sequences for Amalga-like, Mito-like, Tombus-like, Virga-like and Bunya-like virus groups, respectively.

RT-PCR Validation
Viral contigs were validated experimentally and associated to individual algal sample using Reverse Transcription (SuperScript ™ IV reverse transcriptase -Invitrogen, Carlsbad, CA, USA ™) followed by PCR (Platinum ™ SuperFi ™ DNA polymerase -Invitrogen ™), with specific primer sets for each contig. The rbcL and tuf A marker genes were used as PCR positive controls using sets of primers designed in [55]. All primers and PCR conditions used in this study are described in Table S2.

Data Availability
The libraries sequenced here are available at the Sequence Read Archive (SRA) under BioProject PRJNA668187. The consensus sequences of the all novel viruses identified here have been submitted to GenBank and assigned accession numbers MW086576-MW086593.

The RNA Viromes of Two Divergent Groups of Microalgae
Our aim was to determine the RNA viromes of six microalgae cultures from six different algal species classified into two highly phylogenetically distinct algal clades: the chlorarachniophytes (Rhizaria) and the green algae (Chlorophyta, Archaeplastida) ( Figure 1A,B). To the best of our knowledge, this is the first identification of viruses in samples from the Chlorarachniophyceae ( Figure 1C) [15]. Representation of algae supergroups among the diversity of eukaryotes (latest eukaryotic classification retrieved from [29]). The phylogenetic tree was adapted from [56]. Pictures illustrate some of the samples used in this study and corresponding clades are marked with "*". (B) Pictures of algae cultures used in this study. (C) The current extent of the microalgae virosphere. The viral sequence counts for each virus class (DNA or RNA, single-stranded or double-stranded) were retrieved from VirusHostdb [16] according to 11 major eukaryotic algae lineages. Microalgal lineages investigated in this study are highlighted in bold.
While our limited understanding of chlorarachniophyte viruses can be explained by the small number of species characterized to date (only 15 in Algaebase database), the Ulvophyceae is an abundant and diverse algal lineage in existence since the late Proterozoic and comprises at least 1933 species [57]. It contains a wide range of morphologies from unicellular benthic algae to large seaweeds [58] and its representatives commonly occur in marine, terrestrial and freshwater habitats [20]. We therefore performed RNA sequencing (meta-transcriptomics) on six microalgal species belonging to both the green algae (classes Ulvophyceae and Mamiellophyceae) and chlorarachniophytes ( Table 1).
Because of variable RNA quality and quantity, fewer nonrRNA reads were obtained for the ALG_1 and ALG_3 libraries. This may in large part explain the limited length of contigs, the reduced number of estimated full-length transcripts, and ultimately the lower number of viral sequences, compared to ALG_2 ( Figure S3).
In total, we identified 18 new putative viral sequences using a standard sequence similarity search among the three libraries. These largely comprised viruses with double-stranded (ds) RNA or single-stranded positive-sense (ssRNA(+)) genomes (Table 2). However, a divergent bunya-like partial sequence was also retrieved from Chlorarachnion reptans and may constitute the first negative-sense (ssRNA(−−−)) virus identified in microalgae. Importantly, the presence of these viruses was validated by RT-PCR on each total extracted RNA ( Figure S1). In each case these viruses exhibited very low levels of sequence similarity to existing RdRp amino acid sequences, with sequence identities ranging from only 27 to 38%. With the exception of Virga-like bellevilovirus, Bunya-like bridouvirus and Amalga-like boulavirus for which partial sequences have been retrieved, the length and genomic organization (ORF numbers, predicted protein length, etc.) of all of the new viruses described in this study are similar to the well-annotated full-length genomes of reference homologs. It is therefore likely that they correspond to full-length genomes. In addition, the lack of frameshifts and premature stop codons means these sequences are very likely true (exogenous) viruses rather than endogenous viral elements (EVEs) inserted into host genomes.
Eight of the viruses identified in Ostreobium sp. fell within the Narnaviridae or Tombusviridae. In contrast, five viral sequences from Ostreobium sp. and K. allantoideum do not fit into defined taxonomic groups and were instead related to the broad set of 'partiti-like' viruses that comprise the Partitiviridae, Totiviridae and Amalgaviridae. Finally, more divergent but detectable sequence similarities to the Virgaviridae (+ssRNA) were obtained for samples from the chlorarachniophyte library.

Detection of Divergent Viruses Using Protein Structural Data
An additional attempt to detect even more divergent RNA viruses was conducted was using protein structure. In particular, it is possible that highly divergent viruses are part of the unknown orphan sequences (i.e., contigs with no match in nt/nr databases, or the 'dark matter') that comprise between 50-60% of total contigs obtained in this study ( Figure 2A). Accordingly, we attempted to detect evolutionary-conserved features of protein structural and functional motifs in orphan sequences that encode unknown ORFs, using a cut-off of 200 amino acids (600 nucleotides): we chose this size because it is shorter than most RdRps [59] yet should be long enough to make evolutionary inferences. The corresponding translated ORFs were compared to protein profiles from the PFAM RdRp clan and the VOG databases using the hidden-Markov model-based HMMer3 program. To help exclude false-positives, all positive hits were compared to the entire PFAM database. This resulted in the identification of three nonphage contigs that displayed homology to the RNA virus RdRp: ALG_2_DN19089, ALG_2_DN594 and ALG_3_DN34624 (Table 3).  To manually assess the level of confidence of the RdRp signal detected in the HMM comparisons, the protein sequences of the three RdRp candidates were aligned to amino acid sequences retrieved from the RdRp_C, RdRp_4 and RdRp_1 PFAM profiles. The ALG_2_DN594 contig displayed similarities with the RdRp_C profile that represents the C-terminal of the RdRp (Protein A) found in alphanodaviruses. Unfortunately, this C-terminal region lacks the key functional motifs usually associated with the RdRp, preventing us from definitively establishing the ALG_2_DN594 contig as a true RNA virus. Similarly, the ALG_3_DN34624 alignment with the viral RdRp sequences that comprise the PFAM RdRp_4 profile (PF02123) does not show conservation of the crucial functional residues at the A, B and C-motifs within the RdRp, particularly the canonical motif C that is normally GDD, yet GFD in ALG_3 contig ( Figure S2B): strikingly, the GFD motif is absent from an alignment of 4627 viral RdRp sequences [60]. Whether this reflects a newly identified functional motif remains to be determined, but we cannot safely conclude that ALG_3_DN34624 encodes a viral RdRp. In contrast, the ALG_2_DN19089-encoded ORF shared motifs with RdRp_1 profile (PF00680), including motif A at positions 437-442, motif B at positions 507-517 and a GDD motif C at positions 557-559 of the RdRp alignment ( Figure S2A). Because of the presence of these functional motifs, ALG_2_DN19089 can be confidently considered as a true RdRp-encoding contig and will be referred to here as 'Partiti-like adriusvirus'. Interestingly, this contig also revealed significant similarity to some eukaryotic chloroplast-associated double-stranded RNA replicons (BDRC) obtained from the green algae species Bryopsis cinicola [61] (Table 2). It is therefore likely that these BDRC dsRNA Bryopsis-replicons in fact represent viral RdRp sequences [22], and we treat them as such in this study.
An additional BLASTx comparison using this divergent Partiti-like RdRp as a reference identified two other BDRC-like contigs in the Ostreobium sp. data set-ALG_2_DN19300 and ALG_2_DN19436 (Table 2). Along with the Partiti-like adriusvirus, these two additional sequences were both validated by RT-PCR ( Figure S1) and are listed in the viral contig table as 'Partiti-like lacotivirus' and 'Partiti-like alassinovirus', respectively ( Table 2).
The remaining hits from the RdRp-profile analysis-ALG_2_DN594_c0_g2_i1_len711 and ALG_3_DN34624_c0_g1_i1_len2077-were used in a Phyre2 protein structural analysis. However, this revealed no confident identification of a viral RdRp (i.e., the confidence levels of structural models obtained were <90%).

Relative Abundance and Prevalence of RNA Viruses in the Samples
Relative virus abundance varied between libraries and viruses of the same family in the Ostreobium sp. culture, with viral-like sequences constituting between 0.01 (considered as average abundance) and 1.2% (considered as very high abundance) of the total non-rRNA reads (Table 2, Figure 2). Each virus described was identified in only one of the cultures sequenced ( Figure S1). However, intersample BLAST-comparisons revealed similarity between the partiti-like sequence identified in the K. allantoideum and Ostreobium samples (30% amino acid identity). A nonannotated ORF from a K. allantoideum contig, ALG_1_DN2506, aligned with the N-terminal ORF of Ostreobium sp. amalga-like virus contigs that potentially encode the virus coat protein, but the high level of divergence prevented us from performing any phylogenetic analysis of proteins other than the RdRp. Because of their co-occurrence in K. allantoideum (1.1 and 1.2 PCR, Figure S1) and their similar abundance levels, it is likely that ALG_1 DN2506 and ALG_1 DN2691 (referred as 'Amalga-like boulavirus', Table 2) contigs are part of the same genome. Unfortunately, the poor quality of RNAs and the resulting high degree of fragmentation obtained in the ALG_1 RNA-seq library did not allow us to resolve this question.
Notably, a large majority of the new RNA viruses reported here come from the ulvophyte Ostreobium sp. clonal culture, although this may in part result from differences in RNA quality and sequencing rather than a true biological difference in RNA virome composition and diversity ( Figure  S3). Difficulties in detecting highly divergent viral sequences, especially in poorly characterized and distant clades such as the chlorarachniophytes, may also contribute to the different numbers of viruses observed between libraries.

Detection of Possible Secondary Hosts
As the algal cultures analyzed here were not axenic, we assessed the diversity and relative abundance of other potential eukaryotic organisms in these samples. Indeed, algae cultures are commonly co-cultured with bacteria, fungi and other endosymbiotic algae [62]. The rRNA depletion performed during the RNA-Seq library preparation prevented us from using standard 16S/18S profiling. We therefore evaluated the presence of other eukaryotes in the samples using CCMetagen [50]. According to the Krona plots obtained for each library, cultivated algae were, as expected, the dominant organism found in the samples, representing between 79-99% of all assigned contigs (Files S1-S3). Nevertheless, a small proportion (2-8%) of contigs from ALG_1 and ALG_3 were assigned to dinoflagellates and Cyanophora algae (Files S1 and S3). Although sequences assigned to Lingulodinium polyedrum potentially result from GenBank mis-annotation and were likely of bacterial origin, the Coolia malayensis (Dinophyceae), Amphidinium sp. (Dinophyceae) and Cyanophora paradoxa (Glaucophyta) associated sequences likely constitute true assignments. We therefore suspect that these additional micro-eukaryotic transcripts may have arisen from cross-contamination with additional algae samples that were extracted and prepared at the same time and sequenced in the same run. Importantly, however, none of the viruses identified in the three libraries studied here could be detected in the transcriptomes of co-processed Dinophyceae and Glaucophyta cultures, suggesting that these low-abundance contaminants are not the hosts of the viruses reported here. In addition, a minor portion of the contigs in ALG_2 and ALG_9 was assigned to bacterial species (0.3% and 5%, respectively; Supplementary Files S2 and S3). To prevent any misinterpretation, particular care was taken to remove bacteriophage-like signals from the final virus-like sequence files.

Partiti-Like dsRNA Viruses
Eight of the newly described viral sequences exhibited RdRp amino acid sequence similarity to Partiti-like viruses (i.e., relatives of the Partitiviridae) and found at various levels of abundance ( Table 2). Based on phylogenetic studies, five of the viral-like sequences are close to those from the Amalgaviridae (named after their mosaic status comprising both a Partitiviridae-like RdRp and the dicistronic and monopartite genomic organization of the Totiviridae [63]) and form a clade with the Bryopsis mitochondria-associated dsRNA (BDRM), although they share only 28-32% sequence identity (Table 2, Figure 3). BDRM was first described as a dsRNA associated with mitochondria in Bryopsis cinicola macroalgae [64] and later classified as a virus by the International Committee on Taxonomy of Viruses (ICTV). Like Ostreobium and Kraftionema, Bryopsis belongs to the class Ulvophyceae, and it seems likely that all these five newly identified Amalga-like viral sequences ( Figure 3) form an Ulvophyceae-infecting viral clade. Given the level of RdRp pairwise identity between these sequences (Table S3, top) we assumed that each constituted a new species. To perform a preliminary taxonomic assessment, we used the PAirwise Sequence Comparison (PASC) tool available at the NCBI [66]. Each of these five new BDRM-like viral genome sequences were compared with the Amalgaviridae full-length genomes available. The closest matches to existing Amalgaviridae sequences were retrieved for each newly discovered virus, and the resulting pairwise identity distributions compared with those observed within and between Amalgaviridae genera ( Figure 4A). While this analysis indicates that these newly identified virus sequences are not part of any existing Amalgaviridae genus ( Figure 4A), whether they can be considered as a new genus within the Amalgaviridae, or even a new family, is currently unclear and will require formal taxonomic assessment by the ICTV. Interestingly, if these Amalga-like sequences are translated into amino acids using the protozoan mitochondrial code they display the same genomic organization as BDRM, encoding two overlapping ORFs: the 5 one encoding a hypothetical protein and the other encoding a replicase through a -1 ribosomal frameshift [67] (Figure S4). However, it is unclear if these sequences should be translated using the standard cytoplasmic code, and such sub-cellular localization remains to be validated. It is also notable that the two closest homologs of BDRM, the Amalga-like dominovirus and Amalga-like chaucrivirus, also contain the GGAUUUU ribosomal-1 frameshift motif and the two encoded ORFs could plausibly be translated in this manner ( Figure S4).
The length and two-ORF encoding genomic structure of the BDRM-like sequences generally correspond to genomic features of the amalgaviruses (Figure 3, right). Despite a lack of detectable sequence similarity at both the sequence (BLASTx) and structural levels (i.e., the Phyre2 analysis), the second ORFs predicted in these amalgavirus-like sequences are expected to encode a CP-like protein, even if the involvement of this potential CP in encapsidation remains unclear [68].
The three BDRC-like sequences identified from Ostreobium sp. also cluster with the Partiti-like viruses ( Figure 3, Table 2) and can be classified as three different species after applying the commonly-used 90% RdRp percentage identity species demarcation criteria (Table S3, Middle). Notably, the genomic organization of these BDRC-like contigs seems "inverted" compared to members of the Totiviridae and Amalgaviridae; that is, a first ORF encoding a CP protein is followed by a second that represents the RdRp ( Figure 5). Indeed, the RdRp encoded by the Partiti-like ALG_2 contigs is close to the 5 extremity, followed by a second ORF. This second ORF could potentially encode a CP, although functional annotation could not be achieved due to the high level of sequence divergence.

Mitovirus-Like ssRNA(+) Viruses
Seven viral sequences from Ostreobium sp. clustered in the Narnaviridae, forming a clade within the genus Mitovirus ( Figure 6). With their single ORF likely encoding an RdRp, and a genome length of~3000 nt that is typical of the Narnaviridae (Figure 6, right), and the relatively high levels of abundances (Table 2), these viral sequences very likely constitute replicating viruses and thus represent a newly-described clade of protist-associated mitoviruses potentially restricted to green microalgae. The closest relative virus identified, Shahe Narna-like virus 6, was isolated from freshwater small planktonic crustaceans (Daphnia magna, Daphnia carinata and Moina macrocopa) belonging to the order Cladocera [65]. These "grazer" animals feed on marine microorganisms including microalgae, and it is therefore possible that Shahe Narna-like virus 6 in fact infects ingested-algae rather than arthropod host, in a similar manner to other members of the Narnaviridae.
Based on the 50% RdRp sequence identity used as a species demarcation criterion in the Narnaviridae (ICTV report 2009), we identified seven new mito-like viral species (Table S3, bottom). To place of these new species among the Narnaviridae we performed a PASC analysis using the putative full-length genomes from the seven new viral species. This revealed that the identity levels of the new sequences fell in the range expected of intra-genus diversity ( Figure 4B). We therefore propose the existence of a new subgroup of mitoviruses, comprising these seven new species as well as the Shahe Narna-like virus 6 ( Figure 6). Whether this clade is associated with mitochondria is currently unclear, and predicted ORFs were obtained using both standard and mitochondrial genetic codes. Figure 6. Phylogeny of the Narnaviridae-Botourmiaviridae group based on the RdRp. Newly discovered viruses from Ostreobium sp. are highlighted in red. RdRp sequences from unassigned RNA virus retrieved from [65] are marked in grey. Left, phylogenetic tree estimated using IQ-Tree with bootstrap replicates and SH-aLTR set to 1000 (values in parenthesis). Right, genomic organization of both viral genomes identified in this study (red) and representative species of each major family and genus used in the phylogeny (Cassava virus C-Botourmiaviridae; Saccharomyces 23S RNA narnavirus-Narnavirus genus; Chenopodium quinoa mitovirus 1 Mitovirus genus). Annotations of Cassava virus C coding sequences: RdRp (Segment I); Putative movement protein (Segment II); Coat protein (Segment III). Branch lengths are scaled according to the number of amino acid substitutions per site.

Tombusviridae-Like ss(+)RNA Viruses
One sequence from Ostreobium sp., the Tombus-like chagrupourvirus, exhibited similarity to members of the Tombusviridae family of ssRNA(+) viruses (Figure 7), grouping with viruses previously identified as infecting plant or plant pathogenic fungi. This could again illustrate a shared evolutionary history between green algae and land plants, and that horizontal virus transfers can occur between plants and their pathogenic fungi. Of note is that the closest relative of Tombus-like chagrupourvirus documented to date, Hubei-Tombus-like virus 12, was isolated from freshwater animals (mollusca Nodularia douglasiae) [65]. According to the lack of distinguishable animal-related contigs in the Ostreobium sp. sample (File S2), and the average abundance level associated to this virus (0.7% of total non rRNA reads, Table 2), we assume this Hubei-Tombus-like virus 12 may, together with the newly tombus-like sequence identified here, constitute a new clade of green algae-infecting viruses.  (Figure 7, right), suggesting that it comprises a full-length genome for this virus. This putative full-length genome sequence was compared to Tombusviridae reference genomic sequences using PASC to assess its taxonomic position ( Figure 4C). Accordingly, the Ostreobium-associated tombus-like sequence could constitute a new Tombusviridae genus.

Virgaviridae-Like ssRNA(+) Viruses
One sequence identified in the Chlorarachnion reptans culture displayed detectable sequence similarity to the Virgaviridae-like RdRp supergroup ( Figure 8) and is present at average abundance in the library (0.01% of all non-rRNA reads). The family Virgaviridae comprise ssRNA(+) viruses traditionally associated with plants and display diverse genomic organizations. The short length of the Virga-like bellevillovirus associated with chlorarachniophytes indicates that this sequence likely comprises a partial genome sequence only. Moreover, the multi-segment structure of the closest relatives suggests that the partial genome recovered in ALG_3 could also contain additional segments not yet identified. Although further host confirmation is required, this newly described RNA virus-like sequence would constitute the first algae virus from the Hepe-Virga group.

Bunyavirales-Like ss(-)RNA Viruses
A partial viral genome, denoted Bunya-like bridouvirus, encoding a RdRp-like signal was identified in the Chlorarachnion reptans sample at an abundance of 0.01% of total non rRNA reads. However, this sequence is highly divergent and cannot be formally assigned to any previously described viral family. Despite this, it is striking that the sequence clusters with a Bunya-Arena-like virus, Shahe bunya-like virus 1 (Figure 9) previously identified in diverse Freshwater small planktonic crustaceans (Daphnia magna, Daphnia carinata and Moina macrocopa) [65] that typically feed on algae. Our phylogenetic analysis places this sequence within the diversity of the order Bunyavirales (Figure 9). In addition to the freshwater organism-associated viruses identified in [65], this contig clusters with several bunya-like unclassified negative-strand viruses isolated from the fungi Cladosporium cladosporioides and the oomycete Plasmopara viticola (Figure 9) that are both plant pathogens. The multi-segment structure of the closest classified family, the Phenuiviridae, suggests that additional segments associated with the partial Bunya-like bridouvirus genome may exist in C. reptans.

Discussion
We aimed to better characterize the RNA virus diversity in two major algal lineages, the chlorarachniophytes and the ulvophytes, for which no RNA viruses had previously been reported. Our investigation of the RNA virus diversity in samples from six microalgae species led to the identification of 18 new and divergent RNA viruses, although with clear homology to five established viral families. While an unequivocal host assignment cannot be formally established on these metagenomic data alone, that the algae studied were the dominant host species in the metagenomic sequencing data, in one case (Ostreobium sp.) representing 99% of all assigned contigs, makes it likely that most if not all of these 18 viruses infect algae hosts. In addition, we identified a number of narnaviruses, a group previously observed in algae [69,70], and our observation of a Bunyavirales-like sequence is similarly in accord with a study that presented evidence for the presence of bunya/phlebo-like viruses in brown algae [71]. As such, the apparent domination of DNA viruses in microalgae at least partially reflects major sampling biases. The concept that there is a potentially large dark matter of algal viruses is further supported by the high proportion of unassigned contigs observed: we speculate that these likely contain a nonneglectable number of highly divergent viral reads.

RNA Virome Similarities between Green Algae and Land Plants
Among the 18 novel RNA virus species described here, seven of those detected in the green algae Ostreobium sp. and K. allantoideum were seemingly related to the Tombusviridae and Amalgaviridae families of plant RNA viruses. Such similarities in RNA virome composition between green algae and land plants are consistent with previous analyses based on the Plant Genome project transcriptomic data that identified partitivirus-like signals in Chlorophyte algae [22]. However, the very limited sequence similarity among these viral families strongly suggests an ancient divergence among them, perhaps even before the chlorophyte-streptophyte split some 850-1100 million years ago (Ma) [57]. The close link between land plant and green algae RNA virosphere is further supported by the recent observation that plant viruses are able to infect nonvascular plants such as mosses and algae [72,73].
The detectable sequence similarity observed between the Partitiviridae and Amalgaviridae also suggests that they share common ancestry [68], despite a wide range of hosts and genomic organizations. As such, it is important to determine whether amalgaviruses are restricted to plants [74][75][76] or, as the Partitiviridae, infect many divergent eukaryotic hosts such as fungi, plants and protists [77,78].

Divergent Homologs to Fungal Mitoviruses Detected in Ostreobium sp.
While we cannot formally identify host associations from our RNA-seq data alone, no fungal-associated contigs were detected in any of the libraries, strongly arguing against the mitovirus-like viruses detected in Ostreobium sp. as being fungal viruses. The presence of a potential new group of protist-associated mitoviruses is of importance as they have traditionally been viewed as restricted to fungal hosts and were only very recently identified in plants [79]. Similar to virus transfer between fungi and land plants, it is possible that the symbiosis and co-evolution between green algae and fungi [80,81] explains the close phylogenetic relationships of their viromes, perhaps including horizontal gene transfer events. Indeed, coral holobionts are the location of frequent interactions between endolithic algae, such as Ostreobium sp., and fungi [82][83][84]. Considering the high levels of sequence divergence between our viral sequences and those associated with fungi and within the clade formed by Ostreobium sp.-associated viruses, it seems likely that any such horizontal gene transfer events are not recent and may have occurred in Ulvophyceae or even Chlorophyta ancestors. It will be of considerable interest to examine this new group of mitoviruses across a larger set of green microalgae species, particularly whether their putative mitochondrial subcellular location is the result of an escape from cytoplasmic dsRNA silencing (as suggested for newly characterized plant mitoviruses [85]) or if they are relics of the eukaryotic endosymbiosis event, particularly as mitoviruses have bacterial counterparts-the Leviviridae [60]. More broadly, these newly-reported mitovirus-like sequences further illustrate the enormous diversity of hosts infected by the Narnaviridae, including such eukaryotic microorganisms as Apicomplexa, Excavates and Oomycetes hosts [86][87][88][89].

Detection of Plant Viruses in the Chlorarachniophytes
The apparent similarity between a Rhizarian (C. reptans) associated viral sequence and land-plant infecting viruses was striking. The Rhizaria and Archaeplastida are assumed to have diverged before the cyanobacteria primary endosymbiosis event, ca. 1.5 billion years ago [90,91]. Thus, a detectable sequence similarity between Rhizaria-associated viruses and those infecting land plants cannot be reasonably attributed to such an ancient evolutionary event. Rather, assuming these viruses actually infect C. reptans, the presence of such land plant-like viruses in Chlorarachniophyte would reflect their more recent acquisition in chlorarachniophytes through either horizontal transfer by common vector/symbiont/parasite ancestors or secondary endosymbiosis (eukaryote-to-eukaryote) events. Indeed, a secondary endosymbiosis event of a green alga in the core Chlorophyta, possibly related to Bryopsidales, led to the origin of the plastid of chlorarachniophytes between 578-318 Ma [33]. Whether this virus (i) constitutes a relic of viruses that infected this engulfed green algal endosymbiont, (ii) is part of the chlorarachniophyte cytoplasm or still associated with the periplastidial compartment (i.e., remnant cytoplasm of the endosymbiont), or (iii) interacts with the nucleomorph (remnants of the green algal endosymbiont nucleus) are key questions in the evolution of eukaryotic RNA viruses. While our data cannot provide answers and still require a formal virus-host association, it will be of interest to extend these analyses to euglenophytes and the dinoflagellate genus Lepidodinium where distinct secondary endosymbiosis with green algae have also occurred, as well as to cryptophytes that also contain the remnant nucleus of its red algae endosymbiont [92,93].

First Report of a Negative-Sense RNA Virus in Microalgae
Our detection of a Bunyavirales-like sequence in C. reptans is the first evidence of a negative-sense RNA virus in microalgae. Considering the extensive host range of the Bunyavirales (land plants, invertebrates, vertebrates and humans), their association with chlorarachniophyte hosts is plausible. However, given its very low abundance in C. reptans, additional work is clearly needed to retrieve the complete virus genome and to confirm the association of such bunya-like viruses with the chlorarachniophytes.
One of the greatest challenges in viral genomics is the ability to detect distant homologies, especially in rapidly evolving RNA viruses. As a first attempt to retrieve such ephemeral evolutionary signals, we scanned orphan contigs using RdRp protein structural information in addition to the standard primary amino acid sequence. Notably, both the RdRp profile and 3D protein structure comparison led to the identification of highly divergent RNA virus candidates, although these remain difficult to annotate. Efforts to better describe the repertoire of sequence and structure of viral RdRps are therefore central to unveiling the RNA virosphere in overlooked eukaryotic organisms.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/12/10/1180/s1, Figure S1: Gel electrophoresis of RT-PCR to validate the putative new RNA viruses in each sample used for library construction and RNA-seq. Figure S2: MAFFT alignment of the HMM-detected RdRp candidates with Pfam RdRPp profiles. Figure S3: rRNA depletion and contig assembly results. Figure S4: Genomic organization of the ALG_2 Partiti-like sequences and Bryopsis mitochondria-associated dsRNA. Table S1: RNA-seq library preparation and sequencing results, Table S2: List of primers used in this study. Table S3 RdRp-based pairwise sequence identity of the viruses newly identified here: File S1: Krona graphs obtained using the KMA and CCMetagen methods for the ALG_1 library. File S2: Krona graphs obtained using the KMA and CCMetagen methods for the ALG_2 library. File S3: Krona graphs obtained using the KMA and CCMetagen methods for the ALG_3 library.