Unveiling Mycoviromes Using Fungal Transcriptomes

Viruses infecting fungi are referred to as mycoviruses. Here, we carried out in silico mycovirome studies using public fungal transcriptomes mostly derived from mRNA libraries. We identified 468 virus-associated contigs assigned to 5 orders, 21 families, 26 genera, and 88 species. We assembled 120 viral genomes with diverse RNA and DNA genomes. The phylogenetic tree and genome organization unveiled the possible host origin of mycovirus species and diversity of their genome structures. Most identified mycoviruses originated from fungi; however, some mycoviruses had strong phylogenetic relationships with those from insects and plants. The viral abundance and mutation frequency of mycoviruses were very low; however, the compositions and populations of mycoviruses were very complex. Although coinfection of diverse mycoviruses in the fungi was common in our study, most mycoviromes had a dominant virus species. The compositions and populations of mycoviruses were more complex than we expected. Viromes of Monilinia species revealed that there were strong deviations in the composition of viruses and viral abundance among samples. Viromes of Gigaspora species showed that the chemical strigolactone might promote virus replication and mutations, while symbiosis with endobacteria might suppress virus replication and mutations. This study revealed the diversity and host distribution of mycoviruses.


Introduction
Viruses are small infective agents composed of nucleic acids including DNA or RNA that replicate in the infected host [1]. Viruses are ubiquitous, presenting not only in most living organisms, such as animals, plants, fungi, and bacteria, but also in diverse environments [2][3][4][5][6].
Viruses do not have a conserved genetic element like bacteria and fungi [7]. Moreover, isolation of pure viral particles is very difficult and challenging for viruses with low titers and coinfecting viruses [8]. The rapid development of high-throughput sequencing (HTS) followed by bioinformatics tools has facilitated the identification of numerous eukaryotic microorganisms in diverse hosts and environments [9]. In particular, metatranscriptomics based on RNA-sequencing (RNA-seq) overcomes several obstacles associated with virus identification [10,11]. HTS techniques enable us to identify both known and novel viruses in the host and provide several types of information, such as viral genomes and mutations [12,13]. Recently, virus-associated studies using HTS have used a term, "virome," which refers to a whole collection of viral nucleic acids in a specific host or a particular ecosystem [14,15]. In fact, virome studies provide several types of information associated with viral communities, such as viral genomes, viral populations, mutations, phylogeny, evolution, recombination, and interaction of viruses with the host or ecosystem, which have been widely used in many such studies.
Viruses infecting fungi are referred to as mycoviruses [16]. Most observed mycoviruses consist of double-stranded (ds) RNA genomes; however, the number of observed mycoviruses with positive or negative single-stranded (ss) RNA and ssDNA genomes is gradually increasing [17,18]. As compared to other viruses infecting animals and plants, most known mycoviruses lack an extracellular route for infection and movement proteins [19]. Some of the identified mycoviruses infecting phytopathogenic fungi have the ability to reduce the fungal hosts' pathogenicity, known as hypovirulence [16]. Many novel mycoviruses are now being identified based on metatranscriptomics, such as 66 novel mycoviruses with 15 distinct lineages from five plant-pathogenic fungi [20] and 59 viruses from 44 different fungi [21]. Moreover, several studies showed that fungal hosts are very often coinfected by various mycoviruses [21,22]. The rapid development and application of HTS techniques has increased the vast amount of sequencing data in public databases, such as the Short Read Archive (SRA) at NCBI, which are freely available. Many studies have demonstrated the usefulness of transcriptomic data for virus identification and virome study without any a priori knowledge of virus infection [21,23]. Here, we carried out in silico bioinformatics analyses to identify viruses infecting fungi using available fungal transcriptome data and addressed several mycovirus-associated findings.

Identification of Virus-Associated Contigs from Fungal Transcriptomes
To identify viruses from fungal transcriptomes, we obtained 3,733,874 fungal assembled contigs (4,268,333,264 bp) from 126 different fungal transcriptomes in the Transcriptome Shotgun Assembly (TSA) database ( Figure 1a and Table S1). Initially, all fungal contigs were subjected to a BLASTX search against the viral protein database, resulting in 7244 contigs based on an E-value of 1 × 10 −10 as a cutoff. After removing sequences from hosts and other organisms, we obtained 468 virus-associated contigs (Figure 1b and Table S2).

Viral Genome Assembly and Phylogeny of Novel Mycoviruses
In total, 120 viral genomes were assembled. Of these, 106 viral genomes could be taxonomically assigned to 16 known viral families, whereas 14 viral genomes were not assigned to any known viral families (Table S3). Most viruses were novel viruses with low nucleotide sequence identity, with an average of 48.23% against known viruses, and had a single genome, except for four viruses with two RNA segments. Except for one ssDNA virus, all viruses had RNA genomes, which were further divided into unknown  We analyzed the genome organization of the 120 assembled viral genomes (Figures 2a and S1). Most viruses encoded a single protein. For example, all viruses in the families Botourmiaviridae, Narnaviridae, Partitiviridae, Bromoviridae, Circoviridae, Phasmaviridae, and Rhabdoviridae encoded an RdRp. Out of nine viruses in the family Gammaflexiviridae encoding an RdRp, three viruses encoded RdRp and movement proteins (MPs). In the family Totiviridae, many identified viruses had both coat proteins (CPs) and RdRp proteins or a single CP or RdRp, except for the genome of Uromyces totivirus D, which encoded two hypothetical proteins. Some viruses in the families Potyviridae, Fusariviridae, Secoviridae, Hypoviridae, and Tymoviridae encoded a polyprotein. The genome organization for viruses in the family Virgaviridae and unclassified viruses was very diverse. For instance, Entomophthora virgavirus A encoded a polyprotein and three hypothetical proteins, whereas Leucocoprinus tobamovirus A encoded an RdRp and MP.
Based on 107 RdRp amino acid sequences, we generated a phylogenetic tree revealing 16 viral families, 1 viral order, and 1 group with unclassified viruses (Figures 2b and S2). To reveal the phylogenetic relationships of the identified viruses in detail, we generated 18 phylogenetic trees according to virus families ( Figures S3-S20). The six major virus families identified were Narnaviridae (20 viruses), Botourmiaviridae (13 viruses), Totiviridae (13 viruses), Gammaflexiviridae (9 viruses), Hypoviridae (7 viruses), and Partitiviridae (7 viruses) ( Figure 2c). The phylogenetic trees revealed the genetic relationships and host interactions of the identified viruses ( Figure 3c). For instance, all viruses in the families Botourmiaviridae, Gammaflexiviridae, and Hypoviridae, as well as nine mitoviruses in the family Narnaviridae derived from this study, showed sequence similarity to those from fungal hosts. However, 11 narnaviruses ( Figure S3), 1 totivirus ( Figure S5), and 2 partitiviruses ( Figure S8) were grouped with those from insects ( Figure 2c).
The 20 viruses identified in the family Narnaviridae were further divided into narnaviruses (11 viruses) and mitoviruses (9 viruses) ( Figure S3). Most narnaviruses were closely related to known narnaviruses from insects except for two Puccinia narnaviruses, which were grouped with the Fusarium poae narnavirus 1. Interestingly, four narnaviruses derived from trypanosomatids, which are parasites of insects, were clustered together with those from insects and fungi. We identified 13 ourmiaviruses in the family Botourmiaviridae from six fungal species in the same clade, which were distantly grouped with 2 ourmiaviruses from plants ( Figure S4). Interestingly, four different Monilinia ourmiaviruses were phylogenetically different from each other. Except for Uromyces totivirus D in the same clade with two Wuhan insect viruses, 13 totiviruses from Uromyces, Podosphaera, and Phakopsora genera were together with other totiviruses from fungi ( Figure S5). Of nine gammaflexiviruses from Leucocoprinus species, Leucocoprinus gammaflexivirus E was distantly related with other gammaflexiviruses ( Figure S6).
Except for Mycosphaerella hypovirus A, six hypoviruses were identified from Monilinia species ( Figure S7). The phylogenetic tree showed that all known hypoviruses were identified from fungi except for two hypoviruses from insects. Among the seven identified partitiviruses, Entomophthora partitivirus D was closely related to other partitiviruses from insects ( Figure S8).
The two identified barnaviruses in this study were very distantly related to the known Rhizoctonia solani barnavirus 1 ( Figure S9) [20]. Of the six benyviruses, we identified two Hubei Beny-like viruses from Entomophthora muscae that cause a fungal disease in flies [3], while the other four viruses were grouped with those from fungi ( Figure S10). We identified an anulavirus in the family Bromoviridae, a circovirus in the family Circoviridae, and two fusariviruses in fungi (Figures S11-S13). In this study, Mucor phasmavirus A was grouped with Mucorales RNA virus 1 and two other insect viruses ( Figure S14). Of the two picornaviruses, the novel Pleurotus picornavirus A from mushrooms showed strong similarity to the known Dicistroviridae TZ-1 in the family Dicistroviridae from human blood ( Figure S15) [24]. Uromyces potyvirus A and Zymoseptoria comovirus A were grouped with other viruses in plants (Figures S16 and S17). Entomophthora rhabdovirus A was closely related to other insect rhabdoviruses ( Figure S18). Three tymoviruses from Leucocoprinus species and all six unclassified viruses were grouped with those from fungi (Figures S19 and S20).  Table S2. The number on the right indicates virus genome size. Genome organization for viruses in other families can be found in Figure S2. (b) Phylogenetic tree using RdRp sequences of 107 identified viruses. Amino acid sequences were aligned with MAFFT and used for phylogenetic tree construction using the maximum likelihood method and the LG + I + G4 protein substitution model implemented in IQ-TREE. The phylogenetic tree is midpoint rooted using FigTree. Viruses are indicated by different colors according to corresponding virus family. The detailed phylogenetic tree can be found in Figure S2. (c) Phylogenetic trees of identified viruses belonging to six major families and related viruses. RdRp sequences of identified viruses and related viruses were aligned by MAFFT followed by sequence trimming using TrimAL. Maximum likelihood phylogenetic trees for individual virus families were constructed using IQ-TREE. The protein substitution model was indicated. Red lines indicate viruses identified from our study. Viruses from plants, fungi, and insects  Table S2. The number on the right indicates virus genome size. Genome organization for viruses in other families can be found in Figure S2. (b) Phylogenetic tree using RdRp sequences of 107 identified viruses. Amino acid sequences were aligned with MAFFT and used for phylogenetic tree construction using the maximum likelihood method and the LG + I + G4 protein substitution model implemented in IQ-TREE. The phylogenetic tree is midpoint rooted using FigTree. Viruses are indicated by different colors according to corresponding virus family. The detailed phylogenetic tree can be found in Figure S2.

Viral Abundance in Different Fungal Transcriptomes
We examined viral abundance in each fungal transcriptome based on FPKM (fragments per kilobase of exon model per million reads mapped) values by mapping raw sequence reads from SRA data on the 468 virus-associated contigs. Unfortunately, few SRA datasets were available (Table S4). After calculating FPKM values for each SRA dataset, we found that the SRA data from the FLX454 system were highly mapped on the virusassociated contigs due to long read sizes (Table S5). Therefore, we excluded seven SRA datasets from further analysis. Finally, we obtained FPKM values from 111 SRA datasets (Table S5). To simplify the process, we combined the viral FPKM values for each identified virus species in the same fungal species (Table S6) We compared the number of infecting viruses and the amount of viral abundance ( Figure 3b) and found that the number of infecting viruses was correlated with viral abundance. For example, both GiM and LeG contained 24 different viruses with a high amount of viral abundance. However, PeR was infected by a single virus, but its viral abundance was very high. Next, we examined the frequency of identified viruses in different fungal species (Figure 3c). Anulavirus, betapartitivirus, potexvirus, and potyvirus were identified in a single fungal species, whereas mitovirus was identified in at least 17 fungal species. Moreover, endornavirus (14 species), the unclassified virus (14 species), partitivirus (12 species), barnavirus (11 species), and ourmiavirus (11 species) were frequently identified in different fungal species. Based on FPKM values, mitovirus showed the highest viral abundance, followed by the unclassified virus and gemycircularvirus.

Analyses of Diverse Monilinia and Gigaspora Viromes
We examined changes of viromes for two selected fungal transcriptome projects for Monilinia and Gigaspora species, respectively. Monilinia transcriptomes comprised 18 different samples established from three different hosts, four different regions, and three different Monilinia species: M. fructicola, M. laxa, and M. fructigena [25]. The number of identified viruses ranged from 1 to 10 (Table S8). In particular, three samples from M. fructigena collected from pear in Emilia-Romagna, northern Italy, were severely infected by many viruses (9 to 10 viruses) (Figure 4a). In addition, the proportion of virus-associated reads in three M. fructigena transcriptomes was relatively high (2.3-6.1%) as compared to those of the other two Monilinia species (less than 1%).  (Table S8). For example, mitovirus was dominant in M. fructicola from the Puglia region, while ourmiavirus was dominant in M. fructicola from the Campania region. Although different growth conditions and tissues were used, we did not find any possible effects on viral populations. Hypovirus was dominant in two samples from plums in the Basilicata region, whereas barnavirus was dominant in three samples from pears in the Emilia-Romagna region (Figure 4b and Table S8).
Next, we analyzed viromes of Gigaspora margarita, which comprised 21 transcriptomes derived from 10 different conditions (Table S9) [26]. G. margarita was infected by at least nine different viruses, and the proportion of virus-associated reads ranged from 0.06-0.33% (Figure 4c). The GOU-13 sample was infected by only four viruses; however, the proportion of virus-associated reads was the highest (0.33%). Two frequently identified viruses in G. margarita were mitovirus and narnavirus (Figure 4d). In particular, mitovirus was the most dominant virus in 6 out of 10 conditions. Although G. margarita was derived from a single strain, the viral populations in each condition were varied. Strigolactone is known to stimulate arbuscular mycorrhizal fungi by activating mitochondria [26]. GOU-15, GDR-31, and GDR-34 samples were treated with strigolactone; however, we did not observe any significant changes in the viromes of G. margarita caused by strigolactone. This study was initially conducted to observe the symbiotic effects of an endobacterium to increase the bioenergetic potential of G. margarita [26]. The number of infecting viruses and viral abundance were slightly reduced in the three symbiotic conditions, GDR-16, GDR-36, and GDR-38, compared with the other seven pre-symbiotic conditions (Figure 4d,e).
Viruses have high mutation rates as compared to their hosts. We examined virus mutation frequency in two different viromes. Although Monilinia was infected by many viruses, only three samples from M. fructigena collected from pear in Emilia-Romagna showed many mutations (Table S10). In particular, barnavirus showed many mutations as compared to other viruses (Figure 4e). By contrast, several mutations were observed in most G. margarita samples (Figure 4f and Table S11). Interestingly, GOU15 and GDR-31 samples treated with strigolactone had many mutations in several identified virus genomes. In GOU-15, containing six viruses, barnavirus (43 mutations) had the highest number of mutations followed by mitovirus (31 mutations) and parvovirus (22 mutations). In GDR-31, containing four viruses having mutations, narnavirus (91 mutations) had the highest number of mutations followed by partitivirus (35 mutations) and narnavirus (28 mutations). Interestingly, the numbers of virus mutations for the two symbiotic conditions, GOU-16 and GOU-38, were 0 and 10, respectively (Figure 4e and Table S11).

Discussion
Here, we conducted a large-scale in silico mycovirus identification and mycovirome study using public fungal transcriptome datasets. Our study focused on mycoviruses, which have not been well explored as compared to viruses infecting other eukaryotic kingdoms. Similarly, a recent study also carried out in silico bioinformatics analyses using available fungal transcriptomes, identifying 59 mycoviral genomes [21]. There were two main differences in materials and methods between the two studies. The first was the diversity of fungal transcriptomes. In our study, we analyzed 126 fungal transcriptomes from at least 11 fungal phyla, whereas the recent study focused on Pezizomycotina fungi of the phylum Ascomycota. The second was reference viral proteins for virus identification.
To increase the number of virus-associated contigs, we used all available viral proteins for the BLASTX search instead of using viral RdRp sequences, revealing a wide range of mycoviruses encoding diverse viral proteins. By contrast, the recent study only used six different viral RdRp families. Although we intensively conducted BLASTX against known viral proteins for virus identification, it is highly likely that some novel virus groups that do not show any sequence similarity to known viral proteins might not have been discovered.
Our approach enabled us to assemble 120 viral genomes and reveal mycoviruses classified into five orders, 21 families, 26 genera, and 88 species, representing the widest range of mycoviral taxonomy in a single study. Although both studies analyzed the assembled mycoviral genomes with phylogenetic trees and domain study, our study additionally showed viral abundance, viral diversity, and mutation frequency using fungal transcriptomes. Our results might provide a better picture of how much of the true overall mycovirome is visible within these results.
To increase specificity for virus identification, BLASTX against the NR database eliminated possible endogenous virus-like sequences, resulting in 468 virus-associated contigs. Due to the low abundance of mycoviruses in fungal transcriptomes, the number of mycovirus-associated reads was very low, with the result that about half of the assembled virus-associated contigs were less than 1 kb. Moreover, it is also possible that the 83 of 126 fungal transcriptomes that did not contain any virus-associated contigs might have been infected by unknown mycoviruses. The low abundance of mycoviruses in the fungal hosts could not be identified by metatranscriptomics; however, they could be identified by PCR based approaches with virus-specific primers. Identification of virus by a single approach does not guarantee virus infection in a living organism.
Most of the transcriptome data in this study were derived from mRNA libraries using poly(A) selection. Therefore, the majority of the viruses identified in this study might have polyadenylated genomes, whereas viruses with DNA or RNA without poly(A) could be under-represented. Most identified virus-associated contigs were derived from ssRNA and dsRNA genomes; however, we identified viruses with negative ssRNA, ssDNA, and dsDNA genomes, suggesting transcribed viral RNAs from DNA genomes by metatranscriptomics. For example, the infection of gemycircularviruses and genomoviruses with a single circular DNA genome in fungi was reported previously [18]; however, this is the first report of the identification of a mastrevirus and alphasatellite from fungal transcriptomes.
The phylogenetic tree revealed that many of the identified mycoviruses were originated from fungi; however, some mycoviruses had strong phylogenetic relationships with those from insects and plants. In particular, four known narnaviruses derived from trypanosomatids were closely related with those from insects and fungi, suggesting that the previously identified narnaviruses from insects might be derived from insect parasites [27]. It is likely that the insect parasites were also included during RNA preparation from insect tissues.
Several virus families, including Bromoviridae, Circoviridae, and Dicistroviridae, were identified from fungi for the first time. Interestingly, Gigaspora circovirus A with ssDNA genomes showed sequence similarity to only the beak and feather disease virus, which causes a viral disease in birds [28]. Circoviruses are circular ssDNAs that encode for two proteins, bidirectional replication initiator protein (Rep) and CP. However, Gigaspora circovirus A has only a Rep. Furthermore, Gigaspora circovirus A was identified in all 10 Gigaspora transcriptomes. This result confirmed the possible infection of a circovirus in the fungus. The strong similarity between the novel Pleurotus picornavirus A and known Dicistroviridae TZ-1 from the blood of febrile Tanzanian children was somewhat interesting [24]. However, other dicistroviruses from insects were grouped together with picornaviruses from insects, suggesting the presence of dicistrovirus in the mushroom. In addition, parvoviruses and flaviviruses, which infect both vertebrate and invertebrate hosts, were detected in fungi for the first time.
Uromyces potyvirus A and Zymoseptoria comovirus A showed strong similarity to other viruses from plants, indicating that both fungi transmit plant viruses that infect both fungi and plants. Fungal transmission of a plant virus has been reported [29]. Moreover, plants are known hosts for viruses in the family Virgaviridae, which were also identified in fungi, suggesting that fungi could be alternative hosts. A previous study reporting the replication of endophytic mycovirus in plant cells supports our finding [30].
The genome organization of the 120 mycoviromes indicates that most mycoviruses had simple genome structures. For instance, many of the identified mycoviruses had an RdRp required for viral replication. In addition, we found that the genome structures of some viruses in the families Barnaviridae, Benyviridae, Gammaflexiviridae, Virgaviridae, and Tymoviridae were not identical, indicating changes of viral genome structures during the mycoviruses' evolution and host adaptation.
A few recent studies revealed the complete genome of a large number of viruses from vertebrates and invertebrates using transcriptomic and complementary approaches [3,4]. By contrast, in our in silico mycovirome study, we faced several obstacles to obtaining complete mycoviral genomes from only transcriptome data because of the low titer of mycoviruses in the fungi and unavailability of original samples. However, 468 mycovirus-associated sequences, in which 120 viral genomes were assembled, provided a comprehensive overview of the distribution of mycovirus taxonomy and their hosts. It was also surprising that the fungi could be the hosts of numerous viruses, as shown in GiM and LeG, which were infected by at least 24 different viruses. In addition, individual mycoviromes showed at least 10 and 8 different viruses infecting Monilinia and Gigaspora species, respectively, revealing the coinfection of mycoviruses in many fungal hosts.
The virome study based on the mapping of raw sequence reads revealed a possible correlation between the number of infecting viruses and viral abundance. This result suggests that the competition of diverse coinfected mycoviruses promoted the viral replication. Although the coinfection of diverse mycoviruses in the fungi was common in our study, most viromes had a dominant virus species. Our study identified the most common mycoviruses, such as mitovirus, endornavirus, benyvirus, ourmiavirus, and partitivirus, with a wide range of fungal hosts, as described previously [20,21]. However, several viruses, including anulavirus, betapartitvirus, potexvirus, and potyvirus, had a limited range of hosts.
The single nucleotide polymorphism (SNP) analysis showed that there were differences in virus mutation frequency among coinfecting mycoviruses, and most mycoviruses did not show strong mutation except for a few mycoviruses, such as barnavirus, mitovirus, and narnavirus.
The study of Monilinia viromes revealed that there were strong deviations in the composition and viral abundance of viruses that originated from identical samples. This might have been caused by virus abundance, RNA extraction, library preparation, or RNA-seq. In addition, hosts and sample regions played an important role in viral populations in different Monilinia species. Although different dark and light conditions and tissues were used for sample preparation, it was not sufficient to determine their effects on the virome without biological replicates.
The study of Gigaspora viromes revealed that the chemical strigolactone might promote virus replication and mutations, while symbiosis with endobacteria might suppress virus replication and mutations. Strigolactones, hormones extracted from plant roots, play an important role in symbiotic interactions between plants and arbuscular mycorrhizal fungi [31]. However, our results demonstrated negative effects of strigolactones on fungi and mycoviruses. The strong symbiotic interaction between endobacteria and Gigaspora species inhibited mycovirus replication. A previous study showed that infection of Sclerotinia sclerotiorum with a pathogenic mycovirus converted S. sclerotorum into a beneficial endophytic fungus [32]. Moreover, field experiments involving spraying mycovirus-infected fungal strains on rape plants were reported to reduce disease and increase yield [32]. We found that treatment with strigolactone promoted the replication and mutation of the mycovirus infecting Gigaspora species. Therefore, it might be of interest to examine whether treatment with strigolactone in Gigaspora species infected with a mycovirus could reduce the pathogenicity of Gigaspora species. Taken together, the effects of strigolactones and symbiosis of endobacteria on mycovirus lifecycles should be further elucidated.
As shown in our study, the compositions and populations of mycoviruses were more complex than we expected. This study is the largest comprehensive mycovirome study shedding light on an array of mycovirome-associated topics, including virus identification, fungal host ranges, taxonomy of mycoviruses, viral genome assembly, phylogenetic analyses, viral abundance, mutations, and changes of mycoviral populations in different conditions. Moreover, this study provides many clues and much information to reveal the diversity and host distribution of mycoviruses.

Construction of Viral Genome and Protein Database
All viral genomes and protein sequences were downloaded from NCBI's viral genome website (https://www.ncbi.nlm.nih.gov/genome/viruses/, accessed on 1 June 2019) on 10 October 2018. The downloaded sequences were used to set up a database for BLAST and DIAMOND using the following commands: "makeblastdb -in viral_genome.fasta -dbtype nucl -out viral_genome" and "diamond makedb -in viral_protein.faa -d viral_protein" [33].

Identification of Virus-Associated Contigs
All fungal transcripts were subjected to BLASTX search against the viral protein database using the DIAMOND program with an E-value of 1 × 10 −10 as a cutoff (diamond blastx -d viral_protein -q fungal_contigs.fasta -e 1 × 10 −10 -k 1 -f 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen -o fun-gal_contigs_viral_protein.txt) [33]. Out of 3,733,874 fungal transcripts, 7244 contigs showing similarity to viral proteins were extracted for further analysis. The 7244 extracted contigs were subjected to BLASTX using DIAMOND with an E-value of 1 × 10 −10 as a cutoff against NCBI's NR database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz, accessed on 1 June 2019) to remove sequences derived from hosts and other organisms such as eukaryotes and bacteria. We ultimately obtained 468 virus-associated contigs.

Viral Genome Annotation
The 468 virus-associated contigs were again subjected to TBLASTX and BLASTN searches against the viral genome database to annotate the virus-associated contigs using an E-value of 1 × 10 −10 . We selected virus-associated contigs larger than 1000 bp for the prediction of open reading frames (ORFs) by the stand-alone version of ORFfinder in NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/ORFfinder/linux-i64/, accessed on 1 June 2019). The predicted ORFs were compared to the related reference virus genome by BLASTP. Conserved domains in the virus-associated contigs were predicted by the SMART program (http://smart.embl-heidelberg.de/, accessed on 1 June 2019) [34].

Phylogenetic Tree Construction
We obtained 153 viral protein sequences composed of 107 viral replicase-associated sequences, 17 CPs, 15 hypothetical proteins, 6 MPs, and 14 other proteins. We only used replicase-associated protein sequences as representatives of identified viruses for the phylogenetic analyses. To examine the phylogenetic relationship of identified viruses with known viruses, 107 viral replicase proteins were divided into 16 viral families, 1 viral order, and 1 unclassified group, resulting in 18 datasets. The viral replicase proteins in each dataset were subjected to BLASTP search against the NR database in NCBI. We included all matched protein sequences and removed redundant protein sequences. Viral replicase sequences in each dataset were aligned using the L-INS-I method implemented in MAFFT version 7.450 [35]. We removed all ambiguously aligned sequences by TrimAl (version 1.2) with the option "automated1" [36]. We determined the best-fit model of amino acid substitution in aligned sequences using IQ-TREE [37]. The maximum likelihood phylogenetic tree in each dataset was inferred using IQ-TREE with the ultrafast bootstrap method [38] and the SH-aLRT branch test [39]. The obtained phylogenetic trees were visualized by FigTree (http://tree.bio.ed.ac.uk/software/figtree/, accessed on 1 June 2019).

Estimation of Viral Abundance in Fungal Transcriptomes
We downloaded all raw sequence data associated with the 43 fungal transcriptomes containing virus-associated contigs from the SRA database in NCBI (https://www.ncbi. nlm.nih.gov/sra, accessed on 1 June 2019). Unfortunately, some raw sequences were not available. In many cases, there were several SRA datasets for a fungal transcriptome. We calculated viral abundance for an individual SRA dataset, not a single fungal transcriptome. The downloaded SRA data were converted into fastq files using the fastq-dump tool implemented in the SRA toolkit program [40]. Each fastq file was mapped on the 468 virus-associated sequences using the Burrows-Wheeler Alignment (BWA) tool with the "BWA-MEM" algorithm [41]. The mapped SAM file was subjected to pileup.sh in the BBMap program (https://sourceforge.net/projects/bbmap/, accessed on 1 June 2019) to extract the number of reads and FPKM values, which is a normalized estimation of gene expression considering the gene length and the sequencing depth in RNA-seq data. To calculate viral abundance in each fungal transcriptome, the number of virus-associated reads was divided by the total number of reads. We expected more reads from the large virus genomes and datasets with high sequencing depth. Therefore, we used FPKM values to calculate the proportion of viral abundance in a specific fungal transcriptome.

Fungal Transcriptome Assembly and Calculation of Viral Mutations
For the virome study, we selected two previously studied fungal transcriptomes, Monilinia transcriptomes including 18 libraries and Gigaspora margarita transcriptomes including 21 libraries. Both fungal transcriptomes were obtained from diverse fungal samples that were infected by many mycoviruses. However, the available assembled fungal transcriptomes were obtained by combining all libraries. Therefore, it was necessary to assemble an individual library for the virome study. Raw sequence reads from each condition were de novo assembled using the Trinity program (version 2.84) with default parameters [42]. Then, each assembled transcriptome was subjected to BLASTN search against 468 virus-associated contigs. From each transcriptome, we extracted virus-associated contigs, followed by BLASTX search against the NR database to remove non-viral sequences. Finally, we obtained virus-associated contigs from the individual condition. We mapped the raw sequence reads against the virus-associated contigs in each condition to calculate viral abundance and SNPs. Viral abundance in each condition was calculated by the number of viral reads and FPKM values.

Identification of Virus Mutations
It is important to select proper reference sequences for the identification of virus mutations. For example, if we use a known reference viral genome for mapping, we can find several SNPs between two different viral genomes. However, the obtained SNPs are not viral mutations. For that, we used viral sequences as references and raw sequence reads in the identical condition for the calculation of SNPs. We obtained SNP results for 18 Monilinia and 21 Gigaspora margarita transcriptomes. SNPs were calculated as follows. For each condition, raw sequence reads were mapped on the viral sequences obtained from the identical condition using the BWA-MEM algorithm with default parameters. We converted the aligned reads in the SAM file format into the BAM file format using SAMtools (ver. 1.3) [43]. After sorting and indexing the BAM file, SNP calling was performed using BCFtools (ver. 1.3) [44]. We obtained a VCF (Variant Call Format) file containing information on SNPs representing virus mutations for each condition.