Direct Metatranscriptomic Survey of the Sunflower Microbiome and Virome

Sunflowers (Helianthus annuus L.) are susceptible to multiple diseases in field production. In this study, we collected diseased sunflower leaves in fields located in South Dakota, USA, for virome investigation. The leaves showed visible symptoms on the foliage, indicating phomopsis and rust infections. To identify the viruses potentially associated with the disease diagnosed, symptomatic leaves were obtained from diseased plants. Total RNA was extracted corresponding to each disease diagnosed to generate libraries for paired-end high throughput sequencing. Short sequencing reads were assembled de novo and the contigs with similarities to viruses were identified by aligning against a custom protein database. We report the discovery of two novel mitoviruses, four novel partitiviruses, one novel victorivirus, and nine novel totiviruses based on similarities to RNA-dependent RNA polymerases and capsid proteins. Contigs similar to bean yellow mosaic virus and Sclerotinia sclerotiorum hypovirulence-associated DNA virus were also detected. To the best of our knowledge, this is the first report of direct metatranscriptomics discovery of viruses associated with fungal infections of sunflowers bypassing culturing. These newly discovered viruses represent a natural genetic resource from which we can further develop potential biopesticide to control sunflower diseases.


Introduction
Annual sunflower (Helianthus annuus L.) production is an economically important crop. In China, the total sunflower acreage planted, mainly for the non-oilseed types, was relatively stable with fluctuations from 1978 to 2018. The average yield, however, increased steadily from 1077 kg/ha to 2707 kg/ha, so the overall trend of sunflower production is increasing. In the USA, the yield and production show a stable trend due to the improved hybrid varieties, despite declining sunflower acreage [1]. Both oil and confection sunflower are prone to several major diseases, such as downy mildew, phomopsis stem canker, and rust, which can greatly impact the yield. The causal agents of these diseases are Plasmopara halstedii, Phomopsis helianthi, and Puccinia helianthi, respectively, and symptoms are readily identifiable in the field. Unfortunately, existing controls based on host resistance or fungicides have not provided sufficient control to date, which calls for developing alternative control methods.
Recent surveys have demonstrated that plants often harbor novel mycoviruses [2], which have the potential to be developed for biopesticides [3] if hypovirulence (reduced virulence) in the fungal hosts is found. Hypovirulence in fungi has been reported in many different fungi, such as Cryphonectria parasitica, Sclerotinia sclerotiorum, Rhizoctonia spp., and Fusarium spp., etc., although many mycoviruses appear to cause asymptomatic infections (reviewed in [4][5][6]). Field studies based on foliar sprays or "bio-priming" seed treatment of mycelial homogenates of fungal isolates infected with mycoviruses showed significantly reduced severity of fungal diseases on brassica oilseed crops [7,8]. However, few potential mycoviruses associated with sunflower fungal pathogens have been discovered and characterized with only a few focused on the oomycete P. halstedii [9,10], and none for Phomopsis helianthi and Puccinia helianthi. Besides their virocontrol aspect, past surveys have observed that mycoviruses of the same family infecting the same fungal host tend to cluster closely phylogenetically [11] and can be useful for understanding fungal epidemiology [12]. Cryptic and persistent plant-associated viruses also have been found to change plant volatiles and protect the plant host from aphids, a common vector for acute plant viruses [13]. Therefore, the presence of specific viruses may provide useful information to predict the incidence of fungal or plant viral diseases. Currently, there are still limitations on the resolution of the fungal nuclear ribosomal internal transcribed spacer (ITS) region to make an accurate identification of some fungal endophytes, so plant-associated viromes could serve as biomarkers for the identification of the fungal endophyte community.
Sometimes, lab isolates of fungal cultures have sectoring growth and are discarded by the pathologists when the transferred isolates grow slowly and become hard to maintain. Therefore, candidates of biopesticide or biotechnological use could be eliminated in the culturing/serial passage process, e.g., Macrophomina phaseolina tobamo-like virus 1 [11,14]. Moreover, some fungal pathogens, e.g., sunflower downy mildew and rust, being obligate parasites, cannot be cultured in the lab. Metatranscriptome surveys of plant leaves/roots or fungal isolates of plant pathogens have allowed for the discovery of many novel mycoviruses [2,15] bypassing the culturing step. Once potential mycoviruses are prioritized for further characterization, the resulting mycoviruses can sometimes reprogram the pathogenic fungi to grow as beneficial endophytes to increase yield beyond the protective effect [16].
In this study, we directly identify novel viruses from the diseased tissues of sunflowers caused by rust and phomopsis. This direct approach of sequencing diseased tissue without first culturing bypasses the possibility of selecting for the mildly attenuating or asymptomatic mycovirus infections and allows us to characterize the plant viromes of the pathogens to include the ones that cannot be easily cultured in the lab.

Field Sampling and Sample Processing
To identify potential mycoviruses from the field, tissue samples of diseased sunflowers were collected in the summer of 2017. Symptomatic leaf samples from four diseased plants, each for phomopsis and rust, from Brookings County and Sully County in South Dakota, USA, were sequenced, respectively, in this study. The two diseased plant samples were collected separately from different fields on different dates and were not coinfected by the two pathogens. The phomopsis plants were sampled early to mid-September 2017 for leaves showing symptoms but not yet completely wilted. Specifically, the green leaf tissues on the edge of the lesions caused by phomopsis were sampled because the lesion area had very little intact RNA left (data not shown). Four leaves per plant were stacked, punched into discs, divided into 150 mg per tube, flash-frozen in liquid nitrogen, and stored at −80°C until further processing. Leaf discs containing rust pustules and between the pustules were both included in the RNA extraction.

RNA Extraction
RNA extraction was conducted as previously described by Marzano et al. [11]. Leaf discs were ground in tubes dipped in liquid nitrogen by metal beads, and total RNA was extracted using Qiagen RNeasy Plant Mini Kit following the manufacturer's instruction (Valencia, CA, USA). RNA samples were treated with DNase I and evaluated for integrity by agarose gel electrophoresis.

Library Construction
Total RNA (approximately 1 µg) free of genomic DNA was depleted of rRNAs with the Ribo-Zero Plant Kit and used as a template to construct paired-end libraries with a ScriptSeq RNA sample preparation kit (both manufactured by Illumina, San Diego, CA, USA), and were cleaned and sequenced on an Illumina HiSeq 4000 for pair-end sequencing through the W.M. Keck Center, University of Illinois. SF1 and SF2 libraries were constructed using RNA extracts from phomopsis-and rust-infected leaves, respectively.

Sequence Analysis
The microbiome associated with sunflower leaves was profiled using DIAMOND [17] by aligning the short reads to the National Center for Biotechnology Information (NCBI) nonredundant (nr) amino acid sequence database, parsed, and assigned to taxa using MEGAN 6 [18]. The RNA-Seq library reads were analyzed and uploaded to the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) under accession PRJNA657097. There were 65,156,534 and 114,034,206 reads from SF1 (SRS7200872) and SF2 (SRR12450951) libraries, respectively. Adapters were first trimmed using bbduk (JGI) and assembled into contigs using a TRINITY (v2.5.1) de novo transcriptome assembler [19]. After trimming of the adapters, there were~98% of reads left for assembly. Contigs with significant similarity of viral amino acid sequences were identified using USEARCH ublast [20] with a parameter e-value of 0.0001 for significant hits and compared to a custom database containing sunflower and viral amino acid sequences from GenBank using BLASTX to exclude misidentified sequences.
The predicted amino acid sequences of potential novel viruses were aligned to the contigs from NCBI using ClustalX 2.0 [21]. Phylogenetic trees were generated with RAxML [22] and modified with Dendroscope 3.7.2 [23]. Poorly supported lineages were contracted. To better present the evolutionary relationships between the novel viruses and other known viruses, the minimum support thresholds were set to 47 for Partitiviridae and 44 for Mitoviridae because the new viral contigs would not cluster with other viruses otherwise. For other viruses, the thresholds were set at 50.

Metatranscriptomic Comparison of Microbiome/Virome from Two Libraries
Upon aligning the short reads to NCBI nr-database, the metatransciptomic comparison of the two libraries using MEGAN analysis provided verification of the association with their respective plant diseases. The causal agent(s) of phomopsis stem canker of sunflower were detected from infected leaves, coinfected by other fungi under families Sclerotiniaceae, Hypocreales, and Pleosporineae. Some bacterial taxa are common to both libraries in approximately the same amount, including Candidatus kryptonium, Bacillus azotoformans, and Staphylococcus haemolyticus. However, the SF1 library had nearly 10X more Actinobacteria in normalized read counts than the SF2 library. In addition, most of the HaEV1 reads were aligned to the SF1 library, in contrast to the SF2 library, which had only mitoviruses and dsRNA viruses (Durnavirales) detected using the MEGAN analysis based on short read alignments ( Figure 1). Table A1 summarizes all the viruses discovered in this study with identities to RNAdependent RNA polymerases (RdRps) or capsid proteins (CPs). Based on the RdRp identities and contig lengths, the virus genomes included in this report are one endornavirus, two mitoviruses, three partitiviruses, and four totiviruses ( Figure 2). Based on the similarities to CPs, 15 viral contigs were appended in Table A1 as well. Based on both RdRps and CPs, three known viruses, including Helianthus annuus alphaendornavirus 1 (HaEV1), bean yellow mosaic virus, and a short contig of Sclerotinia sclerotiorum hypovirulence-associated DNA virus were detected. Four of the viral contigs likely had complete RdRps, including Helianthus annuus alphaendornavirus (HaEV1), Helianthus annuus leaf-associated partitivirus 2 (HlaPV2), Helianthus annuus leaf-associated partitivirus 3 (HlaPV3), and Helianthus annuus leaf-associated totivirus 1 (HlaTV1).  Table A1 as well. Based on both RdRps and CPs, three known viruses, including Helianthus annuus alphaendornavirus 1 (HaEV1), bean yellow mosaic virus, and a short contig of Sclerotinia sclerotiorum hypovirulence-associated DNA virus were detected. Four of the viral contigs likely had complete RdRps, including Helianthus annuus alphaendornavirus (HaEV1), Helianthus annuus leaf-associated partitivirus 2 (HlaPV2), Helianthus annuus leaf-associated partitivirus 3 (HlaPV3), and Helianthus annuus leaf-associated totivirus 1 (HlaTV1).   annuus leaf-associated partitivirus 2 (HlaPV2), Helianthus annuus leaf-associated parti virus 3 (HlaPV3), and Helianthus annuus leaf-associated totivirus 1 (HlaTV1).

Endornavirus Genome
The predicted amino acid sequence of a known virus, HaEV1, (14,645 nt; MT873524) contained a conserved polyprotein with 99.8% identity to the previously published Helianthus annuus alphaendornavirus sequence (NC_040799.1) and is related to another alphaendornavirus (MN015676.1) with 40% identity. Besides these two viruses, the HaEV1 amino acid sequence was also grouped with other endornaviruses, including Helicobasidium mompa alphaendornavirus 1, Soybean leaf-associated endornavirus 1, Phytophthora endornavirus 1, and Vicia faba endornavirus. Therefore, based on the structure of the phylogenetic tree and the genus of known viruses from the International Committee on Taxonomy of Viruses (ICTV), HaEV1 belongs to Alphaendornavirus genus.

Endornavirus Genome
The predicted amino acid sequence of a known virus, HaEV1, (14,645 nt; MT8735 contained a conserved polyprotein with 99.8% identity to the previously published H anthus annuus alphaendornavirus sequence (NC_040799.1) and is related to another phaendornavirus (MN015676.1) with 40% identity. Besides these two viruses, the HaE amino acid sequence was also grouped with other endornaviruses, including Helicobas ium mompa alphaendornavirus 1, Soybean leaf-associated endornavirus 1, Phytophth endornavirus 1, and Vicia faba endornavirus. Therefore, based on the structure of the p logenetic tree and the genus of known viruses from the International Committee on T onomy of Viruses (ICTV), HaEV1 belongs to Alphaendornavirus genus.

Putative Totivirus Genome
Based on the similarity to RdRp, four viral contigs with similarities to the Totiviridae family were found and named as Helianthus annuus leaf-associated totivirus 1 (4924 nt; MT873528), Helianthus annuus leaf-associated totivirus 2 (3854 nt; MT873529), Helianthus annuus leaf-associated totivirus 3 (2489 nt; MT873530), and Helianthus annuus leaf-associated totivirus 4 (2887 nt; MT873531). The predicted amino acid sequence of HlaTV1 contains a complete RdRp which shares 50% identity to Puccinia striiformis totivirus 4 (KY207364). The HlaTV2 contig has 37% identity with RdRp of Puccinia striiformis totivirus 2 (KY207362). Both HlaTV3 and HlaTV4 are similar to Puccinia striiformis totivirus 5 (KY207365) with identities of 49% and 79%, respectively. It is worth noting that Puccinia striiformis is another rust fungus, which suggests that our approach was successful in identifying viruses associated with the pathogenic fungi. These four viral contigs are more closely related to the viruses belonging to the Totivirus genus than other genera in the Based on the coat protein identities, four viral contigs were identified to be within Partitiviridae and named as Helianthus annus leaf-associated partitivirus 4 (MZ532542), Helianthus annus leaf-associated cryptic virus 1 (MZ532543), Helianthus annus leaf-associated partitivirus 5 (MZ532544), and Helianthus annus leaf-associated cryptic virus 2 (MZ532556) ( Figure 5). Totiviridae family ( Figure 5). According to the reconstructed phylogenetic tree of Totiviridae, HlaTV1 with HlaTV2 and HlaTV3 with HlaTV4 clustered separately, which indicates that HlaTV1 and HlaTV2 are more closely related and HlaTV3 and HlaTV4 are more related phylogenetically.

Putative Totivirus Genome
Based on the similarity to RdRp, four viral contigs with similarities to the Totiviridae family were found and named as Helianthus annuus leaf-associated totivirus 1 (4924 nt; MT873528), Helianthus annuus leaf-associated totivirus 2 (3854 nt; MT873529), Helianthus annuus leaf-associated totivirus 3 (2489 nt; MT873530), and Helianthus annuus leaf-associated Viruses 2021, 13, 1867 8 of 14 totivirus 4 (2887 nt; MT873531). The predicted amino acid sequence of HlaTV1 contains a complete RdRp which shares 50% identity to Puccinia striiformis totivirus 4 (KY207364). The HlaTV2 contig has 37% identity with RdRp of Puccinia striiformis totivirus 2 (KY207362). Both HlaTV3 and HlaTV4 are similar to Puccinia striiformis totivirus 5 (KY207365) with identities of 49% and 79%, respectively. It is worth noting that Puccinia striiformis is another rust fungus, which suggests that our approach was successful in identifying viruses associated with the pathogenic fungi. These four viral contigs are more closely related to the viruses belonging to the Totivirus genus than other genera in the Totiviridae family ( Figure 5). According to the reconstructed phylogenetic tree of Totiviridae, HlaTV1 with HlaTV2 and HlaTV3 with HlaTV4 clustered separately, which indicates that HlaTV1 and HlaTV2 are more closely related and HlaTV3 and HlaTV4 are more related phylogenetically.
Based on the similarity to coat protein (CP), 10 viral contigs were identified, including 9 totiviruses and a victorivirus (MZ532545~54) (Figure 6). Among the partitivirus contigs, HlaPV3 and HlaPV5 could be the two RNA segments for the same virus based on the fact that the majority of the reads both came from the SF1 library. In addition, contigs of HlaPV2 and HlaCV1 could come from the same virus too, based on the same read count ratios between the reads from SF1 and SF2 libraries. Based on the similarity to coat protein (CP), 10 viral contigs were identified, including 9 totiviruses and a victorivirus (MZ532545~54) (Figure 6). Among the partitivirus contigs, HlaPV3 and HlaPV5 could be the two RNA segments for the same virus based on the fact that the majority of the reads both came from the SF1 library. In addition, contigs of HlaPV2 and HlaCV1 could come from the same virus too, based on the same read count ratios between the reads from SF1 and SF2 libraries.
A potyvirus-like contig was also identified, which shares 99.5% amino acid identity to bean yellow mosaic virus (BYMV) for the 555 nt contig and has only 70% nucleotide identity to the Meadow saffron breaking virus polyprotein gene and the Butterfly flower mosaic virus. As the sequences for BYMV are very diverse, and one of the top hits (93.3% nt identity) in BLASTX for this BYMV-like sequence is a BYMV isolate from sunflowers in Iran encoding a portion of the CP, we still named it as BYMV. The naming is based on the knowledge that CPs need to form specific structures to assemble virus particles, so their amino acid sequences are often highly conserved. Moreover, based on coat protein similarity, we detected a short contig (197 nt) that came from very few reads in both libraries and is similar to the Sclerotinia sclerotiorum hypovirulence-associated DNA virus. It is divergent from the other detected CPs from dsRNA viruses, and therefore excluded from the phylogenetic analysis shown in Figure 6. More characterization of the whole genome sequence is needed in future studies.  A potyvirus-like contig was also identified, which shares 99.5% amino acid identity to bean yellow mosaic virus (BYMV) for the 555 nt contig and has only 70% nucleotide Viruses 2021, 13, 1867 9 of 14 identity to the Meadow saffron breaking virus polyprotein gene and the Butterfly flower mosaic virus. As the sequences for BYMV are very diverse, and one of the top hits (93.3% nt identity) in BLASTX for this BYMV-like sequence is a BYMV isolate from sunflowers in Iran encoding a portion of the CP, we still named it as BYMV. The naming is based on the knowledge that CPs need to form specific structures to assemble virus particles, so their amino acid sequences are often highly conserved. Moreover, based on coat protein similarity, we detected a short contig (197 nt) that came from very few reads in both libraries and is similar to the Sclerotinia sclerotiorum hypovirulence-associated DNA virus. It is divergent from the other detected CPs from dsRNA viruses, and therefore excluded from the phylogenetic analysis shown in Figure 6. More characterization of the whole genome sequence is needed in future studies.

Discussion
In this study, we identified 25 putative viral contigs using a metatranscriptomics approach, ranging from 1.6 kb to 14.6 kb for the RNA viruses, and one short DNA virus contig. Only one of the viruses, an endornavirus, had been described previously as Helianthus annuus alphaendornavirus (HaEV1). It was sampled from leaves showing typical symptoms of brown, irregular-shaped spots on infected leaves, which coalesced into large patches of dead tissue spreading in from the leaf margin, indicating Phomopsis spp. infection. Phomopsis stem canker in sunflowers has been reported to be associated with at least 15 species of Diaporthe that are either pathogenic or non-pathogenic [25] and can be a complex of the pathogenic Diaporthe (Phomopsis) species [26]. Although this virus is thought to exist without having direct associations to fungi, our detection of it being associated with Phomopsis warrants further investigation into whether the virus replicates in P. helianthin, and the effects of HaEV1 on fungal virulence. It has been reported that a plant virus, cucumber mosaic virus, can naturally infect a fungal host, Rhizoctonia solani [27]. Therefore, it will be interesting to verify whether it is the same for HEV1, especially since we have detected contigs of HEV1 in a separate metatranscriptome (deposited under NCBI PRJNA753120) directly from Diaporthe spp., which will be reported in a separate paper. The current study has the limitation that more validation is required to cross-check with Diaporthe metatranscriptomes, for example, to make a connection to phomopsis-infecting mycoviruses. Nevertheless, this study profiled known and novel sunflower-associated viruses from the field and provided hypotheses for further laboratory determination. In addition, endornaviruses belong to a group of viruses without true virions, which will require the construction of infectious cDNA clones driven by a T7 promoter in future studies to establish cause-and-effect relationships.
Based on RdRps, totiviruses with less than 50% amino acid sequence identity are considered different species. Therefore, HlaTV2 and HlaTV3 are considered new species. For partitiviruses, the threshold is 40%: amino acid sequence similarity >40% between RdRps of viruses from different species in the same phylogenetic cluster and <40% between members of species in different clusters; therefore, new species are reported in this study. Based on CPs, we discovered more viral contigs than RdRps, and it will require further investigation to better characterize these newly reported viruses. Using similarities to existing CPs, we detected more contigs, probably because they need to be translated to specific protein structures to assemble virions and are more conserved. Although some viruses lack CPs, such as endornaviruses and mitoviruses, using CPs to detect new viruses is very useful for the viruses that encode CPs.
We previously reported a survey of soybean leaf-associated virome from field-collected tissues regardless of the disease symptoms [2]. In this study, we instead focused only on diseased tissues because of the potential to link the diseases with the viruses. Even though specific non-pathogenic viruses are associated with tissues infected by pathogenic fungi, it does not necessarily mean that these viruses do not cause hypovirulence of the fungal pathogens. Potential hypovirulence-associated mycoviruses discovered in phytobiomes [2,3] can be rescued and reintroduced back to the field in a high viral titer after culturing in the lab. To build up viral titer, once the virus is rescued by reverse genetics approaches [15] or purified along with the fungal isolates [7], abnormal growth of the fungal hyphae should be noted and preferentially selected for serial transfer. Rather than following the traditional method of maintaining a fungal isolate in mycology labs, the abnormal-looking portion of older growth, specifically the fuzzy, sectoring, and oddly pigmented part of the plate, should be selected for transfer. Once the viral titer is increased after repeated lab conditioning, an inoculum can be obtained from a culture with even growth. For example, a stable culture of S. sclerotiorum that does not produce sclerotia is a desirable phenotype [3], and the hypovirulence strain can be introduced directly to the field. Furthermore, crude extraction of the virions can be achieved by low-speed centrifugation and used for virocontrol development for either agricultural purposes [28] or to be explored for medical uses [29]. Again, the exact hosts of these viruses reported in the current study require further validation, and the effects on their hosts are yet to be determined.
To simultaneously compare the changes in microbiome and virome in the field or in greenhouse production, the same direct sequencing of total RNA without culturing, in a repeated sampling scheme, should be carried out in the future. Ecological modeling should be performed to include the abundances of each virus and overall microbial population, including fungal pathogens, shown in normalized read counts based on the high throughput sequencing data, in conjunction with the disease severity rating, and the environmental conditions for multiple time points. Applying such a systems approach will provide us new insights into the emergence and maintenance of viromes facing environmental variability, which would be proactive, considering their importance as ecological drivers of the pathogen population and overall microbial composition. A study to include multiple sites can be advantageous, especially if the disease in question is widespread and the pathogen is cosmopolitan. It would be interesting to monitor whether there is a correlation between the mycoviral read counts and the disease severities and yield intraannually [30], since yield enhancement has been reported in a mycovirus that presents in fields of brassica spp. [16]. As global climate change and extreme weather intensify, exploring virome-mediated pathogen control and plant growth promotion in agricultural production is urgently needed more than ever. The metatranscriptomics approach used in this study provides an unbiased snapshot of the plant-associated virome, including some potentially beneficial mycoviruses.  Institutional Review Board Statement: Not applicable. The study does not involve humans or animals.

Informed Consent Statement: Not applicable.
Data Availability Statement: All data generated and analyzed during this study are included in this article.