Sequencing a Strawberry Germplasm Collection Reveals New Viral Genetic Diversity and the Basis for New RT-qPCR Assays

Viruses are considered of major importance in strawberry (Fragaria × ananassa Duchesne) production given their negative impact on plant vigor and growth. Strawberry accessions from the National Clonal Germplasm Repository were screened for viruses using high throughput sequencing (HTS). Analyses of sequence information from 45 plants identified multiple variants of 14 known viruses, comprising strawberry mottle virus (SMoV), beet pseudo yellows virus (BPYV), strawberry pallidosis-associated virus (SPaV), tomato ringspot virus (ToRSV), strawberry mild yellow edge virus (SMYEV), strawberry vein banding virus (SVBV), strawberry crinkle virus (SCV), strawberry polerovirus 1 (SPV-1), apple mosaic virus (ApMV), strawberry chlorotic fleck virus (SCFaV), strawberry crinivirus 4 (SCrV-4), strawberry crinivirus 3 (SCrV-3), Fragaria chiloensis latent virus (FClLV) and Fragaria chiloensis cryptic virus (FCCV). Genetic diversity of sequenced virus isolates was investigated via sequence homology analysis, and partial-genome sequences were deposited into GenBank. To confirm the HTS results and expand the detection of strawberry viruses, new reverse transcription quantitative PCR (RT-qPCR) assays were designed for the above-listed viruses. Further in silico and in vitro validation of the new diagnostic assays indicated high efficiency and reliability. Thus, the occurrence of different viruses, including divergent variants, among the strawberries was verified. This is the first viral metagenomic survey in strawberry. Additionally, this study describes the design and validation of multiple RT-qPCR assays for strawberry viruses, which represent important detection tools for clean plant programs.


Introduction
The garden or commercial strawberry (Fragaria × ananassa Duchesne) is a widely grown hybrid species of the genus Fragaria (family Rosaceae), which is cultivated as a source of food in many parts of the world [1]. This plant was first bred in Europe around 1750 and is a hybrid between F. virginiana from North America and F. chiloensis from South America [2,3]. The United States is one of the major worldwide producers of strawberry with 1.2 million tons in 2018, produced primarily in the state of California (Food and Agriculture Organization. http://www.fao.org/faostat/en/#data/QC, accessed on 26 October 2020). California strawberries are available year-round because of the mild climate, which allows harvesting fruits over a long period of time [4].
An important resource of genetic material from Fragaria is the US Department of Agriculture, ARS National Clonal Germplasm Repository (NCGR) located in Corvallis (OR, USA). The NCGR contains 2013 active strawberry accessions, representing 46 taxa including species and subspecies (GRIN-Global. https://npgsweb.ars-grin.gov/gringlobal/search, accessed on 27 October 2020). Besides having foundational material for horticultural distribution, the NCGR also separately houses pathogen-positive accessions for virology research.
Various viral diseases have been reported in strawberry and are associated with plant decline and yield loss (for a review, see [5,6]). The high number of identified viruses in strawberry is the result of vegetative propagation and exposure of the plant in open-field cultivation. Additionally, different vectors (e.g., insects and nematodes) have been shown to transmit these viruses. Lastly, viruses in strawberry plants may be present in low concentrations and in mixed infections, and commonly induce non-specific symptoms [7].
Reliable and efficient diagnostic tests are critical in determining viral infection in strawberry and consequently the control of disease. Multiple laboratory-based tests are available for the diagnosis of viruses in strawberry, mainly involving ELISA and reverse transcription PCR (RT-PCR), previously reviewed in [8]. In contrast, the use of quantitative RT-PCR (RT-qPCR) assays for strawberry viruses is extremely limited or absent. The advantages of using RT-qPCR over other classical molecular-serological methods to detect viruses are increased sensitivity, speed, reproducibility, and limited risk of contamination [9].
Recently, next-generation sequencing or high throughput sequencing (HTS) has been used to reveal the etiology of several diseases in strawberry. For example, a novel virus (i.e., strawberry polerovirus 1, SPV-1) was associated with the strawberry decline disease in Canada [10]. Two new putative viruses in the genus Crinivirus, strawberry criniviruses 3 and 4 (SCrV-3 and SCrV-4), were identified in strawberry plants displaying virus-like symptoms [11,12]. In 2019, Fránová et al. [7] sequenced a novel rhabdovirus infecting garden and wild strawberry plants.
To further investigate the virome of strawberry, we analyzed 45 plants collected at the NCGR by HTS. Multiple variants of viruses known to infect strawberry were annotated and characterized, including near-complete genomes. To complement these HTS results, new RT-qPCR assays were developed and validated for 14 different viruses.

Plant Material and TNA Extraction
In the summer of 2020, 45 accessions of strawberry were received as propagative material from the NCGR, under a USDA Animal and Plant Health Inspection Service (APHIS) movement permit, for HTS analysis at Foundation Plant Services (FPS, University of California-Davis, Davis, CA, USA). These 45 plants were previously determined to be positive for viruses or virus-like agents using RT-PCR and bio-indexing. Stolons were propagated under mist and later transferred to single 1-gal pots within a greenhouse. Four months after bud break, 0.7 g of leaf tissue from each strawberry plant was collected and spiked with 5% (w/w) tissue of Phaseolus vulgaris cultivar Black Turtle Soup (BTS). BTS is naturally infected by two different endornaviruses and is used as an internal control in HTS virus screening [13]. At the time of sampling, different leaf symptoms were observed in several strawberry plants (Figure 1). Following the protocol described in [14], total nucleic acid (TNA) extracts were prepared using guanidine isothiocyanate lysis buffer and a KingFisher Flex System with the MagMax™ Plant RNA Isolation kit (ThermoFisher Scientific, Sunnyvale, CA, USA).

HTS and Plant Virus Identification
For individual samples, a total of 700 ng per 10 µL of extracted nucleic acids was subjected to rRNA depletion and cDNA library construction using the TruSeq Stranded Total RNA with Ribo-Zero Plant kit (Illumina, San Diego, CA, USA). Subsequently, cDNA libraries were end-repaired, adapter-ligated with unique dual indexes, and PCR enriched. Finally, the amplicons were sequenced on an Illumina NextSeq 500 platform using a single-end 75-bp format.
Sequenced reads were demultiplexed and adapter trimmed using Illumina bcl2fastq2 v2.20.0.422. To obtain the highest quality contig for an annotated virus, three assemblies were performed: (1) a de novo assembly using SPAdes v3.14.1 [15]; (2) a de novo assembly using SPAdes v3.14.1 in RNA mode; and (3) an overlap layout consensus assembly of the virus-annotated contigs from (1) using the Minimo assembler [16]. For each virus infection, the assembly yielding the longest contigs was chosen.
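The selection step can be illustrated with a minimal Python sketch. It assumes that "longest contigs" means the assembly whose single longest virus-annotated contig is greatest, with total assembled length as a tie-breaker; the text does not specify this rule, and the function name, assembly labels, and contig lengths below are hypothetical.

```python
def pick_assembly(contig_lengths_by_assembly):
    """Return the name of the assembly with the longest virus-annotated contig.

    Assumption (not stated in the text): ties on the longest contig are
    broken by the total length of all virus-annotated contigs.
    """
    def rank(item):
        _name, lengths = item
        return (max(lengths, default=0), sum(lengths))

    best_name, _ = max(contig_lengths_by_assembly.items(), key=rank)
    return best_name


# Hypothetical contig lengths (bp) for one virus in one sample.
example = {
    "spades": [5000, 300],
    "spades_rna": [7200],
    "minimo": [7100, 900],
}
print(pick_assembly(example))
```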
To annotate de novo assemblies, all contigs greater than 200 bp were aligned to the June 2020 version of the GenBank non-redundant database of nucleotide sequences using BLASTn with a reduced word size of 7 to identify known viruses. Additionally, to screen for potential unknown viruses, contigs greater than 200 bp were aligned to the June 2020 version of the GenBank non-redundant database of protein sequences using BLASTx. Then, sequences with significant hits (E-value < 1 × 10⁻⁵) in this database to a virus known to infect land plants went through a final manual annotation check.
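The two numeric filters in this step (contig length above 200 bp and E-value below 1 × 10⁻⁵) can be sketched in Python as follows. This is an illustration only, not the pipeline used in the study: it assumes the hits are available in the standard 12-column tabular BLAST output (E-value in the 11th column) and that contig lengths are supplied separately. The function name and the example records are hypothetical.

```python
import csv
import io

MIN_CONTIG_LEN = 200   # contigs must be longer than this (bp)
MAX_EVALUE = 1e-5      # hits must have an E-value below this


def filter_hits(tabular_text, contig_lengths):
    """Keep (contig, subject, evalue) for hits passing both thresholds.

    Assumes 12-column tab-separated BLAST output; contig_lengths maps
    contig id -> length in bp (hypothetical input structure).
    """
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        contig, subject, evalue = row[0], row[1], float(row[10])
        if contig_lengths.get(contig, 0) > MIN_CONTIG_LEN and evalue < MAX_EVALUE:
            kept.append((contig, subject, evalue))
    return kept


# Hypothetical hits: c2 is too short, c3 has a weak E-value.
hits = (
    "c1\tsubjA\t95.0\t300\t15\t0\t1\t300\t1\t300\t1e-50\t400\n"
    "c2\tsubjB\t80.0\t150\t30\t0\t1\t150\t1\t150\t1e-20\t100\n"
    "c3\tsubjC\t70.0\t250\t75\t0\t1\t250\t1\t250\t0.01\t40\n"
)
lengths = {"c1": 1200, "c2": 150, "c3": 800}
print(filter_hits(hits, lengths))
```

Hits that pass this filter would still need the manual annotation check described above; the sketch does not attempt to decide whether a subject is a plant virus.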

Plant Virus Genome Analysis
Potential open reading frames (ORFs) and proteins encoded by the HTS-detected viruses were annotated by ORFfinder and BLASTp analysis. Conserved domains present in the putative proteins were searched in the Pfam database [17] using HMMER v3.1 [18]. Once sequence analysis was completed, new virus sequences were deposited in GenBank.

RT-qPCR Assay Design
Following the protocol described in [19], new RT-qPCR assays were designed for all the viruses identified during the HTS analysis. Using a comprehensive approach that covers all the known virus genetic diversity in this study and in GenBank, detection assays based on sequence-specific DNA hydrolysis probes (TaqMan™ MGB) were generated. Primer and probe sequences for each target region were selected using the default qPCR parameters in the Primer Express software (ThermoFisher Scientific, Sunnyvale, CA, USA).

After the initial assay design was completed, in silico analysis, facilitated by purpose-built scripts implementing the procedures described below, was used to incorporate additional viral genetic diversity into the assay design. For each of the viruses, the RT-qPCR assay was evaluated against all virus sequences in the July 2020 version of GenBank as well as all virus isolates identified during this study. First, a BLAST database search was used to identify and obtain all sequences overlapping the current assay region. To maximize sensitivity, a tBLASTn translated alignment exploiting codon redundancy was used. Once target sequences were collected and their species identification confirmed, all existing primers and probes were aligned to all target sequences covering the assay region. This alignment was accomplished using a script that performed an end-gap-free nucleotide alignment to identify the best matching probe, forward primer, and reverse primer sequences for each variant. In each case, the variant sequences corresponding to the matching oligos were collected and analyzed for divergence. Thus, all unique candidate sequence variants were inspected for total or partial divergence from an existing primer/probe sequence. The location, quantity, and frequency of the nucleotide differences were also determined, and the assays were updated with extra primers or probes as needed. A probe or primer was added when more than two nucleotide mismatches or a single mismatch near the end were detected during the sequence comparison. Lastly, in order to reduce the effective number of primers in each reaction, degenerate bases were not used in the oligos.
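The mismatch rule described above can be sketched in Python. This is a simplified illustration and not the authors' script: it replaces the end-gap-free alignment with an ungapped sliding comparison, assumes the oligo and target are given on the same strand (a reverse primer would first be reverse-complemented), and treats "near the end" as the last three bases at the 3' end. The window size and all names are assumptions.

```python
def best_ungapped_match(oligo, target):
    """Slide the oligo along the target (no internal gaps) and return
    (start, mismatch_positions) for the placement with fewest mismatches,
    or None if the target is shorter than the oligo."""
    best = None
    for start in range(len(target) - len(oligo) + 1):
        window = target[start:start + len(oligo)]
        mismatches = [i for i, (a, b) in enumerate(zip(oligo, window)) if a != b]
        if best is None or len(mismatches) < len(best[1]):
            best = (start, mismatches)
    return best


def needs_new_oligo(oligo, target, max_mismatches=2, end_window=3):
    """Apply the rule from the text: flag the variant when it has more than
    two mismatches, or any mismatch near the end of the oligo.

    Assumption: 'near the end' = within end_window bases of the 3' end.
    """
    hit = best_ungapped_match(oligo, target)
    if hit is None:
        return True  # the assay region is not covered by this target
    _start, mismatches = hit
    near_end = any(i >= len(oligo) - end_window for i in mismatches)
    return len(mismatches) > max_mismatches or near_end


primer = "ACGTACGTAC"  # hypothetical 10-nt oligo
print(needs_new_oligo(primer, "ACGTTCGTAC"))  # one internal mismatch
print(needs_new_oligo(primer, "ACGTACGTAT"))  # one mismatch at the 3' end
```

Variants flagged by such a check would be candidates for an additional primer or probe rather than for a degenerate base, in line with the design choice stated above.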
Once assay design was completed, the efficiency of each assay was calculated from serial dilutions of 1:1 to 1:10,000,000, each run in triplicate; standard curves were generated by the QuantStudio 6 Flex Real-Time PCR System software (ThermoFisher Scientific, Sunnyvale, CA, USA).
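For reference, amplification efficiency is conventionally derived from the slope of the standard curve (Cq against log10 of the dilution) as E = (10^(−1/slope) − 1) × 100%. The Python sketch below reproduces that calculation with an ordinary least-squares fit. It is not the instrument software's implementation, it assumes ten-fold dilution steps across the stated range, and the Cq values are hypothetical.

```python
def amplification_efficiency(log10_dilutions, cq_values):
    """Fit Cq = slope * log10(dilution) + intercept by least squares and
    return (efficiency_percent, slope)."""
    n = len(log10_dilutions)
    mean_x = sum(log10_dilutions) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_dilutions)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_dilutions, cq_values))
    slope = sxy / sxx
    efficiency = (10 ** (-1.0 / slope) - 1.0) * 100.0
    return efficiency, slope


# Assumed ten-fold series from 1:1 (10^0) to 1:10,000,000 (10^-7).
log10_dil = [0, -1, -2, -3, -4, -5, -6, -7]
# Hypothetical mean Cq values per dilution (triplicates already averaged).
cq = [15.0, 18.4, 21.7, 25.1, 28.4, 31.8, 35.1, 38.5]

eff, slope = amplification_efficiency(log10_dil, cq)
print(f"slope = {slope:.3f}, efficiency = {eff:.1f}%")
```

A slope near −3.32 corresponds to 100% efficiency, that is, a doubling of product per cycle.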

HTS Validation by RT-qPCR
Leaf tissue was collected from all the propagated strawberry plants and TNA was extracted as described above but reducing the input material to 0.2 g and omitting the addition of BTS. RT-qPCR reactions were completed in the QuantStudio 6 Flex Real-Time PCR System using the TaqMan Fast Virus 1-Step Master Mix (ThermoFisher Scientific, Sunnyvale, CA, USA) as per manufacturer's protocol. Each reaction (10 µL final volume) included 2 µL of TNA and final primer and probe concentrations of 900 and 250 nM, respectively. In addition, the new assays were multiplexed with an 18S rRNA assay to confirm the presence of RNA [20].

Additional Testing by RT-qPCR
The new RT-qPCR assays were used to test 12 virus-positive strawberry plants located at FPS. Given the nature of FPS as a center for distribution of virus-tested propagation material, occasionally plants infected by viruses are identified and adopted as positive controls in routine screening (https://fps.ucdavis.edu/index.cfm, accessed on 1 July 2021). All 12 plants were previously analyzed by RT-PCR and HTS, revealing the presence of several viruses included in this study.

Viral Sequences Identified by HTS
Forty-five strawberry plants originating from the NCGR were screened for viruses via HTS. HTS yielded consistent results for the sequencing, assembly, and annotation of each of the samples (Table S1). Across multiple runs, the Illumina sequenced read depth for each sample ranged from 23.4 to 32.2 million 75-bp reads with a median value of 28.0 million and a coefficient of variation of 0.09. More variation was observed in the subsequent SPAdes de novo assemblies of the metagenomes, which ranged from 9.0 Mbp to 30.0 Mbp of contigs with a median value of 20.5 Mbp and a coefficient of variation of 0.27. The number of contigs that annotated as viral was a small fraction (0.01–0.25%) of each assembly.
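The summary statistics quoted here (range, median, and coefficient of variation) can be computed as in the short Python sketch below. The coefficient of variation is taken as the sample standard deviation divided by the mean; whether the study used the sample or population standard deviation is not stated, so this is an assumption, and the read counts shown are hypothetical rather than values from Table S1.

```python
import statistics


def summarize(values):
    """Return (minimum, maximum, median, coefficient_of_variation).

    Assumption: CV = sample standard deviation / mean.
    """
    cv = statistics.stdev(values) / statistics.mean(values)
    return min(values), max(values), statistics.median(values), cv


# Hypothetical per-sample read depths, in millions of reads.
reads_millions = [23.4, 26.1, 27.9, 28.0, 29.3, 30.5, 32.2]
lo, hi, med, cv = summarize(reads_millions)
print(f"range {lo}-{hi} M, median {med} M, CV {cv:.2f}")
```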
Ignoring the multiple endornavirus-like contigs generated in all the HTS-analyzed samples, no plant-infecting viruses were identified in five plants, but viruses in single and mixed infections were identified in the remaining 40 plants (Table 1). A separate annotation was performed to identify potential novel plant viruses in these samples, characterized by divergent protein homology to a virus known to infect plants. No such sequences were observed. However, distant protein homology to insect viruses, mycoviruses and phages was observed.
We used the nucleotide identity to the closest homolog in GenBank as an estimate of the amount of additional nucleotide diversity each of these sequences provides. As indicated in Table 2, this study provides substantial new diversity for six of the 14 strawberry-infecting viruses. Further, if we consider 90% nucleotide identity as the cutoff for a divergent isolate, we obtained a total of 19 new divergent isolates (BPYV, 1; SCrV-4, 2; SMoV, 1; SMYEV, 10; SPaV, 4; ToRSV, 1). The virus with the greatest amount of nucleotide diversity and largest number of divergent isolates was SMYEV, a positive-sense RNA virus. In contrast, the virus with the lowest amount of additional nucleotide diversity, based on three sequenced isolates, was SVBV, a DNA genome virus.
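The 90% cutoff can be made concrete with a small Python sketch. It assumes the isolate and its closest GenBank homolog have already been aligned to equal length with "-" for gaps, and it defines identity as matching bases over aligned columns. Identity definitions vary between tools, so this is one possible convention and not necessarily the one used in the study. Treating "divergent" as strictly below 90% is also an assumption, and the sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns of two pre-aligned sequences.

    Columns where both sequences have a gap are ignored; a gap opposite
    a base counts as a difference (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def is_divergent(identity_pct, cutoff=90.0):
    """Assumption: an isolate is divergent when identity is below the cutoff."""
    return identity_pct < cutoff


isolate = "ACGAACGTAT"   # hypothetical aligned isolate fragment
homolog = "ACGTACGTAC"   # hypothetical closest GenBank homolog
pid = percent_identity(isolate, homolog)
print(pid, is_divergent(pid))
```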

New RT-qPCR Assays
New RT-qPCR assays were developed for SMoV, BPYV, SPaV, ToRSV, SMYEV, SVBV, SCV, SPV-1, ApMV, SCFaV, SCrV-4, SCrV-3, FClLV and FCCV. According to the in silico analysis, most assays needed multiple forward and reverse primers and/or probes (Table 3) to cover all the known genetic diversity of these viruses. To reduce the number of potential primer combinations, we preferred using multiple primers to degenerate bases. The amplification efficiency varied among assays and ranged from 87.5% to 118.6% (Figure S1).

Detection of Viruses by RT-qPCR
To confirm the presence of viruses identified by HTS, source plants were analyzed by newly designed RT-qPCR assays. Virus detection was then validated by comparing HTS and RT-qPCR results for each virus-infected sample. In all cases, RT-qPCR and HTS results agreed (Table S2). Likewise, no amplification was observed in samples H2390, H2397, H2411, H2412 and H2430, confirming the virus-free status of these plants.
To further validate the new assays, twelve FPS positive control plants were analyzed by both HTS and RT-qPCR (Table S3).

Discussion
The primary objectives of this study were to increase available genetic diversity of strawberry viruses by characterizing a diverse set of infected plants and to develop a comprehensive set of RT-qPCR assays for detecting viruses infecting strawberries. The new sequence resources were utilized in the design of the detection assays.
The NCGR provided us with a vast collection of accessions of geographically and genetically diverse provenance. From that collection, all strawberry samples identified as pathogen-infected were included in a viral metagenomic survey using HTS. This approach allowed us to efficiently obtain sequences for all viruses infecting each sample. It is the first survey of this type in strawberry, modeled on work done previously in other crops [21–23]. The metagenomic analysis revealed the presence of 14 different strawberry-infecting viruses, the majority of which had RNA genomes. More detailed HTS analyses indicated that 40 out of 45 plants were infected with at least one virus and 34 had mixed infections of up to five different viruses.
Considerable additional genetic diversity was observed for several strawberry viruses. In total, 65 partial genome sequences from 14 strawberry-infecting viruses were deposited in GenBank. These represent a substantial contribution to sequence resources for strawberry viruses, increasing the number of sequences deposited in GenBank by approximately 10%. Moreover, these sequences extend the genetic diversity of strawberry-infecting viruses characterized in previous amplicon-based studies [24–27]. Notably, our most highly divergent population of new virus isolates comes from SMYEV. Xiang et al. [28] also observed multiple highly divergent populations of SMYEV in their amplicon-based study. The lowest amount of additional genetic diversity was observed in SVBV; this result is also consistent with the lower mutation rates observed for DNA viruses [29].
Four different criniviruses infecting strawberry were found during the metagenomics survey (BPYV, SPaV, SCrV-3 and -4). Most of these viruses displayed considerable genetic diversity, with the exception of SCrV-3, which lacked divergent variants. In that sense, an outstanding question we believe needs to be addressed is whether SCrV-3 and SCrV-4 represent new virus species or whether they are strains of a known virus. Novel data, like those generated here, will help to understand the biology and molecular biology of criniviruses, which remains to be fully understood [30].
Utilizing these new genetic resources, a comprehensive set of new RT-qPCR detection assays was designed for strawberry viruses. New assays were validated in vitro to confirm their reliability. For example, twelve strawberry samples previously screened for viruses and located at FPS were retested using the newly designed RT-qPCR assays. While we did not compare the sensitivity of our new assays with other PCR-based methods such as endpoint RT-PCR, it is generally accepted that RT-qPCR is a highly sensitive method. Multiple primers and probes were employed to detect diverse virus isolates, as demonstrated with a similar assay for grapevine leafroll-associated virus 3, a highly divergent virus infecting grapevine [31].
The high throughput nature of RT-qPCR is desirable for clean plant certification facilities that process large sample numbers. Except for previous work describing RT-qPCR assays for strawberry necrotic shock virus and SCV, there is a scarcity of published assays of this kind for detecting strawberry viruses [32,33]. The novel diagnostic tools described here address that scarcity, looking to improve the management of strawberry viruses in the United States and globally.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/v13081442/s1, Figure S1: Standard curves generated from new RT-qPCR assays for strawberry viruses, Table S1: Sequencing data generated from strawberry samples, Table S2: Detection of strawberry viruses in NCGR samples via RT-qPCR, Table S3