Identification of Viruses and Viroids Infecting Tomato and Pepper Plants in Vietnam by Metatranscriptomics

Tomato (Lycopersicum esculentum L.) and pepper (Capsicum annuum L.) plants belonging to the family Solanaceae are cultivated worldwide. The rapid development of next-generation sequencing (NGS) technology facilitates the identification of viruses and viroids infecting plants. In this study, we carried out metatranscriptomics using RNA sequencing followed by bioinformatics analyses to identify viruses and viroids infecting tomato and pepper plants in Vietnam. We prepared a total of 16 libraries, including eight tomato and eight pepper libraries derived from different geographical regions in Vietnam. We identified a total of 602 virus-associated contigs, which were assigned to 18 different virus species belonging to nine different viral genera. We identified 13 different viruses and two viroids infecting tomato plants and 12 viruses and two viroids infecting pepper plants, with viruses being the dominant pathogens observed. Our results showed that multiple infections by different viral pathogens were common in both plants. Moreover, geographical region and host plant were two major factors determining viral populations. Taken together, our results provide a comprehensive overview of viral pathogens infecting two important plants in the family Solanaceae grown in Vietnam.


Introduction
Tomato (Lycopersicum esculentum L.) and pepper (Capsicum annuum L.) plants belonging to the family Solanaceae are cultivated worldwide. Tomato fruit, classified botanically as a berry, is consumed in various ways, such as fresh in salads or as an ingredient in diverse dishes, sauces, and drinks [1,2]. Pepper fruits of Capsicum plants have diverse names according to their regions and types. For instance, piquant pepper varieties are referred to as chili peppers, whereas peppers with large- or mid-sized fruits are referred to as bell peppers. Sometimes, color is an important factor in naming peppers, such as green pepper and red pepper [3].
To date, diverse viruses infecting tomato and peppers have been reported around the world. Virus infections on tomatoes and peppers have a negative impact on these crops, such as low quality and quantity of fruit production [4][5][6][7][8]. Among the viruses infecting solanaceous vegetables,

Collection of Leaf Samples and Generation of Libraries for Identification of Viruses Infecting Tomato and Pepper Plants
To identify viruses infecting tomato and pepper plants, we collected leaf samples showing viral disease symptoms, including yellowing, mosaic, mottling, and dwarfing, in different geographical regions in Vietnam (Figure 1). We collected samples from eight and five different regions for tomatoes and peppers, respectively. In the case of pepper samples, three different kinds of pepper such as chili pepper, bell pepper, and red pepper were collected. For the pooling of the samples, we collected more than three leaves from each farm located in each individual region.
Samples collected from the same host plants in the same geographical regions were pooled and subjected to total RNA extraction followed by preparation of libraries for RNA-Seq. We named each library according to geographical regions and host plants. For example, the tomato sample collected from Dong Ahn was named DAT (Dong Ahn Tomato) and the chili pepper sample collected from Gia Lam was named GLCP (Gia Lam Chili Pepper). In total, we prepared 16 libraries, including eight tomato and eight pepper libraries. The prepared libraries were paired-end (2 × 100 bp) sequenced by HiSeq 2000 system.

Identification of Virus-Associated Contigs
The number of raw read bases ranged from 4,899,434,654 (NBT library) to 8,473,848,288 (DDBP library). The raw sequence data of each library were assembled de novo using the Trinity program. The number of assembled contigs ranged from 77,563 (NBT library) to 146,324 (DDRP library) (Figure 2A). To identify virus-associated contigs, we performed a BLASTN search of the assembled contigs against the viral reference genome database obtained from NCBI. We identified a total of 602 virus-associated contigs. The number of virus-associated contigs for tomato (450 contigs) was higher than that for pepper (152 contigs) (Figure 2B). The number of identified virus-associated contigs ranged from three (VPHP) to 45 (DDBP).
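The contig-screening step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used in this study: it assumes the BLASTN results were written in tabular format (`-outfmt 6`), and the function name, the e-value cutoff of 1e-5, and the toy hit table are hypothetical.

```python
import csv
import io


def virus_associated_contigs(blast_tsv, max_evalue=1e-5):
    """Map each contig to its best viral hit in BLAST tabular (outfmt 6) text.

    Default outfmt 6 columns: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    The e-value cutoff is an assumption, not a value taken from the study.
    """
    best = {}  # contig id -> (subject id, bit score)
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        contig, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit too weak to call the contig virus-associated
        if contig not in best or bitscore > best[contig][1]:
            best[contig] = (subject, bitscore)
    return {contig: subject for contig, (subject, _) in best.items()}


# Toy hit table with made-up contig and reference identifiers.
toy_hits = "\n".join([
    "contig_1\tvirus_ref_A\t98.5\t900\t13\t0\t1\t900\t10\t909\t0.0\t1500",
    "contig_1\tvirus_ref_B\t90.0\t400\t40\t0\t1\t400\t5\t404\t1e-120\t450",
    "contig_2\tvirus_ref_B\t80.0\t60\t12\t0\t1\t60\t1\t60\t0.5\t30",
])
hits = virus_associated_contigs(toy_hits)
print(hits)
```

Counting the keys of the returned dictionary per library gives a contig tally of the kind summarized in Figure 2B; keeping only the best-scoring hit per contig is one possible design choice among several.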

Classification of Identified Virus-Associated Contigs According to Virus Taxonomy
We identified 18 different virus species belonging to nine different viral genera (Supplementary  Table S1

Proportion of Identified Viruses and Viroids According to Virus-Associated Contigs
We identified 13 different viruses and two viroids from eight different tomato libraries. Of these, PCFVd (225 contigs) was the most frequently identified viral pathogen based on the number of identified contigs, followed by TNRV (56 contigs), TYLCKaV (26 contigs), PVY (23 contigs), and CMV (23 contigs) (Figure 3A).

Proportion of Identified Viruses and Viroids According to Virus-Associated Reads
Next, we calculated the proportion of virus-associated reads in each library (Figure 4A). Except for VPCP (1.048%), the proportion of virus-associated reads was less than 1% in all libraries. For the eight tomato libraries, the proportion of virus-associated reads ranged from 0.012% (BLT) to 0.575% (DAT), whereas that for the eight pepper libraries ranged from 0.001% (DACP) to 1.048% (VPCP).
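The proportion reported for each library is the number of reads mapped to viral reference genomes divided by the total number of reads. A minimal sketch of that arithmetic follows; the function name is hypothetical and the read counts are invented solely to illustrate the calculation, not taken from any library in this study.

```python
def viral_read_percentage(viral_reads, total_reads):
    """Percentage of reads in a library that map to viral reference genomes."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= viral_reads <= total_reads:
        raise ValueError("viral_reads must lie between 0 and total_reads")
    return 100.0 * viral_reads / total_reads


# Hypothetical counts chosen only to illustrate the arithmetic.
example = viral_read_percentage(viral_reads=10_480, total_reads=1_000_000)
print(f"{example:.3f}%")  # prints 1.048%
```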
We examined the number of libraries in which each virus and viroid was identified (Figure 4B). PCV-2 was identified in eight libraries (one tomato and seven pepper libraries). AYVCNV was identified only in a single tomato library, while PeVYV and ToLCVV were identified only in a single pepper and tomato library, respectively. In contrast to other viruses, STV was identified in six tomato libraries.
We examined the number of identified viral pathogens in each library (Figure 4C). Except for two libraries, DAT and VPHP, all libraries contained at least two different viral pathogens. Of these, two libraries, BLT and DDBP, contained at least seven different viral pathogens. Three tomato libraries, DTT, DDT, and DCT, and a single pepper library, DDRP, each contained five different viral pathogens.
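Both tallies above (libraries per pathogen and pathogens per library) can be derived from a single presence table. The sketch below shows one way to do this; the function name, library names, and pathogen names are placeholders rather than data from this study.

```python
from collections import Counter


def summarize_detections(detections):
    """Summarize a {library: set_of_pathogens} presence table.

    Returns (pathogens per library, libraries per pathogen).
    """
    per_library = {lib: len(found) for lib, found in detections.items()}
    per_pathogen = Counter(p for found in detections.values() for p in found)
    return per_library, dict(per_pathogen)


# Placeholder table; names are illustrative only.
toy = {
    "LIB1": {"virus_A", "virus_B"},
    "LIB2": {"virus_A"},
    "LIB3": {"virus_A", "virus_B", "viroid_C"},
}
per_library, per_pathogen = summarize_detections(toy)

# Libraries carrying at least two different pathogens (multiple infection).
multiple = sorted(lib for lib, n in per_library.items() if n >= 2)
print(per_library, per_pathogen, multiple)
```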
Based on virus-associated reads, we examined the proportion of viral pathogens in each library (Figure 4D). TNRV was the dominant virus in DAT and GLCP. CLVd was the dominant viral

Phylogenetic Analyses for Identified Viruses and a Viroid
We assembled viral genomes for four STV isolates, one PMMoV isolate, three ToMV isolates, one TMGMV isolate, one AYVCNV isolate, and two PCFVd isolates by RNA-Seq and conducted bioinformatics analyses. The assembled complete viral genomes were subjected to a BLASTN search to retrieve homologous viral genome sequences. After nucleotide sequence alignment, we generated six different phylogenetic trees (Figure 6). The phylogenetic tree for STV isolates revealed that isolates GLT (MW012410) and DDT (MW012413) were closely related, while STV isolates DCT (MW012412) and DTT (MW012411) were grouped in the other clade. Notably, STV isolate DTT was classified into a different group from the other isolates (Figure 6A). The identified PMMoV isolate VPCP (MW012414) was closely related to other isolates from China (Figure 6B). In the case of ToMV, we have already reported three partial genome sequences of ToMV from tomato, pepper leaves, and chili seeds [17]. The ToMV sequences from the previous study were used for phylogenetic tree construction. The phylogenetic tree of ToMV showed two distinct groups. Three ToMV isolates from Vietnam (DTT, MW012409; DDBP, MH393623; BLT, MH393621) were grouped with an isolate from Japan (Figure 6C). The TMGMV isolate NBHP (MW012408) was closely related to the isolate CaJO from Jordan (Figure 6D). In the case of AYVCNV, we previously reported the assembled genome sequence as Ageratum yellow vein virus (AYVV) isolate BaoLoc [10]. However, a BLASTN search against a nucleotide database using the same genome sequence showed that the nearly complete genome of AYVV isolate BaoLoc was closely related to other known isolates from China (Figure 6E). Therefore, we renamed the virus as AYVCNV isolate BaoLoc (MW012407).
Two PCFVd isolates (MW012406 and MW012415) from Vietnam were closely related to an isolate identified in tomato plants in Thailand (Figure 6F).
Figure 6. Assembled viral genome sequences, as well as matching known viral genome sequences from GenBank, were used to construct phylogenetic trees. Phylogenetic trees were constructed with the MEGA7 program using the maximum likelihood method with 1000 bootstrap replicates. The viral isolates identified in this study are marked in gray.

Validation of Results for RNA-Seq by RT-PCR
We carried out RT-PCR using virus-specific primers to confirm the RNA-Seq results. Virus-specific primers were designed based on the identified sequence of each virus and viroid (Table 1). The results of RT-PCR were similar to those of RNA-Seq (Figure 7). For example, infection by two viruses, TNRV and PCV, in the DDRP library was confirmed by RT-PCR. In addition, we validated infection by six viruses in the DTT library using RT-PCR.

Discussion
Recently, large numbers of viruses and viroids infecting tomato and pepper plants have been identified in single studies based on diverse NGS techniques [23][24][25]. Although tomato and pepper plants are widely cultivated in Vietnam and viruses infecting them can cause devastating epidemics, little is known about the viruses and viroids infecting both plants. In this study, we identified 15 and 14 viral pathogens infecting tomato and pepper plants, respectively, grown in diverse fields in Vietnam by RNA-Seq.
A previous study identified a total of 22 viruses infecting tomato in 170 field-grown samples from China by small RNA sequencing [25]. When we compared our results to that study, six viruses, PVY, ChiVMV, CMV, ToMV, STV, and TYLCV, were identified in both studies. In contrast, six viral pathogens, AYVCNV, CaCV, PCSV, TNRV, CLVd, and LAYVV, were identified only in our study. With the exception of AYVCNV, this is the first report of these five viral pathogens, CaCV, PCSV, TNRV, CLVd, and LAYVV, infecting tomato plants in Vietnam. In addition, this is the first report of PCSV, TNRV, and LAYVV infecting tomato in the world.

Pepper viromes of two different pepper cultivars grown in India have previously been reported, revealing diverse DNA and RNA viruses infecting pepper plants [23]. Interestingly, no virus was identified in both that study and ours, suggesting that geographical region and plant variety might be important factors for virus diversity. Both tomato and pepper plants are usually cultivated from seeds. In particular, seed transmission of several DNA viruses, such as TYLCV and pepper yellow leaf curl Indonesia virus, in pepper and chili pepper, respectively, has previously been reported [26,27]. Moreover, the seed transmission of viruses infecting tomato has been reported by several research groups [6,28,29]. In addition, seed transmission of two viroids, tomato planta macho viroid and PCFVd, in plants belonging to the family Solanaceae has been reported [8]. By contrast, a recent study demonstrated that TYLCV, known as a seed-transmitted begomovirus, was not seed transmitted in tomato and tobacco plants [30]. That study examined TYLCV infection using surface-disinfected or untreated seeds and found no TYLCV infection, suggesting that most of the virus was located externally as a contaminant of the seed coat [30]. Regardless of whether these viruses and viroids are seed-borne or seed-transmitted, seeds could be a main factor for virus transmission in tomato and pepper plants in Vietnam.
Advances in NGS techniques facilitate the identification of known and unreported viruses in a target plant [21]. Based on RNA-Seq, we were able to identify several viruses and viroids not previously reported in Vietnam. For example, this is the first study reporting eight viruses, PVY, ChiVMV, CMV, CaCV, PCSV, PCV-2, TNRV, and TMGMV, as well as two viroids, CLVd and PCFVd, infecting pepper in Vietnam. Furthermore, we were able to determine the proportion of each viral pathogen in a given sample and the distribution of identified viruses and viroids in different regions and plants, as shown in previous studies [31,32].
It is noteworthy that tomato and pepper plants are hosts for a wide range of viruses and viroids, as shown previously [23,25]. In our study, we identified a total of 18 viral species in nine genera. Moreover, the identified viruses and viroids have different genome types, such as ssDNA, dsRNA, ssRNA, and circular ssRNA. Of these, viruses with ssDNA genomes were frequently identified, suggesting that they could be major factors causing viral diseases in both plants, as suggested previously [33]. In the case of begomoviruses with ssDNA genomes, a different kind of begomovirus was identified in each region. Two viruses with dsRNA genomes, PCV-2 and STV, were frequently identified in pepper and tomato plants, respectively. In fact, the disease symptoms caused by viruses with dsRNA genomes have not been well characterized [34,35].
Furthermore, many viruses and viroids were identified in both tomato and pepper plants, which are members of the family Solanaceae [36][37][38]. Based on our results, geographical region and host were important factors in determining viral populations. For example, three libraries, DDT, DDBP, and DDRP, originated from different hosts but from the same region, Don Duong. The composition of viral pathogens in the three libraries was very similar, and PCSV was dominant in all three libraries. In the case of the NBT and NBHP libraries, from tomato and pepper plants, respectively, grown in Ninh Binh, CLVd was dominant in NBT, while ChiVMV was dominant in NBHP, suggesting host-specific viral populations. We cautiously suppose that the dominance of CLVd in the NBT library might be associated with seed transmission of CLVd, as reported previously [9].
Some viruses, such as PVY in BLT, were detected by NGS reads but not by RT-PCR. RNA-Seq results provide the number of virus-associated contigs and the sequence coverage of the complete genome of each virus. To validate RNA-Seq results by conventional RT-PCR, a high level of sequence coverage is necessary. Although PVY in the BLT library was identified by RNA-Seq with a high number of virus-associated contigs, the coverage of PVY in BLT was low compared to other viruses. This disagreement between RNA-Seq and RT-PCR suggests that the level of coverage should be checked carefully for successful validation by conventional RT-PCR, even when the number of virus-associated contigs is high.
Multiple infection by diverse viruses in a single host is very common. Consistent with this, at least seven different viral pathogens were found in one tomato library (BLT) and one pepper library (DDBP).
Furthermore, the analysis of viral populations using virus-associated reads revealed that there was usually one dominant viral pathogen in tomato and pepper plants grown in the same field. However, we could not confirm whether the dominant viral pathogen played a major role in causing typical virus symptoms.
Taken together, we identified diverse viruses and viroids infecting tomato and pepper plants grown in fields in Vietnam by RNA-Seq. Our results showed that infection by different viral pathogens was common in the two plants. However, a specific viral pathogen was dominant depending on the host plant and the region of isolation, suggesting that geographical region and host plant were two major factors determining viral populations. Since we could not confirm whether the dominant virus identified by RNA-Seq is the major pathogen causing the viral symptoms observed in each plant, further experiments and screenings for the identified viruses are required to provide more information for preventing viral disease epidemics in pepper and tomato fields. Although we did not provide direct evidence of economic losses caused by virus infections in pepper and tomato plants in Vietnam, our results provide a comprehensive overview of viral pathogens infecting two important plants in the family Solanaceae. Many plant viruses reported in this study could infect diverse host plants and thus present the possibility of continuous viral disease epidemics in the fields of Vietnam and a potential threat to the agricultural industry.

Sample Collection and RNA Sequencing
Leaf samples were collected from plants exhibiting viral symptoms in open fields of pepper and tomato in Vietnam. We pooled leaf samples collected from the same geographical regions and host plants. The samples used in this study are summarized in Table 2. Leaf samples were ground with a mortar and pestle in liquid nitrogen. Total RNA was extracted using the RNeasy Plant Mini Kit (Qiagen, Hilden, Germany). Extracted total RNA was subjected to library preparation using the NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, Ipswich, MA, USA) according to the manufacturer's instructions. Library preparation is described in detail in a previous study [39]. The prepared libraries were paired-end (2 × 100 bp) sequenced on a HiSeq 2000 system (Macrogen, Seoul, Korea).
* Geographical regions, host plants, and library names are described. Total read bases, total reads, and GC percentage for each library by RNA-Seq are also provided.

Bioinformatic Analyses
Raw sequence files from each library were de novo assembled by the Trinity program with default parameters [40]. The assembled contigs from each library were subjected to a BLASTN search against the viral genome database of the National Center for Biotechnology Information (NCBI). The obtained virus-associated contigs were then subjected to a BLASTX search against NCBI's non-redundant protein (NR) database to eliminate endogenous virus-like sequences. Finally, we identified viruses infecting tomato and pepper plants based on the virus-associated contigs. We mapped raw sequence reads to the reference genomes of the identified viruses using the Burrows-Wheeler Aligner (BWA) program with default parameters (http://bio-bwa.sourceforge.net/). The number of mapped reads for each identified virus was calculated using bbmap.sh implemented in the BBMap program (https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbmap-guide/). The raw data are available at the NCBI database under BioProject number PRJNA636575.
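The order of the steps above can be sketched as a small Python helper that only builds the command lines, without running them. This is a hedged illustration rather than the commands actually used: the file names, the memory and CPU settings, the tabular output format, and the choice of the `bwa mem` algorithm are all assumptions, and the read-counting step with bbmap.sh is left out because its exact invocation is not given in the text.

```python
def pipeline_commands(library, reads_1, reads_2, contigs, viral_db, viral_ref):
    """Build (but do not run) command lines for the main analysis steps.

    All paths and resource settings are placeholders chosen for illustration.
    """
    viral_contigs = f"{library}.viral_contigs.fasta"  # hypothetical file name
    return {
        # De novo assembly of the paired-end reads with Trinity.
        "assemble": ["Trinity", "--seqType", "fq",
                     "--left", reads_1, "--right", reads_2,
                     "--max_memory", "50G", "--CPU", "8",
                     "--output", f"trinity_{library}"],
        # Screen the assembled contigs against a viral nucleotide database.
        "blastn_viral": ["blastn", "-query", contigs, "-db", viral_db,
                         "-outfmt", "6", "-out", f"{library}.blastn.tsv"],
        # Re-check candidate contigs against the NR protein database.
        "blastx_nr": ["blastx", "-query", viral_contigs, "-db", "nr",
                      "-outfmt", "6", "-out", f"{library}.blastx.tsv"],
        # Index the viral references, then map reads (SAM goes to stdout).
        "index_ref": ["bwa", "index", viral_ref],
        "map_reads": ["bwa", "mem", viral_ref, reads_1, reads_2],
    }


cmds = pipeline_commands(
    library="DAT",
    reads_1="DAT_1.fastq",
    reads_2="DAT_2.fastq",
    contigs="DAT_contigs.fasta",
    viral_db="viral_genomes",
    viral_ref="viral_refs.fasta",
)
for step, cmd in cmds.items():
    print(step, "->", " ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` on a machine where the tools are installed; building the commands separately keeps the sketch inspectable without them.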

Construction of Phylogenetic Trees
To generate phylogenetic trees, the assembled genome sequence of each virus was subjected to a BLASTN search against NCBI's GenBank. We retrieved the top ten homologous viral genome sequences for each virus or viroid. Nucleotide sequences were aligned with the ClustalW program [41]. Phylogenetic trees were constructed with the MEGA7 program using the maximum likelihood method based on the JTT matrix-based model with 1000 bootstrap replicates [42].

Reverse Transcription-Polymerase Chain Reaction (RT-PCR) to Validate Infection of Identified Viruses and Viroids
RT-PCR was carried out using virus-specific primers (Table 2). The RT-PCR reaction was conducted using the Diastar™ Onestep RT-PCR kit (SolGent, Daejeon, Korea) following the manufacturer's instructions. The cycling conditions were 50 °C for 30 min and 95 °C for 15 min, followed by 30 cycles of 95 °C for 20 s, 50 °C for 40 s, and 72 °C for 1 min, with a final extension at 72 °C for 5 min. The PCR products were confirmed by electrophoresis in a 1% agarose gel in TAE buffer with a 1 kb DNA marker (Bioneer, Daejeon, Korea).
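As a quick check of the cycling program above, the total hold time can be added up as follows. This small sketch uses a hypothetical helper name and counts only the programmed hold times; temperature ramping between steps is not included.

```python
def program_minutes(initial_steps_s, cycle_steps_s, n_cycles, final_steps_s):
    """Total programmed hold time of a thermocycler run, in minutes."""
    total_s = (sum(initial_steps_s)
               + n_cycles * sum(cycle_steps_s)
               + sum(final_steps_s))
    return total_s / 60


# Durations in seconds, taken from the cycling conditions described above.
total = program_minutes(
    initial_steps_s=[30 * 60, 15 * 60],  # 50 °C for 30 min, 95 °C for 15 min
    cycle_steps_s=[20, 40, 60],          # 95 °C, 50 °C, 72 °C per cycle
    n_cycles=30,
    final_steps_s=[5 * 60],              # final extension at 72 °C
)
print(total)  # prints 110.0
```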