Complete Genomic RNA Sequence of Tuberose Mild Mosaic Virus and Tuberose Mild Mottle Virus Acquired by High-Throughput Sequencing

Tuberose (Polianthes tuberosa) is an ornamental flowering crop of the Amaryllidaceae family. Tuberose mild mosaic virus (TuMMV) and tuberose mild mottle virus (TuMMoV), members of the genus Potyvirus, are ubiquitously distributed in most tuberose-growing countries worldwide with low biological incidence. Here, we report the first coding-complete genomic RNA sequences of TuMMV and TuMMoV, obtained through high-throughput sequencing (HTS); the presence of both viruses was further confirmed by RT-PCR assays using virus-specific primers. Excluding the poly(A) tail, the coding-complete genomic RNAs of TuMMV and TuMMoV were 9485 and 9462 nucleotides (nt) in length, respectively, and each contained a single large open reading frame (ORF). The polyprotein encoded by each viral genome contained nine putative cleavage sites. BLASTn analysis of the TuMMV and TuMMoV genomes showed 72.40–76.80% and 67.95–77% nucleotide sequence similarity, respectively, with existing potyviral sequences. Phylogenetic analysis based on genome sequences showed that TuMMV and TuMMoV clustered in a clade distinct from other potyviruses. Further studies are required to understand the mechanism of symptom development, distribution, genetic variability, and their possible threat to tuberose production in India.


Introduction
Polianthes tuberosa, also known as tuberose, is an important perennial herb within the Amaryllidaceae family. In the country's tropical regions, it is frequently grown as an ornamental flowering plant for the production of long-lasting flower spikes. It is also known as "Rajanigandha" or "Nishigandha" in popular parlance. It is believed to have originated in Mexico. In India, it is commonly grown in Andhra Pradesh, Assam, Gujarat, Karnataka, Punjab, Rajasthan, Tamil Nadu, West Bengal, and parts of Uttar Pradesh. It is rich in fragrance constituents as well as secondary metabolites, including polyphenols, nonpolar volatiles, and benzenoid derivatives. Furthermore, the flower and bulb extracts exhibit anti-inflammatory, antispasmodic, diuretic, and emetic effects [1]. [2]. Members of the family are distinguished by host range, vector, genomic features, and phylogeny, with species demarcation frequently based on sequence identity of the main ORF (or, if necessary, the CP-coding region) being <76% (nt) and <82% (aa) [3].
Members of the Potyviridae have flexuous filamentous particles (650-950 nm long and 11-20 nm wide) and monopartite or bipartite, single-stranded, positive-sense RNA genomes ranging in size from 8.2 kb to 11.3 kb, with an average size of 9.7 kb. Genomic RNA is translated into polyproteins that are proteolytically processed to yield ten mature proteins and one fusion protein required for replication and movement: P1 (modulator of replication and translation), helper component proteinase HC-Pro (aphid transmission and silencing suppression), P3 (virus movement and replication), P3N-PIPO (cell-to-cell movement), 6K1 (formation of replication vesicles), cytoplasmic inclusion protein CI (replication and helicase involved in virus movement), 6K2 (formation of replication vesicles), genome-linked protein VPg (replication, translation, and virus movement), NIa-Pro (polyprotein processing), NIb (RNA-dependent RNA polymerase), and CP (aphid transmission, virion formation, and virus movement) [3].
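The link between the ten mature proteins listed above and the nine putative cleavage sites reported for each polyprotein can be shown with a short sketch. The list follows the order given in the text; the names `MATURE_PROTEINS` and `cleavage_junctions` are illustrative and do not come from the original study.

```python
# Mature proteins released from the polyprotein, in the order listed in the
# text. P3N-PIPO is the fusion protein and is not part of this linear chain.
MATURE_PROTEINS = [
    "P1", "HC-Pro", "P3", "6K1", "CI", "6K2", "VPg", "NIa-Pro", "NIb", "CP",
]


def cleavage_junctions(proteins):
    """Return the adjacent protein pairs that are separated by a cleavage site."""
    return [f"{left}/{right}" for left, right in zip(proteins, proteins[1:])]


junctions = cleavage_junctions(MATURE_PROTEINS)
print(len(junctions))                # 9
print(junctions[0], junctions[-1])   # P1/HC-Pro NIb/CP
```

Ten proteins arranged in a single chain are separated by nine junctions, which is consistent with the nine cleavage sites predicted for TuMMV and TuMMoV.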
Some members have a narrow host range, whilst others can infect a wide range of hosts [4]. Several members cause disease symptoms in commercially significant crops, making this genus one of the largest and most thoroughly studied virus groups among plant virologists. Potyviruses are primarily spread by aphids in a non-persistent manner as well as by mechanical means. Vertical transmission of potyviruses through seeds has also been documented [5]. Considering the economic importance of tuberose, identifying and characterizing the disease-causing viruses of tuberose is a prerequisite for devising suitable management strategies.
The rapid development of high-throughput sequencing (HTS) technology in recent years has not only enabled the investigation of diseases with unknown etiology but also facilitated the identification and characterization of known and novel viruses [6][7][8][9][10]. Unlike traditional antibody- and nucleic acid-based detection techniques, such as enzyme-linked immunosorbent assay (ELISA), hybridization, and PCR assays, HTS technologies do not require virus-specific primers, probes, or antibodies for detection of target pathogens. Through bioinformatic analysis of HTS reads, the viruses associated with a particular sample can be identified and their genomes recovered [11]. In addition, HTS facilitates detection of multiple infections in a plant species, thereby helping to capture the entire virome and viral variability comprehensively [8][9][10][11][12]. Using HTS technology, this study reports the first coding-complete genomic RNA sequences of tuberose mild mosaic virus (TuMMV) and tuberose mild mottle virus (TuMMoV) from diseased samples in India.

Results and Discussion
Tuberose is an economically important, vegetatively propagated flower crop grown worldwide [13]. In 2021 and 2022, symptoms such as chlorotic stripes that initiated from the leaf base and yellowing of the whole leaf, characteristic of potyvirus infection, were observed in tuberose plants grown at the Horticultural Research Centre, Sardar Vallabhbhai Patel University of Agriculture and Technology, Meerut. A disease incidence of 79% was recorded. To detect the presence of viral particles, twenty-four symptomatic leaf samples were collected and observed under TEM, which revealed the presence of flexuous particles of length 778.89 nm (Figure 1). The morphology and size of the virus particles were similar to those of potyviruses [14]. An increasing number of viral isolates have been identified using HTS around the world, allowing for major advancements in diagnostic tools [6][7][8][9][10][11][12][15]. Furthermore, HTS technology may provide detailed insights into the etiological and epidemiological correlations of viruses associated with diseased plant samples [8][9][10][11][12][15][16]. Thus, HTS was employed to identify the viral agent causing the observed disease in the present study.
RNA of pooled symptomatic leaf samples was sequenced using the Illumina HiSeq 2000 platform to identify viruses that may be associated with the symptoms. Across the two libraries, Illumina sequencing generated approximately 90 million 75 bp paired-end reads; after trimming, 90,380,608 reads (average length 75 bp) were retained (Table S1). A total of 108,668 contigs were generated from the tuberose samples. A BLASTx search against the nr database revealed the complete genome sequences of tuberose mild mosaic virus (TuMMV) and tuberose mild mottle virus (TuMMoV). Copy numbers of TuMMV and TuMMoV were 11,011 and 108,535, respectively. To date, only five partial sequences of TuMMV isolates deposited from Taiwan, India, and the USA [17,18] and eight partial sequences of TuMMoV isolates deposited from China, India, Mexico, the Netherlands, and Taiwan [19,20] are available in the public domain.
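To put the read counts above in perspective, the following sketch converts them into total sequenced bases and an approximate mean depth per genome. It rests on an assumption that the text does not confirm: the reported "copy numbers" are treated here as counts of 75 bp reads assigned to each virus. The function names are illustrative.

```python
def total_bases(n_reads: int, read_length: int) -> int:
    """Total sequenced bases for n_reads reads of a fixed length."""
    return n_reads * read_length


def mean_depth(n_reads: int, read_length: int, genome_length: int) -> float:
    """Approximate mean depth: sequenced bases divided by genome length."""
    return n_reads * read_length / genome_length


TRIMMED_READS = 90_380_608  # reads retained after trimming (from the text)
READ_LENGTH = 75            # average read length in bp (from the text)

print(f"{total_bases(TRIMMED_READS, READ_LENGTH) / 1e9:.2f} Gb")  # 6.78 Gb

# ASSUMPTION: "copy numbers" are interpreted as per-virus read counts.
for name, reads, genome_length in [
    ("TuMMV", 11_011, 9485),
    ("TuMMoV", 108_535, 9462),
]:
    print(name, round(mean_depth(reads, READ_LENGTH, genome_length), 1))
```

If the copy numbers instead represent another quantity (for example, contig coverage values reported by the assembler), the depth figures do not apply, although the total yield calculation still holds.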
In the present study, the TuMMV and TuMMoV genomes obtained were 9485 and 9462 nt long, respectively, excluding the poly(A) tail. Each genomic RNA contained a single large ORF, encoding polyproteins of 3070 aa (TuMMV) and 3081 aa (TuMMoV), which began with an AUG codon at nt 138 and 164 and ended with a UAG and a UAA termination codon at nt 9350 and 9490, respectively. The characteristic domains, conserved cleavage sites, and motifs specific to the genus Potyvirus [21][22][23] were predicted in the polyprotein sequences encoded by the genomes.
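The ORF coordinates can be cross-checked with simple arithmetic. The sketch below assumes that the reported end position is the last base of the stop codon and that coordinates are 1-based; both conventions are assumptions, and the function name is illustrative.

```python
def expected_stop_end(start: int, n_amino_acids: int) -> int:
    """1-based position of the last stop-codon base for an ORF that starts
    at `start` and encodes `n_amino_acids` residues."""
    orf_length = (n_amino_acids + 1) * 3  # coding codons plus the stop codon
    return start + orf_length - 1


REPORTED = {
    # name: (ORF start, polyprotein length in aa, reported ORF end, genome length)
    "TuMMV": (138, 3070, 9350, 9485),
    "TuMMoV": (164, 3081, 9490, 9462),
}

for name, (start, n_aa, reported_end, genome_length) in REPORTED.items():
    computed_end = expected_stop_end(start, n_aa)
    print(name, computed_end, computed_end == reported_end,
          reported_end <= genome_length)
```

Under these assumptions the TuMMV values are self-consistent (138 + 3071 × 3 − 1 = 9350). For TuMMoV the same calculation gives 9409, whereas the reported end position of 9490 lies beyond the 9462 nt genome, so one of the reported TuMMoV values may contain a typographical error.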
Furthermore, potyvirus-specific conserved motifs were identified in the proteins encoded by the genomes of both viruses. The predicted motifs in TuMMV cleavage products are 256 H-(X) 8 [26,27]. A few of these motifs are suggested to play critical roles in the transmission of potyviruses by aphids [28]. Despite the presence of all of these conserved motifs, aphid transmission requires species-specific interactions [29,30]. Further research is needed to determine the aphid species that transmit TuMMV and TuMMoV.
In  (Table 1). According to ICTV species demarcation, these results indicate that they belong to the same species [3,4]. Since 76% nt sequence identity in the complete genome has been defined as the species demarcation criterion for potyviruses [31], the Indian isolates of TuMMV and TuMMoV can be considered members of distinct potyvirus species.
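The demarcation rule cited in the text (<76% nt and <82% aa identity) can be written as a small check. This is a minimal sketch of how the thresholds might be applied, not the ICTV procedure itself; the function name is illustrative, and the example values are identities quoted elsewhere in this article.

```python
NT_THRESHOLD = 76.0  # % nucleotide identity, as cited in the text
AA_THRESHOLD = 82.0  # % amino acid identity, as cited in the text


def is_distinct_species(nt_identity, aa_identity=None):
    """Return True if the identities fall below the demarcation thresholds.

    The amino acid check is applied only when an aa identity is supplied.
    """
    if nt_identity >= NT_THRESHOLD:
        return False
    return aa_identity is None or aa_identity < AA_THRESHOLD


# TuMMV versus TuMMoV complete genomes: 65.20% nt identity.
print(is_distinct_species(65.20))          # True

# TuMMoV partial CP versus a Chinese isolate: 98.40% nt, 99.56% aa.
print(is_distinct_species(98.40, 99.56))   # False
```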
To analyze the evolutionary relationship between TuMMV, TuMMoV, and 44 other potyviruses, a phylogenetic tree based on their complete genomes was constructed. The results showed that these potyviruses can be divided into two distinct groups, subgroup I and subgroup II. Subgroup I was further divided into three distinct groups: subgroup Ia, subgroup Ib, and subgroup Ic. The TuMMV and TuMMoV genomes, which shared 65.20% nt identity, clustered together in subgroup Ia, and both viruses showed genetic relatedness to papaya ringspot virus and Moroccan watermelon mosaic virus (Figure 2).
RT-PCR assays using the primers listed in Table 1 yielded the expected amplicon of 700 bp (Figure S1) from 31 samples representing four cultivars, including the samples used for HTS. The sequences obtained were deposited in NCBI GenBank under accession numbers ON116188-ON116195 and ON241024-ON241026. BLASTn analysis of the partial CP gene sequence of TuMMV showed 87.33-97.31% nt sequence identity with other TuMMV isolates available in GenBank, while the partial CP sequences of TuMMoV from this study shared 87.33-97.31% nt identity with other TuMMoV isolates available in GenBank. Pairwise comparison of the partial CP sequence of TuMMoV with similar sequences showed 95.80-98.40% nt and 96.0-99.56% aa identities with the global isolates (Figure 3B). The TuMMV isolate shared the highest sequence identities (97.33% nt and 96.7% aa) with a Taiwanese isolate (EF137178), while TuMMoV shared the highest sequence identities (98.40% nt and 99.56% aa) with a Chinese isolate (AJ581528); the suggested threshold for species demarcation in the coat protein is 76% [3,4].

To better understand the genetic variability of the TuMMV isolates, a phylogenetic tree was constructed based on the partial CP coding sequences of TuMMV and TuMMoV from different geographical locations. In the phylogenetic tree, the TuMMV isolate of this study fell in a sister clade to the Taiwanese TuMMV isolates, while the TuMMoV isolate obtained in the current study formed a sister clade to the TuMMoV isolates of China (Figure 3A).
Phylogenetic analysis based on the genomic RNA sequence (Figure 2) and the partial coat protein region (Figure 3) showed consistent clustering of isolates, suggesting a close relationship between the Indian isolates and other TuMMV and TuMMoV isolates, supported by high posterior probability values.

Sample Collection and Electron Microscopy
Symptoms of viral disease were observed in the experimental field of tuberose cv. Mexican-single, Hyderabad-single, Rajni-phule, and Sikkim-selection during January 2021 at the Horticulture Research Center, Sardar Vallabhbhai Patel University of Agriculture and Technology, Meerut, India. Leaf samples exhibiting typical symptoms of viral diseases, such as mosaic, puckering, and stunting, were collected along with asymptomatic samples and stored at −80 °C until further analysis. Sap obtained from symptomatic leaf samples was examined under a transmission electron microscope (TEM) after staining with 2% aqueous uranyl acetate (UA) following the leaf dip method [32].

RNA Extraction, Library Preparation and HTS
Leaf samples of the Mexican-single and Hyderabad-single cultivars were subjected to total RNA extraction using the GeneJET RNA Purification Kit (Thermo Scientific, MA, USA) in accordance with the manufacturer's instructions. The quality and quantity of total RNA were assessed using a Qubit™ fluorometer (Thermo Fisher Scientific, MA, USA) and TapeStation (Agilent, CA, USA). From the isolated pool of total RNA, ribosomal RNA was depleted using the Ribo-Zero rRNA Removal Kit (Illumina, San Diego, CA, USA). Sequencing libraries were prepared from the ribodepleted RNA pool using the TruSeq Stranded Total RNA Library Prep Kit (Illumina, CA, USA) following the manufacturer's instructions. A Bioanalyzer (Agilent, CA, USA) and TapeStation were used to verify the quality of the prepared libraries, and the libraries were sequenced on the Illumina HiSeq 2000 platform at NxGenBio Life Sciences (New Delhi, India) to produce 2 × 75 bp paired-end reads.

Data Processing and Virus Analysis
Raw reads were imported into CLC Genomics Workbench (20.0.4) and quality trimmed using the trimming tool to remove ambiguous and adaptor sequences. Trimmed reads were assembled using the de novo assembly tool in CLC Genomics Workbench (20.0.4) to obtain longer contigs. The contigs obtained were subjected to BLASTx analysis against the nonredundant (NR) database in the workbench. Alignments against reference viral genomes in the database were made using OmicsBox 2.1 (https://www.biobam.com/omicsbox, accessed on 23 February 2022). The resulting report listed the number of assembled reads and the total number of reads used. Open reading frames (ORFs) encoded by the putative viral genomes were determined using NCBI ORF Finder (https://www.ncbi.nlm.nih.gov/orffinder/, accessed on 2 March 2022). Conserved domains in the virus proteins were predicted using the NCBI Conserved Domain Search tool (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi, accessed on 2 March 2022), and the genome organization of the identified viruses was constructed using the BioEdit 7.2 program [33].
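The ORF prediction step can be illustrated with a minimal forward-strand search for the longest start-to-stop reading frame. This is only a sketch of the general idea under simplifying assumptions (standard stop codons, AUG/ATG starts, forward strand only); it is not the algorithm implemented by NCBI ORF Finder, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str):
    """Return (start, end, n_amino_acids) for the longest ATG-to-stop ORF on
    the forward strand, using 1-based inclusive coordinates that include the
    stop codon. Returns None if no complete ORF is found."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon of a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # residues, excluding the stop codon
                if best is None or n_aa > best[2]:
                    best = (start + 1, i + 3, n_aa)
                start = None  # resume scanning after this stop codon
    return best


# Toy RNA: 2 nt leader, AUG, four GCU codons, UAA stop, 2 nt trailer.
print(longest_orf("CCAUGGCUGCUGCUGCUUAAGG"))  # (3, 20, 5)
```

For a potyviral genome, the expected result is a single ORF spanning most of the sequence, consistent with the one large polyprotein ORF described for TuMMV and TuMMoV.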

Sequence Similarity and Phylogenetic Analysis
Genome sequences of the identified viruses, along with the available genome sequences of known potyviruses, were retrieved and subjected to multiple sequence alignment using the ClustalW tool in MEGA-X [34]. Phylogenetic trees were constructed using the neighbour-joining method with 1000 bootstrap replicates in MEGA-X. The nucleotide and amino acid sequence compositions and percent sequence identities were calculated using the BioEdit 7.2 program [33] and visualized using the Sequence Demarcation Tool version 1.2 [35].
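Percent identity between two aligned sequences, the quantity reported throughout the Results, can be computed as sketched below. The gap handling shown here (columns with a gap in either sequence are skipped) is an assumption; BioEdit and the Sequence Demarcation Tool may treat gaps differently, so values from this sketch need not match theirs exactly.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two sequences taken from the same alignment.

    ASSUMPTION: columns containing a gap ('-') in either sequence are
    excluded from both the match count and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 7 ungapped columns, 6 of which match.
print(round(percent_identity("ATGGC-TA", "ATGACGTA"), 2))  # 85.71
```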

RT-PCR and Sanger Sequencing
Total RNA was isolated from four tuberose cultivars (Mexican-single, Hyderabad-single, Rajni-phule, and Sikkim-selection) as described in Section 3.2. cDNA was synthesized from the isolated RNA using random hexamers (1 µL) and the RevertAid Reverse Transcriptase kit (Thermo Scientific, Waltham, MA, USA) according to the manufacturer's protocol. PCR amplification of the identified viruses was performed using virus-specific primers designed from the coat protein (CP) region of the obtained viral genomes (Table S4). The final 25 µL reaction volume contained 1× reaction buffer, 10 µM each of the forward and reverse primers, and 12.5 µL of DreamTaq PCR Master Mix (Thermo Scientific, Waltham, MA, USA). The reaction conditions were as follows: initial denaturation at 94 °C for 10 min; 35 cycles of 94 °C for 30 s, 52/55 °C for 45 s, and 72 °C for 1 min; and a final extension at 72 °C for 10 min. PCR amplicons were purified and subjected to bidirectional Sanger sequencing at Centyle Biotech Pvt., Ltd., New Delhi, India.
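As a planning aid, the cycling conditions above can be encoded and summed to estimate the programmed hold time. The sketch counts only the hold steps stated in the text; temperature ramping, which depends on the instrument, is not included, and the names used are illustrative.

```python
CYCLES = 35
INITIAL_DENATURATION_S = 10 * 60  # 94 °C for 10 min
FINAL_EXTENSION_S = 10 * 60       # 72 °C for 10 min
CYCLE_STEPS_S = {
    "denaturation": 30,  # 94 °C
    "annealing": 45,     # 52 or 55 °C, depending on the primer pair
    "extension": 60,     # 72 °C
}


def programmed_hold_time_s(cycles: int = CYCLES) -> int:
    """Total programmed hold time in seconds, excluding ramp times."""
    per_cycle = sum(CYCLE_STEPS_S.values())
    return INITIAL_DENATURATION_S + cycles * per_cycle + FINAL_EXTENSION_S


total_s = programmed_hold_time_s()
print(total_s, total_s / 60)  # 5925 98.75
```

The programme therefore holds for just under 99 minutes before ramping is taken into account.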

Conclusions
Virus diseases limit tuberose production worldwide. With increased global trade, the possibility of introduction of tuberose viruses into newer areas grows by the day. As a result, rapid and reliable tuberose virus detection techniques are required to halt this inadvertent introduction and ensure virus-free tuberose production. We used HTS in this study and determined the first coding-complete genome sequences of TuMMV and TuMMoV. The genome sequences would be useful in providing a clearer picture of the virus evolution and in inferring biological properties of the viruses, which ultimately would help in devising management strategies. However, biological studies are required to understand the mode of transmission and effects of single/mixed viral infection on tuberose plants.