Molecular Characterization of a Novel Polerovirus Infecting Soybean in China

Poleroviruses are positive-sense, single-stranded RNA viruses. In this study, we describe the identification of a novel polerovirus isolated from soybean displaying curled leaves. The complete viral genome sequence was determined by high-throughput sequencing and confirmed by rapid amplification of cDNA ends (RACE), RT-PCR and Sanger sequencing. Its genome organization is typical of members of the genus Polerovirus, containing seven putative open reading frames (ORFs). The full genome is a single-stranded RNA of 5822 nucleotides, with the highest nucleotide sequence identity (79.07% over 63% coverage) to cowpea polerovirus 2 (CPPV2). The amino acid sequence identities of the encoded proteins with those of related viruses satisfy the species-demarcation criterion of the International Committee on Taxonomy of Viruses (ICTV), which strongly supports the status of this virus as a novel species, for which the name soybean chlorotic leafroll virus (SbCLRV) is proposed. Recombination analysis identified a recombination event in ORF5, in the 3′ portion of the genome. Phylogenetic analyses of the genome and encoded protein sequences revealed that the new virus is closely related to phasey bean mild yellows virus, CPPV2 and siratro latent polerovirus. Finally, we demonstrated the infectivity of SbCLRV in Nicotiana benthamiana by generating an infectious cDNA clone and agroinoculation.


Introduction
Solemoviridae is a recently recognized family of plant-infecting viruses comprising four genera: Enamovirus, Polemovirus, Polerovirus and Sobemovirus. Members of the family have a single-stranded, positive-sense RNA genome of approximately 4-6 kb and an icosahedral (T = 3) virion 26-34 nm in diameter [1]. Polerovirus contains the most species of the four genera [1]. The typical polerovirus genome contains seven ORFs and carries a genome-linked viral protein (VPg) covalently attached to its 5′ end. ORF0 encodes the viral suppressor of RNA silencing (VSR) [2]. ORF1 encodes P1, which contains the VPg and a serine proteinase domain [3,4]. In addition, ORF1 can be translated together with ORF2 through a −1 ribosomal frameshift to produce the viral RNA-dependent RNA polymerase (RdRp) [5]. ORF3 and ORF4 are translated from subgenomic RNAs to produce the coat protein (CP) [6,7] and the movement protein (MP) [8], respectively. ORF5 is expressed by translational readthrough of the leaky stop codon of ORF3, producing the putative P3-P5 fusion protein, which is required for vector transmission of the virus [9]. A short non-AUG-initiated ORF, termed ORF3a, is located upstream of ORF3 and is involved in long-distance movement and systemic infection of plants [10,11].
Soybean (Glycine max L.) is an economically important oilseed crop and source of vegetable protein. Soybean crops are constantly threatened by diverse viruses, such as soybean mosaic virus (SMV), cucumber mosaic virus (CMV) and bean common mosaic virus (BCMV), which cause yield losses worldwide. SMV is the most devastating and widely distributed soybean virus in China [12]. In recent years, a growing number of novel viruses have been identified by deep sequencing of virus-derived small interfering RNAs (siRNAs) followed by de novo assembly of the complete genome. Accurate identification of new viruses and virus strains is important for controlling viral epidemics.
Here, soybean leaves showing symptoms of viral infection were collected in Guizhou province, China. A virus detected by high-throughput sequencing was found to be associated with these symptoms. Sequence and phylogenetic analyses strongly suggested that the virus is a novel species in the genus Polerovirus, which we provisionally named soybean chlorotic leafroll virus (SbCLRV). To our knowledge, this is the first polerovirus reported to infect soybean in China.

Sample Collection and RNA Extraction
Soybean leaves displaying virus-like symptoms were collected from a field in Guizhou province, China. For viral genome sequencing and amplification, total RNA was isolated with TRIzol reagent (Tiangen, Beijing, China). RNA integrity was assessed by gel electrophoresis, and RNA quality by the OD260/280 absorbance ratio measured on a NanoDrop spectrophotometer. RNA samples of good quality were used for subsequent experiments.
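The spectrophotometric check can be sketched as follows. This is a minimal illustration, assuming the conventional factor of 40 ng/µL per A260 unit for RNA (1 cm path length) and an acceptance window of 1.8-2.2 for the A260/A280 ratio. Neither value is stated in the original protocol, and `rna_qc` is a hypothetical helper.

```python
def rna_qc(a260: float, a280: float, dilution: float = 1.0,
           ratio_window: tuple = (1.8, 2.2)) -> dict:
    """Estimate RNA concentration and purity from absorbance readings.

    Assumptions (not from the original protocol): 1 A260 unit corresponds
    to 40 ng/uL of RNA at a 1 cm path length, and an A260/A280 ratio inside
    `ratio_window` counts as good quality.
    """
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    concentration = a260 * 40.0 * dilution  # ng/uL
    ratio = a260 / a280  # protein contamination lowers this ratio
    low, high = ratio_window
    return {
        "conc_ng_per_ul": concentration,
        "a260_a280": ratio,
        "passes": low <= ratio <= high,
    }


# Illustrative readings only, not measurements from this study
print(rna_qc(a260=0.50, a280=0.25))
```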

Small RNA Library Construction, Sequencing and Data Processing
For small RNA sequencing, libraries were constructed with the TruSeq Small RNA Sample Preparation Kit (Illumina, San Diego, CA, USA). Briefly, 3′ and 5′ adapters were ligated to the RNAs, followed by reverse transcription and PCR amplification to construct the cDNA library. The adapter-ligated small RNA fragments were then size-selected by PAGE and gel-purified to obtain the final library. The qualified library was sequenced on an Illumina HiSeq 2000 platform (Illumina, Inc., CA, USA). The resulting clean reads were mapped to the GenBank Virus RefSeq database using Bowtie [13], and the virus reference genomes with the largest numbers of mapped sRNA reads were identified. In parallel, contigs were assembled de novo with SPAdes [14] and Velvet [15], searched against the reference genomes with BLAST, and annotated.
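Counting mapped sRNA reads per viral reference, which is the step used to rank the candidate reference genomes, might look like the sketch below. It assumes Bowtie was run with SAM output. The column layout (RNAME in column 3) and the 0x4 "unmapped" FLAG bit come from the SAM specification. The function name and the toy records are illustrative only.

```python
from collections import Counter
from typing import Iterable

UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def reads_per_reference(sam_lines: Iterable[str]) -> Counter:
    """Count alignment records per reference (RNAME, column 3) in SAM text.

    Each alignment line counts once, so a read reported at several loci
    is counted once per reported alignment.
    """
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & UNMAPPED or rname == "*":
            continue
        counts[rname] += 1
    return counts


# Hypothetical records; references "refA"/"refB" are placeholders
sam = [
    "@HD\tVN:1.0",
    "r1\t0\trefA\t15\t255\t21M\t*\t0\t0\tACGTACGTACGTACGTACGTA\t*",
    "r2\t16\trefA\t40\t255\t22M\t*\t0\t0\tACGTACGTACGTACGTACGTAC\t*",
    "r3\t0\trefB\t7\t255\t21M\t*\t0\t0\tTTTTACGTACGTACGTACGTA\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGACGTACGTACGTACGTA\t*",
]
print(reads_per_reference(sam).most_common())
```

Sorting the counts with `most_common()` gives the ranking of references by mapped reads.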

Full-Length Genome Amplification and Sanger Sequencing
First-strand cDNA was synthesized with the SuperScript II Reverse Transcriptase Kit (Invitrogen, Carlsbad, CA, USA) according to the supplier's instructions. Briefly, reactions were incubated at 42 °C for 30 min and then at 70 °C for 15 min to stop the reaction, followed by 20 min at 37 °C in the presence of RNase H (Invitrogen, Carlsbad, CA, USA) to degrade the RNA. The 5′- and 3′-terminal sequences of the genomic RNA were determined by 5′ and 3′ rapid amplification of cDNA ends (RACE) using the 5′/3′ RACE Kit, 2nd Generation (Roche, Germany) according to the manufacturer's protocol. The specific primers for cDNA synthesis and RACE, as well as the primer pairs used to amplify the whole genome, were designed based on the assembled contigs (Table S1). PCR amplifications were performed with Pfu DNA Polymerase (Promega Inc, Madison, WI, USA). For Sanger sequencing, fragments of the expected size were purified with the E.Z.N.A. Gel Extraction Kit (Omega Bio-tek, Norcross, GA, USA) when extraction from an agarose gel was required, or with the E.Z.N.A. Cycle Pure Kit (Omega Bio-tek, Norcross, GA, USA) for routine PCR product purification. The purified fragments were cloned into the pEASY-Blunt Cloning vector (TransGen, Beijing, China) and sequenced commercially (Qingke Biotech, Chongqing, China) with the universal primer pair M13-47/M13-48 (Table S1).

Viral Genome Characterization
The complete genome sequence was assembled with CLC Genomics Workbench 21.0.1 (Qiagen, Hilden, Germany) and DNAMAN version 9.0 (Lynnon Biosoft, QC, Canada). The nucleotide sequences of the ORFs were translated into amino acid sequences. Nucleotide and amino acid sequences of other poleroviruses were retrieved from GenBank at the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/, accessed on 19 May 2022). The newly assembled complete genome sequence was submitted to GenBank through BankIt.
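Translating an ORF's nucleotide sequence into amino acids can be sketched with the standard genetic code (NCBI translation table 1). This is an illustration, not the software the authors used. The translations in the study were done in CLC Genomics Workbench/DNAMAN. The function translates codons literally and does not special-case non-AUG start codons such as the ATA of ORF3a.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(nt: str, to_stop: bool = True) -> str:
    """Translate an in-frame nucleotide sequence (DNA or RNA) codon by codon.

    Ambiguous codons become 'X'. Non-AUG start codons are not special-cased,
    so an ORF beginning with ATA is translated starting with 'I'.
    """
    seq = nt.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGCUAAAUGA"))  # toy sequence, not from SbCLRV
```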

Phylogenetic Analyses
One representative isolate of each polerovirus was selected for analysis. Multiple sequence alignments were performed with ClustalW. Phylogenetic trees were constructed with MEGA 11.0.10 (Windows version) using the maximum-likelihood (ML) method [16], and the statistical confidence of branches was estimated by bootstrap analysis with 1000 replicates. SDT v1.2 was used to calculate and display the pairwise identities of ORFs aligned with ClustalW [17]. Recombination events among poleroviruses were analyzed with RDP4.3 [18].
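The bootstrap procedure creates pseudo-replicate alignments by resampling alignment columns with replacement. A tree is then inferred from each replicate (MEGA did this 1000 times here), and the support for a branch is the percentage of replicate trees that contain it. The sketch below only generates the pseudo-replicates and leaves out tree inference. The function names and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Apply the same sampled column indices to every sequence to keep rows aligned
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: dict, n_reps: int = 1000, seed: int = 1) -> list:
    """Generate n_reps pseudo-replicate alignments with a reproducible seed."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_reps)]


# Toy protein alignment with hypothetical sequence names
toy = {"virusA": "MKV-LLA", "virusB": "MKVQLIA", "virusC": "MRV-LIA"}
reps = bootstrap_replicates(toy, n_reps=1000)
print(len(reps), reps[0])
```

Resampling whole columns, rather than individual residues, preserves the homology statement that each alignment column represents.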

Construction of SbCLRV Infectious Clone and Agroinfiltration
For construction of the infectious clone, four fragments covering the full-length SbCLRV genomic RNA were amplified and inserted into the pCass-RZ binary vector using the ClonExpress MultiS One Step Cloning Kit (Vazyme, Nanjing, China). Four primer pairs (Table S1) were used for fragment amplification. For agroinfiltration, pCass-SbCLRV was introduced into Agrobacterium tumefaciens strain GV3101. The transformed agrobacteria were infiltrated into the abaxial side of leaves of Nicotiana benthamiana seedlings. Mock inoculations were performed with A. tumefaciens carrying the empty pCass-RZ vector. The inoculated plants were kept in a greenhouse under a 16 h light/8 h dark photoperiod for symptoms to develop. After obvious symptoms appeared in systemically infected leaves, symptomatic tissues were tested by RT-PCR with the virus-specific primers pXT297/pXT298 (Table S1). NbActin was used as an internal control to normalize RNA levels [19]. PCR products were analyzed on a 2% agarose gel.

Strand-Specific RT-PCR
Strand-specific RT-PCR was used to detect positive- and negative-strand SbCLRV RNA. For each RNA sample, two RT reactions were performed with the SuperScript II Reverse Transcriptase Kit (Invitrogen, Carlsbad, CA, USA), one primed with pXT297 and the other with pXT298, following the manufacturer's instructions. Two microliters of cDNA were used as template for PCR with primers pXT297 and pXT298, which amplify a 1150 bp fragment. A seminested PCR was then carried out using 1 µL of the first-round PCR product as template with primers pXT297 and pXT299 to generate a 238 bp fragment.

Identification of Novel Virus through High-Throughput Sequencing
During a field survey in July 2021 in a pepper and soybean intercropping field in Guizhou (China), a soybean seedling with virus-like symptoms was observed (Figure 1A). The infected leaves were curled, and the lower leaves showed mild chlorosis.
HTS was used to identify the viral agent. A small RNA library was constructed and sequenced on the Illumina HiSeq 2000 platform, generating a total of 14,174,269 clean reads. Host sequences were removed by mapping against the Glycine max reference genome, and reads of 18 to 26 nt were retained for viral siRNA analyses. Of these, 32,468 reads mapped to the GenBank Virus RefSeq database. De novo assembly with SPAdes and Velvet produced 440 and 394 contigs, respectively, ranging from 172 to 3230 nt in length. Most contigs were annotated as cowpea polerovirus 2 (CPPV2) or phasey bean mild yellows virus (PBMYV). The contigs that mapped to polerovirus genomes shared relatively low nucleotide identities with them (45% to 87%), indicating that the agent might be a novel polerovirus.
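Keeping only 18-26-nt reads is a simple length filter. The sketch below assumes adapter-trimmed reads in four-line FASTQ format. Host-read removal (mapping to the Glycine max genome) is not shown, and the function names and toy reads are illustrative only.

```python
import io
from typing import Iterator, TextIO


def fastq_records(handle: TextIO) -> Iterator[tuple]:
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header, seq, qual


def size_select(handle: TextIO, min_len: int = 18, max_len: int = 26) -> list:
    """Keep reads whose length lies in the inclusive range [min_len, max_len]."""
    return [seq for _, seq, _ in fastq_records(handle) if min_len <= len(seq) <= max_len]


# Toy adapter-trimmed reads (21, 30 and 18 nt); not data from this study
toy_fastq = io.StringIO(
    "@r1\n" + "A" * 21 + "\n+\n" + "I" * 21 + "\n"
    + "@r2\n" + "C" * 30 + "\n+\n" + "I" * 30 + "\n"
    + "@r3\n" + "G" * 18 + "\n+\n" + "I" * 18 + "\n"
)
kept = size_select(toy_fastq)
print(len(kept))
```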

Complete Sequence and Organization of SbCLRV Genome
To confirm the HTS results and determine the complete genome sequence, total RNA was extracted from the diseased leaves for RACE and RT-PCR. Primers were designed from the assembled contig sequences to generate overlapping amplicons covering the viral genome (Table S1). After the overlapping sequences were assembled and trimmed, a complete genomic sequence of 5822 nt was obtained. In a BLASTn search, the highest-scoring hit was CPPV2 isolate BE179 (KY364847.1), with 79.07% nucleotide identity over 63% query coverage, followed by several PBMYV isolates (82% identity over 47% coverage). These sequences were therefore used as references for characterizing the genome organization. ORFs were predicted with an ORF finder. Six large ORFs were found on the plus strand in the three reading frames (Figure 1B), in a genome organization typical of poleroviruses. The predicted protein products were searched with BLASTp, which returned hits to known poleroviruses (Table 1), and were characterized further. We provisionally named the new virus soybean chlorotic leafroll virus (SbCLRV) and deposited its genome sequence in GenBank under accession number OM507197. ORF0 is located at nt 70-846 (Figure 1B) and is predicted to encode the 258 aa RNA silencing suppressor P0, with a predicted molecular mass of 29.4 kDa. P0 shares 44.07% amino acid identity (90% query coverage, E-value 6 × 10⁻⁵⁴) with the P0 of PBMYV (QTJ01847.1) (Table 1). Polerovirus P0 proteins have been reported to carry an F-box motif (LPxxI/L) that mediates formation of an SCF-like complex involved in RNA silencing suppression [20]. Consistent with this, SbCLRV P0 contains an F-box-like motif, LSLLL, at aa 60, in which the conserved P is replaced by S (Figure S1).
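ORF prediction on the plus strand in all three frames can be approximated by scanning each frame for ATG-to-stop stretches. This is a simplified sketch that assumes AUG starts only and an illustrative 100-residue minimum. It would miss non-AUG ORFs such as ORF3a and does not model the ORF1-ORF2 frameshift. Coordinates are 1-based and include the stop codon, which matches the convention in the text (ORF0 at 70-846 nt is 777 nt long and encodes 258 aa plus a stop). The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(genome: str, min_aa: int = 100) -> list:
    """Return (frame, start, end) for ATG-initiated ORFs on the plus strand.

    Coordinates are 1-based and inclusive, and `end` includes the stop codon.
    Only the first ATG of each open stretch is used; non-AUG starts
    (e.g. ORF3a) and frameshift products are not modelled.
    """
    seq = genome.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # codons before the stop
                if n_aa >= min_aa:
                    orfs.append((frame + 1, start + 1, i + 3))
                start = None
    return orfs


def protein_length(start: int, end: int) -> int:
    """Residues encoded by a 1-based inclusive ORF whose last codon is a stop."""
    return (end - start + 1) // 3 - 1


# Coordinates reported in the text for ORF0, ORF1 and ORF3
print(protein_length(70, 846), protein_length(215, 2194), protein_length(3666, 4259))
```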
ORF1 is located at nt 215-2194 (Figure 1B) and is expected to encode the 659 aa P1 protein, with a calculated molecular mass of 73.1 kDa. P1 is most similar to the P1 of CpCV2 (YP_009352253.1) (61.88% amino acid identity, 97% coverage, E-value 0) (Table 1). ORF1 is also expected to yield a −1 ribosomal frameshift product fused with ORF2 (Figure 1B). The −1 ribosomal frameshift is mediated by a putative slippery sequence (X XXY YYZ) [21]. Slippery-sequence prediction identified G GGA AAC at nt 1699, which joins ORF1 and ORF2 (nt 215-1699 and 1702-3471) to encode a 1085 aa P1-P2 protein of 122.4 kDa. P1-P2 shares the highest identity with the P1-P2 of CpCV2 (73.63% amino acid identity, 97% coverage, E-value 0). ORF3 spans nt 3666-4259 (Figure 1B) and encodes the 197 aa (21.9 kDa) coat protein P3, which shares 88% identity (82.39% query coverage, E-value 2 × 10⁻⁸⁴) with the P3 of PBMYV (QTJ01893.1) (Table 1). ORF5 is expressed by translational readthrough of the ORF3 stop codon (nt 4257-4259) (Figure 1B). The resulting P3-P5 fusion protein (706 aa, 79.6 kDa) shares 59.38% identity (90% query coverage, E-value 0) with the P3-P5 of siratro latent polerovirus (SLPV) (QBR53291.1) (Table 1). A C-rich sequence at the 5′ end of ORF5 encodes a typical proline-rich stretch downstream of the P3 stop codon. ORF4 (nt 3691-4263) encodes the movement protein P4 (Figure 1B), which shows high identity but low query coverage (98% amino acid identity, 56.15% coverage, E-value 3 × 10⁻⁶³) to the P4 of CPPV2 (QXU64018.1) (Table 1). The non-canonical ORF3a starts with an ATA codon (nt 3548-3550) and ends at nt 3685 (Figure 1B). The predicted SbCLRV P3a (45 aa, 4.9 kDa) is most similar to the P3a protein of PBMYV (QHI06645.1) (78% amino acid identity, 84% coverage, E-value 2 × 10⁻²²) (Table 1). In addition, the 5′-UTR (nt 1-69) and 3′-UTR (nt 5790-5822) are 69 nt and 32 nt long, respectively.
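The slippery-sequence search can be sketched as a regular-expression scan for the heptanucleotide X XXY YYZ. In this sketch, XXX is any base repeated three times, YYY is AAA or UUU (TTT in DNA), and Z is A, C or U. These residue constraints follow a common definition of −1 frameshift slippery sites and are an assumption, because the text does not spell them out. A hit is only a candidate: the discussion notes that functional sites are also associated with a downstream RNA pseudoknot, which this scan does not check.

```python
import re

# X XXY YYZ in the DNA alphabet: XXX = any base repeated three times,
# YYY = AAA or TTT, Z = A, C or T (assumed definition; see lead-in).
# The zero-width lookahead lets overlapping candidates be reported.
SLIPPERY = re.compile(r"(?=(([ACGT])\2\2([AT])\3\3([ACT])))")


def find_slippery_sites(seq: str) -> list:
    """Return (1-based position, heptamer) for each candidate slippery site."""
    dna = seq.upper().replace("U", "T")
    return [(m.start() + 1, m.group(1)) for m in SLIPPERY.finditer(dna)]


print(find_slippery_sites("UUGGGAAACUU"))  # toy sequence containing G GGA AAC
```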

Analysis of the Virus-Derived sRNAs
The vsiRNAs were mapped to the SbCLRV genome, and their distribution along the genome, including vsiRNA hotspots, is plotted in Figure 1C. At single-nucleotide resolution, vsiRNAs were distributed throughout the genome in both positive and negative orientations. The size distribution of the mapped sRNAs showed that 21-nt vsiRNAs predominated, followed by 22-nt vsiRNAs (Figure 1C). Analysis of the 5′-terminal nucleotide of the SbCLRV-derived sRNAs showed a preference for U among 21-nt vsiRNAs and for G among 22-nt vsiRNAs (Figure 1C).
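The size-class and 5′-nucleotide tallies can be computed with a few counters. The sketch assumes its input is the list of sRNA sequences already mapped to the SbCLRV genome, and it ignores strand. The 18-26-nt window mirrors the size filter used earlier. Function and variable names are hypothetical.

```python
from collections import Counter, defaultdict


def srna_profile(reads, sizes=range(18, 27)):
    """Tally vsiRNA lengths and, per length, the 5'-terminal nucleotide.

    `reads` are sequences already mapped to the viral genome; strand is ignored.
    """
    size_counts = Counter()
    first_nt = defaultdict(Counter)
    for read in reads:
        r = read.upper().replace("T", "U")  # report nucleotides in the RNA alphabet
        if len(r) in sizes:
            size_counts[len(r)] += 1
            first_nt[len(r)][r[0]] += 1
    return size_counts, first_nt


# Toy reads only (two 21-nt, one 22-nt, one 30-nt outside the window)
toy_reads = ["T" + "A" * 20, "U" + "C" * 20, "G" * 22, "A" * 30]
sizes, five_prime = srna_profile(toy_reads)
print(sizes.most_common(), dict(five_prime[21]), dict(five_prime[22]))
```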

Phylogenetic Relationship of SbCLRV with Other Poleroviruses
Several known polerovirus genome sequences were retrieved from GenBank for pairwise sequence identity analysis against SbCLRV using SDT. Pairwise sequence identities were relatively high between SbCLRV and PBMYV, CpCV2, SLPV and groundnut rosette assistor virus (GRAV), ranging from 61.1% to 70.8% (Figure 2A, Table S2).
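To illustrate what a pairwise identity value means, the sketch below computes identity from two already-aligned sequences. It divides the number of matching positions by the number of columns in which neither sequence has a gap. This is one common convention and an assumption here. SDT performs its own alignments and its exact gap handling may differ, so this function is not expected to reproduce the values reported above.

```python
def pairwise_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # gap columns are excluded from the denominator
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy aligned protein fragments, not polerovirus data
print(round(pairwise_identity("MKV-LLA", "MKVQLIA"), 2))
```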
To better resolve the position of SbCLRV within the genus Polerovirus, we constructed ML phylogenetic trees based on the amino acid sequences of P1-P2 (WAG + G model) (Figure 2D) and P3 (JTT + G model) (Figure 2E), and on full-length genomic nucleotide sequences (GTR + G + I model) (Figure 2C). In each tree, SbCLRV was a distinct member of the genus, most closely related to a subgroup comprising PBMYV, CPPV2, SLPV and GRAV (Figure 2C-E). Table S3.

SbCLRV Infectivity in N. benthamiana Plants
To investigate the infectivity of the novel virus, a full-length SbCLRV clone, pCass-SbCLRV, was constructed for agroinoculation. A. tumefaciens cells carrying the clone were infiltrated into leaves of N. benthamiana seedlings. Ten days later, chlorosis and leaf curling appeared on the upper leaves of the inoculated seedlings (Figure 3A). RT-PCR with SbCLRV-specific primers detected the virus in the symptomatic upper leaves (Figure 3B, Table S1). In addition, negative-strand-specific RT-PCR and seminested RT-PCR were performed to detect SbCLRV replication: because SbCLRV is a positive-strand RNA virus, negative-strand RNA is present only during replication. Negative-strand RNA was detected in all RT-PCR-positive samples (Figure 3C). These results indicate that SbCLRV can infect and replicate in N. benthamiana.
Viruses 2022, 14, x FOR PEER REVIEW 8 of 11

Discussion
In this paper, we determined the full-length sequence of soybean chlorotic leafroll virus (SbCLRV), which has all the genomic characteristics of a polerovirus and groups with other poleroviruses in phylogenetic trees. The SbCLRV genome shows several organization and expression strategies characteristic of poleroviruses. (i) An F-box motif (LPxxI/L) in P0, which is required for silencing suppressor activity [20]. In SbCLRV, this motif is present as LSLLL and is not fully conserved. As shown in Figure S1, this motif frequently varies, for example as LCFLLR in PBMYV and GRAV; whether these P0 proteins retain RNA silencing suppressor activity remains to be confirmed. (ii) A −1 ribosomal frameshift signal at the ORF1/ORF2 junction, which is typically associated with a slippery sequence (X XXY YYZ) and a downstream RNA pseudoknot [5,21,22]. (iii) A non-AUG-initiated ORF3a [10,11]; the P3a amino acid sequence is conserved in both residues and length (45-46 aa). (iv) A proline-rich sequence downstream of the P3 stop codon [23].
According to the International Committee on Taxonomy of Viruses (ICTV) [1], the species-demarcation criteria for the family Solemoviridae state that two species should differ by more than 10% in amino acid sequence identity in any of the predicted protein products. The SbCLRV-encoded proteins share 44.07% to 98% identity with their best-matched sequences; all except P4 (98% identity, but over only 56.15% query coverage) differ by more than 10%, which meets the criterion for a new species. On this basis, we propose that SbCLRV (GenBank accession no. OM507197) be considered a new member of the genus Polerovirus.
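The species-demarcation test can be written as a short check. The ICTV criterion quoted above is met if any gene product differs from its closest relative by more than 10% in amino acid identity, that is, if its identity is below 90%. The values below are the best-match BLASTp identities reported in the Results. The function names are hypothetical, and the check ignores query coverage, which matters for comparisons such as P4 (98% identity over only 56.15% coverage).

```python
BEST_MATCH_IDENTITY = {  # % aa identity to the best BLASTp hit (values from the Results)
    "P0": 44.07,
    "P1": 61.88,
    "P1-P2": 73.63,
    "P3": 88.0,
    "P3-P5": 59.38,
    "P4": 98.0,
    "P3a": 78.0,
}


def meets_species_criterion(identities: dict, max_identity: float = 90.0) -> bool:
    """True if any product differs by more than 10%, i.e. identity < 90%."""
    return any(value < max_identity for value in identities.values())


def products_below_threshold(identities: dict, max_identity: float = 90.0) -> list:
    """Names of products whose identity is below the threshold."""
    return sorted(name for name, value in identities.items() if value < max_identity)


print(meets_species_criterion(BEST_MATCH_IDENTITY), products_below_threshold(BEST_MATCH_IDENTITY))
```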
We analyzed the SbCLRV-derived vsiRNAs and found that most were 21 or 22 nt long, consistent with previous reports on the vsiRNA profiles of various viruses [24][25][26] and suggesting that DCL4 and DCL2 play dominant roles in vsiRNA biogenesis. The 5′-terminal nucleotide of an sRNA influences its sorting into different ARGONAUTE (AGO) proteins; AGO1 preferentially recruits sRNAs with a 5′ U [27,28]. The 21-nt vsiRNAs showed a 5′-U bias, suggesting that they are predominantly loaded into AGO1. The 22-nt vsiRNAs showed a 5′-G bias, a preference that has seldom been reported and has not been clearly linked to a particular AGO protein.
Recombination detection analysis identified a potential recombination event in SbCLRV, involving melon aphid-borne yellows virus (MABYV) and CPPV2, within the P5 cistron. Polerovirus P5 has been reported to be conserved in its N-terminal region but highly variable in its C-terminal region [29]. We confirmed this with a ClustalW alignment of P5 amino acid sequences from several poleroviruses (Figure S2). The boundary between the conserved and variable regions of P5 lies at aa 227 (nt 4940 of the SbCLRV genome), within the recombinant region (nt 4781-5177), indicating that the recombinant fragment spans both the conserved N-terminal and the variable C-terminal parts of P5 (Figure S2). Recombination involving ORF5 appears to be common in poleroviruses [23,30,31]. Because P5 is involved in aphid transmission, recombination within its conserved part might broaden the potential host range, whereas recombination within its variable part might contribute to the emergence of new poleroviruses [9]. Such recombination presumably arises from co-infections. In this study, the SbCLRV recombination event involved CPPV2 as the major parent and MABYV as the minor parent. By this reasoning, the host range of SbCLRV might be extended, although MABYV has not been reported to infect legumes [30]. Because RdRp and CP are conserved in poleroviruses [24,31], their amino acid sequences were used for phylogenetic analyses; a complete-genome phylogeny was also performed, as this is considered necessary [23]. Pairwise nucleotide sequence comparisons and phylogenetic analyses showed that SbCLRV is closely related to PBMYV, SLPV and CPPV2, suggesting a close relationship among these legume-infecting poleroviruses. Beyond the importance of the diseases they cause in legume crops, these viruses could provide an excellent case study of polerovirus speciation and modular evolution [23].
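The genome position of the P5 conserved/variable boundary follows from simple codon arithmetic. The sketch assumes that P5 residues are numbered from the first codon after the ORF3 stop codon (nt 4257-4259), i.e. from nt 4260. Under that assumption, residue 227 occupies nt 4938-4940, which is consistent with the 4940 nt stated above and lies inside the recombinant region (nt 4781-5177). The function name is hypothetical.

```python
def residue_to_genome(orf_first_nt: int, residue: int) -> tuple:
    """1-based inclusive genome coordinates of the codon encoding a residue."""
    if residue < 1:
        raise ValueError("residue numbering starts at 1")
    first = orf_first_nt + 3 * (residue - 1)
    return first, first + 2


P5_FIRST_NT = 4260  # assumption: first nt after the ORF3 stop codon (4257-4259)
RECOMBINANT_REGION = (4781, 5177)  # from the text

codon = residue_to_genome(P5_FIRST_NT, 227)
inside = RECOMBINANT_REGION[0] <= codon[0] and codon[1] <= RECOMBINANT_REGION[1]
print(codon, inside)
```

As a consistency check, the same arithmetic places P3 residue 197 (ORF3 starting at nt 3666) at nt 4254-4256, immediately before the stop codon at 4257-4259.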
Poleroviruses are obligately phloem-limited and rely on aphids for transmission [31]. These characteristics make functional studies of Polerovirus members difficult. This difficulty has been overcome by the development of infectious cDNA clones for many poleroviruses [23,24,32,33]. Typically, an infectious clone places the viral cDNA under the control of the cauliflower mosaic virus (CaMV) 35S promoter, fused to a ribozyme (RZ) sequence that self-cleaves the transcript to generate the viral 3′ end. Agroinfection with A. tumefaciens carrying the infectious clone thus provides an alternative to aphid transmission for inoculating plants with poleroviruses. In our study, agroinoculation of N. benthamiana with the recombinant pCass-SbCLRV produced chlorosis and leaf curling, demonstrating that SbCLRV is infectious. However, agroinfection could not be reproduced in soybean, although agroinfection of legumes has been reported. We will investigate whether soybean genotype affects SbCLRV infectivity.
In addition, further work should focus on diagnostic tool development for controlling and