Molecular Characterization of the Complete Genome of Three Basal-BR Isolates of Turnip mosaic virus Infecting Raphanus sativus in China

Turnip mosaic virus (TuMV) infects crops in the family Brassicaceae worldwide. TuMV isolates have been clustered into five lineages: basal-B, basal-BR, Asian-BR, world-B and OMs. Here, we determined the complete genome sequences of three TuMV basal-BR isolates infecting radish in Shandong and Jilin Provinces, China. Each genome comprised 9833 nucleotides, excluding the 3′-terminal poly(A) tail, and contained two open reading frames (ORFs): a large ORF encoding a polyprotein of 3164 amino acids and a small overlapping ORF encoding a PIPO protein of 61 amino acids. The encoded proteins contained the conserved motifs typical of members of the genus Potyvirus. In pairwise comparisons with 30 other TuMV genome sequences, the three isolates shared their highest identities with isolates from Eurasian countries (Germany, Italy, Turkey and China). Recombination analysis showed that none of the three isolates had “clear” recombination. Analysis of conserved amino acid changes between groups showed that the TuMV outgroup (OGp) and the OMs group had the same codons at three sites (852, 1006 and 1548), whereas the other TuMV groups (basal-B, basal-BR, Asian-BR and world-B) differed. This pattern suggests that the codons of the OMs progenitor did not change at divergence, whereas the progenitor sequence of the other TuMV groups did. Genetic diversity analyses indicated that the PIPO gene was under the highest selection pressure and that the selection pressures on P3N-PIPO and P3 were almost the same, suggesting that most of the selection pressure on P3 was probably imposed through P3N-PIPO.


Introduction
Turnip mosaic virus (TuMV) is a member of the genus Potyvirus and has a broader host range, in terms of plant genera and families, than any other potyvirus; it is known to infect at least 318 species in 156 genera of 43 plant families, including many arable, vegetable and ornamental crops, especially in the family Brassicaceae [1,2]. In a survey of virus diseases in 28 countries and regions, TuMV was ranked second only to Cucumber mosaic virus as the most important virus infecting field-grown vegetables [3].
TuMV forms flexuous filamentous particles 700–750 nm long, each containing a single copy of the genome: a single-stranded, positive-sense RNA molecule of about 10,000 nt. The virion RNA is infectious and serves as both the genome and the viral messenger RNA. The genomic RNA is translated into one large polyprotein, which is subsequently processed by the action of
In NIb, the conserved motif FDSS (aa 2611–2614) was located 266 aa upstream of the putative NIb/CP cleavage site [35], and the GDD motif (aa 2710–2712), which is necessary for RNA polymerase activity and NTP binding, was also present [35]. Like the CPs of most potyviruses, the CPs of these three TuMV isolates carried the DAG motif (aa 2882–2884), which interacts with the PTK motif of HC-Pro and is an important factor in aphid transmission, and the R-(X)43-D motif (aa 3054–3098), which is associated with virus movement [36]. In addition, three consensus motifs, MVWCIENGTSP (aa 3013–3023), AFDF (aa 3096–3099) and QMKAAA (aa 3116–3121), were found in the CPs of the CCLB, LWLB and WFLB14 isolates [35]. The recently described ORF encoding the putative protein PIPO [6] was identified within the P3 ORF; it is expressed by a +2 ribosomal frameshift starting from a G1A4 motif at position 3076 (Figure 1). This motif differs from the highly conserved G1–2A6–7 motif known for other potyviruses [6], and the PIPO ORF ends with a UAA termination codon at positions 3258–3261.
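The motif positions above are given in polyprotein coordinates. As a minimal sketch, assuming the polyprotein is available as a one-letter amino acid string, the snippet below locates fixed motifs and the spaced R-(X)43-D motif with regular expressions and reports 1-based, inclusive start and end positions. The function name `find_motifs`, the motif subset and the toy sequence are hypothetical illustrations, not the procedure used in this study.

```python
import re

# Regex patterns for a few motifs named in the text (an illustrative subset).
# "R-(X)43-D" is an arginine and an aspartate separated by exactly 43 residues.
MOTIFS = {
    "FDSS": r"FDSS",
    "GDD": r"GDD",
    "DAG": r"DAG",
    "R-(X)43-D": r"R.{43}D",
}


def find_motifs(protein, motifs=MOTIFS):
    """Return {motif name: [(start, end), ...]} using 1-based inclusive positions."""
    hits = {}
    for name, pattern in motifs.items():
        # Wrapping the pattern in a lookahead also reports overlapping matches.
        hits[name] = [
            (m.start() + 1, m.start() + len(m.group(1)))
            for m in re.finditer(f"(?=({pattern}))", protein)
        ]
    return hits


# Hypothetical toy sequence, not a TuMV polyprotein.
toy = "M" + "GDD" + "A" * 10 + "R" + "K" * 43 + "D" + "FDSS"
print(find_motifs(toy))
# {'FDSS': [(60, 63)], 'GDD': [(2, 4)], 'DAG': [], 'R-(X)43-D': [(15, 59)]}
```

In a real polyprotein, short motifs such as GDD or DAG may occur more than once, so each hit still has to be interpreted in its protein context (for example, GDD within NIb).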

Percentage Identity
Pairwise comparisons of the CCLB, LWLB and WFLB14 genome sequences with those of 30 other TuMV isolates available in sequence databases showed that the most closely related isolate was DEU4 (AB701701), a BR-pathotype isolate from Germany, which shared 95.0%, 94.7% and 94.8% nucleotide identity with CCLB, LWLB and WFLB14, respectively (Table 2). The nucleotide and amino acid identities of each gene of the three isolates compared with those of the other isolates are shown in Table 2; for individual genes, the three isolates shared the highest identities with Cal1 (AB093601), DEU4 (AB701701), PV0104 (AB093603), ITA8 (AB701725) and USA6 (AB701741).
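Identity values such as those in Table 2 depend on how alignment gaps are counted. The sketch below is a minimal illustration, assuming the genomes are already aligned (for example with Clustal W) and using one common convention: columns where both sequences have a gap are skipped, and a gap opposite a residue counts as a mismatch. The function names and toy sequences are hypothetical, and the software behind Table 2 may use a different convention.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns in which both sequences have a gap ('-') are skipped; a gap
    opposite a residue counts as a mismatch (one of several conventions).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def closest_isolate(query, references):
    """Return (name, percent identity) of the most similar reference."""
    scores = {name: percent_identity(query, seq) for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy alignment.
refs = {"ref1": "ACGTACGTAA", "ref2": "ACGAACGAAA"}
print(closest_isolate("ACGTACGTAC", refs))  # ('ref1', 90.0)
```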

Phylogenetic Relationships
To estimate the phylogenetic relationships among the TuMV isolates and the outgroup, the complete genome sequences of 33 TuMV isolates, including the three determined in this work, were subjected to phylogenetic analyses, with two Japanese yam mosaic virus (JYMV) isolates (AB016500 and AB027007), which share the highest identities with TuMV, as the outgroup (Table 3, Figure 2). The maximum likelihood (ML) tree showed that the TuMV isolates clustered into five lineages corresponding to basal-B, basal-BR, Asian-BR, world-B and OMs, consistent with previous reports [7,9]. CCLB, LWLB and WFLB14 clustered in the basal-BR group. Phylogenetic trees estimated from the individual P1 and CP genes of the 33 isolates were very similar to the complete-genome tree (data not shown). A recent report concluded, on the basis of in vivo and in silico studies, that the "emergence" of TuMV was probably a "gene-for-quasi-gene" event [9]. Following that study, we used the Clustal W program to search an alignment of 35 sequences, including the two JYMV outgroup isolates, for conserved amino acids that changed between groups (Figure 2). At codon positions 852 and 1006, the OGp and OMs groups had the same amino acids (V852 and I1006), whereas the other TuMV groups (basal-B, basal-BR, Asian-BR and world-B) differed (K852, Q852 or L852; K1006 or R1006). This asymmetric phylogenetic pattern is called the XXY pattern; it suggests that, at divergence, the amino acids of the OMs progenitor did not change, whereas those of the progenitor of the other TuMV groups did. At both sites, the encoded amino acids changed from hydrophobic to polar. Codon 852 encodes an amino acid at the N-terminal end of the P3 protein, and codon 1006 encodes an amino acid in the C-terminal third of the P3 protein, near the 5′-terminal third of the PIPO ORF.
Amino acid 1548, in the middle of the CI protein, was V in all OGp and OMs isolates (V1548), whereas the alternative residue A (A1548) was conserved in the other TuMV groups. These results are consistent with those of Gibbs et al. [9]. At the three codon sites (852, 1006 and 1548), CCLB, LWLB and WFLB14 all carried L852, K1006 and A1548, the residues conserved in isolates of the basal-BR group (Figure 2).
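The XXY screen described above amounts to a column-by-column test on a protein alignment. The sketch below illustrates one way to express it, under stated assumptions: OGp and OMs are treated as the reference groups, they must share a single non-gap residue X, and X must be absent from every other group. The function name, the group labels used as keys and the toy alignment are hypothetical, and positions are alignment columns rather than TuMV polyprotein coordinates.

```python
def xxy_sites(alignment, groups, reference_groups=("OGp", "OMs")):
    """Find alignment columns showing the asymmetric "XXY" pattern.

    alignment: {isolate: aligned protein sequence}
    groups:    {isolate: group label}
    A column qualifies when every isolate in `reference_groups` carries the same
    residue X (not a gap) and no isolate in any other group carries X.
    Returns [(1-based column, X, sorted residues seen in the other groups), ...].
    """
    length = len(next(iter(alignment.values())))
    sites = []
    for col in range(length):
        ref_residues, other_residues = set(), set()
        for isolate, seq in alignment.items():
            target = ref_residues if groups[isolate] in reference_groups else other_residues
            target.add(seq[col])
        if (len(ref_residues) == 1 and "-" not in ref_residues
                and other_residues and not ref_residues & other_residues):
            sites.append((col + 1, next(iter(ref_residues)), sorted(other_residues)))
    return sites


# Hypothetical toy alignment; real groups would contain many isolates.
alignment = {
    "JYMV1": "MVKI", "JYMV2": "MVRI", "OM1": "MVKI",
    "bBR1": "MLKK", "wB1": "MKKR",
}
groups = {"JYMV1": "OGp", "JYMV2": "OGp", "OM1": "OMs",
          "bBR1": "basal-BR", "wB1": "world-B"}
print(xxy_sites(alignment, groups))
# [(2, 'V', ['K', 'L']), (4, 'I', ['K', 'R'])]
```

Requiring the reference residue to be completely absent from the other groups is a strict reading of the pattern; a looser screen could instead allow a small fraction of exceptions.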

Recombination Analysis
The polyprotein-encoding gene sequences of 33 isolates from the public DNA sequence databases were screened for possible recombination events, in particular in isolates CCLB, LWLB and WFLB14, using the RDP version 4 software package, PHYLPRO version 1 and SISCAN version 2. Only three of the 33 genomes (9.1%) showed evidence of recombination, and the CCLB, LWLB and WFLB14 isolates in this study had no "clear" recombination. The three recombinant isolates (CHN1, ND10J and 59J) were all interlineage recombinants between Asian-BR and world-B isolates, most with CH6 (Asian-BR) as the major parent and 2J or KWB778J (world-B) as the minor parent. Most recombination sites were located in P1, CI, 6K2 and CP, which are known recombination hotspots in TuMV [8,13,37] (Table 4).

Genetic Distance and Selection Pressure
Genetic distances of the 33 isolates within and between groups were calculated using the K2P method in MEGA version 6 [38] (Table 5). The genetic distance within the basal-B group was the largest (0.185 ± 0.004), twice that within the basal-BR group, in which CCLB, LWLB and WFLB14 clustered. Genetic distances within groups were significantly smaller than those between groups, supporting the groupings. To determine the direction of selection, the dN/dS ratio of each gene was estimated for 16 TuMV isolates from the Chinese and Japanese populations using the codeml program and the PBL method (Table 6). The dN/dS ratios were always <1 and differed considerably among genomic regions, indicating selection against most amino acid changes, namely negative or purifying selection, in most of these regions. The largest dN/dS ratio was for the PIPO gene. However, the dN/dS ratios of P3N-PIPO and P3 were almost the same, each only one quarter that of PIPO. This indicates that most of the selection pressure on P3 was imposed through P3N-PIPO. In addition, the dN/dS ratio for the P1 gene was larger than that of all other genes except PIPO. This result is consistent with P1 and P3 being the most variable genes in the TuMV genome [10]. The dN/dS ratios for the Chinese and Japanese populations, and for the different phylogenetic lineages, were not significantly different for any of the genes analyzed by the two methods.
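The within- and between-group values in Table 5 are averages of pairwise K2P distances. As a minimal sketch, assuming aligned nucleotide sequences and pairwise deletion of gapped or ambiguous sites, the code below computes the Kimura 2-parameter distance d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences, and averages it over every pair of groups. MEGA's own options (gap treatment, bootstrap standard errors such as the ± 0.004 above) are not reproduced; saturated pairs (1 - 2P - Q ≤ 0) raise an error, and the function names and toy data are hypothetical.

```python
import math
from itertools import combinations

PURINES = set("AG")
PYRIMIDINES = set("CT")


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Sites with a gap or ambiguity code in either sequence are skipped
    (pairwise deletion); U is treated as T.
    """
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper().replace("U", "T"), seq_b.upper().replace("U", "T")):
        if a not in PURINES | PYRIMIDINES or b not in PURINES | PYRIMIDINES:
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    p, q = transitions / sites, transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def mean_group_distances(seqs, groups):
    """Mean pairwise K2P distance for every (group, group) combination."""
    sums = {}
    for x, y in combinations(seqs, 2):
        key = tuple(sorted((groups[x], groups[y])))
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + k2p_distance(seqs[x], seqs[y]), count + 1)
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical toy data: two isolates in group "A", one in group "B".
seqs = {"a1": "AAAAAAAAAA", "a2": "AAAAAAAAAG", "b1": "AAAAAAAACC"}
groups = {"a1": "A", "a2": "A", "b1": "B"}
print(mean_group_distances(seqs, groups))
```

Keys such as ("A", "A") hold within-group means and keys such as ("A", "B") hold between-group means; a group with a single isolate has no within-group value.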

Discussion
Here, we report the complete genome sequences of three Chinese TuMV isolates infecting Raphanus sativus that were assigned to the basal-BR lineage on the basis of their molecular characteristics. Basal-BR is a recently emerged branch of the TuMV population in East Asia that has been undergoing sudden expansion [8,11–13,17]. In China, since the first report of basal-BR isolates [17], their population has increased rapidly and shows characteristics of a founder effect [8]. In this study, the CCLB, LWLB and WFLB14 genome sequences shared their highest identities with isolates from Eurasian countries (Germany, Italy, Turkey and China) and clustered in the basal-BR group (Table 2, Figure 2), which is consistent with the interpretation that TuMV originated in western Eurasia and spread to other parts of the world [12].
Recently, a sister lineage of TuMV-like potyviruses (TuMV-OM) was identified in European orchids (Orchis militaris, Orchis morio and Orchis simia), from which TuMV diverged about 1000 years ago [7]. A virus emergence involving a major host switch would probably result in significant genomic changes, especially in the emergent lineage [9]. In this work, conserved amino acid changes between groups were identified in an alignment of 35 sequences, including the two JYMV isolates (OGp) (Figure 2). The OGp and OMs groups had the same codons at three sites (852, 1006 and 1548), whereas the other TuMV groups (basal-B, basal-BR, Asian-BR and world-B) differed. This pattern suggests that the codons of the OMs progenitor did not change at divergence, whereas those of the progenitor of the other TuMV groups did. Gibbs et al. described this "emergence" of TuMV as probably a "gene-for-quasi-gene" event [9]. Codons 852 and 1006 encode amino acids in the P3 protein, and amino acid 1548 lies in the middle of the CI protein. In previous studies, the P3 and CI genes of TuMV, together with the small 6K2 and VPg genes, were shown to be involved in host determination [9,39–41].
The degree of selection pressure on genes can be estimated by calculating dN/dS ratios, which have provided evidence that strong selection against amino acid change is a driving force in TuMV evolution [13,15,42,43]. Previous studies showed that Chinese TuMV isolates were under negative or purifying selection, both for the whole ORF and for individual genes [8,9,17]. In this work, we also checked gene by gene whether there were significant differences between the Chinese and Japanese populations in their dN/dS ratios (Table 6). Surprisingly, the selection pressure was highest on PIPO, not on P1, which previous studies reported as the highest [8,13,15]. This may be because PIPO was only recently described as a new ORF encoded within the genomes of the family Potyviridae [6], so selection pressure on PIPO was not examined in those studies. We also found that the selection pressures on P3N-PIPO and P3 were almost the same, at only one quarter of that on PIPO. The PIPO ORF is embedded within the P3 cistron and is translated in the +2 reading frame relative to the potyviral long ORF as the P3N-PIPO fusion protein. This suggests that most of the selection pressure on P3 was imposed through P3N-PIPO, although it does not seem to account for the presence of alternative stop codons in the PIPO ORF [44]. This is hypothesized to be associated with the functions of P3N-PIPO in cell-to-cell movement and in overcoming host resistance [5,42,45,46]. In addition, the selection pressure on the P1 gene was larger than that on all other genes except PIPO. The higher selection pressure on P1, P3 and P3N-PIPO might be an evolutionary force driving host dependence and adaptation [5,9,13,15].
TuMV isolates of the basal-BR lineage have been prevalent and expanding rapidly in China since they were first reported in 2005. The growing number of complete genome sequences of basal-BR isolates from China makes it possible to understand the genetic diversity of TuMV more comprehensively. Our results provide useful information about the evolution and genetic conservation of TuMV. In future, it will also be important to study the pathogenic mechanisms of TuMV and the resistance of cruciferous crops to TuMV isolates.

Virus Samples, RNA Extraction and Sequencing
Three radish samples showing typical symptoms of viral disease were collected in Shandong and Jilin Provinces and named WFLB14, LWLB and CCLB. All isolates were sap-inoculated to Chenopodium amaranticolor, serially cloned through single lesions at least three times, and then propagated in Brassica rapa plants.
The viral RNAs were extracted from purified virions with an Invitrogen TRIzol kit following the manufacturer's instructions. The RNAs were reverse-transcribed with UTR-R, a primer complementary to the poly(A) tail (Figure 1, Table 7). Most parts of the genomes were amplified by PCR (Platinum® Taq DNA Polymerase, Invitrogen, Carlsbad, CA, USA) with forward and reverse primers designed from conserved regions and newly determined genomic sequences (Figure 1, Table 7); these primers will be provided upon request. The 5′-proximal part was obtained using GSP and NGSP primers (Figure 1, Table 7) with the 5′ rapid amplification of cDNA ends (5′-RACE) method, as described in an earlier study [47].

Table 7. Primers used for amplifying the complete genomic sequences of TuMV (columns: Primer; Sequence, 5′→3′).

The amplification products were ligated into the vector pMD18-T (TaKaRa Biotechnology Dalian Co., Ltd., Dalian, China), and the inserts were confirmed by PCR and restriction enzyme digestion before sequencing on an ABI PRISM™ 377 DNA Sequencer. The nucleotide sequence of each isolate was determined from at least four overlapping, independent RT-PCR products for each region, covering the complete genome. At least six clones of the fragments obtained by 5′-RACE were sequenced.

Phylogenetic Analyses
To estimate the phylogenetic relationships among the TuMV isolates and the outgroup, we aligned the 33 complete genome sequences (Table 3) using the Clustal W program [48] and constructed a phylogenetic tree by the maximum likelihood method in MEGA version 6 [38]. Statistical support for tree branches was assessed with 1000 bootstrap replicates. Two Japanese yam mosaic virus (JYMV) isolates with known complete genomic sequences (JYMV-j1 and JYMV-mild) [49,50] were used as the outgroup (OGp), because BLAST searches had shown them to be the most closely and consistently related to TuMV. To identify sequence changes between groups, codons of interest were examined in an alignment of 35 sequences (the 33 TuMV isolates and the two JYMV isolates) using the Clustal W program [9,48].
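The 1000 bootstrap replicates mentioned above rest on resampling alignment columns with replacement: a tree is inferred from each pseudo-alignment, and the support for a branch is the percentage of replicate trees that recover that clade. The sketch below shows only the resampling step, as an illustration; it does not reproduce MEGA's ML tree search or the clade counting, and the function names and seed are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """One nonparametric bootstrap pseudo-replicate of a multiple alignment.

    alignment: {name: aligned sequence}, all of the same length. Alignment
    columns are drawn with replacement until the original length is reached.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_replicates(alignment, n=1000, seed=1):
    """Generate n pseudo-replicates with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]


# Hypothetical toy alignment; each replicate would then be passed to a
# tree-building program, which this sketch does not include.
replicates = bootstrap_replicates({"x": "ACGT", "y": "ACGA"}, n=3)
for rep in replicates:
    print(rep)
```

Because whole columns are resampled, the characters of all sequences at a site stay together, which preserves the site-wise information the tree is built from.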

Recombination Analyses
Using phylogenetic trees constructed from different TuMV genes, we first inferred probable instances of recombination by examining the lineages into which different isolates clustered. All sequences were aligned using the Clustal X program [48], and recombination was assessed using a combination of methods in the RDP version 4 software package [51], namely RDP [51], GENECONV [52], BOOTSCAN [53], CHIMAERA [54] and MAXCHI [55]. All isolates identified as likely recombinants by the programs in RDP version 4, supported by three different methods with an associated p-value of <1.0 × 10⁻⁶, were re-checked using the original PHYLPRO version 1 [56] and SISCAN version 2 [57].
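RDP4, PHYLPRO and SISCAN each use their own statistics, which are not reproduced here. As a much simpler illustration of the intuition behind window-based scans, the sketch below compares an aligned query with a putative major and minor parent in sliding windows and flags the windows at which the more similar parent switches. The window size, step, function names and toy sequences are hypothetical choices; such a scan gives no p-value, and its breakpoint resolution is limited by the window size and step.

```python
import math


def window_identity(query, parent, start, size):
    """Fraction of identical sites among non-gap columns of one window."""
    pairs = [(q, p) for q, p in zip(query[start:start + size], parent[start:start + size])
             if q != "-" and p != "-"]
    if not pairs:
        return math.nan  # no comparable sites in this window
    return sum(q == p for q, p in pairs) / len(pairs)


def scan_parents(query, major, minor, size=200, step=20):
    """Compare an aligned query with two putative parents in sliding windows.

    Returns [(1-based window start, identity to major, identity to minor,
    closer parent), ...]; windows where neither parent is closer (including
    windows without comparable sites) are labelled "tie".
    """
    calls = []
    for start in range(0, len(query) - size + 1, step):
        to_major = window_identity(query, major, start, size)
        to_minor = window_identity(query, minor, start, size)
        if to_major > to_minor:
            closer = "major"
        elif to_minor > to_major:
            closer = "minor"
        else:
            closer = "tie"
        calls.append((start + 1, to_major, to_minor, closer))
    return calls


def candidate_breakpoints(calls):
    """Window starts at which the closer parent switches (ties are ignored)."""
    informative = [(start, closer) for start, _, _, closer in calls if closer != "tie"]
    return [cur[0] for prev, cur in zip(informative, informative[1:])
            if prev[1] != cur[1]]


# Hypothetical toy alignment: the first half of the query matches the major
# parent and the second half matches the minor parent.
major, minor = "A" * 200, "C" * 200
query = "A" * 100 + "C" * 100
print(candidate_breakpoints(scan_parents(query, major, minor, size=20, step=20)))
# [101]
```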

Genetic Distance and Selection Pressure
Genetic distances of all isolates within and between groups were calculated using the Kimura 2-parameter (K2P) method in MEGA version 6 [38]. Nonsynonymous (dN) and synonymous (dS) differences that correlated with phylogenetic relationships were estimated using the codeml program of PAML version 4 [58] and the Pamilo-Bianchi-Li (PBL) method implemented in MEGA version 6 [38]. The dN/dS ratios, representing the selection pressure on each protein-encoding region, were calculated for TuMV subpopulations from different collection regions using the PBL method in MEGA version 6 [38].
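The PBL method and codeml are not reproduced here. As a simpler, swapped-in illustration of how dN and dS follow from counts of synonymous and nonsynonymous sites and differences, the sketch below implements the Nei-Gojobori (1986) counting method with a Jukes-Cantor correction for one pair of aligned, in-frame coding sequences. It will not give the same values as PBL or codeml; the function names are hypothetical, mutations to stop codons are counted as nonsynonymous when counting sites (a choice made for this sketch), and the ratio dN/dS is undefined when dS is zero.

```python
import math
from itertools import permutations

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """At each position, add the fraction of the three possible point
    mutations that leave the amino acid unchanged."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """Synonymous and nonsynonymous differences between two sense codons,
    averaged over all mutational pathways that avoid intermediate stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    paths = 0
    for order in permutations(positions):
        syn = non = 0
        current = c1
        for step, pos in enumerate(order):
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if step < len(order) - 1 and CODE[nxt] == "*":
                break  # pathway passes through a stop codon: discard it
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        else:  # runs only if the pathway was not discarded
            syn_total += syn
            non_total += non
            paths += 1
    if paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_total / paths, non_total / paths


def jukes_cantor(p):
    """Jukes-Cantor correction; undefined when p >= 0.75 (saturation)."""
    return -0.75 * math.log(1 - 4 * p / 3)


def nei_gojobori(seq1, seq2):
    """Return (dN, dS) for two aligned, in-frame coding sequences.

    Codons containing gaps or ambiguity codes, and stop codons, are skipped.
    """
    seq1 = seq1.upper().replace("U", "T")
    seq2 = seq2.upper().replace("U", "T")
    s_sites = n_sites = s_diffs = n_diffs = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        s_sites += s
        n_sites += 3 - s
        sd, nd = codon_differences(c1, c2)
        s_diffs += sd
        n_diffs += nd
    return jukes_cantor(n_diffs / n_sites), jukes_cantor(s_diffs / s_sites)


# Hypothetical toy pair differing by one synonymous change (TTT -> TTC, both Phe).
d_n, d_s = nei_gojobori("TTTCTA", "TTCCTA")
print(d_n, d_s)
```

A gene-by-gene comparison like that in Table 6 would apply such an estimator to each coding region separately and then report the ratio dN/dS.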