High Mutation Frequency and Significant Population Differentiation in Papaya Ringspot Virus-W Isolates

A total of 101 papaya ringspot virus-W (PRSV-W) isolates were collected from five different cucurbit hosts in six counties of Oklahoma during the 2016–2018 growing seasons. The coat protein (CP) coding region of these isolates was amplified by reverse transcription-polymerase chain reaction, and 370 clones (3–5 clones/isolate) were sequenced. Phylogenetic analysis revealed three phylogroups while host, location, and collection time of isolates had minimal impact on grouping pattern. When CP gene sequences of these isolates were compared with sequences of published PRSV isolates (both P and W strains), they clustered into four phylogroups based on geographical location. Oklahoman PRSV-W isolates formed one of the four distinct major phylogroups. The permutation-based tests, including Ks, Ks *, Z *, Snn, and neutrality tests, indicated significant genetic differentiation and polymorphisms among PRSV-W populations in Oklahoma. The selection analysis confirmed that the CP gene is undergoing purifying selection. The mutation frequencies among all PRSV-W isolates were within the range of 1 × 10−3. The substitution mutations in 370 clones of PRSV-W isolates showed a high proportion of transition mutations, which gave rise to higher GC content. The N-terminal region of the CP gene mostly contained the variable sites with numerous mutational hotspots, while the core region was highly conserved.


Introduction
Papaya ringspot virus (PRSV), a single-stranded, positive-sense RNA virus, belongs to the family Potyviridae and genus Potyvirus. The virus is primarily grouped into two serologically indistinguishable strains: papaya-infecting type (PRSV-P) and cucurbit-infecting type (PRSV-W) [1]. The host range of PRSV-W is limited to Chenopodiaceae and Cucurbitaceae, while PRSV-P can infect plants in the papaya family (Caricaceae) as well [1][2][3]. The site-directed mutagenesis study in recombinant viruses showed that lysine amino acid (aa) at position 27 in NIa-Pro determines the host specificity in PRSV-P, as a single aa change at that position from lysine to aspartic acid can change the host range of PRSV-P to non-papaya-infecting [4].
A total of 101 PRSV-W isolates were collected from six counties (Blaine, Caddo, Cimarron, McCurtain, Muskogee, and Tulsa) of Oklahoma from five different hosts (cantaloupe, cucumber, pumpkin, squash, and watermelon) during the 2016-2018 growing seasons. The complete CP gene of these isolates was amplified by reverse transcription-polymerase chain reaction (RT-PCR), and the expected size DNA bands were gel-purified. PRSV-W positive samples were also checked by RT-PCR for mixed infection with specific primers of watermelon mosaic virus (WMV) and zucchini yellow mosaic virus (ZYMV) [13]. Fifty-five of these 101 PRSV-W had mixed infection with WMV, ZYMV, or both.

CP Gene Sequence Analysis
A total of 370 recombinant clones (3-5 clones from each isolate) were sequenced from 101 PRSV-W isolates in this study. The complete CP gene sequence of all PRSV-W clones was 864bp, which translated into 287 aa residues except one clone, which had 24 nucleotides (nt) deletion at the N-terminal region of the CP gene. The nt identity among consensus CP sequences of 101 PRSV-W isolates from this study ranged from 96 to 100%, and aa identity was 98-100%. The 165 Oklahoman isolates (101 from this study and 64 isolates from the previous study [12]) had 86-98% nt identity and 87-98% aa identity with other isolates from around the globe (Table 1).

Phylogenetic Relationship among PRSV-W Populations
The maximum likelihood (ML) phylogenetic tree constructed from the CP gene with 101 PRSV-W isolates in this study showed three distinct phylogroups circulating in cucurbit fields of Oklahoma ( Figure 1). Phylogroup 1 was further divided into three subgroups. Subgroup 1a included 28 isolates that belong to four counties (Caddo, McCurtain, Muskogee, and Blaine). Subgroup 1b had 20 isolates from three counties (Muskogee, McCurtain, and Cimarron). Subgroup 1c had three isolates from Blaine County. Phylogroup 2 consisted of two subgroups; 2a consisted of 9 isolates collected from Tulsa County, and 2b of 15 isolates from three counties (Blaine, Cimarron, and Tulsa). Phylogroup 3 consisted of two subgroups; 3a consisted of 11 isolates from Blaine, and 3b contained 10 isolates from two counties (Blaine and McCurtain). The ML phylogenetic tree made with sequences from this study and 64 sequences from the previous study [12] also showed a similar pattern (Supplementary Figure S2). Although that study included PRSV-W isolates from two counties (Atoka and Jefferson) not included in this study, they also clustered together with isolates from other counties. CD11  MK17  CD10  BL34  MK8  MK6  MK2  MC3  MC2  MC1  CD9  CD8  CD7  CD6  CD5  CD4  CD3  CD2  CD1  BL33  MK19  MK1  MK4  MK7  MK20  MK5  CD12  MK13  MK14  MK3  BL27  MK9  CD13  MK16  MK31  MK27  MK21  CM1  MC11  MK10  MK11  MK12  MK28  MK29  MK26  MC12  MK22  MK26  MK29  MK25  MK23  MK24  MK30  BL35  BL36  BL37  TL22  TL23  TL21  TL20  TL19  TL18  TL26  TL17  TL24  TL25  BL28  BL31  BL26  BL30  BL25  BL29  CM5  CM3  CM4  CM2  CM9  CM8  CM6  CM7  BL22  BL17  BL18  BL19  BL20  BL21  BL23  BL24  BL32  BL39  BL42  BL38  BL41  BL40  MC4  MC5  MC8  MC6  MC10  MC7  MC9 PRSV-P Cuba Selected PRSV-W isolates: 33 out of 101 from this study and 73 retrieved from GenBank were used to deduce the ML phylogenetic tree with the CP gene of WMV as an outgroup ( Figure 2). The selection of sequences from Oklahoma for global phylogenetic analysis was based on their phylogenetic position within Oklahoma isolates. Four distinct clusters of PRSV-W isolates were observed, loosely based on geographical location. All Asian isolates were grouped into Cluster 1 (C1), while Cluster 2 (C2) contained both North and South American isolates, except isolates from Oklahoma. Cluster 3 (C3) included all the isolates from Oceania, while Cluster 4 (C4) contained all the isolates from Oklahoma. The two isolates from Europe clustered together with American (French isolate) and Oceania isolates (Polish isolate). In addition, another phylogenetic tree was constructed using 148 PRSV isolates that contained both PRSV strains (P and W), including 42 isolates of PRSV-P retrieved from GenBank (Supplementary Figure S3). The Oklahoma isolates are again grouped in a separate cluster (C4). The clustering pattern was similar to PRSV-W, with Asian isolates in C1, other American isolates in C2, and Oceanian isolates in C3.

Genetic Variation among PRSV-W Populations
The overall genetic diversity (d) in the CP gene of PRSV-W isolates from Oklahoma was 0.020; the average number of nt differences (K) was 16.09, and nt diversity (π) was 0.019 (Table 2). PRSV-W isolates from Caddo County were the least diverse (d = 0.001, K = 2.07, π = 0.002), while isolates from Blaine County were the most diverse (d = 0.023, K = 18.02, π = 0.022). The estimates of evolutionary distances in the CP gene between PRSV-W isolates from different counties of Oklahoma showed that populations from Caddo County and Muskogee County were least distant (d = 0.003), and populations between McCurtain County and Tulsa County were most distant (d = 0.036) (Supplementary Table S3). The estimates of evolutionary divergence in the CP gene among PRSV-W isolates within different hosts were fairly consistent with genetic distance and nucleotide diversity in the range of 0.15 < d < 0.24 and 0.015 < π < 0.02, respectively. Similarly, the divergence in the CP gene between PRSV-W populations from different hosts was also close to the mean overall distance (Supplementary Table S3 Table 2). The divergence between phylogroup 1 and 2 was 0.027, phylogroup 1 and 3 was 0.024, and between phylogroup 2 and 3 was 0.036 (Supplementary Table S3). The haplotype diversity (Hd) for each of the aforementioned groups was high, with an overall Hd of 0.98 (Table 2). For further analysis, we used 33 selected CP gene sequences from this study, 73 PRSV-W isolates, and 42 PRSV-P isolates from GenBank to calculate genetic diversity among PRSV global isolates ( Table 3). The overall mean genetic diversity in the CP gene of all selected PRSV-W isolates (n = 106) was 0.071, and all PRSV isolates (n = 148) was 0.085 ( Table 3). The overall average number of nt differences (K) between PRSV-W isolates was 51.71, and nt diversity (π) was 0.07 (Table 3). The average number of nt differences and nt diversity was higher (K = 61.89, π = 0.08) in the overall PRSV group (n = 148). Between different phylogroups of PRSV, Asian isolates were most diverse (d = 0.11, K = 74.95, π = 0.09) and Oceania isolates were least diverse (d = 0.02, K = 18.27, π = 0.021). The haplotype diversity for each of these groups was >0.95, with an overall Hd of 0.99 (Table 3  and Supplementary Table S4). Table 3. Population genetic analysis based on the coat protein gene sequences of PRSV-W and PRSV populations in different phylogroups around the world.

Group
No. of Sequences (n)

Population Differentiation within PRSV-W Populations
The total genetic variability in a population compared to the total population and gene flow estimated from F statistics (Fst) showed that there is infrequent gene flow of PRSV-W between different counties (Fst >0.33) except between Blaine and McCurtain counties (Fst=0.24) and Caddo and Muskogee (Fst =0.14), where gene flow was frequent (Table 4). Similarly, the Nm value for all pairs of counties was <1 except CD and MK (Nm = 1.60). The Fst values for all pairs of PRSV-W hosts were <0.33, with only three pairs of hosts (cucumber and pumpkin, cucumber and watermelon, and pumpkin and squash) having an Nm value <1. The three phylogroups of PRSV-W within this study had infrequent gene flow and distinct population differentiation. However, the population of PRSV-W within the three years of the collection had frequent gene flow (Fst < 0.33, Nm > 1) but were genetically distinct based on other statistical tests. The P values for all permutation-based tests Ks, Ks*, Z*, and Snn were <0.01 for almost all the population groups showing significant genetic differentiation (Supplementary Table S5). The values of all three neutrality tests (Fu and Li's D and F and Tajima's D) for all population groups were negative and occasionally significant (Supplementary Table S6). The negative values on neutrality tests indicate polymorphisms within PRSV-W populations.
Population differentiation analyses were performed for different PRSV populations based on their phylogenetic groups and geographical locations. The Fst value for PRSV-W populations between Asia and other parts was >0.33, indicating an absence of gene flow between these groups ( Table 5). The significant Fst value was also supported by low Nm values. All other PRSV-W groups had an Fst value <0.33. However, when PRSV-P isolates were added to the study, the Fst value for all groups dropped below 0.33. All the permutationbased tests confirmed that there was distinct genetic differentiation between these groups (Supplementary Table S7). All the phylogroups from various parts of the world had non-significant negative values on three neutrality tests (Supplementary Table S8). Table 4. Gene flow and genetic differentiation estimates based on the coat protein gene sequences of PRSV-W isolates from different counties, hosts, phylogroups, and collection years.

Counties
Fst  Table 5. Gene flow and genetic differentiation estimates based on the coat protein gene sequences of PRSV populations between different phylogroups from around the world.

Mutation Frequency
The overall mutation frequency (f) among 370 sequences obtained from 101 PRSV-W isolates was 1.18 × 10 −3 . The mutation frequency within a single infection of PRSV-W isolates was 1.22 × 10 −3, and that of PRSV-W mixed with one or more viruses was 1.15 × 10 −3 . The average mutation frequencies for each county, host, and collection year in a single and mixed population were determined and compared (Tables 6-8). The average mutation frequency in different counties ranged from 0.79 × 10 −3 (McCurtain County) to 1.37 × 10 −3 (Tulsa County). The mutation frequencies of the virus in different counties were generally higher in single infections than in mixed infections, except in Blaine County (Table 6). Similarly, mutation frequencies in different hosts ranged from 0.98 × 10 −3 (squash) to 1.5 × 10 −3 (cucumber). The mutation frequencies in single and mixed infections were fairly close to the overall mutation frequency within the same host ( Table 7). The average mutation frequencies of PRSV-W populations in 2016 was 1.23 × 10 −3 , which increased slightly in 2017 (1.3 × 10 −3 ) and was down to 1.07 × 10 −3 in 2018. As in the case of different hosts, mutation frequencies in single and mixed infections were similar to overall mutation frequency in all three years. The average mutation frequency (1.60 × 10 −3 ) in the C-terminal region of the CP gene was highest among the three regions of the gene, followed by the N-terminal (1.10 × 10 −3 ) and core region (1.01 × 10 −3 ).

Selection Pressure Analysis
In almost all aforementioned populations, the number of non-synonymous mutations exceeded synonymous mutations. However, dN/dS of value did not equal or exceed 1 in any of the populations, indicating negative (purifying) selection. The selection analysis using four independent tests (FUBAR, FEL, MEME, and SLAC) available in the Datamonkey server showed a number of codons undergoing negative (purifying) selection (Table 9)

Mutational Pattern
There were 376 substitution mutations among 370 clones of 101 PRSV-W isolates ( Figure 3). Transition (293, 77.93%) were three-fold higher than transversion (82, 22.07%). A substitution from Adenine (A) to Guanine (G) was most common (110, 29.26%), followed by its reverse substitution G-A (80, 21.28%). Another transition substitution, Thymine (T) to Cytosine (C) (60, 15.96%) and its reverse substitution C-T (39 (10.37%), were also frequent. All transversion had a frequency <8%, with substitution from A to T being most of the lot (27,7.18%), while substitution from C to G was null. There was one clone of PRSV-W from Caddo County collected in 2016 (CD-2) with a deletion of 24 nt from positions 57-80. Among those 376 substitutions, 218 were non-synonymous and had 77 combinations of aa substitutions. Twenty-six of those aa substitutions were only observed once, and twenty-four were only observed two times. The most common aa substitution among the CP gene of the PRSV-W population involved Lysine to Arginine, with a frequency of 14 out of 218 non-synonymous substitutions (6.34%) (Table 10). Similarly, a change of aa from Asparagine to Aspartic acid was observed 12 times. The aa changes from Arginine to Lysine (9 times), Alanine to Valine, Glutamic acid to Glycine, and Leucine to Proline (8 times) were also frequent. There were also two mutations leading to the stop codon. The top 5 aa involved in non-synonymous substitutions were Arginine, Lysine, Asparagine, Aspartic acid, and Glycine ( Figure 4). There were no mutations observed involving tryptophan. A total of 231 sites (nt positions) had at least one substitution. The number of non-synonymous sites was 145, and that of synonymous sites was 86. Substitution on 12 of these sites occurred >3 times, 20 sites occurred 3 times, 53 sites once, and twenty-four were only observed two times. The most common aa substitution among the CP gene of the PRSV-W population involved Lysine to Arginine, with a fre quency of 14 out of 218 non-synonymous substitutions (6.34%) (Table 10). Similarly, change of aa from Asparagine to Aspartic acid was observed 12 times. The aa change from Arginine to Lysine (9 times), Alanine to Valine, Glutamic acid to Glycine, and Leu cine to Proline (8 times) were also frequent. There were also two mutations leading to th stop codon. The top 5 aa involved in non-synonymous substitutions were Arginine, Ly sine, Asparagine, Aspartic acid, and Glycine ( Figure 4). There were no mutations observed involving tryptophan.      Table 10. List of the most frequent amino acid changes among 218 non-silent mutations observed in the coat protein gene sequences of PRSV-W isolates in this study.

Rank Amino acid change Frequency (%) 1
Lysine-Arginine 14 (6.34) 2 Asparagine-Aspartic acid 12 (5.43) 3 Arginine-Lysine 9 (4.07) 4 Alanine-Valine 8 (3.62) 4 Glutamic acid-Glycine 8 (3.62) 4 Leucine-Proline 8 (3.62) 7 Aspartic acid-Asparagine 7 (3.16) 7 Asparagine-Serine 7 (3.16)  The entire CP gene was subdivided into three regions: N-terminal, core, and C-terminal. The N-terminal region included 1-195nt (1-65aa), the core region included 196-654nt (66-218aa) and the C-terminal included 655-864nt (219-287aa). Out of 287 aa in the CP gene, 229 aa sites were conserved among all PRSV-W isolates obtained in this study from Oklahoma. The core region was highly conserved as nearly 78% of the sites were completely conserved (without any mutations), in addition to 8.5% of the sites with silent mutations (Table 11). Conversely, the N region had only 62.6% of the sites, which were conserved with a higher number of mutational hotspots in the region. The mean genetic diversity was also highest (0.019) in the N-region. Most of the frequent non-silent mutations were observed in either the N-terminal or C-terminal of the CP gene. The most conserved site within CP was at the core region, with 29 consecutively conserved aa se-  The entire CP gene was subdivided into three regions: N-terminal, core, and Cterminal. The N-terminal region included 1-195 nt (1-65aa), the core region included 196-654 nt (66-218aa) and the C-terminal included 655-864 nt (219-287aa). Out of 287 aa in the CP gene, 229 aa sites were conserved among all PRSV-W isolates obtained in this study from Oklahoma. The core region was highly conserved as nearly 78% of the sites were completely conserved (without any mutations), in addition to 8.5% of the sites with silent mutations (Table 11). Conversely, the N region had only 62.6% of the sites, which were conserved with a higher number of mutational hotspots in the region. The mean genetic diversity was also highest (0.019) in the N-region. Most of the frequent non-silent mutations were observed in either the N-terminal or C-terminal of the CP gene. The most conserved site within CP was at the core region, with 29 consecutively conserved aa sequences from positions 121 to 149 among the consensus sequences from this study.

Discussion
This study confirms our first hypothesis that there is a strong purifying selection within the PRSV-W populations. The selection is likely acting on removing deleterious mutations caused by error-prone replication. The high mutation frequency within all PRSV-W populations from Oklahoma confirms that the population exists as mutant clouds compatible with the quasispecies concept. The PRSV-W populations within this study were collected in a relatively quick time frame, which makes high mutant clouds the likely outcome [23]. Additionally, these mutation frequencies were remarkably consistent in all the populations (based on geography, hosts, collection years, and single/mixed infections) within the range of 10 −3 . Mutations are the major driving forces for monopartite RNA virus variation in addition to recombination. The error rate of RNA replication ranges from 10 −3 -10 −5 base per copying cycle giving rise to high diversity within populations due to large mutant clouds. The high mutation rates reflect an evolutionary strategy as these mutant clouds usually work in favor of viruses for adaptation during environmental stress [24,25]. The majority of the mutation events in this study were substitutions, and rarely any insertion/deletions (indels) were found. The indel mutations are usually rare but are lethal in most cases [26]. Due to the abundance of deleterious mutations, the level of dN is generally higher in closely related sequences compared to distantly related sequences [27], which explains slightly higher purifying selection in Oklahoma isolates compared to global isolates.
The replication rates of most RNA viruses are swift, so they are able to reach exceptionally large population sizes within a brief period of time [26]. However, this large population size is not, in fact, an effective population size, as a substantial part of this population consists of mutants that will not pass to the next generation [25]. The genetic bottleneck reduces the population size below a threshold level to facilitate the transmission of fittest variants, thereby limiting the size of the effective population [28]. The genetic bottleneck usually is the product of the biology of the vector and its feeding habit and can also occur at different moments of the viral life cycle, such as virus movement between plant cells during systemic infection and horizontal transmission [29][30][31][32]. In addition, the purifying selection helps viruses maintain genetic stability by eliminating less-fit mutants with deleterious effects [33]. The genetic variations due to mutation are also structured by gene flow [34]. The gene flow among different hosts, geographical regions, and different parts of the same plant helps in shaping the global genetic diversity [35]. The low level of long-distance movement or gene flow might be the reason behind the non-uniform and variable viral populations in this study. However, this low level of gene flow was enough to accommodate variants from different phylogroups occurring in the same geographical area.
Utmost care was taken to reduce the mutations during RT-PCR steps by employing a number of strategies: high-fidelity reverse transcriptase and Taq polymerase were used, the number of PCR cycles was limited to 25, and any mutations found in only one direction of Sanger sequencing were not considered. Despite these precautions, there is a chance that some of these mutations might be due to experimental error. Similar to the study conducted by Simmons et al. [36], we calculated the highest possible number of erroneous mutations due to RT-PCR. The total number of possible mutations due to RT was ∼9 (2.9 × 10 −5 mutations × 864 sites × 370 clones), and PCR was ∼7 (2.28 × 10 −5 mutations × 864 sites × 370 clones), which adds to a total of ∼16 mutations. Even if we deduct these (possible) artifact mutations (16) from the total number of mutations (376) observed in the study, the mutation frequency remains in a similar range. However, the actual erroneous mutations due to experimental error might be significantly less than these calculated values due to the aforementioned experimental considerations.
The PRSV-W isolates collected from the same county, host, and growing season were grouped in different phylogroups, while isolates collected from different counties were grouped in the same phylogroup. This rejects our second hypothesis that these factors play a role in phylogenetic clustering. For instance, PRSV-W isolates collected from Blaine County in a single growing season (2017) from the same host (pumpkin) grouped in three different phylogroups (Figure 1). Similarly, PRSV-W isolates from Blaine, Cimarron, and McCurtain counties collected in 2018 fell in two different phylogroups. This diversity is also well supported by the higher within-group mean evolutionary distance of PRSV-W isolates from these counties compared to others ( Table 2). Some of the isolates from geographically distant locations (Muskogee and Caddo counties) even had identical sequences. In addition, isolates from two far corners of Oklahoma with more than 900 kilometers of distance were grouped together closely in the same phylogroup (phylogroup 1), irrespective of their collection year and host. The close evolutionary distance between isolates from various locations might have caused the close evolutionary relationship. The lack of geographical connectivity in phylogeny among these isolates can be attributed to different possibilities. First, all these isolates from Oklahoma might have been derived from the same most recent common ancestors (MRCA). Second, the virus or virus harboring aphids likely travels with harvested plants and fruits to various parts of the state, thereby facilitating spread in new locations. In both cases, the virus population can use the wild host as their reservoir during times other than the growing seasons of their primary hosts [13]. In addition, none of the phylogroups within Oklahoma had distinct fixed mutations in terms of nt and aa, indicating the recent common ancestry of all these populations. The anomaly was the isolates from Tulsa, which had three distinct aa changes compared to other populations at positions 44 (Alanine-Threonine), 76 (Valine-Isoleucine), and 120 (Serine-Asparagine).
Similarly, other parameters considered in the study, viz. host and collection years, also did not have a significant effect on phylogeny. For instance, the genetic differentiation between different hosts and their diversity was not significant in all populations of PRSV, and none of the frequent mutations were observed in specific hosts. The PRSV-W virus isolates collected at different points of time (2008-2018) from the same location did not cluster according to the collection years. However, if the isolates collected in the same year fell in the same phylogroup, they tended to group together, indicating a loose association between collection time and their evolutionary fate. This is further bolstered by the fact that mutation frequencies of virus isolates collected from different hosts and in different collection years remained highly similar (Tables 8 and 9).
The clustering pattern of global isolates showed distinct geographical clustering, with Asian, American, and Oceania isolates falling in different phylogroups. The two European isolates from France and Poland were anomalous as they were grouped with two distinct groups. More isolates from Europe are needed to evaluate if all of them cluster together with either American and Oceanian isolates or form separate clusters among themselves. The grouping pattern in the phylogeny of PRSV-W in this study is similar to recent studies [37,38], which showed that isolates from different parts of the world grouped together with one phylogroup and a few Asian isolates in different phylogroup. The PRSV isolates from other parts of the US were close to Mexican and other American isolates, as observed previously [39,40]. The distinct geographical clustering of the PRSV population based on continents shows PRSV-W populations do not have recent travel history across the continents (North and South America are referred to here as 'Americas'). While the aforementioned (refer to the previous paragraph) factors explain geographical connectivity among virus populations in nearby locations, longer distance movement from these modes of transmission is unlikely. Similar to the clustering pattern in Oklahoma, the global isolates did not have distinct phylogenetic differentiation among hosts and collection years.
More than 99% of the mutations observed were substitution mutations. These substitution mutations were biased towards transitions with a high proportion (>75%) of purine to purine or pyrimidine to pyrimidine nt change ( Figure 4). The transition mutation biases are common in viral systems and were noted in several previous studies [41][42][43][44]. All the combinations of nt substitutions involving G and C had the mutations favoring change to these nt, thereby favoring gain of net GC content. This net gain of GC content was also observed in a study conducted by Nigam et al. [44]. In addition, the resulting aa changes from these GC-rich mutations mostly involved aa Arginine, Lysine, Asparagine, Aspartic acid, Glycine, and Alanine with either loss or gain of the net charge. Interestingly, all these aa are also disorder-promoting [45]. Conversely, the order-promoting aa, such as Tryptophan and Cysteine, were rarely involved in substitution mutations.
Mutation frequency within the N-terminal region of CP was highest, followed by the C-terminal region and core region. Although mutations were frequent in core regions, they were disproportionately silent and included less positively selected sites in comparison to N-and C-terminal regions. This further consolidates the evidence of a highly conserved core region. All but seven isolates from McCurtain County, which were collected in 2018, had a DAG motif at amino acid positions from 7 to 9 in this study. These seven isolates have Threonine instead of Alanine. The Alanine to Threonine mutation was also observed in three other global isolates of PRSV; one from the USA and two from Taiwan. In addition, few isolates had NAG, DSG, and DSA instead of DAG motif. The DAG motif, highly conserved among Potyvirus CP genes, has a significant role in virus transmission by aphids [46] and is exposed on the viral surface [47]. However, the mutation in this region has been reported in many studies with efficient aphid transmission [48][49][50][51][52][53]. The other two motifs, PTK and KITC, present in another Potyvirus gene, Hc-Pro, and their interaction with the CP gene also have a vital role in viral transmission by aphids [54,55], and these motifs could facilitate aphid transmission in the absence of the DAG motif [49]. A number of conserved motifs described in Potyvirus CP previously were present in PRSV-W populations in this study with minor or no mutations. In addition, there were more highly conserved motifs in the core region of CP, which are specific to PRSV (Table 12). These conserved motifs were present in PRSV populations regardless of the biotype and might carry some evolutionary role. Further study is desired to decode the evolutionary messages conveyed by these motifs. The presence/absence of these motifs nevertheless could be useful in diagnostic tools such as primer design.  [56,59] Recombination analysis of all 101 PRSV-W isolates sequenced in this study, as well as all 165 Oklahoman PRSV-W isolates (101 isolates from this study and 64 isolates from the previous study), was conducted using RDP software. Only two recombination events were detected by two algorithms in 101 PRSV-W isolates and were not significant (data not shown). These results indicate that recombination events are not frequent in the CP-gene-coding region.
To our knowledge, this is the first study on mutational analysis within the quasispecies population of PRSV-W isolates. The present study provides a broad analysis of a wide range of PRSV-W populations isolated from diverse geographical locations of the state, host, and collection time based on the various aspects of evolution such as mutation, genetic differentiation, and phylogeny. The insights provided by this study will enhance existing knowledge of PRSV-W evolution and epidemiology and will be helpful in developing viable management strategies. Specifically, the high diversity of PRSV populations in different geographical locations and the possibility of multiple viral introductions in the same geographical location demand careful consideration towards accommodating different genetic aspects of the virus in multiple locations while developing sustainable control strategies.

Sample Collection and Detection of PRSV-W
Surveys were conducted during the three growing seasons of 2016-2018 in multiple counties of Oklahoma from different cucurbit hosts, and dot immunobinding assay (DIBA) was performed against 10 different viruses [13]. Confirmation of PRSV-W was completed by RT-PCR using the primers specific for the CP gene of PRSV-W (PRSVCPF; 5 CTGATGATTATCAACTTGTT3 , PRSVCPR; 5 TAAGGTGAAACAGGGTGGAG3 ) as described previously with minor modifications [13]. Total RNA was extracted by the Tri-reagent (Molecular Research Centre Inc, USA) method using 100 mg of infected plant tissue [17,63,64]. PRSV-W positive isolates were also tested with WMV-and ZYMV-specific primers as described previously [13]. High-fidelity DNA polymerase enzyme (Pfu, Stratagene) was used along with Taq polymerase, and the number of PCR cycles was limited to 25 to reduce potential mutations generated during RT-PCR.

Cloning and Sequencing
The purified PCR products from each isolate were ligated in the pGEM-T Easy Vector (Promega Corp, Madison, WI, USA). The ligated products were transformed into Escherichia coli DH5α competent cells (New England Biolabs, Ipswich, MA, USA) and were subjected to blue-white screening using Luria-Bertani agar (LBA), carbenicillin, isopropyl-thiogalactopyranoside (IPTG), and X-gal. Three to five clones from each isolate were used for Sanger sequencing in both directions using applied biosystems 3130 at the Department of Biological Science, the University of Tulsa, Oklahoma [65]. The details of virus isolates, their host, geographical location, and collection year are depicted in Supplementary Table S1.

Sequence Analysis
The purified PCR products were sequenced in both forward and reverse directions using Sanger sequencing method. Sequences were retrieved, analyzed in Finch TV, and compared with GenBank sequences using the basic local alignment search tool (BLAST). Sequence alignment was completed for the clones using the Clustal X program and ME-GAlign™ incorporated within the DNASTAR suite of programs (Madison, WI, USA). For CP sequences, consensus nt sequences for each isolate were obtained using the Editseq™. The consensus sequences of each virus isolate were deposited in the NCBI database, and their GenBank accession numbers are listed in Supplementary Table S1.

Phylogenetic Analysis
Consensus sequences of each isolate for CP gene of PRSV-W were used for phylogenetic analysis. The MEGA7 [66] software was used to construct four sets of ML phylogenetic trees: first, comparing PRSV-W isolates within this study (n = 101); second, comparing PRSV-W isolates from Oklahoma (101 isolates from this study and 64 selected isolates from GenBank); third, PRSV-W isolates from around the world (33 selected isolates from this study and 73 isolates from GenBank); and fourth, comparing all PRSV isolates irrespective of W or P strains (33 selected isolates from this study and 115 isolates from GenBank). The CP gene sequences of PRSV-W and PRSV-P retrieved from GenBank for phylogenetic analysis are listed in Supplementary Table S2. The best fit model selection test was completed in MEGA7, and the model with the lowest BIC value was selected for each phylogenetic tree reconstruction. For evaluation of statistical confidence in tree nodes, 1000 bootstrapping was completed. Each of these trees was visualized in Figtree version 1.4.3.

Genetic Diversity
Genetic diversity of all CP sequences of PRSV-W isolates was determined based on host, geographical location, year of collection, and their respective phylogroups within Oklahoma using the Kimura 2 parameter in MEGA7. For each of the aforementioned categories, the number of haplotypes, haplotype diversity, number of segregating sites, average number of nt differences, and average nt diversity was calculated using DNASP6. These analyses were also completed separately for PRSV-W isolates and PRSV isolates (including both strains) from around the world.

Gene Flow and Population Differentiation
The extents of genetic flow and differentiation were estimated using the fixation index (Fst) and the number of migrants successfully incoming per generation (Nm) values. The Fst value ranges from 0, indicating no genetic differentiation, to 1, indicating clear differentiation. The absolute Fst value of >0.33 indicates infrequent gene flow between the populations. Nm < 1 indicates reduced gene flow and increased genetic drift, which results in local population differentiation [67].
Similarly, the population differentiation was analyzed using permutation-based statistical tests Ks, Ks *, Z *, and Snn [67]. These tests are considered the most powerful statistical tests for analyzing sequence-based genetic differentiation among the highly mutating population in small sample sizes [68]. The Ks * was calculated as the average number of differences between sequences regardless of geographical origin. Under the null hypothesis, Kst * is expected to be near zero, meaning there is no genetic differentiation. The Z * statistic is a logarithmic variant of the rank statistic (Z), and smaller values indicate less genetic differentiation, and higher values with significant P values indicate higher genetic differentiation. The nearest neighbor statistic (Snn) measures the frequency of nearest neighbor sequences in the same locality. The value of Snn ranges from 1 2 in the case of panmixia to 1 when populations are distinctly differentiated.

Neutrality Tests
The values of segregating sites (S), the average number of nt differences (K), and a total number of mutations were used for testing the neutrality hypothesis using Tajima's, Fu, and Li's DandF among different population groups. Tajima's D test is based on the differences between the two estimators, Tajima's estimator (based on K) and Watterson's estimator (based on S) [68]. The positive value on Tajima's D statistic means an abundance of polymorphic alleles, while negative values indicate the presence of rare alleles. Fu and Li's D test is based on the difference between singleton mutation sites and total mutation sites, and Fu and Li's F test is based on a difference in singleton mutation sites and K [69]. The negative values for these tests mean a low-frequency polymorphism [70].

Selection Analysis
Hyphy and Datamonkey packages were used for the isolate-selection analysis of different PRSV and PRSV-W populations. The dN/dS value was determined by the HyPhy packages included in MEGA7. The dN/dS value <1 shows evidence of the purifying selection, dN/dS=1 indicates neutral selection, and dN/dS >1 indicates positive selection. Similarly, FUBAR, FEL, MEME, and SLAC programs incorporated in Datamonkey (https: //www.datamonkey.org, accessed on 5 March 2021 were used to deduce the numbers of positively and negatively selected codons.

Mutation Frequency and Pattern within the CP Gene
The total mutation count in the CP gene was completed for each isolate. The consensus nt sequence was inferred from 3-5 clones/isolate, and all mutations (substitution or indels) observed in those clones were counted. The total number of mutations within the different populations was determined, and mutation frequency was calculated using the following formula: Mutation frequency = total number of mutations in n isolates with m clones/total number of nt in n × m Whenever multiple nt mutations occurred consecutively, each individually mutated base was counted [43,71]. Any mutations that were only observed in either the forward or reverse direction were not considered as true mutations to reduce the possibility of artifact mutations. Mutation frequencies were determined for each isolate, location (county of collection), host, year of collection, and type of infection (single or mixed). For each of these categories, synonymous and non-synonymous mutations were also determined.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/pathogens10101278/s1, Figure S1: Geographical locations of different counties of Oklahoma; Figure S2: Maximum likelihood (ML) phylogenetic tree constructed in MEGA 7 using general time reversible (GTR) model for coat protein gene of 165 PRSV-W isolates from Oklahoma (101 from this study and 64 from GenBank; Figure S3: Maximum likelihood (ML) phylogenetic tree constructed in MEGA 7 using general time reversible (GTR) model for coat protein gene of 148 PRSV isolates (33 from this study and 115 from GenBank). Table S1: List of papaya ringspot virus-W isolates collected and sequenced the coat protein gene in this study; Table S2: Nucleotide sequences of the coat protein gene of papaya ringspot virus isolates (W and P) retrieved from GenBank. Table S3: Estimates of evolutionary divergence among the coat protein gene sequences of papaya ringspot virus-W isolates collected from different counties of Oklahoma, hosts, collection years and phylogroups; Table S4: Estimates of evolutionary divergence among the coat protein gene sequences of papaya ringspot virus isolates (both W and P strains) different phylogroups; Table S5: Genetic differentiation estimates in the coat protein gene sequences of papaya ringspot virus-W isolates from different counties, hosts, phylogroups and collection years; Table S6: Neutrality tests of coat protein gene sequences among the PRSV-W isolates from different counties, hosts, phylogroups and collection years; Table S7: Gene flow and genetic differentiation estimates based on the coat protein gene sequences of papaya ringspot virus -W isolates between different phylogroups from around the world; Table S8: Neutrality tests of coat protein gene sequences of papaya ringspot virus isolates among different population groups.   (Supplementary Table S1) were submitted to NCBI database. The accession numbers from (MZ099456-MZ099556) can be found online https://www.ncbi.nlm.nih.gov/genbank/ (accessed on 1 September 2021).