The Phylogeography of Potato Virus X Shows the Fingerprints of Its Human Vector

Potato virus X (PVX) occurs worldwide and causes an important potato disease. Complete PVX genomes were obtained from 326 new isolates from Peru, which is within the potato crop′s main domestication center, 10 from historical PVX isolates from the Andes (Bolivia, Peru) or Europe (UK), and three from Africa (Burundi). Concatenated open reading frames (ORFs) from these genomes plus 49 published genomic sequences were analyzed. Only 18 of them were recombinants, 17 of them Peruvian. A phylogeny of the non-recombinant sequences found two major (I, II) and five minor (I-1, I-2, II-1, II-2, II-3) phylogroups, which included 12 statistically supported clusters. Analysis of 488 coat protein (CP) gene sequences, including 128 published previously, gave a completely congruent phylogeny. Among the minor phylogroups, I-2 and II-3 only contained Andean isolates, I-1 and II-2 were of both Andean and other isolates, but all of the three II-1 isolates were European. I-1, I-2, II-1 and II-2 all contained biologically typed isolates. Population genetic and dating analyses indicated that PVX emerged after potato’s domestication 9000 years ago and was transported to Europe after the 15th century. Major clusters A–D probably resulted from expansions that occurred soon after the potato late-blight pandemic of the mid-19th century. Genetic comparisons of the PVX populations of different Peruvian Departments found similarities between those linked by local transport of seed potato tubers for summer rain-watered highland crops, and those linked to winter-irrigated crops in nearby coastal Departments. Comparisons also showed that, although the Andean PVX population was diverse and evolving neutrally, its spread to Europe and then elsewhere involved population expansion. PVX forms a basal Potexvirus genus lineage but its immediate progenitor is unknown. Establishing whether PVX′s entirely Andean phylogroups I-2 and II-3 and its Andean recombinants threaten potato production elsewhere requires future biological studies.


Introduction
Potato virus X (PVX, genus Potexvirus, family Alphaflexiviridae) [1,2] was one of the first potato (Solanum tuberosum) viruses described [3][4][5][6]. It is one of over 50 viruses now groups 2 and 4 were present in II-1 and II-2. Therefore, as strain group 4 isolates were in both major phylogroups, no direct correlation existed between phylogenetic placement and biological strain groups. When Kutnjak et al. [43] compared the complete genomes of nine Peruvian PVX isolates with those of 20 complete PVX genomes, phylogenetic analysis revealed that two were within minor phylogroup II-2 together with previously reported Andean isolates. However, six others were all in an entirely new Andean minor phylogroup they called II-3, and one was in major phylogroup I. They found no evidence of PVX recombination events. Phylogenetic analysis of all available CP sequences placed two other Andean isolates from Colombia within II-3, suggesting it might be widespread in the Andean region. Subsequently, phylogenetic analysis using CP sequences of three further Colombian isolates placed these in major phylogroup I [44,45].
About 9000 years ago, potato was domesticated from its wild potato ancestors in the Altiplano regions of Peru and Bolivia in the South American Andes mountains [46]. After the 1542 arrival of Europeans to the Americas, potato land races (=native potato cultivars) were taken to Europe during the Columbian Exchange of animals and plants between the Americas and Eurasia [47] and introduced from there to other continents [48]. During early studies in which collections of Andean potato land races and wild potato species were tested for presence of common potato viruses, PVX was often the most frequently detected [49][50][51][52][53][54]. Widespread occurrence of PVX was also revealed by studies in Peru of Andean potato germplasm collections and leaf samples collected from land races and locally bred modern potato cultivars growing in the field [30,31,55]. This common occurrence of PVX was accompanied by presence of PVX resistance genes Nx, Nb and Rx in potato land races and wild potato species [26,27,32,56,57].
Recently, we have reported the properties of genomic sequences of isolates of three common potato viruses (PVA, PVS and PVY), obtained from potato land races or locally bred modern potato cultivars growing in the Peruvian Andean potato domestication center, compared them with isolates from other world regions, and made deductions concerning their evolution [58][59][60]. In this paper, we report a similar study of PVX. The results of these analyses provide new information on the phylogenetics and population genetics of PVX. They also greatly enhance our understanding of the origins and spread of this virus by humankind.

Virus Isolates
The 11 historical isolates, all but one of which (DX) were sequenced, were collected between 1940 and 1985 and came from Peru (A, CP, DP, E), Bolivia (HB) and the UK (B, DX, EX), or were strain group 4 isolates derived from three of them (CP4, DX4, EX4) (Table 1a). Three Peruvian isolates (A, DP, E) were kept in desiccated leaf tissue over silica gel at 4 • C at the National Agrarian University, La Molina, Lima, Peru, before being sent to the UK for sequencing in 2018. All other historical isolates were maintained in a collection of historic freeze-dried virus isolates kept at FERA Science Ltd., York, UK. In earlier studies, all these isolates had been inoculated to potato cultivar differentials to establish which strain groups they belonged to [24,28,30,31,35,36].  Table S1 show which Department each came from, the year it was isolated and the total number of samples and isolates sequenced. A simplified searchable spread-sheet version (SM File S1) provides more detail of the provenance of each isolate and, where available, the potato cultivar from which it was isolated, and, for each Peruvian isolate, the number of the site from which it was collected, as shown in Figure 1. Each isolate name starts with a three-letter mnemonic of the Department where it was collected.  Table S1 show which Department each came from, the year it was isolated and the total number of samples and isolates sequenced. A simplified searchable spread-sheet version (SM File S1) provides more detail of the provenance of each isolate and, where available, the potato cultivar from which it was isolated, and, for each Peruvian isolate, the number of the site from which it was collected, as shown in Figure 1. Each isolate name starts with a three-letter mnemonic of the Department where it was collected. Peru's Andean highlands are shown as brown, the country′s coastal desert and Amazonian jungle regions as green and surrounding countries as white. The red dots marked on the main map represent the locations sampled, and the names marked on it are those of the countries' regional departments sampled (black lines are departmental boundaries). The red dots marked on the individual department maps clustered on either side show each collection site, and the numbers indicate each individual infected sample collected. Individual collection sites are numbered (Supplementary Materials (SM) Peru's Andean highlands are shown as brown, the country s coastal desert and Amazonian jungle regions as green and surrounding countries as white. The red dots marked on the main map represent the locations sampled, and the names marked on it are those of the countries' regional departments sampled (black lines are departmental boundaries). The red dots marked on the individual department maps clustered on either side show each collection site, and the numbers indicate each individual infected sample collected. Individual collection sites are numbered (Supplementary Materials (SM) File S1, column C). The names of the Departments also provide the first three letters of each isolate name (SM File S1, column F). The 49 PVX sequences already available in GenBank were downloaded in July 2020.

High-Throughput Sequencing
In the UK, samples of freeze-dried PVX-infected leaf material containing one each of 11 isolates (A, CP, CP4, DP, E, HB, B, DX, DX4, EX, EX4) were subjected to high-throughput sequencing (HTS) in 2016-2019 (Table 1a). Total RNA was extracted from each sample using the Total RNA kit (Qiagen, UK), including the optional DNase treatment. An indexed sequencing library was produced from the total RNA using a Scriptseq complete plant leaf kit (Illumina, USA) and sequenced on a MiSeq instrument (Illumina), using a 600 cycle V3 kit. The methods followed are described in more detail by Fox et al. [61]. Ten isolates provided a complete PVX ORF. No other virus sequences were associated with these complete ORFs and no sequence of isolate DX was obtained. The new genomic sequences with complete ORFs were mostly c. 6435 nts long. Their final sequences were submitted to GenBank with Accession Codes MT708134-MT708143 (Table 1a and SM File S1).
In Peru, total RNA was extracted from each potato leaf sample (Table 1b) using trizol, as instructed by the manufacturer. The large RNA fraction was precipitated by adding an equal volume of 4M LiCl at~4 • C (on ice) overnight, followed by centrifugation. The remaining small RNA fraction was subsequently precipitated by adding one volume of isopropanol followed by centrifugation. Small RNAs were separated on 3.5% agarose gels and bands corresponding to~20-25 nts excised and purified using quantum prep freeze and squeeze columns (BioRad). Small RNA libraries were prepared using the protocol of Chen et al. [62] and sent for sequencing on a HiSeq4000 by a commercial provider (Fasteris Life Sciences SA, Switzerland). Small RNA sequences were analyzed using VirusDetect v1.6 [63] to identify all viruses infecting the plants, and samples in which PVX was detected were selected for further analysis. Using the Geneious R11.1.3 software package (https://www.geneious.com; accessed on 1 May 2019), the PVX contigs produced by VirusDetect were extracted for each positive sample and a consensus was generated. The small RNAs were mapped back to the consensus to confirm the quality of the assemblies and make any corrections as necessary. Their final sequences are recorded in GenBank and have Acc Codes MT752611-MT752936 (SM Table S1 and SM File S1). The three Burundi leaf samples were desiccated in silica gel, similar to the Peruvian samples which were sent under license to Peru where they were sequenced with the Peruvian samples. Their final sequences were submitted to GenBank with Accession Codes MT520804-MT520806 (SM File S1). All the new genomic sequences with complete ORFs were c. 6450 nts long.

Sequence Analysis
Genomic sequences were edited using BioEdit [64] to extract their five gene regions (replicase, gp2 (25K), gp3 (12K), gp4 (8K) and gp5 (CP)). The sequences of each gene region were aligned using the encoded aa s as a guide, by the TranslatorX online server [65] (http://translatorx.co.uk; accessed on 1 June 2019) with its Multiple Alignment using Fast Fourier Transform (MAFFT) option [66]. The alignments were appended sequentially to form an alignment of concatenates with all genes in the same reading frame. A separate CP alignment was made from the new CP genes after 45 near-duplicate Peruvian sequences had been removed for computing convenience, and all of the PVX CP genes downloaded from GenBank.
Phylogenetic trees were calculated using the neighbor joining (NJ) option in ClustalX [81], and/or in Phylogenetic Maximum Likelihood (PhyML) 3.0 for ML [82]. In PhyML, the statistical support for their topologies was assessed using the Shimodaira and Hasegawa (SH) method [83]. Trees were drawn using Figtree Version 1.3 (http://tree.bio.ed.ac. uk/software/figtree/; accessed on 12 May 2018) and a commercial graphics package. PATRISTIC [84] was used to check for mutational saturation by comparing the patristic distances of the nt phylogenies with those of the aa s they encoded and confirmed by the method of Xia [85]. The BlastN and BlastP online facilities of GenBank [86] were used to search for potexvirus sequences with which to compare, and also to root, the PVX phylogenies.
The program DnaSP v.6.10.01 [87] was used to analyze genetic differences between selected populations of sequences. We used it to estimate average pairwise nt diversity (π), number of synonymous sites (SS), number of non-synonymous sites (NS), mean synonymous substitutions per synonymous site (dS), mean non-synonymous substitutions per non-synonymous site (dN) and ratio of non-synonymous nt diversity to synonymous nt diversity (dN/dS). It was concluded that genes were under positive, neutral or negative selection when their dN/dS ratios were >1, =1 and <1, respectively. Tajima s D statistical test was used to identify non-random evolutionary events such as population expansion, bottlenecks and selection by comparing the estimated number of segregating sites with the mean pairwise difference among sequences [88]. DnaSP v.6.10.01 was also used to assess the extent of genetic differentiation of PVX populations, measured as the amount of gene flow between them. This was done using the coefficient of genetic differentiation F ST (=the inter-populational component of genetic variation or the standardized variance in allele frequencies across populations) [89] and the gene flow parameter Nm (the product of the effective population number and rate of migration among populations) [90].
The TempEst program [91] was used to check for the presence of a linear temporal signal in all the dated sequences, and all those in Cluster B. The 'Least Squares Dating' (LSD) method Version lsd-0.3beta of To et al. [92] was used to estimate the TMRCAs (Time to the Most Recent Common Ancestor) of Cluster B. The statistical significance of correlation coefficients was calculated using the Social Science Statistics online site (https://www. socscistatistics.com/pvalues/pearsondistribution.aspx; accessed on 3 August 2020). Some alignments were separated into three sub-alignments using NSplitter (https://github.com/ HarryGibbs/NSplitter; accessed on 3 August 2020): one was of all the codon positions that had only changed synonymously, another was of codons that included at least one non-synonymous change and the third was of codons that had not changed.

Sequence Alignments
The 388 genomic sequences (339 new and 49 downloaded from GenBank) were edited and converted as described above to an alignment of concats 6357 nts long. A separate alignment of 488 CP genes was made from the CP genes of the new sequences after 45 nearduplicates were removed as described above and 128 CP sequences from GenBank were added. Three quarters of the CP sequences were 711 nts long, but 119 of the Peruvian sequences were 720 nts long, and three from the UK (the EX-2, B and EX sequences; GU384737, GU384738 and X88782) were 744 nts long with all the inserted codons being situated around 16 codons from their N-termini.

Recombination Analyses
When the concat sequences were checked for phylogenetic anomalies using Recombination Detection Program No. 4 (RDP4), 18 of the sequences, 17 of them from Peru, were found to have recombinant (rec) regions ( Table 2). The rec sequence not from Peru was HQ450387 from the USA. Peruvian rec sequence M31541 had an Argentinian major parent (X55802) and an unknown minor parent. The 18 rec sequences were removed from the alignment used for phylogenetic and population genetic analysis because rec sequences distort the results of most algorithms used for reconstructing phylogenies. The CP genes were also checked by RDP4, but no additional rec sequences were found. Thus, a significantly smaller proportion of the PVX population was recombinant compared with, for example, the population of PVY, where around 41% of isolates were recombinant [59].

Phylogroups
A ML phylogeny ( Figure 2) was generated from the non-rec concats using PhyML [82]. The topology of the phylogeny was the same as that reported for PVX by Cox and Jones [42] and Kutnjak et al. [43], who used 85 CP sequences and 29 complete genomes respectively, and two different methods of tree building: ML and NJ. Their phylogenies had a basal divergence, which produced two major phylogroups (I and II) that separated into five minor phylogroups, and, in conformity with the earlier reports, we call these I-1, I-2, II-1, II-2 and II-3. All phylogroup 1 concat isolates found previously were placed in minor phylogroup I-1, and I-2 only comprised three newly sequenced historical Peruvian isolates. We also grouped the distal parts of the phylogeny into 12 statistically supported clusters, A-L ( Figure 2), with 20 singletons. The sequences in each of the clusters are recorded in SM File S2 with the details of each isolate in SM File S1. The topology of the ML phylogeny of CP sequences was closely similar to that of the concats (data not shown). However, it had less well-defined clusters and less statistical support: 19.9% of the nodes of the concat tree were fully supported (SH = 1.0, [83]), whereas only 0.6% of the CP tree nodes were fully supported, and similarly, 22.1% and 14.8% respectively, of the other nodes had SH support values of 0.90-99. Nonetheless, the CP data adds detail to the distribution of Andean and non-Andean isolates in the PVX phylogeny and shows that whereas only Andean isolates form phylogroups I-2 (3 sequences) and II-3 (213 sequences), all the other phylogroups contain both Andean and non-Andean sequences, and, likewise, in phylogroup I-1, cluster D and the singletons are exclusively Peruvian, whereas clusters A, B and C include both Andean and non-Andean sequences. This distribution indicates that PVX originated in or near Peru and spread from there to the remainder of the world.
Among the historical PVX isolates belonging to known biological strain groups, the phylogenetic placement of the new complete genomic sequences (Table 1a) was as follows: isolates B (MT708140) from Scotland, and EX (MT708138) and EX4 (MT708137) from England, which belong to strain groups 2 or 4, were in minor phylogroup II-1, isolates HB (MT708134) from Bolivia, and both CP (MT708142) and CP4 (MT708141) from Peru, which belong to strain groups 2 or 4, were in minor phylogroup II-2, isolates A (MT708136), DP (MT708143) and E (MT708135) from Peru, which belong to strain groups 1 or 3, were in minor phylogroup I-2, and isolate DX4 (MT708139) from England, which belongs to strain group 4, was in minor phylogroup I-1. nodes had SH support values of 0.90-99. Nonetheless, the CP data adds detail to the distribution of Andean and non-Andean isolates in the PVX phylogeny and shows that whereas only Andean isolates form phylogroups I-2 (3 sequences) and II-3 (213 sequences), all the other phylogroups contain both Andean and non-Andean sequences, and, likewise, in phylogroup I-1, cluster D and the singletons are exclusively Peruvian, whereas clusters A, B and C include both Andean and non-Andean sequences. This distribution indicates that PVX originated in or near Peru and spread from there to the remainder of the world. The cluster B concats were 76, half from Peru, three from Colombia and the remainder from other continents: one of its two largest subclusters was of 29 concats only from The PVX sequences were separated into five concat minor phylogroups ( Figure 2) and analyzed using DnaSP 6. The complete concat sequences within each of them were analyzed, as were their five individual genes (SM Table S2): the I-2 and II-1 minor phylogroups were of only three sequences each, whereas the I-1, II-2 and II-3 minor phylogroups were represented by 203, 37 and 123 sequences, respectively. The genetic diversity estimates (π) confirmed that minor phylogroups I-2 and II-3, which contained only Andean isolates, are more genetically diverse than the other PVX minor phylogroups, as were most of their individual genes, except TGB3. The RdRp and the TGB3 genes, the largest (69% of the concat) and smallest (3.3%) respectively, are the most variable PVX genes. Furthermore, as for most virus genomes, the dN/dS ratios of the concats and all genes are less than one (mean 0.081), indicating that they have been under strong purifying (negative) selection. The smallest dN/dS ratio (mean 0.070) was that of the CP gene, perhaps because of its many functions: the activation of PVX RNA translation [93], the transport of infection [94] and viral genome RNA encapsidation [95].
Tajima s D statistical test [88] distinguishes which gene sequences have been evolving randomly ('neutrally') from those that have been evolving under non-random processes, such as selection, demographic expansion or contraction. This test returns a negative value when there are more polymorphisms in the population than expected under neutral processes and calculates the probability that the result is significant. We applied this test to the concat and individual genes of the three best-represented minor phylogroups and found that the concats and most of the individual gene sequences of I-1 and II-2, but not II-3, returned significant negative values (SM Table S2). This difference therefore correlates with the fact that I-1 and II-2, but not II-3, included isolates from outside the Andes, and indicates that the non-Andean populations of PVX were established by population expansion of migrants from the Andean population. The Tajima s D difference between Andean and non-Andean concats is consistent throughout their length in all genes, as was confirmed using the sliding window function of DnaSP 6 ( Figure 3). The details of the phylogeny of Cluster B (SM Figure S1), however, indicate that the spread of PVX from its Andean population to other parts of the world was not a simple single-direction migration and divergence. Several distinct subclusters within Cluster B include isolates from both the Andes and elsewhere (SM Figure S1; green and red Accession Codes, respectively). If, for example, one applies cladistic reasoning to sub-cluster B1 of SM Figure S1, it has a clear Peruvian origin as its basal divergences all involve Peruvian isolates, whereas, by the same reasoning, sub-cluster B2 clearly originated from outside the Andes but has recently spread to Peru. Thus, it seems that although PVX has mostly spread from the Andes to other parts of the world, it may have done so on more than one occasion, and there has also been some complex 'repatriations'.

World Populations of PVX
DnaSP 6 was used to assess the genetic linkage or 'gene flow' between PVX populations from different regions of the world. This was measured using FST and the 'gene flow parameter' (Nm), which indicate maximal linkage when FST is the smallest positive value, and Nm the largest positive value [96,97]. It can be seen (Table 3a) that the Andean PVX population (n = 346) is linked most closely with the European population (n = 17) by both the FST metric (0.073, the smallest), and by the Nm metric (3.19, the largest). Also, although similar values of the two measures were obtained for the Africa:Asia comparisons, these are less reliable as there were only six sequences in the Africa group. The comparisons of The details of the phylogeny of Cluster B (SM Figure S1), however, indicate that the spread of PVX from its Andean population to other parts of the world was not a simple single-direction migration and divergence. Several distinct subclusters within Cluster B include isolates from both the Andes and elsewhere (SM Figure S1; green and red Accession Codes, respectively). If, for example, one applies cladistic reasoning to sub-cluster B1 of SM Figure S1, it has a clear Peruvian origin as its basal divergences all involve Peruvian isolates, whereas, by the same reasoning, sub-cluster B2 clearly originated from outside the Andes but has recently spread to Peru. Thus, it seems that although PVX has mostly spread from the Andes to other parts of the world, it may have done so on more than one occasion, and there has also been some complex 'repatriations'.

World Populations of PVX
DnaSP 6 was used to assess the genetic linkage or 'gene flow' between PVX populations from different regions of the world. This was measured using F ST and the 'gene flow parameter' (Nm), which indicate maximal linkage when F ST is the smallest positive value, and Nm the largest positive value [96,97]. It can be seen (Table 3a) that the Andean PVX population (n = 346) is linked most closely with the European population (n = 17) by both the F ST metric (0.073, the smallest), and by the Nm metric (3.19, the largest). Also, although similar values of the two measures were obtained for the Africa:Asia comparisons, these are less reliable as there were only six sequences in the Africa group. The comparisons of the only three North American concats were omitted as they gave negative metrics.  F ST values were also calculated from the alignment of CP sequences grouped into 'continent' populations (SM File S1). The results (Table 3b) show that the PVX populations of each of the 'continents' are primarily linked with that of West Eurasia (Europe plus Russia and Middle East): the smallest F ST for comparisons involving East Asia (China, Korea and Japan) is West Eurasia (0.041), for one involving the Indian Subcontinent (India/Pakistan/Bangladesh) is also West Eurasia (0.069), and likewise for the Andean region (0.205). Thus, a combination of Tajima s D, F ST and Nm analyses of PVX gene populations indicate that PVX most likely spread first from the Andes to Europe/Russia/Middle East and from there, separately, to East Asia and the Indian subcontinent.

PVX Populations of Peru
The populations of PVX isolated from different Departments of Peru ( Figure 1) were compared. The Peruvian population is apparently "well mixed" [98] as there was no correlation between the phylogenetic clusters and the Peruvian Departments from which samples were collected. No cluster, however small, had sequences from a single site, the more sequences in a phylogenetic cluster the greater the number of Departments in which it was found (Figure 4). The smallest clusters were found in two to four Departments that were not necessarily adjacent, and the largest cluster of isolates (G) was found in all nine Departments, and included 14 sequences from Cajamarca in North Peru and four from Puno in South Peru at the border with Bolivia, where the northern portion of Lake Titicaca is situated (note: the distance between Cajamarca and Lake Titicaca is more than 1500 km). Viruses 2021, 13, x FOR PEER REVIEW 13 of 24       Table 2 and Figure 2.   Table 2 and Figure 2. As there were no Peruvian isolates within cluster E, it is not included in this Figure. Although only 18 rec concats (4.9% of the 388) were found ( Table 2), 17 of these were Peruvian. One of these had a major Argentinian parent. The 16 with entirely Peruvian parents support the conclusions of the F ST comparisons of the PVX populations from different Departments. First, they confirm that the Peruvian PVX population is geographically "well mixed" as the number of rec sequences found in each Department was broadly related to the total number of isolates collected from that Department (Figure 6), with the exception of the Ica population, which had twice as many rec sequences as any other Department yet was of average size. The distribution of the likely 'parental' isolates identified by the RDP4 analysis was not obviously related to their phylogeny or sampling density. Table 2 also shows that only four of the rec sequences were isolated from the same Department. This is because both of their 'parental' isolates involved just three rec sequences and parents from Junin and one from Ica, and only two more were from the same Department as the major 'parent' (Ica and Puno). In addition, the other nine were isolated from plants collected from a Department that did not provide either 'parent'. Most 'parental' isolates, both major and minor, were from Junin (ten and eight, respectively) or from Ica (three and five, respectively), and only four from other Departments. Thus, the provenance and parentage of the rec sequences again supports the conclusion that there has been much movement of PVX between the coastal Departments and the mountain Departments that supply most of the seed potatoes for their winter-irrigated crops, namely between Lima and Ica, and between Junin and Huancavelica.
rental' isolates, both major and minor, were from Junin (ten and eight, respect from Ica (three and five, respectively), and only four from other Departments. T provenance and parentage of the rec sequences again supports the conclusion th has been much movement of PVX between the coastal Departments and the m Departments that supply most of the seed potatoes for their winter-irrigate namely between Lima and Ica, and between Junin and Huancavelica.

Dating
The 'collection date' of the 370 non-rec PVX isolates supplying the concats i (SM File S1). Therefore, it was possible to test their phylogeny for a linear tempor by the TempEst method-None was found. They gave a 'x-intercept' (i.e., TMRC to Most Recent Common Ancestor) of 3318.16 CE (Common Era) with a correlat ficient of −0.136, which is, of course, nonsense. Therefore, the B cluster was test rately by TempEst and LSD as it is of 76 concats with known collection dates, thou

Dating
The 'collection date' of the 370 non-rec PVX isolates supplying the concats is known (SM File S1). Therefore, it was possible to test their phylogeny for a linear temporal signal by the TempEst method-None was found. They gave a 'x-intercept' (i.e., TMRCA, Time to Most Recent Common Ancestor) of 3318.16 CE (Common Era) with a correlation coefficient of −0.136, which is, of course, nonsense. Therefore, the B cluster was tested separately by TempEst and LSD as it is of 76 concats with known collection dates, though some of these are uncertain as only their GenBank submission dates, not their collection dates, are recorded in GenBank and research papers. For these, we used their submission dates minus one year. The TempEst and LSD analyses both showed it to have a temporal signal. The B cluster seems to have evolved coherently in that it has an even distribution of nodes and branch lengths and includes both Peruvian and non-Peruvian isolates. They gave an intercept of 1593 CE with a correlation coefficient of 0.178 (p = 0.125). The TMRCA of the B cluster was also estimated by LSD and found to be 1451 CE (evolutionary rate: 0.68 × 109 CE-1855 CE; 1.1 × 10 −4 s/s/year). However, when all non-synonymously changing codons were removed from the B cluster concats leaving only sequences of synonymously changing sequences (3822 nts long), they gave a nonsensical TMRCA of 3822 CE in a TempEst analysis.
The mean of the patristic distances connected through the root of the overall PVX ML phylogeny (Figure 2) is 15.86 times greater than the mean of those connected through the root of the B cluster (1.510 s/s ± 0.045 and 0.095 s/s ± 0.015, respectively), and this enables the TMRCA of PVX to be extrapolated from the TMRCA of the B cluster.

Origin of PVX
We checked whether the geographical origins of PVX might be indicated by its phylogenetic relationships with other potexviruses, as comparisons of this sort had shown that PVA [58], PVY [59] and wild potato mosaic virus [99] had all evolved from lineages of potyviruses that were originally American. Figure 7 shows a ML phylogeny of 44 potexviruses calculated from the concatenated nt sequences of their replicase and CP genes. It can be seen that PVX forms a basal lineage of the potexviruses on a very long branch. None of the potexviruses have obvious phylogeographic groupings, except perhaps those infecting cacti (CaVX, OpVX, PitVX, SchlVX and ZygVX), which are an iconic South American group of plants, but are collected as a hobby, so their apparent grouping may reflect recent activity of hobbyists rather than their geographical origin.

Origin of PVX
We checked whether the geographical origins of PVX might be indicated by its phylogenetic relationships with other potexviruses, as comparisons of this sort had shown that PVA [58], PVY [59] and wild potato mosaic virus [99] had all evolved from lineages of potyviruses that were originally American. Figure 7 shows a ML phylogeny of 44 potexviruses calculated from the concatenated nt sequences of their replicase and CP genes. It can be seen that PVX forms a basal lineage of the potexviruses on a very long branch. None of the potexviruses have obvious phylogeographic groupings, except perhaps those infecting cacti (CaVX, OpVX, PitVX, SchlVX and ZygVX), which are an iconic South American group of plants, but are collected as a hobby, so their apparent grouping may reflect recent activity of hobbyists rather than their geographical origin.

Discussion
Our analysis has provided important new information about the influence humans have inadvertently exerted upon the dispersion, diversity and evolution of PVX both within the potatoes' Andean domestication center and within the rest of the world. This information was obtained through applying a combination of HTS, recombination, phylogenetic, population genetic, dating and other analyses to study for the first time an extensive virome consisting of PVX isolates from both potato s domestication center in the Andean region and the rest of the world. The findings revealed the fingerprints of humans as a vector driving the global changes in the PVX population. These fingerprints included the periods both before and after potato, and along with it, PVX was dispersed far away from its original crop domestication center, resulting in acceleration of these changes. This improved understanding our findings have provided will benefit researchers in future similar studies with other economically important viruses dispersed away from their domestication centers to other parts of the world with their principal crop hosts. It will also benefit plant breeders, seed producers and marketers alike, in addressing the threat posed by virus diseases originally emanating from crop domestication centers.
The phylogeny of the large population composed of 370 non-rec sequences both confirmed earlier phylogenetic analyses of the smaller selection of sequences available then [42,43,100] and added to it. Both the 370 concat and 488 CP sequences were placed in two major (I, II) and five minor (I-1, I-2, II-1, II-2, II-3) phylogroups. Of these, I-2 (number of sequences, n = 3) and II-3 (n = 127) were of Andean isolates only, II-1 (n = 8) was of European isolates only and I-1 (n = 351) and II-2 (n = 43) were of isolates from both the Andes and elsewhere. Around half of the I-1 sequences, but only 10% of the II-2 sequences, were non-Andean. Considering that one of the well-sampled phylogroups, II-3 (n = 127), was of Andean sequences only, whereas the other well-sampled phylogroups, I-1 (n = 351) and II-2 (n = 43), were from both the Andes and elsewhere, it is likely that PVX first infected potatoes in the Andes and was spread from there to other parts of the world. Genetic diversity estimates (π) revealed that Andean minor phylogroups I-2 and II-3 were the most genetically diverse, indicating that they are the oldest, and the Tajima's D static test returned significant negative values for I-1 and II-2, but not for II-3, indicating that the first two, but not the third, arose by expansion of migrants from the Andean population. Furthermore, a combination of Tajima s D, F ST and Nm analyses of PVX gene populations indicated that PVX most likely spread first from the Andes to Europe and Middle East, and then independently from there to East Asia and the Indian subcontinent. However, applying cladistic reasoning to the distribution of Andean and non-Andean PVX isolates in large sub-clusters B1 and B2 suggested that the migration was complex because, although PVX mostly spread from the Andes to other parts of the world, it likely did so on several occasions, and there had also been some more-recent PVX 'repatriations' to the Andes.
The dated sequences in the concat alignment yielded no detectable temporal signal in a TempEst analysis. We therefore studied Cluster B in more detail as it has isolates from both the Andes and elsewhere and TempEst and LSD analyses showed it to have a temporal signal. Its dated sequence yielded a TMRCA of c. 1593 CE, although with statistical support of only p = 0.178. Nonetheless, this is an entirely plausible date for PVX-infected tubers to have been transported in early shipments of potato tubers from the Andes to Europe as part of the 'Columbian Exchange' of crops between Europe and the Americas after their discovery by Columbus. This suggests that PVX became established in Europe well before the potato late-blight (Phytophora infestans) epidemic of the mid-19th century. The near simultaneous divergences of four large clusters of isolates (A, B, C, D) in the PVX phylogeny occurred during the same period as the major divergences found in the phylogenies of PVA, PVS and PVY [58][59][60]. These divergences all occurred around the mid-19th century following the famine-causing epidemics of late blight (Phytophthora infestans) in European potato crops in 1845-1849 [101,102]. The earliest potatoes carried to Europe lacked genetic diversity so, when the blight epidemic struck, almost all existing potato cultivars were killed. This greatly stimulated the breeding of new cultivars using potato germplasm, much of it imported from Chile in South America [48,103]. The surge in potato breeding and trade would have stimulated virus spread, and produced the divergences in the PVS, PVY and PVA populations [58,60,93], like that found by us in the PVX population. Therefore, there are two possible TMRCAs for Cluster B: either the poorly supported 1593 CE or the hypothetical 1868 CE. These may be extrapolated using patristic distances within the ML phylogeny of PVX ( Figure 2) to provide two estimates of PVX TMRCA using the ratio of the mean patristic distance of the branch tips (=leaves) connected through the midpoint root (1.510 ± 0.045 substitutions/site) and those connected through Cluster B s basal node (0.095 ± 0.015 substitutions/site), a ratio of 15.86. Thus, the 'poorly supported TMRCA' of PVX is around 6900 years ago, whereas the 'hypothetical TMRCA' is around 2380 years ago. Both of these are within the period since potato was first domesticated in the Andean region around 9000 years ago [104][105][106]. However, both are clearly earlier than the TMRCAs of PVY and PVA (c. 150 CE), when potato production increased during the Tiahuanaco (=Tiwanaku) empire, which lasted from 110 to about 1000 CE [46,107,108]. However, TMRCAs merely indicate the coalescence date of the variants in existing populations, and so may indicate that the smaller potato population of pre-Tiahuanaco times was able to sustain a more diverse PVX population of PVX than of either PVY or PVA.
Of the 388 PVX genomes studied, only 4.9% were found to be recombinant, which is an unexpectedly small percentage as genomes from comparable populations of two economically important potyviruses, PVY and turnip mosaic virus [59,109], have ten times as many rec sequences. Also, there was clear evidence that the Peruvian PVX population has been geographically well 'mixed', presumably by local trade in seed potatoes. Possibly, cross-protection occurring in mixed infections between PVX strains may limit recombination. However, the frequent occurrence of isolate mixtures within individual samples (326 genomic sequences obtained from 269 samples) collected in the field (Table 1b, Figure 1) suggest that this is unlikely to be important. Nonetheless, it might be useful if past studies on cross-protection by different strains of PVX [3,110] could be reinterpreted in the phylogenetic and geographic contexts our analyses have provided. None of PVX s CP genes were found to be recombinant.
In our study, biological strain group 2 isolates were in minor phylogroups II-1 and II-2, strain group 1 and 3 isolates in major phylogroup I and strain group 4 isolates within all three of these groupings. This fits the pattern found previously by Cox and Jones [42]. What is new here is that major phylogroup I s minor phylogroups I-1 and I-2 both contained isolates previously assigned to strain groups, those in I-2 coming solely from Peru and in I-1 being non-Andean. Absence of any strain groups in minor phylogroup II-3 reflects the lack of biological studies with any of the entirely new Peruvian isolates it consists of. Future research on Andean PVX isolates should include providing more information about the biological strain groups they belong to, especially those in minor phylogroup II-3.
Comparing historical isolate sequences with recent sequences of the same virus from the same part of the world can reveal whether regional alterations in virus populations have occurred with the passage of time [111]. For example, when genome sequences obtained from PVY isolates first isolated from potato in the period 1938-1984 in Western Europe were compared with recent ones: (i) none of the former belonged to the PVY rec sequences that have largely displaced their non-rec parents since their first appearance in the 1980s, (ii) no other examples of potato isolates belonging to the minor phylogroup PVY C1 found readily in 1939-1943 appeared subsequently and (iii) minor phylogroup PVY C2 became rare after the 1980s. Thus, a major population shift away from PVY C1 occurred over the last 80 years and of PVY C2 over the last 30 years [111][112][113][114] (note: minor phylogroups PVY C1 and PVY C2 were recently renamed PVY C and PVY O3 respectively, by Fuentes et al. [59]). The oldest PVX isolate in the phylogeny is B (MT708134) isolated from potato cv. Duke of York in 1940 [24,115], which fits into minor phylogroup II-1. This was formerly the type isolate of potato virus B before this virus was considered to be a strain of PVX [116,117]. It belongs to PVX strain group 2 which was widely studied in the early days of Western European potato virus research [6,24,115], but by the 1980s was difficult to find except in old potato cultivars such as King Edward and Epicure [28,118]. The PVX isolates in minor phylogroup II-1 are all old ones from Western Europe. Therefore, there has been a major population decline in the occurrence of isolates within this minor phylogroup. This decline occurred due to the very widespread occurrence of PVX resistance gene Nb in Western European potato cultivars bred since the 1940s [29,56,115]. Establishing whether a similar decline in the phylogroup I-2 population has occurred in the Andean region would require future research on Andean PVX isolates to establish to which biological strain groups they belong.
Our analyses of local inter-Departmental spread of PVX within Peru reveals that the most significant PVX gene flow (genetic linkage) was: firstly, between the Andean mountain Departments of Cajamarca, Huanuco, Junin, Huancavelica and Puno, where potatoes are grown under summer rainfall, secondly between the coastal sea level Departments of Ica and Lima, where potatoes are grown under irrigation in winter, and the mountain Departments closest to them that supply their seed potatoes (Junin for Lima; Huancavelica for Ica) (Figure 4).
Overall, we find many features of the Andean and world PVX populations are completely congruent with the hypothesis that humankind has been the principal long-distance vector of PVX from its origin within the Andean potato population. However, our analyses of the placement of PVX within the potexviruses give no clues on the origin of PVX, nor of the populations, hosts or world regions that were involved in its survival indicated by the very long branch connecting PVX to the base of the potexvirus phylogeny ( Figure 7).
Among the common potato viruses, PVX is not one of the most damaging to the potato crop. Nevertheless, it infects potato worldwide: damaging severe PVX strains sometimes occur and mild PVX strains cause extremely damaging disease in mixed virus infections with PVA and PVY [7,8,13,15,16]. Also, PVX infection of potato plants has been reported to enhance their resistance to potato late-blight disease [119]. Moreover, minor phylogroups I-2 and II-3 were entirely Andean and, although biological studies have been undertaken with isolates from I-2 [30,31], none have been done with II-3 isolates. Also, only one of the 18 rec PVX isolates we found was from outside the Andean region (from the USA). Therefore, biological studies seem advisable to establish whether PVX rec isolates, and the non-rec isolates making up minor phylogroups I-2 and II-3, might be a potential cause for concern for potato-growing countries outside the Andean region. Following the completion of such studies, the appropriate biosecurity authorities of non-Andean countries would be in a position to consider whether precautions to prevent their establishment are required.
Existing systems for large-scale routine detection of common potato viruses, such as PVX, PVS, PVY, PVA and potato leaf roll virus (PLRV; genus Polerovirus, family, Luteoviridae), include using multiplex reverse transcription polymerase chain reaction (RT-PCR) assays to detect them simultaneously and quantitative real-time RT-PCR to provide greater sensitivity [120,121]. Our sequencing study involving many PVX isolates from the potato crop s Andean potato domestication center, along with our earlier sequencing studies with PVS, PVY and PVA isolates from this region [58][59][60], have greatly increased the sequence diversity now available for each of these four viruses. To ensure greater reliability of future multiplex RT-PCR detection procedures, we recommend the preparation of new primer sets able to detect the increased PVX, PVS, PVY and PVA sequence diversity revealed by our studies. HTS has proven unsuitable for use in large-scale routine virus detection because of its prohibitive cost and the variable genome structure of RNA viruses, which constitutes a serious barrier to designing diagnostic markers that detect diverse plant virus species [122]. Fortunately, targeted genome sequencing (TC-Seq), an amplicon sequencing strategy involving a multiplex PCR reaction that not only detects diverse virus sequence targets simultaneously, but also greatly reduces cost and workload, holds considerable promise for sensitive, large-scale routine plant virus testing in both biosecurity and healthy stock programs in the future [122].
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/v13040644/s1. Figure S1: The ML phylogeny of the concats of potato virus X Cluster B isolates. The Accession Codes of Peruvian isolates are in green, those of isolates from non-South American countries are in red, Table S1: Details of the origins of new potato virus X isolates sequenced from Peru in this study, Table S2: Genetic diversity of potato virus X population based on the coding genome sequence and each gene separately, File S1: Spread sheet version of Table 1 including additional potato virus X isolate data. Column A, Database record number; B, Accession Code; C, Peruvian isolate number (see Figure 1); D, Phylogroup (see Figure 2 and SM File S2); E, Cluster (see Figure 2); F, Isolate name; G, Provenance (country or Peruvian Department from which the isolate was collected); H, Continent of provenance (grouping used for DNAsp6 analysis); I, rec, recombinant, n-rec, non-recombinant (cluster not given for rec sequences); J, Collection date, "?" indicates submission date if collection date unknown; K, Original host of isolate; L, Host cultivar, if known; M, CP sequences analysed, File S2: The Accession Codes of the potato virus X isolates in the different clusters shown in Figure 2.