The Westward Journey of Alfalfa Leaf Curl Virus

Alfalfa leaf curl virus (ALCV), which causes severe disease symptoms in alfalfa (Medicago sativa L.) and is transmitted by the widespread aphid species, Aphis craccivora Koch, has been found throughout the Mediterranean basin as well as in Iran and Argentina. Here we reconstruct the evolutionary history of ALCV and attempt to determine whether the recent discovery and widespread detection of ALCV is attributable either to past diagnostic biases or to the emergence and global spread of the virus over the past few years. One hundred and twenty ALCV complete genome sequences recovered from ten countries were analyzed and four ALCV genotypes (ALCV-A, ALCV-B, ALCV-C, and ALCV-D) were clearly distinguished. We further confirm that ALCV isolates are highly recombinogenic and that recombination has been a major determinant in the origins of the various genotypes. Collectively, the sequence data support the hypothesis that, of all the analyzed locations, ALCV likely emerged and diversified in the Middle East before spreading to the western Mediterranean basin and Argentina.


Introduction
Although geminiviruses have been intensively studied since the 1970s [1,2], it is only since the recent development and application of viral metagenomics-based approaches that the true diversity of this group of viruses is starting to become apparent [3,4]. The genus Capulavirus is just one of several new genera in the family Geminiviridae that have been established to accommodate some of the novel and diverse geminiviruses that have been discovered over the past ten years [5]. The genus Capulavirus currently contains four species (Alfalfa leaf curl virus, Euphorbia caput-medusae latent virus, French bean severe leaf curl virus, and Plantago lanceolata latent virus) that infect both cultivated and non-cultivated plants in southern and northern Europe, the Indian subcontinent, and South Africa [6][7][8].
Alfalfa leaf curl virus (ALCV), which causes severe disease symptoms in alfalfa (Medicago sativa L.), is transmitted by the widespread aphid species Aphis craccivora Koch [7,9]. A study of ALCV isolates collected between 2010 and 2014 in France and Spain [10,11] revealed that both intra- and inter-species recombination have played a significant role in the evolution of ALCV. In addition, this study suggested that ALCV was probably widely distributed across the Mediterranean basin [11]. Consistent with this hypothesis, ALCV was subsequently reported in 2018 from Jordan, Syria, Lebanon, and Tunisia [12] as well as from the non-Mediterranean countries Iran [13] and Argentina [14].
The discovery of ALCV in Argentina and the Mediterranean basin so soon after its initial characterization raises questions regarding the potential global emergence of this virus: does its discovery in these far-flung regions imply that a "true" global and potentially damaging geminivirus emergence event has recently occurred, or, more prosaically, does it merely reflect the fact that broader and more intensive sampling of plant material coupled with more sensitive virus detection techniques have only recently revealed an epidemic that has been ongoing for tens (or even hundreds) of years? Interestingly, alfalfa symptoms resembling those caused by ALCV, such as plant stunting and leaf curling, crumpling, and shriveling, have been reported since the 1950s in Europe (including France, Bulgaria, Romania, and Spain) and the Middle East (Saudi Arabia) [15][16][17][18][19]. The etiological agent of this alfalfa disease, which was shown to be transmissible by both grafting and A. craccivora [18,20], was identified as a rhabdovirus, referred to as lucerne enation virus (LEV) [15]. However, further studies revealed that different types of symptoms were observed depending on the mode of transmission [19], suggesting that in several instances where alfalfa diseases have been attributed to LEV, one or more additional viruses may have been co-infecting plants together with LEV. If this "Trojan horse" hypothesis was true, the apparently sudden occurrence of ALCV could simply reflect diagnostic biases rather than the recent emergence and global spread of the virus. Addressing this question is of great importance since alfalfa is the most-cultivated perennial forage legume in temperate regions of the world [21].
Here we examine 120 ALCV full genome sequences recovered from ten countries to reveal four distinct ALCV genotypes (ALCV-A, ALCV-B, ALCV-C, and ALCV-D) with the isolates of each genotype sharing <93% genome-wide pairwise identity with those of the other three genotypes. In addition, we confirm that ALCV isolates are highly recombinogenic and that recombination has been a major determinant in their origins. Phylogenetic analyses also suggest that, of all the countries analyzed, ALCV most likely originated in Iran and diversified in the Middle East before spreading to the Mediterranean basin and then onwards to Argentina.

Plant Sampling
Leaves from 564 alfalfa plants that were either asymptomatic or presented with conspicuous disease symptoms (including plant stunting or varying degrees of leaf curling, crumpling, and shriveling), were

DNA Extraction and PCR-Mediated Alfalfa Leaf Curl Virus Detection
Total DNA from the French, Greek, Italian, Jordanian, Lebanese, Namibian, South African, Spanish, Syrian, and Tunisian alfalfa samples was extracted using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) following the manufacturer's protocol. Total DNA from alfalfa samples from Argentina and Iran was extracted using the cetyltrimethylammonium bromide (CTAB) method [22]. PCR-based detection of ALCV in alfalfa plants collected in all countries other than Argentina and Iran was performed using the primer pair ALCV-187F (5′-TGG AAT ATT GTG CTG CTT GG-3′) and ALCV-971R (5′-ATT TTG GGA CTT GTG CTC CA-3′), as previously described in Bernardo et al. (2016) [11]. The presence of ALCV in the Iranian alfalfa samples was tested using PCR with the primer pair Gemini F1 (5′-ATG ATG GAT AAT TCA AAC CC-3′) and Gemini R2 (5′-CAC CTC CAC TGT CTT GTC CA-3′), as described in Davoodi et al. (2018) [13]. The presence of ALCV in the Argentinean alfalfa samples was tested using PCR with the KAPA HiFi HotStart PCR Kit (Kapa Biosystems, Wilmington, MA, USA) following the manufacturer's protocol with the primer pair ALCV CPF (5′-GAG AAC GTA TGG ATT GGT C-3′) and ALCV CPR (5′-AGT GTA TGC GTT CTT CTG G-3′). Amplification conditions consisted of 95 °C for 3 min; 35 cycles of 98 °C for 20 s, 58 °C for 15 s, and 72 °C for 45 s; and a final extension at 72 °C for 1 min.

Pairwise-Distance, Phylogenetic, and Recombination Analyses
Genome-wide pairwise comparisons of 85 newly determined ALCV genome sequences together with 35 previously determined ALCV genome sequences were done using the Sequence Demarcation Tool SDT v1.2 [23]. The 120 ALCV sequences were aligned together with the capulavirus Euphorbia caput-medusae latent virus (EcmLV, GenBank HF921459 [6]; chosen as outgroup) using MUSCLE [24] with default settings. Evidence of potential recombination events was detected within the 120 ALCV full-genome alignment using the RDP, GENECONV, BOOTSCAN, MAXIMUM CHI SQUARE, CHIMAERA, SISCAN and 3SEQ recombination detection methods that are implemented in RDP4.94 (using default settings [25]). Only recombination events that were detected with two or more detection methods, and had significant phylogenetic support, were considered credible evidence of recombination.
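The genome-wide pairwise identity underlying these comparisons can be illustrated with a minimal Python sketch. It assumes that the sequences are already aligned and that columns with a gap in either sequence are ignored (pairwise deletion); SDT itself aligns each pair of sequences separately, so this is an approximation of the idea rather than a reimplementation of the tool. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations

GAP_CHARS = {"-", "."}


def pairwise_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # pairwise deletion: skip columns gapped in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns shared by the two sequences")
    return 100.0 * matches / compared


def identity_matrix(alignment: dict) -> dict:
    """Map every unordered pair of sequence names to its percent identity."""
    return {
        (x, y): pairwise_identity(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Toy alignment (hypothetical sequences, not ALCV data).
toy = {
    "iso1": "ATGC-ATGCA",
    "iso2": "ATGCTATGCA",
    "iso3": "ATGC-TTGAA",
}
matrix = identity_matrix(toy)
for pair, pct in matrix.items():
    print(pair, f"{pct:.1f}%")
```

Averaging such values within a group of isolates gives the kind of within-genotype mean identity that is reported later for each ALCV genotype.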
A maximum likelihood (ML) phylogenetic tree of the 120 aligned full genome sequences, with recombinant regions removed, was constructed using PhyML3 [26] implemented in MEGA 7.0.26 [27] with JC + G selected as the best fit nucleotide substitution model and 1000 non-parametric bootstrap replicates. The tree was rooted with EcmLV. Branches with less than 30% bootstrap support were collapsed using TreeGraph2 [28]. In addition, a maximum likelihood (ML) phylogenetic tree of the 56 aligned ALCV-A full genome sequences, with recombinant regions removed, was constructed using PhyML3 [26] implemented in MEGA 7.0.26 [27] with T92 + I + G selected as the best fit nucleotide substitution model and 1000 non-parametric bootstrap replicates. The tree was rooted with one Iranian isolate from ALCV-C (GenBank accession number: MH085199). Branches with less than 50% bootstrap support were collapsed using TreeGraph2 [28].
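Collapsing weakly supported branches, as done here with TreeGraph2, amounts to replacing every internal node whose bootstrap support falls below the threshold with its children. The Python sketch below shows the operation on a small hand-built tree; the `Node` class and helper functions are hypothetical, branch lengths are ignored, and the code is not the TreeGraph2 implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""                   # leaf label (empty for internal nodes)
    support: Optional[float] = None  # bootstrap support (%) of an internal node
    children: List["Node"] = field(default_factory=list)


def collapse_low_support(node: Node, threshold: float) -> Node:
    """Replace internal nodes with support below `threshold` by their children."""
    new_children = []
    for child in node.children:
        child = collapse_low_support(child, threshold)
        is_weak = (
            bool(child.children)
            and child.support is not None
            and child.support < threshold
        )
        if is_weak:
            new_children.extend(child.children)  # creates a polytomy
        else:
            new_children.append(child)
    node.children = new_children
    return node


def to_newick(node: Node) -> str:
    """Serialize the topology (with support labels) without branch lengths."""
    if not node.children:
        return node.name
    inner = ",".join(to_newick(c) for c in node.children)
    label = "" if node.support is None else f"{node.support:g}"
    return f"({inner}){label}"


# Toy tree: one clade with 25% support, one with 80% support, and an outgroup.
tree = Node(children=[
    Node(support=25, children=[Node("A"), Node("B")]),
    Node(support=80, children=[Node("C"), Node("D")]),
    Node("Out"),
])
collapsed = collapse_low_support(tree, threshold=30)
newick = to_newick(collapsed)
print(newick + ";")
```

With a 30% threshold the weakly supported (A,B) clade dissolves into the root polytomy, whereas the (C,D) clade is retained.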
The evolutionary relationships of ALCV isolates were also reconstructed using replication-associated protein (Rep) and coat protein (CP) amino acid sequences. The datasets consisted of the 120 predicted ALCV Rep and CP amino acid sequences together with the corresponding homologous sequences from EcmLV, which was chosen as a capulavirus outgroup to root the Rep and CP phylogenies. Predicted Rep and CP amino acid sequences were aligned using MUSCLE [24] with default settings. Maximum likelihood phylogenetic trees of Rep and CP were inferred using PhyML3 [26] implemented in MEGA with the Jones-Taylor-Thornton (JTT) + G amino acid substitution model, which was chosen as the best-fit model using ProtTest [29]. Five hundred bootstrap replicates were used to test the support of branches. Branches with less than 50% bootstrap support were collapsed using TreeGraph2.

Statistical Analyses
The correlation between geographic and genetic distances was assessed using a Mantel test implemented in GenAlEx [30] (with 999 random permutations used to test the significance of the correlation). The genetic distance matrix was obtained for the 56 aligned ALCV-A full genome sequences, with recombinant regions removed, using MEGA 7.0.26 (CLUSTALW alignment followed by uncorrected pairwise distance estimation with pairwise deletion of gaps). The geographic distance matrix was obtained using the program Geographic Distance Matrix Generator 1.2.3 (http://biodiversityinformatics.amnh.org/open_source/gdmg).
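As an illustration of the test described above, the following Python sketch builds a geographic distance matrix from coordinates and runs a one-tailed permutation Mantel test against a genetic distance matrix. It is a simplified stand-in, not the GenAlEx procedure: the great-circle (haversine) distance, the permutation scheme (joint shuffling of the rows and columns of one matrix), the function names, and the site coordinates are all assumptions made for the example.

```python
import math
import random


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def upper_triangle(matrix):
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp_xy / math.sqrt(ss_x * ss_y)


def mantel(geo, gen, permutations=999, seed=0):
    """Return (r, one-tailed P) for the correlation of two distance matrices."""
    n = len(geo)
    x = upper_triangle(geo)
    observed = pearson(x, upper_triangle(gen))
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one of the permutations
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[gen[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical sampling sites (latitude, longitude); not the study's coordinates.
sites = [(32.6, 51.7), (33.9, 35.5), (38.0, 23.7), (43.6, 3.9), (41.6, 0.6)]
geo = [[haversine_km(*a, *b) for b in sites] for a in sites]
# Toy genetic distances that increase linearly with geographic distance.
gen = [[d * 1e-5 for d in row] for row in geo]

r, p = mantel(geo, gen)
print(f"r = {r:.3f}, P = {p:.3f}")
```

Because the observed arrangement is counted among the permutations, the smallest attainable P value with 999 permutations is 0.001.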
Using a 93% genome-wide pairwise identity threshold, four ALCV genotypes (ALCV-A, ALCV-B, ALCV-C, and ALCV-D) were clearly distinguished (Figure 1). ALCV-A isolates (n = 56) were present in almost all of the countries where ALCV has so far been reported, including France, Greece, Iran, Italy, Jordan, Lebanon, Spain, Syria, and Tunisia. By contrast, ALCV-B (n = 19) was only recovered in France and Spain, and ALCV-C (n = 10) and ALCV-D (n = 35) have, to date, only been found in Iran and Argentina, respectively (Figure 1). Averages of genome-wide pairwise identities between isolates of each genotype indicate that ALCV-A and ALCV-B were the most genetically diverse groups (95.7% and 96.4%, respectively), with ALCV-C isolates sharing, on average, 97.1% identity and ALCV-D isolates sharing 98.1% (Table 1). Two outlier isolates from Spain (ES34-2 and ES52-18), with strong evidence of recombination, were assigned to genotype ALCV-B because they share >93% genome-wide pairwise identity with 10 out of the 17 other isolates of genotype ALCV-B (Figure 1). While ALCV-A and ALCV-D isolates all contained at least seven recognizable open reading frames (ORFs) with more than 30 aa, including four virion-sense ORFs (V1, V2, V3, and V4) and three complementary-sense ORFs (C1, C2, and C3), ALCV-B and ALCV-C isolates lacked the V2 ORF. Both of these genomic organizations have been previously described [11].
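One way to picture a threshold-based demarcation is to group isolates by single linkage: any two isolates sharing at least 93% genome-wide identity are placed in the same group, so an outlier joins a genotype as soon as it exceeds the threshold with some of its members. The Python sketch below is only an illustration under that assumption; the demarcation procedure actually used is not detailed here, and the function name, isolate names, and identity values are hypothetical.

```python
def cluster_by_identity(names, identity, threshold=93.0):
    """Single-linkage grouping of isolates sharing >= threshold % identity."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group representative (with path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), pct in identity.items():
        if pct >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy identities (%); b1 and b3 fall below 93% with each other but are
# linked through b2, mirroring how an outlier can still join a genotype.
names = ["a1", "a2", "b1", "b2", "b3"]
identity = {
    ("a1", "a2"): 97.5,
    ("a1", "b1"): 88.0, ("a1", "b2"): 87.4, ("a1", "b3"): 86.9,
    ("a2", "b1"): 88.3, ("a2", "b2"): 87.1, ("a2", "b3"): 86.5,
    ("b1", "b2"): 96.0, ("b2", "b3"): 94.1, ("b1", "b3"): 92.5,
}
groups = cluster_by_identity(names, identity)
print(groups)
```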

Phylogenetic and Recombination Analyses
Twenty-one unique recombination events were detected (Table 2). Notably, all of the examined ALCV isolates displayed traces of recombination events with, on average, two events being evident within each analyzed isolate. Ten out of the 21 ALCV recombination events apparently involved intra-species transfers of sequences between ALCV variants whereas the other 11 detected events apparently involved inter-species sequence transfers (Table 2).
Interestingly, event 2 (corresponding with event 1 in Bernardo et al. (2016) [11]), which involved an inter-species sequence transfer of the entire replication-associated protein (rep) gene, accounted for the clear divergence of the ALCV-A isolates from all of the other ALCV genotypes (Figure 2A). At least five recombination events involving the partial replacement of the rep gene (events 3, 4, 5, 11, and 20) also appear to be the primary causes of the genetic differences found between ALCV-B, -C, and -D isolates (Figure 2A).
The coat protein (cp) genes of the ALCV isolates have apparently been less affected by inter-species sequence exchanges than the rep genes of these isolates (Table 2); a factor that was likely responsible for the ALCV CP proteins displaying less variability than the ALCV REP proteins. The ALCV CP protein is likely to play a central role in insect transmission and virus genome packaging [31] and, as appears to be the case for begomoviruses [32], the cp gene is either less prone to recombination than the rep gene, or is more prone to recombination-induced functional impairments than the rep gene such that viruses with recombinant cp genes are selected against. However, event 15 (corresponding with event 9 in Bernardo et al. (2016) [11]), which is detectable in all analyzed ALCV isolates other than those in the ALCV-D genotype (all from Argentina) and eleven of the ALCV-B isolates from France and Spain, involved the transfer of a fragment of the cp. If this recombination event predated the most recent common ancestor of the analyzed ALCV isolates, then, in the ALCV-D isolates and the eleven ALCV-B isolates, evidence of this event may have been obscured by subsequent recombination events 6, 7, 13, 14, and 17 ( Table 2).
The ALCV-D genotype has also probably arisen following two major recombination events collectively involving transfers of approximately 2/3 of the genome: one involving acquisition of a rep gene (event 11, Figure 2A) and the other involving acquisition of the cp gene (event 18, Figure 2B).
Figure 2. Maximum likelihood phylogenetic trees of the replication-associated protein (Rep) (A) and coat protein (CP) (B) amino acid sequences of 120 ALCV isolates, both rooted with the corresponding Euphorbia caput-medusae latent virus Rep and CP amino acid sequences. Branches with less than 50% bootstrap support were collapsed. Branches associated with a black dot have bootstrap supports above 90% whereas those with white dots have bootstrap supports above 70%. Putative occurrences of recombination events (numbers within squares correspond to the number of the event as listed in Table 2) are depicted on branches of both phylogenetic trees.
These results indicate that, as is the case with other geminiviruses, ALCV (and probably the other capulaviruses) is likely highly recombinogenic and, as a consequence, has the potential for rapid adaptive evolution [33][34][35][36]. Given that ALCV is transmitted by A. craccivora, which feeds on a diverse range of host plant species [37], the number of encounters that ALCV might naturally have with other geminiviruses across this host range is probably high. This alone might account for the 38% of the recombination events identified here that involved unknown geminiviruses. In addition, given that A. craccivora has been reported on all continents except Antarctica [38] and displays some morphological and genetic evidence of differentiation into host races [39], one can expect that geographical populations of this aphid could have driven the microevolution of the viruses that they carry and, by extension, promoted the rapid adaptive evolution of these viruses by recombination, as is suggested by the diversity of the Argentinean ALCV isolates.

Geographic Distribution of ALCV
While ALCV was initially reported from southern France and northern Spain, we show here that ALCV is found in diverse geographical and climatic zones of both the Old and New Worlds (Figure 3A), including temperate oceanic (Rodez/France or Buenos Aires/Argentina), continental (Entzheim/France), mountainous (Courmayeur/Italy), cold semi-desert (Isfahan/Iran), hot semi-desert (Kerman/Iran), and subtropical (Jujuy/Argentina) climatic zones. Given the worldwide distribution of A. craccivora and the "climatic flexibility" of ALCV, it is plausible that few natural barriers exist that would effectively limit the global spread of this virus [38]. Even the presently known distribution of ALCV suggests that it is capable of infecting a range of alfalfa varieties within the M. sativa species complex, including M. sativa subsp. sativa and probably M. sativa subsp. falcata.
Although the natural host range of ALCV is unknown, it is plausible that even without alternative hosts, ALCV could persist in "wild" alfalfa populations that are commonly found within a variety of unmanaged habitats. If ALCV has a broad host range, as is commonly the case with geminiviruses, then it will be very difficult to control the local spread of the virus wherever in the world it is introduced. Identification of alfalfa cultivars that are tolerant or resistant to A. craccivora could constitute a sustainable strategy to moderate the impacts of ALCV wherever it occurs.
It must be emphasized that the currently known geographical range of ALCV excludes some of the regions that were investigated in this study. While the virus was detected in every country within the Mediterranean basin and the Middle East from which samples were obtained, it was not detected in samples from South Africa and Namibia, where alfalfa has been grown since the mid-1800s (Figure 3A).
By contrast, we found that ALCV was widespread in Argentina, occurring in all 17 alfalfa-growing regions of the country from which samples were obtained. The 35 genome sequences from Argentinean samples collected between 2010 and 2017 were genetically highly homogeneous, sharing an average of 98.1% genome-wide pairwise identity. While this degree of diversity is consistent with the hypothesis that the Argentinean ALCV population was founded by a single introduced ALCV variant, our recombination analysis indicates that the descendants of this founder virus have undergone two significant inter-species recombination events that replaced 2/3 of the original genome, including the entire rep and cp genes, with sequences from either a distantly related ALCV genotype or a different Capulavirus species. One of the Argentinean isolates harbors traces of an additional minor recombination event (event 21, Table 2) that occurred in the large intergenic region. Collectively, the widespread distribution of ALCV in Argentina, the low diversity of the Argentinean ALCV population, and the fact that all of the isolates appear to have descended from the same recombinant ancestor suggest that the virus was probably introduced only once and has subsequently spread throughout the country. This may have involved the efficient, large-scale, and long-range transmission of the virus by A. craccivora. This is plausible since studies focusing on another persistently A. craccivora-transmitted circular ssDNA virus (subterranean clover stunt virus, SCSV, Nanoviridae) have revealed that A. craccivora in Australia can migrate over several hundred kilometers from the coastal areas to cause SCSV re-infestation of pastures in the arid regions of southeast Australia [40]. Another possibility is that the same ALCV variant may have been introduced throughout Argentina in infected planting material such as seeds.
Although seed-transmission has never been demonstrated for capulaviruses, this scenario cannot be completely ruled out as several recent studies have confirmed that geminiviruses belonging to at least three genera (Begomovirus, Becurtovirus, and Curtovirus) can be seed-transmitted [41][42][43].
Figure 3. (A) Countries from which ALCV has been reported are highlighted in colors corresponding to those used in the phylogenetic tree. (B) A maximum likelihood phylogenetic tree of the 120 aligned ALCV recombination-free genome sequences, with JC + G selected as the best fit nucleotide substitution model and 1000 non-parametric bootstrap replicates. The tree was rooted with Euphorbia caput-medusae latent virus. Branches with less than 30% bootstrap support were collapsed. Branches associated with a dotted circle have bootstrap support values above 50%.

Geographical Origin of ALCV
The ML phylogenetic tree of all 120 aligned ALCV complete genome sequences (with recombinationally-derived genome fragments removed and rooted with EcmLV) indicated that, among the locations from which samples were analyzed, the most recent common ancestor (MRCA) was probably located in Iran ( Figure 3B). The three main ALCV lineages diverged from this MRCA, to subsequently form the ALCV-C, ALCV-B, ALCV-A, and ALCV-D genotypes ( Figure 3B). The genotype A/D lineage probably experienced three major recombination events: Event 2 that involved the acquisition of a rep gene sequence from an unknown capulavirus and yielded genotype ALCV-A, and Events 11 and 18 that respectively involved the transfers of rep and cp genes from a divergent currently undiscovered ALCV lineage or a currently undiscovered Capulavirus species, which together yielded genotype ALCV-D. Two recombination events between ancestral ALCV-A and ALCV-B viruses (Events 1 and 3 in our analysis) yielded a sub-clade within the genotype A/B lineage that contains rep gene sequences that are today found in viruses that would otherwise be classified as belonging to the ALCV-A genotype (represented by eight isolates in our analysis).
Interestingly, the average genome-wide pairwise identity of ALCV-A isolates from the Middle East (96.4%; isolates from Iran, Jordan, Lebanon, and Syria, Table 1) was lower than that of isolates from the western/central Mediterranean basin (98.0%; isolates from France, Italy, Greece, Spain, and Tunisia, Table 1); this supports the hypothesis that genotype ALCV-A originated in the Middle East before spreading further westward. A Mantel test of association between total genetic distance and geographic distance revealed a strong correlation for the ALCV-A isolates (Mantel correlation coefficient Rxy = 0.755; one-tailed probability of positive autocorrelation, P(rxy-rand ≥ rxy-data) = 0.001). This result implies a high degree of genetic isolation by distance within the Mediterranean and Middle East countries (Figure 4A).
In addition, the ML phylogenetic tree of all 56 aligned ALCV-A complete genome sequences (with recombinationally-derived genome fragments removed and rooted with an ALCV-C isolate), indicated that the ALCV-A MRCA was probably located in the Middle East ( Figure 4B) and that the Greek isolates followed by the western Mediterranean isolates (from France, Italy, Spain, and Tunisia) became successively more divergent from the Middle Eastern isolates ( Figure 4B). Collectively these results suggested that ALCV-A isolates originated in the Middle East and spread further westward.
Finally, the average genome-wide pairwise identity of the ALCV-B isolates (96.4%; all examined isolates have so far only been found in France and Spain) was lower than that of ALCV-C (97.1%), ALCV-D (98.1%), and ALCV-A from the western/central Mediterranean basin (98.0%); this supports the hypothesis that ALCV-B has been circulating for longer in Western Europe than the other ALCV genotypes. Collectively, the sequence data support the hypothesis that ALCV emerged and diversified in the Middle East (with Iran being the most probable origin of all the analyzed locations) before spreading, possibly in at least two waves, to the Mediterranean basin and onwards from there to Argentina. This "two-waves" hypothesis fits well with the known domestication and spread history of alfalfa. While the first evidence of alfalfa cultivation can be traced to 7000 BC in Iran and/or central Asia [44], the plant was probably only domesticated in central Asia in approximately 5000 BC [45]. Alfalfa cultivation then spread to the Middle East around 1000 BC and from there to Greece with the armies of the Medes between 700 and 500 BC, and finally to Italy by 300 BC and the rest of the Roman Empire by 100 AD (Figure 3A). Alfalfa cultivation in Europe then declined during the Middle Ages but was later reintroduced there via Spain by the Arabs in approximately 700 AD (Figure 3A). Thereafter, alfalfa was introduced to South America by the Spanish in the sixteenth century (Figure 3A).
Figure 4. (A) Mantel test of the association between geographic and genetic distances among the ALCV-A isolates. SSx (sum of products of x matrix elements) = 3.143 × 10^9; SSy (sum of products of y matrix elements) = 0.657; SPxy (sum of cross products of corresponding elements of the x and y matrices) = 3.433 × 10^4; Rxy (Mantel correlation coefficient) = 0.755; P(rxy-rand ≥ rxy-data), the probability of Rxy based on 999 standard permutations across the full dataset, = 0.001. (B) A maximum likelihood phylogenetic tree of the 56 aligned ALCV-A recombination-free genome sequences, with T92 + I + G selected as the best fit nucleotide substitution model and 1000 non-parametric bootstrap replicates. The tree was rooted with one Iranian isolate from ALCV-C (GenBank accession number: MH085199). Branches with less than 50% bootstrap support were collapsed. Branches associated with white and black dots have bootstrap support values above 50% and above 70%, respectively.
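The Mantel summary statistics reported for the ALCV-A isolates can be checked against each other. Assuming the usual relationship Rxy = SPxy / sqrt(SSx × SSy), which is not stated explicitly in the text, the short Python calculation below recovers the reported correlation coefficient from the three reported sums; the variable names are illustrative.

```python
import math

# Summary statistics as reported for the ALCV-A Mantel test.
ss_x = 3.143e9   # SSx, geographic distance matrix
ss_y = 0.657     # SSy, genetic distance matrix
sp_xy = 3.433e4  # SPxy, cross products of the two matrices

# Assumed relationship: Rxy = SPxy / sqrt(SSx * SSy).
r_xy = sp_xy / math.sqrt(ss_x * ss_y)
print(f"Rxy = {r_xy:.3f}")
```

The result rounds to 0.755, in agreement with the reported Rxy.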
It is important to stress, however, that it remains to be determined whether the timescales of ALCV dissemination throughout the Mediterranean mirror those of alfalfa dissemination. Given that all the alfalfa samples examined here have been collected over the past eight years (2010-2017) there was insufficient temporal signal in our data sets to infer accurate and precise nucleotide substitution rates that would enable the estimation of the dates when ancestral sequences likely arrived in the countries from which they were sampled. Samples of ALCV-infected alfalfa plants collected in the early to mid-1900s might yield ALCV genome sequences that could provide the temporal signal that is necessary to infer whether ALCV did indeed disseminate together with the spread of alfalfa cultivation throughout the Mediterranean.

Conflicts of Interest:
The authors declare no conflict of interest.