High-Throughput Sequencing Reveals Differential Begomovirus Species Diversity in Non-Cultivated Plants in Northern-Pacific Mexico

Plant DNA viruses of the genus Begomovirus have been documented as the most genetically diverse in the family Geminiviridae and present a serious threat for global horticultural production, especially considering climate change. It is important to characterize naturally existing begomoviruses, since viral genetic diversity in non-cultivated plants could lead to future disease epidemics in crops. In this study, high-throughput sequencing (HTS) was employed to determine viral diversity of samples collected in a survey performed during 2012–2016 in seven states of Northern-Pacific Mexico, areas of diverse climatic conditions where different vegetable crops are subject to intensive farming. In total, 132 plant species, belonging to 34 families, were identified and sampled in the natural ecosystems surrounding cultivated areas (agro-ecological interface). HTS analysis and subsequent de novo assembly revealed a number of geminivirus-related DNA signatures with 80 to 100% DNA similarity with begomoviral sequences present in the genome databank. The analysis revealed DNA signatures corresponding to 52 crop-infecting and 35 non-cultivated-infecting geminiviruses that, interestingly, were present in different plant species. Such an analysis deepens our knowledge of geminiviral diversity and could help detecting emerging viruses affecting crops in different agro-climatic regions.


Introduction
Agroecosystems are used for the production of food, feed, fuel, fiber, and other harvestable goods providing human support and health [1]. Mexico has 196,437,500 ha, of which approximately 13% correspond to agricultural land. In 2016, 21.9 million ha were cultivated, with an agricultural behind the observed genetic variation since such knowledge will be extremely useful to develop resistance strategies (e.g., RNAi-based approaches) and, as a consequence, to prevent crop losses.
High-throughput sequencing (HTS) has provided the means to detect known viruses as well as to identify novel viruses in plants [35][36][37][38][39]. As a result, sensitive and accurate diagnosis of viral infection has been achieved, rendering this method extremely useful for quarantine purposes. The development of bioinformatic tools and the design of various pipelines have contributed significantly towards a deep analysis of the vast amount of HTS data produced [40].
The current next generation sequence (NGS) technologies have a couple of drawbacks: firstly, the contigs/singletons need to be annotated de novo via short read assembly, a process that may create chimeras deriving from the different genomes present in a sample, and secondly, the accurate differentiation of sequences; thus, confirmation is required by cloning followed by Sanger sequencing. The hope is that the latest single-molecule NGS technologies (3rd generation), where long reads are obtained, could address both of the above-mentioned issues, especially when their sequence reading error rates drop significantly compared to the levels of the previous NGS technologies. The resolution of the metagenome of a given sample could identify genetic variations in a viral population, providing the necessary input to study viral genome evolution, and determine which environmental factors might influence the generation of novel plant pathogens from previously benign viruses.
It is a well accepted notion that non-cultivated (wild) host plants play a key role in the generation of viral genetic variation, maintaining a certain degree of sequence heterogeneity convenient for viral adaptation in nature, without altering essential consensus sequences. Only recently, metagenomics studies on non-cultivated plant species have attracted the attention of plant researchers [38,41]. Mexico is considered to be one of the most megadiverse countries worldwide [42] and, despite the fact that some non-cultivated plant species have been reported as geminivirus reservoirs [16,43], to date the knowledge of geminivirus distribution in Mexican natural ecosystems is still limited. To this purpose, we aimed to determine the genetic diversity of begomoviruses in non-cultivated species that are present in the vicinity of cultivated crops (designated as the agro-ecological interface) in Northern-Pacific Mexico. In the present study, rolling circle amplification (RCA) was applied to DNA samples from wild plant species obtained during a five-year survey, and several begomoviral DNA signatures and begomoviral sequences were identified. Our results support the presence of a niche for begomoviral evolution in the neighborhood of important cultivation areas in Northern-Pacific Mexico.

Plant Sample Collection
A total of 422 non-cultivated plants (both symptomatic and asymptomatic) located in the area between crops and wild vegetation zones (agro-ecological interface), in seven states in the northern-pacific region of Mexico during the period 2012-2016, were collected, GPS-documented, photographed, and identified to the species level. Thus, 132 species of plants belonging to 34 families were identified. The sampling regions were grouped as follows: (1) Baja California (BC), (2) Sonora (SO), (3) Sinaloa (SI), (4) Colima-Nayarit (CN), and (5) Coahuila-Durango (CD) states of Mexico. Samples were placed on ice and brought to the laboratory and stored at −80 • C until processed. Plants were collected in non-protected areas; additionally, an herbarium was established with most of the plant specimens.

DNA Isolation, RCA, and Library Construction
Total DNA was extracted from individual plants using the CTAB method [44], and the isolated DNA, upon spectrophotometric estimation of its concentration, was used as template for PCR-based Begomovirus detection using degenerate universal primers (Supplementary Table S3). For each sampling region, total DNA from Begomovirus PCR-positive plants, belonging to the same plant species, was mixed in equimolar concentration. Following this procedure, 100 ng of each DNA mixture was used for circular DNA-molecule enrichment by rolling circle amplification (RCA) using the illustra TempliPhi DNA Amplification Kit (GE Healthcare, Milwaukee, MA, USA), following the manufacturer´s instructions. Then, all RCA products obtained per sampling region were pooled in equimolar concentrations, and cleaned using phenol:chloroform:isoamyl alcohol (25:24:1)/potassium acetate (5 M) and 100 ethanol % precipitation (1/10 v/v, 1/2 v/v, respectively). DNA integrity was analyzed by agarose gel electrophoresis, and the cleaned RCA mixtures were used for NGS library construction that was sequenced by a commercial facility (LANGEBIO, Irapuato, GTO, MX) using Illumina Nextera XT paired end 2 × 150 bp protocol on a MiSeq 500. The same procedure was followed for each sampling region to obtain one library per region (for a total of five libraries).

Metagenomic Analysis of Geminivirus-Related Signatures
Reads obtained from each library were trimmed employing the trimmomatic tool [45] (parameters (TRAILING: 30, HEADCROP:5) followed by quality check analysis by FASTQC (https://www.bioinformatics.babraham.ac.uk). Each library was filtered for human, bacteria, plant, and eukaryotic viruses reads using the ViromeScan pipeline [40] in order to obtain the geminivirus-related reads. All filtered libraries were subjected to de novo assembly using SPAdes [46]; both contigs (≥78 bp) and unassembled reads were compared against the GeneBank non-redundant database using BLASTn hosted in the Galaxy server [47]. Geminivirus-related signatures were sorted by contig length and analyzed manually. Contigs obtained in the present study are available at: https://www.dropbox.com/sh/ha6pkzls9217dhf/AAADNUa0TfYj3EZ8bb315cSga?dl=0.

Begomovirus Full-Length Genome Amplification, Cloning, and Sequence Analysis
Full-length geminiviral genomes were obtained following a previously described protocol [48]. In brief, the total DNA (100 ng) from selected non-cultivated plants was used as a template for viral circular DNA genome enrichment by RCA, as mentioned above. To obtain the viral monomeric full-length genomes, the RCA products were digested with a selected a single-site restriction enzyme (BamHI, EcoRI, XbaI, or XhoI, depending on the virus under analysis). The expected linearized geminivirus full-length genomes (~2.7 kb) were recovered from 1% ultrapure agarose gels using PureLink Quick Gel Extraction Kit (Thermo Fisher Scientific, USA) according to the manufacturer´s instructions. The fragments were ligated into linearized pGreen 0029 plasmid [49] that was digested with the corresponding restriction enzymes. The resulting recombinant plasmids were transformed in E. coli DH5α, and positive clones were subjected to Sanger genome walking sequencing. Genome assemblies were obtained using SeqMan (DNASTARInc, Madison, WI, USA) and SnapGene (GLS Biotech LLC, Chicago, IL, USA) software. All pairwise comparisons were performed using the MUSCLE algorithm implemented in Mega 7 [50] and maximum likelihood phylogenetic tree(s) were constructed on both begomoviral DNA components, with a 1000 bootstrap on both components to assess branch support. To analyze the nucleotide and amino acid identity, open reading frames (ORFs) were separated and individually compared with highest match homologous genome of each virus obtained from NCBI databank, using the ClustalW algorithm implemented in Mega 7.

Results and Discussion
It is an accepted notion that global warming will have an impact on global food security. In particular, crop yields are predicted to significantly decrease considering the "worst" CO 2 -emission scenario (A1FI) put forward by the Intergovernmental Panel on Climate Change [51]. Plant pathogens will have varying responses to climate change and plant-pathogen warfare is expected to be altered [52], imposing negative, neutral or positive effects on yields depending on the host-pathogen-environment tripartite interaction (known as the 'disease triangle'). Disease pattern changes are anticipated due to alterations in the host range of plant pathogens, especially in rapidly evolving pathogens, while disease severity will be influenced by increased CO 2 , heavy rains, increased humidity, drought, and warmer winter temperatures [53]. Studies towards the understanding of the existing genetic diversity of plant viruses occurring in the agro-ecological interface, the generation of new genomes with features advantageous to the pathogen, and the relationship with their corresponding vectors will contribute significantly to humankind's preparation to adapt to climate change and to sustain future food production. In this study, high-throughput sequencing (HTS) was employed to determine begomoviral diversity in plant samples collected in Northern-Pacific Mexico.

Non-Cultivated Plants from Northern-Pacific Mexico Region as a Reservoir of Begomoviruses
Plant viruses have generally been studied as disease-causing infectious agents that have a negative impact on their hosts [54]. Considering the wide diversity of geminiviruses, non-cultivated plants may serve as reservoirs for known, agriculturally relevant viruses: such weed-hosted viral species hold great potential to initiate the future emergence of new diseases in cultivated plants [55].
To determine begomoviral diversity in Northern-Pacific Mexico, a survey was performed in the agro-ecological interface in the vicinity of crop sites during 2012-2016. Sampling areas were divided in five regions: Baja California, Sonora, Sinaloa, Colima-Nayarit, and Coahulia-Durango, with three, four, seven, two, and four sampling points, respectively ( Figure 1a). A total of 422 non-cultivated plants (both symptomatic and asymptomatic), belonging to 34 families and 132 species were identified (Supplementary Table S1 and Supplementary Figure S1). Among the different families identified in each region, the most commonly distributed were the Astaraceae, Solanaceae, Malvaceae, and Fabaceae. It is worth mentioning that those families were found to be the most predominant in natural ecosystems considering 200 plant families described in Mexico [42,56]. Using degenerate universal primers based on the DNA A genome of the genus Begomovirus, we were able to amplify the expected DNA fragment of 950-1100 bp in 252 out of the 422 (60%) tested individual plant specimens, indicating that begomoviruses were present in 29 plant families collected from all five regions sampled (Table 1). These results suggest that a number of non-cultivated plants widely distributed in Northern-Pacific Mexico represent a reservoir for begomoviruses.

Metagenomics Study Reveals a Number of Geminviruses from Non-Cultivated Plants
Following the pipeline described in the Materials and Methods section, NGS resulted in 16 to 215 million reads for the five libraries. After human, bacteria, plant, and eukaryotic virus sequence depletion, an NCBI-GenBank database search was performed to identify the most closely related geminivirus sequences, obtaining between 6000 and 4.6 million reads ( Table 2). Subsequent annotation showed that more than 99% of the reads correspond to the genus Begomovirus, and the remaining 1% matches the Geminiviridae genera Curtovirus, Becurtovirus, Turncurtovirus, Topocuvirus and Mastrevirus. No sequences with homology to genera Capulavirus, Eragrovirus, and Grablovirus were identified in our study. The genus Begomovirus has been reported previously as the most widely distributed in Mexico [3,4,16,18,[57][58][59]; in addition, sporadic reports on other genera like Curtovirus and Grablovirus were described [60,61]. Our data pointed out the abundance and importance of the genus Begomovirus in Mexico; however, follow up studies of the other genera are imperative. The de novo assembly of geminivirus-related reads resulted in 24,289 geminivirus-related contigs ranging from 78 to 2858 bp in length ( Table 2). The generated contigs were used to search the NCBI-GenBank database in order to identify the most closely related geminivirus exemplars at the species level (Supplementary Figure S2). Table 3 shows all geminivirus-related signatures ≥300 bp and with at least 80% similarity at the nucleotide level with the best match, regardless whether DNA A or B viral components were detected. Similar findings were described in a metagenomics analysis in whiteflies, in which only one DNA component of a bipartite begomovirus was retrieved [62]. Additionally, the geminivirus-related signatures with 100-300 bp and/or below <80% similarity at the nucleotide level with the best match in NCBI gene sequences are listed in Supplementary Table S2. It is important to note that short geminivirus-related signatures could hinder the correct classification of a begomovirus species or strain; nonetheless, profiling the phylogenetic composition of the viral communities is pivotal as a significant part of the different plant-virus-environment interaction.
The  (Table 3). Furthermore, nine non-cultivated plant-adapted viruses included only begomoviruses with bipartite genomes such as Solanum mosaic Bolivia virus (SoMBoV-signature), present in two regions, Sida mosaic Sinaloa virus (SiMSiV-signature) present in five regions, Sida golden yellow spot virus (SiGYSV-singnature) present in one region, Malvastrum bright yellow mosaic virus (MaBYMV-signature) present in three region, Rhyncosia golden mosaic virus (RhGMV-signature) present in five regions, and Rhyncosia golden mosaic Sinaloa virus (RhGMSV-signature) present in two regions, Euphorbia mosaic virus (EuMV-signature) present in three regions, Euphorbia yellow mosaic virus (EuYMV) present in one region and Blechum leaf curl virus (BlelCV-signature) present in one region, (Table 3). Interestingly, the observation of Tomato pseudo-curly top virus (TPCTV) in samples from Colima-Nayarit is the first report of the genus Topocovirus in Mexico. The list of geminiviruses are grouped as a potential of new viruses or strain of the best match virus, with molecular and biological validation being necessary.
The genus Begomovirus comprises the most common DNA viruses responsible for several virus-associated plant diseases in Mexico. Among them PHYVV, an endemic virus, and PepGMV have been documented as the most widespread and predominant in pepper crops [5,14,57,63]. It is noteworthy that the introduction of the promiscuous TYLCV in Yucatán and later in Sinaloa states, with a dramatic impact on crop yield, became the major concern for tomato crops in Northern Mexico [64]. Interestingly, TYLCV did not exclude the "native" viruses, and its co-infection with PHYVV or PepGMV caused severe disease in pepper crops [65]. However, the presence of such viruses in non-cultivated plants, as alternative hosts in nature, increases the chances of viral evolution through recombination events or other mechanisms, representing a latent threat. In fact, a new isolate of PHYVV described in pepper had significant sequence changes on its DNA B genome, which confers a modified host range since this isolate was able to infect tomato plants causing severe symptoms [5,66]. The other invasive virus which was introduced in Sonora state, apparently from the Middle East, was WmCSV [6], and it was detected in the present study along with SLCV. Our results suggest that WmCSV and SLCV are present in non-cultivated plants collected in Coahuila-Durango region (Table 3), which implies that WmCSV virus has the potential to spread in Mexico, and by adapting to new environments it has the potential to initiate an emerging disease in a new region. It is important to mention that both viruses could interact in mixed infection inducing more severe symptoms on crops as reported in Jordan [67]. The detection of ToChLPV, ToSLCV, OYMMV, BCaMV, and BYMMV, that were reported previously in Mexico, indicate that they still occupy an ecological niche and could be a potential source of viral disease. The identification of SiMSiV and RhGMV in all regions sampled is intriguing. Perhaps both of them represent viruses that are well adapted to different hosts and environments with a potential risk to evolve into an emerging disease. SiMSinV was initially reported in Sinaloa state associated to Sida rombifolia [17] without a negative impact on crops described to date. On the other hand, RhGMV was previously reported as the cause of disease in tobacco and soybean [4,18]. It is well known that viruses adapted to non-cultivated plants do not normally induce disease symptoms in their hosts. Nonetheless, SiMSiV and RhGMV induce symptoms in their corresponding first reported hosts (Sida rombifolia and Rhyncosia minima, respectively), with different wild species (acting as reservoirs) being suspected as the origin of the inoculum. Moreover, EuMV and EuYMV were reported previously in Euphorbia heterophylla in Mexico and Brazil [68,69]; whereas BlelCV, is a novel virus recently described in Chiapas state [70]. To the best of our knowledge, the ChiLCV, PepLRV, ToYSV, PYMV, CabLCV, SPLCV, MaBYMV, and ViYMV geminivirus-related signatures have not been previously described and/or associated to disease in Mexico and represent potential strains and/or novel viruses whose biological role needs to be determined in the immediate future. It is worth mentioning that viruses like ChilCV, PepLRV, and ToYSV are already associated with pepper, bean, and tomato diseases in Pakistan, Peru, Ecuador and Brazil [71][72][73].

Molecular Validation of the Predominant Begomoviruses Identified by HTS
The ecology of begomoviruses identified by HTS studies in non-cultivated plants requires more effort in order to understand the contribution of these plants for disease development [74][75][76]. Non-cultivated plants could be a source of viral inoculum to cultivated plants [77][78][79][80][81][82][83] and could contribute to viral evolution. Here, we described some non-cultivated plants at the agro-ecological interface carrying geminivirus-signatures (Table 3 and Supplementary Table S2). The information acquired represents important progress towards elucidating the above-mentioned issues, but more work is needed for the validation of the described identities.
According to our survey, plants belonging to the Fabaceae, Malvaceae, and Solanaceae families are the most widely distributed in all sampled areas. The HTS analysis revealed that signatures corresponding to TYLCV, SiMSiV, and RhGMV were present in all five NGS libraries, whereas the RhGMSV-signature was detected only in two out of five NGS libraries, suggesting that these species of begomovirus are predominant. Initially, the presence of these four viruses was confirmed by using sequence-specific primers for each viral species (Supplementary Table S3), testing an individual plant of the corresponding family for viral presence. To obtain the full-length viral genome of detected viruses, total DNA of Nicotiana glauca from Sinaloa (TYLCV PCR-positive); of Sida acuta from Colima (SiMSiV PCR-positive), and two Rhynchosia minima both from Sinaloa (PCR-positive for RhGMV/RhGMSV), were used for RCA amplification and cloning. Sequence analysis of obtained clones is summarized in Tables 4 and 5.

Ecogenomic Analyis of Predominant Begomoviruses
The metagenomic approach carried out in the present work, allowed us to identify the geminiviral diversity present in different plant families and geographical regions; however, it is crucial from an ecological point of view to determine the occurrence and dynamics of the viral community in non-cultivated plants.
The evolution of plant viruses is a complex process involving multiple ecological and genetic factors resulting in host-pathogen co-evolution. Studies on viral diversity have been documented in a wide number of non-cultivated plants, with new entries described either as different strains of existing viruses or as new viruses [32,62,68,[102][103][104][105][106]. However, it is noteworthy that viral species are often co-infecting the same plants with the possibility of genetic interaction, giving raise to inter-or intra-specific recombinant viruses resulting in more severe strains, as in the case of cassava mosaic diseases (CMD) [104,107] and tomato yellow curl diseases (TYLCD) [23][24][25]108]. Plant disease emergence requires that a virus from a reservoir host invades a new host resulting in new infection dynamics and viral adaptation [109]. In this sense, some geminiviruses acquired the ability to form a complex disease by assorting DNA A and B genomes, like ACMV and EACMV in Uganda [107] or those cases where the DNA A genome is similar to bipartite begomoviruses but the DNA B has not yet been described, such as TChLPV and ToSLCV in Mexico [7,58] or Datura leaf curl virus (DaLCV) in Sudan [110]. In the present work, the geminiviral diversity was described in different Mexican regions, showing that TYLCV, SiMSinV, PHYVV, RhGMV and RhGMSV are the predominant viruses. In addition, different multiviral complexes were detected in several plant species, highlighting the high frequency of mixed infections detected ( Figure 3). However, a relevant issue is the wide host range observed for these viruses and that the genetic structure of the virome could be modified in an unpredictable manner. Moving forward, work is in progress aiming to determine whether new viruses or strains constitute a potential risk for Mexican agriculture.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/11/7/594/s1, Figure S1: Representation of non-cultivated plants belonging to predominant plant families collected in natural ecosystem in five regions of northern-pacific Mexico, Figure S2: PCR detection of five geminivirus species in three predominant plant families of non-cultivated plants from northern-pacific Mexico, Table S1: Non-cultivated plants collected from northern-pacific regions of Mexico and determination of begomovirus host by PCR-test, Table S2: Geminivirus signatures obtained by de novo assembly from the metagenomic study, Table S3: List of PCR primers used in the present work title.