Begomovirus-Associated Satellite DNA Diversity Captured Through Vector-Enabled Metagenomic (VEM) Surveys Using Whiteflies (Aleyrodidae)

Monopartite begomoviruses (Geminiviridae), which are whitefly-transmitted single-stranded DNA viruses known for causing devastating crop diseases, are often associated with satellite DNAs. Since begomovirus acquisition or exchange of satellite DNAs may lead to adaptation to new plant hosts and the emergence of new disease complexes, it is important to investigate the diversity and distribution of these molecules. This study reports begomovirus-associated satellite DNAs identified during a vector-enabled metagenomic (VEM) survey of begomoviruses using whiteflies collected in various locations (California (USA), Guatemala, Israel, Puerto Rico, and Spain). Protein-encoding satellite DNAs, including alphasatellites and betasatellites, were identified in Israel, Puerto Rico, and Guatemala. Novel alphasatellites were detected in samples from Guatemala and Puerto Rico, resulting in the description of a phylogenetic clade (DNA-3-type alphasatellites) dominated by New World sequences. In addition, diverse small (~640–750 nucleotides) satellite DNAs similar to satellites associated with begomoviruses infecting Ipomoea spp. were detected in Puerto Rico and Spain. A third class of satellite molecules, named gammasatellites, is proposed to encompass the increasing number of reported small (<1 kilobase), non-coding begomovirus-associated satellite DNAs. This VEM-based survey indicates that, although recently recovered begomovirus genomes are variations of known genetic themes, satellite DNAs hold unexplored genetic diversity.


Introduction
Begomoviruses (family Geminiviridae) and their associated satellite DNAs, known as alphasatellites and betasatellites, form complexes that cause devastating diseases in agricultural systems [1,2,3,4]. These begomovirus-satellite complexes infect a wide range of dicotyledonous plants within at least 37 different genera in 17 families, spanning vegetable and fiber crops, ornamentals […] (BCTV), a curtovirus (Geminiviridae) not known to naturally associate with satellite molecules, with betasatellites complementing host defense suppression mechanisms of this non-cognate helper virus [20,25]. Despite the observed replicative promiscuity of betasatellites, phylogenetic analyses indicate that these molecules group according to the host from which they were originally isolated [12] and suggest that adaptation of betasatellites to their cognate helper begomoviruses for replication results from co-evolution [26]. In contrast, alphasatellites do not appear to have affinity for specific helper begomoviruses and, thus, may be relatively mobile [7,11]. For example, OW bipartite begomoviruses that are not naturally associated with alphasatellites can facilitate the systemic movement of these molecules in host plants [7,27,28]. Moreover, in laboratory experiments, the curtovirus BCTV successfully mediated the systemic movement of an ageratum yellow vein virus-associated alphasatellite and allowed the transmission of this satellite by the BCTV leafhopper vector, thus implying trans-encapsidation of the alphasatellite [27].
The formation of new disease complexes through the recruitment of satellite DNAs by unrelated begomoviruses during mixed infections presents a serious concern for agriculture [1,2,29]. Furthermore, a recent study described the detection of an unprecedented mastrevirus-alphasatellite-betasatellite complex in symptomatic wheat plants collected from fields in India, highlighting that satellite DNAs may also be naturally recruited by non-begomovirus species [30]. In addition, alphasatellite and betasatellite molecules may recombine, resulting in the emergence of new satellite DNAs [31]. The promiscuous nature of alphasatellites and betasatellites, coupled with the adaptive potential of begomoviruses due to their genome plasticity [32,33], may lead to the emergence of new damaging begomovirus-satellite complexes and pose a risk to agricultural regions not yet impacted by these disease complexes [2,4,11]. Therefore, it is critical to gain a better understanding of the diversity and distribution of satellite DNAs.
This study surveyed begomoviruses and associated satellite DNAs present in whiteflies collected from multiple crops and native vegetation in several countries through vector-enabled metagenomics (VEM, in which virus particles are purified and sequenced directly from insect vectors [34,35,36]). The VEM approach led to the detection of begomoviruses, including novel species, in all locations [37]. Here we discuss the satellite DNAs identified through the VEM survey, which include several novel alphasatellites in the NW. In addition, this study expands a class of small satellite DNAs similar to previously sequenced satellites associated with begomoviruses infecting Ipomoea spp. Genomic features of these Ipomoea satellites are described and compared to the increasing number of reported small (<1 kilobase (kb)), non-coding begomovirus-associated satellite DNAs, named here gammasatellites.

Whitefly Collection, Sample Processing, Metagenomic Sequencing and Data Analysis
This study analyzes data from a recent VEM study investigating begomovirus diversity using whiteflies; detailed methods have therefore been published previously [37]. Briefly, adult whiteflies were collected from various crop fields and uncultivated native vegetation in geographically distant locations (Guatemala, Israel, Puerto Rico, Spain, and United States) using battery-operated vacuums and manual aspirators (Table 1). Whiteflies were collected from weeds (Solanum nigrum) as well as the following crop plants: bean (Phaseolus vulgaris), eggplant (Solanum melongena), pumpkin (Cucurbita maxima), squash (Cucurbita pepo), and tomato (Solanum lycopersicum). Whitefly specimens (100-350 whiteflies per field site) were homogenized in SM buffer (50 mM Tris-HCl, 10 mM MgSO4, 0.1 M NaCl, pH 7.5) and filtered through a 0.22 µm Sterivex filter (Millipore, Billerica, MA, USA) to partially purify virus particles before DNA extraction and sequencing. Sterivex filters were stored at −80 °C and used for a PCR assay, based on the mitochondrial cytochrome c oxidase I gene, to distinguish between whitefly species and Bemisia tabaci phylogenetic groups [38]. DNA was extracted from 200 µL of filtrate using the QIAamp MinElute Virus Spin Kit (Qiagen, Valencia, CA, USA) following the manufacturer's instructions and used as template for rolling circle amplification (RCA) with the illustra TempliPhi DNA Amplification Kit (GE Healthcare, Little Chalfont, Buckinghamshire, UK) to enrich for small circular genomes and DNA molecules [39]. Six replicate RCA reactions were performed for each sample. RCA replicates for each sample were pooled and sequenced with multiplexing at a commercial facility on the 454 GS FLX System (Roche, Indianapolis, IN, USA).
Metagenomic reads (average read length 263 nt) from each sample were dereplicated using default settings in the CD-HIT web server [40]. Only reads longer than 100 nt were used for assembly (minimum identity of 98% over 35 nt) in Geneious version R7 (Biomatters, Newark, NJ, USA). Both contigs and unassembled reads were compared against the GenBank non-redundant database using BLASTn and BLASTx (e-value < 0.001) [41]. Sequences with matches to begomovirus-associated sequences, including satellite DNAs, were identified and sorted using the Metagenome Analyzer (MEGAN4) software [42]. Analyzed contigs and unassembled reads generated from the different libraries are publicly available on the METAVIR web server [43] under the project names "Whiteflies" and "Whiteflies_Unassembled".
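As an illustration of the read pre-processing described above, the following Python sketch keeps reads longer than 100 nt, removes duplicate sequences, and filters BLAST hits at e-value < 0.001. It is a hypothetical simplification, not the pipeline used in this study: CD-HIT clusters reads by sequence identity, whereas this sketch only collapses exact duplicates (treating a read and its reverse complement as the same), and the BLAST parser assumes the default 12-column tabular output (`-outfmt 6`), in which the e-value is the eleventh column.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    seq: str


COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def filter_and_dereplicate(reads, min_length=100):
    """Keep reads longer than min_length nt, dropping exact duplicates.

    A read and its reverse complement count as duplicates. This is a
    simplification of identity-based clustering such as CD-HIT.
    """
    seen = set()
    kept = []
    for read in reads:
        seq = read.seq.upper()
        if len(seq) <= min_length:
            continue
        key = min(seq, revcomp(seq))  # strand-independent key
        if key in seen:
            continue
        seen.add(key)
        kept.append(read)
    return kept


def significant_hits(tabular_lines, max_evalue=1e-3):
    """Return (query, subject, e-value) for BLAST tabular hits below max_evalue.

    Assumes the default 12-column -outfmt 6 layout (e-value in column 11).
    """
    hits = []
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        evalue = float(cols[10])
        if evalue < max_evalue:
            hits.append((cols[0], cols[1], evalue))
    return hits
```

Keeping the first read of each duplicate set preserves input order, which makes the filtered set easy to trace back to the original reads.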

Satellite DNA Molecule Completion
Contigs and/or unassembled reads potentially representing new begomovirus-associated satellite DNAs, including alphasatellites (<83% sequence identity to known species [44]), betasatellites (<78% sequence identity to known species [45]), and small (<1 kb) satellite DNAs, were used to design back-to-back (abutting) primers for inverse PCR assays to recover full-length DNAs. Inverse PCRs were performed using Herculase II Fusion DNA Polymerase (Agilent Technologies, Santa Clara, CA, USA), and products were cloned using the CloneJET PCR Cloning Kit (Thermo Scientific, Waltham, MA, USA). All clones were commercially Sanger sequenced with a minimum of 2× coverage using vector primers and primer walking. Satellite DNAs were assembled using Sequencher (Gene Codes Corporation, Ann Arbor, MI, USA), and final sequences were inspected using SeqBuilder from the Lasergene software package (DNASTAR, Madison, WI, USA). Open reading frames (ORFs) encoding putative proteins >70 amino acids were compared against the GenBank non-redundant database for annotation purposes.
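Because satellite DNAs are circular, back-to-back primers can be placed anywhere on a contig: the forward primer starts at a chosen junction and the reverse primer is the reverse complement of the sequence immediately upstream, so the two 5′ ends abut and the primers extend away from each other around the circle. The Python sketch below illustrates only that geometry. The function names, the 22 nt default primer length, and the GC helper are illustrative assumptions; a real design would also check melting temperature, secondary structure, and specificity.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def circular_slice(seq: str, start: int, length: int) -> str:
    """Take `length` bases from a circular sequence, wrapping past either end."""
    n = len(seq)
    return "".join(seq[(start + i) % n] for i in range(length))


def abutting_primers(circ_seq: str, junction: int, length: int = 22):
    """Hypothetical helper: back-to-back primers whose 5' ends meet at `junction`.

    Forward primer: bases junction .. junction+length-1 on the given strand.
    Reverse primer: reverse complement of the `length` bases just upstream.
    """
    fwd = circular_slice(circ_seq, junction, length)
    rev = revcomp(circular_slice(circ_seq, junction - length, length))
    return fwd, rev


def gc_fraction(primer: str) -> float:
    """Fraction of G or C bases, a quick screen before checking Tm."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)
```

Because the slices wrap, a junction near the start or end of a linearized contig still yields valid primers; the inverse PCR product then spans the full circle.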

Satellite DNA Sequence Analysis
Satellite DNA sequences were compared against reported sequences in GenBank using pairwise comparisons, multiple sequence alignments, and phylogenetic trees. All pairwise comparisons were performed using the MUSCLE algorithm [46] implemented in the Species Demarcation Tool (SDT) version 1.2 [47]. Multiple sequence alignments were performed using the MUSCLE algorithm implemented in the MEGA5 software [48] and edited manually. A maximum likelihood (ML) phylogenetic tree was constructed to evaluate the relationship among divergent alphasatellite sequences reported here and those found in the database (n = 84). For this purpose, predicted Rep amino acid sequences were aligned and a ML phylogenetic tree was constructed with the best-fit model (LG + I + G + F) according to ProtTest [49] using the PhyML server [50] with the approximate likelihood ratio test (aLRT) to assess branch support [51]. Branches with <70% support were collapsed using TreeCollapserCL4 [52].
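TreeCollapserCL4 was used to collapse poorly supported branches. The underlying operation is simple: any internal branch whose support falls below the cutoff (70% aLRT here) is removed, and its descendants are attached directly to the parent node, producing a polytomy. The Python sketch below reimplements that idea on a minimal, hypothetical tree structure; it is not TreeCollapserCL4 itself, and the Newick writer is included only to show the result.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    support: Optional[float] = None  # support (%) of the branch above this node
    children: List[Node] = field(default_factory=list)


def collapse_weak_branches(node: Node, threshold: float = 70.0) -> Node:
    """Remove internal branches with support < threshold, creating polytomies."""
    new_children = []
    for child in node.children:
        collapse_weak_branches(child, threshold)  # process the subtree first
        weak = child.children and child.support is not None and child.support < threshold
        if weak:
            new_children.extend(child.children)  # splice grandchildren into parent
        else:
            new_children.append(child)
    node.children = new_children
    return node


def _to_newick(node: Node) -> str:
    if not node.children:
        return node.name
    inner = ",".join(_to_newick(c) for c in node.children)
    label = "" if node.support is None else f"{node.support:g}"
    return f"({inner}){label}"


def newick(root: Node) -> str:
    """Newick string with internal support values as node labels."""
    return _to_newick(root) + ";"
```

For example, a clade with 60% support nested inside a 95% clade is dissolved, so `((A,(B,C)60)95,D);` becomes `((A,B,C)95,D);`.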

Overview
VEM was used to investigate the diversity of begomovirus-associated satellite DNAs circulating in various crop fields and uncultivated vegetation from four countries (Guatemala, Israel, Spain, continental United States (California)) and an island in the Caribbean (Puerto Rico) (Table 1). Pyrosequencing of RCA-amplified viral nucleic acids from whiteflies and BLAST analysis of assembled contigs and unassembled sequence reads allowed for the detection of satellite DNAs in eight out of fifteen metagenomic datasets, with satellites present in all locations except California (USA) (Table 1). Whiteflies sampled from most crop plants, except for beans, revealed the presence of satellite DNAs in at least one location. Satellite DNAs were not detected in whiteflies collected from weeds; however, these samples originated only from a single species (Solanum nigrum) in a single location. Seventy-two circular satellite-like DNAs, ten of which were verified through inverse PCR, were assembled and completed (Table 2). Sixty-two of these completed satellite DNAs represent a novel class of small satellites similar to previously deposited satellites associated with begomoviruses infecting Ipomoea spp. (unpublished; GenBank accession numbers FJ914390-FJ914405); we refer to them as Ipomoea satellites. We propose naming these small (<1 kb), non-protein-coding satellite DNAs "gammasatellites" to distinguish them from the previously described alphasatellites and betasatellites.
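The totals above can be cross-checked against the per-category counts reported in the following sections. The Python tally below uses only numbers stated in this article; the category labels are ours, and the Israeli CLCuGe alphasatellite and the Spanish Ipomoea satellites are entered as not PCR-verified because no verification is reported for them.

```python
# Completed satellite DNAs per category (counts as reported in the Results)
completed = {
    "CLCuGe alphasatellite, Israel": 1,
    "CLCuGe betasatellites, Israel": 3,
    "VEM alphasatellites 1 and 2, Guatemala": 3,
    "VEM alphasatellite 3, Puerto Rico": 3,
    "Ipomoea satellites, Spain": 10,
    "Ipomoea satellites, Puerto Rico": 52,
}

# Molecules reported as verified by PCR in each category
verified = {
    "CLCuGe alphasatellite, Israel": 0,  # no verification reported
    "CLCuGe betasatellites, Israel": 1,
    "VEM alphasatellites 1 and 2, Guatemala": 3,  # all NW alphasatellites verified
    "VEM alphasatellite 3, Puerto Rico": 3,
    "Ipomoea satellites, Spain": 0,  # no verification reported
    "Ipomoea satellites, Puerto Rico": 3,
}

total_completed = sum(completed.values())
total_verified = sum(verified.values())
ipomoea = sum(v for k, v in completed.items() if k.startswith("Ipomoea"))

print(total_completed, total_verified, ipomoea)  # 72 10 62
```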

Detection of Protein-Encoding Satellite DNAs and Identification of a New Alphasatellite Clade
Protein-encoding satellite DNA molecules, including alphasatellites and betasatellites, were identified in whiteflies collected in Israel, Puerto Rico, and Guatemala (Table 2). Whiteflies collected from squash in Israel contained an alphasatellite molecule that was most similar (96% identity) to a cotton leaf curl Gezira (CLCuGe) virus-associated alphasatellite initially identified in Burkina Faso, West Africa [53]. The VEM CLCuGe alphasatellite (1359 nt) exhibits a genome organization similar to that of the African alphasatellite. In addition, multiple contigs from the Israel squash dataset had similarities to betasatellite molecules associated with monopartite begomoviruses infecting okra, named CLCuGe betasatellites or CLCuGeB [45]. Three contigs representing CLCuGeB circular DNAs were assembled, one of which was verified by PCR. These VEM CLCuGeB sequences shared 96%-98% pairwise identity with CLCuGeB molecules identified in Egypt. Although CLCuGe virus was identified in the same metagenomic dataset from Israel [37], it is difficult to establish a direct association between these satellite molecules and their helper viruses in individual host plants. Most of the sequences similar to the VEM CLCuGe alphasatellite assembled into a single contig. In contrast, there were multiple contigs and unassembled sequence reads similar to CLCuGeB, indicating higher genetic variability for betasatellites than for alphasatellites at this location. This is consistent with studies of begomovirus-satellite complexes in crops that describe higher variability of betasatellites than of alphasatellites (e.g., [5,8,53]).
Table 2 notes: contig sequences representing complete Ipomoea satellites from Puerto Rico and Spain are provided in Supplemental File S1. (b) Genome-wide pairwise identities are given for the best match in GenBank; MeCMV, melon chlorotic mosaic virus; SPLCLaV, sweet potato leaf curl Lanzarote virus. (**) The best GenBank match to VEM Ipomoea begomovirus satellite 3 was a portion of the Rep-encoding region of the SPLCLaV genome; the reported identity applies only to this small region (8% coverage).
The VEM approach also enabled the detection of novel alphasatellite molecules in Guatemala and Puerto Rico, which exhibit typical features of alphasatellite DNAs (Table 2, Figure 1A). Since alphasatellites have rarely been found in the NW [9,10] and the detected sequences are divergent from known alphasatellites, all NW alphasatellites were verified by PCR. The alphasatellites detected in Guatemala represent the first alphasatellite DNAs reported from Central America. Three sequences representing two species of alphasatellite molecules, named VEM alphasatellites 1 and 2, were sequenced from one of the Guatemala tomato field sites. The VEM alphasatellite 1 sequence shares approximately 64% identity with that of VEM alphasatellite 2, and both satellite molecules are most similar (~70%-75% identity) to alphasatellite molecules from India. In addition, three alphasatellite molecules representing a single species, named VEM alphasatellite 3, were sequenced from samples collected from tomato and pumpkin fields in Puerto Rico. The VEM alphasatellite 3 molecules range in size from 1300 to 1307 nt and share 93%-99% identity with each other. These molecules are most similar (~85% identity) to an alphasatellite molecule, Cuban alphasatellite 1, recently identified in weeds from Cuba [54]. In general, VEM alphasatellite 3 has an organization similar to that of Cuban alphasatellite 1, with one major ORF encoding a Rep and an adenine-rich region. However, Cuban alphasatellite 1 has two additional predicted ORFs with coding capacity >100 amino acids, one overlapping the Rep ORF and another on the complementary strand. All of the VEM alphasatellite 3 molecules also have the ORF overlapping the Rep; however, an ORF on the complementary strand similar in sequence to the one in Cuban alphasatellite 1 was not identified in any of the VEM alphasatellite 3 DNAs.
Figure 1. Schematics depicting general genome organization and features observed in novel alphasatellites (A) and Ipomoea satellites (B) detected during this study. All genomes are characterized by a putative origin of replication (ori) marked by a stem-loop structure containing a conserved nonanucleotide motif. Alphasatellites encode a replication-associated protein (Rep) and exhibit an adenine-rich (A-rich) region. Ipomoea satellites do not exhibit any coding regions; however, these gammasatellites contain an A-rich region as well as a conserved region ~100 nt long shared among various gammasatellites (γSCR). The γSCR contains a 23 nt stretch where all gammasatellites reported to date share high similarity with betasatellites, named here the satellite common region (SCR) core. Some Ipomoea satellites from Puerto Rico share high identities with a 66 nt stretch (highlighted in black) found within the rep gene of sweet potato leaf curl virus (SPLCV) genomes detected in the same region.
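The ORF comparisons above (a Rep ORF, plus additional ORFs of >100 amino acids overlapping rep or on the complementary strand) and the >70 amino acid annotation cutoff in the Methods both amount to scanning both strands of a circular molecule for open reading frames, allowing frames that run across the arbitrary sequence origin. The Python sketch below is a hypothetical helper written for illustration, not the SeqBuilder algorithm used in this study; it assumes ATG starts and the standard stop codons, and reports minus-strand starts in reverse-complement coordinates.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs_circular(seq: str, min_aa: int = 70):
    """Return (strand, start, length_aa) for ORFs longer than min_aa codons.

    Only the longest ATG-initiated ORF ending at each stop codon is kept.
    Minus-strand starts are positions in the reverse-complement string.
    """
    seq = seq.upper()
    n = len(seq)
    best = {}  # (strand, stop position on the circle) -> (start, length_aa)
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        doubled = s + s  # lets an ORF run across the origin of the circle
        for start in range(n):
            if doubled[start:start + 3] != "ATG":
                continue
            # Read in frame for at most one turn around the circle
            for pos in range(start, start + n - 2, 3):
                if doubled[pos:pos + 3] in STOP_CODONS:
                    length_aa = (pos - start) // 3  # codons before the stop
                    key = (strand, pos % n)
                    if length_aa > min_aa and length_aa > best.get(key, (0, -1))[1]:
                        best[key] = (start, length_aa)
                    break
    return sorted((strand, start, aa) for (strand, _), (start, aa) in best.items())
```

Doubling the sequence is a simple way to honor circularity: rotating the input (opening the circle at a different position) changes the reported start coordinate but not which ORFs are found.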
Phylogenetic analysis of predicted alphasatellite Rep amino acid sequences indicates that the NW alphasatellites reported here and those reported from South America [9,10] and the Caribbean [54,55] do not all cluster together (Figure 2). However, the Rep sequences of most NW alphasatellites share >73% amino acid pairwise identity and form a clade. This "NW alphasatellite-dominated" clade is most closely related (51%-55% amino acid identity) to ageratum yellow vein Singapore alphasatellite (AYVVSGA) molecules from Singapore and Oman, which were originally described as DNA-2 [21,27]. These "DNA-2-type" alphasatellites are phylogenetically distinct from other OW alphasatellites, formerly known as DNA-1 [27]. Since the "NW alphasatellite-dominated" clade is also distinct from DNA-2-type alphasatellites, we have designated members of this clade as DNA-3-type alphasatellites. In addition to alphasatellites identified in the NW, the DNA-3-type alphasatellite clade contains two unique Rep sequences from alphasatellites reported from India, including croton yellow vein mosaic alphasatellite, which is closely related to alphasatellites reported from Brazil and Venezuela [56]. Two NW alphasatellite sequences are only distantly related to the DNA-3-type alphasatellite clade (shown in bold font in Figure 2) and are instead more closely related to DNA-1-type alphasatellites from the OW. One of these sequences was detected in whiteflies collected from tomato plants in Guatemala (VEM alphasatellite 2), and the other was reported from dragonflies collected in an agricultural field in Puerto Rico (Dragonfly-associated alphasatellite; [55]). Therefore, these data indicate that more than one "type" of alphasatellite molecule is present in the NW.
Figure 2. Midpoint-rooted maximum likelihood phylogenetic tree of predicted alphasatellite and nanovirus replication-associated protein sequences. Alphasatellites detected during this study are highlighted with a red star. Three groups of alphasatellites based on amino acid pairwise identities are highlighted in shades of grey: DNA-1-type alphasatellites from the Old World, DNA-2-type alphasatellites, and DNA-3-type alphasatellites mainly identified in the New World. Branches exhibiting less than 70% support were collapsed. Branches with approximate likelihood ratio test (aLRT) support >91% are indicated with black circles, whereas branches exhibiting 85%-90% support are marked with a white circle. A list of sequences used for phylogenetic analysis and Rep pairwise identities is provided in Supplemental File S2.
DNA-2- and DNA-3-type alphasatellites share characteristics that are absent in other alphasatellites. One of the first alphasatellites detected in the NW, the DNA-3-type melon chlorotic mosaic virus alphasatellite, contains a putative ORF encoding >100 amino acids that overlaps the rep gene [9]. All DNA-2-type and DNA-3-type alphasatellites except euphorbia mosaic alphasatellite (Brazil) contain this putative ORF, whereas this ORF is not present in alphasatellites outside this group. It is currently unknown whether this ORF is expressed and what role, if any, its product plays in infectivity. Finally, DNA-2- and DNA-3-type alphasatellites have a 13 amino acid insertion in the Rep C-terminus that is absent in all other alphasatellite Reps. This unique 13 amino acid stretch was originally noted in alphasatellites from Brazil and Singapore [10]; however, our analysis suggests that this insertion is a common feature of DNA-2- and DNA-3-type alphasatellite Reps.
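The type boundaries above rest on pairwise identities between aligned Rep sequences. The Python sketch below shows one common convention: identity is counted only over alignment columns where neither sequence has a gap, which is our understanding of how SDT scores pairs. Other tools treat gaps differently, so values are not interchangeable between programs. The sequences in the usage example are toy fragments, not real Reps.

```python
def pairwise_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * (1 - mismatches / compared)


# Toy aligned fragments (hypothetical, not real Rep sequences)
pid = pairwise_identity("MKV-LLA", "MKVQLIA")
print(round(pid, 1))  # 83.3
```

Here one gapped column is skipped and one of the remaining six columns differs, giving 5/6 identity.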

Detection of Non-Coding Small Satellite DNAs (Gammasatellites)
A novel class of small (~640-750 nt) satellite DNAs associated with begomoviruses infecting Ipomoea spp., named Ipomoea satellites, was detected in samples from Spain and Puerto Rico (Table 2). Non-coding begomovirus-associated satellite DNAs smaller than 1 kb have been increasingly identified in the past few years [35,57]. Here we propose that these small non-coding satellite DNAs represent a third class of satellite DNAs, named "gammasatellites", that are distinct from alphasatellites and betasatellites, which are both protein-coding satellite DNAs >1 kb. Ten satellite sequences assembled from the tomato and squash datasets from Spain had high pairwise identities (>94%) with Ipomoea satellite DNA sequences previously reported from Spain (accession numbers FJ914390-FJ914405). Additionally, fifty-two satellite DNA sequences assembled from the four Puerto Rico datasets shared more than 91% identity with each other but only 60% to 70% similarity to Ipomoea satellite DNA sequences from Spain (Figure 3A). Since little is known about Ipomoea satellites, three molecules recovered from Puerto Rico were verified through PCR and compared to assembled, full-length Ipomoea satellite sequences to identify conserved features.
Ipomoea satellites reported to date contain several conserved features, including the key characteristics of gammasatellites: a size <1 kb and the lack of recognizable ORFs (Figure 1B). Ipomoea satellites, like other gammasatellites reported to date, exhibit a putative ori marked by the canonical nonanucleotide motif TAATATTAC at the apex of a putative stem-loop structure similar to that found in begomoviruses and betasatellites. Similar to other begomovirus-associated satellites [2,11], Ipomoea satellites contain an adenine-rich region. Ipomoea satellites from Spain have ~50% adenines over a 200 nt stretch, whereas this region in satellites from Puerto Rico encompasses 60-100 nt. In addition, all Ipomoea satellites contain a region where they share more than 80% identity over 100 nt (Figures 1B and 3B). This conserved region is also found in other gammasatellites reported from tomato crops in Australia (tomato leaf curl virus satellite, ToLCV-sat; [58]) and weeds in the Philippines (malvastrum leaf curl virus-associated satellite, MaLCV-sat; accession number KF433066); however, this region is absent in gammasatellites reported from Florida through VEM (VEM-sats) [35] and from weeds in Cuba [57]. Pairwise analysis of full-length gammasatellite molecules indicates that molecules containing the conserved 100 nt stretch share higher identities with each other than with those that lack this stretch (Figure 3A).
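The ori and A-rich features described above can be located computationally. The Python sketch below searches a circular sequence for the nonanucleotide TAATATTAC (including matches that span the sequence origin) and reports the window with the highest adenine content. It is illustrative only: it scans a single strand, does not test for the flanking stem-loop, and the 200 nt default window simply mirrors the description of the Spanish Ipomoea satellites.

```python
NONANUCLEOTIDE = "TAATATTAC"


def find_motif_circular(seq: str, motif: str = NONANUCLEOTIDE):
    """Start positions of `motif` on a circular sequence (single strand)."""
    seq, motif = seq.upper(), motif.upper()
    extended = seq + seq[:len(motif) - 1]  # allow matches across the origin
    return [i for i in range(len(seq)) if extended[i:i + len(motif)] == motif]


def richest_adenine_window(seq: str, window: int = 200):
    """(start, adenine fraction) of the most A-rich window on a circle."""
    seq = seq.upper()
    n = len(seq)
    if window > n:
        raise ValueError("window longer than sequence")
    extended = seq + seq[:window - 1]
    count = extended[:window].count("A")
    best_start, best_count = 0, count
    for i in range(1, n):
        # Slide by one position: drop base i-1, add base i+window-1
        count += (extended[i + window - 1] == "A") - (extended[i - 1] == "A")
        if count > best_count:
            best_start, best_count = i, count
    return best_start, best_count / window
```

The sliding count updates in constant time per step, so scanning a ~700 nt satellite is trivial; the window size is the main parameter to adjust for the shorter (60-100 nt) A-rich regions seen in the Puerto Rico satellites.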
Since more than one type of satellite DNA seems to share a satellite common region (SCR), which has traditionally been associated with betasatellites [11], here we distinguish between the SCRs of gammasatellites and betasatellites. We refer to the common region spanning ~120 nt that is shared among the majority of betasatellites [12] as the βSCR, whereas the 100 nt long region shared among several gammasatellites is referred to as the γSCR. Embedded within the γSCR is a 23 nt stretch that is highly conserved (87%-100% identity) among all gammasatellites reported to date. This 23 nt region was originally identified in gammasatellites from Cuba based on its high identity (95.7%) with a portion of the βSCR [57] (Figure 3B). Sequences representing sweet potato-infecting begomoviruses (sweepoviruses) were also identified in all the datasets containing Ipomoea satellite sequences [37]. Further work is needed to confirm which helper begomoviruses are associated with the Ipomoea satellites identified in Puerto Rico and Spain. However, it is notable that approximately half of the Ipomoea satellite molecules identified from Puerto Rico share 89% pairwise identity with a small region (66 nt) of the rep gene of the three sweepovirus genomes identified in samples from the island (Figure 1B). This shared region potentially links some of the Ipomoea satellites with sweet potato leaf curl virus (SPLCV). Since this region is not conserved among the satellites identified in Puerto Rico, it may not be a crucial component of the biology of these molecules. Moreover, a similar region was not identified in known SPLCV genomes or in reported Ipomoea satellites from Spain.
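Checking whether a satellite carries the conserved 23 nt core amounts to finding the best-matching 23 nt window and reporting its identity. The Python sketch below does this with a simple ungapped scan around a circular sequence. It is an illustrative simplification of the alignment-based comparisons used here, and because the core sequence itself is not given in this text, the query in the example is an arbitrary placeholder.

```python
def best_ungapped_match(seq: str, core: str):
    """Return (start, percent identity) of the best ungapped match of `core`
    on a circular sequence; ties keep the first position found."""
    seq, core = seq.upper(), core.upper()
    n, k = len(seq), len(core)
    if k > n:
        raise ValueError("core longer than sequence")
    extended = seq + seq[:k - 1]  # windows may span the origin
    best_start, best_matches = 0, -1
    for i in range(n):
        matches = sum(a == b for a, b in zip(extended[i:i + k], core))
        if matches > best_matches:
            best_start, best_matches = i, matches
    return best_start, 100.0 * best_matches / k


# Placeholder 23 nt query (not the real core) and a test sequence with one mismatch
core = "ACGTACGTACGTACGTACGTACG"
satellite = "T" * 10 + "ACGTAAGTACGTACGTACGTACG" + "T" * 10
start, identity = best_ungapped_match(satellite, core)
print(start, round(identity, 1))  # 10 95.7
```

Over a 23 nt window each mismatch costs about 4.3 percentage points, so a single mismatch gives 22/23 ≈ 95.7%. If the published 95.7% value was computed over the 23 nt without gaps, it would correspond to one mismatch.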

Discussion
Here we implemented the VEM approach to survey begomovirus-associated satellite DNAs circulating in crop fields and amongst weeds located in different countries by sampling their whitefly vector. The VEM survey resulted in the detection of satellite DNAs in most of the locations and further expanded the known diversity of these molecules. While satellite DNA molecules have traditionally been associated with OW monopartite begomoviruses, recent studies have identified alphasatellite and novel gammasatellite molecules associated with bipartite begomoviruses in the NW, indicating that satellite DNAs are more widespread than previously thought [9,10,54,55]. All three types of satellite DNAs, namely alphasatellites, betasatellites, and gammasatellites, were identified amongst the OW samples investigated in this study. Both alphasatellites and betasatellites potentially associated with CLCuGe virus were identified in Israel. Ipomoea satellites found in Spain were highly similar to sequences deposited into GenBank from the same regions (accession numbers FJ914390-FJ914405). Although no novel satellite DNAs were detected in samples from the OW, the VEM approach revealed the presence of novel alphasatellites in Puerto Rico and Guatemala as well as unique Ipomoea satellites in all the samples collected in Puerto Rico.
Including the sequences described in this study, there are currently 11 known NW alphasatellites, suggesting that alphasatellite molecules are widespread in the NW. These alphasatellites represent six species according to the suggested 83% pairwise identity demarcation criterion for alphasatellites [44]. These NW alphasatellites have been identified from symptomatic plants in South America (Brazil and Venezuela) [9,10] and Cuba [54], as well as in insects, including whiteflies (present study) and top insect predators (dragonflies) collected in Guatemala and Puerto Rico [55]. Based on the few available sequences, individual NW alphasatellites may have a fairly limited geographic distribution. VEM alphasatellites 1 and 2, representing two new species, were only detected in whiteflies collected from a single tomato field in Guatemala. Similarly, none of the alphasatellites reported from South America have been identified in more than one country. However, VEM alphasatellite 3 detected in Puerto Rico may represent the same species as Cuban alphasatellite 1. Cuban alphasatellite 1 was identified in only a single plant, and it was speculated that there was a specific association between tomato yellow leaf distortion virus (ToYLDV) and the Cuban alphasatellite in Sida plants [54]. However, VEM alphasatellite 3 sequences were detected in whiteflies feeding on tomato and pumpkin in farms separated by ~2.3 km. Moreover, only contig sequences with low identities (~80% identity) to ToYLDV DNA-B were identified in metagenomic libraries from Puerto Rico [37]. Therefore, VEM alphasatellite 3 may associate with a different strain of ToYLDV or a different begomovirus species. Additional studies focused on identifying alphasatellites in the NW are needed to determine the biogeography of these molecules and identify their helper viruses.
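Assigning sequences to species at the 83% threshold can be pictured as grouping: two sequences at or above the threshold fall into the same putative species. The Python sketch below applies single-linkage grouping with a union-find structure to a hypothetical identity table. The sequence names and identity values are invented for illustration, and single linkage is our simplifying assumption: it can chain sequences that are individually below the threshold into one group, so borderline cases still need inspection.

```python
def demarcate(names, identities, threshold=83.0):
    """Group sequences whose pairwise identity (%) is >= threshold.

    Single linkage via union-find: a chain of qualifying pairs joins a group.
    """
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), pid in identities.items():
        if pid >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical identity table (percent), not data from this study
identities = {("s1", "s2"): 95.0, ("s2", "s3"): 64.0, ("s3", "s4"): 85.0, ("s1", "s3"): 70.0}
print(demarcate(["s1", "s2", "s3", "s4"], identities))  # [['s1', 's2'], ['s3', 's4']]
```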
Phylogenetic analysis of Rep amino acid sequences revealed that NW alphasatellites reported to date are not monophyletic (Figure 2). Nevertheless, most sequences reported from the NW form a well-supported clade, named here the DNA-3-type alphasatellite clade, and share >73% identity. The alphasatellites in the DNA-3-type clade are dominated by sequences from the NW and are most closely related to sequences representing AYVVSGA, with which they share genomic features not found in other alphasatellites. AYVVSGA has been identified as an "unusual" alphasatellite due to the shorter size (295 amino acids) and low sequence identity (<53%) of its Rep compared to the Reps encoded by other alphasatellite molecules [7,27]. Therefore, AYVVSGA was named DNA-2 [27] to distinguish this divergent satellite molecule from other OW alphasatellites (formerly known as DNA-1 [59]), which share more than 73% identity with each other (Figure 2). Despite the similarities between DNA-2- and DNA-3-type alphasatellite sequences, DNA-2-type alphasatellites remain distinct from the DNA-3-type alphasatellites identified in the NW. Reps encoded by DNA-3-type alphasatellites are ~315 amino acids long and share 51%-55% amino acid identity with AYVVSGA, which makes these alphasatellites as distinct from DNA-2-type alphasatellites as the DNA-2-type alphasatellites are from OW DNA-1-type alphasatellites. Furthermore, in contrast to both DNA-1- and DNA-2-type alphasatellites, DNA-3-type alphasatellites for which the helper virus is known are associated only with bipartite begomoviruses in infected plants [9,10,54]. Interestingly, the DNA-2-type alphasatellite is the only one that has been shown to associate with monopartite and bipartite begomoviruses from the OW and NW, respectively, in an experimental setting [27]. Therefore, the DNA-2-type AYVVSGA may represent an intermediate species between OW DNA-1-type and NW DNA-3-type alphasatellites.
The identification of the DNA-3-type alphasatellite clade, with unique characteristics that differ from those of OW alphasatellites, raises the possibility that some alphasatellites are indigenous to the NW and, thus, have a different evolutionary history compared to their OW counterparts. It should be noted that the DNA-3-type alphasatellite clade includes two sequences from India. If this clade indeed represents a NW lineage of alphasatellites, this would suggest that the DNA-3-type alphasatellites from India represent NW satellite DNAs imported to Asia. Alternatively, DNA-3-type alphasatellites may have spread to the NW from the OW. Two NW alphasatellites, VEM alphasatellite 2 from Guatemala (present study) and Dragonfly-associated alphasatellite from Puerto Rico [55], are only distantly related to the DNA-3-type alphasatellite clade. Therefore, it is likely that there have been at least two independent events that led to the emergence of alphasatellites in the NW. These two divergent alphasatellites are more closely related to alphasatellites identified in Africa, Asia, and the Middle East, including DNA-1-type alphasatellites, than they are to the DNA-3-type alphasatellite clade (Figure 2). Further sampling is needed to help resolve the DNA-3-type alphasatellite clade and to evaluate whether divergent NW alphasatellites that are more similar to OW alphasatellites have been introduced from the OW or represent different NW alphasatellite lineages. Since alphasatellites from both the DNA-3-type alphasatellite clade and those similar to OW alphasatellites have been identified in the same regions (i.e., Guatemala and Puerto Rico), geographic location is not the only factor influencing the distribution of alphasatellites in the NW.
A further distinction between NW and OW alphasatellites is that OW alphasatellites are mainly associated with begomovirus-betasatellite complexes [5,8,11,12], whereas NW alphasatellites do not seem to be associated with betasatellites. There are no reports of betasatellites in the NW, even though studies have specifically searched for betasatellite DNAs while investigating NW alphasatellites in plant specimens [9]. Furthermore, deep sequencing of RCA products recovered from plants [54] and whiteflies (present study) has not revealed the presence of betasatellites. It is possible that NW alphasatellites associate with divergent satellite DNAs that may not be recognized through standard BLAST searches. Alternatively, NW alphasatellites may not be associated with other satellite DNAs, relying solely on their helper begomoviruses [10].
Finally, this study significantly increased our knowledge of gammasatellites, which are non-coding begomovirus-associated satellite DNAs smaller than 1 kb. Although a discernible role in disease development has not yet been observed for gammasatellites, these satellite DNAs seem to be widespread, since they have been previously reported from Australia [58], the Philippines (accession number NC_021929), Spain (accession numbers FJ914390-FJ914405), Cuba [57], and Florida [35]. A subset of gammasatellites, named Ipomoea satellites, was detected in samples from Spain and Puerto Rico. Ipomoea satellites from Puerto Rico exhibited the highest genetic diversity and were distinct from satellites detected in Spain. Analyses of Ipomoea satellites detected here and those previously reported from Spain revealed several conserved genomic features, including an ori similar to that observed in begomoviruses, an adenine-rich region, and a γSCR. The γSCR is present in Ipomoea satellites and in other gammasatellites associated with monopartite begomoviruses reported from Australia and the Philippines. Although the γSCR is not present in all gammasatellite molecules, there is a core region (23 nt long) within the γSCR that is shared (87%-100% identity) among all gammasatellites reported to date, including sequences from geographically distant locations. This core region within the γSCR is similar to a region found within the βSCR, where gammasatellites and most betasatellites share ~87% identity. This supports the view that there is an evolutionary relationship between gammasatellites and betasatellites, as has been noted for gammasatellites identified in Cuba [57]. Future studies need to examine the prevalence of gammasatellites, their role in disease, if any, and their evolutionary history.

Conclusions
Since interactions among different begomoviruses and satellite DNAs may lead to the emergence of damaging diseases, it is critical to monitor begomovirus-satellite complexes present in crop fields and native vegetation. VEM using whiteflies is an unbiased sampling approach for detecting begomoviruses in different regions, providing an important glimpse into the reservoir of viral genetic diversity that is often overlooked due to the lack of visible symptoms. In this study, the VEM approach extended the known diversity and geographical range of satellite DNAs. This is the first report of alphasatellites from Central America and Ipomoea satellites from the Caribbean. Furthermore, the genetic information gathered led to the description of a phylogenetic clade (DNA-3-type alphasatellites) dominated by NW sequences and a third class of small, non-coding satellite molecules, named gammasatellites. The novelty of satellite DNAs described here suggests that these molecules hold unexplored genetic diversity. This is in contrast to recently described begomoviruses, which seem to represent variations of known genomic themes [37]. Therefore, satellite DNAs require further study as they may be a key factor driving the diversification of begomovirus-satellite disease complexes.