Development of a Large Set of Microsatellite Markers in Zapote Mamey (Pouteria sapota (Jacq.) H.E. Moore & Stearn) and Their Potential Use in the Study of the Species

Pouteria sapota is known for its edible fruits that contain unique carotenoids, as well as for its fungitoxic, anti-inflammatory and anti-oxidant activity. However, its genetics is mostly unknown, including aspects about its genetic diversity and domestication process. We did high-throughput sequencing of microsatellite-enriched libraries of P. sapota, generated 5223 contig DNA sequences, 1.8 Mbp, developed 368 microsatellites markers and tested them on 29 individuals from 10 populations (seven wild, three cultivated) from Mexico, its putative domestication center. Gene ontology BLAST analysis of the DNA sequences containing microsatellites showed potential association to physiological functions. Genetic diversity was slightly higher in cultivated than in the wild gene pool (HE = 0.41 and HE = 0.35, respectively), although modified Garza–Williamson Index and Bottleneck software showed evidence for a reduction in genetic diversity for the cultivated one. Neighbor Joining, 3D Principal Coordinates Analysis and assignment tests grouped most individuals according to their geographic origin but no clear separation was observed between wild or cultivated gene pools due to, perhaps, the existence of several admixed populations. The developed microsatellites have a great potential in genetic population and domestication studies of P. sapota but additional sampling will be necessary to better understand how the domestication process has impacted the genetic diversity of this fruit crop.


Introduction
The genus Pouteria Aublet (Sapotaceae) has 325 species, many of them known for their edible fruit and quality wood. It is a non-monophyletic genus composed of at least three distinct evolutionary lineages [1,2]. One important species of this genus is P. sapota ((Jacq.) H.E. Moore & Stearn), commonly known as zapote mamey, mamey or mamey rojo [3]. The natural habitat of P. sapota is the tropical and sub-tropical evergreen forests distributed on Mexico's Atlantic slope from North Veracruz to Tabasco state as well as in Mexico's Pacific slope from Jalisco to Chiapas, continuing through Central America up to Panama, and from 0-1300 m above sea level [4,5]. Multiple wild and cultivated populations of P. sapota have been recognized, mainly by fruit characteristics because wild fruits are significantly smaller than cultivated ones. However, there are no formal studies about the existence of subspecies or botanical varieties. The lowlands from Mexico and Central America are the putative center of domestication of P. sapota [5,6]. Mexico is the most important producer of zapote mamey in the world, being Yucatan State the main producing area [7]. In this country, the traditional method of production is still in small orchards or by harvesting wild trees in the regions where they exist, whereas grafting is generally employed in small plantations or traditional home gardens [6]. At present, zapote mamey has been introduced as a fruit crop in South America, the Caribbean, Europe, Asia, Australia, and among other regions on the world [6].
Currently, there is abundant information about the chemical composition of zapote mamey for its high nutritional value as the red fruits contain unique ketocarotenoids, such as cryptocapsinepoxide [8] and sapotexanthin [9], all precursors of vitamin A. The pulp of the fruit can have up to 247 mg of ascorbic acid per 100 g dry weight [10], and the main fatty acid in seed (oleic acid) can be up to 60% of the seed oil [11]. Traditionally, oils extracted from the seeds of P. sapota have been used for multiple medicinal purposes, including the effective control of the fungal skin dermatitis [12]. In fact, antimicrobial, trypanocidal, insecticidal, anti-inflammatory and anti-oxidant activities have been observed in compounds extracted from fruits, leaves, and bark of many other Pouteria species [13]. In contrast, little is known about how the domestication process of P. sapota and to what extend this may have impacted its genetic diversity in populations of vegetatively propagated fruit trees [14,15]. In fact, there is a lack of codominant molecular markers for this species that could help answer these questions. Therefore, genetic diversity has been mainly studied using morphological and biochemical characters of fruits from cultivated germplasms [16][17][18][19]. In recent years, studies have incorporated the use of dominant molecular markers [Random Amplified Polymorphic DNA (RAPD) and Amplified Fragment Length Polymorphism (AFLP)], but they only have analyzed genetic relationships among cultivated germplasms [20][21][22]. Thus, no genetic variations have been associated to the geographic origin of the samples, and no estimators of genetic diversity have been reported because dominant markers such as RAPDs and AFLPs cannot properly estimate genetic diversity.
Currently, there is a lack of molecular tools to study Pouteria species, and data mining is not an option as there is only one entry of 443 bp (GQ402492.1) for P. sapota in NCBI, GenBank. In absence of sequenced genomes, microsatellite-enrichment of genomic libraries has been shown effective to develop molecular markers for genetic studies [23,24]. Microsatellites, also known as simple sequence repeats (SSRs), are a widely applied class of codominant molecular markers used in molecular breeding, genetic conservation, and population genetics [25,26]. Therefore, the objectives of the present study were: (1) develop a large set of SSR molecular markers specific for P. sapota; and (2) to analyze, for the first time, the structure and genetic diversity of wild and cultivated populations of P. sapota collected in Southeast of Mexico, the putative domestication center of this species.
Individual contigs often presented more than one microsatellite, and these simple-sequence repeats were separated by one or more basepairs (bp). Apparently, when the distance in bp between two microsatellites within a contig was ≤10 bp (normally called imperfect repeats), 64%-90% of the markers within that contig were polymorphic. When the distance between microsatellites within a contig was longer, for example 11-343 bp then, ≤50% of the markers derived from those contigs were polymorphic (Supplementary Materials). Only 11 of the 368 microsatellites (2.9%) apparently deviated from Hardy-Weinberg equilibrium. Most repeat motifs detected were 2 bp long, and the most frequent motifs on the sequenced DNA were AG and AGG ( Figure 1). To simplify the recording of the repeat motifs, those that were circular permutations and reverse complements of each other were grouped together as one type, i.e., AAC, ACA, CAA, GTT, TGT and TTG were recorded as AAC. The total length of individual microsatellite sequences was between 8 and 32 bp. About half (183/379) of the contigs containing microsatellites had more than one motif (between 2 and 6), in those cases the distances between repeats within each contig were between 0 and 343 bp. No significant correlation (r = −0.449, non significant) was observed between the total repeat length (8-32 bp) and the polymorphism of the markers using 31 DNA samples for the 193 loci that had a single microsatellite per contig. The discriminating power of the markers was calculated with unique pattern informative combinations (UPIC) software [27] and the results are provided (Supplementary Materials). The smallest combination of markers that could discriminate all 31 samples was calculated using UPIC, and the 10 markers in that combination were: stv-pos_01505_a, stv-pos_00588_a, stv-pos_01512_a, stv-pos_00286_a, stv-pos_02644_b, stv-pos_00756_a, stv-pos_02013_a, stv-pos_01677_a, stv-pos_01580_a, and stv-pos_01632_a, with a total sum of UPIC scores equal to 44 (Supplementary Materials).

Genetic Diversity and Founder Effect
All estimators of genetic diversity were slightly higher in the cultivated than in the wild gene pool of P. sapota from Mexico (Table 1). This is an unexpected result considering the smaller number of cultivated individuals analyzed (Table 1). At present, there are no studies reporting genetic diversity in wild and cultivated populations of P. sapota in the world. Genetic diversity levels have been reported for others species within Sapotaceae. For example, high levels of genetic diversity were reported in 294 individuals (HE = 0.86) of a wild population of masaranduba (Manilkara huberi (Ducke) Standl) from Amazonas using SSRs [28]. Heterozygosity, HE = 0.951, of 20 wild individuals of sapodilla (M. sapota (L.) P. Royen) from Mexico was estimated using SSRs [29]. Heterozygosity, HE = 0.59, for three morphotypes of argan (Argania espinosa (L.) Skeels) from Morocco was determined using SSRs [30]. Furthermore, heterozygosity, HE = 0.35, for seven populations of Puerto Rican bully (Sideroxylon portoricense subsp. Minutiflorum Pittier) from Mexico was obtained using Inter Simple Sequence Repeats (ISSRs) [31]. Though compared with those studies, relatively low levels of genetic diversity were observed in the present work on P. sapota from Mexico (HE = 0.34). However, we believed this estimation of the genetic diversity in these populations is accurate given the large number of loci (approx. 200) analyzed here, and the accuracy of allele calling (1 bp differences detected) when using capillary electrophoresis. The higher level of genetic diversity observed in the cultivated gene pool of P. sapota could suggest that there was no founder effect during the domestication process of this species. However, when the founder effect was evaluated using the modified Garza-Williamson index, it was higher in the wild (0.49) than in the cultivated (0.42) gene pool (Table 1), indicating the possible existence of a founder effect in this last gene pool. This result was supported when founder effect was evaluated using Bottleneck software (cultivated gene pool, P-value = 0.00003). Another finding supporting a founder effect is that in 62 SSR loci analyzed, one or more alleles observed in the wild individuals, mainly in JC population, were not detected in cultivated individuals; this included markers stv-pos_1117, stv-pos_1602, stv-pos_1677, stv-pos_2282, stv-pos_2375, stv-pos_2461, stv-pos_2565, stv-pos_2755, stv-pos_2977, stv-pos_3228, stv-pos_3569, and stv-pos_3748 (Supplementary Materials). The existence of a founder effect in domesticated plants is something common in seed crops [32]. However, zapote mamey is a fruit crop propagated vegetatively by the farmers from Mexico [6]. The domestication process in vegetatively propagated cultivated trees likely involved few recombination and sexual cycles, resulting in cultivated populations that may not have diverged significantly from their wild progenitor [33].
A more alarming and also possible explanation for the lower level of genetic diversity observed in the wild gene pool is that this one had suffered a genetic erosion process because of human activities. The region where the wild populations of P. sapota were collected is a hot spot for potential species extinction as a result of urbanization, agriculture and other land uses [34]. Thus, important germplasms could vanish before their potential uses or benefits are described. Since the goal of gene banks is to preserve agricultural biodiversity, attention should be paid to the 62 SSR markers that showed alleles in wild but not in cultivated populations. BLAST analysis for some of these 62 markers showed potential homology to interesting biological functions, indicating that these markers can serve as a tool to identify valuable germplasms.

Genetic Structure and Cluster Analysis
There was a low genetic differentiation among the 10 studied populations of P. sapota from Mexico. We found an FST = 0.136 among all populations, ranging between pairs of populations from 0.01 to 0.18 ( Table 2). Analysis of Molecular Variance (AMOVA) confirmed these results; only 3.8% of the total variation was between wild and cultivated gene pools ( Table 3). As has been shown, the domestication process in vegetatively propagated cultivated trees can result in cultivated populations that do not significantly diverge from their wild progenitor [34]. Also, this low differentiation could be a result of genetic flow. We found high levels of genetic flow among populations, ranging between pairs of populations from 1.75 to 74.1 (Table 2). Though, there are no studies reporting genetic structure in P. sapota, a few reports about the genetic differentiation in other Sapotaceae species [31] exist; but they are not comparable with the present study as they did not analyze wild and cultivated populations, as shown here.     Table 4.
: distance between two locations in kilometers, not at scale.
The clustering pattern showed by the N-J analysis seems to be consistent with that obtained with the three-dimensional principal coordinate analysis (3D PCoA) (Figure 3). The two individuals from Puerto Rico are clearly separated from individuals of Mexico. Individuals from nine out of eleven populations grouped with other individuals from the same population, except for AA (wild) and AT (cultivated). In this analysis, the first three dimensions (dim-1, dim-2 and dim-3) explained 32.4%, 26.8% and 22.2% of the genetic variation, respectively ( Figure 3).
The clustering pattern of the Mexican P. sapota populations was further assessed on the basis of the assignment tests carried out with STRUCTURE. The Evanno method indicated an ideal K = 3 (Figure 4). Figure 5A shows the clustering pattern of wild and cultivated populations when K = 3. As in the N-J and 3D PCoA analyses, no clear separation was observed between wild and cultivated individuals. In this graph, we can see that the Jose Castillo wild population was different from the other wild ones. This finding could explain the ideal K = 3 found with the Evanno method, instead of K = 2, considering only the existence of wild and cultivated gene pools. Also, Figure 5A show that one individual of AA wild population (AA2) shared ancestry with DZ and OX cultivated populations, whereas two individuals from AT cultivated population (AT4, AT10) shared ancestry with several wild populations.   Table 4.   Table 4. Figure 5B shows the clustering pattern of wild and cultivated Mexican populations when K = 10 (considering the 10 sites collected in Mexico as different populations). In this graph, individuals from nine out of 10 Mexican populations were assigned according to their geographic origin. Also, we can see individuals admixed or migrants: One individual from AA wild population (AA2) had significant contribution of genetic material from OX cultivated population; one individual from AT cultivated population (AT10) was identical to those of population OX; and two individuals from YJ wild population (YJ21, YJ25) showed high genetic contribution from population UR.
Combining the results of the three cluster analyses performed (N-J, 3D PCoA, and STRUCTURE), it seems that the populations of P. sapota from Mexico are grouped by their geographical origin. However, Mantel test did not show the existence of a geographical isolation among the studied populations (all populations, r = 0.369, p-value = 0.09). The existence of admixed population shown by the STRUCTURE analyses is probably responsible for this result. These admixed populations hindered their grouping in the N-J and 3D PCoA analyses (Figures 2 and 3). In zapote mamey, most of molecular studies have been conducted to analyze the genetic relationships in cultivated germplasms. Using AFLP on 41 accessions from Cuba and Yucatan, no association to geographic area was found [20]. RAPDs and RAMP analyses on 14 accessions (13 from Guerrero and one from Yucatan, Mexico), did not distinguish the geographic origin of the accessions [21]. RAPDs on 15 trees in the state of Morelos, Mexico, showed that one of those trees was genetically dissimilar from the rest, and probably from a different geographic origin [22]. Hence, the microsatellite markers reported in the study allowed, for the first time, the clear discrimination by geographic origin of wild populations of P. sapota that were as close as 3 km apart (WH and JC) and many others that were between 28 to 112 km apart (Figure 2, Table 4).

Plant Material, DNA Extraction, and Isolation of Microsatellites
Analysis included 29 individuals of P. sapota collected in Mexico. Of these, 20 individuals were collected directly in the field from seven wild populations and nine individuals were collected in traditional home gardens from three Mayan towns of the Yucatan peninsula ( Table 4). The classification of these 29 individuals in wild and cultivated forms was based, essentially, on the site of collection (wild vs. traditional home gardens) and fruit size. Also, we included two individuals from the collection at USDA-ARS Tropical Agriculture Research Station (TARS) in Puerto Rico. DNA was extracted from lyophilized young leaves using DNeasy Plant Maxi kit (QIAGEN, Valencia, CA, USA). The accessions of P. sapota from TARS collection were used to generate separate SSR-enriched libraries following the protocol of Techen [24], briefly described here. DNA was digested with restriction enzymes AluI, HaeIII, DraI, RsaI, and HpyCH4IV (New England Biolabs, Ipswich, MA, USA). The DNA fragments were A-tailed with Taq-DNA Polymerase (Promega, Madison, WI, USA) in the presence of dATP for 2 h and then ligated for 3 h at 16 °C to a linker made from oligonucleotides (oligos) SSRLIBF3, 5′-CGGGAGAGCAAGGAAGGAGT-3′ and SSRLIBR3, 5′-Phos-CTCCTTCCTTGCTCTCTCCCGAAAA-3′ [24]. The ligated fragments were purified with MinElute (QIAGEN) and amplified by 20 cycles of PCR by using primer SSRLIBF3 and High Fidelity DNA Polymerase (Invitrogen, Carlsbad, CA, USA) at 94 °C for 30 s, 60 °C for 30 s, and 68 °C for 90 s. The amplified products were hybridized to three groups of biotinylated oligo repeats [23]. Sequences containing repeats were captured using streptavidin-coated magnetic beads M-270 (Invitrogen) [35]. After binding, the beads were washed with 0.5 × standard saline citrate (SSC) for 5 min. Elution of the DNA from the biotinylated oligos was done with 100 μL of MilliQ water (Millipore, Billerica, MA, USA) at 96 °C for 5 min. The eluate was PCR amplified for 10 cycles, as indicated for the ligation step. PCR products were sequenced by pyrosequencing in 1/8th of a plate 454-GS FLX (Roche Diagnostics, Indianapolis, IN, USA) by using GS Titanium Sequencing kit XLR70 (70 by 75 Titanium Pico-Titer Plates, Roche, Branford, CT, USA; 200 cycles). Sequences were assembled with 454 gsAssemby version 2.0 (Roche). Repeats were searched using SSRFinder [36], and primers were designed using Primer3 [37] with stringent parameter conditions: Annealing temperature 63 °C ± 1 °C, length 24-bp, 3′ GC clamp, and 5 bp maximum overlap of repeat within the primer. Assembled contigs resulting from high-throughput sequencing were screened using BLAST (Basic Local Alignment Search Tool) [38], BLASTn and MegaBLAST. Analysis of gene ontology was performed for the 379 repeat-containing DNA sequences using BLAST2GO [39].

Fingerprinting
A total of 384 primer sets were designed and used to screen 31 DNA samples of P. sapota listed in Table 2. Forward SSR primers were 5′ tailed with the sequence 5′-CAGTTTTCCCAGTCACGAC-3′ to permit product labeling [40], and reverse primers were tailed at the 5′ end with the sequence 5′-GTTT-3′ [41]. Primer 5′-CAGTTTTCCCAGTCACGAC-3′ labeled with 6-carboxy-fluorescein (FAM) (Integrated DNA Technologies, Inc., Coralville, IA, USA) was used for amplification of 15-ng DNA by using Titanium TaqDNA Polymerase (Clontech, Mountain View, CA, USA) in 5 μL reactions on an M&J thermal cycler (Bio-Rad Laboratories, Hercules, CA, USA) at 95 °C for 1 min, 60 °C for 1 min (two cycles), 95 °C for 30 s, 60 °C for 30 s, 68 °C for 30 s (for 27 cycles), and final extension at 68 °C for 4 min. Fluorescently labeled PCR fragments were analyzed on an ABI 3730XL DNA Analyzer, and data were processed using GeneMapper version 3.7 (both from Applied Biosystems, Foster City, CA, USA). UPIC scores, or discriminating power of the markers, were calculated using UPIC software [27].

Genetic Diversity and Cluster Analyses
Considering the low number of individuals by population, we grouped the individuals collected in Mexico in two gene pools: wild and cultivated. Then, to allow comparisons between gene pools with different sample size, the average number of alleles per locus (A) was calculated using the rarefaction method implemented in HP-Rare version 1 [42]. Also, we analyzed genetic diversity for each gene pool using the observed heterozygosity (HO) and the expected heterozygosity (HE) with a level of polymorphism of 0.05, using ARLEQUIN version 3.5 [43].
Founder effect due to domestication was estimated with the modified Garza-Williamson index (the number of alleles divided by the allelic range) [43,44], expected to be low in bottlenecked populations, using ARLEQUIN version 3.5 [43]. Also, founder effect was evaluated using the Bottleneck program v1.2.02 [45]. To do this, we performed a Wilcoxon sign test (α = 0.05) to determinate whether a significant number of loci featured heterozygosity excess, which is indicative of a recent bottleneck, assuming two-phase mutation model (TPM. model of microsatellite mutation) and using 1000 permutations.

Genetic Structure and Cluster Analysis
Genetic differentiation among wild and cultivated populations was analyzed using FST and a hierarchical AMOVA analysis [46]. Genetic flow among pairs of populations was estimated with M values (M = 2 Nm) [47]. A Mantel test [48,49] was made to evaluate the hypothesis of isolation by distance. All these analyses were made with ARLEQUIN version 3.5 [43].
The genetic relationships were inferred by three procedures. (1) A Neighbor-Joining (N-J) analysis was made using the standard genetic distance of Nei [50] and the program NTSYSpc version 2.2 (Exeter Software, Setauket, NY, USA). The confidence levels for the topology were assessed by bootstrap resampling (50,000 replicates) [51,52] with WINBOOT [53]. (2) A three-dimensional principal coordinate analysis (3D PCoA) was done using the computer package NTSYS-pc version 2.1 [54]. This technique allows us to explore the genetic structure of the sample data without a priori criteria, using each allele as an independent variable. (3) An assignment test of individuals was made with STRUCTURE 2.1 [55] using the admixture model with 200,000 burn-ins and 200,000 iterations to allow the Markov chain to reach stationarity. A total of ten independent simulations were run for each value of K tested, ranging from K = 1 to K = 12. Then, the data generated were used to obtain the ideal K with the method of Evanno [56] using the STRUCTURE HARVESTER program [57]. Bar graphs generated by STRUCTURE for K ideal and that produced considering the sites of collection as different populations were labeled with the drawing software PowerPoint™ 2010 (Microsoft Office).

Conclusions
It is very important to know about how the domestication process has impacted the extent and distribution of the genetic diversity in vegetatively propagated fruit trees. This work developed and used, for the first time, SSR codominant molecular markers to study this aspect in P. sapota, an important Neotropical fruit crop. We showed that the genetic diversity is slightly higher in the cultivated gene pool of P. sapota from Mexico in comparison with the wild populations, even though we found evidence of a bottleneck event in the cultivated gene pool, maybe as a result of the domestication process of this species. As suggested by the genetic structure and the cluster analyses performed in this study, wild and cultivated populations of P. sapota from Mexico show low genetic differentiation, perhaps, because of the existence of high levels of genetic flow favored by the production and propagation methods of zapote mamey practiced by traditional farmers from Mexico [6]. It is necessary to increase the sample size in order to better understand how the domestication process has impacted the diversity and genetic structure of zapote mamey.
The SSR reported in this work have a great potential for other uses. For example, given the large number of species in banks of germplasm (14,739 plant species in the U.S.A. alone [58]); compared to the number of sequenced plant genomes (approx. 80, [59]), there is a need for developing specific molecular markers to be used to characterize germplasm collections. Also, at the genus level, the taxonomy of Pouteria is far from being clearly defined [60]; this genus has a large number of synonyms and its taxonomy is constantly changing [2], probably due to the lack of genetic information. Since microsatellites usually transfer among plant species within a genus [61] and among plant related genera [62], the 368 markers reported here could also help clarify the phylogenetics of the genus Pouteria. MegaBlast and BLASTn analysis of the P. sapota DNA against the non-redundant database of GenBank (NCBI) for sequences not harboring microsatellites, showed similarity mainly to V. vinifera followed by P. trichocarpa and G. max. The same trend was observed when blasting microsatellite-containing sequences in BLAST2GO (Blast2GO 2009), which uses a different database. Considering that the genus Pouteria is non-monophyletic [1,2], the similarity of Pouteria (Sapotacea) and Vitaceae is interesting and should be further explored. Finally, studies have shown the feasibility of elaborating testable hypotheses based on the biological function of genes related to polymorphic microsatellites [63]. Some polymorphic SSR described here had significant hits on BLAST2GO analysis and are worth exploring. Thus, this work is an important contribution, generating a large set of SSR markers for P. sapota.