Genetic Diversity in Jatropha curcas Populations in the State of Chiapas, Mexico

Jatropha curcas L. has become an important source of oil for biodiesel production. Most genetic studies of this plant have been conducted with Asian and African accessions, where low diversity was encountered. There are no studies of this kind focusing on the postulated region of origin. Therefore, five populations of J. curcas were studied in the state of Chiapas, Mexico, using amplified fragment length polymorphism (AFLP) markers. One hundred and fifty-two useful markers were obtained: overall polymorphism = 81.18% and overall Nei's genetic diversity (He) = 0.192. The most diverse population was the Border population [He: 0.245, Shannon's information index (I): 0.378]. A cluster analysis revealed the highest dissimilarity coefficient (0.893) yet to be reported among accessions. An analysis of molecular variance (AMOVA) revealed that the greatest variation is within populations (87.8%), followed by the variation among populations (7.88%). The PhiST value (0.121) indicated moderate differentiation between populations. However, a spatial AMOVA (SAMOVA) detected a stronger genetic structure of populations, with a PhiST value of 0.176. To understand the fine structure of populations, a Bayesian analysis of the data was conducted with the software Structure ©. The number of genetic populations (K) was five, with mixed ancestry in most individuals (genetic migrants), except in Soconusco, where only a tiny fraction of fragments came from other populations. In contrast, SAMOVA grouped populations into four units. To corroborate the above findings, we searched for possible genetic barriers and found that the main barrier separates the Border from the rest of the populations. The results are discussed based on the possible ancestry of populations.


Introduction
Jatropha curcas L. is a shrubby plant whose seeds have a high content of oil that can be transformed into biodiesel. This plant grows well on marginal soils and is drought resistant [1,2]. Moreover, it has been shown that this species has the capacity to control soil erosion [3,4] and potential for phytoremediation [5,6]. It is expected that in 2015 about 13 million hectares will have been planted with J. curcas in tropical regions of Asia, Africa and America [7]. It is widely distributed throughout the tropics, in Central and South America, Asia and Africa. Nevertheless, many studies mention that its center of origin is Mesoamerica [8-10]. For a discussion of the debate on the center of origin, see our review article [11]. In Mexico, J. curcas is known as Piñón and can be found as living fences in various states, both on the Pacific slope and along the Gulf of Mexico [12]. Seeds of this plant are toxic due to their high content of phorbol esters, but in Mexico there are a few non-toxic varieties used as food in some rural areas [7,13,14].
Researchers are studying many aspects of the biology of this species across the world, including its genetic diversity. Recently, the nuclear [15] and chloroplast [16] genomes of this species were sequenced, thus facilitating the identification of genes of productive interest, such as those related to the synthesis and accumulation of oil. Complementary to this, the study of genetic variation in populations, together with analysis of phenotypic variation, will contribute to the identification of loci associated with quantitative traits of agronomic interest. Understanding the genetic diversity and structure of populations of J. curcas in their postulated center of origin will make it possible to identify genetic material useful for future improvement of the species. For example, it will be possible to design crosses between plants from genetically distant groups.
Most genetic studies of this plant have been conducted with Asian and African accessions, with low diversity found therein. Furthermore, most Mexican germplasm studied for these purposes has been limited to varieties with low or no toxicity [10,17-22]. The results of these studies, although based on individual accessions rather than populations, suggest that the center of diversity is in Mesoamerica. However, there are no molecular studies focusing on this region.
The state of Chiapas, southern Mexico, has a huge population of J. curcas distributed over most of its territory. Previous studies have shown that populations of J. curcas from Chiapas have a high variability in oil content and fatty acid composition [23], in toxic compounds of the seed [7] and in floral characteristics [24]. Genetic relationships of these populations are still unknown. It should be noted that, while J. curcas is a semi-domesticated species (there are no "true" wild populations and it is cultivated as living fences although, apparently, it does not exhibit the domestication syndrome), interchange of germplasm only exists at the local level and not between regions of the state [25].
Genetic diversity of this species is influenced by some aspects of its biology: J. curcas is a diploid (n = 11) with a relatively small genome, 416 Mb [26]; it is monoecious, producing male and female flowers in the same inflorescence [27]; and a small proportion of its reproduction is via apomixis [27,28]. Ecological aspects are also important; for example, its main mating system is xenogamy or cross-pollination [28], with pollination by insects [29]. Anthropic management of populations can affect diversity, since J. curcas is a plant in the process of domestication and is propagated mainly clonally. Based on this background, we propose the following postulate: the genetic diversity in J. curcas populations in the Mesoamerican region is higher than that reported for Old World germplasm. Since there are no studies of these plant populations in Mesoamerica, the present work aims to study the genetic diversity of populations of J. curcas in the state of Chiapas, Mexico, using AFLP molecular markers.

Plant Material
One hundred and thirty-four individuals of J. curcas were collected from living fences in five populations of Chiapas, Mexico: Soconusco, Isthmus, Center, Frailesca and Border. Individuals were entered as accessions in the Germplasm Bank of the Center for Biosciences at the Autonomous University of Chiapas (CenBio-UNACH, abbreviation in Spanish).
The criteria used to assign plants from different sites to defined populations were climate (annual mean precipitation and temperature) and soil residual humidity (Table 1). The sites are clearly differentiated by these criteria [30-34].

Isolation of Total DNA
Total DNA extraction was performed by the method described by Doyle and Doyle [35], as modified in the Research Laboratory of CenBio-UNACH. Young leaves were collected, transported on ice to the laboratory, washed with sterile water and 70% ethyl alcohol, and kept at −30 °C until processing. An amount of 0.2 g of leaves was ground in liquid nitrogen with 60 µg of polyvinyl pyrrolidone and 1 mL of CTAB buffer (hexadecyltrimethylammonium bromide 0.1% w/v, 5 mM EDTA, 1.5 M NaCl, 50 mM Trizma Base with pH adjusted to 8 with HCl, and 2-mercaptoethanol 0.1% v/v). Extractions were made with chloroform-isoamyl alcohol, followed by precipitation with isopropanol. The extracted DNA was purified with a mixture of phenol:chloroform:isoamyl alcohol (25:24:1). The integrity of the DNA, dissolved in 60 µL of milli-Q water, was verified by gel electrophoresis on 1% agarose, and the DNA was quantified spectrophotometrically at 260 nm (GBC Cintra 10e™ spectrophotometer, Australia).

AFLP Analysis
AFLP was performed following the procedure proposed by Invitrogen™ (AFLP™ Core Reagent Kit, USA), using 500 ng of total DNA restricted with the enzymes EcoRI and MseI and changing the time of ligation of adapters to h. In the pre-amplification, the primers used were 5'-GACTGCGTACCAATTC + C-3', complementary to the EcoRI adapter, and 5'-GATGAGTCCTGAGTAA + C-3', complementary to the MseI adapter, in a reaction mix containing 5 µL of the restricted and ligated products, 2 µL of 25 mM MgCl2, 0.5 µL of 10 mM dNTP mix, 5 U of Taq DNA polymerase, 4 µL of 10× buffer and 10 pmol of each primer, adjusted to 25 µL with milli-Q water. The pre-selective amplification was conducted with an initial step at 92 °C for 2 min; 40 cycles of 92 °C for 2 min, 38 °C for 1 min for annealing and 72 °C for 2 min for extension; one final extension cycle at 72 °C for 10 min; and a final hold at 4 °C. The selective amplification was performed using the primers 5'-GATGAGTCCTGAGTAA + CAT-3' for the MseI adapter and 5'-GACTGCGTACCAATTC + CAT-3' for the EcoRI adapter, the latter labeled with the fluorogenic D2WellRed™ (Genoma Lab™, USA), under the conditions previously described.
The amplified products were resolved by capillary electrophoresis in a CEQ8000™ (Beckman Coulter™, USA) sequencer, for which 2 µL of sample and 0.125 µL of the 400 bp size standard, labeled with the fluorogenic D1WellRed™ (Genoma Lab™), were mixed and adjusted to 25 µL with sample loading solution (SLS). The electrophoretic conditions were: capillary temperature of 50 °C, denaturation temperature of 90 °C, injection voltage of 2.0 kV and separation voltage of 5.0 kV, for one hour. To determine the size of the fragments, the electropherograms were calibrated with the molecular weight marker (calibration curve fitted with a cubic model, with correction for the mobility of marker AE.Ver2; confidence level >99%). Electropherograms were taken into account only when the peaks showed good resolution and the correlation coefficient of the marker was at least 0.99, with cubic correction. Moreover, for a minor peak to be accepted, its signal intensity had to be at least 2% of the intensity of the second highest peak in the electropherogram in question. Determinations were made in duplicate and were accepted only when both replicates gave the same result.
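The two acceptance rules for peaks (the 2% threshold relative to the second highest peak, and agreement between duplicates) can be illustrated with a minimal Python sketch. The input format, a list of (fragment size in bp, signal intensity) pairs with sizes already binned to integers, and the function names `score_peaks` and `consensus` are assumptions made for illustration; they are not part of the CEQ software.

```python
def score_peaks(peaks, rel_threshold=0.02):
    """Return the set of fragment sizes accepted from one electropherogram.

    peaks: list of (size_bp, intensity) tuples (hypothetical input format).
    A peak is kept when its intensity is at least rel_threshold times the
    intensity of the second-highest peak of the same run.
    """
    if len(peaks) < 2:
        # With fewer than two peaks there is no second-highest reference.
        return {size for size, _ in peaks}
    intensities = sorted((height for _, height in peaks), reverse=True)
    cutoff = rel_threshold * intensities[1]
    return {size for size, height in peaks if height >= cutoff}


def consensus(run_a, run_b, rel_threshold=0.02):
    """Accept a profile only if both replicate runs give the same fragments.

    Returns the shared set of fragment sizes, or None when the runs disagree.
    """
    a = score_peaks(run_a, rel_threshold)
    b = score_peaks(run_b, rel_threshold)
    return a if a == b else None


if __name__ == "__main__":
    run_1 = [(100, 5000), (150, 4000), (200, 100), (250, 50)]
    run_2 = [(100, 5200), (150, 3900), (200, 90), (250, 40)]
    print(consensus(run_1, run_2))
```

In this toy example the 250 bp peak falls below 2% of the second highest peak in both runs, so both replicates yield the same three fragments and the profile is accepted.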

Statistical Analysis
Data from the electropherograms were transformed into a matrix of presence (1) or absence (0) of the fragments, using the CEQ Genetic Analysis System © software version 9.0. We performed a cluster analysis of the populations based on Nei's coefficient of genetic identity. Genetic diversity within each population was measured by calculating the percentage of polymorphism (%P), effective number of alleles (Ne), Shannon's information index (I) and expected heterozygosity or Nei's genetic diversity (He), using the program GenAlEx © version 6.3 [36]. The data format selected was Binary (Diploid). Results were compared with those yielded by the program AFLP-SURV 1.0 © [37], using the approach of Lynch and Milligan [38] and the Bayesian method with non-uniform prior distribution to compute allelic frequencies [39].
To determine the degree of differentiation within and between populations and between regions, an analysis of molecular variance (AMOVA) was performed with 100,000 permutations using the program Arlequin © version 3.11 [40]. Results of AMOVA were compared with those obtained with the software SAMOVA 1.0 © [41]. SAMOVA runs were made for 2 to 10 groups to find the number of homogeneous populations, using the PhiCT value as indicator.
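To make the diversity parameters concrete, the following Python sketch computes %P, Ne, I and He from a 0/1 band matrix. It assumes the usual treatment of dominant diploid data under Hardy-Weinberg equilibrium, in which the null-allele frequency is the square root of the band-absence frequency; it applies no sample-size correction and does not reproduce the Lynch and Milligan or Bayesian estimators, so its values need not match those of GenAlEx or AFLP-SURV exactly. The function name `diversity_indices` is hypothetical.

```python
import math


def diversity_indices(matrix):
    """Per-population diversity from a binary band matrix.

    matrix: list of individuals, each a list of 0/1 scores (one per locus).
    Assumes a dominant diploid marker in Hardy-Weinberg equilibrium:
    q = sqrt(frequency of band absence), p = 1 - q.
    """
    n_ind = len(matrix)
    n_loci = len(matrix[0])
    polymorphic = 0
    ne_total = i_total = he_total = 0.0
    for locus in range(n_loci):
        band_freq = sum(row[locus] for row in matrix) / n_ind
        if 0.0 < band_freq < 1.0:
            polymorphic += 1
        q = math.sqrt(1.0 - band_freq)  # null (band-absent) allele
        p = 1.0 - q                     # band-present allele
        he_total += 2.0 * p * q                    # expected heterozygosity
        ne_total += 1.0 / (p * p + q * q)          # effective number of alleles
        i_total += -sum(f * math.log(f) for f in (p, q) if f > 0.0)  # Shannon
    return {
        "P": 100.0 * polymorphic / n_loci,
        "Ne": ne_total / n_loci,
        "I": i_total / n_loci,
        "He": he_total / n_loci,
    }


if __name__ == "__main__":
    toy = [[1, 1], [1, 1], [1, 1], [0, 1]]  # 4 individuals, 2 loci
    print(diversity_indices(toy))
```

In the toy matrix the first locus has a band frequency of 0.75, giving q = p = 0.5, while the second locus is monomorphic and contributes nothing to He or I.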
The genetic structure of the populations was examined using the software Structure © version 2.3.2 [42]. The program was run with 30,000 iterations, 50,000 iterations after burn-in and 10 repetitions for each number of genetic populations (K = 1 to K = 9). According to prior information for the populations, a migration rate of 0.001 was assumed, so the USEPOPINFO option was selected with the Admixture Model of ancestry (GENSBACK = 3, MIGPRIOR = 0.001) proposed by Falush et al. [43]. The value of K was estimated following the procedure described by Evanno et al. [44].
To test for possible isolation by distance, a Mantel test of the correlation between Nei's genetic distance and geographical distance was performed with 10,000 permutations using the program GenAlEx © version 6.3.
Finally, possible genetic barriers were searched for using the program Barrier © version 2.2 [45], which applies the Monmonier algorithm, based on Fst genetic distances generated in AFLP-SURV 1.0 using the Bayesian method with non-uniform prior distribution to compute allelic frequencies [37,39]. To estimate the robustness of the barriers, the analysis was performed using one hundred bootstrapped distance matrices. Considering the potential genetic discontinuities according to the differences among regions, five barriers were initially searched for, but only the two most robust barriers (more than 50% bootstrap support) were plotted.
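The Evanno procedure selects K from the rate of change of the log-likelihood across successive K values. A minimal Python sketch is given below, assuming the replicate ln P(D) values have already been extracted from the Structure output into a dictionary. The function name, the input layout and the toy numbers are illustrative assumptions; the sketch uses the absolute second difference of the replicate means divided by the standard deviation at K.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp):
    """Delta K for every K that has both neighbours K-1 and K+1.

    lnp: dict mapping K -> list of ln P(D) values, one per replicate run
    (hypothetical layout; at least two replicates per K are required).
    """
    delta = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue  # the second difference is undefined at the ends
        spread = stdev(lnp[k])
        if spread == 0:
            continue  # Delta K is undefined when replicates are identical
        second_diff = abs(mean(lnp[k + 1]) - 2 * mean(lnp[k]) + mean(lnp[k - 1]))
        delta[k] = second_diff / spread
    return delta


if __name__ == "__main__":
    # Toy values, not data from this study.
    toy = {
        1: [-1000, -1002],
        2: [-900, -902],
        3: [-700, -702],
        4: [-690, -692],
        5: [-685, -687],
    }
    dk = evanno_delta_k(toy)
    print(max(dk, key=dk.get), dk)
```

The K with the largest Delta K is taken as the most likely number of genetic groups. Because the statistic needs both neighbours, it cannot be evaluated at the smallest or largest K tested.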
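The logic of a Mantel permutation test can be sketched in Python as follows. This is a generic one-sided implementation using the Pearson correlation between the upper triangles of two square distance matrices, with rows and columns of one matrix permuted jointly; it is an illustration under those assumptions and is not claimed to reproduce the exact GenAlEx procedure. The function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(gen, geo, permutations=9999, seed=0):
    """One-sided Mantel test between two symmetric distance matrices.

    Returns (r_observed, p_value). Rows and columns of `gen` are permuted
    together, which preserves the dependence among pairwise distances.
    """
    n = len(gen)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    y = [geo[i][j] for i, j in pairs]
    r_obs = pearson([gen[i][j] for i, j in pairs], y)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        x_perm = [gen[order[i]][order[j]] for i, j in pairs]
        if pearson(x_perm, y) >= r_obs - 1e-12:
            hits += 1
    return r_obs, hits / (permutations + 1)


if __name__ == "__main__":
    # Toy example: five sites on a line; genetic distance equals geographic.
    pos = [0, 1, 3, 6, 10]
    dist = [[abs(a - b) for b in pos] for a in pos]
    print(mantel(dist, dist, permutations=999))
```

Note that with n sampling units only n! distinct permutations exist, so for very small n a large number of random permutations mostly resamples the same arrangements.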

Genetic Diversity
Two hundred and nine AFLP markers were obtained from electropherograms. Figure 1 shows a typical electropherogram of an accession of J. curcas. After elimination of six monomorphic fragments and a process of pruning of data (to avoid fragment size homoplasy), by the methods proposed by Lynch and Milligan [38] and Vekemans et al. [46], 152 useful markers remained for diversity analysis. Polymorphism rates were found to range between 71.7% and 92.1%, while the average polymorphism was 81.1%; the effective number of alleles (Ne) was between 1.181 and 1.398, with an average of 1.303; the Shannon diversity index (I) ranged from 0.202 to 0.378, with an average of 0.306; and the genetic diversity of Nei (He) ranged from 0.121 to 0.245, with an average of 0.192. These results reveal high genetic diversity in the populations studied. The population with the greatest genetic diversity was Border and the least diverse was Soconusco (Table 2). Parameters of genetic diversity obtained with AFLP-SURV 1.0 were slightly different, but showed the same tendency. For example, the most diverse population, Border, had a He value of 0.271, while the least diverse, Soconusco, had a He value of 0.123. Global gene diversity within populations was 0.207.
Very little research on the genetic diversity of J. curcas has focused on populations [22,47,48]. Ambrosi et al. [22] analyzed plants from different geographical regions, nine accessions from Jalisco, Mexico, and 17 commercial varieties from South America, Asia and Africa, using SSR markers. They found higher diversity values than those obtained in this investigation (average Ne of 1.843, average I of 0.661 and average He of 0.345), with the Mexican population being the most diverse. Wen et al. [47] used EST-SSR markers to study populations from Indonesia, South America, Grenada and China, reporting an average Ne of 1.686, an average I of 0.557 and an average He of 0.381. Another study, with ISSR markers, analyzed a total of 219 accessions from China and five from Myanmar, divided into seven populations, and obtained an average Ne of 1.317, an average I of 0.292 and an average He of 0.190 [48]. In a study with ISSR markers of 158 individuals from 8 semi-wild populations of Yunnan, China, a polymorphism of 55.04%, an average Ne of 1.382, an average He of 0.217 and an average I of 0.317 were found [49]. Another study in China (nine populations) reported an average He of 0.235 and an average I of 0.376 [50].
Although individual-based trees are not useful to infer population structure or other population attributes [51,52], we compared the Jaccard's dissimilarity index of our accessions with those of other studies (Table 3). According to Bonin et al. [53], coefficients of similarity are accepted band-based metrics of diversity for dominant data. It is important to note that the type of marker used can bias the values of diversity indexes. For example, the use of SSR markers has the advantage of yielding observed heterozygosity values, but entails the risk of overestimating the diversity indexes, especially when a low number of markers is used, due to the high allelic variability in SSR sequences [49]. Cluster analysis among the 134 accessions showed a Jaccard's dissimilarity coefficient of 0.893, indicating high genetic diversity. That value is above the 0.360 reported by Basha and Sujata [17], who analyzed 42 accessions from different geographical locations of India and a non-toxic variety from Veracruz, Mexico, using RAPD-ISSR markers. However, Tatikonda et al. [54] found a maximum dissimilarity coefficient of 0.570, which indicated a relatively high percentage of diversity, in a study of 48 accessions from India with AFLP markers.
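For reference, Jaccard's dissimilarity between two band profiles is one minus the number of shared bands divided by the number of bands present in at least one profile; joint absences are ignored. A short Python sketch, with hypothetical function names and toy profiles, is:

```python
from itertools import combinations


def jaccard_dissimilarity(a, b):
    """Jaccard dissimilarity between two 0/1 band profiles of equal length.

    Shared absences (0, 0) are ignored, which is why the coefficient is
    commonly used for dominant markers such as AFLP.
    """
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either == 0:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - shared / either


def max_pairwise_dissimilarity(profiles):
    """Largest Jaccard dissimilarity found among all pairs of accessions."""
    return max(jaccard_dissimilarity(a, b) for a, b in combinations(profiles, 2))


if __name__ == "__main__":
    toy = [
        [1, 1, 0, 0, 1],
        [1, 0, 1, 0, 1],
        [0, 0, 1, 1, 0],
    ]
    print(max_pairwise_dissimilarity(toy))
```

In the toy data the first and third profiles share no bands, so their dissimilarity is 1.0, the maximum possible value.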

Differentiation of Populations
The populations were grouped into two regions: (1) the coast of Chiapas (Soconusco and Isthmus); and (2) the central part of the state (Center, Frailesca and Border). AMOVA detected that the highest proportion of variation was found within populations (87.8% of the total molecular variation, Table 4). Phi statistics, which are equivalent to F statistics [66,67], indicated the presence of structuring and possibly genetic barriers. The PhiST differentiation index (analogous to the Fst and Gst indexes) had a value of 0.121, which was significant, indicating moderate differentiation (12.1% of total genetic variation is due to differentiation among populations). The significant value of PhiCT showed that 4.3% of global genetic variation is due to differentiation among regions.
Wen et al. [47] obtained a Gst of 0.18, indicating a significant differentiation between the geographical regions studied in Indonesia, China and South America. Ambrosi et al. [22] reported Fst values of 0.20, reflecting large genetic differentiation among the geographic groups studied (America, Asia and Africa). Furthermore, Cai et al. [48] found moderate differentiation between populations, with an Fst of 0.12. Xiang et al. [49] reported a Gst of 0.2944 in populations of Yunnan, China. Another study among populations of China revealed a Gst of 0.539 [50]. Values of differentiation indexes among populations greater than 0.25 are very high, and values higher than 0.5 should be taken with caution, since they mean that populations are so different that they could even be in the process of speciation, which is not likely to be happening with J. curcas. In contrast with the values of He, the Fst values calculated with different types of molecular markers are frequently proportional [68]. Compared to these investigations, the degree of differentiation found in Chiapas populations is lower, which could indicate that, prior to anthropic management, the species had a high gene flow through pollen and seed dispersal [69].
The spatial analysis of molecular variance showed that the maximum partitioning of the genetic diversity is obtained when sites are arranged in four groups (PhiCT = 0.15021, p < 0.000): (1) Villa Flores; (2) Ciudad Cuauhtémoc, Comalapa, Jiquipilas, Ocozocuautla and Rizo de Oro; (3) Arriaga, Tonalá, Acapetahua, Cacahoatán, Huixtla, Mapastepec, Puerto Chiapas, Suchiate, Berriozabal, Pujiltic, La Concordia, Villa Corzo and Revolución Mexicana; (4) Pijijiapan. With this arrangement, the program SAMOVA 1.0 detected a stronger genetic structure of populations, with a PhiST value of 0.176 (Table 5). According to the International Plant Genetic Resources Institute [70], PhiST values below 0.05 indicate small genetic differentiation, values between 0.05 and 0.15 indicate moderate differentiation, values between 0.15 and 0.25 indicate high differentiation and values over 0.25 indicate very high differentiation. The program SAMOVA 1.0 defines groups of populations that are geographically homogeneous and maximally differentiated from each other [41]. SAMOVA maximizes the proportion of total genetic variance among groups (PhiCT) and minimizes the variance among populations within groups (PhiSC) to obtain the most probable grouping. A constraint of SAMOVA, as its authors recognize, is that the algorithm assigns populations to groups based on adjacency, taking into account linear geographic distances, without considering ecological factors. For example, populations of the coast of Chiapas are grouped with those of Frailesca, which are geographically adjacent, but there is a mountain chain between them. A method considering "real" geographic distances (resistance distances) is needed. A resistance distance, or circuit distance, takes into account all possible pathways connecting population pairs [71].
The study of the genetic relationships among the populations showed that the Frailesca and Isthmus populations are the closest and that, despite the geographic proximity between the Border and Frailesca populations, Border was the most distant of all (Figure 2). In this case, the similarity coefficient was 0.96; this result shows that most genetic variation is within populations rather than between populations, as shown by the above results of AMOVA and SAMOVA.
Since the populations showed moderate differentiation among them, possible isolation by distance was looked for, but the Mantel test of correlation between Nei's genetic distance and geographic distance was not significant (r² = 0.00146, p = 0.056). Therefore, the existence of genetic barriers, and not isolation by distance, may be the reason for the differentiation found in the populations of J. curcas in Chiapas. However, it would be interesting to repeat this test with resistance distances rather than linear distances.
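As a reading aid, the hierarchical Phi statistics can be written in terms of the AMOVA variance components. The symbols below are notation introduced here, not the paper's: σ_a² (among regions), σ_b² (among populations within regions) and σ_c² (within populations). The numerical check uses the percentages of variation quoted in the text, and it assumes that the 7.88% given in the abstract is the among-populations-within-regions component.

```latex
\begin{align}
\sigma_T^2 &= \sigma_a^2 + \sigma_b^2 + \sigma_c^2 \\
\Phi_{CT} &= \frac{\sigma_a^2}{\sigma_T^2}, \qquad
\Phi_{SC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_c^2}, \qquad
\Phi_{ST} = \frac{\sigma_a^2 + \sigma_b^2}{\sigma_T^2} \\
1 - \Phi_{ST} &= (1 - \Phi_{SC})\,(1 - \Phi_{CT}) \\
\Phi_{CT} &\approx 0.043, \qquad
\Phi_{SC} \approx \frac{7.88}{7.88 + 87.8} \approx 0.082, \qquad
\Phi_{ST} \approx 1 - 0.878 = 0.122
\end{align}
```

Under these assumptions the three components sum to approximately 100%, and the resulting PhiST of about 0.12 agrees with the reported value of 0.121 up to rounding of the published percentages.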

Structure of Populations
The study of genetic structure revealed five genetic groups (K = 5) using the package Structure © with a migration rate of 0.001 and an admixture model of ancestry. The five groups identified using Structure © are consistent with the declared geographical populations, as can be seen in Figure 3. The five colors used represent the five groups identified by Structure ©. Each individual is represented by a vertical bar. The proportion of a color in the bar indicates the proportion of alleles in that individual genotype coming from one of the five groups identified by Structure ©. Results should be interpreted with caution because, although Structure © is a valuable tool for studying individuals whose population of origin is unknown, the program is not designed to describe relationships between populations [72].
Mixed ancestry was found in most individuals (probably genetic migrants), except in Soconusco, where there was only a small fraction of alleles from other populations. This indicates that Soconusco is a source of migrant bands for populations of both the Coast of Chiapas and its Central Valleys, with the exception of Border, where most migrants belong to Center and Frailesca. To explain the origin and spread of J. curcas individuals in Chiapas, we must take into account the study of their natural distribution in time and space, which is based on biogeography and also considers the processes that led to such a distribution [73].
Within biogeography there is a school of thought which raises questions as to the origin of species (sympatric speciation), from which they are scattered at random, crossing preexisting barriers and colonizing new areas. This is called dispersalism. In contrast, vicariance theory assumes that populations are separated or fragmented by the formation of geographical barriers, leading to allopatric speciation [73]. On an infraspecific level, it is likely that individuals of the Soconusco population, with differentiated band patterns, were dispersed in the past into other populations, prior to anthropic management and the emergence of significant barriers. The other possibility is that populations of J. curcas in Chiapas had similar fragment patterns, which then diverged after the rise of the Sierra Madre, and that the genes best adapted to the areas on each side of the barrier persisted. To determine the origin of populations with more certainty, phylogeographic studies are needed for this Mesoamerican plant, using conserved markers such as mitochondrial or chloroplast markers. The loci of the Border (in blue) and those of the Center (in yellow) were found in very low proportions in populations of the Coast of Chiapas and, despite the proximity, migrant alleles from the Border and Center were found in low proportions in Frailesca. Despite the closeness between Soconusco and the Border, there is scarce exchange of alleles between these populations, suggesting the existence of a genetic barrier. These two facts suggest that the alleles of Border could have come from Guatemala and then dispersed into Chiapas via two likely pathways, Soconusco and Border. It is possible that in Border, individuals with the alleles shown in blue found appropriate conditions for their reproductive success. However, it is necessary to perform population genetic studies in Guatemala. The high diversity in Chiapas is remarkable, in contrast to the homogeneity found in the populations studied by Ambrosi et al. [22].
To corroborate the above findings, we searched for possible genetic barriers and found that both Border and Soconusco are isolated from the rest. The main barrier isolates the Border (yellow line "a" in Figure 4). It is clear that the Sierra Madre is a strong physical barrier between the populations of Soconusco and the Border, and it is possibly the main cause of the differentiation found among populations, having kept them separate since the emergence of this mountain chain. The Sierra Madre mountain chain probably arose between the middle Miocene and the early Pliocene (between 13 and 4.5 million years ago, m.y.a.) [74,75], while J. curcas has probably existed for more than 70 million years [76]. The second most important barrier (65% bootstrap support) separates Soconusco from the Isthmus. Since there are apparently no major geographical barriers separating these two populations on the coast of Chiapas, it may be that the climate is taking this role. Soconusco has an average annual precipitation of 2500 mm, relative humidity of 79.4% and average annual temperature of 27 °C [33]. Its climate is Aw2 (w) Ig [34], which corresponds to the most humid of the warm sub-humid tropical climates. For its part, the Isthmus has a climate of type Aw0 (w), which corresponds to a warm sub-humid climate with summer rains. In this region, an average annual precipitation of 1500 mm is recorded, with fewer than one hundred days of rain per year [34].

Conclusions
High genetic diversity was found within and among populations of J. curcas in Chiapas, the highest being found among individuals. The population with the greatest diversity was the Border population and the least diverse was Soconusco. Depending on the method of analysis, moderate to high differentiation was detected among populations, which is attributed to the existence of genetic barriers between populations. If the dissimilarity among accessions is considered, this species has greater genetic diversity in Chiapas than in other parts of the world. The results show that the Mesoamerican region could be a center of diversity of this plant. It is possible that, prior to the anthropic handling of J. curcas, its genetic base was sufficiently broad to avoid the erosion of diversity by the process of domestication and by clonal propagation.

Figure 1. Electropherogram of a typical sample of J. curcas from Chiapas obtained by capillary electrophoresis. The molecular size marker, 400 bp labeled with the fluorogenic D1WellRed ®, is shown in red. PCR products were resolved in a CEQ8000 ® (Beckman Coulter ®) sequencer.

Figure 2. Dendrogram constructed with Nei's coefficient of genetic identity from amplified fragment length polymorphism (AFLP) data of five populations of Jatropha curcas of Chiapas, Mexico.

Figure 4. Map of five populations of Chiapas showing the two main genetic barriers (a and b, yellow lines) found by the Monmonier algorithm (Barrier © version 2.2), based on Fst distances. Thickness and the number on the side of the barriers indicate the percentage of bootstrap support. Map kindly prepared by F. Pérez-Racancoj (University of Chiapas).

Table 1. Jatropha curcas L. accessions collected in the state of Chiapas located in the Germplasm Bank of the Center for Biosciences, University of Chiapas, Mexico.

Table 2. Genetic diversity parameters of 134 accessions of J. curcas of Chiapas, Mexico, grouped in five populations.

Table 3. Comparison of genetic variation in Jatropha curcas collected in different parts of the world.

Table 4. Analysis of molecular variance of Jatropha curcas populations collected in Chiapas, Mexico. PhiCT: Measure of genetic differentiation among regions for the total populations; PhiSC: Measure of genetic differentiation among populations within a region; PhiST: Measure of genetic differentiation among populations.

Table 5. Spatial analysis of molecular variance of Jatropha curcas collected in Chiapas, Mexico.