Genetic Purity of Cacao Criollo from Honduras Is Revealed by SSR Molecular Markers

: The cultivation of cacao represents an income option and a source of employment for thousands of small producers in Central America. In Honduras, due to the demand for ﬁne ﬂavor cacao to produce high-quality chocolate, the number of hectares planted is increasing. In addition, cacao clones belonging to the genetic group named Criollo are in great demand since their white beans lack of bitterness and excellent aroma are used in the manufacturing of premium chocolate. Unfortunately, the low resistance to pests and diseases and less productive potential of Criollo cacao leads to the replacement with vigorous new cultivars belonging to the other genetic groups or admixture of them. In this study, 89 samples showing phenotypic traits of Criollo cacao from four regions of Honduras (Cop á n, Santa B á rbara, Intibuc á , and Olancho) were selected to study their genetic purity using 16 SSR molecular markers. The results showed that some samples belong to the Criollo group while other accessions have genetic characteristics of “Trinitario” or other admixtures cacao types. These results conﬁrm the genetic purity of Criollo cacao in Honduras, reafﬁrming the theory that Mesoamerica is a cacao domestication center and also serves to generate interest in the conservation of this genetic wealth both in-situ and ex-situ.


Introduction
Cacao (Theobroma cacao L.) is an economically important crop because it produces the raw material for making chocolate. It is native to South America, with its center of origin in the upper Amazonian region spanning Peru, Colombia, Brazil, and Ecuador [1][2][3]. Cacao was initially thought to be domesticated in Southern Mexico and Central America since vessels used by pre-Columbian cultures in Honduras and Mexico contained trace remains of theobromine, confirming the use of cacao products 1800-1000 years BCE [4][5][6][7][8][9]. However, recent archaeological finds reveal that the upper Amazon region was also a center for the domestication of cacao [10].
Spanish settlers in Honduras were introduced to cacao and realized the importance of this crop [11,12]. Recent studies showed that during this period of domestication there was a high selection for genes related to anthocyanins and theobromine, which caused the cacao Criollo to maintain a high frequency of deleterious mutations [13].
Cacao has high genetic and morphological diversity, with a diploid genome consisting of 10 chromosomes (2n = 20) [14] with a genome size between 411 and 494 Mbp [15]. Initially based on phenotype, cacao was classified into three groups Criollo, Forastero, and Trinitario, which is a hybrid between Criollo and Forastero. Forastero cacao is characterized by seeds with purple cotyledons and most of the chocolate consumed in the world is produced from these seeds known as "bulk or ordinary" cacao [3]. Criollo cacao is characterized by having white or pale pink seeds, used for the formulation of premium chocolate that sells for higher prices in the market [16]. Trinitario beans are variable in cotyledon color and produce a liquor with a strong cacao flavor. Criollo and Trinitario are known collectively as "fine of flavor" cacao, along with Amelonado, Nacional, and Refractario (Nacional hybrids). There are different meanings of the term "Criollo", depending on whether it is used by breeders, botanists, or growers. "True Criollo", "Ancient Criollo", and "Criollo" are the ways it is called respectively depending upon who is using the term [16]. Currently, Criollo is defined as a unique genetic group based on the classification proposed by Motamayor et al. [17]. In fact, with the use of molecular tools, a new classification of 10 genetic groups has been proposed, which has furthered the understanding of the genetic and phenotypic diversity of the crop, and "Criollo" cacao is one of the identified groups [17].
Knowing the genetic diversity of cacao has been of great interest to breeders and growers. The first classifications of the Theobroma species were made concerning their phenotypic characteristics, evidencing that one of the marked differences between Criollo and the others, was the content of anthocyanins in its seeds [2,4,18]. The first genetic diversity studies of cacao were made using RAPD and RFLP markers [19][20][21][22][23], followed by SSR markers [24][25][26][27][28][29][30][31][32], and SNP markers [33][34][35][36][37][38]. Some of these studies made it possible to identify Criollo as a different genetic group. A high level of homozygosity and significantly low genetic diversity within the Criollo population was also found [24] in a great part attributable to Criollo self-compatibility.
The results of the genetic diversity of cacao from Honduras and Nicaragua with SNP markers confirmed the existence of a population of ancient cacao Criollo. Furthermore, the Trinitario varieties of cacao used by growers are more influenced by the Amelonado genetic group than the Criollo genetic group [7]. The study also showed that although the varieties of cacao type Trinitario used by growers in both countries are different, the samples of ancient cacao Criollo analyzed were grouped in the same cluster [7]. Several studies on the genetic diversity of cacao have used samples of cacao Criollo from Honduras as a reference of the purity of the Criollo genetic group [2,7,16,29,35,39]. A phenotypic characterization study of cacao from Honduras determined that 72% of the production comes from cacao Trinitario type varieties and 28% comes from Amelonado type varieties [40]. The cacao growers in Honduras have the technical and scientific support provided by the Honduran Foundation for Agricultural Research (FHIA) for over 36 years and the foundation's agroforestry program has been collecting genetic material and maintaining a germplasm bank in order to preserve elite material for the improvement of cacao cultivation. However, the genetic purity of Criollo in Honduras and its diversity has not been fully investigated, even if it would be highly beneficial to the cacao genetic improvement and the conservation of Criollo in-situ and ex-situ. Therefore, the goals of this study were to assess the genetic purity and diversity of 89 cacao accessions, which have been selected based on the Criollo phenotype from four different departments of Honduras, using single sequence repeat (SSR) markers. The results of the analyses have also been compared with a robust SSR dataset of 116 samples obtained from the original classification developed by Motamayor et al. [17] representing the ten cacao genetic groups.

Plant Material
In this study, eighty-nine cacao plants were selected from four different departments in Honduras: Copán, Santa Bárbara, Olancho, and Intibucá ( Figure 1). Cacao trees were previously identified by the farmers, who had preserved the genetic material for many years, and only trees showing phenotypical characteristics of Criollos were sampled ( Figure 2). Generally, they were located inside small and medium-sized family farms, isolated, and not properly maintained. Two young fully expanded leaves were selected from each tree and conserved and transported inside 20 mL plastic tubes filled with 96% ethanol [41,42] for DNA extraction. The GPS coordinates were recorded for each tree for subsequent mapping (Table S1).

Plant Material
In this study, eighty-nine cacao plants were selected from four different departments in Honduras: Copán, Santa Bárbara, Olancho, and Intibucá ( Figure 1). Cacao trees were previously identified by the farmers, who had preserved the genetic material for many years, and only trees showing phenotypical characteristics of Criollos were sampled ( Figure 2). Generally, they were located inside small and medium-sized family farms, isolated, and not properly maintained. Two young fully expanded leaves were selected from each tree and conserved and transported inside 20 mL plastic tubes filled with 96% ethanol [41,42] for DNA extraction. The GPS coordinates were recorded for each tree for subsequent mapping (Table S1).

DNA Extraction and Quantification
For DNA extraction, 40-50 mg of dry leaf sample was placed in a 2 mL tube with tungsten carbide beads, frozen in liquid nitrogen, and finely ground in a tissue homogenizer (Tissue Lyser, QIAGEN, Hilden, Germany). DNA was extracted using an Invisorb Spin Plant Mini Kit (Stratec Molecular, Berlin, Germany). DNA has been electrophoresed on an agarose gel to validate quality and quantity.

DNA Extraction and Quantification
For DNA extraction, 40-50 mg of dry leaf sample was placed in a 2 mL tube with tungsten carbide beads, frozen in liquid nitrogen, and finely ground in a tissue homogenizer (Tissue Lyser, QIAGEN, Hilden, Germany). DNA was extracted using an Invisorb Spin Plant Mini Kit (Stratec Molecular, Berlin, Germany). DNA has been electrophoresed on an agarose gel to validate quality and quantity.

SSR Markers
Sixteen previously reported SSR markers were used in this study [43,44]. The list of the SSR markers' names, the EMBL accession, the chromosome where the locus is mapped, the specific primers, the observed size range of the amplicon, the motifs, and the references are reported in Table 1.  [17] was included as reference accessions (Table S2) representing the ten cacao genetic groups identified by Motamayor et al. [17]: Amelonado, Contamana, Criollo, Curaray, Guiana, Iquitos, Marañon, Nacional, Nanay, and Purús. Two SSR data sets were generated, one including only the samples from Honduras, named "Honduras" and the other data set including both Honduras and reference accessions, named "Complete". For the "Complete" dataset a representative number of reference accessions was analyzed along with our samples to assess the comparability of the two datasets.

Data Analysis
Despite gathering samples in a vast area, Honduran accessions sharing identical SSR profile were retrieved. Therefore, to reduce noise and to avoid any alteration of the results due to the presence of identical genotypes, a preliminary analysis was performed with the software Cervus 3.0.9 [45]. In this case, only the samples that shared 100% of their allele's identity were considered. From this procedure, 32 identical samples were detected and then not taken into consideration in the subsequent evaluation unless they do not belong to the same department, reducing the number of examined genotypes to 57 (Table S3).
Concerning the "Honduras" dataset, for each SSR locus, the number of alleles (Na), the effective number of alleles (Ne), frequency of the predominant allele (Fa), the observed (Ho) and expected heterozygosity (He), and the fixation index (F) were calculated using Genealex 6.5 [46]. The polymorphic information content (PIC) was calculated with PowerMarker 3.25 [47].

Cluster Analysis
The Bayesian approach implemented in the software STRUCTURE v.2.3.4 [48] was used to define the population structure in both datasets, "Honduras" and "Complete", as previously described. An admixtured and shared allele frequencies was adopted to determine the number of populations (K), assumed in the range from 1 to 10 in "Honduras" and from 1 to 20 in "Complete". For each K value, 10 runs with 100,000 burn-in periods and 200,000 MCMC (Markov Chain Monte Carlo) iterations were performed. The ∆K of the Evanno method [49] was calculated to estimate the best value of K that fits the data, by using STRUCTURE HARVESTER [50]. The structure plot was obtained with the package pophelper v2.3.0 [51] in the R-project. The Criollo reference (Criollo 13) was included in the "Honduras" dataset to establish which Honduran samples were pure Criollo.
A further STRUCTURE analysis was performed with the 116 reference accessions from Motamayor et al. [17] along with the eight Honduran pure Criollo cacaos, which had the highest coefficient of membership (higher than 0.99) to the Criollo cluster in the STRUCTURE analysis of "Complete" dataset (Cop_22, Cop_23, Cop_24, Cop_28, Cop_29, San_106, San_107, and San_109) to sustain the fitness with the 10 genetic groups established in Motamayor et al. [17].
Moreover, concerning the "Complete" dataset, a matrix of Manhattan distances was obtained to produce a principal coordinates analysis (PCoA) with the packages adegenet 2.1.3 [52] and Ade4 [53] in the R-project. The result was plotted to show graphically if individuals tended to be grouped according to the 10 cacao genetic groups identified by Motamayor et al. [17].

Descriptive Genetic Parameters of the SSR Loci
In total, the 16 SSR markers were able to amplify 88 alleles across all cacao samples of the "Honduras" dataset. As shown in Table 2

Cluster Analysis
The STRUCTURE analysis results indicated that the 57 Honduran samples were grouped in two clusters (Figure 3). Only the accessions with a coefficient of membership higher than 0.70 were considered belonging to one of the two clusters. Trees sampled in Copán and Santa Bárbara (except for San_5, San_6, San_17, San_87, San_88, and San_89) were clustered in the same population along with Criollo 13, which is the reference carrying the genetic characteristics of pure Criollo cacao. Only two accessions sampled in Intibucá (Int_54 and Int_55) were grouped along with Criollo 13. Additionally, some accessions collected in Olancho (Ola_11, Ola_14, Ola_15, Ola_57, Ola_60, Ola_62, Ola_63, Ola_69, Ola_79, and Ola_85) were found to be pure Criollo, while ten samples were grouped in the second population, being represented by the blue color of the bar (Ola_8, Ola_9, Ola_12, Ola_16, Ola_59, Ola_61, Ola_65, Ola_68, Ola_75, and Ola_84). The remaining genotypes from Intibucá and Ola_10, Ola_86, San_3, San_4, and San_18 showed an admixture of the genetic characteristics of both two clusters (being the bar roughly half blue and half green). The STRUCTURE analysis results indicated that the 57 Honduran samples were grouped in two clusters (Figure 3). Only the accessions with a coefficient of membership higher than 0.70 were considered belonging to one of the two clusters. Trees sampled in Copán and Santa Bárbara (except for San_5, San_6, San_17, San_87, San_88, and San_89) were clustered in the same population along with Criollo 13, which is the reference carrying the genetic characteristics of pure Criollo cacao. Only two accessions sampled in Intibucá (Int_54 and Int_55) were grouped along with Criollo 13. Additionally, some accessions collected in Olancho (Ola_11, Ola_14, Ola_15, Ola_57, Ola_60, Ola_62, Ola_63, Ola_69, Ola_79, and Ola_85) were found to be pure Criollo, while ten samples were grouped in the second population, being represented by the blue color of the bar (Ola_8, Ola_9, Ola_12, Ola_16, Ola_59, Ola_61, Ola_65, Ola_68, Ola_75, and Ola_84). The remaining genotypes from Intibucá and Ola_10, Ola_86, San_3, San_4, and San_18 showed an admixture of the genetic characteristics of both two clusters (being the bar roughly half blue and half green). The ΔK of the Evanno method separated the "Complete" dataset into two main clusters ( Figure S1). No other K was identified as a probable population number. The first cluster (in blue) consisted of all pure reference accessions of Criollo and most samples from Honduras with a different degree of Criollo purity (Figure 4). The second cluster included the other cacaos accessions presenting admixture population characteristics. This group contained the remaining genetic groups identified by Motamayor et al. [17]. The ∆K of the Evanno method separated the "Complete" dataset into two main clusters ( Figure S1). No other K was identified as a probable population number. The first cluster (in blue) consisted of all pure reference accessions of Criollo and most samples from Honduras with a different degree of Criollo purity (Figure 4). The second cluster included the other cacaos accessions presenting admixture population characteristics. This group contained the remaining genetic groups identified by Motamayor et al. [17]. The divergence between Criollo and the other groups was much higher than the genetic divergence among the remaining nine clusters, and the presence of many Criollo samples in the dataset determined the clustering of the other cacaos into the same group. According to the Manhattan distances, a principal coordinates analysis was produced from the first two dimensions to show the genetic differences in the "Complete" dataset. The first dimension accounted for 24.42% of the total variation, while the second dimension for 9.69%. PCoA revealed the same result of the STRUCTURE analysis: two main clusters were identified. In the group located on the left of the plot all Criollo samples clustered together either those collected in Honduras either the reference Criollo accessions from Motamayor et al. [17], while the other nine genetics groups were clustered on the right side of the plot ( Figure 5) along with some Honduran samples mainly distributed among the Amelonado, Guiana, and Marañon groups. Moreover, other Honduran samples, showing features of both of the two clusters and covering the genetic distances between Criollo and other genetic groups (Amelonado, Guiana, and Marañon), should be ascribed to as Trinitario.  [49]. Each individual sample is represented by a vertical line divided into K colored segments that represent the membership fraction for each cluster. The bar line under the plot represents the Honduras samples and the ten genetic groups identified by Motamayor et al. [17].
Honduran accessions included also Trinitario samples that showed genetic characteristics of Criollo and other genetic groups [54].
According to the Manhattan distances, a principal coordinates analysis was produced from the first two dimensions to show the genetic differences in the "Complete" dataset. The first dimension accounted for 24.42% of the total variation, while the second dimension for 9.69%. PCoA revealed the same result of the STRUCTURE analysis: two main clusters were identified. In the group located on the left of the plot all Criollo samples clustered together either those collected in Honduras either the reference Criollo accessions from Motamayor et al. [17], while the other nine genetics groups were clustered on the right side of the plot ( Figure 5) along with some Honduran samples mainly distributed among the Amelonado, Guiana, and Marañon groups. Moreover, other Honduran samples, showing features of both of the two clusters and covering the genetic distances between Criollo and other genetic groups (Amelonado, Guiana, and Marañon), should be ascribed to as Trinitario.
A further STRUCTURE analysis was performed with the 116 references accessions from Motamayor et al. [17] along with the eight Honduran pure Criollo cacaos: Cop_22, Cop_23, Cop_24, Cop_28, Cop_29, San_106, San_107, and San_109. This reduction of the samples representative of the Honduran population was necessary to diminish the statistical weight of the Criollo accessions, which, due to their genetic distance with the other groups and to their low variability, determined a peak of the ∆K corresponding to K = 2 and not congruent with that of [17]. The ∆K of the Evanno method showed that the most probable population number is K = 2, highlighting a great genetic distance between Criollo and all the other genetic groups; however, peaks of K= 5 and K = 10 were also identified as the possible number of clusters ( Figure S2). The value K = 10 showed the same results obtained by Motamayor et al. [17] (Figure 6). All Criollo accessions collected in Honduras correctly clustered with the Criollo references, while, all the other remaining genetic groups were identified.
Agronomy 2021, 11, x FOR PEER REVIEW 10 of 15 Figure 5. Principal coordinates analysis of the "Complete" dataset. The references accessions are divided based on the ten genetic groups [17], while all Honduran samples are identified by the same symbol to better visualize their wide distribution.
A further STRUCTURE analysis was performed with the 116 references accessions from Motamayor et al. [17] along with the eight Honduran pure Criollo cacaos: Cop_22, Cop_23, Cop_24, Cop_28, Cop_29, San_106, San_107, and San_109. This reduction of the samples representative of the Honduran population was necessary to diminish the statistical weight of the Criollo accessions, which, due to their genetic distance with the other groups and to their low variability, determined a peak of the ΔK corresponding to K = 2 and not congruent with that of [17]. The ΔK of the Evanno method showed that the most probable population number is K = 2, highlighting a great genetic distance between Criollo and all the other genetic groups; however, peaks of K= 5 and K = 10 were also identified as the possible number of clusters ( Figure S2). The value K = 10 showed the same results obtained by Motamayor et al. [17] (Figure 6). All Criollo accessions collected in Honduras correctly clustered with the Criollo references, while, all the other remaining genetic groups were identified. The bar line under the plot represents the ten genetic groups identified by [17]. The 8 representative Honduras pure Criollo samples cluster with the Amazon Criollo reference accessions.

Discussion
The cultivation of cacao is of great economic importance for small producers in Honduras. In recent years, the number of hectares planted was increased, and therefore the cacao export volumes have also grown, especially to Europe. The increase in the Figure 6. Population structure of Theobroma cacao L. of the 116 reference accessions along with 8 representative Honduran pure Criollo samples based on 16 SSR markers, using K = 10 according to the second best ∆K. Each individual sample is represented by a vertical line divided into K colored segments that represent the membership fraction for each cluster. The bar line under the plot represents the ten genetic groups identified by [17]. The 8 representative Honduras pure Criollo samples cluster with the Amazon Criollo reference accessions.

Discussion
The cultivation of cacao is of great economic importance for small producers in Honduras. In recent years, the number of hectares planted was increased, and therefore the cacao export volumes have also grown, especially to Europe. The increase in the demand for Honduran cacao is due to the high quality chocolate produced by the fine flavor cacao genetically influenced by the mixture of Criollo cacao. However most of the cacao cultivated fields are of the Trinitario type, which is a mixture between Criollo and Amelonado [7,55], because of the advanced lack of interest of Criollo cultivation in Honduras where no commercial plantations are available. Since about 10 years ago, ex-situ conservation began, taking some trees of Criollo cacao from the farmers to the FHIA germplasm bank. The cacao cultivation in Honduras has a historical and cultural significance, it is known from anthropological discoveries that ancient populations such as the Mayas, Lencas, Chortis, Tolupanes, and others used cacao as a form of trade and even came to have importance in their ceremonial rites, and in the form of commodities exchange (used as money) [10]. It is believed that the Criollo cacao was the only cultivated in Mesoamerica before the arrival of the Europeans [2,24]. Some investigations indicate that cacao was domesticated in Mesoamerica, from Southern Mexico to Panama and that is why it is common that in these countries Criollo cacao is found in a dispersed way. Possibly, the Mesoamerican cacao producers stopped cultivating Criollo cacao due to the arrival of new cacao cultivars (Forastero type) that are more resistant to pests and diseases and overall more productive [7,16]. Although Criollo cacao did not continue to be cultivated, it is found in small plots or isolated areas that are about to disappear. It is observed that some Criollo cacao trees are not vigorous, possibly the genetic isolation has caused an increase in the level of homozygosity (as confirmed by the observed heterozygosity and F index detected in the present study) hindering their growth, development, and adaptability to other environmental conditions, but Criollo cacao is superior in quality [40].
There are few studies on the characterization of the genetic diversity of Criollo cacao of Honduras using genomic tools. The most important study was carried out by Ji et al. [7] using SNP markers, where eleven out of the fourteen samples collected as Criollo from Honduras turned out to be pure. Motilal et al. [39], investigating the genetic diversity of Belizean Criollo cacao found that eleven different genotypes belonged to the Criollo group out of 77 samples analyzed. In this study, five accessions of Criollo cacao from Honduras (Honduras 6, 10, 11, 13, and 18), which belong to the international cacao genebank in Trinidad and Tobago (ICG, T), were used as references. In both studies [7,39] the authors used Criollo 13 as a reference as was done in the present research. Criollo 13 belongs to the CATIE cacao germplasm bank in Costa Rica [16]. The results of the present study are similar to those found by [7,39]: in fact, thirty out of the fifty-seven samples considered were shown to be pure Criollo, using Criollo 13 as a reference in the analysis, and they are distributed in Copán, Santa Bárbara, and Olancho. However, although samples were taken from trees that phenotypically have characteristics of Criollo cacao, the results show that not all the collected samples have genetic purity: in fact, eleven cacaos (from Intibucá, Olancho, and Santa Bárbara) have a mixture of Criollo and other genetic groups and therefore they could be considered as Trinitario type (Figures 3 and 4). Moreover, the remaining Honduras samples show a genetic profile clustering mainly with the Amelonado, Guiana, and Marañon genetic groups ( Figure 5) and they are predominantly located in the Olancho departments, where the cultivation of cacao has been recently introduced. The geographical areas where the samples of Criollo cacao were collected in Honduras (Copán, Santa Bárbara, Intibucá, and Olancho) represent areas currently inhabited by indigenous groups that preserve the tradition of cacao cultivation, especially in Copán, which was in past times the most important ceremonial and astronomical center of the Mayan culture [24,56].
The observed heterozygosity level detected in the "Honduras" dataset is low and comparable with that observed within the cacao populations analyzed by [7,39]. This low level of heterozygosity is probably due to the fact that samples has been collected either in small-medium family farms either from wild sources where they live in isolation or in limited groups without any farming care. This condition might be conducive to inbreeding or possible crossover between siblings that could account for the low variability within the Criollo population. The low variability is also supported by the PCoA where the pure Criollos are located in a restricted area on the left side of the graphic ( Figure 5). Moreover, the distribution of the groups in the PCoA shows a great affinity with those of the PcoA in Ji et al. [7] and of the multivariate analysis of the genetic distances in Motilal et al. [39], indicating that a fairly relative number of SSR marker is sufficient to identify the main cacao groupings.
The present study evidenced that notwithstanding the collection of samples has been carried on according to the morphological characteristics of Criollo, some samples showed a different genetic clustering. Therefore, the morphological characterization is not sufficient to identify Criollo cacao and genetic analysis is requested for a correct determination of the samples.

Conclusions
The SSR markers used in the current study were able to separate the samples of pure Criollo cacao from the other remaining genetic groups and admixture populations. Results highlighted the great genetic distance between Criollo and other genetic groups. Criollo 13 cacao continues to be a good reference for studies of cacao genetic diversity where Criollo samples are used. Finally, it was possible to determine the genetic purity of cacao Criollos from Honduras and affirm the results of other archaeological investigations that indicate that Mesoamerica is the center of cacao domestication. These results can be used to include these samples of Criollo cacao in a genetic improvement program and especially for the conservation of this genetic wealth that we have inherited from our ancestral peoples.
Supplementary Materials: The following are available online at https://www.mdpi.com/2073-4 395/11/2/225/s1, Table S1: List of the 89 cacao accessions collected in Honduras. The ID, the department where samples were collected and the GPS coordinates are shown, Table S2: List of 116 reference from Motamayor et al. [17], Table S3: List of the allelic profiles of the 57 Honduran samples for the 16 SSR loci, Figure S1: The ∆K of the Evanno method for the "Complete" dataset, Figure S2

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.