Assessing the Genetic Diversity of Ilex guayusa Loes., a Medicinal Plant from the Ecuadorian Amazon

: Ilex guayusa Loes. is a shrub native to the Neotropics, traditionally consumed as an infusion. Despite its cultural value and extensive use, genetic research remains scarce. This study examined the genetic and clonal diversity of guayusa in three different Ecuadorian Amazon regions using 17 species-speciﬁc SSR markers. The results obtained suggest a moderately low degree of genetic diversity (He = 0.396). Among the 88 samples studied, 71 unique multilocus genotypes (MLGs) were identiﬁed, demonstrating a high genotypic diversity. A Discriminant Analysis of Principal Components (DAPC) revealed the existence of two genetic clusters. We propose that a model of isolation-by-environment (IBE) could explain the genetic differentiation between these clusters, with the main variables shaping the population’s genetic structure being temperature seasonality (SD × 100) (Bio 4) and isothermality × 100 (Bio 3). Nonetheless, we cannot dismiss the possibility that human activities could also impact the genetic diversity and distribution of this species. This study gives a ﬁrst glance at the genetic diversity of I. guayusa in the Ecuadorian Amazon. It could assist in developing successful conservation and breeding programs, which could promote the economic growth of local communities and reinforce the value of ancestral knowledge.


Introduction
Ilex guayusa Loes. (Aquifoliaceae) is a shrub or a small tree [1] native to the Neotropics [2,3], distributed along the Amazon region of Colombia, Ecuador, Peru, and Bolivia [1][2][3][4][5][6]. It is commonly used to prepare tea-like infusions similar to other Amazonian beverages such as yerba mate (I. paraguariensis A. St.-Hil) and guaraná (Paullinia cupana Kunth) [6]. An important attribute of guayusa tea is its high caffeine content and a very palatable flavor without the bitterness often perceived in common tea (Camellia sinensis (L.) Kuntze) [7,8]. It grows in altitudes ranging from 200 to 2000 m.a.s.l. [6,9], and Ecuador appears to be the main center for its cultivation, especially towards the eastern Andean slopes and the adjacent Amazonian piedmont [4,[9][10][11]. Growing as part of secondary forests, guayusa proliferates on sandy-loam soils with acidic pH, and in humid and semi-dark environments [2]. Extensive use of this species by humans throughout its range of distribution strongly suggests some degree of domestication [3]. At present, its reproduction is only known to occur asexually through human propagation of basal shoots and hardwood stem cuttings [2,11]. Years of vegetative propagation and selection by humans have likely induced the loss of guayusa flowering capability [12]. According to the Global Biodiversity Information Facility (GBIF) database (accessed on 24 June 2020), the most recent preserved fertile specimen was collected in Peru in 2004. In Ecuador, the last fertile collected specimen  [29]. The Ecuadorian Amazon was divided-for the purpose of this research-into three regions: Northern, Central, and Southern, considering the samples' latitudinal range (each region covering a latitudinal extension of 1.75 decimal degrees). Each circle represents an individual, and its color represents the region to which it belongs. The Ecuadorian Amazon provinces are labeled and highlighted in colors.

Genomic DNA Isolation
Total genomic DNA was isolated using a modified CTAB-activated charcoal protocol described by Križman et al. (2006) [30] and an inhibitor-free isolation protocol for recalcitrant plants described by Rezadoost et al. (2016) [31]. Briefly, 1.5 cm 2 of a leaf was finely ground in a mortar with liquid nitrogen and homogenized with 1 mL of extraction buffer: 100 mM Tris-HCl pH 8, 2 M NaCl, 20 mM EDTA pH 8, 2% (w/v) CTAB, 1% (w/v) PVP (MW 10,000), 0.5% (w/v) activated charcoal, and 1% (v/v) β-mercaptoethanol; the last two components were suspended right before use [30]. The mixture was transferred into a microcentrifuge tube and incubated at 55 °C for 30 min with frequent agitation. Sam-  [29]. The Ecuadorian Amazon was divided-for the purpose of this research-into three regions: Northern, Central, and Southern, considering the samples' latitudinal range (each region covering a latitudinal extension of 1.75 decimal degrees). Each circle represents an individual, and its color represents the region to which it belongs. The Ecuadorian Amazon provinces are labeled and highlighted in colors.

Genomic DNA Isolation
Total genomic DNA was isolated using a modified CTAB-activated charcoal protocol described by Križman et al. (2006) [30] and an inhibitor-free isolation protocol for recalcitrant plants described by Rezadoost et al. (2016) [31]. Briefly, 1.5 cm 2 of a leaf was finely ground in a mortar with liquid nitrogen and homogenized with 1 mL of extraction buffer: 100 mM Tris-HCl pH 8, 2 M NaCl, 20 mM EDTA pH 8, 2% (w/v) CTAB, 1% (w/v) PVP (MW 10,000), 0.5% (w/v) activated charcoal, and 1% (v/v) β-mercaptoethanol; the last two components were suspended right before use [30]. The mixture was transferred into a microcentrifuge tube and incubated at 55 • C for 30 min with frequent agitation. Samples were centrifuged at 16,000× g for 10 min, and the supernatant was transferred to a new tube. One volume of chloroform-isoamyl alcohol (24:1) was added to each tube and vortexed thoroughly. Samples were then centrifuged at 16,000× g for 10 min, and the chloroform-isoamyl alcohol step was repeated. After subsequent centrifugation, the supernatant was transferred to a new tube, and 1 2 volume of Buffer 2 was added: 50 mM Tris-HCl, 2 M guanidine thiocyanate, 0.2% (v/v) β-mercaptoethanol, and 0.2 mg/mL proteinase K; the last two components were suspended right before use [31]. Tubes were incubated at 40 • C for 15 min, and one half of total volume 4 M NaCl was added. Tubes were placed on ice for 5 min, and then 300 µL of isopropanol was mixed by inversion. Incubation took place at room temperature for at least 1 h. Tubes were centrifuged at 9000× g for 20 min, and pellets were washed with 500 µL of wash buffer (15 mM ammonium acetate in 75% (v/v) ethanol) [30]. Pellets were air-dried for 15 min, and DNA was suspended in 30 µL of ultra-pure distilled water. DNA quality and concentration were assessed using a NanoDrop 2000 Spectrophotometer (Thermo Scientific, Wilmington, DE, USA). Samples were diluted to a final concentration of 30 to 60 ng/µL.

Development of SSR Markers for Ilex guayusa
SSR markers were developed at the University of Manchester Genomics Facility using the Galaxy-based bioinformatics pipeline for Illumina next-generation sequencing data [32]. Briefly, DNA was fragmented on a sonicator and size-selected before sequencing. The Illumina MiSeq platform was used to sequence a single individual of I. guayusa with the shotgun 2 × 250 paired-end sequencing methodology (Nextera DNA Preparation Kit, Illumina, San Diego, CA USA) and using 0.33 of a flow cell. After sequencing, data quality was assessed using FastQC [33], and reads were filtered and trimmed using Trimmomatic [34].
Microsatellite loci were located and identified using PAL finder v.0.02 [35]. The configuration was set to search for sequences with a minimum of eight repeated units for each locus, with motifs varying between two and six nucleotides. The function Pal_filter [32] was used to remove imperfect, interrupted repeats, loci without primers, and loci where the primer sequence had occurred in more than one read. The functions PANDASeq and Pal_filter [32] were used to assemble paired-end reads. Primer3 software was used for oligo design [36,37]. Primer design was optimized for use with Platinum Taq DNA polymerase (Invitrogen, Carlsbad, CA, USA) with an optimal Tm of 62 • C and a maximum difference of 3 • C among primers.
A total of 14,903,079 raw sequence reads were produced, with none flagged as poor quality. Sequence length ranged from 50 to 300 bp with a reported GC content of 37%. After screening, a total of 328 primer pairs were obtained; 13 SSRs had motifs of 3 bp and the rest consisted of 2 bp motifs. From the total, 17 SSRs were selected as candidates for our study (Table S2). The melting temperature (Tm) of the primer sets ranged from 58 to 63 • C, and primer length varied from 18 to 25 bp. Locus-specific forward primers were synthesized, including a universal tail (Tail A) complementary to universal primers fluorescently labeled with VIC, 6-FAM, NED, or PET [38].

Sample Preparation and Genotyping
For each set of primers, the optimal annealing temperature was adjusted to perform further PCR amplification. PCR products were amplified and fluorescently labeled using a three-primer system protocol described by Blacket et al. (2012) [38]. PCR reactions were carried in a total volume of 30 µL containing 1 U of Platinum Taq polymerase (Invitrogen, Carlsbad, CA, USA), 1× PCR Buffer, 3 mM MgCl 2 , 1 mg/mL BSA, 0.2 mM dNTPs, 0.5 µM universal fluorescent primer, 0.15 µM forward primer, 0.5 µM reverse primer and 30 ng of DNA template. Cycling conditions consisted of an initial denaturation of 94 • C for 15 min, followed by 45 cycles of 94 • C for 30 s, 90 s at the optimal annealing temperature, 72 • C for 60 s, and a final elongation step at 72 • C for 5 min. Successful amplification was checked by running an electrophoresis of the PCR products in a 1.5% (w/w) agarose gel stained with SYBR Safe (Invitrogen, Carlsbad, CA, USA). Labeled PCR products were genotyped by Macrogen Inc. (Seoul, Korea) in an ABI 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA, USA) using 500LIZ as the size standard. Genotyping results were read using GeneMarker v.2.4.0 (Softgenetics, State College, PA, USA) [39]. For the analyses, we divided the Ecuadorian Amazon area into three regions: Northern, Central, and Southern, considering the latitudinal range where the samples were collected, from −0.00913 to −4.64852 latitudes. Each region covered a latitudinal extension equivalent to 1.75 decimal degrees. Multilocus genotypes (MLGs) were assigned to the individuals using GenoDive v.1.2 to identify potential clones [40]. Average Bruvo's genetic distances [41] across all loci were used as input for GenoDive. These were calculated using the poppr package available for R v.1.3.959 [42,43] and transformed (range from 0 to 100). A clonal threshold of 2 was used according to the parameters for threshold selection suggested by Meirmans and Van Tienderen [40]. Pairs of individuals with a genetic distance below the threshold shared the same MLG and were therefore considered clones. To avoid a biased estimation of genetic diversity and population structure, analyses were conducted using clone correction, which is a method of data censorship [44]. In this way, clones were censored at the region level, and only one individual per MLG was represented in the dataset.
A genotype accumulation curve was calculated using the poppr R package [42] with 10,000 permutations to determine the loci's resolution power to discriminate between genotypes. The probability of observing a particular MLG in an individual was calculated with the same package using the formula Pcgen = (∏pi)2 h , where pi is the frequency for each allele observed in the MLG and h is the number of heterozygous loci. Then, (Pcgen) n−1 was obtained, where n is the number of times the genotype was observed in the population. These statistics were calculated to assess the probability that the same MLGs were assigned by chance between individuals [45].
Indexes of clonal diversity were calculated in GenoDive. The total number of alleles (A), expected heterozygosity (He), observed heterozygosity (Ho), and F-statistics were obtained using the hierfstat R package [46]. The adegenet R package [47] was used to evaluate the inbreeding coefficients' distribution as an indicator of the proportion of inbreeding. Private alleles (PA) were determined with the poppr R package [42], and the mean allelic richness (AR) standardized to the minimum sample size (N = 13) through rarefaction was calculated with the diveRsity R package [48]. The polymorphic information content (PIC) for each primer was determined with the polysat R package [49], and null allele frequencies for all loci were calculated with FreeNA [50]. The Hardy-Weinberg equilibrium was tested for all loci using the pegas R package [51].

Population Structure
To test genetic differentiation among and within regions, an Analysis of Molecular Variance (AMOVA) was performed using GenAlex v.6.5 with 9999 permutations [52]. The hierfstat R package [46] was used to calculate pairwise F ST between regions. In species with clonal or mixed reproduction (sexual and asexual), model-free clustering methods are recommended to infer population structure [53,54]. For this reason, a DAPC was performed with the adegenet R package [47]. To determine the optimal value of genetic clusters (K), values of between 1 and 10 were assayed by calculating the Akaike information criterion (AIC) and the Kullback Information Criterion (KIC) statistics [55]. The number of PCs retained for DAPC was determined by a mean a-score optimization and stratified cross-validation.
A Principal Coordinates Analysis (PCoA) was performed with the ape R package [56] based on Bruvo's genetic distances. Ellipses were drawn to visualize differentiations between regions (Northern, Central, Southern) and the genetic clusters identified with the DAPC. To tests for statistical significance between groups, we performed a distance-based Redundancy Analysis (dbRDA) and an ANOVA-like permutation test with the vegan R package [57], followed by pairwise comparisons using the 'multiconstrained()' function of the BiodiversityR R package [58]. The effects of geographic and environmental differences on genetic structure were preliminary evaluated with Mantel tests using the R package vegan [57]. Nineteen bioclimatic variables of the current climate conditions (average for 1970-2000) from the WorldClim database v.2.1 were downloaded with a 30 arc-second resolution [59], and their values were extracted using the raster R package [60]. To avoid multicollinearity, highly correlated bioclimatic variables (r > 0.9) were removed using the caret R package [61], resulting in six non-redundant bioclimatic variables: annual mean diurnal range (Bio 2), isothermality ×100 (Bio 3), temperature seasonality (SD × 100) (Bio 4), mean temperature of the wettest quarter (Bio 8), precipitation seasonality CV (Bio 15), and precipitation of the driest quarter (Bio 17). Each bioclimatic variable was considered a different environmental space vector, and units were first standardized using the vegan R package [57]. Then, we used the Canberra distance to calculate the environmental distances between individuals within these space vectors [62]. Geographic distances were calculated following the Haversine function with the R package geosphere [63]. Significance was evaluated with Spearman's rho coefficient with 9999 permutations.
The most relevant bioclimatic variables for predicting population structure were identified with the vegan R package [57] by computing a dbRDA and an ANOVA-like permutation test (9999 permutations) on the genetic distances, using the bioclimatic variables as explanatory variables. Additionally, the log-transformed scores along the first principal coordinate of the PCoA (PC1) were used as the response variable in a linear regression model, using the bioclimatic variables as the predictors. The best resulting linear model was obtained through a stepwise selection of the bioclimatic predictors using the MASS package [64], and the multicollinearity was assessed by calculating the Variance Inflation Factor (VIF).

Niche Characterization Modeling
We used MaxEnt v.3.4.1 [65], a presence-background modeling software based on the maximum entropy algorithm, to model the current potential distribution range of the Ecuadorian guayusa at the species level and for the different genetic clusters found in this study. Ilex guayusa occurrence points were obtained from the Global Biodiversity Information Facility (GBIF) database (accessed on 24 June 2020) [66][67][68][69][70][71][72][73][74] and from the geographic coordinates of the individuals sampled in this study. A total of 457 occurrence points were manually verified and filtered accordingly with the 'thin()' function of the spThin R package [75], with 100 iterations and a thinning distance of 1 km to avoid sampling biases and increase model performance. After filtration, 187 occurrence points were employed to model the I. guayusa distribution range. To model the genetic clusters' distribution range, the geographic coordinates corresponding to each individual were used as occurrence points.
MaxEnt was run using 10 replicates and a subsample run type with a convergence threshold of 10-5. We used a maximum number of iterations of 5000, a total of 50,000 background points, a regularization multiplier value of 2, and a cloglog output format. The auto-feature settings were used for models with at least 80 occurrence records; linear, quadratic, and hinge features were used for models with occurrences between 15 and 79; and linear and quadratic features for models with occurrences between 10 and 14. For models with fewer than 10 occurrences, only the linear feature was used [76]. To avoid inaccurate predictions outside the environmental range of training data, the "fade-by-clamping" option was used [76].
A random sample comprising 75% of the total database was used for model training, and the remaining 25% were used to evaluate model performance. A threshold rule of 10 percentile training presence was retained. The prediction accuracy of the models was assessed with the calculated area under the curve (AUC). Values of AUC between 0.9 and 1.0 indicate "excellent" model performance, while values between 0.8 and 0.9 indicate "good" model performance [76]. Schoener's D was calculated to assess the overlap between Diversity 2021, 13, 182 7 of 20 the modeled niches using ENMTools v.1.3, which ranges from 0 (no niche overlap) to 1 (identical niches) [77].

Clonal and Genetic Diversity of Ilex guayusa in the Ecuadorian Amazon
Among the 88 guayusa individuals analyzed, a total of 71 MLGs were identified ( Figure S1). After clone correction, 12 samples were eliminated from the dataset, leaving 76 individuals for further analyses. In some cases, individuals sharing the same MLGs were distributed across various sampling locations from our study. This distribution indicated that the probability of finding the same MLG was independent of the distance, and therefore a spatially structured pattern of genetic variation is not evident. The Northern region had the highest number of individuals sharing the same MLG, with up to five individuals sharing the same MLG in one case (Table S3). The genotype accumulation curve demonstrated that using 17 SSR loci was sufficient to discriminate between all MLGs ( Figure S2). Since (Pcgen) n−1 values were <0.05 for all clonal genotypes (7.58 × 10 −8 to 9.56 × 10 −5 ), we can assume that the multiple occurrences of identical MLGs among individuals were unlikely to have been generated by sexual reproduction events [45]. The Central region was found to have the highest clonal diversity. The corrected Shannon-Wiener Diversity index (ShW) was significantly higher for this region when compared to the Northern and Southern regions (p-value = 0.003) ( Table 1). The 76 selected individuals were used to estimate the genetic diversity indexes. A total of 95 alleles with an average of 5.6 alleles per locus were identified, ranging from 36 alleles in the Southern region to 66 alleles in the Central region. The Central region reported the highest values of genetic diversity (He = 0.444), and the Southern region was the least diverse (He = 0.308) ( Table 2). When assessing allelic richness corrected through rarefaction (AR), we obtained the same results: a higher richness for the Central region, followed by the Northern and Southern regions. The overall expected heterozygosity (He) found was 0.396 (Table 2), suggesting a moderately low level of genetic diversity for I. guayusa in the Ecuadorian Amazon. The inbreeding coefficient (F IS ) had an overall value of −0.185. The F IS coefficients' distribution is presented in Figure S3, which shows a right-skewed non-normal distribution, suggesting that inbreeding has rarely occurred in the sampled individuals.
The most informative locus was GYS12, and the least informative locus was GYS14, with overall PIC values of 0.56 and 0.07, respectively (Table S2). Significant Hardy-Weinberg disequilibrium was found in all SSR markers (Table S4). Null alleles presented low to mid mean frequencies across all loci, ranging from 0 (for GYS12, GYS17, GYS 22, and GYS 28) to 0.122 (for GYS03) (Table S5).

Population Structure
Pairwise F ST values ranged from 0.017 to 0.029, revealing that the three Amazon regions exhibit a low degree of genetic differentiation [78]. The Northern and Southern regions were the least differentiated, while the Central and Southern regions were the most differentiated (Table S6). AMOVA analysis indicated that 97% of the total genetic variation was found within regions and 3% among regions (p-value = 0.0026) ( Table 3). DAPC results suggest the existence of two genetic clusters (Figure 2a), as measured by the AIC and KIC statistics ( Figure S4). With this analysis, 67 individuals were classified within Cluster 1 (teal) and nine within Cluster 2 (pink). As seen in Figure 2b, with the discriminant function 1 of the DAPC, these two genetic clusters can be clearly differentiated. Geographically speaking, Cluster 1 comprised individuals from the three Amazon regions, while Cluster 2 included individuals from the Central region exclusively, located at specific collections sites known as "Canelos" and "Sevilla don Bosco" (Figure 2c). The degree of genetic structure was further explored with a PCoA based on Bruvo's distances, in which the first two axes explained 44% of the total variance. Statistically significant differences between the individuals' clustering according to their region of origin can be seen in Figure 3a (p-value = 0.0011). Pairwise comparisons revealed that individuals from the Northern and Southern regions clustered together, showing no differences (p-value = 0.4020). In the case of the Central region, individuals distributed indistinctly in the PCoA plot. However, within this distribution, there were specific samples that separated from the rest through the variance explained by the first coordinate (PC1). For this reason, pairwise comparisons recognized the Central region as a statistically different group compared to the Northern (p-value = 0.0010) and Southern regions (p-value = 0.0210). Figure 3b illustrates that individuals form two separate groups The degree of genetic structure was further explored with a PCoA based on Bruvo's distances, in which the first two axes explained 44% of the total variance. Statistically significant differences between the individuals' clustering according to their region of origin can be seen in Figure 3a  ples that separated from the rest through the variance explained by the first coordinate (PC1). For this reason, pairwise comparisons recognized the Central region as a statistically different group compared to the Northern (p-value = 0.0010) and Southern regions (p-value = 0.0210). Figure 3b illustrates that individuals form two separate groups representing the two genetic clusters previously found with the DAPC (p-value = 0.0001).
The degree of genetic structure was further explored with a PCoA based on Bruvo's distances, in which the first two axes explained 44% of the total variance. Statistically significant differences between the individuals' clustering according to their region of origin can be seen in Figure 3a (p-value = 0.0011). Pairwise comparisons revealed that individuals from the Northern and Southern regions clustered together, showing no differences (p-value = 0.4020). In the case of the Central region, individuals distributed indistinctly in the PCoA plot. However, within this distribution, there were specific samples that separated from the rest through the variance explained by the first coordinate (PC1). For this reason, pairwise comparisons recognized the Central region as a statistically different group compared to the Northern (p-value = 0.0010) and Southern regions (p-value = 0.0210). Figure 3b illustrates that individuals form two separate groups representing the two genetic clusters previously found with the DAPC (p-value = 0.0001).

Isolation-by-Distance (IBD) and Isolation-by-Environment (IBE)
Geographic distances were not correlated to genetic distances (Mantel test p-value = 0.6623, r = −0.0278). Therefore, an IBD model does not seem to explain the observed population structure. However, environmental distances were significantly correlated to genetic distances (Mantel test p-value = 0.0059, r = 0.0945), suggesting that an IBE model could explain the genetic structure we found. Environmental variables' influence as predictors of genetic distances was evaluated with a dbRDA (Figure 4). The model was considered statistically significant (p-value = 0.0002; adjusted R 2 = 0.129), and the first two constrained axes explained 15.97% of the variance. Temperature seasonality (SD × 100) (Bio 4) and isothermality ×100 (Bio 3) were identified as significant environmental variables determining the genetic differentiation among clusters (p-value = 0.0020 and p-value = 0.0010, respectively).
could explain the genetic structure we found. Environmental variables' influence as predictors of genetic distances was evaluated with a dbRDA (Figure 4). The model was considered statistically significant (p-value = 0.0002; adjusted R 2 = 0.129), and the first two constrained axes explained 15.97% of the variance. Temperature seasonality (SD × 100) (Bio 4) and isothermality ×100 (Bio 3) were identified as significant environmental variables determining the genetic differentiation among clusters (p-value = 0.0020 and p-value = 0.0010, respectively). Additionally, the linear regression model for the log-transformed PC1 scores, using the environmental variables as predictors, suggested that there is a significant linear relationship between the PC1, the temperature seasonality (SD × 100) (Bio 4) (β = 0.01437, p-value = 0.0001) and the annual mean diurnal range (Bio 2) (β = 0.06665, p-value = 0.0433) (adjusted R 2 = 0.169). Finally, a Wilcoxon rank-sum test revealed that the values of Bio 4 for both genetic clusters were significantly different (mean Cluster 1 = 41.63 °C, mean Cluster 2 = 49.81 °C; p-value = 1.8888× 10 -5 ), as well as for Bio 3 (mean Cluster 1 = 87.90%, mean Cluster 2 = 86.03%, p-value = 0.0008). The values of Bio 2 for both clusters were not significantly different (p-value = 0.0631).

Niche Characterization Modeling
We modeled a total of three potential ecological niches for I. guayusa: one for the species and one for each of the other two genetic clusters identified in this study. We also obtained the mean AUC values for each model, which is a measurement of prediction accuracy. All AUC values were >0.9, therefore indicating that our models had an excellent prediction performance. According to the Schoener's D statistic (Table 4), the predicted distribution range for Cluster 1 was the most similar with respect to that predicted for the species (D = 0.875). By contrast, the Cluster 2 distribution range was the most distinct (D = 0.426). Specifically, the suitable habitat for the species (Figure 5a) and Cluster 1 (Figure 5b in teal) was found mainly in the Ecuadorian Central Amazon, towards the

Niche Characterization Modeling
We modeled a total of three potential ecological niches for I. guayusa: one for the species and one for each of the other two genetic clusters identified in this study. We also obtained the mean AUC values for each model, which is a measurement of prediction accuracy. All AUC values were >0.9, therefore indicating that our models had an excellent prediction performance. According to the Schoener's D statistic (Table 4), the predicted distribution range for Cluster 1 was the most similar with respect to that predicted for the species (D = 0.875). By contrast, the Cluster 2 distribution range was the most distinct (D = 0.426). Specifically, the suitable habitat for the species (Figure 5a) and Cluster 1 (Figure 5b in teal) was found mainly in the Ecuadorian Central Amazon, towards the Andean-Amazonian piedmont. For Cluster 2 the suitable habitat principally comprised the eastern slopes of the Andean cordillera and the eastern Amazon basin (Figure 5b in pink). For each generated model, the bioclimatic variable importance was calculated in terms of its percentage contribution to the model's fitness (Table 5). In this way, it was determined that the potential distribution range of the species and the Cluster 1 samples was mainly explained by the mean temperature of the wettest quarter (Bio 8) and the precipitation of the driest quarter (Bio 17) ( Table 5). By contrast, Cluster 2 samples' distribution range was mainly explained by the mean temperature of the wettest quarter (Bio 8) and the precipitation seasonality CV (Bio 15) ( Table 5).  For each generated model, the bioclimatic variable importance was calculated in terms of its percentage contribution to the model's fitness (Table 5). In this way, it was determined that the potential distribution range of the species and the Cluster 1 samples was mainly explained by the mean temperature of the wettest quarter (Bio 8) and the precipitation of the driest quarter (Bio 17) ( Table 5). By contrast, Cluster 2 samples' distribution range was mainly explained by the mean temperature of the wettest quarter (Bio 8) and the precipitation seasonality CV (Bio 15) ( Table 5).  (b) genetic clusters. Niche suitability, which ranges from 0 to 1, is denoted by the intensity of colors. The darker the color's intensity, the higher the niche suitability.

A Moderately Low Genetic Diversity for Ilex guayusa in the Ecuadorian Amazon
In this study, we found that I. guayusa presents a moderately low genetic diversity (He = 0.396) in the Ecuadorian Amazon region. In a previous report, we assessed the preliminary genetic diversity of guayusa from the Ecuadorian Amazon using ISSRs markers and found that it was considerably lower than the one found with SSR markers (He = 0.19) [25]. Direct comparisons are not possible since the molecular markers used were different; however, the higher levels of genetic diversity found could be attributed to the high discriminating power of SSRs [79][80][81]. Additionally, SSRs markers have shown higher resolution to assess genetic diversity in clonal species when compared to ISSRs markers [82][83][84][85]. The present study also covered a greater sampling area, which could have contributed to the better genetic diversity characterization of the guayusa in the Ecuadorian Amazon.
SSRs markers have been previously used in Uruguay and Brazil to evaluate the genetic diversity in yerba mate (I. paraguariensis), a species closely related to guayusa. Obtained values of He ranged from 0.5 to 0.6 for Uruguay and Brazil, respectively, which were higher than the values of He found in our study [14,15]. This difference in genetic diversity could be explained by the fact that yerba mate is mainly propagated through sexual reproduction, which could increase genetic diversity [2]. In addition, I. paraguariensis is an allogamous species, which employs cross-fertilization as a reproductive strategy to promote intense gene flow between populations and, in this way, generate high levels of genetic diversity [86,87].
For clonal plants, the conservation of their genetic diversity could be promoted by a phenomenon called the "Meselson effect" [88,89]. Balloux et al. (2003) [88] suggested that the absence of sexual reproduction may lead to the accumulation of mutations over time, promoting genetic divergence between alleles within loci. This generates an excess of heterozygotes and negative F IS values, which are commonly reported within clonally reproducing species [90]. This could be the case of guayusa, where we found an overall negative F IS value (−0.185) and high Ho, similar to other clonally propagated plants [91][92][93].
The way in which guayusa has been cultivated and consumed could also be implicated in explaining the genetic diversity found. The earliest report of human utilization of this species can be traced back to 500 A.D.; however, its selection and domestication process began only in the late 1600s, when its extensive consumption was known to have been already established [10]. During this elapsed time, guayusa was being intermittently traded among indigenous communities [3], and until the present day has only been cultivated under a smallholder production model, which has shown to strengthen biodiversity conservation [94]. In commercially important crops, up to 10,000 years are needed to achieve full domestication [95], which could explain why guayusa is not yet considered a completely domesticated plant [3]. It is known that extended cultivation and selection periods are among the most critical factors causing genetic diversity erosion [95], which certainly is not the case for guayusa.
Concerning the genetic diversity found in each of the three regions of our study, we noted that the Central region harbored the greatest genetic diversity. Previous studies reported that the distribution of genetic diversity within the range of a species is often not uniform [93,96]. The "Abundant-Center Hypothesis" states that central populations tend to develop greater levels of genetic diversity. This phenomenon occurs fairly frequently [97][98][99] and needs to be further explored in guayusa.

High Clonal Diversity for Ilex guayusa in the Ecuadorian Amazon
In this study, we found high clonal diversity for I. guayusa (Simpson's diversity index, D = 0.986-0.998; Shannon-Wiener index, ShW = 1.893-2.670), as was previously reported for another member of the Ilex genus (I. integra) [18]. Genotypic diversity assessment is important in clonal species because clonality can reduce the number of unique MLGs present in a population [100][101][102][103][104]. Regarding clonal diversity distribution, we found that individuals sharing identical MLGs were not necessarily close in proximity. It is known that certain means of vegetative propagation such as stolons, runners, and bulbils can result in this type of MLG distribution pattern, in which clones are located farther apart from the parental plant [105]. To the best of our knowledge, guayusa reproduction is mainly performed by human planting of stem cuttings [3]. Thus, the present distribution of the identified MLGs is most likely explained by secondary factors such as human activity [26].
Mechanisms for preserving clonal diversity in plants with poor or absent sexual reproduction could encompass multiple strategies. One of them is maintaining a clonal reproduction habit, since continuous propagation could lead to the long-term persistence of established MLGs in a population [27,92]. In that way, the only means by which an MLG could be lost is by the death of all the individuals harboring this MLG [27]. This is relatively uncommon for long-lived species such as guayusa, whose trees can live for hundreds of years according to various ethnobotanical observations [4]. As the species is long-lived, there is also a probability that the MLGs in existence today are clones from the original MLGs established among the founder populations [27,92]. In this way, we could still be analyzing some part of the species' initial clonal diversity, when sexual reproduction was not as limited as today [106]. All these factors could account for the high clonal diversity observed in guayusa.
Additionally, truly clonal individuals are considered "immune" to the diversifying effect of recombination occurring during sexual reproduction [107]. This preserves MLGs over time because whenever the gene pool is reshuffled during reproduction, the alleles can be lost by chance, especially in small populations [27]. Nonetheless, we cannot rule out the probability that occasional sexual reproduction events could occur, as has been suggested for other species [89,104]. Regardless of the idea that guayusa does not bloom [4], studies suggest that even in fully domesticated plants with lost sexual fertility, sexual reproduction still occurs at a negligible rate in semi-domesticated varieties [28]. Most forest tree species, like guayusa, are at an early stage of domestication [95], and therefore, it is probable that they still conserve effective means of sexual reproduction.

Influence of the Environment and Human Activities as Modulators of Ilex guayusa Population Structure
The DAPC suggested the existence of two genetic clusters. The PCoAs on Bruvo's distances further confirmed that individuals clustered in two different groups. Together with the low values of the F-statistics and the large intrapopulation genetic variability calculated by the AMOVA, these results suggest that I. guayusa has a shallow and subtle population structure across the Ecuadorian Amazon. Previous studies have identified geographic distance as a key factor affecting a species' genetic structure [62,108]. However, we did not find evidence that an increase in the geographic distance was correlated with an increase in genetic differentiation. For example, Cluster 1 grouped distant individuals, such as those belonging to the Northern and Southern regions, independently of the geographic distance existing between them. Similarly, Cluster 2 grouped only individuals belonging to Canelos and Sevilla Don Bosco (two specific localities at the Central Amazon), while other individuals in the proximity were completely excluded. For this reason, we propose that a model of IBD does not explain the genetic structure that we found for I. guayusa, and suggest that the model of IBE could explain it. We corroborated that the environmental distances were correlated to the genetic distances with a Mantel test, indicating that climatic and habitat conditions have played a role in determining the population structure of this species.
IBD appears to be more frequently reported in plants than IBE [98]. However, this may be due to the recent recognition of the environment's role in shaping spatial patterns of gene flow and genetic variation [109][110][111]. Nowadays, IBE is known as a widespread but complex phenomenon in nature [98,112], driving genotypic and phenotypic divergence in several taxa, including tree species like Fraxinus angustifolia Vahl [113], Alnus incana (L.) Moench [114], Populus nigra L. [115], and Sequoiadendron giganteum (Lindl.) J.Buchholz [116]. It should be considered that IBE does not operate in isolation and can interact with geography, colonization, selection, gene flow, and genetic drift as well [117,118]. For this reason, studying observed patterns of spatial genetic differentiation caused by IBE is coupled with several challenges, such as disentangling the effect of correlated variables and determining the processes that have generated and maintained the observed IBE patterns [111].
To deepen into the processes driving genetic differentiation, a linear regression model and a dbRDA, which is a method used to assess the influence of environmental variables on population structure, led us to suggest that among the studied bioclimatic variables, the one that best explained the genetic differentiation of Cluster 2 in guayusa was the temperature seasonality SD × 100 (Bio 4). This bioclimatic variable is defined as the amount of temperature variation over a given year. Previous studies have determined that Bio 4 can demographically isolate populations by environmental climate heterogeneity, which will promote genetic divergence within the species [119,120]. Temperature seasonality has also been demonstrated to affect the caffeine concentration in species like tea [121] and yerba mate [122]. Caffeine is a principal chemical constituent produced among many species of the Ilex genus [123], and in the case of I. guayusa, variable concentrations of caffeine can result in a broad range of desired and undesired physiological effects when consumed [1]. In 1991, Lewis et al. highlighted the importance of experiential learning as a factor for the human selection of I. guayusa individuals [1]. They observed that cultivars of guayusa known to produce unsettling symptoms after consumption were rarely propagated among indigenous communities. They further suggested that plant populations with specific chemical profiles can harbor genetic differences as well, and therefore, the selection of I. guayusa cultivars based on their chemical profile may have potentially affected the population structure seen nowadays [1].
We predicted the potential distribution range for I. guayusa at the species level and for each of the genetic clusters identified. Our results highlighted the presence of niche dissimilarities for each of the genetic clusters found. Previous studies have reported this pattern, in which predicted niches could vary between genetic groups within a species [120,124] since they are usually more environmentally compact and tuned to specific climatic conditions [120]. We observed that the potential distribution range of I. guayusa at the species level and for Cluster 1 was mainly explained by the mean temperature of the wettest quarter (Bio 8) and the precipitation of the driest quarter (Bio 17). On the other hand, genetic Cluster 2 exhibited a different combination of environmental variables that explained its potential distribution range, mainly precipitation seasonality CV (Bio 15). In addition, we predicted that Cluster 2 could occupy landscapes towards the range limits of the species' distribution. At these marginal or borderline habitats, populations can develop unique genetic variation due to local adaptations to the occurring environmental conditions [93]. This may promote the genetic divergence of peripheral populations, causing IBE patterns of genetic differentiation as observed in this study for guayusa.
Even though more studies are needed, the possibility that certain guayusa genotypes might have adaptations to specific climatic conditions is of great interest in the context of climate change and species conservation. This study emphasizes the usefulness of using SSRs to identify genotypes whose conservation should be prioritized. Incorporating genetic information into the modeling of a species' distribution range is important because it can help identify specific variables and environmental drivers that could account for the observed genetic structure [120]. Moreover, including genetic information can improve modeling accuracy when predicting a species' potential distribution under climate change scenarios, giving important cues for its conservation in the future [124].
We have discussed how environmental factors may have affected the genetic structure of I. guayusa. However, genetic diversity is a complex interplay between several evolutionary phenomena [95], in which we cannot dismiss human activity as an additional factor that may have influenced I. guayusa genetic structure. Historical records indicate that due to guayusa trading, the commercial relationships between the Andes and the Amazon intensified [125]. In Ecuador, the Bobonaza and the Pastaza rivers were considered key fluvial routes for transportation and commercialization between the Central Amazon and the Central Andes [126], which could have strategically connected human settlements located alongside these rivers. Examples of this in the Ecuadorian Amazon are Canelos and its adjacent valleys, which hold great importance in the guayusa cultivation history [7,10] since they are suspected to be the real centers of guayusa cultivation [4]. Around 1850, the use and cultivation of guayusa were thought to be confined to these locations, and in the same decade, the existence of a group of guayusa trees from the pre-Columbian period on an ancient site called Antombós, located in the Central Andes at the gorge of the Pastaza river, was reported [4,5,7,127]. As guayusa was increasingly traded between regions, this could have led to the migration of individuals from both guayusa populations, shaping the genetic structure that the species has today. We found that the samples from Canelos grouped within Cluster 2, which might serve as evidence of guayusa's trading and cultivation history. Further analyses that include samples from the eastern Central Andes could shed light of the composition of Cluster 2.
With these preliminary results, a more thorough sampling of individuals at multiple locations should be performed to better understand the genetic diversity and demographic history of guayusa. Moreover, we highlight the environmental plasticity found within this clonal species as a key feature for its success [105,128] as it could help overcome genetic diversity erosion and environmental changes [28]. The identification of populations adapted to diverse environments and different ranges of distribution is of great importance for the conservation of I. guayusa. Usually, these populations harbor unique genetic pools that could make them more resilient to environmental changes, which in the near future could help maintain the species' ability to evolve and adapt [62,93].

Conclusions
This study offers a first glance at the genetic diversity of I. guayusa in the Ecuadorian Amazon region. The species showed a moderately low genetic diversity and high clonal diversity. Two genetic clusters were identified among the 71 individuals with unique MLGs. We suggest that the population structure found might be explained by a model of IBE. Temperature seasonality (SD × 100) (Bio 4) and isothermality ×100 (Bio 3) were found to have an important influence on how the guayusa genetic clusters are distributed. Nonetheless, further studies are needed to fully understand its genetic diversity and population structure. It is relevant to establish conservation strategies and sustainable use of guayusa since it is considered an important agroforestry resource. Different approaches should be considered, such as studying the possible occurrence of sexual reproduction and maintaining the chacra agroforestry model based on a smallholder-based production system. Conserving its genetic and clonal diversity is key to establishing successful longterm breeding programs, which will promote the economic growth of local communities and reinforce the value of ancestral knowledge.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/ 10.3390/d13050182/s1, Figure S1: Frequency distribution of the pairwise Bruvo's genetic distances (transformed between 0 and 100) for 88 Ilex guayusa individuals genotyped with 17 SSR markers. Following the assumption that the histograms of the pairwise comparisons are often multimodal, the valley between the first and second peak was considered as a good candidate to use as a threshold. At a threshold = 2, the inferred number of MLGs was 71. Figure S2: Genotype accumulation curve describing the genotypic resolution of the 17 SSR markers used in this study. All the employed markers were sufficient to identify and discriminate between the 71 MLGs. Figure S3: Histogram of the frequency of the F (inbreeding coefficient) values observed within the 76 Ilex guayusa individuals selected after clone correction. The data follow a right-skewed non-normal distribution. Figure S4: Goodness-of-fit statistics obtained for the determination of the optimal number of genetic clusters (K) using a maximum-likelihood approach. (a) Akaike information criterion (AIC) values plotted against a determined number of K; (b) Kullback Information Criterion (KIC) values plotted against a determined number of K. Table S1: Associated information to the Ilex guayusa samples collected for this study in the Ecuadorian Amazon. Table S2: Information of SSR-specific primers designed for Ilex guayusa. Table S3: Inferred MLG for each sample as calculated by GenoDive, using a genetic distance threshold of 2. Table S4: Results of the Hardy-Weinberg Equilibrium (HWE) test for each locus, after a Bonferroni correction. Table S5: Null allele frequencies estimation for each analyzed region and SSR locus.