Genetic Diversity and Population Structure of Jubaea chilensis, an Endemic and Monotype Gender from Chile, Based on SNP Markers

Jubaea chilensis (Molina) Baill., also named Chilean palm, is an endemic species found in the coastal area of Mediterranean sclerophyllous forest in Chile. It has a highly restricted and fragmented distribution along the coast, being under intense exploitation and anthropogenic impact. Based on 1038 SNP markers, we evaluated the genetic diversity and population structure among six J. chilensis natural groups encompassing 96% of the species distribution. We observed low levels of genetic diversity, a deficit of heterozygotes (mean HE = 0.024; HO = 0.014), and high levels of inbreeding (mean FIS = 0.424). The fixation index (FST) and Nei’s genetic distance pairwise comparisons indicated low to moderate structuring among populations. There was no evidence of isolation by distance (r = −0.214, p = 0.799). In the cluster analysis, we observed a closer relationship among Culimo, Cocalán, and Candelaria populations. Migration rates among populations were low, except for some populations with moderate values. The K value that best represented the spatial distribution of genetic diversity was ∆K = 3. Habitat fragmentation, deterioration of the sclerophyllous forest, lack of long-distance dispersers, and a natural regeneration deficit may have driven inbreeding and low levels of genetic diversity in the palm groves of J. chilensis. Although extant populations are not at imminent risk of extinction, the rate of inbreeding could increase and migration could decrease if the effects of climate change and human impact become more acute.


Introduction
Understanding the extent to which human activities affect the distribution of the genetic diversity of plant species is mandatory for geneticists and ecologists interested in precariousness in dispersion is highly affected by the black rat Rattus rattus L. that predates the seeds. Cordero et al. [34] found that the germination rate was 6%, plant survival was 1.81%, and only 7.9% of surviving seedlings become infantile plants (4 years old). The recently introduced domestic species, such as cattle and horses, also have an important effect, considering that they remove, consume, and regurgitate the seeds, but leave them susceptible to competition and desiccation [26,35]. In spite of the fact that pollen dispersal can cover a greater distance than seeds [36][37][38], in this case, its movement is limited because of the abundant flowers produced by just one individual, which mean that pollinators visit contiguous flowers of the same individual, and visits to the flowers of other individuals are very infrequent events [26]. It has been described that, in systems with restricted pollen dispersal, the interruption of genetic connectivity can occur even in continuous populations [39][40][41].
To assess the magnitude of the loss of genetic diversity that J. chilensis populations present as a result of isolation and low population sizes, a genetic conservation study was carried out in remnant populations of the species under study. Specifically, we estimated parameters of genetic diversity and inbreeding and evaluated genetic structure and patterns of contemporary migration among populations. The information obtained will be crucial to understand the current genetic diversity as well as provide input for planning strategies for safeguarding sustainable management of the conservation of this emblematic species.

Statistical Analysis and Genetic Diversity
The total number of reads was 226,279,541 with good average quality (Q-score ≥ 30), showing an optimal value (Q ≥ 30) throughout all of the sequences (101 bp, -read length sequenced by the Illumina HiSeq platform). A total of 1038 SNPs were selected for this study, according to our previously established parameters. The distribution of SNP minor allele frequencies (MAFs) is shown in Table S1. More than 96.92% of the SNPs for all sampled populations presented MAF values ≤ 0.1, varying from 0.994 in Culimo to 0.969 in Viña del Mar/Valparaíso, with MAF scores ranging up to MAF ≤ 0.4.
Expected heterozygosity values (H E ) were higher than the observed values (H O ) for all populations. The mean H O varied from 0.036 in Culimo to 0.108 in Petorca, and the mean H E varied from 0.086 in Culimo to 0.145 in Petorca, with an overall H O of 0.014 and an overall H E of 0.024, both suggesting a deficit of heterozygotes. Inbreeding coefficient values (F IS ) for all populations were statistically significant (overall value of 0.424, with a 95% confidence interval not including zero), varying from 0.177 in Viña del Mar/Valparaíso to 0.586 in Culimo, which also indicates high levels of inbreeding with little or no random mating (Table 1). All F ST comparisons between populations showed values above 0.049 and below 0.119, which indicates moderate genetic structuring ( Table 2). Nei's genetic distance values were lower than 0.003 in all populations, also indicating that the populations may be genetically similar (Table 2). In the PCO, the first two axes explained 61.4% of the variance (35.0% and 26.4%, respectively). There was relative proximity between the Culimo, Cocalán, and Candelaria group and the Petorca population, while the Ocoa and Viña del Mar/Valparaíso populations remained in opposite peripheries ( Figure 1). Mantel test results indicated that Nei's genetic distance matrix did not significantly correlate with the geographic distance matrix (r = −0.214, p = 0.799), i.e., the nearest populations did not have lower values of genetic differentiation. When observing the Mantel analyses between geographical distance and F ST , similar results were observed, that is, there was no significant correlation (r = −0.125, p = 0.669). Low migration rates were observed (m = 0.010-0.022), except for the following populations ( Table 3). The log likelihood values were comparable between BA3-SNPs runs, as was the Bayesian deviance (Table S2).  Table 3). The log likelihood values were comparable between BA3-SNPs runs, as was the Bayesian deviance (Table S2).

Genetic Structure and Admixture Levels
Based on the maximum value of ∆K, the estimated number of optimal groups was K = 3 ( Figure 2 and Figure S1, Table S3). Individuals from the Cocalán, Culimo, Candelaria, and Petorca populations had very close genetic characteristics, while Viña del Mar/Valparaíso was characterized as a unique group for almost all K values. The Culimo and Petorca individuals remain together for all of the assigned K values (Figure 2). At K = 3, 99% of the individuals in each population are correctly assigned to the area from which they were sampled, and only one individual from OCOA (129) contains attributions higher than 50% from other areas of origin ( Figure S2).

Genetic Structure and Admixture Levels
Based on the maximum value of ΔK, the estimated number of optimal groups was K = 3 (Figures 2 and S1, Table S3). Individuals from the Cocalán, Culimo, Candelaria, and Petorca populations had very close genetic characteristics, while Viña del Mar/Valparaíso was characterized as a unique group for almost all K values. The Culimo and Petorca individuals remain together for all of the assigned K values (Figure 2). At K = 3, 99% of the individuals in each population are correctly assigned to the area from which they were sampled, and only one individual from OCOA (129) contains attributions higher than 50% from other areas of origin ( Figure S2).

Discussion
Our results show that all studied populations of J. chilensis present low genetic diversity and high inbreeding. Given their biallelic heritage, the highest value of HE expected for SNPs is 0.5. However, the highest value observed for J. chilensis was less than a third of that value, indicating that these populations likely suffer from loss of

Discussion
Our results show that all studied populations of J. chilensis present low genetic diversity and high inbreeding. Given their biallelic heritage, the highest value of H E expected for SNPs is 0.5. However, the highest value observed for J. chilensis was less than a third of that value, indicating that these populations likely suffer from loss of genetic diversity. Null alleles are an issue for many marker types and could also result in a downward bias in estimated heterozygosity. Unfortunately, we are not able to estimate null alleles' frequency and SNP null alleles have not been described in this species. The inbreeding coefficient was significant and high for all populations, even though the species is diclinomonoescious, which favored crossing [42]. Interestingly, the population with a small number of specimens (CUL) had the lowest value of H E and the highest value of F IS . All of these results are indicative that J. chilensis may be suffering from genetic erosion. Other palm species have also shown high levels of inbreeding, and it was usually associated with the mating system, anthropic effects, and low abundance of populations [43][44][45][46][47][48]. The levels of inbreeding found in this study are concordant with what has been found in other species of the Arecaceae family. For instance, Shapcott et al. [43] found high levels of inbreeding in populations of five species of the genus Pinanga. Cibrián-Jaramillo et al. [44] found high inbreeding in populations of Chamaedorea ernesti-augusti, and declare that the high inbreeding is a result of the anthropic effect. Santos et al. [45], using nine ISSR markers, showed high levels of inbreeding and low abundance levels of Attalea vitrivir individuals. Other studies using microsatellites confirmed the prevalence of inbreeding in Acrocomia aculeata populations, product of crosses between very close relatives, or even full siblings [46][47][48]. For Syagrus coronata, for example, the authors take into account the high-level of inbreeding to conduct conservation measures [49]. Given that the sclerophyllous forest inhabited by J. chilensis has been highly exploited and approximately only 121,284 specimens are found in Central Chile (2.5% of the original population of the 19th century), being distributed among highly fragmented populations [26,50,51], conservation measures for this species should consider the low level of genetic diversity and inbreeding found in our study.
Contrary to our expectation, we found low genetic structure, distant populations of J. chilensis being genetically similar, and no evidence of isolation by distance. These results are unlikely to reflect the current seed dispersal pattern of the species. Currently, seeds of J. chilensis are dispersed close to mother plants, as rodents (Octodont degus and Spalacopus cyanus) and introduced domestic species are the main seed dispersers. The frequency of rodent-seed interaction is low (<25%) and transport usually occurs over short distances (<6 m [35]), indicating that rodent-dependent dispersion may be insufficient for genetic exchange between existing palm groves [26]. Pollen dispersion is also unlikely to explain the low genetic structure among the populations found here, as the movement of pollen between individuals of J. chilensis seems to be limited owing to the high abundance of flowers produced per individual [26]. It has been described that, in systems with restricted pollen dispersal, the interruption of genetic connectivity can occur even in continuous populations [39][40][41].
Nowadays, the geographical distribution of palm groves is highly scattered in a series of small, isolated patches, consisting of small independent local populations at a geographic level, as depicted in Figure 1. As a consequence, the degree of inbreeding increased as a result of the crossing between relatives [52][53][54]. Studies on different taxa of the Arecaceae family suggest that fragmentation considerably affects the persistence of palm trees. In the south of the Brazilian Amazon, forest fragmentation has affected Bactris Jacq. ex Scop. genus (palm), leading to inbreeding depression and possibly leading to extinction of the local palm tree [55]. For the Astrocaryum aculetassimum (Schott) Burret, an endemic Atlantic Forest palm species, it has been reported that the species populations are highly affected by the loss of seed dispersers as a result of fragmentation and hunting [56]. Moreover, for the Astrocaryum mexicanum Liebm. ex Mart. species inhabiting Los Tuxtlas (State of Veracruz/México), the abundance of coleopterans and pollinating beetles varies according to the size of the fragment, making reproduction doubly susceptible to the effects of fragmentation [57]. In Ecuador, the seedlings of Ceroxylon echinulatum Galeano and Attalea colenda (O.F. Cook) Balslev and A.J. Hend. have failed to survive as a result of deforestation [37,58,59]. An extensive review of the literature of palms from Tropical America indicated that anthropic influences may cause changes in the genetic structure, increasing inbreeding, and genetic drift in fragmented populations [37]. Similarly to this study, the Phoenix dactylifera L. populations occurring in natural oases in Tunisia also presented H E values higher than H O values with low genetic structure [60].
It is important to highlight that genetic structure analysis may reflect both contemporary and historical processes such as past gene flow and population fluctuation [61], and that the contemporary migration rates estimated here usually correspond to the last three generations. Moreover, J. chilensis is a palm tree with a significantly long life expectancy, with 1000 years being the estimated age for the oldest individual [29,62]. Therefore, the low genetic structure detected for some populations and the moderate migration rates observed among some populations could be a result of high gene flow in the past. In the Pleistocene, seeds of J. chilensis were consumed and dispersed by the extinct megafauna species belonging to the following families: Gomphotheriidae, Camelidae, Equidae, Notohippidae, Homalodotheriidae, Toxodontidae, Astrapotheriidae, Macraucheniidae, Mylodontidae, and Megatheriidae [26,31]. Simulations suggest that extinct megafauna would frequently disperse large seeds over a long distance, and the events of long-distance seed dispersal by these animals would be up to ten times longer than long-distance dispersal by smaller-sized extant mammals [63]. Cocalán is the second largest population; this can be related to the fact that Cocalán presents an excellent habitat for the growth and development of Jubaea, where the soils are almost exclusively granitic [29] and highly resistant to drought and shade [64].
Germination and establishment studies of J. chilensis have been carried out [65][66][67][68] in the context of the National Plan for the Conservation of the Chilean Palm (Corporación Nacional Forestal, CONAF) in 2005. Unfortunately, the Plan does not have recommendations to evaluate the genetic structure of this palm as a fundamental requirement to propose viable and efficient actions for the conservation of this species [1]. As a result, active restoration of J. chilensis via sowing seedlings has been carried out without any consideration of genetic or biogeographic origin. Understanding the distribution of genetic variability can be especially important for species facing extinction risk, especially when translocations for restoration, genetic rescue, or assisted gene flow are fundamental aspects for the successful long-term population [69]. Currently, the Chilean palm is listed as endangered (EN) by IUCN. The lack of long-distance seed dispersal events, no capacity for vegetative propagation, and low seed regeneration, plus its severe seed harvest and demographic bottleneck, may increase the extinction risk of J. chilensis capacity [26,60,61]. Although at this moment there is no imminent risk, all these factors could promote genetic isolation, either within or between populations, which can intensify the loss of genetic diversity and inbreeding depression [41,54].

Conclusions
All populations of J. chilensis studied have low genetic diversity, high inbreeding, and no evidence of isolation by distance. Habitat fragmentation, deterioration of the sclerophyllous forest, lack of long-distance dispersers, and natural regeneration deficit may have driven inbreeding and low levels of genetic diversity in the palm groves of J. chilensis. Although extant populations are not at imminent risk of extinction, the rate of inbreeding could increase, and the migration effect might be overwhelmed by the effects of climate change and human impact. Thus, considering our findings plus the natural history and ecological context wherein J. chilensis lives, we suggest that this monospecific, relict, and endemic palm species should be considered as critically endangered by international conservation organizations.

Study Organism
Jubaea chilensis is an arborescent and woody monocot, with a bare and cylindrical stipe narrower towards the top, reaching up to 30 m in height and from 0.80 to 1.10 m in diameter. It is a diclino-monoecious plant with cross pollination [42] and the fruit is a drupe with a single spherical seed of approximately 2-3 cm (0.79-1.18 pol.) in diameter. The fruit has an orange fleshy pulped-pericarp and a hard-lignified endocarp [26][27][28][29][30]. J. chilensis' natural populations are distributed from La Serena (29 • 54 S-71 • 15 W) in the Coquimbo Region to Tapihue-Pencague (35 • 15 S-71 • 47 W) in the Maule Region. It inhabits warm climate areas with dry summers and coastal areas with coastal fog influence [29]. The species is found from sea level to approximately 1400 m [58], thus tolerating temperatures from 2.9 • C to 30.8 • C, with precipitations ranging from 127 to 879 mm [26,63]. The remaining populations Plants 2022, 11, 1959 8 of 14 are in an advanced stage of aging, with little or no appearance of natural regeneration and under a high anthropic impact due to the commercialization of their fruits and sap [26]. The dispersed seeds germinate for up to four years, forming persistent seed banks. Although the species adapts very well to its environment, the first stages of growth in which the seedlings must survive under the canopy are critical [62], especially in the understory of sclerophyllous and/or spiny species [63,64].

Study Area and Sampling
For the selection of population groups and individuals of J. chilensis to be sampled, we considered four parameters: (i) populations with more than 50 individuals; (ii) a sampling scheme that covers the entire species' geographical range; (iii) the exclusion of specimens belonging to plantations; and (iv) a maximum of 30 individuals per population was analysed. We sampled the following populations groups:  (Figure 1, Table 1). These groups of populations correspond to 95.98% of the total species abundance and are differentiated mainly by the orographic and edaphoclimatic characteristics (Figure 3, Table 4).

Genotyping-by-Sequencing Library and Single Nucleotide Polymorphisms' Selection
We collected leaf tissue samples from 140 adult individuals of J. chilensis chosen at random, which included the entire distribution range and with a minimum separation between individuals of 150 m, between February and December 2015 (Table 4). Samples were stored in silica. For molecular analysis, DNA was extracted using the DNeasy Plant Mini Kit (Qiagen, EUA). DNA concentration was quantified using the Qubit High Sensitivity Assay kit (Invitrogen) and DNA integrity was assessed through visualization in a 1.2% agarose gel electrophoresis. Prior to library construction, DNA amount per sample was normalized at 100 ng/µL. The Genotyping-by-Sequencing (GBS) library was constructed using the standard protocol described by Elshire et al. (2011, [70]) and employing the ApeKI restriction enzyme. Single-end 100-bp sequencing was conducted on the Illumina HiSeq 2500 platform. Samples were demultiplexed according to their respective barcodes. The generated sequences were filtered to remove low-quality sequences and contaminated reads using SeqyClean 1.9.9 (https://github.com/ibest/seqyclean, accessed on 1 January 2021); only high-quality paired-end sequences (with average PhredScore over 24 and over 65 bp) were used for further analysis. The single nucleotide polymorphisms' (SNPs) prospection was performed using the software pipeline PyRAD v1.2 [71]. Reads with more than five Ns or shorter than 35 bp were discarded. The clustering threshold was set to 90% and the maximum number of SNPs per locus was set to 30. A locus had to be present in at least 50% of the samples to be retained in the final dataset. All other parameters were maintained at default values. To conduct population genetic analysis, we identified putative loci under selection using software BayeScan 2.1 [72] using the default values (Q-value < 0.05). The LOSITAN software was also used to identify loci under selection [73] and later removed from the SNPs dataset. The R package r2vcftools (https://github.com/nspope/r2vcftools, accessed on 1 January 2021)-a wrapper for VCFtools [74]-was used to perform final quality control on the genotype data. Filtering criteria included biallelic SNPs, linkage disequilibrium (r 2 < 0.8) [75], Hardy-Weinberg equilibrium (HWE, p > 0.0001), and loci with less than 20% missing data.

Genetic Analysis
The fixation index coefficient (F IS ) was calculated based on the variance found in the allelic frequencies [76], and the intra-population genetic diversity (expected and observed heterozygosities) was estimated under the Hardy-Weinberg equilibrium. Both analyses were performed using the 'het' option in VCFtools implemented in r2vcftools (https://github.com/rojaff/LanGen_pipeline, [74]-accessed on 1 January 2021). F ST calculations between populations were performed using the dartR package v.183 [77] in R software (R Core, [78]). Genetic distance was estimated according to Nei (1972), where a constant and equal mutation rate is considered for all loci with equal population sizes for all generations and a mutation-drift balance, using the adegenet package in R [79]. Additionally, the geographical distance (in meters) between pairs of populations was calculated considering the Earth's curvature (assuming the spherical model), using the geosphere package in R (R Core, [80]). Mantel test based on Spearman's correlation with 9999 permutations was performed using geographical distance with F ST , using the vegan package [81]. A principal coordinate analysis (PCO) based on Nei genetic distance was conducted using the pcoa command of the ape package [82,83]. We also calculated the migration rates (m) [84] of the studied populations. We used the BA3-SNPs v 3.0.4 [85] to determine the amount and direction of migration between populations. BA3-SNPs was run ten times using a different random seed across 10,000,000 iterations with a burn-in of 1,000,000 iterations and sampling every 1000 iterations. To compute the suitable allele frequency (a), inbreeding coefficient (f), and migration rate (m), we tested several values until we obtained the acceptance percentages recommended by BA3-SNPs authors (between 20 and 60%), being a-0.80, m-0.60, and f-0.60. We then assessed model convergence across all ten runs using Tracer v1.7.2 (http://beast.community/tracer, accessed on 1 January 2022). We used the Bayesian deviance as calculated by Meirmans (2014, [86]) to search for the best fitting model (the one with the lowest Bayesian deviance was selected for interpretation) [87]. A rough 95% confidence interval (CI) was constructed as mean ± 1.96 * sdev. All migration rates whose 95% confidence intervals did not include zero were reported as significant.

Genetic Structure and Admixture Levels
Populational structure and admixture degree were inferred using the Bayesian grouping method implemented in the STRUCTURE v 2.3.4 program [88][89][90]. Parameters were calculated for each K value (1 ≤ K ≤ 6) by means of an MCMC analysis of 100,000 replicates with a burn-in period of 20,000. The results were obtained based on 25 runs per population grouping, assuming both the Admixture model, which allows individuals to have multiple population origins, and the correlated allelic frequency model [88,89]. Finally, the posterior probability for each K value was determined by introducing the simulation results into STRUCTURE HARVESTER v 0.6.94 [91], where K values were evaluated using the logarithm of likelihood of the observed data (LnP [D]) as well as the second-order change rate of the data s log-likelihood in distinct runs of K (∆K) [92].
Supplementary Materials: The following supporting information can be downloaded at https: //www.mdpi.com/article/10.3390/plants11151959/s1. Figure S1: ∆K values for different numbers of populations assumed (K) in the STRUCTURE HARVESTER analysis v 0.6.94 for the six population groups of Jubaea chilensis, central Chile. Figure