Independent Evolutionary Lineages in a Globular Cactus Species Complex Reveals Hidden Diversity in a Central Chile Biodiversity Hotspot

Unraveling the processes involved in the origin of a substantial fraction of biodiversity can be a particularly difficult task in groups of similar, and often convergent, morphologies. The genus Eriosyce (Cactaceae) might present a greater specific diversity since much of its species richness might be hidden in morphological species complexes. The aim of this study was to investigate species delimitation using the molecular data of the globose cacti “E. curvispina”, which harbor several populations of unclear evolutionary relationships. We ran phylogenetic inferences on 87 taxa of Eriosyce, including nine E. curvispina populations, and by analyzing three plastid noncoding introns, one plastid and one nuclear gene. Additionally, we developed 12 new pairs of nuclear microsatellites to evaluate the population-level genetic structure. We identified four groups that originated in independent cladogenetic events occurring at different temporal depths; these groups presented high genetic diversity, and their populations were genetically structured. These results suggest a complex evolutionary history in the origin of globular cacti, with independent speciation events occurring at different time spans. This cryptic richness is underestimated in the Mediterranean flora of central Chile, and thus unique evolutionary diversity could be overlooked in conservation and management actions.


Introduction
The planet's biodiversity is declining at unprecedented rates [1]. Mediterranean regions are among the most unique and threatened ecosystems, characterized by mild, wet winters and very dry, long, hot summers [2,3], and harboring 20 percent (%) of species even when having a surface less than 5% the of total global landmass [4]. The Mediterranean area in Central Chile is recognized by the high number of endemic species, many of which are heavily threatened [4,5]. Since the Chilean economy has long depended on its natural capital for economic development, there has been a severe impact on even its most biodiverse ecosystems, from agriculture, to housing, mining and deforestation [6]. Although knowledge of the biodiversity of Central Chile has increased in recent decades, there are still important gaps, especially in non-charismatic animal groups and non-woody plants. Moreover, in the last decade, a large number of angiosperms belonging to different families, such as Alstromeriaceae [7,8], Amarillidaceae [9,10], Brassicaceae [11], Orchidaceae [12] and Cactaceae [13,14], have been described.
The underestimation of diversity may be due, in part, to the occurrence of species that are sufficiently distinct based on molecular characterizations, but which have been classified as a single nominal species because they are at least superficially morphologically indistinguishable [15]. Estimates of species diversity and levels of endemism should consider the possible occurrence of cryptic species that could go unnoticed or be synonymized when relying only on morphological approaches in their classification, because they do not present obvious differences in their vegetative or reproductive morphologies. This makes it necessary to use genetic approaches that account for the variability of the species.
The cactus family, with 1851 accepted species [16], is a diverse group within the Neotropics that presents remarkable diversity in life forms and reproductive strategies [17]. Additionally, this family is characterized by high levels of morphological convergence, where the most common life forms, such as columnar and globose forms, have evolved repeatedly across the Cactaceae. This phenomenon has largely complicated the taxonomy of the family, among which South American globose cacti are among the least studied [18]. The tribe Notocacteae (Cactoideae subfamily) harbors several globose species and is regarded as one of the oldest and most narrowly distributed lineages in southern South America [19]. Within this tribe, Eriosyce sensu lato has a complex taxonomic history with a high level of uncertainty, evidenced by the long history of taxonomic changes since its description more than 100 years ago. Species complexes, which include several populations, were historically assigned species ranks, but are currently considered single species. Eriosyce curvispina, with dozens of published names, is one of the most taxonomically complex species within the family [16]. This taxonomic uncertainty is due to the presence of taxa that, based on their morphology, cannot be discriminated, even when having long evolutionary histories that separate them. Additionally, several groups lack comprehensive samples of populations. This has a negative impact on our understanding of the origin and persistence mechanisms of the groups, and underestimates taxonomic diversity, challenging the efficiency of conservation and management actions.
The existence of complexes of barely distinguishable species is a well-known phenomenon in several groups of angiosperms (e.g., [20,21]). However, in Cactaceae, this phenomenon is just beginning to be understood after major revisions in the family that have lumped much of its diversity [22]. Accurate species delimitation is of great importance in establishing precise hypotheses about the mode and tempo of the evolutionary origins of species. In E. curvispina, two contrasting hypotheses can be delineated, a single origin with posterior divergence of infraspecific taxa, or independent speciation events that would reveal a greater diversity at the species level. Here, we investigate the phylogenetic relationships and molecular diversity across E. curvispina populations in order to delimit species and to understand the sequence of origin of its diversity. To achieve these objectives, we analyzed the phylogenetic relationships within Eriosyce section Horridocactus, the clade where E. curvispina is nested [19], and developed 12 microsatellites markers to further investigate the genetic structure of the species complex.

Plant Material
The E. curvispina complex has a globular stem form, with slightly curved prickles; the forms of the flowers are also very similar, but the colors differ from place to place, from lemon yellow to reddish, as shown in Figure 1. They are produced from young areoles, forming a circle around the stem apex; flowers are broad, slightly funnelform, and of 3-5 cm long by 3-5 cm wide [23].  To analyze the evolutionary relationships of the E. curvispina complex, we added 33 samples to a backbone phylogenetic data matrix of the genus published elsewhere [19]. Specifically, we sequenced 18 new individuals of the section Horridocactus and 15 new individuals of the sister clade Neoporteria, all from field collected specimens. We mainly collected the material from roots or flowers and kept it in CTAB-NaCl (hexadecyltrimethylammonioum bromide-sodium chloride) buffer (2%:22%) to transport to the laboratory and store at 4 • C until the extraction.

Sequence-Based Phylogenetic Inferences
For DNA extraction, we used 40-50 mg of root or flower tissue that first was pulverized into a fine power using an automatic homogenizer, and then total DNA was extracted using a DNeasy Plant Kit (Qiagen, Valencia, CA, USA). For the phylogenetic analysis, we amplified three noncoding chloroplast markers (rpl32-trnL, trnH-psbA and trnL-trnF), one plastid gene (ycf1) and one nuclear gene (PHYC) following the protocol described by Guerrero et al. [19]. PCR products were checked on 1% agarose gels and then sent to Macrogen (Seoul, Korea) for sequencing in both directions.
We utilized 117 sequences (33 new and 84 sequences from Guerrero et al. [19]), assembled and edited in the program Geneious Prime ® 2020.2.3 (Biomatters Ltd., Auckland, New Zeland). Sequences for each marker were automatically aligned using Muscle and then checked manually. The outgroup consisted of 19 species, mostly from the core Notocacteae. Each marker was aligned separately and then concatenated. A microsatellite region in the ycf1 dataset was excluded (450 bp) due to ambiguous alignment in this region. The best partitions and molecular models were evaluated using PartitionFinder v.2.1.1, as described by Guerrero et al. [19].
Bayesian inference of the concatenated matrix was performed using Mr.Bayes v3.2.7 [24], and unlinked rate heterogeneity, based on frequencies and substitution rates across partitions. Bayesian ran 30 million generations across four independent runs with four chains each, sampling every 1000 generations. The best models were GTRG for rpl32-trnL and trnH-psbA and GTRINVGAMMA for the rest of the markers. Convergence was monitored using the standard deviation of split frequencies, and when this value stabilized below 0.01, it was considered a strong indication of convergence. The associated likelihood values, effective sample size (ESS) values, and burn-in values of the different runs were verified with the program Tracer v1.7.1 [25]. Trees were visualized using software FigTree v1.4.4 [26] (Supplementary Figure S1).
Divergence dates were estimated using BEAST v2.6.2 [27], and were evaluated with the four clock models available on a concatenate matrix considering one sample per species (87 taxa). The best clock model was the relax clock exponential determined with the value of marginal L provided by PathSampler (an application for model selection inside the BEAST package). For the prior, we used the nodes' age information reported in the Cactaceae phylogeny by Hernández-Hernández et al. [17], with 95% highest posterior density (HPD). The three dated nodes we used in our analysis were: (i) the root node corresponding to 17.15 Ma with normal distribution between 12.67 and 24.46 Ma, (ii) the second node corresponding to 12.44 Ma with normal distribution between 8.59 and 17.95 Ma and (iii) the third node (Core Notocacteae) corresponding to 8.78 with normal distribution between 5.54 and 13.03. Trees were visualized using the software FigTree v1.4.4 (see Figure 2 and Supplementary Figure S2). Maximum likelihood (ML) analyses of this concatenated matrix were performed using the program raxmlGUI 2.0 v.2.0.6 [28]. The search for an optimal ML tree run was combined with a rapid bootstrap analysis based on 100 trees and 1000 replicates (see Figure 2 and Supplementary Figure S3). Genes 2022, 13, x FOR PEER REVIEW 5 of 14

Microsatellite-Based Population Genetics
Microsatellites were designed on the basis of next-generation sequencing from four DNA pools (Pool A Eriosyce chilensis var. albidiflora; Pool C E. chilensis; Pool M E. curvispina and Pool S E. litoralis). These samples were collected between Los Molles and Pichidangui localities (Latitude −32.2 and Longitude −71.47). Sequencing libraries were prepared using a Nextera XT DNA kit, and then sequenced using an Illumina ® MiSeq Next Generation Sequencer with an output of 15 million fragment reads at AutralOmics facilities (https://australomics.cl/, accessed on 25 January 2022 [29]). After sequencing, the first step was removing adapters, checking read quality and cleaning read quality using Trimmomatic [30] and Prinseq [31] software, with a Q > 28. The second step was assembling paired reads with Pandaseq software [32], in order to obtain longer fragments. The third step was the identification of repetitive sequences and the generation of different sets of partitions for each identification; for this purpose, the virtual machine of the QDD software [33] was used, which by way of Primer3 [34] generated a set of partitions for each repetitive sequence identified. Finally, a BLAST (Basic Local Alignment Search Tool [35]) search of the amplicons generated for each microsatellite identified for each sample analyzed was performed, to find the common sequences in all pools.
From the information described above, we choose dinucleotide and trinucleotide sequences common in the four pools. Afterwards, a set of 52 primer pairs were tested with different Eriosyce DNA samples to find the most polymorphic ones. Finally, we choose 12 loci: PS5, PS9, PC2, PC11, PM10, PA12, PM8, PC10, PM6, PC7, PM7 and PA6 (Suppl. Table

Microsatellite-Based Population Genetics
Microsatellites were designed on the basis of next-generation sequencing from four DNA pools (Pool A Eriosyce chilensis var. albidiflora; Pool C E. chilensis; Pool M E. curvispina and Pool S E. litoralis). These samples were collected between Los Molles and Pichidangui localities (Latitude −32.2 and Longitude −71.47). Sequencing libraries were prepared using a Nextera XT DNA kit, and then sequenced using an Illumina ® MiSeq Next Generation Sequencer with an output of 15 million fragment reads at AutralOmics facilities (https: //australomics.cl/, accessed on 25 January 2022 [29]). After sequencing, the first step was removing adapters, checking read quality and cleaning read quality using Trimmomatic [30] and Prinseq [31] software, with a Q > 28. The second step was assembling paired reads with Pandaseq software [32], in order to obtain longer fragments. The third step was the identification of repetitive sequences and the generation of different sets of partitions for each identification; for this purpose, the virtual machine of the QDD software [33] was used, which by way of Primer3 [34] generated a set of partitions for each repetitive sequence identified. Finally, a BLAST (Basic Local Alignment Search Tool [35]) search of the amplicons generated for each microsatellite identified for each sample analyzed was performed, to find the common sequences in all pools.
From the information described above, we choose dinucleotide and trinucleotide sequences common in the four pools. Afterwards, a set of 52 primer pairs were tested with different Eriosyce DNA samples to find the most polymorphic ones. Finally, we choose 12 loci: PS5, PS9, PC2, PC11, PM10, PA12, PM8, PC10, PM6, PC7, PM7 and PA6 (Supplementary Table S1)  The PCR's were performed in a thermo-cycler (Veriti 96 Well Thermal Cycler, Applied Biosystems) programmed as: 1 min at 94 • C for initial denaturation, followed by 35 cycles of 98 • C for 00.05 s, primer specific annealing temperature (58 or 62 • C) for 00.05 s min, 72 • C for 00:40 s, and final extension at 72 • C for 1:45 min. The PCR products were checked on 1% agarose gels in 1X Tris-boric acid-EDTA buffer running a mixture of 5 ul of PCR product with 2 ul of 6X loading buffer (New England Biolabs, Hitchin, UK) with GelRed ® (Biotium, USA) and co-running with a 100 bp DNA ladder (New England Biolabs, UK). The amplified PCR products were sent to the Unidad de Secuenciación y Tecnologías Ómicas at the Pontificia Universidad Católica de Chile (Santiago, Chile) for genotyping. They were analyzed on ABI PRISM 3500 XL (Applied Biosystems) using the size standard Genescan LIZ 500 (Applied Biosystems, San Francisco, CA, USA). Then the loci from dinucleotides repeats, were scored manually and analyzed using Geneious Prime ® (Geneious software v2020.2.3, Biomatters Ltd., Auckland, New Zeland) with Microsatellite plugin taking only the fragments over 100 pb.
We estimated allelic frequencies, polymorphic loci (%P), expected heterozygosity (He) and observed heterozygosity (Ho) with GenAlEx 6.51b2 [36]; the fixation index (Fis) and linkage disequilibrium (LD) with Genetix 4.05; Fst with FreeNA [37]; and the genetic variation within populations, as well as among populations and regions (phylogenetic I-IV Groups), with Analysis of Molecular Variance (AMOVA) using GenoDive v.3.0 [38] ( Table 2).  The population's genetic structure was evaluated using Structure v2.3.4 [39]; this approach consists of plotting the second-order rate change in ln Pr (X/K) for successive K s (referred as DK) against a range of K values and selecting the true K based on the maximal value using the Evanno method [40] (see Supplementary Figure S4). We used a set of K from 2 to 10, with 1 million runs and 10 runs per K and the package pophelper in R to analyze and visualize the population structure results [41,42]. In addition, we used the Discriminant Analysis of Principal Components (DAPC) to evaluate the genetic structure in the library Adegent [43] (see Figure 3). We used cross-validation to determine the appropriate number of components to retain; in our case, we retained 50 principal components, accounting for 0.793 of the proportion of the variance conserved.

Phylogenetic Reconstruction
The concatenated matrix with the five markers included 4841 bp of aligned sequences for 117 individuals, of which 2440 were variable for the complete matrix and 1958 for the ingroup (Table 3). Of the aligned concatenated matrix, the plastid non-coding marker rpl32-trnL contributed 1354 bp (30 % of variable sites), trnL-trnF contributed 1084 bp (15%) and trnH-psbA contributed 439 bp (4%), while the plastid gene ycf1 contributed 930 bp (20%) and the nuclear gene PHYC contributed 1034 bp (locus with 31% of variable sites). Sequence data yielded a well-supported phylogenetic hypothesis of the Horridocactus clade ( Figure 2; Supplementary Figures S1-S3). Bayesian and ML inferences retrieved identical topologies of the Horridocactus clade ( Supplementary Figures S2 and S3), and strongly support the non-monophyly of the E. curvispina complex; we found that its members clustered in four pairs of taxa distributed across three different clades within Horridocactus in four groups (I-IV, Figure 2; Supplementary Figures S1-S3). These groups branched along the phylogenetic tree in a sequence with independent timing of origin (Figure 2;  Supplementary Figure S2). The oldest divergence occurred in Clade A at 3.7 ± 2.3-5.5 Ma (95% confidence interval, IC), separating E. curvispina (Los Molles and Laguna Verde) from Andean populations Putaendo and Escorial. Within Groups I and II, divergences were dated 3.3 ± 2.0-4.9 Ma and dated 2.7 + 1,6-3.8 Ma, respectively. Non-putative members of the E. curvispina complex were retrieved in Clade B, which is composed of species with the geophyte growth form (northernmost distribution within Horridocactus). Within Clade C, Group III of E. curvispina harbors populations from Choapa Valley, Tilama and Limahuida, which were placed together in a branch sister to E. limariensis; this divergence occurred at dated 1.2 + 0.5-1.9 Ma. Two more members of the E. curvispina complex (Ocoa and Farellones) originated at 1.4 ± 0.4-2.1 Ma (Group IV), and were placed sister to E. aspillagae, which was the species with southernmost distribution.

Genetic Diversity and Differentiation of Species
The 12 nuclear simple sequence repeats (SSRs) markes revealed that the groups previously identified in the phylogenetic inferences differ in their genetic diversity and structure ( Table 2). We detected an overall high percent of polymorphic loci, expected heterozygosity, effective alleles and Shannon's information index. Within Groups I-IV, the differentiation values (Fst = Wright s F-statistics) were significantly different from zero. The population in Ocoa has little genetic variation, shown by the lowest fixation index, observed and expected heterozygosity, and levels of polymorphism of 67%, also present a fixation index, Fis = 0.0990 (see Table 2).
The DAPC analyses showed a range of variation among populations, with less divergence between Group II (E. curvispina Putaendo and E. curvispina Escorial) and Group III (E. curvispina from Choapa Valley, Tilama and Limahuida). Group I presented substantial genetic differentiation compared to its closely related Groups II and III. Group IV (E. curvispina Ocoa and Farellones) showed a higher genetic divergence, as the population of E. curvispina at Ocoa is the most differentiated genetic group (Figure 3). Structure analysis also supported these results (Supplementary Figure S3).
AMOVA analysis indicated that the majority of genetic variation occurred within populations (43.4%; Table 1), less among populations and even less among regions, with a non-significant coefficient of correlation value (p-value = 0.208). These results are consistent with the DAPC analysis, where the cluster formed by E. curvispina from Putaendo, El Escorial, Farellones, Tilama and Valle del Choapa was observed, and with the result of Structure, K = 6 (Supplementary Figure S3).

Discussion
Phylogenetic inferences retrieved a well-supported phylogenetic tree revealing evolutionary lineages within the section Horridocactus and strong divergences among putative members of the E. curvispina complex. There is support for independent speciation events, revealing a greater diversity that is going unnoticed, although important genetic uniqueness has been cumulated. Horridocactus species were grouped into three major clades (A-C); two of them harbor E. curvispina populations, which in turn were distributed into four distinct groups (I-IV). This revealed a more complicated evolutionary scenario that is not consistent with the existence of a taxonomic complex, because it presents independent cladogenetic events. Additionally, our study strongly suggests that the members of Groups I to IV originated at different time spans, supporting that divergence mechanisms produced deep and shallow speciation events across the evolutionary time.
The four groups identified are geographically isolated either by latitudinal or altitudinal separation; the high topographic complexity in the central zone of Chile means that, even in short linear distances between populations, they can have strong reproductive isolation [44]. Within Groups I and II, in Clade A (Figure 2), we found significant genetic cumulative divergence, as can be inferred by the great branch length observed in the phylogram between the species. The severe climatic changes that occurred at the beginning of the Pleistocene, together with the complex topography in central Chile, might have had a strong impact on the early divergence of these taxa. For instance, the establishment of new climate regimes and/or permanent ice barriers would have played a key role in this separation [45][46][47], where Andean populations were able to remain in refugia, fostering spatial reproductive isolation. This phenomenon has been suggested in other Andean plant clades with important divergences at the beginning of the Pleistocene [48][49][50]. The coastal populations of E. curvispina at Los Molles and Laguna Verde presented an estimated divergence of 3.3 Ma; additionally, DAPC of SSRs data support the existence of the population genetic structure, suggesting that there is a hard barrier between these two coastal populations that constrain gene flow, reinforcing differences. In contrast, the Andean populations (Escorial and Putaendo) of Clade A showed high levels of divergence in the phylogenetic tree (Supplementary Figure S2), while the population genetic structure assessed by microsatellites data was less significant. These distinct patterns of genetic variation obtained from different molecular markers suggest that modern genetic exchange inferred from microsatellites data may have reduced the genetic distance between these two taxa [51][52][53][54].
Members of Groups III and IV originated more recently (0.85-0.72 Ma) in two independent cladogenetic events. Group III occupies valleys of north central Chile, while species members of Group IV occupy areas further south within the transversal mountain ranges and the Andes. Both habitat types have experienced major vegetation changes over the last million years following glacial cycles [55][56][57][58]; these long-term cycles, with the effect of the South American dry diagonal, have significant xerophytic effects, favoring range expansion and contractions in plant populations [59][60][61][62]. Mechanisms contributing to isolation and posterior divergence involve the complex topography of central Chile, and the bee pollination system, which can easily lead to reproductive isolation, as most native bees are small in size and have a reduced pollination range [63]. In addition, this divergence between coastal and inland populations might be attributed to asymmetric selection regimes exerted by different climate regimes at different elevations [50][51][52] and to the distinct pollinator guilds, which covary with elevation [64].
Members of section Neoporteria, another group of globose cacti, diverged along the coast but did not show levels of deep divergence [19]; this difference is probably due to differences in pollination systems. While species of the Horridocactus clade are beepolinated [65], most of the Neoporteria species are hummingbird pollinated [66,67]. The high vagility of hummingbirds would allow pollen exchange over greater distances, reducing genetic divergence.
The population at Ocoa showed the lowest fixation index, and the lowest genetic diversity among populations, congruent with the fact that it is an isolated population, and thus gene flow with other populations is less likely. This isolation may be caused by the fact that the native xerophitic vegetation remains in island-like hills, and in the last two centuries, the surrounded matrix has been transformed into an anthropized landscape due to different activities such as cereal crops, fruits, livestock, and housing [68,69]. In consequence, local "E. curvispina" has experienced a habitat reduction, increasing the chances for inbreeding, plant extirpation and genetic diversity reduction.
These results support inferences about the historical and modern eco-evolutionary processes that have contributed to the observed diversity patterns of this group, as the combination of geographic factors (topography and climatic variations) added to pollination guild variation, and in more recent times agricultural activities, may be major factors molding the genetic diversity of these globular cacti. Despite the wide distribution and importance of the arid and semi-arid zones, phylogenetic and population genetics studies are still scarce [70][71][72], especially those that address the mechanisms by which they enabled the genetic diversity that we see today [73][74][75][76][77].

Conclusions
The phylogenetic reconstruction with chloroplast DNA (cpDNA) and nuclear DNA markers showed strong relationships of the taxa within Eriosyce, and that the Eriosyce curvispina complex is a polyphyletic group lumping at least four evolutionary lineages. We identified four groups that originated in independent cladogenetic events occurring at different temporal depths; these groups presented high genetic diversity, and their populations were genetically structured. These results improve our understanding on the origin of an endemic rich group of cacti, allow us to more clearly determine its biodiversity, and suggest that a fraction of the flora of the Mediterranean Central Chile biodiversity hotspot may be cryptic. This information is mandatory for accurate extinction risk assessments and the efficient design of conservation actions, avoiding overlooking highly threatened species in an increasingly anthropized landscape.
Supplementary Materials: The following supporting information can be downloaded at https: //www.mdpi.com/article/10.3390/genes13020240/s1, Figure S1: Phylogeny of Eriosyce s.l. inferred with Mr.Bayes with 30 million generations, each node with posteriori probabilities; Figure S2: Phylogeny of Eriosyce s.l. with 87 taxa obtained using Beast2 with Relax Clock Exponential Model, each node with height 95% HPD; Figure S3: Phylogeny of Eriosyce s.l. with 87 taxa obtained using RaxML with 1000 repeats, each node with maximum-likelihood bootstrap support; Figure S4: Evanno plot for 150 taxa of Eriosyce curvispina complex with 12 microsatellites loci; Table S1: Primers for amplification and multiplexing of microsatellites.

Data Availability Statement:
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at the following link: https://github.com/pabloguerrero-cmd/Villalobos-Barrantes_etal_Genes_2022, accessed on 20 January 2022.