Genesis, Evolution, and Genetic Diversity of the Hexaploid, Narrow Endemic Centaurea tentudaica

: Within the genus Centaurea L., polyploidy is very common, and it is believed that, as to all angiosperms, it was key in the history of its diversification and evolution. Centaurea tentudaica is a hexaploid from subsect. Chamaecyanus of unknown origin. In this study, we examined the possible autopolyploid or allopolyploid origin using allozymes and sequences of three molecular markers: nuclear‐ribosomic region ETS, and low‐copy genes AGT1 and PgiC . We also included three species geographically and morphologically close to C. tentudaica : C. amblensis , C. galianoi , and C. ornata . Neighbor‐Net and Bayesian analyses show a close relationship between C. amblensis and C. tentudaica and no relationship to any of the other species, which suggest that C. tentudaica is an auto‐ polyploid of C. amblensis . Allozyme banding pattern also supports the autopolyploidy hypothesis and shows high levels of genetic diversity in the polyploid, which could suggest multiple origins by recurrent crosses of tetraploid and diploid cytotypes of C. amblensis . Environmental niche mod‐ eling was used to analyze the distribution of the possible parental species during the present, Last Glacial Maximum (LGM), Last Interglacial Period (LIG), and Penultimate Glacial Maximum (PGM) environmental conditions. Supporting the molecular suggestions that C. tentudaica originated from C. amblensis , environmental niche modeling confirms that past distribution of C. amblensis over‐ lapped with the distribution of C. tentudaica . as a reverse primer. PCR amplifications between the exon 11 to 15 were performed with the following parameters: Three minutes at 95°C, followed by 35 cycles of 94°C for 30 s, 54°C for 30 s and 72°C for 2 min, with 10 additional min at 72°C. PCR reactions were carried out using a total volume of 25 µL for each sample, con‐ taining 2.5 µL of dNTPs (2 mM), 2.5 µL of 10× PCR Buffer II (10 mM), 2.5 µL of MgCl2 solution 25 mM, 1 µL of each primer at 5mM for ETS and 10 mM for AGT1 and PgiC amplification, 0.2 µL of AmpliTaq ® DNA Polymerase (Applied Biosystems, USA), 11.3–11.8 µL of distilled water, 0.5 µL of DMSO (dimethyl sulfoxide; Sigma‐ Aldrich, Schnelldorf, for ETS and 1 µL of BSA 0.4 mg/mL (bovine serum albu‐ min; New England Biolabs, Ipswich, MA, USA) for AGT1 and PgiC , and 3 µL of diluted DNA (proportions of water and DNA depends on the total DNA obtained during extrac‐ tion).


Introduction
The role of polyploidy and hybrid speciation in the origin of new species has been frequently studied and well documented in the last decades, particularly in angiosperms [1][2][3][4]. The occurrence of polyploidy has played a very important role in the appearance and diversification of regulatory genes, emphasizing its role in plant evolution. Recent studies suggest that all angiosperms descend from a paleopolyploidization event that gave place to key innovations, securing its ancient high rate of diversification [2,4]. Besides, after whole-genome duplication, genomes undergo several changes in genome size, genome structure, and epigenetic control, allowing them to have a very rapid evolution [4,5].
Two main mechanisms can cause polyploidy. The first one is known as autopolyploidy and occurs within a single species, although it can involve a cross between genetically differentiated populations. The origin of a new species from a whole-genome duplication event is a simple mechanism, because polyploidy acts as a reproductive barrier.
When polyploids arise and mate with diploids, a prezygotic barrier usually develops, creating sterile aneuploid gametes. For this reason, polyploids are reproductively isolated from their parents [1], and they constitute new species, according to the biological [6] and evolutionary [7] concepts of species. The second mechanism that leads to a whole-genome duplication event is known as allopolyploidy and occurs by interspecific hybridization of two different species [3,5]. It has been suggested that whole-genome duplication solves the incompatibility problems after interspecific hybridization that are observed in homoploid hybrids ( [5,8,9], and references therein).
Morphology often gives some hints on the hybrid origin of a species. However, molecular analyses are much more efficient: Besides detecting hybridization, they also can identify the parental species even when they have similar traits, and morphology alone does not determine the hybrid origin. Moreover, molecular data may reveal old hybridization events that are not evident from morphology, and they have detected some species as hybridogenic [10]. For this reason, specific molecular analyses focused on polyploidy have been developed in recent years, using especially nuclear-ribosomal regions like ITS and ETS, because there are very easy to amplify and show good resolution [11,12,13]. Lowcopy genes are also used in phylogenetic and biogeographical studies because they have a higher rate of evolution compared to organellar genomes, are usually present in multiple independent loci, and show biparental inheritance. These genes are present in the genome with one to four copies and usually show great resolution, because they are not subject to concerted evolution [14]. Thus, they are ideal candidates for identifying the parental species of suspected hybrids [14,15]. Finally, codominant genetic markers, including allozymes and microsatellites, have been proved as very efficient in discriminating between the different types of polyploid hybrid speciation (i.e., auto-vs. allopolyploidy), mainly through the careful observation of banding patterns, e.g., References [16,17]. Besides, allozymes may be useful in determining genetic diversity [18].
When analyzing the hybrid origin of a species, both homoploid or polyploid, the historical distribution of parental taxa should be taken into account, because environmental conditions were different in the past (particularly during extreme climate events, such as the Last Glacial Maximum, LGM) and current distribution may not reflect their ancient distribution ( [19], and references therein). As in other latitudes, some studies have suggested that climatic conditions of the LGM (ca. 20,000 years BP), and probably at other glacial maxima, could lead to a different distribution of plant species in the Iberian Peninsula compared to the present, shifting their area to lower latitudes [20]. The environmental conditions during glacial and interglacial periods could lead to cyclic changes of the distribution, in which some species expanded and contracted their ranges depending on their specifics requirements ( [19], and references therein). For this reason, the southern peninsulas of Europe became important refugia for species that later colonized the north during the postglacial warming [21]. Ancient distribution of the possible parental species during glacial and interglacial periods would help us in explaining the current distribution of the hybrid species; for example, contact areas where parental species met could be far from their current range [21][22][23][24]. Tools as Ecological Niche Modeling (ENM) are very useful to detect the possible contact areas between the distributions of the suspected parental taxa in the past ( [19], and references therein).
An intriguing case of polyploidy is that constituted by Centaurea tentudaica (Rivas Goday) Rivas Goday and Rivas Mart. It is a hexaploid taxon with 2n = 6x = 60 [29] from subsect. Chamaecyanus, described for the first time by Rivas Goday (1964) [30] as a subspecies of Centaurea toletana. It was assigned the species rank by Rivas Martínez (1980) [31] and soon after Fernández- Casas and Susanna (1982) [32] considered it a variety of Centaurea amblensis Graells. Finally, Rivas Martínez (1988) [33] gave it the subspecies rank, Centaurea amblensis Graells subsp. tentudaica (Rivas Goday) Rivas Mart. Nowadays, there is some consensus in considering that it should be considered an independent species because it is hexaploid [29,34], while C. amblensis is tetraploid with 2n = 4x = 40 [35], and some morphological differences do exist [36]. Centaurea tentudaica has a very narrow distribution, as it only occurs in Sierra of Tudia, Badajoz, Spain [34,35]. The species has been evaluated as 'Critically Endangered' [CR;29] because of its small range: The area of occupancy and extent of occurrence is below 1 and 2 km 2 , respectively. It shows acaulescent or subacaulescent habit with pinnatisect or pinnatifid leaves; appendages of the involucral bracts long, recurved, and pinnatifid; and pink florets [35]. All of these characters strongly suggest that C. tentudaica is closely related to C. amblensis. Thereafter, any research on the polyploidy taxon should include C. amblensis as one of the potential parental species.
Our main objectives in this work using cloned sequences of nuclear-ribosomal DNA spacers ETS, as well as low-copy genes AGT1 and PgiC, and allozymes are multifaceted: (1) To evaluate whether Centaurea tentudaica is an auto-or an allopolyploid; (2) given the threatened status and the conservation interest of C. tentudaica, to survey the levels of genetic diversity within and among (sub)populations; and (3) to develop an ENM to infer the potential distribution of Centaurea amblensis, the only putative parental identified on morphological basis, at present, LGM, LIG (Last Interglacial Period, ca. 120,000 years BP) and PGM (Penultimate Glacial Maximum, ca. 140,000 years BP) conditions.

Sampling Strategy and Markers
For DNA sequencing, fresh leaves from adult plants in wild populations were collected and preserved in silica-gel. The studied species, vouchers, chromosome numbers, and complementary data are listed in Table S1. The geographic origin of the sampled populations is illustrated in Figure 1. For our study, we selected the nuclear ribosomal ETS because concerted evolution is usually incomplete in sect. Acrocentron [10,13], and different copies can persist in cases of introgression. These copies can be detected by cloning, and they are generally useful for identifying reticulation. We also selected two low-copy genes: AGT1, which was useful for determining gene flow and hybridization in Centaurea [37]; and PgiC, which will be used for the first time in this section.  Table S1. Maps were obtained from www.maps-for-free.com, www.geamap.com, and the National Geographic Institute (2019).
For the ETS region and low-copy genes, we selected seven individuals of C. tentudaica from four subpopulations ( Figure 1): (1) One from the north side of El Labrado hill with pale pink florets and pectinate bracts; (2) two from the northwest side of El Labrado, one with pale pink and the other with dark pink florets; (3) two from Cumbre de las Ceborillas hill, one with large capitula and the other with small capitula (both with dark pink florets); (4) two from Cumbre de los Bonales hill, one with dark pink florets and pinnatifid leaves (all other individuals selected of C. tentudaica had pinnatisect leaves) and the other one with pale pink florets and acaulescent habit (all the other individuals of C. tentudaica were subacaulescent). Details of the locations are shown in Table S1. Besides, to verify the origin of C. tentudaica, we selected five individuals (from five populations) of three additional species: C. amblensis (three populations one individual each, all of them with dark pink florets), a tetraploid species [cf. 35] that was chosen because of its morphological resemblance to C. tentudaica; and C. ornata (one population, one individual) and its close relative Centaurea galianoi Fern. Casas and Susanna (one population, one individual), both with yellow florets, because of their geographical proximity to C. tentudaica. As outgroups, C. toletana was chosen for the ETS analysis; C. cephalariifolia Willk. and C. scabiosa L., for the AGT1 analysis; and C. polypodiifolia Boiss. and C. behen L., for the PgiC region.
For allozymes, we sampled a total of 85 individuals along a linear transect covering all the distribution range of C. tentudaica; sampled individuals were spaced by several meters to avoid the hypothetical collection of ramets. We tried to do a comparable sampling between the three main subpopulations in which most individuals of C. tentudaica are arranged, and that correspond to three hills of slightly over 1000 m a.s.l.: Cerro Tudia, El Labrado, and Cumbre de los Bonales ( Figure 1 and Table 1). Fresh leaf samples consisted of small leaves, which were deposited in paper bags and stored at 4 °C until protein extraction one or two days later.
The PCR products of the three regions were cloned using the TOPO TA Cloning kit (Invitrogen, Carlsbad, CA, USA) following the manufacturer's instructions. From each reaction, 8-16 colonies were amplified using T7 and M13 universal primers under the conditions described in Vilatersana et al. (2007) [44]. After examination in 1.2% agarose gels, six to eight of the 16 PCR products with the correct size were selected for sequencing using the same primers. Before sequencing, all the PCR products were purified with the QI-Aquick PCR Purification Kit (Qiagen Inc., Valencia, CA, USA). Sequencing was performed on an ABI 3730XL Analyser (Applied Biosystems) following the manufacturer's protocol at Macrogen Inc., Korea.

Sequence Alignment and Phylogenetic Analyses
Sequences were aligned manually using BioEdit v7.0.5.3 [45]. Unique substitutions in clones from a single accession were excluded. This reduced the size of the matrices, as well as the impact of PCR artifacts (chimeric sequences and Taq errors [46,47]). To verify the presence of possible recombinant sequences, datasets were checked using RDP v4.97β [48].
Network analyses (Neighbor-Net) were carried out for each marker using Splits Tree4 v4.14.8 software [49] with the criterion set to uncorrected pairwise (p) distances, excluding gap sites. For the Bayesian analyses, the datasets were processed using jModeltest v2.1.10 [50] in order to find the best evolutionary model that described the data properly. These models were used to carry out the Bayesian analyses using Mr Bayes v3.2.7 [51]. The Marginal Likelihood Estimators for the two models (non-clock and strict clock) were obtained using Stepping Stone Sampling following the Mr Bayes manual. Two independent MCMC analyses were run, each for 20 million generations and two chains, starting with random initial trees and sampling every 1000 generations. The convergence of the parameters was checked using TRACER v1.7.1 [52], discarding the first 25% of the generations as burn-in.
As expected for a hexaploid species, banding patterns were particularly complex. Even for the 'interpretable' loci (those that showed rather clear banding patterns, with strongly stained and well-defined bands), determining the allelic dosages was not possible in most cases; co-migration of bands, the faintness of some alleles, and the occurrence of multiple heterodimeric bands frustrated our efforts to assign 6-letter genotypes (e.g., abbccc) to each individual for each locus. In order to run the conventional software for estimating the common genetic diversity parameters, we followed two approaches that have been used in similar situations. For the first approximation, we selected only the loci for which we were able to confidently identify the alleles present at each individual (what we called above as the 'interpretable' loci, 11 in total). This way enables treating loci as codominant, but only simplified phenotypes can be obtained; for example, the phenotype 'ab' could correspond to up to five different genotypes (abbbbb, aabbbb, aaabbb, aaaabb, and aaaaab) (Figure 2). This method, despite representing a big simplification of the actual genotypic richness, estimates the allele frequencies at the population level. On the basis of allele frequencies, we can calculate some of the main within-population genetic diversity parameters, including P (percentage of polymorphic loci), A (mean number of alleles per locus), and He (expected heterozygosity), but not Ho (observed heterozygosity). Examples using this approach include the hexaploid Sequoia sempervirens [54] and the Chinese Smilax polyploid series [55]. For the second approximation, we treated the allozyme data as dominant markers. Each identified allele of the 11 interpretable loci was regarded as a separate 'locus' and coded as 1 (allele present) vs. 0 (allele absent) ( Figure 2). As a result, we obtained a binary presence/absence matrix of 23 'loci'. This second approximation may imply some bias because we are not dealing with actual loci. However, it has the advantage that we can obtain accurate individual genotypes, and thus, we are able to calculate among-population genetic diversity parameters (e.g., the fixation index, FST), as well as surveying the population structure. This approximation has been used, for example, in polyploid Achillea [56]. However, there are works even combining the two approaches (as in Cardamine; [57]), as we are doing in the present research. . Through the first approach, up to four alleles can be identified for the Pgi-2 ('a', 'b', 'c', and 'd'). As we were not able to estimate the allele dosage (that is, the number of copies of each allele) in all the individuals, we assigned 'allele phenotypes' (indicated at the bottom of the zymogram) instead of true genotypes. Please note that individuals 3-13 are all codified as 'ab', but they might actually correspond to different genotypes: Individuals 7, 9, and 10 could be 'aabbbb' or 'abbbbb' but, instead, individuals 4, 5, and 12 are probably 'aaabbb' or even 'aaaabb'. Through the second approach, each of the four alleles that can be identified in the Pgi-2 is treated as a dominant locus, so if present, it is codified as '1′, if absent, as '0′. Hence, an individual that in the upper part of the graph is codified as 'ab', in the lower part of the graph is codified as '0.0.1.1′, while an individual with an allele phenotype 'abcd' would be '1.1.1.1′.
With the phenotypes obtained through the first approximation and by using the program BIOSYS-1 [58], we computed P95 (percentage of polymorphic loci when the most common allele had a frequency of <0.95), A, and He. Instead of Ho, we calculated 'individual heterozygosity' (Hi) by hand, measured as the percentage of the 11 interpretable loci for which the sampled individuals were heterozygous. The genotypes obtained through the second approximation were used to calculate FST with the help of AFLP-SURV v1.0 [59], but also to run a Principal Coordinate Analysis (PCoA)-conducted using GenAlEx v6.5 [60] based on genotypic distances.

Ecological Niche Modeling
Ecological niche modeling was performed to evaluate the potential distribution of C. amblensis for the present time, LGM, LIG, and PGM (~140 ka) environmental conditions. Building models for C. tentudaica was discarded because at the study resolution (2.5 arcmin) the number of wild occurrences (2) was not enough to get reliable results (≥ 5) [61]; unfortunately, climatic data for the LGM and older time periods are not available at finer resolutions than 2.5 arc-min (https://www.worldclim.org/data/v1.4/paleo1.4.html; http://www.paleoclim.org/;https://datadryad.org/stash/dataset/doi:10.5061/dryad.27f8s90). We used the maximum entropy algorithm implemented in MaxEnt v3.3.3k [62]. Occurrence data of C. amblensis were obtained from Fernández Casas and Susanna (1985) [35] and the Global Biodiversity Information Facility (https://www.gbif.org/). All the occurrences were verified manually, since at least one of the authors of this work was involved in collecting some of the occurrences. The rest were backed by herbarium specimens, except for five locations that were confirmed through photographs. A total of 74 validated locations were included for C. amblensis. The selected background for this study was the Iberian Peninsula, since C. amblensis is restricted to this area [29,[34][35]. We used a set of 19 environmental variables at 2.5 arc-min resolution obtained from the WorldClim database (http://worldclim.org). We also included layers for soil type (from the Harmonized World Soil Database, HWSD; www.fao.org/nr/land/soils/harmonized-world-soil-database/en), soil pH (from ISRIC, World Soil Information; http://isric.org), and altitude (from DEMSRE3; http://worldgrids.org/doku.php/wiki:demsre3) at 30 arc-sec resolution [which were further rescaled to 2.5 arc-min with ArcGIS v10.5 (ESRI, Redlands, CA, USA)]. Of all occurrence records, 75% were randomly selected to construct the niche model under current conditions in MaxEnt. The remaining 25% was used to test the prediction of the model using the area under the curve (AUC), which is the probability that a random site will be ranked above a random absence site. A random ranking has on average an AUC of 0.5, and a perfect ranking has the best possible AUC of 1.0 [63]. Jackknife tests, percent of contribution, and response curves of the variables were implemented in MaxEnt to define the influence of each one. Finally, we selected eight relatively uncorrelated variables (r ≤ |0.85|) ( Table  S2). The correlation analysis was run using SDMtoolbox v2.4 [64] implemented in ArcGIS. The niche model under current conditions was projected to the LGM, LIG, and PGM using paleoclimatic layers from Oscillayers (https://doi.org/10.5061/dryad.27f8s90). The altitude layer of the three past environmental conditions was constructed from the present layer by adding 120 m to each pixel to simulate the LGM altitude conditions [65,66], by subtracting 2.15 m to each pixel to the LIG conditions [67], and by adding 90 m to the PGM conditions [68]. Soil and pH layers were assumed to be the same as in present conditions. We run 100 replicates in MaxEnt using the subsample method (75%/25% of occurrences for training and testing the model, respectively), and applying the maximum sensitivity plus specificity logistic threshold to obtain a map of probability of the presence of C. amblensis in ArcGIS. Afterwards, the final model was evaluated by the AUC, obtained from Maxent, and by calculating the True Skill Statistic (TSS) [69] using RGui v4.0.3 [70]. Values with a TSS higher than 0.5 are considered as optimal results in terms of power prediction [69].

Results
We sequenced 195 clones, from which only 90 were retained after eliminating recombinant sequences and those containing unique single-nucleotide polymorphisms. Total alignments were 666 bp long for ETS, 1030 bp for AGT1, and 726 bp for PgiC. The number of gaps, polymorphic sites, total mutations, and informative sites are shown in Table 2.
The best model for the different regions was HKY + G for all the cases, based on Akaike information criteria (AIC). We selected the strict clock uniform using the Bayes Factor (BF) values for all the regions (Table S3).

ETS Region
The Bayesian analysis of the ETS region revealed a partially resolved tree ( Figure S1). This analysis showed a moderately-supported (PP = 0.92) clade, including C. tentudaica and C. amblensis ( Figure S1). The tree also showed a copy shared by C. tentudaica and C. ornata, but without support ( Figure S1, clade B), and a well-supported copy shared by C. tentudaica and C. galianoi ( Figure S1, clade C). The Neighbor-Net (Figure 3) showed one copy shared by C. amblensis and C. tentudaica, and a second one by C. ornata, C. tentudaica, and C. galianoi, with three copies corresponding to clades B, C, and D in the Bayesian analysis ( Figure S1).

Low-copy Genes AGT1 and PgiC
AGT1 analysis showed a partially resolved tree ( Figure S2), with a large clade, including all species analyzed, but with only two well-supported subgroups: The first one ( Figure S2, clade C) included a clone of C. amblensis and a clone of C. tentudaica, and the second ( Figure S2, clade D) comprised only C. tentudaica. The Neighbor-Net analysis did not show well-differentiated copies (Figure 4). The Bayesian analysis for PgiC showed seven different clades ( Figure S3). The first one had high support and corresponded to clade A, which included all clones of C. ornata and C. galianoi. Clades B, C, and D corresponded to subcopies shared by C. tentudaica and C. amblensis, the first and the last one with very high support. All the other subcopies were found only in C. tentudaica with high support. The Neighbor-Net showed the same welldifferentiated copies that clearly separate C. tentudaica and C. amblensis from C. galianoi and C. ornata ( Figure 5).

Allozymes
Of the 11 interpretable loci for C. tentudaica, four were monomorphic with a single allele, namely, Idh-2, Mdh, 6Pgd-1, and Pgi-1. In the seven remaining polymorphic loci, no evidence of fixed heterozygosity was found; rather, all of them showed both putatively balanced (having an equal number of copies of each allele) and unbalanced (having a different number of copies of each allele) heterozygotes ( Figure 2). The number of alleles per polymorphic loci ranged from 1 to 4, with a mean at an individual level of 1.55. Only one locus (Pgi-2) provided phenotypes of four alleles (abcd; Figure 2). Regarding the levels of genetic diversity, they were very similar among subpopulations of C. tentudaica, averaging 60.61%, 2.03, and 0.287 for P95, A, and He, respectively ( Table 1). The value of FST was very low (0.023), indicating almost no genetic divergence among the three subpopulations. The graphical representation of PCoA ( Figure S4) agreed with the FST estimation, as individuals belonging to the tree subpopulations are randomly distributed across the two dimensions.

Ecological Niche Modeling
The predictive power of the model was very high, obtaining an average AUC of 0.975±0.014 and a TSS of 0.848±0.069 after 100 runs. According to MaxEnt, the variables that contribute the most were altitude (28.8%), soil type (23%), precipitation of driest month (WorldClim code: bio14, 21.1%), and annual temperature range (WorldClim code: bio7, 9.6%), which explained almost 80.5% of the potential distribution. The mean temperature of the coldest quarter (WorldClim code: bio11) showed a very low contribution (5%), but scored the highest permutation importance with 55%, followed by precipitation of the driest month (bio14), with a 37.9% of permutation importance. The jackknife test showed that the variable with the highest gain, when used in isolation, was soil type. On the contrary, the variable that decreased most the gain when omitted was precipitation of driest month (bio14), indicating that it carries information not present in other variables.
The LGM and PGM models showed a high probability of a larger distribution of C. amblensis compared to the present, with proper environmental conditions that would have included large areas towards the south, reaching Sierra of Tudia and surroundings ( Figure  6). The LIG model showed a probability of distribution restricted to the center-west of the Iberian Peninsula, a result very similar to the present model ( Figure 6).

Relative Utility of ETS, AGT1, and PgiC in Disentangling the Origin of Centaurea tentudaica
The ETS shows multiple copies within a single species, suggesting incomplete concerted evolution at this region, as previously reported in several studies [10,11,13,37]. Low-copy genes show different levels of resolution in the present study, being the PgiC more informative than AGT1. The PgiC region is used here for the first time in section Acrocentron of genus Centaurea and shows great resolution, as found in previous phylogenetic studies on Compositae [15] and Onagraceae [71,72]. The lack of resolution obtained herein with the AGT1 was already observed in other studies, including those of Li et al. (2019) [73] in Orobanchaceae, but in opposition to studies carried out in Centaurea of Compositae [37] and in a large number of families within angiosperms [43], where it proved to be particularly useful.
The lack of resolution of AGT1 in the present study makes it unsuitable for exploring the origin of the polyploidy in C. tentudaica. Despite the high number of polymorphic sites and mutations (Table 2), this region shows the lowest number of parsimony informative sites. This implies that most of the polymorphisms are not shared with any other sequence, resulting in the lack of resolution observed both in the Bayesian analyses and the networks. On the contrary, the ETS shows at least two copies and offers better resolution. The presence of more than one ribotype in one species might be (1) the result of incomplete lineage sorting, or (2) acquisition of different copies through hybridization ( [15,44], and references therein). In our case, we have observed that copies are usually shared between nearby populations. For this reason, we think that hybridization is the main cause of the high number of ribotypes observed in a single species. The PgiC region shows high resolution, which is surprising because it has the lowest number of polymorphic sites (Table  2). However, in comparison with the AGT1 region, it has a higher number of parsimony informative sites, which explains its higher resolution. The PgiC region, thus, has been very useful to explore the origin of C. tentudaica, showing more than one copy shared with full support between C. tentudaica and C. amblensis. In summary, the resolution of the three regions was different, and when combined, provided good insights into the genesis of this hexaploid narrow endemic species.

The origin of Centaurea tentudaica
The results show that C. tentudaica is closely related to C. amblensis, sharing at least one copy for the three regions analyzed. The most plausible hypothesis is, thereafter, autopolyploidy. Copies of C. tentudaica shared with C. galianoi should be interpreted as a sign of introgression that occurred after the genesis of the polyploid: If C. galianoi was implied in the genesis of C. tentudaica, all the individuals from all populations would show some signs of introgression in the sequences. However, the presence of copies from C. galianoi is limited only to some subpopulations of C. tentudaica ( Figure 4). Significantly, subpopulations of C. tentudaica on the southern face of Sierra de Tudia, which are geographically closer to C. galianoi populations, have individuals with orange anther tubes ( Figure 7B), while most of the remaining subpopulations have only purple anther tubes like C. amblensis ( Figure 7A). Changes in the color of anther tubes in Acrocentron are usually the result of hybridization [37]. Therefore, we should discard on a morphological basis the role of C. galianoi as one of the putative parentals of C. tentudaica because the presence of yellow anthers is only occasional. Centaurea tentudaica also has its own genome, as we can see in the results of AGT1 and PgiC regions. Exclusive copies are probably the result of its geographic isolation, given that we find copies shared between subpopulations and copies exclusive of a single subpopulation. Other examples in Centaurea sect. Acrocentron shows that introgression is very frequent in this group. The polyploid series of C. toletana is a timely case: Tetraploid C. toletana is, by all evidence, an autopolyploid [35], but all the studied populations show evidence of introgression by sharing multiple copies of the nuclear ETS [10]. The allozyme banding patterns are also indicative of autopolyploidy in C. tentudaica because there is no evidence of fixed heterozygosity (as expected for allopolyploids, due to the combination of two divergent parental genomes), and the polymorphic loci show both balanced and unbalanced heterozygotes (e.g., Figure 2). Once accepted that C. tentudaica is probably an autopolyploid of C. amblensis, we should explore geographical evidence. The present distribution of C. amblensis is restricted to altitudes of 800-1500 m on siliceous soils (exceptionally on clay soils) about 300 kilometers north from Sierra de Tudia, the range where C. tentudaica is distributed [35,74]. A plausible explanation is that the ancient distribution of C. amblensis included these hills. The Iberian Peninsula acted as a plant refugium during the Pleistocene, and intense climatic fluctuations led to a series of latitudinal and altitudinal migrations [10,29,[75][76][77]. During glacial periods, the conditions that are found at 800 m for the present time would be available at lower altitudes like 400-600 m [20], which are roughly the altitudes found in the plateau connecting directly south-west Spain (Sierra de Tudia) to central Spain (Sierra de Gredos; Figures 2 and  6). As a result, a pathway would have existed between Sierra de Tudia to the area where C. amblensis occurs in central Spain. In support of this hypothetical corridor, the ENM ( Figure 6) suggests that the potential area of C. amblensis during the LGM and PGM included the current area of C. tentudaica. These results, thus, strongly support the hypothesis that C. amblensis gave origin to the hexaploid, which colonized and adapted to higher lands during the postglacial warming.
All extant populations of C. amblensis are tetraploid ( [35], and references therein), and C. tentudaica is hexaploid [29]. An autopolyploid origin of the hexaploid from the tetraploid could be the result of a cross between an unreduced gamete (4x) and a normally reduced one (2x). Alternately, hexaploid C. tentudaica could be the result of the cross of tetraploid C. amblensis and an individual from the extinct diploid C. amblensis that surely existed at some point. Because triploids are usually sterile ( [78][79][80], and references therein), whole-genome duplication might occur to overcome sterility, which would explain the hexaploid level of the species. Our current knowledge of the genetic evolution of the species does not allow us a decision between both hypotheses. However, the allozyme banding patterns of C. tentudaica might give support to the second hypothesis, as some loci (such as Pgi-2; Figure 2) are extremely rich in genotypes (or, more precisely, allele phenotypes). Such a pattern of high allele richness seems very unlikely for a polyploid originated by a cross between an unreduced gamete (4x) and a reduced one (2x), as this probably would have occurred on a scenario of a single origin. Rather, the quite unexpected high levels of genetic diversity detected (P95 = 60.61%, A = 2.03, He = 0.287, and Hi = 0.292) for a species with a narrow range (<2 km 2 )-despite its polysomic inheritance, points to multiple geographic (and, perhaps, temporal) origins. For example, the levels of allozyme genetic diversity of C. tentudaica are higher than those for other endemic and range-restricted hexaploid species: Coreopsis grandiflora var. longipes (restricted to southeastern Texas; P95 = 38.67%, Hi = 0.287; [81]), Glyceria nubigena (endemic to the Great Smoky Mountains of the southeastern USA; P95 = 0.00%, A = 1.00, He = 0.000; [82]), or several species of Bidens endemic to Hawaii (P95 = 23.8-52.4%, A = 1.24-2.19, He = 0.044-0.158; [83]). In addition, genetic variation in C. tentudaica is much higher than in other polyploid species of sect. Acrocentron (in the autotetraploid C. podospermifolia; P95 = 25.6%; A = 1.41; and He = 0.087; [37]), which suggests that the high polymorphism in C. tentudaica is not the result of phylogenetic inertia. Indeed, multiple origins in polyploids seem to be the rule and not the exception, perhaps because it may increase the genetic variation of a polyploid species, thus facilitating its survival chances [84][85][86][87]. Regarding the conservation status of C. tentudaica, its high genetic diversity should be interpreted as a sign of genetic health, a situation that would be maintained at least in the short-term: the population size is of nearly 16,000 individuals, and there are no signs of genetic fragmentation (FST = 0.023; see also Figure 3). The only threats are grazing by sheep and the exploitation of new plantations of pine (Pinus pinea L.).

Conclusions
Once again, molecular markers, such as ETS and low-copy genes, have demonstrated their usefulness for phylogenetic analyses in highly hybridized groups when there are used in conjunction. Our results provide solid evidence of C. tentudaica being an autopolyploid of C. amblensis, which, despite being highly range-restricted, shows high levels of genetic diversity.
Supplementary Materials: The following are available online at www.mdpi.com/1424-2818/13/2/72/s1, Table S1: Species, origin of the material, herbaria and chromosome number, Table  S2: Environmental variables for ENM, Table S3: Marginal Likelihood Estimators for the two models (strict clock and non-clock) obtained using Stepping Stone method, Figure S1: Consensus tree resulting from the analysis of the ETS region of Centaurea tentudaica, Figure S2: Consensus tree resulting from the analysis of the AGT1 region of Centaurea tentudaica, Figure S3: Consensus tree resulting from the analysis of the PgiC region of Centaurea tentudaica, Figure S4  Data Availability Statement: data other than DNA sequences are available from the authors by direct request.

Conflicts of Interest:
The authors declare no conflict of interest.