An Integrated Taxonomic Approach Points towards a Single-Species Hypothesis for Santolina (Asteraceae) in Corsica and Sardinia

Simple Summary Systematics is the branch of biology that studies the relationships among organisms and their evolution, while taxonomy is the science of classification. In this work, a systematic and taxonomic investigation about three plant species of Santolina, commonly known as lavender-cotton, is presented. Two of these species occur exclusively in Corsica and Sardinia, two of the main islands of the Mediterranean Sea, while a third one is a common ornamental plant, known only as cultivated. By integrating several approaches, we find out that the two putative species from Corsica and Sardinia are actually very similar from many points of view. A two-species hypothesis is no longer supported according to our results, so that these plants should be reclassified as a single species. This study demonstrates the importance of integrating different sources of information to produce reliable classifications (i.e. taxonomic hypotheses). In addition, our study is useful to better understand plant evolution in the context of the Mediterranean Basin, one of the world’s biodiversity hotspots. Abstract Santolina is a plant genus of dwarf aromatic shrubs that includes about 26 species native to the western Mediterranean Basin. In Corsica and Sardinia, two of the main islands of the Mediterranean, Santolina corsica (tetraploid) and S. insularis (hexaploid) are reported. Along with the cultivated pentaploid S. chamaecyparissus, these species form a group of taxa that is hard to distinguish only by morphology. Molecular (using ITS, trnH-psbA, trnL-trnF, trnQ-rps16, rps15-ycf1, psbM-trnD, and trnS-trnG), cypsela morpho-colorimetric, morphometric, and niche similarity analyses were conducted to investigate the diversity of plants belonging to this species group. Our results confute the current taxonomic hypothesis and suggest considering S. corsica and S. insularis as a single species. Moreover, molecular and morphometric results highlight the strong affinity between S. chamaecyparissus and the Santolina populations endemic to Corsica and Sardinia. Finally, the populations from south-western Sardinia, due to their high differentiation in the studied plastid markers and the different climatic niche with respect to all the other populations, could be considered as an evolutionary significant unit.


Introduction
With about 30,000 taxa of vascular plants, the Mediterranean biogeographic region is one of the 34 most important mega hot-spots for biodiversity in the world [1,2]. The endemic-vascular plant richness (EVPR) in these hotspots is >2,000 species per 15,000 km 2 , and within these areas at least 10% of narrow endemics occur [2]. One of the causes of the high level of biodiversity in that area lies in its peculiar geological and climatic history during the last 20 million years [3,4]. Indeed, the shift from a sub-tropical to a Mediterranean climate, glaciations, oscillations of the sea level, tectonic movements, and human activity played a crucial role in the diversification of plants triggering speciation processes [5][6][7]. The Tyrrhenian islands (sensu Médail and Quézel [8]) of the central-western Mediterranean Sea (Sicily, Sardinia, Corsica, Tuscan Archipelago, and Balearic Islands), with approximately 6000 plant species, of which 10-12% are endemic [9], are considered one of the 11 macro hot-spots within the Mediterranean region [10]. Corsica (France) and Sardinia (Italy) are two of the largest islands of the Mediterranean Basin and host a great number of endemic vascular plants. Indeed, in Sardinia, 319 out of 2462 taxa are endemic [11], and 316 out of 2781 in Corsica [12]. The study of species that are endemic to both islands allows a better understanding of the evolutionary dynamics in the Mediterranean Basin [13][14][15][16]. In this context, the populations of Santolina L. (Asteraceae, Anthemideae) occurring in Corsica and Sardinia are a good case study.
Santolina is a genus of dwarf aromatic shrubs that includes about 26 species endemic to the western portion of the Mediterranean Basin [17][18][19]. According to Oberprieler [20], the diversification of this genus from its closest relative started around 10 million years ago. The centre of diversification is probably the Iberian Peninsula, where the highest number of Santolina species is reported [21]. Santolina corsica Jord. & Fourr. and S. insularis (Gennari ex Fiori) Arrigoni, the two species occurring in Corsica and Sardinia, belong to the S. chamaecyparissus L. complex, which includes 14 species distributed in the centralwestern Mediterranean [17]. According to the most recent floras [18,[22][23][24], these two taxa and S. chamaecyparissus s.str. form an easily distinguishable species group within the S. chamaecyparissus complex. All the three species are polyploid and show a distinctive combination of morphological characters: yellow flowers, tomentose stems, and long and tomentose leaves with short and obtuse segments.
Santolina insularis is endemic to Sardinia and is hexaploid with 2n = 6x = 54 chromosomes. Santolina corsica is recorded for both islands and is tetraploid with 2n = 4x = 36 chromosomes [24]. In Sardinia, it is recorded only for Monte Albo, in the central-eastern portion of the island, so that it shows an allopatric distribution with S. insularis. Santolina chamaecyparissus is a pentaploid sterile unit of unknown origin, and it is only known as a cultivated plant [19]. It is morphologically similar to S. corsica/S. insularis and shows an irregular karyotype structure [24,25]. Marchi and D'Amato [26] hypothesized that this pentaploid species may have arisen from a cross between a tetraploid and a hexaploid Santolina, involving possibly S. insularis as one of the putative parents.
The phenotypic differentiation of these three species is not straightforward. When Fiori [27] published the basionym of S. insularis, he expressed doubts regarding the taxonomic distinction of the newly described taxon with S. corsica. Later, the same author [28] considered these taxa as distinct, although he remarked their morphological affinity. In the same work, Fiori [28] reported several localities for S. insularis in continental Italy, but these localities were later confuted by Arrigoni [25], who referred them to naturalized populations of S. chamaecyparissus. Chromosome counts published by Marchi and D'Amato [26] and Marchi and collaborators [29] highlighted different ploidy levels for these three species. Accordingly, the current taxonomic circumscription of S. corsica and S. insularis heavily relies on the different ploidy levels and, secondarily, on qualitative morphological observations [30].
The aim of this study is to test the current two-species taxonomic hypothesis for Sardinia and Corsica by using an integrated approach [31,32], that involves molecular analyses, cypsela morpho-colorimetry, morphometry, and niche similarity tests. Santolina chamaecyparissus, traditionally considered as closely related, is also included in the study.

Collection of Material and Data
For molecular, cypsela morpho-colorimetric, and morphometric analyses, five populations of S. insularis, two populations of S. corsica, and one naturalized population of S. chamaecyparissus were sampled (Table 1). For morphometric analyses, 20 individuals were collected for each population (nine for S. chamaecyparissus, corresponding to the whole naturalized population). Leaf material from 3 out of the 20 individuals was taken, preserved in silica-gel, and was used for molecular analyses. As regards cypsela morpho-colorimetric analysis, cypselae of the Sardinian populations were already available at the Sardinian Germplasm Bank of the University of Cagliari (BG-SAR). Cypselae of the Corsican population of S. corsica were sampled in the field, while S. chamaecyparissus has been excluded from this analysis since no cypsela was found in the field. For the niche similarity tests, occurrence data from Sardinia were downloaded from Wikiplantbase #Sardegna [33], while occurrence data for Corsica were obtained from literature [27] and personal communications (J.-M. Tison). Santolina chamaecyparissus was excluded from the niche analysis, since this species is of unknown origin and its distribution is crucially affected by human cultivation [19]. Distribution of species and sampled populations are summarized in Figure 1.

Molecular Analysis
DNA was extracted using the kit ExgeneTM Plant SV Mini (GeneAll Biotechnology, Singapore). Both nuclear (ITS region) and plastid markers (trnH-psbA, trnL-trnF, trnQ-rps16, rps15-ycf1, psbM-trnD, and trnS-trnG) were screened for variability. Amplification was carried out with a Polymerase Chain Reaction (PCR) using the primers reported in Table 2 (amplification conditions for each marker are reported in Table 3). The following reagents were added in 0.2 mL Eppendorf tubes: 12.5 µL of Kodaq 2X PCR MasterMix (Applied Biological Materials, British Columbia, Canada), 1 µL of forward primer 10 µM, 1 µL of reverse primer 10 µM, and the previously extracted DNA (0.5-1.5 µL); then, tubes were filled with distilled water up to the volume of 25 µL. PCR products underwent quality checks and purification. The final product (5 µL) was then diluted with 8 µL of distilled water, and, finally, underwent capillary electrophoresis with a 3130 Genetic Analyzer (Applied Biosystems, Massachusetts, USA).
The electropherograms were edited with the software Sequence Scanner 2 (Applied Biosystems, Massachusetts, USA), while sequences were edited and aligned with the software Bioedit v. 7.2.5 [34] and Clustal W [35]. The sequences were submitted to GenBank ( Table S1). The nuclear markers were not further analyzed because of the total absence of variation among sequences. Concerning cpDNA markers, a haplotype network based on the concatenated alignment was built using the software TCS v. 1.21 [36]. Gaps were treated as missing data.

Cypsela Morpho-Colorimetric Analysis
In order to check for possible differences in external shape, size, and colour of cypselae, a morpho-colorimetric analysis was carried out. This approach is based on image analysis and turned out to be useful to untangle discrimination problems both in cultivated [42,43] and in wild plants [44][45][46]. Digital images of 100 cypselae for each population (99 for S. insularis from Monte Spada) were acquired using a flatbed scanner (Epson Perfection V550) with a digital resolution of 1,200 dpi. The scanner was calibrated following the protocol proposed by Shahin and Symons [47]. To avoid interference from environmental light, cypselae were randomly disposed on the scanner tray and covered by a box with white paper and then with a box with black paper. After acquisition, descriptors of cypsela size, shape, and colour features were measured and analyzed using ImageJ v1.52b (http: //rsb.info.nih.gov/ij, accessed on 25 November 2021). For each cypsela, 20 colorimetric and 26 morphometric characters were measured using the plugin Particles8 [48]. This plugin was also able to obtain 78 additional morphological parameters by computing the elliptic Fourier descriptors (EFDs). To minimize measurement errors and to optimize the efficiency of shape reconstruction, 20 harmonics were used to define the cypsela boundaries [49,50]. In Figure S1, a graphic explanatory representation of some morphometric variables analyzed by this approach is reported.
After standardization of data, a stepwise Linear Discriminant Analysis (hereafter LDA) was carried out to test the correct classification of groups (putative species and populations). This analysis was performed using the software SPSS 16.0 (Statistical Package for Social Science, IBM Corp., Armonk, NY, USA). Tolerance and F-to-Remove parameters were used by the model to select the best set of variables. Tolerance indicates the proportion of the variance of a factor not accounted by other independent variables, while F-to-remove defines the power of each variable in the model [42]. The performance of the model was tested with a leave-one-out cross-validation procedure. Moreover, for each discriminant function, the Wilks' λ, the percentage of explained variance, and the standardized canonical discriminant function coefficients (SCDFCs) were calculated. Wilks' λ is a measure of how well a discriminant function separates cases into groups, while SCDFCs allows comparing and ranking variables measured on different scales. The Box's M test was used to verify the homogeneity of the covariance matrices of the variables chosen in the stepwise LDA, while the analysis of standardized residuals was performed to verify the homoscedasticity of the variance of variables [51]. Kolmogorov-Smirnov's test was performed to compare the empirical distribution of the discriminant functions with the relative cumulative distribution function of the reference probability distribution. Levene's test was performed to assess the equality of variances for the discriminant functions.

Morphometric Analysis
Forty-four morphological characters were measured on dried material (the list of the morphological characters are reported in Table 4). Characters were selected among those used in literature for species discrimination [18,22,23,30]. Depending on the character (Table 4), measurements were made using a digital caliper or ImageJ v.1.52b software. For this latter case, images with a resolution of 1,200 dpi were acquired using a flatbed scanner (EPSON perfection 2480 photo). As regards the degree of tomentosity on leaves and stems (fs_hair, fsl_hair, and ssl_hair in Table 4), images were acquired using a digital camera (Canon PowerShot S45) mounted on a WILD Heerbrugg M420 stereomicroscope. For these variables, a portion of stem and a leaf segment was selected, and the degree of tomentosity was calculated by dividing the surface covered by hair by the total selected surface. After the measurements, specimens have been preserved in PI (acronym follows Thiers' Index Herbariorum [52]) and were digitized (HD images are available at JACQ Virtual Herbaria: https://www.jacq.org/, accessed on 9 December 2021). To explore the overall morphological variability, a principal coordinate analyses (PCoA) based on Gower distance [53] was carried out after standardization of variables. The Random Forest (RF) method was used to test the correct classification of a priori established groups according to the current taxonomic hypothesis and to alternative grouping hypotheses derived from other analyses. Random Forest is increasingly being used as a classification method in morphometric analyses [54][55][56], since it is able to manage datasets containing both quantitative and qualitative variables. In addition, it is non-parametric, hard to overtrain, and it can also be used when there is covariation among variables [57,58]. It is an algorithm based on decision trees, where each decision tree works on a bootstrap dataset. At each node of the tree, the algorithm uses a subset of random variables. After the "forest" (it consists of some hundreds of decision trees) is built, each decision tree casts an unweighted vote by assigning each sample to a group. Finally, the membership of a sample to one group rather than another depends on the total number of votes. The algorithm works firstly on a training dataset, and then it repeats the procedure on the remaining subset, called "test set". Random Forest analysis was carried out in R environment using the package "randomForest" (version 4.6-14, [59]). The function tuneRF was used to calculate the optimal number of variables to use at each node. One hundred iterations of this function were applied, and the suggested optimal number showing the highest frequency was selected. Then, a forest consisting of 800 decision trees was built. The function varImp was used to rank the variables (morphological characters) according to their discriminant power using the metric "mean decrease accuracy" (MDA). We specified the argument TRUE so that the algorithm evaluates the possibility of covariation among variables. For each tested hypothesis, random forest was repeated 10 times with different random "seeds" using the set.seed function. The 10 confusion matrices produced were used to calculate a "mean" confusion matrix, composed of mean percentage values of classification.
Finally, univariate analyses were conducted on all the quantitative variables comparing populations. A Bartlett test was performed to test the homoscedasticity. When p > 0.05, ANOVA followed by Tukey-Kramer post hoc test were performed. When p < 0.05, Kruskal-Wallis test followed by Wilcoxon-Mann-Whitney with Bonferroni correction were performed.

Niche Analysis
Following the approach published by Broennimann and collaborators [60], we carried out a niche analysis in a multivariate space defined by the climatic conditions in which the taxa occur. We firstly tested the current taxonomic hypothesis, and then we tested two alternative taxonomic hypotheses resulting from cypsela morpho-colorimetric and molecular analyses.
For each taxonomic hypothesis, we calculated the niche overlap between groups using Schoener's D overlap [61], whose values range from 0 (no overlap) to 1 (full overlap). Then, we used the similarity test to evaluate whether niches are more or less similar to each other than predicted by chance. The observed niche overlap was compared with the overlap measured between the niche of one group and the niche obtained by randomly sampling occurrence points in a background area of the other group. This randomization was repeated 100 times. To test whether our results are robust to different selections of the background, we defined three background areas using a 5, 10, and 15 km buffer zone around the occurrences. The analyses were conducted in R (R Core Team, 2019) using the "ecospat" package [62].

Molecular Analysis
The nuclear marker ITS1-5,8s-ITS2 (861 bp) did not provide any informative site and was therefore excluded from further inquiry (alignment is provided in Supplementary Materials). Concerning cpDNA data, the markers trnH-psbA and trnQ-rps16 show the highest number of informative sites, followed by rps15-ycf1 and psbM-trnD (Table 5). Con-versely, the marker trnF-trnL shows only one informative site. The concatenated matrix is 3,403 bp long and the number of informative sites is 34 (1%). Three haplotype groups are evident. The first group includes the two populations of S. insularis from south-eastern Sardinia (Buggerru and San Benedetto), which share the same haplotype. The second group includes S. chamaecyparissus only, while the third group includes all the remaining accessions, subdivided in four haplotypes. In particular, S. corsica from Mont Pigno belongs to a distinct haplotype, as well as two out of three individuals of S. insularis from Monte Corrasi and one individual of S. insularis from Monte Spada (Figure 2).

Cypsela Morpho-Colorimetric Analysis
The LDA selected 14 variables (Table S2) for their discriminant power. The EFDs and the colorimetric parameters were not considered useful by the algorithm and were discarded. In Table 6, the confusion matrix of the current taxonomic hypothesis is reported. Overall, the percentage of correct classification is 77.9%. An additional LDA was carried out considering all the populations as groups (the confusion matrix is reported in Table 7). In this case, the percentage of correct classification is obviously lower (51.9%). The three populations of S. insularis from central-eastern Sardinia show values of correct classification ranging between 53% and 66%. They tend to confuse each other, but they are scarcely confused with all the other populations. Similarly, all the other populations tend to confuse each other, but are scarcely confused with the populations of S. insularis from central-eastern Sardinia. This slight distinction allows the detection of two groups of affinity: the first one is composed of the populations of S. insularis from central-eastern Sardinia (Laconi, Monte Spada, and Monte Corrasi), and the second one composed of all the remaining populations of S. insularis and S. corsica. The dispersion of the standardized residuals tested by Levene's test, and the normal probability plot (P-P) comparing expected and observed cumulative probabilities are reported in Figure S2.  Table 7. Cypsela morpho-colorimetric analysis of all the studied Santolina populations. Confusion matrix of the LDA analysis applied comparing each population of S. corsica and S. insularis.

Morphometric Analysis
Two slightly separated groups can be observed in the PCoA (Figure 3): the first one is composed of S. corsica from Mont Pigno (type locality) and S. insularis from Buggerru, while the second one consists of all the remaining populations. In Table 8, the result of the Random Forest analysis testing the current taxonomic hypothesis is reported. Santolina corsica shows the lowest mean value of correct classification (46%), and it is largely misclassified as S. insularis. No individual samples of S. insularis or S. corsica are misclassified as S. chamaecyparissus.  Random forest was also used to test alternative taxonomic hypotheses that can be deduced by the information derived from PCoA, from cypsela morpho-colorimetric analysis, and from molecular analysis. According to PCoA, we split populations from Corsica and Sardinia into two groups (Table 9). Santolina chamaecyparissus is misclassified in 10% of cases with "Group 2". However, no individual of other groups is misclassified with S. chamaecyparissus. The other two groups show higher values of correct classification (82% and 97%), when compared to the groups tested in the current taxonomic hypothesis (Table 8). Furthermore, the two groups of affinity highlighted by the cypsela morpho-colorimetric analysis show similar correct mean classification rates (Table 10): 83% for "Group A" (populations of S. insularis from central-eastern Sardinia) and 94% for "Group B" (all the remaining populations of S. insularis and S. corsica). Santolina chamaecyparissus is misclassified in 26% of cases with populations from central-eastern Sardinia albeit, also in this case, no individual of other groups is misclassified with it. Finally, the results of Random Forest analysis applied to the three haplotype groups detected by molecular analysis are reported in Table 11. The group composed of the populations of S. corsica and S. insularis from central-eastern Sardinia shows high mean correct classification rates (99%), while the group composed of the two populations from south-eastern Sardinia show considerably lower values (57%). Santolina chamaecyparissus is classified correctly in 78% of cases on average, and no individual of other groups is misclassified with it. In Table 12, the mean (± standard deviation) values of quantitative morphological characters for each Santolina population are reported. All populations differ significantly from each other by 2-14 characters. The minimum number of characters is found by comparing Monte Corrasi with Laconi and Monte Spada populations. The maximum number of characters is found by comparing S. chamaecyparissus with Mont Pigno (Table 12  and Table S3).

Niche Analysis
According to the current taxonomic hypothesis, the niche overlap between the two putative species ranges from 0.26 to 0.36, and the similarity test shows that the climatic conditions of S. corsica are more similar to those of S. insularis in the three background areas. According to the alternative grouping derived from molecular results, the niche overlap ranges from 0.03 to 0.05, and the similarity test was not significant. Finally, according to the alternative grouping derived from the cypsela morpho-colorimetric analysis, the niche overlap ranges from 0.09 to 0.25, and the similarity test indicated that the climatic conditions of the group composed of S. insularis of south-western Sardinia plus S. corsica are significantly more similar than vice versa (Table 13). Table 13. Results of niche overlap and niche similarity tests for the current taxonomic hypothesis and two alternative groupings in Santolina. In the alternative grouping derived from cypsela morphocolorimetric results, "Group A" is composed of the populations of S. insularis from central-eastern Sardinia, while "Group B" is composed of S. corsica and the populations from south-western Sardinia. In the alternative grouping derived from molecular results, Group x is composed of populations from south-western Sardinia, while Group y is composed of the remaining populations. Backgrounds are defined by applying 5, 10, and 15 km buffer zones around the occurrence points. Significant results are indicated by 'less' for significant divergence or 'more' for significant similarity between tests. "ns" (not significant) when p > 0.05.

Discussion
The current circumscription of S. corsica and S. insularis is not supported on morphological grounds. This result is in accordance with Angiolini and Bacchetta [63], who stated that it is impossible to distinguish the two taxa only by morphological features. Conversely, our results are in contrast with Arrigoni and collaborators [30] and Arrigoni [22], who consider the distinction between the two putative species as sufficiently clear. According to their qualitative morphological observations, the main differences would lie in the colour of the flowers, the colour of the anthers, the peduncle under the floral head (more widened in S. insularis), and the length of the segments of the sterile stem leaves. In the field, we detected no difference in the colour of the flowers and anthers, which are as yellow as the majority of the other Santolina species [18,22,23]. Moreover, the width of the floral head peduncle is an extremely variable character (also within the same individual), so that it was preliminarily discarded from the list of morphological variables studied. As regards the length of the leaf segments, according to Arrigoni and collaborators [28] and Arrigoni [48], they should be 1-3 mm long in S. corsica, and 2-5 mm long in S. insularis. We did not detect significant differences for this character among populations (with the exception of the comparison Buggerru vs. Mont Pigno), and all the populations show the same range of variation (fsl_seg_length and ssl_seg_length in Table 12).
According to molecular analysis, all the studied populations share identical sequences of the ITS region, suggesting close phylogenetic relationships. Despite this, based on plastid markers, three haplotype groups can be detected: (1) S. chamaecyparissus, (2) the populations from Corsica and central-eastern Sardinia, and (3) the two populations from southern-western Sardinia (Figure 2). These latter populations are separated from others by Campidano graben, the greatest plain valley in Sardinia [63]. This plain was formed during the middle Pliocene and was submerged several times by marine transgression events [64,65] causing isolation. In addition, according to niche similarity tests, it seems that the Sardinian south-western populations occur in different climatic conditions with respect to all other populations from Corsica and Sardinia. The different climatic niche could reflect a potential adaptive differentiation, that allow us to define these populations as a full Evolutionary Significant Unit (full ESU). A full ESU, as defined by Guia and Saitoh [66] based on a concept previously theorized by Ryder [67], is a population showing both genetic and adaptive variation. In addition, an ESU represents a unit with a strong evolutionary potential, relevant for conservation purposes [68]. However, from a morphological point of view, the two populations from southern-western Sardinia are far from similar ( Figure 3). The population from Buggerru is morphologically much more similar to the population from Mont Pigno (Corsica, type locality of S. corsica), whereas that from San Benedetto (type locality of S. insularis) approaches all the remaining populations.
The cypsela morpho-colorimetric analysis points to a different grouping as well. Indeed, the three populations of S. insularis from central-eastern Sardinia (Laconi, Monte Spada, and Monte Corrasi) show a certain affinity, and are scarcely misclassified with the remaining populations. However, according to the results of the LDA, these three populations are only slightly differentiated from the remaining ones, and the values obtained by the LDA were more related to the cypsela morphology than colour (Table S2). The overall morphological homogeneity is in accordance with Briquet [69], who stated that the cypsela morphology is very similar in the whole genus Santolina.
Finally, climatic niches are overlapping following both the current taxonomic hypothesis and the grouping suggested by the cypsela morpho-colorimetric analysis.
Each different method used in our integrated taxonomic approach tends to group the studied populations in different ways. In any case, the current taxonomic circumscription of S. corsica and S. insularis is no longer supported. According to Giacò and collaborators [24], despite the different ploidy level, these two species show a very similar karyotype structure. Based on all this evidence, we think that the best option is to consider the tetraploid and hexaploid populations of Santolina from Corsica and Sardinia as a single polymorphic species, showing two cytotypes. Due to nomenclatural priority, the name that has to be applied is Santolina corsica Jord. & Fourr., so that the name Santolina insularis (Gennari ex Fiori) Arrigoni becomes a heterotypic synonym.
In literature, other cases of species endemic to Corsica and Sardinia showing multiple cytotypes are reported. For instance, Thymus herba-barona Loisel. (Lamiaceae) is a polymorphic species endemic to Corsica, Sardinia, and Mallorca, showing three different ploidy levels. A population genetic study [14] highlighted that, despite the high morphological and genetic variability among populations, there is no phylogeographic discontinuity among islands, so that the cytogenetic differences are not sufficient to support the distinction of three different taxa, as previously proposed. Another similar case concerns Borago pygmaea (DC.) Chater & Greuter (Boraginaceae), a species endemic to Corsica, Sardinia, and Capraia (a small island between Corsica and continental Italy), which shows both tetraploid and hexaploid ploidy levels [13]. Finally, another case of a single species showing two cytotypes can be found within the S. chamaecyparissus complex: S. villosa Mill. is a species endemic to continental Spain, which includes both tetraploid and hexaploid populations [24].
Our molecular and morphometric results also confirm that S. chamaecyparissus may be a species arisen from a cross between a tetraploid and a hexaploid Santolina, similar or identical to at least one of those growing in Sardinia and Corsica, as previously hypothesized by Marchi and D'Amato [26] and Giacò and collaborators [24]. In particular, morphometric analysis suggests that S. chamaecyparissus is very similar to the populations from central-eastern Sardinia (Figure 3). Despite this, when considered as a distinct a priori group in Random Forest tests, no individual of other populations is misclassified as S. chamaecyparissus. Accordingly, on taxonomic grounds, S. chamaecyparissus can be considered as a distinct species, thanks to its karyological and morphological peculiarities. For instance, it shows a lower number of leaf segments (see fsl_n_seg and ssl_n_seg in Table 12 and Table  S3), and the tubular portion of the flowers is longer (flower_length in Table 12 and Table  S3), if compared with all the other studied populations. If S. chamaecyparissus ever existed as a wild species in the past, we may assume that it possibly occurred in Sardinia, where both putative tetraploid and hexaploid wild progenitors can be found. Alternatively, S. chamaecyparissus may also be a species of full anthropogenic origin, possibly also involving other Santolina species as progenitors. In this context, we highlight that this plant was already widely known at the time of Pliny the Elder (AD 23-79; Naturalis Historia XXIV: "Chamaecyparissos herba ex vino pota contra venena serpentium omnium scorpionumque pollet" [the plant "Chamaecyparissos", drunk with wine, is effective against the venoms of every snake and scorpion]). However, more studies are needed to detect the origin of this pentaploid species, as well as that of the Sardinian and Corsican tetraploid and hexaploid cytotypes in a complex mostly composed of diploid species [24]. To address these questions, an integrative taxonomic and systematic study extended to the whole S. chamaecyparissus complex is ongoing.

Conclusions
This work contributed to a better understanding of plant diversity in a complex evolutionary context such as the insular system in the Mediterranean region, which hosts a high number of endemic species. Our integrated taxonomic approach highlighted the impossibility of consistently separating S. insularis from S. corsica, which have to be considered as just two cytotypes of the same species, according to our data. Indeed, the populations of Santolina from Corsica and Sardinia are polymorphic, but the morphological differences are not related neither with the ploidy level nor with cpDNA variation, albeit the identical ITS region sequences suggest a strong relatedness of these populations. Even without considering ploidy levels as relevant, cpDNA variation alone is not sufficient to justify a different taxonomic treatment for the populations from southern Sardinia. However, thanks to their genetic distance, coupled with different climatic niches, we propose treating these populations as an Evolutionary Significant Unit (ESU). Finally, our results highlighted a strong affinity of the pentaploid cultivated S. chamaecyparissus with the populations from Corsica and Sardinia. The taxonomic consequences of our study are the following: Santolina corsica Jord. and Fourr., Icon. Fl. Eur. 2: 9. 1869 ≡ Santolina chamaecyparissus var. incana subvar. corsica (Jord. and Fourr.) Rouy, Fl.  Table S1: GenBank accession numbers, Table S2: variables used by the LDA in the cypsela morpho-colorimetric analysis, Table S3: morphological characters that are significantly different among Santolina populations, Figure S1: graphic explanatory representation of some morphometric variables used in the cypsela morpho-colorimetric approach, Figure S2: residuals plot (a) and normal probability plot (b), "alignments.zip": alignments (in FASTA format).