Spatial Phylogenetics, Biogeographical Patterns and Conservation Implications of the Endemic Flora of Crete (Aegean, Greece) under Climate Change Scenarios

Human-induced biodiversity loss has been accelerating since the industrial revolution. The climate change impacts will severely alter the biodiversity and biogeographical patterns at all scales, leading to biotic homogenization. Due to underfunding, a climate smart, conservation-prioritization scheme is needed to optimize species protection. Spatial phylogenetics enable the identification of endemism centers and provide valuable insights regarding the eco-evolutionary and conservation value, as well as the biogeographical origin of a given area. Many studies exist regarding the conservation prioritization of mainland areas, yet none has assessed how climate change might alter the biodiversity and biogeographical patterns of an island biodiversity hotspot. Thus, we conducted a phylogenetically informed, conservation prioritization study dealing with the effects of climate change on Crete’s plant diversity and biogeographical patterns. Using several macroecological analyses, we identified the current and future endemism centers and assessed the impact of climate change on the biogeographical patterns in Crete. The highlands of Cretan mountains have served as both diversity cradles and museums, due to their stable climate and high topographical heterogeneity, providing important ecosystem services. Historical processes seem to have driven diversification and endemic species distribution in Crete. Due to the changing climate and the subsequent biotic homogenization, Crete’s unique bioregionalization, which strongly reminiscent the spatial configuration of the Pliocene/Pleistocene Cretan paleo-islands, will drastically change. The emergence of the ‘Anthropocene’ era calls for the prioritization of biodiversity-rich areas, serving as mixed-endemism centers, with high overlaps among protected areas and climatic refugia.


Phylogenetic Diversity
We estimated for each grid cell the phylogenetic alpha diversity (PD [1]) of the species inhabiting each of the grid cells with the 'picante' 1.6-2 R package [2] and the standardized effect size scores with 'PhyloMeasures' 2.1 R package [3]. We tested for non-random patterns in PD by estimating their standardized effect size (SES) scores as: where Xobs is the observed score within each grid cell and mean (XNULL) and standard deviation (XNULL) are the mean and standard deviation of a null distribution of scores generated by shuffling taxa labels of the grid cell-by-species matrix 999 times. We assessed the statistical significance of the SES scores by calculating two-tailed p-values (quantiles) as: where rankobs is the rank of the observed scores compared with those of their null distributions, and runs is the number of randomizations [2,3]. SES scores with p < 0.05 and p > 0.95 were considered as significantly lower and higher than expected for a given PD value, respectively. Positive SES values indicate phylogenetic overdispersion, whereas negative SES values indicate phylogenetic clustering. The greater sensitivity of SES to more terminal structure makes it better suited to explore assembly processes working at finer temporal and spatial scales [4].

Biodiversity Analyses
We followed the CANAPE protocol for spatial phylogenetics analyses as set out in [5,6]. We carried out all the relevant analyses in Biodiverse version 3.0 [5]. We first calculated phylogenetic endemism (PE [7]) and relative phylogenetic endemism (RPE [6]). PE is the total branch length from the dated phylogenetic tree of the lineages present at a grid cell divided by the range sizes of the respective lineages. RPE is the ratio between PE measured from the original phylogeny in relation to the PE estimated from a phylogeny with equally distributed branch lengths (see [6] for more details). Relative phylogenetic diversity (RPD) is also a ratio that compares the phylogenetic diversity (PD) observed on the actual tree in the numerator to that observed on a comparison tree in the denominator. To make them easily comparable between analyses, the trees in both the numerator and the denominator are scaled such that branch lengths are calculated as a fraction of the total tree length. The comparison tree retains the actual tree topology but makes all branches of equal length. Thus, RPD is PD measured on the actual tree divided by PD measured on the comparison tree, while RPE is PE measured on the actual tree divided by PE measured on the comparison tree [6]. RPE is the basis for the categorical analyses of neo-and paleo-endemism (CANAPE).

Randomization Tests
We assessed the statistical significance of PD, PE, RPD and RPE by the following [6] approach. We compared the actual PE and RPE values of each grid cells to the 999 values of a null distribution, using the 'rand_structured' option in Biodiverse. In this model, species occurrences in grid cells are randomly reassigned to grid cells without replacement, thus keeping constant both the total number of grid cells for each species and the species richness of each grid cell. We ran 999 randomizations, calculating PD, PE, RPD and RPE for each run. These values formed a null distribution for each grid cell for use in non-parametric tests of the significance of observed values. We estimated p-values from a two-tailed distribution of values to identify areas with higher (>0.975) or lower (<0.025) PE or RPE than the null distribution [6].

CANAPE
CANAPE is a two-step procedure discriminating grid cells with significantly high PE in neo-or paleo-endemism based on species occurrences and the dated phylogenetic tree [6].
First, to determine whether a site is a center of significantly high endemism, a grid cell needs to be significantly high (one-tailed test, α = 0.05) in the numerator of RPE, the denominator or both.
If (and only if) grid cells pass one of those tests, then they are divided into four meaningful, non-overlapping categories of centers of endemism [6]. If a point is significantly high in the RPE ratio (two-tailed test, α = 0.05), then it is a center of paleo-endemism (contains significantly more endemic species on long branches). If a point is significantly low in the RPE ratio (twotailed test, α = 0.05), then it is a center of neo-endemism (contains significantly more endemic species on short branches). If it is significantly high in both the numerator and the denominator (taken alone), but not significant for RPE, then it is a center of mixed endemism. Mixed endemism can be interpreted as a center of endemism having a mix of rare long and rare short branches, so not significantly dominated by either paleo-endemism or neo-endemism. The mixed endemism areas are further subdivided: those grid cells that are significantly high in both the numerator and the denominator at the α = 0.01 level are termed super-endemic sites (i.e., highly significant concentration of endemic long and short branches [6]).
All analyses were performed using Perl wrapper functions to run Biodiverse in R modified from https://github.com/NunzioKnerr/biodiverse_pipeline.

Future Diversity Patterns [Changes in Phylogenetic Beta Diversity (ΔBD)]
From the initial set of 44 predictors, only four were not highly correlated (Spearman rank correlation < 0.7 and VIF < 2 [8]). Multicollinearity assessment was performed with the 'usdm' 1.1.18 package [9]. All variables were standardized [i.e., (value-mean/standard deviation)] to help ensure model convergence and enhance comparability of parameter estimates. Model selection was based on Akaike's information criterion corrected for small sample sizes (AICc). We used the dredge function in the 'MuMIn' 1.15.6 package [10] to run a complete set of models with all possible combinations of the predictor variables and to identify the set of 'best models' according to the widely accepted criterion for different AICc values: ΔAICc < 2 (all models with ΔAICc < 2 are considered as equally parsimonious and as having relatively similar levels of support [11]). If more than one model had ΔAICc < 2, we calculated the relative importance of each variable as the sum of AICc weights (Akaike weights-wAICc) for the models in which the variable was included ( [11]; see also [12]). Akaike weights are directly interpreted in terms of each model's probability of being the best supported for explaining the data [11,12]. Finally, we calculated the normalized root mean square error (RMSE) for each set of the 'best' models with the 'sjstats' 0.11.2 R package [13]. Table S1. Summary statistics regarding altitude (m a.s.l.) for the different types of endemism centres as well as for the not-significant sites. NS: not-significant. SD: standard deviation.   Figure S1. Categorical Analysis of Neo-and Paleo-Endemism (CANAPE) results estimated for coarser geographical scales , based on the phylogeny generated following the framework proposed by [14,15].    Yellow areas indicate sites with high current beta-diversity that will continue to have proportionally high beta-diversity.

Type Median SD Min Max
Blue areas indicate sites that beta-diversity is predicted to increase in the future. Green areas indicate sites where beta-diversity will remain largely unchanged. We used a function generated by José Hidasi-Neto to generate the map (http://rfunctions.blogspot.ca/2015/03/bivariate-mapsbivariatemap-function.html).   Figure S13. The values of the Silhouette index for the k-means and the CLARA unsupervised clustering algorithms regarding the optimal number of biogeographical sectors (clusters) predicted to occur for the BCC 2.6 GCM/RCP combination in Crete. Figure S14. The values of the Silhouette index for the k-means and the CLARA unsupervised clustering algorithms regarding the optimal number of biogeographical sectors (clusters) predicted to occur for the BCC 8.5 GCM/RCP combination in Crete. Figure S15. The values of the Silhouette index for the k-means and the CLARA unsupervised clustering algorithms regarding the optimal number of biogeographical sectors (clusters) predicted to occur for the CCSM4 2.6 GCM/RCP combination in Crete. Figure S16. The values of the Silhouette index for the k-means and the CLARA unsupervised clustering algorithms regarding the optimal number of biogeographical sectors (clusters) predicted to occur for the CCSM4 8.5 GCM/RCP combination in Crete. Figure S17. The values of the Silhouette index for the k-means and the CLARA unsupervised clustering algorithms regarding the optimal number of biogeographical sectors (clusters) predicted to occur for the HadGEM2 2.6 GCM/RCP combination in Crete. Figure S18. The values of the Silhouette index for the k-means and the CLARA unsupervised clustering algorithms regarding the optimal number of biogeographical sectors (clusters) predicted to occur for the HadGEM2 8.5 GCM/RCP combination in Crete.