Combining Satellite Remote Sensing and Climate Data in Species Distribution Models to Improve the Conservation of Iberian White Oaks ( Quercus L.)

: The Iberian Peninsula hosts a high diversity of oak species, being a hot-spot for the conservation of European White Oaks ( Quercus ) due to their environmental heterogeneity and its critical role as a phylogeographic refugium. Identifying and ranking the drivers that shape the distribution of White Oaks in Iberia requires that environmental variables operating at distinct scales are considered. These include climate, but also ecosystem functioning attributes (EFAs) related to energy–matter exchanges that characterize land cover types under various environmental settings, at finer scales. Here, we used satellite-based EFAs and climate variables in species distribution models (SDMs) to assess how variables related to ecosystem functioning improve our understanding of current distributions and the identification of suitable areas for White Oak species in Iberia. We developed consensus ensemble SDMs targeting a set of thirteen oaks, including both narrow endemic and widespread taxa. Models combining EFAs and climate variables obtained a higher performance and predictive ability (true-skill statistic (TSS): 0.88, sensitivity: 99.6, specificity: 96.3), in comparison to the climate-only models (TSS: 0.86, sens.: 96.1, spec.: 90.3) and EFA-only models (TSS: 0.73, sens.: 91.2, spec.: 82.1). Overall, narrow endemic species obtained higher predictive performance using combined models (TSS: 0.96, sens.: 99.6, spec.: 96.3) in comparison to widespread oaks (TSS: 0.80, sens.: 92.6, spec.: 87.7). The Iberian White Oaks show a high dependence on precipitation and the inter-quartile range of Normalized White Oaks and provide spatially explicit geospatial information about each oak species (or set of species) relevant for developing biogeographic conservation frameworks. on performance scores (p<0.001). Estimated coefficients show that (despite variability) the “climate-only” and the “EFA-only” sets both have on average less performant models in relation to the combined climate–EFA set, since both coefficients are negative and, respectively, equal to: -0.018 ± 0.015 and -0.124 ± 0.014 (estimate ± std. error). These estimates also show that performance gains are higher in relation to the “EFA-only” set but less in relation to the “climate-only” (see Supplementary Information S5 for more details). The AICc-based model ranking (which measures model fitting performance while accounting for the number of variables in the models; Supplementary Information S 6 ) also showed that for nine (out of thirteen) species, including Q. broteroi , Q. canariensis , Q. ×cerrioides, Q. lusitanica, Q. marianica, Q. orocantabrica, Q. pubescens, Q. pyrenaica and Q. subpyrenaica , better model performances are attained for the climate–EFA combined models. For Q. faginea , Q. petraea and Q. robur the EFA-only models obtained better performances according to the AICc-based ranking, while for Q. estremadurensis the climate-only model was the best one. This evidences that the higher number of variables in combined models (i.e., higher complexity) does not affect the overall ranking.


Abstract:
The Iberian Peninsula hosts a high diversity of oak species, being a hot-spot for the conservation of European White Oaks (Quercus) due to their environmental heterogeneity and its critical role as a phylogeographic refugium. Identifying and ranking the drivers that shape the distribution of White Oaks in Iberia requires that environmental variables operating at distinct scales are considered. These include climate, but also ecosystem functioning attributes (EFAs) related to energy-matter exchanges that characterize land cover types under various environmental settings, at finer scales. Here, we used satellite-based EFAs and climate variables in species distribution models (SDMs) to assess how variables related to ecosystem functioning improve our understanding of current distributions and the identification of suitable areas for White Oak species in Iberia. We developed consensus ensemble SDMs targeting a set of thirteen oaks, including both narrow endemic and widespread taxa. Models combining EFAs and climate variables obtained a higher performance and predictive ability (true-skill statistic (TSS): 0.88, sensitivity: 99.6, specificity: 96.3), in comparison to the climate-only models (TSS: 0.86, sens.: 96.1, spec.: 90.3) and EFA-only models (TSS: 0.73, sens.: 91.2, spec.: 82.1). Overall, narrow endemic species obtained higher predictive performance using combined models (TSS: 0.96, sens.: 99.6, spec.: 96.3) in comparison to widespread oaks (TSS: 0.80, sens.: 92.6, spec.: 87.7). The Iberian White Oaks show a high dependence on precipitation and the inter-quartile range of Normalized Difference Water Index (NDWI) (i.e., seasonal water availability) which appears to be the most important EFA variable. Spatial projections of climate-EFA combined models contribute to identify the major diversity hotspots for White Oaks in Iberia, holding higher values of cumulative habitat suitability and species richness. We discuss the implications of these findings for guiding the long-term conservation of Iberian water) are providing new opportunities to push the boundaries of SDMs. EFAs inform on several relevant features of ecosystem functioning such as primary productivity, evapotranspiration, land surface temperature, and albedo, thereby encompassing several dimensions including carbon gains, water and energy balance [36][37][38]. EFAs are considered essential biodiversity variables (EBVs) describing the 'Ecosystem Function' class [39]. Therefore, the EFA framework provides key variables to model the ecosystem processes that maintain suitable environmental and habitat conditions for keystone species [40,41]. This will allow us to anticipate species' range shifts under global change [30]. Overall, EFAs offer an alternative and cost-effective approach to improve predictions on the distribution of species and support biodiversity monitoring once these are capable of capturing the intra-annual dynamics of ecosystem functioning.
Although the use of SDMs has played a major role in oak forest conservation [42,43], they have been mainly applied to iconic oak species, like cork oak (Q. suber L.) [44,45], or to broadly distributed species (e.g., Q. robur) or species groups [46,47], sometimes at smaller and regional scales [48,49]. We lack to date studies encompassing narrow distributed and endemic (NDE), relict species, especially in a phylogeographic hotspot such as the IP. This knowledge gap mainly derives from issues related to the taxonomic uncertainty and delimitation of species, which in turn hampers data quality and quantity. These data are typically based on inflated nomenclatural inaccuracy and outdated, due to misleading identification that results from ancient herbaria reviews. Still, it is the biggest bulk of information available in worldwide databases (e.g., https://www.gbif.org/), but its limitations hinder the process of accurately predicting species distributions across geographic space, hampering conservation efforts [50][51][52]. Furthermore, previous modelling techniques mostly rely on climatic envelopes that perform better at broad global scales, and they tend to overlook ecosystem functioning features at a fine spatial scale. To overcome this situation, SRS-based ecosystem functioning attributes integrate the effects of vegetation structure, edaphic conditions and climate at the fine local/meso scale. This makes these variables suitable to improve predictions regarding species optimal environmental conditions, to forecast the effects of environmental change, to monitor biodiversity hotspots, or to model habitat quality from local/regional to global scales [37,40,41].
This study aimed to assess whether the combination of EFA and climate variables improves the performance of SDMs using seven narrow distributed or endemic Iberian White Oaks: Quercus canariensis, Q. ×cerrioides, Q. estremadurensis, Q. lusitanica, Q. marianica, Q. orocantabrica and Q. subpyrenaica along with six widespread species Q. broteroi, Q. faginea, Q. pyrenaica, Q. petraea, Q. pubescens and Q. robur. For this aim, we developed a set of combined models (integrating EFA and climate variables) and partial models (either EFA or climate variables). We compared their performance through several metrics and identified the most suitable areas to preserve threatened species. With this goal in mind, we specifically: i) Compared the performance of SRS-EFAs as predictors in SDMs, separately and combined with climate data, to determine the current potential distribution of the thirteen target species; ii) Identified the most suitable potential areas for the conservation of vulnerable NDE oak species by comparing the spatial projections of habitat suitability derived from SDMs based on different sets of predictors; iii) Depict the fittest biogeographic areas for oak conservation based on a species-richness map obtained through stacking individual SDM's .

Study Area
The Iberian Peninsula (IP) is located in southwestern Europe (bounded by longitudes -9.55° W to 3.35̊ E and latitudes 35.87° N to 43.80̊ N) and has a total area of 583 832 km 2 ( Figure 1). The region holds a highly diverse geological history [53] and a climate that ranges from a dry Mediterranean in the southeast to a wet Temperate Atlantic in the north [13]. As result of its biogeographic idiosyncrasies and diverse edaphoclimatic conditions, the IP is a major hotspot of European plant diversity, hosting 54% of all European species (22.7% of endemicity), including 21 genera [54] and 13 recognized phylogeographic refugia [55].

Occurrence Data and Focal Taxa
Oak species geographic information was obtained through extensive fieldwork in western IP (since 2005), classic literature and herbaria review. To tackle taxonomic geographic uncertainty, we gathered the information through the thorough examination of 17 reference herbaria, in more than 6000 vouchers. This info was scrutinized and homogenized with online collections and digital databases (Supplementary Information S1). Geographical records for the target taxa were subsumed and aggregated to a 1 x 1 km grid (Datum WGS1984/UTM30N), using both accurately georeferenced points and UTM grid centroids (Supplementary Information S2). We used ESRI ArcGIS 10.3 for spatial data processing and R version 3.3.1 (R Core Team, 2019) for data management and analysis.

Satellite-Based Ecosystem Functioning Attributes (EFAs)
Satellite remote sensing data from the Moderate Resolution Imaging Spectrometer (MODIS) onboard the Terra satellite platform were used to extract EFA variables. The MOD09A1 product, delivering global surface reflectance imagery with 8 days of temporal resolution, 500 m of spatial resolution and with an archive spanning from the year 2000 to the present day, was employed to calculate two spectral indices: (i) the enhanced vegetation index (EVI) [59] as a proxy of vegetation greenness, biomass and leaf area index-with values ranging from -1 to 1, with healthy vegetation generally holding values between 0.20 and 0.80; and (ii) the Normalized Difference Water Index (NDWI) [60] as a proxy for the amount of vegetation and soil water content-with values varying between -1 and +1, with higher values corresponding to high vegetation/soil water content.
To describe the intra-annual properties of each spectral index and aiming to obtain EFAs linked to carbon and water cycles [28,29], we calculated the median (as a descriptor of central tendency and quantity), the inter-quartile range (IQR; as a measure of intra-annual seasonal variation or seasonality) and the sine and cosine transformations of the day of maximum annual value, respectively, related to spring and winter peaks (DMaxSpring or DMaxWinter as a descriptor of phenology). These statistical measures were calculated for each complete year. To capture the multiyear normal conditions of each EFA variable, thus reducing the effect of stochastic annual climatic fluctuations, we computed the overall median. Calculations were performed in Google's Earth Engine (GEE) cloud-based platform [61]. All 8-day image composites available in the GEE archive spanning from 2001 to 2019 (i.e., all complete years) were used, totalling 873 available images/dates (check Supplementary Information S4). Data were exported from GEE at 1 km of final spatial resolution, totalling eight variables: median, IQR, sine and cosine transformations, and DMax for both EVI and NDWI.

Climatic Data
Bioclimatic variables (derived from monthly temperature and rainfall data to generate more ecologically meaningful variables) were used to portray the temperature and precipitation regimes of the study area. These climatic data were obtained for historical conditions  from the WorldClim dataset version-2.1 (released in January 2020; https://www.worldclim.org/data/worldclim21.html) at a spatial resolution of 30 arc-sec (~1 km). Based on a preliminary assessment (supported by the correlation and variable importance assessment from preliminary models; not shown) [62,63], three temperature-related and two precipitation variables were used, namely: BIO01-Annual Mean Temperature, BIO03-Isothermality, BIO06-Minimum Temperature of Coldest Month, BIO12-Annual Precipitation and, BIO18-Precipitation of Warmest Quarter.

Model Development, Evaluation and Multi-Algorithm Ensembling
Species distribution models (SDMs) were developed in R statistical software [64] using the biomod2 package [65,66]. This package implements a multi-model ensemble forecasting approach which combines several existing statistical and machine-learning-based algorithms thus enabling to assess and prevent a range of methodological uncertainties in each individual modelling algorithm, as well as the examination of species-environment relationships. Models were fitted using nine modelling techniques: generalized linear model (GLM); generalized boosted model (GBMs); generalized additive model (GAM); classification tree analysis (CTA); artificial neural networks (ANN); flexible discriminant analysis (FDA); multivariate adaptive regression splines (MARS); random forests (RF); and MAXENT.Phillips2 (Maximum Entropy Model), currently available in biomod2. Given the subpar performance of the surface range envelop method (also called BIOCLIM; see e.g., [67]) and the similarity of MAXENT.Phillips (also a maximum entropy-based model available in the biomod2 package) these two algorithms were not considered. Default parameters were employed (with the exception of the smoothing degree term in GAM which was set to k = 4 to prevent over-fitting issues [68] and the number of boosting trees in GBM, n.trees = 2000). Previously to modelling, all candidate variables were inspected for multicollinearity issues using Spearman's nonparametric pair-wise correlation. By setting a threshold of |ρ| < 0.8, we retained a total of ten variables (out of 13) including: all bioclimatic variables (BIO 01, 03, 06, 12 and 18) and EVI annual median, EVI IQR, EVI DMax-Spring, NDWI median and NDWI IQR. Given that only presence data were available for the selected species, we established a total of five sets of randomly generated pseudo-absences, each set with ten times the number of presence records allowing to diversify environmental conditions in the data used for model training. Since no previous information was available about the species prevalence (p) nor a robust way to accurately estimate it and after checking the comparative results by [69], model weights were adjusted to set p=0.5 (biomod2 default) thus giving a similar weight to presences and generated pseudo-absences [69]. A spatial thinning procedure implemented in the spThin R package [70] was applied prior to modelling which imposes a minimum spatial separation between input records (through subsampling) with a distance equal to 3 km, thus decreasing autocorrelation and sampling bias effects.
Holdout cross-validation was employed to evaluate the models, with 80% of the input records used for model fitting (train data set) and 20% for model evaluation at each round (test data set). A total of ten rounds were performed for the model evaluation. For assessing the model performance, the area under the receiver-operating curve (ROC), the true-skill statistic (TSS), Cohen's Kappa (KAPPA) and the sensitivity and the specificity values were calculated [65]. Given that 450 models were generated per species, the less performant models were filtered out before the final ensemble forecasting. Hence, we selected the top 10% percentile best models for the six best performing techniques, considering the TSS rank (n=30). Based on these top performing models, an ensemble using the average was implemented, thus reducing inter-model uncertainty. To 'binarize' projections (to dichotomous suitable/unsuitable habitat) the threshold value maximizing the TSS statistic was used. A linear mixed-effects model was devised through the R package lme4 [71] to check if there was an effect of the set of predictive variables (either the combined set including the climate and EFA variables, climate-only or EFA-only; i.e., fixed factor) on the predictive performance scores (i.e., response variable) while controlling for the species and evaluation metric (i.e., random factors). An ANOVA analysis was then performed to assess if the effect of the set of predictive variables was statistically significant. Finally, given the different number of variables between models setups (partial vs. combined), we also calculated the Akaike information criterion (with finite sample size correction; AICc) for GLM's allowing to rank models [72,73] while accounting for a different number of variables in each set (i.e., partial climate or satellite EFA's models vs. combined ones).

Model Performance and Added Value of EFAs as Predictors
Overall, the models combining EFAs and climate variables obtained a higher performance and . These results show that the modelling performance for the different sets of variables and taxonomic groups increase in performance when combining Climate and EFA variables for all metrics but especially for Kappa and TSS, whereas ROC is less sensitive to improvements, followed by climate-only and a consistent lower performance for the EFA-only set ( Figure 2). This is particularly true for the NDE species-group, with the slightest differences between climate and EFA and climateonly in TSS and ROC for the remaining White Oaks. When comparing performance metrics, it is also important to consider their scale of variation; while Cohen's Kappa and TSS vary between -1 and 1 (with better models nearing one), ROC varies between 0 and 1 (also with better models nearing one) and hence the later lower ability to discriminate across good performing models holding a lower variance and saturating at the upper-end part of the spectrum. The increase in the performance by the combined set of climate and EFA is also reflected in terms of sensitivity (true positive rate), but especially in terms of specificity (true negative rate) ( Figure 3). As before, these gains are reflected especially in the NDE group, and particularly for species such as Q. ×cerrioides, Q. canariensis, Q. orocantabrica and Q. estremadurensis (Table 1 and Appendix A - Figure A1).  and specificity (or true negative rate) for all species between groups of variables. Boxes display (from bottom to top) the 25%, 50% (median) and 75% quartiles of the distribution; (b) the comparison of specificity values between different groups of species. Whiskers present the minimum and maximum values without outliers (shown as points). Sets of predictive variables (x axis) are: "CLIM_EFA" (climate and EFA combined), "CLIM_only" (climate variables only) and "EFA_only" (satellite EFA variables only).
Results from the post hoc ANOVA test based on linear mixed-effects modelling (with a species and evaluation metric as random factors) showed a significant effect of the set of predictive variables on performance scores (p<0.001). Estimated coefficients show that (despite variability) the "climateonly" and the "EFA-only" sets both have on average less performant models in relation to the combined climate-EFA set, since both coefficients are negative and, respectively, equal to: -0.018 ± 0.015 and -0.124 ± 0.014 (estimate ± std. error). These estimates also show that performance gains are higher in relation to the "EFA-only" set but less in relation to the "climate-only" (see Supplementary  Information S5 for more details). The AICc-based model ranking (which measures model fitting performance while accounting for the number of variables in the models; Supplementary Information S6) also showed that for nine (out of thirteen) species, including Q. broteroi, Q. canariensis, Q. ×cerrioides, Q. lusitanica, Q. marianica, Q. orocantabrica, Q. pubescens, Q. pyrenaica and Q. subpyrenaica, better model performances are attained for the climate-EFA combined models. For Q. faginea, Q. petraea and Q. robur the EFA-only models obtained better performances according to the AICc-based ranking, while for Q. estremadurensis the climate-only model was the best one. This evidences that the higher number of variables in combined models (i.e., higher complexity) does not affect the overall ranking.
The consistent better performance of combined climate and EFA models, despite the relatively small increment for some species (Figures. 2,3 and A1), suggests that these two types of variables, when combined, improve the predictive models. Results improved especially for NDE species which have narrower biogeographic ranges and ecological niche breadths. In contrast, models based solely on EFA variables show the lowest performances, even when compared with climate-only models. This may be explained by the coarse scale of the models developed (i.e., with low spatial resolution and a large areal extent) for which regional variations of climate patterns hold substantial more predictive power than finer/local ecosystem functioning patterns depicted by satellite EFA variables. This may be due to the particularly high affinity of most White Oak species to climate gradients, which constrain or control several important physiological aspects of these species' environmental tolerances at a regional scale [74]. The scale of the influence of variables capturing primary to secondary (or regional to local) patterns structuring species habitat given each one's spatial autocorrelation patterns, has been previously identified for SDMs [75]. This may also mean that when used at a narrower extent and higher spatial resolution (i.e., finer scale), EFA variables may prove more relevant by their ability to describe small scale variation, heterogeneity and environmental gradients in species habitat patterns, and thus present distinct predictive abilities given species attributes and modelling scale [37]. At a regional/coarser scale, EFAs are mainly driven by climate patterns while in contrast these are more linked to land-cover and land-use patterns at a finer/local scale and thus capturing human influence on ecosystems [36][37][38]. As such, EFAs offer an integrative, quicker and more up-to-date view of ecosystem responses to environmental factors and changes, in comparison to attributes of structure (e.g., patch size, fragmentation) or composition (e.g., percent cover of certain habitats) [36][37][38]76] and thus linking species responses to pressures on ecosystem functioning and its state [38]. This holds advantages for implementing species-monitoring programs [41,77,78] and at the same time, satellite-based EFAs also have the virtue of inserting more 'realism' into the spatial predictions of models. For example, implementing EFA variables in models may aid by removing unsuitable areas occupied by artificial or shrubby vegetation (both with a lower mean and seasonal variation in annual EVI or productivity) that may appear as suitable in climate-only models (which solely portray the potential distribution) while in combined models they likely do not. By including integrative, finer-scale attributes of ecosystem functioning linked to each species habitat and suitable environmental conditions, models may better approximate the realized distribution of species which may present several advantages for conservation, management and restoration efforts. Nonetheless, it is important not to blur the distinction between modelling a species distribution or its environmental suitability from actually detecting it through satellite imagery using spectral signatures or indices [79].
A frequent limitation linked to SDMs is that species occurrences (one of SDM's key inputs) are usually available at coarse resolutions [80] while their conservation and management are often required at a much finer spatial resolution [81,82]. Moreover, many predictive variables are also not measurable or available at the required spatiotemporal resolution and consequently, surrogates and interpolated data (e.g., from weather stations) have to be used instead [83,84]. Structural predictors from thematic maps (e.g., land-cover) also bear limitations by often not representing key landscape features nor the ecosystem processes relevant for characterizing a certain species status and change. Moreover, both occurrence data and predictive variables can have inappropriate or distinct spatial, thematic and/or temporal resolutions which impacts model quality and performance [85].

Combined Models for Individual Species
Overall, combined models (i.e., with both satellite-based EFA and climate variables) showed higher performance scores for all NDE taxa than other White Oaks (TSS > 0.88; Figure 2), especially for Quercus ×cerrioides, Q. canariensis, Q. orocantabrica, and Q. estremadurensis ( Table 1). The lowest TSS values corresponded to Q. faginea and Q. pyrenaica (0.71 and 0.66, respectively). ROC values were always above 0.9, suggesting an overall excellent performance of the NDE models. Cohen's Kappa values (typically more sensitive to true negative rates) varied between 0.56 and 0.97, also showing a strong performance of combined (climate and EFA) models with variations across species. In addition, sensitivity and specificity scores were always above 80% (with a slight exception for the widespread Q. pyrenaica), confirming the good and well-balanced performance of the models (Table  1; Supplementary Information S7). Table 1. Evaluation scores per studied species for the ensemble of 30 top-models combining both climate and satellite-based EFA variables (TSS: true skill statistic; ROC: area under the receiver operating curve; KAPPA: Cohen's Kappa; sensitivity (also called true positive rate or recall); specificity (also defined as the true negative rate)). The number of species records presented are computed after removing duplicates and applying the spatial thinning algorithm for imposing a minimum separation distance of 3 km between points (spThin). * narrow distributed; ** endemic. Generally, we observe a bigger relevance of precipitation-related variables and a fluctuation of the importance of a second group of predictors, either related with temperature, or with EFA variables. When considering all species, climatic variables are generally the most important ones, especially the precipitation of the warmest quarter (BIO18), followed by annual precipitation (BIO12), annual mean temperature (BIO01) and winter cold (BIO06) ( Figures. 4 and Appendix A -Figure A2). The same pattern is noticeable for the widely distributed White Oaks, but temperature-related variables (BIO01 and BIO06) become the second and third most important variables for the NDE group, followed by BIO12, with NDWI_MED, occupying the fifth position for these group of species (Appendix A - Figure A2).

Species names TSS ROC KAPPA Sensitivity Specificity
When taking into account individual species (Figure 4), summer precipitation (BIO18) is predominantly the most important variable for most of the Iberian White Oaks, only superseded by winter cold (BIO6) for Q. lusitanica, annual mean temperature (BIO01) for Q. orocantabrica, Q. petraea and Q. pyrenaica, and annual precipitation (BIO12) for Q. robur. EFA-related variables occupy the third and fourth positions for several NDE oaks, like Q. canariensis, Q. ×cerrioides, Q. marianica, Q. estremadurensis and Q. orocantabrica, especially EVI_MED and NDWI_MED. Q. faginea seems to be the species with the most balanced equilibrium between variables (Figure 4).
Regarding EFAs, its contribution was found to be more important as predictors in models of NDE species than other White Oak species as before, regarding model performances. More specifically, the inter-quartile range of the NDWI (a descriptor of seasonality in water balance/content) was the most relevant variable across all of the studied species, followed by the median of NDWI and EVI (descriptors of water balance/content and primary productivity, respectively) ( Figure 4).
These results reflect the high dependence of precipitation for these group of oaks, especially in the dry season (BIO18) for most of the Gall Oaks, or those geographically distributed throughout the sub-Mediterranean and southern areas of the IP. The winter cold (BIO06) importance explains the distribution of Q. lusitanica in the Atlantic shore of the IP, showing this species' intolerance for winter cold, while the annual precipitation (BIO12) marks the distribution of Q. robur as it cannot withstand low precipitation levels and summer drought. The third and fourth position occupied by EFA predictors for some NDE species, represent the importance of the vegetation state/quality, for the maintenance of some of these oaks, especially for the relicts Q. canariensis and Q. estremadurensis and the endemics Q. ×cerrioides and Q. marianica. These taxa are dependent on shady and moist environments, normally in deep canyons of Atlantic areas. Herein, EVI-related EFA's were found suitable to depict primary production and water availability patterns of these areas with higher levels of humidity linked to the maintenance of these species [86,[87][88][89].
The visual inspection of suitability maps for the seven NDE taxa shows the best areas for the preservation and recovery of the selected species. Here, we can compare the combined model (with satellite EFA and climate variables), with the spatial overlap of partial models (RS and climate) (Figures 5 and 6). Overall, the contribution of EFA predictors brings higher accuracy for the spatial delimitation of these species' recovery areas, by reflecting the vegetation state through primary productivity and moisture conditions (better captured by NDWI-related EFA variables), and by adding it to the climatic suitability for each species. Variable importance (average ± standard deviation across all models) for each oak species considering the two groups of predictors, climatic (precipitation and temperature) and satellite-based EFAs. BIO_1-annual mean temperature; BIO_4-temperature seasonality; BIO_6-min temperature of coldest month; BIO_12-annual precipitation; BIO_18-precipitation of warmest quarter; EVI_IQR-seasonality (EVI intra-annual variation); EVI_MED-productivity (EVI median); EVI_DMS-phenology (EVI day of maximum-spring). Precipitation-related variables ("Prec", shown in yellow), temperature ("Temp" in green) and remotely-sensed EFA's ("RSens", in blue).
When analysing the narrowly-distributed oaks ( Figure 5), Q. canariensis evidences suitable environmental conditions in its "typical" locations, from the Iberian southwest (Monchique and Algeciras) and Catalonian sub-populations, with the maintenance of residual areas in Morena System and suitable areas and Littoral shore of Centre Portugal (around Lisbon) (Figure 1). A similar pattern is followed by Q. estremadurensis, with a wider suitable area, through central Portugal up to Montes de Toledo (Spain) and an incursion in the Portuguese Douro thermophilic areas. Q. lusitanica preserves most of its known distribution [7,9], with the overlap of partial models being considerably smaller than the combined models ( Figure 5). When looking at endemic oaks ( Figure 6), Q. marianica shows a larger suitable area with a higher ecological plasticity in comparison to its parental species Q. canariensis, extending throughout Morena mountain chain and the western Portuguese coast. By tracking the remaining parental species (Q. broteroi) it enhances the role of hybridization by adapting to new ecological conditions as a strategy for survival in growing drier habitat types and conditions, where at least one of the parentals lost fitness (Q. canariensis) [90,91]. Q. orocantabrica shows a wider area out of its currently known range, across the Cantabrian mountains and three disjoint subpopulations in Rioja, Cuenca and the Central system (Figure 1), with the EFA predictors contributing to diagnose orophile conditions in the suitable environmental space. These novel spatial inferences may in turn help managers and stakeholders to devise much-needed restauration, recovery or re-introduction efforts [92][93][94][95]. Likewise, Q. marianica, the nothotaxa Q. subpyrenaica, show a wider and more continuous distribution in comparison to the parental Q. pubescescens (see Appendix A - Figure A4), however, in reverse, a more reticulated distribution in the overlap of the partial models. This shows a higher ecological plasticity by matching the other parental species (Q. faginea) and tracking the sub-Mediterranean belt of northeast IP, while Q. pubescens presents a more continuous temperate distribution in the partial models and a reticulated distribution in the combined ones, being a more restrict taxon in terms of distribution. The endemic nothotaxa Q. ×cerrioides has a reduced suitable area in the littoral Catalonian mountains, as it results from contact with the residual Q. canariensis forests.
The remaining Iberian White Oaks, with wider distributions, frequently present smaller overlap areas between partial models than the combined ones, highlighting more detailed areas where to recover this species group when taking into account the EFA predictors (Appendix A - Figures A3  and A4). The Quercus broteroi combined model shows a continuous area in western Iberia and throughout the Morena System and Montes de Toledo, with the overlap of partial models evidencing the best areas for this species recovery. By opposite, Q. faginea combined models show a wider area in the Douro Basin and throughout the Cantabrian mountains, Iberian and Betic Systems, and eastern IP, while the overlap shows a more conservative area for its recovery (Figure 1). Likewise, Q. pyrenaica presents a similar pattern, while including the central system and northwest Iberian mountain areas, with disjoint relict subpopulations in southern Iberia. Q. petraea and Q. robur present complementary projected areas that reflect their ecological requirements, both with a larger overlap between combined models and the overlap of partial models. Q. pubescens shows similar patterns of the nothotaxa Q. subpyrenaica, with the first having a temperate distribution while the second enters into sub-Mediterranean areas [14,96]. Both species evidence notorious differences between the combined models when in comparison to the overlap of the partial ones, with larger areas in the first ones.

Spatial Projections and Patterns of Species Richness
Model projections for all Iberian White Oak species (Figure 7) reveal areas with cumulative suitability up to seven oak taxa, highlighting the Iberian southwest mountains, the Lusitanian Basin, Montes de Toledo and to sub-littoral Catalonian mountains (Figure 1) as highly relevant. These areas can be considered the largest hotspots and the most important "lost" areas for the Iberian White Oaks conservation, being suitable for the highest diversity of oak taxa. Oaks in general show strong niche conservatism [74], particularly the Subgenus Quercus [12], that comprises temperate lineages of oaks [97], which were heavily constrained in the IP by its biogeography, prevailing only in areas with similar climates. This way water availability and the Atlantic influence plays a major role, especially when in a seasonal climate like the Mediterranean one [13]. Thus, our study provides a lacking approach to assess these woody temperate plants species richness which can provide useful data for promoting their persistence [98], while unveiling the importance of multiple drivers of ecosystem functioning and their interaction with the climate in a biogeographic transitional area between Temperate and Mediterranean Europe.
Such results allow to better understand the co-distribution and the spatial patterns of potential species richness of these taxa and therefore to know and prioritize areas with higher biogeographic conservation value. The central and western IP show the best areas for the Gall Oaks, where Q. canariensis, Q. broteroi, Q. lusitanica and Q. marianica have suitable environmental conditions to cooccur, together with Q. estremadurensis, which is a relict species, and also the presence of the widespread Q. pyrenaica, that in these thermophilic areas is also rare but potentially distributed. This area is only matched by the littoral Catalonian range, where Q. canariensis co-occurs with the roburoid Q. pubescens, Q. petraea and Q. robur, with an intricated legacy of introgression [28]. This pattern is also present in the pre-Pyrenees area, where Q. subpyrenaica marks the transition from the submediterraenan to the temperate belt alongside the contact of Q. pubescens and Q. faginea. The northern and western mountain ranges, including the Iberian, Cantabrian and Gallaecian Systems are also quite important for the preservation of the European southernmost suitable areas for Q. robur, Q. petraea and the endemic Q. orocantabrica. Moreover, this area has a major relevance for the evolutionary history of these group of oaks in terms of the European forests [11] and their inherent relevance for forest conservation when confronted with projected climatic shifts.

Conclusions
We highlighted the importance of incorporating satellite-derived ecosystem functioning attributes for mapping the potential distribution and habitat suitability of oak species, especially to those with a narrow distribution and relict-like character (NDE), in a biodiversity hotspot such as the Iberian Peninsula. Our results show that EFAs are able to improve predictions of the distribution of both narrow-and wide-ranged oak species at the regional scale, and confirm its robustness as predictors of habitat features that complement the climate influence by describing relevant complementary dimensions of the carbon cycle and water balance at the ecosystem level. Thus, satellite-based predictors such as EFAs can nowadays be computed from imagery available at high frequencies and several spatial resolutions, which allow to match the scale of the relevant processes in each study, and to anticipate species' responses to different disturbances. By doing this, our combined modelling framework illustrates the great potential of EFAs to contribute for monitoring and conservation through the essential biodiversity variables framework (EBV class 'ecosystem function').
By incorporating and combining these variables with climatic data, our work addresses the best areas for the conservation of each taxa, for which vegetation state and ecosystem functioning attributes are generally overseen in climate-only studies. The good performance of the models helped to predict the best areas for the preservation of the studied species, while variable importance measures aided to explain the main factors behind species' distributions in the geographic space. Therefore, this work also serves as base for forecasting the impact of climate change in the Iberian oak forests, in the face of future scenarios. Furthermore, our methodology proved to be suitable to be applied to analogous situations and throughout the Mediterranean Basin and targeting oak forests worldwide. This is of paramount importance in a group characterized by a complex and reticulate evolutionary history. Globally, our results show that species distribution models are a suitable tool to accurately predict oak species distribution which holds great potential to accurately depict the 'best' areas for the conservation of the studied taxa. Our study highlights the Iberian Southwest, the Lusitanian Basin and the Catalonian littoral mountains, as the most important biogeographic areas, with greater overall habitat suitability for a larger number of species, functioning as hotspots and relict-like areas for the broad-scale conservation of the Iberian White Oaks.

Figure A4
Combined models (remote sensing-RS and climate) (Left) and partial models (RS or climate and overlap) for Q.petraea, Q. pubescens and Q. robur.