Limitations of Species Distribution Models Based on Available Climate Change Data: A Case Study in the Azorean Forest

: Climate change is gaining attention as a major threat to biodiversity. It is expected to further expand the risk of plant invasion through ecosystem disturbance. Particularly, island ecosystems are under pressure, and climate change may threaten forest-dependent species. However, scientiﬁc and societal unknowns make it di ﬃ cult to predict how climate change and biological invasions will a ﬀ ect species interactions and ecosystem processes. The purpose of this study was to identify possible limitations when making species distribution model projections based on predicted climate change. We aimed to know if climatic variables alone were good predictors of habitat suitability, ensuring reliable projections. In particular, we compared the performance of generalized linear models, generalized additive models, and a selection of machine learning techniques (BIOMOD 2) when modelling the distribution of forest species in the Azores, according to the climatic changes predicted to 2100. Some limitations seem to exist when modelling the e ﬀ ect of climate change on species distributions, since the best models also included topographic variables, making modelling based on climate alone less reliable, with model ﬁt varying among modelling approaches, and random forest often providing the best results. Our results emphasize the adoption of a careful study design and algorithm selection process. The uncertainties associated with climate change e ﬀ ect on plant communities as a whole, including their indigenous and invasive components, highlight a pressing need for integrated modelling, monitoring, and experimental work to better realize the consequences of climate change, in order to ensure the resilience of forest ecosystems in a changing world.

The biosphere has entered an era of rapid environmental changes resulting in significant negative impacts on ecosystem resilience [9][10][11][12]. However, scientific and societal unknowns make it challenging

Study Area
The proposed study was applied in three distinct geographic areas: Pico, Terceira, and São Miguel islands, covering circa (ca.) 447 km 2 , 400.6 km 2 , and 744.6 km 2 , respectively. The use of distinct areas of an archipelago allows for the identification of the major factors underlying species distributions at different environmental scenarios and multiple spatial scales (e.g., see [62,71,72,74,75]).
The Azores archipelago (ca. 2346 km 2 ), located in the North Atlantic Ocean and west of continental Portugal, is particularly well-studied and features rich data for invasive and native species occurrence, and environmental data [69][70][71]. The first human settlements were established in the Azores around A.D. 1440 [52]. More than 550 years of human presence has taken its toll on the local biodiversity, including the 420 endemic species [52,76]. The natural vegetation includes diverse communities, namely coastal vegetation, coastal and inland wetlands, meadows, peat bogs, and several types of native forests and scrubs. Nowadays, 81.1% of the vascular plant species are exotic, although most are not invasive [77]. In fact, before human settlement, natural forests could have covered around 75% of the land surface of the Azorean islands [78].
The replacement of the natural forest in the Azores has followed a clear temporal sequence. Human activities had restricted the native forest in most islands to areas above 500 meters above sea level (m a.s.l [78,79]). During the last decades of the 20th century, the reduction of native forest area was significant, with the clearing of large fragments, at mid and high altitude, for pasture [52,80]. Pittosporum undulatum, A. melanoxylon, or Eucalyptus globulus Labill. dominate most forest patches located in low-to mid-elevation areas. At higher altitudes, Cryptomeria japonica D. Don. dominates, along with the remaining stands of natural forests, particularly above 600 m a.s.l. [78,81,82].

Species Data
We assembled forest species data from the Forest Inventory of the Azores [83]. Occurrence records for studied species (P. undulatum, A. melanoxylon, and M. faya) were obtained by using all the polygons identified in the Forest Inventory. The data, provided in vector format (shapefile), were converted to raster (RST) and comma-separated values (CSV) formats for further analysis. All biological data were geocoded to a grid of 100 × 100 m in order to match with the environmental variables used (Figures 1-3). We assembled forest species data from the Forest Inventory of the Azores [83]. Occurrence records for studied species (P. undulatum, A. melanoxylon, and M. faya) were obtained by using all the polygons identified in the Forest Inventory. The data, provided in vector format (shapefile), were converted to raster (RST) and comma-separated values (CSV) formats for further analysis. All biological data were geocoded to a grid of 100 × 100 m in order to match with the environmental variables used (Figures 1-3).       The three target species show different patterns of spatial distribution and range boundaries. Since, below 600 m a.s.l., most forest patches in Pico, Terceira and São Miguel islands are dominated by P. undulatum and A. melanoxylon (invasive plants), along with individuals remaining from former native forests, such as M. faya [71,72], these three species were selected. Furthermore, their distribution and ecology is relatively well known and stable compared with other plant taxa. Eucalyptus globulus Labill. and C. japonica are also important forest species but, in the Azores, there are mostly restricted to those areas where they were planted.
Pittosporum undulatum is a shrub or tree up to 15 m tall, and in its native range it is found in different types of Eucalyptus forest in Australia, being one of the most abundant species [84]. The canopy is pyramidal, 3-5 m in diameter, the leaves are evergreen, ovate, and with a wavy margin, and shows white-creamy perfumed flowers in loose umbellate cymes that bear one to four insect pollinated flowers [85][86][87][88][89]. It is widespread in the three islands but especially in Pico Island [71,74,[90][91][92].
Acacia melanoxylon is an evergreen tree with a pyramidal or rounded dense crown, usually reaching 8-15 (20) m high [87]. In the Azores, this species has important settlements on Pico, in all the northern slopes of the island, and some scattered stands in Terceira and São Miguel islands, however its interest fell sharply with the introduction of C. japonica in the regional wood market [87].
Morella faya is an evergreen shrub or small tree, found on all the Azores islands and is frequently a dominant species in lowland and coastal native vegetation [78,79,[93][94][95]. Nevertheless, it can also be found up to 600 m of altitude and even as high as 1000 m in Pico Island. In the Canary Islands and Madeira, it is found up to 1300 m [96,97]. According to the Forest Inventory of the Azores [83], M. faya is currently present in a total of 22% of the forested area in the Azores archipelago but in only 5% of that area it is the dominant species [71]. For this reason, it is fundamental to preserve M. faya and the associated flora and fauna [71,97].

Climate Data
The CIELO model [98][99][100] is a simple layer model based on the transformations experienced by an air mass crossing over the mountain, and simulates the evolution of an air parcel's physical properties starting from the sea level up to the mountains [62]. Original climate data comprised 12 monthly time step variables developed for the project by the CIELO model [98][99][100] at a 100 × 100 m grid resolution. This model allowed us to obtain a high number of climatic variables associated with temperature ( • C), relative humidity (%), and precipitation (mm) for the 20th and 21st centuries.
We related species distributions to climate using six variables derived from the original climate data set: Annual minimum temperature ( • C), annual maximum temperature ( • C), annual mean temperature ( • C), annual minimum relative humidity (%), annual maximum relative humidity (%), and total annual precipitation (mm). The selection of these set of particular climatic variables was considered, since they were shown to be relevant for the study species [91,92].
Possible future climate scenarios in Azores were based on runs of the CIELO model forced with boundary conditions derived from RCPs suggested in the fifth Assessment Report of The Intergovernmental Panel on Climate Change [101]. The IPCC scenario RCP8.5 (more "severe" scenario) was chosen following a precautionary approach. This option is based on the historical and still actual trend on the greenhouse emissions, which follows the scenario that leads to radiative forcing of 8.5 W/m 2 by the year 2100 [62,101,102].

Topographic Data
We also added topographic variables to the models, allowing the inclusion of geomorphological aspects, given their previous importance in former modelling exercises [71,74,91,92].
Topographic data were derived based on a digital elevation model (DEM) available in CIELO ( [98][99][100]; see http://www.climaat.angra.uac.pt). The variables used were elevation, aspect, slope, curvature, flow accumulation, summer hill shade, and winter hill shade. Curvature is the second derivative of the surface, which highlights the local geomorphology, specifically the flatness, convexity, and concavity of the surface. Flow accumulation is a simulation of the superficial flow by considering that all raster cells flow into each downslope cell in the DEM. Hill shade is a simulation of the lighting conditions on the surface dictated by the topography and the position of the sun (the summer and winter solstices were considered).
The identification of the most suitable variables for modelling is decisive to maximize the accuracy of distribution models and their reliable projection in space and time. Table 1 shows a characterization of the ecogeographical variables used, and Table 2 shows the summary of present and future climatic variables for the three islands.

Modelling Approach
Potential changes in the spatial distribution of the target species were predicted under each climatic scenario by applying a modelling approach-BIOMOD (BIOdiversity MODelling; "biomod2")-based in a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models, and the examination of species-environment relationships [103,104]. BIOMOD includes the ability to maximize the predictive accuracy of current species distributions and the reliability of future potential distributions, using different types of statistical modelling methods into different environmental conditions (e.g., climate, topographic, or land-use change scenarios) [73,104]. BIOMOD 2 ([105]; version 3.3-7.) is implemented in R ( [106]; version 3.3.1; R Development Core Team, 2016) and is a freeware, open-source package.

GLM: Generalized linear models
This is a generalization of ordinary linear regression, allowing for response variables that have normal (i.e., Gaussian) or non-normal error distribution models, such as the binomial and Poisson, through the use of a link function. Akaike's or Bayesian information criteria (AIC or BIC) are used to select the most informative and parsimonious model. [107,116,117]

GAM: Generalized additive models
A non-parametric extension of GLMs, where the linear predictors correspond to smooth nonlinear functions of the predictive variables.
Typically use the same underlying distributions for the response variable as GLMs. [108,118]

ANN: Artificial neural networks
Models are based on a stepwise progression of multivariate and univariate analyses, which are versatile in extracting information out of complex data, and which can be effectively applicable to classification and association. [119,120]

GBM: Generalized boosted models
The individual models consist of classification or regression trees. In an iterative process, a final model is built by progressively adding trees while re-weighting the data poorly predicted by the previous tree.

RF: Random forest
Consists in an ensemble of unpruned classification or regression trees, created by using bootstrap samples of the training data and random feature selection in tree induction. [113,121,122]

CTA: Classification tree analysis
Operates by recursively parsing the training observations on a binary splitting measure applied to explanatory variables, such as spectral responses. [114,123,124]

SRE: Surface range envelope
It is a presence-only approach, based on the environmental conditions corresponding to the occurrence data, allowing the definition of the environment where a species can be found. Uses the extreme percentiles, as recommended by Nix or Busby.
[ 105,115,125] Most modelling techniques implemented in BIOMOD 2 require that species distribution data include presences and absences [73,104]. When data are presence-only, a possible solution is to generate random pseudo-absences [126,127]. This can be done in BIOMOD 2 using strategies of increasing complexity [104,128].
The modelling approach was divided into two parts based on pseudo-absence requirements: GLM-GAM-ANN-SRE and GBM-RF-CTA. Since those algorithms require presence and absence data, we generated pseudo-absences following the recommendations of Barbet-Massin et al. [129]. For GLM-GAM-ANN-SRE, we used 10000 random background absences across the whole study extent; and for GBM-RF-CTA we used a number of pseudo-absences equal to the number of occurrences. In order to minimize projection bias, we applied the "randomPoints" function in R package "dismo" [130,131].
As no independent data existed to evaluate the predictive performance of the models, species presences were randomly split into test data (30%) and training data (70%), and 10 replicate model runs per algorithm were performed to account for model variability [103,117].
A preliminary screening was performed to investigate which variables would be more suitable for modelling, based on model evaluation (described in Section 2.4). The models were tested in a sequential manner, starting with models developed using climatic variables only and, in a second step, calculating new models that also included topographic variables. We calculated a total of 843 models with different combinations of climatic or climatic and topographic variables-ecogeographic variables (EGVs), for each of the nine species/islands combinations. More specifically, we started with models incorporating all the climatic variables in Table 1, and progressed to calculate models including all the climatic and topographic variables in the same Table 4. During model evaluation, we defined the best variable set for each species/island combination.

Model Validation
Two evaluation measures were calculated: The true skill statistic (TSS; [132]) and the area under the curve (AUC) of the receiver-operating characteristic (ROC) plot [133]. Both evaluation measures were calculated via the proportion of the two prediction types "sensitivity" (correctly predicted presences) and "specificity" (correctly predicted absences) which take into account all four elements of the error matrix (true and false presences and absences), derived from the test data [117,134]. The TSS requires a specific threshold level and is a prevalence independent alternative to Cohen's kappa [132]. It ranges from -1 to +1, with values above 0.4 indicating a statistically reliable model performance [132,135]. In terms of prediction accuracy, TSS values ranging from 0.2 to 0.5 were considered poor, from 0.6 to 0.8 useful, and values above 0.8 were good to excellent [136]. The ROC curve, proposed by Peterson et al. [137], was used to assess the predictive performance of the models. A good model is defined by a curve that maximizes sensitivity for low values of the false positive fraction [138].
To guarantee most accurate model predictions, AUC values lower than 0.5 were considered poor, in the range 0.5-0.7 considered fair, in the range 0.7-0.9 considered good, and excellent for values above 0.9 [139].

Model Projections
For the best models, including climatic only or both climatic and topographic variables, and for the nine species/islands combinations, we projected the respective potential distributions for the reference period (present) and for the future conditions, using the respective functions of BIOMOD 2. The two probabilistic ensemble forecasts were translated into two binary maps using the value that maximizes the AUC scores as the threshold for distinguishing presence and absence predictions. The two maps were then reclassified into four levels, according to the respective quartiles of habitat suitability.

Variable Importance
In BIOMOD 2, variable importance is calculated by shuffling a single variable of the given data set. The value returned is based on the correlation of the model predictions obtained (i) with the initial dataset and (ii) with the shuffled data set. The highest the value (i.e., closer to 1), the more influence the variable has on the model. A value of 0 assumes no influence of that variable on the model. Note that this technique does not account for interactions between the variables [105]. This calculation is accessible using the variables_importance() function. Table 4. Sets of ecogeographic variables (EGV) that produced models with highest fit in Pico, Terceira, and São Miguel islands using BIOMOD 2. PU: P. undulatum; AM: A. melanoxylon; MF: M. faya. The modelling approach was initiated by calculating models with all the climatic or all the climatic and topographic variables described in Table 1. During model evaluation, we determined the best variable sets for each species/island combination, which are reported herein.

Summary of the Best Models
From the preliminary analysis of more than 800 models including different combinations of EGVs, the best models were selected based on TSS and AUC values. A summary of those models is provided in Table 4 and in Figures 4-6.
For TSS and AUC, EGV set 1 (climatic variables only) gathered the least accurate predictions among the different environmental datasets. The models derived by using EGV set 2 (also including topographic variables) were among the best, according to the TSS and AUC mean values (Figures 4-6).

Relative Importance of EGVs
The importance of EGVs varied between species and islands (Supplementary Material-Figures S1, S2, S3). For P. undulatum, annual minimum temperature, total annual precipitation, and elevation were more important in Pico Island, while total annual precipitation and slope were more important in Terceira and São Miguel islands. For A. melanoxylon, annual maximum relative humidity, elevation, and slope were more important in Pico Island, annual minimum temperature and annual maximum temperature in Terceira Island, and total annual precipitation and slope in São Miguel Island. Regarding M. faya, total annual precipitation and winter hill shade were more important in Pico Island, and total annual precipitation and slope in Terceira and São Miguel islands.

Present and Future Projections
The probability data were projected into map format in order to compare the realized habitat distribution in the present conditions and the future expected distribution under the EGV set 2 (Figures 7-9).
The present and future maps (Table 5) revealed changes in the species' probability of occurrence (increase/decrease), thereby affecting species' range sizes (gains/losses) at the study locations.
According to model projections, most species would decrease their geographic space when considering the change in the number of cells belonging to the third and fourth quartiles of probability (i.e., probability of occurrence above 0.5 that is above 500 in the maps).
Regarding Pico Island, in response to the future climate scenario, the three species showed a reduction in the probability of occurrence at the lowest elevation belt, and therefore a loss of the more adequate habitat, although with an increase in habitat quality at higher elevations, but still below the 0.5 probability value (Table 5; Figure 7). In Terceira Island, models predicted an extension of the more adequate habitat for A. melanoxylon and P. undulatum, although much lower for the latter. For M. faya, a pattern similar to what happened in Pico Island was found, with some degradation of habitat quality at the lowest elevation belt and a slight increase in habitat quality at higher elevations (Table 5; Figure 8). Still, the spatial distributions patterns of the species remained more favourable at relatively low elevations, except for A. melanoxylon, which would apparently gain habitat at the interior areas of the island. Finally, for São Miguel Island, M. faya, which presently has a very restricted distribution, is predicted to have a profound increase in its most suitable habitat, while for A. melanoxylon and P. undulatum, a pattern similar to that found in Pico Island is expected, with habitat degradation at the lowest elevation belt and some increase of habitat suitability at higher elevations, although still below a probability of 0.5 (Table 5; Figure 9).

Discussion
One important research topic is to understand the currently ongoing decline in the biodiversity of the ecosystems [140]. Changes in land use and climate have been identified as the main drivers of this decline [24,65]. Forest ecosystems are under pressure, as climate change may threaten species across a wide range of taxonomic groups [64,65] and the conservation of forest biodiversity depends

Discussion
One important research topic is to understand the currently ongoing decline in the biodiversity of the ecosystems [140]. Changes in land use and climate have been identified as the main drivers of this decline [24,65]. Forest ecosystems are under pressure, as climate change may threaten species across a wide range of taxonomic groups [64,65] and the conservation of forest biodiversity depends on predicting the effects of many elements of global change. However, this is a daunting and complex task [2]. One of the greatest uncertainties in the available climate changes variables comes from the downscaling method used to refine projections at the regional scale [141].
An impressive number of new climate change scenarios have recently become available to assess the ecological impacts of climate change [40,41]. Among these impacts, shifts in species ranges analysed with SDM algorithms are the most widely studied [38,40,62].
Casajus et al. [40] suggest the use of clustering approach, k-mean clustering, which reduces the number of climate change scenarios needed to project species distributions, while retaining the coverage of uncertainty in future climate conditions. This methodology selects an objective subset of climate change scenarios and offers an efficient guidance to project species distribution through time.
Our results showed that, despite the good accuracy of the model's predictions, there was considerable variation depending on the type of modelling approach, the species and the island. Therefore, modellers must be aware that their results will include several degrees of uncertainty, when projecting possible changes in species distribution attributed to climate change [40,131,142,143]. Moreover, in general, the use of climatic EGVs only led to a lower accuracy in model predictions, when compared with models also including topographic variables [144]. Furthermore, the association with topographic variables might have a connection with the history of land use in the target area. As an example, the present distribution of M. faya in São Miguel Island might be the result of forest clearing, leading to the limitation of species occurrence to sloped terrain or the ridges of volcanic craters, since the flatter areas were converted to agriculture [72,94,97]. A similar situation is seen with P. undulatum in São Miguel Island, where the plant has occupied native forest habitat and forest patches along water lines and sloped terrain [90], while in Pico Island, it has invaded large areas of abandoned vineyards at the lowest elevation belt. Therefore, we should stress that the previous history of land use might be reflected on the present distribution of the target species. Thus, their present distribution might not be at balance with the environment, thereby affecting future predictions largely based on the present relationship between climatic data and species distribution data, and leading to a possible underestimation of the extent of suitable habitat see [71,91].
In this context, a critical question arises. Which and how many climatic environmental/climate change scenarios are required to project species distributions, and carry out the impact analyses that cover the range of possible climate futures? Casajus et al. [40] propose an objective approach to select climate scenarios when projecting species distribution under climate change. Yet, there are no fixed guidelines when selecting the best type of climatic variables to map spatial projections, which is also contingent on the particular period of time to be considered and on the magnitude of environmental changes in the study area [145]. When environmental changes are weak, range dynamics might be determined predominantly by intrinsic population dynamics, as described by Lawton [146] and Pimm [147]. However, when environmental changes are strong, species distributional dynamics should be driven heavily by these changes [145].
Although a new climate might not directly favour invasive plant species over natives, many invasive plant species share traits that could increase their dominance in a transitioning climate scenario [2]. Moreover, new suitable areas might emerge in what were previously unsuitable or marginal habitats while previously suitable areas might be converted into unsuitable or marginal ones [145,148,149].
In our analysis, we found a clear effect of the type of modelling algorithm on model adjustment. RF and GAM showed the highest TSS and AUC values when compared to the other algorithms. One of the main advantages of GAM is that it offers a great flexibility in order to represent the relationship between the dependent variable and the explanatory variables [108]. GAM models are usually more adjusted due to their non-parametric component or semi-parametric component (i.e., "smooth functions"), and can fit nonlinear curves in a way that permits a great variety of functional forms [108,118]. On the other hand, RF is an ensemble classifier that consists of many decision trees, and generates an output class that is the mode of the classes generated by the individual trees [113]. It is one of the most accurate learning algorithms available; for many datasets, it produces a highly accurate classifier [150]. However, as this a single case study, caution is required before making specific recommendations about which algorithm to favour over others, as the algorithms that performed well here may not do so in other areas or under other circumstances [13]. In previous studies in the Azores, using MaxEnt, and targeting the same species and islands, we have found AUC values that were generally slightly below, or in particular cases, slightly above those presently obtained with RF or GAM [92]. It has been suggested that the use of multiple SDMs, which vary in complexity and statistical mechanisms, is a more robust way to assess species distributions [151,152]. One study reported a comparison of performance of five modelling approaches (GLM; GAM, RF; GBM; and MaxEnt) and concluded that none provided superior predictions in all performance criteria [153], in agreement with previous comparative studies [154,155].
Regarding potential model overfitting, an option to avoid this situation would be to use a metric such as AIC (Akaike's information criterion) or other derived criteria (e.g., BIC, Bayesian information criterion; WAIC, Watanabe-Akaike information criterion), as reported in our previous research dealing with GLMs in a maximum likelihood or Bayesian context (see [92]). However, experts suggest that errors can occur in the estimation of the degrees of freedom (i.e., the number of parameters in the model is not the effective number of degrees of freedom in the models) when computing AIC for MaxEnt, GBM, RF, and other algorithms [156]. Therefore, we followed the guidelines suggested by Hauenstein et al. [156] and focused on cross-validation. Although some authors consider that the predicted distributions obtained by different SDMs/algorithms are affected by a variety of factors, but still improved by combining them in an ensemble [103], we experienced some difficulties in following this guideline. In fact, we opted to base our projections in the algorithm that provided the best fit, since the projections obtained by the ensemble models often suggested extreme scenarios for species increase of decrease. This was even more accentuated for those islands/species where land use might have had a stronger effect on species distribution (e.g., M. faya in São Miguel Island). In addition, when ensemble model selection entirely relies on the AUC metric (which has been criticized; [157,158]), then the additional spatial error introduced may have a greater effect due to the inclusion model being sensitive to this error, while producing a high value of AUC [131]. In the case of our study, we further evaluated the predictive performance of the models, by randomly splitting data into test (30%) and training sets (70%), by replicating models 10 times, and also by using two evaluation metrics.
The importance of the EGVs varied between species and islands. However, elevation and slope were among the most important variables in the EGV sets. In terms of climatic variables, those related to annual precipitation values seemed to play an important part in habitat suitability. All these findings agree with previous modelling studies in Azores [71,72,74,91]. In general, the importance of EGV sets indicated that elevation and precipitation influence the ecological niche, suggesting that the studied species tend to avoid the high mountains, where temperatures are low and relative humidity and rainfall are high.
Regarding models projections and the expected changes in species distribution, our results revealed different outcomes for different species and islands. Namely, in some cases, species are predicted to clearly increase their distribution range, e.g., M. faya in São Miguel Island. This could be expected since the present distribution of this species was most likely reduced by land use changes that occurred in the past [71,97]. In contrast, for Pico Island, and particularly for P. undulatum and A. melanoxylon, a reduction in the number of cells within the third and fourth quartiles of habitat suitability was predicted. Apparently, habitat adequacy will decrease at low elevation, probably due to a temperature increase, while slightly increasing at higher elevations, but without a sufficient compensation for the habitat degradation that might occur at the lowest elevation belt.
In previous studies in Terceira and São Miguel islands, Ferreira et al. [62] found a tendency for upward shift in altitude in the suitable climate space of the analysed vascular plants, with the highest decrease occurring on coastal areas, a similar situation to what was found in the present study. However, regarding vascular plants, the mentioned researchers focused mainly on coastal species with a relatively narrow altitudinal distribution, whereas we analysed the distribution of more common and widespread species. Thus, differences in future predictions are expected when modelling species with very different ecological niches. Ferreira et al. [62] also found a possible increase in suitable climate space for the endemic Laurus azorica (Seub.) Franco, particularly in Terceira Island, which is agreement with our predictions of habitat gain for P. undulatum and A. melanoxylon in Terceira, and M. faya in São Miguel Island.
Meanwhile, prediction of range shifts under changing climatic conditions has important limitations, mainly related to the disregard of biotic interactions (e.g., [159]), dispersal limitations and intraspecific variation in niche breadth [160][161][162]. Moreover, land use might also play an important role in the present distribution of the target species [71,91], and will certainly affect their distribution in the future. Furthermore, the reliability of species distribution models depends on the quality of the used data [152], which results from historic factors that might have affected the present distribution (e.g., forest clearing, active plantation), besides the analysed climatic variables. Also, our results suggest that a potential extension of the range of an invasive species could not be the obligatory outcome of climate change, as seen for P. undulatum in Pico Island, with a possible degradation of its habitat at low elevation but a slight increase in habitat quality at high elevations. This still adds more uncertainty to the real outcome of climate change in the future distribution of native and invasive trees.
The future climate projections in Azores (PRAC 2017-Programa Regional para as Alterações Climáticas dos Açores; [163]) suggest an upward trend in global mean temperature, projections also verified by the IPCC (Intergovernmental Panel on Climate Change; [101]). However, due to the strong Atlantic influence, the future projections indicate that this increase will not be as marked as in the continental regions, namely Mainland Portugal [163]. This is mainly because of the geographical location of the islands, the greater oceanic thermal inertia, and the heat exchanges between this environment and the atmosphere [99,100,163]. Towards the end of the century, mean temperature will change with a projected increase between 1.4 and 1.9 • C for RCP4.5 and between 2.5 and 3.2 • C for RCP8.5. The increase of temperature will be more accentuated in the islands of the western group [163,164]. Changes in precipitation across the archipelago will not be uniform and some shift of precipitation from littoral to the interior will be expected [164]. Globally, for the entire surface of each island, a moderate increase in precipitation in the RCP8.5 scenario, for the period 2010-2039, is expected, especially for the central group. For the same scenario, a generalized decrease in precipitation is expected for all the islands by the end of the century, especially in the western group [164].
As seen above, the present study suggests that much attention has to be devoted to algorithm selection, differences within the same species for different habitats (i.e., islands or regions), and differences among species [131]. Peculiarities such as the previous historic land use of the target areas, might have constrained the present distribution of the target species, rendering predictions based on climate variables alone prone to potential error. Meanwhile, the inclusion of topographic or land use data in the modelling process should not be seen as way to replace climatic data, but integrated in a sequential manner, as proposed in our research. This would allow the identification of possible effects derived from previous land use, while avoiding the risk of biasing predictions under future climate, with unknown but potentially different land use. Therefore, our results emphasize the adoption of a careful study design and algorithm selection process. The uncertainties associated with climate change, as well as the unknown future of plants communities as a whole, including the response of both their indigenous and invasive components, highlight a pressing need for integrated modelling, monitoring, and experimental work to realize the ecological and geographical consequences of climate change, in order to ensure the resilience of forest ecosystems in a changing world.

Conclusions
Our results emphasize the need for the adoption of a careful study design and algorithm selection process. In fact, we found some relevant differences in model performance among different algorithms. In particular, RF was revealed to perform consistently well along the different islands and species. We also found that, in general, the EGVs associated with species distributions were similar to what was obtained in the previous modelling exercises. Also, models that including topographical variables performed better than models including climatic EGVs alone. This shows that other aspects, besides climate, most likely associated with historic land use changes, presently play an important role in forest species distribution, therefore potentially blurring our understanding of future range alterations associated with climate change. We also obtained some results that had not been previously anticipated, namely a global degradation of P. undulatum habitat in Pico Island, resulting from a potential loss of adequate habitat at the lowest elevation belt, which would not be fully compensated by the slight increase in habitat suitability that would be expected at high elevation. We should also mention that the future climate scenario used in this exercise (RCP8.5) mainly predicts an increase in temperature between 2.5 and 3.2 • C, with only a slight (and again uncertain) change in relative humidity and rainfall. Considering all this underlying uncertainty, the continuous but uncertain action of land use in the future, and the variation in the predictions that were obtained, depending on the island and on the target species, we conclude that the present predictions about changes in species distribution ranges associated with climate change are still profoundly uncertain.

Supplementary Materials:
The following are available online at http://www.mdpi.com/1999-4907/10/7/575/s1, Table S1: Evaluation of distribution models/algorithms (mean values) in Pico Island by the true skill statistic (TSS) and the receiver operating characteristics curve (AUC), using two different EGV sets; Table S2: Evaluation of distribution models/algorithms (mean values) in Terceira Island by the true skill statistic (TSS) and the receiver operating characteristics curve (AUC), using two different EGV sets; Table S3: Evaluation of distribution models/algorithms (mean values) in São Miguel Island by the true skill statistic (TSS) and the receiver operating characteristics curve (AUC), using two different EGV sets; Figure S1: Mean values of relative explanatory variables used to predict the distribution of P. undulatum (PU), in Pico, Terceira and São Miguel islands. Mean values correspond to EGV set 2; Figure S2: Mean values of relative explanatory variables used to predict the distribution of A. melanoxylon (AM) in Pico, Terceira and São Miguel islands. Mean values correspond to EGV set 2; Figure S3: Mean values of relative explanatory variables used to predict the distribution of M. faya (MF) in Pico, Terceira and São Miguel islands. Mean values correspond to EGV set 2.
Author Contributions: L.D.S. and L.S. conceived and designed the modelling questions, performed the modelling exercises, and analysed the data; E.B.d.A. and F.V.R. contributed with EGV information and R.B.E. revised the paper and discussion.