Application of Thermal and Phenological Land Surface Parameters for Improving Ecological Niche Models of Betula utilis in the Himalayan Region

Modelling ecological niches across vast distribution ranges in remote, high mountain regions like the Himalayas faces several data limitations, in particular nonavailability of species occurrence data and fine-scale environmental information of sufficiently high quality. Remotely sensed data provide key advantages such as frequent, complete, and long-term observations of land surface parameters with full spatial coverage. The objective of this study is to evaluate modelled climate data as well as remotely sensed data for modelling the ecological niche of Betula utilis in the subalpine and alpine belts of the Himalayan region covering the entire Himalayan arc. Using generalized linear models (GLM), we aim at testing factors controlling the species distribution under current climate conditions. We evaluate the additional predictive capacity of remotely sensed variables, namely remotely sensed topography and vegetation phenology data (phenological traits), as well as the capability to substitute bioclimatic variables from downscaled numerical models by remotely sensed annual land surface temperature parameters. The best performing model utilized bioclimatic variables, topography, and phenological traits, and explained over 69% of variance, while models exclusively based on remotely sensed data reached 65% of explained variance. In summary, models based on bioclimatic variables and topography combined with phenological traits led to a refined prediction of the current niche of B. utilis, whereas models using solely climate data consistently resulted in overpredictions. Our results suggest that remotely sensed phenological traits can be applied beneficially as supplements to improve model accuracy and to refine the prediction of the species niche. We conclude that the combination of remotely sensed land surface temperature parameters is promising, in particular in regions where sufficient fine-scale climate data are not available.


Introduction
As high-elevation treelines can be considered indicators of past and recent climate change and variability [1,2], ecological niche modelling studies frequently use climate variables from numerical models to predict the current and future potential distribution of treeline species [3][4][5].However, these predictions potentially disregard important local abiotic or biotic factors, which influence the actual species' distribution (i.e., the realized niche) because climate is not the exclusive factor determining habitat suitability [6].To date, the number of studies modelling distribution ranges of deciduous treeline species' in the entire Himalayan mountains remains very limited (e.g., [7,8]).Studies which evaluate the performance of ecological niche models comparing many different predictor sets for modelling the current distribution of a treeline species covering the entire Himalayan arc do not exist.As pointed out by Bobrowski and Schickhoff [9], modelling species' distributions in high-altitude regions faces numerous challenges, most importantly the sparse data availability due to poor accessibility of the terrain.This applies in particular to species occurrence data (often obtained from herbaria) as well as to environmental predictors such as climate variables, which are often spatially interpolated and afflicted with errors [9].
Therefore, remotely sensed data can provide additional, spatially contiguous information in higher resolution and accuracy in particular in remote areas like the Himalayan mountains, giving insight into vegetation characteristics and spatial patterns, surface temperatures, and topographical features.Since the availability of multispectral satellite images in the early 1970s, biophysical mapping of the earth's surface has contributed to ecological studies [10][11][12].Previous studies have documented the merit of incorporating remote sensing data in species distribution models such as land cover data [13][14][15] and topographical information from the Shuttle Radar Topography Mission (SRTM) (for examples, see [16]).Today, remote sensing data play an increasing role in modelling species distributions [17][18][19][20].Compared to models using solely climatic/topographical predictors [18,[21][22][23], the inclusion of remotely sensed variables as predictors in species distribution models improves prediction accuracy and refines the mapped distribution range of the species.The spectral measurements and their derivates are directly linked to biophysical properties of the land surface which, in turn, are linked to the primary environmental regimes and to habitat quality (productivity, vegetation structure, land cover type) [24].Thus, the use of remotely sensed variables may also help to improve the understanding of the interactions of driving factors for complex species composition and vegetation zonation, e.g., at alpine treelines.This includes phenological traits, which represent species characteristics of recurring seasonal biological events in the life cycles [25,26].Remote-sensing-based phenological analyses yielded notable results for modelling species' distributions [27][28][29].
For this study in the Himalayan mountains, characterized by distinct vertical climatic gradients and respective altitudinal zonations of vegetation, we selected the treeline-forming Himalayan Birch (Betula utilis) as a target species.Birch forests feature prominently within this zonation, and B. utilis constitutes an ideal study organism due to its status as a principal deciduous broadleaved treeline species.The distribution extends across the Himalayan arc with higher dominance in the western and central part of the mountain system.Over much of its range, B. utilis forms a narrow forest belt on north-facing slopes between evergreen coniferous forests (e.g., Abies spectabilis) below and an evergreen broadleaved krummholz belt (e.g., Rhododendron campanulatum) above (for more associated tree species, see [30,31]).
Bobrowski et al. [32] and Bobrowski and Schickhoff [9] modelled the potential distribution of B. utilis at a smaller spatial extent in the Himalayan region using modelled climate-related predictor variables only.In this study, we aim to bridge the gap between species' potential and actual distributions by deriving the first comprehensive ecological niche models for B. utilis for the entire Himalayan mountains, as well as by supplementing and substituting the standard predictors with remotely sensed land surface temperature, vegetation phenology, and topography parameters.In particular, we investigate the suitability of various predictor sets including bioclimatic variables (Chelsa [33]), topography [34], phenological traits derived from MODIS Land Cover Dynamics data [35], annual cycle parameters derived from MODIS Land Surface Temperature data [36], and their combinations.In light of the above, we (1) analyse possibilities to improve the niche model of B. utilis based solely on bioclimatic variables by adding different remotely sensed variables; and (2) explore the potential of a pure remote sensing approach by substituting the bioclimatic variables with remotely sensed land surface temperature data.

Study Area and Species Data
The Himalayan mountains are located between the Indian Subcontinent in the south and the Tibetan Highland in the north, extending from Afghanistan in the northwest (c.36 • N and 70 • E) to Yunnan in the southeast (c.26 • N and 100 • E), and covering an area of more than 1,000,000 km 2 .Due to a distinct three-dimensional geoecological differentiation, the Himalayas show a high variation of climate, rainfall, altitude, and soils [37,38].The climate ranges from tropical in the Indian lowlands to permanent ice and snow at the highest elevations, and from more continental in the northwest to more oceanic in the southeast [39].
The distribution range of B. utilis extends across the Himalayan arc from Afghanistan to southwest China.Towards the eastern Himalayas, where more maritime climatic conditions favour the competitiveness of evergreen Rhododendron spp., B. utilis becomes a less frequent companion in subalpine forests and at treelines [30].The total elevational range of B. utilis extends from 2700 to 4500 m [40].In the northwest Himalayas, B. utilis is widely distributed between 3100 and 3700 m, while the range shifts to higher elevations towards the eastern Himalayas (mainly between 3800 and 4300 m).Pure birch stands with Rhododendron campanulatum and Sorbus microphylla in the understory and are often found at the uppermost limit of subalpine forests [41].
Presence-only occurrence data of B. utilis were compiled from three different sources: 215 geo-referenced records (1980-2016) were accessed via the Global Biodiversity Information Facility [42].Further, 202 records were added from a database compiled from a literature survey ([30], unpublished data).Additionally, 827 records were extracted from freely available satellite images (GoogleEarth TM [43]) and added to the dataset.Extractions from GoogleEarth have been shown to be valuable in global treeline research [44,45].These occurrence localities were validated through expert knowledge, obtained from numerous field visits in the Himalayan mountains.
Lowermost occurrences (e.g., in avalanche paths) were removed since they do not represent the "zonal" climatic conditions of the treeline birch belt.To reduce spatial autocorrelation, we kept only one occurrence point per grid cell (i.e., 1 km × 1 km), resulting in 1041 occurrence points for modelling the current distribution of B. utilis (Figure 1).

Study Area and Species Data
The Himalayan mountains are located between the Indian Subcontinent in the south and the Tibetan Highland in the north, extending from Afghanistan in the northwest (c.36°N and 70°E) to Yunnan in the southeast (c.26°N and 100°E), and covering an area of more than 1,000,000 km 2 .Due to a distinct three-dimensional geoecological differentiation, the Himalayas show a high variation of climate, rainfall, altitude, and soils [37,38].The climate ranges from tropical in the Indian lowlands to permanent ice and snow at the highest elevations, and from more continental in the northwest to more oceanic in the southeast [39].
The distribution range of B. utilis extends across the Himalayan arc from Afghanistan to southwest China.Towards the eastern Himalayas, where more maritime climatic conditions favour the competitiveness of evergreen Rhododendron spp., B. utilis becomes a less frequent companion in subalpine forests and at treelines [30].The total elevational range of B. utilis extends from 2700 to 4500 m [40].In the northwest Himalayas, B. utilis is widely distributed between 3100 and 3700 m, while the range shifts to higher elevations towards the eastern Himalayas (mainly between 3800 and 4300 m).Pure birch stands with Rhododendron campanulatum and Sorbus microphylla in the understory and are often found at the uppermost limit of subalpine forests [41].
Presence-only occurrence data of B. utilis were compiled from three different sources: 215 georeferenced records (1980-2016) were accessed via the Global Biodiversity Information Facility [42].Further, 202 records were added from a database compiled from a literature survey ([30], unpublished data).Additionally, 827 records were extracted from freely available satellite images (GoogleEarth TM [43]) and added to the dataset.Extractions from GoogleEarth have been shown to be valuable in global treeline research [44,45].These occurrence localities were validated through expert knowledge, obtained from numerous field visits in the Himalayan mountains.
Lowermost occurrences (e.g., in avalanche paths) were removed since they do not represent the "zonal" climatic conditions of the treeline birch belt.To reduce spatial autocorrelation, we kept only one occurrence point per grid cell (i.e., 1 km × 1 km), resulting in 1041 occurrence points for modelling the current distribution of B. utilis (Figure 1).

Predictor Variable Sets
The aim of the study was to evaluate the performance of four different predictor variable sets: (1) a quasi-mechanistical statistically downscaled Chelsa CLIMATE data set [33]; (2) topographical variables based on a remotely sensed Digital Elevation Model (TOPO) [34]; (3) phenological traits

Predictor Variable Sets
The aim of the study was to evaluate the performance of four different predictor variable sets: (1) a quasi-mechanistical statistically downscaled Chelsa CLIMATE data set [33]; (2) topographical variables based on a remotely sensed Digital Elevation Model (TOPO) [34]; (3) phenological traits derived from MODIS Land Cover Dynamics data (PHENO) [35]; and (4) annual cycle parameters derived from MODIS Land Surface Temperature data (LST) [36].Furthermore, we assessed the suitability of combinations of these predictor variable sets using an additive procedure, which resulted in 11 final models (Figure 2).First, every predictor set was used separately, followed by the combination of two and three predictor sets.Subsequently, two approaches were followed: (1) combinations of statistically downscaled variables with remotely sensed variables; and (2) combinations of exclusively remotely sensed variables.
To test the potential of surface temperature to substitute downscaled climate data, predictor sets TOPO and PHENO were combined with either CLIMATE or LST (Figure 2).
Remote Sens. 2018, 10, x FOR PEER REVIEW 4 of 20 derived from MODIS Land Cover Dynamics data (PHENO) [35]; and (4) annual cycle parameters derived from MODIS Land Surface Temperature data (LST) [36].Furthermore, we assessed the suitability of combinations of these predictor variable sets using an additive procedure, which resulted in 11 final models (Figure 2).First, every predictor set was used separately, followed by the combination of two and three predictor sets.Subsequently, two approaches were followed: (1) combinations of statistically downscaled variables with remotely sensed variables; and (2) combinations of exclusively remotely sensed variables.
To test the potential of surface temperature to substitute downscaled climate data, predictor sets TOPO and PHENO were combined with either CLIMATE or LST (Figure 2).All predictor variables were tested for multicollinearity using Spearman's rank correlation, since high collinearity might lead to low model performance and wrong interpretations [46].Regarding climate variables, only ecologically relevant variables were included, which represent general patterns and annual climatic variability in the Himalayan mountains.We calculated pairwise correlations, resulting in a small set of predictor variables (rs ≤ 0.7 according to [46]) (Table 1).All predictor variables were tested for multicollinearity using Spearman's rank correlation, since high collinearity might lead to low model performance and wrong interpretations [46].Regarding climate variables, only ecologically relevant variables were included, which represent general patterns and annual climatic variability in the Himalayan mountains.We calculated pairwise correlations, resulting in a small set of predictor variables (r s ≤ 0.7 according to [46]) (Table 1).

Chelsa Climate Data
Chelsa climate data (CLIMATE) are based on quasi-mechanistical statistical downscaling of the ERA-interim global circulation model with Global Precipitation Climatology Centre and Global Historical Climatology Network bias correction, and a resolution of 30 arc seconds (for details, see [33]).Precipitation amounts, estimated under consideration of orographic factors such as wind fields, valley exposure, and boundary layer height, showed high precision compared to precipitation data from other climate data sets [33].
Based on the gridded monthly fields of temperature and precipitation at a resolution of 30 arc seconds, we generated 19 bioclimatic variables (Table 1), which are widely used in species distribution modelling and represent annual characteristics (e.g., Temperature Annual Range), seasonality (e.g., Precipitation Seasonality) and extreme environmental factors (e.g., Precipitation of Driest Month) [47].In addition, the variables Average Precipitation of May and Average Precipitation of March, April, and May were calculated in order to account for potential premonsoon drought stress [41,48,49].The selected bioclimatic variables have proven to be suitable for modelling the potential distribution of B. utilis at smaller spatial scales in the Himalayan mountains [32].

Digital Elevation Model
The topographic data is based on the Digital Elevation Model obtained by the Shuttle Radar Topography Mission (SRTM) [34], which employed interferometric synthetic aperture radar, and which is considered here as a remote sensing dataset as well.The data was aggregated to a 1 km grid to calculate Slope angle and Slope aspect using SAGA GIS [50], complementing the set of potential explanatory variables.Since Slope aspect is a circular variable, it was converted into two separate continuous quantitative variables (i.e., Northness and Eastness).

MODIS Land Cover Dynamics
We chose MODIS Land Cover Dynamics (PHENO) product MCD12Q2 as it provides eight parameters that can be linked to phenological events and plants' phenology [35].The data was obtained from the NASA Land Processes Distributed Active Archive Center [35].We compiled time series with a spatial resolution of 500 m × 500 m from 2000 to 2013 using the 16-day composite.Long term means of the annual metrics were used to reduce the effect of the interannual variability.Four variables providing cardinal phenophase transition dates at annual time steps were selected (Onset Greenness Increase, Onset Greenness Maximum, Onset Greenness Decrease, and Onset Greenness Minimum).These dates correspond to the timing of vegetation green-up, maturity, senescence, and dormancy, respectively.Furthermore, we selected the Enhanced Vegetation Index (EVI [51]).The first two EVI metrics (EVI Onset Greenness Min and EVI Onset Greenness Max) correspond to the EVI value of green-up and dormancy onset dates.The third metric records the sum of fitted daily EVI values during the identified vegetation cycle (i.e., Onset Greenness Increase to Onset Greenness Minimum)-EVI Area.For further information on the calculation of each PHENO predictor, see Zhang et al. [52,53] and Ganguly et al. [54].

MODIS Land Surface Temperature
MODIS Land Surface Temperature (LST) offers a unique archive of daily LST observations in 1 km resolution [55].However, LST is a spatially and temporally highly variable quantity, and moreover affected by plenty of gaps resulting from cloud coverage.Therefore, we use long time series rather than few observations to model the long-term seasonality of LST.A model is fitted for the annual temperature cycle resulting in parameters that describe the annual temperature variation [36,56].Here, we use a model with three parameters, which is sufficient for regions outside the tropics [57].MAST is the mean annual land surface temperature for 2003-2014, YAST is the mean annual amplitude of the land surface temperature for 2003-2014, and THETA is the phase shift in days relative to spring equinox on the northern hemisphere.Additionally, we used NCSA, which represents the number of clear-sky acquisitions.These parameters represent a very robust estimate of LST and its annual dynamics but can also be used to estimate LST for any day of the year.The parameters were derived globally based on collection 5 of the MODIS daily level 3 global 1 km grid product from EOS Terra and Aqua (MOD11A1 and MYD11A1)-separately for night time and for day time (see [36] for details on calculation).For this study, these four variables were reprojected to the target grid and directly considered as predictor variables.

Modelling Algorithm
Generalized linear models (GLM) were applied since they represent a classical and robust approach to analyse presence and absence data [58].Major advantages of GLMs over more complex machine-learning algorithms (e.g., random forest) include predictions which are easily interpretable and not "black box" predictions.We used the iterative weighted linear regression technique to derive maximum likelihood estimates of the response variable, with observations distributed according to exponential family and systematic effects [59].We calculated GLMs with binomial distribution, logit-link function, and polynomial terms of second order [60], but did not include interaction terms among predictor variables.Prior to the modelling, stepwise variable selection in both directions (i.e., forward and backward) was applied using the Akaike Information Criterion (AIC) [61], resulting in the model possessing the lowest AIC value [62,63] for each predictor variable set respectively.

Pseudo-Absence Selection
As GLMs need presence and absence points, pseudo-absence points were generated.For study area selection, we used a convex hull, covering the full extent of the known occurrences of B. utilis distribution in the Himalayan region.By limiting the study area, large regions where the species cannot occur were excluded in further statistical analyses in order to prevent overpredicting the distribution range of the species [64].For random selection of pseudo-absences, the limits were set as 5 km from the nearest occurrence, resulting in total 10,000 pseudo-absences (following the pseudo-absence selection procedure for GLMs described by Barbet-Massin et al. [65]).

Model Evaluation
For model validation, all presence and pseudo-absence points were split into training and testing data samples with a ratio of 80:20 percent using random sample splitting [66].For each predictor variable set respectively, we repeated this procedure five times, resulting in five versions of the model and accuracy metrics, which were finally averaged.Due to the lack of a universally valid model evaluation measurement, we applied several performance evaluation metrics.Calculated evaluation measures included the AIC, the Area Under the Curve (AUC), Cohen's Kappa, and the coefficient of determination for explained variance in the data set (Pseudo-R 2 ; [67]).We calculated the root mean square error (RMSE) in order to account for overfitting of the data (i.e., RMSE should be very similar between training and testing data sets), indicating a good fit of all models.Moreover, visual inspection on spatial patterns of the predictions was conducted since evaluation parameters may perform well in climatic space of the model, but not in geographic space (i.e., spatial prediction).
We calculated variable importance to evaluate variable contribution in the final models for each predictor variable set respectively.All statistical analyses were performed using the programming language R [68], maps were produced using ArcGIS [69].

Model Evaluation and Comparison
Using an additive approach with four predictor sets resulted in 11 final models (Table 2) which showed substantial differences with regard to evaluation measurements.Regarding all different evaluation metrics and predictor variables, CLIMATE + TOPO + PHENO performed best, followed by CLIMATE + TOPO and LST + TOPO + PHENO, while the PHENO model exhibited the poorest performance in predicting the distribution of Betula utilis.CLIMATE + TOPO + PHENO possessed the lowest AIC with 535, followed by CLIMATE + TOPO (577) and LST + TOPO + PHENO with 594.High AUC values could be observed for all models ranging between 0.96 and 0.92, whereas the PHENO model showed a lower AUC value with 0.77.Performance ranks for Cohen's Kappa were as follows: CLIMATE + TOPO + PHENO: 0.72, Climate + Topo: 0.66, CLIMATE + PHENO: 0.66, and LST + TOPO + PHENO: 0.66, whereas PHENO ranked last (0.01).
The explained variance was highest for the CLIMATE + TOPO + PHENO model with 69%, followed by CLIMATE + TOPO and LST + TOPO + PHENO with 65%.The model solely based on downscaled climate parameters (CLIMATE) explained 56%, while the model solely based on remotely sensed land surface temperature (LST) explained 41% of the variance in the test sets and the PHENO model 15%.Generally, CLIMATE always performed better than LST, except for explained variance for the combination with phenological traits (CLIMATE + PHENO: 0.63; LST + PHENO: 0.64).However, LST benefitted more from the addition of further predictors than CLIMATE (increase of 24% compared to 13% explained variance).In combination, LST + TOPO + PHENO performed better than CLIMATE.
Values for RMSE revealed no overfitting between the training and testing data sets for all models.

Variable Importance
We selected 5 climatic variables, 2 topographic variables, 4 land cover metrics, and 4 land surface temperature variables out of 40 potential predictor variables.
In the following, only CLIMATE, CLIMATE + TOPO + PHENO, LST, and LST + TOPO + PHENO will be considered (Figure 3; see Supplementary Material Figure S1 for variable importance of all models).Relative variable importance varied among the four subsets of predictor variables.However, a few general characteristics became evident.Regarding climatic variables (i.e., CLIMATE), Precipitation of Coldest Quarter (bio19) was the most important predictor, followed by Temperature of Wettest Quarter (bio8, i.e., temperature of growing season), Temperature Annual Range (bio7), and Average Precipitation of March, April, and May (prec_mam), whereas Precipitation Seasonality (bio15) had lowest variable importance.Among the topographical variables, the highest importance was found for Slope.Phenological traits derived from PHENO data were always lower in variable importance, whereas temporal metrics, e.g., Onset Greenness Increase (Green_Inc), were superior to the Enhanced Vegetation Index (EVI) metric.
For the LST model, differences in variable importance were low between all variables, whereas highest variable importance was found for YAST (mean annual temperature).Differences in importance were striking between the predictor variable sets of the LST + TOPO + PHENO model.The highest value of variable importance was found for Slope, followed by YAST (mean annual temperature), NCSA (number of clear-sky acquisitions), MAST (mean annual amplitude of temperature), and THETA (phase shift in days relative to spring equinox on the northern hemisphere), whereas differences in variable importance were rather low between LST-related variables.For phenological traits, EVI showed highest variable importance.
Remote Sens. 2018, 10, x FOR PEER REVIEW 9 of 20 hemisphere), whereas differences in variable importance were rather low between LST-related variables.For phenological traits, EVI showed highest variable importance.

Ecological Niche Models
The continuous predictions of the models showed noticeable differences (Figure 4).According to the model evaluation measurements, CLIMATE + TOPO + PHENO performed best, followed by LST + TOPO + PHENO, whereas CLIMATE was superior to LST.
It becomes apparent that the core distribution of B. utilis was predicted in the western part of the Himalayan mountain system, whereas only the LST model predicted a principal distribution in the central part of the mountains.All models showed a uniform distribution along the Himalayan arc.The habitat predicted by CLIMATE tends to be wider in range compared to the other predictions.
Displaying differences in the model predictions in detail (Figure 5), the overall appearance (Figure 4) is better elucidated.The model solely based on climate predictor variables (CLIMATE, Figure 5a) roughly met the lower limit of occurrences compared to the CLIMATE + TOPO + PHENO model (Figure 5c), but overpredicted the uppermost limits of B. utilis.Overall, the prediction appears blurry and the broadleaved deciduous treeline could not be distinguished from other vegetation formations (Figure 5a).The CLIMATE + TOPO + PHENO model differentiates between slope structures, and clearly delimits the lower distributional range of B. utilis.At higher altitudes, occurrence probability decreases and diminishes (Figure 5c).Comparing CLIMATE + TOPO + PHENO models with LST + TOPO + PHENO-based models, similar patterns were observed, whereas the distributional range is predicted more constrained in the latter, leaving a smaller distribution range of B. utilis (Figure 5c,d).
For the model built with remotely sensed temperature-related variables only (LST, Figure 5b), similar patterns compared to CLIMATE (Figure 5a) could be found.The distribution appears coarse and fuzzy, compared to CLIMATE + TOPO + PHENO and LST + TOPO + PHENO models (Figure 5c,d).

Ecological Niche Models
The continuous predictions of the models showed noticeable differences (Figure 4).According to the model evaluation measurements, CLIMATE + TOPO + PHENO performed best, followed by LST + TOPO + PHENO, whereas CLIMATE was superior to LST.
It becomes apparent that the core distribution of B. utilis was predicted in the western part of the Himalayan mountain system, whereas only the LST model predicted a principal distribution in the central part of the mountains.All models showed a uniform distribution along the Himalayan arc.The habitat predicted by CLIMATE tends to be wider in range compared to the other predictions.
Displaying differences in the model predictions in detail (Figure 5), the overall appearance (Figure 4) is better elucidated.The model solely based on climate predictor variables (CLIMATE, Figure 5a) roughly met the lower limit of occurrences compared to the CLIMATE + TOPO + PHENO model (Figure 5c), but overpredicted the uppermost limits of B. utilis.Overall, the prediction appears blurry and the broadleaved deciduous treeline could not be distinguished from other vegetation formations (Figure 5a).The CLIMATE + TOPO + PHENO model differentiates between slope structures, and clearly delimits the lower distributional range of B. utilis.At higher altitudes, occurrence probability decreases and diminishes (Figure 5c).Comparing CLIMATE + TOPO + PHENO models with LST + TOPO + PHENO-based models, similar patterns were observed, whereas the distributional range is predicted more constrained in the latter, leaving a smaller distribution range of B. utilis (Figure 5c,d).
For the model built with remotely sensed temperature-related variables only (LST, Figure 5b), similar patterns compared to CLIMATE (Figure 5a) could be found.The distribution appears coarse and fuzzy, compared to CLIMATE + TOPO + PHENO and LST + TOPO + PHENO models (Figure 5c,d).

Discussion
Modelling ecological niches and species distributions in remote high mountain regions like the Himalayas are challenging tasks.Current studies in the field of plant species distribution modelling in the Himalayan mountains mainly use climatic variables to predict species' distribution or to forecast species range shifts under climate change scenarios (e.g., [7,32,41,[70][71][72][73][74][75][76][77]).With regard to Betula utilis, reasonable results were obtained using solely climate for predicting the potential distribution [9,32].However, the necessity arose for approximating the actual distribution of B. addressed in this study by evaluating the benefits of incorporating remotely sensed data into the modelling approach.

Modelling the Ecological Niche of Betula utilis
We found that models combining climate-related variables with remotely sensed topography and phenological traits (CLIMATE + TOPO + PHENO and LST + TOPO + PHENO) outperformed models using only one predictor variable set (CLIMATE and LST).The latter revealed no concise distinction between the vegetation formations (Figures 4 and 5a,b), which indicates the need for further explanatory variables when modelling a species' realized niche on a large scale, i.e., the entire Himalayan mountain system.
We found that all models predicted suitable habitats for B. utilis as a more or less narrow line stretching along the Himalayan arc, featuring prominently in the western parts.All models predicted lower suitability in the eastern parts of the mountain system, a result which coincides with the occurrence data, and which is in line with the fact that due to the more maritime climate in the east, B. utilis loses its dominance as a treeline species in favour of evergreen Rhododendron species [30].Model evaluation revealed highest performance for models built on bioclimatic variables combined with remotely sensed topography and phenology data (CLIMATE + TOPO + PHENO), emphasizing the improvement achieved by incorporating remotely sensed land cover data for predicting the distribution of B. utilis in the Himalayan mountains.Information on the phenological traits (Greenness Increase, and Greenness Decrease) turned out to be beneficial in refining the ecological niche of B. utilis (e.g., demarcation of other vegetation formations and exposed rocks).
In addition, models built solely on remotely sensed data (LST + TOPO + PHENO) suggested that phenological traits are highly relevant for modelling a more realistic distribution range of the species.The findings indicate that models built on remote sensing data yield promising predictions, which may be of interest in remote areas like the Himalayan mountains due to free availability, global coverage, and fine-scale resolution (<1 km).However, temperature variables alone (LST) are not fully capable of predicting the distribution range, as they predicted primarily suitable habitats in the central part of the mountain system and some artefacts (i.e., lakes) on the Tibetan plateau.This is to be attributed to the fact that surface temperature is, apart from atmospheric conditions, also affected by land surface characteristics.Moreover, LST only has a weak proxy for precipitation, namely the number of cloudy/cloud-free days.Consequently, the results using LST + TOPO + PHENO show lower model performance compared to the CLIMATE + TOPO + PHENO model, which incorporates precipitation-related and more plant growth specific variables.However, interestingly the LST model benefits much more from additional predictor sets (i.e., TOPO), indicating a certain redundancy between bioclimatic and topographic variables.This is plausible since topographic features are used in the downscaling process of the Chelsa data [33].
Globally, climate governs global patterns of land cover [78], but land cover and climate are not fully independent [13].Our results confirm topoclimatic variables as the main drivers behind the distribution range, whereas phenological traits substantially contribute to narrow the modelled ecological niche of B. utilis (i.e., more realistic distribution).Similar results were obtained by Parra et al. [79] and Buermann et al. [17] when predicting species distribution across the Amazonian and Andean region.In a hierarchical scheme of environmental controls on species distributions, climatic variables are large-scale determinants, followed by geology, topography, and land cover, which moderate many of the effects of macroclimatic variables [13,16].Our predictions of the CLIMATE + TOPO + PHENO model are consistent with the distribution pattern documented in several vegetation maps showing a narrow band of birch forests forming the upper treeline on north-facing slopes (e.g., [80,81]).However, the actual distribution might be smaller, since topoclimate variables and phenological traits are not the only factors determining habitat suitability.Although not considered in this study, interactions of a whole of site factors such as ecology of tree species, site history, current biotic interactions, and anthropogenic influences affect treeline species' spatial distributions [1,30].

Ecological Interpretation of Predictor Variables
We found two temperature-and three precipitation-related variables (CLIMATE) among the most important for predicting the potential distribution of B. utilis.Mean Temperature of the Wettest Quarter and Temperature Annual Range were most relevant among the temperature-related variables.Growing season air and soil temperatures are considered key factors controlling tree growth at treelines and elevational position of treelines at global scales [1,2].Much lower growing season temperatures (i.e., Mean Temperature of the Wettest Quarter) and higher average winter temperature (January) in the eastern Himalayas result in lower seasonal temperature variation which favours evergreen Rhododendron species over the deciduous species B. utilis at treelines.In the eastern Himalayas, B. utilis becomes less competitive and evergreen broadleaved species (Rhododendron spp.) are the principal treeline species [30].By contrast, higher degrees of continentality with higher mean temperatures of warmest months and severe winter coldness lower the competitiveness of Rhododendron spp. at treeline elevations in the western and northwestern Himalayas, and contribute to the higher competitive strength of B. utilis on north-facing slopes.
Precipitation Seasonality, Precipitation of the Coldest Quarter, and Average Precipitation of March, April, and May were the precipitation-related variables, with Precipitation of the Coldest Quarter having the highest variable importance.Precipitation and related factors such as soil moisture and soil nutrient availability can be significant for treeline formation and dynamics [82,83].Thus, precipitation-related variables potentially limit the climatic space of treeline tree species.The period from November to January was identified as the coldest quarter.Higher winter snowfall in the more continental western parts of the Himalayan region obviously contributes to the competitiveness of B. utilis.Monsoonal summer rains, on the other hand, are of less significance.
Regarding temperature-related remotely sensed variables, all variables proved to be of great importance, but mean annual temperature amplitude (YAST) revealed highest variable importance.Generally, temperature is strongly negatively correlated with altitude (except for cold air inversions in winter months), allowing little room for variation in regional-scale climate data sets [33,84].This, in turn, leads to similar results for LST + TOPO + PHENO, compared to CLIMATE + TOPO + PHENO models, where additional information on phenological traits refined the predicted niche of B. utilis (Figure 4).Interestingly, the number of cloud-free acquisitions NCSA also showed considerable variable importance, indicating that it, despite being a rather poor proxy, contains some information about precipitation.
Furthermore, MAST (mean annual temperature) and THETA (phase shift in days relative to spring equinox) were revealed to be important when modelling the ecological niche of B. utilis.We conclude that THETA, as it represents heat accumulation, shows sensitivity to seasonal snow cover.Vegetation analyses [30,85,86] showed that thickness and duration of snow cover providing sufficient soil moisture at the beginning and at the end of the growing season is one of the principal factors controlling the distribution of B. utilis forests.
The inclusion of topographic variables like Slope led to an improved prediction of climate-only models (both CLIMATE and LST).This is not surprising, since in topographically highly complex regions like the Himalayas, areas can be predicted as climatically suitable but might be inaccessible due to the slope angle.Betula forests thrive on humid, shady slopes with deeply weathered podzolic soils, and are more or less absent from south-facing slopes, in particular in the more continental W Himalaya [31,32,85].
Due to the steep gradient of hydrothermal conditions in the Himalayas, ranging from subtropical at lower to alpine conditions at higher elevations, diverse vegetation formations are formed, characterized by changing phenological traits.Our results provide compelling evidence that the climate-only models (both CLIMATE and LST) could be improved with remotely sensed phenological traits.Differences in variable importance were found between CLIMATE + TOPO + PHENO and LST + TOPO + PHENO models (Figure 3), the most important phenological traits were Onset Greenness Increase, Onset Greenness Decrease, and the Enhanced Vegetation Index (EVI).Based on these variables, a distinction between evergreen coniferous forests below the birch belt and the evergreen broadleaved krummholz belt and Rhododendron dwarf thickets above could be achieved (Figure 5c,d).
The CLIMATE + TOPO + PHENO model revealed the importance of two temporal metrics.The timing of increasing greenness (Onset Greenness Increase) of the B. utilis belt differed significantly from the vegetation formations located below (Kruskal-Wallis Test; p ≤ 0.05).The median green-up date for B. utilis was at 141 Julian days after the snow melt, reflected in the predictor variable Average Precipitation March, April, and May, when sufficient soil moisture for foliation is available.The evergreen coniferous vegetation below had a much earlier green-up date (125 Julian days), whereas the evergreen krummholz belt above had a similar green-up date as the B. utilis belt (141 Julian days).Comparing the averaged dates of vegetation senescence (Decrease of Greenness), no significant differences could be found between the three vegetation formations (Kruskal-Wallis Test; p ≥ 0.05).Nevertheless, the B. utilis belt undergoes senescence earlier (217 Julian days) than the evergreen vegetation formations (after 222 Julian days).
The LST + TOPO + PHENO model showed considerable variable importance of the EVI, which is a convenient measure of plant phenology computed from MODIS surface-reflectance data.The EVI identifies vegetation growth, maturity, and senescence, and thus marks seasonal cycles [51].The applied EVI identifies the vegetation cycle and records sums of the onset of greenness and of minimum greenness.Hence, at the start of the growing season of evergreens, coniferous vegetation in lower altitudes is earlier, while the growing season ends up later compared to deciduous broadleaved vegetation.This results in the highest median EVI value (37) for vegetation below the treeline, whereas the EVI values for the B. utilis belt (24) and for vegetation formations above the treeline (17) were lower.The amount of light reflected from leaves at visible and infrared wavebands is determined by leaf traits and physiological performance [87].The information of distinctly different phenological characteristics between the vegetation types was causal to the refinement of the niche models.Furthermore, these seasonal variations can be used to track changes in vegetation phenology [88], whereas shifts in seasonal phenological events are among the first responses at plant and ecosystem levels to climate change [89].Shifts of flowering to earlier dates have been reported for Rhododendron species [90], and earlier green-up data resulting in an extension of the growing season [91,92] have been reported for the Himalayas.In this context, investigations of underlying climatic factors and quantification of changing plant phenological traits may provide the basis for efficient nature conservation management, expansion of protected areas, and appropriate habitat restoration strategies.

Application of Remote Sensing Data for Modelling Species' Distributions
The availability and the quality of input parameters determine model performance.Often, standardized statistically derived parameters do not fully reflect the species' physiological needs and habitat requirements.Abiotic and biotic data derived from remote sensing may open up new opportunities in analysing and modelling species' distributions, since they provide response and predictor variables.In an extensive literature review, He et al. [23] and references therein present countless applications of remotely sensed data for modelling species' distributions.In general, many remote sensing products are adaptable for modelling biota and can be customized in accordance with the study aims.
In the present study, we highlight the benefits of remotely sensed data in deriving tree species occurrences and predictor variables.The potential for future studies lies in the generation of presence and absence data sets, which are highly required in ENM [93].Due to unique biophysical properties, hyperspectral sensors can detect subtle differences in reflectance based on unique plant chemistries, which is beneficial for identification of plant species [23].Another advantage is the possibility to incorporate biotic interactions into the models, which are often disregarded due to data limitations [94,95].
Our results emphasize thermal metrics, which could be beneficially incorporated into further treeline studies in remote mountainous regions as they provide freely accessible, complete, and long-term data.Major advantages of LST-related variables include continuous observations without interpolation and geographical bias and therefore fewer uncertainties [23].Recent studies showed how LST data could improve species modelling studies (e.g., [17,96,97]).These parameters offer numerous possibilities, such as tailored predictors in high resolution.As time series data of vegetation characteristics (i.e., phenological metrics) are becoming more and more available, changing habitat suitability can be estimated and incorporated into the model approach.In this way, knowledge can be generated, which is particularly important for modelling spatial expansion of invasive species, extinction risk assessment, and range shifts under future climate change [23].
We conclude that the current state of information may serve as a baseline for future studies, even though the available data derived from remote sensing technology is rather short term.Restrictions in the practical applicability arise from the fact that high resolution satellite imagery is still often very expensive.On the other hand, the free of cost imagery and software is already available and will become more customary in the future.Our results show that, even with freely available data, software model performance could be improved, indicating the potential for future modelling studies.
Airborne technology is a continually expanding field, and high resolution remotely sensed data will provide more insights into spatial patterns and underlying factors in future modelling studies.

Conclusions
Modelling ecological niches in remote mountain regions like the Himalayas remains a challenging task due to severe data limitations such as availability of high-quality environmental information.Given the respective climatically and topographically complex terrain, models based on climate variables alone predict the potential distribution of species only, since land cover characteristics are omitted.Nevertheless, climatic gradients determine floristic gradients in high mountain regions, and in turn, phenological traits represent niche proxies.We conclude that the addition of remotely sensed topography and phenological traits as predictor variables leads to improved model performances for the current distribution of B. utilis.It is a conspicuous broadleaved deciduous tree species at the treeline in the Himalayan mountains, allowing a clear separation on the basis of phenological traits from adjacent vegetation types which consist mainly of evergreen coniferous and evergreen deciduous species in the tree layer.It becomes obvious that the inclusion of remotely sensed topography and phenological traits results in a more constrained predicted niche, compared to solely climate based models.Our results underline the relevance of phenological traits to reduce the gap in modelling studies between potential and actual distributions of species over vast, remote, and heterogeneous mountain regions.However, the actual distribution of B. utilis might be smaller than predicted, since topoclimate variables and phenological traits are not the only factors determining habitat suitability, and the resolution of 1 km 2 is often too coarse to account for small-scale landscape characteristics such as varying aspects.
We conclude that the approach of substituting bioclimatic variables with annual temperature cycles from remotely sensed LST data is promising, in particular in regions where sampling efforts are low and sufficient fine-scale climate data are not available.
We further conclude that remotely sensed data make a valuable contribution when modelling the current distribution of B. utilis, since they provide long-term, fine-scale, and freely available observations.Our results emphasize the need for high-resolution data when modelling the actual distribution of treeline species in order to account for the heterogeneous terrain and microclimate.The incorporation of remotely sensed temperature derivates expands the classical approach (i.e., bioclimatic variables and topography) and shows great potential to derive more tailored variables (e.g., temperature of the growing season, precipitation amounts, snow cover) for ecological niche modelling.In future studies, further improvement of ecological niche models could be achieved by incorporating high-resolution remote sensing data.

Figure 2 .
Figure 2. Overview of the predictor sets used in the modelling procedure for estimating the ecological niche of Betula utilis.

Figure 2 .
Figure 2. Overview of the predictor sets used in the modelling procedure for estimating the ecological niche of Betula utilis.

Figure 3 .
Figure 3. Variable importance of the models using four different predictor variable sets: Chelsa climate data and Land Cover Dynamics data (CLIMATE + PHENO); Chelsa climate data (CLIMATE); Land Surface Temperature and Land Cover Dynamics data (LST + PHENO); Land Cover Dynamics data (PHENO) and Land Surface Temperature (LST) For variable description, seeTable 1 and for variable importance of all models Supplementary Material Figure S1.

Figure 3 .
Figure 3. Variable importance of the models using four different predictor variable sets: Chelsa climate data and Land Cover Dynamics data (CLIMATE + PHENO); Chelsa climate data (CLIMATE); Land Surface Temperature and Land Cover Dynamics data (LST + PHENO); Land Cover Dynamics data (PHENO) and Land Surface Temperature (LST) For variable description, seeTable 1 and for variable importance of all models Supplementary Material Figure S1.

Figure 4 .
Figure 4. Continuous predictions of the models using four different predictor variable sets: Chelsa climate data (CLIMATE); Land Surface Temperature (LST); and both combined with Topography (TOPO) and Land Cover Dynamics data (PHENO).For continuous prediction of all models, see Supplementary MaterialFigure S2.

Figure 5 .
Figure 5. Detailed excerpt of continuous predicted occurrence probability using four different predictor variable sets: (a) Chelsa climate data (CLIMATE); (b) Land Surface Temperature (LST); (c) Chelsa climate data, Topography and Land Cover Dynamics data (CLIMATE + TOPO + PHENO); (d) Land Surface Temperature, Topography and Land Cover Dynamics data (LST + TOPO + PHENO).For detailed continuous prediction of all models, see Supplementary MaterialFigure S3.

Figure 4 .
Figure 4. Continuous predictions of the models using four different predictor variable sets: Chelsa climate data (CLIMATE); Land Surface Temperature (LST); and both combined with Topography (TOPO) and Land Cover Dynamics data (PHENO).For continuous prediction of all models, see Supplementary MaterialFigureS2.

Figure 4 .
Figure 4. Continuous predictions of the models using four different predictor variable sets: Chelsa climate data (CLIMATE); Land Surface Temperature (LST); and both combined with Topography (TOPO) and Land Cover Dynamics data (PHENO).For continuous prediction of all models, see Supplementary MaterialFigure S2.

Figure 5 .
Figure 5. Detailed excerpt of continuous predicted occurrence probability using four different predictor variable sets: (a) Chelsa climate data (CLIMATE); (b) Land Surface Temperature (LST); (c) Chelsa climate data, Topography and Land Cover Dynamics data (CLIMATE + TOPO + PHENO); (d) Land Surface Temperature, Topography and Land Cover Dynamics data (LST + TOPO + PHENO).For detailed continuous prediction of all models, see Supplementary MaterialFigure S3.

Figure 5 .
Figure 5. Detailed excerpt of continuous predicted occurrence probability using four different predictor variable sets: (a) Chelsa climate data (CLIMATE); (b) Land Surface Temperature (LST); (c) Chelsa climate data, Topography and Land Cover Dynamics data (CLIMATE + TOPO + PHENO); (d) Land Surface Temperature, Topography and Land Cover Dynamics data (LST + TOPO + PHENO).For detailed continuous prediction of all models, see Supplementary MaterialFigure S3.

Table 1 .
Predictor sets with variables used for modelling the ecological niche of Betula utilis.

Table 2 .
Model evaluation results with regard to several performance measures for five averaged generalized linear model runs based on the four predictor variable sets and their combinations: Topography (TOPO); Chelsa climate data (CLIMATE); Land Cover Dynamics data (PHENO); Land Surface Temperature (LST).The results for training and test data are displayed (training 80% and testing 20%, means of 5 runs).

Table 1
and for variable importance of all models Supplementary Material FigureS1.

Table 1
and for variable importance of all models Supplementary Material FigureS1.