Modelling Distributions of Rove Beetles in Mountainous Areas Using Remote Sensing Data

: Mountain ecosystems are biodiversity hotspots that are increasingly threatened by climate and land use/land cover changes. Long-term biodiversity monitoring programs provide unique insights into resulting adverse impacts on plant and animal species distribution. Species distribution models (SDMs) in combination with satellite remote sensing (SRS) data offer the opportunity to analyze shifts of species distributions in response to these changes in a spatially explicit way. Here, we predicted the presence probability of three different rove beetles in a mountainous protected area (Gran Paradiso National Park, GPNP) using environmental variables derived from Landsat and Aster Global Digital Elevation Model data and an ensemble modelling approach based on five different model algorithms (maximum entropy, random forest, generalized boosting models, generalized additive models, and generalized linear models). The objectives of the study were (1) to evaluate the potential of SRS data for predicting the presence of species dependent on local-scale environmental parameters at two different time periods, (2) to analyze shifts in species distributions between the years, and (3) to identify the most important species-specific SRS predictor variables. All ensemble models showed area under curve (AUC) of the receiver operating characteristics values above 0.7 and true skills statistics (TSS) values above 0.4, highlighting the great potential of SRS data. While only a small proportion of the total area was predicted as highly suitable for each species, our results suggest an increase of suitable habitat over time for the species Platydracus stercorarius and Ocypus ophthalmicus , and an opposite trend for Dinothenarus fossor . Vegetation cover was the most important predictor variable in the majority of the SDMs across all three study species. To better account for intra- and inter-annual variability of population dynamics as well as environmental conditions, a continuation of the monitoring program in GPNP as well as the employment of SRS with higher spatial and temporal resolution is recommended.


Introduction
Mountain ecosystems are biodiversity hotspots with higher species richness and levels of endemism than adjacent lowlands as a result of steep environmental gradients over short distances that lead to topographic, geologic, and climatic heterogeneity [1,2]. Such physiographic complexity creates a mosaic of habitats and, hence, a multitude of ecological niches. Furthermore, spatial isolation of mountain ridges by environmentally very different lower elevations enhances the segregation of populations and, potentially, speciation. Mountains also play a key role in providing ecosystem services [3], e.g., they supply half of the global human population with fresh water [4]. At the same time, mountain regions are especially sensitive to the impacts of a changing climate as seen by the shrinking of glaciers, changes in water provisioning, and the increase of extreme events such as avalanches and landslides [5].
To understand the impacts of these changes on plant and animal species distributions, longterm biodiversity monitoring initiatives are needed to develop adapted conservation measures [6]. While data collection is a time-consuming endeavor by itself, accessibility is an additional major challenge in mountains. However, such in situ data are indispensable for predicting species occurrence probability at larger scale using, e.g., species distribution models (SDMs, see [7] for detailed information). SDMs based on species data stemming from standardized biodiversity monitoring schemes instead of opportunistic observations have been shown to provide higher model accuracy even with a lower sample size [8]. However, not only reliable species presence data, but also geospatial information on ecologically relevant environmental factors is needed. Since ecological niches in mountains are largely influenced by micro-topography [9], geospatial data created by employing spatial interpolation techniques (e.g., for climatic data, [10]) is often not capable of capturing such small-scale differences and heterogeneity.
Besides climatic data, which governs species distribution at coarse resolution, land use/land cover (LULC) is a main determinant of species presence/absence at finer resolutions [11]. However, existing LULC maps are often not thematically detailed enough and the temporal resolution may be inadequate to improve predictions of species distributions [12][13][14]. In this study, we therefore used satellite remote sensing (SRS) data offering great potential to mirror spatially explicit habitat characteristics for use in SDMs [15]. Furthermore, SRS data is often the only consistent source of information on environmental gradients in space and time allowing also retrospective analyses. However, the use of SRS data in mountain regions is challenging, e.g., due to high and persistent cloud and snow cover, strong illumination effects, and often short vegetation periods (see [16] for more detailed information and possible ways forward). Therefore, only a limited number of studies so far has employed satellite-derived parameters for SDMs in mountainous regions (but see [17,18]).
The use of SRS data for assessing taxonomic, structural, and functional biodiversity research has generally been increasing in recent years [19], driven by a free and open access data policy together with rapid sensor developments offering increasing spatial, temporal, and spectral resolution of satellite imagery [20]. Most importantly, the recently available high-resolution and high-frequent data of the Sentinel-2 mission allows a much more detailed characterization of vegetation phenology [21], improved mapping of fine-scaled habitat types [22], and better assessment of nature conservation status [23]. However, despite these promising developments, studies employing SRS data for analyzing temporal trends in biodiversity face a number of limitations in data availability. Due to the availability of a consistent and cross-calibrated multidecadal data record and a long-term systematic data acquisition strategy, Landsat satellite imagery is a unique monitoring tool for biodiversity assessments over time at a high spatial scale [24,25].
The objective of this study was to evaluate the potential of Landsat data for predicting the presence of different rove beetles (order Coleoptera, family Staphylinidae) in the Gran Paradiso National Park (GPNP), a mountainous protected area in northwestern Italy. Specifically, we analyzed species data from two different field campaigns (2006-2007 and 2012-2013) to track changes in time and to test model stability. Staphylinids are key organisms in various terrestrial ecosystems and, together with carabid beetles, ants, and spiders, constitute the vast majority of epigeic mesoarthropods (i.e., invertebrate animals that live on the soil surface) [26]. They are abundant, taxonomically and trophically diverse, occupy a variety of ecological niches, and can be easily sampled; all these characteristics make them an attractive taxon for use as bioindicators [27,28]. Moreover, many rove beetle species respond to and depend on local-scale environmental parameters and resources [27][28][29], representing an important challenge for the application of SRS data. In the present study, we focused on three species with fairly different habitat requirements. To collate a well-constructed set of predictor variables relevant to the biology of the species [30], we derived a set of selected topographic and environmental features addressing the peculiarities of the mountainous region studied. We used a state-of-the-art ensemble modelling framework (i.e., an approach that integrated multiple individual model predictions to obtain a final, more accurate one; [31]) that combined two regression-based methods (generalized linear models (GLM) and generalized additive models (GAM) [32,33]) and three machine-learning methods (random forest (RF), generalized boosting models (GBM), and maximum entropy (MAXENT) [34][35][36]) by means of weighted average, an approach suggested to have higher accuracy than single-algorithm models and to perform best among consensus methods [37]. In particular, we addressed the following questions: • How well can habitat suitability be modelled based on SRS data for rove beetles in mountainous ecosystems? • How and why do SRS-based model predictions differ among years? • What are the most important SRS predictors? How do they differ among species?

Study Area
The study region encompassed the territory of the Gran Paradiso National Park (GPNP, total area of 847.4 km 2 ) located in northwestern Italy (Figure 1a). GPNP is the oldest national park in Italy that gained official conservation status in 1922 and provides habitat for the threatened alpine ibex (Capra ibex). The park is part of the Long-Term Ecological Research (LTER, site code: LTER_EU_IT_109) and of the NATURA 2000 (site code: IT1201000) networks and has been subject to intensive long-term monitoring efforts and research [38].
GPNP is characterized by complex topography with elevation ranging from 670 to 4061 m above sea level (a.s.l.) including different altitudinal vegetation belts (montane, subalpine, alpine, and nival) [39]. Woodlands (20.2% of the territory) are dominated by larch (Larix decidua) and Norway spruce (Picea abies) while species-rich alpine grasslands and pastures (17% of the territory) can be found above the treeline. About 60% of the territory is covered with sparse or no vegetation, bare rocks, or glaciers [38,40]. The climate is alpine-continental, characterized by low annual mean temperatures (ca. 5 °C), high seasonal differences, and a general paucity of precipitation (990 mm/year on average). However, remarkable dissimilarities can be observed among different valleys due to variations in altitude, slope, and aspect. Snow cover is usually present from November-December to March-April [41,42].

Sampling Design for Biodiversity Data and Species Characteristics
Species occurrence data for this study were collected in the context of the long-term monitoring project Monitoring of Animal Biodiversity in Mountain Ecosystems, which is an initiative of the GPNP and the first attempt to develop a protocol for long-term monitoring of multiple taxa in the Italian Alps [39]. The main purpose of this long-term monitoring initiative is to collect information on changes in community composition for multiple taxa [39] to allow analyzing biodiversity changes in time and along elevational gradients. The first monitoring campaign started in 2006-2007 and campaigns (carried out over a two-year period) will be continued every six years. The monitoring scheme encompasses five representative altitudinal transects to cover three vegetation belts (high montane, subalpine, alpine; elevational range 1200-2600 m a.s.l.; Figure 1b). Sampling units are plots (circular areas with 100 m radius, between five and seven per transect, resulting in a total of 30 plots for GPNP, Figure 1c), which are separated by an altitudinal difference of approximately 200 m to allow for data independence. Pitfall traps in each plot were set up along one diameter of the plot with a distance of 50 m from each other (Figure 1d). Species presence/absence data were collected using pitfall traps (plastic cups, diameter of 7 cm, filled with ca. 10 cc of white vinegar and some drops of detergent to break surface tension). The traps were emptied and refilled every two weeks from May to September. Sampled adult specimens were identified at the species level by taxonomic experts who registered presence/absence for each taxon.
Here, we focus on rove beetles (Coleoptera: Staphylinidae) using data from the two monitoring campaigns in 2006-2007 and 2012-2013. Rove beetles have been previously shown to be sensitive to ecosystem modification and anthropogenic impacts [43,44]. Pitfall traps (when being active for a whole season) can be considered suitable for obtaining estimates of presence/absence as well as relative population density and for quantifying population fluctuations in epigeal Coleoptera and allow a standardized and repeatable sampling through space and time [45,46]. Among the 178 species present in the database, we here selected three (Dinothenarus fossor, Platydracus stercorarius, and Ocypus ophthalmicus) for which a sufficiently high number of sampling locations was available, and which were not highly localized within our study area. We chose rather common species for which taxonomy, general habitat requirements, and ecological traits are reasonably well known [27]. From those, we focused on species with taxonomic proximity (same subfamily and tribe, Staphylininae Staphylinini), but characterized by quite different ecological and functional specialization, in particular in terms of macro-and micro-habitat affinity and altitudinal range. For detailed species characteristics see Box 1 and description in Section S1 in the supplementary. Box 1. Characteristics of the Staphylinidae species used for species distribution modelling according to Tagliapietra and Zanetti [47] and Zanetti et al. [48]

Species Presence/Pseudo-Absence Data for Model Training
To train the SDMs, we used the environmental conditions at the trap locations in which individuals of each target species were captured (true presence data), separately for each species. The number of traps and plots in which individuals of each species were captured varied among species and years (Table 1). Furthermore, a few samples were excluded from the analysis as located in areas covered by clouds and snow in the analyzed SRS data. For all species and both monitoring campaigns, we pooled the data collected in 2006-2007 and 2012-2013, respectively, thereby increasing the number of traps and plots in which individuals were captured ( Table 1). The use of data from consecutive years is a common approach in species distribution modelling to account for inter-annual differences in species abundance and occurrence [49]. We initially used all traps where the study species were captured as presence points for fitting the models, but, due to the close proximity of traps within plots, then decided to model each species on plot-pooled data by subsampling only one trap per plot as presence point. For each plot, we selected the trap with the highest number of captured individuals per species, thus ensuring that the site was recurrently frequented by the study species. Subsampling is an effective way of removing pseudo-replicates (i.e., spatially correlated observations) in the dataset. Although this approach further reduced the number of presences available for modelling, previous studies [50,51] showed that accurate SDMs can be fitted on very small numbers of training records. Due to the limited size of our dataset, we decided not to use traps without captured individuals as absence information but to generate pseudo-absence data. Pseudo-absences are commonly used in distribution modelling [52], and they should not be interpreted as sites at which the species was truly absent, but rather as a sample of the environmental variation in the study area [53,54]. Moreover, this approach allowed us to use the true absences as an independent dataset for subsequent model evaluation (see Section 2.7). Specifically, we created 500 stratified random point locations as pseudo-absence data, which were allocated according to the sampling design in terms of trap density at specific altitudinal ranges (Figure 1c; e.g., 6.7% of the traps were located at 1300 m a.s.l. thus 33 out of 500 random points were assigned to the range 1300 ± 50 m). Additional constraints for the locations of these stratified random points were a minimum distance of 30 m between points to match the spatial scale of the analyzed satellite remote sensing data (see Section 2.4.) and the non-existence of points within a buffer area with 500 m diameter around trap locations (for detailed information see Figure S1). This buffer acknowledged the necessity to use reasonable pseudo-absences that are adequately environmentally distant from the presence data (but not outside the area of interest) in order to avoid inflation of AUC (area under curve of the receiver operating characteristics) values [55]. Areas covered by clouds and snow were excluded for allocation of pseudo-absence data.

Satellite Remote Sensing Data
We used multispectral Landsat surface reflectance data provided by the Earth Resources Observation and Science Centre Processing Architecture [56]. Landsat imagery has proven suitable to derive geospatial information on environmental characteristics that affect species distributions in multiple studies [18,57] and has been used to model changes in biodiversity between different time periods [58]. Two Landsat images acquired on 22 September 2006 (Landsat 5-TM) and 9 September 2013 (Landsat 8-OLI) that were consistent with the time period of in situ species data sampling (between May and September) were downloaded. These images were selected to minimize overall cloud cover over the study area (41.6% in 2006 and 25.3% in 2013) and differences between acquisition dates. We topographically corrected the images and calculated the tasseled cap transformation to derive "brightness", "greenness", and "wetness" using sensor-specific coefficients [59,60]. Additionally, we derived the Normalized Difference Vegetation Index (NDVI, [61]) and downloaded the NDVI-based land surface temperature (LST) for each acquisition date [62].
In order to represent altitudinal and topographic gradients, the ASTER Global Digital Elevation Model (GDEM, cell size: 30 × 30 m, [63]) was aligned to match pixel locations of the Landsat imagery. While the use of altitude information has been subject to debate in ecology in general [1] and in species distribution modelling in particular [9,64], we here used altitude as a proxy for climatic parameters since spatially explicit estimates of climate were not available at sufficiently high spatial resolution. In addition, based on the ASTER GDEM, the Terrain Ruggedness Index (TRI, based on 8 adjacent cells within a quadratic array around the center cell [65]), sine of aspect (eastness), cosine of aspect (northness), and slope were calculated. Areas covered by clouds, cloud shadows, and snow were masked out ( Figure S2). To avoid collinearity of explanatory variables in the subsequent analyses, a threshold of |r| > 0.7 was used to exclude variables [66]. As a result, NDVI was not used in the modelling framework due to its strong correlation with altitude. Likewise, brightness and land surface temperature were also excluded as they were highly correlated with northness and wetness ( Figure S3). An overview of the SRS variables used and their ecological relevance is given in Table 2.  [65,70] ASTER GDEM

Wetness (TCT)
Responds to soil/canopy moisture and standing water, being often highest in young forest stands (not sensitive to topography) [73,74] Landsat imagery

Ensemble Modelling and Model Parametrization
Since the choice of model algorithm is a major component of prediction uncertainty in SDMs [37,75], several algorithms were employed and combined in order to estimate and account for prediction uncertainty (ensemble modelling, [31]). We used five modelling methods that are representative for different classes of model algorithms, namely generalized linear models and generalized additive models (GLM and GAM, regression-based methods, [32,33]), random forest, generalized boosting models, and maximum entropy (RF, GBM, and MAXENT; machine-learning methods; [34][35][36]) as implemented in the biomod2 package version 3.3 [76] using R version 3.6.0 [77]. To account for the within-algorithm model variation when different sets of data were used for model fitting, we computed distribution models for each species using 50 repetitions where 70% of the data were used for model training and 30% for model testing. To avoid spatial sorting bias in the cross-validation procedure [78], we used a spatially stratified approach to split the training and the testing data sets, to prevent selecting testing-presence sites close to training-presence sites. Specifically, the full dataset was divided into 50 clusters (one for each model repetition), using kmeans clustering based on the spatial coordinates of presence/pseudo-absence points. For each model run, 30% of the points closest to the center coordinates of the respective cluster were used as testing dataset and the remaining as training dataset. Model outputs were scaled between 0 and 1000 to ensure comparability of model predictions across the different model algorithms. A binomial link function was used to fit GLMs, allowing linear and quadratic terms as well as linear interactions of the explanatory variables. To simplify the full model (i.e., to reduce the number of environmental predictors by removing less important ones), a stepwise backward selection was applied to select final models, based on the Bayesian Information Criterion (BIC) [79]. GAMs were also fitted with a binomial link function, including all seven explanatory variables and using the algorithm as implemented in the mgcv package [80]. The basis dimension of the smooth functions was set to k = 4 to avoid overfitting. For MAXENT, the maximum number of iterations was set to 500 and the 'Auto feature' settings were used while excluding threshold and hinge features to avoid over-parameterization [81]. The regularization parameter β was set to 0.002 to optimize model performance [82]. For RF models we used 500 trees, following previous studies [75], while for GBMs we used the default settings (number of trees = 100, interaction depth parameter = 1, learning rate = 0.1, and subsampling fraction = 0.5) suggested by the gbm package [83]. The AUC was used as evaluation criteria to generate the ensemble predictions from the five different algorithms (see biomod2 documentation for further details). To obtain a relevant combination of several unbiased (i.e., with fair accuracy) models, all individual models with an AUC value <0.7 were discarded, and the ensemble models were constructed for each species and time period by computing the weighted average of all remaining model predictions [52]. The weights were based on the AUC scores of each model, so that better performing models had a higher influence in the final ensemble. This technique has been shown to be one of the best performing consensus methods [37] and is widely used in ensemble applications [52]. To assess shifts in suitable habitat between the two time periods, we defined five classes of habitat suitability based on the presence probability scores extracted from the SDM maps. The five classes were defined by the following thresholds of presence probability scores: 0-200, >200-400, >400-600, >600-800, and >800-1000, ordered from low to high habitat suitability. To compare changes over time, we calculated the percentage of area in each class for the common area not masked by cloud, shadow, and snow coverage in either of the two time periods. Moreover, we calculated Spearman rank correlation coefficients between the SDMs, to assess the correlation of species distributions between time periods and across different species.

Assessing Variable Importance of SRS Data
Variable importance was assessed by comparing the model prediction derived from the original dataset and predictions derived from permuted datasets [76,84]. Permuted datasets were created by randomizing one environmental variable resulting in one dataset for each predictor variable. The predictions of occurrence probabilities resulting from permuted datasets and the original dataset were compared by calculating the Pearson correlation coefficient; a small correlation coefficient thus indicated high importance of the permuted variable. This procedure was repeated three times for each model run, and mean correlation values for each predictor variable were derived across the 50 model repetitions. Values of variable importance were finally calculated by subtracting the correlation coefficient from 1. To ensure comparability between different model algorithms, the relative variable importance was calculated for every variable per algorithm. To understand the effect of single variables on the SDM predictions, response curves showing the sensibility of the model to each variable were plotted for all variables in the ensemble models [76].

Model Evaluation
We used a cross-validation procedure to assess model performance and calculated two evaluation metrics commonly used in species distribution modelling, AUC and the true skills statistics (TSS) [54]. AUC values for SDMs indicate how well the model discriminated between species presence and absence/pseudo-absence data [85]. According to Swets [86], models with AUC values below 0.7 are considered as poorly accurate, those with AUC between 0.7 and 0.9 as useful, and those with values above 0.9 as highly accurate. TSS is a threshold-dependent metric, not affected by the size of the validation set nor by species prevalence [54,87]. TSS is calculated as sensitivity (i.e., true positive rate) + specificity (i.e., true negative rate) -1, and therefore ranges from -1 to +1, where values of zero or less indicate a performance no better than random guessing [54]. According to González-Ferreras et al. [88], model accuracy is poor when TSS values are below 0.4, fair when TSS is between 0.4 and 0.6, good when TSS is between 0.6 and 0.8, and excellent for values above 0.8. Mean AUC and TSS scores were calculated based on the 50 model repetitions for all model algorithms and for the ensemble models. In addition to AUC and TSS, we analyzed and compared model predictions at presence locations and at independent true absence data (i.e., traps in which no individuals of each target species were captured) for each species to examine how well derived maps represented in situ data. Finally, as uncertainty maps are an important tool for communicating model reliability in a spatially explicit way [89], we calculated the difference between the 97.5 th percentile and the 2.5 th percentile of cell values derived from the prediction maps for each species, based on 50 model repetitions.

Model Performance
AUC and TSS values were highly concordant in assessing model performance, and the majority of species prediction models showed satisfactory accuracy levels based on the two evaluation metrics (i.e., AUC > 0.7 and TSS > 0.4, Table 3), and can thus be categorized as useful [86]. Dinothenarus fossor showed the highest number of well-performing models, probably because of the larger number of presence points in both time periods, compared to the other two species (Table 1) In addition to the cross-validation results, the comparison between modelled presence probabilities at presence locations and at absence records not used for model training revealed that the majority of ensemble models had fair discriminatory power in the prediction of suitable vs. unsuitable habitat. For Dinothenarus fossor, the modelled presence probabilities were significantly higher in the ensemble predictions at presence sites compared to absence sites, in both time periods (Figure 2). However, for the other two species, the ensemble models built for 2006-2007 had higher discriminatory power than those built for 2012-2013.

Effects of Inter-Annual Variability of Species Records on SRS-Based Training Data
The inter-annual variability of the plots in which individuals of the three species were captured (see Section 2.3) together with the inter-annual differences in the SRS data themselves (see Section 2.4) had clear effects on the values and range of the SRS-based model training data ( Figure  S4). Variability was generally lower for the variables derived from the ASTER GDEM (altitude, eastness, northness, slope, and ruggedness), which were static data for both time periods. On the contrary, the largest differences were found for greenness and wetness, due to variations in ecosystem conditions and seasonal changes as a result of the differences in acquisition dates (the 2006 imagery was acquired 20 days later than the 2013 data). However, variability cannot solely be attributed to differences between the two Landsat data sets. For example, Ocypus ophthalmicus showed considerable variation also for the ASTER GDEM-based variables between 2006-2007 and 2012-2013. Similar differences can be seen for Dinothenarus fossor with respect to eastness. Overall, the subsampling of the full dataset by retaining only one trap per plot did not significantly reduce the variance and range of the environmental variables, apart from few exceptions ( Figure S4).

Variable Importance
Across species, model algorithms, and years, greenness generally had the highest mean variable importance according to correlation analyses between model predictions based on permuted and original datasets, followed by ruggedness (Table 4). This trend was evident also in the ensemble models, for which greenness and ruggedness were always the most important variables, with the only exception being the model built for Platydracus stercorarius in 2006-2007, in which wetness was the second most important variable after greenness. Slope was of only very minor importance in all models. However, slight differences in variable importance between species and years could be detected. For example, topographic variables like ruggedness and northness were more important for Ocypus ophthalmicus than for the other two species, whereas greenness was less important than for Dinothenarus fossor and Platydracus stercorarius.

Species Distribution Maps
The area of GPNP for which species distributions could be modelled was larger in 2012-2013 due to a smaller proportion of areas covered by clouds, cloud shadows, and snow in the respective Landsat data set. While the regions predicted as potentially suitable varied to a different extent between the species, valleys generally had higher suitability scores than hillsides or high-altitude plateaus (Figure 3). According to our results, the southern and eastern parts of the study area were generally more suitable compared to the remaining regions. While the six SDMs had high correlation coefficients (Table S1), indicating large similarities in the modelled distributions of the three species, we also observed some differences: Ensemble models built for Ocypus ophthalmicus resulted in a higher predicted occurrence probability in the northeastern part of the study area compared to the other species. The southwestern valleys in the GPNP were also modelled as highly suitable for Ocypus ophthalmicus, whereas their modelled presence probability was intermediate for Platydracus stercorarius and low for Dinothenarus fossor (Figure 3).
Visual comparison of the final SDM maps revealed that modelled distribution patterns across time periods were more similar for Ocypus ophthalmicus than for Dinothenarus fossor and Platydracus stercorarius, which is supported by higher correlation values between the correspondent SDM maps (Table S1). High habitat suitability scores (i.e., above 600) were only obtained for a small proportion of GPNP across all species (below 20% of the study area, Table S2). For Dinothenarus fossor, the area with high suitability scores (>600 to 1000) decreased from 2006-2007 to 2012-2013 (16.4% vs. 8.3% ,  Table S2). Contrarily, the highly suitable area increased from 8.9% to 16.4% for Platydracus stercorarius and from 11.3% to 19.1% for Ocypus ophthalmicus between the two time periods. Model uncertainty was generally higher in areas with medium to high presence probability and low in areas with lowest predicted suitability (see Figure 3 and Figure S6), showing a similar spatial pattern for the three species. When comparing time periods, overall uncertainty was higher for Dinothenarus fossor and Platydracus stercorarius in the 2012-2013 models, whereas the trend was opposite for Ocypus ophthalmicus.

Discussion
The objective of this study was to examine the potential of SRS data to produce species distributions maps of rove beetles with different ecological and functional specialization. A large set of studies predicting species distribution of invertebrates relies on habitat parameters recorded in the field as well as bioclimatic variables [90][91][92]. Others also include SRS-derived information on land use/land cover [93][94][95][96] or variables related to topography (e.g., slope, northness, and eastness, [97]). However, to our knowledge the present study is the first attempt to predict species distribution of beetles in a mountainous region entirely based on SRS data.

Model Performances to Predict Invertebrate Species Distribution
The presented SDMs can be rated as useful according to Swets [86] and González-Ferreras et al. [88] based on mean AUC and TSS values obtained in the cross-validation step. When comparing different algorithms, regression-based methods (GLM and GAM) had the lowest mean performance, whereas machine-learning methods (RF, GBM, and MAXENT) were found to be the most accurate. GLMs have been widely used in SDM studies [53], but their inability to capture nonlinear responses might be one reason for their poor performance in our study. Otherwise, our results confirmed the robustness and effectiveness of more recently developed machine-learning techniques: Elith et al. [53] showed better performances of GBM and MAXENT compared to GAM and GLM across different species (plants, birds, mammals, and reptiles) and regions; similarly, Marmion et al. [37] and Grenouillet et al. [98] found RF to have the highest accuracy among several tested single algorithm models (among which GLM and GAM) in predicting plant and fish species distributions, respectively. Araújo and New [31] list RF, GBM, and MAXENT as techniques that already incorporate the notion of ensemble forecasting, which could be one of the underlying factors explaining their higher predictive accuracy over regression methods. While the performance of all model algorithms was negatively affected by the small number of presence points used for model fitting, the selection method of pseudo-absences impacted regression and machine-learning techniques differently: Wisz and Guisan [99] and Barbet-Massin et al. [100] showed that a large number (e.g., 10,000) of randomly selected pseudo-absences improved the performance of GLMs and GAMs, hence our spatially stratified selection approach may have influenced the performance of these model types.
Overall, the variability among the five algorithms confirmed the suitability of the chosen ensemble modelling approach, and the evaluation metrics values identified the ensembles as the highest performing models compared to single algorithm models, as also shown by Marmion et al. [37] and Grenouillet et al. [98]. Nonetheless, the comparison of modelled presence probability scores at true absence and presence points revealed that not all ensemble models had high discriminatory power (Figure 2). In particular, the models built for Platydracus stercorarius and Ocypus ophthalmicus in 2012-2013 tended to overestimate distribution ranges, predicting high occurrence probability also at absence locations. This could be due to the small number of presence points available for model training (18 and 12, respectively, Table 1). In fact, models trained using all traps where beetles were captured as presence points (instead of plot-pooled data) had a consistently higher accuracy in discriminating presences from true absence sites. Moreover, distributions of widespread species, such as Platydracus stercorarius, Ocypus ophthalmicus (see description in Section S1), were more difficult to predict compared to those of rare species [50,51]. Nevertheless, the obtained species distribution maps can be considered as quite reliable based on local ecological knowledge, which is becoming increasingly important in ecological modelling in general [101,102]. In particular, the SDMs met our expectations of more similar distribution patterns for Platydracus stercorarius and Dinothenarus fossor, which are both macro-habitat generalists and strongly dependent on vegetated areas, compared to the predicted distribution of Ocypus ophthalmicus, which is less dependent on vegetation (see description in Section S1).
Regarding the spatial resolution of the employed Landsat images and GDEM (30 × 30 m), we assumed adequacy to predict species distribution of invertebrates as other studies employing SRS data were successfully conducted on an even broader spatial scale so far [93,94] (but see section Suggestions for Model Improvement). Moreover, despite general recognition of the importance of scale selection in SDMs [103], studies addressing scale directly have found equivocal results: Guisan et al. [104] showed that a 10 times lower resolution of environmental layers used in modelling did not severely affect predictions from SDMs, despite a general decrease in model performance when using explanatory variables at coarser grain size. Nevertheless, as ecological processes are also highly dependent on scale, more research in this direction is needed to better understand the complex relationships between the temporal and spatial variability of species presence records and representation of this variability in SRS data.

Temporal Variability of Species Presence Records, SRS Data, and Modelled Distribution Ranges
Inter-annual fluctuations in the number of epigeic beetles are a quite common phenomenon found in many works while not completely understood. Indeed, the number of specimens caught by pitfall traps, and consequently the detectability of the species, are linked not only to animal abundance but also to their activity [105,106]. Movement patterns are in turn highly dependent on different factors like temperature and general weather conditions as well as food supply and individual fitness [107]. Furthermore, variations at the micro-habitat scale, including changes in soil moisture, humidity, and in the occurrence of ephemeral resources like prey availability, significantly affect beetles' abundance, activity, and ultimately detectability [28,108,109]. To reduce yearly variability, we selected species known to be (1) not directly depending on ephemeral resources and (2) nonspecific predators, feeding on various soil arthropods.
The variability of species abundances at trap locations was one reason which led to differences in the values of the SRS-based model predictors at the presence locations between the two time periods (2006-2007 and 2012-2013). The second reason for the observed variations was in the interannual differences in environmental conditions, as greenness and wetness were based on different Landsat images for the two time periods. Specifically, the Landsat image was acquired 20 days later in 2006 compared to 2013, explaining the lower values of greenness ( Figure S4) associated with differences in vegetation phenology [110]. Such differences in the vegetation-related environmental variables between 2006-2007 and 2012-2013 likely had a comparatively stronger effect on the SDMs built for Dinothenarus fossor and Platydracus stercorarius, for which greenness was highly important. The variation in greenness values could explain the larger differences in predicted species distribution of the two species between time periods when compared to Ocypus ophthalmicus.
In addition, the 2006 Landsat image was also more strongly affected by clouds, shadows, and snow, which further reduced the total area available for modelling in both time periods and limited comparisons about temporal changes in species distributions. The effect of acquisition date of SRS imagery on analyzing species patterns was also shown by other studies [111,112], highlighting the drawbacks of using single SRS imagery representing temporal snapshots and not accounting for intra-annual differences of environmental variables. Analyzing multi-temporal SRS data may be a way to overcome these limitations and has been shown to improve SDMs for marsh bird species [113]. SDMs based on longer composite periods (i.e., multiple years) also showed less deviance from observed species presence/absence field data [114]. While being aware of these problems, the number of suitable SRS imagery was unfortunately restricted in our study and presented SDMs could not be improved in this regard (but see section Conclusions). Furthermore, radiometric differences between Landsat 5 TM (Thematic Mapper) and Landsat 8 OLI (Operational Land Imager) images may also have caused differences in species predictions [115]. Nevertheless, radiometric differences were overall small in relation to other influencing factors, i.e., phenological differences between the two acquisition dates, and did not affect our overall outcome.

Importance of SRS Predictors
For all species, SRS-derived greenness was very important. This indicates that vegetated areas are generally preferred by our study species (which is well known at least for Dinothenarus fossor and Platydracus stercorarius, whereas Ocypus ophthalmicus is less dependent on vegetation [47,48,116]). This trend was clearly visible in the variable response curves of all models ( Figure S5), in which the modelled probability of species presence increased at higher greenness values. Moreover, ruggedness was identified as the second most important variable in almost all models, negatively affecting the probability of species occurrence. This indicates that penetrability of the terrain determines species distribution: rugged terrain with large variations in elevation, which in turn affect soil thickness and moisture, was less suitable for rove beetles. While known macro-and micro-habitat requirements are rather different (see Box 1 and description in Section S1), the patterns of variable importance and their response curves were fairly similar for the three study species, with greenness and ruggedness being the two most important variables in all ensemble models but one (the 2006-2007 model for Platydracus stercorarius, in which greenness and wetness scored the highest importance values, see Table 4). This indicates that the SRS variables employed in the present study were not entirely capable of depicting the subtle differences in terms of habitat requirements of invertebrate species.
Apart from ruggedness, other topographic-related variables, such as slope, eastness, and northness, had relatively low importance in all models. The effect of wetness in the SDMs is hard to interpret, as it showed contrasting results in the models´ variables response curves. Wetness had a strong negative effect in the 2006-2007 model built for Platydracus stercorarius, but not in 2012-2013. For the other two species, wetness had lower importance scores and a seemingly negative effect on species occurrence, as should be expected in xerophilous species (see Box 1). Altitude, which is typically inversely related to temperature, affected the SDMs very weakly but in different ways for the three species: While we observed negative effects for Dinothenarus fossor (although only for the model referring to [2006][2007], the tendency was opposite for Ocypus ophthalmicus in both time periods, and for Platydracus stercorarius in 2006-2007. These results are in line with our expectations, as the heliophilous species Ocypus ophthalmicus is known to occupy higher altitudes with sunny places above the tree line and to have a broader altitudinal range compared to the other species (see Box 1 and Section S1).
It is difficult to reflect our findings regarding variable importance with respect to other studies, as only very little research has focused on the analysis of spatial patterns of Staphylinidae, either at the community or at the species level. Indeed, the highest number of bibliographic sources is represented by faunistic works, followed by articles considering the role of Staphylinidae as bioindicators, in forest or in agricultural ecosystems [47,48,117,118]. In any case, some general patterns have been found. In particular, it emerged that Staphylinidae are usually more influenced by local factors than landscape-scale features [119]. Community composition and the presence of single species have often been observed as being determined by micro-climate (soil moisture and temperature) and by habitat-related characteristics (vegetation cover and micro-habitat characteristics like decaying material or soil texture) [27,117,120].

Suggestions for Model Improvement
While the ensemble SDMs outperformed the single algorithm models for all species and years and were thus rated as useful based on the cross-validation results, some suggestions for model improvement can be formulated.
A continuation of the monitoring program in GPNP is recommended (and currently conducted) to assess trends in biodiversity and the impact of environmental changes on species [6,121]. The generation of new species presence and absence information will also improve the performance of SDMs as inter-annual variability in species trapping success can be balanced out. Our decision to train the SDMs on plot-pooled data to avoid issues of pseudo-replication and of spatial correlation reduced our sample size significantly, consequently affecting the discriminatory power of the fitted models. An in-depth analysis of the spectral reflectance characteristics of presence and pseudo-absence data may have further improved model performance as recently shown by Remelgado et al. [122].
The inclusion of additional predictors, such as detailed land cover and grazing intensity maps, could surely improve the models and possibly increase their discernment between the distribution maps of the three species. In our case, these data were unfortunately not available at the right spatial and temporal scale. The availability of environmental variables at finer spatial scale could reduce the problem of spatial autocorrelation between near presence points by enabling the detection of changes in environmental characteristics at a higher resolution, namely within the 50 m of inter-trap distance.
While we here focused on cost-effective explanatory SRS data, recent advances in remote sensing technologies provide the possibility to better account for intra-annual differences in habitat characteristics which likely improve SDMs for our study species in the future. Specifically, Sentinel-1 radar images and Sentinel-2 optical images allow to monitor soil moisture content [123] and vegetation status and composition [124,125] at high spatial resolution and high temporal frequency, respectively. Likewise, high resolution satellite imagery (e.g., PlanetScope) or unmanned aerial vehicles (UAVs) equipped with multi-and hyperspectral sensors as well as laser scanner systems are very promising in this context by providing high resolution data on micro-habitat, vegetation structure, and topography [126,127].

Conclusions
The presented findings have important implications for future studies on species distribution modelling. Our results suggest that single time interval studies (as currently often done in SRSbased SDMs) may lead to false assumptions regarding underlying environmental drivers of species distributions. However, a continuation of monitoring is needed to not only track changes in species distribution patterns but also to better account for inter-annual variations due to natural population dynamics and differences in activity patterns. While in the present study, freely available Landsat images were employed, we detected shortages and corresponding implications for species distribution modelling due to the restricted temporal resolution of this data in mountainous regions characterized by high and persistent cloud cover. The limited number of suitable SRS imagery severely restricted our ability to account for intra-annual and inter-annual environmental variability and in turn limited the explanatory power regarding habitat suitability trends over time. However, recent advances in remote sensing technology can solve these problems in future works. Finally, the derived species distribution maps provide valuable information for conservation endeavors in GPNP, e.g., to identify species hot spots and potential connectivity paths inside the protected area. Furthermore, they support environmental change scenarios to identify potential biodiversity losses.

Supplementary Materials:
The following are available online at www.mdpi.com/xxx/s1, Section S1: Detailed characteristics of analyzed species. Figure S1: Random points to obtain pseudo-absence data for species distribution modelling. Figure S2: Satellite remote sensing input data. Figure S3: Spearman rank correlations of explanatory variable values. Figure S4: Values and ranges of the SRS-based model training data. Figure S5: Variable response curves of ensemble models predictions. Table S1: Spearman rank correlation of species distribution maps. Table S2: Categorization of predicted species distribution. Figure S6: Uncertainty maps of predicted species distribution based on aggregated true presence data.
Author Contributions: Conceptualization: A.D., S.R., R.S., and A.C. designed the study and methodology. Project administration: A.C. Investigation: C.C. and R.V. conducted species sampling in the field, R.S. acquired and processed satellite images. Software: M.E., A.D., S.R., and A.C. wrote and adapted the R code for ensemble modelling. Formal analysis: A.D. and S.R. applied the R code to analyze the data; A.D., S.R., and A.C. prepared figures and tables. Validation: C.C. and R.V. evaluated species distribution maps based on local knowledge. Funding acquisition: A.C., C.C., R.S., and R.V. Resources: A.D., S.R., A.C., C.C., R.S., and R.V. Writing, original draft: All authors wrote the manuscript. All authors have read and agree to the published version of the manuscript.
Funding: This work has been carried out within the H2020 project 'ECOPOTENTIAL: Improving Future Ecosystem Benefits Through Earth Observations' (http://www.ecopotential-project.eu). The project has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No 641762. Parts of the R code used in this study were developed in the context of the project 'Improving species distribution models of endangered plants in Mexico by utilizing remote sensing data and spatial measures of model uncertainty' funded by the National Commission for the Knowledge and Use of Biodiversity of Mexico (CONABIO).