Potential of Airborne LiDAR Derived Vegetation Structure for the Prediction of Animal Species Richness at Mount Kilimanjaro

Abstract: The monitoring of species and functional diversity is of increasing relevance for the development of strategies for the conservation and management of biodiversity. Therefore, reliable estimates of the performance of monitoring techniques across taxa become important. Using a unique dataset, this study investigates the potential of airborne LiDAR-derived variables characterizing vegetation structure as predictors for animal species richness at the southern slopes of Mount Kilimanjaro. To disentangle the structural LiDAR information from co-factors related to elevational vegetation zones, LiDAR-based models were compared to the predictive power of elevation models. 17 taxa and 4 feeding guilds were modeled, and the standardized study design allowed for a comparison across the assemblages. Results show that most taxa (14) and feeding guilds (3) are predicted best by elevation in terms of normalized RMSE values, but only for three of those taxa and two of those feeding guilds is the difference to the other models significant. Generally, modeling performances between different models vary only slightly for each assemblage. For the remaining assemblages, structural information showed at most a small additional contribution to the performance. In summary, LiDAR observations can be used for animal species prediction. However, the effort and cost of aerial surveys are not always in proportion with the prediction quality, especially when the species distribution follows zonal patterns and elevation information yields similar results.


Introduction
We are facing a decrease of global biodiversity [1,2], and the rate of this loss is accelerating with ongoing climate change [3] as well as the rapid transformation of natural habitats by human land use [4]. To mitigate the effects of this biodiversity loss on the functionality of ecosystems [5][6][7], monitoring of species and functional diversity is an important prerequisite for focused management strategies [8][9][10]. To facilitate a unified monitoring system, a set of essential biodiversity variables (EBVs) has been developed in recent years, e.g., [11][12][13]. However, gathering those variables during field campaigns is only possible in a limited number of situations, as area-wide coverage is unfeasible due to high costs as well as the lack of experts. This is particularly true for surveys across large areas with a steep elevation gradient, as complex terrain hinders accessibility [14].
To advance the application of species richness related EBVs from remote sensing, performance must be compared across many taxa. While meta-analyses across different case studies allow some conclusions, the individual study designs, different computations of error estimates and the uniqueness of the study regions make it difficult to actually compare the results regarding the model performance for different taxa. This becomes even more challenging if the study has been conducted in mountainous terrain, which is common for ecological space-for-time approaches, as global biodiversity hotspots often tend to be in mountainous areas [40]. Here, elevational change of the land cover in combination with a fixed amount of work force generally limits the number of ground samples that can be collected within one land cover zone. These individual limitations in training and testing datasets lead to a variety of testing approaches with varying degrees of reliability and comparability of error estimates.
This study analyzes the predictive performance of airborne LiDAR-derived variables for mapping the species richness of 17 taxonomic groups from four feeding guilds in a comprehensive manner. The study area is located at the southern slopes of Kilimanjaro (Figure 1) and field observations stretch from an elevation of 800 to 4400 m. Since the taxonomic assemblages cannot be directly observed using LiDAR, vegetation structure is used as a surrogate for species richness. It is assumed that the taxa or the aggregated feeding guilds can be predicted with varying success from LiDAR data. For example, the "plant diversity hypothesis" links consumer richness (especially herbivores) to plant diversity [41][42][43]. Therefore, it is expected that the performances of structural models decline from herbivores to predators [14,44], as the distance of their position in the food chain to plants increases. Furthermore, as structural properties, at scales accessible for remote sensing, tend to be more relevant for animals with larger body sizes [45][46][47][48], it is expected that the performance of species richness models increases with increasing body size. Similar considerations of correlation between structure and species traits apply for flying taxa that perceive the landscape with a grain not as detailed as walking taxa (environmental grain hypothesis, e.g., Kaspari and Weiser [49], Sarty et al. [50]). The unique dataset also allows for critically evaluating whether LiDAR-derived information brings any gain at all compared to models that rely solely on the known decrease of species richness with elevation or elevation-correlated environmental properties [43]. This study investigates the potential of LiDAR-derived variables characterizing vegetation structure as predictors for animal species richness at the southern slopes of Mount Kilimanjaro. To disentangle the structural LiDAR information from co-factors related to elevational vegetation zones, LiDAR-based models were compared to the predictive power of elevation models.

Study Area and Sampling Design
The study area comprises the southern slopes of Mount Kilimanjaro. In the framework of a comprehensive research unit, study plots of 50 m × 50 m were established across 12 land cover zones with five replicate plots per zone. The 60 plots describe two ecological gradients: an elevational gradient from 800 m a.s.l. to 4400 m a.s.l. and a disturbance gradient from (near) natural to anthropogenic land-cover types within elevation zones (Figure 1 and Table A1).

Diversity Data
On 59 of the 60 study plots, data for estimating the diversity of 17 taxa were available. Sampling followed standardized approaches as described in detail in Peters et al. [43] and Peters et al. [52]. For an overview of the taxa and sampling methods, see Table A2. The species richness data were aggregated (sum of species) to four feeding guilds. See Tables A3 and A4 for the proportional allocation of species per taxon to feeding guild. As true bugs, spiders and springtails were not identified to the species level, the entire group of spiders was assumed to be predators, springtails were counted as decomposers and true bugs were ignored when estimating species richness of feeding guilds.

LiDAR Data
The LiDAR data set was acquired during two missions with a Riegl LMS-Q780 sensor carried by an Airbus helicopter at an altitude between 850 m and 1750 m above ground level. The northern (highland) plots were sampled in March 2015 and the southern (lowland) plots in November 2016 (Figure 1). The temporal distance was assumed to be negligible since both acquisition dates fall into the early rainy season and plots of the same land-cover type are covered within the same flight campaign, except for disturbed Ocotea forests. The LiDAR pulses contain between one and seven returns with a vertical accuracy of 0.15 m and a horizontal accuracy of 0.20 m (95% confidence interval). The mean point density is 34 points per square meter but varies due to terrain and flight conditions. Outliers were removed and points were classified into ground and non-ground following the standard procedure using the LAStools preprocessing software [53]. Rasterized LiDAR layers (e.g., digital terrain model (DTM), digital surface model (DSM), and canopy height model (CHM)) were generated by the open source remote sensing data base (RSDB) at a resolution of 1 m [54]. The resulting DTM preserves fine details in regions with high ground point densities and provides plausible elevation estimates in regions with low ground point densities (e.g., dense forest).
To derive a set of potential predictor variables from the LiDAR observations, several indices, which characterize structural properties of the 50 m × 50 m areas, were computed for each of the plots using the RSDB (see Table 1) [54]. The 97 compiled LiDAR metrics included, e.g., canopy height metrics (maximum, standard deviation, median, quartiles, etc.), return number metrics (maximum, standard deviation of different layers, etc.) and ecological estimates (leaf area index, above-ground biomass and gap fraction, etc.; for the complete list of variables, see Table 1). The land covers in the study area can be grouped into forest and non-forest (see Figure 1 and Table A1 for details). Due to their complex multi-layered structure, forested plots differ considerably from non-forested plots; hence, the sets of LiDAR variables for these two types differ slightly. For the current study, this means that, on non-forested plots, variables describing vegetation layers reached a maximum height of 8 m, and on forest plots, a maximum height of 29 m (indicated in Table 1). The two thresholds correspond to variables where at least 50% of the plots had vegetation at this height. In the following, all modeling approaches were carried out for forested and non-forested plots separately, to account for these fundamental differences.
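The following minimal Python sketch illustrates how plot-level height metrics and the 50% layer threshold described above could be derived. It is not the RSDB implementation: the function names, the 1 m layer height and the input format (a list of return heights above ground per plot) are assumptions made for illustration only.

```python
import statistics
from collections import Counter


def height_metrics(heights):
    """Summary statistics of return heights above ground (m) for one plot."""
    q25, _, q75 = statistics.quantiles(heights, n=4)
    return {
        "max": max(heights),
        "median": statistics.median(heights),
        "sd": statistics.stdev(heights),
        "q25": q25,
        "q75": q75,
    }


def occupied_layers(heights, layer_height=1.0):
    """Indices of the vertical layers that contain at least one return.

    Layer k covers heights from k * layer_height up to (k + 1) * layer_height.
    The 1 m layer height is an assumption of this sketch.
    """
    return {int(h // layer_height) for h in heights if h > 0}


def top_layer_threshold(plots, min_share=0.5):
    """Highest layer index that is occupied on at least `min_share` of the plots."""
    counts = Counter()
    for heights in plots:
        counts.update(occupied_layers(heights))
    common = [layer for layer, n in counts.items() if n / len(plots) >= min_share]
    return max(common) if common else None
```

Applied separately to the forested and the non-forested plots, a rule of this kind would yield one upper layer limit per group, analogous to the 8 m and 29 m limits reported above.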

Predictive Modeling of Diversity
The computations and analyses in this study were performed using the R environment 3.5 in conjunction with the caret package [57,58]. Partial least squares regression (PLSR) is useful for modeling in data settings with a small number of observations relative to the number of predictor variables. It can also handle multicollinearity, a situation that is unavoidable when using LiDAR-derived variables [59][60][61][62]. To reduce the impact of overfitting caused by correlated variables, a forward feature selection (FFS) implemented in the CAST package [63] was used, which ensures a more stable variable selection than recursive feature elimination approaches [64].
To distinguish between effects on species richness predictability based on a pure elevation gradient versus habitat structure, three different model groups (elevation and its square only, structural metrics only, and structural metrics to predict the residuals of the elevation-based model) were established. Then, within each of these three groups of models, an individual prediction model was built separately for each taxon and feeding guild for forested and non-forested areas. The same combinations of plots for training and testing were used across all models (see Table 2 for an overview). In the first group of models, only the elevation and its square were used to predict species richness ("elevation model"). In the second group of models, only the structural metrics derived from LiDAR (no elevation) were considered ("structure model"). In the third group of models, the same structural metrics were used to predict the residuals of the elevation model ("residual model"). Hence, the predictions of the residual model do not represent the complete species richness, but only that part which cannot be explained by the elevation model. Therefore, to be able to compare the results of this model with the elevation and structure models, the predictions of the residual model were added to those of the elevation model. Even though this is not a separate model in a strict sense, but only the sum of the elevation model and the residual model, this mixed approach is called the "combination model" in the following. The pure residual model, on the other hand, can be used to compare the plain structure dependence of taxa and feeding guilds without effects of elevation. Prediction results from forested and non-forested areas were assembled into one error estimate per response variable, to compare the general model performance of taxa and feeding guilds for the whole study area.
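The relationship between the model groups can be sketched in a few lines of Python. Note that this sketch swaps in ordinary least squares for the PLSR with forward feature selection used in the study, and that all function names are hypothetical; it only illustrates how the residual model and the combination model follow from the elevation model. Elevation is assumed to be given in kilometers to keep the small linear system well conditioned.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    Xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict(coef, rows):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]


def elevation_features(elev_km):
    """Elevation and its square, the only predictors of the elevation model."""
    return [(e, e * e) for e in elev_km]


def fit_model_groups(elev_km, structure, richness):
    """Return coefficients of the elevation, structure and residual models."""
    elev_rows = elevation_features(elev_km)
    elevation_coef = fit_ols(elev_rows, richness)
    structure_coef = fit_ols(structure, richness)
    fitted = predict(elevation_coef, elev_rows)
    residuals = [obs - fit for obs, fit in zip(richness, fitted)]
    residual_coef = fit_ols(structure, residuals)
    return elevation_coef, structure_coef, residual_coef


def predict_combination(elevation_coef, residual_coef, elev_km, structure):
    """Combination model: elevation prediction plus predicted residual."""
    base = predict(elevation_coef, elevation_features(elev_km))
    correction = predict(residual_coef, structure)
    return [b + c for b, c in zip(base, correction)]
```

If richness followed the quadratic elevation trend exactly, the residuals would vanish and the combination model would reduce to the elevation model; any gain of the combination model therefore reflects structure-related variation that elevation does not capture.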
To test whether the model performance depends on species traits, the relationship of the model performance of each taxon and feeding guild to the respective body size and mode of movement was tested. For body size, the Spearman rank correlation coefficient was calculated. Groups were sorted by body size from large (large mammals with up to 1.7 m length) to small (parasitoid wasps with only a few millimeters). For the test between model performance and flying/non-flying groups of organisms, the Mann-Whitney U-Test was performed.
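As an illustration of the two rank-based statistics, the sketch below computes the Spearman coefficient (as the Pearson correlation of average ranks) and the Mann-Whitney U statistic in plain Python. It returns the test statistics only; p-values, as reported in Table 4, would come from a statistics package. The function names are illustrative and not taken from the study's scripts.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U statistic (the smaller of the two U values)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)
```

In the setting of this study, `x` would hold the body-size ordering of the groups and `y` their RMSE/sd values, while `group_a` and `group_b` would hold the RMSE/sd values of flying and non-flying groups.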

Validation Strategy and Model Tuning
Due to the limited number of observation samples per taxonomic group, choosing an appropriate tuning and testing strategy for the various models was of major importance. As illustrated in Figure 2, model training and testing consisted of two separate cross validation cycles. The outer 20-fold cross validation withholds one random plot of each land-cover type in every resample. Those samples were held back from model training to qualify them for estimating the model performance for new locations in the study region. This repeated approach allows for more stable validation results given the limited number of plots. The inner cross validation was embedded within the PLSR machine learning approach. It uses the same method of leaving one plot per land-cover type out in each resample. The inner cross validation was used for model tuning and variable selection only. Tuning affected the number of principal components used in the PLSR and varied between one and two. Feature selection was implemented according to Meyer et al. [64].
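The outer resampling scheme can be sketched as follows. This is a simplified illustration, assuming a mapping from plot identifier to land-cover type; the function name, the seed handling and any plot or land-cover codes passed in are hypothetical and not taken from the study's R scripts.

```python
import random


def stratified_resamples(plot_land_cover, n_resamples=20, seed=0):
    """Yield (train_ids, test_ids) pairs.

    Each test set holds exactly one randomly chosen plot per land-cover
    type; all remaining plots form the training set of that resample.
    """
    rng = random.Random(seed)
    by_cover = {}
    for plot_id, cover in plot_land_cover.items():
        by_cover.setdefault(cover, []).append(plot_id)
    for _ in range(n_resamples):
        test = {rng.choice(ids) for ids in by_cover.values()}
        train = [p for p in plot_land_cover if p not in test]
        yield train, sorted(test)
```

The same generator could be called a second time on the training plots of each resample to mimic the inner cycle used for tuning and variable selection.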
To quantify the predictive performance, the root mean squared error (RMSE) was computed for each fold of the outer cross validation. Beforehand, the results from forested and non-forested areas were combined, to be able to make a general statement per taxon and feeding guild. Since species richness varied considerably across the taxonomic groups, the RMSE of each group was normalized by the standard deviation of the species richness per group of the plots used in each model.
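A minimal sketch of the normalized error measure is given below. The use of the sample standard deviation and the function names are assumptions of this sketch; the text above only specifies that the RMSE is divided by the standard deviation of the species richness of the plots used in the model.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean squared error of one cross-validation fold."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def normalized_rmse(observed, predicted, richness_of_model_plots):
    """RMSE divided by the standard deviation of species richness (RMSE/sd)."""
    return rmse(observed, predicted) / statistics.stdev(richness_of_model_plots)


def median_performance(fold_values):
    """Median RMSE/sd across the folds of the outer cross validation."""
    return statistics.median(fold_values)
```

Because the error is expressed in units of the standard deviation, values can be compared across taxa with very different species numbers; as a rough guide, a value close to 1 indicates an error of about the same size as the spread of the richness values themselves.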

Results
The elevation model performs best for 14 out of 17 taxa (Figure 3a and Table 3). On average, the RMSE/sd values for these 14 taxa are 0.21 lower than in the structural and 0.23 lower than in the combined model. The structural model performs better only for parasitoid wasps, and the combined model is the best for aculeate wasps and insectivorous bats.
For all three model types, the interquartile range (IQR) of large mammals, springtails, bees, parasitoid wasps and insectivorous bats is rather small, while syrphid flies, moths, dung beetles and grasshoppers show large variations of RMSE/sd values. Only large mammals, millipedes and springtails show a significantly superior model performance for the best model (here elevation) compared to both other models (Tukey test). For an individual taxon, a median RMSE/sd of half the standard deviation or better is only reached for ants, grasshoppers, springtails, bees, parasitoid wasps, other aculeate wasps and insectivorous bats. Bees reach the best model results across all taxa and model types with an RMSE/sd of 0.34 (elevation model). Regarding feeding guilds, the elevation model performs best for generalists, decomposers and predators with an RMSE/sd value that is 0.20 and 0.13 lower on average than the structural and the combined model, respectively (Figure 3b and Table 3). Only for generalists and decomposers did the best model (elevation) perform significantly better than the other two. The structural model performs best for herbivores, but only with slight differences in the RMSE/sd to the combined (0.01) and the elevation model (0.02).
To explore the potential of modeling species richness outside the gradient of Mount Kilimanjaro, Figure 4 shows a comparison of the plain residual models (RMSE/sd). These results are independent of the elevation and do not model species richness, but the residuals of the elevation model. Therefore, it is possible to compare the ranking of the assemblages as would be expected if performance depended only on structure, without a superimposed elevational gradient. Taxa and feeding guilds are sorted by their median error estimates, which range between 1.1 and 2.5 (Table 3). Smaller values here mean a closer relationship to structural metrics. Value ranges of model performances within each group lie within the same magnitude, except for dung beetles, which show a high variation. The RMSE/sd of the residual model shows a ranking of the feeding guilds from predators (1.2), over herbivores (1.3), to decomposers (1.8) and generalists (2.2). The analysis of the best subsets of prediction variables could not identify regular patterns (Figures A1-A4).
There is no statistical relationship between model performance and body size of the assemblages (Table 4). However, there is a difference for the combined and residual model, with a better performance for the flying than for the non-flying taxonomic groups (see Table 4).

Table 4. Results of rank tests comparing the performances of models measured by RMSE/sd (as shown in Figures 3 and 4) with respect to selected traits (body size, mode of movement) of the taxa. For the tests between the performance of the models and body size, the Spearman rank correlation coefficient (r) was used. Body size was sorted from large to small groups. For the test between flying and non-flying groups of organisms, the Mann-Whitney U-Test was used. Significant results, in terms of the p-value, are marked bold.

Discussion
The study evaluated the potential of LiDAR data to predict species richness at Mount Kilimanjaro. The respective effects of elevation and vegetation structure on species richness were investigated by comparing the performances of models that used elevation as the only predictor, models that used LiDAR variables only and models that used LiDAR variables to predict the residuals of the elevation models.
Generally, performances of the different models varied only slightly within each taxon, with no significant difference of the best performing model to both other models, except for three taxa (large mammals, millipedes, springtails). However, there is a trend indicating that the elevation model performs best for 14 out of 17 taxa. All taxa that perform significantly better with one specific model belong to that group. In the cases where the structure or combined model performs best, the performances differ only marginally and differences are not significant.
As expected, results of the model performances of feeding guilds indicate that herbivores are influenced more by structure than generalists and decomposers (Figure 4b). However, considering the feeding guilds, generalists and decomposers are the only groups for which the best of the three models (elevation model) is significantly better than the other two. It is ecologically reasonable that generalists, which use a wide variety of food, are, at least for feeding reasons, not specifically connected to the vegetation structure. In contrast, for herbivores, structure was expected to be the most relevant predictor, as they rely solely on vegetation, and therefore structure should influence feeding patterns and the occurrence and diversity of species. Even though performances improve slightly with the structure model, differences were not significant. Decomposers rely on the existence of organic material. Still, as long as the supply of organic material is given, it seems reasonable that other environmental factors, which are linked to elevation, would have a greater impact. In conclusion, prediction results for the feeding guilds show a tendency toward the hypothesized relationship between a lower position of the feeding guild in the food chain and a higher dependency on structure. However, these differences are very small and therefore not convincing. The model performances of the feeding guilds are generally comparable with those of the individual taxa. Nevertheless, as more field samples are included in the feeding guilds, sampling uncertainties are partly leveled out by the higher number of sampled individuals.
This study further aimed at comparing the general potential of LiDAR-derived variables for the prediction of the structurally dependent proportion of species richness for different taxa and feeding guilds. For this comparison, the elevation-corrected residual model provides the relevant information (Figure 4). In line with the discussion about the best model type, generalists and decomposers seem to be the groups not tightly connected to habitat structure, whereas herbivores seem to depend more on vegetation structure (Figure 3a). As with the other models, there is no notable difference in the overall performances between taxa and feeding guilds for the residual models.
A comparison between the model types allows for drawing conclusions about the influence of elevation and structure as relevant predictors for biodiversity. In their study at Mount Kilimanjaro about diversity gradients at different levels of aggregation, Peters et al. [43] already showed that mean annual temperature is the most important variable to predict animal species richness in the region. In the present study, some taxa are significantly more influenced by elevation than by properties of the structure itself, but generally, median performances between models differ only slightly. The residual model attempts to illuminate patterns within the remaining structural properties that are not attributable to the strong gradient in the study area. However, only samples from four to five replicate plots per land-cover type were available for model training, which limits the performance. This might promote over-fitted models for structural properties, leading to larger prediction errors when applied to unknown locations, whereas the elevation model is able to find a general pattern within all plots as they are well distributed along the elevation gradient (Figure 1). Still, even a slightly worse structural or combined model compared to the elevation model validates the general usability of LiDAR data for predicting species richness, even though the effort of LiDAR missions then seems questionable. In the variable selection of the LiDAR metrics, no patterns emerge (Figures A1-A4). Neither individual variables nor variable groups (Table 1) appear in clear patterns across models. The LiDAR variables were calculated for individual plots (50 m × 50 m). For some taxa, it might be beneficial to account for the structure of a larger spatial environment. Therefore, future studies could test whether variable cell sizes of the LiDAR metrics can improve prediction models.
The hypothesized positive correlation between body size and the modeling performance is not supported by the data. However, the mode of movement significantly correlates with the prediction performance in the combined and the residual models. Especially in the residual model, flying taxa outperform the others. The six taxa with the smallest median error are species with the ability to fly. Only the flying taxa bees and birds (rank 9 and 14 out of 17) lie within the worse performing half of taxa, showing the generally poorer performance of non-flying taxa. The comparably poor performance of predicting birds with structural metrics alone is rather surprising, as birds are the most studied taxonomic group in species-habitat structure relationships [65] and there are many studies that demonstrate promising correlations of bird diversity and different structural features (e.g., Müller et al. [35], Smart et al. [66], Vogeler et al. [67] or see the detailed review of Davies and Asner [68]).
The results of this study are based on 59 plots. Even though the total number of study sites seems to be sufficient for modeling purposes compared to similar studies, the different land-cover types that follow the elevation gradient, in addition to the necessary division into forest and non-forest areas for modeling, limit the number of repetitions. Hence, model building has to be carefully adjusted to the limited number of plots. The possibility of also using land-cover type as a categorical predictor variable was discarded due to the low number of replicates. However, land cover is indirectly included in the model by the natural orientation of land cover along the elevational gradient at Mt. Kilimanjaro.
In general, it is not easy to evaluate the results in the context of other studies, since a comparison of the results can only provide indications of the success of the modeling. This is because the studies were conducted in different landscapes, for different taxa, but most importantly, with different measures of biodiversity. Species richness, beta diversity, and other metrics are related but not identical. In previous studies, the role of elevation is handled in different ways. The studies of Müller and Brandl [14] and Vierling et al. [69], for example, analyze the influence of LiDAR-derived variables compared to other abiotic and biotic variables for the prediction of spider species distribution and forest beetle assemblages. Results show a comparable or even much better performance of LiDAR variables to ground based measures [69]. As the variable elevation is a by-product of LiDAR point clouds, these studies included elevation in the group of LiDAR-derived variables, with elevation being a rather important variable. However, elevation changes within the study area are limited to about 800 meters with a rather homogeneous forest cover.
The studies of Zellweger et al. [39] and Rechsteiner et al. [37] are situated in a more mountainous terrain (>1200 m elevation difference); however, the LiDAR-derived variables are limited to structural ones and elevation is not used as a predictor, although elevation is intrinsically included in structural variables at least along elevational vegetation zones. With similarly complex terrain (around 4000 m elevation difference), Zellweger et al. [70] used structural as well as topographic and climate variables. Even though elevation was not used directly, climatic variables (including temperature) showed the highest importance for modeling beta-diversity of birds and butterflies, even exceeding results when vegetation structure was included in the models. This seems consistent with the results of the present study, given that elevation is a main proxy for temperature [71]. Overall, all these studies show clearly that the elevation gradient might be able to explain a major part of structural variables. An observation in a similar sense is made by Acebes et al. [65] in their review of 173 papers. They find, especially for forested areas, that, while canopy height is most commonly used as a LiDAR metric to model species-habitat structure, canopy cover and terrain topography performed better overall when they were used.
The study of Müller et al. [72] covers an elevation gradient of around 800 m and takes only vertical profile metrics derived from LiDAR data into account. They could show that canopy arthropod diversity is driven by different structural features in the vegetation. They used a similar number of study plots as in our study, but exclusively in spruce forest, where the overall vegetation structure is much more homogeneous than at Mt. Kilimanjaro. Thus, in the present study, finer differences are likely to be masked by the large variability in vegetation structure in the models. The same is true for the study of Schooler and Zald [73], who analyzed the predictability of small mammal diversity in temperate mixed forest and found that it could be predicted by LiDAR-derived structural metrics.
Therefore, when using LiDAR data, non-structural properties (e.g., elevation, temperature, or other abiotic variables, depending on the study area) should be investigated separately to avoid false conclusions concerning the effect of LiDAR-derived vegetation structure. Those abiotic conditions are relevant for modeling, and therefore models from one study area are not necessarily representative in other areas [38]. Using a separate residual model shows great potential to avoid spurious correlations that lead to erroneous predictions when the model is applied to new locations.
At Mount Kilimanjaro, with its substantial elevation gradient, the utilization of LiDAR data does not significantly improve modeling results. A larger sample size per land cover is required to further improve the robustness of conclusions drawn for the selection of models. To approach the long-term goal of comprehensive mapping of EBVs like species occurrence or taxonomic diversity with the use of remotely sensed data, areas with a less complex land-cover gradient in homogeneous landscapes need to be addressed in future studies to understand the influence of structure better.
To provide comparable results, further studies need to be conducted on multi-taxon approaches with field surveys and data sets of similar granularity. Study areas with different terrain complexities should be considered. In doing so, a solid base for valuable model-building strategies can be generated and can assist the research community in quantifying EBVs in the future.

Institutional Review Board Statement:
The study was conducted with the least possible disturbance for nature and the environment. As we aimed at collecting data on the real ecosystems, we did not experimentally modify any vegetation. Vertebrates were sampled by acoustic and visual detection, which did not cause any disturbance. To assess biodiversity of arthropods, we had to collect, kill and preserve them. However, we are confident that this did not have a major impact on the species populations. During the study, we recorded 53 mammals, 202 birds and 1909 arthropods. Mammals and birds were identified by visual and acoustic detection; they were neither captured nor killed. Arthropods were collected using ethanol or ethylene glycol for killing and preservation of specimens. Arthropods were mounted and labelled to be stored in public museum collections in Germany and Tanzania.

Data Availability Statement:
The data that support the findings of this study are documented and archived in the central project database of the DFG-Research Unit FOR1246 (https://www.kilimanjaro.biozentrum.uni-wuerzburg.de, accessed on 22 October 2021), and are available from data owners upon reasonable request. Most data are already published or will be published shortly via GFBio (https://www.gfbio.org/, accessed on 22 October 2021), following the Rules of Procedure of the German Research Foundation (DFG) and the DFG-Research Unit FOR1246.
The R scripts used within the study are available under a GPL 3.0 license as a Git repository at github.com. A release of the Git repository to reproduce the results of the study is also available at https://github.com/envima/Kili_src (accessed on 22 October 2021).

Figure 1. Study area with sampling plots. Colors of symbols show different land covers, shapes show the different flight missions from 2015 and 2016. The background image indicates the large-scale vegetation zones along the elevational gradient (background: Google Maps [51]).

Figure 2. The model training (upper right loop) uses a partial least squares regression (PLSR) and a forward feature selection with a 20-fold cross validation. Validation is carried out by predicting the values of the testing plots. The division of testing and training plots (outer loop) follows a repeated stratified sampling approach, with randomly chosen resamples of one plot per land cover for the testing, leaving the rest of the plots for training. Validation is based on the median root mean square error (RMSE) of the individual resamples, normalized by the standard deviation of these RMSE values.

Figure 3. Modeling performances for each taxon (a) and feeding guild (b) in terms of the root mean square error normalized by standard deviation (RMSE/sd). Smaller values show a better model performance. Colors represent the different model types. Taxa are grouped into "elevation", "structure" and "combination" depending on which of the three models shows the best median RMSE/sd. Stars indicate if the best model is significantly better than both of the other models. Within the groups, taxa and feeding guilds are sorted by descending RMSE/sd. The boxes include the median and the interquartile range (IQR) with notches indicating roughly the 95% confidence interval. Whiskers extend to ±1.5 times the IQR and points indicate single error values outside of this range.

Figure 4. Modeling performance for the residuals of the elevation model for each taxon (a) or feeding guild (b) as root mean square error normalized by standard deviation (RMSE/sd). Taxa (a) and feeding guilds (b) are sorted by increasing median modeling performance and therefore increasing influence of vegetation structure on the target variable (which means decreasing median RMSE/sd). For the description of plot elements, see Figure 3.

Figure A1. Variable selection for the structure model of each taxon in forest. Colors show how often variables were included during 20-fold cross-validation. Structural variables are sorted by the total number of times they were selected. See Woellauer et al. [54] for variable details.

Figure A2. Variable selection for structural variables for the residual model and combined model of each taxon in forest. Colors show how often variables were included during 20-fold cross-validation. Structural variables are sorted by the total number of times they were selected. See Woellauer et al. [54] for variable details.

Figure A3. Variable selection for the structure model of each taxon in non-forest. Colors show how often variables were included during 20-fold cross-validation. Structural variables are sorted by the total number of times they were selected. See Woellauer et al. [54] for variable details.

Figure A4. Variable selection for structural variables for the residual model and combined model of each taxon in non-forest. Colors show how often variables were included during 20-fold cross-validation. Structural variables are sorted by the total number of times they were selected. See Woellauer et al. [54] for variable details.

Table 1. Overview of structural variables characterizing the vegetation and a description of their calculation. Most indices were calculated on the LiDAR (Light Detection And Ranging) point cloud of each plot (50 m × 50 m); only a few were calculated on 1 m × 1 m cells of the canopy height model (marked: based on CHM).

Table 2. Overview of the different models calculated in this study.

Table A2. General sampling methods as well as some details and the calculation of the species richness of the biodiversity data. Further details on the sampling approaches can be found in Peters et al. [43] and Peters et al. [52]. If not indicated otherwise, the species richness is calculated as the total (cumulative) number of species per study site (equal sampling effort for all sites).

Table A3. Fractional breakdown of feeding guilds. The numbers for each taxon represent the relative contribution to the species richness of a given feeding guild. Therefore, each column adds up to a total of 1. Taxa are listed in alphabetic order. True bugs were not identified to the species level and are therefore not included.

Table A4. Fractional breakdown of taxa. The numbers for each feeding guild represent the relative contribution of species richness to a given taxon. Therefore, each row adds up to a total of 1. Taxa are listed in alphabetic order. True bugs were not identified to the species level and are therefore not included.