Predicting Habitat Properties Using Remote Sensing Data: Soil pH and Moisture, and Ground Vegetation Cover

Abstract: Remote sensing data are a valuable information source for many ecological landscape studies, but they may be under-utilized because of the overwhelming number of processing methods and derived variables. These complexities, combined with a scarcity of quality control studies, make the selection of appropriate remotely sensed variables challenging. Quality control studies are necessary to evaluate the predictive power of remote sensing data and also to develop parsimonious models underpinned by functional variables, i.e., variables reflecting cause rather than solely correlation. Cause-based models yield superior model transferability across different landscapes and ecological settings. We propose two basic guidelines for conducting such quality control studies that increase transferability and predictive power. The first is to favor predictors that are causally related to the response. The second is to include additional variables controlling variation in the property of interest and to test for the optimum processing method and/or scale. Here, we evaluated these principles in predicting ground vegetation cover, soil moisture and pH under challenging conditions with forest canopies hindering direct remote sensing of the ground. Our model using lidar data combined with natural resource maps explained most of the observed variation in soil pH and moisture, and somewhat less of the variation in ground vegetation cover. Soil pH was best predicted by topographic position, sediment type and site index (R² = 0.90). Soil moisture was best predicted by topographic position, radiation load, sediment type and site index (R² = 0.83). The best model for predicting ground vegetation cover was a combination of lidar-based estimates for light availability below canopy and forest type, including an interaction between these two variables (R² = 0.65).


Introduction
The use of remote sensing (RS) technologies has been a breakthrough for ecology, which has long struggled with the field-based means for mapping spatial properties. Since the first aerial photographs were taken in the mid-1800s, the range of remote sensing technology has evolved greatly [1]. From multispectral satellite images to aerial laser data, remote sensing is now utilized in a range of ecological contexts, such as for monitoring biodiversity [2], assessing ecosystem services [3] and mapping terrestrial and aquatic habitats [4,5]. With such methods, it should be possible to characterize large areas with respect to environmental demands of vulnerable and threatened species in a more efficient way.
When using RS data, a demanding step is the choice of processing method to obtain derived variables. These processed RS variables should predict the property of interest as accurately as possible. Without the resources needed to perform a pilot study, the best alternative is to utilize previous studies testing alternative RS indices' predictive power for [...] may affect soil moisture via differences in the water holding capacity and drainage [20]. In addition, solar radiation load may affect soil moisture through its effect on evapotranspiration [28]. Furthermore, forest type may show correlation to soil moisture due to differences in the tree species requirements with respect to soil moisture conditions, e.g., Pinus sylvestris being more drought tolerant than Picea abies [29]. For ground vegetation cover, an important controlling factor is light availability, presumably modulated by forest type [30]. Unfavorable growing conditions due to poor nutrient or moisture conditions, or unstable masses on steep slopes, may also affect ground cover due to lower plant establishment and growth [31][32][33]. In addition, ground vegetation cover may correlate with soil depth, because areas with thin soils are more likely to have patches of bare rock.
Therefore, in this study we address the question: Can we build RS-based models with high predictive power, using mainly variables that have a causal relationship to the three ecological properties of soil pH, soil moisture and ground vegetation cover?

Field Data Collection and Processing
Forest-covered areas inside the two study areas were delimited using ArcMap v10.7.1 (excluding clear-cuts and young forests). Random sampling points were generated in QGIS v3.16.3, with a minimum distance of 200 m between points to tentatively reduce the likelihood of spatial autocorrelation. This was also achieved in area 2, where small patches of forest within the agricultural area led to sample plots becoming very clustered. During the field survey, conducted in June 2021, a high-precision GPS (Topcon HiPer SR) was used to mark the exact coordinates at the center of each plot. Around each sampling point, a 10 × 10 m sampling square was marked, 56 squares in total (area 1: n = 36, area 2: n = 20). Within each square, data on ground vegetation cover (%), vascular plant species, dominating tree species, soil depth (cm) and vegetation type were recorded. Vegetation type was determined following the Norwegian system "Nature in Norway" (NIN) [34,35]. From the collected data, the mean Ellenberg indicator value for soil pH was calculated [36]. The NIN vegetation types were used to quantify soil moisture, as they are ordered on an ordinal scale after vegetation drought tolerance (4 levels). In addition, we included an additional level for vegetation affected by ground water, increasing the number of levels to 5. Using vegetational bioindicators to represent soil pH and soil moisture has some advantages and disadvantages. On the positive side, vegetation patterns are a more stable expression of the abiotic conditions that often fluctuate with time [37]. On the negative side, using vegetation as a bioindicator depends on how well one can identify a representative number of the species present. This can be difficult when vegetation cover is suppressed, such as under low light conditions.
The ground vegetation was initially defined as all vascular plants lower than 50 cm, and percent cover was estimated visually. Lichen cover was later included in the definition of ground vegetation cover to reduce model complexity and because the light lichen cover can easily be delimited later using satellite images. Lastly, soil depth was measured using a thin metal rod (max. 30 cm) to approximate the depth of the soil available for the ground vegetation [38]. Soil depth was measured in the center of the 10 × 10 m plot, and in each 5 × 5 m square created when dividing the plot into four equal pieces. The average was then taken for each plot.
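The point-generation step described at the start of this section can be illustrated with a minimal sketch. It assumes a simple rectangular extent instead of the delimited forest polygons, and the function name `sample_points`, the extent and the seed are hypothetical choices for illustration only; the study itself generated its points in QGIS.

```python
import math
import random


def sample_points(bounds, n_points, min_dist, seed=0, max_tries=100_000):
    """Rejection-sample up to n_points inside a rectangle, at least min_dist apart.

    bounds is (xmin, ymin, xmax, ymax) in metres. A rectangle is an assumed
    stand-in for the forest mask used in the study.
    """
    xmin, ymin, xmax, ymax = bounds
    rng = random.Random(seed)
    points = []
    for _ in range(max_tries):
        if len(points) == n_points:
            break
        candidate = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        # Keep the candidate only if it respects the minimum spacing.
        if all(math.dist(candidate, p) >= min_dist for p in points):
            points.append(candidate)
    return points


# Hypothetical 3 km x 3 km extent, 36 plots, 200 m minimum spacing.
pts = sample_points((0.0, 0.0, 3000.0, 3000.0), n_points=36, min_dist=200.0)
```

Rejection sampling is the simplest way to enforce a minimum spacing. In small or fragmented areas it may return fewer points than requested, which is one way plots can end up clustered.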

Generating Predictor Variables from Remote Sensing Data
Recent lidar (light detection and ranging) data were acquired from the national map service [39][40][41]. The data have a medium to high point density (2-5 points per m²).
Five remote sensing indices representing topographically controlled water drainage were included: Saga Wetness Index (SWI), depth-to-water table (DTW), topographic position (TPI), deviation from elevation (DEV) and slope (SLOP) (Table 1). To represent variation in spatial scale, SWI was generated from terrain models with different resolutions (1, 5, 10 and 30 m) using the Saga plugin in QGIS [42]. DTW was generated with a range in drainage area sizes (1, 2, 4, 6, 8 and 10 ha) using ArcMap v10.7.1 and the D8 flow algorithm [43]. TPI was estimated by subtracting the elevation at each cell from the mean elevation of the surrounding neighborhood. DEV was calculated the same way, but in addition normalized by the standard deviation to account for local surface roughness [44]. The effect of the neighborhood size was assessed using a radius of 10, 30 and 100 m. Slope was estimated from a 1 m resolution terrain model.
Four remote sensing indices representing light conditions or radiation load were included: canopy cover (CC), heat load index (HLI), Subcanopy Solar Radiation model (SSR) and a modified version of SSR accounting for forest gaps (GAP) (Table 1). A canopy height model at 1 m resolution was used to calculate canopy cover (%, canopies were defined as vegetation > 2 m height). A 2 m resolution DEM was used to calculate HLI at each study site, using the "area solar radiation tool" in ArcGIS v10.7.1. To model sub-canopy solar radiation, the heat load index was multiplied with a light penetration index (LPI) [10]. LPI is the proportion of laser beams reaching the forest floor when accounting for the filtering effect of the canopy. LPI was corrected for solar angle using a moving kernel that "smooths" the canopy at a certain angle dictated by the sun angle. Furthermore, gaps and edges in the canopy were accounted for in the GAP model [10]. Spatial scale was assessed only for canopy cover.
Here, the mean was taken for the 10 × 10 m plot, but also for an area of 20 × 20 m and 30 × 30 m around the plot center. The remaining light indices were assessed with respect to radiation properties: diffuse, direct and a combination of direct and diffuse radiation proportionate to conditions in southeastern Norway (52% diffuse, 48% direct) during the growing season.
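The two topographic position indices described earlier in this section (TPI and DEV) can be sketched for a single grid cell as follows. This is an illustrative sketch, not the workflow used in the study: the function name `tpi_dev`, the circular neighborhood measured in cells and the exclusion of the focal cell from the neighborhood are assumptions. The sign convention is also an assumption (focal elevation minus neighborhood mean, so that ridges are positive and depressions negative).

```python
import math


def tpi_dev(dem, row, col, radius_cells):
    """Return (TPI, DEV) for one cell of a DEM stored as a list of rows.

    TPI = focal elevation minus the mean elevation of the neighborhood.
    DEV = TPI divided by the neighborhood standard deviation.
    """
    z0 = dem[row][col]
    neighbors = []
    for r in range(max(0, row - radius_cells), min(len(dem), row + radius_cells + 1)):
        for c in range(max(0, col - radius_cells), min(len(dem[0]), col + radius_cells + 1)):
            inside = math.hypot(r - row, c - col) <= radius_cells
            if inside and (r, c) != (row, col):
                neighbors.append(dem[r][c])
    mean = sum(neighbors) / len(neighbors)
    sd = math.sqrt(sum((z - mean) ** 2 for z in neighbors) / len(neighbors))
    tpi = z0 - mean
    dev = tpi / sd if sd > 0 else 0.0  # flat neighborhoods get DEV = 0
    return tpi, dev


# Tiny hypothetical DEM (elevations in metres) with a local high in the center.
dem = [
    [1.0, 2.0, 3.0],
    [2.0, 5.0, 4.0],
    [3.0, 4.0, 5.0],
]
tpi, dev = tpi_dev(dem, row=1, col=1, radius_cells=1)  # -> (2.0, 2.0)
```

Dividing by the local standard deviation makes DEV comparable between smooth and rugged terrain, which is the roughness correction the text refers to.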

Additional Information Included
Decent maps based on broad-scale field surveys are available for some spatial properties with high utility for many purposes. We included existing maps for quaternary sediments, calcium content in bedrock and site index (Table 1). A disadvantage of predicting soil pH based on bedrock calcium content is that the thick layer of marine clay likely decouples the topsoil from the bedrock. Thus, we created a map using the calcium content map as the base and exchanging the calcium content value for all areas covered by clay to the second-highest level for calcium content (level 4). These maps had a considerably lower resolution compared to the remote sensing data, with sediments and bedrock in 1:50,000-1:250,000 scale and site index 1:5000. The low-resolution data can be a problem when used for predictions together with high-resolution lidar data, as they likely will lead to some inaccuracies when making predictions on a finer scale. On the other hand, including them may increase the model's explanatory power and may thus give better predictions than when not included.
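The clay adjustment of the calcium content map described above amounts to a cell-wise reclassification. The sketch below assumes both maps are already rasterized to the same grid. The function name `calcium_with_clay` and the sediment label strings are hypothetical, while level 4 for clay-covered cells is taken from the text.

```python
CLAY = "marine clay"   # hypothetical class label in the sediment raster
CLAY_CA_LEVEL = 4      # second-highest calcium level, as stated in the text


def calcium_with_clay(bedrock_ca, sediment):
    """Return a copy of the calcium map with clay-covered cells set to level 4.

    bedrock_ca and sediment are equally shaped lists of rows.
    """
    return [
        [CLAY_CA_LEVEL if sed == CLAY else ca for ca, sed in zip(ca_row, sed_row)]
        for ca_row, sed_row in zip(bedrock_ca, sediment)
    ]


# Hypothetical 2 x 3 rasters.
bedrock_ca = [[1, 2, 5], [3, 1, 2]]
sediment = [
    ["glacial till", "marine clay", "marine clay"],
    ["bare rock", "marine clay", "peat"],
]
ca_sed = calcium_with_clay(bedrock_ca, sediment)
```

Note that this rule also lowers cells whose bedrock level is above 4, which follows the description of exchanging the value for all clay-covered areas.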
In addition, we included forest type based on field observations (Table 1). However, this variable can also be generated using remote sensing data [52]. Finally, soil depth from field measurements was included in the analysis, although it could have been approximated using remotely sensed data [53].

Statistical Analysis
A generalized least-squares linear model with Gaussian distribution was used for the analysis of soil pH, using the nlme R package [54]. Soil moisture was analyzed using proportional odds ordinal regression with a logit link via R packages rms [55] and MASS [56]. Here, the proportional odds assumption was checked for all predictor variables using the brant R package [57]. For the analysis of ground vegetation cover, we used beta regression with a logit link via the betareg R package [58]. All statistical analyses were performed in R v4.0.4 [59]. First, simple predictor-response regressions were used to identify the "best" alternative predictor variable representing topographic wetness (SWI, DTW), topographic position (TPI, DEV), solar radiation load (CC, SSR, GAP, HLI) and soil properties (sediment type, bedrock calcium content) for each response, with some exceptions. For soil moisture, we chose sediment type to be the most relevant based on existing knowledge [20]. When analyzing soil pH, solar radiation load was not deemed relevant. AICc was used for model selection using the MuMIn R package [60]. In case the predictor had a nonlinear relationship to the response, we included polynomial terms. The variables representing light availability/radiation load were centered because of large values, and topographic position variables were centered for easier interpretation [47]. Residuals were checked for spatial autocorrelation using Moran's I index from the ape R package [61]. If found to be an issue, the model selection procedure was performed including either location or a spatial correlation term in all models compared.
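The AICc-based ranking used for model selection can be made concrete with a short sketch. It shows the standard small-sample correction of AIC and the conversion of AICc scores into Akaike weights. The function names and the example log-likelihoods are hypothetical, and the study itself used the MuMIn R package for this step.

```python
import math


def aicc(log_lik, k, n):
    """Small-sample corrected AIC for a model with k parameters and n observations."""
    aic = -2.0 * log_lik + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert AICc scores into weights that sum to 1 (lower score, higher weight)."""
    best = min(scores)
    relative = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(relative)
    return [r / total for r in relative]


# Hypothetical log-likelihoods for three candidate predictors, n = 56 plots.
candidates = {"SWI1m": (-50.0, 3), "DTW8ha": (-51.4, 3), "DTW1ha": (-53.0, 3)}
scores = [aicc(ll, k, n=56) for ll, k in candidates.values()]
weights = akaike_weights(scores)
```

A difference in AICc below about 1 to 2 between two models means their weights are of similar size, which is why several alternative variables are described in the results as having similar support.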
After finding the best alternative variable for topographic wetness (SWI, DTW), topographic position (TPI, DEV), solar radiation (CC, SSR, GAP, HLI) and soil properties (SED, CA.sed, CA.bed) (Table 1), these predictors were included in a second model selection procedure to find the best combination of variables, using multiple regressions and AICc score to rank models. Location was included as a fixed effect. Slope, soil depth, site index and forest type were included in all the multiple regression analyses, except that soil depth was excluded from the soil pH model. In the analysis of ground vegetation cover, an interaction between light and forest type was included as part of the multiple regression model selection. Model diagnostics on top-ranked models were performed using the R packages DHARMa [62] and sure [63], and prediction plots were generated using sjPlot [64].
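The residual check for spatial autocorrelation mentioned in this section can be sketched as a plain computation of Moran's I. The sketch assumes inverse-distance weights between plot centers, which is a common choice but is not stated in the text. The function name `morans_i` and the example values are hypothetical, and the study used the ape R package.

```python
import math


def morans_i(values, coords):
    """Moran's I with inverse-distance weights (assumed weighting scheme).

    values: model residuals, one per plot.
    coords: (x, y) plot coordinates in the same order; no duplicate locations.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weighted_cross = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                w = 1.0 / math.dist(coords[i], coords[j])
                weight_sum += w
                weighted_cross += w * dev[i] * dev[j]
    return (n / weight_sum) * weighted_cross / sum(d * d for d in dev)


# Four hypothetical plots on a line, 1 unit apart.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
clustered = morans_i([1.0, 1.0, 5.0, 5.0], coords)    # similar values adjacent
alternating = morans_i([1.0, 5.0, 1.0, 5.0], coords)  # dissimilar values adjacent
```

Under no spatial autocorrelation the expected value is -1/(n - 1) rather than 0, so with few plots the index should be compared against that expectation. Here the clustered arrangement gives a higher value than the alternating one.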

Soil pH
For soil pH, the best variable representing soil properties was sediment type (AICc weight = 0.99), which was clearly better than the second-best variable bedrock calcium content + clay (AICc weight = 0.01) (Table S1). Topographic wetness was best represented by SWI at 1 m resolution (AICc weight = 0.51), followed by DTW with a flow initiation threshold of 8 ha (AICc weight = 0.13). DEV30M was the best variable representing topographic position (AICc weight = 0.86), followed by TPI30m (AICc weight = 0.14) (Table S1).
The three top-ranked models from the multiple regression model comparison are listed in Table 2. The best model carried 45% of the cumulative model weight and explained a high amount of the observed variation (adj. R² = 0.91, or 0.90 if the location was excluded). Including SWI1m or slope did not improve model fit (Table 2).
The top-ranked model predicted soil pH to be highest in area 2, but the confidence intervals were overlapping with area 1 (Figure 2). Topographic position predicted higher soil pH in terrain depressions and foot slopes and decreasing soil pH when moving upslope, with the lowest soil pH predicted at terrain ridges. For sediment type, soil pH was highest for clay sediments. The effect of the other sediment types could not be clearly distinguished from each other, though sites with thin layers of glacial tills and bare rock tended to predict somewhat higher soil pH than thicker layers of glacial tills. For peat sediments, we only had two observations, which makes the predictions less reliable. Soil pH was predicted to be lower on sites with low site index compared to medium. High site index predicted medium soil pH, but confidence intervals were overlapping with both low and medium site index (Figure 2).

Table 2. The top three highest-ranked multiple regression models explaining variation in soil pH, soil moisture and ground vegetation cover. Included is the adjusted R² for the top-ranked soil pH model, and pseudo-R² for the top-ranked models for soil moisture and ground vegetation cover.

Soil Moisture
For soil moisture, the best topographical wetness variable was SWI1m (AICc weight = 0.31), but DTW with a flow initiation threshold of 1 and 2 ha were both clustered within ∆AICc less than 1, indicating similar support as for the top-ranked model (Table S2). Topographic position was best represented by DEV30M (AICc weight = 0.71), followed by TPI30m (AICc weight = 0.29). The best variable representing light availability was the SSR global model (AICc weight = 0.28); however, the SSR diffuse light model, HLI diffuse light model and SSR direct light model all had ∆AICc less than 1, indicating that they had similar support as the top-ranked model (Table S2).
The three top-ranked models are listed in Table 2. The top model carried 20% of the cumulative weight and explained 83% of the observed variation (Table 2). In the second-best model, sediment type was replaced by location. Including soil depth did not improve model fit (third-ranked model, Table 2).
Medium site index predicted a higher probability for high soil moisture (level 0) compared to low and high site index (Figure 3), the latter predicting a higher probability for somewhat moist areas (level 1). Site index did not separate between semi-dry and dry areas (levels 2 and 3). The moistest sites were somewhat more probable in areas with low radiation load, although the effect was uncertain. The semi-moist sites were most probable in areas with medium radiation load, but the confidence intervals were wide at both high and low amounts of radiation, suggesting that the effect was uncertain (Figure 3). Semi-dry or dry areas did not seem to be predicted by solar radiation, though confidence intervals did suggest that areas with high solar radiation load may predict semi-dry areas. High soil moisture sites were not clearly predicted by any sediment type. Semi-moist areas were most probable on clay. Thin glacial tills and bare rock areas did also seem to predict semi-moist areas, but here, confidence intervals were wider, indicating predictions with low precision. Semi-dry areas were most probable in areas with thicker glacial tills, but with high uncertainty. The driest areas did not seem to be predicted by any sediment type. For topographic position, terrain depressions predicted the wettest areas, and semi-moist areas were predicted by all other topographic positions, except terrain depression. However, the wide confidence intervals suggested that the effect may be uncertain (Figure 3). In general, the model did not perform well at predicting the driest soil moisture class (level 3). This was most likely due to the low sample size, i.e., four observations.
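The class probabilities discussed above come from a proportional odds model, in which one linear predictor shifts a set of ordered thresholds. The sketch below shows how cumulative logits translate into a probability for each soil moisture class. It assumes the parameterization logit P(Y <= j) = threshold_j - eta (the convention used by polr in the MASS package). The threshold values and the function name are hypothetical and are not estimates from this study.

```python
import math


def class_probabilities(eta, thresholds):
    """Per-class probabilities under a proportional odds (cumulative logit) model.

    eta: linear predictor for one plot.
    thresholds: increasing cut points; k thresholds give k + 1 ordered classes.
    """
    # Cumulative probabilities P(Y <= j), with the last class closing at 1.
    cumulative = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds] + [1.0]
    # Class probabilities are differences of consecutive cumulative probabilities.
    return [cumulative[0]] + [
        cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))
    ]


# Hypothetical thresholds for five ordered classes.
thresholds = [-2.0, 0.0, 1.5, 3.0]
probs_low = class_probabilities(eta=0.0, thresholds=thresholds)
probs_high = class_probabilities(eta=2.0, thresholds=thresholds)
```

Because the same eta is subtracted from every threshold, a predictor moves probability mass along the whole ordered scale at once. This is the proportional odds assumption that was checked with the brant package.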

Ground Vegetation Cover
For ground vegetation cover, the best variable representing light availability was the SSR model with diffuse light (AICc weight = 0.43), followed by canopy cover with a 30 m resolution (AICc weight = 0.13) (Table S3). Topographical wetness was best represented by DTW with a flow initiation threshold of 10 ha (AICc weight = 0.15), but the alternative variables were all within ∆AICc less than 2, implying that they had similar support. DEV30M was the best variable representing topographical position (AICc weight = 0.28), but not clearly better than the other variables that were all clustered within ∆AICc less than 1. CA.sed was the best variable for soil properties (AICc weight = 0.65), followed by CA.bed (AICc weight = 0.32) (Table S3).
The three top-ranked models from the multiple regression model selection are listed in Table 2. The top-ranked model carried a rather low amount of the cumulative weight (18%) but explained a fair amount of the observed variation (pseudo-R² = 0.65). Including topographical position or slope did not improve model fit (Table 2).
Ground vegetation cover was predicted to be higher in pine and deciduous forest compared to spruce forests (Figure 4). Pine and deciduous forest were more similar, with overlapping confidence intervals. Vegetation cover was predicted to increase almost linearly to an increasing amount of diffuse light. The uncertainty was highest under low light conditions. The effect of light availability on ground vegetation cover varied with forest type. In spruce forest, vegetation cover increased rapidly with an increasing amount of light but leveled off when the light levels reached around two-thirds of the maximum level. The predictions in the deciduous forest were similar, but here, the ground vegetation cover was predicted to be somewhat higher than in the spruce forest, under the same light conditions (Figure 4). When light levels reached around two-thirds of the maximum level, the predicted ground vegetation cover in the deciduous forest started to converge with that in the spruce forest. In the pine forest, the predicted ground vegetation cover was higher than in the other forest types under low light conditions. Under high light availability, ground vegetation cover was lower in pine forest compared to the other forest types (Figure 4).

Discussion
By combining RS data and natural resource information, we were able to build models explaining moderate to high amounts of the observed variation in soil pH, soil moisture and ground vegetation cover.
Sediment type was an important variable for predicting both soil pH and soil moisture. For soil pH, the effect of sediment type was due to the difference between clay and glacial tills. Similarly, Lamarche et al. [65] found that forest soils had higher pH when originating from clay compared to glacial tills. The effect may partly be caused by differences in soil texture, with clay having finer and more easily weathered particles, releasing basic cations at a higher rate compared to coarser particles. In addition, finer particles bind cations better, dampening the leaching rate [20]. However, both clay and glacial tills may vary in mineral content [66,67], which may lead to differing effects on soil pH (e.g., Gruba and Socha [68] found that clay content was positively correlated with total acidity). Calibrating the model with soil samples may be sensible when using it in areas with differing sediment conditions. This will still be less resource-demanding than creating field-based maps of soil pH where sediment type varies less than soil pH. For soil moisture, the effect of sediment type was also due to the difference between clay and glacial tills, clay being related to higher soil moisture than tills. This was expected since clay has a higher water holding capacity than glacial tills [20]. In a similar study, sediment type was found to be an important variable in predicting soil moisture [69]. Here, glacial tills were also the driest sediment type (clay was not included). In contrast, another similar study found that sediment type, including clay and glacial tills, was not important for predicting soil moisture [8]. That study included plots from a large geographical area, meaning that large-scale processes were likely more important there than in our study. In addition, contrary to our study, they included elevation in the analysis.
We omitted this variable due to its correlation with sediment type, and since sediment type has a more direct relationship to soil moisture. Including variables that possibly correlate with several factors affecting the response variable may bring about results where variables that have indirect and vague relationships to the response outperform variables that are more directly related to it. This can give the impression that the indirect variable is important when predicting soil moisture in general, but this may not be the case when moving outside the study area.
We expected that bedrock calcium content would be an important predictor of soil pH, as it has been in other similar studies [70,71]. This was not the case in our study, even when we accounted for the decoupling effect of the thick layers of marine clay in study area 2. One explanation may be the glacial movement and transportation of material, which leads to the calcium content of the sediments likely better representing upstream bedrock type. The coarse scale of the bedrock map may also have played a part, as it does not capture all small-scale variations.
Topography is important when explaining soil pH and soil moisture, as indicated by DEV30M being represented in the top-ranked models for both properties. This is most likely due to its control over water drainage and accumulation, including transportation of material and basic cations [21]. Similar studies have found that topography is related to soil pH and moisture, but it is not clear which topographic variable best represents this mechanism. Li et al. [7] found that slope and cross-slope curvature explained soil pH better than topographic wetness (topographic position was not included) in a broad-leaved forest in China. Contrarily, Baltensweiler et al. [72] found that terrain wetness explained more of the variation in soil pH than cross-slope curvature in a mountain forest in the Swiss Alps. Similarly, Aagren et al. [73] found that depth-to-water table was a better predictor of soil moisture compared to topographic position in a boreal forest in Sweden. In Finland, Kemppinen [70] had more success predicting soil moisture using the Saga Wetness Index (SWI) than a Topographical Position Index (TPI). This contrasts with our study, as we found that topographic position was a better variable than both DTW and SWI. It is not clear why the best variable varies between studies, but it has been suggested that soil transmissivity, variation in topography and local climate may affect the performance of topographic indices and what represents the optimum scale [73].
Site index was included in the top-ranked model for both soil moisture and soil pH. Since most plots with high site index were located on clay sediments, the effect was confounded, and high site index was most likely caused by the clay sediment. Medium and low site index differentiated between higher and lower soil pH and moisture for the plots located on glacial tills. Site index represents forest productivity, which is caused by the combined effect of soil moisture, soil pH and climatic conditions. Thus, it can be used as an indicator for these three factors, as shown in our analysis. We did not use a remote sensing-based version of the index, but it can be estimated with relatively high accuracy using bitemporal lidar data [74].
Light availability or radiation load was included in the top-ranked model for both soil moisture and ground vegetation cover. For soil moisture, the association was most likely due to the effect of radiation load on evapotranspiration [28,75]. Here, the best variable was one of the most comprehensive variables, including the effect of solar angle, topography, canopy and both direct and diffuse radiation. This suggests that higher precision can be achieved when using more complex estimations of solar radiation load, in comparison to using more simple methods, such as topographic aspect.
As expected, ground vegetation cover increased with higher light availability. Although light is known to be an important limiting factor for plant establishment and growth [30], this does not always translate into a detectable association between light availability and vegetation cover. For example, Tinya et al. [76] did not find a correlation between the herbaceous cover and the amount of measured diffuse light inside a mixed temperate forest in Hungary. This was assumed to be due to poor establishment of herbaceous species in nutrient-poor soils. This suggests that how one defines the ground vegetation cover is important. We included species that can tolerate both nutrient-poor soils (dwarf shrubs) and dry soils (lichen); thus, we did not experience the same lack of correlation. Our top-ranked model did, however, have other issues. Of the three ecological properties included in our analysis, we had the least success explaining ground vegetation cover, possibly because we did not account for dead trunks and snags in our model, which led to lower estimated vegetation cover. In addition, estimating vegetation cover visually is not a precise form of measurement. Measurement errors are unavoidable and create noise in the data.
Forest type was included in the top-ranked model for ground vegetation cover, both as an additive factor and in interaction with light availability. The amount of light reaching the forest floor depends on the amount of absorption, reflection and transmittance of the light as it travels through the canopy [77]. Tree species may modulate these three factors through differences in leaf angle, canopy structure and density [28]. These differences are probably not fully accounted for by the lidar-based variables, so including tree species or forest types gives better predictions. In our study, deciduous forests had higher estimated light availability than spruce forests. Unlike us, Renaud et al. [78] found that deciduous forests had lower levels of available light below the canopy compared to spruce forests. In their study, the deciduous forests consisted mostly of beech (Fagus sylvatica), which are generally much darker than the deciduous forests in our study area. This means that dominating tree species may be a better predictor than forest type.

Conclusions
In this study, we found that soil pH, soil moisture and ground vegetation cover may be modeled using a combination of remotely sensed data and natural resource maps. By choosing variables based on causal or well-known correlational relationships to the response, we were able to build more easily interpretable models that explained medium to high amounts of the observed variation. We found that topographically controlled water drainage patterns, which may affect both soil moisture and pH, are one of the more challenging variables to select in advance because the optimum method and/or scale seem to vary between study areas. However, it may be possible to select the best variable based on existing knowledge of a selected study area (e.g., sediment types, local climate) and some measures of topographic variation estimated from digital elevation models. Future studies should focus on testing the relationship between these factors and the optimum topographical wetness variable for a better understanding of why and when the optimum processing method and spatial scale vary, and how to select the optimum variable without performing a pre-study field survey. If this can be sorted out, it would be helpful for future ecological studies and other types of studies, as well as for management projects, where remote sensing data can be of great use.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs14205207/s1, Table S1: Ranking of single regression models for soil pH using AICc scores; Table S2: Ranking of single regression models for soil moisture using AICc scores; Table S3: Ranking of single regression models for ground vegetation cover using AICc scores; Table S4: Model summary of the highest ranked multiple regression models for soil pH, soil moisture and ground vegetation cover.
Data Availability Statement: Data will be shared upon reasonable request to the corresponding author.