Mapping the Groundwater Level and Soil Moisture of a Montane Peat Bog Using UAV Monitoring and Machine Learning

One of the best preconditions for the sufficient monitoring of peat bog ecosystems is the collection, processing, and analysis of unique spatial data to understand peat bog dynamics. Over two seasons, we sampled groundwater level (GWL) and soil moisture (SM) ground truth data at two diverse locations at the Rokytka Peat bog within the Sumava Mountains, Czechia. These data served as reference data and were modeled with a suite of potential variables derived from digital surface models (DSMs) and RGB, multispectral, and thermal orthoimages reflecting topomorphometry, vegetation, and surface temperature information generated from drone mapping. We used 34 predictors to feed the random forest (RF) algorithm. The predictor selection, hyperparameter tuning, and performance assessment were performed with the target-oriented leave-location-out (LLO) spatial cross-validation (CV) strategy combined with forward feature selection (FFS) to avoid overfitting and to predict on unknown locations. The spatial CV performance statistics showed low (R2 = 0.12) to high (R2 = 0.78) model predictions. The predictor importance was used for model interpretation, where temperature had strong impact on GWL and SM, and we found significant contributions of other predictors, such as Normalized Difference Vegetation Index (NDVI), Normalized Difference Index (NDI), Enhanced Red-Green-Blue Vegetation Index (ERGBVE), Shape Index (SHP), Green Leaf Index (GLI), Brightness Index (BI), Coloration Index (CI), Redness Index (RI), Primary Colours Hue Index (HI), Overall Hue Index (HUE), SAGA Wetness Index (TWI), Plan Curvature (PlnCurv), Topographic Position Index (TPI), and Vector Ruggedness Measure (VRM). Additionally, we estimated the area of applicability (AOA) by presenting maps where the prediction model yielded high-quality results and where predictions were highly uncertain because machine learning (ML) models make predictions far beyond sampling locations without sampling data with no knowledge about these environments. The AOA method is well suited and unique for planning and decision-making about the best sampling strategy, most notably with limited data.


Introduction
On a global scale, peatlands as a group of wetlands are essential for the landscape and environment because they provide a wide range of critical ecosystem services [1], water-quality improvement, flood abatement, and habitat functions. However, peatlands are decreasing globally in terms of their condition and diversity through habitat loss, climate change, and pollution [2]. Peat bogs, a subdivision of peatlands, in particular, are significant water-logged reservoirs with a specific hydrological regime [3], representing habitats dominated by sphagnum moss, stored organic matter, a high water table, and low pH [4,5]. The Sumava Mountains (Sumava Mts., Bohemian Forest) feature the most extensive peat bog complex in Central Europe due to their specific geology and morphology [6].
Similar upland peat bogs occur in Canada, Scotland, and Scandinavia. However, Sumava's mountainous peat bog complexes have a specific regime concerning a humid climate and optimal relief configuration [6] to study specific dynamics over time. Various human activities have affected the peatlands in the Sumava Mts. in past decades [7]. Among these human activities, artificial surface drainage is one of the most critical threats [8], affecting approximately about 70% of the peat bogs in Sumava National Park (Sumava NP).
To understand the hydrological regime and the effects of human impacts on the peatlands in Sumava NP, extensive research on hydrology [6,[9][10][11][12][13][14][15], vegetation [16], soil [14], restoration of mires [8,15], and soil water dynamics [17] have been conducted. All these studies required invasive fieldwork or analysis, requiring demanding and recurrent field monitoring and surveying campaigns. Peatland complexes represent the zone with the highest status of nature protection with restricted access and limited invasive surveying and monitoring activities. For such specific environments, there is the emerging importance of novel monitoring approaches that minimize the impact on protected areas.
Recent advances in drones (a.k.a., unmanned aircraft systems, UASs, unmanned aerial vehicles, UAVs, or remotely piloted aircraft systems, RPAS) and the miniaturization of advanced sensors present new potential for noninvasive methods to monitor natural processes, even in inaccessible or sensitive areas. Due to sensor miniaturization, it is now possible to choose between different camera systems spanning from heavyweight LiDAR (light detection and ranging) to lightweight RGB, multispectral, and thermal mapping sensors [18], enabling unforeseen future research fields. These abilities make drones particularly attractive for ultrahigh-resolution imagery and 3D point clouds, independent of weather and cloud coverage to capture small-scale variabilities of the land surface in high detail. Recent improvements in drone technology have provided exciting new abilities for peatland observation, providing a noninvasive alternative to conventional fieldwork. Earlier, these skills allowed the investigation of a high-resolution classification of South Patagonian peat bogs and showed potential gaps in upscaled CH4 fluxes using UAV system color-infrared (CIR) imagery [19]. Additionally, [20] examined Alberta's water table dynamics in Canada using drones and demonstrated the potential for measuring over large areas using UAV technology, thus providing insights into the impacts of disturbances on hydrology and related ecosystem functions. [21] used UAV remote sensing to map water depth over large areas to show methane release in a treed bog ecosystem in northern Alberta, Canada. A year later, [22] studied seismic lines generated by CH4 flux estimates based on microtopography and depth to water features using photogrammetric inputs. Additionally, [23] used thermal imagery equipped on a UAV to identify groundwater discharges and the usefulness of thermal validation concerning process-based restoration goals in Massachusetts, USA.
Further, [24] used UAV-based thermal infrared (TIR) imaging and color infrared imaging, combining stable water isotope studies to identify treatments along a mining area in Finland. Most studies practiced one or two sensors in studying peatland environments. To overcome these issues, [25] used multiple sensor records with a fixed-wing UAV at two rewetted fen sites in northern Germany (Mecklenburg Western Pomerania) and classified spectrally similar vegetation. That study made satisfactory predictions of vegetation types based on two multitemporal and multi-sensor datasets, via machine learning (ML).
Frequently, predictive modeling using ML has been used to show spatial and spatiotemporal patterns and changes in the environment [26][27][28][29][30]. Additionally, predictive modeling is commonly used with field data to train demographic models with spatial continuous predictor variables obtained from remote sensing images [28,30]. Those models can typically make predictions for an entire area of interest [30], even exceeding training data locations without knowing about these environments, which is one of the drawbacks of conventional ML methods, such as random forest (RF) [31]. Moreover, ML has been criticized for overfitting [30,32]; for example, complex models with many parameters can obtain both global trends and noise-generated intentions simultaneously. In this case, the model can be outwitted into thinking that the noise encodes real information. This overfitting yields a steep decrease in predictive performance [32]. To guard against vague predictions beyond sampling locations and overfitting, [33] proposed an improved model that yielded better performance. Therefore, we took advantage of their ML prediction strategy using forward feature selection (FFS) for optimal variable selection in conjunction with the leave-location-out (LLO) cross-validation (CV) method to: (1) reduce the overfitting of noisy variables; (2) not estimate model performances based on the random k-fold CV but rather apply this target-oriented validation strategy in the context of unknown space; and (3) estimate the area of applicability (AOA) to find the area where prediction is positively specific or uncertain using the dissimilarity index (DI) [33].
Our research aimed to obtain information and predict the dynamic properties of groundwater level (GWL) and top-layer soil moisture (SM) in 2 dimensions and time (2D + T) based on data from digital surface models (DSMs), RGB, multispectral, and thermal data from drone imagery. We used a CAST spatiotemporal prediction model based on an RF algorithm and the FFS method in conjunction with target oriented LLO CV validation [32]. Compared to other models, the CAST model yields the advantage of dismissing unsuitable predictors and estimating the AOA automatically.
The specific objectives of this study were to: 1.
Collect UAV RGB, multispectral, and thermal imagery along with ground truth data on GWL and SM over ten months; 2.
Extract a suite of variables (microtopographic drivers, vegetational, and temperature information) using a multi-sensor (RGB, multispectral, and thermal sensors) UAV monitoring approach; 3.
Spatiotemporally predict GWL and SM by the CAST ML model and assessing the importance of selected drivers; and 4.
Simulate prediction certainties across study areas, mimicking the environmental reality.

Site Description and Climate
We conducted our study in the Rokytka Peat bog, one of the most extensive montane raised ombrotrophic peat bog complexes (250 ha) within the Sumava Mts., Czechia. We selected one of four large bog sites located along the Rokytka stream, a second-order stream of a river that drains an area of 0.6 sq. km with an altitude range between 1100 and 1260 m a.s.l. [12], at two test sites (  (Figure 1c). Both sites are surrounded by dead spruce with healthy seedlings in the lower part and various dwarf communities. The climate in this area is variable and subject to both oceanic and continental influences. Due to the specific morphology and peat bog open space area, the Rokytka catchment is one of the coldest places throughout the country in January or February ( Figure 2). Freezing temperatures can occur even during summer, and the average daily air temperature is 4.8 • C (Figure 2a). This locality also belongs to one of the Czech Republic's wettest parts. Humidity varies more during the day than during a year (Figure 2b), and solar radiation shows regular annual behavior with highs in summer and lows in winter ( Figure 2c).
All meteorological parameters except precipitation are measured in the experimental catchment with Fiedler devices without an external heating system during winter (Figure 3a). Therefore, several wind speed data gaps occurred when a wind propeller freezes (Figure 2d). Precipitation measured at nearby the Filipova Hut meteorological station at a similar altitude equaled 1239 mm yr −1 (1981-2010) [34]. However, the last Remote Sens. 2021, 13, 907 4 of 29 two years were dryer compared to the long-term average, with an annual precipitation of 1093 mm yr −1 (Figure 2e).
The experimental site contains a well-developed raised ombrogenous peat bog with the dominant soil type being Histosol with a depth varying from 0.5 m (LP) to approximately 6 m (UP).
Organic matter in the soil is heterogeneous and can be found in different stages of decomposition that vary throughout the peat bog. The storm hydrograph at the catchments (0.65 sq. km) where the experimental peat bog is located is highly variable, with quickly and steeply rising and falling limbs. The hydrological response to rainfall events was fast, and the recession to antecedent base flow occurred rather quickly (Figure 2e). The mean daily runoff equaled 3.4 mm per day (1297 mm y −1 ) at the outlet.
The UP is drained naturally into the local stream, and the vegetation of peat bogs varies based on the water regime. Grass parts without trees where this experiment occurred belong to areas where the groundwater table occurs near the surface. Soil is primarily saturated with water during the year. The LP hydrological response is probably affected by one drainage ditch. However, due to the high hydraulic conductivity of peat, the channel drains only a small buffer [15]. In general, UP and LP did not differ in SM or GWL, which can be proven by their similar vegetation cover (grass, moss).
Remote Sens. 2021, 13, x FOR PEER REVIEW 4 of 30 3a). Therefore, several wind speed data gaps occurred when a wind propeller freezes (Figure 2d). Precipitation measured at nearby the Filipova Hut meteorological station at a similar altitude equaled 1239 mm yr⁻ 1 (1981-2010) [34]. However, the last two years were dryer compared to the long-term average, with an annual precipitation of 1093 mm yr-1 ( Figure 2e). The experimental site contains a well-developed raised ombrogenous peat bog with the dominant soil type being Histosol with a depth varying from 0.5 m (LP) to approximately 6 m (UP).
Organic matter in the soil is heterogeneous and can be found in different stages of decomposition that vary throughout the peat bog. The storm hydrograph at the catchments (0.65 sq. km) where the experimental peat bog is located is highly variable, with quickly and steeply rising and falling limbs. The hydrological response to rainfall events was fast, and the recession to antecedent base flow occurred rather quickly (Figure 2e). The mean daily runoff equaled 3.4 mm per day (1297 mm y⁻ 1 ) at the outlet.
The UP is drained naturally into the local stream, and the vegetation of peat bogs varies based on the water regime. Grass parts without trees where this experiment occurred belong to areas where the groundwater table occurs near the surface. Soil is primarily saturated with water during the year. The LP hydrological response is probably affected by one drainage ditch. However, due to the high hydraulic conductivity of peat, the channel drains only a small buffer [15]. In general, UP and LP did not differ in SM or GWL, which can be proven by their similar vegetation cover (grass, moss).

Ground Survey
Field surveys and airborne campaigns occurred between August and November 2018, and between May and November 2019 for a total of 10 days of data collection (Figure 2; represented by red dashed lines). For the GWL measurements, we installed 63 water tubes around the study area, 43 at the UP (Figure 1d), and 20 at the LP (Figure 1d). For the wells, we employed 1 m long and 2.5 cm diameter PVC pipes (Figure 3b) perforated through nearly their entire length to allow water to infiltrate, and we manually installed them into soft peat soil. We measured the GWL within the wells in Figure 3h for each UAV survey using a blowpipe and a scale. We used the blowpipe to identify the tube's water level and a measuring tape to record the water ground-level depth, similar to [14]. Additionally, we collected 63 samples of the topsoil next to the wells (Figure 3d) at both sites, which were dried in the laboratory to obtain the gravimetric soil water content (GWC) of the surface peat material taken from the samples to determine moisture content. GWC is the mass of water per mass of dry soil in a given sample. The taken soil probe was first weighed and then dried in an oven at 105 • C for at least three days, depending on the soil material. The equation for GWC (θg) is: where Wet is the weight of the soil sample from the field and Dry is the weight of the dry soil sample. Further, we installed 12 Thermochron iButton (model DS1921G-F5#) loggers (hereafter referred to as Tbuttons, Figure 3c), 8 in the UP and four in the LP, to validate the topsoil surface temperature estimates during thermal imagery. In situ ground truthing is required for UAV thermography radiometric calibration due to spatiotemporal variations in air temperature, humidity, and atmospheric aerosol concentrations [34]. The Tbuttons logged the temperature at half-hour intervals with an accuracy of ±1 • C, sampling 2048 temperature readings that allowed thermal imagery calibration later (Section 2.4). We measured the exact position of the Tbuttons and wells with a Topcon HiPer SR geodetic global navigation satellite system (GNSS) receiver with a horizontal accuracy of 1.8 cm and a vertical accuracy of 2.5 cm using a real-time kinematic method with a virtual reference station.

UAV Platforms and Cameras
UAV campaigns were conducted with ground surveys. We used two different UAV platforms. For the first flight, a commercial Mavic Pro (DJI Co.) drone was used to acquire nadir-viewing aerial imagery of the peat bog surface. We attached a second gimbal (MAPIR Survey3 Mount) with a FLIR DUO R dual-sensor RGB/thermal camera on the drone to capture thermal band (IR) imagery in the spectrum from 750 to 1350 nm set to an interval of 1 sec (Figure 3e). The mount was at a fixed 17-degree tilt that oriented the camera nadir during the maximum forward surveying flight speeds of approximately 8 m/s (https: //www.mapir.camera/products/dji-mavic-pro-mapir-survey-3-single-camera-mount, accessed on 23 March 2020). The thermal data were saved in Radiometric JPEG format to preserve the temperature values for every pixel. The resolution of the thermal lens is comparatively low (160 by 1220 pixels). The FLIR camera did not have a GPS sensor; thus, we distributed black 0.5 m × 0.5 m targets as ground control points (GCPs) (Figure 3g). For the second flight, we equipped the second drone, a MikroKopter ARF XL, with a miniaturized camera (Tetracam µ-MCA Snap 6) to acquire near-infrared (NIR) images of 550 to 900 nm every second in continuous mode ( Figure 3f). We positioned five validation targets with known reflectances in the field (Figure 3h). We used, white 0.5 m × 0.5 m targets as GCPs for the multispectral overflights ( Figure 3f) and conducted RGB and thermal overflights at a height of 60 m to ensure visibility of the surface cover and the black targets visibility in the thermal images. The multispectral overflights were taken at a height of 80 m due to the narrow shooting angles of the Tetracam. We measured the GCPs with the GNSS receiver Topcon (Section 2.2). Five permanent GCPs at the UP and three permanent GCPs at the LP with additional temporary distributed GCPs each time during the field campaign served as co-registration of all scenes taken by the different sensors ( Figure 1d). All specifications Remote Sens. 2021, 13, 907 7 of 29 of drones and sensors are shown in Table S1.

Data Processing 2.4.1. RGB Scenes
The two test sites (UP and LP) were flown across ten times over with nearly every sensor (Table 1). Most flights were performed around solar noon or in the early afternoon and under stable cloud cover (no clouds or sparse cloud cover). We processed all data (RGB, multispectral, and thermal images) in Agisoft Metashape Professional 1.6.2 (https: //www.agisoft.com, accessed on 19 January 2021) high accuracy settings and only generic preselection. RGB dense clouds were generated using high settings and aggressive depth filtering and then converted to orthomosaics (3 cm/pixel) based on the DSMs. Point clouds were georeferenced to the EPSG:5514 coordinate system (S JTSK/Krovak East North) using permanent GCPs (Figure 1d) to evaluate alignment accuracy. Black targets that were visible by both the RGB and thermal sensors were positioned on the permanent markers ( Figure 3g). The GCPs were manually marked in three to four images for each GCP until the algorithm picked up their correct locations across all images. This referencing also optimized the sparse point cloud, thereby mitigating distortion effects [35]. The resulting RGB orthomosaics were then used to calculate a suite of 22 vegetation spectral index maps using the uavRst v0.5-4 [36] remote sensing toolbox in R (Table 1). We derived the topomorphometric features listed in Table 1 using the SAGA GIS modules in QGIS [37].

Multispectral Scenes
Before loading the multispectral imagery into Agisoft, we performed radiometric corrections (noise and vignetting reductions) following [38], using appropriate values taken under controlled laboratory conditions. Band 4 was set as a master channel for processing in Agisoft. We used very-high settings and aggressive depth filtering for multispectral dense cloud generation. Before converting the imagery to orthomosaics (0.03 m/pixel), we performed radiometric calibration using the values from the five reflectance targets based on laboratory values and the empirical line method (Section 2.3, Figure 3h) according to the method of [38]. This method yields better results than the rapid-based calibration correction of images taken during the field campaign, as stated in [39]. In this study, white targets were used for georeferencing the multispectral scenes ( Figure 3f). From the multispectral orthomosaics, the normalized difference vegetation index (NDVI) was calculated using the raster calculator in Metashape. Table 1. Peat bog data set spatiotemporal predictors and ground truth responsive variables.

RGI
Red-green ratio index [40] (red/green) Horizontal curvature intersecting with the x-y surface plane. Affects the convergence or divergence of water [47].

WEI Wind Exposition Index
Values below 1 indicate wind-shaded areas, whereas values above 1 indicate areas exposed to the wind.

TRI (1, 4)* Terrain Ruggedness Index (1)* [m]
A measure of terrain complexity/heterogeneity. It calculates the sum change in elevation between a grid cell and its neighboring grid cells. The moving window radius determines how many cells are used to calculate the elevation change [48]. Value is always ≥ 0 m, where 0 represents the minimum roughness.

VRM (1, 4)* Vector Ruggedness Measure
A measure of terrain complexity/variance that captures variability in slope and aspect in a single measure. Ruggedness is measured as the dispersion of vectors orthogonal to the surface within a specific neighborhood.
The radius of the moving window determines how many cells are used to calculate the change in ruggedness.

TPI Topographic Position Index
Difference between the elevation of the cell and the mean of the elevation in surrounding cells [49]. Value is positive when the point is higher than its surroundings, zero when in a flat area or mid-slope, and negative when lower than its surroundings.
TWI SAGA Wetness Index [37] Describes the tendency of an area to accumulate water.

Predictor Variables Definition Formula
Thermal information (Thermal scenes) Ground truth observation variables Describes the source of soil water as well precipitation in peatland ecosystems.

SM Gravimetric Water Content
Controls soil respiration that is defined as a sum of organic matter decomposition and root respiration.
* Topomorphometric variables calculated using variable radii. The lower radius indicates the immediate location conditions, whereas the higher radius indicates broader microtopographic microclimate conditions.

Thermal Scenes
We saved the thermal data as radiometric JPGs and then converted them to 16-bit tiffs, which are favorable for processing in Agisoft. Additionally, we used very high settings and aggressive depth filtering for thermal dense cloud generation. Georeferencing was performed using black GCP targets for the RGB scenes. From DSMs, thermal orthomosaics were created to show a thermal signature in DN values ranging from 0 to 65535. Usually, DN values can be converted to temperatures on the principle of the Stefan-Boltzmann Law [50] as pixel DN values of the thermal band are proportional to the flux of infrared radiation received by the thermal lens [51]. A simple solution is to apply the calibration method reported in [51], the so-called pixel-by-pixel analysis, where all DN values in the thermal band are converted to near-real temperatures. In this study, we used only Tbuttons during the flights to measure ground objects temperature (Section 2.2, Figure 3c). In some cases, the flights took longer than half an hour; thus, we used the mean value measured by the Tbuttons. Concerning the thermal orthomosaics, we created a point shapefile with Tbutton localities with a 1 m buffer centered on the object used to extract the mean DN of the ground object. We performed linear regression between the sampled temperatures and the corresponding DNs [52]. The regression equation was then used to convert the thermal orthomosaics to the relative temperature maps that were used in the raster calculator in Metashape.

Input Variables
This study's final training dataset consists of GWL measurements from installing tubes one meter deep at 63 locations distributed over two locations, where data was recorded one day during each of ten months (10 days, Table 1). We also sampled the top-surface soil moisture at the same 63 locations next to the tubes but only for seven months (7 days, Table 1); three months are missing because the peat bog's soil moisture does not change much during the autumn months and was skipped.
A suite of several auxiliary potential predictor variables was used in this study to predict GWL and SM. Several spatially continuous variables that describe the terrain and can be geometrically characterized by DSMs and many other predictor variables were defined as spatially constant but vary in time as temporal predictor variables. These variables are represented by index maps from orthoimages. Orthoimages are prone to creating many different ratio maps depending on the camera system. Thus, we used a suite of potential vegetation indices based on the RGB scenes as spatial and temporal indicators of GWL and SM (Table 1). Therefore, vegetation is responsive to water regime (i.e., dry surfaces cause changes in the surface temperature), and we also extracted the relative surface temperature from thermal scenes. TIR imaging is also used to estimate the temperature of various surfaces, which allows mapping of temperature patterns providing information on hydrological processes [24].
Additionally, the calculated NDVI data from multispectral orthoimages deliver valuable information with regard to land surface characteristics [53] and represent a key chal-lenge for understanding ecosystem respondents' climate change [54]. Table 1 shows the full list of predictors used in this study. The spatial resolution layer outputs from all predictors were obtained at 0.03 m. Figure 4 shows a simplified flow chart of the methodological approach. After creating all index maps for each month, we stacked each month's layers to gather information on environmental variables in space and time at the UP and LP based on limited field data for that day within a given month. We cropped all layer stacks to a smaller extent, and certain layer stacks contained 32, 33, or 34 layers (maximum number of rasters) due to missing data (see Section 2.4.4). In R, we used an automated data value extraction script based on a point shapefile (63 tube locations) to derive all values from each predictor variable for the training samples ( Figure 4). Each training sample contained extracted information from all potential predictor variables (Table 1) and the GWL and SM information as much as possible based on the reference points. Before making predictions, we eliminated highly linear correlated predictors with absolute correlations above 0.75, zero-and nearzero variance variables [55], and thus raster layers containing incomplete data to reduce redundant data due to correlation, NANs, and near-zero ( Figure 4) [32,56]. With this step, we did the first preselection of our input variables without losing information; however, we reduced the training time significantly during the model performance. Ideally, descriptors should be uncorrelated, as an abundant number of correlated features can hinder the model's efficiency and accuracy. When this phenomenon occurs, different feature selection is necessary to mitigate issues with dimensionality, simplify models, and improve their interpretability and training efficiency [31]. Many other statistics can be used for variable selection, e.g., T-test, F-test, or Mallows CP, whereby different criteria may lead to very different preselection of variables [57]. Most analysis is based on the CAST [33] and caret packages [58] in R (R Core Team, 2020), which provide a wrapper to the RF algorithm [31], implementing functionality for data splitting and CV (Figure 4). Caret is needed for model training. CAST supports the caret package to provide an accounting for spatiotemporal dependencies during prediction, and CAST includes the "FFS" function for optimal variable selection and the major functionality of the "aoa" function that is doing the distance-based estimation of the AOA as we aimed to model GWL and SM far beyond the data loggers to determine how well the model performs ( Figure 4). can hinder the model's efficiency and accuracy. When this phenomenon occurs, different feature selection is necessary to mitigate issues with dimensionality, simplify models, and improve their interpretability and training efficiency [31]. Many other statistics can be used for variable selection, e.g., T-test, F-test, or Mallows CP, whereby different criteria may lead to very different preselection of variables [57].Most analysis is based on the CAST [33] and caret packages [58] in R (R Core Team, 2020), which provide a wrapper to the RF algorithm [31], implementing functionality for data splitting and CV (Figure 4). Caret is needed for model training. CAST supports the caret package to provide an accounting for spatiotemporal dependencies during prediction, and CAST includes the "FFS" function for optimal variable selection and the major functionality of the "aoa" function that is doing the distance-based estimation of the AOA as we aimed to model GWL and SM far beyond the data loggers to determine how well the model performs ( Figure 4). In this study, we distributed the water wells at two locations to measure GWL and SM by sampling them next to the wells (Figure 5), and the main aim was to predict GWL and SM based on a set of potential spatial and temporal predictors (34 predictors), which are terrain related, vegetation indices, or reflect surface thermal properties. Each of the In this study, we distributed the water wells at two locations to measure GWL and SM by sampling them next to the wells (Figure 5), and the main aim was to predict GWL and SM based on a set of potential spatial and temporal predictors (34 predictors), which are terrain related, vegetation indices, or reflect surface thermal properties. Each of the wells represented a "Tube ID" (Figure 5a,b; red crosses). Among the validation strategies, k-fold CV is popular to estimate the model's performance because of data that have not been used for model training. During CV, models are repeatedly trained (k models), and with each model run, the data of one-fold are put to the side and are not used for model training but model validation. In this way, the performance of the model can be estimated. Typically, a random k-fold is used, which means that each of the folds contains the training data (i.e., to map GWL or SM) since we aimed to map GWL and SM instead of performing a target-oriented validation that validated the model in view to spatial mapping. We were not interested in the model performance given random subsets of our data locations, but we needed to know how well the model can predict areas without data loggers. To find this out, we needed to repeatedly leave the complete time series of one or more data locations and use them as test data during CV. To do this, we needed to create meaningful folds by using CAST's function "CreateSpaceTimeFolds," which is designed to provide index arguments used by caret's Among the validation strategies, k-fold CV is popular to estimate the model's performance because of data that have not been used for model training. During CV, models are repeatedly trained (k models), and with each model run, the data of one-fold are put to the side and are not used for model training but model validation. In this way, the performance of the model can be estimated. Typically, a random k-fold is used, which means that each of the folds contains the training data (i.e., to map GWL or SM) since we aimed to map GWL and SM instead of performing a target-oriented validation that validated the model in view to spatial mapping. We were not interested in the model performance given random subsets of our data locations, but we needed to know how well the model can predict areas without data loggers. To find this out, we needed to repeatedly leave the complete time series of one or more data locations and use them as test data during CV. To do this, we needed to create meaningful folds by using CAST's function "CreateSpaceTimeFolds," which is designed to provide index arguments used by caret's trainControl, where the data samplings' location defines a location. The index defined which data points were used for model training during each model run and reversely defined which data points were hindered. Thus, using the index argument, we could account for the data's dependencies by leaving the complete data from one or more data locations out like in LLO CV. We focused on LLO CV; therefore, we used the "Tube ID" to define the location of a data location (water well) and split the data into five-folds; thus, five model runs were performed, each leaving one-fifth of all data loggers out for validation. Furthermore, CAST's FFS variable selection was used to exclude counterproductive variables or will not improve the LLO performance and thus excluded from the model by FFS [33,59]. In that way, FFS selected variables led in combination with the highest LLO performance and the best spatial model. Consequently, FFS trained all possible pairs of two predictor variables, and the best predictor (hyperparameter settings) was used at the respective split to partition the data (mtry); thus, the optimal models with the smallest values. Based on these optimal models, predictor variables were iteratively increasing, and each of the remaining variables was tested for its improvement, reaching the best model. The variable importance is based on the out-of-bag MSE for an RF model [60]. The out-of-bag MSE is calculated for each tree before and after permuting a variable. The differences are averaged and normalized by the standard error. The importance scores were executed by the "varImp" function in R.

Model Description
The LLO CV error must be deemed the final performance indicator of spatial and spatiotemporal models [32]. Based on the trained model values extracted by point shapefile at 63 tube locations, we made spatial predictions of GWL and SM on the raster stacks containing all predictor variables' spatial data for that day of fieldwork. Thus, the trained model was used on these raster datasets, resulting in spatially comprehensive GWL and SM maps (Figures 6 and 7). At first sight, the predictions maps look good, but how accurate are they?
To analyze if the predicted models applied to the entire study area or if there were locations that were different in their predictor properties, [33] developed an approach to make predictions over an area of interest, determine the best model variables, and simulate how accurately the prediction performs far beyond the sampling locations in the AOA. We used that analysis for the proposed monthly models across the study area if there are different locations in their predictor properties from what the model has learned. In short, the proposed method implements a unitless measure for expressing how different a new data point is from the training data set, the so-called DI, a measure of unknown space, which proposes a threshold to define the AOA of a model. Consequently, the AOA is calculated using the model validated by CV. Second, it takes the spatial clusters into account and determines the threshold based on minimum distances to the nearest training point not located in the same cluster. This is done in the AOA function, where the folds used for CV are automatically extracted from the model. For example, we used a five-fold CV for model validation. The training data were split into five-folds where one-fifth (one fold) of the training data was left out during CV. Based on that one-fold held back during CV, the DI is calculated to all the other training data samples that have not been located at the same CV fold. In this way, the DI distribution can be seen within the training data and taken as a threshold by assuming the CV error was based on these DIs in our training data. Is DI smaller than the threshold, the predicted area is inside the AOA. Is the DI higher than the threshold, the predicted area is outside the AOA. Remote Sens. 2021, 13, x FOR PEER REVIEW 13 of 30   Thus, two data sets were used; the proposed training sampling locations, which were already trained and made the first selection of the best variables by the FFS; and the second dataset, which contained the raster stacks (32, 33, or 34 layers) with the new locations where predictions were used. In the end, we obtained a raster showing us the AOA, the area for which we expected the mode to make reliable predictions (Section 3.2). For more details, see [33]. To determine if relationships between the predictive models and the ground truth data existed, we plotted the GWL and SM variability (mean and median) (Figure 5c-f) at both sites to assist the proposed interpretation. Figures 6 and 7 show the individual simulated response predictions for GWL and SM's respective pixels for the entire UP and LP using the root-mean-square error (RMSE) as the metric for the optimal variable selection using the smallest values. Despite the differences in the final set of variables selected, the spatial predictions of the GWL and SM at both sites led to different results. The RF regression models' performance statistics, which only include variables selected in the FFS selection process, are shown in Table 2. GWL spatial predictions typically resulted in higher RMSE values but not necessarily in lower response variable predictions, as shown by the R 2 values ( Table 2). The best fit model for GWL prediction occurred in August 2019 (R 2 = 0.43) at the UP and in August 2018 at the LP (R 2 = 0.78). In comparison, SM spatial predictions resulted in lower RMSE values but showed a lower prediction power at both sites. The best fit model of the simulated SM prediction occurred in August 2019 (R 2 = 0.41) at the UP and in September 2019 at the LP (R 2 = 0.59). The importance of the different predictors ranged from 0 to 8 (GWL at UP), 0 to 5 (SM at UP), 0 to 3 (GWL at LP), and 0 to 5 (SM at LP) (Figures 8 and 9). These ranges represent the baseline for the variable weighting used to estimate the DI. The essential relative variables or feature importance ranking of RF for GWL predictions were temperature, RGB spectral variables (BI, GLI, CI, RI, and HUE), and one topomorphometric feature (TWI) at the UP ( Figure 7). Conversely, the LP importance variables for GWL predictions were NDVI, topomorphometric variables (PlnCurv, TWI, VRM1, and TPI), and, to a lower degree, RGB spectral variables (HUE and NDI) (Figure 9). The importance of predictive variables for the SM simulated prediction were mainly RGB spectral variables (BI, HI, ERGBVE, SHP, and HUE), NDVI and temperature at the UP and temperature, NDVI, topomorphometric predictor (VRM1), and RGB spectral variable (RI) at the LP (Figures 8 and 9).

AOA Calculations
The accuracies of model predictions were determined using AOA calculations. The AOA areas for which we expected the model to make reliable predictions are shown in Figures 9 and 10. AOA simulation models used the 95% quantile of the DI of all training data as a threshold suggested by [33], where the DI of the training data is calculated by considering the CV folds in the 5 fold case. Applying this threshold, the prediction error (RMSE) inside the AOA should be comparable to the trained model's CV RMSE. Conversely, the CV error does not apply outside the AOA [33]. Most simulated predictions showed reliable predictions (less of a gray area) except for a few (e.g., August 2018 and November 2018 for GWL and September 2018 for the SM simulation at the UP; Figure 10), where most areas were outside the AOA (gray area). Thus, predictions should not be considered for these locations or in further analysis. Reliable predictions were also made for the LP (Figure 11), except for October 2018 (GWL prediction) and for September 2019 (SM prediction), where the entire area is nearly gray. In future analysis, these two months might not be counted. Typically, low applicability values at the UP site appeared at the bush sites, over water surfaces, and at the corners and edges of the study area ( Figure 10). A LP site, low applicability values emerged between the high grass or the area edges ( Figure 11). Table 2. Statistical performance of the CAST models using cross-validation and a spatially constrained 5-fold based on the FFS method. Mtry value is the number of variables available for splitting at each model tree node. Model performance was assessed through the Root-Mean-Square-Error (RMSE), the Coefficient of Determination (R 2 ), Mean Absolute Error (MAE), and their respective Standard Deviations (SD). Reference estimates were assessed through MEAN, MEDIAN, SD, and the average score (MEAN) for both GWL and SM models, and reference data.      As a real-world example, we could not analyze the AOA reliability with simulated data [32,59,[61][62][63][64], where the prediction error within the AOA can be compared to the model error, assuming that the model error applies inside the AOA but not outside. Therefore, a simulated area-wide response is not available in usual prediction tasks; therefore, the AOAs estimation in the dataset used in this study had only point observations as a reference. We plotted the relationship between the DI threshold derived by the 95 quantile and the relative prediction performance measurements (Figure 12a,b).

Model Performance
Overall, using the CAST prediction method of implementing the FFS in conjunction DI provides a basis to estimate the AOA that allows the area for which the CV is suited accurately on average. Based on [33], if a location is similar to the training data properties, it will have a low distance in the predictor variable space (DI towards 0), while locations that are significantly different in their properties will have a high DI. DI threshold values differed between the GWL and SM predictions (Figure 12a). The GWL DIs were lower and more robust, ranging from 0.22 to 0.85, but were caused by higher predicted RMSE. Conversely, SM DIs were more widespread (0.084-1.23) and probably more sensitive, resulting in lower predicted RMSE values (Figure 12a). It was also optimistically approved by R 2 estimates, showing higher R 2 for higher GWL DIs and vice versa (Figure 12a). The R 2 for SM estimates resulted in a slighter opposite trend showing higher SM DIs by lower R 2 estimates. Individually, as observed in the boxplots, the GWL prediction models showed the highest absolute RMSE errors at lower DI thresholds at the UP, in which the lowest RMSE performances were at moderate to high DIs for the SM models at the UP and LP, respectively (Figure 12b). The best relative (R 2 ) model performances were made for the SM and GWL models at the LP with the highest DIs (Figure 12b) The statistical relationship between field data and the absolute model performances indicated a trend that by lower GWLs and higher surface water, the predicting models resulted in better prediction performances at both sites (Figure 12c). A similar trend was seen for the SM field data and their model performances, showing the models' lower SM better achievements (Figure 12d). The best model performances result in environmental conditions with relatively low GWL and SM than those to higher GWL and SM events (Figure 12c,d).

Model Performance
Overall, using the CAST prediction method of implementing the FFS in conjunction with target oriented LLO CV validation, modeling GWL and SM was probably not the best way to provide highly accurate prediction maps with high R 2 and low RMSEs. However, the proposed method preserved the modeling results with regard to overfitting and objective error assessment, a common problem in ML [32,59]. Thus, we did not consider longitude, latitude, and elevation as predictors that are typically responsible for overfitting due to spatiotemporal autocorrelation patterns stated in [32] that would probably result in better model performances. We only focused on the FFS's environmental predictors to minimize overfitting and retain the data's real spatiotemporal independencies (nonlinear and complex system), which is challenging to interpret physically by obtaining information on a handful of environmental variables in space and time.
By evaluating all RF performances (highest R 2 , lowest RMSE), we achieved the best results with the GWL and SM at the LP site, and the lowest variation in the predictions were found at the UP site. As a potential reason, lower performance might be caused by the test data, including, incomplete spatial coverage of the mapping area [65] and, more likely, incomplete coverage of the UAV maps. UAV mapping of an entire study area cannot be maintained constantly during field campaigns due to wind, battery, or camera failures, particularly for the Tetracam camera, which is sensitive to operational conditions. From the AOA predictions at the UP site, it is clear to move the sampling strategy from the bush site only to the open sites with grass, moss, and bare peat patches. Additionally, surface water is more easily viewed when the soil at the open sites is thoroughly saturated with water directly after precipitation or after snowmelt (e.g., May 2019, June 2019, August 2019, September 2019, November 2019), which is the primary source of extra inflow water to the peat bog. The remaining months (e.g., August 2018, October 2018, September 2018, July 2019) were less saturated, or the surface (bare peat patches) appeared to be nearly dry, especially at the UP (Figure 4).
In comparison, the LP site seemed to provide better GWL prediction. Most likely because the LP was saturated with water most of the time (Figure 4), the homogenous cover acted like a sponge and retained daily rainfall. In general, the lower the GWL level of the peat bog area is, the higher the surface water saturation and vice versa. This fact describes the dual-porosity organic soils [66,67]. Groundwater fluctuates in larger peat pores filled with mobile water. Undecomposed tissues of organic materials contain a nonnegligible volume of immobile water. Moreover, groundwater tables in tubes are affected by gravity rather than capillary forces in small capillary pores. These forces oppose each other, particularly during dry periods. However, this is not always the case due to other impacts, such as runoff, evaporation [68], and physical soil properties, such as peat volume change [69] or repellency [70]. Repellency can also affect the soil moisture distribution even several cm below the soil surface. The UP's surface runoff influences the LP; thus, the LP site was frequently flooded by water, resulting in lower and more diverse GWL measurements (Figure 4e), while the GWL in the UP was higher and relatively steady ( Figure 5c).
As mentioned above, SM predictions were marginally better at the LP than at the UP, which might be caused by homogenous vegetation cover and the marginally higher SM values measured at UP. Vegetation cover is known to play a vital role in the hydrometeorological cycle and local climate conditions [71]. Wet conditions, such as those at the LP, with high grass coverage and frequent saturated surface water along with heat energy are required to increase the soil temperature because water has a higher specific heat capacity than soil material, leading to an increase in SM and likely a better response to wetting and drying cycles, improving SM prediction [72]. The high variety of the SM DIs, which ranged widely, might be more sensitive to environmental factors than the GWL predictions, which showed more robust DI distributions (Figure 12a). This hypothesis, however, is part of the measurement in another experiment.

Interpretation of Model Predictions
Surface temperature is the most important predictor for the GWL and SM at both the UP and LP with regard to model importance. Surface temperature influences water movement in the soil: the higher the surface temperature is, the higher the evaporation of soil water. Organic soils are more resistant to evaporation, which is primarily caused by increasing temperature [73], for several reasons, including disconnection temperature conductivity during the drying process. However, transpiration can substitute for the loss of water from a peat bog. Hypothetically, a temperature rise in the future and water holding (the GWL) might be affected by higher transpiration. The GWL is considerably dynamic, and ombrotrophic bogs, such as the proposed experimental study area, are dependent on precipitation, which typically describes the high variability of the GWL [15]. Ref. [10] suggested that the GWL is specially controlled by the difference between precipitation and evapotranspiration, as stated in [14,74]. The temperatures of the surface water and GWL are affected by the amount of incoming solar radiation and the vegetation pattern. For example, the exposed bare peat patches absorb more heat in the UP, resulting in a rapid decrease in GWL and increased water temperature [15] when the peat is not dry with a high-temperature conductivity. Spots with higher temperatures aid water tracking and hydrograph separation. Beyond confirming the temperature-GWL-SM relationship, the RF provided insight into the usefulness and the impact of NDVI, as well as RGB spectral indices (BI, CI, GLI, RI, HUE, NDI, SHP, and ERGBVE), directly representing proxies for the estimation of vegetation response towards changes in hydroclimatic conditions [53,75,76]. Moreover, these predictors can be considered appropriate for assessing vegetation conditions in peatlands to characterize temporal and spatial patterns and the degradation process of a peat bog ecosystem [77], primarily NDVI, NDI, ERGBVE, SHP, and GLI. Interestingly, the predictors BI, CI, RI, and HI are important; thus, there seems to be a relationship between these factors, GWL and SM. Typically, these predictors are more prone to bare soil (in this study, bare peat coverage) and may provide accessory information about soil physical properties (e.g., about peat composition due to a lower water table during dry periods) [78]. Because SM decreases on a typically decomposed peat surface, high soil water retention may develop and yield development that has already been observed at the UP site. However, further investigations are required to clarify this specifically.
Further, the UP site contained the highest number of topomorphometric variables (TWI, PlnCurv, TPI, and VRM1) for GWL predictions and twice the number for SM predictions (VRM1) at the LP. The TWI is a measure of subsurface lateral transmissivity and has become a familiar and widely used method to gather information about the spatial distribution of wetness conditions (e.g., the position of shallow groundwater tables and soil moisture) [79]. PlnCurv, TPI, and VRM1 are topographic indices and thus play a significant role in GWL predictions with regard to dominant controls on groundwater level and runoff mechanisms [80]. Small local hydraulic gradients might control the downslope drainage condition, which determines GWLs [80]. As reported in [81], the probability of sphagnum decreases with increased ruggedness, elevation, and topographic position, which provides information on exposed bare peat. As reported in [82], VRM1 contributes to SM, particularly when using high-resolution DEMs; thus, VRM1 is a suitable proxy for soils with different porosities in terms of high roughness, highlighting that stony soils imply low soil moisture, and that low roughness describes developed soils that retain more moisture, as in peat. VRM1 measures vector dispersion across the central pixel rather than being a derivative of the slope, and is thus a much better proxy than others, such as TRI [83].
The proposed predictive models identify a variety of RGB, multispectral, and thermal maps as well as DSM-derived variables as suitable or complementary representatives to in situ measurements for GWL and SM predictions. However, we recommend going beyond their traditional use of, for example, elevation, slope, and aspect. Additionally, these derived variables, such as the RGB spectral indices and DSM-derived ones, are easy to compute proxies of environmental features that involve limited fieldwork and good knowledge of the peat bog's environmental peat bog status.

Conclusions
We tested how well the peat bog groundwater level (GWL) and soil moisture (SM) can be predicted using multi-source and multitemporal ultrahigh-resolution UAV maps with leave-location-out (LLO) cross-validation (CV) prediction models based on random forest (RF) machine learning (ML) algorithm. Additionally, forward feature selection (FFS) was used to prevent overfitting, and only the optimal predictor variables were used for spatial and temporal model predictions. In this case study, the predicted GWL and SM maps used 34 possible candidate predictors to visualize and access knowledge about the dynamic properties of two different peat bog locations in the Rokytka Moors within Sumava NP, Czechia. As input data, we used RGB spectral indices (RGI, VVI, VDVI, VARI, TGI, SI, SHP, SCI, SAT, NGRDI, NDTI, NDI, HI, GRVI, GLI, GLAI, ExG, ERGBVE, CI, BI, RI, and HI) and NDVI from multispectral data to obtain the vegetation response to changes in hydroclimatic conditions. We selected temperature soil information based on thermal data to explore the thermal potential on GWL and SM and topomorphometric variables (Slope, ProfCurv, PlnCurvc, WEI, TRI, VRM, TPI, and TWI) derived from DSMs to investigate microtopomorphometric controls on GWL and SM patterns. The created maps had a spatial resolution of 0.03 × 0.03 m. Predictions of the variable importance have proven the dominant impact of temperature on GWL and SM and extremely significant contributions of other predictors, such as NDVI, NDI, ERGBVE, SHP, GLI, BI, CI, RI, HI, TWI, PlnCurv, TPI, and VRM1. The predictions resulted in low (lowest R 2 = 0.12) to high (highest R 2 = 0.78) CV performances. We also mapped the area of applicability (AOA) of spatial prediction models (i.e., the area in which a model could expect to make accurate predictions). Prediction outside the AOA should be treated with caution. Most of the proposed predicted models yielded acceptable models except for a few where the incomplete spatial coverage of UAV maps was missing, and nearly all these areas were outside of the AOA. Primarily, ground truth sampling can be avoided in bush areas. The results showed that the DSM, RGB, multispectral, and thermal UAV imaging datasets were important in GWL and SM estimations to predict spatial distribution phenomena, especially the best model performances resulting in environmental conditions with relatively low GWL and SM than those to higher GWL and SM events. Thus, further research is needed both in sampling more ground measurements and in using additional profound statistical procedures to understand the causal structure of its underlying phenomena or interpretation of the ML models using UAV data. It is advisable to combine UAV input data with many relevant environmental variables used in this study when studying the dynamic mapping properties in spatially heterogeneous peat bog landscapes.
Supplementary Materials: The code and example input data of the lower peat bog are available online https://www.mdpi.com/2072-4292/13/5/907/s1. Table S1: Technical specification of drones and sensors used in this study. Funding: This research was funded by the Grant Agency of Charles University project GAUK78318 "Monitoring of peat bog habitats using UAV-based multispectral digital photogrammetry", Czech Science Foundation project 19-05011S "Spatial and temporal dynamics of hydrometeorological extremes in montane areas" and European Cooperation in Science and Technology COST CA16219 project "Harmonization of UAS techniques for agricultural and natural ecosystems monitoring". Data Availability Statement: R code used in the study and example input data of the lower peat bog are available via GitHub at: https://github.com/Theomania/peatbog-studies.git (accessed on 24 February 2021). Original datasets are available with the authors upon request.