Importance of Spatial Autocorrelation in Machine Learning Modeling of Polymetallic Nodules, Model Uncertainty and Transferability at Local Scale

Machine learning spatial modeling is used for mapping the distribution of deep-sea polymetallic nodules (PMN). However, the presence and influence of spatial autocorrelation (SAC) have not been extensively studied. SAC can provide information regarding the variable selection before modeling, and it results in erroneous validation performance when ignored. ML models are also problematic when applied in areas far away from the initial training locations, especially if the (new) area to be predicted covers another feature space. Here, we study the spatial distribution of PMN in a geomorphologically heterogeneous area of the Peru Basin, where SAC of PMN exists. The local Moran’s I analysis showed that there are areas with a significantly higher or lower number of PMN, associated with different backscatter values, aspect orientation, and seafloor geomorphological characteristics. A quantile regression forests (QRF) model is used using three cross-validation (CV) techniques (random-, spatial-, and cluster-blocking). We used the recently proposed “Area of Applicability” method to quantify the geographical areas where feature space extrapolation occurs. The results show that QRF predicts well in morphologically similar areas, with spatial block crossvalidation being the least unbiased method. Conversely, random-CV overestimates the prediction performance. Under new conditions, the model transferability is reduced even on local scales, highlighting the need for spatial model-based dissimilarity analysis and transferability assessment in


Introduction
The spatial distribution of deep-sea polymetallic nodules (PMN) is currently of high interest due to their high metal content of Mn, Fe, Ni, Co, Cu, or Li. These metals are needed for green and decarbonized technologies, such as electric cars and wind turbines [1,2]. The European Union alone will need 60 times more lithium and 15 times more cobalt by 2050 than today [3] for this transition. Deep-sea resources such as PMN could support this transition, with predictive spatial mapping having a key role in mining-block prioritization.
PMN spatial mapping advanced by using autonomous underwater vehicles (AUVs) to acquire large volumes of hydroacoustic and image data allows for high-resolution seafloor reconstruction at meter and even down to centimeter scales. The quantitative analysis of images enlightens the PMN distribution, narrowing the spatial gap that arises from the sparse ground-truth box-corer samples (usually > 1.8 km) with very limited sampling area (0.25 m 2 ) [4][5][6][7][8][9].
(d) The recently proposed method of dissimilarity index and area of applicability (AOA) [35]. AOA can identify geographical areas with novel feature space conditions that could hinder a reliable model transfer.
To our current knowledge, this is the first time that these topics are investigated for a regression random forests model applied for predicting PMN spatial distribution. Moreover, the literature research yielded only two studies in the marine environment (habitat mapping) that consider and include spatial-CV or/and the AOA, highlighting the need for further investigation. Similarly, QRF has been applied only rarely for seafloor spatial predictive modeling, despite the popularity of the RF algorithm in general [60].

Study Area
The Peru Basin is one of the largest PMN fields globally, with an average abundance of 10 kg/m 2 [61]. Although the abundance and metal grade are of economic value [1], only a few studies have examined the PMN spatial distribution for economic reasons, focusing on the northern part of the investigated area that exhibits a substantial spatial variance [62][63][64][65]. In this northern part, lies the "DISturbance and reCOLonization" (DISCOL) experimental area (DEA). Inside the DEA (Figure 1), disturbance experiments were conducted to assess the environmental impact produced by a plough harrow [66,67]. The DEA is spatially heterogeneous regarding the seafloor morphology, geochemical properties, and PMN distribution, with many authors highlighting the need for further research [62][63][64]68,69]. The mapped area extends north and south of the DEA (in the N-S direction) and towards the northeast (hereafter DEA-NE). The entire region lies between −4047 m and −4179 m water depth, which is slightly above the regional carbonate compensation depth (CCD) at −4250 m [63]. The DEA itself has low relief, gentle slopes (<3 • ), longitudinal abyssal furrows, and areas with low reflectivity backscatter values and no PMN, hereafter called black patches ( Figure 1). The sediments are layered clayey silts and silty clays, with foraminiferous residues and shell fragments [67,70].
The observed abyssal furrows strike perpendicularly to the regional contours and have a U-shape form. They are oriented parallel to the predominant NW bottom current flow [71,72]. Long-term studies showed that this deep flow is not stable but has periods of quasi-unidirectional strong currents (>5 cm/s but sometimes up to 17 cm/s) and periods with slower omnidirectional currents (1-3 cm/s). The recorded bottom current velocities are inside the velocity range that could preserve abyssal furrows [73]. The abyssal furrows act as bottom current channels, occasionally eroded during short periods of higher current velocities or even being relics from an earlier basin period, when the current regime was stronger [73][74][75]. Their preservation is supported by the low regional sedimentation rates of 0.4-2.0 cm/ka [76]. Past erosional bottom currents have also been proposed to be responsible for the formation of the black patches; these show Plio-Miocene carbonate-rich sediments as infilling [63,68,72].
The PMN-free black patches are easily recognizable due to their low backscatter reflectivity ( Figure 1). They have an ellipsoid shape, with their long axes oriented downslope. Their depth varies between 2-5 m with remarkable flat and horizontal seafloors [67]. This downslope direction has been observed in other low reflectivity PMN-free patches in Peru Basin, connected with downslope sediment transport [68].  [77]. The area is characterized by N-S orthogonal striking graben and horst structures, sea mountains, knolls and hills with steep slopes, local depressions, and pit structures [67]. Top right: ship-based MBES backscatter mosaic map of the same area. The AUV mapped DEA and DEA-NE areas are indicated as a shaded black polygon. Bottom left: (a) DEA and DEA-NE AUV-based bathymetry with photo locations crisscrossing the area (black points). The contours have a 2 m depth interval. GC: gravity corer [69], BCM: bottom current measurements [71]. Map (b) is an enlarged map showing the abyssal furrows. The furrow height is up to 2 m. Bottom right: (a) DEA and DEA-NE AUV-based backscatter map. The northern part has a higher backscatter intensity than the central and southern parts because of the higher number of PMN. Backscatter alternations are also created by the presence of abyssal furrows; (b) two of the three low reflectivity black patches (PMN-free) that can be identified in the DEA.
Larger-scale PMN-free areas with low backscatter reflectivity have also been observed for the Clarion-Clipperton Zone [15]. A recent gravity core inside a black patch in DEA ( Figure 1) revealed that its geochemical conditions differ significantly from the surroundings, having increased organic carbon content (0.5 wt%-0.8 wt%) [69]. The increased organic carbon content might have shifted the Mn-redox closer to the seafloor, where the diagenetically mobilized Mn is released into the bottom water and thus not supporting the PMN formation [68,69,78,79]. Photos inside the black patch confirmed the absence of surficial PMN ( Figure A1).
In the DEA-NE area, the seafloor is heterogeneous, with elongated and conical knolls and hills separated by local depressions. Visual inspection showed that the slopes have an extremely thin sediment coverage, while the local depressions act as sediment depocenters with increased sediment accumulation and talus debris [77]. The debris of basaltic fragments could act as nuclei supply to form new PMN [63,64]. PMN surficial coverage in this area varies, with black patches also present ( Figure 1). The high-reflectivity subareas are mainly due to an exposed basement, such as the two 10 m high outcropping volcanic cones of pillow basalt inside a crate-shaped structure (Figures 1 and A2). The  [77]. The area is characterized by N-S orthogonal striking graben and horst structures, sea mountains, knolls and hills with steep slopes, local depressions, and pit structures [67]. Top right: ship-based MBES backscatter mosaic map of the same area. The AUV mapped DEA and DEA-NE areas are indicated as a shaded black polygon. Bottom left: (a) DEA and DEA-NE AUV-based bathymetry with photo locations crisscrossing the area (black points). The contours have a 2 m depth interval. GC: gravity corer [69], BCM: bottom current measurements [71]. Map (b) is an enlarged map showing the abyssal furrows. The furrow height is up to 2 m. Bottom right: (a) DEA and DEA-NE AUV-based backscatter map. The northern part has a higher backscatter intensity than the central and southern parts because of the higher number of PMN. Backscatter alternations are also created by the presence of abyssal furrows; (b) two of the three low reflectivity black patches (PMN-free) that can be identified in the DEA.
Larger-scale PMN-free areas with low backscatter reflectivity have also been observed for the Clarion-Clipperton Zone [15]. A recent gravity core inside a black patch in DEA ( Figure 1) revealed that its geochemical conditions differ significantly from the surroundings, having increased organic carbon content (0.5 wt%-0.8 wt%) [69]. The increased organic carbon content might have shifted the Mn-redox closer to the seafloor, where the diagenetically mobilized Mn is released into the bottom water and thus not supporting the PMN formation [68,69,78,79]. Photos inside the black patch confirmed the absence of surficial PMN ( Figure A1).
In the DEA-NE area, the seafloor is heterogeneous, with elongated and conical knolls and hills separated by local depressions. Visual inspection showed that the slopes have an extremely thin sediment coverage, while the local depressions act as sediment depocenters with increased sediment accumulation and talus debris [77]. The debris of basaltic fragments could act as nuclei supply to form new PMN [63,64]. PMN surficial coverage in this area varies, with black patches also present ( Figure 1). The high-reflectivity sub-areas are mainly due to an exposed basement, such as the two 10 m high outcropping volcanic cones of pillow basalt inside a crate-shaped structure (Figures 1 and A2). The visual inspection revealed the absence of sediments in their current-exposed, steep sides. PMN were found in the base of the crater, mixed with talus debris [67,77]. Geochemical analyses of the GC sediments revealed depth-profile variations for several chemical compounds, which exceed the area variability of other stations, highlighting the need for further research [69].

Hydroacoustic Data
High-resolution MBES data were acquired with the AUV Abyss from GEOMAR [80,81]. Bathymetric data processing was performed with the QPS Qimera 1.7 and MB-System 5.7 software (Monterey Bay Aquarium Research Institute (MBARI) University of New Hampshire and MARUM, Handelsweg, The Netherlands) [82]. The backscatter processing was carried out with QPS FMGT 7. The finally generated GeoTIFF grids have a 3 m × 3 m cell size, projected in Universal Transverse Mercator (UTM) zone 16S. Parts of the MBES data have been presented already by [67,83], but here they have been reprocessed and merged into one unified dataset.

Seafloor Geomorphological Analysis
Sixteen derivatives of the bathymetric data were computed (Table 1), following the recommendations of the current literature in the field of quantitatively geomorphometric analysis of seafloor data [50,84]. Derivatives were calculated based on a 10-cell (30 m) neighborhood, which is the minimum defined size for some of the derivatives (e.g., vector ruggedness measure) [85]. However, there are bathymetric derivatives that are calculated from a 3 × 3 cell neighborhood (e.g., slope, aspect). For those derivatives, the mean bathymetry was calculated first, and afterwards the derivative was determined. This approach shows better results than grid resampling or derivative averaging [86]. The 30 m neighborhood relates to the AUV positioning uncertainty, ensuring that the correct seafloor derivatives values will be extracted for each photo location. In this respect, a spatial autocorrelation of the predictors can further reduce the impact of positional uncertainty on the ML accuracy [87,88]. Moreover, the spatial autocorrelation could eliminate existing MBES artifacts that could affect the predictive modeling [52,[89][90][91]. A multiscale approach was applied for the bathymetric position index, with finer and broader scales to better depict seafloor heterogeneity (0-30 m, 30-100 m, and 100-300 m). The aspect was transformed to northness and eastness [92][93][94]. Abbreviations and references to the algorithms of the calculated 14 derivatives are given in Table 1.

Optic Data
High-resolution photos were acquired by the deep survey camera system onboard the AUV Abyss [105]. The compact morphology-based nodule delineation (CoMoNoD) algorithm [9] was used for automated image analysis, providing the number of PMNs/m 2 . CoMoNoD has been used for quantitative assessment and predictive modeling already [7,8], in which the advantages and limitations of the algorithm have been discussed.
We need to highlight that the primary goal of SO242-1 was to re-map in high-resolution the seafloor (acoustically and visually) inside and outside the DEA, which had been ploughed in 1989 [66,67], to have data that can provide insight into the current state of the environmental status and change when compared to previous data acquired between 1989 and 1996 [67,77]. The crisscrossing AUV photo surveys were baseline exploration surveys about the general PMN occurrence and faunal distribution and were not meant to be a proper resource estimation survey. Generally, the optic data underestimate the number and abundance of PMN, and local correction factors (based on box-corer data from the photo locations) must be applied for a more realistic resource assessment [14,15,[106][107][108][109].
For compiling the ground-truth dataset, all photos inside and next to the plough disturbance tracks were excluded, as they do not represent the original seafloor state [67]. The degree of blanketing around the tracks varies [67]. Nevertheless, the PMN abundance estimation in the area of the full coverage photomosaic ( Figure 1) showed that the PMN could be effectively quantified, while it revealed the first signs of a correlation between PMN occurrence and seafloor morphology [9].
The optic data sampling in general has good geographical coverage (Figure 1), which is vital to efficiently depict the PMN spatial distribution trend, especially in local-scale studies such as ours [7,8,110]. The correlation points towards an underlying relationship between PMN and seafloor morphology that, synergistically with other environmental factors, influenced the PMN genesis and current spatial distribution. This is true, although the observed PMN number only represents the minimum number of nodules, as another part might occur within the sediment or due to a sediment cover that was not detected by the CoMoNoD algorithm.
In total, analyses of 30,000 photos were considered reliable and exported as ESRI shapefile in UTM 16S for further spatial analysis steps. The exported dataset was split into two independent datasets, creating the train and test dataset with 20,000 and 10,000 photos, respectively. The random data split without replacement was performed using the Subset Features tool (Geostatistical Analyst) in ArcMap 10.6 (© Environmental Systems Research Institute Inc. (ESRI), West Redlands, CA, USA); this tool ensures the same geographical coverage and statistical characteristics of the two generated datasets ( Figure A3). The training dataset was used for the spatial autocorrelation analysis, feature selection, and spatial modeling, and the test dataset was used only for the final model evaluation. An overview of the different processing steps of the presented modelling workflow is given in Figure 2.

Spatial Data Analysis
The presence of spatial autocorrelation was investigated using the local indicators of spatial association (LISA) and particularly the local Moran's I [16]. The local Moran's I identifies clusters with significant spatial aggregation of similar high (H-H) or low (L-L) values (i.e., many or few PMN). The index was calculated using the Cluster and Outlier Analysis (Anselin Local Moran's I) Tool (Spatial Statistics) in the ArcMap 10.6, according to the equations and null hypothesis provided as default from the software (i.e., the examined attribute is randomly distributed). The spatial relationship was based on the inverse Euclidian distance, and spatial weights were standardized to eliminate any bias that could be induced due to the uneven number of spatial neighbors. The analysis was based on 999 permutations. A false discovery rate correction was applied as the recommended approach to deal with the multiple testing and spatial dependency biases in large datasets; it provides significant clusters and outliers for a 95 per cent confidence level [111,112].

Spatial Data Analysis
The presence of spatial autocorrelation was investigated using the local indicators of spatial association (LISA) and particularly the local Moran's I [16]. The local Moran's I identifies clusters with significant spatial aggregation of similar high (H-H) or low (L-L) values (i.e., many or few PMN). The index was calculated using the Cluster and Outlier Analysis (Anselin Local Moran's I) Tool (Spatial Statistics) in the ArcMap 10.6, according to the equations and null hypothesis provided as default from the software (i.e., the examined attribute is randomly distributed). The spatial relationship was based on the inverse Euclidian distance, and spatial weights were standardized to eliminate any bias that could be induced due to the uneven number of spatial neighbors. The analysis was based on 999 permutations. A false discovery rate correction was applied as the recommended approach to deal with the multiple testing and spatial dependency biases in large datasets; it provides significant clusters and outliers for a 95 per cent confidence level [111,112].
A visual boxplot analysis was performed between the two spatial clusters and their underlying MBES derivative values. In addition, the non-parametric Wilcoxon-Mann-Whitney [42][43][44] was used to identify significant differences in the MBES derivatives between the two spatial clusters. Due to the large sample size (inherent sampling variability), the substantive significance (effect size) was additionally used together with the statistical significance (p-value) for the interpretation of the results, when the p-values are similar or the same [113]. The base [114] and rstatix R packages were used here [115].

Feature Selection
The results from the spatial autocorrelation analysis were used jointly with the Boruta analysis for the feature selection. The Boruta algorithm creates a shuffled copy of each predictor variable and calculates the average variable importance using the ranger-RF algorithm [45]. The variables with higher importance than their shuffled copies are considered relevant to the target variable (PMN). Since all the relevant variables have been identified, Spearman's rank correlation coefficient [116] was used to exclude correlated A visual boxplot analysis was performed between the two spatial clusters and their underlying MBES derivative values. In addition, the non-parametric Wilcoxon-Mann-Whitney [42][43][44] was used to identify significant differences in the MBES derivatives between the two spatial clusters. Due to the large sample size (inherent sampling variability), the substantive significance (effect size) was additionally used together with the statistical significance (p-value) for the interpretation of the results, when the p-values are similar or the same [113]. The base [114] and rstatix R packages were used here [115].

Feature Selection
The results from the spatial autocorrelation analysis were used jointly with the Boruta analysis for the feature selection. The Boruta algorithm creates a shuffled copy of each predictor variable and calculates the average variable importance using the ranger-RF algorithm [45]. The variables with higher importance than their shuffled copies are considered relevant to the target variable (PMN). Since all the relevant variables have been identified, Spearman's rank correlation coefficient [116] was used to exclude correlated features with coefficient values > 0-5. Between two correlating features, the one with the higher relevance was kept. The Spearman correlation was preferred instead of, e.g., the Pearson correlation, as it is a non-parametric measure, which assesses the presence of monotonic relationships while being robust to outliers and deviations from normality [117,118]. For executing the features selection, the Boruta [45] and GGally [119] R packages were used.

Quantile Regression Forests
Random forests is an ensemble machine learning method designed of multiple classification or regression trees [54]. Each tree uses a random subset of the training data through bootstrapping. The remaining data (out-of-bag data) are used for internal error validation. Each tree node is split using the best subset of predictors randomly chosen, minimizing the correlation among trees. A tree is developed until the maximum depth is reached; the final predicted value of the regression results after averaging all trees predictions is finished. The QRF extends the random forests approach by keeping all the predicted values for each observation. This information assesses the conditional distribution and, thus, quantiles can be estimated. The range between the maximum and minimum quantile for a single prediction expresses the model uncertainty for this prediction. Here, the 0.05th and 0.95th quantiles were used for the lower and upper prediction uncertainty value.
RF performs well with the recommended default hyperparameter values (e.g., minimal node size, maximal tree depth) [7,48,120]. Nevertheless, the default caret [121] ranger-RF optimization process was applied, focusing on the number of variables available for splitting (mtry) and the splitting criterion. The permutation variable importance and the partial dependence plots (PDP) were also calculated using the pdp [122] R package. In a subpart of this study area, previous RF modelling showed an overall good prediction performance based on the internal OOB data [83].

Cross-Validation Techniques
Three different cross-validation techniques have been tested.

Random k-fold Cross-Validation
Here, the model is repeatedly trained through random 10-fold CV, and its prediction performance is evaluated on the left-out fold data (k-1). Ten folds are recommended for large datasets, as they provide a good bias-variance trade-off [123].

Systematic k-fold Spatial-Blocking Cross-Validation
Ten spatial non-overlapping folds (sub-areas) with equal geographical coverage (2 km × 1 km) were created using ArcMap 10.6 [Create Fishnet Tool (Data Management)] ( Figure 3). The size of the block is a trade-off between the spatial autocorrelation range and the need for extrapolation [17,22,23,31]; it should minimize the influence of spatial autocorrelation in the training locations without creating extensive feature space differentiation among the blocks [17]. The Moran's I incremental analysis [Incremental Spatial Autocorrelation Tool (Spatial Statistics)] showed that the spatial autocorrelation quickly drops after the 1st km (<0.25) and approaches almost zero (<0.05) from 2 km onward ( Figure 3).

Feature Space k-fold Clustering Cross-Validation
As third method, the clustering large applications (CLARA) algorithm was used. CLARA [124] is a fast implementation of the partitioning around medoids [125] algorithm designed for large datasets. It uses an actual data point (medoid) as the center of each class, in which the sum of pairwise dissimilarities in this cluster is minimal; this method is robust to outliers. Only the predictors that resulted from the feature selection were clustered ( Figure 3). The optimal number of clusters was based on the Calinski-Harabasz index [126]. The clustering was performed with the R packages cluster [127], clusterCrit [128], and RStoolbox [129].
It needs to be mentioned that both the random and the spatial k-fold CV can only be applied inside the DEA, where AUV footage exists and an objective comparison between the three CV techniques is possible. Nevertheless, it provides the opportunity for transferring the model to a different neighboring area, the DEA-NE. The spatial/cluster-blocking integration within the model training was performed with the CAST R package [130]. Minerals 2021, 11, x FOR PEER REVIEW 9 of 34 (a1,b1) The sampling effort (here expressed as training points per km 2 inside each spatial/cluster block) is unevenly distributed among the blocks/clusters and within each block/cluster. Noteworthy, the AUV data acquisition did not aim at predictive modeling and was not optimized to achieve a balanced spatial sampling. More samples have been acquired in the central part of the area due to dedicated photomosaic survey [4,9]. (a2) PMN Moran's I value at varying distances; the Moran's I still does not reach zero beyond 2 km. (b2) Calinski-Harabasz index score for different numbers of clusters. Ten (10) is the optimum number in CLARA clustering, which is relatively high for (a1,b1) The sampling effort (here expressed as training points per km 2 inside each spatial/cluster block) is unevenly distributed among the blocks/clusters and within each block/cluster. Noteworthy, the AUV data acquisition did not aim at predictive modeling and was not optimized to achieve a balanced spatial sampling. More samples have been acquired in the central part of the area due to dedicated photomosaic survey [4,9]. (a2) PMN Moran's I value at varying distances; the Moran's I still does not reach zero beyond 2 km. (b2) Calinski-Harabasz index score for different numbers of clusters. Ten (10) is the optimum number in CLARA clustering, which is relatively high for the small area (≈17 km 2 ), indicating considerable spatial heterogeneity. The central and northern parts have higher variability.
(c) Boxplot of training (hold-in) and validation (hold-out) data that are used in every training iteration for each CV scheme (RanCV = random-CV, SpaCV = spatial-CV, CluCV = cluster-CV). We noticed that in each training iteration a similar number of training data is used, reducing the unequal number of sampling points in each block or cluster. (d) Two-dimensional representation of the clustered feature space with both independent and overlapping clusters. Due to the area heterogeneity, the first two principal components retained variation accounts only for ≈50% of the total variation.

Dissimilarity Index and Area of Applicability
The recently proposed approach of using the dissimilarity index (DI) and area of applicability (AOA) [35] is a model-based method that assesses geographical areas, which have new feature space conditions and where the prediction error of a given pre-trained model exceeds the training CV error. The DI is based on the Euclidean feature space distances between the training dataset and the respective data of the new area. Before the distance calculation, predictors are scaled and weighted according to their variable importance of the model training. Thus, the distances of more important predictors account more to the DI estimation. The 0.95th quantile (outlier removed) of the DI is used as the AOA threshold. DI values between 0 and 1 indicate similar conditions, while values >1 indicate dissimilarity. Conversely, AOA values closer to 0 indicate unknown conditions for the extrapolation. The CAST R package was used.

PMN Spatial Distribution and Spatial Autocorrelation
Plotting of the CoMoNoD results in a spatial context revealed a local scale heterogeneous PMN distribution, with higher numbers in the northeast and southwest parts of the studied area. Of particular interest is the central area within the photomosaic survey, where patches of high PMN numbers coexist next to areas with lower numbers. The PMN distribution follows the small-scale seafloor variations created by the abyssal furrows and the prevailing current regime ( Figure A4 Table 2). In detail, higher BS values are linked with H-H clusters having a large effect in the test. The broad-scale BPI (100-300 m) also has significant differences between the two groups, with an observable moderate substantive significance (effect size). In contrast, the smallscale BPI (0-30 m) has no statistically significant variation between the two clusters. Similar to the backscatter entropy (BSE), the aspect of the seafloor surface plays a role in H-H clustering, as areas with higher numbers of PMN tend to be north-faced oriented (i.e., northness values closer to 1).

Boruta Analysis and Feature Selection
Similar to the spatial analysis, the Boruta algorithm shows that BS, BPI100_300, and northness are important and relevant predictors. However, the MD and not the BS is the most relevant predictor ( Figure 5). Moreover, S and BSSD are ranked high, although both have not a significant difference between H-H and L-L groups ( Table 2). Opposite to the Wilcoxon-Mann-Whitney test, Boruta results indicate that all derivatives are relevant to predict the PMN distribution. Using all variables leads to a highly correlated dataset ( Figure A5) but excluding highly-correlated variables (r > ±0.5; in the Boruta importance score) results in the following predictors: MD, S, BS, N, BPI100_300, E, BSSD, and BPI0_30.

Boruta Analysis and Feature Selection
Similar to the spatial analysis, the Boruta algorithm shows that BS, BPI100_300, and northness are important and relevant predictors. However, the MD and not the BS is the most relevant predictor ( Figure 5). Moreover, S and BSSD are ranked high, although both have not a significant difference between H-H and L-L groups ( Table 2). Opposite to the Wilcoxon-Mann-Whitney test, Boruta results indicate that all derivatives are relevant to predict the PMN distribution. Using all variables leads to a highly correlated dataset (Figure A5) but excluding highly-correlated variables (r > ±0.5; in the Boruta importance score) results in the following predictors: MD, S, BS, N, BPI100_300, E, BSSD, and BPI0_30.

Model Training and CV Results
Based on the random-CV assessment, the QRF prediction performance was R 2 = 0.93. The same performance also results from the RF internal OOB error. However, when the spatial-and cluster-CV schemes were used, the prediction performance was reduced to R 2 = 0.19 and R 2 = 0.53, respectively ( Table 3). The performance is minimized when the spatial-blocking CV is applied. This shows that the model cannot efficiently transfer the predictions towards spatial blocks, where the SAC is low/absent and feature space extrapolation occurs to different degrees ( Figure A6). The effect of feature space extrapolation is also depicted in the clustering CV error assessment. Here, in every training repetition, the model is trying to predict a new feature space cluster. This feature space can be completely new, or it has overlap with other clusters to a varying degree (Figure 3). This fact combined with the varying degree of spatial distance among the training points (and consequently varying SAC) results in a higher training error variance than spatial-and random-CV ( Figure 6). The random-CV has a minimum training error variance due to the almost identical spatial and feature space characteristics of the randomly resampling folders. The performance analysis of each spatial/cluster block could provide in-depth information regarding the specific areas and seafloor clusters for which the model performs worst, potentially guiding future sampling ( Figure A7). predictions towards spatial blocks, where the SAC is low/absent and feature space extrapolation occurs to different degrees ( Figure A6). The effect of feature space extrapolation is also depicted in the clustering CV error assessment. Here, in every training repetition, the model is trying to predict a new feature space cluster. This feature space can be completely new, or it has overlap with other clusters to a varying degree (Figure 3). This fact combined with the varying degree of spatial distance among the training points (and consequently varying SAC) results in a higher training error variance than spatial-and random-CV ( Figure 6). The random-CV has a minimum training error variance due to the almost identical spatial and feature space characteristics of the randomly resampling folders. The performance analysis of each spatial/cluster block could provide in-depth information regarding the specific areas and seafloor clusters for which the model performs worst, potentially guiding future sampling ( Figure A7). The RF hyperparameter optimization differs between the spatial/cluster and random CV. In spatial-and cluster-CV, the optimum mtry value is eight (8), whereas it is five (5) in random-CV. This difference might result from the increased difficulty the model faces in spatial-and cluster-CV to extrapolate the predictions; it uses all the available information to gain predictive knowledge. In random-CV, the validation hold-out samples are identical to the training samples. Consequently, the model can predict well using fewer predictors.

Model Training and Sample Size
No significant correlation was found between the number of training samples used from the hold-in data and the prediction performance in the remaining (k-1) hold-out data (random-CV: −0.03 p = 0.93; spatial-CV: −0.11 p = 0.75; cluster-CV: 0.23 p = 0.51). The inclusion of all the available training points did not further improve the modeling performance, but resulted in a decrease, particularly in spatial-and cluster-CV (Table 3). This The RF hyperparameter optimization differs between the spatial/cluster and random CV. In spatial-and cluster-CV, the optimum mtry value is eight (8), whereas it is five (5) in random-CV. This difference might result from the increased difficulty the model faces in spatial-and cluster-CV to extrapolate the predictions; it uses all the available information to gain predictive knowledge. In random-CV, the validation hold-out samples are identical to the training samples. Consequently, the model can predict well using fewer predictors.

Model Training and Sample Size
No significant correlation was found between the number of training samples used from the hold-in data and the prediction performance in the remaining (k-1) hold-out data (random-CV: −0.03 p = 0.93; spatial-CV: −0.11 p = 0.75; cluster-CV: 0.23 p = 0.51). The inclusion of all the available training points did not further improve the modeling performance, but resulted in a decrease, particularly in spatial-and cluster-CV (Table 3). This decrement is attributed to the noise added to the model by using the additional 38% of data that are not spatially aggregated. This 'noise' could be the result of the inherent natural variability of PMN distribution and/or wrong analyses of the CoMoNoD algorithm, e.g., through noise or lower resolution image data (greater distance to the seafloor).

Model Performance in Test Data
The performance for the test data is the same for the three models (R 2 = 0.76), independently of the CV scheme followed. The higher predictive performance during the random-CV implies that data overfitting occurred. This is not the case for the spatial-and cluster-CV. At the end of the training cycle, the model has seen the same training data from the random folds, spatial blocks, and clusters and similar weights and relationships between predictors and response variables are established. The relationship type (e.g., monotonic, complex) and the marginal effect on the response variable are depicted as PDP in Figures 7 and A8-A10. The low-correlated dataset ensures that the assumption of independence (uncorrelated predictors) holds, and the PDP calculation is not violated [131,132]. monotonic, complex) and the marginal effect on the response variable are depicted as PDP in Figures 7 and A8-A10. The low-correlated dataset ensures that the assumption of independence (uncorrelated predictors) holds, and the PDP calculation is not violated [131,132].

QRF Variable Importance
The backscatter (BS) has the highest variable importance, followed by mean depth (MD), northness (N), and the coarse bathymetric position index (BPI100_300). Slope (S) and eastness (E) contribute less, while the backscatter standard deviation (BSSD) and BPI0_30 have marginal or no contribution at all (Figure 7). This ranking is closer to the spatial analysis results than from the Boruta analyses, where the BSSD, S, and MD have scored higher. The overall higher agreement between local Moran's I analysis and a spatially trained QRF shows that the Boruta may result in sub-optimum importance ranking, due to the overfitting that occurs when non-spatial training is performed.  The QRF response curve is represented by a black line; the LOESS curve [133] (blue line) shows the difference between a complex ML algorithm and a simpler non-parametric regression model [122]. Blue ticks on the x-axes represent the data deciles. All RF relationships have a non-linear character, with BS, MD, N, E, and BSSD having a monotonic response. For BS, a clear trend between higher backscatter values and an increasing number of PMN can be observed. A similar trend is noticed for the N and E features, with the highest PNM numbers observed towards NW dipping slopes, which are parallel the dominant bottom current direction. The BPI100_300 and slope S have complex relationships, with data aggregating in a specific range of the variable. This information could, in general, be a valuable indication for favorable geomorphological characteristics of PMN occurrence (e.g., slopes < 2 • ), but clearer indications are not given due to the sampling distribution in relation to the values of the derivatives. In most cases, the training samples are well distributed across the feature range, providing confidence regarding the established response curves. For BPI100_300 and BPI0_30, the data are aggregated in a small range, creating none-sampled regions inside the training feature space that consequently force model extrapolation. The model extrapolation is visualized better in the two-and three-way interaction PDP, where the data convex hulls between two predictor variables are presented (Figures A8-A10). All models have the same variable importance ranking and PDP and, thus, only this from spatial-CV is presented.

QRF Variable Importance
The backscatter (BS) has the highest variable importance, followed by mean depth (MD), northness (N), and the coarse bathymetric position index (BPI100_300). Slope (S) and eastness (E) contribute less, while the backscatter standard deviation (BSSD) and BPI0_30 have marginal or no contribution at all (Figure 7). This ranking is closer to the spatial analysis results than from the Boruta analyses, where the BSSD, S, and MD have scored higher. The overall higher agreement between local Moran's I analysis and a spatially trained QRF shows that the Boruta may result in sub-optimum importance ranking, due to the overfitting that occurs when non-spatial training is performed.

QRF Spatial Predictions and Uncertainty
The model prediction shows a heterogeneous PMN distribution, with a higher number of PMN aggregated in the northern and southern areas of the DEA that follow the overall seafloor topography and the bottom current regime (Figure 8). The lower and upper quantile predictions differ, with the 0.05th quantile being less affected by sampling artifacts, which are prominent in the 0.95th quantile. Moreover, the 0.05th and 0.50th quantile predict the spatial extent of PMN-free patches better (Figure 9). Inside the DEA, the central and southern parts have the lowest quantile difference, due to the increased sampling effort. The DEA-NE area also has parts with alternating high and low numbers of PMN (Figure 8). In Figure 10, the RF model results were plotted together with the visual observations from the ocean floor observation system, showing an overall spatial agreement. However, the predictions inside the DEA-NE area have high uncertainty, as the model has seen no training data from this area and the seafloor is morphologically more complex than the DEA ( Figure 8).
with BS, MD, N, E, and BSSD having a monotonic response. For BS, a clear trend between higher backscatter values and an increasing number of PMN can be observed. A similar trend is noticed for the N and E features, with the highest PNM numbers observed towards NW dipping slopes, which are parallel the dominant bottom current direction. The BPI100_300 and slope S have complex relationships, with data aggregating in a specific range of the variable. This information could, in general, be a valuable indication for favorable geomorphological characteristics of PMN occurrence (e.g., slopes < 2°), but clearer indications are not given due to the sampling distribution in relation to the values of the derivatives. In most cases, the training samples are well distributed across the feature range, providing confidence regarding the established response curves. For BPI100_300 and BPI0_30, the data are aggregated in a small range, creating none-sampled regions inside the training feature space that consequently force model extrapolation. The model extrapolation is visualized better in the two-and three-way interaction PDP, where the data convex hulls between two predictor variables are presented (Figures A8-A10). All models have the same variable importance ranking and PDP and, thus, only this from spatial-CV is presented.

QRF Spatial Predictions and Uncertainty
The model prediction shows a heterogeneous PMN distribution, with a higher number of PMN aggregated in the northern and southern areas of the DEA that follow the overall seafloor topography and the bottom current regime (Figure 8). The lower and upper quantile predictions differ, with the 0.05th quantile being less affected by sampling artifacts, which are prominent in the 0.95th quantile. Moreover, the 0.05th and 0.50th quantile predict the spatial extent of PMN-free patches better (Figure 9). Inside the DEA, the central and southern parts have the lowest quantile difference, due to the increased sampling effort. The DEA-NE area also has parts with alternating high and low numbers of PMN (Figure 8). In Figure 10, the RF model results were plotted together with the visual observations from the ocean floor observation system, showing an overall spatial agreement. However, the predictions inside the DEA-NE area have high uncertainty, as the model has seen no training data from this area and the seafloor is morphologically more complex than the DEA (Figure 8).  black patches better. In addition, the 0.95th quantile has linear artifacts; these are caused due to the difference between the model trend and sample values. The quantile uncertainty (QU) is minimized in the areas that have samples (here images) and have a similar seafloor geomorphology. The superimposed bathymetric contours have a 1 m depth interval. Figure 9. Enlarged maps of the PMN abundance prediction from three areas with black patches (no nodules). From left to right are the 0.05th, 0.50th, and 0.95th quantiles shown, followed by the quantile uncertainty. The superimposed bathymetric contours have a 1 m depth interval. black patches better. In addition, the 0.95th quantile has linear artifacts; these are caused due to the difference between the model trend and sample values. The quantile uncertainty (QU) is minimized in the areas that have samples (here images) and have a similar seafloor geomorphology. The superimposed bathymetric contours have a 1 m depth interval.

Dissimilarity Analysis and Area of Applicability
The dissimilarity analysis showed that the DEA-NE is dissimilar to the training samples, especially in the areas with rugged seafloor, increased slopes, and shallower bathymetric depth range. These circumstances reduce the area of applicability of the model, showing that additional samples must be added for better predictions. Data derived from the random-CV model have the smallest AOA. The overoptimistic error assessment that occurs limits the model AOA to a small region around the training locations, where the CV error is still small and comparable. Conversely, spatial-and cluster-CV have a larger prediction error. Hence, this error applies towards a wider area (Figure 11). The AOA also increased when excluding the MD, BSSD, and BPI0_30 features. The latter two were excluded due to their negligible contribution in the final model. Although, they are well sampled (Figure 11). MD was excluded, as the training samples do not cover the entire depth range of the study area ( Figure 11). We have prior knowledge from video observations (e.g., Figure 10 and previous studies [62,63,69]) that the DEA-NE is inside the bathymetric depth range that favors the aggregation of PMN. The exclusion of the mean depth could provide a simpler and better transferable model (on local scales), putting a greater importance on relative bathymetric variations, such as local elevations and depressions expressed by the BPI. The exclusion of the three variables MD, BSSD, and BPI0_30 resulted in a decreased dissimilarity and a larger AOA (Figure 11), but the predictive performance for the test data was reduced (R 2 = 0.73). In general, several parts of the DEA-NE remain unsuitable for predictions using the developed model, due to the complex geomorphology and lack of samples that cover this complex terrain.

Discussion
This article presents a PMN prediction case study based on AUV hydroacoustic and image data from the Peru Basin in the Pacific. The study highlights the presence of spatial autocorrelation within the polymetallic nodule (PMN) distribution, with the PMN being spatially aggregated due to local geomorphological settings and the prevailing bottom current regime. The northwest oriented bottom current is channeled through abyssal furrows and erodes fine sediments. This leaves abundant fragmental materials (fish teeth, Figure 11. Top left: the dissimilarity index (DI) between training samples and the study area resulted from the QRF model weights. Top right: the random-CV has the most limited AOA, which is restricted to the area directly around the training samples. For block-and cluster-CV (middle left), the prediction error is comparable for the entire DEA and a small part of the DEA-NE (0 = Not applicable, 1 = Applicable). Without the mean depth (MD) as predictor variable, the AOA increases (middle right). The spatial-and cluster-CV have very similar AOA statistic values (r = 0.96, p < 2.2 × 10 −16 ). Thus, only the cluster-CV AOA is presented in the density plots (bottom). The density plots show the difference between the training samples (blue) and the entire study area (red). Mean depth (MD) has the most significant difference, followed by backscatter (BS), northness (N), and eastness (E).

Discussion
This article presents a PMN prediction case study based on AUV hydroacoustic and image data from the Peru Basin in the Pacific. The study highlights the presence of spatial autocorrelation within the polymetallic nodule (PMN) distribution, with the PMN being spatially aggregated due to local geomorphological settings and the prevailing bottom current regime. The northwest oriented bottom current is channeled through abyssal furrows and erodes fine sediments. This leaves abundant fragmental materials (fish teeth, basalt debris, tiny pieces of broken shell, or nodules) as nuclei for the formation of PMN [134]. The higher number of PMN in the northern part of the DEA and DEA-NE is supported by the vicinity to seamounts (Figure 1), which provides additional nucleus material, initiating the nodule formation presence [135]. It is still unclear to what degree such influence has in comparison or synergistically to the geochemical properties of the sediment, which play an essential role in the PMN spatial distribution, their abundance, and size [134,136]. Using spatial geochemical information as input data in the modeling process would have been advantageous, as omitting such drivers limits the predictive performance. Unfortunately, the existing sampling methods (gravity cores, push cores, multi-cores, and box-corers) have limited spatial coverage, are typically clustered in small sub-regions due to the need for multiple replicates in biological studies, and generally do not occur in great numbers due to the time-consuming sampling (4-5 h per sample) and subsequent geochemical analyses.
Although our knowledge of the real reasons and their interrelationship with PMN formation is incomplete, analyzing the spatial autocorrelation has a strong potential to explore complex geomorphological relations, as it is an inherent part of the natural PMN distribution and variability across different spatial scales. In this respect, calculating the local Moran's I can contribute in three ways: (a) Local Moran's I reveals significant PMN spatial clusters. Further investigation of such areas revealed the spatial dependency of PMN by the seafloor morphology (exogenous autocorrelation), as the MBES derivatives values differ significantly between the H-H and L-L groups.
(b) Local Moran's I can effectively identify predictors for spatial modeling. In contrast, the non-spatial Boruta algorithm showed increased relevance in predictors such as the BSSD and BPI0_30 that had a minor to zero contribution in the final RF model.
(c) Local Moran's I reduces the number of training samples needed. Only the data that contribute the most are kept, and the model becomes computationally lighter while its performance increases. Studies have highlighted the need for better data (i.e., representative with low sampling uncertainty and noise, having enough variation to capture critical patterns in the data, and well distributed across the entire feature space) in data-driven approaches such as geostatistical and ML predictive modeling [18,137,138]. In this respect, the local Moran's I analysis could be an efficient tool.
Spatial autocorrelation (SAC) can help to address the issues mentioned above, at the same time not considering spatial autocorrelation may result in over-optimistic CV predictions. Similar to recent studies [17,[19][20][21]23,24,139,140], we showed that the conventional and commonly applied random 10-fold CV could not assess the model performance when spatial autocorrelation is present. In such cases, spatial-and cluster-blocking perform better. Spatial-blocking is the least biased and results in lower variance than cluster-blocking; the higher variance of cluster-blocking has also been mentioned by others [17].
The spatial-and cluster-CV also yielded different mtry hyperparameter values. Five predictors were adequate to predict the hold-out samples inside the random (and identical) folders. However, for the spatial-and cluster-CV, all available predictors (eight) were needed. The final models have the same performance in the unseen (but similar) test data. By applying different CV methods, we changed our assessment method, not the model itself. This fact shows that a spatially similar test dataset can show how well the model can reproduce the data, but it cannot inform about how the model performs when it is transferred to another area. In this case, the spatial-CV is recommended.
The random-CV could still be preferable in case of a) small datasets with geographical separation within the training points that exceed the area of spatial autocorrelation (influence zone) or b) when the predictions are restricted to nearby locations of the training samples [20,33,141]. Random-CV is the most straightforward implementation. Contrasting spatial-blocking demands a prior correlogram calculation to identify the block size, and cluster-blocking demands the use of clustering indexes (e.g., Calinski-Harabasz) to find the optimum number of clusters along with data preprocessing (e.g., scaling of input MBES derivatives). These procedures usually demand the use of more than one software, increasing the time and complexity of the processing.
However, the biggest challenge lies in realizing the need for training data that are simultaneously well spread across the geographical and feature space of the covariates used for the analysis. Despite the significant advances in terrestrial spatial sampling [142][143][144][145][146], deep-sea studies have only lately started developing methods to optimize AUV photo sampling in a way that maximizes the environmental information [147][148][149][150]. In this respect, the QRF and AOA could guide sampling efforts in previously sampled areas or even during sampling surveys (adaptive sampling or/and active learning).
The poor representation of the feature space, especially the range of the most important predictor variables, causes a reduced performance and transferability of regression random forests modelling [151,152]. In our case, the dissimilarity analysis (DI and density curves) showed differences between the training samples and the targeted feature space. This becomes visible in the quantile uncertainty (QU), which is maximized in the DEA-NE. The QU is a helpful tool to explore the model variation inside the convex hull of the training points ( Figures A8 and A9). QU is correlated with DI (r = 0.65, p < 2.2 × 10 −16 ). However, there are subareas inside the DEA-NE with high dissimilarity but no high analogue uncertainty. This is a misleading extrapolation effect that has been described by the AOA authors [35]. In other words, QRF can provide locations that have increased model variation, but it cannot identify if this is caused due to the inherent uncertainty of the hydroacoustic and image data or due to extrapolation.
Any model extrapolation beyond the training domain should be accompanied by a thorough transferability assessment, ideally with an external evaluation using data from the new area [22,33,153]. This was not possible here, due to a lack of data. Instead, we used the recently proposed AOA method. Within the DEA-NE area, the expected prediction error is higher than in the trained area of the DEA, reducing the model applicability. The exclusion of predictors that are prone to overfitting, such as elevation/depth, increases the generalization performance, which has also been mentioned by [22]. The advantage of RF to build complex non-linear response curves with the training data, outperforming other regression methods [154], could result in a disadvantage when these associations occur only locally [155]. A successful transfer thus relies on the assumption that the relationship between response variable and predictor exists in both areas. The generalization performance is maximized when the data information and feature selection are combined with the domain knowledge, even if the domain knowledge is basic or imperfect [156,157]. Other studies also showed that machine learning models with only a few predictor variables are more transferable in marine habitat mapping than complex ones [158].
Similarly, less complex models (e.g., partial least squares regression) could generalize better when spatial-CV schemes are applied, highlighting the trade-off decision to be made between accuracy and generalization performance [21,25,27]. Similarly, ensemble ML models that average predictions of different single models could also yield better predictions across different areas, as presented by [23,141]. The comparison of different models was beyond the scope of this paper. Here, we focused on the widely used random forests algorithm and its quantile uncertainty (i.e., QRF). We should underline that transferability is not the primary goal when applying machine learning-based spatial modelling. ML is designed to derive accurate predictions based on an existing measurement that captures the underlying relationship, for which our knowledge or conceptual understanding is still developing. Towards this direction, the RF importance score is a valuable measure to test known hypotheses, but also to generate new ones [155,159,160].
Future research in other parts of the Peru Basin and other known fields with polymetallic nodules (e.g., the Clarion-Clipperton Zone) could further enlighten the drivers of spatial autocorrelation. In addition, the spatial analysis in various scales would also provide insights for the underlying mechanisms that influence the spatial distribution of polymetallic nodules.

Conclusions
Our case study shows that spatial predictions of polymetallic nodules with ML methods need to be spatial-cross-validated when the spatial autocorrelation is present, and that the seafloor morphology varies. Similarly, model transfer to areas with scarce or no data should be evaluated by regarding the new area similarity with the training domain. Ideally, each ML predictive spatial map should be accompanied with its cross-validation strategy, uncertainty prediction, and area of application analyses, for supporting the model interpretation and decision making. In other words, the focus should not only lie in generating the final prediction map, but also on how this map has been derived from the fitted model and data. Funding: All data were acquired within the JPIO Project "Ecological Aspects of Deep-Sea Mining" framework, funded through BMBF grant 03F0707A. Funding was made available through MarTERA grant COMPASS-Drimp from BMWi grant 03SX466B.

Data Availability Statement:
The following data are available on PANGEA ® Data Publisher: SO242-1 ship-based MBES raw data at https://doi.org/10.1594/PANGAEA.859528; SO242-1 AUV raw images at https://doi.org/10.1594/PANGAEA.882349; SO242-1 AUV processed images at https: //doi.org/10.1594/PANGAEA.883838; Source code of the CoMoNoD algorithm at https://doi. pangaea.de/10.1594/PANGAEA.875070. The datasets and scripts generated for this study are available upon reasonable request to the corresponding author. These include: the processed shipbased MBES bathymetry, the processed and merged AUV-based MBES bathymetry, backscatter, their derivatives, the final selected subset of images for spatial analysis and modeling, the ArcMap spatial autocorrelation analysis outputs, and the R scripts. Moreover, the SO242-1 OFOS video footage and annotations. discussions during the conceptualization stage. We value Astrid Ulbrich for the proofreading of the manuscript. Finally, we thank the GEOMAR Library team for its support in gathering the necessary bibliography. This is publication 49 of the DeepSea Monitoring Group at GEOMAR Helmholtz Centre for Ocean Research Kiel.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript or in the decision to publish the results.   These alternations seem to follow the abyssal furrows microrelief. The background map is the backscatter intensity, with brighter areas to imply higher reflectivity. A higher-resolution map of PMN distribution inside the photomosaic area is provided by [9].  These alternations seem to follow the abyssal furrows microrelief. The background map is the backscatter intensity, with brighter areas to imply higher reflectivity. A higher-resolution map of PMN distribution inside the photomosaic area is provided by [9]. These alternations seem to follow the abyssal furrows microrelief. The background map is the backscatter intensity, with brighter areas to imply higher reflectivity. A higher-resolution map of PMN distribution inside the photomosaic area is provided by [9]. Figure A5. MBES derivatives correlation plot (left) and low-correlated predictors according to the Boruta ranking (right). Figure A5. MBES derivatives correlation plot (left) and low-correlated predictors according to the Boruta ranking (right). Minerals 2021, 11, x FOR PEER REVIEW 26 of 34 Figure A6. Seafloor clusters within each spatial block; at least one cluster is absent in each of the spatial blocks. This causes spatial-CV extrapolation, which creates results such as the cluster-CV approach. Figure A6. Seafloor clusters within each spatial block; at least one cluster is absent in each of the spatial blocks. This causes spatial-CV extrapolation, which creates results such as the cluster-CV approach.  Figure A7. Random fold (top), spatial block (middle), and cluster block (bottom) resampling prediction performance during CV. The model uses the information included in the nine folds/blocks/clusters to predict the remaining fold/block/cluster. Figure A8. Shown are 2D PDPs between slope, BPI100_300, and PMN. Left: extrapolated feature space, Right: convex hull of the two variables and the none-described empty feature space. The interpretation outside of the convex hull is inadvisable. Figure A7. Random fold (top), spatial block (middle), and cluster block (bottom) resampling prediction performance during CV. The model uses the information included in the nine folds/blocks/clusters to predict the remaining fold/block/cluster. Figure A7. Random fold (top), spatial block (middle), and cluster block (bottom) resampling prediction performance during CV. The model uses the information included in the nine folds/blocks/clusters to predict the remaining fold/block/cluster. Figure A8. Shown are 2D PDPs between slope, BPI100_300, and PMN. Left: extrapolated feature space, Right: convex hull of the two variables and the none-described empty feature space. The interpretation outside of the convex hull is inadvisable. Figure A8. Shown are 2D PDPs between slope, BPI100_300, and PMN. Left: extrapolated feature space, Right: convex hull of the two variables and the none-described empty feature space. The interpretation outside of the convex hull is inadvisable. . Figure A9. Three-way interaction PDP show the PMN response (colored) related to the BS, N, and E. The convex hull of the three variables is presented. The number of PMN increases with higher backscatter intensity and to the NW direction. In order to display the 3rd continuous variable (N), the PDP package converts it into shingles (a data structure that consists of a numeric vector along with four intervals that define the classes of the shingle. The intervals overlap is 10%). Figure A10. A 3D two-way interaction PDP shows the PMN response (colored surface) related to the BS and BPI100_300. Figure A9. Three-way interaction PDP show the PMN response (colored) related to the BS, N, and E. The convex hull of the three variables is presented. The number of PMN increases with higher backscatter intensity and to the NW direction. In order to display the 3rd continuous variable (N), the PDP package converts it into shingles (a data structure that consists of a numeric vector along with four intervals that define the classes of the shingle. The intervals overlap is 10%). . Figure A9. Three-way interaction PDP show the PMN response (colored) related to the BS, N, and E. The convex hull of the three variables is presented. The number of PMN increases with higher backscatter intensity and to the NW direction. In order to display the 3rd continuous variable (N), the PDP package converts it into shingles (a data structure that consists of a numeric vector along with four intervals that define the classes of the shingle. The intervals overlap is 10%). Figure A10. A 3D two-way interaction PDP shows the PMN response (colored surface) related to the BS and BPI100_300. Figure A10. A 3D two-way interaction PDP shows the PMN response (colored surface) related to the BS and BPI100_300.