Uncertainty and Overﬁtting in Fluvial Landform Classiﬁcation Using Laser Scanned Data and Machine Learning: A Comparison of Pixel and Object-Based Approaches

: Floodplains are valuable scenes of water management and nature conservation. A better understanding of their geomorphological characteristic helps to understand the main processes involved. We performed a classiﬁcation of ﬂoodplain forms in a naturally developed area in Hungary using a Digital Terrain Model (DTM) of aerial laser scanning. We derived 60 geomorphometric variables from the DTM and prepared a geomorphological map of 265 forms (crevasse channels, point bars, swales, levees). Random Forest classiﬁcation was conducted with Recursive Feature Elimination (RFE) on the objects (mean pixel values by forms) and on the pixels of the variables. We also evaluated the classiﬁcation probabilities (CP), the spatial uncertainties (SU), and the overﬁtting in the function of the number of the variables. We found that the object-based method had a better performance (95%) than the pixel-based method (78%). RFE helped to identify the most important 13–20 variables, maintaining the high model performance and reducing the overﬁtting. However, CP and SU were not e ﬃ cient measures of classiﬁcation accuracy as they were not in accordance with the class level accuracy metric. Our results help to understand classiﬁcation results and the speciﬁc limits of laser scanned DTMs. This methodology can be useful in geomorphologic mapping.


Introduction
Rivers, through erosion and accumulation processes generate various landforms in their floodplains [1][2][3]. Among them, levees, located next to the active or abandoned channels, are the most elevated forms; they can even be a couple of meters higher than the surrounding areas. Due to their position, they may provide the most critical controls on floodplains, determining the distribution of water and sediment [4,5]. The surface of levee's can be dissected by crevasse channels, which operate only during floods and have a crucial role in delivering water and sediment between the channel

Data Set and DTM Generation
The aerial survey of the study area was carried out by a Riegl LMS-Q680i aerial laser scanner in the framework of the SH/2/6 program in August 2012 [49] by Envirosense Ltd. The predetermined point density was 4 point/m 2 , and the accuracy was ±15 cm both vertically and horizontally. Our base dataset was the point cloud provided by the Trans Tisza Water Directorate. We conducted noise reduction with the neighbourhood distance based method and ground point classification with Cloth Simulation Filters (CSF) [50]. The DTM was generated from the ground points with the Natural Neighbour interpolation method at a resolution of 1 m with 0.18 m RMSE (Root-mean-square error) in the ESRI ArcGIS 10.3.1 software environment [51].

Terrain Analysis
We conducted a geospatial analysis with two open access software programmes, SAGA GIS (System for Automated Geoscientific Analyses Geographic Information System) 6.3.0 [52] and Whitebox GAT (Geospatial Analysis Tools) 3.4.0 [53] and we determined 60 terrain attributes from the DTM including basic parameters such as aspect, slope, gradient, curvatures, and secondary attributes such as the wetness index, the multiresolution index of valley bottom/ridge top flatness, mass balance and the convergence index (Table 1). Furthermore, there was an opportunity to use algorithms to determine the flood order for each of the cells within the DTM and to carry out elevation residuals analysis [53]. Some terrain attributes had tuning parameters; thus, in these cases, we repeated the calculations with different settings (e.g., difference from the mean elevation tool with 8, 16, 32 search neighbourhood sizes) to find the most suitable one for the characterisation.

Data Set and DTM Generation
The aerial survey of the study area was carried out by a Riegl LMS-Q680i aerial laser scanner in the framework of the SH/2/6 program in August 2012 [49] by Envirosense Ltd. The predetermined point density was 4 point/m 2 , and the accuracy was ±15 cm both vertically and horizontally. Our base dataset was the point cloud provided by the Trans Tisza Water Directorate. We conducted noise reduction with the neighbourhood distance based method and ground point classification with Cloth Simulation Filters (CSF) [50]. The DTM was generated from the ground points with the Natural Neighbour interpolation method at a resolution of 1 m with 0.18 m RMSE (Root-mean-square error) in the ESRI ArcGIS 10.3.1 software environment [51].

Terrain Analysis
We conducted a geospatial analysis with two open access software programmes, SAGA GIS (System for Automated Geoscientific Analyses Geographic Information System) 6.3.0 [52] and Whitebox GAT (Geospatial Analysis Tools) 3.4.0 [53] and we determined 60 terrain attributes from the DTM including basic parameters such as aspect, slope, gradient, curvatures, and secondary attributes such as the wetness index, the multiresolution index of valley bottom/ridge top flatness, mass balance and the convergence index (Table 1). Furthermore, there was an opportunity to use algorithms to determine the flood order for each of the cells within the DTM and to carry out elevation residuals analysis [53]. Some terrain attributes had tuning parameters; thus, in these cases, we repeated the calculations with different settings (e.g., difference from the mean elevation tool with 8, 16, 32 search neighbourhood sizes) to find the most suitable one for the characterisation. It calculates the maximum value of deviation from mean elevation across a range of spatial scales. One of the two output rasters is a raster containing the scale at which the maximum occurred (scale). The other output is a raster containing this maximum deviation value (magnitude).

MxEMs
Defaults [ It calculates using the difference from the mean elevation and accounts for the fact that gullies are differentiated from ravines or larger valleys because they have widths and maximum cross-sectional depths that are less than the specified parameters (the maximum gully width, the minimum and maximum gully depths, a threshold in difference from the mean elevation, a plan curvature threshold, and a smoothing parameter).

Preprocessing of Input Data for Model Building
The first step was to develop a geomorphology map of the fluvial forms of the area; such a map was generated in our previous work [47] by visual interpretation of available aerial photos and the DTM combined with field mapping. The map was edited in the GIS environment and was the input database of all analyses with 265 geomorphological features (105 point bars, 127 swales, 20 crevasse channels, 13 levees; the 2 levees of the study site were divided into subsections to ensure more features were included in the analysis).

Variable Selection
As 61 (the DTM and its 60 derived attributes) variables cause serious issues regarding overfitting, we intended to reduce the number of predictors; accordingly, we applied the Recursive Feature Elimination (RFE) technique. RFE was used directly with the classification algorithm (in our case with the Random Forest) based on the following theory: variables with the least contribution to overall accuracy are omitted from the set of predictors through iterations. Thus, the initial model considers all predictors and after removing one variable the model is rebuilt and run again. This process lasts until only one variable (with the largest contribution) remains in the model. Finally, the result is the rank of predictors based on their importance in making a contribution to obtaining greater classification accuracy [76]. We applied the RFE with a 10-fold cross-validation with 3 repetitions in R 4.03 with the caret package [77].

Supervised Classification Procedure
Random Forest (RF) is a robust and popular classifier in the remote sensing community because it does not make assumptions on normal distribution and variance homogeneity but its classification performance is high [78,79]. RF, as a classifier, has proved its efficiency in satellite imagery based land use studies [80][81][82], in urban studies based on aerial photography [83] and even in geomorphological object identification using DTMs [84,85]. Accordingly, we also chose RF as a supervised classification method.
Having the rank of predictor variables derived from RFE, i.e., knowing the predictor-set of the optimal predictor variable number that ensures the highest overall accuracy, we conducted model runs involving a decreasing number of variables with the RF classification. We removed one variable at a time with the lowest contribution until only one variable remained. We applied 500 trees, as earlier studies reported that errors stabilize before 500 trees are achieved [86]. Furthermore, the mtry parameter (the optimal number of variables for splitting at nodes) had been optimized to obtain the greatest accuracy: models were run with increasing mtry values from 2 to 30, and the best model was chosen based on the highest Overall Accuracy. All models were run with a 10-fold cross validation with 3 repetitions. This may seem redundant with the RFE; however, the result was an optimized model that can also be used for prediction, which was applied in the uncertainty analysis.
We aimed to identify swales, point bars, levees, and crevasse channels as the most frequent forms of the study area. Classification was performed with both (1) pixel-based (PB) and (2) object-oriented (OO) approaches ( Figure 2).

Predictor Stability Analysis
Predictors, i.e., geomorphometric variables, were selected by the RFE variable selection method, but the selection is a function of the input data. Accordingly, we also studied the selected variables of the RFE. We had set 11 different random samples with 1000 pixels from the 5000 training pixels and, using the RFE with 10-fold cross-validation with 3 repetitions, we determined the rank orders of the 11 realizations and we then evaluated and summarized the rank orders by their frequency.

Analysis of Overfitting
As a measure of overfitting, we determined the OA, both with the training dataset (i.e., dependent on the model, OAtrain) and the testing dataset (independent of the model, OAtest). The difference between the two types of accuracies revealed whether the classification performance was the function of a large number of variables (Equation (3)).
where OA is the Overall Accuracy.

Statistical Evaluation
We determined the Pearson correlation coefficients among the geomorphometric variables and visualized the results in a correlation plot grouped by hierarchical clustering with the Ward method. Nominal variables had been omitted. The correlation structure's internal consistency was quantified with Cronbach's α. If all variables correlate, α = 1, and when variables do not have any correlation with the other, α = 0 [93]. We also determined α-values per variable, which helped to find the metrics that deteriorate the internal consistency.
We applied General Linear Modelling (GLM) to reveal the relationship among the class level accuracy measure, the F1, the type of fluvial forms (as factor) and the number of variables (as covariate) [94]. Assumptions were checked with the Shapiro-Wilk test (normality) and the Levene test (variance homogeneity). Partial contributions of the independent variables were reported with the effect size (ω 2 ) [95]. Statistical analysis was performed in R 4.03 with the gamlj and corrplot packages [96,97].

Results of Data Preprocessing
First, we conducted PCA (Principal Component Analysis) on the variables of the pixels extracted from the raster layers (PB-approach), which indicated a good fit (Root Mean Square Residual -RMSR

1.
We used the vector layer as reference data of the fluvial forms: stratified random sampling was carried out and we chose 5000 pixels for training and 5000 pixels for testing.

2.
Polygons of the reference vector layer were used as objects; thus, the object-oriented term did not mean automatic segmentation, but real fluvial objects interpreted in a visual way. We determined the mean values of the DTM and the 60 derived raster layers by geomorphological features.
Both pixel sampling and mean value extraction were performed in ESRI ArcGIS 10.3.1 [51]. Classifications were performed in R 3.6.3 (R Core Team, 2020) with the rpart [87] and caret packages [77], and in EnMAP-Box (Environmental Mapping and Analysis Program) 2.1.1 software [88].

Model Evaluation and Uncertainty Analysis
We used the geomorphological map (see Section 2.4) as the reference dataset: in the OO-approach, the dataset was randomly split into training and testing, in a 50-50 ratio; in the PB-approach, we generated a stratified random sampling with 10,000 points within the polygons of the forms, split randomly into 5000 training and 5000 testing datasets.
where TP: true positive, TN: true negative, FP: false positive, FN: false negative number of classifications.
While the OA provides a general evaluation of classifications and makes it possible to compare different model performances, the F1 is the harmonic mean of the Producers' and Users' Accuracy (UA and PA) [91], and evaluates the class level performance.
For the evaluation of model performances, we applied a 10-fold cross-validation with 3 repetitions (RKCV): the training dataset was divided into 10 subsets of which 9 were used to train a model and one to test it; in the next step, another 9 subsets were used to train another model and the remaining subset for testing. The process finished when all subsets had been used for testing; finally, the whole procedure was repeated with another two randomly selected subsets, resulting in 30 models. We reported the medians of the OAs of the 30 models, and each OA was calculated on the test data (i.e., independently of the training dataset).
Next, we calculated the OAs and F1s using the test dataset, and also the "out-of-the-box" accuracies using the training dataset for testing, which was important to determine the level of overfitting (see Section 2.8).
Beside the thematic accuracy, we also evaluated the probabilities: RF classifications rely on the highest probability assigned to the classes (in our case fluvial forms); thus, to explore the variation in the maximal probability, the values per form provide important information [92].
The RF algorithm uses random selection of input data for the numerous (in our case 500) decision trees in different model runs, thus, theoretically, models cannot be repeated, i.e., all models are different. However, R (and Python) software provides the possibility of a repetition with the same result (random sets should be defined). We performed the predictions with the optimized model with 10 different random sets. While repeated cross-validation produced 30 models, providing a thorough analysis using the mean, median, standard deviation, and interquartile range of each model, it only used the reference dataset.
A spatial analysis was performed with the predictions: we used the whole data of the study area and we calculated the uncertainty. We repeated the predictions with 10 different RF models with the models of different numbers of input variables (20,15,10,9,8,7,6,5,4,3,2). Then, we determined the mean values of the 10 predictions, and we expected the following coded landforms: 1 (as crevasse channels), 2 (as point bars), 3 (as swales) and 4 (as levees). If all predictions of the repetitions resulted in the same code, numbers would be whole and identical with the expected classes' number, but if the prediction resulted in different morphological forms (i.e., classes), the mean would be a number with fractions. Thus, if we count the number of whole numbers where the results are fractions, we can visualize and express the level of uncertainty in the predictions.
Furthermore, beside the classifications, we also calculated the probabilities associated with the given pixels. R software provides the probabilities for each class and we combined the chosen class and the related probability to the reference pixels. Thus, we were able to evaluate the interaction of the class level probabilities and the effect of the number of variables.

Predictor Stability Analysis
Predictors, i.e., geomorphometric variables, were selected by the RFE variable selection method, but the selection is a function of the input data. Accordingly, we also studied the selected variables of the RFE. We had set 11 different random samples with 1000 pixels from the 5000 training pixels and, using the RFE with 10-fold cross-validation with 3 repetitions, we determined the rank orders of the 11 realizations and we then evaluated and summarized the rank orders by their frequency.

Analysis of Overfitting
As a measure of overfitting, we determined the OA, both with the training dataset (i.e., dependent on the model, OAtrain) and the testing dataset (independent of the model, OAtest). The difference between the two types of accuracies revealed whether the classification performance was the function of a large number of variables (Equation (3)).
where OA is the Overall Accuracy.

Statistical Evaluation
We determined the Pearson correlation coefficients among the geomorphometric variables and visualized the results in a correlation plot grouped by hierarchical clustering with the Ward method. Nominal variables had been omitted. The correlation structure's internal consistency was quantified with Cronbach's α. If all variables correlate, α = 1, and when variables do not have any correlation with the other, α = 0 [93]. We also determined α-values per variable, which helped to find the metrics that deteriorate the internal consistency.
We applied General Linear Modelling (GLM) to reveal the relationship among the class level accuracy measure, the F1, the type of fluvial forms (as factor) and the number of variables (as covariate) [94]. Assumptions were checked with the Shapiro-Wilk test (normality) and the Levene test (variance homogeneity). Partial contributions of the independent variables were reported with the effect size (ω 2 ) [95]. Statistical analysis was performed in R 4.03 with the gamlj and corrplot packages [96,97].

Results of Data Preprocessing
First, we conducted PCA (Principal Component Analysis) on the variables of the pixels extracted from the raster layers (PB-approach), which indicated a good fit (Root Mean Square Residual -RMSR = 0.05); however, the first five PCs explained only 73% of the total variance and several variables had poor (<0.5) community values. After excluding 11 variables (MnDwEC, Aspect, CatchA, DiurnAH, ValDpth, ModCA, MRRTF1, MRRTF2, PlanCurv, TotCurv, ConvI), the model fit was the same (RMSR = 0.05), but the explained total variance changed to 78% (Table 2). Next, we repeated the PCA with the OO-approach (i.e., pixel means of the visually interpreted segments of fluvial forms). In this case, we also had to exclude variables due to low communality, although only three of them were excluded (MxEMs, DiurnAH, PlanCurv), and the explained total variance was 86% with good fit (RMSR = 0.04) ( Table 3). The correlation plot ( Figure 3) also showed that geomorphometric variables formed highly correlating groups, but these groups usually contained 6-7 variables (i.e., corresponding to PCs of PCA). Cronbach's alpha indicated poor reliability (<0.001), but the omission of FlodO, MxEMs, Aspct, CatchA, ConvI, ModCA and ValDpth resulted in a better outcome of 0.690. This value is still low; it should usually be above 0.8.   A stability analysis of the variables of the PB-approach according to the RFE showed that the number of optimal variables (i.e., ensuring the largest OA) varied between 11 and 20, and the OAs varied between 78.5% and 81.1% ( Figure 6). GenSurf was the first in the rank order in eight cases out of the 11 repetitions, ElRel was the second in eight cases, DTM was the third in seven cases, FlodO was the fourth in eight cases and TPI was the fifth in five cases; all other rank places were mixed with different indices. Regarding the first ten variables, GenSurf, ElRel, DTM, FlodO, TPI, DevMe3,  A stability analysis of the variables of the PB-approach according to the RFE showed that the number of optimal variables (i.e., ensuring the largest OA) varied between 11 and 20, and the OAs varied between 78.5% and 81.1% ( Figure 6). GenSurf was the first in the rank order in eight cases out of the 11 repetitions, ElRel was the second in eight cases, DTM was the third in seven cases, FlodO was the fourth in eight cases and TPI was the fifth in five cases; all other rank places were mixed with different indices. Regarding the first ten variables, GenSurf, ElRel, DTM, FlodO, TPI, DevMe3, A stability analysis of the variables of the PB-approach according to the RFE showed that the number of optimal variables (i.e., ensuring the largest OA) varied between 11 and 20, and the OAs varied between 78.5% and 81.1% ( Figure 6). GenSurf was the first in the rank order in eight cases out of the 11 repetitions, ElRel was the second in eight cases, DTM was the third in seven cases, FlodO was the fourth in eight cases and TPI was the fifth in five cases; all other rank places were mixed with different indices. Regarding the first ten variables, GenSurf, ElRel, DTM, FlodO, TPI, DevMe3, DifME3 and ValDpth were stable elements of the ranking, MRVBF1 was in the list in nine cases, and the rest of the places consisted of other morphometric indices.
DifME3 and ValDpth were stable elements of the ranking, MRVBF1 was in the list in nine cases, and the rest of the places consisted of other morphometric indices.
The rank order of the variables was different for the OO-approach: the optimal number of variables varied between 10 and 60, with almost the same OAs between 95.2% and 95.7%. The most important variable was MRVBF1, which was the first in the rank order in 11 cases of the 11 repetitions. The second in the list was MorfFeat in 11 cases, the third was ConvISR in 5 cases, the fifth was MS_TPI2 in 8 cases and the sixth was DevME1 in 11 cases. From the seventh place in the rank, there was no dominant variable.

Pixel-Based Classification
As a first step, we involved all of the 61 variables and determined its overall accuracy: its 80.7% median OA indicated an efficient outcome (Figure 7a). Reducing the number of variables caused a slight decrease in OAs, but the rank order by OAs did not follow the number of variables. Differences in the OA-medians were slight and ranged between 79.7% (25 variables) and 76.1% (four variables). The model with four geomorphometric indices, GenSurf, ElRel, DTM and FlodO, was only 4.6% worse in the prediction than when 61 variables were used (Figures 7a and 8b). Furthermore, 11 variables resulted in the fourth best result (78.3%), and the optimal 20 variables according to the RFE produced an OA of 79.5%, which is only 1.1% worse than the model run with 61 variables. Finally, we tested the efficiency of the PCA-model, which was the fourth worst model (with a median of 74.2%), and worse than using four variables. However, there was a threshold in the decrease in OAs at three variables: the decreases in the OAs changed from <1% to 15%. The rank order of the variables was different for the OO-approach: the optimal number of variables varied between 10 and 60, with almost the same OAs between 95.2% and 95.7%. The most important variable was MRVBF1, which was the first in the rank order in 11 cases of the 11 repetitions. The second in the list was MorfFeat in 11 cases, the third was ConvISR in 5 cases, the fifth was MS_TPI2 in 8 cases and the sixth was DevME1 in 11 cases. From the seventh place in the rank, there was no dominant variable.

Pixel-Based Classification
As a first step, we involved all of the 61 variables and determined its overall accuracy: its 80.7% median OA indicated an efficient outcome (Figure 7a). Reducing the number of variables caused a slight decrease in OAs, but the rank order by OAs did not follow the number of variables. Differences in the OA-medians were slight and ranged between 79.7% (25 variables) and 76.1% (four variables). The model with four geomorphometric indices, GenSurf, ElRel, DTM and FlodO, was only 4.6% worse in the prediction than when 61 variables were used (Figures 7a and 8b). Furthermore, 11 variables resulted in the fourth best result (78.3%), and the optimal 20 variables according to the RFE produced an OA of 79.5%, which is only 1.1% worse than the model run with 61 variables. Finally, we tested the efficiency of the PCA-model, which was the fourth worst model (with a median of 74.2%), and worse than using four variables. However, there was a threshold in the decrease in OAs at three variables: the decreases in the OAs changed from <1% to 15%. Visualizing the classifications revealed that the 2-variable solution relevantly underestimated the swales and had a high proportion of salt-and-pepper errors (Figure 8a). The appearance of the 4and 20-variable versions showed clarified maps with only a small proportion of salt-and-pepper errors. Both versions had errors in the southern part of the study area; swales had been overrepresented. However, this area consisted of lakes (Kis-Morotva Lake, Kis-Pap Lake and Nagy-Pap Lake) and a floodplain depression of an irregular shape. Further sources of misclassifications were also outside the areas of interest; these were oxbow lakes at the northern part (Sulymos Lake) and on the eastern border (Nagy-Morotva Lake). Generally, the 20-variable solution reflected the geomorphology in an acceptable way and all larger errors were any of the forms we intended to identify.

Object-Oriented Classification
We applied the same procedure in OO-based classifications, too, and the result was different: the best OA belonged to the model of 10-variables (95.4%) (Figure 7b), which is almost 15% better than in the PB-approach. Another difference is that the optimal number of variables was 13 according to the RFE variable selection, but it was only the seventh best model in the list. We can discriminate three groups by the OAs: in the first group, the OAs were between 95.4% and 95.0%, in the second group between 91.3% and 90.9%, and third group consisted of only one model (with one variable) Visualizing the classifications revealed that the 2-variable solution relevantly underestimated the swales and had a high proportion of salt-and-pepper errors (Figure 8a). The appearance of the 4-and 20-variable versions showed clarified maps with only a small proportion of salt-and-pepper errors. Both versions had errors in the southern part of the study area; swales had been over-represented. However, this area consisted of lakes (Kis-Morotva Lake, Kis-Pap Lake and Nagy-Pap Lake) and a floodplain depression of an irregular shape. Further sources of misclassifications were also outside the areas of interest; these were oxbow lakes at the northern part (Sulymos Lake) and on the eastern border (Nagy-Morotva Lake). Generally, the 20-variable solution reflected the geomorphology in an acceptable way and all larger errors were any of the forms we intended to identify.

Object-Oriented Classification
We applied the same procedure in OO-based classifications, too, and the result was different: the best OA belonged to the model of 10-variables (95.4%) (Figure 7b), which is almost 15% better than in the PB-approach. Another difference is that the optimal number of variables was 13 according to the RFE variable selection, but it was only the seventh best model in the list. We can discriminate three groups by the OAs: in the first group, the OAs were between 95.4% and 95.0%, in the second group between 91.3% and 90.9%, and third group consisted of only one model (with one variable) with an OA of 45.2%. The first three variables, MRVBF1>MorfFeat>MS_TPI2, ensured a relatively high (95.0%) accuracy. Although the interquartile ranges were larger than in the PB-approach, even the minimums were~10% higher in the OO-models.
Maps of the OO-approach, i.e., the classified polygons, did not have large differences regarding the number of involved variables (Figure 9). Related to the 13-variable model, the 2-variable model had only some misclassifications, the two models were very similar, and most of the differences were of swales wrongly classified as crevasse channels. Furthermore, we found swales classified as point bars, and a point bar classified as a levee, but the geomorphic characteristics had been well represented in spite of the model errors.

Class Level Probabilities of Classifications
The highest probabilities usually belonged to crevasse channels and point bars, while the lowest belonged to swales ( Figure 10). However, this result is misleading, because F1-values indicated that the lowest accuracies were experienced in crevasse channels (<20%), and the swales and point bars had the greatest accuracy (60-70%; Figure 11). Regarding the number of variables, using 20 variables should have resulted in the most efficient model with the PB-approach according to the RFE, and this was true for the OA values (25 variables was only 0.2% better; Figure 6), but in the case of class level probabilities, the highest numbers were experienced with the 8-10-variable models, which were also efficient with the F1-values (Figure 11).

Class Level Probabilities of Classifications
The highest probabilities usually belonged to crevasse channels and point bars, while the lowest belonged to swales ( Figure 10). However, this result is misleading, because F1-values indicated that the lowest accuracies were experienced in crevasse channels (<20%), and the swales and point bars had the greatest accuracy (60-70%; Figure 11). Regarding the number of variables, using 20 variables should have resulted in the most efficient model with the PB-approach according to the RFE, and this was true for the OA values (25 variables was only 0.2% better; Figure 6), but in the case of class level probabilities, the highest numbers were experienced with the 8-10-variable models, which were also efficient with the F1-values (Figure 11). the lowest accuracies were experienced in crevasse channels (<20%), and the swales and point bars had the greatest accuracy (60-70%; Figure 11). Regarding the number of variables, using 20 variables should have resulted in the most efficient model with the PB-approach according to the RFE, and this was true for the OA values (25 variables was only 0.2% better; Figure 6), but in the case of class level probabilities, the highest numbers were experienced with the 8-10-variable models, which were also efficient with the F1-values ( Figure 11).  Visualized maximum probability values ( Figure 12) correspondent with the maps, but this was a natural phenomenon as the result of the classification, and the probability had been calculated with the same algorithm and settings. However, combining this information with the mean values of the landform classes (Figure 10), we were able to justify that the spatial distribution of the different values of probability provided the landform map itself. Visualized maximum probability values ( Figure 12) correspondent with the maps, but this was a natural phenomenon as the result of the classification, and the probability had been calculated with the same algorithm and settings. However, combining this information with the mean values of the landform classes (Figure 10), we were able to justify that the spatial distribution of the different values of probability provided the landform map itself. Visualized maximum probability values ( Figure 12) correspondent with the maps, but this was a natural phenomenon as the result of the classification, and the probability had been calculated with the same algorithm and settings. However, combining this information with the mean values of the landform classes (Figure 10), we were able to justify that the spatial distribution of the different values of probability provided the landform map itself.  In the OO-approach, probabilities were higher, usually above 80%, and the levees had the lowest and the point bars the highest values ( Figure 13). Although the 13-variable model did not result in better classification probabilities for the fluvial forms, the F1-values showed that the best class level performance belonged to this model ( Figure 14). We also have to note that the performance of models conducted with 10-11-12-13 variables was almost the same according to the F1. In the OO-approach, probabilities were higher, usually above 80%, and the levees had the lowest and the point bars the highest values ( Figure 13). Although the 13-variable model did not result in better classification probabilities for the fluvial forms, the F1-values showed that the best class level performance belonged to this model ( Figure 14). We also have to note that the performance of models conducted with 10-11-12-13 variables was almost the same according to the F1.  In the OO-approach, probabilities were higher, usually above 80%, and the levees had the lowest and the point bars the highest values ( Figure 13). Although the 13-variable model did not result in better classification probabilities for the fluvial forms, the F1-values showed that the best class level performance belonged to this model ( Figure 14). We also have to note that the performance of models conducted with 10-11-12-13 variables was almost the same according to the F1.

Spatial Uncertainty Issues
Spatial uncertainty analysis, as reflected in different realizations of classifications, also pointed to the fact that even a high classification performance showed different results, i.e., repetitions resulted in different spatial outcomes ( Figure 15). According to the 10 repetitions, the lowest proportion of differing realizations was associated with the levees (mean: 2.8%), and the point bars and swales had the largest proportion with almost the same values (mean: 6.9 and 6.5%, respectively). The uncertainty was the highest with the 2-variable models, and became stable when more than 10 variables were involved, but this did not mean lower values: point bars and swales had 1% more uncertainty with 10 variables (increasing from 6.1 to 7.1%).

Spatial Uncertainty Issues
Spatial uncertainty analysis, as reflected in different realizations of classifications, also pointed to the fact that even a high classification performance showed different results, i.e., repetitions resulted in different spatial outcomes (Figure 15). According to the 10 repetitions, the lowest proportion of differing realizations was associated with the levees (mean: 2.8%), and the point bars and swales had the largest proportion with almost the same values (mean: 6.9 and 6.5%, respectively). The uncertainty was the highest with the 2-variable models, and became stable when more than 10 variables were involved, but this did not mean lower values: point bars and swales had 1% more uncertainty with 10 variables (increasing from 6.1 to 7.1%).

Result of Overfitting Analysis
Overfitting, i.e., the effect of the number of variables on the OA values, was not obvious: the overfit difference was usually more than 12% above four variables for the PB-approach and 7% with the OO-approach (Figure 16). However, the peak of the PB-approach at 48 variables (26.6% difference) belonged to the model using the PCs of PCA, and as we pointed out, the explained variance of the PCA was not high (only 78%). In the OO-approach, the explained variance of the PCA of 52 variables was higher (86%) with a better model fit, and the difference was one of the smallest. The analysis revealed that using all variables did not cause a larger difference between the OAs of training and testing datasets than using only five variables. In the OO-approach, the difference was the smallest with the 1-variable and the 13-variable models.

Result of Overfitting Analysis
Overfitting, i.e., the effect of the number of variables on the OA values, was not obvious: the overfit difference was usually more than 12% above four variables for the PB-approach and 7% with the OO-approach (Figure 16). However, the peak of the PB-approach at 48 variables (26.6% difference) belonged to the model using the PCs of PCA, and as we pointed out, the explained variance of the PCA was not high (only 78%). In the OO-approach, the explained variance of the PCA of 52 variables was higher (86%) with a better model fit, and the difference was one of the smallest. The analysis revealed that using all variables did not cause a larger difference between the OAs of training and testing datasets than using only five variables. In the OO-approach, the difference was the smallest with the 1-variable and the 13-variable models. GLM models also indicated that F1 values were determined by the fluvial forms and the role of the number of variables was not significant, and also had a low effect size (Tables 4 and 5). Accordingly, fluvial forms had characteristically higher or lower F1 values, i.e., the number of variables did not improve the class level indices in a relevant way.  GLM models also indicated that F1 values were determined by the fluvial forms and the role of the number of variables was not significant, and also had a low effect size (Tables 4 and 5). Accordingly, fluvial forms had characteristically higher or lower F1 values, i.e., the number of variables did not improve the class level indices in a relevant way.

Discussion
3D point clouds of LiDAR are considered to be the most efficient and powerful surveying method, which provides accurate data about the ground and objects' surface height, even in vegetated areas. However, this data type can have issues in areas where the terrain is flat, but the relative differences cause important changes in the environment. These areas include wetlands, floodplains, and salt-affected steppes where 0.1-0.2 m relative differences cause changes in the water supply, soil Remote Sens. 2020, 12, 3652 21 of 29 moisture or salt content. A better understanding of the geomorphology and the fluvial processes can help sustainable planning in these areas, and the LiDAR technology can be a promising tool for this.

Object-Oriented and Pixel-Based Classifications
The OO-approach provided~15-17% better model performance for fluvial form identification than the PB-approach. The object-based and pixel-based comparisons usually proved that segments ensure better input data than pixels. The authors of [98] came to the same conclusion; they applied object-based "geons", i.e., homogenous regions, and pixels in the landslide susceptibility mapping and the approach using segments-the geons-provided the highest accuracy. Several other authors found the same results: [99] with hyperspectral data, [100] with Sentinel-2 images, [101,102] with ASTER images, and [103] with visible aerial imagery. We emphasized that our OO-based approach was not a classic segmentation of object-based image analysis (OBIA); our objects were fluvial forms interpreted by visual characteristics and field observations. Thus, in our case, one object meant a given fluvial form, with the sole exception of levees (there were too few forms: we divided up the existing ones to increase the cases). Similarly to the studies where OBIA was successful, our objects of fluvial forms were good basic units of the landscape, but the main difference from the OBIA lies in the interpretation: it needs expertise (the capability to identify the forms in aerial images, in digital terrain models and in the field); furthermore, it is time-consuming (depending on the extent), i.e., all objects in the area should be identified correctly; however, when it is completed, the geomorphological aspect is ready, too. Unlike segments that varied in shape as the pixel values changed, our objects covered all of the pixels that are characteristic of the classes. Accordingly, the mean pixel values of the raster layers of geomorphometric variables can regarded as the quantified measure of the separability of the forms. If classifications produce only a weak performance, we cannot expect a high degree of accuracy in the PB-approach. As our OO-based classifications provided about 95% OA, this was the theoretical maximum for the PB-approach, too. The 78-80% OA for the PB-approach was higher than using only the slope, aspect, terrain height and NDVI with two fluvial forms (71% OA; [47]).

Variable Selection, Number of Variables and the Issue of Overfit
Variable selection is a key step to reduce the number of variables in order to decrease the chance of overfitting while preserving high classification accuracy. We ran several models with different variable sets and the number of variables, and the results showed that in the OO-approach, the OAs were~95%, and in the PB-approach,~78%. The highest OAs were obtained with the highest number of variables for the PB-approach, but in the OO-approach this was not entirely true: involving all variables only resulted in the fourth place (although it is also important that the difference among first three models was <0.2%). Accuracy assessment using the RKCV was 7% worse than testing with the training data, indicating serious overfitting. With the OO-approach, the smallest overfit was experienced with the 1-, 12-and 13-variables models (1%, 1.2% and 0.8%, respectively). The overfitting was 6-7% in all other models, regardless of the number of variables, and even when using 2-9 variables. In the PB-approach, the overfitting was higher, at 11-12%, and the lowest values were observed with 1-3 variables. However, 1-3 variables produced a poor model performance, too; thus the lowest overfit values were not useful in selecting the best model.
An uncommon phenomenon was experienced with the PCA: application of PCs usually results in higher model performance, as has been proven in different studies [104][105][106][107]. However, in this case, the PCA was not a successful alternative. According to the analysis of correlation structure, several potentially useful metrics should be omitted based on the communality and the item-related Cronbach's α. For example, FlodO, ConvI and ValDpth were important predictors in the classification, but if we insist on the correlation-related dimension reduction, we have to miss them. The reason was that the correlation structure was not optimal, and the variables were geomorphometric indices and not bands of a hyperspectral sensor, where there is a high level of correlation between the bands. The authors of [108] used PCA with geomorphometric variables in watershed prioritization, but they used watershed parameters, which resulted in a better correlation structure. The authors of [109] used PCA to reduce 230 terrain attributes to 67, and they concluded that five metrics were enough to explain 51% of the total variance. In our case, we were not able to delineate this reduction, but the RFE as a feature selection method was appropriate to find the most important variables.
RFE provided a list of 13-20 variables from the possible 61, but the structure, i.e., the variables that make the list, varied. RFE was run with the RF algorithm, which uses several decision trees of bootstrapped samples, and all model runs can provide different results. Our experiment with 11 randomly selected datasets from the 5000 pixels and 265 forms showed that the OO and PB-approaches had different variables that contribute to the highest OAs. Furthermore, there were important variables, with relevant contributions in all repetitions. Our third observation was that although the variable sets could differ in the repetitions, the OAs obtained were the same in 1-2% variations. Accordingly, if there are many variables, there are different variable sets, which ensure more or less the same result. The most important geomorphometric indices were GenSurf, ElRel, DTM, FlodO, and TPI for the PB-approach, and MRVBF1, MorfFeat, ConvISR, MS_TPI2, and DevME1 for the OO-approach. In the first most important ten variables we find overlaps, but with different settings, i.e.,: with DevME and DifME, although MRVBF was the same regarding the ranks. Differences in the importance were normal, because the number of the data (1000 vs. 265) and the characteristics of the data (pixel values vs. pixel means by forms) were also different. The authors of [110] pointed out that RFE is useful in reducing predictor numbers; thus, machine and deep learning algorithms can be faster without variables that make a small contribution. We also experienced that RF models were trained in a shorter time, and the model accuracy was only slightly (1-2%) lower; moreover, in the OO-approach, omitting irrelevant predictors, and with only 10-13 variables, the model became~1% better.
DTM (absolute terrain height) should have been efficient alone, but as the terrain had a slight change from the river to the dyke, and from the North to South, thus, swales and point bars can have the same height in the area. Accordingly, the most efficient metrics were able to reflect the specific characteristics of fluvial forms, i.e., the shapes, convexity-concavity, and relative situation of the forms. Efficient variables considered the relative differences (based on minimums and maximums) of the neighboring areas; furthermore, the convergence-divergence, GenSurf, Elrel, FlodO, MRVBF1 and ConvISR were important both for PB and OO-approaches, while DevME and DifME were also important but with different settings: the larger kernel window (with 16 and 32 neighboring pixels) were efficient for the PB-approach and the smaller kernel (8 neighbors) had large importance for the OO-approach. Finding the right class of a single pixel requires larger neighboring area, while with the OO-approach values belonging to objects are means of the given polygon; therefore, larger kernels mean double averaging and do not help the classification.

Uncertainty
We measured the uncertainty in different ways, and the results were contradictory. The highest probability was associated with the classification, and highest values indicated that the decision regarding the classification was made in a straight or an ambiguous way, e.g., if the given probabilities varied, for example 0.90, 0.05, 0.05, 0.00, this meant that the classification identified the first class as 90%, but if this occurred for another form the values were 0.40, 0.35, 0.25, 0.10, which meant that there was only a 5% difference between the first and second classes. Based on this, we expected that higher probabilities would cause better classification accuracy on a class level, but there was no connection between the mean probabilities of fluvial forms and the F1s. Moreover, there were contradictions in the PB-approach; the correlation was 0.14 with the highest probabilities associated with the crevasse channels, whilst the F1s were the lowest (even <10%). Although the results were more similar with the OO-approach, the correlation was 0.74, and contradictions were also found (crevasse channels also had high values in some cases with low F1 values). Nevertheless, probability values reflected the spatial distribution of the landforms: all forms can be delineated based on the maximum probability of the pixels. This means that the maximum values differed by class, and these values were characteristic for all pixels of a given class. However, this value did not correspond with accuracy. Spatial uncertainty analysis also produced similar results: the proportion of uncertain pixels was the highest for swales and point bars, and the lowest values occurred with the levees; meanwhile, the class level F1s showed the opposite result: swales and point bars had the highest class level accuracy. Considering the dominance of these two forms in the area, there is also a higher chance of misclassifying them. Point bars and swales have similar shapes and the main difference between them is that swales are concave while point bars are convex forms [47]. Swales are shallow and elongated depressions with varying water cover or at least higher soil moisture, the vegetation is denser, and tussocks (Carex species) form a naturally uneven surface. Thus, considering that the terrain height has a tendency to increase from the river to the dykes, all swales differ in extent, water cover, vegetation cover, and absolute height above sea level. Point bars can have the same terrain height as swales and in certain locations (i.e., lower point bars between higher point bars) and at certain time can be covered by water; thus, they can also have denser vegetation related to those in a higher terrain position [50,111]. Although these forms can be discriminated easily with visual interpretation, and can be identified in the field by geomorphologists, due to the large number of forms combined with the similarities, semi-automatic misclassification between these two forms can be regarded as normal.
All classifications have errors, and similar features are hard to discriminate. The fluvial forms of a floodplain have similar geomorphometric properties; thus, even with the most accurate aerial LiDAR-based DTMs, classifications cannot be accurate. Using the objects' outlines and calculating the mean pixel values by raster layers (i.e., the OO-approach), very accurate 95% models can be obtained, but this supposes that there is a properly interpreted map of the morphological forms. However, the accuracy of the PB-approach was also acceptable, with an OA of 78%, considering that these investigated forms were similar from many perspectives. This research highlighted the importance of variable selection, and the specific characteristics of the models. We hypothesized that a larger number of variables increases the OA and decreases the uncertainty of the classifications, but the results were ambiguous: both the classification probabilities calculated by the RF algorithm, and the uncertainties calculated from the 10 repetitions of the classification maps, showed contradictory outcomes compared to the F1 as a class level accuracy metric.

Conclusions
We aimed to reveal the most important morphometric variables of fluvial form identification and to quantify the level of uncertainty and overfitting as a function of the number of variables. We found the following results:

1.
A large number of morphometric variables can be used efficiently in the identification of levees, crevasse channels, point bars and swales. However, a larger number of variables did not ensure a relevantly better model performance.

2.
RFE, as a variable selection technique, helped to find the fewest variables making the largest contribution to obtain the grates' accuracy. Our main finding was that the selected variable set can change by model runs; the maximum OAs were almost the same. Although the variables were not the same in the repeatedly conducted models, we were able to identify the most frequent ones. Involving four variables in the case of the PB-approach and two variables in the case of the OO-approach provided sufficient accuracy, and the errors did not differ relevantly from the maximum number of geomorphometric indices. 3.
OO and PB-approaches performed differently: the object-oriented approach was more successful with 95% OA, while the 78% OA of the pixel-based approach was a weaker performance; nevertheless, all the forms were identifiable despite the misclassifications. 4.
The probability of the classifications and the pixel-based spatial uncertainty (as different classification outcomes for the same pixels) was not an appropriate tool to evaluate the classification efficiency, because the values were not in accordance with the class level accuracy metric (F1s).

5.
Overfitting was in accordance with the optimal number of variables: the lowest level of overfitting coincided with the high OAs of the optimal number of variables. 6.
We emphasize that the most important variables (GenSurf, Elrel, FlodO, MRVBF1, ConvISR, DevME, DifME) ensured accurate models for fluvial forms, but the selection methodology was more important. Different aims and target geomorphological forms can also be identified with the help of geomorphometry after a careful variable selection.
The 78% accuracy of the PB-approach can be regarded as acceptable, as the fluvial forms studied had similar characteristics. These results, including the methodological findings, can help the water management directorates to evaluate the floodplain from a flood-management perspective and to find common points with nature conservation planners, to preserve the most valuable habitats.