Soil Management Effects on Soil Water Erosion and Runoff in Central Syria—A Comparative Evaluation of General Linear Model and Random Forest Regression

: The Mediterranean part of Syria is affected by soil water erosion due to poor land management. Within this context, the main aim of this research was to track soil erosion and runoff after each rainy storm between September 2013 and April 2014 (rainy season), on two slopes with different gradients (4.7%; 10.3%), under three soil cover types (SCTs): bare soil (BS), metal sieve cover (MC), and strip cropping (SC), in Central Syria. Two statistical multivariate models, the general linear model (GLM), and the random forest regression (RFR) were applied to reveal the importance of SCTs. Our results reveal that higher erosion rate, as well as runoff, were recorded in BS followed by MC, and SC. Accordingly, soil cover had a significant effect ( p < 0.001) on soil erosion, and no significant difference was detected between MC and SC. Different combinations of slopes and soil cover had no effect on erosion, at least in this experiment. RFR performed better than GLM in predictions. GLM’s median of mean absolute error was 21% worse than RFR. Nonetheless, 25 repetitions of 2-fold cross-validation ensured the highest available prediction accuracy for RFR. In conclusion, we revealed that runoff, rain intensity and soil cover were the most important factors in erosion.


Introduction
In the last few decades, land degradation has posed major concerns all over the world [1,2]. Over 60% of the world's land is subjected to different types of land degradation, and more than 3.2 billion people suffer from it [3][4][5]. The main issues are desertification and salinization [6,7], fertility reduction [8,9] and soil erosion [10], soil acidification [11,12], and pollution [13]. In addition, due to changing environments, stenocious species disappear and biodiversity decreases [14][15][16]. Thus, maintenance of soil physicochemical properties is a key factor for a healthy soil, which directly or indirectly leads to achievement of the Sustainable Development Goals (UN-SDGs), namely Goals 2, 3, 6, 7, 12-15 [17][18][19] Soil erosion is a serious degradation hazard that threaten agricultural production and sustainability of natural ecosystems all over the world [20]. More than 12 million ha/year of fertile soil were excluded from agricultural production due to this degradation form [21]. Furthermore, due to the runoff and sediment transportation, soil loss is accompanied by accumulation eutrophication and water pollution, nutrition leaching and crop yield depression [22]. Accordingly, soil erosion is a great challenge for sustainability in agroecosystems worldwide, such as Europe [22][23][24][25], Africa [26,27], Asia [28], and Australia [29].
Soil erosion by water can be induced by rainfall, slope, and may also be caused by melting snow, and irrigation. Rainfall erosivity depends on the duration, intensity, frequency of rainy storm [30]. Also, the slopes have an effect on soil erosion; in particular, the most important feature is the steepness, but the length, shape and aspect are also affecting the amount and velocity of runoff; nevertheless, erosion can occur on gentle slopes, usually as areal sheet erosion [31][32][33][34]. Furthermore, there are influencing factors having a role in regulating the amount of runoff such as soil properties (i.e., texture, organic matter content, aggregates stability, soil compaction and sealing) [35,36]; and agricultural activities (tillage practices) [35][36][37], as well as land cover/management [38,39].
Different approaches that are available determine the rate of erosion from experimental plots (EP), based on field or laboratory measurements, to erosion models using initiating and influencing factors as model parameters. The EP methods (i.e., levelling, volumetric, deluometric, deflametric, climatological, pluviological, and monolithic methods) were suggested by Zachar [40] as an ideal solution to directly measure soil erosion in a controlled environment (i.e., slope characteristics, soil conditions, and land cover). The most important issue of erosion studies is whether the findings can be extrapolated to other areas, due to the limited size of the EPs.
Over time, dozens of equations and mathematical models were developed for different purposes such as assessing soil erosion, quantifying soil erosion, drawing soil erosion hazard maps, and evaluating the effectiveness of erosion control measures. Since the first equation for predicting soil erosion was introduced by Zingg [41], considerable improvements had been achieved in modeling soil erosion. Scientifically, soil erosion models can be divided into three groups: empirical models (i.e., Universal Soil Loss Equation (RUSLE) [42]); physical models (i.e., Water Erosion Prediction Project model (WEPP) [43]); and hybrid/conceptual models (i.e., the Large-Scale Catchment Model (LASCAM) [44]). Regardless of the model type, the main goal was to simplify soil erosion processed to an acceptable level of accuracy in order to represent real-world scenarios.
Model validation is a serious part of the erosion analysis which has an extensive literature [45][46][47], even when the results are based on field experiments, the reliability of the conclusions should be investigated [48]. Statistical analysis can reveal the general relationships, but the main question is: how do the data represent the area and the phenomenon itself? Regression analysis provides a reliable measure with goodness-of-fit indices (e.g., R 2 and residual errors), but these are calculated from the dataset itself, and not from independent data. However, splitting the dataset into two parts may be misleading: a random selection can provide different datasets from the aspect of representativity, i.e., the train and test data can be selected to be appropriately different and similar, providing a randomly accurate or inaccurate outcome. A reliable method can split the datasets into several subsets, using one part to train the statistical models, and the other for testing. Moreover, we can repeat this procedure; thus, based on several random selections and repetitions can provide a basis to judge the models with higher reliability: medians, quartiles, means, standard deviations (SD) can be calculated from even hundreds of models runs. Accordingly, the accuracy, and the representativeness of the input data will be reflected in the range of model indices.
Syria is one of the Mediterranean countries being affected by both water erosion, and wind erosion, as in other countries in the Middle East and North Africa (MENA) region. Generally, wind erosion mainly dominates in the eastern and central part of Syria [49], while the western and northern parts as well as mountains are more subjected to soil erosion by water [50][51][52]. In the central part of Syria, especially in Mysiafe region (Hamah Governorate (Syria)), where successive and heavy rainstorms occur frequently, soil erosion had become a recurrent threat for sustainability of land resources. Besides, agricultural areas in this region are remarkably affected by soil erosion, which hinders the ecosystem and can cause catastrophic damage to agricultural production.
Although several erosion related studies have been prepared using experimental plots (i.e., Kbibo and Nasafi [53], Kbibo et al. [54]), data reliability had not been evaluated. Within this context, the main aim of this study was to investigate the impact of using different soil conservation techniques on water soil erosion and runoff. We quantified the role of the slope steepness and had the following hypotheses: (i) involving more driving factors into the modelling results in more accurate models; (ii) a non-parametric model can perform better due to lesser limiting prerequisites; and (iii) cross-validation with optimized models provides better information about the model accuracy.

Study Area
The study area is located in Al-Bustan village (36.329 E, 35.016 N) in Hamah Governorate (Syria) (Figure 1), NE Syria, and it is about 720 m above sea level. Generally, soils in this area are described as clayey soils (sand 9.8%; silt 21.9%; clay 68.3%), slight alkaline (pH = 8.1), non-saline (electrical conductivity (EC)= 0.15 dS/m), have moderate organic matter content (OM% = 1.9%), and high in calcium carbonate content (41.3%). Physical characteristics, such as bulk density (1.3 g/cm 3 ), practical density (2.62 g/cm 3 ), and porosity (49.61%) indicated a compacting soil condition. The study area is dominated by dolomitic rocks alternating with limestone, dating back to the middle and upper Jurassic, and has a maximum thickness of about 700 m. The climate of this region is characterized by hot summers and cold and rainy winters, with an average precipitation of 1890 mm. Olives cultivations dominate the land use, with smaller patches of Pinus forests as natural vegetation.

Experimental Design and Sampling
Two locations with different slopes angles (4.7% and 10.3% at the first and second location, respectively) were selected to run the experiment. For each slope, nine experimental plots (5 m × 1 m; Figure 2) were set up within each slope and were divided as follows: (i) three plots as a control without cultivation (bare soil, BS) to be directly exposed to rain drops ( Figure 2a); (ii) three plots were covered with metal sieve (MC) at a height of 20 cm in order to reduce the direct impact of rain drops on the soil aggregates (MC, Figure 2b); and (iii) three plots were planted with annual crop in alternating strips (SC) perpendicular to the slope and far from each other 30 cm, in order to mitigate the effect of rain drops, and reduce the impact of surface runoff (SC, Figure 2c). The total experimental plots were 18 plots (2 slopes × 3 land cover × 3 replications). The plots designed were similar to Wischmeier type [55], but in smaller size. During the monitoring period, which started from September 2013 and ended in March 2014, 10 rainy storms were recorded; before each rain events soil moisture content was measured. Rainfall amount in 30 min (i30) was collected from the rain gauge beside each plot as well as runoff and soil erosion. Sediment trap (tank, 200 L) contents (i.e., soil and water) were mixed for five minutes, then three samples each one 2-L (L) were collected; these samples were mixed together, then a representative sample of 3 L were collected. After that the tank was discharged, and samples were moved to the laboratory at Al-Bath University. In the laboratory, samples were placed in containers for sedimentation then soil was separated from the water and dried in an oven at 105 °C for 24 h. Later on, soil samples were weighed on a balance to calculate the amount of eroded soil from each event.

Statistical Analysis
Statistical evaluation was begun with checking the assumption of normal distribution of the variables. We applied the Shapiro-Wilk test and found that variables were not of normal distribution. Accordingly, we applied robust methods with bootstrapping and trimming to overcome the violated assumptions.
Robust 2-way factorial analysis of variance (ANOVA) was applied for the analysis of factor variables, soil cover and slope type. Beside the hypothesis testing, interaction was also studied.
Modified M-estimator and 5000 bootstrap samples were applied. Post hoc testing was performed with the Games-Howell test, which is insensitive for the inhomogeneity of variances [56].
To reveal the importance of soil management and the uncertainties of the outcomes, we applied two statistical models: the general linear model (GLM) and random forest regression (RFR). For both models, the response variable was the erosion (g/m 2 ), while the predictors were the runoff (L/m 2 ), soil moisture (%), rain intensity (mm/30 min), slope steepness (%), and the management. Categorical variables were involved as dummy variables.
The GLM has several assumptions: the dependent variable is normally distributed, the predictors combine additively on the response, residuals of the model have to follow the normal distribution, the model has to have homoscedasticity (i.e., residuals have been constant in the range of prediction), data should be independent and predictor variables' correlation should be smaller than 0.8 (i.e., to avoid multicollinearity) [56,57]. The GLM final outcome is merely dependent on such limitations. The Shapiro-Wilk test confirmed the normality of the residuals.
RFR is a non-parametric tree ensemble procedure with several applications to data analysis due to its prediction performance [58]. The random forest technique uses bagging and decision trees. We applied 500 decision trees, and the bagging ensured 500 model realizations with random selection of the data with replacement. The involved variables were the same as in case of GLM, but RFR does not apply all of them at the same time: for one decision tree, the algorithm uses the square root of the total number of variables. In our case, there were 6 variables used in one model, considering the categorical variables separate as single dummy ones.
We also determined the variable importance. In the case of GLM, we calculated the effect size (partial η²p). η²p is a standardized measure of the magnitude of the contribution of a variable in the explained variance [56]. RFR, with omitting variables during the calculations, exploits the advantage of lesser variables. Omitted variables cause change in the explained variance, and as there are several hundred models (in our case 500), we can determine the consequence of including or omitting the variables. The final outcome is a rank expressed as mean decrease accuracy (IncMSE%), a measure of sum of squares as a prediction error; the larger the value the larger the importance of a given variable) and mean decrease Gini (IncNodeGini, the impurity of the splits of the decision trees) [59]. RFR provides a measure of variable importance but a current limitation is that no systematic method exists to estimate the shared variances of the variables [60]. As Strobl et al. [61] pointed out the unreliability of default RF models' importance values, we applied the importance permutation [62].
As a verification of the models, we applied k-fold repeated cross validation (RCV) with 2 splits and 25 repetitions; i.e., altogether 50 models were run with splitting the dataset to a train and a set group (which meant two models: at first the first group was designated as train and second as the test set; secondly the second group was used as the train set and the first as the test), then it was repeated 25 times. The accuracies were plotted in a box and whiskers plot and this made it possible to compare the models' efficiency using the R 2 , mean absolute error (MAE) and root mean square error (RMSE) values. In this case, R 2 was not the traditional one referring to the response and predictor variables, but, as a pseudo-R 2 , it was the square of the correlation between the observed and modelled values [63]. This approach made it possible to fine tune the RFR model: we used the RMSE's smallest value to select the optimal model with the number of variables at the nodes of decision trees. We evaluated the RFR and GLM model predictions with the Bland and Altman [64] plot (visualizing the differences of observed and modelled data against their averages) and Wilcoxon paired test. Statistical analyses were conducted in R 3.6.2 [65] by using the packages shown in Table 1.

Relationship of Soil Erosion with the Influencing Factors
Soil erosion rates varied by the soil cover with significant effect (df = 2, F = 8.503, p < 0.001). In particular, erosion values at BS plots were significantly higher than at MC and SC plots as reflected by the p-values (Table 2). Furthermore, non-significant difference has been detected between erosion measured at MC and SC plots. Moreover, BS error values had the largest variance whereas MC and SC plots mean erosion values and standard errors were low (Figure 3). Considering the inclination, the difference between soil erosion in slopes characterized by 4.7% and 10.3% of steepness was not significant (mean difference = 21, df = 1, F = 0.197, p = 0.659).  Our results showed that, when runoff rates were lower than 5 L/m 2 , there was no distinct effect of soil cover (Figure 4). On the other hand, when runoff rate exceeded 5 L/m 2 , the influence of soil cover type is noteworthy. For instance, in BS plots an average increase by 0.2 L/m 2 of runoff accelerated erosion by 1.97 g/m 2 , and by 0.44 and 0.40 g/m 2 for MC and SC, respectively. Thus, if there was any soil cover, the erosion did not increase relevantly, compared to BS plots. The relationship was strong between the erosion and the runoff in each soil cover type ( Figure 5). Notably, soil moisture did not show any significant correlation with the erosion (r = 0.03, p = 0.507); while rainfall intensity was in a strong relation with the runoff (r = 0.95, p < 0.001). Although the regressions performed with the separated data explained at least 74% of variance, the aim was to find a solution involving all data without creating subsets, but in this case the R 2 was only 0.54 (p < 0.001). In spite of the significant model, the residuals were large, and the standard error of the estimate was 26.5 L/m 2 .

Multivariate Statistical Modelling
The single GLM model provided a result with an adjusted R 2 of 0.66 (Table 3). Partial η² indicated the importance of runoff (0.599) and the soil cover (0.132). All the other variables had insignificant (p > 0.05) effect, and minimal effect size (equal and smaller than 0.016, having less contribution in the model than 2%). Omitting all insignificant variables, the new model had an R 2 of 0.678 (Table 4). This model revealed that runoff rate had double the importance than soil cover. However, we excluded the rain intensity from the model due to the high level of correlation (r = 0.95; p < 0.001). The single RFR model explained 75.9% of variance, and the variable importance indicated the runoff rate and the soil cover as the most important variables. Considering the %IncMSE, the importance values indicated the same as the second GLM model involving only the runoff and soil cover (Table 4): runoff had larger importance value than the soil cover. As RF models are not sensitive to multicollinearity, in this model we included the rain intensity, too. However, despite the strong correlation of the rain intensity with the runoff, its importance was the two-third of the runoff. Soil cover types had a relevant effect, according to the IncMSE%. Nonetheless, soil moisture (sm) and slope angle had negligible effect ( Figure 6). In the next step we applied the RCV with 50 models: the results showed that the models had large variance if we use random subsets from the whole dataset. R 2 values' medians of the GLM and RFR models were almost similar (for GLM it was 0.66, and for RFR 0.70), and the GLM's minimum R 2 was 0.53, while it was lower in case of RFR, 0.4 (Figure 7). This would suggest using the GLM, but considering the range of predicted values, it was obvious that RFR performed better than GLM. In fact, the lower quartile was the same (0.62) for both models, but the upper quartile and the maximum was different (0.70 and 0.75 for the GLM, and 0.85 and 0.90 for the RFR). Moreover, both the MAE and the RMSE indicated lower residuals for the RFR model: RFR's median was 15% lower for RMSE and 32% for MAE than of GLM. Furthermore, in the case of RFR, largest RMSE was lower than the median of the GLM. The better performance of RFR was confirmed by the predicted values, too. Bland-Altman diagrams pointed to the main characteristics of the predictions, i.e., RFR's prediction error's standard deviation was 37.2 (indicated with 2 × SD in Figure 8), while in case of GLM it was 108.2 ( Figure 9).

Influence of Soil Cover and Management on Soil Erosion and Runoff
Soil erosion is one of the major threats to sustainable agricultural systems in the Mediterranean part of Syria (i.e., coastal region and mountains). In this research, soil erosion and runoff were recorded after each rainy storm in two slopes with different steepness (4.7%, and 10.3%) under three different treatments (BS; MC, SC). Despite the existing large variation of natural rainfall intensity, our results indicated that higher erosion rate as well as runoff were recorded in BS followed by MC, then SC.
In the study area, erosion and runoff varied by the soil cover. BS plots were the most affected by soil erosion phenomena; which could be explained by the fact that in the rainy seasons between October and February the soil remains bare. Unprotected soil is directly impacted by raindrop forces which deteriorated aggregate stability, leaving soil pores quickly blocked by fine particles (silt, clay), causing soil sealing and crusting, lessening soil infiltration and raising the susceptibility to soil erosion [73][74][75][76][77]. By contrast, land cover in both MS and SC minimize the direct impact of rainy storms, subsequently, less soil erosion and runoff were noticed. These results coincided with other studies conducted in different parts of the Mediterranean basin [78][79][80][81][82][83]. However, no statistical difference has been detected between MC and SC plots in terms of soil erosion.
Precipitation characteristics (i.e., duration, length, intensity) directly impact runoff and subsequently soil erosion [82]; where a positive correlation between activated runoff and soil erosion, especially in the semi-arid region [74]. Our results revealed a positive significant correlation between soil erosion and runoff under different land cover (Figure 4), where highly observed rainfall intensity produced higher runoff whether in BS or other plots. Increasing rainfall erosivity accelerates soil sealing and crusting, minimizing infiltration capacity, and enhancing runoff generation and mechanics of soil erosion [84][85][86] recorded a high correlation between soil erosion and runoff (R 2 = 0.90) in the western Mediterranean basin (eastern Spain). Moreover, Hortonian runoff was observed in the studied plots especially in BS plots, when rainfall intensity was high (31 mm/30 min) and greater than soil infiltration capacity leading to high runoff and erosion rate. We can highlight that the first two rainy storms of the year (i.e., 3 October 2013; 11 November 2013) induced the second highest observed surface runoff and soil erosion rate in both slopes. Furthermore, these storm events occurred after long dry periods from May to September with high intensity (>16 mm/30 min) resulting in high erosion rates. Notably, more than 40% of total soil erosion were recorded in the first three events (i.e., 10 March 2013; 11 November 2013; 12 November 2013). Several authors stressed the drastic impact of early rainstorms after drought periods in the Mediterranean region on land degradation [87][88][89], while others conclude that few rainy storms were responsible for the larger proportion of soil erosion [90,91].
Many other factors such as low soil organic matter, high content of fine particles (i.e., clay, silt), and high bulk density (1.3 g/cm 3 ) provided favorable conditions for runoff initiation and erosion acceleration, which could be observed not only in Syria but in all the Mediterranean region [92][93][94]. Similarly, Cerdà et al. [75] pointed out that high soil bulk density is a key factor for the fast ponding and runoff initiation in the dry Mediterranean soils.

Impact of Different Soil Management Techniques on Soil Erosion and Runoff
In addition to high-intensity in short-duration rainfalls, shallow soil, slope gradient and mismanagement of cultivated lands are common and crucial factors of erosion in the whole Mediterranean region; thus, conservation management practices are important tools and planning. Under the Mediterranean part of Syria, SC could be an effective tool for minimizing both overland flow (i.e., runoff) and soil loss. Unlike other soil conservation techniques, SC technology is inexpensive and effective in minimizing the erosion rate. However, herbaceous plant cover may compete with the main crop due to water and nutrient uptake, which may cause a negative impact on crop production [95,96] keeping farmers away from using it. As the MS technique can hardly be applied in large areas, we used it in this research to provide an overview of the impact of soil mulching on soil water erosion; assuming MS will have the same behavior of mulching materials. Our results highlighted that soil mulching significantly reduced total soil erosion by 76.6% and 80% in the gentle slope (4.7%) and steeper slope (10.3%), respectively. Thus, soil mulching by straw (barley; wheat) or other organic wastes from the agricultural production could be a proper solution for combating soil erosion [97]. A growing body of literature has demonstrated the importance of proper soil management/mulching for minimizing soil erosion by water [98][99][100][101][102][103][104][105]. Rahma et al. [106] illustrated that soil mulching is an effective way of conserving water and soil in the Loess Plateau (China) because of reduction of surface runoff as well as protecting soil aggregates from the direct impact of raindrops. Similarly, Keesstra et al. [93] demonstrated that runoff can be reduced from 65.6% to 50.7% by using straw mulching in citrus orchards. However, Lucas-Borja et al. [107] found that soil mulching had no effect on runoff, while it reduced soil erosion in forest fires-affected landscapes (Spain). Nevertheless, MC, SC techniques could serve as soil conservation practice providing a promising tool to be used under Syrian agroecosystem.

Performance of General Linear Model (GLM) and Random Forest Regression (RFR) in the Study Area
Application of our proposed GLM and RFR models indicates that RFR is more accurate and reliable in predicting soil erosion under 50 runs. Remarkably, when runoff exceeded 5 L/m 2 , erosion rate was highly influenced by the soil cover types; for example an increase by 0.2 L/m 2 of runoff (above 5 L/m 2 ) resulting in an increase by 1.97 g/plot of soil erosion in bar soil plots (BS), 0.44 g/m 2 and 0.40 g/m 2 erosion in the case of MC and SC, respectively.
Linear regression analysis is the most popular technique to quantify the results, but our analysis calls attention to the uncertainty of using only one model: the R 2 of the single GLM model (Table 3) was 0.7, but the 50 repetition of the RCV showed that it varies between 0.53 and 0.75, depending on the input data. Important to note that the two R 2 is not the same: in Tables 3 and 4 it is an adjusted R 2 which was corrected with number of predictors, and in RCV we report the square of the correlation of the modelled and observed values (pseudo R 2 ); i.e., the latest is more reliable measure as the models are applied on independent data. Nevertheless, both R 2 are appropriate to show the model fit.
RFR never provides the same outcome (unless the randomizations are fixed e.g., in R software) as the algorithm works with hundreds of bootstrapped samples and each randomization results in a new model. We revealed that R 2 alone is not meaningful enough because RFR's R 2 -values were worse than GLM's but both MAE and RMSE indicated lower errors: minimum was better with 38% for RFR than GLM for both error metrics. Model performance was better with the fine-tuned RFR model, which is visually presented in the Bland-Altman plots (Figures 8 and 9). In case of GLM (Figure 9), we labelled the cases when exceeded the RFR model's 2 × SD range, which indicated that RFR's predicted values were more accurate (in that case prediction error was larger than 2 × SD only in 3 cases). However, the observed and modelled values were not statistically different between the RFR and GLM predicted values according to the Wilcoxon paired test (W = 954, z-score = 0.287, pMC = 0.77). This is the case when hypothesis testing is biased by the large variance of one group of the factor: GLM's variance was too large and completely overlapped with the small variance of RFR's interquartile range; in case of RFR 50% of the differences related to the reference had fallen between −8.43 and 5.35, while the same range was between −74.61 and 57.45 for GLM ( Figure 10).

Importance of Different Variables in Soil Erosion
Importance of the involved predictors of erosion were similar in both models: runoff was the most important and the rain intensity (i30) was the next variable. Although i30 was in high correlation (r = 0.95, p < 0.001) with the runoff, the importance of runoff was higher in the RFR model (due to avoid multicollinearity we omitted i30 from the GLM). Soil cover was also important, but, in this research, soil moisture and the slope type (i.e., 4.7% or 10.3%) was not relevant. The relevance of soil moisture and slope angle were reported in several studies [108][109][110], but slopes were usually steeper (larger than 10°) and the difference was also ~10° among the different slope types. Moreover, smaller importance can be attributed to the local characteristics: other influencing factors, such as i30 and runoff in the combination of three types of soil cover proved to be more important than the slopes. We repeated the RFR model without the runoff variable to filter out its large effect, but the rank and the magnitude remained the same for the other variables, and the relevance of slope and soil moisture did not increase.
This work highlighted the importance of both the plot experiments and the statistical evaluation. Plots are the primary sources of collecting erosion information and the statistics are the tools when we are able to extract the biasing factors. Application of cross-validation in erosion literature is limited. de Graffenried and Shepherd [111] applied Classification and Regression Tree (CART) modelling with 10-fold cross-validation to test visible infrared spectroscopy in the assessment erosion risk, but usually we find more examples when this technique had been applied in gully erosion mapping [112,113]. Rotigliano et al. [114] also applied a repeated random method to assess the reliability of their classification results in debris flow sensitivity. However, all of these examples dealt with classification and not with regression issues. Our results pointed on the importance of assessing the reliability of experimental results, because the statistical tests and models can have several realizations depending on the input (i.e., training) dataset. If we explore the possible outcomes, at least regarding the experiments, we can provide better models with their uncertainty, too.

Conclusions
We conducted a plot experiment on a Mediterranean area using three soil cover types on two different slopes. We aimed to reveal the most important predictors of the soil erosion with the most reliable statistical models. We revealed that runoff, rain intensity and the soil cover were the most important factors of erosion, but simple bivariate regression models were not efficient when all soil cover types were involved. GLM and RFR multivariate models were more efficient and helped to determine the importance of influencing factors. Cross-validated models showed the uncertainty of the outcomes. Hyperparameter tuning of the RFR model ensured finding the least RMSE; accordingly, pseudo R 2 -values indicated that GLM performed better but based on RMSE and MAE the erosion rate prediction was more successful with the RFR, and errors were about two-thirds those of GLM's results. Ranking of the variable importance was similar with both models, but the rank itself contradicted the current experiences: slope steepness and soil moisture had only limited effect on the erosion, which can be explained by local characteristics of the plots. This result also call attention to model evaluations, each study site is different, and when soil conservation experts plan the possible steps of mitigating the erosion, the local processes also should be considered.