Soil Erosion Susceptibility Mapping in Kozetopraghi Catchment, Iran: A Mixed Approach Using Rainfall Simulator and Data Mining Techniques

Soil erosion determines landforms, soil formation and distribution, soil fertility, and land degradation processes. In arid and semiarid ecosystems, soil erosion is a key process to understand, foresee, and prevent desertification. Addressing soil erosion throughout watersheds scales requires basic information to develop soil erosion control strategies and to reduce land degradation. To assess and remediate the non-sustainable soil erosion rates, restoration programs benefit from the knowledge of the spatial distribution of the soil losses to develop maps of soil erosion. This study presents Support Vector Machine (SVM), Random Forest (RF), and adaptive boosting (AdaBoost) data mining models to map soil erosion susceptibility in Kozetopraghi watershed, Iran. A soil erosion inventory map was prepared from field rainfall simulation experiments on 174 randomly selected points along the Kozetopraghi watershed. In previous studies, this map has been prepared using indirect methods such as the Universal Soil Loss Equation to assess soil erosion. Direct field measurements for mapping soil erosion susceptibility have so far not been carried out in our study site in the past. The soil erosion rate data generated by simulated rainfall in 1 m2 plots at rainfall rate of 40 mmh−1 was used to develop the soil erosion map. Of the available data, 70% and 30% were randomly classified to calibrate and validate the models, respectively. As a result, the RF model with the highest area under the curve (AUC) value in a receiver operating characteristics (ROC) curve (0.91), and the lowest mean square error (MSE) value (0.09), has the most concordance and spatial differentiation. Sensitivity analysis by Jackknife and IncNodePurity methods indicates that the slope angle is the most important factor within the soil erosion susceptibility map. The RF susceptibility map showed that the areas located in the center and near the watershed outlet have the most susceptibility to soil erosion. This information can be used to support the development of sustainable restoration plans with more accuracy. Our methodology has been evaluated and can be also applied in other regions.


Introduction
Soil erosion determines landforms [1,2], soil distribution [3,4], soil fertility [3,5], and land degradation processes [6,7]. In arid and semiarid ecosystems, soil erosion is responsible for desertification [8,9]. An improved understanding of soil erosion processes, rates, and spatiotemporal variability supports the development of soil erosion control strategies and the mitigation of land degradation [10,11]. To assess and remediate non-sustainable soil erosion problems, knowledge about the spatial distribution of soil loss is required to develop accurate soil erosion maps. [12,13].
Soil erosion refers to the process of detachment, transport, and sedimentation of particles [14]. Raindrops play a more effective role in altering soil structure and separating soil particles due to their high kinetic energy compared to runoff. Runoff is the main cause of soil particle transport along the slopes. Sedimentation takes place in lowlands. Another relevant mechanism of soil erosion is splash erosion, which is important in areas with low vegetation cover [14].
Attributed to climatic conditions and geological features, along with the dense population and intensive agriculture, half of the total area of the world that is water-eroded is found in Asia. Within Asia, Iran shows high erosion rates due to the climatic conditions and the rugged terrain [15]. Growing population, economic changes, urbanization, agricultural expansion, deforestation, and grazing are leading to an increased soil erosion in Iran. Land use changes are very active in Iran and the soil erosion is estimated to increase by almost seven times in the next decade [15]. To bring soil erosion under control and to achieve a reliable and sustainable land management, it is relevant to predict and map soil erosion.
Rainfall simulators have been used in soil erosion studies since 1930. They are one of the most successfully used tools worldwide in different subjects including hydrology, agronomy, and geomorphology [16][17][18]. The rainfall simulators are used under laboratory and field conditions, and they range in size from centimeters to meters. Their technology is very variable with continuous mechanical and electronic improvements. Rainfall simulators enable the investigation of the impact of rainfall on soil erosion behavior under controlled experimental settings under field and laboratory conditions [19,20]. Rainfall simulators allow scientists to explore processes that are difficult to study under natural rainfall conditions [4,21]. They support the exploration of processes and mechanisms of soil erosion under controlled conditions that natural rainfall does not allow due to the uneven distribution of rainfall events in time and space [14,22].
Despite recent advances in hydrological modeling, soil erosion estimation and measurements are still research topics that need to be further developed and improved. The relationship between rainfall and erosion is highly complex and non-linear, and this makes runoff and soil loss forecasts a difficult task. Rainfall simulators and data mining techniques can be useful to quantify soil erosion and support soil protection strategies at the watershed scale. Data mining is a widely used method for solving many non-linear prediction problems related to natural resources. Recently, techniques for soil erosion research such as Support Vector Machines (SVM) [23], Random Forest (RF) [24,25], and adaptive boosting (AdaBoost) [26,27], have been increasingly used to map environmental phenomena and estimate soil erosion. Data mining models have been considered by various researchers as suitable approaches in soil erosion studies. Examples for the application of data mining models are provided by studies on gully erosion [28][29][30][31][32][33][34], piping erosion [35,36], and landslides [25,[37][38][39].
Rainfall simulator experiments and data mining models are used here as a new approach to develop distributed soil erosion susceptibility maps. This is a new strategy that links direct experimental field and model approaches. A portable rainfall simulator was used to directly measure the sediment yield in the field in each hydrological response unit (HRU). The data mining tools were used to develop the erosion rate maps in the Kozetopraghi catchment. Three statistical methods (SVM, RF, and Adaboost predictions) were used and compared to determine soil erosion.
The Kozetopraghi catchment was used as study area because it shows the dynamic changes of land degradation over the last decades due to deforestation and the overexploitation of water resources [40]. Changes in vegetation cover, development of a network of new infrastructures and Land 2020, 9, 368 3 of 18 agricultural intensification resulted in an increase in soil erosion rates, calling for an assessment of the soil erosion risk for a future improved sustainable watershed management program in the region. Kozetopraghi catchment is representative for many regions of the world where land degradation processes take place [8,41]. The Kozetopraghi catchment study site was selected from a number of areas with similar environmental changes in Iran, but it has a good database available to determine the soil erosion factors that are used as input for our models. The field measurements were carried out to generate input experimental data for the model development.
The objective of this paper is to develop a soil erosion risk map of the Kozetopraghi catchment as a test watershed for areas affected by recent land management and land use changes. Our approach is based on the measurements of the soil erosion rates directly in the field by means of rainfall simulation experiments.

Study Area
The Kozetopraghi catchment is located in the northern part of the Garasou basin, Iran ( Figure 1). The study area covers 766 km 2 . The climate is semi-arid, with an average annual rainfall of 456 mm and characterized by warm and dry summer and very cold and wet winter months. The period from November to March shows the highest precipitation amounts, while June to October are the dry months [40]. The annual average temperature is 8 • C, and the temperature is below 0 • C for 130 days per year [42]. Due to the growing population in the Kozetopraghi catchment as it can be observed in other catchments of Iran, large areas of rangelands have been converted to rain-fed cultivation. The most widespread crops in the study watershed are barley, corn, and wheat. The new agricultural land is affected by increasing soil losses and runoff discharge that result in the siltation of the reservoirs in the study area. This causes threats to the water supply for the population and for irrigation in the area. Hence, there is a need for an improved land and water management and strategies to mitigate soil erosion in the watershed.
Land 2020, 9, x FOR PEER REVIEW 3 of 18 the soil erosion risk for a future improved sustainable watershed management program in the region. Kozetopraghi catchment is representative for many regions of the world where land degradation processes take place [8,41]. The Kozetopraghi catchment study site was selected from a number of areas with similar environmental changes in Iran, but it has a good database available to determine the soil erosion factors that are used as input for our models. The field measurements were carried out to generate input experimental data for the model development.
The objective of this paper is to develop a soil erosion risk map of the Kozetopraghi catchment as a test watershed for areas affected by recent land management and land use changes. Our approach is based on the measurements of the soil erosion rates directly in the field by means of rainfall simulation experiments.

Study Area
The Kozetopraghi catchment is located in the northern part of the Garasou basin, Iran ( Figure  1). The study area covers 766 km 2 . The climate is semi-arid, with an average annual rainfall of 456 mm and characterized by warm and dry summer and very cold and wet winter months. The period from November to March shows the highest precipitation amounts, while June to October are the dry months [40]. The annual average temperature is 8 °C, and the temperature is below 0 °C for 130 days per year [42]. Due to the growing population in the Kozetopraghi catchment as it can be observed in other catchments of Iran, large areas of rangelands have been converted to rain-fed cultivation. The most widespread crops in the study watershed are barley, corn, and wheat. The new agricultural land is affected by increasing soil losses and runoff discharge that result in the siltation of the reservoirs in the study area. This causes threats to the water supply for the population and for irrigation in the area. Hence, there is a need for an improved land and water management and strategies to mitigate soil erosion in the watershed.

Methodology
The flowchart of the methodology used in this study is presented in Figure 2. The tasks developed were the following: (i) determining and delineating hydrological response units (HRUs); (ii) rainfall simulator experiments in each HRU; (iii) soil erosion inventory map; (iv) determining soil erosion parameters; (v) soil erosion susceptibility map modeling using SVM, RF, and AdaBoost algorithms; (vi) implementing a sensitivity analysis of soil erosion parameters; and, (vii) creating a soil erosion susceptibility map using an ensemble modeling approach.

Methodology
The flowchart of the methodology used in this study is presented in Figure 2. The tasks developed were the following: (i) determining and delineating hydrological response units (HRUs); (ii) rainfall simulator experiments in each HRU; (iii) soil erosion inventory map; (iv) determining soil erosion parameters; (v) soil erosion susceptibility map modeling using SVM, RF, and AdaBoost algorithms; (vi) implementing a sensitivity analysis of soil erosion parameters; and, (vii) creating a soil erosion susceptibility map using an ensemble modeling approach. Delineating Hydrological Response Units (HRUs). Partitioning a catchment into hydrological response units helps to characterize different hydrological dynamics within a large area such as a catchment. An HRU is an area with a common soil type, land use, and slope angle [43]. The soil type, land use, and slope maps were overlaid using ArcGIS 10.3 and the HRU map was developed for the Kozetopraghi catchment ( Figure 3). We delineated 58 HRUs with areas ranging from 6.84 (HRU1) to27.03 km 2 (HRU2).  Delineating Hydrological Response Units (HRUs). Partitioning a catchment into hydrological response units helps to characterize different hydrological dynamics within a large area such as a catchment. An HRU is an area with a common soil type, land use, and slope angle [43]. The soil type, land use, and slope maps were overlaid using ArcGIS 10.3 and the HRU map was developed for the Kozetopraghi catchment ( Figure 3). We delineated 58 HRUs with areas ranging from 6.84 (HRU1) to 27.03 km 2 (HRU2).

Methodology
The flowchart of the methodology used in this study is presented in Figure 2. The tasks developed were the following: (i) determining and delineating hydrological response units (HRUs); (ii) rainfall simulator experiments in each HRU; (iii) soil erosion inventory map; (iv) determining soil erosion parameters; (v) soil erosion susceptibility map modeling using SVM, RF, and AdaBoost algorithms; (vi) implementing a sensitivity analysis of soil erosion parameters; and, (vii) creating a soil erosion susceptibility map using an ensemble modeling approach. Delineating Hydrological Response Units (HRUs). Partitioning a catchment into hydrological response units helps to characterize different hydrological dynamics within a large area such as a catchment. An HRU is an area with a common soil type, land use, and slope angle [43]. The soil type, land use, and slope maps were overlaid using ArcGIS 10.3 and the HRU map was developed for the Kozetopraghi catchment ( Figure 3). We delineated 58 HRUs with areas ranging from 6.84 (HRU1) to27.03 km 2 (HRU2).   Setting up rainfall simulators in each HRUs. In each of the 58 HRUs, three rainfall simulation experiments were developed. A mobile simulator was placed randomly in each HRU for the experiment. A total of 174 (58 HRU × 3 tests) rainfall simulation experiments were carried out. Each rainfall simulator location shows an area of 1 m 2 (1 × 1 m). Each experiment consisted of 20 minutes duration of rainfall with an intensity of 40 mm hr −1 . The surface wash discharge water was collected in 1.5 L volume bottles and the samples from 174 experiments were transported to the laboratory. The amount of sediment was determined and then the runoff sediment concentration was calculated. The experiments were conducted from 25 July 2018 to 22 August 2018. This period was selected in order to avoid changes in the soil water content that would disturb the spatial variability of the runoff generation and soil losses. This procedure has been proven useful in other studies [4,44]. The locations where the samples have been taken in each HRU are shown in Figure 3.
Soil erosion inventory map. The soil erosion inventory map is the first step to develop the erosion susceptibility map [27]. An accurate analysis of soil erosion susceptibility needs a reliable soil erosion map that shows the risk locations and the corresponding soil erosion rates. Studies in different regions of the world have shown that erosion rates vary (Table 1). Based on the rainfall simulator experiments, locations with soil erosion rates higher than 5.5 g m −2 min −1 were defined as erosion points and the locations with less than 5.5 g m −2 min −1 were defined as non-erosion points. This threshold was determined through measurements conducted by the Department of Natural Resources of Ardabil province, Iran. From the results of the rainfall simulation experiments, 82 points showed erosion and 92 points were classified as non-erosion points. The soil erosion rates ranged between 3.32 and 76.24 g m −2 min −1 in the HRUs. Values of 1 and 0 were used to show existence and absence of soil erosion risk on the area, respectively. Although increasing the number of data points can improve the results of used algorithms and accuracy [45,46], we found that the 174 experimental points in the study area catchment (766 km 2 ) were sufficient to characterize soil erosion risk. Previous studies used much less measurements for modeling soil erosion in larger catchments [23,47]. Using the SDM package of the R software, the data points were used for algorithms calibration (70%) and validation (30%) by the random classification method for soil erosion susceptibility analysis. The random classification method is a data separating method that repeats the separating of data randomly into training and testing classes without replacement [48]. Soil erosion conditioning parameters. Based on the experiences of previous published studies on the topic [57][58][59] and the data generated in the Kozetopraghi catchment, nine effective soil erosion susceptibility factors were selected and generated: altitude, slope, aspect, length-slope, topographic wetness index (TWI), stream power index (SPI), hydrological soil group (HSGs), land use, and geology. The information on each of the nine parameters selected and the protocols to produce them is discussed below.
Altitude. A 12 × 12 m spatial resolution digital elevation model (DEM) was used to develop the altitude map. It was obtained from the https://vertex.daac.asf.alaska.edu/ website. The altitude Land 2020, 9, 368 6 of 18 of the study area ranges from 1398 m to 2540 m.a.s.l. Rainfall magnitude changes with altitude and hence impacts the soil erosion processes [60]. Topography also affects runoff water velocity and sediment transport.
Slope. Low slope angles reduce the runoff velocity and allow infiltration, decreasing soil erosion rates compared to steep slopes. Steep slopes also suffer from dry ravel [23,61,62]. To assess the influence of slope on soil erosion, a slope map in percentage for the Kozetopraghi catchment (ranging from 0 to 441%) was generated.
Slope Length. This factor has a high impact on soil erosion, since increased runoff velocity on longer slope lengths causes stronger erosion. In this study, upslope contributing area of each cell was generated using the Moore and Burch (1986) approach as seen in Equation (1): where fa is flow accumulation and is extracted from the DEM map, and b is the slope gradient in degrees. Aspect. Slope aspect shows the direction of slopes. It is a key factor in soil erosion because, for instance, south-facing slopes have oftentimes a higher risk of soil erosion compared to north-facing slopes [58]. The slope aspect of the study area was classified into nine directional classes as flat, north, northeast, east, southeast, south, west, southwest, and northwest ( Figure 4).
Land 2020, 9, x FOR PEER REVIEW 6 of 18 study area ranges from 1398 m to 2540 m.a.s.l. Rainfall magnitude changes with altitude and hence impacts the soil erosion processes [60]. Topography also affects runoff water velocity and sediment transport. Slope. Low slope angles reduce the runoff velocity and allow infiltration, decreasing soil erosion rates compared to steep slopes. Steep slopes also suffer from dry ravel [23,61,62]. To assess the influence of slope on soil erosion, a slope map in percentage for the Kozetopraghi catchment (ranging from 0 to 441%) was generated.
Slope Length. This factor has a high impact on soil erosion, since increased runoff velocity on longer slope lengths causes stronger erosion. In this study, upslope contributing area of each cell was generated using the Moore and Burch (1986) approach as seen in Equation (1) where fa is flow accumulation and is extracted from the DEM map, and b is the slope gradient in degrees.
Aspect. Slope aspect shows the direction of slopes. It is a key factor in soil erosion because, for instance, south-facing slopes have oftentimes a higher risk of soil erosion compared to north-facing slopes [58]. The slope aspect of the study area was classified into nine directional classes as flat, north, northeast, east, southeast, south, west, southwest, and northwest ( Figure 4).  Topographic Wetness Index (TWI). The TWI is a hydrological factor that describes the spatial pattern of soil moisture in the study area. TWI indicates water saturated areas, which in the Kozetopraghi catchment varied from 1 to 14. The TWI values were extracted from DEM map using Equation (2).
where A s the specific catchment area in square meters and b is the slope gradient (in degrees). Stream Power Index (SPI). The SPI is a measure to indicate the amount of erosion by hill down flow, assuming that discharge is proportional to the specific catchment area (Conforti et al., 2011). The SPI extracted from DEM map and ranges from −464,766 to 586,989.
where SPI is stream power index, A s is a specific catchment area in m 2 , and b is the slope gradient (in degree). Hydrological Soil Groups (HSGs). HSG show four groups (A, B, C, and D) of soils with different infiltration potentials [63]. Among these hydrological soil groups, infiltration potential decreases from group A to D. The HSGs were obtained from the Iranian Department of Water Resources. In the study area, the percent area cover for A, B, C, and D groups were 0.87, 16.77, 67.36, and 15 percent, respectively.
Land use. Land use plays an important role for most of the hydrological processes such as soil erosion, runoff velocity, infiltration, and evapotranspiration (Yalsin et al., 2011). The land use map was obtained from the Iranian Regional Water company of Ardabil (IRWA) and included grassland, cliff, residential, agriculture, garden, and range. Agriculture and secondary range cover the largest area of the study catchment.
Lithology. The lithological condition is also an important controlling factor for soil erosion. Based on the lithological map of the study area [40], there are Travertine (Qtr), young alluvial tundra (Qt2), alluvial deposit (Qal), marl, conglomerate, glauconitic marl with Gypsum and salt (OMS), bright gray sandstone (NGMS), conglomerate with lime, sandstone and tuff (NGC3), gypseous gray marl (NGM3), and coarse grained porphyry trachyte (Mdt). The most commonly distributed lithological unit in the study area is Qt2 that is sensitive to soil erosion.

Erosion Susceptibility Modeling
In this study, an erosion susceptibility map was generated using selected machine learning models. These models are described below.

Support Vector Machine (SVM)
The SVM is a machine learning method that was proposed by Vapnik et al. [64]. The SVM is a type of learning algorithm using high dimensional feature. This algorithm is based on statistical learning theory and the structural risk minimization principal that has a significant rule in many natural resources, environmental, and ecological related issues [23,65].The accuracy of an SVM algorithm mainly relies on the values of its factors and their alignment.

Random Forest (RF)
The RF algorithm can be regarded as a non-parametric ensemble learner that cumulates a set of highly randomized classification trees to analyze the relationship between the independent and dependent variables [66]. This algorithm is a relatively new, tree-based model that optimizes the predictive performance by combining many simple decision trees into a developed model rather than using a single-tree model [24]. For performing this model, the number of trees in the forest, the number of variables used to grow each tree, and the minimum number of terminal nodes were defined.

AdaBoost
AdaBoost is a type of boosting algorithm, and there are many models that are derived from this algorithm. A large part of them are related to categorization; the rest of them are used to determine the regression relationship [26,67]. Different from other boosting algorithms, the AdaBoost algorithm is a type of iterative algorithm that adjusts the learning pattern according to the error returned by weak learners [68]. The main idea of this algorithm is combining weak learners produced in each iteration to generate a powerful learner. This algorithm forces the weak learner to focus on the difficult instances of the training dataset and introduces a series of classifiers that complement each other [27].

Accuracy Assessment
The area under the curve (AUC) is used to assess the predictive accuracy of distributional algorithms derived from presence-absence data [69][70][71]. For the soil erosion classification, each class belongs to either a positive (erosion) or negative (non-erosion) classes. The number of positive and negative pixels that are correctly classified in positive and negative classes are known as true positive (TP) and true negative (TN); conversely the positive and negative pixels that have been erroneously classified are known as false positive (FP) and false negative (FN). Thus, the AUC value shows the accuracy of two classifiers in a receiver operating characteristic (ROC) graph. The AUC value is between 0 (a diagnostic test that cannot distinguish between susceptive and non-susceptive points) to 1 (reflecting that TP = 1 and FP = 0). In this study, the ROC curve was performed to analyze the model's accuracy using the pROC package. The estimated AUC values of >0.6 show poor accuracy, 0.6-0.7 shows average prediction, 0.7-0.8 shows good prediction and 0.8-0.9 indicates very good accuracy [72,73].

Sensitivity Analysis
The ROC analysis is a well-known method in order to assess the accuracy of diagnostic tests. Most of the statistical coefficients based on AUC rely on the Jackknife test [74], which is accepted to have high capability for a broad range of practical problems. In this study, the relative decrease of AUC values as a percentage (RD) was computed using Equation (4) [74] to assess the factor contribution.
where AUC all is the AUC values computed from the estimation by all factors and the AUC i factor is estimated when the ith factor has been removed. Moreover, the parameter sensitivity in the model with the highest accuracy has been determined by using IncNodePurity algorithm. The results were compared by using the Jackknife method. The IncNodePurity is a loss function which is used for determining the importance of parameters. More important parameters have highest value of node purities. The best split shows parameters that have low intra node variance and higher inter node variance [75].

Erosion Susceptibility Mapping Results
The erosion susceptibility maps were prepared by applying SVM, RF and AdaBoost models. The produced maps by each of the three algorithms were grouped into four classes: low, moderate, high, and very high surface erosion susceptibility ( Figure 5). The results are listed in Table 2. The classification was performed based on the natural break classification approach [76]. This technique is a categorization method that classifies maps regarding their inherent characteristics, and identifies class breaks that minimize the within-class differences and maximize the between-class differences.

Soil Erosion Susceptibility Classes Analyzes
The map using the SVM model showed that the highest susceptibility class covered the second largest area (217 km 2 ; shown in yellow color) and the very high susceptibility class was the smallest in its areal coverage (123 km 2 ; shown in red color) (Figure 5a; Table 2). The results of the RF model showed that the very high class in soil erosion susceptibility also covered the largest area with 255 km 2 . In this model, the smallest area was related to low class (130 km 2 ; shown in dark green color) (Figure 5b). In the AdaBoost model, the very high susceptibility class had the largest area (202 km 2 ) too, and the smallest area was related to low class (194 km 2 ). The RF and AdaBoost models have a similar classification. The difference is that the moderate class (shown in bright green color) in the RF model had the third rank (182 km 2 ), whereas in the AdaBoost model it had the second rank (175 km 2 ).

Soil Erosion Susceptibility Classes Analyzes
The map using the SVM model showed that the highest susceptibility class covered the second largest area (217 km 2 ; shown in yellow color) and the very high susceptibility class was the smallest in its areal coverage (123 km 2 ; shown in red color) (Figure 5a; Table 2). The results of the RF model showed that the very high class in soil erosion susceptibility also covered the largest area with 255 km 2 . In this model, the smallest area was related to low class (130 km 2 ; shown in dark green color) (Figure 5b). In the AdaBoost model, the very high susceptibility class had the largest area (202 km 2 ) too, and the smallest area was related to low class (194 km 2 ). The RF and AdaBoost models have a similar classification. The difference is that the moderate class (shown in bright green color) in the RF model had the third rank (182 km 2 ), whereas in the AdaBoost model it had the second rank (175 km 2 ). Among the models, the RF model simulated the largest extent of very high-class area and the smallest low class rank for soil erosion susceptibility.

Soil Erosion Plots Analysis
The number of the erodible plots in each class for the study models are indicated in Table 3. The results showed that the RF model identified the greatest number of soil erosion points (40 plots) in Land 2020, 9, 368 10 of 18 the very high class. The AdaBoost model in the very high susceptibility class identified 33 erosion plots, whereas the SVM model had the least points (21 plots) in this class. In the RF and Adaboost models, with increasing erosion susceptibility, the number of erosion points increased. On the other hand, the results of the SVM model simulations showed more erosion points in the moderate (25 plots) and high susceptibility classes (28 plots) than in the very high susceptibility class (21 plots). This result indicates the lack of accuracy of the SVM model. The comparative analysis showed that the RF model had better results than the SVM and Adaboost models in terms of soil erosion locations determining (Table 3).

Validation of Soil Erosion Susceptibility Maps
The estimated erosion susceptibility maps were validated by the area under the curve (AUC) value of the ROC curve ( Figure 6). In the ROC curve, a plot of measured and calculated values is used to estimate the area under the curve and hence to assess the accuracy of estimation. Previous studies mentioned that the area under the ROC curve (AUC) is reliable in order to investigate the uncertainty in model estimations [58,76]. The ability and uncertainty of the SVM, RF, and AdaBoost models were assessed using the AUC value. An AUC value near the value of 1.0 shows a high level of accuracy, whereas an AUC value near 0.0 shows inaccuracy [23]. The denoted plots of different models showed that the RF model had the most estimation accuracy (AUC = 0.91). The AUC value in the SVM and AdaBoost models obtained 0.77 and 0.78, respectively. Also, the mean square error (MSE) was used to conduct performance assessment ( Table 4). The calculated values showed that the RF model had the least error (0.09), whereas this error in SVM and AdaBoost models obtained 0.14 and 0.13, respectively.  Also, the mean square error (MSE) was used to conduct performance assessment ( Table 4).
The calculated values showed that the RF model had the least error (0.09), whereas this error in SVM and AdaBoost models obtained 0.14 and 0.13, respectively.

Parameters Sensitivity Analysis
The assessment of the sensitivity of different parameters in modeling soil erosion susceptibility allows to explore the importance of model parameters for the simulation. Determining the appropriate conditioning inputs is significant for a sound generation of erosion susceptibility maps. [58,77]. In the developed erosion susceptibility map, the sensitivity of nine parameters were assessed by the Jackknife approach with a partial-derivative calculation. This method is common and very effective because of its simplicity and reduction of the estimator bias that arises when applying more complex frameworks [58]. The relative importance of the effective parameters on soil erosion in the study area is shown in Figure 7. In addition, for the investigation of the sensitivity of parameters, the IncNodePurity (Figure 8) was calculated for each parameter in the RF model. The parameters with high IncNodePurity values are the most important or show high sensitivities [78]. Figure 8 shows the parameters sorted decreasingly by the IncNodePurity values in the RF model.  [78]. Figure 8 shows the parameters sorted decreasingly by the IncNodePurity values in the RF model.

Spatial Variability of Soil Erosion
The soil erosion maps ( Figure 5) showed that soil erosion processes in the Kozetopraghi watershed have a certain spatial variability. Some areas located in the lower and central part of the watershed show the highest erosion rates. On the contrary, the areas located in the mountainous terrain show lower soil erosion rates due to the forest and rangeland vegetation cover. Previous studies in the region found that the whole watershed was affected by soil erosion [79], but our research enables a spatial differentiation based on the rainfall simulation experiments. The results showed that in each of the three models used in this research, the soils that have the highest susceptibility to soil erosion and classified as very high are located in the central part of the watershed

Spatial Variability of Soil Erosion
The soil erosion maps ( Figure 5) showed that soil erosion processes in the Kozetopraghi watershed have a certain spatial variability. Some areas located in the lower and central part of the watershed show the highest erosion rates. On the contrary, the areas located in the mountainous terrain show lower soil erosion rates due to the forest and rangeland vegetation cover. Previous studies in the region found that the whole watershed was affected by soil erosion [79], but our research enables a spatial differentiation based on the rainfall simulation experiments. The results showed that in each of the three models used in this research, the soils that have the highest susceptibility to soil erosion and classified as very high are located in the central part of the watershed and near to the watershed outlet. The periferian areas in the mountain terrain of the watershed, where forest and rangelands are present, are the areas with the lowest soil losses.

Soil Effect on Soil Erosion Rates
The soils of these central and outlet parts of the watershed belong to the hydrologic soil group C that are mainly used by the agriculture. Agricultural runoff has been considered as a source of sediments and nutrients in many parts of the world. Agricultural management can degrade soils and induce low infiltration rates [80], which causes higher runoff and a high efficiency in the connectivity of the flows and water [44,81]. The C group shows low infiltration potential that leads to high amounts of surface runoff and increased soil erosion. This was found from the rainfall simulation experiments on the bare soils and this was supported by other studies in different regions of the world [4]. Low infiltration rates and high surface runoff induce soil degradation and further reduce infiltration and increase erosion rates [43,63,82]. The central and lower part of the watershed are dominated by the agricultural areas, where the farmers grow corn, wheat, and barley. These areas are mainly dry lands that are affected by intense ploughing. Until now, there is a lack of any soil management practices (such as conservation tillage) to reduce the soil losses and restore the soil fertility. High erosion rates on rain fed agricultural land were also observed in other regions of the world such as in vineyards [83], persimmon plantations [84] and citrus plantations [85].

The Impact of Land Use Effect on Soil Erosion Rates
Agricultural fields cover the largest part of the study sites and are also shown to be the most productive in terms of sediment generation. Poor land management and the related continuous soil disturbances cause the detachment of soil particles that are transported to the water ways.
The most widespread crops in the study watershed are barley, corn, and wheat. All the three crops are sown in April and harvested in June. Therefore, the vegetation cover is low during most of the year, leaving the soils prone to erosive rainfalls and induce the formation of rills and gullies. Moreover, the most widespread parent material in the cultivated land is NGC3 (conglomerate with lime, sandstone, and tuff) which contributed the highest soil erosion rates at the study area, again located in the center and at the outlet of the watershed. The mountain areas located at the outskirt of the watershed show forest and rangeland areas that protect the soils against the eroding influence of the rainfall, and moreover, the parent material is less erodible.

Model Outputs and Performance
The results of the AUC curve, MSE criteria, and the number of eroded points in high and very high susceptibility classes indicate that the RF model had a better performance than the SVM and Adaboost models. The ability and accuracy of the RF model in mapping soil erosion susceptibility is mentioned in other studies [86,87]. The three models show different results when it comes to identifying the extent of high and very high soil erosion susceptibilities. The RF model, as evaluated by the AUC, was shown to be superior. The MSE analysis also shows RF to be better than the other models.

Model Parameter Sensitivity
The sensitivity of the nine model input parameters was identified using Jackknife and the IncNodePurity tests. The results of the analysis to find relative importance of parameters in erosion susceptibility modeling showed that the slope parameter is the most important one. The slope parameter has a significant effect in accelerating runoff for transporting soil particles [15,17,58].
We consider our rainfall simulator-based experiment to identify areas of erosion hot spots as more reliable than other more conceptual approaches such as the Universal Soil Loss Equation.
The results reported in this study were based on a randomly selected soil erosion rates from runoff plots using controlled rainfall rates from the simulator. These results, when linked to well tested data mining algorithms, show a higher reliability than other empirical and semi-empirical approaches. The results from this study can be used to support the implementation of soil erosion conservation measures.
This erosion susceptibility map is model-based and depends partly on assumptions. Since models are approximations, the results need to be verified and rechecked using other approaches. Studies in other areas of the world have used these tools and methods for identifying erosion hot spots. Future studies on evaluating soil erosion using different rainfall rates and field management conditions is recommended.

Conclusions
Mapping soil erosion susceptibility is an important area of research to improve catchment management and achieve effective soil conservation. This study employed rainfall simulator experiments to develop a soil erosion inventory map. AdaBoost, SVM, and RF models were used to produce a soil erosion susceptibility map. Jackknife and IncNodePurity were used to analyze the sensitivity of the methods, and we found that slope, SPI, and altitude were the most important parameters to map soil erosion. The surface soil erosion susceptibility map obtained by using the RF model with a high accuracy indicates that the areas in the center or near the outlet of the catchment show higher soil erosion rates due to the agricultural activities. Those areas are affected by high erosion rates due to the impact of new agricultural land, inadequate tillage practices, and continuous soil disturbances. The mountainous regions located in the hilly terrain showed low soil erosion rates due to the natural vegetation cover. In addition, the approach in this study can be used as a guide for future research to analyze the soil erosion vulnerability to land use changes in semi-arid areas. Moreover, we hope that our work will foster new approaches of machine learning such as deep learning and optimization techniques that can improve the prediction of erosion sensitive areas.