Machine Learning-Based Gully Erosion Susceptibility Mapping: A Case Study of Eastern India

Gully erosion is a form of natural disaster and one of the land loss mechanisms causing severe problems worldwide. This study aims to delineate the areas with the most severe gully erosion susceptibility (GES) using the machine learning techniques Random Forest (RF), Gradient Boosted Regression Tree (GBRT), Naïve Bayes Tree (NBT), and Tree Ensemble (TE). The gully inventory map (GIM) consists of 120 gullies. Of the 120 gullies, 84 gullies (70%) were used for training and 36 gullies (30%) were used to validate the models. Fourteen gully conditioning factors (GCFs) were used for GES modeling and the relationships between the GCFs and gully erosion was assessed using the weight-of-evidence (WofE) model. The GES maps were prepared using RF, GBRT, NBT, and TE and were validated using area under the receiver operating characteristic (AUROC) curve, the seed cell area index (SCAI) and five statistical measures including precision (PPV), false discovery rate (FDR), accuracy, mean absolute error (MAE), and root mean squared error (RMSE). Nearly 7% of the basin has high to very high susceptibility for gully erosion. Validation results proved the excellent ability of these models to predict the GES. Of the analyzed models, the RF (AUROC = 0.96, PPV = 1.00, FDR = 0.00, accuracy = 0.87, MAE = 0.11, RMSE = 0.19 for validation dataset) is accurate enough for modeling and better suited for GES modeling than the other models. Therefore, the RF model can be used to model the GES areas not only in this river basin but also in other areas with the same geo-environmental conditions.


Introduction
One of the major problems in modern societies in the last decade is the degradation of natural resources, especially soil and water [1]. The rapid population growth and careless use of natural resources lead to soil and water degradation, which in turn threatens human lives and property [2,3].
Soil erosion by water, such as in the form of gully erosion, is one of the most common soil degradation processes worldwide [4,5]. Gully erosion typically features a deep channel eroded by running surface water that removes and transports the eroded surface soil particles and other materials [6]. Gully erosion causes various environmental problems like desertification, inundation, The Hinglo River basin geographically extends from 23°42´7.09´´ N to 24°0´56.78´´ N latitude and 86°59´32.68´´ E to 87°23´31.91´´ E longitude ( Figure 1). The total catchment area is 442.95 km 2 . The Hinglo River basin, which is a major tributary of the Ajay river basin, encompasses part of the Jamtra district and the Birbhum district in India. The length of the primary river is 66 km. Physiographically, most of this basin is part of the Chota Nagpur plateau fringe region. The study area is subjected to the Indian monsoon climate, with an average annual rainfall of 1316-1361 mm. This basin encompasses five geological formations, namely granite-gneiss, barker, ironstone shale, newer alluvium, and quartzite [33]. The alluvium thickness of the eastern part of the basin varies between 12 and 20 m [34]. The depth of the groundwater table of this region varies between 5 and 10 m b.g.l [35]. The area has seven soil texture classes, namely sand, clay, clay loam, haplustepts, sandy loam, loam, and fine loamy [36], whereby haplustepts covers most of the basin. The maximum elevation of the study is 284 m a.s.l. The north and middle-western parts of this study area are subjected to gully erosion [37]. A precise gully erosion susceptibility map is essential for this region to manage the erosion-prone areas.

Methodology
Different types of data were collected from various sources to fulfill the intent of our research ( Table 1). The gully locations were identified from field investigation using a handheld global positioning system (GPS) and Google Earth image. Table 1 details the data used in this study.

Methodology
Different types of data were collected from various sources to fulfill the intent of our research ( Table 1). The gully locations were identified from field investigation using a handheld global positioning system (GPS) and Google Earth image. Table 1 details the data used in this study.
The present study was carried out in the following five main steps ( Figure 2): (1) Gully inventory map (GIM) and GCFs data layers were prepared; (2) a multi-collinearity analysis was carried out to select the gully erosion conditioning factors (GCFs); (3) the weight-of-evidence (WofE) method was used to examine the relationship between gully erosion and GCFs; (4) the machine learning models RF, NBT, GBRT, and TE models were applied to prepare the GESMs, and (5) the performance of each of the ensemble models was evaluated using the area under the receiver operating characteristic (AUROC) and SCAI methods and a few statistical measures.

Preparing the Gully Inventory Map (GIM)
The GIM is essential for preparing the GESMs by various predictive models [27] and was considered as the dependent variable in this study area. To prepare the GIM, first, gully locations and dimensions were measured using the remotely sensed data through Google Earth. Then, a field investigation was conducted in the study area to update and ground truth check the data. Gully locations were geolocated with handheld GPS. A total of 120 gullies were identified in the study area. Of the 120 gullies, 84 (70%) gullies were randomly selected for model preparation, and the remaining 36 (30%) gullies were used for model validation ( Figure 3) based on previous literature [20,23,24]. Representative gully images are shown in Figure 1.

Preparing the Gully Conditioning Factors (GCFs)
Selecting geo-environmental factors is an important step in preparing the GESMs using various methods [11]. In this study, 14 GCFs, namely elevation, slope, aspect, monsoonal rainfall, soil type, geology, LULC, NDVI, distance to river, distance to lineament, Lof, TWI, STI, STI were used for spatial gully erosion modeling while considering the previous literature and multi-collinearity analysis.  considered as the dependent variable in this study area. To prepare the GIM, first, gully locations and dimensions were measured using the remotely sensed data through Google Earth. Then, a field investigation was conducted in the study area to update and ground truth check the data. Gully locations were geolocated with handheld GPS. A total of 120 gullies were identified in the study area. Of the 120 gullies, 84 (70%) gullies were randomly selected for model preparation, and the remaining 36 (30%) gullies were used for model validation (Figure 3) based on previous literature [20,23,24]. Representative gully images are shown in Figure 1. Selecting geo-environmental factors is an important step in preparing the GESMs using various methods [11]. In this study, 14 GCFs, namely elevation, slope, aspect, monsoonal rainfall, soil type, geology, LULC, NDVI, distance to river, distance to lineament, Lof, TWI, STI, STI were used for spatial gully erosion modeling while considering the previous literature and multi-collinearity analysis.
where "As" is the area of a specific catchment; "B" is the slope gradient in degrees; m is constant, i.e., 0.4, "n" is constant, i.e., 0.0896. The STI was classified into the five classes of 0-1. 75 Figure 4d). The SPI reflects the discharge, carrying capacity, and runoff erosion power, which determines the gully erosion susceptibility [22,39,40]. The SPI was derived from DEM using the following Equation (2).
where A S is the upstream contributing area and β is slope gradient (in degrees runoff erosion power, which determines the gully erosion susceptibility [22,[39][40]. The SPI was derived from DEM using the following Equation (2).
where AS is the upstream contributing area and β is slope gradient (in degrees). The SPI was   The study area's average monsoonal rainfall map was prepared using the kriging method based on the rainfall data of the last three years measured at different stations. The monsoonal rainfall was categorized into the five sub-classes of 738-748, 748-757, 757-767, 767-781, 781-797 ( Figure 5a). The TWI was defined by Beven and Kirkby [41]. It is commonly used to evaluate a region's hydrological features [42]. The TWI is considered to be an important gully erosion determining factor [39]. The TWI was derived from DEM imagery using the following Equation (3).
where AS is the upstream contributing area and β is the slope gradient (in degrees). The TWI was classified into five sub-categories, namely 2.92-7.  The study area's average monsoonal rainfall map was prepared using the kriging method based on the rainfall data of the last three years measured at different stations. The monsoonal rainfall was categorized into the five sub-classes of 738-748, 748-757, 757-767, 767-781, 781-797 ( Figure 5a). The TWI was defined by Beven and Kirkby [41]. It is commonly used to evaluate a region's hydrological features [42]. The TWI is considered to be an important gully erosion determining factor [39]. The TWI was derived from DEM imagery using the following Equation (3).
where A S is the upstream contributing area and β is the slope gradient (in degrees  Figure 5b). The distance from the river map was prepared by applying the Euclidian distance buffer (EDB) tool in GIS (Figure 5c). It was categorized into five sub-classes, namely 0-0.18 km, 0.18-0.42 km, 0.42-0.73 km, 0.73-1.17 km, and 1.17-2.10 km distance (Figure 5c).
Sensors 2020, 20, x FOR PEER REVIEW 9 of 28 The LULC map was extracted from Landsat 8OLI/TIRS imagery based on the maximum likelihood classification method in GIS (Figure 6a). Water bodies, fallow land, agricultural land, settlement, and natural vegetation are the land use types found in the basin (Figure 6a). Using the digitization process in GIS environment, the geological map was generated for the study area ( Figure  6b). Geologically, the study area consists of five geological formations, i.e., iron shale, barakar formation (comprises several meters of thick pebbly or conglomeratic succeeded by heterolithic crossstratified sandstone-mudstone-carbonaceous shale-coal beds), quartzite, granite-gneiss, and newer alluvium (Figure 6b). The NDVI map was prepared using the Landsat 8OLI/TIRS imagery in a GIS environment (Figure 6c) with the help of Equation (5).
where IR is the electromagnetic spectrum's infrared portion, and R is the electromagnetic spectrum's red portion. The NDVI was classified into five classes, namely −0.15 to 0.16, 0.16 to 0.20, 0.20 to 0.23, 0.23 to 0.28, and 0.28 to 0.43 ( Figure 6c). The soil type map was prepared using the district's registered soil type map in GIS (Figure 6d). Pedologically, the study area is composed of seven soil texture classes namely clay, fine loamy mixed (Haplustepts), clay loam, loam, fine loamy mixed type The measured length of the overland flow (Lof) was introduced by Horton [43] and is calculated using Equation (4).
where Dd is the drainage density. Drainage density is the total length of stream per unit area.  (Figure 5d). The LULC map was extracted from Landsat 8OLI/TIRS imagery based on the maximum likelihood classification method in GIS (Figure 6a). Water bodies, fallow land, agricultural land, settlement, and natural vegetation are the land use types found in the basin (Figure 6a). Using the digitization process in GIS environment, the geological map was generated for the study area ( Figure 6b). Geologically, the study area consists of five geological formations, i.e., iron shale, barakar formation (comprises several meters of thick pebbly or conglomeratic succeeded by heterolithic cross-stratified Sensors 2020, 20, 1313 9 of 25 sandstone-mudstone-carbonaceous shale-coal beds), quartzite, granite-gneiss, and newer alluvium ( Figure 6b). The NDVI map was prepared using the Landsat 8OLI/TIRS imagery in a GIS environment ( Figure 6c) with the help of Equation (5).
where IR is the electromagnetic spectrum's infrared portion, and R is the electromagnetic spectrum's red portion. The NDVI was classified into five classes, namely −0.15 to 0.16, 0.16 to 0.20, 0.20 to 0.23, 0.23 to 0.28, and 0.28 to 0.43 ( Figure 6c). The soil type map was prepared using the district's registered soil type map in GIS ( Figure 6d). Pedologically, the study area is composed of seven soil texture classes namely clay, fine loamy mixed (Haplustepts), clay loam, loam, fine loamy mixed type palustepts, sand and sandy loam ( Figure 6d).
Sensors 2020, 20, x FOR PEER REVIEW 10 of 28 In this study, the elevation, slope, rainfall, distance from river, distance from lineament, NDVI, TWI, SPI, and STI were used as numerical variables and reclassified into five sub-categories using the NBM in GIS. The aspect, geology, soil type, land use/land cover were used as categorical variables. The presence and absence of gullies were used as target variables.

Multi-Collinearity Analysis of Effective Factors
The multi-collinearity test is an important way to judge the linear dependency among the selected independent factors in the statistical modeling [44]. In the case of the machine learning models, this technique needs to be used for better results [45][46][47][48][49][50][51][52]. Researchers have applied multicollinearity analysis for gully erosion susceptibility mapping [53], groundwater potentiality mapping [54], landslide susceptibility mapping [48] etc. The multi-collinearity was tested using the tolerance (TOL) and variance inflation factor (VIF). The TOL was calculated using Equation (6), where R2 is obtained by the regression of each variable for the remaining variables in the multivariate regression [55]. In this study, the elevation, slope, rainfall, distance from river, distance from lineament, NDVI, TWI, SPI, and STI were used as numerical variables and reclassified into five sub-categories using the NBM in GIS. The aspect, geology, soil type, land use/land cover were used as categorical variables. The presence and absence of gullies were used as target variables.

Multi-Collinearity Analysis of Effective Factors
The multi-collinearity test is an important way to judge the linear dependency among the selected independent factors in the statistical modeling [44]. In the case of the machine learning models, this technique needs to be used for better results [45][46][47][48][49][50][51][52]. Researchers have applied multi-collinearity analysis for gully erosion susceptibility mapping [53], groundwater potentiality mapping [54], landslide susceptibility mapping [48] etc. The multi-collinearity was tested using the tolerance (TOL) and variance inflation factor (VIF). The TOL was calculated using Equation (6), where R2 is obtained by the regression of each variable for the remaining variables in the multivariate regression [55].
where R 2 j is the regression value of explanatory j on all other independent variables. A tolerance of less than 0.10 and a VIF value of 10 and above indicate a multi-collinearity problem [56].

Assessment of The Relationship between Gully Erosion and Effective Factors using Weight-of-Evidence (WofE) Model
The WofE is an important bivariate statistical method, which calculates the relative importance of effective factors by statistical means using the log-linear form of the Bayesian probability model [57]. In this analysis, the WofE model was used to demonstrate the relationship between gully occurrence and gully conditioning factors [51] obtained using the regression of each variable [55].
where B pix1 is the number of pixels of gully erosion in a particular class, B pix2 is the total number of pixels of gully erosion on a map, B pix3 is the number of pixels in a specific class of GCF, and B pix4 is the total number of pixels in a map. A is positive weight X + i indicates the existence of a gully pixel and a positive relationship between the presence of the gully pixel and GCF and vice versa. Finally, the weight was calculated using Equation (10) [26,58].
where, F is the weight, and P is the differential weight between positive and negative. P is negative for a negative correlation and positive for a positive correlation between GCFs and gully erosion [59]. Q (P) is the standard deviation (SD) of the weight contrast.

Random Forest (RF) Model
Decision trees were used to generate subset training datasets for the preparation of the final model based on the random sampling method [60]. The T (number of trees) and m (number of variables) are the important features of the RF model and are defined by the user. Micheletti et al. [60] concluded that a calibration set is not essential for defining the parameters. Calle and Urrea [61] noted that the RF model could be used for analyzing the importance of the factors. In this analysis, the RF model consists of the two trees (presence and absence of gullies) that are evaluated by the 14 random independent variables. For the RF algorithm, the generalization error is measured as follows [62].
where x and y represent the contributing factors to gully erosion displaying probabilities over x and the margin function, and the indicator function are represented by y space, mg and I (*) [63].

Naïve Bayes Tree (NBT) Model
Kohavi [64] suggested the use of the Naïve Bayes tree (NBT) method, a hybrid algorithm of decision tree and Naïve Bayes. The NBT model uses very little training data to evaluate the most important modeling and classification parameters [65]. The NBT was used as the reference classifier to evaluate the vulnerability to gully erosion in an ensemble framework [66]. The NBT operates as follows [67].
where pp(t i ) is the earlier variables output probability t i = (1,0), σ and ε indicate the average and SD of r i respectively 2.6.3. Gradient Boosting Regression Tree (GBRT) The GBRT was introduced by Friedman [32] and is an important machine learning technique. Boosting is a popular learning approach that was specifically designed to overcome categorization issues but has also been effectively extended to regression. The impetus for boosting is to unite a powerful committee with the output of many weekly learners [68]. The works of Hastie et al. [68], Ridgeway [69] and Scikit-learn [70] provide an in-depth description of gradient boosting and gradient boosted regression trees (Algorithm 1).

Tree Ensemble (TE) Model
The Tree Ensemble method combines various decision tree models to produce a more suitable and accurate predictive model than using a single tree model. The TE method consists of two decision tree methods, such as bagging and boosting. The random forest model uses the bagging and the gradient boosted regression tree model uses the boosting method. Therefore, the TE method is the sum of the ensemble of all tree models [71]. A sum-ensemble of trees model f : R 2 → R consists of a  (13) where pp(ti) is the earlier variables output probability ti = (1,0), σ and ε indicate the average and SD of ri respectively 2.6.3. Gradient Boosting Regression Tree (GBRT) The GBRT was introduced by Friedman [32] and is an important machine learning technique.
Boosting is a popular learning approach that was specifically designed to overcome categorization issues but has also been effectively extended to regression. The impetus for boosting is to unite a powerful committee with the output of many weekly learners [68]. The works of Hastie et al. [68], Ridgeway [69] and Scikit-learn [70] provide an in-depth description of gradient boosting and gradient boosted regression trees (Algorithm 1).

Tree Ensemble (TE) Model
The Tree Ensemble method combines various decision tree models to produce a more suitable and accurate predictive model than using a single tree model. The TE method consists of two decision tree methods, such as bagging and boosting. The random forest model uses the bagging and the gradient boosted regression tree model uses the boosting method. Therefore, the TE method is the sum of the ensemble of all tree models [71]. A sum-ensemble of trees model  (13) where pp(ti) is the earlier variables output probability ti = (1,0), σ and ε indicate the average and SD of ri respectively 2.6.3. Gradient Boosting Regression Tree (GBRT) The GBRT was introduced by Friedman [32] and is an important machine learning technique.
Boosting is a popular learning approach that was specifically designed to overcome categorization issues but has also been effectively extended to regression. The impetus for boosting is to unite a powerful committee with the output of many weekly learners [68]. The works of Hastie et al. [68], Ridgeway [69] and Scikit-learn [70] provide an in-depth description of gradient boosting and gradient boosted regression trees (Algorithm 1).

Tree Ensemble (TE) Model
The Tree Ensemble method combines various decision tree models to produce a more suitable and accurate predictive model than using a single tree model. The TE method consists of two decision tree methods, such as bagging and boosting. The random forest model uses the bagging and the gradient boosted regression tree model uses the boosting method. Therefore, the TE method is the sum of the ensemble of all tree models [71]. A sum-ensemble of trees model consists of a Ʈ of regression trees. Keeping the generality unchanged, a regression tree ∈ Ʈ is a binary tree where each internal node . n t nodes ∈ bears a rational predicate over the feature variables. The prediction of tree T is the leaf value of the prediction path. Finally, the signed margin prediction f(x) of the ensemble model is the sum of predictions of all individual trees, and the predicted label is acquired by the threshold value generally set at zero: ( ) 1 In this study, we consider the case of single-feature threshold predicates of the form i x T < or equivalently i x T > where 0 i n ≤ < and T ∈ fixed model parameters. This restriction keeps out oblique decision trees where predicates concurrently engage numerous feature variables. However, is a binary tree where each internal node n ∈ t.nodes bears a rational predicate over the feature variables. The prediction of tree T is the leaf value of the prediction path. Finally, the signed margin prediction f(x) of the ensemble model is the sum of predictions of all individual trees, and the predicted label is acquired by the threshold value generally set at zero: c(x) = 1 ⇔ f (x) > 0 .
In this study, we consider the case of single-feature threshold predicates of the form x i < T or equivalently x i > T where 0 ≤ i < n and T ∈ R fixed model parameters. This restriction keeps out oblique decision trees where predicates concurrently engage numerous feature variables. However, we note that oblique trees are seldom used in ensemble classifiers, partially because of their relatively high construction cost and complexity [72]. Kantchelian et al. [71] have used the equation and technique of the TE model for evasion and hardening of tree ensemble classifiers. Xiao et al. [73] have also used the tree ensemble classifier technique for identifying the different transportation nodes.

Validation Methods
In the present study, we used five statistical measures, namely precision (PPV), false discovery rate (FDR), accuracy, mean absolute error (MAE), and root mean squared error (RMSE) to evaluate the robustness of the used machine learning ensemble models. PPV is the proportion of units with an expected positive outcome that is positive for the true condition (Equation (14)). FDR is the proportion of the units with a predicted positive condition for which the true condition is negative (Equation (15)). The accuracy represents the maximum proportion of accurately estimated or defined units (Equation (16)). The MAE (Equation (17)) and RMSE (Equation (18)) were used to measure the variation between observed and predicted data. The robustness of models is good when PPV and accuracy are high, and FDR, MAE, and MRSE are low [74][75][76].
A standard tool for evaluating the model performance is area under the receiver operating characteristic (AUROC) curve [77,78]. ROC is plotted on the x-and y-axis based on the sensitivity and 10-specificity. The model output was assessed using the AUC (area under the curve) of ROC (Equation (19)) [77,78]. In the previous studies [79], the mathematical theory and equation of this approach are fully described. The sensitivity (i.e., probability detection) addresses the question of which part of the detected gullies is labeled accurately and its optimal value is 1 [80]. The specificity (i.e., negative predictive value) addresses the question of which part of the non-gullies is categorized correctly, and its optimal value is 1. The AUC values below 0.6, 0.6-0.7, 07-0.8, 0.8-0.9, and above 0.9 indicate a bad, medium, decent, very good, and excellent quality of the model. The training data set's ROC indicates the model's success rate and tests the model's suitability [81]. The test dataset's ROC reveals the model's predictive value and shows how good or bad the predictive model [79] is. The seed cell area index (SCAI) is an important method for judging the robustness of the models [24,82] and was used in this study.
where A is the true positive rate, B is the false positive, C is the false negative, D is the true negative.

Analysis of Muti-Collinearity of GCFs
The multi-collinearity indicates the intra-correlation among the gully conditioning factors. The multi-collinearity test was carried out by the SPSS software. The outcomes of the multi-collinearity between the 14 GCFs are presented in Table 2. The results of the multi-collinearity show that the tolerance and VIF values of gully conditioning factors are less than 0.1 and 4.5, indicating no multi-collinearity problems among the gully conditioning factors, which means that they can be used for predicting the gully erosion.

Analysis of Factor Importance using the Weight-of-Evidence (WofE) Model
The statistical calculation of the WofE model is shown in Table 3. The WofE values of independent factors represent their effect on gully development. The topographic factors like elevation, slope, and aspect are strongly related to gully erosion. When the WofE value is >1 the control of the effective factor on gully occurrence is high and vice versa. A WofE value of 13.95 was found in the elevation factor sub-class of >161m, which indicates a strong positive correlation between elevation and gully occurrence. The 0.96-1.83 slope subgroup has the maximum WofE value (6.101), which indicates a high positive correlation with the occurrence of gully erosion. In terms of the slope aspect, a high probability of gully occurrence is suggested by the north-east aspect with the WofE value of 4.8. The rainfall class of 781-797 mm with a value of 8.016 shows a strong relationship with gully erosion. The distance from river and lineament classes of 0-0.18 km and 0.19-0.43 km, with the WofE values of 4.054 and 3.703, demonstrate a strong inverse relationship with the occurrence of gullies. In the case of the geology and soil type, most of the gullies were found in the granite-gneiss geological formation and the fine loamy, mixed hyperthermic haplustepts soil type class. The WofE values of these classes are 3.408 and 11.625, indicating a high probability of gully occurrence. Generally, the LULC types strongly determine the development of gullies in a region [14].

Spatial Gully Erosion Susceptibility Analysis
The gully erosion models were built using ensemble machine learning algorithm-based training datasets to predict the spatial susceptibility to gully erosion. The gully erosion susceptibility (GES) indices produced by the machine learning techniques RF, GBRT, NBTree, and TE have been classified into four classes with respect to gully erosion susceptibility, namely low, medium, high, and very high, based on the natural break classification method. The Figure 7a-d show the GESMs produced by the four ensemble machine learning frameworks.
The GESM produced by the RF model ( Figure 7a) found that 2.29% (10.15 km 2 ) of the basin area has a very high GES. The high and moderate GES zones cover 4.59% (20.34 km 2 ) and 15.44% (68.38 km 2 ) of the watershed, respectively (Table 4), while the remaining 344.07 km 2 (77.68%) falls into the low gully erosion susceptibility class. The relative importance of the gully conditioning factors has also been assessed using the random forest method. According to this model, the elevation and rainfall are the most important contributing factors for gully erosion (Table 5), while the geology plays less of a role for gully erosion.
The GESM generated by the NBT (Figure 7d) shows that 70.45% (312.05 km 2 ) of the study area has a low GES. The high and very high GES classes make up 11.13% and 5.21% of the watershed, respectively (Table 4 and Figure 8). The medium susceptibility class covers 58.49 km 2 (13.20%) of the basin.
Based on the results of GBRT (Figure 7b), the research area has 315.85 km 2 (71.31%) that falls into the low susceptibility class, followed by 78.47 km 2 (17.71%) in the medium susceptibility class, 35.60 km 2 (8.04%) in the high susceptibility class and 13.03 km 2 (2.94%) in the very high GES classes ( Table 4) In case of the TN model (Figure 7c), the results show that the study area has 338.66 km 2 (76.46%) area in the low susceptibility class, followed by 71.76 km 2 (16.20%) in the medium susceptibility class, 21.67 km 2 (4.89%) in the high susceptibility class, and 10.86 km 2 (2.45%) in the very high GES class out of the total area of 442.95 km 2 (Table 4, Figure 8a,b). The relative importance of the GCFs was also assessed by the GBRT as like RF model. Similarly, elevation and rainfall are the most important factors while the geology is the least important factor contributing to gully occurrence (Table 5).
Fallow and barren land are extensively open to soil depletion by flowing water because of the absence of vegetation cover. Despite the complexities of gully formation, the main reason for it in this region is the intense monsoonal rainwater runoff. The region experiences a short rainy season with high-intensity precipitation events after hot and dry summers, which are ideal gully forming conditions.
Based on the results of GBRT (Figure 7b), the research area has 315.85 km 2 (71.31%) that falls into the low susceptibility class, followed by 78.47 km 2 (17.71%) in the medium susceptibility class, 35.60 km 2 (8.04 %) in the high susceptibility class and 13.03 km2 (2.94 %) in the very high GES classes ( Table  4) In case of the TN model (Figure 7c), the results show that the study area has 338.66 km 2 (76.46%) area in the low susceptibility class, followed by 71.76 km 2 (16.20%) in the medium susceptibility class, 21.67 km 2 (4.89%) in the high susceptibility class, and 10.86 km 2 (2.45 %) in the very high GES class out of the total area of 442.95 km 2 (Table 4, Figure 8 a,b). The relative importance of the GCFs was also assessed by the GBRT as like RF model. Similarly, elevation and rainfall are the most important factors while the geology is the least important factor contributing to gully occurrence (Table 5).
Fallow and barren land are extensively open to soil depletion by flowing water because of the absence of vegetation cover. Despite the complexities of gully formation, the main reason for it in this region is the intense monsoonal rainwater runoff. The region experiences a short rainy season with high-intensity precipitation events after hot and dry summers, which are ideal gully forming conditions.

Validation of Models
The validation of the GESMs using AUROC, SCAI, and eleven statistical measures are shown in Figure 9, Figure 10 and Table 6. In this research, we used some validation techniques that are rarely used in hazard modeling. To judge the capabilities of the models, we considered both the training and testing datasets. We found a good similarity between the data collected during fieldwork and the predicted results. Some field photos of gullies are presented in the methodology section. The success rates and predictive rates of the RF, TE, GBRT, and NBT models are 0.94, 0.90, 0.84, and 0.82 and 0.96, 0.91, 0.88, and 0.84, respectively. The AUCs of the AUROC indicate a very good to excellent prediction accuracy of the models for the GESMs (Table 6). Sensors 2020, 20, x FOR PEER REVIEW 20 of 28

Discussion
Gully erosion risk assessment based on GESMs and effective geo-environmental factors is the first step for managing gully erosion. Although different approaches and procedures for the spatial prediction of environmental hazards have been developed and implemented around the world, the aims of all these methods are the same. A controversial issue among environmental researchers is the preparation of a logical and reliable susceptibility map of natural hazards. In the past decade, machine learning techniques are being developed. The important applications of the machine learning techniques are prediction, categorization, clustering, and elaboration of data [83,84]. Different sources were used to prepare the input dataset. Because some of the factors considered in the GESM were derived from a digital elevation model (DEM), the resolution of the DEM greatly affects the precision of the results [85,86]. In this study, we used RF, GBRT, NBT, and TE tree-based machine learning algorithms for producing the gully erosion susceptibility maps based on training and validation datasets and 14 GCFs. These factors were tested for collinearity by TOL and VIF. The results indicate that no GCF has a multi-collinearity problem. The outcomes of the WofE showed that the effective parameters and gully erosion datasets have a strong positive correlation. The positive values of the effective parameters indicate a strong correlation with the probability of gully erosion.

Discussion
Gully erosion risk assessment based on GESMs and effective geo-environmental factors is the first step for managing gully erosion. Although different approaches and procedures for the spatial prediction of environmental hazards have been developed and implemented around the world, the aims of all these methods are the same. A controversial issue among environmental researchers is the preparation of a logical and reliable susceptibility map of natural hazards. In the past decade, machine learning techniques are being developed. The important applications of the machine learning techniques are prediction, categorization, clustering, and elaboration of data [83,84]. Different sources were used to prepare the input dataset. Because some of the factors considered in the GESM were derived from a digital elevation model (DEM), the resolution of the DEM greatly affects the precision of the results [85,86]. In this study, we used RF, GBRT, NBT, and TE tree-based machine learning algorithms for producing the gully erosion susceptibility maps based on training and validation datasets and 14 GCFs. These factors were tested for collinearity by TOL and VIF. The results indicate that no GCF has a multi-collinearity problem. The outcomes of the WofE showed that the effective parameters and gully erosion datasets have a strong positive correlation. The positive values of the effective parameters indicate a strong correlation with the probability of gully erosion. The SCAI values decrease from low susceptibility classes to very high susceptibility classes, which indicates the more accurate and significant results ( Table 6). The results of the seed cell area index (SCAI) for the very high susceptibility classes are 0.01 (RF), 0.03 (GBRT), 0.04 (NBT), and 0.01 (TE), which indicates that these are very good models ( Table 6). All the machine learning methods used in this study are well suited for modeling the gully erosion susceptibility. According to the AUROC curve, the SCAI and all the statistical measures, the RF model is the most accurate and robust model for gully erosion prediction.

Discussion
Gully erosion risk assessment based on GESMs and effective geo-environmental factors is the first step for managing gully erosion. Although different approaches and procedures for the spatial prediction of environmental hazards have been developed and implemented around the world, the aims of all these methods are the same. A controversial issue among environmental researchers is the preparation of a logical and reliable susceptibility map of natural hazards. In the past decade, machine learning techniques are being developed. The important applications of the machine learning techniques are prediction, categorization, clustering, and elaboration of data [83,84]. Different sources were used to prepare the input dataset. Because some of the factors considered in the GESM were derived from a digital elevation model (DEM), the resolution of the DEM greatly affects the precision of the results [85,86]. In this study, we used RF, GBRT, NBT, and TE tree-based machine learning algorithms for producing the gully erosion susceptibility maps based on training and validation datasets and 14 GCFs. These factors were tested for collinearity by TOL and VIF. The results indicate that no GCF has a multi-collinearity problem. The outcomes of the WofE showed that the effective parameters and gully erosion datasets have a strong positive correlation. The positive values of the effective parameters indicate a strong correlation with the probability of gully erosion. As per the RF and GBRT, the most effective factors are elevation, rainfall, NDVI, LULC, and slope, while the geology, soil type and distance from the river have little control over gully erosion. In this basin, the geology, and soil type are almost uniform, which may be the cause of them having less impact on gully erosion. The GESMs based on machine learning ensemble techniques, namely the RF, NBTree, GBRT, and TE models, were created using GIS and R programming language. In the upper parts of Hinglo River basin, we identified a very high susceptibility class of GES. Geologically, the upper catchment consists of the granite-gneiss geological formation. The soil type of the upper catchment is the fine loamy mixed type soil textural. Topographically, the upper catchment is rugged and badland topography. The study area covers the two topographical regions of the Chhoto nagpur plateau and the Rar lateritic region.
The AUROC, SCAI, and five statistical measures were used for validating the GESMs produced by the selected machine learning techniques and showed excellent accuracy in the prediction of gully erosion. For a number of reasons, it is not possible to completely eradicate or avoid some causes of errors, such as that the gully samples were chosen in an area where the gully erosion area is small in comparison to the non-gully area. We divided the sample data in a 70:30 ratio based on the suggestions in previous literature without testing the sample accuracy. A different ratio may yield better results, and that should be the subject of our future research. Noise in the selected gully conditioning factor data exists after the collinearity test and need other methods need to be applied to eradicate this problem. But the main advantage of the tree-based machine learning algorithms is in the collection of important information because they automate the process of investigating multiple datasets. The effective analysis of the non-gully area ratio, evaluation of sample division, considering the method for selecting the features, and use of ensemble approaches is useful in enhancing the accuracy of the GES models.

Models Comparisons
In this study the results of these four models i.e., RF, GBRT, NBT, and TE were categorized into low, medium, high, and very high GES zones (Figure 7). The division of the area into different GES classes is shown in Figure 8a,b. Among these models, the Naïve Bayes tree (NBT) model shows the largest area of the very high susceptibility zone. According to the validation results, these models have proved the excellent prediction accuracy. But the results of these models vary slightly in terms of the values of the AUROC curve, SCAI and the eleven statistical measures ( Table 6). The AUC of the RF model is 96%, which indicates this to be the most accurate, followed by 91% of TE, 88% of GBRT, 84% of NBT model based on the validation dataset. Therefore, the RF model is a better model for the prediction of GES (Table 6) for this basin compared to the other models. The findings also showed that the result of the tree-based ensemble methods has a better accuracy than the statistical models used in this region [23,24,87]. Our results are rational as the tree-based machine learning algorithms minimized bias, variance, and overfitting issues in GES modeling. This is confirmed by Arabameri et al. [88], Pourghasemi et al. [89], Hembram et al. [87], and Gayen et al. [90].

Conclusions
The purpose of this study is not only to investigate the capability of a machine learning model to predict the susceptibility to gully erosion, but also to compare its capability and robustness among the implemented models, i.e. GBRT, RF, NBT, and TE. Therefore, 14 geo-environmental factors were used and the significance of all GCFs was explored using the WofE, RF, and GBRT models. The findings underlined that the understanding of the strengths and limitations remains somewhat challenging for model selection, even when performing model comparisons with some clear objectives, such as prediction accuracy and robustness. Based on six threshold-dependent and -independent assessment criteria, the RF obtained the most outstanding performance as per the achievements. The GBRT, NBT, and TE have a slightly lower precision when compared to the RF in terms of pure prediction performance. The results of all the models show that the upper portion of the basin has the highest susceptibility to gully erosion in the whole basin. Therefore, immediate suitable planning is needed to prevent further gully and soil erosion in the Hinglo River basin. The outcome of variable significance showed that the elevation is the most significant GCF followed by the influences of rainfall and the NDVI. On the other hand, the geology, soil type, and STI influences are the least important. The results of this research could be helpful for land resource management to cope with the current uncertain situation and more accurately understand the different factors that influence gully erosion. Additionally, this approach could be used as a guideline for future research to analyze the vulnerability of gully erosion to land use change i.e., as a tool for regional soil resource analysis.