Gully Erosion Susceptibility Mapping in Highly Complex Terrain Using Machine Learning Models

: Gully erosion is the most severe type of water erosion and is a major land degradation process. Gully erosion susceptibility mapping (GESM)’s efﬁciency and interpretability remains a challenge, especially in complex terrain areas. In this study, a WoE-MLC model was used to solve the above problem, which combines machine learning classiﬁcation algorithms and the statistical weight of evidence (WoE) model in the Loess Plateau. The three machine learning (ML) algorithms utilized in this research were random forest (RF), gradient boosted decision trees (GBDT), and extreme gradient boosting (XGBoost). The results showed that: (1) GESM were well predicted by combining both machine learning regression models and WoE-MLC models, with the area under the curve (AUC) values both greater than 0.92, and the latter was more computationally efﬁcient and interpretable; (2) The XGBoost algorithm was more efﬁcient in GESM than the other two algorithms, with the strongest generalization ability and best performance in avoiding overﬁtting (averaged AUC = 0.947), followed by the RF algorithm (averaged AUC = 0.944), and GBDT algorithm (averaged AUC = 0.938); and (3) slope gradient, land use, and altitude were the main factors for GESM. This study may provide a possible method for gully erosion susceptibility mapping at large scale.


Introduction
Gully erosion refers to the erosion process that occurs with concentrated runoff [1], which is the most severe type of water erosion and a significant land degradation process [2,3]. Gully occurrence is controlled by a wide range of factors, including topographic parameters (e.g., contributing drainage area, slope gradient), land use, soil types, et al. [1,4], and is associated with the velocity and volume of the surface runoff [5]. Understanding the spatial distribution of gullies is essential for watershed management, soil and water conservation measurements, and infrastructure planning. Gully erosion susceptibility mapping (GESM) is of great interest to researchers as it is beneficial for predicting the spatial probability of gully occurrence [6].
Previous efforts in gully distribution modeling have included topographical threshold models, traditional statistical methods, and machine learning (ML) algorithms. Topographical threshold models are most commonly used for gully initiation prediction [7]. For example, Chaplot et al. [8] predicted gully erosion headcuts in the steep slopes of Laos based on topographic thresholds. Majhi et al. [9] evaluated the applicability of the terrain threshold method in the Rarh plain. Dewitte et al. [10] used slope-area threshold (S-A) analysis with a logistic regression model to predict the gully erosion in Algeria. The model parameters varied with environmental parameters, which limited its application, especially at large scale [11]. Some traditional statistical models have been applied to access GESM, such as logistic regression (LR) [12], analytical hierarchical process (AHP) [13], conditional analysis (CA) [14], evidential belief function (EBF) [15], frequency ratio (FR) [16], and weight of evidence (WoE) [17]. ML algorithms have recently been used in GESM with the developments in information technology. ML algorithms have been found successful in discovering complex relationships among various influence factors without strict assumptions and can be used to handle much bigger datasets [18]. GESM prediction accuracy has been greatly improved using ML algorithms compared with traditional statistical methods. Typical ML algorithms used in GESM, such as by Arabamari et al. [19], explored a new model based on the genetic algorithm-extreme gradient boosting (GE-XGBoost). Their results showed that this model was of great significance for predicting large-scale gully erosion susceptibility maps. Gayen et al. [20] used multivariate additive regression splines (MARS), flexible discriminant analysis (FDA), random forest (RF), and support vector machine (SVM) to predict the susceptibility map of gully erosion in the Pathro River Basin of India. They found that RF had the highest prediction accuracy (AUC = 0.962) and FDA had the worst performance (AUC = 0.842). Arabamari et al. [21] used alternating decision tree (ADTree), Naïve-Bayes tree (NBTree), and logistic model tree (LMT) approaches to evaluate the susceptibility of gully erosion on the Bastam watershed. It was found that the three extended models based on the decision tree produced excellent prediction results (AUC > 0.922). Saha et al. [22] applied random forest (RF), gradient-boosted regression tree (GBRT), Naïve-Bayes tree (NBT), and tree ensemble (TE) models to evaluate gully erosion susceptibility in the Hinglo River basin. They found that RF has the highest prediction accuracy and the best simulation effect and proposed that this model could also be used to evaluate gully erosion susceptibility in other areas with the same geological environment conditions. However, predicting gully erosion susceptibility maps efficiently, accurately, and interpretably remains a challenge. The parameters for terrain threshold models vary with the mechanism of gully erosion [4,23]. For the traditional statistical methods, strict assumptions have to be defined before research, which is considered a drawback of such methods [24,25]. The prediction accuracy is relatively low by using these methods [26][27][28]. The application of ML algorithms in GESM is also limited by computing efficiency since most of the above algorithms need pixel-by-pixel prediction using the regression method, especially when predicting the gully erosion susceptibility map in large regions or when using very high-resolution datasets. In some cases, the spatial resolution has to be reduced to improve computing efficiency, but this may also reduce the prediction accuracy. The high performance in terms of modeling can only be reached by sacrificing the interpretability of the model results [29]. A more efficient and interpretable method that can predict the gully erosion susceptibility map with high accuracy still needs to be found, especially in complex areas, where severe erosion often occurs.
This research aims to explore options for such a better way for GESM in the Loess Plateau area in China with very complex terrain types where GESM study is still limited. Since the traditional statistical methods are usually time-efficient and the ML algorithms are more accurate in GESM prediction, combining the two methods might be a valuable way to solve the problem. The weight of evidence (WoE) model is a traditional statistic model and was used by Arabameri et al. [30] and Shit et al. [31] to explain the relative importance of conditioning factor classes to gully erosion locations successfully. It was selected as the traditional statistical method in this research. Three commonly used machine learning models (XGBoost, GBDT, and RF) were selected. The resulting combined WoE and machine learning classification fitting model (WoE-MLC) for GESM was explored. The results of this research would be important in gully mapping application at large scale since a time-efficient, interpretable, and relatively accurate method for gully mapping method was offered. The organization of this paper is outlined as follows: Section 2 describes the was explored. The results of this research would be important in gully mapping application at large scale since a time-efficient, interpretable, and relatively accurate method for gully mapping method was offered. The organization of this paper is outlined as follows: Section 2 describes the study area, dataset, and the methods used; Sections 3 and 4 report the model results and discussion; and Section 5 supplies the conclusions of this study.

Study Area
The study area is the Mizhigou watershed, a small watershed with an area of 10.9 km 2 in Loess Plateau, China, within latitudes 37°41′ and 37°43′ N, and longitudes 109°56′ and 109°59′ E (Figure 1). The altitude ranges from 898 to 1151 m. The average slope gradient is 28.7°, with 27.1% of the hillslopes steeper than 40°. It is a typical loess gully and hilly area. The soil erosion in the watershed is serious [32]. This area belongs to the semi-arid climate zone; the average yearly precipitation total is 430 mm. The soil type is loess, and the soil particles are mainly silt particles that are easy to erode. The land use in the watershed is mainly grassland and cultivated land. There is still some sloping farmland distributed scattered in this watershed. Most of the farmland on hillslopes steeper than 25° has been built into terraces. Most of the gullies in the study area are V-shaped.

Base Data and Data Processing
The base datasets included digital orthophoto maps (DOM) and digital surface models (DSM) and were obtained using Unmanned Arial Vehicles (UAV). The drone used was a DJI 4 RTK, and the ground image control points were set up to improve the data accuracy. The Pix4dmapper software was used for data processing; it is based on

Base Data and Data Processing
The base datasets included digital orthophoto maps (DOM) and digital surface models (DSM) and were obtained using Unmanned Arial Vehicles (UAV). The drone used was a DJI 4 RTK, and the ground image control points were set up to improve the data accuracy. The Pix4dmapper software was used for data processing; it is based on image content, uses unique optimization techniques and regional network adjustment techniques to calibrate images, and can generate the DSM and DOM automatically and quickly. The resolution of DOM and DSM was 0.09 m. The DSM was resampled to 1 m for later use. The DOM and DSM datasets were obtained on 28 August 2019. Gaussian filtering was applied to reduce the effects of noise.
Vegetation is an important source of ground elevation error when using UAV-sourced datasets. In this study area, most of the hillslopes were covered by a low and relatively homogeneous height of grass or were bare land during the time of the flight. We assumed this had limited influence on the later analyses. There are some trees at the bottom of the valleys and on a tiny part of the hillslope. In these areas, the points with surface elevation values around the trees were obtained manually. Then, the 1 m-resolution DSM in these places was obtained by interpolation based on those points and mosaiced to the resampled 1 m DSM to make the final DSM dataset.
We used RTK GNSS instruments composed of base stations and mobile stations to carry out field measurements. The mobile station simultaneously receives the observation data from the base station broadcast, and the mobile station antenna receives the observation data from the satellite, which forms the RTK observation model [33]. Its horizontal precision is 2.5 mm ± 2 ppm and its vertical precision is 5 mm ± 2 ppm.

Experimental Procedure
The basic procedure of the experiment included five steps ( Figure 2). In Step 1, the base datasets including DOM, DSM, gully erosion inventory map, and gully erosion conditioning factors were obtained.
Step 2 was multi-collinearity analysis using tolerance and variance inflation factor (VIF); in Step 3, the relative importance of conditioning factor classes to gully erosion locations was determined using the weight of evidence model; Step 4 was gully erosion susceptibility mapping using RF, GBDT, and XGBoost; and Step 5 was validation through AUC. About 30 % (in number) of gullies belonged to testing datasets, which were used for validation. The other 70% of gullies belonged to the training dataset, which was used for model building.

Gully Erosion Inventory Mapping
Gully erosion inventory mapping is essential for GESM since it is the base datas for model building and validation [34]. The gully erosion inventory map used in th research was obtained by visual interpretation based on 0.09 m DOM.
Stratified random sampling was used for selecting the interpreted gullies used f

Gully Erosion Inventory Mapping
Gully erosion inventory mapping is essential for GESM since it is the base dataset for model building and validation [34]. The gully erosion inventory map used in this research was obtained by visual interpretation based on 0.09 m DOM.
Stratified random sampling was used for selecting the interpreted gullies used for gully erosion inventory mapping. The study area was firstly divided into 30 sub-watersheds based on hydrological analysis; then, the numbers and locations of the selected gullies were determined according to the watershed area and geomorphological types in each sub-watershed. The drainage area for each selected gully was chosen as the corresponding non-gully sample. There were other types of non-gully areas, such as construction land, cultivated land, water body, and so on. We selected these non-gully areas with a square with a side length of 10 m as a unit. In total, 353 gully polygons were selected, 70% (247 gullies) were used in the model building, and the remaining 30% (106 gullies) were used for the model validation; 502 non-gully polygons were selected, 70% (351 non-gullies) were used in the building of the model, and the remaining 30% (151 non-gullies) were used for the validation. The selected gullies range in length from more than ten meters to hundreds of meters and width from a few meters to dozens of meters. Many gullies hang on both sides of the gully, and the elevation continues to drop from the head of the gully. The shape of the gullies is mostly V-shaped.
The accuracy of the visually interpreted gullies was validated based on 32 fieldmeasured gully data obtained using GNSS RTK ( Figure 3) using Equations (1) to (4) [29,30]. The accuracy was 96.78%, which showed the interpreted results were of sufficient accuracy for use.
In these equations, M is the field measured area of the gully, V indicates the gully area obtained by visual interpretation of DOM, I is the intersection area of A and B, N is the number of gullies, and Precision is the overall accuracy.

Gully Erosion Conditioning Factors
An important task is to select practical geo-environmental factors to assess gully erosion susceptibility, as they influence the prediction quality of the models [14]. There is no global consensus or universal regulation for the selection of gully conditioning factors. Based on previous studies [6,20,35], available data, and field perceptions of the study area, we selected a total of 14 conditioning factors ( Figure 4) for GESM. These factors are the slope gradient, slope aspect, altitude, curvature, topographic wetness index (TWI), stream power index (SPI), drainage density, distance from the stream, land use, distance from the road, catchment area, distribution of terrace, slope length (LS), and fractional vegetation cover (FVC). In this study, we used ArcGIS 10.5 to extract topographic factors such as slope aspect, slope gradient, curvature, TWI, SPI, LS, and catchment area from the digital elevation model and the DOM was used to extract land use, stream network, and road distribution maps, FVC, and terraced fields. Subsequently, the layers of In these equations, M is the field measured area of the gully, V indicates the gully area obtained by visual interpretation of DOM, I is the intersection area of A and B, N is the number of gullies, and Precision is the overall accuracy.

Gully Erosion Conditioning Factors
An important task is to select practical geo-environmental factors to assess gully erosion susceptibility, as they influence the prediction quality of the models [14]. There is no global consensus or universal regulation for the selection of gully conditioning factors.
Based on previous studies [6,20,35], available data, and field perceptions of the study area, we selected a total of 14 conditioning factors ( Figure 4) for GESM. These factors are the slope gradient, slope aspect, altitude, curvature, topographic wetness index (TWI), stream power index (SPI), drainage density, distance from the stream, land use, distance from the road, catchment area, distribution of terrace, slope length (LS), and fractional vegetation cover (FVC). In this study, we used ArcGIS 10.5 to extract topographic factors such as slope aspect, slope gradient, curvature, TWI, SPI, LS, and catchment area from the digital elevation model and the DOM was used to extract land use, stream network, and road distribution maps, FVC, and terraced fields. Subsequently, the layers of distance from stream and road, as well as drainage density, were produced using the Euclidean distance and Line density tools in ArcGIS 10.5 software, respectively.

Multi-Collinearity Assessment
One of the most critical steps in GESM is to analyze the correlation or the influence of multi-collinearity among different influence factors. Multi-collinearity is a condition under which independent variables are highly correlated or interrelated [36]. Tolerance and variance inflation factor (VIF) are two essential indexes used to identify the relationship between independent variables [37]. A tolerance value less than 0.1 or a VIF value larger than 10 indicates a high multi-collinearity [38], in which case it is necessary to remove those specific variables. Otherwise, the accuracy of the prediction will decline [39].

Gully Erosion Modeling
WoE is a bivariate statistical method based on the Bayesian probability framework to estimate the relative importance of conditioning factor classes [40]. In this study, the model was used to determine the weight values of conditioning factor classes to gully erosion locations; the weights were calculated using Equations (5) to (11).
where P is the probability and ln is the natural log function. B is the presence of gully conditioning factor and B is the absence of gully conditioning factor. L is the presence of gully and L is the absence of gully. W + i and W − i indicate that the conditioning factor is present (positive correlation) and absent (negative correlation), respectively. C indicates the overall association between gully occurrence and conditioning factors. S(C) is the standard deviation of the contrast. S 2 (W + ) is the variance of the W + and S 2 (W − ) is the variance of W − , and W is the weight value of each factor for a specific class.
Each continuous conditioning factor was reclassified into a finite set of subclasses to calculate the proportion of gully and non-gully data in each class. We used the natural break method to classify the influence factors in order to highlight the differences at all levels, and the number of grades refers to the previous literature [20,21,30].

Multi-Collinearity Assessment
One of the most critical steps in GESM is to analyze the correlation or the influence of multi-collinearity among different influence factors. Multi-collinearity is a condition under which independent variables are highly correlated or interrelated [36]. Tolerance and variance inflation factor (VIF) are two essential indexes used to identify the relationship between independent variables [37]. A tolerance value less than 0.1 or a VIF value larger than 10 indicates a high multi-collinearity [38], in which case it is necessary to remove those specific variables. Otherwise, the accuracy of the prediction will decline [39].

Gully Erosion Modeling
WoE is a bivariate statistical method based on the Bayesian probability framework to estimate the relative importance of conditioning factor classes [40]. In this study, the model was used to determine the weight values of conditioning factor classes to gully erosion locations; the weights were calculated using Equations (5) to (11). Breiman [41] defines a random forest as a collection of tree-structured weak learners comprised of identically distributed random vectors where each tree contributes to a prediction for x. The random forest ML model is a nonparametric multivariate model consisting of multiple trees and can be used for classification and regression [6]. In each splitting process of the subtree in the random forest, a certain number of the data and features are randomly selected, and then the optimal value is obtained. In this way, the decision trees in the random forest can be different from each other, enhance the diversity of the system, and thus improve the performance of the model [42]. For the classification problem, the result of the simple majority voting method is used as the output of the random forest; for the regression problem, the simple average of the output result of a single tree is taken as the output of the random forest. The construction of the RF model was realized by using the scikit-learn library in Python.
(2) Gradient boosted fecision trees (GBDT); GBDT is an ensemble machine learning method combining multiple decision trees based on the boosting concept. It continuously improves prediction accuracy through interactions. A new decision tree was established in the gradient direction of the reducing residuals in each iteration [43]. The GBDT tree is constructed sequentially. That is, the first tree trains all the samples to get a model and its weight, while the latter tree continues to train the sample to get a model and weight with the goal of reducing the residual of the previous tree, and stop when the residual is small enough or reach the set number of trees. The final model is the weighted sum results of each tree. GBDT can be used for classification and regression problems, and it is regarded as one of the best algorithms for fitting actual distributions [44].
(3) Extreme gradient boosting machine (XGBoost); The XGBoost was introduced by Chen and Guestrin [45] based on the concept of boosting. XGBoost produces a prediction model in the form of a boosting ensemble of weak classification trees by a gradient descent that optimizes the loss function [46,47]. This algorithm first builds all the subtrees that can be built from top to bottom, then prunes backward from bottom to top so that local optimal solutions can be avoided [48]. It is more efficient and can be used for both classification and regression tasks. XGBoost has three crucial aspects. These are a regularized objective function for better generalization, gradient tree boosting for additive training, and shrinkage and column subsampling to preventing overfitting [45,49].
The optimal parameter combination can be explored by a learning curve or grid search algorithm, and the ranking of feature importance can be obtained by using the Feature_importance interface based on Gini index calculation; Gini index is a common parameter to evaluate the importance of factors in GESM [6,18,30]. Feature importance refers to the contribution rate of variables to fitting accuracy. A specific factor is more important if its feature importance is larger. The reasons for selecting these models are the following: (1) XGBoost is an efficient algorithm that has the capacity to find missing values and enhance the prediction performance result. It represents the state-of-the-art within the machine learning community. At present, it has seldom been used in gully erosion susceptibility mapping. (2) The XGBoost algorithm is improved on the basis of the GBDT algorithm, and the selection of GBDT algorithm can more effectively utilize the improvement effect of XGBoost. (3) The RF model is one of the most widely used models in gully erosion susceptibility mapping, and has achieved excellent simulation results. (4) Each of the three models is an ensemble model, which can remove the shortcomings of individual statistical or ML models [18].

Model Validation
We used 30% of the gullies which were not included in the modeling process to validate the GESM based on the receiver operating characteristic (ROC) curve and calculation of the area under the curve (AUC). A ROC curve effectively indicates the quality of deterministic and probabilistic models and forecast systems [50,51]. The x-axis and y-axis of the ROC curve are plotted for 1-specificity and sensitivity, respectively, and have different cut-off thresholds. X represents the sensitivity of the true positive rate (TPR), the value of the predicted gully pixel. The real case is the gully pixel, which accounts for the ratio of the gully pixel value in all real cases. Y represents the false positive rate (FPR), that is, the value that is predicted as gully pixels, but the real situation is non-gully pixels, which accounts for the ratio of non-gully pixels in all real cases. The AUC value is a quantitative value to evaluate the ROC curve. When the AUC value is 0.5, the prediction effect of the model is no better than the random method to predict the occurrence of gullies. When ideally, that is, a complete distinction between gully and non-gully pixels, the value of AUC is 1., and the closer the ROC curve is to the upper left corner, the more accurate it is [35]. The AUC value and prediction accuracy can be classified as follows: 0.5-0.6, poor; 0.6-0.7, average; 0.7-0.8, good; 0.8-0.9, very good; and 0.9-1, excellent [30]. The ROC-AUC can be calculated by Equations (12) to (14).
where TP is true positive, TN is true negative, FP is false positive and FN is false negative. P and N represent the presence and absence of gullies, respectively. Table 1 presents the multi-collinearity values for the 14 factors which are commonly used in gully erosion susceptibility mapping. Most of the factors in Table 1 met the requirements of multi-collinearity analysis (VIF ≤ 10, Tolerance ≥ 0.1), except for the factor of the existence of terrace, and could be used in the following analysis.

Importance of Conditioning Factors to Gully Erosion
Both the relative importance of conditioning factor classes and the feature importance of conditioning factors to gully erosion were evaluated.
The relative importance of conditioning factor classes to gully erosion locations using the WoE method is presented in Table 2. The weight values were higher in one or a few classes than that of the other classes for each factor. With altitude, slope gradient, and FVC increase, the weight values increased and then decreased, which showed that areas with medium values of the above factors were easier for gully formation. The weight values were the highest for the altitude values from 1000 m to 1035 m, slope gradient values from 40 • to 50 • and FVC values from 41% to 58%. The weight values increased with LS factor and catchment area drainage density, and SPI values also became larger. The weight values were largest when TWI, curvature, and distance from the stream were small. In addition, the weight value was the highest when the slope aspect was N-facing slopes, and the distance from the road was 385-485 m. Gully erosion is more likely to occur in this area if the values of altitude, slope gradient, FVC, and the distance from the road are medium, LS factor, catchment area, drainage density, and SPI values are larger, and TWI, distance from the stream, and curvature are smallest. N-facing slope tends to have more gullies formation than the other slope aspects. The weight values in Table 2 were calculated based on Pixels of Gullies, Pixels of Non-gullies, and Equations (5) to (11).  Figure 5 shows the feature importance of conditioning factors based on the MLC models. It showed that, compared with other factors, slope gradient, land use, and altitude were the most critical factors in predicting GESM for each of the three selected machine learning algorithms, the RF model, GBDT model, and the XGBoost model. The distribution of gullies in this area is mainly affected by terrain and man-made factors.

Gully Erosion Susceptibility Mapping (GESM)
GESM using each of the selected models is shown in Figure 6. The natural break method was used to divide the map into five different grades; these were "very low", "low", "medium", "high", and "very high". Generally, the GESM results were consistent using the combined WoE-MLC and machine learning models by themselves. Higher susceptibility zones are mainly located in the middle of the hillslopes at both sides of streams banks. Lower or medium susceptibility zones are mainly located on the top of hillslopes and the bottom of the streams. The maps obtained from WoE-MLC models were more fragmented than those from the machine learning models. More very low susceptibility areas were predicted on top of the hillslopes by using machine learning models.

Gully Erosion Susceptibility Mapping (GESM)
GESM using each of the selected models is shown in Figure 6. The natural break method was used to divide the map into five different grades; these were "very low", "low", "medium", "high", and "very high". Generally, the GESM results were consistent using the combined WoE-MLC and machine learning models by themselves. Higher susceptibility zones are mainly located in the middle of the hillslopes at both sides of streams banks. Lower or medium susceptibility zones are mainly located on the top of hillslopes and the bottom of the streams. The maps obtained from WoE-MLC models were more fragmented than those from the machine learning models. More very low susceptibility areas were predicted on top of the hillslopes by using machine learning models.

Validation of Models
The GESM prediction accuracy of the six models is shown in Figure 7. The results showed that the AUC values of the six models were between 0.925 and 0.957, indicating high prediction accuracy for all models. The average AUC value of the three WoE-MLC models was 0.931, still acceptable, although slightly lower than that of the machine

Validation of Models
The GESM prediction accuracy of the six models is shown in Figure 7. The results showed that the AUC values of the six models were between 0.925 and 0.957, indicating high prediction accuracy for all models. The average AUC value of the three WoE-MLC models was 0.931, still acceptable, although slightly lower than that of the machine learning models (0.954). The AUC values for the XGBoost algorithm, combined with WoE or as a separate machine learning regression method, were higher than that of RF and GBDT algorithms without WOE. learning models (0.954). The AUC values for the XGBoost algorithm, combined with WoE or as a separate machine learning regression method, were higher than that of RF and GBDT algorithms without WOE.

Discussion
We hypothesized that combining the WoE model and MLC model was a useful method for GESM, being more efficient and easier to interpret the influence of the varied factors for gully formation, and terrain-related factors should be the most critical factor for GESM in complex terrain areas. By exploring three commonly used ML regression models and WoE-MLC models, the findings of this study supported our hypothesis.

Rationale for Gully Erosion Susceptibility Mapping
The high gully susceptibility zones predicted by all of the models used in this research were mainly located in the middle of the hillslopes on both sides of stream banks, which is consistent with the prediction of Ding et al. [52]. Low or medium susceptibility zones are mainly located on the top of hillslopes and at the bottom of the streams ( Figure 6). This spatial distribution was reasonable according to field investigation and UAV-sourced DOM (Figure 1b). There are more gullies in the middle of the hillslopes in the Loess Plateau because the slope gradient usually is larger than 30°, larger than the other parts of the hillslope, and the catchment area is large enough for gully formation. On the top of hillslopes, the slope gradient is smaller than downwards of the hillslopes, and the drainage area is usually smaller, which makes the gully erosion susceptibility lower. At the bottom of the valleys in the Loess Plateau, although the catchment area usually is large, there are generally few gullies found since the terrain is relatively flat. Its main function is to transport the sediment generated by gully erosion to other places [37].
The gully classification in the Loess Plateau area is more complicated than in other areas because of its complex terrain surface [53,54]. Different susceptibility maps of gully erosion are often obtained based on different gully types. For example, the gully maps predicted by Dai et al. [55] and Yang et al. [56] were different from those in this research.

Discussion
We hypothesized that combining the WoE model and MLC model was a useful method for GESM, being more efficient and easier to interpret the influence of the varied factors for gully formation, and terrain-related factors should be the most critical factor for GESM in complex terrain areas. By exploring three commonly used ML regression models and WoE-MLC models, the findings of this study supported our hypothesis.

Rationale for Gully Erosion Susceptibility Mapping
The high gully susceptibility zones predicted by all of the models used in this research were mainly located in the middle of the hillslopes on both sides of stream banks, which is consistent with the prediction of Ding et al. [52]. Low or medium susceptibility zones are mainly located on the top of hillslopes and at the bottom of the streams ( Figure 6). This spatial distribution was reasonable according to field investigation and UAV-sourced DOM (Figure 1b). There are more gullies in the middle of the hillslopes in the Loess Plateau because the slope gradient usually is larger than 30 • , larger than the other parts of the hillslope, and the catchment area is large enough for gully formation. On the top of hillslopes, the slope gradient is smaller than downwards of the hillslopes, and the drainage area is usually smaller, which makes the gully erosion susceptibility lower. At the bottom of the valleys in the Loess Plateau, although the catchment area usually is large, there are generally few gullies found since the terrain is relatively flat. Its main function is to transport the sediment generated by gully erosion to other places [37].
The gully classification in the Loess Plateau area is more complicated than in other areas because of its complex terrain surface [53,54]. Different susceptibility maps of gully erosion are often obtained based on different gully types. For example, the gully maps predicted by Dai et al. [55] and Yang et al. [56] were different from those in this research. Their method was to determine the gully area by illuminating the surface of the gully, simulating the shadows on the shaded slopes, and then merging both sides of the shadows, so the gully map obtained included the bottom of the streams. Based on a 12.5 m DEM and other conditioning factors, Lei et al. [57] and Azedou et al. [58] predicted gully erosion susceptibility maps in the Robat Turk watershed and the rural municipality of El Faid, respectively. The gullies with smaller shapes were difficult to express at that coarser resolution, so the gully areas simulated tended to be more concentrated. This research mainly focused on the gullies caused by modern accelerated water erosion, which are much smaller and mainly distributed in the middle of the hillslopes. This type of gully was more active and was at the most serious stage of water erosion, so the damage to land resources was greater.
The spatial resolution of all the factors we used was 1 m, at which scale it might be challenging to avoid the influence of the micro-topography on the analysis [59]. Some studies have shown that the best resolution did not necessarily provide the best information [35]. Therefore, it is necessary to analyze the accuracy of gully erosion susceptibility maps at different spatial resolutions to decide the best or least the necessary spatial resolution.

Variable Importance and GESM Model Comparison
Although many researchers have used different factors in predicting gully erosion susceptibility, there are no universal rules with which to select independent variables for GESM modeling. In this research, we explored the contribution of independent variables to gully erosion susceptibility mapping in complex areas of the Loess Plateau. The result showed that slope gradient, land use, and altitude were the most important factors based on all the three machine learning models used. Gayen et al. [20] found in the Pathro River Basin of India, land use and altitude were the most critical factors affecting the occurrence of local gullies, while the importance of slope gradient was low. The terrain of the study area is much flatter in their study (less than 8.6 • in 89% of the study area) than in this research (with an averaged slope gradient value of 28.7 • , larger than 30 • in 48% of the study area). Slope gradient seems to be more important for GESM when the terrain is more complex. Studies by Pourghasemi et al. [34] in the southeast part of Golestan Province suggested that the distance from the stream, drainage density, and altitude were the more important conditioning factors. The study of Arabameri et al. [60] in the Najafabad watershed showed that drainage density, altitude, and LU were more important factors. It can be seen that the most critical factors affecting gully susceptibility may be different according to different study areas. The factor importance for different geomorphological types needs to be further studied.
According to the results of this research, the AUC values of the machine learning models were slightly higher than those of the WoE-MLC models. Machine learning models are based on each conditioning factor and use the pixel-by-pixel model to produce the GESM. WoE-MLC models utilize the feature importance of conditioning factors obtained from the machine learning classification model and the relative importance of conditioning factor classes obtained from the WoE model and then use raster calculation to create GESM. The pixel-by-pixel approach adopted by machine learning models would theoretically obtain more precise results than WoE-MLC models but is much more time-consuming and harder for interpretation. Although the accuracy of the WoE-MLC models is relatively lower (averaged AUC = 0.931), it is still acceptable since WoE-MLC models are more computationally efficient and interpretable than machine learning models. WoE-MLC models are of great potential for gully susceptibility mapping in large regions or at high resolution and in other situations where a large amount of data is needed. Many studies have used statistical and machine learning models to evaluate the relationship between gully occurrence and conditioning factors in different regions, and we believe that the method will be widely applicable.
The XGBoost algorithm was higher in prediction accuracy than the RF and GBDT algorithms. In order to know which algorithm was stronger in generalization ability, we used the training set to verify the GESM predicted by the three algorithms. It showed that the AUC values of XGBoost, RF, and GBDT were 0.958, 0.989, and 0.992, respectively. Although RF and GBDT performed well in the training set, their performance in the testing set was relatively poor according to the result in Figure 7, indicating that they were more influenced by overfitting. The XGBoost algorithm has a stronger generalization ability and can avoid overfitting more effectively. Additionally, the research of Ding et al. [52] showed that the XGBoost algorithm is more time-efficient, and this advantage gradually expands with the increase of the number of conditioning factors. At present, many studies have concluded that the RF algorithm has achieved better simulation results in gully erosion susceptibility mapping [20,37,61,62]. The XGBoost algorithm has been widely used in landslide susceptibility mapping [49], flash-flood susceptibility mapping [63], and groundwater vulnerability predictive mapping [64], and excellent simulation results have been achieved. However, there has been less application in gully erosion susceptibility mapping, and further research is still needed to fully evaluate the application of the XGBoost algorithm in gully mapping.

Future Work and Applications
In recent years, the deep neural learning network (DLNN), which is composed of fully convolutional neural networks (CNNs), deep belief networks (DBNs), stacked auto-encoder (SAE) networks, etc. [65], has become very popular [66]. Band et al. [67] suggested that the DLNN model can scientifically build a high-level feature from a raw dataset. It consists of a different topology than the general neural network of a single hidden layer, as more than one hidden layer is present in this algorithm. Their research found that the DLNN algorithm was more powerful for gully erosion susceptibility mapping than the traditional ML algorithm. However, the DLNN algorithm will have similar time-consuming problems as other ML algorithms, and more effort is needed in the future to solve this problem to meet the needs of gully mapping at a large scale.
With the development of fine-resolution images all over the world [59], large-scale modeling of gully maps will be possible and would be an urgent need for research and management. The results of this research offered a time-efficient, interpretable, and relatively accurate method for gully mapping and would be well applied in future large-scale gully modeling.

Conclusions
This study explored the gully erosion susceptibility mapping method in a watershed with highly complex terrain in the Loess Plateau of China. Three commonly used machine learning models were used, among which the prediction accuracy of XGBoost is higher than that of RF and GBDT. According to the feature importance of influence factors, slope gradient, land use, and altitude were the most important factors for gully mapping. Combing the weight of evidence and the machine learning classification model is a computationally efficient and interpretable way to implement gully susceptibility mapping.