Machine Learning Techniques for Gully Erosion Susceptibility Mapping: A Review

: Gully erosion susceptibility mapping (GESM) through predicting the spatial distribution of areas prone to gully erosion is required to plan gully erosion control strategies relevant to soil conservation. Recently, machine learning (ML) models have received increasing attention for GESM due to their vast capabilities. In this context, this paper sought to review the modeling procedure of GESM using ML models, including the required datasets and model development and validation. The results showed that elevation, slope, plan curvature, rainfall and land use/cover were the most important factors for GESM. It is also concluded that although ML models predict the locations of zones prone to gullying reasonably well, performance ranking of such methods is difﬁcult because they yield different results based on the quality of the training dataset, the structure of the models, and the performance indicators. Among the ML techniques, random forest (RF) and support vector machine (SVM) are the most widely used models for GESM, which show promising results. Overall, to improve the prediction performance of ML models, the use of data-mining techniques to improve the quality of the dataset and of an ensemble estimation approach is recommended. Furthermore, evaluation of ML models for the prediction of other types of gully erosion, such as rill–interill and ephemeral gully should be the subject of more studies in the future. The employment of a combination of topographic indices and ML models is recommended for the accurate extraction of gully trajectories that are the main input of some process-based models.


Introduction
Gully erosion is one of the most important forms of erosion that causes land degradation problems by reducing the quality and quantity of arable soil in both developed and developing countries [1,2].Gullies are categorized into two types, namely, "permanent" and "ephemeral" gullies [3].Permanent gullies are commonly described as erosion channels with a depth of 0.5 to 30 m or a cross-sectional area larger than 0.09 m 2 [4,5]  that cannot be obliterated by conventional tillage due to their large size [6,7].By contrast, ephemeral gullies are small channels that are formed due to the results of natural and concentrated flow which can be removed by normal tillage [3].Generally, the negative effects of gullies can be aggravated by rapid changes in soil characteristics resulting from land use change and agricultural pressure through expansion of farming activities and intensive grazing [1,[8][9][10][11].
There is unanimity in the literature that the prediction of gully erosion trajectories (trajectory is defined as the exact location and path of the gully) and the estimation of soil loss from gullies would provide planners with important information for implementing erosion control strategies in agricultural watersheds [12][13][14][15].Different studies have demonstrated that the initiation and development of gullies are mostly controlled by precipitation, topographic features, soil condition and land use/land management practices [16][17][18][19].
Hence, various process-based models (e.g., chemicals, runoff and erosion from agricultural management systems (CREAMS) [20], the ephemeral gully erosion model (EGEM) [21], groundwater loading effects of agricultural management systems (GLEAMS) [22,23], the revised ephemeral gully erosion model (REGEM) [24], the water erosion prediction project (WEPP) [25], and annualized agricultural non-point source (AnnAGNPS) [26]) have been developed to use these factors to quantify soil loss from gully erosion.However, measurements of the input factors, which are the basic requirements of the process-based models are not generally available on a larger scale.For example, the CREAMS model needs the original length of the gully to compute the amount of sedimentation [20], which limits its application for large scale areas where the measured lengths of gullies are not available.
Topographic indices (e.g., slope-area (SA) [27], the compound topographic index (CTI) [28] and the topographic wetness index (TWI) [29]) have been developed to predict gully trajectories that can be used as the input to process-based models for the estimation of soil loss due to gully erosion.Topographic indices predict gully trajectories based on the assumption that gully formation can be related to the combined effects of the main topographic characteristics [30].Although topographic indices show promising potential for the identification of the exact location of gullies, they only consider topographic factors and do not account for rainfall characteristics, land use or soil properties which is a major limitation that leads to simplistic representations of gully formation [31][32][33].
Recent studies indicate that remote sensing can supply good data for analyzing and predicting gully trajectories [34][35][36][37][38][39].Remote sensing not only provides the input data (e.g., digital elevation models (DEMs)) for topographic index models [16,40,41] and processbased models [42], but can also be used individually for the detection of gully trajectories by visual interpretation.Gully trajectories can be detected by manual digitization or interpretation of gullies from aerial photos or satellite images.For example, Wang, et al. [43] used multi-source remote sensing data to map the spatial distribution of gully trajectories at different spatial scales (Pleiades 1A, 0.7 m; unmanned aerial vehicle (UAV), 0.042 m).To identify gully trajectories, they used visual interpretation and field verification; the results obtained showed that the sub-meter images were a good source of data for the identification of various gully types, and that using satellite and UAV data simultaneously provided satisfactory gully erosion assessment at multiple spatial scales.In another study, Zhang, et al. [44] applied visual interpretation of topographical maps using Landsat enhanced thematic mapper plus (ETM+) images and SPOT-5 satellite images to map gully trajectories in the Kebai region, China.The results obtained showed that gullies had the greatest density in hilly and tableland regions and the lowest density in flatlands.
Although UAV data provide satisfactory results for detailed gully trajectory mapping at the site scale, due to limitations, such as vulnerability to weather conditions and overall budget problems, they are not suitable for large scales.On the other hand, as reported by Skytland [45], the use of remote-sensing information from a particular satellite is growing dramatically by several terabytes per day and global gathered observation data may exceed one exabyte [46].Therefore, due to the high volume of remote-sensing data, analyzing and visually interpreting the information provided by remote-sensing imagery is beyond human abilities to process and is also costly and labor-intensive [47][48][49].Therefore, advanced techniques are required that can adapt to the challenges associated with analyzing remotesensing data.
Many technical papers are available that use ML techniques for identifying areas prone to gully erosion via gully erosion susceptibility mapping (GESM).Susceptibility is defined as the potential of a specific landscape to be influenced by a particular erosion process or a group of erosion processes [65].To use ML algorithms efficiently in the field of GESM, four main questions must be answered: (1) What are the main steps for developing an accurate ML model for GESM? (2) What are the best predictor variables for GESM? (3) What spatial resolution of data can be used to produce accurate detection of gullies?(4) What is the best ML model to be used for GESM?However, among the different technical papers in the field of GESM, no study provides explicit answers to these questions because related studies test the ML models over different regions with different environmental conditions and the results of one study cannot be directly used in another study.Therefore, the main objective of the current paper is to present a thorough review of the literature on ML models and approaches currently used for GESM, especially those producing susceptibility maps of permanent gullies, to facilitate user selection among ML models, predictor variables and their spatial resolution and model development for effective GESM.The paper is organized as follows: Section 2 presents a brief discussion of some basic concepts applicable to widely used ML techniques.Section 3 describes and discusses the common ML methodologies used in GESM studies.Section 4 is dedicated to comparative performance analysis of various ML models applied to GESM.Section 5 presents the conclusion and recommendations.Table 1 provides a list of the abbreviations used in the paper.

Machine Learning Techniques Used in GESM
ML is a branch of computer science that is used for data analysis.It is also classified as an artificial intelligence method since it enables computers to learn from experiences in a similar way to humans or animals.In other words, ML techniques use computational methods to capture the relationship between predictors and targets without using a predefined equation as a model [66].This capability of ML models makes them flexible techniques for solving non-linear problems with a large number of datasets from multiple sources [49].Typically, ML problems are categorized into two main classes: (i) supervised and (ii) unsupervised learning.In supervised learning, the model is trained on known inputs and the corresponding outputs to establish a rule which is used for the prediction of future outputs [67].In unsupervised learning, input data is processed to discover hidden patterns or intrinsic structures in input data.Figure 1 shows the classification of the ML techniques and the most commonly used models related to each category.Generally, the main objective of ML models is to map a set of "explanatory" or "predictor" variables x = {x1, . . .xn} to an "output" or "predictand" variable y using a set of "training" samples {yi, xi} N i to obtain an approximate f(x) that minimizes the error between y and ŷ = f (x) [66].Different error functions are used to compute the error between y and ŷ = f (x), but the most used functions are squared-error (y-ŷ) 2 for regression and negative binomial log-likelihood for classification.There are various ML techniques in the literature that are used in the field of classification and regression or both; however, only the classification types which are applied for gully erosion identification are explained in the current review.

Random Forest
Random forest (RF), developed by [68], is a rapid learning algorithm that is used for regression and classification problems.In RF, various trees are produced by the algorithm and combined to form a forest.This procedure is based on the assumption that various prediction models could produce more accurate results than one model [69].To increase the performance accuracy of the model, a classification and regression trees (CART) technique is used to define each tree based on a bootstrapped sample of the dataset [70].The CART technique repeats k times to define the trees using a random subgroup of the variables chosen at each node [71].A majority vote is applied to all the trees to compute the final results of the model [69,72].The main interesting features of RF are that it (i) is a robust algorithm that avoids overfitting, (ii) shows low bias and low variance because of computation of the average over a large number of trees, (iii) provides strong estimation of errors using out-of-bag (OOB) data, and (iv) offers higher estimation performance [73,74].

Support Vector Machine
Support vector machine (SVM) is a control learning method proposed by Cortes and Vapnik [75].SVM is a powerful method due to its capability for working with non-linear data and minimizes complexity [76].SVM applies statistical learning theory and uses a mathematical process to obtain an optimal hyperplane that creates the maximum margin between two classes to separate them [77,78].The following optimization problem is solved to find the optimal hyperplane: where α i are the Lagrange multipliers and C denotes the penalty factor.Typically, when dealing with non-linear data, SVM uses techniques called kernel functions.Kernel functions can transfer data from a lower-dimensional space to a higher-dimensional space [79,80].Although various kernel functions have been proposed, based on the literature, four kernel types, including the radial basis function (RBF), and linear, polynomial, and sigmoid functions, are the most used functions [81,82].

Alternating Decision Tree
The alternating decision tree (ADTree) is a model which incorporates the decision tree with boosting algorithm [83].Some studies have demonstrated that the ADTree model produces better accuracy than the standard model trees in classification problems [84].Generally, in ADTree, two types of nodes, including a splitter node and a prediction node, are used instead of each decision node.The splitter node defines the data based on the selected attribute values, while the prediction node includes the real-valued number, which is used for prediction [83,85].At each boosting iteration, two sets, including a set of preconditions (P t ) and a set of base rules (R t ), are maintained by the algorithm.The base rules that generate preconditions can be any real numbers, including a prediction c 1 , a base condition c 2 , and two real numbers, a and b.
The first rule R 1 is set to have a true precondition and condition; the first prediction value can be defined as: Then c 1 and c 2 are selected by minimizing Z t (c 1 ,c 2 ): The prediction is a when The values of a and b are computed with the following formula: Suppose that R includes the base rules, then R t+1 = R t + r t creates the new rule, and all base rules in R t+1 are summed; the sign of the sum defines the classification rule as follows: where T is the number of training instances, r t is the two prediction values (a and b) at each layer of the tree, and x is a set of instances.

Naïve Bayes Tree
The naïve Bayes tree (NBTree) is a hybrid model that comprises the naïve Bayes and decision tree classifiers [86].It replaces the leaf node of the built decision tree with a naïve Bayes classifier [87,88].During the past few years, several researchers have shown great interest in the application of the NBTree algorithm in the field of classification and its satisfactory performance has been demonstrated [85][86][87]89].The classification rule for naïve Bayes is defined as follows: where c j is the output class of class set C, a 1 , a 2 , . . ., a m are the conditionally independent factors, and k is the total number of classes.

Logistic Model Tree
Logistic model tree (LMT) was proposed by Landwehr, et al. [90] for classification purposes.This algorithm incorporates a decision tree and linear logistic regression in a single tree to increase prediction accuracy [90].This has led to various applications of LMT in geoscience studies [62,91,92].To use the capability of the two algorithms, LMT uses the LogitBoost algorithm to fit the logistic regression at the nodes of the tree [93].For the fully-grown tree, LMT applies the CART algorithm [94] for pruning and a cross-validation technique is applied to select the optimal subtree [95].The LogitBoost incrementally refines the logistic regression model by introducing least-squares fitting additive modeling; the probability of each leaf nodes in class M i is calculated by simple logistic regression [95][96][97] as follows: where n is the number of factors in vector x, β i and β 0 are the coefficient and intercept of the regression, respectively, and D is the number of classes.

Artificial Neural Network
Artificial neural network (ANN) is a computational mechanism that has been developed as a crude attempted replication of the human brain [98].An attractive feature of ANN is its approach that does not consider any particular hypothesis with respect to the statistical distribution of the information [78].In contrast to classical statistical models, it can model non-linear relationships between predictor and target variables [99].The significant abilities of ANNs, such as high learning ability, ability to work with high-dimensional data and generalization [100,101], have led to it to being a widely used method for various types of prediction problems [102].
Among the different types of ANN structures, three structures, including the singlelayer feed-forward ANN (SLFF-ANN), the multi-layer feed-forward ANN (MLFF-ANN), and the recurrent ANN are the most used structures, among which the MLFF-ANN is the most popular [103].The MLFF-ANN consists of three layers, including an input layer, output layer and a hidden layer, which is located between the input and output layers [104].
At first, the network begins with multiplication of the random weights and data, which are fed into the input neuron.The resulting values are summed to calculate the input to the neurons in the next layer.Then, a non-linear activation function is used to convert the input value to the known output, as in Equation ( 9): where O j is the calculated amount of the neuron j, f is the activation function, w ij and b j are the weight and bias of the jth neuron for ith input x.
Finding the appropriate connection weights is known as optimization of the network which is performed in the training step using various learning algorithms.The most widely used learning algorithm is the gradient descent algorithm incorporated with back propagation (BP).To minimize the error between the estimated output and observed value, the BP algorithm tries to find the optimized weights between the layers of the network [105,106].After finding the weights with the least error, the network is called a trained network and will be evaluated with a new dataset to calculate its generalization capability [103].

Boosted Regression Trees
The boosted regression tree (BRT) technique is an advanced model that combines decision trees (weak learners) of fixed size and a robust boosting technique, as follows [107]: where F m (x) is the final model, γ m is a learning rate, h m (x) are weak learners, and M is the number of iterations.The boosting technique enhances the accuracy of the prediction by reducing the final model variance [108].To do this, boosting interactively fits new trees to the residual errors of the existing tree to build a large ensemble of small regression trees to demonstrate the complex relationships between the target and predictor variables (Equation ( 11)) [109].
Considering the loss function L(y, F(x)) and training dataset D = {(x 1 , y 1 ), . . .(x n y n )}, at each step, weak learners h m (x) are used to minimize the loss function using the current model F m−1 and its fit F m−1 (x i ): BRT uses a stochastic gradient boosting approach to solve this minimization problem: The strong learning ability and flexibility of the BRT model in dealing with complex data have been proven by several studies in different fields, such as urban expansion [110], environmental science [111] and ecological modeling [112].

ML Methodology of GESM
To comprehensively review the state of the art of ML models for GESM, top peerreviewed papers were chosen by implementing search engines, such as Scopus, ScienceDirect and the Web of Science (WOS) up to the present (2022).To select papers, the search criteria were the terms 'gully erosion' and 'machine learning' and consideration of three types of quality measure, including the source-normalized impact per paper (SNIP), the cite score and the h-index.On this basis, 19 high-quality published papers were selected for the current review.Table 2 shows the detailed information for the selected papers.From the reviewed papers, typically, four main steps are used in ML-related GESM studies to obtain accurate results.These steps are (i) preparation of an inventory map of gullies, (ii) creation of gully conditioning factors, (iii) multi-collinearity assessment, and (iv) model development and performance evaluation.Each step is described and discussed below.

Inventory Map of Gullies
A gully erosion inventory map (GEIM) is created to provide the location and spatial distribution of the gullies in the study area [62].Generally, for producing an inventory map, the gullies are determined with an extensive field survey using the global positioning system (GPS) and are validated with ancillary data, such as high-resolution images acquired from Google Earth [62,64,115].After selecting the pixels that show gully erosion, the nongully pixels, which are located in pixels other than the gully pixels, are selected and merged with gully erosion pixels to create a dataset of gully presence (positive cases) and absence (negative cases) for each set.The non-gully pixels are created randomly; it is recommended that the ratio of gully and non-gully pixels should be equal to one [98,126,127].In the next step, the GEIM dataset is divided into two groups, the training and validation sets to be used for ML model development.In the literature, it is suggested that 70% of the dataset is used for model training, and that the other 30% is considered as a validation set [62,72,98,100,128].Table 3 shows the detailed characteristics of the study area, the resolution of the imageries used, and the number of digitized gullies in each reviewed GESM study.As illustrated in the table, the number of digitized gullies is different in each study based on the attributes of the area.However, in most studies, the gully inventory has been split into training and test datasets based on the 70/30 rule.

Gully Conditioning Factors
The choice of appropriate geo-environmental factors (GEFs) is an important step for the construction of an accurate gully erosion susceptibility map [98,115,129].A wide range of GEFs, such as primary topographic attributes (e.g., slope, elevation, slope degree, aspect) and secondary topographic attributes (e.g., stream power index, terrain ruggedness index, topographic wetness and position indices) that contribute to the spatial distribution of gullies has been employed by different researchers in the literature [1].Table 4 presents a review of the different gully erosion studies with a focus on GEFs that have been used in each publication.
Table 4 shows that some GEFs, such as primary and secondary topographic attributes are the most used factors in GESM.On the other hand, some factors (e.g., convergence index, terrain ruggedness index, topographical position index) have not been used in most studies.This means that a standard methodology has not yet been accepted by scientists as a globally accepted procedure for the selection of GEFs [130].However, some authors have tried to investigate the importance of different GEFs to identify the most effective GEFs that significantly affect the accuracy of GESM.For example, Akgün and Türk [18] in their study for GESM in the Ayvalık region, located northwest of Turkey, investigated the importance of different GEFs for erosion susceptibility.Among seven GEFs considered, the weathering grades of rocks, lineament density and drainage density were the most effective factors for erosion, while slope gradient and land cover achieved a second importance ranking.
Soil surface properties represent another set of factors that significantly affect the quality of GESM due to their influence on resistance to erosion, infiltration and runoff rate [131,132].Thus, soil texture has attracted much attention from different researchers in the investigation of gully erosion susceptibility [98,129,[133][134][135][136][137][138].In addition to soil texture, Garosi, Sheklabadi, Conoscenti, Pourghasemi and Van Oost [115] used two other soil properties, soil organic carbon (SOC) and calcium carbonate equivalent (CCE), as predictor variables for GESM, in addition to other controlling factors.A variable importance analysis showed that three GEFs factors, including distance from rivers, CCE, and the topographic position index (TPI), significantly affected the accuracy of GESM.Roy and Saha [121] also found that the soil type factor had the most marked effect on GESM.
In another study, Amiri, Pourghasemi, Ghanbarian and Afzali [69] concluded that, among all the applied factors, three factors, including distance from the river, clay percent, and land use, noticeably affected GESM.Results obtained by Arabameri, Chen, Loche, Zhao, Li, Lombardo, Cerda, Pradhan and Bui [62] showed that, among eighteen GEF factors, drainage density, rainfall and slope were the most effective factors for predicting gully occurrences.A variable importance analysis conducted by Pourghasemi, Sadhasivam, Kariminejad and Collins [72] showed that the distance from rivers and the plan curvature had the most and the least importance, respectively, with respect to finding the zones most prone to gullying.
It is common in the field of GESM that a variable identified as the most important factor in one study is not found to be important in another study.For example, a variable importance analysis conducted by Pourghasemi, Sadhasivam, Kariminejad and Collins [72] showed that slope had the highest effect on gully erosion, but Gayen, Pourghasemi, Saha, Keesstra and Bai [117], in their study, concluded that slope had the lowest importance.As suggested by Gutiérrez, et al. [139] and Garosi, Sheklabadi, Conoscenti, Pourghasemi and Van Oost [115], this could be because some variables do not contribute to the spatial distribution of gullies, because there are some uncertainties related to the accurate quantification of these variables, or because there is significant variation in the effect of variables in different environments.However, most studies have suggested that the primary topographic attributes (e.g., elevation, slope and plan curvature), hydrological properties, such as rainfall, and anthropogenic factors, such as land use/cover are among the most important factors that significantly affect the quality of GESM [72,98,117,118].The spatial resolution of the GEFs is the most important factor that significantly affects the accuracy of GESM.As can be seen in Table 3, different spatial resolutions from 1 m to 30 m have been used in different studies to derive GEFs.However, choosing the proper spatial resolution of imageries for the extraction of GEFs depends on the extent of the study area and the availability of the data, and only a few studies have sought to assess the effect of the spatial resolution on the accuracy of the ML models developed on the same study area.For example, Chowdhuri, Pal, Saha, Chakrabortty and Roy [122] compared five types of DEM, i.e., shuttle radar topography mission (SRTM), advanced spaceborne thermal emission and reflection radiometer (ASTER), Cartosat-1, advanced land-observing satellite (ALOS) World 3D-30 m (AW3D30) with a spatial resolution of 30 m, and ALOS PALSAR with a spatial resolution of 12.5 m, to evaluate the scale-dependence of DEM-derived GEFs in GESM.The results obtained showed that the developed models with ALOS PALSAR produced higher accuracy than the developed models with other types of DEM.The results also showed that, although the DEMs with 30 m spatial resolution produced comparable results, of these, the AW3D30 produced the most appropriate results.In a similar study, Arabameri, Rezaie, Pal, Cerda, Saha, Chakrabortty and Lee [124] compared the predictive ability of GEFs derived from three types of DEM (i.e., ALOS PALSAR = 12.5 m, AW3D30 and ASTER = 30 m).The results showed that the ML models developed by GEFs derived from ALOS PALSAR produced the most appropriate results followed by the models developed by AW3D30 and ASTER.
Although the results of the studies described have demonstrated the superiority of the developed ML models with finer resolution images, the results of studies conducted by Gayen, Pourghasemi, Saha, Keesstra and Bai [117], Akgün and Türk [18], and Pourghasemi, Sadhasivam, Kariminejad and Collins [72] showed that ML models developed with coarser spatial resolution data (e.g., 20, 25 and 30 m) can provide satisfactory results for gully detection.Among the papers reviewed, only two papers used high resolution images for the preparation of GEFs.Angileri, Conoscenti, Hochschild, Märker, Rotigliano and Agnesi [114], in their study, used a DEM with 2 m spatial resolution to produce a gully erosion susceptibility map in a river catchment with an area of 9.5 km 2 .In another study, Yang, Wang, Pang, Long, Wang, Cruse and Yang [125] used 1 m digital surface models obtained by UAV to produce a gully erosion susceptibility map in a small watershed with an area of 10.9 km 2 .Although both studies obtained good predictive performance for the ML models, the applied high-resolution imageries in these studies cannot easily be used in large areas because of the difficulties associated with the acquisition of the high-resolution data and the computation costs.Therefore, the application of these types of data is limited to small areas.

Multi-Collinearity Assessment
Due to the large number of built GEFs in GESM, multi-collinearity problems occur that can lead to a reduction in model accuracy.The multi-collinearity issue arises when several factors are strongly correlated, which can lead to mistakes and misinterpretation in the model estimations [137,140].Therefore, a multi-collinearity evaluation is considered an essential step after preparing the GEFS [72,115].Generally, two statistical indicators, namely, the tolerance (TOL) and the variance inflation factor (VIF) are utilized to calculate the multicollinearity of variables.These two indicators are defined in Equations ( 14) and ( 15): where R 2 i is the coefficient of determination, calculated for the ith variable (e.g., x 1 , . . ., x n ) against every other variable in the model.Bui, et al. [141] suggested that a TOL < 0.1 and a VIF > 10 indicate a multi-collinearity problem.

Model Development and Performance Evaluation
In this step, all ML models are trained using a training set to find the optimum parameters of each model that produce the best results for detection of the gully and non-gully pixels in the training set and then the trained models are tested on the validation set.Validation of the developed models is an important task for the evaluation of the predicted gully erosion susceptibility maps (GESMs) [18].The validation process includes goodness of fit and predictive accuracy steps, whereby the former is used for model evaluation in the training dataset and the latter computes the model performance in predicting a validation dataset [141].Typically, two types of performance measures, including threshold-dependent and threshold-independent methods, are used for validation of the predicated GESMs in related studies [98].In most studies, three indicators are employed to assess the model performance: accuracy and Kappa coefficients for the threshold-dependent methods, and the receiver operating characteristic (ROC) curve for the threshold-independent method.

Accuracy
To compute the accuracy, predicted gully susceptibility maps are divided into gully or non-gully classes.Then, a contingency matrix (Table 5) is applied that has four components: true positive (TP) and true negative (TN), which show the total number of gully occurrence and non-gully occurrence pixels that are correctly classified, false positive (FP) that shows the number of non-gully pixels that are misclassified and incorrectly considered as gully pixels, and false negative (FN) that shows the number of gully pixels which are incorrectly detected as non-gully pixels.
Table 5. Contingency matrix employed for calculation of accuracy and Kappa coefficients.

Kappa Coefficient
The Kappa coefficient (Equation ( 17)) demonstrates the ability of the models for classification based on the proportion of the observed agreements (P o ) and the hypothetical probability of expected agreements (P e ), which account for the occurrences that happened by chance [142].

Receiver Operating Characteristic (ROC) Curve
The ROC curve is a well-known technique for quantitatively describing the efficiency of probabilistic tests [144,145].To obtain ROC, two vectors are required, where one vector indicates the binary condition of the presence-absence of a given problem and the other demonstrates the corresponding probability estimates [89,146].The shape of the ROC curve can be used to assess the ability of the model for prediction, where the higher performance is near the upper left part of the curve [18].In addition to the shape, the accuracy can be computed using the area under the ROC curve (AUC), which has been widely applied as a measure to quantify the performance of predictive models [147].According to the literature [147,148], AUC values for the accuracy assessment of models can be classified as follows: poor (0.5-0.6), moderate (0.6-0.7), good (0.7-0.8), very good (0.8-0.9) and excellent (0.9-1.0).

Software and Programming Languages Used for GESM
To accomplish the aims of the GESM using the four main steps described above, different software programs and programming languages are used by researchers, as follows: 1.
ArcGIS and SAGA-GIS software are typically implemented for preprocessing steps, such as preparing the inventory map of gullies and GEFs. 2.
MATLAB, Python, and R programming languages are used for multi-collinearity assessment, ML model development and performance evaluation.

Comparative Performance Analysis of ML
In this section, the predictive ability of ML models used in the studies reviewed is compared and discussed to identify the models that are the most appropriate for GESM.Some researchers have highlighted the suitability of certain ML techniques, such as support vector machine (SVM), artificial neural network (ANN), random forest (RF), decision tree (DT), and boosted regression tree (BRT), for gully erosion studies [64,114,[149][150][151].
Märker, Pelacani and Schröder [65] compared RF with stochastic gradient boosting (TreeNet: TN) to produce susceptibility maps (rill-interrill erosion and gullies) for the Orme River, Italy.The comparison of the models using AUC, the Kappa coefficient, and pseudo R 2 showed that, although both models provided good accuracy, TN outperformed RF.However, TN showed instability between the training and validation accuracy that occurred due to overfitting.In contrast, RF was more stable during the training and validation phases.Rahmati, Tahmasebipour, Haghizadeh, Pourghasemi and Feizizadeh [98], in a study that was carried out in the Kashkan-Poldokhtar Watershed, Iran, compared seven ML models, including SVM, with four well-known kernel types (radial basis function, polynomial, linear and sigmoid), RF, ANN, and BRT for GESM.All the applied models were tested on three different sample sets to comprehensively assess the performance of each model.To produce the GESMs, 12 GEFs were employed as predictors.It was concluded that the accuracy ranking was RF > SVM-RBF > BRT > SVM-polynomial.The RF and SVM-RBF showed the most predictive accuracy for GESM.The superiority of RF over other models has also been reported by other researchers [69,72,115,117] (Table 6).The excellent prediction performance of RF for gully detection is accounted for by the following: (1) RF is capable of using all predictors with different types without removing any parameter during the modeling; (2) RF can work with very large datasets; (3) since RF can create multiple predictions of each phenomenon using a combination of trees, it can find the non-linear relationships between predictors and predictands; (4) RF combines different types of data in the analysis to overcome the problems associated with lack of distribution of assumptions related to the input data; and (5) RF shows less sensitivity to the noise in data [152][153][154].

Paper Evaluation Criteria Results
[65] AUC, Kappa, R 2 TreeNet: TN > RF; RF was more stable [18] AUC Logistic regression is accurate [113] User's and producer's accuracy RF is useful [114] AUC SGT is outstanding [98] AUC, Kappa, accuracy RF > SVM-RBF > BRT > SVM-polynomial > ANN [115] AUC, Kappa, accuracy, RMSE, MAE RF > SVM > NB > GAM [116] User's and producer's accuracy SVM is useful [69] AUC RF > SVM > BRT [62] AUC LMT > NBTree > ADTree [117] AUC RF > MARS > SVM > FDA [118] AUC, SCAI a , FR b GWR-CF-RF > CF-RF > RF > CF [72] AUC RF outperformed the other 9 models [119] AUC, accuracy, TSS  For these reasons, RF has received significant attention in gully erosion studies.For example, Shruthi, Kerle, Jetten and Stein [113], in their study for gully system prediction using object-oriented analysis and ASTER data, employed an RF technique to find the relationship between the explanatory variables and gully erosion.Their results showed that satellite images with medium resolution had enough information for GESM based on an overall performance of 81% (OOB error of 19%).Kuhnert, Henderson, Bartley and Herr [151] proposed a methodology for the assessment of errors associated with gully density mapping in the Burdekin catchment in Queensland, Australia, using an RF modeling approach.Among all the applied models for GESM, after RF, SVM is the second most used method in gully erosion studies and its ability has been demonstrated by different researchers.For example, results obtained by Rahmati, Tahmasebipour, Haghizadeh, Pourghasemi and Feizizadeh [98], Garosi, Sheklabadi, Conoscenti,Pourghasemi and Van Oost [115], and Amiri, Pourghasemi, Ghanbarian and Afzali [69], showed that SVM achieved the second ranking after RF for GESM.The good performance of SVM is related to its ability to analyze non-linear relationships [155] and also because it is less sensitive to the input data, which makes SVM a powerful tool for the detection of a wide range of geo-environmental problems [156][157][158].Therefore, this model has also been used by researchers for gully erosion mapping.Makaya, Mutanga, Kiala, Dube and Seutloali [116] assessed the applicability of a Sentinel-2 MSI multispectral sensor for GESM in the Okhombe Valley, South Africa.The SVM model was applied for gully classification, and the accuracy of the model was assessed using a confusion matrix.The results showed that SVM achieved an overall classification accuracy of 77% for GESM.
Comparison of published papers in the field of GESM using ML techniques shows that several developments have occurred in ML applications from 2011 to 2021 (Table 2).
For example, recent studies have attempted to develop novel/hybrid models to produce better predictive performance than the ML techniques that are currently used for GESM.Arabameri, Pradhan and Rezaei [118] combined three methods, including the geographically weighted regression (GWR) technique, the certainty factor (CF) and random forest, (RF) to generate GESMs.The results showed that use of the GWR-CF-RF model resulted in better accuracy than the individual CF and RF models.In another study, a GIS-based hybrid model was proposed by Arabameri, Cerda, Pradhan, Tiefenbacher, Lombardo and Bui [119] for GESM.The Dempster-Shafer (DS) statistical model was combined with four kernels of BRT, including binary logistic, reg logistic, binary logitraw, and reg linear to create four hybrid models, including DS-binary logistic (DS-BL), DS-reg logistic (DS-RLG), DS-reg linear (DS-RL), and DS-binary logitraw (DS-BLW).Their results showed that the integration of the models increased prediction accuracy and DS-BL outperformed the other hybrid models.The individual DS model exhibited the worst performance among all the applied models.
In another study, Arabameri, Sadhasivam, Turabieh, Mafarja, Rezaie, Pal and Santosh [120] introduced novel hybrid ensemble models to detect gully-prone areas in the Bastam plain, Iran.Four new ensemble techniques, including credal decision tree-dagging (CDT-DA), credal decision tree-bagging (CDT-BA), credal decision tree-alternative decision tree (CDT-ADTree), and credal decision tree rotation forest (CDT-RF) were evaluated for GESM and compared with the results for an individual CDT.The results showed that use of CDT-RF resulted in greater accuracy than the other applied models.
Roy and Saha [121] evaluated gully erosion susceptibility in the Hinglo river basin, India, by utilizing the multilayer perceptron neural network (MLP) as the base classifier and two hybrid ensemble ML techniques, including bagging and dagging.Therefore, three models, including MLP, MLP-bagging, and MLP-dagging were developed and tested.The results of the accuracy assessment by AUC, MAE, and RMSE showed that MLP-bagging performed better than the other models.
Although many studies have been carried out in the field of GESM using ML techniques, researchers still contest the choice of the most accurate model because, in addition to the quality of the user data, the selection of the best model relies on the model structure [63].For example, Märker, Pelacani and Schröder [65], in their study, concluded that the TreeNet model gives better results than RF, but Pourghasemi, Sadhasivam, Kariminejad and Collins [72] compared RF with nine ML methods with the results showing that RF outperformed the other models.Therefore, it is difficult to provide a ranking of ML models for GESM and more attention needs to be paid to comparison of the different ML techniques to draw reasonable conclusions and gain insights into the drawbacks and advantages of the techniques.

Conclusions and Recommendations
Gully erosion is an important problem that has a great impact on agricultural activities and economics by promoting land and water degradation.ML techniques have been applied in GESM to produce valuable tools for regional managers via identification of locations where gullies occur, as well as those that are susceptible to gully initiation, to assess the environmental impacts of gullies and plan gully erosion controls to mitigate its negative environmental effects.This paper presented a review of ML models employed in the field of GESM.To produce reliable gully erosion maps using ML techniques, four main steps are typically used: (i) producing inventory maps of gullies, (ii) extracting gully conditioning factors, (iii) multi-collinearity assessment, and (iv) model development and performance evaluation.With respect to GEFs, most studies have suggested that primary topographical attributes (e.g., elevation, slope and plan curvature), hydrological properties, such as rainfall, and anthropogenic factors, such as land use/cover, are among the factors that particularly affect the quality of GESM.The spatial resolution of GEFs is an important attribute of GEFs and different studies indicate that a spatial resolution of 1 m to 30 m can be used for GESM.However, the most suitable spatial resolution should be selected based on different criteria, such as the objectives of the study, the extent of the study area, the availability of data, and computational resources.As shown in the present paper, there are many ML models used to estimate gully erosion susceptibility, some are more widely used (e.g., RF and SVM), and some are often used (e.g., logistic regression, ANN, NBTree).The two methods that have generally been found to be the best methods for gully erosion mapping are RF and SVM.Nonetheless, each method yields different results in different areas, and the selection of the best method greatly depends on the reliability of the training data.Therefore, further investigation is required to compare their capabilities for various regions that have different environmental conditions.Comprehensive validation of the applied models is another important step, for which, according to the literature, two types of performance measure are mostly used: threshold-dependent methods, such as accuracy and the Kappa coefficient, and threshold-independent methods, such as ROC.
Despite the promising results for the ML models in GESM, some suggestions to improve the quality of gully erosion susceptibility assessment prediction by ML models can be proposed.First, it is recommended to conduct further studies to test various factors (e.g., topographic and hydrologic) in different geospatial locations that may influence the accuracy of ML models.Second, data-mining models should be used to improve the quality of datasets based on a comprehensive analysis of the relationship between historical gully occurrence and causative factors.Third, an ensemble of models should be implemented to combine the ability of the models to increase the accuracy and decrease the uncertainty of the prediction.
Among the reviewed papers, besides mapping the prone areas to permanent gully erosion, some studies sought to use ML techniques for rill-interill erosion mapping [65,114] and ephemeral gully mapping [115].Since rill-interill and ephemeral gully erosion are other important types of soil erosion, more studies are required to explore the ability of ML methods to predict these types of erosion to gain insights into the relationships between these erosion processes and their controlling factors.
Considering the published papers in GESM using ML techniques, although the GESMs can be used as a valuable tool for the detection of degraded lands by gully erosion, the maps produced cannot reliably identify the gully trajectories which are the main input of some process-based models (e.g., CREAMS) for quantifying the soil loss from gullies.Therefore, it is recommended to combine topographic indices and ML models to provide more accurate estimation of gully trajectories that can be used in process-based models for the estimation of soil loss from gullies.With the help of ML models that can consider the combined effect of different GEFs on gully development, maps which show the spatial distribution of gully erosion occurrence can be produced.Topographic indices can use these distribution maps to extract gully trajectories based on topographic attributes.Funding: Funding for this study was provided from a Natural Sciences and Engineering Research Council of Canada (NSERC) grant and the Ontario Ministry of Agriculture, Food and Rural Affairs (OMAFRA) Agrifood Alliance, grant number RGPIN-2017-04400.
−) True negative (TN) (+|−) False positive (FP) Gully (+) (−|+) False negative (FN) (+|+) True positive (TP) + FP + FN (a) SCAI is the ratio of the surface area of the class to the gully surface area of that class.(b) FR is the ratio of the gully surface area in each class to the surface area of that class.(c) TSS is the true skill statistic.

Author
Contributions: H.M.: writing-original draft preparation; A.B., R.R. and P.D.: review and editing.All authors have read and agreed to the published version of the manuscript.

Table 1 .
Summary table of the abbreviations used (in alphabetical order) in the current paper.

Table 2 .
List of the reviewed publications, including the publisher's name and journal characteristics.

Table 3 .
Characteristics of the study area, resolution of the imageries and information of GEIM used in GESM studies.

Table 4 .
List of most widely used GEFs in different ML-based publications for GESM.

Table 6 .
List of recent studies in the area of GESM using ML techniques.