Novel Ensemble Approaches of Machine Learning Techniques in Modeling the Gully Erosion Susceptibility

: Gully of the others from best to worst accuracy is GLM, MaxEnt, SVM, GLM-ANN, GLM-MaxEnt, GLM-SVM, MaxEnt-ANN, MaxEnt-SVM, GLM-ANN-SVM-MaxEnt, GLM-MaxEnt-ANN, GLM-MaxEnt-SVM and MaxEnt-ANN-SVM. The resulting gully erosion susceptibility models (GESMs) are e ﬃ cient and powerful and could be used to improve soil and water conservation and management. learning ensemble models. A MaxEnt, a GLM, an SVM, and an ANN were tested for preparation of a gully erosion susceptibility model (GESM) for the Golestan Dam basin. Furthermore, this study compares the individual, two-, three-, and four-model ensembles to determine the best gully erosion assessment model (or model combination).


Study Area
The Golestan Dam basin occupies 219,846 hectares (between 37 • 24 00" to 37 • 45 00"N and 55 • 20 00" to 56 • 00"E) in northeastern Golestan Province. The elevation ranges between 58 and 2168 meters above sea level (m.a.s.l.) (Figure 1). A map of elevation of the study area was prepared from an advanced land observing satellite (ALOS) phased-array type L-band synthetic aperture radar (PALSAR) digital elevation model (DEM) acquired from the Alaska Satellite Facility. More than half of the basin is a gently sloped portion of the Alborz Mountains. The average annual rainfall varies from 224 to 736 mm, and rainfall is highest in the southeastern section of the basin; meteorological data were acquired from the Islamic Republic of Iran Metrological Organization [58]. Based on a land use map (Table 1) prepared at the 1:100,000 scale by the Natural Resources Department of Golestan Province, moderate quality rangeland comprises the largest portion of the area (25.9%). Poor rangelands account for 19.09%, garden and dry land-farming for 14.25%, dry land-farming for 11.2%, and dense forests for 11.19%. The remaining area (18.34%) is shared among gardens, moderate quality forest land, good rangelands, flood crossings, low quality forests, and residential areas. Based on a lithology map ( Table 2) acquired from the Geological Society of Iran, the region is comprised of wetlands (60.56%), swamp and marsh (13.4%), grey shale, siltstone and sandstone (8.03%), and shale and calcareous (5.57 %), and the remaining (12.44%) is comprised of other formations (Table 1). Rock outcrops, entisols, entisols/inceptisols, and mollisols are the most common soils in the study area [59].
Remote Sens. 2020, 12, x FOR PEER REVIEW 4 of 29 an advanced land observing satellite (ALOS) phased-array type L-band synthetic aperture radar (PALSAR) digital elevation model (DEM) acquired from the Alaska Satellite Facility. More than half of the basin is a gently sloped portion of the Alborz Mountains. The average annual rainfall varies from 224 to 736 mm, and rainfall is highest in the southeastern section of the basin; meteorological data were acquired from the Islamic Republic of Iran Metrological Organization [58]. Based on a land use map (Table 1) prepared at the 1:100,000 scale by the Natural Resources Department of Golestan Province, moderate quality rangeland comprises the largest portion of the area (25.9%). Poor rangelands account for 19.09%, garden and dry land-farming for 14.25%, dry land-farming for 11.2%, and dense forests for 11.19%. The remaining area (18.34%) is shared among gardens, moderate quality forest land, good rangelands, flood crossings, low quality forests, and residential areas. Based on a lithology map (Table 2) acquired from the Geological Society of Iran, the region is comprised of wetlands (60.56%), swamp and marsh (13.4%), grey shale, siltstone and sandstone (8.03%), and shale and calcareous (5.57 %), and the remaining (12.44%) is comprised of other formations (Table 1). Rock outcrops, entisols, entisols/inceptisols, and mollisols are the most common soils in the study area [59].

Methodology
This study involves several steps ( Figure 2) and processes for creating the gully erosion susceptibility (GES) maps which includes: (i) To prepare the gully erosion inventory map and the GECFs dataset, 1042 gully head cut locations were identified using high-resolution images, field investigation, and global positioning system (GPS). Data for fourteen environmental factors identified from a literature review were compiled (data sources are described below). (ii) Multi-collinearity analysis among the GECFs using tolerance and variance inflation factor (VIF) techniques was done. (iii) The significance and effectiveness of GECFs was determined using the random forest (RF) model. (iv) GES maps were prepared with MaxEnt, ANN, SVM and GLM models. The ensemble models were prepared by combining sets of two, three and four models. (v) The performances of gully erosion susceptibility models (GESMs) were validated with the area under receiver operating characteristic curve (AUROC) and seed cell area index (SCAI) methods.
(iii) The significance and effectiveness of GECFs was determined using the random forest (RF) model. (iv) GES maps were prepared with MaxEnt, ANN, SVM and GLM models. The ensemble models were prepared by combining sets of two, three and four models.
The performances of gully erosion susceptibility models (GESMs) were validated with the area under receiver operating characteristic curve (AUROC) and seed cell area index (SCAI) methods.

Figure 2.
Flowchart of research in the study area. Figure 2. Flowchart of research in the study area.

Gully Erosion Inventory Map (GEIM)
The GEIM is the basic tool for preparing the GESMs. It consists of spatial datasets with several dependent factors. GEIM represents the spatial distribution of gullies and their geometries (size, shape, length, and width). Through historical and present-day distributions of gullies one can predict future probabilities of gully erosion in the region. Several studies have used GEIM to analyze gully erosion susceptibility [8,24,25,44,60]. ArcGIS 10.3 software was used to generate the GEIM of the Golestan Dam basin. High-resolution images from the Google Earth engine and the ALOS PALSAR DEM at a 12.5 × 12.5 m resolution were used to prepare the GEIM. Reports of 1042 historical gully formations were acquired from the Agricultural and Natural Resources Research Centre of Golestan Province and they were verified by extensive field surveys with handheld GPS. The inventoried sites were divided randomly into 729 training (70%) and 313 validation (30%) sites [19][20][21][22][23][24][25]. The dependent variable for gully formation was the presence or absence of gully and each pixel was coded binarily 1 or 0, respectively. An equal number of locations (pixels) without gullies (1042) were also selected for modeling [61]. The geometric characteristics of the gullies were also measured and recorded in the field [60] (Figure 3). The longest gully was 2.41 km and the shortest was 0.003 km. The deepest gully was 3.15 m and shallowest gully was 0.04 m. The widest gully was 21 m and the narrowest was 1.3 m.
were divided randomly into 729 training (70%) and 313 validation (30%) sites [19][20][21][22][23][24][25]. The dependent variable for gully formation was the presence or absence of gully and each pixel was coded binarily 1 or 0, respectively. An equal number of locations (pixels) without gullies (1042) were also selected for modeling [61]. The geometric characteristics of the gullies were also measured and recorded in the field [60] (Figure 3). The longest gully was 2.41 km and the shortest was 0.003 km. The deepest gully was 3.15 m and shallowest gully was 0.04 m. The widest gully was 21 m and the narrowest was 1.3 m.
GECFs' reciprocal relationships were assessed with a multi-collinearity test. The 14 GECFs were mapped in GIS (Figure 4a-n). Spatial resolution of each factor, however, was not the same. The base resolution of these maps was set by the PALSAR DEM resolution (12.5 × 12.5 m). All factors Remote Sens. 2020, 12, 1890 8 of 30 were converted to this resolution in ArcGIS 10.3. The preparation procedure for each factor is described below.
Remote Sens. 2020, 12, x FOR PEER REVIEW 8 of 29 ( Figure 4c). Rahamati et al. [78] used the RSP factor as a conditioning factor for groundwater potential mapping.

Hydrological Factors
As rainfall impacts drainage processes and collapses gully material [79], precipitation data were collected from the Metrological Department of Iran. The thin plate spile method, first formulated by Wahba, is an effective and accurate interpolation method. The modified orthogonal least squares of thin plate spline (TPS-M) was used to prepared the raster layer of the rainfall data collected from the different weather stations. This is more accurate method than the classical interpolation method. The detailed of the methodology could be found in the work of Boer et al. [80]. The rainfall amounts

DEM Derived Factors
The topographical and hydrological factors (elevation, slope, slope aspect, slope length, CI, TPI, TRI, topographical wetness index (TWI), stream power index (SPI), STI, distance to stream (DtS), and drainage density) were extracted from the ALOS PALSAR DEM following several steps and processes. The characteristic and morphology of the topographical and hydrological factors depends upon the accuracy and gridding techniques employed [69]. The efficiency of the gridding procedure determines the quality of several of the topographical and hydrological GECFs. The vertical accuracy of the DEM was 0.3 m and was used for all of these variables (a similar accuracy assessment was used in Gesch et al. [70]). Vertical accuracy was assessed by comparing elevation on the DEM with ground control points (GCPs). At each point, GCP elevations were subtracted from the DEM elevations at the locations in the GIS. The differences were identified as errors. Positive errors occurred where the DEM value was greater than the GCP, negative errors were for DEM values below GCP elevations. The mean error, root means square error, and standard deviation of the mean error were computed from these measured errors [70]. Interferometric synthetic aperture radar (InSAR) was used to produce the ALOS DEM, and this was analyzed by Zhou et al. [71] and Zhang et al. [72]. Phase measurement is the main step in InSAR for DEM creation and this was followed by the transformation of phase to height [71].
After processing the ALOS PALSAR DEM, the topographical and hydrological GECFs were created and measured in GIS with the SAGA GIS tool. The topographical factors influence erosive power, velocity of flow, and surface runoff [73]. Slope controls surface runoff and drainage, influencing gully formation [74]. Slope aspect is also an important GECF [75].
Nubre et al. [76] created the height above nearest drainage (HAND) as a new terrain model. The HAND model indicates topography according to the local relative heights in the drainage network. Kornejady et al. [77] used the HAND tool as a topo-hydrological factor for landslide susceptibility mapping. The ALOS PALSAR DEM was used in this study to create a DEM raster with a resolution of 10 m. For a series of computations, the DEM layer was used to construct a hydrologically consistent DEM, to define flow directions, and to delineate drainage channels. The mapped stream network (MSN) layers were acquired from the Iranian Department of Water Resources Management IDWRM [77]. The MSN was field survey assessed. The automated process for generating the HAND map was conducted in this case study, thus, the spatial data used in the HAND Tool derived from both the DEM and the MSN. When the user chooses the automated process, the "drainage network" area should become operational. The design of the Hand tool's user interface is valuable, as it allowed fast entry of layers and the choice of either the manual or automatic procedure. The relative slope position (RSP) was determined from the DEM using the SAGA GIS tool. The RSP value ranges from 0 to 1 (Figure 4c). Rahamati et al. [78] used the RSP factor as a conditioning factor for groundwater potential mapping.

Hydrological Factors
As rainfall impacts drainage processes and collapses gully material [79], precipitation data were collected from the Metrological Department of Iran. The thin plate spile method, first formulated by Wahba, is an effective and accurate interpolation method. The modified orthogonal least squares of thin plate spline (TPS-M) was used to prepared the raster layer of the rainfall data collected from the different weather stations. This is more accurate method than the classical interpolation method. The detailed of the methodology could be found in the work of Boer et al. [80]. The rainfall amounts ranged from 224 to 736 mm ( Figure 4). DtS and SD are essential hydrological components. DtS and SD determine the width of erosion in rills and gullies [81]. The drainage network was derived from the DEM with the automatic ArcHydro tool in the GIS. Distances from locations (pixels) to streams was measured with the Euclidean distance tool. The drainage density was determined by Horton's formula on a 1 km 2 grid. The highest drainage density was measured as 1.39 km/km 2 ( Figure 4k).

Environmental Factors
The distance to road (DtR) is important because roads affect runoff and influences the formation of gullies [82]. A road map at a scale of 1:100,000 was acquired from the National Geographic Organization of Iran [19][20][21][22][23][24][25]. Euclidean distance buffering was used to measure distances to roads. Distances ranged from 0 to 20,477 m ( Figure 4j). The road density map was created using the line density method. Road densities in the basin range from 0 to 0.56 km/km 2 ( Figure 4l). Geomorphology, LULC, NDVI, and lithology are regarded as significant to GES [83]. The geological map of the study area was acquired from the Geological Society of Iran [84]. The lithology map was digitized in GIS from the map at a scale of 1:100,000. Geologically, the study area is composed of eight geological segments and units: Kat, Qsw, Ksn, Ekh, Qm, Ksr, Jmz, and Jl ( Figure 4n). LULC contributes directly or indirectly by influencing evapotranspiration, infiltration, run-off, and sediment dynamics [85]. By comparison, many agricultural practices promote gully erosion and genesis [86]. The land use map was created from the combination of a map of 1:100,000 scale created by the Golestan Province Department of Natural Resources and images acquired from Google Earth. A total of 346 ground control points (GCPs) were selected for validation using the kappa index. The obtained kappa coefficient for the prepared map was 0.924, indicating its high accuracy. The land use types include poor range, dry farming-garden, good range, moderate forest, low forest, residential areas, moderate range, dense forest, agriculture, dry farming, flood crossings, and gardens (Table 1 and Figure 4).

Multi Collinearity Analysis
The GECFs were tested for the collinearity using tolerance (TOI) and the variance inflation factor (VIF) [19][20][21][22][23][24]62]. The TOI and VIF methods indicate whether there are linear relationships between GECF pairs. The linear relationship depends on the threshold range value of TOI and VIF. The values <10 for VIF and >0.1 for TOI, suggest no collinearity problems among the GECFs [62]. A multi-collinearity problem exists when two and more factors are highly correlated. Arabameri et al. [19], Saha et al. [58], and Roy and Saha [44] used this method for GES assessment. It is a vital test for subsequent modeling phases.

Methods
One statistical probability method (GLM), one artificial intelligence method (ANN), and two machine learning models (SVM and MaxEnt) were chosen to prepare GESMs. A number of ensemble models were created by combining these models in stepwise combinations (ensembles of two-, three-, and four-models) to determine which was best.

Maximum Entropy (MaxEnt)
MaxEnt is a presence-only feature and machine-learning model [87]. The presence-only feature is important for modeling as it is more reliable for remote areas [88]. On the other hand, the model is biased when exposed to data from nearby accessible areas [89]. MaxEnt identifies the existence probabilities of locations where a phenomenon is present based on the information theory and statistics [90]. To evaluate the uncertain distribution of probability it consists of set geo-environmental variables. MaxEnt assumes that the probability of occurrence for a phenomenon (here gully erosion) is equal at the all pixels so that it selects the uniform probability distribution function (PDF) as the target distribution. However, certain restrictions compel this PDF to circulate the true target. These constraints are determined by the GECF data that were used to map GES. An interchangeable equation of those factors, called the features was used in the MaxEnt [91]. Specific variables form the features that force MaxEnt's first conjecture to have those constraints. A continuous layer, such as distance to streams for example, creates a linear function that imposes a limit on the numerical average of factor values in the locations where the phenomenon is present. If the mean value of the factor at the presence points is, for instance, equal to 102 m, MaxEnt will consider the locations close to this number as highly susceptible areas. Specific types of features are contained in continuous variables, specifically quadratic, product, threshold, categorical, k binary variables [92]. The maximum entropy would be the final best estimation that satisfies all constraints [93,94]. All the mathematical details of the MaxEnt model can be found in Phillips et al. [87,92] and Elith et al. [91].

Artificial Neural Network (ANN)
The human brain is the inspiration for ANN [95]. ANN has several algorithms that can analyze and predict the nonlinear properties of a phenomenon [96]. Multi-layer perceptron (MLP) is one of the popular algorithms of the ANN model [97]. An ANN consists of the nodes of three layers: the input layer, hidden layers, and the output layer. The nodes in hidden layers measure the information inside the data when the input layer is not sufficiently involved and responsive to do so. GECFs and gully Remote Sens. 2020, 12, 1890 12 of 30 erosion training results connect the input layer to the output layer. The input and hidden layers then systematically strive to simplify and predict the structures by embracing the knowledge from the input nodes and working with dynamic functions [98].
The number of input and output nodes is defined by a structured code, where nodes are equal to GECFs and output nodes are equal to a Boolean value (0 and 1) for each pixel: 1 indicates probable gully erosion and 0 probably not gully erosion. The trials and errors are determined by the number of hidden layers or nodes [99]. The application of and detailed information about the ANN model is discussed in Arora et al. [100].

Support Vector Machine (SVM)
SVM is a supervised machine learning classifier introduced by Vapnik and Chervonenkis in 1963, that depends on the principle of statistical learning [101]. SVM can be used for classification and regression [102]. It consists of several types of classification functions and can be used to analyze errors and to generalize information that requires the least amount of model tuning [103]. To limit the inherent complexity in the behaviors of phenomena, these processes use stored information inside factors in many iterations [104]. Typically, the hyper-plane with the highest margin should provide the strongest classification performance [105]. A strong hyper-plane between objects is to be much greater than the problem's theoretical vision, when facing actual noisy artifacts. To assign training data around the margin an appropriate methodological error (total gap between the margin and training points), a soft hyper-plane is configured [106]. Subsequently, the rise in the uncertainty of the model contributes to a growth in the complexity of the model and to become the least suited model, thus leading to a decline in generalization, and vice versa. A standardized n-dimensional hyper-plane solution with a maximum distance that is rather complicated and that also has few training errors should thus be found [107][108][109][110].

General Linear Model (GLM)
The GLM was derived from the classic linear-regression model platform extension [111]. The GLM applies a common regression structure to non-normal distributions [112]. The statistical function that is known as LOGIT for the GLM model was computed by the Equation (1) [113,114]. GLM is used when there are n independent observations, Y is a function of explanatory variables, X 1 , X 2 , ..., X n (these can be either continuous or categorical), and observations derive from any exponential family distribution.

Measuring the Importance of GECFs by RF
RF is a non-parametric multivariate model [115]. RF models generate thousands of trees, developing 'forests' based on decision rules. Each tree in the RF model depends on a sample of bootstrapped data, applying a CART process with a random subset of variables used at each node. The final decision of class membership and output is made according to majority voting among all decision trees [116]. The RF model was used in this study to evaluate the relative importance of the variables. RF is a machine learning model developed by Breiman [115]. More details of this model can be found in Breiman et al. [116] and Calle et al. [117,118].

Receiver Operating Characteristics (ROC)
The ROC curve is a popular validation method for assessing a model's performance [119]. The area under curve (AUC) of ROC measures the discriminatory capacity of a classification model [120]. This method was widely applied for accuracy assessment of numerous assessments of natural hazards [121][122][123][124][125]. The ROC curve is used to test the efficiency of a classifier system over the spectrum of cut-off points [126]. The ROC curve is two-dimensional because there are two terms (events and non-events) in the modeling of precision assessment, and therefore, two types of possible accuracy [127]. The first aspect is the success rate of event identification (shown along the Y-or vertical axis). The ROC is a graph which plots the sensitivity (Y-axis) against the 1-specificity (X-axis) using the following Equations (2) and (3): where TP is true positive, FN is false negative, TN is true negative, FP is false positive. P is the total number of gullies and N is the total number of non-gullies.
The area under curve (AUC) indicates the predictive capacity of the model. The threshold values of AUC range from 0.5 to 1 (1 indicates the best performance and 0.5 indicates poor performance) [128]. AUC values have been classified into four levels of accuracy: poor (AUC = 0.6 to 0.7), fair (AUC = 0.7 to 0.8), good (AUC = 0.8 to 0.9) and excellent (AUC = 0.9 to 1) [129].
ROC curves were plotted with both the training gullies points and validation gullies points. The sizes of the training and validation gully sets are proportional and relevant. A 70:30% split is common and widely used, for instance in gully erosion susceptibility [19][20][21][22][23][24][25], landslide susceptibility [121], flood susceptibility [122], groundwater potential [130], and land subsidence susceptibility modeling [123][124][125]. Other sample ratios have been tested: 80:20 [131], 70:30 [132], and 50:50 [133]. There are no absolute rules or standard methods for partitioning of samples, but the decision reflects the user's and experts' opinions and goals. A model's performance depends on the partitions of the samples in the inventory datasets. The sample partitions make a suitable result when the larger percentage of the training data are selected for modeling and the smaller percentage of validation data are used to test the accuracy of the models. Therefore, a 70:30 ratio of the inventory datasets was used to train the GES models and to validate them.

Seed Cell Area Index (SCAI)
The SCAI is the ratio of gullies' pixels to each model's classes of susceptibility. Arabameri et al. [19] states that high SCAI values indicate good model accuracy and low values indicate poor performance. This method has been used to assess the predictive accuracies of models' susceptibility maps.

Creating Ensemble Models
Ensemble modeling combines two or more single models into a composite model to enhance the reliability of predictive power [134]. This technique has received significant interest among the experts, especially those who use data-mining and machine-learning models [135]. In all ensemble strategies (e.g., bagging and boosting), individual models are weighted. However, there is a diversity of methods to calculate the weights. This study used the heterogeneous group, an integration system that includes addition, subtraction, multiplication, and division when an expanded method was developed for further analysis. The weighted mean function was used to construct the ensemble structures of the three models using Equation (5): where EM is the resulting ensemble model, AUSRC i is the AUSRC value of the i th model (M i ).

Multi-Collinearity Analysis (MA)
The multi-collinearity test among GECFs was performed in SPSS 17 statistical software. The GECFs in this study achieved tolerance and variance inflation factor (VIF) values below the upper and lower limits, respectively (Table 3). There are no multicollinearity problems among the GECFs. Therefore, all GECFs were suitable for gully erosion modelling.

Gully Erosion Modeling with Individual Models
GESMs were produced using the MaxEnt, ANN, SVM, and GLM models. The GESMs were categorized into very low, low, moderate, high, and very-high susceptibility zones based on Jenks' natural break classification method. The SVM model categorized 16.01% of the basin as very-high GES. The MaxEnt assigned this class to 14.32%, ANN assigned 14.15%, and GLM 14.47% ( Figure 5). The classifications of very low susceptibility areas were 33.11% by MaxEnt, 56.30% by ANN, 39.99% by SVM, and 33.86% by GLM.

Gully Erosion Modeling by Ensemble of Two Models
Combining two models provides better performance for GES mapping. Pourghasemi, et al. [45] used ensemble machine learning algorithms GESM. Their results demonstrated that the ensemble model achieved the highest predictive performance. We used six two-model ensembles: GLM-MaxEnt, GLM-ANN, GLM-SVM, MaxEnt-ANN, MaxEnt-SVM, and ANN-SVM. The GESMs produced by these ensemble models were also classified into the five susceptibility classes. The GLM-MaxEnt model classified 13.52% of the basin as very-high GES. MaxEnt-ANN assigned very-high to 23.76%, the highest among these ensemble models, and GLM-ANN assigned 12.91%, the lowest among these models ( Figure 6). All ensemble models indicated that the central and northeastern parts of the basin have very-high GES.

Gully Erosion Modeling by Ensemble of Two Models
Combining two models provides better performance for GES mapping. Pourghasemi, et al. [45] used ensemble machine learning algorithms GESM. Their results demonstrated that the ensemble model achieved the highest predictive performance. We used six two-model ensembles: GLM-MaxEnt, GLM-ANN, GLM-SVM, MaxEnt-ANN, MaxEnt-SVM, and ANN-SVM. The GESMs produced by these ensemble models were also classified into the five susceptibility classes. The GLM-MaxEnt model classified 13.52% of the basin as very-high GES. MaxEnt-ANN assigned very-high to 23.76%, the highest among these ensemble models, and GLM-ANN assigned 12.91%, the lowest among these models ( Figure 6). All ensemble models indicated that the central and northeastern parts of the basin have very-high GES.

Gully Erosion Modeling by Ensemble of Three and Four Models
Two-model ensembles are the models most typically used [19][20][21][22][23][24][25]45]. The application of threeor four-model ensembles has been rare. Arabameri et al. [49] applied three-model ensembles of COPRAS-FR-RF, COPRAS-FR-BRT, and COPRAS-FR-LR to map GES. These ensembles included a multi-criteria decision approach model (COPRAS), two probabilistic models (FR and LR), and two machine learning models (RF and BRT). In this study, three-and four-model ensembles were used (Figure 7). The ensembles comprised of two machine learning models have been found to be effective and accurate for the prediction and mapping of natural hazards.

Gully Erosion Modeling by Ensemble of Three and Four Models
Two-model ensembles are the models most typically used [19][20][21][22][23][24][25]45]. The application of threeor four-model ensembles has been rare. Arabameri et al. [49] applied three-model ensembles of COPRAS-FR-RF, COPRAS-FR-BRT, and COPRAS-FR-LR to map GES. These ensembles included a multi-criteria decision approach model (COPRAS), two probabilistic models (FR and LR), and two machine learning models (RF and BRT). In this study, three-and four-model ensembles were used (Figure 7). The ensembles comprised of two machine learning models have been found to be effective and accurate for the prediction and mapping of natural hazards. The GLM-MaxEnt-ANN model classified 13.10% of basin as very-high susceptibility, 10

Assessing the Importance of the Factors
Evaluation of the significance of each GECF provides planners important information that can guide sustainable natural resource use [10,20]. The spatial correlations of the GECFs to gullies were calculated with the RF model in this study (Table 4). Others have used RF techniques for factorimportance analysis [20,58]. The results indicate that of greatest importance was distance to road (RF = 19.23). This factor was followed by LULC, HAND, rainfall, valley depth, distance to stream, slope length, stream density, aspect, elevation, geology, slope, and RSP. GES was higher on rangeland than on non-agricultural lands and on areas nearer roads. Higher gulley erosion was also associated with high stream density, HAND, and lower distances to streams. Elevation has direct and indirect effects on hydrological properties and soil moisture and these may influence gully erosion potential. Overall, gullying susceptibility was highest on rangelands that had higher densities of roads and streams.

Assessing the Importance of the Factors
Evaluation of the significance of each GECF provides planners important information that can guide sustainable natural resource use [10,20]. The spatial correlations of the GECFs to gullies were calculated with the RF model in this study (Table 4). Others have used RF techniques for factor-importance analysis [20,58]. The results indicate that of greatest importance was distance to road (RF = 19.23). This factor was followed by LULC, HAND, rainfall, valley depth, distance to stream, slope length, stream density, aspect, elevation, geology, slope, and RSP. GES was higher on rangeland than on non-agricultural lands and on areas nearer roads. Higher gulley erosion was also associated with high stream density, HAND, and lower distances to streams. Elevation has direct and indirect effects on hydrological properties and soil moisture and these may influence gully erosion potential. Overall, gullying susceptibility was highest on rangelands that had higher densities of roads and streams.

Validation of the Models
The GES model's performances were evaluated with the AUC of ROC and SCAI methods applied to the training and validation datasets. The training dataset produces the success rate curve (SRC) and the testing dataset the prediction rate curve (PRC). According to AUC of the SRC the ANN-SVM ensemble achieved the highest accuracy (0.948) (Figure 8). In order of success were models using ANN,  (Figure 9d-f). The AUCs of both the SRC and PRC indicate that all models achieved very good to excellent performance for GESM. Yet, though the ANN-SVM ensemble was superior in both assessments, the success and prediction rates of the other were less consistent. The AUC of SRC value of the ANN model was 0.946 (second best among the models), but the AUC of the PRC was 0.906 (seventh best).

Validation of the Models
The GES model's performances were evaluated with the AUC of ROC and SCAI methods applied to the training and validation datasets. The training dataset produces the success rate curve (SRC) and the testing dataset the prediction rate curve (PRC). According to AUC of the SRC the ANN-SVM ensemble achieved the highest accuracy (0.948) (Figure 8). In order of success were models using  (Figure 9d-f). The AUCs of both the SRC and PRC indicate that all models achieved very good to excellent performance for GESM. Yet, though the ANN-SVM ensemble was superior in both assessments, the success and prediction rates of the other were less consistent. The AUC of SRC value of the ANN model was 0.946 (second best among the models), but the AUC of the PRC was 0.906 (seventh best). SCAI was also used to evaluate prediction performance. The SCAI for all the individual and ensemble models indicated that all models achieved at least very good accuracy ( Figure 10). The susceptibility classifications of all 15 models were evaluated. The SCAI values decreased from as the susceptibility class increased ( Figure 11).   SCAI was also used to evaluate prediction performance. The SCAI for all the individual and ensemble models indicated that all models achieved at least very good accuracy ( Figure 10). The susceptibility classifications of all 15 models were evaluated. The SCAI values decreased from as the susceptibility class increased (Figure 11).

Discussion
Gully erosion is a geomorphic process but anthropogenic activities accelerate erosion and create imbalances in environmental mechanisms. Borrelli et al. [136] and Azareh et al. [33] concluded that gully erosion is a complex hazard that can interrupt economic growth, cause infrastructural collapse, and disrupt ecosystems. This study used novel machine learning ensembles comprised of MaxEnt, ANN, SVM, and GLM models to produce gully erosion susceptibility models (GESMs). These were evaluated using ROC curves and the SCAI. GESMs can be useful to governmental institutions, especially as they can improve decision making for gully erosion mitigation [19][20][21][22]. Several GESMs have been developed over the last few years and scholars have been searching for the best combinations of models and methods [19][20][21][22][23][24][25]45].

Discussion
Gully erosion is a geomorphic process but anthropogenic activities accelerate erosion and create imbalances in environmental mechanisms. Borrelli et al. [136] and Azareh et al. [33] concluded that gully erosion is a complex hazard that can interrupt economic growth, cause infrastructural collapse, and disrupt ecosystems. This study used novel machine learning ensembles comprised of MaxEnt, ANN, SVM, and GLM models to produce gully erosion susceptibility models (GESMs). These were evaluated using ROC curves and the SCAI. GESMs can be useful to governmental institutions, especially as they can improve decision making for gully erosion mitigation [19][20][21][22]. Several GESMs have been developed over the last few years and scholars have been searching for the best combinations of models and methods [19][20][21][22][23][24][25]45].
The Golestan Dam basin, encompassing most of Golestan Province, Iran, was chosen as the study site in which severe gully erosion can be assessed with individual and ensemble predictive models. Unique among GESM studies, this study tested and evaluated the results from two-, three-, and four-model ensembles. A gully erosion inventory data set was created and used in a now-common [19][20][21][22][23][24][25]44,[58][59][60] 70:30 random partition to calibrate the models and evaluate the results, respectively. GECFs that are useful in one study area may not be suitable other regions, therefore, factors were selected based on a review of literatures describing GES models [45,58]. The literatures helped to identify fourteen geo-environmental factors-elevation, slope, slope aspect, rainfall, geology, relative slope position, distance to stream, distance to road, drainage density, road density, slope length, HAND-that were used to map GES in the study area [19][20][21][22][23][24][25]44,[58][59][60]78]. After modeling, the contributions of each of the GECFs were assessed using the RF method [58]. The results indicated that the DtR and rainfall were the most important predictive variables among the GECFs. Using Jenks' natural break method, the GES maps were classified into five susceptibility classes. The GESMs classified 12.91%, to 16.01% of the study area as very-highly susceptible to gully erosion.
It is very difficult to avoid the problem of overfitting because non-gully pixels were chosen in a 1:1 ratio with the gully-present pixels, even though the actual number non-gully pixels is greater than the number of gully pixels. Furthermore, the training and testing datasets were based on a sampling ratio of 70:30 without testing the accuracy of the sampling ratio. Additionally, despite the multicollinearity test, noise may remain among the GECFs. The key benefit of machine learning algorithms is that they optimize the searches of several datasets from which valuable information is collected. Such methods may compensate for assumptions and can aid the formulation of strategies and the statistical processing of large datasets. Despite several drawbacks of machine learning, this study can help scholars create new models to improve performance not only for GES modelling, but for modelling of other hazards like floods, landslides, and others.
The distribution of GES in the Golestan Dam basin was successfully mapped by all individual and ensemble models. SVM-based models are very capable of dealing with non-linear and high-dimensional grouping problems by use of the kernel feature and the adding of slack parameters [137]. However, there may still be overestimation in SVM's GES model [138]. Some methods can be paired with SVM to boost the efficiency of GES assessments, which may achieve better results than other models [138]. The ANN-SVM ensemble achieved the highest accuracy, resembling the results in Chen et al. [46] and Pourghasemi et al. [45].

Models Prioritization
Comparing and appraising the machine learning models' results was a key objective of this study. The GESMs were constructed using the MaxEnt, ANN, GLM, and SVM models and eleven ensembles of these models. Model prioritization is an innovative and creative technique to evaluate model performance to select the best model. The prioritization process has been used in drainage morphometrics studies as sub-basin prioritization [139]. A similar approach was used here. The individual models, and the two-, three-, and four-model ensembles were used to produce fifteen GESMs. To rate the models, the AUC of SRC and the AUC of PRC were calculated and for each the models were ranked (i.e., prioritized) ( Figure 12

Conclusions
The Golestan Dam basin is one of the most gully-prone regions in Golestan Province owing to the combined impacts of GECFs. Modelling of GES is essential for the management of this destructive phenomenon in the Golestan Dam catchment. The broad goal of models is to provide more accessible and meaningful soil information to decision-makers based on the cutting-edge knowledge in this field. Communities in the Golestan Dam basin are facing soil erosion rates that threaten agriculture and local infrastructure. GESMs were developed from four standalone models (MaxEnt, ANN, SVM, and GLM). They were integrated into two-, three-, and four-model ensembles (GLM-MaxEnt, GLM-ANN, GLM-SVM, MaxEnt-ANN, MaxEnt-SVM, GLM-MaxEnt-ANN, GLM-MaxEnt-SVM, MaxEnt-ANN-SVM, ANN-SVM-GLM and ANN-SVM-GLM-MaxEnt). The estimations of GES from all of the models were categorized into five levels. AUC of the ROC curve and SCAI methods were used to evaluate the ensemble accuracies of the models' mapped predictions of GES. All models achieved excellent results. Among the 15 GES models, ANN-SVM produced the highest predictive capability and was selected as the best model for these purposes in this basin. GESMs are efficient and effective measures for the potential of gully erosion in this area. The government institutions in Iran can employ the ANN-SVM GESM tool to prevent formation of gullies and to mitigate soil erosion. To prevent soil erosion in the areas of highest GES, more intensive measures need to be implemented. Community outreach and public education are vital to soil conservation and management. Thus, different scholars, engineers, and decision makers have important roles in the quest for sustainable management in these areas that are susceptible to soil erosion and particularly to the formation of gullies.

Conclusions
The Golestan Dam basin is one of the most gully-prone regions in Golestan Province owing to the combined impacts of GECFs. Modelling of GES is essential for the management of this destructive phenomenon in the Golestan Dam catchment. The broad goal of models is to provide more accessible and meaningful soil information to decision-makers based on the cutting-edge knowledge in this field. Communities in the Golestan Dam basin are facing soil erosion rates that threaten agriculture and local infrastructure. GESMs were developed from four standalone models (MaxEnt, ANN, SVM, and GLM). They were integrated into two-, three-, and four-model ensembles (GLM-MaxEnt, GLM-ANN, GLM-SVM, MaxEnt-ANN, MaxEnt-SVM, GLM-MaxEnt-ANN, GLM-MaxEnt-SVM, MaxEnt-ANN-SVM, ANN-SVM-GLM and ANN-SVM-GLM-MaxEnt). The estimations of GES from all of the models were categorized into five levels. AUC of the ROC curve and SCAI methods were used to evaluate the ensemble accuracies of the models' mapped predictions of GES. All models achieved excellent results. Among the 15 GES models, ANN-SVM produced the highest predictive capability and was selected as the best model for these purposes in this basin. GESMs are efficient and effective measures for the potential of gully erosion in this area. The government institutions in Iran can employ the ANN-SVM GESM tool to prevent formation of gullies and to mitigate soil erosion. To prevent soil erosion in the areas of highest GES, more intensive measures need to be implemented. Community outreach and public education are vital to soil conservation and management. Thus, different scholars, engineers, and decision makers have important roles in the quest for sustainable management in these areas that are susceptible to soil erosion and particularly to the formation of gullies.