A Methodological Comparison of Three Models for Gully Erosion Susceptibility Mapping in the Rural Municipality of El Faid (Morocco)

Erosion is the main threat to sustainable water and soil management in Morocco. Located in the Souss-Massa watershed, the rural municipality of El Faid remains an area where gully erosion is a major factor involved in soil degradation and flooding. The aim of this study is to predict the spatial distribution of gully erosion at the scale of this municipality and to evaluate the predictive capacity of three prediction methods (frequency ratio (FR), logistic regression (LR), and random forest (RF)) for the characterization of gullying vulnerability. Twelve predisposing factors underlying gully formation were considered and mapped (elevation, slope, aspect, plane curvature, slope length (SL), stream power index (SPI), composite topographic index (CTI), land use, topographic wetness index (TWI), normalized difference vegetation index (NDVI), lithology, and vegetation cover (C factor). Furthermore, 894 gullies were digitized using high-resolution imagery. Seventy-five percent of the gullies were randomly selected and used as a training dataset, whereas the remaining 25% were used for validation purposes. The prediction accuracy was evaluated using area under the curve (AUC). Results showed that the factor that most contributed to the prevalence of gullying was topographic (slope, CTI, LS). Furthermore, the fitted models revealed that the RF model had a better prediction quality, with the best AUC (91.49%). The produced maps represent a valuable tool for sustainable management, land conservation, and protecting human lives against natural hazards (floods).


Introduction
Issues related to climatic hazards and water management have major importance given their impacts on the rural environment [1,2]. In the climate change context, there is evidence of more and more recurrent extreme climatic events. Several findings highlight global warming trends associated with reduced rainfall and vigorous hydrological cycles over most of Earth's surface [3]. In Morocco, climate change will produce more frequent and intense extreme events [4,5], which will lead to an increased probability of floods and droughts [6]. Land-use pattern change, rapid global population growth, and careless use of natural resources will intensify pressure on both the land and natural resources, with consequences with regard to the ability of the fragile ecosystems to sustain the provision of ecosystem services, in turn threatening human lives and property [7][8][9][10]. Thus, preventing further land degradation is a challenging task. Water erosion is one of the most frequent soil degradation processes in Morocco, especially in semi-arid areas [11,12]. The process consists of topsoil particle loss due to erosion agents that destroy agricultural lands and forest ecosystems and increase nutrient loss from the rich surface soil [13]. As an extreme form of soil erosion, gully erosion is one of the most visible and critical types of land machine learning models to identify areas where gullies occurred. While each approach offers their own advantages, comparative studies of susceptibility mapping with these three approaches represent a research gap in the field. This paper presents the material and methods by reporting on the study area (Section 2.1), the modeling approach (Section 2.2.2.), optimization and tuning of the model (Section 2.2.3.), and susceptibility model validation (Section 2.2.4.). The results will focus on defining the spatial relationship between the presence of gullies and the 12 factors causing a predisposition to gully erosion. The predictive capacities of the frequency ratio (FR), logistic regression (LR) and random forest (RF) were compared based on their predictive quality. Adopting these methods will be useful to decision-makers for identifying vulnerable areas for future land management measures in similar Mediterranean environments.

Study Area
The rural municipality of El Faid covers an area of 51,200 ha. It is located between the High Atlas and Anti-Atlas Mountains (between latitudes 30 • 26 and 30 • 39 North, and longitudes 8 • 24 and 8 • 4 West; Figure 1) in the center of the Souss basin, more precisely under the Souss River. The area is crossed by many seasonal streams (Tadenssat, Takenatine, Ait El Hssine, Aourir, Ait Oualil, Imaouen, etc.) which are characterized by a torrential nature during heavy rains that causes considerable damage to infrastructure, especially roads.
is dependent on conservative land management that is based on an assessment of the susceptibility to gullying and rational identification of appropriate conservative actions. Moroccan literature reveals rare or non-existent studies on gully erosion [39]. To contribute to the evaluation of gully erosion susceptibility, our study was focused on the rural municipality scale (El Faid). It aimed at comparing the predictive accuracy of bivariate, multivariate, and machine learning models to identify areas where gullies occurred. While each approach offers their own advantages, comparative studies of susceptibility mapping with these three approaches represent a research gap in the field. This paper presents the material and methods by reporting on the study area (Section 2.1), the modeling approach (Section 2.2.2.), optimization and tuning of the model (Section 2.2.3.), and susceptibility model validation (Section 2.2.4.). The results will focus on defining the spatial relationship between the presence of gullies and the 12 factors causing a predisposition to gully erosion. The predictive capacities of the frequency ratio (FR), logistic regression (LR) and random forest (RF) were compared based on their predictive quality. Adopting these methods will be useful to decision-makers for identifying vulnerable areas for future land management measures in similar Mediterranean environments.

Study Area
The rural municipality of El Faid covers an area of 51,200 ha. It is located between the High Atlas and Anti-Atlas Mountains (between latitudes 30°26′ and 30°39′ North, and longitudes 8°24′ and 8°4′ West; Figure 1) in the center of the Souss basin, more precisely under the Souss River. The area is crossed by many seasonal streams (Tadenssat, Takenatine, Ait El Hssine, Aourir, Ait Oualil, Imaouen, etc.) which are characterized by a torrential nature during heavy rains that causes considerable damage to infrastructure, especially roads.  The average elevation is 864 m. More than 30% of the study area is above 1000 m. The southern peaks culminate at more than 1500 m. El Faid consists of two contrasting areas: (1) Mountains (elevation exceeding 700 m), with more than 30% of the municipality surface characterized by a steep slope (exceeding 50%) constituting the upstream areas where the gullies that feed the main streams are located; and (2) The flat plain (the northern area) where most of the streams meet, representing the dynamic center with important Sustainability 2021, 13, 682 4 of 30 equipment and infrastructures. Thus, this area is key as it contains all the administrative establishments of the municipality.
The climate is arid and semi-arid. The average temperature is around +28 • C. Rainfall is characterized by spatial and temporal variability, with an average of 200 mm/year (at the Aoulouz Meteorological station).
As a part of the Anti-Atlas, the rural community of El Faid is composed of dolomitic limestones and shales of the Paleozoic, including massifs of Precambrian rocks [61]. The soils of the municipality are favorable to intensive agriculture, involving fruit trees such as citrus fruit and olive trees. The primary sector is therefore highly favored but it nevertheless suffers due to irregular climatic conditions, especially with regard to rainfall. The municipality shelters a large forest area consisting mainly of argan, juniper, and carob trees.

Methods
Susceptibility is defined as the probability of the spatial occurrence of a phenomenon based on the relationship between the distribution of the studied phenomenon and a set of geo-environmental factors controlling it [34]. The scientific literature includes a large number of models that have been used to describe and evaluate gully growth [62]. Forecasting susceptibility to gullying takes into account a number of interrelated steps ( Figure 2). These steps consist of preparing data to be used for the modeling process, developing the training model, and validating the adjusted model.
The average elevation is 864 m. More than 30% of the study area is above 1000 m. The southern peaks culminate at more than 1500 m. El Faid consists of two contrasting areas: (1) Mountains (elevation exceeding 700 m), with more than 30% of the municipality surface characterized by a steep slope (exceeding 50%) constituting the upstream areas where the gullies that feed the main streams are located; and (2) The flat plain (the northern area) where most of the streams meet, representing the dynamic center with important equipment and infrastructures. Thus, this area is key as it contains all the administrative establishments of the municipality.
The climate is arid and semi-arid. The average temperature is around +28 °C. Rainfall is characterized by spatial and temporal variability, with an average of 200 mm/year (at the Aoulouz Meteorological station).
As a part of the Anti-Atlas, the rural community of El Faid is composed of dolomitic limestones and shales of the Paleozoic, including massifs of Precambrian rocks [61]. The soils of the municipality are favorable to intensive agriculture, involving fruit trees such as citrus fruit and olive trees. The primary sector is therefore highly favored but it nevertheless suffers due to irregular climatic conditions, especially with regard to rainfall. The municipality shelters a large forest area consisting mainly of argan, juniper, and carob trees.

Methods
Susceptibility is defined as the probability of the spatial occurrence of a phenomenon based on the relationship between the distribution of the studied phenomenon and a set of geo-environmental factors controlling it [34]. The scientific literature includes a large number of models that have been used to describe and evaluate gully growth [62]. Forecasting susceptibility to gullying takes into account a number of interrelated steps ( Figure  2). These steps consist of preparing data to be used for the modeling process, developing the training model, and validating the adjusted model.

Used Data
Gully erosion is a complex natural phenomenon. Modeling its occurrence probability remains a challenging task. Such a process relies on the use of a set of predisposing factors for this phenomenon. Twelve factors were used as predictors for this work: elevation, slope, aspect, plan curvature, slope length factor (LS), lithology, stream power index (SPI), compound topographic index (CTI), land use, topographical wetness index (TWI), normalized difference vegetation index (NDVI), and vegetation cover (C factor). Once the probability of occurrence of the phenomenon on a spatial scale is established, the model validation phase is essential in order to assess its performance and predictive qualities. The training gully dataset was acquired through on-screen digitizing using highresolution imagery from Google Earth. Further, several field prospecting surveys were carried out to complete and correct them. Thus, the spatial gully dataset comprised 893 ravines ( Figure 3). The dataset was checked and was verified to ensure topological consistency for its use. Those gullies were randomly split into two subsets: the training dataset (669 gullies), and the validation dataset (224 gullies; 25%). The gully centroids were extracted with the intention of representing gully conditions. remains a challenging task. Such a process relies on the use of a set of predisposing factors for this phenomenon. Twelve factors were used as predictors for this work: elevation, slope, aspect, plan curvature, slope length factor (LS), lithology, stream power index (SPI), compound topographic index (CTI), land use, topographical wetness index (TWI), normalized difference vegetation index (NDVI), and vegetation cover (C factor). Once the probability of occurrence of the phenomenon on a spatial scale is established, the model validation phase is essential in order to assess its performance and predictive qualities.
The training gully dataset was acquired through on-screen digitizing using high-resolution imagery from Google Earth. Further, several field prospecting surveys were carried out to complete and correct them. Thus, the spatial gully dataset comprised 893 ravines ( Figure 3). The dataset was checked and was verified to ensure topological consistency for its use. Those gullies were randomly split into two subsets: the training dataset (669 gullies), and the validation dataset (224 gullies; 25%). The gully centroids were extracted with the intention of representing gully conditions. Topographical variables were derived from the digital elevation model DEM of 12.5 × 12.5 m pixel size resolution using common GIS functions. Lithology was extracted from Marrakech geology map; land use, NDVI, and factor C were derived from the satellite images of Sentinel-2 ( Table 1). The land use map accuracy was evaluated through a confusion matrix with an overall accuracy of 76.8% and a kappa coefficient of 66.1% (Table  2).  Topographical variables were derived from the digital elevation model DEM of 12.5 × 12.5 m pixel size resolution using common GIS functions. Lithology was extracted from Marrakech geology map; land use, NDVI, and factor C were derived from the satellite images of Sentinel-2 ( Table 1). The land use map accuracy was evaluated through a confusion matrix with an overall accuracy of 76.8% and a kappa coefficient of 66.1% (Table 2).  Topographic factors play an essential role in the explanation of the natural hazard of gully erosion [27,63]. The slope gradient affects surface runoff and drainage density [34,64,65]. The slope gradient map was generated from the DEM and was classified into five classes ( Figure 4b).
By controlling the duration of sunshine, humidity, and evapotranspiration, the aspect exerts a control on the vegetation that affects linear erosion processes [66]. The aspect map was generated from the DEM and classified into nine classes ( Figure 4c).
The curvature plan is considered to be a useful geomorphological land description parameter; it can be used to represent the spatial variability in diverging and converging overland flow of water [65,67]. The curvature map has been classified into three classes ( Figure 4d). Negative curvature plan values indicate concavity, positive values define convexity, and zero values indicate a flat surface. The LS and C factors used to assess vulnerability to erosion were used as predisposing factors for gully erosion. The formula used for the determination of factor C [68] using the NDVI provides information on vegetation cover and its health The formula is described as follows: where NDVI = (NIR − Red)/(NIR + Red).
where As is specific catchment area (m) and β is the slope gradient (°). The TWI map was classified into four classes (Figure 4h).
Soil surface properties are regarded as significant to gully erosion susceptibility [72]. They are commonly considered as an effective factor to evaluate the gully erosion process [73][74][75]. Moreover, the lithological characteristics of the surface materials play an important role in the soil's resistance to erosion, infiltration, and runoff, and consequently in the development of gullies [14,[76][77][78]. The study area contains two groups of rocks. According to their resistance to the splash effect and to friction by the flowing water, the rock types were classified by degree of friability according to Table 3. To take into consideration human influences, land use was added to the factors controlling gullying. The land use map was developed by classifying Sentinel-2 satellite images using ArcGIS 10.5 software, corrected and validated using very high-resolution images from Google Earth and field visits. The land use map was classified into four classes (agriculture 4.85%, forest land 58.82%, built-up areas 0.63%, bare soil, 35.96%). (Figure 4g) The forest area represents the dominant class for the rural municipality surface area. The forest land includes argan trees, poor pastures, and many non-forested areas.
To take into account the effect of the vegetation on the gully erosion hazard, the normalized difference vegetation index (NDVI) was used to describe the vegetation density. Therefore, areas presenting low NDVI values will have a high susceptibility of gully erosion [79]. The NDVI was calculated using the red and near-infrared bands of the Sentinel-2 using the equation: where NIR and R are the near infrared and red bands of the electromagnetic spectrum, respectively.
The negative values of this index correspond to surfaces other than the plant cover, for example water. Values close to zero show bare surfaces and non-forested areas.    The stream power index (SPI) is used to measure the erosive power of water flowing through a watershed. High SPI values represent areas with steep slopes and accumulation of flow, and thus high erosive potential. It is calculated using the following formula [69].
where As is the specific catchment area (m) and β is the slope gradient ( • ). The SPI map was classified into three classes ( Figure 4e). The compound topographic index (CTI) is the quotient of slope and flow accumulation. It represents the power of the watercourse per unit of bed area. Low values of this index indicate small watersheds with steep slopes and small water catchment areas, as opposed to areas with high values, which represent areas with gentle slopes and large water catchment areas [69]. It is calculated using the following relationship [70]: where A is the upstream drainage area, S is the local slope, and PLANC is the planform curvature. The CTI map was classified into six classes ( Figure 4f). The topographic wetness index (TWI) is a useful topographical and hydrological parameter. It is derived from the interactions of small-scale relief coupled to the contributing upward gradient land surface according to the following relationship [71]: where As is specific catchment area (m) and β is the slope gradient ( • ). The TWI map was classified into four classes ( Figure 4h). Soil surface properties are regarded as significant to gully erosion susceptibility [72]. They are commonly considered as an effective factor to evaluate the gully erosion process [73][74][75]. Moreover, the lithological characteristics of the surface materials play an important role in the soil's resistance to erosion, infiltration, and runoff, and consequently in the development of gullies [14,[76][77][78]. The study area contains two groups of rocks. According to their resistance to the splash effect and to friction by the flowing water, the rock types were classified by degree of friability according to Table 3. To take into consideration human influences, land use was added to the factors controlling gullying. The land use map was developed by classifying Sentinel-2 satellite images using ArcGIS 10.5 software, corrected and validated using very high-resolution images from Google Earth and field visits. The land use map was classified into four classes (agriculture 4.85%, forest land 58.82%, built-up areas 0.63%, bare soil, 35.96%). (Figure 4g) The forest area represents the dominant class for the rural municipality surface area. The forest land includes argan trees, poor pastures, and many non-forested areas.
To take into account the effect of the vegetation on the gully erosion hazard, the normalized difference vegetation index (NDVI) was used to describe the vegetation density. Therefore, areas presenting low NDVI values will have a high susceptibility of gully erosion [79]. The NDVI was calculated using the red and near-infrared bands of the Sentinel-2 using the equation: where NIR and R are the near infrared and red bands of the electromagnetic spectrum, respectively.
The negative values of this index correspond to surfaces other than the plant cover, for example water. Values close to zero show bare surfaces and non-forested areas. For the vegetation formations there are positive NDVI values between 0.1 and 0.5 (Figure 4i).

Gully Susceptibility Models Background of the Modeling Approaches Used Frequency Ratio (FR)
In bivariate statistical approaches, the frequency ratio is a method based on calculating the gully density weighting values for each class of factors [80]. The spatial relationship between the gully birthplace and each factor was derived by the ratio of the area where gullies were concentrated with respect to the total area studied. The ratio of the probability of occurrence of a gully to a non-occurrence with respect to a given attribute [81] is as follows: where FR i,j is the frequency ratio of class j in factor i; N pix (S i,j ) is the number of pixels of gully presence within class j in factor i; and N pix (N i,j ) is the number of pixels of class j in factor i. Values greater than 1.0 indicate a strong correlation between the presence of gullying and the factor's class and so forth [82]. The prediction rate was calculated for each predisposing factor using the following equation [83]: where SA is the index of the spatial association (FR) of spatial factors and gullies. The prediction rate is considered as high if (0.6 ≤ PR); moderate if (0.4 ≤ PR < 0.6), and low if (PR < 0.4).

Logistic Regression (LR)
Logistic regression is a multivariate statistical approach used to predict the probability of the presence of an event or dependent variable. The independent variables can be continuous or categorical, based on the optimization of the weights of each variable. These weights are estimated by regression coefficients [45].
In the case of gullying, the relationship between this phenomenon and these independent factors can be expressed as follows: where p is the gully occurrence probability (varying between 0 and 1 on an S-curve) and Y is a linear combination of the following form: where Y is the dependent variable representing the presence (1) or absence (0) of the gully, b 0 is the model intercept, b 1 . . . b n are the partial regression coefficients, and x 1 . . . x n correspond to the independent variables.

Random Forest (RF)
As a modern machine learning method, the random forest technique is based on the concept of a classification or regression tree (CART), which consists of classifying a dataset into homogeneous groups via a tree where each node has a choice (a rule) and at each end is a given decision [84].
By creating a set of decision trees, where each has a specific rule, the training data will be divided according to the number of trees, and the data will be classified according to their equivalent decision tree. The average of all the results received from each tree will be the result to be taken.

Data Set Preparation
It should be noted that for multivariate model training, the dataset should contain both present and absent gully erosion events. Security areas where the slope did not exceed 5% and where gullies were absent were identified. Random non-gullying point samples were taken from the security area. Non-gullying points were combined with training points. The gullying and non-gullying points were assigned a value of one (1) and zero (0), respectively, and then rasterized. Furthermore, the 12 independent factors as well as the dependant variable maps were converted to Raster format and resampled to a common resolution of 12.5 m and common extent. R language was used for processing.
Furthermore, for machine learning models the training data must be standardized or normalized (values between 0 and 1). It is also necessary to transform the categorical variables into numerical ones. For this purpose, the factors of land use, lithology, and aspect were divided into classes with values between 0 and 1 ( Table 4). For the frequency ratio model, results may depend on the used classification for the controlling factors. Indeed, each factor class is composed of a homogeneous location with regard to the considered factor. Literature classifies the factors causing a predisposition to gullying according to three approaches: (1) Choice of classes based on works carried out in the same study area, or in a study area with similar climatic characteristics [15,65,85]; (2) classification using the natural breaks method, rarely used in the literature [86,87]; and (3) use of an arbitrary choice that depends on the factor value range and the territorial characteristics of the area [35,88].
These three approaches were compared to test model sensitivity to the underlying controlling factors classification (Table 5).
RF model adjustment is based on obtaining the best hyperparameter arrangement to improve the model's accuracy and ensure its best possible tuning. The hyperparameters that must be optimized are: Mtry (number of variables randomly sampled as candidates at each split), Ntree (number of trees to grow), and Maxnodes/Nodesize (directly related to tree depth; the node size of classification trees implicitly limits the number of nodes in each tree).
After testing the results of the model with its default parameters, the effects of Mtry and Ntree on the predictive qualities of the model were tested since these parameters have the greatest effect on the model performance assessed through the out of bag error OOB. Table 5. Selected classes for three scenarios of the frequency ratio model. Approach 1 is based on works carried out in the same study area [39,47,[88][89][90]. Approach 2 is based on works carried out in a study area with similar climatic characteristics [86,87]. Approach 3 refers to classification using the natural breaks method [35,88].  The validation subset was used to evaluate the predictive power of the model. As a widely employed method to assess the quality of a classification [15,91], the area under the receiver operating characteristic curve (AUC) was selected for the validation of model performance. Several authors have tried to interpret AUC values. According to Yesilnacar (2005)  The confusion matrix was used to assess the accuracy of the classification. By comparing the classifications made by the three algorithms of the known classifications as selected, the confusion matrix allowed us to measure and summarize the accuracy of classification [94]. The FR model was fitted according to the three factor classification approaches described in the methodology. Prediction rates according to each approach are presented in Table 6.  According to the three approaches, lithology and land use factors underlay high susceptibility values, with a correlation of 6.96 for the lithology and 5.57 for the land use. However, with a prediction rate of 1, the slope did not have a significant effect on the estimate of susceptibility to gully erosion.

Factor
The limestone lithology class, which showed maximum frequency ratio values with a prediction rate of 0.89, was characterized by a high susceptibility to gullying. Indeed, the limestone substrate was compact, with low weathering and a low friability rate. For the forest areas class, it presented a relatively high value with a frequency ratio of 0.62. This high susceptibility was combined with the fact that the argan forest presented a deteriorated aspect with a sparse tree cover. For other categorical factors, such as aspect and curvature plan, the prediction rates of the frequency ratio model were moderate. Gullying is therefore not highly controlled by these two factors.
For each approach, the predisposing factors for gully erosion were weighted according to their density to estimate the relative contribution of each factor to the prediction of the spatial occurrence of gullying ( Figure 5). The limestone lithology class, which showed maximum frequency ratio values with a prediction rate of 0.89, was characterized by a high susceptibility to gullying. Indeed, the limestone substrate was compact, with low weathering and a low friability rate. For the forest areas class, it presented a relatively high value with a frequency ratio of 0.62. This high susceptibility was combined with the fact that the argan forest presented a deteriorated aspect with a sparse tree cover. For other categorical factors, such as aspect and curvature plan, the prediction rates of the frequency ratio model were moderate. Gullying is therefore not highly controlled by these two factors.
For each approach, the predisposing factors for gully erosion were weighted according to their density to estimate the relative contribution of each factor to the prediction of the spatial occurrence of gullying ( Figure 5). Figure 5. The prediction rate for each predisposing factor for gullying for the three approaches (see PR in Table 5).
For the first two approaches, NDVI presented the maximum susceptibility values, with a prediction rate of 7.53 for the first and 7.44 for the second. However, once the number of intervals was increased from three classes to five classes, this factor became the fourth highest in terms of its correlation weight, with a prediction rate of 3.36 (Table 6). By using the natural breaks method in the second approach and increasing the number of classes in the third one, the C-factor dropped from the second position to the fourth in the second approach, and to the third position in the third approach (Table 6).
Topographic factors: LS, TWI, and CTI (but not SPI) had low gully susceptibility values. The latter had moderate susceptibility values, with a prediction rate of 4.95 and 4.49 for the first two approaches. With the increase in class thresholds in the third approach, the SPI factor was the second lowest in terms of susceptibility.
According to the frequency ratio model, vegetation plays a decisive role in controlling the occurrence of gullying. With low susceptibility values the topography of the land presents a low control over gully erosion. Testing the effect of the class settings used in the frequency ratio model on the predictive qualities of the model led to several findings. Changes in class intervals affected the prediction rates of each gullying predisposing factor, but there was almost no change in the susceptibility ranking of the factors. In contrast, changing the number of classes had a higher effect on factor ranking. Thus, as the number of classes of a factor increased, its prediction rate tended to decrease. By increasing the number of classes of a causative factor, the number of pixels containing gully surface in every class tended to reduce. Hence, the more the number of classes increased, the more the surface of each class decreased.
The fitted frequency ratio model (using classes that presented the best prediction rates (PR) for the three scenarios (Table 5)) was used to produce gullying susceptibility Figure 5. The prediction rate for each predisposing factor for gullying for the three approaches (see PR in Table 5).
For the first two approaches, NDVI presented the maximum susceptibility values, with a prediction rate of 7.53 for the first and 7.44 for the second. However, once the number of intervals was increased from three classes to five classes, this factor became the fourth highest in terms of its correlation weight, with a prediction rate of 3.36 (Table 6). By using the natural breaks method in the second approach and increasing the number of classes in the third one, the C-factor dropped from the second position to the fourth in the second approach, and to the third position in the third approach (Table 6).
Topographic factors: LS, TWI, and CTI (but not SPI) had low gully susceptibility values. The latter had moderate susceptibility values, with a prediction rate of 4.95 and 4.49 for the first two approaches. With the increase in class thresholds in the third approach, the SPI factor was the second lowest in terms of susceptibility.
According to the frequency ratio model, vegetation plays a decisive role in controlling the occurrence of gullying. With low susceptibility values the topography of the land presents a low control over gully erosion. Testing the effect of the class settings used in the frequency ratio model on the predictive qualities of the model led to several findings. Changes in class intervals affected the prediction rates of each gullying predisposing factor, but there was almost no change in the susceptibility ranking of the factors. In contrast, changing the number of classes had a higher effect on factor ranking. Thus, as the number of classes of a factor increased, its prediction rate tended to decrease. By increasing the number of classes of a causative factor, the number of pixels containing gully surface in every class tended to reduce. Hence, the more the number of classes increased, the more the surface of each class decreased.
The fitted frequency ratio model (using classes that presented the best prediction rates (PR) for the three scenarios (Table 5)) was used to produce gullying susceptibility map ( Figure 6). This map shows five susceptibility classes (classified using natural breaks) representing various susceptibility rates: 5.57% very low, 18 (Figure 6). This map shows five susceptibility classes (classified using natural breaks) representing various susceptibility rates: 5.57% very low, 18.9% low, 25.3% moderate, 35.84% high, and 14.31% very high.

Gully susceptibility analysis using the LR model
Fitting the logistic regression model allowed for the analysis of the correlation between the 12 factors used to predict gullying. The correlation plots shown in Figure 7 facilitate understanding of the correlation between each factor and the training data.

Gully Susceptibility Analysis Using the LR Model
Fitting the logistic regression model allowed for the analysis of the correlation between the 12 factors used to predict gullying. The correlation plots shown in Figure 7 facilitate understanding of the correlation between each factor and the training data.
Low NDVI and TWI values were accompanied by the occurrence of gullying. As for factor C, the low values were negatively correlated with gully erosion, which is quite logical. The low values of factor C and the high values of NDVI were related to the presence of vegetation and therefore to a low risk of the presence of gullying. The low values of TWI correspond to a local steep slope. By their sigmoid form, the other topographic factors (elevation, slope, SPI and LS) were positively correlated with gullying. The Table 7 represents the coefficients of the fitted logistic regression model and their consecutive standard error. Sustainability 2021, 13, x FOR PEER REVIEW 16 of 31  The significance of the chi-squared values was acceptable at a 95% confidence interval. The fitted LR model was used to produce the susceptibility map ( Figure 8) for the rural municipality of El Faid. This map was reclassified by the natural breaks technique into five classes as follows: 42.7% very low, 21.2% low, 21.5% moderate, 12.4% high, and 2.07% very high.  3.1.3. Gully susceptibility analysis using the RF model RF model were fitted using default parameters. Thus, the best number of variables randomly sampled for Mtry was 12, with an accuracy of 0.885 and a kappa coefficient of 0.74. Optimizing the model was based on adjusting the Mtry and Ntree parameters.
The literature includes an extensive discussion of Mtry's influence on the model's results. Several studies reported that different Mtry values did not affect the true positive rates of their model and that other performance parameters (sensitivity/specificity, kappa, and ROC) were stable under different Mtry values. The optimal Mtry parameter was identified through increasing the number of variables with three scenarios (Figure 9). For each one, a maximum number of searches (0 to 10, 0 to 20, and 0 to 30) was tried. A model with a high predictive quality was achieved with an Mtry of 14, with a precision of 0.88% and kappa coefficient of 0.74%. The Ntree parameter had significant effect on the predictive accuracy of the model. Adelabu and Dube (2015) [44] showed that changes in Ntree and Mtry can have an effect on the OOB error. To find the effect of the Ntree parameter, the results of the model and the controlling factor prediction rates for three different numbers of trees (500, 1000, and 2000) were compared ( Figure 10). The Ntree parameter had significant effect on the predictive accuracy of the model. Adelabu and Dube (2015) [44] showed that changes in Ntree and Mtry can have an effect on the OOB error. To find the effect of the Ntree parameter, the results of the model and the controlling factor prediction rates for three different numbers of trees (500, 1000, and 2000) were compared ( Figure 10).
The most important variables controlling gully erosion for the three numbers of classification trees Ntree were: CTI, slope, SPI, and LS. For the other factors of less importance (such as TWI, curvature plane, NDVI, elevation, factor C, etc.) the change in the Ntree parameter did not significantly influence the ranking of the factors in terms of prediction rate for all three scenarios.
Ntree, achieving good RF mode performance, was identified using the out of bag (OOB) plot ( Figure 11) that measures the prediction error of the RF model. The average error on each prediction sample was very high for low Ntree values. However, as the number of trees increased, the error tended to decrease until it stabilized between 500 and 1500 trees. Regarding the area under the curve AUC parameter, it was found that for the three scenarios tested previously, the AUC values increased slightly when exceeding 500 to 1000 trees and also from 1000 to 2000 trees. The increase of the Ntree parameter was not accompanied by a perfect improvement of qualities in terms of AUC values. Hence, we will refer to OOB values to choose the optimal number of trees (Ntree). The controlling factor prediction rates for the tuned RF model are shown in Figure 12.    The optimized RF model was used to produce a gully susceptibility map ( Figure 12). This map was reclassified by the natural breaks technique into five classes as follows: 18.9% very low, 19.5% low, 22.6% moderate, 20.94% high, and 17.97% very high.

FR Model Validation
The results of the prediction rate curves of the three FR approaches are shown in Figure 13. By identifying the optimal logical breakpoints within a range of values, the use of the natural breaks method in the second approach allowed for an improvement of the AUC of 1.51% compared to the first approach. However, the increase in the number of continuous factor classes in the third approach increased the reductive qualities of the model, with a 5.13% improvement in the AUC. The optimized RF model was used to produce a gully susceptibility map ( Figure 12). This map was reclassified by the natural breaks technique into five classes as follows: 18.9% very low, 19.5% low, 22.6% moderate, 20.94% high, and 17.97% very high.

FR Model Validation
The results of the prediction rate curves of the three FR approaches are shown in Figure 13. By identifying the optimal logical breakpoints within a range of values, the use of the natural breaks method in the second approach allowed for an improvement of the AUC of 1.51% compared to the first approach. However, the increase in the number of continuous factor classes in the third approach increased the reductive qualities of the model, with a 5.13% improvement in the AUC. Running the frequency ratio model according to the classes presenting the best prediction rates for the three approaches improved the model's prediction rate by 0.86% compared to the third approach (Figure 14). The predictive performance of the frequency ratio Running the frequency ratio model according to the classes presenting the best prediction rates for the three approaches improved the model's prediction rate by 0.86% compared to the third approach ( Figure 14). The predictive performance of the frequency ratio (FR) model in the rural municipality of El Faid can thus be qualified as having good to excellent accuracy. Running the frequency ratio model according to the classes presenting the best prediction rates for the three approaches improved the model's prediction rate by 0.86% compared to the third approach ( Figure 14). The predictive performance of the frequency ratio (FR) model in the rural municipality of El Faid can thus be qualified as having good to excellent accuracy. Figure 14. Representation of the accuracy of the frequency ratio for the three approaches.

LR Model Validation
After standardizing the data by the "prediction" function, the AUC value ( Figure 14) was determined by the "performance" function which evaluates the performance of the predictor used. The AUC of the model was 97.3%, which testifies to the excellent predictive accuracy of the logistic regression model.

RF Model Validation
The validation dataset was used to evaluate predictive quality of the model according to the AUC parameter, using the hyperparameters shown in Figure 14. The predictive quality of RF model in the rural municipality of El Faid can thus be qualified having as good to excellent accuracy, with an AUC of 91.49%.

Comparison between Models (FR, LR, and RF)
Comparing and validating the performance of the gully susceptibility models showed that: The adjustment of the prediction models is a crucial step to improving their prediction results. In the case of the frequency ratio model the optimization of the hyperparameters improved the AUC values by 7.5%. The resulting AUC values for the FR, LR, and RF models were respectively 80.79%, 97.30%, and 91.49%. It can be said that the most reliable gully susceptibility map for the rural municipality of El Faid was that resulting from the logistic regression method, as it had the highest area under the curve.
To settle the model's accuracy issue in terms of prediction based on the validation sample, the areas presenting a very high susceptibility to gully erosion of each model were intersected with the validation set in a single map ( Figure 15). Thus, the model presenting the largest area of crossover with the 224 gullies of the validation sample would have the best predictive quality. It is necessary to specify that the model with the best accuracy should avoid predicting areas where there was no gullying as gully areas. Despite the fact that the LR model presented the best AUC value, the RF model showed a crossover area of 500,156 m 2 , in comparison to the 6500 m 2 presented by the LR model. The confusion matrix (Table 8) was used to validate the accuracy of the results of the models used. The findings from the confusion matrix confirm that the RF model showed the best accuracy, with 98.4% of the observations correctly classified. The FR model shows that 198 observations were not well observed, with reduced accuracy compared to other models ( Figure 16).     The comparison of the FR, LR, and RF susceptibility models proved that the model presenting the best AUC value was not necessarily the most accurate. However, machine learning methods such as the random forest RF model can be used successfully in other similar semi-arid contexts.

Discussion
This study attempted to use remote sensing and GIS techniques to predict the occurrence of gullies to generate maps of susceptibility towards gullying for the rural municipality of El Faid. Erosion vulnerability mapping is an essential element which, by illustrating risk, provides a tool for awareness raising, land management, and support for decision making and action. The comparison of the FR, LR, and RF susceptibility models proved that the model presenting the best AUC value was not necessarily the most accurate. However, machine learning methods such as the random forest RF model can be used successfully in other similar semi-arid contexts.

Discussion
This study attempted to use remote sensing and GIS techniques to predict the occurrence of gullies to generate maps of susceptibility towards gullying for the rural municipality of El Faid. Erosion vulnerability mapping is an essential element which, by illustrating risk, provides a tool for awareness raising, land management, and support for decision making and action.
The gully erosion susceptibility is controlled by several factors, and their importance can be assessed by various models [15,73]. Using a multitude of factors (topographic, hydrological, and anthropogenic) to compare three classification models, it was found that the validation method used to assess the forecasting accuracy of models was an essential step in determining the most reliable model [95]. However, with the development of classification algorithms and their efficiency [96], accuracy can be determined by different metrics [97].
The comparison between the results of each model showed that areas classified as highly susceptible to gullying were identified along steep slopes, drainage lines, and wastelands concentrated primarily in the southern part of the study area (Figures 6-13). Similarities in the areas of moderate gullying susceptibility were found in the douars near the mountains and also in lands with medium slope in the downstream parts of the small mountains in the south of the municipality. The logistic regression model did not show a similar pattern compared to the random forest and frequency ratio models. Similarities in areas presenting low susceptibility were due to moderate slopes, low drainage density, and less human influence, while areas of very low susceptibility were located near agricultural lands. In addition to the commonly used AUC method, the method of crossing areas classified as having a high susceptibility to gullying with the validation sample is a simple but effective method. The model with the highest precision should have the greatest gullying areas classified with high and very high sensitivity [34]. As a result, the RF model with an AUC of 0.90 showed the best predictive quality, with the best accuracy in terms of crossover area and confusion matrix.
The three methods chosen in this work have been widely used in susceptibility mapping studies. Each of these models has its own strengths and weaknesses, but no agreement has been reached on the best approach for the gully susceptibility analysis [38]. As a commonly used method [38,87,88,98], the frequency ratio model employs bivariate statistical analysis to quantify gullying susceptibility through individual factors related to gullying and offers better accuracy than other methods [99]. It takes into account the relationship of each predisposing factor and the occurrence of gully erosion, whereas it does not consider the relationship of all controlling factors on their own [13].
Logistic regression is used to establish a multivariate regression relationship between a dependent variable and several independent variables [100]. It does not assume linear dependencies between the controlling factors and the gullying process. However, this model generalizes the variables representing causal factors [38,101]. By forming a suitable link function for the usual linear regression model, logistic regression makes it possible to work with all types of variables, which represents a relevant advantage of this technique [100]. Results indicate that the LR model is more efficient than the FR model in the susceptibility modeling for gullying and other natural hazards [37,99,102].
The RF model is a robust and well-performing technique for susceptibility modeling, as corroborated by the research [103,104]. It can support multiple input variables without removing each variable and several other sets of classes that have high prophetic accuracy [105]. The RF model can also detect nonlinear relationships between both duplicate independent and dependent variables. Thus, our study showed that the RF model will be a good choice for other hazard susceptibility studies carried out in similar Mediterranean contexts.
The study of the susceptibility to gullying highlights the flood risk in the study area, providing a basis to guarantee the proper conduct of water and soil conservation. The maps thus produced will allow the prioritization of actions to reduce the phenomenon of erosion and the establishment of an efficient method to achieve the objectives while minimizing the investment of time and money. The gully erosion management actions thus proposed will make it possible to develop a consistent and concrete action plan, which will help stop inappropriate land management while trying to promote the economic conditions of the populations.

Conclusions
Soil erosion is one of the basic causes of land destruction. Linear gully erosion causes an irreversible loss of agricultural lands and water pollution by materials transported during floods. This work aimed to establish spatial susceptibility to gully erosion in the rural municipality of El Faid. The findings will provide a solid basis for establishing water and soil conservation action plans to reduce floods and disasters caused by stream overflows in the region.
Twelve predisposing factors (elevation, slope, aspect, curvature plan, slope length, stream power index, composite topographic index, land cover, topographic wetness index, normalized vegetation index, lithology, and vegetation cover factor) were chosen to compare three predictive modeling approaches: frequency ratio (FR), logistic regression (LR), and the random forest model (FR).
The predictive performance of the models, evaluated using the AUC, was improved by 5.13% for the frequency ratio (FR) model. Thus, the logistic regression (LR) model showed the best performance with one (AUC 97.3%), as compared to the random forest model (AUC 91.49%) and the frequency ratio model (AUC 80.79%). Areas presenting an extremely high vulnerability to gullying were compared to the validation sample. The area crossover and confusion matrices proved that the model presenting the best AUC value was not necessarily the best-performing one.
From modeling to erosion control, this study addressed all aspects related to the gully erosion problem. The susceptibility maps thus produced will allow for the elaboration of a consistent and concrete action plan that will ensure sustainable land management and enhance the resilience of local communities that are affected by this phenomenon. Funding: This research and the APC were funded within the framework of the Initiative "Environmental and Socio-Economic Vulnerability and Adaptation Potential to Climate Change", implemented by the Moroccan Regional Science Association, Rabat, and supported by Ouranos, Canada, through a partial funding from the Green Fund under the Government of Quebec's 2013-2020 Climate Change Action Plan.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available in the article.