Susceptibility to Gully Erosion: Applying Random Forest (RF) and Frequency Ratio (FR) Approaches to a Small Catchment in Ethiopia

Soil erosion by gullies in Ethiopia is causing environmental and socioeconomic problems. A sound soil and water management plan requires accurately predicted gully erosion hotspot areas. Hence, this study develops a gully erosion susceptibility map (GESM) using frequency ratio (FR) and random forest (RF) algorithms. A total of 56 gullies were surveyed, and their extents were derived by digitizing Google Earth imagery. Literature review and a multicollinearity test resulted in 14 environmental variables for the final analysis. Model prediction potential was evaluated using the area under the curve (AUC) method. Results showed that the best prediction accuracy using the FR and RF models was obtained by using the top four most important gully predictor factors: drainage density, elevation, land use, and groundwater table. The notion that the groundwater table is one of the most important gully predictor factors in Ethiopia is a novel and significant quantifiable finding and is critical to the design of effective watershed management plans. Results from separate variable importance analyses showed land cover for Nitisols and drainage density for Vertisols as leading factors determining gully locations. Factors such as texture, stream power index, convergence index, slope length, and plan and profile curvatures were found to have little significance for gully formation in the studied catchment.


Introduction
Although gullies occupy, on average, a small portion of a catchment (<5%), land degradation due to gully erosion causes serious environmental and socioeconomic problems by affecting soil and land functions [1,2]. Gully erosion lowers the groundwater table [3] and increases the susceptibility of soils to drought, causing crop yield reduction [4]. Gullies also increase landscape connectivity by providing efficient paths for the transport of water, sediment, and other materials from the source to the sink [5,6]. This affects flooding and reservoir siltation [7]. Soil erosion studies worldwide have shown that sediment in rivers mainly originates from gully erosion [8]. In Ethiopia, severe gully erosion has been a major

Study Area
The Minzir catchment is in the subhumid part of the upper Blue Nile river basin (Ethiopia) and covers a drainage area of 18 km 2 ( Figure 1). Its altitude ranges from 2030-2265 m a.s.l. with an average of 2095 m a.s.l. The catchment receives an average annual rainfall of 1480 mm, of which more than 90% falls between May and October. The mean maximum monthly temperature is 30.0 • C in March to 23.1 • C in August, whereas the mean minimum monthly temperature is 5.4 • C in December, with up to 13.1 • C in May and June.
Water 2021, 13, x FOR PEER REVIEW 3 of 22

Study Area
The Minzir catchment is in the subhumid part of the upper Blue Nile river basin (Ethiopia) and covers a drainage area of 18 km 2 ( Figure 1). Its altitude ranges from 2030-2265 m a.s.l. with an average of 2095 m a.s.l. The catchment receives an average annual rainfall of 1480 mm, of which more than 90% falls between May and October. The mean maximum monthly temperature is 30.0 °C in March to 23.1 °C in August, whereas the mean minimum monthly temperature is 5.4 °C in December, with up to 13.1 °C in May and June. The geology is dominated by Quaternary basalts [44], with soil type dominantly being Nitisols (29.5%), Vertisols (29.2%), Luvisols (14.9%), Leptosols (10%), and Alisols (7.7%). Cambisols, Fluvisols, and Regosols cover the remaining 5% of the catchment. The soil texture is dominantly clay (83.7%), clay loam (5.9%), sandy clay loam (3.5%), and loam (3.1%). The major land cover in the study area is cultivated land (83%), grassland (12.8%), and plantation forest (1.6%). Woodland, bushland, villages, and water bodies together comprise less than 3% of the catchment.
To reduce land degradation, different biological, agronomic, and physical soil and water conservation measures (e.g., bunds and check dams) were introduced in the catchment area [45]. Despite this effort, the amount of sediment reaching the Koga reservoir, which is located downstream of Minzir, is still significant and amounts to 25 Mg ha −1 y −1 [46,47]. Moreover, soil and water conservation efforts in the last 15 years resulted in only 35% reduction in soil erosion [48,49]. An increase in cultivated land and a decrease in woody vegetation has contributed to the increased sediment yield in the catchment [46].
The same study further stated that poorly planned and poorly constructed soil and water conservation measures are among the main causes of soil erosion in the catchment. A similar study from an adjacent catchment revealed that due to the lack of integrated approaches and the lack of maintenance, rehabilitation measures have not been able to reduce sediment in rivers [49]. This and an increase in cultivated land and a decrease in woody vegetation have contributed to the increased sediment yield in the catchment [46] in spite of the construction of rehabilitation measures.

Data Collection and Preparation
This research is organized as follows (see Figure 2): (1) data organization, including gully erosion inventory and selection of gully predictor factors; (2) multicollinearity test of gully predictor factors; (3) gully-erosion susceptibility mapping (GESM); (4) ranking gully predictor factors; (5) model validation.  [46,47]. Moreover, soil and water conservation efforts in the last 15 years resulted in only 35% reduction in soil erosion [48,49]. An increase in cultivated land and a decrease in woody vegetation has contributed to the increased sediment yield in the catchment [46]. The same study further stated that poorly planned and poorly constructed soil and water conservation measures are among the main causes of soil erosion in the catchment. A similar study from an adjacent catchment revealed that due to the lack of integrated approaches and the lack of maintenance, rehabilitation measures have not been able to reduce sediment in rivers [49]. This and an increase in cultivated land and a decrease in woody vegetation have contributed to the increased sediment yield in the catchment [46] in spite of the construction of rehabilitation measures.

•
Gully erosion inventory Gully erosion inventory data were prepared by combining field surveys with Google Earth TM imagery analysis. A total of 56 gullies were surveyed in the field; their morphology, including width, depth, length, and locations, were monitored. Then, the areal extent of each gully was derived by digitizing Google Earth imagery.

•
Gully erosion predictor factors From a literature review, 18 predictor factors that control gully erosion were selected from four data sets: digital elevation model (DEM), groundwater table, soil, and land

•
Gully erosion inventory Gully erosion inventory data were prepared by combining field surveys with Google Earth™ imagery analysis. A total of 56 gullies were surveyed in the field; their morphology, including width, depth, length, and locations, were monitored. Then, the areal extent of each gully was derived by digitizing Google Earth imagery.

•
Gully erosion predictor factors From a literature review, 18 predictor factors that control gully erosion were selected from four data sets: digital elevation model (DEM), groundwater table, soil, and land cover (Table 1). We omitted those that do not vary throughout the catchment and do not have explanatory values in our modeling, such as geology and rainfall depth. Therefore, a  total of 16 gully predictor factors, which have been subjected to the multicollinearity test  (elaborated in Table 2), will be used. Note: vector maps of land use/cover and soil were reprocessed into 12.5-m rasters to match the DEM resolution.
To ascertain compatibility during modeling, maps of the gully inventory and predictor factors were rasterized to the same pixel size of 12.5 m as the DEM using ArcGIS 10.5. To extract different gully-erosion factors from the DEM, SAGA GIS 7.5 was used [53]. The first 12 factors presented in Table 2 were derived from DEM data.
The 1:50,000 scale soil data (soil type and soil texture) has an equivalent resolution of 25 m. The nearest resampling technique [54] was used to match the soil data resolution with the DEM resolution. This resampling technique will not change the value of the cell. The maximum spatial error of applying the nearest resampling technique is one-half of the size of the cell. Table 2. Description of gully predictor variables.

2030-2265
Slope Determines both the kinetic energy and the volume of surface runoff [8], which, in turn, affects drainage density, discharge, and soil erosion [24].

0-34
Aspect Variation in aspect influences the distribution of vegetation by influencing the rate of recharge, soil moisture content, and evaporation that, in turn, control gully development [57,58].
Flat, North, Northeast, East, Southeast, South, Southwest, West, Northwest, and North Plan curvature Plan curvature, also known as contour curvature [59], is defined as "the rate of change of aspect along a contour" [60]. Plan curvature can have positive, negative, and zero values representing convexity, concavity, and flatness, respectively [61]. Flow diverges on a convex slope and converges on a concave slope.
Concave, flat, and convex TRI is a measure of topographic heterogeneity and is calculated on the basis of relief and drainage [63]. The TRI value for flat areas is zero, while the value for mountain areas with steep ridges is positive [64].
0. 35-19.6 Topography position index (TPI) TPI is derived from DEM by comparing a given cell elevation to the average of its surroundings [65]. The TPI value can be positive (when the elevation of a location greater than the average of its neighborhood), negative (when less than the surrounding areas), and zero (for flat or constant slope) [66]. LS is defined as the distance between the start of overland flow to a point where the slope gradient decreases sufficiently to cause depositions [67]. LS influences soil erodibility and critical shear stress [68].

0-772
Drainage density (DD) Drainage density is a measure of stream length per unit of catchment area [69] and is used as a predictor of gully erosion in many studies. The soil drainage characteristics affect soil water retention capacity, which, in turn, determines the rate of soil erosion [70].

0-2.2
Topographic wetness index (TWI) TWI quantifies the effect of local topography over the hydrological process and assesses the spatial distribution of soil moisture and surface saturation [71]. In areas where saturation excess runoff is the dominant process of gullies, TWI predicts saturated areas susceptible to gullies [72]. 2.89-17.6 Stream power index (SPI) SPI estimates the erosive power of surface runoff as a result of the relationship between discharge and catchment area [73]. Similarly, it predicts the potential of streams to modify the geomorphology of a catchment by gully erosion [74].

0-5965
Groundwater table depth (GWT) Research conducted in humid and subhumid regions has shown that elevated groundwater is one of the most important causes of gully erosion [18,75,76]. Nevertheless, we do not see this important factor incorporated in any of the gully erosion prediction models so far.

Soil type
Although gullies can evolve on any soil type, the soil type determines the size and shape of the gullies [77]. Therefore, integrating soil type into gully modeling increases prediction accuracy.

Soil texture
Soil texture determines the rate of infiltration [78] and erosion resistance, which determines the volume of gully erosion [79,80] Clay, clay loam, loam, sandy clay, sandy clay loam, silty clay, and silty clay loam.
Land cover Land cover and gullies are closely linked [81] as land cover is one of the factors that set the threshold for gully initiation [81] cultivated land, bushland, plantation forest, village, waterbody, woodland, and grassland The GWT surface generated using the IDW technique was resampled using the bilinear resampling technique. This method determines the new value of a cell based on a weighted distance average of the four nearest input cell centers [54]. This method is used for continuous data and will cause some smoothening of data.

Multicollinearity Test
In this study, multicollinearity was tested by combining results from the variance inflation factor (VIF) and tolerance (TOL), which are commonly used in different fields, including forest fire and gully erosion [41,82], with a correlation matrix of all predictor factors. The correlation method used in this study is the Pearson correlation coefficient. A correlation with a corresponding p-value of >0.01 is considered insignificant. The corrplot package, implemented in R statistical software, has been used to present the correlation matrix and the confidence interval graphically [83]. Tolerance is the reciprocal of VIF. Collinearity among predictor factors reduces model prediction accuracy [84]. Therefore, input factors with evidence of collinearity will be discarded before the GESM is developed.

Gully Erosion Susceptibility Map (GESM) and Variable Importance
Frequency ratio (FR) and random forest (RF) models discussed below were used to build the GESM. The GESM was finally classified as "low", "moderate", "high", and "very high" using the natural break method or the Jenks method in ArcGIS [85]. This classification method was selected as it reduces variance within a class. The RF model was also used to rank the most important gully erosion predictor factor.

•
Frequency ratio (FR) model As depicted in Equation (1), FR is the ratio of gully erosion probability of occurrence to nonoccurrences within a gully predictor factor class [29,41]. This method is one of the simplest statistical bivariate techniques used in various fields, such as landslide susceptibility mapping [86]. Moreover, studies have shown that the FR model predicts gully susceptible areas very well [36,87]. The modeling was executed by randomly dividing the gullies into a training set (n = 39 or 70%) and a validation set (n = 17 or 30%). ArcGIS 10.5 was used to develop the GESM using the FR model.
where A is the number of gully erosion pixels for each class of predictor factors, B is the total number of gully erosion pixels in the study area, C is the number of pixels in each class of gully predictor factor, and D is the number of total pixels in the study area.
The FR values obtained using Equation (1) were normalized using Equation (2).
where Y NFR is the normalized frequency ratio, X FR is the frequency ratio of the class within a predictor factor, MIN FR is the minimum of all frequency ratios within a predictor factor, and MAX FR is the maximum of all the frequency ratios within a predictor factor. Finally, the gully erosion susceptibility map is developed by summing up the normalized frequency ratio values for each predictor factors class.

•
Random forest (RF) model After conducting the multicollinearity test, gully erosion variable importance is computed using the RF model. The RF is known for effectively predicting variable importance in different disciplines, including land subsidence [88], invasive plant [89], groundwater [90], gully head susceptibility [31], and forest fire susceptibility [82]. The RF model is a multivariate nonparametric machine learning technique developed by Breiman [91]. The RF is a powerful decision tree classifier that predicts well when there is missing data, avoids over-fitting problems, produces more stable results, and is less sensitive to multicollinearity than other machine learning algorithms (e.g., support vector machine (SVM) and classification and regression tree (CART)) [30,88,92,93]. It is also known for predicting gully erosion very well compared to other machine learning algorithms [41].
The RF model was implemented in R statistical software [94]. As the analysis in the RF model was cell-based, a total of 2963 and 6560 gridded cells (12.5 by 12.5 m) were extracted from the gully and nongully irregularly shaped sample spatial polygons, respectively.
The RF model works by growing many decorrelated decision trees as a base learner using a fraction of randomly selected gully observation and gully predictor factors, with replacement. Every tree was trained using 2/3 of the randomly selected training samples, while the remaining 1/3 training samples, named out-of-bag (OOB) samples, were used to validate the prediction result. Finally, the majority vote or mode rule was used to allocate a pixel to a class [95]. The mean decrease in the Gini index (or Gini impurity) was used as an indicator for variable importance of the evaluated gully-erosion predictors [96]. The mean decrease in Gini is the mean of the total decrease in node impurity of the variable (i.e., gully predictor factors), weighted by the proportion of samples reaching that node in each individual decision tree in the RF. A higher mean decrease in Gini indicates higher variable importance.
Determination of variable importance using the RF model was executed for three scenarios with different combinations of gully predictor factors: (1) keeping all 15 nonsoiltype gully predictor factors for Nitisol soils only (i.e., excluding other soil types); (2) keeping all 15 factors for Vertisol soils only (i.e., excluding other soil types), and (3) using all 16 variables (i.e., including all soil types in the analysis).

Model Validation
In this study, model performances were evaluated using the area under the ROC curve (AUC). A ROC curve is the plot of sensitivity versus 1-specificity for different threshold values. The area under the ROC curve (AUC) is a commonly used parameter for quantifying the quality of a classificator because it is a threshold-independent performance measure [97]. It is a valuable technique for visualizing and measuring the accuracy of the models [26], and its value ranges between 0.5 and 1, with the highest accuracy at 1 [29]. This method has been successfully used in gully erosion prediction research [24,25,31].

Multicollinearity Test
The multicollinearity test showed that the VIF ranges from 1.04 to 7.8, while the TOL, which is the reciprocal of the VIF, varies from 0.13 to 0.96 (Table 3). VIF values greater than 5, with corresponding TOL values less than 0.2, indicate serious multicollinearity among factors [98]. VIF values for TRI and slope are greater than 5, suggesting multicollinearity (Table 3).  In addition to the multicollinearity test using the VIF method, the results of the correlation matrix ( Figure 3) suggest that TRI and slope are strongly correlated (0.9), with a corresponding p-value of 0 ( Figure S1), indicating that the correlation is significant. Guided by the results obtained using both the correlation matrix and VIF, we removed TRI and slope from the entire analysis. Besides, model prediction performance was evaluated by varying the number of gully predictor factors, depending on their importance presented in Figure 4c. In addition to the multicollinearity test using the VIF method, the results of the correlation matrix (Figure 3) suggest that TRI and slope are strongly correlated (0.9), with a corresponding p-value of 0 ( Figure S1), indicating that the correlation is significant. Guided by the results obtained using both the correlation matrix and VIF, we removed TRI and slope from the entire analysis. Besides, model prediction performance was evaluated by varying the number of gully predictor factors, depending on their importance presented in Figure 4c.

Gully Characteristics and Spatial Distribution
The average width, depth, and length of the 56 gullies were 6, 1.6, and 227 m, respectively (Table 4). Gullies in the study watershed occupy 16.8 hectares of land. On the basis of the average land-holding size (~0.7 ha) in the Ethiopian highlands [99], the land taken over by the gullies would have been able to support 24 households. Furthermore, the gullies in the study catchment are primarily located on cultivated land (57.45%) and grasslands (41.3%), indicating that they are eroding economically important land and, thereby, threatening the food security of the rural community. Many of the gullies were found on valley bottoms compared to hillslope areas. A larger percentage (>70%) of gullies were located on slope gradients less than 5%, and for areas with flat (48.1%) plan curvatures rather than concave and convex plan curvatures. Many gully pixels were also found at lower elevations (2030-2070 m a.s.l. and 2070-2104 m a.s.l) compared to high elevation classes. Furthermore, though the flat aspect class occupied a relatively small area, a larger percentage of gullies were found on lands with a flat aspect.

FR Model
The frequency ratio values listed in Table 5 provide a spatial relationship between gully locations and predictor factors (see Figures S2-S15). When FR values are greater than 1 in a given gully-erosion predictor class, the class may be considered susceptible to gullies [87]. Here, we will present the FR values of the top 4 gully-erosion predictor classes

Gully Characteristics and Spatial Distribution
The average width, depth, and length of the 56 gullies were 6, 1.6, and 227 m, respectively (Table 4). Gullies in the study watershed occupy 16.8 hectares of land. On the basis of the average land-holding size (~0.7 ha) in the Ethiopian highlands [99], the land taken over by the gullies would have been able to support 24 households. Furthermore, the gullies in the study catchment are primarily located on cultivated land (57.45%) and grasslands (41.3%), indicating that they are eroding economically important land and, thereby, threatening the food security of the rural community. Many of the gullies were found on valley bottoms compared to hillslope areas. A larger percentage (>70%) of gullies were located on slope gradients less than 5%, and for areas with flat (48.1%) plan curvatures rather than concave and convex plan curvatures. Many gully pixels were also found at lower elevations (2030-2070 m a.s.l. and 2070-2104 m a.s.l.) compared to high elevation classes. Furthermore, though the flat aspect class occupied a relatively small area, a larger percentage of gullies were found on lands with a flat aspect.

FR Model
The frequency ratio values listed in Table 5 provide a spatial relationship between gully locations and predictor factors (see Figures S2-S15). When FR values are greater than 1 in a given gully-erosion predictor class, the class may be considered susceptible to gullies [87]. Here, we will present the FR values of the top 4 gully-erosion predictor classes that provide the best FR model accuracy. However, this does not mean that the other gully predictor factors do not play a role in gully erosion, but only that the top four important predictor factors well illustrated the effects of these factors. Though large areas (40%) were covered by the DD class between 0 and 0.2 km km −2 , they had the smallest FR (0.34) and were less vulnerable to gully erosion. Drainage density (0.2-0.5 and 0.5-0.9 km km −2 ), with FR 1.67 and 1.48, respectively, were the most susceptible areas for gully formation. Drainage density of 0.9-1.4 and 1.4-2.2 km km −2 covers a small portion of the catchment, has relatively smaller FR values, and is less susceptible to gully erosion.
Based on FR values, the vulnerability to gully erosion increases with a decline in elevation. Low elevations (2030-2070 and 2070-2104 m a.s.l.), with FR 1.21 and 1.27, were the most susceptible areas for gully formation, while higher elevation areas showed little to no susceptibility to gully erosion.
Grassland (for grazing) with an FR value of 3.22 was the most susceptible area to gully-erosion. Though large areas were cultivated (82.7%), they had smaller FR (0.68) and were less prone to gully erosion. Other land uses, such as bushland, woodland, and plantation forests, were also less correlated with gully erosion.
The FR values also showed that gullies are positively correlated with shallower GWT zones. GWT depths (1.03-1.51 and 1.51-1.95 m) were found to have a higher FR (1.34) value, indicating that they are strongly correlated with gully erosion. Conversely, areas with deeper GWT had lower FR values, reflecting lower susceptibility to gully erosion. Areas with GWT depths between 0 and 1.03 occupy a small percentage of the catchment (2%) and are less prone to gully erosion (FR = 0.04). Figure 5a presents the GESM obtained using the FR model. Susceptibility is classified as "low", "moderate", "high", and "very high." The classifications were made using the natural break method or the Jenks method in ArcGIS [85]: 13.9%, 23.8%, 45.5%, and 16.8% for "low", "moderate", "high", and "very high" classes, respectively.  Figure 5a presents the GESM obtained using the FR model. Susceptibility is classified as "low", "moderate", "high", and "very high." The classifications were made using the natural break method or the Jenks method in ArcGIS [85]: 13.9%, 23.8%, 45.5%, and 16.8% for "low", "moderate", "high", and "very high" classes, respectively.

RF Model
The magnitude and rate of gully expansion differ between Nitisols and Vertisols [43] therefore, the most important gully predictor factors were examined separately for these soils (Figure 4a,b) in addition to all the soil types (Figure 4c).

RF Model
The magnitude and rate of gully expansion differ between Nitisols and Vertisols [43]; therefore, the most important gully predictor factors were examined separately for these soils (Figure 4a,b) in addition to all the soil types (Figure 4c).
Based on the mean decrease in Gini index values for the Nitisol soil type (Figure 4 The RF-model-derived GESM is presented in Figure 5b. The Jenks algorithm in ArcGIS [85] was used to determine the "low", "moderate", "high", and "very high" susceptibility classes at the class break of 46.7%, 20.8%, 13%, and 19%, respectively.
The percentage of gully erosion susceptibility class for Vertisols and Nitisols is presented in Figure 6. Class breaks of the low, moderate, high, and very high susceptibility classes in Nitisol are 47.9%, 18.8%, 19.8%, and 13.5%, whereas in the Vertisol class, breaks are 33.7%, 23.7%, 17.2%, and 25%, respectively. One can see that the areas with a low susceptibility class are higher for Nitisols than Vertisols. In contrast, the most susceptible area in Vertisols is approximately two times higher than in Nitisols. For the moderate and high susceptible classes, similar predictions have been obtained for both soils.
The percentage of gully erosion susceptibility class for Vertisols and Nitisols is presented in Figure 6. Class breaks of the low, moderate, high, and very high susceptibility classes in Nitisol are 47.9%, 18.8%, 19.8%, and 13.5%, whereas in the Vertisol class, breaks are 33.7%, 23.7%, 17.2%, and 25%, respectively. One can see that the areas with a low susceptibility class are higher for Nitisols than Vertisols. In contrast, the most susceptible area in Vertisols is approximately two times higher than in Nitisols. For the moderate and high susceptible classes, similar predictions have been obtained for both soils.

Model Validation
Model performance for the FR and RF models was evaluated for five scenarios (  Figure 7). The first case involved all predictor factors (i.e., using the 14 gully-predictor factors, excluding TRI and slope), which is called "All" hereafter. The second case used the top six factors obtained from Figure 4c, which is called "Top6" hereafter. The third case used all the top six gully-predictor factors but excluded soil type, called "exclSoil" hereafter. The fourth case involved all the top six gully-predictor factors but excluded GWT, called "exclGWT" hereafter. The third and fourth cases were evaluated to see the predictive potential of the newly introduced parameters (i.e., soil type and GWT) for gully prediction. In the fifth scenario, the top four factors obtained from Figure 4c were used; this scenario is called "Top4" hereafter.

Model Validation
Model performance for the FR and RF models was evaluated for five scenarios (Figure 7). The first case involved all predictor factors (i.e., using the 14 gully-predictor factors, excluding TRI and slope), which is called "All" hereafter. The second case used the top six factors obtained from Figure 4c, which is called "Top6" hereafter. The third case used all the top six gully-predictor factors but excluded soil type, called "exclSoil" hereafter. The fourth case involved all the top six gully-predictor factors but excluded GWT, called "exclGWT" hereafter. The third and fourth cases were evaluated to see the predictive potential of the newly introduced parameters (i.e., soil type and GWT) for gully prediction. In the fifth scenario, the top four factors obtained from Figure 4c were used; this scenario is called "Top4" hereafter.
The prediction accuracy of models using AUC is categorized as poor (0.5-0.6), average (0.6-0.7), good (0.7-0.8), very good (0.8-0.9), and excellent (0.9-1) [100]. In the case of RF, all five scenarios demonstrated excellent model prediction accuracy. The Top4 and Top6 model scenarios provided the best AUC values (99.9%). A higher AUC value (99.8%) was also obtained for the All model scenario in the RF model. Prediction accuracy after excluding soil type (exclSoil) and GWT (exclGWT) from the top six factors was 97.9% and 99.3%, respectively.
Prediction accuracy using the FR model ranged from average to good. Similar to the RF model, the highest prediction accuracy (AUC = 71.6%) was obtained for the Top4 FR model. The AUC values for the exclSoil and Top6 models were 70.8% and 70.7%, respectively. Lower AUC values were obtained for the All model (AUC = 63.3%) and the ex-clGWT model (AUC = 63.1%). Excluding the GWT reduced the FR model prediction accuracy by about 7%, whereas excluding the soil type did not affect the FR model's performance.

Spatial Distribution of Gullies and Gully-Erosion Predictor Variable Importance
All 16 gully-erosion predictor factors were found to be important for the development of gullies in the Minzir catchment, at different levels of importance. However, gully erosion prediction using the FR and RF models showed that the best prediction accuracy was obtained using the top four most important gully-predictor factors. This suggests that the impact of the other factors on gully erosion is well illustrated by DD, elevation, land use, and GWT.
DD was found to be the most important factor that determines gully location in the study catchment. Studies have shown that gully erosion susceptibility increases with increasing DD [33,101,102]. The FR values obtained in this study are the highest for the middle DD classes, and smaller values were obtained for the two highest classes. As described above, gully erosion is dominantly a function of four major factors. Therefore, lower FR Prediction accuracy using the FR model ranged from average to good. Similar to the RF model, the highest prediction accuracy (AUC = 71.6%) was obtained for the Top4 FR model. The AUC values for the exclSoil and Top6 models were 70.8% and 70.7%, respectively. Lower AUC values were obtained for the All model (AUC = 63.3%) and the exclGWT model (AUC = 63.1%). Excluding the GWT reduced the FR model prediction accuracy by about 7%, whereas excluding the soil type did not affect the FR model's performance.

Spatial Distribution of Gullies and Gully-Erosion Predictor Variable Importance
All 16 gully-erosion predictor factors were found to be important for the development of gullies in the Minzir catchment, at different levels of importance. However, gully erosion prediction using the FR and RF models showed that the best prediction accuracy was obtained using the top four most important gully-predictor factors. This suggests that the impact of the other factors on gully erosion is well illustrated by DD, elevation, land use, and GWT.
DD was found to be the most important factor that determines gully location in the study catchment. Studies have shown that gully erosion susceptibility increases with increasing DD [33,101,102]. The FR values obtained in this study are the highest for the middle DD classes, and smaller values were obtained for the two highest classes. As described above, gully erosion is dominantly a function of four major factors. Therefore, lower FR values for the two highest DD classes may be attributed to little contribution from the other three gully predictor factors. For example, the GWT in this area was the deepest, indicating little susceptibility to gully erosion.
Elevation was found to be the second most important gully-erosion predictor factor. The two lower elevation classes (2030-2070 and 2070-2104 m a.s.l.) had the largest gullied area, which was revealed by high FR values obtained for these classes (Table 5). This finding is consistent with studies in subhumid Iran, which found low elevation areas to be vulnerable to gullies due to the presence of concentrated flow [42].
The third-most important gully-erosion predictor factor was land use/cover. This is consistent with studies that found land cover as one of the most important factors determining gully locations [31,38,61]. While in the Minzir catchment, the area of grazing land (13.2%) is smaller than that of cultivated land (82.7%), it has the largest proportion of gullies, which was revealed by a high FR value (Table 5). Studies have stated that grazing lands are more susceptible to gully-erosion than other land-use groups [25,42,77]. Moreover, [103,104] reported that the conversion of forest land to cultivated and grazing land had resulted in increased gully erosion. Land-use change affects gully erosion by altering the hydrological and physicochemical properties of the soil. For instance, in tropical Brazil, grazing has increased the amount of surface runoff that, in turn, increased saturated area and subsurface flow in valley bottoms, which has led to gully development [76]. Forest plantations have resulted in a double increase in organic carbon and steady-state infiltration, thereby improving aggregate stability [105]. It was found that uncultivated land had twice as much soil organic carbon as 50-year-old cultivated land, which enhances soil aggregate stability and, therefore, increases soil resistance to erosion [106].
The relationship between the spatial distribution of gullies and the elevation of the terrain and slope suggested that gullies are mainly located at the valley bottom. This finding is in agreement with [24,33], who reported that due to concentrated flow, valley bottoms are more vulnerable to gullies. Field observations have also shown that intensified agricultural activities in the valley bottom expose the catchment to all forms of land degradation.
The fourth most important factor that determines gully erosion is GWT. Gullies have been observed in all classes of GWT, but FR values have shown that areas with GWT depths shallower than 2 m are the most susceptible to gullies (Table 5). This is consistent with most studies in subhumid Ethiopia, where elevated groundwater increases soil pore water pressure and enhances gully erosion rate [13,21]. This is also in agreement with the findings of [43], who have reported a higher gully head retreat rate (about four times) for gullies with shallower GWT than areas with deeper GWT.

Gully Susceptibility in Vertisols and Nitisols
Both Nitisols and Vertisols, which are the dominant soils in the study catchment, were found to be vulnerable to gully erosion (Table 5). However, both the GESM and the FR value showed that Vertisols are more prone to gully erosion than Nitisols. The RF-model-derived GESM showed that "low" susceptible classes in Nitisols (47.9%) were larger than Vertisols (33.7%), whereas "very high" susceptible classes were larger in the Vertisols (25%) than the Nitisols (13.7%). This is in line with the finding by [43] that higher moisture retention capacity, poor drainage, and elevated GWT lead to increased pore water pressure, which results in higher rates of gully erosion in Vertisols than in Nitisols that have lower water retention capacity and deeper GWT. In addition, Nitisols are a well-drained soil type [107] that reduces gully erosion due to both concentrated flow and soil saturation. Additionally, [108] found that the high swelling and shrinkage nature of Vertisols makes them more sensitive to gully erosion. The FR values also show that Fluvisols and Cambisols, which occupy a very small percentage of the catchment, are susceptible to gullies.
The results of separate variable importance analyses for Nitisols and Vertisols showed land cover and drainage densities as the number-one factor determining the location of the gully in each soil, respectively. Land cover for Vertisols was ranked number nine, suggesting it has little significance for gully-erosion in this soil. The other important gullyerosion factors (i.e., elevation, TWI, and GWT) are similar for both soils. Further research is needed on the fact that land cover is the most important gully-erosion factor for Nitisols.

Model Performance and Comparison
The performance of the FR models ranged from average to good, whereas, for the RF model, excellent prediction accuracy was obtained. Excellent model performance was achieved for all five models in the RF model. FR model performance was average for the models exclGWT and All, whereas Top4, Top6, and exclSoil models performed good. The Top4 model performed best in both RF and FR cases than the rest of the four models. This suggests that despite the importance of all predictor factors for gully erosion, gully erosion can successfully be predicted using only DD, elevation, land cover, and GWT.
The impact of excluding GWT and soil type as gully erosion predictor factors had a smaller impact on the RF model performance, whereas excluding GWT from the top six parameters caused a 7% reduction in the AUC value of the FR model. This shows the high explanatory power of the GWT parameter on the FR model. Moreover, GWT is among the top four most important parameters that determine gully locations. Therefore, accounting for this parameter in future gully erosion prediction research can provide better gully erosion prediction accuracy, especially in subhumid areas. Excluding soil type from the FR model did not affect the model performance. Despite this, soil type being the sixth most important factor determining gully development suggests its significance for gully erosion.
The excellent model performance in all five RF model scenarios may attribute to the ensemble learning algorithm in the RF model that works on majority voting principles by growing a large collection of decorrelated decision trees (or models) as a base learner. As a result, a single classifier error in RF model is outweighed by the majority [109], whereas FR is a single model. Furthermore, RF can handle high dimensionality (high numbers of attributes) and large datasets better than FR.
The excellent prediction accuracy achieved by the RF model is in agreement with [38] and [31], who have successfully mapped gully erosion and gully head susceptible areas. The FR model employed in this study predicts gully areas better than other statistical models such as weight of evidence and index of entropy [36,42].
Current approaches towards gully erosion susceptibility mapping in the Ethiopian highland are based on topographical threshold factors such as topographic wetness index (TWI) and stream power index (SPI) [39,40]. The application of these thresholds is flawed, primarily because streams and saturated-bottom lands are preferentially considered most vulnerable to gullies. However, our findings have shown that TWI and SPI importance on gully erosion is limited compared to the top four factors we have presented in this study, which have provided the best model prediction accuracy. Therefore, the finding of this research can be used for the successful planning and design of gully reclamation measures in a catchment.

Conclusions
There is no single factor responsible for the formation of gullies in the study area; all 16 factors we examined were found to be important for the development of gullies in the Minzir catchment. However, gully erosion prediction using the FR and RF models showed that the best prediction accuracy was obtained using the top four most important gully-predictor factors: drainage density, elevation, land use, and groundwater. This suggests that the impact of other gully-erosion parameters is well illustrated by the top four important gully-erosion predictor factors. The separate variable importance analysis for Nitisols and Vertisols showed land cover and drainage densities, respectively, as the most important factors that determine gully locations. The other most important factors for both soils were more or less similar. The fact that land cover is the most important factor for the development of gullies in Nitisols requires further investigation. The results of this study suggest that future planning and implementation of conservation measures in subhumid regions of Ethiopia should target areas with higher drainage density, low-lying areas, grazing land, and shallower groundwater table, which have been shown to be vulnerable to gully erosion.
Frequency ratio (FR) and random forest (RF) models have been identified as useful techniques for mapping gully erosion vulnerable areas and identifying the most important gully erosion predictor factor. The performance of the FR models ranged from average to good, whereas for the RF model, excellent prediction accuracy was obtained. The gully erosion susceptibility map developed in this study can be used to plan informed gully erosion rehabilitation and prevention measures in the Minzir catchment. As sediment from gullies threatens the entire upper Blue Nile basin, a water source for many water resource development activities, we recommend similar studies in different agroecology and geomorphic settings within the degraded Ethiopian highlands.

Data Availability Statement:
The data presented in this study are available in the article.