Implementation of Artiﬁcial Intelligence Based Ensemble Models for Gully Erosion Susceptibility Assessment

: The Rarh Bengal region in West Bengal, particularly the eastern fringe area of the Chotanagpur plateau, is highly prone to water-induced gully erosion. In this study, we analyzed the spatial patterns of a potential gully erosion in the Gandheswari watershed. This area is highly a ﬀ ected by monsoon rainfall and ongoing land-use changes. This combination causes intensive gully erosion and land degradation. Therefore, we developed gully erosion susceptibility maps (GESMs) using the machine learning (ML) algorithms boosted regression tree (BRT), Bayesian additive regression tree (BART), support vector regression (SVR), and the ensemble of the SVR-Bee algorithm. The gully erosion inventory maps are based on a total of 178 gully head-cutting points, taken as the dependent factor, and gully erosion conditioning factors, which serve as the independent factors. We validated the ML model results using the area under the curve (AUC), accuracy (ACC), true skill statistic (TSS), and Kappa coe ﬃ cient index. The AUC result of the BRT, BART, SVR, and SVR-Bee models are 0.895, 0.902, 0.927, and 0.960, respectively, which show very good GESM accuracies. The ensemble model provides more accurate prediction results than any single ML model used in this study.


Introduction
The origin and development of gullies are related to water-induced soil erosion and are treated as a major global problem. The formation of gullies and their gradual expansion significantly alter the shape of the earth's surface [1]. Therefore, gully formation that gradually changes the shape of the earth's surface is often associated with the formation of badland geomorphic features, which is particularly impeding in production activities, the construction of road networks, and associated activities [2]. As a result, land degradation in the form of gully erosion is the major cause of loss of productive soil surface, affecting nearly 1 billion hectares globally [3,4]. Therefore, this type of hazardous phenomenon causes environmental degradation, threatening to the lives of others, as well as loss of economic activity. Therefore, to understand the formation and development of gullies and associated erosional activities, and to overcome this global problem in a sustainable way it is necessary to perform gully erosion susceptibility (GES) mapping. In general, gully formation is a complex phenomenon that depends mainly on extreme precipitation events and improper land use utilization, as well as lack of proper planning [5]. Basically, a gully can be defined as a permanent water channel with a temporary flow of water during phases of heavy rainfall, in which the sediment is carried down the slope. The formation of gully erosion is a combination of different sub-processes, and these are the retreat of sloping land through head-cut, piping, fluting, crack development by tensional activity, and mass wasting [6]. In general, gullies are classified into three categories based on depth, namely grooves (<0.3 m), shallow gullies (0.3 to 2 m), and deep gullies (>3 m) [5]. The origin and development of gullies are widely responsible for the extensive hot and dry periods followed by heavy rainfall, together with human land-use practices, particularly in the monsoon dominated hot-dry climatic area. There are two groups of factors for the occurrence of gullies, which are physical factors (topography, climate, soil texture, vegetation cover, etc.) and anthropogenic factors (over grazing, deforestation, land utilization in an unplanned way, etc.) [7]. It is also a very well-known fact that similar factors are not responsible for the occurrence of gullies throughout the several regions in this world. Therefore, depending on the unique topographical, climatological, and hydrological characteristics, the pattern of gullies occurrence, their morphology, and erosional activities vary from place to place.
As stated in the aforementioned paragraphs, soil erosion causes a huge amount of economic losses and is treated as a global threat. The economic losses caused by soil erosion in South Asia are nearly USD 5400 million caused by a loss of soil productivity affecting 36 million tons annually [8]. The phenomenon of land degradation caused by soil erosion in the form of gully erosion is a critical hazard that has already affected approximately 3.975 million hectares (Mha) of land across India [9]. According to the various reports, such as the Ministry of Agriculture (MoA) (in 1994) and the National Bureau of Soil Survey and Land Use Planning (NBSS and LUP) (in 2004), the land degradation rate in India is estimated to be 107.4 and 146.8 Mha, respectively [10,11]. According to the Government of India's Ministry of Environment and Forestry's (2011) report, water erosion is responsible for about 10.21% of the land degradation in India. The unique climatic characteristics in India, i.e., heavy rainfall during the monsoon season and temperature fluctuations throughout the year, are responsible for the formation of gullies and increase the erosional potential and sedimentation. Our study area in the Gandheswari river basin is enormously affected by land degradation through gully erosion as it is located in the extended part of the Chotanagpur plateau with a sub-tropical Indian climatic condition. In addition to these two aspects, agricultural activities, destruction of the soil structure through fertilizers, increasing deforestation, and a lack of proper drainage systems are also responsible for the formation and development of rill-gullies in this area. Therefore, it is necessary to prevent and mitigate the gully formation processes by ascertaining their spatial extent and evaluating their controlling factors. GES mapping is an appropriate tool to carry out this task in a sustainable and perspective way.
In general, qualitative and quantitative methods have been applied to analyze gully erosion and to perform GESM [12]. Basically, a qualitative analysis requires a geomorphological and heuristic approach Remote Sens. 2020, 12, 3620 3 of 38 and skilled persons' knowledge, while quantitative methods are based on a perfect relationship among the gully erosion conditioning factors (GECFs) and numerical values representing the occurrence of gullies [13]. Over time, GESM has been carried out using various techniques and models. Initially, it was done using remote sensing, geographic information system (GIS), and statistical models [14]. Different types of statistical models including the frequency ratio, analytical hierarchical process (AHP) [15], certainty factor (CF) [16], weight of evidence (WoE) [17], and evidential belief function (EBF) [18] have been applied to assess GESM approaches. Then, machine learning (ML) algorithms have been used for their ability to handle a big dataset and discover the complex relationships among the various conditioning factors used in GESM. The most widely applied ML algorithms in GESM are the artificial neural network (ANN) [19], support vector machine (SVM) [19], random forest (RF) [20], boosted regression trees (BRT) [21], alternating decision tree (ADTree) [22], and the multi-layer perception approach (MLPC) [23]. More recently, ensemble techniques, i.e., an amalgamation of two or more statistical or ML models, have been used for GESM. Basically, ensemble models have been used to remove the shortcomings of individual statistical or ML models [24]. Therefore, keeping in view the fact of widely acceptance of ML algorithms among several research groups of people, due to their high accuracy assessment in the prediction result, in this study we also used three popular ML algorithms namely boosted regression tree (BRT), Bayesian additive regression tree (BART), support vector regression (SVR), and the ensemble of the SVR-Bee models, and as we know, the ensemble has a greater accuracy assessment than any single stand-alone ML model. It is also stated that in several literature studies as well as to the best of our knowledge, there is no such research work which is based on the ensemble of SVR-Bee model and BART algorithm on GESM. Therefore, the ensemble of SVR-Bee algorithm and their comparison with the three aforementioned ML models is the novelty of this research work for optimal prediction analysis of GESM. The literature study shows that various ML algorithms as well as statistical approacheshave been widely used in GESM by Ghosh and Guchhait [8,25], Hembrem et al. [26], Roy et al. [23], and Chakrabortty et al. [27,28] in the surrounding of the present research area, i.e., the extended part of Chotanagpur plateau region of Rarh Bengal in the eastern part of India. However, until now, no research work has been carried out in the present study area through the aforementioned ML methods used in the current research work. Therefore, we have also compared the prediction result produced by the current research work and another established research work's result, which is based on other ML models, except the ML models used in this research study area for understanding the suitable and better accuracy performances of ML algorithms. In this research, we assessed the GESM of a sub-tropical watershed of the Gandheswaririver in West Bengal, India.
The main aim in this research study is to assess gully erosion susceptibility mapping. As we know, the foremost step for sustainable management of land degradation through gully-induced soil erosion is to prepare GESM in an accurate way. It is indeed necessary to map future gully erosion prone areas in order to minimize the adverse effect of gully-induced soil erosion and for the proper management of agricultural land, road networks, etc. Thus, to do so in this study, we have prepared gully erosion susceptibility maps through theensemble approach of SVR-Bee model and three other ML algorithms. Therefore, for progress in our research work, here we have chosen twenty favorable conditioning factors for the occurrence of gullies in this region. Alongside, several statistical indices were used for experimental analysis of the input data, i.e., several gully erosion conditioning factors (GECFs), and the validation and accuracy assessment of the prediction result produced by ML models. Finally, it is concluded that the main strategy in this research work is to map gully erosion susceptibility in an optimal way through applying three stand-alone ML algorithms and the ensemble approach. The final output result of gully erosion susceptibility maps can help planners and land management authorities make the best use of land resources for sustainable development within this particular basin area.

Methodology
The objective of this research work was completed in the following five steps ( Figure 2). a. A total of 178 (89 each for gully and non-gully) gully erosion points were used to prepare a gully erosion inventory map. A total of 20 GECFs were used for the different GESMs. b. The variance inflation factor (VIF) and tolerance (TOL) techniques were used for multi-collinearity (MC) analysis. c. The importance of several GECFs was determined using the random forest (RF) algorithm and step-wise weight assessment ratio analysis (SWARA). d. GESMs wereprepared based on the boosted regression tree (BRT), Bayesian additive regression tree (BART), support vector regression (SVR), ML models and the SVR-Bee ensemble model. All of these ML models and the ensemble approach were run in MATLAB and the 'R' statistical In addition, the river basin is a unique region in terms of different physical characteristics such as the undulating plain of Chotanagpur plateau fringe, unconsolidated rock structure of Rajmahal formation, and the lateritic rocks and soil structure. In the hot-dry and seasonal moist climatic region, red lateritic soil has formed with a concentration of iron-aluminum oxides and the composition is very hard in nature. However, when the exposed hard lateritic comes in contact with rain water during the wet season, rocks are eroded by the head cut erosion. The overall parallel drainage systems over the plateau fringe dissect the lateritic Rarh Bengal into several patches. Some patches have been found as a bad land with extensive rill gully and ravine erosion. Due to the monsoonal rainfall, this region is Remote Sens. 2020, 12, 3620 5 of 38 facing several water-induced gully erosions, which has been reported in the literature [23,25,32,33]. In the present study area of Gandheswari river basin, a mainly elongated round head and steep wall with V shaped gully types have been found. Therefore, the land degradation problem has been noticed in this river basin, which affects agriculture, forestry, deformation of the land surface, etc. The major land use types in this region are single and double agricultural land, agro-forestry, very few patches of deciduous broadleaf forest, Shurbland, bare soil surface, and built-up land. In this area, no such protection measurement has been taken yet for controlling gully erosion, only the plantation of social-forestry has been adopted to control the same. Keeping in view the above fact, it has been observed that the rate of gully-induced soil erosion is very high in this area.

Methodology
The objective of this research work was completed in the following five steps (Figure 2). The methodology used in this study is described in detail as follows: First, 178 randomly selected gully erosion points along with 20 GECFs were used to produce different gully erosion susceptibility maps. When researchers apply any linear statistical-based model, the multi-collinearity of the variables may decimate the accuracy of the model's result [34]. Before applying the gully erosion susceptibility models, we checked the potential linear dependencies among the variables to avoid multi-collinearity among variables. When the TOL value is above 0.2, and the VIF value is below 10, there is no multi-collinearity among the variables. The GES maps in this study area were prepared using the ML algorithms of boosted regression tree (BRT), Bayesian additive regression tree (BART), support vector regression (SVR), and the ensemble of the SVR-Bee model. Due to some specific advantages, we used the abovementioned four models to analyze the GESMs. The advantage of the BRT model is that it is a non-parametric method used for measuring the relationships among the dependent and independent variables to forecast the analysis [35]. BART is a tree ensemble method that explains the variance of output product variables in a given dataset [20]. SVR is a supervised ML algorithm that is widely used in a complex dataset because it measures several curved boundaries. Any single ML model will have pros and cons in terms of susceptibility mapping and an ensemble method may potentially overcome some limitations. Therefore, we developed an ensemble SVR model with Bee algorithm methods for this study. The Bee algorithm generally helps solve the big data problems through modification and by calculating the fitness of the model [36]. The validation of the gully erosion susceptibility maps

a.
A total of 178 (89 each for gully and non-gully) gully erosion points were used to prepare a gully erosion inventory map. A total of 20 GECFs were used for the different GESMs. b.
The variance inflation factor (VIF) and tolerance (TOL) techniques were used for multi-collinearity (MC) analysis. c.
The importance of several GECFs was determined using the random forest (RF) algorithm and step-wise weight assessment ratio analysis (SWARA). d.
GESMs wereprepared based on the boosted regression tree (BRT), Bayesian additive regression tree (BART), support vector regression (SVR), ML models and the SVR-Bee ensemble model. All of these ML models and the ensemble approach were run in MATLAB and the 'R' statistical programming package by using the respective algorithms. e.
Every model was validated using the receiver operating characteristic curve with the area under curve (ROC-AUC), accuracy (ACC), true skill statistic (TSS), and the Kappa coefficient index analysis.
Remote Sens. 2020, 12, 3620 6 of 38 The methodology used in this study is described in detail as follows: First, 178 randomly selected gully erosion points along with 20 GECFs were used to produce different gully erosion susceptibility maps. When researchers apply any linear statistical-based model, the multi-collinearity of the variables may decimate the accuracy of the model's result [34]. Before applying the gully erosion susceptibility models, we checked the potential linear dependencies among the variables to avoid multi-collinearity among variables. When the TOL value is above 0.2, and the VIF value is below 10, there is no multi-collinearity among the variables. The GES maps in this study area were prepared using the ML algorithms of boosted regression tree (BRT), Bayesian additive regression tree (BART), support vector regression (SVR), and the ensemble of the SVR-Bee model. Due to some specific advantages, we used the abovementioned four models to analyze the GESMs. The advantage of the BRT model is that it is a non-parametric method used for measuring the relationships among the dependent and independent variables to forecast the analysis [35]. BART is a tree ensemble method that explains the variance of output product variables in a given dataset [20]. SVR is a supervised ML algorithm that is widely used in a complex dataset because it measures several curved boundaries. Any single ML model will have pros and cons in terms of susceptibility mapping and an ensemble method may potentially overcome some limitations. Therefore, we developed an ensemble SVR model with Bee algorithm methods for this study. The Bee algorithm generally helps solve the big data problems through modification and by calculating the fitness of the model [36]. The validation of the gully erosion susceptibility maps (GESMs) is an important step in the assessment of gully erosion susceptibility models. Previous researchers have used the receiver operating characteristic (ROC) curve with the area under the curve (AUC), accuracy (Acc), true skill statistic (TSS), and the Kappa index for the quantitative validation of GESMs prepared based on the BART, BRT, SVR, and the ensemble model [37][38][39]. The ROC curve is the most common technique used to quantitatively assess the model performance and the AUC of ROC has been used to measure the diagnostic ability of the model. Finally, all the models were validated using an ACC, ROC-AUC, TSS, and the Kappa coefficient index analysis.

Gully Erosion Inventory Map (GEIM)
Geomorphologists have used hypothesis criteria using different statistical analyses and predicting several natural hazard phenomena. In addition, this prediction is determined based on the past events, and an associated future prediction will be estimated through a functional relationship in the given datasets for various conditioning factors [18]. In GESM, it is very important to collect data that represent the dependent variables to make predictions using any model. In this study, we identified gully head-cut erosion points from Google Earth satellite imagery and intensive field surveys conducted with GPS to prepare an accurate GEIM. In general, the GEIM is used to evaluate the relationship between the distribution of gully head-cuts and several conditioning factors. Basically, the formation of gully head-cuts depends on topographical, lithological, climatic, and various other factors. Consequently, predicting the occurrence of gully head-cuts is largely dependent on the abovementioned factors. We identified a total of 89 gully head-cuts and included them in our GEIM. Of the total number of identified gully head-cuts, 70% (62 gully head-cut points) were selected for training the dataset, and the remaining 30% (27 gully head-cut points) were used to test the dataset. Similarly, 89 non-gully points were selected and divided to be used in training and testing the dataset (Figure 1). Some field photographs of gully head-cut points are shown in Figure 3. dependent on the abovementioned factors. We identified a total of 89 gully head-cuts and included them in our GEIM. Of the total number of identified gully head-cuts, 70% (62 gully head-cut points) were selected for training the dataset, and the remaining 30% (27 gully head-cut points) were used to test the dataset. Similarly, 89 non-gully points were selected and divided to be used in training and testing the dataset (Figure 1). Some field photographs of gully head-cut points are shown in Figure 3.

Gully Erosion Conditioning Factors (GECFs)
The first step in predicting the GESM is to select several suitable GECFs. Basically, GECFs have been divided into conditioning and triggering factors including environmental, topographical, geological, hydrological, climatological factors, etc. Despite this, there is no universal regulation for selecting the most appropriate GECFs. Therefore, we based our selection of GECFs on the

Gully Erosion Conditioning Factors (GECFs)
The first step in predicting the GESM is to select several suitable GECFs. Basically, GECFs have been divided into conditioning and triggering factors including environmental, topographical, geological, hydrological, climatological factors, etc. Despite this, there is no universal regulation for selecting the most appropriate GECFs. Therefore, we based our selection of GECFs on the geographical condition, variations in gully head-cut occurrences, and the models used in our analysis. To fulfill the objective of this research, we selected a total of 20 conditioning factors for GESM. These factors are the slope, aspect, elevation, profile curvature, plan curvature, topographic wetness index (TWI), stream power index (SPI), drainage density (DD), distance from river, rainfall, land use land cover (LULC), normalized difference vegetation index (NDVI), soil texture, soil erodibility, geology, geomorphology, bare soil index (BSI), ferrous mineral index (FMI), iron oxide, and normalized difference water index (NDWI) (Figure 4a-t). Different data sources were used to obtain these conditioning factors. We used the ALOSPALSAR DEM with a 12.5 m resolution, which was freely available on the Alaska Satellite Facility (ASF) website, to prepare the topographic and hydrological factors. We also used Sentinel 2A satellite images with a 10 m resolution to prepare the LULC map. Furthermore, we employed a Landsat 8 OLI image, a topographical map at a scale of 1:50,000 from the Survey of India (SOI), and a geological map at a scale of 1:2,500,000 from the Geological Survey of India (GSI). Table 1 shows details of the several data sources used in this study. We divided the conditioning factors into five categories by using Jenk's natural break methods in the ArcGIS platform. The reason behind the classification of each conditioning factor into five categories is described as follows. During the time of gully erosion susceptibility modelling, we used the training gully and non-gully points for extraction of respective point values from each of the causative factor of gully erosion (such as slope, Remote Sens. 2020, 12, 3620 8 of 38 TWI, and NDVI). Thus, each of the gully and non-gully points contains different respective values in every gully conditioning factor. Therefore, to understand the spatial distribution of values in gully causative factors, we have divided several factors into five categories to indicate the five data ranges such as very high, high, moderate, low, and very low zones, since the stretched range (in the ArcGIS platform) is only given the highest and lowest value, The established research studies have presented the factors in a categorized way, such as five, four, and three [40,41]. The GECFs used in this research are discussed in the following section.

Multi-Collinearity (MC) Analysis
The linear relationship between two or more variables in a dataset is known as multi-collinearity [58]. The multi-collinearity analysis is based on the linear dependency among several variables. In general, the MC analysis is carried out when two or more independent variables in a regression model are correlated. The MC analysis indicates a lack of orthogonality among the variables in a dataset. Orthogonal means that there is no linear relationship between the variables [59]. It is well known that even a small MC can create big problems in the dataset and, therefore, prediction results may become inaccurate. Several conditioning factors were used in the GESM. Therefore, it is necessary to identify the perfect relationship among the variables through the MC test. In general, MC occurs when there is a high correlation among the variables in a dataset, in which case it is necessary to remove those particular variables, otherwise, the prediction accuracy will be reduced. In this study, the MC test was carried out using the tolerances (TOL) and variance inflation factors (VIF) techniques. These two techniques were calculated as follows: where 2 indicates the regression value of j on added variables in a given dataset. The threshold value of MC occurrence in a dataset is when VIF is >5 and TOL is <0.1.

Slope
The slope of an area is highly affected by the runoff pattern, infiltration rate, drainage pattern, and drainage density, which ultimately affects the erosional activities [42]. In areas with a steep slope, the infiltration rate is low, and the runoff is high, which regulates the initiation of gully formation.
Gently sloping areas are highly susceptible to flow accumulation and gully development [43]. It is very well known that gullies develop specifically in the gently sloping areas of a river catchment. The slope map was derived from the DEM and divided into five categories, i.e., 0 to 1.01, 1.01 to 3.38, 3.38 to 5.75, 5.75 to 15.24 and 15.24 to 43.18 degree (Figure 4a).

Aspect
Different parameters such as the duration of sunlight, vegetation cover and its distribution, evapotranspiration and moisture retention capacity largely depend on the slope aspect of the area. Therefore, this factor indirectly affects gully erosion processes. The structural configuration of an area is highly dependable on the slope aspect [44]. The aspect map in this study area was classified into nine categories (Figure 4b).

Elevation
Elevation is an important topographic attribute, and it controls the process of gully erosion along with the spatial distribution of gully occurrences (Conoscenti et al., 2014). The density and vegetation cover types, as well as precipitation, are affected by the elevation. The elevation map we obtained from the DEM was classified into five categories, i.e., 11 to 55, 55 to 80, 80 to 141, 141 to 246, and 246 to 383 m ( Figure 4c).

Profile Curvature
The profile curvature is parallel to the direction of the highest slope. It affects the flow pattern on the surface through the acceleration or deceleration rate. A negative value indicates a convex surface, while a positive value indicates a concave surface within a particular cell. A value of zero indicates a horizontal surface. The profile curvature map in this study is categorized into five categories, i.e., −3.32 to −0.90, −0.90 to −0.34, −0.34 to 0.28, 0.28 to 0.84, and 0.84 to 2.61 ( Figure 4d).

Plan Curvature
The plan curvature is the intersection of a horizontal plane with the surface plane [45]. It is useful for analyzing the land morphology. In a down slope area, the converging and diverging flow patterns of water are affected by the plan curvature and impact the soil erosion processes. The plan curvature map is classified into five categories, i.e.,

Topographic Wetness Index (TWI)
The TWI is a water-related topographic factor which specifically quantifies the hydrological process affecting the terrain [46]. The TWI is used to measure the runoff velocity, discharge rate, transportation capacity etc. i.e., in a nutshell, it is used to assess the erosive power of runoff. The following equation is used to measure the TWI [47].
where, A s is the catchment area in m 2 and β is the gradient of the slope in radians.  The SPI is used to measure the erosional capacity of a stream. Therefore, it is an important factor for GESM. A high SPI value indicates a high erosional capacity and vice-versa [48]. The SPI value were calculated by using the following equation.
where As represents the upslope area, and tanβ represents the slope angle. The SPI map was classified into the classes of 0. 34 Figure 4g).

Drainage Density (DD)
A higher DD represents a highersurface runoff and associated erosional capacity, and vice-versa. The DD is the total length of stream in a particular area. Many factors contribute to the development of DD, including geological structure, soil texture, vegetation coverage, slope etc. The DD was calculated as follows.
where, n i=1 S i is the total length of a stream in kilometers and a is total area of the basin in sq. km. The DD value map of the study area was classified into the classes of 0 to 0.66, 0.66 to 1.87, 1.87 to 3.09, 3.09 to 4.35 and 4.35 to 6.73 (km/km 2 ) ( Figure 4h).

Distance from River
The drainage system of a watershed is highly correlated with gully head-cut erosion and its retreat. The formation and on-going erosion of a gully head-cut is significantly related to the distance from the river and positively correlated with the drainage system [49]. The Euclidean distance buffering was used to prepare the distance from river map in GIS. The distance from river map is shown in Figure 4i and classified as 0 to 446. 36

Rainfall
Rainfall is one of the most important factors for GESM. In general, high intensity rainfall and longer rainfall periods cause more intense gully erosion, whereas light rainfall or short duration has less of an effect. After a long dry period, a short but high-intensity rainfall may have high erosive power and facilitate gully formation and erosion. In this study area, we applied the inverse distance weighted (IDW) method to prepare the rainfall map. The rainfall map was classified into the following five categories: 515.62 to 519.48, 519.48 to 522.17, 522.17 to 526.28, 526.28 to 531.32 and 531.32 to 537.03 mm ( Figure 4j).

Land Use Land Cover (LULC)
The land use of an area significantly influences on the slope stability and gully formation [50]. Basically, an area covered with dense vegetation is less prone to erosion, whereas an area of barren or sparsely vegetated land is more prone to gully formation and soil erosion. Therefore, LULC is a significant factor in GESM. We used Sentinel 2A satellite data along with a topographic map to prepare the LULC map and determine five LULC types, i.e., crop land, built-up land, shrubland, water bodies and deciduous broadleaf forest ( Figure 4k).

Normalized Difference Vegetation Index (NDVI)
The NDVI is widely used to detect the spatio-temporal variation of vegetation cover [51]. It is a measurement of the surface reflectance along with vegetation growth and biomass [52]. Therefore, to understand the vegetation cover and its impact on gully erosion, this factor has been Remote Sens. 2020, 12, 3620 13 of 38 widely used in GESM. The vegetation cover influences erosion through the trees root system, which holds the surrounding soil together. Landsat 8 OLI satellite data was used to calculate the NDVI map using the following equation.
where, ρ NIR is the reflectance of the near infrared band and ρ R is the reflectance of the red band.
The NDVI map of the present study area was classified into five classes, i.e., −0.10 to 0.05, 0.05 to 0.09, 0.09 to 0.12, 0.12 to 0.15 and 0.15 to 0.31 (Figure 4l).

Soil Texture
The soil texture of an area plays a vital role on gully erosion as it determines the rate of infiltration, surface and sub-surface runoff, and soil resistance [53]. The soil textures have been used to determine the tunnel erosion or piping, which ultimately form into gullies due to the subsidence of the roofs covering the tunnel. We used the soil texture factor for GESM and has previously also been widely used by several researchers. We classified the study area into the five soil texture units of urban area, gravelly loam, fine loamy sandy, fine loamy, fine clay ( Figure 4m).

Soil Erodibility
Soil erodibility largely depends on the soil texture. A sandy, loamy soil texture is more prone to erodibility than gravelly clay loam, for example. Therefore, soil erodibility significantly influences gully erosion, which is why we chose it as a conditioning factor. The soil erodibility map for this area was classified into the classes of 0.00 to 0.13, 0.13 to -0.14, 0.14 to 0.20, 0.20 to 0.32 and 0.32 to 0.38 (Figure 4n).

Geology
The geological structure of an area affects the occurrence of gully erosion. The intensity of weathering largely depends on the geological structures, and it influences the erosional processes. Therefore, it is important to know which rock type is exposed or close to the earth's surface and potentially influencing gully erosion. In this study area, we found four rock types, namely unclassified metamorphics, the Chotanagpur gneissic complex, fluvial sediments, and the gabbro and anorthosite complex (Figure 4o).

Geomorphology
Geomorphology is the physical features of the earth's surface and correlates with the geological structures of that area. The erosional pattern of an area is largely dependent on the geomorphological features. Therefore, in this study, we used geomorphology as an important factor for GESM. Seven types of geomorphological features were found on the ISRO's Geoportal Bhuvan website, and these are river, ponds, water bodies, older flood plain, active flood plain, pediment pediplain complex and dissected denudational hills and valleys (Figure 4p).

Bare Soil Index (BSI)
The BSI was introduced to differentiate between the bare soil surface from vegetated land area [54]. This index has been widely used to understand the vegetation cover and canopy density of an area [55,56]. We used the BSI as a GECF to evaluate the GESM for better prediction analysis. The following equation was used to obtain the BSI.
where, ρ SWIR represents the reflectance of the short wave infrared band, ρ R represents the reflectance of the red band, ρ NIR represents the reflectance of the near infrared band and ρ B represents the reflectance of the blue band. In our study area, the BSI map value ranges from −1016 to 897 and was classified into five categories (Figure 4q).

Ferrous Minerals Index (FMI)
Ferrous minerals are iron-bearing materials that originate from granite rocks through weathering processes. Basically, this mineral is commonly found in the lateritic soil. In this study, we used FMI as an important GECF for predicting gully susceptibility mapping. The following equation was used to calculate the FMI.
where SWIR is the shortwave infrared band, and NIR is the near infrared band of the electromagnetic spectrum. The FMI map in this study area was classified into the five categories of 0.95 to 1.11, 1.11 to 1.138, 1.138 to 1.16, 1.16 to 1.18 and 1.18 to 1.28 (Figure 4r).

Iron Oxide
Iron oxide is known as ferric oxide, and it is an inorganic compound. Iron oxide is formed through weathering of granitic rocks. Hard outcrops made up of iron minerals are found in lateritic soil. This material indirectly influences erosion through the weathering process. Areas with larger quantities of iron oxide are more susceptible to erosion, and vice-versa. The iron oxide levels in the soil were calculated using the following equation.
Iron oxide = Red band Blue band (7) The amount of iron oxide in the soil was classified into the five groups of 0.77 to 1.14, 1.14 to 1.23, 1.23 to 1.29, 1.29 to 1.38 and 1.38 to 2.04 ( Figure 4s).

Normalized Difference Water Index (NDWI)
The NDWI is used to determine the plant water content. It is essential for understanding the changes in water content and vegetation canopies [57]. In this study, we used the NDWI as a GECF to identify the vegetation health because the state of vegetation health influences the vulnerability to erosion in an area. The NDWI was calculated as follows [57].
where ρ NIR represents the reflectance of the near infrared band and ρ SWIR represents the reflectance of the short wave infrared band. The NDWI map values range from −0.41 to 0.43 ( Figure 4t).

Multi-Collinearity (MC) Analysis
The linear relationship between two or more variables in a dataset is known as multi-collinearity [58]. The multi-collinearity analysis is based on the linear dependency among several variables. In general, the MC analysis is carried out when two or more independent variables in a regression model are correlated. The MC analysis indicates a lack of orthogonality among the variables in a dataset. Orthogonal means that there is no linear relationship between the variables [59]. It is well known that even a small MC can create big problems in the dataset and, therefore, prediction results may become inaccurate. Several conditioning factors were used in the GESM. Therefore, it is necessary to identify the perfect relationship among the variables through the MC test. In general, MC occurs when there is a high correlation among the variables in a dataset, in which case it is necessary to remove those particular variables, otherwise, the prediction accuracy will be reduced.
In this study, the MC test was carried out using the tolerances (TOL) and variance inflation factors (VIF) techniques. These two techniques were calculated as follows: where R 2 j indicates the regression value of j on added variables in a given dataset. The threshold value of MC occurrence in a dataset is when VIF is >5 and TOL is <0.1.

Importance of GECFs by Random Forest (RF) and SWARA Weight
Random Forest (RF) The RF is a non-parametric multivariate statistical method. It is a popular ML algorithm which is defined as an ensemble of binary decision trees [60]. This algorithm was developed by Breiman [61] in 2001. Decision trees are incorporated into the RF algorithm by a random selection from the training dataset. During the formation of decision trees, the selection of features is important in the RF algorithm because the RF always tries to choose the most significant features [62]. The decision trees were produced independently during the training phase using the bagging approach [61]. In general, the bagging approach indicates that one sample got chances for more than one time, and others may not for at least one time [63]. The basic function of the RF algorithm is a random vector, i.e., i k , here GECF, is created separately and distributed among all the trees. Thereafter, using the training dataset every tree is grown. Finally, the tree structures of the random vector (i k ) classifiers represented by h(X, i k ), k = 1, 2, . . . n are combined for an input vector, i.e., X [64]. In general, the number of decision trees (Ntree) and number of variables (Mtry) are required parameters for RF classifiers [63]. The RF algorithm and its generalization error can be expressed as follows: mg(x, y) = av k I(h k (x) = y) − max j y av k I(h k (x) = j) where x and y represent GECFs that specify the probability over the x and y space, mg indicates the marginal function, i.e., the conditional probabilities of the model, and I( * ) represents the indicator function, i.e., described as a set X that indicates the membership of an element in a subset N of X, where the value of 1 indicates all elements of X in N and the value of 0 indicates all elements of X not in N [61]. Furthermore, two types of errors have been calculated in the RF model: The mean decrease in accuracy (MDA) and mean decrease in gini (MDG). The MDA is obtained from the calculation of the out-of-bag (OOB) error, conversely, the MDG is the measurement of the contribution of each variable to the homogeneity of the nodes and leaves. Both MDA and MDG have been used widely in many fields for ranking the variables' importance and the selection of variables [46,65]. In this study, the average of MDA and MDG has been used to measure the relative importance of gully causative factors, where the OOB error obtained is now 12% so that the model is also 88% accurate to predict the relative important variables.
Step-Wise Weight Assessment Ratio Analysis (SWARA) Weight assessment is an important technique often applied to solving many problems in different fields of the decision analysis study. In the past few decades, several weight assessment techniques have been developed such as the analytical hierarchy process (AHP) [66], analytical network process (ANP) [67], entropy [68], etc. In 2010, a new multi-criteria decision making method, called SWARA, was introduced. It is a step-by-step weight assessment procedure. In the SWARA approach, an expert opinion is much more important for determining the weights in each criterion. Here, an expert opinion was used to select criteria and rank them based on the experts' inherent knowledge, experience, information, and understanding. Thus, the most important criterion was given the highest rank, and the least important criterion was given the lowest rank [69]. The weighting parameters were evaluated using the SWARA method [70]. The SWARA method was carried out in the following steps [71,72]: i.
Ordering the criteria based on expert opinion. ii.
Endow with relative importance among the criteria. An expert gives the importance of the jth criteria with respect to the previous ( j − 1) criteria according to the average value (s j ) ratio. iii.
Determining the coefficient k j : iv.
Computation of the recalculated weight w j : v.
Computation of the final weights of the criteria:

Gully Erosion Susceptibility Modelling (GESM)
Boosted Regression Tree (BRT) The BRT is a ML algorithm that makes use of both statistical and data mining techniques. The main objective of this model is to improve the effectiveness by combining several fitting models [35]. Two different algorithms are used in this model, namely boosting and regression, whereby boosting is used to combine several fitting models, and regression is related to the algorithm of classification and regression tree (CART) class [73]. Boosting is a very dominant ML method for improving the model accuracy, and regression is used to categorize the classification system based on the decision trees group [74,75]. The primary disadvantage of using a single decision tree is that it categorizes the training dataset based on a single tree. Therefore, in this case, there is a probability to loss of information and a very weedy accuracy assessment in the result. Thus, boosting techniques were applied in the BRT model to remove the weakness of the single tree model [76]. Three parameters namely the number of boosting tree iterations, the tree depth interaction, and the shrinkage were used to run the BRT algorithm model [77]. The shrinkage parameter is needed to control the complexity of the model. The tree depth interaction in a BRT model is determined in the course of contributing the trees [78]. The algorithm used in the BRT model can be explained in the following way: BRT is based on prediction variables of X = {x 1 , . . . . . . x n } and the variable of response by y, in which the training sample of y i , X i , i = 1, . . . N of the known y and X values. By analyzing this, a function of F * (X) is determined that basically maps X to y. According to Friedman [79], of all the values of (y, X), the loss function may be minimized by using the following equation: The Gradient boosting approximates F(X) may be calculated as follows: where, g(X; α m ) represents the regression tree of a particular node, α m represents the tree parameters, and β m represents the coefficients. Thereafter, the BRT model can be run using the following equation: where h(x; m) represents the function of a classification with α parameters along with x variables, m represents several stages of the model of variables, and β m represents the coefficient in the stage of m. The BRT model of GESM has been developed using the gbm package in the R statistical software. The gbm.step function was used to fit the model, and the model was simplified by the gbm.simplify function reducing the number of explanatory variables. The BRT also depends on the number of regression trees generated, as in the RF process. Finally, the gully erosion probability was calculated for every pixel of the study area and then the database was formatted to make the map in the Arc GIS environment.

Bayesian Additive Regression Tree (BART)
The framework of the BART model is a nonparametric statistical method that determines the covariates relationship in a dataset. In recent times, this model has performed admirably in the predictive analysis with both continuous and binary datasets. In general, the tree-based ensemble method is used in the BART model, which is basically a Bayesian version of ML techniques [80]. The tree-based ensemble method has an advantage in regression analysis, which isthe fact that this method can fit the nonlinear functional relationship. Therefore, in addition to the aforementioned advantage, the BART model also includes an uncertainty measurement in the prediction results. This model also surpasses several ML algorithms such as random forest, boosting, neural nets, lasso, etc. In this research, we used the BART model to evaluate the reasonable predictive performance along with its uncertainty measurement through prediction intervals [81]. Response variables can be predicted in the BART model by using the Bayesian classification and regression tree (CART) methods [82]. This model also has the capacity to deal with a nonlinear relationship in a variable which is responsible for that model. Furthermore, when irrelevant regression variables are added to this model, it also maintains an excellent predictive result [81]. The following equation can be used to express the BART model: where, Y k represent the tree ensemble model of variables Y, g x k ; T j , M j is the decision tree of CART, T j is the decision tree j = 1 . . . . . . n (n = sum total number of trees in this model), M j is the parameter of the terminal node of T j , and ε k ∼ N 0, σ 2 in which, σ 2 is the residual variance. The Bayesian model is formed by prior distribution in a dataset. Chipman et al. [82] used the following prior distribution to form the Bayesian model: where, µ ij indicates the parameters of the terminal node of a known tree T j and their prior distribution is represented as follows: where, σ 0 = 0.5 e √ m and e indicates a hyper-parameter i.e., marginalization of several terminal node parameters.
Remote Sens. 2020, 12, 3620 18 of 38 The combination of Equations (19) and (20) can be used to obtain the posterior distribution of the BART model, where p(Y|X, T, M, σ) indicates the likelihood for the sum of trees and T indicates every tree in a given dataset such as T = (T 1 , . . . ., T n ). This is obtained by calculating p T j R j , σ αp T j p R j T j , σ , for accepting and rejecting a new tree from the whole tree system and creating a new set of terminal node parameters for the new tree by applying p M j T j , R j , σ . Therefore, the concluding algorithm for the BART model can be described as follows: where R j indicates a vector of partial residuals in this model excluding tree j. In the current work, with the aid of the R programme and R packages, the Bayesian additive regression trees were constructed. This typically results in a complicated decision tree that needs to be "pruned" to express only the most relevant data. The gully and non-gully data have been divided as nominal data, such as 1 and 0, and formed the regression tree for each raster pixel. Thus, the binary predictor of BART was converted into GESM in the Arc GIS environment.

Support Vector Regression (SVR)
The SVR model is a supervised ML algorithm primarily developed by Vapnik et al. [83], and it is widely used by several research groups throughout the world. The SVR algorithm is an adaptation of the newly developed ML algorithm based on support vector machines classification [84]. In general, this model is used to develop structure and control complex functions in a system. Additionally, in a training dataset, the SVR model can maximize the nominal margin through the optimal regression task [85]. The SVR model is generally applied when the dataset is too complex, and this algorithm has the capacity to solve this problem by creating numerous curved margins [86]. In the SVR model, the relationship between input and output variables can be recognized through the structural risk minimization (SRM) norm [87]. Therefore, the calculation of SRM is necessary, and it is done using the following Equations (13) and (14): where the input data is represented by z = (z 1 , z 2 , . . . z n ) and the resultant value is represented by y b ∈ R l . In addition, v ∈ R l indicates the weight factor, c ∈ R l indicates the constant number of the mathematical function, and l represents the dataset size in the model. ∅(z) indicates the non-linear function to map the input dataset. The following equation can be used to define v and c, and was developed based on the SRM principles: Minimize : where the penalty factor is P, which balances the model flatness and its risk, ζ b , ζ * b indicates slack variables within the model, and ε represents the optimized performance of the model [88]. The Lagrangian function can be used to solve the optimization problem using the following equation: In which the Lagrangian multipliers are represented by Thereafter, SVR can be calculated by: where the kernel function may be expressed by m(z, z b ) = φ(z), φ(z b ) . SVR models require that a vector of real numbers can represent any environmental parameter. The GECFs were derived and converted by GIS into a cell raster. To transfer the grid raster into the data format, an interface program has been developed by LibSVM. The binary classification has only provided the outputs 0 and 1, so we selected the primitive outputs of the core programme for Gaussian distribution.
Bee Algorithm (BA) The BA was developed to understand the foraging behavior of honeybees, and it is an optimization algorithm generally used to determine the optimal solution [89,90]. The BA considers three kinds of bees, namely the employed bees, onlooker bees, and scout bees. The BA algorithm is completed in five phases, i.e., the phases of initialization, employed bee, onlooker bee, scout bee, and, finally, memorization [91]. The food source information such as the place and amount of food is carried by employed bees to onlooker bees. After that, onlooker bees select the attractive food sources based on nectar contents and fitness, which they calculate. On the other side, scout bees seek new food sources and search several areas. In a nutshell, it can be said that the role of employed and onlooker bees is exploiting while the role of scout bees is exploring [91]. In BA, the possible solution of a problem can be solved through searching for food sources. Here, food sources are D-dimensional vectors in which D represents a number of variables. The fitness of the model is determined by the amount of nectar at a given food source. The following equation is used to calculate the fitness and its limitation: Subject to, where V * 1 represents the fundamental harmonic, S represents switching the angles number, and h s indicates the Sth harmonic variables at the three-phase multilevel output.

Ensemble of SVR and Bee Algorithm
Multiple single models that may be statistical or machine learning have been merged to develop the overall performance, which is known as theensemble approach. In machine learning, the ensemble approach has the capability to improve the prediction accuracy than any single stand-alone model. Thus, keeping in view the advantages of the ensemble approach, it has been widely used for a comprehensive analysis of several hazard-related problems. Several established research works have been used in various ensemble approaches for gully erosion studies with a high prediction accuracy [37,92]. Therefore, in this study, we have also applied the ensemble approach of SVR and the Bee algorithm. Many types of kernel functions were used in the SVR model. In this ensemble model, the Bee algorithm functions have been used instead of the primary kernel function with the SVR model. The ensemble approach in this study was carried out in the 'R' statistical programming package, which is freely available and is the most popular software. This ensemble approach is more optimal than the stand-alone machine learning algorithm used in this study for GESM.

Validation and Accuracy Assessment
We carried out a validation and accuracy assessment of several predictive outcomes of the ML models used in this study using the accuracy (ACC) assessment, receiver operating characteristic (ROC) area with the under the curve (AUC), true skill statistic (TSS), and the Kappa coefficient index analysis. All these validation techniques are described in the following section.
The ACC assessment is used to estimate the number of pixels of hazards and non-hazards in an area, in our study, it is used to determine the gully head-cut and non-gully head-cut pixels. The following four indices of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) were used to assess the ACC [93]. Among the aforementioned four indices, TP and FP correctly represented the gully head-cut and non-gully head-cut pixels, while TN and FN incorrectly represented the gully head-cut and non-gully head-cut pixels [94]. The ACC result can be calculated using the equation: The success of a model, be it a statistical or ML model, can be predicted through the ROC-AUC analysis as it estimates the occurrence and non-occurrence of gully head-cuts on an X and Y axis [93,95], whereby the X axis signifies the sensitivity of the true positive rate (TPR) and the Y axis signifies 1-specificity of the false positive rate (FPR). The sensitivity and 1-specificity were correctly classified as gully head-cuts and non-gully head-cuts susceptibility zones. In a ROC-AUC analysis, training and validation datasets were used to verify the model goodness and prediction ability, respectively [96]. The lower and higher values of a ROC-AUC analysis are 0.5 and 1, whereby the lower (0.5) value indicates poor performance and the higher (1) value indicates a very good performance of the model. The ROC-AUC can be calculated by: where P and N represent the presence and absence of gully head-cuts, respectively. TSS is a statistical threshold dependent assessment matrix [97]. This validation method is based on the equal proportions of the events and non-events phenomenon in a validation dataset. The value of TSS ranges from +1 to −1, whereby +1 represents a good performance of the model and a value of less than zero indicates a poor performance of the model. TSS was estimated by: The reliability of several ML models used in GESM was validated through the Kappa coefficient index. When the Kappa value is close to −1, then the model is unreliable, and if the Kappa value is close to +1, then the model is reliable. In this model, the relationship between the estimated and observed values is poor when the Kappa value is <0 [98]. The overall Kappa value ranges can be divided into the following five groups: Slight (0-0.2), fair (0.2-0.4), moderate (0.4-0.6), substantial (0.6-0.8), and conditions (0.8-1) [99]. The following equation was used to calculate the Kappa coefficient index:

Multi-Collinearity Analysis
To stay within the VIF and TOL limit, the 20 gully causative factors that showed no MC were selected and the three variables (distance to road, topographic ruggedness index, and distance from fault) that had co-linearity problems were removed. The selected variables and their multi-collinearity results are shown in Table 2. The ranges of the TOL of the selected variables are 0.244 to 0.886, and the maximum and minimum VIF are 4.16 and 1.12, which indicate no multi-collinearity between the conditioning factors of gully erosion.

Exploratory Data Analysis
A total of 20 gully formation variables were scrutinized for this gully erosion susceptibility modelling. Each variable was filtered by the multi-collinearity analysis. Figure 5 presents the association between the training dataset (gully and non-gully data) and each variable. The slope mainly varies between 0 and 43 degrees in the Gandheswari watershed, but the gully locations are mainly located in the under 5-degree slope areas. The north, northeast, east, and southeast slope direction are the most prone to gully erosion. There is a significant effect of the topographic elevation on gully erosion, whereby the moderate elevation of 11 to 141 m was found to have the highest concentration of gullies. The flat profile curvature with low convex and concave (0 to 0.8 and −0.8) slope areas is also associated with gully locations. In terms of the plan curvature, about 50% of the training gully locations belong to the flat to slightly convex or concave plan curvature (0 to 1.28 and −1.28). Although the TWI ranges from 7 to 28, 90% of the gully locations lie within the TWI range of 9 to 18. The low to medium (2 to 10) SPI is associated with most of the gully locations. The 48 training gully locations are located in the very low drainage density (0 to 1 km 2 ) regions, and the rest of the gullies fall into the 1 to 4 sq. km drainage densities. Most of the gullies are found near the river, and the gullies slowly decrease with the increasing distance to the drainages. Being a small watershed, the variation of rainfall is a minute and gullies are found in the different rainfall ranges, but a low rainfall is mostly associated with gully locations. The NDVI value of 0.06 to 0.17 is found at about 80% of the gully locations, which means that the sparse vegetation areas are strongly associated with gully erosion. The moderate iron oxide (1.2 to 1.3), FMI (1.10 to 1.23), NDWI (0.06 to 0.24), and BSI (−5 to 9) areas of the watershed were found to contain most of the gullies. The severe soil erodibility (0.24 to 0.39) areas have most of the gullies. The qualitative variables of gully erosion are geology, geomorphology, LULC, and soil texture. The Gandheswari watershed is mainly situated on the geological formation of the Chotanagpur gneissic complex, and all the gullies were found in this formation. Ninety-six percent of the gullies were found in the geomorphological unit of the pediment pediplain complex. In addition, the cropland and the gravelly loam soil textural class are associated with most of the gully locations.

Exploratory Data Analysis
A total of 20 gully formation variables were scrutinized for this gully erosion susceptibility modelling. Each variable was filtered by the multi-collinearity analysis. Figure 5 presents the association between the training dataset (gully and non-gully data) and each variable. The slope mainly varies between 0 and 43 degrees in the Gandheswari watershed, but the gully locations are mainly located in the under 5-degree slope areas. The north, northeast, east, and southeast slope direction are the most prone to gully erosion. There is a significant effect of the topographic elevation on gully erosion, whereby the moderate elevation of 11 to 141 m was found to have the highest concentration of gullies. The flat profile curvature with low convex and concave (0 to 0.8 and −0.8) slope areas is also associated with gully locations. In terms of the plan curvature, about 50% of the training gully locations belong to the flat to slightly convex or concave plan curvature (0 to 1.28 and −1.28). Although the TWI ranges from 7 to 28, 90% of the gully locations lie within the TWI range of 9 to 18. The low to medium (2 to 10) SPI is associated with most of the gully locations. The 48 training gully locations are located in the very low drainage density (0 to 1 km 2 ) regions, and the rest of the gullies fall into the 1 to 4 sq. km drainage densities. Most of the gullies are found near the river, and the gullies slowly decrease with the increasing distance to the drainages. Being a small watershed, the variation of rainfall is a minute and gullies are found in the different rainfall ranges, but a low rainfall is mostly associated with gully locations. The NDVI value of 0.06 to 0.17 is found at about 80% of the gully locations, which means that the sparse vegetation areas are strongly associated with gully erosion. The moderate iron oxide (1.2 to 1.3), FMI (1.10 to 1.23), NDWI (0.06 to 0.24), and BSI (−5 to 9) areas of the watershed were found to contain most of the gullies. The severe soil erodibility (0.24 to 0.39) areas have most of the gullies. The qualitative variables of gully erosion are geology, geomorphology, LULC, and soil texture. The Gandheswari watershed is mainly situated on the geological formation of the Chotanagpur gneissic complex, and all the gullies were found in this formation. Ninety-six percent of the gullies were found in the geomorphological unit of the pediment pediplain complex. In addition, the cropland and the gravelly loam soil textural class are associated with most of the gully locations.

Spatial Mapping of Gully Erosion Susceptibility
Gully erosion susceptibility maps were created by applying the machine learning and the ensemble models for the Gandheswari watershed. The gully erosion susceptibility maps from all the mentioned models are presented in Figure 5. The maps have been divided into five different susceptibility zones using the natural break methods, and the same symbol is used for each zone of maps. These are veryhigh, high, medium, low, and verylow. All the models have shown consistency between the different susceptibility zones in the four models. The coverage of the very high, high, medium, low, and very low gully erosion susceptibility areas of the BRT model are 21.37, 19.95, 19.79, 19.50, and 19.39% (Figures 6a and 7). The coverage of the very high, high, medium, low, and very low gully erosion susceptibility areas in the BART model are 19.82, 21.50, 19.67, 19.54, and 19.46% (Figures 6b and 7). The BRT and BART model showed more or less the same areal coverage of the gully erosion susceptibility zones. In the case of the SVR model, the coverage for very high, high, medium, low, and very low susceptibility areas are 9.21, 22.19, 30.03, 26.44, and 12.13%, respectively (Figures 5c and 7). The maximum portion of the gully erosion susceptibility area are occupied by the medium and high zone, while the very low and very high zones are occupied by the small portion of the study area. In the ensemble model, the coverage of the very low, low, medium, high, and very high gully erosion susceptibility areas is 13.70, 27.21, 28.93, 21.14, and 9.02%, respectively (Figures 6d and 7). In addition, the maximum portion of the study area is occupied by the low, medium, and high susceptible zones. All the maps showed the upper part of the watershed to be highly susceptible to gully erosion, while the southern part and the mouth of the basin indicated low gully erosion susceptibility.

Spatial Mapping of Gully Erosion Susceptibility
Gully erosion susceptibility maps were created by applying the machine learning and the ensemble models for the Gandheswari watershed. The gully erosion susceptibility maps from all the mentioned models are presented in Figure 5. The maps have been divided into five different susceptibility zones using the natural break methods, and the same symbol is used for each zone of maps. These are veryhigh, high, medium, low, and verylow. All the models have shown consistency between the different susceptibility zones in the four models. The coverage of the very high, high, medium, low, and very low gully erosion susceptibility areas of the BRT model are 21.37, 19.95, 19.79, 19.50, and 19.39% (Figures 6a and 7). The coverage of the very high, high, medium, low, and very low gully erosion susceptibility areas in the BART model are 19.82, 21.50, 19.67, 19.54, and 19.46% (Figures 6b  and 7). The BRT and BART model showed more or less the same areal coverage of the gully erosion susceptibility zones. In the case of the SVR model, the coverage for very high, high, medium, low, and very low susceptibility areas are 9.21, 22.19, 30.03, 26.44, and 12.13%, respectively (Figures 5c and  7). The maximum portion of the gully erosion susceptibility area are occupied by the medium and high zone, while the very low and very high zones are occupied by the small portion of the study area. In the ensemble model, the coverage of the very low, low, medium, high, and very high gully erosion susceptibility areas is 13.70, 27.21, 28.93, 21.14, and 9.02%, respectively (Figures 6d and 7). In addition, the maximum portion of the study area is occupied by the low, medium, and high susceptible zones. All the maps showed the upper part of the watershed to be highly susceptible to gully erosion, while the southern part and the mouth of the basin indicated low gully erosion susceptibility. Remote Sens. 2020, 11, x FOR PEER REVIEW 25 of 40

Model Evaluation Performance Result
The AUC ranges from 0.5 to 1, whereby values closer to 0.1 indicate that the model has provided a perfect prediction, while values closer to 0.5 indicate that there is a problem with model fitting. Other model diagnostic tools such as Acc, TSS, and Kappa have also been used to find a suitable robust gully erosion susceptibility model (Figure 8). In this study, the models were evaluated by both the training and validation datasets of gully and non-gully locations to measure the success and prediction performance of the model. The evaluation result of the model performances in the training stage and testing stage for all the machine learning and the ensemble model are summarized in Figure 8. All the evaluation results indicated that the BAR, BART, SVR, and ensemble models perform well and that the data were sufficient for training and validating the gully data. The ensemble model is the most robust model in this study and yielded the highest AUC, Acc, TSS, and Kappa in the training (AUC, 0.960; Acc, 0.850; TSS, 0.590; Kappa, 0.641) and validation stage (AUC, 0.917; Acc, 0.801; TSS, 0.607; Kappa, 0.541). The AUC value of the ensemble model when considering the gully training points is 0.960, which indicates a high success capacity. The rest of the models also yielded the optimal accuracy, coming close to that of the ensemble model. The AUC of the SVR, BART, and BRT is 0.927, 0.902, and 89.5, respectively (Figure 9a). When we considered the validation dataset, the prediction rate capacity of ROC indicates the same trend of model robustness (Figure 9b). The values of Acc in the ensemble, SVR, BART, and BRT for the training datasets are 0.85, 0.792, 0.751, and 0.684 and the Acc values for the validation datasets are 0.801, 0.721, 0.674, and 0.622, respectively. The evaluation results of TSS and Kappa also showed the same trend of AUC and Acc in the training and testing stage. Therefore, it can be concluded that both predictions, the (testing dataset) and success (training dataset) rate showed little variation of results and a good accuracy of the models, which were deemed to produce realistic and accurate gully erosion susceptibility maps for delineating the gully erosion areas.

Model Evaluation Performance Result
The AUC ranges from 0.5 to 1, whereby values closer to 0.1 indicate that the model has provided a perfect prediction, while values closer to 0.5 indicate that there is a problem with model fitting.
Other model diagnostic tools such as Acc, TSS, and Kappa have also been used to find a suitable robust gully erosion susceptibility model (Figure 8). In this study, the models were evaluated by both the training and validation datasets of gully and non-gully locations to measure the success and prediction performance of the model. The evaluation result of the model performances in the training stage and testing stage for all the machine learning and the ensemble model are summarized in Figure 8. All the evaluation results indicated that the BAR, BART, SVR, and ensemble models perform well and that the data were sufficient for training and validating the gully data. The ensemble model is the most robust model in this study and yielded the highest AUC, Acc, TSS, and Kappa in the training (AUC, 0.960; Acc, 0.850; TSS, 0.590; Kappa, 0.641) and validation stage (AUC, 0.917; Acc, 0.801; TSS, 0.607; Kappa, 0.541). The AUC value of the ensemble model when considering the gully training points is 0.960, which indicates a high success capacity. The rest of the models also yielded the optimal accuracy, coming close to that of the ensemble model. The AUC of the SVR, BART, and BRT is 0.927, 0.902, and 89.5, respectively (Figure 9a). When we considered the validation dataset, the prediction rate capacity of ROC indicates the same trend of model robustness (Figure 9b). The values of Acc in the ensemble, SVR, BART, and BRT for the training datasets are 0.85, 0.792, 0.751, and 0.684 and the Acc values for the validation datasets are 0.801, 0.721, 0.674, and 0.622, respectively. The evaluation results of TSS and Kappa also showed the same trend of AUC and Acc in the training and testing stage. Therefore, it can be concluded that both predictions, the (testing dataset) and success (training dataset) rate showed little variation of results and a good accuracy of the models, which were deemed to produce realistic and accurate gully erosion susceptibility maps for delineating the gully erosion areas. Remote Sens. 2020, 11, x FOR PEER REVIEW 27 of 40   Figure 10 shows the two-dimensional graphical presentation of the observed gully erosion and the predictive gully erosion susceptibility for the BRT, BART, SVR, and ensemble models on a Taylor diagram. The diagram summarizes the multiple statistical aspects of the model performance in a single diagram (Taylor, 2001). The Taylor diagram represents the connection between the predicted and observed gully erosion in the Gandheswari watershed. All the machine learning and the ensemble models are similar in their ability to predict the gully erosion, but the ensemble models   Figure 10 shows the two-dimensional graphical presentation of the observed gully erosion and the predictive gully erosion susceptibility for the BRT, BART, SVR, and ensemble models on a Taylor diagram. The diagram summarizes the multiple statistical aspects of the model performance in a single diagram (Taylor, 2001). The Taylor diagram represents the connection between the predicted and observed gully erosion in the Gandheswari watershed. All the machine learning and the ensemble models are similar in their ability to predict the gully erosion, but the ensemble models  Figure 10 shows the two-dimensional graphical presentation of the observed gully erosion and the predictive gully erosion susceptibility for the BRT, BART, SVR, and ensemble models on a Taylor diagram. The diagram summarizes the multiple statistical aspects of the model performance in a single diagram (Taylor, 2001). The Taylor diagram represents the connection between the predicted and observed gully erosion in the Gandheswari watershed. All the machine learning and the ensemble models are similar in their ability to predict the gully erosion, but the ensemble models yielded the highest correlation and lowest standard deviation. Therefore, the ensemble gully erosion susceptibility model is the most robust model. yielded the highest correlation and lowest standard deviation. Therefore, the ensemble gully erosion susceptibility model is the most robust model.

Relative Importance of the Variables
The predisposing factors of gully erosion susceptibility modelling have been selected by considering the various available literature and physical characteristics of the study area, and then the multi-collinearity analysis for these factors was carried out. The assessment of important variables of gully erosion was done through the mean decrease accuracy, using the RF model, as shown in Table 3. The high mean decrease in accuracy represents the high importance of the input variables of the gully erosion susceptibility in this RF calculation. According to Table 3

Relative Importance of the Variables
The predisposing factors of gully erosion susceptibility modelling have been selected by considering the various available literature and physical characteristics of the study area, and then the multi-collinearity analysis for these factors was carried out. The assessment of important variables of gully erosion was done through the mean decrease accuracy, using the RF model, as shown in Table 3. The high mean decrease in accuracy represents the high importance of the input variables of the gully erosion susceptibility in this RF calculation. According to Table 3

Relative Importance of Sub-Classes of the Variables
Many gully erosion susceptibility studies showed only the assessment of important variables affecting gully erosion, but it is also significant to identify the sub-factors or sub-classes of each conditioning factor of gully erosion. The relationships of gully erosion with each sub-factor of gully conditioning factors were obtained from the SWARA weight analysis and are shown in Table 4. The SWARA weights of the sub-factors of each conditioning factor show that the high (5.75 to 15.24) slope areas are associated with a higher weight (0.41) and very likely prone to gully erosion. A very high slope (15.24 to 43.18) was not associated with gully formation, but the weight decreased with medium to very low slope areas. In terms of the aspect, all directions of the slope are associated with gully formation except the flat (0.05), west (0.06), and northeast (0.06) slope directions. The medium elevation class (80 to 141 m) was weighted as 0.61, which means it is strongly associated with the occurrence of gullies in the study area. Higher (0.28 to 0.84) classes of profile curvature are associated with an increased likelihood to affect gully erosion based on the weighting of 0.40. The five classes of plan curvature are associated with a high probability to affect gully formation, and the very high (0.86 to 2.96) class has the highest weight (0.36). High classes of TWI (14.05 to 17.24) and SPI (7.57 to 10.69) are associated with the maximum probability of gully erosion, and the weights of the components are 0.41 and 0.52, respectively. The very low drainage density (0 to 0.66 km/sq.km) class has a very high (0.36) probability of gully erosion, while the highest class of distance to the river (2172.30 to 3794.09 m) has a high SWARA weight (0.43). A low and medium rainfall (515.62 to 522.17 mm) is most favorable for gully formation, and gully erosion and the weights of the two classes are 0.37 and 0.39, respectively. The shrubland areas are prone to gully erosion because shrubland is often associated with barren land and the SWARA weight of the shrubland is 0.56. The NDVI is the most important factor affecting gully erosion, and in our study area, the NDVI ranges from −0.10 to 0.31 with the very high class being the most favorable to gully erosion with a weight of 0.34. Most of the area (96% of the study area) is covered by the Chotanagpur gneissic complex geological formation, so this geological formation has received the highest weight. The gravelly loam soil texture group with very low soil erodibility (0.00-0.13) has high weights (0.42 and 0.79), meaning that these areas are most favorable for gully erosion. Therefore, the dissected denudational hills and valleys are an effective geomorphological unit for probable gully erosion. Accordingly, low to very high FMI (0.95 to 1.28) and high to very high (1.29 to 2.04) iron oxide classes are considered being more favorable to gully erosion. The medium BSI covers 99% of the study area, so this class automatically receives the highest weight (1.0). The low NDWI (−0.41 to 0.10) class received a higher SWARA weight (0.31), so it is associated with gully erosion.

Discussion
The gully erosion susceptibility assessment is crucial for the preparation of erosion control and mitigation measures [100]. To obtain reliable and highly accurate maps of gully erosion susceptibility is a challenge for planners and managers [101]. To overcome the challenges, researchers have tried to propose novel susceptibility models and tested them in gully prone regions around the world. Different approaches, procedures, and models for the spatial prediction of natural hazards and disasters have been developed, and these models have been implemented around the world. The aims of all these methods are the same in all approaches [41]. For the past decades, many statistical and heuristic susceptibility methods have been used to understand different environmental hazards [38,102,103]. However, these above-mentioned susceptibility models have some limitations in their ability to analyze the multifarious connection between predictors and response, and, very recently, data mining/machine learning ensemble models have been developed for gully erosion susceptibility modelling [48,104,105]. In this paper, we proposed and evaluated three machine learning models and one ensemble model for gully erosion susceptibility modelling with the highest possible precision and suggested the most suitable model for the Gandheswari watershed. The proposed models are the BRT, BART, SVR, and the ensemble model of the Bee and SVR algorithm. Prediction, categorization, clustering, and elaboration of environmental hazards' data were assessed in the application of the machine learning approaches [41,106].
Gully erosion is controlled by several geo-environmental factors [107]. Therefore, the variables of the gully erosion susceptibility modelling have been incorporated based on suggested approaches in similar published research and the geo-environmental characteristics of the study area. The topographic, hydrological, geological, and anthropogenic (land use) attributes are the most important variables for the development of gullies around the world [107][108][109].
The spatial resolution of the input variables greatly affects the output result, and many topographical factors were derived from the DEM. We used a 12.5 m spatial resolution ALOS PALSAR DEM [109]. We selected the 20 individual gully conditioning factors based on the outcome of the multi-collinearity assessment. The multi-collinearity assessment of factors based on the TOL and VIF is the best method to check for a collinearity between the variables because this multi-collinearity check can increase the accuracy of the model [23]. According to the relative importance variables based on the mean decrease accuracy (MDA), the NDVI is the most important factor followed by the plan curvature. Other factors such as SPI, geology, drainage density, geomorphology, and elevation are responsible for gully erosion in this watershed. We examined the various available literature regarding the gully erosion susceptibility model and previous authors suggest that the relative importance of conditioning factors for gully erosion are area-specific and are thus, not transferable to other regions [100]. For some examples from eastern India, Saha et al. [41] identified the elevation, rainfall, NDVI, LULC, and slope as the most important factors, whereas Gayen et al. [110] reported that the LULC, drainage density, elevation, soil type, and distance from lineament are the most effective factors contributing to gully erosion. The SWARA weight is one of the methods used to define the weight of the sub-classes of the factors that play a dominant role in the modelling process [69]. The role of each individual class of each causal gully erosion factor is shown in Table 4.
To analyze the multifunction connection between the response (gully) and the predictors (gully conditioning factors), machine learning ensemble approaches were used for their ability to produce robust gully erosion susceptibility models [91]. The BRT models used two algorithms, i.e., boosting and regression, and, in this study, we used it for the spatial prediction of the gully erosion, but it has also been applied for many environmental hazards. Zabihi et al. [111] found that the BRT model is better for gully erosion susceptibility than the multivariate adaptive regression spline (MARS) model.BART, which is a newly invented tree classifier approach, provides admirable performance in both binary and continuous datasets and was successfully applied in this study [112]. The other interesting supervised machine learning algorithm is SVR, which is widely used in many susceptibility models, such as landslide susceptibility and groundwater potentiality [36,113]. In this study, the SVR model successfully predicts gully erosion. We applied the Bee algorithm with the SVR model and established a novel ensemble model, which yielded the best accuracy compared to the gully erosion susceptibility models based on the three stand-alone machine learning algorithms.
Furthermore, to compare the performance and robustness of the model, several researchers applied the AUC of ROC curve, accuracy, TSS, and Kappa index [37][38][39]. Each gully susceptibility model was evaluated by the above model evaluator in the training and testing stage. The results demonstrated that all the models performed well, but that the ensemble models have provided the best prediction of gully erosion. Several gully erosion susceptibility studies found that the ensemble algorithm of the machine learning model is the most suitable model [18,114]. A previous gully erosion susceptibility study in the Gandheswari watershed indicates that the multi-layer perception approach (MLPC) and its ensembles (MLPC-Bagging, MLPC-Dagging, and MLPC-Decorate) yielded the best result, but that the MLPC-Decorate is the best ensemble model with an AUC of ROC curve of 0.906 [23].

Conclusions
Various types of soil erosion processes can accelerate land degradation, and have a negative impact on agriculture. There are various types of soil erosion processes occurring at the surface, which are sheet erosion, rills, gullies, ravines, etc. Globally, gully erosion, or the development of gullies, is one of the most destructive processes of land degradation. In this study, the objective was the gully erosion susceptibility modelling and mapping for the Gandheswari watershed. Stand-alone and ensemble classifier models with gully inventory datasets and gully conditioning factors were successfully used to assess the gully erosion susceptibility. The AUC of ROC curve, accuracy, TSS, and Kappa index were employed to assess and compare the gully erosion susceptibility models. The results of the model evaluation from the training and validation gully inventory datasets have shown that the gully susceptibility models prepared based on the BRT, BART, SVR, and ensemble approaches performed best in terms of fitting and the prediction capability of the model. The output susceptibility maps and the model performance evaluation result indicate that the ensemble of the SVR-Bee model is the best-fit model, and it is most capable of predicting the occurrence of gullies in the study area. The relative importance result of the gully causative factors and their sub-classes showed that the NDVI and plan curvature are the most important factors. All four gully erosion susceptibility maps reveal a good consistency between the spatial zones of the different susceptibilities of gully erosion. Furthermore, the most valuable outcomes of the study are the gully susceptibility maps, which will help the local administrators, decision-makers, and planners manage land degradation and land use planning. In addition, the new susceptibility methods may help further research addressing various environmental hazards such as landslides, flooding, and soil erosion. Since no study is without any limitations, we need to emphasize that this study has not included the full hydrological modelling of gully systems, which is a challenging task for future studies.