Comparison of Support Vector Machine, Bayesian Logistic Regression, and Alternating Decision Tree Algorithms for Shallow Landslide Susceptibility Mapping along a Mountainous Road in the West of Iran

Abstract: This paper applies and compares three machine learning algorithms, support vector machine (SVM), Bayesian logistic regression (BLR), and alternating decision tree (ADTree), for mapping landslide susceptibility along the mountainous road of the Salavat Abad saddle, Kurdistan province, Iran. We identified 66 shallow landslide locations from field surveys, in which the locations were recorded with a global positioning system (GPS), and from Google Earth imagery and black-and-white aerial photographs (scale 1:20,000). We also compiled 19 landslide conditioning factors and tested them using the information gain ratio (IGR) technique. We checked the validity of the models using statistical metrics, including sensitivity, specificity, accuracy, kappa, root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC). We found that, although all three machine learning algorithms yielded excellent performance, the SVM algorithm (AUC = 0.984) slightly outperformed the BLR (AUC = 0.980) and ADTree (AUC = 0.977) algorithms. We conclude that all three algorithms are useful and effective tools for identifying shallow landslide-prone areas, and that the BLR algorithm, like the SVM algorithm, can be used as a soft computing benchmark for checking the performance of models in future studies.

All machine learning algorithms must be tested and validated in landslide-prone areas to select those with the highest performance and prediction accuracy. Therefore, the main aim of this study is to compare the efficiency of the BLR, SVM, and ADTree algorithms for landslide susceptibility mapping along a road section in Kurdistan province, Iran. BLR combines a Bayes-based theory algorithm with a logistic regression function, whereas ADTree is a decision tree algorithm whose performance in landslide susceptibility modeling has been confirmed in earlier studies [51,53,76–79]. In this study, we therefore compare a function-based algorithm (SVM) and a Bayes-based algorithm (BLR) with a decision tree-based algorithm (ADTree) for shallow landslide susceptibility modeling in the study area. SVM, in particular, can handle complex and non-linear datasets [35], and is thus a robust benchmark model that has been successfully used in landslide susceptibility mapping. This study is a pioneering step in the application of advanced predictive machine learning algorithms to landslide susceptibility research in the study area. Another objective is to check whether the BLR and ADTree algorithms, like SVM, can serve as benchmark models in landslide susceptibility mapping.

Study Area
The Salavat Abad saddle is located in southwest Kurdistan province, Iran (Figure 1). The study area covers about 18.7 km² and ranges in elevation from 1699 to 2500 m above sea level [19]. A road through the saddle, which connects Sanandaj City to Tehran, the capital of Iran, has strategic, economic, and socio-cultural importance. Much of Kurdistan province is located in the Zagros Mountains, a tectonically active range dominated by sedimentary and volcanic rocks [80].
The climate of the study area is influenced by warm Mediterranean air masses, resulting in rainfall and snowfall in winter, with an average precipitation of about 470 mm [19]. Many costly and fatal mass movements occur in the winter season.

Landslide Inventory Map
The dataset for this study comprises 66 landslides previously mapped by the Forest, Rangeland, and Watershed Management Organization of Iran [19]. We examined the landslides by reviewing aerial photographs (1:40,000 scale) and Google Earth imagery, and by inspection in the field. Most of the landslides are the result of slope modification due to road construction (Figure 2). In this study, each landslide polygon was converted to a point (its central point) and treated as one landslide location in the modeling procedure.

Landslide Conditioning Factors
Based on the literature, data availability, and our experience, we selected 18 landslide conditioning factors for modeling: slope angle, slope aspect, elevation, distance to road, topographic wetness index (TWI), normalized difference vegetation index (NDVI), lithology, land use/land cover, rainfall, distance to fault, plan curvature, profile curvature, slope length-angle index (LS), solar radiation, stream power index (SPI), distance to the river, river density, and fault density. The factors are described briefly in the following subsections.

Slope Aspect
The slope aspect is defined as the cardinal direction of the maximum slope [81]. We extracted nine slope aspect classes from the DEM with a resolution of 12.5 m, obtained from the Advanced Land Observing Satellite (ALOS) Phased Array L-type Synthetic Aperture Radar (PALSAR): north, northeast, northwest, east, southeast, southwest, south, west, and flat. Most landslides are located on southwest- and east-facing slopes (Figure 3b).

Elevation
An elevation map was extracted from the DEM, and its values were placed in five categories using the natural break classification method: 1557-1751, 1751-1917, 1917-2096, 2096-2300, and 2300-2515 m asl. Nearly half of the landslides are located in the lowest elevation class (Figure 3c).

Topographic Wetness Index
The TWI was introduced by Beven and Kirkby [82] in rainfall-runoff modeling to identify the impact of topography and wetness on rates of runoff. It can be computed as follows,

$$\mathrm{TWI} = \ln\left(\frac{\chi}{\tan\gamma}\right)$$

where $\chi$ is the specific catchment area and $\gamma$ is the slope angle (in degrees). We created a TWI layer and defined five categories using the natural break classification method: <6, 6-7, 7-8, 8-9, and >9. Most landslides are within the TWI > 9 class (Figure 3e).
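As a minimal sketch of the TWI definition, ln(χ / tan γ), the following Python functions compute the index for a single cell and assign it to one of the five classes listed above. The function names and the small minimum slope used to avoid division by zero on flat cells are assumptions made for illustration; they are not taken from the study's workflow.

```python
import math

def twi(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index ln(chi / tan(gamma)) for one cell."""
    # Assumption: clamp flat cells to a small slope so tan() is never zero.
    slope_rad = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area / math.tan(slope_rad))

def twi_class(value):
    """Assign a TWI value to one of the five classes used in the text."""
    upper_bounds = [6, 7, 8, 9]
    labels = ["<6", "6-7", "7-8", "8-9", ">9"]
    for upper, label in zip(upper_bounds, labels):
        if value < upper:
            return label
    return labels[-1]
```

A gently sloping cell with a large contributing area yields a high TWI, which is the class in which most landslides of the inventory fall.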

Normalized Difference Vegetation Index
The normalized difference vegetation index (NDVI) provides a measure of vegetation within an area [57]. NDVI can be formulated as follows,

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}$$

where Red and NIR are the red and near-infrared bands, respectively. The NDVI map was generated using Landsat 8 OLI imagery from 2017 and is shown in Figure 3f.
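The NDVI ratio of the near-infrared and red bands can be sketched per pixel as below. This is an illustrative helper, not the processing chain used in the study; returning 0 for a pixel with no signal in either band is an assumption.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat a no-signal pixel as NDVI = 0
    return (nir - red) / total

def ndvi_band(nir_band, red_band):
    """Apply ndvi() pixel by pixel to two equally sized lists of reflectances."""
    return [ndvi(n, r) for n, r in zip(nir_band, red_band)]
```

Values close to 1 indicate dense vegetation, whereas values near 0 or below indicate bare or sparsely vegetated ground.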

Lithology
A geology map of the study area at a scale of 1:100,000 was obtained from the Geological Survey of Iran (GSI). The lithological units are described in Table 1, and their categories are shown in Figure 3g.

Land Cover/Land Use
Our field survey indicated that most of the landslides in the study area have happened near the road where the vegetation has been removed. In this study, we extracted a land cover/land use map from the Kurdistan province land cover map printed at a scale of 1:100,000 (Figure 3h).

Rainfall
A mean annual rainfall map was prepared from data acquired from eight meteorological stations in and around the study area using the inverse distance weighted (IDW) interpolation method. We defined five categories: 413-419, 419-422, 422-426, 426-430, and >430 mm (Figure 3i).
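Inverse distance weighting estimates the value at an unsampled location as a weighted mean of the station values, with weights that decay with distance. The sketch below assumes a power of 2 and planar coordinates; both are illustrative choices, and the station values in the usage note are hypothetical rather than the study's data.

```python
import math

def idw(stations, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    stations: list of (station_x, station_y, value) tuples.
    power: distance exponent (assumed to be 2 here).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # the query point coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

For two hypothetical stations reporting 410 mm and 430 mm, the estimate midway between them is their mean, and it moves toward the nearer station as the query point approaches it.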

Distance to Fault
A map of fault distances was extracted from the geology map. We defined five categories based on the manual classification method: 0-50, 50-100, 100-150, 150-200, and >200 m (Figure 3j).

Plan Curvature
Plan curvature provides a measure of convergence or divergence of runoff on slopes [83]. Values can be positive (concave curvature), negative (convex curvature), or zero (flat slopes). A plan curvature layer was extracted from the DEM and divided into five categories using the natural break classification method.

Profile Curvature
Profile curvature can affect the velocity of runoff and thus erosion [48]. We extracted the profile curvature from the DEM and created five categories using the natural break classification method.

Stream Power Index (SPI)
SPI can be formulated as follows [84],

$$\mathrm{SPI} = A_r \times \tan\gamma$$

where $A_r$ is the specific catchment area and $\gamma$ is the slope angle. We created an SPI map from the DEM in SAGA software and then defined five categories based on the natural breaks classification method.

Distance to the River
A layer of river distance was created based on mapped rivers in the study area. We defined five categories based on the manual classification method: 0-50, 50-100, 100-150, 150-200, and >200 m (Figure 3p).

Fault Density
Fault density is defined as the total length of the faults within a standard area of 1 km² [85]. We prepared this map based on mapped faults and created five categories using the natural break classification method: 0-2, 2-4, 4-6, 6-8, and >8 km/km² (Figure 3r).

Support Vector Machine
Support vector machine (SVM) is a machine learning method used for classification and regression [86]. Its main objective is to separate the classes with the widest possible margin using a linear decision boundary. To do so, it maps the input data of the training dataset into a higher-dimensional feature space using a mapping function φ. A linear surface, the separating hyperplane, then divides the data into two classes (in this study, landslide and non-landslide). SVM minimizes classification error by separating the data with this hyperplane. Training points closest to the hyperplane are termed 'support vectors' [87].
Consider a set of training vectors $x_i$, $i = 1, 2, \ldots, n$, that belong to two classes denoted by $y_i = \pm 1$. The support vector machine maximizes the margin between the two classes by finding an n-dimensional hyperplane (Figure 4), expressed as follows,

$$\min \; \frac{1}{2}\lVert w \rVert^{2}$$

with the following condition,

$$y_i\left((w \cdot x_i) + b\right) \geq 1$$

where $w$ is the normal to the separating hyperplane, $b$ is a scalar offset, and $(\cdot)$ signifies the scalar product. The following is obtained using Lagrangian multipliers,

$$L = \frac{1}{2}\lVert w \rVert^{2} - \sum_{i=1}^{n} \lambda_i \left( y_i\left((w \cdot x_i) + b\right) - 1 \right)$$

where $\lambda_i$ is the Lagrangian multiplier. The Lagrangian can be minimized with respect to $w$ and $b$ using standard procedures. For cases that are noisy and not linearly separable (Figure 4), slack variables $\xi_i$ can be introduced, in which case the condition becomes:

$$y_i\left((w \cdot x_i) + b\right) \geq 1 - \xi_i$$

Appl. Sci. 2020, 10, x FOR PEER REVIEW 11 of 28
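To make the role of the slack variables concrete, the sketch below evaluates the margin condition y_i((w · x_i) + b) ≥ 1 − ξ_i for a given hyperplane and sums the result into a soft-margin objective. The penalty parameter `C` and the form of the objective (half the squared norm plus C times the total slack) are a common textbook formulation assumed here for illustration; the text does not state which variant or solver was used, and this code does not train a model.

```python
def decision_value(w, b, x):
    """Signed score (w . x) + b of a sample relative to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def slack(w, b, x, y):
    """Smallest xi >= 0 that satisfies y * ((w . x) + b) >= 1 - xi."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))

def soft_margin_objective(w, b, samples, labels, C=1.0):
    """0.5 * ||w||^2 + C * sum(xi); C is an assumed penalty parameter."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    slack_term = sum(slack(w, b, x, y) for x, y in zip(samples, labels))
    return margin_term + C * slack_term
```

A sample that lies on the correct side of the margin has zero slack, whereas a sample inside the margin or on the wrong side contributes a positive penalty.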

Bayesian Logistic Regression
Bayesian logistic regression (BLR) has been used to relate a two-state dependent variable to the effective factors of landslides [88]. With this method, a logistic regression model is created based on the relations between dependent and independent variables. Then, a Bayesian function is applied based on the behavior and response of the effective factors using a prior probability function [88]. A Bayesian function is created in three stages, as follows [88]:

(1) Determine the prior probability of the parameters
(2) Determine the likelihood function for the data
(3) Create a posterior distribution function for the parameters

If $x = (x_1, x_2, \ldots, x_n)$ is a training sample described by the landslide conditioning factors, and $y = (y_1, y_2)$ is the dependent variable (landslides and non-landslides), a logistic function gives the posterior probability of a sample belonging to a specific class,

$$P(\mathrm{class} = 1 \mid x) = \frac{1}{1 + \exp\left(b + w_0\,c + \sum_{i=1}^{n} w_i\, f(x_i)\right)}$$

where $x_i$ are the effective factors, $c$ is the prior log odds ratio ($c = \log \frac{P(\mathrm{class}=0)}{P(\mathrm{class}=1)}$), and $b$ is the bias. $w_0$ and $w_i$ are the weights learned from the training data, and for the $i$th factor the function $f(x_i)$ is calculated as $\log \frac{P(x_i \mid \mathrm{class}=0)}{P(x_i \mid \mathrm{class}=1)}$ (for binary variables). A univariate Gaussian prior is placed on the weights in the Bayesian logistic regression model,

$$p(w_i \mid \sigma_i) = N(0, \sigma_i) = \frac{1}{\sqrt{2\pi\sigma_i}} \exp\left(-\frac{w_i^{2}}{2\sigma_i}\right)$$

where 0 and $\sigma_i$ are the mean and variance, respectively [89].
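The quantities named in this section can be put together in a short sketch. It assumes, based on the definitions of c and f(x_i) as class-0 to class-1 log ratios, that the logistic function returns the probability of class 1 and that the exponent enters with a positive sign; this reading, like the function names, is an assumption rather than a statement of the implementation used in the study.

```python
import math

def prior_log_odds(p_class0, p_class1):
    """c = log(P(class=0) / P(class=1))."""
    return math.log(p_class0 / p_class1)

def posterior_class1(b, w0, c, weights, f_values):
    """Logistic posterior 1 / (1 + exp(b + w0*c + sum(w_i * f(x_i)))).

    Assumption: the exponent is a class-0 versus class-1 log-odds score,
    so larger values lower the probability of class 1.
    """
    z = b + w0 * c + sum(w * f for w, f in zip(weights, f_values))
    return 1.0 / (1.0 + math.exp(z))

def gaussian_prior(w, variance):
    """Zero-mean univariate Gaussian prior density on a weight."""
    return math.exp(-w * w / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)
```

With equal class priors (c = 0) and no factor evidence, the posterior is 0.5; evidence favoring class 0 (positive f values with positive weights) pushes it below 0.5. The Gaussian prior assigns the highest density to weights near zero, which regularizes the fit.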

Alternating Decision Tree
The alternating decision tree (ADTree) algorithm performs classification by combining decision tree and boosting algorithms [90], bridging the gap between the two approaches. The tree consists of decision nodes and prediction nodes. A decision node expresses a condition, and a prediction node holds a numerical value [91]. ADTree first finds the best constant prediction value for the training data at the root of the tree. The tree is then grown iteratively using the boosting algorithm, with a new rule added in each repetition; each rule adds a decision node and two prediction nodes [90]. The algorithm assigns a weight to each prediction node, and an instance is classified by summing the weights of all prediction nodes it reaches [92].
Let $(x_1, y_1), \ldots, (x_m, y_m)$ be the pixels in the training data, where $x_i \in R^d$ and $y_i$ labels the occurrence or non-occurrence of landslides. The boosting algorithm grows the tree, with each repetition ($t$) maintaining two sets: a set of preconditions ($P_t$) and a set of rules ($R_t$). A set of base conditions, $C$, is created by the weak algorithm in each repetition of the boosting algorithm. The algorithm works as follows:

1-Initialization
Let $R_1$ consist of a single base rule whose precondition (related to the selection of a prediction node for entering the algorithm) and condition (related to the decision node at the root of the tree) are both true ($T$). The first prediction value is obtained by the following equation,

$$a = \frac{1}{2}\ln\frac{W_{+}(T)}{W_{-}(T)}$$

where $W_{+}(c)$ and $W_{-}(c)$ are the sums of the weights of the positive and negative training samples, respectively, that satisfy condition $c$.

2-Pre-adjustment
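The training samples are reweighted by

$$w_{i,1} = w_{i,0}\,\exp(-a\,y_i)$$

Then, for each repetition $t$:

- Create a set of conditions $C$ with the weak algorithm using the weight $w_{i,t}$ of each training sample.
- For each precondition $c_1 \in P_t$ and each condition $c_2 \in C$, calculate:

$$Z_t(c_1, c_2) = 2\left(\sqrt{W_{+}(c_1 \wedge c_2)\,W_{-}(c_1 \wedge c_2)} + \sqrt{W_{+}(c_1 \wedge \neg c_2)\,W_{-}(c_1 \wedge \neg c_2)}\right) + W(\neg c_1)$$

- Select the $c_1$, $c_2$ with minimal $Z_t(c_1, c_2)$ and form $R_{t+1}$ by adding to $R_t$ a rule $r_t$ whose precondition and condition are, respectively, $c_1$ and $c_2$, and whose two prediction values are:

$$a = \frac{1}{2}\ln\frac{W_{+}(c_1 \wedge c_2)}{W_{-}(c_1 \wedge c_2)}, \qquad b = \frac{1}{2}\ln\frac{W_{+}(c_1 \wedge \neg c_2)}{W_{-}(c_1 \wedge \neg c_2)}$$

- Form $P_{t+1}$ by adding $c_1 \wedge c_2$ and $c_1 \wedge \neg c_2$ to $P_t$.
- Update the weights for each repetition based on the following equation:

$$w_{i,t+1} = w_{i,t}\,\exp\left(-r_t(x_i)\,y_i\right)$$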

3-Output
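The output is the sign of the sum of the predictions of all base rules in $R_{T+1}$:

$$\mathrm{class}(x) = \mathrm{sign}\left(\sum_{t=1}^{T} r_t(x)\right)$$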
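The way a trained alternating decision tree scores an instance can be sketched as follows: each rule contributes one of its two prediction values when its precondition holds and nothing otherwise, and the class is the sign of the total. The rules, thresholds, and prediction values below are hypothetical and chosen only for illustration; they are not the tree learned in this study, and the sketch covers scoring only, not the boosting loop.

```python
import math

def prediction_value(weight_positive, weight_negative):
    """Half the log ratio of positive to negative weight, 0.5 * ln(W+ / W-)."""
    return 0.5 * math.log(weight_positive / weight_negative)

def rule_output(rule, x):
    """Evaluate one rule (precondition, condition, a, b) on instance x."""
    precondition, condition, a, b = rule
    if not precondition(x):
        return 0.0  # the rule does not apply to this instance
    return a if condition(x) else b

def adtree_score(rules, x):
    """Sum the contributions of all rules; the sign gives the class."""
    return sum(rule_output(rule, x) for rule in rules)

def adtree_class(rules, x):
    # Assumption: a score of exactly zero is assigned to the positive class.
    return 1 if adtree_score(rules, x) >= 0 else -1

# Hypothetical rules for illustration only.
always = lambda x: True
near_road = lambda x: x["dist_road_m"] < 100
sparse_vegetation = lambda x: x["ndvi"] < 0.2
example_rules = [
    (always, always, -0.2, 0.0),              # root prediction
    (always, near_road, 0.8, -0.6),           # first decision node
    (near_road, sparse_vegetation, 0.4, -0.3) # applies only near the road
]
```

With these hypothetical rules, a pixel 50 m from the road with sparse vegetation receives a positive score and is classified as a landslide, whereas a pixel 300 m from the road receives a negative score.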

Multicollinearity Tests
Correlation between the conditioning factors introduces redundancy that affects landslide modeling and the accuracy of the results. Therefore, a multicollinearity test of the conditioning factors is necessary when evaluating landslide susceptibility models. For this analysis, two measures, the tolerance (TOL = 1 − R²) and the variance inflation factor (VIF = 1/TOL), have been used [93,94]. If TOL < 0.1 or VIF > 10, there is multicollinearity among the factors, and the factor concerned should be removed from the modeling process [48,75].
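The two multicollinearity measures follow directly from the coefficient of determination R² obtained when one factor is regressed on the others. The sketch below starts from a given R² value (how R² is estimated is left out) and applies the commonly used thresholds TOL < 0.1 or VIF > 10; the function names are illustrative.

```python
def tolerance_and_vif(r_squared):
    """Return (TOL, VIF) with TOL = 1 - R^2 and VIF = 1 / TOL."""
    tol = 1.0 - r_squared
    vif = float("inf") if tol == 0.0 else 1.0 / tol
    return tol, vif

def is_collinear(r_squared, tol_threshold=0.1, vif_threshold=10.0):
    """Flag a factor when TOL is below or VIF is above its threshold."""
    tol, vif = tolerance_and_vif(r_squared)
    return tol < tol_threshold or vif > vif_threshold
```

For example, a factor with R² = 0.5 has TOL = 0.5 and VIF = 2 and is retained, whereas a factor with R² = 0.95 has TOL = 0.05 and VIF = 20 and would be flagged.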

Selecting the Most Important Conditioning Factors by IGR
Several methods have been used to determine the importance of different factors for landslide occurrence, notably fuzzy-rough theory [95], the relief algorithm [96], information gain, and information gain ratio [97]. Information gain specifies the amount of information that a factor can provide about the class. It selects factors with high levels of probability and does not consider factors with a low entropy level. This is achieved using the IGR index, which was introduced by Quinlan in 1996 [98]. Effective factors for prediction have high IGR values. In this study, we evaluated the importance of our 18 conditioning factors using the average merit (AM) of the IGR technique [62]. The AM is the weight computed by the IGR feature selection technique, and it quantitatively determines the importance and ranking of the factors [62].
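Assume $S$ is the training dataset with $n$ input samples, and $n(L_i, S)$ is the number of training samples in $S$ belonging to class $L_i$ (landslide and non-landslide). Then:

$$\mathrm{Info}(S) = -\sum_{i=1}^{2} \frac{n(L_i, S)}{|S|}\,\log_2 \frac{n(L_i, S)}{|S|}$$

If we consider the factors impacting landslides, the information needed after dividing $S$ into subsets $(S_1, S_2, \ldots, S_m)$ is as follows:

$$\mathrm{Info}(S, A) = \sum_{j=1}^{m} \frac{|S_j|}{|S|}\,\mathrm{Info}(S_j)$$

The following equation is used to calculate the information gain ratio for each effective factor, for example, slope angle ($A$):

$$\mathrm{IGR}(S, A) = \frac{\mathrm{Info}(S) - \mathrm{Info}(S, A)}{\mathrm{SplitInfo}(S, A)}$$

SplitInfo is the information generated by dividing the training data $S$ into $m$ subsets, using the following equation:

$$\mathrm{SplitInfo}(S, A) = -\sum_{j=1}^{m} \frac{|S_j|}{|S|}\,\log_2 \frac{|S_j|}{|S|}$$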
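The information gain ratio of a single categorical factor can be computed with a few lines of Python, as sketched below. The sketch follows the standard definition (entropy reduction divided by split information); returning 0 when the factor has only one class is an assumption made to avoid division by zero, and the factor classes in the usage note are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Info(S): class entropy of a list of labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain_ratio(factor_classes, labels):
    """IGR of one factor, given the factor class and label of each sample."""
    n = len(labels)
    subsets = {}
    for factor_class, label in zip(factor_classes, labels):
        subsets.setdefault(factor_class, []).append(label)
    info_after_split = sum(len(s) / n * entropy(s) for s in subsets.values())
    split_info = -sum(len(s) / n * math.log2(len(s) / n) for s in subsets.values())
    if split_info == 0:
        return 0.0  # assumption: a factor with a single class carries no information
    return (entropy(labels) - info_after_split) / split_info
```

A factor whose classes separate landslide from non-landslide samples perfectly (for instance, hypothetical "near" and "far" road classes) reaches the maximum ratio, whereas a factor whose classes each contain the same mix of labels scores 0 and would be dropped.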

Validation and Comparison of the Models
In this study, we evaluated model accuracy using the following metrics: sensitivity, specificity, accuracy, kappa, root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC). There are four possible classification outcomes: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). TP is the number of predicted landslides that are truly landslides. FP is the number of predicted landslides that are non-landslides. TN is the number of predicted non-landslides that are truly non-landslides, whereas FN is the number of predicted non-landslides that are landslides. Better predictive ability is indicated by higher values of sensitivity, specificity, AUC, and accuracy and by lower values of RMSE [21]. A kappa index value of 1 indicates an ideal model, whereas a value of −1 signifies a non-reliable model. The metrics are expressed as follows,

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Kappa} = \frac{O - E}{1 - E}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_{\mathrm{predicted},i} - X_{\mathrm{actual},i}\right)^{2}}$$

where $O$ and $E$ are the observed and expected agreement, respectively, $X_{\mathrm{predicted},i}$ and $X_{\mathrm{actual},i}$ are the predicted and observed values of the $i$th instance, and $n$ is the number of instances. The receiver operating characteristic (ROC) curve has been used to test the overall performance of LSM methods [93]. The area under the ROC curve is a statistical summary of the overall performance of the models [72]. The ROC curve plots sensitivity on the y-axis against 100 − specificity on the x-axis. The values of AUC range from 0 to 1, with values closer to 1 indicating a better predictive ability; an AUC value of 1 indicates perfect model performance [94]. The schematic diagram of the methodology is illustrated in Figure 4.

Table 2 shows the correlation between the 18 conditioning factors. The results indicate that there is no multicollinearity problem among the factors, and all of them can be used as inputs to the modeling procedure with the machine learning algorithms.
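The validation metrics used in this study can be computed from the four confusion-matrix counts and the predicted scores, as sketched below. Kappa is computed for the two-class case from observed and chance agreement, and AUC is computed with the rank-based definition (the probability that a randomly chosen landslide scores higher than a randomly chosen non-landslide, with ties counted as one half). The function names and the counts in the usage note are hypothetical, not values from the study.

```python
import math

def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and kappa from confusion counts."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    # Chance agreement from the row and column totals of the 2 x 2 table.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": accuracy,
        "kappa": (accuracy - expected) / (1.0 - expected),
    }

def rmse(predicted, actual):
    """Root mean square error between predicted and observed values."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def auc(scores, labels):
    """Rank-based AUC; labels are 1 (landslide) or 0 (non-landslide)."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))
```

For hypothetical counts TP = 8, FP = 2, TN = 8, and FN = 2, sensitivity, specificity, and accuracy are all 0.8 and kappa is 0.6, which illustrates that kappa discounts the agreement expected by chance.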

Landslide Modeling and Evaluation Process
After selecting significant conditioning factors, we performed the modeling process on the training dataset using SVM, BLR, and ADTree and then tested the results. The goodness-of-fit analysis indicates that all landslide models predict the spatial distribution of landslides well (Table 4). Next, we assessed the predictive power of the models using the validation dataset (Table 5). The SVM model yielded the highest sensitivity (85.7%), followed by the BLR (78.6%) and ADTree (71.4%) models. The SVM model also provided the highest specificity value (91.7%), followed by the BLR (83.3%) and ADTree (81.8%) models. Accuracy values for the three models are 88.5% (SVM), 80.8% (BLR), and 76.0% (ADTree). Kappa (0.869) and AUC (0.976) for the SVM model are greater than the corresponding values for the BLR and ADTree models. Finally, the SVM model has the lowest RMSE (0.251), followed by the BLR (0.277) and ADTree (0.343) models. Overall, the results from both the training and validation datasets show that the SVM model outperformed the BLR and ADTree models in predicting the locations of landslides in the study area. In addition to comparing the models using statistical metrics, we assessed the efficiency of the three algorithms based on CPU time during model implementation. The SVM algorithm required 0.03 s of CPU time for both the training and validation datasets, whereas BLR required 0.05 s for the training dataset and 0.03 s for the validation dataset. ADTree, which had the lowest goodness-of-fit and prediction accuracy, required 0.09 s and 0.06 s for the training and validation datasets, respectively.

Development of Landslide Susceptibility Maps
After training the SVM, BLR, and ADTree machine learning models with the training dataset and validating them with the validation dataset, we ran the models and obtained outputs as weights (landslide susceptibility indexes, LSIs). LSIs were assigned to each pixel of the study area to construct the landslide susceptibility maps. A variety of classification methods are available in ArcGIS, including manual, equal interval, natural breaks, quantile, geometric interval, and standard deviation [16,35,98–100]. We selected the most commonly used methods, namely natural breaks, quantile, and geometric interval, for reclassifying the LSIs to create the landslide susceptibility maps. In the natural breaks method, no jump is detected in the values [101]. However, the quantile and geometric interval methods essentially split the distribution of susceptibility values into equal divisions, with similar proportions of the total area attributed to each class [102].
As the choice of reclassification method depends on the LSI histogram [99,103,104], we prepared histograms of landslide pixels against susceptibility classes for the three methods. A classification method is considered better when its histogram places most landslide pixels in the high susceptibility (HS) and very high susceptibility (VHS) classes. On this basis, we chose the quantile classification method for the landslide susceptibility maps derived using the SVM and ADTree models, and the geometrical interval method for the map based on the BLR model (Figure 5). Each map has five susceptibility classes: very low susceptibility (VLS), low susceptibility (LS), moderate susceptibility (MS), high susceptibility (HS), and very high susceptibility (VHS) (Figure 6).
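Quantile classification assigns roughly the same number of pixels to each class, which can be sketched as follows. This is a simplified, index-based version written for illustration; the break positions computed by ArcGIS may differ in detail (for example, in how ties and interpolation are handled), and the function names are assumptions.

```python
from bisect import bisect_right

SUSCEPTIBILITY_LABELS = ["VLS", "LS", "MS", "HS", "VHS"]

def quantile_breaks(values, n_classes=5):
    """Class boundaries that place about the same number of values in each class."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_classes] for k in range(1, n_classes)]

def susceptibility_class(lsi, breaks):
    """Map one landslide susceptibility index to its class label."""
    return SUSCEPTIBILITY_LABELS[bisect_right(breaks, lsi)]
```

Counting how many landslide pixels fall into each label for each classification method gives the kind of histogram used above to choose between methods.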

Evaluation of Landslide Susceptibility Maps
We used the ROC curves for the training and validation datasets to evaluate the machine learning models. The probabilities of landslides calculated for the training and validation datasets provide measures of the performance and prediction accuracy of the models, respectively [11]. The ROC curves plot sensitivity on the y-axis against 100 − specificity on the x-axis. The performance and prediction accuracy of the three models are shown in Figure 7a,b, respectively. The performance of the SVM model is slightly higher (AUC = 0.988) than that of the BLR (AUC = 0.985) and ADTree (AUC = 0.977) models. The prediction accuracy of SVM is also slightly higher (AUC = 0.984) than that of the BLR (AUC = 0.980) and ADTree (AUC = 0.977) models.

Discussion
The susceptibility of an area to landslides is a function of different possible conditioning factors. Because not all factors necessarily have predictive capability, the most important ones must be objectively chosen to strengthen the performance and accuracy of the learning algorithms in the training phase. In this study, we used the IGR technique to identify factors with high predictive power. The weight of each factor in the training phase was calculated using an entropy index. We tested 18 conditioning factors with a raster resolution of 10 m and found that 10 factors were significant: distance from the road, normalized difference vegetation index, land use, slope aspect, lithology, slope angle, precipitation, distance from faults, elevation, and topographic wetness index. Eight factors were removed from final modeling because they had AM values of 0: distance from stream, slope-length, annual solar radiation, profile curvature, plan curvature, fault density, drainage density, and stream power index.
Researchers have used three main methods to display classes on landslide susceptibility maps: the natural break, geometrical interval, and quantile methods. We statistically tested the three methods for producing maps using the three machine learning algorithms investigated in this study. The natural breaks classification method was selected for the SVM and ADTree models, and the quantile method for the BLR model. Most researchers in landslide susceptibility mapping have confirmed the capability of the natural break method to classify the LSIs [105–109]. The quantile method is generally regarded as the most effective and most commonly used of the classification methods [104,110,111]. Nhu et al. [16] applied the natural break, geometrical interval, and quantile methods and, based on their histograms of landslide probability values, selected the natural break classification method for the random forest algorithm and the geometrical interval method for the three ensemble models of rotation forest-based random forest (RF-RAF), bagging-based random forest (BA-RAF), and random subspace-based random forest (RS-RAF) to produce shallow landslide susceptibility maps.
Distance to the road is the factor most closely related to landslides in the study area. All the susceptibility maps showed that most landslides are less than 100 m from the road through the study area. The road is located at high elevation in wet areas (high topographic wetness index), and elevation and wetness are themselves significant landslide conditioning factors. In recent years, the road through the study area has been widened, and new bridges have been constructed, changing the landscape and initiating instability along the road. The road is also trafficked by trucks and other heavy vehicles. Therefore, these factors should be given more consideration during road widening and other engineering construction in the future.

The second and third most crucial landslide conditioning factors are the normalized difference vegetation index and land use. Most landslides in the study area have happened in unvegetated or sparsely vegetated areas, including rangeland.
Slope aspect is another critical factor, as landslides tend to occur on slopes oriented toward the northwest and west because these aspects experience more precipitation and runoff than other aspects. Precipitation in the study area is higher than the average for the country. The landslide susceptibility maps and their histograms for the three models indicate that the landslide-prone areas belong to the very high susceptibility class.
After selecting appropriate landslide conditioning factors, we prepared landslide susceptibility maps using three machine learning models and the natural break, geometrical interval, and quantile classification methods. We found that the natural break and quantile methods were most consistent with conditions observed in the study area. We have shown that the SVM algorithm has the highest goodness-of-fit and prediction accuracy of the three machine learning algorithms tested in this study based on both training and validation datasets. This result is consistent with the findings of other landslide researchers [22,112–117]. For example, Kalantar et al. [115] compared the performance of SVM, LR, and ANN for landslide assessment in a catchment in the Dodangeh watershed, Mazandaran province, Iran. They concluded that SVM outperformed the other models and was therefore the most powerful algorithm for landslide modeling in their study area. Abedini et al. [116] compared the performance of the SVM and LMT models for landslide susceptibility mapping in Kamyaran county, also in Kurdistan province, and confirmed the superiority of the SVM model. SVM has also been successfully used in landslide susceptibility mapping in the Cameron Highlands, Malaysia [19]. In contrast, Nhu et al. [15] compared LMT, LR, NBT, ANN, and SVM models for landslide susceptibility mapping in Bjar city, Kurdistan province, and found that LMT had the highest, and SVM the lowest, goodness-of-fit and prediction accuracy.
To the best of our knowledge of the literature on landslide susceptibility mapping, SVM can be successfully used as a benchmark machine learning algorithm against which new ensemble models are compared [118,119]. For example, Pham et al. [119] proposed a new ensemble model consisting of random subspace-based classification and regression tree (RSSCART) for landslide modeling and assessment in the Luc Yen district of Yen Bai province, Viet Nam. They compared their new ensemble model with the SVM benchmark model. SVM offers several advantages over other machine learning models: (i) it is free from feature selection techniques that are required by other models such as decision trees; (ii) it can handle complex and non-linear problems with large datasets; and (iii) it solves the convex quadratic programming optimization problem of finding the separating hyper-plane and thus is a suitable replacement for artificial neural networks [35,113,120].
Our results indicate that the BLR algorithm outperformed ADTree in the landslide modeling process and susceptibility map assessment. BLR is an LR method, within the Bayesian paradigm, that includes a posterior distribution function for evaluating each landslide conditioning factor. BLR also offers several advantages that make it a robust algorithm for modeling: (i) it can estimate probability intervals of landslide occurrence; (ii) it can be used with small samples, as it does not rely on large-sample approximations; (iii) available prior information about regression coefficients can be incorporated in the Bayesian model; and (iv) multi-level data or models are particularly suited to the hierarchical structure of Bayesian modeling [121,122]. The performance and prediction accuracy of the BLR model have been confirmed and reported, not only for landslide modeling [65,88], but also for flood [123] and land subsidence [92] susceptibility mapping.
ADTree has been suggested and used by some environmental researchers [53,92,124]. An advantage of using ADTree is that it has the fastest induction time for domain problems with few discriminative features [125]. Moreover, it has been successfully used as a base classifier in coupled ensemble models, including MultiBoost (MB), bagging (BA), random subspace (RS), and rotation forest (RF), for landslide susceptibility mapping [51,78]. Our results will be useful to landslide hazard managers, decision-makers, and researchers when selecting the most appropriate models for landslide susceptibility mapping. However, we acknowledge the limitations of the present study, chiefly uncertainties in the input data. For example, results can differ depending on the sample size and raster resolution. Shirzadi et al. [78] studied these uncertainties and suggested a raster resolution of 10 m for training/validation sample sizes of 60/40% and 70/30%, and a resolution of 20 m for sample sizes of 80/20% and 90/10%. Another limitation of the current study is related to model selection. Each algorithm has a specific probability distribution function or rule, not all of which fit a given training dataset. Therefore, it is necessary to test the models and select the best one for a given study. This process is mainly done by trial and error and is time-consuming.
A limitation of landslide susceptibility mapping, in general, is that maps generated with machine learning techniques can accurately show where landslides are likely to occur based on geo-environmental factors, but the important physical, mechanical, and elastic properties of soil such as porosity, permeability, cohesion, and pore water pressure are not considered. These soil-related properties strongly control landslide occurrence at the site scale, yet preparing maps showing their distributions is costly and time-consuming. We recommend that researchers consider these factors and, in particular, use them in conjunction with slope stability models and deterministic numerical models that address the factor of safety (FOS), for example, Shallow Landsliding Stability (SHALSTAB) and Stability Index MAPping (SINMAP). These models couple a hydrologic model with an infinite slope form of the Mohr-Coulomb failure law to spatially predict slope failures. Therefore, one way to enhance the accuracy of susceptibility maps in the future is to include soil-related factors.
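The role of the soil-related properties can be illustrated with the standard infinite slope form of the Mohr-Coulomb failure law, in which the FOS is the ratio of resisting to driving shear stress on a slip surface parallel to the ground. The sketch below is a generic textbook formulation, not the SHALSTAB or SINMAP implementation, and it omits the hydrologic model that those tools use to estimate pore water pressure; the function name, parameter names, and example values are assumptions for illustration.

```python
import math


def infinite_slope_fos(cohesion_kpa, friction_angle_deg, slope_deg,
                       unit_weight_kn_m3, depth_m, pore_pressure_kpa=0.0):
    """Factor of safety for an infinite slope with a Mohr-Coulomb soil.

    FOS = [c' + (gamma * z * cos^2(beta) - u) * tan(phi')]
          / (gamma * z * sin(beta) * cos(beta))
    """
    if not 0.0 < slope_deg < 90.0:
        raise ValueError("slope_deg must be between 0 and 90 (exclusive)")
    beta = math.radians(slope_deg)
    phi = math.radians(friction_angle_deg)
    # Total normal and shear stress on the slip surface at depth z (kPa).
    normal_stress = unit_weight_kn_m3 * depth_m * math.cos(beta) ** 2
    shear_stress = unit_weight_kn_m3 * depth_m * math.sin(beta) * math.cos(beta)
    # Mohr-Coulomb shear strength using effective normal stress.
    resisting = cohesion_kpa + (normal_stress - pore_pressure_kpa) * math.tan(phi)
    return resisting / shear_stress


# Hypothetical soil: 1.5 m deep slip surface on a 30 degree slope.
fos_dry = infinite_slope_fos(cohesion_kpa=2.0, friction_angle_deg=32.0,
                             slope_deg=30.0, unit_weight_kn_m3=18.0,
                             depth_m=1.5)
fos_wet = infinite_slope_fos(cohesion_kpa=2.0, friction_angle_deg=32.0,
                             slope_deg=30.0, unit_weight_kn_m3=18.0,
                             depth_m=1.5, pore_pressure_kpa=8.0)
```

With all other inputs fixed, raising the pore water pressure lowers the effective normal stress and therefore the FOS, which is why cohesion and pore water pressure matter at the site scale even though they are absent from the conditioning factors used here.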
Additionally, high-resolution data such as airborne Light Detection and Ranging (LiDAR) laser scanning could enhance not only the quality of the conditioning factors but also the prediction accuracy of the models. The value of high-resolution data has been evaluated and confirmed by several landslide researchers [99,126–128]. For example, Jebur et al. [128] used very high-resolution LiDAR data to optimize landslide conditioning factors and concluded that a high-quality, informative database is essential and that classifying landslide types prior to landslide susceptibility assessment helps improve model performance.

Conclusions
Accurate landslide susceptibility maps provide land-use managers and government officials with a valuable tool for managing landslide hazard and risk. In this paper, we evaluate the performance and prediction accuracy of three well-known machine learning models (SVM, BLR, and ADTree) for landslide susceptibility mapping in the Salavat Abad saddle, Kurdistan province, Iran. The saddle is an important area that connects Kurdistan to other provinces of Iran and is thus a priority for landslide management and remediation. We determine the most critical geo-environmental factors using the IGR technique to better delineate, visualize, and interpret landslide-prone areas. In our study area, the essential factors for landslide modeling are distance to road, NDVI, and land use. Our models show that the area bordering the arterial road in the Salavat Abad saddle is most susceptible to landsliding. We also show that the SVM algorithm has a high goodness-of-fit and prediction accuracy for landslides in the study area, and that BLR and ADTree are suitable alternatives. Therefore, we suggest SVM and BLR as soft computing benchmark models in areas with similar topographic, climatic, and lithological features.