Shallow Landslide Prediction Using a Novel Hybrid Functional Machine Learning Algorithm

Abstract: We used a novel hybrid functional machine learning algorithm to predict the spatial distribution of landslides in the Sarkhoon watershed, Iran. We developed a new ensemble model, ABSGD, which combines a functional algorithm, stochastic gradient descent (SGD), with an AdaBoost (AB) Meta classifier to predict the landslides. The model incorporates 20 landslide conditioning factors, which we ranked using the least-square support vector machine (LSSVM) technique. For the modeling, we considered 98 landslide locations, of which 70% (79) were


Introduction
Landslides are important geohazards that can seriously impact the natural and built environment [1][2][3]. About 66 million people live in landslide-prone areas, with the greatest risk in terms of numbers in Asia [4,5]. Managing this risk involves a multi-step process centered on identification, characterization and prediction of landslides [6]. In this paper, we focus on spatial prediction of landslides, while recognizing that landslide prediction has temporal and magnitude components [7,8].
Spatial predictions of landslides commonly involve the production of landslide susceptibility maps [9]. Such mapping is challenging because it relies on adequate high-quality data [10]. Moreover, there is not yet a globally accepted standard approach, in spite of the numerous techniques that have been proposed and used [11,12]. Yet, over the past several decades, there have been remarkable advances in geographic information system (GIS) and remote sensing tools that have been applied to assess landslide susceptibility, hazards, risks and mapping [13][14][15].
Models for predicting landslide susceptibility can be created using qualitative or quantitative methods [16,17]. Qualitative methods based on landslide inventories and parameter weighting rely on expert judgment, whereas quantitative statistical, probabilistic and deterministic methods are mathematically based. With adequate input data, quantitative methods will generally outperform qualitative methods [18,19].
Each model has advantages and disadvantages that depend on the characteristics of the specific study area. New approaches therefore remain worth testing and validating.
A recent development that shows considerable promise is the combination of different methods to build hybrid models that can generate more accurate spatial predictions of landslides [67]. Data mining approaches are being combined with other methods, such as ANN-Bayes analysis [68], stepwise weight assessment ratio analysis (SWARA), the adaptive neuro-fuzzy inference system (ANFIS) [69], rough set (RS)-SVM [70], neuro fuzzy inference system optimized by particle swarm optimization (PSOANFIS) [71], ANFIS optimized by shuffled frog leaping algorithm (SFLA) [72], ANFIS with grey wolf optimizer (GWO) and biogeography-based optimization (BBO) [73], random subspace and the naive Bayes tree (RS-NBT) [66], and weights of evidence (WoE) and evidential belief function (EBF) [49]. These approaches have provided reasonable results; however, no single hybrid model has emerged as superior to the others.
The objective of this study is to introduce a new hybrid machine learning approach for landslide prediction. Our new approach merges the AdaBoost (AB) Meta classifier with the stochastic gradient descent (SGD) algorithm as a base classifier. We refer to this approach as the stochastic gradient descent-AdaBoost ensemble (ABSGD) method. Here we use it to predict locations of shallow landslides in Chahar Mahaal-o-Bakhtiari Province, Iran. To our knowledge, this hybrid approach has not previously been used for landslide susceptibility mapping (LSM) or landslide prediction. To test the performance of our proposed approach, we compare its results for the study area with those of several soft computing benchmark models, including logistic regression (LR), the logistic model tree (LMT) and the functional tree (FT).

Study Area
Our study area is the Sarkhoon watershed, located within the Zagros Mountains, Iran (Figure 1). The study area ranges in elevation from 1370 to 3375 m above sea level. The watershed is underlain mainly by sedimentary rocks of Late Cretaceous, Eocene, Miocene and Pliocene age, including limestone, dolomite, marl, sandstone and conglomerate. Complex folds and both reverse and strike-slip faults are present within the study area [74].
Average annual precipitation is 874 mm and temperatures range from below freezing during winter to 40 °C during summer. Land cover/land use in the watershed is approximately 59% forest, 34% rangeland, 3.5% rock outcrop, 3% dry farming and 0.7% residential land. Drought, conversion of land to farms and road construction over the past four decades have degraded the land [75] and increased the susceptibility of the watershed to landslides.
Figure 1. Location of the study area in Iran; the red circles denote landslides for testing; the red triangles denote landslides for training; the green circles denote non-landslides for testing; and the green triangles denote non-landslides for training.

Landslide Inventory Map (LIM)
To frame this study, we collected both landslide and non-landslide points in the Sarkhoon watershed, taking into account published studies from other areas [76][77][78][79][80]. We collected some of the landslide polygons from the Forests, Rangelands and Watershed Management Organization of Iran. The polygons cover both the scar and the accumulation (body) zones; in this study, we took the center of each scar zone as the landslide location. Additionally, the remaining landslides were identified from the 1:20,000-scale aerial photographs provided by the provincial Department of Natural Resources and Watershed Management. We then ground-truthed the landslides in the field and recorded their GPS locations. Our inventory of 98 landslide points included 55 translational slides, 22 complex landslides and 21 rotational slides ranging in size from 100 to 60,000 m² (Figure 2).
We also randomly chose 100 non-landslide points to be used for LSM. Both the landslide and non-landslide points were divided into training and testing subsets for modeling purposes. About 70% of the points were randomly chosen for the training dataset and 30% were selected for testing.
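The random 70/30 split described above can be sketched in a few lines of Python. This is only an illustration: the point identifiers, the `split_points` helper and the fixed random seed are assumptions introduced here for repeatability, not details given in the text.

```python
import random

def split_points(points, train_fraction=0.7, seed=0):
    """Randomly split a list of points into training and testing subsets."""
    rng = random.Random(seed)      # fixed seed is an assumption, for repeatability
    shuffled = points[:]           # copy so the caller's list keeps its order
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers standing in for the inventory points.
landslides = [("landslide", i) for i in range(98)]
non_landslides = [("non-landslide", i) for i in range(100)]

# Split each class separately so both subsets keep landslide and non-landslide points.
ls_train, ls_test = split_points(landslides)
nl_train, nl_test = split_points(non_landslides)
```

Splitting the two classes separately, as sketched here, keeps the class proportions similar in the training and testing subsets; whether the original split was stratified in this way is not stated.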

Landslide Conditioning Factors
We selected the following twenty landslide conditioning factors: land use, lithology, average annual precipitation, altitude, slope angle, aspect, European Slope Length and Steepness Factor (LS-Factor), general curvature, profile curvature, plan curvature, longitudinal curvature, tangential curvature, solar radiation, stream power index (SPI), topographic position index (TPI), topographic wetness index (TWI), terrain roughness index (TRI), distance to streams, distance to roads and distance to faults. The classification of different conditioning factors is presented in Table 1. We used seven land use classes in the study area. These include: dry farming, sparse forest, dense forest, poor rangeland, good rangeland, residential area and rock outcrops, which have been mapped by the Chahar Mahaal-o-Bakhtiari Department of Natural Resources and Watershed Management (http://www.frw.org.i). We derived lithological units and faults from the geology map of Ardales and Dehdez sheets prepared by Geological Survey & Mineral Explorations of Iran (GSI) at a 1:100,000 scale [74]. A total of ten lithological units were identified in the Sarkhoon watershed (Table 1). We built an average annual precipitation map using a relationship between average annual precipitation and elevation based on 42 years of average annual precipitation data (1972-2014) from nine meteorological stations in the watershed.
We created a Digital Elevation Model (DEM) with 12.5 m resolution from ALOS PALSAR data provided by the Alaska Satellite Facility (https://vertex.daac.asf.alaska.edu/#). Maps of elevation, slope angle, aspect and slope length, general, profile, plan, longitudinal and tangential curvature, solar radiation, SPI, TPI, TWI, TRI and distance to stream were constructed from the DEM using ArcGIS 10.3 and SAGA 6.0.0 software. The distance to road map was constructed from the road network built by the Iran National Cartographic Center in DGN format and 1:25,000 scale. The flowchart for the landslide susceptibility mapping and analysis of spatial data of the watershed is shown in Figure 3.

AdaBoost Meta Classifier
First introduced by Freund and Schapire [81], AdaBoost is a boosting ensemble technique used to improve the predictive capability of weak classifiers. The technique incrementally constructs one classifier at a time; each classifier is trained on a dataset generated selectively from the original dataset by progressively increasing, at each step, the likelihood of sampling "difficult" data points [82]. AdaBoost has been used in ensembles to improve predictive ability, especially with support vector machines [83], neural networks [84] and decision trees [85].
We apply the technique in this study as follows. Let $U = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ be the original training dataset, where $x_i$, $i = 1, 2, \ldots, n$, is the set of landslide conditioning factors of sample $i$, $y_i \in \{-1, 1\}$ represents the two classes for classification and $W = \{w_1, w_2, \ldots, w_n\}$ is the weight distribution over the samples at a given boosting iteration $t$. For a given iteration, AdaBoost constructs a new training dataset, sampled from the original training dataset according to the weight distribution $W$. The weak learner is then called to build a base classifier $S_t$, which uses the new training dataset for learning. The error of $S_t$, denoted $E_s$, is then calculated [86]. The weights of the samples are updated during the learning process using the quantities $\beta$ and $z_i$ [87], and the updated weights are normalized so that they add up to one. In the final step, AdaBoost combines the classification results of all the classifiers.
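To make the boosting loop concrete, the sketch below implements a common discrete AdaBoost variant in plain Python. Several choices are assumptions made for illustration rather than details from the text: the weak learner is a one-feature threshold rule (a decision stump) instead of SGD, samples are reweighted rather than resampled, the classifier weight is written as alpha = 0.5 ln((1 - E) / E) rather than in the β and z_i notation, and all function names and the toy data are hypothetical.

```python
import math

def train_stump(X, y, w):
    """Return (error, feature, threshold, sign) of the best weighted threshold rule."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[f] >= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best

def stump_predict(stump, row):
    _, f, thr, sign = stump
    return sign if row[f] >= thr else -sign

def adaboost_fit(X, y, n_rounds=10):
    """Fit a list of (alpha, stump) pairs; labels y must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n                      # start from uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-12)         # guard against a zero error
        if err >= 0.5:                     # weak learner no better than chance
            break
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the others.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]       # normalize so the weights sum to one
    return ensemble

def adaboost_predict(ensemble, row):
    """Weighted vote of all base classifiers."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Toy one-factor dataset that no single threshold rule can separate.
X_toy = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y_toy = [1, 1, -1, -1, 1, 1]
model = adaboost_fit(X_toy, y_toy, n_rounds=3)
predictions = [adaboost_predict(model, row) for row in X_toy]
```

On this toy pattern, three boosted stumps classify all six training points correctly, which illustrates how reweighting the "difficult" points lets simple base classifiers combine into a stronger one.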

Stochastic Gradient Descent Algorithm
The stochastic gradient descent algorithm (SGDA) is a drastic simplification of gradient descent [88] that uses a small, randomly selected subset of the training data to compute the gradient of the objective function [89]. The number of training samples used for this approximation in one iteration is called the batch size. With a small batch size, the parameters can be updated more frequently than in gradient descent, which accelerates convergence. In the extreme case, a batch size of 1 provides the maximum frequency of updates and a very simple perceptron-like algorithm. In the SGDA, the weights of the features are updated for each training sample following [90]. The update involves the batch size $N$, the meta-parameter $M$ that controls the degree of regularization, the iteration counter $z$, the learning rate $\alpha_z$, the weight $\omega_i$ of feature $i$ and the conditional log-likelihood $L(j, w)$ of the $j$-th training sample [89].
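A minimal sketch of mini-batch SGD for a logistic model is given below to illustrate the role of the batch size and the learning rate. It is written under stated assumptions: the L2 penalty controlled by `reg` is a stand-in for the regularization meta-parameter (the penalty form used in [90] is not given here), the learning rate is held constant rather than indexed by iteration, and the function names and toy data are hypothetical.

```python
import math
import random

def sigmoid(t):
    """Logistic function, arranged to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def sgd_logistic(X, y, batch_size=1, learning_rate=0.1, reg=0.0,
                 epochs=200, seed=0):
    """Fit weights (bias first) by mini-batch SGD; labels y must be 0 or 1."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    indices = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(indices)               # visit the samples in random order
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad = [0.0] * len(w)
            for j in batch:
                row = [1.0] + list(X[j])   # leading 1.0 multiplies the bias
                p = sigmoid(sum(wi * xi for wi, xi in zip(w, row)))
                for k, xk in enumerate(row):
                    grad[k] += (y[j] - p) * xk   # gradient of the log-likelihood
            for k in range(len(w)):
                penalty = reg * w[k] if k > 0 else 0.0   # bias is not penalized
                w[k] += learning_rate * (grad[k] / len(batch) - penalty)
    return w

def predict_proba(w, row):
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)))

# Toy one-factor dataset: low values are non-landslide (0), high values landslide (1).
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0, 0, 1, 1]
weights = sgd_logistic(X_toy, y_toy)       # batch_size=1 is the extreme case
```

With `batch_size=1` the weights are updated after every sample, which is the perceptron-like extreme mentioned above; larger batches give fewer, less noisy updates per pass over the data.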

Logistic Regression
Logistic regression (LR) is a popular statistical method applied to landslide susceptibility mapping [91][92][93]. It establishes a multivariate regression relationship between independent variables and a dependent variable [31,92]. The variables can be discrete, continuous or both. The LR algorithm estimates the probability of a landslide event using maximum likelihood estimation [39]. In the case of landslide prediction, the dependent variable is binary (landslide and non-landslide). The LR algorithm can be expressed in a simple form as follows [92]:
$$P = \frac{1}{1 + e^{-f}}$$
where $P$ is defined as the probability of a past landslide event and $f$ is determined by:
$$f = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$$
where $n$ is the number of factors, $a_0$ is the intercept of the algorithm, $a_i$, $i = 1, 2, \ldots, n$, are the slope coefficients of the algorithm and $x_i$, $i = 1, 2, \ldots, n$, are the attributes of the factors.
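As a small worked example, the function below evaluates the standard logistic form $P = 1 / (1 + e^{-f})$ with $f = a_0 + \sum_i a_i x_i$ for one location. The coefficient values and the function name are hypothetical and chosen only to show the arithmetic; they are not fitted values from this study.

```python
import math

def landslide_probability(x, intercept, coefficients):
    """Return P = 1 / (1 + exp(-f)) with f = a0 + sum(a_i * x_i)."""
    f = intercept + sum(a * xi for a, xi in zip(coefficients, x))
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    e = math.exp(f)            # rearranged form avoids overflow for very negative f
    return e / (1.0 + e)

# Hypothetical coefficients for two standardized factors (illustrative values only).
a0, a = -1.0, [2.0, 0.5]
p = landslide_probability([1.0, 0.0], a0, a)   # f = -1 + 2*1 + 0.5*0 = 1
is_landslide = p >= 0.5                        # binary label at a 0.5 cutoff
```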

Logistic Model Tree
Logistic model tree (LMT) is a classification tree classifier that combines decision tree and logistic regression machine learning methods [94]. In LMT, the classification and regression tree (CART) algorithm is used to prune the tree, whereas the LogitBoost algorithm is used to construct the logistic regression model at every node of the tree; the splitting process is carried out by the logistic variant information gain [94,95]. To prevent over-fitting, LMT uses cross-validation to find the number of LogitBoost iterations. For each class $N_i$, the LogitBoost algorithm fits an additive logistic regression by least squares [94]: a linear function of the $n$ factors with an initial coefficient $\alpha_0$ and a coefficient $\alpha_i$ for the $i$-th component of the vector $x$.
In LMT, the posterior probabilities of the leaf nodes are calculated from these class functions using the linear logistic regression method [94], normalized over the $C$ classes.
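The leaf-node posterior can be illustrated with a short sketch. It assumes the usual normalized-exponential (softmax) form of linear logistic regression, in which each class has its own linear function of the factors; the function name and the coefficient values are hypothetical.

```python
import math

def class_posteriors(x, class_models):
    """Posterior probability of each class from per-class linear functions.

    class_models holds one (alpha_0, [alpha_1, ..., alpha_n]) pair per class.
    """
    scores = [a0 + sum(a * xi for a, xi in zip(alphas, x))
              for a0, alphas in class_models]
    top = max(scores)                       # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two classes (landslide, non-landslide) with illustrative coefficients.
models = [(0.0, [1.0, -0.5]), (0.0, [-1.0, 0.5])]
posteriors = class_posteriors([0.8, 0.2], models)
```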

Functional Tree
Functional tree (FT) is a tree classifier that can use combinations of attributes at the decision nodes, at the leaf nodes or at both when learning a classification tree. FT uses the logistic regression function to split at the functional inner nodes and to predict at the functional leaves. In FT, the functional leaves are used to reduce the variance, whereas the functional inner nodes are used to reduce the bias of classification. The application of FT in landslide prediction is limited to a few case studies [96].
Let $x = \{x_i\}$, $i = 1, 2, \ldots, n$, be the set of attributes of the factors and let $y$ represent the output classes (landslide and non-landslide). The classification of the FT algorithm is carried out using the following steps: (1) construct the model, which is the probability distribution of the output classes, by selecting the linear Bayes discriminant function as the constructor; (2) generate the new constructed dataset by extending it with the new factor that belongs to the landslide or non-landslide classes; and (3) construct the classification tree by selecting the factors from the initial training datasets and the new datasets.

Factor Selection Using the Least Square Support Vector Machine (LSSVM)
Factor selection techniques are used to improve the predictive ability of models during the modeling process with a training dataset. Problems with over-fitting and noise in the training dataset can be overcome by removing factors that have no predictive power. To achieve this objective, we used the least square support vector machine (LSSVM), which was originally proposed by Suykens et al. [97] as a modification of the SVM. LSSVM is a kernel-based supervised machine learning method that uses a least square linear function for classification and regression problems [98]. It depends on standardization networks and uses a quadratic cost function to reduce the variance in the training dataset, which leads to solving a set of linear equations [99].
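Consider a training dataset of $n$ data points $(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i \in \mathbb{R}^d$ is a feature vector and $y_i \in \{-1, +1\}$ is the class label (landslide or non-landslide). A nonlinear function $\varphi(\cdot)$ is used to map the data points into a high-dimensional Hilbert space. The LSSVM classifier is formulated by minimizing [99]:
$$\min_{w, b, e} \; J(w, e) = \frac{1}{2} w^{T} w + \frac{\gamma}{2} \sum_{i=1}^{n} e_i^{2}$$
subject to the equality constraints:
$$y_i \left[ w^{T} \varphi(x_i) + b \right] = 1 - e_i, \quad i = 1, 2, \ldots, n$$
where $w$ is the weight vector, $\gamma > 0$ is a regularization factor, $b$ is a bias term and $e_i$ is the difference between the estimated and the actual outputs.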
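Because the constraints are equalities, the LSSVM classifier can be trained by solving one linear system instead of a quadratic program. The sketch below uses the standard dual form of that system with a linear kernel and a small Gaussian-elimination solver in plain Python. The kernel choice, the value of `gamma`, the function names and the toy data are assumptions for illustration, and the sketch shows only the classifier, not the factor-ranking procedure used in this study.

```python
def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                    # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lssvm_fit(X, y, gamma=10.0, kernel=linear_kernel):
    """Return (b, alphas) from the dual system [[0, y^T], [y, Omega + I/gamma]]."""
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            omega = y[i] * y[j] * kernel(X[i], X[j])
            row.append(omega + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    solution = solve(A, [0.0] + [1.0] * n)
    return solution[0], solution[1:]

def lssvm_predict(X, y, b, alphas, x_new, kernel=linear_kernel):
    score = b + sum(a * yi * kernel(xi, x_new) for a, yi, xi in zip(alphas, y, X))
    return 1 if score >= 0 else -1

# Toy one-factor dataset with labels in {-1, +1}.
X_toy = [[0.0], [1.0], [3.0], [4.0]]
y_toy = [-1, -1, 1, 1]
b, alphas = lssvm_fit(X_toy, y_toy)
train_pred = [lssvm_predict(X_toy, y_toy, b, alphas, row) for row in X_toy]
```

A larger `gamma` penalizes the errors $e_i$ more heavily relative to the size of the weight vector, so it fits the training labels more closely.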

Statistical Index-Based Evaluation
In this study, we used several statistically based measures, including sensitivity (SST), specificity (SPC), accuracy (ACC), root mean squared error (RMSE) and the area under the receiver operating characteristic curve (AUC), to evaluate the landslide modeling process. These quantitative measures were obtained using a 2×2 contingency (confusion) table that records the four possible outcomes: true positive (TP), false positive (FP), true negative (TN) and false negative (FN) (Table 2). For a binary classification such as landslide versus non-landslide, the table is obtained from a cutoff value (here 0.5) by comparing each ground-truth pixel (actual landslide and non-landslide locations) with the corresponding pixel on the classified map. TP and FN are landslide locations classified as, respectively, landslide and non-landslide. FP and TN are non-landslide locations classified as, respectively, landslide and non-landslide. Statistical values derived from these four outcomes are computed as follows [100]:
$$SST = \frac{TP}{TP + FN}$$
$$SPC = \frac{TN}{TN + FP}$$
$$ACC = \frac{TP + TN}{TP + TN + FP + FN}$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( X_{observation} - X_{estimation} \right)^{2}}$$
where $n$ is the total number of samples in the landslide training dataset or validation dataset, $X_{observation}$ is the observed value in the landslide training dataset or validation dataset and $X_{estimation}$ is the probability value calculated from the landslide susceptibility model.
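The four measures can be computed directly from observed labels and predicted probabilities, as in the sketch below. The function names and the six example values are hypothetical, and treating a probability equal to the cutoff as a landslide is an assumption.

```python
import math

def confusion_counts(actual, predicted_prob, cutoff=0.5):
    """Count TP, FP, TN and FN for labels in {0, 1} at the given cutoff."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted_prob):
        label = 1 if p >= cutoff else 0
        if a == 1 and label == 1:
            tp += 1
        elif a == 1 and label == 0:
            fn += 1
        elif a == 0 and label == 1:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def evaluate(actual, predicted_prob, cutoff=0.5):
    tp, fp, tn, fn = confusion_counts(actual, predicted_prob, cutoff)
    n = len(actual)
    squared_error = sum((a - p) ** 2 for a, p in zip(actual, predicted_prob))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / n,
        "rmse": math.sqrt(squared_error / n),
    }

# Three landslide (1) and three non-landslide (0) points with illustrative probabilities.
actual = [1, 1, 1, 0, 0, 0]
probabilities = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
metrics = evaluate(actual, probabilities)
```

In this example one landslide point (0.4) falls below the cutoff and one non-landslide point (0.6) falls above it, so sensitivity, specificity and accuracy are all 2/3.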

AUC
The area under the receiver operating characteristic curve (AUC) is a standard tool for assessing the general performance of models [27,49,66,85,101,102]. We used AUC to check the performance of our landslide models. The y-axis of the curve gives the model sensitivity and the x-axis gives 100 − specificity (in percent) [66,103]. The AUC index ranges from 0.5 for an inaccurate model to 1 for an ideal model [85,104]. The index is computed from the total number of landslide locations, $P$, and the total number of non-landslide locations, $N$ [105].
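One standard way to compute the area under the ROC curve is the rank-based (Mann-Whitney) formulation: the fraction of landslide/non-landslide pairs in which the landslide point receives the higher score, with ties counted as one half. The sketch below uses that formulation; it is not claimed to be the exact expression given in [105], and the function name and example values are hypothetical.

```python
def auc(actual, scores):
    """Area under the ROC curve as the fraction of correctly ordered pairs."""
    positives = [s for a, s in zip(actual, scores) if a == 1]
    negatives = [s for a, s in zip(actual, scores) if a == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5        # ties count as half a correctly ordered pair
    return wins / (len(positives) * len(negatives))

# Three landslide (1) and three non-landslide (0) points with illustrative scores.
actual = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
area = auc(actual, scores)         # 8 of the 9 pairs are correctly ordered
```

Unlike sensitivity and specificity, this value does not depend on a cutoff, because it uses only the ordering of the scores.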

The Most Significant Conditioning Factors in the Modeling Process
One of the most important steps in any environmental modeling process is the determination of the most significant conditioning factors. Not all factors have the same effect on event occurrences; some may have no effect and must be removed from further consideration. In the present study, the LSSVM model was applied to rate the effectiveness of each conditioning factor based on average merit (AM). Application of this model revealed that distance to road is the most important conditioning factor for landslide occurrences in the Sarkhoon watershed (AM = 19.9), followed by elevation (AM = 18.7), aspect (17.8), rainfall (17)

Model Validation and Comparison
The performance of the modeling process in terms of SST, SPC, ACC, RMSE and AUC for both the training and testing phases is shown in Table 3. LMT has the highest sensitivity in the training set (78.3%), meaning that 78.3% of the landslide locations are classified as landslide, followed by FT (75.4%), LR (86.6%) and SGD and ABSGD (83.6%). ABSGD had the highest specificity (87.    (Table 3).

Landslide Susceptibility Mapping
After determining the conditioning factors that provided the best model prediction power, we determined the optimal operator for each model. We used a trial-and-error process to determine the optimum values of all parameters in each algorithm such that the goodness-of-fit and performance of the applied algorithms yielded the highest values. All parameters were changed stage-by-stage and the performance of the models checked. The optimum values of these parameters were selected for the final stage of modeling (Table 4). We transformed the study area into raster format with a pixel size of 10 m. Each pixel was classified as either landslide or non-landslide. We next estimated the landslide indexes that show the probability of landslide occurrence for each pixel based on the training dataset and the learned model. Thus, each pixel of the study area was assigned a unique index. Finally, the indexes for each model were assigned to five classes, namely very low susceptibility (VLS), low susceptibility (LS), moderate susceptibility (MS), high susceptibility (HS) and very high susceptibility (VHS) using the quantile classification scheme [106,107], as shown in Figure 5a-e. The results show that the northeast, middle and southern parts of the Sarkhoon watershed have very high landslide susceptibility and that they are mostly located along the roads.
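The quantile classification step can be sketched as follows: class breaks are chosen so that each of the five classes holds roughly the same number of pixels. GIS packages differ in how they place quantile breaks, so the break rule, the function names and the synthetic index values below are assumptions for illustration.

```python
def quantile_breaks(values, n_classes=5):
    """Return the n_classes - 1 break values that split the data into equal-count classes."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_classes] for k in range(1, n_classes)]

def classify(value, breaks, labels=("VLS", "LS", "MS", "HS", "VHS")):
    """Assign a susceptibility class; a value equal to a break goes to the higher class."""
    for b, label in zip(breaks, labels):
        if value < b:
            return label
    return labels[-1]

# Synthetic landslide index values standing in for the per-pixel model output.
indexes = [i / 100 for i in range(100)]
breaks = quantile_breaks(indexes)
classes = [classify(v, breaks) for v in indexes]
counts = {label: classes.count(label) for label in ("VLS", "LS", "MS", "HS", "VHS")}
```

Because the breaks follow the data distribution rather than fixed index values, the same index can fall in different classes for different models, which is worth keeping in mind when comparing the maps.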

Map Verification and Comparison
Model evaluation is an important step in any environmental modeling process, without which the results cannot be shown to have scientific significance [106,107]. We determined the validity of the five landslide maps of the Sarkhoon watershed (  (Figure 6b). Although all models yielded good and reasonable results, the ABSGD ensemble model had the highest predictive power for landslide susceptibility assessment. Success and prediction rate curves, based only on the training and validation landslide locations, were constructed for the ABSGD and SGD models. ABSGD had the higher performance and prediction capability for both the training (AUC = 0.855) and validation (AUC = 0.765) datasets; the corresponding values for SGD are lower (AUC training = 0.843; AUC validation = 0.727) (Figure 7a,b).

Discussion
A goal of spatial landslide modeling is to produce a reliable susceptibility map with high prediction accuracy. Therefore, research is focused on developing and evaluating the performance of predictive landslide susceptibility models [55]. Although many methods have been developed for landslide modeling over the past four decades, machine learning algorithms and their ensemble techniques have been favored in recent years. Their efficiency in enhancing the performance of the models has been stressed by many researchers [31,108].
The main objective of this study was to introduce a new machine learning ensemble model, ABSGD, that combines stochastic gradient descent (SGD) as a base function classifier with AdaBoost as a Meta classifier. Using the least square support vector machine (LSSVM) with 10-fold cross-validation, we identified the distance to road as the most significant factor for landslides in the Sarkhoon watershed. Similar findings have been previously reported by Pham et al. [61,91,101]. The results of the factor selection also indicated that all other factors are important for the modeling and prediction of landslides in the Sarkhoon watershed.
We compared the model results and the validation process to assess the ability of ABSGD to spatially predict landslides using four soft computing benchmark models-the SGD, LR, LMT and FT models. Five measures, namely sensitivity, specificity, accuracy, RMSE and AUC, were used for the comparison. The results indicated that the ABSGD model had a better goodness-of-fit (using the training dataset) and prediction capability (using the validation dataset) than the other models.
Additionally, the results of this study showed that the LR model had a higher goodness-of-fit and prediction capability than the SGD model and that the SGD model outperformed the LMT and FT decision tree classifiers. The results confirmed that AdaBoost improved the performance of the SGD algorithm. This finding is in agreement with those of Bui et al. [55], Pham et al. [91] and Shirzadi et al. [66], all of whom state that Meta classifiers can enhance the performance of base classifiers. Shirzadi et al. [66] reported that the random subspace (RS) can improve the predictive power of the naive Bayes tree (NBTree) for landslide modeling. In addition, Pham et al. [102] revealed that RS improved the performance of the classification and regression tree (CART) for preparing landslide susceptibility maps.
The main advantage of AdaBoost as a Meta classifier is that it can provide a good balance between accuracy and diversity and reduce noise and over-fitting in the training dataset [109]. In sum, AdaBoost, as a boosting algorithm, has a good generalization capability, fast performance and low implementation complexity in classification problems [110].

Conclusions
A key objective in predictive modeling of landslides is to produce reliable susceptibility maps that can assist managers, land use planners and decision makers to better manage landslide-prone areas. We have shown that machine learning ensemble models can improve spatial landslide predictions due to improvements in the performance of the base classifier. In this study, we used a novel ensemble model, which we refer to as the stochastic gradient descent-AdaBoost ensemble (ABSGD), to prepare a reliable landslide susceptibility map for the Sarkhoon watershed in Chahar Mahaal-o-Bakhtiari Province, Iran. This ensemble model combines a functional classifier, SGD, and a Meta classifier, AdaBoost.
The results of landslide factor selection using LSSVM with 10-fold cross-validation showed that all conditioning factors affected the spatial landslide modeling, with distance to road proving the most important. Steep slopes crossed by roads are prone to landslides largely due to cut-and-fill construction techniques and diversion of drainage. Additionally, our results indicate that, although all models performed reliably, the ABSGD model outperformed the LR, SGD, LMT and FT models. Therefore, we suggest that a combination of SGD and AdaBoost provides a better optimized model for increasing the accuracy of predictive landslide susceptibility mapping. In this study, we showed that hybrid models can enhance the performance of individual models in predicting landslides.