A Novel Intelligence Approach of a Sequential Minimal Optimization-Based Support Vector Machine for Landslide Susceptibility Mapping

The main objective of this study is to propose a novel hybrid model of sequential minimal optimization and support vector machine (SMOSVM) for accurate landslide susceptibility mapping. For this task, one of the landslide-prone areas of Vietnam, the Mu Cang Chai District in Yen Bai Province, was selected. In total, 248 landslide locations and 15 landslide-affecting factors were selected for landslide modeling and analysis. The predictive capability of SMOSVM was evaluated and compared with other landslide models, namely a hybrid model of the cascade generalization optimization-based support vector machine (CGSVM) and individual models such as support vector machines (SVM) and naïve Bayes trees (NBT). For validation, different quantitative criteria, such as statistical-based methods and the area under the receiver operating characteristic curve (AUC) technique, were used. Results of the study show that the SMOSVM model (AUC = 0.824) has the highest performance for landslide susceptibility mapping, followed by the CGSVM (AUC = 0.815), SVM (AUC = 0.804), and NBT (AUC = 0.800) models, respectively. Thus, the proposed novel SMOSVM model is a promising method for better landslide susceptibility mapping and prediction, which can also be applied in other landslide-prone areas.


Introduction
Landslide susceptibility mapping is an appropriate tool for management of landslide hazards [1]. Landslide susceptibility of an area is usually assessed based on the analysis of spatial relationship of historical landslide occurrences with the number of affecting factors [2]. Occurrence of landslides depends on the characteristics of the study area such as geology, topography, soil, and other geo-environmental factors. In addition, analysis of the natural mechanism of landslides helps in the assessment and management of landslides [3].
A major challenge in landslide modeling is uncertainty, which arises from the inputs, the landslide conditioning factors, and model selection [4]. As there is no standard guideline or framework for choosing the number of landslide conditioning factors, users select the factors for the modeling process based on the literature and the data available for a given study area. Although some factor selection techniques exist to determine the best factors for modeling, model selection is another source of uncertainty that affects the goodness-of-fit and prediction accuracy of the models [4]. Many methods and techniques have been developed; however, not all of them are applicable in all regions. Therefore, each model should first be tested and evaluated for a specific area before it is used in the modeling process. Basically, the main aim of landslide researchers is to select the best factors and models in order to decrease the uncertainties in the modeling process and thereby enhance the predictive power of the models.
In recent years, instead of single ML models, hybrid models are developed and applied for landslide susceptibility mapping for better accuracy of landslide prediction. These hybrid models include ANFIS coupled with a genetic algorithm (ANFIS-GA) [29,30], ANFIS coupled with differential evolution (ANFIS-DE) [29], ANFIS combined with biogeography-based optimization and BAT algorithms (ANFIS-BBO and ANFIS-BAT) [31], ANFIS combined with an imperialistic competitive algorithm (ANFIS-ICA) and firefly algorithm (ANFIS-FA) [32], naïve Bayes trees (NBT) classifier coupled with random subspace ensemble (RS-NBT) [26], alternative decision trees combined with various ensemble methods [24], and the radial basis function neural network coupled with rotation forest (RBFRF) [33]. Generally, these hybrid ML techniques show promising alternative approaches compared with single ML approaches as their combination or integration usually gives better performance than using each individual machine learning or decision-making model alone. Hybrid models take advantages of individual ML methods; thus, they can learn the data more deeply and discover more accurately the relationship hidden in complex problems such as landslides.
In this study, the main objective is to apply a novel hybrid ML model named sequential minimal optimization-based support vector machines (SMOSVM), a combination of sequential minimal optimization (SMO) and SVM, for accurate mapping of landslide susceptibility in the Mu Cang Chai District, Yen Bai Province, one of the most landslide-prone areas of Vietnam. SVM is known as a benchmark single model and as one of the most powerful classifiers, widely used for classification problems in general and landslide prediction in particular [34-36]. However, SVM is poorly suited to large and complex datasets: training requires solving a large-scale quadratic programming problem with inequality constraints, which leads to great computational complexity [37]. Platt [38] proposed SMO to overcome this limitation, and it can also reduce over-fitting and noise problems in the training dataset [39]. A hybrid model combining SMO with SVM is therefore expected to be faster and more effective in solving prediction problems. The approach is based on the assumption that the large quadratic programming problem in SVM can be divided into a series of the smallest possible sub-problems, each of which is solved analytically using two Lagrange multipliers per step [38]. Although this approach is promising, its predictive capability has not so far been verified for landslide susceptibility mapping. The performance of the new hybrid model was validated and compared with the single SVM and NBT models and with another hybrid model, namely the Cascade Generalization Optimization-based SVM (CGSVM), using statistical-based methods and the receiver operating characteristic curve technique. Weka 3.9 (www.cs.waikato.ac.nz) and ArcGIS 10.3 software (ESRI, Redlands, CA, USA) were used for data processing and development of landslide susceptibility maps.

Description of the Study Area
The Mu Cang Chai District, one of the landslide-prone areas of Vietnam, located in the northwest part of Yen Bai Province, was selected as the study area (Figure 1). The district lies between latitudes 21°39′00″ N and 21°50′00″ N and longitudes 103°56′00″ E and 104°23′00″ E, covering an area of approximately 1196 km². The population of the Mu Cang Chai District in 2010 was 50,107 people, with a population density of about 42 people per km². The climate of this area is of the temperate, tropical monsoon type. Rainfall in the area is relatively high, varying from 3700 mm to 5490 mm, and humidity is about 81%. Annual temperature varies from 9.7 °C (December/January) to 28 °C (June/July). The majority of the area is covered by forest (61.76%), followed by barren lands, cultivated lands, residential areas, and scrub lands.

The topography of the area is dominated by elongated ridges (hills) and intervening valleys. Elevation ranges from 280 m to 2820 m, with a mean elevation of 1515 m. Mountain slopes are relatively steep, up to 88 degrees. A major part of the area is occupied by extrusive and intrusive magmatic (volcanic) rocks. Metamorphic and sedimentary rocks are also present in this area. Tectonically, the area is still active, as is evident from earthquake activity.

Landslide locations
Landslide locations were recorded from aerial photographs (scale 1:33,000), Google Earth images, and field surveys. Validation of the landslide events was done in the field under the Vietnam Institute of Geosciences and Mineral Resources (VIGMR) national project named "Survey, assessment and zoning of landslide warning in the mountainous region of Vietnam" (Figure 2). In total, 248 landslide locations were identified to construct the landslide inventory map (Figure 1). The landslide inventory was used to assess the spatial relationship between landslide events and landslide conditioning factors. Five types of landslides were observed in this area, namely rotational (124 events), mixed (36 events), translational (35 events), toppling (45 events), and debris slides (eight events). Most landslides in this area are triggered by heavy rains during the monsoon.

Landslide Influencing Factors
Landslide affecting factors which depend on the local topography, geology, meteorology, and other geo-environmental factors, such as slope, elevation, aspect, curvature, plan curvature, profile curvature, land use, lithology, distance to faults, distance to roads, distance to rivers, fault density, road density, river density, and rainfall, were selected for landslide susceptibility analysis in this study. For evaluating relationship of these factors with landslide events, Frequency ratio (FR) analysis was performed based on number of landslide pixels per number of pixels of each class of the affecting factor [6].
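The FR calculation described above can be illustrated with a minimal Python sketch. It assumes the commonly used normalized form of FR, that is, the share of landslide pixels falling in a class divided by the share of all pixels falling in that class, so that FR > 1 marks a class with more landslides than its area alone would suggest. The function name and the pixel counts are hypothetical and are not taken from the study.

```python
def frequency_ratio(class_pixels, landslide_pixels):
    """Return {class name: FR} for one conditioning factor.

    Assumed definition: FR = (landslide share of the class) / (area share of the class).
    """
    total_pixels = sum(class_pixels.values())
    total_landslides = sum(landslide_pixels.get(name, 0) for name in class_pixels)
    if total_pixels == 0 or total_landslides == 0:
        raise ValueError("need at least one pixel and one landslide pixel")
    ratios = {}
    for name, n_pixels in class_pixels.items():
        area_share = n_pixels / total_pixels
        slide_share = landslide_pixels.get(name, 0) / total_landslides
        # A class with no area cannot host landslides; report FR = 0 for it.
        ratios[name] = slide_share / area_share if area_share > 0 else 0.0
    return ratios


# Hypothetical pixel counts for a distance-type factor (illustration only).
class_pixels = {"0-50 m": 1000, "50-100 m": 3000, ">100 m": 6000}
landslide_pixels = {"0-50 m": 50, "50-100 m": 30, ">100 m": 20}
fr = frequency_ratio(class_pixels, landslide_pixels)
```

In this toy example the nearest class holds 10% of the area but 50% of the landslide pixels, so its FR is well above 1, while the farthest class has an FR below 1.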
Aspect is defined as the direction a slope faces [3]; it affects precipitation and solar radiation [40,41] and thus landslide occurrences [3]. An aspect map was prepared from a Digital Elevation Model (DEM) with 20 m spatial resolution, which was generated from a topographical map at the scale of 1:500,000 collected from the VIGMR, and was classified into nine classes (Figure 3a). The highest FR value of landslide occurrence was obtained for southwest-facing slopes (FR = 1.2). Other slopes (west, south, and east) with FR > 1 were also found to be susceptible to landslides.

The distance to roads map is shown in Figure 3h. FR values are inversely related to distance to roads: FR increases as the distance to roads decreases (Figure 4). The highest FR value (5.7) was obtained for the 0-50 m distance-to-road class in the study area.
Slope is one of the most important factors for landslide incidence [41,52-54]. However, it should be considered together with the slope materials when analyzing landslide occurrences, as the shear resistance of a slope of unconsolidated materials decreases as the slope angle increases [3]. Normally, landslides have a high FR on moderate slopes (30-40 degrees) [40]. A slope map of the study area was generated from the DEM and divided into several classes (Figures 3i and 4).
Rainfall is one of the triggering factors for landslide occurrences in the northern part of Vietnam, including the study region [55,56]. Rainfall decreases the shear resistance of the ground/rock mass through saturation [41,57]. The rainfall map of the study region was generated using 31 years of rainfall data (1984 to 2014) obtained from Global Weather Data for SWAT [41,58] and classified into classes based on annual average rainfall: 3771-4000, 4000-4250, 4250-4500, 4500-4750, 4750-5000, 5000-5250, and 5250-5491 mm (Figure 3j). Analysis of the FR data indicates that the threshold of landslide occurrence (FR = 1) is reached at relatively low rainfall values (4000-4250 mm); higher rainfall values do not increase the number of landslide events, as the slopes have already failed at lower values (Figure 4).
Profile curvature represents the rate of slope change over each terrain unit [44]. The profile curvature map was derived from the DEM and divided into different classes (Figure 3k). The FR analysis indicates that the class [(−52.003)-(−9.183)] is the most susceptible to landslide occurrences (Figure 4).
Plan curvature indicates the bending of the terrain surface in the direction perpendicular to the slope [44], which affects the stability of slopes in hilly areas. The plan curvature map was generated from the DEM and divided into different classes (Figure 3l). The plan curvature class [(−334.189)-(−69.843)] has the highest FR value, suggesting that this class is more prone to landslide occurrences than the other classes.
Lithology plays an important role in landslide occurrences, as different types of rocks have different geo-mechanical properties that affect the stability of slopes [41,57,59]. Generally, metamorphic and sedimentary rocks have a higher frequency of landslide occurrences than igneous rocks due to the presence of unfavorable discontinuities [41]. A lithology map of the study area was generated from the Geological and Mineral Resources Map of the Mu Cang Chai District at 1:50,000 scale. The lithological groups present in the area are group 1 (igneous magmatic rocks), group 2 (intrusive magmatic rocks), group 3 (sedimentary rocks), group 4 (mafic-ultramafic magma rocks), group 5 (carbonate rocks), and group 6 (quaternary deposits). These groups are based on estimated strength, degree of weathering, and mineral composition [60,61] (Figure 3m). The FR values reveal that lithology group 1 (FR = 1.1) has the highest potential for landslide occurrence in this area (Figure 4).
The land use pattern affects the stability of slopes depending on whether the land is used for cultivation, forest, or buildings, or is vacant or barren. Anthropogenic activities also disturb the natural environment of the ground slope [40]. The land use map of the study area was generated from air photos at 1:33,000 scale and classified into five classes: barren land, cultivated land, forestland, residential area, and scrubland (Figure 3n). The FR values indicate that residential areas (FR = 4.4) and cultivated lands (FR = 2.4) are the most susceptible to landslide occurrences in comparison to the other classes (FR < 1).

Dataset Generation
Training and testing datasets were generated for training and validating the models [63]. In the present study, landslide locations were randomly divided into two sets using the random data classification tool of ArcGIS: (1) 70% of the landslide locations for the training dataset; and (2) 30% of the landslide locations for the testing dataset. The ratio was chosen based on the standard practice reported in the literature [63]. The data were converted to a 20 × 20 m pixel size to maintain uniformity with the other layers. A separate dataset of non-landslide points was also extracted from non-landslide areas for the analysis. More specifically, 174 landslide points and 174 non-landslide points were used to generate the training dataset, and 74 landslide points and 74 non-landslide points were used to generate the testing dataset. Finally, the landslide-affecting factor maps were sampled at these landslide and non-landslide points to generate the final datasets for further processing in the models.
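The 70/30 random division can be sketched in a few lines of Python. This is only an illustration of the idea, not the ArcGIS tool used in the study; the function name, the fixed seed, and the use of integer identifiers for the landslide locations are assumptions.

```python
import random


def split_points(points, train_fraction=0.7, seed=0):
    """Shuffle the points reproducibly and split them into training and testing sets."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# 248 hypothetical landslide identifiers, matching the size of the inventory.
landslide_ids = range(248)
train_ids, test_ids = split_points(landslide_ids)
```

With 248 locations, rounding 70% gives 174 training points and leaves 74 testing points, which matches the counts reported above. The same function could be applied to the non-landslide points.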

Support Vector Machines (SVM)
SVM, introduced by Vapnik [64], is known as one of the best classifiers for solving many real classification problems, including landslides [14]. The main principle of SVM is to find the optimal hyper-plane that separates the two classes of a binary classification problem [63]. This hyper-plane, for example in a three-dimensional space, separates the landslide and non-landslide points. SVM fits several candidate hyper-planes, and the one with the lowest classification error is selected for the final classification of landslide and non-landslide points. For landslide prediction, suppose (x, y) is a vector of the training dataset, where $x = x_i$, $i = 1, 2, \ldots, m$, represents the landslide influencing factors (m is the number of factors) and $y = (1, 0)$ represents the class variable (landslide and non-landslide). The optimal hyper-plane is found during the training process of the SVM according to Equation (1) [64], where b is the offset of the hyper-plane from the origin and $k(x_i, x_j)$ are kernel functions, which implicitly map the data into (possibly infinite-dimensional) feature spaces [65].
The hyper-plane of Equation (1) divides the two labels (landslide and non-landslide) for classification, and finding it gives rise to the quadratic programming problem of Equation (2) [64], where C is the complexity parameter that controls the trade-off between maximizing the margin and allowing misclassification [66], and $\varepsilon_i$ are positive real constants [67].
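For orientation, the standard soft-margin SVM formulation is sketched below in LaTeX. It is the textbook form and may differ in notation from Equations (1) and (2) of the cited source [64]. Several things are assumptions made here for illustration: the labels are recoded from (1, 0) to $y_i \in \{+1, -1\}$, $\beta_i$ denotes the Lagrange multipliers (the symbol used in the SMO section), $n$ is the number of training samples, $\phi$ is the feature map associated with the kernel $k$, and $\varepsilon_i$ play the role of slack variables.

```latex
% Decision function (kernel form)
f(x) = \operatorname{sgn}\!\left( \sum_{i=1}^{n} \beta_i \, y_i \, k(x_i, x) + b \right)

% Primal quadratic programming problem
\min_{w,\, b,\, \varepsilon} \;\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \varepsilon_i
\quad \text{subject to} \quad
y_i \left( w^{T} \phi(x_i) + b \right) \ge 1 - \varepsilon_i, \qquad \varepsilon_i \ge 0

% Dual problem
\max_{\beta} \;\; \sum_{i=1}^{n} \beta_i
 - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \beta_i \beta_j \, y_i y_j \, k(x_i, x_j)
\quad \text{subject to} \quad
0 \le \beta_i \le C, \qquad \sum_{i=1}^{n} \beta_i y_i = 0
```

A larger C penalizes misclassified training points more heavily, at the cost of a narrower margin.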

Sequential Minimal Optimization (SMO)
SMO is an efficient algorithm for solving the quadratic programming problem that arises during the training of SVM. It has been widely applied for training SVM, especially for complex problems with large and complicated datasets [38]. During the SVM learning process, SMO is applied to optimize the quadratic programming problem, which includes the penalty for misclassification, as shown in Equation (2) [66]. In other words, SMO is the algorithm that optimizes the SVM classifier. The SVM model may misclassify some landslide cases during training; by solving the quadratic programming problem efficiently, SMO leads to an accurate selection of the best hyper-plane for classifying landslide and non-landslide points. Therefore, SMO decreases the misclassification of SVM and hence improves the goodness-of-fit and prediction accuracy. It is carried out in two main steps:
(1) Two Lagrange multipliers are identified and solved analytically. First, the constrained maximum is obtained by computing the constraints on the two Lagrange multipliers; the constraint $0 \le \beta_i \le C$ restricts the two multipliers to a diagonal line segment [68]. The multipliers are then shifted to the point on this segment with the lowest value of the objective function [68].
(2) Suitable Lagrange multipliers are chosen using heuristics for optimizing the quadratic programming problem [38]. Two heuristics are used [38]. The first heuristic, for the first multiplier, passes over all training samples and identifies those that do not satisfy the Karush-Kuhn-Tucker (KKT) conditions [38]. The second heuristic, for the second multiplier, approximately maximizes the size of the step taken during the optimization. Suitable Lagrange multipliers are selected by choosing the sample with the largest error difference from the previous sample [68].
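The two-multiplier update can be made concrete with a minimal Python sketch of a simplified SMO. It is not the Weka implementation used in the study, and it makes several assumptions: labels are recoded to +1 (landslide) and −1 (non-landslide), a linear kernel is used, the second multiplier is picked at random instead of with the step-size heuristic, and all function names and the toy data are hypothetical.

```python
import random


def linear_kernel(a, b):
    return sum(p * q for p, q in zip(a, b))


def smo_train(X, y, C=1.0, tol=1e-3, max_passes=20, max_sweeps=1000, seed=0):
    """Simplified SMO for labels +1/-1. Returns (beta, b)."""
    rng = random.Random(seed)
    n = len(X)
    K = [[linear_kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    beta, b = [0.0] * n, 0.0

    def error(i):
        # Difference between the current SVM output and the true label.
        return sum(beta[k] * y[k] * K[k][i] for k in range(n)) + b - y[i]

    passes = sweeps = 0
    while passes < max_passes and sweeps < max_sweeps:
        sweeps += 1
        changed = 0
        for i in range(n):
            E_i = error(i)
            # First multiplier: only samples that violate the KKT conditions.
            if not ((y[i] * E_i < -tol and beta[i] < C) or (y[i] * E_i > tol and beta[i] > 0)):
                continue
            j = rng.choice([k for k in range(n) if k != i])  # simplification
            E_j = error(j)
            bi_old, bj_old = beta[i], beta[j]
            # 0 <= beta <= C confines the pair to a diagonal segment [low, high].
            if y[i] != y[j]:
                low, high = max(0.0, bj_old - bi_old), min(C, C + bj_old - bi_old)
            else:
                low, high = max(0.0, bi_old + bj_old - C), min(C, bi_old + bj_old)
            eta = 2 * K[i][j] - K[i][i] - K[j][j]
            if low == high or eta >= 0:
                continue
            # Analytic optimum for the second multiplier, clipped to the segment.
            bj_new = min(high, max(low, bj_old - y[j] * (E_i - E_j) / eta))
            if abs(bj_new - bj_old) < 1e-7:
                continue
            bi_new = bi_old + y[i] * y[j] * (bj_old - bj_new)
            b1 = b - E_i - y[i] * (bi_new - bi_old) * K[i][i] - y[j] * (bj_new - bj_old) * K[i][j]
            b2 = b - E_j - y[i] * (bi_new - bi_old) * K[i][j] - y[j] * (bj_new - bj_old) * K[j][j]
            beta[i], beta[j] = bi_new, bj_new
            if 0 < bi_new < C:
                b = b1
            elif 0 < bj_new < C:
                b = b2
            else:
                b = (b1 + b2) / 2
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return beta, b


def predict(X_train, y, beta, b, x):
    score = sum(beta[k] * y[k] * linear_kernel(X_train[k], x) for k in range(len(y))) + b
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable toy data (two factors per point).
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
beta, b = smo_train(X, y, C=1.0)
predictions = [predict(X, y, beta, b, x) for x in X]
```

Each update changes exactly two multipliers, which keeps the sum of $\beta_i y_i$ at zero without calling a general-purpose quadratic programming solver.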

Cascade Generalization (CG)
CG, proposed in 2000, has been extensively employed in ensemble learning [69-72]. Unlike the conventional stacking algorithm, which consists of multiple levels, CG uses the outputs of the base level to generate new features for the samples in the original data, thereby extending the input space [73]. CG can therefore be considered a sequential framework for integrating various classifiers, whereas stacking is parallel. CG has other merits as well: even classifiers on intermediate levels have access to the original attributes, and computational efficiency is significantly enhanced because no internal cross validation is needed [39]. It should also be noted that there are two cascade generalization schemes, namely the loose coupling and tight coupling schemes [69].
Suppose that the original training data D are expressed as

$$D = \{(X_m, y_m)\}, \quad m = 1, 2, \ldots, M \qquad (3)$$

where $y_m$ is the class label of the m-th sample, $X_m$ is the original attribute vector of the m-th sample, and M is the total number of samples. The metadata produced by feeding the original training data D into the base-level classifiers are described by Equation (4), where $C_m$ denotes the vector of classes predicted by the various base-level classifiers. For binary classification problems, Equation (4) can be rewritten as Equation (5) if the base-level classifiers output conditional probability distributions, where $p_n$ and $p_p$ are the probability distributions of the negative and positive classes, respectively, and $c_{km}$ is the class predicted by the k-th base-level classifier. CG can improve the performance of the base classifier by decreasing the bias in the training dataset [39]. CG belongs to the family of stacking generalization algorithms [74]. Training is done at two or more levels: (i) a learning algorithm is used to combine the outputs of the base classifier (SVM), where the original training dataset constitutes the level-zero data and the outputs of the base classifier constitute the level-one data; and (ii) the level-one dataset is used to prepare the final classification. The final results are obtained by processing the metadata on multiple learning levels using this procedure. In other words, at this stage the classification results of the base classifiers (such as SVM) are combined to obtain the final decision [39].
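The idea of extending the input space with base-level outputs can be sketched as follows. The sketch assumes a deliberately simple stand-in for the base-level learner (a nearest-centroid model with an exponential distance weighting) rather than the SVM used in the study, and all function names and data are hypothetical.

```python
import math


def fit_centroids(X, y):
    """Stand-in base-level learner: one centroid per class (0 = non-landslide, 1 = landslide)."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def class_probabilities(centroids, x):
    """Return (p_n, p_p): pseudo-probabilities of the negative and positive classes."""
    w_n = math.exp(-math.dist(x, centroids[0]))
    w_p = math.exp(-math.dist(x, centroids[1]))
    return w_n / (w_n + w_p), w_p / (w_n + w_p)


def extend_with_base_outputs(X, centroids):
    """Cascade step: append the base-level outputs to the original attributes."""
    return [list(x) + list(class_probabilities(centroids, x)) for x in X]


# Hypothetical samples with two normalized factors each.
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = [0, 0, 1, 1]
base_model = fit_centroids(X, y)
X_extended = extend_with_base_outputs(X, base_model)
```

Each extended row keeps the original attributes and gains two new ones, so a learner at the next level sees both the raw factors and the base-level opinion.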

Naïve Bayes Trees (NBT)
NBT, which belongs to the family of decision tree algorithms, is a combination of naïve Bayes theory and decision trees [75]. The most significant feature of the NBT structure is that a naïve Bayes classifier is adopted at each leaf node, while decision tree splits are adopted at each internal node [76]. For landslide prediction, suppose (x, y) is a vector of the training dataset, where $x = x_i$, $i = 1, 2, \ldots, m$, represents the landslide influencing factors (m is the number of factors) and $y = (1, 0)$ represents the class variable (landslide and non-landslide). In this model, the tree is first grown using a decision tree algorithm. The landslide conditioning factor with the highest entropy is selected as the root, and the tree is then divided so that nodes appear. When all landslide examples have been assigned to their classes, the algorithm stops and the leaf nodes are created. Subsequently, a naïve Bayes model is constructed for each leaf using the data associated with that leaf. Finally, probability values are computed for each pixel of the training dataset and then for all pixels of the study area to prepare the landslide susceptibility map. Specifically, the NBT classifier is implemented using Equation (6) [77], where $PP(t_i)$ refers to the prior probability of the output variable $t_i = (1, 0)$, $r_i$ is the i-th attribute in the training dataset, and $\sigma$ and $\varepsilon$ denote the mean value and standard deviation of $r_i$, respectively. In the process of building the decision trees, the gain ratio (GR) values are calculated by Equation (7) to control tree growth [78], where U represents the training dataset.
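The gain ratio used to control tree growth can be illustrated with a short Python sketch. It assumes the standard C4.5-style definition (information gain divided by split information), which may differ in detail from Equation (7) of the cited source [78]; the function names and the toy factor classes are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the discrete factor classes in `values`."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical factor classes for four training points (1 = landslide).
labels = [1, 1, 0, 0]
informative = gain_ratio(["steep", "steep", "gentle", "gentle"], labels)
uninformative = gain_ratio(["north", "south", "north", "south"], labels)
```

A factor that separates landslide from non-landslide points perfectly has a gain ratio of 1 in this toy case, while a factor unrelated to the labels has a gain ratio of 0.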

Evaluation and Comparison Methods
For validation, two quantitative methods were applied, namely the statistical index (SI)-based method and the receiver operating characteristic (ROC) curve method. Both are widely used to validate the performance of models [14,43,79]. The SI-based method evaluates a model using statistical indices such as sensitivity (SST), specificity (SPF), accuracy (ACC), kappa, and root mean squared error (RMSE). SST is the proportion of landslide pixels that the model classifies correctly, whereas SPF is the proportion of non-landslide pixels that it classifies correctly [14]. ACC is the proportion of all pixels, landslide and non-landslide, that are classified correctly (the general performance of the landslide model). Kappa shows how reliable the model is for landslide prediction, and RMSE shows how accurate the model is for landslide classification [80]. Higher values of SST, SPF, ACC, and kappa indicate better performance of landslide models, whereas lower values of RMSE indicate better predictive capability [14]. These statistical indices are calculated from the four entries of the confusion matrix, namely true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN):

$$SST = \frac{TP}{TP + FN}, \qquad SPF = \frac{TN}{TN + FP}, \qquad ACC = \frac{TP + TN}{TP + TN + FP + FN}$$

$$Kappa = \frac{P_{obs} - P_{exp}}{1 - P_{exp}}, \qquad RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( X_{pred.} - X_{act.} \right)^{2}}$$

where $P_{obs}$ is the observed agreement (equal to ACC), $P_{exp}$ is the expected agreement, $X_{pred.}$ is the value predicted by the landslide susceptibility model, $X_{act.}$ is the actual value in the training dataset or the validation dataset, and n is the total number of samples in the training dataset or the validation dataset. The ROC curve is a graphical measure for assessing the overall performance of prediction models [82,83]. It is plotted in a two-dimensional space with 100-SPF on the x-axis and SST on the y-axis [84,85]. To assess the general performance of a given model, the area under the ROC curve (AUC) is used [86]. A higher AUC indicates better performance of a given model. A model with an AUC of 0.5 performs no better than random classification, whereas an AUC of 1 indicates a perfect model [87].
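The indices above can be computed with a short Python sketch. It assumes that the model outputs a susceptibility score between 0 and 1 for each sample, that a cut-off of 0.5 converts scores to classes, that RMSE is computed on the scores, and that both classes are present; the AUC is obtained with the rank-based (Mann-Whitney) formulation. The function names and the sample values are hypothetical.

```python
import math


def statistical_indices(actual, scores, threshold=0.5):
    """Return SST, SPF, ACC, kappa, and RMSE for labels (1/0) and model scores."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    n = len(actual)
    acc = (tp + tn) / n
    # Expected agreement by chance, from the row and column totals.
    p_exp = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return {
        "SST": tp / (tp + fn),
        "SPF": tn / (tn + fp),
        "ACC": acc,
        "kappa": (acc - p_exp) / (1 - p_exp),
        "RMSE": math.sqrt(sum((s - a) ** 2 for a, s in zip(actual, scores)) / n),
    }


def auc(actual, scores):
    """Share of (landslide, non-landslide) pairs ranked correctly; ties count as half."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores for six validation points.
actual = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
indices = statistical_indices(actual, scores)
area = auc(actual, scores)
```

Unlike SST, SPF, ACC, and kappa, the AUC does not depend on the chosen cut-off, which is why it is convenient for comparing models.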

Linear Support Vector Machine (LSVM) Feature Selection
In spatial prediction modeling, the selection of appropriate input factors is one of the most important steps; however, there is no global guideline for the selection of landslide conditioning factors [88]. In the present study, LSVM was applied to select the proper conditioning factors using the following equation [89,90]:

$$f(m) = w^{T} m + n$$

where $m = (m_1, m_2, m_3, \cdots, m_{12})$ is the input vector containing the factors, $w^{T}$ is the transposed weight vector, and n is the offset of the hyper-plane from the origin [89].

Methodological Flow Chart and Steps
In the current research, two novel classifier ensemble methods, namely the SMOSVM and CGSVM models, were applied to develop landslide susceptibility maps. SMOSVM is a hybrid of the SMO and SVM models, and the CGSVM model is constructed from CG and SVM. The performance of the SMOSVM and CGSVM models was compared with that of single benchmark models (SVM and NBT). The study was conducted in four main steps: (I) preparation of the influencing factor maps and the landslide/non-landslide inventory map, (II) factor selection using LSVM, (III) landslide susceptibility modelling, and (IV) model validation and comparison (Figure 5).
Table 1 shows the average merit (AM) and standard deviation (SD) of each conditioning factor, obtained with the LSVM technique on the training dataset, and hence the order of significance of the factors for landslide susceptibility modeling. AM expresses the role of each factor in landslide occurrence: a higher AM for a given factor indicates a more significant factor for landslide incidence in the modelling process [4,24]. The results indicate that, although all factors are important in the present study, road density, with an AM of 14.7, is the most important factor for landslide incidence in this area, as the construction of roads creates more instability in the groundmass/rock mass. It is followed by lithology (AM = 13.7), distance to roads (AM = 12.9), distance to faults (AM = 11.1), elevation (AM = 10.9), plan curvature (AM = 9.1), fault density (AM = 8.3), profile curvature (AM = 7.7), distance to river (AM = 7.2), slope (AM = 6.6), aspect (AM = 5.8), curvature (AM = 3.4), land use (AM = 3.2), rainfall (AM = 3.1), and river density (AM = 2.3). Although rainfall has a low AM value (3.1), it is one of the most important triggering factors of landslides. Similarly, erosion and scouring are caused by the action of rivers, especially during monsoons. Therefore, all 15 factors, including those with lower AM values, contribute to the occurrence of landslides and were considered in the present landslide susceptibility modeling.

Model Construction
The SMOSVM landslide model was constructed using the training dataset generated from the selected factors. The choice of the complexity parameter (C > 0) affects the performance of the SMOSVM model [66]. Therefore, the complexity parameter needs to be tuned to obtain the highest predictive capability of the SMOSVM model. Krawiec and Bhanu [91] and Kibriya et al. [92] suggested setting the complexity parameter to 10, whereas Kurokawa et al. [93] set it to 1. In general, no agreement has been reached on a particular value of the complexity parameter. In the present study, a trial-and-error process [41] was applied to optimize its value. The AUC value was used to evaluate the performance of the SMOSVM model for various values of the complexity parameter, and the value corresponding to the highest AUC was selected to build the SMOSVM model. The performance of the SMOSVM model for various values of the complexity parameter is shown in Figure 6. It can be observed that the SMOSVM model has the highest AUC value with a complexity parameter of 7. Therefore, the complexity parameter was set to 7 for training the SMOSVM model in this study. The same value of the complexity parameter was also applied for training the individual SVM model and CGSVM. In addition, 10 iterations were used to train the CGSVM.
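The trial-and-error selection of the complexity parameter amounts to a one-dimensional search, which can be sketched as follows. The function name is hypothetical, and the AUC values in the example are placeholders chosen purely for illustration; they are not the values plotted in Figure 6. In practice, the callable would train the model with a given C and return the AUC on the evaluation data.

```python
def select_complexity_parameter(candidates, auc_for):
    """Return (best_C, best_auc); on ties the smaller C is kept."""
    best_c, best_auc = None, float("-inf")
    for c in sorted(candidates):
        score = auc_for(c)  # e.g. train with this C and evaluate the AUC
        if score > best_auc:
            best_c, best_auc = c, score
    return best_c, best_auc


# Placeholder AUC values (illustration only, not results from the study).
placeholder_auc = {1: 0.70, 4: 0.75, 7: 0.78, 10: 0.74}
best_c, best_auc = select_complexity_parameter(placeholder_auc, placeholder_auc.get)
```

Keeping the smaller C on ties is a design choice made here: a smaller C corresponds to a wider margin and a simpler model.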

Model Validation and Comparison
The landslide model of SMOSVM was validated using training (goodness-of-fit) and testing (performance) datasets and different quantitative/statistical metrics. Results of training and testing datasets are shown in Figures 7-9. The training results (Figure 7a) indicate that the highest PPV (%) metric was obtained for the CGSVM (88.50%) model, followed by SMOSVM (86.8%), SVM (79.30%),
Regarding the RMSE values for the training (0.289) and validation (0.412) datasets (Figure 8), SMOSVM has the best goodness-of-fit and performance compared with the other landslide models, namely CGSVM (RMSE training = 0.379 and RMSE validation = 0.426), SVM (RMSE training = 0.391 and RMSE validation = 0.426), and NBT (RMSE training = 0.420 and RMSE validation = 0.426). In addition to the above-mentioned statistical metrics, the kappa index was also used for model validation and comparison on the training and validation datasets (Figure 9). On the training dataset, the kappa value of SMOSVM (0.74) is the highest, followed by CGSVM (0.71), SVM (0.56), and NBT (0.52), respectively. Likewise, on the validation dataset, SMOSVM (0.5) has the highest kappa value compared with the other models.

Development of Landslide Susceptibility Maps
Landslide susceptibility maps of the study area were constructed from the outputs of the SMOSVM, CGSVM, SVM, and NBT models. The Geometrical Intervals (GI) method [94] was used to reclassify the landslide susceptibility indexes of each map into five susceptibility classes: very low, low, moderate, high, and very high (Figure 10). For example, in SMOSVM, these classes correspond to the intervals (0.004-0.122), (0.122-0.183), (0.183-0.301), (0.301-0.534), and (0.534-0.990), respectively (Figure 10a). Reliability of these maps was evaluated by overlaying them with the past landslide locations (Figure 11).
In SMOSVM, the moderate class has the highest share of pixels (26.1%), followed by very low and low (22% each), high (17.3%), and very high (12.5%). The largest share of landslide pixels was observed in the very high class (86.7%), followed by high and moderate (5.24% each), low (2.02%), and very low (0.806%).
In CGSVM, the very low susceptibility class was assigned the highest share of pixels (40.8%), while the lowest shares were obtained for the high (10.8%) and very high (11.2%) susceptibility classes. In this model, the highest share of landslide pixels was obtained for the very high susceptibility class (45.6%), followed by the moderate (16.1%), low (14.9%), high (14.1%), and very low (9.27%) classes.
In SVM, the very high class has the highest share of pixels (23.4%), followed by low (21.9%), very low (21.4%), high (17.8%), and moderate (15.4%). The largest share of landslide pixels was again observed in the very high class (69.4%), followed by high (13.7%), moderate (8.87%), low (6.85%), and very low (1.21%).
In NBT, the moderate class was assigned the highest share of pixels (36.5%), followed by low (30.8%), high (21.4%), very high (7.2%), and very low (4.19%). The largest share of landslide pixels (44%) fell in the very high susceptibility class, followed by high (30.2%), moderate (20.2%), low (5.65%), and very low (0%) (Figure 11).
Results of the analysis show that the landslide susceptibility maps produced by these models are reliable, as the share of landslide pixels generally increased from the very low to the very high susceptibility classes. However, the map produced by the proposed SMOSVM model, which places 86.7% of landslide pixels in the very high class, is the most reliable in comparison to the other models.
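The reclassification and overlay steps described above can be sketched as follows. This is a minimal illustration, not the workflow used in the study: the class breaks are the SMOSVM interval bounds quoted in the text, while the function names, the treatment of values lying exactly on a break (assigned to the higher class), and the sample index values are assumptions made for the example.

```python
from bisect import bisect_right
from collections import Counter

CLASSES = ["very low", "low", "moderate", "high", "very high"]
# Upper bounds of the first four SMOSVM classes, as quoted in the text.
BREAKS = [0.122, 0.183, 0.301, 0.534]


def classify(index):
    """Map a susceptibility index to its class name.

    Assumption: a value equal to a break goes to the higher class.
    """
    return CLASSES[bisect_right(BREAKS, index)]


def class_shares(indexes):
    """Percentage of the given index values that fall in each class."""
    counts = Counter(classify(v) for v in indexes)
    return {c: 100.0 * counts[c] / len(indexes) for c in CLASSES}


# Hypothetical index values: all map pixels, and the subset of pixels
# that coincide with past landslide locations (the overlay step).
all_pixels = [0.05, 0.10, 0.15, 0.25, 0.40, 0.60, 0.70, 0.95]
landslide_pixels = [0.40, 0.60, 0.70, 0.95]

pixel_share = class_shares(all_pixels)
landslide_share = class_shares(landslide_pixels)

for c in CLASSES:
    print(f"{c:>9}: pixels {pixel_share[c]:5.1f}%  landslides {landslide_share[c]:5.1f}%")
```

A map is considered reliable in the sense used above when the landslide share rises toward the very high class, as it does in this toy example.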

Evaluation of Landslide Susceptibility Maps
To assess the prediction performance of the models and the accuracy of the produced maps, ROC curve and FR analyses were used. Results of the graphical analysis (Figure 12) illustrate that the SMOSVM model has the highest value of AUC for both the training dataset (0.964) and the testing dataset (0.824). In the FR analysis (Figure 13), the values of FR in NBT for the very low, low, moderate, high, and very high susceptibility classes are 0, 0.183, 0.553, 1.42, and 6.1, respectively. The FR values increase progressively from the very low to the very high susceptibility class, which implies that all landslide models are reliable and have good performance.
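The FR values quoted for NBT can be reproduced from the class percentages reported in the map development section. The sketch below assumes the common definition of FR as the share of landslide pixels in a class divided by the share of all pixels in that class; the function and variable names are illustrative only. Small differences from the published values (for example 1.41 versus 1.42 for the high class) are to be expected because the input percentages are already rounded.

```python
CLASSES = ["very low", "low", "moderate", "high", "very high"]

# NBT percentages as reported in the text.
area_pct = {"very low": 4.19, "low": 30.8, "moderate": 36.5,
            "high": 21.4, "very high": 7.2}
landslide_pct = {"very low": 0.0, "low": 5.65, "moderate": 20.2,
                 "high": 30.2, "very high": 44.0}


def frequency_ratio(landslide_share, area_share):
    """FR per class: landslide share divided by area share (assumed definition)."""
    return {c: landslide_share[c] / area_share[c] for c in area_share}


fr = frequency_ratio(landslide_pct, area_pct)
for c in CLASSES:
    print(f"{c:>9}: FR = {fr[c]:.3f}")

# FR should not decrease from very low to very high for a reliable map.
is_monotonic = all(fr[a] <= fr[b] for a, b in zip(CLASSES, CLASSES[1:]))
```

An FR above 1 indicates that a class holds more landslides than its area alone would suggest, which is why only the high and very high classes exceed 1 here.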

Figure 13. Analysis of FR of the susceptibility maps of the models.

Discussion
Landslides are one of the most devastating natural hazards in hilly regions all over the world. Landslide models are increasingly being developed using statistical methods and ML techniques to predict landslides accurately, so that preventive and protective measures can be taken in time [95]. With this objective, we developed a novel hybrid model, SMOSVM, to accurately predict landslide occurrences in the Mu Cang Chai District of Yen Bai Province, Vietnam. For this, we applied the LSVM technique with a 10-fold cross-validation method to select the most important landslide-affecting factors. The results reveal that although all conditioning factors contribute positively to landslide incidence, road density, with the highest average merit (14.7), is the most significant factor for landslide modeling, followed by lithology and distance to roads. River density was observed to be the least effective factor. Factors related to roads have also been found to be the most important for landslide occurrence in other areas [14,21,62,95,96]. The main reason is that road excavation destabilizes hill slopes by removing toe support and exposing weak geological features/planes on the slope face. This makes the road sections vulnerable to slides and sometimes causes landslides during road construction itself.
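The idea of averaging a factor's merit over 10 folds can be illustrated with a generic sketch. It shows only the fold bookkeeping; the merit function passed in is a hypothetical placeholder, not the LSVM-based score used in the study, and real applications would normally shuffle the samples before splitting.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def average_merit(n_samples, merit_fn, k=10):
    """Average a merit score computed on the training part of each fold."""
    merits = [merit_fn(train_idx) for train_idx, _ in k_fold_indices(n_samples, k)]
    return sum(merits) / len(merits)


# Hypothetical stand-in for a factor merit: the fraction of samples
# available for training in each fold.
n = 25
avg = average_merit(n, lambda train_idx: len(train_idx) / n)
print(f"average merit over 10 folds: {avg:.3f}")
```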
In the present study, ML and optimization algorithms were used in landslide prediction models as these techniques can reduce over-fitting and noise problems. These techniques also have higher goodness-of-fit and performance than conventional models. Moreover, ML ensemble models and optimization algorithms are more powerful and flexible than individual conventional and machine learning classifiers [33]. Considering the advantages of these models, a novel ensemble intelligence approach, namely SMOSVM, was adopted for landslide susceptibility mapping. The CGSVM, SVM, and NBT algorithms were used for comparison and validation of the proposed model. Results indicate that SMOSVM outperforms the CGSVM, SVM, and NBT models on both the training (goodness-of-fit) and testing (performance) datasets.
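The AUC values used for this comparison can be understood through the rank interpretation of the ROC curve: the AUC equals the probability that a randomly chosen landslide sample receives a higher susceptibility score than a randomly chosen non-landslide sample, with ties counted as one half. The sketch below computes it by direct pair counting; the scores are hypothetical and the function name is an assumption for the example.

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5. Runs in O(len(pos) * len(neg)) time, which is
    adequate for small validation sets.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical susceptibility scores for landslide and non-landslide samples.
landslide = [0.9, 0.8, 0.4]
non_landslide = [0.7, 0.3, 0.2]

score = auc(landslide, non_landslide)
print(f"AUC = {score:.3f}")
```

Computing this separately on the training and testing samples gives the goodness-of-fit and the predictive performance, respectively.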
In general, all landslide models perform well in the present study, but the SMOSVM model has the highest predictive power, followed by CGSVM, SVM, and NBT, respectively. The performance of the hybrid SMOSVM model improved by about 2% in comparison to the single SVM model according to the ROC analysis (AUC = 0.824 versus 0.804). These findings are reasonable as SMOSVM uses the SMO technique to solve quadratic programming problems effectively. This technique enhances not only the speed of the SVM model but also its predictive power, as it can decrease the over-fitting and noise problems in the training dataset [39]. The predictive performance of SMOSVM was compared with standard models such as SVM, which is known as one of the best classifiers for landslide prediction [14]. Another model, NBT, which is a hybrid of the naïve Bayes classifier [41] and the decision tree classifier [97], is also an efficient method for landslide assessment; however, its performance might be affected by the independence assumption of the naïve Bayes classifier [98]. As the predictive capability of the SMOSVM model depends on a suitable selection of the complexity parameter (Figure 6), its proper optimization was needed to achieve the best and most reliable performance of this model. In the present study, based on the trial-and-error technique [41], the complexity parameter was set to 7 to obtain the highest performance of the SMOSVM model.
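To make the role of SMO and of the complexity parameter more concrete, the following is a minimal sketch of a simplified SMO-style solver for a linear soft-margin SVM. It is a textbook-style variant, not the implementation used in the study: it optimizes two multipliers at a time with the analytic update, picks the partner with the largest error difference, and omits the error cache and working-set heuristics of production solvers. The value C = 7 follows the text; the toy data, tolerances, and function names are assumptions for the example.

```python
def linear_kernel(a, b):
    return sum(p * q for p, q in zip(a, b))


def smo_train(X, y, C=7.0, tol=1e-3, eps=1e-8, max_sweeps=200):
    """Simplified SMO for a linear soft-margin SVM; labels must be +1 or -1."""
    n = len(X)
    K = [[linear_kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha, b = [0.0] * n, 0.0

    def error(i):
        # Decision value minus label for sample i.
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b - y[i]

    for _ in range(max_sweeps):
        changed = 0
        for i in range(n):
            Ei = error(i)
            violates = (y[i] * Ei < -tol and alpha[i] < C) or \
                       (y[i] * Ei > tol and alpha[i] > 0.0)
            if not violates:
                continue
            errors = [error(j) for j in range(n)]
            # Try partners in order of decreasing |Ei - Ej|.
            partners = sorted((j for j in range(n) if j != i),
                              key=lambda j: -abs(Ei - errors[j]))
            for j in partners:
                Ej = errors[j]
                if y[i] != y[j]:
                    lo = max(0.0, alpha[j] - alpha[i])
                    hi = min(C, C + alpha[j] - alpha[i])
                else:
                    lo = max(0.0, alpha[i] + alpha[j] - C)
                    hi = min(C, alpha[i] + alpha[j])
                eta = 2.0 * K[i][j] - K[i][i] - K[j][j]
                if lo >= hi or eta >= 0.0:
                    continue
                # Analytic optimum for alpha_j, clipped to the box [lo, hi].
                aj = min(hi, max(lo, alpha[j] - y[j] * (Ei - Ej) / eta))
                if abs(aj - alpha[j]) < eps:
                    continue
                ai = alpha[i] + y[i] * y[j] * (alpha[j] - aj)
                ai = min(C, max(0.0, ai))  # guard against rounding drift
                b1 = b - Ei - y[i] * (ai - alpha[i]) * K[i][i] \
                     - y[j] * (aj - alpha[j]) * K[i][j]
                b2 = b - Ej - y[i] * (ai - alpha[i]) * K[i][j] \
                     - y[j] * (aj - alpha[j]) * K[j][j]
                if 0.0 < ai < C:
                    b = b1
                elif 0.0 < aj < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2.0
                alpha[i], alpha[j] = ai, aj
                changed += 1
                break
        if changed == 0:
            break
    w = [sum(alpha[k] * y[k] * X[k][d] for k in range(n)) for d in range(len(X[0]))]
    return w, b, alpha


def predict(w, b, x):
    return 1 if linear_kernel(w, x) + b >= 0.0 else -1


# Hypothetical, linearly separable toy data (-1: non-landslide, +1: landslide).
X = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
y = [-1, -1, 1, 1]
w, b, alpha = smo_train(X, y, C=7.0)
print("w =", w, "b =", b)
```

A trial-and-error search of the kind mentioned above would repeat such training for several candidate values of C and keep the one with the best validation score.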

Conclusions
The main objective of the study was to apply a novel hybrid ML model named SMOSVM, which is a combination of SMO and SVM, for accurate mapping of landslide susceptibility in the Mu Cang Chai District, Yen Bai Province, Vietnam. SVM is known as a benchmark single model and as one of the powerful classifiers, but it has a disadvantage in solving large-scale quadratic programming problems. The SMO algorithm overcomes this limitation, as it has several advantages: (i) it is a simple, fast training algorithm that is easy to implement; (ii) it can be more successful when the dataset is large and the inputs are sparse; and (iii) it can decrease the complexity of difficult problems and thus enhance the performance of models.
Preparation of landslide susceptibility maps was carried out in this study using two optimization algorithms, namely SMOSVM and CGSVM. Performance of the models was evaluated and validated using the area under the ROC curve (AUC) and standard statistical measures, and the results were compared with other benchmark landslide models such as SVM and NBT. Analysis of the results indicated that although all landslide models performed well, the prediction power of SMOSVM (AUC = 0.824) is the best, followed by the CGSVM (AUC = 0.815), SVM (AUC = 0.804), and NBT (AUC = 0.800) models, respectively. Therefore, the SMOSVM model can be considered a promising method for landslide susceptibility assessment. The present study confirmed that the hybrid model combining SMO with SVM is more effective in solving prediction problems. SMOSVM can be used for landslide prediction and proper management of landslide-prone areas. More studies are required to select the best input parameters, including geo-mechanical properties of the rock mass/ground mass, in the models to further refine the prediction capabilities of ML methods.