Hybrid Machine Learning Approaches for Landslide Susceptibility Modeling

: This paper presents novel hybrid machine learning models, namely Adaptive Neuro Fuzzy Inference System optimized by Particle Swarm Optimization (PSOANFIS), Artiﬁcial Neural Networks optimized by Particle Swarm Optimization (PSOANN), and Best First Decision Trees based Rotation Forest (RFBFDT), for landslide spatial prediction. Landslide modeling of the study area of Van Chan district, Yen Bai province (Vietnam) was carried out with the help of a spatial database of the area, considering past landslides and 12 landslide conditioning factors. The proposed models were validated using different methods such as Area under the Receiver Operating Characteristics (ROC) curve (AUC), Mean Square Error (MSE), Root Mean Square Error (RMSE). Results indicate that the RFBFDT (AUC = 0.826, MSE = 0.189, and RMSE = 0.434) is the best method in comparison to other hybrid models, namely PSOANFIS (AUC = 0.76, MSE = 0.225, and RMSE = 0.474) and PSOANN (AUC = 0.72, MSE = 0.312, and RMSE = 0.558). Thus, it is reasonably concluded that the RFBFDT is a promising hybrid machine learning approach for landslide susceptibility modeling. further map PSOANN 63% the and in the


Introduction
Landslides are gravitational movements of slope-framing materials caused by natural and anthropogenic activities [1]. They are considered one of the major hazards affecting human life, m. High mountains, namely Tay Con Linh and Kieu Lieu Ti, are located on the western side. Bac Ha, Quan Bạ, and Dong Van are the plateaus (highlands) located on the northern side, with an average elevation of 1000-1200 m. Dong Van Plateau is the highest at 1600 m. The midlands (elevation 100-150 m) are on the southwest side. The lowest elevation in the area is in the southeast.
Hills and valleys are generally aligned in the northwest to southeast direction, parallel to the orientation of geological faults. Drainage density in the area is high and most of the drainage is structurally controlled. Hill slopes are very steep in places (up to 84°). Narrow valleys and steep hill slopes are some of the main factors causing landslides, besides heavy rains and anthropogenic activity. Changes in the land use pattern for cultivation of rice on terraces and other developmental activities increased the landslide occurrences in the area. Accumulation of irrigation water on the terraces increases effective weight and reduces the strength of the slope-forming materials, thus adversely affecting the stability of slopes. Hills and valleys are generally aligned in the northwest to southeast direction, parallel to the orientation of geological faults. Drainage density in the area is high and most of the drainage is structurally controlled. Hill slopes are very steep in places (up to 84 • ). Narrow valleys and steep hill slopes are some of the main factors causing landslides, besides heavy rains and anthropogenic activity. Changes in the land use pattern for cultivation of rice on terraces and other developmental activities increased the landslide occurrences in the area. Accumulation of irrigation water on the terraces increases effective weight and reduces the strength of the slope-forming materials, thus adversely affecting the stability of slopes.
Geologically, the study area is occupied by igneous, metamorphic, and sedimentary rocks belonging to the Tu Le-Ngoi Thia complex (21.56%), Tram Tau formation (15.42%), and Ca Vinh complex (13.17%). Rock mass in this area is highly weathered. Depth of weathering varies from 10 m to 18 m. Most of the landslides are observed in the weathered Tu Le-Ngoi Thia complex (10.78%), Tram Tau formation (10.18%), and in gabbro and diabase rocks (11.38%) ( Figure 2 and Table 1). Weathered rocks have high permeability and low strength, resulting in slope failure. Tram Tau formation (10.18%), and in gabbro and diabase rocks (11.38%) ( Figure 2 and Table 1). Weathered rocks have high permeability and low strength, resulting in slope failure.

Landslide Inventory
A landslide inventory showing the location and type of landslides occurring in the area is important for the development of landslide models. In this area, 167 landslides were identified from Google Earth images and air photos checked against the available historical record and limited field investigations. Based on these data, a landslide inventory map was constructed. Translational, rotational, mixed, and debris flow types of landslides occur in the area. Translation type of landslides are prominent in the study area, hence only these landslides were taken into account for modeling. National Road No. 32 is most affected by landslide hazards (Figure 3). The size of landslides varies from a few cubic meters to thousands of cubic meters. We selected the center of each scar (polygon) of the landslide as one point with a cell size of 20 m for sampling as we considered that most of the pixels of a landslide polygon have identical conditions for landslide occurrence in similar types of slope-forming materials [17,18].

Landslide Influencing Parameters
In landslide modeling, it is very important to select the suitable affecting factors for landslide assessment. In our study, the selection of factors is based on the analysis of the nature of landslide occurrences in relation to the characteristics of geomorphology, geology, hydrology, meteorology, and human impacts in the study area. Thus, we have selected 12 factors, namely slope, aspect, elevation, curvature, slope length, valley depth, distance to rivers, distance to roads, distance to faults, Topographic Wetness Index (TWI), and Terrain Ruggedness Index (TRI), for landslide analysis and modeling. Each factor was classified into several classes based on the standard classification for lithology and aspect, natural break method for slope and expert's knowledge method for elevation, curvature, slope length, valley depth, distance to rivers, distance to roads, distance to faults, TWI, and TRI [19][20][21][22][23]. In addition, the Frequency Ratio (FR) method, which is defined as the percentage of the number of landslide pixels per the percentage of the number of class pixels in the study area, was applied to assess the spatial relationship between the landslides and 12 conditioning factors (Table 2). curvature, slope length, valley depth, distance to rivers, distance to roads, distance to faults, TWI, and TRI [19][20][21][22][23]. In addition, the Frequency Ratio (FR) method, which is defined as the percentage of the number of landslide pixels per the percentage of the number of class pixels in the study area, was applied to assess the spatial relationship between the landslides and 12 conditioning factors ( Table  2).
Slope is important in landslide susceptibility study [24]. A slope angle map of the study area was generated from a Digital Elevation Model (DEM) with 20 m spatial resolution, which was generated from the topographic map of 1:50000 scale. A total of six classes (0-7.92, 7.92-17.82, 17.82-26.07, 26.07-34.65, 34.65-44.88, and 44.88-84.16°) were obtained on the slope map using the natural break method in GIS application (Figure 4a). According to the FR analysis, slopes in this area between 7.92°and 34.65° had the high FR values, ranging from 1.13 to 1.69, which indicate the highest susceptibility to landslide occurrences in these three classes.  Slope is important in landslide susceptibility study [24]. A slope angle map of the study area was generated from a Digital Elevation Model (DEM) with 20 m spatial resolution, which was generated from the topographic map of 1:50000 scale. A total of six classes (0-7.  (Figure 4a). According to the FR analysis, slopes in this area between 7.92 • and 34.65 • had the high FR values, ranging from 1.13 to 1.69, which indicate the highest susceptibility to landslide occurrences in these three classes.
Aspect is a significant factor in the development of landslide susceptibility maps [25]. A map of aspect was extracted from the DEM with nine slope aspect classes: north (0-22. Elevation is one of the important factors in the occurrence of landslides as height affects the loading on the slope and thus enhances the chances of landslides when the sliding plain has a dip (orientation) towards the open excavation [26]. The weathering profile also depends on the elevation of the area. An elevation map was extracted from the DEM 20 m including seven classes (0-200, 200-400, 400-600, 600-800, 800-1000, 1000-1200, 1200-1400, 1400-1600, 1600-1800, and 1800-2542 m) (Figure 4c). The FR analysis indicated that the class of 400-600 m above sea level is the most susceptible (FR = 1.66), whereas above elevation 1400 m the frequency of occurrence of landslide susceptibility is the lowest. This might be due to more weathering on the middle height slope in comparison to higher levels.
Curvature is an important landslide affecting factors such as the runoff or accumulation of water on the slope, depending on the type of curvature [27]. In this study, a curvature map was extracted from the DEM 20 m and classified as concave, convex, or flat depending on its value either below, above, or equal to 0.05, respectively ( Figure 4d). The FR analysis showed that 55.69% of landslides occurred in concave class curvature slopes, which occupy 41.71% of the area. The occurrence of more landslides on a concave surface can be related to the accumulation of more water on such slopes.
Slope length is the distance from the origin of the landslide's flow along its flow path to the place of its runout distance or end. The parameters that control the runout distance of a landslide are geometry, physical property, and frictional coefficients. A slope length map was constructed from the DEM 20 m using SAGA tool with six classes (0-20, 20-50, 50-100, 100-150, 150-200, and 200-2501 m) ( Figure 4e). The FR analysis based on the slope length map showed that the highest susceptibility to landslide incidence is in the 200-500 m slope length class ( Table 2). This may be due to the topography and structure of the area.
Valley depth controls the weathering process and water transportation and accumulation; thus, it affects landslide occurrences. In this area, a total valley depth map was constructed from the DEM 20 m using SAGA tool considering six classes of depth (0-5, 5-30, 30-60, 60-100, 100-150, and 150-656 m) ( Figure 4f). The FR analysis showed that the most landslide-susceptible class is at 100-150 m (FR = −1.62), whereas the lowest FR value (0.47) was obtained for valley depth >150 m.
Distance to rivers is one of the most important factors for the stability as distance from a river affects the saturation degree of the slope-forming materials (Dai et al., 2001;Saha et al., 2002). A distance to rivers map was constructed on the basis of buffering the rivers extracted from the topographic map (1: 50,000) with five classes (0-100, 100-200, 200-300, 300-400, and >400 m) (Figure 4g). The FR analysis indicated that with the increase of the distance to the rivers, the probability of landslide occurrence is decreased. Specifically, most of the landslides are located within the 100-200m distance class (FR = 1.56).
Distance to roads is one of the factors that most affects landslide occurrences as most of the landslides are observed close to roads [28]. In this study, a distance to roads map was constructed on the basis of buffering the roads extracted from the topographic map (1: 50,000) and divided into five different buffer class (0-100, 100-200, 200-300, 300-400, and >400 m) (Figure 4h). The FR analysis indicated that most landslides occurred within 0-100 m from roads.
Distance to faults is one of the most important affecting factors as slope may fail along faults depending on the nature and orientation of faults [29]. Faults with clay gouge and dipping towards the slope face are the most unfavorable features for slope stability. In the study area, a distance to faults map was constructed with five different buffer classes on the basis of buffering the faults extracted from the geological map (1: 50,000) (0-250, 250-500, 500-750, 750-900, and >900 m) (Figure 4i). The FR analysis indicated that with increasing distance from the faults, the probability of landslides is decreased. In this area, fault distance between 250 m and 500 m was most vulnerable to landslide occurrence (FR = 1.56).
Lithology plays a very important role in landslide occurrences as soft and weathered rocks are more vulnerable than hard unjointed rocks, thus lithological units have different vulnerability to landslides [30]. In the study area, a lithology map was extracted from the Geological and Mineral Recourses Map on a scale of 1:50,000 with seven major lithological units (A, B, C, D, E, F, and G) ( Figure 4j and Table 3). The FR analysis indicated that group A has the highest FR value (1.46), while group C has the lowest value (0.26) ( Table 2).
Topographic Wetness Index (TWI) is a secondary geomorphometric parameter used to describe and quantify local relief [31] as it reveals the diversity and complexity of landslide topographic surface. As the slope-forming material moves, the TWI range increases. In this study, a TWI map was generated from the DEM 20 m using the SAGA tool with different classes (0-8, 8-9, 9-10, 10-11, and 11-24) ( Figure 4k). The FR analysis indicated that the class of 9-10 of TWI is the most susceptible (FR = 0.99) ( Table 2).
Terrain Ruggedness Index (TRI) proves capable of differentiating landslide population into smaller groups, consistent with their variable origin and mechanism of displacement. As the slope surface moves, the TRI range decreases. However, in the case of slump and rockslide, the calculation is different. In this study, a TRI map was generated from the DEM using the SAGA tool with different classes (0-1, 1-3, 3-5, 5-7, and >7) ( Figure 4l). The FR analysis indicated that the class of 3-5 of TRI is the most susceptible class ( Table 2). landslides is decreased. In this area, fault distance between 250 m and 500 m was most vulnerable to landslide occurrence (FR = 1.56).
Lithology plays a very important role in landslide occurrences as soft and weathered rocks are more vulnerable than hard unjointed rocks, thus lithological units have different vulnerability to landslides [30]. In the study area, a lithology map was extracted from the Geological and Mineral Recourses Map on a scale of 1:50,000 with seven major lithological units (A, B, C, D, E, F, and G) ( Figure 4j and Table 3). The FR analysis indicated that group A has the highest FR value (1.46), while group C has the lowest value (0.26) ( Table 2).
Topographic Wetness Index (TWI) is a secondary geomorphometric parameter used to describe and quantify local relief [31] as it reveals the diversity and complexity of landslide topographic surface. As the slope-forming material moves, the TWI range increases. In this study, a TWI map was generated from the DEM 20 m using the SAGA tool with different classes (0-8, 8-9, 9-10, 10-11, and 11-24) (Figure 4k). The FR analysis indicated that the class of 9-10 of TWI is the most susceptible (FR = 0.99) ( Table 2).
Terrain Ruggedness Index (TRI) proves capable of differentiating landslide population into smaller groups, consistent with their variable origin and mechanism of displacement. As the slope surface moves, the TRI range decreases. However, in the case of slump and rockslide, the calculation is different. In this study, a TRI map was generated from the DEM using the SAGA tool with different classes (0-1, 1-3, 3-5, 5-7, and >7) ( Figure 4l). The FR analysis indicated that the class of 3-5 of TRI is the most susceptible class ( Table 2).     The ANFIS was first introduced by Roger Jang [32]. It consists of two parts, a neural network (ANN) and a reasoning capability of Fuzzy Inference System (FIS) in order to enhance the power prediction for comparing the use of a single model [33]. In other word, the ANFIS is able to train FIS membership function (MF) parameters on a training dataset using a combination of back-propagation gradient descent and least-squares methods [34]. The FIS performed is based on the concepts of fuzzy set theory, fuzzy if-then rules, and fuzzy reasoning [35]. Among all FIS membership function, the Sugeno fuzzy model has been widely used due to high interpretability and computational efficiency, and built-in optimal and adaptive techniques [36]. The flowchart of ANFIS architecture is shown in Figure 5. In this figure, a circle indicates a node and rectangles denote adaptive nodes. We assumed that there are two FIS, including x and y and one input, z. At first, using the Sugeno fuzzy model, four fuzzy "if-then rules" can be developed; where, Ai and Bi are the fuzzy sets, and pi, qi, and ri are the parameters obtained during the training process. The ANFIS consists of five layers as follows ( Figure 5): Layer 1 (fuzzification): In this layer, the amount of the input variables will fuzzify and each node employs a node function by: where any fuzzy membership function (MFs) can be adopted on μAi(x) and μBi − 2(y) such as Triangle, Generalized bell (Gbell), and Gaussian. Layer 2 (fuzzy AND): in this layer, each node calculates the firing strength of a rule via multiplication.

Layer 3 (normalization):
In this layer, the firing strength of each node will be normalized using In this figure, a circle indicates a node and rectangles denote adaptive nodes. We assumed that there are two FIS, including x and y and one input, z. At first, using the Sugeno fuzzy model, four fuzzy "if-then rules" can be developed; R1 : If x is A 1 and y is B 1 , then z 1 = p 1 x + q 1 y + r 1 R2 : If x is A 1 and y is B 2 , then z 2 = p 2 x + q 2 y + r 2 R3 : If x is A 2 and y is B 1 , then z 3 = p 3 x + q 3 y + r 3 R4 : If x is A 2 and y is B 2 , then z 4 = p 4 x + q 4 y + r 4 (1) where, A i and B i are the fuzzy sets, and p i , q i , and r i are the parameters obtained during the training process. The ANFIS consists of five layers as follows ( Figure 5): Layer 1 (fuzzification): In this layer, the amount of the input variables will fuzzify and each node employs a node function by: where any fuzzy membership function (MFs) can be adopted on µA i (x) and µB i − 2(y) such as Triangle, Generalized bell (Gbell), and Gaussian.

Layer 2 (fuzzy AND):
in this layer, each node calculates the firing strength of a rule via multiplication.

Layer 3 (normalization):
In this layer, the firing strength of each node will be normalized using the ratio of firing strength of every node to the total value of each node.
where ω i is the normalized firing strength.

Layer 4 (fuzzy inference):
In this layer, each node has the following function: where ω i is the output of layer 3 and (p i ; q i ; r i ) is the consequent parameters set.

Layer 5 (defuzzification):
The overall outputs of all the rules will be obtained in this layer using the defuzzification process of the FIS, which is formulated as follows: In addition, the details of the ANFIS model can be observed in various studies including those by Chen, Panahi, and Pourghasemi [34], Jang [32], and Aghdam et al. [37].

Multilayer Perceptron Neural Networks
Artificial Neural Networks (ANNs), as a branch of Artificial Intelligence (AI), are nonlinear function approximation algorithms that can be used as a proper approach for classification and prediction problems such as landslides based on the degree of membership value of each pixel over the study area [38]. It indicates that with increasing the value of membership of each pixel, the probability of landslide occurrence will be increased. The ANNs have two functions, Multi-Layer-Perceptron (MLP) and Radial Base Function (RBF). Some researchers that have used the ANNs for landslide susceptibility mapping reported that the MLP is better than the RBF function in the detection of landslide locations [27,39].
The MLP consists of input, one and more hidden layers, and one output so that its complexity will increase when increasing the number of hidden layers [27]. In the landslide susceptibility assessment using the MLP, the condition factors are input layer, the result of landslide modeling, landslide and non-landslide, is output layer, and the classifying layers are the hidden layer [40].
This approach, based on the two main datasets, including training and testing datasets, was performed. A training dataset is applied for the training process, which it performs in two steps; firstly, the hidden layers propagate forward the input layer to output value and consequently the error is computed to compare the pre-value and target value. Secondly, during the training process, the weights will be regulated for achieving the best results with the least difference [41]. Moreover, in the testing phase, the validity of the obtained results (target values) based on some error criteria will be checked for future samples.
Consider that x = xi, i = 1, 2, . . . , n is the vector of landslide conditioning factors, y = yi, i = 1, 2 that indicates landslide and non-landslide classes. The MLP neural network function in the landslide modeling can be expressed as follows: where b is bias and f (x) is an unknown function that is optimized by the adjustable network weights during the training process for a given network architecture [40].

Particle Swarm Optimization (Pso)
The PSO is one of the evolutionary algorithms (meta-heuristic) developed by Kennedy et al. (1995). Design of the PSO is based on the nearest route to find food using the movement of biological organisms such as flocks and fish [42]. In recent years, it has been most popular in the optimization of nonlinear problems [34]. In this algorithm, a swarm of particles denotes a potential answer to the problem that searches for the best position based on the best solution. The fitness function can be used to assess the merit of the particles for calculating the fitness values. The particles in the PSO move along the feature space using a set of the following updated equations [42]: where x i and v i are the position and velocity of the i th particle in the feature space, respectively; w is the inertial weight coefficients; c 1 and c 2 are learning factors, and rand 1 and rand 2 are positive random numbers from 0 to 1. p best is the personal best position of particle i, and g best is the best among all of the particles. In this study, the PSO method is used to optimize the ANFIS and ANN modeling parameters to construct the PSOANFIS and PSOANN prediction models for landslide susceptibility assessment.

Rotation Forest
Rotation Forest (RF) is one of the meta ensemble algorithms that was first introduced by Rodriguez et al. [43] to enhance the power prediction of a weak individual classifier in comparing with using a weak individual classifier alone and also increasing the diversity of base classifiers [44]. In this approach, feature space of training dataset are divided into some subsets based on the Principal Component Analysis [45] for learning base classifiers. The Meta classifiers generally create higher prediction accuracy in comparison with single-based classifiers [46].
In this study, the RF as a Meta classifier in order to detect landslide occurrence locations has been applied. Consider x = x(x 1 , x 2 , . . . , x 12 ) is the vector of 11 landslide conditioning factors, y = (y 1 , y 2 ) is the vector of landslide and no-landslide occurrence class, and D indicates the training dataset. C 1 , C 2 , . . . , C L are the number of classifiers for learning, and φ is a set of landslide conditioning factors. In the first step, φ are divided into k training subsets in which 10/k landslide conditioning factors in each training subset are created. Let φ i,j be j-th (j = 1, 2, . . . , k) subset of landslide conditioning factors C i and P i,j is landslide conditioning factor in φ i,j from D. According to the bootstrap algorithm, P' i,j with 75% sized randomly selected from P i,j .
In the next step, to calculate the coefficients of z i,1 , the P' i,j will be transformed with the size z' i,1 equals to T × 1. In fact, the RF is constituted using base classifier and the rotation matrix (Z a i ) by transformation technique (rearranging the matrix of Z i ), which is observed as follows [40]: Then, the columns of Z i are rearranged using the original feature set. In the next step, the θZ M i value will be transformed on a training dataset using classifier D i . Consequently, all classifiers after training with parallel manner will be summed [43].
The classification phase, using the testing dataset of x, will be evaluated when d ij θZ a i is the probability value determined by classifier D i based on the hypothesis that x belongs to class y. Then, the average combination method of a class is obtained as follows: Lastly, the largest confidence of the class will be assigned by θ.

Best First Decision Trees
The main idea of the expansion of decision tree nodes of Best First Decision Trees (BFDT) algorithm was introduced by Friedman et al. (2000). In this algorithm, the best node expanded in depth-first order as compared to C4.5 and CART [47]. The best node among all nodes to split is a node that leads to maximum reduction of impurity such as Gini index or information gain. The BFDT creates a binary tree in which each internal node is assigned two outgoing edges.
The growth of the tree will continue until the internal nodes reach maximum homogeneity. This means that a terminal node does not split further when it will be pureed so that all cases have the same value for the dependent variable (landslide and non-landslide). To assess the impurity in this algorithm, information gain and Gini index measures based on the entropy are used. In this study, Information Gain (IG) is used for assessing the impurity. Moreover, the entropy specifies the purity of any sample set. Consider D as the training dataset, A as a conditioning factor such as slope angle, and "i" a class label (landslide and non-landslide). The following equation can obtain the IG values of factors (e.g., slope angle): where p i is the proportion of D belonging to class i. The IG leads to splitting the training dataset by a reduction in entropy using the following equation: where values (A) is the set of all possible values for slope angle factor (A) and D i is the subset of D for which attribute A has value i. The tree in the BFDT algorithm will be stopped when all instances belonging to a landslide or non-landslide as a target feature or the best value of IG value are less than zero [48].

Validation Assessment
In this study, mean square error (MSE), root mean square error (RMSE), and area under the receiver operative characteristic (AUC) curve were used to validate the performance of the developed models. The MSE estimates the generalization error of the model, whereas the RMSE measures the forecasting errors of the models [49]. The MSE and RMSE can be expressed as follows: where X obs denotes the observed values in the training dataset or validation dataset, X est represents the estimated (output) values from the landslide susceptibility models, and n is the total number of samples in the training or validation datasets [50]. The result of modeling is effective when the values of RMSE and MSE are small [51]. In addition, another standard and applicable technique that has been utilized in almost all landslide susceptibility assessments is the Area under the Receiver Operative Characteristic (AUC) Curve [52]. Generally, the ROC curve is plotted based on the sensitivity as the y-axis and the 1-specificity as the x-axis [53]. The AUC pinpoints the performance of a model so that a higher AUC indicates better model performance [52]. It has a range between 0.5 (random model) and 1 (ideal model) [54,55]. The AUC can be formulated as follows: where TP and TN are the number of correctly and incorrectly classified as landslides, respectively; R is the total number of landslides and non-landslides [53].

Methodology Adopted for Developing Landslide Susceptibility Maps
The methodology of the present study includes four main steps: (1) generation of training and testing dataset, (2) building of the hybrid models, (3) validation of the hybrid models, and (4) development of landslide susceptibility map ( Figure 6). A brief description of methodology is below: testing dataset, (2) building of the hybrid models, (3) validation of the hybrid models, and (4) development of landslide susceptibility map ( Figure 6). A brief description of methodology is below: Step 1: Training and testing datasets were generated using landslide data of the study area. A training dataset was generated with 70% of landslide inventory (117 locations), whereas a testing dataset was constructed with the 30% remaining landslide inventory (50 locations). In the datasets, non-landslide locations were also taken into account as landslide prediction is considered a binary classification problem. Non-landslide locations were identified based on the study of the area. Out of these, 117 non-landslide locations were used for the training dataset while 50 non-landslide locations were used for testing datasets. For modeling, landslide instances were assigned "1" whereas nonlandslide instances were assigned "0".
Step 2: Using the training dataset, the hybrid models (RFBDFT, PSOANFIS, and PSOANN) were constructed for spatial prediction of landslides at the study area. More specifically, the RFBDFT was constructed by combining the RF ensemble and the BDFT classifier. In the RFBDFT, the RF was trained with 25 iterations and the BDFT was trained with 10 folds in internal cross-validation. The PSOANFIS was constructed by combining the PSO optimization and the ANFIS classifier, while the PSOANN was constructed by combining the PSO and the ANN classifier. In the PSOANFIS, the model was trained with 1500 iterations, 0.99 inertia weight, and 25 populations. In the PSOANN, the number of hidden layers was set to nine.
Step 3: The hybrid models was validated using several criteria, namely MEA, RMSE, and AUC. In this step, the models were validated in goodness-of-fit using the training dataset and predictive capability using the testing dataset.
Step 4: Mapping landslide susceptibility started with generation of Landslide Susceptibility Index (LSI) values for each pixel of the study area using the hybrid models. Thereafter, the LSIs were assigned to each pixel in the GIS environment and were reclassified using the natural break classification method [19].  Step 1: Training and testing datasets were generated using landslide data of the study area. A training dataset was generated with 70% of landslide inventory (117 locations), whereas a testing dataset was constructed with the 30% remaining landslide inventory (50 locations). In the datasets, non-landslide locations were also taken into account as landslide prediction is considered a binary classification problem. Non-landslide locations were identified based on the study of the area. Out of these, 117 non-landslide locations were used for the training dataset while 50 non-landslide locations were used for testing datasets. For modeling, landslide instances were assigned "1" whereas non-landslide instances were assigned "0".

Results and Discussion
Step 2: Using the training dataset, the hybrid models (RFBDFT, PSOANFIS, and PSOANN) were constructed for spatial prediction of landslides at the study area. More specifically, the RFBDFT was constructed by combining the RF ensemble and the BDFT classifier. In the RFBDFT, the RF was trained with 25 iterations and the BDFT was trained with 10 folds in internal cross-validation. The PSOANFIS was constructed by combining the PSO optimization and the ANFIS classifier, while the PSOANN was constructed by combining the PSO and the ANN classifier. In the PSOANFIS, the model was trained with 1500 iterations, 0.99 inertia weight, and 25 populations. In the PSOANN, the number of hidden layers was set to nine.
Step 3: The hybrid models was validated using several criteria, namely MEA, RMSE, and AUC. In this step, the models were validated in goodness-of-fit using the training dataset and predictive capability using the testing dataset.
Step 4: Mapping landslide susceptibility started with generation of Landslide Susceptibility Index (LSI) values for each pixel of the study area using the hybrid models. Thereafter, the LSIs were assigned to each pixel in the GIS environment and were reclassified using the natural break classification method [19].

Results and Discussion
Goodness-of-fit and prediction accuracy of the RFBFDT model are given in Figure 7 Figure 8 shows the results of goodness of fit and prediction accuracy using training and validation datasets.
The results indicate that using the training dataset the values of RMSE, RMSE, error mean, and error SD are 0.14, 0.374, 0.005, and 0.375, respectively. These values using the validation dataset are 0.225, 0.474, −0.0298, and 0.476, respectively. Moreover, the results expressed that in the PSOANN model, the values of RMSE, RMSE, error mean, and error SD using training dataset are 0.168, 0.41, −0.0005, and 0.411, respectively. In the validation process, the results stated that the values of 0.312, 0.558, 0.0003, and 0.561 acquired for RMSE, RMSE, error mean, and error SD, respectively ( Figure 9).
Landslide hybrid models were then evaluated through the ROC curve analysis. The results are given in Figure 10. The results of performance of the ensemble models exhibited that the RFBFDT model acquired the highest of AUC value (0.891), followed by the PSOANFIS model (0.890) and the PSOANN model (0.850). Additionally, the validation dataset confirmed that the RFBFDT ensemble models had the highest prediction accuracy, with an AUC value of 0.826. This is followed by the PSOANFIS model (AUC = 0.760) and the PSOANN model (AUC = 0.720). The results of AUC are completely in agreement with the results of model validation using MSE, RMSE, error mean, and error SD values in the training and validation phases. Overall, the RFBFDT ensemble model is the best model for predicting landslide locations compared to the other models (PSOANFIS and PSOANN).
Landslide susceptibility is assessed based on the landslide susceptibility index (LSI), which was generated from the model construction process. Thereafter, the obtained LSI was transferred to all pixels of the study area and they were classified for determining the susceptibility levels of landslides in the study area. Landslide susceptibility maps of the study area were finally constructed with five susceptibility classes including very low, low, moderate, high, and very high ( Figure 11). The distribution of these susceptibility classes on the maps was calculated and shown in Figure 12. A map generated by the RFBDFT model indicated that 48% of the study area falls into the low class, 42% in the moderate class, and 11% in the high class, whereas, in the map generated by the PSOANFIS model, 25% of the study area is covered by the low class, 44% by the moderate class, and 31% by the high class. A further map generated by the PSOANN model indicated that 25% of the study area falls in the low class, 63% in the moderate class, and 13% in the high class.

Conclusions
In this study, three novel hybrid machine learning approaches, namely PSOANFIS, PSOANN, and RFBFDT, were applied for the development of landslide susceptibility maps. A spatial database of 167 past landslides of Van Chan district, Yen Bai province, Vietnam was used to generate the datasets for modeling, considering 12 landslide conditioning factors. Validation of the models was done using the AUC, MSE, and RMSE methods.

Conclusions
In this study, three novel hybrid machine learning approaches, namely PSOANFIS, PSOANN, and RFBFDT, were applied for the development of landslide susceptibility maps. A spatial database of 167 past landslides of Van Chan district, Yen Bai province, Vietnam was used to generate the datasets for modeling, considering 12 landslide conditioning factors. Validation of the models was done using the AUC, MSE, and RMSE methods. The results show that the RFBFDT (AUC = 0.826, MSE = 0.189, RMSE = 0.434) is the best model in comparison to other hybrid models, namely PSOANFIS (AUC = 0.76, MSE = 0.225, RMSE = 0.474) and PSOANN (AUC = 0.72, MSE = 0.312, RMSE = 0.558). Thus, it can be reasonably concluded that the RFBFDT model can be used for better landslide susceptibility assessment, land use planning, and hazard management in landslide-prone areas.

Conclusions
In this study, three novel hybrid machine learning approaches, namely PSOANFIS, PSOANN, and RFBFDT, were applied for the development of landslide susceptibility maps. A spatial database of 167 past landslides of Van Chan district, Yen Bai province, Vietnam was used to generate the datasets for modeling, considering 12 landslide conditioning factors. Validation of the models was done using the AUC, MSE, and RMSE methods. The results show that the RFBFDT (AUC = 0.826, MSE = 0.189, RMSE = 0.434) is the best model in comparison to other hybrid models, namely PSOANFIS (AUC = 0.76, MSE = 0.225, RMSE = 0.474) and PSOANN (AUC = 0.72, MSE = 0.312, RMSE = 0.558). Thus, it can be reasonably concluded that the RFBFDT model can be used for better landslide susceptibility assessment, land use planning, and hazard management in landslide-prone areas. However, as these proposed models were applied in only one of the areas of Vietnam, their applicability must be tested in other hilly areas of Vietnam as well as other parts of the world. Moreover, another limitation of this research is that we considered a fixed combination of conditioning factors for modeling; therefore, it would be better to test the effectiveness of the models with different combinations of conditioning factors to explore the possibility of further improvement of the models.