Hybrid Computational Intelligence Methods for Landslide Susceptibility Mapping

In this study, a hybrid integration of MultiBoosting with two artificial intelligence methods (the radial basis function network (RBFN) and credal decision tree (CDT) models) and geographic information systems (GIS) was used to produce landslide susceptibility maps for evaluating landslide susceptibility in Nanchuan County, China. First, the landslide inventory map was generated based on previous research results combined with GIS and aerial photos. In total, 298 landslides were identified, and the resulting dataset was randomly divided into a training dataset (70%, 209 landslides) and a validation dataset (30%, 89 landslides) while ensuring the randomness, fairness, and symmetry of the data split. Sixteen landslide conditioning factors (altitude, profile curvature, plan curvature, slope aspect, slope angle, stream power index (SPI), topographical wetness index (TWI), sediment transport index (STI), distance to rivers, distance to roads, distance to faults, rainfall, NDVI, soil, land use, and lithology) were identified in the study area. Subsequently, the CDT, RBFN, and their ensembles with MultiBoosting (MCDT and MRBFN) were used in ArcGIS to generate the landslide susceptibility maps. The performances of the four landslide susceptibility maps were compared and verified based on the area under the curve (AUC). Finally, the AUC results show that the landslide susceptibility map generated by the MCDT model had the best performance.


Introduction
Landslides often do not exist in isolation. In the same area, landslides may occur at the same time (known as connected landslide groups) with a spatiotemporal symmetry. Landslides are among the most dangerous natural disasters and often occur after heavy rains or earthquakes [1]. According to statistics, about 1000 people die from landslides worldwide every year, and landslides cause annual property losses of up to four billion dollars [2]. Landslides may also be caused by various other factors, such as geological features and vegetation coverage [3][4][5][6].
The research method of this study can be divided into the following five steps (Figure 2): (1) data collection; (2) generating the landslide inventory map and selecting factors; (3) establishing the landslide model; (4) validation and comparison of models; and (5) generating the landslide susceptibility map and determining the best model for the study area.
The landslide inventory, as the basis of landslide susceptibility mapping, includes the geographic locations of landslide occurrences [47]. The landslide inventory map also plays an important role in studying the relationships between landslide conditioning factors and landslides [29]. Studying the geological and climatic conditions related to landslides helps to predict the locations of future landslides in the area. Figure 1 shows the landslide inventory map of Nanchuan County based on a field geological survey and interpretation of aerial photos. Ultimately, 298 landslides were identified in the landslide inventory map. Following previous studies, the 298 landslides were randomly divided into two parts, 70% (209 landslides) for the training dataset and 30% (89 landslides) for the validation dataset. In addition, the same number of non-landslide points were randomly selected in the study area and converted into pixels.
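The 70/30 random split described above can be illustrated with a short sketch. The function name `split_inventory`, the integer landslide IDs, and the fixed seed are assumptions made for illustration only; the study does not state how the random split was implemented.

```python
import random

def split_inventory(landslide_ids, train_fraction=0.7, seed=42):
    """Randomly split landslide IDs into training and validation subsets."""
    rng = random.Random(seed)              # fixed seed (assumed) so the split is repeatable
    shuffled = list(landslide_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

landslide_ids = list(range(1, 299))        # 298 hypothetical landslide IDs
train_ids, valid_ids = split_inventory(landslide_ids)
print(len(train_ids), len(valid_ids))      # 209 89
```

The same routine could be applied to the randomly selected non-landslide points so that both classes are split in the same proportion.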
As the second step of generating the landslide susceptibility map, the landslide conditioning factors were determined. A digital elevation model (DEM) with a resolution of 20 m by 20 m was used to extract the profile curvature, plan curvature, slope angle, slope aspect, stream power index (SPI), sediment transport index (STI), and topographic wetness index (TWI) factor maps. In this paper, a total of 16 landslide conditioning factors were determined according to relevant geological characteristics and environmental conditions for landslide susceptibility evaluation. Among them, 13 are continuous factors: Altitude, profile curvature, plan curvature, slope aspect, slope angle, SPI, TWI, STI, distance to rivers, distance to roads, distance to faults, rainfall, and normalized difference vegetation index (NDVI). The other three are categorical factors: Soil, land use, and lithology. Because the causes of landslides are both complicated and difficult to determine, existing studies have not agreed on a unified set of landslide conditioning factors [48]. However, previous studies have identified relationships between landslide occurrence and conditions such as topographic and geological characteristics, climatic conditions, and human activities [49]. The 16 factors were therefore selected based on previous landslide susceptibility studies and the related geological characteristics and environmental conditions in the study area. Finally, the 16 landslide conditioning factors were converted to the same spatial resolution (20 m by 20 m).
At present, there is no accepted standard for the classification of landslide conditioning factors [50]. In this paper, the continuous factors were classified based on the landslide distribution characteristics in the study area and with reference to previous studies on landslide susceptibility analysis [51]. Following the literature, all continuous factors were divided into classes in the ArcGIS software [52]. Table 1 shows the classification list of factors in this study.
Altitude is one of the important parameters in the study of landslide susceptibility and plays an important role in the evaluation. The study area has an altitude range of 312-2228 m, which was divided into nine categories at intervals of 200 m [53]. Profile curvature is the curvature of the surface in the vertical plane parallel to the slope direction. It influences the erosion and deposition of the slope by controlling the velocity of down-slope flows [54,55]. In this paper, the profile curvature ranges from −27.51 to 21.58 and is divided into three categories [56]. The profile curvature map was generated in the ArcGIS software (Figure 3b). Plan curvature influences the slope stability by controlling the dispersion and convergence of down-slope flows [57]. The plan curvature map was also generated in the ArcGIS software. The plan curvature ranges from −23.95 to 19.49 and was divided into three categories [56]. Rainfall, solar exposure, and dry wind all affect the occurrence of landslides, and all of these influences are related to slope aspect [58,59]. Therefore, slope aspect is also an important parameter in landslide susceptibility evaluation [60,61]. In this paper, the slope aspect was divided into nine categories [53]. The slope angle directly affects the occurrence of landslides; therefore, it cannot be ignored in landslide susceptibility evaluation [62][63][64][65] and is frequently used in landslide susceptibility mapping [66,67]. The slope angle map of the study area was generated based on a DEM (Figure 3e), in which slope angles were divided into eight categories [53]. The SPI reflects the erosion capacity and sediment content of water flow. In this study, the SPI values were divided into five categories [68]. The TWI reflects the water content in the soil, which is of great significance in the study of rock and soil stability.
In this study, the TWI (0.24-12.95) was divided into five categories and mapped accordingly [20]. Figure 3h is the STI map, in which STI values were divided into five categories [69]: 0-5, 5-10, 10-15, 15-20, and >20.
Rivers may influence the stability of a slope by eroding the bottom of the slope and by raising the water level through sediment accumulation at the bottom [64]. The distance to rivers was divided into five categories with intervals of 200 m [53]; the longest distance to a river in the study area was 3153.92 m. Building roads around slopes reduces the load on both the foot of the slope and the terrain; therefore, the distance to roads is a parameter that must be considered in landslide susceptibility mapping [64]. The distance to roads in the study area ranges from 0 to 9005.8 m and was also divided into five categories with an interval of 200 m [70]. Distance to faults is an important parameter in landslide susceptibility analysis. Faults are fractures of the earth's crust, which reduce rock strength and can ultimately lead to landslides [71]. The distance to faults was divided into five categories: 0-1000 m; 1000-2000 m; 2000-3000 m; 3000-4000 m; and >4000 m [70]. These categories were mapped in the ArcGIS software. Rainfall, as one of the most important factors in inducing landslides, is also one of the most important factors in the evaluation of landslide susceptibility [72,73]. Rainfall in the study area was divided into five categories [74] in the generated rainfall map (Figure 3l). According to previous work, the NDVI value is directly proportional to the vegetation coverage area [75,76]. The NDVI values in the study area range from −0.05 to 0.56. The NDVI map (Figure 3m) was generated and divided into five categories in the ArcGIS software [76]. Different soil types have different compositions and textures. In the study area, the soil map is divided into seven categories by soil type: Haplic Alisol, Cumulic Anthrosol, Dystric Cambisol, Rendzic Leptosol, Haplic Luvisol, Chromic Luvisol, and Dystric Regosol.
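As a minimal sketch of how a continuous factor can be reclassified into the classes listed in the text, the snippet below bins values using the class boundaries given for distance to faults and STI. The helper name `classify` and the choice to place a value lying exactly on a boundary in the higher class are assumptions; the study performed this step in ArcGIS and does not state its boundary rule.

```python
from bisect import bisect_right

# Class boundaries taken from the text; the last class is open-ended.
FAULT_DISTANCE_BREAKS_M = [1000, 2000, 3000, 4000]   # 0-1000, ..., >4000 m
STI_BREAKS = [5, 10, 15, 20]                         # 0-5, ..., >20

def classify(value, breaks):
    """Return the 1-based class index of a continuous value.

    A value equal to a boundary falls into the higher class (assumed rule).
    """
    return bisect_right(breaks, value) + 1

print(classify(250.0, FAULT_DISTANCE_BREAKS_M))   # class 1 (0-1000 m)
print(classify(4500.0, FAULT_DISTANCE_BREAKS_M))  # class 5 (>4000 m)
print(classify(12.3, STI_BREAKS))                 # class 3 (10-15)
```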
Land use, as a landslide conditioning factor, can affect the stability of the slope. For example, the roots of vegetation can maintain the stability of the slope, thereby reducing the incidence of landslides [77]. Table 1 shows the land-use classification of the study area. Rock strength and permeability differ between lithologies; therefore, the lithology influences landslide occurrence [78]. The lithological map of the study area is classified into seven categories: Type 1 (Jurassic: Mudstone, sandstone, siltstone, and limestone), type 2 (Triassic: Limestone, dolomite, sandstone, and siltstone), type 3 (Permian: Limestone and shale with intercalation of siltstone), type 4 (Silurian: Shale, siltstone, and interbedded limestone), type 5 (Ordovician: Grayish-black charcoal shale with a siliceous base), type 6 (Ordovician: Limestone and carbonate), and type 7 (Cambrian: Dolomite and limestone).
Table 1. Landslide conditioning factors and their classes.

Credal Decision Tree
Credal decision trees are used to solve classification problems using credal sets [79]. Compared with the J48 algorithm, which uses information gain as the split criterion to determine the attribute at each branch node, a CDT accounts for imprecise probabilities and uncertainty in the original split criterion [79]. Missing values are handled in the same way as in C4.5 [80]. The measure of total uncertainty (TU) consists of two parts and can be expressed as [41,79]:
$$TU(\varphi) = IG(\varphi) + GG(\varphi) \tag{1}$$
where $\varphi$ represents a credal set on a frame $X$, $TU$ is the value of the total uncertainty, $IG$ is a general function of non-specificity on the corresponding credal set, and $GG$ is a general function of the randomness of credal sets. The non-specificity function can be expressed as [79]:
$$IG(\varphi) = \sum_{A \subseteq X} m_{\varphi}(A) \ln(|A|) \tag{2}$$
where $m_{\varphi}(A)$ is the mass assigned to the focal element $A$, and $A$ ranges over the power set of $X$.
The measure of the randomness of a general credal set can be expressed as [79]:
$$GG(\varphi) = \max_{p \in \varphi} \left\{ -\sum_{x \in X} p_x \ln p_x \right\} \tag{3}$$
where the maximum is taken over all probability distributions $p$ in $\varphi$, and $\varphi$ represents a general credal set. This function not only satisfies all the basic properties required in Dempster-Shafer theory but is also a good measure of randomness for credal sets [79,81].
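To make the non-specificity term concrete, the sketch below evaluates it for a small basic probability assignment, assuming the usual Dempster-Shafer form in which each focal element A contributes m(A) ln|A|. The function name `non_specificity` and the example masses are hypothetical and are not taken from the study; the randomness term GG is not computed here because it requires a maximization over the credal set.

```python
import math

def non_specificity(mass):
    """Sum of m(A) * ln|A| over the focal elements A (assumed form of IG)."""
    total = sum(mass.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("masses must sum to 1")
    return sum(m * math.log(len(A)) for A, m in mass.items() if m > 0)

# Hypothetical assignment: 0.6 committed to "landslide", 0.4 left undecided.
mass = {
    frozenset({"landslide"}): 0.6,
    frozenset({"landslide", "non-landslide"}): 0.4,
}
ig = non_specificity(mass)
print(round(ig, 4))  # 0.4 * ln(2) = 0.2773
```

A singleton focal element contributes nothing (ln 1 = 0), so the value grows only with the mass left on larger, less specific sets.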

Radial Basis Function Network
Radial basis function networks are an effective method for solving nonlinear problems using a special radial function [82]. These networks are now widely used in various fields, such as image processing and analysis [83,84] and credit estimation [85]. Figure 4 shows the structure of the RBFN used in this paper, which consists of an input layer with 16 nodes, a hidden layer, and an output layer. The mapping from the input layer to the hidden layer is non-linear, whereas the mapping from the hidden layer to the output layer is linear [86]. The output layer can be expressed as in Equation (4) [86]:
$$y_h(x) = \sum_{i=1}^{n} w_{ih}\, \varphi_i(x) + w_{0h} \tag{4}$$
where $n$ is the number of hidden nodes, $c_i$ represents the center of the $i$th hidden node, $x$ is the input vector, $w_{ih}$ is the weight connecting the $i$th hidden node to the $h$th output node, $w_{0h}$ is the bias of the $h$th output node, and $\varphi_i$ is the radial basis function centered on $c_i$. In general, the Gaussian function is used as the basis function in an RBFN. It can be expressed as in Equation (5):
$$\varphi_i(x) = \exp\left(-\frac{\lVert x - c_i \rVert^{2}}{2\sigma_i^{2}}\right) \tag{5}$$
where $\sigma_i$ is the width of the $i$th basis function.
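The forward pass of such a network can be sketched in a few lines, assuming the common Gaussian basis exp(−‖x − c‖² / (2σ²)) and a single output node. All names (`gaussian_rbf`, `rbfn_output`) and the two-dimensional toy centers, widths, and weights are illustrative assumptions, not the fitted model from the study, which used 16 inputs.

```python
import math

def gaussian_rbf(x, center, sigma):
    """Gaussian basis function centered on `center` with width `sigma`."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def rbfn_output(x, centers, sigmas, weights, bias):
    """Linear combination of the hidden-node activations plus a bias."""
    hidden = [gaussian_rbf(x, c, s) for c, s in zip(centers, sigmas)]
    return sum(w * h for w, h in zip(weights, hidden)) + bias

# Toy network with two hidden nodes and two inputs (hypothetical values).
centers = [[0.0, 0.0], [1.0, 1.0]]
sigmas = [0.5, 0.5]
weights = [0.8, -0.3]
bias = 0.1
y = rbfn_output([0.0, 0.0], centers, sigmas, weights, bias)
print(round(y, 4))  # 0.8945
```

The input here sits exactly on the first center, so that node fires at 1.0 while the second node contributes only exp(−4), which shows the local response that makes the input-to-hidden mapping non-linear.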

MultiBoosting
MultiBoost combines AdaBoost with wagging and is an extension of the AdaBoost technique for forming decision committees [87]. Decision committee learning can reduce the misclassification rate of learned classifiers [88]. It has been found that MultiBoost can achieve most of AdaBoost's superior bias reduction together with most of bagging's superior variance reduction [88]. In addition to inheriting the bias and variance reduction of its constituent committee learning algorithms, MultiBoost has a potential computational advantage: Its sub-committees can be learned in parallel [88].
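The working principle of the MultiBoost method can be divided into three steps: First, a subset is randomly selected from the training dataset to generate the initial base classifier; then, the instance weights are adjusted according to the accuracy of the base classifier; and finally, a new subset is selected from the weighted instances to train a new classifier [89]. For the training dataset $S = \{(x_i, y_i)\}$, where $x_i \in R$ and $y_i \in \{\text{landslide}, \text{non-landslide}\}$, the final classifier can be obtained from the following equations [46]:
$$\varepsilon_t = \frac{\sum_{x_j \in S' :\, C_t(x_j) \neq y_j} \mathrm{weight}(x_j)}{m}$$
$$C^{*}(x) = \arg\max_{y \in Y} \sum_{t :\, C_t(x) = y} \log\frac{1}{\beta_t}$$
where $S'$ is $S$ with instance weights assigned to be 1, $C_t = \mathrm{BaseLearn}(S')$ is the base classifier learned at iteration $t$, $\varepsilon_t$ is the weighted error on the training dataset, $m$ is the number of training instances, $Y$ is the set of class labels, and $\beta_t = \varepsilon_t / (1 - \varepsilon_t)$ determines the voting weight of $C_t$.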
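The procedure can be sketched as follows, loosely following the MultiBoost scheme as it is commonly described: AdaBoost-style reweighting inside each sub-committee, with a wagging-style random reset of the weights between sub-committees. Everything here is an illustrative assumption rather than the implementation used in the study: the one-dimensional threshold learner `train_stump`, the exponentially distributed reset in `random_weights`, the +1/−1 label coding, and the simplification of skipping (rather than retrying) an iteration whose error exceeds 0.5.

```python
import math
import random

def train_stump(xs, ys, w):
    """Hypothetical base learner: best weighted threshold rule on 1-D inputs."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (sign if x >= thr else -sign) != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    _, best_thr, best_sign = best
    return lambda x: best_sign if x >= best_thr else -best_sign

def random_weights(n, rng):
    """Wagging-style reset: random positive weights rescaled to sum to n."""
    raw = [-math.log(1.0 - rng.random()) for _ in range(n)]
    total = sum(raw)
    return [r * n / total for r in raw]

def multiboost(xs, ys, n_iter=10, n_sub=3, seed=1):
    rng = random.Random(seed)
    n = len(xs)
    # Iterations at which a sub-committee ends and the weights are reset.
    ends = {math.ceil((k + 1) * n_iter / n_sub) for k in range(n_sub)}
    w = [1.0] * n                      # all instance weights start at 1
    committee = []                     # pairs of (classifier, voting weight)
    for t in range(1, n_iter + 1):
        clf = train_stump(xs, ys, w)
        wrong = [clf(x) != y for x, y in zip(xs, ys)]
        eps = sum(wi for wi, bad in zip(w, wrong) if bad) / n
        if eps > 0.5:                  # too weak: discard and reset weights
            w = random_weights(n, rng)
            continue
        if eps == 0:                   # perfect classifier: large fixed vote
            beta = 1e-10
            w = random_weights(n, rng)
        else:                          # AdaBoost-style reweighting
            beta = eps / (1.0 - eps)
            w = [max(wi / (2 * eps if bad else 2 * (1 - eps)), 1e-8)
                 for wi, bad in zip(w, wrong)]
        committee.append((clf, math.log(1.0 / beta)))
        if t in ends:                  # start the next sub-committee
            w = random_weights(n, rng)
    return committee

def predict(committee, x):
    """Weighted vote of all committee members."""
    votes = {}
    for clf, weight in committee:
        label = clf(x)
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)

# Toy data: +1 = landslide, -1 = non-landslide (hypothetical values).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [-1, -1, -1, 1, -1, 1, 1, 1]
committee = multiboost(xs, ys)
print([predict(committee, x) for x in xs])
```

The defaults of 10 iterations, 3 sub-committees, and seed 1 mirror the MultiBoosting parameters reported later in the paper; the base learner in the study was the CDT or RBFN model rather than a threshold rule.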

Optimization of the Dataset
In this paper, the correlation attributes evaluation (CAE) method was used to study the influence of each landslide conditioning factor on landslide occurrence [90]. The calculated average merit (AM) values reflect the influence of the selected factors; a greater AM value indicates greater influence [43,91]. The AM values of the 16 conditioning factors evaluated in the study area were as follows: Altitude, 0.269; profile curvature, 0.017; plan curvature, 0.045; slope aspect, 0.133; slope angle, 0.366; SPI, 0.143; TWI, 0.080; STI, 0.190; distance to rivers, 0.141; distance to roads, 0.274; distance to faults, 0.077; rainfall, 0.153; NDVI, 0.025; soil, 0.241; land use, 0.282; and lithology, 0.221 (Figure 5).
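For a quick overview, the AM values reported above can be sorted to rank the factors. The snippet only reorders the published numbers; the variable names are arbitrary.

```python
# Average merit (AM) values as reported in the text.
am_values = {
    "altitude": 0.269, "profile curvature": 0.017, "plan curvature": 0.045,
    "slope aspect": 0.133, "slope angle": 0.366, "SPI": 0.143,
    "TWI": 0.08, "STI": 0.19, "distance to rivers": 0.141,
    "distance to roads": 0.274, "distance to faults": 0.077,
    "rainfall": 0.153, "NDVI": 0.025, "soil": 0.241,
    "land use": 0.282, "lithology": 0.221,
}

# Rank factors from most to least influential.
ranked = sorted(am_values.items(), key=lambda item: item[1], reverse=True)
for rank, (factor, am) in enumerate(ranked, start=1):
    print(f"{rank:2d}. {factor:<20s} {am:.3f}")
```

Sorted this way, slope angle, land use, and distance to roads lead the list, while profile curvature, NDVI, and plan curvature have the lowest values.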

Model Performances and Validation
The main purpose of landslide susceptibility evaluation is to identify locations that may be affected by future landslides [29]. Therefore, whichever integration method is used to generate the landslide susceptibility map, the map needs to be verified and evaluated [92]. In this paper, the area under the receiver operating characteristic (ROC) curve (AUC), a widely used measure, was used to quantitatively determine the predictive power of the two hybrid models and the two single models [93][94][95][96]. The ROC curve was plotted using the sensitivity and 1 − specificity of the model [97] (Figure 6). Tables 2 and 3 show three evaluation statistics that were used in addition to the AUC values: The standard error, 95% confidence interval (CI), and significance level (p-value). These statistics were used to verify and compare the performance of the models. In general, smaller standard errors, narrower CIs, and smaller p-values indicate better model performance [91]. The MCDT model not only produced the smallest standard errors, 95% CIs, and p-values in the training and validation datasets but also yielded the highest AUC value, indicating that the performance of the MCDT model was the best in this study.
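As an illustration of the AUC measure, the sketch below computes it from class labels and predicted susceptibility scores using the rank interpretation: the AUC equals the probability that a randomly chosen landslide pixel receives a higher score than a randomly chosen non-landslide pixel, with ties counted as one half. The function name `roc_auc` and the four toy scores are assumptions; the study does not state which software routine produced its AUC values.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5          # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

# Toy example: 1 = landslide, 0 = non-landslide (hypothetical scores).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.2]
print(roc_auc(labels, scores))  # 0.75
```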
The chi-square test was used to compare the performance of the four models (CDT, RBFN, MCDT, and MRBFN) in pairs. Table 4 shows the resulting significance levels. For five of the model pairs, the chi-square values were higher than the critical value of 3.841 and the corresponding significance levels were lower than the threshold of 0.05. The exception was the pair formed by the RBFN and MRBFN models. In particular, the significance level for the MCDT and RBFN models is notable. The comparison therefore shows statistically significant differences between all pairs of models except the RBFN and MRBFN models.
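The link between the critical value of 3.841 and the 0.05 threshold can be checked numerically. The sketch assumes one degree of freedom, which is the case for which 3.841 is the 5% critical value; the function name `chi2_p_value_1df` is hypothetical.

```python
import math

def chi2_p_value_1df(chi2_stat):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2_stat / 2.0))

p_at_critical = chi2_p_value_1df(3.841)
print(round(p_at_critical, 3))  # 0.05
```

A statistic above 3.841 therefore always corresponds to a p-value below 0.05, so the two criteria cannot disagree for a given pair of models.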

Generation of Landslide Susceptibility Maps
A distinguishing characteristic of the CDT is that it accounts for imprecise probabilities and applies uncertainty measures in the original split criterion [79]. It applies reduced-error pruning with backfitting and sorts the values of numeric attributes. Furthermore, it treats missing values in the same way as C4.5. The parameters used in the construction of the CDT model are: Svalue is 1.0, maxDepth is −1, minNum is 2.0, numFolds is 3, and seed is 1. The RBFN model was established based on the training data. The ten-fold cross-validation method used in the Weka software [98] not only reduces the variability of the model but also helps to avoid overfitting in the modeling process. The parameters used in the RBFN model are: Clustering seed is 1, maximum number of iterations is −1, the number of clusters is 2, minimum standard deviation is 0.1, and ridge is 1.0E-8. In the modeling process, the MultiBoosting algorithm uses training subsets to construct multiple base classifiers and then adjusts their weights to optimize the classification accuracy. The advantage of using MultiBoosting to build a hybrid model is that MultiBoost symmetrically combines the characteristics of boosting and wagging to prevent overfitting [89]. In addition, the accuracy of the classification results can be increased by reducing the variance and bias of the system. The parameters used by the MultiBoosting model are: numIterations is 10, numSubCmtys is 3, seed is 1, and weightThreshold is 100.
When the training and validation of the models were completed, the models were applied to the study area to calculate the landslide susceptibility indices (LSI). The LSI values were then converted to raster format to generate landslide susceptibility maps in the ArcGIS software. Figure 8a-d shows the landslide susceptibility maps constructed using the CDT, RBFN, MCDT, and MRBFN models, respectively. The corresponding LSI value ranges of the four landslide susceptibility maps are 0.039 to 0.949, 0.092 to 0.852, 0 to 1, and 0 to 1. Using the equal area method, each of the four landslide susceptibility maps was divided into five landslide susceptibility classes: Very low (40%), low (20%), moderate (20%), high (10%), and very high susceptibility (10%), for better comparison of the landslide susceptibility maps. The susceptibility zones of the landslides can be visually assessed from the four figures.
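The five-class division by area share can be sketched as a percentile-style classification of the LSI values. The function name `classify_equal_area`, the rounding of class boundaries to whole pixels, and the ten toy LSI values are assumptions; in the study this step was carried out in ArcGIS.

```python
# Class names and their share of the study area (percent), from the text.
CLASS_SHARES = [("very low", 40), ("low", 20), ("moderate", 20),
                ("high", 10), ("very high", 10)]

def classify_equal_area(lsi_values):
    """Assign each pixel a class so that classes cover the given area shares."""
    n = len(lsi_values)
    order = sorted(range(n), key=lambda i: lsi_values[i])  # lowest LSI first
    labels = [None] * n
    start = 0
    cumulative = 0
    for name, share in CLASS_SHARES:
        cumulative += share
        end = round(cumulative * n / 100)   # boundary rounded to whole pixels
        for i in order[start:end]:
            labels[i] = name
        start = end
    return labels

# Ten hypothetical pixels.
lsi = [0.05, 0.95, 0.40, 0.10, 0.70, 0.20, 0.85, 0.30, 0.60, 0.15]
classes = classify_equal_area(lsi)
print(list(zip(lsi, classes)))
```

Because the classes are defined by rank rather than by fixed LSI thresholds, maps whose LSI ranges differ (for example 0.039 to 0.949 versus 0 to 1) still end up with the same area share in each class, which is what makes the four maps directly comparable.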

Discussions
In this paper, two hybrid models of MultiBoosting-based artificial intelligence methods were established: The MCDT and MRBFN models. The main purpose was to evaluate landslide susceptibility in Nanchuan County, China. To build the landslide susceptibility models, 16 landslide conditioning factors were determined.
The results show that each conditioning factor contributed to the modeling of landslide susceptibility, but the extents of their contributions differed. Slope angle yielded the highest AM value, which demonstrates the importance of slope angle to landslide susceptibility. In addition, altitude is an important factor that also produced a high AM value in the landslide susceptibility evaluation. Soils with different compositions and structures affect landslide occurrences [18], and the AM values show that soil types play an important role in this study area. Road construction disturbs the natural topography and makes slopes unstable. Therefore, distance to roads has become an important factor affecting landslides. In this study, distance to roads (AM = 0.274) ranks relatively high among the landslide conditioning factors. Rainfall is also one of the main parameters in landslide susceptibility evaluation and is closely related to landslide occurrence. The rainy season in the study area lasts for four months: May, June, July, and August. The maximum daily rainfall in the area is 121.4 mm. Rainfall (AM = 0.153) is an essential parameter for landslide susceptibility assessment in the study area. Lithology and land use, both important parameters in landslide studies [61,99], also play important roles in this study area. Based on the AM values of plan curvature, TWI, NDVI, and profile curvature, it can be inferred that the occurrence of landslides is less affected by these four factors in the study area, as their AM values were all low (AM ≤ 0.08). However, these factors cannot be ignored because many scholars have studied their relationships with landslides.
Visual analysis of the four landslide susceptibility maps revealed the same very low susceptibility zone around the southwest valley. In contrast, the southeast of the study area is covered by areas of high and very high susceptibility. Figure 8 shows that the four landslide susceptibility maps have similar spatial distributions of moderate and high susceptibility zones. In this study, the target ratio of the hybrid models was smaller than that of the single models under visual analysis. This suggests that the hybrid models had higher predictive power and reliability than the two single models.
The performances of the two hybrid models and the two single models were verified and compared. The performance of the two hybrid models was significantly improved compared with the single models on both the training dataset and the validation dataset. Based on comparison of the ROC curves, SEs, and 95% CIs of the two hybrid models, the MCDT model performed better than the MRBFN model. The final results show that the MCDT model had the best performance in this study, with the minimum SE (0.015) and CI (0.87-0.93) and maximum AUC (0.90) on the training dataset, as well as the minimum SE (0.035) and CI (0.71-0.84) and maximum AUC (0.77) on the validation dataset. These four landslide susceptibility maps will help local government agencies and related organizations to prevent and manage landslide hazards. All models and methods used in this study can be used to evaluate landslide susceptibility in other areas with similar conditions.

Conclusions
The main purpose of this study was to apply MultiBoosting with two artificial intelligence methods (the RBFN and CDT models) to evaluate landslide susceptibility in Nanchuan County, China. A total of 16 landslide conditioning factors (altitude, profile curvature, plan curvature, slope aspect, slope angle, SPI, TWI, STI, distance to rivers, distance to roads, distance to faults, rainfall, NDVI, soil, land use, and lithology) were determined to affect the occurrence of landslides in the study area. The landslide susceptibility maps generated by the two hybrid models and those generated by the two single models were compared in pairs. ROC curves, statistical analysis methods, and chi-square tests were used to compare and evaluate the spatial prediction abilities of the hybrid models and single models.
The final results show that both the hybrid models and the single models contributed to the evaluation of landslide susceptibility in the study area, but their evaluation capabilities differed. The AUC values of the CDT, RBFN, MCDT, and MRBFN models were 0.75, 0.67, 0.77, and 0.73, respectively. These results indicate that each hybrid model had higher predictive power than its corresponding single model (MCDT versus CDT and MRBFN versus RBFN). Overall, the MCDT model had the best performance in the study area. Through this research, it can be concluded that a hybrid model can improve the predictive ability of a single model. The hybrid models used in this study can be employed to help prevent and control landslide disasters in similar areas.
The final results show that two hybrid computational intelligence methods (MCDT and MRBFN) can be successfully applied to landslide susceptibility assessment. The resulting landslide susceptibility maps can provide valuable information for local governments or organizations studying slope stability and can also be used as a reference for infrastructure planning, engineering design, and disaster reduction design. However, the best method for landslide susceptibility mapping in this study area still needs further study and discussion. It is therefore recommended that detailed land use planning and urban development in areas of high and very high susceptibility be carried out only after specific site surveys.
Author Contributions: G.W., X.L., W.C., H.S. and A.S. contributed equally to the work. X.L. and G.W. collected field data and conducted the landslide susceptibility mapping and analysis. X.L. and W.C. wrote and revised the manuscript. G.W., H.S. and A.S. provided critical comments in planning this paper and edited the manuscript. All the authors discussed the results and edited the manuscript. All authors have read and agreed to the published version of the manuscript.