Improved Shallow Landslide Susceptibility Prediction Based on Statistics and Ensemble Learning

Rainfall-induced landslides bring great damage to human life in mountain areas. Landslide susceptibility assessment (LSA) as an essential step toward landslide prevention has attacked a considerate focus for years. However, defining a reliable or accurate susceptibility model remains a challenge although various methods have been applied. The main purpose of this paper is to explore a comprehensive model with high reliability, accuracy, and intelligibility in LSA by combing statistical methods and ensemble learning techniques. Miyun country in Beijing is selected as the study area. Firstly, the dataset containing 370 landslide locations inventories and 13 conditioning factors were collected and non-landslide samples were prepared by clustering analysis. Secondly, random forest (RF), gradient boosting decision tree (GBDT), and adaptive boosting decision tree (Ada-DT) were selected as base learners for the Stacking ensemble method, and these methods were evaluated using measures like area under the curve (AUC). Finally, the Gini index and frequent ratio (FR) were combined to analyze the major conditioning factors. The results indicated that the performance of the Stacking method was enhanced with an AUC value of 0.944 while the basic classifiers also performed well with 0.906, 0.910, and 0.917 for RF, GBDT, and Ada-DT, respectively. Regions with a distance to a stream less than 2000 m, a distance to a road less than 3000 m, and elevation less than 600 m were susceptible to the landslide hazard. The conclusion demonstrates that the performance of LSA desires enhancement and the reliability and intelligibility of a model can be improved by combining binary and multivariate statistical methods.


Introduction
Landslides are a common natural phenomenon and may cause unpredictable damage to human beings and property worldwide, especially in China where geohazards are enormously occurring and widely distributed [1]. Generally, damages can be decreased or mitigated by predicting the area prone to landslides [2,3]. Therefore, landslide susceptibility mapping (LSM), which predicts the spatial distribution of the likelihood of a landslide occurring, is significant and worthwhile for the reduction of hazards.
How to improve the quality of a model is always the focus of attention and discussed by researchers although related studies have been conducted on improving the predictive accuracy [4,5]. The effectiveness of LSM depends greatly on the models adopted [6], which can be roughly divided into knowledge-based and data-driven methods [7]. Conventional knowledge-based methods as a heuristic, are subjective and limited to be applied in smallscale areas. Conventional statistical methods, like logistic regression (LR) and principal component analysis, are popular due to their simplicity. Nevertheless, the mechanism of a areas (82.9%), cultivated land (8.3%), and reservoirs, roads, and villages (collectively 9.8%). The average annual precipitation is 663.1 mm (1981-2012) mainly concentrated in summer (76.4%) and it is a continental monsoon semi-arid climate.

Study Area
Miyun country located in Beijing, China, extends from longitudes of 116°39′ E to 117°30′ E and latitudes of 40°13′ N to 40°47′ N ( Figure 1). It has a population of more than 470,000 and occupies an area of about 2229.45 km 2 , which is composed of mountainous areas (82.9%), cultivated land (8.3%), and reservoirs, roads, and villages (collectively 9.8%). The average annual precipitation is 663.1 mm (1981-2012) mainly concentrated in summer (76.4%) and it is a continental monsoon semi-arid climate. The study area is part of the transition zone between the North China Plain and the Yanshan mountains, which leads to a series of large fold and fault structures. The faults are large in scale and widely distributed, mainly in the Northeast and north-south directions. The elevation ranges from 45 m to 1750 m above mean sea level with a slope angle between 10-45°. The strata are mainly composed of Archaean (Ar), Proterozoic (Pt), Mesozoic Jurassic (J), and Quaternary (Q). Three types of lithology are usually exposed in our investigation: gneiss from Middle Archean (ArXdgn), dolomites from Proterozoic (Pt22w), and siltstone from Mesozoic Jurassic (J2z). Magmatic intrusive rocks are widely distributed, accounting for nearly one-third of the total area and are exposed discontinuously in the northeast direction.
Road traffic is developed, and human activities are intensive in the study area, involving mining, reservoir, and power station projects. The disasters are various and frequent, mainly rain-induced landslides, which has affected the normal life of the local villagers.

Landslide Inventory
The statistically based models follow a crucial assumption: future landslides have more chances to occur again in the places with the conditions which cause the landslides once and present [18,19]). Accordingly, the landslide inventory map as the initial source is essential and was depicted according to related records (from 1970-2010), field surveys The study area is part of the transition zone between the North China Plain and the Yanshan mountains, which leads to a series of large fold and fault structures. The faults are large in scale and widely distributed, mainly in the Northeast and north-south directions. The elevation ranges from 45 m to 1750 m above mean sea level with a slope angle between 10-45 • . The strata are mainly composed of Archaean (Ar), Proterozoic (Pt), Mesozoic Jurassic (J), and Quaternary (Q). Three types of lithology are usually exposed in our investigation: gneiss from Middle Archean (ArXdgn), dolomites from Proterozoic (Pt22w), and siltstone from Mesozoic Jurassic (J2z). Magmatic intrusive rocks are widely distributed, accounting for nearly one-third of the total area and are exposed discontinuously in the northeast direction.
Road traffic is developed, and human activities are intensive in the study area, involving mining, reservoir, and power station projects. The disasters are various and frequent, mainly rain-induced landslides, which has affected the normal life of the local villagers.

Landslide Inventory
The statistically based models follow a crucial assumption: future landslides have more chances to occur again in the places with the conditions which cause the landslides once and present [18,19]). Accordingly, the landslide inventory map as the initial source is essential and was depicted according to related records (from 1970-2010), field surveys (from 2016-2017) (Figures 2 and 3), and Google Earth satellite images interpretation (May 2018) ( Figure 4). Ultimately, 620 landslide locations were identified, including soil slides (370), rockslides (6), and falls (244) [18]. It is accepted that different type of landslides has a different mechanism of occurrence. Soil slides were only considered in our work and were represented as points shown in Figure 1. Landslides occurred during or after heavy rainfall. Based on field investigation, remote sensing interpretation and relevant records, the scale of landslides in the study area is generally small, accounting for about 80%. The area of Sustainability 2022, 14, 6110 4 of 21 landslides ranges from 3.6 km 2 to 300 m 2 while the depth of most landslides is less than 4 m, belonging to shallow landslides.
(from 2016-2017) (Figures 2 and 3), and Google Earth satellite images interpretation 2018) ( Figure 4). Ultimately, 620 landslide locations were identified, including soil s (370), rockslides (6), and falls (244) [18]. It is accepted that different type of landslide a different mechanism of occurrence. Soil slides were only considered in our work were represented as points shown in Figure 1. Landslides occurred during or after h rainfall. Based on field investigation, remote sensing interpretation and relevant rec the scale of landslides in the study area is generally small, accounting for about 80% area of landslides ranges from 3.6 km 2 to 300 m 2 while the depth of most landslides i than 4 m, belonging to shallow landslides.  (from 2016-2017) (Figures 2 and 3), and Google Earth satellite images interpretation ( 2018) (Figure 4). Ultimately, 620 landslide locations were identified, including soil s (370), rockslides (6), and falls (244) [18]. It is accepted that different type of landslide a different mechanism of occurrence. Soil slides were only considered in our work were represented as points shown in Figure 1. Landslides occurred during or after h rainfall. Based on field investigation, remote sensing interpretation and relevant rec the scale of landslides in the study area is generally small, accounting for about 80% area of landslides ranges from 3.6 km 2 to 300 m 2 while the depth of most landslides i than 4 m, belonging to shallow landslides.

Choice of Mapping Units
The selection of mapping units should be determined in advance for LSM [20]. Another piece of literature discussed and compared the difference among mapping units, such as grid cells and slope units [21]. To better predict or identify the locations of landslides, slope units were applied in our work, which describes the topographic and geo-

Choice of Mapping Units
The selection of mapping units should be determined in advance for LSM [20]. Another piece of literature discussed and compared the difference among mapping units, such as grid cells and slope units [21]. To better predict or identify the locations of landslides, slope units were applied in our work, which describes the topographic and geomorphic conditions of landslides integrally. Finally, the area was divided into 8736 slope units using the hydrologic analysis tool in ArcGIS and indispensable artificial corrections according to remote sensing images. Detailed division steps and discussion can be referred to in other literature [22].

Conditioning Factors
Factors responsible for a landslide are various and there is no consensus on the choice of number and types of factors. It is commonly accepted that landslide is controlled by topographical, geological, and triggering factors. However, data availability, reliability, and accuracy should be given priority [23] and finally, 13 conditioning factors were selected. Detailed information on conditioning factors is shown in Table 1 and Figure 5a-m. A brief description of each controlling factor is given below.    Topographic-related factors were derived from the DEM (Digital Elevation Model) with a resolution of 30 m (http://www.gscloud.cn, accessed on 4 April 2022) originally sourced from the Shuttle Radar Topography Mission (SRTM) data. Elevation affects slope instability and precipitation properties and was frequently applied to LSM [24,25]. Landslides are likely to occur as slopes become steep and vice versa [26]. Maximum elevation difference (MED) reflects the potential energy of a slope and was calculated in ArcGIS [27]. Topographic wetness index (TWI) and Curvature reflect topographic relief [28]. TWI was reclassified into six classes ( Figure 5g) and the related algorithm is as follows: where, As is the specific catchment area, β is the slop angle. The plan curvature ( Figure 5g) and profile curvature ( Figure 5g) are both the most extensively used predisposing factors, which reflect the changes in terrain [29]. The slope aspect map was reclassified into eight classes according to the eight cardinal directions (Figure 5g).
Fault information (Figure 5i) was collected from a geological map of which the ratio was 1:50,000. Faults decrease the rock strength, which acts as potential weak planes in slopes. It was produced by the spatial distance analysis tool in ArcGIS. Similarly, the distance to roads ( Figure 5h) and distance to rivers ( Figure 5j) were both constructed based on the data from the Department of Natural Resources of Beijing (DNRB).
Shallow landslides are mainly caused by heavy or continuous rainfall [30]. Consequently, both the maximum 24 h rainfall ( Figure 5l) and maximum seven days of rainfall Topographic-related factors were derived from the DEM (Digital Elevation Model) with a resolution of 30 m (http://www.gscloud.cn, accessed on 4 April 2022) originally sourced from the Shuttle Radar Topography Mission (SRTM) data. Elevation affects slope instability and precipitation properties and was frequently applied to LSM [24,25]. Landslides are likely to occur as slopes become steep and vice versa [26]. Maximum elevation difference (MED) reflects the potential energy of a slope and was calculated in ArcGIS [27]. Topographic wetness index (TWI) and Curvature reflect topographic relief [28]. TWI was reclassified into six classes ( Figure 5g) and the related algorithm is as follows: where, A s is the specific catchment area, β is the slop angle. The plan curvature ( Figure 5g) and profile curvature ( Figure 5g) are both the most extensively used predisposing factors, which reflect the changes in terrain [29]. The slope aspect map was reclassified into eight classes according to the eight cardinal directions (Figure 5g). Fault information (Figure 5i) was collected from a geological map of which the ratio was 1:50,000. Faults decrease the rock strength, which acts as potential weak planes in slopes. It was produced by the spatial distance analysis tool in ArcGIS. Similarly, the distance to roads ( Figure 5h) and distance to rivers ( Figure 5j) were both constructed based on the data from the Department of Natural Resources of Beijing (DNRB).
Shallow landslides are mainly caused by heavy or continuous rainfall [30]. Consequently, both the maximum 24 h rainfall ( Figure 5l) and maximum seven days of rainfall ( Figure 5m) were selected based on the data (1981-2000) from Beijing Hydrology Manual (BHM) using the kriging interpolation coordinated with elevation in ArcGIS and 11 precipitation stations nearby were taken as reference. Rainfall was regarded as the natural trigger while the distance to the road was the human factor.
Factors were reclassified into four to eight classes and the mean value was regarded as the statistic value of slope units.

K-Means Clustering
K-means comes out to be a well know clustering method due to its efficiency and feasibility [31]. It is applied to divide n observations into k clusters, where each sample is allocated to the cluster based on the closest Euclidean distance, thus considered as the centroid of the cluster [32]. The procedure is then repeated until the change of the cluster seed from one stage to the next is negligible. The main equation involved in k-means is as follows: where u n+1 represents the sum of squares of distances from each point to the cluster center after the nth clustering; ε represents the precision value.

FCM Algorithm
The fuzzy c-means method is a soft clustering method developed by Dunn [33] and it is different from K-means (hard clustering). It has been widely used for statistical analysis of geological problems because of its flexibility and rationality [34]. Its core idea is to assign the objects to the corresponding clusters according to the degree of membership. The function of the FCM clustering is defined by the equation: where C i represents the cluster centers, C represents the number of centers, u ij represents the membership matrix; m represents the degree of fuzziness; J is the objective function and n is the number of objects in the database; d 2 is the Euclidean distance between the ith clustering center and the jth sample [35]. Two parameters as m and C are required to determine in advance. C is determined by the cluster validity function [36] and m is equal to 2 referred to in most applications in this study.
Machine learning methods need both positive and negative datasets. Three-hundredseventy positive samples (that is, landslide locations) were set as "1" and the same number of negative samples with the value of "0", which were selected based on the result of K-means and FCM in this study. As the purity of absent samples increases, it is more likely to reflect the characteristics of non-landslide areas. Accordingly, the critical value of the model results distinguishing landslides and non-landslides is 0.5.

Frequency Ratio
The equation for determining the FR value of a certain level of conditioning factor is defined below [16]: where i indicates the i-th class for each variable considered.
An FR i greater than 1 manifest that there exists a close relationship between landslide occurring and variable class, and if the values are less than 1 then a weak correlation is reflected. Continuous variables are required to be reclassified into classes before application, as Table 1 showed.

LR Model
LR establishes a non-linear probability function model, trying to find appropriate regression coefficients to express the correlation between the independent variable and the dependent variable [37]. The LR model is constructed as the equation below: where p is the probability of a landslide occurring; y is a linear combination function as Equation (7).
where b 0 is the constant value, and b 1 , b 2 , . . . , b n refer to each significant input variable (x 1 , x 2 ,..., x n) causing the landslide. The forward7 stepwise method was adopted to screen variables during LR modeling in SPSS software.

RF
RF belongs to a family of ensemble methods based on the decision tree and Bagging technique and it was first introduced by Breiman [17]. The bagging technique, which is also called bootstrap aggregation, is applied to selecting variables and samples randomly as the training data for modeling. Unused observations are applied to calculate the classification error. Consequently, there are two powerful ideas of RF: random feature selection and Bagging [38]. More details about RF can be found in Breiman [17]. RF was modeled in Python 3.7 using the scikit-learn package [39]. The number of trees (k) and the number of predictive variables (n) are required tuning before modeling [40].

GBDT
GBDT forms weak classifiers (DT) iteratively based on Gradient Boosting [41]. The parameter of the weak classifier defaults to the direction of the. The GBDT was applied in Python 3.7 using the GBDT class library of scikit-learn.

AdaBoost-DT
AdaBoost (known as adaptive boosting) is another boosting algorithm, which was invented by Freund and Schapire [42]. Unlike gradient boosting, AdaBoost assigns incorrectly classified samples with modified weights after each iteration. The final classifier is constructed by combining all weak classifiers. AdaBoost-DT is also applied in Python 3.7 using the AdaBoost class library of scikit-learn.

Gini Index
The split method tree-based classifiers adopt is the minimum principle of Gini and thus Gini index is applied to calculate the relative importance of conditioning factors. The relevant formula is as follows: where T expresses the training set, N is the number of categories, and P is the probability of a sample that is classified into the kth class.

Stacking
The stacking ensemble consists of base-classifiers and meta-classifier. Stacking takes the results predicted by the base-classifiers as the input attributes and the meta-classifier merges the different predictions into the final prediction. It is believed that stacking performs better than any basic classifiers [43]. Figure 6 shows the structure of the Stacking. The basic classifiers of Stacking were three ensemble learning machines that have been showing great performance in statistical analysis: RF, GBDT, and AdaBoost-DT. LR model was used as the combiner. To avoid over-fitting of the meta-classifier, the dataset is divided into two disjoint subsets: one for training base-classifiers and the other for testing. To train the meta-level classifier, 5-fold cross-validation is applied to construct the meta-levels for all combining methods. The basic classifiers of Stacking were three ensemble learning machines that have been showing great performance in statistical analysis: RF, GBDT, and AdaBoost-DT. LR model was used as the combiner. To avoid over-fitting of the meta-classifier, the dataset is divided into two disjoint subsets: one for training base-classifiers and the other for testing. To train the meta-level classifier, 5-fold cross-validation is applied to construct the metalevels for all combining methods.

Evaluating Model Performance
Models need a reliable evaluation and/or validation process [44]. The capacity of a model to classify was evaluated by a 5-fold cross-validation procedure, where the data is divided into five independent groups, one at a time for testing and the remaining four groups for training [45].
Accuracy, sensitivity, and specificity were three statistical indexes evaluating the performance [13]:

Evaluating Model Performance
Models need a reliable evaluation and/or validation process [44]. The capacity of a model to classify was evaluated by a 5-fold cross-validation procedure, where the data is divided into five independent groups, one at a time for testing and the remaining four groups for training [45].
Accuracy, sensitivity, and specificity were three statistical indexes evaluating the performance [13]: where True Positive (TP) refers to the number of landslide samples with correct classification, True Negative (TN) refers to the number of non-landslide samples with correct classification, False Positive (FP) refers to the number of landslide samples with incorrect classification and False Negative (FN) refers to the number of non-landslide samples with incorrect classification. AUC is a metric commonly used to assess the quality of the model and it varies from 0.5 to 1. The higher the AUC value shows the stronger the predictive ability [46].
Non-parametric models need to be optimized by tuning related hyperparameters before application [47]. The involved parameters for modeling utilized in this study were shown in Table 2 and the flowchart of methods involved was shown in Figure 7. Non-parametric models need to be optimized by tuning related hyperparameters b fore application [47]. The involved parameters for modeling utilized in this study wer shown in Table 2 and the flowchart of methods involved was shown in Figure 7.

Non-Landslide Samples Selected by FCM and K-Means
LSM generated based on cluster analysis does not need to identify the positive an negative labels of the samples in advance. Based on the curve of the clustering effectiv ness index Vcs (Figure 8), the preferred value is five. Consequently, the study area wa reclassified into five areas based on the FR values, which were very low, low, moderat high, and very high. The proportions of each area are: very low (15.97%), low (23.25%

Non-Landslide Samples Selected by FCM and K-Means
LSM generated based on cluster analysis does not need to identify the positive and negative labels of the samples in advance. Based on the curve of the clustering effectiveness index Vcs (Figure 8), the preferred value is five. Consequently, the study area was reclassified into five areas based on the FR values, which were very low, low, moderate, high, and very high. The proportions of each area are: very low (15.97%), low (23.25%), moderate (19.29%), high (33.5%) and very high (8%). Among them, the very-low area accounted for 15.97% of the whole study area with only 3.24% of landslide locations and an FR value of 0.2. Besides, the high or very-high area accounted for 41.5% of the study area with more than 55% of landslide locations and the FR values were both greater than 1.  Similarly, the results constructed by K-means were shown in Table 3. The proportions of each area are: very low (11.66%), low (22.30%), moderate (18.71%), high (39.16%), and very high (8.17%). The very-low area accounted for only 1.62% of landslide locations with an FR value of 0.14. The high or very-high area accounted for 47.33% of the study area with more than 55% of landslide locations. Compared to the results obtained by FCM, the area with low or very low class predicted by K-means occupied a smaller area (5.26%) while a bigger area (5.83) with high or very high class. The zoning maps should follow two rules: (1) the recorded landslides should appear in high-susceptibility areas as many as possible and (2) the high-susceptibility area should occupy a small proportion (Bui et al., 2012). Therefore, the results obtained by FCM were more reasonable. Selecting the non-landslide samples in a more reliable area is the main purpose and it means that the bigger the very-low class area, the easier the sampling will be. Meanwhile, 370 non-landslides samples were collected from the area with very-low susceptibility predicted by FCM.

Number of clustering centers
Clustering validity function Vcs Similarly, the results constructed by K-means were shown in Table 3. The proportions of each area are: very low (11.66%), low (22.30%), moderate (18.71%), high (39.16%), and very high (8.17%). The very-low area accounted for only 1.62% of landslide locations with an FR value of 0.14. The high or very-high area accounted for 47.33% of the study area with more than 55% of landslide locations. Compared to the results obtained by FCM, the area with low or very low class predicted by K-means occupied a smaller area (5.26%) while a bigger area (5.83) with high or very high class. The zoning maps should follow two rules: (1) the recorded landslides should appear in high-susceptibility areas as many as possible and (2) the highsusceptibility area should occupy a small proportion (Bui et al., 2012). Therefore, the results obtained by FCM were more reasonable. Selecting the non-landslide samples in a more reliable area is the main purpose and it means that the bigger the very-low class area, the easier the sampling will be. Meanwhile, 370 non-landslides samples were collected from the area with very-low susceptibility predicted by FCM.

Evaluation and Comparison of Different Models
To highlight the performance of the Stacking model, three basic classifiers were also applied for modeling. Analyses of the statistical measures using the training set were shown in Table 4. The Stacking showed the best performance in terms of classifying landslides (sensitivity = 91.89%), followed by the GBDT model (sensitivity = 86.97%), the Ada-DT model (sensitivity = 85.66%) and RF model (sensitivity = 79.93%). In terms of the classification of non-landslides zones, Stacking model also performed best (specificity = 91.84%), followed by the GBDT model (specificity = 85.67%), the Ada-DT model (specificity = 82.26%) and the CART model (specificity = 83.16%). Besides, the Stacking model also had the highest accuracy (91.84%). It was noticed that the Stacking model achieved an AUC of 0.963, while RF was 0.920, GBDT was 0.957 and Ada-DT was 0.959 ( Table 5). The standard errors were less than 0.05 and the probability estimation was negligible. The predictive capacity needs to be evaluated using validation data. The results confirmed that the Stacking model perform the best as the values of sensitivity, specificity, accuracy and AUC were highest (Tables 6 and 7), which was 91.78%, 90.54%, 91.16% and 0.944, respectively, followed by Ada-DT (sensitivity = 86.96%, specificity = 82.19%, accuracy = 85.13% and AUC = 0.917), GBDT (sensitivity = 86.11%, specificity = 84.00%, accuracy = 85.03% and AUC = 0.910), and RF (sensitivity = 81.33%, specificity = 75.34%, accuracy = 78.38 and AUC = 0.906) (Figure 9).    The Stacking model exhibited the best both in training and validation data compared to the other three ensemble learning methods, which indicated ideal goodness-of-fit to modeling and generalization capability. The performance of GBDT and Ada-DT was similar, and the RF model performed the worst but was still satisfactory. The gaps in performance between training and validation data were not obvious among the models. Compared to the RF model, the application of the Stacking model enhanced the performance significantly and was regarded as the most suitable model for LSM in this study.

Application of Stacking Method for LSM
The above analysis proves that the Stacking method has superior ability in LSM compared with the other three models. Therefore, the probability of landslides occurring was calculated for all mapping units in the whole study area. The LSM was also constructed with five susceptible classes, which were very low (0-0.2), low (0.2-0.4), moderate (0.4-0.6), high (0.6-0.8), and very high (0.8-1) (Figure 10). Table 3 showed the distribution ratio of each level. The very low susceptible level occupied 26.04% of the area while low, moderate, high, and very high susceptible levels represented 15.31%, 15.46%, 32.45%, and 10.74%, respectively ( Figure 11). It was noticed that LSM has the smallest area percentage in very high susceptibility levels while the largest is in high. Landslide locations were mostly distributed in the red areas. Meanwhile, most of the non-landslide samples screened by FCM clustering appeared in blue areas. The Stacking model exhibited the best both in training and validation data compared to the other three ensemble learning methods, which indicated ideal goodness-of-fit to modeling and generalization capability. The performance of GBDT and Ada-DT was similar, and the RF model performed the worst but was still satisfactory. The gaps in performance between training and validation data were not obvious among the models. Compared to the RF model, the application of the Stacking model enhanced the performance significantly and was regarded as the most suitable model for LSM in this study.

Application of Stacking Method for LSM
The above analysis proves that the Stacking method has superior ability in LSM compared with the other three models. Therefore, the probability of landslides occurring was calculated for all mapping units in the whole study area. The LSM was also constructed with five susceptible classes, which were very low (0-0.2), low (0.2-0.4), moderate (0.4-0.6), high (0.6-0.8), and very high (0.8-1) (Figure 10). Table 3 showed the distribution ratio of each level. The very low susceptible level occupied 26.04% of the area while low, moderate, high, and very high susceptible levels represented 15.31%, 15.46%, 32.45%, and 10.74%, respectively ( Figure 11). It was noticed that LSM has the smallest area percentage in very high susceptibility levels while the largest is in high. Landslide locations were mostly distributed in the red areas. Meanwhile, most of the non-landslide samples screened by FCM clustering appeared in blue areas.  The high or very-high susceptibility areas are mainly distributed closed to streams or provincial highway, which runs through three townships including Fanzipai Town, Sihetang Town, and Fengjiayu Town in the study area. These areas are densely populated.
The landslide susceptibility class ranged from very low to very high around the Miyun reservoir. It is noteworthy that once a landslide occurs in this area, a series of disaster chains may be induced.

Analysis of Major Conditioning Factors
The stacking method performed the best in terms of accuracy, but the results had a poor analysis of the occurrence of landslides, which was confusing. Understanding the major factors that have a significant contribution to landslides occurring helps in the prevention and treatment of landslides. Based on the Gini index, ten major parameters were selected and normalized as shown in Table 8   The high or very-high susceptibility areas are mainly distributed closed to stre or provincial highway, which runs through three townships including Fanzipai To Sihetang Town, and Fengjiayu Town in the study area. These areas are densely popul The landslide susceptibility class ranged from very low to very high around the yun reservoir. It is noteworthy that once a landslide occurs in this area, a series of dis chains may be induced.

Analysis of Major Conditioning Factors
The stacking method performed the best in terms of accuracy, but the results h poor analysis of the occurrence of landslides, which was confusing. Understanding major factors that have a significant contribution to landslides occurring helps in the vention and treatment of landslides. Based on the Gini index, ten major parameters selected and normalized as shown in Table 8  The high or very-high susceptibility areas are mainly distributed closed to streams or provincial highway, which runs through three townships including Fanzipai Town, Sihetang Town, and Fengjiayu Town in the study area. These areas are densely populated.
The landslide susceptibility class ranged from very low to very high around the Miyun reservoir. It is noteworthy that once a landslide occurs in this area, a series of disaster chains may be induced.

Analysis of Major Conditioning Factors
The stacking method performed the best in terms of accuracy, but the results had a poor analysis of the occurrence of landslides, which was confusing. Understanding the major factors that have a significant contribution to landslides occurring helps in the prevention and treatment of landslides. Based on the Gini index, ten major parameters were selected and normalized as shown in Table 8, including DTS, DTR, elevation, slope angle, TWI, maximum 24 h rainfall, lithology, MED, maximum seven days of rainfall, and profile curvature. Among them, DTS, DTR, and elevation have a significant impact on the occurrence of landslides (Figure 12), the weight values of which were 0.37, 0.34, and 0.16, respectively. While the weight values of lithology, MED, maximum seven days rainfall, and profile curvature were close to 0.01, which had a limited contribution. The weight values of slope angle, TWI, and maximum 24 h rainfall were close to 0.04, 0.03, and 0.02, respectively. rence of landslides (Figure 12), the weight values of which were 0.37, 0.34, and 0.16, respectively. While the weight values of lithology, MED, maximum seven days rainfall, and profile curvature were close to 0.01, which had a limited contribution. The weight values of slope angle, TWI, and maximum 24 h rainfall were close to 0.04, 0.03, and 0.02, respectively.  Therefore, three conditioning factors, namely DTS, DTR, and elevation, were considered the major factors responsible for the landslide. Rivers are an important factor affecting the occurrence of landslides. On the slopes closer to the river, the toe of the slope is easily soaked by the river water, which reduces the strength of the rock and makes landslides more likely. Road development and construction are important tasks in mountainous area construction. However, unreasonable road excavation is a common human factor that induces geological disasters. Road construction often produces a large number of slopes, which destroy the stability of the slope and finally, lead to the occurrence of landslides.
The relationship between the major factors and landslides was further explored by calculating the FRi of each parameter ( Table 9)   Therefore, three conditioning factors, namely DTS, DTR, and elevation, were considered the major factors responsible for the landslide. Rivers are an important factor affecting the occurrence of landslides. On the slopes closer to the river, the toe of the slope is easily soaked by the river water, which reduces the strength of the rock and makes landslides more likely. Road development and construction are important tasks in mountainous area construction. However, unreasonable road excavation is a common human factor that induces geological disasters. Road construction often produces a large number of slopes, which destroy the stability of the slope and finally, lead to the occurrence of landslides.
The relationship between the major factors and landslides was further explored by calculating the FRi of each parameter ( Table 9) The selection and analysis of major factors by combining basic machine learning and bivariate methods made up for the defects of stacking, thereby ensuring the integrity of geological hazard assessment. Table 9. Spatial relationship between landslide conditioning factors and landslides using frequency ratio.

. Internal and External Cross-Validation
The basic classifiers used in our work have several hyperparameters that control the behavior and performance. In some cases, reasonable "guesses" are available (e.g., n tree = 500 in RF), in other cases classifiers are very sensitive to the parameters, which means that default hyperparameter settings fail to guarantee optimal performance of machine-learning techniques. Therefore, hyperparameters need to be tuned before application and inner cross-validation should be used for this [48].
On the other hand, external cross-validation was also essential. One can find an "excellent model" using the method "Leave-One-Out" because of the randomness in the sampling scheme, the results of which are unconvincing. Only by implementing a more rigorous k-fold (or other types) cross-validation scheme can one infer the actual capacity of a model to learn the functional relationships between landslides and causative factors as well as the variability that the models and the susceptibility estimates exhibit [45,48].
While various machine-learning algorithms have been recognized in recent years due to their powerful capabilities of data processing and generalization, there are several practical challenges related to bias-reduced assessment of a model's predictive power and some researchers often ignore them, which leads to an unreliable or uncertain result. Single hold-out model performance measures were popular [49]. However, statistically based landslide susceptibility models desire a more credible validation and assessment before generalization.

The Selection of Non-Landslide Samples
A complete disaster inventory map is emphasized in a multitude of studies, which consists of the locations and number of a certain disaster [9]. The quality of landslide presence samples is more convincing compared to that of landslide absence because nonlandslide samples are selected randomly or subjectively although quite a few methods or principles will be adopted. Seldom do studies consider or discuss the noise and influence of the absence of data bring to data-driven models [50]. Non-landslide points need to be selected from low-prone areas as far as possible, which is arduous to implement by selecting randomly. Clustering analysis help solve the problem by combining with the bivariate methods. FR was calculated to judge the area with low susceptibility based on the results of FCM and K-means in this study and the non-landslide samples were generated from it, which improved the quality of non-landslide records and the performance of models logically. Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The related code of machine learning applied in the study is available at https://github.com/Liangzhu-mz, accessed on 6 April 2022.