A Novel Heterogeneous Ensemble Framework Based on Machine Learning Models for Shallow Landslide Susceptibility Mapping

: Landslides are devastating natural disasters that seriously threaten human life and property. Landslide susceptibility mapping (LSM) plays a key role in landslide hazard management. Machine learning (ML) models are widely used in LSM but suffer from limitations such as overﬁtting and unreliable accuracy. To improve the classiﬁcation performance of a single machine learning (ML) model, this study selects logistic regression (LR), support vector machine (SVM), random forest (RF), and gradient boosting decision tree (GBDT), and proposes a novel heterogeneous ensemble framework based on Bayesian optimization (BO), namely, stratiﬁed weighted averaging (SWA), to test its applicability in a typical landslide area in Yanbian Prefecture, China. Firstly, a dataset consisting of 1531 historical landslides was collected from ﬁeld investigations and historical records, and a spatial database containing 16 predisposing factors was established. The dataset was divided into a training set and a test set in a ratio of 7:3. The results showed that SWA effectively improved the Accuracy, AUC, and robustness of the model compared to a single ML model. The SWA achieved the best classiﬁcation results (Accuracy = 91.39% and AUC = 0.967). To verify the generalization ability of SWA, we selected published landslide datasets from Yanshan country and Yongxin country in China for testing. SWA also performed well, with an AUC of 0.871 and 0.860, respectively. As indicated by shapely values (SVs), Normalized Difference Vegetation Index (NDVI) is the factor that has the greatest impact on landslide occurrence. The landslide susceptibility maps obtained from this study will provide an effective reference program for land use planning and disaster prevention and mitigation projects in Yanbian Prefecture, China.


Introduction
According to the Critical Incident Database (https://www.emdat.be,accessed on 12 March 2023), landslides around the world have caused 66,438 deaths and economic losses of approximately USD 10.8 billion [1] during 1900-2020.Accurate prediction of the potential location of landslides can greatly minimize the losses caused by landslides.Landslide susceptibility prediction can effectively address this major issue based on the landslide inventory and related predisposing factors [2,3].Therefore, landslide susceptibility mapping (LSM) is an important part of landslide prevention and management, which is often considered the first stage of disaster management and can provide scientific guidance for further disaster management [4,5].Currently, various models have been designed based on GIS and machine learning (ML) techniques: from the initial statistical application models such as the Frequency Ratio (FR), Analytic Hierarchy Process (AHP), and Information Value Method (IVM) to the current Support Vector Machine (SVM) using kernel approach, Random Forest (RF) based on [6] Bagging, and XGBoost [7] based on Boosting.
Statistical methods are usually constructed using linear analysis between historical landslides and predisposing factors and are widely used in LSM due to their simple principles and efficient computations.Rehman et al. [8] used Muzaffarabad as the study area and calculated the subjective and objective weights of all conditioning factors and their categories using AHP and FR.In addition, Lin et al. [9] explored the application of the IVM method in geological hazard evaluation to provide a scientific basis for the development of regional geological hazard prevention and control strategies.However, landslides occur under the coupling of multiple factors, and the data distribution of multiple contributing factors is different, so it is obvious that the relationship is not linear in the strict sense.Therefore, there are limitations to exploring LSM.
However, the ML method [10] can effectively overcome the effects of different data distributions and is good at capturing nonlinear relationships, improving the computational efficiency and accuracy of results by leaps and bounds.Models often used for LSM analysis are logistic regression (LR), support vector machine (SVM), random forest (RF), decision tree (DT), etc. Research results have been published on the application of these ML models in LSM.Chen et al. [11] used Chongren County, China, as the study area and combined it with 333 landslide sites for analysis, finally finding that RF has considerable performance in terms of AUC and statistical metrics.However, Hong et al. [12] showed that the predictive performance of LR is better than that of RF.Therefore, there are geographical differences in the prediction of LSM by different models, and there are limitations to improving the final classification effect by considering only the optimization of a single model.The ensemble methods [13] not only integrate the predictions of multiple models but are also more robust and easier to scale than a single model.The use of ensemble learning for LSM has been explored by scholars such as Lv et al. [14], who selected three base learners and four ensemble methods.The ensemble model will give better performance than a single model.
Therefore, for the application of EL in LSM, this paper proposes an ensemble framework for landslide susceptibility prediction in Yanbian Prefecture.In this paper, four of the most representative ML models, namely, LR, SVM, RF, and GBDT, are selected as single classifiers and combined with Bayesian optimization (BO) to propose a stratified weighted averaging (SWA) framework.This study has the following three main contributions.(1) It introduces BO into the hyperparameter optimization of a single model to improve the classification ability and robustness of a single model.(2) The four different ML models have significant differences in algorithm structure and feature processing, and SWA can reflect the performance improvement for a single model.At the same time, this paper comprehensively evaluates the performance of all models from multiple metrics, which can overall reflect the feasibility of all selected models.(3) Due to the uninterpretable characteristics of the ML training process, we cannot obtain the degree of influence of the selected conditional factors on the occurrence of landslides.Therefore, the SHAP method is selected to derive the factors that contribute most to the occurrence of landslides in the study area.Unlike previous related studies [15], this study obtains a set of model optimal parameter combinations after BO of a single model, and model derivation is performed on top of this to search for the best combination of weight values.On the one hand, the number of discriminative models is increased to improve the accuracy of prediction, and on the other hand, the computational process is simple and expandable.

Description of the Area
Yanbian Prefecture (Figure 1) is located in the Changbai Mountains region of Jilin Province, China, extending between longitudes of 127 • 27 E and 131 • 18 E and latitudes of 41 • 59 N and 44 • 30 N. The total area of Yanbian Prefecture is about 42,700 square kilometers, with the mountainous area accounting for 54.8% of the total area of the Prefecture, the plateau for 6.4%, the valley for 13.2%, the valley plain for 12.3%, and the hills for 13.3%.The overall topography of Yanbian Prefecture has a large elevation difference, with the west being higher and the east lower.The landforms are mountainous, hilly, and basin-like in 3 gradients from the southwest, northwest, and northeast to the southeast.The topography of Yanbian Prefecture combines both terrain height and slope factors for the development of geological hazards.
kilometers, with the mountainous area accounting for 54.8% of the total area of the Prefecture, the plateau for 6.4%, the valley for 13.2%, the valley plain for 12.3%, and the hills for 13.3%.The overall topography of Yanbian Prefecture has a large elevation difference, with the west being higher and the east lower.The landforms are mountainous, hilly, and basin-like in 3 gradients from the southwest, northwest, and northeast to the southeast.The topography of Yanbian Prefecture combines both terrain height and slope factors for the development of geological hazards.The climate of Yanbian Prefecture is characterized by distinct monsoons, dry and windy in spring, warm and rainy in summer, cool and less rainy in autumn, and long cold periods in winter [16].Precipitation spreads over a more uneven region, decreasing from the southeast to northwest.The leeward slopes of mountainous areas are smaller than the windward slopes, and areas close to the ocean with sufficient sources of water vapor are larger than those distant from the ocean with insufficient water vapor.The yearly average rainfall is mostly 550-650 mm, with rainfall concentrated from June to August, accounting for about 60% of the annual rainfall.The climate of Yanbian Prefecture is characterized by distinct monsoons, dry and windy in spring, warm and rainy in summer, cool and less rainy in autumn, and long cold periods in winter [16].Precipitation spreads over a more uneven region, decreasing from the southeast to northwest.The leeward slopes of mountainous areas are smaller than the windward slopes, and areas close to the ocean with sufficient sources of water vapor are larger than those distant from the ocean with insufficient water vapor.The yearly average rainfall is mostly 550-650 mm, with rainfall concentrated from June to August, accounting for about 60% of the annual rainfall.

Landslide Inventory
Landslide inventories are the basis for landslide susceptibility studies.Landslide inventory mapping can enhance our understanding of the relationship between historical landslide spatial distributions and LSM [17,18].The landslide inventory of Yanbian Prefecture is obtained by the Department of Natural Resources of Jilin Province through field research and historical disaster records during 2010-2020.The landslides in the study area are mainly shallow landslides with a thickness of 0.5-10 m in the form of rotation or translation, most of which are related to the upper soil layer [19].Landslides are mainly caused by rainfall.
Since landslides occur in a few areas and most areas are free of geological hazards, non-landslide points are selected at a ratio of the positive/negative samples of 1:1.15 to make the dataset more consistent with the actual geological environment.Meanwhile, according to the relevant studies [20], appropriately increasing the number of negative samples is beneficial to model training.Therefore, the dataset contains 1531 historical disaster points and 1761 non-landslide points that are randomly selected on top of the 400 m buffer zone of disaster points.The training and test sets are divided into a 7:3 ratio.

Data Preparation
Selecting the suitable predisposing factors is a key step in LSM [21].However, in the absence of adequate corresponding assessment standards and technical specifications, the selection of landslide conditioning factors depends on subjective judgment.Reichenbach et al. [22] analyzed 565 relevant papers over the period of 1983-2016, from which they found that these factors can be broadly classified into 5 categories: geological, hydrological, land cover, geomorphological, and other.On the basis of a field survey, a literature review, and available data from the study area, the following 16 condition factors are selected.All data preparation is implemented in ArcGIS 10.8 software.All data sources and types can be seen in Table 1.Elevation, slope, aspect, profile curvature, Topographical Roughness Index (TRI), Topographical Wetness Index (TWI), and Stream Power Index (SPI) extracted from the Digital Surface Model (DSM) are provided by the ALOS Global Surface Model.Lineaments [23] include faults, ridgelines, fissures, or boundaries between strata on the Earth's surface, with different lengths and orientations.To extract the lineaments in the study area, the DSM is used to obtain 0 • , 45 The raster data for lithology and landform are obtained from A New Map of Global Ecological Land Units [24] and The New Global Lithological Map Database [25].Fault data are provided by the Seismic Fault Active Survey Data Center (SFACDC) with a scale of 1:500,000.Road and water system data are obtained from the National Platform for Common Geospatial Information Services (NPCGIS).Normalized Difference Vegetation Index (NDVI) is calculated for each annual year of Landsat 5/8 remote sensing images based on Google Earth Engine (GEE), and then the maximum NDVI is obtained for the location of each image pixel.Yearly average rainfall (YAR) is provided from Fine Resolution Mapping of Mountain Environment during 1991-2020.Landform, Lithology, and Land cover are categorical variables, so the nearest neighbor resampling method is used to unify the resolution to 30 m × 30 m.
Elevation [26] controls the direction of watercourses and the density of the drainage network and has a significant effect on soil moisture and slope (Figure 2a).The slope controls the shear forces acting on the slope and is one of the key causes of landslides (Figure 2b).Theoretically, the risk of landslides increases in areas with higher slopes [27,28] and decreases to zero in areas with slopes below 5 • .Aspect is related [29] to the occurrence of landslides because slopes with different aspects are affected differently by precipitation and solar radiation (Figure 2c).Pro Cur (Figure 2d) and TRI (Figure 2g) [30,31] can effectively reflect the topographic complexity and changes in surface relief and erosion patterns.The presence of lineaments [32] in an area can lead to more water infiltration into joints and cracks, thus increasing the likelihood of landslides.For quantitative analysis, lineament density (LD) is used as the predisposing factor (Figure 2h).The development of cracks and shear cracks along the fracture zone [33], the formation of fault damage zones [34], and water infiltration can damage the structure of nearby geotechnical bodies, so the distance to faults (Figure 2k) should be taken into account.The relief of the topography also plays an important role in the occurrence of landslides.Landslides mainly occur [35] in the loose accumulation layer, but landslides at bedrock are relatively rare.Lithology (Figure 2j) reflects the [36] physical and mineralogical properties of geotechnical bodies in the study area.
There are many water systems in the study area, and the rivers near the water systems have a strong erosion effect on the geotechnical body, which can easily lead to landslides.And the closer the water system is, the more serious the erosion is, so DTW (Figure 2m) and SPI (Figure 2f) [37,38] are included in the analysis.TWI (Figure 2e) is a physical indicator of the effect of regional topography on runoff direction and accumulation.This index [39] helps to identify rainfall runoff patterns, potential areas of increased soil water content, and areas of ponding and quantifies the control of topography on underlying hydrological processes.Rainfall is a common and important trigger in landslide hazards.Strong rainfall [40] not only softens the sliding zone but also increases the self-weight of the slope body.And the rainwater (Figure 2p) infiltrates and transports along the joint fissures and remains for a long time, which makes the rock body soft and changes the internal stress state of the slope, thus causing landslides to occur.
It is noteworthy that previous relevant studies have rarely considered the effects of human activities, but such factors cannot be ignored when determining landslide susceptibility.Human construction activities [41] such as slope cutting, road building (Figure 2l), and bridge building will change the internal structure of the geotechnical body, thus affecting the stability of the slope.LC (Figure 2o) reflects the situation of land use [42].Vegetation cover areas, bare ground, and water bodies have an important influence on the stability of slopes.NDVI (Figure 2n) reflects the vegetation cover of the study area [43].The presence of slope vegetation enhances the resistance of hillsides to landslides.After establishing a good spatial database, Z-Score [44] normalization is used to facilitate the training of machine learning (ML) modeling.
where µ represents the overall mean and σ represents the overall standard deviation.
the slope body.And the rainwater (Figure 2p) infiltrates and transports along the joint fissures and remains for a long time, which makes the rock body soft and changes the internal stress state of the slope, thus causing landslides to occur.
It is noteworthy that previous relevant studies have rarely considered the effects of human activities, but such factors cannot be ignored when determining landslide susceptibility.Human construction activities [41] such as slope cutting, road building (Figure 2l), and bridge building will change the internal structure of the geotechnical body, thus affecting the stability of the slope.LC (Figure 2o) reflects the situation of land use [42].Vegetation cover areas, bare ground, and water bodies have an important influence on the stability of slopes.NDVI (Figure 2n) reflects the vegetation cover of the study area [43].The presence of slope vegetation enhances the resistance of hillsides to landslides.After establishing a good spatial database, Z-Score [44] normalization is used to facilitate the training of machine learning (ML) modeling.
where  represents the overall mean and σ represents the overall standard deviation.

Methodology
Figure 3 shows the technical route, and this study includes the following steps.First, a dataset of the 16 predisposing factors mentioned in Section 2.3 and a dataset of hazard and non-hazard points are prepared.Secondly, 4 conventional ML models such as LR, SVM, RF, and GBDT are combined with BO for hyperparameter optimization, while a comparative study is conducted with the SWA model.Thirdly, to solve the black box problem of modeling, the SHAP method is introduced into the ML model interpretability analysis to evaluate the contribution value of each conditioning factor, which is introduced in Section 5.1.

Methodology
Figure 3 shows the technical route, and this study includes the following steps.First, a dataset of the 16 predisposing factors mentioned in Section 2.3 and a dataset of hazard and non-hazard points are prepared.Secondly, 4 conventional ML models such as LR, SVM, RF, and GBDT are combined with BO for hyperparameter optimization, while a comparative study is conducted with the SWA model.Thirdly, to solve the black box problem of modeling, the SHAP method is introduced into the ML model interpretability analysis to evaluate the contribution value of each conditioning factor, which is introduced in Section 5.1.

Information Gain Ratio
To improve the training efficiency of the model, the first step should be to filter the selected elements.The information gain (IG) is introduced to measure how much information the feature can bring to the classification task.The more information it brings, the more important the feature is.The expression of IG [45] is shown below: where H(X) represents the information entropy of X, and H(X|Y) represents the information entropy of the subset as a whole after the division based on different values of the Y array, also known as conditional entropy.The difference between the two is the information gain.

Multicollinearity Analysis
While considering the importance of the features, it is also necessary to consider the correlation between the features, and the metrics with too high a correlation should be eliminated to ensure the training efficiency of the model.To this end, the Pearson correlation coefficient is introduced [46].This indicator is used to measure the covariance between variables X and Y, and its value range is [−1, 1]; ρ = 0 means no correlation between the two variables, and ρ = 1 or ρ = −1 means a linear relationship between the two variables.The expression is as follows: where Cov(X, Y) represents the covariance between the variables X and Y; σ X and σ Y represent the standard deviation of the variables X and Y. µ X and µ Y represent the means of X and Y, respectively.E(•) represents the mathematical expectation.The strength of the correlation has the following divisions: very strong (0.9-1.0), high (0.7-0.9), moderate (0.4-0.7), low (0.2-0.4), and very weak (0.0-0.2).

Logistic Regression
Logistic regression (LR) is a model often used in supervised learning to solve dichotomous classification tasks [47].It is often used in LSM, and the expression of logistic regression is shown below: where z = w

Support Vector Machine
Support vector machine (SVM) is a supervised machine learning algorithm that uses kernel methods and has a very excellent classification effect.SVM is often used in LSM [48].SVM classifies the model with the smallest possible classification error on the data by updimensioning the data and finding a hyperplane as a decision boundary in the distribution of data, using the distance from the sample to the hyperplane.And the loss function of the SVM is as follows: where ||w|| 2 represents the weight vector w = (w 1 , w 2 , . . .w i ) of l2 parametrizations.The vector w represents the weights.Φ(x i ) represents the selected kernel function, while w • Φ(x i ) + b ≥ 1 − ζ i represents the final chosen decision hyperplane.ζ i is used as a slack variable to measure how much the instance allows for interval misclassification.And C is used as a penalty factor to control the penalty by weighing between minimization and ζ i , under joint action of the 3 parameters acting together, thus minimizing the overall loss function.

Random Forest
The basic unit of Random Forest (RF) is the decision tree [49].Each decision tree is a classifier.N trees will have N classification results.RF integrates all category voting results, assigning the most voted category as the final output.A single decision tree takes cross-entropy or the Gini coefficient to calculate the label impurity for tree building.Information entropy is used to measure the degree of confusion in the data and can also be used to measure the label purity of a dataset.The higher the entropy value, the greater the amount of information contained.The Gini coefficient is used as a measure of purity by calculating 1 minus the sum of squares of p(i|t).The smaller the Gini coefficient, the higher the label purity.

Gradient Boosting Decision Tree
Gradient Boosting Decision Tree (GBDT) [50] is often used in LSM due to its excellent classification performance and robustness.Unlike RF, GBDT uses regression trees to solve classification problems.After multiple iterations, each classifier is trained based on the residuals of the previous classifier, and the accuracy of the final classifier is continuously improved by reducing the bias.
Specifically, based on the results of the previous weak evaluator f (x) t−1 , the loss function L(x,y) is computed.L(x,y) is used to adaptively influence the construction of the next weak estimator f (x) t and integrate the model output.The output result of evaluator is H t (x i ), the result of which is affected by the overall all weak estimators f (x) 0 ∼ f (x) T .
The Friedman Mean Square Error (F-MSE) [51] is the main impurity measure used in GBDT.F-MSE can be interpreted as the product of the summed mean of the sample sizes on the left and right leaf nodes and the squared mean squared error difference between the left and right leaf nodes.w L and w R represent the sample size on the left and right leaf nodes, respectively.r i is the residual on sample i. ŷi is the predicted value of sample at the current sub-node.This approach allows impurities to fall more quickly, making the overall branching more efficient.

Bayesian Optimization
Hyperparameter optimization [52] can have a significant impact on the performance of machine learning models, which mainly include various types of grid-based search methods (GS) [53], Bayesian optimization (BO), and genetic algorithms (GAs).GS methods use enumeration or random sampling to search for parameters through the defined parameter space, and the search process generates a significant amount of computation.Although genetic algorithms [54] can effectively improve model performance, they involve complex coding and decoding processes and depend on the merit of the initial population.BO [55] uses different probabilistic surrogate models (PSMs) with acquisition functions (AFs) to find the optimal solution fast and accurately.
The basic flow of BO is as follows.Assume that f(x) is a smooth and uniform function.First, two real observations are selected and the observation curve f* is inferred based on the PSM and contains the confidence level of this curve.Based on the PSM output, the maximum value of AF is calculated, and the x pointed by the maximum value is the next actual observation point.The estimated f* is continuously corrected by multiple observations.After a set number of estimations, f* continues approximating f(x), and then the minimum value of the final f* is taken as the minimum value of f(x).
In this paper, the tree-structure parameter estimator (TPE) is used as PSM and AF is used as the expected increment.The TPE process [56] divides p(x|y) into two parts, and the equation is shown below: TPE constructs different distributions for observation point x on either side of the threshold y * , where l(x) is the probability density function formed using the observation variable x (i) such that y * > y (i) .g(x) is the probability density function using the remaining observations.The threshold for delineation is the quantile γ, so we have p(y * < y (i) ) = γ.Therefore, we can obtain p(x) and the expression for the collection function.

Stratified Weighted Averaging
Weighted averaging (WA) [57] is a common ensemble method that is widely used in landslide susceptibility prediction.This study aims to proposes an improved ensemble method based on WA.The stratified weighted averaging (SWA) is implemented in 3 steps.The whole process searches the weight values as hyperparameters to obtain the final prediction probability.First, the training set is divided into 5 copies, each containing 1 validation set and 4 sub-training sets (Figure 4).On top of the optimal combination of parameters for BO, the hyperparameters are adjusted using a grid search for 5 rounds of training.For a single classifier, 5 derived models can be obtained.Next, an intra-group fusion of predicted probabilities is performed (Figure 5).Taking LR as an example, the predicted probability ( LR X ) of the LR model is obtained by predicting the 5 derived models ([LR1, LR2, LR3, LR4, LR5]) on the testing set, using BO search weight values.x .And similarly, SVM X , RF X , and GBDT X represent the predicted probabilities of SVM, RF, and GBDT on the testing set, respectively.As a result, the predicted probabilities of intra-group fusion are obtained for all 4 base classifiers.Finally, the weight values of inter-group fusion are obtained by BO to obtain the final predicted probability ( P X ), which yields the classification result.The judgment threshold is also searched as a hyperparameter and the final judgment threshold obtained is 0.5688.
(157 ) Next, an intra-group fusion of predicted probabilities is performed (Figure 5).Taking LR as an example, the predicted probability (X LR ) of the LR model is obtained by predicting the 5 derived models ([LR1, LR2, LR3, LR4, LR5]) on the testing set, using BO search weight values.x LR1 represents the predicted probability of LR1 on the testing set, and w LR1 represents the weight value of x LR1 .And similarly, X SV M , X RF , and X GBDT represent the predicted probabilities of SVM, RF, and GBDT on the testing set, respectively.As a result, the predicted probabilities of intra-group fusion are obtained for all 4 base classifiers.Finally, the weight values of inter-group fusion are obtained by BO to obtain the final predicted probability (X P ), which yields the classification result.The judgment threshold is also searched as a hyperparameter and the final judgment threshold obtained is 0.5688.

Model Evaluation Metrics
LSM needs to be validated to have scientific significance, so it is necessary to select the appropriate indicators to evaluate the selected model.The ROC curve [58] is used as an important indicator of the overall classification effectiveness of the model.The hori-

Model Evaluation Metrics
LSM needs to be validated to have scientific significance, so it is necessary to select the appropriate indicators to evaluate the selected model.The ROC curve [58] is used as an important indicator of the overall classification effectiveness of the model.The horizontal coordinate of the ROC curve is the false positive rate, which represents the proportion of false samples among all the samples predicted to be of class 0. The vertical coordinate is the recall rate, which focuses on the proportion of all class 1 samples that are not accurately identified.The AUC is divided into 4 classes: poor (0.5-0.7), moderate (0.7-0.85), good (0.85-0.95), and excellent (0.95-1.00).
We obtain the confusion matrix [59] by comparing the predicted labels of the model in the testing set with the true labels based on the number of samples with false positive (FP), false negative (FN), true positive (TP), true negative (TN).TS is the sum of the number of TP, TN, FP, and FN samples.On this basis, Accuracy, Recall, and Precision are calculated.Meanwhile, to weigh the prediction results against the test set regarding the proportion of class 1 samples, we consider using the harmonic mean of Precision and Recall, i.e., F1-Score, as the model evaluation metric [60].Finally, to measure the consistency between the predicted and true values, the Kappa coefficient is introduced.The more unbalanced the confusion matrix, the higher the P exp and the lower the Kappa value.The Kappa coefficients' [61] range is [−1, 1] and can be divided into 5 groups to represent different levels of consistency: slight (0.00-0.20), fair (0.21-0.40), moderate (0.41-0.60), substantial (0.61-0.80), and almost perfect (0.81-1.00).

Result 4.1. Factor Assessment Results
Figure 6a shows the information gain value (IG) of each landslide conditioning factor.The top three IG values are elevation, TRI, and yearly average rainfall, which correspond to the complex topography of Yanbian Prefecture; the following factors are Land cover and NDVI, which reflect the influence of ecological environment on the occurrence of landslides, and places with high vegetation cover and good soil and water conservation will inhibit the occurrence of landslides; TWI is ranked last with IG values of 0, but TWI is an essential indicator of geological elements and topography analysis.So, it is not excluded.
The top three IG values are elevation, TRI, and yearly average rainfall, which correspond to the complex topography of Yanbian Prefecture; the following factors are Land cover and NDVI, which reflect the influence of ecological environment on the occurrence of landslides, and places with high vegetation cover and good soil and water conservation will inhibit the occurrence of landslides; TWI is ranked last with IG values of 0, but TWI is an essential indicator of geological elements and topography analysis.So, it is not excluded.As shown by the Pearson correlation coefficient graph (Figure 6b), the correlation coefficients of any two groups of 16 selected factors are less than 0.7, and there is no strong correlation.In summary, all 16 selected factors are used in LSM.

Landslide Susceptibility Maps
There are three main steps in making LSM [62].Firstly, all the predisposing factors are added to the pixel cells of the whole study area, and then the probability of landslide occurrence is predicted separately under different models.Finally, the study area is classified into five classes using the quantile method, namely, very low, low, medium, high, As shown by the Pearson correlation coefficient graph (Figure 6b), the correlation coefficients of any two groups of 16 selected factors are less than 0.7, and there is no strong correlation.In summary, all 16 selected factors are used in LSM.

Landslide Susceptibility Maps
There are three main steps in making LSM [62].Firstly, all the predisposing factors are added to the pixel cells of the whole study area, and then the probability of landslide occurrence is predicted separately under different models.Finally, the study area is classified into five classes using the quantile method, namely, very low, low, medium, high, and very high.The percentages of very low and very high prediction results (Figure 7) show a similar distribution pattern in the LSM of the five models.
The predicted results are consistent with the geographic characteristics of the study area.Very high areas are mostly areas with low elevation, while areas with high vegetation cover are mostly very low areas.According to the distribution results of LSM, the prediction results of RF-TPE are mostly distributed in low, median, and high classes, and the prediction results of GBDT-TPE are the opposite.SWA considers the prediction results of the four models simultaneously and assigns different weights to make the results of LSM more reliable.
The percentages of landslide susceptibility classes under the five models are shown in Figure 8.For the LSM generated by the LR-TPE model, the percentages of pixels occupied by very high and very low grades are 19.51% and 19.79%, respectively, while low, medium, and high occupy 20.49%, 20.17%, and 20.05%, respectively.For the LSM generated by SVM-TPE, the percentages of pixels occupied by very high and very low grades are 19.97% and 19.94%, respectively, while low, medium, and high occupy 20.15%, 20.07%, and 19.86%, respectively.For the LSM generated by RF-TPE, the percentage of pixels occupied by very high and very low grades are 19.95% and 19.71%, respectively, while low, medium, and high occupy 20.25%, 20.16%, and 19.94%, respectively.For the LSM generated by GBDT-TPE, the percentages of pixels occupied by very high and very low grades are 19.92% and 19.96%, respectively, while low, medium, and high occupy 21.18%, 19.71%, and 19.23%, respectively.For the LSM generated by SWA, the pixels occupied by very high and very low grades are 19.39% and 19.94%, respectively, while low, medium, and high occupy 20.45%, 20.19%, and 20.04%, respectively.The above results indicate that the high-susceptibility areas are concentrated in the lower-terrain areas, which are characterized by frequent human activities, and land cover type is mostly cropland.Meanwhile, according to the DTR (Figure 2l), landslides are densely distributed near the roads, but the areas with higher vegetation cover are usually lower-susceptibility. Combining geomorphology, land cover, and human activities, the susceptibility distribution has geomorphic plausibility [63].The predicted results are consistent with the geographic characterist area.Very high areas are mostly areas with low elevation, while areas wi tion cover are mostly very low areas.According to the distribution resu prediction results of RF-TPE are mostly distributed in low, median, and h the prediction results of GBDT-TPE are the opposite.SWA considers the pr of the four models simultaneously and assigns different weights to mak

Model Validation and Comparison
The evaluation of model performance plays a key role in LSM.In this study, four ML models are selected as the base classifier, and BO is used to obtain their optimal parameter combinations, on top of which the SWA method is used for model fusion.There are 1531 landslide points and 1761 non-landslide points in the study area, and the dataset is divided into a training set and a testing set according to the ratio of 7:3.In this paper, six frequently used model performance metrics are selected for comparative study.All models are built and optimized using the Python libraries: Scikit-Learn 1.0.1 and Hyperopt 0.2.5.All assessment metrics are obtained through five rounds of cross-validation [64].

Model Validation and Comparison
The evaluation of model performance plays a key role in LSM.In this study, four ML models are selected as the base classifier, and BO is used to obtain their optimal parameter combinations, on top of which the SWA method is used for model fusion.There are 1531 landslide points and 1761 non-landslide points in the study area, and the dataset is divided into a training set and a testing set according to the ratio of 7:3.In this paper, six frequently used model performance metrics are selected for comparative study.All models are built and optimized using the Python libraries: Scikit-Learn 1.0.1 and Hyperopt 0.2.5.All assessment metrics are obtained through five rounds of cross-validation [64].
Considering the classification effect of landslides and non-landslides (Table 2), GBDT-TPE (90.88%) outperforms the other models in terms of accuracy, followed by RF-TPE with SVM-TPE (90.37%) and LR-TPE (88.15%).In terms of the classification accuracy of landslide points in prediction results, SVM-TPE achieves the best Precision (90.20%), followed closely by GBDT-TPE (89.11%), then RF-TPE (88.45%) and LR-TPE models (85.40%).In terms of the accuracy of landslide point classification in the test set, GBDT-TPE obtains the highest Recall (91.09%), followed by RF-TPE (Recall = 90.63%),SVM-TPE (89.22%), and LR-TPE (88.69%).In terms of F1-Score, GBDT-TPE obtains the highest F1-Score (0.901), followed by SVM-TPE (Recall = 0.897), RF-TPE (0.895), and LR-TPE (0.870).The Kappa coefficient is used to measure the prediction accuracy and the best performance is obtained for GBDT-TPE and RF-TPE (0.817), followed by SVM-TPE (0.807) and finally LR-TPE (0.761).Finally, we compare the single model with SWA (Figure 9).Except for Precision, the rest of the metrics achieve the best performance with AUC = 0.967 and Accuracy = 91.39%,while the overfitting is greatly alleviated.This makes the LSM more realistic, while the robustness of the model is improved.Finally, we compare the single model with SWA (Figure 9).Except for Precision, the rest of the metrics achieve the best performance with AUC = 0.967 and Accuracy = 91.39%,while the overfitting is greatly alleviated.This makes the LSM more realistic, while the robustness of the model is improved.

Analysis of Model Interpretability
The SHAP method [65] is often used to solve ML model interpretability problems.In interpreting the ML models, the SHAP values (SVs) can be understood as the importance of the contribution of the selected factors to the predicted values of the model.In the SHAP summary plots, the vertical coordinates represent the ranking of feature importance, from top to bottom, from most important to least important.The horizontal axis indicates the SV size of the selected factors [66], and the color of the selected factors represents the size of the value taken by the factor.When SV > 0, the larger the SV value, the more it promotes the occurrence of landslides, and conversely, when SV < 0, the smaller the SV, the more it inhibits the occurrence of landslides.
The Shapley value for feature i X in a model is shown above, where p is the total number of features, \ { } Ni is a set of all possible combinations of features excluding i X , S is a feature set in fS is the model prediction with features in S , and is the model prediction with features in S plus feature i X .In terms of topographic factors, elevation has the highest SV value (Figure 10).It is worth mentioning that this is not to suggest that elevation is a trigger condition for landslides to occur, but in the area of lower elevation, the terrain is flat, human engineering activities are frequent, and cutting slopes to build roads will destroy the original geotechnical structure.At the same time, there is little vegetation cover, soil erosion is serious, and erosion of geotechnical bodies is aggravated by heavy rainfall, which very easily causes damage.The lower-elevation area is coupled with many landslide susceptibility factors.In terms of environmental factors, NDVI has the highest SV value.Lower vegetation cover or near-water bodies [67] are mostly high-landslide-susceptibility areas, while higher NDVIs are usually lowlandslide-susceptibility areas.In terms of ecological factors, Land cover (LC) has the highest SV value.LC reflects the interaction between human activities and the natural environment [68], and areas with dense built-up areas or areas with dense cropland are

Analysis of Model Interpretability
The SHAP method [65] is often used to solve ML model interpretability problems.In interpreting the ML models, the SHAP values (SVs) can be understood as the importance of the contribution of the selected factors to the predicted values of the model.In the SHAP summary plots, the vertical coordinates represent the ranking of feature importance, from top to bottom, from most important to least important.The horizontal axis indicates the SV size of the selected factors [66], and the color of the selected factors represents the size of the value taken by the factor.When SV > 0, the larger the SV value, the more it promotes the occurrence of landslides, and conversely, when SV < 0, the smaller the SV, the more it inhibits the occurrence of landslides.

shape(
The Shapley value for feature X i in a model is shown above, where p is the total number of features, N\{i} is a set of all possible combinations of features excluding X i ,S is a feature set in N\{i}, f (S) is the model prediction with features in S, and f (S ∪ {i}) is the model prediction with features in S plus feature X i .In terms of topographic factors, elevation has the highest SV value (Figure 10).It is worth mentioning that this is not to suggest that elevation is a trigger condition for landslides to occur, but in the area of lower elevation, the terrain is flat, human engineering activities are frequent, and cutting slopes to build roads will destroy the original geotechnical structure.At the same time, there is little vegetation cover, soil erosion is serious, and erosion of geotechnical bodies is aggravated by heavy rainfall, which very easily causes damage.The lower-elevation area is coupled with many landslide susceptibility factors.In terms of environmental factors, NDVI has the highest SV value.Lower vegetation cover or near-water bodies [67] are mostly highlandslide-susceptibility areas, while higher NDVIs are usually low-landslide-susceptibility areas.In terms of ecological factors, Land cover (LC) has the highest SV value.LC reflects the interaction between human activities and the natural environment [68], and areas with dense built-up areas or areas with dense cropland are usually high-susceptibility areas, while shrub forest areas are usually low-susceptibility areas.In summary, NDVI is the most influential factor for landslide occurrence.It is recommended to build support projects in landslide-affected areas to prevent residents from being injured by falling rocks, with emphasis on ecological prevention and control to maintain soil and water stability [69].After incorporating the spatial database of landslides, a geohazard project with monitoring and early warning was established to determine the status of landslides and set the warning level for the reference of the local government [70].
usually high-susceptibility areas, while shrub forest areas are usually low-susceptibility areas.In summary, NDVI is the most influential factor for landslide occurrence.It is recommended to build support projects in landslide-affected areas to prevent residents from being injured by falling rocks, with emphasis on ecological prevention and control to maintain soil and water stability [69].After incorporating the spatial database of landslides, a geohazard project with monitoring and early warning was established to determine the status of landslides and set the warning level for the reference of the local government [70].
However, factors such as DTF and yearly average rainfall, which are crucial for landslide susceptibility studies, ranked low instead.Such factors require the necessary attention.Landslide susceptibility analysis requires comprehensive consideration.The experimental results show that the ensemble learning method obtains a higher prediction accuracy than the base classifier.SWA can better alleviate the overfitting and reduce bias.The four basic classifiers used in this study differ significantly in terms of classification effect and overfitting, and the greater the difference between the models, the better the results obtained by fusion will be.The final combination of weight values for SWA is [0.0014, 0.2866, 0.0404, 0.6716].Models with better classification performance (GBDT-TPE) and fitting effectiveness (SVM-TPE) are given higher weight values.At the same time, models with weak classification performance (LR-TPE) and severe overfitting (RF-TPE) are given lower weight values.The combination of weights corresponds to the results of the analysis in Section 4.2.The above analysis shows that the performance improvement brought by the SWA is a guideline for decision-makers to analyze landslide susceptibility based on heuristic experimental results.

Exploration of Model Generalization Ability
To discuss the generalization ability of SWA, the four base models retain their original parameters in Section 5.2 and SWA is implemented on this basis.This study selects two scholars' publicly available landslide inventory datasets for testing.Firstly, the dataset of landslides in Yanshan County, China (Figure 11a), was made public by Fang et al. [71].The basic geomorphology of Yanshan County is characterized by a high south and a low north.This dataset contains 380 historical landslide hazard sites, all of which are shallow landslides.The experimental results show that the ensemble learning method obtains a higher prediction accuracy than the base classifier.SWA can better alleviate the overfitting and reduce bias.The four basic classifiers used in this study differ significantly in terms of classification effect and overfitting, and the greater the difference between the models, the better the results obtained by fusion will be.The final combination of weight values for SWA is [0.0014, 0.2866, 0.0404, 0.6716].Models with better classification performance (GBDT-TPE) and fitting effectiveness (SVM-TPE) are given higher weight values.At the same time, models with weak classification performance (LR-TPE) and severe overfitting (RF-TPE) are given lower weight values.The combination of weights corresponds to the results of the analysis in Section 4.2.The above analysis shows that the performance improvement brought by the SWA is a guideline for decision-makers to analyze landslide susceptibility based on heuristic experimental results.

Exploration of Model Generalization Ability
To discuss the generalization ability of SWA, the four base models retain their original parameters in Section 5.2 and SWA is implemented on this basis.This study selects two scholars' publicly available landslide inventory datasets for testing.Firstly, the dataset of landslides in Yanshan County, China (Figure 11a), was made public by Fang et al. [71].The basic geomorphology of Yanshan County is characterized by a high south and a low north.This dataset contains 380 historical landslide hazard sites, all of which are shallow landslides.For the accuracy of a single model, SVM-TPE slightly decreases compared to SVM, but the other three models outperform LR, CNN, and RNN.In terms of Recall, the overall Recall value of the five selected models is lower than that of the model in the original paper.However, in terms of Precision, the Precision of five selected models is much higher than that of the original paper model.In the original paper, Simple averaging (SA) and

Figure 1 .
Figure 1.The map of study area.

Figure 1 .
Figure 1.The map of study area.

Figure 3 .
Figure 3.The flowchart of the developed methodology.Figure 3. The flowchart of the developed methodology.

Figure 3 .
Figure 3.The flowchart of the developed methodology.Figure 3. The flowchart of the developed methodology.

1
LR x represents the predicted probability of LR1 on the testing set, and wLR1 represents the weight value of 1 LR

Figure 5 .
Figure 5.The framework of stratified weighted averaging.

Figure 9 .
Figure 9.The ROC curves of all models.

Figure 9 .
Figure 9.The ROC curves of all models.

Figure 10 .
Figure 10.SV for 5 models: (a) LR-TPE, (b) SVM-TPE, (c) RF-TPE, (d) GBDT-TPE, and (e) SWA.However, factors such as DTF and yearly average rainfall, which are crucial for landslide susceptibility studies, ranked low instead.Such factors require the necessary attention.Landslide susceptibility analysis requires comprehensive consideration.

Table 2 .
List of model metrics.

Table 2 .
List of model metrics.

Table 6 .
The parameter space of GBDT.

Table 6 .
The parameter space of GBDT.