Rainfall Induced Landslide Susceptibility Mapping Based on Bayesian Optimized Random Forest and Gradient Boosting Decision Tree Models—A Case Study of Shuicheng County, China

: Among the most frequent and dangerous natural hazards, landslides often result in huge casualties and economic losses. Landslide susceptibility mapping (LSM) is an excellent approach for protecting and reducing the risks by landslides. This study aims to explore the performance of Bayesian optimization (BO) in the random forest (RF) and gradient boosting decision tree (GBDT) model for LSM and applied in Shuicheng County, China. Multiple data sources are used to obtain 17 conditioning factors of landslides, Borderline-SMOTE and Randomundersample methods are combined to solve the imbalanced sample problem. RF and GBDT models before and after BO are adopted to calculate the susceptibility value of landslides and produce LSMs and these models were compared and evaluated using multiple validation approach. The results demonstrated that the models we proposed all have high enough model accuracy to be applied to produce LSM, the performance of the RF is better than the GBDT model without BO, while after adopting the Bayesian optimized hyperparameters, the prediction accuracy of the RF and GBDT models is improved by 1% and 7%, respectively and the Bayesian optimized GBDT model is the best for LSM in this four models. In summary, the Bayesian optimized RF and GBDT models, especially the GBDT model we proposed for landslide susceptibility assessment and LSM construction has a very good application performance and development prospects.


Introduction
Landslides are one of the most common natural hazards and when they occur, they usually cause loss of life and significant economic losses [1,2].How to assess the risk of landslides effectively Water 2020, 12, 3066; doi:10.3390/w12113066www.mdpi.com/journal/water Water 2020, 12, 3066 2 of 22 has always been the focus and difficulty to reduce disaster risk [3].Risk is composed of the hazard of disaster, the vulnerability and exposure of victims and the disaster preparedness and mitigation capacity.Meanwhile, the hazard is composed of the susceptibility and probability of inducing factors.
For landslides, the susceptibility assessment is the key point of the risk assessment.Landslide susceptibility mapping (LSM) is a very excellent approach of susceptibility assessment because it can provide information on the potential area of landslides occurrence [4][5][6].The core supposition of LSM is that future landslides more likely to occur under the same or similar environmental conditions as previous hazards [7].Hence, LSM can predict the potential area of future hazards occurrence by considering the historical disaster locations and their conditioning factors.However, the results of LSM may be affected by prediction models, which means that it is very important to choose suitable research methods for ensuring the availability and scientific validity of the LSM.
There are various methods to map the susceptibility of landslides including traditional mathematical and statistical models and advanced machine learning techniques.Traditional statistical analysis methods can be utilized, such as weight of evidence, frequency ratio [8], even series and parallel model [9] and other constructing weighted index system methods [10].Various machine learning algorithms, such as logistic regression [11,12] naive Bayes [13], decision tree [14], support vector machines [15,16], genetic algorithm [17], artificial neural network [18,19], convolutional neural network [20], recurrent neural network [21], random forest (RF) [22,23] and even fuzzy mathematics [24] can also be used in LSM.However, due to the complex environmental factors system of landslides in different study areas, the accuracy and scientific nature of LSM drawn by each model is very important.Therefore, the evaluations of models, including their advantages and applicability, are very important to obtain a satisfactory LSM.
As one of the best algorithms of actual distribution in machine learning, the gradient boosting decision tree (GBDT) model is rarely involved in the previous research of LSM.It works very well for classification issues such as the calculation of landslides susceptibility and it is often compared with the RF model.Besides, for different machine learning models, most of the related studies only use the default value of initial parameters (as known as hyperparameters) and do not consider the optimization problem, although the parameter adjustment of machine learning model is extremely important that also needs attention to optimize the model performance [25,26].
For landslides, rainfall is one of the most important triggering factors, especially short-term and instantaneous extreme rainfall [27][28][29].The hazard assessment of landslide events triggered by the intensity, duration and type of rainfall has long been a question of great interest in a wide range of fields [30][31][32][33].The research goal of this study is to suggest methods to carry out landslide susceptibility analysis using RF and GBDT model and to evaluate Bayesian optimization (BO) in RF and GBDT models in LSM.In this study, we collected multi-source data such as field survey data, precipitation data and remote sensing satellite data from Shuicheng County, China.And used the traditional RF and the advanced GBDT machine learning models to construct LSM, respectively and compared their performance and applicability in the study area.Besides, we used the Bayesian algorithm to optimize the hyperparameters of RF and GBDT to improve the accuracy of models and evaluated the effect of Bayesian optimization on improving the accuracy of the model using multiple validation approach.Based on the above methods, we constructed the LSM of Shuicheng County to provide a reference for the prevention and reduction of landslides.The highlights of this paper include: The GBDT model is carried out to evaluate and map the landslide susceptibility and compared with the RF model; Borderline-SMOTE and Randomundersample are combined to solve the imbalanced sample problem; Bayesian optimization is used to adjust model hyperparameters; Evaluating model accuracy using multiple validation indexes.

Study Area
Shuicheng County belongs to Liupanshui, Guizhou, China.The area is about 3605 km 2 and the population is about 754,900 [34] (Figure 1).Shuicheng County is one of the most landslide-prone areas in China, one of the reasons is because it is karst landscape with easily leaking surface water and high soil moisture content.The annual average precipitation (AAP) of study area is about 1100 mm, precipitation is the major inducing factor for landslides.On 23 July 2019, a rainfall induced landslide occurred in the study area, which caused the destruction of many houses and roads and the death of villagers [35,36].According to the Guizhou Meteorological Bureau, between 18 and 23 July, Jichang Town experienced three periods of heavy rain shortly before the landslide: the night of 18 July from 19 July to 20 July and the night of 22 July.The cumulative rainfall at this site reached 189.1 mm for 18-23 July and 98 mm for 22-23 July.[35].
Water 2020, 12, x FOR PEER REVIEW 3 of 22 Shuicheng County belongs to Liupanshui, Guizhou, China.The area is about 3605 km 2 and the population is about 754,900 [34] (Figure 1).Shuicheng County is one of the most landslide-prone areas in China, one of the reasons is because it is karst landscape with easily leaking surface water and high soil moisture content.The annual average precipitation (AAP) of study area is about 1100 mm, precipitation is the major inducing factor for landslides.On 23 July 2019, a rainfall induced landslide occurred in the study area, which caused the destruction of many houses and roads and the death of villagers [35,36].According to the Guizhou Meteorological Bureau, between 18 and 23 July, Jichang Town experienced three periods of heavy rain shortly before the landslide: the night of 18 July from 19 July to 20 July and the night of 22 July.The cumulative rainfall at this site reached 189.1 mm for 18-23 July and 98 mm for 22-23 July.[35].

Data
The monitoring of landslides in Shuicheng County is done using field surveys, remote sensing images and combining the landslides' historical locations recorded by the China Geological Survey during 1999-2018 [37].These points are the centroid of landslide scarp, which has been proved the best landslide sampling strategy [38].Combined with multi-source data, the landslide inventory was collected and 240 landslides sites were obtained (Figure 1).
The conditioning factors are extremely important for landslide susceptibility assessment [39].A total of 17 conditioning factors were selected based on their impact on the landslides and the data accessibility (Table 1).To store these conditioning factors into a uniform attribute table, according to the DEM pixel size, all of the factors' pixel size was resampled to 30 m × 30 m.

Data
The monitoring of landslides in Shuicheng County is done using field surveys, remote sensing images and combining the landslides' historical locations recorded by the China Geological Survey during 1999-2018 [37].These points are the centroid of landslide scarp, which has been proved the best landslide sampling strategy [38].Combined with multi-source data, the landslide inventory was collected and 240 landslides sites were obtained (Figure 1).
The conditioning factors are extremely important for landslide susceptibility assessment [39].A total of 17 conditioning factors were selected based on their impact on the landslides and the data accessibility (Table 1).To store these conditioning factors into a uniform attribute table, according to the DEM pixel size, all of the factors' pixel size was resampled to 30 m × 30 m. Topography is the most dominant factor in slopes stabilities [1].In this study, topography factors are calculated by the Digital Elevation Model (DEM) with 30 m × 30 m pixel size which was used of Advanced Spaceborne Thermal Emission and Reflection Radiometer Digital Elevation Model (ASTER DEM) data jointly developed by METI, Japan and NASA, the USA provided by the Geospatial Data Cloud site [40].Elevation, slope, aspect, plan curvature and profile curvature data were extracted from DEM. Elevation affects the degree of rock weathering in landslides assessment (Figure 2a).The slope is another important factor that can reflect the steepness of the topography [41] (Figure 2b).Generally speaking, the greater the slope, the higher the possibility of landslides, when other conditions are the same.Aspect can affect solar radiation and precipitation, thus affecting soil moisture (Figure 2c).Moreover, plan curvature and profile curvature were chosen in LSM [42] (Figure 2d,e).Topography is the most dominant factor in slopes stabilities [1].In this study, topography factors are calculated by the Digital Elevation Model (DEM) with 30 m × 30 m pixel size which was used of Advanced Spaceborne Thermal Emission and Reflection Radiometer Digital Elevation Model (ASTER DEM) data jointly developed by METI, Japan and NASA, the USA provided by the Geospatial Data Cloud site [40].Elevation, slope, aspect, plan curvature and profile curvature data were extracted from DEM. Elevation affects the degree of rock weathering in landslides assessment (Figure 2a).The slope is another important factor that can reflect the steepness of the topography [41] (Figure 2b).Generally speaking, the greater the slope, the higher the possibility of landslides, when other conditions are the same.Aspect can affect solar radiation and precipitation, thus affecting soil moisture (Figure 2c).Moreover, plan curvature and profile curvature were chosen in LSM [42] (Figure 2d,e).Lithology affects the shear strength and permeability of slopes, which is another important conditioning factor for landslides occurrences [43].Geological age can also characterize the development degree of regional lithology.Faults control the formation and development of geological hazards and geological processes are more active in the vicinity of faults.The lithology map and geological age map were digitized from geological maps obtained from the China Geology Survey [37] (Figure 2f,g) and we were able to calculate the distance to faults by spatial interpolation (Figure 2h).
Roads can also reflect the possible influence of human activities on geological hazards to a certain extent.Meanwhile, the traffic on roads can cause vibration to destabilize rock material.The Lithology affects the shear strength and permeability of slopes, which is another important conditioning factor for landslides occurrences [43].Geological age can also characterize the development degree of regional lithology.Faults control the formation and development of geological hazards and geological processes are more active in the vicinity of faults.The lithology map and geological age map were digitized from geological maps obtained from the China Geology Survey [37] (Figure 2f,g) and we were able to calculate the distance to faults by spatial interpolation (Figure 2h).
Water 2020, 12, 3066 8 of 22 Roads can also reflect the possible influence of human activities on geological hazards to a certain extent.Meanwhile, the traffic on roads can cause vibration to destabilize rock material.The closer the distance to the road, the higher is the possibility that geo-hydrological hazards will occur.Therefore, we selected the distance to roads as a conditioning factor (Figure 2i).
Hydrological factors are the factors that must be considered in geological hazards.The surface river is one of the most active factors in external dynamic geological processes, distance to rivers can clearly express the influence of surface water on the landslides' susceptibility.(Figure 2j).Also, this study selected four kinds of hydrological indexes which are mainly used in the study of landslides, including the stream power index (SPI), the sediment transport index (STI), the topographic relief index (TRI) and the topographic wetness index (TWI) [25].Among them, SPI is the movement of strong particulates as gravity acts on the sediment.(Figure 2k).The STI represents slope failure and deposition [25] Figure 2l).TRI is the difference in elevation between the highest and lowest in an area.(Figure 2m).TWI represents the effect of different terrains on saturation degree and surface runoff location [25] (Figure 2n).The calculation equations of these four hydrological indices are as follows: where A S represents the catchment area (m 2 /m), β is the slope of each grid [44], DEM MAX and DEM MIN are the maximum and minimum DEM value of eight grids around each grid, respectively.All the variables in those equations can be extracted by DEM data using "Fill," "Flow Accumulation," "Flow Direction," "Neighborhood Statistics" and "Raster Calculator" tools in ArcGIS.Land cover, especially vegetation cover, is often used as a factor in determining landslide stability [45].Hence, we chose land cover and the normalized difference vegetation index (NDVI) as conditioning factors [46] (Figure 2o,p).The land cover map was obtained at http://data.ess.tsinghua.edu.cn [47] and the NDVI data was calculated through Landsat8 OLI satellite remote sensing digital products using the band algebra tool of the Environment for Visualizing Images (ENVI) software [48].
Rainfall is the major inducing factor for landslides in the study area.Meanwhile, rainfall also reflects the soil hydrology.We chose the AAP as a conditioning factor to reflect the effect of rainfall (Figure 2q).The rainfall data was download at China Meteorological Data Service Center [49] and the annual average rainfall data are averages from 1981-2018.

Methods
The LSM framework in this research can be divided into four steps and can be visualization in Figure 3.The first step is data collection: generating the list of landslides and the maps of various conditioning factors.The second step is data processing: listing the landslides inventory and its conditioning factors and dividing the study area into grids and preparing the samples including training set and test set.The third step is LSM generation: using the RF model, GBDT model and Bayesian optimized RF (RF_B) and GBDT (GBDT_B) models to train the training set samples and producing the LSM.The last step is model comparison and validation: comparing and verifying the difference among the four models using various evaluation methods to models.

Data Pretreatment
First, the study area was divided into 30 × 30 m grids.Then, in this study, the continuous variables were reclassified into five classes using natural fracture method and the discrete variables were reclassified based on the ratio of landslides count in various grades to the area.Finally, the factor class of each grid were inputted into the attribute table for the preparation of the following process of samples and LSM.

Imbalanced Sample Problem and Sample Preparation
The imbalanced sample problem is must be considered in the susceptibility assessment of landslides [50].Three commonly used processing methods include: (1) oversampling the minority class samples, (2) undersampling the majority class samples and (3) weighting of two kinds of samples [51].Borderline-SMOTE is an improved oversampling algorithm based on Synthetic Minority Over-sampling Technique (SMOTE) (Verbiest et al. 2014), it only uses a few class samples on the boundary to synthesize new samples to improve the class distribution of samples.The common undersampling method is the Randomundersample, which selects samples randomly in the majority class.
Because there are 240 disaster points in the study area and the amount of positive sample (disaster point sample) is too small, after many experiments, the prediction effect is the most obvious and not overfitting after magnifying the disaster point sample by 1 time.In this paper, the samples were preprocessed by combining Borderline-SMOTE and Randomundersample methods, these processes were implemented in Python 3.7 environment.The specific process is as follows: first, taking all the grid data as input data and undersampling non-disaster point (negative) samples, repeating twice to obtain 480 non-disaster point samples, then taking them and 240 disaster points as input data, using the Borderline-SMOTE module to oversample the disaster point samples to generate new disaster point samples and got 480 disaster point samples.Unifying the positive and negative samples into one database.And finally, the samples were divided into training set (70%; 672) and test set (30%; 288) randomly.

Data Pretreatment
First, the study area was divided into 30 × 30 m grids.Then, in this study, the continuous variables were reclassified into five classes using natural fracture method and the discrete variables were reclassified based on the ratio of landslides count in various grades to the area.Finally, the factor class of each grid were inputted into the attribute table for the preparation of the following process of samples and LSM.

Imbalanced Sample Problem and Sample Preparation
The imbalanced sample problem is must be considered in the susceptibility assessment of landslides [50].Three commonly used processing methods include: (1) oversampling the minority class samples, (2) undersampling the majority class samples and (3) weighting of two kinds of samples [51].Borderline-SMOTE is an improved oversampling algorithm based on Synthetic Minority Over-sampling Technique (SMOTE) (Verbiest et al. 2014), it only uses a few class samples on the boundary to synthesize new samples to improve the class distribution of samples.The common undersampling method is the Randomundersample, which selects samples randomly in the majority class.
Because there are 240 disaster points in the study area and the amount of positive sample (disaster point sample) is too small, after many experiments, the prediction effect is the most obvious and not overfitting after magnifying the disaster point sample by 1 time.In this paper, the samples were preprocessed by combining Borderline-SMOTE and Randomundersample methods, these processes were implemented in Python 3.7 environment.The specific process is as follows: first, taking all the grid data as input data and undersampling non-disaster point (negative) samples, repeating twice to obtain 480 non-disaster point samples, then taking them and 240 disaster points as input data, using the Borderline-SMOTE module to oversample the disaster point samples to generate new disaster point samples and got 480 disaster point samples.Unifying the positive and negative samples into one database.And finally, the samples were divided into training set (70%; 672) and test set (30%; 288) randomly.

RF Model
RF model is a classification algorithm proposed by Leo Breiman [52].RF can be considered as a collection of numerous random decision trees, which are Classification and Regression Tree (CART) generally.Through the bootstrap resampling technique, some samples in the training set are randomly and repeatedly selected to train the decision tree and then many decision trees are generated to form a random forest [22].Essentially, it is created by combining multiple decision trees, each of which relies on independent samples, after randomly generating many decision trees, samples can choose the best classification by the statistical results of each tree.The advantages of the RF model include: randomly choosing samples in decision trees can avoid overfitting to some degree; randomly choosing samples can enhance noise resistance; it can handle high dimensions sample without factor screening.

GBDT Model
GBDT also called Multiple Additive Regression Tree (MART) is an iterative decision tree algorithm proposed by Friedman [53].GBDT model uses the gradient descent method and combines the decision tree method with the bagging and boosting algorithm to solve the over-fitting problem of traditional decision trees [54].In recent years, it has attracted people's attention because of the machine learning model which is used to search and sort.It consists of many decision trees and takes the cumulative results of trees as the final result.GBDT belongs to Boosting ensemble learning but the difference from the traditional AdaBoost algorithm is, GBDT generates a weak classifier (usually using CART regression tree) through multiple iterations, each classifier is trained based on the residual of the last iteration of the classifiers.The results are finalized by the weighted summation of the weak classifiers in each iteration.GBDT model can ultimately be described as: M is the number of iterations and T(x, θ m ) is the weak classifier generated in each iteration, θ m is the loss function which can be described as: where F m−1 (x i ) is the present iteration and GBDT minimizes the θ m to establish the parameters of the next classifier.Each round of training can reduce the loss function as much as possible to achieve the local optimal solution or the global optimal solution.GBDT is highly generalizable and very suitable for classification and prediction.

Bayesian Optimization
In machine learning, adjusting the initial parameters (also known as hyperparameters) is a tedious but crucial task, because it significantly influences the algorithm performance.It is very time-consuming to adjust the parameters manually and the random grid search also needs a long running time.Therefore, many methods of automatically adjusting hyperparameters have been proposed.BO is a method of finding the minimum value of functions, which has been applied to hyperparametric search in machine learning problems [55].It establishes an alternative function based on the past evaluation results of the objective function to find the value of the minimized objective function.Compared with random grid search, its advantage is that when selecting parameters in each iteration, it will refer to the previous evaluation results, which greatly saves search time and improves optimization efficiency.BO includes four main contents: (1) the loss on the verification set using this set of hyperparameters in machine learning; (2) search space: the value range of the hyperparameters to be searched; (3) optimization algorithm: the method of constructing objective function and selecting RF model and GBDT model have various hyperparameters.Due to the small sample size of this study, we selected the number of weak classifiers (N_Estimators), the maximum depth for each decision tree (Max_Depth), the minimum number of samples on leaf nodes (Min_Samples_Leaf), the maximum number of leaf nodes (Max_Leaf_Nodes) in RF and GBDT models and the weight reduction factor for each weak classifier (Learning_Rate), Subsampling ratio (Subsample) in the iterative pattern-based GBDT model as hyperparameters for BO.These hyperparameters and their default value and search spaces were listed in Table 2.

Model Evaluation
The Precision, Recall, F1 value, Accuracy, the over prediction rate (OPR), the unpredicted presence rate (UPR), Matthews correlation coefficient (MCC) and receiver operating characteristic (ROC) curve methods were selected to verify the models' accuracy.Those methods are based on the statistics of true positive (TP), false positive (FP), true negative (TN) and false negative (FN).
The Precision refers to the proportion of correctly categorized grids identified by the model [56]: The Recall is the proportion of the landslide grids rightly detected by the model, it is calculation formula as follows: The F1 value is the weighted harmonic average of Precision and Recall, which can be calculated by the following formula: The Accuracy is the proportion that the model can correctly classify all positive and negative samples, which can be estimated using Equation (10): The OPR also called the commission error, it is the proportion of the wrongly classified disaster grids identified by model: The UPR is the proportion that the model fails to classify correctly in the actual disaster point grid which is calculated using Equation ( 12): The MCC can evaluate a binary classification model, even when the sample sizes for the two categories are very different [57].The MCC formula is as follows: The range of MCC values is from -1 to 1.The larger the value, the more accurate the model.The ROC curve is a common method for evaluating landslide prediction models [58,59].It is plotted based on the "Sensitivity" and the "1-Specificity."The sensitivity and specificity are calculated as follow: The model performance can be indicated by calculating the area under the curve (AUC) [60].The threshold for AUC values is 0.5 to 1, the closer it is to 1, the more accurate the model is [61].

Feature Importance
As the black-box model of mechanical learning, both RF and GBDT models can adapt to a large number of features for training, so we did not take the factor screening step.For the part of feature Importance, we took advantage of the feature ranking ability of the GBDT model and the improvement of performance by Bayesian optimization, the GBDT_B model is finally selected to calculate the importance of each factor, as shown in Figure 4.Among them, the elevation is the most important factor of landslides, its importance is far more than other factors, more than 0.15, The second important factor is precipitation, which is more than 0.9 in importance.Plan curvature and distance to faults are also responsible for the occurrence of landslides, while TWI, SPI factor has little effect on it.
Water 2020, 12, x FOR PEER REVIEW 12 of 22 The MCC can evaluate a binary classification model, even when the sample sizes for the two categories are very different [57].The MCC formula is as follows: The range of MCC values is from -1 to 1.The larger the value, the more accurate the model.The ROC curve is a common method for evaluating landslide prediction models [58,59].It is plotted based on the "Sensitivity" and the "1-Specificity."The sensitivity and specificity are calculated as follow: The model performance can be indicated by calculating the area under the curve (AUC) [60].The threshold for AUC values is 0.5 to 1, the closer it is to 1, the more accurate the model is [61].

Feature Importance
As the black-box model of mechanical learning, both RF and GBDT models can adapt to a large number of features for training, so we did not take the factor screening step.For the part of feature Importance, we took advantage of the feature ranking ability of the GBDT model and the improvement of performance by Bayesian optimization, the GBDT_B model is finally selected to calculate the importance of each factor, as shown in Figure 4.Among them, the elevation is the most important factor of landslides, its importance is far more than other factors, more than 0.15, The second important factor is precipitation, which is more than 0.9 in importance.Plan curvature and distance to faults are also responsible for the occurrence of landslides, while TWI, SPI factor has little effect on it.

Results of Bayesian Optimization
In this study, the Tree Parzen Estimator (TPE) function in the Hyperopt library was used as the optimization algorithm for BO in the python 3.7 environments.Each optimization ran 500 iterations and the losses in the iterative process of Bayesian optimization are illustrated in Figure 5.The minimum loss of RF is −0.7185 and constringes to the 429th iteration, while the minimum loss of the GBDT model is lower and constringes after the 396th iteration.Therefore, it can be seen that the Bayesian optimization is more effective in improving GBDT compared to the RF model.
Water 2020, 12, x FOR PEER REVIEW 13 of 22 In this study, the Tree Parzen Estimator (TPE) function in the Hyperopt library was used as the optimization algorithm for BO in the python 3.7 environments.Each optimization ran 500 iterations and the losses in the iterative process of Bayesian optimization are illustrated in Figure 5.The minimum loss of RF is −0.7185 and constringes to the 429th iteration, while the minimum loss of the GBDT model is lower and constringes after the 396th iteration.Therefore, it can be seen that the Bayesian optimization is more effective in improving GBDT compared to the RF model.Table 3 shows the results of the hyperparameters optimized by BO.The number of trees is 252 and the max depth of each tree is 42 in the RF model, while the GBDT model needs 310 iterations to achieve the optimal result with 47 max depth of each iteration.The value of Max_Leaf_Nodes shows that all the factors are used in the two models, indicating that all the factors we used are associated with landslide occurrence.Also, the unique hyperparameters "learning rate" and "subsampling" of the GBDT model are 0.30346 and 0.95475 respectively, these two hyperparameters are the most crucial in reducing function loss of the GBDT model.Table 3 shows the results of the hyperparameters optimized by BO.The number of trees is 252 and the max depth of each tree is 42 in the RF model, while the GBDT model needs 310 iterations to achieve the optimal result with 47 max depth of each iteration.The value of Max_Leaf_Nodes shows that all the factors are used in the two models, indicating that all the factors we used are associated with landslide occurrence.Also, the unique hyperparameters "learning rate" and "subsampling" of the GBDT model are 0.30346 and 0.95475 respectively, these two hyperparameters are the most crucial in reducing function loss of the GBDT model.

LSMs Based Multiple Models
In this study, the RF, GBDT and the RF_B, GBDT_B models were constructed by using the Scikit-Learn library in the python 3.7 environments.By training various models, the probability of landslides occurrence in each grid was obtained as the susceptibility value and the LSMs produced by ArcGIS software through the spatial distribution of susceptibility values.To better visualize the graphics, the susceptibility values were reclassified into five classes by the Natural Breaks approach as shown in Figure 6.In these LSMs constructed by various models, the high-risk areas are very similar, all concentrated near the faults and high elevation area, which is similar to what we expected because most landslides occur in high elevation and near fault zones.
Water 2020, 12, x FOR PEER REVIEW 14 of 22 graphics, the susceptibility values were reclassified into five classes by the Natural Breaks approach as shown in Figure 6.In these LSMs constructed by various models, the high-risk areas are very similar, all concentrated near the faults and high elevation area, which is similar to what we expected because most landslides occur in high elevation and near fault zones.Figure 7 shows the statistics of each grade of LSMs constructed by the four models, in which the distribution of each grade of RF and RF_B is very similar, with the highest proportion of low-level area (about 28.5%) and the smallest of very high-level area (about 8.2%).In the GBDT model, the proportion of very low is the highest (30.87%) and the proportion of the very high-level area is the Figure 7 shows the statistics of each grade of LSMs constructed by the four models, in which the distribution of each grade of RF and RF_B is very similar, with the highest proportion of low-level area (about 28.5%) and the smallest of very high-level area (about 8.2%).In the GBDT model, the proportion of very low is the highest (30.87%) and the proportion of the very high-level area is the lowest (11.73%).However, the grade distribution of the GBDT_B model is different from other models, very low area accounts for 57.37%, very high area is 21.80%, while the proportion of the middle three grades is very small and very close.
Water 2020, 12, x FOR PEER REVIEW 15 of 22 the low and very low areas.This indicates the good accuracy of all four models.Compared to the RF model, the GBDT model has a higher Pli at both very low and very high levels, indicating that the GBDT model is more sensitive to positive samples and less effective than RF in predicting negative samples.Comparing the four models, GBDT_B has the best Pli distribution, which also indicates that it may be the best model and the specific model validation results are compared and analyzed in 4.4.

Model Comparison and Validation
For model verification, this study selects a variety of evaluation methods based on TP, TN, FP, FN to evaluate different models.Table 5 shows the multiple validation indexes of the RF and GBDT models and their Bayesian optimized models.Table 4 shows the quantitative results for the comparison of historical landslides of each grade in LSMs.As can be seen from the table, the percentage of historical landslides in each grade to total landslides (P li ) is over 63% for all models in both the high and very high grades and very few in both the low and very low areas.This indicates the good accuracy of all four models.Compared to the RF model, the GBDT model has a higher P li at both very low and very high levels, indicating that the GBDT model is more sensitive to positive samples and less effective than RF in predicting negative samples.Comparing the four models, GBDT_B has the best P li distribution, which also indicates that it may be the best model and the specific model validation results are compared and analyzed in 4.4.

Model Comparison and Validation
For model verification, this study selects a variety of evaluation methods based on TP, TN, FP, FN to evaluate different models.Table 5 shows the multiple validation indexes of the RF and GBDT models and their Bayesian optimized models.
Water 2020, 12, 3066 16 of 22 A high Precision value indicates the model has a good prediction effect of the positive samples.The Precision values of the four models are in the following order: GBDT_B > RF_B > RF > GBDT.In other words, the Bayesian optimized model is more effective in predicting the occurrence of potential landslides, especially the GBDT_B model.A Higher Recall value indicates that the model is more sensitive to negative samples.Therefore, for the prediction of non-landslide points, RF_B has the best performance, while GBDT is the worst.F1 and Accuracy values are indicators for evaluating the overall prediction effect of the model.Besides, the MCC value is a parameter for testing the prediction performance of the model for binary classification and all of their order is GBDT_B > RF_B > RF > GBDT just like the precision.Figure 8 shows the ROC curves of four models.Among them, the AUC of GBDT is the smallest (0.796), followed by the RF model (0.845), while after BO, the AUC of RF_B is 0.860 and GBDT_B increases to 0.866.According to the results of various verification indexes, the order of model accuracy is GBDT_B > RF_B > RF > GBDT.Bayesian optimization improves the accuracy of the RF model by 1% and that of the GBDT model by 7%.As a whole, for the models we proposed, they all have high enough model accuracy to be applied to the prediction of the susceptibility of landslides.Besides, we referred the LSMs to the satellite images which show that most historical landslides occurred in high-risk areas, that can prove the availability of the models.performance of the model for binary classification and all of their order is GBDT_B > RF_B > RF > GBDT just like the precision.Figure 8 shows the ROC curves of four models.Among them, the AUC of GBDT is the smallest (0.796), followed by the RF model (0.845), while after BO, the AUC of RF_B is 0.860 and GBDT_B increases to 0.866.According to the results of various verification indexes, the order of model accuracy is GBDT_B > RF_B > RF > GBDT.Bayesian optimization improves the accuracy of the RF model by 1% and that of the GBDT model by 7%.

Discussion
This work aims to estimate the regional landslides susceptibility by using RF and GBDT model before and after Bayesian Optimization, discusses and compares the accuracy of different models.Based on the importance of factors obtained by the GBDT_B model and combined with the results of LSMs produced by different models.The area with high susceptibility of landslides shows a banded distribution, which is mainly affected by elevation and fault zone.The higher elevation is, the closer to the faults is, its geological activity is stronger and more prone to landslides, which is consistent with most related research results [42,62].As the main inducement of landslides in Shuicheng County, precipitation is second in importance only to elevation, indicating that the hydrological conditions of rock and soil have a significant effect in inducing landslides.To a certain extent, it provides a reference for the relationship between the landslide early warning and the extreme rainfall warning.
In the LSMs constructed with default parameters and Bayesian optimized RF model, the susceptibility is mostly concentrated in the medium and low levels and there is little difference in the overall and spatial distribution before and after optimization.This is mainly because for the RF model, BO mainly changes only the number of weak learners (decision trees), the number of trees expands from 100 to 252, which has little improvement in the whole model predictive performance.However, the class distribution of the GBDT model is different from the RF model, which decreases from low class to high class.After Bayesian optimization, the hierarchical distribution appeared polarized -more concentrated on very low and very high classes.Comparing the locations of actual disaster points, only 4 of the 240 landslide points are not in a very high area, which means there is no over-fitting phenomenon, it indicated that the accurate classification effect of the GBDT_B model is more obvious and the prediction of the susceptibility of landslides is clearer.
In this paper, a variety of indices and the AUC values were utilized to validate the effect of RF, GBDT and the improvement of Bayesian optimization.The validation calculations illustrate that RF and GBDT models can reach the basic accuracy standard and can be applied to the LSM.When the default values are selected for RF and GBDT modeling, the model accuracy (Accuracy = 0.736, AUC = 0.796) of GBDT is lower than that RF (Accuracy = 0.760, AUC = 0.845).As the more advanced machine learning model, GBDT does not achieve the accuracy of RF without adjusting parameters, which may be due to the difference in the operational mechanism of the two models.The RF model is repeatedly and randomly selecting the subsamples from the original training sample to construct multiple decision trees by using the bootstrap resampling technique, while the GBDT model minimizes the loss function by iterative calculation to determine the parameters of the next weak classifier to achieve the local optimal solution or the global optimal solution.Under the condition of no parameter adjustment, the GBDT model does not use the subsampling process, that is, all the samples are applied to each weak classifier, which will increase the variance and produce over-fitting.Meanwhile, if the Learning_Rate value is 1, the regularization will not be adopted, which will also lead to the reduction of the model accuracy.
After using Bayesian optimization to adjust the parameters, RF model accuracy improved by 1%, while the GBDT model improved by 7% and the GBDT_B accuracy is the highest of the four models and the AUC increased to the highest 0.866.This shows that in LSM construction, Bayesian optimization mainly improves the RF model by setting the optimal value of the number of weak learners and the maximum depth of each decision tree and has only a weak optimization effect.While for the GBDT model, in addition to increasing the number of weak learners and setting the maximum depth of each tree, the more important is setting the learning rate of each iteration and adjusting the subsamples proportion, thus significantly improving the performance of the model and give full play to the robust prediction ability of the GBDT model.
Meanwhile, compared with researches of landslide susceptibility mapping (LSM) constructed by other methods, the accuracy of the GBDT_B model is approximately 6% higher than logistic regression (accuracy = 0.742, AUC = 0.79) [12] and even higher than convolutional neural network (accuracy = 0.776, MCC = 0.555, AUC = 0.813) [20], which is the most advanced deep learning model at present.Besides, compared to many LSM studies, the model accuracy of the proposed method in this paper is higher than traditional logistic regression, frequency ratio and other mathematical and statistical models [8][9][10][11][12]25,63] and is not inferior to some advanced machine learning models, such as artificial neural network, recurrent neural network and conventional neural network [20,21,26,64,65] and its operational efficiency is much higher than most neural network models.Compared with other studies that also use RF or GBDT models without hyperparameters optimization [66,67], the accuracy is also significantly improved after Bayesian optimization, indicating the importance of Bayesian optimization of hyperparameters in machine learning model.These results also indicate that the combination method of Bayesian optimization and the GBDT model is a robust technique that has high promise for LSM.

Conclusions
This study presented the application of BO in RF and GBDT models for LSM in Shuicheng County, China.In this study, multiple data sources are used to obtain seventeen conditioning factors of landslides, Borderline-SMOTE and Randomundersample methods are combined to solve the imbalanced sample problem.RF and GBDT models before and after BO are applied to calculate the susceptibility value of landslides and four LSMs were proposed by using RF, GBDT, RF_B, GBDT_B.These methods perform well and can be extended to other study areas and other research fields.Based on the validation of the results by multiple indexes, the following conclusions were obtained.First, the four LSMs constructed in this study are all applicable in landslide susceptibility assessment for prevention and management.Second, the performance of the RF is better than the GBDT model without BO, which indicates that the advanced nature of the GBDT model cannot be reflected under the condition of parameter optimization.After adopting the Bayesian optimized hyperparameters, the prediction accuracy of the RF and GBDT models is improved by 1% and 7%, respectively, which shows that the BO can optimize the two models especially the GBDT model.Finally, according to the comprehensive analysis of the four models, the prediction performance of the GBDT_B model is the highest and it is better than many other machine learning models, which has very good prediction performance.In summary, The Bayesian optimized RF and GBDT models, especially the GBDT_B model we proposed are scientific and feasible in landslide susceptibility assessment and have robust prediction ability and application prospects.

Figure 1 .
Figure 1.Location and landslide points of Shuicheng County, China.

Figure 1 .
Figure 1.Location and landslide points of Shuicheng County, China.
hyperparameter for evaluation; (4) optimization results: the evaluation results of the objective function, including hyperparameters values and losses on the verification set.

Figure 4 .
Figure 4.The feature importance calculated by the GBDT_B model.

Figure 4 .
Figure 4.The feature importance calculated by the GBDT_B model.

Figure 5 .
Figure 5.The loss in the iterative process of Bayesian optimization.

Figure 5 .
Figure 5.The loss in the iterative process of Bayesian optimization.

Figure 7 .
Figure 7.The statistics of each grade of LSMs constructed by the models.

Figure 7 .
Figure 7.The statistics of each grade of LSMs constructed by the models.

Figure 8 .
Figure 8.The ROC curve of the four models.

Table 1 .
Data structures and summaries of landslides conditioning factors.

Table 1 .
Data structures and summaries of landslides conditioning factors.

Table 2 .
Hyperparameters, default values and search spaces of random forest (RF) and gradient boosting decision tree (GBDT) models.

Table 3 .
The Bayesian Optimization Result of hyperparameters in the RF and GBDT models.

Table 3 .
The Bayesian Optimization Result of hyperparameters in the RF and GBDT models.

Table 4 .
Quantitative results for the comparison of historical landslides of each grade in LSMs (Pli is the percentage of historical landslides in each grade to total landslides).

Table 5 .
Model validation results using multiple methods.

Table 4 .
Quantitative results for the comparison of historical landslides of each grade in LSMs (P li is the percentage of historical landslides in each grade to total landslides).

Table 5 .
Model validation results using multiple methods.