Comparison of Di ﬀ erent Machine Learning Methods for Debris Flow Susceptibility Mapping: A Case Study in the Sichuan Province, China

: Debris ﬂow susceptibility mapping is considered to be useful for hazard prevention and mitigation. As a frequent debris ﬂow area, many hazardous events have occurred annually and caused a lot of damage in the Sichuan Province, China. Therefore, this study attempted to evaluate and compare the performance of four state-of-the-art machine-learning methods, namely Logistic Regression (LR), Support Vector Machines (SVM), Random Forest (RF), and Boosted Regression Trees (BRT), for debris ﬂow susceptibility mapping in this region. Four models were constructed based on the debris ﬂow inventory and a range of causal factors. A variety of datasets was obtained through the combined application of remote sensing (RS) and geographic information system (GIS). The mean altitude, altitude di ﬀ erence, aridity index, and groove gradient played the most important role in the assessment. The performance of these modes was evaluated using predictive accuracy (ACC) and the area under the receiver operating characteristic curve (AUC). The results of this study showed that all four models were capable of producing accurate and robust debris ﬂow susceptibility maps (ACC and AUC values were well above 0.75 and 0.80 separately). With an excellent spatial prediction capability and strong robustness, the BRT model (ACC = 0.781, AUC = 0.852) outperformed other models and was the ideal choice. Our results also exhibited the importance of selecting suitable mapping units and optimal predictors. Furthermore, the debris ﬂow susceptibility maps of the Sichuan Province were produced, which can provide helpful data for assessing and mitigating debris ﬂow hazards.

are complex and various. Three active faults run through the study area, namely the Longmenshan fault, the Xianshuihe fault, and the Anninghe fault. Complex geological and geomorphological characteristics play an important role to trigger heavy geological hazards in this region, because of complex interactions between the Qinghai-Tibet Plateau and the Sichuan Basin [22]. The study area belongs to the subtropical monsoon zone with an annual average rainfall of 1000 mm; over 80% of precipitation usually takes place in the monsoon season (between April and October) [5]. The average temperature in January ranges from 3 °C to 8 °C and the average temperature in July is 25 °C~29 °C [5]. Landslides, debris flow, and mountain torrents are widespread here. In particular, debris flows triggered by heavy rains pose the greatest threat in the study area.
The Sichuan Province is an important node of the "One Belt and One Road" where many important projects such as the Sichuan-Tibet Railway and the Baihetan Hydropower Station exist in the region. To promote sustainable development of society and economy, it is always important to conduct debris flow susceptibility analyses to understand the surface dynamics and climatic variability.  To promote sustainable development of society and economy, it is always important to conduct debris flow susceptibility analyses to understand the surface dynamics and climatic variability.

Materials and Methods
To achieve debris flow susceptibility mapping, four main stages were adopted; illustrated in Figure 2. First, the debris flow inventory was prepared and the causal factors were selected. Then, they were separated into two independent groups, namely the training set and the validation set, using the validation set approach. Second, four debris flow susceptibility models were set up based on the LR, RF, SVM, and BRT algorithms. Third, we applied the constructed models to develop debris flow susceptibility maps of the study area. Finally, these models were evaluated and compared using two widely used criteria, including predictive accuracy (ACC) and the area under the receiver operating characteristic curve (AUC).

Materials and Methods
To achieve debris flow susceptibility mapping, four main stages were adopted; illustrated in Figure 2. First, the debris flow inventory was prepared and the causal factors were selected. Then, they were separated into two independent groups, namely the training set and the validation set, using the validation set approach. Second, four debris flow susceptibility models were set up based on the LR, RF, SVM, and BRT algorithms. Third, we applied the constructed models to develop debris flow susceptibility maps of the study area. Finally, these models were evaluated and compared using two widely used criteria, including predictive accuracy (ACC) and the area under the receiver operating characteristic curve (AUC).

Compilation of Debris Flow Inventory
Debris flow inventory is an important prerequisite for the analyses of debris flow susceptibility because there is an assumption that past events have a great influence on the future [2]. In this study, detailed information of 3839 rainfall-triggered debris flow events in the Sichuan Province, from 1949 to 2017 was collected, based on historical records collection, aerial photographs, satellite remote sensing images interpretation, and field verification. Some of the information was obtained from government departments in Sichuan. Therefore, the inventory is reliable in both quality and completeness. The locations of debris flows are shown as points ( Figure 1). As can be seen in the diagram, these events were concentrated in the mountainous and hilly areas of the mid-Sichuan region.

Selection of Debris Flow Causal Factors
The selection of debris flow causal factors is also an important task for susceptibility modeling and mapping. Investigators have been using diverse geo-environmental factors in previous studies and have been trying to explore their relationship with debris flows. Based on the general cause of debris flows, six clusters of factors were initially determined for modeling in this study, including topographical, geological, edaphic, meteorological, land-cover, and sociometric factors ( Table 1). All factors were prepared with the help of GIS and RS. The topographical factors, including mean slope angle, slope aspect, mean altitude, altitude difference, and groove gradient, were derived from the

Compilation of Debris Flow Inventory
Debris flow inventory is an important prerequisite for the analyses of debris flow susceptibility because there is an assumption that past events have a great influence on the future [2]. In this study, detailed information of 3839 rainfall-triggered debris flow events in the Sichuan Province, from 1949 to 2017 was collected, based on historical records collection, aerial photographs, satellite remote sensing images interpretation, and field verification. Some of the information was obtained from government departments in Sichuan. Therefore, the inventory is reliable in both quality and completeness. The locations of debris flows are shown as points ( Figure 1). As can be seen in the diagram, these events were concentrated in the mountainous and hilly areas of the mid-Sichuan region.

Selection of Debris Flow Causal Factors
The selection of debris flow causal factors is also an important task for susceptibility modeling and mapping. Investigators have been using diverse geo-environmental factors in previous studies and have been trying to explore their relationship with debris flows. Based on the general cause of Remote Sens. 2020, 12, 295 5 of 20 debris flows, six clusters of factors were initially determined for modeling in this study, including topographical, geological, edaphic, meteorological, land-cover, and sociometric factors (Table 1). All factors were prepared with the help of GIS and RS. The topographical factors, including mean slope angle, slope aspect, mean altitude, altitude difference, and groove gradient, were derived from the Digital Elevation Model (DEM). Notably, the groove gradient referred to the ratio of the height difference of gully to its length and was an elemental parameter for the initiation and motion of debris flows. Geological factors, namely seismic intensity and lithology, were prepared in a GIS environment using a seismic information map and lithological composition map, respectively. Similarly, edaphic factors (soil texture and soil erosion) and meteorological factors (moisture index, aridity index, mean annual temperature, accumulated temperature of 10 • C, and annual precipitation) were acquired from the Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences (RESDC) and were pre-processed by GIS technology [23]. Additionally, sociometric factors (population density and road density) were obtained from the remote sensing datasets provided by RESDC and the OpenStreetMap [24]. Land cover factors (Normalized Difference Vegetation Index and land use) constructed from remote sensing images were also commonly used [25].
Watersheds were selected as mapping units to avoid problems of raster grid-cells, such as lack of physical relations with debris flows [26]. Watershed units have significant conceptual and operational advantages [8]. We used the Hydrological Analysis Tool of ArcGIS v.10.2 software to divide the study area into watersheds. Figure 3 exhibits 2471 mapping units ranging from 8.26 km 2 to 1829.68 km 2 .
We excluded a few regions (e.g., plateau and plain areas) of the Sichuan Province in susceptibility mapping because of the inadequate conditions to trigger a debris flow.
Remote Sens. 2020, 12, x FOR PEER REVIEW 6 of 20 woodland, grassland, waterbody, construction land, and unused land), and the area of each type was also taken as a factor.
Watersheds were selected as mapping units to avoid problems of raster grid-cells, such as lack of physical relations with debris flows [26]. Watershed units have significant conceptual and operational advantages [8]. We used the Hydrological Analysis Tool of ArcGIS v.10.2 software to divide the study area into watersheds. Figure 3 exhibits 2471 mapping units ranging from 8.26 km 2 to 1829.68 km 2 . We excluded a few regions (e.g., plateau and plain areas) of the Sichuan Province in susceptibility mapping because of the inadequate conditions to trigger a debris flow. The raw data of the causal factors were resampled based on the mapping units, using the Zonal Statistics Tool in the ArcGIS v.10.2 software. The following factors took the average value in the watershed-mean slope angle, mean altitude, moisture index, aridity index, mean annual temperature, accumulated temperature of 10 °C, annual precipitation, and NDVI. Moreover, other factors took the mode value in the watershed, including slope aspect, seismic intensity, lithology, soil erosion, and land use. In particular, five factors had sub-factors. (1) The seismic intensity was reclassified into five groups (<VI, VI, VII, VIII, and ≥IX), and the area of each group was also taken as a factor. (2) The lithology was constructed with five groups based on hardness (extremely soft, soft, moderate hard, hard, and extremely hard), and the area of each group was also taken as a factor. (3) The soil texture included three factors-clay content, sand content, and silt content. (4) Soil erosion was reclassified into six groups (micro, mild, moderate, serious, drastic, and very drastic), and the area of each group was also taken as a factor. (5) Land use was interpreted as six groups (cropland, woodland, grassland, waterbody, construction land, and unused land), and the area of each type was The raw data of the causal factors were resampled based on the mapping units, using the Zonal Statistics Tool in the ArcGIS v.10.2 software. The following factors took the average value in the watershed-mean slope angle, mean altitude, moisture index, aridity index, mean annual temperature, accumulated temperature of 10 • C, annual precipitation, and NDVI. Moreover, other factors took the mode value in the watershed, including slope aspect, seismic intensity, lithology, soil erosion, and land use. In particular, five factors had sub-factors. (1) The seismic intensity was reclassified into five groups (<VI, VI, VII, VIII, and ≥IX), and the area of each group was also taken as a factor. (2) The lithology was constructed with five groups based on hardness (extremely soft, soft, moderate hard, hard, and extremely hard), and the area of each group was also taken as a factor. (3) The soil texture Remote Sens. 2020, 12, 295 6 of 20 included three factors-clay content, sand content, and silt content. (4) Soil erosion was reclassified into six groups (micro, mild, moderate, serious, drastic, and very drastic), and the area of each group was also taken as a factor. (5) Land use was interpreted as six groups (cropland, woodland, grassland, waterbody, construction land, and unused land), and the area of each type was also taken as a factor. Therefore, 42 initial causal factors were prepared. An excellent debris flow susceptibility model relies on a set of suitable factors, so the above factors should be further evaluated and selected [8]. For this study, we adopted the backward variable selection method based on the RF algorithm to select the optimal factors and improve the predictive capability [27]. First, we constructed and evaluated an RF model with all factors, where the model performance and variable importance were recorded. Then, the factor with the lowest importance was eliminated and a new model was implemented. This procedure was repeated until there was only one factor left. Finally, a set of factors with the highest performance was chosen for the final prediction, and the rest were removed. In Figure 4 it can be seen that only 15 factors were selected as the optimal predictors for assessing the susceptibility of debris flow in this study.
Remote Sens. 2020, 12, x FOR PEER REVIEW 7 of 20 prediction, and the rest were removed. In Figure 4 it can be seen that only 15 factors were selected as the optimal predictors for assessing the susceptibility of debris flow in this study.

Partition of Data Sets
Of the 2471 watersheds in the study area, 772 watersheds were positive cases (debris flows had occurred) and the remaining 1699 watersheds were negative cases (debris flows had not occurred). According to previous studies, the size of the types of cases selected for a model should be similar [28,29]. Therefore, 772 negative cases were selected randomly along with the same number of positive cases, to train and validate the models. In general, approximately 70% of the data was randomly selected for model training, meanwhile the remaining 30% was used for model validation. This data partitioning method, namely the validation set approach, was easy to implement. However, the ratio of training/validation set needed to be chosen carefully. The inappropriate ratio might cause potential problems in the procedure of data mining, such as overfitting or deficient model training, which significantly affects the predictive performance of the model. A split of 70%-30% is a common choice adopted by many investigators for coping with this challenge [17,22,30,31]. Therefore, 1082 watersheds consisted of 541 positive cases and 541 negative cases were used to train the models,

Partition of Data Sets
Of the 2471 watersheds in the study area, 772 watersheds were positive cases (debris flows had occurred) and the remaining 1699 watersheds were negative cases (debris flows had not occurred). According to previous studies, the size of the types of cases selected for a model should be similar [28,29]. Therefore, 772 negative cases were selected randomly along with the same number of positive cases, to train and validate the models. In general, approximately 70% of the data was randomly selected for model training, meanwhile the remaining 30% was used for model validation. This data partitioning method, namely the validation set approach, was easy to implement. However, the ratio of training/validation set needed to be chosen carefully. The inappropriate ratio might cause potential problems in the procedure of data mining, such as overfitting or deficient model training, which significantly affects the predictive performance of the model. A split of 70%-30% is a common choice adopted by many investigators for coping with this challenge [17,22,30,31]. Therefore, 1082 watersheds consisted of 541 positive cases and 541 negative cases were used to train the models, while 462 watersheds contained 231 positive cases and 231 negative cases served the output validation. The positive/negative cases were labeled as 1/0 for modeling. To obtain more robust conclusions, the sampling procedure was repeated three times. All three sample datasets participated in the model operation. Lastly, the values of causal factors were resampled for each watershed.

Model Construction Using Machine Learning Algorithms
After preparing datasets, we constructed and trained four debris flow susceptibility models using machine learning algorithms. The statistical tool R version 3.4.4 was used for the model training [32]. We paid more attention to adjust and optimize the parameters based on the cross-validation approach, for improving the effectiveness of the models. The model output was the occurrence probability of debris flow and was used to simulate susceptibility.

Logistic Regression (LR)
Logistic Regression (LR) is a multivariate regression algorithm that has been extensively used for the susceptibility assessment [22,33,34]. LR is suitable to understand the relationship between a binary variable (whether the debris flow will occur or not) and several causal factors, and estimate the probability of an event [35]. The logit-natural logarithm of LR can be expressed as below: Therefore, in this study, the probability p of a debris flow occurrence in each watershed could be estimated by using the following equation: where X = (X 1 , . . . , X p ) are the debris flow causal factors, β 0 represents the intercept, (β 1 , . . . , β p ) are the regression coefficients. LR uses the maximum likelihood method to estimate (β 1 , . . . , β p ). Finally, the probability of a debris flow occurring varies from 0 to 1.

Random Forest (RF)
Random Forest (RF) is a multivariate model that belongs to one of the ensemble-learning techniques [36]. The algorithm is also suitable for debris flow susceptibility assessment. According to the decision rules, a series of decision trees were established, and final decision (whether the debris flow will occur or not) was determined based on the majority vote [37]. When constructing these decision trees, each time a split in a tree was considered, a random sample containing m causal factors was selected as the split candidates, among all factors. Forcing each split to consider only a subset of all factors helped to overcome the weakness of overfitting and improved the stability. This process was thought of as de-correlating trees, thereby, making the results more reliable. There were two important parameters, namely the number of trees and the tree depth, which needed to be tuned when modeling. Additionally, to assess factor importance, the mean decrease accuracy and mean decrease Gini were calculated [38][39][40].

Support Vector Machines (SVM)
Support Vector Machines (SVM) was developed in the 1990s [41] and has grown into a popular approach for classification because of its superior empirical performance in a variety of settings [30,42,43]. In this study, debris flow causal factors were mapped into a high-dimensional feature space. Then, the model struggled to detect a hyperplane to separate positive cases and negative cases, as much as possible [44]. The optimal hyperplane can be obtained by solving the following optimization problem: maximize β 0 ,β 11 ,β 12 ,··· ,β p1 ,β p2 ,ε 1 ,ε 2 ,··· ,ε n M subject to y i where C is a non-negative tuning parameter, M is the width of the margin, ε 1 , ε 2 , . . . , ε n are slack variables. Later, to classify new data, the decision function can be written as below: where K is the function that we will refer to as a kernel, b represents the offset from the origin of the hyperplane, n means the number of causal factors, and α i are positive real constants. The radial basis kernel function was adopted in this study due to its robustness, as reported by Rahmati et al. [30] and Kavzoglu et al. [45]. The core parameters of SVM modeling included gamma and cost.

Boosted Regression Trees (BRT)
Boosted Regression Trees (BRT) is an approach of combining gradient boosting algorithm with classification and regression trees [46]. BRT adopts a method similar to RF to implement the debris flow susceptibility assessment. The difference is that smaller trees are typically sufficient in BRT, because of their slow learning process. Additionally, each tree in BRT is created based on the modification of previous trees, unlike the RF algorithm. The core of training the BRT model is to select the optimal value of three pivotal parameters-the shrinkage coefficient, the number of trees, and splits in each tree. They control the rate at which boosting learns, the model's scale, and the complexity of the boosted ensemble, respectively. The optimal parameters were automatically set through cross-validation.

Evaluation and Comparison Methods
In this study, two commonly used criteria, including the predictive accuracy (ACC) and receiver operating characteristic (ROC) curve were applied to quantify and compare the performance of models. ACC is a statistical metric that relies on the components of the confusion matrix [30,47]. As Table 2 shows, the confusion matrix reveals the discrepancy between the model results and the actual observed outcomes. ACC can be estimated by the following equation: where TP and TN refer to the number of watersheds that are correctly classified, while FP and FN refer to the number of watersheds classified incorrectly.  The ROC curve, which elucidates the alterations of true positive rate (TPR) and false positive rate (FPR) when the discrimination threshold changes [48,49], is also a widely used technique to measure the goodness-of-fit and the predictive power of probabilistic models. TPR is the ratio of positive cases that are correctly identified under a specific threshold value. FPR means the ratio of all negative cases that are incorrectly predicted to be positive, under the same threshold value. They also rely on the confusion matrix and can be obtained from following equations: This popular graph visualizes the confusion matrix under various thresholds and tracks two kinds of classification errors [50]. The overall performance of debris flow susceptibility models is quantified by the area under the curve (AUC). An ideal ROC curve should be close to the upper-left corner, usually the higher the AUC value the better the model. According to the previous studies, the performance of a model based on the AUC value can be classified as several levels: 0.5~0.6 = poor, 0.6~0.7 = moderate, 0.7~0.8 = acceptable, 0.8~0.9 = excellent, and 0.9~1 = almost perfect [19,30].

Development of Debris Flow Susceptibility Maps
The core of LR modeling was the estimation of the regression coefficients using the maximum likelihood method. During the RF modeling, the number of trees and the tree depth were determined as 1000 and 5. For the SVM model, the parameters, gamma and cost, were tuned to 1 and 10, respectively. The important parameters in the BRT model, i.e., the shrinkage coefficient, the number of trees, and splits in each tree, were identified to be 0.2, 1000, and 4, respectively. After model building and operation, we averaged the model outputs of three sample datasets to generate the results. Repeated sampling was helpful to reduce sampling error and gain more robust analysis results. Four models were applied to calculate the debris flow susceptibility index for each watershed in the Sichuan Province. According to the computed index, ranging from 0 to 1, susceptibility levels were reclassified into five categories (very low, low, moderate, high, and very high) using the natural break classification method in the GIS environment [17]. Then, susceptibility maps were produced in the GIS platform for visualization ( Figure 5). The results of the assessment showed that watersheds with high and very high debris flow susceptibility were chiefly distributed in the central mountainous region of the study area. Whereas there was lower susceptibility in the western plateau districts as well as the eastern plain districts, with a gentle topography fluctuation. Figure 6 depicts the relative distribution of the susceptibility classes calculated for each model. In the LR model, the low class had the largest proportion (22.42%). 21.57%, 17.44%, 16.35%, and 22.22% of watersheds which fell into the 'very low', 'moderate', 'high', and 'very high' susceptibility classes, respectively. For the debris flow susceptibility maps of the RF and SVM model, the percentages of each class were very similar to those acquired by the LR model. Furthermore, the percentage of very high class in the BRT model (12.59%) was small, which was lower than that based on other models. The moderate and lower debris flow susceptibilities were the main levels in the study area.   Figure 6 depicts the relative distribution of the susceptibility classes calculated for each model. In the LR model, the low class had the largest proportion (22.42%). 21.57%, 17.44%, 16.35%, and 22.22% of watersheds which fell into the 'very low', 'moderate', 'high', and 'very high' susceptibility classes, respectively. For the debris flow susceptibility maps of the RF and SVM model, the percentages of each class were very similar to those acquired by the LR model. Furthermore, the percentage of very high class in the BRT model (12.59%) was small, which was lower than that based  Overall, the susceptibility map intuitively describes the prone distribution of future debris flows. The establishment of a large-scale prediction system based on machine learning methods has extremely high application values and broad application prospects. More comprehensive analyses of debris flow prediction system should be conducted to guide the practice of disaster prevention and reduction in the future.

Evaluation and Comparison of Machine Learning Models
The performance of the four models was evaluated and compared using the criteria chosen in Section 3.3. Analyses of the ACC and AUC using the training set are shown in Figure 7a and Table 3. The highest AUC value belonged to the BRT model (AUC = 0.907), followed by the RF model (AUC = 0.870), the SVM model (AUC = 0.865), and the LR model (AUC = 0.843), respectively. Similarly, it could be found that the BRT model had the highest ACC value (0.823), other models followed it. The criteria showed a high goodness-of-fit for all models in the training step. However, performance in the training step was not enough to assess the prediction capacity of the model [51]. Therefore, we paid more attention to the performance of models in the validation set. Table 4 and Figure 7b  The ACC values of these models are far above 75% and the AUC values range from excellent to almost perfect. According to these, all four machine-learning models performed well, considering the above factors for debris flow susceptibility mapping. The BRT model was superior to the rest of the models. This reiterates the fact that a data-driven classification model that learns slowly shows impressive performance [48]. Overall, the susceptibility map intuitively describes the prone distribution of future debris flows. The establishment of a large-scale prediction system based on machine learning methods has extremely high application values and broad application prospects. More comprehensive analyses of debris flow prediction system should be conducted to guide the practice of disaster prevention and reduction in the future.

Evaluation and Comparison of Machine Learning Models
The performance of the four models was evaluated and compared using the criteria chosen in Section 3.3. Analyses of the ACC and AUC using the training set are shown in Figure 7a and Table 3. The highest AUC value belonged to the BRT model (AUC = 0.907), followed by the RF model (AUC = 0.870), the SVM model (AUC = 0.865), and the LR model (AUC = 0.843), respectively. Similarly, it could be found that the BRT model had the highest ACC value (0.823), other models followed it. The criteria showed a high goodness-of-fit for all models in the training step. However, performance in the training step was not enough to assess the prediction capacity of the model [51]. Therefore, we paid more attention to the performance of models in the validation set. Table 4 and Figure 7b  The ACC values of these models are far above 75% and the AUC values range from excellent to almost perfect. According to these, all four machine-learning models performed well, considering the above factors for debris flow susceptibility mapping. The BRT model was superior to the rest of the models. This reiterates the fact that a data-driven classification model that learns slowly shows impressive performance [48].

Assessment of Factor Importance
To evaluate the effect of factor selection, BRT was utilized. The AUC value of BRT experienced improvement after removal of unimportant factors. Therefore, a careful analysis of causal factors before modeling is indispensable. Considering the relevance and their corresponding weights, and discarding unimportant factors, result in better forecasting performance.
A variety of factors can trigger occurrences of debris flows. Under the premise that the main controlling factors of debris flow are still controversial, the assessment of factor importance is valuable for interpreting and diagnosing the contribution of different predictor variables. The relative importance of the fifteen factors used to build the models and produce the debris flow susceptibility maps are presented in Figure 8. The results are shown based on the mean decrease of the Gini index in the RF model and are expressed relative to the maximum value. The Gini index is regarded as a commonly-used measurement of total variance across all classifications, and is suitable for assessing the factor importance [48]. A large mean decrease value of the Gini index by splits over a given factor shows a significant predictor. The classification tree models (RF and BRT) have the same mechanism for assessing the relative importance of factors. While LR and SVM rank the factor importance by relying on the regression coefficients and weight vectors, respectively. One of the advantages of applying the Gini index in the RF or BRT model is that it is easier to interpret these results than the SVM or LR. As this figure shows, we can deem that all fifteen factors have positive contributions to debris flow susceptibility modeling. The mean altitude, altitude difference, aridity index, and groove gradient have the largest mean decrease in the Gini index, followed by others. There were four topographical factors, three meteorological factors, three edaphic factors, two geological factors, two

Assessment of Factor Importance
To evaluate the effect of factor selection, BRT was utilized. The AUC value of BRT experienced improvement after removal of unimportant factors. Therefore, a careful analysis of causal factors before modeling is indispensable. Considering the relevance and their corresponding weights, and discarding unimportant factors, result in better forecasting performance.
A variety of factors can trigger occurrences of debris flows. Under the premise that the main controlling factors of debris flow are still controversial, the assessment of factor importance is valuable for interpreting and diagnosing the contribution of different predictor variables. The relative importance of the fifteen factors used to build the models and produce the debris flow susceptibility maps are presented in Figure 8. The results are shown based on the mean decrease of the Gini index in the RF model and are expressed relative to the maximum value. The Gini index is regarded as a commonly-used measurement of total variance across all classifications, and is suitable for assessing the factor importance [48]. A large mean decrease value of the Gini index by splits over a given factor shows a significant predictor. The classification tree models (RF and BRT) have the same mechanism for assessing the relative importance of factors. While LR and SVM rank the factor importance by relying on the regression coefficients and weight vectors, respectively. One of the advantages of applying the Gini index in the RF or BRT model is that it is easier to interpret these results than the SVM or LR.
As this figure shows, we can deem that all fifteen factors have positive contributions to debris flow susceptibility modeling. The mean altitude, altitude difference, aridity index, and groove gradient have the largest mean decrease in the Gini index, followed by others. There were four topographical factors, three meteorological factors, three edaphic factors, two geological factors, two sociometric factors, and one land-cover factors. That is, the topography, meteorology, and edaphology were the most important factor clusters. Previous studies also illustrated that by explaining the general cause of debris flows-surface rock and soil gradually lose their strength because of earthquakes or weather conditions, which are potentially unstable in the steep slopes, and finally, seepage forces formed by rainfall cause them to slide, the slide distance depends on the topography and strength loss amount [52][53][54].
Remote Sens. 2020, 12, x FOR PEER REVIEW  15 of 20 sociometric factors, and one land-cover factors. That is, the topography, meteorology, and edaphology were the most important factor clusters. Previous studies also illustrated that by explaining the general cause of debris flows-surface rock and soil gradually lose their strength because of earthquakes or weather conditions, which are potentially unstable in the steep slopes, and finally, seepage forces formed by rainfall cause them to slide, the slide distance depends on the topography and strength loss amount [52][53][54]. The susceptibility map derived from the BRT model has been combined with the factor maps to analyze the relationship between the causal factors and debris flow occurrences. The distribution of watersheds with high and very high susceptibility classes (737 watersheds) on four most important factor maps (mean altitude, altitude difference, aridity index, and groove gradient) is shown in Figure  9. It can be seen that there are obvious regularities. Most watersheds with high and very high susceptibility classes are highly associated with the following conditions-mean altitude varying from 2000 to 3000 m, altitude difference varying from 2000 to 3000 m, aridity index varying from 0.85 to 1.35, and groove gradient varying from 100% to 200%. According to this case, prevention and mitigation of debris flow risk should be paid more attention to in these types of highly coupled areas. The susceptibility map derived from the BRT model has been combined with the factor maps to analyze the relationship between the causal factors and debris flow occurrences. The distribution of watersheds with high and very high susceptibility classes (737 watersheds) on four most important factor maps (mean altitude, altitude difference, aridity index, and groove gradient) is shown in Figure 9. It can be seen that there are obvious regularities. Most watersheds with high and very high susceptibility classes are highly associated with the following conditions-mean altitude varying from 2000 to 3000 m, altitude difference varying from 2000 to 3000 m, aridity index varying from 0.85 to 1.35, and groove gradient varying from 100% to 200%. According to this case, prevention and mitigation of debris flow risk should be paid more attention to in these types of highly coupled areas. Remote Sens. 2020, 12, x FOR PEER REVIEW 16 of 20

Discussion
On the basis of the remote sensing data, GIS tools, and machine learning algorithms, debris flow susceptibility assessment of the Sichuan Province was implemented. The final four susceptibility maps did not vary considerably between the models and had a consistent spatial distribution pattern. There were obvious regional characteristics exhibited in the susceptibility maps. The transition belt of the Qinghai-Tibet Plateau to the Sichuan Basin concentrates most watersheds of high and very high debris flow susceptibility, where the topography varies enormously. Additionally, this region is coupled with dry valleys and fault zones. Severe soil erosion and frequent earthquakes provide abundant loose materials for debris flows. Similarly, we should also blame the hazard prone on the engineering dregs generated by high-intensity road and hydropower development. Through a combined analysis of factor importance, we identified the high-risk areas and major causal factors that were conducive to preferable hazard prevention. Some factors, such as NDVI and seismic intensity, were always regarded as necessary factors. However, the analyses of factor importance revealed that they were not highly important in this particular application. Hence, we inferred that some factors were site-specific. This inference was in agreement with the investigation conducted by Chen et al. [55].
In this study, all models exhibited good performance and was suitable for constructing debris flow susceptibility maps. Among them, the BRT model was the most reliable and accurate in the study area. As shown in Figure 4, proportions of the different debris flow susceptibility classes from the four models were not exactly the same. From this empirical observation, we concluded that the predictions of the BRT model tended to be optimistic, even though there was no structural evidence. There was no universal agreement on which algorithm performed best on various environments. Each machine learning method has its pros and cons, and the performance of one method is not always better than the other. Rahmati et al. [30] applied seven machine learning methods to analyze the susceptibility of gully erosion and found that the BRT model exhibited a better performance than SVM. However, Garosi et al. [19] illustrated that the BRT and ANN models obtained similar outstanding performance in their research. This might result from the lack of uniform criteria for the selection of factors. Although advanced machine learning methods have slightly different performance in various studies, they always have good predictive abilities and are suitable for the study of susceptibility. Based on the above analyses, we recommend that governors and investigators

Discussion
On the basis of the remote sensing data, GIS tools, and machine learning algorithms, debris flow susceptibility assessment of the Sichuan Province was implemented. The final four susceptibility maps did not vary considerably between the models and had a consistent spatial distribution pattern. There were obvious regional characteristics exhibited in the susceptibility maps. The transition belt of the Qinghai-Tibet Plateau to the Sichuan Basin concentrates most watersheds of high and very high debris flow susceptibility, where the topography varies enormously. Additionally, this region is coupled with dry valleys and fault zones. Severe soil erosion and frequent earthquakes provide abundant loose materials for debris flows. Similarly, we should also blame the hazard prone on the engineering dregs generated by high-intensity road and hydropower development. Through a combined analysis of factor importance, we identified the high-risk areas and major causal factors that were conducive to preferable hazard prevention. Some factors, such as NDVI and seismic intensity, were always regarded as necessary factors. However, the analyses of factor importance revealed that they were not highly important in this particular application. Hence, we inferred that some factors were site-specific. This inference was in agreement with the investigation conducted by Chen et al. [55].
In this study, all models exhibited good performance and was suitable for constructing debris flow susceptibility maps. Among them, the BRT model was the most reliable and accurate in the study area. As shown in Figure 4, proportions of the different debris flow susceptibility classes from the four models were not exactly the same. From this empirical observation, we concluded that the predictions of the BRT model tended to be optimistic, even though there was no structural evidence. There was no universal agreement on which algorithm performed best on various environments. Each machine learning method has its pros and cons, and the performance of one method is not always better than the other. Rahmati et al. [30] applied seven machine learning methods to analyze the susceptibility of gully erosion and found that the BRT model exhibited a better performance than SVM. However, Garosi et al. [19] illustrated that the BRT and ANN models obtained similar outstanding performance in their research. This might result from the lack of uniform criteria for the selection of factors. Although advanced machine learning methods have slightly different performance in various studies, they always have good predictive abilities and are suitable for the study of susceptibility. Based on the above analyses, we recommend that governors and investigators obtain an optimal susceptibility map by comparing and combining multiple models in practical applications. Therefore, further comparison and ensemble studies are necessary to guide the method selection for predicting debris flows.

Conclusions
Comparative studies of multiple machine learning models for debris flow susceptibility mapping are very useful to predict future events. The important contributions of our comparative research are summarized below. In this study, all four machine-learning models showed great performance. The BRT model obtained the optimal goodness-of-fit and predictive capability as compared to the other models, in terms of both AUC and ACC, while it was also stable and did not show any overfitting. Therefore, these models, especially BRT, show promising techniques for producing debris flow susceptibility map. This map of the study area shows that the distribution of the watersheds with high and very high susceptibility is coupled with an extreme topography transition zone. Additionally, environmental data based on RS and GIS provide important data sources for regional analyses of debris flow susceptibility. Proper selection of the optimal factors and appropriate mapping units not only improved the prediction performance of the models but also helps avoid the arbitrariness of the factors used. The topographical factors, meteorological factors, and edaphic factors played the most important role in this case. These study results provide a comprehensive perspective on debris flow susceptibility in the Sichuan Province, which are essential for policymakers to implement sustainable disaster mitigation in high debris-flow-prone areas.