Debris Flow Susceptibility Assessment Using the Integrated Random Forest Based Steady-State Infinite Slope Method: A Case Study in Changbai Mountain, China

Debris flow events often cause significant damage and threaten infrastructure and even livelihoods. Recent studies have mainly focused on determining debris flow susceptibility using deterministic or heuristic/probabilistic models. However, each type of model has its own significant advantages along with some irreparable disadvantages. The random forest model, which is sensitive to regions where the terrain conditions are suitable for the occurrence of debris flow, was applied together with the steady-state infinite slope method, which is capable of describing the initiation mechanism of debris flow. In this manner, a random-forest-based steady-state infinite slope method was used to assess debris flow susceptibility in the Changbai Mountain area. Results showed that the assessment accuracy of the proposed random-forest-based steady-state infinite slope method reached 90.88%, whereas the accuracy of the random forest model alone or the steady-state infinite slope method alone was only 88.48% or 60.45%, respectively. Compared with the single-model results, the assessment accuracy of the proposed method improved by 2.40 and 30.43 percentage points, respectively. Meanwhile, the debris-flow-prone area identified by the proposed method was reduced. The random-forest-based steady-state infinite slope method inherited the excellent diagnostic performance of the random forest model in regions where debris flow disasters had already occurred; in addition, it further refined the debris-flow-prone area within the suitable terrain area based on physico-mechanical properties. Thus, this method performed better than the other two models.


Introduction
As one of the most widespread natural processes in mountainous environments, debris flow events often cause significant damage and threaten infrastructure, urban development, and human livelihoods, and can even result in loss of life [1][2][3]. With rapid socio-economic development, this situation tends to become more critical, particularly if it is not well handled. Consequently, debris flow susceptibility has become the focus of many researchers. However, the actual terrain is extremely complex and the physico-mechanical properties vary strongly in space, so it is difficult to obtain accurate physico-mechanical property data at medium or large scales. Only the precision of the slope factor (obtained from a high-precision digital elevation model) is adequate at the regional scale, yet the slope gradient alone cannot replace or reflect the influence of the actual terrain (i.e., steep and confined pathways) on debris flow hazard. In heuristic/probabilistic models, by contrast, debris flow susceptibility is primarily assessed by considering abundant background information (including factors such as slope, curvature, elevation, and terrain complexity). Thus, these models can clearly reflect the influence of complex terrain on debris flow hazard. However, they mainly consider susceptibility from the perspective of debris flow pathways, and the influences of the other three controlling factors are neglected. Such problems are responsible for the uncertainties in susceptibility assessment when using heuristic/probabilistic or deterministic models, and solving them is important for improving assessment accuracy. Although both deterministic and heuristic/probabilistic models are appropriate for medium- or large-scale debris flow susceptibility assessment [16], deterministic models require intensive and reliable field survey data that are difficult to obtain, and heuristic/probabilistic models do not consider initiation mechanisms or physico-mechanical properties. Thus, determining the optimum model for debris flow susceptibility assessment remains a difficult task.
To overcome this problem, the current study proposes a methodology that combines the results of deterministic models with those of heuristic/probabilistic models for susceptibility assessment. The method calculates susceptibility based on the influence of abundant background information (i.e., steep and confined pathways) and takes the four controlling factors (i.e., water, easily entrained debris, steep slope, and triggering mechanism) into consideration. This study therefore offers a new perspective and a different approach to debris flow susceptibility assessment, and its results differ somewhat from those of previous studies. To demonstrate the feasibility of the proposed method, multi-source data are used to the extent possible to characterize the terrain and physico-mechanical conditions of debris flow occurrence. The aim is to conduct a debris flow susceptibility assessment using the proposed method, verify the accuracy of the result, and then compare it with the random forest model (representing the heuristic/probabilistic models) and the steady-state infinite slope method (representing the deterministic models) to determine the optimal model.

Study Area
Changbai Mountain is the volcano in China most likely to erupt, with the most recent eruption occurring in 1900 [35,36]. The area is one of the most famous scenic areas in the world, well known for its spectacular scenery and for TianChi Lake, which lies in a volcanic crater. With the further development of the mountainous landscape and TianChi Lake as scenic attractions, the popularity of this area for domestic and international tourism has increased tremendously, leading to a significant increase in the likelihood of being affected by a hazard. The study area is located on the north side of Changbai Mountain (128°2′48″ E to 128°3′28″ E and 42°3′30″ N to 42°1′20″ N) in Jilin Province, China (Figure 1). The area is approximately 10 km², and the main part of the scenic area is located in this region, including several well-known scenic spots, such as Changbai Waterfall, Changbai Mountain, TianChi, and Yuehua Square, with well-developed tourism infrastructure. The elevations of the study area range from 1755 m to 2655 m a.s.l., with a mean value of 2170 m a.s.l., whereas the slopes range from 0° to 83°, with a mean value of 25.53°. Thus, the terrain of the study area is extremely complex. Owing to several eruptions in the past 5000 years [37], the entire area is covered by pyroclastic debris with thicknesses ranging from 0 to 10 m, which provides abundant source material for debris flows. Therefore, given an appropriate triggering factor, this area is extremely prone to debris flow disasters. The annual precipitation (one of the potential triggering factors) ranges from 1300 mm to 1400 mm, and extreme precipitation events often occur in this area [38]. In addition, the water capacity of TianChi Lake is approximately 2040 km³, and the lake level may rise rapidly when affected by a triggering event (e.g., collapse or eruption). Figure 1 shows that the topography of this region allows the lake water to drain only from the north side, which can cause debris flow disasters.
Water 2020, 12, 2057

Materials and Methods
This research conducted debris flow susceptibility assessments using the proposed random forest-based steady-state infinite slope (RF-SSIS) method, the random forest (RF) model, and the steady-state infinite slope (SSIS) method, and verified the accuracy of each result in order to identify the optimal model. The framework of the whole study is shown in Figure 2.
First, nine terrain parameters (i.e., slope, elevation, curvature, plane-curvature, profile-curvature, topographic wetness index (TWI), terrain ruggedness index (TRI), slope gradient, slope length index (SLI), and distance to rivers) were obtained from the topographic map and used in the RF model. The ten physico-mechanical parameters (i.e., cohesion, internal friction angle, hydraulic conductivity, slope gradient, thickness of the loose sediments, cumulative drainage area, length of cumulative drainage, specific weight of water, specific weight of saturated loose sediments, and net rainfall) used in the SSIS method were obtained from a high-resolution image, the source material thickness (thickness of the loose sediments) map, and the geological map.
Second, the terrain parameters were divided into training and testing data. The training data were used to train the RF model and calculate the susceptibility of the study area. The ten physico-mechanical parameters were used to calculate the factor of safety value (susceptibility value) through the SSIS method.
Third, all nineteen parameters were assigned to the RF-SSIS method to calculate the debris flow susceptibility of the study area.

Finally, the prediction accuracies of the three models were calculated and the optimal model was identified.
The probabilistic model was used to measure the likelihood of debris flow occurring in the terrain of the study area. Thus, only parameters that represent the complexity of the terrain were selected for computation. With the help of GIS tools, the 1-m resolution digital elevation model (DEM) was obtained from a topographic map at 1:5000 scale, provided by the Jilin Institute of Geological Environment Monitoring (JIGEM). From this DEM, nine parameters (Figure 3a,b) were derived and used in the models to calculate the susceptibility of debris flow. These parameters include the slope, elevation, curvature, plane-curvature, profile-curvature, topographic wetness index (TWI), terrain ruggedness index (TRI), slope gradient, slope length index (SLI), and distance to rivers. For a successful debris flow susceptibility assessment, an accurate and reasonably complete disaster inventory is important [39]. The debris flow inventory of the study area was provided by JIGEM and records the details of eight investigated debris flow events, including the occurrence location, time, intensity, source area, pathways, accumulative area, and the triggering factors (time and intensity). According to Cama [11], debris flows can be described as rapid gravity-induced mass movements controlled by topography, which are usually triggered by storm rainfall. Therefore, the prone area should comprise the source area and pathways, whereas the pixels inside the accumulative area should not be included in the sample data. In this study, 101,830 pixels within the source area and pathways belong to the sample data, of which 80% were selected as the training data to train the model, while the remaining 20% were used to verify the accuracy of the results.
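The 80/20 partition of the sample pixels can be sketched as follows. This is a minimal illustration that assumes a random shuffle (the text does not state how the 80% were chosen); the function name `split_pixels`, the seed, and the use of integer pixel identifiers are hypothetical.

```python
import random

def split_pixels(pixel_ids, num=4, den=5, seed=42):
    """Shuffle pixel identifiers and split them num/den : remainder.

    Assumption: the split is random; the source only gives the 80%/20% ratio.
    """
    ids = list(pixel_ids)
    random.Random(seed).shuffle(ids)      # reproducible shuffle
    n_train = len(ids) * num // den       # integer arithmetic avoids float rounding
    return ids[:n_train], ids[n_train:]

# 101,830 sample pixels inside the source areas and pathways
train_ids, test_ids = split_pixels(range(101_830))
print(len(train_ids), len(test_ids))  # 81464 20366
```

With these counts, roughly 81 thousand pixels would train the model and roughly 20 thousand would be held out for verification.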
For the deterministic model used in this study, four main parameters were needed, namely, (1) internal friction angle, (2) cohesion, (3) thickness of the soil, and (4) hydraulic conductivity. The internal friction angle is a shear strength index of soil that reflects its frictional properties; cohesion is the attraction between particles of the same substance. These two parameters indicate the stability of the soil: the higher the values, the more stable the soil. Soil is the most important component of debris flow, and its thickness directly affects the occurrence probability and intensity of debris flow. When the thickness is equal to 0 or very small, only a flood can occur instead of a debris flow. Hydraulic conductivity is the index that measures the infiltration speed and volume of water; it affects the internal balance of the soil. When the internal balance breaks, the soil becomes unstable. For an accurate and appropriate definition of the soil's physico-mechanical properties for a given region, lab tests should be conducted. However, these parameters vary strongly in space, even in small areas, and thus significantly more in a medium-scale area such as the study area. Thus, on a regional scale, defining the properties of the soil through lab tests is not feasible. Therefore, some studies defined the properties of the soil based on soil maps and the support of expert criteria [40]. They assumed that the physico-mechanical properties of each soil class were homogeneous and did not vary spatially. Thus, different physico-mechanical properties could be assigned to each soil class with the help of experts with professional knowledge of the geological, geomorphological, and geotechnical characteristics of the study area. Notably, the correct procedure to discriminate the physico-mechanical characteristics of homogeneous regions should be based on soil maps; however, soil maps are sometimes not available at the regional scale [41]. Another way to define soil properties is to extrapolate them from geological and land coverage maps, which are more commonly available. In this way, the homogeneous regions can be reclassified as portions of the territory with the same geological formation and/or the same land coverage features. Following the approach presented by Bregoli [40], this study differentiated homogeneous soil classes based on the land coverage and geology maps (Tables 1 and 2, respectively). The use of such assigned values in this study is supported by various local and international studies [40,[42][43][44]. In this study, the critical net rainfall threshold was provided by the Jilin Meteorological Service, and the land coverage map was obtained from a high-resolution Airbus satellite image with a 0.89 m spatial resolution, providing the different soil classes shown in Figure 4. Geological and source material thickness spatial distribution maps at 1:10,000 scale were provided by JIGEM (Figures 5 and 6).
(Figure caption: geology classes after reclassification; legend codes are shown in Table 2.)

After data preparation, the obtained parameters were assigned to the three models to conduct the debris flow susceptibility assessment. The details of the three models used in this paper are given below.

Random Forest Regression Model
As a highly flexible machine learning algorithm rising in popularity, random forest (RF) is an ensemble method that combines several decision trees, i.e., a forest of n trees, to increase efficiency and predictive capability [45]. Initially, the RF model was applied in marketing and insurance; nowadays, it is used in many probabilistic models to assess disaster susceptibility. The model is widely used and well known for its excellent prediction performance and reliable processing procedure [46,47]. In addition, it has great tolerance for outliers and data noise and is not prone to overfitting. The RF model was built in the following steps: (1) bootstrap aggregation (bagging) was used to randomly draw N samples from the original training dataset K times, and K decision trees were built based on these samples; (2) for each conditioning parameter with L attribute variables (in this case there are 9,657,550 variables in each parameter), a random constant I was assigned, and I variables were selected from L, with I << L, because an I larger than L would create many null values and lead to invalid results; (3) each node was split according to step 2 until splitting could no longer occur; (4) steps 1-3 were repeated K times to build the random forest. For classification, voting was performed to obtain the optimum result. For regression, the mean value of all trees was used as the optimum prediction result. Unselected data are referred to as out-of-bag (OOB) data, which were used to calculate the error of the model (i.e., the OOB error), taken as the standard deviation error between the predicted and observed values. The random forest regression model was used to assess the probability of debris flow occurrence for each pixel by considering a combination of nine parameters, including curvature, TRI, and distance to the river.
In the training of the RF model, the regions where debris flows had occurred were recorded as 1, whereas the non-debris-flow regions were recorded as 0. After recording, a regression was performed, and the resulting values ranged from 0 to 1. The different combinations of these parameters describe the complexity of the real terrain. An area is said to have a high likelihood of debris flow occurrence if its parameter combinations are similar to those of areas where debris flows have already occurred. Nevertheless, in this study, the RF model was still unable to express the initiation mechanism of the debris flow disaster.
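The bagging and out-of-bag parts of the procedure above can be illustrated with a deliberately reduced sketch. It is not the model used in the study: it uses a single hypothetical terrain feature, one-split trees (stumps), and omits the random selection of I variables in step 2. All names (`fit_stump`, `fit_forest`, `oob_rmse`) and the toy data are assumptions made for illustration.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimising the squared error."""
    pairs = sorted(zip(xs, ys))
    mean_all = sum(ys) / len(ys)
    best_sse, best = float("inf"), (None, mean_all, mean_all)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best_sse:
            best_sse = sse
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    return best  # (threshold, left_mean, right_mean)

def predict_stump(stump, x):
    threshold, lm, rm = stump
    return lm if threshold is None or x <= threshold else rm

def fit_forest(xs, ys, n_trees=25, seed=0):
    """Bagging: each tree sees a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        oob = set(range(n)) - set(idx)  # samples this tree never saw
        forest.append((stump, oob))
    return forest

def predict(forest, x):
    """Regression output: mean over all trees (a value between 0 and 1 here)."""
    return sum(predict_stump(s, x) for s, _ in forest) / len(forest)

def oob_rmse(forest, xs, ys):
    """Root-mean-square OOB error: each sample is scored only by trees that missed it."""
    sq = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        votes = [predict_stump(s, x) for s, oob in forest if i in oob]
        if votes:
            sq.append((sum(votes) / len(votes) - y) ** 2)
    return (sum(sq) / len(sq)) ** 0.5 if sq else float("nan")

# Toy data: label 1 = debris flow pixel, 0 = non-debris-flow pixel (hypothetical)
xs = [i / 39 for i in range(40)]
ys = [1.0 if x > 0.5 else 0.0 for x in xs]
forest = fit_forest(xs, ys)
print(predict(forest, 0.1), predict(forest, 0.9), oob_rmse(forest, xs, ys))
```

Because the targets are 0 or 1 and every leaf predicts a mean of its targets, the ensemble output stays in the range 0 to 1, which matches how the regression result is read as a likelihood in the text.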

Steady-State Infinite Slope Method
As a traditional approach, the SSIS method estimates the factor of safety (FS) to measure slope stability. According to the conservation of mass and Darcy's law, for a given cell of the region with a cumulative drainage area a and length b, the steady-state condition can be expressed by Equations (1) and (2), where α is the slope gradient of the cell, h is the water table depth, z is the thickness of the loose sediments, K is the hydraulic conductivity, q is the groundwater outflow, p is the amount of rainfall, and PE is the potential evapotranspiration, so that (p − PE) is the net rainfall.
Another main assumption of this method is that the infiltration capacity of the soil far exceeds the net rainfall. Hence, overland water infiltrates instantly, so that overland flow can be neglected and only groundwater flow is considered.
Combining Equations (1) and (2), the ratio between the water table depth and the thickness of the loose sediments is given by Equation (3), as postulated by Montgomery and Dietrich [48], where I = (p − PE). The stability of a completely saturated loose sediment layer can be computed based on the method proposed by Skempton and DeLory [49] and expressed as the FS in Equation (4), where ϕ is the internal friction angle, c is the cohesion, and γw and γs are the specific weights of water and saturated loose sediments, respectively. The initiation of rainfall-triggered debris flow is due to high pore pressure reducing the ratio between the resisting and acting stresses. When this ratio is lower than 1, the loose sediment layer becomes unstable, initiating the debris flow. This behavior is described by the common Mohr-Coulomb failure approach to infinite slope stability [50]. However, when the soil thickness is greater than the length and width of the source material, the infinite slope assumption no longer holds. Although the thickness of pyroclastic debris in this study ranged from 0 to 10 m, it was still far smaller than its length and width. Therefore, the SSIS method was adopted to measure the FS of the study area. The SSIS method measures slope stability through a simplified calculation of the pore water pressure [51,52]. The method first assumes that the study area receives rainfall of constant intensity and indefinite duration, causing the water table to reach steady-state conditions. The slope meets the limit equilibrium condition when FS equals 1, and if FS < 1, the slope is recognized as unstable.
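Equations (1) to (4) are not reproduced in this text. For orientation, a standard form that is consistent with the symbol definitions above and with the steady-state infinite slope formulation of Montgomery and Dietrich and of Skempton and DeLory is sketched below. It rests on two assumptions: q is read as a volumetric outflow through a cell of width b, and the ratio h/z is capped at 1 for full saturation. The exact expressions in the source may differ.

```latex
\begin{align}
  I\,a &= q, \qquad I = p - PE \tag{1}\\
  q &= K\,h\,b\,\sin\alpha\,\cos\alpha \tag{2}\\
  \frac{h}{z} &= \min\!\left(\frac{I\,a}{K\,z\,b\,\sin\alpha\,\cos\alpha},\; 1\right) \tag{3}\\
  FS &= \frac{c}{\gamma_s\, z\,\sin\alpha\,\cos\alpha}
        + \left(1 - \frac{h}{z}\,\frac{\gamma_w}{\gamma_s}\right)\frac{\tan\phi}{\tan\alpha} \tag{4}
\end{align}
```

Equation (4) follows from the Mohr-Coulomb criterion with normal stress \(\gamma_s z\cos^2\alpha\), pore pressure \(\gamma_w h\cos^2\alpha\), and shear stress \(\gamma_s z\sin\alpha\cos\alpha\) on a slope-parallel plane at depth z.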
Two limit cases complicate the assessment based on Equation (4): the unconditionally unstable case (UUC) and the unconditionally stable case (USC). The UUC represents a slope that is unstable even in a dry condition, such as a steep slope, whereas the USC represents a slope that is stable even in a completely saturated condition, such as a flat area. The two cases are identified by Equations (5) and (6). In this study, any region falling under these two conditions was not considered and was hence removed from the general calculation. Generally, regions with a slope higher than 45° were eliminated along with flat regions.
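A minimal numerical sketch of the factor of safety and the two limit cases is given below. It assumes the conventional infinite slope expressions (the source's Equations (3) to (6) are not reproduced in this text), angles in radians, a slope angle strictly between 0 and 90 degrees, and consistent units; the function names and the sample values are hypothetical.

```python
import math

def saturation_ratio(I, a, b, K, z, alpha):
    """h/z from steady-state subsurface flow, capped at 1 (fully saturated)."""
    ratio = (I * a) / (K * z * b * math.sin(alpha) * math.cos(alpha))
    return min(ratio, 1.0)

def factor_of_safety(c, phi, gamma_w, gamma_s, z, alpha, h_over_z):
    """Infinite slope FS: cohesion term plus pore-pressure-reduced friction term."""
    cohesion_term = c / (gamma_s * z * math.sin(alpha) * math.cos(alpha))
    friction_term = (1.0 - h_over_z * gamma_w / gamma_s) * math.tan(phi) / math.tan(alpha)
    return cohesion_term + friction_term

def limit_case(c, phi, gamma_w, gamma_s, z, alpha):
    """Classify a cell as UUC, USC, or conditionally stable.

    Assumed forms of the two inequalities:
      UUC: tan(alpha) >  tan(phi) + c / (gamma_s z cos^2 alpha)   (FS < 1 when dry)
      USC: tan(alpha) <= (1 - gamma_w/gamma_s) tan(phi) + c / (gamma_s z cos^2 alpha)
                                                                  (FS >= 1 when saturated)
    """
    cohesion_term = c / (gamma_s * z * math.cos(alpha) ** 2)
    if math.tan(alpha) > math.tan(phi) + cohesion_term:
        return "UUC"
    if math.tan(alpha) <= (1.0 - gamma_w / gamma_s) * math.tan(phi) + cohesion_term:
        return "USC"
    return "conditional"

# Hypothetical cohesionless sediment: phi = 35 deg, gamma_s = 2 * gamma_w, z = 2 m
phi, gw, gs, z = math.radians(35), 9.81, 19.62, 2.0
alpha = math.radians(30)
fs_dry = factor_of_safety(0.0, phi, gw, gs, z, alpha, h_over_z=0.0)
fs_sat = factor_of_safety(0.0, phi, gw, gs, z, alpha, h_over_z=1.0)
print(fs_dry, fs_sat, limit_case(0.0, phi, gw, gs, z, alpha))
```

In this example the 30 degree cell is stable when dry and unstable when saturated, so its stability depends on rainfall; such cells are the ones for which Equation (4) carries information.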
The SSIS method used ten parameters, including cohesion, internal friction angle, and hydraulic conductivity, to calculate the FS value of the slope. The FS value can also indicate the susceptibility of debris flow. In this analysis, a lower FS means that the pixel was more prone to debris flow occurrence.

Random Forest Model-Based Steady-State Infinite Slope Method
Despite the good performance of the RF model and the SSIS method in debris flow susceptibility assessment, both have significant limitations that hinder their suitability. During susceptibility assessment, the RF model considers abundant background information (i.e., topographical information), making it sensitive to terrain prone to debris flow occurrence. However, it does not consider the initiation mechanism of debris flow, and thus physico-mechanical parameters are not included in the calculation. On the other hand, the SSIS method measures slope stability by considering parameters such as cohesion, internal friction angle, and source material, which is closer to the real debris flow initiation mechanism. However, owing to the spatial variability of physico-mechanical parameters, their values are hard to obtain. Because the cumulative drainage area and the slope factors were obtained from the high-precision DEM, only these data were precise enough to indicate the influence of real terrain conditions on debris flow. In this study, the remaining physico-mechanical parameters were assigned based on the land coverage and geological maps. Thus, the precision of the remaining parameters was not sufficient to indicate the influence of real terrain conditions on debris flow, yielding results that are incomplete and inappropriate with respect to the actual situation. Because of these drawbacks, this study uses a random forest model-based steady-state infinite slope method (RF-SSIS) to calculate debris flow susceptibility, as shown in Figure 7.
First, the RF model (Model 1) and the SSIS method (Model 2) are each used to conduct the susceptibility assessment of the study area; at this stage, Equations (5) and (6) are not applied to eliminate the unqualified area of Model 2.
Second, the results from Model 1 and Model 2 were overlapped, so that two values (the FS value and the RF result) were assigned to each pixel. A threshold of 0.5 was set, and filtering was performed to retain the pixels with RF values greater than or equal to 0.5. This eliminated the pixels with unsuitable terrain conditions.
Third, only the FS value was retained and the RF result value was deleted. In this case, only the regions with terrain conditions suitable for debris flow occurrence have an FS value, and the FS value of other regions was 0. After that, the FS result obtained from step 2 was centered at 1 and normalized. Normalization matches the value range of FS with that of the RF result. Furthermore, the slope reaches a critical condition at FS = 1.
Finally, the results obtained from step 3 were overlapped with the RF results obtained from step 1. In this way, each pixel had two values again. Then, the mean of these two values was taken, and a result was obtained with values ranging from 0 to 1. The closer this value is to 1, the more prone the pixel is to debris flow. Step 2 of this method removes the regions where the terrain conditions are not suitable for debris flow occurrence. This is because the RF model considers abundant background information (i.e., terrain information); thus, its result can reveal the influence of the actual terrain on debris flow more objectively. After elimination, the remaining regions (RF ≥ 0.5) were considered to have terrain conditions prone to debris flow. The final mean value indicates the influence of both the physico-mechanical properties and the terrain conditions on debris flow occurrence.
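The overlay, filtering, normalization, and averaging steps above can be sketched per pixel as follows. The normalization function is not specified in the text, so `fs_to_score` below is purely a hypothetical choice: it maps FS = 1 to 0.5 and assumes the conventional reading in which a lower FS means a less stable slope. Grids are plain nested lists and all names are illustrative.

```python
def fs_to_score(fs):
    """Hypothetical normalisation of FS to (0, 1]: FS = 1 -> 0.5, lower FS -> higher score."""
    return 1.0 / (1.0 + fs)

def rf_ssis(rf_grid, fs_grid, threshold=0.5):
    """Combine an RF susceptibility grid with an FS grid of the same shape.

    Step 2: keep the FS information only where the RF value reaches the threshold.
    Step 3: normalise the retained FS values; filtered pixels contribute 0.
    Final:  average the RF value and the normalised FS score per pixel.
    """
    combined = []
    for rf_row, fs_row in zip(rf_grid, fs_grid):
        out_row = []
        for p, f in zip(rf_row, fs_row):
            score = fs_to_score(f) if p >= threshold else 0.0
            out_row.append((p + score) / 2.0)
        combined.append(out_row)
    return combined

# Toy 2 x 2 example with hypothetical values
rf = [[0.9, 0.2],
      [0.5, 0.7]]
fs = [[0.5, 0.5],
      [1.0, 3.0]]
for row in rf_ssis(rf, fs):
    print(row)
```

In the toy grid, the top-right pixel has a low FS but unsuitable terrain (RF = 0.2), so its combined value stays low; this mirrors how the method restricts the mechanical analysis to terrain the RF model considers suitable.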

Receiver Operating Characteristic Curve
The prediction accuracy of a debris flow susceptibility model can be assessed using the receiver operating characteristic (ROC) curve, which plots the true positive rate (i.e., sensitivity) against the false positive rate (i.e., 1 − specificity) and measures the goodness-of-fit of the model prediction. The true positive rate is the ratio between the number of true positive pixels (TP) and the number of positive pixels (P). The false positive rate is the ratio between the number of false positive pixels (FP) and the number of negative pixels (N). TP denotes the pixels classified as debris-flow prone by the model in which debris flow actually occurred. FP denotes the pixels classified as debris-flow prone by the model in which debris flow did not occur. P denotes all the pixels inside the areas where debris flow actually occurred, and N denotes all the pixels outside those areas. P is the sum of TP and FN (false negatives), and N is the sum of FP and TN (true negatives). FN denotes the pixels classified as non-prone by the model in which debris flow actually occurred. TN denotes the pixels classified as non-prone by the model in which debris flow did not occur. The area under the ROC curve (AUC) is used to summarize the ROC result quantitatively. The AUC varies from 0.5 (diagonal line) to 1, with higher values indicating better predictive capability of the model.
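The definitions above translate directly into a small calculation. The sketch below sweeps the classification threshold over all observed scores, computes TP/P and FP/N at each threshold, and integrates the curve with the trapezoidal rule; the function names and the six-pixel toy data are hypothetical.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs for all thresholds.

    labels: 1 where debris flow actually occurred, 0 elsewhere.
    scores: model susceptibility values; a pixel is 'prone' when score >= threshold.
    """
    p = sum(labels)               # P = TP + FN
    n = len(labels) - p           # N = FP + TN
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n, tp / p))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(labels, scores)), 4))  # 0.8889
```

The value 8/9 equals the fraction of (debris flow, non-debris-flow) pixel pairs in which the debris flow pixel receives the higher score, which is the usual interpretation of the AUC.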

Results
Susceptibility assessment results of the RF model and SSIS method are shown in Figure 8a,b, respectively. The regions of the RF results with values greater than 0.5 account for 33.84% of the entire study area (Figure 8a), mainly distributed around the major gullies. The regions with FS values (SSIS results) greater than 1 account for 30.99% of the entire surveyed area (Figure 8b), mainly distributed in the regions with slope ranging from 5 • to 40 • . When assessing the OOB error, after establishing 100 trees, the error rate of the RF model reached a minimum with the tendency of becoming stable. Thus, this study used an RF model with 300 trees and conducted a debris flow susceptibility assessment; as such the results are comparatively credible. The result of the RF-SSIS is shown in Figure 9; in this result, the debris-flow-prone region, i.e., the region with pixel values higher than 0.5, accounted for 26.01% of the entire study area (Figure 10). The elimination of the region was conducted based on the result of Model 1 (RF model); thus, the distribution of the debris-flow-prone area of the RF-SSIS was generally similar, even though the area decreased by 7.83% ( Figure 10). After the debris flow susceptibility assessment was conducted, the remaining 20% of the sample data was used to verify the accuracy of the result. It shows that the prediction accuracy (AUC) of Model 1 (RF model) was 88.48%, and that of Model 2 (SSIS method) was 60.45% ( Figure 11). Thus, we noticed that the gap in the prediction accuracy of these two models was significant. Furthermore, the prediction accuracy (AUC) of the RF-SSIS method reached 90.88%, with the AUC value improving by 2.4% and 30.43% (Figure 11), compared with the RF model and SSIS method, respectively. For the RF model, the proposed RF-SSIS method not only improved the prediction accuracy but also reduced the area of the debris-flow-prone region. 
This is due to the fact that the method identifies and eliminates some regions that are unsuitable for the occurrence of debris flow in terms of the mechanism or physico-mechanical properties. Then, as opposed to the SSIS method, the RF-SSIS method determined the favorable terrain for the occurrence of debris flow based on the abundant background information, with better performance than that provided by the elimination of the unfitted area based on Equations (5) and (6). Afterwards, the susceptibility assessment through the aspect of mechanism and terrain conditions and the result was more reliable and accurate. The result of the RF-SSIS is shown in Figure 9; in this result, the debris-flow-prone region, i.e., the region with pixel values higher than 0.5, accounted for 26.01% of the entire study area (Figure 10). The elimination of the region was conducted based on the result of Model 1 (RF model); thus, the distribution of the debris-flow-prone area of the RF-SSIS was generally similar, even though the area decreased by 7.83% ( Figure 10). After the debris flow susceptibility assessment was conducted, the remaining 20% of the sample data was used to verify the accuracy of the result. It shows that the prediction accuracy (AUC) of Model 1 (RF model) was 88.48%, and that of Model 2 (SSIS method) was 60.45% ( Figure 11). Thus, we noticed that the gap in the prediction accuracy of these two models was significant. Furthermore, the prediction accuracy (AUC) of the RF-SSIS method reached 90.88%, with the AUC value improving by 2.4% and 30.43% (Figure 11), compared with the RF model and SSIS method, respectively. For the RF model, the proposed RF-SSIS method not only improved the prediction accuracy but also reduced the area of the debris-flow-prone region. This is due to the fact that the method identifies and eliminates some regions that are unsuitable for the occurrence of debris flow in terms of the mechanism or physico-mechanical properties. 
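The hold-out validation described above can be illustrated with a minimal sketch. It assumes that each validation pixel carries a binary label (1 for an inventoried debris flow, 0 otherwise) and a susceptibility value, and it computes the AUC through the pairwise-ranking identity. The function name `roc_auc` and the six sample values are hypothetical and are not data from this study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    labels: 1 for debris-flow pixels, 0 for non-debris-flow pixels.
    scores: susceptibility values for the same pixels.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    # A positive/negative pair is ranked correctly when the positive pixel
    # has the higher score; ties count as half a correct ranking.
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical hold-out sample (NOT data from the study).
holdout_labels = [1, 1, 1, 0, 0, 0]
holdout_scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
auc = roc_auc(holdout_labels, holdout_scores)
print(f"AUC = {auc:.4f}")
```

The pairwise form is quadratic in the number of samples, so it suits small validation sets; a sort-based implementation would be preferable for full rasters.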

Discussion
On the basis of multi-source data, RS, and GIS techniques, the debris flow susceptibility of the study area was assessed using three different models (RF, SSIS, and RF-SSIS). The RF model and the SSIS method used in this research represented the heuristic/probabilistic and deterministic models, respectively. The SSIS method (i.e., the deterministic model) predicts the future according to the current situation [53][54][55], whereas the RF model (i.e., the heuristic/probabilistic model) predicts the future based on the past and present [56][57][58][59][60]. Each method has a robust theoretical basis and practical support, but each has outstanding advantages as well as some irreparable disadvantages. The SSIS method couples a Mohr-Coulomb failure mechanism with steady-state lateral flow to calculate susceptibility. When the SSIS method was applied at a regional scale, its disadvantages led to lower prediction performance. The SSIS method assumes that each pixel inside the study area is an independent infinite slope, so the interconnection between pixels is neglected [40,61]. In reality, however, the terrain of the source area and pathway of a debris flow is extremely complex, and every pixel interacts with, and is affected by, its eight neighboring pixels. For instance, before a debris flow occurs, each pixel inside the region has a slope gradient that differs from the others. Because the slope of some pixels may approach 0° or exceed 45°, these pixels were considered non-prone by the SSIS method; yet because they lie inside the source area or pathway region, they are in fact debris-flow prone. In addition, Figure 8b shows that many gently sloping plain regions were recognized as debris-flow-prone areas.
For every single pixel inside these regions, the parameters of the pixel supported its classification as debris-flow prone; from the regional perspective, however, the terrain conditions of these regions cannot support this classification. The RF model relies on readily accessible high-precision data, so it avoids the SSIS method's disadvantage of requiring accurate and reliable physico-mechanical parameters. Parameters such as LSI, TWI, and TRI can reflect the interconnection of pixels. Thus, one great advantage of this model is that, given a sufficient disaster inventory and geo-information data, it can effectively determine the regions whose terrain conditions are suitable for the occurrence of debris flow. This is also its biggest weakness: the RF model is limited to locating the appropriate regions and does not consider the initiation mechanism or physico-mechanical properties. As a result, some regions with suitable terrain are classified as debris-flow prone even though restrictions on physico-mechanical properties, such as hydraulic conductivity or the thickness of the loose sediment, make them unlikely to become debris-flow-prone regions in reality. Even if one tried to assign physico-mechanical parameters to the RF model, the lack of actual values from before the debris flow events would make this impossible.
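How a neighborhood-based parameter encodes pixel interconnection can be sketched for TRI. The example assumes one common formulation, in which the TRI of a cell is the square root of the summed squared elevation differences between the cell and its eight neighbors. Whether this study used exactly this formulation is not stated here, and the function name and elevation grids are hypothetical.

```python
import math


def terrain_ruggedness(dem, row, col):
    """TRI of an interior DEM cell, computed from its eight neighbours.

    Assumed formulation: sqrt(sum((z_neighbour - z_centre)^2)).
    dem is a list of equally long rows of elevations.
    """
    if not (0 < row < len(dem) - 1 and 0 < col < len(dem[0]) - 1):
        raise ValueError("cell must be an interior cell with eight neighbours")
    centre = dem[row][col]
    total = 0.0
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if d_row == 0 and d_col == 0:
                continue  # skip the centre cell itself
            total += (dem[row + d_row][col + d_col] - centre) ** 2
    return math.sqrt(total)


# Hypothetical 3 x 3 elevation windows in metres (NOT data from the study).
flat_window = [[10.0, 10.0, 10.0],
               [10.0, 10.0, 10.0],
               [10.0, 10.0, 10.0]]
rugged_window = [[12.0, 10.0, 8.0],
                 [11.0, 10.0, 9.0],
                 [13.0, 10.0, 7.0]]

tri_flat = terrain_ruggedness(flat_window, 1, 1)
tri_rugged = terrain_ruggedness(rugged_window, 1, 1)
print(tri_flat, round(tri_rugged, 3))
```

Both windows share the same centre elevation, so the difference in TRI comes entirely from the neighboring pixels, which is the kind of information a per-pixel infinite slope calculation does not see.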
The main idea of the RF-SSIS method proposed in this research was to integrate the strong points of the SSIS and RF models to improve the prediction accuracy of debris flow susceptibility assessment from entirely different perspectives. The strong point of the RF model was identifying terrain prone to debris flow [47]; its result values range from 0 to 1, with 0.5 as the critical point.
Regions with values higher than 0.5 were classified as debris-flow-prone regions, and the closer the value is to 1, the more prone the region is to debris flow. Filtering was therefore conducted on the SSIS result: only pixels whose RF value was equal to or greater than 0.5 were retained, and the non-conforming regions were eliminated. After the elimination, the terrain conditions of the remaining regions were considered prone to debris flow. The FS value of each pixel then further quantified the likelihood of debris flow occurrence (i.e., the susceptibility) from the aspects of physico-mechanical properties and initiation mechanism [62]. Even if an eliminated region had a high FS value, its basic terrain conditions were unsuitable, so it did not belong to the debris-flow-prone area. The remaining SSIS result was then overlaid with the result of the RF model and the mean of the two values was taken, giving the RF-SSIS result with values ranging from 0 to 1. The RF-SSIS method thus evaluated debris flow susceptibility from the perspectives of both the deterministic model and the heuristic/probabilistic model. An RF-SSIS value higher than 0.5 meant that the pixel was prone to debris flow, and several situations could lead to such a value. First, the pixel was suitable for the occurrence of debris flow in terms of both terrain conditions and physico-mechanical properties (both the RF and SF values were higher than 0.5); thus, the final susceptibility value was higher than 0.5.
Second, one condition was very suitable for debris flow, e.g., very suitable terrain conditions (RF value higher than 0.7) combined with less suitable physico-mechanical properties (SF value between 0.3 and 0.4); taking both into account, the region was still determined to be debris-flow prone (final value higher than 0.5). If both conditions were unsuitable (both values lower than 0.5) or one condition was extremely unsuitable (value equal to 0), the region could not be a debris-flow-prone area, because the final value could not exceed 0.5.
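The combination rule described above can be sketched per pixel as follows. The sketch assumes that the SSIS result has already been rescaled to the range 0 to 1 (as the SF values quoted above suggest) and that eliminated pixels are simply assigned 0. Neither detail is specified in the text, and the names `rf_ssis` and `is_prone` are hypothetical.

```python
RF_CRITICAL = 0.5     # critical RF value stated in the text
PRONE_CRITICAL = 0.5  # RF-SSIS values above this are debris-flow prone


def rf_ssis(rf_value, ssis_value):
    """Combine the RF and rescaled SSIS values of one pixel.

    Both inputs are assumed to lie in [0, 1]. Pixels whose RF value is
    below 0.5 are eliminated; assigning them 0.0 is an assumption of
    this sketch. Retained pixels receive the mean of the two values.
    """
    if not (0.0 <= rf_value <= 1.0 and 0.0 <= ssis_value <= 1.0):
        raise ValueError("both values must lie in [0, 1]")
    if rf_value < RF_CRITICAL:
        return 0.0
    return (rf_value + ssis_value) / 2.0


def is_prone(value):
    """A pixel is debris-flow prone when its RF-SSIS value exceeds 0.5."""
    return value > PRONE_CRITICAL


# Hypothetical pixels mirroring the situations discussed in the text.
both_suitable = rf_ssis(0.8, 0.7)        # both conditions suitable
terrain_dominant = rf_ssis(0.8, 0.35)    # very suitable terrain, weaker SSIS
terrain_unsuitable = rf_ssis(0.4, 0.9)   # eliminated despite high SSIS value
mechanism_zero = rf_ssis(0.6, 0.0)       # one condition extremely unsuitable

for value in (both_suitable, terrain_dominant, terrain_unsuitable, mechanism_zero):
    print(round(value, 3), is_prone(value))
```

Applied to whole rasters, the same rule would be evaluated cell by cell on two co-registered grids.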
As shown in Figure 11, compared with the RF model, the proposed method improved the determination of true-positive (TP) pixels only slightly but improved the determination of false-positive (FP) pixels significantly. Figures 8 and 9 show the same pattern: the prone area outside the regions where debris flows already existed was reduced significantly, whereas the status inside those regions was basically unchanged. This is because the proposed RF-SSIS method inherited the excellent diagnostic performance of the RF model (i.e., the heuristic/probabilistic model) for regions where a debris flow disaster had already occurred [63], while further refining the debris-flow-prone area within the terrain-suitable area on the basis of physico-mechanical properties. This is why the proposed RF-SSIS method had better prediction performance than the RF model. The prediction accuracy did not improve greatly, however, because, supported by historical data, the RF model already exhibited very high prediction accuracy for areas with existing debris flows [14], so the room for further refinement was limited. Therefore, even though the determination of FP improved significantly, the RF-SSIS method classified just 758 more TP pixels than the RF model did; the small improvement in TP led to only a modest improvement in prediction accuracy. Nevertheless, the RF model, as the most representative heuristic/probabilistic model, has shown excellent performance in previous debris flow susceptibility assessments [64,65], so even a slight improvement in prediction accuracy over the RF model shows that the assessment method has been improved.
The SSIS method conducts the assessment on the basis of the triggering mechanism and physico-mechanical properties, and it has also demonstrated good performance in regional assessments [66,67]. Figure 11 shows that the prediction accuracy of the proposed RF-SSIS method improved significantly over that of the SSIS method. Because the SSIS method calculates susceptibility from current data only, its prediction performance for existing debris flow regions was relatively low. The final result of the RF-SSIS method, however, reflected the influence of both the terrain and the physico-mechanical properties on debris flow; thus, its prediction accuracy for regions where debris flow disasters had already occurred was significantly higher than that of the SSIS method. Meanwhile, the method inherited the performance of the SSIS method in determining areas where debris flow would not occur; thus, the determination of false-positive pixels improved significantly compared with that of the RF model. Together, these effects resulted in the improvement in prediction accuracy.
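The TP/FP comparison discussed above can be illustrated by counting confusion-matrix entries at the 0.5 threshold. The labels and susceptibility values below are hypothetical toy values chosen only to show the pattern (fewer false positives, unchanged true positives); they are not results from this study.

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Count TP, FP, FN and TN, classifying a pixel as prone when score > threshold."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for label, score in zip(labels, scores):
        predicted_prone = score > threshold
        if predicted_prone and label == 1:
            counts["TP"] += 1
        elif predicted_prone and label == 0:
            counts["FP"] += 1
        elif label == 1:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Hypothetical validation pixels (NOT data from the study):
# 1 = inventoried debris flow, 0 = no debris flow.
labels = [1, 1, 0, 0, 0]
rf_scores = [0.9, 0.8, 0.7, 0.6, 0.2]
rf_ssis_scores = [0.85, 0.75, 0.55, 0.4, 0.0]

rf_counts = confusion_counts(labels, rf_scores)
rf_ssis_counts = confusion_counts(labels, rf_ssis_scores)
print("RF:     ", rf_counts)
print("RF-SSIS:", rf_ssis_counts)
```

In this toy case the combined map removes one false positive while keeping both true positives, which mirrors the qualitative behavior reported for Figure 11.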

Conclusions
This study used the proposed random-forest-based steady-state infinite slope (RF-SSIS) method to analyze the debris flow susceptibility of the Changbai Mountain area. Changbai Mountain is located in the east of Jilin Province, where debris flow disasters caused by extreme rainfall events are widespread. Moreover, because it is one of the most famous active volcanoes in China, a future eruption may cause collapse or flood events, which would trigger debris flow disasters. It is therefore crucial to assess the debris flow susceptibility of this area for disaster prevention and mitigation. The RF-SSIS method integrates the random forest model with the steady-state infinite slope method, using the advantage of the RF model (i.e., sensitivity to regions where the terrain conditions are prone to debris flow) and the advantage of the SSIS method (i.e., measuring slope stability from the aspects of physico-mechanical properties and initiation mechanism) to further refine the assessment result. The prediction accuracy of the RF-SSIS method reached 90.88%, an improvement of approximately 2.4% and 30.43% over the RF model and the SSIS method, respectively. According to the results, the debris-flow-prone area of the RF-SSIS result was the smallest among the models. Thus, the results have the characteristics of both the deterministic and heuristic/probabilistic models, and the accuracy was higher than that of either model. Therefore, the RF-SSIS method provides a different perspective for research on debris flow susceptibility assessment. However, further improvement is still needed. Obtaining the values of the physico-mechanical properties and assigning them appropriately in space remains a challenge.
In addition, although this research offered a different perspective by integrating two distinct methods for susceptibility assessment, how to combine them organically is the key to further improving prediction accuracy. The current study therefore presents this as a challenge requiring attention and proposes it as a direction for future work on debris flow and the management of other similar natural disasters.