A Method for Improving Controlling Factors Based on Information Fusion for Debris Flow Susceptibility Mapping: A Case Study in Jilin Province, China

Debris flow is one of the most frequently occurring geological disasters in Jilin province, China, and such disasters often result in the loss of human life and property. The objective of this study is to propose and verify an information fusion (IF) method to improve the factors controlling debris flow as well as the accuracy of the debris flow susceptibility map. Nine layers of factors controlling debris flow (i.e., topography, elevation, annual precipitation, distance to water system, slope angle, slope aspect, population density, lithology, and vegetation coverage) were taken as the predictors. The controlling factors were improved by using the IF method. Based on the original controlling factors and the improved controlling factors, debris flow susceptibility maps were developed using the statistical index (SI) model, the analytic hierarchy process (AHP) model, the random forest (RF) model, and their four integrated models. The results were compared using the receiver operating characteristic (ROC) curve, and the spatial consistency of the debris flow susceptibility maps was analyzed using Spearman's rank correlation coefficients. The results show that improving the controlling factors with the IF method effectively enhances the performance of the debris flow susceptibility maps, with the IF-SI-RF model exhibiting the best performance in terms of debris flow susceptibility mapping.


Introduction
A debris flow can be defined as a fast-moving, transient mass movement of loose material in a steep channel, triggered by rainfall and affecting a wide area. Debris flow, which causes a substantial loss of lives and property [1,2], has become one of the most dangerous geological disasters in the world, and it poses a serious threat to the living environment of humans [3].
Debris flow susceptibility mapping (DFSM) predicts where debris flows are likely to occur based on terrain, geological, and hydrological characteristics, which helps to prevent and reduce the impact of debris flow disasters [4][5][6]. With the development of GIS and remote sensing technology, an increasing number of methods are used for DFSM.
The models for DFSM in previous studies are mainly divided into three categories: statistical models, heuristic models, and soft computing models [7]. For statistical models, Xu et al. [8] used the information value model to analyze the debris flow susceptibility in Sichuan province, China. In addition, the logistic regression model [9], evidence belief function [10], weight of evidence [11], frequency ratio [10,12], and statistical index (SI) [13] have been used extensively. Regarding heuristic

Study Area
The study area is 5.0 m to 2691 m above sea level and is located in two geomorphological units: the eastern Changbai Mountain and the western Songliao Plain. The terrain is high in the southeast and low in the northwest. Bounded by the latitude 42°40′-43°, the study area spans two major structural units: the Tarimand-China-north Korea para platform area and the Tianshan-Xingan trough fold area.
The exposed strata are from the Archean Eon to the Cenozoic Era. The rock mass can be divided into 13 rock groups, according to lithological characteristics. The lithology consists mainly of granite, basalt, glutenite, clay rock, pyroclastic rock, carbonate rock, and gneiss.
The geological environmental conditions in the study area are complex, and the occurrence of geological hazards is the result of multiple factors. After the study area enters the rainy season, debris flow occurs frequently, causing huge economic losses every year. We selected Jilin province as the study area, because such disasters occur frequently in this region; thus, sufficient data can be collected to verify the IF method, which has practical value for our research.

Data Preparation
Data collection is the basis for subsequent analysis [33][34][35]. In this study, a debris flow inventory map containing 868 debris flow events was compiled from the debris flow data recorded before 2012, as provided by the Jilin Provincial Department of Land and Resources, combined with field investigation data (Figure 2). The debris flow events were then randomly divided into training and validation datasets: 70% (608 events) were used for training the models, and 30% (260 events) were used for validation.
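The random 70/30 division described above can be sketched as follows. This is only an illustration, not the authors' procedure: the function name `split_inventory`, the fixed seed, and the use of integer identifiers for the 868 events are all assumptions.

```python
import random


def split_inventory(event_ids, train_fraction=0.7, seed=42):
    """Randomly split event identifiers into training and validation lists.

    The seed is a hypothetical choice made only so the split is repeatable.
    """
    shuffled = list(event_ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# 868 events, identified here by hypothetical integer ids 0..867.
train_ids, validation_ids = split_inventory(range(868))
print(len(train_ids), len(validation_ids))
```

With 868 events, rounding 70% gives the 608 training events and 260 validation events reported in the text.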
According to the characteristics of debris flow and the results of a 1:100,000 geological disaster investigation in the study area, the relationship between debris flow and the geological environment was analyzed. Then, based on experience [36][37][38][39][40][41], nine layers of debris flow controlling factors (i.e., topography, elevation, annual precipitation, distance to water system, slope angle, slope aspect, population density, lithology, and vegetation coverage) were taken as predictors. The spatial database for the study area is shown in Table 1.
The details of the grading standard of each controlling factor are as follows. The hierarchical diagram of the controlling factors is shown in Figure 3.
Topography affects the formation, movement, and scale of debris flow [42]. In this study, the geomorphic units were ordered by the density of debris flow points, from low to high, and the topography was divided into group 1 (Songliao low plain, piedmont slope plain, high plain), group 2 (mountain basin, valley), group 3 (hills, low hills), and group 4 (medium and low mountains, terraces).

The relative height difference determines the gravitational potential energy inside the slope [3]. According to survey statistics, the Changbai Mountain area in the eastern part of the study area is 800-1500 m above sea level: the main peak of Changbai Mountain and its surrounding peaks are above 2000 m, and the central plateau plain is 600-800 m above sea level. According to these critical values, the elevation was divided into four classes: 0-600 m, 600-800 m, 800-1500 m, and >1500 m.
Rainfall plays an important role in slope instability [4]. Debris flow disasters are mostly caused by continuous rainfall. Therefore, the annual precipitation, which is mainly distributed between 600 and 1000 mm in the study area, was selected as a controlling factor [43,44]. The interval from 600 to 1000 mm was split into two equal parts, so that annual precipitation was divided into four categories: 0-600 mm, 600-800 mm, 800-1000 mm, and >1000 mm.
Steep slopes provide loose material for debris flow [45]. The slope angle in the northwest of the study area is mainly 0-5°, while the slope in the southeast is generally near 10° and, in a few areas, more than 20°. According to these critical values, the slope angle was divided as follows: 0-5°, 5-10°, 10-20°, and >20°.
The slope aspect is related to precipitation and topographical trends [1]. According to the influence of light, the slope aspect was divided into shady slope (135°-180°, 180°-225°), semi-shady slope
The population density indirectly reflects the influence of human activities on the geological environment. Human activities can cause vegetation degradation and changes in topography and geomorphology, indirectly increasing the possibility of debris flow. According to the number of people per square kilometer, the population density was divided into four classes: very low (0-10), low (10-100), moderate (100-500), and high (>500).
The lithology controls the stability of the slope and determines the amount of material that is available for debris flow [46,47]. According to the anti-weathering ability of rock, the lithology was divided into four types: soil, soft rock, hard rock, and extremely hard rock.
The lower the vegetation coverage is, the more easily the rock mass becomes weathered, and the more likely it is that debris flow will occur. In this study, the vegetation coverage of the eastern Changbai Mountains is more than 80%, while that of the western plain is less than 20%. The vegetation coverage in the central region is mainly between 20% and 50%. According to these critical values, vegetation coverage was divided into four classes: low (<20%), moderate (20%-50%), high (50%-80%), and very high (>80%).
Rivers erode the rock mass at the bottom of a slope, affecting the stability of the slope. In general, the likelihood of debris flow decreases as the distance to the water system increases [13]. In this study, the distance to the water system was divided into six classes: 0-500 m, 500-1000 m, 1000-1500 m, 1500-2000 m, 2000-2500 m, and >2500 m.
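The grading of the continuous controlling factors can be sketched as a lookup against the class boundaries listed above. The dictionary keys and the function name `grade` are hypothetical, and the treatment of a value that falls exactly on a boundary (assigned here to the higher class) is an assumption, since the text does not specify it.

```python
from bisect import bisect_right

# Class boundaries taken from the grading standards in the text.
# The dictionary keys are hypothetical names chosen for this sketch.
CLASS_BREAKS = {
    "elevation_m": [600, 800, 1500],
    "annual_precipitation_mm": [600, 800, 1000],
    "slope_angle_deg": [5, 10, 20],
    "population_per_km2": [10, 100, 500],
    "vegetation_coverage_pct": [20, 50, 80],
    "distance_to_water_m": [500, 1000, 1500, 2000, 2500],
}


def grade(factor, value):
    """Return the 1-based class index of `value` for the given factor.

    ASSUMPTION: a value exactly on a boundary goes to the higher class.
    """
    return bisect_right(CLASS_BREAKS[factor], value) + 1


print(grade("elevation_m", 450))           # lowest elevation class
print(grade("distance_to_water_m", 2600))  # beyond the last boundary
```

Categorical factors such as topography, slope aspect, and lithology would need an explicit mapping from category to grade rather than numeric boundaries.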

Methodology
The methods that were used in this study can be summarized in three parts. The first part is the IF method for improving the controlling factors. The second part is the models for DFSM. Six models, i.e., SI, AHP, SI-AHP, SI-RF, AHP-RF, and SI-AHP-RF, were used to develop the debris flow susceptibility maps. The third part is the method that is used to verify the results. ROC curve analysis is used to verify the success rate and prediction rate of the debris flow susceptibility maps, and the Spearman's rank correlation coefficients are used to verify the spatial consistency of the debris flow susceptibility maps. Figure 4 shows the flow of the research methods.

The Information Fusion Method
The IF method is based on the Minkowski distance and Dempster-Shafer theory. The Minkowski distance, which is a distance function that is defined on eigenvector space [48,49], is used to measure the similarity between the controlling factors. The Dempster-Shafer theory is used to calculate the credibility degree of each controlling factor. The credibility degree is the result of the IF method, and it is used as a weight to improve the layer of each controlling factor. The calculation process of the credibility degree is as follows:
Step 1: Assign each grade of a controlling factor an integer value in ascending order, from 1 to 4 or from 1 to 6. For example, for annual precipitation, the intervals 0-600 mm, 600-800 mm, 800-1000 mm, and >1000 mm were assigned values of 1, 2, 3, and 4, respectively. A column vector is then obtained for each controlling factor from the values at all the disaster points.
Step 2: Calculate the Minkowski distance $D_{ij}$ between controlling factor i and controlling factor j from their column vectors, where $x_{1i}$ and $x_{2i}$ are the values of a disaster point in the two column vectors and m is a variable parameter.
Step 3: Obtain the similarity measure matrix from the Minkowski distances.
Step 4: Calculate the support degree $Sup(X_i)$ of each controlling factor $X_i$, where n is the number of controlling factors.
Step 5: Calculate the credibility degree $Cfl(X_i)$ of controlling factor $X_i$ from the support degrees $Sup(X_i)$ and $Sup(X_j)$ of controlling factors $X_i$ and $X_j$, where n is the number of controlling factors.
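The five steps can be sketched as below. The Minkowski distance follows its standard definition, but the remaining formulas are assumptions made for illustration only: similarity is taken as one minus the distance scaled by the largest distance, the support degree as the sum of a factor's similarities to all other factors, and the credibility degree as the support degree normalized to sum to one. The function names are hypothetical.

```python
def minkowski_distance(u, v, m=2):
    """Standard Minkowski distance of order m between two grade vectors."""
    return sum(abs(a - b) ** m for a, b in zip(u, v)) ** (1.0 / m)


def credibility_degrees(columns, m=2):
    """Return one credibility degree per controlling factor.

    `columns` holds one grade vector per factor, each with one value per
    disaster point (Step 1).
    """
    n = len(columns)
    # Step 2: pairwise Minkowski distances.
    dist = [[minkowski_distance(columns[i], columns[j], m) for j in range(n)]
            for i in range(n)]
    d_max = max(max(row) for row in dist)
    # Step 3 (ASSUMPTION): similarity = 1 - D / D_max.
    sim = [[1.0 - d / d_max if d_max > 0 else 1.0 for d in row] for row in dist]
    # Step 4 (ASSUMPTION): support = sum of similarities to the other factors.
    support = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    # Step 5 (ASSUMPTION): credibility = support normalized to sum to 1.
    total = sum(support)
    if total == 0:
        return [1.0 / n] * n
    return [s / total for s in support]


# Hypothetical grade vectors for three factors at four disaster points.
example = [[1, 2, 3, 4], [1, 2, 3, 4], [4, 3, 2, 1]]
print(credibility_degrees(example))
```

Under these assumptions, a factor whose grades agree with most other factors receives a higher credibility degree than one that conflicts with them.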

The Statistical Index Model
The SI model is a bivariate statistical method whose result reflects the weights of the controlling factors [50,51]. The weights are obtained by the following formula:
$$W_{ij} = \ln\left(\frac{M_{ij}}{M}\right) = \ln\left(\frac{D_{ij}/P_{ij}}{D_{T}/P_{T}}\right)$$
where $W_{ij}$ is the weight of grade j in controlling factor i; $M_{ij}$ is the debris flow density of grade j in controlling factor i; $M$ is the total density of debris flow within the map; $D_{ij}$ is the number of debris flow events of grade j in controlling factor i; $D_{T}$ is the number of debris flow events in the map; $P_{ij}$ is the number of pixels of grade j in controlling factor i; and $P_{T}$ is the total number of pixels in the map.
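As a minimal sketch, the SI weight of one grade can be computed as the natural logarithm of the grade's debris flow density divided by the density over the whole map, which is the usual form of the statistical index and matches the quantities defined above. The function name and the counts in the example are hypothetical, and returning `None` for a grade without events is an assumption, because the logarithm of zero is undefined and the text does not say how such grades were handled.

```python
import math


def statistical_index(events_in_grade, pixels_in_grade, total_events, total_pixels):
    """SI weight: ln(grade density / map density)."""
    if events_in_grade == 0:
        return None  # ASSUMPTION: left undefined for empty grades
    grade_density = events_in_grade / pixels_in_grade
    map_density = total_events / total_pixels
    return math.log(grade_density / map_density)


# Hypothetical counts: a grade holding 50 of 100 events on 100 of 1000 pixels.
print(statistical_index(50, 100, 100, 1000))
```

A positive weight means the grade holds more debris flow events than its share of the map area would suggest, and a negative weight means fewer.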

The Analytic Hierarchy Process Model
The AHP model is a multicriteria decision-making process and a common method for determining subjective weights [52,53]. Because the pairwise judgments come from different experts, the results carry some uncertainty. This model can be described in four steps, as follows:
Step 1: Establish a hierarchical analysis structure model for DFSM.
Step 2: Construct a pairwise comparison matrix:
$$A = (a_{ij})_{n \times n}$$
where $A$ is the pairwise comparison matrix and $a_{ij}$ is the result of the comparison between controlling factor i and controlling factor j.
Step 3: Calculate the weight vector from the pairwise comparison matrix to determine the weight of each controlling factor.
Step 4: Check the consistency of the weights. When the consistency ratio (CR) is less than or equal to 0.1, the result is considered reasonable:
$$CR = \frac{\lambda_{\max} - n}{(n - 1)\,RI}$$
where $CR$ is the consistency ratio; $\lambda_{\max}$ is the largest eigenvalue of the pairwise comparison matrix; $n$ is the number of controlling factors; and $RI$ is the random consistency index.
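Steps 3 and 4 can be sketched in plain Python as follows. This sketch swaps in the row geometric-mean approximation of the weight vector rather than an exact eigenvector computation, and it estimates the largest eigenvalue from the product of the matrix and the weights; the text does not state which variant the authors used. The random index values are Saaty's commonly tabulated ones, and the function name and example matrix are hypothetical.

```python
import math

# Commonly tabulated random index values for matrix orders 1 to 9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Return (weights, lambda_max, consistency_ratio) for a pairwise matrix."""
    n = len(matrix)
    # Approximate the principal eigenvector by normalized row geometric means.
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    weights = [g / total for g in geo_means]
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ri = RANDOM_INDEX[n]
    if n < 3 or ri == 0:
        return weights, lambda_max, 0.0  # orders 1 and 2 are always consistent
    ci = (lambda_max - n) / (n - 1)
    return weights, lambda_max, ci / ri


# Hypothetical, perfectly consistent 3 x 3 comparison matrix.
A = [[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]]
weights, lambda_max, cr = ahp_weights(A)
print(weights, lambda_max, cr)
```

For a perfectly consistent matrix such as this one, the consistency ratio is zero up to rounding, which passes the 0.1 threshold stated in Step 4.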

The Random Forest Model
The RF model, which can analyze the importance of classification features and determine the weight of each controlling factor, is a classification model that is composed of many decision trees [54,55]. The RF model includes two main kinds of algorithms for measuring importance: the Gini index algorithm and the out-of-bag (OOB) error rate replacement algorithm. The Gini index is used to calculate the impurity of nodes to measure the weight. The calculation process is as follows:
Step 1: Calculate the Gini index of node C:
$$GI_{c} = 1 - \sum_{k=1}^{K} P_{ck}^{2}$$
where $GI_{c}$ is the Gini index of node C; k indexes the K categories of node C; and $P_{ck}$ is the proportion of category k in node C.
Step 2: Calculate the importance $IRF_{ij}^{GINI}$ of factor j in node C, where $GI_{l}$ and $GI_{r}$ represent the Gini values at the two new nodes that branch down from node C.
Step 3: Calculate the weight $IRF_{j}$ of controlling factor j, where n is the number of decision trees and m is the number of controlling factors.
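The Gini-based steps can be illustrated with the sketch below. The Gini index follows its standard definition. For the importance of a split, the sketch uses the sample-weighted impurity decrease that is common in decision-tree implementations, which may differ from the exact expression used by the authors, and the final normalization of importances to sum to one is likewise an assumption. All function names and example data are hypothetical.

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini_decrease(parent, left, right):
    """Impurity decrease of a split, with children weighted by sample share.

    ASSUMPTION: the weighted form is used here; the paper's own expression
    is not reproduced.
    """
    n = len(parent)
    return (gini_index(parent)
            - len(left) / n * gini_index(left)
            - len(right) / n * gini_index(right))


def normalize_importances(raw):
    """Scale summed importances per factor so that they add up to 1."""
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


# Hypothetical node with two debris flow (1) and two non-debris-flow (0) samples.
parent = [1, 1, 0, 0]
print(gini_index(parent))
print(weighted_gini_decrease(parent, [1, 1], [0, 0]))
```

In a full random forest, the decreases of all nodes that split on a given factor would be summed over every tree before normalization.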

The Integrated Model
Integrated models can make up for the shortcomings of individual models because of their ability to solve high-dimensional problems and their high identification accuracy [7], which lead to more accurate results. Therefore, in this study, the SI model, the AHP model, and the RF model were integrated to obtain four integrated models: SI-AHP, SI-RF, AHP-RF, and SI-AHP-RF. The integration of individual models is achieved through the following steps:
Step 1: Obtain the weight of each grade of the controlling factors according to the individual models.
Step 2: Standardize these weights in the data analysis module of Statistical Product and Service Solutions (SPSS) software.
Step 3: Obtain the new weight $\omega$ of each grade of the controlling factors in the integrated model from $\omega_{i}$ and $\omega_{j}$, the standardized weights of each grade of the controlling factors in the individual models.

Combination of the IF Method and Six Models
The credibility degree that was obtained by the IF method is regarded as the improved weight of each controlling factor. Then, according to the standardized weight of each grade of the controlling factors obtained by using the SI, AHP, SI-AHP, SI-RF, AHP-RF, and SI-AHP-RF models, the improved weights of the controlling factors were obtained by the following formula, where $\omega'$ is the improved weight of controlling factor i, $Cfl(X_i)$ is the credibility degree of controlling factor i, and $\omega$ is the standardized weight of controlling factor i that was obtained by the selected model.

Validation of Debris Flow Susceptibility Maps
To compare the performance of different debris flow susceptibility maps and ensure the reliability of the IF method, the ROC curve and the Spearman's rank correlation coefficient were selected.

ROC Curve
The ROC curve is drawn by applying a series of decision thresholds (demarcation values) to a two-category classification, with sensitivity as the ordinate and 1 - specificity as the abscissa. For a model that performs no worse than random guessing, the area under the curve (AUC) is between 0.5 and 1. The larger the area is, the better the model performs [41].
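As an illustration of how the area under the ROC curve can be obtained without drawing the curve, the sketch below uses the equivalent rank interpretation: the AUC equals the probability that a randomly chosen debris flow location receives a higher susceptibility score than a randomly chosen non-debris-flow location, with ties counted as one half. The function name and the example scores are hypothetical, and this is not necessarily how the authors computed their success and prediction rates.

```python
def roc_auc(scores, labels):
    """AUC from pairwise comparisons of positive and negative scores."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical susceptibility scores; label 1 marks a debris flow location.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch but would be replaced by a sort-based computation for large rasters.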

Spearman's Rank Correlation Coefficient
Spatial consistency can be interpreted as the similarity of the debris flow susceptibility assessment results in the spatial distribution. The Spearman's rank correlation coefficient, which is a nonparametric measure of the correlation of variables, was used to evaluate the spatial consistency between two different debris flow susceptibility maps. The calculation process is as follows:
Step 1: Obtain the column vectors according to the grade of debris flow susceptibility of all the debris flow points. The grades of debris flow susceptibility were assigned values of 1 to 4 from small to large.
Step 2: Calculate the difference D between two column vectors:
$$D_{i} = R(X_{i}) - R(Y_{i}), \quad i = 1, \ldots, N$$
where $D$ is the difference between two column vectors; $X$ and $Y$ are the column vectors that were obtained from different debris flow susceptibility maps; $R(X_{i})$ and $R(Y_{i})$ are the values of the debris flow susceptibility grade corresponding to a disaster point; and $N$ is the number of debris flow points.
Step 3: Calculate the correlation between two debris flow susceptibility maps:
$$R_{s} = 1 - \frac{6 \sum_{i=1}^{N} D_{i}^{2}}{N\left(N^{2} - 1\right)}$$
where $R_{s}$ is the Spearman's rank correlation coefficient; $D$ is the difference between two column vectors; and $N$ is the number of debris flow points.
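A minimal sketch of the Spearman computation is given below. Because susceptibility grades from 1 to 4 produce many tied values, the sketch includes a tie-aware variant (Pearson correlation of average ranks) alongside the classical difference formula, which is exact only when there are no ties. The tie-aware variant is swapped in for illustration and is not claimed to be the authors' computation; all function names and example vectors are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_difference_formula(x, y):
    """Classical formula 1 - 6 * sum(D^2) / (N * (N^2 - 1)); exact without ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d_squared / (n * (n ** 2 - 1))


def spearman_tie_aware(x, y):
    """Pearson correlation of the average ranks, valid with tied values."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant vector has no rank correlation")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical susceptibility grades of six points on two maps.
map_a = [1, 2, 2, 3, 4, 4]
map_b = [1, 2, 3, 3, 4, 4]
print(spearman_tie_aware(map_a, map_b))
```

On vectors without ties the two functions agree; with tied grades they generally differ slightly.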

The Results of the Information Fusion Method
The correlation between the selected controlling factors was expressed in terms of the magnitude of the Minkowski distance value. The smaller the Minkowski distance between the controlling factors, the higher the similarity between them. The support degree and the credibility degree of the selected controlling factors are shown in Table 2. The topography has the highest credibility degree value (0.157), and the lowest credibility degree value is 0.086 for elevation.

DFSM Using Six Models Based on Original Controlling Factors
Based on the original controlling factors, the standardized weights, which are shown in Table 3, were calculated by the six selected models, i.e., SI, AHP, SI-AHP, SI-RF, AHP-RF, and SI-AHP-RF. The debris flow susceptibility maps (Figure 5) were finally obtained by superimposing the layers of the controlling factors according to these weights in the ArcGIS software. The susceptibility of debris flow was divided into four grades (low, moderate, high, and very high) according to the natural breaks method [56].

DFSM Using Six Models Based on Improved Controlling Factors
Based on the improved controlling factors, the new standardized weights of the controlling factors, which are shown in Table 4, were calculated by the six selected models, i.e., IF-SI, IF-AHP, IF-SI-AHP, IF-SI-RF, IF-AHP-RF, and IF-SI-AHP-RF. The debris flow susceptibility maps (Figure 6) were finally obtained by superimposing the layers of the improved controlling factors according to these new weights in the ArcGIS software.

Results of the ROC Curve
The success rate comes from the training dataset, and the prediction rate comes from the validation dataset. The pairwise comparison results between the models are shown in Figures 7 and 8, which reveal an improvement in the performance of the models based on the IF method.

Results of Spatial Consistency Analysis
The smaller the Spearman's rank correlation coefficient is, the greater the spatial consistency between the two debris flow susceptibility maps. As shown in Table 5, the Spearman's rank correlation coefficients between the debris flow susceptibility maps that were obtained by the same models, such as SI and IF-SI, are clearly smaller than the coefficients between other maps. This phenomenon indicates that there was a high spatial consistency between the debris flow susceptibility maps that were obtained by the same models, which proves that the IF method is indeed effective. In addition, the results show that there is also a high degree of spatial consistency between IF-SI-AHP-RF, IF-SI-RF, and IF-AHP-RF, and low spatial consistency between the remaining maps.
Table 5. The Spearman's rank correlation coefficients between the debris flow susceptibility maps.

Discussion

Comparison of Debris Flow Susceptibility Maps
As shown in Figures 7 and 8, the success rates and the prediction rates of the twelve debris flow susceptibility maps are all more than 0.8, which indicates that the debris flow susceptibility maps are credible. The IF-SI-RF model, which has the highest success and prediction rates, can be considered to be the best model. In the map produced by the IF-SI-RF model, the low, moderate, high, and very high susceptibility classes cover 37.7%, 21.4%, 25.5%, and 15.4% of the area, respectively, and the areas with high susceptibility are distributed mainly in the middle and low mountain areas in the east of the study area.
The success and prediction rates of debris flow susceptibility maps that are based on the improved controlling factors are significantly better than those that are based on the original controlling factors. As shown in Table 6, the success rates of SI, AHP, SI-AHP, SI-RF, AHP-RF, and SI-AHP-RF increased by 4.1%, 5.5%, 5.7%, 4.3%, 5.1%, and 5.3%, respectively, and the prediction rates increased by 5.1%, 6.3%, 5.7%, 5.2%, 6.5%, and 7.1%, respectively, which proves that the IF method can improve the rationality of the controlling factors. In addition, the results of six different types of models were significantly improved, which shows that the scope of application of the IF method is extensive.

Why the IF Method Can Improve the Controlling Factors
It is also necessary to analyze the reasons why the IF method can improve the controlling factors. First, there is an inevitable correlation between the selected controlling factors, because a controlling factor will have an impact on other controlling factors. For example, in the plain area, the slope angle is relatively small. In addition, when the study area is relatively large, the geological environment conditions are complex and diversified, and the same controlling factors will play different roles in different areas. Thus, there will be conflicts between the controlling factors. The IF method can weaken the correlations and conflicts between the controlling factors. Moreover, when the principal component analysis method is used to analyze the controlling factors, the number of controlling factors is reduced. The IF method, in contrast, makes full use of the original data. It can therefore improve the controlling factors and, in turn, the performance of the debris flow susceptibility maps.

Spatial Consistency Analysis of Debris Flow Susceptibility Maps
The improvement in the success rate and prediction rate of debris flow susceptibility maps is not enough to show the effectiveness of the IF method. Therefore, the spatial consistency of the debris flow susceptibility maps is further analyzed. When the two debris flow susceptibility maps, which were obtained by the same model based on the original controlling factors and the improved controlling factors, show high spatial consistency, the improvement in the controlling factors is persuasive. In contrast, if the two maps differ greatly in their spatial distribution, the improvement in the controlling factors is not credible. Therefore, to ensure the reliability of the IF method, it is necessary to test the spatial consistency between the debris flow susceptibility maps. The final analysis results show that there is good spatial consistency between the two debris flow susceptibility maps that were based on the original controlling factors and the improved controlling factors, which further proves that the IF method is effective.

Conclusions
The IF method is proposed and verified for DFSM by taking Jilin province, China, as a study area. Based on field investigations and historical data, nine debris flow controlling factors were selected and improved by the IF method. The SI, AHP, SI-AHP, SI-RF, AHP-RF, and SI-AHP-RF models were used to develop debris flow susceptibility maps, and the results were compared and verified. The conclusions are as follows.
The success and prediction rates of debris flow susceptibility maps that are based on improved controlling factors are significantly better than those based on the original controlling factors. In addition, according to the results of spatial consistency analysis, the two debris flow susceptibility maps, which were obtained by the same model based on the original controlling factors and the improved controlling factors, have a high spatial consistency. Therefore, the conclusion that the IF method can improve the controlling factors for DFSM is trustworthy.
Regarding the results of the best model, IF-SI-RF, areas with high susceptibility are distributed mainly in the middle and low mountain areas in the east of the study area. The results of this study can provide reliable information for the prevention and management of debris flow disasters in the study area, and they have significance for reducing and avoiding the losses that are caused by debris flow.