Novel Entropy and Rotation Forest-Based Credal Decision Tree Classifier for Landslide Susceptibility Modeling

Landslides are a major geological hazard worldwide. Landslide susceptibility assessments are useful to mitigate human casualties, loss of property, and damage to natural resources, ecosystems, and infrastructures. This study aims to evaluate landslide susceptibility using a novel hybrid intelligence approach with the rotation forest-based credal decision tree (RF-CDT) classifier. First, 152 landslide locations and 15 landslide conditioning factors were collected from the study area. Then, these conditioning factors were assigned values using an entropy method and subsequently optimized using correlation attribute evaluation (CAE). Finally, the performance of the proposed hybrid model was validated using the receiver operating characteristic (ROC) curve and compared with two well-known ensemble models, bagging (bag-CDT) and MultiBoostAB (MB-CDT). Results show that the proposed RF-CDT model had better performance than the single CDT model and hybrid bag-CDT and MB-CDT models. The findings in the present study overall confirm that a combination of the meta model with a decision tree classifier could enhance the prediction power of the single landslide model. The resulting susceptibility maps could be effective for enforcement of land management regulations to reduce landslide hazards in the study area and other similar areas in the world.


Introduction
Landslides, one of the most frequent geological hazards in China, cause thousands of millions of dollars in damage, dozens of casualties, and many geological environment problems every year [1][2][3][4][5]. In order to reduce the losses caused by landslides, predicting the areas where landslides are most likely to occur has become more important [3,6]. Landslide susceptibility research is an important approach to predicting the spatial distribution of landslides, which can be regarded as the spatial probability of landslide occurrence, according to a series of geoenvironmental conditions [7].
A landslide is a pattern of transforming the Earth's surface under the influence of human activities [8][9][10]. A landslide is a complex movement driven by multiple factors, such as altitude, slope angle, rainfall, lithology, and land use [11][12][13]. In recent years, more researchers
Altitude, which greatly influences topographic attributes and controls differences in vegetation distribution, is one of the most commonly used factors in landslide susceptibility studies [70][71][72]. The altitude map (Figure 2a) was derived from ASTER GDEM data with a resolution of 30 m collected from the National Aeronautics and Space Administration (NASA) [73]. In addition, DEM data were used to generate slope angle (Figure 2b [74,75]. Distances to rivers, which can influence the hydrologic processes of a slope, were obtained by buffering the river network from the topographic maps at the 1:50,000 scale (Figure 2i). Meanwhile, distances to roads were constructed by the same method from the road distribution maps (Figure 2j). This can be regarded as the impact of human activities on landslides, which causes a loss of toe support and changes the landform. NDVI is an index that shows the vegetation growth state and coverage. It can affect the stability of landslides through the reinforcement of plant roots and the permeability of surface soil (Figure 2k) [76][77][78].
The physical and mechanical properties of soil vary with soil type. They also influence the infiltration of surface water and the flow of ground water [79,80]. The soil types in the study area

Index of Entropy (IoE)
The entropy of a landslide refers to the extent to which various conditioning factors influence its development [20]. The equations used to calculate the information coefficient $W_j$ representing the weight values for the various conditioning factors [17,18] are as follows:

$$H_{j\max} = \log_2 S_j, \quad S_j \text{ is the number of classes} \tag{3}$$

where $H_j$ and $H_{j\max}$ are the entropy values, $I_j$ is the information coefficient, and $W_j$ is the resulting weight value for the factors as a whole [21].
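As an illustration of how these quantities fit together, the following minimal sketch computes the entropy values and the weight for a single conditioning factor. It assumes the formulation commonly used in the index of entropy literature, which is not fully recoverable from the text above: a landslide density $P_{ij} = b/a$ per class (landslide percentage over area percentage), normalized densities, $H_j = -\sum (P_{ij}) \log_2 (P_{ij})$, $I_j = (H_{j\max} - H_j)/H_{j\max}$, and $W_j = I_j P_j$ with $P_j$ taken as the mean density. The function name and the input values are hypothetical.

```python
import math

def index_of_entropy(area_pct, landslide_pct):
    """Return (H_j, H_jmax, I_j, W_j) for one conditioning factor.

    Assumed formulation (not taken verbatim from this paper):
    P_ij = landslide % / area % for each class of the factor.
    Requires at least two classes and non-zero area percentages.
    """
    density = [b / a for a, b in zip(area_pct, landslide_pct)]   # P_ij
    total = sum(density)
    normalized = [p / total for p in density]                    # (P_ij)
    h = -sum(p * math.log2(p) for p in normalized if p > 0)      # H_j
    h_max = math.log2(len(density))                              # H_jmax = log2 S_j
    info = (h_max - h) / h_max                                   # I_j
    weight = info * (sum(density) / len(density))                # W_j, mean density assumed
    return h, h_max, info, weight

# Hypothetical factor with four equally dense classes: no information, zero weight.
h, h_max, info, weight = index_of_entropy([25, 25, 25, 25], [25, 25, 25, 25])
```

A factor whose landslides are spread evenly across its classes reaches the maximum entropy and therefore receives a weight of zero, whereas a factor whose landslides concentrate in one class receives a large weight.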

Credal Decision Tree
The credal decision tree (CDT) was proposed by Abellán and Moral in 2003 to address classification problems with credal sets [60]. To avoid generating an overly complicated tree during construction, a novel stopping criterion was introduced: branching stops when it would increase the total uncertainty [88]. Based on the Dempster-Shafer theory [89,90], an improved method was created to quantitatively measure the total uncertainty of credal sets. The function used in total uncertainty measurement can be briefly expressed as Equation (7):

$$TU(\xi) = IG(\xi) + GG(\xi) \tag{7}$$

where $\xi$ is a credal set on frame $X$, $TU$ represents the value of total uncertainty, $IG$ is a general function of nonspecificity on the corresponding credal set, and $GG$ is a general function of randomness for a credal set. Abellán and Moral obtained a series of results related to total uncertainty measurement [91,92], and the calculation procedure of $TU$ and the properties of this measure are described systematically in the relevant references. The imprecise Dirichlet model [93] was employed to compute the probability intervals of a variable. Suppose that $Z$ is a variable whose values are represented by $z_j$, and the corresponding probability distribution $p(z_j)$ satisfies Equation (8) [94]:

$$p(z_j) \in \left[ \frac{n_{z_j}}{N + S}, \; \frac{n_{z_j} + S}{N + S} \right] \tag{8}$$

where $n_{z_j}$ is the number of occurrences of the event where $Z = z_j$, $N$ is the sample size, and $S$ is a hyperparameter whose value is usually 1 or 2, according to Walley [93].
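The probability intervals of the imprecise Dirichlet model can be illustrated with a short sketch. It computes, for each value of a variable, the lower bound $n_{z_j}/(N+S)$ and the upper bound $(n_{z_j}+S)/(N+S)$. The function name `idm_intervals` and the class counts are hypothetical and chosen only for illustration.

```python
def idm_intervals(counts, s=1.0):
    """Imprecise Dirichlet model: probability interval for each value.

    counts maps each value z_j to its number of occurrences n_zj;
    s is the hyperparameter S (usually 1 or 2).
    """
    n = sum(counts.values())  # sample size N
    return {
        value: (c / (n + s), (c + s) / (n + s))
        for value, c in counts.items()
    }

# Hypothetical node of a tree with 3 landslide and 7 non-landslide samples.
intervals = idm_intervals({"landslide": 3, "stable": 7}, s=1.0)
# intervals["landslide"] is (3/11, 4/11); intervals["stable"] is (7/11, 8/11)
```

Every interval has width $S/(N+S)$, so the intervals shrink as more samples reach a node, which is why small nodes carry more uncertainty in a credal tree.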

Rotation Forest
Generally, it is considered that classifier ensembles can improve the performance of a single classifier [59]. As a novel technique to construct classifier ensembles, the rotation forest (RF) model has been widely used in landslide susceptibility mapping with the aim of acquiring better prediction accuracy [95][96][97]. Suppose that $X$ is the original training data, written as an $N \times n$ matrix ($N$ is the number of training samples, and $n$ is the number of features). The corresponding class label set and feature set are denoted as $Y$ and $F$, respectively. Assume that $L$ is the total number of decision tree classifiers in the RF algorithm, and the $i$th decision tree is $D_i$ ($i = 1, 2, \ldots, L$). In the RF algorithm, $F$ is first randomly split into $k$ subsets. We can then obtain $F_{ij}$ (the $j$th feature subset for the $i$th decision tree) and $X_{ij}$ (the training data for the features in $F_{ij}$). Based on the bootstrap approach, a nonempty subset of $X_{ij}$ is generated, whose size is 75% of the original training data. In the next step, $M \times 1$ ($M = n/k$) coefficient vectors are obtained by applying a linear transformation to this subset, and the coefficient vectors can be denoted as $a_{ij}^{(1)}, \ldots, a_{ij}^{(M)}$. Subsequently, a sparse rotation matrix $R_i$ can be created, shown as Equation (9):

$$R_i = \begin{bmatrix} a_{i1}^{(1)}, \ldots, a_{i1}^{(M)} & 0 & \cdots & 0 \\ 0 & a_{i2}^{(1)}, \ldots, a_{i2}^{(M)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{ik}^{(1)}, \ldots, a_{ik}^{(M)} \end{bmatrix} \tag{9}$$

In this way, the new training dataset for $D_i$ can be calculated as Equation (10), and all the single decision tree classifiers are trained in parallel [98]:

$$X R_i^a \tag{10}$$

where $R_i^a$ is the new sparse rotation matrix formed by rearranging the columns of $R_i$ according to the original feature set.
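The construction of the rotation matrix can be sketched as follows. This is a simplified illustration rather than the implementation used in the study: it assumes that the linear transformation is principal component analysis (computed here from the covariance matrix of each bootstrap subset), that NumPy is available, and that $k$ does not exceed the number of features. The function name `rotation_matrix` and the random data are hypothetical.

```python
import numpy as np

def rotation_matrix(X, k, rng, sample_frac=0.75):
    """Build one rearranged rotation matrix (R_i^a) for a single tree.

    X is an N x n data matrix, k the number of feature subsets
    (assumed k <= n), rng a numpy random Generator.
    """
    n_samples, n_features = X.shape
    # Randomly split the feature indices into k disjoint subsets F_ij.
    subsets = np.array_split(rng.permutation(n_features), k)
    R = np.zeros((n_features, n_features))
    m = max(2, int(sample_frac * n_samples))
    for idx in subsets:
        # Bootstrap sample of 75% of the rows, restricted to this subset.
        rows = rng.choice(n_samples, size=m, replace=True)
        block = X[np.ix_(rows, idx)]
        # PCA: eigenvectors of the covariance matrix are the coefficient vectors.
        cov = np.atleast_2d(np.cov(block, rowvar=False))
        _, vectors = np.linalg.eigh(cov)
        # Writing the block at the original feature positions yields R_i^a directly.
        R[np.ix_(idx, idx)] = vectors
    return R

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))      # hypothetical data: 40 samples, 6 features
R = rotation_matrix(X, k=3, rng=rng)
X_rotated = X @ R                 # training data for one base classifier
```

Because each block is an orthonormal set of eigenvectors placed on disjoint feature positions, the resulting matrix is orthogonal: it rotates the feature space without discarding information, and each tree in the ensemble sees a different rotation.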

Bagging
Bagging is an abbreviation for "bootstrap aggregating", which is a technique to raise the accuracy of machine learning algorithms [61]. The main idea of bagging is that it generates an ensemble classifier composed of multiple base classifiers that are constructed with various bootstrapped training sets [99]. Bagging not only contributes to decreasing the classification variance but also can improve the generalization capability of the ensemble classifier [61]. It has been proved that the combining rule of base classifiers may have a notable effect on bagging performance [100]. Currently, the majority vote combining rule has been adopted extensively in bagging. The ultimate classification result can be obtained by the formula demonstrated in Equation (11):

$$C^*(x) = \arg\max_{y \in Y} \sum_{i} 1(C_i(x) = y) \tag{11}$$

where the sum runs over the base classifiers $C_i$ and $1(C_i(x) = y)$ is the indicator function.
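The two ingredients of bagging, bootstrap resampling and majority voting, can be sketched in a few lines. The function names and the example predictions are hypothetical; ties are broken here in favour of the label encountered first, which is an arbitrary choice made for this illustration.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (one bootstrapped training set)."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """Return the label predicted by the most base classifiers.

    Ties go to the label that appears first in the list (arbitrary choice).
    """
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(42)
sample = bootstrap_sample(list(range(10)), rng)   # indices of one bootstrap set

# Hypothetical predictions of three base classifiers for one pixel.
label = majority_vote(["landslide", "stable", "landslide"])
```

Each base classifier would be trained on its own bootstrap sample, and the ensemble label for a pixel is the one receiving the most votes.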

MultiBoostAB
MultiBoostAB is the Waikato Environment for Knowledge Analysis (WEKA) version of MultiBoosting [62]. In essence, MultiBoosting is a combination of AdaBoost and wagging, a variant of bagging [101]. AdaBoost and bagging are two widely used techniques in the field of ensemble learning [96,99,102]. It was demonstrated that AdaBoost could remarkably decrease the bias and variance of classifiers, while bagging only had an attenuation effect on variance [103]. However, it has been proved that bagging has better performance in error reduction [61]. Compared with bagging, wagging determines random instance weights with the continuous Poisson distribution. Suppose that $n$ is the number of subcommittees, $I_i$ is a variable to limit the iterations of the $i$th subcommittee, and $T$ represents the number of iterations. Values of $I_i$ can be calculated by Equation (12):

$$I_i = \begin{cases} \lceil i \times T / n \rceil, & i = 1, \ldots, n - 1 \\ T, & i \geq n \end{cases} \tag{12}$$

In the process of iteration, the weighted errors on training sets can be calculated by Equation (13):

$$\varepsilon_t = \frac{\sum_{x_j : C_t(x_j) \neq y_j} \mathrm{weight}(x_j)}{m} \tag{13}$$

$\beta_t$ depends on the corresponding value of error, and the final classification function is shown as Equation (14) [101]:

$$C^*(x) = \arg\max_{y \in Y} \sum_{t : C_t(x) = y} \log \frac{1}{\beta_t} \tag{14}$$

where $\varepsilon_t$ refers to the weighted error, $m$ is the number of examples in the training sequence, $C_t(x_j)$ is the classification result of the $t$th base classifier, and $y_j$ is the true class of $x_j$.
Entropy 2019, 21, 106
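The subcommittee schedule and the weighted vote of MultiBoosting can be sketched as follows. The sketch assumes the defaults described in the MultiBoosting literature, which are not stated in the text above: the number of subcommittees is $\lfloor \sqrt{T} \rfloor$, and $\beta_t = \varepsilon_t / (1 - \varepsilon_t)$. The function names and the example errors are hypothetical.

```python
import math
from collections import defaultdict

def subcommittee_ends(T):
    """Iteration at which each subcommittee terminates (I_i).

    Assumes n = floor(sqrt(T)) subcommittees and I_i = ceil(i * T / n)
    for i < n, with the last subcommittee ending at T. Requires T >= 1.
    """
    n = math.isqrt(T)
    # -(-a // b) is integer ceiling division.
    return [-(-i * T // n) for i in range(1, n)] + [T]

def weighted_vote(predictions, errors):
    """Final class: argmax over labels of the summed log(1 / beta_t).

    Assumes beta_t = eps_t / (1 - eps_t) with 0 < eps_t < 0.5.
    """
    scores = defaultdict(float)
    for label, eps in zip(predictions, errors):
        beta = eps / (1.0 - eps)
        scores[label] += math.log(1.0 / beta)
    return max(scores, key=scores.get)

ends = subcommittee_ends(10)   # [4, 7, 10]

# Hypothetical votes and weighted errors of three base classifiers.
label = weighted_vote(["landslide", "stable", "landslide"], [0.2, 0.1, 0.3])
```

In the example, the single most accurate classifier votes "stable", but the two weaker classifiers together carry slightly more weight, so the ensemble returns "landslide". This shows how the vote depends on the errors and not only on the vote count.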

Selection of Landslide Conditioning Factors
In the present study, the index of entropy model was used to reduce the unevenness among the factors and thereby provide a realistic status of their impact on landslide susceptibility (Table 2) [104]. The results for each class of the conditioning factors were then extracted as inputs to calculate the importance of the conditioning factors and to model landslide susceptibility. The importance of the conditioning factors obtained by correlation attribute evaluation (CAE) [105] is shown in Table 3.
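The idea behind correlation attribute evaluation can be illustrated with a simplified sketch that ranks factors by the absolute Pearson correlation between each factor and the landslide label. Taking the absolute value and treating every factor as numeric are assumptions of this sketch, not details confirmed by the text. The function names, the factor names, and the values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def rank_factors(factors, labels):
    """Sort factors by |correlation with the class label|, highest first."""
    merits = {name: abs(pearson(values, labels)) for name, values in factors.items()}
    return sorted(merits.items(), key=lambda item: item[1], reverse=True)

# Hypothetical data: 1 = landslide, 0 = non-landslide.
labels = [1, 1, 0, 0]
factors = {
    "ndvi": [0.1, 0.2, 0.3, 0.4],
    "noise": [1, 0, 0, 1],
}
ranking = rank_factors(factors, labels)
```

A factor with a merit near zero carries little linear information about landslide occurrence and is a candidate for removal before modeling.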

Generation of Landslide Susceptibility Maps
After the training and validation processes of landslide models, landslide susceptibility maps were obtained in the following two steps. First, the probability of landslide occurrence (PLO) for each pixel was generated using the probability distribution functions of the CDT and RF-CDT models. In the second step, PLOs were reclassified by mathematical methods, such as standard deviation, equal interval, natural break, geometric interval, and quantile. In this study, the quantile method was exploited to divide the PLOs into five categories: very low, low, moderate, high, and very high. The quantile method is a standard classification method in ArcGIS software that provides a more comprehensive analysis for both linear and nonlinear models in practical problems and makes a useful supplement for general regression models [106,107]. Therefore, the landslide susceptibility maps (LSMs) in this research were classified by the quantile method. Figures 3 and 4 present the results of LSMs for the CDT and RF-CDT models, respectively.
To further demonstrate the feasibility of the RF-CDT model in the landslide susceptibility study, two ensemble models, combining the CDT model with bagging and MultiBoostAB, were introduced as benchmark models. The establishment, training, validation, and assessment processes of the benchmark models were the same as with the RF-CDT model, and landslide susceptibility maps generated by the benchmark models are shown in Figures 5 and 6. Area percentages of landslide susceptibility classes of all models are shown in Figure 7.
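The quantile reclassification step can be sketched with the Python standard library. This is only an approximation of the idea (classes holding roughly equal numbers of pixels). It does not reproduce the exact break-point algorithm of ArcGIS, and the PLO values are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles

LABELS = ["very low", "low", "moderate", "high", "very high"]

def quantile_classes(plo_values):
    """Assign each PLO value to one of five equal-count susceptibility classes."""
    # Four cut points split the values into five groups of similar size.
    cuts = quantiles(plo_values, n=5, method="inclusive")
    return [LABELS[bisect_right(cuts, value)] for value in plo_values]

# Hypothetical PLO values for ten pixels.
plo = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
classes = quantile_classes(plo)
# Two pixels fall into each of the five classes.
```

Because the breaks follow the ranks of the values, every class covers about the same share of pixels regardless of how the PLO values are distributed.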

Model Validation and Comparison
In landslide susceptibility modeling, it is essential to validate and compare the quality of results. Validation of the results is regarded as one of the most important aspects of landslide susceptibility research, and the assessment results will not show scientific significance without validation [34,108]. In this paper, the prediction ability of the four models was evaluated using the receiver operating characteristic (ROC) curve [109,110]. The ROC curves and the parameters of the ROC curves using the training dataset are shown in Figure 7 and Table 4, respectively. Similarly, the ROC curves and the parameters of the ROC curves using the validation dataset are shown in Figure 8 and Table 5, respectively. In the training dataset, the RF-CDT model had the highest area under the ROC curve (AUC) value (0.813), followed by the bag-CDT model (0.809), the MB-CDT model (0.788), and the CDT model (0.779). The model with the highest AUC value for the validation dataset was RF-CDT (0.759), followed by bag-CDT (0.740), MB-CDT (0.729), and CDT (0.663). It can be concluded that the RF-CDT model had the best performance in both training and validation processes. All the evaluation results were obtained at a 95% confidence interval (CI).
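The AUC used to compare the models can be computed directly from its rank interpretation: it is the probability that a randomly chosen landslide sample receives a higher score than a randomly chosen non-landslide sample, with ties counted as one half. The following minimal sketch uses this pairwise definition. The function name and the data are hypothetical and unrelated to the values reported in this study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for landslide, 0 for non-landslide; scores: predicted PLO.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute the AUC")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical validation samples: three of the four pairs are ordered correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])   # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the training and validation values above.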

Discussion
Landslides have caused much financial loss and have threatened the safety of humans all over the world [111]. Various approaches have been used to study landslide susceptibility, and the research methods have evolved from simple statistical models to machine learning models. In order to achieve precise evaluation results, the use of new models in landslide susceptibility research has become more important. In this study, we chose the credal decision tree (CDT) as the basic model and combined it with rotation forest (RF), bagging (bag), and MultiBoostAB (MB) models to build ensemble models.
As there are no standards for selecting landslide conditioning factors [112], how to determine the conditioning factors has become a very important issue. In order to deal with it reasonably, the selection of conditioning factors in this paper was based on the geoenvironmental characteristics of the study area, the mechanism of landslide occurrence, and similar landslide susceptibility studies.
According to the importance analysis by the CAE model, it can be concluded that the NDVI, a commonly used conditioning factor that indicates the state of plant growth in the study area, is the most important landslide conditioning factor. By definition, NDVI values lie in the interval [−1, 1], and the higher the value, the better the vegetation growth. The study area lies in hilly and valley regions of the Weibei dry plateau, one of the key areas of soil and water loss of Shaanxi Province, and rainfall is mainly concentrated from July to September. Therefore, under the joint action of uneven distribution of rainfall and serious soil erosion, vegetation growth in the study area is relatively poor, and the NDVI interval is [−0.09, 0.39]. In addition, many studies have indicated that plants play a positive role in preventing landslide occurrence because their root systems can increase soil strength and reduce water infiltration [113][114][115].
In the case of land use, the average merit is 0.191. It is well known that land use has a close relationship with human activities and may affect soil and water loss, precipitation infiltration, and surface structure [116]. It can be seen in Figure 2m that farmland is the main type of land use. As the study area is located in the Weibei dry plateau, the infiltration of agricultural water will increase slope mass and reduce soil strength, which makes landslides occur more easily. It can be seen in Figures 4-7 that most landslides occur in low-altitude areas with nearby linear conditioning factors, such as distance to roads and rivers. Correspondingly, we can find that landslides decrease as we move away from roads and rivers. These results can also be found in similar studies [117,118].
According to the parameters of the ROC curves of the training and validation datasets, the RF-CDT model reflected the spatial distribution of landslides most accurately, while the CDT model had the lowest accuracy. The rotation forest model is a powerful machine learning method that has been widely used in many fields and performed well in previous landslide susceptibility studies [32,49,119]. The bag-CDT model performed worse than the RF-CDT model, with AUC values of 0.809 and 0.740 for the training and validation datasets, respectively. The MB-CDT model ranked third, with training and validation dataset AUC values of 0.788 and 0.729, respectively.
In a nutshell, the ensemble models in this paper produced more promising results than the single evaluation models in current studies [96,120,121]. Based on the CDT model combined with the RF, bag, and MB models, landslide susceptibility in Linyou County was studied. As mentioned above, the RF-CDT model performed best in this research compared to the other models. This raises the question of why the AUC values increased so markedly when the CDT model was combined with the RF model. Perhaps the answer lies in the notion of "slightly underperformed," which means that there should be a threshold for positive synergy among models [122,123]. In this paper, the RF model had the best cooperation with the CDT model. However, limits in different models have different interconnection rules that may be difficult to determine, especially when facing a series of factors with various ranges.

Conclusions
The present study allowed us to reach the following conclusions:
(1) The importance of conditioning factors was quantitatively defined by CAE. All 15 conditioning factors were applied to create the landslide susceptibility maps, and NDVI had the highest importance of all the conditioning factors.
(2) The proposed hybrid RF-CDT model, with AUC values of 0.813 and 0.759, achieved good results in the training and validation phases compared to the single CDT model.
(3) The performance of the proposed hybrid RF-CDT model was also compared with the hybrid bag-CDT and MB-CDT models, and the results of AUC, SE, and CI at 95% also indicate that the RF-CDT model is a promising method.
As a final remark, it is worth noting that the present study indicates that machine learning ensemble frameworks are promising techniques, and the obtained susceptibility maps may be employed to manage land use planning and landslide risk mitigation.
Author Contributions: Q.H., Z.X., R.L., S.Z. and W.C. collected field data and conducted the landslide mapping and analysis. Q.H., Z.X., R.L., S.Z. and W.C. wrote the manuscript. S.L., N.W. and B.P. provided critical comments in planning this paper and edited the manuscript. All the authors discussed the results and edited the manuscript.