Landslide Susceptibility Mapping Using Rotation Forest Ensemble Technique with Different Decision Trees in the Three Gorges Reservoir Area, China

Abstract: This study presents a new ensemble framework for predicting landslide susceptibility by integrating decision trees (DTs) with the rotation forest (RF) ensemble technique. The proposed framework includes four main steps. First, training and validation sets are randomly selected from historical landslide locations. Then, landslide conditioning factors are selected and screened by the gain ratio method. Next, several training subsets are produced from the training set, and a series of trained DTs is obtained by using a DT as a base classifier coupled with the different training subsets. Finally, the resultant landslide susceptibility map is produced by combining all the DT classification results using the RF ensemble technique. Experimental results demonstrate that the performance of all the DTs can be effectively improved by integrating them with the RF ensemble technique. Specifically, the proposed ensemble methods achieved area under the curve (AUC) values 0.012–0.121 higher than those of the DTs. Furthermore, the proposed ensemble methods outperformed the most popular ensemble methods by 0.005–0.083 in terms of AUC. Therefore, the proposed ensemble framework is effective in further improving the spatial prediction of landslides.


Introduction
Landslides are one of the most serious natural disasters in the world, causing a large number of casualties each year [1]. Therefore, it is crucial to perform landslide susceptibility mapping (LSM) to prevent and reduce damage. In recent decades, many methods for landslide susceptibility analysis have been proposed, and they can be divided into two main groups: qualitative and quantitative [2]. Qualitative methods such as weighted linear combination [3], multi-criteria evaluation [4] and ordered weighted averaging [5] have been widely used for LSM. Quantitative methods mainly depend on the relationship between influencing factors and landslide occurrences and can be grouped into two categories: physically-based methods and data-driven approaches. Physically-based methods assess landslide susceptibility using simplified physical modeling strategies [6], while data-driven approaches develop a functional relationship between conditioning factors and historical landslide events [7]. Data-driven approaches include weights of evidence [8–10], frequency ratio [11], random forest [12,13], artificial neural network (ANN) [14,15], convolutional neural networks [16,17], and support vector machine (SVM) [18–20].
Ensemble frameworks have recently become a hot topic in the field of machine learning and pattern recognition [21]. Many studies have shown that a combination of classifiers performs better than individual classifiers [22–24]. The ensemble techniques have also been used in

Description of Study Area
The study area is located in China, covers an area of 446.32 km², and has an altitude range of 80-2000 m above mean sea level (Figure 1). The Zigui-Badong section of the Three Gorges Reservoir lies in the subtropical monsoon climate zone, and the study area has abundant rainfall and humidity. During 2001-2010, the annual average precipitation in Zigui and Badong Counties was 944.5 and 1069.2 mm, respectively. Abundant rainfall is one of the main conditioning factors for the frequent occurrence of geological disasters in the reservoir area [40].

Preparation of the Database
Historical landslide locations were employed to construct the landslide susceptibility models. Consequently, an accurate landslide inventory map is particularly important for LSM. In this study, a total of 196 landslide locations were identified through field surveys, historical landslide records and visual interpretation of Google Earth images. The distribution of the landslide locations is shown in Figure 1. Constructing the landslide susceptibility models requires training and validation sets. In this work, the 196 landslide locations were randomly divided into two parts: 70% (137 locations) were used as training samples and the remaining 59 locations were used for validation. To represent non-landslide areas, the same numbers (137 and 59) of non-landslide locations were randomly selected and added to the training and validation sets, respectively.
The selection of conditioning factors is an important step of LSM. There are many conditioning factors that trigger landslides [1]. In this study, 20 landslide conditioning factors were selected based on expert knowledge and literature review [35,41–45]: altitude, aspect, catchment area, catchment slope, curvature, distance to rivers, slope, slope form, terrain position index (TPI), terrain ruggedness index (TRI), terrain surface convexity (TSC), terrain surface texture (TST), topographic wetness index (TWI), lithology, distance to faults, land use, rainfall, magnitude, normalized difference vegetation index (NDVI), and normalized difference water index (NDWI). Table 1 shows the information on the landslide conditioning factors.
Table 1. Landslide conditioning factors: derivation method and data source.
Land use: classified into five land use classes using a support vector machine method, with an overall accuracy of 93.95%. Data source: Landsat 7 ETM+ images.
NDVI, NDWI: calculated from remote sensing images using the ENVI software [49,50].
Magnitude: raster data generated using a Kriging interpolation method. Data source: historical earthquakes and instrument-monitored data since 1970.
Rainfall: generated using an inverse distance weighted spatial interpolation method. Data source: 6 rainfall stations.
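The random 70/30 division described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the location identifiers, the seeds, and the helper name `split_locations` are hypothetical and are not taken from the study, which does not describe its sampling code.

```python
import random

def split_locations(ids, train_fraction=0.7, seed=42):
    """Randomly split location ids into (training, validation) lists."""
    rng = random.Random(seed)          # fixed seed only for reproducibility
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # 196 * 0.7 -> 137
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers for 196 landslide and 196 non-landslide locations.
landslide_ids = range(196)
non_landslide_ids = range(1000, 1196)

ls_train, ls_val = split_locations(landslide_ids, seed=42)
nl_train, nl_val = split_locations(non_landslide_ids, seed=7)

# Label 1 = landslide, 0 = non-landslide.
train = [(i, 1) for i in ls_train] + [(i, 0) for i in nl_train]
val = [(i, 1) for i in ls_val] + [(i, 0) for i in nl_val]
print(len(train), len(val))
```

Splitting the two classes separately keeps the training and validation sets balanced (137 + 137 and 59 + 59 samples), which matches the sample counts reported in the text.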

Methodology
The proposed framework is based on the integration of DTs and the RF ensemble technique. The flowchart is illustrated in Figure 2, and the three main steps of the framework are as follows:
(1) Data acquisition and preprocessing. In this work, historical landslide events and landslide conditioning factors are acquired to perform spatial prediction of landslide occurrence. Specifically, the historical landslide locations are derived from past landslide records and remote sensing images. Meanwhile, a series of related conditioning factors are selected for LSM and screened using the gain ratio (GR) method. Afterwards, these data are resampled to the same grid size. Finally, the training and validation sets are produced for constructing and testing the landslide prediction methods.
(2) Construction of prediction methods and production of landslide susceptibility maps. The ensemble framework is first applied to optimize the original datasets using the training set. Then, the DT base classifier is applied to the screened datasets for spatial prediction of landslides. Next, the RF ensemble technique is used for landslide susceptibility modeling. Finally, landslide susceptibility maps are obtained using the constructed prediction methods.
(3) Verification and comparison. The predictive performance of the proposed ensemble framework is evaluated using the objective criteria of the receiver operating characteristic (ROC) curve and the AUC.

Gain Ratio Method
Gain ratio (GR) is a widely applied factor selection method in LSM. It determines the importance of each landslide conditioning factor by assigning a weight to each feature based on its predictive capability [51]. Let T be a training set with n instances. The GR of attribute X is calculated as follows:
$$\mathrm{GainRatio}(X) = \frac{\mathrm{Gain}(X)}{\mathrm{SplitInfo}_X(T)}$$
where Gain(X) is the information gain of attribute X and SplitInfo_X(T) is the split information value. In the standard formulation, they are calculated by the following equations:
$$\mathrm{Gain}(X) = H(T) - \sum_{i=1}^{m} \frac{|T_i|}{n} H(T_i)$$
$$\mathrm{SplitInfo}_X(T) = -\sum_{i=1}^{m} \frac{|T_i|}{n} \log_2 \frac{|T_i|}{n}$$
where m represents the number of values of attribute X and T_i is the subset of T in which X takes its i-th value. H(T) is the expected information entropy of data set T and is defined as follows:
$$H(T) = -\sum_{j} p_j \log_2 p_j$$
where p_j is the prior probability of class j in T. The final calculated average merit (AM) reveals the importance of each conditioning factor to the occurrence of landslides.
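As a rough illustration of the gain ratio computation, the following Python sketch evaluates the textbook (C4.5-style) definition for one discrete attribute. It assumes that continuous conditioning factors have already been discretized into classes. The function names and the toy data are hypothetical, and the averaging that produces the AM value is not reproduced here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(T) of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Gain ratio of one discrete attribute with respect to the class labels."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)      # partition T by attribute value
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional        # information gain
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

# Toy data: 1 = landslide, 0 = non-landslide (made up for illustration).
labels = [1, 1, 0, 0]
distance_class = ["near", "near", "far", "far"]   # perfectly informative
aspect_class = ["N", "S", "N", "S"]               # uninformative

print(gain_ratio(distance_class, labels))
print(gain_ratio(aspect_class, labels))
```

In this toy case the informative attribute separates the two classes completely, while the uninformative one leaves the class entropy unchanged, so its gain ratio is zero.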

Alternating Decision Tree
The AdaBoost algorithm is an important machine learning technique [52]. It is therefore natural to combine boosting with DTs to obtain reliable classifiers whose results are based on majority voting over several DTs. For instance, boosted versions of two popular DTs, CART and C4.5, have been widely used. However, interpreting these classifiers is a challenging problem. The alternating decision tree (ADT) combines DTs with boosting and produces classification rules that are easier to interpret [53].

Forest by Penalizing Attributes
Recently, a novel decision forest approach called forest by penalizing attributes (FPA) was presented [54]. This approach has the following advantages. First, it can build a series of high-precision DTs by drawing on all non-class attributes in a dataset rather than only a subset of them. Second, penalties are imposed on the attributes used in the current tree when producing the following trees, which encourages greater diversity. Finally, the approach gradually increases the weights of attributes that have not been tested in the following tree(s). Consequently, it aims to achieve high prediction accuracy.

Functional Tree
The main idea of the functional tree (FT) framework is to build multivariate trees for classification and regression problems [55]. In this framework, functional decision nodes are produced when growing the tree and functional leaf nodes are produced when pruning it. Using functional decision nodes can be regarded as a bias reduction process, and using functional leaf nodes as a variance reduction process. Furthermore, multivariate methods benefit from using linear functions at both decision nodes and leaves, especially for large datasets.

Logistic Model Tree
The logistic model tree (LMT) is a classification tree method that integrates a standard DT classifier with logistic regression (LR) functions and has been reported to perform better than simple LR or the C4.5 model [56]. In the LMT algorithm, a DT is defined as a tree structure with LR functions at the leaves. The approach employs the LogitBoost algorithm to build an LR function at each node and the C4.5 algorithm for pruning. LogitBoost also provides a strategy for choosing the attributes involved in the LR function.

Hoeffding Tree
The Hoeffding tree, also known as the very fast decision tree (VFDT), is an incremental DT induction algorithm that is capable of learning from large data streams under the assumption that the data distribution is fixed over time [57]. It incrementally grows a DT based on the theoretical guarantees of the Hoeffding bound, which quantifies how many observations are needed to estimate a statistic within a specified accuracy. This theoretical property helps the algorithm perform well relative to other incremental DT methods while requiring less computational time.
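To make the role of the Hoeffding bound concrete, the sketch below uses its usual form, ε = sqrt(R² ln(1/δ) / (2n)), where R is the range of the split criterion, δ is the allowed probability of error, and n is the number of observations seen at a leaf. The function names and the example numbers are assumptions for illustration only; the paper does not state the parameter values used for the bound.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the true mean lies within epsilon of the sample mean
    with probability at least 1 - delta after n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def can_split(best_gain, second_gain, value_range, delta, n):
    """Split a leaf once the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

# For a two-class problem the information gain ranges over [0, 1], so R = 1.
for n in (100, 1000, 10000):
    print(n, round(hoeffding_bound(1.0, 1e-7, n), 4))

print(can_split(best_gain=0.30, second_gain=0.25, value_range=1.0, delta=1e-7, n=100))
print(can_split(best_gain=0.30, second_gain=0.25, value_range=1.0, delta=1e-7, n=10000))
```

The bound shrinks as more observations arrive, so a leaf that cannot be split confidently after a few samples may be split later with the same observed gain difference.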

Rotation Forest Ensemble
RF is a classifier ensemble method that uses independently trained DTs [58] and aims at constructing accurate and diverse classifiers. Unlike random forest, each tree in RF is trained on the entire dataset in a rotated feature space. In tree-based prediction methods, the splitting hyperplanes are always parallel to the feature axes. Thus, any rotation of the axes may produce a very different tree.
Let M be the number of DTs. RF trains the M DTs independently, each on a different dataset obtained by feature extraction. Let x = {x_1, x_2, ..., x_n}^T be a sample characterized by n attributes, D = {D_1, D_2, ..., D_M} be the ensemble of M classifiers, X be the N × n matrix of training instances, and F be the feature set. The RF algorithm is briefly introduced as follows:
(1) To construct the training set for a base classifier D_i, the feature set F with n features is randomly divided into K subsets, so that each feature subset consists of n/K features.
(2) For each feature subset, principal component analysis (PCA) is applied to the training data restricted to that subset (typically to a bootstrap sample of it) to obtain the principal component (PC) coefficients C_{i,j}.
(3) Repeat the previous step to obtain the K sets of PC coefficients and place them in the rotation matrix R_i as follows:
$$R_i = \begin{bmatrix} C_{i,1} & 0 & \cdots & 0 \\ 0 & C_{i,2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & C_{i,K} \end{bmatrix}$$
(4) Multiply the original dataset X by this matrix, with its columns rearranged to match the original feature order, to obtain the new feature dataset, and train the base classifier on it.
(5) Repeat the previous steps to obtain the M trained base classifiers.
(6) For a given unknown sample, each base classifier produces a class probability value, and all the class probabilities are combined to obtain the final prediction probability.
It should be noted that different partitions of the feature set yield different extracted features. Therefore, RF can construct accurate and diverse classifiers for landslide prediction.
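The rotation step can be sketched with NumPy as follows. This is a simplified illustration under several assumptions, not the implementation used in the study: it builds a single rotation matrix from PCA on bootstrap samples of random feature subsets, it omits the random class-subset selection of the original rotation forest formulation, and it leaves out the training of the base tree on the rotated data. The function name, the sample fraction of 0.75, and the synthetic data are hypothetical.

```python
import numpy as np

def rotation_matrix(X, K, rng, sample_fraction=0.75):
    """Build one n x n rotation matrix from PCA on K random feature subsets."""
    N, n = X.shape
    perm = rng.permutation(n)                 # random partition of the features
    subsets = np.array_split(perm, K)
    R = np.zeros((n, n))
    for idx in subsets:
        # Bootstrap sample of the rows, restricted to this feature subset.
        rows = rng.choice(N, size=max(2, int(sample_fraction * N)), replace=True)
        sub = X[np.ix_(rows, idx)]
        cov = np.atleast_2d(np.cov(sub, rowvar=False))
        _, vecs = np.linalg.eigh(cov)         # columns are the PC coefficients
        # Place the block at the original feature positions, so the columns
        # of R already follow the original feature order.
        R[np.ix_(idx, idx)] = vecs
    return R

rng = np.random.default_rng(0)
X = rng.normal(size=(274, 20))                # synthetic stand-in: 274 samples, 20 factors
R = rotation_matrix(X, K=5, rng=rng)
X_rot = X @ R                                 # rotated training data for one base tree
print(R.shape, X_rot.shape)
```

Because all principal components are kept, the transformation is a pure rotation (possibly with reflections) of the feature space: no information is discarded, but axis-parallel tree splits in the rotated space correspond to oblique splits in the original one. A full ensemble would repeat this with a fresh partition for each of the M trees and average their class probabilities.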

Model Evaluation Criteria
The performance of prediction methods is commonly assessed using the ROC curve technique [59]. The curve is constructed by plotting the true positive (TP) rate against the false positive (FP) rate [60,61]. Furthermore, the area under the ROC curve (AUC) has often been applied to quantitatively assess the performance of LSM methods [62–64]. More specifically, an LSM method is considered good if its AUC value is close to 1 [65,66]. Meanwhile, two statistical criteria, overall accuracy (OA) and the Matthews correlation coefficient (MCC), were also used in our experiments. In their standard form, they are defined as follows:
$$\mathrm{OA} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
where TP and TN (true negative) denote the numbers of landslide and non-landslide samples that are correctly classified, whereas FP and FN (false negative) represent the numbers of non-landslide and landslide instances that are misclassified, respectively. In addition, the Chi-square test is another statistical approach that is widely applied to assess whether the difference between models is significant [67]. The Chi-square and p values are calculated and ranked. If the Chi-square value is higher than the critical value of 3.841 (one degree of freedom at the 0.05 significance level) and the p value is smaller than 0.05, the difference between the methods is significant [68].
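The evaluation criteria above can be computed directly from confusion-matrix counts and predicted scores. The following Python sketch uses the standard definitions of OA and MCC and a rank-based AUC (the probability that a randomly chosen landslide sample scores higher than a randomly chosen non-landslide sample). The counts and scores are made up for illustration and are not results from the study.

```python
import math

def overall_accuracy(tp, tn, fp, fn):
    """Share of all samples that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc(scores, labels):
    """Rank-based AUC: positive/negative pairs won, with ties counted as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical validation outcome for 59 landslide and 59 non-landslide samples.
tp, fn, tn, fp = 50, 9, 48, 11
print(round(overall_accuracy(tp, tn, fp, fn), 4))
print(round(mcc(tp, tn, fp, fn), 4))

# Hypothetical susceptibility scores (1 = landslide, 0 = non-landslide).
print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a validation set of this size; a sort-based version would be preferable for large rasters.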

Importance Evaluation of Landslide Conditioning Factors
In this work, the predictive ability of all the landslide conditioning factors was evaluated before constructing the landslide susceptibility framework. Generally, a factor with a higher AM value is more important to landslide susceptibility modelling. In the present study, any factor with an AM value of zero is removed from further analysis. The AM value of each conditioning factor is shown in Figure 3. It can be observed that distance to rivers and altitude have the highest prediction capability, with AM values of 0.3624 and 0.2744, respectively, indicating that these two factors are more significant than the other factors. Most of the other factors have AM values between 0.0105 and 0.1006, including NDWI, NDVI, land use, TST, distance to faults, TWI, TRI, lithology, curvature, catchment area, TSC, TPI, slope, catchment slope, slope form and magnitude. The AM values of the remaining factors, aspect and rainfall, are positive but less than 0.01, indicating that they contribute little to the methods. Because no factor had an AM value of zero, all the conditioning factors were used for the subsequent steps of LSM.

Conditioning Factors Analyses Using Frequency Ratio
The spatial relationships between landslide locations and the related conditioning factors, obtained using the frequency ratio (FR) model, are shown in Appendix A, Table A1. The FR method can evaluate the sub-classes of specific factors and provide useful guidance for decision-makers to understand the conditioning factors related to landslides and make better policies [11,16,30]. A higher FR value shows that landslides are more prone to occur in the corresponding zone [69]. Specifically, with regard to altitude, the class of <300 m has the highest FR value of 4.08, whereas the other classes have a lower probability of landslide occurrence because their FR values are close to 0. The FR analysis of the aspect factor indicated that slopes facing northwest, south and north have more potential for landslide occurrence than those facing other orientations. For the catchment area and catchment slope factors, the highest FR values of 1.61 and 1.31 were obtained in the classes of 9000-25,000 m² and 0.3-0.5, respectively, indicating a stronger spatial relationship with landslide occurrence. As for curvature, the class of −0.05 to 0.15 has the highest FR value of 1.21, indicating that slopes in the other curvature classes are less associated with landslides in this area. For magnitude, the FR value decreases as the magnitude increases. For distance to faults, the area is more prone to landslide occurrence when it is located 3600-5400 m away from the faults. Distance to rivers is a critical factor because landslides often occur on both sides of the Yangtze River. Landslide occurrences clearly decrease with increasing distance to rivers. Furthermore, the possibility of landslides is greatly increased when the distance to rivers is less than 560 m, as shown by the highest FR value of this class (<560 m).
For land use, residential areas have the highest FR value and are therefore the class most associated with landslides. For lithology, it can be concluded from Table A1 that the F class has the highest probability of landslide occurrence, with the highest FR value of 1.88. In the case of NDVI, the class of 0.1-0.5 is the most associated with landslides because it obtains the highest FR value of 3.94. With respect to NDWI, the class of -0.4-0.3 has a strong spatial relationship with landslide occurrence. Rainfall is another crucial factor that influences slope stability; the 1030-1060 mm class has the highest FR value of 1.08.
The 10-20° class of the slope factor has the strongest spatial relationship with landslide occurrence due to its highest FR value. Slope form plays a key role in analyzing the stability of slopes. The class of GE/V is the most associated with landslide occurrence, with the highest FR value of 2.79. For the TPI factor, more than 50% of landslides occurred in the −5 to 2 class. The results for TRI revealed that the FR value decreases as the TRI value increases, and the class of <7 has the highest possibility of landslide occurrence with an FR value of 1.27. As for TST and TWI, the classes of <23 and 3.6-4.2, respectively, are highly susceptible to landslide occurrence. The spatial relationship between landslide locations and TSC shows that the <42 class has the strongest spatial relationship, with an FR value of 2.27.
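For readers unfamiliar with the FR model, the sketch below computes it in its commonly used form: the share of landslides falling in a class divided by the share of the study area occupied by that class. The class names and all counts are invented for illustration and do not reproduce Table A1; the function name is likewise hypothetical.

```python
def frequency_ratio(class_landslides, class_pixels):
    """FR per class = (landslides in class / all landslides) / (class pixels / all pixels)."""
    total_landslides = sum(class_landslides.values())
    total_pixels = sum(class_pixels.values())
    return {
        name: (class_landslides[name] / total_landslides)
        / (class_pixels[name] / total_pixels)
        for name in class_pixels
    }

# Made-up counts for three altitude classes.
landslides = {"<300 m": 80, "300-600 m": 15, ">600 m": 5}
pixels = {"<300 m": 2000, "300-600 m": 3000, ">600 m": 5000}

for name, fr in frequency_ratio(landslides, pixels).items():
    print(f"{name}: FR = {fr:.2f}")
```

An FR above 1 means that a class contains a larger share of landslides than of the area, which is why classes such as altitude <300 m stand out in the analysis above.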

Model Validation
In our experiments, all landslide models were constructed using the training set, and the parameters were optimized through a trial-and-error process. The related parameters of these methods are shown in Table 2. Once the methods were built, the final landslide susceptibility maps based on these methods were prepared in an ArcGIS environment. To better describe the susceptibility level of the study area, we used the natural break algorithm to divide the whole study area into five susceptibility classes [46,70]. Figure 4 presents the landslide susceptibility maps of the different methods and depicts the distribution of each susceptibility class. It can be observed that all the DTs and DT+RF ensemble methods have similar spatial distributions. Specifically, the susceptibility class varies from very high to very low as distance to rivers increases. Furthermore, very high susceptibility zones are located in the areas with lower altitude, indicating that these areas contribute greatly to landslide occurrence.
Landslide density is defined as the percentage of landslide pixels divided by the percentage of susceptibility class pixels [71], and it was used to evaluate the effectiveness of the landslide susceptibility maps. We can conclude from Table 3 that the very high susceptibility class has the highest landslide density, followed by the high, moderate, low, and very low susceptibility classes. Moreover, each ensemble method achieved a higher landslide density in the very high class than its corresponding base classifier.
Table 4 lists the OA and MCC values of all the methods. It can be seen that all the ensemble methods achieved better performance than the corresponding DT classifiers in terms of OA and MCC. In particular, the FT+RF method achieved the largest OA improvement over its base classifier (7.63% over the FT model), followed by the ADT+RF (2.54%), LMT+RF (1.7%), VFDT+RF (0.88%), and FPA+RF (0.85%) methods.
The same trend can be seen in terms of MCC: the FT+RF method achieved the largest improvement of 0.152, followed by the ADT+RF (0.049), LMT+RF (0.035), VFDT+RF (0.019), and FPA+RF (0.017) methods. The ROC curves using the validation set are illustrated in Figure 5. For the DTs in Figure 5a, the VFDT method achieved the highest AUC value of 0.892, followed by the LMT, ADT, FPA, and FT methods with AUC values of 0.884, 0.871, 0.858, and 0.779, respectively. For the ensemble methods in Figure 5b, both the VFDT+RF and FPA+RF methods obtained the highest AUC value of 0.907, followed by the ADT+RF, FT+RF and LMT+RF methods with AUC values of 0.903, 0.900, and 0.896, respectively. The VFDT method thus obtained a better prediction result than the other DTs, and when this base classifier is integrated with the RF ensemble technique, the VFDT+RF method also achieved the best prediction performance (tied with FPA+RF). Furthermore, all the DTs were improved by integrating them with the RF ensemble technique. In particular, the FT+RF method achieved the greatest improvement over FT (0.121) in terms of AUC, followed by the FPA+RF (0.049), ADT+RF (0.032), VFDT+RF (0.015) and LMT+RF (0.012) methods. Table 5 lists the results of the Chi-square test between the DTs and the ensemble methods. There is a significant difference between each DT and its corresponding ensemble method because the Chi-square and p values of these model pairs satisfy the threshold values mentioned previously.
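The AUC improvements quoted above follow directly from the reported AUC values, as the short Python check below illustrates. Only the AUC figures are taken from the text; the variable names are arbitrary.

```python
# AUC values reported in the text for the validation set.
base_auc = {"ADT": 0.871, "FPA": 0.858, "FT": 0.779, "LMT": 0.884, "VFDT": 0.892}
ensemble_auc = {"ADT": 0.903, "FPA": 0.907, "FT": 0.900, "LMT": 0.896, "VFDT": 0.907}

# Improvement of each DT+RF ensemble over its base classifier.
gains = {name: round(ensemble_auc[name] - base_auc[name], 3) for name in base_auc}

# Rank the base classifiers by how much the RF ensemble helps them.
ranking = sorted(gains, key=gains.get, reverse=True)
for name in ranking:
    print(f"{name}+RF: +{gains[name]:.3f}")
```

The ranking makes the pattern in the results easy to see: the weakest base classifier (FT) gains the most, while the strongest ones (VFDT and LMT) gain the least.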

Comparison with Benchmark Methods
To further validate the effectiveness of the ensemble framework, three state-of-the-art RF-based ensemble methods, RBFNN+RF, MLPNNs+RF and NB+RF, were selected for comparison. These three benchmark methods have been successfully used in LSM [29,38,39]. The resultant maps of these methods and the corresponding ROC curves are shown in Figures 6 and 7, respectively. In terms of prediction performance, all the proposed ensemble methods were better than the three benchmark methods. A likely reason is that DTs are sensitive to the rotation of the feature axes in the RF structure, which can lead to more accurate results.

Parameter Analysis
As mentioned in Section 3.3, the PCA algorithm was originally used in the RF ensemble technique to rotate the axes rather than to reduce dimensionality. In fact, other linear transformations may serve the same function in the RF algorithm, such as nonparametric discriminant analysis (NDA), Gaussian random projections (GRP), sparse random projections (SRP), and random subset (RS). To evaluate the effect of these feature extraction approaches on the prediction results, we constructed several RF ensemble techniques for comparison. Figure 8 shows the AUC values of the ensemble methods with different feature extraction methods. Specifically, the ADT+RF, LMT+RF, and VFDT+RF methods each achieved a higher AUC value with PCA than with GRP, SRP or RS. The FPA+RF method with SRP obtained the highest AUC value of 0.914, which is only 0.007 higher than that of the FPA+RF method with PCA. Moreover, the FT+RF method with PCA, GRP and SRP obtained the same AUC value of 0.901, which means that any of these feature extraction approaches can yield satisfactory prediction accuracy. Based on the above analysis, the PCA algorithm is an appropriate choice for the RF ensemble technique.
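As a rough idea of what replacing PCA with a random projection involves, the following Python sketch generates a Gaussian and a sparse random projection matrix using common textbook constructions and applies one to a sample. How GRP and SRP were configured in the experiments is not stated in the text, so the scaling, the sparsity parameter s = 3, and all function names here are assumptions.

```python
import math
import random

def gaussian_projection(n, rng):
    """n x n matrix with independent N(0, 1/n) entries."""
    scale = 1.0 / math.sqrt(n)
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(n)]

def sparse_projection(n, rng, s=3):
    """n x n matrix with entries +a or -a (probability 1/(2s) each), else 0, where a = sqrt(s/n)."""
    a = math.sqrt(s / n)

    def entry():
        u = rng.random()
        if u < 1.0 / (2 * s):
            return a
        if u < 1.0 / s:
            return -a
        return 0.0

    return [[entry() for _ in range(n)] for _ in range(n)]

def project(X, P):
    """Multiply each sample (row of X) by the projection matrix P."""
    return [[sum(x[i] * P[i][j] for i in range(len(x))) for j in range(len(P[0]))]
            for x in X]

rng = random.Random(0)
n_factors = 20                                   # number of conditioning factors
sample = [[rng.random() for _ in range(n_factors)]]
grp = gaussian_projection(n_factors, rng)
srp = sparse_projection(n_factors, rng)
print(len(project(sample, grp)[0]), len(project(sample, srp)[0]))
```

Unlike PCA, these matrices do not depend on the training data, which makes them cheaper to generate but removes any adaptation of the rotation to the structure of the conditioning factors.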

Discussion
Recently, many machine learning techniques have been developed for landslide susceptibility modelling, including LR [14], SVM [72], and ANN [73]. Among them, ensemble methods are very effective at combining weak classifiers to obtain better prediction performance [24,30,39]. To the best of our knowledge, there has been no comparative study of a generalized ensemble framework that integrates the same ensemble technique with different base classifiers. The main goal of this study is to compare and evaluate the performance of a novel ensemble framework that integrates five DTs with the RF ensemble technique for LSM in the Three Gorges Reservoir area. Before analyzing landslide susceptibility, it is important to evaluate the predictive capability of the 20 conditioning factors. Zhou et al. [36] carried out a landslide susceptibility analysis in the Three Gorges Reservoir area and indicated that altitude and distance to rivers are much more important than other factors, which agrees with our results. Altitude and distance to rivers are important factors that influence the occurrence and development of landslides, especially in the Three Gorges Reservoir area. The Yangtze River runs through the entire study area, and the construction of the reservoir has induced a large number of landslide hazards. Furthermore, in the study area, areas with lower altitude are usually close to the mainstream of the Yangtze River. The periodic fluctuation of the water level strongly influences the rock and soil mass near the bank slopes. Therefore, altitude and distance to rivers play an important role in the occurrence of landslides. Moreover, Peng et al. [35] concluded that rainfall was relatively uniform in the same Three Gorges Reservoir area and had little importance for landslide occurrence, which is consistent with the current study.
Specifically, the GR results demonstrated that altitude and distance to rivers obtain much higher AM values than the other conditioning factors. Furthermore, the FR results showed that the <300 m class of altitude and the <560 m class of distance to rivers achieved the highest FR values, accounting for over 83% and 88% of landslide locations, respectively. The main reason for these observations is that the areas located at lower altitude are very close to the Yangtze River. Meanwhile, the water level of the Three Gorges Reservoir undergoes significant rises and periodic fluctuations, which seriously affect the stability of bank slopes [42,74].
In our experiments, all the proposed ensemble methods achieved better performance than the traditional DTs, since the proposed ensemble framework can effectively improve predictive capability by avoiding over-fitting and reducing variance and bias, which accords with previous studies [30,39,44]. A comparison of all the models indicated that the ensemble methods improve on their base classifiers by 0.012-0.121 in terms of AUC and by 0.85%-7.63% in terms of OA. Although the improvement seems limited, Table 3 confirms that all the models provide meaningful susceptibility maps. Moreover, the significance analysis demonstrated that the ensemble methods differ statistically from the corresponding base classifiers, which suggests that decision makers should prefer the ensemble methods over the individual DT classifiers. Specifically, the FT+RF method obtained the greatest performance improvement among all the proposed methods. The FT model can reduce bias by using functional decision nodes and combines better with the RF ensemble technique than the other DT base classifiers, which demonstrates that selecting an optimal base classifier is critical when applying an ensemble technique. The RF ensemble method has proved to be a preeminent technique for integrating tree-related classifiers in the field of LSM [30,39,75]. Moreover, several previous studies integrated the RF ensemble with other base classifiers, namely RBFNN, MLPNNs, and NB [29,38,39], and obtained relatively good results. However, the results of the present study show that all five of our proposed ensemble methods achieved better accuracy than RBFNN+RF, MLPNNs+RF, and NB+RF in terms of AUC. This is reasonable because RF trains the base classifier in a rotated feature space, and the selected DTs are very sensitive to the rotation of the feature axes in the RF architecture.
Therefore, the DTs perform better when combined with RF.

Conclusions
This article proposes a novel ensemble framework that integrates DTs with the RF ensemble technique to produce landslide susceptibility maps. The RF ensemble technique can accurately portray the landslide susceptibility distribution of the Three Gorges Reservoir area of China. The final susceptibility maps were produced using the ADT, FPA, FT, LMT, and VFDT classifiers and their ensembles, based on 20 conditioning factors and a landslide inventory map. Experimental results demonstrated that all the DT-based classifiers can be improved by the RF ensemble technique, by 0.012-0.121, 0.85-7.63%, and 0.017-0.152 in terms of AUC, OA, and MCC, respectively. Specifically, FT obtained the highest performance improvement and exhibited better integration ability than the other DT base classifiers. Moreover, all the proposed ensemble methods achieved better performance than the state-of-the-art RF ensemble methods in terms of AUC, which demonstrates that the RF ensemble technique integrates better with DT classifiers. This comparison also confirmed that selecting an appropriate base classifier is of great significance when an ensemble technique is used for landslide susceptibility analysis. In conclusion, the proposed ensemble framework is effective for landslide disaster management and assessment. In future work, we will investigate more efficient ensemble prediction methods.

Acknowledgments:
The authors are grateful for the data and materials provided by the Headquarters for Prevention and Control of Geological Disasters in the Three Gorges Reservoir. Also, the authors would like to thank the handling editors and the three anonymous reviewers for their valuable comments and suggestions, which significantly improved the quality of this paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Table A1. Spatial relationship between each landslide conditioning factor and landslides using the FR model.