Quantitative Assessment of Landslide Susceptibility Comparing Statistical Index, Index of Entropy, and Weights of Evidence in the Shangnan Area, China

In this study, the statistical index (SI), index of entropy (IOE) and weights of evidence (WOE) models were applied to landslide susceptibility mapping, and the performance of the three models was validated and systematically compared. As one of the most landslide-prone areas in Shaanxi Province, China, Shangnan County was selected as the study area. First, a series of reports, remote sensing images and geological maps were collected, and field surveys were carried out to prepare a landslide inventory map. A total of 348 landslides were identified in the study area, and they were randomly divided into a training dataset (70% = 244 landslides) and a testing dataset (30% = 104 landslides). Thirteen conditioning factors were then employed. The corresponding thematic data layers and landslide susceptibility maps were generated using ArcGIS software. Finally, the area under the curve (AUC) values were calculated for the training dataset and the testing dataset in order to validate and compare the performance of the three models. For the training dataset, the AUC plots showed that the WOE model had the highest accuracy rate of 76.05%, followed by the SI model (74.67%) and the IOE model (71.12%). In the case of the testing dataset, the prediction accuracy rates for the SI, IOE and WOE models were 73.75%, 63.89%, and 75.10%, respectively. It can be concluded that the WOE model had the best prediction capacity for landslide susceptibility mapping in Shangnan County. The landslide susceptibility map produced by the WOE model has considerable geological and engineering significance for landslide hazard prevention and control in the study area and other similar areas.


Introduction
Landslides, as one of the most critical geological hazards in the world, seriously threaten lives, property and natural resources [1-5]. According to the latest statistics on geological disasters published by the Chinese Geological Environment Information Site, more than 270,000 geological hazards occurred from 2006 to 2016, causing a direct economic loss of $7.7 billion, and the proportion of loss caused by landslides has increased year by year (http://www.cigem.gov.cn). Hence, in order to reduce the damage caused by landslides, producing landslide susceptibility maps has become an important task [3,6-9]. Previous studies of landslide susceptibility mapping found that the quality of the data, the depth of the research and the methods of analysis were the three factors with the greatest effect on the accuracy and reliability of the assessment results [6,9-11].
Along with the application of global positioning systems (GPS), remote sensing (RS), and geographic information systems (GIS) to landslide susceptibility mapping, more and more researchers have begun to ... frog-leaping algorithms [51]; and artificial neural network-maximum entropy [52]. Some review articles show that different models have different characteristics, and each has strengths and weaknesses [41,53]. In the current research, we compare three statistical models, applying, analyzing and inspecting the statistical index (SI), index of entropy (IOE), and weights of evidence (WOE) models with regard to landslide susceptibility mapping, using the case study of Shangnan County, China.

Study Area
Shaanxi Province is situated in the middle of China. The study area (Shangnan County) is located in the southeastern part of Shaanxi Province, China, between the latitudes of 33°06′ and 33°44′ N, and

Data
The number, distribution and characteristics of existing landslides form the basis of the susceptibility assessment. A landslide inventory map organizes and presents the basic information on existing landslides in a study area [44,54]. In this case, historical data on landslides and related information, including the topographical, geological and meteorological conditions, were acquired using three approaches: analysis of existing historical records, interpretation of satellite images, and field surveys in Shangnan County. In total, 348 existing landslides were identified; most of them are slides (326), and the others include 12 rock falls and 10 debris flows [12,55]. According to an analysis in the GIS environment, the largest landslide covers more than 30,000 m², the smallest nearly 15 m², and the average is 9600 m². In addition, each landslide in Shangnan County was simplified to a centroid point to establish the susceptibility assessment models. Finally, the 348 landslides were randomly divided into training data (70%) and testing data (30%) (Figure 1).
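The random 70/30 division of the inventory can be sketched as follows. This is only an illustration: the function name, the use of integer landslide IDs, and the fixed seed are assumptions made here for repeatability, and the paper does not state how its random selection was implemented.

```python
import random


def split_inventory(landslide_ids, train_fraction=0.7, seed=0):
    """Shuffle landslide IDs and split them into training and testing sets."""
    ids = list(landslide_ids)
    rng = random.Random(seed)  # assumed seed, used only to make the sketch repeatable
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# 348 landslide centroids, identified here by hypothetical integer IDs
train, test = split_inventory(range(348))
print(len(train), len(test))  # 244 104
```

Rounding 70% of 348 gives the 244 training and 104 testing landslides reported in the abstract.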
In this paper, a total of thirteen landslide conditioning factors were employed to establish a series of mathematical models; the conditioning factors included slope angle, slope aspect, elevation, plan curvature, profile curvature, stream power index (SPI), sediment transport index (STI), topographic wetness index (TWI), distance to faults, distance to rivers, distance to roads, normalized difference vegetation index (NDVI), and lithology.
Slope angle is related to the failure mode and scale of the landslide, and has been used widely and frequently in landslide susceptibility assessment [31,56-58].

Slope aspect is another critical parameter used broadly in landslide susceptibility assessment. This factor can influence meteorological conditions such as rainfall, evaporation and temperature, which are generally connected to the stability of slopes [59,60]. Based on the DEM, the slope aspects in the study area were grouped into nine categories, as shown in Figure 2b.
Variations in elevation reflect the changes in landforms between different geomorphic units. Therefore, elevation is also a relevant landslide conditioning factor used frequently in the establishment of landslide susceptibility assessment models [29,45,61]. In this study, the elevation values in Shangnan County were divided into six classes with an interval of 300 m, as follows: <500 m, 500-800 m, 800-1100 m, 1100-1400 m, 1400-1700 m, and >1700 m (Figure 2c).
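Equal-interval reclassification of this kind amounts to a lookup against a list of break values. The sketch below uses the elevation breaks listed above; the function name and the treatment of values that fall exactly on a break (assigned to the upper class) are assumptions, since the paper does not specify them.

```python
from bisect import bisect_right

# Class breaks and labels taken from the elevation classes listed in the text
ELEVATION_BREAKS = [500, 800, 1100, 1400, 1700]
ELEVATION_LABELS = [
    "<500 m", "500-800 m", "800-1100 m",
    "1100-1400 m", "1400-1700 m", ">1700 m",
]


def elevation_class(elevation_m):
    """Return the class label for an elevation in metres.

    Assumption: a value exactly on a break goes to the upper class.
    """
    return ELEVATION_LABELS[bisect_right(ELEVATION_BREAKS, elevation_m)]


print(elevation_class(650))  # 500-800 m
```

The same pattern applies to the other equal-interval factors (SPI, STI, TWI and the distance buffers) by swapping in their break values.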
Curvature, a technical term in topography, is the rate of change of the slope gradient or aspect in a particular direction [62]. Curvature can be further divided into plan curvature and profile curvature. The former is the curvature of a contour line formed by intersecting a horizontal plane with the surface, while the latter refers to the curvature in the vertical plane parallel to the slope direction [63,64]. Both were therefore considered in this study. By analyzing the DEM in the ArcGIS software (10.0, Esri, Redlands, CA, USA), the plan curvature and profile curvature values in the study area were obtained and grouped into four classes based on the natural break method [25] (Figure 2d,e).
The stream power index (SPI) is a parameter measuring the stream power and erosion power of flowing water [65]. The scouring and infiltration of flowing water have a strong effect on the strength of the soil and rock that compose a slope. In the present study, the SPI values were arranged in four classes with an interval of 30, namely <30, 30-60, 60-90, and >90 (Figure 2f). The sediment transport index (STI) is used to measure the erosive and transporting capacity of a stream [14]. In this study area, the STI values were divided into four categories with an interval of 10: <10, 10-20, 20-30, and >30 (Figure 2g).
The topographic wetness index (TWI) reflects the degree of accumulation of water at a site [66]. The TWI values in the study area were calculated and classified into four categories with an interval of 2 as follows: <5, 5-7, 7-9, and >9 (Figure 2h).
Generally speaking, faults can weaken the mechanical characteristics of the rock and soil of adjacent slopes [67]. Based on the ArcGIS software, buffers consisting of the Euclidean distance to faults were generated. Taking an equal interval of 1000 m, the values of the distance to faults are shown in Figure 2i, namely, <1000 m, 1000-2000 m, 2000-3000 m, 3000-4000 m, and >4000 m.
The seepage force generated by the discharge along slopes and rivers and the wetting effects of rivers have an adverse influence on the stability of slopes [68]. In this case, buffers consisting of the Euclidean distance to rivers were formed and are shown in Figure 2j. According to the equal interval classification method, there are five categories, namely <200 m, 200-400 m, 400-600 m, 600-800 m, and >800 m.
In Shangnan County, road building is one of the major human engineering activities. Road construction frequently leads to the excavation of the toe of slopes, which may contribute to the occurrence of landslides [69]. In this case, the influence of roads was measured by the distance to roads, and the values were classified into five classes with an interval of 500 m: <500 m, 500-1000 m, 1000-1500 m, 1500-2000 m, and >2000 m (Figure 2k).
The normalized difference vegetation index (NDVI) is also universally applied in the process of landslide susceptibility assessment [25,70]. This parameter indicates the conditions of the vegetation coverage in the study area. By analyzing the near-infrared and the red band of Landsat 8 Operational Land Imager (OLI) images (http://www.gscloud.cn/), the NDVI values were calculated and classified into five classes based on the natural break method [34,71].
Lithology is one of the most fundamental factors that determines the physical and mechanical properties of rock and soil [72,73]. Based on the field surveys and geological mapping, the lithological map of Shangnan County was digitized using the ArcGIS software. As shown in Figure 2m, the lithological units in the study area were grouped into nine categories based on the geological ages and lithofacies.
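For reference, the NDVI mentioned above is the normalized difference of the near-infrared and red reflectances, (NIR - Red) / (NIR + Red); for Landsat 8 OLI these are bands 5 and 4. A minimal per-pixel sketch follows. The function name and the choice to return 0 for a pixel with no signal in either band are assumptions, not details from the paper.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel.

    nir, red: reflectance in the near-infrared and red bands.
    Assumption: pixels with zero signal in both bands are given NDVI = 0.
    """
    denominator = nir + red
    if denominator == 0:
        return 0.0
    return (nir - red) / denominator


# Hypothetical reflectances for a vegetated pixel
print(round(ndvi(0.5, 0.1), 4))  # 0.6667
```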

Statistical Index (SI)
The statistical index model was first proposed by van Westen et al. [74]. In the SI model, a weight value for a parameter class is defined as the natural logarithm of the landslide density in the class divided by the landslide density in the whole study area [75,76]:

$$W_{ij} = \ln\left(\frac{D_{ij}}{D}\right)$$

where $W_{ij}$ is the weight for class $i$ of factor $j$, $D_{ij}$ is the landslide density within class $i$ of factor $j$, and $D$ is the landslide density in the whole study area.
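A minimal sketch of the SI weight calculation for one conditioning factor is given below. The function name and the sample counts are hypothetical. Classes with no landslides are assigned SI = 0, following the convention reported for flat areas in the results; the logarithm itself is undefined there.

```python
import math


def statistical_index(class_landslides, class_areas):
    """SI weight ln(D_ij / D) for each class of one conditioning factor.

    class_landslides: number of landslides in each class
    class_areas: area (or cell count) of each class, same order
    """
    overall_density = sum(class_landslides) / sum(class_areas)
    weights = []
    for count, area in zip(class_landslides, class_areas):
        if count == 0:
            weights.append(0.0)  # convention used here: empty class gets SI = 0
        else:
            weights.append(math.log((count / area) / overall_density))
    return weights


# Hypothetical factor with four classes
si = statistical_index([30, 50, 20, 0], [100, 100, 200, 100])
print([round(w, 4) for w in si])  # [0.4055, 0.9163, -0.6931, 0.0]
```

A positive weight means the class has a higher landslide density than the study area as a whole, and a negative weight means a lower one.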

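Index of Entropy (IOE)
The index of entropy is the second model used in this study. The entropy indicates the extent of the disorder of a system [77]. The equations used to calculate the weight $W_j$ are expressed as below:

$$P_{ij} = \frac{m}{n}$$

$$(P_{ij}) = \frac{P_{ij}}{\sum_{i=1}^{S_j} P_{ij}}$$

$$H_j = -\sum_{i=1}^{S_j} (P_{ij}) \log_2 (P_{ij}), \quad j = 1, 2, \ldots, n$$

$$H_{j\max} = \log_2 S_j$$

$$I_j = \frac{H_{j\max} - H_j}{H_{j\max}}, \quad I_j \in (0, 1), \quad j = 1, 2, \ldots, n$$

$$W_j = I_j \times P_j$$

where $W_j$ is the resultant weight value for the factor as a whole, $P_j$ is the slope failure probability for $j = 1, 2, \ldots, n$, $(P_{ij})$ is the probability density of class $i$ of factor $j$, $I_j$ is the information coefficient, $H_j$ and $H_{j\max}$ are the entropy values, $S_j$ is the number of classes, and $m$ and $n$ in the first equation are the landslide and domain percentages, respectively.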
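The IOE weight of a single factor can be sketched as below, following the form commonly used in the IOE literature: the class ratio P_ij is the landslide percentage divided by the domain percentage, the ratios are normalized to a probability density, and the factor weight is the information coefficient I_j multiplied by P_j. Taking P_j as the mean of the class ratios is an assumption made here, as are the function name and the sample percentages.

```python
import math


def index_of_entropy(landslide_pct, domain_pct):
    """Return (I_j, W_j) for one conditioning factor.

    landslide_pct: percentage of landslides in each class
    domain_pct: percentage of the study area in each class
    """
    ratios = [m / n for m, n in zip(landslide_pct, domain_pct)]  # P_ij
    total = sum(ratios)
    density = [r / total for r in ratios]                        # (P_ij)
    entropy = -sum(d * math.log2(d) for d in density if d > 0)   # H_j
    max_entropy = math.log2(len(ratios))                         # H_jmax
    info = (max_entropy - entropy) / max_entropy                 # I_j
    mean_ratio = total / len(ratios)  # assumed definition of P_j
    return info, info * mean_ratio


# Hypothetical factor: all landslides fall in two of four equal-area classes
info, weight = index_of_entropy([50, 50, 0, 0], [25, 25, 25, 25])
print(info, weight)  # 0.5 0.5
```

A factor whose landslides are spread across classes in proportion to class area has maximum entropy, so its information coefficient and weight are both zero.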

Weights of Evidence (WOE)
The WOE method is a probabilistic approach based on a log-linear form of Bayes' rule, expressed as:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

where $A$ is the presence or absence of the landslide in the study area, and $B$ is the landslide predictive factor. The approach calculates the weight for each $B$ based on $A$, as follows [78,79]:

$$W_i^{+} = \ln\frac{P(B \mid A)}{P(B \mid \bar{A})}$$

$$W_i^{-} = \ln\frac{P(\bar{B} \mid A)}{P(\bar{B} \mid \bar{A})}$$

where $W_i^{+}$ is an indicator of the positive correlation, $W_i^{-}$ shows the level of negative correlation, $B$ is the presence of a desired class of landslide conditioning factor, and $\bar{B}$ is the absence of the desired class of landslide conditioning factor. $A$ is the presence and $\bar{A}$ is the absence of the landslide. The difference between the two weights is called the weight contrast:

$$C = W_i^{+} - W_i^{-}$$

The contrast reflects the overall spatial correlation between the desired class of landslide conditioning factor and the landslides.
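A minimal sketch of the WOE weights for one factor class, computed from four cell counts, is shown below. The function name, argument names and sample counts are hypothetical, and all four counts are assumed to be positive. The standard deviation of the contrast, S(C), is not defined in this section; the variance formula used here is the one commonly attributed to Bonham-Carter and is included as an assumption.

```python
import math


def weights_of_evidence(slide_in, slide_out, stable_in, stable_out):
    """Return (W+, W-, C, C/S(C)) for one class of a conditioning factor.

    slide_in / slide_out: landslide cells inside / outside the class
    stable_in / stable_out: non-landslide cells inside / outside the class
    Assumption: all four counts are greater than zero.
    """
    p_b_given_a = slide_in / (slide_in + slide_out)
    p_b_given_not_a = stable_in / (stable_in + stable_out)
    w_plus = math.log(p_b_given_a / p_b_given_not_a)
    w_minus = math.log((1 - p_b_given_a) / (1 - p_b_given_not_a))
    contrast = w_plus - w_minus
    # Assumed variance: S^2(W+) = 1/slide_in + 1/stable_in,
    #                   S^2(W-) = 1/slide_out + 1/stable_out
    s_contrast = math.sqrt(
        1 / slide_in + 1 / stable_in + 1 / slide_out + 1 / stable_out
    )
    return w_plus, w_minus, contrast, contrast / s_contrast


# Hypothetical counts: 40 of 100 landslide cells and 200 of 1000 stable cells in the class
w_plus, w_minus, c, c_std = weights_of_evidence(40, 60, 200, 800)
print(round(w_plus, 4), round(w_minus, 4), round(c, 4))  # 0.6931 -0.2877 0.9808
```

A positive contrast indicates that the class is spatially associated with landslides, and a negative contrast indicates the opposite.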

Selection of Landslide Conditioning Factors
In landslide susceptibility modelling, landslides usually occur under different conditions, and the contribution of the conditioning factors to landslide occurrence differs considerably [48]. Therefore, it is necessary to remove unimportant landslide conditioning factors to improve the performance of landslide models [80,81]. In this study, the SI, IOE, and WOE models were employed to construct the landslide susceptibility maps. One of the most critical assumptions of these models is the independence of the conditioning factors [38]. Therefore, in the present study, the coefficient of variation (CV) attribute evaluation (CVA) method was used to validate all thirteen landslide conditioning factors considered for the development of landslide susceptibility models. This method evaluates the worth of an attribute by computing the value of the coefficient of variation with respect to the class. It first creates a ranking of attributes based on the variation value, then divides this into two groups, using a verification method to select the best group [82].

Selection of Landslide Conditioning Factors
In the present study, based on the CVA method (a 10-fold cross-validation method [83,84], seed = 1), the importance of all the conditioning factors was measured according to average merit (AM), and the calculation results are illustrated in Table 1. The results show that all the AM values of the conditioning factors were larger than zero, indicating that the thirteen selected factors have a positive influence on landslide occurrence. Of these factors, the highest AM value was for distance to roads (AM = 0.

Application of the SI Model
In this case, the SI model was applied to analyze the relationships between each conditioning factor and landslide occurrence (Table 2). From Table 2, it can be seen that for the slope angle 0-20°, the SI values were positive, which indicates that landslides were more prone to occurring in these areas. This is also in line with some other landslide susceptibility studies [81,85-87]. With regard to slope aspect, an eastern aspect had the highest SI value of 0.3024, while the lowest SI value was for southeast (-0.4019). In addition, no landslides occurred in flat areas (SI = 0), which conforms to actual situations and related research results [88,89]. When the altitude was lower than 800 m, there was a larger probability of landslides being triggered; no landslides were situated in areas with an altitude greater than 1400 m. In terms of curvature, the classes with a plan curvature of -1.09 to -0.11 (0.0124) and -0.11 to 0.88 (0.0264) had positive SI values, while the SI values were positive for classes with a profile curvature of -0.02 to 1.26 (0.0450) and 1.26 to 11.43 (0.0851). In the case of the SPI, compared with the other classes, the class of 0 to 30 (0.0801) had a more positive effect on landslide occurrence. In the case of STI, the class of >30 had the only negative SI value (-0.2712). In the case of the TWI, the intervals of 5-7 (0.0952) and 7-9 (0.1415) could be interpreted as promoting conditions. With regard to the distance to faults, the probability of landslide occurrence decreased with increasing distance to faults, and the highest SI value of 0.2902 was for the class of 0-1000 m. For the distance to rivers, the only positive SI value of 0.2964 belonged to the class <200 m. For the distance to roads, landslides mainly occurred in areas where the distance to roads was within 500 m. Both the highest NDVI and the lowest NDVI had a positive impact on landslide occurrence.
In the case of lithology, the SI values of the harder metamorphic rocks, softer metamorphic rocks, hard carbonate rocks, hard intrusive rocks and soft gravelly soils were -0.5121, 0.6650, -0.4742, -0.7160, and 0.3584, respectively. Finally, the landslide susceptibility indexes (LSI) were calculated using the SI values and Equation (11). The corresponding landslide susceptibility map (LSM) (Figure 3) was generated using ArcGIS software. The probability of landslide occurrence rises as the LSI increases. In the present study, the natural break method, which seeks to reduce the variance within classes and maximize the variance between classes [90], was used to reclassify the LSI values into five categories, namely very low, low, moderate, high and very high.

$$\begin{aligned}
\mathrm{LSI}_{SI} ={}& \text{Slope angle}_{SI} + \text{Slope aspect}_{SI} + \text{Elevation}_{SI} + \text{Plan curvature}_{SI} + \text{Profile curvature}_{SI} \\
&+ \mathrm{SPI}_{SI} + \mathrm{STI}_{SI} + \mathrm{TWI}_{SI} + \text{Distance to faults}_{SI} + \text{Distance to rivers}_{SI} \\
&+ \text{Distance to roads}_{SI} + \mathrm{NDVI}_{SI} + \text{Lithology}_{SI}
\end{aligned} \tag{11}$$

Entropy 2018, 20, 868

Application of the IOE Model
From Table 2, we acquired the $W_j$ values of the various conditioning factors. $W_j$ is an index that measures the importance of a factor. Thus, it can be seen that the most critical factor was altitude ($W_j$ = 0.1923), followed by distance to faults ($W_j$ = 0.1068), lithology ($W_j$ = 0.0954), slope aspect ($W_j$ = 0.0560), distance to roads ($W_j$ = 0.0546), distance to rivers ($W_j$ = 0.0511), slope angle ($W_j$ = 0.0202), NDVI ($W_j$ = 0.0145), TWI ($W_j$ = 0.0090), STI ($W_j$ = 0.0063), profile curvature ($W_j$ = 0.0053), SPI ($W_j$ = 0.0022), and plan curvature ($W_j$ = 0.0006). It should be noted that the above ranking only applies to Shangnan County. The relative importance of conditioning factors usually varies for different study areas [91]. To produce a landslide susceptibility map using the LSI, the landslide occurrence probability values were calculated using Equation (12). Similarly, the produced landslide susceptibility map was further classified into five classes based on the natural break method, including very low, low, moderate, high, and very high (Figure 4).

$$\begin{aligned}
\mathrm{LSI}_{IOE} ={}& \text{Slope angle} \times 0.0202 + \text{Slope aspect} \times 0.0560 + \text{Elevation} \times 0.1923 + \text{Plan curvature} \times 0.0006 \\
&+ \text{Profile curvature} \times 0.0053 + \mathrm{SPI} \times 0.0022 + \mathrm{STI} \times 0.0063 + \mathrm{TWI} \times 0.0090 \\
&+ \text{Distance to roads} \times 0.0546 + \mathrm{NDVI} \times 0.0145 + \text{Lithology} \times 0.0954
\end{aligned} \tag{12}$$

Application of the WOE Model
In Table 2, the weight contrast values are noted as C, which indicate the landslide susceptibility of various classes of conditioning factors. In terms of the slope angle, landslides are more likely to occur in areas with a slope angle of 0-10° (0.273) and 20-30° (0.029). For slope aspect, east (0.244) had the highest probability of triggering landslides, which is in line with the conclusion of the SI model. For altitude, landslides are more prone to occurring at altitudes <800 m. For curvature, the results showed that the highest contrast value (0.130) was for plan curvatures between -0.11 and 0.88, while profile curvatures from -0.02 to 1.26 (0.108) were most prone to landslides. For the SPI, the WOE results were the same as the SI results, and the class 0-30 had the highest contrast value of 0.264. For the STI, the class of 10-20 had the only positive value (0.177), which indicates that areas with STI values of 10-20 had a positive effect on landslide occurrence. For TWI, the highest contrast value (0.235) was found for class <5. In the case of distance to faults, distance to rivers and distance to roads, the highest contrast values belonged to the class <1000 m for distance to faults, the class <200 m for distance to rivers, and the class <500 m for distance to roads. For the NDVI, it was found that the range 0.17-0.33 was the only class for which the contrast value was larger than zero. In the case of lithology, hard carbonate rocks and hard intrusive rocks were identified as promoting landslides; this result did not coincide with the results of the SI model.
Finally, based on the results of the WOE model, the LSI values for the study area were calculated using Equation (13). The natural break method was used to reclassify landslide susceptibility into five classes: very low, low, moderate, high, and very high (Figure 5):

$$\begin{aligned}
\mathrm{LSI}_{WOE} ={}& \text{Slope angle}_{C/S(C)} + \text{Slope aspect}_{C/S(C)} + \text{Elevation}_{C/S(C)} + \text{Plan curvature}_{C/S(C)} \\
&+ \text{Profile curvature}_{C/S(C)} + \mathrm{SPI}_{C/S(C)} + \mathrm{STI}_{C/S(C)} + \mathrm{TWI}_{C/S(C)} + \text{Distance to faults}_{C/S(C)} \\
&+ \text{Distance to rivers}_{C/S(C)} + \text{Distance to roads}_{C/S(C)} + \mathrm{NDVI}_{C/S(C)} + \text{Lithology}_{C/S(C)}
\end{aligned} \tag{13}$$
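Equations (11) to (13) share the same additive overlay structure: for each map cell, the score of the class that the cell falls into is looked up for every factor and the scores are summed, with optional per-factor weights in the IOE case. A minimal per-cell sketch is given below; the function name and the dictionary layout are assumptions, and the first example uses only three of the thirteen factors for brevity.

```python
def landslide_susceptibility_index(cell_scores, factor_weights=None):
    """Sum the per-factor class scores of one map cell.

    cell_scores: factor name -> score of the class the cell falls into
    factor_weights: optional factor name -> weight (IOE-style); weight 1 if omitted
    """
    if factor_weights is None:
        return sum(cell_scores.values())
    return sum(score * factor_weights[name] for name, score in cell_scores.items())


# SI-style sum for a cell facing east, within 1000 m of a fault and 200 m of a river
# (class values quoted in the SI results; the other ten factors are omitted here)
partial_lsi = landslide_susceptibility_index({
    "slope_aspect": 0.3024,
    "distance_to_faults": 0.2902,
    "distance_to_rivers": 0.2964,
})
print(round(partial_lsi, 4))  # 0.889
```

In a GIS workflow the same sum is evaluated for every cell of the reclassified factor rasters before the result is reclassified with the natural break method.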

Validation and Comparison of the Models
It is necessary to quantitatively measure the accuracy of the landslide susceptibility maps produced by the various classification models [92]. To assess the performance of the three landslide susceptibility mapping models described above, the corresponding area under the curve (AUC) plots for the training dataset and testing dataset were obtained. The receiver operating characteristics (ROC) curve and the AUC are two common indices used in the validation and comparison of different landslide susceptibility models [33,34,51,80,93]. In the present study, the AUC method, in which the curve is plotted with the cumulative area percentage on the horizontal axis and the cumulative percentage of landslides on the vertical axis [17,19,94], was used to compare the performance of the three models. Generally, the model with the highest AUC value was considered to show the best landslide susceptibility mapping performance.
In the case of the training dataset, the AUC values for the SI, IOE, and WOE models were 0.7467, 0.7112, and 0.7650, respectively, and the corresponding accuracy rates were 74.67%, 71.12%, and 76.50% (Figure 6a). The landslide susceptibility map generated with the WOE model was therefore the most consistent with the observed landslides. The performance of the SI model was second only to the WOE model. Compared with the other models, the accuracy of the IOE model was relatively low.
In the case of the testing dataset, the prediction accuracy values for the SI, IOE and WOE models were 73.75%, 63.89% and 75.10%, respectively (Figure 6b). The results showed that WOE had the best prediction capacity, followed by the SI model and the IOE model. In addition, the AUC values of the testing dataset were lower than those of the training dataset. When using the IOE model, the AUC value calculated with the testing dataset decreased by 0.0723 compared to the results found using the training dataset. Therefore, it could be concluded that the landslide susceptibility maps produced by the SI and WOE models both had good spatial effectiveness for the study area, and that the IOE model was not very suitable for landslide susceptibility mapping in Shangnan County.
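The curve described above can be sketched as follows: cells are ranked from the highest to the lowest LSI, the cumulative share of area and the cumulative share of landslides are accumulated along that ranking, and the area under the resulting curve is integrated with the trapezoidal rule. The function name is hypothetical, and the sketch assumes equal-area cells and does not treat tied LSI values specially.

```python
def success_rate_auc(lsi_values, landslide_flags):
    """Area under the cumulative-area vs cumulative-landslide curve.

    lsi_values: LSI of each cell (cells assumed to have equal area)
    landslide_flags: 1 if the cell contains a landslide, else 0
    """
    order = sorted(range(len(lsi_values)), key=lambda i: lsi_values[i], reverse=True)
    total_cells = len(order)
    total_slides = sum(landslide_flags)
    auc = 0.0
    x_prev = y_prev = 0.0
    slides_so_far = 0
    for rank, i in enumerate(order, start=1):
        slides_so_far += landslide_flags[i]
        x = rank / total_cells            # cumulative area share
        y = slides_so_far / total_slides  # cumulative landslide share
        auc += (x - x_prev) * (y + y_prev) / 2  # trapezoid between points
        x_prev, y_prev = x, y
    return auc


# Toy example: both landslide cells have the highest LSI values
print(success_rate_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 0.75
```

Running the function with training landslides gives a success-rate value, and running it with the held-out testing landslides gives a prediction-rate value.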
Figure 6. Area under the curve (AUC) plots of the models using (a) training data and (b) testing data.

Conclusions
In recent years, understanding of the serious effects of landslides on people's lives and property has increased. Thus, it is necessary to promote landslide susceptibility assessment in landslide hazard zones. Classical probability models and novel machine learning algorithms should be introduced to landslide susceptibility modeling with the aim of achieving better prediction accuracy.
In this paper, the SI, IOE, and WOE models were employed to assess landslide susceptibility in Shangnan County, and the performance of the three models was compared. According to their relevance and suitability, thirteen conditioning factors were selected for modeling. The landslide data were then divided into two groups, namely a training dataset (70% of the landslides) and a testing dataset (30% of the landslides). The importance of the conditioning factors was also evaluated using the CVA method with AM values. The results showed that all thirteen conditioning factors had a positive effect on landslide occurrence. The AUC plots generated with the training dataset demonstrated that the WOE model (AUC = 0.7605) had the highest accuracy of landslide susceptibility mapping, followed by the SI model (AUC = 0.7467) and the IOE model (AUC = 0.7112). Similarly, the prediction capacity of the three models was measured using AUC plots generated from the testing dataset. The results indicated that the WOE model had the best performance in landslide susceptibility prediction.
The landslide susceptibility map produced by the WOE model can be meaningful for landslide hazard prevention and control in Shangnan County and other mountainous areas with similar features. The landslide susceptibility maps can also be used as a basis for future landslide risk assessment studies of the study area and other areas with similar geo-environmental characteristics. The model can also be applied in other areas to expand its use.