An Improved Information Value Model Based on Gray Clustering for Landslide Susceptibility Mapping

Landslides, as geological hazards, cause significant casualties and economic losses. Therefore, it is necessary to identify areas prone to landslides for prevention work. This paper proposes an improved information value model based on gray clustering (IVM-GC) for landslide susceptibility mapping. This method uses the information value derived from an information value model to achieve susceptibility classification and weight determination of landslide predisposing factors and, hence, obtain the landslide susceptibility of each study unit based on the clustering analysis. Using a landslide inventory of Chongqing, China, which contains 8435 landslides, three landslide susceptibility maps were generated based on the common information value model (IVM), an information value model improved by an analytic hierarchy process (IVM-AHP) and our new improved model. Approximately 70% (5905) of the inventory landslides were used to generate the susceptibility maps, while the remaining 30% (2530) were used to validate the results. The training accuracies of the IVM, IVM-AHP and IVM-GC were 81.8%, 78.7% and 85.2%, respectively, and the prediction accuracies were 82.0%, 78.7% and 85.4%, respectively. The results demonstrate that all three methods perform well in evaluating landslide susceptibility. Among them, IVM-GC has the best performance.


Introduction
Landslides are geological hazards that cause serious casualties, property loss, and environmental damage, and they restrict sustainable development [1,2]. To minimize economic losses and loss of human life, landslide-prone areas should be identified, which makes landslide susceptibility maps an urgent need.
The information value based method has been widely applied as a statistical data-driven method recommended by experts [17] to assess landslide susceptibility [18,19]. Xu et al. [18] used GIS and the information value model to evaluate debris flow susceptibility. Chen et al. [19] made a landslide susceptibility map using the information value model in the Chencang District of Baoji, China.
Zhu et al. [20] compared the information value model with the weights-of-evidence method in landslide susceptibility mapping. Their results demonstrated that the information value model had higher prediction accuracy. Chen et al. [21] compared the information value model with a logistic regression model in landslide susceptibility mapping and found that the results of the information value model were more consistent with actual landslide events. The higher prediction accuracy of the information value model is partly because the relative weights of the different classes of each landslide predisposing factor can be determined objectively. However, different factors have different influences on the occurrence of landslides, whereas the traditional information value model regards all landslide predisposing factors as equally important and assigns equal weight to each factor. Thus, this model cannot reflect the differences between the contributions of the various landslide predisposing factors. To improve the information value model, several methods have been proposed. Jiang et al. [22] combined the information value model with an analytic hierarchy process to assess landslide susceptibility. An information value model integrated with Shannon's entropy was proposed by Sharma et al. [23]. However, for these methods, the weights of landslide predisposing factors are determined through human intervention, which increases uncertainties in the results.
This paper proposes an improved information value model based on gray clustering. Since the effects of various predisposing factors on landsliding are different, it is vital to understand these differences and hence to weight the importance of the different factors. The proposed model objectively determines both the relative weights of the different classes within each predisposing factor and the weights of the predisposing factors themselves. The proposed model is evaluated by comparing its landslide susceptibility mapping results with those of the traditional information value model and the improved model combined with the analytic hierarchy process. This study provides new insight for landslide susceptibility mapping that can help governments to conduct landslide prevention and mitigation.

Study Area
The study area of Chongqing is located in southwestern China, between longitudes 105°11′E and 110°11′E and latitudes 28°10′N and 32°13′N. This area is characterized by a complex geological structure, a soft surface layer, deep valleys, and steep slopes. The basic tectonic framework of this area originated from the Indosinian-Yanshan movement and the Himalayan movement. Affected by the Huayingshan fault zone, the Qiyaoshan fault zone, and the Changshou-Zunyi fault, a series of tectonic folds and faults developed in this area. Chongqing is located in the eastern part of the Sichuan Basin. Eastern Chongqing is connected to the Qinba Mountains and Wuling Mountains, and western Chongqing is linked with the Mid-Sichuan Hilly Region. The area has a distinct topographical relief that is controlled by geological structures, and the mountain alignment is broadly consistent with the tectonic line. Western Chongqing consists mainly of low mountainous and hilly regions. The Jialing River and Yangtze River run through the whole region. The climate is subtropical monsoonal, with abundant precipitation and storms. In recent years, increasing human activity in this region, especially the construction of the Three Gorges Reservoir, has had a growing impact on the natural terrain. Consequently, landslides have become the most extensive and serious geological hazard in the area.

Landslide Inventory Data
In this study, a landslide inventory with a total of 8435 landslide events before 2014 was provided by the Chongqing Institute of Geology and Mineral Resources (Figure 1). All landslide events are represented by point features with the attributes latitude, longitude, and area. The minimum landslide area was 3 m², and the maximum area was 3,080,000 m².
The main predisposing factors of landslides in Chongqing include rainfall, earthquakes, erosion of slope toes by rivers, and human activities. Most landslides in the study area are caused by rainfall, followed by landslides caused by earthquakes and erosion. Some studies have indicated that the rainfall threshold in the study area is 150 mm/day [24]. Moreover, a great number of construction projects initiated by local governments have also been responsible for landslide occurrence.

Landslide Predisposing Factors
In this paper, we utilized eight landslide predisposing factors to construct landslide prediction methods, including elevation, slope gradient, aspect, rainfall, distance from the faults, distance from the road network, distance from the hydrographic network, and the normalized difference vegetation index (NDVI).
Elevation has great effects on climate, hydrology, geology, and soil, which are factors related to the occurrence of landslides. Slope gradient is a main driving force of landsliding. Theoretically, landslides are more likely to occur on steep slopes [25]. However, some studies have reported that landslides are most likely to occur when the slope gradient is moderate [26], because very steep slopes lack the loose material needed for landsliding [27]. Aspect influences the distribution of water and heat resources and, hence, affects soil, rock, and vegetation types [28]. Rainfall is an important triggering factor of landslides because it directly or indirectly reduces the shear strength of the rock-soil mass through physical and chemical effects. Therefore, the mean annual precipitation (MAP) is used as the indicator. Proximity to a fault is also a main predisposing factor for landslides. It is well known that landslides tend to occur in the area surrounding a fault due to fractures in the rock mass [29,30]. The buffer distance from the faults is used as the indicator. Road construction results in the oversteepening of side slopes. Therefore, there is a high probability of landslide occurrence along a road. The distance of a slope to drainage structures is another important factor for slope stability. Streams may adversely affect stability by eroding the slopes or saturating the lower part of the material [31,32]. Therefore, we chose the distance from the hydrographic network as a predisposing factor. NDVI is an important index denoting a region's vegetation cover, and it is an important factor for landslide occurrence and movement [33]. Plant roots can hold the soil and mitigate the effect of rainfall [34]. Theoretically, the possibility of landslide occurrence gradually decreases with increasing NDVI value [35].
The 30-m-resolution advanced spaceborne thermal emission and reflection radiometer global digital elevation model (ASTER GDEM), which is generated from stereoscopic data, was utilized to provide elevation information. Based on ASTER GDEM, a slope gradient map and an aspect map were generated. The hydrographic network was also extracted from ASTER GDEM by computing flow accumulation. The geological structure data were extracted from a geological map of Chongqing in vector format at a scale of 1:500,000. The road network vector data were retrieved from the topographic map of China. The rainfall data, including daily precipitation at rainfall observation stations in 2013 and 2014 and the geographical coordinates of these observation stations, were provided by the Chongqing Institute of Geology and Mineral Resources.

Methodology
For the data set, each 30 × 30 m grid cell was used as the study unit. The 8435 recorded landslides were randomly divided into two subsets. Approximately 70% (5905) of the inventory landslides were used for model training, and the remaining 30% (2530) were used for model validation. Three models, i.e., the information value model, the improved information value model based on analytic hierarchy process, and the improved information value model based on gray clustering, were used to assess landslide susceptibility. Finally, landslide susceptibility was divided into the following five classes using Jenks natural breaks optimization: very low, low, moderate, high, and very high. Jenks natural breaks optimization is a data clustering method designed to determine the best arrangement of values into different classes. This is done by seeking to minimize each class' average deviation from the class mean, while maximizing each class' deviation from the means of the other classes. In other words, the method seeks to reduce the variance within classes and maximize the variance between classes [34]. Details of the three models are provided in the following subsections.
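The classification criterion described above can be illustrated with a minimal Python sketch. It assumes the usual formulation of natural breaks, which minimizes the within-class sum of squared deviations, and it finds the optimum by exhaustive search, which is only practical for a handful of values; efficient implementations typically rely on dynamic programming instead. The function names and the sample scores are hypothetical and are not taken from the study.

```python
from itertools import combinations


def within_class_ssd(sorted_values, breaks):
    """Sum of squared deviations from each class mean, added over all classes."""
    total = 0.0
    start = 0
    for end in (*breaks, len(sorted_values)):
        group = sorted_values[start:end]
        mean = sum(group) / len(group)
        total += sum((v - mean) ** 2 for v in group)
        start = end
    return total


def natural_breaks(values, n_classes):
    """Split values into n_classes contiguous groups with minimal within-class SSD.

    Exhaustive search over all break positions; intended for small inputs only.
    """
    data = sorted(values)
    if not 1 <= n_classes <= len(data):
        raise ValueError("n_classes must be between 1 and len(values)")
    candidates = combinations(range(1, len(data)), n_classes - 1)
    best = min(candidates, key=lambda b: within_class_ssd(data, b))
    bounds = (0, *best, len(data))
    return [data[bounds[i]:bounds[i + 1]] for i in range(n_classes)]


# Hypothetical total-information scores for eight study units.
scores = [0.2, -1.4, 1.9, -1.1, 0.1, 2.1, -1.2, 0.3]
classes = natural_breaks(scores, 3)
```

With these sample scores the three groups separate the clearly negative, near-zero, and clearly positive values; the study itself uses five classes.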

Information Value Model (IVM)
IVM is a statistical analysis method that was developed from information theory. In this model, the information values of predisposing factors are used to characterize the possibility of landslide occurrence. The information value $I(x_i, H)$ of each landslide predisposing factor $x_i$ $(i = 1, 2, \ldots, n)$ can be expressed as follows [21,36,37]:

$$I(x_i, H) = \ln \frac{N_i / N}{S_i / S} \tag{1}$$

where $H$ represents the likelihood of landsliding, $S$ is the total number of study units in the study area, $N$ is the total area of landslides in the study area (the sum of the areas of all landslide points), $S_i$ is the number of study units with the presence of predisposing factor $x_i$, and $N_i$ is the total area of landslides with the presence of predisposing factor $x_i$ (the sum of the areas of the landslide points with the presence of $x_i$). The total information $I$ of each study unit can then be calculated as the sum of the information values of all predisposing factors [38]:

$$I = \sum_{i=1}^{n} I(x_i, H) \tag{2}$$

When $I < 0$, the possibility of landslide occurrence is lower than average; when $I = 0$, the possibility of landsliding is equal to average; and when $I > 0$, the possibility of landsliding is higher than average [39]. The larger the information value, the greater the possibility of landsliding.
The method is composed of the following steps:
(1) Preprocessing the landslide data and the landslide predisposing factor data. The slope and aspect maps are generated from the DEM data, and the hydrographic network is extracted with the hydrology tools of ArcGIS. Buffer analysis of the hydrographic network, road network, and faults generates the corresponding buffer maps. The rainfall data are interpolated to draw the rainfall distribution map.
(2) Classifying the landslide predisposing factors, then calculating the information values of the classes of each factor according to Equation (1).
(3) Overlaying the information value maps of all landslide predisposing factors to calculate the total information using the map algebra tool of ArcGIS.
(4) Reclassifying the total information using Jenks natural breaks optimization to generate a landslide susceptibility map.
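Steps (2) and (3) can be sketched in a few lines of Python. The sketch assumes the natural-log form of the information value, I = ln((N_i / N) / (S_i / S)); the slope-gradient class counts and areas are made-up numbers for illustration, while the two NDVI information values are the ones reported for the <0.55 and >0.85 classes in this study. The function and variable names are hypothetical.

```python
import math


def information_value(n_i, n_total, s_i, s_total):
    """ln((N_i / N) / (S_i / S)); returns -inf when a class holds no landslide area."""
    if n_i == 0:
        return float("-inf")
    return math.log((n_i / n_total) / (s_i / s_total))


# Hypothetical classes of one factor: (label, S_i study units, N_i landslide area in m^2).
slope_classes = [
    ("<10", 4000, 20000.0),
    ("10-20", 3000, 60000.0),
    (">20", 3000, 20000.0),
]
s_total = sum(s for _, s, _ in slope_classes)
n_total = sum(n for _, _, n in slope_classes)
slope_iv = {
    label: information_value(n, n_total, s, s_total)
    for label, s, n in slope_classes
}


def total_information(unit_classes, iv_tables):
    """Sum the information values of the classes a study unit falls into."""
    return sum(iv_tables[factor][cls] for factor, cls in unit_classes.items())


iv_tables = {"slope": slope_iv, "ndvi": {"<0.55": 0.663, ">0.85": -0.781}}
unit = {"slope": "10-20", "ndvi": "<0.55"}
unit_total = total_information(unit, iv_tables)
```

In this toy example the 10-20 class holds 60% of the landslide area on 30% of the study units, so its information value is ln 2, which is positive and marks a higher-than-average possibility of landsliding.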

The Improved Information Value Model Based on Analytic Hierarchy Process (IVM-AHP)
IVM can be improved using an analytic hierarchy process. The construction of the improved model consists of the following steps [15,40-42]:
(1) The hierarchy is established. Using the scale of 1-9 and its reciprocals to express the importance of predisposing factors for landslide occurrence (Table 1), the relative importance of the predisposing factors is compared to construct a pairwise comparison matrix [43].
(2) The largest eigenvalue and the corresponding eigenvector of the comparison matrix are calculated. The eigenvector is normalized to represent the weights of the predisposing factors [44-46].
(3) The consistency of the matrix is checked. The consistency ratio (CR) is calculated as

$$CR = \frac{CI}{RI} \tag{3}$$

where RI is the mean random index defined by Saaty [47] (Table 2) and CI is the consistency index, defined as

$$CI = \frac{\lambda_{\max} - N}{N - 1} \tag{4}$$

where $\lambda_{\max}$ is the largest eigenvalue and $N$ is the order of the comparison matrix. When the value of CR is less than 0.1, the pairwise comparison satisfies the consistency requirements [48,49]. Otherwise, the comparison matrix must be reconstructed, which means returning to the first step [50].
(4) The total weighted information value of each study unit is obtained using the information values derived from the IVM according to Equation (5):

$$I = \sum_{i=1}^{n} \omega_i I(x_i, H) \tag{5}$$

where $\omega_i$ $(i = 1, 2, \ldots, n)$ is the weight of each predisposing factor. Then, the total weighted information value is reclassified using Jenks natural breaks optimization to generate the landslide susceptibility map.

Table 1. Scale of the relative importance of predisposing factors.
Scale 3: The former factor is slightly more important than the latter.
Scale 5: The former factor is obviously more important than the latter.
Scale 7: The former factor is intensely more important than the latter.
Scale 9: The former factor is extremely more important than the latter.

The Improved Information Value Model Based on Gray Clustering (IVM-GC)
In this paper, the model (IVM) described in Section 4.1 is improved based on gray clustering. The information value derived from the IVM is used to obtain the relative weights of the different classes within each landslide predisposing factor and to determine the weights of these factors.
In landslide susceptibility mapping, study units are the clustering objects, denoted by $i$ $(i = 1, 2, \ldots, n)$, and landslide predisposing factors are the clustering indexes, denoted by $j$ $(j = 1, 2, \ldots, m)$. The value of the $j$th predisposing factor at the $i$th study unit is expressed as $y_{ij}$. Gray classes $k$ $(k = 1, 2, \ldots, s)$ are regarded as landslide susceptibility classes. Here $n$ is the total number of study units, $m$ is the number of landslide predisposing factors (8 in this paper), and $s$ is the number of landslide susceptibility classes (5 in this study). Gray clustering for landslide susceptibility mapping has the following steps [51-53]:
(1) Using a min-max normalization method, the data are normalized to eliminate the influence of dimension. Among all $y_{ij}$ values, the maximum value $y_M$ and the minimum value $y_m$ are used to normalize $y_{ij}$ [52]:

$$x_{ij} = \frac{y_{ij} - y_m}{y_M - y_m} \tag{6}$$

(2) The whitening weight functions of the predisposing factors are determined. The function $f_j^k(\cdot)$ $(j = 1, 2, \ldots, m;\ k = 1, 2, \ldots, s)$ is the whitening weight function of the $k$th susceptibility class of the $j$th predisposing factor [52].
(a) The lower whitenization weight function (Figure 3a) is $f_j^k[-, -, x_j^k(3), x_j^k(4)]$:

$$f_j^k(x) = \begin{cases} 0, & x \notin [0, x_j^k(4)] \\ 1, & x \in [0, x_j^k(3)] \\ \dfrac{x_j^k(4) - x}{x_j^k(4) - x_j^k(3)}, & x \in [x_j^k(3), x_j^k(4)] \end{cases} \tag{7}$$

(b) The moderate whitenization weight function (Figure 3b) is $f_j^k[x_j^k(1), x_j^k(2), -, x_j^k(4)]$:

$$f_j^k(x) = \begin{cases} 0, & x \notin [x_j^k(1), x_j^k(4)] \\ \dfrac{x - x_j^k(1)}{x_j^k(2) - x_j^k(1)}, & x \in [x_j^k(1), x_j^k(2)] \\ \dfrac{x_j^k(4) - x}{x_j^k(4) - x_j^k(2)}, & x \in [x_j^k(2), x_j^k(4)] \end{cases} \tag{8}$$

(c) The upper whitenization weight function (Figure 3c) is $f_j^k[x_j^k(1), x_j^k(2), -, -]$:

$$f_j^k(x) = \begin{cases} 0, & x < x_j^k(1) \\ \dfrac{x - x_j^k(1)}{x_j^k(2) - x_j^k(1)}, & x \in [x_j^k(1), x_j^k(2)] \\ 1, & x \geq x_j^k(2) \end{cases} \tag{9}$$

(3) The clustering weight $\eta_j$ $(j = 1, 2, \ldots, m)$, which reflects the influence of each landslide predisposing factor on landslide occurrence, is calculated by Equation (10):

$$\eta_j = \frac{\lambda_j}{\sum_{j=1}^{m} \lambda_j} \tag{10}$$

where $\lambda_j$ is the sum of the positive information values of the $j$th landslide predisposing factor and $\sum_{j=1}^{m} \lambda_j$ is the total positive information value of all predisposing factors.
(4) The clustering coefficient of the study unit $i$ $(i = 1, 2, \ldots, n)$ for the susceptibility class $k$ $(k = 1, 2, \ldots, s)$ is expressed as

$$\sigma_i^k = \sum_{j=1}^{m} f_j^k(x_{ij}) \, \eta_j \tag{11}$$

All the clustering coefficients of the $i$th study unit constitute a clustering vector:

$$\sigma_i = (\sigma_i^1, \sigma_i^2, \ldots, \sigma_i^s) \tag{12}$$

(5) According to the clustering vector, the susceptibility class that the study unit $i$ $(i = 1, 2, \ldots, n)$ belongs to can be determined. The study unit $i$ belongs to the class $k^*$ for which

$$\sigma_i^{k^*} = \max_{1 \leq k \leq s} \{\sigma_i^k\} \tag{13}$$

i.e., $k^*$ equals the value of $k$ whose clustering coefficient is the largest.

ISPRS Int. J. Geo-Inf. 2017, 6, 18

Receiver Operating Characteristics Curve
As a useful tool for studying binary problems, such as the occurrence or non-occurrence of landslides, the ROC curve has been widely used to evaluate the performance of landslide susceptibility models [54,55]. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings [56]. The false positive rate along the x-axis is the proportion of non-landslide areas that are classified as landslide-prone. In contrast, the true positive rate along the y-axis is the proportion of landslide areas that are correctly classified as landslide-prone [54]. The landslide susceptibility model is evaluated using the area under the ROC curve (AUC) [55]. For a model that performs no worse than random guessing, the AUC ranges from 0.5 to 1. The model with the largest AUC is regarded as the best. An AUC close to 1 suggests that the model produces a good result [54], whereas an AUC close to 0.5 implies a poor result. It is generally accepted that a model has high accuracy if its AUC is larger than 0.7 [57].
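The construction of the ROC curve and its AUC can be made concrete with a small Python sketch. It assumes that each validation unit has a susceptibility score and a binary label (1 for a landslide unit, 0 otherwise), that both labels occur, and that units with a score at or above a threshold are flagged as landslide-prone; the sample scores and labels are hypothetical.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (FPR, TPR) when units with score >= threshold are flagged as landslide-prone."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives


def auc(scores, labels):
    """Area under the ROC curve, by the trapezoidal rule over all distinct thresholds."""
    thresholds = sorted(set(scores), reverse=True)
    # Start at (0, 0); the lowest threshold flags every unit and yields (1, 1).
    points = [(0.0, 0.0)] + [rates_at_threshold(scores, labels, t) for t in thresholds]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical susceptibility scores and landslide labels for six validation units.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
example_auc = auc(scores, labels)
```

In this example eight of the nine landslide/non-landslide pairs are ranked correctly, so the AUC is 8/9, above the 0.7 level mentioned in the text.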

Wilcoxon Signed-Rank Test
The Wilcoxon signed-rank test is the nonparametric counterpart of the dependent (paired) t-test. Because the Wilcoxon signed-rank test does not require the data to be normally distributed, it can be used when normality is violated and the dependent t-test is inappropriate. It is used to compare two sets of scores that come from the same participants [58]. In this paper, it was used to compare the spatial patterns of the landslide susceptibility zones produced by the three models, in order to check whether the prediction results of the models are significantly different.
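As a rough illustration of how such a paired comparison works, the following Python sketch computes the signed-rank sums and a two-sided p-value from the normal approximation, without tie or continuity corrections. It is not the procedure used in the paper; for real analyses an established routine such as `scipy.stats.wilcoxon` is preferable. The susceptibility classes (1 = very low to 5 = very high) assigned to ten study units by two models are hypothetical.

```python
import math


def wilcoxon_signed_rank(a, b):
    """Return (W+, W-, z, two-sided p) for paired samples a and b.

    Zero differences are dropped, tied absolute differences share their
    average rank, and p comes from the large-sample normal approximation.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, z, p


# Hypothetical susceptibility classes of the same ten study units under two models.
model_a = [3, 4, 2, 5, 1, 3, 4, 2, 5, 3]
model_b = [2, 4, 1, 4, 1, 2, 3, 2, 4, 2]
w_plus, w_minus, z, p = wilcoxon_signed_rank(model_a, model_b)
```

Here every nonzero difference has the same sign, so one rank sum is zero and the approximation gives a p-value below 0.05, i.e., the two hypothetical maps differ significantly.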

Results
Using 70% of the inventory landslides, three landslide susceptibility maps were generated using the information value model (IVM), the improved information value model based on analytic hierarchy process (IVM-AHP), and the new improved information value model based on gray clustering (IVM-GC). Eight landslide predisposing factors were selected for landslide susceptibility mapping: elevation, slope gradient, aspect, rainfall, distance from the faults, distance from the road network, distance from the hydrographic network, and NDVI.

Application of IVM
According to existing research [18,19,59] or Jenks natural breaks optimization, each landslide predisposing factor was divided into five classes, except for aspect, which was divided into nine classes. Using Equation (1), the information value of each class of each landslide predisposing factor was calculated (Table 3).
In terms of elevation, as indicated in Table 3, the 100-200 m class had the largest information value of 1.342, followed by 0.134 at 200-300 m. The information values of the remaining classes were negative. Therefore, landslides were prone to occur between 100 and 300 m.
As for slope gradient, most landslides occurred between 10° and 35°. The maximum information value of 0.494 was found in the range of 10°-20°, which was the range in which landslides were most likely to occur.
For aspect, the information values varied from −0.226 to 0.387. The maximum was found on northwest-facing slopes and the minimum on flat areas. Therefore, the probability of landslide occurrence was largest on northwest-facing slopes and smallest on flat areas.
For the distance from the hydrographic network, the largest information value was 0.441 in the <1000 m interval. The information value for distances between 1000 and 2000 m was the second largest at 0.338. The ranges 2000-3000 m and 3000-4000 m had information values of −0.134 and −0.436, respectively. The >4000 m class had the smallest information value. These results clearly show that landslides were most likely to occur when the distance from the hydrographic network was less than 1000 m and least likely to occur in the >4000 m class.
Note to Table 3: GS: distance from faults; HN: distance from hydrographic network; RN: distance from road network; S: the total number of study units in the study area; N: the total area of landslides in the study area; S_i: the number of study units with the presence of predisposing factor x_i; N_i: the total area of landslides with the presence of predisposing factor x_i.
For the distance from the faults, the intervals 0-600 m and 600-1200 m had information values of 0.205 and 0.053, respectively. The information values for the 1800-2400 m and >2400 m classes were negative. This indicated that landslides were more likely to occur when the distance to the faults was less than 1800 m and least likely to occur beyond 1800 m.
The information value of rainfall ranged from −1.553 to 0.247. In the study area, rainfall between 1100 and 1200 mm/year had the largest information value of 0.247, which suggested that the probability of landslide occurrence in this interval was greater than in any other interval. The information value of rainfall between 1200 and 1250 mm/year was the second largest (0.216). This result was inconsistent with the common expectation that the information value should gradually increase with increasing rainfall. This phenomenon may be due to sudden rainstorms, which also contributed greatly to the occurrence of landslides [60].
As for the distance from the road network, the largest information value was 0.619 in the 0-200 m interval. Beyond 800 m, the information value was the smallest at −0.164, indicating the lowest landslide frequency.
With respect to NDVI, the <0.55 class had the largest information value of 0.663 and the >0.85 class had the smallest information value of −0.781. The information value gradually decreased with increasing NDVI value.
Landslide susceptibility was determined based on the total information, which was the sum of the information values of all landslide predisposing factors. Based on Jenks natural breaks optimization, the total information was divided into five classes: very low, low, moderate, high, and very high susceptibility. Then, the landslide susceptibility map of the Chongqing study area was generated (Figure 4). Figure 4 shows that low susceptibility areas were mainly distributed in the southwest and northeast of the study area. High susceptibility areas were distributed in a banded pattern, along the same directions as most roads, hydrographic networks, and faults. Low susceptibility areas occupied 32.96% of the study area, which was the largest proportion among all classes, while very low, moderate, high, and very high susceptibility areas accounted for 14.36%, 30.37%, 16.45%, and 5.86% of the study area, respectively.

Application of IVM-AHP
Table 4 shows the pairwise comparison matrix and the weights of the landslide predisposing factors determined by the analytic hierarchy process. As indicated in Table 4, slope gradient had the maximum weight, i.e., the largest influence on landslide occurrence. The weight of aspect was the minimum, meaning that aspect had the least influence on the occurrence of landslides. The weights of elevation, distance from the faults, distance from the hydrographic network, distance from the road network, rainfall, and NDVI were 0.082, 0.155, 0.059, 0.041, 0.258, and 0.035, respectively. The consistency of the pairwise comparison matrix was tested using the consistency ratio (CR). The consistency index (CI) and CR values were 0.116 and 0.082, respectively, which indicated that the pairwise comparison matrix satisfied the consistency requirement. Using the information values of each class of each landslide predisposing factor derived from the IVM, the weighted information values were obtained using Equation (5), and then the landslide susceptibility map was generated (Figure 5). As shown in Figure 5, low susceptibility areas were mainly distributed in the southwest and northeast of the study area. High susceptibility areas were distributed in a belt pattern, which was similar to the result of the information value model shown in Figure 4. Low susceptibility areas occupied the highest proportion, reaching 34.93%, while very low susceptibility areas took up only 3.31%. Moderate, high, and very high susceptibility areas accounted for 28.70%, 21.56%, and 11.49% of the study area, respectively.
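The weight derivation and consistency check can be sketched as follows in Python. The 3 × 3 comparison matrix is hypothetical (the actual 8 × 8 matrix is given in Table 4), the principal eigenvector is approximated by power iteration rather than a library eigen-solver, and the random index values are the ones commonly tabulated for Saaty's method, assumed here to match Table 2. With RI = 1.41 for an eighth-order matrix, the reported CI of 0.116 gives CR = 0.116 / 1.41 ≈ 0.082, in agreement with the value stated above.

```python
# Commonly cited mean random index values by matrix order (assumed to match Table 2).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix, iterations=100):
    """Approximate the normalized principal eigenvector and lambda_max by power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max


def consistency(lambda_max, n):
    """Return (CI, CR); matrices of order 1 or 2 are always consistent."""
    if n < 3:
        return 0.0, 0.0
    ci = (lambda_max - n) / (n - 1)
    return ci, ci / RANDOM_INDEX[n]


# Hypothetical pairwise comparison of three factors on the 1-9 scale.
comparison = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 2.0],
    [1.0 / 5.0, 1.0 / 2.0, 1.0],
]
weights, lambda_max = ahp_weights(comparison)
ci, cr = consistency(lambda_max, len(comparison))

# Cross-check of the values reported in the text for the 8 x 8 matrix.
reported_cr = 0.116 / RANDOM_INDEX[8]
```

A CR below 0.1 means the judgments in the matrix are acceptably consistent; otherwise the comparison matrix would have to be revised.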

Application of IVM-GC
Based on the information values of the landslide predisposing factors derived from the IVM, the landslide susceptibilities were divided into the following five classes for each predisposing factor: very low, low, moderate, high, and very high. A larger information value indicates a higher possibility of landslide occurrence, i.e., a class of higher landslide susceptibility. Table 5 shows the landslide susceptibility classification of predisposing factors. It clearly indicates that 100-200 m elevation, 10-20° slope gradient, <600 m distance from the faults, <200 m distance from the road network, 1100-1200 mm/year rainfall, <1000 m distance from the hydrographic network, the northwest exposure aspect, and <0.55 NDVI had the highest landslide susceptibility, i.e., the very high class. In contrast, <100 m elevation, <5° slope gradient, >2400 m distance from the faults, >800 m distance from the road network, <1000 mm/year rainfall, >4000 m distance from the hydrographic network, flat aspect, and >0.85 NDVI fell into the very low susceptibility class. After normalizing the data, the clustering weights of landslide predisposing factors were calculated using Equation (10). The results shown in Table 6 indicate that the clustering weight (0.222) of the elevation was the maximum, while the distance from the faults had the lowest clustering weight (0.048). The weights of slope gradient, distance from road network, rainfall, distance from hydrographic network, aspect, and NDVI were 0.074, 0.177, 0.083, 0.117, 0.092, and 0.187, respectively. Subsequently, for each study unit, by calculating the clustering coefficient, the clustering vector was generated. The susceptibility class that each study unit belonged to was finally determined with Equation (13) and the resulting landslide susceptibility map of Chongqing is shown in Figure 6.
As shown in Figure 6, low susceptibility areas occupied 39.78% of the study area, followed by 23.44% for moderate susceptibility areas. Very low, high, and very high susceptibility areas accounted for 14.77%, 17.81% and 4.20% of the study area, respectively. In addition, low susceptibility areas were mainly distributed in the southwest and northeast of the area. High susceptibility areas were distributed in a banded pattern, similar to the result of the information value model shown in Figure 4.
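The final assignment step can be sketched as follows. This is a simplified illustration only, not the formulation of Equations (10)-(13): it assumes a crisp (indicator) whitenization function, so that each factor contributes its clustering weight entirely to the susceptibility class it falls into at a given study unit, and the unit is assigned to the class with the largest clustering coefficient. The weights are those reported in the text for Table 6; the dictionary keys, function names, and the example unit are hypothetical.

```python
CLASSES = ["very low", "low", "moderate", "high", "very high"]

# Clustering weights reported in the text (Table 6); the keys are hypothetical labels.
WEIGHTS = {
    "elevation": 0.222, "slope": 0.074, "faults": 0.048, "roads": 0.177,
    "rainfall": 0.083, "hydro": 0.117, "aspect": 0.092, "ndvi": 0.187,
}


def clustering_vector(unit_classes, weights=WEIGHTS):
    """Sum the factor weights per susceptibility class for one study unit.

    unit_classes maps each factor name to the susceptibility class of the
    factor class observed at that unit (as in a lookup against Table 5).
    Assumption: crisp whitenization, i.e., membership is 1 for that class
    and 0 for all others.
    """
    vector = {c: 0.0 for c in CLASSES}
    for factor, cls in unit_classes.items():
        vector[cls] += weights[factor]
    return vector


def assign_class(unit_classes, weights=WEIGHTS):
    """Pick the class with the largest clustering coefficient.

    Ties resolve to the lower susceptibility class because of list order.
    """
    vector = clustering_vector(unit_classes, weights)
    return max(CLASSES, key=lambda c: vector[c])


# Hypothetical study unit.
example_unit = {
    "elevation": "very high", "ndvi": "very high",
    "roads": "high", "hydro": "high",
    "slope": "low", "faults": "low", "rainfall": "low", "aspect": "low",
}
example_vector = clustering_vector(example_unit)
example_class = assign_class(example_unit)
```

For this hypothetical unit, the very high class collects the weights of elevation and NDVI (0.409 in total), which exceeds the totals for the low (0.297) and high (0.294) classes, so the unit is assigned to the very high class.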

Model Validation
In this study, the generated susceptibility maps were evaluated using receiver operating characteristic (ROC) curves. In addition, the Wilcoxon signed-rank test was used to check whether the spatial patterns of the landslide susceptibility zones generated by the three models were similar.

Receiver Operating Characteristics Curve
In this study, using the ROC curve, the success rate and prediction rate were calculated to assess the model accuracy and prediction ability of the three models. The success rate was obtained by comparing the 5905 landslides used for model training with the generated landslide susceptibility map (Figure 7). In Figure 7, the x-axis represents the proportion of landslide-free areas incorrectly classified as landslide-prone (the false positive rate), and the y-axis represents the proportion of landslide areas correctly classified as landslide-prone (the true positive rate). The AUC values of the IVM, IVM-AHP, and IVM-GC were 0.818, 0.787, and 0.852, respectively. Therefore, the model accuracies of the IVM, IVM-AHP, and IVM-GC were 81.8%, 78.7% and 85.2%, respectively. IVM-GC performed better in model construction than the IVM and IVM-AHP. The remaining 2530 (30%) landslides were compared with the landslide susceptibility map to calculate the prediction rate (Figure 8). The AUC value of the IVM was 0.820, the AUC value of the IVM-AHP was 0.787, and the AUC value of the IVM-GC was 0.854. Therefore, the prediction accuracies of the IVM, IVM-AHP and IVM-GC were 82.0%, 78.7% and 85.4%, respectively. IVM-GC had the largest AUC value, while IVM-AHP had the smallest. Thus, IVM-GC had a better prediction capability than the IVM and IVM-AHP.
A comparison of Figures 7 and 8 shows that, in both cases, IVM-GC had the largest AUC value, followed by IVM, and IVM-AHP had the lowest. The success rate curves were similar to the prediction rate curves. In addition, the AUC values of the three models were all larger than 0.7, which suggests that the three models performed well in evaluating the landslide susceptibility of Chongqing. Among them, the AUC of IVM-GC was the largest, which indicates that IVM-GC was a relatively good method for landslide susceptibility mapping in the study area in comparison to the other two models.
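As an illustration of how an AUC value of this kind can be computed, the sketch below uses the rank interpretation of the AUC: the probability that a randomly chosen landslide unit receives a higher susceptibility score than a randomly chosen landslide-free unit, with ties counted as one half. The function name and the toy data are hypothetical; applying it to the training landslides would give a success rate, and applying it to the validation landslides would give a prediction rate. The pairwise loop is quadratic and is meant for clarity, not for full raster data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from scores and binary labels (1 = landslide).

    Computed as the fraction of (landslide, non-landslide) pairs in which the
    landslide unit has the higher score; ties count as 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical susceptibility scores for four study units.
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
toy_auc = roc_auc(toy_scores, toy_labels)  # 3 of 4 pairs are ordered correctly: 0.75
```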

Wilcoxon Signed-Rank Test
Using SPSS Statistics 22 software, p-values were calculated to test for statistically significant differences (significance level of 0.05). Comparing the landslide susceptibility classification of IVM with that of IVM-GC at the same locations gave a p-value of 0.131. The comparison between the classifications of IVM and IVM-AHP gave a p-value of 0.458, and the comparison between the classifications of IVM-GC and IVM-AHP gave a p-value of 0.544. All three p-values were larger than 0.05. Therefore, we conclude that the landslide susceptibility mapping results of the three models had no statistically significant differences.
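The test itself can be sketched in plain Python. The version below is a minimal illustration under stated assumptions: zero differences are discarded, tied absolute differences receive average ranks, and the two-sided p-value comes from the normal approximation with a tie correction and no continuity correction. It is not a reproduction of the SPSS procedure, whose output may differ slightly, and the function name and sample data are hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples.

    Returns (w_plus, p_value), where w_plus is the sum of ranks of the
    positive differences. Uses the normal approximation with tie correction.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    variance = n * (n + 1) * (2 * n + 1) / 24.0 - tie_term / 48.0
    if variance <= 0:
        return w_plus, 1.0
    z = (w_plus - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w_plus, p_value


# Hypothetical class codes (1 = very low ... 5 = very high) from two models
# at the same four locations; the differences cancel out symmetrically.
model_a = [2, 1, 3, 1]
model_b = [1, 2, 1, 3]
w_stat, p_val = wilcoxon_signed_rank(model_a, model_b)
```

A p-value above 0.05, as in the three comparisons reported above, means the null hypothesis of no systematic difference between the paired classifications is not rejected.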

Discussion
For each landslide predisposing factor, information values vary among different classes (Table 3). The class with the largest information value has the highest possibility of landslide development. Each of the predisposing factors makes its own contribution to landslide occurrence, and, hence, landslides are caused by a combination of predisposing factors. According to the information values shown in Table 3, the combination of landslide predisposing factors including 100-200 m elevation, 10°-20° slope gradient, <600 m distance from the faults, <200 m distance from the road network, 1100-1200 mm/year rainfall, <1000 m distance from the hydrographic network, the northwest exposure aspect, and <0.55 NDVI had the largest total information value and made the greatest contribution to landslide occurrence. With respect to the correlation of the variables, we checked whether the factors used are independent of each other by means of a multicollinearity test. The results show that there is a certain degree of multicollinearity among these factors. However, some studies have indicated that multicollinearity does not affect the goodness of fit or the goodness of prediction [61].
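To make the notion of a class information value and a total information value concrete, the following sketch assumes the commonly used form of the information value, the natural logarithm of the ratio between the share of landslides in a factor class and the share of the study area occupied by that class. The exact equation and logarithm base used in this paper are defined in the methods section and may differ; the function names and the numbers below are hypothetical.

```python
import math


def information_value(landslides_in_class, landslides_total, area_in_class, area_total):
    """Information value of one factor class (assumed form, natural log).

    Positive values mean landslides are over-represented in the class
    relative to its area; negative values mean under-represented.
    """
    if landslides_in_class <= 0:
        raise ValueError("information value is undefined for a class without landslides")
    landslide_share = landslides_in_class / landslides_total
    area_share = area_in_class / area_total
    return math.log(landslide_share / area_share)


def total_information_value(class_values):
    """Total information value of a study unit: sum over its factor classes."""
    return sum(class_values)


# Hypothetical class holding 30% of the landslides on 15% of the area: ln(2).
iv_favourable = information_value(30, 100, 15.0, 100.0)
# Hypothetical class holding 10% of the landslides on 20% of the area: ln(0.5).
iv_unfavourable = information_value(10, 100, 20.0, 100.0)
unit_total = total_information_value([iv_favourable, iv_unfavourable])
```

In this toy case the two contributions cancel, which illustrates why susceptibility depends on the combination of factor classes present at a unit rather than on any single class.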
In this paper, landslide susceptibility was reclassified into the following five classes: very low, low, moderate, high, and very high. Landslide susceptibility maps were produced using the following three different methods: IVM, IVM-AHP, and IVM-GC. In these maps, high susceptibility areas were basically distributed along the northeast to southwest direction in the study area. The high susceptibility areas were close to the geological structure, road network and hydrographic network and were mostly located in moderate slope gradient areas, where the information values were higher.
AUC was selected to evaluate the success rate and prediction rate of the three landslide susceptibility models. Theoretically, the model with the largest AUC value is the best. Based on the validation results, all three models performed well in evaluating landslide susceptibility, since their AUC values were all larger than 0.7. In addition, both the success rate and the prediction rate of IVM-GC were the largest among the three models. Therefore, IVM-GC performed better than the other two models in the study area. IVM treats all landslide predisposing factors as equally important and assigns an equal weight to each factor. In IVM-AHP, the criteria used to construct the pairwise comparison matrix of predisposing factors depend on the experience of the researcher, which is subjective and is the main disadvantage of IVM-AHP. In contrast, IVM-GC inherits the advantages of the information value model, which can obtain the relative weights of different classes for each landslide predisposing factor, and it also appropriately determines the weights of the predisposing factors.
However, the classification of each predisposing factor was based on the literature and may not be the best for this case. Therefore, in future research, the effects of predisposing factor classification on landslide susceptibility assessment should be studied and an objective classification method should be developed. There are many earthquake-induced landslides in Northwestern Chongqing; however, due to data limitations, we did not use earthquakes as a predisposing factor in our model. Thus, earthquakes should be considered in future research [62]. In addition, landslide susceptibility assessment should consider the different landslide typologies and distinguish between landslide triggering conditions. Moreover, the study unit of this paper was a 30 m × 30 m grid cell, which was not sufficiently linked to the topography and geomorphology. In contrast, the slope unit, which is defined as a unit between the ridge and the valley, is more closely related to the geomorphology. Hence, future studies should use the slope unit as the study unit. Furthermore, landslide areas were not used to validate the landslide susceptibility models, which could be a source of uncertainty. Further studies should take this into account.
The Wilcoxon signed-rank tests indicated that the landslide susceptibility mapping results of the three models had no statistically significant differences in the spatial pattern of landslide susceptibility zones. This suggests that the prediction results of the three models were similar because these models were all based on the information value.

Conclusions
This paper proposes an improved information value model based on gray clustering for landslide susceptibility assessment. Using slope gradient, aspect, rainfall, elevation, distance from the road network, distance from the hydrographic network, distance from the faults, and NDVI as landslide predisposing factors, landslide susceptibility maps of Chongqing, China were generated based on three models, i.e., IVM, IVM-AHP, and IVM-GC.
The resultant landslide susceptibility maps show that the high susceptibility areas are mainly distributed along the northeast to southwest direction in the study area. The Wilcoxon signed-rank tests indicated that the spatial patterns of the landslide susceptibility zones generated by the three models had no statistically significant differences. ROC curves were used to evaluate these models by comparing the success rate and prediction rate. According to the AUC values of the success rate and prediction rate curves, all three models performed well in evaluating the landslide susceptibility of Chongqing. Among them, IVM-GC had the best performance for landslide susceptibility mapping in the study area. IVM-GC not only inherits the advantages of the information value model, which can obtain the relative weights of different classes of each landslide predisposing factor, but can also appropriately determine the weights of predisposing factors.
In our newly improved IVM-GC model, however, the classification of each predisposing factor was based on relevant literature and may not be the best for this case. Therefore, further studies should explore the effects of predisposing factor classification on landslide susceptibility assessment, and an objective classification method should be developed. In addition, earthquakes should be included as a predisposing factor in our model, and the different landslide typologies and the distinction between landslide triggering conditions should be considered. Furthermore, the slope unit, which is more closely related to the topography and geomorphology, should be used as the study unit in future research.

Figure 1. Topographic map of Chongqing showing the location of the research area. Points indicate landslide events before 2014.
The NDVI data were provided by the International Scientific & Technical Data Mirror Site, Computer Network Information Center, Chinese Academy of Sciences (http://www.gscloud.cn) at a resolution of 500 m and resampled to a resolution of 30 m. Landslide predisposing factors maps are shown in Figure 2.
ISPRS Int. J. Geo-Inf. 2017, 6, 18
Figure 2. Landslide predisposing factors maps of the study area. Recoverable panel titles: Aspect; (d) Distance from faults.

Figure 4. The landslide susceptibility map of Chongqing based on the information value model.

Figure 5. The landslide susceptibility map of Chongqing based on IVM-AHP.

Figure 6. The landslide susceptibility map of Chongqing based on IVM-GC.

Table 1. The scale and definition of the pairwise comparison matrix.

Table 3. Information values of landslide predisposing factors.

Table 4. The pairwise comparison matrix.
GS: distance from faults; HN: distance from hydrographic network; RN: distance from road network.

Table 5. The susceptibility classification of landslide predisposing factors.
GS: distance from faults; HN: distance from hydrographic network; RN: distance from road network.

Table 6. The weight of each predisposing factor.