Comparison of Logistic Regression, Information Value, and Comprehensive Evaluating Model for Landslide Susceptibility Mapping

Abstract: This study validated the robust performance of the recently proposed comprehensive landslide susceptibility index (CLSI) model for landslide susceptibility mapping (LSM) by comparing it to the logistic regression (LR) model and the analytical hierarchy process information value (AHPIV) model. Zhushan County in China, with 373 landslides identified, was used as the study area. Eight conditioning factors (lithology, slope structure, slope angle, altitude, distance to river, stream power index, slope length, distance to road) were acquired from digital elevation models (DEMs), field surveys, remote sensing imagery, and government documentary data. Results indicate that the CLSI model has the highest accuracy and the best classification ability, although all three models can produce reasonable landslide susceptibility (LS) maps. The robust performance of the CLSI model is due to its weight determination by a back-propagation neural network (BPNN), which successfully captures the nonlinear relationship between landslide occurrence and the conditioning factors.


Introduction
In terms of economic losses and fatalities, landslides rank seventh globally [1]; they cause damage to roads, railways, power lines, and even tourism and historical sites [2,3]. China is a mountainous country whose development is severely restricted by landslides, and many efforts have been made to prevent and mitigate them. Landslide hazard is characterized by two main components: the first is temporal and relates to landslide frequency in a particular area; the second is spatial and relates to the spatial probability of landslide occurrence, the so-called "susceptibility" [4]. Landslide susceptibility mapping (LSM) is critical for landslide prevention [5,6]. Its main steps can be summarized as follows: first, the conditioning factors (CFs) and their contributions are determined by analyzing the characteristics and distribution of the existing landslides; next, the relationship between CFs and landslide susceptibility is established through a linear or nonlinear approach; finally, using this relationship, LSM for unknown areas can be completed [7]. LSM is the spatial assessment of landslides at an initial stage. Its accuracy directly affects the rationality of site selection as well as decisions on disaster control. Therefore, LSM has vital practical significance.
The research techniques used for LSM can be roughly categorized into qualitative and quantitative ones [8,9]. The qualitative methods are based on the prior knowledge of experts. The basic idea of the qualitative method is that experts identify the judgment rules for conditioning factors and then perform a weighted summation of them to obtain the […] networks (BPNNs). The frequency ratio (FR) method is used to determine the weighted value for each class of the landslide conditioning factors, the BPNN is utilized to determine the weighted value for every factor, and cluster analysis (CA) is used to optimize the non-landslide samples before the BPNN process. For convenience, this model is denoted as the CLSI model, which is the abbreviation of the comprehensive landslide susceptibility index [45] model.
To further verify the superiority and generalizability of this model, Zhushan County was taken as a study region. It is a landslide-prone area in Hubei Province of China.
Two representative methods, namely, the LR and the analytic hierarchy process information value (AHPIV) model, were used for comparison. Specifically, LR represents the traditional statistical method, and AHPIV represents an integrated method that combines prior knowledge and subjective weight determination. With these methods, the landslide susceptibility (LS) maps of the study region were produced, and their performances were evaluated in terms of prediction accuracy and classification ability. The verification methods included the area under the receiver operating characteristic curve (AUC), the seed cell area index (SCAI), and the cumulative number of landslide points.

LR Model
The LR is one of the most common statistical methods in earth sciences [39]. The variables in the LR model can be either discrete or continuous. For continuous variables, normal distributions are not required [18]. This feature is quite useful in LSM due to the diversity and complexity of CFs [46]. In the LR model, the relationship between landslide occurrence and the CFs can be described as [37]:

$$P = \frac{1}{1 + e^{-Z}} \quad (1)$$

where $P$ ($0 \le P \le 1$) is the probability of landslide occurrence and $Z$ is the linear logistic parameter $\mathrm{Logit}(P)$, with a range of $(-\infty, +\infty)$. Equation (2) shows the detailed calculation of $Z$ [2]:

$$Z = \mathrm{Logit}(P) = \ln\left(\frac{P}{1 - P}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n \quad (2)$$

where $n$ is the number of landslide CFs, $\beta_0$ is the constant coefficient, $\beta_1, \ldots, \beta_n$ are the partial regression coefficients, and $x_1, \ldots, x_n$ are the independent variables (i.e., the CFs in this study).
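As a minimal sketch of how the logit in Equation (2) maps CF values to a probability, the following Python snippet evaluates Z and inverts the logit. The coefficients and CF values are hypothetical placeholders chosen for illustration; they are not the coefficients fitted in this study, and the function names are illustrative.

```python
import math

def logit_z(beta0, betas, xs):
    """Linear predictor Z = beta0 + sum(beta_i * x_i), as in Equation (2)."""
    return beta0 + sum(b * x for b, x in zip(betas, xs))

def landslide_probability(z):
    """Invert the logit: P = 1 / (1 + exp(-Z)), so that ln(P / (1 - P)) = Z."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients and normalized CF values for one grid cell.
beta0 = -1.0
betas = [0.8, 0.5, 1.2]
xs = [0.6, 0.2, 0.9]

z = logit_z(beta0, betas, xs)
p = landslide_probability(z)
print(f"Z = {z:.3f}, P = {p:.3f}")
```

Because the inverse logit is monotonic, ranking grid cells by Z or by P gives the same ordering; P is simply the bounded form used for mapping.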

AHPIV Model
The AHPIV model is an integrated model that can be expressed by a weighted sum equation [8]:

$$I_n = \sum_{i=1}^{n} w_i I_i \quad (3)$$

where $w_i$ is the weight of the ith CF and $I_i$ is the information value (IV) of the CF class. In the AHPIV, the AHP is used to obtain $w_i$, and the information value method is used to obtain $I_i$. The two methods are briefly described as follows.

AHP Method
The AHP is a semi-qualitative multi-criteria decision-making technique, which is widely used in many research fields including LS. It can consider both subjective and objective factors while making the decision [31,43]. It can be used alone [43,47] or in combination with other methods [8] in the LSM.
Using the AHP to determine the weights of CFs involves the following steps: (i) build a hierarchy model of factors; (ii) establish a judgment matrix through pairwise comparison (importance is rated from less to more on a scale of 1 to 9); (iii) calculate the principal eigenvalue ($\lambda_{\max}$) and the corresponding eigenvector of the judgment matrix; (iv) test consistency using the consistency ratio (CR) (Equation (4)) [48], which must be less than 0.1; (v) normalize the principal eigenvector to obtain the factor weights.

$$CR = \frac{\lambda_{\max} - n}{(n - 1)\,RI} \quad (4)$$

In Equation (4), RI is the random consistency index (see Table 1) [48,49], and $n$ is the order of the judgment matrix.

Information Value Method
An information value $I_i$ of a CF class can be defined as [37,43,50]:

$$I_i = \log_2\left(\frac{L_{im}/T_{im}}{L/T}\right) \quad (5)$$

where $L_{im}$ is the number of landslide grid cells in the mth class of the ith factor, $T_{im}$ is the number of grid cells in the mth class of the ith factor, $L$ is the total number of landslide grid cells, and $T$ is the total number of grid cells in the study area. A CF class is adverse to landslide development when $I_i$ is negative and conducive to landslide development when $I_i$ is positive [51].
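AHP steps (iii) to (v) described above can be sketched in a few lines of Python. This is an illustrative sketch only: the 3 × 3 judgment matrix is hypothetical, power iteration is used here as one simple way to approximate the principal eigenvector (the study does not state its numerical method), and the RI values are the commonly tabulated ones, assumed here in place of the paper's Table 1.

```python
def ahp_weights(matrix, iterations=100):
    """Approximate the normalized principal eigenvector and lambda_max
    of a pairwise judgment matrix by power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]  # normalize so the weights sum to 1
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lam_max

# Commonly tabulated random consistency index values (assumed here).
RI = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def consistency_ratio(lam_max, n):
    """CR = (lambda_max - n) / ((n - 1) * RI); matrices of order < 3 are always consistent."""
    if n < 3:
        return 0.0
    return (lam_max - n) / ((n - 1) * RI[n])

# Hypothetical judgment matrix for three factors (reciprocal by construction).
judgment = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]

weights, lam_max = ahp_weights(judgment)
cr = consistency_ratio(lam_max, len(judgment))
print([round(x, 3) for x in weights], round(lam_max, 4), round(cr, 4))
```

A CR below 0.1 means the pairwise judgments are acceptably consistent, and the normalized eigenvector can then be used directly as the factor weights.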

The Comprehensive Evaluating (CLSI) Model
Similar to the AHPIV model, the comprehensive evaluating (CLSI) model is also an integrated model [6]. The main purpose of the CLSI model is to calculate the "LSI" of each grid cell [46] (Equation (6)):

$$LSI = \sum_{i=1}^{n} w_i R_{im} \quad (6)$$

where $w_i$ denotes the CF weights and $R_{im}$ denotes the frequency ratio (FR). In the CLSI, the BPNN is applied to obtain $w_i$, and the FR method is used to evaluate $R_{im}$. The two methods are briefly described as follows.

BPNN Method
The BPNN model commonly includes three layers (input, hidden, and output layers) [43]. The quantized values of the CFs form the input layer, and the absence or presence of landslide, represented by 0 or 1, respectively, is within the output layer. Neurons in these layers are connected to each other by weight values [43]. During training, the networks can adjust the weights between layers according to the importance of each input data [52]. Therefore, after the BPNN model is well trained, the weight of each CF can be calculated by inversion. This study refers to the weight inversion process provided by Zhou (1999) [52].
To determine the weights through the BPNN, two types of samples are needed to construct the model: landslide samples and non-landslide samples [5]. Since regional landslide survey data are usually incomplete, the non-landslide area identified by the traditional sampling method has a large sampling error, and this error greatly affects the prediction results. Because sample preprocessing can improve the accuracy of neural network models, this study optimized the selection of the non-landslide data set using the two-step cluster analysis (TSCA). A brief introduction to TSCA is given in a previous study [6].

FR Method
The contribution of different classes of each factor to landslide development is different. To represent this distinction quantitatively, the frequency ratio (FR), which is a widely used traditional probability method for LSM [6,16,18,53-56], is used. The FR is defined as the ratio of landslide presence ($a_{im}$) divided by the ratio of landslide absence ($b_{im}$), as in Equation (7) [6]:

$$R_{im} = \frac{a_{im}}{b_{im}} = \frac{L_{im}/L}{N_{im}/N} \quad (7)$$

where $R_{im}$ is the frequency ratio of the mth class of the ith factor; the meanings of $L_{im}$ and $L$ are consistent with Equation (5); $N_{im}$ is the number of non-landslide grid cells that fall in the mth class of the ith factor; and $N$ is the total number of non-landslide grid cells.
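The FR definition above and the weighted sum used for the LSI can be illustrated with a short Python sketch. The totals L = 1010 and T = 1,413,608 are taken from this paper; the class counts, weights, and per-factor ratios are hypothetical values chosen only to show the arithmetic.

```python
def frequency_ratio(l_im, l_total, n_im, n_total):
    """R_im = (L_im / L) / (N_im / N): landslide share divided by non-landslide share."""
    a_im = l_im / l_total   # share of landslide cells in this class
    b_im = n_im / n_total   # share of non-landslide cells in this class
    return a_im / b_im

def lsi(weights, ratios):
    """LSI of one grid cell: sum over factors of w_i * R_im for the class the cell falls in."""
    return sum(w * r for w, r in zip(weights, ratios))

L_TOTAL = 1010                    # landslide grid cells (from the text)
N_TOTAL = 1_413_608 - L_TOTAL     # non-landslide grid cells

# Hypothetical class: 20% of landslide cells but about 10% of non-landslide cells.
r_example = frequency_ratio(202, L_TOTAL, 141_260, N_TOTAL)

# Hypothetical weights and per-factor frequency ratios for a single grid cell.
cell_lsi = lsi([0.5, 0.3, 0.2], [2.0, 1.0, 0.5])
print(round(r_example, 3), round(cell_lsi, 3))
```

An FR above 1 indicates that a class holds more than its share of landslide cells, so such classes raise the LSI of the cells that fall in them.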

Description of Study Area
Zhushan County is located in the northwest of Hubei Province of China. It encompasses a total area of 3587.8 km² between 109°32′ E and 110°25′ E longitude and 31°30′ N and 32°37′ N latitude (Figure 1). The region has a subtropical monsoon climate with abundant rainfall and four distinct seasons. There are a total of 646 rivers, with a river network density of 0.76 km/km². The major geomorphic types are mountains, hills, rift basins, and valley terraces. The study area includes seven geological age units, which are Sinian, Cambrian, Ordovician, Silurian, Cretaceous, Tertiary, and Quaternary formations. The main lithology of these formations is magmatic, clastic, and carbonate rocks. The Quaternary covers are mainly distributed in the northern and northeastern parts of the study region.

Landslide Inventory
The geological conditions of Zhushan County are complex, the terrain is undulating, and the distribution density of landslides is high. Based on the long-term field investigation and historical data provided by the Geological Environmental Center of Hubei Province, this study locates 373 landslides.
These 373 landslides form the landslide inventory data set and are denoted as black triangles in Figure 2. Using the "feature to raster" function of ArcGIS software, the landslide inventory map was converted into raster data with a grid size of 50 m × 50 m. These landslides cover 1010 grid cells on the 1:50,000 DEM map of the study area.

Conditioning Factors
Landslide CFs are essential in LSM [19,57]. Due to the diversity of regional geological backgrounds, there is no universal criterion for conditioning factor selection [6,9]. The occurrence of landslides is the result of multiple factors, which are mainly divided into two types: external inducing factors and intrinsic background factors. The former include human engineering activities, rainfall, earthquakes, etc. The latter include topography, geological structure, lithology, etc. Based on the literature, data availability, and our experience, we selected 8 CFs: lithology, slope structure, slope angle, altitude, distance to river, stream power index (SPI), slope length, and distance to road.
To show the relationship between each CF and the occurrence of landslides, three parameters are plotted in a combined polyline-histogram chart: (1) the percentage of grid cells in each class of each CF, i.e., $t_{im} = T_{im}/T$; (2) the percentage of landslides in each class of each CF, i.e., $a_{im} = L_{im}/L$; (3) the IV of each class of each CF, $I_i$. See Section 2.2 for specific calculations.

Lithology
Due to the difference in material strength, slopes with different lithologies have different potentials to become landslides [36]. In this study, considering both the mechanical properties and the structural integrity of the rock and soil mass, the main lithology was reclassified into four types: hardest rock, medium-hard rock, soft rock, and soil (Figure 3a).
The histogram in Figure 4a shows that the soft rock group in the region has the widest distribution range, accounting for 56.17% of the total number of grid cells, and the soil has the smallest distribution range, accounting for only 1.02%. However, from the IV curve, the landslide density in the soil group is the highest, with an IV of 1.76. Meanwhile, the hardest rock group has the lowest IV of −1.09.

Slope Structure
Slope structure is an important property describing the type of slope and an important indicator of slope stability. It reflects the positional relationship between the rock layers and the free face of the slope. Slope structure can usually be divided into forward slope, diagonal slope, cross slope, and reverse slope. Different slope structure types differ greatly in the development characteristics and degree of landslides. For example, the forward slope is prone to large-scale landslides, controlled by the lithological interface and weak interlayers. Based on the raster calculation function of ArcGIS, this paper uses the angle between the slope aspect and the rock layer inclination (range 0-180°) to characterize the slope structure of the study area [58] and divides the slope structure into 4 types according to the calculation results: 0-45°, 45-120°, 120-160°, and 160-180°, as shown in Figure 3b.
The bar chart in Figure 4b shows that the slope structure of 45-120° has the widest distribution range (43.27%), because most slopes in the study area are cross slopes; the slope structure of 160-180° has the smallest distribution range (10.98%). According to the IV curve, the 0-45° class, which corresponds to the forward slope, has the maximum IV of 1.03, which is consistent with the conclusion that the forward slope is more prone to landslides.
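The slope structure classification can be sketched as follows. The sketch assumes that the "rock layer inclination" is given as a dip direction in degrees (0-360°), folds the difference with the slope aspect into the 0-180° range, and uses half-open class boundaries; the boundary handling and function names are assumptions made for illustration, since the text does not specify them.

```python
def aspect_dip_angle(aspect_deg, dip_direction_deg):
    """Smallest angle (0-180 degrees) between slope aspect and dip direction."""
    diff = abs(aspect_deg - dip_direction_deg) % 360.0
    return 360.0 - diff if diff > 180.0 else diff

def slope_structure_class(angle_deg):
    """Assign the angle to one of the four classes used in the text
    (lower bound inclusive, upper bound exclusive; an assumed convention)."""
    if angle_deg < 45.0:
        return "0-45"
    if angle_deg < 120.0:
        return "45-120"
    if angle_deg < 160.0:
        return "120-160"
    return "160-180"

# Aspect 350 deg vs dip direction 10 deg differ by only 20 deg across north.
print(slope_structure_class(aspect_dip_angle(350.0, 10.0)))
print(slope_structure_class(aspect_dip_angle(90.0, 270.0)))
```

Folding the difference into 0-180° matters near north, where aspects of 350° and 10° describe nearly the same direction even though their raw difference is 340°.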

Slope Angle
A large number of statistics have shown that the slope angle is closely related to landslide activity [59-62]. For this reason, it has been considered an important CF in LSM [24,36,53,63-65]. In this area, the slope angles are divided into 6 classes, as shown in Figure 3c.
The histogram in Figure 4c shows that slopes with slope angles of 20-30° have the widest distribution range (35.27%). The number of landslides that fall in this class is also the largest (43.66%). The area with a slope angle greater than 50° accounts for the smallest percentage of the total area (1.98%), as well as the smallest percentage of existing landslides (0.59%). In addition, according to the IV curve, the IV of each class first increases and then decreases with increasing slope angle, reaching the maximum in the 20-30° class.

Altitude
Many studies select altitude as a landslide CF [11,12,33,54,66]. It is related to other geological and geomorphic processes, such as weathering, erosion, accumulation of debris, and slope deformation. The altitudes in the study region range from 230 m to 2600 m (Figure 3d). For ease of summary, the altitudes are divided into 10 classes, as shown in Figure 4d.
The histogram in Figure 4d shows that the number of grid cells in each class has small differences. The largest number of landslides developed in the section with an altitude of <500 m, accounting for 30.50% of the number of landslides. Besides, the IV of each class decreases with the increase of altitude.

Distance to River
The water system in the study area mainly includes rivers, seasonal streams, and gullies with low terrain. The distance to river can partially reflect the hydrological environment of the slope, and its significance in landslide occurrence is widely recognized [36,40,43,63,66-68]. According to the hydrogeological map of Zhushan County, the distance to river was obtained and divided into 6 classes (Figure 3e).
The histogram in Figure 4e shows that the number of grid cells in the range of >1000 m is the largest (49.19%), and the remaining classes have small differences. The information value curve shows an obvious decreasing trend, indicating that the development of landslide is negatively correlated with the distance to river in Zhushan County. When the distance is less than 200 m, the information value reaches the maximum of 0.89, while when the distance is greater than 1000 m, the information value reaches the minimum of −0.51.

Stream Power Index (SPI)
The SPI is used to characterize the potential erosive power associated with flowing streams [6,69]. It considers the slope geometry as well as the landscape at a given point. Many studies on LSM have noticed the impact of SPI [33,40,70]. It can be calculated by Equation (8) [39], where A (m²) is the specific catchment area and β (°) is the local slope gradient [39]. The distribution of SPI is shown in Figure 3f.
The histogram in Figure 4f shows that the class with the SPI value [0,1) accounts for the largest percentage of the total grid cells (51.24%), as well as the largest percentage of the landslide grid cells (68.47%), thus owning the maximum information value of 0.42. The information value curve has an approximately normal distribution; with the increase of SPI, the information value first increases and then decreases.

Slope Length
The slope length is a parameter in the Universal Soil Loss Equation (USLE), which has been taken into account in soil erosion studies and LSM [71-73]. Slope length refers to the distance of uninterrupted overland flow along the slope [73]. Higher erosion risk usually occurs on steeper and longer slopes, and vice versa [74]. The slope length is calculated using Equation (9) [73,75,76], where λ is the slope length along the horizontal projection and β is the ratio of rill erosion to interrill erosion. The distribution of slope length is shown in Figure 3g.
The histogram in Figure 4g shows that classes 20-40 (24.06%) and 40-60 (23.51%) have the largest proportions of total grid cells, as well as the largest proportions of landslide grid cells: 34.06% and 28.81%, respectively. According to the information value curve, in the range where slope length is less than 160, as the slope length increases, the information value increases steadily, reaching a maximum value of 0.38 in the class of 140-160. However, when the slope length is greater than 180, the information value drops sharply to a negative value of −0.93.

Distance to Road
Frequent human engineering activities will exacerbate landslide hazards. A large number of landslides that occurred on embankments or cut slopes confirm this conclusion [77,78]. Therefore, this study, like many others [12,21,54,63,68], regards the distance to road as an important CF in LSM. According to the topographic map of Zhushan County, the distance to road was obtained and divided into 6 classes (Figure 3h).
According to Figure 4h, the number of grid cells in the >1000 m class is far larger than in the other classes, accounting for 57.72% of the total. The landslides it contains account for 34.75%. In addition, the IV curve shows that the development of landslides is negatively correlated with the distance to road. When the distance is less than 200 m, the information value reaches the maximum of 1.21, while when the distance is greater than 1000 m, the information value reaches the minimum of −0.73.
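The class percentages reported in this section can be turned back into information values as a consistency check. The sketch below assumes a base-2 logarithm; this is an assumption made here because it reproduces the IVs quoted above for the SPI class [0,1) and for the >1000 m distance-to-road class. The function name is illustrative.

```python
import math

def information_value(landslide_share, area_share, base=2.0):
    """IV = log_base(a_im / t_im), with a_im = L_im / L and t_im = T_im / T."""
    return math.log(landslide_share / area_share, base)

# SPI class [0, 1): 68.47% of landslide cells on 51.24% of all cells.
iv_spi = information_value(0.6847, 0.5124)

# Distance to road > 1000 m: 34.75% of landslides on 57.72% of all cells.
iv_road = information_value(0.3475, 0.5772)

print(round(iv_spi, 2), round(iv_road, 2))
```

A class whose landslide share exceeds its area share gets a positive IV, and a class with fewer landslides than its area would suggest gets a negative one.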

Landslide Susceptibility Mapping
The LS maps of Zhushan County are respectively produced by the LR model, AHPIV model, and the CLSI model, as shown in Figure 5.

LSM Using LR Model
In our LR model, eight CFs are taken as independent variables: lithology ($x_1$), slope structure ($x_2$), slope angle ($x_3$), altitude ($x_4$), distance to river ($x_5$), SPI ($x_6$), slope length ($x_7$), and distance to road ($x_8$). The CFs are normalized in advance. All 1010 landslide grid cells are selected and marked as 1. Subsequently, 1010 non-landslide grid cells are randomly picked and marked as 0. These 2020 grid cells together form the sample set of the LR model. The data are exported to the statistical analysis software SPSS, and the equation of Z (Equation (10)) is obtained by regression. Subsequently, based on the raster calculation function of ArcGIS, the P value of each grid cell is calculated through Equations (1) and (2). According to the natural breakpoints method, the LSM results are divided into 5 levels, which are very low (29.84%), low (14.37%), moderate (15.42%), high (29.31%), and very high (11.06%). The LS map obtained by the LR model is shown in Figure 5a.
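The balanced sample set described above can be sketched as follows. The grid cell identifiers are stand-ins (plain integers) and the helper name is hypothetical; the sketch only illustrates the 1:1 random sampling and labeling, not the SPSS regression itself.

```python
import random

def build_sample_set(landslide_ids, non_landslide_ids, seed=0):
    """Pair every landslide cell (label 1) with an equal number of
    randomly drawn non-landslide cells (label 0)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    negatives = rng.sample(non_landslide_ids, len(landslide_ids))
    samples = [(cell, 1) for cell in landslide_ids]
    samples += [(cell, 0) for cell in negatives]
    rng.shuffle(samples)
    return samples

# Stand-in identifiers: 1010 landslide cells and a pool of non-landslide cells.
landslide_ids = list(range(1010))
non_landslide_ids = list(range(1010, 20000))

samples = build_sample_set(landslide_ids, non_landslide_ids)
print(len(samples), sum(label for _, label in samples))
```

Sampling without replacement keeps every non-landslide cell unique, so the two labels are represented by exactly 1010 distinct cells each.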

LSM Using AHPIV Model
To calculate the $I_n$ of each grid cell using Equation (3), the weight of each factor needs to be determined first, using the AHP method. The key to this method is to build a reasonable hierarchy. According to the landslide investigation data, this study considers that the gestation and development of landslides are controlled by five aspects: structural geology, topography and landforms, hydrological geology, environmental changes, and external disturbances. From these aspects, an evaluation system containing 8 CFs is established, and the hierarchical structure is shown in Figure 6. According to this hierarchical structure, the first-level and second-level judgment matrices are constructed, as shown in Tables 2 and 3, respectively. Since there is only one secondary CF in the categories of environmental changes and external disturbances, there is no need to construct a second-level judgment matrix for them. The calculated CR values of the four judgment matrices are 0.09, 0, 0, and 0, respectively, so all of them pass the consistency test. The principal eigenvectors of these matrices are calculated, and the weights of the eight CFs are obtained through normalization (see Table 4).
The $I_i$ values of each class of each CF, calculated by Equation (5), are listed in Table 5. Subsequently, based on the raster calculation function of ArcGIS, the distribution of $I_n$ in Zhushan County is calculated through Equation (3). By the natural breakpoints method, the LSM results are divided into 5 levels, which are very low (20.76%), low (30.26%), moderate (26.37%), high (18.74%), and very high (3.88%). The LS map obtained by the AHPIV model is shown in Figure 5b.

Non-Landslide Area Selection
Before the BPNN procedure, the two-step cluster analysis (TSCA) was used for sample preprocessing in order to determine a data set that better characterizes the geological environmental conditions of non-landslide areas. Figure 7 shows the difference between the sampling process in this study and the traditional sampling process. Due to their close relationship with landslide occurrence, the eight CFs were also used as evaluation indicators for TSCA. Data normalization is required before analysis. The study region is divided into 5 clusters, and the results are shown in Figure 8.
The distribution of the clusters in Figure 8 shows that (1) only Clusters 1 and 2 have an obvious band-like distribution trend, and the rest are distributed in the form of scattered areas; (2) the clustering result is controlled by multiple evaluation factors such as lithology, distance to river, etc.; and (3) the outer circle shows relatively small differences of total grid cells in each cluster, meanwhile, the inner circle shows big differences of landslide grid cells in each cluster.
The following conditions are used to filter the target clusters for non-landslide grid cell sampling. Among them, Condition 1 follows [5].
Condition-2:

$$C_2 = \min\left(\frac{P_c}{P_t}\right) \quad (12)$$

where $N_c$ denotes the number of landslide grid cells in each cluster, $N_t$ denotes the number of landslide grid cells in the whole area ($N_t = 1010$), $P_c$ equals the number of landslide grid cells in each cluster divided by the total number of grid cells in that cluster, and $P_t$ denotes the percentage of landslides in Zhushan County ($P_t = 1010/1{,}413{,}608$). According to the clustering results and screening conditions (see Table 6), the clusters that meet Condition 1 are Clusters 4 and 5. Based on the second condition, the cluster with the minimum $C_2$ value, i.e., Cluster 5, is the most suitable cluster for random sampling of non-landslide cells.
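Condition 2 can be illustrated with a short Python sketch that compares the landslide density of each candidate cluster with the county-wide density. The value of P_t comes from the text; the per-cluster counts are hypothetical placeholders (Table 6 is not reproduced here), and Condition 1 is assumed to have already narrowed the candidates to Clusters 4 and 5.

```python
# County-wide share of landslide grid cells, as given in the text.
P_T = 1010 / 1_413_608

def density_ratio(landslide_cells, total_cells):
    """P_c / P_t: landslide density of a cluster relative to the whole county."""
    p_c = landslide_cells / total_cells
    return p_c / P_T

# Hypothetical (landslide cells, total cells) for the clusters passing Condition 1.
candidates = {
    4: (60, 250_000),
    5: (25, 300_000),
}

ratios = {cluster: density_ratio(*counts) for cluster, counts in candidates.items()}
best_cluster = min(ratios, key=ratios.get)  # Condition 2: smallest P_c / P_t
print(ratios, best_cluster)
```

The cluster with the smallest ratio is the one in which landslides are most under-represented, which is why it is treated as the safest source of non-landslide samples.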

Weight Determination for Each Factor
In this study, a three-layered BPNN model was applied for weight determination using the MATLAB software package. The input nodes are the CFs, so their number is 8 ($N_i = 8$). The output node is the value of LSI ($N_o = 1$). For the number of hidden layer nodes ($N_h$), the upper limit is $2N_i + 1$ [79], and the lower limit is $(N_i + N_o)/2$ [80]; thus the best range of hidden layer nodes is $5 \le N_h \le 17$. After comparing 5, 10, 12, 15, and 17 as possible numbers of hidden layer nodes, 15 nodes were identified as the best. Based on this, a BPNN model with a structure of 8-15-1 is established.
According to the literature [81], the training sample size ($N_{sample}$) for the three-layered BPNN model can be determined by Equation (13):

$$\frac{W}{\varepsilon} \le N_{sample} \le \frac{W}{\varepsilon}\log_{10}\left(\frac{N}{\varepsilon}\right) \quad (13)$$

where $N$ denotes the total number of nodes, $W$ denotes the total number of weights, and $\varepsilon$ denotes the accuracy parameter. In this study, the model is assumed to have an accuracy level of 90%; thus $\varepsilon$ equals 0.1, and with $N = 24$ and $W = 135$ for the 8-15-1 network, the range of the training sample number is [1350, 3213]. Therefore, 2510 grid cells were selected, including all landslide grid cells (1010) and 1500 non-landslide grid cells (selected from Cluster 5). Among them, 80% are used for model training, and 20% are used for model validation. For transfer functions and other parameters [82], see Table 7.

Table 7. Parameter settings in back-propagation neural network (BPNN) analysis.

To make the results more rigorous, the calculation was repeated 10 times and the results were averaged (see Table 8). The coefficients of variation (COVs) indicate that the difference between the 10 calculations is small and that the overall result is reasonable and reliable. In Table 8, the mean values represent the calculated average weights of the CFs, and the last column is the normalized weight of each CF. The SPI has the minimum weight, while lithology has the maximum weight. The results show that lithology has the greatest impact on the occurrence of landslides, which is consistent with the actual survey situation. Moreover, the weight of distance to road is second only to that of lithology; in the field investigation, part of the landslides in the study area were found to be distributed along the roads. The slope angle also has a large impact on landslide occurrence, which is consistent with the CF analysis. The factors with a relatively small degree of influence are SPI and slope length.
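The node and sample-size ranges quoted above can be checked numerically. The sketch below assumes that the bounds take the form W/ε ≤ N_sample ≤ (W/ε)·log10(N/ε) and that bias terms are not counted among the weights; both are assumptions made here because they reproduce the reported range [1350, 3213] for the 8-15-1 network. Function names are illustrative.

```python
import math

def hidden_node_range(n_in, n_out):
    """Lower limit (N_i + N_o) / 2 rounded up, upper limit 2 * N_i + 1."""
    return math.ceil((n_in + n_out) / 2), 2 * n_in + 1

def network_size(n_in, n_hidden, n_out):
    """Total nodes and connection weights of a three-layer network (no biases)."""
    nodes = n_in + n_hidden + n_out
    weights = n_in * n_hidden + n_hidden * n_out
    return nodes, weights

def sample_size_bounds(nodes, weights, eps):
    """Assumed bounds: W / eps <= N_sample <= (W / eps) * log10(N / eps)."""
    lower = weights / eps
    upper = lower * math.log10(nodes / eps)
    return lower, upper

low_h, high_h = hidden_node_range(8, 1)
nodes, weights = network_size(8, 15, 1)
lower, upper = sample_size_bounds(nodes, weights, eps=0.1)
print(low_h, high_h, nodes, weights, round(lower), math.floor(upper))
```

The 2510 cells used in the study fall inside this range, and the 80/20 split leaves 2008 cells for training and 502 for validation.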

Landslide Susceptibility Map
After the weights are determined, the FR of each class of the 8 CFs is obtained according to Equation (7), as shown in Table 5.
After obtaining the weights and the frequency ratios, the LSI value of each grid cell was calculated according to Equation (6). Based on the natural breakpoints method, the LSM results are divided into 5 levels, which are very low (19.49%), low (20.42%), moderate (23.29%), high (26.91%), and very high (9.89%). The LS map obtained by the CLSI model is shown in Figure 5c.

Validation and Analysis
The performance of an LSM model can be evaluated from two aspects. The first is whether the model can accurately capture the nonlinear relationship between CFs and the occurrence of landslides based on the available data, i.e., learn the rules and use them to evaluate the landslide susceptibility of unknown areas. The higher the accuracy of the evaluation, the better the model performance. The second, which matters for practical application, is the classification ability: the greater the difference between susceptibility zones, the higher the classification ability. Considering these two aspects, we carried out the following verification work.

Validation Based on AUC Accuracy
Regarding the first aspect, the accuracy of the LSM results must be verified. For this purpose, the receiver operating characteristic (ROC) curve and the area under the curve (AUC), which have been widely used in previous studies [3,19,42,83], were applied in this study. The ROC curve reflects the relationship between the "Sensitivity" (Equation (14)) and "1 − Specificity" (Equation (15)) [6], which are:

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \quad (14)$$

$$1 - \mathrm{Specificity} = 1 - \frac{TN}{FP + TN} = \frac{FP}{FP + TN} \quad (15)$$

where TP is the number of true positives, FN is the number of false negatives, TN is the number of true negatives, and FP is the number of false positives. The range of the AUC value is [0.5, 1], which reflects the overall accuracy of the prediction. The larger the value, the higher the model accuracy. Note that "false positive" means that a stable area is misjudged as a landslide-prone area. Conversely, "false negative" means that a landslide-prone area is misjudged as a stable area. In this study, a total of 71,640 grid cells were used to produce the three ROC curves using SPSS software. All landslide grid cells were used to form the positive data set, and 5% of the non-landslide grid cells that had not participated in the previous calculations were randomly selected to form the negative data set. The result is shown in Figure 9. The prediction accuracy of all three models exceeds 80%, indicating a relatively good prediction performance. This indirectly confirms the rationality of the selected CFs. Figure 9 also shows that the CLSI model has the highest AUC value of 0.902, followed by the LR model (0.851) and the AHPIV model (0.820). The bias in determining the weights by the AHP method greatly reduces the accuracy of the AHPIV model. The CLSI model adopts the more objective and rational BPNN in determining the weights, thus leading to a better result.
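The quantities behind the ROC analysis can be sketched in plain Python. The scores below are toy values, not outputs of any of the three models, and the AUC is computed with the rank-based (pairwise comparison) formulation rather than by SPSS; the last lines only re-derive the validation set size quoted in the text.

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def false_positive_rate(fp, tn):
    """1 - specificity: FP / (FP + TN)."""
    return fp / (fp + tn)

def auc_rank(pos_scores, neg_scores):
    """AUC as the probability that a landslide cell scores higher than a
    non-landslide cell (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy susceptibility scores for landslide (positive) and stable (negative) cells.
pos = [0.9, 0.8, 0.6, 0.4]
neg = [0.7, 0.5, 0.3, 0.2]
auc = auc_rank(pos, neg)

# Validation set size: all landslide cells plus 5% of the non-landslide cells.
total_cells, landslide_cells = 1_413_608, 1010
n_validation = landslide_cells + round(0.05 * (total_cells - landslide_cells))
print(auc, n_validation)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values above 0.8 are read as good prediction performance.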
Benefiting from the rich historical survey data and the approximately linear relationship between some CFs and the occurrence of landslides (e.g., distance to river, distance to road), the performance of the LR model is also remarkable.

Validation Based on Seed Cell Area Index
Regarding the second aspect, the classification ability of the three LSM models must be verified. A good classifier produces a large difference between the divided zones. In this study, the seed cell area index (SCAI) [84] was used to quantify the difference. A landslide grid cell is called a "seed cell", and the SCAI can be obtained by

$$SCAI = \frac{P_{area}}{P_{seed}} \quad (16)$$

where $P_{area}$ denotes the percentage of grid cells in each susceptibility zone relative to the total grid cells in the whole area and $P_{seed}$ denotes the percentage of landslide grid cells in each susceptibility zone relative to the grid cells of all landslides. The SCAI is a dimensionless parameter. Generally, a high-proneness area should have a lower SCAI value, and vice versa. A greater difference in SCAI between the high-proneness area and the low-proneness area indicates a better performance of the model. The SCAI values are calculated and shown in Table 9. From the very low area to the very high area, the SCAI values present a decreasing trend for all three models, which also demonstrates the rationality of the three models. In addition, the SCAI differences between the very low and very high areas for the AHPIV and CLSI models are 12.08 and 11.98, respectively, both far greater than that of the LR model (5.81). This is because the LR model has a large number of landslide grid cells in the very low area, which directly leads to poor interval division. The last column in Table 9 lists the difference value (D-value) of SCAI between adjacent zones. It can be seen that all the minimum D-values fall into the AHPIV and LR models, while two maximum D-values fall into the CLSI model. This result indicates that, in addition to its accuracy, the CLSI model performs the best in classification ability.
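The SCAI and the D-values between adjacent zones can be computed as in the sketch below. The zone area percentages are those reported for the CLSI map; the seed cell percentages are hypothetical placeholders (the values of Table 9 are not reproduced here), so the printed numbers illustrate the calculation only.

```python
ZONES = ["very low", "low", "moderate", "high", "very high"]

def scai(p_area, p_seed):
    """Seed cell area index: share of area divided by share of landslide cells."""
    return p_area / p_seed

# Area share of each zone (%), as reported for the CLSI map.
p_area = [19.49, 20.42, 23.29, 26.91, 9.89]
# Hypothetical share of landslide (seed) cells in each zone (%).
p_seed = [2.0, 6.0, 14.0, 40.0, 38.0]

scai_values = [scai(a, s) for a, s in zip(p_area, p_seed)]

# D-value: drop in SCAI between each zone and the next more susceptible one.
d_values = [scai_values[i] - scai_values[i + 1] for i in range(len(ZONES) - 1)]

for zone, value in zip(ZONES, scai_values):
    print(f"{zone}: {value:.3f}")
```

A steadily decreasing SCAI with large D-values means that each step up in susceptibility class concentrates noticeably more landslide cells per unit area.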
This is to be expected because the CLSI uses the trained BPNN to calculate the weights, and this captures the nonlinear relationship between the CFs and the occurrence of landslides, leading to more objective and scientific LSM results.

Validation Based on Landslide Points
Verification of the three landslide susceptibility maps was also performed based on the existing landslide points. After LSM, the 373 existing landslides were marked on the three LS maps (Figure 10); most of them are located in the very high and high susceptibility zones. From the very low to the very high susceptibility zone, the number of landslide grid cells increases. The AHPIV model showed the worst performance, whereas the CLSI and LR models performed better and similarly to each other. The LR model has a small line slope between the very low area and the low area, indicating that it has a weak ability to identify the low-proneness area. This is consistent with the results of the SCAI verification. However, the LR model has the largest line slope between the moderate area and the high area. The CLSI model is better at identifying the very low area and the very high area, but it is slightly inferior to the LR model in distinguishing the moderate area from the high area. In summary, the LSM results show good consistency with the historical landslides, especially the results of the CLSI and LR models.
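As a rough illustration of this check, the following Python sketch counts landslide grid cells per susceptibility zone, builds a cumulative percentage curve, and computes the slope of each line segment between adjacent zones. It assumes that the curve is plotted against equally spaced zone indices, which the text does not state explicitly; the zone labels assigned to the landslide cells are hypothetical and do not reproduce the study data.

```python
from collections import Counter
from itertools import accumulate

# Zones ordered from lowest to highest susceptibility.
ZONES = ["very low", "low", "moderate", "high", "very high"]


def accumulation_curve(landslide_zones):
    """Count landslide cells per zone and return the cumulative percentage."""
    counts = Counter(landslide_zones)
    per_zone = [counts.get(zone, 0) for zone in ZONES]
    total = sum(per_zone)
    if total == 0:
        raise ValueError("no landslide cells supplied")
    cumulative = [100.0 * c / total for c in accumulate(per_zone)]
    return per_zone, cumulative


def segment_slopes(curve):
    """Slope of each segment between adjacent zones (unit zone spacing)."""
    return [b - a for a, b in zip(curve, curve[1:])]


# Hypothetical zone label for each landslide grid cell.
labels = (["very low"] * 2 + ["low"] * 8 + ["moderate"] * 15
          + ["high"] * 25 + ["very high"] * 50)

per_zone, cumulative = accumulation_curve(labels)
slopes = segment_slopes(cumulative)

print("Cells per zone:", per_zone)
print("Cumulative %:", cumulative)
print("Segment slopes:", slopes)
```

Under this assumption, a small slope on the first segment means that few additional landslide cells are picked up when moving from the very low zone to the low zone, so the model separates these two zones only weakly.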

Discussion and Conclusions
In the past few decades, regional LSM has become a frontier research topic due to its complexity and nonlinear characteristics. A variety of methods have been used to establish the evaluating models. The authors have established a comprehensive landslide susceptibility index model (CLSI) [6], which is an integration of prior knowledge and an objective weighting method. To further verify the superiority and generalizability of this model, Zhushan County was taken as the study region. It is a landslide-prone area in Hubei Province of China. Two representative methods, namely, the LR and the analytic hierarchy process information value (AHPIV) model, were used for comparison. Specifically, LR represents the traditional statistical method, and AHPIV represents an integrated method that combines prior knowledge and subjective weight determination.
The LS maps (Figure 5) generated by the three models are in good agreement with one another. Specifically, the very high and high susceptibility areas are located within 200 m of the roads and rivers. In addition, lithology also plays a vital role, especially in the soil distribution area. For example, the weights obtained by both the AHP method and the BPNN method are highly correlated with the lithology and the distance to road. Moreover, because there are many shallow landslides in Zhushan County, a slope angle of less than 30° significantly contributes to the development of landslides.
On the other hand, the very low and low susceptibility areas are far from the river network and road network. These areas have a strong correlation with altitude, with most landslides distributed in zones with altitudes higher than 800 m. This phenomenon can be explained as follows. First, these areas are less affected by human engineering activities; hence, the slope stability is higher. Second, due to the strong denudation in high-altitude areas, few Quaternary deposits and strongly weathered soft rocks accumulate there, which provides little source material for shallow landslides.
Generally, a good LSM model performs well in both result accuracy and susceptibility zone classification. Therefore, the performances of the three LSM models were validated in terms of these two aspects, using the ROC curve and the SCAI value as the respective indicators. The ROC results show that LSM in Zhushan County using the three models is viable, and the CLSI model has the highest AUC value of 0.902, followed by the LR (0.851) and AHPIV (0.820) models. The validation based on SCAI values indicates that all three models generate reasonable LSM partitions and that the CLSI model has the best classification ability. Subsequently, the accumulation curves of the existing landslide grid cells were used for further verification. Good agreement was obtained between the LS maps and the existing landslides. The CLSI model is better at identifying the very low area and the very high area. Through these comparisons, this study shows that the robust performance of the CLSI model lies in its weight determination; that is, the weights determined by the BPNN successfully capture the nonlinear relationship between the CFs and the occurrence of landslides.
Several studies have compared the LR model with other existing methods in LSM [2,10,36,37,39,46]. Du et al. [34] compared the LR model, IV model, and LRIV model. The success rates were 69.2%, 68.8%, and 81.7%, respectively, and the prediction rates were 78.5%, 71.6%, and 84.6%, respectively. That study showed that the performance of the LR model lies in the middle. Akgun [63] compared the LR model, likelihood ratio, and multi-criteria decision analysis, and found the LR model to be the most accurate of the three. Merghadi et al. [85] extensively compared machine learning methods, including the LR model. They concluded that although the AUC value of the LR model is greater than 0.82, it has no accuracy advantage over other machine learning methods.
In addition, a few studies discuss the performance of the AHPIV model in landslide susceptibility mapping. Zhang et al. [31] elaborated on the process and prediction performance of the AHPIV model; the AUC value of this model was 0.694 for the prediction rate. Du et al. [8] compared two integrated models (the AHPIV and the LRIV) in LSM, validating and comparing their performances using ROC curves. The AUC values obtained using the AHPIV and LRIV methods were 0.884 and 0.906, respectively, showing that the LRIV method performs better than the AHPIV method. Banerjee et al. [86] also applied the AHPIV method to LSM and reported an accuracy of 85% for this model.
Although scholars are committed to comparing different methods, a consensus is hard to reach. Because each study relies on different prior knowledge (i.e., different geological backgrounds, different landslide types, and different conditioning factors), a direct comparison across studies is not possible. Two questions remain open: whether the CLSI model is superior to methods other than the LR and AHPIV models, and whether selecting more conditioning factors gives better LSM results. These questions will be addressed in our future work.
In summary, the main contributions of this research are as follows: (1) the LSM processes using the LR model, the AHPIV model, and the CLSI model were explored and summarized; (2) the eight CFs of lithology, slope structure, slope angle, altitude, distance to river, SPI, slope length, and distance to road are reasonable CFs for LSM in Zhushan County; (3) reasonable LS maps of Zhushan County were produced in ArcGIS software; (4) the CLSI model was found to be more appropriate for LSM than the LR model and the AHPIV model in terms of result accuracy and classification ability; and (5) the CLSI model can be used as a robust predictor for county-level areas.