Landslide Susceptibility Prediction Considering Regional Soil Erosion Based on Machine-Learning Models

: Soil erosion (SE) provides slide mass sources for landslide formation, and reﬂects long-term rainfall erosion destruction of landslides. Therefore, it is possible to obtain more reliable landslide susceptibility prediction results by introducing SE as a geology and hydrology-related predisposing factor. The Ningdu County of China is taken as a research area. Firstly, 446 landslides are obtained through government disaster survey reports. Secondly, the SE amount in Ningdu County is calculated and nine other conventional predisposing factors are obtained under both 30 m and 60 m grid resolutions to determine the e ﬀ ects of SE on landslide susceptibility prediction. Thirdly, four types of machine-learning predictors with 30 m and 60 m grid resolutions—C5.0 decision tree (C5.0 DT), logistic regression (LR), multilayer perceptron (MLP) and support vector machine (SVM)—are applied to construct the landslide susceptibility prediction models considering the SE factor as SE-C5.0 DT, SE-LR, SE-MLP and SE-SVM models; C5.0 DT, LR, MLP and SVM models with no SE are also used for comparisons. Finally, the area under receiver operating feature curve is used to verify the prediction accuracy of these models, and the relative importance of all the 10 predisposing factors is ranked. The results indicate that: (1) SE factor plays the most important role in landslide susceptibility prediction among all 10 predisposing factors under both 30 m and 60 m resolutions; (2) the SE-based models have more accurate landslide susceptibility prediction than the single models with no SE factor; (3) all the models with 30 m resolutions have higher landslide susceptibility prediction accuracy than those with 60 m resolutions; and (4) the C5.0 DT and SVM models show higher landslide susceptibility prediction performance than the MLP and LR models.


Introduction
Landslides are one of the most common geological disasters worldwide, costing many human lives and incurring economic losses every year [1][2][3][4][5]. For example, millions of people and related property are seriously menaced by the widely distributed landslides in China [6]. Determining how to predict the spatial distribution of landslides has become a major concern of engineers all over the world. Landslide susceptibility prediction (LSP) can be defined as the spatial probability of

Introduction to Ningdu County and Landslide Inventory
Ningdu County (26 • 05 ~27 • 08 N, 115 • 40 ~116 • 17 E) is located in the north of Ganzhou City, Jiangxi Province of China, with a total area of 4075.5 km 2, as shown in Figure 1. Ningdu County is an area with hilly and mountainous terrain. The topography of Ningdu County is high in the north and low in the south, with an altitude ranging from 155~1411 m. Ningdu County is located in zone of warm temperate climate with fully humid according to Koppen's climate classification [41], with an average rainfall of 1500~1900 mm per year from the 1970s to 2015, where there is maximum rainfall in 2002 year with annual rainfall of 2380 mm. Rainfall processes usually occur from April to July in each year. The exposed strata in the study area include Presinian, Sinian, Cambrian, Carboniferous, Jurassic, Cretaceous and Quaternary systems. The land use types are mainly forest and bare grassland. The forest coverage rate of Ningdu County is approximately 71.8%, and the natural forest area accounts for approximately 87% of the forest area. The zonal vegetation is the central Asian zone of evergreen broad-leaved forest. low in the south, with an altitude ranging from 155~1411 m. Ningdu County is located in zone of warm temperate climate with fully humid according to Koppen's climate classification [41], with an average rainfall of 1500~1900 mm per year from the 1970s to 2015, where there is maximum rainfall in 2002 year with annual rainfall of 2380 mm. Rainfall processes usually occur from April to July in each year. The exposed strata in the study area include Presinian, Sinian, Cambrian, Carboniferous, Jurassic, Cretaceous and Quaternary systems. The land use types are mainly forest and bare grassland. The forest coverage rate of Ningdu County is approximately 71.8%, and the natural forest area accounts for approximately 87% of the forest area. The zonal vegetation is the central Asian zone of evergreen broad-leaved forest.
In this study, a landslide inventory map with a total of 446 landslides is produced based on the field investigation and high-resolution image interpretation from 1970 to 2003, implemented by the Ningdu Land Resources Bureau, Jiangxi Province of China. In general, these landslides are composed of the quaternary silty clay intercalated with crushed stones, the thickness of slide masses varies between 2 m and 8 m. According to the landslides classification rules presented by Hungr et al. [42], these landslides can be classified as shallow soil landslides with a movement type of clay/silt slide with main characteristics of small scale, high frequency and wide distribution. Generally speaking, these landslides are mainly distributed in the surrounding mountainous areas of Ningdu County, while less in the northern and central areas. Among these landslides, the minimum, maximum and average landslide areas are about 616 m 2 , 44,123 m 2 , and 6519 m 2 , respectively. The development process of these landslides is controlled by basic predisposing factors, and is induced by the heavy rainfall and unreasonable human engineering activities (such as mining of ore resources, land-use change, etc.) [43].  In this study, a landslide inventory map with a total of 446 landslides is produced based on the field investigation and high-resolution image interpretation from 1970 to 2003, implemented by the Ningdu Land Resources Bureau, Jiangxi Province of China. In general, these landslides are composed of the quaternary silty clay intercalated with crushed stones, the thickness of slide masses varies between 2 m and 8 m. According to the landslides classification rules presented by Hungr et al. [42], these landslides can be classified as shallow soil landslides with a movement type of clay/silt slide with main characteristics of small scale, high frequency and wide distribution. Generally speaking, these landslides are mainly distributed in the surrounding mountainous areas of Ningdu County, while less in the northern and central areas. Among these landslides, the minimum, maximum and average landslide areas are about 616 m 2 , 44,123 m 2 , and 6519 m 2 , respectively. The development process of these landslides is controlled by basic predisposing factors, and is induced by the heavy rainfall and unreasonable human engineering activities (such as mining of ore resources, land-use change, etc.) [43].

Conventional Landslide Predisposing Factors
The selection of landslide predisposing factors is uncertain due to the complexities of the landslide occurrence mechanism and the surrounding geological environments [6]. Most researchers select predisposing factors taking into account natural factors that affect landslide development (including topography and geomorphology, land cover, climate, etc.) and human engineering construction. However, the SE factor providing the material basis for landslide occurrence and reflecting long-term rainfall erosion is not considered. In fact, landslides are likely to occur in areas with serious SE phenomena. Hence, this study considers the SE factor and 9 other conventional predisposing factors for LSP (Table 1).
In this study, topography factors of elevation, slope, plan curvature, aspect, topographic wetness index (TWI) and profile curvature are calculated based on a digital elevation model (DEM) with 30 m resolution. The LSP modelling process will be complicated under grid resolution that is too high [44], while the landslide inventory and predisposing factors cannot be accurately reflected under grid resolution that is too low [45]. Hence, a grid resolution of 30 m, widely used in other studies [15,30,33,46], is selected to deal with the landslide inventory and predisposing factors considering the large area of Ningdu County. The elevation is often considered as a predisposing factor for landslides [47]. Soil and vegetation in the study area are affected by climate and temperature with increasing altitude, and the weathering effect of rocks also gradually decreases with increasing altitude. As a result, there are fewer landslide events in areas with high elevation values [48]. The slope and aspect reflect the landslide scale through the effects of external factors such as rainfall, solar radiation and land vegetation [20,[49][50][51]. The plan curvature reflects the influence of topography on the convergence and divergence of water flow during the downslope flow [35]. The profile curvature is defined as a vertical plane curvature paralleling the sloping direction [52]. The TWI describes the distribution of soil moisture on the surface [53,54]. In addition, lithology is an expression of the inherent physical properties of landslide occurrence. This study generates a lithology map using a geological map with scale of 1:100,000 provided by the Land Resources Bureau of Ningdu County. The NDVI and NDBI factors are acquired from Landsat-8 Thematic Mapper (TM) images (taken on 15 October 2013, path/row 121/41 and path/row 121/42). The NDVI reflects the density of vegetation on the ground, which has an inhibitory effect on the landslides occurrence, and the NDBI represents the density of build-up area and reflects the concentration degree of human activities to some extent in Ningdu County [16].
All predisposing factors are converted into grid units with 30 m resolution in ArcGIS 10.2 software ( Figure 2). The influence degrees of different subclasses of each predisposing factor on landslide occurrence can be calculated through the frequency ratio (FR) analysis, and the continuous predisposing factors are generally classified into eight subclasses for building the relationships between predisposing factors and landslides. Meanwhile, each continuous predisposing factor is classified by the Jenks natural break method, which can minimize differences within the subclass and maximize the differences between the subclasses ( Table 1). Most of these landslides mainly occur in areas of elevations less than 618 m and slopes above 19.33 • . The FR values of profile curvature is greater than 1 in areas above 4.71. Meanwhile, the probability of landslide occurrence is greater with NDVI values higher than 0.243 and/or with NDBI values higher than 0.292. Landslides are mainly distributed in metamorphic and carbonate rocks. In addition, a study area with SE amounts higher than 5 t/ha provides a constructive environment for landslide occurrence. In this study, the independence and collinearity test of predisposing factors are carried out, the results indicate that there are weak correlations and no multi-collinearity between factors. Hence, the FR values of all these factors can be used as input variables of LSP models.

Spatial Database Analysis
The LSP can be regarded as a binary classification problem. In this study, the 446 landslide locations are respectively converted into 3711 and 1748 landslide grid units under 30 m and 60 m grid resolutions in a raster format in ARCGIS 10.2. The landslide and non-landslide grid units are respectively set to be 1 and 0, which are taken as the output variables of the LSP models [55,56]. In the LSP processes, the landslide grid units and a same number of randomly selected non-landslide grid units are randomly divided into a training dataset and testing dataset with a ratio of 70% and 30%. In this study, the FR values of the predisposing factors are considered as the attribute features of landslide and non-landslide grid units. Hence, FR values are taken as the input dataset for LSP modelling. The 9 traditional predisposing factors are used for single models with no SE factor, while all the 10 predisposing factors are used for the SE-based models.

Modelling Processes Analysis
The main purpose of this study is to explore the effects of the SE factor on landslide occurrence and evaluate the importance of the SE factor in LSP modelling through comparisons of SE-based models and single models. This study has four main steps ( Figure 3): (1) preparation of the model dataset, including the landslide inventory map, the 10 predisposing factors (such as SE, topography wetness index (TWI), . . . etc.) for SE-based models, and 9 predisposing factors for single models (such as NDVI, TWI, . . . etc.). In addition, this model dataset is respectively expressed with 30 m and 60 m grid resolutions; (2) correlation analysis, collinearity diagnosis and relative importance analysis of all predisposing factors; (3) the MLP, LR, SVM and C5.0 DT models are applied to carry out the LSP; (4) The AUC value is adopted to evaluate the performance of all the SE-based models and single models, and the differences in LSI before and after adding the SE factor is verified through the Wilcoxon rank test.
as NDVI, TWI, …etc.). In addition, this model dataset is respectively expressed with 30 m and 60 m grid resolutions; (2) correlation analysis, collinearity diagnosis and relative importance analysis of all predisposing factors; (3) the MLP, LR, SVM and C5.0 DT models are applied to carry out the LSP; (4) The AUC value is adopted to evaluate the performance of all the SE-based models and single models, and the differences in LSI before and after adding the SE factor is verified through the Wilcoxon rank test.

Frequency Ratio Analysis
The FR reflects the distribution status of landslides in the subclasses of each predisposing factor and reveals the correlations between the landslide and each predisposing factor. An FR greater than 1 indicates that landslide events are more likely to occur under the corresponding predisposing factor, and a small FR suggests a low probable occurrence of landslide. The FR values of predisposing factors are used as the input variables of each model and it is expressed as Equation (1), where s′

Frequency Ratio Analysis
The FR reflects the distribution status of landslides in the subclasses of each predisposing factor and reveals the correlations between the landslide and each predisposing factor. An FR greater than 1 indicates that landslide events are more likely to occur under the corresponding predisposing factor, and a small FR suggests a low probable occurrence of landslide. The FR values of predisposing factors are used as the input variables of each model and it is expressed as Equation (1), where s represents the landslide area within a subclass of a predisposing factor, S and S respectively indicates the total area of landslides and the total area of Ningdu County, s represents the area of the subclass.

Modelling of Soil Erosion (SE) Intensity
SE is the process of soil destruction due to various external forces such as water power and wind power. The soil is eroded for a long time and can be regarded as the soil response under the integrated actions of rainfall, topographical factor, rock-soil properties, vegetation and human activities [57][58][59]. In this study, the Revised Universal Soil Loss Equation (RUSLE) model is used to calculate the SE modulus in the study area [60][61][62]. Then we can classify the SE modulus into five levels. According to the Technological Standard of Soil and Water Conservation (SL190-2007) [63], the very low level is of 0~5 t/ha, low level is of 5~25 t/ha, moderate level is of 25~50 t/ha, high level is of 50~80 t/ha and very high level is greater than80 t/ha. In general, the RUSLE is expressed as: where A is the average annual SE amount per unit area (t/(ha·year); R and K respectively indicate the average rainfall erosivity ((MJ·mm)/(ha·h·year)) and the soil erodibility ((t·ha·h)/(MJ·ha·mm)); L and S respectively suggest the slope length and the slope steepness factors; C and P suggest the cover and management factor and the conservation practices factor, respectively. In general, all of the factors L, S, C and P are regarded as dimensionless.

Results of Soil Erosion Intensity Calculation
The calculation of SE intensity is mainly obtained through Equation (2) (Figure 4f). The daily rainfall data of 7 meteorological stations from 1998 to 2017 are used to obtain the rainfall erosivity R values of the whole research area through the kriging model ordinary with Gaussian semi-variational function [64] (Figure 4a). Soil erodibility factor K is obtained using the 1:4 million soil type data of Ningdu County in the second national soil survey and the K values of similar soil types near the study area [65,66] (Figure 4b). Topographic factor LS is obtained using the LS formula proposed by Y. Liu et al. [67] with DEM data (Figure 4c). The C value is acquired through widely used NDVI values ( Figure 4d). The conservation practices factor P value is obtained by adopting manual interpretation of Landsat TM8 images and the current land use map of Ningdu County (Figure 4e). The above data are also checked through comparing with some other published papers [65,[68][69][70] to ensure the reliability of this study.

Multilayer Perceptron (MLP)
The MLP is a feed-forward neural network typically used to address the binary classification problem of the complex mapping relationship between predisposing factors and landslide event occurrence in the LSP process. In this study, the MLP model with one input layer, multi-hidden layer and one output layer adopting the back propagation algorithm is proposed. First, the weight values of input and hidden layers are set randomly, and then the activation function in the hidden layer is used to address the sum of the product between the input layer and these weights. Then, the errors between the output value and the expected values are calculated. Finally, a minimum error result is obtained in the error back propagation iteration process. The MLP has the advantages of a good learning ability in the training stage and a realized optimal error in the automatic weight adjustment process. As a result, an accurate and stable prediction model can be constructed. The total amount of SE in Ningdu County obtained in this study is 3.832 million t, with a SE modulus of 940.37 t/km 2 , which can be considered as low SE. Studies have shown that the average SE modulus in the Poyang Lake watershed is approximately 1100 t/km 2 [68], and that in Ganzhou City is 1386.66 t/km 2 [70]. Due to the similar geographical environment characteristics between Ningdu County and the above study areas and the significant effects of various conservation practices carried out by the government in recent years, the SE modulus of Ningdu County has gradually decreased, and the prediction of SE distribution is reliable.

Multilayer Perceptron (MLP)
The MLP is a feed-forward neural network typically used to address the binary classification problem of the complex mapping relationship between predisposing factors and landslide event occurrence in the LSP process. In this study, the MLP model with one input layer, multi-hidden layer and one output layer adopting the back propagation algorithm is proposed. First, the weight values of input and hidden layers are set randomly, and then the activation function in the hidden layer is used to address the sum of the product between the input layer and these weights. Then, the errors between the output value and the expected values are calculated. Finally, a minimum error result is obtained in the error back propagation iteration process. The MLP has the advantages of a good learning ability in the training stage and a realized optimal error in the automatic weight adjustment process. As a result, an accurate and stable prediction model can be constructed.

Logistic Regression (LR) Model
LR is a multivariate statistics method that has been widely used in the studies of landslide susceptibility all over the world [71]. The occurrence (set as 1) and/or non-occurrence (set as 0) probability of landslides can be calculated by LR model [72,73]. In this study, the LR model mainly uses the natural logarithm ratio (logit transformation) of landslide to non-landslide probability to construct a linear equation with each predisposing factor. The LR equation is expressed as: where p and 1 − p respectively suggest the occurrence and non-occurrence probability of a landslide; x i (i = 1, 2, · · · , n) is the predisposing factor; a i (i = 1, 2, · · · , n) is coefficient value of x i (i = 1, 2, · · · , n), and a i is positive, indicating that this factor has the positive effect on landslide occurrence; otherwise, it inhibits the landslide occurrence; a 0 suggests a regression constant value. Then the landslide susceptibility (named as occurrence probability) can be expressed as: y = e a 0 +a 1 x 1 +a 2 x 2 +a 3 x 3 +···+a n x n 1 + e a 0 +a 1 x 1 +a 2 x 2 +a 3 x 3 +···+a n x n (4)

Support Vector Machine (SVM)
SVM is often regarded as a classification system based on statistical learning theory [55,74]. This study adopts a set of linearly separable training vectors x i (i = 1, 2, · · · , n), including 10 predisposing factors and corresponding output classes y i = ±1 and the maximum clearance of the n-dimensional hyper-plane to distinguish landslide susceptibility. Its expression is shown in Equation (5), and the constraint condition of correct classification is shown in Equation (6) with ω of hyper-plane norm and b of constant value.
Then, the Lagrange function is introduced to solve the problem of convex quadratic optimization, as shown in Equation (7) with λ i as a Lagrange multiplier. For linear inseparability, a slack variable is added to control the classification error. The constraint condition of correct classification is changed as shown in Equation (8).
ν(0, 1) is introduced to consider misclassification, and the distance of the hyper-plane is shown in Equation (9). SVM possesses several kernel functions of polynomial and radial basis, and some studies have shown that the radial basis function kernel is more suitable for LSP [75].
The C5.0 DT is an improved version of the C4.5 algorithm, which not only optimizes memory but also introduces the concept of entropy [76,77]. The C5.0 DT is an inverted tree composed of a root node, inner node and leaf node, and its purpose is to establish a set of optimal sequences to realize the classification of input variables. This algorithm segments the training data according to the variables providing the maximum information gain and then improves the classification accuracy and simplifies the model by deleting or pruning the unimportant leaves [78]. The information gain represents the importance of the variable, and its size also determines the priority selection order of the variable establishing the rule. The C5.0 DT can explain the establishment of sequence rules of the model more intuitively compared with the black-box models [79]. In addition, C5.0 DT adopts the adaptive boosting method. The fundamental idea is to construct the model in an orderly way, with the latter model building the next model based on the error classification samples of the former model, and so on. Finally, the weighted sum of each model is used to predict the whole model to improve the classification accuracy. The features of the data in the deepest node of the complete decision tree will lead to the over-fitting of the data. Therefore, the appropriate pruning of decision trees can reduce the inference rules with insignificant effects to limit the growth of trees and optimize the model. The C5.0 DT modelling is built in this study. The output types of SE_C5.0 DT and C5.0 DT models are decision trees. The boosting method is used to improve the model accuracy with the number of experiments being 10. The accuracy of the model is estimated by cross-validation method with the folding numbers being 10. Meanwhile, the pruning degree of the expert model is selected to prune the decision tree in the model, and its value is set at 70 to avoid over-fitting of the model. In addition, the minimum number of records for each sub-branch in the decision tree is 15. An accurate model is obtained through the pruning degree and the minimum number of records of the sub-branch.

Statistically Significant Differences of Landslide Susceptibility Prediction (LSP) Models
To compare the statistically significant differences of LSP models before and after adding the SE factor, the Wilcoxon symbol rank test is used to conduct a non-parametric test on the two sets of LSIs calculated by the SE-based model and the single model. This method usually uses the Z value and P value to evaluate the significant differences between the two sets of LSIs. In general, there is a significant difference between the LSIs calculated by the SE-based models and single models if the Z value is outside the range of (−1.96, +1.96) and P value is less than 0.05 [80].

Results
The Based on the spatial database introduced in Section 3.3, the SE-based MLP model and single MLP model are adopted to carry out LSP in Ningdu County. Relevant parameters of both SE_MLP and MLP models are adjusted to construct the optimal models for LSP. The learning rate, momentum and iteration time in both models are 0.01, 0.25 and 500, respectively, and the number of hidden layers is set as two layers with the SoftMax activation function. The completely trained SE_MLP and MLP models are used to predict the LSI of each grid unit in Ningdu County. Then, two types of LSMs are produced under SE_MLP and MLP models by rasterizing the results obtained in ARCGIS 10.2. The LSMs in this study are reclassified into five LSLs: very low, low, moderate, high and very high-through Jenks natural break method [81] (Figure 5a,b). The area with very low LSL is maximum, while the low, moderate, high and very high LSLs are 30.93%, 17.79%, 18.26%, 19.14% and 13.88%, respectively, in the SE_MLP model.

LSP Related Spatial Dataset Analysis
In order to explore the impact of SE on landslide susceptibility under different grid resolutions, the SE intensity in Ningdu County is calculated under 60 m resolution, then the other 9 conventional predisposing factors and the landslide inventory are also expressed with 60 m resolution. As a result, all 446 landslides are divided into 1748 landslide grid units which are set to 1, meanwhile, the same of 1748 non-landslide grid units are randomly selected in the study area and are set to 0. These 3496

LSMs of LR model
It is necessary to determine the independence of these predisposing factors to avoid information redundancy before modelling the SE_LR and LR. The correlation coefficient R of each predisposing factor is calculated with the correlation analysis tool in SPSS 24.0 software, which indicates that there are weak correlations among all predisposing factors. In addition, collinearity testing is also carried out for all input variables in this study, and the col-linearity between predisposing factors are identified adopting a variance inflation factor (VIF) and tolerance index (TOL), with VIF > 5 or TOL < 0.1 indicating the severity of col-linearity. As shown in Table 2, no multi-collinearity existed among these predisposing factors, with the maximum value of VIF being 1.592 and the minimum value of TOL being 0.628. Therefore, all 10 predisposing factors can be used as input variables of SE_LR and LR models. The spatial dataset of the SE_LR model is imported into SPSS 24.0 software to acquire the regression coefficients (β), the standard error and significance of predisposing factors in the logistic regression equation. Table 2 shows that the significance of all predisposing factors is less than 0.  (Figure 5c,d).
The results show that the very low, low, moderate, high and very high areas account for 25.57%, 23.46%, 20.38%, 17.87% and 12.72% of the total area, respectively, in the SE_LR model.

LSMs of SVM and C5.0 DT Models
In this study, the SE_SVM and SVM models are established with the kernel function of radial basis function (RBF). The stop standard, rule parameter (C), regression accuracy and kernel parameter (γ) of both models are 10 −3 , 9, 0.1 and 0.6, respectively. These trained models are used to predict the LSIs of all grid units (Figure 5e,f). The very high, high, moderate, low and very low areas, respectively, account for 20.15%, 14.43%, 13.57%, 22.67% and 29.18% of the total area for the SE_SVM model. The trained C5.0 DT model is used to obtain the LSIs of the study area and produces LSMs (Figure 5g,h). The very high, high, moderate, low and very low areas respectively account for 14.34%, 16.55%, 16.80%, 22.67% and 32.94% of the total area for the SE_C5.0 DT model.

LSP Related Spatial Dataset Analysis
In order to explore the impact of SE on landslide susceptibility under different grid resolutions, the SE intensity in Ningdu County is calculated under 60 m resolution, then the other 9 conventional predisposing factors and the landslide inventory are also expressed with 60 m resolution. As a result, all 446 landslides are divided into 1748 landslide grid units which are set to 1, meanwhile, the same of 1748 non-landslide grid units are randomly selected in the study area and are set to 0. These 3496 landslide and non-landslide grid units are used as output variables of the SE-based and single models, and then are also randomly divided into a training and testing dataset with a ratio of 70/30%. The FR method is also used to calculate the non-linear correlations between landslide occurrences and predisposing factors.

LSMs Using SE-Based Model and Single Model
Firstly, the SE_MLP and single MLP models are built to produce the LSMs. The relatively optimal SE_MLP and single MLP models are constructed by adjusting the relevant parameters with the learning rate, momentum and iteration time in both models being set to 0.03, 0.5 and 500, respectively. The remaining parameters of the MLP model are the same as those under 30 m grid resolution. Then the trained SE-MLP and MLP models are adopted to predict the LSIs of grid units in Ningdu County and produce the LSMs (Figure 6a,b). The results show that the very low, low, moderate, high and very high susceptible areas respectively account for 29.49%, 15.42%, 21.29%, 18.59% and 15.21% of the total area in the SE_MLP model.  Table 3, suggesting that the SE, NDBI, slope, elevation and plan curvature factors play important roles in the processes of SE_LR model building. The LSMs are produced according to Equation (3) and are reclassified as shown in Figure 6c,d. Furthermore, the very low, low, moderate, high and very high susceptible areas respectively account for 35.31%, 19.74%, 18.61%, 15.67% and 10.67% of the total area for the SE_LR model.
For the use of SE_SVM and SVM models, The stop standard, rule parameter (C), regression accuracy and kernel parameter (γ) in both models are 10 −3 , 8, 0.1 and 0.7, respectively. The remaining SVM parameters are set to default values. Then, the trained SE_SVM and SVM models are used to predict LSIs and to produce LSMs (Figure 6e,f). Finally, the SE_C5.0 DT and C5.0 DT models are also used for LSP. In the adjustment processes of the SE_C5.0 DT and C5.0 DT models, the tree pruning degree is set to 60, and the minimum number of records for each sub-branch is set to 9. Then the LSMs produced by the trained SE_C5.0 DT and C5.0 DT models are shown in Figure 6g,h.

Discussion
In this paper, the importance of the SE factor in LSP modelling, the comparative studies between SE-based models and single models with no SE factor, and the statistical differences of LSP models under 30 m and 60 m resolutions are discussed.

Frequency Ratio Analysis of SE Factor under Different Resolutions
The predisposing factors with different spatial resolutions have major influence on the results of soil erosion intensity and LSP. Taking the soil erosion prediction as example, the characteristic of soil erosion in different unit areas is one of difference, e.g., the maximum soil erosion modulus is about 4700 (t/ha) under 30 m resolution while 1870 (t/ha) under 60 m resolution. Similarly, the LSP results under 30 m and 60 m resolutions are also different from each other due to the changes of landslide grid units number and the corresponding predisposing factors. The purpose of using different resolutions is to avoid the effects of the prediction errors of landslide susceptibility and soil erosion amount on the research results, and to obtain more realistic and reliable influence rules of soil erosion factor on LSP. Figure 7 indicates the relationship between landslide and SE classes and area distribution under 30 m and 60 m grid resolutions. Compared with the 60 m grid resolution, soil erosion under 30 m grid resolution has a stronger correlation with landslide events. The FR value is greater than 1 in the area with the high SE class and above, and it is 2.272 in the area with a very high class under 30 m grid resolution, where the FR is 2.68 times higher than that in the very low class. In addition, it is 1.295 in the area with a very high class under 60 m grid resolution. It is can be seen from Figure 7c that the area of SE with low level and above increase in 60 m grid resolution comparing with 30 m grid resolution.

Discussion.
In this paper, the importance of the SE factor in LSP modelling, the comparative studies between SE-based models and single models with no SE factor, and the statistical differences of LSP models under 30 m and 60 m resolutions are discussed.

Frequency Ratio Analysis of SE Factor under Different Resolutions
The predisposing factors with different spatial resolutions have major influence on the results of soil erosion intensity and LSP. Taking the soil erosion prediction as example, the characteristic of soil erosion in different unit areas is one of difference, e.g., the maximum soil erosion modulus is about 4700 (t/ha) under 30 m resolution while 1870 (t/ha) under 60 m resolution. Similarly, the LSP results under 30 m and 60 m resolutions are also different from each other due to the changes of landslide grid units number and the corresponding predisposing factors. The purpose of using different resolutions is to avoid the effects of the prediction errors of landslide susceptibility and soil erosion amount on the research results, and to obtain more realistic and reliable influence rules of soil erosion factor on LSP. Figure 7 indicates the relationship between landslide and SE classes and area distribution under 30 m and 60 m grid resolutions. Compared with the 60 m grid resolution, soil erosion under 30 m grid resolution has a stronger correlation with landslide events. The FR value is greater than 1 in the area with the high SE class and above, and it is 2.272 in the area with a very high class under 30 m grid resolution, where the FR is 2.68 times higher than that in the very low class. In addition, it is 1.295 in the area with a very high class under 60 m grid resolution. It is can be seen from Figure 7c that the area of SE with low level and above increase in 60 m grid resolution comparing with 30 m grid resolution.
However, the results in the two cases are similar in that soil erosion has a strong influence on the occurrence of landslide, which reflect how the frequency of landslides occurrence increases with the increase of soil erosion class (Figure 7a,b). The main reasons are that soil erosion reflects the soil loss amount under the coupling effects of rainfall, soil properties, topographic and land cover factors. Generally speaking, if a certain area has a more serious soil erosion condition, it indicates that there are worse environmental characteristics of brittle soil physical properties, complicated terrain, less land cover and/or sensitivity to heavy rainfall, which are more conducive to the occurrence of landslides [43]. Hence, soil erosion is one of the most important predisposing factors for the evolution of rainfall-induced shallow soil landslides. In this study, the FR values of the soil erosion factor embody the influence degrees of different classes of soil erosion intensity on the evolution of landslides, and the increase of FR value suggests that landslides are more likely to occur in areas with more serious soil erosion conditions. Therefore, FR values of the SE factor rise gradually with increasing SE intensity class. The importance of each predisposing factor reflects the contribution of the predisposing factor to landslide occurrence. These four models are essentially non-linear predictors based on the determination processes of the weights of corresponding predisposing factors. The importance ranks However, the results in the two cases are similar in that soil erosion has a strong influence on the occurrence of landslide, which reflect how the frequency of landslides occurrence increases with the increase of soil erosion class (Figure 7a,b). The main reasons are that soil erosion reflects the soil loss amount under the coupling effects of rainfall, soil properties, topographic and land cover factors. Generally speaking, if a certain area has a more serious soil erosion condition, it indicates that there are worse environmental characteristics of brittle soil physical properties, complicated terrain, less land cover and/or sensitivity to heavy rainfall, which are more conducive to the occurrence of landslides [43]. Hence, soil erosion is one of the most important predisposing factors for the evolution of rainfall-induced shallow soil landslides. In this study, the FR values of the soil erosion factor embody the influence degrees of different classes of soil erosion intensity on the evolution of landslides, and the increase of FR value suggests that landslides are more likely to occur in areas with more serious soil erosion conditions. Therefore, FR values of the SE factor rise gradually with increasing SE intensity class.
The importance of each predisposing factor reflects the contribution of the predisposing factor to landslide occurrence. These four models are essentially non-linear predictors based on the determination processes of the weights of corresponding predisposing factors. The importance ranks of these predisposing factors are evaluated depending on the relative values of these weights according to the inherent classification attributes of these non-linear predictors. It can be seen from Figure 8a Most of the landslides in Ningdu County belong to shallow soil landslides with characteristics of relatively loose soil, poor shear strength and sensitivity to rainfall. These unfavorable characteristics significantly improve the probability of landslide occurrence. At the same time, the SE can quantitatively reflect the processes of slope erosion and effect of sustained rainfall, and the product of soil erosion is the material source of slope accumulation layer [22,23]. Moreover, the FR values of landslides increase with the rise of SE level, suggesting that there is also a certain correlation between SE intensity and landslide occurrence [25]. Therefore, it is very necessary to consider the SE factor to improve the modelling accuracy of conventional LSP.

Accuracy Comparisons of SE-Based Models and Single Models under Different Resolutions
The evaluation of model performance is one of the key processes in successful LSP modelling, and the AUC is widely used to evaluate LSP performance [82,83]. The X-axis of the receiver operating characteristic (ROC) represents the proportion of non-landslides correctly classified as nonlandslides, and the Y-axis represents the proportion of landslides correctly classified as landslides. The AUC can quantitatively reflect the model accuracy, and further analyses of the performance of Most of the landslides in Ningdu County belong to shallow soil landslides with characteristics of relatively loose soil, poor shear strength and sensitivity to rainfall. These unfavorable characteristics significantly improve the probability of landslide occurrence. At the same time, the SE can quantitatively reflect the processes of slope erosion and effect of sustained rainfall, and the product of soil erosion is the material source of slope accumulation layer [22,23]. Moreover, the FR values of landslides increase with the rise of SE level, suggesting that there is also a certain correlation between SE intensity and landslide occurrence [25]. Therefore, it is very necessary to consider the SE factor to improve the modelling accuracy of conventional LSP.

Accuracy Comparisons of SE-Based Models and Single Models under Different Resolutions
The evaluation of model performance is one of the key processes in successful LSP modelling, and the AUC is widely used to evaluate LSP performance [82,83]. The X-axis of the receiver operating characteristic (ROC) represents the proportion of non-landslides correctly classified as non-landslides, and the Y-axis represents the proportion of landslides correctly classified as landslides. The AUC can quantitatively reflect the model accuracy, and further analyses of the performance of the SE factor in each model as shown in Table 4, Figures 9 and 10. In addition, compared with the LSP accuracy under 60 m resolution, the LSP accuracy of the model increased as a whole under the 30 m resolution. However, the variation of SE on landslide susceptibility under different resolution are not great. The LSP accuracy of the SE-based models under 30 m resolution is 2~4% higher than that of the single models with no SE factor, and it is 2~5% under 60 m resolution. Moreover, the SE factor performs best in the MLP model under 30 m grid resolution, improving the accuracy by 3.94%, while it is in the SVM model with improving the accuracy by 4.83% under 60 m grid resolution. In a word, the 30 m grid resolution is more suitable for the prediction of landslide susceptibility. Furthermore, the effect of SE with different grid resolutions on landslide susceptibility is not significant, and SE can significantly improve the accuracy of the model under different supposed size.

Statistical Differences between SE-Based Models and Single Models
The statistically significant differences between the LSIs calculated by the SE-based models and single model are shown inTable 5, the LSIs under 30 m and 60 m resolutions between SE-based models and single models all show significant differences, indicating that the SE factor has a great influence on LSP results. Therefore, it is necessary to consider the SE factor as a new predisposing factor in LSP modelling processes.

Conclusions
This paper explores the effect of the SE factor on the LSP under different grid resolutions. The MLP, LR, SVM and C5.0 DT models are used to determine the importance ranking of the SE factor in each model and the performance of each model in LSP for Ningdu County, China. The LSP results of these SE-based models and single models with no SE factor are discussed.
(1) The FR analysis suggests that the FR value of SE gradually increases with the rise of soil erosion class, and the landslides are more inclined to the higher class soil erosion area. Furthermore, it can be concluded that the contribution of SE factor in all models is the largest under both 30 m and 60 m grid resolutions. That is to say, the SE factor plays the most important role in LSP modelling compared to the other traditional predisposing factors.  The LSP accuracy of the SE-based models under 30 m resolution is 2~4% higher than that of the single models with no SE factor, and it is 2~5% under 60 m resolution. Moreover, the SE factor performs best in the MLP model under 30 m grid resolution, improving the accuracy by 3.94%, while it is in the SVM model with improving the accuracy by 4.83% under 60 m grid resolution. In a word, the 30 m grid resolution is more suitable for the prediction of landslide susceptibility. Furthermore, the effect of SE with different grid resolutions on landslide susceptibility is not significant, and SE can significantly improve the accuracy of the model under different supposed size.

Statistical Differences between SE-Based Models and Single Models
The statistically significant differences between the LSIs calculated by the SE-based models and single model are shown inTable 5, the LSIs under 30 m and 60 m resolutions between SE-based models and single models all show significant differences, indicating that the SE factor has a great influence on LSP results. Therefore, it is necessary to consider the SE factor as a new predisposing factor in LSP modelling processes.

Conclusions
This paper explores the effect of the SE factor on the LSP under different grid resolutions. The MLP, LR, SVM and C5.0 DT models are used to determine the importance ranking of the SE factor in each model and the performance of each model in LSP for Ningdu County, China. The LSP results of these SE-based models and single models with no SE factor are discussed.
(1) The FR analysis suggests that the FR value of SE gradually increases with the rise of soil erosion class, and the landslides are more inclined to the higher class soil erosion area. Furthermore, it can be concluded that the contribution of SE factor in all models is the largest under both 30 m and 60 m grid resolutions. That is to say, the SE factor plays the most important role in LSP modelling compared to the other traditional predisposing factors.