Landslides are one of the most common geological disasters worldwide, causing many deaths and economic losses every year [1]. For example, widely distributed landslides in China seriously threaten millions of people and their property [6]. Predicting the spatial distribution of landslides has therefore become a major concern of engineers all over the world. Landslide susceptibility prediction (LSP) estimates the spatial probability of landslide occurrence in each prediction unit of a study area. It does so under the non-linear coupling effects of the basic landslide predisposing factors, without considering external inducing factors [7]. Landslide susceptibility maps (LSMs) produced by LSP are one of the main tools for visualizing the spatial distribution of landslides, and they support local engineering geological surveys and the management of landslide-prone areas [8].
LSP uses the geological, terrain and other conditions of past landslides to predict where landslides may occur in the future [12]. Selecting appropriate predisposing factors is therefore essential for accurate and reliable LSP [14]. The existing literature shows that the predisposing factors commonly used in LSP fall into four categories: topography factors (elevation, curvature, slope, topographic relief, etc.) [16]; basic geology factors (rock categories, geological faults, etc.) [17]; hydrological factors (surface humidity index, distance from river network, annual rainfall, etc.) [18]; and surface cover factors (normalized difference vegetation index (NDVI), normalized difference built-up index (NDBI), etc.) [20]. These four types of predisposing factors mainly describe the slope structure, surface morphology, hydrological environment and geological features of the slope. However, they do not account for the influence of the soil material source on landslide evolution or for the long-term effect of rainfall erosion on landslide susceptibility. In fact, slide mass sources and long-term destruction by rainfall erosion are very important geology- and hydrology-related factors in landslide disasters. Hence, to better determine the distribution of landslide susceptibility, this study adds to the four types of conventional predisposing factors a new predisposing factor related to slide mass sources and rainfall erosion, and explores its effect on landslide occurrence.
The slide mass sources of soil landslides mainly consist of Quaternary accumulations, such as alluvial, residual and slope deposits, which are the products of regional soil erosion (SE) processes [22]. SE can generally be used to quantitatively analyse slope erosion and destruction caused by long-term rainfall. This is because it reflects the erosion, destruction, separation, transportation and deposition of soil and/or other ground materials under natural forces (mainly rainfall) and/or the combined action of natural forces and human activities [21]. In addition, related studies show that SE intensity is correlated to some extent with landslide occurrence [24], and SE has been used in landslide prevention [25]. Therefore, SE can be introduced as a new predisposing factor and taken as an input variable of an LSP model.
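This section does not state how SE is quantified. Purely as an illustration, the sketch below assumes a RUSLE-type model, A = R · K · LS · C · P, whose factors correspond to the rainfall, soil, topography and land-cover effects that the SE factor is said to integrate. It computes soil loss cell by cell on co-registered grids and then bins the result into ordinal intensity classes. The function names, factor values and class thresholds are hypothetical, not taken from this study.

```python
from typing import List

Grid = List[List[float]]


def rusle_soil_loss(r: Grid, k: Grid, ls: Grid, c: Grid, p: Grid) -> Grid:
    """Cell-wise soil loss A = R * K * LS * C * P (RUSLE form, assumed here)."""
    rows, cols = len(r), len(r[0])
    # All factor grids must be co-registered on the same prediction units.
    for g in (r, k, ls, c, p):
        if len(g) != rows or any(len(row) != cols for row in g):
            raise ValueError("all factor grids must have the same shape")
    return [
        [r[i][j] * k[i][j] * ls[i][j] * c[i][j] * p[i][j] for j in range(cols)]
        for i in range(rows)
    ]


def classify_se(a: Grid, thresholds: List[float]) -> List[List[int]]:
    """Assign ordinal SE classes 0..len(thresholds) using ascending upper bounds.

    The thresholds are hypothetical; a study would use its own SE grading scheme.
    """
    def to_class(value: float) -> int:
        for idx, upper in enumerate(thresholds):
            if value < upper:
                return idx
        return len(thresholds)

    return [[to_class(v) for v in row] for row in a]
```

In practice the five factor rasters would first be brought onto a common grid (30 m or 60 m in this study) before the product is taken, so the resulting SE layer matches the LSP prediction units.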
Once the input variables are determined, an LSP model must be chosen, and researchers have developed many such models. These include heuristic and mathematical statistics models such as the weights of evidence model [9], the certainty factor model [26], logistic regression (LR) [18], the linear discriminant model [28] and the analytic hierarchy process [27]. Recently, various machine-learning models have also been widely used for LSP, such as decision tree (DT) [29], fuzzy logic [32], artificial neural network [28], random forest [35], support vector machine (SVM) [36], least-squares support vector machine [39] and some ensemble methods [6]. Through various training and testing algorithms, these models map the input variables to the landslide susceptibility index (LSI), which is regarded as the model output. However, the LSP results of these models differ significantly, and there is no consensus on which model is most suitable for LSP [40]. Hence, taking the SE factor into account, this paper builds SE-based multilayer perceptron (MLP), LR, SVM and C5.0 DT models for LSP. In addition, to explore the influence of the SE factor on LSP modelling, single MLP, LR, SVM and C5.0 DT models without the SE factor are also built for comparison. Furthermore, SE intensity and the LSP models are computed under two grid resolutions (30 m and 60 m) to examine whether the choice of prediction unit affects the conclusions of this study. Finally, the area under the receiver operating characteristic curve (AUC) is used to evaluate the performance of the SE-based and single models and to identify the optimal LSP modelling process.
In this paper, the importance of the SE factor in LSP modelling, the comparative studies between SE-based models and single models with no SE factor, and the statistical differences of LSP models under 30 m and 60 m resolutions are discussed.
5.1. Frequency Ratio Analysis of SE Factor under Different Resolutions
The spatial resolution of the predisposing factors strongly influences the computed soil erosion intensity and the LSP results. Taking soil erosion as an example, its characteristics differ between unit areas of different sizes: the maximum soil erosion modulus is about 4700 t/ha under 30 m resolution but only 1870 t/ha under 60 m resolution. Similarly, the LSP results under 30 m and 60 m resolutions differ because the number of landslide grid units and the values of the corresponding predisposing factors change. Two resolutions are used so that the prediction errors of landslide susceptibility and soil erosion associated with a single resolution do not bias the results. This yields more realistic and reliable rules describing the influence of the soil erosion factor on LSP.
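One way to see why the maximum erosion modulus drops at the coarser resolution is that merging cells averages out local extremes. The sketch below assumes, purely for illustration, that each 60 m cell takes the mean of the four 30 m cells it covers. The study's actual resampling procedure is not described here, and the grid values are made up.

```python
from typing import List


def block_mean(grid: List[List[float]], factor: int = 2) -> List[List[float]]:
    """Aggregate a fine grid by averaging non-overlapping factor x factor blocks.

    Trailing rows/columns that do not fill a complete block are dropped.
    """
    rows = len(grid) // factor
    cols = len(grid[0]) // factor
    out = []
    for bi in range(rows):
        row = []
        for bj in range(cols):
            block = [
                grid[bi * factor + di][bj * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Hypothetical 30 m soil loss values with one extreme cell
fine = [
    [10.0, 20.0, 5.0, 5.0],
    [30.0, 4000.0, 5.0, 5.0],
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
]
coarse = block_mean(fine)  # hypothetical 60 m grid
print(max(map(max, fine)), max(map(max, coarse)))  # prints 4000.0 1015.0
```

Because a block mean can never exceed the largest value in its block, the coarse-grid maximum is always at most the fine-grid maximum. This is consistent with the lower peak modulus reported at 60 m, whatever resampling the study actually used.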
Figure 7 shows the relationship between landslides and SE classes, and the area distribution of the SE classes, under 30 m and 60 m grid resolutions. Compared with the 60 m grid resolution, soil erosion under the 30 m grid resolution is more strongly correlated with landslide events. The FR value is greater than 1 in the high SE class and above. Under the 30 m grid resolution it reaches 2.272 in the very high class, 2.68 times the FR of the very low class, whereas under the 60 m grid resolution the FR of the very high class is only 1.295. Figure 7c shows that the area of the low SE class and above increases at 60 m grid resolution compared with 30 m grid resolution.
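The frequency ratio (FR) used above is conventionally defined for class i as FR_i = (landslide cells in class i / all landslide cells) / (cells in class i / all cells). FR > 1 therefore means landslides are over-represented in that class relative to its area. A minimal sketch with hypothetical cell counts (not the Ningdu data):

```python
from typing import Dict


def frequency_ratio(landslide_cells: Dict[str, int], class_cells: Dict[str, int]) -> Dict[str, float]:
    """FR_i = (landslide share of class i) / (area share of class i)."""
    total_landslides = sum(landslide_cells.values())
    total_cells = sum(class_cells.values())
    if total_landslides == 0 or total_cells == 0:
        raise ValueError("need at least one landslide cell and one grid cell")
    ratios = {}
    for name, n_cells in class_cells.items():
        if n_cells == 0:
            continue  # empty class: FR is undefined
        landslide_share = landslide_cells.get(name, 0) / total_landslides
        area_share = n_cells / total_cells
        ratios[name] = landslide_share / area_share
    return ratios


# Hypothetical cell counts per SE class
class_cells = {"very low": 4000, "low": 3000, "moderate": 1500, "high": 1000, "very high": 500}
landslide_cells = {"very low": 20, "low": 25, "moderate": 20, "high": 20, "very high": 15}
fr = frequency_ratio(landslide_cells, class_cells)
print({name: round(value, 2) for name, value in fr.items()})
# {'very low': 0.5, 'low': 0.83, 'moderate': 1.33, 'high': 2.0, 'very high': 3.0}
```

If the factor of 2.68 reported above is read as a ratio, it implies an FR of roughly 2.272 / 2.68 ≈ 0.85 for the very low class at 30 m.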
However, the two resolutions agree that soil erosion strongly influences landslide occurrence: the frequency of landslide occurrence increases with the soil erosion class (Figure 7a,b). The main reason is that soil erosion reflects the amount of soil loss under the coupling effects of rainfall, soil properties, topography and land cover. Generally speaking, more serious soil erosion in an area indicates less favorable environmental conditions that are more conducive to landslide occurrence [43]. Such conditions include brittle soil physical properties, complicated terrain, sparse land cover and/or sensitivity to heavy rainfall. Hence, soil erosion is one of the most important predisposing factors in the evolution of rainfall-induced shallow soil landslides. In this study, the FR values of the soil erosion factor express how strongly each soil erosion intensity class influences landslide evolution. FR values rise gradually with increasing SE intensity class, which indicates that landslides are more likely to occur in areas with more serious soil erosion.
The importance of each predisposing factor reflects its contribution to landslide occurrence. The four models are essentially non-linear predictors that determine weights for the corresponding predisposing factors. The importance ranks of the factors are obtained from the relative values of these weights, according to the inherent classification attributes of each predictor. Figure 8a,b shows that the relative importance of the predisposing factors in the SE_SVM model differs from that in the other models. Under 30 m resolution, the relative importance of SE, NDBI and slope is 23%, 19% and 17%, respectively, while the other factors contribute relatively little. The same pattern occurs under 60 m resolution. On the whole, SE, elevation, NDBI and slope contribute strongly in all models under both 30 m and 60 m grid resolutions. In particular, the SE factor makes the largest contribution in each model, with a relative importance in SE_MLP, SE_LR, SE_SVM and SE_C5.0 of 21.6% (14.15%), 17.32% (16.81%), 23% (23.86%) and 13% (14.29%) under 30 m (60 m) grid resolution, respectively.
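The text ranks factors by the relative values of model-derived weights. Assuming each model yields one non-negative weight per factor, converting those weights into percentage importance and a ranking can be sketched as below. The raw weights are hypothetical. How each of the four models produces its weights (coefficients, split statistics, sensitivity scores, etc.) is not reproduced here.

```python
from typing import Dict, List, Tuple


def relative_importance(weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Normalise non-negative factor weights to percentages, ranked high to low."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    percentages = {name: 100.0 * w / total for name, w in weights.items()}
    return sorted(percentages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw weights for four of the predisposing factors
raw = {"elevation": 1.0, "SE": 4.0, "slope": 2.0, "NDBI": 3.0}
print(relative_importance(raw))
# [('SE', 40.0), ('NDBI', 30.0), ('slope', 20.0), ('elevation', 10.0)]
```

Normalising to percentages makes importance comparable across models whose raw weights live on different scales. This is what allows figures such as 21.6% for SE_MLP and 23% for SE_SVM to be set side by side.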
Most of the landslides in Ningdu County are shallow soil landslides characterized by relatively loose soil, poor shear strength and sensitivity to rainfall. These unfavorable characteristics significantly increase the probability of landslide occurrence. At the same time, SE quantitatively reflects slope erosion and the effect of sustained rainfall, and the product of soil erosion is the material source of the slope accumulation layer [22]. Moreover, the FR values of landslides increase with the SE level, suggesting that SE intensity is correlated to some extent with landslide occurrence [25]. Therefore, the SE factor should be considered in order to improve the modelling accuracy of conventional LSP.
5.2. Accuracy Comparisons of SE-Based Models and Single Models under Different Resolutions
Evaluating model performance is one of the key steps in successful LSP modelling, and the AUC is widely used to evaluate LSP performance [82]. The X-axis of the receiver operating characteristic (ROC) curve represents the false positive rate (1 − specificity), i.e., the proportion of non-landslides incorrectly classified as landslides. The Y-axis represents the true positive rate (sensitivity), i.e., the proportion of landslides correctly classified as landslides. The AUC quantitatively reflects model accuracy; the AUC values and further analyses of the performance of the SE factor in each model are shown in Table 4, Figure 9 and Figure 10.
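To make the ROC axes concrete, the sketch below sweeps a threshold down the LSI and records the false positive rate (X-axis) and true positive rate (Y-axis) at each step. It then integrates the resulting curve with the trapezoidal rule. It is a generic AUC computation on hypothetical labels and scores, not the evaluation code used in this study.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points; labels are 1 = landslide, 0 = non-landslide."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both landslide and non-landslide samples")
    # Highest LSI first; cells with tied scores cross the threshold together.
    pairs = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under a piecewise-linear ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical validation cells and their LSI values
labels = [1, 1, 0, 1, 0, 0]
lsi = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(round(auc(roc_points(labels, lsi)), 3))  # prints 0.889
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of landslide and non-landslide cells. The value here equals the fraction of landslide/non-landslide pairs in which the landslide cell has the higher LSI (8 of 9 pairs).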
In terms of model performance, SE_MLP, SE_LR, SE_SVM and SE_C5.0 DT all have good predictive power, with AUC values under 30 m (60 m) grid resolution of 79.2% (76.2%), 78.3% (74.3%), 82.1% (82.4%) and 86.7% (79.7%), respectively. The corresponding AUC values of the single MLP, LR, SVM and C5.0 DT models without the SE factor are 76.2% (74%), 75.7% (72.3%), 79.6% (78.6%) and 84.8% (76.8%), respectively. The C5.0 DT model performs best under 30 m resolution and the SVM model performs best under 60 m resolution, while the LR model has the lowest prediction accuracy under both resolutions. In addition, LSP accuracy is higher overall under 30 m resolution than under 60 m resolution. However, the effect of SE on landslide susceptibility varies little between resolutions. Relative to the single models without the SE factor, the SE-based models improve the AUC by about 2~4% under 30 m resolution and by about 2~5% under 60 m resolution. The SE factor performs best in the MLP model under 30 m grid resolution, improving accuracy by 3.94%, and in the SVM model under 60 m grid resolution, improving accuracy by 4.83%. Overall, the 30 m grid resolution is more suitable for landslide susceptibility prediction. The effect of SE on landslide susceptibility does not change markedly with grid resolution, and SE clearly improves model accuracy under both grid sizes.
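The 3.94% and 4.83% improvements quoted above are relative gains, (AUC_SE − AUC_single) / AUC_single, rather than differences in percentage points. The short check below recomputes them from the AUC values reported in this section; only the helper name `relative_gain` is ours.

```python
# AUC values (%) reported in the text: (SE-based, single) per model and resolution
AUC = {
    "30 m": {"MLP": (79.2, 76.2), "LR": (78.3, 75.7), "SVM": (82.1, 79.6), "C5.0 DT": (86.7, 84.8)},
    "60 m": {"MLP": (76.2, 74.0), "LR": (74.3, 72.3), "SVM": (82.4, 78.6), "C5.0 DT": (79.7, 76.8)},
}


def relative_gain(se_auc: float, single_auc: float) -> float:
    """Relative AUC improvement of the SE-based model over the single model, in percent."""
    return 100.0 * (se_auc - single_auc) / single_auc


for resolution, models in AUC.items():
    gains = {name: round(relative_gain(*pair), 2) for name, pair in models.items()}
    print(resolution, gains)
# 30 m: MLP 3.94, LR 3.43, SVM 3.14, C5.0 DT 2.24
# 60 m: MLP 2.97, LR 2.77, SVM 4.83, C5.0 DT 3.78
```

The resulting ranges, 2.24 to 3.94% at 30 m and 2.77 to 4.83% at 60 m, match the "2~4%" and "2~5%" stated above. In percentage points the gains are smaller (1.9 to 3.0 at 30 m, 2.0 to 3.8 at 60 m).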
5.3. Statistical Differences between SE-Based Models and Single Models
Table 5 shows the statistical significance of the differences between the LSIs calculated by the SE-based models and the single models. The LSIs of the SE-based models and the single models differ significantly under both 30 m and 60 m resolutions, indicating that the SE factor has a great influence on LSP results. Therefore, the SE factor should be considered as a new predisposing factor in LSP modelling.
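The statistical test behind Table 5 is not named in this section. One common choice for comparing two LSI values computed on the same grid cells is the Wilcoxon signed-rank test. The sketch below implements it with the large-sample normal approximation: zero differences are dropped, tied absolute differences get average ranks, and no tie correction is applied to the variance. It is an assumption-labelled illustration, not the procedure used for Table 5, and the LSI values in the example are hypothetical.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Paired Wilcoxon signed-rank test (normal approximation).

    Returns (z, two-sided p-value). Zero differences are discarded and tied
    absolute differences receive their average rank.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired samples of equal length")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of equal absolute differences starting at i.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical LSIs of the same cells from an SE-based and a single model
se_lsi = [0.81, 0.42, 0.66, 0.35, 0.92, 0.27, 0.58, 0.74]
single_lsi = [0.76, 0.40, 0.61, 0.36, 0.85, 0.22, 0.55, 0.70]
z, p = wilcoxon_signed_rank(se_lsi, single_lsi)
print(f"z = {z:.3f}, p = {p:.4f}")
```

For real analyses, `scipy.stats.wilcoxon` provides this test, including exact small-sample variants. With the many grid cells of a county-scale LSP, the normal approximation is usually adequate.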