1. Introduction
Landslides are one of the most destructive geological hazards, which not only cause enormous damage to houses and infrastructure, such as bridges and roads, but also lead to loss of life [
1]. According to the World Health Organization, approximately 4.8 million people were affected, and more than 18,000 deaths were caused by landslides between 1998 and 2017. Specifically, as one of the countries with a high incidence of landslides, China suffered severe loss of life [
2,
3]. The China Statistical Yearbook indicates that from 2000 to 2015, 373,630 landslides occurred in the country, killing 10,996 people, or approximately 690 landslide-related deaths per year [
4]. To mitigate the serious social impact caused by landslides, constructive and productive activities should be avoided in areas with high susceptibility to landslides. Therefore, developing an efficient method to distinguish landslide-prone zones is an essential need for both local governments and research institutes [
5]. Landslide susceptibility describes the likelihood that a landslide will occur in a certain area based on local terrain conditions [
6]. Landslide susceptibility mapping (LSM) is one of the most widely used assessment methods; it visualizes the spatial distribution of zones with different probabilities of landslide occurrence in a given area.
Various methods, such as probabilistic analysis, statistical analysis, analytic process, and weighted overlay, were widely applied to LSM in early research. With the development of Artificial Intelligence (AI) and Geographic Information Systems (GIS), machine learning-based methods, which can solve complex nonlinear problems, have become increasingly popular compared with opinion-driven models and statistical learning, and the accuracy and precision of susceptibility models have improved rapidly [
7,
8]. Huang et al. [
9] adopted logistic regression (LR), support vector machine (SVM), and random forest (RF) on LSM for model comparison. He et al. [
10] used RF in the global assessment of earthquake-induced landslide susceptibility. Sun et al. [
11] applied the Bayes algorithm to optimize the hyper-parameters of the RF model for LSM. Smith et al. [
12] compared the effect of landslide inventories assembled by different methods on the performance of RF and LR for LSM. Lim et al. [
13] applied the RF model to estimate the probability of a landslide. Nhu et al. [
14] investigated and compared the Logistic Model Tree, LR, NBTree, Artificial Neural Network (ANN), and SVM in shallow landslide susceptibility mapping for Bijar City in Kurdistan Province. Zhang et al. [
15] compared the predictive performance of RF, XGBoost, SVM, and LR on landslide susceptibility mapping in Yunyang County. Hu et al. [
16] compared the effect of different non-landslide sampling methods on the performance of SVM and NB for LSM. Zhou et al. [
17] applied GeoDetector and RFE for factor optimization and then used the selected factors as inputs to train an RF model to obtain the LSM of Wuxi County. Sun et al. [
18] proposed a hybrid landslide warning model based on RF susceptibility zoning and precipitation. Zhou et al. [
19] constructed an interpretable model for the susceptibility to rainfall-induced shallow landslides based on SHAP and XGBoost.
Among those methods, RF is the most commonly used method in large-scale mapping and classification [
20,
21,
22] due to its characteristics of low computational cost, low data requirement, convenience of hyper-parameters tuning, and robustness in solving complex nonlinear problems [
23]. Previous work usually focused on quantitative analysis, such as the selection and improvement of models and input features, but rarely took into account the qualitative analysis of landslide areas. However, landslides are highly area dependent: the mechanism of landslide formation and the corresponding triggering factors differ between areas. In reservoir areas, the frequent fluctuation of the reservoir water level seriously reduces the stability of the slopes, making them prone to landslides [
24]. For mountainous areas, however, rainfall is the major triggering factor for the occurrence of landslides [
25]. With population growth, human activities have become the major factor accelerating landslide formation in areas with high population density. Therefore, manually dividing a relatively large region into sub-zones according to a qualitative analysis of landslide formation and geomorphic unit characteristics should, in theory, improve mapping accuracy. This paper uses the 827 historical landslide data points in Yunyang County and 20 conditioning factors to build five RF models: one RF model for the whole region (referred to as the parent model below) and four RF models for the four sub-zones (referred to as sub-model one to sub-model four below). The feature importance and performance of the parent model and the four sub-models are then analyzed and compared to verify the effectiveness of applying experience-based zonation before modeling.
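The parent/sub-model arrangement described above can be sketched as follows. This is only an illustration of the routing idea, not the authors' implementation: `Sample`, `MeanRateModel`, `fit_parent_and_submodels`, and `predict_zoned` are hypothetical names, the data are made up, and the trivial `MeanRateModel` is a stand-in for a tuned random forest so that the example runs with the standard library alone.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    zone: str        # sub-zone label such as "I".."IV" (hypothetical field name)
    features: tuple  # conditioning-factor values
    label: int       # 1 = landslide, 0 = non-landslide


class MeanRateModel:
    """Stand-in for a random forest: predicts the landslide rate seen in training."""

    def fit(self, samples):
        self.rate = sum(s.label for s in samples) / len(samples)
        return self

    def predict_proba(self, features):
        # A real model would use the features; this placeholder ignores them.
        return self.rate


def fit_parent_and_submodels(samples, make_model=MeanRateModel):
    """Fit one model on all samples (parent) and one model per sub-zone."""
    by_zone = defaultdict(list)
    for s in samples:
        by_zone[s.zone].append(s)
    parent = make_model().fit(samples)
    subs = {zone: make_model().fit(group) for zone, group in by_zone.items()}
    return parent, subs


def predict_zoned(subs, parent, zone, features):
    """Route a location to its sub-model; fall back to the parent model."""
    model = subs.get(zone, parent)
    return model.predict_proba(features)


# Made-up samples for two sub-zones.
samples = [
    Sample("I", (0.2,), 1), Sample("I", (0.4,), 0),
    Sample("II", (0.9,), 1), Sample("II", (0.8,), 1),
    Sample("II", (0.1,), 1), Sample("II", (0.3,), 0),
]
parent, subs = fit_parent_and_submodels(samples)
print(round(parent.rate, 3), subs["I"].rate, subs["II"].rate)
```

Passing a different `make_model` factory (for example, one that wraps a random forest with grid-searched hyper-parameters) would keep the same routing logic while swapping the learner.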
2. Study Area
Chongqing City is located in the mountainous area around the eastern Sichuan basin and the slope area of the basin margin. It spans two tectonic units, namely the Yangtze quasi-platform and the Qinling fold system. The landscape of Chongqing City is mainly mountains and hills, which make up 92% of its total area. Many adverse geological conditions, including well-developed surface water networks, strongly dissected terrain, and complex rock, soil, and geological structures, accelerate the formation of landslides, dangerous rock collapses, ground collapses, debris flows, and other geological disasters, making Chongqing one of the cities with the highest frequency of geological disasters in the country.
The spatial distribution of geological hazards in Chongqing City shows a certain degree of concentration and can be summarized as a striped distribution and a vertical zonal distribution; moreover, the temporal distribution presents a seasonal cluster pattern. According to the statistics, there are currently 14,926 geological hazard-prone points in Chongqing City, of which 5776 (38.7%) are located in the 7 districts and counties of northeast Chongqing City (Wanzhou, Kaizhou, Chengkou, Wuxi, Wushan, Fengjie, Yunyang), 1864 (12.49%) are in the 5 districts and counties of southeastern Chongqing City (Wulong, Youyang, Qianjiang, Pengshui, Xiushan), and 1320 (8.84%) are in the 11 districts of the main city. Therefore, northeast Chongqing City is the key area with a high probability of potential geological disasters.
As one of the seven districts and counties in the northeast of Chongqing City (
Figure 1), Yunyang County (spanning 108°24′37″–109°14′47″ E and 30°34′59″–31°26′28″ N) is located in the middle of the Three Gorges Reservoir Project area and is an important hub of the ecological and economic zone along the Yangtze River. According to the announcement of the Chongqing Forest Bureau, while the forest coverage of Chongqing City reaches 54.5%, that of Yunyang County exceeds 58.5%, making it one of the greenest counties in China. Based on the Seventh National Census of China, there were 929,034 long-term residents (48% of them urban residents) in this area in 2020. Yunyang County is crossed by twelve major folds, namely Changdianfang Syncline (1), Macaoba Anticline (2), Qvmahe Syncline (3), Tiefengshan Anticline (4), Yangliuwan Syncline (5), Dongcun Anticline (6), Xinchang Anticline (7), Huangpoxi Syncline (8), Guling Syncline (9), Fangdoushan Anticline (10), Ganchang Syncline (11), and Longjukan Syncline (12). Under the subtropical monsoon climate, Yunyang County has an average annual rainfall of 1123.7 to 1264.8 mm and an average annual temperature from 10.2 to 18.5 °C.
Mountainous areas are generally susceptible to mass movements due to preparatory and triggering causal factors [
26]; in addition to weathering, anthropogenic activities in the region commonly accelerate the formation of unstable areas in both earth materials and hill slopes [
27]. As a part of Chongqing City, Yunyang County has always been a significant hotspot for landslide occurrences. A total of 836 historical landslides are recorded in the dataset; 827 data points are left after data cleaning. A total of 28.2% of them are small landslides, 51.8% are medium landslides, and 20% are large landslides. Among them, thrust-load-caused landslides accounted for 53.7%, and loosen-caused landslides and multi-caused landslides accounted for 14.5% and 31.8%, respectively. To build sub-models, we manually divided the study area into four different sub-zones (
Figure 2) based on the information from the exploration of geological hazards in Chongqing City, such as the mechanism of landslide formation and sliding failure and the geomorphic unit characteristics. Among the four sub-zones, sub-zone II contains all the strip-distributed landslides along the mainstream of the Yangtze River, so it can also be called the Yangtze River mainstream zone. At a larger scale, a part of Yunyang County belongs to the low-hills section that crosses Yunyang, Fengjie, and Kaizhou; this area is classified as sub-zone IV. Sub-zone I (south of sub-zone II) is crossed by the main highway S305, and the main area of sub-zone III (between sub-zones II and IV) is crossed by the S103 and S305. The density of the road network is also high in the other two parts of sub-zone III. The landslides that occurred in these two sub-zones are found to be mainly along the roads (
Figure 3). After zonation, 89 landslides are located in sub-zone I, 285 in sub-zone II, 44 in sub-zone III, and 408 in sub-zone IV, which has the largest area.
As one of the typical landslides in the Three Gorges Reservoir area, the Jiuxianping landslide (in sub-zone II) is located on the left bank of the Yangtze River (
Figure 2b). After the Three Gorges Reservoir project, the fluctuation of the Three Gorges Reservoir water level restarted the displacement and deformation of the ancient landslide, making this area more prone to geological hazards. A subsidence of about one meter occurred on a roadway in the middle of the landslide body after heavy rain in 2003 and 2004, causing the roadway to be abandoned. With the impact of continuous heavy rain, landslides occurred in the back accumulation of Jiuxianping on 19 and 22 June 2007, causing the houses of the villagers to collapse, and the mountain body cracked. On 9 June 2009, the back-accumulation of Jiuxianping deformed again under the impact of heavy rain, causing cracks on both the accumulation body and the houses of the villagers. Recently, under the continuous effect of the Three Gorges Reservoir, this area has been in the overall creep deformation stage for years, especially the cliffs near the river, which often suffer from local collapse and damage.
The continuous heavy rains from 30 August to 1 September 2014 made the accumulated rainfall in Jiangkou Town more than 300 mm. The day after that, the Tuantan landslide (
Figure 2d) occurred on the back mountain and on the left side of the Yongfa Coal Mine staff dormitory in Tuantan village, Jiangkou Town, Yunyang County (in sub-zone IV). Although the employees were notified to evacuate from the area subjected to the massive landslide, twelve of them were buried on the spot. Unfortunately, only one of the twelve was saved.
Typically, in sub-zone I, under the impact of rainfall, a landslide occurred on the S202 Highway in the direction from Longjiao to Rucao (
Figure 2a) on 13 July 2021. Similarly, there was a 10,000 cubic-meter landslide triggered by heavy rainfall in the area of Mawang Temple (
Figure 2c), which trapped two four-wheel cars and a motorcycle, and blocked the highway section for five days.
5. Results
The landslide susceptibility maps generated by the parent model and sub-models are displayed in
Figure 8, where
Figure 8a represents the map outputted by the parent model, and
Figure 8b is the map produced by the sub-models, and
Figure 8c,d show detailed views of part of both maps. The entire region is classified into five landslide susceptibility levels (very low, low, moderate, high, and very high) using the natural breaks method. From the result of the parent model in
Table 5, 14.6% of the whole area is classified as a very low landslide-prone zone, 22.8% as a low landslide-prone zone, 25.23% as a moderate landslide-prone zone, 21.47% as a high landslide-prone zone, and 15.90% as a very high landslide-prone zone. From the result of the sub-models, their ratios are 16.42%, 20.81%, 22.56%, 21.54%, and 18.67%, respectively.
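The natural breaks classification mentioned above can be illustrated with a small sketch. The text does not say which implementation was used (GIS packages provide their own), so the dynamic-programming routine below is an assumed, generic Fisher-Jenks style formulation; `natural_breaks`, `classify`, and the susceptibility scores are hypothetical and chosen only to make the five classes easy to see.

```python
from bisect import bisect_left

LEVELS = ["very low", "low", "moderate", "high", "very high"]


def natural_breaks(values, n_classes):
    """Return each class's upper bound, minimising within-class squared deviation."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")
    # Prefix sums give the squared deviation of any run xs[i:j] in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        # Sum of squared deviations from the mean of xs[i:j], with j > i.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting xs[:j] into k classes.
    # cut[k][j]: index where the last of those k classes starts.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    cut[k][j] = i
    # Walk back through the cuts to recover the class upper bounds.
    bounds = []
    j = n
    for k in range(n_classes, 0, -1):
        bounds.append(xs[j - 1])
        j = cut[k][j]
    return bounds[::-1]


def classify(score, bounds):
    """Map a susceptibility score to one of the five levels."""
    return LEVELS[min(bisect_left(bounds, score), len(LEVELS) - 1)]


# Hypothetical model outputs forming five obvious groups.
scores = [0.02, 0.04, 0.21, 0.23, 0.48, 0.50, 0.71, 0.73, 0.95, 0.97]
bounds = natural_breaks(scores, 5)
print(bounds)
print(classify(0.30, bounds))
```

The routine is cubic in the number of values for a fixed class count, so for a full raster one would normally compute the breaks on a sample of cells or rely on a GIS tool.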
Logically, the landslides/area ratio should increase from the very low landslide-prone zone to the very high landslide-prone zone, which is exactly what our models indicate. According to the results of the parent model, the landslides/area ratio increases from 0.016 in the very low landslide-prone zone to 4.548 in the very high landslide-prone zone; for the sub-models, it increases from 0.007 to 4.051.
Figure 9 displays this tendency, and it can be seen that the outputs of the sub-models have more obvious gaps between the very low landslide-prone zone and the very high landslide-prone zone, which reflects another merit of the sub-models compared with the parent model.
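As a rough illustration of how a landslides/area ratio of this kind can be computed, the sketch below assumes the ratio is the share of landslides falling in a class divided by the share of the area occupied by that class; the text does not give the formula, so this definition is an assumption. The area shares are the parent-model percentages quoted above, while the landslide counts per class are hypothetical placeholders, not the values behind Table 5.

```python
def landslide_area_ratio(landslide_counts, area_shares):
    """Share of landslides in each class divided by that class's share of area."""
    total_landslides = sum(landslide_counts)
    total_area = sum(area_shares)
    return [
        (count / total_landslides) / (area / total_area)
        for count, area in zip(landslide_counts, area_shares)
    ]


def is_strictly_increasing(values):
    """True when every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Area shares (%) of the five classes for the parent model, from the text.
area_shares = [14.6, 22.8, 25.23, 21.47, 15.90]
# Hypothetical landslide counts per class (very low ... very high).
landslide_counts = [1, 5, 14, 25, 55]

ratios = landslide_area_ratio(landslide_counts, area_shares)
print([round(r, 3) for r in ratios])
print(is_strictly_increasing(ratios))
```

Under this definition a ratio above 1 means a class holds more landslides than its area alone would suggest, and the area-weighted mean of the ratios is 1 by construction.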
The validation AUC values of the five models are shown in
Figure 10; the AUC value of the parent model is 0.872, while those of sub-model one to sub-model four are 0.949, 0.892, 0.889, and 0.951, respectively. All of the sub-models outperformed the parent model, which supports the aforementioned hypothesis. However, the increases are not at the same level; sub-model four achieved the highest improvement (9.1%), while the lowest improvement, 1.9%, was obtained by sub-model three (AUC 0.889).
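The quoted improvements can be checked with a short calculation. The sketch assumes the percentages are relative gains over the parent AUC, which reproduces the 9.1% and 1.9% figures above. The `auc` helper is a generic pairwise (rank-based) definition included for illustration only; it is not claimed to be the routine used in the study, and the four labelled points are made up.

```python
def auc(labels, scores):
    """Chance that a random landslide point outscores a random non-landslide point."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count pairwise wins; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def relative_gain(sub_auc, parent_auc):
    """Percentage improvement of a sub-model AUC over the parent AUC."""
    return 100.0 * (sub_auc - parent_auc) / parent_auc


# AUC values as reported in the text.
parent_auc = 0.872
sub_aucs = {"one": 0.949, "two": 0.892, "three": 0.889, "four": 0.951}
gains = {name: round(relative_gain(v, parent_auc), 1) for name, v in sub_aucs.items()}
print(gains)

# Tiny made-up example for the AUC helper.
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(toy_auc)
```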
The importance of features generally represents how much a specific feature contributes to the decision-making process of a model. In this case, the most important features can be the key factors in identifying landslide/non-landslide points. As shown in
Table 6 and
Figure 11, the distance from syncline axis, aridity, elevation, distance from rivers, and average annual temperature are of the highest importance for the parent model. Distance from syncline axis, NDVI, average annual rainfall, distance from road, and elevation are the main features associated with the formation of landslides in sub-zone I. For sub-model two, elevation is the most important feature, followed by average annual rainfall, distance from rivers, distance from anticline axis, and average annual temperature. Distance from road, HAILS, elevation, plane curvature, and average annual temperature are the top features for sub-model three. Finally, distance from syncline axis, average annual temperature, elevation, aridity, and average annual rainfall play important roles in the predictions of sub-model four.
7. Conclusions
In this study, Yunyang County is manually zoned into four parts based on a qualitative analysis of geological hazards exploration in Chongqing City, including the mechanism of landslide formation and sliding failure and geomorphic unit characteristics. Based on the result of this qualitative analysis, five random forest landslide susceptibility models are constructed using historical landslide data points and twenty conditioning factors for the subsequent quantitative analysis. These models, including a parent model and four sub-models, are optimized individually by the grid search method. A comparison between the parent model and the combination of the sub-models is conducted. The following conclusions are drawn:
The AUC value of the parent model reaches 0.872, which shows that the traditional RF with hyper-parameters tuned by the grid search method performs reliably on landslide susceptibility mapping. In this study, distance from syncline axis is the most important factor in the formation of landslides in Yunyang County, followed by aridity, elevation, distance from rivers, and average annual temperature.
However, when a large region is treated uniformly, the more general information extracted from “mainstream” landslides usually masks that of the “minority” landslides, resulting in low information utility and the inability to identify potential landslides under special geological conditions. With enough data points, experience-based zoning before modeling proves to be an effective solution to this issue; the qualitative analysis serves as a pre-classification based on the information from geological hazards exploration, which groups the landslides that occurred under similar geological conditions and thus enables the models to obtain the specific knowledge under each condition. Therefore, in our case, while the traditional RF obtained the general prediction skill for the entire region of Yunyang County, all the sub-models have become “experts” in their respective sub-areas. The test AUC values of sub-model one to four are 8.8%, 2.3%, 1.9%, and 9.1% higher than that of the parent model, respectively. Furthermore, the proposed method also helps reveal the key factors that induce local landslide instability under specific geological conditions, which can be used by planners and policymakers for more specific and accurate landslide control in certain areas, thus further improving the safety of life and public property.
For sub-zone I, the top five conditioning factors are distance from syncline axis, NDVI, average annual rainfall, distance from road, and elevation. For sub-zone III, without the influence of major synclines, its top factors are distance from road, HAILS, elevation, plane curvature, and average annual temperature. For sub-zone IV, the distance from syncline axis becomes the most important factor again, and it is followed by average annual temperature, elevation, aridity, and average annual rainfall.
Sub-zone II is crossed by the Three Gorges Reservoir area. Because it is subject to periodic variation in the reservoir water level and to other factors related to the reservoir bank, the modified method based on general conditioning factors has relatively less effect on improving the accuracy of the mapping there. The effect of more specific factors on the formation of landslides on the banks of the reservoir will be analyzed in further research. In the case of this paper, the results of sub-model two indicate that elevation, average annual rainfall, distance from rivers, distance from anticline axis, and average annual temperature are the top five conditioning factors among the existing twenty factors for sub-zone II.