Landslide Susceptibility Research Combining Qualitative Analysis and Quantitative Evaluation: A Case Study of Yunyang County in Chongqing, China

: Machine learning-based methods are commonly used for landslide susceptibility mapping. Most of the recent publications focused on quantitative analysis, i.e., improving data processing methods, comparing and perfecting the data-driven model itself, but rarely taking the qualitative aspects of the local landslide occurrences into consideration and the further analysis of the key features was always lacking. This study aims to combine qualitative and quantitative analysis and examine its effect on mapping accuracy; based on the feature importance ranks and the related literature, the key features for identifying landslide/non-landslide points of different sub-zones were further analyzed. Before modeling, the study area Yunyang County, Chongqing City, China, was manually divided into four sub-zones based on the information from geological hazards exploration in Chongqing, including the mechanism of landslide formation and sliding failure and geomorphic unit characteristics. Upon the qualitative analysis basis, five grid searches tuned random forest models (one for the whole region and four for the sub-zones independently) were established by 1654 data points and 20 conditioning features. Compared with the conventional data-driven method, the integrated quantitative evaluation based on the qualitative analysis results showed higher reliability, which not only improved the mapping accuracy but also increased the AUC values of all four sub-models, which were 8.8%, 2.3%, 1.9% and 9.1% higher than that of the parent model. Moreover, the quantitative evaluation based on the qualitative analysis revealed the key factors affecting local landslide formation. Therefore, qualitative analysis is recommended in future landslide susceptibility modeling with the additional combination of data-driven methods.


Introduction
Landslides are one of the most destructive geological hazards, which not only cause enormous damage to houses and infrastructure, such as bridges and roads, but also lead to loss of life [1]. According to the World Health Organization, approximately 4.8 million people were affected, and more than 18,000 deaths were caused by landslides between 1998 and 2017. Specifically, as one of the countries with a high incidence of landslides, China suffered severe loss of life [2,3]. The China Statistical Yearbook indicates that during 2000 to 2015, 373,630 landslides occurred in this country, killing 10,996 people, which is approximately 690 landslide-related deaths per year [4]. To mitigate the serious social impact caused by landslides, constructive and productive activities should be avoided in areas with high susceptibility to landslides. Therefore, developing an efficient method to distinguish landslide-prone zones is an essential need for both local governments and research institutes [5]. Landslide susceptibility describes the likelihood that a landslide will occur in a certain area based on local terrain conditions [6]. Landslide susceptibility mapping (LSM) is one of the most widely used assessment methods, it visualizes the spatial distribution of zones with different probabilities of occurrence of landslides in a certain area.
Various methods such as probabilistic analysis, statistical analysis, analytic process, and weighted overlay were widely applied to LSM by researchers in the early stages. With the development of Artificial Intelligence (AI) and Geographic Information System (GIS), machine learning-based methods, with the capability of solving complex nonlinear problems, are becoming increasingly popular compared to opinion-driven models and statistical learning, making the accuracy and precision of susceptibility models evolve rapidly [7,8]. Huang et al. [9] adopted logic regression (LR), support vector machine (SVM), and random forest (RF) on LSM for model comparison. He et al. [10] used RF in the global assessment of earthquake-induced landslide susceptibility. Sun et al. [11] applied the Bayes algorithm to optimize the hyper-parameters of the RF model for LSM. Smith et al. [12] compared the effect of landslide inventories assembled by different methods on the performance of RF and LF for LSM. Lim et al. [13] applied the RF model to estimate the probability of a landslide. Nhu et al. [14] investigated and compared the Logistic Model Tree, LR, NBTress, Artificial Neural Network (ANN), and SVM in the shallow landslide susceptibility mapping for Bijar City in Kurdistan City. Zhang et al. [15] used the predictive performance of RF, XGBoost, SVM, and LR on landslide susceptibility mapping in Yunyang County. Hu et al. [16] compared the effect of different non-landslide sampling methods on the performance of SVM and NB for LSM. Zhou et al. [17] applied GeoDetector and RFE for factor optimization and then used the selected factors as inputs to train an RF model to obtain the LSM of Wuxi County. Sun et al. [18] proposed a hybrid landslide warning model based on RF susceptibility zoning and precipitation. Zhou et al. [19] constructed an interpretable model for the susceptibility to rainfall-induced shallow landslides based on SHAP and XGBoost.
Among those methods, RF is the most commonly used method in large-scale mapping and classification [20][21][22] due to its characteristics of low computational cost, low data requirement, convenience of hyper-parameters tuning, and robustness in solving complex nonlinear problems [23]. Previous work usually focused on quantitative analysis, such as the selection and improvement of models and input features, but rarely took into account the qualitative analysis of landslide areas. Actually, as one of the major geological hazards, landslides are highly area dependent, the mechanism of landslide formation and its corresponding triggering factors are undoubtedly different in distinct areas. The frequent fluctuation of reservoir water seriously reduces the stability of the slopes in the reservoir area, making them prone to landslides [24]. For mountainous areas, however, rainfall is the major triggering factor for the occurrence of landslides [25]. With increased population, human activities have become the major issue that accelerates landslide formation in areas with high population density. Therefore, manually dividing a relatively large region into different sub-zones according to the qualitative analysis of the landslide formation and geomorphic unit characteristics will theoretically improve mapping accuracy. This paper aims to use the 827 historical landslide data points in Yunyang County and the 20 conditioning factors to build 5 RF models, including an RF model (referred to as the parent model below) for the whole region and four RF models for the divided four sub-zones (referred to as sub-model one to sub-model four below). Then, the feature importance and the performances of the parent model and the four sub-models are analyzed and compared to verify the effectiveness of applying experience-based zonation before modeling.

Study Area
Chongqing City is located in the mountainous area around the eastern Sichuan basin and the slope area of the basin margin. It spans two tectonic units, namely the Yangtze quasi-platform and the Qinling fold system. The landscape of Chongqing City is mainly mountains and hills, which make up 92% of its total area. There are many adverse geological conditions accelerating the formation of landslides, dangerous rock collapse, ground collapse, debris flow, and other geological disasters, including developed surface water networks, strongly cut terrain, complex rock and soil structure, and geological structure, making it one of the cities with the highest geological disaster frequency in the country.
The spatial distribution of geological hazards in Chongqing City shows a certain degree of concentration and can be concluded as a striped distribution and vertical zonal distribution; moreover, its temporal distribution presents a seasonal cluster pattern. According to the statistics, there are currently 14,926 geological hazard-prone points in Chongqing City, of which 5776 (38.7%) are located in the 7 districts and counties of northeast Chongqing City (Wanzhou, Kaizhou, Chengkou, Wuxi, Wushan, Fengjie, Yunyang), 1864 (12.49%) are in the 5 districts and counties of southeastern Chongqing City (Wulong, Youyang, Qianjiang, Pengshui, Xiushan), and 1320 (8.84%) are in the 11 districts of the main city. Therefore, northeast Chongqing City is the key area with a high probability of potential geological disasters.
As one of the seven districts and counties in the northeast of Chongqing City ( Figure  1), Yunyang County (spans 108°24′37″-109°14′47″ E and 30°34′59″-31°26′28″ N) is located in the middle of the Three Gorges Reservoir Project area, being the important hub of the ecological and economic zone along the Yangtze River. According to the announcement of the Chongqing Forest Bureau, while the forest area of Chongqing city reaches 54.5%, that of Yunyang County exceeds 58.5%, making it one of the greenest counties in China. Based on the Seventh National Census of China, there were 929,034 long-term residents (48% of them are urban residents) in this area in the year 2020. Yunyang County is crossed by twelve major folds, namely Changdianfang Syncline (1), Macaoba Anticline (2), Qvmahe Syncline (3), Tiefengshan Anticline (4), Yangliuwan Syncline (5), Dongcun Anticline (6), Xinchang Anticline (7), Huangpoxi Syncline (8), Guling Syncline (9), Fangdoushan Anticline (10), Ganchang Syncline (11), and Longjukan Syncline (12). Under the subtropical monsoon climate, Yunyang County has an average annual rainfall of 1123.7 to 1264.8 mm and an average annual temperature from 10.2 to 18.5 °C. Mountainous areas are generally susceptible to mass movements due to preparatory and triggering causal factors [26]; not only the weathering effects but anthropogenic activities in the region also commonly accelerate the formation of unstable areas on both the earth material and on hill slopes [27]. As a part of Chongqing City, Yunyang County has always been a significant hotspot for landslide occurrences. There are a total of 836 historical landslides recorded in the dataset; 827 data points are left after data cleaning. A total of 28.2% of them are small landslides, 51.8% are medium landslides, and 20% are large landslides. Among them, trust-load-caused landslides accounted for 53.7%, and loosencaused landslides and multi-caused landslides accounted for 14.5% and 31.8%, respectively. To build sub-models, we manually divided the study area into four different subzones ( Figure 2) based on the information from the exploration of geological hazards in Chongqing City, such as the mechanism of landslide formation and sliding failure and the geomorphic unit characteristics. Among the four sub-zones, sub-zone II contains all the strip-distributing landslides along the mainstream of the Yangtze River, so it can also be called the Yangtze River mainstream zone. From a larger scope, a part of Yunyang County belongs to the low-hills section that crosses Yunyang, Fengjie, and Kaizhou; this area is classified as sub-zone IV. Sub-zone I (south of sub-zone II) is crossed by the main highway called S305, and the main area of sub-zone III (between sub-zone II and IV) is crossed by the S103 and S305. Similarly, the density of the road network is also at a high stage in the other two parts of sub-zone III. The landslides that occurred in these two sub-zones are found to be mainly along the roads (Figure 3). After zonation, 89 landslides are located in sub-zone I, 285 of them are in sub-zone II, sub-zone III contains 44 landslides, and with the largest area, 408 of the historical landslides occurred in sub-zone IV.  The continuous heavy rains from 30 August to 1 September 2014 made the accumulated rainfall in Jiangkou Town more than 300 mm. The day after that, the Tuantan landslide ( Figure 2d) occurred on the back mountain and on the left side of the Yongfa Coal Mine staff dormitory in Tuantan village, Jiangkou Town, Yunyang County (in sub-zone IV). Although the employees were notified to evacuate from the area subjected to the massive landslide, twelve of them were buried on the spot. Unfortunately, only one of the twelve was saved.
Typically, in sub-zone I, under the impact of rainfall, a landslide occurred on the S202 Highway in the direction from Longjiao to Rucao (Figure 2a) on 13 July 2021. Similarly, there was a 10,000 cubic-meter landslide triggered by heavy rainfall in the area of Mawang Temple (Figure 2c), which trapped two four-wheel cars and a motorcycle, and blocked the highway section for five days.

Random Forest
As one of the most popular classification methods, RF was first proposed by Breiman [28] and Cutler [29]. During its training process, different data subsets obtained by random sampling are used to train multiple decision trees as independent estimators; each of them is only allowed to fit the data based on the part of the input features, and the final output of the RF model is based on the voting results of the constructed estimators. With the double randomness, those estimators will be trained as distinct ones, and such a special structure makes RF less sensitive to noise and outliers, less likely to overfit and has a lower dependency on feature selection, but more robust and can provide more accurate predictions compared with the other machine learning models.
The RF training process can be concluded as the flowing three simple steps ( Figure  4): Bootstrapping: To build M decision trees (estimators), M subsets will be generated as training sets from the original dataset by sampling with replacement.
Modeling: Each subset obtained by bagging is used to train the corresponding estimator. Similarly, for the purpose of getting estimators as distinct as possible, the available features for each estimator are also randomly chosen from the entire feature list.
Voting: Each estimator will output a result independently, then the final result of the model will be generated based on the voting results (Equation (1)).
where f(x) represents the final output, and ( ) means the output of the mth tree.

Grid Search
As pre-set parameters, hyper-parameters play a crucial role in model performance, and there are rare algorithms that are hyper-parameter-free [30]. Therefore, proper hyperparameter tuning is essential for improving model performance. Grid search, as one of the conventional automatic hyper-parameter tuning methods, is widely used because of its simple operation. Its basic idea is to choose the best hyper-parameter combination by enumerating and iterating over all possible combinations. Although it is computationally expensive because of the exhaustive search process, grid search suits random forest very well, as random forest is a hyper-parameter tuning friendly model; only two hyper-parameters are to be tuned in our case.

Performance Measure
For classification problems, accurate-oriented modeling sometimes ends up with "rabid" models that tend to classify all samples as a certain type, especially for unbalanced datasets. Therefore, the confusion matrix is introduced in this case (Table 1). According to Table 1, the true positive rate, also called the recall, is defined as TPR = TP/(TP + FN), and the false positive rate is defined as FPR = FP/(FP + TN). After model establishment, its corresponding TPR versus FPR at different cutoff values can be plotted, which is called as receiver operation characteristic (ROC) curve and can be used to represent the predictive ability of the model. The area under the ROC curve (AUC) is commonly used as an index that represents the true classification accuracy of the model.

Data Collection and Preparation
Data collection and data cleaning are of the top importance in machine learning applications; high-quality data is the foundation of accurate prediction models. Model perfection also plays a vital role; choosing appropriate hyper-parameters will help models to extract useful information from input data more effectively and precisely. In this section, we will introduce the procedure for data preparation and model establishment. The analysis of the performance and the factor importance of different models will be presented in the following sections. The flowchart for this study is shown in Figure 5. Triggered by the joint effects of natural factors and human activities, the mechanism of the occurrence of landslides is very complex [5]. Based on the assumption that future landslides will occur under the same conditions as past landslides [31], evaluating these factors and analyzing their relationships with historically recorded landslides can contribute to forming the basis for the prediction of future landslides in an area [32,33]. However, after reviewing the studies related to the evaluation of landslide sensitivity from 1983 to 2016, Reichenbach et al. [34] found that there was a total of 596 different factors that affected landslide formation, considering all of them will be laborious and time-consuming. According to previous articles, topography, hydrology, geology, land cover, and natural and human-related factors are generally used for landslide susceptibility analysis [14,17]. Therefore, in this study, topographical factors (aspect, elevation, plane curvature, profile curvature, relief amplitude, slope), hydrological factors (aridity, distance from rivers, index of moisture (IM), and topographic wetness index (TWI)), geological factors (lithology, distance from anticline axis, distance from syncline axis), factors related to land cover (namely land use and normalized difference vegetation index (NDVI)), human factors (such as human activity intensity of land surface (HAILS), distance from road, and population), and natural factors (average annual temperature, average annual rainfall) are all considered ( Figure 6). As the grid data layers cannot be directly used for model training, the Yunyang County fishnet with a cell size of 25 × 25 m was created for the purpose of data extraction for model training and measuring the distances from specific structures/natural sources, the total number of cells is 5,831,382. The secondary data includes aspect, plane curvature, profile curvature, relief amplitude, slope, distance from rivers, TWI, distance from anticline axis, distance from syncline axis, and distance from road. Among them, TWI, aspect, plane curvature, profile curvature, relief amplitude, and slope were obtained by ArcGIS processing of DEM. The characteristics associated with distances (i.e., distance from rivers, distance from anticline axis, and distance from syncline axis) were obtained by applying the near function of ArcGIS to the corresponding primary data (i.e., the river network of Chongqing City, the road network of Chongqing City, and the geological structure of

Human-related factors
Yunyang County) and the created fishnet of Yunyang County. The ranges and categories of the 20 characteristics are summarized in Table 2. All 20 factors were visualized by ArcGIS with a resolution of 25 m (Figure 7). Then the corresponding classes and rating values of the 20 factors were assigned to each cell of the prepared fishnet.
After obtaining the 827 historical landslide cells in the study area as positive samples (1), the same number of non-landslide cells were randomly extracted from the landslidefree areas as negative samples (0) [35,36]. To make sure that the sub-models can be constructed with the same data as the parent model, the number of negative samples selected from each sub-zone was determined by the number of the positive samples in the subzone (i.e., 89 for sub-zone I, 286 for sub-zone II, 44 for sub-zone III, and 408 for sub-zone IV). Thus the five sample datasets were formed (Table 3). Based on previous research [14,37], 80% of each dataset was randomly selected to train the corresponding statistical model, and the remaining 20% was used for validation purposes.  I  89  89  178  80%  142  II  286  286  572  80%  457  III  44  44  88  80%  70  IV  408  408  816  80%  652  Whole Region  827  827  1654 80% 1323

Model Development and Application
Five independent random forest-based models were established and validated based on the sample datasets constructed before. The hyper-parameters of each model were tuned by the grid search method. Being considered as the most straightforward optimization method [38], the grid search method needs to iterate over the entire interval of each hyper-parameter. Therefore, the number and the iterating interval of the tuned hyperparameters have an obvious impact on its search efficiency. The number of estimators determines the stability of a random forest model, and adding more estimators will lower its mean squared prediction delta (MSPD) and hence improve model stability [39], but will increase computational cost [40]. The maximum depth of a tree controls the stability of a random forest model in a different way. The stability will decrease as depth increases since increasing the depth will make the model tend to just memorize the training data, but if the forest is too shallow, the model will underfit, resulting in low AUC [39]. In this case, the number of estimators and the maximum depth of a tree are the two hyper-parameters to be tuned; the results are shown in Table 4.
After obtaining well-trained models, we used all 5,831,382 cells prepared in Section 4.1 as input to generate the landslide susceptibility maps for the study area.

Results
The landslide susceptibility maps generated by the parent model and sub-models are displayed in Figure 8, where Figure 8a represents the map outputted by the parent model, and Figure 8b is the map produced by sub-models, Figures 8c,d are the detailed scopes for the part of both maps. The entire region is divided and classified into five zones of susceptibility to different levels of landslides (very low, low, moderate, high, and very high) by the method of natural breaks method. From the result of the parent model in Table 5, 14.6% of the whole area is classified as a very low landslide-prone zone, 22.8% as a low landslide-prone zone, 25.23% as a moderate landslide-prone zone, 21.47% as a high landslide-prone zone, and 15.90% as a very high landslide-prone zone. From the result of the sub-models, their ratios are 16.42%, 20.81%, 22.56%, 21.54%, and 18.67%, respectively.
Logically, the landslides/area ratio should increase from a very low landslide-prone zone to a very high landslide-prone zone, which is exactly what our models indicate. According to the results of the parent model, the landslides/area ratio increases from 0.016 to 4.548, from a very low landslide-prone zone to a very high landslide-prone zone, and that also increases from 0.007 to 4.051 from the results of the sub-models. Figure 9 displays such a tendency, and it can be seen that the outputs of the sub-models have more obvious gaps between the very low landslide-prone zone and the very high landslide prone-zone, which reflects another merit of the sub-models compared with the parent model.    The validation AUC values of the five models are shown in Figure 10; the AUC value of the parent model is 0.872, while that of sub-model one to sub-model four are 0.949, 0.892, 0.889, and 0.951, respectively. All of the sub-models outperformed the parent model, which proved the aforementioned hypothesis. However, the increases are not at the same level; sub-model four achieved the highest improvement (9.1%), while the lowest improvement is 1.9%, which was obtained by sub-model two.
The importance of features generally represents how much a specific feature contributes to the decision-making process of a model. In this case, the most important features can be the key factors in identifying landslide/non-landslide points. As shown in Table 6 and Figure 11, the distance from syncline axis, aridity, elevation, distance from rivers, and average annual temperature are of the highest importance for the parent model. NDVI, average annual rainfall, distance from road, and elevation are the main features that are associated with the formation of landslides in sub-zone I. For sub-model two, elevation is the most important feature, which is followed by average annual rainfall, distance from rivers, distance from anticline axis, and average annual temperature. Distance from road, HAILS, elevation, plane curvature, and average annual temperature are the top features for sub-model three. Last but not least, distance from syncline axis, average annual temperature, elevation, aridity, and average annual rainfall play important roles during the predicting process of sub-model four.

Feature Importance Analysis
As indicated in Table 6, the distance from syncline axis is at the dominant place among the twenty factors for the parent model, sub-model one, and sub-model four. Factors associated with water, such as average annual rainfall and the distance from rivers, are also important places. The triggering factors of shallow landslides are highly dependent on the rainfall water infiltration and its further redistribution [41]. Due to its softening effect and long-term erosion effect, the distance from rivers has a significant influence on the development of landslides [42]. Temperature has a remarkable effect on landslide formation; experimental results indicated that the shear strength of slip surface soils reduces with decreasing temperature, which will negatively affect slope instability [43]. Therefore, obtained by the accumulated temperature and rainfall, aridity is also of great importance. Although different plants generally have a different contribution of rainfall to soil water [44], which will affect slope stability differently, on average, the effect of water uptake from the plant cover makes the vegetated slopes averagely 12.84% drier, and matric suctions three times higher than the fallow slope, which contributes to slope stability [45]; so NDVI is taken as one of the main considerations. Unlike vegetation cover, human-made land covers usually have negative influences on slope stability. The development of new building areas can potentially increase susceptibility to landslides [46]. Road networks not only directly destabilize existing slopes by disturbing their original structures during construction processes, but transport activities also have negative effects on slope stability. Similarly, HAILS is another influential factor. For topological factors, the two most important factors, in this case, are elevation and plane curvature.

Parent Model
When dealing with the whole region, the model tends to capture information that is more general, and thus its feature importance has a higher universality. In our case, six major synclines are lying in the whole domain, and their generally dominant effects on identifying landslide/non-landslide points are successfully captured by the parent model. Aridity, distance from rivers, and average annual temperature affect the formation of landslides in meteorological and hydrological aspects. Although ranked fourth as the most important one, elevation is the most important topological factor in the research area.

Sub-Models
After dividing the whole region into different sub-zones, the models will have the chance to learn the knowledge that is specifically right for each sub-zone. Thus their performances will be improved.
Sub-model One. Located in the south of Yunyang County, sub-zone I occupies 741.7 square kilometers, which is only 20% of the entire research area. Nevertheless, it is crossed by two of the six major synclines, which makes the syncline effect more obvious, and, therefore, the distance from the syncline axis is still the most important factor for sub-zone I. Similarly, the landslide susceptibility in sub-zone I is also affected by both hydrological and topological factors, namely average annual rainfall and elevation. However, specifically, as it is crossed by the major highway S305, land cover (NDVI) and human activities (distance from road) are the other two main influential factors.
Sub-model Two. As we discussed before, sub-zone II is the most special sub-zone since it is crossed by the Three Gorges Reservoir area, most of the landslides in Yunyang County were impacted by water occurred here. Therefore, theoretically, the landslide formation in sub-zone II is sensitive to hydrological factors. In this case, the average annual rainfall and the distance from rivers are ranked as the second and the third-most important factors. The reason why elevation is of the greatest importance is that lower places are usually prone to the influence of the periodic variation of reservoir water level. Statistically, 76% of the landslides impacted by water in sub-zone II occurred below the elevation of 249 m. The tectonic action and natural factors also contribute a lot to the landslide formation in sub-zone II; the distance from anticline axis and the average annual temperature are ranked in fourth and fifth place, respectively.
Sub-model Three. For sub-zone III, specifically, its data points are relatively far away from the major synclines, which mitigates the importance of the feature. Instead, as an area crossed by major highways and dense road networks, human activities, including distance from road and HAILS, are the two most important factors that have the main contribution to the landslide formation of the sub-zone. Topographical factors, namely elevation and plane curvature, are ranked in third and fourth place, respectively, and the average annual rainfall is evaluated as the fifth-most important factor.
Sub-model Four. Crossed by three major synclines, the landslide identification in subzone IV is also significantly dependent on the distance from the syncline axis. Furthermore, since it is part of the low hills section crossing Yunyang, Fengjie, and Kaizhou, the formation of landslides in this area is primarily influenced by the factors associated with the stability of mountain bodies, such as the average annual temperature, elevation, aridity, and the average annual rainfall.

Conclusions
In this study, Yunyang County is manually zoned into four parts based on the qualitative analysis of geological hazards exploration in Chongqing City, including the mechanism of landslide formation and sliding failure and geomorphic unit characteristics. Based on the qualitative analysis result, five random forest landslide susceptibility models are constructed using historical landslides data points and twenty relating factors for the following quantity analysis. These models, including a parent model and four sub-models, are optimized by the grid search method individually. A comparison between the parent model and the combination of the sub-models is conducted. The following conclusions are drawn: The AUC value of the parent model achieves 0.872, which shows that the traditional RF with the hyper-parameters tuned by the grid search method has a reliable performance on landslide susceptibility mapping. In this study, synclines have the most important effects on the formation of landslides in Yunyang County, followed by aridity, elevation, distance from rivers, and average annual temperature.
However, more general information extracted from "mainstream" landslides would usually cover that of the "minority" landslides when treating a large region equally, resulting in low information utility and the inability to identify potential landslides under special geological conditions. With enough data points, experience-based zoning before modeling is proved to be an effective solution to the issue; the qualitative analysis serves the purpose of pre-classification based on the information from geological hazards exploration, which groups the landslides that occurred under similar geological conditions, and thus enables the models to obtain the specific knowledge under each condition. Therefore, in our case, while the traditional RF obtained the general prediction skill for the entire region of Yunyang County, all the sub-models have become "experts" in their respective sub-areas. The test AUC values of sub-model one to four are 8.8%, 2.3%, 1.9%, and 9.1% higher than those of the parent model. Furthermore, the proposed method also contributes to further revealing the key factors that include local landslide instability under specific geological conditions, which can be used by planners and policymakers for a more specific and accurate landslide control in certain areas, thus further improving the safety of life and public property.
For sub-zone I, the top five conditioning factors are distance from syncline axis, NDVI, average annual rainfall, distance from road, and elevation. For sub-zone III, without the influence of major synclines, its top factors are distance from road, HAILS, elevation, plane curvature, and average annual temperature. For sub-zone IV, the distance from syncline axis becomes the most important factor again, and it is followed by average annual temperature, elevation, aridity, and average annual rainfall.
Sub-zone II is crossed by the Three Gorges Reservoir area. Suffered by periodic variation in reservoir water level and the impacts of other factors related to the reservoir band, the modified method based on general conditioning factors has relatively less effect on improving the accuracy of the mapping. The effect of more specific factors on the formation of landslides on the banks of the reservoir will be analyzed in further research. In the case of this paper, the results of sub-model two point out that elevation, average annual rainfall, distance from rivers, distance from anticline axis, and average annual temperature are the top five conditioning factors among the existing twenty factors for subzone II.

Conflicts of Interest:
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.